week 10 - genetic analysis Flashcards

Question 1

Q

Genetic Variation Across Species

Answer

A

Genetic variation exists in all species and includes SNPs, indels, CNVs, structural variants, and novel mutations. These variations can be common or rare, and their frequency and effects can vary by species and population.

Plants: often show high structural variation (e.g. polyploidy, TE activity)

Animals: domesticated species have artificial selection-driven variation

Humans: millions of SNPs, rare variants, and de novo mutations per individual

Question 2

Q

Understanding Complex Traits

Answer

A

Complex traits:

Controlled by many genes (polygenic)

Influenced by environment and gene-environment interactions

Show continuous variation (e.g. height, yield)

Often have non-Mendelian inheritance and no clear genotype-phenotype map

Question 3

Q

QTL Mapping Approaches

Answer

A

linkage analysis

GWAS

pedigree analysis

Question 4

Q

QTL Mapping Approaches
linkage analysis

Answer

A

Population Type: Controlled crosses or pedigrees
Resolution: Low (Mb–cM)
Pros: Powerful for rare variants, family-based
Cons: Low resolution, limited by recombination

Question 5

Q

QTL Mapping Approaches
GWAS

Answer

A

Population Type: Natural populations
Resolution: High (kb–100 kb)
Pros: High resolution, no prior knowledge needed
Cons: Sensitive to confounding, missing heritability

Question 6

Q

QTL Mapping Approaches
Pedigree Analysis

Answer

A

Population Type: Human/animal pedigrees
Resolution: Medium
Pros: Tracks inheritance of traits
Cons: Relies on known family structure

Question 7

Q

Design and Challenges of GWAS

Answer

A

GWAS design principles include using large, diverse, well-matched populations; high-density SNP genotyping; and statistical thresholds to detect associations.
Challenges:

Missing heritability

Linkage disequilibrium confounds

Rare variant detection

Causal variant identification

Population stratification

Question 8

Q

From QTL to Causal Variant

Answer

A

Moving from QTL to causal variants involves:

Fine-mapping within associated regions (e.g., higher resolution studies)

Functional validation via expression data, reporter assays, or gene editing

Integration with other data (e.g., eQTLs, chromatin state, transcriptomics)

It is difficult because QTL peaks often span many genes, and some signals lie in non-coding or regulatory regions far from known genes.

Question 9

Q

Genetic variation and polymorphism

Answer

A

Variation: the existence of two or more forms (alleles) of a section of DNA
If a variant occurs at a frequency >0.5%, it is a polymorphism
Genetic variation can lead to observable effects, but most variants do not

Question 10

Q

Genetic variation and polymorphism
example

Answer

A

Example: an Arabidopsis dwarf mutant carries a SNP that converts a normal-looking plant into a dwarf plant.
o This is a simple case where there is a 1:1 mapping between genotype and phenotype.
o The dwarf phenotype is caused by mutation in a single gene

Question 11

Q

Classes of genetic variants:
Single Nucleotide Variant (SNV/SNP)

Answer

A

Change of one base (A→G, T→C); most common variant; may be silent or impactful

Question 12

Q

Classes of genetic variants:
- Insertion-deletion variant

Answer

A

o Indels occur where one or more bases are present in some genomes and absent in others.
o Generally only a few bases long but can be up to 80kb in length!

Addition or loss of one or more bases; can shift reading frames in coding regions

Question 13

Q

Classes of genetic variants:
- Block substitution

Answer

A

o a string of adjacent nucleotides varies between individuals

Question 14

Q

Classes of genetic variants:
- Inversion variant

Answer

A

o the order of the bases is reversed in a defined section of the genome.

A segment of DNA is reversed within the chromosome

Question 15

Q

Classes of genetic variants:
- Copy number variant

Answer

A

Large DNA segments are duplicated or deleted; affect gene dosage

o identical or nearly identical sequences are repeated in some genomes and not others.

Question 16

Q

Genetic variation and polymorphism
frequency?

Answer

A

Human genetic variations are referred to as either COMMON or RARE, to denote the frequency of the minor allele – the less frequent allele in the population
o Rare variants are population-specific

Question 17

Q

Genetic variation and polymorphism
common variant

Answer

A

Common variants have minor allele frequency (MAF)
o >1%
o E.g. a C/T SNP with 5% frequency of the T allele

Question 18

Q

Genetic variation and polymorphism
rare variant

Answer

A

Rare variants have minor allele frequency
o <0.5%
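
A minimal sketch of how the minor allele frequency could be computed from genotype counts and then classified with the cut-offs on these cards (common >1%, rare <0.5%). The function names and the "low-frequency" label for the gap between 0.5% and 1% are assumptions for illustration, not part of the lecture material.

```python
def minor_allele_frequency(n_ref_hom, n_het, n_alt_hom):
    """Minor allele frequency for a biallelic variant in diploid individuals."""
    n_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    if n_alleles == 0:
        raise ValueError("no genotyped individuals")
    # Each heterozygote carries one alternate allele, each alt homozygote two.
    alt_freq = (2 * n_alt_hom + n_het) / n_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1 - alt_freq)


def classify_variant(maf, common_cutoff=0.01, rare_cutoff=0.005):
    """Label a variant using the cut-offs from the cards.

    The "low-frequency" label for 0.5%-1% is an assumption; the cards
    do not name that range.
    """
    if maf > common_cutoff:
        return "common"
    if maf < rare_cutoff:
        return "rare"
    return "low-frequency"


# Hypothetical C/T SNP: 905 CC, 90 CT and 5 TT individuals.
maf = minor_allele_frequency(905, 90, 5)
print(maf, classify_variant(maf))
```

With these made-up counts the T allele makes up 100 of 2000 alleles, i.e. the 5% example from the common variant card.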

Question 19

Q

Genetic variation and polymorphism
- Novel/de novo variant

Answer

A

Novel/de novo variants occur only in a single family/individual
o E.g. a variant that we do not share with either parent

Question 20

Q

Genetic variation and polymorphism:
Single nucleotide polymorphism (SNP)

Answer

A

Single base pair substitutions
Arise through mistakes in DNA replication or caused by mutagens
o E.g. mutation rate in Arabidopsis is 7x10^-9 base substitutions per site per generation
Biallelic – 2 alleles (in diploids)

Question 21

Q

Genetic variation and polymorphism:
Single nucleotide polymorphism (SNP):
frequency

Answer

A

Minor allele frequency can range from <1% to 50%

Question 22

Q

Genetic variation and polymorphism:
Single nucleotide polymorphism (SNP):
methods for detecting

Answer

A

Many methods for detecting SNPs
o SNP microarrays

Question 23

Q

Genetic variation and polymorphism:
Single nucleotide polymorphism (SNP):
common?

Answer

A

SNPs are the most common
o Which is why they are used a lot

Question 24

Q

Deletions, duplications and insertions

Answer

A

Expand or contract the length of non-repetitive DNA
Small deletions and duplications arise by unequal crossing over
Small insertions can arise through the activity of transposable elements

Question 25

Q

Deletions, duplications and insertions
types

Answer

A

deletion
novel sequence insertion
mobile element insertion
tandem duplication
interspersed duplication
inversion
translocation

Question 26

Q

Human genetic variation

Answer

A

4-5 million differences between any 2 humans
o 1 in 1000 bases
Most differences occur at common locations
o 4-5 million common SNPs (>0.5%)
o 50K rare mutations (<0.5%)
o 40-80 de novo mutations

Question 27

Q

Human genetic variation
QTL mapping

Answer

A

So what we will attempt to do with QTL mapping is to relate these differences to differences in the phenotype of a trait of interest.

Question 28

Q

PHYSICAL VS GENETIC MAP
physical map

Answer

A

Physical distance in nucleotide bases (kb)
The actual distance in bp between two variants

Question 29

Q

PHYSICAL VS GENETIC MAP
genetic map

Answer

A

Recombination frequency (RF) between two markers
Based on the number of recombination events occurring in a region
RF is related to genetic distance in cM via a mapping function
Physical distance is usually correlated with genetic distance
Markers are closer together on the genetic map in regions with low RF
Markers are further apart on the genetic map in regions with high RF
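
The card does not say which mapping function is used. As a hedged illustration, the sketch below implements two standard ones, Haldane (assumes no crossover interference) and Kosambi (allows some interference). The function names are chosen here for illustration.

```python
import math


def haldane_cm(rf):
    """Haldane mapping function: RF -> genetic distance in cM."""
    if not 0 <= rf < 0.5:
        raise ValueError("RF must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * rf)


def kosambi_cm(rf):
    """Kosambi mapping function: RF -> genetic distance in cM."""
    if not 0 <= rf < 0.5:
        raise ValueError("RF must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * rf) / (1.0 - 2.0 * rf))


for rf in (0.01, 0.10, 0.30):
    print(rf, round(haldane_cm(rf), 2), round(kosambi_cm(rf), 2))
```

For small RF both functions give roughly 1 cM per 1% recombination. At RF = 0.10 Haldane gives about 11.16 cM and Kosambi about 10.14 cM, and the gap widens as RF approaches 0.5.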

Question 30

Q

PHYSICAL VS GENETIC MAP
genetic map: accurate?

Answer

A

Not always accurate
o Hotspots
o Regions where you would expect to find more recombination than others

Question 31

Q

PHYSICAL VS GENETIC MAP
why can two SNPs be physically close together but genetically far apart?

Answer

A

Reason:
o Because there might be a recombination hotspot between them
o Frequent recombination makes them look far apart genetically (even though they are physically close)

Question 32

Q

PHYSICAL VS GENETIC MAP
Principles of genetic mapping

Answer

A

Find regions of the genome that are variable (markers)
Map and order these regions to produce a genetic marker map
Map traits of interest to these markers

Question 33

Q

SUMMARY

Answer

A

There are many types of DNA sequence variation in populations of plant, animal or microbial species
By assembling a map of variation we can link this map to variation in phenotypic traits of interest

Question 34

Q

QUANTITATIVE GENETICS
how is variation divided?

Answer

A

Variation in humans, plants and animals is broadly divided into 2 types:

qualitative
quantitative

Question 35

Q

QUANTITATIVE GENETICS
Qualitative

Answer

A

o Blood groups, eye colour, flower colour
o Only a few genotypes

Question 36

Q

QUANTITATIVE GENETICS
Quantitative

Answer

A

o Height, weight
o Many genotypes

Question 37

Q

Quantitative or biometrical data

Answer

A

Deals with the study of inheritance of the quantitatively varying characters (complex or quantitative traits) that are controlled by many genes and also to a considerable extent by the environment

Question 38

Q

examples of quantitative traits
Plants
morphology

Answer

A

Yield
Quality
Maturity
Size (height, girth, biomass)

Question 39

Q

examples of quantitative traits
Plants
physiology

Answer

A

Abiotic stress responses (e.g. drought tolerance)
Biotic stress responses (e.g. disease resistance)
Photosynthetic capability

Question 40

Q

examples of quantitative traits
animals
morphology

Answer

A

Size (e.g. weight and height)
Productivity (e.g. milk and egg production)
Quality (e.g. meat/wool)
Fecundity

Question 41

Q

examples of quantitative traits
animals
physiology

Answer

A

Growth rate
Abiotic stress responses (e.g. heat tolerance)
Biotic stress responses (e.g. disease resistance)
Strength

Question 42

Q

examples of quantitative traits
animals
behaviour

Answer

A

Intelligence
personality

Question 43

Q

examples of quantitative traits
humans
morphology

Answer

A

size (weight, height, etc.)
colour

Question 44

Q

examples of quantitative traits
humans
physiology

Answer

A

metabolic rates
diabetes
hypertension

Question 45

Q

examples of quantitative traits
humans
behaviour

Answer

A

intelligence
personality

Question 46

Q

Threshold traits

Answer

A

trait which has complex/polygenic inheritance, but only two obvious phenotypes
o e.g. affected or not affected by disease
E.g. type II diabetes
o Individuals who exceed a certain number of risk factors (genetic and/or environmental) will develop the disease and others will not.

Question 47

Q

Central dogma of molecular biology

Answer

A

Flow of information
DNA –> RNA –> protein –> produces trait
The goal in genetics is to relate genetic variation to phenotypic variation in a trait

Question 48

Q

Genetic basis of a quantitative trait
debate

Answer

A

Debate between the Mendelians and the biometricians
William Bateson
Quantitative traits do not follow discrete patterns and cannot be inherited
Francis Galton
Quantitative traits can be inherited and the degree of inheritance can be estimated by pure stats

Question 49

Q

Genetic basis of a quantitative trait
debate: ronald fisher

Answer

A

Ronald Fisher came up with the polygenic model
- To reconcile these two opposing views on quantitative traits

Question 50

Q

Genetic basis of a quantitative trait
the polygenic model

Answer

A

Fisher first highlighted the polygenic nature of quantitative traits
o The random sampling of alleles at each gene produces a continuous normally distributed phenotype in the population
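
A small simulation can make this concrete. The sketch below assumes a purely additive trait with no environmental effect: every locus has the same allele frequency and each "+" allele adds one unit. The function name, parameters and seed are illustrative assumptions.

```python
import random
import statistics


def simulate_polygenic_trait(n_individuals, n_loci, allele_freq=0.5, seed=0):
    """Sum the '+' alleles over many independent loci for each individual."""
    rng = random.Random(seed)
    phenotypes = []
    for _ in range(n_individuals):
        # Two allele draws per locus (diploid); each '+' allele adds 1.
        score = sum(
            (rng.random() < allele_freq) + (rng.random() < allele_freq)
            for _ in range(n_loci)
        )
        phenotypes.append(score)
    return phenotypes


traits = simulate_polygenic_trait(n_individuals=2000, n_loci=100)
print("mean:", statistics.mean(traits))
print("sd:", statistics.pstdev(traits))
```

With 100 loci at allele frequency 0.5 the expected mean is 2 x 100 x 0.5 = 100 and the expected variance is 2 x 100 x 0.5 x 0.5 = 50. A histogram of the scores looks approximately bell-shaped, even though each locus segregates as a simple Mendelian factor.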

Question 51

Q

Genetic basis of a quantitative trait
the polygenic model:
- Quantitative traits are mostly controlled by …

Answer

A

several genes

o Polygenes or QTL (quantitative trait loci)

Question 52

Q

Genetic basis of a quantitative trait
the polygenic model:
each gene behaves like…

Answer

A

a mendelian gene

o Each gene can segregate independently

Question 53

Q

Genetic basis of a quantitative trait
the polygenic model:
- There are effects arising from…

Answer

A

environmental variance

Question 54

Q

Genetic basis of a quantitative trait
the polygenic model:
interactions

Answer

A

There is also interaction within each gene (dominance and co-dominance)
And interaction between genes (linkage and epistasis)
And interaction between the gene and the environment

Question 55

Q

Modes of gene action (Interaction between alleles at a locus)
additive effects

Answer

A

o Measure the quasi-independent effects of alleles on a trait

Question 56

Q

Modes of gene action (Interaction between alleles at a locus)
dominance effects

Answer

A

o Measure the interactions between alleles at a single locus
o E.g. complete dominance (one allele can mask the effect of the other)

Question 57

Q

Modes of gene action (Interaction between alleles at a locus):
Alleles can interact with each other in a number of different ways to produce…

Answer

A

variability of the phenotype.

Question 58

Q

Modes of gene action (Interaction between alleles at a locus):
additive gene action

Answer

A

When the heterozygote phenotypic value is half way between that of the two homozygotes, gene action is defined as additive.
o Can see in the graph that each A2 allele contributes an increase of i to the phenotype value, in this case +1.

Question 59

Q

Modes of gene action (Interaction between alleles at a locus):
complete dominance

Answer

A

Complete dominance – the phenotype is the same whether you have 1 or 2 A2 alleles.
o Can see in the graph how there is an underlying additive genetic component as shown by the slope of the line – but the values deviate from additivity due to dominance effects. In this case we can see that A1A2 and A2A2 phenotypes are quite similar.

Question 60

Q

Modes of gene action (Interaction between alleles at a locus):
within dominance:
- Complete dominance

Answer

A

o e.g. Mendel's crosses between pea plants with purple flowers or white flowers – all progeny of F1 are purple as the purple P allele is dominant.

Question 61

Q

Modes of gene action (Interaction between alleles at a locus):
within dominance: incomplete dominance

Answer

A

o heterozygote value is over half way but not quite as high as the A2 homozygote.
o E.g. snapdragons – cross red to white and see pink as neither allele is dominant – a blending of the phenotypes

Question 62

Q

Modes of gene action (Interaction between alleles at a locus):
within dominance: overdominance

Answer

A

o Rare: the phenotype of the heterozygote is beyond the range of either homozygote.
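
The modes of gene action on these cards can be told apart by comparing the heterozygote with the midpoint of the two homozygotes. The sketch below uses the common quantitative-genetics notation a (half the difference between the homozygotes) and d (deviation of the heterozygote from the midpoint). That notation and the function name are assumptions, as the cards only describe the graphs.

```python
import math


def classify_gene_action(g11, g12, g22, tol=1e-9):
    """Classify gene action from the phenotypic values of A1A1, A1A2, A2A2."""
    a = (g22 - g11) / 2.0          # additive effect
    d = g12 - (g11 + g22) / 2.0    # dominance deviation from the midpoint
    if math.isclose(d, 0.0, abs_tol=tol):
        # Heterozygote exactly halfway (this also covers a locus with no effect).
        return "additive"
    if math.isclose(abs(d), abs(a), abs_tol=tol):
        # Heterozygote equals one of the homozygotes.
        return "complete dominance"
    if abs(d) < abs(a):
        # Heterozygote lies between the midpoint and one homozygote.
        return "incomplete dominance"
    # Heterozygote lies outside the range of both homozygotes.
    return "overdominance"


for values in [(0, 1, 2), (0, 2, 2), (0, 1.5, 2), (0, 3, 2)]:
    print(values, classify_gene_action(*values))
```

The four hypothetical value sets correspond to the additive case where each A2 allele adds +1, complete dominance of A2, incomplete dominance, and overdominance.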

Question 63

Q

Modes of gene action (Interaction between alleles at a locus):
within dominance: heterozygote advantage

Answer

A

Or the “heterozygote advantage” where the heterozygote has better fitness than either of the homozygotes.
o Sickle cell anaemia – where the heterozygote has partial resistance to malaria.

Question 64

Q

Will genetic architecture be the same in all populations?

Answer

A

Genetic architecture will differ in different populations.
Different populations will have different environmental exposures and different alleles segregating.

Answer 65

A

Genotype not identifiable from phenotype
Epistasis (gene-gene interaction)
Genotype x environment interaction (G x E)

Answer 66

A

If we could see which individuals had which genes at QTL locus B
The distributions for the 3 genotypes overlap so we cannot determine genotype just by looking at phenotype.
An individual with average height could be any of the 3 genotypes.
Let’s say height = 30cm – could be any of the three genotypes!

Answer 67

A

Different genotypes responding in different ways to changes in the environment

e.g.
a. Trait is not sensitive to environment

b. Trait value higher in environment 2

c. Some genotypes have higher trait value in environment 2, others have lower trait value in environment 2.

Answer 68

A

Phenotype of the trait
Marker genotype
Genetic structure of mapping populations

Answer 69

A

Use statistical methods to bring together the 3 sources of data and identify regions of the chromosome, which we call QTL, that are associated with variation in the trait.

Answer 70

A

several genes and by the environment
o Rare combinations of polygene effects can lead to unexpected phenotypes

Answer 71

A

Mendelian phenotype ratios to work out what is going on and will need new methods of modelling these traits

Answer 72

A

1) linkage analysis

2) genome wide association analysis

Answer 73

A

o Uses a defined population, either created using crosses or with known familial relationships (pedigrees)

Answer 74

A

o Uses naturally occurring populations of individuals
o Used in human, plant and animal populations

Answer 75

A

use a segregating population

large recombinant blocks (recent recombination)

F2 mosaic
- due to recombination

makes use of recombination to define where gene of interest might be

Answer 76

A

take natural population as they are and use for analysis

make use of historic recombination over thousands of years
- lots of small recombination blocks
- narrows down region of interest to a smaller interval

can find commonalities between genotype and phenotype
- looking for shared phenotypes between individuals

Answer 77

A

Association – e.g. human populations are the result of many generations of recombination in meiosis, producing genomes with short blocks from ancestral individuals.

Answer 78

A

Collected genotype and phenotype data as shown in the table.
Use statistical methods to generate graphs like these in which we have the locations of the markers on the genetic map (x-axis) and on the y-axis, the evidence for a QTL.
As with all statistical tests we define a threshold for deciding whether or not a result is significant (link to minitab workshops).
Can see a broad peak. This is a QTL.
The results of association analysis are presented in a slightly different way. Individual dots = markers.
See narrow band of markers with significant evidence for QTL.
Note the band is narrower than with linkage analysis.

Answer 79

A

So then the next step is to look at which genes are in the region of the significant markers and perform experiments to confirm that these genetic markers really affect the trait.

Answer 80

A

Previous knowledge of the genetics of the trait is not required
Can fine map QTL to 10-100kb because many recombination events have occurred in the history of the population
Can reveal causal genes in an unbiased way

Answer 81

A

A population that segregates for a trait of interest
Mostly case-control groups
o Cases: individuals with the trait (e.g. patients with disease, personality etc.)
- Have the thing you are trying to find the genetic basis of
o Controls: individuals without the trait (healthy individuals)
o Want case and control groups to be similar to each other (avoids confounding factors)
- Want a mix of populations in both groups
Genotype data
o Mostly SNP arrays that genotype a pre-selected subset of SNPs
o Whole genome or exome sequencing
Statistical analysis to pick up association signal

Answer 82

A

Compares the prevalence of a polymorphism between subjects who have the condition (cases) and subjects who do not have the condition (controls)
In theory, the case-control study can be described simply.
o First, identify the cases (a group known to have the outcome) and the controls (a group known to be free of the outcome).
o Then, look back in time to learn which subjects in each group had the exposure(s), comparing the frequency of the exposure in the case group to the control group.

Answer 83

A

Null hypothesis: there is no association between the marker (e.g. SNP) genotype and the trait
Alternative hypothesis: there is association between the marker (e.g. SNP) genotype and the trait
For categorical traits
o Can use statistics similar to chi-square to compare the frequency of genetic variants in the two groups (case/control)
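
As one hedged illustration of such a test, the sketch below runs a Pearson chi-square on a 2x2 table of allele counts (minor vs major allele, cases vs controls) with 1 degree of freedom and no continuity correction. The card does not specify the exact statistic used in practice. The function name and the counts are made up.

```python
import math


def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square (1 df) comparing allele counts in cases vs controls."""
    observed = [[case_minor, case_major], [control_minor, control_major]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_minor + control_minor, case_major + control_major]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null of no association.
            # (An expected count of 0 would raise ZeroDivisionError.)
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 df, the upper tail of chi-square equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: minor allele seen 300/1000 times in cases, 200/1000 in controls.
chi2, p = allelic_chi_square(300, 700, 200, 800)
print(round(chi2, 2), p)
```

A large chi-square (small p-value) is evidence against the null hypothesis of no association between the marker and case/control status.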

Answer 84

A

o Can use parametric analyses like ANOVA and regression to compare normally distributed traits between the genotype groups
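
A minimal sketch of the regression option: the trait is regressed on genotype coded as the number of copies of one allele (0, 1 or 2), so the slope estimates the average effect of each extra copy. The additive coding, the function name and the data are assumptions for illustration. A real analysis would also report a p-value and adjust for covariates.

```python
def regress_trait_on_dosage(dosages, phenotypes):
    """Least-squares slope and intercept of phenotype on allele dosage (0/1/2)."""
    n = len(dosages)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need matching lists with at least two individuals")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all individuals share one genotype")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    slope = sxy / sxx                     # effect per extra allele copy
    intercept = mean_y - slope * mean_x   # predicted value at dosage 0
    return slope, intercept


# Hypothetical heights (cm) for two individuals of each genotype.
slope, intercept = regress_trait_on_dosage([0, 0, 1, 1, 2, 2],
                                           [10, 12, 13, 15, 16, 18])
print(slope, intercept)
```

In this made-up data set each extra allele copy raises the trait by 3 units from a baseline of 11.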

Answer 85

A

Association of thousands to millions of markers (polymorphisms, e.g. SNPs; the individual points) with trait status in hundreds to tens of thousands of individuals

Answer 86

A

Looking for the peaks that stand out
o Tells us that there is a specific association
Each dot represents a variant (a SNP)
o Many SNPs are spread across the chromosome
Significant markers (red points), with statistical support (-log10(p)) above the statistical threshold (red line), are either causal for the trait variation or in LD with the unknown causal variants
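
The card does not say how the threshold line is set. One common choice, assumed here, is a Bonferroni correction: divide the significance level by the number of markers tested, then plot on the -log10 scale. The function names and marker names below are hypothetical.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-marker p-value threshold after Bonferroni correction."""
    return alpha / n_tests


def significant_markers(p_values, alpha=0.05):
    """Return markers passing the threshold and the height of the threshold line.

    p_values maps marker name -> p-value (p-values must be > 0 for log10).
    """
    threshold = bonferroni_threshold(len(p_values), alpha)
    line = -math.log10(threshold)
    hits = {name: -math.log10(p) for name, p in p_values.items() if p < threshold}
    return hits, line


# Hypothetical results for four markers.
hits, line = significant_markers({"rs_a": 1e-6, "rs_b": 0.02, "rs_c": 0.4, "rs_d": 0.9})
print(hits, line)
print(-math.log10(bonferroni_threshold(1_000_000)))
```

With one million markers and alpha = 0.05 the threshold is 5 x 10^-8, which sits at about 7.3 on the -log10(p) axis.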

Answer 87

A

Linkage disequilibrium (LD) refers to correlation between SNPs
At linkage equilibrium, alleles at different SNPs would be randomly associated

look at notes for table
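
A hedged sketch of how LD between two SNPs is often quantified: D is the difference between the observed haplotype frequency and the frequency expected under random association, and r² rescales D² to lie between 0 and 1. The function name and the example frequencies are illustrative and not taken from the table in the notes.

```python
def ld_statistics(p_ab, p_a, p_b):
    """Return (D, r^2) for two biallelic SNPs.

    p_ab: frequency of haplotypes carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    # Under linkage equilibrium the haplotype frequency is simply p_a * p_b.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    r2 = d * d / denom
    return d, r2


print(ld_statistics(0.25, 0.5, 0.5))  # random association between the SNPs
print(ld_statistics(0.50, 0.5, 0.5))  # A always travels with B
```

In the first call D and r² are both 0 (linkage equilibrium). In the second, A and B always occur together, giving r² = 1, so genotyping one SNP fully predicts the other.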

Answer 88

A

Many associations are still hidden and can be uncovered by:
o Studying a larger sample population (> 1 million)
o Studying more diverse populations (non-European populations)
o Incorporating gene x gene and gene x environment interaction
o Considering non-additive models of gene actions

Answer 89

A

GWAS have revealed “molecular sub-phenotypes” or ‘heterogeneity’ of disease
Increased understanding of disease pathways will provide new drug targets and promote personalised medicine
o i.e. target treatment to underlying subtype
e.g. many different genes can be responsible for a disease such as cancer
o These can differ between individuals
o So understanding which genetic pathway in a particular individual will allow for targeted drug treatment

Answer 90

A

The associated SNP may be within or close to a gene that is relevant to the trait of interest
Goal is to identify this gene and its variants (alleles) that confer different disease risks:
o Quantitative trait nucleotide
Some associations are nowhere near a functional gene
o E.g. a variant on chr9p21 associated with heart attack is 150kb from the nearest gene

Answer 91

A

The bulk of the genetic variance underlying the trait heritability has still not been explained
E.g. > 30 markers associated with Crohn’s disease explain < 10% of genetic variance
The so-called “missing heritability” problem

Answer 92

A

Not very useful for very rare disease
o Small amount of population
o Statistics do not really work on small populations
Disease prediction
True signals
Population stratification
Ultra-rare mutations
Epistasis
Causal variants or genes
Missing heritability

Answer 93

A

Identification of novel SNV-trait associations
Discovery of novel biological mechanisms
Diverse clinical applications
Insight into ethnic variation of complex traits
Relevant to low-frequency and rare variants
Identification of novel monogenic and oligogenic disease genes
Relevant to the study of structural variation
Multiple applications beyond gene identification
Straightforward GWAS data generation, management and analysis
Easy to share and publicly available data

Answer 94

A

Complex
o Highly polygenic

Answer 95

A

mapping complex traits in natural populations

Answer 96

A

direct or indirect
o Typical genetic effect sizes are small
o GWAS has helped in understanding disease pathways but only accounts for a small proportion of variation

Answer 97

A

discover the mechanisms for how specific genetic variants contribute to disease risk

Answer 98

A

rare variant hypothesis

Answer 99

A

The Rare Variant Hypothesis proposes that much of the unexplained genetic variation in complex traits is due to many rare variants, each with larger individual effects, rather than common variants with small effects.

Answer 100

A

GWAS studies have identified many common variants (minor allele frequency >1%), but they explain only a small portion of heritability for most traits — a problem known as “missing heritability.”

The rare variant hypothesis suggests that this missing heritability could be due to rare variants that are:

Not well captured by standard SNP arrays

Specific to families or populations

Functionally important, often in coding or regulatory regions

Answer 101

A

Frequency: <0.5% (often much lower)
Effect Size: Medium to large
Detection: Requires whole genome/exome sequencing
Origin: Often recent, can be de novo
Population-specific: Yes — rarely shared across populations

Answer 102

A

BRCA1/2 mutations in breast cancer — rare but high risk

Lipid metabolism disorders — single-gene rare variants

Autism spectrum disorders — multiple rare, high-impact mutations in neuronal genes

Answer 103

A

“The rare variant hypothesis suggests that complex traits and diseases may be driven by many rare genetic variants with moderate-to-large effects. These are not well captured by GWAS, contributing to the problem of missing heritability and requiring sequencing-based approaches for discovery.”