Population and Comparative Genomics Flashcards

1
Q

what is population genomics?

A

gives a comprehensive picture of genetic variation within species by looking at whole genomes

2
Q

what features can we characterize using population genetics?

A
  • demography

- natural selection (purifying, adaptive, balancing)

3
Q

what is the first stage of gathering population genetics data and what does it entail?

A
  1. hypothesis/query

- need to know what you want to find out

4
Q

what is the second stage of gathering population genetics data and what does it entail?

A
  2. sample collection and DNA extraction
    - choose 100s/1000s of individuals to sample
    - choose geographic/habitat of interest
    - extract genomic DNA
5
Q

what is the third stage of gathering population genetics data and what does it entail?

A
  3. genome sequencing
    - sequence the DNA, reads are from sections of the genome
    - want lots of reads
    - obtain sequence at 5-40x coverage
    - sequence the genome using ‘short’ read technology
    - main issue here is cost
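
The coverage figure above can be related to the number of reads with simple arithmetic. This is a minimal sketch, assuming average coverage is total sequenced bases divided by genome size; the function names and the example numbers are illustrative, not taken from the cards.

```python
import math

def mean_coverage(num_reads, read_length, genome_size):
    """Average depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size

def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target average depth (rounded up)."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_size / read_length)

# Hypothetical example: a 100 Mb genome sequenced with 150 bp reads.
print(mean_coverage(20_000_000, 150, 100_000_000))
print(reads_needed(30, 150, 100_000_000))
```

Because cost scales with the number of reads, the same arithmetic shows why sequencing 100s/1000s of individuals at high coverage is expensive.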
6
Q

what is the fourth stage of gathering population genetics data and what does it entail?

A
  4. read mapping and ‘variant calling’
    - locate genetic variants (sites of the genome that differ)
    - find where each read matches to the genome
    - looking for polymorphisms
    - use SNPs and indels
    - can map sequence reads to a reference genome and identify sites that differ
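
The idea of comparing mapped reads with a reference can be sketched as a toy pileup. This assumes the reads are already mapped (each read comes with a 0-based start position), handles SNPs only (no indels), and uses a hypothetical `min_depth` threshold; real variant callers are far more sophisticated.

```python
from collections import Counter

def call_variants(reference, mapped_reads, min_depth=2):
    """Return {position: (ref_base, alt_base)} for sites where the most
    common base in the reads differs from the reference base."""
    # One counter of observed bases per reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in mapped_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    variants = {}
    for pos, counts in enumerate(pileup):
        if sum(counts.values()) < min_depth:
            continue  # too few reads cover this site to make a call
        base, _ = counts.most_common(1)[0]
        if base != reference[pos]:
            variants[pos] = (reference[pos], base)
    return variants

# Hypothetical toy data: three reads overlapping position 3.
reference = "ACGTACGT"
reads = [(0, "ACGA"), (2, "GAAC"), (3, "AACGT")]
print(call_variants(reference, reads))
```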
7
Q

what is the fifth stage of gathering population genetics data and what does it entail?

A
  5. segregating genetic variants
    - as a result of read mapping you want a list of positions that vary
    - alleles/polymorphisms/variants
8
Q

what is the sixth stage of gathering population genetics data and what does it entail?

A
  6. analysis
    - analyse certain sites and use their traits to determine which alleles have an effect on a particular trait
    - describing demography
    - detecting selection
    - quantitative genetics like GWAS
9
Q

what is sanger sequencing?

A
  • small scale (not high throughput)
  • technology of choice for low-medium output sequencing
  • can use it for one gene
10
Q

what is illumina?

A
  • produces vast numbers of reads
  • much quicker, short lengths of sequences
  • technology of choice for genome re-sequencing
11
Q

what is PACBIO?

A
  • pacific biosciences
  • produces larger reads
  • fairly accurate
  • one technology of choice for genome assemblies
12
Q

what is the oxford nanopore?

A
  • produces very long reads (up to 40,000 nucleotides long)
  • advancing fast but more expensive
  • has the worst error rate
13
Q

what is meant by demography?

A
  • estimates of population size (can also estimate population size backwards through time)
  • population structure (which individuals are more or less closely related)
    • migration and ‘gene flow’ between populations
  • inbreeding/outbreeding rates
14
Q

what is selection in population genetics?

A

identifying which regions of the genome are subject to selection, e.g. strong purifying selection (removal of harmful mutations)

15
Q

what is an example of quantitative genetics?

A

GWAS: which alleles contribute to traits

16
Q

how are demography, selection and quantitative genetics interrelated?

A
  • expanding and shrinking population sizes affect selection
17
Q

what is the concept of genetic diversity in population genetics?

A

within a region of a genome there are different amounts of diversity

18
Q

what are polymorphisms/alleles/variants?

A
  • sites in the genome that differ between individuals of a species
19
Q

what are SNPs?

A
  • single nucleotide polymorphisms

- these are the most common

20
Q

what are indels?

A
  • small insertions or deletions
21
Q

what is the human genome composed mostly of?

A

transposons

22
Q

what are examples of structural variants?

A

duplications, rearrangements, large insertions/deletions

23
Q

what is the initial origin of variation?

A

a mutation in one individual

- all polymorphisms start with a single mutation in the population

24
Q

how can polymorphisms move?

A

through space and time within a population

- their frequency will change

25
Q

how will a polymorphism occur in a population?

A
  • start with two separated populations
  • one allele gets across from one population to the other
  • the mutation is now shared between the populations
  • over time its frequency can increase
26
Q

what are most mutations?

A
  • neutral

- deleterious and therefore lost

27
Q

what happens if variants are physically linked on the chromosome?

A

they tend to travel together but can become unlinked through recombination

28
Q

what is the concept GWAS?

A
  • GWAS
  • lots of data is in a matrix (0s and 1s)
  • want to use summary statistics - summarising information in one number
  • average pairwise similarity
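
A summary statistic of this kind can be computed directly from the 0/1 matrix. The sketch below assumes rows are sampled haplotypes and columns are sites, and it reports the average number of pairwise differences (the usual way the "pairwise" statistic is expressed, often called π) rather than similarity; the function name and the toy matrix are illustrative.

```python
from itertools import combinations

def average_pairwise_differences(haplotypes):
    """Mean number of sites that differ between two haplotypes,
    averaged over every pair of rows in the 0/1 matrix."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    pairs = list(combinations(haplotypes, 2))
    total = sum(
        sum(a != b for a, b in zip(h1, h2))  # differences for one pair
        for h1, h2 in pairs
    )
    return total / len(pairs)

# Hypothetical matrix: 4 haplotypes (rows) x 5 sites (columns).
matrix = [
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
print(average_pairwise_differences(matrix))
```

This collapses the whole matrix into one number, which is the point of a summary statistic.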
29
Q

what does S stand for?

A

the number of segregating sites

30
Q

what is MAF?

A
  • minor allele frequency
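
As a rough illustration, MAF can be read off a 0/1 matrix column by column. This sketch assumes rows are haplotypes, columns are sites, and each site has only two alleles; the function name is hypothetical. If 1 is known to mark the derived allele, the raw frequency of 1s at a site is the DAF instead.

```python
def minor_allele_frequencies(haplotypes):
    """Return the frequency of the less common allele at each site."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    mafs = []
    for site in zip(*haplotypes):        # iterate over columns (sites)
        freq_of_ones = sum(site) / n     # frequency of the allele coded 1
        mafs.append(min(freq_of_ones, 1 - freq_of_ones))
    return mafs

# Hypothetical matrix: 4 haplotypes x 3 sites.
matrix = [
    [0, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(minor_allele_frequencies(matrix))
```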
31
Q

what is DAF?

A

derived allele frequency (frequency of the new allele in the population)

  • need to know the ancestral genome
  • high DAFs are rare as new alleles tend to get lost - a high DAF suggests adaptation
32
Q

what is the concept of Tajima's D?

A

describes whether you have more or less rare alleles than expected
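
One way to make this concrete is to compute the statistic from a 0/1 matrix. The sketch below follows the standard published formula, which compares the average number of pairwise differences with the number of segregating sites S scaled by a sample-size constant; it assumes rows are haplotypes, columns are biallelic sites, and at least four haplotypes. The function name and toy data are illustrative.

```python
import math
from itertools import combinations

def tajimas_d(haplotypes):
    """Tajima's D for a 0/1 haplotype matrix (rows = haplotypes)."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least four haplotypes")
    # S: number of segregating sites (columns that are not all 0 or all 1).
    s = sum(1 for site in zip(*haplotypes) if 0 < sum(site) < n)
    if s == 0:
        return None  # undefined when nothing varies
    # pi: average number of pairwise differences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)
    # Sample-size constants from the standard formula.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Hypothetical data: every variant is a singleton (an excess of rare alleles).
rare = [[1 if i == j else 0 for j in range(5)] for i in range(6)]
# Hypothetical data: every variant is carried by half the sample.
common = [[1] * 5] * 3 + [[0] * 5] * 3
print(tajimas_d(rare), tajimas_d(common))
```

The singleton-only sample gives a negative value and the intermediate-frequency sample gives a positive one, matching the "more or fewer rare alleles than expected" reading.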

33
Q

what happens if you have a negative Tajima's D?

A
  • have more rare alleles than you expect
  • happens when there's a selective sweep (new mutations throughout the population)
  • or expanding population
34
Q

what happens if you have a positive Tajima's D?

A
  • too few rare alleles
  • signal of balancing selection
  • shrinking population
  • population structure
35
Q

what happens if Tajima's D = 0?

A

neutrally evolving, stable population

36
Q

what is the concept of population structure?

A
  • when you have individuals more likely to breed with each other than another set
  • can see this through genomes
37
Q

how can you look at population structure?

A
  • genomes
  • act as markers to track evolution
  • when people etc move they carry DNA
  • populations sometimes have small contributions which can't be drawn on a phylogenetic tree
38
Q

what are the rules of population structure?

A

mutations are rare, drift through populations, recombinations

39
Q

what is the concept of purifying selection?

A
  • loss of deleterious alleles
  • they are removed from the population
  • they are less fit so they die/produce fewer offspring
40
Q

what does the process of purifying selection result in?

A
  • reduces diversity in regions that are important
  • increases the proportion of rare alleles
  • causes a negative Tajima's D
  • purifying selection expected to be a common event
41
Q

what is the property of most new mutations?

A

deleterious

42
Q

why do exons have much lower diversity?

A
  • mutations are more likely to be deleterious

- exons have an important function so deleterious mutations are removed quickly

43
Q

what is the concept of adaptive evolution?

A
  • new mutation is helpful and increases to become more common in the population
  • has similar effects to purifying selection (difficult to differentiate)
44
Q

what does the process of adaptive evolution do?

A
  • reduces diversity around the beneficial allele
  • increases rare alleles
  • causes a negative Tajima's D
  • adaptive selection is expected to be a rare event
45
Q

why is adaptive evolution rare?

A

a mutation causing a beneficial adaptation through a random change will be rare

46
Q

what is a selective sweep?

A

over time not only will the beneficial allele become more common but so will the linked alleles

47
Q

what is a haplotype?

A
  • region of the genome with alleles that are linked
48
Q

what is the concept of balancing selection?

A
  • advantage to maintaining more than one allele in a population
  • very rare
  • when heterozygotes are fitter
  • advantage of rare alleles but when they become common they are less advantageous
49
Q

what are the results of balancing selection?

A
  • maintains more diversity

- causes a high (positive) Tajima's D

50
Q

what is the concept of polygenic selection?

A
  • GWAS shows that most traits are determined by multiple genes
  • called complex traits
  • selection acts on all the alleles at once
  • there is therefore selection for multiple genes
  • when these traits evolve, many alleles change
51
Q

what is the concept of linkage of alleles on the chromosome?

A
  • when a strongly beneficial allele arises it will ‘sweep’ through the population
  • arises very quickly
  • alleles close to it will be carried because they are linked
52
Q

what are the results of linkage of alleles on the chromosome?

A
  • loss of diversity around the sweep
  • increase in linkage
  • produces a large loss of genetic diversity (always the same)
53
Q

what happens when recombination occurs?

A

linked alleles can become unlinked

54
Q

what is comparative genomics?

A
  • the comparison of genomes between species
55
Q

what does comparative genetics involve the analysis of?

A
  • gene orthologs/paralogs, gene family expansions
  • gene loss/gain
  • evolutionary rate of genes
  • conserved genic and non-genic regions
  • conservation/changes in synteny (gene order)
56
Q

what are orthologs?

A

genes in different species that descend from the same gene in their common ancestor (separated by speciation)

57
Q

what are paralogs?

A

genes within a species that descend from the same ancestral gene (separated by gene duplication)

58
Q

what is the first stage of collecting comparative genomics data?

A
  1. sequence and assemble a genome
    - choose the organisms of interest
    - assembly: connecting all short/long sequencing reads into continuous sequences
    - sequencing machines generally produce short reads
59
Q

what is the second stage of collecting comparative genomics data?

A
  2. annotate your genome (identify gene starts, ends and exons, and identify gene types by homology)
60
Q

what is the third stage of collecting comparative genomics data?

A
  3. align/compare your genome to others
    - whole genome alignment
    - using BLAST to locate similar genes
61
Q

what is comparative genomic data produced on?

A

linux server - large amount of data with a lot of processing required

62
Q

what can be found from comparative genomics?

A
  • Which genes have been lost in a lineage
  • When genes have been gained (created through things like gene fusion)
  • Which are the fastest evolving genes
  • Conserved genic and non-genic regions
  • How a species may have evolved to adapt to some new niche: how a particular species has evolved and adapted says something about long-term evolution
  • The higher the peak, the slower the rate = more conserved: purifying selection removes deleterious alleles
63
Q

how are diversity and divergence related in comparative genomics?

A

genetic diversity within species gives rise to divergence between species

64
Q

what is genetic diversity?

A

differences within species

65
Q

what is divergence?

A

differences between species

66
Q

what are exons?

A

regions that evolve slowly; mutations in them are most often removed

67
Q

what is an example of genetic diversity giving rise to divergence?

A
  • one population splits into two populations
  • at some point there is no interbreeding
  • different alleles become fixed independently through mutations arising
68
Q

what is fixation?

A

when a polymorphism becomes present in all individuals in a species (or population)

69
Q

what is the concept of evolutionary rate in comparative genomics?

A
  • evolutionary rate is the number of differences that occur over time or how many mutations are fixed in a population over time
  • measure via alignments from genes and genomes
  • every genome evolves at a different rate
70
Q

how can evolutionary rates be measured?

A
  • substitutions/year: certain numbers of substitutions per year (have to know the years they’ve been separated)
  • substitutions/gene or per site between two or more species
71
Q

what is the concept of purifying selection in comparative genomics?

A
  • selection to remove deleterious mutations

- over time this results in slower rates of evolution in regions of the genome with more essential function

72
Q

what are introns?

A
  • not conserved: mutations in them are mostly not removed by purifying selection
73
Q

what happens if regions are more highly conserved?

A
  • suggests that the regions are more important
74
Q

how can purifying selection be detected in comparative genomics?

A
  • via genome alignment
  • looking for regions that remain the same between species
  • can show evolutionary rate: more important regions evolve more slowly and so are conserved
75
Q

what is synonymous change?

A

does not change the amino acid encoded for, so it is unlikely to have a strong functional effect

76
Q

what is non-synonymous change?

A
  • does change the amino acid encoded for

- more likely to have functional consequence (which will generally be deleterious)
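
The synonymous/non-synonymous distinction can be checked mechanically with a codon table. This is a minimal sketch, assuming the standard genetic code and DNA codons written with T; the names `CODON_TABLE` and `classify_change` are illustrative.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_change(codon_before, codon_after):
    """Label a codon change as identical, synonymous or non-synonymous."""
    before, after = codon_before.upper(), codon_after.upper()
    if before == after:
        return "identical"
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"      # same amino acid ('*' marks a stop codon)
    return "non-synonymous"      # different amino acid (or stop gained/lost)

print(classify_change("GAA", "GAG"))  # both codons encode glutamate
print(classify_change("GAG", "GTG"))  # glutamate to valine
```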

77
Q

is the rate of synonymous change slower than non-synonymous change?

A

no

78
Q

what is the concept of adaptive evolution in comparative genetics?

A
  • increase frequency of adaptive allele
  • some genes/genomic regions evolve to have new/improved functions
  • this is one path to adaptation
  • such genes change faster than we expect by chance
79
Q

what tests can be used to measure adaptive evolution in comparative genetics?

A
  1. the dN/dS test

2. the McDonald-Kreitman test

80
Q

what is the dN/dS test?

A
  • dN: the rate of non-synonymous change
  • dS: the rate of synonymous change
  • genes that change their function rapidly may have a higher dN than dS
81
Q

what is the McDonald-Kreitman test?

A
  • use for detecting adaptive change between species

- and for detecting balancing selection within species

82
Q

what is the rate of synonymous change (dS)?

A
  • synonymous change does not affect the protein produced
  • will have little or no effect on the fitness of the organism, so these changes are selectively neutral and will accumulate
  • sometimes they can result in non-optimal codon (rare)
  • if species are far apart this rate needs to be corrected for multiple hits
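
To illustrate what a multiple-hits correction does, the sketch below uses the Jukes-Cantor formula, a simple nucleotide model chosen here for illustration; it is not necessarily the correction the course uses, and real dS estimates typically rely on codon-based models. The function name is hypothetical.

```python
import math

def jukes_cantor_distance(p):
    """Correct an observed proportion of differing sites (p) for multiple
    substitutions at the same site, under the Jukes-Cantor model."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for this correction")
    return -0.75 * math.log(1 - 4 * p / 3)

# The correction matters more as sequences become more different.
for observed in (0.01, 0.1, 0.3, 0.5):
    print(observed, round(jukes_cantor_distance(observed), 4))
```

For closely related species the corrected and observed values are almost identical; for distant species the observed proportion increasingly underestimates the true number of changes.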
83
Q

what is the rate of non-synonymous change (dN)?

A
  • non-synonymous change does affect the protein produced
  • most will be deleterious and lost
  • so the dN rate will generally be slower than the dS rate
  • hence the dN/dS rate is generally less than 1
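
The ratio can be sketched as simple arithmetic. This assumes the numbers of changes and of available sites in each class are already known, and it uses uncorrected proportions; the function names, the counts and the strict cut-off at 1 are illustrative only.

```python
import math

def dn_ds_ratio(nonsyn_changes, nonsyn_sites, syn_changes, syn_sites):
    """dN/dS from raw counts: changes per site in each class, then the ratio."""
    dn = nonsyn_changes / nonsyn_sites
    ds = syn_changes / syn_sites
    if ds == 0:
        return math.inf if dn > 0 else math.nan
    return dn / ds

def interpret(ratio):
    """Rough reading of the ratio, following the cards."""
    if ratio < 1:
        return "dN < dS: consistent with purifying selection"
    if ratio > 1:
        return "dN > dS: possible adaptive evolution"
    return "dN = dS: consistent with neutral evolution"

# Hypothetical gene: 5 non-synonymous changes over 300 non-synonymous sites,
# 10 synonymous changes over 100 synonymous sites.
ratio = dn_ds_ratio(5, 300, 10, 100)
print(ratio, interpret(ratio))
```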
84
Q

what does it suggest if dN > dS?

A
  • there have been many non-synonymous changes

- this is rare and a signature of adaptive evolution

85
Q

what is the concept of polygenic selection and genome-scale data in comparative genomics?

A
  • SNPs in many genes can affect one trait
  • adaptation may cause gradual changes in many genes
  • can detect this by looking for concerted signals over certain categories of genes that work together
86
Q

what is the assumption of the McDonald-Kreitman test?

A

tests the assumption that diversity within a species gives rise to divergence between species (assumes there's a stable ratio)

  • assumes a stable ratio of synonymous and non-synonymous polymorphisms
  • over time polymorphisms become fixed
  • gives rise to the same ratio of synonymous and non-synonymous fixed mutations
87
Q

how can you carry out the McDonald-Kreitman test?

A
  • using the chi squared test
  • count the sites that are synonymous and non-synonymous (both polymorphic within species and fixed between species)
  • compare the counts with the chi-squared statistic
  • find if the rate is stable
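
The counting and chi-squared steps above can be sketched as a 2x2 table test. This assumes the four counts are already available, uses the plain chi-squared statistic with one degree of freedom and no continuity correction, and takes its p-value from the complementary error function; the function name and the counts are hypothetical.

```python
import math

def mk_chi_squared(fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn):
    """Chi-squared test on the 2x2 table
           non-syn        syn
    fixed  fixed_nonsyn   fixed_syn
    poly   poly_nonsyn    poly_syn
    Returns (chi2, p_value)."""
    a, b, c, d = fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column needs at least one count")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts with an excess of non-synonymous fixed differences.
chi2, p = mk_chi_squared(fixed_nonsyn=20, fixed_syn=10, poly_nonsyn=5, poly_syn=40)
print(chi2, p)
```

A small p-value says the ratio is not stable between polymorphic and fixed sites; which class is in excess then points to the interpretation. With small counts an exact test is often preferred over chi-squared.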
88
Q

what is the result of a McDonald-Kreitman test for a neutrally evolving gene?

A
  • ratio will be consistent
89
Q

what is the result of a McDonald-Kreitman test for an excess of non-synonymous fixed differences (a non consistent ratio)?

A

adaptive evolution between species

90
Q

what is the result of a McDonald-Kreitman test for an excess of non-synonymous polymorphisms within a species (a non-consistent ratio)?

A

balancing selection to maintain different non-synonymous differences within species