Week 3.6: Comparative Genomics Flashcards

1
Q

How does the human genome compare to other genomes?

A

Compare the human genome with the genomes of other species

Identify functional elements by conservation (L2)

Evaluate function by sequence similarity of genes to those in model organisms (W4)

How are we different from other species?

What makes us human?

http://genomewiki.ucsc.edu/index.php/Hg19_100way_Genome_size_statistics

2
Q

% Masked

A

This refers to the parts of the genome that were masked, that is, left out of the comparison. Most of these regions are masked because they consist of repeated DNA, which is hard to compare. So the similarity figures we see apply only to the unmasked region. In the dolphin example, with 43.56% of the genome ignored, 51.16% of the genome is similar to the human genome.

3
Q

How do you compare genomes?

A

Melting temperature curves

Melting temperature curves of hybridized human and chimpanzee DNA gave estimates of divergence ranging from 2.4% (Kohne et al. 1972) to less than 1% (Hoyer et al. 1972; King and Wilson 1975; Sibley and Ahlquist 1987).

Total DNA from the chimp and from the human was hybridised together. Because the two genomes differ in sequence, the human-chimp hybrid duplexes dissociate at a lower temperature than human DNA bound to human DNA, since fewer hydrogen bonds hold the mismatched strands together. The drop in melting temperature was used to estimate how far the sequences have diverged.

SNPs within Genes

In humans and chimpanzees, orthologous genes showed similarities of over 95% (Hixson and Brown 1986; Koop et al. 1986; Miyamoto et al. 1987; Sakoyama et al. 1987; Goodman et al. 1990; Bailey et al. 1991; Dorit et al. 1995; Glusman et al. 2000)

These tend to be about 95% the same: take one gene from chimp and the matching gene from human, and they are over 95% identical. Looking at coding and non-coding sequence, and at larger alignments, gives similar figures.

SNPs in non-protein coding areas

This also held true for non-coding regions: 53 autosomal inter-genic non-repetitive DNA segments showed average sequence divergence of 1.24% ± 0.07% for a human-chimpanzee pair (Chen and Li 2001).

Larger samples told the same story: a 1.9 Mbp comparison gave 1.24% average sequence difference (Ebersberger et al. 2002)

Larger alignments

Fujiyama et al. (2002) aligned 114,421 chimpanzee bacterial artificial chromosome (BAC) end sequences (BESs) to RefSeq human genome contigs. In the aligned regions, totaling 19.8 Mbp, mean identity was 98.8%.

Similarly, aligned regions of human chromosome 21 and chimpanzee chromosome 22 found a 1.44% difference (Watanabe et al. 2004).

Indels

Britten (2002) carried out one of the first analyses to include indels, examining sequences from 5 chimpanzee BACs “over the lengths that could be easily aligned” to the human genome. He aligned 779 Kbp of chimpanzee sequence to the human genome and found 3.4% difference due to the presence of indels.

SNPs + Indels

Differences due to indels and single-nucleotide differences can both be expressed as counts of nucleotide positions, and hence can be summed to give a single combined statistic. In the Britten (2002) study, this gave a total divergence of 4.8%, and in the 2005 chimpanzee genome draft, a total divergence of ~4%. This gives a more comprehensive measure of genome similarity than either measure alone.
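As a rough illustration of how the two kinds of difference can be counted in the same units and summed, here is a minimal Python sketch. The function name `divergence`, the toy sequences and the choice of denominator (every alignment column where at least one sequence has a base) are assumptions made for this example; the studies cited above each used their own alignment and counting rules.

```python
def divergence(aligned_a: str, aligned_b: str) -> dict:
    """Count substitutions and indel positions in a pairwise alignment.

    Both inputs are already-aligned sequences of equal length, with '-'
    marking a gap. Percentages are relative to the number of compared
    columns (an assumption of this sketch, not a rule from the studies).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    substitutions = 0
    indel_positions = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column is empty in both sequences; skip it
        compared += 1
        if a == "-" or b == "-":
            indel_positions += 1  # base present in only one sequence
        elif a != b:
            substitutions += 1  # single-nucleotide difference

    return {
        "substitution_pct": 100 * substitutions / compared,
        "indel_pct": 100 * indel_positions / compared,
        "combined_pct": 100 * (substitutions + indel_positions) / compared,
    }


# Toy alignment (hypothetical sequences, not real human/chimp data).
human = "ACGTACGTAC--GTACGTAC"
chimp = "ACGTACCTACGGGTAC-TAC"
result = divergence(human, chimp)
print(result)
```

In the toy alignment the indel positions outnumber the substitutions, which mirrors the point of the card: leaving indels out can hide most of the difference.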

Other sources of variation

Copy number variation
Structural variation
Non-alignable regions of the genome

Ebersberger et al. (2002) found that “for 7% of the chimpanzee sequences, no region with similarity could be detected in the human genome”, and in Fujiyama et al. (2002), 13.5% of the BESs were excluded because they did not align to any known human sequence.

Very hard to get an overall picture of how similar the genomes were until…

4
Q

Chimpanzee genome sequenced in 2005

A

The initial draft of the chimpanzee genome covered ~94% of the chimpanzee genome (fairly low coverage) with >98% of the sequence as high-quality bases (Mikkelsen et al. 2005).

The draft genome sequence aligned to 2.41 Gbp (gigabase pairs) of the 3.10 Gbp human genome, i.e. 77.7% of the human genome (Mikkelsen et al. 2005).

Within this alignment, a 3.4% difference due to indels and a 1.4% difference due to single-nucleotide changes were found.

It was so ingrained that we were about 99% similar to chimpanzees, but nobody talked about the part of the genome that didn’t align at all. Structural variation was also hard to measure because of the low coverage.
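The figures on this card can be put side by side with a few lines of arithmetic. This is only a sketch using the numbers quoted above; the variable names are made up for the example, and simply adding the two percentages is the same simplification used for the combined SNP + indel statistic.

```python
# Figures quoted on this card (Mikkelsen et al. 2005, as cited above).
aligned_gbp = 2.41            # chimp draft sequence aligned to human
human_genome_gbp = 3.10       # size of the human genome used
indel_pct = 3.4               # difference due to indels, within the alignment
single_nucleotide_pct = 1.4   # single-nucleotide difference, within the alignment

aligned_fraction = aligned_gbp / human_genome_gbp
unaligned_fraction = 1.0 - aligned_fraction
combined_pct_within_alignment = indel_pct + single_nucleotide_pct

print(f"aligned share of human genome: {aligned_fraction:.1%}")
print(f"share not covered by the alignment: {unaligned_fraction:.1%}")
print(f"combined difference inside the alignment: {combined_pct_within_alignment:.1f}%")
```

The percentage differences describe only the aligned share; they say nothing about the remainder of the genome that the alignment does not cover.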

5
Q

Gene complement

A

Counts the number of genes that have no one-to-one ortholog between two genomes.

A study of humans and chimpanzees by Demuth et al. (2006) found a difference of at least 6% using this method (1418 of 22000 genes).

Recent studies identified 644 human proteins with no BLASTP hit in chimp (Knowles and McLysaght 2009) and 584 with no BLASTP hit in other primates (Wu et al. 2011).

Our results imply that humans and chimpanzees differ by at least 6% (1,418 of 22,000 genes) in their complement of genes, which stands in stark contrast to the oft-cited 1.5% difference between orthologous nucleotide sequences.

Demuth et al 2006 PLoS ONE 1:e85
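A minimal sketch of the counting idea, assuming we already have a table of one-to-one orthologs. The gene identifiers and the function name `fraction_without_ortholog` are hypothetical, and this is not the gene-family method Demuth et al. actually used; it only shows what "no one-to-one ortholog" means as a count.

```python
def fraction_without_ortholog(genes, one_to_one_orthologs):
    """Return the fraction of genes lacking a one-to-one ortholog, plus the list.

    genes: gene identifiers from one genome.
    one_to_one_orthologs: dict mapping a gene to its single ortholog
    in the other genome (genes with no such partner are absent).
    """
    missing = [g for g in genes if g not in one_to_one_orthologs]
    return len(missing) / len(genes), missing


# Toy data (hypothetical identifiers, for illustration only).
human_genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
orthologs = {"geneA": "chimpA", "geneB": "chimpB", "geneD": "chimpD"}

fraction, missing = fraction_without_ortholog(human_genes, orthologs)
print(f"{fraction:.0%} of toy genes lack a one-to-one ortholog: {missing}")

# The figure reported on the card, expressed the same way.
reported = 1418 / 22000
print(f"Demuth et al. (2006): {reported:.1%}")
```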

6
Q
A

For many, many years, the 1% difference served us well because it was underappreciated how similar we were. Now it’s totally clear that it’s more a hindrance for understanding than a help.

Pascal Gagneux, zoologist at UC San Diego, quoted in Cohen (2007) Science 316: 1836

7
Q

Human

This shows the whole genome broken up into chromosomes.

Conclusions

At fine scales, chimpanzee recombination is dominated by hotspots, which show no overlap with those of humans.

The complete lack of hotspot sharing is consistent with the hypothesis that in humans, PRDM9 plays a critical role in localizing cross-over activity at all hotspots

In chimpanzees no repeat elements, simple DNA motifs, or predicted PRDM9 binding sites are strongly or consistently associated with hotspot locations.

A

Gorilla genome – 2012

We generated a reference assembly from a single female western lowland gorilla (Gorilla gorilla gorilla) named Kamilah, using 5.4 bn base pairs (5.4 Gbp) of capillary sequence combined with 166.8 Gbp of Illumina read pairs

We included the Kamilah assembly with human, chimpanzee, orang-utan and macaque in a five-way whole-genome alignment using the Ensembl EPO pipeline. Filtering out low-quality regions of the chimpanzee assembly and regions with many alignment gaps, we obtained 2.01 Gbp of 1:1:1:1 great ape orthologous alignment blocks

Using a rate of 10^-9 mutations per bp per year, derived from fossil calibration of the human–macaque sequence divergence and as used in previous calculations, CoalHMM’s results would correspond to speciation time estimates T_HC (for human–chimpanzee) and T_HCG (for human–chimpanzee–gorilla) of 3.7 and 5.95 Myr ago, respectively. These dates are consistent with other recent molecular estimates but are at variance with certain aspects of the fossil record, including several fossils which have been proposed (though not universally accepted) to be hominins, and therefore to postdate the human–chimpanzee split. Indeed, the relationship between molecular and fossil evidence has remained difficult to resolve despite the accumulation of genetic data. Direct estimates of the per-generation mutation rate in modern human populations, based on the incidence of disease-causing mutations or sequencing of familial trios, indicate that a lower value of (0.5–0.6) × 10^-9 bp^-1 yr^-1 is plausible (based on average hominine generation times of 20–25 yr). This would give substantially older estimates of approximately 6 and 10 Myr ago for T_HC and T_HCG, potentially in better agreement with the fossil record.

In summary, although whole-genome comparisons can be strongly conclusive about the ordering of speciation events, the inability to observe past mutation rates means that the timing of events from genetic data remains uncertain. In our view, possible variation in mutation rates allows hominid genomic data to be consistent with values of T_HC from 5.5 to 7 Myr ago and T_HCG from 8.5 to 12 Myr ago, with ancestral demographic structure potentially adding inherent ambiguity to both events. Better resolution may come from further integrated analysis of fossil and genetic evidence.
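The effect of the assumed mutation rate on the dates can be checked with simple scaling. The sketch below assumes that the observed sequence divergence is fixed, so the estimated time is inversely proportional to the rate; the real CoalHMM analysis is far more involved, and the function name `rescale_time` is invented for this example.

```python
def rescale_time(time_myr: float, old_rate: float, new_rate: float) -> float:
    """Rescale a divergence time when the assumed mutation rate changes.

    With the amount of sequence divergence held fixed, time is inversely
    proportional to the per-year mutation rate.
    """
    return time_myr * old_rate / new_rate


FAST_RATE = 1.0e-9  # mutations per bp per year (fossil-calibrated value in the text)
estimates_at_fast_rate = {
    "human-chimpanzee": 3.7,           # Myr ago
    "human-chimpanzee-gorilla": 5.95,  # Myr ago
}

rescaled = {}
for split, t in estimates_at_fast_rate.items():
    for slow_rate in (0.5e-9, 0.6e-9):
        rescaled[(split, slow_rate)] = rescale_time(t, FAST_RATE, slow_rate)
        print(f"{split} at {slow_rate:.1e} per bp per year: "
              f"{rescaled[(split, slow_rate)]:.1f} Myr ago")
```

At the 0.6 × 10^-9 end of the range this gives roughly 6 and 10 Myr, matching the approximate values quoted in the passage; the 0.5 × 10^-9 end pushes both dates older still.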

8
Q

Patterns of similarity

In 30% of the genome, Gorilla is closer to human or chimpanzee than the latter are to each other.

This means that if we draw a phylogenetics tree for humans, gorillas and chimps…

70% of the genome: humans are most closely related to chimps (this is the most commonly accepted tree). 15% of the genome suggests humans are more like gorillas. The remaining 15% suggests gorillas and chimps are more like each other than either is to humans.

A

This was a bit of a surprise because people thought all of the genome would show the first tree. The early comparisons (protein comparisons) seemed to show this tree, so it was what was expected for the whole of the genome. How is it that, in places, we are more like a gorilla than a chimp?

For how 15% of our genome can be more like gorilla than chimp, the most plausible explanation is “incomplete lineage sorting” (there was no time to explain this).
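One simple way to see how different parts of an alignment can support different trees is to count informative site patterns. The sketch below is an illustration only: it assumes a four-way alignment with an outgroup (for example macaque) whose base is taken as the ancestral state, uses made-up toy sequences, and is a stand-in for, not a reproduction of, the model-based analysis used for the gorilla genome.

```python
from collections import Counter


def topology_support(human: str, chimp: str, gorilla: str, outgroup: str) -> Counter:
    """Count alignment columns that support each pairing of two species.

    A column supports a pairing when exactly two of the three species share
    a base that differs from the third, and the third matches the outgroup.
    """
    counts = Counter()
    for h, c, g, o in zip(human, chimp, gorilla, outgroup):
        if h == c and h != g and g == o:
            counts["human+chimp"] += 1
        elif h == g and h != c and c == o:
            counts["human+gorilla"] += 1
        elif c == g and c != h and h == o:
            counts["chimp+gorilla"] += 1
    return counts


# Toy alignment (hypothetical sequences, not real data).
human_seq    = "CCCGAACTAG"
chimp_seq    = "CCCATAATAG"
gorilla_seq  = "AAAGTAAAAG"
outgroup_seq = "AAAAAAAAAA"

support = topology_support(human_seq, chimp_seq, gorilla_seq, outgroup_seq)
print(support)
```

In the toy data most informative columns group human with chimp, but a minority group human with gorilla or chimp with gorilla, which is the kind of mixed signal described on this card.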

9
Q

Patterns at the region around the gene, close relationships to gorilla relative to chimp

If the bars go above the dotted line, it suggests the region is more like gorilla than chimp; if they go below, it suggests the region is more like chimp than gorilla.

The plot is arranged by chromosome, and it can be seen that the pattern is scattered around the genome.

If you look at the regions around the genes, it looks as though the genes are more like gorilla than chimp.

Interestingly some association with disease causing variants…

A
10
Q

Human disease variants

In several cases, a protein variant thought to cause inherited disease in humans is the only version found in all three gorillas for which we have genome-wide sequence data. Striking examples are the dementia-associated variant Arg432Cys in the growth factor PGRN and the hypertrophic cardiomyopathy-associated variant Arg153His in the muscle Z disk protein TCAP, both of which were corroborated by additional capillary sequencing. Why variants that appear to cause disease in humans might be associated with a normal phenotype in gorillas is unknown; possible explanations are compensatory molecular changes elsewhere, or differing environmental conditions. Such variants have also been found in both the chimpanzee and macaque genomes.

Some genes are associated with diseases

Mean gene expression distance between human and chimpanzee as a function of the proportion of ILS sites per gene. Each point represents a sliding window of 900 genes (over genes ordered by ILS fraction); s.d. error limits are shown in grey.

The more a human gene is like the gorilla gene, the more its pattern of gene expression in us differs from that in chimps. As the ILS fraction (x-axis) gets bigger, the gene is more like gorilla.

The more like chimp the gene is, the more similar its expression is between humans and chimps; the more like gorilla, the more its expression differs from chimps.
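The sliding-window averaging described in the figure caption can be sketched as follows. The window size of 900 genes comes from the caption; everything else (the function name `sliding_window_means`, the tuple layout and the toy values) is assumed for illustration and does not reproduce the published analysis.

```python
def sliding_window_means(genes, window, step=1):
    """Average expression distance in windows over genes ordered by ILS fraction.

    genes: iterable of (ils_fraction, expression_distance) pairs, one per gene.
    Returns a list of (mean_ils_fraction, mean_expression_distance) per window.
    """
    ordered = sorted(genes, key=lambda gene: gene[0])  # order genes by ILS fraction
    points = []
    for start in range(0, len(ordered) - window + 1, step):
        chunk = ordered[start:start + window]
        mean_ils = sum(g[0] for g in chunk) / window
        mean_distance = sum(g[1] for g in chunk) / window
        points.append((mean_ils, mean_distance))
    return points


# Toy data (hypothetical values); the figure used windows of 900 genes.
toy_genes = [(0.3, 3.0), (0.1, 1.0), (0.2, 2.0), (0.4, 4.0)]
points = sliding_window_means(toy_genes, window=2)
for mean_ils, mean_distance in points:
    print(f"mean ILS fraction {mean_ils:.2f}: mean expression distance {mean_distance:.2f}")
```

Each output point corresponds to one dot in a plot of this kind; a rising trend means expression distance grows with ILS fraction.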

CTCF – a protein essential to vertebrate development that is involved in transcriptional regulation, chromatin loop formation and protein scaffolding

A

Segmental duplication distributions. (A) SDs (>20 kb) were classified as lineage-specific or shared based on a three-way comparison of human, chimpanzee, and gorilla genomes. Numbers are in megabase pairs; all SDs were validated by interspecies arrayCGH; (*) megabase pairs adjusted for copy number.

Slides 152–184 were not covered, but look at them for extra reading.
