Week 7 Flashcards
What is genomics?
The study of the entire hereditary information in an organism, which is mostly encoded in the genome
What are genomes?
The sequence of all the DNA in a cell
What are subsets and products of the genome?
Mitogenome the mitochondrial genome
Exome all the exons that could potentially be expressed
Transcriptome the expressed genes (expressed exons) in a particular tissue or set of tissues that you are studying
proteome the proteins
Metabolome other metabolic products…
Microbiome the combined microorganisms inhabiting a Particular environment (e.g. your gut), which can be detected using sequencing strategies
What is needed to study genome?
You need its DNA sequence
What species were prioritised for sequencing?
1 - fuzzy or good to eat eg tomatos, rice, cows and dogs
2 - If it belongs to an evolutionarily, scientifically, or economically important species eg ants, bees nematode, mosquito, Arabidopsis and human
What is the number genome sequencing of bacteria?
Thousands of species of bacteria (a few hundred dollars per genome, these days)
What are examples of genome sequencing projects?
Vertebrate genomes project - generate error free genomes of all 66,000 extant vertebrate species
Darwin Tree of life project - gene sequence of all life living in the UK
Earth Biogenome project - gene sequence for all life on earth
What is an overview of the shotgun sequencing method?
Collect the organism and extract a lot of high-quality DNA (long strands)
break into fragments (enzymes, sound)
read the fragments with a high-throughput sequencer (currently Illumina, PacBio and Nanopore machines are dominant)
Piece together the fragments
Recognise the components (annotation)
How large of the DNA fragments created in shotgun sequencing?
De novo genome assembly is piecing together an encyclopedia from 300-500-letter fragments of sentences
What are the cons of shotgun sequencing?
It takes a LOT of work and money to make a ‘finished’ genome from raw fragments. Most published genomes are just tens of thousands of fragments (contigs and scaffolds), but are long enough to read lots of genes
What is the order for shotgun sequencing?
Reads –> Contigs –> Scaffolds –> Chromosomes
What is the pros and cons of long-read sequencing?
Long-read sequencing = less rebuilding needed afterwards!
Better for reading repetitive regions of the genome currently error prone
What is the length of PACBIO and Nanopore reads?
PACBIO - 20-40 kb read lengths
Nanopore - Up to 100 kb read lengths
What are the uses of sequencing genomes?
Genomes themselves are interesting to study
(Try) to find all the genes involved in a phenotype, not just one or two
To reconstruct deep phylogenies (phylogenomics)
Cancer genomics
Inform conservation strategy
What are examples of organisms with variable gene size?
Influenza - 11
E.coli - 4,149
Fruit fly - 14,889
Chicken - 16,736
Human - 22,333
How can genome size vary?
Basic features similar, genome size is highly variable
10,000 fold range between fungi and flowering plants
Number of genes varies much less
What are examples of genome size and number of genes in plants?
Arabidopsis thaliana - ~25,000 genes, 135 Mb genome size
Canopy plant (Paris japonica) - ?? genes, 152,000 Mb
What is a case study about investigating gene diversity?
Three difference identical looking species of fish
1 is a diploid (Corydoras maculifer), 1 is a tetraploid (Corydoras aragu) and 1 is an unknown
Investigated the variation in the immune genes
Does the increased genome size and copy number have difference in immune genes?
What was investigation into genetic diversity of different fish species looking at?
TLR1 and TLR2
PCR amplified 2 toll-like receptor genes
2.5 Kb each
What was the overview of the sampling of the difference fish species?
Polyploid - n = 30
Diploid - n = 23
Sequenced on NextSeq platform
What was the genetic diveristy metric used?
Single Nucleotide Polymorphisms leading to changes in amino acid sequence
What was the genetic diveristy metric used?
Single Nucleotide Polymorphisms leading to changes in amino acid sequence
What is a synonymous SNP?
A SNP that has a change in nucleotide sequence but not in the amino acid sequence
What is a non-synonymous SNP?
A SNP that has a change in nucleotide sequence causing a change in the amino acid sequence
What is important about haplotypes for genetic diversity?
Number of haplotypes an organisms has can suggest what ploidy it is
How can you measure genetic diveristy using SNP and haplotype?
Sequence all the DNA compard to a target DNA sequence with sequencing depth showing how many times a sequence has been repeatedly sequence
Then find a SNPs and count then up eg A in base and frequently in other strands
What is the difference between haplotype number of a diplod and tetraploid
SNP ratio of 50:50 shows that it is diploid
SNP ratio of 25:75 shows that it is tetraploid
What was the difference in SNPs between the diploid and tetraploid fish?
Diploid fish has very few SNP across all groups eg Non-synonymous/ synnoymous and both TLRs
Tetraploid fish had more SNPs across the board
What can show the ratio for the haplotypes of SNP in tetraploid/ haploploid?
Histogram
SNP read ratio on x axis around 0.5 is a diploid
SNP read ratio on x ratio with two major peaks around 0.25 and 0.75 shows tetraploid
What is an examole of comparative genomics?
Comparing the genes involved in echolocation between micorbats and toothed whales
What was the candidate gene for echolocation?
Prestin gene (important for ‘electromobility’ of the cochlear ear) from echolocating & non echolocating bats, echolocating dolphin, and many (non-echolocating) mammal
What is the overview of the phylogenetics of the prestin gene?
High suggestive of convergent molecular evolution of a key geen for echolocation
What are the problems of using the prestin gene?
Its just one gene and the trait of echolocation can’t be due to one or a few genes
There mist have been adaptive changes throughout the genome
Further studies found other examples of convergent genes
What is the overview of genome wide comparison of echolocating animals?
Compared whole genomes of 6 species of bat (4 echolocating and 2 non-echolocating), 1 species of (echolocating) dolphin, and 15 other (non-echolocating) mammals.
~20-30,000 genes per species, but their genomes are very fragmented, so they ended up with 2,326 coding sequences (~genes) that were in all 22 species
Made a huge phylogeny from all the genes (known as phylogenomics)
What where the results of the genome wide comparison of the origin of echolocation?
Echolocating independatly evolved across species
Alternative (false) phylogeny that groups all the echolocating species together (think of the Prestin gene example)
How supported was the alternative phylogeny based on genes like prestin?
400-800 genes that supported the echolocators-together
These 400-800 genes are candidates for genes that were selected to allow echolocation, and they included the 7 candidate genes that have been published before (e.g. prestin).
But they also found many other hearing-related and vision-related genes
What is the function of resequencing?
Compare individuals within a single species
What is the advantage of having a reference genome?
Once you have a high-quality, reference genome, the genomes of subsequent individuals from the same species are much easier to assemble. If you already know the book, you can assemble similar versions of the book from fragments, even if you have fewer fragments
How expensive is resequencing?
Resequencing of humans is very common these days and costs less than $1000 per genome
What is an example of whole genome resequencing?
Whole genome sequencing of Oreochromis cichlid fish
Why did they whole genome sequence a large number of genome of cichlid fish?
Use the resequencing data to identify ideal candidates to be bred from for aquariums and hobbists to reduce the pressure on natural populations of cichlid fish as they know the genome
What is an example of resequencing on evolutionary history?
Relationship between gray wolf species and compared to domestic dogs
Compared ancient wolf DNA which showed that modern dogs evolved from East eurasian wolf population