Molecular 7-13 Flashcards
In non-homologous sequences how do you find local alignments ?
Add gaps
When comparing homologous sequences what does similarity not tell you ?
The order of descent
Why is comparing amino acid sequences less prone to error than comparing DNA sequences ?
DNA sequences contain indels
Is a single long gap more likely than multiple short gaps during sequence alignment ?
yes
If different alignments generate a similar score what is likely to be the best alignment ?
The one with the fewest gaps
What receives the largest scoring penalty during automated alignment ?
Gap penalties > mismatch penalties
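Worked example: a minimal sketch of scoring an existing alignment with affine gap penalties, where opening a gap costs more than extending one. The function name and all scoring values are assumptions chosen for illustration, not values from the course. It shows why one long gap scores better than several short gaps covering the same number of positions.

```python
def score_alignment(a, b, match=1, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Score two already-aligned, equal-length strings ('-' marks a gap).

    Simplification: a run of gap columns counts as one gap even if the
    gap switches from one sequence to the other.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    in_gap = False
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            # First gap column pays the opening penalty, later ones the extension penalty.
            score += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += match if x == y else mismatch
            in_gap = False
    return score


one_long_gap = score_alignment("ACGTACGT", "AC----GT")    # -4
two_short_gaps = score_alignment("ACGTACGT", "A--TA--T")  # -8
```

Both alignments have four matches and four gap positions, but the second pays the opening penalty twice.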
Why does the COI gene have no indels between closely related species ?
Slow rate of evolution
What is needed for the construction of phylogenetic trees with regards to sequence quality ?
> 20% amino acid identity for protein coding sequences
>66% for non-coding DNA sequences
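Worked example: a rough sketch of how percent identity could be computed from two aligned sequences. The function name and the choice to ignore gap columns in the denominator are assumptions; other conventions exist and give different values.

```python
def percent_identity(a, b):
    """Percent of aligned, non-gap columns in which both residues are identical."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


# Hypothetical aligned protein fragments: 5 identical residues in 7 compared columns.
identity = percent_identity("MKTAY-LV", "MRTAYQLI")  # about 71.4
```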
What is an orthologous gene ?
A gene that is in 2 separate species due to a speciation event
What is a paralogous gene ?
A homologous gene that arose through a duplication event
What is a xenolog ?
A gene that is acquired by HGT
What type of genes are phylogenetic trees based on ?
Orthologous genes
Other than trees being based on base or amino acid similarity what else can trees be based on ?
Character states, such as specific bases at a specific site
What are the benefits and limitations of 3rd generation sequencing ?
Benefits:
- Avoids PCR, thus reducing error and amplification bias
- Real-time measurement
- No DNA fragmentation required
- Low price per Mbp generated
Cons:
- High error rate
What are the 3 steps in 2nd generation sequencing ?
- DNA fragmentation and adapter ligation
- Clonal amplification by emulsion/bridge PCR
- Cyclic array sequencing
What defines ancient DNA and what are some of its properties?
Ancient DNA (aDNA) is defined as DNA recovered from biological samples that were not preserved specifically for later DNA analysis. It is typically broken into short fragments and damaged by exposure to high temperatures, moisture, ultraviolet radiation and oxygen.
What are some of the uses of aDNA?
It is used for the analysis of the phylogeny of extinct species and to estimate timing and impact of events, such as migrations, hybridisation and extinctions
Which is the name of the science that uses ancient DNA?
Paleogenetics (also called archaeogenetics)
What are the key steps in Sanger sequencing ?
- Obtain ssDNA and add radioactively labelled primers
- Separate into 4 tubes and add a different ddNTP to each, along with polymerase and dNTPs.
- Run through a denaturing gel
What are 2 disadvantages of non-automated Sanger sequencing ?
Can't run everything in one lane
Can't read sequences at the top of the gel
How was Sanger sequencing automated ?
ddNTPs were fluorescently labelled which can then be read by a laser
What are the 4 applications of protein sequencing ?
Identification of the protein family
Prediction of the sequence of the gene encoding the particular protein
Discovery of the structure and function of the protein
Evolution history of protein family
What are the steps in Sanger's protein sequencing method ?
- Add Sanger's reagent (1-fluoro-2,4-dinitrobenzene) to the protein
- Gives a protein derivative with a yellow DNP group on the N-terminal amino acid
- Apply acid hydrolysis, which cleaves the protein and releases the terminal amino acid with the yellow DNP still attached.
- Identification of terminal amino acid by chromatography
- Repeat method with different fragmentations of the protein
What are the steps in Edman degradation protein sequencing ?
- Add PITC to the protein, which reacts with the N-terminal amino group
- Apply acid, which cleaves off only the labelled terminal residue and leaves the rest of the chain intact
- Repeat steps 1-2 on the shortened chain, removing one terminal residue per cycle
- Analyse PTH residues by TLC and HPLC
What is mass spectrometry ?
Analytical technique that ionises chemical species and sorts the ions by their mass-to-charge ratio.
What does mass spectrometry allow ?
Allows the precise identification of a peptide mass and thus the amino acid sequence.
What are the two methods of mass spectrometry ?
Peptide mass fingerprinting
Tandem mass spectrometry
What are the steps in mass spectrometry ?
- Digest protein
- Separate the protein fragments by chromatography and ionise them (add charge) in the MS machine
- This allows mass identification of protein fragments
- Additional fragmentation by collision
- Compare fragment peaks to a database
What is the difference between tandem mass spectrometry and peptide mass fingerprinting ?
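Tandem mass spectrometry adds a further fragmentation step (by collision) after the first mass measurement; peptide mass fingerprinting does not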
What are the 2 theories regarding mt and cp origins ?
Autogenous: descended from nuclear genome
Endosymbiotic: Descended from free living prokaryotic cells
What are the 2 theories within the endosymbiotic theory ?
And what do they state ?
The gradual origin: Aerobic bacteria + Anaerobic prokaryote = aerobic eukaryote with mt.
Primitive aerobic eukaryote + photosynthetic bacteria= Photosynthetic eukaryote with mt and cp
The fateful encounter: same events by chance
What evidence is there for the endosymbiotic theory ?
mt and cp:
- Have developed cell membranes
- Can self-replicate
- Produce proteins
- Have their own DNA
What are conserved in land plant chloroplasts ?
introns
What is genomics ?
Defined as the study of the genome: the complete set of genes and DNA information of an organism, including the genes and their functions.
What is the difference between genomics and genetics ?
Genetics scrutinises the function and composition of a single gene, whereas genomics addresses all genes and their relationships in order to identify combined influences on the organism.
What is transcriptomics ?
Is the study of the transcriptome—the complete set of RNA transcripts that are produced by the genome, under specific circumstances or in a specific cell—using high-throughput methods, such as microarray analysis.
What does the comparison of transcriptomes allow ?
Comparison of transcriptomes allows the identification of genes that are differentially expressed in distinct cell populations, or in response to different treatments.
What does proteomics allow ?
analyse the structure, function, and interactions of the proteins produced by the genes of a particular cell, tissue, or organism.
What is proteomics ?
Is the study of the proteome: the complete set of proteins expressed in a cell, tissue or organism by a genome
What is metabolomics ?
is the study of the metabolome—the collection of all metabolites in a biological cell, tissue, organ or organism, which are the end products of cellular processes
What is the name of the science that combines information generated from omics ?
Systems biology
What is systems biology ?
the computational and mathematical modelling of complex biological systems. It is a holistic approach to deciphering the complexity of biological systems that starts from the understanding that the networks that form the whole of living organisms are more than the sum of their parts.
What is metagenomics trying to answer ?
trying to identify who is in the community through genomic data and what are they doing through transcriptomics, proteomics and metabolomics.
What is the difference between genomics and metagenomics ?
Genetic material is recovered from the environment and not pure cultures.
Metagenomics is the study of a community
What are the 2 sequencing methods used in metagenomics ?
Amplicon sequencing- metabarcoding eg COI barcode for animals
Whole metagenome sequencing
What do metatranscriptomes show ?
Show which genes are expressed and how the community responds to environmental changes
What does metaproteomics show ?
Proteins produced by community
What is an application of metagenomics and transcriptomics ?
Bioremediation, e.g. the oil spill in the Gulf of Mexico: identifying the microbes involved in the breakdown of hydrocarbons
What is an application of metaproteomics ?
Gives significant insight into how Crohn's disease occurs, through comparisons of healthy and diseased proteomes
What are the limitations of metagenomics ?
DNA extraction is difficult
Assembly of DNA is difficult
Many important phenomena occur at strain level
What is needed for the calculation of rate of point mutations ?
Frequency of mutant (q)
Selection coefficient (s)
How do you calculate neutral mutation rate ?
Use non-functional homologous DNA of 2 species with known divergence and generation times and compare.
Around 100 new point mutations per generation
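Worked example: a minimal sketch of the comparison described above, assuming the simple relation rate = K / (2T), where K is the proportion of differing sites and T is the time since divergence (the factor 2 is there because both lineages accumulate changes). All numbers and function names are hypothetical, and no correction for multiple substitutions at the same site is applied.

```python
def neutral_rate_per_site_per_year(differences, sites, divergence_years):
    """Estimate the neutral substitution rate from non-functional homologous DNA."""
    k = differences / sites              # proportion of sites that differ
    return k / (2 * divergence_years)    # both lineages have been diverging for T years


def rate_per_generation(rate_per_year, generation_years):
    """Convert a per-year rate into a per-generation rate."""
    return rate_per_year * generation_years


# Hypothetical data: 120 differences in 10,000 aligned sites, divergence 6 million years ago.
rate = neutral_rate_per_site_per_year(120, 10_000, 6_000_000)  # about 1e-9 per site per year
per_gen = rate_per_generation(rate, 20)                        # about 2e-8 per site per generation
```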
Why can synonymous mutations be used to calculate divergence ?
Because they do not affect fitness
Why do pseudogenes undergo a high rate of nucleotide substitution ?
Less selection pressure, as they are non-functional
What is meant by substitution rate ?
The rate at which mutations become fixed, i.e. come to be carried by all individuals of a species
What does correlation on the molecular clock allow you to calculate ?
Divergence for other species pairs
What controversies are there about the molecular clock ?
Rate of amino acid replacement per site is higher during speciation event
Dating may be wrong
Test developed independent of fossil records
How do you test a molecular clock's accuracy ? Describe it
Relative rate test :
Using 2 species with a common ancestor and an outgroup, calculate the number of substitutions between each species and the outgroup.
If the difference between the two values is not significantly different from 0, the species evolved at the same rate
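Worked example: a rough sketch of the quantity the relative rate test looks at, assuming three aligned sequences of equal length and counting raw differences as a stand-in for substitutions. The sequences and function names are hypothetical, and the significance test itself (which needs a variance estimate) is not included.

```python
def count_differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def relative_rate_difference(species_a, species_b, outgroup):
    """Differences A-to-outgroup minus differences B-to-outgroup.

    A value near 0 suggests both lineages changed at a similar rate.
    """
    return count_differences(species_a, outgroup) - count_differences(species_b, outgroup)


d = relative_rate_difference("ACGTACGTAC", "ACGTTCGTAA", "ACGAACGTAC")  # -2
```

Here species A differs from the outgroup at 1 site and species B at 3, so B appears to have changed faster; whether that is significant depends on the statistical test.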
What does dN/dS >1 and <1 mean ?
dN = Number of non-synonymous mutations per non-synonymous site
dS = Number of synonymous mutations per synonymous site
>1 means positive selection
<1 means purifying selection
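Worked example: a minimal sketch of computing the ratio from counts and reading it off as above. The function names and the counts are hypothetical; the label for a ratio of exactly 1 is an addition for completeness and is not on the card.

```python
def dn_ds(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Ratio of non-synonymous to synonymous substitutions, each per site."""
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    if ds == 0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return dn / ds


def interpret(ratio):
    """Map a dN/dS ratio onto the categories from the card."""
    if ratio > 1:
        return "positive selection"
    if ratio < 1:
        return "purifying selection"
    return "neutral"  # assumption: ratio of exactly 1, not covered by the card


# Hypothetical counts: dN = 5/500 = 0.01 and dS = 20/200 = 0.1, so the ratio is well below 1.
conserved_gene = interpret(dn_ds(5, 500, 20, 200))
# Hypothetical counts: dN = 30/300 = 0.1 and dS = 10/200 = 0.05, so the ratio is above 1.
fast_gene = interpret(dn_ds(30, 300, 10, 200))
```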
What are the 3 hypotheses that explain why rat substitution rates are higher than human ones ?
Generation time effect = short generation time = faster substitution rate
DNA repair hypothesis= better DNA repair= low substitution rate
Metabolic rate hypothesis = High metabolism = high substitution rate
What is the relationship found in birds with regards to the molecular clock ?
Strong negative relationships between body mass and substitution rates of birds. eg large bird = low rate
What is the difference between phylogenetic gradualism and punctuated equilibrium ?
PG says evolution and speciation is regular and slow
PE says evolution and speciation consist of calm periods interrupted by rapid evolutionary change and intense speciation
Why is mutation rate higher in mt protein coding genes than nuclear protein coding genes ?
Low fidelity of replication
Inefficient repair
High concentration of mutagens
Smaller population size