Lecture 5 DA Flashcards
What are some reasons why we sequenced the human genome (3)?
- Because it's there - a bioinformatics challenge.
- Helps against inherited diseases, including those we don’t know about.
- Helps us understand the consequences of mutation.
What is responsible for the phenotypic diversity among different individual humans?
Single nucleotide polymorphisms - SNPs.
What is more important, the nucleotide sequence or the protein sequence?
Protein sequence.
Which chromosome was sequenced first, and why? Which came after?
22, because it is one of the shortest chromosomes. 21 came after.
Describe the hierarchical approach to sequencing the human genome (5).
- Different groups are each given a chromosome to sequence.
- The groups generate bacterial artificial chromosome (BAC) clones.
- BACs were divided into fragments, and shotgun sequencing was done on them.
- High fidelity maps with identifiable motifs allowed them to detect overlapping regions and assemble the sequence.
Describe the shotgun approach to sequencing the human genome (6).
- DNA is isolated and chopped into fragments.
- Fragments are cloned into vectors, and sequenced.
- Overlapping fragments are combined to assemble the genome into contigs.
- Scaffolds are generated from contigs.
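A toy illustration of the overlap step, as a minimal sketch only: it greedily merges the two fragments with the longest suffix/prefix overlap until nothing overlaps any more. The function names, the `min_len` threshold, and the example fragments are assumptions made for illustration; real assemblers also handle sequencing errors, repeats, and reverse complements.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair repeatedly; return the contigs that remain."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no overlaps left: each entry is a separate contig
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))  # ['ATGGCGTACCTTA']
```

Fragments that share no overlap stay as separate contigs, which is where scaffolding and gap closing come in.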
What is Celera sequencing, and how does it compare to hierarchical sequencing?
Celera sequencing is whole-genome shotgun sequencing: the entire genome is sequenced at once. It finished faster than the hierarchical approach.
At how many locations do SNPs occur?
~3 million.
How many genes total were found?
~51k.
How many coding genes were found?
~20k.
How many non-coding genes were found?
~20k
What are pseudogenes, and how many were found?
Genes that seem to be protein coding, but mutation rendered them non-coding. 18k found.
How many genes with variants were found?
~20k.
How many mRNA genes were found? What does this mean?
98k. On average, about 5 mRNAs can be made from each coding gene, meaning we technically have ~100k genes.
What % of the genome is coding? What % is repeating junk DNA?
Coding -
What are some issues with being able to sequence the human genome?
- Who owns the information.
- Who can access it (police, insurers, employers etc)
- Impact on a person
- Foetal genetic testing - counselling/accuracy
- Patenting the sequence - impact on medical discoveries
What % of the genome encodes small RNAs? What do they do?
8-20%. They are regulatory, and can inhibit mRNA translation.
Why is junk DNA believed to be so important?
It is believed to be like the operating system of the genome, running the coding genes.
What number of RNAs are believed to control how a given protein is switched on or off?
For every protein, 10 times that number of RNAs control it. This depends on the cell type/developmental stage.
What is the output of classical sequencing methods such as Sanger sequencing?
500-1,000 base pairs.
What is the output of next generation sequencing methods (NGS)?
Billions of base pairs.
Describe how Sanger sequencing works (5).
- DNA sequence is amplified (PCR).
- Primers are annealed.
- dNTPs are used for extension.
- ddNTPs labelled with fluorescence are used to terminate the sequence one base at a time.
- Fragments are separated by size using gel/capillary electrophoresis.
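The read-out logic of these steps can be sketched as a small simulation. This is an idealised model, not a description of real chemistry: it assumes exactly one terminated fragment per position, and the function names and example strand are made up for illustration.

```python
import random


def chain_terminated_fragments(strand: str) -> list[tuple[int, str]]:
    """One fragment per position: (fragment length, label of the terminating ddNTP)."""
    return [(i + 1, base) for i, base in enumerate(strand)]


def read_by_size(fragments: list[tuple[int, str]]) -> str:
    """Electrophoresis orders fragments by length; reading the labels
    from shortest to longest spells out the synthesised strand."""
    return "".join(base for _, base in sorted(fragments))


strand = "ATGCCGTA"                 # hypothetical newly synthesised strand
frags = chain_terminated_fragments(strand)
random.shuffle(frags)               # fragments start out mixed together in the tube
print(read_by_size(frags))          # ATGCCGTA
```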
What are the benefits of NGS? What are the limitations?
Benefits
- Huge sequencing capacity vs classical sequencing
- Rapid throughput/output - very quick
- No gel electrophoresis needed
Limitations
- Expensive, only economical when sequencing a large number of base pairs
How does NGS work (2)?
- The full genome is immobilised on a chip as fragments about 100 base pairs long.
- All sequenced at once, very quickly.
What is the sequence quality score in NGS?
A prediction of the probability of an error in base calling.
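The flashcard does not name a scale, so as an assumption this sketch uses the widely used Phred convention, where Q = -10 * log10(P) and P is the probability that the base call is wrong. The Phred+33 character encoding shown is the common FASTQ convention and is likewise an assumption about the data format; the function names are illustrative.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P into a Phred score Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_fastq_quality(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 ASCII encoding."""
    return [ord(ch) - offset for ch in qual]


for q in (10, 20, 30):
    print(q, phred_to_error_prob(q))   # higher Q means a lower chance of error
```

Under this convention, Q20 corresponds to a 1 in 100 chance that a base call is wrong, and Q30 to 1 in 1,000.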
What is the most common way of genome assembly?
De novo assembly.
How does de novo assembly work?
- Data from sequencing is partitioned
- Overlaps are found between datasets, building the genome gradually
- Forms segments called contigs
- Contigs are used to form scaffolds
Does de novo assembly require a reference sequence?
No.
Gaps can be found between contigs in de novo assembly. How can they be closed (2)?
If a clone spans the gap, it can be used to close it.
Otherwise, the gap can be closed using PCR with primers at the ends of the gap.
What is annotation in bioinformatics?
Determining what the gene does.
What data structure is de novo assembly based on?
De Bruijn graphs.
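A minimal sketch of how a De Bruijn graph can be built from reads: every k-mer in a read becomes an edge from its first k-1 bases to its last k-1 bases. The function name, the choice of k, and the example read are assumptions for illustration; real assemblers add error correction and then walk the graph to recover contigs.

```python
from collections import defaultdict


def de_bruijn(reads: list[str], k: int) -> dict[str, list[str]]:
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])   # edge: prefix -> suffix
    return dict(graph)


print(de_bruijn(["ATGGC"], 3))
# {'AT': ['TG'], 'TG': ['GG'], 'GG': ['GC']}
```

Following the edges from 'AT' through 'TG' and 'GG' to 'GC' spells the original sequence back out, which is the idea behind assembling from k-mer overlaps instead of comparing whole reads pairwise.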