Bioinformatics Flashcards
What is DNA encoded by and what is its purpose?
Purpose: stores genetic information
Encoded as a sequence of nucleotides
Describe the nucleotide base pairs in DNA and what their purpose is
Complementarity (AT,GC)
• Copying and repair
• Mechanism for heredity
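The pairing rule can be illustrated with a short Python sketch. The function names and the example sequence are made up for illustration; the only fact taken from the card is that A pairs with T and G pairs with C. Because the two strands are anti-parallel, the partner strand read in its own direction is the reverse complement.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Base-by-base partner of the given strand (same left-to-right order)."""
    return "".join(COMPLEMENT[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Partner strand read in its own direction (strands are anti-parallel)."""
    return complement(strand)[::-1]

template = "ATGCCG"  # hypothetical example sequence
print(complement(template))          # TACGGC
print(reverse_complement(template))  # CGGCAT
```

Taking the reverse complement twice returns the original strand, which is why either strand can serve as a template for copying or repairing the other.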
Why do organisms differ from each other?
• Organisms differ due to different sequences
What is the genetic similarity between 2 humans?
o The genetic similarity between two humans is 99.9%
What is the genetic similarity between a human and a banana
o The genetic similarity between a human and banana is 60%
Describe the structure of DNA
DNA structure- made of nucleotides built into DNA strand
• Anti-parallel DNA helix
• Each strand is complementary
• Genes spaced along strands
What is DNA packaged in?
Packaged into chromosomes
What is the genome?
• Genome- complete DNA/RNA sequence required to maintain or make an organism
What is genomics?
• Genomics- study of entire genome and products (RNA, proteins)
What is the aim of genomics and what does it involve?
o Aim: To understand how genome contributes to functioning of cell, organism, population and ecosystem
o Involves large scale molecular biology and genetics which makes complex data sets
o Relies heavily on automated data acquisition, computer-based data analyses (bioinformatics)
Describe the number of base pairs in the human genome?
• More than 3 billion base pairs
Describe the number of protein-coding genes
• About 20,000 protein-coding genes (earlier estimates were around 30,000)
What is the function of interspersed DNA?
o Interspersed DNA- which seems to have regulatory function
Why is genomics potentially useful?
• Human health
o Detect genetic variants that increase disease risk
o Identify cancer mutations, search for cure/treatment
o Pathogen identification and mitigation (e.g. SARS disease)
Identify antibiotic targets
Rapid disease identification
• Agriculture/animal breeding
o Enhance productivity, consistency and progeny quality
o Identify disease-causing genetic variants
• Study biology of non-model organisms
• Answer fundamental questions
Describe 2 methods for sequencing DNA
- Sanger sequencing, older method still in use
* De novo sequencing of genomes
Describe how Sanger sequencing sequences
• Sanger sequencing, older method still in use:
o PCR or cloning in plasmid of gene of interest, to make many copies of the same piece of DNA
DNA separated into 2 strands
o Dye termination (Sanger) sequencing of product
Take purified fragments of DNA copies
Add dNTPs, primers (complementary to each end of fragment), DNA polymerase
Add ddNTPs- fluorescently labelled chain terminators that identify the last base incorporated into the chain. No 3’-OH group.
• Once ddNTP added to chain, chain elongation stops
Terminated fragments are separated by size, and the labelled final base of each is read in order of fragment length to reveal the DNA sequence
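The chain-termination idea can be sketched as a toy model in Python. This is a simplification under stated assumptions, not a simulation of the chemistry: the template is assumed to be written so that the new strand is built left to right, exactly one terminated fragment is produced per position, and the function name sanger_read is hypothetical.

```python
import random

PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_read(template: str, seed: int = 0) -> str:
    """Toy model: recover the synthesised strand from chain-terminated fragments."""
    # Strand the polymerase would build, base by base, along the template.
    new_strand = "".join(PAIRING[base] for base in template)
    # One terminated fragment per position: each ends in a labelled ddNTP.
    fragments = [new_strand[:n] for n in range(1, len(new_strand) + 1)]
    # In the tube the fragments are mixed together in no particular order.
    random.Random(seed).shuffle(fragments)
    # Separation sorts fragments by length; the detector reads each final base.
    return "".join(frag[-1] for frag in sorted(fragments, key=len))

print(sanger_read("TACGGC"))  # ATGCCG
```

Sorting by length and reading only the last base of each fragment is what turns a mixture of terminated chains into an ordered sequence.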
What is a chromatogram and how do you read one?
o Chromatogram-result of Sanger Sequencing
Each peak represents a fluorescent flash from labelled nucleotide
• Each different nucleotide is labelled different colours- order of colour=order of nucleotide
What DNA strands is a chromatogram suitable for?
Suitable for short stretches of DNA (about 700 bp), but not for long stretches, as it would take too much time
Could sequence a whole genome a bit at a time
What are the advantages of de novo sequencing of genomes?
• De novo sequencing of genomes
o Newer method of sequencing
Works for any organism
No need to know gene sequence in advance
• Don’t need primers
o Very rapid
o Quite cheap
Human: about $1000 USD
Send sample to lab; may get sequence back in small pieces within 5 days and the entire assembled genome within 2 months
What other names is de novo sequencing known by?
o Whole genome shotgun sequencing/massively parallel sequencing/name based on machine manufacturer (e.g. Illumina or 454)
How does whole genome shotgun sequencing occur?
Extraction of DNA -> many copies of the genome (1/cell)
DNA samples are placed in an instrument where high-intensity sound waves break the DNA into billions of pieces, each only about 600 bases long
Special tags are added to the ends of the fragmented DNA
• Add adaptors to fragments to anchor them to a support
Tagged strands of DNA attach to a tagged slide
In a sequencer, each piece of DNA is copied hundreds to thousands of times
• Creates clusters of identical DNA fragments
• PCR to amplify each fragment
Sequencer reads the DNA in parallel, one base at a time using different coloured tags for each DNA base
Special sensors within the machine detect different coloured tags
Computers piece together the individual reads: they find overlaps, lay out the reads, build a consensus, and determine the order and orientation of the resulting contigs
Genome sequenced
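The final computational step, piecing fragments together using overlaps, can be sketched with a greedy merge in Python. This is a minimal illustration and not how production assemblers work: it assumes error-free reads all from the same strand, and the names overlap, greedy_assemble and min_len, as well as the example reads, are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

# Three overlapping reads taken from the made-up sequence ATGGCGTACGTTAGC
reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

If some reads share no sufficient overlap, the function returns several separate contigs rather than one sequence, which mirrors the gaps left in a real assembly.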
What are reads?
• The sequences produced by a sequencing run are called reads
Why do we want to sequence the strand many times in whole genome shotgun sequencing?
• Sequencing each strand many times (sequencing in depth) makes it easier to find overlaps between reads
What are contigs?
• Contig- stretch of contiguous sequence
o Reads stitched together with no gaps
Describe how errors are dealt with in de novo sequencing
o Dealing with errors in de novo sequencing
Take the most common base across the multiple reads at that position to reach a consensus
The more reads cover a position, the more confident the consensus
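Reaching a consensus across reads can be sketched as a majority vote per position. The Python example below assumes the reads are already aligned and of equal length, which real data rarely is; the function name and the reads are made up for illustration.

```python
from collections import Counter

def consensus(aligned_reads: list[str]) -> str:
    """Most common base in each column of equally long, already-aligned reads."""
    columns = zip(*aligned_reads)
    # most_common(1) gives the top (base, count) pair; ties go to the base seen first.
    return "".join(Counter(column).most_common(1)[0][0] for column in columns)

# Five hypothetical reads covering the same region; two contain a single error each.
reads = ["ACGTTAGC", "ACGTTAGC", "ACGATAGC", "ACGTTAGC", "CCGTTAGC"]
print(consensus(reads))  # ACGTTAGC
```

With only two reads, a single error produces a tie that cannot be resolved reliably, which is why deeper coverage gives a more trustworthy consensus.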
What is deep sequencing?
• Deep sequencing- sequence every nucleotide many times
What amount of overlap is acceptable in de novo sequencing?
Overlap acceptable for assembly depends on error rate
Describe the relationship between read length and assembly
The longer the read, the easier the assembly
Describe the relationship between error rate and assembly
High error rate makes it harder to assemble genome
What is the best type of de novo sequencing in terms of reads and error rate?
The best type of sequencing to have has long reads and no errors
• In practice there is a trade-off between read length and error rate
What is Moore’s law?
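• Moore’s law
o Originally an observation about computing: the number of transistors on a chip doubles roughly every two years
o By analogy, the cost of genome sequencing declines over time (in recent years faster than Moore’s law would predict)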
What are the pros and cons of sequencing your own genome?
Pros
=Find out about disease risk to mitigate risk
=Compare genome with partner’s genome to find risk of passing on disease to child
Cons
=Privacy issue/security issue
=Life insurance
=If increased risk of a disease that can’t be cured, will worry about it
=Risk amount is not precise
What are the factors affecting genome assembly?
- Error rate in DNA sequencing
- Read length
- Repetitive DNA
- Size of the genome
- Number of reads
Describe how error rate in DNA sequencing affects genome assembly
o Overlapping sequences may not be identical
Describe how read length affects genome assembly
o Shorter reads will be harder to assemble (less overlap)
Describe how repetitive DNA affects genome assembly
o Repetitive DNA maps to multiple locations within the genome and can’t be assembled properly, which is why 1% of the human genome remains unsequenced
Since repeated sequences are identical, they cannot be assigned to a unique genomic location: hence, relative locations and orientations of gene contigs cannot be determined
Examples of repetitive DNA: transposons and retrotransposons
o Repetitive DNA makes genome assembly hard
What percentage of the human genome is repeats
o About 50% of the human genome is repeats
Describe how the size of the genome affects genome assembly
o The larger the genome, the more reads are required to cover it and the more difficult it is to assemble
What is genome size and what is it typically measured in?
o Genome size- the total number of base pairs of DNA in a haploid set of all chromosomes of an organism. Typically measured in
Base pairs (bp)
Kilobase pairs (kb, 1,000 bp)
Megabase pairs (Mb, 1,000,000 bp)
How does number of reads affect genome assembly?
o More reads means there is better assembly (deeper coverage)
Higher coverage (average redundancy) is better
o More opportunity to correct errors and find overlaps
What is coverage?
Coverage- the number of times on average that each base pair in the genome has been sequenced
How do you calculate genome coverage?
Genome coverage= Number of reads * (Average read length/Length of genome or contig)
• Units must be consistent in the calculation (e.g. all lengths in bp), so check the units
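The coverage formula can be checked with a quick calculation. The Python sketch below uses made-up numbers (600 million reads of 150 bp against a 3 billion bp genome) and a hypothetical function name; all lengths are expressed in base pairs so the units cancel.

```python
def genome_coverage(num_reads: int, avg_read_length_bp: float, genome_length_bp: float) -> float:
    """Coverage = number of reads * average read length / genome (or contig) length."""
    return num_reads * avg_read_length_bp / genome_length_bp

# Hypothetical run: 600 million reads of 150 bp, genome of 3 billion bp.
print(genome_coverage(600_000_000, 150, 3_000_000_000))  # 30.0

# Unit check: a 5 Mb genome must be converted to bp before dividing.
genome_mb = 5
print(genome_coverage(100_000, 100, genome_mb * 1_000_000))  # 2.0
```

A result of 30.0 means each base is covered by about 30 reads on average; individual positions may be covered more or less than that.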
What is the purpose of paired-end sequencing?
• To help solve the issue of repetitive DNA
How do you perform paired-end sequencing?
- Sequence both ends of DNA fragments of known size
- Can flank repetitive elements and assign them to some scaffold
- Sequenced fragment needs to be longer than the repetitive element for paired-end sequencing to work