Genome Sequencing Technologies Flashcards
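How does Sanger dideoxy sequencing show the positions of bases on a gel?
dNTPs (e.g. dATP) allow chain extension, but ddNTPs (e.g. ddATP) are terminator nucleotides because they lack the 3'-OH group. When a ddNTP is incorporated, the strand stops growing, so the reaction produces fragments of different lengths that each end at that base. Separating these fragments by size on a gel shows the position of each base.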
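A minimal Python sketch of reading a Sanger ladder, assuming idealised reactions in which a fragment terminates at every position of the new strand. The template sequence and function names are hypothetical, and strand orientation (5'/3') is ignored for simplicity.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_fragments(new_strand: str) -> list[tuple[int, str]]:
    """Every fragment that can form: the new strand up to and including the
    position where a ddNTP was incorporated, as (length, terminating base)."""
    return [(i + 1, base) for i, base in enumerate(new_strand)]

def read_gel(fragments: list[tuple[int, str]]) -> str:
    """Order bands from shortest to longest and read each terminating base."""
    return "".join(base for _, base in sorted(fragments))

template = "TACGGATC"  # hypothetical template strand
new_strand = "".join(COMPLEMENT[b] for b in template)  # strand being synthesised
bands = sanger_fragments(new_strand)
random.Random(0).shuffle(bands)  # fragments are mixed together before the gel
print(read_gel(bands))  # ATGCCTAG
```

Sorting by length is what the gel does physically: shorter fragments travel further, so reading the bands in size order gives the sequence.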
How does dye-terminator sequencing show the positions of bases?
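Fluorescently labelled ddNTPs, with a different colour for each base, are incorporated into the DNA strands. The fragments are separated by size in a capillary tube and the colour of each fragment's terminating base is detected in order.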
How were DNA fragments broken up for Sanger sequencing? How large were the fragments before putting them into bacteria?
Sonication
100-220 kb, cloned into bacterial artificial chromosomes (BACs)
How are DNA fragments in BACs mapped?
800 bp fragments are sequenced and put into the library
What sort of sequencing did IHGSC use?
Clone-by-clone sequencing (less complex assembly than shotgun)
What sort of sequencing did Celera use?
Whole-genome shotgun sequencing with automated ABI 3700 sequencers
How was massively parallel sequencing a step up from Sanger sequencing?
Multiple molecules could be sequenced at the same time
What sort of sequencing does Illumina sequencing use?
Massively Parallel Sequencing
How many reads can Illumina sequencing produce in 1 run?
Up to 8 billion reads
Maximum read length of 125 bp
What steps are involved in Illumina sequencing?
- Sample Step
- Cluster Generation
- Sequencing
- Data Analysis
What is added to the ends of DNA fragments in the sample step of Illumina sequencing?
Adaptor regions
Motifs added through reduced-cycle amplification, e.g. sequencing primer binding sites, indices and regions complementary to the flow-cell oligonucleotides
What happens during cluster generation in Illumina sequencing?
DNA fragments are isothermally amplified on the flow cell
Fragments are added and hybridise to the lawn of oligos bound to the flow-cell surface
A polymerase synthesises a complementary strand, the double strand is denatured and the original strand is washed away. The complementary strand is then amplified by bridge amplification
Outline bridge amplification
The complementary strand folds over and hybridises to the second type of oligo on the lawn, forming a bridge. A polymerase synthesises its complementary strand, and the double-stranded bridge is denatured into two separate strands. After repeated cycles, the reverse strands are washed off and the free 3' ends are blocked.
How does sequencing with Illumina work?
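Reversible terminator nucleotides, each with a fluorescent label specific to its base, are added one at a time. The terminator prevents further chain extension while the fluorescence is imaged to identify the base; the terminator and dye are then removed so the next nucleotide can be added.
Fragments of about 500 bp are selected.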
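The per-cycle logic can be sketched as a toy base caller: after each cycle, the brightest of four dye channels is called. The intensity values and `call_bases` are hypothetical, and real base-calling software also models effects such as signal crosstalk and phasing, which this sketch ignores.

```python
BASES = "ACGT"

def call_bases(cycles: list[list[float]]) -> str:
    """cycles[i] holds the A, C, G, T intensities imaged after cycle i;
    the brightest channel in each cycle is taken as that cycle's base."""
    calls = []
    for intensities in cycles:
        brightest = max(range(4), key=lambda j: intensities[j])
        calls.append(BASES[brightest])
    return "".join(calls)

# Hypothetical intensities for one cluster over four cycles
cluster = [
    [0.9, 0.1, 0.0, 0.2],  # cycle 1: A brightest
    [0.1, 0.2, 0.8, 0.1],  # cycle 2: G brightest
    [0.0, 0.7, 0.1, 0.1],  # cycle 3: C brightest
    [0.2, 0.1, 0.1, 0.9],  # cycle 4: T brightest
]
print(call_bases(cluster))  # AGCT
```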
What types of third generation sequencing are there?
PacBio SMRT (single-molecule real-time) sequencing
Oxford Nanopore
Outline the advantages of third-generation sequencing over Illumina
Single-molecule sequencing: no amplification needed
Real time sequencing
Ultra-long read lengths
Can identify base modifications
Describe Zero-Mode Waveguides
Flow cell surface with nanowells where fluorescence can be detected from a single base.
How are nucleotides detected in PacBio?
The terminal phosphate of each nucleotide carries a fluorescent label, with a different colour for each base. When a nucleotide is incorporated in a zero-mode waveguide, its fluorescence gives a pulse that is assigned to that base, e.g. a “G” pulse.
How are circular consensus sequences produced in PacBio?
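SMRTbell adaptors are hairpin loops ligated to both ends of double-stranded DNA, making a single circular DNA molecule. A primer attaches to the adaptor and the polymerase goes around the entire loop several times, reading both strands; the repeated passes are combined into a circular consensus sequence.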
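A simplified sketch of how repeated passes could be combined into a consensus: a per-position majority vote. It assumes the passes are already aligned, in the same orientation and of equal length with no insertions or deletions, which real CCS software does not assume; the pass sequences are made up.

```python
from collections import Counter

def circular_consensus(passes: list[str]) -> str:
    """Take the most common base at each position across all passes.
    Assumes pre-aligned, same-orientation, equal-length passes."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))

passes = [
    "ACGTTAGC",
    "ACGATAGC",  # random error at position 4
    "ACGTTAGC",
    "TCGTTAGC",  # random error at position 1
    "ACGTTAGG",  # random error at position 8
]
print(circular_consensus(passes))  # ACGTTAGC
```

Because errors in single passes are mostly random, they rarely occur at the same position twice, so the vote recovers the true base.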
What can be used to sequence hard-to-sequence areas of a genome?
Oxford Nanopore
What are the motor proteins used in Oxford Nanopore?
Polymerase or helicase
Give a definition of bioinformatics
The science of collecting and analysing complex biological data
What is it called when two genetic sequences match at a proportion of their positions, e.g. 70%?
70% identity
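Percent identity is a simple calculation over an existing alignment. This sketch assumes the two sequences are already aligned to the same length and counts gap positions ('-') as non-identical, which is only one of several conventions.

```python
def percent_identity(seq1: str, seq2: str) -> float:
    """Percentage of aligned positions where both sequences have the same base."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b and a != "-" for a, b in zip(seq1, seq2))
    return 100 * matches / len(seq1)

print(percent_identity("ACGTACGTAC", "ACGTTCGAAG"))  # 70.0
```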
How do homology and identity differ?
Homology means sharing the same evolutionary origin, not necessarily the same function. It is absolute (sequences are either homologous or not), whereas identity is a measured percentage. E.g. mammals have the same bones in their limbs.
When aligning base sequences, which considerations can maximize a match?
Deletions
Insertions
Inversions
Reverse strands
How can nucleotide alignments be scored?
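Scoring matrices, such as PAM70 (PAM matrices score amino-acid substitutions, so they are used for protein alignments)
Matches: positive
Gaps: negative
Extending gaps: lower penalty than opening a new gap
Mismatches: negative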
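The scoring rules above can be sketched as a function that scores an existing alignment with an affine gap penalty (a gap costs more to open than to extend). The score values are hypothetical, chosen for illustration rather than taken from any published matrix.

```python
# Hypothetical scoring scheme
MATCH, MISMATCH = 2, -1
GAP_OPEN, GAP_EXTEND = -3, -1

def score_alignment(top: str, bottom: str) -> int:
    """Score two already-aligned sequences ('-' marks a gap). The first
    position of a gap costs GAP_OPEN; each further position costs GAP_EXTEND."""
    score = 0
    in_gap_top = in_gap_bottom = False
    for a, b in zip(top, bottom):
        if a == "-":
            score += GAP_EXTEND if in_gap_top else GAP_OPEN
            in_gap_top, in_gap_bottom = True, False
        elif b == "-":
            score += GAP_EXTEND if in_gap_bottom else GAP_OPEN
            in_gap_top, in_gap_bottom = False, True
        else:
            score += MATCH if a == b else MISMATCH
            in_gap_top = in_gap_bottom = False
    return score

print(score_alignment("AC--GT", "ACTTGT"))  # 4: one gap of length 2
print(score_alignment("A-C-GT", "ATCTGT"))  # 2: two separate gaps of length 1
```

The lower extension penalty means one longer gap scores better than several short ones, which matches how insertions and deletions usually occur as single events.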
How do global and local alignments differ?
Global alignments align the sequences over their entire length, assuming they are homologous from end to end
Local alignments only align the regions that match best
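A compact sketch of the difference: the same dynamic-programming table gives a global (Needleman-Wunsch) score when every position must be aligned, or a local (Smith-Waterman) score when cells may reset to zero and the best cell anywhere is taken. It uses a simple linear gap penalty and hypothetical scores.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical linear scoring

def alignment_score(a: str, b: str, local: bool) -> int:
    """Best global (Needleman-Wunsch) or local (Smith-Waterman) score."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised, so the edges accumulate GAP
        for i in range(1, rows):
            dp[i][0] = i * GAP
        for j in range(1, cols):
            dp[0][j] = j * GAP
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            cell = max(diag, dp[i - 1][j] + GAP, dp[i][j - 1] + GAP)
            if local:
                cell = max(cell, 0)  # local: a poor region can restart at zero
                best = max(best, cell)
            dp[i][j] = cell
    return best if local else dp[-1][-1]

print(alignment_score("GGGGACGTA", "ACGTA", local=False))  # -3
print(alignment_score("GGGGACGTA", "ACGTA", local=True))   # 5
```

The global score is penalised for the unmatched GGGG, while the local score only reports the perfectly matching ACGTA region.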
Why is genome assembly de novo?
There is no reference sequence to compare with, so you don't know whether what is being assembled is correct
Outline 5 challenges of genome assembly (comparisons to jigsaws are allowed)
De novo assembly
Repeats
Coverage bias (missing pieces)
Contamination
Replication, meaning multiple copies of the genome may be sequenced at once
Circular chromosomes
Sequencing errors (broken pieces)
What do de Bruijn graphs do?
Break reads into overlapping k-mers and connect k-mers that overlap by k-1 bases; paths through the graph give the consensus sequence
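A minimal de Bruijn graph assembler, assuming error-free reads, a k-mer length chosen by hand, and a genome whose k-mers are all unique. Nodes are (k-1)-mers, each k-mer is an edge, and walking every edge once (an Eulerian path, found here with Hierholzer's algorithm) spells out the sequence. The reads are hypothetical.

```python
from collections import defaultdict

def de_bruijn(reads: list[str], k: int) -> dict[str, list[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def assemble(graph: dict[str, list[str]]) -> str:
    """Walk every edge exactly once and spell the sequence.
    Assumes an Eulerian path exists (no errors or coverage gaps)."""
    out_deg = {node: len(nbrs) for node, nbrs in graph.items()}
    in_deg = defaultdict(int)
    for nbrs in graph.values():
        for n in nbrs:
            in_deg[n] += 1
    # Start where outgoing edges exceed incoming edges by one, if such a node exists
    start = next((n for n in graph if out_deg[n] - in_deg[n] == 1), next(iter(graph)))
    edges = {n: list(nbrs) for n, nbrs in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if edges.get(node):
            stack.append(edges[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    # Each step along the path adds the last base of the next node
    return path[0] + "".join(node[-1] for node in path[1:])

reads = ["ATGGCG", "GGCGTG", "GTGCA"]  # hypothetical overlapping reads
print(assemble(de_bruijn(reads, k=4)))  # ATGGCGTGCA
```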
How can repeats be dealt with when sequencing a genome?
Resolving bubbles
Read pairs
Increasing the k-mer length, so that k-mers are less likely to occur more than once and the de Bruijn graph has fewer tangles
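To see why a longer k-mer helps with repeats, this sketch counts how many distinct k-mers occur more than once in a hypothetical sequence that contains a 6-base repeat. Once k is longer than the repeat, each k-mer includes unique flanking sequence and the repeated k-mers disappear (this assumes the reads are long enough to contain k-mers of that length).

```python
from collections import Counter

def repeated_kmers(sequence: str, k: int) -> int:
    """Number of distinct k-mers that occur more than once in the sequence."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return sum(1 for c in counts.values() if c > 1)

# Hypothetical sequence: the repeat ACGTAC appears twice
seq = "TT" + "ACGTAC" + "GG" + "ACGTAC" + "TT"
for k in (3, 5, 7):
    print(k, repeated_kmers(seq, k))
# 3 4
# 5 2
# 7 0
```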
What are the longest open reading frames in genomes likely to be?
Genes
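A simple longest-ORF finder, as a sketch of the idea: scan each of the three forward reading frames for ATG followed by an in-frame stop codon and keep the longest. A real gene finder would also scan the reverse complement and use much more evidence; the input sequence is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq: str) -> str:
    """Longest ATG...stop open reading frame in the three forward frames."""
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best

print(longest_orf("CCATGAAATTTGGGTAACATGCCCTGA"))  # ATGAAATTTGGGTAA
```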
How is gene finding made easier?
Neural networks
What can IGV (integrative genomics viewer) be used for?
Visualising mapped reads
Showing SNPs
Showing whether SNPs are homozygous or heterozygous in the mapped reads
What types of SNPs are there?
Intergenic (between genes)
Intronic
Synonymous
Regulatory
Non-synonymous
Nonsense
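The coding-region categories (synonymous, non-synonymous, nonsense) can be decided by translating the codon before and after the change, as in this sketch using the standard genetic code. Intergenic, intronic and regulatory SNPs need a genome annotation instead, which is not modelled here; the function name is hypothetical.

```python
# Standard genetic code, codons ordered by bases T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def classify_coding_snp(codon: str, position: int, alt: str) -> str:
    """Classify a SNP at `position` (0-2) within a codon; '*' is a stop."""
    mutated = codon[:position] + alt + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    return "non-synonymous"

print(classify_coding_snp("CTT", 2, "C"))  # synonymous (Leu -> Leu)
print(classify_coding_snp("GAA", 1, "T"))  # non-synonymous (Glu -> Val)
print(classify_coding_snp("TGG", 2, "A"))  # nonsense (Trp -> stop)
```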
What can changes in nucleotide sequences result in?
SNPs
Larger chromosomal rearrangements
Insertions
Deletions
Etc.
How can structural variants be detected from genome sequencing data?
Changes in coverage depth, e.g. increased depth caused by a duplication
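A rough sketch of detecting copy-number changes from coverage depth: each window's depth is compared with the genome-wide median, which is assumed to represent the normal diploid copy number. The window depths are hypothetical, and real callers also correct for biases such as GC content.

```python
from statistics import median

PLOIDY = 2  # assumption: diploid genome

def estimate_copy_numbers(depths: list[float], ploidy: int = PLOIDY) -> list[int]:
    """Scale each window's depth by the median depth to estimate copies."""
    typical = median(depths)
    return [round(ploidy * d / typical) for d in depths]

depths = [30, 31, 29, 45, 46, 30, 15, 0, 30]  # hypothetical mean depth per window
copies = estimate_copy_numbers(depths)
for window, cn in enumerate(copies):
    if cn > PLOIDY:
        print(window, "possible duplication:", cn, "copies")
    elif cn < PLOIDY:
        print(window, "possible deletion:", cn, "copies")
```

Windows 3-4 come out at 3 copies (a duplication on one chromosome), window 6 at 1 copy (a heterozygous deletion) and window 7 at 0 copies (a homozygous deletion).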
What can studying transcriptomics tell us?
Which genes are expressed
When they are expressed
How much they are expressed
After de novo assembly of a genome, how are consensus sequences made?
Using de Bruijn graphs to find overlapping sequence shared between reads
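How can the coverage depth of variant regions of a genome change?
Duplications increase coverage depth and deletions decrease it; the change is larger when the variant is on more than one copy of the chromosome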
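Why might read pairs map further apart than expected?
Deletions: the sample lacks a region that is present in the reference, so the two reads map further apart than the insert size. These discordant read pairs indicate structural variation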
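Discordant read pairs can be flagged by comparing each pair's mapped span with the expected insert size. This sketch assumes a roughly normal insert-size distribution and a cutoff of the mean plus three standard deviations; the read names, coordinates and cutoff are hypothetical.

```python
def discordant_pairs(pairs: list[tuple[str, int, int]], mean_insert: float,
                     sd_insert: float, n_sd: float = 3.0) -> list[str]:
    """Names of pairs whose mapped span (left read start to right read end)
    is larger than mean_insert + n_sd * sd_insert."""
    limit = mean_insert + n_sd * sd_insert
    return [name for name, start, end in pairs if end - start > limit]

pairs = [("r1", 1000, 1500), ("r2", 2000, 2480), ("r3", 5000, 7600)]
print(discordant_pairs(pairs, mean_insert=500, sd_insert=50))  # ['r3']
```

Here `r3` spans 2,600 bp against an expected 500 bp, so roughly 2.1 kb of reference sequence may be missing from the sample.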
How can RNA transcripts be identified?
Northern blotting
How can transcript expression be quantified?
Reverse transcription quantitative PCR (RT-qPCR)
How are probes on microarrays generated?
Short oligonucleotide probes synthesised on the array by photolithography, designed to detect specific genes
How is expression on microarrays shown on graphs?
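The log of the expression ratio (e.g. log2) is taken so that increases and decreases are symmetrical around zero: a 2-fold increase and a 2-fold decrease become +1 and -1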
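Why the log helps: raw ratios are asymmetric (a 2-fold increase is 2, a 2-fold decrease is 0.5), while log2 ratios are +1 and -1. A minimal sketch with hypothetical probe intensities:

```python
import math

def log2_ratios(sample: list[float], control: list[float]) -> list[float]:
    """log2(sample / control) per probe: +1 = 2-fold up, -1 = 2-fold down."""
    return [math.log2(s / c) for s, c in zip(sample, control)]

print(log2_ratios([200, 50, 100], [100, 100, 100]))  # [1.0, -1.0, 0.0]
```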
What are some limitations of microarrays?
Low resolution
Microarrays are predesigned, so transcripts without a matching probe are missed
A signal may come from a similar DNA fragment in the sample rather than the exact target sequence
Not direct sequencing
Why might RNA sequencing be used?
Direct sequencing method
Larger dynamic range than microarrays
Allows differences from the reference genome to be identified
List some examples of gene silencing
X inactivation
germline genes
repeat regions
How do organisms use DNA methylation other than to regulate gene expression?
Distinguishing paternal and maternal alleles (imprinting)
Bacteria distinguishing their own DNA from phage DNA
How can methylation context determine where methylation occurs?
In mammals, 60-80% of CpGs (cytosine-phosphate-guanine dinucleotides) are methylated, so methylation can be mapped at CpG sites with bisulfite sequencing
How does bisulfite conversion work?
The genome is treated with sodium bisulfite, which converts unmethylated cytosines to uracil, while methylated cytosines stay as C. Uracil is read as T when sequenced, so comparing the reads with the original sequence shows which cytosines were methylated.
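A minimal simulation of bisulfite conversion and methylation calling, assuming perfect conversion, a read already aligned to the reference from the same strand, and no sequencing errors or real SNPs. The sequence and methylated positions are hypothetical.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Unmethylated C reads as T (via uracil); methylated C stays C.
    `methylated` holds 0-based positions of methylated cytosines."""
    return "".join("T" if base == "C" and i not in methylated else base
                   for i, base in enumerate(seq))

def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """At every reference C: C in the read = methylated, T = unmethylated."""
    return {i: read[i] == "C" for i, base in enumerate(reference) if base == "C"}

reference = "ACGTTCGACCA"
read = bisulfite_convert(reference, methylated={1, 5})  # both CpG cytosines methylated
print(read)                               # ACGTTCGATTA
print(call_methylation(reference, read))  # {1: True, 5: True, 8: False, 9: False}
```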
How does reduced representation bisulfite sequencing work?
DNA is digested with MspI, which recognises CCGG. Fragment ends are repaired and adaptors are ligated, then fragments are selected by size. Bisulfite conversion and PCR amplification are then done
Unmethylated cytosines come up in the sequence as Ts and can be compared to the original sequence
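The MspI step can be sketched as an in-silico digest: MspI cuts C^CGG, so every internal fragment starts with CGG, meaning the fragments are anchored at CpG sites. Size selection then keeps fragments in a chosen range. The sequence and the tiny size thresholds are hypothetical, picked only so the example stays short.

```python
import re

def mspi_digest(seq: str) -> list[str]:
    """Cut after the first C of every CCGG site (C^CGG) on this strand."""
    cut_sites = [m.start() + 1 for m in re.finditer("(?=CCGG)", seq)]
    bounds = [0] + cut_sites + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

def size_select(fragments: list[str], min_len: int, max_len: int) -> list[str]:
    """Keep fragments whose length lies within [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]

fragments = mspi_digest("AACCGGTTTTCCGGAAAAAAAACCGGT")
print(fragments)                      # ['AAC', 'CGGTTTTC', 'CGGAAAAAAAAC', 'CGGT']
print(size_select(fragments, 5, 15))  # ['CGGTTTTC', 'CGGAAAAAAAAC']
```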
How is direct methylation sequencing done with PacBio?
SMRT DNA sequencing is done and the gaps between fluorescent pulses are analysed for polymerase kinetics: a modified base changes how long the polymerase takes to incorporate the next nucleotide.
How does chromatin immunoprecipitation work?
- Proteins are covalently crosslinked to DNA after treatment with formaldehyde
- Chromatin is sheared by sonication or endonucleases. Exonucleases allow bound DNA to be trimmed to the binding sites.
- Immunoprecipitation with antibodies and purification of the bound DNA
How do genes interact with enhancers in 3C?
One to one. PCR is done with primers specific to both interaction regions.
How do genes interact in 4C, 5C and Hi-C?
4C- one to all
5C- many to many
Hi-C- all to all
What does the ENCODE project do?
Identifies all the functional regions of the human genome
Uses 5C and ChIA-PET to look at regulatory interactions and protein-mediated chromatin interactions
What did ENCODE show about the genome?
80% functional
20% regulatory and 2% protein-coding
What does TraDIS (transposon-directed insertion site sequencing) use to find essential genes?
Transposons carrying a selective antibiotic-resistance marker are inserted across the genome. Genes disrupted by a transposon are inactivated, so a mutant with an insertion in an essential gene doesn't survive. Transposon insertions are therefore only recovered in non-essential genes
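A common way to analyse TraDIS data, assumed here, is an insertion index: unique insertion sites in a gene divided by its length, with genes below a cutoff flagged as likely essential. The gene names, counts and cutoff are hypothetical.

```python
def insertion_index(insertions: dict[str, int],
                    gene_lengths: dict[str, int]) -> dict[str, float]:
    """Unique insertion sites per base for each gene, so genes of
    different lengths can be compared."""
    return {g: insertions.get(g, 0) / gene_lengths[g] for g in gene_lengths}

def likely_essential(index: dict[str, float], cutoff: float) -> list[str]:
    """Genes that tolerate (almost) no insertions; the cutoff is an assumption."""
    return sorted(g for g, v in index.items() if v < cutoff)

gene_lengths = {"geneA": 1200, "geneB": 900, "geneC": 1500}  # hypothetical
insertions = {"geneA": 60, "geneB": 1, "geneC": 75}
idx = insertion_index(insertions, gene_lengths)
print(likely_essential(idx, cutoff=0.01))  # ['geneB']
```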
What sort of sequencing did ENCODE use?
Illumina