Lecture 11: genomics Flashcards
How big is the human genome?
About 3 billion nucleotides (base pairs) in one haploid copy
What is the difference between genetics and genomics?
- Genetics refers to the study of inheritance and the ways that traits or conditions are passed down from one generation to the next
- Genomics describes the study of all a person’s DNA
What are the different types of gene sequencing methods?
1) Sanger sequencing
2) Microarrays
3) Illumina DNA sequencing
4) PacBio Long-read Sequencing
5) Nanopore Long-read sequencing
What is the process of Sanger sequencing?
- Gene of interest is cloned into vectors, and then the double-stranded DNA in the vector is converted to single stranded DNA by denaturation with alkali or boiling.
- Thermal cycle sequencing is carried out by DNA polymerase using one primer: the polymerase synthesises a second strand of DNA complementary to the existing template.
- Fluorescent dideoxynucleotides (ddNTPs) are added at low concentrations and are randomly incorporated into the new strand. They lack the 3' hydroxyl group, so the chain is terminated. This creates products of different lengths, each ending in a fluorescent ddNTP.
- This process is repeated separately with each of the 4 dideoxy bases (4 concurrent strand synthesis reactions). Thus, 4 separate reactions result in 4 families of terminated strands.
- The double-stranded DNA is separated by heating, and the fragments are then separated by size by electrophoresis.
- The fluorescence of each terminal dideoxy base is read by the detector.
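The chain-termination steps above can be sketched as a toy Python model. It is illustrative only: the function name `sanger_read` and the example template are hypothetical, strand orientation is ignored, and it generates every possible terminated fragment instead of modelling random ddNTP incorporation.

```python
# Toy model of Sanger chain termination (illustrative sketch, not a real pipeline).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_read(template):
    """Return the new strand's sequence as read from terminated fragments."""
    # DNA polymerase builds the strand complementary to the template.
    new_strand = "".join(COMPLEMENT[base] for base in template)

    # A ddNTP can end synthesis at any position, so across many molecules
    # there is one fragment of every possible length.
    fragments = [new_strand[:length] for length in range(1, len(new_strand) + 1)]

    # Electrophoresis separates fragments by size, shortest first.
    fragments.sort(key=len)

    # The detector reads the fluorescent terminal base of each fragment in turn.
    return "".join(fragment[-1] for fragment in fragments)

print(sanger_read("TACGGA"))  # prints ATGCCT
```

Reading the last base of each fragment in order of length recovers the sequence one position at a time, which is why size separation is needed before detection.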
What is the process of microarray sequencing?
- DNA is added to a microarray chip that contains probes for hundreds of thousands of sequences
- Each spot has DNA oligonucleotide probes for either a reference or variant sequence
- DNA from an individual will hybridize with a probe if it is complementary to that probe. This produces a fluorescent signal which is read
(this is very outdated now)
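A minimal Python sketch of the probe logic above, assuming that hybridization requires an exact reverse-complement match. The function names and the probe sequences are hypothetical, and real arrays measure signal intensity, not a yes/no result.

```python
# Toy model of one reference/variant probe pair on a microarray.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def hybridizes(sample_fragment, probe):
    # Assumption: a fragment binds only if it is the exact reverse complement.
    return sample_fragment == reverse_complement(probe)

def read_spots(sample_fragments, ref_probe, var_probe):
    """Report which of the two probe spots would give a fluorescent signal."""
    return {
        "reference": any(hybridizes(f, ref_probe) for f in sample_fragments),
        "variant": any(hybridizes(f, var_probe) for f in sample_fragments),
    }

# Hypothetical probes that differ at a single base.
ref_probe, var_probe = "ACGTA", "ACCTA"
print(read_spots(["TACGT"], ref_probe, var_probe))           # reference only
print(read_spots(["TACGT", "TAGGT"], ref_probe, var_probe))  # both spots light up
```

A signal at both spots would suggest the individual carries one copy of each allele.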
What is the process of Illumina sequencing?
- Library prep: The DNA to be sequenced is randomly sheared, and sequence adaptors are ligated onto both ends of a DNA fragment.
- These fragments are added to a flow cell whose surface carries DNA sequences complementary to the adaptors, allowing the DNA fragments to bind to the flow cell surface.
- Bridge amplification is performed to generate multiple copies of the same DNA
- The DNA bends and binds to the sequences on the flow cell, is replicated by DNA polymerase, and the strands then separate to give new DNA strands
- Paired-end sequencing is carried out
- The fluorescence of each incorporated nucleotide is recorded to identify the base
- DNA is read from both the left and right sides (paired-end sequencing)
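Reading a fragment from both sides can be sketched in Python as below. This is a simplified illustration: `paired_end_reads` and the example fragment are hypothetical, and the second read is modelled as the reverse complement of the fragment's right-hand end.

```python
# Toy model of paired-end reads taken from one DNA fragment.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def paired_end_reads(fragment, read_length):
    """Return (read 1, read 2) for a fragment sequenced from both ends."""
    # Read 1 starts at the left end of the fragment.
    read1 = fragment[:read_length]
    # Read 2 starts at the right end and is read off the opposite strand.
    read2 = reverse_complement(fragment[-read_length:])
    return read1, read2

print(paired_end_reads("ATGCCGTAAC", 4))  # prints ('ATGC', 'GTTA')
```

The middle of a long fragment is not read at all, but the two reads are known to come from the same fragment, which helps when placing them on a genome.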
Why is there a need to conduct bridge amplification during Illumina sequencing?
It creates a cluster of identical copies of each fragment, which amplifies the fluorescent signal so that it can be detected during sequencing
What is the process of PacBio SMRT long read sequencing?
- A single DNA molecule is put into each nanowell
- The DNA is replicated using fluorescently labelled nucleotides, and each nucleotide added is visualised in real time
- Colour signal is converted into ATGC base calls
What is the process of nanopore long read sequencing?
- Single stranded DNA is pulled through a nanopore
- Different DNA sequences cause different electrical current profiles (the electrical current through the nanopore is perturbed slightly)
- Measured electrical current signal is converted into ATGC base calls
What is the importance of a reference genome?
A reference genome allows you to readily identify genetic variations by comparing your sequence to it
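As a rough Python illustration of comparing a sequence to a reference, the sketch below lists single-base differences. It assumes the sample is already aligned and the same length as the reference, so it cannot detect indels; `find_snvs` and the sequences are hypothetical.

```python
# Toy comparison of an aligned sample sequence against a reference.
def find_snvs(reference, sample):
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length, aligned sequences")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]

print(find_snvs("ATGCCGTA", "ATGACGTA"))  # prints [(4, 'C', 'A')]
```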
How is a reference genome sequenced?
- Multiple copies of the same genome are shredded into many random pieces and then each piece is sequenced (by Illumina sequencing)
- We then try to piece the genome back together by overlapping the DNA sequences with each other (de novo assembly)
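The overlap idea can be sketched with a greedy merge in Python. This is a simplified stand-in for real assemblers, assuming error-free reads from one strand; the function names, the `min_length` parameter and the reads are hypothetical.

```python
# Toy greedy overlap assembly (illustrative sketch only).
def overlap(left, right, min_length=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap(left, right, min_length)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

print(greedy_assemble(["ATGCCG", "CCGTAA", "TAACGT"]))  # prints ['ATGCCGTAACGT']
```

Reads that share no overlap are returned as separate pieces, which mirrors how gaps arise in an assembly.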
What types of DNA variation are there?
- Single nucleotide variants
- Small indels (insertion or deletion)
- Copy number alterations (changes in the number of copies of a DNA region, from a single gene up to a whole chromosome)
- Structural variations
What is the purpose of whole exome/targeted sequencing?
Protein coding regions represent only 2% of the whole genome, so we can save cost by just sequencing the protein coding regions alone
Whole exome sequencing: exons of all genes captured
Targeted sequencing: exons of a subset of genes captured (fewer probes)
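The saving can be put into numbers with a short Python calculation, using only the figures from these flashcards (3 billion nucleotides, 2% protein coding). The variable names are illustrative, and actual cost depends on more than the number of bases.

```python
# Rough size comparison: whole genome versus protein-coding regions only.
GENOME_SIZE = 3_000_000_000   # nucleotides, figure from the flashcards
CODING_PERCENT = 2            # percent of the genome that is protein coding

exome_size = GENOME_SIZE * CODING_PERCENT // 100
fold_reduction = GENOME_SIZE // exome_size

print(exome_size)      # prints 60000000
print(fold_reduction)  # prints 50
```

So sequencing only the coding regions covers about 60 million nucleotides, roughly 50 times fewer than the whole genome.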
What is the process of whole exome/targeted sequencing?
- Shear the DNA into fragments
- Add oligo probes that bind directly to the DNA of interest (exons)
- Wash off the undesired fragments
- Sequence the remaining exons
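The capture steps above can be sketched in Python as follows. Several simplifications are assumed: shearing uses a fixed fragment length instead of random breaks, and probe binding is modelled by coordinate overlap with known exon intervals, not by sequence hybridization. All names and the example data are hypothetical.

```python
# Toy model of exon capture: shear, keep fragments overlapping exons, discard the rest.
def shear(dna, fragment_length):
    """Cut DNA into consecutive (start position, sequence) fragments."""
    return [
        (start, dna[start:start + fragment_length])
        for start in range(0, len(dna), fragment_length)
    ]

def capture(fragments, exons):
    """Keep fragments overlapping any exon interval (start, end), end exclusive."""
    kept = []
    for start, sequence in fragments:
        end = start + len(sequence)
        if any(start < exon_end and exon_start < end for exon_start, exon_end in exons):
            kept.append((start, sequence))
    return kept  # everything else is "washed off"

fragments = shear("AAAAGGGGCCCCTTTT", 4)
print(capture(fragments, [(4, 8)]))   # prints [(4, 'GGGG')]
print(capture(fragments, [(6, 10)]))  # prints [(4, 'GGGG'), (8, 'CCCC')]
```

Only the kept fragments go on to sequencing, which is where the reduction in sequencing effort comes from.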
What are the sources of DNA variation?
- Germline variation
- Somatic mutation
- De novo mutation