PRE FI DNA SEQUENCING Flashcards
Refers to the ORDER OF THE NUCLEOTIDES in the DNA molecule.
DNA SEQUENCE
Applications of DNA sequencing in the medical laboratory:
o Detection of mutation
o Typing microorganisms
o Identifying human haplotypes
o Designating polymorphism
o Treatment strategies
SEQUENCING METHODS:
DIRECT DETERMINATION OF THE ORDER, or sequence of nucleotides in a DNA polymer.
Most specific and direct method for identifying genetic lesions (mutations) and polymorphisms.
Types:
1. Manual sequencing: chemical (Maxam-Gilbert) and dideoxy chain termination (Sanger) sequencing
2. Automated fluorescent sequencing (dye primer and dye terminator sequencing)
A. RNA sequencing
B. Next-generation sequencing
C. Direct sequencing: manual and automated
D. Bisulfite DNA sequencing
E. Pyrosequencing
C. Direct sequencing: manual and automated
MANUAL SEQUENCING:
Allan M. Maxam & Walter Gilbert
Requires a ds/ss version of the DNA region to be sequenced with 1 end radioactively labeled (32P)
Sequencing proceeds in 4 SEPARATE REACTIONS
Template: LABELED FRAGMENT
A. Chemical (Maxam-Gilbert) Sequencing
B. Dideoxy Chain Termination (Sanger) Sequencing
Chemical (Maxam-Gilbert) Sequencing
Addition of a _________ =
ssDNA would break at specific nucleotides
strong reducing agent (10% piperidine)
Chemical (Maxam-Gilbert) Sequencing:
o Sequence = read from the bands
o Lane in which the band appears = identity of the nucleotide
o Sequence is read from the ____ to the ______ of the gel
BOTTOM (5’ end) to the TOP (3’ end) of the gel
Chemical (Maxam-Gilbert) Sequencing:
Run times of short fragments (up to 50 bp)?
1-2 hours
Chemical (Maxam-Gilbert) Sequencing:
Run times of long fragments (>150 bp)?
7-8 hours
MANUAL SEQUENCING:
Frederick Sanger
Uses DIDEOXYNUCLEOTIDES (ddNTPs) to determine the order/sequence of nucleotides in a nucleic acid
Requires a PRIMER complementary to the DNA to be sequenced
Product detection of sequencing:
o Primer is labeled at its 5’ end with a 32P- or fluorescent dye-labeled nucleotide
o Alternatively, 32P/35S-labeled dNTPs are incorporated in the sequencing reaction mix (INTERNAL LABELING)
ddNTPs are added, terminating DNA synthesis (chain termination)
o ddNTPs lack the 3’-OH group, so a 5’-3’ phosphodiester bond cannot be formed to incorporate a subsequent nucleotide.
Components: mixed in 4 reaction tubes
1. DNA template (PCR product)
2. Radioactively labeled primer
3. Enzyme (DNA polymerase)
4. dNTPs (all 4)
5. Stop/loading buffer (20 mM EDTA, formamide, gel tracking/loading dyes)
6. A different ddNTP in each of the 4 tubes
A. Chemical (Maxam-Gilbert) Sequencing
B. Dideoxy Chain Termination (Sanger) Sequencing
Dideoxy Chain Termination (Sanger) Sequencing
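The four-tube logic can be sketched as a toy Python model. It is a simplification under stated assumptions, not a protocol: every possible termination event is assumed to occur, the primer is ignored, and each fragment is represented only by its length. The template and function names are hypothetical.

```python
# Toy model of dideoxy chain termination (Sanger) in 4 tubes.
# Assumptions: every possible termination event occurs, fragment length
# counts only newly synthesized bases, and the primer is ignored.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template_3to5: str) -> dict[str, list[int]]:
    """Map each ddNTP tube (A, C, G, T) to the fragment lengths it produces."""
    tubes = {base: [] for base in "ACGT"}
    for length, template_base in enumerate(template_3to5, start=1):
        new_base = COMPLEMENT[template_base]  # base paired to the template
        # A ddNTP lacks a 3'-OH, so the chain stops right after it is added.
        tubes[new_base].append(length)
    return tubes


def read_gel(tubes: dict[str, list[int]]) -> str:
    """Read bands from shortest (bottom, 5') to longest (top, 3')."""
    bands = sorted((length, base) for base, lengths in tubes.items() for length in lengths)
    return "".join(base for _, base in bands)


template = "TACGGATC"  # hypothetical template strand, written 3'->5'
tubes = terminated_fragments(template)
print(tubes)
print(read_gel(tubes))  # newly synthesized strand, 5'->3'
```

The sequence read from the gel is the complement of the template, which is why the model pairs each template base before assigning it to a tube.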
SEQUENCING REACTION of Dideoxy Chain Termination (Sanger) Sequencing?
Performed in a thermal cycler (cycle sequencing)
Automated reading of the DNA sequencing ladder requires fluorescent dyes (4 distinct colors) to label primers or ddNTPs:
1. Fluorescein
2. Rhodamine
3. BODIPY (4,4-difluoro-4-bora-3a,4a-diaza-s-indacene)
Fluorescent dyes can be distinguished by AUTOMATED SEQUENCERS
Approaches (to label fragments according to their terminal ddNTP): DYE PRIMER and DYE TERMINATOR SEQUENCING
Automated Fluorescent Sequencing
4 different fluorescent dyes are attached to 4 separate aliquots of the sample.
Dye molecules are attached to the 5’ end of the primer = 4 versions of the same primer with different dye labels.
Products are LABELED AT THE 5’ end using the dye color associated with the ddNTP at the end of the fragment.
DYE PRIMER OR DYE TERMINATOR SEQUENCING?
Dye Primer Sequencing
Each of the 4 ddNTPs is attached to 1 of the 4 fluorescent dyes.
All 4 sequencing reactions are performed in the same tube.
Product fragments are LABELED AT THE 3’ end.
DYE PRIMER OR DYE TERMINATOR SEQUENCING?
Dye Terminator Sequencing
4 sets of sequencing products in each reaction are loaded onto a single gel lane/ capillary.
Fluorescent dye colors distinguish which nucleotide is at the end of each fragment.
Fluorescent detection equipment yields results as electropherogram.
Base calling: identification of the bases in a sequence by sequencing software.
Automated Electrophoresis
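Base calling on a four-color trace can be illustrated with a minimal sketch, assuming peaks have already been detected and each peak is recorded as its signal in each dye channel. The dye-to-base mapping is an assumed display convention (it differs between chemistries), and real base callers such as Phred also model peak spacing and shape.

```python
# Assumed channel-to-base mapping (a common display convention; it varies by chemistry).
DYE_TO_BASE = {"green": "A", "blue": "C", "black": "G", "red": "T"}


def call_bases(peaks):
    """peaks: list of (migration_time, {dye_channel: intensity}) tuples."""
    calls = []
    for _, signals in sorted(peaks, key=lambda peak: peak[0]):  # shortest fragments first
        strongest = max(signals, key=signals.get)  # the brightest dye identifies the ddNTP
        calls.append(DYE_TO_BASE[strongest])
    return "".join(calls)


# Hypothetical peaks, listed out of order on purpose.
peaks = [
    (120, {"green": 5, "blue": 80, "black": 3, "red": 2}),
    (100, {"green": 90, "blue": 4, "black": 6, "red": 1}),
    (140, {"green": 2, "blue": 3, "black": 4, "red": 70}),
]
print(call_bases(peaks))
```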
Software Programs Used to Analyze and Apply Sequence Data:
- Compares an input sequence with all sequences in a selected database
A. FASTA/FASTQ
FASTA: FAST-All, derived from the FAST-P (protein) and FAST-N (nucleotide) search algorithms
FASTQ: biological sequence data with quality scores
B. BLAST (Basic Local Alignment Search Tool)
C. Phred
D. GRAIL (Gene Recognition and Assembly Internet Link)
BLAST (Basic Local Alignment Search Tool)
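BLAST (and FASTA) begin by finding short exact "words" shared by the query and a database sequence, then extend those seeds into alignments. The sketch shows only the seeding step on toy sequences, plus FASTA-style grouping of hits by diagonal; the word size k = 3 and the function names are assumptions, and the real tools add scoring matrices, extension, and statistics.

```python
def shared_words(query: str, subject: str, k: int = 3) -> list[tuple[int, int, str]]:
    """Return (query_pos, subject_pos, word) for every exact k-letter word match."""
    index = {}  # word -> list of start positions in the subject
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    hits = []
    for i in range(len(query) - k + 1):
        word = query[i:i + k]
        for j in index.get(word, []):
            hits.append((i, j, word))
    return hits


def diagonal_counts(hits: list[tuple[int, int, str]]) -> dict[int, int]:
    """Count hits per diagonal (subject_pos - query_pos); a dense diagonal suggests an alignment."""
    counts = {}
    for i, j, _ in hits:
        counts[j - i] = counts.get(j - i, 0) + 1
    return counts


hits = shared_words("GATTACA", "CCGATTACAGG")  # hypothetical query and database entry
print(hits)
print(diagonal_counts(hits))
```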
Software Programs Used to Analyze and Apply Sequence Data:
- Finds gene-coding regions in DNA sequences
A. FASTA/FASTQ
FASTA: FAST-All, derived from the FAST-P (protein) and FAST-N (nucleotide) search algorithms
FASTQ: biological sequence data with quality scores
B. BLAST (Basic Local Alignment Search Tool)
C. Phred
D. GRAIL (Gene Recognition and Assembly Internet Link)
GRAIL (Gene Recognition and Assembly Internet Link)
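GRAIL itself relied on trained pattern-recognition models, which are not reproduced here. As a much simpler stand-in that shows what "finding coding regions" means computationally, the sketch below scans the forward strand for open reading frames (ATG to a stop codon); the minimum length is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return 0-based, end-exclusive (start, end) spans of ATG...stop ORFs on the forward strand."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # check all three reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # include the stop codon
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))  # hypothetical sequence
```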
Software Programs Used to Analyze and Apply Sequence Data:
- Rapidly aligns pairs of sequences by sequence patterns rather than individual nucleotides
A. FASTA/FASTQ
FASTA: FAST-All, derived from the FAST-P (protein) and FAST-N (nucleotide) search algorithms
FASTQ: biological sequence data with quality scores
B. BLAST (Basic Local Alignment Search Tool)
C. Phred
D. GRAIL (Gene Recognition and Assembly Internet Link)
FASTA
FAST-All, derived from the FAST-P (protein) and FAST-N (nucleotide) search algorithms
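A FASTQ record stores four lines: an @ header, the sequence, a + separator, and one quality character per base. A minimal parser, assuming the common Phred+33 encoding (quality = ASCII code minus 33), could look like this; the function name and read are hypothetical. A FASTA record, by contrast, has only a > header line followed by sequence lines.

```python
def parse_fastq_record(lines: list[str]) -> tuple[str, str, list[int]]:
    """Parse one 4-line FASTQ record, assuming Phred+33 quality encoding."""
    header, sequence, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings must be the same length")
    scores = [ord(char) - 33 for char in quality]  # ASCII code minus 33 = Phred score
    return header[1:], sequence, scores


read_id, seq, quals = parse_fastq_record(["@read_1", "GATC", "+", "II5+"])  # hypothetical read
print(read_id, seq, quals)
```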
Software Programs Used to Analyze and Apply Sequence Data:
- Reads bases from original trace data and recalls the bases, assigning quality values to each base
A. FASTA/FASTQ
FASTA: FAST-All, derived from the FAST-P (protein) and FAST-N (nucleotide) search algorithms
FASTQ: biological sequence data with quality scores
B. BLAST (Basic Local Alignment Search Tool)
C. Phred
D. GRAIL (Gene Recognition and Assembly Internet Link)
Phred
Software Programs Used to Analyze and Apply Sequence Data:
- Identifies single- nucleotide polymorphisms (SNPs) among the traces and assigns a rank indicating how well the trace at a site matches the expected pattern for an SNP
A. PolyPhred
B. TIGR Assembler (The Institute for Genomic Research)
C. Phrap (Phragment Assembly Program)
D. Factura
A. PolyPhred
Software Programs Used to Analyze and Apply Sequence Data:
- Uses USER-SUPPLIED and internally computed data quality information to improve accuracy of assembly in the presence of repeats
A. PolyPhred
B. TIGR Assembler (The Institute for Genomic Research)
C. Phrap (Phragment Assembly Program)
D. Factura
Phrap (Phragment Assembly Program)
Software Programs Used to Analyze and Apply Sequence Data:
- Developed by TIGR as an assembly tool to BUILD A CONSENSUS SEQUENCE from smaller sequence fragments
A. PolyPhred
B. TIGR Assembler (The Institute for Genomic Research)
C. Phrap (Phragment Assembly Program)
D. Factura
TIGR Assembler (The Institute for Genomic Research)
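TIGR Assembler and Phrap use far more elaborate, quality-aware algorithms. Purely to illustrate what building a consensus from overlapping fragments means, here is a greedy overlap merge, assuming error-free reads and no repeats; the reads and function names are hypothetical.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right` (0 if < min_len)."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(fragments)
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap(left, right, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no overlaps left: keep separate contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATGGCGT", "GCGTACG", "TACGTTA"]  # hypothetical overlapping reads
print(greedy_assemble(reads))
```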
Software Programs Used to Analyze and Apply Sequence Data:
- Identifies sequence features such as flanking vector sequences, restriction sites, and ambiguities
A. PolyPhred
B. TIGR Assembler (The Institute for Genomic Research)
C. Phrap (Phragment Assembly Program)
D. Factura
Factura
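The card does not describe how Factura works internally. The sketch below only illustrates the kind of feature scan it performs, locating a restriction site (EcoRI, GAATTC, chosen as an example) and ambiguous base calls (any IUPAC code other than A, C, G, or T) in a hypothetical read.

```python
import re


def find_sites(seq: str, site: str = "GAATTC") -> list[int]:
    """0-based start positions of a restriction site (default: EcoRI, GAATTC)."""
    pattern = f"(?={re.escape(site.upper())})"  # lookahead also finds overlapping sites
    return [match.start() for match in re.finditer(pattern, seq.upper())]


def find_ambiguities(seq: str) -> list[tuple[int, str]]:
    """0-based positions of base calls other than A, C, G, T (e.g., N, R, Y)."""
    return [(i, base) for i, base in enumerate(seq.upper()) if base not in "ACGT"]


read = "TTGAATTCNAGRGAATTCA"  # hypothetical read
print(find_sites(read))        # EcoRI sites
print(find_ambiguities(read))  # ambiguous calls
```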
Software Programs Used to Analyze and Apply Sequence Data:
- Provides MUTATION and SNP DETECTION and analysis, pathogen subtyping, allele identification, and sequence confirmation
A. Matchmaker
B. SeqScape
C. Assign
SeqScape
Software Programs Used to Analyze and Apply Sequence Data:
- Identifies alleles for haplotyping
A. Matchmaker
B. SeqScape
C. Assign
Matchmaker & Assign
SEQUENCING METHODS:
Determines a DNA sequence WITHOUT HAVING TO MAKE A SEQUENCING LADDER
Relies on the generation of light (luminescence) when nucleotides are added to a growing DNA strand.
No gels, fluorescent dyes, or ddNTPs are required.
Reaction mix components:
1. ssDNA template
2. Sequencing primer
3. ATP sulfurylase
4. Luciferase
5. Substrates: adenosine-5’-phosphosulfate (APS) and luciferin
6. The 4 dNTPs, added one at a time in a predetermined order
A. RNA sequencing
B. Next-generation sequencing
C. Direct sequencing: manual and automated
D. Bisulfite DNA sequencing
E. Pyrosequencing
Pyrosequencing
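The pyrosequencing readout can be modeled with a short simulation. Light is produced only when the dispensed dNTP is incorporated (the released PPi is converted to ATP by sulfurylase, and luciferase uses that ATP to oxidize luciferin), and peak height is proportional to the number of identical bases incorporated in a row. The strand, dispensation order, and function names are assumptions, and real pyrograms include noise and incomplete extension.

```python
def pyrogram(expected_strand: str, dispensation_order: str) -> list[tuple[str, int]]:
    """Simulate one peak per dispensed dNTP.

    expected_strand: the strand being synthesized (5'->3'), i.e. the
    complement of the template. Peak height = number of identical bases
    incorporated in a row (0 means no light for that dispensation).
    """
    position = 0
    peaks = []
    for dntp in dispensation_order:
        count = 0
        while position < len(expected_strand) and expected_strand[position] == dntp:
            count += 1  # homopolymers give proportionally taller peaks
            position += 1
        peaks.append((dntp, count))
    return peaks


def read_pyrogram(peaks: list[tuple[str, int]]) -> str:
    """Rebuild the sequence from peak heights."""
    return "".join(dntp * height for dntp, height in peaks)


peaks = pyrogram("GAATC", "ACGTACGTACGT")  # hypothetical strand and dispensation order
print(peaks)
print(read_pyrogram(peaks))
```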
SEQUENCING METHODS:
AKA METHYLATION-SPECIFIC SEQUENCING
Chain termination sequencing designed to DETECT METHYLATED CYTOSINE NUCLEOTIDES
2-4 µg of genomic DNA is cut with restriction enzymes to facilitate denaturation.
DNA is denatured (97°C for 5 min) and exposed to bisulfite solution (sodium bisulfite, NaOH, hydroquinone) for 16-20 hours.
o Unmethylated cytosines are deaminated to uracil
o 5-methylcytosines are unchanged
o The resulting differences can be detected by Sanger sequencing or pyrosequencing
A. RNA sequencing
B. Next-generation sequencing
C. Direct sequencing: manual and automated
D. Bisulfite DNA sequencing
E. Pyrosequencing
Bisulfite DNA sequencing
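The chemistry can be summarized with a toy model of one bisulfite-treated strand: unmethylated C becomes U (read as T after PCR), while 5-methylcytosine still reads as C, so comparing the treated read with the reference reveals methylation. The sequence, positions, and function names are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Model bisulfite treatment of one strand (0-based positions of 5-mC given)."""
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")  # C -> U by deamination, read as T after PCR
        else:
            converted.append(base)  # 5-methylcytosine and other bases unchanged
    return "".join(converted)


def methylation_calls(reference: str, converted: str) -> dict[int, bool]:
    """For each C in the reference, True if it still reads as C (methylated)."""
    return {i: converted[i] == "C" for i, base in enumerate(reference.upper()) if base == "C"}


reference = "ACGTCCGA"  # hypothetical sequence with CpG sites at positions 1 and 5
treated = bisulfite_convert(reference, {1, 5})
print(treated)
print(methylation_calls(reference, treated))
```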
SEQUENCING METHODS:
Early approaches: used RNase to cut end-labeled RNA at specific nucleotides
Other approaches:
o Based on amino acid sequence
o Based on sequencing of its complementary DNA (cDNA)
A. RNA sequencing
B. Next-generation sequencing
C. Direct sequencing: manual and automated
D. Bisulfite DNA sequencing
E. Pyrosequencing
A. RNA sequencing
o Based on single-molecule sequencing technology and virtual terminator nucleotides
o mRNA is captured by immobilized poly(dT) oligomers (through their poly(A) tails).
o RNA without poly(A) tails: initial treatment with poly(A) polymerase
o 4 reversible dye-labeled nucleotides are sequentially added.
Direct RNA sequencing
SEQUENCING METHODS:
AKA MASSIVELY PARALLEL SEQUENCING
Designed to sequence LARGE NUMBERS OF TEMPLATES carrying millions of bases.
POWERFUL COMPUTER DATA ASSEMBLY SYSTEMS (bioinformatics, computer software and support) are required.
Requires the preparation of a sequencing library (sets of DNA fragments representing the regions to be sequenced).
A. RNA sequencing
B. Next-generation sequencing
C. Direct sequencing: manual and automated
D. Bisulfite DNA sequencing
E. Pyrosequencing
Next-generation sequencing
Collection of genes that have been grouped for testing, enabling simultaneous sequencing of all genes (2 to >1000 genes).
GENE PANELS
TYPES OF GENE PANELS:
– target regions of SPECIFIC GENES known to affect treatment response, disease state, or clinical condition.
A. Very large panels (≥3000 genes)
B. Targeted panels
C. “Hot-spot” panels
C. “Hot-spot” panels
TYPES OF GENE PANELS:
–critical genes in particular diseases (hematological-cancer specific, solid-tumor specific).
A. Very large panels (≥3000 genes)
B. Targeted panels
C. “Hot-spot” panels
B. Targeted panels
TYPES OF GENE PANELS:
– diagnostic, prognostic, discovery purposes.
A. Very large panels (≥3000 genes)
B. Targeted panels
C. “Hot-spot” panels
A. Very large panels (≥3000 genes)
Collection of DNA library fragments (100-1000 bp) to be sequenced.
Sequencing library
SYNTHETIC SHORT dsDNA carrying sequences complementary to a single primer pair; may also contain short sequences that identify the sample (indexing/barcoding).
Adapters
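Indexing lets several samples share one run; afterwards, reads are sorted back to their samples by index. This sketch assumes, hypothetically, that the index is the first 6 bases of each read and tolerates one mismatch; real instruments usually read the index separately, and their demultiplexing software differs. Sample names and sequences are invented for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_indexes, index_len=6, max_mismatches=1):
    """Sort reads into per-sample bins by the index at the start of each read."""
    bins = {sample: [] for sample in sample_indexes}
    bins["undetermined"] = []
    for read in reads:
        observed = read[:index_len]
        matches = [sample for sample, index in sample_indexes.items()
                   if hamming(observed, index) <= max_mismatches]
        if len(matches) == 1:
            bins[matches[0]].append(read[index_len:])  # trim the index off
        else:
            bins["undetermined"].append(read)  # no match, or ambiguous match
    return bins


indexes = {"patient_A": "ACGTAC", "patient_B": "TTGCAA"}  # hypothetical indexes
reads = ["ACGTACGGATTC", "TTGCTAGCCATA", "GGGGGGATATAT"]
for sample, sample_reads in demultiplex(reads, indexes).items():
    print(sample, sample_reads)
```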
The regions to be sequenced are enriched by:
1. Probe hybridization
o Probes: biotinylated oligonucleotides complementary to specific gene regions.
2. Amplification with region-specific primers (amplicon-based targeted libraries)
o Selected by multiplex PCR with gene-specific primers tailed with binding sites for a secondary primer set.
Targeted Libraries
loss of library fragments from the sequenced regions.
Allele dropout
Sequencing Platforms:
- Indexed libraries (gene panels) are AMPLIFIED USING PRIMERS immobilized on microparticles (BEADS) in a water-in-oil emulsion, using ADAPTERS on the library fragments that are complementary to the immobilized primers.
A. Sequencing by ligation
B. Ion-conductance
C. Nanopore sequencing
D. Reversible dye terminator sequencing
B. Ion-conductance
Sequencing Platforms:
o Captured/amplified fragments are HYBRIDIZED to primers IMMOBILIZED on a SOLID SURFACE (FLOW CELL).
o Labeled nucleotides are applied to the flow cell and incorporated into growing chains by DNA polymerase at each polony location.
A. Sequencing by ligation
B. Ion-conductance
C. Nanopore sequencing
D. Reversible dye terminator sequencing
D. Reversible dye terminator sequencing
Sequencing Platforms:
o Uses a POOL OF LABELED OLIGONUCLEOTIDE probes and DNA LIGASE to identify the template sequence through the known probe sequences.
A. Sequencing by ligation
B. Ion-conductance
C. Nanopore sequencing
D. Reversible dye terminator sequencing
A. Sequencing by ligation
Sequencing Platforms:
o DOES NOT REQUIRE FRAGMENTATION and amplification of the template DNA.
o Each nucleotide can be identified by a disruption in current as it passes through the pore.
o Also USED FOR DIRECT RNA SEQUENCING
A. Sequencing by ligation
B. Ion-conductance
C. Nanopore sequencing
D. Reversible dye terminator sequencing
C. Nanopore sequencing
DATA ANALYSIS:
Optical signals are translated to a nucleotide sequence.
BASE CALLING
Data Analysis:
Optical signals are translated to a nucleotide sequence (BASE CALLING), whose accuracy is measured by the ____; acceptable = 20-30 (1 error in 100 to 1 error in 1,000 calls).
Phred score
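A Phred quality score Q is defined from the probability P that a base call is wrong as Q = -10 log10(P), so Q20 means 1 error in 100 calls (99% accuracy) and Q30 means 1 in 1,000 (99.9%). A minimal conversion sketch (function names are hypothetical):

```python
import math


def phred_to_error_probability(q: float) -> float:
    """P(incorrect call) = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Q = -10 * log10(P(incorrect call))."""
    return -10 * math.log10(p)


for q in (10, 20, 30):
    p = phred_to_error_probability(q)
    print(f"Q{q}: about 1 error in {round(1 / p)} calls, {100 * (1 - p):.1f}% accuracy")
```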
Data Analysis:
Each sequence is compared to a REFERENCE SEQUENCE
(“normal”) through ___________
read alignment
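Read alignment can be illustrated with a brute-force sketch that slides each read along the reference and keeps the placement with the fewest mismatches. It assumes ungapped alignment and a tiny hypothetical reference; production aligners (e.g., BWA or Bowtie) use genome indexes and also handle gaps.

```python
def align_read(read: str, reference: str, max_mismatches: int = 2):
    """Return (0-based position, mismatches) of the best ungapped placement, or None."""
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(r != w for r, w in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


reference = "TTAGGCATCGATCCA"  # hypothetical reference
print(align_read("GCATGGA", reference))
```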
Based on comparison with the reference sequence (SNVs, indels, rearrangements, CNVs).
VARIANT ID
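Once reads are aligned, SNVs can be called by stacking the bases observed at each reference position (a pileup) and flagging non-reference bases seen often enough. This sketch handles only SNVs from ungapped alignments; the depth and allele-fraction thresholds and the reads are illustrative assumptions, not clinical cutoffs.

```python
from collections import Counter


def call_snvs(reference, aligned_reads, min_depth=3, min_fraction=0.2):
    """Call SNVs from (start_position, read_sequence) pairs of ungapped alignments.

    Returns (position, ref_base, alt_base, depth, allele_fraction) tuples,
    with 0-based positions.
    """
    pileup = [Counter() for _ in reference]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            if start + offset < len(reference):
                pileup[start + offset][base] += 1
    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to trust a call here
        for base, count in counts.items():
            if base != reference[pos] and count / depth >= min_fraction:
                variants.append((pos, reference[pos], base, depth, count / depth))
    return variants


reference = "ACGTACGT"
aligned = [(0, "ACGTA"), (0, "ACCTA"), (1, "CCTAC"), (2, "GTACG")]  # hypothetical reads
print(call_snvs(reference, aligned))
```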
Sequence variations from the reference are arranged in a ______
variant call file (VCF)
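Each VCF data line has eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), with INFO holding semicolon-separated key=value pairs or flags. A minimal parser for one line, using a hypothetical record (the function name is also hypothetical):

```python
def parse_vcf_line(line: str) -> dict:
    """Split one VCF data line into its 8 fixed columns; INFO becomes a dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]
    info_dict = {}
    if info != ".":  # "." means the INFO column is empty
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            info_dict[key] = value if value else True  # flags have no "=value"
    return {
        "CHROM": chrom,
        "POS": int(pos),  # 1-based position on the reference
        "ID": var_id,
        "REF": ref,
        "ALT": alt.split(","),  # several alternate alleles are comma-separated
        "QUAL": None if qual == "." else float(qual),
        "FILTER": filt,
        "INFO": info_dict,
    }


# Hypothetical record, not a real patient variant.
record = parse_vcf_line("chr1\t1000\t.\tG\tA\t50\tPASS\tDP=812;AF=0.12;SOMATIC")
print(record["POS"], record["REF"], record["ALT"], record["INFO"])
```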
Performed to identify critical variants
ANNOTATIONS
ANNOTATIONS:
Confidence in the variant call is determined by _______ and _________
sequence quality and coverage (at least 500x coverage recommended).
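Mean coverage can be estimated as total sequenced bases divided by target size, and calls below the depth or quality threshold can be filtered out. In this sketch the 500x default follows the card, while the quality cutoff, the hypothetical calls, and the assumption that every read falls on target are illustrative only.

```python
def mean_coverage(num_reads: int, read_length: int, target_size: int) -> float:
    """Average depth = total sequenced bases / size of the targeted region (bp)."""
    return num_reads * read_length / target_size


def passes_filters(call: dict, min_depth: int = 500, min_qual: float = 30.0) -> bool:
    """Keep a call only if both its depth and its quality meet the thresholds."""
    return call["depth"] >= min_depth and call["qual"] >= min_qual


# Hypothetical calls; min_qual is an assumed cutoff.
calls = [
    {"id": "var1", "depth": 812, "qual": 45.0},
    {"id": "var2", "depth": 120, "qual": 60.0},  # coverage too low
    {"id": "var3", "depth": 950, "qual": 12.0},  # quality too low
]
print(mean_coverage(2_000_000, 150, 500_000))
print([call["id"] for call in calls if passes_filters(call)])
```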
Variants that remain after filtering may be annotated by searching disease-specific databases:
- The Cancer Genome Atlas (TCGA)
- Catalogue of Somatic Mutations in Cancer (COSMIC)
- My Cancer Genome
- Leiden Open (source) Variation Database (LOVD)
- Human Gene Mutation Database (HGMD)
Involves using computer technology (in silico) to collect, store, analyze, and disseminate biological data and information (computational biology).
BIOINFORMATICS
BIOINFORMATICS TERMINOLOGY:
The extent to which two sequences are the same.
Identity
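Identity is often reported as a percentage of an alignment. Conventions differ on the denominator; this sketch assumes identical columns divided by total alignment length, gaps included, and the sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with the same residue ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100 * identical / len(aligned_a)


print(percent_identity("GATT-ACA", "GATTTACG"))  # 6 identical columns out of 8
```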
BIOINFORMATICS TERMINOLOGY:
- Lining up two or more sequences to search for the maximal regions of identity in order to assess the extent of biological relatedness or homology.
Alignment
BIOINFORMATICS TERMINOLOGY:
- Alignment of some portion of two sequences.
Local alignment
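Local alignment is classically computed with the Smith-Waterman dynamic-programming algorithm, in which cell scores are floored at zero so an alignment can start and end anywhere. A compact sketch follows; the scoring values (match +2, mismatch -1, linear gap -2) and the sequences are arbitrary assumptions.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 is what makes the alignment local.
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        current = score[i][j]
        if current == score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif current == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTGACGTT", "CCGACGCC"))  # hypothetical sequences
```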
BIOINFORMATICS TERMINOLOGY:
- Alignment of THREE or MORE sequences
arranged with gaps so that common residues are aligned together.
MULTIPLE SEQUENCE ALIGNMENT
BIOINFORMATICS TERMINOLOGY:
- The alignment of two sequences with the BEST DEGREE OF IDENTITY
OPTIMAL ALIGNMENT
BIOINFORMATICS TERMINOLOGY:
- Specific sequence changes (usually protein sequence) that maintain the properties of the original sequence.
CONSERVATION
- Established by the National Institutes of Health (NIH), headed by JAMES WATSON
Primary mission (2.9 million): to decipher the sequence of the complete human genetic material (entire genome)
HUMAN GENOME PROJECT (HGP)
1st complete genome sequence (1984)
Epstein-Barr virus
Who completed the:
o 1st sequence of a free-living organism (Haemophilus influenzae)
o Sequence of the smallest free-living organism (Mycoplasma genitalium)
Craig Venter and colleagues (The Institute for Genomic Research)
SEQUENCING APPROACH OF THE 2 PROJECTS:
- hierarchical shotgun approach
– to sequence from KNOWN REGIONS
NIH METHOD
SEQUENCING APPROACH OF THE 2 PROJECTS:
- whole-genome shotgun sequencing
– to sequence RANDOM FRAGMENTS
Celera (established by Venter)
1st chromosome to be sequenced completely.
Chromosome 22
most GC-rich (66%)
Chromosome 2
fewest GC bp (25%)
Chromosome X
most gene-rich per unit length (23 genes/ Mbp)
Chromosome 19
OTHER GENOME PROJECTS:
Goal: to find BLOCKS of sequences that are
inherited together.
Revealed >1,000 disease-associated regions of the genome (coronary artery disease and diabetes).
Human Haplotype Mapping (HapMap) Project
OTHER GENOME PROJECTS:
Provides a RESOURCE of STRUCTURAL VARIANTS in different populations.
Over 88 million variants were verified: 84.7 million SNPs, 3.6 million short insertions/deletions, and 60,000 structural variants.
1000 GENOMES PROJECT