Week 5 Flashcards
Alzheimer's
runs in families but is not a strictly mendelian disease
amyloid plaques and tangles upon autopsy are diagnostic
APP gene—chromosome 21
but then chromosome 14 as well
- chromosomes 14 and 21: early onset, extremely rare
evidence for linkage with chromosome 19
multipoint linkage
used in Alzheimer's paper
generates multipoint lod score
tells us locus is on chromosome 19
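A minimal sketch of how a lod score is computed, to make the idea concrete. This is the simpler two-point, phase-known case, not the multipoint calculation used in the paper; the function name and the example counts are made up for illustration. By convention a lod score of 3 or more is taken as evidence for linkage.

```python
import math

def lod_score(recombinants, non_recombinants, theta):
    """Two-point lod score: log10 of L(theta) / L(theta = 0.5).

    theta is the assumed recombination fraction (0 to 0.5).
    Counts are phase-known meioses (hypothetical inputs).
    """
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    n = recombinants + non_recombinants
    log_l = 0.0
    if recombinants:
        if theta == 0:
            # any recombinant is impossible under complete linkage
            return float("-inf")
        log_l += recombinants * math.log10(theta)
    if non_recombinants:
        log_l += non_recombinants * math.log10(1 - theta)
    # subtract the log-likelihood under no linkage (theta = 0.5)
    return log_l - n * math.log10(0.5)

# 10 meioses with no recombinants, tested at theta = 0
print(round(lod_score(0, 10, 0.0), 2))
```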
APOE gene
resides on chromosome 19
binds amyloid beta (the βA4 peptide)
APOE alleles
- two amino acid residues vary, at positions 112 and 158
- 2 different SNPs: 4 alleles are possible, but only 3 are commonly observed
linkage disequilibrium
present when the frequencies of the combinations of alleles are not what one would expect by chance
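A small sketch of how "not as expected by chance" can be put into numbers. It uses the standard D, D' and r² measures for two biallelic loci; these measures and the function name are not from the notes, and the frequencies in the example are invented.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D_prime, r2) for two biallelic loci.

    p_ab: observed frequency of the haplotype carrying alleles A and B
    p_a, p_b: frequencies of allele A and allele B
    """
    # By chance alone the haplotype frequency would be p_a * p_b
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Alleles always travel together: complete LD
print(ld_stats(0.5, 0.5, 0.5))
# Alleles combine at the chance expectation: no LD
print(ld_stats(0.25, 0.5, 0.5))
```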
mutation history of APOE
cause of LD is history of the mutations
APOE4 is ancestral, then mutation to APOE3, then mutation to APOE2
4 documented humans with APOE1 (arg 112, cys 158)
choices of cases and controls (Alzheimer's study)
early and late onset families
familial and sporadic Alzheimer's, as well as autopsy cases
ethnic stratification, black and native americans excluded
dna sequencing history
how to determine order of bases?
Maxam and Gilbert first, then Sanger
- Sanger invented the dideoxy or chain termination method of DNA sequencing
- led to the human genome and many other genomes being sequenced
ddNTPs
will be incorporated by DNA polymerase, but cannot be extended because they lack the 3’ OH group
hence, chain termination
deoxyribonucleoside triphosphate
allows chain extension at 3’ end (OH there)
then add small amount of dideoxyribonucleoside triphosphate
leads to rare incorporation, etc.
Sanger sequencing
start with single stranded purified dna of interest
one primer to the dna of interest
mix polymerase and dNTPs, add ddGTP in one tube, ddATP in 2nd tube (and ddCTP and ddTTP in the 3rd and 4th)
if ddATP is incorporated, no further polymerization can happen
sequence is read by separating…
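A toy sketch of the chain termination logic, assuming an idealized reaction in which every possible terminated fragment is present in each tube. The function names and the example sequence are made up; the input is the newly synthesized strand, read outward from the primer.

```python
def terminated_lengths(synth, dd_base):
    """Lengths of fragments that end where dd_base was incorporated."""
    return [i + 1 for i, base in enumerate(synth) if base == dd_base]

def read_gel(tubes):
    """Read the sequence by ordering all fragments from shortest to longest.

    tubes maps each ddNTP base to the fragment lengths seen in that tube.
    """
    bands = sorted(
        (length, base) for base, lengths in tubes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

synth = "GATTACA"  # hypothetical synthesized strand
tubes = {base: terminated_lengths(synth, base) for base in "ACGT"}
print(tubes["A"])       # fragment lengths in the ddATP tube
print(read_gel(tubes))  # sequence recovered from fragment sizes
```

Each tube only reveals the positions of one base; the full sequence comes from combining the four tubes by fragment size.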
di-deoxy sequencing by fluorescent capillary electrophoresis in numbers
4 colors in parallel —300-900 bases read
results come out with QC measurements and are computer readable
dna has to be clean in hand—pcr amplified or cloned—for sequencing
one primer specific for the dna needed—often same primer as for pcr
still used for low volume, high accuracy work
the human genome project
1990: sequencing all 3x10^9 bases of human dna—compared to landing on the moon
started with sequencing model organisms
existing technologies made more efficient (capillary instead of slab gels, robotics)
20,000 genes
22 autosomes, plus the X and Y chromosomes
genes make up about 2 percent of the genome
average gene size
- about 3,000 bases
- range: 252 bases to 2.4 megabases
human genome project goals
identify all genes in human dna
Determine the sequence of the 3 billion chemical base pairs that make up human dna
store this info in databases, data analysis
transfer related tech to the private sector, and address ethical and other issues
criticism of human genome project
much came from scientists
not hypothesis driven research—trudging through sequences
agreement that genome project not a US monopoly
last generation sequencing
Sanger sequencing
no change in chemistry since then
changes increased throughput by using more robotics, faster machines, etc.
next generation sequencing
solexa sequencing—now illumina
- 454, then Roche, etc.
3rd generation sequencing after this
Illumina genome analysis
- Prepare gDNA library (fragment genomic dna)
- Generate clusters
- Sequence clusters
- Data analysis
higher error rate than for Sanger sequencing
Paired end sequencing
Sequence each cloned fragment from both ends
- doubles run time
- allows matching of two sequences
Allows us to suspect deletion of dna, or insertion—larger rearrangements can be detected
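A rough sketch of why paired ends hint at deletions and insertions: the two reads come from one fragment of roughly known size, so if they map much farther apart (or closer together) on the reference than expected, something is missing (or extra) in the sample. The function name, positions, and tolerance are hypothetical.

```python
def classify_pair(left_pos, right_pos, expected_insert, tolerance):
    """Compare the mapped distance of a read pair with the library insert size.

    left_pos, right_pos: reference coordinates of the fragment's outer ends
    expected_insert: typical fragment length of the library
    tolerance: how much deviation is still treated as normal
    """
    observed = right_pos - left_pos
    if observed > expected_insert + tolerance:
        # Mates look too far apart on the reference:
        # the sample is probably missing reference sequence between them
        return "possible deletion"
    if observed < expected_insert - tolerance:
        # Mates look too close: the sample probably carries extra sequence
        return "possible insertion"
    return "consistent"

print(classify_pair(1000, 1400, 400, 50))
print(classify_pair(1000, 2400, 400, 50))
print(classify_pair(1000, 1100, 400, 50))
```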
Multiplexing
Add barcodes to adapters to uniquely identify each sample
- sequences of 4-6 bases of high quality
- guards against batch effects
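A minimal sketch of sorting pooled reads back into samples by barcode. It assumes the barcode sits at the very start of each read and must match exactly, which is a simplification; the barcodes, sample names, and reads are invented.

```python
def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading barcode.

    reads: list of read sequences
    barcodes: dict mapping barcode sequence -> sample name
    Returns a dict of sample name -> reads with the barcode trimmed off.
    """
    bins = {sample: [] for sample in barcodes.values()}
    bins["undetermined"] = []
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read
            bins["undetermined"].append(read)
    return bins

barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
reads = ["ACGTGGGTTT", "TGCACCCAAA", "NNNNGGGCCC"]
print(demultiplex(reads, barcodes))
```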
Complete genomics/BGI
company in US—they do sequencing and analysis for you
Nanoball rolling circle with imaging of small pcr amplified fragments
Short reads, high fidelity
much mixing between Econ and science—currently beating all others in price/base sequencing
454
Based on the same principle as pyrosequencing by Biotage
synthesis releases pyrophosphate (PPi), which is converted into light that can be measured
Add nucleotides one by one; light appears every time one is incorporated (add A, nothing happens; add G and see light, which means the template base is C, …)
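A toy simulation of the flow logic: nucleotides are offered one at a time, and a signal appears only when the offered base pairs with the next template base. It assumes the signal scales with the number of bases incorporated in a run (homopolymers), which goes beyond the notes; the flow order, function names, and templates are made up.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def pyro_flows(template, flow_order="AGCT", max_flows=100):
    """Return a list of (flowed nucleotide, bases incorporated) per flow.

    template is given in the order the polymerase reads it.
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(template) and flow < max_flows:
        nt = flow_order[flow % len(flow_order)]
        count = 0
        # Incorporate as long as the flowed base pairs with the template
        while pos < len(template) and COMPLEMENT[template[pos]] == nt:
            count += 1
            pos += 1
        signals.append((nt, count))
        flow += 1
    return signals

def synthesized(signals):
    """Rebuild the newly synthesized strand from the flow signals."""
    return "".join(nt * count for nt, count in signals)

flows = pyro_flows("CT")
print(flows)               # A gives no light, G gives light (template C), ...
print(synthesized(flows))  # complement of the template
```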
Ion Torrent (now Thermo Fisher)
reaction in very small chip
measures not light, but pH changes (released H+ ions) that change the electrical signal
Nucleotides flow sequentially over ion semiconductor chip, detects when a match occurs
Scalability, small machines, quick, but not for whole genome analysis typically
Third generation sequencing
PacBio
Oxford nanopore
PacBio
Long fragments sequenced after creating a circle with adapters
Advantages
- quick, phasing of snps, …
- long reads
- Generate amplicon
- Ligate adaptors (dna is now a circle)
- Sequence
- Data analysis
limited by how long fragment you can keep stable
high error rate since no error correction
Oxford nanopore
Like ion torrent, measures electricity
- recognizes the electrical signature of each nucleotide as the strand passes through the pore
Like PacBio, measures single molecule
Sequence capture
Bead capture
- before sequencing, hybridize and elute the dna you want to sequence
Most popular: Exome capture
-captures all known human exons and surrounding dna regions
-custom capture
Applications of capture/sequence
Expression
- take all mRNA of a cell type, reverse transcribe, sequence the ends, etc.
ChIP-seq
De novo sequencing
-don’t know what caused disease, sequence all things person ate
Resequencing after capture