Genomics & Genetic Sequencing Flashcards
How frequent are SNPs?
~1 SNP per 1,000 nucleotides, between any two individuals
Any two genomes differ by ~3 million SNPs but are still 99.9% identical
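The two figures on this card follow from each other, as the rough sketch below shows. The genome length of about 3 billion bp is an assumption (it is not stated on the card), and the function names are illustrative only.

```python
# Assumed approximate haploid human genome length; not given on the card.
GENOME_LENGTH_BP = 3_000_000_000
# From the card: ~1 SNP per 1,000 nucleotides between two individuals.
NUCLEOTIDES_PER_SNP = 1_000


def expected_snp_differences(genome_length_bp=GENOME_LENGTH_BP,
                             nucleotides_per_snp=NUCLEOTIDES_PER_SNP):
    """Expected number of SNP differences between two genomes."""
    return genome_length_bp // nucleotides_per_snp


def percent_identity(nucleotides_per_snp=NUCLEOTIDES_PER_SNP):
    """Percent of positions that match, counting only SNP differences."""
    return 100 * (1 - 1 / nucleotides_per_snp)


print(expected_snp_differences())      # 3000000
print(round(percent_identity(), 1))    # 99.9
```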
Genome composition
1.5% protein coding
5% regulatory (gene expression, development)
20-25% genes (including regulatory sequences)
40-50% repetitive DNA (total)
50% unique (single copy) DNA (total)
*Percentages do not add up to 100% because the categories overlap (e.g., protein-coding and regulatory sequences are counted within genes)
Short Interspersed Repetitive Elements (SINEs)
Ex: Alu family - ~300bp related members, 500,000 copies in genome
Long Interspersed Repetitive Elements (LINEs)
Ex: L1 family - ~6kb related members, 100,000 copies in genome
Satellite DNAs
Short repeat sequences organized tandemly (head-to-tail) and in clusters; comprise 10-15% of genome
Families vary with regard to location in genome, total length of tandem array, and length of constituent repeat units that make up the array
Hot spots for human-specific evolutionary changes
Satellite DNAs, including a specific pentanucleotide sequence, found as part of human-specific heterochromatic regions on the long arms of Chr 1, 9, 16, and Y
Alpha Satellite Repeats
171 bp repeat unit found near centromeric region of all human chromosomes; likely important for chromosome segregation in mitosis and meiosis
Alu family
Short interspersed nuclear element (SINE); related repeats, ~300 bp long, dispersed throughout the genome; ~10% of genome
L1 Family
Long Interspersed Nuclear Element (LINE); related repeats, ~ 6kb long, dispersed throughout the genome; ~20% of genome
Pseudogenes
DNA sequences that closely resemble known genes/gene families but are nonfunctional, either as a result of inactivating mutations (nonprocessed) or by retrotransposition (processed)
Retrotransposition
Process involving transcription, followed by reverse transcription and integration of that cDNA back into the genome; formation of non-functional, processed pseudogenes
Single Nucleotide Polymorphisms (SNPs)
A difference in a single DNA nucleotide base, within a particular gene, that gives rise to 2 discrete alleles; allele frequencies differ in different ethnic groups/populations
Simple insertion-deletion polymorphisms (indels)
Variation caused by insertion or deletion of segments between 2 and 100 nucleotides; gives rise to 2 discrete alleles - presence or absence of the inserted or deleted segment
Short Tandem Repeat Polymorphisms (Microsatellites)
Stretches of repetitive DNA consisting of units of 2-4 nucleotides, repeated between 1 and 30 times; different alleles result from differing numbers of repeated nucleotide units; often used as a marker in forensics
Minisatellites
A class of variable number tandem repeats (VNTR); results from insertion of varying numbers of copies of a DNA sequence 10-100bp in length; many alleles, due to variation in the number of copies of the tandem repeat
Variable Number Tandem Repeats (VNTR)
Variation in the number of minisatellites that are repeated in tandem
Copy Number Polymorphisms (CNPs)
Recurring deletions or insertions of larger sections of a chromosome, leading to gaps or duplications; may have two alleles (related to presence or absence of the copy) or multiple alleles (due to number of copies of the segment present)
Short Tandem Repeat Polymorphisms (STRPs)
Different alleles resulting from variation in the number of nucleotide repeat units within a microsatellite region
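As a minimal sketch of how STRP alleles are told apart, the function below counts the longest run of back-to-back copies of a repeat unit. The sequences and the function name are made up for illustration; real genotyping works from fragment sizes or sequencing reads, not literal strings.

```python
import re


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    # Match one or more consecutive copies of the unit.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Hypothetical alleles at one microsatellite locus with a CA repeat unit.
allele_1 = "GGCACACACATT"  # (CA)4
allele_2 = "GGCACATT"      # (CA)2

print(longest_tandem_run(allele_1, "CA"))  # 4
print(longest_tandem_run(allele_2, "CA"))  # 2
```

The two alleles share flanking sequence and differ only in repeat count, which is what makes the locus scoreable as a marker.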
Polymorphic DNA markers
Scoreable differences at known genomic positions; most often microsatellites, SNPs, and CNVs
Haplotype Blocks
10-50 kb chromosomal regions located between recombination hot spots, where recombination is rare; SNPs and marker alleles within blocks tend to be co-inherited and are said to be in linkage disequilibrium (LD)
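LD between two markers can be quantified. The sketch below uses the standard coefficient D = p(AB) - p(A)p(B) and the derived r² measure; neither is named on the card, so treat this as supplementary. The allele labels (A/a at the first locus, B/b at the second) and the sample haplotypes are hypothetical.

```python
def ld_stats(haplotypes):
    """Return (D, r_squared) for a list of two-locus haplotypes.

    Each haplotype is a 2-character string: 'A' or 'a' at locus 1,
    'B' or 'b' at locus 2 (hypothetical labels for illustration).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(h[0] == "A" for h in haplotypes) / n
    p_b = sum(h[1] == "B" for h in haplotypes) / n
    p_ab = sum(h == "AB" for h in haplotypes) / n
    d = p_ab - p_a * p_b  # deviation from independent assortment
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


# Alleles always travel together: complete LD.
in_block = ["AB"] * 5 + ["ab"] * 5
# All four combinations equally common: no LD.
shuffled = ["AB", "Ab", "aB", "ab"]

print(ld_stats(in_block))
print(ld_stats(shuffled))
```

An r² near 1 means one marker predicts the other almost perfectly, which is why a single tag SNP can stand in for the rest of a block.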
Candidate gene association study
Hypothesis-driven approach relying on an a priori biological or positional hypothesis; done by genotyping markers within a candidate gene and then comparing allele frequencies in cases versus controls; association does not prove causation by the associated variant but implies at least LD with a causal mutation
Population stratification
Genetic variation among subgroups within a grossly homogeneous population; if the variation happens to track with disease, it can cause false-positive results in case-control candidate gene studies
Genetic-linkage studies
Used to study genome segments that are disproportionately co-inherited along with disease to determine if the loci are linked; “multiplex” families with multiple cases of a disease are studied to determine the frequency of recombination events between two loci of interest; a LOD score (logarithm of the odds) of +3 or greater is evidence that two loci are linked
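A LOD score compares the likelihood of the observed meioses at a given recombination fraction θ with the likelihood under no linkage (θ = 0.5). The sketch below assumes the simplest case, phase-known and fully informative meioses, which the card does not specify; the function name and counts are illustrative.

```python
import math


def lod_score(n_meioses: int, n_recombinants: int, theta: float) -> float:
    """LOD = log10( L(theta) / L(0.5) ) for phase-known meioses (assumed)."""
    if not 0 <= n_recombinants <= n_meioses:
        raise ValueError("recombinants must be between 0 and n_meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    non_recombinants = n_meioses - n_recombinants
    likelihood_linked = theta ** n_recombinants * (1 - theta) ** non_recombinants
    likelihood_unlinked = 0.5 ** n_meioses
    if likelihood_linked == 0.0:
        # Observed recombinants are impossible at this theta.
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


# Ten informative meioses with no recombinants, evaluated at theta = 0.
print(round(lod_score(10, 0, 0.0), 2))  # 3.01
```

Under these assumptions, ten meioses without a recombinant are just enough to cross the +3 threshold, since 2^10 = 1024 gives odds slightly above 1000:1.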
Genome-wide association studies (GWAS)
Tests many markers (SNPs) across the entire genome, searching for significantly different allele frequencies in cases vs. controls; can identify many genes that contribute to a certain disease
Same as candidate gene case-control association study but tests MANY markers across entire genome; requires larger sample size (>1,000 cases/controls) to correct for multiple testing problem
Can discover new mutations that contribute to disease
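The multiple testing problem can be illustrated with the Bonferroni correction, one common approach that the cards do not name specifically. The marker count of one million and the marker IDs below are assumptions for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the overall error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_markers(p_values: dict, alpha: float = 0.05) -> list:
    """Return the markers whose p-value survives Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [marker for marker, p in p_values.items() if p < threshold]


# With an assumed one million markers, each test must reach roughly 5e-8.
print(bonferroni_threshold(0.05, 1_000_000))

# Hypothetical marker IDs and p-values.
print(significant_markers({"rs_a": 1e-9, "rs_b": 0.03, "rs_c": 0.2}))  # ['rs_a']
```

A threshold this strict is why GWAS need large samples: small allele frequency differences only reach it with many cases and controls.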
Chromosomal Analysis
Detects anomalies in chromosome number as well as large structural rearrangements (deletions, insertions, etc.) ex: Trisomy 21
FISH
Fluorescence In-Situ Hybridization; fluorescently labeled, locus-specific probe (several kb) can identify micro-deletions or multiple copies of loci of interest in patient DNA (isolated in metaphase or interphase)
Detects chromosomal microdeletion/duplication, chromosomal rearrangements (in cancers), and gene copy numbers
Chromosomal microarray analysis (CMA)
Test and reference DNA samples are labeled with different colors, mixed, and hybridized to an array; the array's oligonucleotide probes examine many different loci simultaneously; abnormal ratios of the colors are indicative of deletions or duplications
Cannot detect balanced translocations, inversions, or point mutations
May also detect benign CNVs (vs. test DNA) creating “noise” on the array
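The color ratio at each probe is commonly expressed as a log2 ratio of test to reference signal. The sketch below shows the idea; the cutoff of 0.3 and the function names are hypothetical, and real pipelines normalize the signals and require several adjacent probes before making a call.

```python
import math


def log2_ratio(test_signal: float, reference_signal: float) -> float:
    """log2 of test over reference intensity at one probe."""
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive")
    return math.log2(test_signal / reference_signal)


def call_copy_number(test_signal: float, reference_signal: float,
                     cutoff: float = 0.3) -> str:
    """Classify one probe; `cutoff` is an illustrative value, not a standard."""
    ratio = log2_ratio(test_signal, reference_signal)
    if ratio <= -cutoff:
        return "deletion"
    if ratio >= cutoff:
        return "duplication"
    return "balanced"


# Signals stand in for copy number: 2 copies in the reference.
print(call_copy_number(1, 2))  # deletion
print(call_copy_number(3, 2))  # duplication
print(call_copy_number(2, 2))  # balanced
```

A balanced translocation or inversion leaves the ratio at 0 for every probe, which is why CMA cannot detect them.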
Diagnostic Testing
A positive genetic test result in a patient with signs or symptoms of a genetic disease can confirm an already suspected disease or diagnose the underlying and current disease
Predictive Testing
A positive genetic test result in a patient with NO signs or symptoms of genetic disease provides estimate of future disease risk
Chaperone-Based Therapy
Useful in genetic disorders characterized by a misfolded mutant protein; chaperone can help the mutant protein fold appropriately, restoring protein function
Ex: Small percentage of Fabry disease patients benefit from administration of galactose, a chemical chaperone
Retroviral Approach to Gene Therapy
Advantages: Vector can accommodate up to 8 kb of added DNA and the integrated DNA is stable
Limitations: Target cell must undergo division for integration of recombinant DNA into host genome; limited use in nondividing cells (e.g., neurons)
Adenoviral Approach to Gene Therapy
Advantages: Can accommodate inserts of 30 to 35 kb and infect dividing or nondividing cells
Limitations: One case of strong immune response resulting in death
Non-Viral approach to gene therapy
- Naked DNA (ex: cDNA with regulatory elements in a plasmid)
- DNA packaged in liposomes
- protein-DNA conjugates
- Artificial chromosomes
Positional cloning
Identifies disease genes by first determining the gene’s location through linkage analysis, followed by attempts to identify the gene on the basis of its map position
Candidate gene sequencing
Hypothesis-driven approach depending on an a priori biological or positional hypothesis; gene is identified and sequenced directly
Case-control genetic association test
Compares allele frequencies between cases and controls in an ethnically-matched sample; association does not imply causation by the associated variant but does suggest LD with a causal mutation
Population stratification may lead to false positives
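The allele frequency comparison in a case-control test can be sketched as a chi-square test on a 2x2 table of allele counts. This is one simple way to run the comparison, with 1 degree of freedom and no continuity correction, and it is not stated on the card; the counts are made up.

```python
import math


def allelic_chi_square(case_a: int, case_b: int,
                       control_a: int, control_b: int):
    """Chi-square statistic and p-value (1 df) for a 2x2 allele-count table."""
    n = case_a + case_b + control_a + control_b
    row_cases = case_a + case_b
    row_controls = control_a + control_b
    col_a = case_a + control_a
    col_b = case_b + control_b
    if 0 in (row_cases, row_controls, col_a, col_b):
        raise ValueError("every row and column needs a nonzero total")
    chi2 = (n * (case_a * control_b - case_b * control_a) ** 2
            / (row_cases * row_controls * col_a * col_b))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele A on 60 of 100 case chromosomes
# versus 40 of 100 control chromosomes.
chi2, p = allelic_chi_square(60, 40, 40, 60)
print(chi2, p)
```

A small p-value only shows that allele frequencies differ between the groups; stratification or LD with a nearby causal variant can produce the same signal.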