Genomics & Genetic Sequencing Flashcards
How frequent are SNPs?
~1 SNP per 1,000 nucleotides, between any two individuals
Any two genomes differ by ~3 million SNPs, yet are still 99.9% identical
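The arithmetic behind these two figures can be checked with a short sketch. The genome size of roughly 3 billion base pairs is an assumption added here for illustration; the card itself gives only the SNP rate and the resulting totals.

```python
# Assumption: a haploid human genome of roughly 3 billion base pairs.
GENOME_SIZE_BP = 3_000_000_000
# From the card: about 1 SNP per 1,000 nucleotides between two individuals.
BP_PER_SNP = 1_000

# Expected number of single-nucleotide differences between two genomes.
expected_snps = GENOME_SIZE_BP // BP_PER_SNP
# Share of positions that are the same in both genomes.
percent_identical = 100 * (1 - 1 / BP_PER_SNP)

print(f"Expected SNP differences: {expected_snps:,}")
print(f"Sequence identity: {percent_identical:.1f}%")
```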
Genome composition
1.5% protein coding
5% regulatory (gene expression, development)
20-25% genes (including regulatory sequences)
40-50% repetitive DNA (total)
50% unique (single copy) DNA (total)
*Does not add up to 100% because the categories overlap (e.g., protein-coding and regulatory sequences are counted within genes)
Short Interspersed Repetitive Elements (SINEs)
Ex: Alu family - ~300bp related members, 500,000 copies in genome
Long Interspersed Repetitive Elements (LINEs)
Ex: L1 family - ~6kb related members, 100,000 copies in genome
Satellite DNAs
Short repeat sequences organized tandemly (head-to-tail) and in clusters; comprise 10-15% of genome
Families vary with regard to location in genome, total length of tandem array, and length of constituent repeat units that make up the array
Hot spots for human-specific evolutionary changes
Satellite DNAs, including a specific pentanucleotide sequence, found as part of human-specific heterochromatic regions on the long arms of Chr 1, 9, 16, and Y
Alpha Satellite Repeats
171 bp repeat unit found near centromeric region of all human chromosomes; likely important for chromosome segregation in mitosis and meiosis
Alu family
Short interspersed nuclear element (SINE); related repeats, ~300 bp long, dispersed throughout the genome; ~10% of genome
L1 Family
Long Interspersed Nuclear Element (LINE); related repeats, ~ 6kb long, dispersed throughout the genome; ~20% of genome
Pseudogenes
DNA sequences that closely resemble known genes/gene families but are nonfunctional, either as a result of inactivation mutations (nonprocessed) or by retrotransposition (processed)
Retrotransposition
Process involving transcription, followed by reverse transcription and integration of that cDNA back into the genome; formation of non-functional, processed pseudogenes
Single Nucleotide Polymorphisms (SNPs)
A difference in a single DNA nucleotide base, within a particular gene, that gives rise to 2 discrete alleles; allele frequencies differ in different ethnic groups/populations
Simple insertion-deletion polymorphisms (indels)
Variation caused by insertion or deletion of segments between 2 and 100 nucleotides; gives rise to 2 discrete alleles - presence or absence of the inserted or deleted segment
Short Tandem Repeat Polymorphisms (Microsatellites)
Stretches of repetitive DNA consisting of units of 2-4 nucleotides, repeated between 1 and 30 times; different alleles result from differing numbers of repeated nucleotide units; often used as a marker in forensics
Minisatellites
A class of variable number tandem repeats (VNTR); results from insertion of varying numbers of copies of a DNA sequence 10-100bp in length; many alleles, due to variation in the number of copies of the tandem repeat
Variable Number Tandem Repeats (VNTR)
Variation in the number of minisatellites that are repeated in tandem
Copy Number Polymorphisms (CNPs)
Recurring deletions or insertions of larger sections of a chromosome, leading to gaps or duplications; may have two alleles (related to presence or absence of the copy) or multiple alleles (due to number of copies of the segment present)
Short Tandem Repeat Polymorphisms (STRPs)
Different alleles resulting from variation in the number of nucleotide repeat units within a microsatellite region
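As a minimal sketch of how STRP alleles are told apart, the repeat count at a microsatellite can be read directly from the sequence. The function name `max_tandem_repeats`, the GATA repeat unit, and the two example sequences are all hypothetical; real genotyping works from sized PCR fragments or sequencing reads rather than clean strings.

```python
import re

def max_tandem_repeats(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    # Match one or more consecutive copies of the repeat unit.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)

# Hypothetical alleles: same flanking sequence, different numbers of GATA units.
allele_1 = "TTC" + "GATA" * 7 + "CCG"
allele_2 = "TTC" + "GATA" * 9 + "CCG"

# The genotype at this marker is the pair of repeat counts.
genotype = (max_tandem_repeats(allele_1, "GATA"), max_tandem_repeats(allele_2, "GATA"))
print(genotype)
```

Because many repeat counts are possible at a single locus, a marker like this has many alleles, which is what makes it useful for telling individuals apart.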
Polymorphic DNA markers
Scoreable differences at known genomic positions; most often microsatellites, SNPs, and CNVs
Haplotype Blocks
10-50 kb chromosomal regions located in-between rearrangement hot-spots, where recombination is rare; SNPs and marker alleles within blocks tend to be co-inherited and are said to be in linkage disequilibrium (LD)
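Linkage disequilibrium between two markers is commonly quantified with the coefficient D and the squared correlation r^2. The card names neither statistic, so the sketch below is an illustrative addition: the function name `ld_stats` and the haplotype counts are hypothetical, and both loci are assumed to be biallelic and polymorphic (otherwise the r^2 denominator is zero).

```python
def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> tuple[float, float]:
    """Return (D, r^2) for two biallelic loci from the four haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total    # frequency of allele B at locus 2
    # D: excess of the A-B haplotype over what independence would predict.
    d = p_AB - p_A * p_B
    # r^2: D squared, scaled by the allele frequencies at both loci.
    r2 = d ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, r2

# Hypothetical counts from 100 chromosomes within one haplotype block.
d, r2 = ld_stats(n_AB=40, n_Ab=10, n_aB=10, n_ab=40)
print(f"D = {d:.2f}, r^2 = {r2:.2f}")
```

Values of D and r^2 near zero indicate that the alleles are inherited independently; larger values indicate that they tend to be co-inherited.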
Candidate gene association study
Hypothesis-driven approach relying on an a priori biological or positional hypothesis; done by genotyping markers within a candidate gene and then comparing allele frequencies in cases versus controls; association does not prove causation by the associated variant but implies at least LD with the causal mutation
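The case-versus-control comparison of allele frequencies can be sketched as a Pearson chi-square test on a 2x2 table of allele counts. This is one common choice of test, not one the card specifies; the function name `allele_chi_square` and all counts are hypothetical.

```python
import math

def allele_chi_square(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int) -> tuple[float, float]:
    """Pearson chi-square (1 degree of freedom) on a 2x2 allele-count table."""
    n = case_a + case_b + ctrl_a + ctrl_b
    # Shortcut formula for a 2x2 table: n(ad - bc)^2 / product of the margins.
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = (
        (case_a + case_b) * (ctrl_a + ctrl_b)
        * (case_a + ctrl_a) * (case_b + ctrl_b)
    )
    chi2 = numerator / denominator
    # Upper-tail p-value of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical allele counts (2 alleles per person, 100 cases and 100 controls).
chi2, p_value = allele_chi_square(case_a=120, case_b=80, ctrl_a=90, ctrl_b=110)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

A small p-value indicates only that allele frequencies differ between the groups; as the card notes, the tested marker may simply be in LD with the causal variant.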
Population stratification
Genetic variation within grossly homogeneous populations; variation may be linked to disease and therefore cause false positive results in case-control candidate gene studies
Genetic-linkage studies
Used to study genome segments that are disproportionately co-inherited along with disease to determine if the loci are linked; “multiplex” families with multiple cases of a disease are studied to determine the frequency of recombination events between two loci of interest; a LOD score (logarithm of the odds) of +3 or greater is evidence that two loci are linked
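A LOD score compares the likelihood of the observed recombination data at a proposed recombination fraction with the likelihood under no linkage (recombination fraction 0.5). The sketch below assumes the simplest case, phase-known meioses that can each be scored as recombinant or non-recombinant; the function name `lod_score` and the counts are hypothetical.

```python
import math

def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD = log10( L(theta) / L(0.5) ) for phase-known, fully informative meioses."""
    nonrecombinants = meioses - recombinants
    # Likelihood of the data if the loci recombine with frequency theta.
    likelihood_linked = theta ** recombinants * (1 - theta) ** nonrecombinants
    # Likelihood if the loci are unlinked (theta = 0.5).
    likelihood_unlinked = 0.5 ** meioses
    if likelihood_linked == 0:
        return float("-inf")  # the data are impossible at this theta
    return math.log10(likelihood_linked / likelihood_unlinked)

# Hypothetical family data: 10 informative meioses, no recombinants observed.
lod_tight = lod_score(recombinants=0, meioses=10, theta=0.0)
# Same family size with 1 recombinant, evaluated at theta = 0.1.
lod_loose = lod_score(recombinants=1, meioses=10, theta=0.1)
print(f"LOD (0 of 10 recombinant) = {lod_tight:.2f}")
print(f"LOD (1 of 10 recombinant) = {lod_loose:.2f}")
```

Under these assumptions, ten meioses with no recombinants are just enough to reach the +3 threshold, which corresponds to odds of 1,000 to 1 in favor of linkage.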
Genome-wide association studies (GWAS)
Tests many markers (SNPs) across entire genome searching for significantly different allele frequencies in cases vs. controls; can identify many genes that contribute to a certain disease
Same as candidate gene case-control association study but tests MANY markers across entire genome; requires larger sample size (>1,000 cases/controls) to correct for the multiple testing problem
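The multiple testing problem can be illustrated with a Bonferroni correction, one standard way to adjust the significance threshold. The card does not name a correction method, and the figure of one million tested markers is an assumption chosen for illustration.

```python
import math

# Conventional significance level for a single test.
ALPHA = 0.05
# Assumption: about one million markers tested across the genome.
N_MARKERS = 1_000_000

# With no correction, this many markers would pass by chance alone.
expected_false_positives = ALPHA * N_MARKERS
# Bonferroni: divide the significance level by the number of tests.
per_marker_threshold = ALPHA / N_MARKERS

print(f"Expected chance hits without correction: {expected_false_positives:,.0f}")
print(f"Per-marker p-value threshold: {per_marker_threshold:.0e}")
```

A threshold this strict can only be met by real but modest effects when the sample is large, which is why GWAS need far more cases and controls than a single candidate gene study.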
Can discover new mutations that contribute to disease