Human Genome Organization Flashcards
size of haploid human genome sequence
3 x 10^9 bp
Human genomic DNA is distributed on __________ chromosomes
46
The chromosomes are found in __________ pairs:
1) 22 __________ 2) 1 __________
23
autosomes (1-22)
pair of sex chromosomes (XX or XY)
Each chromosome is believed to consist of a __________
single, continuous DNA double helix
Human Genome
- is a record of human evolutionary history
- Reflects results of different selection pressures that have occurred over evolutionary time and shaped our genome (and shaped us)
- Genes and genomic features that have been adaptive have been retained
Genotype + environment = __________
phenotype
Random genomic variation
- is the fuel of evolution
- Random variation in a highly ordered structure = almost always deleterious consequences
- Genetic disease is the price we pay as a species to continue to have a genome that can evolve, i.e., that can adapt to new and changing environments
The human genome is __________ and continues to evolve
- ~30 new mutations occur in each individual
- Shuffling of regions at each meiosis due to __________
- Can produce __________ DNA changes as well as __________ DNA changes
dynamic
recombination
somatic
germ-line
Organization of the genome
- Gene-rich regions/chromosomes
- Gene-poor regions/chromosomes
- Stable regions: majority of genome
- Unstable, dynamic regions; many are disease-associated
- GC-rich regions (38% of genome), AT-rich regions (54% of genome)
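A minimal sketch of how GC-rich and AT-rich stretches can be told apart: compute the GC fraction in fixed-size windows along a sequence. The function names, the window size and the toy sequence are illustrative assumptions, not part of the cards.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


def windowed_gc(seq, window):
    """GC fraction of consecutive, non-overlapping windows of length `window`."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


# Toy sequence (made up): a GC-rich stretch followed by an AT-rich stretch.
toy = "GCGCGGCCGC" + "ATATTAAATA"
print(windowed_gc(toy, 10))
```

Real analyses use much larger windows. The 10 bp window here only keeps the toy example readable.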
__________ (i.e. non-random distribution) of GC-rich and AT-rich regions is basis for chromosomal banding patterns (cytogenetics, karyotype analysis)
Clustering
Clustering (i.e. non-random distribution) of GC-rich and AT-rich regions is basis for __________ (cytogenetics, karyotype analysis)
chromosomal banding patterns
Euchromatic regions
more relaxed
heterochromatic regions
more condensed; more repeat-rich
Genome sequencing effort focused on __________ region
euchromatic
__________ regions essentially unsequenced
Heterochromatic
Genome composition
1) 1.5% is translated (protein coding)
2) 20-25% is represented by genes (exons, introns, flanking sequences involved in regulating gene expression)
3) 50% “single copy” sequences
4) 40-50% classes of “repetitive DNA” - Sequences that are repeated hundreds to millions of times
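The percentages above can be turned into approximate base-pair counts using the 3 x 10^9 bp haploid genome size from the first card. This is a rough arithmetic sketch. The dictionary labels and the helper name `fraction_to_bp` are illustrative assumptions.

```python
GENOME_BP = 3_000_000_000  # haploid genome size from the cards (3 x 10^9 bp)

# Fractions taken from the "Genome composition" card.
fractions = {
    "translated (protein coding)": 0.015,
    "genes, low estimate": 0.20,
    "genes, high estimate": 0.25,
    "single copy": 0.50,
}


def fraction_to_bp(fraction, genome_bp=GENOME_BP):
    """Convert a genome fraction into a whole number of base pairs."""
    return round(fraction * genome_bp)


for label, frac in fractions.items():
    print(f"{label}: {fraction_to_bp(frac):,} bp")
```

For example, 1.5% of 3 x 10^9 bp is about 4.5 x 10^7 bp of translated sequence.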
Tandem repeats
i.e. “satellite DNAs”
examples of tandem repeats
- Some are found in different parts of the genome, e.g. those used as the basis for cytogenetic banding
- Some (a particular pentanucleotide sequence) are found as part of human-specific heterochromatic regions on the long arms of Chr 1, 9, 16 and Y (hotspots for human-specific evolutionary changes)
- “α-satellite” repeats (171 bp repeat unit) found near centromeric region of all human chromosomes; may be important to chromosome segregation in mitosis and meiosis.
Dispersed repetitive elements
- Alu family
- L1 family
- Alus and L1s can be of significant medical relevance
In dispersed repetitive elements, __________ may cause insertional inactivation of genes
Retrotransposition
In dispersed repetitive elements, __________ may facilitate aberrant recombination events between different copies of dispersed repeats leading to diseases
repeats
Alu family
e.g. of SINEs: Short Interspersed repetitive Elements
- ~300 bp related members
- 500,000 copies in genome
L1 family
e.g. of LINEs: Long Interspersed repetitive Elements
- ~6 kb related members
- 100,000 copies in genome
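A rough worked estimate of how much sequence each family contributes, using only the unit lengths and copy numbers on the Alu and L1 cards. It assumes every copy is full length, which is an idealization (many real copies are truncated), so treat the results as order-of-magnitude figures. The names below are illustrative.

```python
GENOME_BP = 3_000_000_000  # haploid genome size from the cards

# Unit length and copy number as stated on the Alu and L1 cards.
families = {
    "Alu": {"unit_bp": 300, "copies": 500_000},
    "L1": {"unit_bp": 6_000, "copies": 100_000},
}


def total_bp(unit_bp, copies):
    """Total sequence if every copy were full length (an idealization)."""
    return unit_bp * copies


def genome_share(unit_bp, copies, genome_bp=GENOME_BP):
    """Fraction of the haploid genome taken up by the family."""
    return total_bp(unit_bp, copies) / genome_bp


for name, info in families.items():
    print(f"{name}: {total_bp(**info):,} bp, {genome_share(**info):.0%} of genome")
```

With the card figures this gives about 1.5 x 10^8 bp (5%) for Alu and 6 x 10^8 bp (20%) for L1.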
Insertion-deletion polymorphisms
Minisatellites
Microsatellites
Minisatellites
- tandemly repeated 10-100 bp blocks of DNA
- VNTR (variable number of tandem repeats)
Microsatellites
- di-, tri-, tetra-nucleotide repeats
- >5 x 10^4 per genome
- STRPs (Short Tandem Repeat Polymorphisms)
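A minimal sketch of how di-, tri- and tetra-nucleotide tandem repeats could be spotted in a sequence with a regular expression. The function name, the `min_repeats` threshold and the toy sequence are illustrative assumptions. Real STR genotyping tools are considerably more involved.

```python
import re


def find_microsatellites(seq, min_repeats=4):
    """Return (start, unit, repeat_count) for tandem repeats of 2-4 bp units."""
    seq = seq.upper()
    # Capture a 2-4 bp unit (shortest first), then require it to repeat in tandem.
    pattern = re.compile(r"([ACGT]{2,4}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        if len(set(unit)) == 1:  # skip homopolymer runs such as AAAAAAAA
            continue
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Toy sequence (made up): a (CA)6 repeat and a (GATA)4 repeat in unique flanks.
toy = "GGTT" + "CA" * 6 + "GTGC" + "GATA" * 4 + "CC"
print(find_microsatellites(toy))
```

The number of repeat units at such a locus is what varies between individuals, which is why these loci work as polymorphic markers.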
Single Nucleotide Polymorphisms (SNPs)
- frequency of 1 in ~1000 bp
- PCR-detectable markers, easy to score, widely distributed
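As a back-of-the-envelope illustration, a frequency of 1 in ~1000 bp across a 3 x 10^9 bp haploid genome implies roughly 3 x 10^6 SNP sites. The sketch below does that arithmetic and also shows, on two made-up aligned sequences, what scoring a single-base difference amounts to. All names are illustrative assumptions.

```python
GENOME_BP = 3_000_000_000  # haploid genome size from the cards
SNP_SPACING_BP = 1_000     # "1 in ~1000 bp" from the SNP card


def expected_snp_sites(genome_bp=GENOME_BP, spacing_bp=SNP_SPACING_BP):
    """Approximate number of SNP sites implied by the card's frequency."""
    return genome_bp // spacing_bp


def snp_positions(seq_a, seq_b):
    """0-based positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


print(expected_snp_sites())
print(snp_positions("ACGTACGT", "ACGAACGT"))
```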
Copy number variations (CNVs)
- variation in segments of genome from 200 bp – 2 Mb
- can range from one additional copy to many
- array comparative genomic hybridization (array CGH)
- loci may cover 12% of genome
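Array CGH results are commonly reported as a log2 ratio of test signal to reference signal. The sketch below shows the idealized ratio expected for a given copy number against a two-copy reference. It ignores noise and normalization, and the function name is an illustrative assumption.

```python
import math


def expected_log2_ratio(test_copies, reference_copies=2):
    """Idealized log2(test/reference) signal ratio for a given copy number."""
    if test_copies <= 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be positive for a log ratio")
    return math.log2(test_copies / reference_copies)


# One copy lost, normal two copies, one additional copy, two additional copies.
for copies in (1, 2, 3, 4):
    print(copies, round(expected_log2_ratio(copies), 3))
```

Under these assumptions one additional copy (3 versus 2) gives a ratio of about 0.585, and loss of one copy (1 versus 2) gives -1.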
Gene family is
composed of genes with high sequence similarity that may carry out similar but distinct functions
__________ arise through gene duplication, a major mechanism underlying evolutionary change
Gene families
Some CNV regions are involved in rapid & recent evolutionary change. Such regions are often
- enriched for human specific gene duplications
- enriched for genome sequence gaps
- enriched for recurrent human diseases
Role of genome architecture
Link between evolutionarily adaptive copy number increases and increase in human disease
Limitations of genome sequencing and genotyping platforms
Nextgen DNA sequencing and Genome-wide association studies (GWAS)
Nextgen DNA sequencing
- No mammalian genome has been completely sequenced & assembled
- Nextgen sequencing relies on short read sequences
  - Complex, highly duplicated regions are typically unexamined
  - Such regions are implicated in numerous diseases, e.g. 1q21
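A toy illustration of why short reads struggle in duplicated regions: a read that lies entirely inside a duplicated block matches more than one place in the reference, while a longer read that reaches unique flanking sequence matches only one. The sequences and names are made up for illustration. Real aligners tolerate mismatches and work very differently.

```python
def find_all(reference, read):
    """Return every 0-based position where `read` occurs exactly in `reference`."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


# Made-up reference containing the same block twice, separated by unique sequence.
duplicated_block = "ACGTTGCAAGGC"
reference = "TTTT" + duplicated_block + "GAGAGACC" + duplicated_block + "CCCC"

short_read = "GTTGCAAG"                # lies entirely inside the duplicated block
long_read = duplicated_block + "GAGA"  # spans the block plus unique flank

print(find_all(reference, short_read))  # more than one placement: ambiguous
print(find_all(reference, long_read))   # a single placement
```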
Genome-wide association studies (GWAS)
- “Missing heritability” for complex diseases: Many large-scale studies implicate loci (e.g. SNPs) that account for only a small fraction of the expected genetic contribution
- Many regions of the genomes are unexamined by available “genome-wide” screening technologies: is this where the “missing heritability” lies?