Genome organization Flashcards
number of human chromosome pairs
23; 22 autosomal pairs and 1 pair of sex chromosomes
base pairs in haploid genome
~3 × 10^9
what percent of the genome codes proteins
<1.5%; only about 5% contains regulatory information
point of attachment for spindle microtubules during cell division
centromere
centromeric DNA sequence
alpha satellite family; an approximately 171-bp repeat unit critical for attachment of the microtubules of the spindle apparatus
the Alu family and the LINE (L1) family
two salient examples of repetitive DNA sequences that have implications in disease
define: single nucleotide polymorphism
a position in the sequence where a single nucleotide differs between copies of the genome, giving rise to two alleles.
average of 1 SNP for every 1000 bp
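A minimal sketch of the two ideas on this card, assuming two pre-aligned, equal-length sequences (the function name `find_snps` and the toy alleles are illustrative, not from the deck). It lists single-base differences and applies the 1-per-1000-bp average to the ~3 × 10^9 bp haploid genome.

```python
def find_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for each single-base mismatch
    between two pre-aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Figures taken from the cards: ~3 x 10^9 bp haploid genome, ~1 SNP per 1000 bp.
HAPLOID_GENOME_BP = 3 * 10**9
BP_PER_SNP = 1000
expected_snps = HAPLOID_GENOME_BP // BP_PER_SNP  # about 3 million

# Toy alleles (hypothetical) differing at one position.
allele_1 = "ATGCCGTA"
allele_2 = "ATGCTGTA"
snps = find_snps(allele_1, allele_2)
print(snps, expected_snps)
```

Positions are 0-based. Real SNP calling works from sequencing reads against a reference, so this is only a picture of what "a single nucleotide differs" means.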
the three gene poor chromosomes
13, 18, 21; these are the only autosomes for which trisomy is compatible with live birth
unstable dynamic regions
frequently mutated, so they are common sites for disease
e.g. SMA (Chr 5q13); DiGeorge syndrome (Chr 22q); 12 diseases (1q21)
the dynamic nature of the human genome
~30 new mutations in each new individual; shuffling of regions at each meiosis due to recombination
can produce somatic DNA changes as well as germ-line DNA changes
Gene-rich regions/chromosomes
chromosome 19
the non-random distribution of GC-rich (38% of genome) and AT-rich (54%) regions is called:
clustering. This is the basis for chromosomal banding patterns (cytogenetics, karyotype analysis)
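As a rough illustration of clustering, the sketch below computes GC fraction in fixed, non-overlapping windows. The helper names, the window size, and the toy sequence are assumptions chosen for illustration; real banding reflects regions far larger than these.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window):
    """GC fraction of each full, non-overlapping window along seq."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


# Toy sequence (hypothetical): a GC-rich block, an AT-rich block, a mixed block.
toy = "GCGCGCGCGC" + "ATATATATAT" + "GGCCATATGC"
profile = windowed_gc(toy, 10)
print(profile)
```

A profile that swings between high and low values along the sequence, rather than hovering near the genome-wide average, is what non-random distribution looks like at this scale.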
euchromatic regions
more relaxed
heterochromatic regions
more condensed; repeat rich
percentage of DNA represented by genes (exons, introns, flanking sequences involved in regulating gene expression)
20-25%
percentage of DNA that is “single copy” sequences
50%
percentage of DNA that are classes of “repetitive DNA”
40-50%
how big is the dystrophin gene (on the X chromosome)
very large: over 2 million base pairs, of which less than 1% is coding exons. Mutations lead to Duchenne muscular dystrophy
very large arrays of tandemly repeating, non coding DNA
satellite DNA; the main component of functional centromeres (alpha satellite) and of heterochromatin, including the human-specific heterochromatic regions on the long arms of Chr 1, 9, 16 and Y (hotspots for human-specific evolutionary changes)
example of SINEs (Short Interspersed Repetitive Elements)
Alu family
- ~300 bp related members
- ~500,000 copies in genome
example of LINEs (Long Interspersed Repetitive Elements)
L1 family
- ~6 kb related members
- ~100,000 copies in genome
Repeats may facilitate aberrant ______________ between different copies of dispersed repeats leading to diseases
Non-allelic homologous recombination (NAHR)
Retrotransposition may cause __________ inactivation of genes
insertional
giemsa banding
AT-rich regions (gene poor) take up the dark stain
Insertion-deletion polymorphisms (indels):
Minisatellites and microsatellites
Minisatellites
tandemly repeated 10-100 bp blocks of DNA
VNTR (variable number of tandem repeats)
dozens in the genome
Microsatellites
- di-, tri-, and tetranucleotide repeats
- >5 × 10^4 per genome
STRPs (Short Tandem Repeat Polymorphisms)
thousands in the genome
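The microsatellite definition above (tandem di-, tri-, or tetranucleotide repeats) can be sketched as a regular-expression scan. The function name `find_microsatellites`, the `min_repeats` threshold, and the toy sequence are assumptions for illustration; dedicated repeat finders handle imperfect repeats, which this does not.

```python
import re


def find_microsatellites(seq, min_repeats=4):
    """Return (start, unit, repeat_count) for each perfect tandem repeat
    of a 2-4 bp unit occurring at least min_repeats times in a row."""
    # Lazy {2,4}? tries the shortest unit first, so "CACACACA" is reported
    # as unit "CA" rather than unit "CACA".
    pattern = re.compile(r"([ACGT]{2,4}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as "AAAAAAAA"
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


# Toy sequence (hypothetical): (CA)x6 and (CAG)x4 separated by short spacers.
toy = "TTG" + "CA" * 6 + "TT" + "CAG" * 4 + "TT"
hits = find_microsatellites(toy)
print(hits)
```

The repeat count at a given locus is what varies between individuals, which is why these loci are useful as STRP markers.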
Copy number variations (CNVs)
- variation in segments of genome from 200 bp – 2 Mb
- can range from one additional copy to many
- detected by array comparative genomic hybridization (array CGH)
- implicated in many diseases
typical gene
intron-containing
retroposed gene
intronless
evolutionary hotspots on chr 1, 9, 16, Y
a particular pentanucleotide sequence on human specific heterochromatic regions
SNPs can easily be detected by
PCR
describe gene families
A gene family is composed of genes with high sequence similarity (e.g. >85-90%) that may carry out similar but distinct functions
how do gene families arise?
through gene duplication, which is thus a major mechanism for evolutionary change
beta globin # of exons? mutations can cause?
3 exons; hemoglobin deficiencies
BRCA1 # of exons? mutations can cause?
24 exons; inherited breast and ovarian cancer
MYH7 # of exons? mutations can cause
40 exons; inherited cardiomyopathy
CNV loci may cover ________% of the genome
12
examples of CNV implicated in human disease
1q21.1; 9p13.3-9q21.12; 5q13.3
CNV example of a link between evolutionarily adaptive copy number increases and an increase in human disease
1q21.1 duplications: macrocephaly, autism
1q21.1 deletions
microcephaly, schizophrenia
genome wide association studies
focus on SNPs
next-generation sequencing limitations
relies on short-read sequences; complex, highly duplicated regions typically go unexamined. Such regions are implicated in numerous diseases, e.g. 1q21
explain “missing heritability”
Many large-scale studies implicate loci (e.g. SNPs) that account for only a small fraction of the expected genetic contribution. The answer may lie in unexamined DNA