Genome Organization Flashcards
How many base pairs are in the haploid human genome sequence?
3x10^9 bp
How many chromosomes are in the human genome?
46
Where are the chromosomes located?
Nucleus
How many pairs of human chromosomes are there?
Autosomes_____
Sex Chromosomes_____
22 autosome pairs
1 sex chromosome pair
so
23 pairs in total
Does each chromosome have more than one DNA strand?
No, each chromosome is believed to consist of a single continuous DNA double helix
How do we generally number the chromosomes?
Chromosomes are generally numbered by size, from largest to smallest, so smaller chromosomes have higher numbers
e.g.
Chr1: 245 million bp
Chr22: 49 million bp
In what sense is the human genome a record of human evolutionary history?
It reflects the results of different selection pressures…these pressures have shaped our genome
In terms of evolution, which genes do we retain?
Adaptive ones :)…
Thus, many that were maladaptive were not retained
A + B = phenotype
What are A and B?
Genotype (genome) + environment
What is the fuel of genomic (and thus all) evolution?
Random variation
In general, random variation in a highly ordered structure, such as the human genome, is almost always __________
Deleterious
What is the price that we pay as a species to have a genome that can evolve, i.e. adapt to changing environments?
Genetic disease
Again, random variation in a highly ordered structure is almost always deleterious. Almost!
Is the human genome static?
No! it is dynamic and continues to change and evolve
Approximately how many new mutations occur in each individual?
30
What properties of meiosis allow for genetic diversity?
Independent assortment and shuffling of regions during recombination
The human genome is dynamic, constantly shuffling and changing. Is this true for both germ line and somatic cells?
Yes
Germ line cells shuffle DNA during recombination
Somatic cells also produce DNA changes, but these too can be deleterious (e.g. cancer is a disease of “genome instability”)
What is cancer a disease of?
Genome instability
Is there a “human genome”?
There is no “one” human genome, there are many (billions of different) human genomes
How frequent are SNPs in the human genome?
Average of 1 SNP every 1000 bp between any two randomly chosen unrelated human genomes
What percentage of the human genome is identical?
Around 99.9%
Leaving about 3,000,000 differences :)
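The arithmetic behind the last two cards can be checked with a short sketch. It uses only the figures given in the cards (3x10^9 bp and 1 SNP per 1000 bp); the variable names are illustrative.

```python
# Figures taken from the cards above.
HAPLOID_GENOME_BP = 3_000_000_000  # 3x10^9 bp
SNP_SPACING_BP = 1_000             # about 1 SNP every 1000 bp

# Expected number of single-base differences between two unrelated genomes.
expected_differences = HAPLOID_GENOME_BP // SNP_SPACING_BP

# Share of positions that are identical, as a percentage.
percent_identical = 100 * (1 - 1 / SNP_SPACING_BP)

print(expected_differences)         # 3000000
print(round(percent_identical, 1))  # 99.9
```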
Is the human genome organized in a random manner?
No…
there are gene rich regions
there are gene poor regions
Which chromosome is a gene rich chromosome?
19
What are the smallest chromosomes (in terms of gene content)?
13, 18, 21 (not counting Y)
What special potential does the limited gene content of chromosomes 13, 18, and 21 confer?
Viable trisomies
Is the majority of the genome stable or unstable?
Stable, but there are unstable regions
What diseases are associated with unstable regions of the genome?
Many
e.g.
Spinal muscular atrophy (5q13)
DiGeorge syndrome (22q)
Which chromosome has a particularly large number of diseases associated with unstable regions on it?
Chromosome 1q21.1
Chromosome 1q21.1 is associated with how many diseases?
12 diseases are associated with this unstable region!
Are there regions that are particularly rich in certain base pairs?
Yes
GC rich regions
AT rich regions
GC rich regions comprise about what percent of the genome?
38
AT rich regions comprise about what percent of the genome?
54
Do we see clustering of GC and AT rich regions?
Yes! This is the basis for chromosomal banding patterns (cytogenetics, karyotype analysis)
What is the basis for chromosomal banding patterns?
Clustering of GC and AT rich regions stain differently, producing unique banding
G-banding (Giemsa staining)
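Clustering of base composition can be pictured by measuring GC content in fixed-size windows along a sequence. The sketch below is only an illustration on a made-up toy sequence; the function names and window size are assumptions, and it does not model how Giemsa staining works.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int) -> list[float]:
    """GC fraction of each consecutive, non-overlapping window."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


# Toy sequence: a GC-rich block, an AT-rich block, then a mixed block.
toy = "GCGCGCGCGC" + "ATATATATAT" + "GCATGCATGC"
profile = windowed_gc(toy, 10)
print(profile)  # [1.0, 0.0, 0.6]
```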
Do chromosomal size and gene content align?
Not really
2 Strategies for genomic sequencing
- Construct clone map then sequence clones…assemble
- Shotgun sequence … let computer assemble
Combo works best
What part of the genome does the human genome sequence produced so far focus on?
Euchromatic regions
What are many of the remaining euchromatic gaps associated with?
Segmental duplications
Have we sequenced the condensed (heterochromatic) regions of the genome?
No, essentially unsequenced
What fraction of the human genome is protein coding (translated)?
1.5%
What percentage of the human genome is represented by genes (including exons, introns, flanking sequences involved in regulation, etc.)?
20-25%
What percentage of the human genome are “Single copy” sequences?
50%
What percentage of the human genome is made up of “repetitive DNA” = sequences that are repeated hundreds to millions of times?
40-50%
Have we fully sequenced the euchromatic portion of the genome?
No, there are still many sequence gaps (>200) that remain…many of which are associated with segmental duplications
What characterizes euchromatic regions?
more relaxed
What characterizes heterochromatic regions?
more condensed / repeat rich
2 broad Classes of repetitive DNA?
Tandem repeats
Dispersed repetitive elements
Tandem repeats are also known as
“satellite DNAs”
What protocol are tandem repeats used for?
Cytogenetic banding
What are C-bands?
Specific tandem repeats - a particular pentanucleotide sequence - found as part of specific heterochromatic regions on the long arm (q) of chromosomes 1, 9, 16, and Y, which are a hotspot for human-specific evolutionary changes
What is special about C-bands?
They are a hotspot for human-specific evolutionary changes (only found in human genome)
What are alpha satellite repeats?
Another example of tandem repeat
171 bp repeating unit
Where do we find alpha satellite repeats?
near centromeric regions
What might alpha satellite repeats be important in?
chromosome segregation in mitosis and meiosis - (remember they are close to centromeric region)
What are the main dispersed repetitive DNA elements? (2)
Alu family (SINEs) and L1 family (LINEs)
Length and Frequency of short interspersed repetitive elements (SINEs)
300bp; 500,000 copies in genome
Length and Frequency of Long Interspersed repetitive Elements (LINEs)
6kb; 100,000 copies in genome
Medical relevance of Lines and Sines
Retrotransposition (e.g. of Alus and L1s) may cause insertional inactivation of genes if the element is inserted into a detrimental location
What is retrotransposition / retrotransposed genes?
A portion of the mRNA transcript of a gene is spontaneously reverse transcribed back into DNA and inserted into chromosomal DNA (no introns)
LINES and SINES are examples of what class of genomic structure?
retrotransposons
LINEs and SINEs can become pseudogenes. What are pseudogenes?
Gene-like sequences that no longer have an associated promoter
Repetitive DNA elements, like LINEs and SINEs, can facilitate aberrant recombination events. What is the significance of this?
Recombination events between different copies of dispersed repeats lead to non-allelic homologous recombination (NAHR), which results in allelic loss and gain on the participating chromosomes
Duplication rich genome architecture promotes NAHR and disease…in what way?
Leads to microdeletion and microduplication… if the region contains dose sensitive genes, disease may result.
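How a misaligned crossover between repeat copies produces one deletion product and one duplication product can be sketched with a toy model. Here a chromosome is a list of labeled segments, "R" is a dispersed repeat, and both homologs are assumed identical; the function name and segment labels are hypothetical.

```python
def nahr_unequal_crossover(chrom, repeat, i, j):
    """Cross over between the i-th repeat copy on one homolog and the
    j-th repeat copy on the other (both homologs assumed identical).
    Returns the two recombinant products."""
    positions = [k for k, seg in enumerate(chrom) if seg == repeat]
    a, b = positions[i], positions[j]
    # Product 1: homolog 1 up to and including copy i, then homolog 2 after copy j.
    rec1 = chrom[:a + 1] + chrom[b + 1:]
    # Product 2: the reciprocal exchange.
    rec2 = chrom[:b + 1] + chrom[a + 1:]
    return rec1, rec2


# Segment B sits between two copies of the repeat R.
chrom = ["A", "R", "B", "R", "C"]
duplication, deletion = nahr_unequal_crossover(chrom, "R", 1, 0)
print(duplication)  # ['A', 'R', 'B', 'R', 'B', 'R', 'C']
print(deletion)     # ['A', 'R', 'C']
```

If segment B contains a dose-sensitive gene, one product carries an extra copy and the other carries none, which matches the microduplication and microdeletion outcomes named in the card.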
Types of human DNA variation (4 broad categories)
Insertion deletion polymorphisms (Indels)
Single nucleotide polymorphisms (SNPs)
Copy number variation (CNV)
Chromosomal (large scale)
2 types of Indels
minisatellites
microsatellites (STRs)
Minisatellites
A type of Indel polymorphism
- tandemly repeated 10-100 bp blocks of DNA
- Variable number tandem repeats VNTR
Microsatellites (STRs)
di, tri, tetra-nucleotide repeats
5x10^4 per genome
HOW FREQUENT ARE SNPs?
1 / 1000 bp
Copy number variation (CNV) size?
how many extra copies?
variation in segments of the genome from 200 bp to 2 Mb
Can range from one additional copy to many
How do we analyze CNV in the genome?
Array comparative genomic hybridization (ACGH)
DNA variation consequence
can be silent (majority)
or
have functional defect
What are gene families?
Gene families are families of genes composed of genes with high sequence similarity (e.g. >85%) that may carry out similar but distinct functions
Are gene families clustered or dispersed?
some are clustered and some are dispersed
Where do gene families come from?
Gene families arise through duplication
Gene duplication is a major mechanism behind_____
evolutionary change
Rationale behind gene duplication and evolutionary change ->
when a gene duplicates it frees up one copy to vary while the other copy continues to carry out a critical function
duplications frequently co-localize with what?
disease?
In the broadest sense, what is genome “structural variation”?
all changes in the genome not due to single base-pair substitutions
What is the primary form of genomic structural variation?
copy number variations (CNVs)
Up to what percentage of the genome may CNV loci cover?
12%
CNVs are implicated in an increasingly large number of what?
diseases
Short tandem repeats and
Variable number tandem repeats
are examples of which type of genomic variant?
Insertions/deletions (Indels)
In addition to CNVs and Indels, what other types of structural genomic variants do we see? (3)
Inversions
Duplications
Translocations
genomic variation is most commonly __________ and rarely ___________
detrimental
beneficial
What protocol do we use to analyze SNP?
PCR detectable markers
What percent of the genome is comprised of segmental duplications?
around 5
What defines a segmental duplication?
> 10kb
>95% sequence similarity
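The two criteria can be written as a simple predicate. The thresholds are the ones stated on this card; the function name and the example values are hypothetical.

```python
# Thresholds as stated on the card above.
MIN_LENGTH_BP = 10_000
MIN_IDENTITY = 0.95


def is_segmental_duplication(length_bp: int, identity: float) -> bool:
    """True if a duplicated segment meets both card criteria."""
    return length_bp > MIN_LENGTH_BP and identity > MIN_IDENTITY


print(is_segmental_duplication(25_000, 0.98))  # True
print(is_segmental_duplication(300, 0.99))     # False: too short (Alu-sized)
```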
Segmental duplications are often located adjacent to?
Human genome sequence gaps
Segmental duplications are responsible for much of what?
Much of the dynamic nature of the genome - clustered near some hotspots (e.g. the C-band on chromosome 1 is right next to a segmental duplication)
How do we study / measure genomic DNA copy number alteration?
cDNA microarray (arrayCGH)
Can compare e.g. hominid vs human..
Fluorescence ratios are depicted in a pseudocolor scale, such that red indicates increased and green indicates decreased gene copy number compared to the reference
Does array CGH measure DNA or RNA?
DNA
Simplifying, what are the 3 steps for arrayCGH?
Label genomic reference green and test red
Cohybridize to microarray
Signal color illustrates relative DNA copy number (not expression level)
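The three steps end in a test-to-reference fluorescence ratio per probe, which is commonly expressed on a log2 scale. The sketch below is a minimal illustration: the probe names, intensity values, and the 0.3 calling threshold are all assumptions, not values from the lecture.

```python
import math


def log2_ratio(test_signal: float, reference_signal: float) -> float:
    """log2 of test (red) over reference (green) intensity."""
    return math.log2(test_signal / reference_signal)


def call_copy_number(ratio: float, threshold: float = 0.3) -> str:
    """Classify a log2 ratio; the threshold is an illustrative assumption."""
    if ratio > threshold:
        return "gain"       # shown red: more copies in test than reference
    if ratio < -threshold:
        return "loss"       # shown green: fewer copies in test
    return "no change"


# Hypothetical (test, reference) intensities for three probes.
probes = {
    "probe_a": (1500.0, 1000.0),
    "probe_b": (1000.0, 1000.0),
    "probe_c": (500.0, 1000.0),
}
calls = {name: call_copy_number(log2_ratio(t, r)) for name, (t, r) in probes.items()}
print(calls)
```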
When we use arrayCGH to look at copy number variation, which chromosomes have significant human specific gene duplications?
1,2,5,9
Regions of the genome where there are significant human specific gene duplications also happen to correspond to regions where…..
Why
Regions where many of the gaps in the genome sequence are
Because their duplication-rich nature makes them hard to sequence and assemble
What specific region of which chromosome did we focus on that was associated with 12 different human diseases?
1q21
This region has copy number variations that have been found in 12 different human diseases
There’s also a human specific inversion here
There is also a human specific c-band (constitutive heterochromatin) in this region
Which key sequence is highly duplicated in 1q21?
Duf1220
protein coding domain
How many copies of Duf1220 are there in region 1q21?
over 200…wow
Amplification of Duf1220 in Human Evolution…
_____________ specific copy number expansion, which progressively increased from __________ to __________ to ___________
Anthropoid specific expansion
monkey to ape to human
Which gene domain exhibits the greatest human-specific copy number expansion of any protein coding sequence?
Duf1220
What is the primary cause of Human increase in Duf1220?
Domain hyperamplification
What analysis of Duf1220 illustrates positive selection in primates?
Ka/Ks analysis
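Ka/Ks compares the nonsynonymous substitution rate (Ka) with the synonymous rate (Ks); a ratio above 1 is conventionally read as evidence of positive selection. A minimal sketch follows, with hypothetical input values and illustrative function names.

```python
def ka_ks_ratio(ka: float, ks: float) -> float:
    """Ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates."""
    if ks == 0:
        raise ValueError("Ks is zero; the ratio is undefined")
    return ka / ks


def interpret(ratio: float) -> str:
    """Conventional reading of a Ka/Ks ratio."""
    if ratio > 1:
        return "consistent with positive selection"
    if ratio < 1:
        return "consistent with purifying selection"
    return "consistent with neutral evolution"


# Hypothetical rates, for illustration only.
print(interpret(ka_ks_ratio(0.04, 0.02)))
print(interpret(ka_ks_ratio(0.01, 0.05)))
```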
Do we have more genes that have Duf1220?
NO!
Humans don’t really have more genes that have Duf1220, rather they have markedly increased expansion of Duf1220 domain sequence in similar numbers of genes (NBPF genes)
Which genes hold the Duf1220 domain?
NBPF
Neuroblastoma breakpoint family genes
What could account for why Duf1220 region of genome has been associated with so many diseases?
There are a lot of other genes that are non Duf-encoding nearby
Remember NAHR
Well, Duf genes can serve as recombination focal points - catalysts for disease basically
The Duf genome architecture serves as a facilitator for:
genome changes/variation, many of which can be disease relevant (e.g. macro and microcephaly)
Duf1220,Brain evolution, and Disease
Increased 1q21.1 instability led to _______(advantageous outcomes)
Increased Duf1220 copy number
Duf1220, Brain evolution, and disease
Increased Duf1220 copy number led to?
Evolutionary advantage (increased brain size?)
Increased 1q21.1 instability deleterious outcomes?
1q21.1 duplications –> macrocephaly / autism
1q21.1 deletions –>
microcephaly / schizophrenia
Implications of highly dynamic genome…with regards to genome assembly
No genome is completely sequenced and assembled
- some regions are either missed or too complex and duplication rich to assemble correctly with current methods
Do all regions of the genome look and behave similarly?
No!
We have rapidly changing and complex genomic regions
Rapidly changing complex genomic regions and disease?
Implicated in increasing number of genetic diseases
Rapidly changing complex genomic regions and sequencing?
unexamined by available sequencing and genotyping platforms… major current challenge for medical genetics
Highly dynamic genome and “missing heritability” implication?
GWAS implicate loci that account for only a small % of expected genetic contribution for many complex diseases
Genetic technology that could help with complex genome?
long read sequencing - because if you have a repeat, the sequence read might be long enough that the repeat will eventually end and the read will enter a single copy region, which anchors the repeat to a single copy part of the genome so you know where the repeat is located
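The anchoring idea can be shown on a toy reference that contains the same repeat twice. A read covering only the repeat matches in two places, while a longer read that runs into unique flanking sequence matches in one. The sequences and the helper name are made up, and exact string matching is a simplification of what real read aligners do.

```python
def find_all(reference: str, read: str) -> list[int]:
    """Start positions of every exact occurrence of read in reference."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


repeat = "ACGTACGTAC"
# Two copies of the repeat separated and flanked by unique sequence.
reference = "TTGCA" + repeat + "GGATC" + repeat + "CCTAG"

short_read = repeat            # lies entirely inside the repeat
long_read = repeat + "CCTAG"   # extends into the unique right flank

print(find_all(reference, short_read))  # [5, 20] -> ambiguous placement
print(find_all(reference, long_read))   # [20]    -> anchored to one location
```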
GWAS are usually ______ based
SNP
GWAS what do they do?
They find an association between a certain part of the genome and a particular disease; however, when researchers go back and look at that region's contribution to the phenotype, the contribution is often very low (not large enough to cause a severe phenotype)
Key takeaway from lecture
All regions of the genome are not created equal
CNV regions involved in rapid and recent evolutionary change often are enriched for human specific ______________
gene duplications
CNV regions involved in rapid and recent evolutionary change are often enriched for genome___________
sequence gaps
CNV regions involved in rapid and recent evolutionary change are often enriched for recurrent ___________
human diseases
So there is a link regarding CNV regions, between ______________ and _________________
examples
evolutionary adaptive copy number increases
and
increase in human diseases
1q21.1
9p13.3
9q21.12
5q13.3