Human Genome Flashcards
human genome draft sequences
drafts published 2001, sequence completed 2004
- clone by clone (sequencing BAC clones)
- Celera Genomics: whole genome shotgun sequencing, using public data from the Human Genome Project to help assemble
human genome composition
- 1.5% exons
- 24% introns and regulatory sequences
- 15% unique non-repetitive non-coding DNA
- 15% repetitive DNA not related to TEs (transposable elements)
– 3% simple sequences
– 5% large segmental duplications
- 44% repetitive DNA that includes TEs and related sequences
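The percentages above can be turned into rough base-pair counts. This is a minimal sketch, assuming a round 3 billion bp genome; the constant and function names are illustrative only.

```python
# Assumption: a round figure of 3 billion bp for the human genome.
GENOME_SIZE_BP = 3_000_000_000

# Top-level categories and percentages as listed in the card above.
composition_pct = {
    "exons": 1.5,
    "introns and regulatory sequences": 24,
    "unique non-repetitive non-coding DNA": 15,
    "repetitive DNA not related to TE": 15,
    "repetitive DNA including TE": 44,
}

def to_base_pairs(percent, genome_size=GENOME_SIZE_BP):
    """Convert a percentage of the genome into an approximate base-pair count."""
    return round(genome_size * percent / 100)

composition_bp = {name: to_base_pairs(pct) for name, pct in composition_pct.items()}
total_pct = sum(composition_pct.values())

for name, bp in composition_bp.items():
    print(f"{name}: ~{bp / 1e6:.0f} Mb")
print(f"total of listed categories: {total_pct}%")
```

The listed categories sum to 99.5%, so the figures are rounded estimates rather than an exact partition.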
Human genome genes
around 20,000 protein coding genes
– extensive alternative splicing
many genes over 100 kb long
average coding sequence is 1300 bp
repeated and duplicated regions of the human genome
repeat rich genome
- Transposon derived repeats
- simple sequence repeats (microsatellites; 3% of genome)
– dinucleotide repeats are the most common microsatellites
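To make the idea of a dinucleotide microsatellite concrete, here is a small sketch that scans a DNA string for runs of a repeated 2-bp unit. The function name, the toy sequence, and the `min_units=4` threshold are assumptions for illustration, not a standard definition.

```python
import re

def find_dinucleotide_repeats(seq, min_units=4):
    """Return (start, unit, copies) for each run of a 2-bp unit repeated
    at least min_units times in a row."""
    # ([ACGT]{2}) captures the unit; \1{n,} requires n more copies of it.
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_units - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if unit[0] == unit[1]:
            continue  # skip mononucleotide runs such as AAAAAAAA
        hits.append((match.start(), unit, len(match.group(0)) // 2))
    return hits

# Toy sequence (hypothetical): a (CA)6 run and a (GA)4 run.
example = "GGT" + "CA" * 6 + "TT" + "GA" * 4 + "GT"
print(find_dinucleotide_repeats(example))
# → [(3, 'CA', 6), (17, 'GA', 4)]
```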
segmental duplications
- inter and intra chromosomal
Tandem duplicates
- copy number variants
- 15% of human genes have been found to have copy number variants in at least one individual
Human chromosome features
centromeres usually AT rich
genes often in gene 'islands' (clusters of genes)
LINEs and SINEs not distributed equally
gene density not equally distributed
benefits of the human genome sequence
identification of disease genes
expands the search for drug targets
can learn about sequence variation among individuals, and study the history of human populations with SNPs
- learn about vertebrate genome evolution
- asset to research on human genetics, cell biology, physiology, biochemistry, molecular evolution and population biology
- helped drive the development of 2nd and 3rd generation sequencing techniques
human ENCODE project
aims to identify every sequence with functional properties in the human genome
- includes: genes, promoters, enhancers, repressors/silencers, exons, conserved sequences, TF binding sites, methylation sites, chromatin modifications, replication start sites, etc.
- many functional elements not conserved across vertebrates
- will help determine function of genes with currently unknown function
chimpanzee genome
first non-human primate genome sequenced
reveals aspects of primate genome evolution when compared to human
single nucleotide substitutions: about 1.23% divergence between chimpanzee and human
orthologous protein coding sequences are very similar
- 29% of orthologous proteins are identical; the typical ortholog differs by 2 amino acids
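A divergence figure like the one above is, in essence, the share of aligned positions that differ. The sketch below shows that calculation on a pre-aligned pair of sequences; the function name and toy sequences are hypothetical, and real comparisons rest on whole-genome alignments with additional filtering.

```python
def percent_divergence(seq_a, seq_b, gap="-"):
    """Percent of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # indels are not single nucleotide substitutions
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100 * mismatches / compared

# Toy aligned sequences (hypothetical): 1 difference in 10 positions.
species_a = "ACGTACGTAC"
species_b = "ACGTACGTAA"
print(percent_divergence(species_a, species_b))
# → 10.0
```

The same function works on aligned protein sequences, where it counts amino acid differences instead.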
incomplete lineage sorting
a phenomenon in population genetics in which ancestral gene copies fail to coalesce into a common ancestral copy until further back in time than the previous speciation event
- arises when several species split in close succession
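Why closely spaced splits matter can be sketched with the standard three-species result from neutral coalescent theory: two lineages fail to coalesce within the internal branch with probability exp(-T), where T = t / (2N) for t generations between the two splits and a diploid effective population size N, and the gene tree then disagrees with the species tree with probability (2/3)·exp(-T). The function name and input values below are hypothetical.

```python
import math

def ils_probabilities(generations_between_splits, effective_pop_size):
    """Three-species neutral coalescent (diploid population assumed).

    Returns (p_no_coalescence, p_discordant_tree):
    - p_no_coalescence: chance the two lineages do not coalesce in the
      ancestral population between the two speciation events
    - p_discordant_tree: chance the gene tree disagrees with the species tree
    """
    t = generations_between_splits / (2 * effective_pop_size)  # coalescent units
    p_no_coalescence = math.exp(-t)
    p_discordant_tree = (2 / 3) * p_no_coalescence
    return p_no_coalescence, p_discordant_tree

# Hypothetical inputs: a short versus a long interval between splits.
for gens in (10_000, 200_000):
    p_ils, p_disc = ils_probabilities(gens, effective_pop_size=50_000)
    print(f"{gens} generations: P(no coalescence)={p_ils:.3f}, P(discordant)={p_disc:.3f}")
```

The shorter the interval between the splits relative to population size, the more often ancestral copies sort incompletely.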
Human lineage specific traits
- brain size increase, increased cognition
- evolutionary advantage = creation of novel solutions to survival threats and improved social cognition
also
- endurance running: increased capacity to transfer energy from fat to muscle
– persistence hunting, increased range of food sources, improved diet for brain evolution
Human lineage specific changes
can't do experiments on humans, so comparing sequenced primate genomes provides insight into the genetic basis
- chromosomal rearrangements, fusion (human chromosome 2 corresponds to two separate chimpanzee chromosomes), segmental duplications
- mechanisms:
– regulatory changes
– amino acid changes (mitochondrial metabolism)
– copy number changes
– protein domain amplification
– pseudogenization
– expression changes
genome size comparisons
human: 3 billion bp; Drosophila: 180 million bp; chicken: about 1.2 billion bp; Arabidopsis: 100 million bp; C. elegans: 100 million bp
- find evolutionary conserved non-coding sequences like regulatory sequences
- studies of gene family evolution
- chromosomal rearrangements and karyotype evolution
- lineage specific genes – lineage specific traits
Rice genome vs Maize
rice: 430-460 Mb; 35% transposons in non-centromeric regions
maize: 2.3 gigabases (about 6x the size of rice); 85% TEs
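A rough calculation from the figures above shows how much of the size difference falls in TE-derived sequence. This sketch assumes the midpoint of the rice range (445 Mb) and applies the rice 35% figure genome-wide, although the card quotes it for non-centromeric regions only; all names are illustrative.

```python
# Figures from the card; the midpoint and genome-wide use of 35% are assumptions.
RICE_SIZE_MB = (430 + 460) / 2   # midpoint of the 430-460 Mb range
MAIZE_SIZE_MB = 2300             # 2.3 gigabases
RICE_TE_FRACTION = 0.35
MAIZE_TE_FRACTION = 0.85

def split_by_te(size_mb, te_fraction):
    """Split a genome size into (TE-derived Mb, remaining Mb)."""
    te_mb = size_mb * te_fraction
    return te_mb, size_mb - te_mb

rice_te, rice_other = split_by_te(RICE_SIZE_MB, RICE_TE_FRACTION)
maize_te, maize_other = split_by_te(MAIZE_SIZE_MB, MAIZE_TE_FRACTION)

print(f"rice:  TE ~{rice_te:.0f} Mb, other ~{rice_other:.0f} Mb")
print(f"maize: TE ~{maize_te:.0f} Mb, other ~{maize_other:.0f} Mb")
print(f"TE ratio (maize/rice):    {maize_te / rice_te:.1f}x")
print(f"other ratio (maize/rice): {maize_other / rice_other:.1f}x")
```

With these inputs, nearly all of the extra maize DNA is TE-derived, while the non-TE portions of the two genomes are of similar size.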
most common gene category in rice
metabolism genes
GO (Gene Ontology) analysis categories
- molecular function
- biological process
- cellular component