Landscape of the human genome Flashcards
Genome - general
- 3.2 billion bases
- 4.8% unknown
- 0.14% unplaced
- 21,000 genes
Genome subdivided
genes and gene-related sequences
- genes
- gene-related sequences
- pseudogenes
- gene fragments
- introns, UTRs
intergenic DNA
- interspersed repeats
- LINEs
- SINEs
- LTRs
- DNA transposons
- other intergenic regions
- microsatellites
- etc
Genes and gene-related
genes = 1%
gene-related = 36%
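A quick arithmetic check of what these percentages mean in bases, as a minimal sketch. It assumes the 3.2 billion base total from the first card; treating everything left over as intergenic DNA is an assumption of this example, not a figure from the cards.

```python
# Approximate sizes implied by the percentages on the cards.
GENOME_BASES = 3_200_000_000  # 3.2 billion bases

FRACTIONS = {
    "genes": 0.01,         # genes = 1%
    "gene_related": 0.36,  # gene-related = 36%
}

def bases_for(fraction, total=GENOME_BASES):
    """Return the number of bases covered by a fraction of the genome."""
    return round(fraction * total)

sizes = {name: bases_for(frac) for name, frac in FRACTIONS.items()}
# Assumption: whatever is not genes or gene-related is counted as intergenic.
sizes["intergenic"] = GENOME_BASES - sum(sizes.values())

for name, n in sizes.items():
    print(f"{name}: {n / 1e6:.0f} Mb")
```

So genes come to roughly 32 Mb and gene-related sequences to roughly 1,152 Mb.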
Gene-related
- introns
- UTR
- promoters
- RNA genes
- pseudogenes
- gene fragments
tRNA genes
- 100 genes in multiple chromosomes
- some in large clusters
Ribozymes
- cleave other RNAs
- enzymes made of RNA
- single strand = interacts with DNA, binds RNA
Ribonucleoproteins
- ribozymes that form complexes with proteins
- process nucleic acids
- ribosomes, spliceosomes, telomerase
Many RNA genes are involved in
antisense regulation of mRNA
- encoded in same segment of DNA as mRNA
- if read off in opposite direction can bind to mRNA made from same region of DNA (reverse complement)
- binds to argonaute = chops mRNA
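The reverse-complement idea on this card can be sketched in a few lines. The sequence and the function names below are hypothetical, chosen only for illustration.

```python
# Translation table mapping each RNA base to its pairing partner.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return seq.translate(RNA_COMPLEMENT)[::-1]

def transcribe_sense(coding_strand_dna):
    """mRNA matches the coding strand, with U in place of T."""
    return coding_strand_dna.replace("T", "U")

coding_dna = "ATGGCCTTAG"                 # hypothetical stretch of coding strand
mrna = transcribe_sense(coding_dna)       # read in the sense direction
antisense = reverse_complement_rna(mrna)  # read off the opposite strand

print(mrna)       # AUGGCCUUAG
print(antisense)  # CUAAGGCCAU
```

Because the antisense RNA is the reverse complement of the mRNA, the two strands can base-pair along their whole length.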
micro RNA
miRNA
- 22nt
- hairpin cleaved out
- hairpin exported from nucleus, then cut by dicer into a short double-stranded RNA
- regulation of gene expression in post-transcription
- siRNA
- RNAi
- in cytoplasm
long regulatory ncRNAs
- regulation of gene expression
- anti-sense regulators
Antisense regulation
long DS RNA
→ dicer makes siRNA
→ SS RNA binds reverse complement
→ guide argonaute complexes to:
- cleave matching mRNA transcripts
- methylate DNA near newly synthesizing mRNA
- regulates how much mRNA makes protein
piwi protein interacting RNA (piRNA)
- from long RNA precursor
- active in germline cells; regulate gene expression at the level of transcription
small nucleolar RNA
(snoRNA)
- maturation of rRNA
- site specific methylation
- uridine → pseudouridine
RNA interference
mature mRNA in cytoplasm
→ DS precursors of siRNA and miRNA bind dicer
→ made into short segments
→ short DS RNA binds argonaute
1 strand remains bound to argonaute (guide strand) to make RISC
- siRNA = perfect complement to target mRNA
- miRNA imprecise = targets hundreds
→ argonaute cleaves mRNA, degraded
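A toy sketch of the difference between siRNA (perfect complement) and miRNA (imprecise) targeting. The 8 nt guide, the two mRNA strings, and the plain mismatch count are all simplifying assumptions; real guides are about 21 to 22 nt, and this is not how any particular tool scores targets.

```python
# Watson-Crick pairs allowed between guide strand and mRNA.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def count_mismatches(guide, site):
    """Count unpaired positions; guide and site pair antiparallel."""
    return sum((g, s) not in PAIRS for g, s in zip(guide, reversed(site)))

def find_targets(guide, mrna, max_mismatches=0):
    """Return start positions in mrna where the guide can bind."""
    k = len(guide)
    return [
        i
        for i in range(len(mrna) - k + 1)
        if count_mismatches(guide, mrna[i:i + k]) <= max_mismatches
    ]

guide = "CUAAGGCC"          # hypothetical guide strand held by argonaute
mrna_a = "AAAGGCCUUAGAAA"   # contains a perfect complementary site
mrna_b = "AAAGGCGUUAGAAA"   # same site with one mismatch

print(find_targets(guide, mrna_a))                    # siRNA-like, strict
print(find_targets(guide, mrna_b))                    # strict: no hit
print(find_targets(guide, mrna_b, max_mismatches=1))  # miRNA-like, tolerant
```

Allowing mismatches lets one guide hit more transcripts, which is the intuition behind a single miRNA targeting hundreds of mRNAs.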
Pseudogenes and gene fragments
- partial gene sequences that don’t code viable proteins
- mutated copies of functional genes
- just exons
- gene followed by pseudogene that looks like it with inverted copy → transcribed to hairpin loop that’s chopped by dicer → 21nt siRNAs that cause cleavage of other mRNA transcripts
LINEs
encode 2 proteins
- ORF1p = RNA binding and nucleic acid chaperone
- kept safe
- ORF2p = reverse transcriptase and endonuclease
- DNA, into genome
SINEs
- Alu elements
- inverted repeats on right and left arms = RNA folds back on itself
LTRs
- retrovirus, human endogenous retroviral sequences (HERVs)
- 6-11 kbp (long) → protease, reverse transcriptase to move around, reshuffle
- 1.5-3 kbp (short) → fewer/no genes
DNA transposons
- cut and paste
- large autonomous transposons
- encode transposase
- small non-autonomous transposons
- miniature inverted-repeat transposable elements (MITEs)
- encode miRNAs
- no transposase
Microsatellites
- simple sequence repeats
- length evolves (replication slippage)
- affects gene expression
- used in population genetics to tell closely related individuals apart, because they evolve so quickly
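A minimal sketch of spotting simple sequence repeats with a regular expression and comparing repeat counts between two individuals. The sequences, the unit-length limits, and the minimum of 4 repeats are illustrative assumptions.

```python
import re

def find_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=4):
    """Return (start, unit, copies) for each tandem repeat found in seq."""
    # Lazy unit match so the shortest repeating unit is preferred.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq)
    ]

# Two hypothetical alleles that differ only in the number of CA repeats,
# as could arise from replication slippage.
individual_1 = "GGAT" + "CA" * 6 + "TTG"
individual_2 = "GGAT" + "CA" * 8 + "TTG"

print(find_microsatellites(individual_1))
print(find_microsatellites(individual_2))
```

The flanking sequence is identical, so the repeat count alone distinguishes the two alleles.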
Mitochondrial genome
- 66% protein-coding genes
- thousands of copies in every cell
- shorter and more gene-rich than nuclear genome
- 2 places for replication to start → transcribe in both directions
- own tRNA
- some genes overlap = 2 genes encoded by the same bit of DNA
- more than 1 ORF, all functional
ENCODE
- claims 80.4% of the genome has a biological function
- 147 cell types
methods
- RNA seq
- CAGE
- RNA-PET
- ChIP-seq
- DNase-seq
- FAIRE-seq
- RRBS
RNA-seq
isolate RNA, then sequence it
CAGE
cap analysis of gene expression
- captures methylated cap at 5’ end of RNA
- sequence adjacent tags → transcriptional start sites
RNA-PET
- paired end tags
- captures RNA with 5’ methyl cap and poly(A) tail
- = full length RNA, sequence end tags
ChIP-seq
- regions of crosslinked chromatin
- selected with specific antibody
- sequence regions most often bound by the protein (in chromatin)
DNase-seq
- DNaseI cuts DNA where accessible
- sequence cut regions to show open chromatin
FAIRE-seq
- formaldehyde crosslinking "freezes" protein onto DNA; the protein-bound DNA is separated out and the remaining protein-free DNA is sequenced to show open chromatin
RRBS
- methylation affects function
- unmethylated = C → U
- U where unmethylated C
- methylation protects C from becoming U
- see methylation sites
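The bisulfite logic above can be sketched as a small simulation. The reference sequence and methylated positions are made up, and the converted base is written as T because U is read as T after amplification and sequencing.

```python
def bisulfite_convert(seq, methylated_positions):
    """Unmethylated C is converted (shown as T); methylated C stays C."""
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )

def call_methylation(reference, converted_read):
    """A reference C that still reads as C was protected, so methylated."""
    return [
        i
        for i, (ref, obs) in enumerate(zip(reference, converted_read))
        if ref == "C" and obs == "C"
    ]

reference = "ACGTCCGAC"   # hypothetical reference sequence
true_methylated = [1, 5]  # hypothetical methylated C positions (0-based)

read = bisulfite_convert(reference, true_methylated)
print(read)                               # ACGTTCGAT
print(call_methylation(reference, read))  # [1, 5]
```

Comparing the converted read back to the reference recovers exactly the positions that were protected by methylation.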