Eukaryotic genomics - 1 Flashcards
When was the first genome sequenced?
1990s (first free-living organism: Haemophilus influenzae, 1995)
When was the human genome project?
1990-2003
When was the ENCODE project?
2003-2012 - more functional info than the human genome project
When was the 10,000 genomes project?
2012-present
How many protein coding genes are there?
20,000 protein coding genes - 4% of human genome
How many lncRNA genes are there in the human genome?
15,000
How many genes in prokaryotes compared to eukaryotes?
P - 700-5000 genes
E - 12,000-3,200,000 genes
What is the difference in transcription location in Eukaryotes and Prokaryotes?
E - mRNAs processed in the nucleus and translated in the cytoplasm
P - mRNAs translated at same location as transcription
What is the difference in genome structure between eukaryotes and prokaryotes?
P - circular chromosome (often with plasmids)
E - linear chromosomes
What proportion of the genome is protein coding?
The more complex the eukaryote, the smaller the protein coding proportion
What gene features are important to consider?
Splicing
Alternative splicing
Alternative transcription start sites
Alternative polyA sites
Alternative translation start sites
Facts about model organism Saccharomyces cerevisiae?
12.1 mil bp
16 chromosomes
31% genes have human orthologs
0.05 introns per gene
Facts about model organism Schizosaccharomyces pombe?
14.1 mil bp
3 chromosomes
69% protein coding genes - human orthologs
0.9 introns per gene
Facts about model organism C. elegans?
103 mil bp
12 chromosomes
26% of genome is introns
Facts about model organism D. melanogaster?
143 mil bp
60% of protein coding genes have human orthologs
2.5 introns per gene
Facts about model organism Zebrafish?
1.7 bil bp
25 chromosomes
69% of protein coding genes have human orthologs
Facts about model organism Mus musculus?
3.5 bil bp
40 chromosomes
Facts about Human?
3.6 bil bp
46 chromosomes
humans are 99.8% similar to each other
8 introns per gene
95% of genes are alternatively spliced
Facts about plant Arabidopsis thaliana?
136 mil bp
28,000 protein coding genes
5 chromosomes
4.8 introns per gene
Facts about plant Zea mays?
2.1 bil bp
39,000 protein coding genes
20 chromosomes
85% genome = transposons
As complexity increases in eukaryotes…
coding % decreases
non-coding % increases
repeat region % increases
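The trend above can be illustrated with a rough calculation. This is a minimal sketch using only figures quoted on these cards; genes per megabase is used here as a crude stand-in for coding %, and the helper name genes_per_mb is an assumption made for illustration.

```python
# Genome sizes (bp) and protein coding gene counts as quoted on these cards.
GENOMES = {
    "Arabidopsis thaliana": (136_000_000, 28_000),
    "Human": (3_600_000_000, 20_000),
}


def genes_per_mb(genome_bp, gene_count):
    """Protein coding genes per megabase of genome (a rough proxy for coding density)."""
    return gene_count / (genome_bp / 1_000_000)


density = {name: genes_per_mb(bp, n) for name, (bp, n) in GENOMES.items()}
for name, value in density.items():
    print(f"{name}: {value:.1f} genes per Mb")
```

The larger genome packs far fewer genes into each megabase, which fits a falling coding % as genome size grows.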
What is the NGS workflow?
Purified RNA/DNA
Library prep
Lanes on flowcell
Clusters of each DNA molecule
Sequencing by synthesis - every time nucleotide added, different colour emitted
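The last step can be pictured as a lookup from colour to base, one cycle at a time. This is a toy sketch only: the colour assignment in COLOUR_TO_BASE is hypothetical and does not describe any particular instrument.

```python
# Hypothetical colour-to-base assignment, for illustration only.
COLOUR_TO_BASE = {"green": "A", "red": "C", "blue": "G", "yellow": "T"}


def call_bases(colour_cycles):
    """Turn the colour seen for one cluster at each cycle into a read.

    Colours that are not in the lookup are reported as N (no call).
    """
    return "".join(COLOUR_TO_BASE.get(colour, "N") for colour in colour_cycles)


# One cluster observed over five sequencing cycles.
read = call_bases(["green", "red", "red", "yellow", "blue"])
print(read)
```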
What is the RNA-seq workflow?
Sequencing
Mapping/aligning reads to genome
calculate mRNA transcription levels
assemble transcripts de novo
build new mRNA transcript models
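The "calculate mRNA transcription levels" step is often expressed as transcripts per million (TPM). The sketch below assumes reads have already been mapped and counted per transcript; the card does not name a normalisation method, and the transcript names and numbers are made up for illustration.

```python
def tpm(read_counts, lengths_bp):
    """Transcripts per million from mapped read counts and transcript lengths.

    Assumes at least one transcript has a non-zero read count.
    """
    # Reads per kilobase: corrects for longer transcripts collecting more reads.
    rates = {t: read_counts[t] / (lengths_bp[t] / 1000) for t in read_counts}
    total = sum(rates.values())
    # Scale so that all values in the sample sum to one million.
    return {t: rate / total * 1_000_000 for t, rate in rates.items()}


# Made-up counts and lengths for two hypothetical transcripts.
counts = {"txA": 100, "txB": 300}
lengths = {"txA": 1000, "txB": 3000}
levels = tpm(counts, lengths)
print(levels)
```

txB has three times the reads but is three times as long, so both transcripts end up with the same level.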
What is exome sequencing?
sequence 1% of genome - corresponds to protein coding content
85% of disease causing mutations identified are found in the exome
What is ribosome profiling?
NextGenSeq of ribosome footprints
Isolation of cycloheximide "frozen" mRNA-ribosome complexes
Ribosome footprinting - RNase I treatment
Footprint purification
Preparation for NGS: linker ligation, RT, circularisation, PCR
28-34nt RNA fragment protected by ribosome
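After sequencing, reads are commonly filtered to the protected fragment size. This is a minimal sketch assuming reads are plain strings with adapters already removed; the function name select_footprints is hypothetical, and the 28-34 nt window is taken from the card.

```python
def select_footprints(reads, min_len=28, max_len=34):
    """Keep reads whose length matches a ribosome-protected fragment."""
    return [read for read in reads if min_len <= len(read) <= max_len]


# Dummy reads of 20, 28, 30, 34 and 40 nt.
reads = ["A" * n for n in (20, 28, 30, 34, 40)]
footprints = select_footprints(reads)
print([len(read) for read in footprints])
```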
What is single cell genomics?
sequencing individual cells separately rather than population in bulk
What is nanopore sequencing?
direct sequencing - RNA or DNA of interest is being fed through the pore
change current as result of different sequences going through - can identify which nucleotide is going through based on changes in current
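The idea of reading bases from current changes can be sketched as nearest-level matching. This is a deliberately simplified toy: the current values in LEVELS_PA are invented, and real basecallers model several nucleotides in the pore at once rather than one level per base.

```python
# Invented mean current (pA) per base, for illustration only.
LEVELS_PA = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}


def call_base(current_pa):
    """Return the base whose reference level is closest to the measured current."""
    return min(LEVELS_PA, key=lambda base: abs(LEVELS_PA[base] - current_pa))


def basecall(trace):
    """Call one base per measured current value in the trace."""
    return "".join(call_base(current) for current in trace)


sequence = basecall([81.2, 109.0, 124.1, 96.3])
print(sequence)
```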
Long read vs short read sequencing?
short read - get a snapshot of what is connected with what