Week 3 Flashcards
Sequencing genomes
resulted in a shift from studying single or a few genes to studying all genes simultaneously
proteome and transcriptome
Genome
all DNA and identification of all DNA elements (transcription units)
Transcriptome
all transcripts expressed (list plus analysis of expression)
Proteome
all proteins expressed (list plus analysis of expression and modification)
Large scale ORF finder
Looking for open reading frames in the bacteria
Simple for bacteria because the coding region in the DNA is not interrupted.
So you can go from DNA to the protein coding capacity of that DNA very simply.
We can’t do the same for the eukaryotic DNA.
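A minimal sketch of the idea, assuming a toy sequence and scanning only the forward strand (the function name and the example DNA are made up for illustration; real ORF finders also scan the reverse strand and apply length cutoffs):

```python
# Minimal ORF scan on one strand of an uninterrupted (bacterial-style) sequence.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each ATG...stop ORF on the forward strand."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop codon
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return sorted(orfs)

example = "CCATGAAATTTTAGGG"  # made-up sequence with one short ORF
print(find_orfs(example))
```

Because the coding region is uninterrupted, the ORF can be read straight off the DNA. The same scan on a eukaryotic gene would run into introns.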
Eukaryotes and ORF finders
ORF finders do not work for most eukaryotes; we can’t go from the eukaryotic genome to the eukaryotic proteome that simply
Splicing
We need the transcriptome to get the proteome of the genome.
transcriptome is
all expressed RNA: mRNA, rRNA, tRNA, siRNA, miRNA, non-coding RNA, snRNA, crRNA, snoRNA
eukaryotic mRNA
extensively processed
5’ cap
AUG first codon of ORF
Messenger RNAs are processed with the addition of a poly-A tail, which helps us annotate the proteome
reverse transcriptase
The DNA copy is made with reverse transcriptase, which requires a DNA primer. A common approach is to use an oligo dT primer that hybridizes with the poly-A tail; therefore only polyadenylated RNAs are copied and the total transcriptome is not represented.
Before nanopore only DNA could be sequenced so RNA always had to be turned into a complementary DNA copy.
post transcriptional processing
a barrier to annotating the genome
a primary transcript is processed: splicing, poly-A tail and 5’ cap
Therefore anytime we make a complementary DNA copy we’re making a complementary copy of the mature mRNA after the intronic sequences are removed.
A large amount of the genome is not expressed: intergenic regions, which are not transcribed, and intronic regions, which are transcribed but spliced out.
Alternative Splicing
Genes undergo alternative splicing: when you align different cDNA sequences to the genome, you find that for some genes the alignments differ considerably from one cDNA to another,
indicating that they came from transcripts that have undergone alternative splicing
This gene produces six distinct messenger rna transcripts.
That encode three distinct polypeptides.
When you align this sequence to Drosophila DNA you find six different patterns of alignment due to six different splicing patterns of the mRNA transcripts.
Alternative splicing increases the number of proteins that can be encoded by a single gene.
Types of splicing
alternative poly-A sites
alternative promoters
exon included or excluded
mutually exclusive exon inclusion
alternative 5’ splice sites
alternative 3’ splice sites
retained intron: in some messages the intron remains in the mature mRNA; in others it is removed.
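A small sketch of how one gene can give several mature mRNAs. The segment names and sequences below are invented for illustration, and only three of the splicing types are shown (all exons kept, exon skipped, intron retained):

```python
# Toy pre-mRNA: exons in upper case, introns in lower case (made-up sequences).
segments = [
    ("exon1", "AUGGCU"),
    ("intron1", "guaagu"),
    ("exon2", "GAACCU"),
    ("intron2", "guccag"),
    ("exon3", "UUGUAA"),
]

def splice(segments, keep):
    """Join only the named segments, in genomic order, to give a mature mRNA."""
    return "".join(seq for name, seq in segments if name in keep)

isoforms = {
    "all exons": splice(segments, {"exon1", "exon2", "exon3"}),
    "exon2 skipped": splice(segments, {"exon1", "exon3"}),
    "intron1 retained": splice(segments, {"exon1", "intron1", "exon2", "exon3"}),
}
for label, mrna in isoforms.items():
    print(label, mrna)
```

Aligning each of these mature sequences back to the gene would give a different alignment pattern, which is how alternative splicing shows up in cDNA data.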
RNA seq Two major goals
Count the relative number of transcripts in the sample.
Determine the structure of the transcripts in the sample.
Often done after they’ve converted the RNA to complementary DNA and sequenced the complementary DNA.
How do we get distinct cell types
differential gene expression
sc RNA seq goals
To determine the poly A+ transcriptome of individual cells
Useful in the study of development and human disease
sc RNA seq function
1-In drop single cell seq, start with a suspension of cells, microparticles and lysis buffer,
2-these are mixed in a microfluidics apparatus and encased in droplets using oil; each oil droplet contains a cell and a microparticle
3-lysis buffer in the droplet lyses the cell, releasing RNA/DNA
4-the polyadenylated RNA hybridizes to a primer on the microparticle that contains oligo dT.
5-Barcoded primer beads contain a unique cell barcode sequence between the PCR handle and an oligo dT tail
6-break droplets; reverse transcription with template switching forms STAMPs
7-STAMPs are amplified by PCR, so the microparticles carrying the individual cell barcodes and attached transcriptomes are amplified. The amplified fragments are then sequenced.
8-Generation of paired end reads.
One read goes through the cell barcode the other read goes through the cDNA
Illumina
Because they’re paired end reads we can tell which cell this cDNA comes from by read one having the cell barcode
9-Even though we are sequencing a complex PCR product from a multitude of different STAMPs, each microparticle had a distinct cell barcode that we can use to identify which cell the paired-end reads came from.
Organize the data and ask which transcripts are expressed in cell one.
Determine which genes are expressed in the cell and at what level by counting the number of observed transcripts.
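The last step can be sketched as grouping read pairs by cell barcode and counting genes. The barcodes and gene names are hypothetical, read 2 is assumed to have already been aligned to a gene, and PCR duplicates are ignored:

```python
from collections import Counter, defaultdict

# Hypothetical paired-end reads: (cell barcode from read 1, gene that read 2 aligned to).
read_pairs = [
    ("AACG", "geneA"),
    ("AACG", "geneA"),
    ("AACG", "geneB"),
    ("TTGC", "geneB"),
    ("TTGC", "geneC"),
]

def count_by_cell(read_pairs):
    """Group reads by cell barcode and count observed transcripts per gene."""
    counts = defaultdict(Counter)
    for barcode, gene in read_pairs:
        counts[barcode][gene] += 1
    return counts

counts = count_by_cell(read_pairs)
for barcode, gene_counts in counts.items():
    print(barcode, dict(gene_counts))
```

Each barcode stands for one cell, so the inner counts are that cell's expression profile.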
Changing pattern of gene expression through development
When you start off as a single cell you have one transcriptome, but as the cells specialize during development, each cell starts to express a different pattern of genes.
Zebrafish development: single cell RNA seq on each cell during development, looking for changes in gene expression (sequence the transcriptome)
Each point is a cell, change in gene expression of the cells as they differentiate.
Proteome
Catalogue of the proteins expressed by an organism?
What proteins are unique to an organism or shared?
what is the function of the protein?
Information encoded in the genome.
All of the proteins encoded within a genome.
Function can be determined by taking advantage of the relatedness of all organisms.
What proteins are unique to an organism or shared?
all life is related; genes are shared
homologous genes can fall into two categories:
- orthologs
- paralogs
orthologs/paralogs
orthologs: homologous genes in different species that descend from the same common ancestor
paralogs: homologous genes in the same species that result from duplication of a gene
What was the function of the protein?
if an orthologous protein is well characterized in one organism, then it may be reasonable to propose that all the orthologous proteins share its function.
Complex proteins often contain conserved protein domains of known function like DNA binding for example. Therefore, conserved domains can suggest the biochemical function of the protein in the proteome.
Interactome
proteins interact with one another either in stable complexes like RNA polymerase, or via transient interactions like initiation factors for translation.
An interactome is the result of a systematic analysis of the proteome.
Systematic analysis of interactomes
1-Yeast two hybrid screen
2-affinity purification and mass spectrometry
Yeast two hybrid screen
Gal4 binds to the Gal4 upstream activating sequence (UAS) and can drive gene expression; here it drives expression of the reporter gene, beta-galactosidase.
when yeast expresses beta-galactosidase in the presence of a chromogenic reagent, the yeast cells turn blue.
yeast Gal4 transcription factor is made up of two separable domains: one is the DNA binding domain (DBD), the other is the activation domain (AD)
for transcription, the AD must be brought together with the DBD
Yeast two hybrid screen steps
- separate the DNA binding domain from the activation domain; if these domains are expressed independently of one another (not fused to one another) there will be no expression of beta-galactosidase
- fuse the DBD to the bait protein and fuse the prey protein to the AD
- if these proteins interact you will get blue colonies
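The logic of the screen can be sketched as a lookup. The protein names and the interaction set are placeholders, not real data:

```python
# Hypothetical set of interacting protein pairs; names are placeholders.
KNOWN_INTERACTIONS = {frozenset(("baitP", "preyA"))}

def reporter_expressed(bait, prey, interactions=KNOWN_INTERACTIONS):
    """True when bait (fused to DBD) and prey (fused to AD) interact,
    which brings the AD to the DBD at the UAS and switches on the reporter."""
    return frozenset((bait, prey)) in interactions

def colony_color(bait, prey):
    """Blue colonies express beta-galactosidase; white colonies do not."""
    return "blue" if reporter_expressed(bait, prey) else "white"

screen = {prey: colony_color("baitP", prey) for prey in ["preyA", "preyB"]}
print(screen)
```

Only the prey that binds the bait reconstitutes Gal4 activity, so only that colony turns blue.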
Affinity purification
- add an AP tag to a protein and pass the protein mixture through a column containing a ligand that binds the AP tag; the tagged protein binds to the column, and any proteins attached to it are retained along with it
How can genomes vary?
Genome size
Genome content/number
Genome structure/shape
Genome type
Genome type/shape/pieces
RNA or DNA
Circular or linear
Number of pieces
double stranded or single stranded
genomes can be complex structures
a mixture of linear and circular or linked circles
Advantages of genome structure/shapes
circular DNA is easy to replicate; go around the circle
linear DNA is hard to replicate at the ends (telomeres); the 3’ ends are difficult to replicate, so they get shorter
Telomerase
reverse transcriptase
telomerase has an RNA template embedded in it that it can use to elongate the 3’ end creating a set of repeats
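A simplified sketch of repeat addition, assuming a six-base RNA template written 5' to 3'. The real enzyme's template region is longer and must base-pair with the chromosome end first, and the starting DNA here is made up:

```python
# Base-pairing rules for copying RNA into DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(rna_template):
    """DNA made by copying an RNA template (both written 5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_template))

def extend_3prime_end(dna, rna_template, rounds):
    """Add one DNA repeat to the 3' end per round of template copying."""
    return dna + reverse_transcribe(rna_template) * rounds

repeat = reverse_transcribe("CCCUAA")          # simplified template region
extended = extend_3prime_end("ACGTTAGGG", "CCCUAA", 3)
print(repeat, extended)
```

Copying the same internal template over and over is what produces a run of identical repeats at the 3’ end.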
some chromosomes have circular telomeres
What is a genome? What is genome size?
a set of genetic instructions within a biological compartment
the length of those instructions
Units of size of the genome
nucleotide (single stranded)
base (both single and double stranded)
base pair (double stranded)
bp Units
1 bp
1000 bp = kilobase (kb)
1,000,000 bp = megabase (Mb)
1,000,000,000 bp = gigabase (Gb)
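A small helper for converting a length in bp to the units above (the function name is made up for illustration):

```python
def format_genome_size(bp):
    """Express a length in base pairs using the largest fitting unit."""
    units = [(1_000_000_000, "Gb"), (1_000_000, "Mb"), (1_000, "kb")]
    for factor, label in units:
        if bp >= factor:
            return f"{bp / factor:g} {label}"
    return f"{bp} bp"

for size in (500, 1_500, 4_600_000, 3_000_000_000):
    print(size, "->", format_genome_size(size))
```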
What is genome size
One full haploid set, don’t measure duplicated information
Count the length of one chromosome for identical chromosomes, add up the lengths of the different chromosomes.
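As a worked sketch with a hypothetical diploid karyotype (chromosome names and lengths are invented), genome size counts each distinct chromosome once, however many copies the cell carries:

```python
# Hypothetical karyotype: chromosome name -> (length in bp, copies per cell).
karyotype = {
    "chr1": (5_000_000, 2),
    "chr2": (3_000_000, 2),
    "chr3": (1_500_000, 2),
}

def genome_size(karyotype):
    """Haploid genome size: each distinct chromosome is counted once."""
    return sum(length for length, _copies in karyotype.values())

def total_dna_per_cell(karyotype):
    """All DNA in the cell, duplicated information included (not the genome size)."""
    return sum(length * copies for length, copies in karyotype.values())

print(genome_size(karyotype), total_dna_per_cell(karyotype))
```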
Complexity
does complexity increase with genome size or gene number?
Bacteria and archaea
genome size is linearly related to gene number.
the larger size of the genome the more genes
Who has the biggest genome?
plants have an immense variation in genome size
single cell amoeba have a very large genome
genome size is not associated with complexity
What do big genomes have that little genomes don’t have?
non-coding DNA
Differences in gene to base ratio
As genome size increases, the percent of the genome that is non-coding increases
The percent non coding varies within the genome.
Gene deserts.
Variation between organisms in the amount of noncoding DNA they have is attributed to the race to replicate: organisms that need to replicate more quickly have lost noncoding regions of their DNA
Is gene number related to complexity?
nope
we have a notion of implicit phylogeny: because we possess specific qualities, we assume that other species are less evolved than we are.