Molecular genetics 13-18 Flashcards
What is a gene?
A DNA sequence (or RNA in some viruses) that is transcribed into RNA along with all the sequences to control its expression
Features of prokaryotic genes
- No nucleus
- Usually circular dsDNA
- Genes in operons (several open reading frames encoded on one mRNA)
- Simple regulation
What is an example of regulation by inhibition in prokaryotes? How many proteins are involved?
Trp operon for tryptophan biosynthesis - linear pathway with 5 different proteins carrying out three different enzymatic reactions
What happens in the linear pathway producing tryptophan when there is a lot of tryptophan present?
The tryptophan inhibits the production of the second enzyme
What is an example of feedback inhibition in the tryptophan biosynthesis pathway?
Accumulation of tryptophan slows down the rate of catalysis of the first enzyme complex (trp D/E), so reduces the overall rate of production
What is transcriptional regulation?
The presence of high tryptophan concentration reduces transcription of the operon
Why is it important that tryptophan concentrations are not allowed to get too high?
Tryptophan is a toxin in high concentrations
When tryptophan is low/absent:
- trpR (trp repressor protein) inactive as no tryptophan
- trpR can’t bind to the operon promoter
- transcription not blocked
- trp operon is expressed
When tryptophan is present:
- trpR is constitutively expressed
- trpR protein binds to tryptophan (co-repressor) and forms an active repressor
- Active repressor blocks transcription of trp operon
- No pathway expression
What is an example of an inducible operon in prokaryotes?
The lac operon - contains genes that code for enzymes used in the hydrolysis and metabolism of lactose
Is the lac repressor active or inactive by itself?
Active
What inactivates the lac repressor?
A molecule called an inducer (allolactose)
What is lacY responsible for producing?
Lactose permease
What is lacZ responsible for producing?
β-galactosidase
What is lacA responsible for producing?
Acetyl transferase
When lactose is absent:
The lac repressor is active and switches the lac operon off
When lactose is present:
The repressor is inactive as it forms a complex with allolactose (inducer), preventing repression and allowing expression of the genes
Is repression by the lac repressor usually complete?
No, often there is not enough lacI protein for complete repression, so there is leaky expression
What is lacIq?
A mutation in the lacI promoter region causing increased transcription and so higher levels of lacI protein, so the lacZ/Y/A promoter is more strongly repressed
What is lacI?
The regulatory gene that produces the repressor protein, which prevents the lac operon from being transcribed
What is the other condition needed for the breakdown of lactose, other than lactose being present?
Only occurs if glucose is absent
What further regulation is needed so the lac operon is only transcribed if glucose is absent?
Carbon catabolite repression
What is the link between cAMP levels and glucose levels?
Cyclic AMP is present in low levels if glucose concentrations are high.
Cyclic AMP is present in high levels if glucose concentrations are low
What does CRP stand for?
Cyclic AMP Receptor Protein
How does CRP affect the transcription of the lac operon?
When cAMP accumulates at low glucose levels, it binds to and activates the CRP protein. Active CRP helps RNA polymerase bind to the promoter, so the operon is transcribed.
What are the ideal conditions for the lac operon to be transcribed?
Lactose present
Glucose absent
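The lac operon cards above can be summarised as a small truth table. This is a simplified boolean sketch, not a quantitative model. The function name `lac_operon_state` and the three output labels are assumptions made for illustration.

```python
def lac_operon_state(lactose_present: bool, glucose_present: bool) -> str:
    """Return a qualitative expression level for the lac operon.

    Simplified model: the repressor is inactivated by allolactose (formed when
    lactose is present), and CRP is activated by cAMP (high when glucose is low).
    """
    repressor_active = not lactose_present  # no inducer, so the repressor binds
    crp_active = not glucose_present        # low glucose -> high cAMP -> active CRP

    if repressor_active:
        return "off (leaky)"  # repressed, apart from leaky expression
    if crp_active:
        return "high"         # induced, and CRP helps RNA polymerase bind
    return "low"              # induced, but without help from CRP


# Print the full truth table
for lactose in (False, True):
    for glucose in (False, True):
        print(f"lactose={lactose} glucose={glucose} -> {lac_operon_state(lactose, glucose)}")
```

Only the combination lactose present and glucose absent gives "high", matching the ideal conditions on the card.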
What are sigma factors?
Transcription activators that enable specific binding of RNA polymerase to gene promoters
Why is the lac operon repressed by default?
Lactose may only rarely be present, and is a second-choice carbon source
What do sigma factors do?
Help RNA polymerase to bind to promoters
Dictate the transcription start?
Activate/amplify transcription
Housekeeping sigma factor rpoD (sigma-70) for constitutive genes
What is a housekeeping gene?
A constitutive gene that is transcribed at a relatively constant level, required for the maintenance of basic cellular function and expressed in all cells of an organism under normal conditions
What is a constitutive gene?
A gene that is transcribed continually as opposed to a facultative gene, which is only transcribed when needed
What are pathway-specific sigma factors?
They activate gene families for effective expression
What genes are activated by specific sigma factors, e.g. when a bacterium is given a heat shock?
- rpoH: heat-shock genes
- fecI: iron uptake
- rpoS: starvation/stationary phase
- rpoN: nitrogen starvation
- rpoF: flagellar genes for motility
What is Quorum sensing?
For coordinating gene expression between individuals - bacteria communicate using chemicals
What is AHL?
Acyl homoserine lactone signal
How does Quorum sensing work?
- Constitutively produce AHL
- When concentration is high, receptor protein activated, switches on transcription of all virulence genes
- This plays a major role in disease
What are LuxR-type R proteins?
Receptor proteins that bind AHL and activate transcription. LuxR itself switches on the lux (bioluminescence) genes - easy to measure as the light is visible to scientists
When does translation start in bacteria?
Before transcription has finished
What is polarity in bacterial expression?
Usually more protein of the first ORF made than the later ORFs for operons, due to translation starting before transcription has finished
What eukaryotic gene properties are not shared with bacteria?
- Chromatin
- mRNA processing: introns, 5’ cap, 3’ poly A tail
- Transport of mRNA out of nucleus
- Uncoupled transcription and translation
- miRNA/silencing
What factors change the rate of overall expression?
1) Rate of transcription
2) Rate of mRNA degradation
3) Rate of translation
4) Rate of protein degradation
5) Chromatin accessible?
What features affect the rate of transcription?
- Each ORF has its own promoter
- Genes usually not clustered by function/pathway
- Often in different chromosomes
- Eukaryotic genomes have an enhancer (upstream, downstream or within the coding region)
- PolyA tail dictates how far back mRNA gets processed
- Promoter elements can be immediately adjacent to the gene or several thousand bases away from it
- Repressors can block in various places along the DNA strand, blocking transcription
- Activators are expressed when repressors are not present. The activators bind to the enhancer region and recruit DNA-bending proteins, general transcription factors and mediators
What is the polyA tail also known as?
The ‘terminator region’
Is the polyA tail encoded in the genome?
No, it is added enzymatically
What are general transcription factors?
Essential for the transcription of all protein-coding genes
What are specific transcription factors?
High levels of transcription of particular genes depend on control elements interacting with specific transcription factors
What are wide domain areas that specific transcription factors can affect?
Carbon, nitrogen, pH
What are narrow-domain areas that specific transcription factors can affect?
Specific metabolic pathways
What prevents mRNA from degradation?
5’ cap and 3’ polyA tail stabilise the mRNA, reducing degradation
3’ polyA tail helps transport the mRNA out of the nucleus
5’ UnTranslated Region (UTR) and 3’ UTR help define stability
Why is it important to have stable RNA?
More stable RNA gets translated more
What is miRNA?
Micro RNA
Short, non-coding RNA molecules
Bind to specific mRNA (complementary)
Recruit RNA endonuclease enzymes
Digest specific mRNA (or several related mRNA)
Part of the gene silencing pathway using DICER and RISC
How is the rate of translation altered through initiation of translation?
Different mRNAs have different 5’ UnTranslated Regions (UTRs) before the coding sequence. These different UTRs have different affinities for ribosome binding and cause either high or low levels of translation to occur
What is a Kozak sequence?
A varying sequence around the start codon which plays a major role in the initiation of translation. Certain bases are more likely to appear in the sequence and lead to a higher rate of translation, such as A or G at the -3 position and C at the -1 position
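A minimal sketch of how the two positions named on this card could be checked in a sequence. The function name `kozak_features` and its return format are assumptions for illustration. Positions are counted with the A of AUG as +1 and no position 0.

```python
def kozak_features(mrna: str, start: int) -> dict:
    """Check the -3 and -1 positions relative to a start codon.

    `start` is the 0-based index of the A in AUG (position +1).
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA or RNA letters
    if mrna[start:start + 3] != "AUG":
        raise ValueError("no AUG at the given start index")
    if start < 3:
        raise ValueError("need at least 3 bases upstream of the AUG")
    return {
        "purine_at_-3": mrna[start - 3] in "AG",  # A or G at -3
        "C_at_-1": mrna[start - 1] == "C",        # C at -1
    }


print(kozak_features("GCCACCAUGG", 6))
```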
What does translation rate depend on?
Ribosome binding
Translation enhancers
Codon usage
What is codon usage?
The use of genetic redundancy to allow the control of translation.
Different codons for the same amino acid are used in different frequencies. For optimal expression use the common codons and avoid the rare ones
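Codon usage can be inspected by counting codons in a coding sequence. The sketch below is illustrative only: the function names are made up, and the set of "preferred" codons passed in is a hypothetical input, since real preferences differ between organisms.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count each codon in a coding sequence, read in frame from the first base."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def fraction_preferred(cds: str, preferred: set) -> float:
    """Fraction of codons in `cds` that belong to a user-supplied preferred set."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for codon, n in counts.items() if codon in preferred) / total


# Hypothetical example: treat CTG as the common leucine codon and CTA as rare
example = "ATGCTGCTGCTATAA"
print(codon_counts(example))
print(fraction_preferred(example, {"CTG"}))
```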
How are proteins targeted for destruction?
They are linked with Ubiquitin (Ubiquitous protein)
Cross-link to targeted protein - many ubiquitins needed
Directs its movement to proteasome (protein complex involved in hydrolysis of a protein)
Triggers its digestion
Ubiquitin gets recycled
What can make DNA inaccessible for transcription?
Condensed chromatin
What is histone acetylation?
Acetyl groups are attached to an amino acid in a histone tail. This appears to open up the chromatin structure, thereby promoting the initiation of transcription
What is histone methylation?
Adding methyl groups to amino acids. This can condense chromatin and reduce transcription
How does acetylation of lysine residues affect transcription?
Causes chromatin to be looser, better transcription
How does methylation of histones affect transcription?
DNA more condensed, less transcription
What are the group of enzymes that acetylate lysine amino acids?
Histone acetyltransferases (HATs)
What are the class of enzymes which remove acetyl groups from lysine?
Histone deacetylases (HDACs)
What are the group of enzymes which methylate histones?
Histone methyl transferases
What are the group of enzymes which remove methyl groups from histones?
Demethylases
What can happen when DNA methylation goes wrong?
Cancer
What is epigenetics?
Heritable inactivation of genes
What does SAHA stand for and what does it do?
suberoylanilide hydroxamic acid (SAHA)
Inhibits HDAC - chromatin stays acetylated longer so maintains expression
What does 5AC stand for and what does it do?
5 azacytidine
Inhibits DNA methyl transferase, leaving the DNA less methylated and less condensed so maintaining expression
How to prevent RNA from being degraded by RNAses as you work
- RNAse-free solutions and disposables
- Work clean, fast (if exposed to enzymes, no time for degradation) and cold (enzymes not at optimum temperature)
- RNAse-free DNAse to remove any remaining DNA
- Chaotropic salts - disrupt protein structure so RNAse enzymes not active
- Wear gloves
What does TOTAL RNA include?
mRNA, rRNA, tRNA, snRNA, miRNA etc.
What is a Southern Blot?
Run DNA on gel, shows size of fragment
What is a Northern Blot?
Run RNA on gel, blot and probe for gene of interest, because no discrete lines produced on gel, just a smear. Shows size and abundance of fragment (although can only run one known gene at a time)
How to separate the mRNA from all the other RNAs?
It has a polyA tail (AAAA)
Add beads of oligo(dT) (TTTTT sequence) to the RNA mixture: mRNA will stick to the beads whereas other RNAs will not. The mRNA can then be eluted by changing the salt concentration, which changes its affinity for the beads
What is the name of the process which converts mRNA to cDNA?
Reverse transcription
Where do you find reverse transcriptase activity?
In retroviruses
How to convert mRNA to cDNA (copyDNA)
Prime the mRNA with oligo(dT) which will bind with polyA tail and the entire population of mRNA will undergo reverse transcription
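In sequence terms, the first cDNA strand is the reverse complement of the mRNA written in DNA letters. A minimal sketch, with `first_strand_cdna` as an illustrative name:

```python
def first_strand_cdna(mrna: str) -> str:
    """Return the first-strand cDNA (written 5'->3') complementary to an mRNA."""
    pairs = {"A": "T", "U": "A", "G": "C", "C": "G"}
    # Reverse the mRNA, then pair each base with its DNA complement
    return "".join(pairs[base] for base in reversed(mrna.upper()))


# The polyA tail at the 3' end of the mRNA becomes a run of T at the 5' end
# of the cDNA, which is where the oligo(dT) primer sits.
print(first_strand_cdna("AUGGCCAAAA"))  # TTTTGGCCAT
```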
How is the cDNA cloned after it is produced?
Using adapters
What is an example of how the cDNA is modified after it has been cloned?
Using site-directed mutagenesis
How can you amplify RNA?
You can’t use PCR to directly amplify RNA
First the mRNA population must undergo reverse transcription into cDNA, then PCR can amplify the DNA of interest using gene-specific primers
What is qPCR?
Quantitative PCR
Measures the amount of product per cycle
What dye is usually used for qPCR?
SYBR Green - fluoresces when bound to dsDNA
Fluorescence measured after each cycle of amplification
What is the qPCR expressed as?
2^(DeltaCT)
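A worked sketch of the 2^(DeltaCT) expression. Two assumptions are made here that the card does not state: the product doubles every cycle, and DeltaCT is taken as the reference Ct minus the sample Ct, so a sample that crosses the threshold earlier gives a value above 1. The function name is illustrative.

```python
def relative_abundance(ct_reference: float, ct_sample: float) -> float:
    """Fold difference of sample vs reference, assuming doubling each cycle."""
    delta_ct = ct_reference - ct_sample  # lower Ct means more starting template
    return 2 ** delta_ct


# A sample reaching the threshold 3 cycles earlier had about 8x more template
print(relative_abundance(ct_reference=25, ct_sample=22))  # 8
```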
More effective version of Northern Blot - testing all genes at once - RNA quantification by hybridisation of an array
Put bits of every gene on a support and probe with labelled RNA
Chips are hybridised to the labelled transcripts
Signal from each spot is measured
Shows transcript abundance for every gene on the array
What is the sequencing-based method that has superseded RNA quantification by hybridisation?
Prepare cDNA from chosen condition
Sequence lots of individual molecules
Assess what is expressed and in what abundance
cDNA cloning - the old fashioned method
Clone cDNA into plasmids and sequence it clone by clone. The sequences are called ESTs (Expressed Sequence Tags), which represent portions of expressed genes
What type of sequencing is used to sequence RNA directly?
Next-generation sequencing
Features of GFP
- From jellyfish Aequorea victoria
- Simple barrel shaped protein
- Excited by 385 or 480nm
- Emits at 509nm
- Needs no other substrate except oxygen
What is a reporter gene?
A gene that researchers attach to a regulatory sequence of another gene, to determine its rate of expression
Examples of commonly used reporter genes
β-galactosidase (lacZ)
Glucuronidase (gusA)
Luciferase (luc)
Green fluorescent protein (GFP)
What is promoter bashing?
Analysis of possible control elements by deletion
How to determine where a protein will go?
Sequence tags within peptides target proteins to specific organelles
“Leader sequence” directs protein - N terminus first 15-30 amino acids direct protein to secretion machinery
To get protein to the ER there is a KDEL/HDEL sequence at the C terminus
To get protein to the nucleus there are 5 positive basic residues
What is a Western Blot?
Using a specific antibody to quantify a specific PROTEIN
What is SDS?
Sodium dodecyl sulphate
A strong detergent that denatures proteins so they are linear
They can then undergo electrophoresis
What is Poly Acrylamide Gel Electrophoresis?
Separates proteins by molecular weight
Different mobility if modified by glycosylation, phosphorylation and acetylation
What are the limitations of Poly Acrylamide Gel Electrophoresis?
Only suitable for soluble proteins
Cysteine-cysteine bonds may require reducing
Limited by antibody availability
What is another way of separating proteins by size that is not Western Blotting? (1)
SDS-PAGE gel
Separate proteins by pH gradient with electric charge across it. Protein will move until it is at a pH where it has no charge - depends on size
Uses isoelectric focussing
What is another way of separating proteins by size that is not Western Blotting? (2)
Protein Mass Spectrometry
Separate proteins by size or hydrophobicity through gel electrophoresis
Feed into mass spectrometer
Find accurate mass of each protein (to 5 dp)
Determine its sequence identity from its mass
Why would you add 6 histidine to the start or end of a protein?
To use them as a hook to purify specifically this protein
The 6 histidine tag binds to metal ions such as Ni2+ (or Zn2+)
This acts as an EPITOPE
What is an epitope?
The part of an antigen molecule to which an antibody attaches
What are the benefits of sequencing a genome?
- Co-located genes may form pathways
- Compare genomes and see different mutations
- Identify candidate genes close to a genetic marker associated with a trait
When was the first genome sequenced and by who? How many bases did it have?
In 1977 by Sanger and his colleagues.
It consisted of 5375 nucleotides
What organism was the first to have its genome sequenced?
Phage phi X 174 (bacteriophage)
How many genes could be sequenced per year by one person in the early 1990s?
10 genes
One person could only sequence 1500 bases per day
When was automated Sanger sequencing developed and how many bases could then be sequenced per day?
Late 1990s
240,000 bases per day
Features of the Human Genome Sequencing Project
1990-2003
3.5 billion bases
Cost more than $3 billion
Factory-scale sequencing
What type of sequencing was developed in 2006 and how many bases can it sequence per run?
Next generation sequencing
Can sequence 1000 billion bases per run
What is the other name for next generation sequencing?
Illumina sequencing
Outline the traditional genome sequencing approach (What is its other name?)
c. 2001
Called hierarchical shotgun sequencing
1. Genome DNA cut into large fragments, producing a BAC library (each 300 kb fragment like small extra genome)
2. Using radioactive hybridisation of these clones they are organised into large clone contigs. You can work out which fragments overlap with each other
3. Select the BAC to be sequenced
4. Break up selected BAC into smaller pieces - a ‘shotgun clone’
5. Reassemble sub-fragments back into order by working out the sequence of each shotgun clone and which other sequences it overlaps with
BAC meaning?
Bacterial Artificial Chromosome
What is a contig?
A set of overlapping DNA segments that together represent a consensus region of DNA
What is the downside of hierarchical shotgun sequencing (traditional sequencing approach)
Highly labour intensive
Outline next generation genome sequencing
- Fragment genome - sonicate into random overlapping fragments
- Sequence fragments and assemble
- There may be gaps with low coverage, but 99.9% high coverage
What is K-mer based assembly?
The reads produced by next generation sequencing must be compared to see if there is any overlap between them. Illumina sequences the 100 bases on either end of each fragment to find overlaps. It would take too long to compare all 100 bases from the end of one sequence with the 100 bases from all the other sequences present, so the computer breaks these sequences up into shorter fragments of length k (k-mers). The computer puts each k-mer in a particular memory address and finds overlaps between k-mers going all the way along the 100 bases. E.g. k=25 looks for overlaps of k-1=24.
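The k-mer idea can be sketched with a dictionary, which plays the role of the memory addresses. Each k-mer links its first k-1 bases to its last k-1 bases, and a contig is read off by following those links. This is a toy illustration under simple assumptions (error-free reads, no repeats, one unbranched path). The function names and the three short reads are made up, and k=4 stands in for k=25.

```python
from collections import defaultdict


def kmers(read: str, k: int) -> list:
    """All substrings of length k, in order along the read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def overlap_graph(reads, k: int) -> dict:
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])  # overlap of k-1 bases
    return graph


def walk_unbranched(graph) -> str:
    """Follow the single path from the node that has no incoming link."""
    targets = {t for successors in graph.values() for t in successors}
    starts = [node for node in graph if node not in targets]
    if len(starts) != 1:
        raise ValueError("expected exactly one start node")
    node = starts[0]
    contig, seen = node, {node}
    while len(graph.get(node, ())) == 1:
        (node,) = graph[node]
        if node in seen:  # stop if the path loops back (a repeat)
            break
        seen.add(node)
        contig += node[-1]
    return contig


reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
print(walk_unbranched(overlap_graph(reads, k=4)))  # ATGGCGTGCAA
```

Real assemblers also have to handle sequencing errors, both strands and branching paths, which is where repeats cause trouble.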
The problem of repeats in shotgun sequencing in eukaryotes
Applied particularly to next generation sequencing as there are no large BAC clones to help order sequences. The repeats between genes can be so long that the computer does not know what gene comes first after the repeat
Solution to the problem of repeats in next generation sequencing (1)
Illumina mate-pair libraries
Several kilobases long, can be used to span repeats. Only the ends are sequenced.
Difficult to make mate-pairs
Solution to the problem of repeats in next generation sequencing (2)
Oxford Nanopore
Sequences are typically several kb in length, and the entire sequence can span a repeat
What size can eukaryotic genomes range to?
16 Gb
How do we find the open reading frame within a DNA sequence?
Feed the sequence into a computer, which will translate the sequence in the 3 possible forward and 3 possible reverse reading frames. The gene will be found between a Methionine start codon and a STOP codon.
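A minimal sketch of a six-frame ORF scan, assuming a plain DNA string with no introns. The function names, the `min_codons` parameter and the short test sequence are illustrative. Real ORF finders apply a much longer minimum length.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna: str, min_codons: int = 3) -> list:
    """Return (strand, frame, sequence) for each ATG...STOP stretch in six frames.

    `min_codons` counts the start and stop codons as well.
    """
    orfs = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first Methionine codon in this frame
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, seq[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATAGGG"))  # [('+', 2, 'ATGAAATAG')]
```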
What type of DNA does ORF finding work well for and why?
Prokaryotic DNA, as they have no introns or repeats
What type of DNA does ORF finding work less well for and why?
Eukaryotic DNA, because genes are interspersed by non-transcribed gaps, repeats and introns. Introns break up coding sequences
How can codon usage help identify the real open reading frames?
Some codons are more commonly used to encode a specific amino acid in a gene than others. Reading frames with the more commonly used codons for a particular amino acid are more likely to be found within a coding region, whereas non-coding DNA will use all codons equally
What other feature of eukaryotic DNA allows identification of genes?
Eukaryotic genes have conserved splice sites.
Eukaryotic introns tend to start with AGGTAAGT and end with YYYYYYNCAG (Y = pyrimidine C/T, N = any base)
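The two consensus patterns on this card can be written as regular expressions to flag candidate splice sites. This is only a sketch: real splice sites vary around the consensus, and the function name and test sequence are made up.

```python
import re

DONOR = re.compile(r"AGGTAAGT")               # pattern at the start of the intron
ACCEPTOR = re.compile(r"[CT]{6}[ACGT]CAG")    # YYYYYYNCAG at the end of the intron


def splice_site_candidates(dna: str):
    """Return (donor start positions, acceptor end positions), 0-based."""
    dna = dna.upper()
    donors = [m.start() for m in DONOR.finditer(dna)]
    acceptors = [m.end() for m in ACCEPTOR.finditer(dna)]
    return donors, acceptors


print(splice_site_candidates("ATGAGGTAAGTCCCCTTTTTTACAGGCT"))  # ([3], [25])
```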
How to confirm ORF expression after finding the ORFs
Use RNAseq
- Extract nucleic acids from sample
- Use oligo dT to extract mRNA
- Use Illumina to sequence mRNA
- Gene expression profiling: use computer to map RNA reads back onto the genome
- Align RNA to a reference and count expression levels
What does BLAST stand for?
Basic Local Alignment Search Tool
What is BLAST?
A search tool that compares a query sequence against databases of known sequences
How can you use BLAST to find out more about the gene you have identified?
The sequence you have found can be compared to the database and the gene name/function can be worked out through similarities to other known genes
What is BLASTN?
A specific type of BLAST tool for comparing DNA sequences with other DNA sequences
Query: nucleotide
Database: nucleotide
What is BLASTX?
A specific type of BLAST tool for comparing a translated nucleotide to known proteins
Query: translated nucleotide
Database: protein
How to build BLAST hits:
- Start with one word match (a word is 11 nucleotides by default or 3 amino acids) - a ‘seed’
- If possible, extend the alignment either side of the word match. If there are no matches, find another seed and start again
- If there are enough hits to pass the threshold value, return an alignment to the user
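The seed-and-extend steps above can be sketched as follows. This is a simplified illustration, not the real BLAST algorithm: it extends only while bases match exactly, whereas BLAST scores matches and mismatches and allows gaps. The function name, the `min_length` threshold and the example sequences are assumptions.

```python
def seed_and_extend(query: str, subject: str, word_size: int = 11, min_length: int = 11):
    """Return sorted (query_start, subject_start, length) exact-match alignments."""
    # Index every word of the subject by its start position
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)

    hits = set()
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):  # each seed
            q_start, s_start = i, j
            # Extend to the left while the bases match
            while q_start > 0 and s_start > 0 and query[q_start - 1] == subject[s_start - 1]:
                q_start -= 1
                s_start -= 1
            q_end, s_end = i + word_size, j + word_size
            # Extend to the right while the bases match
            while q_end < len(query) and s_end < len(subject) and query[q_end] == subject[s_end]:
                q_end += 1
                s_end += 1
            if q_end - q_start >= min_length:  # pass the threshold
                hits.add((q_start, s_start, q_end - q_start))
    return sorted(hits)


query = "TTTACGTACGGATCCAAA"
subject = "GGGACGTACGGATCCTTT"
print(seed_and_extend(query, subject))  # [(3, 3, 12)]
```

A smaller word size finds more seeds, so it is more sensitive but slower. Seeds that cannot be extended far enough are dropped.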
When interpreting BLAST results, what is an E-value?
The number of matches as good/better than the results expected by chance - smaller sequences likely to have a larger E-value
What must you beware when analysing BLAST results?
E-value cut off at 10
E-values greater than 0.00001 are not considered reliable
Sequence similarity doesn’t prove functional homology
Why is BLASTX so useful?
- Finding a protein match helps to confirm that the DNA you have sequenced is expressed
- Matches to proteins can show up possible introns in genomic sequence
- Protein sequences are more likely to have useful annotation than DNA sequences
Why is it beneficial to search within a certain subset of organisms when using BLAST to compare your gene?
Speeds up the search as there are fewer sequences to search through
Increases sensitivity - reduces likelihood that same pattern has been found due to chance
What is genomics?
The study of genomes
What is metagenomics?
The study of multiple genomes in complex (usually environmental) samples
What is the problem with growing microbes in labs?
<1% will grow in culture or they grow really slowly over years
Bacteria are also so small that it can be near impossible to determine the species under a microscope
In metagenomics, how does one determine what species are present?
Sequence a marker gene
What gene is usually sequenced to determine what bacteria and archaea are present and how are the variable regions amplified?
Partial 16S ribosomal RNA gene (16S = 16 Svedberg)
V1, V2 and V3 are not conserved so are variable between species
There are conserved regions between the variable regions so you can synthesise primers complementary to the conserved regions
What is a 16S rRNA pipeline?
- Extract DNA from sample e.g. blood, faeces, soil, slime, dust etc.
- Target a region of the 16S gene that can identify which bacteria are present, using forward and reverse primers
- PCR carried out which amplifies variable regions of all genes present
- Each sample contains multiple genomes in a complex mixture - amplicon pool
How is Illumina sequencing made cost-effective when sequencing an amplicon pool?
Barcoding and multiplexing
- Add a unique barcode to sample 1 amplicon primers
- Amplify 16S rRNA from each sample and include a sample specific barcode in the forward primer
- Up to 96 barcoded samples can be run at once
- Output: 400 million sequences (4 million sequences per sample) which gives a good snapshot of microbial diversity in that sample
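After a multiplexed run, reads are sorted back into samples by their barcode. A minimal sketch, assuming each read starts with its barcode, all barcodes are the same length and no sequencing errors fall in the barcode. The function name, sample names and barcodes are made up.

```python
from collections import defaultdict


def demultiplex(reads, barcodes: dict) -> dict:
    """Group reads by the sample barcode at their 5' end and strip the barcode."""
    by_sample = defaultdict(list)
    for read in reads:
        for sample, barcode in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read
            by_sample["unassigned"].append(read)
    return dict(by_sample)


barcodes = {"sample1": "ACGT", "sample2": "TGCA"}
reads = ["ACGTGGGG", "TGCACCCC", "ACGTAAAA", "NNNNTTTT"]
print(demultiplex(reads, barcodes))
```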
How to process 16S rRNA data
- Start with millions of sequences
- Cluster sequences together to make OTUs (Operational Taxonomic Units)
- Assign OTUs to: domain/class/order/genus/species using a BLAST search
- Different taxonomic levels come back, as not all sequences can be assigned to specific species
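The clustering step can be sketched as greedy clustering: each sequence joins the first OTU whose representative it resembles closely enough, otherwise it starts a new OTU. The assumptions are loud ones: sequences are treated as already aligned and of equal length, and the 97% default threshold is a commonly used convention rather than something stated on the card. Function names are illustrative.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(sequences, threshold: float = 0.97) -> list:
    """Return a list of (representative, members) pairs."""
    otus = []
    for seq in sequences:
        for representative, members in otus:
            if identity(seq, representative) >= threshold:
                members.append(seq)  # close enough: join this OTU
                break
        else:
            otus.append((seq, [seq]))  # start a new OTU
    return otus


seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC"]
for representative, members in greedy_otus(seqs, threshold=0.9):
    print(representative, len(members))
```

Each OTU representative would then be the sequence sent to the BLAST search for taxonomic assignment.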
How to discover what the species in the sample are doing?
Sequence genomic DNA using shotgun sequencing
Outline the process of shotgun sequencing to sequence genomic DNA from sample
- Extract DNA from sample. Each sample contains multiple genomes in a complex mixture
- Fragment DNA into 500bp fragments and sequence with Illumina
- The fragmented pieces are assembled into genes (contigs) and the number of sequences in each contig is noted
- Identify the contigs: BLAST search them to a database of known proteins
Why can only 4 genomic DNA samples be multiplexed at once, as opposed to 96 16S rRNA amplicon samples?
DNA genomes are much larger, and more data is needed to cover multiple whole genomes
What is 16S rRNA sequencing useful for?
Taxonomic composition of samples
Overall diversity
Differences between samples due to factors of interest