molecular biology (exam prep) Flashcards
basics
What are the characteristics of life?
-maintain integrity (boundaries)
-store information
-perform and regulate metabolism (energy)
-interact with other cells, signals and environments
-replicate/divide
Life: C-based and DNA-based
What are the life domains?
bacteria
archaea
eukarya
viruses (not a domain: acellular, listed separately)
Comparison of bacterial and eukaryotic genomes
bacteria:
-are circular
-no telomeres
-in the cytoplasm (nucleoid)
-contain plasmids
-wraps around HU proteins
-genes often located within operons
-no mRNA post-transcriptional mods
eukaryotic:
-resides in the nucleus
-linear
-larger genome
-usually no plasmids
(exception: brewer's yeast has plasmids)
both
-mitochondria and chloroplasts have their own DNA
origin of present day mitochondria
Endosymbiosis: a bacterial cell was engulfed by an ancestral host cell and the two evolved together.
comparison of genome sizes
the complexity of the organism doesn't always correlate with the size of the genome
3 types of staining to visualise
-binding a molecule to a specific organelle structure
-binding an antibody
-GFP tagging (green fluorescent protein)
modularity
an ordered assembly of amino acids that have already formed from atoms. Modularity allows evolution to occur by forming components that can be individually modified.
polymers
nucleic acids
proteins
lipids
polysaccharides
monomers
nucleotides
amino acids
fatty acids
monosaccharides
macromolecules
carbohydrates
lipids
protein
nucleic acids
Draw a structure of nucleic acids
(DNA, RNA)
nucleoside
example uridine
nitrogenous base and 5 carbon sugar
Similarities and differences: DNA and RNA
similarities
DNA:
-base T
-double-stranded
-relatively stable
-information storage
-usually one
-deoxyribose sugar
RNA:
-single-stranded
-unstable
-base U
-many functions eg transport, enzymatic etc
the RNA world hypothesis: RNA preceded DNA
DNA replication
semi conservative
According to the semiconservative model, after one round of replication, every new DNA double helix would be a hybrid that consisted of one strand of old DNA bound to one strand of newly synthesized DNA.
Then, during the second round of replication, the hybrids would separate, and each strand would pair with a newly synthesized strand. Afterward, only half of the new DNA double helices would be hybrids; the other half would be completely new.
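The two rounds described above can be checked with a small simulation. This is only a sketch: the labels "old" and "new" and both function names are made up for illustration, and each duplex is modelled simply as a pair of strand labels.

```python
def replicate(duplexes):
    """One round of semiconservative replication.

    Each duplex is a pair of strand labels ("old" or "new").
    Every existing strand is kept and paired with a freshly made strand.
    """
    daughters = []
    for strand_a, strand_b in duplexes:
        daughters.append((strand_a, "new"))
        daughters.append((strand_b, "new"))
    return daughters


def hybrid_fraction(duplexes):
    """Fraction of duplexes that still contain one original strand."""
    hybrids = sum(1 for duplex in duplexes if "old" in duplex)
    return hybrids / len(duplexes)


population = [("old", "old")]  # generation 0: fully parental DNA
for generation in range(1, 4):
    population = replicate(population)
    print(generation, hybrid_fraction(population))
```

Round 1 gives only hybrids, round 2 gives half hybrids, and each further round halves the hybrid fraction again.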
okazaki fragments?
the short lengths of DNA that are produced by the discontinuous replication of the lagging strand.
dna replication in bacteria
what occurs during initiation of DNA replication?
the initiation of DNA replication takes place at the origin (oriC) in E. coli.
the DNA begins negatively supercoiled
9-mers are DnaA binding sites; 13-mers are A+T-rich regions, so they have fewer hydrogen bonds (A-T pairs have 2, G-C pairs have 3) and melt at lower temps
10 to 20 monomers of DnaA (initiator protein)
bind to the 9-mer regions. The DNA wraps around the DnaA monomers; this induces the A+T-rich region to unwind (open complex).
DnaC (helicase loader) loads DnaB (helicase) to begin unwinding the DNA.
DNA is replicated at the replication fork; replication is bidirectional and semi-discontinuous
primosome
a protein complex responsible for creating RNA primers on single stranded DNA during DNA replication
7 proteins:
DnaG primase, DnaB helicase, DnaC helicase assistant, DnaT, PriA, PriB, and PriC
how can primosomes be monitored in vitro?
using fluorophores, fluorescent chemicals that re-emit light upon light excitation
technique: smFRET (single-molecule FRET)
main initiation proteins in E. coli
DnaA (initiator), DnaB (helicase), DnaC (loader), DnaG (primase)
DNA Pol III Holoenzyme synthesises both strands (5’→3’)
*DNA Pol I replaces RNA primers on lagging strand with DNA
*DNA ligase seals the nicks
topoisomerase in E. coli
catalyzes the relaxation of negatively supercoiled DNA
purines
draw structure
adenine
guanine
pyrimidines
thymine
cytosine
uracil
differences between dna replication in eukarya vs bacteria
What occurs in a reverse transcriptase reaction?
Reverse transcription involves the synthesis of DNA from RNA by using an RNA-dependent DNA polymerase.
The DNA strand (cDNA) is complementary to the original RNA, not identical to it
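A minimal sketch of why the product is complementary rather than identical to the RNA. The function name and the example sequence are made up for illustration; both sequences are written 5' to 3'.

```python
# Base-pairing rules for copying RNA into DNA
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna):
    """Return the first-strand cDNA (5'->3') for an RNA written 5'->3'."""
    # The new strand is antiparallel, so read the RNA backwards
    return "".join(RNA_TO_DNA[base] for base in reversed(rna.upper()))


print(first_strand_cdna("AUGGCU"))  # AGCCAT
```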
function of telomeres
bacteria with telomeres
Linear chromosomes => telomeres
*short DNA repeats (e.g. TTAGGG)
*G-rich strand + C-rich strand
*every round of replication loses up to 200 bp
Streptomyces
milestones:
Be more familiar with some important milestones in the development of molecular biology
Be able to discuss the Lac operon and how that relates to our understanding of genetic regulation
Recognise how molecular biology underpins biotechnology and can contribute to medicine, agriculture and basic science
Recognise links to fields such as gene editing and synthetic biology, which have grown out of molecular biology
Name 3 important milestones in molecular biology
-“jumping genes”
-lac operon
-pcr
what did Barbara McClintock discover?
the discovery of transposable elements in maize, the "jumping genes".
known as the Ac/Ds system: chromosome breakage occurs at Dissociation (Ds) and is regulated by Activator (Ac), which can also drive its own transposition.
thus replication is not always linear
and the disruption caused by them on chromosome 9
what is the lac operon?
it is a group of genes with a single promoter that encodes proteins for the transport and metabolism of lactose in bacteria (E. coli)
what is the basic process of a functioning lac operon?
sugar metabolism: allosterically regulated repression.
transcription only occurs when there is lactose ready to be digested
a repressor blocks the production of the enzymes when lactose is absent
the operon contains 3 genes (lacZ, lacY, lacA); if the inducer IPTG is added it will bind to the repressor and allow the production to occur
cis elements that bind transregulators
binding sites for proteins:
promoter
operator
CBS: CAP binding site
a major trans regulator is LacI (encoded by lacI, next to the operon): this repressor binds to the operator and stops RNA polymerase transcribing the operon
E. coli's lacZ as a reporter gene
its product, the β-galactosidase enzyme, cleaves X-gal to give a blue product,
thus denoting if a gene is expressed or not
other reporter genes
firefly luciferase gene in plant cells and transgenic plants
reading genes
sanger
writing genes
Khorana
alpha helix
– 3.6 amino acyl residues per turn; 2.3 Å helix radius.
*Most common helix in proteins.
* Usually about 10 aa residues, but can be 4-40+.
* Usually contains M, A, L, E, K aas but P and G disrupt a helix.
NAD+ Acidithiobacillus thiooxidans
3₁₀ helix
3.0 amino acyl residues per turn;
1.9 Å helix radius.
* Very strained structure.
* Found in e.g. myoglobin and hemoglobin.
* Usually very short - <4 aa residues.
blue whale myoglobin
pi helix
4.4 amino acyl residues per turn; 4.4 Å helix radius.
* Energetically unfavourable – selected against unless functionally critical, so found near active-sites.
* Usually seen as a bulge on a long alpha helix.
* Usually short - 7-10 aa residues.
methane oxidising bacteria
soluble methane monooxygenase (13)
1 strand beta helices
right handed (one arrow)
Thermal hysteresis protein YL-1 (aka antifreeze protein)
from mealworm
inhibit the formation of large ice grains inside the cells that may damage cellular organelles or cause cell death
freeze tolerance and ice adhesion
first discovered 50 years ago in Antarctic fish, present in millimolar concentrations; their function is to stop ice crystal growth
2 strand beta helices
Ice-binding protein (IBP) from a type of Flavobacterium
very potent for a microorganism: produces FH in the range of 3°C at submillimolar concentrations
3 strand beta helices
Antifreeze protein isoform 501 from
spruce budworm.
this antifreeze protein protects organisms from freezing by adhering to ice crystals, thus preventing their growth. They adsorb to ice, fit its morphology and prevent further growth.
theories suggest the effectiveness of this protein is due to how well it can stop freezing at the basal plane.
A study was conducted in 2008
protein structure and function part 2
tertiary structures :
other interactions occurring within the polypeptide chain.
predicting helices that span membranes
uses tertiary structure of alpha helices grouped together
hydrophobicity plots such as the Kyte-Doolittle plot allow us to see which regions of a primary structure are hydrophobic.
graph: above 0 hydrophobic, below 0 hydrophilic
example of this: aquaporin (lets water in and out of a cell). 7 alpha helices per subunit in Homo sapiens.
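A hydropathy plot is a sliding-window average over the sequence. The sketch below uses the published Kyte-Doolittle scale; the function name and the default window of 9 are illustrative choices (longer windows, around 19 residues, are commonly used when looking for membrane-spanning helices).

```python
# Kyte-Doolittle hydropathy values: positive = hydrophobic
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# A run of leucines scores high (hydrophobic), a run of aspartates low
print(hydropathy_profile("LLLLLLLLLDDDDDDDDD", window=9))
```

Stretches where the profile stays well above 0 for roughly the length of a helix are the candidates for membrane-spanning segments.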
beta propeller
multiple 4-stranded beta-meanders motifs arranged in the form of blades. all joined together.
active site of many enzymes in the centre of propeller.
example:
methanol dehydrogenases (enzyme)
8 bladed propeller common in alcohol dehydrogenases.
takes electrons off of the alcohol and donates them to cytochrome c. Thus couples directly to respiration.
from Paracoccus bacteria.
beta barrels
(hydrophobic outside)
beta strands with a turn, anti parallel.
example: sensory protein
FhaA-receptor protein from E. coli
needs to be turned on for hole to open. (outside, detergent molecules show this )
2 types:
-outside hydrophobic (will dissolve in a membrane)
-inside hydrophobic
beta barrels
(hydrophobic inside)
found in the cytoplasm.
Will dissolve in water (hydrophilic outside)
benzene
alpha solenoid
(bike chain)
stacked pairs of alpha helices
form large flexible structures found in things that need protein-protein interactions. Massive bendy surface area.
example:
phosphatase 2a
adds or takes phosphate to or from proteins
quaternary structure (2types)
formed by polypeptide interactions
binding of co-factors ( non-protein chemical compound or metallic ion that is required for an enzyme’s role as a catalyst)
information flow
-operons (transcribed as one mRNA strand)
example in soluble methane monooxygenase.
redox co-factors
2 types
soluble (not part of proteins)
redox active enzymes: PQQ (quinone proteins, hydrophobic)
hemes
flavins
porphyrin rings
chlorophylls
cobalamins (b12)
bound metals
hemes
Transcription in bacteria & eukarya
RNA polymerase and sigma factor come together: high affinity for the DNA sequence.
Locate the promoter (strong association), forming the closed promoter complex.
DNA strands start opening up, transcription starts for RNA.
Signals tell polymerase to stop
how does rna polymerase recognise where to start?
promoters: they are recognisable by 2 main sequences,
at -10 and -35 upstream from the start of transcription
+1 start of transcription (purine normally)
-10 consensus often TATAAT
-35 consensus often TTGACA
distance between them most important
Top strand is coding strand
other is template strand; the RNA is complementary to the template strand, so it has the same sequence as the top (coding) strand with U instead of T
synthesis always happens 5’→3’
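The promoter rules above can be sketched as a simple search. This is an illustration only: real promoters rarely match the consensus exactly, the 16-18 bp spacer is the commonly quoted range rather than something stated in these notes, and the function names are made up.

```python
import re


def transcribe(coding_strand):
    """RNA has the same sequence as the coding strand, with U for T."""
    return coding_strand.upper().replace("T", "U")


def find_promoters(dna, min_gap=16, max_gap=18):
    """Find exact -35 (TTGACA) and -10 (TATAAT) boxes with a spacer between.

    Returns (start position, matched sequence) for each hit.
    The lookahead lets overlapping candidates be reported too.
    """
    pattern = re.compile(
        r"(?=(TTGACA[ACGT]{%d,%d}TATAAT))" % (min_gap, max_gap)
    )
    return [(m.start(1), m.group(1)) for m in pattern.finditer(dna.upper())]


example = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GGGGGGA"
print(find_promoters(example))
print(transcribe("ATGGCT"))  # AUGGCU
```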
differences in transcription
bacteria vs eukarya
e- 3 types of RNA polymerase
1- transcribes rRNA
2- transcribes mRNA
3- transcribes tRNA
more complex promoters
TATA box around -30
more sequences where pol binds, plus enhancers upstream and downstream
b- 1 RNA polymerase
transcription in eukaryotes
3 types of rna polymerase
transcription factors involved
enhancer sequence which activator proteins can bind. adaptor proteins- all activate and dna folds.
how is gene expression regulated in eukarya?
Polyadenylation
3’ poly(A) tail- to stabilise the mRNA
Splicing- removing introns
capping structure added to mRNA:
addition of 7-methylguanosine (joined by a 5’-5’ triphosphate link at the start)
added to mRNA to stop degradation (DON'T DESTROY); poliovirus targets this
Splicing
spliceosome- multi-protein (and snRNA) complex, helps bind the RNA around the introns by recognising consensus sequences.
cuts out the introns at either end and joins the 2 exons
alternative splicing?
the same gene can produce slightly or very different proteins depending on which exons (and retained introns) are used.
examples:
Drosophila gene- grey always present, r/g/b only retained in certain transcripts (38) can be
Genome structures:
bacteria and archaea
-no histones (bacteria)
-dna is circular
-divided into genomic DNA, replicon
DNA (plasmids, megaplasmids)
-introns and exons (rare)
inteins (do a similar job to introns)
-HGT
-prophage
Genome structures:
(all dna in a given organism)
making reads:
*using random hexamers, random bit of dna
*next gen sequencing, and amplify using pcr using random hexamers
contigs: made up of reads
*scaffolds: made up of contigs; use the genome of a related organism to order them and match up the sequence missing from the reads
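How reads become a contig can be sketched with a toy greedy assembler: repeatedly merge the two sequences with the longest suffix/prefix overlap. This is a teaching sketch, not how real assemblers work (they handle errors, repeats and both strands); the function names and the example reads are made up.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def assemble(reads, min_len=3):
    """Greedily merge the pair with the largest overlap until none is left."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing overlaps any more
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [
            c for k, c in enumerate(contigs) if k not in (best_i, best_j)
        ] + [merged]


print(assemble(["ATGGCGT", "GCGTACG", "TACGTTA"]))  # ['ATGGCGTACGTTA']
```

Reads that share no overlap stay as separate contigs, which is where a related genome is needed to arrange them into a scaffold.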
genome plots
useful for G/C content
(can be done by ph absorbance now or melting temp of DNA)
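With a sequence in hand, G/C content for a genome plot can also be computed directly. A minimal sketch; the function names, window size and step are arbitrary illustrative choices.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window=1000, step=500):
    """G/C content of each full window, as (start position, fraction)."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


print(gc_content("ATGC"))                       # 0.5
print(gc_windows("AAAAGGGG", window=4, step=4))  # [(0, 0.0), (4, 1.0)]
```

Windows whose G/C content differs sharply from the genome average are one hint of horizontally acquired DNA such as a prophage.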
circular plot example:
Thiomicrorhabdus heinhorstii
from fresh water in a manatee cave
this region is a prophage
circular plots
origins of replication
0=origin of replication in bacteria (gene)
(oriC 0.24-2.5kbp)
Origin of rep in archaea (oriC1,2,3,4) spaced around the genome
oriC in e.coli
(bacteria)
initiation
(whole genome)
replication begins with unwinding:
opening of the strands to allow replication to begin
-one region binds single-stranded DNA, the other double-stranded
DUE- DNA unwinding element, full of A's and T's; the strands separate from each other, creating a 'bubble' and 2 single strands briefly.
- (2 replication forks)
- DnaA (initiator protein) binds to the DnaA box sites (double-stranded) through a helix-turn-helix motif; a second part of the protein, an ATPase domain, binds across the DUE; this winds itself up and separates the 2 DUE strands.
- helicase attaches to the single strands and zooms along separating the 2 strands, at the expense of ATP.
The strands have to be separated so that enzymes can get in and copy them.
-space between DnaA box site and DUE is important
-the DnaA box sites it binds to need to have the correct spacing
oriC is annotated like a gene but doesn't make anything, it just facilitates initiation
oriC in the Archaea
-same kind of size
-3-4 oriC around the genome
-Orbs: origin of replication boxes all found in 1-4 oric locations
what happens?
-initiator protein (Orc1) binds to one of the oriC. It will bind well to oriC number 1 and partially to all the other oriC.
-some archaea also have WHiP proteins, which do the same sort of job and can also act as initiators.
-many molecules of Orc1 binding to one oriC will join together, forming an oligomer (oligomerise)
this then does the job of DnaA and does the unwinding
example:
Saccharolobus solfataricus (Wu et al)
oriC2
Orc complexes bind to 1 and partially to another.
theory for multiple origins: if you start replication in 4 places at once it speeds things up.
replicons
bacteria and archaea
-usually have one chromosome but can be a second or a third. Whatever the biggest is the primary chromosome
-all other replicons are replicated in their own way not like the chromosomes.
*plasmids (origin of transfer)
*megaplasmids etc
replicon definitions:
-plasmid: smallest replicon present (G/C content differs from the chromosome by more than 1 mol%); cannot contain core genes, like the second disc of a DVD. Species-specific, not found in other bacteria.
-megaplasmids: probably just a second chromosome
-chromids: usually second in size to the chromosome proper. (G/C content is nearer to chromosome)
size bigger than half a million base pairs.
*manmade cloning vectors:
fosmids (f plasmid from bacteria can only be used in e.coli),
cosmids (plasmids carrying phage lambda cos sites, can hold big inserts)
phasmids (made from the f1 phage)
example pUC (plasmids always start with p)
prophage
it's DNA from bacteriophages
(incorporated into host DNA, stored until better conditions)
under the right conditions the prophage will be induced and expressed, producing a mass of phage virions that kill the cell. (e.g. calcium or magnesium going up in the environment that the cell is in)
detecting tools; PHASTER that can scan genome sequences
1-10 prophage is typical
(often mis-annotated)
activated sludge in sewage works will have multiple prophages. High amount in this area, more virus particles.
viral DNA
cervical cancer (HPV)- viral DNA integrating into genomes occurs often. That's why some viruses cause cancers in humans: the bit of DNA they integrate into happens to be where division and growth are controlled; switching it off leads to uncontrolled cell growth.
Horizontal gene transfer
(HGT)
Transfer of DNA from species to species or strain to strain (sideways),
rather than being passed down from a parent.
Usually done by several sources:
*phage: when it integrates it will bring DNA with it.
*plasmids
*integrons: virus-like mobile genetic elements
important in antibiotic resistance, hospital situations and activated sludge.
genes on plasmid not essential:
*virulence factors: the molecules that help the bacterium colonise the host at the cellular level
(things that make better at being a pathogen are usually on plasmids)
example:
- naturally occurring E. coli has a plasmid with an antibiotic resistance gene.
Human stops taking antibiotic early, and as there is some resistance already the resistant cells can now recolonise the whole gut.
all these cells now have this virulence factor; they will communicate with other cells and pass it to them.
HGT types:
Transformation and conjugation
(bacterial sex)
archaea can also conjugate but will only make pili under stress
*transformation: the recipient cell has to be competent (chemical, enough calcium in the cell). A piece of genomic DNA from bacterium A can be transferred to bacterium B
*conjugation: the F-positive cell (sex pilus) grips the F-negative cell so it can transfer the DNA.
The sex pilus is needed to transfer; without it bacteria can only receive.
conjugation
phage transduction
transformation
translation
(chiara)
genetic code
which codon codes for which amino acid
everything read in triplets.
the translation
genetic code is degenerate- most amino acids are coded for by more than one codon (third base wobble)
ORF- open reading frame- be careful where the reading frame starts
the genetic code can be different in mitochondria etc
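Reading in triplets, degeneracy and the effect of the reading frame can all be seen in a few lines. The sketch builds the standard genetic code from the usual TCAG-ordered amino acid string; mitochondrial codes differ and are not covered. The function and variable names are made up for illustration.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop)
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=0):
    """Translate triplet by triplet from the given frame until a stop codon."""
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


codons_per_aa = Counter(CODON_TABLE.values())

print(translate("ATGGCCTAA"))           # MA
print(translate("ATGGCCTAA", frame=1))  # WP (same bases, different frame)
print(codons_per_aa["L"], codons_per_aa["M"])  # 6 1
```

Leucine has six codons while methionine has one, which is the degeneracy described above; shifting the frame by one base gives a completely different peptide.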
translation needs 3 types of RNA:
-mRNA: messenger, carries the coding sequence
-tRNA: transfer, brings the amino acids
-rRNA: ribosomal
tRNA:
cloverleaf structure
rRNA (ribosomes)
2 subunits contained in ribosomes
70S bacteria
large 50S
small 30S
80S-eukarya
large 60S (3 different rRNAs), small 40S
certain degree of conservation in terms of sequence and structure, which lets us infer phylogenetic relationships.
mods to rRNA post transcription
what happens during translation?
*n and c terminal
(H2N to COOH)
tRNA carries the new amino acid and it attaches to the growing chain.
ribosomes are complex structures
small subunit
large subunit: E-site, P-site, A-site.
protein synthesis
1-2-3 are the current polypeptide chain that is elongating → bound to tRNA in the P site → new charged tRNA comes into the A site and has a complementary anticodon → the bond between 3 and 4 forms and the ribosome shifts so that the original tRNA goes toward the exit site while the chain, now one aa longer, stays attached; then the cycle repeats (sliding of small then large subunit)
accessory molecules that aid this:
tRNA is linked to elongation factors, not just floating around by itself (different in bacteria and eukarya)
GTP attached to tRNA and elongation factor
1st elongation factor is bound to GTP (energy) → hydrolysis of GTP to GDP (release of a phosphate → breaking a high-energy bond); this energy changes the conformation of the ribosome to form the new bond
2nd elongation factor- hydrolysis of GTP to GDP provides the energy for the change of conformation (sliding of the 2 subunits), formation of the new bond and exit of the last tRNA.
inhibitors of translation (bacteria)
acting on different processes, so to inhibit something you need to choose the right thing
example: when cloning a gene, need to insert the gene into bacteria. Need to select bacteria depending on their antibiotic resistance.
Streptomycin prevents the transition from initiation complex to chain-elongating ribosome and also causes miscoding (only acts on bacteria cells)
to stop certain step pick correct inhibitor
Initiation of translation in bacterial cells
reminder: polycistronic mRNAs (multiple ORFs) make lots of proteins at the same time, to be used in, say, one pathway
transcription/translation are coupled in space and time; this doesn't occur in eukarya. Fewer options for regulation.
translation starts near the Shine-Dalgarno sequence (purine-rich initiation sequence), the reverse complement of a sequence in the ribosome's 16S rRNA. It sits 6-8 nucleotides before the initiating AUG.
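The spacing rule can be sketched as a search for a Shine-Dalgarno motif followed by an AUG 6-8 nucleotides later. Assumptions: the exact consensus AGGAGG is used as the motif (real sites are often partial matches), and the function name and example mRNA are made up.

```python
def find_start_sites(mrna, sd="AGGAGG", min_gap=6, max_gap=8):
    """Return (SD position, AUG position) pairs with an allowed spacer length."""
    mrna = mrna.upper().replace("T", "U")
    sites = []
    pos = mrna.find(sd)
    while pos != -1:
        sd_end = pos + len(sd)
        for gap in range(min_gap, max_gap + 1):
            aug = sd_end + gap
            if mrna[aug:aug + 3] == "AUG":
                sites.append((pos, aug))
        pos = mrna.find(sd, pos + 1)  # keep looking for further motifs
    return sites


# Motif at position 3, then a 7-nt spacer, then the start codon
example = "GGC" + "AGGAGG" + "UAAUAAC" + "AUGGCU"
print(find_start_sites(example))  # [(3, 16)]
```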
initiation of translation in eukaryotic cells
Monocistronic mRNAs: info for only one type of protein unless alt splicing
transcription (nucleus) and translation (cytoplasm) are separate in space and time
Exception: mitochondria transcribe their own genome outside the nucleus
small ribosomal subunit interacts with the initiation factors and charged tRNA; this is helped by the 5’ cap, and this complex sits at the beginning of the mRNA (small subunit only) → all starts scanning to find the initiating AUG → release of initiation factors → hydrolysis of GTP→GDP; this energy recruits the large subunit = complete machinery → chain elongation begins
Once proteins are made : folding &
quality control
*once the polypeptide chain starts growing, the aa start folding; they have different properties
example: hydrophobic residues will fold inwards to get away from the aqueous cytoplasm
Secondary/tertiary structure starts to form whilst emerging from the ribosome.
protein misfolding: whilst the protein is growing there are
checkpoints to ensure the protein has folded correctly → either fixed or hydrolysed (waste of material as non-functional, can be dangerous/pathogenic for the cell)
misfolded proteins need to be eliminated, proteasome: cylinder that takes in protein; the protein is hydrolysed in the middle of this.
how does the cell know this? addition of a flag to the protein (ubiquitin)
ubiquitin is recycled to be used again
summary
dna to protein in eukaryotic cells: regulation
transcription not always active
regulation occurs at:
capping of new messenger
elongation
splicing
polyadenylation (mod)
export from nucleus to cytoplasm
during protein synthesis and after
transcription
translation
protein degradation
gene structure and regulation
regulation of transcription
*number of copies
*localisation of transcripts
*timing of transcripts
*chromatin structure and histone mods
histones (positive charge)
dna is wrapped around histones, they are positively charged.
can have multiple levels of organisation
4 core histones: H2A, H2B, H3, H4; two of each together make an octamer of 8 proteins
DNA is wrapped around this, h1 is a linker for DNA
Mods:
different mods have different meanings depending on where they are- histone code hypothesis.
options:
methyl, acetyl, phosphate, ubiquitin
dna methylation
can change during age of organism
example- at 6 weeks for embryonic globin-
embryonic globin has an unmethylated promoter at 6 weeks, then a methylated promoter (methylated cytosines) later on, to shut off its production and allow adult globin to be produced.
sigma factors
tells polymerase where to start, recognises what genes are needed.
Regulation can occur at this level because gene transcription can be regulated by promoter sequence and thus by sigma factors
=> gene regulation (up-regulation or down-regulation)
Another way of regulating gene expression
example: E. coli, time vs optical density of the bacterial culture
different sigma factors are expressed in different amounts because you need different genes
growth phase- sigma 70 housekeeping genes (growth)
stationary phase and stress:
σ32: heat-shock gene transcription
σ38: stationary phase gene expression
σ54: expression of genes for nitrogen metabolism
depending on which σ are present different promoters will be activated and therefore different genes transcribed
different RNA polymerases
(at transcription) in eukaryotic
rna polymerase 1
rna polymerase 2
rna polymerase 3- tRNA, 5S rRNA and other non-protein-coding RNA
further regulation at transcription
an example of a transcription factor used to de-differentiate then re-differentiate cells
example of a cascade:
MyoD- experiment: fibroblasts were taken from the skin of a chick embryo, the cells were de-differentiated and MyoD was activated, turning the fibroblasts into muscle cells.
Example regulation due to the localisation through signals on the UTR affecting mRNA localisation
regulation due to localisation:
how can a zygote differentiate into the many different cells in an organism? how are transcripts positioned in the cell?
through the UTRs
Drosophila melanogaster example:
the UTR tells the protein to be at either end of the egg.
how would you test this?
(UTR) TATATA- this sequence directs the localisation. Cut out the CDS and attach an obviously different one to this UTR: if the UTR carries the signal, the swapped-in part will then be localised there.
(in vivo)
post transcription mods
3’ poly tail
removing of introns
5’ cap
examples
alternative splicing
regulation at/after transcription (localised proteins)
regulation at/after translation
examples:
*depending on availability of charged tRNA (attached to aa)
genetic code is redundant, so different codons code for the same aa
*elongation factors
*post-translational mods- once the protein is produced, phosphate, acetyl or glycosyl groups could be added.
these mods will be listed on UniProt
regulate gene expression:
localisation of the protein
needs to be in a specific place
where are proteins produced (cytoplasm- ribosomes)
an example of protein localisation
*some proteins are needed in certain places
intracellular proteins: targeted to organelles (nucleus, mitochondria etc)
* these proteins are produced on free ribosomes and translocated/modified after translation
Extracellular proteins:
destined for the plasma membrane or for secretion.
Will be produced on the ER-bound ribosomes.
co-translational translocation into the RER lumen, where they are further packaged (e.g. into lysosomes).
extracellular protein: destined to be plasma proteins: produced on the ER
post translational translocation
intracellular
a sequence of the protein will have a flag that says put it in a certain place.
protein is transported into the matrix and the signal peptide is then cut off by a signal peptidase
why is regulation important?
mis-targeting can cause disease
level at translocation
primary hyperoxaluria type 1 (Bird's disease; high level of oxalate in urine) causing kidney stones
there's a mutation in the targeting signal, so the enzyme goes into the mitochondria, not the peroxisomes
thus an active enzyme in the wrong compartment
phosphorylation
having phosphates added or removed
kinase-adds
phosphatase- removes
the addition or removal can activate or inactivate a protein.
how does the cell know what regulates the cell cycle?
cyclins- proteins whose levels change at various stages of the cycle. They activate CDKs (cyclin-dependent kinases).
phosphorylation at certain positions will activate the cyclin-CDK complex:
-formation of complex
-activiation through phosphorlaytion
-inhibitor is another phosphate in another position
-why does the cell do this?
timing
it's faster to have an assembled but inhibited complex than to create one from scratch.
another level of regulation is
degradation
example of when:
ubiquitylation
separation of the 2 chromatids before cell division
cell is ready to go but they are still linked together; the phosphorylation of other proteins then enables this separation.
enzyme is ready but waiting
where can gene expression be controlled?
*transcription (will it be transcribed or not how much if so)
*mRNA- processing (splicing etc)
*transport in messenger RNA
*where will mrna be will it be active
*translational regulation (makeup protein or not)
*Will the protein be used, how long will it last, hydrolysis right away?
*phosphorylation, ubiquitylation etc
how to study gene expression?
*attach a fluorescent protein
*real-time qPCR
*microarray
*hybridisation-based assays
genome structure in eukaryotic cells
compare the differences between bacterial and eukaryotic cell genomes
eukaryotic- bigger size
eukaryotic cells- linear
bacteria- circular
bacteria- polycistronic
bacteria- higher gene density
introns- mostly absent in bacteria, present in eukarya
HGT/LGT- bacteria
ori- multiple in eukarya
genome size and number of genes
c-value paradox:
*genomes
*size vs complexity
*ploidy
*c-value paradox
the complexity of the organism does not correlate with the genome size
e.g. some amoeba, lungfish and plant genomes are bigger than mammalian genomes
although there is a relationship between genome size and number of protein-coding genes
packing
similarities between human and mouse genome
some contain the same genes
*sex chromosomes the same
*conserved segments
example
human 17 and mouse 11
synteny
sequence present is the same
some human and fish genes are the same, in the same order on the same strand
exons are the same but some are used as coding sequence some are cds
genome differences
looking at related deer species, the chromosomes look very different although the animals appear the same
could look at genes in genome.com
genome content
sequencing of the human genome
(history)
one of the milestones- as soon as the sequences are available they will be released to the public.
*research would cost a lot more without this
*24 hours post production results need to be put up
whose genomes?
pangenomes:
genome content
genes: region of the DNA which codes for an active molecule
(alternative splicing causes different proteins)
various RNA genes
other sequences
intergenic sequences (low-complexity repeats often found in centromeres and telomeres)- most difficult parts to sequence as so repetitive
mobile elements (selfish DNA)
gene transcription
miRNA and gene silencing-
to silence the expression of the protein it would code for.
example human nuclear genome
2020
first telomere to telomere sequence of human chromosome
synthetic organisms
recombinant DNA 1
cloning tools:
-PCR
-Plasmids
overview of molecular cloning
-obtain the gene of interest (extract dna, PCR)
-cut GOI and vector
-separate and isolate fragments (agarose)
-ligate GOI and vector
-insert the new recombinant molecule into the bacteria
-grow the bacteria
(this creates a huge amount of the region of interest)
molecular cloning is better than using PCR
as it will keep the protein alive and it's easy to stop it denaturing
PCR:
polymerase chain reaction: amplification of the section between the primers
heat cycle = denaturation, annealing, polymerisation
(doubles every cycle)
annealing temperature depends on the primer sequences (G/C content)
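Two quick calculations follow from the points above: how fast the product grows if it doubles every cycle, and a rough primer melting temperature from base composition. The Wallace rule used here (2 °C per A/T, 4 °C per G/C) is only a rule of thumb for short primers and is an addition to these notes; the function names and the example primer are made up.

```python
def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Copies after n cycles; efficiency 1.0 means perfect doubling."""
    return start_copies * (1 + efficiency) ** cycles


def wallace_tm(primer):
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


print(copies_after(30))                     # about 1.07e9 from one template
print(wallace_tm("ATGCATGCATGCATGCATGC"))   # 60
```

The annealing step is normally set a few degrees below the estimated melting temperature, so primers with more G/C need a higher annealing temperature.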
enzymes for PCR:
characteristics of DNA polymerases
*processivity (how many nucleotides it will incorporate before it dissociates)
*fidelity (proofreading, 3’-5’ exonuclease activity)
*specificity (how much other stuff does it amplify)
*thermostability (how quickly does it degrade at the high temp)
types of DNA polymerase:
TAQ (no proofreading)
processivity- how many nt it can incorporate (how good it is at duplicating)
fidelity- how error-free it is (proofreading, 3’-5’ exonuclease)
specificity- how specific it is- what other stuff does it also amplify
thermostability- how quickly does it degrade at temp
types:
Taq- doesn't proofread (72-75 °C)
doesn’t correct errors
issues- could make errors consistently
short half life
leaves single A overhangs (not blunt ends)
proofreading types
Pfu- slower than Taq
Platinum Taq- not great proofreading
Q5
some commons types of PCR
denaturation of inhibitor (HOTSTART)- polymerase doesn't work at room temp- inhibited by an antibody, but once the first heat step is hit the antibody denatures and the polymerase will be activated
(stop product being made before ready)
Touchdown PCR
Nested PCR- two rounds, one after the other: the first primer pair sits outside the wanted region (one before, one after), the second pair sits inside the first product.
plasmids
do exist in nature but ones for cloning have been edited
example:
pUC19
important elements:
-contain genes to replicate independently of the chromosome
-ORI
-selection marker- antibiotic resistance- will kill off everything else that isn’t
-multiple cloning sites
-way of screening blue/white cloning
A260/A280- absorbance ratio to see how clean the DNA is (about 1.8 for pure DNA)
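A minimal sketch of how the absorbance readings are usually interpreted. Assumptions not taken from these notes: an A260 of 1.0 corresponds to roughly 50 ng/µL of double-stranded DNA, and the 1.7-2.0 "clean" band is a hypothetical cut-off chosen for illustration. The function names are made up.

```python
def dsdna_ng_per_ul(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration, assuming A260 of 1.0 = 50 ng/uL."""
    return a260 * 50.0 * dilution_factor


def purity_call(a260, a280, low=1.7, high=2.0):
    """Classify a prep from its A260/A280 ratio (cut-offs are illustrative)."""
    ratio = a260 / a280
    if ratio < low:
        return "possible protein or phenol contamination"
    if ratio > high:
        return "possible RNA contamination"
    return "looks clean"


print(dsdna_ng_per_ul(0.5))    # 25.0
print(purity_call(0.9, 0.5))   # ratio 1.8 -> looks clean
```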
HGT
-transformation (bacteria: the uptake of DNA directly from the environment)
-transduction (can happen in mammalian cells if they have been edited)
*(virus-mediated: the virus picks DNA up and carries it over)
-conjugation (via the sex pilus)
bacteria transformation
cells must first be made competent (transfection is the equivalent term for eukaryotic cells).
ligase
use overhangs to ligate; using sticky ends will make the cloning a lot more efficient
use a taq etc
recombinant DNA 2
methods to transformation (getting dna inserted)
chemical transformation:
chill cells in calcium chloride to permeabilise the membrane
then heat shock to prompt DNA uptake
electroporation:
wash cells to remove ions, then shock cells with a high-voltage pulse to create holes in the membrane
both are stressful methods; after the DNA is taken up wait 2 hours to allow cell recuperation
once growing the bacteria how will you ensure to culture the cells with the protein?
bacteria culture
culture spread on agar
they will then reproduce and create colonies on each section.
but at this stage you don't know if they have taken up the plasmid.
(sterile techniques)
next screen the colonies
screening bacteria with the insert
(how to identify)
select bacteria- the vector has an antibiotic resistance gene. So grow the bacteria on agar with the antibiotic, then only the bacteria with the plasmid will survive.
Couple the plasmid with an antibiotic resistance gene in the vector
using blue/white colonies system
lac operon (contains different genes all transcribed at the same time)
b-galactosidase
lactose -> glucose + galactose
also breaks down x-gal
when b-gal breaks down the x-gal it creates a bright blue colony. Therefore any bacteria that have b-gal will break down the x-gal, producing the coloured compound and turning blue.
*if the colonies have taken up the insert they won't turn blue, as the insert has gone inside the b-gal gene (lacZ) and broken it up.
at this point the screening for inserts is complete, but you still need to screen for the specific insert wanted
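The two-step logic above (antibiotic selection, then blue/white) can be written as a small decision function. This is only a sketch of the reasoning; the function name and the result labels are made up for illustration.

```python
def classify_colony(grows_on_antibiotic: bool, is_blue: bool) -> str:
    """Interpret one colony from an antibiotic + x-gal plate."""
    if not grows_on_antibiotic:
        # no resistance gene, so no plasmid was taken up
        return "no plasmid"
    if is_blue:
        # b-gal still works, so the gene was not interrupted by an insert
        return "plasmid without insert"
    # white colony: b-gal gene disrupted, an insert is present (not yet confirmed as the right one)
    return "plasmid with insert (candidate)"


if __name__ == "__main__":
    print(classify_colony(True, False))
```

White colonies are only candidates, which is why the size check and colony PCR still follow.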
screening for correct insert
-check the size of the insert with a restriction digest: run on agarose and check size.
-colony PCR
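To know which band sizes to expect on the agarose gel, the fragment lengths of a cut circular plasmid can be worked out from the cut positions. A minimal sketch: the cut positions and the 1000 bp insert below are hypothetical, only the 2686 bp length of pUC19 is a known value.

```python
def circular_digest(plasmid_length: int, cut_sites: list[int]) -> list[int]:
    """Return fragment sizes (largest first) after cutting a circular plasmid."""
    sites = sorted(set(cut_sites))
    if not sites:
        return [plasmid_length]  # uncut circle
    # fragments between neighbouring cut sites
    fragments = [b - a for a, b in zip(sites, sites[1:])]
    # the fragment that wraps around the origin of the circle
    fragments.append(plasmid_length - sites[-1] + sites[0])
    return sorted(fragments, reverse=True)


if __name__ == "__main__":
    # hypothetical recombinant: 2686 bp vector + 1000 bp insert, cut on both sides of the insert
    print(circular_digest(3686, [400, 1400]))
```

Two bands (vector backbone and insert) suggest the insert is there, a single vector-sized band suggests an empty plasmid.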
molecular cloning
difficulties
*need restriction sites in the right place
*need very highly purified enzymes (can be expensive)
using pcr
advantages:
can isolate and fuse fragments independently
disadvantages
*length constraints, G/C content too high.
*difficult to join more than one region of interest
to insert more than one gene into a vector.
TA cloning
conserved priming site, dont need to make new primers
TA cloning
Zhou and Gomez-Sanchez (2000). Universal TA Cloning. Curr Issues Mol Biol 2(1): 1.
e.g. https://pcrbio.com/applications/pcr/ta-cloning/, https://www.thermofisher.com/order/catalog/product/K457502
taq polymerase is cheap and adds a single overhanging A to the 3' end of each strand.
pcr amplify insert, and incubate with dATP (deoxyadenosine) to create the overhanging A
use plasmid with overhanging T
A's and T's will bind = efficient cloning
topo
topoisomerase (enzyme that changes the way dna is coiled)
plasmid is sold with a correct overhanging T with an enzyme (topo) attached.
topo holds the plasmid open, insert anneals with the T and then the topo's ligase-like activity puts it back together.
#don't need many restriction sites as topo is there. Has some so insert can be cut out later if needed.
why are all the elements there, why are they useful?
advantages and disadvantages of cloning methods
other methods:
TA
Gateway cloning
gibson assembly
gibson assembly
design primers to amplify GFP (DNA 2) but forward primer needs to contain same sequence as end of first PCR product (rev)
attaching dna to GFP
first create primers
#PCR-amplify left fragment (DNA 2)
PCR product has primers either end
Reverse primer for GFP doesn't matter
add a 5'-3' exonuclease
chews back nucleotides from the 5' ends (digests a bit of both fragments) so the matching single-stranded ends can anneal
polymerase fills the gaps, ligase then seals it all together
advantages:
seamless joining of DNA fragments
cheaper, no need for RE
multiple fragments can be joined in one step
limitations
few fragments at a time
need to start from a template
size limit imposed by pcr
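The primer design rule for Gibson assembly (forward primer for the second fragment carries the end of the first fragment as a tail) can be sketched as below. The sequences, the function names and the overlap and annealing lengths are placeholders chosen for illustration, not recommended values.

```python
def gibson_forward_primer(upstream: str, downstream: str,
                          overlap: int = 20, anneal: int = 20) -> str:
    """Forward primer = tail copied from the end of the upstream fragment
    + the part that anneals to the start of the downstream fragment."""
    tail = upstream[-overlap:]
    binding = downstream[:anneal]
    return tail + binding


def share_overlap(left: str, right: str, n: int) -> bool:
    """True if the last n bases of left match the first n bases of right."""
    return left[-n:] == right[:n]


if __name__ == "__main__":
    dna1 = "AAAACCCC"   # placeholder for the first PCR product
    gfp = "GGGGTTTT"    # placeholder for GFP (DNA 2)
    primer = gibson_forward_primer(dna1, gfp, overlap=4, anneal=4)
    product = primer[:4] + gfp  # GFP PCR product now starts with the tail
    print(primer, share_overlap(dna1, product, 4))
```

Because the overlap is built into the primer, no restriction site is needed at the junction.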
applications of molecular cloning
-modified organisms
-produce a recombinant protein like insulin
-recombinant animals for research
-different proteins that are reacting together
considerations:
codon optimisation
different organisms have different percentages of different tRNAs and different codon preferences
when engineering a construct to express a protein, which codons should be used?
*what is the codon usage of the organism
*which codons are used the most
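Codon usage can be tallied directly from a coding sequence. A minimal sketch, assuming the sequence is already in frame and starts at the first codon; the example sequence is made up.

```python
from collections import Counter


def codon_usage(cds: str) -> Counter:
    """Count how often each codon appears in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore trailing bases that do not fill a codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def most_used(cds: str, n: int = 3) -> list:
    """Return the n most frequent codons with their counts."""
    return codon_usage(cds).most_common(n)


if __name__ == "__main__":
    print(most_used("ATGGCTGCTGCCTAA", 1))
```

Run over many genes of the host, this gives the table you would compare your construct against.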
types of vectors
a dna molecule isolated from a virus, plasmid or cell.
A dna molecule into which you can put another organism's dna
types:
-plasmids
can obtain large inserts. These are used in libraries for genome sequencing.
-cosmids (can accept inserts of 30/40 kb)
-BACs
-PACs
-YACs
NGS next gen seq +
third gen
illumina
NGS vs sanger
pros and cons
next gen pros
- higher output
-can do lots (parallel at same time) - requires less prep (library)
-low cost per big volume of data
-more data in less time
-sequencing in different ways
-can produce long reads (third gen)
next gen cons
-produce more errors than sanger (assembly), in theory each read is less reliable
-still under development
-require big computers
-needs specialist knowledge
-data storage issues
sanger- 96 max
how to sequence a genome
shotgun vs libraries
how to sequence a genome
(shotgun compared to libraries)
Book analogy- shred the book, sequence the pieces, put it back together
making libraries
take small pieces, sequence the ‘page’ and put together
library= a comprehensive collection of cloned DNA fragments. (Have to be cloned into a vector with an origin of replication) and the fragments need to represent the whole genome. Can be propagated in a host cell
a good library has:
*inserts that are not rearranged
*even representation of source material
*no vectors with multiple inserts
*no empty vectors
vector can be very simple but the concept of library is the same
shotgun sequencing
dna shearing, sequencing and assembly
take the genome, randomly fragment it and look for overlaps (same bases in the same order). Then the whole sequence can be read
*step one: fragment
*sequence each fragment
*try and put them together (align the k-mers): align fragments by looking where the same letters are
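The "look for overlap and put them together" step can be sketched as a greedy merge of reads. This is a toy version for revision only: real assemblers handle errors, repeats and both strands, and the `min_len` cut-off here is an arbitrary assumption.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list, min_len: int = 3) -> list:
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps any more, leave the remaining contigs separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Short reads give short overlaps, which is why repeats and small fragments are hard to place.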
illumina
Types
Illumina- MiSeq small sequencer (bench top sequencer)
How it works: output of Illumina are pictures
1)Prep the sample (library)- put a few molecules (adapter tags) at the ends. One end is used to attach the dna piece to a flow cell (slide) with lanes. In the lanes there are small oligonucleotides attached to the flow cell, used as an anchor for the molecule you have made.
*Allow your molecule to bind to the slide.
Attach the molecule you want to sequence to the slide (the nucleotides added later are all fluorescent).
each fragment is sequenced by synthesis and every time a nucleotide is added it fluoresces.
(parallel sequencing)
*make a cluster of these molecules so they can be perceived more easily, produce a clone in each of the lanes in the slide.
in each lane: there are attached oligos, and molecules of interest which have oligos attached.
attaching the bits you want to sequence to the slide, at every incorporation take a picture.
a cluster is needed in each position so it's easier to visualise on the picture. All sequenced at the same time.
terminator nucleotide wash (reversible, unlike the dideoxynucleotides in sanger)-
Sequence the complementary strand, wash away the og strand. Unblock the 3' end (restore the OH) and rewash with fluorescent nucleotides to see which letters have been incorporated at the new positions.
producing a huge amount of data all at the same time.
*library
*clusters
*sequencing
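Since the output is a picture per cycle, base calling boils down to asking which colour is brightest for each cluster in each picture. A very simplified sketch, assuming one colour channel per base and made-up intensity values; real instruments use more involved chemistry and correction steps.

```python
def call_bases(cycles: list) -> str:
    """cycles: one dict per sequencing cycle, mapping base -> fluorescence
    intensity measured for a single cluster. Returns the called read."""
    # for every cycle (picture), pick the base whose channel is brightest
    return "".join(max(cycle, key=cycle.get) for cycle in cycles)


if __name__ == "__main__":
    cluster = [
        {"A": 0.90, "C": 0.10, "G": 0.00, "T": 0.05},
        {"A": 0.10, "C": 0.05, "G": 0.10, "T": 0.80},
        {"A": 0.00, "C": 0.10, "G": 0.70, "T": 0.20},
    ]
    print(call_bases(cluster))
```

The cluster matters because many identical copies give a signal strong enough to separate the channels.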
illumina
sequence short fragments including oligos, 150bp sequenced.
Sequence a fragment from both ends (paired-end sequencing) or single end.
Issue- small fragments hard to put together.
*making a mate pair for short sections helps work out how all the small bits get put together
PacBio SMRT
single molecule real time sequencing
SMRTbell template- take the fragment and add 2 hairpin linkers, one of which binds a polymerase that will make the complementary strand. Put one molecule in each small pore. Camera can see light from the small pores.
Polymerase: sequencing through synthesis-
fragment blocked on one side- the fragment of dna (circle) can go through the polymerase a number of times.
t= orange etc
Which light is read by the camera determines which nucleotide has gone through.
Why?
*Can sequence much longer fragments,
*lag time between fragments-vv
*can read epigenetics as well, i.e. if there are mods on bases.
*How bases are modded.
*Lag between reading certain bases tells you if it's modded or not
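The idea of using lag to spot modified bases can be sketched by comparing the lag at each position with the lag seen on an unmodified control. The function name, the example lags and the threshold of 2.0 are all assumptions for illustration, not values from the platform.

```python
def flag_modified(native_lags: list, control_lags: list,
                  ratio_threshold: float = 2.0) -> list:
    """Return positions where the polymerase paused much longer on the
    native DNA than on the unmodified control."""
    flagged = []
    for pos, (native, control) in enumerate(zip(native_lags, control_lags)):
        # a large native/control ratio suggests something on the base slowed the polymerase
        if control > 0 and native / control >= ratio_threshold:
            flagged.append(pos)
    return flagged


if __name__ == "__main__":
    print(flag_modified([0.1, 0.5, 0.1], [0.1, 0.1, 0.1]))
```

Because the circular template is read several times, the lag at each position can be averaged over passes before comparing.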
oxford nanopore
(MinION)
*first portable sequencer
*sequencing in the field
ebola outbreak- able to sequence in the field
next gen DNA sequencing platforms
genome content :
protein coding genes
rna genes
NTR (promoters, mobile elements)
initial gene annotation:
-look for open reading frames
-compare sequence to known sequence.
-Compare to a transcriptome from the same species or a similar taxon
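The first annotation step, looking for open reading frames, can be sketched as a scan for ATG ... stop in each forward frame. A minimal sketch: it ignores the reverse strand and introns, and `min_codons` is an arbitrary cut-off chosen for the example.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list:
    """Return (start, end, sequence) for ORFs in the three forward frames.
    end is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGGCTTAAGG"))
```

Candidate ORFs are then compared with known sequences to decide which ones are likely real genes.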
applications of NGS
Phylogeny
Functional characterisation
Epigenetics
compare genomes
detect variants
population structure
Rutter GA (2014). Understanding genes identified by genome-wide association studies for Type 2 diabetes. Diabetic Medicine 31(12): 1480.
Example:
enabled the heritable nature of type 2 diabetes to be explored, between families and with genome-wide association studies.
500 genes identified
comparative genomics
what we have seen so far:
–initial sequencing of human genome:
2001
–human vs mouse genome
–synteny (conserved gene order when comparing fragments of 2 genomes. Same genes, but intergenic regions are different)
–karyotypes of chromosomes in genomes
–similarities and differences (similar species with different genomes)
when did comparative genomics become useful?
–gene start comparison (2007)
comparing fruit fly genomes- found information about the start of genes
Lin et al. (2007). Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. Genome Research 17: 1823.
found out about the start of genes:
looked at protein coding genes in different drosophila species and found what was conserved or not.
They corrected mistakes in where the cds was annotated to start.
how to study these things?
–alignments (work out what sections of the gene are conserved)
–mitochondrial genomes (comparison of the shape of mitochondrial genomes)
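Working out which sections are conserved from an alignment can be sketched as a per-column score. A minimal sketch, assuming the sequences are already aligned to the same length; the toy sequences are made up and gap characters are treated like any other letter.

```python
def column_conservation(aligned: list) -> list:
    """For each alignment column, return the fraction of sequences that
    carry the most common letter (1.0 = fully conserved)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for column in zip(*aligned):
        most_common = max(set(column), key=column.count)
        scores.append(column.count(most_common) / len(column))
    return scores


if __name__ == "__main__":
    print(column_conservation(["ATGC", "ATGA", "ATTC"]))
```

Runs of columns close to 1.0 point to regions that are probably under functional constraint.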
conservation of genomic regions vs function (2020)
chromosome 19:
–one of the smaller chromosomes
–very well conserved
–has a high gene density
–high G/C content
Harris et al. (2020). Unusual sequence characteristics of human chromosome 19 are conserved across 11 nonhuman primates. BMC Evol Biol 20: 33
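G/C content, as mentioned for chromosome 19, is just the share of G and C letters in a sequence. A minimal sketch with made-up sequences; the window helper is an illustrative addition for scanning along a longer region.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq: str, window: int) -> list:
    """G/C content in consecutive non-overlapping windows along the sequence."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))
    print(gc_windows("GGCCAATT", 4))
```

The same calculation is useful when checking whether a PCR target has too high a G/C content.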
comparative genomics in medicine
(1999)
chromothripsis- rearrangement of chromosomes that results in disease.
In cancer cells, chromosomes are rearranged.
certain stresses (chemical, uv) cause partial fragmentation, then the pieces are re-arranged wrongly.
shows that it's not just the information that is important, it's how the genes are expressed. (regulation)
synthetic biology
minimum number of genes
current research: