BIOL220Z Molecular Biology Flashcards
Make sure to recap:
Life
Life domains
Main similarities/differences between genomes
Macromolecules
What are the characteristics of life?
- maintain integrity (boundaries)
- information: store, replicate, transform it into “action”
- perform and regulate metabolism (energy)
- interact/signal (with environment, other cells)
- replicate (divide)
*etc
Life: C-based and DNA-based
Describe the genome in bacterial cells
-one single circular chromosome
-smaller
-extra chromosomal elements outside of the circular chromosomes
Describe the genome in eukaryotic cells
-linear
-bigger
-mt genome and chloroplasts
Describe the origin of the present-day mitochondria
Endosymbiosis: a bacterial cell (an alphaproteobacterium) was engulfed by an ancestral host cell, and the two evolved together.
Name 3 types of staining
-binding a molecule to a specific organelle structure
-binding an antibody
-GFP labelling (green fluorescent protein)
Draw the DNA nucleic acid structure
What is a nucleoside?
Nitrogenous base and 5 carbon sugar
DNA VS RNA similarities
-bases: A,G,C
DNA vs RNA differences
DNA:
-base T
-double-stranded
-relatively stable
-information storage
-usually one
-deoxyribose sugar
RNA:
-single-stranded
-unstable
-base U
-many functions eg transport, enzymatic etc
What are Okazaki fragments?
Okazaki fragments are the short lengths of DNA that are produced by the discontinuous replication of the lagging strand.
What is oriC in E. coli?
This is the replication origin, where DNA sequences are recognised by initiator proteins
What occurs in a reverse transcriptase reaction?
Reverse transcription involves the synthesis of DNA from RNA by using an RNA-dependent DNA polymerase.
The DNA strand produced (cDNA) is complementary to the original RNA, not identical to it
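A minimal sketch of the base-pairing logic behind reverse transcription, assuming the RNA is written 5'->3' and ignoring primers and the enzyme itself. The function name `reverse_transcribe` is made up for illustration.

```python
# Illustrative only: first-strand cDNA is the reverse complement of the RNA template.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Return the cDNA (written 5'->3') complementary to an RNA given 5'->3'."""
    # Read the template backwards so the result is also written 5'->3'.
    return "".join(RNA_TO_DNA[base] for base in reversed(rna.upper()))


print(reverse_transcribe("AUGGCC"))
```

The output pairs base-for-base with the RNA, which is why the cDNA is complementary rather than identical to the original.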
Name 3 important milestones in molecular biology
-“jumping genes”
-lac operon
-pcr
What was Barbara McClintock’s work about?
the discovery of transposons “the jumping genes” and the disruption caused by them on chromosome 9
What is the Lac Operon?
A classic example of an inducible operon for gene expression and control in bacteria
Who came up with PCR?
Kary Mullis et al
What is synthetic biology?
A multidisciplinary field of science that focuses on living systems and organisms, and it applies engineering principles to develop new biological parts, devices, and systems or to redesign existing systems found in nature.
How can lacZ be used as a cell reporter?
lacZ codes for β-galactosidase, and its activity serves as a marker for gene expression patterns during development, e.g. in whole mouse embryos
Alpha helix info
3.6 amino acyl residues per turn; 2.3 Å helix radius
Pauling, Corey and Branson (1951); confirmed experimentally by Perutz (1951)
Most common helix in proteins.
Usually about 10 aa residues
contains MALEK
Methionine, alanine, leucine, glutamate, and lysine uncharged
example of where found-myosin
3₁₀ helix
3.0 amino acyl residues per turn; 1.9 Å helix radius
Bragg et al. (1950)
Very strained structure.
Found in e.g. myoglobin and hemoglobin.
Usually very short - <4 aa residues.
example of where found - blue whale myoglobin
pi helix
4.4 amino acyl residues per turn; 4.4 Å helix radius.
Low and Baybutt (1952)
Energetically unfavourable – selected against unless functionally critical, so found near active-sites.
Usually seen as a bulge on a long alpha helix.
Usually short – 7-10 aa residues
beta sheet
Pauling and Corey (1951)
Can be parallel or antiparallel and complex structures can form.
Each strand is usually 3-10 aa in length.
Usually contains: Valine, threonine, histidine, tyrosine and isoleucine
Name an example of a protein with 3₁₀ helices
Blue whale myoglobin
First genome synthesised?
E. coli
Genomics?
sum of chromosomal DNA
Transcriptomics?
mRNA of a specific condition/time
Proteomics?
sum of protein content
metagenomics
all chromosomal DNA, all organisms, all domains
(blend a person)
metatranscriptomics?
all mRNA
Pangenomics?
all chromosomes DNA all strains vs
5!
(5x4x3x2x1)
Beta meander?
2 or more anti-parallel strands linked by hairpin loops
What is a Greek Key Motif?
Four anti-parallel strands and linking-loops.
What occurs at the OriC site in bacteria?
Opening of the strands to allow replication to begin
-one region binds single stranded dna the other double stranded
DUE (DNA unwinding element): a site full of A's and T's whose strands separate from each other, creating a 'bubble' and 2 single strands briefly.
- (2 replication forks)
- DnaA (initiator protein) binds to the DnaA box site (double stranded) and to the DUE (via a helix-turn-helix motif); a second part of the protein, an ATPase domain, binds across the DUE, winds the DNA around itself and separates the 2 DUE strands.
- strands need to be separated so copies can be made, creating 2 genomes
Why is space between DUE and box site important?
genome structures: bacteria and archaea.
What occurs at the OriC site in archaea?
-Contains a series of origin of replication sites (oriC), 3-4
-2 flavours of boxes: full-size ORBs and mini-ORBs
- initiator protein (Orc1/Cdc6) comes and binds to an oriC site.
Orc1 will bind well to oriC1 and slightly to all the others; this does the same job as DnaA and unwinds the DNA
Prophage (viral sequencing)
a bacteriophage genome that is integrated into the circular bacterial chromosome incorporated into the host cell.
HGT (horizontal gene transfer)
transfer of DNA from species to species (sideways, not parent to offspring)
ways this can occur:
-phage (a prophage integrating and excising, taking bits of DNA with it)
- plasmids
- integrons
important in antibiotic resistance
Transcription
Synthesis of RNA under the direction of DNA-transcript of the genes protein-building instructions (mRNA)
Translation
Synthesis of a polypeptide under the direction of mRNA.
Site of translation are the ribosomes
Bacteria Transcription/Translation
Coupled event, as they lack nuclei.
ribosomes attach to the 5' end of the mRNA molecule whilst transcription is still happening
eukaryotic transcription/translation
transcription occurs in the nucleus; mRNAs are sent to the cytoplasm, where translation occurs
Synthesis and processing of RNA
RNA polymerase (enzyme) pries the two strands of DNA apart and hooks together the RNA nucleotides as they base-pair along the template
overview of transcription
RNA polymerase and sigma factor come together: high affinity for the promoter DNA sequence.
Locate the promoter (strong association), forming the closed promoter complex.
DNA strands start opening up, transcription starts for RNA.
Signals tell polymerase to stop
How does RNA polymerase recognise where to start? (bacteria)
promoters: they are recognisable by 2 main sequences
at -10 and -35, upstream from the start of transcription
+1 start of transcription (normally a purine)
-10 consensus often TATAAT
-35 consensus often TTGACA
distance between them is most important
Top strand is the coding strand
the other is the template strand, therefore the RNA produced is the complement of the template strand and matches the coding (top) strand, with U in place of T
synthesis always happens 5’- 3’
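A rough sketch of how the -35/-10 rule could be scanned for on the coding strand. It only finds exact matches to the two consensus sequences, whereas real promoters usually deviate from them. The 16-18 bp spacer window is an assumption added here (a typical textbook value, not stated in these notes), and `find_sigma70_like_promoters` is a hypothetical name.

```python
import re


def find_sigma70_like_promoters(coding_strand, min_spacer=16, max_spacer=18):
    """Return (start_index, matched_sequence) for exact -35/-10 consensus pairs.

    Spacer limits are assumed defaults, not values taken from the notes.
    """
    # Lookahead so that overlapping candidates are all reported.
    pattern = re.compile(
        r"(?=(TTGACA[ACGT]{%d,%d}TATAAT))" % (min_spacer, max_spacer)
    )
    return [(m.start(1), m.group(1)) for m in pattern.finditer(coding_strand.upper())]


example = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
print(find_sigma70_like_promoters(example))
```

Changing the spacer limits so the two boxes are too close or too far apart makes the hit disappear, which is the "distance is most important" point.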
Where is start of translation? (bacteria)
(ATG) look downstream of the +1 site
usually 6 to 8 nt upstream of the start codon is the Shine-Dalgarno sequence (ribosome binding site)
Differences in transcription: bacteria and eukaryotes
e- 3 types of RNA polymerase
1- rRNA transcribe
2-mRNA transcribe
3-tRNA transcribe
more complex promoters
TATA box at -30 to -40
more sequences where Pol binds, and enhancers upstream and downstream
b- 1 RNA polymerase
Transcription in eukaryotes
(Other proteins involved, not just RNA polymerase)
enhancer sequence to which activator proteins can bind. adaptor proteins- all activate, and the DNA folds to bring them together.
What does TF stand for?
transcription factors
part of the transcription proteins that help activation
How is gene expression regulated? (after)
Post Transcription modifications
(Bacteria)
3’ poly (a) tail- signal degradation
How is gene expression regulated?
(eukaryotic)
(eukaryotic) post transcription
Polyadenylation
3’ poly (A) tail- to stabilise the mRNA
Splicing- removal of introns
capping structure added to mRNA:
addition of 7-methylguanosine (linked 5’-5’ through a triphosphate bridge at the start)
added to mRNA to stop degradation (DON'T DESTROY); poliovirus targets this
Splicing
spliceosome- multi-protein complex that binds the RNA around the introns and recognises consensus sequences.
cuts out the introns at either end and joins the 2 exons (ligase binds 2 nucleotides)
(Bacteria) Polycistronic?
one mRNA makes more than one protein, because they’re all needed at the same time
snRNPs?
help recognise nucleotide sequences from exon/introns boundaries
Alternative splicing
the same gene can produce slightly or very different proteins depending on which exons (or retained introns) are used.
examples:
Drosophila gene- grey always present, r/g/b only retained in certain transcripts; (38) can be made by one gene, therefore lots of different proteins
What are the UTR’s?
Untranslated region
Genetic code
the translation- the CDS is read in triplets and each triplet equals one AA
degenerate- some AAs are coded for by more than one codon (third base wobble)
ORF- open reading frame- be careful of where the RNA starts from
genetic code can be different in mitochondria etc
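A small sketch of reading a CDS in triplets, assuming the standard nuclear genetic code (mitochondrial codes differ, as noted above). The table is built from the usual TCAG ordering; `translate` and `codons_for` are illustrative helper names.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a DNA coding sequence from its first base until a stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


def codons_for(amino_acid):
    """All codons encoding one amino acid (shows the degeneracy of the code)."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


print(translate("ATGGCTTAA"))
print(codons_for("L"))
```

Shifting the start by one base changes every triplet, which is why the reading frame matters.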
ORF
gDNA is the entire genomic DNA and therefore double stranded
rRNA
ribosomes- complex molecule
Bacteria 70S (S-value is rate of sedimentation)
2 subunits, each made of proteins and different rRNA molecules (major 50S, minor 30S)
bacteria and eukaryotic similarities:
in terms of sequence (useful to decide differences between organisms) and instruction
16S commonly used
mods:
TRNA
Charged tRNA so bound to an AA
Translation process
Proteins have a direction of reading:
N- to C-terminal (mRNA read 5’-3’)
n-c
protein synthesis happens by addition of an aa that's bound to an aminoacyl-tRNA: a peptide bond forms between the amino (N) group of the new amino acid and the C terminal of the growing peptide chain
Messenger RNA can be translated by different ribosomes at the same time→ producing many copies of proteins
Ribosome
not membrane enclosed organelles
3 main sites in the large subunit
e: old tRNA goes out
p: Peptidyl-tRNA-site
a:Aminoacyl-tRNA-site
Protein synthesis
1, 2, 3 are the current polypeptide chain that is elongating→bound to tRNA in P site→new charged tRNA comes into the A site and has a complementary anticodon→bond between 3-4 forms and ribosome shifts so that the og tRNA goes toward the exit site and the chain has gained another aa, then the cycle repeats (sliding of small then large subunit)
accessory molecules that aid this:
tRNA is linked to elongation factors not just floating around by itself (different in bacteria and eukarya)
linked to a 1st elongation factor bound to GTP (energy)→hydrolysis of GTP to GDP (release of a phosphate→ breaking a high-energy bond); this energy changes the conformation of the ribosome.
2nd elongation factor- hydrolysis of GTP to GDP provides energy for the change of conformation (sliding of the 2 subunits), formation of the new bond and exit of the last tRNA.
Inhibitors of translation (bacteria)
acting on different processes, so to inhibit something you need to choose the right thing
example: antibiotics
Streptomycin prevents the transition from initiation complex to chain-elongating ribosome and also causes miscoding (only acts on bacteria cells)
to stop certain step pick correct inhibitor
Initiation of translation in bacterial cells
reminder: Polycistronic mRNAs (multiple ORFs) make lots at the same time, to be used in, say, a pathway
transcription/translation are coupled in space and time. Fewer options for regulation.
Initiation of translation in eukaryotic cells
Monocistronic mRNAs: info for only one type of protein unless alt splicing
transcription (nucleus) and translation (cytoplasm) are separate in space and time
Transcription can also occur outside the nucleus, e.g. in mitochondria
small ribosomal subunit interacts with the initiation factors and the charged initiator tRNA; this is helped by the 5’ cap, and the complex sits at the beginning of the mRNA (small subunit only)→all starts scanning to find the initiating AUG→release of initiation factors→hydrolysis of ATP→ADP; this energy recruits the large subunit = complete machinery→chain elongation begins
Once proteins are made : folding &
quality control
once the polypeptide chain starts growing, the aa start folding; they have different qualities
example: hydrophobic regions will start folding inwards to get away from the aqueous cytoplasm
Secondary/tertiary structure starts to form whilst emerging from the ribosome
protein misfolding:
check points to ensure the protein has folded correctly→ either fixed or hydrolysed (misfolded proteins can be dangerous/pathogenic for the cell)
proteasome: cylinder that takes in protein; the protein is hydrolysed in the middle of this
how does the cell know this? addition of a flag to the protein (ubiquitin)
Where does regulation occur in eukaryotic cell?
transcription not always active
regulation occurs at:
capping of new messenger
elongation
splicing
polyadenylation (mod)
export from nucleus to cytoplasm
during protein synthesis and after
transcription
translation
protein degradation
how can transcription be regulated? when/ where
regulate how many times a gene is transcribed
where transcripts are kept?
timing of transcription
mods to histones and chromosome structure
Histones
dna is wrapped around histones, they are positively charged.
can have multiple levels of organisation
4 core histones: H2A, H2B, H3, H4; two of each together make an octamer of 8 proteins
DNA is wrapped around this, H1 is a linker histone for DNA
Mods:
different mods have different meanings depending on where they are- histone code hypothesis.
DNA methylation
can change during age of organism
example- at 6 weeks for embryonic globin-
the embryonic globin promoter has unmethylated cytosines at 6 weeks and is methylated later on, shutting off its production and allowing adult globin to be produced.
Sigma factor (σ)
tells polymerase where to start, recognises what genes are needed.
Regulation can occur at this level because gene transcription can be regulated by promoter sequence and thus by sigma factors
=> gene regulation (up-regulation or down-regulation)
Another way of regulating gene expression
example: E. coli, time vs optical density of a bacterial culture
different sigma factors are expressed in different amounts because you need different genes
growth phase- sigma 70 house keeping genes (growth)
stationary: σ32: Heat-shock gene transcription; σ38: Stationary phase gene expression; σ54: Expression of genes for nitrogen metabolism
depending on which σ are present, different promoters will be activated and therefore different genes transcribed
regulation at transcription (eukaryotes)
different polymerases
1: ribosomal
2: mRNA and RNA genes (small nuclear etc)
3 tRNA, some non coding molecules
other factors:
either improve the transcription of something or repress it
further regulation example:
different genes have different promoters and require different activator proteins
so another activator (e.g. the glucocorticoid receptor) doesn't interact with DNA at first.
Activator comes along (hormone), binds to the receptor and activates it, so it goes to the nucleus and interacts with other activator proteins = more gene expression
An example of a transcription factor used to de-differentiate and re-differentiate cells
regulation at transcription by MyoD
fibroblasts taken from chick skin cells and turned into muscle cells
done by: expressing MyoD (which is required for muscle cells)
Localisation
Signals in the UTRs can affect localisation
keeping things in certain places by certain cytoplasmic determinants
If you swap over the UTRs in nanos and bicoid, the mRNAs localise in the “incorrect” region (e.g. the nanos mRNA)
Regulation after translation
Occurs once mRNA is a protein
depending on availability of charged tRNA
can occur at elongation factor
post translational mods (phosphorylation, acetylation, glycosylation)
localisation of the protein:
destination of newly produced protein
2 different localisation of ribosomes
post translational translocation:
proteins produced on free ribosomes are moved after production
extracellular proteins: produced on ER-bound ribosomes, destined for outside the cell
Example of post translational translocation
protein destined for mitochondria
produced on free ribosomes and translocated afterwards
includes a signal peptide to say where it needs to go (cut off once has arrived)
Phosphorylation (at/ after translation)
phosphates added, carried out by a kinase.
(opposite: phosphatase)
cell cycle: cyclins are made and degraded at different stages and activate CDKs (cyclin-dependent kinases); the complex also needs phosphorylation to be active. 2 levels of regulation.
why does cell go through this?
timing: creating the complex takes a while, so having an assembled but inhibited complex ready is easier than building it all in one go.
protein degradation and example of ubiquitylation
regulation by protein degradation: hydrolysing an element that's acting as an inhibitor so that the complex can then perform its function
control level reminder
Control at:
Transcription- is it transcribed or not, how much
Processing of mRNA- splicing
Transport- where's it going/what's it need to get there
Translation- make up of protein
Protein reg- how long will protein last, is it active or not (phosphorylation etc)
How to study gene expression?
make protein fluorescent or attach a tag to make it visible, as long as it goes in the right place and doesn't affect the final product
do a qPCR- in vitro (extract RNA), make it visible, quantify how much RNA is produced by the amount of fluorescence produced
microarray
hybridisation
Bacteria vs eukaryotic
https://bmcbiol.biomedcentral.com/articles/10.1186/1741-7007-11-119
eukaryotic- bigger, more than 1 linear chromosome, lower gene density, multiple ori
Bacteria- circular, no associated proteins, higher gene density, gene transfer, single origin
relationships and how genomes evolve
C-value paradox
genome size (and number of genes) doesn't equate to complexity
example: mammalian vs amoeba
relationship between protein coding genes and genome size
small genomes- few protein coding genes
viruses- they use proteins from the host they infect
chromosome packing in eukaryotic cells
beads on a string- DNA wrapped around histones, forming chromatin
karyotype
list of complete set of chromosomes
autosomes
not sex chromosomes
haploid vs diploid
haploid: one complete set, as in gametes
diploid: 2 copies
human genome vs mouse genome
Lander et al. (2001). Initial sequencing and analysis of the human genome. Nature 409(6822): 860.
similarities-
segments are similar: human 17 is the same as mouse 11, same genes in same form
human 20 similar to mouse 2
large regions of synteny
Synteny
p- petite
fragment of human genome- few genes
3 contigs of fugal genome
genes are exactly the same when compared.
in same sequence on same strand
the sequence shows genes present are the same.
but if zoom in and look at structures are in the gene are the same too (where introns and exons are)
first draft of human genome
https://www.nature.com/articles/s41586-023-05896-x
human pangenomes
genome content
what do they contain?/elements
genome (is full complement): how much codes for protein coding genes? =2%
gene: region of DNA that codes for an active molecule
RNA genes etc
other sequences-
intergenic sequences
low complexity repeats (centromeres, telomeres)
mobile elements- jumping bits of DNA- behaving selfish
splicing sequences
miRNA
Gene Silencing-
minimum number of genes for an organism to survive?
Hutchison et al. (2016). Design and synthesis of a minimal bacterial genome. Science 351(6280): aad6253.
Glass et al. (2006). Essential genes of a minimal bacterium. PNAS 103(2): 425-430.
recombinant technology DNA (1)
What is cloning?
make a genetically identical clone
an organism or molecule
(steps come together which equated to- cloning)
What are the main steps in cloning?
obtain region of interest
prepare region to become sticky and get it to stick to a vector (done enzymatically, with restriction enzymes)- creating a recombinant plasmid
insert the new recombinant plasmid into bacteria- bacteria can duplicate- genetically identical.
need single colonies of bacteria
can express proteins more or less directly
why need molecular cloning?
instead of just pcr?
Dna farm
long term- stops degradation- PCR product is unstable to keep, and you need the specific primers to just PCR. Bacteria can be frozen (suspended animation) and kept, and they have their own methods that will remove mutations etc (example?)
standard polymerase in vitro might not be able to copy a very long sequence
Pcr reminder
polymerase chain reaction- end up with exponential amplification of the section that's between your 2 primers.
denature
annealing temperature
lower- less specificity
higher- more specificity
decides strength of interaction
elongation
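To put a number on "exponential", a minimal sketch assuming every cycle multiplies the target by (1 + efficiency), with efficiency 1.0 meaning perfect doubling. Real reactions plateau as reagents run out, which this ignores; `pcr_copies` is an illustrative name.

```python
def pcr_copies(starting_copies, cycles, efficiency=1.0):
    """Idealised copy number after a number of PCR cycles.

    efficiency = 1.0 means perfect doubling each cycle (an assumption).
    """
    return starting_copies * (1 + efficiency) ** cycles


# One template molecule after 30 ideal cycles: about a billion copies.
print(pcr_copies(1, 30))
# The same run at 90% efficiency per cycle gives noticeably fewer.
print(pcr_copies(1, 30, efficiency=0.9))
```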
Characterisitcs of Polymerases
and types
processivity- how many nt it can incorporate per binding event (how good it is at duplicating)
fidelity- how error free it is (proofreading 3’-5’ exonuclease)
specificity- how specific is it- what other stuff does it also amplify
thermostability- how quickly does it degrade at temp
types:
taq- doesn't proofread (72-75 °C)
doesn’t correct errors
issues- could make errors consistently
short half life
leaves a single A overhang on the 3’ ends
proofreading types
pfu- slower than taq
Platinum taq- not great proofreading
Q5
common types of PCR
denaturation of inhibitor (HOTSTART)- polymerase doesn't work at room temp- inhibited by an antibody, but once the denaturation heat is hit the antibody is denatured and the polymerase is activated
(stop product being made before ready)
Touchdown PCR
Nested PCR- one after the other, One before wanted region and one after.
Restriction enzymes
blunt ends or sticky ends
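A toy digest to show the difference between a sticky and a blunt cutter. It works on the top strand of a linear sequence only. EcoRI (G^AATTC) and SmaI (CCC^GGG) are used as well-known examples of each type; the `digest` function and `ENZYMES` table are made up for illustration.

```python
# site, and cut position within the site on the top strand
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # staggered cut -> sticky ends (5' AATT overhang)
    "SmaI": ("CCCGGG", 3),   # cut in the middle -> blunt ends
}


def digest(seq, enzyme):
    """Cut a linear top-strand sequence at every recognition site."""
    site, cut = ENZYMES[enzyme]
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut])
        start = pos + cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # whatever is left after the last cut
    return fragments


print(digest("AAGAATTCTT", "EcoRI"))
print(digest("TTCCCGGGAA", "SmaI"))
```

With EcoRI the bottom strand is cut at the mirrored position, so each fragment end carries a single-stranded AATT overhang; with SmaI both strands are cut at the same place.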
Where do plasmids come from?
plasmid pUC19
important elements:
-contain genes to replicate independently of the chromosome
-ORI
-selection marker- antibiotic resistance- will kill off everything else that isn’t
-multiple cloning sites
-way of screening blue/white cloning
260/280- absorbance level see how clean DNA is
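A quick sketch of the 260/280 check. The thresholds are common rules of thumb, not values from these notes: a ratio around 1.8 is usually read as clean DNA, and an A260 of 1.0 is taken as roughly 50 ng/µL of double-stranded DNA. Both function names are illustrative.

```python
def purity_ratio(a260, a280):
    """A260/A280 ratio; around 1.8 is commonly taken as 'clean' DNA."""
    return a260 / a280


def dsdna_concentration(a260, dilution_factor=1):
    """Estimate dsDNA in ng/uL, assuming A260 of 1.0 is about 50 ng/uL."""
    return a260 * 50 * dilution_factor


print(round(purity_ratio(0.9, 0.5), 2))
print(dsdna_concentration(0.5))
```

A ratio well below 1.8 usually points to protein or phenol carry-over in the prep.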
Horizontal gene transfer
what?
transfer of genetic info within the same generation, not from parent to descendants.
transformation
transduction
conjugation
Transformation- naked DNA in environment- taken up directly
transduction- virus infects a bacterium; once new virus is produced it picks up a little bit of bacterial DNA and puts it into another bacterium
conjugation- direct transfer between cells
example?
Recombinant Dna 2
notes:
Transformation- how to make cells prone to take up DNA (competent cells)
chemical transformation- chilling cells down (in a solution containing calcium chloride, which makes the membrane more permeable)
then heat shock to prompt DNA uptake (30 seconds)
electroporation
purify cells to remove ions
high voltage shock make holes in membrane
after DNA uptake, cells have recovery time before selection
after transformation
(need to grow them, grow with correct plasmid)
-spread bacteria culture on agar plate (single bacteria)
-dots on plate show colonies: each reproduces from a single cell, so its cells are the same. (sterile technique)
screen colonies
select bacteria that has acquired the plasmid (vectors contain antibiotic resistance genes)
couple plasmid with antibiotic you have in screening
grow bacteria on an antibiotic, if the original vector has antibiotic resistance it will survive and everything else will die
tetracycline (antibiotic)
blocks translation
white/blue colonies
bacteria can transcribe messenger RNA that contains multiple genes
screening-chose the white
agar plate
add X-gal to the plate
any bacteria with β-galactosidase will break down X-gal, making colonies blue
if the gene is in the multiple cloning site (plasmid contains insert), lacZ is disrupted and the colony will be white.
if β-gal is not disrupted (no insert), the colony will be blue
screen for correct insert
use primers next to the insert and PCR amplify, then run on gel
check for the size of the insert on all colonies
culture colonies on a large scale
engineering with restriction enzymes pros and cons
need restriction sites in the right place
need highly purified enzymes
advantages:
can isolate and fuse fragments independently of restriction sites
disadvantages:
length constraints
error rate
GC content (too high, PCR not efficient)
difficult to join more than one gene
how to insert more than one gene into a vector, or join multiple fragments
(molecular cloning methods)
different types of cloning
#restriction enzyme
#PCR cloning
TA cloning
Zhou and Gomez-Sanchez (2000). Universal TA Cloning. Curr Issues Mol Biol 2(1): 1.
e.g. https://pcrbio.com/applications/pcr/ta-cloning/, https://www.thermofisher.com/order/catalog/product/K457502
taq polymerase is cheap and adds an overhanging A on the 3’ end of each strand.
pcr amplify insert, and incubate with dATP to create the overhanging A
use plasmid with overhanging T
A's and T's will pair = efficient cloning
TOPO TA cloning
(common ta cloning)
conserved priming site, dont need to make new primers
topoisomerase (enzyme that changes the way DNA is wound)
plasmid is sold with the correct overhanging T with an enzyme (topo) attached.
plasmid is supplied open; the insert anneals with the T and then the ligase activity of topo puts it back together.
#dont need many restriction sites as topo is there. Has some so insert can be cut out later if needed.
why are all the elements there, why are they useful?
Advantages and disadvantages of some cloning methods
blunt ends- less efficient than sticky ends (non-directional cloning)
#gateway cloning
#gibson assembly
gibson assembly
Gibson et al (2009). Enzymatic assembly of DNA molecules up to
several hundred kilobases. Nat Methods 6: 343.
2 different DNA fragments needing to be joined together.
advantages-
- all reactions happen in one tube at one temperature
different pieces of dna at one temp and end up with one molecule
- seamless joining of any DNA fragments
- dont need RE
-multiple fragment joining in one step
-cheaper than starting from scratch
disadvantages- difficult to create primers
few fragments at a time (5/6)
always need to start from a template
Based on PCR therefore still has limitations of PCR
Gibson assembly
design primers to amplify GFP (DNA 2) but forward primer needs to contain same sequence as end of first PCR product (rev)
attaching dna to GFP
first create primers
#PCR-amplify left fragment (DNA 2)
PCR product has primers either end
Reverse primer for GFP doesn't matter
add a 5’-3’ exonuclease
cuts off nucleotides from the 5’ ends of this section (digests a bit of both fragments), leaving single-stranded overhangs that anneal
polymerase fills the gaps, then ligase acts on sealing it all
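A rough sketch of the primer idea described above: the forward primer for the second fragment carries the last bases of the first fragment as a 5' tail, followed by the bases that anneal to the second fragment. The 20 nt default lengths are assumptions for illustration, not values from the notes, and `gibson_forward_primer` is a hypothetical helper.

```python
def gibson_forward_primer(upstream_fragment, downstream_fragment,
                          overlap=20, anneal=20):
    """Forward primer for the downstream fragment with an upstream overlap tail.

    overlap: bases copied from the end of the upstream fragment (assumed default)
    anneal:  bases that bind the start of the downstream fragment (assumed default)
    """
    tail = upstream_fragment[-overlap:]
    binding = downstream_fragment[:anneal]
    return tail + binding


# Tiny made-up sequences, with short lengths so the result is easy to read.
print(gibson_forward_primer("AAAACCCC", "GGGGTTTT", overlap=4, anneal=4))
```

After PCR with such a primer, the start of the second product repeats the end of the first, so the exonuclease-exposed single strands can anneal to each other.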
What to do with cloned genes?
why use molecular cloning, what are the applications?
produce a recombinant protein
(for localisation)
need to produce a lot of protein
study certain diseases in an organism (animal testing)
study how proteins interact together
codon optimisation
which codons to choose?
Figure 10-27: The genetic code.
Griffiths et al. (2000). An Introduction to Genetic Analysis. 7th Edition. New York: W. H. Freeman
what codon usage does the host organism have?
-genetic code is redundant- several triplets can code for the same amino acid
-different organisms have different tRNAs
#which codons are more used
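One way to see "which codons are more used" is simply to count triplets in a host coding sequence. A minimal sketch, assuming the sequence is in frame from its first base; `codon_usage` is an illustrative name and the example sequence is made up.

```python
from collections import Counter


def codon_usage(cds):
    """Count how often each codon appears in an in-frame coding sequence."""
    codons = [cds[i:i + 3].upper() for i in range(0, len(cds) - 2, 3)]
    return Counter(codons)


usage = codon_usage("CTGCTGCTAATG")
# CTG and CTA both code for leucine, but this sequence prefers CTG.
print(usage.most_common())
```

Run over many genes from the host, counts like these show which synonymous codon to prefer when optimising an insert.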
types of vectors?
a dna molecule that you can put another molecule into.
-plasmids: size? 5-10kb
-cosmids- bacteriophage-based, 30-40kb
(used for packing)
-BACS
-PACS
-YACS
(UP in size)
Next gen vs sanger sequencing
next gen pros
- higher output
-can do lots (parallel at same time) - requires less prep (library)
-low cost per big volume of data
-more data less time quicker
-sequencing in different ways
-produce more errors than sanger (assembly)
-still under development
-produce long reads
-require big computers
-needs specialist knowledge
-data storage issues
sanger- 96 max
how to sequence a genome
(shotgun compared to libraries)
Book-shred and sequence- put back together
What are libraries?
what makes a good library?
Library- comprehensive collection of clone DNA fragments- ligated into a vector. (Includes all key elements, like ori) characteristics needed- no empty vectors.
-insert is not modded
-no empty vectors
-not multiple inserts
Figure 7-3: General procedure for cloning a DNA fragment in a plasmid vector.
Molecular Biology of the Cell. 4th Edition. New York: Garland Science
www.nature.com/nrg/journal/v2/n8/fig_tab/nrg0801_573a_F6.html
shotgun sequencing:
-steps
-coverage
Cut/fragment to pieces of about the same size, sequence each of the fragments, look for overlaps of the same letters- align the k-mers (length of the fragment), then can read the sequence. How many times each position has been sequenced (in the final assembly) = coverage, i.e. how many times each bit is identified. Gaps.
coverage- the more time bits are represented the higher coverage
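Two small sketches for the ideas above, under simplifying assumptions: mean coverage is taken as total sequenced bases divided by genome size, and overlap detection is a naive exact suffix/prefix comparison (real assemblers tolerate errors). Function names and the example numbers are illustrative.

```python
def mean_coverage(n_reads, read_length, genome_size):
    """Average number of times each position is sequenced."""
    return n_reads * read_length / genome_size


def overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


# 1 million reads of 150 bp over a 5 Mb genome.
print(mean_coverage(1_000_000, 150, 5_000_000))
# Two reads sharing "CGT" at their junction.
print(overlap("ATGCGT", "CGTTAA"))
```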
sequencing types/technologies
Types
Illumina- MiSeq small sequencer (benchtop sequencer)
How it works: output of Illumina are pictures
1)Prep the sample (library)- put a few nucleotides at the end (tag); one end is used to attach the DNA piece to a flow cell (slide). small oligonucleotides attached to the flow cell are used as an anchor for the molecule you have made. Allow your molecule to bind to the slide.
Attach molecule you want to sequence to the slide. (all are fluorescent)
make a cluster of these molecules so they can be perceived more easily.
dideoxynucleotide wash-
Sequence the complementary strand, wash away the og strand
sequence short fragments including oligos, 150bp sequenced.
Sequence a fragment by both ends (pair end sequencing) or single end.
Issue- small fragments hard to put together
pacbio SMRT.
single molecule real time sequencing
Flusberg et al. (2010). Direct detection of DNA methylation during
single-molecule, real-time sequencing. Nature Methods 7: 461.
Smrt bell template- one of each molecule in small pores. Camera can see light for small pores.
sequencing through synthesis-
fragment blocked on one side- The fragment of dna (circle) can go through polymerase a number of time.
t= orange etc
Depending on which light is read by the camera can determine which nucleotide has gone through.
Why? Can sequence much longer fragments; the lag time between base incorporations also lets it read epigenetics, i.e. whether there are mods on the bases.
How bases are modded. Lag between reading.
oxford nanopore
(minION)
Advantage- sequencing in the field.
how to annotate a genome?
genome content?
protein coding genes
rna genes
NTR (promoters, mobile elements)
initital gene annotation:
-look for open reading frames
-compare sequence to known sequence.
-Compare transcriptome from same species or similar taxa that looks similar
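A bare-bones version of the first annotation step, looking for open reading frames. This sketch scans only the three forward frames for ATG ... stop, and the minimum length is an arbitrary assumption; real annotation also checks the reverse strand and uses the comparisons listed above. `find_orfs` is an illustrative name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) index pairs of forward-strand ORFs, stop included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                # Walk codon by codon until a stop codon or the end of the sequence.
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 <= len(seq) and (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))
                    i = j  # continue searching after this ORF's stop codon
            i += 3
    return orfs


example = "CCATGAAACCCGGGTAGCC"
print(find_orfs(example))
```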
What can you do with NGS?
Phylogeny
Functional characterisation
Epigenetics
compare genomes
detect variants
population structure
application of NGS example
Rutter GA (2014). Understanding genes identified by genome-wide association studies
for Type 2 diabetes. Diabetic Medicine 31(12): 1480.
enabled the heritable nature of type 2 diabetes to be explored. between families and genome-wide association studies.
500 genes identified
Comparative genomics
what is it used for?
compare genomes: if there are similarities it shows that regions are conserved, and that organisms are better with them than without.
conserved regions will show what needs to be targeted.
shows changes over time
genomes sizes and number of genes (example of comparative genomics)
Figure 1-38: Genome sizes compared.
Alberts et al. (2002). Molecular Biology of the Cell. 4th Edition. New York: Garland Science
size of genome (number of nucleotides) in different species.
human vs mouse genome
first example of comparative genomics
Lander et al. (2001). Initial sequencing and analysis of the human genome. Nature 409(6822): 860.
Drosophila example
Schaeffer et al. (2008). Polytene chromosomal maps of 11 Drosophila species: the order of genomic
scaffolds inferred from genetic and physical maps. Genetics 179: 1601.
how to study this?
alignments- looking at what's conserved across organisms will show the function and the similarities between organisms.
looking at entire mitochondrial genomes
comparing fruit fly genomes- found information about the start of genes
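A simple way to quantify "what's conserved" once two sequences are aligned is percent identity. A minimal sketch, assuming the two sequences are already aligned to equal length with "-" for gaps, and skipping gapped columns; the function name and example sequences are made up.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of non-gap aligned columns where both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    matches = sum(x == y for x, y in columns)
    return 100 * matches / len(columns)


# 5 comparable columns (one gap skipped), 4 of them identical.
print(percent_identity("ATGC-A", "ATGGTA"))
```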
Lin et al. (2007). Revisiting the protein-coding gene catalog of Drosophila
melanogaster using 12 fly genomes. Genome Research 17: 1823.
looked at protein coding genes in different Drosophila species and found what was conserved or not.
They corrected mistakes in where the CDS was annotated to start.
chromosome 19 (example)
Harris et al. (2020). Unusual sequence characteristics of human chromosome
19 are conserved across 11 nonhuman primates. BMC Evol Biol 20: 33.
certain characteristics are conserved and the function isn't obvious, but the conservation shows there is one.
conservation of genomic regions
(medicine)
Maher & Wilson (2012). Chromothripsis and human disease: piecing together the shattering process. Cell 148: 29.
chromothripsis- rearrangement of chromosomes that results in disease.
In cancer cells, chromosomes are rearranged.
certain stresses (chemical, UV) cause partial fragmentation, then the pieces are re-arranged wrongly.
comparative genomics in other organisms (ensembl only shows ensembl)
synthetic genomes