Genomics Flashcards
What is Tm
Melting temperature: the temperature at which 50% of the strands have separated
On a melting curve, take half of the maximum on the y axis, then read across to the curve and down to the x axis
What is hyperchromicity
The effect whereby single stranded DNA absorbs UV light to a greater extent than double stranded DNA
What happens under high stringency
Only complementary sequences are stable; determined by a temperature near Tm or a low salt concentration
why is Genomics important, why study it?
We are now able to treat monogenic diseases such as sickle cell disease. We can identify the point mutation, the nucleotide it affects and the amino acid that is changed. It can be treated by stem cell transplantation, but only for a small number of patients. Targeted genome editing can provide a permanent cure by altering the mutation in stem cells that can then be transplanted.
meaning of omics
Omics aims at the collective characterization and quantification of pools of biological molecules that translate into the structure, function, and dynamics of an organism or organisms.
what is genomics
the study of the entire DNA sequence that contains the complete set of genes for an organism
what is genetics
the study of how traits are passed down the generations and the role of genes in that process
transcriptome
the total RNA content in cell produced by transcription
proteome
the total protein content in cell produced by translation.
meaning of transcriptomics
study of all RNA transcripts produced by a cell, tissue or organism
benefits of using microarrays rather than next generation sequencing
- cheaper than NGS as microarrays cost £10-100, whereas NGS costs £100 to £1000
- GWAS is carried out using this technology (genome wide association study).
mitochondrial genome
16kbp, many diseases associated with variants
epigenome
changes in marks on the DNA strand or in histones, has some disease associations
metagenome
genomes of all the organisms from a specific location. Has some disease associations.
microbiome
all organisms in a specific location, eg microbiome of gut
meaning of recombinant
containing a different combination of alleles; produced by combining genetic material from different places.
difference between pyrimidine and purine
pyrimidines have a single nitrogen-containing ring; purines have two fused nitrogen-containing rings.
what watson crick pairing is stronger, GC or AU
GC because they have 3 hydrogen bonds.
what bonds are within base stacking of DNA
HYDROPHOBIC interactions, arrangement of bases set above each other internalised to the structure and excludes water.
BONDS IN DNA
Hydrogen between base pairs, phosphodiester bonds between sugar phosphates, hydrophobic interactions, arrangement of bases above each other internalised to the structure, and excludes water. Van der Waals forces, individually small but contributes to the stability.
denaturation of DNA
conversion of a double stranded molecule into a single stranded molecule. It is by disruption of hydrogen bonds within the double helix, occurs when DNA in solution is heated, can also be induced by strong alkali or urea. On denaturation it forms a randomly structured coil, moving and changing shape constantly.
What factors does Tm depend on?
- number of hydrogen bonds,
- GC content (GC pairs have an extra hydrogen bond, hence the more GCs, the more hydrogen bonds contained within the structure)
- length of DNA molecule , however little further contribution beyond 300bp (on graph it begins to saturate).
- salt concentration
- pH (alkali is a denaturant)
- mismatches (unmatched base pairs)
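The GC dependence can be illustrated with the Wallace rule, a common rule of thumb for short oligonucleotides (about 2 degrees per A/T and 4 degrees per G/C). This rule is not from these notes and is only a rough sketch: it ignores salt, pH, length saturation and mismatches. The function name is made up for illustration.

```python
def wallace_tm(seq: str) -> int:
    """Rough Tm estimate (degrees C) for a short oligo using the Wallace rule.

    Assumption: only valid as a rule of thumb for short sequences;
    salt, pH and mismatches are ignored.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Same length, different GC content: the GC-rich oligo gets the higher estimate.
print(wallace_tm("ATATATATAT"))
print(wallace_tm("GCGCGCGCGC"))
```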
What is the effect of increasing salt concentration on base pairing
high salt reduces the specificity of base pairing at a given temperature, so a duplex containing mismatches can form and be stable at a given temperature in the presence of high salt concentration, whilst the same duplex would be unstable and dissociate at the same temperature in low salt.
examples of chemical denaturants that disrupt hydrogen bonds?
Alkali, formamide, urea
why does alkali disrupt hydrogen bonds
NaOH = Na+ + OH-, where the OH- ion disrupts H bond pairing. Fewer hydrogen bonds mean lower stability of the structure and so a lower Tm.
mismatch
base pair combination that is unable to form hydrogen bonds.
effect of mismatch on DNA stability
reduces the number of hydrogen bonds, hence lowers stability and Tm; distorts the structure and destabilises adjacent base pairing, which can lead to zipping and unzipping. It also makes the formation of a duplex less energetically favourable, reducing the change in free energy on duplex formation. It also creates shorter contiguous stretches of double stranded sequence, leading to a lower Tm.
what is reverse of denaturation called and what factors cause it
renaturation, caused by cooling or neutralisation.
how to prevent mismatches forming between 2 molecules
performing a hybridisation at the Tm of the duplex molecule.
Stringency
the concept of manipulating the conditions to select duplexes with a perfect match only. Manipulating the conditions means limiting hybridisation between imperfectly matched sequences, which lets us control specificity, e.g. by changing the temperature or the environment (such as the amount of denaturant).
low vs high stringency
low stringency is high salt and low temperature; high stringency (low salt, temperature near Tm) promotes the formation of perfectly matched base pairing only.
examples of nucleic acid based techniques
northern blotting, southern blotting, microarrays, dideoxy and next gen sequencing, PCR, cloning.
nucleic acid hybridisation techniques
Identifies the presence of NA containing a specific sequence of bases. Allows the absolute or relative quantitation of these sequences in a mixture.
what is a probe
Probes are usually between 20-1000 bases in length, depending on the technique it is used for. A probe is a sequence that uniquely identifies specific sequences, which under high stringency conditions form a duplex.
Nucleic acid blotting techniques (northern/southern blotting), disadvantages
Analysis of mRNA or DNA, can be used to identify specific RNA. Limited technique, only detects one gene at a time and small numbers of samples. It is very time consuming and messy, hence is largely superseded by quantitative PCR or microarrays.
Process of northern or southern blotting
uses extracted RNA (northern) or DNA (southern), which is separated by gel electrophoresis and then transferred by capillary flow to a nylon or nitrocellulose membrane, which captures the nucleic acid. It is covalently bonded to the membrane and then hybridised with a labelled probe complementary to the target sequence (e.g. an mRNA transcript in the sample).
Microarrays
An ordered assembly of thousands of nucleic acid probes. The probes are fixed to a solid surface, then the sample of interest is hybridised to the probes. It can simultaneously measure 50,000 different transcripts in a cell, tissue or organ.
what are microarrays used for
gene expression profiling, e.g. comparison of drug treated and untreated cells. RNA is extracted, labelled and hybridised to the array, and the amount and location of the label are measured. This tells us how much each of the transcripts in the human genome is being expressed. They can also be used to assess the presence or absence of millions of individual SNPs simply through hybridisation of genomic DNA to an array, as in genome wide association studies.
what does GWAS stand for?
Genome Wide Association studies (GWAS).
what is the formula for GC percentage
(G + C) / (G + C + A + T) x 100
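A minimal sketch of the GC percentage formula in Python. The function name is made up for illustration, and it assumes the sequence is given as a plain string of A, C, G and T (other characters are ignored).

```python
def gc_percent(seq: str) -> float:
    """Return (G + C) / (G + C + A + T) x 100 for a DNA sequence."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "GCAT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (counts["G"] + counts["C"]) / total * 100


print(gc_percent("ATGC"))  # 2 of the 4 bases are G or C
```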
Nucleic acid hybridisation techniques
Identify the presence of NA containing a specific sequence of bases
Allow the absolute or relative quantitation of these sequences in a mixture
Disadvantage of nucleic acid blotting techniques
Analysis of mRNA or DNA
Limited technique: only detects one gene at a time and small numbers of samples; the gel based techniques are time consuming and messy
Largely superseded by quantitative PCR
Antibiotics
Substances produced by fungi which are toxic to bacteria but not fungi are called antibiotics
What does an exponential amplification require
2 primers corresponding to ends of sequence
What is PCR
PCR is an enzyme based method to specifically amplify segments of DNA using a thermostable DNA polymerase in a cyclical process.
what is a chain reaction
A chain reaction is a series of events each one of which is dependent upon the preceding event to sustain itself. a series of reactions that lead to an exponential increase in the number of events occurring in a sequence.
role of DNA dependent DNA polymerase
It recognises a specific structure consisting of a partially double stranded DNA forming an initiation complex with it. It then extends a partially double stranded molecule from the 3’ end of the non-template strand.
How do we ensure that annealing occurs rather than renaturation in PCR
provide huge excess of the primer, ensuring template is in low concentration at start.
Enzyme used in PCR and its properties
DNA dependent DNA polymerase. It synthesises a new nucleic acid strand by copying a DNA molecule. It cannot copy RNA nor make RNA. RNA must first be copied to DNA by reverse transcription before it can be amplified by PCR.
What factors does DNA dependent DNA polymerase require in PCR
A template strand with a primer (10-20 bases long) annealed to it; deoxynucleotide triphosphates (dATP, dGTP, dCTP, dTTP); magnesium ions, required as a cofactor for the enzyme; and a roughly neutral pH.
What are the 3 stages of PCR
Denaturation (template becomes single stranded)
Annealing (formation of the initiating template: a duplex of the primer and template strand)
Extension (native state: optimal conditions for the extension of the initiation complex and enzyme activity, including temperature and pH).
property of the polymerase used in PCR
Must be thermostable; it is derived from a thermophilic bacterium called Thermus aquaticus (Taq polymerase). This is because, for PCR to work, the reaction must go through multiple rounds of extreme heating and cooling.
meaning of thermostability
able to retain activity, upon repeated heating to temperatures that would ‘destroy’ most enzymes.
Process of PCR
- Mix all the reactants for the PCR: enzyme, other reactants and an excess of primers
- Start the cycle with denaturation: heat the reaction to 95 degrees Celsius to denature the template and break the hydrogen bonds between bases.
- Cool the reaction to 55 degrees Celsius to allow annealing, so the primers can bind to the template strands. The two primers bind at opposite ends of the target DNA, each to the template strand it is complementary to.
- Change the temperature again to 72 degrees, which is optimal for DNA polymerase to work. An initiation complex is formed, which elongates from 3’ end of primer, creating a second strand.
- Continue to repeat the process
How to calculate product in each cycle
Every cycle results in a doubling of the product, thus there is an exponential accumulation. 3rd cycle is 8, 10th cycle is 1024, 30th cycle is one billion
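The doubling can be checked with a short sketch. It assumes ideal amplification (perfect doubling every cycle, no plateau) and a single starting copy; the function name is made up for illustration.

```python
def copies_after(cycles: int, starting_copies: int = 1) -> int:
    """Copies of product after a number of cycles, assuming perfect doubling."""
    return starting_copies * 2 ** cycles


for n in (3, 10, 30):
    print(n, copies_after(n))
# 3 8
# 10 1024
# 30 1073741824  (about one billion)
```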
PCR applications
Diagnostics: a routine diagnostic tool used for identification, confirmation and quantification of specific DNA sequences. E.g. presence/absence calling: TB detection in sputum, determining treatment response/drug efficacy. Differentiating between closely related organisms: swine flu vs human influenza, both H1N1 subtypes. "How much": determining when treatment might be commenced, e.g. HIV viral load.
What are real time PCR or quantitative PCR?
Different quantitative PCR detection methods used for diagnostics.
How to detect SNP using PCR
Adaptations of quantitative real time PCR. These methods depend upon the differences in the melting temperature (Tm) conferred upon short sequences of DNA by their nucleotide composition. They rely upon differences in the Tm of a duplex containing a single nucleotide mismatch (single nucleotide polymorphism).
PCR in forensics and law enforcement
amplification of genetic markers:
- parentage or kinship: immigration and inheritance
- identification: military casualties, missing persons or environmental disasters
- matching 2 sources: crime scene
- authentication of biological material: cell lines, purity of food.
STRs
short tandem repeats
2-5 bases in length, repeated many times at specific locations in the genome. Many different STRs are found scattered around the genome. The UK database identifies individuals using 10 STRs. They are highly polymorphic: the number of repeats varies between individuals, but they are inherited and so are similar between siblings and parents. They provide a pattern of uniquely sized products from each individual's genome, giving a molecular bar code, or DNA fingerprint.
how many STRs does the UK DNA database contain
10 STRs, and each STR allele will differ in size, giving 20 numbers and a gender indicator; together they give a matching probability of around 1 in 1 billion.
why is PCR used prior to NGS
So NGS can simultaneously sequence a large number of PCR products of candidate cancer genes.
Examples of use of PCR
- NGS: simultaneously sequencing a large number of PCR products of candidate cancer genes.
- isolating individual segments of DNA prior to cloning or sequencing.
- manipulating and modifying DNA: introducing mutations into a sequence of DNA, or modifying the ends of a sequence to make them contain restriction sites compatible with cloning vectors.
- PCR is one of the most commonly used and important tools used in recombinant DNA technology. Eg developing recombinant vaccines, pharmaceuticals, (interferons, clotting factors, tissue plasminogen activator).
Define Creationism
The idea that species are made by a supernatural intelligent creator.
what 2 things does science have to assume
Natural phenomena have natural explanations, which can be studied by scientific experiments.
what does a scientific theory need?
Make testable predictions, stand or fall according to whether the predictions are confirmed or refuted. Popper: ‘a scientific theory must be falsifiable’
what is relative fitness?
The average number of surviving progeny of a genotype (compared with competing genotypes) after one generation.
w<1 and w>1 for relative fitness
If w<1, the frequency of the allele will decrease with each generation until the allele disappears (negative selection). If w>1 the frequency of the allele will increase with each generation, until the allele reaches fixation (positive selection).
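A rough sketch of how w drives allele frequency. It assumes a very simple haploid model that is not from these notes: the competing allele has fitness 1, the population is large (no drift), and each generation the frequency p becomes p*w / (p*w + (1 - p)). Function names are made up for illustration.

```python
def next_frequency(p: float, w: float) -> float:
    """Allele frequency after one generation of selection (simple haploid model).

    Assumption: the competing allele has relative fitness 1.
    """
    return p * w / (p * w + (1 - p))


def simulate(p0: float, w: float, generations: int) -> list:
    """Return the allele frequency at generation 0, 1, ..., generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], w))
    return freqs


# w < 1: frequency falls towards 0; w > 1: frequency rises towards fixation.
print(round(simulate(0.5, 0.9, 100)[-1], 4))
print(round(simulate(0.5, 1.1, 100)[-1], 4))
```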
examples of how small mutations and large mutations can occur
small: base substitutions, small insertions and deletions. Large mutations are via large dna duplications, large deletions, insertion of transposable elements, viral insertions, chromosome rearrangements.
How does gene duplication drive evolution?
Gene duplication is a major driving force of evolution. Once a gene has been duplicated, one copy can continue to maintain the original function whilst the other can evolve new functions. There are likely to be changes both in the coding sequence (in amino acid sequence) and in control sequences.
how is it possible that γ genes can be expressed during foetal life and β genes are expressed during postnatal life?
Promoter duplicated along with coding sequence
Promoter sequence has evolved so β and γ promoters now bind different transcription factors
Interact differently with gene enhancers
Differential control of β and γ genes
what is a pseudogene?
A gene that cannot make a functional protein. For example, a pseudogene can arise as a duplication of the β-globin gene: one copy maintains the original function whilst the other can lose all function.
Fanconi’s anaemia
Recessive lethal genetic disorder, most affected patients die of bone marrow failure during childhood. Do not reproduce. Gene arises by random mutation, eliminated by natural selection, very low allele frequency.
what is modern synthesis
modern synthesis refers to the combination of natural selection with Mendelian genetics. Evolution can be seen as a logical consequence of Mendelian inheritance and ecological competition.
what types of genes does gene duplication create?
A redundant copy of gene, which can evolve to gain new functions eg globin genes. But other duplicated genes may become pseudogenes.
What phrase explains why many genetic diseases are extremely rare?
Evolutionary theory explains why many genetic diseases are extremely rare, and how others are maintained at higher frequencies by positive selection, particularly by heterozygote advantage.
Why do log base 2 and linear curves show a plateau of PCR products
As the reaction progresses it acidifies, because hydrogen ions are released as nucleotides (e.g. dAMP) are added during elongation; pyrophosphate is also produced. Each cycle incorporates primers into product, so the primers are depleted while the template concentration increases. This changes the environment in which the polymerase is working.
As a consequence of acidification and depletion of reactants, the polymerase works less efficiently, the kinetics change, and the reaction reaches a plateau where product is no longer made. On the graph, the green region marks this change in kinetics.
Eventually the reaction stops.
How would you use PCR to identify TB
Presence/absence calling of TB: detection in sputum, determining treatment response/drug efficacy. If you take a sample and perform PCR with primers specific to TB, you can identify the presence or absence of specific DNA segments that correspond to TB, and so the presence or absence of that organism within the sample.
How would you use PCR to differentiate between organisms
Differentiating between closely related organisms, e.g. swine flu vs human influenza, both H1N1 subtypes. This allows us to understand the epidemiology and how to treat them.
How would you use PCR to determine how much HIV is present and when to treat
Use it to determine when to commence treatment. With HIV we do not treat until the viral load increases to a certain level; then we commence treatment.
We can also monitor using HIV load assays to determine when we have a failure in treatment of HIV and we have emerging resistance as a consequence of mutations within a population of organisms present in an individual. HIV viral loads are done routinely in order to determine when and how we treat an individual.
How do we perform quantitative or real time PCR
We run a serial dilution of template of known quantity alongside our assay; by comparing our assay results with these standards we can identify the amount of template that we started with. These techniques use fluorescent detection of the accumulation of product.
The crossing point of the amplification (the cycle at which fluorescence crosses a threshold) is determined; it is inversely related to the logarithm of the template concentration at the start.
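A sketch of the standard curve idea: fit a straight line of crossing point against log10 of the known template amounts, then read an unknown sample off the line. The standards below are hypothetical numbers chosen for illustration, not real data, and the function names are made up.

```python
import math


def fit_line(xs, ys):
    """Least-squares straight line through (x, y) points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical serial dilution: starting copies -> measured crossing point (cycles)
standards = {1e6: 15.0, 1e5: 18.3, 1e4: 21.6, 1e3: 24.9}

slope, intercept = fit_line(
    [math.log10(copies) for copies in standards],
    list(standards.values()),
)


def estimate_copies(crossing_point: float) -> float:
    """Estimate starting copies of an unknown sample from its crossing point."""
    return 10 ** ((crossing_point - intercept) / slope)


print(round(slope, 2))              # negative: more template crosses earlier
print(round(estimate_copies(20.0)))
```

With perfect doubling each cycle the slope would be about -3.32 cycles per tenfold change in template, which is why a tenfold dilution shifts the crossing point by a little over 3 cycles.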
SNP detection from PCR
High resolution melting (HRM): at the end of the PCR we perform a melt curve. The reaction is denatured, cooled so the strands re-anneal, then heated slowly while the melting of the product is measured. The Tm is affected by the particular sequence within the amplicon, so different variants give different curves. By comparing the curve of a sample amplicon against those of known variants, we can identify the particular SNP that is present within the segment.
Allelic discrimination: specific binding of the probe to the amplified region containing the SNP is detected.
what is allelic discrimination, or probe based version of qPCR
Process where specific binding of the probe to the amplified region containing the SNP is detected.
what does HRM detect
Tm of the amplified product is used to determine which sequence is present.
what does it mean if STRs are polymorphic?
They are highly polymorphic meaning that they vary from one individual to another but are inherited and are similar between siblings and parents
how does the UK dna database identify individuals
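By a profile of 10 STRs (two allele sizes each, giving 20 numbers) plus a gender indicator, obtained by PCR and compared against the database.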
what is synonymous substitution
A substitution mutation that does not cause a change in amino acid sequence
Non-synonymous substitution
Mutation that does cause a change in amino acid composition.
Sickle cell Anaemia
Point mutation in the β globin gene
Single amino acid substitution
a hydrophilic a.a. (glutamic acid) is replaced by a hydrophobic a.a. (valine) at position 6
The mutant haemoglobin forms crystals that damage the red cell membrane, resulting in:
Cell lysis causing anaemia
Cell adhesion, causing blockage of small blood vessels, followed by tissue infarction
what does relative fitness determine?
Relative fitness, w, will determine whether the frequency of an allele increases or decreases over generations
Alpha 2 beta 2
Adult hb
Alpha 2 delta 2
Minor adult Hb (HbA2), about 2-3% of adult haemoglobin
Alpha 2 gamma 2
Foetal hb
What is NAH
Nucleic acid hybridisation: mixing DNA from two sources that have been denatured by heat or alkali to make them single stranded, then under certain conditions allowing complementary base pairing of homologous sequences
In what way are single stranded DNA sequences listed?
5’ to 3’
what are histones
basic proteins that bind DNA. Eight histones form the nucleosome core. Histone H1 binds the linker DNA.
exome
The sum of all the exon sequences (the protein-coding parts of the genes) in the genome
cis linked
regions physically close to the exons on the DNA strand. Contrast with trans regulatory regions that can be on different chromosomes.
size of human genome
3 x 10^9 bp (3 Gbp)
where are pseudo genes found in DNA
Intergenic region
What are the three RNA polymerases and their role in getting transcription?
RNA polymerase I-needed to transcribe rRNA genes
RNA polymerase II-needed to transcribe mRNA
RNA polymerase III-needed to transcribe tRNA and other small RNAs
Introns in genes
vary in number 0-311
vary in size 30bp to 1Mbp
some introns contain other genes
enhancers
upregulate gene expression-they are short sequences that can be in the gene or many kilobases distant. They are targets for transcription factors (activators)
silencers
downregulate gene expression. They are also position-independent and are also targets for transcription factors (repressors)
Insulators
short sequences that act to prevent enhancers/silencers influencing other genes
3 stages of modification of eukaryotic mRNA
Capped at 5’ end (methylated cap)
Polyadenylated at 3’ end
Intervening sequences (introns) removed
alternative splicing
Exons can be skipped or added so variations of a protein (called isoforms) can be produced from the same gene
process in which processed pseudogenes are copied from mRNA
Retrotransposition; as a result they have no promoter or introns
How does UK DNA database allow us to find individuals
Each of the alleles for e.g. CSF1PO on chromosome 5 may contain between 6 and 15 repeats. For the first STR this gives two sets of numbers between 6 and 15: 10 possible numbers for the first allele and 10 possible numbers for the second allele, i.e. 100 different combinations. If we then add in VWA, which can have between 11 and 24 repeats, we similarly have two sets of 14 numbers, i.e. 196 combinations. Combining these two STRs we have 19,600 possible combinations from just 2 STRs. If we use 10 STRs and a gender identifier we have more than 1 billion combinations. In the UK DNA database there are more than 6 million individuals.
PCR of the 10 STRs is done in the same way: primers flanking the STRs are used, one of which is labelled. The PCR products are separated by capillary gel electrophoresis and the size of each product is determined; this in turn allows the number of repeats in each STR to be determined. These numbers are combined and compared to the database to try to find a match.
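The combination arithmetic above can be sketched as follows. It follows the counting used in the notes (each allele is an independent pick from the repeat range, so an STR with n possible repeat numbers gives n x n combinations) and uses only the two repeat ranges quoted there.

```python
from math import prod

# Repeat ranges quoted in the notes: (minimum repeats, maximum repeats)
str_ranges = {"CSF1PO": (6, 15), "VWA": (11, 24)}


def combinations_per_str(lowest: int, highest: int) -> int:
    """Number of two-allele combinations for one STR, counted as n x n."""
    possible_repeat_numbers = highest - lowest + 1
    return possible_repeat_numbers * possible_repeat_numbers


per_str = {name: combinations_per_str(lo, hi) for name, (lo, hi) in str_ranges.items()}
total = prod(per_str.values())

print(per_str)  # {'CSF1PO': 100, 'VWA': 196}
print(total)    # 19600
```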
why is it possible that the PCR product may not be a multiple of 4 even if the repeated sequence consists of 4 bases
The primers will not be directly next to (i.e. immediately flanking) the STR. For example, TH01 is a repeat of 4 bases with a minimum of 4 repeats (16 bases), but the minimum amplicon size given is 163 bases. So there are an additional 147 bases between the ends of the STR and the ends of the primers.
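A small sketch of the TH01 arithmetic, converting between repeat number and amplicon size. The 147-base flank and 4-base repeat unit come from the example above; the function names are made up, and only whole repeats are handled.

```python
FLANK_BP = 147  # bases between the STR and the primer ends (163 - 4 repeats x 4 bases)
UNIT_BP = 4     # length of one TH01 repeat unit


def amplicon_size(repeats: int, flank_bp: int = FLANK_BP, unit_bp: int = UNIT_BP) -> int:
    """PCR product size for a given number of repeats."""
    return flank_bp + repeats * unit_bp


def repeats_from_size(size_bp: int, flank_bp: int = FLANK_BP, unit_bp: int = UNIT_BP) -> int:
    """Number of repeats implied by a measured PCR product size."""
    repeat_bp = size_bp - flank_bp
    if repeat_bp < 0 or repeat_bp % unit_bp:
        raise ValueError("size does not correspond to a whole number of repeats")
    return repeat_bp // unit_bp


print(amplicon_size(4))        # 163
print(amplicon_size(4) % 4)    # 3, so the product is not a multiple of 4
print(repeats_from_size(163))  # 4
```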
percentage of our genome that codes for protein
2%
what are major macro-level differences associated with?
Disease (aneuploidy, translocations, etc)
what are micro or molecular level pathogenic differences associated with and give example of these differences
Micro-molecular level pathogenic difference is sometimes associated with disease (point mutation and SCA, 3bp deletion in CFTR)
WHAT IS polymorphic
any position in the genome that varies between individuals is considered polymorphic=a variant
if we compare human genomes, how many SNPs will we find?
single base differences once every 300 bases
how are SNPs made
They are typically generated by faulty replication of DNA during mitosis. Although there are mismatch repair mechanisms, these mistakes do not get repaired.
what is polymerase slippage?
it is when the polymerase sometimes slips from the template strand during replication. This event is thought to lead to codon expansions.
describe the polymerase slippage model
If the polymerase slips, it causes the new strand to unpair (release) from the template strand. If the slip occurs at the template's codon repeat region of the Huntington gene, then when the new strand tries to reattach to the template strand, it will have many identical copies of the codon to choose from.
snv
single nucleotide variant: a change in a nucleotide that is not corrected. SNVs may be in a gene, promoter or non-coding region. When pathogenic, they may be called point mutations. They are base substitutions, generally bi-allelic, arising from mutation that escapes mismatch repair. They may do nothing, may affect a trait or may be associated with a disorder.
why are some mutations common in African populations?
beneficial in places where malaria is rife (heterozygote advantage)
mutation
new allele arises, we now have a variant
gene flow
migration leading to introduction of that variant into another population
genetic drift
random change in variant allele frequency between generations
selection
non random change in variant allele frequency between generations because presence of one allele/genotype is pathogenic (negative selection) or beneficial (positive selection)
what does repeat in tandem mean?
one after the other.
copy number variants
intergenic, quite large.
variant effects
Can be beneficial
Can be pathogenic
Most are neutral
Are these of any use?
Yes, can be used as markers to help find disease-causing genes and mutations
Autozygosity mapping & linkage studies (Microsatellites, SNPs)
Association analysis (SNPs, CNVs)
allele
An allele is one version of a particular position or locus on the genome
what is a locus
unique position in genome. A single base to entire genomic region.
what is an allele
particular form of a specific locus. A single base to entire genomic region. An individual has 2 alleles for any autosomal locus.
2 possible alleles
biallelic
3 possible alleles
triallelic
greater than 3 alleles
multiallelic
what are pedigree drawings?
standardised set of symbols
representation of pedigree drawings
males are squares, females are circles. Partners have a line between them. Siblings have a line above them. there is a line down for children. Affected people are shaded. Carriers have dots in them
what does a double line between male and female represent?
a consanguineous couple
what does sb on pedigree diagram mean
stillborn baby of unknown sex
what is the risk of child having a disease from autosomal dominant parent
50% for each child.
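The 50% figure can be reproduced by listing the possible allele combinations (a Punnett square). This sketch assumes the affected parent is heterozygous (Aa, with A the dominant disease allele) and the other parent is unaffected (aa); the function names are made up for illustration.

```python
from collections import Counter
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes from every pairing of one allele per parent."""
    return Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))


def affected_fraction(parent1: str, parent2: str, dominant_allele: str = "A") -> float:
    """Fraction of offspring carrying at least one copy of the dominant allele."""
    counts = offspring_genotypes(parent1, parent2)
    affected = sum(n for genotype, n in counts.items() if dominant_allele in genotype)
    return affected / sum(counts.values())


print(offspring_genotypes("Aa", "aa"))  # Counter({'Aa': 2, 'aa': 2})
print(affected_fraction("Aa", "aa"))    # 0.5
```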
meaning of penetrance
percentage of individuals who carry the mutation and develop symptoms of the disorder. Many dominant disorders show age dependent penetrance.
meaning of variable expressivity
variation in severity/symptoms of disorder between individuals with same mutation.
new mutation rate
de novo mutation rate varies considerably between AD conditions
somatic mosaicism
new mutation arising at early stage in embryogenesis. It is present in only some tissues/cells.
germ line mosaicism
gonadal mosaicism. a new mutation arises during oogenesis or spermatogenesis. Mutation present in variable proportion of gametes; can be transmitted to offspring.
anticipation
worsening of disease severity in successive generations; characteristically occurs in triplet repeat disorders.
describe autosomal recessive inheritance
Manifest in HOMOZYGOUS/ COMPOUND HETEROZYGOUS form
Carriers (heterozygote) not affected
Both sexes affected
Male to female and female to male transmission
Usually one generation affected
May be consanguinity
e.g. cousin marriages
compound heterozygote
2 mutations in the same gene. mutations are different
compound homozygote
2 mutations in same gene, identical mutations
X linked inheritance
women have 2 X chromosomes, so they have two copies of X-linked genes. Can be homozygous or heterozygous. Men have one X and a Y, therefore only a single copy of X linked genes, are hemizygous.
what is skewed X inactivation
normally the majority of genes on one of a woman’s X-chromosomes are inactivated
generally random but ~10% of women have uneven or skewed X-inactivation.
what are manifesting carriers
some women have some symptoms in X-linked recessive conditions e.g. cardiomyopathy in DMD.
Y linked inheritance
always and only passed from fathers to sons
what is a pathogenic mutation
results in an alteration of the function of the gene product and can cause a disease phenotype
What are isoforms
Variations of a protein
Sense and antisense strand of dna
The sense strand has the same sequence as the RNA chain being made (with T in place of U); the template strand is the antisense strand. Only one DNA strand of the double helix acts as the template strand
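A short sketch of the relationship between the two strands and the RNA. Sequences are written 5' to 3'; the example sequence and function names are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def transcribe(sense: str) -> str:
    """RNA has the same sequence as the sense strand, with U in place of T."""
    return sense.upper().replace("T", "U")


sense = "ATGGCC"
template = reverse_complement(sense)  # antisense strand, read by the polymerase

print(template)           # GGCCAT
print(transcribe(sense))  # AUGGCC
```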
How do bacterial plasmids act as vectors ?
Using PCR to amplify DNA, restriction enzymes to cut it and DNA ligase to rejoin it, we can manipulate DNA and make and insert recombinant genes into plasmids.
We can then transform the bacteria, where the plasmids will replicate and be maintained.
We can then isolate bacteria that will express the recombinant gene
what is recombinant protein
recombinant proteins are proteins that have been generated from vectors to be produced in large quantities for manufacture use
what are transgenic organisms
organisms that have altered genomes
what are nucleases
enzymes that degrade nucleic acids by hydrolysing phosphodiester bonds.
Ribonuclease RNase: degrade RNA
Deoxyribonuclease (DNase): degrade DNA
Exonuclease: degrade from end of molecule
Endonuclease: cleave within nucleotide chain.
what are restriction endonucleases
restriction: limit transfer of nucleic acids from infecting phages into bacteria. There are many different enzymes from different bacteria. They do 2 things, they recognise a specific sequence, and they cut that sequence.
what do restriction enzymes do?
restriction enzymes recognise specific DNA sequences, and they catalyse the hydrolysis of phosphodiester bonds.
what are restriction maps
map of restriction sites within a molecule. they are a useful way of describing plasmids
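A minimal sketch of building a restriction map for one enzyme. It uses EcoRI as the example (recognises GAATTC and cuts after the first G); the test sequence is made up, the molecule is treated as linear, and only one strand is considered.

```python
def find_sites(seq: str, site: str = "GAATTC") -> list:
    """0-based start positions of every occurrence of the recognition site."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Fragments of a linear molecule after cutting at every site.

    cut_offset is how many bases into the site the cut falls (1 for EcoRI: G^AATTC).
    """
    seq = seq.upper()
    cuts = [position + cut_offset for position in find_sites(seq, site)]
    edges = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(edges, edges[1:])]


example = "AAGAATTCGGGAATTCTT"  # made-up sequence with two EcoRI sites
print(find_sites(example))  # [2, 10]
print(digest(example))      # ['AAG', 'AATTCGGG', 'AATTCTT']
```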
role of dna ligase
creates phosphodiester bonds
phosphatase enzyme
hydrolyses a phosphate group off its substrate. Calf intestinal alkaline phosphatase, or shrimp alkaline phosphatase. It should be used to prevent cut plasmids from resealing.
role of polynucleotide kinase
adds phosphate to 5’ hydroxyl group of DNA or RNA.
Why use a polynucleotide kinase?
To phosphorylate chemically synthesised DNA so that it can be ligated to another fragment.
To sensitively label DNA so that it can be traced using:
-radioactively labeled ATP
-fluorescently labeled ATP
reverse transcriptase
RNA dependent DNA polymerase
Isolated from RNA containing retroviruses.
Synthesises a DNA molecule complementary to mRNA template using dNTPs.
phages
bacterial viruses eg lambda
non-primate lentiviruses
vectors used to integrate dna in mammalian cells
Baculoviruses
vectors used in combination with recombinant expression in insect cells (a eukaryotic expression system)
vectors
cut down version of naturally occurring plasmids and are used as molecular tools to manipulate genes.
what are the important features of plasmid vectors
they can be linearised in non essential regions of DNA
Can be re-circularised without loss of ability to replicate
Replicate at high copy number
contain selectable markers such as antibiotic resistance, eg ampicillin or tetracycline. They are relatively small, often between 4 and 5 kilobases.
Why do we produce recombinant proteins in bacteria
We use them to investigate their properties
To develop and produce therapeutics
How do plasmids allow us to add functionality, for example
They allow us to:
Express a recombinant gene in an organism of our choice (prokaryote or eukaryote)
Modify its control elements eg switch it on or off at will, or express it at high levels on demand
Alter the properties of a gene product
-to make it secreted extracellularly or into the periplasmic space
Add a peptide tag or join it to another protein
Make it useful as a therapeutic
How much are recombinant proteins or peptides used in bio pharmaceuticals
30%
Examples of recombinant proteins in clinical use
Human insulin
Interferons a and b
Erythropoietin for kidney disease and anaemia
Factor XIII
Tissue plasminogen activator -embolism, stroke
What is coding sequence in a gene
Coding sequence is the part of the gene coding for the protein, not including UTRs nor any intronic or regulatory sequences such as promoters or enhancers
What is the Shine-Dalgarno sequence
It is the ribosomal binding site found around 8 nucleotides before the start codon in the RNA in prokaryotes. Remember that the RNA of this group of organisms is not capped.
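A rough sketch of spotting a Shine-Dalgarno site upstream of a start codon. It assumes the consensus motif AGGAGG with exact matching only, and an illustrative spacing window of 4 to 12 nucleotides; real sites vary, and the function name and test sequence are hypothetical.

```python
SD_MOTIF = "AGGAGG"  # consensus motif; assumption: exact matches only


def find_sd_start_pairs(mrna, min_gap=4, max_gap=12):
    """Return (sd_pos, aug_pos, gap) for each motif followed by AUG within the window.

    gap is the number of nucleotides between the end of the motif and the AUG.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA or RNA spelling
    hits = []
    sd = mrna.find(SD_MOTIF)
    while sd != -1:
        sd_end = sd + len(SD_MOTIF)
        for gap in range(min_gap, max_gap + 1):
            aug = sd_end + gap
            if mrna[aug:aug + 3] == "AUG":
                hits.append((sd, aug, gap))
                break  # keep only the nearest start codon for this motif
        sd = mrna.find(SD_MOTIF, sd + 1)
    return hits


print(find_sd_start_pairs("CCAGGAGGUUAACUAAAUGGCU"))
```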
Two types of promoters
Constitutive: always on. Allows a culture of cells to express the foreign protein to a high level
Fine if the protein isn’t toxic to E. coli
Bad idea if it is
Inducible: molecular switch
Allows large cultures to be grown without expressing the foreign protein
Induced in response to a defined signal
Describe the use of inducible promoters and transcriptional derepression
Typically uses the lac operator, which is derepressed by addition of the lactose mimic IPTG
Why are some proteins best made in eukaryotes
Many pharmacologically useful proteins are heavily modified and will not be appropriately processed in bacteria
Eg interferons, usually by glycosylation. Some proteins retain biological activity and some don’t
Therefore they are expressed in a eukaryotic system
What is reduced penetrance a characteristic of?
Characteristic of dominant inheritance
What is the meaning of pedigree
Family tree
How do You know a disease is not sex linked?
If it is equally distributed between males and females
Role of restriction endonucleases
Recognise a specific sequence
Cut that sequence
autosomal dominant
Manifest in HETEROZYGOUS form
Multiple generations affected
Both sexes affected
Male to female & female to male transmission
Most will have an affected parent
50% risk to offspring
what is age dependent penetrance
Age dependent penetrance: someone might be heterozygous for a disease allele and be healthy, then suddenly develop the disease as they age
what does mosaicism mean
a mixed population.
what is pleiotropy
one gene influences 2 or more unrelated phenotypic traits.
dna polymerase slippage model
Sometimes the polymerase slips from the template strand during replication. It is this event, called polymerase slippage, that many researchers believe holds the key to codon expansions. According to the polymerase slippage model, if the polymerase slips, it causes the new strand to unpair (release) from the template strand. If the slip occurs at the template’s codon repeat region of the Huntington gene, then when the new strand tries to reattach to the template strand, it will have many identical copies of the codon to choose from. With so many identical codon copies to reattach to, the new strand may reattach to the template at the wrong copy, usually one more distant than the copy that was adjacent to the polymerase before it slipped. As a result of this misplacement, the new strand forms a bubble of unpaired bases, which represents the expansion of the new strand. Once DNA replication is complete, an unknown mechanism allows the template strand to realign with the new strand and bring the bases from the bubble back into line with the template strand. The bases are then paired with their corresponding partner bases (cytosine (C) to guanine (G); adenine (A) to thymine (T)). In the end, the brand new double helix of DNA contains more CAGs in the repeat region of the Huntington gene than existed before. Polymerase slippage has caused expansion.
what is copy number duplication
The simplest type of copy number variation is the presence or absence of a gene.
An individual’s genome could therefore contain two, one, or zero copies.
describe non allelic homologous recombination
Driven by fact that you can get sequence similarity between different bits of chromosomes. When homologous chromosomes align, they are looking for sequence similarities, they are looking for their partner, but because the sequences can misalign, it can shift the chromosome. This is a problem as when recombination does occur, you can end up having a deletion on one, or a duplication on another.
what is allele
particular form of a specific locus
Single base to entire genomic region
what is locus
Locus = unique position in genome
single base to entire genomic region
Polymorphism
Any genetic variation. Different types of polymorphism are SNVs, microsatellites, CNVs etc
SNVs are generated when mismatch repair goes wrong
What to call a genetic variant that is pathogenic
A mutation
Describe plasmids
Discrete Circular dsDNA molecules found in many but not all bacteria
Are a means by which genetic information is maintained in bacteria
Are genetic elements (replicons) that exist and replicate independently of the bacterial chromosomes and are therefore extra-chromosomal
Can normally be exchanged between bacteria within a restricted host range (eg plasmid borne antibiotic resistance)
describe features of plasmid vectors
Can be linearized at one or more sites in non-essential stretches of DNA
Can have DNA inserted into them
and can be re-circularised without loss of the ability to replicate
Are often modified to replicate at high multiplicity (copy number) within a host cell
Contain selectable markers
Are relatively small 4-5kb in size
why is the linearisation of plasmids important?
Plasmids can be linearised at one or more sites. If you cut the DNA and then re-circularise it, something can be inserted at the cut site. If the cut is in an essential region, this disrupts that segment of DNA and causes a loss of function. We must therefore be able to linearise the plasmid by cutting with a single enzyme at a single site in a non-essential region. This means DNA can be inserted and the plasmid can still be re-circularised, still function and, importantly, still replicate.
define vector
a plasmid, phage or cosmid into which foreign DNA can be inserted for cloning
why do we use plasmids as recombinant tools
To express a recombinant gene in a living organism of choice, for example to find out what the function of that particular protein might be.
It can be expressed in either a prokaryote or a eukaryote.
To add or modify the control elements that control expression of a particular protein in the plasmid or vector. As a consequence we can:
Make it inducible (switch it on or off) or express it to high levels on demand, or understand its regulation, eg its own promoter or its own elements.
Alter the properties of the gene product:
Make it secreted extracellularly or into the periplasmic space
Fuse it to a peptide tag or other protein, joining other bits together to give useful properties
Make it useful as a therapeutic
Make it into a fusion protein, a recombinant protein made by joining parts of different proteins. This can be done in order to make it easier to purify, or to understand where it is in a cell.
Synagis - Respiratory Syncytial Virus
Herceptin - HER-2 positive breast cancer
Remicade (Infliximab) - Rheumatoid arthritis
Humira (Adalimumab) - Crohn’s, Plaque Psoriasis
Xolair (Omalizumab) - Asthma
remember that
What control elements are required for expression in bacteria?
The coding sequence is the part of the gene coding for the protein, not including the UTRs nor any intronic or regulatory sequences such as a promoter or enhancers
The Shine-Dalgarno sequence is the ribosomal binding site found around 8 nucleotides before the start codon in the RNA in prokaryotes. Remember the RNA of this group of organisms is not capped
The promoter is the gene element that is involved in regulation and initiation of transcription
The transcriptional terminator is a sequence that terminates transcription and initiates the dissociation of the transcription complex
what is the Shine-Dalgarno sequence
The Shine-Dalgarno sequence is the ribosomal binding site found around 8 nucleotides before the start codon in the RNA in prokaryotes. Remember the RNA of this group of organisms is not capped
constitutive promoter
Constitutive – always on
allows a culture of cells to express the foreign protein to a high level
fine if the protein isn’t toxic to E.coli
Bad idea if it is toxic to E. coli as it will kill the bacteria.
inducible promoter
Inducible – molecular switch
allows large cultures to be grown without expressing the foreign protein,
induced in response to a defined signal. Therefore it can be switched on or off, can be expressed at high level before it kills organism.
what are the 2 tags used in gene fusions
Glutathione S-transferase (GST) and the 6-histidine (6xHis) tag.
Differences with PCR and dideoxy chain termination
Does not have temperature changes
Only uses a single primer
Results in linear amplification
Does not regularly use thermostable polymerase
what are germ line and somatic mutations and de novo mutations
germ line: passed onto descendants, somatic mutations are not transmitted to descendants. De novo mutations are not inherited from either parent.
what is gene flow
the movement of genes from one population to another (eg migration). It is an important source of genetic variation
What is genetic recombination?
It is the shuffling of chromosomal segments between partner (homologous) chromosomes of a pair.
What is difference between mutation and polymorphism?
Mutation is rare change in the DNA sequence that is different to the normal (reference) sequence. A polymorphism is a DNA sequence variant that is common in the population-no single allele is regarded as the normal allele, there are 2 or more acceptable alternatives.
What is MAF to be classed as polymorphism?
equal to or greater than 1% of population.
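A small sketch of applying the 1% cut-off, assuming a biallelic variant and genotype counts from a population sample. The function names and counts are illustrative only.

```python
def minor_allele_frequency(n_ref_hom, n_het, n_alt_hom):
    """Frequency of the less common allele, from genotype counts at a biallelic site."""
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    alt_freq = (2 * n_alt_hom + n_het) / total_alleles
    return min(alt_freq, 1 - alt_freq)


def is_polymorphism(maf, threshold=0.01):
    """Apply the arbitrary MAF >= 1% cut-off from the card above."""
    return maf >= threshold


maf = minor_allele_frequency(90, 9, 1)  # hypothetical counts for 100 people
print(maf, is_polymorphism(maf))
```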
what is haplotype ?
a group of alleles that are inherited together from a single parent . The order of alleles along a chromosome.
Mendelian/Monogenic disease
Disease that is caused by a single gene, with little or no impact from the environment (PKD)
Non-Mendelian/Polygenic disease
diseases or traits caused by the impact of many different genes, each having only a small individual impact on the final condition (psoriasis)
What is Linkage analysis
A method used to map the location of a disease gene in the genome. The term linkage refers to the assumption of two things, being physically linked to each other.
what is physical proximity?
Using genetic markers to identify the location of a disease gene based on its physical proximity.
genetic maps
They look at information in blocks or regions (similar to zones on a tube map).
What is genetic linkage?
The tendency for alleles at neighbouring loci to segregate together at meiosis. Therefore to be linked, two loci must lie very close together.
What are the two types of genetic markers?
Microsatellite markers, and single nucleotide polymorphisms.
What is LOD score used for?
The probability of linkage can be assessed using a LOD score. LOD is the logarithm of the odds score. It compares the likelihood of observing the data if the two loci are linked with the likelihood of observing the same data purely by chance, i.e. it calculates a likelihood ratio of observed vs expected (no linkage).
What is the recombination fraction?
The proportion of recombinant births. The higher the LOD score, the higher the likelihood of linkage.
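A minimal sketch of the two quantities above, assuming the simplest case where every birth can be scored as recombinant or non-recombinant (phase known). The function names and counts are illustrative. A LOD of 3 or more is conventionally taken as evidence of linkage.

```python
import math


def recombination_fraction(recombinants, non_recombinants):
    """Proportion of births that are recombinant."""
    total = recombinants + non_recombinants
    if total == 0:
        raise ValueError("no births scored")
    return recombinants / total


def lod_score(recombinants, non_recombinants, theta):
    """log10 of likelihood(data | recombination fraction theta) / likelihood(data | theta = 0.5)."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    total = recombinants + non_recombinants
    log_linked = recombinants * math.log10(theta) + non_recombinants * math.log10(1 - theta)
    log_unlinked = total * math.log10(0.5)  # free recombination, i.e. no linkage
    return log_linked - log_unlinked


theta = recombination_fraction(1, 9)  # hypothetical family data
print(theta, lod_score(1, 9, theta))
```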
What is genetic association?
The presence of a variant allele at a higher frequency in unrelated subjects, with a particular disease (cases), compared to those that do not have the disease (controls), or for particular traits compared to those that do not have this trait.
what does GWAS do
use markers across the whole genome (SNP Microarrays)
Look for association between disease and each marker-chi squared test
This has resulted in the detection of large numbers of disease associated genes.
GWAS data is presented as a single graph called Manhattan plot. The X axis is the position of the SNP on the chromosome.
The Y axis is -log10 (p value) of the association.
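A sketch of the test behind one point on a Manhattan plot, assuming a simple 2x2 table of allele counts (cases vs controls) and a chi squared test with 1 degree of freedom. The counts and function names are made up.

```python
import math


def allelic_chi_squared(case_a, case_b, ctrl_a, ctrl_b):
    """Chi squared statistic and p value (1 df) for a 2x2 allele count table."""
    n = case_a + case_b + ctrl_a + ctrl_b
    margins = (case_a + case_b) * (ctrl_a + ctrl_b) * (case_a + ctrl_a) * (case_b + ctrl_b)
    if margins == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / margins
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi squared with 1 df
    return chi2, p_value


def manhattan_y(p_value):
    """Y axis value on a Manhattan plot."""
    return -math.log10(p_value)


chi2, p = allelic_chi_squared(60, 40, 40, 60)  # hypothetical allele counts
print(chi2, p, manhattan_y(p))
```

Because millions of SNPs are tested, a much stricter p value threshold than 0.05 is needed before a peak is believed.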
WTCCC
Wellcome Trust Case Control Consortium
What are the problems with GWAS
GWAS has identified associations that are statistically strong and reproducible. However, their contribution to the genetic component of disease is estimated to be low. This may be because of many common SNPs of small effect, rare SNPs, copy number variation and epigenetic variation.
What are association studies
They are undertaken by comparing the frequency of a particular variant in affected patients with its frequency in a carefully matched control group. This is described as a case control study. If the frequency in the two groups differ significantly, this provides evidence for an association
uses of DNA sequencing
In Research for example
Mammalian and Pathogen Gene sequencing
Clone or PCR Amplicon sequencing to confirm a cloned or site-directed mutagenesis
“Walking” a gene to identify a causative mutation in candidate gene studies
Confirmation of causative variants associated with genetic disease following association study
Health
Today dideoxy sequencing is still the gold standard confirmatory test for specific genetic mutations in patients with suspected genetic diseases
Used to confirm all types of mutation
Silent, Missense, Nonsense, Truncating, Indel, and Mis-Splicing
The one exception is low frequency mosaicism
Identifying HIV haplotypes resistant to antiretrovirals (HAART): when patients are treated with antivirals, we can identify mutations within that population of molecules and whether the individual will need treatment. The problem is that Sanger sequencing reads the average of the population of molecules, which means it will fail to detect low frequency variants (mosaicism).
mutation vs polymorphism
A mutation is a rare variant
Polymorphism means common variation
We have a standard reference sequence. A variant different to the reference sequence is called a mutation. Polymorphisms tend to be common variants
They are different to reference sequences. You don’t know which is the normal allele
The arbitrary cut off point is a minor allele frequency of 1%
Homologous recombination is important to think about linkage analysis
what is genetic association?
Genetic Association is the presence of a variant allele at a higher frequency in unrelated subjects with a particular disease (cases), compared to those that do not have the disease (controls)
For disease we could use the broader term “trait”, for example height is not a disease
what are the problems with GWAS
GWAS has identified associations that are statistically strong and reproducible
However, their contribution to the genetic component of disease is estimated to be low (<5%)
Possible answers:
Many common SNPs of small effect
Rare SNPs
Copy Number Variation
Epigenetic variation
Heritability is overestimated
what is NGS
They use an in vitro cloning step to amplify individual DNA molecules by emulsion or bridge PCR.
USED FOR PERSONALISED MEDICINE, genetic diseases and clinical diagnostics.
Can sequence multiple individuals at the same time.
In principle, the concepts behind Sanger vs. next-generation sequencing (NGS) technologies are similar. In both NGS and Sanger sequencing (also known as dideoxy or capillary electrophoresis sequencing), DNA polymerase adds fluorescent nucleotides one by one onto a growing DNA template strand. Each incorporated nucleotide is identified by its fluorescent tag.
The critical difference between Sanger sequencing and NGS is sequencing volume. While the Sanger method only sequences a single DNA fragment at a time, NGS is massively parallel, sequencing millions of fragments simultaneously per run. This high-throughput process translates into sequencing hundreds to thousands of genes at one time. NGS also offers greater discovery power to detect novel or rare variants with deep sequencing.
What is a DNA library
A DNA library is a collection of random DNA fragments of a specific sample to be used for further study; in our case next generation sequencing
process of ngs
-check with original notes
Library preparation: libraries are created using random fragmentation of DNA, followed by ligation with custom linkers
Amplification: the library is amplified using clonal amplification methods and PCR
Sequencing: DNA is sequenced using one of several different approaches
two ways of clustering microarray results
- clustering in circles within graph
- dendrograms
what is qPCR and what is rt PCR
qPCR is real time PCR, where PCR is made quantitative. RT PCR is reverse transcriptase PCR, where RNA is made into copy DNA (cDNA), which is then amplified.
what’s in a spot in a microarray
each spot contains lots of copies of the same oligonucleotide probe.
This is a single stranded piece of DNA approximately 20-30 nucleotides long.
Each probe is designed to hybridise with one SNP.
Here’s a gene Kate showed last week containing a SNP at this position here.
EVC is the Ellis van Creveld gene
The SNP is denoted by the red Y. In the IUPAC list of DNA base symbols Y means Cytosine or Thymine, i.e. it is one or other of the two pYrimidine bases in a sequence
We take that DNA sequence and we design a probe complementary to the region next to the SNP
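A short sketch of what an IUPAC ambiguity code like Y means in practice: it expands a sequence containing ambiguity codes into every concrete sequence it could stand for. The function name and the example sequence are hypothetical, not the real EVC sequence.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_iupac(seq):
    """List every unambiguous sequence matching an IUPAC-coded sequence."""
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]


print(expand_iupac("ACYG"))  # Y is C or T, so two possible alleles
```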
whats in a spot in a microarray?
Lots of copies of the same probe in a spot
Each spot gives the genotype for one SNP
Up to 5 million spots per sample!
Genome wide analysis possible
What is the epigenome
The sum of all the (heritable) changes in the genome that do not occur in the primary DNA sequence and that affect gene expression
An epigenetic change results in “a change in phenotype but not in genotype”
What are the 4 epigenetic mechanisms
DNA Methylation
Histone modification
X-inactivation
Genomic Imprinting
Describe DNA methylation
DNA methylation in humans is the addition of a methyl group at the 5 position (carbon 5) of a cytosine
This is catalysed by DNA methyltransferase enzymes
DNMT1, DNMT3a and DNMT3b
It requires S-Adenosyl Methionine to provide the methyl group
In differentiated cells it occurs in CpG dinucleotides
In general, DNA Methylation turns transcription off by preventing the binding of transcription factors
DNA methylation patterns change during development and are an important mechanism for controlling gene expression
Describe histone modification
This is the addition of chemical groups to the proteins that make up the nucleosome
There are a large number of known histone modifications (>100) and many are of unknown function
Common modifications include acetylation and methylation.
Large range of enzymes catalyse modification
Modifications are named based on the histone, the amino acid and the actual modification
For example, H3K4Me3 means that on Histone 3, the Lysine at position 4 is tri-methylated
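A small sketch of reading the naming convention above. It is an assumption that only methylation (me) and acetylation (ac) on K, R, S or T residues need handling; the parser name and the returned field names are made up.

```python
import re

# Histone, residue letter, position, modification, optional count (e.g. H3 K 4 me 3).
HISTONE_MARK = re.compile(r"^H(\d[AB]?)([KRST])(\d+)(me|ac)(\d?)$", re.IGNORECASE)
RESIDUES = {"K": "lysine", "R": "arginine", "S": "serine", "T": "threonine"}
MODIFICATIONS = {"me": "methylation", "ac": "acetylation"}


def parse_histone_mark(name):
    """Split a mark such as H3K4Me3 into histone, residue, position and modification."""
    match = HISTONE_MARK.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognised histone mark: {name!r}")
    histone, residue, position, modification, count = match.groups()
    return {
        "histone": "H" + histone.upper(),
        "residue": RESIDUES[residue.upper()],
        "position": int(position),
        "modification": MODIFICATIONS[modification.lower()],
        "count": int(count) if count else 1,  # no number means a single group
    }


print(parse_histone_mark("H3K4Me3"))
```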
What are the histone modifiers
Writers
Histone Acetyltransferase - HAT1
Histone Methyltransferase - EHMT1
Erasers
Histone Deacetylase - HDAC1
Histone Demethylase - KDM1
Readers
Bromodomain and extra-terminal (BET) proteins – BRD2
Chromodomain proteins – CBX1
Role of histone modification
Histone acetylation at Lysine residues relaxes the chromatin structure and makes it accessible for transcription factors
Histone methylation is more complex and can repress or activate transcription depending on where it occurs
Histone modifications can occur concurrently and so their effects can interact or modify each other
Describe X inactivation
This is the inactivation of one of the two X chromosomes in every somatic cell in females
This is needed as the Y chromosome has virtually no genes, so there is only one copy of each X chromosome gene in males (hemizygosity)
X-inactivation ensures that every somatic cell in all humans has the same number of active copies of every gene
What is genomic imprinting
Imprinting is the selective expression of genes related to the parental origin of the gene copy
Every autosomal gene has one paternal and one maternal copy
Imprinted genes tend to be found in clusters
There are very few imprinted genes (~250)
How do we imprint genes?
Imprinting is mediated by imprinting control regions (ICRs)
One copy is silenced by DNA methylation catalysed by DNMT3a and histone methylation leading to inactivation
LncRNAs are essential to the process
Imprinting patterns are reset during gamete formation
What is WES
Whole exome sequencing, used to capture the sequence of the coding region of the genome
sexual determination and differentiation
determination: Genetically controlled process dependent on the ‘switch’ on the Y chromosome.
Differentiation:The process by which internal and external genitalia develop as male or female.
The two processes are contiguous and consist of several stages
What is the SRY
The Sex determining region Y (SRY) switches on briefly during embryo development (>week 7) to make the gonad into a testis. In its absence an ovary is formed. Testis develop cells that make 2 important hormones:
-sertoli cells produce Anti-mullerian hormone (AMH)
-Leydig cells make testosterone
Products of the testis influence further gonadal and phenotypic sexual development
What are the three waves of cell that invade the genital ridge at gonadal development
3 waves of cells invade the genital ridge:
Primordial Germ Cells – become Sperm (male) or Oocytes (female).
Primitive Sex Cords – become Sertoli cells (male) or Granulosa cells (female).
Mesonephric Cells – become blood vessels and Leydig cells (male) or Theca cells (female)
What is primordial germ cell migration
An initially small cluster of cells in the epithelium of the yolk sac expands by mitosis at around 3 weeks.
They then migrate to the connective tissue of the hind gut, to the region of the developing kidney and on to the genital ridge – completed by 6 weeks.
Mesonephric cells?
These originate in the mesonephric primordium which are just lateral to the genital ridges.
In males they act under the influence of pre-sertoli cells (which themselves express SRY) to form…
Vascular tissue
Leydig cells (synthesize testosterone, do not express SRY)
Basement membrane – contributing to formation of seminiferous tubules and rete-testis
In females without the influence of SRY they form…
Vascular tissue
Theca cells (synthesize androstenedione which is a substrate for estradiol production by the granulosa).
what allows dna fragments to attach to the flow cell
Adapters at each end of the fragment (P5 and P7) anchor the DNA to the flow cell
compare targeted 16S PCR to whole genome shotgun sequencing
Targeted 16S PCR amplification
Assess taxonomic diversity in sample
Biased, only bacteria
Whole genome shotgun sequencing
Assess taxonomic diversity in sample
Assess composite gene functions in sample
Unbiased, all micro-organisms
what is metagenomics
Metagenomics is the study of genetic material recovered directly from environmental or biological systems/compartments
Unbiased view of taxonomic diversity in a sample
Not limited by ability to culture
Overall view of gene content in a sample
What is SCD
Sudden cardiac death
Death from definite or probable cardiac causes within one hour of onset of symptoms
Describe subunits of POLG and their roles
Mitochondrial DNA polymerase (polymerase gamma)
Heterotrimer protein
-one catalytic subunit (POLγA)
-two accessory subunits (POLγB)
-encoded by different genes in the nucleus
POLγA contains a 3’-5’ exonuclease domain to proofread newly synthesised DNA; it corrects errors by cutting them out
POLγB enhances interactions with the DNA template and increases the activity and processivity of POLγA
Metacentric
50:50
Submetacentric
Chromosomes with short arms at top and long arms at bottom
Acrocentric
Satellite at top
Long arms at bottom
Chromosome is ^
Micro level differences
Pathogenic differences sometimes associated with disease eg point mutation, SCA, 3bp deletion in CFTR
Macro level differences
Generally associated with disease, aneuploidy, translocations etc
Why are genetic variants most likely to be neutral
Depends on the type of variant (lots of variants in every gene-some pathogenic, some not; depends on the environment).
Are promoters found in coding or non coding sections
Non coding
Genetic variation
Differences in DNA sequence between individuals
Inherited or due to environmental factors
Syntenic
Genes close together on same chromosome
What is heteroplasmy
Mutation load which can be quantified with NGS
Need 80% or more heteroplasmy or mutant dna to then develop a disease
However inheritance of mutation load is random
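A minimal sketch of quantifying heteroplasmy from NGS read counts at one mitochondrial position. It uses the 80% threshold from the card above; the function names and read counts are made up.

```python
def heteroplasmy_level(mutant_reads, wild_type_reads):
    """Fraction of reads at a mitochondrial position that carry the mutant allele."""
    total = mutant_reads + wild_type_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return mutant_reads / total


def exceeds_disease_threshold(level, threshold=0.80):
    """True if the mutation load is at or above the threshold (80% per the card)."""
    return level >= threshold


level = heteroplasmy_level(850, 150)  # hypothetical read counts
print(level, exceeds_disease_threshold(level))
```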
Difference in linkage analysis and association analysis
Linkage analysis is finding the map location of a disease gene in the genome (where the variant is), using the tendency for alleles at neighbouring loci to segregate together at meiosis.
Association analysis is the presence of a single variant allele at a higher frequency in unrelated subjects with a particular disease than in control subjects without the disease.
GWAS Manhattan plot x axis vs y axis
X axis is position of snp on chromosome
Y axis is -log10(p value) of the association
WTCCC diseases
Analysed 2000 samples from each of 7 diseases:
Type 1 diabetes
Type 2 diabetes
Coronary heart disease
Hypertension
Bipolar disorder
Rheumatoid arthritis
Crohn’s disease
Controls come from the 1958 British birth cohort and others are blood donors
Proteins that bind to histone tails
Writers add histone modifications
-histone acetyltransferase
-histone methyltransferase
Erasers remove modifications
-histone deacetylase
-histone demethylase
Readers bind to the modifications and affect gene activity, chromatin condensation and accessibility
-bromodomain and extra-terminal proteins
-chromodomain proteins
What is imprinting
The selective expression of genes related to the parental origin of gene copy
Meaning of imprinting genes
Selective expression of only the maternal or only the paternal copy of a gene. In an egg, imprints are erased and rewritten with maternal imprints; in sperm, imprints are erased and rewritten with paternal imprints, whichever parent the gene copy originally came from.
Imprinted genes are found in clusters. Imprinting is mediated by imprinting control regions.
One copy is silenced by DNA methylation (DNMT3a) and histone methylation, leading to inactivation.
Imprinting patterns are reset during gamete formation
Epigenetic targets
Important in gene expression, therefore could be a good target for drugs. In cancer, a lot of the genome becomes hyper- or hypomethylated. If we could affect that, we could control cancer. Drugs can inhibit methylases or demethylases.
Global DNA methylation is known to be altered in tumour cells.
Hypermethylation of tumour suppressor genes silences them, which results in tumours
(as methylation suppresses gene expression)
Hypomethylation of tumour-activating genes can also result in cancer
Epigenetic enzymes often mutated in tumour cells.
Histone acetyl transferases, methyltransferases, Kinases, readers etc
Pharmacoepigenetic drugs
DNA methyltransferase inhibitors are used as standard treatment for myelodysplastic syndrome
- 5-Azacytidine
- Myelodysplastic syndrome
Histone deacetylase inhibitors
- romidepsin (istodax)
- cutaneous T cell lymphoma
Describe process of x inactivation
Picture of process found in phone
What is a dna library
Collection of random dna fragments of a specific sample to be used for further study; next generation sequencing
Describe NGS
Prep DNA sample: cut into fragments
Repair the ends of the fragments with polymerase (end repair)
Add adenine bases to create an A-tail overhang
Ligate adapters, which carry a thymine overhang, to the A-tailed fragments (adapter ligation)
Illumina SBS sequencing machine then performs NGS (adapters contain primer binding sites to allow for sequencing).
Hybridise DNA library fragments to a flow cell; they attach to the surface of the flow cell as single molecules
Single molecules are too small to see, so perform PCR to make clusters big enough to be visualised.
Now that clusters are made on the flow cell, it is ready to be loaded onto the sequencing platform to perform sequencing.
Polymerase incorporates a terminator base; each base carries a different fluorescent dye
Wash flow cell
Image
Cleave the terminator so the next base can be added
Repeat the process
-have billions of clusters, each originating from a single DNA library molecule
Machine tells you how confident it is that each base is correct
Get identification number of the sequence
Parallel process
Short read sequences from gene then re assembled.
Can compare consensus sequence against the human genome reference and look for the genetic variants. Dedicated software and bioinformatics tools will achieve this
NGS vs Sanger
Sanger reads are 800bp; NGS reads are 100-200bp
NGS produces a digital readout; Sanger produces an analogue readout
NGS produces a consensus sequence of many reads; Sanger is one sequence read
Can look for shared mutations, identify mutation
Third generation sequencing
Oxford nanopore sequencing
Single molecule sequencing
No PCR
Dna passes through a nanopore
Base sequence converted into an electrical current
New technologies are being applied
The principles of this technology are different
Nanopores are cell membrane proteins; DNA is forced through a nanopore and this generates an electrical signal, which gives rise to the sequence. Able to sequence larger sequences, 10Mbp
RNA sequencing
NGS is also used to study RNA, using total RNA or mRNA from a collection of cells and tissues
RNA is first converted to cDNA prior to library construction
NGS of RNA samples determines which genes are actively expressed
A single experiment can capture the expression levels of thousands of genes
The amount of sequence obtained from each gene is an indication of how abundantly that gene is being expressed.
Calculate differences in gene expression of all genes in experimental conditions
With appropriate analysis, rna sequence can be used to discover distinct forms of genes that are differentially regulated and expressed
What are the assumptions we make in bioinformatics
Candidate gene filtering using WES
Ignore structural variants and other forms of genetic variation-just target coding regions
Assumes causal variant is in the coding sequence, ignoring regulatory and other non coding variants outside of exon definitions.
Assumes causal variant alters protein sequence, ignoring rare cases of functional synonymous changes
(Remove synonymous variants)
Assumes causal variant has complete penetrance (remove previously identified variants)
Assumes causal variant has complete detectance (restrict to variants fitting dominant/recessive model of inheritance).
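The filtering assumptions above can be sketched as a chain of simple filters. Everything here is hypothetical: the Variant fields, the gene names and the rule that recessive means homozygous (compound heterozygotes are ignored for simplicity).

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    in_coding_region: bool
    consequence: str           # e.g. "missense", "nonsense", "synonymous"
    previously_reported: bool  # already present in a variant database
    genotype: str              # "het" or "hom"


def filter_candidates(variants, model="recessive"):
    """Keep coding, non-synonymous, novel variants that fit the inheritance model."""
    wanted_genotype = "hom" if model == "recessive" else "het"
    return [
        v for v in variants
        if v.in_coding_region                  # just target coding regions
        and v.consequence != "synonymous"      # assume the protein sequence is altered
        and not v.previously_reported          # remove previously identified variants
        and v.genotype == wanted_genotype      # restrict to the inheritance model
    ]


calls = [
    Variant("GENE1", True, "missense", False, "hom"),
    Variant("GENE2", True, "synonymous", False, "hom"),
    Variant("GENE3", False, "missense", False, "hom"),
    Variant("GENE4", True, "nonsense", True, "hom"),
    Variant("GENE5", True, "missense", False, "het"),
]
print([v.gene for v in filter_candidates(calls, model="recessive")])
```

Each filter throws away real biology (regulatory variants, functional synonymous changes, reduced penetrance), which is why these are listed as assumptions.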
In vitro vs in vivo
In vitro is in glass
In vivo -> in living body
In vitro cell culture techniques advantage
Pic on phone
Describe gene knockdown by rnai induced gene silencing
On phone i
Why is cell culture not enough
Cells behave differently in dish compared to whole organism
Does not simulate actual conditions inside an organism
-signals from other tissues
No information about gene expression and function, with regards to developmental phenotypes
Benefits of using a mouse
On phone pic
Zebrafish advantages and making mutant zebrafish
On pic in phone
What are the different techniques to making a mutant
Forward genetics: ENU screening (phenotype based)
- treat fish with ENU
- this causes mutations
- outcross them with normal fish
- approach where you try to find the genetic cause of a phenotype
Reverse genetics: find the phenotypic consequence of a genotype change
RNA rescue experiments: proving pathogenicity
Use mutants and morphants to test your variants
Morphant: morpholino-injected embryo
You can inject the variant into a mutant or morphant and see if you can rescue the phenotype
Transcriptomics
Transcriptomics is whole cell gene expression
Proteomics is whole cell protein content
Metabolomics is whole cell metabolite content
Microbiota vs metagenome
Microbiota is the different organisms in a community
Metagenome is the genomes of the organisms in a community
Prokaryotic vs eukaryotic ribosome subunits
Pic on phone
Describe variable regions
Variable regions diverge between species and are flanked by conserved regions. We can use variable regions to try and separate the species based on their sequences.
Describe 16S targeted PCR amplification workflow to detect organism in a sample
Pic on phone
It only allows identification up to genus level
It will amplify any contamination.
Minimise contamination by randomising samples, using negative controls and noting batch numbers of reagents
Can only be used for bacteria, not fungi
Whole genome shotgun sequencing
Same process; however, instead of performing PCR, we break up the DNA and sequence all of it
Process on pic on phone
Can be used for host, viruses and yeast; there is no bias, unlike 16S PCR
Sequences the whole genome, not just a single gene (PCR covers only a single gene…)
Once we have the WGS shotgun reads, we can re-assemble them
Put them back together, this can be done by algorithms
Creates a sequencing assembly.
We can assemble bits of dna for each of the species in that sample
Once we have our WGS shotgun assembly
We can look at taxonomic diversity and build trees, same as with PCR
Can run contigs through gene prediction algorithms to identify which genes are present, and then identify which metabolic pathways or processes are present. Then compare with patients.
Problems:
Host cell DNA is in excess in the sample
No amplification step to enrich bacterial DNA
Host contamination can be high: around 10% of reads from faecal samples and around 90% of reads from saliva, nasal and skin samples can be human
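The card mentions looking at taxonomic diversity without naming a measure. As one common option, here is a small sketch of the Shannon diversity index, computed from the taxon label assigned to each read. The labels are hypothetical and the choice of index is an assumption, not something stated in these notes.

```python
import math
from collections import Counter

def shannon_diversity(taxon_labels):
    """Shannon index H = -sum(p_i * ln p_i) over the taxa seen in a sample."""
    counts = Counter(taxon_labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

# Hypothetical per-read taxon assignments for two samples
single_taxon = ["Bacteroides"] * 10
two_even_taxa = ["Bacteroides"] * 5 + ["Prevotella"] * 5
h_single = shannon_diversity(single_taxon)
h_even = shannon_diversity(two_even_taxa)
```

A sample dominated by one taxon scores close to 0, while a sample spread evenly over more taxa scores higher.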
How to enrich without amplification
Pre extraction
Post extraction
Pic on phone
Where does mtDNA replication begin
Origin of heavy strand
Where does mtDNA transcription begin
Starts at heavy strand promoter
What is mitochondrial dna replication machinery
Mt polymerase gamma (POLG) and mtDNA helicase (Twinkle, which unwinds DNA)
On phone
Subunits of POLG used for mtDNA replication
1) Pol gamma A
Pol gamma A contains a 3'-5' exonuclease domain to proofread newly synthesised DNA; it corrects errors by cutting them out
2) Pol gamma B (x2)
Enhances interactions with the DNA template and increases activity and processivity of Pol gamma A
Describe the structure of Twinkle, the mitochondrial DNA helicase
Hexamer (6 Twinkle subunits)
Unwinds mtDNA template to allow replication by Pol gamma
MtSSBP
Binds to ssDNA once it has been unwound
Prevents it from annealing again
Protects against nucleases
Prevents secondary structure formation
Enhances mtDNA synthesis by stimulating Twinkle helicase activity
Describe mt dna replication
Photo on phone
What are the classical signs of mt dna disease
Neurodegeneration, migraines, diabetes, visual impairment, hearing deficit, infertility
What do heteroplasmy levels do?
People are becoming increasingly aware of disorders.
Heteroplasmy levels determine disease manifestation of mt diseases
How to identify mutations in mtDNA
NGS
X axis is mtDNA nucleotide positions in bp
Y axis is read counts
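Read counts at a position can be turned into a heteroplasmy level. A minimal sketch, assuming heteroplasmy is estimated as the fraction of reads carrying the variant allele; the positions and counts below are hypothetical.

```python
def heteroplasmy(ref_reads, alt_reads):
    """Fraction of reads at one mtDNA position that carry the variant allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / total

# Hypothetical pileup: mtDNA position -> (reference reads, variant reads)
pileup = {1000: (400, 600), 2000: (990, 10)}
levels = {pos: heteroplasmy(ref, alt) for pos, (ref, alt) in pileup.items()}
```

Here position 1000 would be 60% heteroplasmic and position 2000 only 1%, which at deep coverage is hard to tell apart from sequencing error.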
How do you get secondary mutations in mtDNA
Mutations in the mtDNA replication machinery cause secondary mutations
Mutations arising somatically are not inherited but occur in post-mitotic tissues as a result of mutations in nuclear genes which encode the mtDNA replication machinery, e.g. POLG and Twinkle
If these proteins are not working properly, mtDNA cannot be replicated correctly, which causes deletions or depletion of mtDNA
Common variants in mtDNA can contribute to the development of complex diseases
Describe what dominant mutations in twinkle can cause
Pic on phone
How to identify disease causing genes?
Have SNP markers that are known from studies and from databases
We know where they are on the chromosome
Perform linkage analysis, find the region inherited together
Examine the region further, making the assumption that the disease gene is in that region
If a marker is linked to the disease locus, then the same marker alleles will be inherited by two affected relatives
If unlinked, affected members of the family are less likely to inherit the same marker alleles
Prove the mutation causes disease
Family based design, need to look at similarities between family members
Genotyping array (which markers are present), linkage analysis (parametric or non-parametric), identify the chromosomal gene locus, perform Sanger sequencing for each of the exons of the genes; 100 candidate genes later, the proband had a homozygous mutation for which both parents were heterozygous; prove that mutations in the gene are disease causing.
What is a proband
A person serving as the starting point for the genetic study of a family
Zygosity
Degree of similarity of alleles for a trait/mutation for an organism
Intermediate phenotypes
Quantitative biological trait that is reliable and reasonably heritable
Shows a greater prevalence in unaffected relatives of patients than in the general population
What does heritability tell you?
Tells you how much of normal variation that you see is because of genetics (a particular combination of variants) and how much is due to environment.
Simplest heritability test is to look at twins and how different they are, e.g. in their ECG. The number given tells you the proportion of variation that is due to genetics, e.g. 0.58 means 58% is due to genetics. The number is always between 0 and 1; a high number means strong resemblance.
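These notes do not say how the twin number is calculated. One standard textbook estimate is Falconer's formula, h² = 2(r_MZ - r_DZ), where r_MZ and r_DZ are the trait correlations in identical and non-identical twin pairs. The sketch below assumes that formula; the correlation values are hypothetical, and the clamp to 0..1 is added only to match the range stated on the card.

```python
def falconer_heritability(r_mz, r_dz):
    """Falconer's estimate h^2 = 2 * (r_MZ - r_DZ), clamped to the range 0..1."""
    h2 = 2 * (r_mz - r_dz)
    return max(0.0, min(1.0, h2))

# Hypothetical twin correlations for an ECG trait
h2 = falconer_heritability(r_mz=0.79, r_dz=0.50)
```

With these made-up correlations the estimate is about 0.58, i.e. roughly 58% of the variation attributed to genetics.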
Describe the genetic association time line
Pic on phone
method of linkage analysis/how to find disease causing gene
Pic on phone
1. Take a pedigree, see pattern in genotype
2. Use some kind of tool to generate genotyping data for your pedigree (genotyping array, up to 1 million markers on 1 chip).
3. Get results from the machine, generate a graph to show the physical and genetic distribution of markers on the genotyping array chip.
- chromosome number on the x axis; the y axis is the distribution of markers along the chromosome in cM
- markers are evenly distributed along chromosomes
4. Run a linkage programme
5. Choose to run the analysis in a non-parametric or a parametric way
NPL-non parametric linkage testing
not assuming anything about inheritance pattern
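For contrast with NPL, a parametric analysis scores linkage under an assumed model. A minimal sketch of the classic LOD score, assuming phase-known meioses in which each one can be scored as recombinant or non-recombinant (a big simplification of what linkage programmes actually do). The counts are hypothetical.

```python
import math

def lod_score(recombinants, meioses, theta):
    """LOD = log10( L(theta) / L(0.5) ) for phase-known meioses.

    theta is the recombination fraction being tested (0 < theta <= 0.5);
    theta = 0.5 corresponds to no linkage.
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    k, n = recombinants, meioses
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null

# Hypothetical family data: 1 recombinant out of 10 informative meioses
lod = lod_score(recombinants=1, meioses=10, theta=0.1)
```

By convention a LOD score of 3 or more is taken as evidence for linkage; this small example gives about 1.6, so more meioses would be needed.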
How to measure association of genotype to heart rate
Perform snp microarray
3 colours: homozygous AA, heterozygous AG and homozygous GG
Then
Pic on phone
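One simple way to measure the association is to regress the trait on the number of copies of one allele. This is a sketch under assumptions: G is arbitrarily treated as the effect allele, the data are hypothetical, and a real study would also compute a p-value and adjust for covariates.

```python
def dosage(genotype, effect_allele="G"):
    """Number of copies of the effect allele: AA -> 0, AG -> 1, GG -> 2."""
    return genotype.count(effect_allele)

def slope_per_allele(genotypes, trait):
    """Least-squares slope of trait on allele dosage (change per extra allele)."""
    x = [dosage(g) for g in genotypes]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(trait) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, trait))
    return sxy / sxx

# Hypothetical genotypes and resting heart rates (beats per minute)
genotypes = ["AA", "AG", "GG"]
heart_rate = [60, 65, 70]
beta = slope_per_allele(genotypes, heart_rate)
```

In this toy data each extra G allele goes with a heart rate 5 bpm higher.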
Linkage disequilibrium
LD between 2 SNPs decreases with physical distance. Extent of LD varies greatly depending on region of genome.
If LD strong, you need fewer SNPs to capture variation in a region.
LD extends across stretches of DNA: because they are in close proximity, different variants or regions are inherited together.
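LD strength between two SNPs is commonly summarised as r². A minimal sketch, assuming phased haplotypes are available as two-letter strings (allele at SNP 1, then allele at SNP 2); the haplotypes are hypothetical.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased two-letter haplotypes."""
    n = len(haplotypes)
    a, b = haplotypes[0][0], haplotypes[0][1]   # pick one allele at each SNP
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h == a + b for h in haplotypes) / n
    d = p_ab - p_a * p_b                        # D: deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("one of the SNPs is not variable in this sample")
    return d * d / denom

# Hypothetical haplotypes
perfect_ld = ld_r2(["AG", "AG", "CT", "CT"])   # alleles always travel together
no_ld = ld_r2(["AG", "AT", "CG", "CT"])        # all combinations equally common
```

When r² is close to 1, genotyping one SNP tells you the other, which is why fewer SNPs are needed in regions of strong LD.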
Give an example of a short read vs long read machine
Short read - Roche 454
Long read - PacBio
What are the open source analysis pipelines available?
MOTHUR
QIIME
DADA2
What is haplotype
Sequence of alleles along a single chromosome
what is the epigenome and describe processes associated with epigenome
The sum of all the (heritable) changes in the genome that do not occur in the primary DNA sequence and that affect gene expression
An epigenetic change results in "a change in phenotype but not in genotype"
What are the 4 epigenetic mechanisms
DNA Methylation
Histone modification
X-inactivation
Genomic Imprinting
How to purify the protein product of a gene
Add protein tags
6 histidines
Glutathione S transferase
What is linkage analysis
Linkage analysis is a method used to map the location of a disease gene in the genome
What machine allows dna sequencing
ABI 3730
difference between linkage and association
Linkage: two alleles on a chromosome are linked physically
Association: same sequence found in unrelated subjects, suggesting it is the cause or marks the location of the mutation
Describe process of NGS
4 step process
1. DNA library construction
In the wet lab – first we need to prepare the DNA sample for sequencing
Essentially the DNA is chopped into small 300bp fragments. This is shearing
This can be achieved chemically, enzymatically or physically (sonication)
We have to repair the end of the sheared DNA fragments
Adenine (A) nucleotide overhangs are added to end of fragments
Adapters with Thymine (T) overhangs can be ligated to the DNA fragments
The end result is the DNA library of literally billions of small, stable random fragments representative of our original DNA sample
Adapters contain the essential components to allow the library fragments to be sequenced
Sequencing Primer binding sites
P5 and P7 anchors for attachment of library fragments to the flow cell
2. Cluster generation
Hybridise DNA library fragments to the flowcell
Random process
But we can't see individual single molecules of our DNA library: too small
We need to use PCR to amplify the fragments to a size that we can see
Perform bridge PCR to generate clusters
Many billions of clusters originating from single DNA library molecules
Clusters are now big enough to be visualised
Flow cell is now ready to be loaded on to the sequencing platform to perform the sequencing
3. Sequencing by synthesis
Sequence each nucleotide 1 cycle at a time in a controlled manner
Modified 4 bases (ATCG) with chain terminators AND a different fluorescent colour dye
Single nucleotide incorporation (DNA polymerase)
Flowcell wash
Image the 4 bases (digital photograph)
Cleave terminator chemical group and dye with enzyme
Camera sequentially images all 4 bases on the surface of the flowcell each cycle
Each cycle image is converted to a nucleotide base call (ACGT)
Number of cycles is anywhere between 50 and 250, giving reads of 50 to 250 bases
4. Data analysis
Short read sequences from the machine need to be re-assembled like a jigsaw
To generate a consensus sequence of our original DNA samples
We can compare this consensus sequence against the human genome reference and look for the genetic variants
Dedicated software and bioinformatics tools will achieve this
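The data analysis step can be illustrated with a toy example. This sketch assumes the reads have already been aligned, so each read comes with its 0-based start position on the reference; it then takes a majority vote per position to build the consensus and reports differences from the reference. The reference and reads are hypothetical, and real pipelines use dedicated aligners and variant callers.

```python
from collections import Counter

def consensus(aligned_reads):
    """Majority base at every covered position.

    aligned_reads: list of (start, sequence) with 0-based start positions.
    """
    columns = {}
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            columns.setdefault(start + offset, Counter())[base] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in sorted(columns.items())}

def call_variants(reference, cons):
    """List (position, reference base, consensus base) where they differ."""
    return [(pos, reference[pos], base)
            for pos, base in cons.items() if reference[pos] != base]

# Hypothetical reference and short reads; the sample carries T>A at position 3
reference = "ACGTACGT"
reads = [(0, "ACGA"), (2, "GAAC"), (4, "ACGT")]
cons = consensus(reads)
variants = call_variants(reference, cons)
```

Overlapping reads act like overlapping jigsaw pieces: the more reads cover a position, the more confident the majority vote at that position.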