Genomics Flashcards
What is Tm
Melting temperature: the temperature at which 50% of the strands have separated
On a melting curve, find half of the maximum on the y axis, read across to the curve, then drop down to the x axis
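The graphical read-off above can be sketched in code. This is a minimal illustration only: it assumes the y axis has been normalised to the fraction of strands separated (0 to 1), and the function name and data points are hypothetical.

```python
def estimate_tm(temps, fraction_separated):
    """Estimate Tm as the temperature where the curve crosses 0.5.

    Assumes temps are increasing and the y values are the fraction of
    strands separated (0 to 1). Uses linear interpolation between the
    two measured points either side of the half-way mark.
    """
    half = 0.5
    points = list(zip(temps, fraction_separated))
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        if y0 <= half <= y1 and y1 != y0:
            return t0 + (half - y0) * (t1 - t0) / (y1 - y0)
    raise ValueError("curve never crosses 50% separated")


# Hypothetical melting curve (temperature in degrees Celsius).
temps = [60, 70, 80, 90, 100]
fraction = [0.0, 0.1, 0.4, 0.9, 1.0]
tm = estimate_tm(temps, fraction)
print(round(tm, 1))
```

With these made-up points the half-way mark falls between 80 and 90 degrees, so the estimate lands in that interval.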
What is hyperchromicity
The increase in UV absorbance on denaturation: single stranded DNA absorbs UV light to a greater extent than double stranded DNA
What happens under high stringency
Only complementary sequences are stable; achieved by a temperature near Tm or a low salt concentration
why is Genomics important, why study it?
We are now able to treat monogenic diseases such as sickle cell disease. We can identify the point mutation, the nucleotide it affects and the amino acid that changes as a result. The disease can be treated by stem cell transplantation, but only for a small number of patients. Targeted genome editing can provide a permanent cure by correcting the mutation in stem cells, which can then be transplanted.
meaning of omics
Omics aims at the collective characterization and quantification of pools of biological molecules that translate into the structure, function, and dynamics of an organism or organisms.
what is genomics
the study of the entire DNA sequence that contains the complete set of genes for an organism
what is genetics
the study of how traits are passed down the generations and the role of genes in that process
transcriptome
the total RNA content in cell produced by transcription
proteome
the total protein content in cell produced by translation.
meaning of transcriptomics
study of all RNA transcripts produced by a cell, tissue or organism
benefits of using microarrays rather than next generation sequencing
- cheaper than NGS as microarrays cost £10-100, whereas NGS costs £100 to £1000
- GWAS is carried out using this technology (genome wide association study).
mitochondrial genome
16kbp, many diseases associated with variants
epigenome
changes in marks on the DNA strand or in histones, has some disease associations
metagenome
genomes of all the organisms from a specific location. Has some disease associations.
microbiome
all organisms in a specific location, eg microbiome of gut
meaning of recombinant
containing a different combination of alleles; produced by combining genetic material from different places.
difference between pyrimidine and purine
pyrimidines have a single nitrogen-containing ring, purines have 2 fused nitrogen-containing rings.
which Watson-Crick pairing is stronger, GC or AU
GC, because it has 3 hydrogen bonds, whereas AU has 2.
what bonds are within base stacking of DNA
Hydrophobic interactions: the bases are stacked above each other in the interior of the structure, which excludes water.
BONDS IN DNA
Hydrogen bonds between base pairs; phosphodiester bonds linking the sugars in the sugar-phosphate backbone; hydrophobic interactions, with the bases stacked above each other in the interior of the structure, excluding water; and Van der Waals forces, which are individually small but contribute to stability.
denaturation of DNA
conversion of a double stranded molecule into a single stranded molecule. It is by disruption of hydrogen bonds within the double helix, occurs when DNA in solution is heated, can also be induced by strong alkali or urea. On denaturation it forms a randomly structured coil, moving and changing shape constantly.
What factors does Tm depend on?
- number of hydrogen bonds,
- GC content (GC pairs have an extra hydrogen bond, hence the more GCs, the more hydrogen bonds contained within the structure)
- length of DNA molecule, however there is little further contribution beyond 300bp (on the graph it begins to saturate)
- salt concentration
- pH (alkali is a denaturant)
- mismatches (unmatched base pairs)
What is the effect of increasing salt concentration on base pairing
high salt reduces the specificity of base pairing at a given temperature, so a duplex containing mismatches can form and be stable at a given temperature in the presence of high salt concentration, whilst the same duplex would be unstable and dissociate at the same temperature in low salt.
examples of chemical denaturants that disrupt hydrogen bonds?
Alkali, formamide, urea
why does alkali disrupt hydrogen bonds
NaOH dissociates into Na+ and OH-, and the OH- ion disrupts hydrogen bond pairing. Fewer hydrogen bonds mean lower stability of the structure and so a lower Tm.
mismatch
base pair combination that is unable to form hydrogen bonds.
effect of mismatch on DNA stability
Reduces the number of hydrogen bonds, hence lowers stability and Tm. It distorts the structure and destabilises adjacent base pairing, which can lead to zipping and unzipping. It also makes the formation of a duplex less energetically favourable, reducing the change in free energy on duplex formation. It also creates shorter contiguous stretches of double stranded sequence, leading to a lower Tm.
what is reverse of denaturation called and what factors cause it
renaturation, caused by cooling or neutralisation.
how to prevent mismatches forming between 2 molecules
performing a hybridisation at the Tm of the duplex molecule.
Stringency
the concept of manipulating the conditions to select duplexes with a perfect match only. Manipulating the conditions means limiting hybridisation between imperfectly matched sequences, which lets us control specificity, e.g. by changing the temperature or the environment, such as the amount of denaturant.
low vs high stringency
low stringency is high salt and low temperature, which tolerates mismatches; high stringency is low salt or a temperature near Tm, which promotes the formation of non-mismatched base pairing only.
examples of nucleic acid based techniques
northern blotting, southern blotting, microarrays, dideoxy and next gen sequencing, PCR, cloning.
nucleic acid hybridisation techniques
Identifies the presence of NA containing a specific sequence of bases. Allows the absolute or relative quantitation of these sequences in a mixture.
what is a probe
Probes are usually between 20-1000 bases in length, depending on the technique it is used for. A probe is a sequence that uniquely identifies specific sequences, which under high stringency conditions form a duplex.
Nucleic acid blotting techniques (northern/southern blotting), disadvantages
Analysis of mRNA or DNA, can be used to identify specific RNA. Limited technique, only detects one gene at a time and small numbers of samples. It is very time consuming and messy, hence is largely superseded by quantitative PCR or microarrays.
Process of northern or southern blotting
Uses extracted RNA (northern) or DNA (southern), which is separated by gel electrophoresis and then transferred by capillary flow to a nylon or nitrocellulose membrane, which captures the nucleic acid. The nucleic acid is covalently bonded to the membrane and then hybridised with a labelled probe that detects the target sequence (e.g. an mRNA transcript) in the sample.
Microarrays
An ordered assembly of thousands of nucleic acid probes. The probes are fixed to a solid surface, then the sample of interest is hybridised to the probes. An array can simultaneously measure 50,000 different transcripts in a cell, tissue or organ.
what are microarrays used for
Gene expression profiling, e.g. comparison of drug treated and untreated cells. RNA is extracted, labelled and hybridised to the array, and the amount and location of the label are measured. This tells us how much of each and every transcript in the human genome is being expressed. They can also be used to assess the presence or absence of millions of individual SNPs simply through hybridisation of genomic DNA to an array, as in genome wide association studies.
what does GWAS stand for?
Genome Wide Association studies (GWAS).
what is the formula for GC percentage
(G + C) / (G + C + A + T) x 100
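The formula can be applied directly to a sequence string. A minimal sketch, assuming the sequence is written with the letters A, C, G and T only; the function name and example sequences are illustrative.

```python
def gc_percent(sequence):
    """Return (G + C) / (G + C + A + T) x 100 for a DNA sequence."""
    sequence = sequence.upper()
    g, c, a, t = (sequence.count(base) for base in "GCAT")
    total = g + c + a + t
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (g + c) / total * 100


print(gc_percent("ATGC"))    # half of the bases are G or C
print(gc_percent("GGCCGC"))  # every base is G or C
```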
Nucleic acid hybridisation techniques
Identifies the presence of NA containing a specific sequence of bases
Allows the absolute or relative quantitation of these sequences in a mixture
Disadvantages of nucleic acid blotting techniques
Analysis of mRNA or DNA
Limited technique: only detects one gene at a time and small numbers of samples; the gel based techniques are time consuming and messy
Largely superseded by quantitative PCR
Antibiotics
Substances produced by fungi which are toxic to bacteria but not fungi are called antibiotics
What does an exponential amplification require
2 primers corresponding to ends of sequence
What is PCR
PCR is an enzyme based method to specifically amplify segments of DNA using a thermostable DNA polymerase in a cyclical process.
what is a chain reaction
A chain reaction is a series of events, each of which depends on the preceding event to sustain itself: a series of reactions that leads to an exponential increase in the number of events occurring in a sequence.
role of DNA dependent DNA polymerase
It recognises a specific structure consisting of partially double stranded DNA and forms an initiation complex with it. It then extends the partially double stranded molecule from the 3’ end of the non-template strand.
How do we ensure that annealing occurs rather than renaturation in PCR
provide huge excess of the primer, ensuring template is in low concentration at start.
Enzyme used in PCR and its properties
DNA dependent DNA polymerase. It synthesises a new nucleic acid strand by copying a DNA molecule. It cannot copy RNA nor make RNA. RNA must first be copied to DNA by reverse transcription before it can be amplified by PCR.
What factors does DNA dependent DNA polymerase require in PCR
A template strand with a primer (10-20 bases long) annealed to it; deoxynucleotide triphosphates (dATP, dGTP, dCTP, dTTP); magnesium ions, required as a cofactor for the enzyme; a roughly neutral pH.
What are the 3 stages of PCR
Denaturation (template becomes single stranded)
Annealing (formation of the initiating template: a duplex between the primer and the template strand)
Extension, or native state (optimal conditions for the extension of the initiation complex and enzyme activity, inc temp and pH).
property of the polymerase used in PCR
Must be thermostable; it is derived from a thermophilic bacterium called Thermus aquaticus (Taq polymerase). This is because for PCR to work, the reaction must go through multiple rounds of extreme heating and cooling.
meaning of thermostability
able to retain activity, upon repeated heating to temperatures that would ‘destroy’ most enzymes.
Process of PCR
- Mix all the components of the PCR: enzyme, reactants and an excess of primers
- Start the cycle with denaturation: heat the reaction to 95 degrees Celsius to denature the template and break the hydrogen bonds between bases.
- Cool the reaction to 55 degrees Celsius to allow annealing, so the primers can bind to the template strands. The two primers bind at opposite ends of the target sequence, each to the template strand it is complementary to.
- Change the temperature again to 72 degrees, which is optimal for DNA polymerase to work. An initiation complex is formed, which elongates from 3’ end of primer, creating a second strand.
- Continue to repeat the process
How to calculate product in each cycle
Every cycle results in a doubling of the product, thus there is an exponential accumulation: after n cycles one template gives 2^n copies. 3rd cycle is 8, 10th cycle is 1024, 30th cycle is about one billion
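The doubling can be checked with a few lines of code. This sketch assumes ideal amplification (perfect doubling every cycle, no plateau); the function name is illustrative.

```python
def copies_after(cycles, starting_copies=1):
    """Copies present after a number of ideal PCR cycles (doubling each cycle)."""
    return starting_copies * 2 ** cycles


for n in (3, 10, 30):
    print(n, copies_after(n))
```

The 30-cycle figure comes out at 1,073,741,824, which is where the "one billion" rule of thumb comes from.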
PCR applications
Diagnostics: a routine diagnostic tool for the identification, confirmation and quantification of specific DNA sequences. Presence/absence calling, e.g. detecting TB in sputum and determining treatment response or drug efficacy. Differentiating between closely related organisms, e.g. swine flu vs human influenza, both H1N1 subtypes. Quantification (how much), e.g. HIV viral load, to determine when treatment might be commenced.
What are real time PCR or quantitative PCR?
Quantitative PCR methods, used for diagnostics, that follow the accumulation of product by fluorescent detection as the reaction proceeds.
How to detect SNP using PCR
Adaptations of quantitative real time PCR. These methods depend upon the differences in the melting temperature (Tm) conferred upon short sequences of DNA by their nucleotide composition. They rely upon differences in the Tm of a duplex containing a single nucleotide mismatch (single nucleotide polymorphism).
PCR in forensics and law enforcement
amplification of genetic markers:
- parentage or kinship: immigration and inheritance
- identification: military casualties, missing persons or environmental disasters
- matching 2 sources: crime scene
- authentication of biological material: cell lines, purity of food.
STRs
short tandem repeats,
2-5 bases in length, repeated many times at specific locations in the genome. Many different STRs are found scattered around the genome. They are highly polymorphic: the number of repeats varies between individuals, but they are inherited, so they are similar between siblings and parents. Each individual's genome therefore gives a pattern of uniquely sized products, providing a molecular bar code, or DNA fingerprint. The UK DNA database uses 10 STRs to identify individuals.
how many STRs does the UK DNA database contain
10 STRs. Each STR has two alleles that can differ in size, giving 20 numbers; together with a gender indicator they give a matching probability of around 1 in 1 billion.
why is PCR used prior to NGS
So that NGS can simultaneously sequence a large number of PCR products, e.g. of candidate cancer genes.
Examples of use of PCR
- NGS: simultaneously sequencing a large number of PCR products of candidate cancer genes.
- isolating individual segments of DNA prior to cloning or sequencing.
- manipulating and modifying DNA: introducing mutations into a sequence of DNA, or modifying the ends of a sequence so that they contain restriction sites compatible with cloning vectors.
- PCR is one of the most commonly used and important tools in recombinant DNA technology, e.g. developing recombinant vaccines and pharmaceuticals (interferons, clotting factors, tissue plasminogen activator).
Define Creationism
The idea that species are made by a supernatural intelligent creator.
what 2 things does science have to assume
Natural phenomena have natural explanations, which can be studied by scientific experiments.
what does a scientific theory need?
Make testable predictions, stand or fall according to whether the predictions are confirmed or refuted. Popper: ‘a scientific theory must be falsifiable’
what is relative fitness?
The average number of surviving progeny of a genotype (compared with competing genotypes) after one generation.
w<1 and w>1 for relative fitness
If w<1, the frequency of the allele will decrease with each generation until the allele disappears (negative selection). If w>1, the frequency of the allele will increase with each generation until the allele reaches fixation (positive selection).
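The direction of change can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model from the course: it assumes a simple haploid population in which the competing allele has fitness 1, and the function names and starting values are hypothetical.

```python
def next_frequency(p, w):
    """Allele frequency after one generation of selection.

    p is the current frequency of the allele and w its relative fitness;
    the competing allele is assumed to have fitness 1 (haploid model).
    """
    return p * w / (p * w + (1 - p))


def simulate(p0, w, generations):
    """Return the list of allele frequencies over successive generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], w))
    return freqs


falling = simulate(0.5, 0.9, 100)  # w < 1: negative selection
rising = simulate(0.5, 1.1, 100)   # w > 1: positive selection
print(round(falling[-1], 4), round(rising[-1], 4))
```

In this model the frequency heads towards 0 when w<1 and towards 1 (fixation) when w>1, and stays put when w is exactly 1.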
examples of how small mutations and large mutations can occur
Small mutations: base substitutions, small insertions and deletions. Large mutations: large DNA duplications, large deletions, insertion of transposable elements, viral insertions, chromosome rearrangements.
How does gene duplication drive evolution?
Gene duplication is a major driving force of evolution. Once a gene has been duplicated, one copy can continue to maintain the original function whilst the other can evolve new functions. There are likely to be changes both in the coding sequence (in amino acid sequence) and in control sequences.
how is it possible that γ genes are expressed during foetal life and β genes are expressed during postnatal life?
The promoter was duplicated along with the coding sequence
The promoter sequences have evolved so that the β and γ promoters now bind different transcription factors
They interact differently with gene enhancers
This gives differential control of the β and γ genes
what is a pseudogene?
A gene that cannot make a functional protein. For example, after duplication of the β-globin gene, one copy can maintain the original function whilst the other can lose all function.
Fanconi’s anaemia
Recessive lethal genetic disorder; most affected patients die of bone marrow failure during childhood and do not reproduce. The disease allele arises by random mutation and is eliminated by natural selection, so the allele frequency is very low.
what is modern synthesis
modern synthesis refers to the combination of natural selection with Mendelian genetics. Evolution can be seen as a logical consequence of Mendelian inheritance and ecological competition.
what types of genes does gene duplication create?
A redundant copy of gene, which can evolve to gain new functions eg globin genes. But other duplicated genes may become pseudogenes.
What phrase explains why many genetic diseases are extremely rare?
Evolutionary theory explains why many genetic diseases are extremely rare, and how others are maintained at higher frequencies by positive selection, particularly by heterozygote advantage.
Why do the log base 2 and linear curves show a plateau of PCR products
As the reaction progresses the mixture acidifies, because a hydrogen ion is released each time a nucleotide (for example dAMP) is added during elongation; pyrophosphate is produced as well. Each cycle also incorporates primers into product, so the primers are depleted while the template concentration increases. These changes alter the environment in which the polymerase is working.
As a consequence of acidification and depletion of reactants, the polymerase works less efficiently, the kinetics change and the reaction progresses to a point where we have a plateau.
Eventually no further product is made.
How would you use PCR to identify TB
Presence/absence calling for TB: detection in sputum, determining treatment response or drug efficacy. If you take a sample and perform PCR with primers specific to TB, you can identify the presence or absence of specific DNA segments that correspond to TB, and hence the presence or absence of that organism within the sample.
How would you use PCR to differentiate between organisms
Differentiating between closely related organisms, e.g. swine flu vs human influenza, both H1N1 subtypes. This allows us to understand the epidemiology and how to treat them.
How would you use PCR to determine when to treat HIV
Use it to measure the HIV viral load (how much virus is present). We do not treat until the HIV viral load increases to a certain level; then we commence treatment.
We can also monitor using HIV load assays to determine when we have a failure in treatment of HIV and we have emerging resistance as a consequence of mutations within a population of organisms present in an individual. HIV viral loads are done routinely in order to determine when and how we treat an individual.
How do we perform quantitative or real time PCR
We run a serial dilution of template of known quantity alongside our samples. Comparing our assay results with these standards lets us identify the amount of template that we started with. These techniques use fluorescent detection of the accumulation of product.
The crossing point of the amplification is determined for each reaction; the more template present at the start, the earlier the crossing point, so it falls in proportion to the logarithm of the starting template concentration.
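The standard curve comparison can be sketched as a straight-line fit. This is a minimal illustration, assuming the crossing point falls linearly as the log10 of the starting template rises; the dilution series values, the function names and the unknown sample are all hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical serial dilution: (starting copies, crossing point in cycles).
standards = [(1e6, 15.0), (1e5, 18.3), (1e4, 21.6), (1e3, 24.9)]
log_copies = [math.log10(copies) for copies, _ in standards]
crossing_points = [cp for _, cp in standards]
slope, intercept = fit_line(log_copies, crossing_points)


def starting_quantity(crossing_point):
    """Read an unknown sample's starting copies off the standard curve."""
    return 10 ** ((crossing_point - intercept) / slope)


print(round(slope, 2))
print(round(starting_quantity(20.0)))
```

The slope is negative because more starting template means the fluorescence crosses the threshold in fewer cycles.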
SNP detection from PCR
High resolution melting (HRM): at the end of the PCR we perform a melt curve, heating the reaction slowly and measuring the separation of the strands of the product. The Tm is affected by the particular sequence within the amplicon, so different variants within the amplicon give different curves. By comparing the curve of a sample amplicon against those of known variants, we can identify the particular SNP that is present within the segment.
Allelic discrimination: specific binding of the probe to the amplified region containing the SNP is detected.
what is allelic discrimination, or probe based version of qPCR
Process where specific binding of the probe to the amplified region containing the SNP is detected.
what does HRM detect
Tm of the amplified product is used to determine which sequence is present.
what does it mean if STRs are polymorphic?
They are highly polymorphic meaning that they vary from one individual to another but are inherited and are similar between siblings and parents
how does the UK dna database identify individuals
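By PCR of 10 STRs plus a gender indicator: the number of repeats at each allele gives a set of numbers that is compared with the database to find a match.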
what is synonymous substitution
A substitution mutation that doesn’t cause a change in amino acid sequence
Non-synonymous substitution
Mutation that does cause a change in amino acid composition.
Sickle cell Anaemia
Point mutation in the β globin gene
Single amino acid substitution
a hydrophilic a.a. (glutamic acid) is replaced by a hydrophobic a.a. (valine) at position 6
The crystals damage the red cell membrane resulting in
Cell lysis causing anaemia
Cell adhesion, causing blockage of small blood vessels, followed by tissue infarction
what does relative fitness determine?
Relative fitness, w, will determine whether the frequency of an allele increases or decreases over generations
Alpha 2 beta 2
Adult Hb
Alpha 2 delta 2
Minor adult Hb, less than 1% in us
Alpha 2 gamma 2
Foetal Hb
What is NAH
Nucleic acid hybridisation: mixing DNA from two sources that have been denatured by heat or alkali to make them single stranded, then, under certain conditions, allowing complementary base pairing of homologous sequences
In what way are single stranded DNA sequences listed?
5’ to 3’
what are histones
basic proteins that bind DNA. Eight histones form the nucleosome. Histone 1 binds the linker DNA.
exome
The sum of all the exon sequences of the genes
cis linked
regions physically close to the exons on the DNA strand. Contrast with trans regulatory regions that can be on different chromosomes.
size of human genome
3 x 10^9 bp (3 Gbp)
where are pseudo genes found in DNA
Intergenic region
What are the three RNA polymerases and their roles in transcription?
RNA polymerase I: needed to transcribe rRNA genes
RNA polymerase II: needed to transcribe mRNA
RNA polymerase III: needed to transcribe tRNA and other small RNAs
Introns in genes
vary in number 0-311
vary in size 30bp to 1Mbp
some introns contain other genes
enhancers
upregulate gene expression-they are short sequences that can be in the gene or many kilobases distant. They are targets for transcription factors (activators)
silencers
downregulate gene expression. They are also position-independent and are also targets for transcription factors (repressors)
Insulators
short sequences that act to prevent enhancers/silencers influencing other genes
3 stages of modification of eukaryotic mRNA
Capped at 5’ end (methylated cap)
Polyadenylated at 3’ end
Intervening sequences (introns) removed
alternative splicing
Exons can be skipped or added so variations of a protein (called isoforms) can be produced from the same gene
process in which processed pseudogenes are copied from mRNA
Retrotransposition; as a result they have no promoter or introns
How does UK DNA database allow us to find individuals
Each allele of, for example, CSF1PO on chromosome 5 may contain between 6 and 15 repeats. The first STR therefore gives two numbers between 6 and 15: 10 possible values for the first allele and 10 for the second, i.e. 100 different combinations. VWA can have between 11 and 24 repeats, so similarly it gives two sets of 14 numbers and 196 combinations. Combining these two STRs gives 19,600 possible combinations from just 2 STRs. If we use 10 STRs and a gender identifier we have more than 1 billion combinations. The UK DNA database holds more than 6 million individuals.
PCR of the 10 STRs is done in the same way: primers flanking each STR are used, one of which is labelled. The PCR products are separated by capillary gel electrophoresis and the size of each product is determined. This in turn gives the number of repeats in each STR; these numbers are combined and compared to the database to try to find a match.
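The counting argument for the two example STRs can be reproduced in code. A minimal sketch using only the repeat ranges quoted above; like the card, it counts the first and second allele separately (ordered pairs), and the names are illustrative.

```python
from math import prod

# Inclusive repeat ranges quoted for the two example STRs.
STR_REPEAT_RANGES = {"CSF1PO": (6, 15), "VWA": (11, 24)}


def combinations_per_str(lowest, highest):
    """Possible (first allele, second allele) pairs for one STR."""
    possible_alleles = highest - lowest + 1
    return possible_alleles * possible_alleles


per_str = {
    name: combinations_per_str(lowest, highest)
    for name, (lowest, highest) in STR_REPEAT_RANGES.items()
}
total = prod(per_str.values())
print(per_str)
print(total)
```

Each further STR multiplies the total again, which is why 10 STRs are enough to pass one billion combinations.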
why might the PCR product size not be a multiple of 4 even if the repeated sequence consists of 4 bases
The primers do not sit immediately next to the STR; there is flanking sequence between each primer and the repeats. For example, TH01 is a repeat of 4 bases with a minimum of 4 repeats (16 bases), but the minimum amplicon size given is 163 bases. So there are an additional 147 bases between the ends of the STR and the ends of the primers
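Working backwards from a measured product size to a repeat number is simple arithmetic. This sketch uses the TH01 figures quoted above (147 non-repeat bases, 4-base repeat unit); the function name is hypothetical and it assumes only whole repeats.

```python
def repeats_from_amplicon(amplicon_bp, flank_bp, repeat_unit_bp=4):
    """Number of repeats implied by a PCR product size.

    flank_bp is the total non-repeat sequence in the amplicon (primers plus
    the bases between the primers and the STR). Assumes whole repeats only.
    """
    repeat_region = amplicon_bp - flank_bp
    if repeat_region < 0 or repeat_region % repeat_unit_bp:
        raise ValueError("size does not correspond to a whole number of repeats")
    return repeat_region // repeat_unit_bp


print(repeats_from_amplicon(163, 147))  # smallest TH01 amplicon quoted
print(repeats_from_amplicon(167, 147))  # one more repeat adds 4 bases
```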
percentage of our genome that codes for protein
2%
what are major macro-level differences associated with?
Disease (aneuploidy, translocations, etc)
what are micro or molecular level pathogenic differences associated with and give example of these differences
Micro or molecular level pathogenic differences are sometimes associated with disease (e.g. the point mutation in sickle cell anaemia, the 3bp deletion in CFTR)
WHAT IS polymorphic
any position in the genome that varies between individuals is considered polymorphic=a variant
if we compare human genomes, how many SNPs will we find?
single base differences once every 300 bases
how are SNPs made
They are typically generated by faulty replication of DNA during mitosis. Although there are mismatch repair mechanisms, these mistakes do not get repaired.
what is polymerase slippage?
it is when the polymerase slips from the template strand during replication. This event is thought to lead to codon expansions.
describe the polymerase slippage model
If the polymerase slips, it causes the new strand to unpair (release) from the template strand. If the slip occurs at the template's codon repeat region of the Huntington gene, then when the new strand tries to reattach to the template strand, it will have many identical copies of the codon to choose from.
snv
Single nucleotide variant: a change in a nucleotide that is not corrected. SNVs may be in a gene, a promoter or a non-coding region. When pathogenic, they may be called point mutations. They are base substitutions, generally bi-allelic, arising from mutation that escapes mismatch repair. They may do nothing, may affect a trait or may be associated with a disorder.
why is the sickle cell mutation common in African populations?
beneficial in places where malaria is rife (heterozygote advantage)
mutation
new allele arises, we now have a variant
gene flow
migration leading to introduction of that variant into another population
genetic drift
random change in variant allele frequency between generations
selection
non random change in variant allele frequency between generations because presence of one allele/genotype is pathogenic (negative selection) or beneficial (positive selection)
what does repeat in tandem mean?
one after the other.
copy number variants
intergenic, quite large.
variant effects
Can be beneficial
Can be pathogenic
Most are neutral
Are these of any use?
Yes, can be used as markers to help find disease-causing genes and mutations
Autozygosity mapping & linkage studies (Microsatellites, SNPs)
Association analysis (SNPs, CNVs)
allele
An allele is one version of a particular position or locus on the genome
what is a locus
unique position in genome. A single base to entire genomic region.
what is an allele
particular form of a specific locus. A single base to entire genomic region. An individual has 2 alleles for any autosomal locus.
2 possible alleles
biallelic