human and viral genetics Flashcards
define characteristics of autosomal dominant disorders
1 copy of mutated allele is sufficient to cause an individual to be affected, WT is recessive
affected parents can transmit to any progeny (male or female equally likely) and at every generation
with autosomal dominant disorders why aren’t typical ratios seen in humans?
due to small progeny sizes
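a quick way to see why small families hide the expected ratio is to work out the binomial probabilities. This is only a sketch, assuming one heterozygous affected parent and one unaffected parent (so each child has a 0.5 chance of being affected); the function name is made up for illustration.

```python
from math import comb

def prob_affected_count(n_children: int, k_affected: int, p: float = 0.5) -> float:
    """Binomial probability that exactly k of n children are affected."""
    return comb(n_children, k_affected) * p**k_affected * (1 - p) ** (n_children - k_affected)

# Family of 3 children, each with an assumed 0.5 chance of being affected
for k in range(4):
    print(k, prob_affected_count(3, k))
# 0 0.125
# 1 0.375
# 2 0.375
# 3 0.125
```

so in a family of three, 1 in 8 such families would have no affected children at all, even though the allele is dominant.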
what is Huntington’s disease?
an autosomal dominant genetic disorder that results in cerebral atrophy and neuronal death (specifically the GABAergic medium spiny neurons of the striatum in the basal ganglia)
causes motor dysfunction and cognitive decline
what common pattern/repeat is seen in Huntington’s?
what is classed as normal?
polyglutamine repeat - CAG repeats (CAG codes for glutamine)
the normal range is up to about 35 repeats; 36 or more is concerning, and the more repeats there are, the earlier the onset of the disease
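counting the repeat is simple enough to sketch in code. This is a minimal example, assuming the sequence is given as a plain DNA string; the function name and the toy sequence are made up, not real HTT sequence.

```python
import re

def longest_cag_run(seq: str) -> int:
    """Number of repeats in the longest uninterrupted run of CAG in seq."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)

# Toy sequence (hypothetical): 20 CAG repeats, then an interruption and one more CAG
toy = "ATG" + "CAG" * 20 + "CCGCAG"
print(longest_cag_run(toy))  # 20
```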
why are later generations affected earlier with HD?
CAG repeats are unstable and therefore prone to expansion in cell replication
this means the repeat expands when a parent passes it on to a child
this is known as anticipation
what specifically does Huntington’s affect and how
mutation in the HTT gene causes the protein to misfold and aggregate (when misfolded proteins come together)
this has many downstream effects on mitochondria, transcription, axonal transport etc.
autosomal recessive disorder - why doesn’t the phenotype appear in every generation?
it’s recessive - both parents have to be carriers
what is cystic fibrosis and how common is it?
1 in 22 are carriers, 1 in 2000 are sufferers
it’s a build up of thick mucus in the airways that can result in organ damage, as a result of a mutation to the CFTR gene that encodes for a Cl- channel responsible for hydrating the airways
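the two figures above fit together: if 1 in 22 people are carriers, the chance that both parents are carriers and the child inherits both mutant alleles can be worked out directly. A minimal sketch, assuming random mating and ignoring affected parents; the function name is made up.

```python
from fractions import Fraction

def recessive_incidence(carrier_freq: Fraction) -> Fraction:
    """P(both parents are carriers) x P(child gets the mutant allele from each)."""
    return carrier_freq * carrier_freq * Fraction(1, 4)

print(recessive_incidence(Fraction(1, 22)))  # 1/1936
```

1 in 1936 is roughly the 1 in 2000 quoted on the card.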
are all cases of cystic fibrosis the same? how does this affect treatment?
mutations can be missense (single base change changes amino acid), nonsense (premature stop codon/termination of protein) or frameshift
different mutations cause different molecular phenotypes, so treatment often differs. If someone carries two different mutations (compound heterozygous) they might need a combination of treatments, as there is more than one issue with the Cl- channel: it may not be made at all, be made in smaller amounts, be mis-trafficked, or be less effective etc.
what is the most common mutation in CFTR?
a deletion of the phenylalanine at position 508 (ΔF508, also written F508del)
what is an X-linked disorder and what does this mean for the likelihood of inheriting/passing it on?
typically recessive - the mutation is found on the X chromosome
inheriting -
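males can inherit it when only the mother is a carrier, as their single X comes from her and the Y chromosome has no matching allele to mask the mutation
females need a mutated X from each parent to suffer (an affected father plus a carrier or affected mother)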
passing on -
affected males cannot pass it to their male children (they get the Y), and all of their female children will be carriers (only affected if the mother also passes on a mutated X)
female carriers have a 50% chance of passing it on to male progeny and a 50% chance of female progeny being carriers themselves
while female sufferers have a 100% chance of passing it on to male progeny and 100% chance female progeny will be at least carriers
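the inheriting/passing-on rules above can be checked by enumerating a Punnett square. This is a sketch only: the notation ('X' = normal allele, 'x' = recessive mutant allele, 'Y' = Y chromosome) and the function name are assumptions made for the example.

```python
from collections import Counter
from itertools import product

def x_linked_cross(mother, father):
    """Offspring proportions for an X-linked recessive trait.

    mother: her two X alleles, e.g. ("X", "x") for a carrier
    father: his X allele plus "Y", e.g. ("X", "Y") for an unaffected male
    """
    outcomes = Counter()
    for m, f in product(mother, father):
        if f == "Y":
            # Sons only have the maternal X, so one mutant allele is enough
            label = "affected son" if m == "x" else "unaffected son"
        else:
            n_mutant = (m == "x") + (f == "x")
            label = ["unaffected daughter", "carrier daughter", "affected daughter"][n_mutant]
        outcomes[label] += 1
    total = sum(outcomes.values())
    return {label: count / total for label, count in outcomes.items()}

# Carrier mother x unaffected father
print(x_linked_cross(("X", "x"), ("X", "Y")))
# Unaffected mother x affected father: sons unaffected, daughters all carriers
print(x_linked_cross(("X", "X"), ("x", "Y")))
```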
what are some cystic fibrosis treatments?
class I mutations (nonsense mutations) need production correctors
class II mutations need drugs to improve the intracellular processing of CFTR
class III and IV need drugs to recover CFTR function
Duchenne muscular dystrophy -
what gene does it affect and why is this an issue?
how is it inherited?
mutation in dystrophin gene, this protein acts as a bridge between actin cytoskeletal filaments and ECM, when defective the result is weakening of the muscles
Affects skeletal and cardiac muscle
dystrophin is the largest human gene, with 79 exons
X-linked disease; an affected son always gets the mutated X from his mother
what mutation is most common in Duchenne muscular dystrophy?
60% of cases are due to deletions of one or more exons, with a hotspot between exons 40-54
why does Becker’s dystrophy have less severe symptoms?
also caused by deletions, but in-frame deletions, so the protein is made, it’s just a bit shorter
what is a method of treating Duchenne muscular dystrophy?
Use of antisense oligonucleotides
The idea is for the oligonucleotide to mask a splice site so the splicing machinery skips an exon; this restores the reading frame, keeps premature stop codons out of the mRNA, and gives that shorter, partially functional protein from Becker’s dystrophy
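the in-frame vs out-of-frame difference between Becker’s and Duchenne comes down to whether the number of missing bases is a multiple of 3. A minimal sketch of that check; the exon lengths below are made-up numbers for illustration, not the real dystrophin exon sizes, and the function name is hypothetical.

```python
def deletion_is_in_frame(removed_exon_lengths):
    """True if the total number of removed bases keeps the codon reading frame."""
    return sum(removed_exon_lengths) % 3 == 0

# Hypothetical exon lengths in base pairs (NOT real dystrophin values)
deleted_exon = 176   # lost through the genomic deletion
skipped_exon = 148   # neighbouring exon hidden from the splicing machinery

print(deletion_is_in_frame([deleted_exon]))                # False: frameshift
print(deletion_is_in_frame([deleted_exon, skipped_exon]))  # True: frame restored
```

in this toy case the deletion alone shifts the frame, but also leaving out the neighbouring exon removes 324 bases in total (a multiple of 3), so the rest of the protein is read correctly.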
what are the characteristics of complex disease?
quite common (66% lifetime risk)
they are polygenic - multiple genetic components contribute (though these risk alleles are usually only suggesting susceptibility and are not deterministic)
environmental factors can affect risk
not inherited in a mendelian pattern as more than one genetic component (and environmental factors) is involved, but does run in families
what is the point in identifying the genetic components involved in complex disease?
helps with early diagnosis and treatment
knowing the specific issues/molecular basis may help develop therapeutics/treatments
environmental risks can be identified and lifestyle changes made
what are the three categories of complex disease?
1) Small number of dominant alleles = large risks, e.g. Parkinson’s
2) Common disease common variant model (CDCV) = many alleles each give a small risk, e.g. type II diabetes
3) Intermediate - one major dominant allele exerts large effect, with maybe a few smaller contributing alleles e.g. breast cancer
what is an SNP? major and minor allele?
a single nucleotide polymorphism
this is just a position in the genome where a single nucleotide differs between individuals
they are found in coding and non-coding regions
when looking at populations there is a major allele - the one with the greatest frequency
the second most common allele in the population is the minor allele, and its frequency is known as the MAF - minor allele frequency
define rare and common SNPs
rare SNPs have an MAF of < 5%
common SNPs have an MAF of > 5%
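a small sketch of how MAF could be calculated from a list of observed alleles, using the 5% cut-off from the card. The function names and the sample data are made up; the card doesn't say which side exactly 5% falls on, so treating it as rare here is an assumption.

```python
from collections import Counter

def minor_allele_frequency(alleles):
    """Frequency of the second most common allele (0.0 if only one allele is seen)."""
    counts = Counter(alleles).most_common()
    if len(counts) < 2:
        return 0.0
    return counts[1][1] / len(alleles)

def classify_snp(maf, cutoff=0.05):
    """'common' if MAF is above the cutoff, otherwise 'rare' (assumption: exactly 5% counts as rare)."""
    return "common" if maf > cutoff else "rare"

# Hypothetical sample: 100 chromosomes, 97 carry A and 3 carry G
sample = list("A" * 97 + "G" * 3)
maf = minor_allele_frequency(sample)
print(maf, classify_snp(maf))  # 0.03 rare
```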
explain what linkage analysis is
looks for linkage between mapping markers and occurrence of diseases in families
It attempts to locate a disease-causing gene/s by identifying genetic markers of known chromosomal location that are co-inherited with the trait of interest
what is GWAS?
genome-wide association studies
search for alleles in a population that occur more frequently in disease cases than matched controls
define phenotypic variation and heritability
phenotypic variation = genetic variation + environmental variation
Heritability = the degree of variation in a phenotype within a population that is due to the genetic variation
how are twin studies used to investigate heritability?
twin studies can help calculate relative genetic and environmental contributions of complex traits, how much of it is genetics?
you compare monozygotic twins who are 100% genetically identical, with dizygotic twins who share 50% of alleles
both sets of twins will share an environment that is assumed to have equal influence on both
probably won’t be asked, but in twin studies what do A, C and E stand for?
A = additive genetic variance (fully shared by identical twins, only partly shared by non-identical twins)
C = common environment (constant for both)
E = specific environmental (variable for both)
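one common way to get rough A, C and E values from twin data is Falconer's formula, which compares how strongly the trait correlates in monozygotic (MZ) vs dizygotic (DZ) pairs. This isn't on the cards, so treat it as a sketch of the standard approximation; the correlations below are invented numbers.

```python
def falconer_ace(r_mz: float, r_dz: float):
    """Rough A, C, E estimates from MZ and DZ twin correlations (Falconer's formula)."""
    a = 2 * (r_mz - r_dz)  # MZ share all alleles, DZ about half, so the gap is half of A
    c = 2 * r_dz - r_mz    # common environment: what is left of the MZ similarity after A
    e = 1 - r_mz           # anything making identical twins differ
    return a, c, e

# Hypothetical correlations for some trait
a, c, e = falconer_ace(r_mz=0.8, r_dz=0.5)
print(round(a, 2), round(c, 2), round(e, 2))  # 0.6 0.2 0.2
```

the three values add up to 1, i.e. all of the phenotypic variance is split between genetics, shared environment and specific environment.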
explain what is meant by continuous phenotypic variation and use an example
it’s the idea that in complex disease there can be a wide variety of severity and symptoms, many loci can contribute to a disease, and in different combinations they can give rise to different phenotypes
an example of this would be major depression - it has 44 significant risk loci, with all humans carrying a different combo of these alleles, resulting in different phenotypes
some complex diseases have a threshold hypothesis for susceptibility - what does this mean? use an example and explain how it affects relatives differently for male and female?
when you reach a certain number of risk alleles you get the disease; it’s either you have it or you don’t (the opposite of a continuous phenotype)
an example is pyloric stenosis (causes vomiting in infants and is more likely to affect males, as males have a lower threshold than females)
affected females must carry more of the risk alleles to cross their higher threshold, so their relatives have a far greater risk than relatives of affected males
are SNPs in non-coding regions harmless?
no, changes to non-coding regions can affect expression and regulation of the associated genes
in terms of mutations/SNPs what does synonymous and non-synonymous mean?
synonymous doesn’t change the amino acid coded for
non-synonymous does change the amino acid, giving either a missense mutation (different amino acid) or a nonsense mutation (stop codon)
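the distinction can be shown by translating the codon before and after the change. A minimal sketch using only a handful of codons from the standard genetic code (a full table has 64 entries); the function name is made up.

```python
# Small excerpt of the standard genetic code (DNA codons), enough for the examples
CODON_TABLE = {
    "GAA": "Glu", "GAG": "Glu",
    "GTG": "Val",
    "CAG": "Gln",
    "TAG": "Stop",
}

def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon change as synonymous, missense or nonsense."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "Stop":
        return "non-synonymous (nonsense)"
    return "non-synonymous (missense)"

print(classify_substitution("GAA", "GAG"))  # synonymous
print(classify_substitution("GAG", "GTG"))  # non-synonymous (missense)
print(classify_substitution("CAG", "TAG"))  # non-synonymous (nonsense)
```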
what is the exome aggregation consortium (ExAC)?
conducts disease-specific and population wide genetic studies, sequencing exomes of unrelated individuals
it has records of 7.4 million variants mapped
includes frequency of alleles in a population
documents rare mutations
Genome wide association studies - GWAS - how do they work?
they take a population and study those suffering a disease as well as a large matched control group
a panel of SNPs are investigated, they look at whether the disease group has a higher frequency of particular alleles when compared to the control group, and if a significant difference is found it constitutes an ‘association’ with the disease
many risk loci have been identified, with many to still be found as well
what are the two ways a SNP can be associated with a disease?
the SNP itself can increase the risk
OR the SNP correlates with the real risk allele due to ‘linkage disequilibrium’
the non-random association of alleles at different genomic sites dependent on distance between alleles and the recombination rate - they are together more often than can be accounted for by chance because of their physical proximity on a chromosome
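linkage disequilibrium is usually put into numbers as D (how far the observed haplotype frequency is from what independence would give) and r² (D scaled to run from 0 to 1). These measures aren't on the cards, so this is a sketch of the standard definitions with invented frequencies.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float):
    """Return (D, r^2) for alleles A and B at two loci.

    p_ab: frequency of haplotypes carrying both A and B
    p_a, p_b: frequencies of allele A and allele B on their own
    """
    d = p_ab - p_a * p_b
    r2 = d**2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2

# Hypothetical: A and B always travel together (complete LD)
print(linkage_disequilibrium(p_ab=0.5, p_a=0.5, p_b=0.5))   # (0.25, 1.0)
# Hypothetical: A and B combine at random (no LD)
print(linkage_disequilibrium(p_ab=0.25, p_a=0.5, p_b=0.5))  # (0.0, 0.0)
```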
what are haplotype blocks? how are they useful in GWAS? what is a tag SNP?
alleles are split into blocks based on proximity/patterns of linkage disequilibrium, and therefore the likelihood they are linked (areas of high linkage disequilibrium)
a SNP appearing to be a risk allele might be indicating a different allele it is linked to that wasn’t originally panelled for in the GWAS, the haplotype blocks are useful in identifying this (this SNP used to identify the real risk allele is the tag SNP)
this means SNPs narrow down the area of the genome in which to investigate, as they themselves may not be the risk allele (though they can be) but might be close to a SNP that is
how is likelihood of an SNP being associated with a disease measured?
how does this look in the common disease common variant model?
the odds ratio
OR = 1 means the events are independent
OR > 1 means events are correlated
OR < 1 means the events are negatively correlated
CDCV - multiple alleles with OR <1.2 showing weak association to the disease phenotype
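the odds ratio comes from a 2x2 table of cases/controls with and without the allele. A minimal sketch with invented counts; the function and argument names are made up.

```python
def odds_ratio(case_with: int, case_without: int, control_with: int, control_without: int) -> float:
    """Odds of carrying the allele in cases divided by the odds in controls."""
    return (case_with / case_without) / (control_with / control_without)

# Hypothetical counts: 300 of 1000 cases and 250 of 1000 controls carry the allele
print(round(odds_ratio(300, 700, 250, 750), 2))  # 1.29
# Same carrier rate in both groups: no association
print(odds_ratio(250, 750, 250, 750))  # 1.0
```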
why is statistical significance so important in GWAS? what else is absolutely necessary in GWAS?
stats are needed to differentiate between true and false positives; at the usual cut-off of p < 0.05, about 1 in 20 SNPs with no real effect would still look significant by chance
this means large groups and very strict cut offs for P values are needed
genome wide significance must be attained, the p value must be < 5 x 10^-8
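the 5 x 10^-8 figure is usually explained as the ordinary 0.05 cut-off divided by the number of tests (a Bonferroni correction), assuming about one million independent common SNPs. That assumption isn't on the card, so treat this as a sketch of where the number comes from.

```python
import math

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cut-off that keeps the overall false positive rate near alpha."""
    return alpha / n_tests

# Assumed: roughly 1,000,000 independent SNP tests across the genome
threshold = bonferroni_threshold(0.05, 1_000_000)
print(math.isclose(threshold, 5e-8))  # True
```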
what graphing is used to define risk variants?
a Manhattan plot, has the cut off of 5 x 10^-8 drawn on for clarity
necessary as GWAS is very susceptible to false positives
what did a GWAS for type II diabetes show?
Greatest risk allele is intronic, affecting transcription factor required for pancreatic development
Another is intronic and influences body weight regulation
it’s CDCV, so many ‘novel’ loci with a correlating but low odds ratio, were identified
of course environmental factors also play a role
what did GWAS study of breast cancer show?
lifetime risk is 8-12% in females
risk increases if first degree relatives suffer from the condition
this is an example of intermediate model - rare coding mutations have a significant increase in risk with other small contributing risk alleles (66 in this case to be precise)
BRCA1 and 2 autosomal dominant cause 5% of breast cancer cases (these were mapped by linkage analysis)
what is meant by missing heritability?
the idea that even though risk alleles have been identified for certain complex diseases, they only explain a certain percentage of the heritability of a disease, e.g. Crohn’s disease, 32 loci identified explain 20% of heritability
what are some places missing risk can arise from?
false negatives in a GWAS study
rare variant alleles with an MAF of 1-5%
structural alteration of the genome
epigenetics
3D genome organisation
all of which are not detected in a GWAS
how much do genetic defects affect
1) embryos
2) newborns
3) later life
1) an estimated 50% are affected by chromosomal disorders; 8% of clinically recognised conceptions terminate as a result, but the estimate rises to 50% when conceptions that are never clinically recognised are included
2) 5% (chromosomal, mitochondrial, single gene, complex etc…)
3) 2 out of 3 diseases have a genetic component
Down syndrome is an example of?
aneuploidy, specifically trisomy of chromosome 21
what is in the mitochondrial genome?
how is it inherited?
2-10 homoplasmic (identical) copies of a circular 16.6 kb genome per mitochondrion
however mutations can cause heteroplasmic copies that can cause mitochondrial disorders, these do not follow mendelian pattern of inheritance as all mitochondria are inherited from the cytoplasm of the egg
compare single gene disorders with complex disorders
single:
single gene
mendelian pattern
individual conditions are rare, but collectively common
high penetrance - mutations are deterministic
tests are predictive
complex:
risk alleles at multiple genes
not simple pattern of inheritance but run in families
conditions are common
mutations aren’t deterministic
no reliable tests
influenced by environment
what size is the human genome and how much of it is coding?
is this proportion similar to other species?
3.2 Gb (3.2 x 10^9 bp)
1.1% is coding
codes for 20,500 genes
other species tend to have a higher proportion of coding sections in their genome for the amount of genes they code for
does non coding mean non-functional?
no, these non-coding regions still have a role, e.g. promoter regions, regulatory elements like enhancers
within the middle of genes there are introns as well
next generation sequencing - main difference? what are the two main kinds? how much is it?
a lot of small fragments unlike sanger sequencing
whole genome sequencing - loads of data, lots of time and money needed, though it can be done now in 24hrs and costs about £1000 to do
whole exome sequencing - only the protein coding regions
what kind of sequencing is best for simple vs complex gene disorders?
in simple, whole exome as is faster and cheaper and mutations usually occur in the protein region - from around 2011 conditions are almost always identified with NGS approaches
in complex, whole genome sequencing as more often than not the mutation occurs outside the coding region
what are viruses?
how big are they?
what is the size range and format for a viral genome?
genetic elements that cannot replicate without a host, but can exist outside a host
they are very small which is partly why they need a host cell, they cannot carry enough genetic material to replicate alone
polio is around 28 nm which is small, smallpox is around 200nm which is large
genomes range from 0.5 kb to 1000 kb, stored as RNA or DNA, single-stranded or double-stranded, linear or circular
very briefly, how do viruses replicate their genetic material?
virus/virion (when it’s outside the cell) must find a host that it recognises (can be only one, can be many species)
Genomic material is injected into the host cell while protein coat remains outside
The injected DNA/RNA enters replication process
phiCD508 is a bacteriophage being studied - I think litch by the lecturer - for what?
its ability to infect C. difficile - a major issue in hospitals - in order to potentially engineer phage-like particles to attack C. difficile
what challenges does small size bring to assembly and replication?
lack of space - some viruses like TMV minimise the proteins they really need to code for e.g. only one protein for the coat/capsid capable of self assembly so only one gene is required
so reduce number of proteins needed, with less machinery required for assembly
timing is essential in viral replication - use T4 as an example to explain how a timeframe applies to its replication
T4 phage as an example
1st, read viral DNA and make EARLY mRNA for early proteins such as nucleases, polymerases and sigma factors; these are involved in DNA replication, so are needed to make loads of the viral DNA
Switch to middle mRNA and make middle proteins at 7min-ish
Late mRNA = late proteins, capsid and structural proteins, and finally a lysozyme to break out
Switching to late proteins - T4 has an earlier produced sigma factor for these late proteins
From start to finish, 25 minutes
how does a virus ensure the correct order of replication (early, middle, late proteins)?
to ensure this is the order it occurs in, sigma factors are used
A sigma factor works with RNA polymerase to bind to promoter region and get transcription going
Host sigma factors are used for early proteins; some of these early proteins modify or bind to host RNA polymerase, targeting the alpha subunit and altering its specificity so it recognises middle-protein phage promoters
Early protein MotA recognises a sequence in middle promoters to guide RNA polymerase
Phage codes for an anti-sigma factor to take out the host’s sigma-70 to prevent host transcription
T4 has a sigma factor, produced earlier, for the late proteins
what is meant by a lysogenic virus and what are their two lifestyles?
can a virus switch between lifestyles?
lysogenic viruses, often termed temperate bacteriophages, can integrate into the host, rather than escaping from it
they can end in typical lysis - lots of replication and protein production then breaking forth from the cell
they can go down the lysogeny route instead and insert their genome into the host genome as a plasmid essentially or into an actual chromosome where it is called a prophage, replicated in sync with the host chromosome (though viral genes aren’t really expressed)
it is possible to switch from lysogeny to the lytic pathway via induction
in lysogeny, what are repressors?
proteins that suppress the lytic pathway; if they get inactivated, that is when induction occurs and the switch to the lytic pathway
explain how the lambda phage does integration
Viral genome, in the case of lambda phage, needs the enzyme ‘lambda integrase’ to attach at site ‘att-lambda’
Viral DNA has sticky ends that come together to form a ring once in host
Site-specific nuclease makes staggered ends in phage and host DNA so they can join, a DNA ligase fills the gaps
explain what happens in the lytic pathway - the mechanism for viral genetic material replication
the rolling circle:
(also used for plasmids) DNA forms loop once in the host
a nick is made in the outer strand, and the inner strand is used as a template while a new outer strand is made for it, displacing the original outer strand
once complete and sealed with DNA ligase, the original outer strand can now be used as a template to make another inner strand, forming two copies of the DNA (that can then be replicated again)
what are some issues/differences for eukaryotic viruses as opposed to phages?
in eukaryotes transcription and translation are discrete processes, so a virus has to get its genetic material into the nucleus for transcription then back out for translation (usually)
eukaryotic viruses often have a membrane envelope they cannot make so must steal from the host upon leaving
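RNA genomes cannot be copied by host enzymes, so the virus needs its own RNA replicase (or reverse transcriptase in the case of retroviruses)
eukaryotic mRNA undergoes more processing than bacterial mRNA (splicing, 5’ cap, polyA tail etc.)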
how exactly does the eukaryotic virus polio work replication-wise? how does it prevent translation of host mRNA?
(+) strand RNA virus, very small
So genome = ssRNA
this mimics eukaryotic mRNA, with a polyA tail and a fake cap, even forms stem-loops
Host starts to translate it, forming a polyprotein that is cleaved into all the individual proteins required, including more protease for cleaving and its own RNA replicase to replicate the RNA in the cytoplasm
Inhibits host RNA and protein synthesis by destroying host cap-binding protein
what is the difference between + and - RNA viruses?
it’s to do with the sense of the strand, i.e. whether it reads the same way as the viral mRNA
more importantly, the negative sense RNA virus comprises viral RNA that is complementary to viral mRNA and can only act as the genome, while the positive sense RNA virus comprises viral mRNA, which can be translated into proteins straight away acting as genome and mRNA
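the relationship between the two senses is just base-pairing: the (-) strand is the reverse complement of the (+) strand, so it has to be copied once before anything can be translated. A toy sketch; the 9-base 'genome' is invented and the function name is made up.

```python
# RNA base-pairing: A-U and C-G
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def complementary_strand(rna_5_to_3: str) -> str:
    """Return the complementary RNA strand, also written 5' to 3'."""
    return rna_5_to_3.translate(COMPLEMENT)[::-1]

# Invented (-) sense genome; it has no start codon of its own
negative_genome = "UUAUUUCAU"
plus_strand = complementary_strand(negative_genome)
print(plus_strand)  # AUGAAAUAA (start codon, one codon, stop codon)

# Copying the (+) strand again gives back the (-) genome
print(complementary_strand(plus_strand) == negative_genome)  # True
```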
why does polio, a (+) RNA virus, need to code for its own RNA replicase?
eukaryotic hosts have no enzyme that copies RNA from an RNA template
how does rabies, a (-) RNA virus replicate?
To replicate the genome, the virus must provide an RNA polymerase because the host cannot convert RNA to RNA, and the virus must make the (+) strand from its parental negative one, before being able to replicate that over and over (so end result is loads of copies of -ve strand)
Same enzyme is used to make (+) mRNA from the parental (-) strand
From here translation occurs to produce viral proteins; upon assembly they steal some host membrane
how does influenza, a (-) RNA virus, replicate?
Similar to rabies, except the RNA is in pieces, not one long strand, there are 8 linear ssRNA molecules
Two membrane bound proteins that bind to sugars on host cells - neuraminidase and hemagglutinin
the virus (virion) has the RNA polymerase but also an RNA endonuclease
first, the viral genomic RNA replicates in the host nucleus the same as rabies, using the RNA replicase
In transcription, influenza does not make its own 5’ cap
Instead the virus steals capped 5’ fragments from host mRNA using its viral endonuclease and uses them as primers (cap snatching)
Poly A tail is added and the viral mRNA moves to cytoplasm, just as the host would do
the influenza genome changes, meaning vaccines are required yearly - what is meant by antigenic drift and antigenic shift?
antigenic drift = slight and gradual change in surface proteins like hemagglutinin and neuraminidase in influenza
antigenic shift - rapid and large change in surface protein genes, when different strains meet in the host and exchange RNA/ get reassorted - origin of pandemics and epidemics
how does HIV - retrovirus - replicate?
2 identical (+) ssRNA replicated through a cDNA intermediate using reverse transcriptase to make this cDNA from the viral RNA
On the RNA - gag region encodes structural proteins, pol encodes reverse transcriptase and integrase and env encodes envelope proteins of the membrane
Once you’ve got the dsDNA, it’s integrated into the host cell DNA using the integrase enzyme the virus brought with it. From here it can just be transcribed with host DNA
Unless latent, promoters in the LTR region of the DNA cause transcription to produce capped and polyadenylated mRNA
Similar to polio, produces polyproteins
DNA viruses of eukaryotes - give some examples
polyomaviruses and pox viruses (these replicate in the host cell nucleus, except pox viruses, which replicate in the cytoplasm)
some polyoma viruses such as SV40 can induce tumours in animals
SV40 (simian virus 40) is very small - how does it specialise its replication to accommodate this?
has a circle of dsDNA, replicated in both directions using host cell machinery
no viral enzymes are required (this is often used as a vector for moving genes into eukaryotic cells)
only encodes early and late proteins
genes overlap so only a short section of DNA is used to code for multiple proteins
coronavirus - what is it and how does it replicate?
it is a ssRNA (+) virus
Respiratory infections in humans
15% of common colds, but can be fatal
Largest known RNA viruses
Glycoprotein spikes look like a crown hence ‘corona’
The genome is (+) ssRNA and already has a 5’cap so can already act as an mRNA
The only translated gene at first is for a replicase
This replicase is used to produce a (-) strand of RNA
From here, the (-) strand is used to make monocistronic mRNAs to make viral proteins, or to make many copies of (+) RNA
Replicate in cytoplasm