module 3: beyond genome sequencing Flashcards

1
Q

why bother with NGS if the human genome has already been sequenced?

A

1) Clinical setting - gives info about potential disease causing mutations

2) Phylogenetic studies

3) Compare sequences between populations to detect variability

4) Keep up with the changes of the genome

2
Q

1) What is whole-exome and targeted sequencing?

2) How does it work?

A

1) Sequencing the exonic region (sequences retained in mature mRNA) of the genome! It is important for CLINICAL RESEARCH.

2) Same as whole genome sequencing but differs in library preparation - after fragmenting DNA, special beads bind to exonic sequences only.

3
Q

Whole-exome and targeted sequencing _______ sequencing power by reducing the _______ area covered.

It lays the framework for causes of _________.

A

increase; genomic

autism (complex disease with many factors contributing to it)

*most disease causing mutations disrupt gene expression; affect coding regions.

4
Q

_________ platforms can detect modified bases.

A

Nanopore

*has lots of potential for epigenetic studies

5
Q

(T/F) NGS is not important after sequencing one genome as a single genome can represent the genetic diversity of our species.

A

False!

NGS helps CATALOGUE human genome diversity: NO SINGLE GENOME CAN REPRESENT THE GENETIC DIVERSITY OF OUR SPECIES.

6
Q

Which statement is true?

1) The reference genome for humans is mosaic (many different genomes).

2) 0.6% of NTs differ between any 2 individuals.

A

All are true!

7
Q

1) What was the first project to catalog human diversity?

2) What was the “All of Us” project?

A

1) The 1000 Genomes Project, which ran from 2008 to 2015. It sequenced 2,504 individual genomes from 26 different populations (POPULATION-level sequencing).

2) It aimed to sequence the genomes of 1 million people in the United States to accelerate research and improve health.

8
Q

Why is cataloging human genetic diversity important?

A

1) helps us see where our genomes differ and how these can affect our phenotypes.

2) can help us learn how our genetics can influence our response to certain drugs and our susceptibility to different diseases.

9
Q

Human Pangenome Reference Consortium looked at 47 phased diploid genomes.

1) What does phased diploid genome mean?

2) Why is this important?

A

1) Separated maternal chromosomes from paternal chromosomes.

2) A reference sequence provides a consensus of the two homologous chromosomes, which doesn’t capture the diversity between the maternal and paternal copies.

This will better represent the diversity of the human genome. Each genome carries a certain number of DNA variants.

10
Q

1) Define the term mutation.

2) Mutations are caused by:

A

1) Permanent change in the DNA sequence compared to what is predominant in the population

2) Endogenous (un-repaired DNA damage) and exogenous sources

11
Q

Give examples of exogenous and endogenous sources of mutations.

A

Exogenous:
- ionizing radiation (DNA breaks)
- UV rays (thymine dimers, issues with DNA replication)
- chemicals (deamination, oxidation of bases)

Endogenous:
- oxidation of bases (G -> T)
- errors of DNA pol
- mis-repaired ds/ss DNA breaks
- loss of a purine/pyrimidine (abasic site)
- cytosine deamination (gives uracil)

*most endogenous sources are repaired normally but when they are not repaired, it can lead to mutations!

12
Q

(T/F) Mutations can range from a single bp to millions of bp. They are also inevitable.

A

True!

We cannot stop mutations. They can be good or bad depending on where they occur and their nature.

13
Q

(T/F) We lose 500 purines per day and 10,000 pyrimidines per day.

A

False!

We lose 10,000 purines per day and 500 pyrimidines per day.

These are normally repaired but if not, errors arise during DNA replication.

14
Q

1) Define genetic VARIATION.

2) What are the two types of variations? Briefly describe each.

A

Variation: mutations that result in ALTERNATIVE forms of DNA (established in a population).

Common variation: minor allele (least common allele) frequency of at least 1% in the population

Rare variation: minor allele frequency of <1% in the population

*not a strict rule
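
A minimal Python sketch of the common/rare split above, assuming a bi-allelic site and genotype counts from a sample. The function names and the example counts are made up for illustration; the 1% cut-off is the one from this card (and, as noted, not a strict rule).

```python
def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Minor allele frequency at a bi-allelic site, computed from genotype counts."""
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)  # each person carries 2 alleles
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    alt_freq = (2 * n_hom_alt + n_het) / total_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def classify_variant(maf: float, threshold: float = 0.01) -> str:
    """Label a variant 'common' (MAF >= threshold) or 'rare' (MAF < threshold)."""
    return "common" if maf >= threshold else "rare"


# Hypothetical sample of 1000 people: 900 ref/ref, 95 ref/alt, 5 alt/alt.
maf = minor_allele_frequency(n_hom_ref=900, n_het=95, n_hom_alt=5)
print(maf, classify_variant(maf))
```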

15
Q

Define allele.

A

Allele refers to one of two or more versions of a DNA sequence at a given location.

For any genomic location, we have two alleles (maternal vs paternal).

You can be HOMOZYGOUS or HETEROZYGOUS for alleles.

16
Q

(T/F) If you are homozygous for one locus, chances are you are homozygous for all.

A

False!

You can be homozygous for some loci but heterozygous for others.

17
Q

What are the four types of genomic variants?

A

1) Single NT polymorphisms

2) Insertion-Deletions (INDELs or DIPs)

3) Simple sequence repeats (SSRs)

4) Copy number variants (CNVs)

18
Q

Single nucleotide polymorphisms, also known as SNPs are the ____ common genomic variants (1 in every ____ NTs).

There are about _______ SNPs in the human genome.

What are the causes of SNPs?

A

Most; 300

10 million

Causes of SNPs are the same as the causes of mutations; errors, radiation, oxidation, endo vs exo, etc.

*SNPs are point mutations

19
Q

Briefly describe where SNPs can occur within the genome.

Which location has the most visible impact?

A

SNPs can occur in the:

1) Coding

2) Non-coding (introns - could affect mRNA splicing, TFs binding, and stability of mRNA esp if it is in the 3’ UTR)

3) Intergenic (regulatory regions between genes - can affect transcription)

Most visible impacts are seen when SNPs are present in the coding and intergenic regions.

20
Q

There are two types of coding SNPs.

Differentiate them.

A

1) Synonymous: NO CHANGE in the amino acid, thus no impact on the protein

2) Non-synonymous: CHANGES the amino acid.

There are two types of non-synonymous: MISSENSE or NONSENSE (intro of premature STOP codon).

*the mRNA usually gets degraded before it can be translated into a full protein if it has a NONSENSE coding SNP (premature STOP codon; nonsense-mediated decay).
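
A small Python sketch of the synonymous / missense / nonsense distinction, assuming the standard genetic code and DNA codons written on the coding strand. The function name is made up for illustration, and stop-loss changes are not treated as their own category here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a STOP codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base change within one codon."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be 3 bases long")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("a SNP changes exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid
    if alt_aa == "*":
        return "nonsense"     # premature STOP codon
    return "missense"         # different amino acid


print(classify_coding_snp("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_snp("GAG", "GTG"))  # Glu -> Val
print(classify_coding_snp("TGG", "TGA"))  # Trp -> STOP
```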

21
Q

What is the difference between a causative and a correlated SNP?

A

Causative SNP: SNP alters protein function, leading to dysfunction in the organism. The SNP causes the observed phenotype.

Correlated SNP: SNP is not within a coding region but is inherited with a mutation that causes a disease. The SNP does not cause the observed phenotype.

22
Q

(T/F) Most SNPs have a significant impact on the health and development of humans.

A

False!

Most SNPs are not observable unless they are affecting a coding/regulatory region.

23
Q

What is the human germline mutation rate for SNPs?

Knowing the human germline mutation rate for SNPs and that we have a lot of SNPs in our genome, what does this tell us?

A

1 in 100 million NTs are substituted per generation (~1.2x10^-8 per site per generation). This means 30 NEW PT MUTATIONS per generation are arising in an egg/sperm.

We find a lot of SNPs in our genome and we know that each event that creates a SNP is rare. This tells us that SNPs are VERY OLD mutational events inherited from a COMMON ANCESTOR.

We can compare the SNPs across various genomes to trace the origins of the human species!
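
The arithmetic behind the "~30 new point mutations" figure can be sketched in Python. The haploid genome size of about 3 billion bp is an assumption not stated on the card, and the function name is made up for illustration.

```python
HAPLOID_GENOME_BP = 3.0e9  # assumption: approximate size of one haploid human genome


def expected_new_mutations(rate_per_site: float, sites: float = HAPLOID_GENOME_BP) -> float:
    """Expected number of new substitutions in one gamete per generation."""
    return rate_per_site * sites


# "1 in 100 million" is 1e-8 per site; the card also quotes ~1.2e-8.
print(round(expected_new_mutations(1.0e-8)))   # about 30
print(round(expected_new_mutations(1.2e-8)))   # about 36
```

Each individual site is hit very rarely, which is why a SNP shared by many people is best explained by one old event rather than many independent ones.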

24
Q

Most SNPs are __-allelic.

A

Bi-allelic

This means that most SNPs come in one of two varieties. For example, a locus can have either A or T but not G or C.

This is because the germline mutation rate is so LOW! That exact position has to be mutated more than once to be tri-allelic and more, which is very rare.

25
Q

An example of an SNP is in the gene ABCC11 which encodes a membrane transporter in sweat glands.

The minor allele (__) encodes for dry earwax and no body odour found in _________.

The major allele (__) encodes for wet earwax and normal body odour found in _______ and _____.

A

TT; East Asians

CC; Europeans and Africans

*this is an example of A SINGLE SNP having a profound change.

26
Q

1) What is the second most common form of genetic variation in the human genome; at what rate?

2) What is the size of these? Which ones are more common?

3) What are they caused by?

A

1) Insertion-deletions (INDELs or DIPs) are the second most common form. They occur 1 per 10kb of DNA.

2) INDELs or DIPs can be 1 to 10,000bp in length. The shorter ones (1, 2, 3 bp) are most common.

3) These are caused by errors in DNA replication, recombination or repair.

*depending on where they occur, they can have an effect.

27
Q

Coding region INDELs can lead to catastrophic phenotypes.

Briefly describe the two types of coding region INDELs.

A

1) Frameshift: changes the reading frame

2) Non-frameshift: multiple of three NTs added or deleted, leading to no change in the reading frame.
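
The frameshift rule comes down to whether the length change is a multiple of three. A minimal Python sketch, assuming the indel is given as reference and alternate sequences within a coding region (the function name is hypothetical):

```python
def classify_coding_indel(ref: str, alt: str) -> str:
    """Return 'frameshift' or 'non-frameshift' for a coding-region indel."""
    length_change = len(alt) - len(ref)  # > 0 for insertions, < 0 for deletions
    if length_change == 0:
        raise ValueError("not an indel: ref and alt have the same length")
    # Adding or removing a multiple of 3 NTs keeps the reading frame intact.
    return "non-frameshift" if length_change % 3 == 0 else "frameshift"


print(classify_coding_indel("A", "ACTG"))   # 3 NTs inserted
print(classify_coding_indel("ACT", "A"))    # 2 NTs deleted
```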

28
Q

Simple sequence repeats (SSRs) are also known as ________.

They account for __% of the total DNA.

___-___ base pairs repeated in tandem (up to 100 times).

Frequency of once in every ___ of DNA.

The germline mutation rate is ____ per locus per gamete.

A

microsatellites

3%

1-6bp

30kb

10^-3

29
Q

List these statements as either true or false.

1) For SSRs 2 and 3 base pairs being repeated is the most common. It is the NUMBER OF REPEATS that is variable between people.

2) SSRs are not as common as SNPs or INDELs and they affect less DNA.

3) SSRs are more polymorphic than SNPs as they change more frequently. They also are not BI-allelic. However, the rate of new formation for SSRs is still low enough that it usually doesn’t change within a few generations of a family.

A

1) True!

2) False. Though SSRs are not as common as SNPs or INDELs, they can affect more DNA.

3) True!

30
Q

Give an example of a disease caused by SSRs and answer the following questions regarding it.

1) What is the SSR involved?

2) What is the correlation between number of repeats and age of onset?

A

Huntington’s disease (HD), an autosomal dominant neurodegenerative disease with no cure, is an example.

1) Polyglutamine disease (polyQ) due to the trinucleotide expansion of CAG (codes for glutamine) within the HTT gene.

2) Number of repeats is inversely correlated with the age of onset (more repeats = earlier onset).

*families with a history of HD will show an earlier and earlier onset of the disease with each generation as the triNTs can expand.

31
Q

1) What are copy number variants?

2) How important are they?

3) How can they be detected?

A

1) DNA segment > 1kb that is present in variable copy number compared to a reference genome.

2) They are as important if not more than SNPs. Though they are not as abundant as the others, they affect more DNA!

3) They can be detected using comparative genomic hybridization (CGH).

32
Q

(T/F) We have strictly two copies of each gene in our genome!

A

False!!

This is not accurate - we can have more than two.

For example, the copy number of the olfactory receptor genes is EXTREMELY VARIABLE! A given gene can be present in anywhere from 0 to 6 copies.

33
Q

Why do we need to sequence more genomes when we already have the first human genome sequence from 2001?

A

The first human genome sequence was a PATCHWORK of DNA sequences from different individuals. It LACKS diversity.

For a more detailed view of human genetic variation, we need to compare DNA from many different individuals. Thus, we have to sequence as many genomes as possible.

34
Q

(T/F) In Craig Venter’s genome, the majority of the differences from the reference sequence came from the large-copy-number variants.

A

True!

*there is no such thing as WT or reference genome! lots of variations found in different genomes.

35
Q

What was the 1000 Genomes Project?

A

It was a population-level sequencing project that wanted a deep catalogue of human genetic variation.

They wanted to find genetic variants with a frequency greater than 1% (common variants).

They sequenced over 2500 samples.

36
Q

Which continent has the highest variant sites per genome? Why?

A

Africa has the highest variant sites per genome (most are SNPs). They are the oldest population and thus had more time to accumulate these variants in their genome.

Other continents have fewer variant sites per genome due to the founder effect.

This occurs when a small group of migrants leave a population and move elsewhere and fail to capture all of the diversity of the original population. This group also interbreeds and limits diversity.

Thus other populations such as Europe, East Asia, and South Asia have lower genetic diversity.

37
Q

(T/F) On average, there are 4-5 million variant sites per genome.

A

True!

38
Q

(T/F) The majority of the SNPs in our genome are not common and are restricted to a subpopulation.

A

False!

The majority of the SNPs in our genome (3/4th) are common - they are VERY OLD mutational events.

The less common SNPs are restricted to a sub-population.

39
Q

Besides the 1000 Genomes Project, there are two other population-level sequencing projects underway.

Briefly describe each.

A

1) Exome aggregation consortium (ExAC) catalogued genetic variation in the PROTEIN-CODING REGIONS of the genome - looked only for SNPs.

2) Genome aggregation database (gnomAD) that took information from ExAC and had additional whole-genome and exome sequencing. They were looking for SNPs and LARGER VARIANTS (deletions, duplications, inversions, etc).

40
Q

The gnomAD project had an average coverage of ____.

It was representative of the general ____ population and did not include severe ______ diseases.

A

32x

adult; Mendelian

41
Q

Briefly answer the following questions regarding the results of gnomAD:

1) What types of variants were most commonly found among the structural variants?

2) What is the approximate number of structural variants identified in each individual, as opposed to previous estimations?

3) What were the characteristic traits of the SVs identified?

A

1) The majority of structural variants found were DELETIONS (cnv) and INSERTIONS (non-cnv).

2) Each individual had approximately 7000 structural variants, DOUBLE what was previously identified.

3) Most of the Structural Variants are SMALL and RARE.

We have many COMPLEX variants (more than 1 type of change in the DNA).

42
Q

Like with SNPs, there were more structural variants found in the ________ population.

Unlike SNPs, the majority of the structural variants are _______.

_________ are variants that occur only once in a population (or the genomes studied). Over ____% of the SVs were found only once.

A

African

Rare (90%)

Singletons; 50*

*shows how rare SVs are as there were many singletons

43
Q

75% of the complex SVs contained ________.

The majority of SVs tend to occur frequently near the _______ and _______.

*which SVs are found in telomeres and centromeres?

*which SVs are found more in the arms of chromosomes.

A

Inversions

Telomeres; Centromeres

*Deletions, duplications, and Multiallelic CNV

*Insertions, inversions, and Complex

44
Q

Why do we care about cataloging genomic variants?

A

Variants can help find disease-causing genes, especially SNPs.

45
Q

What are the two Mendel’s Laws?

A

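1) The two alleles of a gene separate from each other during gamete formation, so each gamete randomly receives one of them (law of segregation)

2) Alleles of different genes assort independently of each other (law of independent assortment)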

46
Q

What is crossing-over?

What are the genotypes with crossing over and without crossing over?

A

Crossing over is the exchange of genetic material from maternal and paternal homologues in meiosis I.

If there is no crossing over, you can have only the parental genotypes.

If there is crossing over, you have the possibility of all 4 genotypes (50% parental, 50% recombinants) assuming they assort independently.

47
Q

DNA sequences that are in _______ on a chromosome tend to be inherited together.

A

proximity

*there has to be a DNA break for crossing over to happen.

2 genes that are close together will be separated by crossovers less frequently than 2 genes that are more distant from each other (location of initial dsDNA break is somewhat random)!!

48
Q

1) What is recombination frequency?

2) How do you calculate RF?

3) What is the distance between 2 genes that recombine with a frequency of 1%?

4) What frequency would we expect if two alleles are not linked? What does this mean during recombination?

A

1) Recombination frequency is how often two genes are separated by crossing over. It is used as a MEASURE OF THE DISTANCE between TWO GENES.

2) RF = (# of recombinant progeny/total # progeny)*100

3) 1 centiMorgan (cM) is the distance between 2 genes that recombine with a frequency of 1%

4) 50% - our 2 genes are not linked and are GETTING SEPARATED BY A RECOMBINATION EVENT.
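
The RF formula and the 1% = 1 cM rule above can be sketched in Python. The progeny counts are invented for illustration, the function names are hypothetical, and treating RF directly as map distance is only a reasonable approximation for closely linked genes.

```python
def recombination_frequency(n_recombinant: int, n_total: int) -> float:
    """RF in percent: (# recombinant progeny / total # progeny) * 100."""
    if n_total <= 0 or not 0 <= n_recombinant <= n_total:
        raise ValueError("need 0 <= n_recombinant <= n_total and n_total > 0")
    return 100 * n_recombinant / n_total


def map_distance_cm(rf_percent: float) -> float:
    """1% RF corresponds to 1 centiMorgan (approximation for small distances)."""
    return rf_percent


def appear_linked(rf_percent: float) -> bool:
    """RF below the 50% maximum suggests the two genes are linked."""
    return rf_percent < 50


# Hypothetical cross: 12 recombinant offspring out of 400.
rf = recombination_frequency(12, 400)
print(rf, map_distance_cm(rf), appear_linked(rf))
```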

49
Q

(T/F) If the recombinant rate is higher than 50%, the two genes are closer together because they are being separated less frequently.

A

False!!

50% RF = two genes are not linked. MAX RECOMBINANT RATE.

If the recombinant rate is lower, the two genes are closer together because they are being separated less frequently.

*RF - how often genes are being separated; if separated - they are further from each other.

50
Q

How does Recombination Frequency (RF) assist in identifying disease-causing genes that have not been localized?

A

RF enables us to use a marker (genomic variant!) to determine the precise inheritance pattern of the gene that has not yet been localized.

For example, if a certain SNP is close to a mutation that causes a disease, they will be passed on together. We are fishing for disease-causing genes by looking for SNPs that are present in a large portion of those individuals.

51
Q

What are the disadvantages of using RF in identifying disease-causing genes that have not been localized?

A

1) Need to find a family LARGE enough that is affected and need to genotype a bunch of SNPs in these individuals - some are dead (can’t analyze). Families are also getting SMALLER - ethical questions.

2) BIG LIMITATION: The disease has to be inherited in a Mendelian fashion for us to study the pedigree (complex traits don’t follow Mendelian inheritance).

52
Q

What are the two methods we can use to find genes responsible for complex traits without studying family pedigrees?

A

1) Genome-wide association studies (GWAS)

2) Precision medicine

53
Q

Complex traits are subject to any of four phenomena. What are they?

A

1) Incomplete penetrance

2) Phenocopy

3) Genetic heterogeneity

4) Polygenic heredity

54
Q

Match the following traits to their descriptions:

1) Incomplete penetrance

2) Phenocopy

3) Genetic heterogeneity

4) Polygenic heredity

A) mutant phenotype not caused by an inherited mutation (phenotype, but not genotype)

B) two or more genes influence the expression of the phenotype

C) individual with the mutant genotype may not express the mutation phenotype

D) mutations at more than one locus cause the same phenotype (different genotypes, same phenotypes)

A

Incomplete penetrance: individual with the mutant genotype may not express the mutation phenotype

Phenocopy: mutant phenotype not caused by an inherited mutation (phenotype, but not genotype)

Genetic heterogeneity: mutations at more than one locus cause the same phenotype (different genotypes, same phenotypes)

Polygenic heredity: two or more genes influence the expression of the phenotype

55
Q

(T/F) Most diseases do not have a genetic component.

A

False!

Most diseases do HAVE a genetic component - the percentage differs.

56
Q

What is a haplotype?

A

Haplotype is a term that is derived from “haploid genotype.”

It is a cluster of variants (mostly SNPs) that are present on THE SAME CHROMOSOME and are genetically LINKED (inherited together).

In other words, a haplotype is two or more variants (SNPs) inherited together on the same chromosome.

57
Q

Why is it crucial for the genetic distance occupied by multiple loci of a haplotype block to be short?

A

The genetic distance occupied by multiple loci in a haplotype block (approximately 1-100kb) must be short enough to ensure that the ALLELES REMAIN ASSOCIATED together on the same chromosome across generations.

A distance of 100 kb (0.1cM) = 0.1% chance per generation that the haplotype will be broken up by a recombination event.
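
To see why short blocks survive, here is a rough Python sketch that compounds the per-generation break chance over many generations. It assumes each generation is independent and that the chance of a break equals the genetic distance in cM expressed as a percentage; the function name is made up for illustration.

```python
def prob_haplotype_intact(distance_cm: float, generations: int) -> float:
    """Probability that no recombination splits the block over the given generations."""
    per_generation_break = distance_cm / 100.0  # 0.1 cM -> 0.001 (0.1%)
    return (1.0 - per_generation_break) ** generations


# A 100 kb block (~0.1 cM) versus a much longer 10 cM stretch, over 100 generations.
print(prob_haplotype_intact(0.1, 100))
print(prob_haplotype_intact(10.0, 100))
```

Under these assumptions the short block is still intact after 100 generations about 90% of the time, while the long stretch almost never is.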

58
Q

1) What is the primary goal of haplotype association analysis? How can this be done?

2) Define Linkage Disequilibrium (LD) in the context of haplotype association analysis.

A

1) Trying to identify haplotypes that are more frequent in affected individuals than non-affected.

Genotyped SNP will be in the same haplotype block as the disease-causing mutation. The SNP acts as a proxy for the disease mutation through an INDIRECT ASSOCIATION.

2) LD refers to the NON-RANDOM association of alleles at adjacent loci, indicating a tendency for certain alleles to be inherited together (certain SNPs and disease causing genes) more often than expected by independent segregation.

59
Q

Diversity within the haplotype in the population is _______.

A

Low!

The probability of forming a particular haplotype with more than 4 SNPs is VERY LOW (SNPs are rare and old). These are inherited from a common ancestor.

60
Q

How can you determine which haplotype an individual has?

A

To find which haplotype an individual has, we need to figure out the SNPs they have.

If there are too many SNPs, it might be too much work.

Thus you need to FIND A FEW SNPs within the haplotype that are REPRESENTATIVE, also known as TAG SNPs.

*computer gives us the TAG SNPs.

61
Q

What are genome-wide association studies (GWAS)?

A

GWAS look across the ENTIRE GENOME for blocks of SNPs that are statistically more frequent in affected individuals than in non-affected individuals.

62
Q

To do GWAS, you have to genotype SNPs.

1) What are the two types of SNP genotyping?

2) What method is most common? Why?

A

1) Whole-genome vs targeted

2) Microarray-based methods (SNP arrays such as Illumina Infinium) are most common.

This is because they interrogate MILLIONS of markers per sample at the same time. All of the genome is genotyped in one experiment.

63
Q

Briefly describe how Illumina Infinium is used to genotype SNPs for GWAS.

A

1) Fragment genomic DNA into small pieces

2) Hybridize fragmented DNA to a BeadChip - a microarray with millions of tiny beads bound to the surface of the array.

3) Wash the BeadChip

4) EXTEND by introducing a di-deoxy NT that is complementary to the SNP and STAIN

5) Image the BeadChip and compare the relative intensities of the fluorophores.

64
Q

In a BeadChip, each bead contains many identical probes (oligoNTs) that are complementary to the genomic DNA up to but not including the ____.

Each bead represents a _____ genomic locus (SNP).

A

SNP

Different

65
Q

In Illumina Infinium, A and T are labeled with one fluorophore, while C and G are labeled with a different fluorophore.

1) What occurs when an individual is heterozygous in this method?

2) What challenge arises when an individual is heterozygous for either C and G or A and T?

A

1) Two NTs are incorporated in the same bead if the individual is heterozygous.

2) In cases where an individual is heterozygous for C and G or A and T, this method cannot distinguish between them, requiring the design of different probes.
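
The two-colour limitation can be illustrated with a toy Python model. The channel names are placeholders (the card only says A/T share one fluorophore and C/G share another), and the function names are hypothetical.

```python
# Placeholder channel labels: A and T share one fluorophore, C and G the other.
CHANNEL = {"A": "channel_1", "T": "channel_1", "C": "channel_2", "G": "channel_2"}


def observed_signal(genotype: str) -> frozenset:
    """Set of channels that light up on a bead for a two-letter genotype, e.g. 'AG'."""
    return frozenset(CHANNEL[base] for base in genotype.upper())


def single_probe_can_genotype(allele_1: str, allele_2: str) -> bool:
    """A single probe works only if the two alleles give different colours."""
    return CHANNEL[allele_1.upper()] != CHANNEL[allele_2.upper()]


print(observed_signal("AG"))                  # heterozygote: both channels
print(single_probe_can_genotype("A", "G"))    # True
print(single_probe_can_genotype("A", "T"))    # False: AA, AT and TT look identical
```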

66
Q

Briefly describe the process of GWAS.

A

1) Genotyping SNPs

2) Bioinformatics - statistical tests to find SNPs (markers) found more in the affected population

3) Identification of the genes

67
Q

What is the most difficult part of GWAS? How are scientists overcoming this?

A

recruiting participants

scientists can use BIOBANKS (biorepository that accepts, processes, stores and distributes bio specimens and associated data for use in research and clinical care)

68
Q

All GWAS studies follow a multi-stage approach. What are the three?

A

1) Stage 1: genotype full set of SNPs in relatively SMALL population at LIBERAL p-value

2) Stage 2: fewer SNPs (only those statistically relevant), LARGER population at more STRINGENT p-value

3) Optional third stage for INCREASED stringency

69
Q

What does the Manhattan Plot tell us?

A

An association test (is this SNP found more in the affected than the non-affected) is performed for EACH SNP, yielding a P-value.

A Manhattan Plot plots these P values for each SNP as dots.

Significant associations have a P value below the genome-wide significance threshold (0.05 divided by the # of tests done). These dots cross the significance line of the plot.

The dot at the top is the most statistically significant.

If there is a column of dots = a HAPLOTYPE.
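
A minimal Python sketch of the significance cut-off, assuming a Bonferroni-style correction (0.05 divided by the number of tests) and the usual convention that a Manhattan plot shows -log10(P) on the y-axis. The SNP names and P values are invented for illustration.

```python
import math


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Genome-wide significance threshold: alpha divided by the number of tests."""
    return alpha / n_tests


def manhattan_height(p_value: float) -> float:
    """Height of a dot on the plot: smaller P values sit higher."""
    return -math.log10(p_value)


def significant_snps(p_values: dict, n_tests: int) -> list:
    """Names of SNPs whose P value falls below the threshold."""
    threshold = bonferroni_threshold(n_tests)
    return sorted(name for name, p in p_values.items() if p < threshold)


# Hypothetical results from a scan of 1,000,000 SNPs (only three shown).
p_values = {"snp_a": 3e-9, "snp_b": 2e-6, "snp_c": 0.4}
print(bonferroni_threshold(1_000_000))
print(significant_snps(p_values, n_tests=1_000_000))
```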

70
Q

Briefly describe how GWAS was used in identifying novel risk loci for type 2 diabetes (process, results).

A

First, for stage 1, they genotyped ~400K SNPs in 1400 individuals. Then they performed an association test for each SNP, yielding a P-value.

Then, they chose 59 statistically significant SNPs for stage 2. There, they genotyped ~2500 individuals that have T2D and those that don’t.

They found 8 SNPs that represent 5 loci, demonstrating that 5 common variants contribute to the risk of T2D. One gene was already known and these genes were all involved in diabetic processes.

Lastly, they calculated the PAR and found that the 5 loci control 70% of the chance of developing T2D!

71
Q

What is a Population Attributable Risk (PAR)?

A

Proportion of incidence of a disease in the population that is due to exposure. It represents the incidence of a disease in the population that would be eliminated if exposure were eliminated.

For example, if you do not have a certain SNP, your risk of developing a disease is increased or decreased by a certain percentage.

*can’t account for 100% of the PAR bc there are some RARE alleles to be identified - the larger GWAS studies, the better we are at detecting these RARE alleles.
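
The card does not give a formula for PAR. One standard way to compute it in epidemiology is Levin's formula, sketched here in Python as an assumption rather than the method used in the T2D study; "exposure" stands for carrying the risk allele, and the input numbers are invented.

```python
def population_attributable_risk(exposure_prevalence: float, relative_risk: float) -> float:
    """Levin's formula: PAR = p(RR - 1) / (1 + p(RR - 1)), returned as a fraction."""
    if not 0.0 <= exposure_prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical risk allele carried by 50% of the population that doubles disease risk.
print(population_attributable_risk(0.5, 2.0))
```

In this made-up case about a third of disease incidence would be attributable to the allele. A common allele with a modest effect can therefore give a larger PAR than a rare allele with a strong effect.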

72
Q

A few alleles contribute most of the risk of developing a disease. These risk alleles happen mostly to be ______ alleles.

A

Major

*most common in a population!

73
Q

Briefly describe precision medicine.

A

It is an emerging approach for disease treatment and prevention that considers individual variability in genes, environment and lifestyle.

It is not currently in use for most diseases.

74
Q

The very first application of precision medicine is pharmacogenomics. What does this mean?

A

Identification of genetic variants that influence drug effects (pharmacokinetics and pharmacodynamics).

By using the information generated by studying genomic variability to personalize medicine, we can maximize efficacy and minimize adverse reactions.

75
Q

1) What is the role of the clotting cascade in blood clotting?

2) How does Warfarin function as an anticoagulant?

3) Why is Warfarin considered a challenging drug to administer?

A

1) Cascade transitions a soluble protein into an insoluble protein, which is essential for blood clots. Both intrinsic and extrinsic factors initiate this cascade.

2) Warfarin inhibits vitamin-K recycling. When there is an inhibition of vitamin K recycling (oxidized to reduced), there is a lower amount of reduced vitamin K, resulting in decreased enzyme activity that produces active forms of clotting factors.

3) Warfarin has a narrow therapeutic margin; an excessive dosage can lead to reduced clotting, risking severe bleeding, while an insufficient dosage can cause increased clotting, potentially resulting in heart attacks. This narrow margin complicates its administration.

76
Q

What is the enzyme responsible for metabolizing warfarin?

A

CYP2C9

It is part of the cytochrome P450 (CYP450) family, whose members contain heme as a cofactor and are membrane-associated proteins.

77
Q

Out of the ____ alleles of CYP2C9 with over ____ variations in DNA sequence, there are 3 main alleles.

Briefly describe each and how they are formed.

A

37; 300

1) CYP2C9*1 = wild type -> normal activity

2) CYP2C9*2 = R144C variant -> reduced activity up to 20-30%

3) CYP2C9*3 = I359L variant -> reduced activity up to 80%.

Coding sequence SNPs alter the amino acid sequence of the proteins, resulting in altered (reduced) enzymatic activity.

*people can be homo or heterozygous for the 3 alleles - 1/1 is WT while 3/3 is extremely slow.

78
Q

CYP2C9 variants 2 and 3 are a problem because?

Why is this important for doctors?

A

They have reduced activity meaning they metabolize the drug slowly, leaving active drug in the system for longer than intended.

Doctors have to consider genetics when it comes to dosing warfarin! They can give the WT dose to an individual who is homozygous for allele 1, while giving the lowest doses to individuals who have 2/2, 2/3, or 3/3.

79
Q

1) What is VKORC1?

2) What are the three genotypes; what are their results?

3) For which genotype do you have to give a higher and lower warfarin dose?

A

1) The VKORC1 gene encodes vitamin K epoxide reductase, which reduces vitamin K. It is the enzyme that warfarin inhibits.

2) There are two alleles for VKORC1: G and A. G is the common allele that results in high enzyme production. A is the less common allele that results in low enzyme production. GG = high enzyme production, GA = medium, and AA = low.

3) For GG, you would give a higher dose while for AA, you would give a lower dose (need less warfarin when you have less enzyme to inhibit).

80
Q

Both CYP2C9 and VKORC1 have to be considered for prescribing warfarin.

While you give the _____ dose to someone with the 1/1 genotype of CYP2C9 and the GG genotype of VKORC1, you have to give the ______ dose to someone with the 3/3 genotype of CYP2C9 and the AA genotype of VKORC1.

A

Highest; lowest

81
Q

Give some examples of precision medicine.

A

1) Ivacaftor treats cystic fibrosis caused by G551D mutation

2) Imatinib treats leukemia caused by Philadelphia chromosome

3) Stem cell and replacement tissue

4) Molecular profiling of microbes

82
Q

What are the three treatments and cures for genetic diseases?

A

1) Protein therapeutics (replacement/augmentation; antibodies)

2) Viral & non-viral delivery strategies to treat LOF genetic diseases

3) RNA modification therapies (RNAi, Anti-sense oligos)

83
Q

What are the limitations of the treatments and cures for genetic diseases?

A

1) Delivery barriers !!!!!! (protein has to function outside of a cell while gene has to go inside)

2) Incomplete suppression (not to 100%)

3) Off-target effects

4) Safety

84
Q

What are the three tools used to alter the genomic DNA sequence?

A

1) Zinc-finger nucleases

2) TALEN

3) CRISPR-CAS

*the limitations of treatments and cures could be overcome this way

85
Q

A ZFN is a fusion protein composed of a _______ binding domain of two zinc-finger proteins (left and right) and a ______ domain of the ____ restriction enzyme.

ZFNs are also known as ________ ______.

Zinc fingers can be engineered to specifically recognize a ___ bp sequence that flanks the cleavage site.

A

DNA; nuclease; FokI

Molecular scissors (they home in on a specific sequence and introduce a dsDNA break)

3bp

*they contain 4-6 zinc-finger domains to be more specific

*wherever FokI dimerizes, it induces a dsDNA cut (the left and the right ZFs bind pretty close to each other)

86
Q

1) What is TALEN composed of?

2) What are hypervariable residues? Which location are they found in?

3) How does it bind and cleave DNA?

A

1) TALEN is composed of a non-specific FokI nuclease domain fused to a CUSTOMIZABLE DNA-binding domain (TALE repeat domains).

2) There are several repeats in tandem and each repeat contains the same ~35 amino acids except two that differ. These two are known as the hypervariable residues, found at positions 12 and 13 of each repeat.

3) TALEN binds and cleaves as a dimer. The left TALEN with one FokI nuclease domain pairs with the right TALEN carrying the other FokI nuclease domain. There is 13-18bp spacing between left and right.
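
The "one repeat, one base" idea can be sketched as a lookup on the two hypervariable residues. The cards do not list which residue pairs recognize which bases; the table below is the commonly cited code (NI for A, HD for C, NG for T, NN for G) and is an addition here. The function names are hypothetical.

```python
# Commonly cited hypervariable-residue code (an addition, not listed in the cards).
# Each TALE repeat reads one base through its residues at positions 12 and 13.
RESIDUES_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}
BASE_TO_RESIDUES = {base: pair for pair, base in RESIDUES_TO_BASE.items()}


def repeats_for_target(target):
    """Return the residue pair needed in each repeat to read the target, 5'->3'."""
    return [BASE_TO_RESIDUES[base] for base in target.upper()]


def target_for_repeats(residue_pairs):
    """Return the DNA sequence read by an ordered list of repeats."""
    return "".join(RESIDUES_TO_BASE[pair] for pair in residue_pairs)


print(repeats_for_target("GATC"))
```

Designing a new TALE array then amounts to ordering repeats to spell out the target, which is why the DNA-binding domain is described as customizable.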

87
Q

Compare and contrast ZFNs with TALEN.

A

Similarities:
- both are proteins
- the left and the right halves of both must bind so that FokI dimerizes and creates a cut
- both contain a specific domain that will recognize a given DNA sequence

Differences:
- TALEN has TALE repeat domains with each domain recognizing one base
- ZFNs have zinc fingers that recognize three bases per finger
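
The difference in recognition unit size can be put as simple arithmetic. This is a minimal sketch that assumes each zinc finger reads exactly 3 bp and each TALE repeat reads exactly 1 bp, as stated in the cards; the function names are hypothetical.

```python
import math


def zinc_fingers_needed(site_length_bp):
    """Fingers needed for one half-site, at 3 bp per finger (rounded up)."""
    return math.ceil(site_length_bp / 3)


def tale_repeats_needed(site_length_bp):
    """TALE repeats needed for one half-site, at 1 bp per repeat."""
    return site_length_bp


def zfn_site_length(fingers):
    """Length in bp recognized by an array of zinc fingers."""
    return fingers * 3


for fingers in (4, 5, 6):
    print(fingers, "fingers read", zfn_site_length(fingers), "bp")
```

With 4-6 fingers per side, a ZFN half-site spans 12-18 bp, while a TALE array needs one repeat for every base of the same site.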

88
Q

ZFNs and TALENS introduce dsDNA breaks that must be repaired.

What are the two ways to repair ds DNA breaks? Briefly describe each.

A

non-homologous end-joining (NHEJ): processing of ends by a large complex of proteins that causes loss of genetic material. FAST but LOW FIDELITY (indels -> frameshifts). It occurs in ALL stages of the cell cycle.

homology-directed repair (HDR): use of a homologous sequence (the sister chromatid or the homologous chromosome) as a template to copy the missing info. SLOW but HIGH FIDELITY. It is RESTRICTED TO CERTAIN STAGES (S or G2 phase where there is a copy of DNA - dividing cells).
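
Why NHEJ indels tend to disrupt a gene can be shown with codon arithmetic: any net length change that is not a multiple of 3 shifts the reading frame. The sketch below is a simplified illustration with hypothetical function names; it ignores where in the gene the break falls.

```python
def indel_effect(inserted=0, deleted=0):
    """Classify an indel by its net change in length."""
    net = inserted - deleted
    if net == 0:
        return "no net length change"
    return "in-frame" if net % 3 == 0 else "frameshift"


def codons(seq):
    """Split a coding sequence into complete codons, dropping any leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


original = "ATGGCCAAA"
one_base_deleted = original[:3] + original[4:]  # lose the G at index 3

print(codons(original))          # codons before the deletion
print(codons(one_base_deleted))  # every codon after the deletion is shifted
print(indel_effect(deleted=1), indel_effect(deleted=3))
```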

89
Q

How can we use NHEJ and HDR (as a result of ZFNs and TALENs) to alter the genomic DNA sequences?

What are the disadvantages of this?

A

We can use NHEJ to disrupt a harmful gene.

We can supply a template for HDR so that the template sequence is copied into the gene being repaired, for example to insert new sequence or to repair a defective allele!

Disadvantages:

For HDR, we cannot control which template the cell will use (the supplied one or the homologous chromosome), and we cannot control which repair pathway the cell will use!

Programming ZFNs and TALENs is difficult. The molecular biology is complex!

90
Q

Which one of the statements is true?

1) CRISPR-CAS was first discovered in viruses as part of the defence mechanism.

2) The 3 distinct CRISPR systems seen in organisms are I, II, and III, where II has been modified as a lab tool.

3) CRISPR-CAS stands for Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR Associated System.

A

2 and 3!

1) CRISPR-CAS was first discovered in BACTERIA, not viruses, as part of their defence mechanism against viruses. The Cas9 system used as a lab tool comes from Streptococcus pyogenes.

91
Q

What are the two components CRISPR is composed of?

How is it different than ZFNs and TALENs?

A

1) sgRNA that is complementary to the target sequence, guiding the nuclease to a specific site.

2) Cas9, the nuclease that introduces the dsDNA break.

It is different because target recognition is RNA-based rather than protein-based. There is also no need for dimerization.

92
Q

Briefly describe the structure of the single guide RNA (sgRNA).

A

For recognition: 20 NTs at the 5’ end that are complementary to the genomic target. THIS IS THE ONLY PART THAT CAN BE CHANGED!

The rest contains both sequence and structural features (secondary structures) necessary for DNA cleavage, such as stabilizing the sgRNA and supporting complex formation with Cas9.

93
Q

1) What is the PAM sequence?

2) Where does gRNA bind?

3) Where does the cleavage occur? How is it fixed?

A

1) PROTOSPACER ADJACENT MOTIF sequence is 5’-NGG-3’. Guide RNA is designed to target the 20 NTs immediately UPSTREAM (5’) of the PAM.

2) PAM is found on only one strand and the gRNA binds to the strand that does not have PAM.

3) Cleavage occurs after the 3rd NT upstream (5’-) of PAM at both strands through the two endonuclease domains of cas9. Break is repaired by NHEJ and HDR.
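
The targeting rules in this card can be sketched as a scan for candidate sites. This is a minimal illustration that searches only the strand given (the reverse strand is not scanned) and assumes the NGG PAM from the card; the function name and the returned field names are hypothetical.

```python
import re


def find_cas9_sites(seq):
    """Find 20-nt protospacers that sit immediately 5' of an NGG PAM on this strand."""
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping candidate sites are all reported.
    for match in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", seq):
        start = match.start()
        protospacer, pam = match.group(1), match.group(2)
        sites.append({
            "protospacer_start": start,
            "protospacer": protospacer,
            "pam": pam,
            # The guide has the protospacer sequence in RNA form, so it pairs
            # with the opposite strand (the one without the PAM).
            "guide_rna": protospacer.replace("T", "U"),
            # Index of the first base after the break: 3 NT upstream of the PAM.
            "cut_index": start + 17,
        })
    return sites


demo = "TT" + "ACGT" * 5 + "TGG" + "AA"
for site in find_cas9_sites(demo):
    print(site)
```

Changing the target only means changing the 20-nt `guide_rna`, which is the ease-of-use point made in the later cards.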

94
Q

What are three methods of delivering the CRISPR/Cas9 system into cells?

A

1) Plasmids

2) Ribonucleoprotein (RNP) complex = Cas9 protein + sgRNA

3) Viral particles (adenoviruses) carrying sgRNA/Cas9 expressing cassettes

95
Q

What are the advantages and disadvantages of the CRISPR/Cas9 system?

A

Advantages:
- easier to use compared to ZFNs and TALENs

Disadvantages:
- has to enter the cell, so it faces the same delivery barriers as gene therapy. How can you target the specific cells where you want the change to happen?

96
Q

What is multiplex gene editing?

A

Introduction of multiple gRNAs leading to simultaneous mutations.

1) Multiple gene knockouts - useful when studying large protein families

2) Exon exchange - exchanging an exon!

*CRISPR useful for studying deletions and inversions

97
Q

What are the two types of Cas9 variants? Briefly describe what they do.

A

1) Nickases: cuts only ONE STRAND instead of both strands.

PAIRED NICKING (use of two nickases together) creates staggered dsDNA breaks! This INCREASES SPECIFICITY (two guide RNAs = 40 NTs) and REDUCES off-target effects (if one binds to an off-target site, it will only nick the DNA which gets repaired with no problem!).

2) Dead (dCas9) - recruited by gRNA but unable to cleave the target DNA site. dCas9 can be fused to a heterologous effector domain (TFs, fluorescent proteins, etc). We can then target these effectors to a specific region in the genome.

98
Q

(T/F) The CRISPR revolution: the biggest game changer since PCR!

A

True!

  • 20NTs only to be changed: ease of use
  • works in all cell types at all cell stages
99
Q

What is base editing?

A

Base editing is the substitution of a single base in the DNA for another.

It is usually done with components of the CRISPR-Cas9 system.

100
Q

What is a cytosine base pair editor?

What is it composed of?

A

Cytosine base pair editor converts a C-G bp into a T-A bp.

It contains:

1) ssDNA-specific cytidine DEAMINASE - this deaminates the cytosine into a Uracil
2) gRNA - provides target specificity
3) UGI - inhibits uracil-N-glycosylase that would normally repair the deamination
4) Nickase - nicks NON-EDITED strand

101
Q

In base-editing, the gRNA binds to the sequence _____ of the PAM, which is the __________ strand.

By binding to this sequence, the guide RNA creates an ______ that results in an ssDNA which is targeted for the _____.

The nick occurs on the ________ strand.

A

opposite; non-edited

R-loop; edit (deamination)

non-edited

*the edit occurs on the strand with the PAM sequence.

102
Q

Briefly describe the steps of base editing.

What would happen if the nickase nicked the edited strand?

A
  1. Deamination of Cytosine to a Uracil by the DEAMINASE.
  2. Inhibition of the normal uracil repair pathway by UGI.
  3. Nickase nicks the non-edited strand, targeting it for cellular mismatch repair.
  4. Cellular mismatch repair converts the G into an A, while DNA replication or repair converts the U into a T, leading to permanent base editing.

If Nickase nicked the edited strand, the uracil would be converted back to cytosine (original bp).
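
The steps above can be traced one base pair at a time. This is a simplified sketch of the logic on this card only, written as (edited-strand base, other-strand base); the function name and the `nicked_strand` parameter are hypothetical.

```python
def cytosine_base_edit(nicked_strand="non-edited"):
    """Trace one C-G base pair through cytosine base editing."""
    edited, other = "C", "G"
    history = [(edited, other)]

    # Step 1: the deaminase converts C to U (UGI keeps the U from being removed).
    edited = "U"
    history.append((edited, other))

    if nicked_strand == "non-edited":
        # Steps 3-4: the nicked strand is repaired using U as template; U pairs like T.
        other = "A"
        history.append((edited, other))
        # Replication or repair then replaces U with T, making the edit permanent.
        edited = "T"
        history.append((edited, other))
    elif nicked_strand == "edited":
        # Repair would use the untouched G as template and restore the C.
        edited = "C"
        history.append((edited, other))
    else:
        raise ValueError("nicked_strand must be 'non-edited' or 'edited'")
    return history


print(cytosine_base_edit())
print(cytosine_base_edit("edited"))
```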

103
Q

What is the activity window?

A

Activity window is the range of bases in the protospacer sequence which is favourable for the EDITING activity during base editing.

It is approximately 5-6 NTs long.

The base that is to be changed has to be within the activity window.

104
Q

(T/F) Because the Cas9/nickase always cleaves 3 NTs upstream of the PAM, the editing also happens here.

A

False!

The edit does NOT have to happen where the nick occurs.

105
Q

What is bystander editing?

A

If there are multiple cytosines within the activity window, all get deaminated to Uracil.

Multiple changes occur. While this is sometimes desired, it can lead to unintended modifications.
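
The activity window and bystander editing can be sketched together. The window of protospacer positions 4-8 used below is an assumption for illustration (the cards only say it is about 5-6 NTs long), positions are counted from the 5’ end of the protospacer, and the function name is hypothetical. The same function models an adenosine base editor by swapping the bases.

```python
def base_edit(protospacer, target_pos, window=(4, 8), from_base="C", to_base="T"):
    """Edit every from_base inside the window; return the result and bystander positions.

    Positions are 1-based. The default window (4, 8) is an assumed example.
    """
    start, end = window
    if not start <= target_pos <= end:
        raise ValueError("target base lies outside the activity window")
    if protospacer[target_pos - 1] != from_base:
        raise ValueError("target position does not hold the editable base")

    bases = list(protospacer)
    edited_positions = []
    for pos in range(start, end + 1):
        if bases[pos - 1] == from_base:
            bases[pos - 1] = to_base  # final outcome after repair/replication
            edited_positions.append(pos)

    bystanders = [pos for pos in edited_positions if pos != target_pos]
    return "".join(bases), bystanders


site = "GATCCAGCTTAGGCATCGAA"
print(base_edit(site, target_pos=5))                               # cytosine editor
print(base_edit(site, target_pos=6, from_base="A", to_base="G"))   # adenosine editor
```

In the cytosine example the Cs at positions 4 and 8 are edited along with the intended C at position 5, while Cs outside the window are untouched.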

106
Q

What is an Adenosine Base Editor?

What is it composed of?

A

When Adenosine is deaminated it becomes Inosine, which is read as G by polymerases.

Adenosine Base Editor converts A-T to G-C bp.

It contains:
- Guide RNA
- Nickase
- Adenosine deaminase

*similar to the cytosine base editor, just have to choose the right enzyme, PAM sequence and activity window.

107
Q

Why do we need an inhibitor in the cytosine base editor but not in the adenosine base editor?

A

Enzymatic adenine deamination in DNA does not normally occur in eukaryotes. The base editor uses an engineered E. coli enzyme (an adenosine deaminase). Since this reaction does not occur normally, there is no endogenous repair pathway that needs to be inhibited.

Cytosine deamination does occur in eukaryotes. There is an endogenous pathway for repair that would convert the Uracil back to a Cytosine. This has to be inhibited to proceed with base editing.