PART III FROM GENOTYPE TO PHENOTYPE Flashcards
Which areas of the genome are genes concentrated in?
- G/C-rich areas
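A minimal sketch of how G/C-rich stretches could be picked out of a sequence, by computing the G+C fraction in fixed windows. The function names and the toy sequence are assumptions made up for illustration.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq: str, window: int) -> list[float]:
    """GC fraction of consecutive, non-overlapping windows."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


# Hypothetical toy sequence: an AT-rich stretch followed by a GC-rich stretch.
toy = "ATATTTAAAT" + "GCGCCGGGCA"
window_gc = gc_windows(toy, 10)
print(window_gc)  # [0.0, 0.9]
```

Windows with a high value would be the candidate gene-dense regions in this simplified view.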
What percentage of the genome encodes protein or non-coding RNA?
- <2%
What percentage of the genome is regulatory/introns?
- 25%
What rough % of the genome is junk DNA?
- > 50%
What is the genes and gene product mismatch problem?
- That there are about 20 000 genes but more than 500 000 proteins
What is a putative gene?
- A sequence believed to be a gene because it contains an ORF, but whose protein and function are not yet known
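A rough sketch of how an ORF could be spotted in a DNA sequence: scan each forward reading frame for an ATG followed in-frame by a stop codon. This is a toy under stated assumptions (forward strand only, no introns, hypothetical function name and sequence), not a real gene finder.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) positions of ORFs on the forward strand.

    An ORF here runs from an ATG to the next in-frame stop codon;
    `end` points just past the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


toy = "CCATGAAATTTGGGTAACC"  # hypothetical sequence
orfs = find_orfs(toy)
print(orfs)                               # [(2, 17)]
print([toy[s:e] for s, e in orfs])        # ['ATGAAATTTGGGTAA']
```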
What is the transcriptome?
- The COMPLETE collection of RNAs produced from a genome, BUT not every RNA is present in every cell and eukaryotic RNAs are spliced
What does alternative splicing give rise to?
- Different protein isoforms from the SAME gene (this partially explains the gene product mismatch problem)
How can an RNA sequence be deduced?
- By making and analysing cDNA
What are cDNAs and ESTs (Expressed Sequence Tags) used to analyse?
- Used to analyse gene structure, and presence + levels of specific RNA in cells
What is transcriptomics?
- The study of THOUSANDS of RNAs simultaneously
Is the whole transcriptome produced in cells?
- NO
- Because only a subset of genes is active in any cell
What are the 3 major classes of RNAs that make up the eukaryotic transcriptome?
- Ribosomal RNAs (rRNA) transcribed by RNA pol I
- Protein-encoding RNAs (mRNA) and microRNAs (miRNA) transcribed by RNA pol II
- Small RNAs (including tRNA) transcribed by RNA pol III
Are genes organised into operons in eukaryotes?
- NO
What is the splicing process?
- Where eukaryotic mRNA is produced by excision of non-coding segments (introns) from precursor (pre-mRNA)
Is splicing SEQUENCE specific and if so what can be found out from this?
- YES!
- Intron/exon boundaries can be predicted using bioinformatics genomic sequence analyses
- But there is NO single specific splice sequence that is cut out… more an overall general pattern
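A small sketch of why a general pattern is not enough on its own. It uses the GT-AG rule (most introns start with GT and end with AG in the DNA), which is background knowledge brought in here and is not stated on the cards. The sequence and function name are hypothetical.

```python
def candidate_introns(seq: str, min_len: int = 6) -> list[tuple[int, int]]:
    """List every (start, end) span that begins with GT and ends with AG."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptor_ends = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return [(d, a) for d in donors for a in acceptor_ends if a - d >= min_len]


# Toy gene: exon "ATGGCC", intron "GTAAGTTTTCAG", exon "GAATAA".
toy = "ATGGCC" + "GTAAGTTTTCAG" + "GAATAA"
candidates = candidate_introns(toy)
print(candidates)  # [(6, 18), (10, 18)]
```

The true intron (6, 18) is found, but so is a false candidate, which is why real predictors score the wider sequence context rather than one fixed motif.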
What is the key to gene identification in eukaryotic genome analyses?
- Accurately predicting splice junctions
Via what process can related but DIFFERENT polypeptides be generated from the same primary transcript?
- Alternative splicing
What allows for different isoforms of a transcript specifically?
- Different EXONS being incorporated OR omitted from the final mRNA
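A minimal sketch of how including or omitting exons multiplies the number of possible mRNAs from one gene. It assumes a simplified model in which certain "cassette" exons can each be independently kept or skipped; the exon names are hypothetical.

```python
from itertools import product


def isoforms(exons: list[str], cassette: set[str]) -> list[list[str]]:
    """Enumerate mRNAs in which each cassette exon is either kept or skipped."""
    choices = [(e, None) if e in cassette else (e,) for e in exons]
    return [[e for e in combo if e is not None] for combo in product(*choices)]


gene = ["E1", "E2", "E3", "E4"]
all_isoforms = isoforms(gene, cassette={"E2", "E3"})
for iso in all_isoforms:
    print("-".join(iso))
```

With two optional exons the single gene gives four possible isoforms; each extra optional exon doubles that number in this model.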
What process explains why relatively few genes in genome can give rise to vastly greater number of proteins?
- Alternative splicing
Can splicing errors cause disease via mutations?
- YES!
- Mutations can occur in splice donor or acceptor sequences OR generate NEW (cryptic) splice sequences
- e.g. Exons being omitted (skipped) deletes a section of the protein –> severely affects the structure
How can the use of false (cryptic) acceptor or donor sites severely affect the protein structure?
- By truncating (shortening) or lengthening exons
What are the old definition and the 2 new definitions of the gene, respectively?
OLD: One gene encodes one protein
NEW 1: Single transcription unit (gene) encodes one set of protein isoforms
NEW 2 (newest): A single polypeptide is the product of a single gene
What 3 things do we need to know from each gene in terms of RNA?
- Where and when it is transcribed into RNA
- How it is spliced, and how many spliceoforms there are
- Whether particular spliceoforms are restricted to particular cells or growth stage
Can these 3 things (where and when a gene is transcribed into RNA; how it is spliced and how many spliceoforms there are; whether particular spliceoforms are restricted to particular cells or growth stages) be directly deduced from genomic DNA sequence with CONFIDENCE?
- NO
- Rely on analysis of cDNA and ESTs derived from RNA
What is a method to sequence RNA that is stable?
- Make a DNA copy (cDNA), as DNA is stable, easy to amplify, and easy to sequence
Why is RNA unstable?
- Because it is HIGHLY susceptible to nucleases
What is used to produce DNA from an RNA template (like in some viruses)?
- Reverse transcriptase
What 4 things does creating a complementary DNA (cDNA) rely on?
- RNA can base pair with DNA
- mRNA has a polyadenylated tail (so an oligo(dT) DNA primer, TTTTTT, can bind to it)
- A retroviral enzyme –> Reverse transcriptase can produce DNA from RNA
- No pre-existing gene sequence info is required to generate a cDNA
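The points above can be sketched as a tiny model of first-strand cDNA synthesis: an oligo(dT) primer needs a poly(A) tail to bind, and the cDNA is the DNA complement of the mRNA. The function name and the toy mRNA are assumptions for illustration.

```python
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}  # RNA base -> DNA base


def first_strand_cdna(mrna: str, primer_len: int = 6) -> str:
    """Return the first-strand cDNA (written 5'->3') copied from an mRNA (5'->3').

    Assumes the mRNA ends in a poly(A) tail at least `primer_len` long, so an
    oligo(dT) primer can bind there and be extended by reverse transcriptase.
    """
    mrna = mrna.upper()
    if not mrna.endswith("A" * primer_len):
        raise ValueError("no poly(A) tail for the oligo(dT) primer to bind")
    # Complement each base and reverse, so the result reads 5'->3'.
    return "".join(COMPLEMENT[base] for base in reversed(mrna))


mrna = "AUGGCCUGA" + "A" * 8  # hypothetical transcript with a short poly(A) tail
cdna = first_strand_cdna(mrna)
print(cdna)  # TTTTTTTTTCAGGCCAT
```

Note that nothing about the gene's sequence had to be known in advance: the primer only relies on the poly(A) tail.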
What does producing a cDNA using PCR require?
- Pre-existing sequence information to design primers
What are ESTs? (Expressed Sequence Tag)
- cDNAs made from mRNAs originating from a specific cell or tissue (DNA copies of mRNA or mRNA fragments)
- represent a SNAPSHOT of the mRNA at that time and place
- If there is a transcriptionally ACTIVE gene it will be evident in Expressed Sequence Tag databases
What is the collection of colonies of ESTs known as?
- The library –> EST from the colony is then sequenced and data lodged in database
What are the 3 uses of EST and EST databases?
- Gene verification
- Gene structure
- Gene expression
How can EST and EST databases apply to Gene verification?
- If a DNA sequence from the genome matches EXACTLY to a specific EST it can be concluded that the genomic DNA is TRANSCRIBED and it represents a gene (or gene fragment)
How can EST and EST databases apply to Gene Structure?
- In identifying intron and exon boundaries
- ESTs will only match exons–> so segments that do not match with an EST derived from that gene are introns
Do ESTs only match with introns or exons?
- They only match with EXONS
How can EST and EST databases apply to Gene Expression? (5 things…Identify:)
- Identify specific cells or tissue in which the gene is active
- Identify LEVEL of gene activity
- Identify alterations in gene activity in disease
- Identify transcription start and end points
- Identify alternative splicing patterns
What happens if you BLAST an EST sequence BACK onto a genomic sequence and why?
- It will ONLY MATCH EXONS because ESTs are made from POST spliced mRNA
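A toy sketch of the idea: place an EST on the genomic sequence as blocks of exact matches, then read the introns off as the unmatched gaps between blocks. This greedy exact-match approach is an assumption for illustration only; it is not how BLAST works internally, and real aligners tolerate mismatches and use splice-site signals. All names and sequences are hypothetical.

```python
def map_est_to_genome(est: str, genome: str, min_block: int = 4) -> list[tuple[int, int]]:
    """Greedily place an EST on a genome as a series of exact-match blocks.

    Returns genomic (start, end) intervals of the matched blocks (putative exons).
    """
    blocks = []
    est_pos, gen_pos = 0, 0
    while est_pos < len(est):
        seed = est[est_pos:est_pos + min_block]
        start = genome.find(seed, gen_pos)
        if start == -1 or len(seed) < min_block:
            raise ValueError("EST could not be placed on the genome")
        # Extend the exact match as far as it goes.
        length = 0
        while (est_pos + length < len(est) and start + length < len(genome)
               and est[est_pos + length] == genome[start + length]):
            length += 1
        blocks.append((start, start + length))
        est_pos += length
        gen_pos = start + length
    return blocks


def introns_between(blocks: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Gaps between consecutive matched blocks are the putative introns."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(blocks, blocks[1:])]


exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "CAATAA"
genome = exon1 + intron + exon2
est = exon1 + exon2  # made from post-spliced mRNA, so the intron is absent

blocks = map_est_to_genome(est, genome)
introns = introns_between(blocks)
print(blocks)   # [(0, 6), (18, 24)]
print(introns)  # [(6, 18)]
```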
What is the number of clones containing the same EST in one library PROPORTIONAL to?
- Proportional to the transcriptional activity of the gene
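A minimal sketch of that proportionality: counting how many clones in a library carry ESTs from each gene gives a rough relative expression level. The library contents and gene names are made up.

```python
from collections import Counter


def relative_expression(clone_genes: list[str]) -> dict[str, float]:
    """Fraction of clones in the library that come from each gene."""
    counts = Counter(clone_genes)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


# Hypothetical library of 10 sequenced clones, labelled by the gene they match.
library = ["actin"] * 6 + ["globin"] * 3 + ["rare_tf"]
expression = relative_expression(library)
print(expression)  # {'actin': 0.6, 'globin': 0.3, 'rare_tf': 0.1}
```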
Do ESTs have a 5’ end matching the transcriptional start point of their gene?
- NO
Do ESTs represent genes active in EVERY CELL?
- NO
What does the program UniGene do?
- Matches ESTs from various sources and organises them into transcript families
Is each Unigene entry a collection of ESTs derived from MULTIPLE GENES or a SINGLE GENE?
- SINGLE GENE!
What are microarrays used for (in general)?
- To assess where, when, and how many genes are expressed in specific cells or tissues
What does ‘deep sequencing’ rely on?
- ESTs
What does having no hits in one section of an encode read mean?
- Alternative splicing has occurred (e.g. Exon 4 removed)
What is the simplest and BEST way to determine if a gene is real?
- Identification of a MATCHING RNA transcript (determine transcription start and end points AND to map intron/exon boundaries)
What does transcriptomics via deep sequencing enable?
- The simultaneous identification and study of THOUSANDS of transcripts produced by a specific cell or tissue
What are the two methods that allow transcripts from MANY genes to be assessed simultaneously?
- Microarray analysis
- RNA deep sequencing
What are the two methods that allow transcripts from a SINGLE gene to be assessed?
- In situ hybridisation
- Reverse transcriptase (RT) PCR and real-time quantitative (q)PCR
What occurs in the single gene transcript method of in situ hybridisation?
- Labelled DNA or RNA COMPLEMENTARY to target mRNA is soaked into the cell or tissue
- Probe will SPECIFICALLY BIND to the target mRNA and identify where it is being produced
What must be known for primer design in Reverse Transcriptase (RT) PCR (single gene analysis)?
- The sequence of the target RNA must be known
What does Deep Sequencing involve?
- Preparing a cDNA library covering THOUSANDS OF GENES SIMULTANEOUSLY: purify the RNA, bind the polyA fraction (mRNA), fragment the RNA, convert to cDNA by random priming (random hexamers and oligo(dT) primers), attach adaptors and sequence
- Then analysing millions of SHORT SEQUENCE READS (sequenced from cDNA fragments)
- Match to genome reference DNA sequence
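A rough sketch of the last two steps: short reads are matched to reference sequences and counted per gene. Exact substring matching stands in for a real read aligner here, and the gene names, reference sequences and reads are hypothetical.

```python
def count_reads_per_gene(reads: list[str], genes: dict[str, str]) -> tuple[dict[str, int], int]:
    """Assign each read to the single gene that contains it exactly.

    Returns (counts per gene, number of reads with no hit or an ambiguous hit).
    """
    counts = {name: 0 for name in genes}
    unassigned = 0
    for read in reads:
        hits = [name for name, seq in genes.items() if read in seq]
        if len(hits) == 1:
            counts[hits[0]] += 1
        else:
            unassigned += 1
    return counts, unassigned


reference = {"geneA": "ATGGCCCAATAA", "geneB": "ATGTTTGGGTGA"}
reads = ["GGCCCA", "CCCAAT", "TTTGGG", "ACGTAC"]
counts, unassigned = count_reads_per_gene(reads, reference)
print(counts, unassigned)  # {'geneA': 2, 'geneB': 1} 1
```

More reads landing on a gene suggests more of its transcript was present in the sample.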
Can deep sequencing be carried out in parallel?
- YES!
What are 3 ways a protein can be detected in cells?
- Antibodies or other binding reagent (cell/tissue structure can be maintained)
- Enzyme activity (usually in cell or fluid extracts)
- Mass spectrometry/proteomics (usually in cell or fluid extracts)
What is the difference between RNA and Protein analysis?
- RNA analysis tells you where something is in general (e.g. it is in the neural system) BUT protein analysis tells you what SPECIFIC tissues it is in
What can you use an antibody for in protein detection?
- To PROBE for the location of the product IN or ON cells
- Pattern can suggest the structure or organelle that protein is associated with
What is the process of using a reporter molecule and making a transgenic cell or animal to find out where and when the gene is expressed?
- Identify and CLONE the gene's PROMOTER
- Join the PROMOTER to reporter protein coding sequence (e.g. GFP) to make a TRANSGENE
- Introduce transgene into cell or animal and examine by microscopy
What do homologous genes share?
- A COMMON ancestor
What is an orthologue?
- A gene in a SEPARATE species that has the same biological properties and function (doing the same job)
Where can orthologues be found?
- Within conserved sequence segments (syntenic regions) when two genomes are compared
What is a paralogue?
- A related gene in the SAME species for which a function is known
How are paralogues generated?
- By GENE DUPLICATION
What can knowing the function of a gene in one species suggest?
- Can suggest the function of the CORRESPONDING GENE (orthologue) in another species
Why can identification of orthologous genes be complicated?
- They may be on DIFFERENT chromosomes in DIFFERENT species (during evolution of species, chromosome number and size change due to shuffling of large segments of DNA –> each segment contains multiple genes)
- May be a number of similar genes (PARALOGUES) in the genome.
What do inter-species comparisons of chromosomes reveal?
- They reveal segment boundaries and syntenic regions where orthologous genes are likely to be located.
Can syntenic regions between two chromosomes be mapped?
- YES
What is the order of syntenic genes commonly conserved in?
- Commonly conserved in syntenic blocks and paralogues may be found in the SAME region
What is forward genetics?
- Going from PHENOTYPE to GENOTYPE e.g. deafness
What is reverse genetics?
- Going from GENOTYPE to PHENOTYPE e.g. C. elegans deletion of Srp-6 –> Targeted mutations reveal function by altering the phenotype
What are the 4 ways of approaching reverse genetics?
- Loss of function mutations (gene knockouts and knock-ins) –> Inactivate or silence the gene to destroy expression
- Change of function mutation (Replace the normal gene with an altered gene–> carrying a point mutation in cell or organism)
- Gain of function mutation (Express a gene at incorrect time, or incorrect tissue)
- Dominant negative mutation–> Specifically SUPPRESS protein function by making a COMPETING dysfunctional protein in cell
What does RISC stand for and what does it do (also what process is it involved in)?
- RNA-Induced Silencing Complex
- Involved in RNA interference
- Cleaves and inactivates the target mRNA (Expression reduced but NOT abolished)
What are the two components of CRISPR-Cas9?
- Targeting module (RNA)
- Cas protein
In which organisms can CRISPR/Cas editing be used?
- In any organism where IVF technology exists
What accounts for the majority of human sequence variation?
- SNPs –> Single Nucleotide Polymorphisms (90%)
What are 3 reasons for the 3% variation between two people?
- INDELs–> Large scale (kilobase) or small scale (several bp) INsertion or DELetion of nucleotides
- Differing numbers or positions of MOBILE GENETIC ELEMENTS e.g. L1
- Single Nucleotide Polymorphisms (SNPs)
What is a polymorphism?
- DNA variation present in >1% of people
What is a mutation?
- A sequence variant present in <0.1% of people
What is a haplotype?
- unique combo of alleles that makes up an individual
How many alleles do SNPs have?
- 2
What is the average number of SNPs per chromosome?
- 4-5 million
What can a SNP in the non coding region result in?
- Possible gene regulation altering
What can a SNP in the coding region result in that is synonymous (same aa)?
- No effect
What can a SNP in the coding region result in that is NOT synonymous?
- NONSENSE (STOP) –> Prevents protein production
- MISSENSE (AA change) –> Alter protein structure
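A short sketch of how a coding SNP could be sorted into these classes by translating the codon before and after the change. The codon table is the standard genetic code; the function name and the example codons are illustrative choices (CGA encodes Arg, CAA encodes Gln, TGA is a stop).

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering ('*' = stop).
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def classify_coding_snp(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base change within one codon (position is 0, 1 or 2)."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if after == before:
        return "synonymous"
    if after == "*":
        return "nonsense"
    return "missense"


print(classify_coding_snp("CGA", 2, "G"))  # synonymous (Arg -> Arg)
print(classify_coding_snp("CGA", 1, "A"))  # missense   (Arg -> Gln)
print(classify_coding_snp("CGA", 0, "T"))  # nonsense   (Arg -> stop)
```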
What is an example of a missense SNP?
- Factor V needs to be degraded by the protease APC (activated protein C) to STOP clotting
- Autosomal DOMINANT missense SNP in the FACTOR V gene changes Arg506 to Gln
- APC can no longer degrade Factor V
- Deep vein thrombosis occurs
What can SNPs be indicators for?
- Disease risk such as Alzheimer's (ApoE) (ApoE4 higher risk than ApoE2)
What is linkage disequilibrium?
- “Non-random association of alleles at different loci in a given population.”
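One common way to put a number on this is the coefficient D = p(AB) - p(A) x p(B), which is zero when alleles at the two loci associate at random. The formula is standard background rather than something from the cards, and the haplotype samples below are invented.

```python
def linkage_disequilibrium(haplotypes: list[str]) -> float:
    """D = p_AB - p_A * p_B for two biallelic loci.

    Each haplotype is a 2-character string such as "AB", "Ab", "aB" or "ab".
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n
    p_b = sum(h[1] == "B" for h in haplotypes) / n
    p_ab = sum(h == "AB" for h in haplotypes) / n
    return p_ab - p_a * p_b


random_sample = ["AB", "Ab", "aB", "ab"]   # every combination equally common
linked_sample = ["AB"] * 5 + ["ab"] * 5    # A always travels with B
print(linkage_disequilibrium(random_sample))  # 0.0
print(linkage_disequilibrium(linked_sample))  # 0.25
```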
What is each recombined DNA segment known as?
- Haplotype block (each carries unique string of SNPs)
What can determining the SNP haplotype of an individual be useful to test?
- Susceptibility for a specific disease
Do SNPs have the ability to modify proteins and hence drug responses?
- YES
How can treatments in personalised medicine be customized for each individual?
- By correlating medication, dosages and side effects specific to SNP profiles
What is a route to personalised medicine via SNP analysis?
- Haplotyping
What two things does producing susceptibility/risk profiles for a BROAD RANGE of diseases or treatments for a particular individual require?
- A reference map of SNPs
- Developing rapid and cheap screening methods to map at least 10 000 of these SNPs in a patient
What does producing a reference map of SNPs involve?
- Sequencing >100 individual human genomes to have a 95% confidence that all SNPs occurring at 1% or greater are MAPPED
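A back-of-the-envelope sketch of where a figure like this could come from. It assumes a very simple model (each person contributes two independent chromosomes, and a SNP counts as "mapped" if its rarer allele shows up at least once); the cards do not give the actual derivation, so treat this only as a plausibility check.

```python
import math


def prob_allele_seen(freq: float, n_people: int) -> float:
    """Chance an allele of frequency `freq` appears at least once
    among 2 * n_people sampled chromosomes (independent draws assumed)."""
    return 1 - (1 - freq) ** (2 * n_people)


def people_needed(freq: float, confidence: float) -> int:
    """Smallest number of people giving at least `confidence` chance of seeing the allele."""
    chromosomes = math.ceil(math.log(1 - confidence) / math.log(1 - freq))
    return math.ceil(chromosomes / 2)


p_100 = prob_allele_seen(0.01, 100)
n_95 = people_needed(0.01, 0.95)
print(round(p_100, 3))  # 0.866
print(n_95)             # 150
```

Under these assumptions 100 genomes gives roughly an 87% chance per SNP and about 150 are needed for 95%, which is in line with the ">100" on the card.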
What is HapMap short for?
- Haplotype Map project
What is the International Haplotype Map Project (HapMap)?
- The first genome wide glimpse of genetic variation
- Describes common patterns of variation in the human DNA sequence
What percentage of the genome is identical between two people?
- 99.5%
How many SNPs did the HapMap project characterise?
- 600 000 SNPs! (1 SNP per 5 kb of genome)
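The spacing follows from simple arithmetic, assuming a haploid human genome of roughly 3 billion base pairs (an approximate figure supplied here, not taken from the cards).

```python
GENOME_SIZE_BP = 3_000_000_000  # assumed approximate haploid human genome size
HAPMAP_SNPS = 600_000           # number of SNPs characterised

spacing_bp = GENOME_SIZE_BP / HAPMAP_SNPS
print(spacing_bp / 1000, "kb between SNPs on average")  # 5.0 kb between SNPs on average
```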
In the HapMap project, how many individuals were used for SNP identification?
- 270 individuals from 4 ethnic groups
Are SNP microarrays a thing?
- YEAH!
What is the principle of SNP microarrays?
- Gene chip –> has thousands of spots, each with single-stranded (ss) 25-base reference DNA molecules (oligonucleotides)
- Each reference DNA is COMPLEMENTARY to a SNP allele
- Oligonucleotides are printed onto the chip and synthesized DIRECTLY onto it
- Genomic DNA to be tested is FRAGMENTED, AMPLIFIED as single strand, LABELLED, and put on the chip
- Binding (hybridisation) conditions favour perfect matching between probe DNA and chip DNA
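The principle above can be sketched as follows: one 25-base probe per allele, and an allele is called only if a sample fragment is perfectly complementary to its probe. Perfect-match-only binding is a simplification of real hybridisation, and the flanking sequences, alleles and function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def make_probes(flank5: str, alleles: str, flank3: str) -> dict[str, str]:
    """One probe per allele, complementary to the genomic strand around the SNP."""
    return {a: reverse_complement(flank5 + a + flank3) for a in alleles}


def call_genotype(fragments: list[str], probes: dict[str, str]) -> set[str]:
    """Alleles whose probe finds a perfectly complementary stretch in some fragment."""
    called = set()
    for allele, probe in probes.items():
        target = reverse_complement(probe)
        if any(target in fragment for fragment in fragments):
            called.add(allele)
    return called


FLANK5, FLANK3 = "ACGTACGTACGT", "TGCATGCATGCA"  # 12 bases either side of the SNP
probes = make_probes(FLANK5, "AG", FLANK3)

heterozygote = [FLANK5 + "A" + FLANK3, FLANK5 + "G" + FLANK3]
homozygote = [FLANK5 + "G" + FLANK3]
print(call_genotype(heterozygote, probes))
print(call_genotype(homozygote, probes))
```

A heterozygous sample lights up both allele spots, a homozygous sample only one.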
What are two words to describe modern SNP arrays?
- Complex
- Redundant
Roughly how many SNPs can be interrogated on a SNP chip simultaneously?
- > 90 000
What reduces false positives in SNP arrays?
- Each SNP position is represented by up to 40 different BUT overlapping (tiled) DNA sequences
What does the Affymetrix Genome Wide Human SNP Array chip have a median inter-SNP distance of?
- 0.7kb
What can the Affymetrix Chip be used in?
- GWAS (genome wide association studies)
- e.g. 7 diseases: 2000 patients: 3000 controls
Roughly how many SNPs does the Affymetrix chip have?
- 500,568
How can we apply the study of SNPs to Pharmacogenetics?
- By studying the relationship b/w genetic variation (haplotype) and response to medications
What can the study of SNPs with Pharmacogenetics result in?
- Individuals reacting or responding to drugs differently
- Thus any patient may require DIFFERING DOSES compared to others
- May be more or less susceptible to side-effects
What is an example of identifying a SNP with relation to medication dosage?
- VKORC1 (Vitamin K Epoxide Reductase) is usually inhibited by warfarin to control blood clotting disorders
- However, a SNP in the VKORC1 promoter region (e.g. in Chinese patients) is associated with a LOW WARFARIN dose requirement
- Therefore a safe warfarin dose can be predicted by determining a patient's VKORC1 haplotype
How is SNP analysis better than STR analysis for crime scenes and ancestry?
- SNP analysis is CHEAPER and can identify lots of SNPs
- SNPs also have LOWER mutation rates than STRs (change less over time) –> useful for identifying relatives
What is a disadvantage of SNP analysis and from this, what will give a more complete picture of the genome?
- Does not give all genetic information on individual
- e.g. no info on variations such as INDELs and mobile elements
- Routine sequencing of the genome will give a more complete picture
What does the genomics revolution encompass?
- The fact that over time gene sequencing is getting cheaper
Which company does ULTRA FAST GENOME SEQUENCING and how much does it cost per instrument?
- Illumina
- 10 million per instrument
- 18 000 genomes per year
What is whole Exome Sequencing? (WES)
- For mendelian disorders linked to mutations in EXONS
- Sample DNA is fragmented
- Predetermined genomic fragments containing exons are isolated
- Sequence compared to reference genomes
What can Whole Exome Sequencing (WES) be used to detect?
- RAPIDLY detect and diagnose RARE genetic disorders
- Especially those caused by a SINGLE GENE dysfunction
What was the difference between the HapMap project and the 1000 genomes project?
- 1000 genomes project was done AFTER the HapMap project and aimed to have a much more detailed catalogue of the human genome
What type of sequencing did the 1000 genomes project use?
- Whole Exome Sequencing
What was found from the 1000 genomes project?
- Each person carries around 250-300 loss of function variants in KNOWN GENES
- 50-100 variants are implicated in INHERITED DISORDERS
- Also the rate of de novo germline mutation is approximately 10^-8 per base, per generation.
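To get a feel for that rate, a quick calculation of how many new mutations it implies per child, taking the rate as roughly 1 in 100 million per base and assuming a haploid genome of about 3 billion base pairs inherited in two copies (the genome size is an assumed round figure, not from the cards).

```python
MUTATION_RATE = 1e-8               # per base, per generation
HAPLOID_GENOME_BP = 3_000_000_000  # assumed approximate haploid genome size

# Each child inherits two haploid genomes, one from each parent.
expected_new_mutations = MUTATION_RATE * 2 * HAPLOID_GENOME_BP
print(round(expected_new_mutations))  # 60
```

So on these assumptions each person carries on the order of tens of de novo mutations.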
What is an example of an application of WES (Whole Exome Sequencing)?
- In cancer treatment
- Take the tumour and screen for differences then make personalised assays.
- Can then be used to direct the treatment (e.g. inhibiting certain pathway)
What is involved in stage 1 and stage 2 of the 1000 genomes project?
Stage 1: Whole Exome sequencing (genomes of 700 people from 25 populations)
Stage 2: Analyse 2500 genomes by whole GENOME sequencing
What does Whole Genome Sequencing involve?
- Sequencing EVERY base
- DNA molecules are attached to primers on a slide and amplified so that local clusters are formed
- 4 types (A,T,C,G) of reversible terminating nucleotides are added
- Each nucleotide is fluorescently labelled with a different colour + attached to a BLOCKING group
- 4 nucleotides COMPETE for binding sites on the template DNA to be sequenced (non-incorporated molecules wash away)
- After each synthesis step, a laser excites the incorporated nucleotide so its fluorescent colour (specific to one of the four bases) becomes visible, then the blocking group and fluorescent probe are removed
- This allows for sequence identification and is repeated until the ENTIRE DNA molecule is sequenced
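The cycle described above can be sketched as a loop: one base is incorporated per cycle, its colour is read, and the sequence is rebuilt from the colours. The colour assignments are hypothetical, strand orientation is ignored, and each cycle is assumed to incorporate exactly the complementary nucleotide.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
COLOUR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}  # hypothetical dyes
BASE_FOR_COLOUR = {colour: base for base, colour in COLOUR.items()}


def sequence_by_synthesis(template: str) -> tuple[list[str], str]:
    """Simulate one read: return the colours seen per cycle and the decoded read."""
    colours = []
    for template_base in template:
        # Only the complementary nucleotide pairs with the template;
        # the other three wash away. Its blocking group stops further extension.
        incorporated = COMPLEMENT[template_base]
        colours.append(COLOUR[incorporated])  # colour recorded for this cycle
        # Blocking group and label are then removed so the next cycle can start.
    read = "".join(BASE_FOR_COLOUR[c] for c in colours)
    return colours, read


colours, read = sequence_by_synthesis("TACG")
print(colours)  # ['green', 'red', 'yellow', 'blue']
print(read)     # ATGC
```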
As of 2018, how many genomes had been sequenced?
- 71 095 genomes