Lecture 10 Flashcards
What is the ultimate crime in bioinformatics?
not using existing resources
*Excel
Name a tool for Data Manipulation
BioConductor
What is BioConductor?
- Open-source suite of programs for gene expression profiling analysis
- Runs in the R statistical language
What can you do with 13 lines in BioConductor?
- GC arrays
- Identify significantly differentially expressed genes
- Display in a heatmap
Why is gene expression profiling important?
- Only 40-60% of genes identified in genome sequencing projects are functionally annotated by sequence similarity (lineage and species-specific genes)
- sequence similarity will not identify novel functions of proteins
- genes involved in regulation, interaction, or integration of pathways
- genes expressed at low levels or showing transient expression (may be missed)
How were genes involved in regulation, interaction, or integration of pathways traditionally identified?
By genetic (mutant) analysis and biochemical methods
What is functional genomics?
A field that aims to create and apply technologies that use sequence information to analyze the full complement of genes and proteins encoded by an organism
What are the 4 major approaches used to elucidate possible function of genes?
- Expression pattern for all genes
- Expression and distribution of all proteins
- Knocking out genes and examining the phenotype and/or gene expression patterns
- Identifying interactions among proteins (two-hybrid analysis and newer bait methods)
What are the four omics approaches? Which one is the only context-independent one?
- Genome - complete set of genes of an organism or its organelles **context-independent
- Transcriptome - complete set of mRNA molecules present in a cell, tissue, or organ
- Proteome - complete set of protein molecules in a cell, tissue, or organ
- Metabolome - complete set of metabolites in a cell, tissue, or organ
What methods of analysis are available for studying the genome?
Systematic DNA sequencing
What methods of analysis are available for studying the transcriptome? (4)
Microarrays, high-throughput northern analysis, ESTs, RNA-seq
What methods of analysis are available for studying the proteome?
-2D gel electrophoresis
-peptide mass spec, BioID
-two-hybrid analysis
-peptide/protein microarray
What methods of analysis are available for studying the metabolome?
- nuclear magnetic resonance (NMR) spectroscopy
- mass spectrometry
- infrared spectroscopy
Name three profiling technologies
- cDNA Microarrays
- Oligonucleotide Microarrays
- RNA-Seq
How do cDNA microarrays work?
- cDNAs for each gene in the genome are spotted onto a glass slide
- Each spot represents a specific gene
- Take RNA from the populations being compared and label each with a different fluorophore (different dye colours), incorporated during reverse transcription (RT)
- Mix the labelled samples and hybridize them to the cDNA microarray
- Analyze spot colour: with two mixed populations, yellow means equal abundance, while red or green means higher expression in one population
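To make the colour rule concrete, here is a minimal Python sketch that classifies a spot by its log2(red/green) ratio. It assumes background-corrected intensities for the red- and green-labelled samples; the function name `spot_colour`, the 2-fold cut-off (`threshold=1.0` in log2 units), and the intensities are hypothetical, chosen only for illustration.

```python
import math


def spot_colour(red: float, green: float, threshold: float = 1.0) -> str:
    """Classify a two-colour spot from its log2(red/green) ratio.

    `threshold` is in log2 units (1.0 = 2-fold); it is an illustrative
    cut-off, not a standard value.
    """
    m = math.log2(red / green)  # > 0: more signal from the red-labelled sample
    if m > threshold:
        return "red"
    if m < -threshold:
        return "green"
    return "yellow"  # roughly equal abundance in both samples


# Hypothetical background-corrected intensities (red, green) for three genes
spots = {"geneA": (1200.0, 1100.0), "geneB": (4000.0, 500.0), "geneC": (300.0, 2400.0)}
for gene, (r, g) in spots.items():
    print(gene, spot_colour(r, g))
```

A log ratio is used rather than a raw ratio so that 2-fold up (+1) and 2-fold down (-1) sit symmetrically around 0.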
How do oligonucleotide microarrays work?
- Based on the genome sequence, design oligonucleotides (25 nt long) that match the 3' end of each transcript
- Oligonucleotides are synthesized on a silicon wafer
- The mRNA population is labelled by biotinylation and hybridized to the array
- Scan to see how much labelled RNA is bound to each probe
- Determine the gene expression level from the signal intensity
What process used in making computer chips is also used in oligonucleotide microarray technology?
Photolithography (light deprotects the growing oligonucleotides at selected positions so that a specific nucleotide can be added there)
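The mask-by-mask logic can be illustrated with a small simulation. This is a simplified sketch, assuming nucleotides are offered in a fixed A, C, G, T cycle and that each illuminated (deprotected) probe gains exactly one base per cycle; real manufacturing also has to deal with imperfect coupling and mask design, which are ignored here. `synthesis_masks` is a hypothetical helper name.

```python
def synthesis_masks(probes: list[str], order: str = "ACGT") -> list[tuple[str, list[int]]]:
    """Simulate light-directed synthesis of a set of probes.

    Returns one (nucleotide, illuminated probe indices) pair per cycle that
    lights at least one probe. In each cycle only the probes whose next base
    matches the offered nucleotide are deprotected and extended by one base.
    """
    grown = [""] * len(probes)
    cycles = []
    while any(len(g) < len(p) for g, p in zip(grown, probes)):
        for nt in order:
            # The "mask": which probe positions are exposed to light this cycle
            lit = [i for i, (g, p) in enumerate(zip(grown, probes))
                   if len(g) < len(p) and p[len(g)] == nt]
            if lit:
                for i in lit:
                    grown[i] += nt
                cycles.append((nt, lit))
    return cycles


for nt, lit in synthesis_masks(["ACG", "AAT", "GCA"]):
    print(f"add {nt} to probes {lit}")
```

With a fixed A, C, G, T order, every unfinished probe is extended at least once per four cycles, so a set of 25-mers needs at most 4 × 25 = 100 masks.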
What should be considered for oligonucleotide design in microarrays? (5)
- Similar Tm, to allow similar hybridization efficiency
- Discriminate against members of the same gene family (sometimes impossible for paralogues)
- Specific Tm and length
- Free of secondary structure and self-annealing tendency
- Unique to one species, without homology to another species (e.g. discriminate bacterial from human genes)
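Two of these criteria, similar Tm and freedom from self-annealing, can be screened computationally. The sketch below rests on stated assumptions: it uses the basic GC-content approximation Tm = 64.9 + 41 × (nGC - 16.4) / N for oligos longer than about 13 nt, flags a probe as self-complementary if any 6-mer of it also occurs in its reverse complement, and applies a hypothetical 55-65 °C window. The function names and thresholds are illustrative. Specificity against paralogues or other species requires a sequence-similarity search and is not covered.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def basic_tm(seq: str) -> float:
    """Rough Tm (°C) from GC content; an approximation for oligos > ~13 nt."""
    n_gc = seq.count("G") + seq.count("C")
    return 64.9 + 41 * (n_gc - 16.4) / len(seq)


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def has_self_complement(seq: str, k: int = 6) -> bool:
    """True if some k-mer of seq also appears in its reverse complement,
    i.e. the probe could fold back on itself or self-anneal."""
    rc = reverse_complement(seq)
    return any(seq[i:i + k] in rc for i in range(len(seq) - k + 1))


def passes_design(seq: str, tm_min: float = 55.0, tm_max: float = 65.0, k: int = 6) -> bool:
    """Hypothetical filter: Tm inside the window and no self-complementary k-mer."""
    return tm_min <= basic_tm(seq) <= tm_max and not has_self_complement(seq, k)


for probe in ["GCGCGCGCGCAAAAAAAAAAAAAAA", "AAAAGAATTCAAAAAAAAAAAAAAA"]:
    print(probe, round(basic_tm(probe), 1), has_self_complement(probe), passes_design(probe))
```

Keeping every probe inside the same Tm window is what lets one hybridization temperature work for the whole array.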
Why is image processing done?
To extract quantitative information from the array after hybridization
- Input: scanned images of array fluorescence
- A grid is applied to the image
- Each spot's position and value (with background correction) is associated with the appropriate identifier
- Output: table of IDs and values
*ScanAnalyze, Affymetrix
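The grid-and-background step can be sketched with NumPy. This assumes the grid has already been aligned, so each spot's centre (row, column) is known. It takes the mean of a circular foreground region and subtracts the median of a surrounding ring as local background. The radii, identifiers, and synthetic image are hypothetical, and tools such as ScanAnalyze use their own segmentation and background methods.

```python
import numpy as np


def quantify_spots(image, spot_centres, radius=3, bg_width=3):
    """Return a table (dict) of identifier -> background-corrected spot intensity.

    image: 2-D array of scanned fluorescence values.
    spot_centres: dict mapping an identifier to its (row, col) grid position.
    """
    rows, cols = np.indices(image.shape)
    table = {}
    for spot_id, (r, c) in spot_centres.items():
        dist = np.hypot(rows - r, cols - c)
        foreground = image[dist <= radius]
        # Ring of pixels just outside the spot, used as local background
        ring = (dist > radius + 1) & (dist <= radius + 1 + bg_width)
        background = np.median(image[ring])
        table[spot_id] = float(foreground.mean() - background)
    return table


# Synthetic 30 x 30 scan: background of 100 everywhere, one bright spot at (10, 10)
scan = np.full((30, 30), 100.0)
rr, cc = np.indices(scan.shape)
scan[np.hypot(rr - 10, cc - 10) <= 3] = 600.0

for gene_id, value in quantify_spots(scan, {"gene_1": (10, 10), "gene_2": (20, 20)}).items():
    print(gene_id, value)
```

Using the median of the ring makes the background estimate less sensitive to stray bright pixels, such as dust or a neighbouring spot bleeding into the ring.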
Name 5 cell-type-specific expression profiling methods
- Laser Capture Microdissection
- Cell-type-specific GFP lines → protoplasting → FACS
- INTACT (isolation of nuclei tagged in specific cell types) / TRAP (translating ribosome affinity purification)
- scRNA-seq
- Spatial transcriptomics
What are the problems with raw counts from profiling experiments? Why do we need normalization?
Microarray: counts per pixel (CCD)
The numbers (CCD or RNA-seq counts) are:
- arbitrary
- not comparable between samples
- not a linear multiple of the abundance of what you want to detect
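The "not comparable between samples" problem is easy to see with RNA-seq read counts, which scale with sequencing depth. A minimal sketch, assuming two hypothetical samples in which the second was sequenced twice as deeply: dividing by library size (counts per million, CPM) makes the values comparable. CPM corrects only for depth; it does not address the other two problems listed.

```python
def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts by total library size (counts per million)."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


# Hypothetical counts: sample_b was sequenced about twice as deeply as sample_a
sample_a = {"geneX": 500, "geneY": 1500, "other": 998_000}
sample_b = {"geneX": 1000, "geneY": 3000, "other": 1_996_000}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
print(sample_a["geneX"], sample_b["geneX"])  # raw counts differ 2-fold
print(cpm_a["geneX"], cpm_b["geneX"])        # CPM values agree
```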
What does normalization do?
Removes trends that correlate with variables not expected to influence gene expression changes
- afterwards, the mean expression level across samples should be similar
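The simplest way to make mean expression levels similar across samples is global scaling. A sketch in NumPy, assuming a genes × samples matrix and that most genes do not change between samples; `scale_to_common_mean` and the example matrix are hypothetical. Global scaling removes sample-wide intensity differences but not intensity-dependent trends, which are what methods such as loess target.

```python
import numpy as np


def scale_to_common_mean(matrix, target=None):
    """Rescale each column (sample) of a genes x samples matrix so that
    every column mean equals `target`.

    If no target is given, the average of the original column means is used.
    """
    col_means = matrix.mean(axis=0)
    if target is None:
        target = col_means.mean()
    # Broadcasting multiplies every column by its own scaling factor
    return matrix * (target / col_means)


raw = np.array([[1.0, 2.0],
                [3.0, 6.0],
                [5.0, 10.0]])
scaled = scale_to_common_mean(raw)
print(scaled.mean(axis=0))
```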
What is an MA plot?
A = x-axis = log intensity of expression
M = y-axis = log ratio of intensity relative to the median value
- expect a cloud shape centred on M = 0
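In formula terms, for a gene with intensity x on one array and reference intensity r, M = log2(x / r) and A = (log2 x + log2 r) / 2. Below is a short NumPy sketch, assuming (hypothetically) that each gene's reference is its median across arrays; for a two-colour array, the reference would instead be the other channel. Plotting `m` against `a` (for example with matplotlib) gives the MA plot.

```python
import numpy as np


def ma_values(x, reference):
    """Return (M, A): M = log2 ratio to the reference, A = average log2 intensity."""
    log_x = np.log2(x)
    log_ref = np.log2(reference)
    m = log_x - log_ref
    a = (log_x + log_ref) / 2
    return m, a


# Hypothetical intensities: 4 genes (rows) x 3 arrays (columns)
intensities = np.array([[100.0, 120.0, 90.0],
                        [2000.0, 1800.0, 2200.0],
                        [50.0, 40.0, 60.0],
                        [800.0, 900.0, 400.0]])
median_array = np.median(intensities, axis=1)  # per-gene median "pseudo-array"
m, a = ma_values(intensities[:, 0], median_array)
print(np.round(m, 2), np.round(a, 2))
```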
What are the methods of normalization for microarray data? (3)
- RMA (Robust Multi-array Average): quantile normalization; better expression estimates, but introduces inter-array correlations in coexpression analyses
- GCOS/MAS5.0: Affymetrix normalization algorithm
- Loess: locally weighted linear regression to smooth data (cDNA microarrays)
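Quantile normalization, the normalization step used within RMA, gives every array the same intensity distribution: the mean of the sorted columns. A minimal NumPy sketch, assuming a genes × arrays matrix; unlike production implementations (e.g. Bioconductor's affy and preprocessCore packages), it does not average tied values. This forcing of identical distributions is also what can introduce the inter-array correlations noted above.

```python
import numpy as np


def quantile_normalize(matrix):
    """Quantile-normalize a genes x arrays matrix (ties are not averaged)."""
    order = np.argsort(matrix, axis=0)                   # row indices that sort each column
    ranks = np.argsort(order, axis=0)                    # rank of each value within its column
    mean_sorted = np.sort(matrix, axis=0).mean(axis=1)   # shared reference distribution
    return mean_sorted[ranks]                            # replace each value by the mean at its rank


arrays = np.array([[2.0, 4.0],
                   [1.0, 3.0],
                   [3.0, 8.0]])
print(quantile_normalize(arrays))
```

Each gene keeps its rank within its array; only the values are replaced, so the sorted columns of the result are identical.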