Data analysis Flashcards
What is data analysis?
taking image data and turning it into sequence data
What are the roles of bioinformatics?
- Analytic method development
- Construction and curation of computational tools and databases
- Data mining, interpretation and analysis
What does bioinformatics encompass?
- Identifying differentially expressed genes
- Somatic mutations
- Copy number alterations
- Epigenetic changes
- Genomic understanding
- Multifactorial analyses: including BP, pulse ox, etc.
But also:
- Pathway analysis
- Genome analysis
- Literature searches
Name 2 genome browsers
UCSC: UCSC Genome Browser
EMBL: Ensembl genome browser
What are genome browsers?
curated databases of all the annotated genomes that we have (like PubMed for genes)
Array processing steps
- Experimental Design
- Image Analysis – scan to intensity measures (raw data)
a. You can get many different types of image data files
- Normalization – “clean” data
- More “low level” analysis: fold change, ANOVA, data filtering
- Data mining: how to interpret > 6000 measures
- Validation: repeat with another technology
What do you have to consider in the experimental design of an array experiment?
Sample size
Biological (different mouse) and technical (repeating sequencing) replicates
etc
What are the different ways you can mine data in an array experiment?
a. Databases
b. Software
c. Techniques-clustering, pattern recognition etc.
d. Comparing to prior studies, across platforms?
How do we do array image analysis?
Software uses algorithms to turn the scanned intensity measures into gene abundance (expression) estimates, which can then be plotted, e.g. as volcano plots
Why do we need to normalise data in array experiments?
“Normalizing” data allows comparisons ACROSS different arrays
○ Intensity of fluorescent markers might be different from one batch to the other due to differences in experiments, machines, etc.
○ Normalization allows us to compare those chips without altering the interpretation of changes in GENE EXPRESSION: technical variation can hide real biological variation
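A minimal sketch of the idea, assuming the simplest possible scheme (median scaling). The card does not name a normalization method, so this is only an illustration; the array names and intensity values are made up.

```python
from statistics import median

def median_normalize(arrays):
    """Scale each array so that all arrays share the same median intensity.

    `arrays` maps a (hypothetical) array name to its raw spot intensities.
    """
    medians = {name: median(values) for name, values in arrays.items()}
    target = median(medians.values())  # common reference level for all arrays
    return {
        name: [v * target / medians[name] for v in values]
        for name, values in arrays.items()
    }

# Toy data: batch_b was scanned twice as bright as batch_a (technical variation)
raw = {
    "batch_a": [100.0, 200.0, 400.0],
    "batch_b": [200.0, 400.0, 800.0],
}
normalized = median_normalize(raw)
print(normalized)
```

After scaling, the two toy arrays carry identical values, so the brightness difference between batches no longer looks like a change in gene expression.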
How are most “low level” analyses on array experiments done?
There is no standard way
pairwise (usually)
Lists of up- and down-regulated genes are made, with cutoffs set by fold increase, t-statistic (p-value), or a combination
What is the usual fold cutoff for significant up- or down-regulation in array experiments?
3-fold
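A rough sketch of a pairwise comparison that combines a fold cutoff with a t-statistic cutoff. The 3-fold value comes from the card above; the t-statistic threshold, the gene names and the measurements are hypothetical, and Welch's t-statistic is just one possible choice of test.

```python
from math import log2, sqrt
from statistics import mean, variance

FOLD_CUTOFF = 3.0  # 3-fold cutoff from the card above
T_CUTOFF = 2.0     # hypothetical t-statistic threshold, for illustration only

def welch_t(control, treated):
    """Welch's t-statistic for two groups of replicate measurements."""
    se = sqrt(variance(control) / len(control) + variance(treated) / len(treated))
    return (mean(treated) - mean(control)) / se

def classify(control, treated):
    """Return 'up', 'down' or 'unchanged' for one gene."""
    fold = mean(treated) / mean(control)
    t = welch_t(control, treated)
    # Compare on the log2 scale so up- and down-regulation are treated symmetrically
    if abs(log2(fold)) >= log2(FOLD_CUTOFF) and abs(t) >= T_CUTOFF:
        return "up" if fold > 1 else "down"
    return "unchanged"

# Toy replicate intensities: (control, treated)
genes = {
    "gene_a": ([100, 110, 90], [400, 420, 380]),   # about 4-fold up
    "gene_b": ([300, 310, 290], [60, 70, 50]),     # about 5-fold down
    "gene_c": ([100, 110, 90], [120, 100, 110]),   # small change
}
calls = {name: classify(c, t) for name, (c, t) in genes.items()}
print(calls)
```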
What should not be forgotten during array experiment “low level” analysis?
Multiple test correction (some genes may show a very large fold change but low significance)
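A sketch of one common correction, the Benjamini-Hochberg procedure. The card does not say which method is meant, so this choice is an assumption, and the p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p_values = [0.01, 0.04, 0.03, 0.20]  # one made-up p-value per gene
adjusted = benjamini_hochberg(p_values)
significant = [p <= 0.05 for p in adjusted]
print(adjusted, significant)
```

In this toy case three genes pass an uncorrected 0.05 threshold, but only one still passes after correction, which is why the step matters when thousands of genes are tested at once.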
What are the 3 stages of NGS data analysis?
- Primary analysis (run/sample quality): Raw data, images, signals –> basecalling –> bases/colours, quality values
- Secondary analysis (sample quality/info): +/- reference –> alignment and assembly
- Tertiary analysis (science): comparison –> statistical analysis and database searches
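What is basecalling?
Working out which base (A, C, G or T) is present at each position of a read from the raw images/signals, with a quality value for each call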
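A toy sketch of the idea, assuming one intensity per base channel per cycle and simply picking the brightest channel. Real basecallers are far more involved; the signal values here are invented. The Phred formula is the standard way a quality value encodes the probability that a call is wrong.

```python
from math import log10

BASES = "ACGT"  # assumed channel order for this toy example

def call_bases(cycles):
    """Pick the brightest channel in each cycle as the base call."""
    calls = []
    for intensities in cycles:
        brightest = max(range(4), key=lambda ch: intensities[ch])
        calls.append(BASES[brightest])
    return "".join(calls)

def phred_quality(p_error):
    """Phred score: Q = -10 * log10(probability that the call is wrong)."""
    return round(-10 * log10(p_error))

# Made-up intensities for three cycles, one value per channel (A, C, G, T)
signal = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.0, 0.8, 0.2),
    (0.0, 0.1, 0.1, 0.7),
]
read = call_bases(signal)
print(read, phred_quality(0.001))
```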
Moore’s law
Computing power doubles roughly every two years
- NGS is getting cheaper faster than that, while computing power is only following Moore’s law
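The doubling rule can be turned into simple arithmetic. This sketch assumes an exact two-year doubling period, as stated on the card; the function name is made up for illustration.

```python
def moore_factor(years, doubling_period=2.0):
    """Growth factor in computing power after `years`, doubling every `doubling_period` years."""
    return 2 ** (years / doubling_period)

# Ten years is five doublings: 2 * 2 * 2 * 2 * 2
ten_year_growth = moore_factor(10)
print(ten_year_growth)
```

If sequencing output per unit cost grows faster than this factor, the volume of data to analyse outpaces the computing power available to process it.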