1. Introduction to Molecular Biology Flashcards
Nucleus Definition
-an organelle that contains the genetic material of the cell
Chromosome Definition
-organised structure of DNA in the cell nucleus
DNA Definition
-deoxyribonucleic acid
-a nucleic acid containing the genetic instructions used in the development and functioning of all known living organisms
Gene Definition
-a DNA segment that carries genetic information
-a molecular unit of heredity in a living organism
-a region of DNA that codes for mRNA
Transcription Definition
-the process of creating a complementary RNA copy of a sequence of DNA
Translation Definition
-a process where messenger RNA (mRNA) produced by transcription is decoded by the ribosome to produce a specific amino acid chain, or polypeptide, that will later fold into an active protein
What is RNA abundance used to measure?
-RNA abundance is a measurable indicator of gene expression
-measured using a microarray
-simultaneous measurement of 15-30 thousand genes
-number of samples ranges from 4 to 100s
Gene Expression and Microarray Data
-comparison between two groups, e.g. Control/Treatment, Normal/Disease
-we want to identify differentially expressed genes between the two groups
Microarray Data and Linear Regression
-ordinary least squares regression with all genes as covariates is not possible, since there are far more variables (genes) than observations (samples) and the estimates are then not unique
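A minimal sketch of why the fit breaks down, assuming NumPy is available. The dimensions and the random matrix are hypothetical toy values, not real microarray data: with fewer samples than genes, Xt X cannot have full rank, so it has no inverse.

```python
import numpy as np

# Hypothetical toy dimensions: 5 samples (observations), 20 genes (variables).
n_samples, n_genes = 5, 20
rng = np.random.default_rng(0)
X = rng.normal(size=(n_samples, n_genes))

XtX = X.T @ X                       # 20 x 20 matrix
rank = np.linalg.matrix_rank(XtX)   # can never exceed n_samples

print(f"Xt X is {XtX.shape[0]}x{XtX.shape[1]} but has rank {rank}")
```

Because the rank is below the number of genes, infinitely many coefficient vectors fit the data equally well.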
Microarray Definition
-a technology where DNA sequences (from 1000s of genes) are pre-printed onto a surface as probes
-isolate human mRNA from samples in the lab
-label the mRNA with coloured dye
-mix with the microarray; hybridisation occurs as the labelled RNA binds to complementary probes
-quantification: scan the array and measure the intensity of the pixels
What are the two types of microarray?
-single-colour (e.g. Affymetrix arrays)
-two-colour (e.g. cDNA arrays)
Two Colour Microarray
-the microarray is hybridised with cDNA from two different samples, each labelled in a different colour, usually red and green
-the two samples are mixed and hybridised to a single array
-the relative intensities of each colour indicate the relative expression of a particular gene in each sample
-generally only able to measure relative expression, not absolute expression
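One common way to summarise the relative intensities is a log2 ratio of the red and green signals for each spot. The sketch below assumes that convention; the function name and the intensities are hypothetical.

```python
import math

def log_ratio(red, green):
    """Return log2(red / green) for one spot; both intensities must be positive."""
    return math.log2(red / green)

# Hypothetical background-corrected (red, green) intensities for three spots.
spots = {
    "gene_a": (800.0, 200.0),   # stronger in the red-labelled sample
    "gene_b": (150.0, 150.0),   # equal in both samples
    "gene_c": (100.0, 400.0),   # stronger in the green-labelled sample
}
ratios = {name: log_ratio(r, g) for name, (r, g) in spots.items()}
print(ratios)
```

A positive value means higher expression in the red-labelled sample, a negative value means higher expression in the green-labelled sample, and zero means no difference.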
One Colour Microarray
-e.g. Affymetrix GeneChip
-RNA from a single sample is hybridised to each array, so the array provides intensity data for each gene from one sample
-batch effects have to be accounted for when comparing different arrays
How is microarray data plotted?
-as a histogram
-intensity on a log scale on the x-axis
-frequency on the y-axis
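A rough sketch of such a histogram, assuming NumPy and a log2 transform (the base of the logarithm is an assumption). The intensities are hypothetical, and the plot is drawn as text so that no plotting library is needed.

```python
import numpy as np

# Hypothetical raw intensities; a real array has tens of thousands of values.
intensities = np.array([60, 75, 120, 130, 250, 260, 500, 1000, 4000, 16000], dtype=float)
log_intensities = np.log2(intensities)

# counts[i] is the number of values in the bin from edges[i] to edges[i + 1];
# the last bin also includes its right edge.
counts, edges = np.histogram(log_intensities, bins=5)

for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"{left:5.2f} to {right:5.2f} | {'#' * int(count)}")
```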
The Simple Linear Regression Model
yi = β0 + β1xi + εi, i=1,…,n
-yi is the response or dependent variable
-β0 and β1 are regression parameters
-xi is the independent or explanatory variable, measured without error
-the εi are iid N(0,σ²) errors
Using Linear Regression for Microarray Data
-possible if we use y as the outcome (e.g. diseased or healthy)
-and use the expression of each gene as the x variables
-then βj represents the relationship between the expression of gene j and the condition
Linear Regression Estimation, β^
-derived by minimising the sum of squared residuals S => β^ = (Xt X)^(-1) Xt y
-where X is the matrix whose ij element is the ith observation of the jth independent variable (the genes)
-Xt indicates the transpose of X
-y is a vector whose ith element is the ith observation of the dependent variable (condition)
-and β^ is a vector of estimators for the β parameters, β0^, β1^
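The estimator can be sketched numerically as follows, assuming NumPy. The data are hypothetical and the design matrix includes a column of ones for the intercept. The normal equations are solved directly instead of forming the inverse, which gives the same β^ with better numerical behaviour.

```python
import numpy as np

# Hypothetical data for the simple model: x is explanatory, y is the response.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix: a column of ones (intercept) and the x values.
X = np.column_stack([np.ones_like(x), x])

# Solve (Xt X) beta = Xt y, which is equivalent to beta = (Xt X)^(-1) Xt y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # intercept first, then slope
```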
Linear Regression Estimation, σ²^
σ²^ = [et e]/[n-p]
-where et indicates the transpose of e
-p is the number of parameters, 2 for the simple model (β0, β1)
-and e is the vector of residuals: e = y - Xβ^
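A short sketch of this variance estimate, assuming NumPy and hypothetical data. It uses `np.linalg.lstsq` to obtain the least squares fit, then forms the residual vector e and divides et e by n-p.

```python
import numpy as np

# Hypothetical data for the simple model (p = 2 parameters).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares fit
e = y - X @ beta_hat                               # residual vector
sigma2_hat = (e @ e) / (n - p)                     # et e / (n - p)
print(sigma2_hat)
```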
Hypothesis Testing for Microarray Data Test Statistic
T = βj^ / se(βj^) ~ t(n-L-1) under the null hypothesis βj = 0
-where se means the standard error: se(βj^) = sqrt(var(βj^))
-L is the number of covariates, so n-L-1 = n-p with p = L+1 parameters including the intercept
Simple Linear Regression Model in Matrix Form
yi = β0 + β1xi + εi, i=1,…,n
-matrix form: y = Xβ + ε
-where y is a column vector with entries y1, y2, …, yn
-X is an n×2 matrix with all 1s in the first column and x1, x2, …, xn in the second column
-β is a column vector with entries β0, β1
-ε is a column vector with entries ε1, ε2, …, εn
Linear Regression Variance-Covariance Matrix
Σ = cov(β^) = E[(β^-β)(β^-β)t] = σ² (Xt X)^(-1)
-the diagonal entries are the variances of the individual βj^
-the off-diagonal entries are the corresponding covariances
Linear Regression Standard Errors for β^
-the standard errors of β0^ and β1^ are given by the square roots of the corresponding diagonal entries of Σ
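As an illustration, the covariance matrix and standard errors can be computed as below, assuming NumPy and hypothetical data. Since σ² is unknown in practice, the sketch plugs in the estimate et e / (n-p), so the result is an estimate of Σ and not Σ itself.

```python
import numpy as np

# Hypothetical data for the simple model.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat
sigma2_hat = (e @ e) / (n - p)       # plug-in estimate of sigma squared

Sigma_hat = sigma2_hat * XtX_inv     # estimated variance-covariance matrix
se = np.sqrt(np.diag(Sigma_hat))     # standard errors of the intercept and slope
print(se)
```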
Linear Regression Hypothesis Testing Overview
-our main interest is to test whether any of (or any function of) the individual regression parameters is significantly different from zero
-the hypotheses involved are: H0: βj = 0 versus H1: βj ≠ 0
-for any j
Linear Regression Hypothesis Testing Test Statistic
tj = βj^ / SE(βj^)
-under H0: βj = 0, tj follows a t-distribution with n-p degrees of freedom
-at a significance level α, the decision in the two-sided case is to reject H0 if |tj| > t(n-p; 1-α/2), the upper α/2 critical value of that t-distribution
-or equivalently, if the p-value is below α: P_H0(|T| > |tj|) < α
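The decision rule can be sketched end to end as follows, assuming NumPy. The helper `t_statistics` and the data are hypothetical. To keep the example free of extra dependencies, the critical value for α = 0.05 and 3 degrees of freedom is copied from a standard t-table; a statistics library would normally supply it.

```python
import numpy as np

def t_statistics(X, y):
    """Return (beta_hat, standard errors, t statistics, degrees of freedom)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e = y - X @ beta_hat
    sigma2_hat = (e @ e) / (n - p)
    se = np.sqrt(sigma2_hat * np.diag(XtX_inv))
    return beta_hat, se, beta_hat / se, n - p

# Hypothetical data for the simple model.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])

beta_hat, se, t, df = t_statistics(X, y)

# Two-sided critical value for alpha = 0.05 and df = 3, taken from a t-table.
T_CRIT = 3.182
reject = np.abs(t) > T_CRIT   # one decision per parameter (intercept, slope)
print(df, t, reject)
```

In this toy data set the slope is clearly non-zero while the intercept is not distinguishable from zero.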
Linear Regression Matrix Form for Microarray Data Analysis
-in microarray data analysis, y will be the vector of gene expression on the log scale
-X will be the design matrix; it is often not given and may need to be constructed before the analysis
-β may correspond to many factors, including differential expression
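A minimal sketch of constructing such a design matrix for one gene in a two-group comparison, assuming NumPy. The expression values and the 0/1 group coding are hypothetical. With this coding, the first coefficient is the mean log expression in the control group and the second is the difference between the groups, i.e. the differential expression.

```python
import numpy as np

# Hypothetical log2 expression of one gene: 3 control samples, then 3 treated samples.
y = np.array([7.0, 7.2, 6.8, 9.1, 8.9, 9.0])
treated = np.array([0, 0, 0, 1, 1, 1])   # group indicator (0 = control, 1 = treated)

# Design matrix: intercept column plus the group indicator.
X = np.column_stack([np.ones(len(y)), treated])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # control mean, then treated-minus-control difference
```

The same design matrix can be reused for every gene on the array, fitting one small regression per gene.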