chemometrics quiz 2 Flashcards
Bivariate vs multivariate
bivariate looks at two variables and tells how related they are - multivariate explains the relationships among more than two
Covariance vs correlation
Covariance - how two data sets change or vary together in tandem
Correlation - the strength and direction of the linear relationship between two variables, on a unitless scale (it does not by itself show that one causes the other)
How do you calculate covariance
Take the mean of x and the mean of y
For each point, subtract the means from its x and y, multiply the two deviations together, and sum the products
then divide by N-1
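A minimal sketch of those steps in plain Python (standard library only). The data are made up and `covariance` is just an illustrative helper name; R's built-in cov() does the same job.

```python
def covariance(x, y):
    """Sample covariance: sum of deviation products divided by N - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Multiply each point's x deviation by its y deviation, then sum
    products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
    return sum(products) / (n - 1)

# Made-up example data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
cov_xy = covariance(x, y)
print(cov_xy)
```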
Scales of covariance vs correlation
covariance is affected by a change in scale - correlation isn’t
covariance keeps units - correlation doesn’t
each is 0 when the two are independent
covariance ranges from negative infinity to infinity - correlation ranges from -1 to 1
How to calculate correlation
Covariance divided by (stdev x * stdev y)
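A quick sketch of that formula, assuming plain Python and made-up data (the helper names are illustrative). It also checks the scale point: rescaling x changes the covariance but leaves the correlation alone.

```python
import math

def covariance(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

def stdev(values):
    # Sample standard deviation is the square root of the variance,
    # and the variance is the covariance of a variable with itself
    return math.sqrt(covariance(values, values))

def correlation(x, y):
    return covariance(x, y) / (stdev(x) * stdev(y))

# Made-up example data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = correlation(x, y)

# Change the units of x (e.g. g to mg): covariance scales, correlation does not
x_scaled = [1000.0 * xi for xi in x]
r_scaled = correlation(x_scaled, y)
print(covariance(x, y), covariance(x_scaled, y))
print(r, r_scaled)
```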
How to program these
cor() and cov() in R
What’s a corrgram
a plot of the correlation matrix - the same variables on the x and y axes, showing how each pair correlates; cells can be pictures, colors, etc. - the top and bottom halves typically match and show how strong the correlation is and in what direction
How do p values work with correlation
p value < 0.05 means the correlation coefficient is significantly different from 0
What’s a scatter plot matrix?
same idea as a corrgram but each space has an actual scatter plot
How do you plot a scatter plot matrix
pairs() function
Partial Correlation
correlation between two quantitative variables while controlling for one or more other quantitative variables
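For the simplest case (controlling for a single variable z), the standard first-order formula is r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). A sketch in plain Python with made-up data, where x and y both mostly follow z, so the correlation drops sharply once z is controlled for:

```python
import math

def correlation(a, b):
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def partial_correlation(x, y, z):
    """Correlation of x and y, controlling for one variable z."""
    r_xy = correlation(x, y)
    r_xz = correlation(x, z)
    r_yz = correlation(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Made-up data: x and y are each z plus a small wobble
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5, 7.5, 7.5]
y = [1.5, 2.5, 2.5, 3.5, 5.5, 6.5, 6.5, 7.5]

raw = correlation(x, y)
partial = partial_correlation(x, y, z)
print(raw, partial)
```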
What is regression used for
1) IDing explanatory variables that are related to an outcome/response variable
2) Describing the form of the relationship between the dependent and independent variables (general relationship)
3) Providing an equation to predict the response variable from the explanatory variable (cal curve)
Ordinary least squares regression - what is it
a quantitative dependent variable is predicted from a weighted sum of predictor variables, where the weights are parameters estimated from the data
Goal of regression
choose model parameters (the slope and intercept) that minimize the difference between the actual and predicted values
What is a residual
the difference between the observed and fitted value
What is true about the sum of residuals
in a linear regression model fitted by least squares with an intercept, the sum of residuals is 0
What determines best fit
minimizing the sum of squared differences between the observed and fitted values
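The last few cards can be sketched together for a straight line, assuming plain Python and made-up data (`fit_line` is an illustrative helper): fit by least squares, compute residuals, and check that they sum to roughly 0.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(x, y, intercept, slope):
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up example data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sum_residuals = sum(residuals)  # approximately 0, up to rounding
best = sum_squared_residuals(x, y, intercept, slope)

# Nudging the slope away from the fitted value makes the squared error worse
worse = sum_squared_residuals(x, y, intercept, slope + 0.1)
print(intercept, slope, sum_residuals, best, worse)
```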
ASSUMPTIONS FOR LINEAR REGRESSION
Error in x should be negligible
Error in the y values must be normally distributed (it is the errors/residuals that need to be normal, not the raw dependent variable)
Variance of the error in y should be constant across the range of interest (stdev constant)
x and y should be continuous
What’s a high leverage point and how to deal
a point with an extreme x value that has more influence on the fit and on r^2
can deal with it by having evenly spaced values
How to calculate regression
determine the residuals (y - y fitted), square them and sum them, then calculate the residual standard dev - the deviation of the data points from the regression line
from that, the stdev for the slope
and the stdev for the y intercept
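A sketch of those three quantities for a straight-line fit, using the usual textbook formulas (n - 2 degrees of freedom). Plain Python, made-up data, and `regression_errors` is an illustrative name only.

```python
import math

def regression_errors(x, y):
    """Residual standard deviation, plus standard deviations of slope and intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # (y - y fitted) squared and then summed
    ssr = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s_res = math.sqrt(ssr / (n - 2))
    se_slope = s_res / math.sqrt(sxx)
    se_intercept = s_res * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))
    return s_res, se_slope, se_intercept

# Made-up example data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
s_res, se_slope, se_intercept = regression_errors(x, y)
print(s_res, se_slope, se_intercept)
```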
How to inspect residuals
plot them - should be scattered around zero with no pattern
How to tell how influential a data point is
COOK’S DISTANCE - how much the fitted values change when that point is left out
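For a straight-line fit, Cook's distance can be computed from each residual and its hat (leverage) value: D_i = (e_i^2 / (p * s^2)) * h_i / (1 - h_i)^2. A sketch in plain Python with made-up data where the last point sits far out in x and off the trend; the function name is illustrative.

```python
def cooks_distances(x, y):
    n = len(x)
    p = 2  # parameters in a straight-line model: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual variance
    hat = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]  # leverage of each point
    return [(e ** 2 / (p * s2)) * h / (1 - h) ** 2 for e, h in zip(residuals, hat)]

# Made-up data: the last point is far from the others and breaks the trend
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [2.0, 4.1, 5.9, 8.2, 9.9, 12.0]
cooks = cooks_distances(x, y)
most_influential = cooks.index(max(cooks))
print(cooks, most_influential)
```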
How do the t test and p value relate to regression
if the p value for the slope is less than 0.05, the slope is significantly different from 0 - THERE IS A RELATIONSHIP
R^2
shows how well the points fit the line - the fraction of the variation in y explained by the model
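One common way to compute it for a straight-line fit is R^2 = 1 - (sum of squared residuals) / (total sum of squares about the mean of y). A small sketch, assuming plain Python and made-up data:

```python
def r_squared(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Made-up example data: close to a straight line, so R^2 is close to 1
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r2 = r_squared(x, y)
print(r2)
```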
How to test your prediction analysis
Make the regression with 80% of the data while saving 20% to test afterwards (split randomly)
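A sketch of the idea in plain Python: shuffle, keep 80% to fit, and check predictions on the held-out 20%. The data, the fixed `seed`, and the helper names are all assumptions for illustration.

```python
import random

def split_80_20(rows, seed=0):
    """Randomly split rows into 80% training and 20% test."""
    rng = random.Random(seed)  # fixed seed only so the example is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]

def fit_line(points):
    n = len(points)
    mean_x = sum(px for px, _ in points) / n
    mean_y = sum(py for _, py in points) / n
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up data: y = 2x + 1 with a small alternating wobble
data = [(float(i), 2.0 * i + 1.0 + (0.3 if i % 2 == 0 else -0.3)) for i in range(1, 21)]

train, test = split_80_20(data)
intercept, slope = fit_line(train)

# Compare predictions with the held-out actual values
errors = [abs(py - (intercept + slope * px)) for px, py in test]
mean_abs_error = sum(errors) / len(errors)
print(len(train), len(test), mean_abs_error)
```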
What are min max accuracy and MAPE
min max accuracy and MEAN ABSOLUTE PERCENT ERROR
min max accuracy - the average of min(actual, predicted) / max(actual, predicted); tells you how far it’s off from a perfect model (1 is perfect)
MAPE same idea - a MAPE of 10% means the forecast is off by 10% on average; accuracy is 100% - MAPE (eg a MAPE of 0.49 means it’s 51% accurate)
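Both measures are short to compute. A sketch in plain Python, assuming positive actual and predicted values and using made-up numbers (function names are illustrative):

```python
def min_max_accuracy(actual, predicted):
    """Mean of min/max for each actual-predicted pair; 1 means a perfect match."""
    ratios = [min(a, p) / max(a, p) for a, p in zip(actual, predicted)]
    return sum(ratios) / len(ratios)

def mape(actual, predicted):
    """Mean absolute percent error, as a fraction (0.10 means off by 10% on average)."""
    errors = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)

# Made-up values
actual = [100.0, 200.0, 50.0, 80.0]
predicted = [110.0, 180.0, 50.0, 100.0]

mm = min_max_accuracy(actual, predicted)
mape_value = mape(actual, predicted)
print(mm, mape_value)
```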
POLYNOMIAL REGRESSION (2nd order) - how does it change
instead of y = a + bx
y = a + bx + cx^2
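The extra x^2 term is still fitted by least squares; it just adds one more column to the problem. A sketch in plain Python that solves the normal equations directly (fine for tiny examples; a stats package would normally do this). Data are made up from known coefficients so the fit can be checked, and the helper names are illustrative.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef

def fit_quadratic(x, y):
    """Least-squares a, b, c for y = a + b*x + c*x^2."""
    cols = [[1.0] * len(x), list(x), [xi ** 2 for xi in x]]  # columns: 1, x, x^2
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)

# Made-up data generated from y = 1 + 2x + 0.5x^2
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [1.0 + 2.0 * xi + 0.5 * xi ** 2 for xi in x]
a, b, c = fit_quadratic(x, y)
print(a, b, c)
```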
What is multiple linear regression used for
When you have more than one predictor variable - the more predictors, the more terms (eg cubic is 3)
How to interpret regression slope
the slope indicates the change in one variable relative to another
eg the increase in the dependent variable for a one unit increase in the independent variable
Confidence interval for regression
a 95% confidence interval says we are 95% confident that the interval contains the true value
so when we’re predicting, 95% confidence gives a range the actual value should fall in, vs just a single absolute value
How do you test if the dependent variable values (their errors) are independent
DURBIN WATSON test - checks the residuals for autocorrelation
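The statistic itself is simple: the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 suggest no autocorrelation, values toward 0 positive autocorrelation, and values toward 4 negative autocorrelation. A sketch in plain Python with made-up residuals (this computes only the statistic, not a p value):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Made-up residuals
runs = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]         # neighbours alike: positive autocorrelation
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # neighbours opposite: negative autocorrelation

dw_runs = durbin_watson(runs)
dw_alternating = durbin_watson(alternating)
print(dw_runs, dw_alternating)
```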
What is global validation
a test that performs a variety of tests to see if the regression is valid
skew, kurtosis, equal variances, etc
How to test for outliers
outlierTest() gives a Bonferroni adjusted p value
What are hat statistics
hat values tell you if there’s a high leverage point (an unusual combination of predictor values) - can set a cutoff and plot 2 or 3 times the average hat value and see where things end up
the average hat value is p/n (p is the number of parameters in the model, including the intercept, and n is the sample size)
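For a straight-line model (p = 2) the hat value of each point has a closed form, h_i = 1/n + (x_i - mean x)^2 / Sxx, so the cutoff idea can be sketched directly. Plain Python, made-up x values, illustrative names:

```python
def hat_values(x):
    """Leverage (hat value) of each point in a straight-line regression on x."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]

# Made-up x values: the last one sits far from the rest
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
hat = hat_values(x)

p = 2                     # intercept and slope
average_hat = p / len(x)  # hat values always average p/n
high_leverage = [i for i, h in enumerate(hat) if h > 2 * average_hat]
print(hat, high_leverage)
```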
What is an AV plot
added variable plot - another way to test for influential points
What is an influence plot
shows outliers, LEVERAGE and influential observations in one plot - size is the influence, shows hat value on one axis
What are corrective measures if you ID problems with regression
delete observations, transform variables, add or delete variables, change approach
What is a nested model for
to look at multiple predictors and see which one does the best job of explaining
What is AIC
Akaike information criterion - again takes into account the model’s fit - a smaller value is preferred (can compare models with more or fewer predictors and show which is the better fit)
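A sketch of the comparison, assuming the common least-squares form AIC = n * ln(RSS / n) + 2k, which is correct only up to an additive constant (so the numbers will not match R's AIC(), but the ranking of models fitted to the same data will). Here k counts the fitted coefficients; data and names are made up.

```python
import math

def aic_least_squares(rss, n, k):
    """AIC up to an additive constant for a least-squares fit with k coefficients."""
    return n * math.log(rss / n) + 2 * k

# Made-up example data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Smaller model: intercept only (predict the mean of y)
rss_mean_only = sum((yi - mean_y) ** 2 for yi in y)

# Larger model: intercept plus slope
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x
rss_line = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

aic_mean_only = aic_least_squares(rss_mean_only, n, k=1)
aic_line = aic_least_squares(rss_line, n, k=2)
print(aic_mean_only, aic_line)  # the smaller value is the preferred model
```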