chemometrics quiz 2 Flashcards

1
Q

Bivariate vs multivariate

A

Bivariate analysis looks at two data sets and tells how they are related; multivariate analysis explains the relationships among more than two

2
Q

Covariance vs correlation

A

Covariance - how two data sets change or vary together (in tandem)
Correlation - how strongly a change in one variable is associated with a change in the other, on a standardized scale; it does not show that one change causes the other

3
Q

How do you calculate covariance?

A

Take the mean of x and the mean of y
For each point, subtract the x mean from its x and the y mean from its y, multiply the two differences, and sum the products over all points
Then divide by N-1
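The steps above can be sketched in a few lines. This is a minimal Python illustration, not the quiz's R cov(); the sample numbers are made up.

```python
def covariance(x, y):
    """Sample covariance, following the card's steps (divide by N - 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Multiply each point's x-deviation by its y-deviation, then sum the products
    total = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return total / (n - 1)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(covariance(x, y))  # prints 1.5
```

Here the products of deviations are 4, 0, 0, 0 and 2, which sum to 6. Dividing by N - 1 = 4 gives 1.5.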

4
Q

Scales of covariance vs correlation

A

Covariance is affected by a change in scale; correlation isn't
Covariance keeps the units of the data; correlation is unitless
Both are 0 when the two variables are independent
Covariance ranges from negative infinity to infinity; correlation ranges from -1 to 1

5
Q

How to calculate correlation

A

Covariance divided by (stdev x * stdev y)
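This short Python sketch builds correlation from covariance as the card describes. It uses sample standard deviations (N - 1), so the N - 1 terms cancel. It also checks the scale claim from the previous card. The data are made up.

```python
import math


def covariance(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def stdev(v):
    # Sample standard deviation: the square root of a variable's covariance with itself
    return math.sqrt(covariance(v, v))


def correlation(x, y):
    """Pearson r = cov(x, y) / (stdev(x) * stdev(y))."""
    return covariance(x, y) / (stdev(x) * stdev(y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
x_scaled = [10 * v for v in x]  # e.g. the same values expressed in different units
print(round(correlation(x, y), 4))  # prints 0.7746
```

Multiplying x by 10 multiplies the covariance by 10 (1.5 becomes 15) but leaves r unchanged.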

6
Q

how to program these

A

In R: cor() and cov()

7
Q

What's a corrgram?

A

A plotted correlation matrix: the same variables run along the x and y axes, and each cell shows how that pair correlates, drawn as a picture, shading, or color. The top and bottom halves typically mirror each other and show how strong the correlation is and in what direction
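A corrgram is a picture of a correlation matrix. This Python sketch computes that matrix and prints a signed text version instead of shaded cells. The variable names and values are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(data):
    """data maps variable name -> list of values; returns names and the r matrix."""
    names = list(data)
    matrix = [[pearson(data[a], data[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical measurements, made up for illustration
data = {
    "temp": [20.0, 22.0, 25.0, 27.0, 30.0],
    "signal": [1.1, 1.3, 1.6, 1.8, 2.1],
    "noise": [0.5, 0.2, 0.4, 0.1, 0.3],
}
names, m = correlation_matrix(data)
# Text stand-in for a corrgram: the sign gives the direction, the magnitude the strength
for name, row in zip(names, m):
    print(f"{name:>7}", " ".join(f"{r:+.2f}" for r in row))
```

The diagonal is always 1, and the matrix is symmetric. That symmetry is why the top and bottom halves of a corrgram carry the same information.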

8
Q

How do p-values work with correlation?

A

A p-value < 0.05 means the correlation coefficient is significantly different from 0
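For a Pearson correlation, the p-value usually comes from the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. R's cor.test() uses this test. Python's standard library has no t distribution, so this sketch only computes t. You then compare it against a t-table.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: true correlation = 0, with n - 2 degrees of freedom."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    if n < 3:
        raise ValueError("need at least 3 observations")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


t = correlation_t(0.5, 10)
print(round(t, 3))  # prints 1.633
```

With 8 degrees of freedom, the two-sided 5% critical value is about 2.306. So r = 0.5 from 10 points gives p > 0.05 and is not significantly different from 0.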

9
Q

Whats a scatter plot matrix?

A

Same idea as a corrgram, but each cell contains an actual scatter plot of that pair of variables

10
Q

How do you plot scatter plot matrix

A

The pairs() function in R

11
Q

Partial Correlation

A

The correlation between two quantitative variables while controlling for (holding constant) one or more other quantitative variables
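With a single control variable z, the partial correlation has a closed form built from three ordinary correlations. This Python sketch computes that form. As a cross-check, it also correlates the residuals left after removing a straight-line fit on z from x and from y, which gives the same value. The data are made up. Controlling for several variables needs a more general method.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for one variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def residuals_on(v, z):
    """Residuals of v after removing a least-squares straight line on z."""
    n = len(v)
    mz, mv = sum(z) / n, sum(v) / n
    slope = sum((a - mz) * (b - mv) for a, b in zip(z, v)) / sum((a - mz) ** 2 for a in z)
    return [b - (mv + slope * (a - mz)) for a, b in zip(z, v)]


z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
y = [1.0, 2.2, 2.9, 4.1, 5.2, 5.8]
pc = partial_correlation(x, y, z)
pc_check = pearson(residuals_on(x, z), residuals_on(y, z))
print(round(pearson(x, y), 3), round(pc, 3), round(pc_check, 3))
```

Here x and y are strongly correlated mainly because both rise with z. The partial correlation shows how much relationship remains once z is held constant.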

12
Q

What is regression used for?

A

1) Identifying explanatory variables that are related to an outcome/response variable
2) Describing the form of the relationship between the dependent and independent variables (the general relationship)
3) Providing an equation to predict the response variable from the explanatory variable (e.g. a calibration curve)

13
Q

What is ordinary least squares (OLS) regression?

A

A quantitative dependent variable is predicted from a weighted sum of predictor variables, where the weights are parameters estimated from the data

14
Q

goal of regression

A

Choose the model parameters (the slope and intercept) that minimize the difference between the actual values and the values the model predicts
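For a straight line y = a + b*x, the least-squares parameters have a closed form: b = Sxy / Sxx and a = mean(y) - b * mean(x). This Python sketch uses made-up calibration-style data. It also checks that nudging either parameter makes the sum of squared residuals larger.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def sum_sq_residuals(x, y, a, b):
    """Sum of squared differences between observed and predicted y."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical calibration data: concentration vs instrument signal
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 2.0, 4.1, 5.9, 8.0]
a, b = fit_line(x, y)
print(round(a, 4), round(b, 4))  # prints 0.08 1.97
```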

15
Q

What is a residual

A

the difference between the observed and fitted value

16
Q

What is true about the sum of the residuals?

A

In a linear regression model with an intercept, fit by least squares, the residuals sum to 0
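A quick numerical check in Python, using made-up data. After a least-squares line with an intercept is fit, the residuals add up to zero, apart from floating-point rounding.

```python
def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def residuals(x, y):
    """Observed minus fitted value for each point."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 2.0, 4.1, 5.9, 8.0]
res = residuals(x, y)
print(sum(res))  # essentially 0; tiny nonzero values are rounding error
```

The property comes from the intercept: a line forced through the origin generally leaves residuals that do not sum to zero.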

17
Q

What determines the best fit?

A

Minimizing the sum of the squared differences between the observed and fitted values (the squared residuals)

18
Q

ASSUMPTIONS FOR LINEAR REGRESSION

A

Error in x should be negligible
Errors in y must be normally distributed (the dependent variable needs to be normally distributed around the regression line)
The variance of the error in y should be constant across the range of interest (constant standard deviation)
x and y should be continuous

19
Q

What's a high-leverage point, and how do you deal with it?

A

A point with an x value far from the others, which gives it more influence on the fitted line and on r^2
Deal with it by using evenly spaced x values

20
Q

How to calculate regression

A

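Determine the residuals (y - y fitted), then square them and sum them
Use that sum to calculate the residual standard deviation: the deviation of the data points from the regression line
From the residual standard deviation, calculate the standard deviation of the slope and the standard deviation of the y-intercept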
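This Python sketch works through the steps above with the usual straight-line formulas:

- s_r = sqrt(SS_res / (n - 2))
- s_slope = s_r / sqrt(Sxx)
- s_intercept = s_r * sqrt(sum(x^2) / (n * Sxx))

The data are made up. The dictionary keys are illustrative names, not any library's output format.

```python
import math


def regression_stats(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    # Residuals (y - y fitted), squared and summed
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_r = math.sqrt(ss_res / (n - 2))  # residual standard deviation
    s_slope = s_r / math.sqrt(sxx)  # standard deviation of the slope
    s_intercept = s_r * math.sqrt(sum(xi * xi for xi in x) / (n * sxx))  # of the intercept
    return {"slope": b, "intercept": a, "s_r": s_r,
            "s_slope": s_slope, "s_intercept": s_intercept}


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 2.0, 4.1, 5.9, 8.0]
stats = regression_stats(x, y)
for name, value in stats.items():
    print(f"{name}: {value:.5f}")
```

The divisor n - 2 reflects the two parameters (slope and intercept) estimated from the same data.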

21
Q

How to inspect residuals

A

plot them - should be scattered around zero with no pattern

22
Q

How to tell how influential a data point is

A

COOK'S DISTANCE
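For a straight-line fit, Cook's distance combines each point's residual e_i with its leverage h_i:

D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2

Here p = 2 (intercept and slope) and s^2 is the residual variance. This Python sketch uses made-up data with one suspicious last point. It flags points using one common rule of thumb, D > 4/(n - k - 1), where k is the number of predictors. The cutoff is a convention, not a hard rule.

```python
def cooks_distance(x, y):
    """Cook's distance for each point of a straight-line fit y = a + b*x."""
    n, p = len(x), 2  # p = number of parameters (intercept + slope)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in res) / (n - p)  # residual variance
    hat = [1 / n + (xi - mx) ** 2 / sxx for xi in x]  # leverage of each point
    return [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(res, hat)]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.1, 8.0, 9.9, 12.1, 14.0, 25.0]  # last point looks off
d = cooks_distance(x, y)
cutoff = 4 / (len(x) - 1 - 1)  # k = 1 predictor
flagged = [i for i, di in enumerate(d) if di > cutoff]
print([round(di, 3) for di in d], flagged)
```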

23
Q

How do the t-test and p-value relate to regression?

A

If the p-value (for the slope) is less than 0.05, the slope is significantly different from 0: THERE IS A RELATIONSHIP

24
Q

R^2

A

Shows how well the points fit the regression line: the fraction of the variation in y explained by the model (0 to 1)
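R^2 compares the leftover scatter about the fitted line with the total scatter about the mean: R^2 = 1 - SS_res / SS_tot. This Python sketch uses made-up data.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares straight line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    ss_tot = sum((yi - my) ** 2 for yi in y)  # total variation in y
    return 1 - ss_res / ss_tot


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 2.0, 4.1, 5.9, 8.0]
print(round(r_squared(x, y), 5))
```

For a simple straight-line fit, R^2 equals the square of the correlation coefficient r.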

25
Q

How to test your prediction analysis

A

Build the regression with a random 80% of the data, saving the other 20% to test it afterwards
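A random 80/20 split only takes a shuffle and a cut. In this Python sketch, `random_split` is a hypothetical helper, not a library function, and the data are made up. The seed is fixed so the split can be reproduced.

```python
import random


def random_split(rows, train_fraction=0.8, seed=None):
    """Shuffle rows, then cut them into a training part and a held-out test part."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical (x, y) pairs standing in for a real data set
data = [(x, 2.0 * x + 1.0) for x in range(20)]
train, test = random_split(data, seed=42)
print(len(train), len(test))  # prints 16 4
```

Fit the regression on `train` only. Then predict y for the points in `test` and compare the predictions with the actual values.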

26
Q

What are min-max accuracy and MAPE?

A

Min-max accuracy and MEAN ABSOLUTE PERCENT ERROR
Min-max accuracy tells you how far the predictions are from a perfect model (1 is perfect)
MAPE measures the same thing as an average percent error: a MAPE of 10% means forecasts are off by 10% on average, and accuracy is about 100% - MAPE (e.g. a MAPE of 0.49 means it's about 51% accurate)
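This Python sketch computes both scores on a held-out test set. It assumes the usual definitions:

- Min-max accuracy is the mean of min(actual, predicted) / max(actual, predicted).
- MAPE is the mean of |predicted - actual| / |actual|.

Both assume positive actual values. The numbers are made up.

```python
def min_max_accuracy(actual, predicted):
    """Mean of min(a, p) / max(a, p); 1.0 means every prediction is exact."""
    ratios = [min(a, p) / max(a, p) for a, p in zip(actual, predicted)]
    return sum(ratios) / len(ratios)


def mape(actual, predicted):
    """Mean absolute percent error as a fraction (0.10 = off by 10% on average)."""
    errors = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


actual = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 18.0, 33.0, 40.0]
mm = min_max_accuracy(actual, predicted)
mp = mape(actual, predicted)
print(round(mm, 4), round(mp, 4), "accuracy ~", round(1 - mp, 4))
```

Three predictions are off by 10% and one is exact, so MAPE is 0.075. The rough accuracy is therefore 92.5%.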

27
Q

POLYNOMIAL REGRESSION (2nd order): how does it change?

A

Instead of y = a + bx, the model becomes
y = a + bx + cx^2
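The quadratic model is still linear in a, b and c, so least squares applies unchanged. This Python sketch solves the 3x3 normal equations by Gaussian elimination and is meant for illustration only. Statistical software such as R's lm() uses a QR decomposition instead, which is numerically more robust. The test data lie exactly on a made-up curve.

```python
def fit_quadratic(x, y):
    """Least-squares (a, b, c) for y = a + b*x + c*x**2 via the normal equations."""
    # Power sums of x and of x**k * y that appear in the normal equations
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(xi ** k * yi for xi, yi in zip(x, y)) for k in range(3)]
    # Augmented matrix [X^T X | X^T y]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Forward elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    # Back substitution
    coef = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        known = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - known) / m[i][i]
    return tuple(coef)


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1 + 2 * xi + 0.5 * xi ** 2 for xi in x]  # exact curve: a=1, b=2, c=0.5
a, b, c = fit_quadratic(x, y)
print(round(a, 6), round(b, 6), round(c, 6))
```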

28
Q

What is multiple linear regression used for

A

When you have more than one predictor variable (polynomial regression is a special case: a cubic model has 3 predictor terms, x, x^2, and x^3)

29
Q

how to interpret regression slope

A

The slope indicates how much one variable changes relative to the other: the increase in the dependent variable for a one-unit increase in the independent variable

30
Q

Confidence interval for regression

A

A 95% confidence interval means we are 95% confident that the interval contains the true value
So for a prediction, instead of giving one absolute value, we give a range that we are 95% confident contains the actual value

31
Q

How do you test if the dependent variable values are independent?

A

DURBIN-WATSON test
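The Durbin-Watson statistic divides the sum of squared differences between successive residuals by the sum of squared residuals. Values near 2 suggest no autocorrelation. Values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. This Python sketch assumes the residuals are listed in collection or time order. It computes only the statistic, not a p-value.

```python
def durbin_watson(residuals):
    """DW = sum of (e_t - e_(t-1))^2 divided by the sum of e_t^2."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


alternating = [1.0, -1.0, 1.0, -1.0]  # flips sign every time: negative autocorrelation
drifting = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]  # runs of the same sign: positive
print(durbin_watson(alternating), round(durbin_watson(drifting), 3))
```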

32
Q

What is global validation

A

A single test that performs a variety of checks to see whether the regression assumptions are valid: skewness, kurtosis, equal variances, etc.

33
Q

How to test for outliers

A

outlierTest() (car package in R) gives a Bonferroni-adjusted p-value

34
Q

What is the hat statistic?

A

The hat statistic tells you whether a point has high leverage (an unusual combination of predictor values)
The average hat value is p/n, where p is the number of parameters in the model including the intercept and n is the sample size
Plot the hat values with reference lines at 2 or 3 times this average and see which points end up above them
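For a straight-line model, the hat value of each point is h_i = 1/n + (x_i - mean(x))^2 / Sxx. These values sum to p, so their average is p/n. This Python sketch flags points above a multiple of that average. The x values are made up, with one point placed far from the rest.

```python
def hat_values(x):
    """Leverage h_i = 1/n + (x_i - mean)^2 / Sxx for a straight-line model."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return [1 / n + (xi - mx) ** 2 / sxx for xi in x]


def high_leverage(x, multiple=2.0, p=2):
    """Indices whose hat value exceeds multiple * p/n (p counts the intercept)."""
    cutoff = multiple * p / len(x)
    return [i for i, h in enumerate(hat_values(x)) if h > cutoff]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
print([round(h, 3) for h in hat_values(x)], high_leverage(x))
```

Leverage depends only on the x values. A high-leverage point is only influential if its y value also pulls the line away from the other points.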

35
Q

What is an AV plot

A

Added-variable plot: another way to check for influential points

36
Q

What is an influence plot

A

Shows outliers, LEVERAGE, and influential observations in one plot: hat values (leverage) on one axis, studentized residuals (outliers) on the other, and circle size showing influence

37
Q

What are corrective measures if you identify problems with a regression?

A

delete observations, transform variables, add or delete variables, change approach

38
Q

What is a nested model for

A

To look at models with more or fewer predictors and see which does the best job of explaining the response

39
Q

What is AIC

A

Akaike Information Criterion: takes into account the model's fit while penalizing the number of parameters; a smaller value is preferred. It can compare models with more or fewer predictors and show which is the better fit
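For least-squares models with normally distributed errors, AIC can be written as n * ln(RSS / n) + 2k, up to an additive constant. Here RSS is the residual sum of squares and k is the number of estimated coefficients. R's AIC() includes extra constant terms, so its absolute values differ. Differences between models fit to the same data are unchanged. This Python sketch compares two nested models on made-up data: a mean-only model and a straight line.

```python
import math


def line_rss(x, y):
    """Residual sum of squares for a least-squares straight line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def aic_least_squares(rss, n, k):
    """Gaussian-error AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 2.0, 4.1, 5.9, 8.0]
n = len(x)
my = sum(y) / n
rss_mean_only = sum((yi - my) ** 2 for yi in y)  # model y = a (k = 1)
rss_line = line_rss(x, y)  # model y = a + b*x (k = 2)
aic_mean = aic_least_squares(rss_mean_only, n, 1)
aic_line = aic_least_squares(rss_line, n, 2)
print(round(aic_mean, 2), round(aic_line, 2))  # the smaller value is preferred
```

The straight line costs one extra parameter but reduces the RSS enormously, so it gets the smaller AIC.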