Regression & Correlation Flashcards

1
Q

What does correlation refer to?

A

Degree to which two quantitative variables are related

2
Q

What is commonly used to measure correlation in quantitative parametric data?

A

Pearson's correlation coefficient (r)

3
Q

Between which values does the correlation coefficient ‘r’ vary?

A

-1 to +1
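The definition of Pearson's r can be sketched in Python. This is a minimal illustration written for these notes (the function name `pearson_r` is our own, not a library API); it shows why r has no units and why it stays between -1 and +1.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for paired samples x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and have at least 2 values")
    mx = sum(x) / n
    my = sum(y) / n
    # Sum of cross-products of deviations (proportional to the covariance)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    # Dividing by the spreads of x and y cancels the units;
    # by the Cauchy-Schwarz inequality the result lies in [-1, 1]
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfect positive straight line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # perfect negative straight line
```

Rescaling x (e.g. changing its units) leaves r unchanged, which is why r is unit-free.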

4
Q

Units of ‘r’?

A

None (r is dimensionless)

5
Q

When is the correlation coefficient not valid?

A

If the observations are not independent of each other (e.g. repeated or otherwise paired measurements from the same individuals)

6
Q

When can Fisher's transformation be used?

A

To compare two correlation coefficients for hypothesis testing
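A minimal sketch of the comparison, assuming the two coefficients come from independent samples: each r is transformed with z = atanh(r), whose standard error is approximately 1/sqrt(n - 3), and the difference is referred to the standard normal distribution. The function names and the sample values (r = 0.60 with n = 50, r = 0.30 with n = 60) are hypothetical.

```python
import math

def fisher_z(r):
    # Fisher's transformation: approximately normal with SE 1/sqrt(n - 3)
    return math.atanh(r)

def compare_correlations(r1, n1, r2, n2):
    """Two-sided z-test of H0: rho1 == rho2 for correlations from two independent samples."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return z, p

z, p = compare_correlations(0.60, 50, 0.30, 60)
print(f"z = {z:.2f}, p = {p:.3f}")
```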

7
Q

What are partial correlations?

A

Correlations between two variables after adjusting for a third variable
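For a single adjusting variable z, the partial correlation of x and y can be computed from the three pairwise correlations using the standard first-order formula. The sketch below is illustrative; the correlation values are hypothetical and chosen so that the x-y correlation is entirely explained by z.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, adjusting for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical example: x and y correlate only because both correlate with z
print(partial_corr(r_xy=0.48, r_xz=0.6, r_yz=0.8))
```

If z is uncorrelated with both x and y, the partial correlation equals the ordinary correlation r_xy.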

8
Q

What is Spearman's correlation (rho)?

A

The non-parametric equivalent of Pearson's correlation coefficient; it is calculated on the ranks of the data
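Spearman's rho is Pearson's r applied to the ranks of each variable, with tied values given the average of their ranks. The sketch below is a self-contained illustration (function names are our own); the example data are hypothetical and show that a monotonic but non-linear relationship still gives rho = 1.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# A monotonic but non-linear relationship: y = x cubed
x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [v ** 3 for v in x]))
```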

9
Q

What can Spearman's be used for?

A

To test the association between two variables if:
at least one variable is ordinal, or
the sample size is small, even if both variables are continuous, or
a non-linear (but monotonic) relationship is suspected, or
both variables are non-normally distributed

10
Q

What does Spearman's assume?

A

That the difference between each successive pair of ordinal values is the same, i.e. the ranks are equidistant

11
Q

If the differences between successive ordinal values are not the same, how can correlation be calculated?

A

Kendall's tau, an appropriate measure of non-parametric correlation
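Kendall's tau uses only the ordering of each pair of observations (concordant or discordant), not the distances between values. The sketch below computes the simplest variant, tau-a, and assumes there are no tied values (the tau-b variant adjusts for ties). The function name and the ratings are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Assumes no tied values; with ties, the tau-b variant adjusts the denominator.
    """
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1   # pair ordered the same way on both variables
        elif s < 0:
            discordant += 1   # pair ordered in opposite ways
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical ranks given to 5 patients by two raters
print(kendall_tau_a([1, 2, 3, 4, 5], [2, 1, 3, 5, 4]))
```

Here 8 of the 10 pairs are concordant and 2 are discordant, so tau = (8 - 2) / 10 = 0.6.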

12
Q

What does regression help with?

A

Predicting the value of one variable given a particular value of the other variable

13
Q

Explain the formula for simple linear regression

A

y = a + bx

b = regression coefficient (slope)
a = intercept on the y axis

14
Q

What can simple linear regression predict?

A

The probable score on the Y axis from a known score on the X axis, i.e. the dependent variable can be predicted from the value of the independent variable

15
Q

How does one determine the values of a and b for regression?

A

Using a scattergram and the method of least squares

16
Q

Explain how a scattergram and the method of least squares are used

A

A hypothetical straight line is drawn through the points of observation on the scattergram. The vertical distance of each point from the line is called the residual.
The line of best fit is the one for which the sum of the squared residuals is smallest
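Minimising the sum of squared residuals gives closed-form values for the slope and intercept: b = Sxy / Sxx and a = mean(y) - b * mean(x), so the line passes through the point (mean x, mean y). The sketch below uses hypothetical data and illustrative function names.

```python
def least_squares_line(x, y):
    """Return (a, b) for y = a + b*x minimising the sum of squared residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx          # slope (regression coefficient)
    a = my - b * mx        # intercept: the line passes through (mean x, mean y)
    return a, b

def sum_sq_residuals(x, y, a, b):
    # Residual = observed y minus the value predicted by the line
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical data
a, b = least_squares_line(x, y)
print(a, b, sum_sq_residuals(x, y, a, b))
```

Shifting either a or b away from these values increases the sum of squared residuals, which is what "line of best fit" means here.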

17
Q

What happens in multiple linear regression?

A

Several independent variables together predict a single dependent variable

18
Q

What type of technique is multiple regression?

A

Multivariable (often described as multivariate)

19
Q

What are the independent variables in multiple regression called?

A

Covariates

20
Q

What is it called when covariates are highly correlated with each other?

A

Collinearity (multicollinearity)

21
Q

What is the effect of collinearity?

A

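It can make the estimated regression coefficients unstable, with large standard errors, so the separate effects of the correlated covariates are hard to distinguish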
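The card does not name a measure, but one common way to quantify this effect is the variance inflation factor (VIF), brought in here as an addition to the card. With exactly two covariates correlated at r12, the variance of each regression coefficient is multiplied by 1 / (1 - r12²) compared with uncorrelated covariates. The function name below is hypothetical.

```python
def vif_two_covariates(r12):
    """Variance inflation factor for either covariate in a model with two covariates.

    r12 is the correlation between the two covariates; the variance of each
    regression coefficient is multiplied by this factor relative to r12 = 0.
    """
    if not -1 < r12 < 1:
        raise ValueError("perfect collinearity: coefficients cannot be estimated")
    return 1 / (1 - r12 ** 2)

for r in (0.0, 0.5, 0.9, 0.99):
    print(r, round(vif_two_covariates(r), 1))
```

The factor grows sharply as the covariates approach perfect correlation, which is why strongly collinear covariates give unreliable coefficients.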

22
Q

When is the regression coefficient useful?

A

When examining confounders

23
Q

What can be used to express correlation coefficients?

A

Confidence intervals (CI)
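One common way to obtain a confidence interval for Pearson's r is through Fisher's transformation: transform r to z = atanh(r), build a normal interval with standard error 1/sqrt(n - 3), then back-transform the limits with tanh. This is a minimal sketch; the function name and the example values (r = 0.45, n = 40) are hypothetical.

```python
import math
from statistics import NormalDist

def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for Pearson's r via Fisher's z-transformation."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    # Back-transform the limits so the interval stays within (-1, 1)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

low, high = correlation_ci(0.45, 40)   # hypothetical r and sample size
print(f"95% CI: {low:.2f} to {high:.2f}")
```

The interval is not symmetric around r, and it narrows as the sample size grows.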

24
Q

What does the square of the correlation coefficient (R²) measure?

A

Goodness of fit of the final regression

25
Q

What is goodness of fit?

A

The proportion of the total variation in the dependent variable that is explained by the independent variable(s)
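As a proportion of explained variation, R² can be computed as 1 - (residual sum of squares / total sum of squares). The sketch below uses hypothetical data with the least-squares line y = 0.05 + 1.99x for those data; for simple linear regression with an intercept, this R² equals the square of Pearson's r.

```python
def r_squared(y, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    my = sum(y) / len(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)                     # total variation in y
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))    # variation left unexplained
    return 1 - ss_res / ss_tot

# Hypothetical data and their least-squares line y = 0.05 + 1.99x
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
pred = [0.05 + 1.99 * xi for xi in x]
print(round(r_squared(y, pred), 4))
```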

26
Q

What does goodness of fit measure?

A

How well the actual outcome (dependent variable) and the predicted values of the dependent variable correspond

27
Q

Range of values of goodness of fit?

A

0 to 1

28
Q

What is the coefficient of determination?

A

r², the goodness of fit calculated from Pearson's correlation coefficient

29
Q

What is needed for linear regression?

A

Continuous dependent variable

30
Q

What type of regression is used if dependent variable is binary?

A

Logistic regression

31
Q

Why is logistic regression popular?

A

It gives an odds ratio (OR) for each independent variable that affects the dependent variable, adjusted for the other independent variables
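In logistic regression the coefficient b of an independent variable is a log odds ratio, so exp(b) is the odds ratio. The sketch below fits logit(P(y = 1)) = b0 + b1*x by Newton-Raphson maximum likelihood on hypothetical data with one binary exposure, and checks that exp(b1) reproduces the odds ratio from the 2x2 table. The function name and data are illustrative, not a library API.

```python
import math

def fit_simple_logistic(x, y, iterations=25):
    """Fit logit(P(y=1)) = b0 + b1*x by Newton-Raphson maximum likelihood."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system [[h00, h01], [h01, h11]] * step = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: 20/50 exposed and 10/50 unexposed patients have the outcome
x = [1] * 50 + [0] * 50
y = [1] * 20 + [0] * 30 + [1] * 10 + [0] * 40
b0, b1 = fit_simple_logistic(x, y)
print(f"OR from model = {math.exp(b1):.3f}")
print(f"OR from 2x2 table = {(20 / 30) / (10 / 40):.3f}")
```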

32
Q

Regression test used if continuous data, 1 independent variable and 1 dependent variable

A

Simple linear regression

33
Q

Regression test used if continuous data with >1 independent variable and 1 dependent variable

A

Multiple linear regression

34
Q

Regression test used if binary data, 1 independent variable and 1 dependent variable

A

Simple logistic regression

35
Q

Test used if binary data, >1 independent variable and 1 dependent variable

A

Multiple logistic regression

36
Q

What is log-linear analysis used for?

A

Analysing the relationships among several categorical variables (multi-way contingency tables)

37
Q

What must be noted in log-linear analysis?

A

There is no distinction between dependent and independent variables

38
Q

What data can be used in logistic regression?

A

Continuous and categorical independent variables

39
Q

What are Bernoulli random variables?

A

Variables with dichotomous outcomes (e.g. 0 or 1), such as the dependent variable in logistic regression

40
Q

What is exponential correlation?

A

When an exponential relationship between a variable and a factor such as time is demonstrated by plotting the log-transformed values against time, which gives a straight line
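If y = A * exp(k*t), then ln(y) = ln(A) + k*t, which is a straight line in t. A minimal sketch, assuming all y values are positive: fit an ordinary least-squares line to (t, ln y) and transform back. Note that this minimises squared errors on the log scale, not on the original scale. The function name and the halving data are hypothetical.

```python
import math

def fit_exponential(t, y):
    """Fit y = A * exp(k * t) by least squares on ln(y) versus t (y must be > 0)."""
    ly = [math.log(v) for v in y]
    n = len(t)
    mt, mly = sum(t) / n, sum(ly) / n
    # Slope of the straight line through (t, ln y) is the growth/decay rate k
    k = sum((ti - mt) * (li - mly) for ti, li in zip(t, ly)) / sum((ti - mt) ** 2 for ti in t)
    ln_a = mly - k * mt
    return math.exp(ln_a), k

# Hypothetical values that halve every 2 hours (exponential decay)
t = [0, 2, 4, 6, 8]
y = [100, 50, 25, 12.5, 6.25]
A, k = fit_exponential(t, y)
print(A, k)   # A is close to 100 and k is close to ln(0.5) / 2
```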

41
Q

What is polynomial regression?

A

When the relationship is non-linear, the dependent variable (y) is expressed as a polynomial in the independent variable (x), e.g. y = a + b1*x + b2*x^2 + ... + bn*x^n
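Although the curve is non-linear in x, a polynomial model is still linear in its coefficients, so the same least-squares idea applies. The sketch below solves the normal equations with Gaussian elimination in plain Python; it is illustrative (a statistics package would normally be used), and the exactly quadratic data are hypothetical.

```python
def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit y = b0 + b1*x + ... + bd*x**d via the normal equations."""
    m = degree + 1
    # Normal equations (X^T X) b = X^T y, where column j of X holds x**j
    A = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    v = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    # Back substitution
    b = [0.0] * m
    for r in range(m - 1, -1, -1):
        b[r] = (v[r] - sum(A[r][c] * b[c] for c in range(r + 1, m))) / A[r][r]
    return b

x = [-2, -1, 0, 1, 2, 3]
y = [1 + 2 * xi - 0.5 * xi ** 2 for xi in x]   # hypothetical, exactly quadratic data
print(fit_polynomial(x, y, 2))   # coefficients close to [1, 2, -0.5]
```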

42
Q

What is the 1 in 10 rule?

A

The number of variables studied in a multiple regression model must not be greater than 10% of the sample size (i.e. at least 10 subjects per variable).

43
Q

1 in 10 rule for logistic regression?

A

The number of variables must not be greater than 10% of the number of outcome events

44
Q

How can multiple regression be performed?

A

Stepwise regression
Forward selection
Backward elimination

45
Q

What happens in stepwise regression?

A

Regression coefficients are calculated, and the independent variables are entered into the regression equation one at a time, from the most significant to the least significant.

46
Q

Disadvantage of stepwise regression

A

Sometimes statistically significant variables may not be clinically significant

47
Q

Theory behind forward selection

A

A confounding factor is associated with both the independent and the dependent variable.
If the confounding variable is not known, the candidate variables are all treated as covariates

48
Q

What does one often examine in multiple regression?

A

Which variable is the confounder

49
Q

What happens in forward selection?

A

Covariates are added to the multiple regression equation one at a time. If adding a covariate changes the regression coefficient of a previously added variable, one of the two is a confounder, and both are retained in the equation irrespective of statistical significance.
If no change occurs in the regression coefficient, the newly added covariate is discarded
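The check used in forward selection (does the coefficient of x change when another covariate is added?) can be illustrated with two covariates. In the hypothetical data below, y depends only on z and x is correlated with z; x has a sizeable coefficient on its own, but its coefficient drops to zero once z is in the equation, marking z as a confounder to retain. The closed-form two-covariate least-squares formulas are standard; the function names are our own.

```python
def centred_sum(u, v):
    # Sum of cross-products of deviations from the means
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def slope_simple(x, y):
    """Regression coefficient of y on x alone."""
    return centred_sum(x, y) / centred_sum(x, x)

def slopes_adjusted(x, z, y):
    """Regression coefficients of x and z when both are in the equation."""
    sxx, szz, sxz = centred_sum(x, x), centred_sum(z, z), centred_sum(x, z)
    sxy, szy = centred_sum(x, y), centred_sum(z, y)
    det = sxx * szz - sxz ** 2
    return (sxy * szz - szy * sxz) / det, (szy * sxx - sxy * sxz) / det

# Hypothetical data: y depends only on z, and x is correlated with z
z = [0, 1, 2, 3, 4, 5]
x = [1, 0, 3, 2, 5, 4]
y = [1 + 2 * zi for zi in z]
print("coefficient of x alone:", slope_simple(x, y))
print("coefficient of x adjusted for z:", slopes_adjusted(x, z, y)[0])
```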

50
Q

What is backward elimination?

A

Starts with the full model (the complete equation with all covariates) and tries to discard covariates one by one, according to the changes that occur in the regression coefficients

51
Q

In the equation y=a+bx+e what is y?

A

Dependent variable - outcome of interest

52
Q

In the equation y=a+bx+e what are a and b?

A

Constants (a is the intercept, b is the slope)

53
Q

In the equation y=a+bx+e what is b?

A

Slope or regression coefficient

54
Q

In the equation y=a+bx+e what is x?

A

Independent variable - predictor of outcome

55
Q

In the equation y=a+bx+e what is e?

A

The error term: a random variable with mean = 0

56
Q

In the equation y=a+bx+e what does e represent?

A

The part of the variability of y that is not explained by its relationship with x

57
Q

What can method of least squares be used for with respect to e (error)?

A

We can find the best-fitting linear regression equation, i.e. the one with the minimum variance of e