Stats - correlation and regression Flashcards

1
Q

What is correlation used for?

A

Correlation is used to test for association between variables (e.g. whether salary and IQ are related).

2
Q

How is regression used, and how does it relate to correlation?

A

Once a correlation between two variables has been shown, regression can be used to predict the value of the dependent variable from the value of the independent variable. Regression is not normally used unless the two variables have first been shown to correlate.

3
Q

What are the 3 basic categories of correlation?

A

Linear
Non-linear
No correlation

4
Q

Variables that are correlated through a linear relationship can display either positive or negative correlation.
What is the difference between these two?

A

Positively correlated variables vary directly (as one increases so does the other).

Negatively correlated variables vary as opposites (as the value of one variable increases the other decreases).

5
Q

How do you measure the strength of correlation?

A

The strength of the association can be estimated by inspecting a scatter graph of the variables. The type of correlation (linear or non-linear, positive or negative) is independent of its strength.

The strength can be described as strong, moderate or weak.

6
Q

How do you measure the strength of a linear relationship?

What symbols are given to the sample and the population correlation coefficients?

A

Correlation coefficient (Pearson’s correlation coefficient).

The sample correlation coefficient is given the symbol r.

The population correlation coefficient has the symbol ρ (rho).
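
As an illustration of how the sample coefficient r could be computed by hand, here is a minimal Python sketch using the standard deviation-from-the-mean formula. The function name `pearson_r` and the data are made up for the example and do not come from the cards.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (an assumption, not taken from the cards)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)
print(round(r, 3))
```

The result always lies between -1 and +1. A sample estimate like this is what the symbol r refers to, while ρ is the unknown population value it estimates.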

7
Q

The sign of the correlation coefficient tells us the direction of the linear relationship. How do positive and negative correlations appear?

A

If r is negative (< 0) the correlation is negative and the trend line slopes down. If r is positive (> 0) the correlation is positive and the trend line slopes up.

8
Q

The size (magnitude) of the correlation coefficient tells us the strength of a linear relationship.
What value does r have in a:
1) very strong linear association

A

r = 0.8-1

9
Q

The size (magnitude) of the correlation coefficient tells us the strength of a linear relationship.
What value does r have in a:
2) strong correlation

A

0.6-0.79

10
Q

The size (magnitude) of the correlation coefficient tells us the strength of a linear relationship.
What value does r have in a:
3) moderate correlation

A

0.4-0.59

11
Q

The size (magnitude) of the correlation coefficient tells us the strength of a linear relationship.
What value does r have in a:
4) weak correlation

A

0.2-0.39

12
Q

The size (magnitude) of the correlation coefficient tells us the strength of a linear relationship.
What value does r have in a:
5) very weak linear association

A

0-0.19
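
The size bands from cards 8 to 12 can be turned into a small lookup. This is only a sketch: the function name `describe_correlation` is hypothetical, and treating each band's lower limit as a cut-off (so that, say, 0.795 counts as strong) is an assumption, since the cards quote the bands to two decimal places only.

```python
def describe_correlation(r):
    """Label the direction and strength of r using the bands from cards 8-12."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)  # strength depends on the magnitude only, not the sign
    if size >= 0.8:
        strength = "very strong"
    elif size >= 0.6:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    elif size >= 0.2:
        strength = "weak"
    else:
        strength = "very weak"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, strength

print(describe_correlation(-0.65))  # direction comes from the sign, strength from |r|
```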

13
Q

Parametric statistical procedures rely on assumptions about the shape of the distribution.

What 3 characteristics do parametric procedures assume of the data?

A

1) normal distribution
2) measured on an interval/ratio scale
3) conditions or groups have equal variance

14
Q

How is a complete absence of correlation expressed?

A

r = 0

15
Q

How do we summarise correlation using:
a) parametric variables
b) non-parametric variables

What are the symbols for:
c) the samples
d) the population

A

a) Pearson’s
b) Spearman’s rank

c) parametric - r, non-parametric - rs
d) ρ (rho), for both
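
To show how Spearman's rank coefficient relates to Pearson's, the sketch below ranks each variable (tied values share the mean of their ranks) and then applies the Pearson formula to the ranks. The helper names and the data are assumptions made for the example.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(xs, ys):
    """Spearman's rank coefficient: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Made-up data: y rises with x, but along a curve rather than a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman_rs(x, y), pearson_r(x, y))
```

Because only the ordering matters, rs reaches 1 for any strictly increasing relationship, whereas r stays below 1 unless the points lie on a straight line.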

16
Q

What is linear regression?

A

In contrast to the correlation coefficient, linear regression may be used to predict how much one variable changes when a second variable is changed. A regression equation may be formed, y = a + bx, where

y = the variable being calculated (predicted value of response variable)
a = the intercept value (value of y when x = 0)
b = the slope of the line, or regression coefficient. Simply put, how much y changes for a one-unit change in x
x = the second variable
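
As a worked illustration of y = a + bx, the sketch below estimates a and b by ordinary least squares. The cards do not name a fitting method, so least squares is an assumption here, as are the function name `fit_line` and the data.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a slope when x is constant")
    b = sxy / sxx            # slope: change in y per one-unit change in x
    a = mean_y - b * mean_x  # intercept: predicted y when x = 0
    return a, b

# Made-up illustrative data (an assumption, not taken from the cards)
hours = [1, 2, 3, 4, 5]   # independent variable, x
score = [2, 4, 5, 4, 5]   # dependent variable, y
a, b = fit_line(hours, score)
predicted = a + b * 6     # predicted y for a new x value of 6
print(round(a, 2), round(b, 2), round(predicted, 2))
```

For this data the fitted slope is about 0.6 and the intercept about 2.2, so each extra unit of x adds roughly 0.6 to the predicted y.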

17
Q

What kind of graph is used in correlation and regression analysis?

What goes on the x and y axis?

A

Scatter graphs

They assist in determining, visually, whether variables are associated. They may also show the nature of a relationship, and can help to identify any outliers that may be affecting the distribution.

X-axis = independent variable

Y-axis = dependent variable

18
Q

What are the:
A) dependent variable
B) independent variable

A

Dependent variable: The variable being measured in an experiment, which depends on the changes in the independent variable (y-axis)

Independent variable: The variable that is manipulated or controlled by the experimenter to observe its effect (x-axis)

19
Q

What type of regression is used with dichotomous variables (i.e. binary outcomes like employed vs unemployed)?

A

Logistic regression is a statistical method for analysing a dataset in which there are one or more independent variables that determine an outcome. The outcome is measured with a dichotomous variable (in which there are only two possible outcomes). In other words, it predicts the probability of occurrence of an event by fitting data to a logistic function, hence the name logistic regression. Since its outcome is binary, it can be used to model the likelihood of a disease or health condition occurring.

Unlike linear regression, it does not assume a linear relationship between the independent variables and the outcome.
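
To make the "logistic function" concrete, the sketch below shows how a fitted model with one independent variable would turn a value of x into a probability. It does not fit anything: the intercept and slope are hypothetical numbers chosen for illustration, and the function names are assumptions.

```python
import math

def logistic(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x, intercept, slope):
    """Probability of the event for one value of a single independent variable."""
    return logistic(intercept + slope * x)

# Hypothetical coefficients, NOT estimated from real data
intercept, slope = -4.0, 2.0
for x in (0.0, 2.0, 4.0):
    print(x, round(predict_probability(x, intercept, slope), 3))
```

The output is always between 0 and 1, which is why this form suits a binary outcome, whereas a straight line y = a + bx could predict values outside that range.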
