Chapter 9 simple linear regression/correlation slide 13 onwards Flashcards

1
Q

3 assumptions when doing Pearson correlation analysis

A
  • X and Y are quantitative data
  • X and Y are random variables measured on a simple random sample
  • Pairs of X and Y follow the bivariate normal distribution
    • each variable on its own is normally distributed
    • note: r is sensitive to outliers
2
Q

Correlation coefficient for a strong positive linear relationship

A

Close to +1

3
Q

Correlation coefficient for a strong negative linear relationship

A

Close to -1

4
Q

Correlation coefficient for no linear relationship

A

Close to zero
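These three cards can be checked numerically. A minimal Python sketch, assuming small made-up datasets (not from the slides); `pearson_r` is a hypothetical helper that computes r directly from its definition:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r, computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # 1.0: perfect positive linear relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))   # -1.0: perfect negative linear relationship
print(pearson_r(x, [2, 4, 1, 4, 2]))    # about 0: no linear relationship
```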

5
Q

Hypothesis testing for correlation

A

ρ (rho) is the population correlation coefficient
Null hypothesis H0: ρ = 0
Alternative hypothesis H1: ρ ≠ 0
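One common way to carry out this test (assumed here, since the card does not name a test statistic) uses t = r√(n - 2) / √(1 - r²), which has n - 2 degrees of freedom under H0. A minimal sketch with a hypothetical sample (r = 0.8, n = 10), assuming a two-sided significance level of 0.05, whose critical value 2.306 for 8 degrees of freedom is taken from a standard t table:

```python
import math

def t_for_correlation(r, n):
    """Test statistic for H0: rho = 0; it has n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.8, 10           # hypothetical sample: r = 0.8 from n = 10 pairs
T_CRITICAL = 2.306       # assumed: two-sided alpha = 0.05, df = 8, from a t table

t = t_for_correlation(r, n)
print(round(t, 3), "with df =", n - 2)                               # 3.771 with df = 8
print("reject H0" if abs(t) > T_CRITICAL else "do not reject H0")    # reject H0
```

The formula breaks down when r = ±1, because 1 - r² is then zero.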

6
Q

The coefficient of determination is often expressed as a

A

Proportion or percentage

7
Q

What does it mean when the coefficient of determination r^2 is 0.85

A

85% of the variation in Y is explained by its linear relationship with X (r^2 describes association, not causation)
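A minimal sketch, assuming made-up data, of what "explained" means here: r² equals 1 - (sum of squared errors of prediction) / (total sum of squares of Y) for the least-squares line. Variable names are illustrative only.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]                      # made-up data
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)      # total variation in Y

r = sxy / math.sqrt(sxx * syy)           # Pearson r
beta = sxy / sxx                         # least-squares slope
alpha = my - beta * mx                   # least-squares intercept
ss_res = sum((b - (beta * a + alpha)) ** 2 for a, b in zip(x, y))
explained = 1 - ss_res / syy             # share of the variation in Y explained by the line

print(f"r^2 = {r ** 2:.0%}, explained = {explained:.0%}")   # r^2 = 60%, explained = 60%
```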

8
Q

The ______ the correlation coefficient (in absolute value), the ______ the coefficient of determination. This implies that _________ of the variation in the dependent variable is explained by the independent variable

A

larger, larger, more

9
Q

interpretation when r^2 is zero

A

no correlation

10
Q

interpretation when r^2 is up to 49%

A

low correlation (need to consider the sign of r)

11
Q

interpretation when r^2 is 50-95%

A

high correlation (need to consider the sign of r)

12
Q

interpretation when r^2 is 96-99%

A

very high correlation (need to consider the sign of r)

13
Q

interpretation when r^2 is 100%

A

perfect correlation (need to consider the sign of r)
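The bands on these cards can be written as a small function. A minimal sketch, assuming the gaps in the deck's bands (between 49% and 50%, and between 95% and 96%) are closed by letting each band run up to the next cut-off; `describe_correlation` is a hypothetical name. Because r² has no sign, the direction comes from r itself.

```python
def describe_correlation(r):
    """Label r with this deck's r^2 bands; the sign of r gives the direction."""
    r2 = r ** 2
    if r2 == 0:
        return "no correlation"
    if r2 < 0.50:                 # assumed cut-off: "up to 49%"
        strength = "low correlation"
    elif r2 < 0.96:               # assumed cut-off: "50-95%"
        strength = "high correlation"
    elif r2 < 1:                  # "96-99%"
        strength = "very high correlation"
    else:
        strength = "perfect correlation"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} ({direction}), r^2 = {r2:.0%}"

for r in (0.0, 0.5, -0.9, 0.98, -1.0):
    print(r, "->", describe_correlation(r))
```

In practice r² is almost never exactly 0 or exactly 1, so the first and last branches mainly cover idealised cases.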

14
Q

2 methods for determining correlation for non-normally distributed data

A

logarithmic transformation and non-parametric correlation analysis
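A minimal sketch of the first method, assuming made-up, strongly right-skewed Y values that grow by a constant factor: Pearson's r is computed on the raw values and again after a log10 transformation of Y. The helper `pearson_r` is hypothetical, not a library function.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r, computed from its definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [10, 100, 1000, 10000, 100000]              # made-up, strongly right-skewed values
r_raw = pearson_r(x, y)                         # r on the raw scale
r_log = pearson_r(x, [math.log10(v) for v in y])  # r after log-transforming Y
print(round(r_raw, 2), round(r_log, 2))         # 0.76 1.0
```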

15
Q

what is non-parametric correlation analysis

A

rank the data from smallest to largest, then calculate the correlation coefficient from the ranks using Spearman’s or Kendall’s rank correlation
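A minimal sketch of Spearman's version, assuming made-up data: rank each variable (tied values share their average rank), then compute Pearson's r on the ranks. Kendall's tau is not shown. `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical helpers; in practice a statistics package would normally be used.

```python
import math

def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next sorted value equals the current one (a tie)
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1   # positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation coefficient r, computed from its definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 1000]        # made-up: always increasing, but far from a straight line
print(round(pearson_r(x, y), 2), spearman_rho(x, y))   # Pearson about 0.72, Spearman 1.0
```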

16
Q

5 properties of simple linear regression

A
  • the relationship between 2 variables is approximated by a best-fit straight line
  • one independent and one dependent variable
  • also called regression analysis
  • forms a simple equation to describe the relationship
  • intercept: alpha (α), slope: beta (β)
17
Q

General equation for simple linear regression

A

Y’ = βX + α

18
Q

What is Y-Y’

A

errors of prediction (residuals)

19
Q

What is Y’

A

predicted values
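A minimal sketch, assuming a hypothetical fitted line (α = 2.2, β = 0.6) and made-up observations, of how Y’ and Y - Y’ are computed for each X:

```python
alpha, beta = 2.2, 0.6        # hypothetical intercept and slope of a fitted line
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]           # made-up observed values

y_pred = [beta * xi + alpha for xi in x]              # Y': predicted values
errors = [yi - yp for yi, yp in zip(y, y_pred)]       # Y - Y': errors of prediction

print([round(v, 2) for v in y_pred])   # [2.8, 3.4, 4.0, 4.6, 5.2]
print([round(e, 2) for e in errors])   # [-0.8, 0.6, 1.0, -0.6, -0.2]
```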

20
Q

5 things needed to calculate the linear regression line

A
  • mean of X
  • mean of Y
  • standard deviation of X
  • standard deviation of Y
  • r (correlation between X and Y)
21
Q

Formula for slope

A

β = r × (standard deviation of Y) / (standard deviation of X)

22
Q

Formula for intercept

A

α = (mean of Y) - β × (mean of X)
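The five quantities and the slope and intercept formulas combine as follows. A minimal sketch, assuming made-up data and sample standard deviations (the n - 1 versions); `regression_from_summary` is a hypothetical name:

```python
import statistics

def regression_from_summary(xs, ys):
    """Slope and intercept from the five quantities: two means, two SDs, and r."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)    # sample SDs
    n = len(xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    r = cov_xy / (sd_x * sd_y)                                  # Pearson r
    beta = r * sd_y / sd_x                                      # slope
    alpha = mean_y - beta * mean_x                              # intercept
    return alpha, beta

alpha, beta = regression_from_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"Y' = {beta:.2f}X + {alpha:.2f}")   # Y' = 0.60X + 2.20
```

Using population instead of sample standard deviations gives the same slope, because the n - 1 factors cancel in the ratio of the two standard deviations.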

23
Q

3 potential factors that affect correlation and regression

A

Outliers, multiple observations from the same subject, and combining data collected from different populations

24
Q

2 effects of outliers on regression line

A
  • over-influence (pull) the regression line
  • increase the residual error, reducing the correlation

25
Q

An outlier at the high end of the distribution affects the correlation coefficient _________ than outliers that do not lie at the high end of the distribution

A

more
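A minimal sketch, assuming made-up data that lie exactly on a line (r = 1), comparing a single outlier (Y = 0) placed in the middle of the X range with the same outlier placed at the high end; `pearson_r` is a hypothetical helper:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r, computed from its definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]                          # made-up, perfectly linear: r = 1
r_middle = pearson_r(x + [3], y + [0])        # outlier in the middle of the X range
r_extreme = pearson_r(x + [10], y + [0])      # same outlier at the high end of X
print(round(r_middle, 3), round(r_extreme, 3))   # 0.756 -0.251
```

The outlier in the middle lowers r to about 0.76, while the same outlier at the high end of X pulls the fitted line so strongly that r becomes negative.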

26
Q

2 methods to identify outliers

A

Examine scatter plot or examine residual plot

27
Q

Explain more about the scatter plot and how it can be used to observe outliers

A
  • linear regression uses all the points to derive the best-fit line
  • the vertical distance from a point to the regression line is its error of prediction
  • small value: the point lies close to the line (small error of prediction)
  • large value: the point lies far from the line (large error of prediction); such points may be outliers
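A minimal sketch of using the errors of prediction to spot possible outliers, assuming made-up data with one planted outlier and a hypothetical rule of thumb (flag any point whose error is more than 2 standard deviations of the errors); the rule is an illustration, not a method from the slides:

```python
import statistics

x = list(range(1, 11))
y = [2 * v for v in x]
y[7] = 30                     # made-up outlier at x = 8 (the linear trend would give 16)

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
beta = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x))             # least-squares slope
alpha = mean_y - beta * mean_x                           # intercept
errors = [b - (beta * a + alpha) for a, b in zip(x, y)]  # Y - Y' for each point

# Hypothetical rule of thumb: flag points whose error exceeds 2 SDs of the errors
limit = 2 * statistics.stdev(errors)
flagged = [(a, b) for a, b, e in zip(x, y, errors) if abs(e) > limit]
print(flagged)   # [(8, 30)]
```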
28
Q

1 commonly used criterion for the best-fitting line

A

the line that minimizes the sum of squared errors of prediction (the least-squares criterion)
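A minimal sketch, assuming made-up data, that checks this criterion numerically: the line given by the slope and intercept formulas has the smallest sum of squared errors, and nudging either coefficient makes it larger. `sse` is a hypothetical helper.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]           # made-up data

def sse(alpha, beta):
    """Sum of squared errors of prediction for the line Y' = beta*X + alpha."""
    return sum((yi - (beta * xi + alpha)) ** 2 for xi, yi in zip(x, y))

n = len(x)
mx, my = sum(x) / n, sum(y) / n
beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
alpha = my - beta * mx        # least-squares line: Y' = 0.6X + 2.2

best = sse(alpha, beta)
print(round(best, 2))         # 2.4
# Nudging either coefficient away from the least-squares values increases the SSE
for d_alpha, d_beta in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    print(d_alpha, d_beta, sse(alpha + d_alpha, beta + d_beta) > best)   # each line ends in True
```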

29
Q

What does a residual plot consist of

A

the errors of prediction (Y - Y’) plotted against X

30
Q

What does a good residual plot look like

A
  • usually shows no pattern
  • the errors (Y - Y’) are randomly scattered and close to zero on the y-axis
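A minimal sketch, assuming made-up curved data (Y = X²) fitted with a straight line, of a residual check without plotting software: the errors are listed in X order together with their signs. The errors still sum to zero, but they form a pattern (positive, negative, negative, negative, positive), the kind of systematic shape a good residual plot should not show.

```python
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]         # made-up curved data (Y = X^2)

n = len(x)
mx, my = sum(x) / n, sum(y) / n
beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
alpha = my - beta * mx                                        # straight-line fit
residuals = [b - (beta * a + alpha) for a, b in zip(x, y)]    # Y - Y' in X order

print(residuals)                                              # [2.0, -1.0, -2.0, -1.0, 2.0]
print("".join("+" if e > 0 else "-" if e < 0 else "0" for e in residuals))   # +---+
```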