SESSION 9 - CORRELATION AND SIMPLE LINEAR REGRESSION Flashcards

1
Q

Are X and Y associated? What do you do first?

A

Scatter plots and Correlation

2
Q

Do you want to predict Y from X? What do you do first?

A

Simple Linear Regression

3
Q

What do scatterplots do?

What do you look out for?

A

They show the relationship between two continuous variables.

Look out for:
Is the association linear?
Are there outliers?

4
Q

What is correlation?

A

Correlation is a statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate).

5
Q

What is a simple linear regression?

A

Simple linear regression is used to estimate the relationship between two quantitative variables. You can use simple linear regression when you want to know:

How strong the relationship is between two variables (e.g., the relationship between rainfall and soil erosion).
The value of the dependent variable at a certain value of the independent variable (e.g., the amount of soil erosion at a certain level of rainfall).

6
Q

What are regression models?

A

Regression models describe the relationship between variables by fitting a line to the observed data. Linear regression models use a straight line, while logistic and nonlinear regression models use a curved line. Regression allows you to estimate how a dependent variable changes as the independent variable(s) change.

7
Q

What is the regression coefficient?

A

How much we expect Y to change when X increases by one unit.

8
Q

What is the E in the equation?

A

E is the error term (often written ε): the part of Y that the regression line does not explain, that is, the difference between an observed Y and the value B0 + B1X given by the line.

9
Q

What is Y?

A

Y is the predicted value of the dependent variable (y) for any given value of the independent variable (x).

10
Q

What is B0?

A

B0 is the intercept, the predicted value of y when x is 0.

11
Q

What is B1?

A

B1 is the regression coefficient: how much we expect y to change when x increases by one unit.

12
Q

What is X?

A

X is the independent variable (the variable we expect is influencing y).
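
Cards 8 to 12 refer to "the equation" without showing it. Assuming the usual textbook form of the simple linear regression model is meant, with the cards' B0, B1 and E written as \beta_0, \beta_1 and \varepsilon, it would read:

```latex
y = \beta_0 + \beta_1 x + \varepsilon
```

Here y is the dependent variable, x the independent variable, \beta_0 the intercept, \beta_1 the regression coefficient (slope) and \varepsilon the error term.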

13
Q

What is the Correlation Coefficient (r)?

A

The sample correlation coefficient (r) is a measure of the closeness of association of the points in a scatter plot to a linear regression line based on those points.

Possible values of the correlation coefficient range from -1 to +1, with -1 indicating a perfectly linear negative, i.e., inverse, correlation (sloping downward) and +1 indicating a perfectly linear positive correlation (sloping upward).

A correlation coefficient close to 0 suggests little, if any, correlation.

14
Q

What is Pearson Product-Moment Correlation?

A

The Pearson product-moment correlation coefficient (or Pearson correlation coefficient, for short) is a measure of the strength of a linear association between two variables and is denoted by r. Basically, a Pearson product-moment correlation attempts to draw a line of best fit through the data of two variables, and the Pearson correlation coefficient, r, indicates how far away all these data points are from this line of best fit (i.e., how well the data points fit this new model/line of best fit).
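
As a minimal sketch of what r measures, the function below computes Pearson's r from its definition in plain Python. The function name `pearson_r` and the rainfall and erosion numbers are invented for illustration; they are not from the course material.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's r: the co-variation of x and y scaled by the spread of each."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, made up for this example
rainfall = [1, 2, 3, 4, 5]
erosion = [2, 4, 5, 4, 5]

r = pearson_r(rainfall, erosion)
print(round(r, 3))  # 0.775
```

A value of 0.775 would indicate a fairly strong positive linear association, with some scatter around the line of best fit.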

15
Q

How can we determine the strength of association based on the Pearson correlation coefficient?

A

The stronger the association of the two variables, the closer the Pearson correlation coefficient, r, will be to either +1 or -1 depending on whether the relationship is positive or negative, respectively. Achieving a value of +1 or -1 means that all your data points are included on the line of best fit – there are no data points that show any variation away from this line. Values for r between +1 and -1 (for example, r = 0.8 or -0.4) indicate that there is variation around the line of best fit. The closer the value of r to 0 the greater the variation around the line of best fit.

16
Q

What is the line of best fit?

A

It is also known as the regression line.

Line of best fit refers to a line through a scatter plot of data points that best expresses the relationship between those points. Statisticians typically use the least squares method (sometimes known as ordinary least squares, or OLS) to arrive at the geometric equation for the line, either through manual calculations or by using software.
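
With a single predictor, the least squares method has a closed form. The sketch below assumes that closed form (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x); the name `fit_line` and the data are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (b0, b1)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx       # slope
    b0 = my - b1 * mx    # intercept: the line passes through (mean x, mean y)
    return b0, b1

# Hypothetical data, made up for this example
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
print(f"y = {b0:.1f} + {b1:.1f} x")  # y = 2.2 + 0.6 x
```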

17
Q

What is Spearman’s rank correlation?

A

Spearman’s rank-order correlation is the nonparametric version of the Pearson product-moment correlation. Spearman’s correlation coefficient (ρ, also written r_s) measures the strength and direction of association between two ranked variables.

18
Q

What are the assumptions of Spearman’s rank-order correlation?

A

You need two variables that are either ordinal, interval or ratio.

Although you would normally hope to use a Pearson product-moment correlation on interval or ratio data, the Spearman correlation can be used when the assumptions of the Pearson correlation are markedly violated. However, Spearman’s correlation determines the strength and direction of the monotonic relationship between your two variables rather than the strength and direction of the linear relationship between your two variables, which is what Pearson’s correlation determines.

19
Q

What is a monotonic relationship?

A

A monotonic relationship is a relationship that does one of the following: (1) as the value of one variable increases, so does the value of the other variable; or (2) as the value of one variable increases, the other variable value decreases.

20
Q

What if Spearman’s ρ is greater than Pearson’s r?

A

If ρ > r, this suggests the relationship is monotonic but not linear.
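
A small illustration of this card, assuming Spearman's ρ is computed as Pearson's r on the ranks (with tied values sharing their average rank). The helper names and the cubic data are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's r computed from its definition."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: y always rises with x, but along a curve
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(round(r, 2), round(rho, 2))  # r is about 0.94, rho is 1.0
```

Because y increases every time x increases, the ranks line up perfectly and ρ reaches 1, while r stays below 1 because the points do not lie on a straight line.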

21
Q

Correlation:
Pearson’s r – value range

A

Pearson’s r ranges from -1 to +1:

r = -1: perfect negative association
r < 0: negative association
r ≈ 0: no association
r > 0: positive association
r = 1: perfect positive association

22
Q

What is Pearson’s correlation coefficient not?

A

It is not the slope of the regression line.

23
Q

What is the slope?

A

The slope is the regression coefficient.

24
Q

If there are large outliers, what should be done?

A

Consider using Spearman’s rank correlation.

25
Q

What is the hypothesis test on correlation based on?

A

Hypothesis test:
H0: r=0 (no association)

H1: r≠0
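
One common way to carry out this test, assuming a random sample of n pairs with X and Y approximately normally distributed, is to convert r into a t statistic and compare it with a t distribution on n - 2 degrees of freedom:

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad \mathrm{df} = n - 2
```

A large absolute value of t is evidence against H0.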

26
Q

What are the assumptions of Pearson’s correlation coefficient (r)?

A
  • X and Y are continuous
  • Independence of observations
  • Homoscedasticity
  • For valid tests and confidence intervals (CIs):
    – Random sample of individuals
    – X & Y (approximately) normally distributed
27
Q

What does Homoscedasticity mean?

A

The dependent variable (Y) has a similar amount of variance across the range of values of the independent variable (X).

28
Q

What is simple regression’s hypothesis test?

A

H0: b1 = 0 (no association between X and Y)
* Independent variable X is not associated with Y (X does not help predict Y)

H1: b1 ≠ 0
* Independent variable X is associated with Y (X helps predict Y)

29
Q

What is the t-test statistic in a simple regression model?

A

In a simple linear regression, the t-test statistic indicates whether the slope of the population regression line is significantly different from zero. This helps determine whether the x-variable is a useful predictor of the y-variable.

The t-test statistic is calculated using the formula \(t = b_1 / s_{b_1}\), where \(b_1\) is the sample regression coefficient and \(s_{b_1}\) is the standard error of that coefficient.

What it means
A t-value of zero means the sample results are exactly the same as the null hypothesis. As the difference between the sample data and the null hypothesis increases, the absolute value of the t-value increases.

In regression analysis, t-statistics evaluate the significance of regression coefficients, indicating whether a variable contributes significantly to the model.
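
A minimal sketch of the slope t statistic, assuming the usual formulas for simple linear regression: the residual standard error is sqrt(SSE / (n - 2)) and the standard error of the slope is that value divided by sqrt(Sxx). The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean

def slope_t_statistic(xs, ys):
    """Return (b1, standard error of b1, t) for a simple linear regression."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    # sum of squared residuals around the fitted line
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))    # residual standard error
    se_b1 = s / sqrt(sxx)      # standard error of the slope
    return b1, se_b1, b1 / se_b1

# Hypothetical data, made up for this example
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b1, se_b1, t = slope_t_statistic(x, y)
print(round(b1, 2), round(se_b1, 3), round(t, 2))  # 0.6 0.283 2.12
```

The resulting t would then be compared with a t distribution on n - 2 degrees of freedom (3 in this example) to judge whether the slope differs from zero.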

30
Q

What are fitted values?

A

In regression analysis, fitted values are the predicted values of the outcome variable for a given set of data. They are also known as predicted values.

Fitted values are calculated using the estimated regression line, which is the line that minimizes the sum of squared vertical distances between the predicted and actual values. The regression line is also known as the “line of best fit”.

31
Q

What are residuals?

A

In linear regression, the residual is the difference between the observed value of the outcome variable and its fitted value (observed minus fitted). The residual is positive if the point lies above the regression line, and negative if it lies below.
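
To make the sign convention concrete, the sketch below fits a least squares line to made-up data and lists each fitted value with its residual, taken as observed minus fitted. All names and numbers are hypothetical.

```python
from statistics import mean

# Hypothetical data, made up for this example
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least squares slope and intercept
mx, my = mean(x), mean(y)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

fitted = [b0 + b1 * xi for xi in x]                  # points on the line
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # observed minus fitted

for xi, yi, fi, ei in zip(x, y, fitted, residuals):
    side = "above" if ei > 0 else "below"
    print(f"x={xi} observed={yi} fitted={fi:.1f} residual={ei:+.1f} ({side} the line)")
```

With an intercept in the model, least squares residuals sum to zero (up to rounding), so points above and below the line balance out.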

32
Q

What is the coefficient of determination?

A

It is a measure of goodness of fit, usually written R². It summarizes how well the independent variable in the regression explains the variation in the dependent variable.

Therefore, the coefficient of determination measures the fraction of the total variance in the dependent variable explained by the independent variable.
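
As a sketch of "fraction of variance explained", the function below assumes the common definition R² = 1 - SSE / SST, where SSE is the sum of squared residuals and SST is the total sum of squares around the mean of y. The name `r_squared` and the data are hypothetical.

```python
from statistics import mean

def r_squared(xs, ys):
    """Coefficient of determination for a simple linear regression of y on x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - my) ** 2 for y in ys)                         # total variation
    return 1 - sse / sst

# Hypothetical data, made up for this example
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(round(r_squared(x, y), 2))  # 0.6
```

Here 60% of the variation in y is explained by x. In simple linear regression this value equals the square of Pearson's r for the same data.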