Regression Flashcards

Question 1

Q

When would you use regression?

Answer

A

When considering relationships between a continuous predictor variable and a continuous response variable.

Question 2

Q

How do you plot Least Squares Regression?

Answer

A

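1) Plot a point at the coordinate (mean of x, mean of y); the least squares line always passes through it.

2) The best fit line is the line through that point that minimises the sum of squared vertical deviations (residuals) of the data points from the line.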
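
The two steps can be sketched in a few lines of Python. This is a minimal illustration using the standard closed-form formulas; the data values and the function name `least_squares` are made up for the example.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The intercept is chosen so the line passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
intercept, slope = least_squares(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f}x")
```

Plugging the mean of x into the fitted line gives back the mean of y, which is why step 1 works as a starting point for the plot.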

Question 3

Q

What is the equation of a line and an alternative notation?

Answer

A

1) y = mx + c
2) y = A0 + A1x

(Where A0 is the intercept c, A1 is the gradient m, and x is the predictor variable)

Question 4

Q

What is another term for the gradient?

Answer

A

Coefficient (also called the slope)

Question 5

Q

1) What is the range of Pearson’s correlation coefficient?

2) What does a Pearson’s coefficient of -1 or +1 mean?

Answer

A

1) Ranges from -1 to 1

2) -1 is a perfect linear negative correlation
+1 is a perfect linear positive correlation

Question 6

Q

What does Pearson’s correlation coefficient assume?

Answer

A

It assumes the relationship between the two variables is linear.

Question 7

Q

What does a Pearson’s correlation coefficient of 0 indicate?

Answer

A

There is no linear relationship between x and y. A non-linear relationship may still exist.
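
The values -1, +1 and 0 can be checked numerically. The sketch below computes Pearson's r by hand on made-up data; note the last case, where y depends exactly on x (y = x^2) yet r is 0, because r only measures linear association.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data only.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_positive = pearson_r(xs, [2 * x + 1 for x in xs])   # points on a rising line
r_negative = pearson_r(xs, [10 - 3 * x for x in xs])  # points on a falling line
r_curved = pearson_r(xs, [x * x for x in xs])         # symmetric curve
print(r_positive, r_negative, r_curved)
```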

Question 8

Q

1) What is Spearman’s rank?

2) What is it used for?

Answer

A

1) A non-parametric correlation coefficient that doesn’t assume the correlation is linear.
2) It is used to look at monotonic correlations (y consistently increases, or consistently decreases, as x increases).

Question 9

Q

How is a Spearman’s rank calculated?

Answer

A

The raw x and y data are converted into ranks. If the correlation is perfectly monotonic, the ranks will show a perfect linear relationship.

Question 10

Q

What is Spearman’s rank compared to Pearson’s correlation coefficient?

Answer

A

Spearman’s rank is simply the Pearson’s correlation coefficient of the ranked data as opposed to the raw data.
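
A short sketch of that idea: rank both variables, then apply the ordinary Pearson formula to the ranks. The data are made up (y = x^3, which is monotonic but not linear), and tied values are given the average of their ranks, a common convention assumed here.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Illustrative data only: monotonic but curved.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 3 for x in xs]
raw_r = pearson_r(xs, ys)
rho = spearman_rho(xs, ys)
print(raw_r, rho)
```

On this data Pearson's r on the raw values is below 1 because the points do not lie on a straight line, while Spearman's rank is 1 because the ranks do.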

Question 11

Q

Can p-values be associated with correlation coefficients?

Answer

A

Yes. The p-value indicates whether the correlation is significantly different from 0.
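
One way to see what such a p-value means is a permutation test, sketched below. This is an illustrative technique chosen for the example, not necessarily the test a statistics package would report (those typically use a t-distribution). Shuffling y breaks any real link with x, so the p-value is the share of shuffles that give a correlation at least as strong as the observed one. The data, function names and number of permutations are all assumptions.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the null hypothesis of zero correlation."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (extreme + 1) / (n_permutations + 1)


# Illustrative data only: a strong, nearly linear trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
p = permutation_p_value(xs, ys)
print(p)
```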

Question 12

Q

What are 3 types of general linear models?

Answer

A

1) ANOVA
2) ANCOVA
3) Linear regression

Question 13

Q

What is the form of a General linear Model?

Answer

A

Y = A0 + A1x1 + A2x2 + (B1 or B2 or -B1-B2) + E

Where: A0 is a constant (the intercept)

A1 is the gradient for predictor variable 1 (x1)

A2 is the gradient for predictor variable 2 (x2)

(B1 or B2 or -B1-B2) is the effect of a categorical predictor variable (one term per level, here three levels)

E is the error, which is normally distributed
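
A rough sketch of how the pieces combine into a prediction, writing the two continuous predictors as x1 and x2. The coefficient values and the group labels "g1", "g2", "g3" are hypothetical; the third group's effect is -B1-B2 so that the three group effects sum to zero. The error term E is left out because it is the part the model does not predict.

```python
def predict(x1, x2, group, a0, a1, a2, b1, b2):
    """Expected value of Y for two continuous predictors and a three-level factor."""
    # The third level is not a free parameter: the effects are constrained to sum to zero.
    group_effects = {"g1": b1, "g2": b2, "g3": -b1 - b2}
    return a0 + a1 * x1 + a2 * x2 + group_effects[group]


# Hypothetical coefficients, for illustration only.
coefficients = dict(a0=10.0, a1=2.0, a2=-1.0, b1=1.5, b2=-0.5)
y_g1 = predict(3.0, 4.0, "g1", **coefficients)
y_g2 = predict(3.0, 4.0, "g2", **coefficients)
y_g3 = predict(3.0, 4.0, "g3", **coefficients)
print(y_g1, y_g2, y_g3)
```

With no categorical term this reduces to linear regression; with only the categorical term it has the form used in ANOVA; with both it is the ANCOVA form.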

Question 14

Q

What determines significance in General linear models?

Question 15

Q

What is R^2?

Answer

A

The proportion of the variation in the response variable that the model explains.

R^2 = 1 - (residual sum of squares / total sum of squares)

Question 16

Q

What is the residual sum of squares / total sum of squares?

Answer

A

Proportion of variation that hasn’t been explained

Question 17

Q

How can you use the Pearson’s coefficient to find R^2?

Answer

A

For a simple linear regression with one predictor, R^2 = (Pearson’s correlation coefficient r)^2
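
The R^2 cards can be tied together with a small worked check: fit a one-predictor least squares line, compute R^2 from the sums of squares, and compare it with the square of Pearson's r. The data are made up for illustration.

```python
import math

# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Least squares line with a single predictor.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual sum of squares: variation left over after fitting the line.
rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
# Total sum of squares: variation of y about its mean.
tss = syy

unexplained = rss / tss
r_squared = 1 - unexplained

# Pearson's r, squared, for comparison.
r = sxy / math.sqrt(sxx * syy)
print(unexplained, r_squared, r ** 2)
```

Here 40% of the variation is unexplained, so R^2 is 0.6, which matches r^2 for this single-predictor fit.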

Question 18

Q

What is Simpson’s paradox?

Answer

A

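This is when a trend seen in the combined data reverses or disappears once the data are split into groups. You come to the wrong conclusion because a potential lurking variable (the grouping) hasn’t been taken into account.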
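
A small made-up dataset illustrates the effect. Within each of two hypothetical groups the fitted slope is positive, but pooling the groups and ignoring the grouping variable gives a negative slope.

```python
def slope(xs, ys):
    """Least squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Illustrative data only: two groups that differ in both x and y.
group_a_x, group_a_y = [1.0, 2.0, 3.0], [6.0, 7.0, 8.0]
group_b_x, group_b_y = [6.0, 7.0, 8.0], [1.0, 2.0, 3.0]

slope_a = slope(group_a_x, group_a_y)
slope_b = slope(group_b_x, group_b_y)
# Pooling ignores the group, which is the lurking variable here.
slope_pooled = slope(group_a_x + group_b_x, group_a_y + group_b_y)
print(slope_a, slope_b, slope_pooled)
```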

Question 19

Q

What is interpolation?

Answer

A

Predicting values of the response variable within the range of measured values.

Question 20

Q

What is extrapolation?

Answer

A

Predicting values of the response variable outside the range of measured values.
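
The distinction between interpolation and extrapolation can be expressed as a simple range check on the predictor. The helper name `prediction_type` and the measured values are hypothetical.

```python
def prediction_type(x_new, measured_xs):
    """Label a prediction by whether x_new lies inside the measured x range."""
    if min(measured_xs) <= x_new <= max(measured_xs):
        return "interpolation"
    return "extrapolation"


# Illustrative measurements only.
measured_xs = [10, 15, 20, 25, 30]
inside = prediction_type(18, measured_xs)
outside = prediction_type(45, measured_xs)
print(inside, outside)
```

Extrapolated predictions deserve more caution, because the data give no evidence that the fitted relationship still holds outside the measured range.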

Question 21

Q

What can be used if a relationship isn’t linear?

Answer

A

1) Linear regression using polynomial explanatory variables

2) Non-linear regression
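
Option 1 can be sketched as follows: adding x^2 as an extra explanatory variable lets a linear model fit a curve, because the model is still linear in its coefficients. This is a minimal pure-Python illustration using the normal equations; the function names and data are made up, and in practice a statistics library would do this step.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augment the matrix so row operations apply to both sides at once.
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution from the last row upwards.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aug[r][n] - known) / aug[r][r]
    return coeffs


def fit_quadratic(xs, ys):
    """Least squares fit of y = a0 + a1*x + a2*x**2 via the normal equations."""
    design = [[1.0, x, x * x] for x in xs]  # columns: constant, x, x squared
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(3)]
    return solve(xtx, xty)


# Illustrative data only, generated from y = 1 + 2x + 3x^2 with no noise.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]
a0, a1, a2 = fit_quadratic(xs, ys)
print(a0, a1, a2)
```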

Question 22

Q

What is an example of non-linear regression?

Answer

A

Random forest regression

Question 23

Q

What is random forest regression?

Answer

A

This is a forest of decision trees. The trees are built on the training data you provide to the algorithm.

The randomness comes from building lots of trees, each based only on a subset of the data that is randomly sampled each time.

Each decision tree makes a prediction, and the average prediction across the forest is used as the fitted value.
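
The idea can be sketched in pure Python under some heavy simplifications, all of which are assumptions made for this example: each "tree" is a single-split stump, there is one predictor, and each tree is trained on a bootstrap sample (drawn with replacement) of the training data. Real implementations grow deeper trees and usually also sample the predictors.

```python
import random


def fit_stump(points):
    """Fit a one-split regression tree: choose the split on x with least squared error."""
    points = sorted(points)
    best = None
    for i in range(1, len(points)):
        if points[i][0] == points[i - 1][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in points[:i]]
        right = [y for _, y in points[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            threshold = (points[i - 1][0] + points[i][0]) / 2
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # Every sampled x was identical, so predict the sample mean everywhere.
        mean_y = sum(y for _, y in points) / len(points)
        return lambda x: mean_y
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


def fit_forest(points, n_trees=50, seed=0):
    """Train each tree on a random resample of the data and average the predictions."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(points) for _ in points]  # bootstrap sample
        trees.append(fit_stump(sample))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


# Illustrative training data only: a curved relationship, y = x^2.
points = [(float(x), float(x * x)) for x in range(10)]
forest = fit_forest(points)
print(forest(0.0), forest(4.5), forest(9.0))
```

Because every tree predicts an average of observed y values, the forest's predictions always stay within the range of the training responses.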

Question 24

Q

What are the advantages/disadvantages of random forest regression?

Answer

A

Advantages: It is based entirely on the data it has, so we do not impose any of our own ideas about the nature of the relationship.

Disadvantages: 1) Can be slow
2) Can sometimes overfit the data

(24 cards)