Data Analysis week 6 Flashcards

Question 1

Q

What does the covariance measure?

Answer

A

The covariance of (x, y) measures how x and y vary together. Its sign gives the direction of the relation; its size depends on the strength of the relation as well as on the spread of x and the spread of y.

Question 2

Q

What does the formula of the covariance look like?

Answer

A

The variance is the covariance of a variable with itself. The formula for the covariance is therefore the formula for the variance with the squared deviation (x - mean(x))^2 replaced by the product of deviations (x - mean(x)) * (y - mean(y)).
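
Written out, the two formulas compare as follows. This assumes the sample version with an n - 1 denominator; the card does not say which convention the course uses, and some texts divide by n instead.

```latex
\operatorname{var}(x) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
\qquad
\operatorname{cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```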

Question 3

Q

What is the correlation?

Answer

A

The correlation is the covariance normalized by the standard deviations of x and y (the covariance divided by their product), so that it becomes scale independent and measures only the relation between x and y.
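
A minimal sketch of what "scale independent" means, using made-up height and weight values and an assumed n - 1 denominator. The function names are illustrative and not taken from the course material.

```python
from math import sqrt
from statistics import mean


def covariance(xs, ys):
    """Sample covariance with an n - 1 denominator (an assumed convention)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sx = sqrt(covariance(xs, xs))  # variance is the covariance of x with itself
    sy = sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)


# Made-up illustration data: heights in metres and weights in kilograms.
heights_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weights_kg = [55.0, 68.0, 70.0, 80.0, 88.0]
heights_cm = [h * 100 for h in heights_m]  # same data on a different scale

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)  # 100 times larger than cov_m
r_m = correlation(heights_m, weights_kg)
r_cm = correlation(heights_cm, weights_kg)   # same as r_m

print(cov_m, cov_cm, r_m, r_cm)
```

Changing the unit of x multiplies the covariance by the same factor, while the correlation stays the same.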

Question 4

Q

What does the covariance show?

Answer

A

It shows the tendency of the variables to change together; its size also reflects the spread of each variable.

Question 5

Q

What does a positive covariance mean?

Answer

A

If the covariance is positive, x tends to be high when y is high, and vice versa.

Question 6

Q

What does a negative covariance mean?

Answer

A

If the covariance is negative, x tends to be high when y is low, and vice versa.

Question 7

Q

What does a positive correlation mean?

Answer

A

If the correlation is positive, x tends to be high when y is high, and vice versa.

Question 8

Q

What does a negative correlation mean?

Answer

A

If the correlation is negative, x tends to be high when y is low, and vice versa.

Question 9

Q

What is the range of the correlation and what does the value tell us about x and y?

Answer

A

The correlation is always a number between -1 and 1 (inclusive). If the correlation is equal to -1 or 1, the points (x, y) lie exactly on a straight line. If the correlation is 0, there is no linear relation between x and y, although a nonlinear relation can still exist.

Question 10

Q

What kind of relations does the correlation measure?

Answer

A

Only linear relations.
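
A small sketch of why this matters, with made-up data: y is completely determined by x through y = x^2, yet the correlation is zero because the relation is not linear. The helper name `correlation` is illustrative.

```python
from math import sqrt
from statistics import mean


def correlation(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]

# Perfect but nonlinear relation: the positive and negative sides cancel out.
r_quadratic = correlation(xs, [x ** 2 for x in xs])

# Perfect linear relation for comparison.
r_linear = correlation(xs, [2 * x + 1 for x in xs])

print(r_quadratic, r_linear)
```

Here `r_quadratic` is 0 and `r_linear` is 1 (up to floating-point rounding), even though both relations are exact.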

Question 11

Q

How can you check how sure you are of your estimated correlation?

Answer

A

Use bootstrapping on the correlation (this can be done because the correlation is a descriptive statistic).
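
One possible way to do this is a percentile bootstrap, sketched below on synthetic data. The number of resamples, the seed, the 95% level and all function names are assumptions made for the illustration, not choices prescribed by the course.

```python
import random
from math import sqrt
from statistics import mean


def correlation(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def bootstrap_correlations(xs, ys, n_resamples=1000, seed=0):
    """Correlations of resamples drawn with replacement, sorted ascending."""
    rng = random.Random(seed)
    n = len(xs)
    results = []
    for _ in range(n_resamples):
        # Resample whole (x, y) pairs so the relation inside each pair is kept.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue  # correlation is undefined when a variable is constant
        results.append(correlation(bx, by))
    return sorted(results)


# Synthetic illustration data: y depends on x plus noise.
data_rng = random.Random(42)
xs = [data_rng.gauss(0, 1) for _ in range(50)]
ys = [x + data_rng.gauss(0, 1) for x in xs]

boot = bootstrap_correlations(xs, ys)
lower = boot[int(0.025 * len(boot))]      # 2.5th percentile
upper = boot[int(0.975 * len(boot)) - 1]  # 97.5th percentile
print(f"r = {correlation(xs, ys):.2f}, 95% interval ({lower:.2f}, {upper:.2f})")
```

A narrow interval suggests the estimated correlation is stable; a wide one suggests it depends strongly on which observations happened to be sampled.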

Question 12

Q

What is the null hypothesis in hypothesis testing for the correlation?

Answer

A

That there is no relation between the variables.

Question 13

Q

How do you make the null hypothesis true in hypothesis testing for the correlation?

Answer

A

You break the relation between the variables by randomly shuffling one of the variables. You do this by drawing without replacement (unlike bootstrapping, where we draw samples with replacement). This is called permutation testing.
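
The shuffling procedure can be sketched as follows. The data are made up, the test is two-sided, and the "+ 1" in the p-value is one common convention; all of these, along with the function names, are assumptions for the illustration.

```python
import random
from math import sqrt
from statistics import mean


def correlation(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Share of shuffles whose |correlation| is at least the observed one."""
    rng = random.Random(seed)
    observed = abs(correlation(xs, ys))
    shuffled = list(ys)
    count = 0
    for _ in range(n_permutations):
        # Reorder y without replacement; x stays fixed, so any relation is broken.
        rng.shuffle(shuffled)
        if abs(correlation(xs, shuffled)) >= observed:
            count += 1
    # Counting the observed data as one permutation keeps the p-value above 0.
    return (count + 1) / (n_permutations + 1)


# Made-up data with a strong linear relation.
xs = list(range(20))
ys = [2 * x + (-1) ** x for x in xs]

p_value = permutation_p_value(xs, ys)
print(p_value)
```

A small p-value means that shuffled data almost never produce a correlation as strong as the observed one, which is evidence against the null hypothesis.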

Question 14

Q

What is a prediction model?

Answer

A

A model that describes the relation between variables in such a way that other values of the variables can be predicted.

Question 15

Q

What is the equation of a regression model?

Answer

A

y = alpha + beta * x

Question 16

Q

How can you compute alpha and beta for the equation of a regression model and what are these called?

Answer

A

By using formulas based on the covariance: beta = cov(x, y) / var(x) and alpha = mean(y) - beta * mean(x). These alpha and beta are called the regression coefficients. Together they define the best-fitting (least-squares) line for the dataset.
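
A minimal sketch of the standard least-squares formulas, beta = cov(x, y) / var(x) and alpha = mean(y) - beta * mean(x), on made-up data. The function name `fit_line` is illustrative.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (alpha, beta) of the least-squares line y = alpha + beta * x."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    beta = cov_xy / var_x      # slope
    alpha = my - beta * mx     # intercept: the line passes through the means
    return alpha, beta


# Made-up illustration data.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

alpha, beta = fit_line(xs, ys)
print(f"y = {alpha:.2f} + {beta:.2f} * x")
```

For this data the fitted line is approximately y = 1.09 + 1.97 * x. The n - 1 denominators cancel in the ratio, so the choice between n and n - 1 does not affect the coefficients.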

Question 17

Q

What are alpha and beta?

Answer

A

alpha is the intercept: the point where the regression line intersects the y-axis, so the predicted y-value at x = 0 (not at the smallest x-value in the dataset). beta is the slope: the change in the predicted y for a one-unit increase in x.

Question 18

Q

What does the regression line show and what is a condition for a good regression line?

Answer

A

The regression line shows the predicted value of y for each value of x. Prediction models will make mistakes when predicting values, but a good model makes the same kind of mistakes across the whole range of values.

Question 19

Q

How can you check how sure you are of your computed regression coefficients?

Answer

A

By bootstrapping, and/or by displaying a bunch of regression lines computed on resamples.
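
A sketch of bootstrapping the regression coefficients on made-up data. Each resample yields one (alpha, beta) pair, which corresponds to one of the regression lines that could be drawn. The number of resamples, the seed, the 95% level and the function names are assumptions for the illustration.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Return (alpha, beta) of the least-squares line y = alpha + beta * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sxy / sxx  # equals cov(x, y) / var(x); the denominators cancel
    alpha = my - beta * mx
    return alpha, beta


def bootstrap_fits(xs, ys, n_resamples=1000, seed=0):
    """Fit a line to each resample of (x, y) pairs drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    fits = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # the slope is undefined when all x-values are equal
        fits.append(fit_line(bx, by))
    return fits


# Made-up data: roughly y = 1 + 2x with a small repeating disturbance.
xs = list(range(1, 21))
ys = [1 + 2 * x + ((7 * x) % 5 - 2) * 0.3 for x in xs]

fits = bootstrap_fits(xs, ys)
slopes = sorted(beta for _, beta in fits)
slope_low = slopes[int(0.025 * len(slopes))]
slope_high = slopes[int(0.975 * len(slopes)) - 1]
print(f"95% interval for the slope: ({slope_low:.3f}, {slope_high:.3f})")
```

Plotting the line for each pair in `fits` on top of the data gives the bundle of regression lines; the wider the bundle, the less certain the coefficients.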

Question 20

Q

What is a residual and what do residuals measure?

Answer

A

A residual is the difference between the actual value of an observation and its predicted value. Residuals measure the prediction error.
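
As a small illustration on made-up data, the residuals of a least-squares line can be computed as actual minus predicted. The function names are illustrative.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (alpha, beta) of the least-squares line y = alpha + beta * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta


def residuals(xs, ys, alpha, beta):
    """Actual y minus predicted y for every observation."""
    return [y - (alpha + beta * x) for x, y in zip(xs, ys)]


# Made-up illustration data.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

alpha, beta = fit_line(xs, ys)
res = residuals(xs, ys, alpha, beta)
print([round(r, 2) for r in res])
```

A positive residual means the model predicted too low, a negative one too high. For a least-squares line with an intercept, the residuals sum to zero up to rounding.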

Question 21

Q

What does a residual model show?

Answer

A

It shows whether the prediction model fits the data well.

Question 22

Q

When does a regression model fit the data well and how do you check this?

Answer

A

When the residuals have a mean of about zero everywhere and about the same spread everywhere. You check this by plotting the residuals and adding a smooth line, which should stay close to zero.

Question 23

Q

What is the coefficient of determination?

Answer

A

R^2. It measures how much of the variation in y the model explains. For a least-squares regression line it is always a value between 0 and 1: 0 means we’re not explaining anything and 1 means we’re perfectly explaining the data.
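
A sketch of one common way to compute it, R^2 = 1 - SS_res / SS_tot, on made-up data. Here SS_res is the sum of squared residuals and SS_tot the sum of squared deviations of y from its mean. The function names are illustrative, and the course may define R^2 through an equivalent formula.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (alpha, beta) of the least-squares line y = alpha + beta * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta


def r_squared(xs, ys):
    """Share of the variation in y that the fitted line accounts for."""
    alpha, beta = fit_line(xs, ys)
    my = mean(ys)
    ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up illustration data.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

r2 = r_squared(xs, ys)
r2_exact_line = r_squared(xs, [2 * x + 1 for x in xs])  # points exactly on a line
print(round(r2, 4), r2_exact_line)
```

For a regression with a single x, this value equals the squared correlation between x and y.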

Question 24

Q

What do prediction intervals show?

Answer

A

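They show the range in which a new observation is expected to fall, so they include the uncertainty of the prediction itself and not only the uncertainty of the fitted line.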

(24 cards)