Data Analysis week 6 Flashcards

1
Q

What does the covariance measure

A

The covariance of x and y measures how x and y vary together. Its sign gives the direction of the relation, but its size depends both on the strength of the relation and on the spread of x and the spread of y.

2
Q

What does the formula of the covariance look like

A

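The variance is the covariance of a variable with itself. Therefore the covariance formula is the variance formula with the squared deviation (x_i - mean(x))^2 replaced by the product of deviations (x_i - mean(x)) * (y_i - mean(y)):
var(x) = sum_i (x_i - mean(x))^2 / (n - 1)
cov(x, y) = sum_i (x_i - mean(x)) * (y_i - mean(y)) / (n - 1)
(Some texts divide by n instead of n - 1.)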
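
A minimal Python sketch of these two formulas, assuming the n - 1 (sample) convention. The function names and data are made up for illustration; the variance is computed literally as the covariance of x with itself.

```python
from statistics import mean


def covariance(x, y):
    """Sample covariance: sum of products of deviations from the means, divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


def variance(x):
    """Sample variance, computed as the covariance of x with itself."""
    return covariance(x, x)


# Hypothetical example data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

print(covariance(x, y))  # 1.5
print(variance(x))       # 2.5
```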

3
Q

What is the correlation

A

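The correlation is the covariance normalized by the standard deviations of x and y: cor(x, y) = cov(x, y) / (sd(x) * sd(y)). This makes it scale independent (it has no unit), so it measures only the strength and direction of the linear relation between x and y, not their spread.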
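
To see the scale independence, the sketch below (hypothetical data, n - 1 convention) multiplies x by 100, as if converting metres to centimetres. The covariance grows by that same factor, while the correlation stays the same.

```python
from math import sqrt
from statistics import mean


def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical values, e.g. in metres
y = [2.0, 4.0, 5.0, 4.0, 5.0]
x_cm = [100 * xi for xi in x]      # the same data in a different unit

print(covariance(x, y), covariance(x_cm, y))    # 1.5 150.0: covariance depends on the unit
print(correlation(x, y), correlation(x_cm, y))  # equal (about 0.775): correlation does not
```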

4
Q

What does the covariance show

A

It shows the tendency of the variables to change together, but its value also depends on their spread.

5
Q

What does a positive covariance mean

A

If the covariance is positive, x tends to be high when y is high, and low when y is low.

6
Q

What does a negative covariance mean

A

If the covariance is negative, x tends to be high when y is low, and low when y is high.
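
A small sketch with hypothetical data. When y tends to rise with x, most products of deviations are positive. When y tends to fall, most are negative. The correlation has the same sign because it only divides the covariance by positive standard deviations.

```python
from statistics import mean


def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


x = [1, 2, 3, 4, 5]
y_up = [1, 3, 2, 5, 4]     # tends to rise as x rises
y_down = [5, 3, 4, 1, 2]   # tends to fall as x rises

print(covariance(x, y_up))    # 2.0 (positive)
print(covariance(x, y_down))  # -2.0 (negative)
```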

7
Q

What does a positive correlation mean

A

If the correlation is positive, x tends to be high when y is high, and low when y is low.

8
Q

What does a negative correlation mean

A

If the correlation is negative, x tends to be high when y is low, and low when y is high.

9
Q

What is the range of the correlation and what does the value tell us about x and y

A

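The correlation is always a number between -1 and 1. If it equals -1 or 1, all points (x, y) lie exactly on a straight line (falling or rising, respectively). The closer it is to -1 or 1, the stronger the linear relation. If the correlation is 0, there is no linear relation between x and y, although there may still be a nonlinear one.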
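
A quick check on hypothetical data: points exactly on a rising line give 1, points exactly on a falling line give -1, and noisy data land in between. The sketch uses plain sums because the 1/(n - 1) factors cancel in the ratio.

```python
from math import sqrt
from statistics import mean


def correlation(x, y):
    """Pearson correlation; the 1/(n - 1) factors cancel, so plain sums suffice."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
r_up = correlation(x, [3 * a + 1 for a in x])     # exactly on a rising line
r_down = correlation(x, [-2 * a + 7 for a in x])  # exactly on a falling line
r_noisy = correlation(x, [2, 4, 5, 4, 5])         # rising trend, not on a line

print(r_up, r_down, r_noisy)  # 1.0 -1.0 and a value strictly between 0 and 1
```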

10
Q

What kind of relations does the correlation measure

A

Only linear relations.
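
A hypothetical example of what "only linear" means. Here y is completely determined by x (y = x^2), yet the correlation is 0 because the relation is curved and symmetric around the mean of x.

```python
from math import sqrt
from statistics import mean


def correlation(x, y):
    """Pearson correlation computed from sums of deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
y = [a ** 2 for a in x]   # a perfect, but nonlinear, relation

print(correlation(x, y))  # 0.0
```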

11
Q

How can you check how sure you are of your estimated correlation

A

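Use bootstrapping on the correlation: resample the (x, y) pairs with replacement, recompute the correlation on each resample, and look at the spread of the results. This works because the correlation is a descriptive statistic, so it can be computed on any resample.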
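
A minimal bootstrap sketch on hypothetical data. The (x, y) pairs are resampled together so the relation is kept intact. The 95% percentile interval, the 2000 resamples and the fixed seed are assumptions of this sketch, not settings prescribed by the course.

```python
import random
from math import sqrt
from statistics import mean, quantiles


def correlation(x, y):
    """Pearson correlation (the 1/(n - 1) factors cancel, so plain sums are enough)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bootstrap_correlation(x, y, n_boot=2000, seed=0):
    """Resample (x, y) pairs with replacement and recompute the correlation each time."""
    rng = random.Random(seed)
    n = len(x)
    results = []
    while len(results) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # n indices drawn WITH replacement
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # Skip resamples where x or y is constant: the correlation is undefined there
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue
        results.append(correlation(xs, ys))
    return results


# Hypothetical data
x = list(range(1, 11))
y = [1.2, 2.8, 2.1, 4.5, 3.9, 6.3, 5.2, 7.9, 7.1, 9.4]

boot = bootstrap_correlation(x, y)
cuts = quantiles(boot, n=40)  # 39 cut points: first = 2.5th percentile, last = 97.5th
print("estimate:", correlation(x, y))
print("95% percentile interval:", cuts[0], cuts[-1])
```

A narrow interval means the estimated correlation is fairly stable across resamples; a wide one means it could be quite different with other data.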

12
Q

What is the null hypothesis in hypothesis testing for the correlation

A

That there is no relation between the variables, i.e. the true correlation is 0.

13
Q

How do you make the null hypothesis true in hypothesis testing for the correlation

A

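You break the relation between the variables by randomly shuffling one of the variables. This means drawing without replacement (unlike bootstrapping, where we draw samples with replacement). This is called permutation testing. Comparing the observed correlation with the correlations of many shuffled datasets shows how unusual it would be if the null hypothesis were true.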
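
A sketch of a permutation test for the correlation on hypothetical data. The two-sided p-value (the share of shuffles with |r| at least as large as the observed |r|) and the +1 correction are common choices assumed here, not taken from the course material.

```python
import random
from math import sqrt
from statistics import mean


def correlation(x, y):
    """Pearson correlation computed from sums of deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_test(x, y, n_perm=5000, seed=0):
    """Two-sided permutation test for H0: no relation between x and y."""
    rng = random.Random(seed)
    observed = correlation(x, y)
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # same values in a new order (no replacement): relation with x broken
        if abs(correlation(x, y_perm)) >= abs(observed):
            extreme += 1
    # +1 in numerator and denominator counts the observed ordering as one permutation
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical data
x = list(range(1, 11))
y = [1.2, 2.8, 2.1, 4.5, 3.9, 6.3, 5.2, 7.9, 7.1, 9.4]

r, p = permutation_test(x, y)
print("r =", r, "p =", p)
```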

14
Q

What is a prediction model

A

A model that describes the relation between variables in such a way that it can predict the value of one variable from the values of the others.

15
Q

What is the equation of a regression model

A

y = alpha + beta * x

16
Q

How can you compute alpha and beta in the equation of a regression model, and what are they called

A

By using formulas based on the covariance: beta = cov(x, y) / var(x) and alpha = mean(y) - beta * mean(x). alpha and beta are called the regression coefficients. Together they give the best-fitting (least-squares) line for the dataset.
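
A minimal sketch of these formulas on hypothetical data. The n - 1 factors in cov(x, y) and var(x) cancel, so plain sums of deviations are enough.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares line y = alpha + beta * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sxy / sxx          # = cov(x, y) / var(x)
    alpha = my - beta * mx    # makes the line pass through (mean(x), mean(y))
    return alpha, beta


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
alpha, beta = fit_line(x, y)
print(alpha, beta)  # approximately 2.2 and 0.6
```

On Python 3.10 or newer, `statistics.linear_regression(x, y)` should return the same slope and intercept, which is a handy cross-check.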

17
Q

What are alpha and beta

A

alpha is the intercept: the point where the regression line intersects the y-axis, so the predicted y-value at x = 0 (not at the smallest x-value in the dataset). beta is the slope: how much the predicted y changes when x increases by 1.

18
Q

What does the regression line show and what is a condition for a good regression line

A

The regression line shows the predicted value of y for each value of x. A prediction model will make mistakes, but a good regression line makes the same kind of mistakes for all values of x: no systematic over- or underprediction, and errors of about the same size everywhere.

19
Q

How can you check how sure you are of your computed regression coefficients

A

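By bootstrapping: recompute alpha and beta on many resamples (drawn with replacement) and look at their spread. You can also plot the regression lines from a bunch of resamples together to see how much the line varies.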
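
A stdlib-only sketch of bootstrapping the regression coefficients on hypothetical data. It reports 95% percentile intervals instead of drawing the lines. The number of resamples, the seed and the interval type are assumptions of this sketch.

```python
import random
from statistics import mean, quantiles


def fit_line(x, y):
    """Least-squares coefficients: beta = cov(x, y) / var(x), alpha = mean(y) - beta * mean(x)."""
    mx, my = mean(x), mean(y)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - beta * mx, beta


def bootstrap_coefficients(x, y, n_boot=2000, seed=0):
    """Refit the line on resamples of the (x, y) pairs drawn with replacement."""
    rng = random.Random(seed)
    n = len(x)
    alphas, betas = [], []
    while len(betas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:  # all x equal: the slope is undefined, so skip this resample
            continue
        ys = [y[i] for i in idx]
        alpha, beta = fit_line(xs, ys)
        alphas.append(alpha)
        betas.append(beta)
    return alphas, betas


# Hypothetical data
x = list(range(1, 11))
y = [1.2, 2.8, 2.1, 4.5, 3.9, 6.3, 5.2, 7.9, 7.1, 9.4]

alphas, betas = bootstrap_coefficients(x, y)
a_cuts, b_cuts = quantiles(alphas, n=40), quantiles(betas, n=40)
print("alpha: estimate", fit_line(x, y)[0], "95% interval", a_cuts[0], a_cuts[-1])
print("beta:  estimate", fit_line(x, y)[1], "95% interval", b_cuts[0], b_cuts[-1])
# Each (alpha, beta) pair is one line y = alpha + beta * x; drawing many of them
# on top of the scatter plot shows the same uncertainty visually.
```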

20
Q

What is a residual and what do residuals measure

A

A residual is the difference between the actual value of an observation and its predicted value. Residuals measure the prediction error.
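
A short sketch computing residuals (actual minus predicted) for a least-squares line on hypothetical data. For a least-squares line with an intercept, the residuals sum to zero up to rounding.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares coefficients: beta = cov(x, y) / var(x), alpha = mean(y) - beta * mean(x)."""
    mx, my = mean(x), mean(y)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - beta * mx, beta


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

alpha, beta = fit_line(x, y)
predicted = [alpha + beta * a for a in x]
residuals = [obs - pred for obs, pred in zip(y, predicted)]  # actual minus predicted

print([round(r, 2) for r in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(residuals)) < 1e-9)        # True
```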

21
Q

What does a residual model show

A

It shows whether the prediction model fits the data well.

22
Q

When does a regression model fit the data well and how do you check this

A

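When the residuals have roughly zero mean everywhere and roughly the same spread everywhere (across all values of x). You check this by plotting the residuals against x (or against the predicted values) and adding a smooth line, which should stay close to zero.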
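
Without plotting, the same check can be sketched in text. Sort the residuals by x, split them into a few consecutive groups, and compare each group's mean (should be near 0) and spread (should be similar). Grouping by x is a crude stand-in for the smooth line and is an assumption of this sketch. The hypothetical data are built so that the noise grows with x, which the check should flag.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Least-squares coefficients: beta = cov(x, y) / var(x), alpha = mean(y) - beta * mean(x)."""
    mx, my = mean(x), mean(y)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - beta * mx, beta


def residual_summary(x, residuals, n_bins=3):
    """Sort by x, cut into n_bins consecutive groups, return (mean, spread) of residuals per group."""
    ordered = [r for _, r in sorted(zip(x, residuals))]
    n = len(ordered)
    bounds = [round(i * n / n_bins) for i in range(n_bins + 1)]
    return [
        (mean(ordered[lo:hi]), pstdev(ordered[lo:hi]))
        for lo, hi in zip(bounds, bounds[1:])
    ]


# Hypothetical data whose noise grows with x (a "fan" shape)
x = list(range(1, 13))
y = [2 * a + (0.2 * a if a % 2 else -0.2 * a) for a in x]

alpha, beta = fit_line(x, y)
residuals = [b - (alpha + beta * a) for a, b in zip(x, y)]
summary = residual_summary(x, residuals)
for i, (m, s) in enumerate(summary, start=1):
    print(f"group {i}: mean residual {m:+.2f}, spread {s:.2f}")
```

Here the group means stay fairly close to 0, but the spread increases from the first group to the last, so the equal-spread condition is not met.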

23
Q

What is the coefficient of determination

A

R^2. It measures how much of the variation in y the model explains. For a least-squares line with an intercept it is always a value between 0 and 1: 0 means the model explains none of the variation and 1 means it explains the data perfectly.
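
A sketch of one common way to compute R^2, as 1 minus the residual sum of squares divided by the total sum of squares around mean(y), on hypothetical data. For simple linear regression this equals the squared correlation, which the example also shows.

```python
from math import sqrt
from statistics import mean


def fit_line(x, y):
    """Least-squares coefficients: beta = cov(x, y) / var(x), alpha = mean(y) - beta * mean(x)."""
    mx, my = mean(x), mean(y)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - beta * mx, beta


def r_squared(x, y):
    """1 - (residual sum of squares) / (total sum of squares around mean(y))."""
    alpha, beta = fit_line(x, y)
    my = mean(y)
    ss_res = sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sqrt(
    sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
)
print(round(r_squared(x, y), 6))  # 0.6
print(round(r ** 2, 6))           # 0.6 as well: in simple regression R^2 = r^2
```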

24
Q

What do prediction intervals show

A

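They show the range in which a new observation is likely to fall. Unlike an interval for the regression line alone, they also include the uncertainty of the prediction itself (the scatter of individual observations around the line).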
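
For reference, under the standard textbook assumptions for simple linear regression (independent, normally distributed errors with constant spread), a 95% prediction interval for a new observation at x_0 is usually written as below. The course may build it differently, for example by resampling, so treat this as background. Here s is the residual standard deviation and t_{n-2, 0.975} is a quantile of the t-distribution with n - 2 degrees of freedom. The leading 1 under the square root is the extra uncertainty of a single new observation; dropping it gives the narrower confidence interval for the regression line itself.

```latex
\hat{y}_0 = \alpha + \beta x_0, \qquad
s = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \alpha - \beta x_i\right)^2}

\hat{y}_0 \pm t_{n-2,\,0.975}\; s\,\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}
```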