Statistics 6 Flashcards

1
Q

Outline 3 characteristics of explanatory statistics

A
  1. Most powerful form of statistical analysis
  2. Examines the direction of a relationship, i.e. which variable is assumed to cause changes in the other
  3. Strict assumptions
2
Q

What does regression allow us to do with statistics?

A

It allows us to make a numerical prediction of one variable from another, based on how one variable linearly affects the other

3
Q

What does carrying out regression allow us to define?

A

The causal relationship: which variable is independent (x) and which is dependent (y), i.e. which one is assumed to affect the other, and by how much

4
Q

What is the regression line?

A

Numerical description of the line of best fit

5
Q

What are the three assumptions/conditions of a regression line?

A

Continuous data
Parametric data (normally distributed and n > 30)
There will always be some scatter of points around the perfect relationship

6
Q

What is the standard format for expressing a regression line? What are the components of this format?

A

y = a + bx, where a = y-intercept and b = regression coefficient (the gradient of the line)
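A minimal Python sketch of how a and b can be estimated with the standard least-squares formulas (b = sum of cross-products of deviations divided by the sum of squared x-deviations, a = mean y - b × mean x) and then used to predict y. The function names `fit_line` and `predict` and the five data points are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return the least-squares y-intercept a and regression coefficient b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = sum of cross-products of deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x):
    """Predicted y for a given x, using y = a + bx."""
    return a + b * x


xs = [1, 2, 3, 4, 5]  # hypothetical example data
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))    # 2.2 0.6
print(round(predict(a, b, 6), 2))  # 5.8
```

For this data the line is y = 2.2 + 0.6x, so each one-unit increase in x is predicted to raise y by 0.6.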

7
Q

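What criterion is the regression line based on? Explain this criterion.

A

“Least-squares criterion”: the line is chosen so that the sum of the squared vertical distances of the points from the line (the residuals) is as small as possible. Requiring the total distance of points above the line to equal the total distance below it is not enough on its own, because many lines (any line through the mean point of x and y) satisfy that. The least-squares line is the single line among them with the smallest sum of squared distances, i.e. the best fit between all the points.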
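A small Python sketch, using hypothetical data, of why "equal distance above and below" alone does not pick out one line. For this data the least-squares line works out as y = 2.2 + 0.6x; a second, made-up line with gradient 1.0 is forced through the same mean point (3, 4). Both have residuals that sum to zero, but only the least-squares line has the smaller sum of squared residuals.

```python
def residuals(xs, ys, a, b):
    """Vertical distances of each point from the line y = a + bx."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sse(res):
    """Sum of squared residuals (the quantity least squares minimises)."""
    return sum(e ** 2 for e in res)


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
mean_x, mean_y = 3, 4

# Least-squares line for this data: y = 2.2 + 0.6x
best = residuals(xs, ys, 2.2, 0.6)

# A different line through (mean_x, mean_y) with gradient 1.0: y = 1 + 1.0x
other_b = 1.0
other = residuals(xs, ys, mean_y - other_b * mean_x, other_b)

# Both lines balance the points: residual totals are (almost exactly) zero
print(sum(best), sum(other))
# ...but the least-squares line has the smaller sum of squared residuals
print(round(sse(best), 2), round(sse(other), 2))  # 2.4 4.0
```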

8
Q

What two characteristics do we need to look at to determine whether the regression line is useful?

A

Unexplained variance and explained variance

9
Q

What is explained variance?

A

The part of the variation in y that the regression line accounts for. For each point, it is the vertical distance between the regression line (the predicted value) and the mean y-value. The closer the points lie to the regression line, the larger the share of the total variation that is explained.

10
Q

What is unexplained variance?

A

The part of the variation in y that the regression line does not account for. For each point, it is the vertical distance between the point itself and the regression line (the residual). The further the points lie from the regression line, the higher the unexplained variance.
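A Python sketch of the two distances described in the last two cards, using hypothetical data. It works with sums of squared distances (dividing each by its degrees of freedom would turn them into variances) and shows that explained plus unexplained adds up to the total variation of y around its mean.

```python
xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Least-squares line y = a + bx
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x
fitted = [a + b * x for x in xs]

# Explained: regression line vs mean y-value, for each point
explained_ss = sum((f - mean_y) ** 2 for f in fitted)
# Unexplained: point vs regression line (the residuals)
unexplained_ss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
# Total: point vs mean y-value
total_ss = sum((y - mean_y) ** 2 for y in ys)

print(round(explained_ss, 2), round(unexplained_ss, 2), round(total_ss, 2))  # 3.6 2.4 6.0
```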

11
Q

How do you define a useful regression line and a bad regression line based on the explained and unexplained variance?

A

The higher the explained variance and the lower the unexplained variance, the better the regression line, because it accounts for more of the variation in the data. If the reverse is true, the regression line is not very representative, as it explains little of the data.

12
Q

What is another word for the unexplained variance?

A

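Residuals (the unexplained variance is the variance of the residuals, i.e. of the distances of the points from the regression line)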

13
Q

If you are visually determining the usefulness of the regression line, what do you need to do and what is another key component?

A

Compare the heights of the unexplained and explained distances for a point. Measure the vertical distance from the point to the regression line (unexplained), then compare it against the vertical distance from the regression line to the mean y-value at the same x (explained). The other key component is therefore the mean y-value, drawn as a horizontal reference line.

14
Q

What is the F-ratio?

A

A value representing the ratio of the explained variance to the unexplained variance.

15
Q

What is the notation for the explained variance?

A

S (subscript y and superscript 2)

16
Q

What is the notation for the unexplained variance?

A

S (subscript e and superscript 2)

17
Q

What is the best way to remember the different notations for the explained and unexplained variance?

A

Remember that the e in the unexplained variance notation (S with subscript e) stands for error: the unexplained variance is the error (residual) variance. It does not stand for "explained", so the explained variance is the other one (S with subscript y).

18
Q

How do you calculate the F-ratio and what does the result mean?

A

Explained variance divided by unexplained variance. If F > 1, the explained variance is bigger than the unexplained variance; if F < 1, the unexplained variance is bigger. This is used to judge the usefulness of the regression line: the bigger the value of F, the better the regression line.
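A Python sketch of the F-ratio calculation, assuming each variance is a sum of squares divided by its degrees of freedom (1 for the regression with one predictor, n - 2 for the residuals), which is how the Mean Square column of the SPSS ANOVA table is built. The function name `f_ratio` and the data are hypothetical.

```python
def f_ratio(xs, ys):
    """Explained variance / unexplained variance for a simple linear regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    fitted = [a + b * x for x in xs]
    explained_ss = sum((f - mean_y) ** 2 for f in fitted)
    unexplained_ss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    # Assumption: variance = sum of squares / degrees of freedom
    explained_var = explained_ss / 1            # one predictor (x)
    unexplained_var = unexplained_ss / (n - 2)  # two estimated parameters (a, b)
    return explained_var / unexplained_var


print(round(f_ratio([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 2))  # 4.5
```

Here F > 1, so the explained variance outweighs the unexplained variance; points lying much closer to a straight line would give a far larger F.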

19
Q

What is the coefficient of explanation?

A

It measures what proportion of the variance in y can be explained by the variance in x (explained variance as a proportion of the total variance)

20
Q

What notation is used for the coefficient of explanation?

A

r² (r squared)

21
Q

What would a coefficient of explanation that is 0.85 mean?

A

That 85% of the variance in y can be explained by the variance in x (the remaining 15% is unexplained)

22
Q

What can the coefficient of explanation range between and what does the sign of this value mean? What calculation is it similar to?

A

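0 to 1. Because it is a squared value it can never be negative, so it has no sign and does not show the direction of the relationship. It is the square of the Pearson product-moment correlation coefficient (PMCC, r), which ranges from -1 to +1 and whose sign (positive or negative) does indicate the direction of the relationship.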
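A Python sketch with hypothetical data showing that the sign belongs to the PMCC (r), not to the coefficient of explanation (r²): reversing the y-values flips r from positive to negative, while r² stays the same. The helper name `pmcc` is hypothetical.

```python
import math


def pmcc(xs, ys):
    """Pearson product-moment correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]    # hypothetical data
up = [2, 4, 5, 4, 5]    # y tends to rise with x
down = [5, 4, 5, 4, 2]  # same values reversed: y tends to fall with x

for ys in (up, down):
    r = pmcc(xs, ys)
    print(round(r, 3), round(r ** 2, 3))
# 0.775 0.6
# -0.775 0.6
```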

23
Q

Using the knowledge of what the coefficient of explanation value means, what does this mean about the explained and unexplained variance?

A

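A high value means that the explained variance is high relative to the unexplained variance (r² is the explained variance as a proportion of the total variance); a low value means the unexplained variance dominates.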

24
Q

What are the two sources of error in regression models?

A
  1. Standard error of the residuals: the scatter of the points around the regression line; the more widely scattered the residuals, the larger this error
  2. Sampling error: uncertainty in the characteristics of the regression line itself (the y-intercept and the regression coefficient, i.e. the gradient), because they are estimated from a sample
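A Python sketch of both sources of error for a simple linear regression on hypothetical data, using the usual textbook formulas: the standard error of the residuals is the square root of (sum of squared residuals / (n - 2)), and the sampling error of the gradient and intercept is expressed as their standard errors.

```python
import math

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
a = mean_y - b * mean_x
res = [y - (a + b * x) for x, y in zip(xs, ys)]

# 1. Standard error of the residuals (scatter around the line)
se_resid = math.sqrt(sum(e ** 2 for e in res) / (n - 2))
# 2. Sampling error in the line itself
se_b = se_resid / math.sqrt(sxx)                        # gradient (regression coefficient)
se_a = se_resid * math.sqrt(1 / n + mean_x ** 2 / sxx)  # y-intercept

print(round(se_resid, 3), round(se_b, 3), round(se_a, 3))  # 0.894 0.283 0.938
```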
25
Q

What is uncertainty/error in the y-intercept of the regression line represented as?

A

Two lines parallel to the regression line, one either side of it and equidistant from it, showing that the intercept (and so the whole line) could be higher or lower

26
Q

What is uncertainty/error in the regression coefficient of the regression line represented as?

A

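Two lines that pivot through the same point on the regression line (the mean point of x and y, which the regression line always passes through), one steeper and one shallower than the regression line, showing that the gradient could be larger or smaller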

27
Q

What is the combined uncertainty/error in the y-intercept and regression coefficient (total sampling error) represented as?

A

Two curved lines (confidence bands), one either side of the regression line, which are closest to the line at the mean of x and curve away from it towards both ends of the line
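A Python sketch, on hypothetical data, of why the combined sampling error forms curved bands. The standard error of the fitted line at a point x0 combines the intercept and gradient uncertainty as s_e × sqrt(1/n + (x0 - mean x)² / Sxx), which is smallest at the mean of x. A confidence band is usually drawn at the fitted value plus or minus a critical t-value (n - 2 degrees of freedom) times this standard error; the sketch leaves out the t-value and only shows how the width changes along x.

```python
import math

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
a = mean_y - b * mean_x
se_resid = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))


def se_fitted(x0):
    """Standard error of the fitted line at x0 (intercept + gradient uncertainty)."""
    return se_resid * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)


for x0 in xs:
    print(x0, round(se_fitted(x0), 3))  # narrowest at x0 = 3, the mean of x
```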

28
Q

Is the sampling error normally represented on a plot with the regression line?

A

Yes

29
Q

What is the purpose of the standardized residual calculation?

A

To identify any unusual residuals (points lying unusually far from the regression line)
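A Python sketch of standardized residuals, taken here as each residual divided by the standard error of the residuals. The data are hypothetical: nine points follow y = 2x + 1 exactly and one (x = 5) has been pushed well above the pattern. The cut-off of 2 is an assumed rule of thumb; some sources and software use 3 instead.

```python
import math


def standardized_residuals(xs, ys):
    """Residuals from the least-squares line, in units of the residual standard error."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    a = mean_y - b * mean_x
    res = [y - (a + b * x) for x, y in zip(xs, ys)]
    se_resid = math.sqrt(sum(e ** 2 for e in res) / (n - 2))
    return [e / se_resid for e in res]


xs = list(range(1, 11))
ys = [2 * x + 1 for x in xs]
ys[4] = 20  # hypothetical unusual value at x = 5 (the other points' pattern gives 11)

z = standardized_residuals(xs, ys)
# Assumed rule-of-thumb threshold: flag |z| > 2
unusual = [i for i, value in enumerate(z) if abs(value) > 2]
print(unusual)  # [4]
```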

30
Q

What is homoscedasticity?

A

The residuals have a consistent spread across the x-axis, i.e. their variation does not noticeably increase or decrease across the range of x. This is important because regression assumes it when predicting y from x.

31
Q

What is autocorrelation?

A

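Where the residuals are not independent of each other, e.g. each residual is correlated with the one before it (common when data are collected in a time or spatial sequence)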

32
Q

What is the problem of autocorrelation?

A

Regression assumes there is no autocorrelation, i.e. that each residual is independent of the others. If autocorrelation is present, this assumption is broken and the results of the regression may be misleading.

33
Q

What is the Durbin-Watson statistic?

A

A numerical statistic for detecting autocorrelation between consecutive residuals; it ranges from 0 to 4
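A Python sketch of the Durbin-Watson statistic: the sum of squared differences between consecutive residuals divided by the sum of squared residuals. It assumes the residuals are listed in the order the data were collected (for example in time), since each residual is compared with the one before it. Both residual sequences are made up to show the two extremes.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in collection order (range 0 to 4)."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residual sequences
smooth = [1.0, 0.8, 0.5, 0.1, -0.3, -0.6, -0.9, -1.0]       # neighbours alike
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # neighbours opposite

print(round(durbin_watson(smooth), 2))       # about 0.15: well below 2, positive autocorrelation
print(round(durbin_watson(alternating), 2))  # 3.5: well above 2, negative autocorrelation
```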

34
Q

What is the coefficient of explanation statistic represented by in SPSS?

A

“R Square” (in the Model Summary table)

35
Q

What is the t-value in SPSS?

A

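The t-statistic for testing whether the regression coefficient is significantly different from zero (the coefficient divided by its standard error); its significance level is shown in the Sig. column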
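A Python sketch of how the t-value for the regression coefficient is formed, on hypothetical data: the estimated gradient b divided by its standard error. Turning t into a significance level needs the t-distribution with n - 2 degrees of freedom, which this sketch does not compute.

```python
import math

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
a = mean_y - b * mean_x
se_resid = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
se_b = se_resid / math.sqrt(sxx)  # standard error of the regression coefficient

t = b / se_b  # tests whether the regression coefficient differs from 0
print(round(t, 2))  # 2.12
```

With a single predictor, t squared equals the F-ratio for the same data (here 2.12² ≈ 4.5).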

36
Q

How do you interpret the Durbin-Watson statistics?

A

Close to 2 means no autocorrelation
Well below 2 (towards 0) means positive autocorrelation
Well above 2 (towards 4) means negative autocorrelation