Statistics: Regression analysis Flashcards

1
Q

Give another name for a regression equation

A

A prediction equation

2
Q

What is meant by a model in statistics?

A

A simple approximation of how variables are related in a population.

3
Q

What is meant by a conditional distribution?

A

The probability distribution of y values at a fixed value of x.

4
Q

When you have several quantitative variables, what can software provide?

A

A correlation matrix
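As a rough sketch of what such software computes (the variable names and values below are hypothetical, made up for illustration), each entry of the matrix is the Pearson correlation between one pair of variables, so the diagonal entries are 1 and the matrix is symmetric:

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation: sample covariance divided by the product of the sample SDs."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

def correlation_matrix(data):
    """Return {(name_i, name_j): r} for every pair of variables in `data`."""
    names = list(data)
    return {(a, b): pearson_r(data[a], data[b]) for a in names for b in names}

# Hypothetical data: three quantitative variables measured on the same five subjects.
data = {
    "height": [160, 165, 170, 175, 180],
    "weight": [55, 60, 66, 70, 78],
    "age":    [30, 25, 35, 28, 40],
}

matrix = correlation_matrix(data)
names = list(data)
for row in names:
    # One printed row per variable, one column per variable.
    print(row.ljust(8), " ".join(f"{matrix[(row, col)]:6.2f}" for col in names))
```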

5
Q

Describe the relationship between x and y in terms of their distances from their means

A

If an x value is a certain number of standard deviations from its mean, the predicted y value is r times that many standard deviations from its mean.
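A minimal check of this property, using hypothetical data and the least-squares line (slope b = r·s_y/s_x, intercept a = ȳ - b·x̄): for any x, the z-score of the prediction ŷ equals r times the z-score of x.

```python
from statistics import mean, stdev

# Hypothetical sample data (made up for illustration).
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 7]

mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
r = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / ((len(x) - 1) * sx * sy)

# Least-squares line: slope b = r * sy / sx, intercept a = mean(y) - b * mean(x).
b = r * sy / sx
a = my - b * mx

def z_scores(x_new):
    """Return (z-score of x_new, z-score of its prediction y_hat)."""
    y_hat = a + b * x_new
    return (x_new - mx) / sx, (y_hat - my) / sy

z_x, z_yhat = z_scores(6)
print(f"r = {r:.3f}: x is {z_x:.2f} SDs above its mean, y_hat is {z_yhat:.2f} SDs above its mean")
```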

6
Q

Show two forms of prediction error

A
  • The error using the regression line to make a prediction (y - ŷ)
  • The error using the mean of y to make a prediction (y - ȳ, where ȳ is the mean of y)
7
Q

What formula do you use to summarize the size of the errors in each scenario?

A

The total sum of squares, TSS = Σ(y - ȳ)², for predictions using the mean ȳ of y; the residual sum of squares, SSE = Σ(y - ŷ)², for predictions using the regression equation.
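A minimal sketch computing both sums for a small hypothetical dataset: the prediction errors y - ȳ (using the mean) and y - ŷ (using the least-squares line) are each squared and summed.

```python
from statistics import mean

# Hypothetical data (made up for illustration).
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 7]

mx, my = mean(x), mean(y)
# Least-squares slope and intercept.
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx
y_hat = [a + b * xi for xi in x]

# Prediction errors using the mean of y, and using the regression line.
errors_mean = [yi - my for yi in y]
errors_line = [yi - yh for yi, yh in zip(y, y_hat)]

tss = sum(e ** 2 for e in errors_mean)   # total sum of squares
sse = sum(e ** 2 for e in errors_line)   # residual sum of squares
print(f"TSS = {tss:.3f}, SSE = {sse:.3f}")
```

Because the least-squares line minimises the sum of squared errors, SSE can never exceed TSS.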

8
Q

How do we eliminate the dependence on units when summarising errors?

A

By calculating the proportional reduction in error, r² = (TSS - SSE)/TSS, which has no units.
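A minimal sketch, assuming hypothetical data recorded once in metres and once in centimetres: TSS and SSE change with the units, but their ratio does not, and for a straight-line fit the result equals the square of the correlation r.

```python
from statistics import mean, stdev

def proportional_reduction_in_error(x, y):
    """r^2 = (TSS - SSE) / TSS for the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    tss = sum((yi - my) ** 2 for yi in y)                          # errors using the mean of y
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))    # errors using the line
    return (tss - sse) / tss

# Hypothetical data (made up for illustration).
x = [1, 2, 3, 4, 5, 6]
y_metres = [2, 4, 5, 4, 6, 7]
y_cm = [100 * v for v in y_metres]   # same data in different units

r2_m = proportional_reduction_in_error(x, y_metres)
r2_cm = proportional_reduction_in_error(x, y_cm)

# For a straight-line fit this equals the square of the correlation r.
mx, my = mean(x), mean(y_metres)
r = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y_metres)) / (
    (len(x) - 1) * stdev(x) * stdev(y_metres))
print(f"r^2 (metres) = {r2_m:.4f}, r^2 (cm) = {r2_cm:.4f}, r*r = {r * r:.4f}")
```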

9
Q

What does it mean if r² = 0.6?

A

The sum of squared errors using ŷ to predict y is 60% smaller than the sum of squared errors using the mean of y to predict y.

10
Q

Name two disadvantages of r²

A
  • It is easier to interpret a value on the original scale (as r is) than on a squared scale
  • The direction of the relationship is lost: r = 0.7 and r = -0.7 both give r² = 0.49
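A minimal illustration with hypothetical data: reversing the direction of the relationship flips the sign of r but leaves r² unchanged.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((len(x) - 1) * stdev(x) * stdev(y))

# Hypothetical data (made up for illustration).
x = [1, 2, 3, 4, 5, 6]
y_up = [2, 4, 5, 4, 6, 7]
y_down = [-v for v in y_up]   # same data with the direction of the relationship reversed

r_up, r_down = pearson_r(x, y_up), pearson_r(x, y_down)
print(f"r = {r_up:.3f} vs {r_down:.3f}; r^2 = {r_up ** 2:.3f} vs {r_down ** 2:.3f}")
```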
11
Q

Name two factors, apart from outliers, that have a strong influence on the size of the correlation

A
  • If the observations are made on groups (for example, group averages) rather than on individual subjects, the correlation tends to increase in magnitude
  • The size of the correlation depends on the range of x values sampled: the correlation tends to be smaller when only a restricted range of x values is sampled than when the entire range is used
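A simulation sketch of both effects, assuming hypothetical data generated as y = x + normal noise (the seed, sample size, grouping, and cut-offs are arbitrary choices): correlating group averages gives a larger r than correlating individuals, and keeping only x between 4 and 6 gives a smaller r.

```python
import random
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((len(x) - 1) * stdev(x) * stdev(y))

# Hypothetical simulated data: y = x + normal noise with SD 2.
random.seed(1)
x = [random.uniform(0, 10) for _ in range(2000)]
y = [xi + random.gauss(0, 2) for xi in x]
r_full = pearson_r(x, y)

# Grouped data: average x and y within groups defined by the integer part of x.
groups = {}
for xi, yi in zip(x, y):
    groups.setdefault(int(xi), []).append((xi, yi))
r_grouped = pearson_r([mean(p[0] for p in g) for g in groups.values()],
                      [mean(p[1] for p in g) for g in groups.values()])

# Restricted range: keep only subjects whose x lies between 4 and 6.
kept = [(xi, yi) for xi, yi in zip(x, y) if 4 <= xi <= 6]
r_restricted = pearson_r([p[0] for p in kept], [p[1] for p in kept])

print(f"individuals: {r_full:.2f}, group means: {r_grouped:.2f}, restricted x: {r_restricted:.2f}")
```

Averaging within groups removes most of the individual noise, while restricting x shrinks the spread of x relative to that noise.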
12
Q

What is the name given to the error of making predictions about individuals based on the summary results of groups?

A

Ecological fallacy

13
Q

State the basic assumption for using a regression line for description

A

The population means of y at different values of x have a straight-line relationship with x of the form μ_y = α + βx.

14
Q

State the extra assumptions for using regression to make statistical inference

A
  • The data were gathered using randomisation (e.g. random sampling)
  • The population values of y at each value of x follow a normal distribution, with the same standard deviation at each x value

15
Q

What is the null hypothesis stating that x and y are statistically independent?

A

H0: β = 0
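A minimal sketch of the usual t statistic for testing H0: β = 0, assuming the hypothetical data below: t = b / se, with se = s / √Σ(x - x̄)² and s = √(SSE / (n - 2)). The p-value would then come from a t distribution with n - 2 degrees of freedom (via a table or statistics software; Python's standard library has no t distribution, so this sketch stops at t and df).

```python
from math import sqrt
from statistics import mean

# Hypothetical data (made up for illustration).
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 7]

n = len(x)
mx, my = mean(x), mean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx   # estimated slope
a = my - b * mx                                                # estimated intercept

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
df = n - 2                     # degrees of freedom: two parameters estimated
s = sqrt(sse / df)             # residual standard deviation
se_b = s / sqrt(sxx)           # standard error of the slope

t = (b - 0) / se_b             # test statistic for H0: beta = 0
print(f"b = {b:.3f}, se = {se_b:.3f}, t = {t:.2f}, df = {df}")
```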

16
Q

What does a small p-value mean in this context?

A

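There is strong evidence that the population regression line has a nonzero slope (β ≠ 0), i.e. that x and y are not statistically independent.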

17
Q

How do you find the degrees of freedom (df) for this test and for the confidence interval?

A

df = n - 2, where n is the sample size (two parameters, α and β, are estimated)

18
Q

When may a confidence interval for β not have a useful interpretation? What do we do in this situation?

A

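When a one-unit increase in x is very small or very large relative to the range of x values. We then estimate the effect for an increase in x that is a more relevant portion of the actual range of x values, by multiplying the endpoints of the confidence interval for β by the size of that increase.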
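A minimal sketch, assuming hypothetical income data in dollars (where a $1 increase is too small to be meaningful) and a 95% interval: the interval for the effect of a c-unit increase in x is c times each endpoint of the interval for β. The critical value 2.776 is the two-sided 95% t value for df = 4, taken from a standard t table.

```python
from math import sqrt
from statistics import mean

# Hypothetical data: x = annual income in dollars, y = a made-up score.
x = [21000, 25500, 30200, 34800, 41000, 47500]
y = [2.1, 2.9, 3.0, 3.8, 4.1, 5.0]

# Two-sided 95% t critical value for df = n - 2 = 4.
T_CRIT = 2.776

def slope_ci(x, y, t_crit):
    """Confidence interval b ± t_crit * se for the slope of the least-squares line."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se = sqrt(sse / (len(x) - 2)) / sqrt(sxx)
    return b - t_crit * se, b + t_crit * se

lo, hi = slope_ci(x, y, T_CRIT)   # effect of a $1 increase: tiny, hard to interpret
step = 10000                      # a more relevant increase in x
print(f"per $1: ({lo:.6f}, {hi:.6f}); per ${step}: ({step * lo:.3f}, {step * hi:.3f})")
```

Measuring x in units of $10,000 instead gives the same interval as multiplying the per-dollar endpoints by 10,000.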