Week 3: Regression Flashcards

1
Q

What is needed for simple linear regression? - (3)

A
  • What sort of measurement = DV (outcome variable)
  • How many predictors = 1
  • What type of predictor variable = continuous
2
Q

Regression is a way of predicting things you have not measured - (2)

A

Predicting an outcome variable from one predictor variable. OR
Predicting a dependent variable from one independent variable.

3
Q

Regression predicts variable y from

A

variable x

4
Q

In regression

when we know that x should

A

influence y (instead of y influencing x)

5
Q

Regression is used to create a linear model of the relationship between

A

two variables

6
Q

Regression differs from correlation in that it adds a

A

constant b0 (the intercept)

7
Q

In regression we create a model to predict y - (2)

A
8
Q

Regression equation

A
Yi = b0 + biXi + εi

(outcome = intercept + regression coefficient × predictor + error)
9
Q

In regression we test how good the model we created to predict y is

A

good at fitting the data

10
Q

The straight-line equation in the regression model has 2 parameters - (2)

A

The gradient (describing how the outcome changes for a unit increment of the predictor)

The intercept (of the vertical axis), which tells us the value of the outcome variable when the predictor is zero

11
Q

Yi in regression equation means

A

outcome variable e.g., album sales

12
Q

b0 in regression equation means

A

intercept

13
Q

bi in regression equation means - (2)

A

Regression coefficient for predictor

e.g., direction and strength of the relationship between advertising budget & album sales

14
Q

εi in regression equation means - (2)

A

error

e.g., the error in album sales not explained by advertising budget

15
Q

Xi in regression equation means - (2)

A

Predictor variable

e.g., advertising budget

16
Q

Example of using simple linear regression equation to predict values - (3)

A
  • Imagine you want to spend £5 on advertising, so you substitute that value into the regression equation.
  • Based on the model, we can then predict that if we spend £5 on advertising we will sell 550 albums.
  • The error term shows this prediction will not be perfect, as there is always a margin for error. The outcome variable estimated this way is also known as the predicted value in a regression (see the sketch below).
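
A minimal numeric sketch of this kind of prediction in Python. The intercept and slope below are made-up values chosen only so that £5 of advertising predicts 550 album sales; the card does not give the real model parameters.

# Hypothetical simple linear regression model: sales = b0 + b1 * advertising
b0 = 50.0     # intercept (assumed for illustration)
b1 = 100.0    # slope (assumed for illustration)

advertising = 5.0                        # spend £5 on advertising
predicted_sales = b0 + b1 * advertising
print(predicted_sales)                   # 550.0: the predicted value

observed_sales = 520.0                   # a made-up observed value
residual = observed_sales - predicted_sales
print(residual)                          # -30.0: the error the model does not explain
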
17
Q

The closer the sum of squares of the model (SSM) is to the total sum of squares (SST) of the data,

A

the better the model accounts for the data, and the smaller the residual sum of squares (SSR) must be

18
Q

Formula of Sum of Squares of Model (SSM)

A

SSM = SST (total) - SSR (residual)
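
A small Python sketch of how the three sums of squares relate. The data are made up, and np.polyfit is used here only to supply a least-squares line.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # made-up predictor
y = np.array([10.0, 12.0, 15.0, 19.0, 24.0])  # made-up outcome

b1, b0 = np.polyfit(x, y, 1)                  # least-squares slope and intercept
y_hat = b0 + b1 * x                           # the model's predictions

sst = np.sum((y - y.mean()) ** 2)             # SST: observed data vs the mean of Y
ssr = np.sum((y - y_hat) ** 2)                # SSR: observed data vs the model
ssm = np.sum((y_hat - y.mean()) ** 2)         # SSM: model vs the mean of Y

print(round(ssm, 3), round(sst - ssr, 3))     # both 122.5, i.e. SSM = SST - SSR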

19
Q

SST (total) uses the

A

difference between the observed data and the mean value of Y

20
Q

Sum of squares (residual) uses the difference between the

A

observed data and the model

21
Q

Sum of squares model uses the difference between

A

mean value of Y and the model

22
Q

In simple linear regression, R^2 is the proportion of variance in DV (outcome variable; Y) that is explained

A

by IV (predictor variable X) in regression model

23
Q

The R squared (Pearson's correlation coefficient squared) is the

A

coefficient of determination

24
Q

R^2 gives you the overall fit of the model, thus the

A

model summary

25
Q

Adjusted R squared tells

A

how well R squared generalises to the population

26
Q

Adjusted R squared indicates how well a

A

predictor variable explains the variance in the outcome variable, but adjusts the statistic based on the number of independent variables in the model.

27
Q

Adjusted R squared will always be lower than or equal to the R^2 value because…

A

It’s a more conservative statistic for how much variance in the outcome variable the predictor variable explains
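
For reference, the usual adjustment is the standard formula adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors; this formula is assumed here, not quoted from the card. A quick sketch, using for illustration the R^2 = .86 and the n = 120 implied by the SPSS example later in this deck (df = 1, 118):

# Standard adjusted R-squared formula (assumed, not taken from the card)
def adjusted_r_squared(r_squared, n, k):
    # n = number of observations, k = number of predictors
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(adjusted_r_squared(0.86, 120, 1))   # about 0.859, slightly below R squared = 0.86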

28
Q

If you add more useful variables, adjusted R squared will

A

increase

29
Q

If you add more and more useless variables to the model, what will happen to adjusted R squared?

A

adjusted R squared will decrease

30
Q

How to calculate R squared in simple linear regression?

A

R^2 = SSM/SST = (SST - SSR)/SST
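
Continuing the made-up sums-of-squares example from earlier in the deck, R^2 computed as SSM/SST equals the squared Pearson correlation:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 12.0, 15.0, 19.0, 24.0])
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y - y_hat) ** 2)

r_squared = (sst - ssr) / sst                  # SSM / SST
print(round(r_squared, 4))                     # 0.9722
print(round(np.corrcoef(x, y)[0, 1] ** 2, 4))  # 0.9722: the same value as r squared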

31
Q

R squared gives the ratio of

A

explained variance (SSM) to total variance (SST)

32
Q

The F ratio tests whether the line is better than the mean, i.e. whether the

A

overall model (fitted regression line) is a good fit

33
Q

What is the mean squared error? (3)

A
  • Sums of squares (SS) are total values
  • They can be expressed as averages
  • These averages are called mean squares (MS)
34
Q

SSM divided by its DF gives

A

mean squares for the model (MSM)

35
Q

SSR divided by DF gives

A

mean squares for the residuals (MSR)

36
Q

What is the calculation of the F ratio?

A

MSM/MSR
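
Extending the same made-up example: with one predictor the model df is 1 and the residual df is n - 2, so the F ratio falls out as follows.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 12.0, 15.0, 19.0, 24.0])
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

ssm = np.sum((y_hat - y.mean()) ** 2)
ssr = np.sum((y - y_hat) ** 2)

df_model = 1               # number of variables (predictors) in the model
df_residual = len(y) - 2   # observations minus number of parameters (b0 and b1)

msm = ssm / df_model       # mean squares for the model
msr = ssr / df_residual    # mean squares for the residuals
f_ratio = msm / msr
print(round(f_ratio, 2))   # 105.0: the model explains far more than is left in the residuals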

37
Q

F ratio measures the ratio of

A

MSM to MSR

38
Q

If the model is good then the

A

model will account for a large portion of the variance (MSM) compared with what is left over, the residuals (MSR), so the F ratio will be large

39
Q

Diagram of SST/SSM/SSR labelled for ANOVA

A
40
Q

The DF in SSM/DF represents

A

number of variables (predictors) in the model

41
Q

The DF in SSR/DF represents

A

number of observations minus the number of parameters

42
Q

What are the line coefficients in this regression output for the model parameters of the regression line?

A
  • The line coefficients are the intercept b0 and the slope bi
43
Q

What is bi (the slope) in this regression output for the model parameters of the regression line?

A
  • the change in the outcome associated with a unit change in the predictor
44
Q

What is the standard error in this regression output for the model parameters of the regression line?

A

indicates how far off you would be, on average, if you were to use the independent variable and the model to predict scores on the dependent variable

45
Q

Where is beta in output of simple linear regression for model parameters of regression line?

A

beta = r

the standardised coefficient gives the correlation coefficient in simple regression
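
A quick numeric check of this fact with made-up data: standardise both variables and the least-squares slope equals Pearson's r.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 12.0, 15.0, 19.0, 24.0])

zx = (x - x.mean()) / x.std(ddof=1)   # standardised predictor (z-scores)
zy = (y - y.mean()) / y.std(ddof=1)   # standardised outcome (z-scores)

standardised_beta = np.polyfit(zx, zy, 1)[0]      # slope of the standardised regression
r = np.corrcoef(x, y)[0, 1]                       # Pearson's correlation coefficient
print(round(standardised_beta, 3), round(r, 3))   # both 0.986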

46
Q

What does this part of simple linear regression output show for model parameters of regression line?

A

t statistic and associated p-value

47
Q

Regression line looks at the

A

the variance that we cannot explain vs the variance we can explain with the model

48
Q

Assumptions of linear regression - (7)

A
  1. Variable type = the outcome must be continuous and the predictors can be continuous or dichotomous
  2. Non-zero variance = the predictors must not have zero variance
  3. Independence = all values of the outcome should come from a different person
  4. Linearity = the relationship we model is, in reality, linear
  5. Homoscedasticity = for each value of the predictors, the variance of the error term should be constant
  6. Independent errors = for any pair of observations, the error terms should be uncorrelated (see the Durbin-Watson test and the sketch after this list)
  7. Normally distributed errors
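
A hedged sketch of how two of these assumptions might be checked in Python, assuming statsmodels and matplotlib are available (the data and variable names are made up):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=100)                   # made-up continuous predictor
y = 2.0 + 0.5 * x + rng.normal(size=100)   # made-up continuous outcome

model = sm.OLS(y, sm.add_constant(x)).fit()

# Independent errors: a Durbin-Watson statistic near 2 suggests uncorrelated residuals
print(durbin_watson(model.resid))

# Homoscedasticity: residuals vs predicted values should occupy all four quarters
# of the plot rather than fanning out into a cone
plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0)
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.show()
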
49
Q

Diagram of a non-linear relationship in simple linear regression

A
50
Q

Diagram of good vs bad homoscedasticity on a plot of residual values (vertical axis) against predicted values

A

Good = all data points occupy all four quarters of the plot
Bad = the residuals look like a cone

51
Q

What does homoscedasticity mean?

A

“having the same scatter.”

52
Q

Diagram of heteroscedasticity and homoscedasticity

A
53
Q

Another diagram of homoscedasticity and heteroscedasticity

A

heteroscedasticity -> points higher on the x-axis have larger variance than those lower down; points are at widely varying distances from the regression line

54
Q

Diagram of good and bad normality of errors = frequency histograms of residuals

A

Bad histogram = positively skewed

55
Q

Correlation does not mean

A

causation, even if the correlation makes sense

56
Q

Spurious correlations can occur when an

A

unknown variable could drive the effect

57
Q

Example that correlation does not mean causation

A

relationship between the predictor variable - visits to the pub - and the outcome variable - exam score. There is a correlation between the two variables, but would we really think that more visits to the pub would cause better exam performance? Perhaps there was a third variable that might explain the link? Maybe there was a support session on statistics that was held between 4 and 5pm in a building next to a Pub?

58
Q

Example of a simple linear regression question is

A

Do poverty levels predict the number of teen births?

59
Q

Example of simple linear regression:

Do poverty levels predict the number of teen births?

What is x and y? - (2)

A

x = poverty rate, which is the percent of the state’s population living in households with incomes below the federally defined poverty level.

y = year 2002 birth rate per 1000 females 15 to 17 years old

60
Q

Example of simple linear regression:

Do poverty levels predict the number of teen births?
x = poverty rate, which is the percent of the state’s population living in households with incomes below the federally defined poverty level.
y = year 2002 birth rate per 1000 females 15 to 17 years old

What is H0 and H1? - (2)

A

H0: The slope equals 0, i.e. poverty levels do not predict teen birth rate

H1: The slope is different than 0, i.e. poverty levels predict teen birth rate

61
Q

Example of simple linear regression:

Do poverty levels predict the number of teen births?
x = poverty rate, which is the percent of the state’s population living in households with incomes below the federally defined poverty level.
y = year 2002 birth rate per 1000 females 15 to 17 years old

What does its fitted model y = 4.267 + 1.373x mean? - (2)

A

The slope (B1 = 1.373) indicates that the 15 to 17 year old birth rate increases by 1.373 units, on average, for each one unit (one percent) increase in the poverty rate.

The intercept (B0 = 4.267) means that if there were states with a poverty rate of 0, the predicted average 15 to 17 year old birth rate would be 4.267 for those states.
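
A one-line check of a prediction from this fitted model (the 10% poverty rate is just an illustrative input, not from the card):

poverty_rate = 10.0                          # illustrative value: a 10% poverty rate
birth_rate = 4.267 + 1.373 * poverty_rate    # fitted model: y = 4.267 + 1.373x
print(round(birth_rate, 2))                  # 18.0 births per 1000 females aged 15 to 17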

62
Q

Correlation vs regression

In correlation - (5)

A
  1. Does not imply causation
  2. All we can say is that two variables are related/associated
  3. X and y can be swapped
  4. One outcome value
  5. No regression line on scatterplots!
63
Q

Correlation vs regression

In regression - (4)

A
  1. Independent variable influences the dependent (outcome) variable
  2. X and y cannot be swapped!
  3. Has a model: equation to allow predictions outside of current measurements
  4. Regression line of the model on a scatterplot
64
Q

If p in the SPSS output is 0.000 we report it as

A

p < 0.001 as p is never 0

65
Q

What does this SPSS simple linear regression output show?

A

Our model is significantly better at predicting the data than the null model (F (1, 118) = 729.43, p<.001) and explains 86% of the variance in our data (R2=.86)

66
Q

What does this SPSS output show? - (2)

A

y = 3.19x + 391.67

Our model is significantly better at predicting the data than the null model (F (1, 118) = 729.43, p<.001) and explains 86% of the variance in our data (R2=.86). For every 1 unit increase in alcohol there is a 3.19 unit increase in brake reaction time (B = 3.19, t = 27.01, p<.001)
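
For anyone working outside SPSS, a hedged sketch of where the same quantities come from in Python with statsmodels (the data are simulated here, so the printed values will not match the card's output):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
alcohol = rng.uniform(0, 10, size=120)                           # made-up predictor
reaction = 391.67 + 3.19 * alcohol + rng.normal(0, 2, size=120)  # made-up outcome

results = sm.OLS(reaction, sm.add_constant(alcohol)).fit()

print(results.rsquared, results.rsquared_adj)   # R squared and adjusted R squared (model summary)
print(results.fvalue, results.f_pvalue)         # F ratio and its p-value (ANOVA table)
print(results.params)                           # b0 (constant) and b1 (slope)
print(results.tvalues, results.pvalues)         # t statistics and p-values for the coefficients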

67
Q

Which of the following statements about Pearson's correlation coefficient is not true?

A.It can only be used with continuous variables
B.It can be used as an effect size measure
C.It varies between –1 and +1
D.A correlation coefficient of zero indicates there is no relationship between the variables

A

A - via the biserial and point-biserial correlations, the Pearson correlation coefficient can be used with binary and categorical variables

68
Q
A
69
Q
A
70
Q

A psychologist was interested in whether the amount of news people watch (minutes per day) predicts how depressed they are (from 0 = not depressed to 7 = very depressed). What does the standardized beta tell us in the output?

A - As news exposure decreases by 0.224 standard deviations, depression increases by 1 standard deviation

B - As news exposure increases by 1 minute, depression decreases by 0.224 units

C – As news exposure decreases by 0.224 minutes, depression increases by 1 unit

D - As news exposure increases by 1 standard deviation, depression decreases by 0.224 of a standard deviation

A

D - the standardised beta coefficient for news exposure is -0.224: as news exposure increases by 1 standard deviation, depression decreases by 0.224 of a standard deviation

71
Q
A
72
Q

A psychologist was interested in whether the amount of news people watch predicts how depressed they are.
In this table, what does the value 4.404 represent?

A - The ratio of how much the prediction of depression has improved by fitting the model, compared to how much variability there is in depression scores

B - The ratio of how much error there is in the model, compared to how much variability there is in depression scores

C - The proportion of variance in depression explained by news exposure

D - The ratio of how much the prediction of depression has improved by fitting the model, compared to how much error still remains

A

D

The F ratio is the improvement in prediction from fitting the model (MSM) relative to the error that still remains (MSR). The other options are wrong because they refer to how much variability there is in depression scores: we do not measure the variability of the population, only of the observed sample, so those options would only have been accurate if they had specified the variability "of the sample".

73
Q

The coefficient of determination:

A.Is the square root of the variance
B.Is a measure of the amount of variability in one variable that is shared by the other variable
C.Is the square root of the correlation coefficient
D.Indicates whether the correlation coefficient is significant

A

B

The proportion of the variation in the outcome variable (Y) that is predictable from the predictor variable (X).
A measure of how much variability in one variable can be “explained by another”.
R² shows how well terms (data points) fit a model curve or line.
An R² value of 0.78 indicates that 78% of the variation in Y is determined by the relationship between Y and X.

74
Q

The correlation between 2 variables A and B is 0.12 with a significance of p < 0.01.

What can we conclude?

A. there is a small relationship between A and B
B. There is a substantial relationship between A and B
C. That variable A causes variable B
D. That variable B causes variable A

A

A -

+/- 0.1 represents a small effect, +/- 0.3 a medium effect, and +/- 0.5 a large effect

75
Q

The table below contains scores from 6 people on 2 different scales that measure attitude towards reality TV shows

Using the scores above, the scales are likely to

A. correlate positively
B. correlate negatively
C. be uncorrelated

A

A - high scores on one scale tend to go with high scores on the other, and low scores on one also correspond with low scores on the other

76
Q

A Pearson's correlation coefficient of -0.5 would be represented by a scatterplot in which

A. There is a moderately good fit between the regression line and the individual data points on the scatterplot

B. Half of the data points sit perfectly on the line

C. The regression line slopes upwards

D. The data cloud looks like a circle and the regression line is flat

A

A

77
Q

If two variables are significantly correlated, r = 0.67,
then…

A. share variance
B. No unique variance
C. Relationship is weak
D. Variables are independent

A

A - they share variance; not D, as correlated variables are not independent

78
Q

What do the results in table below show?

A. In a sample of 100 people, there was a strong negative relationship between work productivity and time spent on Facebook, r = -.94, p < .001

B. In a sample of 100 people, there was a weak negative relationship between work productivity and time spent on Facebook, r = -.94, p < .001

C. In a sample of 100 people, there was a non-significant negative relationship between work productivity and time spent on Facebook, r = -.94, p < .001

A

A

79
Q

A Pearson's correlation of -.71 was found between number of hours spent at work and energy levels in a sample of 300 participants. Which of the following conclusions can be drawn from this finding?

A. There was a strong negative relationship between the number of hours spent at work and energy levels

B. Spending more time at work caused participants to have less energy

C. Amount of time spent at work accounted for 71% of the variance in energy levels

D. The estimate of the correlation will be imprecise

A

A

80
Q

Example of simple linear regression:

A child psychologist was interested in whether playing video games was associated with child aggression. She collected data on 666 children and adolescents. She recorded how long each week children spent playing video games, and then rated how aggressive the children were in a social situation.

The variables are:
VideoGames: hours spent playing video games per week
Aggression: rating of child aggressiveness (higher scores indicate increased aggression)

Report R^2 to 2 DP

Report DF

Report p value

Report Adjusted R squared - (4)

A
  • R squared = 0.03
  • DF = 1,664 (DF of regression, DF of residual)
  • P-value = < 0.001
  • Adjusted R squared = 0.02
81
Q

The p-value for the F statistic in simple linear regression tells us whether…

The p-value for the t statistic in simple linear regression tells us whether… - (2)

A
  • the overall proportion of variance in the outcome explained by the predictor is significant
  • the individual predictor is significant (it is used to test the significance of the predictor)