Lecture 6 - Statistical Tests II: Linear Regression Flashcards

1
Q

in what instance would we select a linear regression?

A

interested in association → interested in trend [where x is continuous] → “experiment” → y-continuous → y-normal → linear regression

2
Q

linear regression is a statistical model which shows:

A

the relationship between 2 continuous variables

3
Q

what questions should we be asking when we are choosing a statistical test?

A

(1) what type of response variable? [continuous, discrete/count, proportion, binary]

(2) what type of explanatory variable? [continuous, discrete/count, proportion, binary, categorical]

(3) interested in differences or trends/relationships?

(4) paired or independent samples?

(5) normal or non-normal distribution?

4
Q

what type of variables are present when we select a chi-squared statistical test?

A

when we are dealing with two categorical variables (y-counts, x categorical)

5
Q

what variables must be present in order for us to carry out a linear regression statistical analysis?

A

for a linear regression we must have both a continuous X and a continuous Y variable

6
Q

gradient =

A

change in y / change in x
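As a quick check of this formula, the gradient between two points can be computed directly in R (the data values here are made up for illustration):

```r
# gradient = change in y / change in x, with two made-up points
x <- c(2, 6)
y <- c(3, 11)
gradient <- (y[2] - y[1]) / (x[2] - x[1])
gradient   # (11 - 3) / (6 - 2) = 2
```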

7
Q

what are the three stages when trying to calculate a linear regression?

A

(1) choose your model: linear / non-linear

(2) estimate the parameters of the model

(3) model fit: how well does the model describe our data

8
Q

what is the Y-bar line found horizontally across the span of a graph?

A

the Y-bar, indicated by a horizontal dotted line labelled with a Y with a bar on top of it, shows the mean value of Y in your data

9
Q

how do you calculate the total sum of squares?

A

total sum of squares is the sum of all the squared distances between your data points and the Y-bar (mean) line
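A minimal sketch of this calculation in R, with made-up data:

```r
y <- c(4, 7, 6, 9, 4)          # made-up response values
y_bar <- mean(y)               # the Y-bar (mean) line, here 6
SST <- sum((y - y_bar)^2)      # squared distances to the mean line
SST                            # 4 + 1 + 0 + 9 + 4 = 18
```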

10
Q

what is the equation of a line and what do its terms represent?

A

y [w/ a hat] = a + bx

where:
a = intercept
b = slope
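In R, fitting a model with lm() estimates a (intercept) and b (slope) for this equation; coef() returns both. The data below are made up for illustration:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)   # made-up data, roughly y = 2x
m1 <- lm(y ~ x)
coef(m1)   # "(Intercept)" is a, "x" is b
```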

11
Q

what is the error sum of squares (residuals)?

A

error sum of squares (from the residuals) = the sum of all the squared distances between each individual data point and the line of best fit (y[w/ a hat] = a + bx)
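In R, the residuals of a fitted model give SSE directly (model and data here are made up for illustration):

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)          # made-up data
m1 <- lm(y ~ x)
SSE <- sum(residuals(m1)^2)    # squared distances to the fitted line
SSE                            # 2.4 for this data
```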

12
Q

what must all lines of best fit pass through and what allows us to choose what line of best fit is the most appropriate one?

A

all lines of best fit must pass through the point given by the mean of X and the mean of Y

we select the best line as the one with the smallest unexplained variation in the response - that is, when the residuals are smallest

13
Q

with regression, if the slope is positive or negative, what does this show about the relationship between the two variables?

A

if the slope is positive: the relationship between the variables is positive

if the slope is negative: the relationship between the variables is negative

14
Q

what happens to the total sum of squares, SST, if we add additional data points?

A

the value gets larger

15
Q

how to calculate mean sum of squares:

A

calculate mean variability = mean sum of squares (MS) = divide a sum of squares by its degrees of freedom

mean sum of squares = sum of square deviations from the mean / degrees of freedom
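A tiny numeric sketch of this in R, with made-up sums of squares:

```r
SSE <- 12             # made-up error sum of squares
n <- 10               # sample size
MS <- SSE / (n - 2)   # divide by degrees of freedom, not n
MS                    # 12 / 8 = 1.5
```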

16
Q

how do you construct and fill out an ANOVA table?

A

source | SS | d.f. | MS | F
regression | SSR | 1 | SSR/1 | F = SSR/s^2
error | SSE | n-2 | s^2 = SSE/(n-2) |
total | SST | n-1 | |

[for regression you need an additional column called 'F', which is the test statistic: F = SSR / s^2]
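R builds this table for you: calling summary.aov() (or anova()) on a fitted model prints SS, d.f., MS and F. The data below are made up for illustration:

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 3, 5, 4, 6, 7)   # made-up data
m1 <- lm(y ~ x)
summary.aov(m1)   # rows: x (regression, 1 d.f.) and Residuals (error, n-2 d.f.)
```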

17
Q

degrees of freedom regarding SST & SSE:

A
  • SST requires estimation of 1 parameter (the mean of Y) => n-1 degrees of freedom
  • SSE requires estimation of 2 parameters (intercept and slope) => n-2 degrees of freedom
18
Q

SSR + SSE =

A

SST

19
Q

F-distribution percentile (5%) command in R:

A

qf(0.95,1,n-2)
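For example, with n = 20 data points the error degrees of freedom are 18, and the 5% critical value is:

```r
n <- 20                # made-up sample size
qf(0.95, 1, n - 2)     # 95th percentile of F(1, 18), approx. 4.41
```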

20
Q

what if F is larger than the critical value?

A

this means that we reject the null hypothesis in favour of the alternative hypothesis - we infer that the probability of seeing such a relationship by chance alone is <0.05

21
Q

we are only allowed to add a trend line if:

A

we are only allowed to add our trend line if we reject the null hypothesis that the slope = 0; if the slope is not significantly different from 0, we must not add a trend line

22
Q

we are only allowed to carry out a linear regression if certain assumptions are fulfilled:

A
  • residuals must be normally distributed
  • the variance of the residuals is constant (ie. variation in y does not increase with increasing x)
  • individual measurements are independent
  • data comes from a random sample
23
Q

how can we test if our assumptions are met or violated when questioning whether we can fit a linear regression?

A

we can see if our assumptions are violated through using diagnostic plots in R

residuals vs fitted: in the first plot we ask whether the variance is constant [we want it to look scattered like the sky at night]

normal Q-Q: in the second plot we check whether the residuals are normally distributed - we want the points to fall (just about) onto the straight reference line
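A sketch of producing just these two diagnostic plots for a fitted model (data made up for illustration):

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.2, 2.1, 2.9, 4.2, 4.8, 6.1)   # made-up data
m1 <- lm(y ~ x)
plot(m1, which = 1)   # Residuals vs Fitted: want "sky at night"
plot(m1, which = 2)   # Normal Q-Q: want points on the straight line
```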

24
Q

full and complete command needed to have a linear regression in R:

A

(1) data<-read.csv("excel_sheet1.csv", header = T, stringsAsFactors = T)

(2) attach(data)

(3) names(data)

(4) m1<-lm(y_variable~x_variable)
# "m1" is simply your model name

(5) summary.lm(m1)

(6) summary.aov(m1)

(7) plot(m1)

once you see the "sky at night" (residuals vs fitted) and the points on the straight line (normal Q-Q) you can assume the assumptions hold and therefore plot your linear regression

25
Q

how can you interpret the results of your plot(m1) command in R?

A

you will be given two diagnostic plots: Residuals vs Fitted & Normal Q-Q

for Residuals vs Fitted you want to see “sky at night”

for Normal Q-Q you want to see a plot where the data points follow the straight reference line

26
Q

how can you create a linear regression model and make a graph with a trend-line in R?

A

create a linear regression model using the command:

m1<-lm(y_variable~x_variable)

plot(y_variable~x_variable, pch = 19, las = 1)

abline(m1)

27
Q

what can you not add to a linear regression plot when your p value is greater than 0.05?

A

you can not add a regression line!

28
Q

what sort of sample does the data come from in linear regressions?

A

random samples are used in linear regressions

29
Q

type of variance in a linear regression:

A

variance is constant in linear regressions

30
Q

total sum of squares [SST] =

A

regression sum of squares [SSR] + error sum of squares [SSE]
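This identity can be verified numerically in R (data made up for illustration):

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)                   # made-up data
m1 <- lm(y ~ x)
SST <- sum((y - mean(y))^2)             # total sum of squares, 6
SSE <- sum(residuals(m1)^2)             # error sum of squares, 2.4
SSR <- sum((fitted(m1) - mean(y))^2)    # regression sum of squares, 3.6
all.equal(SSR + SSE, SST)               # TRUE
```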

31
Q

null and alternative hypothesis in regards to the slopes in linear regressions:

A

null hypothesis = the slope is not significantly different from zero

alternative hypothesis = the slope is significantly different from zero

32
Q

regressions are used when:

A
  • when we are interested in how x and y are related
  • to analyse experimental data (x manipulated)
  • when x and y cannot be swapped: assume y depends on x
33
Q

linear regression null hypothesis:

A

slope = 0

34
Q

when is the slope not significantly different from 0?

A

SSE = SST