Wk 12: Modelling Association Flashcards

1
Q

If we carry out 1 hypothesis test with a significance threshold of 5%, the probability of making a type 1 error is ____ %.

A

5

2
Q

If we carry out 10 hypothesis tests with a significance threshold of 5%, the probability of making a type 1 error is ____ %. Why?

A

>5

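  • With multiple comparisons, each test carries its own 5% chance of a type 1 error, so the chance of at least one type 1 error across all the tests is greater than 5%.
  • To reduce the chance of a type 1 error, we can lower the significance level for each test to 5%/10 (Bonferroni correction), which keeps the overall significance level at or below 5%. Alternatively, we can use Tukey HSD.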
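As a rough check on this answer, the chance of at least one type 1 error across several tests can be worked out directly. The sketch below assumes the 10 tests are independent, which the card does not state; the function name is only illustrative.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one type 1 error across independent tests."""
    return 1 - (1 - alpha) ** n_tests


# 10 tests, each at the 5% level, with no correction
uncorrected = familywise_error_rate(0.05, 10)

# 10 tests, each at the Bonferroni-corrected level of 5%/10
bonferroni = familywise_error_rate(0.05 / 10, 10)

print(f"Uncorrected: {uncorrected:.1%}")
print(f"Bonferroni:  {bonferroni:.1%}")
```

Under the independence assumption, the uncorrected rate comes out at about 40%, while the Bonferroni-corrected rate stays just under 5%.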
3
Q

If we carry out 1 hypothesis test with a significance threshold of 5%, the probability of making a type 2 error depends on _____. Why?

A

(1 - power)

  • To increase power, we can increase the sample size, increase the effect size, decrease measurement error, or decrease methodological variability.
  • An experiment with at least 80% power is generally considered worthwhile.
  • Power analysis needs a survey of the literature or a small-sample pilot study to estimate a clinically relevant effect size, typical variability, etc., so that the required sample size can be worked out before the real experiment.
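The link between sample size, effect size, and power can be sketched with a normal approximation. This is only an illustration: it assumes a two-sided comparison of two group means with equal group sizes, and a standardised effect size (difference in means divided by the standard deviation). The function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group comparison of means.

    Normal approximation; ignores the negligible chance of rejecting
    the null hypothesis in the wrong direction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    return z.cdf(shift - z_crit)


for n in (20, 64, 100):
    power = approx_power(0.5, n)
    print(f"n per group = {n:3d}: power = {power:.2f}, type 2 error = {1 - power:.2f}")
```

With a medium standardised effect size of 0.5, roughly 64 participants per group gives about 80% power under this approximation; smaller samples or smaller effects give less.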
4
Q

What are 4 characteristics?

A
  1. All of the samples come from a normal distribution, but because the samples are small (n = 20), they look skewed.
  2. 1st box plot: Normal.
  3. 2nd box plot: Median is far from the centre. Whiskers are negatively skewed.
  4. 3rd box plot: Median is positively skewed. Whiskers are negatively skewed.
5
Q

What are 3 characteristics?

A
  1. 1st box plot: Not normal. Positively skewed.
  2. 2nd box plot: Normal.
  3. 3rd box plot: Chance of seeing 1 outlier in normal distribution is 1% - acceptable.
6
Q

What are 4 characteristics?

A
  1. n = 100
  2. 1st box plot: 3-4 outliers in a sample of more than 20 suggests the data are not normal (a serious deviation).
  3. 2nd box plot: 1 outlier is acceptable. Normal.
  4. 3rd box plot: Normal.
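How many outliers to expect from normal data can be estimated directly. The sketch below assumes the box plots flag outliers with the common 1.5 × IQR rule (not stated on the card) and uses the theoretical quartiles of a normal distribution rather than sample quartiles, so the numbers are approximate.

```python
from statistics import NormalDist

z = NormalDist()

# Theoretical quartiles and fence of a standard normal distribution
q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

# Chance that a single normal observation falls beyond either fence
p_outlier = 2 * (1 - z.cdf(upper_fence))
print(f"Chance a single point is flagged: {p_outlier:.2%}")

for n in (20, 100):
    expected = n * p_outlier
    p_at_least_one = 1 - (1 - p_outlier) ** n
    print(f"n = {n:3d}: expected outliers = {expected:.2f}, "
          f"chance of at least one = {p_at_least_one:.0%}")
```

Under these assumptions a single point is flagged a little under 1% of the time, so one outlier in 100 normal observations is unremarkable, while three or four is more than chance would usually produce.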
7
Q

What is a simple linear regression?

A
  1. The mean response is given by a straight line with constant normal variability about that line.
  2. Least-squares line: A line that minimises the sum of the squared residuals.
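The least-squares line can be computed by hand from the sums of squares. This is a minimal sketch on made-up data (the x and y values are hypothetical, not from the course); it also shows that the fitted line has a smaller sum of squared residuals than a slightly different line.

```python
from statistics import mean


def least_squares(x, y):
    """Return (intercept b0, slope b1) of the least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared vertical distances from the points to the line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(x, y)
sse_fit = sum_squared_residuals(x, y, b0, b1)
sse_other = sum_squared_residuals(x, y, b0, b1 + 0.1)  # a slightly steeper line

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"SSE (least squares) = {sse_fit:.3f}, SSE (steeper line) = {sse_other:.3f}")
```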
8
Q

What are 4 assumptions for linear regressions?

A
  1. Independent observations
  2. Linear association
  3. Normal variability
    1. Skew matters less for large samples because the sampling distribution of the estimates approaches a normal distribution (central limit theorem).
    2. However, skew will undermine predictions made by the linear model. A log transformation of the data can reduce the skew.
    3. Normal variability can be checked by examining the residuals on a scatter plot, a P-P plot and a histogram.
  4. Equal variances
9
Q

What are regression models that use a straight line to estimate the mean response based on a predictor? What are 4 characteristics?

A

y = b0 + b1x

  1. y is the mean response
  2. x is the predictor
  3. Null hypothesis: The slope in the relationship is 0 (flat line).
  4. The p-value is the probability of getting a slope (b1) at least as extreme as the one observed by chance if there were no relationship between the response and the predictor.
    1. p < 0.05 suggests there is evidence against the null hypothesis, so there is evidence of a relationship between the response and the predictor.
10
Q

What are 3 characteristics?

A
  1. Null hypothesis: height does not affect mass (flat line).
  2. Mass and height show positive association (line going up).
  3. Residual is the vertical distance from each dot to the line.
    • Dots above the line have a positive residual.
    • Dots below the line have a negative residual.
11
Q

What are 2 characteristics?

A

Checking residuals on scatter plot (simple linear regression)

  1. A constant band of dots (neither converging nor diverging) suggests constant variability about the line.
  2. In this plot, the left side has less variability and the right side has more, so the variability is not constant.
12
Q

What are 3 characteristics?

A

Checking residuals on P-P plot (simple linear regression)

  1. If the residuals are normally distributed, the dots should fall on the line.
  2. This plot shows deviations in the middle, so the residuals are not normal.
  3. Deviations at the top or bottom matter less than deviations in the middle.
13
Q

What are 2 characteristics?

A

Checking residuals on histogram (simple linear regression)

  1. A normal distribution should look like a bell curve.
  2. Here the distribution is positively skewed.
14
Q

What are 2 characteristics?

A

Interpretations of Tukey HSD

  1. p = 0.674 means there is no evidence of a difference between “last” and “uncertain”.
  2. p < 0.001 means there is a significant difference between the other pairs of groups.
15
Q

What are 3 characteristics?

A

Interpretation of linear regression tables

Model summary table

  1. R is the Pearson correlation coefficient.
  2. R square is the % of variability explained by the model.
    • For simple linear regression, R square is the square of the Pearson correlation coefficient (R).
  3. R square (0.556) means the linear regression model explains 55.6% of the variability.
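The relationship between R and R square in simple linear regression can be checked numerically. This sketch uses made-up data (the x and y values are hypothetical) and computes R square as the proportion of variability explained, then compares it with the squared Pearson correlation. The last line back-calculates the size of R from the R square of 0.556 quoted on the card.

```python
from math import sqrt
from statistics import mean

# Hypothetical data
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.5, 4.0, 7.5, 7.0, 10.0]

x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)  # total variability
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Pearson correlation coefficient
r = sxy / sqrt(sxx * syy)

# Least-squares line and the variability it leaves unexplained
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
ss_residual = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Proportion of variability explained by the model
r_squared = 1 - ss_residual / syy

print(f"r = {r:.3f}, r squared = {r ** 2:.3f}, R square from model = {r_squared:.3f}")
print(f"R square of 0.556 corresponds to R of about {sqrt(0.556):.3f}")
```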
16
Q

What are 6 characteristics?

A
  1. p < 0.001 means there is evidence of an association between the variables.
  2. The “Total” row corresponds to the overall variability of the response: its sum of squares divided by its df is the sample variance (the square of the sample standard deviation).
  3. The number of groups is df + 1.
  4. The number of participants is total df + 1 = 396 + 1 = 397.
  5. Residual df is n - 2 since we need to estimate an intercept and a slope before we can estimate residual variability.
  6. Regression df is 1 because we have 1 predictor.
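The degrees of freedom in the table can be reproduced with simple arithmetic. The sketch below uses the total df of 396 from the card; the residual df it prints is derived from the n - 2 rule rather than read from the table.

```python
n = 397            # participants (total df + 1)
n_predictors = 1   # simple linear regression has one predictor

regression_df = n_predictors
residual_df = n - n_predictors - 1   # n - 2: intercept and slope are estimated
total_df = n - 1

print(f"Regression df = {regression_df}")
print(f"Residual df   = {residual_df}")
print(f"Total df      = {total_df}")
print(f"Regression + residual = total: {regression_df + residual_df == total_df}")
```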
17
Q

What are 2 characteristics?

A

Coefficients table

  1. Standard error measures how uncertain we are about the parameter estimate.
  2. t = B / Std. Error. The t value suggests that the hearing B (0.126) is 12.37 standard errors away from 0 (the value expected under the null hypothesis), which would be very unlikely if the null hypothesis were true (p < 0.001).
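The standard error and t value of a slope can be computed from the residuals. This is a minimal sketch on made-up data (the x and y values are hypothetical); the final line back-calculates the standard error implied by the card's B = 0.126 and t = 12.37, which is a derived value rather than one read from the table.

```python
from math import sqrt
from statistics import mean

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx              # slope (B)
b0 = y_bar - b1 * x_bar     # intercept

# Residual variance uses n - 2 df (intercept and slope are estimated)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
residual_variance = sse / (n - 2)

se_b1 = sqrt(residual_variance / sxx)   # standard error of the slope
t = b1 / se_b1                          # standard errors away from 0

print(f"B = {b1:.3f}, Std. Error = {se_b1:.4f}, t = {t:.1f}")

# Standard error implied by the card's values (derived, not from the table)
implied_se = 0.126 / 12.37
print(f"Implied Std. Error for hearing: {implied_se:.4f}")
```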
18
Q

What is a multiple linear regression? What is the dummy/indicator variable? What is the H0?

A
  1. Regression allows us to model the effects of multiple variables on a response.
  2. y = b0 + b1x1 + b2x2
  3. A dummy/indicator variable (x) codes a nominal variable as 1 and 0, so you can put it in linear regression.
  4. Null hypothesis: There is no difference between the groups (the coefficient of the dummy variable is 0).
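A dummy variable can be illustrated with a regression on a single 0/1 predictor. In this sketch the group labels and scores are made up; it shows that the intercept equals the mean of the group coded 0 and the slope equals the difference between the two group means, which is why a slope of 0 corresponds to no difference between the groups.

```python
from statistics import mean

# Hypothetical scores for two groups
group = ["young", "young", "young", "old", "old", "old"]
score = [4.0, 5.0, 6.0, 7.0, 9.0, 8.0]

# Dummy coding: 1 for "old", 0 for "young"
x = [1 if g == "old" else 0 for g in group]

x_bar, y_bar = mean(x), mean(score)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, score))

b1 = sxy / sxx            # coefficient of the dummy variable
b0 = y_bar - b1 * x_bar   # intercept

mean_young = mean(s for s, xi in zip(score, x) if xi == 0)
mean_old = mean(s for s, xi in zip(score, x) if xi == 1)

print(f"b0 = {b0:.1f} (mean of group coded 0 = {mean_young:.1f})")
print(f"b1 = {b1:.1f} (difference in means = {mean_old - mean_young:.1f})")
```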
19
Q

What is a characteristic?

A

Checking residuals on histogram (multiple linear regression)

  1. Roughly normal distribution
20
Q

What are 2 characteristics?

A

Checking residuals on P-P plot (multiple linear regression)

  1. Roughly normal
  2. The big jump on either side of the middle is a little unusual.
21
Q

What is a characteristic?

A

Checking residuals on scatter plot (multiple linear regression)

  1. A constant band of dots suggests constant variability.
22
Q

What are 2 interpretations of multiple linear regression?

A

Interpretation of multiple linear regression

  1. P < 0.001 suggests there is significant evidence of a hearing effect, after taking into account the age group; and significant evidence of an age effect, after taking into account the hearing level.
  2. In the simple linear regression, the hearing B was 0.126. After taking age into account, the hearing B is 0.088, meaning that some of the original relationship was actually due to differences in age.
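Why a coefficient shrinks once a second predictor is added can be seen on a toy example. The data below are hypothetical and are not the hearing data from the card: the response is built as exactly 1 + 0.1 × hearing + 2 × older, and the older group also tends to have higher hearing values. The two-predictor fit uses the closed-form solution of the normal equations.

```python
from statistics import mean


def centered_sum(u, v):
    """Sum of products of deviations from the means of u and v."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))


# Hypothetical toy data: y = 1 + 0.1 * hearing + 2 * older, with no noise
hearing = [10, 20, 30, 30, 40, 50]
older = [0, 0, 0, 1, 1, 1]          # dummy variable for age group
y = [2.0, 3.0, 4.0, 6.0, 7.0, 8.0]

# Simple linear regression: hearing only
b_simple = centered_sum(hearing, y) / centered_sum(hearing, hearing)

# Multiple linear regression: hearing and age group together
s_hh = centered_sum(hearing, hearing)
s_aa = centered_sum(older, older)
s_ha = centered_sum(hearing, older)
s_hy = centered_sum(hearing, y)
s_ay = centered_sum(older, y)

det = s_hh * s_aa - s_ha ** 2
b_hearing = (s_aa * s_hy - s_ha * s_ay) / det
b_older = (s_hh * s_ay - s_ha * s_hy) / det

print(f"Hearing B, simple regression:   {b_simple:.2f}")
print(f"Hearing B, adjusted for age:    {b_hearing:.2f}")
print(f"Age-group B, adjusted for hearing: {b_older:.2f}")
```

In this toy data the unadjusted hearing slope is larger than the adjusted one because part of the apparent hearing effect is really the age-group effect, the same pattern as the drop from 0.126 to 0.088 described above.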
23
Q

What are 2 characteristics?

A
  1. p = 0.001 means there is evidence of an association between working memory & QuickSIN score.
  2. The TestA B coefficient of -0.046 indicates a negative association.
24
Q

What are 3 characteristics?

A
  1. Age has no significant association with QuickSIN score (p = 0.605).
  2. Hearing has a significant association with QuickSIN score (p < 0.001).
  3. Working memory shows only weak evidence of an association with QuickSIN score (p = 0.094), which is not significant at the 5% level.