Module 5 Flashcards

1
Q

Most often a research question will lead to a model such that…

A

a dependent variable is a function of one or more independent variables or predictors

2
Q

General linear model

A

represents a dependent variable as a function of population means

3
Q

Intercept only model

A

A participant's score equals the population mean plus an error term

4
Q

What is the linear model when the independent variable is binary?

A

yi = B0 + B1xi + ei

5
Q

What does this model say?

yi = B0 + B1xi + ei

A

The value of y for participant i…

B0 = intercept parameter

B1xi = slope parameter multiplied by the value of x for participant i

ei = error term for participant i

6
Q

If xi = 0 (i.e., the independent variable is coded 0), what does B0 equal?

yi = B0 + B1(0) + ei

A

B0 = intercept parameter equals the population mean for participants who do not have a diagnosis

7
Q

If xi = 1 (i.e., the independent variable is coded 1), what does B1 equal?

yi = B0 + B1(1) + ei

A

B1 = the population mean for participants who do have a diagnosis (xi = 1) minus the population mean for participants who do not have a diagnosis (xi = 0)

8
Q

Explain each unit in the model that includes the population means of the two groups:

yij = uj + eij

A

yij = the value of the dependent variable for individual i in group j

uj = population mean of the dependent variable for group j

eij = error term for individual i; the same as in the earlier model, but now with the j subscript to index group membership

9
Q

B1 =

A

u2 - u1

the population mean for participants with a diagnosis (u2) minus the population mean for participants without a diagnosis (u1)

because

u1 = B0
u2 = B0 + B1
u2 - u1 = B1
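
The identity above can be checked numerically. The following is a minimal sketch, assuming a small made-up data set (the scores and the helper name `fit_simple_regression` are hypothetical, not from the course material). It fits y = B0 + B1x + e by least squares and compares the estimates with the two group means.

```python
from statistics import mean

# Hypothetical scores: x = 0 means no diagnosis, x = 1 means diagnosis.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [10.0, 12.0, 11.0, 13.0, 15.0, 17.0, 14.0, 18.0]


def fit_simple_regression(x, y):
    """Return least-squares estimates (b0_hat, b1_hat) for y = B0 + B1*x + e."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat


b0_hat, b1_hat = fit_simple_regression(x, y)

# Sample means of the two groups, computed directly.
mean_group0 = mean([yi for xi, yi in zip(x, y) if xi == 0])
mean_group1 = mean([yi for xi, yi in zip(x, y) if xi == 1])

print("b0_hat:", b0_hat, "mean of x = 0 group:", mean_group0)
print("b1_hat:", b1_hat, "difference in group means:", mean_group1 - mean_group0)
```

With a 0/1 predictor, the intercept estimate matches the mean of the x = 0 group and the slope estimate matches the difference between the two group means.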

10
Q

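How do you calculate the predicted mean of each group from the parameter estimates?

A

Predicted mean for the group coded x = 0: B0hat (the intercept estimate)

Predicted mean for the group coded x = 1: B0hat + B1hat (the intercept estimate plus the slope estimate)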

11
Q

What does the hat indicate?

A

estimated or predicted

12
Q

Because Bhat1 is the estimated difference between the population means of the two groups it represents what?

A

A point estimate of the effect of the independent variable on the dependent variable.

13
Q

If the 95% confidence interval around this effect estimate (the slope estimate, i.e., the difference between the two group means) is 72 to 1258, what is the conclusion?

A

This interval captures the parameter B1 with 95% confidence

Because B1 = u2 - u1, the interval captures the population mean difference with 95% confidence

14
Q

Formula for calculating this CI

A

Bhat1 +/- tcrit * SBhat1

tcrit = critical value from a t distribution that sets off alpha/2 in each tail

SBhat1 = estimated standard error of the slope parameter
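
As an illustration of the formula, here is a minimal sketch that builds the interval from made-up data (all scores are hypothetical). The critical value 2.447 is taken from a standard t table for df = 6 and alpha = .05 (two-tailed); it is hard-coded as an assumption so the example needs only the standard library.

```python
import math
from statistics import mean

# Hypothetical scores: x = 0 means no diagnosis, x = 1 means diagnosis.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [10.0, 12.0, 11.0, 13.0, 15.0, 17.0, 14.0, 18.0]

group0 = [yi for xi, yi in zip(x, y) if xi == 0]
group1 = [yi for xi, yi in zip(x, y) if xi == 1]

# With a binary predictor the estimates are the group mean and the mean difference.
b0_hat = mean(group0)
b1_hat = mean(group1) - mean(group0)

# Residual mean square with N - 2 degrees of freedom (intercept and slope).
residuals = [yi - (b0_hat + b1_hat * xi) for xi, yi in zip(x, y)]
df = len(y) - 2
ms_resid = sum(e ** 2 for e in residuals) / df

# Estimated standard error of the slope.
x_bar = mean(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)
se_b1 = math.sqrt(ms_resid / sxx)

t_crit = 2.447  # assumed table value for df = 6, alpha = .05 two-tailed
lower = b1_hat - t_crit * se_b1
upper = b1_hat + t_crit * se_b1

print(f"b1_hat = {b1_hat:.2f}, SE = {se_b1:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

In practice the critical value would come from statistical software rather than a table, and it changes with the degrees of freedom.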

15
Q

What are the degrees of freedom in the situation of a binary independent variable?

A

N-2 because there are two coefficients - intercept and slope in the estimated model

16
Q

What is standard error?

A

Standard deviation of the sampling distribution

17
Q

Pooled Variance/ homogeneity of variance

A

assumes that the population variance of the dependent variable is equal across the two groups

18
Q

Null hypothesis writing and formulas

A

H0 : u1 = u2
The population mean of group 1 equals the population mean of group 2

Substituting u1 = B0 and u2 = B0 + B1 gives B0 = B0 + B1
Only holds if B1=0 therefore
H0: B1 = 0

OR H0: u2 - u1 = 0

19
Q

What is the purpose of a statistical model?

A

describe or explain individual differences or variation in a dependent variable

20
Q

If a model does a good job of accounting for individual differences, then the variance of the errors should be….

A

small relative to the overall variance of the dependent variable

the error variance of the model should be lower than the overall variance of the dependent variable

indicating that the full model has accounted for or explained a portion of the dependent variable variance

21
Q

proportion reduction in error/coefficient of determination R2

A

Represents the proportion of the dependent variable variance explained by the model

22
Q

In the context of a single binary independent variable, the proportion reduction in error is equivalent to what?

A

the standardized effect size eta-squared (η²)

23
Q

How to manually calculate PRE or coefficient of determination or R^2

A

(variance of the dependent variable - error variance of the model) / variance of the dependent variable

24
Q

What does an ANOVA do with respect to variation?

A

Involves partitioning the total sample variation of the dependent variable into variation explained by the model and error variation.

Error variation is also referred to as residual variation.

25
Q

standard deviation is what in relation to variance

A

square root of the variance

26
Q

Variance is what?

A

the sum of squared deviations from the mean, divided by the degrees of freedom (N - 1 for a sample variance)

27
Q

F Statistic

A

Ratio formed after partitioning the total variability into variability due to the model (numerator) and residual variability (denominator)

compares variance explained by the model with remaining variance

28
Q

Total SS for Y

A

Sum of the squared deviations of observed values of Y from the mean of Y

sum (yi - ybar)^2

29
Q

Model SS for Y

A

Sum of the squared deviations of each individual's predicted value (their group's predicted mean) from the mean of Y

sum (yhati - ybar)^2

30
Q

Why is it called variability explained by the model

A

Because it summarizes the amount of predicted variation due to group membership relative to the overall mean

31
Q

Residual SS for Y

A

The sum of the squared residuals (estimated errors) across all observations

sum (yi - yhati)^2

32
Q

What does P represent

A

P is the number of group indicators in the linear model, with P = 1 for our current example using only a single binary independent variable

In general P = J-1 where J is the total number of categories formed by a categorical independent variable

33
Q

Formula for R^2

A

R^2 = 1 - (SS resid / SS total)
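
The sums of squares and R^2 can be worked through by hand on a small example. This is a minimal sketch with made-up scores (hypothetical data, not from the course), where each person's predicted value is simply their group mean.

```python
from statistics import mean

# Hypothetical scores: x = 0 means no diagnosis, x = 1 means diagnosis.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [10.0, 12.0, 11.0, 13.0, 15.0, 17.0, 14.0, 18.0]

y_bar = mean(y)
group_means = {
    0: mean([yi for xi, yi in zip(x, y) if xi == 0]),
    1: mean([yi for xi, yi in zip(x, y) if xi == 1]),
}

# Predicted value for each person is the mean of their group.
y_hat = [group_means[xi] for xi in x]

ss_total = sum((yi - y_bar) ** 2 for yi in y)                 # observed vs. grand mean
ss_model = sum((yh - y_bar) ** 2 for yh in y_hat)             # predicted vs. grand mean
ss_resid = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # observed vs. predicted

r_squared = 1 - ss_resid / ss_total

print("SS total:", ss_total)
print("SS model + SS resid:", ss_model + ss_resid)
print("R^2:", round(r_squared, 4))
```

The model and residual sums of squares add up to the total, so 1 - (SS resid / SS total) gives the same value as SS model / SS total.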

34
Q

What does the F statistic test?

A

the F statistic drawn from the ANOVA table is used to test whether R^2 is significantly greater than 0.

35
Q

As a ratio of variance terms (MSmodel/MSresid = F), what is the range of the F statistic?

A

0 to +infinity

36
Q

If the null hypothesis is true, the variability explained by the model equals what?

A

If the null hypothesis is true, the variability explained by the model equals the error variability

  • in intercept only model the error variance equals the dependent variable variance

When the null is true they are equal, and thus F is approximately 1

If the null is true and the model does nothing, the variance can be obtained simply by taking the variance of y, which equals the error variance under the null model

37
Q

What does an F statistic significantly greater than 1 mean

A

Independent variable explains some variability in y

Null hypothesis rejected

38
Q

How is significance determined

A

p value from the F distribution with P numerator degrees of freedom and (N - P - 1) denominator degrees of freedom

39
Q

What is the p-value

A

the probability of obtaining the current F statistic or one that is greater, if the null hypothesis is true

If the p-value is less than alpha, the null is rejected

40
Q

A sample mean divided by its estimated standard error follows what distribution when the null is true

A

t distribution

41
Q

Standard error of the difference between two means

A

sqrt(pooled variance / n1 + pooled variance / n2), where n1 and n2 are the sample sizes of the two groups

42
Q

When numerator degrees of freedom =1 what is F = to?

A

F = t^2
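
The F = t^2 relationship can be verified on a small example. This is a minimal sketch with made-up scores (hypothetical data): it computes the independent-groups t statistic from the pooled variance and the F statistic from the mean squares, with P = 1 numerator degree of freedom.

```python
import math
from statistics import mean, variance

# Hypothetical scores for two groups.
group0 = [10.0, 12.0, 11.0, 13.0]   # no diagnosis
group1 = [15.0, 17.0, 14.0, 18.0]   # diagnosis
n0, n1 = len(group0), len(group1)
n_total = n0 + n1

# t statistic using the pooled variance estimate.
pooled_var = ((n0 - 1) * variance(group0) + (n1 - 1) * variance(group1)) / (n_total - 2)
se_diff = math.sqrt(pooled_var / n0 + pooled_var / n1)
t_stat = (mean(group1) - mean(group0)) / se_diff

# F statistic from the ANOVA partition.
grand_mean = mean(group0 + group1)
ss_model = n0 * (mean(group0) - grand_mean) ** 2 + n1 * (mean(group1) - grand_mean) ** 2
ss_resid = (sum((yi - mean(group0)) ** 2 for yi in group0)
            + sum((yi - mean(group1)) ** 2 for yi in group1))
p = 1                                   # one group indicator
ms_model = ss_model / p
ms_resid = ss_resid / (n_total - p - 1)
f_stat = ms_model / ms_resid

print("t^2:", round(t_stat ** 2, 4), "F:", round(f_stat, 4))
```

The residual mean square equals the pooled variance here, which is why the two tests agree.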

43
Q

Independent groups T-Test assumptions

A
  1. Observations are independent
  2. Dependent variable is normally distributed within each group
  3. Homogeneity of Variance
    - use of the pooled variance estimate in the formula for the standard error of the regression slope - based on the assumption that the sample variances of the two groups are both estimates of a single population variance
44
Q

When is there robustness to the violation of the normal distribution assumption?

A

When sample size is large

45
Q

When is there robustness to the homogeneity of variance assumption?

A

When the sample sizes are equal across groups

46
Q

Options to deal with non-normality and unequal variances

A

Transform the variable - skewness
Welch’s t-test - different variances

47
Q

Welch’s t-test

A

Instead of transforming the dependent variable, we could use a t-test that is robust to violation of the homogeneity of variance assumption: Welch’s t-test

Doesn’t use pooled variance to calculate the standard error

Instead the standard error of the difference between two sample means can be calculated as = sqrt (s1^2/n1 +s2^2/n2)

Welch’s t-test also involves a complex adjustment to the degrees of freedom known as the Satterthwaite approximation
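
To make the two pieces concrete, here is a minimal sketch of the Welch standard error and the Satterthwaite degrees of freedom, using made-up scores (hypothetical data). The degrees-of-freedom formula used is the standard Welch-Satterthwaite expression, (s1^2/n1 + s2^2/n2)^2 / [ (s1^2/n1)^2/(n1 - 1) + (s2^2/n2)^2/(n2 - 1) ].

```python
import math
from statistics import mean, variance

# Hypothetical scores for two groups with different spreads.
group1 = [10.0, 12.0, 11.0, 13.0]
group2 = [15.0, 17.0, 14.0, 18.0]
n1, n2 = len(group1), len(group2)

# Each group keeps its own variance; nothing is pooled.
v1 = variance(group1) / n1
v2 = variance(group2) / n2

se_welch = math.sqrt(v1 + v2)
t_welch = (mean(group2) - mean(group1)) / se_welch

# Satterthwaite approximation to the degrees of freedom.
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(f"SE = {se_welch:.3f}, t = {t_welch:.3f}, df = {df_welch:.2f}")
```

The adjusted degrees of freedom are usually not a whole number and fall between the smaller of n1 - 1 and n2 - 1 and the pooled value n1 + n2 - 2.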

48
Q

welch T test APA report

A

“The mean reaction time was greater for those with a reading disorder diagnosis (M = 2039.76ms, SD = 1128.36) than in the control group (M = 1374.68ms, SD = 625.35). We used Welch’s t-test to test the mean difference and construct a 95% CI because the standard deviations differed substantially between groups and the sample sizes were unbalanced. The 95% CI for the mean difference was [-1547.49, 217.33], and the sample means did not significantly differ, Welch’s t = 1.69, p = .12.”

49
Q

Log transformation APA

A

Because the reaction time variable was positively skewed within groups (skewness = 2.24 in the group with a reading disorder and 2.24 in the control group), we applied a log transformation prior to performing statistical inference. The skewness of the log-reaction time variable was 1.14 in the reading disorder group and 0.74 in the control group. The mean log-reaction time was significantly greater for those with a reading disorder diagnosis (M = 7.52 log ms, SD = 0.45) than in the control group (M = 7.15 log ms, SD = 0.39), t(36) = 2.44, p = .02. The 95% CI for the mean difference was [0.06, 0.68].

50
Q

Independent T-test APA

A

The mean reaction time was significantly greater for those with a reading disorder diagnosis (M = 2039.76ms, SD = 1128.36) than in the control group (M = 1374.68ms, SD = 625.35), t(36) = 2.28, p = .03. The 95% CI for the mean difference was [72.14, 1258.02].