extras Flashcards

1
Q

what to remember when describing a distribution

A
  1. Centre: the median (you need to SAY "median")
  2. Spread: the IQR (the middle 50% of scores lie between x and y), plus the minimum and maximum
  3. Shape: number of peaks, how the scores are distributed, and skewness
  4. Any outliers?
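Below is a minimal Python sketch of these four checks for a list of numeric scores. Quartile conventions differ across software. This sketch uses `statistics.quantiles(..., method="inclusive")`, and the 1.5 × IQR outlier fence is a common rule of thumb that these cards do not themselves specify.

```python
import statistics

def describe(scores):
    """Summarise centre, spread, and possible outliers of a list of scores."""
    median = statistics.median(scores)
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1  # the middle 50% of scores lie between q1 and q3
    # Assumption: flag outliers with the common 1.5 x IQR fence rule.
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in scores if x < lower_fence or x > upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "min": min(scores),
        "max": max(scores),
        "outliers": outliers,
    }
```

Take the hypothetical scores `[2, 4, 4, 5, 6, 7, 8, 9, 30]`. The median is 6, the IQR is 4 (the middle 50% fall between 4 and 8), the range runs from 2 to 30, and 30 is flagged as an outlier.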
2
Q

Outliers can occur because of?

A

sampling error

participant error

researcher error

random chance

3
Q

probability density functions

A

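Hypothetical population distributions are defined using mathematical formulas known as probability density functions (pdfs). For a continuous variable, a pdf gives the density at each value. The probability of observing a value within a range is the area under the curve over that range.

The total area under the curve defined by a probability density function always equals 1.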
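The following rough numerical check of both statements is written in Python. The normal pdf is coded from its standard formula, and areas are approximated with the trapezoid rule; that integrator is only an illustrative choice. The density itself is not a probability, and with a small σ it exceeds 1.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and SD sigma at x."""
    coef = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def area_under(pdf, lo, hi, steps=100_000):
    """Approximate the integral of pdf from lo to hi with the trapezoid rule."""
    h = (hi - lo) / steps
    total = 0.5 * (pdf(lo) + pdf(hi))
    for k in range(1, steps):
        total += pdf(lo + k * h)
    return total * h

# Practically all of the area lies within +/- 10 SDs, so this is ~1.
total_area = area_under(normal_pdf, -10, 10)
# A probability is an area: P(-1.96 < Z < 1.96) is ~0.95.
central_area = area_under(normal_pdf, -1.96, 1.96)
# A density, not a probability, so it can exceed 1 (~3.99 here).
peak_narrow = normal_pdf(0, sigma=0.1)
```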

4
Q

normal distribution is a…

A

hypothetical population distribution

5
Q

Should you describe a sample as normal?

A

No. A sample can only approximate a normal distribution, because the normal distribution is a hypothetical population distribution.

6
Q

standard normal distribution

A

Normal distribution with $\mu = 0$ and $\sigma = 1$

7
Q

If x is an observation from a normal distribution, what is the z-score of x?

A

$z = \frac{x - \mu}{\sigma}$
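A small Python sketch of the formula follows. The scale values μ = 100 and σ = 15 are hypothetical, chosen for illustration. The parentheses matter: the whole deviation is divided by σ.

```python
def z_score(x, mu, sigma):
    """Standardise x: how many population SDs x lies above or below mu."""
    return (x - mu) / sigma

# Hypothetical scale with mu = 100 and sigma = 15.
print(z_score(130, 100, 15))  # 2.0
print(z_score(85, 100, 15))   # -1.0
# Without parentheses, 130 - 100 / 15 evaluates to about 123.33, not a z-score.
```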

8
Q

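What kind of distribution do z-scores follow?

A

If the original scores are normally distributed, their z-scores follow a normal distribution with $\mu = 0$ and $\sigma = 1$ (the standard normal distribution).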

9
Q

sampling distribution

A

We can imagine collecting an infinite number of samples of N = 40 Peabody scores, leading to an infinite number of sample means and standard deviations.

If each of these samples came from the same population, then each sample mean is an estimate of the same population mean, $\mu$, and each sample standard deviation is an estimate of the same population standard deviation, $\sigma$.

Because of sampling error (not “bias”!), very few, if any, of these mean and standard deviation estimates will exactly equal the true population mean and standard deviation.

Now imagine creating a frequency distribution table or graph for the collection of sample means obtained by repeatedly drawing different samples of size N = 40 from the same population. This collection of sample means forms the sampling distribution of the mean.
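The thought experiment can be approximated with a finite simulation in Python. The population is assumed normal with μ = 100 and σ = 15; these are illustrative values, not the actual Peabody parameters. The sketch draws 5,000 samples of N = 40 and keeps each sample mean.

```python
import math
import random
import statistics

def sampling_distribution_of_mean(mu, sigma, n, reps, seed=1):
    """Draw `reps` samples of size n from a normal population; return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]

# Assumed population values; N = 40 per sample.
means = sampling_distribution_of_mean(100, 15, 40, reps=5000)
expected_se = 15 / math.sqrt(40)  # about 2.37
print(round(statistics.fmean(means), 2))   # close to 100
print(round(statistics.stdev(means), 2), round(expected_se, 2))
```

The means scatter around μ because of sampling error. Their SD is close to $\sigma/\sqrt{N}$.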

10
Q

A sampling distribution is the distribution of a …

A

statistic

11
Q

Sampling distributions are ____ ____ distributions

A

theoretical population distributions

12
Q

Central limit theorem

A

Describes the sampling distribution of the mean

Also applies to sample regression slope estimates.

Central limit theorem: consider means calculated from samples drawn from any parent population with mean $\mu$ and (finite) standard deviation $\sigma$. As N approaches infinity, the sampling distribution of the mean converges to a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{N}$.
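The theorem can be illustrated with a simulation sketch in Python. The parent population is assumed to be exponential (mean = SD = 1, strongly right-skewed), chosen only for illustration. As N grows, the skewness of the sample means shrinks toward 0 and their SD tracks $1/\sqrt{N}$.

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness (third standardised moment, population formula)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def sample_means_exponential(n, reps, seed=2):
    """Means of `reps` samples of size n from an exponential parent (mu = sigma = 1)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

for n in (2, 10, 50):
    means = sample_means_exponential(n, reps=10_000)
    # N, skewness of the means, SD of the means, and sigma / sqrt(N) for comparison
    print(n, round(skewness(means), 2), round(statistics.stdev(means), 3), round(1 / math.sqrt(n), 3))
```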

13
Q

What is the standard error?

A

The standard error of a statistic is the standard deviation of that statistic's sampling distribution.

For the mean, it equals $\sigma/\sqrt{N}$ and is often written $\sigma_{\bar{x}}$.

It is the typical amount by which a sample mean $\bar{x}$ is expected to differ from the population mean $\mu$.
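A one-line Python sketch of $\sigma/\sqrt{N}$ follows, using a hypothetical σ = 15. Because N sits under a square root, quadrupling the sample size only halves the standard error.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(N)."""
    return sigma / math.sqrt(n)

# Hypothetical sigma = 15.
print(standard_error_of_mean(15, 25))   # 3.0
print(standard_error_of_mean(15, 100))  # 1.5
```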

14
Q

z-score for an individual observation

A

$z = \frac{x - \mu}{\sigma}$

15
Q

z-score for a sample mean

A

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{N}}$
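The following Python sketch contrasts the two z formulas using hypothetical values (μ = 100, σ = 15, N = 36). The same value of 105 is unremarkable as a single score but far from μ as a sample mean, because the mean is scaled by the standard error rather than by σ.

```python
import math

def z_individual(x, mu, sigma):
    """z for one observation: distance from mu in population-SD units."""
    return (x - mu) / sigma

def z_sample_mean(xbar, mu, sigma, n):
    """z for a sample mean: distance from mu in standard-error units."""
    return (xbar - mu) / (sigma / math.sqrt(n))

# Hypothetical population: mu = 100, sigma = 15.
print(z_individual(105, 100, 15))       # about 0.33
print(z_sample_mean(105, 100, 15, 36))  # 2.0
```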

16
Q

point estimate

A

single value used as an estimate of a population parameter

17
Q

what are point estimates influenced by?

A

point estimates are calculated using data from random samples drawn from a much larger population so they are influenced by sampling error

variation of a point estimate from one sample to another represents the extent of sampling error

18
Q

Sampling error and sample size

A

Smaller samples have more sampling error than larger samples.

Point estimates from small samples contain more sampling error.

From the standard error of the mean formula ($\sigma/\sqrt{N}$): as N gets bigger, the standard error gets smaller, so there is less sampling error with larger N.

CIs from small samples reflect more sampling error than CIs from larger samples, so they are wider.

19
Q

Confidence interval does what?

A

Conveys the degree of sampling error around a point estimate by presenting a range of plausible or reasonable values for the population parameter of interest.

CI is a range of values or an interval that is expected to capture a population parameter of interest with some prespecified level of confidence.

Indicates the precision of a point estimate
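Here is a minimal Python sketch of a CI for μ. It assumes σ is known, so the critical value comes from the standard normal distribution; when σ is estimated from the sample, a critical t value would be used instead. The sample mean and σ are hypothetical. The sketch shows two researcher-controlled factors: the interval narrows as N grows and widens as the confidence level rises.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, level=0.95):
    """CI for mu assuming sigma is known: xbar +/- z_crit * sigma / sqrt(N)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = z_crit * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample mean of 100 with sigma = 15.
for n in (25, 100):
    lo, hi = z_confidence_interval(100, 15, n)
    print(n, round(lo, 2), round(hi, 2), round(hi - lo, 2))
```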

20
Q

What does the Central Limit Theorem tell us about sample means?

A

For sufficiently large N, sample means can be treated as observations from a normal distribution.

21
Q

Interpretation of a confidence interval

A

This interval captures $\mu$ with 95% confidence

22
Q

Factors affecting the width of a confidence interval that are under the researcher’s direct control:

A

level of confidence

sample size

23
Q

Type I error

A

The rejection of a true null hypothesis. The probability of a Type I error is alpha ($\alpha$), given that the correct statistical model has been used to test $H_0$.

24
Q

Type II error

A

The failure to reject a false null hypothesis. The probability of a Type II error is beta ($\beta$).

25
Q

Power

A

Power is the probability of rejecting a false null hypothesis. Power is the complement of the probability of a Type II error: power = $1 - \beta$.

26
Q

What is power greater for?

A

larger sample sizes and for larger effect sizes
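The following Python sketch rests on simplifying assumptions not made on this card: a two-tailed one-sample z-test with known σ, and an effect size expressed as $d = (\mu - \mu_0)/\sigma$. Under these assumptions, power rises with both N and d. When d = 0 (H0 true), the rejection probability is just α, the Type I error rate.

```python
import math
from statistics import NormalDist

def power_one_sample_z(d, n, alpha=0.05):
    """Power of a two-tailed one-sample z-test when the true standardised effect is d."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n)  # centre of the test statistic's distribution when H0 is false
    # Probability that the statistic lands in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

for d in (0.2, 0.5):
    for n in (20, 80):
        print(d, n, round(power_one_sample_z(d, n), 3))
```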

27
Q

statistical model

A

represents the value of a dependent variable (often symbolized with the letter y) as a function of one or more parameters plus an error term.

28
Q

General Linear Model

A

A model that expresses the dependent variable as a linear function of the parameter(s) plus an error term; all models examined here take this form.

29
Q

Error variance

A

In the professor-salary example, the error variance represents the extent to which professor salaries differ from the mean salary.

In an intercept-only model, the error variance is equivalent to the variance of the dependent variable
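A minimal Python sketch of an intercept-only model follows, using hypothetical salary data in $1000s. The least-squares intercept is the sample mean. With N − 1 in the denominator (one estimated parameter), the error variance matches the ordinary sample variance of y.

```python
import statistics

def intercept_only_model(y):
    """Fit y_i = beta0 + e_i by least squares; the estimate of beta0 is the sample mean."""
    beta0_hat = statistics.fmean(y)
    errors = [yi - beta0_hat for yi in y]
    # Error variance with N - 1 in the denominator (one parameter estimated).
    error_variance = sum(e ** 2 for e in errors) / (len(y) - 1)
    return beta0_hat, error_variance

salaries = [90, 110, 100, 120, 80]  # hypothetical salaries in $1000s
b0, err_var = intercept_only_model(salaries)
# b0 is 100.0; err_var is 250.0, the same as statistics.variance(salaries)
```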

30
Q

t distribution is used when

A

When the standard error of the mean is estimated from the sample (using s rather than a known $\sigma$)

The t distribution has heavier tails (higher kurtosis) than the normal distribution, reflecting the added uncertainty from estimating the standard error

31
Q

The particular t distribution used depends on what?

A

the degrees of freedom

32
Q

When df = infinity, t distribution =

A

standard normal distribution

33
Q

t stat formula

A

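$t = \frac{\bar{y} - \mu_0}{s_{\bar{y}}}$

$\mu_0$ = population mean value specified by the null hypothesis; $s_{\bar{y}} = s/\sqrt{N}$ is the estimated standard error of the mean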
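A short Python sketch of the one-sample t statistic follows, using hypothetical data tested against a hypothetical $\mu_0 = 10$. The function returns the t statistic together with its N − 1 degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (ybar - mu0) / s_ybar, where s_ybar = s / sqrt(N) is the estimated standard error."""
    n = len(sample)
    ybar = statistics.fmean(sample)
    s_ybar = statistics.stdev(sample) / math.sqrt(n)  # stdev uses the N - 1 denominator
    return (ybar - mu0) / s_ybar, n - 1  # t statistic and its degrees of freedom

# Hypothetical data: mean 13, s = sqrt(2.5), N = 5.
t, df = one_sample_t([12, 14, 11, 13, 15], mu0=10)
# t is about 4.24 (exactly 3 * sqrt(2)); df is 4
```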

34
Q

One sample T test report

A

The mean nine-month salary for professors was M = $113,706.46 (SD = 30,289.04), with 95% CI [110,717.90, 116,695.10]. A one-sample t-test confirmed that this mean significantly differs from the U.S. population median salary, t (396) = 41.76, p < .001
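As a consistency check, the reported CI can be reproduced in Python from M, SD, and N. The sketch takes N = 397 from df = 396 = N − 1 and uses 1.966 as an approximate two-tailed 95% critical t value for df = 396. The card does not state the null value $\mu_0$, so the t statistic is not recomputed.

```python
import math

# Summary statistics from the report above.
mean, sd, n = 113_706.46, 30_289.04, 397
se = sd / math.sqrt(n)  # estimated standard error of the mean, about 1520.16
t_crit = 1.966  # assumption: approximate two-tailed 95% critical t for df = 396
lower, upper = mean - t_crit * se, mean + t_crit * se
print(round(se, 2), round(lower, 2), round(upper, 2))
```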

35
Q

Effect size

A

magnitude of the association

difference between two means

36
Q

Assumptions for a one-sample t-test

A
  1. Independent observations
  2. Sample data come from a normally distributed population
37
Q

general linear model

A

represents the dependent variable as a function of population means

38
Q

How do you interpret the confidence interval of a slope estimate?

A

The interval from ___ to ___ captures the population mean difference with 95% confidence (with a binary independent variable, the slope is the mean difference).

39
Q

CI formula for slope parameter

A

$\hat{\beta}_1 \pm t_{crit}\, s_{\hat{\beta}_1}$

$s_{\hat{\beta}_1}$ = standard error of the slope estimate

40
Q

Degrees of freedom for determining $t_{crit}$ with a binary independent variable

A

$N - 2$, because the estimated model has two coefficients (intercept and slope)

41
Q

Standard error estimate of $\hat{\beta}_1$ for a binary independent variable

A

$s_{\hat{\beta}_1} = \sqrt{\frac{s^2_{pooled}}{n_1} + \frac{s^2_{pooled}}{n_2}}$
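The following Python sketch computes the slope, pooled variance, standard error, t, and CI for a binary independent variable, using small hypothetical groups. It pools with the standard weighting by each group's $n - 1$, which these cards do not spell out. It also uses 2.447 as an approximate 95% critical t for df = 6.

```python
import math
import statistics

def pooled_slope_inference(group1, group2, t_crit):
    """Slope (mean difference), pooled variance, SE, t, and CI for a binary predictor."""
    n1, n2 = len(group1), len(group2)
    b1 = statistics.fmean(group2) - statistics.fmean(group1)  # slope = mean difference
    # Pooled variance: weighted average of the two sample variances (df = n1 + n2 - 2).
    s2_pooled = ((n1 - 1) * statistics.variance(group1)
                 + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    se_b1 = math.sqrt(s2_pooled / n1 + s2_pooled / n2)
    t = b1 / se_b1
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return b1, s2_pooled, se_b1, t, ci

# Hypothetical data; t_crit = 2.447 approximates the two-tailed 95% value for df = 6.
b1, s2p, se, t, ci = pooled_slope_inference([4, 5, 6, 5], [7, 8, 6, 9], t_crit=2.447)
```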

42
Q

pooled variance estimate assumes what?

A

Population variance of the dependent variable is equal across the two groups - homogeneity of variance

43
Q

Null hypothesis for a two-sample t-test (or an ANOVA with a binary categorical predictor)

A

$H_0: \mu_1 = \mu_2$

$H_0: \beta_1 = 0$

44
Q

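Formula for the estimated errors (residuals) from the linear model

A

$\hat{e}_i = y_i - \hat{\mu}_i$

where $\hat{\mu}_i$ is the mean of the group that observation i belongs to, so each group's errors are measured from that group's own mean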

45
Q

Formula for the errors from the null model (if the null hypothesis is true)

A

$\hat{e}_i = y_i - \bar{y}$ (every observation is predicted by the overall mean)

46
Q

what is the purpose of a statistical model?

A

describe or explain individual differences or variation in a dependent variable

47
Q

If a model does a good job of accounting for individual differences, what should the variance of errors be like?

A

variance of the errors should be small relative to the overall variance of the variable

i.e., the full model has accounted for (explained) a portion of the dependent variable variance

48
Q

Proportion reduction in error

A

$R^2$: the proportion of dependent variable variance explained by the model

49
Q

With a single binary independent variable, $R^2$ =

A

eta squared ($\eta^2$)

50
Q

ANOVA

A

Involves partitioning the total sample variation of the dependent variable into variation explained by the model and error (residual) variation

51
Q

Relationship between SD and variance

A

sd is the square root of the variance

52
Q

Variance

A

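The average squared deviation from the mean: the sum of squared deviations divided by $N$ (population variance) or by $N - 1$ (sample estimate)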
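A small Python sketch using hypothetical scores shows that the sum of squared deviations is not yet a variance. It becomes one only after dividing by N or N − 1. The sketch also confirms that the SD is the square root of the variance.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5
mean = statistics.fmean(scores)
ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations = 32.0
pop_var = ss / len(scores)                  # divide by N: 4.0 (statistics.pvariance)
sample_var = ss / (len(scores) - 1)         # divide by N - 1: about 4.571 (statistics.variance)
sd = math.sqrt(sample_var)                  # SD is the square root of the variance
```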

53
Q

Numerator and denominator of F statistic

A

$F = MS_{model} / MS_{error}$

Variability explained by the model/residual variability

54
Q

SS Total

A

Sum of squared deviations of observed values of y from the mean of y

$SS_{total} = \sum_i (y_i - \bar{y})^2$

55
Q

Model SS for Y

A

$SS_{model} = \sum_i (\hat{\mu}_i - \bar{y})^2$

56
Q

Model SS for Y is called variability explained by the model because

A

it summarizes the predicted variation due to group membership relative to the overall mean

57
Q

Residual SS for Y

A

The sum of squared residuals across all observations

$SS_{resid} = \sum_i (y_i - \hat{\mu}_i)^2$

58
Q

Write out the ANOVA table

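A

Source: SS, df, MS, F
Model: $SS_{model}$, $J - 1$, $MS_{model} = SS_{model}/(J - 1)$, $F = MS_{model}/MS_{error}$
Error (residual): $SS_{resid}$, $N - J$, $MS_{error} = SS_{resid}/(N - J)$
Total: $SS_{total}$, $N - 1$
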
59
Q

Formula for R^2

A

$R^2 = SS_{model}/SS_{total}$

$R^2 = 1 - SS_{resid}/SS_{total}$
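The following Python sketch computes the three sums of squares for hypothetical grouped data. It confirms that $SS_{total} = SS_{model} + SS_{resid}$, so the two $R^2$ formulas agree. Each observation's predicted value $\hat{\mu}_i$ is its group mean.

```python
import statistics

def anova_sums_of_squares(groups):
    """Return SS_total, SS_model, SS_resid for a list of groups (lists of y values)."""
    all_y = [y for g in groups for y in g]
    grand_mean = statistics.fmean(all_y)
    ss_total = sum((y - grand_mean) ** 2 for y in all_y)
    ss_model = 0.0
    ss_resid = 0.0
    for g in groups:
        mu_hat = statistics.fmean(g)  # predicted value for every member of the group
        ss_model += len(g) * (mu_hat - grand_mean) ** 2  # same term repeated for each member
        ss_resid += sum((y - mu_hat) ** 2 for y in g)
    return ss_total, ss_model, ss_resid

# Hypothetical data: SS_total = 19.5, SS_model = 12.5, SS_resid = 7.0
ss_total, ss_model, ss_resid = anova_sums_of_squares([[4, 5, 6, 5], [7, 8, 6, 9]])
r2_a = ss_model / ss_total      # about 0.641
r2_b = 1 - ss_resid / ss_total  # the same value
```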

60
Q

Range of F stat

A

0 to infinity

61
Q

Distribution of F stat

A

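One-tailed test (the rejection region is in the upper tail)

Positively skewed

Ranges from 0 to infinity

Its exact shape depends on the numerator and denominator df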

62
Q

Formula for T for the difference between two sample means

A

$t = \frac{\bar{y}_2 - \bar{y}_1}{s_{\bar{y}_2 - \bar{y}_1}}$

63
Q

When numerator df =1 then F =

A

$F = t^2$
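The identity can be checked numerically in Python on hypothetical two-group data. The sketch computes F from the ANOVA partition ($df_{model} = 1$, $df_{error} = N - 2$) and t from the pooled-variance standard error of the mean difference. The square of t then equals F.

```python
import math
import statistics

group1, group2 = [4, 5, 6, 5], [7, 8, 6, 9]  # hypothetical data
n1, n2 = len(group1), len(group2)
m1, m2 = statistics.fmean(group1), statistics.fmean(group2)
grand = statistics.fmean(group1 + group2)

# F from the ANOVA partition: df_model = 1, df_error = N - 2.
ss_model = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
ss_resid = sum((y - m1) ** 2 for y in group1) + sum((y - m2) ** 2 for y in group2)
df_error = n1 + n2 - 2
f_stat = (ss_model / 1) / (ss_resid / df_error)

# t from the pooled-variance standard error of the mean difference.
s2_pooled = ss_resid / df_error
t_stat = (m2 - m1) / math.sqrt(s2_pooled / n1 + s2_pooled / n2)

print(round(f_stat, 4), round(t_stat ** 2, 4))  # both 10.7143
```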

64
Q

Independent samples T test report

A

“The mean reaction time was significantly greater for those with a reading disorder diagnosis (M = 2039.76ms, SD = 1128.36) than the control group (M = 1374.68ms, SD = 625.35), t (36) = 2.28, p = .03. The 95% CI for the mean difference was [72.14, 1258.02].”

65
Q

Assumptions for an independent-samples t-test

A
  1. The observations are independent
  2. The dependent variable is normally distributed within each group
  3. Homogeneity of variance: the standard error of the regression slope (i.e., the standard error of the sample mean difference) uses a pooled variance estimate. That estimate assumes the sample variances of the two groups both estimate a single population variance.

66
Q

The t-test is robust against violations of normality and homogeneity of variance when

A

sample sizes are large

group sample sizes are equal

67
Q

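First step in dummy coding a categorical variable with J categories

A

Create $J - 1$ separate binary dummy variables (the remaining category serves as the reference group)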
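A minimal Python sketch of dummy coding follows. The function name and data are hypothetical; the condition labels are borrowed from the depth-of-processing example. Each non-reference category gets a 0/1 indicator, so the reference group scores 0 on every dummy variable.

```python
def dummy_code(values, reference):
    """Create J - 1 binary dummy variables; `reference` gets 0 on all of them."""
    categories = sorted(set(values))
    others = [c for c in categories if c != reference]
    return {c: [1 if v == c else 0 for v in values] for c in others}

# Hypothetical three-level variable (J = 3); "counting" is the reference group.
conditions = ["counting", "rhyming", "adjective", "counting", "rhyming"]
dummies = dummy_code(conditions, reference="counting")
# dummies == {"adjective": [0, 0, 1, 0, 0], "rhyming": [0, 1, 0, 0, 1]}
```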

68
Q

Null hypothesis of one way anova

A

$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ (e.g., J = 5 groups give four dummy-variable slopes)

69
Q

APA report one way anova

A

“The overall proportion of variance explained by the linear model, $R^2$ = .45, was significant, F (4, 45) = 9.09, p < .001, indicating that the number of words recalled significantly varied across the five conditions representing different levels of depth of processing.”

70
Q

What does the result of an anova indicate

A

A significant result indicates that it is unlikely that all population means are equal: at least one population mean likely differs from the others.

71
Q

T formula for each slope coefficient estimate

A

$t = \hat{\beta}/s_{\hat{\beta}}$

72
Q

When are the t-tests for the dummy-variable slopes in an ANOVA valid?

A

As planned comparisons

i.e., if the researcher explicitly planned (a priori) to compare the mean of the reference category with each of the other categories

73
Q

When to do post hoc

A

When the comparisons were not planned a priori, or when you want to compare group means that do not involve the reference group

74
Q

APA report for a priori t tests

A

“Because the dummy variables in the linear model were defined a priori, the corresponding t-tests represent planned comparisons. The rhyming mean (M = 6.90) did not significantly differ from the counting mean (M = 6.90), t (45) = 0.07, p = .94. But the adjective mean (M = 11.00) was significantly different from the counting mean, t (45) = 2.88, p = .006.”
Etc. for the t-tests for the remaining dummy variables.

75
Q

Assumptions for one way ANOVA

A
  1. independent observations
  2. normally distributed errors
  3. homogeneity of variance
76
Q

What happens if one performs multiple significance tests on the same data without proper adjustments?

A

The probability that at least one of the tests produces a Type I error exceeds the nominal $\alpha$ (e.g., .05)

77
Q

Formula for Type I error accumulation (the familywise error rate)

A

$1 - (1 - \alpha)^c$, where c is the number of (independent) tests
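A direct Python transcription of the formula follows. It assumes the c tests are independent and each is run at the same α. The familywise rate climbs quickly with the number of tests.

```python
def familywise_error_rate(alpha, c):
    """P(at least one Type I error) across c independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** c

for c in (1, 3, 10):
    print(c, round(familywise_error_rate(0.05, c), 4))
# prints:
# 1 0.05
# 3 0.1426
# 10 0.4013
```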

78
Q

Tukey's HSD

A

The experiment-wise Type I error rate is maintained at the $\alpha$ level used to test the omnibus null hypothesis, regardless of whether the pairwise comparisons were planned a priori.

79
Q

Bonferroni adjustment

A

the experiment-wise alpha level is simply divided by the number of specific hypothesis tests to be performed.
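A short Python sketch of the adjustment follows, with three hypothetical tests. The second function shows the equivalent p-value form, which multiplies each p by c and caps it at 1. For independent tests, the resulting familywise rate stays at or below the experiment-wise α.

```python
def bonferroni_alpha(alpha, c):
    """Per-test alpha level under the Bonferroni adjustment."""
    return alpha / c

def bonferroni_p(p, c):
    """Equivalent view: multiply each p-value by c (capped at 1) and compare with alpha."""
    return min(1.0, p * c)

per_test_alpha = bonferroni_alpha(0.05, 3)        # about 0.0167
fwer_independent = 1 - (1 - per_test_alpha) ** 3  # about 0.0492, no more than .05
```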

80
Q

Moderation

A

The second independent variable may moderate the effect of the primary independent variable; for this reason, the second independent variable is often called a moderator.

81
Q

What does a population model represent?

A

How the two independent variables combine to explain individual differences in the dependent variable.

82
Q

What does it mean that the main-effects model is likely misspecified?

A

It is an incorrect model in the sense that it cannot adequately account for the major regularities of the data.

83
Q

Interaction effects

A

Interaction terms allow the effect of a smoking-group dummy variable to be moderated by a task-type dummy variable.

84
Q

Null hypothesis for the interaction in a two-way ANOVA

A

$H_0$: all interaction coefficients = 0

85
Q

Questions asked by comparing full model with main-effects model

A

Does smoking group significantly interact with task type? Do the smoking group mean differences significantly vary across the task types? Is the effect of smoking group significantly moderated by task type?

86
Q

MS effect

A

Compare the main-effects model with the full model.

Because the two models differ only by the inclusion of the interaction terms, the difference between their RSS values (14857 − 13587, reported as 1269.5) is the overall interaction sum of squares. Dividing this by the interaction degrees of freedom gives $MS_{effect}$.

87
Q

interaction degrees of freedom

A

$(J - 1)(K - 1)$, where J and K are the numbers of levels of the two independent variables
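The following Python sketch combines the last few cards into one nested-model comparison. The difference in RSS is the interaction SS. Dividing it by $(J - 1)(K - 1)$ gives $MS_{effect}$, and dividing that by the full model's $MS_{error}$ gives F. All numbers are hypothetical, and $df_{error} = N - JK$ assumes the full model estimates one mean per cell.

```python
def interaction_f_test(rss_main, rss_full, j, k, n):
    """F for the interaction: compare the main-effects model with the full model."""
    ss_interaction = rss_main - rss_full  # SS explained by the interaction terms
    df_effect = (j - 1) * (k - 1)         # interaction degrees of freedom
    df_error = n - j * k                  # full model has one mean per cell
    ms_effect = ss_interaction / df_effect
    ms_error = rss_full / df_error
    return ss_interaction, df_effect, ms_effect, ms_effect / ms_error

# Hypothetical numbers: 3 x 3 design, N = 90.
ss_int, df_int, ms_int, f = interaction_f_test(120.0, 100.0, j=3, k=3, n=90)
# ss_int = 20.0, df_int = 4, ms_int = 5.0, f is about 4.05
```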

88
Q

family-wise error rate

A

Controls the overall probability of at least one Type I error within each level of the moderator variable (each level is a family). With three levels of the task moderator, there are three families, each with three pairwise comparisons, so the p-values are adjusted based on three comparisons. A correction for the experiment-wise error rate, on the other hand, would be based on all nine comparisons.

89
Q

Simple main effects

A

The separate omnibus effects of a focal independent variable within different levels of a moderator variable.

e.g., the simple main effect of smoking group within the driving task

90
Q

How to report on simple main effects

A

The simple main effect of smoking group within the reading task is significant, F

91
Q

Assumptions for t tests and anovas

A
  1. Independent observations
  2. Dependent variable is normally distributed within each cell of the study design
  3. Homogeneity of variance: The variance of the dependent variable is constant across the cells of the study design.