Final Exam Flashcards

1
Q

null hypothesis (H0)

A

statement that is the skeptical viewpoint of your research question
- no difference

2
Q

4 steps of a hypothesis test

A
  1. define null and alternative hypothesis
  2. establish null distribution
  3. conduct statistical test
  4. draw scientific conclusions
3
Q

null distribution

A

sampling distribution we expect from sampling a statistical population when the null hypothesis is true

4
Q

alternative hypothesis (HA)

A

statement that is the positive viewpoint of your research question
- everything not in null (mutually exclusive)
- there is a difference

5
Q

3 factors to the hypotheses

A
  1. mutually exclusive
  2. they describe all possible outcomes = exhaustive
  3. null always includes the equality statement
6
Q

non-directional hypothesis

A

states that there should be a difference (the alternative hypothesis does not specify a direction)

7
Q

directional hypothesis

A

states that the difference should be in a specific direction (smaller vs. larger)

8
Q

statistical inference

A

conclusion that a set of data are unlikely to come from the null hypothesis

9
Q

statistical decision

A

whether we believe our data came from the null distribution or not
- if it is likely the data came from the null distribution = “fail to reject”
- if it is unlikely data came from null distribution = “reject null”

10
Q

2 probabilities for null distribution

A
  1. type 1 error rate
  2. p-value
11
Q

type 1 error rate (alpha)

A

probability of rejecting the null hypothesis when it is true
- set by the researcher without any reference to the data

12
Q

p-value (p)

A

probability of seeing your data, or something more extreme, under the null hypothesis
- area under the curve from the data to more extreme values

13
Q

rules for making a statistical decision

A
  1. if p-value is less than type 1 error rate, then we “reject null hypothesis”
  2. if p-value is greater than or equal to type 1 error rate, then we “fail to reject null hypothesis”
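
A minimal sketch of this decision rule in Python; alpha and p_value here are made-up placeholder numbers, not values from any test in these notes:

```python
# Hypothetical values: alpha is fixed by the researcher before seeing the data
alpha = 0.05     # type 1 error rate
p_value = 0.012  # p-value returned by some statistical test

if p_value < alpha:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")
```
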
14
Q

what the scientific conclusions consider

A
  1. strength of inference: how strong evidence is
  2. effect size: only consider it when we reject null hypothesis (small = low impact)
15
Q

error rates

A

probability of making a mistake
- type I and II have an inverse relationship (when one increases the other decreases)

16
Q

type II error rates

A

probability of failing to reject null hypothesis when it is false
- area under alternative distribution from data point to something more extreme

17
Q

types of t-tests

A
  1. single-sample t-tests
  2. paired-sample t-tests
  3. two-sample t-tests
18
Q

single-sample t-tests

A

evaluate whether mean of your sample is different from some reference value

ex. is mean test score from a sample of high school students different than national standards
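
A minimal sketch of this test with scipy, assuming made-up test scores and an assumed national reference value of 70:

```python
from scipy import stats

# Hypothetical sample of test scores from one school (made-up numbers)
scores = [72, 68, 75, 71, 69, 74, 73, 70, 76, 66]
reference = 70  # assumed national standard

# Single-sample t-test: is the sample mean different from the reference?
t_obs, p_value = stats.ttest_1samp(scores, popmean=reference)
print(t_obs, p_value)
```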

19
Q

paired-sample t-tests

A

evaluate whether mean of paired data is different from some reference value
- looks at changes in a SU

ex. does tutoring improve grade for a student
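
A minimal sketch with scipy, assuming made-up before/after grades for the same students:

```python
from scipy import stats

# Hypothetical grades for the same students before and after tutoring
before = [61, 70, 55, 68, 74, 62]
after  = [66, 72, 60, 71, 73, 70]

# Paired t-test: are the within-student differences different from 0?
t_obs, p_value = stats.ttest_rel(after, before)
print(t_obs, p_value)
```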

20
Q

two-sample t-tests

A

evaluate whether the means of two groups are different from each other (compare two groups)

ex. do dogs sleep more than cats
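
A minimal sketch with scipy, using made-up sleep data for the dogs vs. cats example:

```python
from scipy import stats

# Hypothetical hours of sleep per day for two independent groups
dogs = [12.1, 13.0, 11.4, 12.8, 12.5]
cats = [14.2, 15.1, 13.8, 14.6, 15.0]

# Two-sample t-test: are the two group means different?
t_obs, p_value = stats.ttest_ind(dogs, cats)
print(t_obs, p_value)
```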

21
Q

mean

A

= m

22
Q

reference value

A

= mu (µ)
- it is given

23
Q

the reporting of a single-sample t-test should include…

A
  1. sample mean and standard deviation
  2. observed t-score
  3. degrees of freedom
  4. p-value
24
Q

observed t-score

A

calculated using the sample mean, standard deviation, sample size, and reference value
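
For a single-sample test this works out to t = (m - µ) / (s / √n); a sketch with made-up numbers, checked against scipy:

```python
import numpy as np
from scipy import stats

x = np.array([72, 68, 75, 71, 69, 74])  # hypothetical sample
mu = 70                                  # reference value

m, s, n = x.mean(), x.std(ddof=1), len(x)
t_manual = (m - mu) / (s / np.sqrt(n))   # observed t-score

t_scipy, _ = stats.ttest_1samp(x, popmean=mu)
print(t_manual, t_scipy)                 # the two values agree
```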

25
Q

reference value for paired t-tests

A

typically 0

26
Q

null and alternative hypotheses for paired t-tests

A

statements about how the difference between the paired measurements relates to the reference value

27
Q

scientific conclusions for paired t-tests

A
  1. if we reject null = the sample data provide strong evidence that the difference between the paired measurements is different from reference value
  2. if we fail to reject null = the sample data do not provide strong evidence that the difference between paired measurements is different from reference value
28
Q

the reporting of a paired t-test should include…

A
  1. mean difference between paired measurements, and standard deviation of the differences
  2. observed t-score
  3. degrees of freedom
  4. p-value
29
Q

sample means for two-sample t-test

A

m1=sample mean of first group
m2=sample mean of second group
- can change

30
Q

scientific conclusions for two-sample t-tests

A
  1. if we reject the null = the sample data provide strong evidence that the means of the two groups are different
  2. if we fail to reject the null = the sample data do not provide strong evidence that the means of the two groups are different
31
Q

the reporting of a two-samples t-test should include…

A
  1. mean, standard deviation, and sample size for each group
  2. observed t-score
  3. degrees of freedom
  4. p-value
32
Q

expected contingency table

A

expected frequencies under null hypothesis

33
Q

1-way contingency table

A

one categorical variable
- is there a difference in counts among the levels of that variable?
- under the null, counts are distributed equally among the levels

34
Q

key features of 1-way contingency table

A
  1. ECT is always given as counts
  2. sum of all expected counts must be same as sum of all counts in observed contingency table
  3. ECT values can be fractional
35
Q

2-way expected contingency table

A

two categorical variables
- looking for an interaction between the variables
- under the null, counts are distributed independently among cells

ex. is age independent of year

36
Q

calculating 1 way

A

calculate marginal distributions as proportions
- row and column sums/table total

37
Q

calculating 2 way

A

product of row and column proportions for each cell x table total
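
A sketch of this calculation with numpy, for a hypothetical 2x2 observed table:

```python
import numpy as np

observed = np.array([[30, 10],
                     [20, 40]])  # hypothetical observed counts

total = observed.sum()
row_prop = observed.sum(axis=1) / total  # row marginal proportions
col_prop = observed.sum(axis=0) / total  # column marginal proportions

# expected cell = row proportion x column proportion x table total
expected = np.outer(row_prop, col_prop) * total
print(expected)
```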

38
Q

chi-squared score

A

measure of the distance between the observed and expected contingency tables

39
Q

steps to calculating the chi-squared score

A
  1. take difference between each observed and expected cell
  2. square the difference
  3. divide by the expected value
  4. sum over all cells in the table
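
A sketch of these four steps with numpy, reusing the hypothetical 2x2 table from the previous sketch; scipy's chi2_contingency (with Yates' correction turned off) gives the same score:

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[30, 10],
                     [20, 40]])          # hypothetical observed counts
expected = np.outer(observed.sum(axis=1),
                    observed.sum(axis=0)) / observed.sum()

# steps 1-4: difference, square, divide by expected, sum over cells
chi2_manual = ((observed - expected) ** 2 / expected).sum()

chi2, p, df, exp = chi2_contingency(observed, correction=False)
print(chi2_manual, chi2, p)
```
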
40
Q

chi-squared distribution

A

the distribution of chi-squared scores you would get when repeatedly sampling an imaginary statistical population where the null hypothesis is true

41
Q

key features of chi-squared distribution

A
  1. area under curve sums to one
  2. degrees of freedom determines shape of distribution - different for 1 and 2 way tables
  3. only positive values
42
Q

chi-squared test

A

used only with categorical data; the null distribution contains the variation we expect from sampling error
- always directional

43
Q

statistical decision of chi-squared test

A
  1. reject the null if observed score is greater than critical score or if p-value is less than type 1 error rate (a)
  2. fail to reject the null if observed score is less than or equal to critical score or if p-value is greater than or equal to a
44
Q

what side do p-value and type 1 error always go on in chi-squared tests

A

the right side

45
Q

scientific conclusions for 1-way tables

A
  1. reject null and conclude there is evidence to support that the counts are not equal among cells
  2. fail to reject null and conclude that there is no evidence to support that the counts are not equal among cells
46
Q

scientific conclusions for 2-way tables

A
  1. reject null and conclude there is evidence to support that the variables are not independent of each other
  2. fail to reject null and conclude there is no evidence to support that the variables are not independent of each other
47
Q

the reporting of a chi-squared test should include…

A
  1. short name of the test (X2)
  2. degrees of freedom
  3. total count in the observed table
  4. observed chi-squared value
  5. p-value
48
Q

factors of a correlation test

A
  1. no implied causation between variables (one variable doesn’t cause another)
  2. both variables are assumed to have variation
  3. is not used for prediction
49
Q

Pearson's correlation coefficient

A

measures the strength of association between two numerical variables
r = measured from the sample
ρ (rho) = population parameter - describes the stats pop
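
A minimal sketch with scipy, for made-up paired measurements:

```python
from scipy import stats

# Hypothetical paired measurements taken on the same sampling units
x = [1.2, 2.4, 3.1, 4.8, 5.0, 6.3]
y = [2.0, 2.9, 3.8, 5.1, 5.4, 6.9]

# r estimates the population parameter rho; the p-value tests H0: rho = 0
r, p_value = stats.pearsonr(x, y)
print(r, p_value)
```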

50
Q

correlation coefficients

A

ρ = -1 indicates a perfect negative correlation
ρ = 0 indicates no association
ρ = 1 indicates a perfect positive correlation

51
Q

assumptions behind a correlation test

A
  1. each pair of numerical values is measured on same sampling unit
  2. numerical values come from continuous numerical distributions with non-0 variation
  3. association is assumed to be a straight line (linear)
52
Q

null and alternative hypothesis for correlation tests

A

H0: correlation coefficient is 0
HA: correlation coefficient is not 0
directional = specifies whether the association is positive or negative

53
Q

null distribution for correlation tests

A

sampling distribution of correlation coefficients from a statistical population with no association between variables (ρ = 0)

54
Q

statistical decision for correlation tests

A

same as chi-squared tests

55
Q

scientific conclusions for correlation tests

A

different depending on direction
- no direction = just based on association
- directional = based on positive or negative association

56
Q

the reporting of a correlation tests should include…

A
  1. symbol for the test (r)
  2. degrees of freedom
  3. observed correlation value
  4. p-value
57
Q

linear regression

A

used to evaluate whether changes in one numerical variable can predict changes in a second numerical variable

58
Q

linear regressions in experimental studies

A

prediction reflects a causal relationship between the variables
- predictor variable is independent variable, and response is dependent variable

59
Q

linear regressions in observational studies

A

choice of predictor variable depends on research question

60
Q

two parameters of linear regressions

A
  1. slope (b)
  2. intercept (a)
61
Q

slope (b)

A

amount that the response variable (y) increases or decreases for every unit change in the predictor variable (x)
- +ve values = increasing relationship
- 0 = no relationship
- -ve values = decreasing relationship

62
Q

intercept (a)

A

value of response v. (y) when predictor v. (x) is at 0 (x=0)

63
Q

3 components of the statistical model for linear regressions

A
  1. systematic component
  2. random component
  3. link function
64
Q

systematic component of statistical model

A

describes mathematical function used for predictions (linear equation)

65
Q

random component of statistical model

A

describes probability distribution for sampling error (normal distribution for response variable)

66
Q

link function of statistical model

A

connects the systematic component to the random component

67
Q

fitting the statistical model

A

estimate the intercept and slope that best explain the data
- done by minimizing the residual variance
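
A minimal least-squares fit with scipy, using made-up x and y values:

```python
from scipy import stats

# Hypothetical predictor (x) and response (y) values
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

fit = stats.linregress(x, y)  # least-squares estimates of intercept and slope
print(fit.intercept, fit.slope, fit.pvalue)  # pvalue tests the slope against 0
```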

68
Q

residual (ri)

A

difference between the observed data point (Yi) and the predicted value (yi)
- squared and summed to give the residual sum of squares

ri = Yi - yi

69
Q

steps to calculating residual variance

A
  1. calculate residual for each data point
  2. take square of each residual
  3. sum the squared residuals across all data points
  4. divide by degrees of freedom (df=n-2)
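
A sketch of these four steps with numpy, continuing the made-up data from the fitting sketch above:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])                      # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

fit = stats.linregress(x, y)
predicted = fit.intercept + fit.slope * x  # predicted value for each point
residuals = y - predicted                  # step 1: residuals
ss_resid = np.sum(residuals ** 2)          # steps 2-3: square and sum
resid_variance = ss_resid / (len(x) - 2)   # step 4: divide by df = n - 2
print(resid_variance)
```
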
70
Q

intercept hypothesis

A

used to answer questions at x=0
- how does intercept (a) relate to a reference value (Ba)

71
Q

slope hypothesis

A

used to answer questions about how much y changes for a unit change in x
- how does slope (b) relate to reference value (Bb)

72
Q

linear regression test

A

same as chi-squared and correlation tests
- compare observed and critical t-scores, or p and a

73
Q

scientific conclusions for intercept hypothesis

A
  1. reject the null and conclude there is evidence that the predicted response v. is different from the reference (Ba) at x=0
  2. fail to reject the null and conclude there is no evidence that the predicted response v. is different from the reference (Ba) at x=0
74
Q

scientific conclusions for slope hypothesis

A
  1. reject the null and conclude there is evidence that changes in predictor v. can be used to predict changes in the response v.
  2. fail to reject the null and conclude there is no evidence that changes in the predictor v. can be used to predict changes in response v.
75
Q

the reporting of linear regression test should include…

A
  1. symbol for parameter being tested (a or b)
  2. observed parameter value
  3. observed t-score
  4. degrees of freedom
  5. p-value
76
Q

4 main assumptions for linear regressions

A
  1. linearity
  2. independence
  3. normality
  4. homoscedasticity
77
Q

linearity assumption of linear regressions

A

response v. should be well described by a linear combination of predictor v.
- relationship is assumed to be a straight line
- violations of linearity show up as a curved ("frowny face") pattern in the residuals

78
Q

independence assumption of linear regressions

A

residuals along the predictor v. should be independent of each other

79
Q

normality assumption of linear regressions

A

residual variation should be normally distributed
- evaluated by looking at histograms
- if the assumption of normality is met, the histogram will look similar to the reference normal distribution
- if not met, histogram will have fatter or skinnier tails

80
Q

homoscedasticity assumption of linear regressions

A

residual v. should have same variance across the range of predictor v.

81
Q

violations of independence assumption of linear regression

A

can occur when there is repeated sampling of SU or when there is a spatial or temporal relationship among SU

82
Q

violations of normality assumption of linear regression

A

can occur if…
1. the stat pop has a skewed or unusual distribution
2. your data violate the assumption of linearity

83
Q

Shapiro-Wilk test

A

evaluates the null hypothesis that the residuals are normally distributed
H0: residuals are normally distributed
HA: residuals are not normally distributed
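
A minimal sketch with scipy, applied to made-up residuals:

```python
import numpy as np
from scipy import stats

# Hypothetical residuals from a fitted regression
residuals = np.array([0.3, -0.5, 0.1, 0.8, -0.2, -0.6, 0.4, -0.3])

w, p_value = stats.shapiro(residuals)
print(w, p_value)  # small p-value = evidence against normally distributed residuals
```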

84
Q

heteroscedasticity

A

if the residuals have little variation along some parts of predictor v. and large amounts at others

85
Q

analysis of variance

A

used to compare variance between two groups

86
Q

f-tests

A

evaluates difference in variance between two groups
- done using ratio of variance

87
Q

ratio of variance (f-score)

A

asks whether ratio is different from one
- ratio equals 1 if both groups have the same variance

88
Q

null and alternative hypothesis for f-score

A

evaluate whether f-score is different from 1
H0: ratio of variances is 1
HA: ratio of variances is not 1

89
Q

null distribution for f-score

A

sampling distribution from repeatedly sampling a stats pop where the variance was the same in both groups
- ratio of variances will never be negative

90
Q

degrees of freedom for f-distribution

A

dfA=nA-1 for one group (group A)
dfB=nB-1 for another group (group B)
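
A sketch of the variance-ratio F-score with numpy and scipy, using made-up data for two groups; doubling the smaller tail area to get a two-sided p-value is an assumed convention, not something stated in these notes:

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for two groups
group_a = np.array([4.1, 5.3, 6.0, 4.8, 5.5, 6.2])
group_b = np.array([3.9, 4.2, 4.0, 4.3, 4.1, 4.4])

f_obs = group_a.var(ddof=1) / group_b.var(ddof=1)  # ratio of sample variances
df_a, df_b = len(group_a) - 1, len(group_b) - 1

# area in the upper tail of the F-distribution beyond the observed ratio;
# doubling the smaller tail gives a two-sided test of "ratio different from 1"
p_upper = stats.f.sf(f_obs, df_a, df_b)
p_two_sided = 2 * min(p_upper, 1 - p_upper)
print(f_obs, p_two_sided)
```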

91
Q

statistical decision of f-tests

A

same as the others
- comparing observed and critical values or p and a

92
Q

reporting of an f-test should include…

A
  1. mean, standard deviation, sample sizes for each group
  2. observed F-score
  3. degrees of freedom for each group
  4. p-value
93
Q

example of f-test

A

do different models of electric vehicles (categorical) differ in their operating costs (numerical)

94
Q

single factor ANOVA test

A

used when there are more than two levels in a categorical variable

95
Q

two sources of variation for ANOVA tests

A
  1. group variation
  2. residual variation
96
Q

group variation

A

variation among means of categorical levels
- if means are same among groups, variation =0
- if means are different, variation = high
- mean sum of the squares

97
Q

residual variation

A

variation among sampling units within a categorical level
- mean squared error

98
Q

statistical model of anova

A

compares group variation to residual variation
- if group variation is same as residual, the means are not different
- if group is larger than residual, groups are different

- this comparison is done with an F-test

99
Q

how do you calculate differences in means

A

F-score = group variation divided by residual variation
- increased group variation = increased f-score
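
A minimal sketch with scipy, for a hypothetical categorical variable with three levels:

```python
from scipy import stats

# Hypothetical numerical measurements for three levels of one categorical variable
level_1 = [5.1, 4.8, 5.5, 5.0]
level_2 = [6.2, 6.0, 6.5, 6.1]
level_3 = [5.3, 5.6, 5.2, 5.4]

# F-score = group variation / residual variation
f_obs, p_value = stats.f_oneway(level_1, level_2, level_3)
print(f_obs, p_value)
```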

100
Q

statement of means

A

H0=the means are same across all levels of categorical variable
HA=the means are different somewhere (different between at least two groups)

101
Q

statement of F-tests

A

evaluate whether group variation is larger than residual variation
- null and alternative hypothesis are directional

102
Q

null distribution of anova test

A

sampling distribution from repeatedly sampling a stats pop where the means are the same across all levels of categorical variable
- f-distribution
dfG=k-1
dfE=n-k

103
Q

statistical decision for anova

A

same as before
- compares critical f-score to observed f-score or p to a

104
Q

scientific conclusions for anova

A
  1. reject null and conclude there is evidence that at least 2 of the means are different
  2. fail to reject null and conclude there is no evidence that at least 2 of the means are different
105
Q

reporting of an ANOVA should include…

A
  1. mean, standard deviation, and sample size for each group
  2. observed f-score
  3. degrees of freedom for group and residual variation (dfG and dfE)
  4. p-value
106
Q

ANOVA post-hoc tests

A

secondary test used to identify what group means are different in an ANOVA
- only used if anova rejects null hypothesis

107
Q

contrast statement of post-hoc tests

A

evaluate whether mean of any two groups is different
- multiple contrast statements

108
Q

family of contrasts

A

a set of contrast statements

109
Q

family-wise error rate

A

type I error rate for entire family of contrasts

110
Q

TukeyHSD

A

type of post-hoc test = honest significant difference
- evaluates all possible combinations of categorical levels and compares means
- controls the family-wise error rate
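
TukeyHSD is the name of R's post-hoc function; if working in Python instead, statsmodels' pairwise_tukeyhsd runs the same test. A sketch with made-up data (an assumed equivalent, not the course's own code):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical measurements and the categorical level of each sampling unit
values = np.array([5.1, 4.8, 5.5, 6.2, 6.0, 6.5, 5.3, 5.6, 5.2])
groups = np.array(["a", "a", "a", "b", "b", "b", "c", "c", "c"])

# Evaluates every pairwise contrast while holding the family-wise error rate at alpha
result = pairwise_tukeyhsd(values, groups, alpha=0.05)
print(result)
```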

111
Q

two-factor anova test

A

considers two categorical factors and focuses on the interaction between them
- also evaluates the effect of two categorical factors on a numerical variable
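
A sketch of fitting a two-factor model with an interaction in Python using statsmodels (an assumed toolchain, not necessarily the course's); the data frame and factor names here are made up:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical data: two categorical factors (A, B) and a numerical response (y)
df = pd.DataFrame({
    "A": ["low", "low", "high", "high"] * 3,
    "B": ["ctrl", "trt", "ctrl", "trt"] * 3,
    "y": [5.1, 6.4, 5.8, 8.9, 5.3, 6.1, 6.0, 9.2, 4.9, 6.6, 5.7, 8.7],
})

# C(A) * C(B) fits both main effects and the A:B interaction
model = ols("y ~ C(A) * C(B)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # F-tests for A, B, and the interaction
```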

112
Q

3 questions two-factor anova tests answer

A
  1. main effects A
  2. main effects B
  3. interactions
113
Q

main effects A

A

differences among the levels of factor A averaging across the levels of factor B

114
Q

main effects B

A

differences among the levels of factor B averaging across the levels of factor A

115
Q

interactions question

A

differences among levels of one factor within each level of other factor (cell-by-cell comparison)

116
Q

interaction

A

deviation from additivity - levels don't add up as expected

117
Q

additivity

A

when the effects of the levels are their simple sums (adding)
- use interaction plots to visualize

118
Q

interaction plot

A
  • if the categorical v. are additive (no interaction) = the lines will be parallel
  • if the categorical v. are not additive (interaction) = the lines are not parallel
119
Q

antagonistic interaction

A

lines cross

120
Q

synergistic interaction

A

lines do not cross but are not parallel

121
Q

statement of means for main effect A

A

H0=means are same across all levels of factor A
HA=means are different somewhere in at least 2 groups of factor A

122
Q

statement of means for main effect B

A

H0=means are same across all levels of factor B
HA=means are different somewhere in at least 2 groups of factor B

123
Q

statement of means for interaction

A

H0=deviation of each cell relative to additivity is 0
HA=at least one cell has a non-0 deviation from additivity

124
Q

null distribution for main effects A

A

means are the same across all levels of factor A

125
Q

null distribution for main effects B

A

means are the same across all levels of factor B

126
Q

null distribution for interaction

A

the cell means are additive

127
Q

4 sources of data variation for f-tests for two-factor ANOVA

A
  1. group variation factor A
  2. group variation factor B
  3. AB interaction
  4. residual variation
128
Q

group variation factor A

A

variation among the means of the levels for factor A
- total group variation/degrees of freedom for a

129
Q

group variation factor B

A

variation among the means of the levels for factor B
- total group variation/degrees of freedom for b

130
Q

AB interaction

A

amount of variation attributable to deviation from additivity
- total variation of cell deviations from additivity/degrees of freedom ab

131
Q

residual variation

A

variation among SU within a cell
- total residual variation/degrees of freedom (ab(n-1))

132
Q

f-tests for main effects A

A

factor A group variation relative to residual variation
- MsA/MsE

133
Q

f-tests for main effects B

A

factor B group variation relative to residual variation
- MsB/MsE

134
Q

f-tests for interactions

A

amount of variation attributable to deviations from additivity relative to residual variation
- MsAB/MsE

135
Q

statistical decision for two-way anova tests

A

same as before
- compare observed f-score to critical f-score or p and a

136
Q

scientific conclusions for two-way anova tests (interaction)

A
  1. reject null if there is evidence that at least one cell deviates from additivity
  2. fail to reject null if no evidence that at least one cell deviates from additivity
137
Q

scientific conclusions for two-way anova tests (main effects)

A
  1. reject null if there is evidence that means of at least two levels are different in the factor
  2. fail to reject null if there is no evidence that means among levels are different in the factor
    - only evaluate main effects if conclusion is to reject the null