Final Exam Flashcards

1
Q

null hypothesis (H0)

A

statement that is the skeptical viewpoint of your research question
- no difference

2
Q

4 steps of a hypothesis test

A
  1. define null and alternative hypothesis
  2. establish null distribution
  3. conduct statistical test
  4. draw scientific conclusions
3
Q

null distribution

A

sampling distribution we expect from sampling a statistical population when the null hypothesis is true

4
Q

alternative hypothesis (HA)

A

statement that is the positive viewpoint of your research question
- everything not in null (mutually exclusive)
- there is a difference

5
Q

3 factors to the hypotheses

A
  1. mutually exclusive
  2. they describe all possible outcomes = exhaustive
  3. null always includes the equality statement
6
Q

non-directional hypothesis

A

states only that there should be a difference in the alternative hypothesis (no direction specified)

7
Q

directional hypothesis

A

states that the difference should be in a specific direction (smaller vs. larger)

8
Q

statistical inference

A

conclusion that a set of data are unlikely to come from the null hypothesis

9
Q

statistical decision

A

whether we believe our data came from the null distribution or not
- if it is likely the data came from the null distribution = “fail to reject”
- if it is unlikely the data came from the null distribution = “reject null”

10
Q

2 probabilities for null distribution

A
  1. type 1 error rate
  2. p-value
11
Q

type 1 error rate (alpha)

A

probability of rejecting the null hypothesis when it is true
- set by the researcher without any reference to the data

12
Q

p-value (p)

A

probability of seeing your data, or something more extreme, under the null hypothesis
- area under the curve from the data to more extreme values

13
Q

rules of making statistical decision

A
  1. if p-value is less than type 1 error rate, then we “reject null hypothesis”
  2. if p-value is greater than or equal to type 1 error rate, then we “fail to reject null hypothesis”
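The decision rule on card 13 can be sketched in a few lines of Python (a minimal illustration; the default alpha of 0.05 is just a common convention, not from the cards):

```python
# Statistical decision rule: compare the p-value to the pre-set
# type 1 error rate (alpha). Equality counts as "fail to reject".
def statistical_decision(p_value, alpha=0.05):
    if p_value < alpha:
        return "reject null hypothesis"
    return "fail to reject null hypothesis"
```

Note the strict inequality: a p-value exactly equal to alpha still fails to reject.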
14
Q

what the scientific conclusions consider

A
  1. strength of inference: how strong evidence is
  2. effect size: only consider it when we reject null hypothesis (small = low impact)
15
Q

error rates

A

probability of making a mistake
- type I and II have an inverse relationship (when one increases the other decreases)

16
Q

type II error rates

A

probability of failing to reject null hypothesis when it is false
- area under alternative distribution from data point to something more extreme

17
Q

types of t-tests

A
  1. single-sample t-tests
  2. paired-sample t-tests
  3. two-sample t-tests
18
Q

single-sample t-tests

A

evaluate whether mean of your sample is different from some reference value

ex. is mean test score from a sample of high school students different than national standards

19
Q

paired-sample t-tests

A

evaluate whether mean of paired data is different from some reference value
- looks at changes in a SU

ex. does tutoring improve grade for a student

20
Q

two-sample t-tests

A

evaluate whether the means of two groups are different from each other (compare two groups)

ex. do dogs sleep more than cats

21
Q

mean

A

= m

22
Q

reference value

A

= mu (µ)
- it is given

23
Q

the reporting of a single-sample t-test should include…

A
  1. sample mean and standard deviation
  2. observed t-score
  3. degrees of freedom
  4. p-value
24
Q

observed t-score

A

calculated using the sample mean, standard deviation, sample size, and the reference value

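The t-score formula from card 24 can be made concrete with a pure-stdlib sketch (the sample values in the usage note are invented):

```python
import math

def observed_t(sample, mu):
    # t = (sample mean - reference value) / standard error,
    # where standard error = s / sqrt(n) and s uses n - 1.
    n = len(sample)
    m = sum(sample) / n
    s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
    return (m - mu) / (s / math.sqrt(n))
```

For example, `observed_t([1, 2, 3, 4, 5], 3)` is 0, since the sample mean equals the reference value.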
25
reference value for paired t-tests
typically 0
26
null and alternative hypotheses for paired t-tests
statements about how the difference between the paired measurements relates to the reference value
27
scientific conclusions for paired t-tests
  1. if we reject null = the sample data provide strong evidence that the difference between the paired measurements is different from the reference value
  2. if we fail to reject null = the sample data do not provide strong evidence that the difference between the paired measurements is different from the reference value
28
the reporting of a paired t-test should include...
  1. mean difference between paired measurements, and standard deviation of the differences
  2. observed t-score
  3. degrees of freedom
  4. p-value
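Card 25's point that the paired test uses a reference value of 0 follows from how the statistic is built: a single-sample t-score on the within-SU differences. A minimal sketch with hypothetical before/after data:

```python
import math

def paired_t(before, after, ref=0):
    # Compute within-SU differences, then a single-sample
    # t-score of the differences against the reference value.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    m = sum(diffs) / n
    s = math.sqrt(sum((d - m) ** 2 for d in diffs) / (n - 1))
    return (m - ref) / (s / math.sqrt(n))
```

Degrees of freedom here would be n − 1, where n is the number of pairs.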
29
sample means for two-sample t-test
m1 = sample mean of first group
m2 = sample mean of second group
- can change
30
scientific conclusions for two-sample t-tests
  1. if we reject the null = the sample data provide strong evidence that the means of the two groups are different
  2. if we fail to reject the null = the sample data do not provide strong evidence that the means of the two groups are different
31
the reporting of a two-samples t-test should include...
  1. mean, standard deviation, and sample size for each group
  2. observed t-score
  3. degrees of freedom
  4. p-value
32
expected contingency table
expected frequencies under null hypothesis
33
1-way contingency table
one categorical variable
- is there a difference in counts among the levels of that variable?
- under the null, all counts are distributed equally
34
key features of 1-way contingency table
  1. ECT is always given as counts
  2. sum of all expected counts must equal the sum of all counts in the observed contingency table
  3. ECT can have fractional values
35
2-way expected contingency table
two categorical variables
- looking for an interaction between the variables
- counts are distributed independently among cells

ex. is age independent of year
36
calculating 1 way
calculate marginal distributions as proportions - row and column sums/table total
37
calculating 2 way
product of row and column proportions for each cell x table total
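The 2-way calculation above (row proportion × column proportion × table total, equivalently row sum × column sum / total) can be sketched as:

```python
def expected_table(observed):
    # Expected counts under the null of independence:
    # (row sum * column sum) / table total, for each cell.
    total = sum(sum(row) for row in observed)
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    return [[r * c / total for c in col_sums] for r in row_sums]
```

Note the result can contain fractional values even though the observed table holds counts, matching card 34.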
38
chi-squared score
measure of the distance between the observed and expected contingency tables
39
steps to calculating the chi-squared score
  1. take difference between each observed and expected cell
  2. square the difference
  3. divide by the expected value
  4. sum over all cells in the table
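The four steps above map directly onto a sum over cells (a minimal sketch; the tables in the test are made up):

```python
def chi_squared_score(observed, expected):
    # Sum of (O - E)^2 / E over every cell of the table.
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
```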
40
chi-squared distribution
when you sample an imaginary statistical population where the null hypothesis is true, you would get the distribution of chi-squared scores
41
key features of chi-squared distribution
  1. area under curve sums to one
  2. degrees of freedom determines shape of distribution - different for 1- and 2-way tables
  3. only positive values
42
chi-squared test
used with only categorical data; contains the variation we expect from sampling error
- always directional
43
statistical decision of chi-squared test
  1. reject the null if the observed score is greater than the critical score, or if the p-value is less than the type 1 error rate (α)
  2. fail to reject the null if the observed score is less than or equal to the critical score, or if the p-value is greater than or equal to α
44
what side do p-value and type 1 error always go on in chi-squared tests
the right side
45
scientific conclusions for 1-way tables
  1. reject null and conclude there is evidence to support that the counts are not equal among cells
  2. fail to reject null and conclude that there is no evidence to support that the counts are not equal among cells
46
scientific conclusions for 2-way tables
  1. reject null and conclude there is evidence to support that the variables are not independent of each other
  2. fail to reject null and conclude there is no evidence to support that the variables are not independent of each other
47
the reporting of a chi-squared test should include...
  1. short name of the test (X2)
  2. degrees of freedom
  3. total count in the observed table
  4. observed chi-squared value
  5. p-value
48
factors of a correlation test
  1. no implied causation between variables (one variable doesn't cause another)
  2. both variables are assumed to have variation
  3. is not used for prediction
49
Pearson's correlation coefficient
measures the strength of association between two numerical variables
- r = measured from a sample
- ρ (rho) = population parameter - about the stats pop
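A from-scratch sketch of the sample coefficient r (not the population parameter ρ); the data in the test are invented:

```python
import math

def pearson_r(x, y):
    # Sample Pearson correlation: sum of cross-deviations
    # divided by the product of the deviation magnitudes.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cross = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cross / (sx * sy)
```

Perfectly linear increasing data give r ≈ 1, perfectly linear decreasing data give r ≈ −1, matching card 50.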
50
correlation coefficients
ρ = -1 indicates a perfect negative correlation
ρ = 0 indicates no association
ρ = 1 indicates a perfect positive correlation
51
assumptions behind a correlation tests
  1. each pair of numerical values is measured on the same sampling unit
  2. numerical values come from continuous numerical distributions with non-zero variation
  3. association is assumed to be linear (straight line)
52
null and alternative hypothesis for correlation tests
H0: correlation coefficient is 0
HA: correlation coefficient is not 0
- directional = specifies whether the association is positive or negative
53
null distribution for correlation tests
sampling distribution of correlation coefficients from a statistical population with no association between variables (ρ = 0)
54
statistical decision for correlation tests
same as chi-squared tests
55
scientific conclusions for correlation tests
different depending on direction
- no direction = just based on association
- directional = based on positive or negative association
56
the reporting of a correlation tests should include...
  1. symbol for test (r)
  2. degrees of freedom
  3. observed correlation value
  4. p-value
57
linear regression
used to evaluate whether changes in one numerical variable can predict changes in a second numerical variable
58
linear regressions in experimental studies
prediction reflects a causal relationship between the variables - predictor variable is independent variable, and response is dependent variable
59
linear regressions in observational studies
choice of predictor variable depends on research question
60
two parameters of linear regressions
  1. slope (b)
  2. intercept (a)
61
slope (b)
amount that the response variable (y) increases or decreases for every unit change in the predictor variable (x)
- positive values = increasing relationship
- 0 = no relationship
- negative values = decreasing relationship
62
intercept (a)
value of the response variable (y) when the predictor variable (x) is at 0 (x = 0)
63
3 components of the statistical model for linear regressions
  1. systematic component
  2. random component
  3. link function
64
systematic component of statistical model
describes mathematical function used for predictions (linear equation)
65
random component of statistical model
describes probability distribution for sampling error (normal distribution for response variable)
66
link function of statistical model
connects the systematic component to the random component
67
fitting the statistical model
estimate the intercept and slope that best explain the data
- done by minimizing the residual variance
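For simple linear regression, the least-squares slope and intercept that minimize the residual variance have a closed form; a minimal sketch with invented data in the test:

```python
def fit_line(x, y):
    # Least-squares estimates: slope b from the ratio of the
    # cross-deviation sum to the predictor's squared-deviation sum,
    # intercept a so the line passes through the mean point.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    a = my - b * mx
    return a, b
```

For example, points on the exact line y = 1 + 2x recover intercept 1 and slope 2.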
68
residual (ri)
difference between an observed data point (Yi) and the predicted value (ŷi)
- ri = Yi − ŷi
- squared and summed to give the sum of squares
69
steps to calculating residual variance
  1. calculate residual for each data point
  2. take square of each residual
  3. sum the squared residuals across all data points
  4. divide by degrees of freedom (df = n - 2)
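The four steps above map one-to-one onto code (a sketch; the fitted line a = 0, b = 1 and the data in the test are invented):

```python
def residual_variance(x, y, a, b):
    # 1-2. residual per point, squared; 3. summed;
    # 4. divided by df = n - 2 (two parameters estimated).
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return sum(r ** 2 for r in residuals) / (len(x) - 2)
```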
70
intercept hypothesis
used to answer questions at x=0 - how does intercept (a) relate to a reference value (Ba)
71
slope hypothesis
used to answer questions about how much y changes for a unit change in x - how does slope (b) relate to reference value (Bb)
72
linear regression test
same as chi-squared and correlation tests
- compare observed and critical t-scores, or p-value and α
73
scientific conclusions for intercept hypothesis
  1. reject the null and conclude there is evidence that the predicted response variable is different from the reference (Ba) at x = 0
  2. fail to reject the null and conclude there is no evidence that the predicted response variable is different from the reference (Ba) at x = 0
74
scientific conclusions for slope hypothesis
  1. reject the null and conclude there is evidence that changes in the predictor variable can be used to predict changes in the response variable
  2. fail to reject the null and conclude there is no evidence that changes in the predictor variable can be used to predict changes in the response variable
75
the reporting of linear regression test should include...
  1. symbol for parameter being tested (a or b)
  2. observed parameter value
  3. observed t-score
  4. degrees of freedom
  5. p-value
76
4 main assumptions for linear regressions
  1. linearity
  2. independence
  3. normality
  4. homoscedasticity
77
linearity assumption of linear regressions
response variable should be well described by a linear combination of the predictor variable
- relationship is assumed to be a straight line
- violations of linearity look like a frowny face (curved pattern in the residuals)
78
independence assumption of linear regressions
residuals along the predictor v. should be independent of each other
79
normality assumption of linear regressions
residual variation should be normally distributed
- evaluated by looking at histograms
- if the assumption of normality is met, the histogram will look similar to the reference normal distribution
- if not met, the histogram will have fatter or skinnier tails
80
homoscedasticity assumption of linear regressions
residuals should have the same variance across the range of the predictor variable
81
violations of independence assumption of linear regression
can occur when there is repeated sampling of a SU or when there is a spatial or temporal relationship among SUs
82
violations of normality assumption on linear regression
can occur...
  1. if the stat pop has a skewed or unusual distribution
  2. if your data violate the assumption of linearity
83
Shapiro-Wilk test
evaluates the null hypothesis that the residuals are normally distributed
H0: residuals are normally distributed
HA: residuals are not normally distributed
84
heteroscedasticity
if the residuals have little variation along some parts of the predictor variable and large amounts at others
85
analysis of variance
used to compare variance between two groups
86
f-tests
evaluates difference in variance between two groups - done using ratio of variance
87
ratio of variance (f-score)
asks whether the ratio is different from one
- ratio equals one if both groups have the same variance
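The ratio of variances can be sketched as follows (sample variances with df = n − 1; the example groups in the test are invented):

```python
def f_score(group_a, group_b):
    # Ratio of the two sample variances; equals 1 exactly
    # when both groups have the same variance.
    def sample_var(g):
        m = sum(g) / len(g)
        return sum((x - m) ** 2 for x in g) / (len(g) - 1)
    return sample_var(group_a) / sample_var(group_b)
```

Because variances are never negative, the ratio is never negative either, matching card 89.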
88
null and alternative hypothesis for f-score
evaluate whether the f-score is different from 1
H0: ratio of variances is 1
HA: ratio of variances is not 1
89
null distribution for f-score
sampling distribution from repeatedly sampling a stats pop where the variance was the same in both groups - ratio of variances will never be negative
90
degrees of freedom for f-distribution
dfA = nA - 1 for one group (group A)
dfB = nB - 1 for another group (group B)
91
statistical decision of f-tests
same as the others - comparing observed and critical values, or p-value and α
92
the reporting of an f-test should include...
  1. mean, standard deviation, and sample sizes for each group
  2. observed F-score
  3. degrees of freedom for each group
  4. p-value
93
example of f-test
do different models of electric vehicles (categorical) differ in their operating costs (numerical)
94
single factor ANOVA test
used when there are more than two levels in a categorical variable
95
two sources of variation for ANOVA tests
1. group variation 2. residual variation
96
group variation
variation among means of categorical levels
- if means are same among groups, variation = 0
- if means are different, variation = high
- mean sum of the squares
97
residual variation
variation among sampling units within a categorical level - mean squared error
98
statistical model of anova
compares group variation to residual variation
- if group variation is same as residual, the means are not different
- if group variation is larger than residual, the groups are different
- comparison is done with an F-test
99
how do you calculate differences in means
F-score = group variation divided by residual variation
- increased group variation = increased f-score
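The F-score as group variation (mean squares among groups, dfG = k − 1) over residual variation (mean squared error, dfE = n − k) can be sketched as (groups in the test are invented):

```python
def anova_f(groups):
    # One-way ANOVA F: mean squares among group means
    # divided by the mean squared error within groups.
    all_data = [x for g in groups for x in g]
    n, k = len(all_data), len(groups)
    grand = sum(all_data) / n
    ss_group = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_error = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_group / (k - 1)) / (ss_error / (n - k))
```

Larger spread among group means inflates the numerator, so F grows, matching the card above.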
100
statement of means
H0: the means are the same across all levels of the categorical variable
HA: the means are different somewhere (different between at least two groups)
101
statement of F-tests
evaluate whether group variation is larger than residual variation - null and alternative hypothesis are directional
102
null distribution of anova test
sampling distribution from repeatedly sampling a stats pop where the means are the same across all levels of the categorical variable
- f-distribution with dfG = k - 1 and dfE = n - k
103
statistical decision for anova
same as before - compares critical f-score to observed f-score, or p-value to α
104
scientific conclusions for anova
  1. reject null and conclude there is evidence that at least 2 of the means are different
  2. fail to reject null and conclude there is no evidence that at least 2 of the means are different
105
reporting of an ANOVA should include...
  1. mean, standard deviation, and sample size for each group
  2. observed f-score
  3. degrees of freedom for group and residual variation (dfG and dfE)
  4. p-value
106
ANOVA post-hoc tests
secondary test used to identify which group means are different in an ANOVA
- only used if the ANOVA rejects the null hypothesis
107
contrast statement of post-hoc tests
evaluate whether the means of any two groups are different
- multiple contrast statements
108
family of contrasts
a set of contrast statements
109
family-wise error rate
type I error rate for entire family of contrasts
110
TukeyHSD
type of post-hoc test = honest significant difference
- evaluates all possible combinations of categorical levels and compares means
- controls the family-wise error rate
111
two-factor anova test
considers two categorical factors and focuses on the interaction between them - also evaluates the effect of two categorical factors on a numerical variable
112
3 questions two-factor anova tests answer
  1. main effects A
  2. main effects B
  3. interactions
113
main effects A
differences among the levels of factor A averaging across the levels of factor B
114
main effects B
differences among the levels of factor B averaging across the levels of factor A
115
interactions question
differences among levels of one factor within each level of other factor (cell-by-cell comparison)
116
interaction
deviation from additivity - levels don't add up as expected
117
additivity
when the effects of the levels are their simple sums (adding)
- use interaction plots to visualize
118
interaction plot
- if the categorical variables are additive (no interaction) = the lines will be parallel
- if the categorical variables are not additive (interaction) = the lines are not parallel
119
antagonistic interaction
lines cross
120
synergistic interaction
lines do not cross but are not parallel
121
statement of means for main effect A
H0: means are the same across all levels of factor A
HA: means are different somewhere in at least 2 groups of factor A
122
statement of means for main effect B
H0: means are the same across all levels of factor B
HA: means are different somewhere in at least 2 groups of factor B
123
statement of means for interaction
H0: deviation of each cell relative to additivity is 0
HA: at least one cell has a non-zero deviation from additivity
124
null distribution for main effects A
means are the same across all levels of factor A
125
null distribution for main effects B
means are the same across all levels of factor B
126
null distribution for interaction
the cell means are additive
127
4 sources of data variation for f-tests for two-factor ANOVA
  1. group variation factor A
  2. group variation factor B
  3. AB interaction
  4. residual variation
128
group variation factor A
variation among the means of the levels for factor A - total group variation/degrees of freedom for a
129
group variation factor B
variation among the means of the levels for factor B - total group variation/degrees of freedom for b
130
AB interaction
amount of variation attributable to deviation from additivity - total variation of cell deviations from additivity/degrees of freedom ab
131
residual variation
variation among SU within a cell - total residual variation/degrees of freedom (ab(n-1))
132
f-tests for main effects A
factor A group variation relative to residual variation - MsA/MsE
133
f-tests for main effects B
factor B group variation relative to residual variation - MsB/MsE
134
f-tests for interactions
amount of variation attributable to deviations from additivity relative to residual variation - MsAB/MsE
135
statistical decision for two-way anova tests
same as before - compare observed f-score to critical f-score, or p-value and α
136
scientific conclusions for two-way anova tests (interaction)
  1. reject null if there is evidence that at least one cell deviates from additivity
  2. fail to reject null if there is no evidence that at least one cell deviates from additivity
137
scientific conclusions for two-way anova tests (main effects)
  1. reject null if there is evidence that the means of at least two levels are different in the factor
  2. fail to reject null if there is no evidence that the means among levels are different in the factor
- only evaluate main effects if the conclusion is to reject the null