KNPE 251 2/2 Flashcards
Alternative Hypothesis
a statement or position that is the positive viewpoint of your research question
directionality
whether there is direction in the hypothesis
null hypothesis
a statement or position that is the skeptical viewpoint of your research question
Alpha (type I error)
probability of rejecting the null hypothesis when it is true
p-value
is the probability of seeing your data or something more extreme under the null hypothesis
statistical significance
is the conclusion that a set of data are unlikely to come from the null hypothesis
false negative
test results are negative when they should be positive
false positive
test results are positive when they should be negative
Type II error
probability of failing to reject the null hypothesis when it is false
Single-sample T test
a statistical test that compares a sample mean of a numerical variable to a reference value. The null distribution is a t-distribution
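A minimal sketch of the t-score behind a single-sample t test, assuming the usual formula (sample mean minus reference value, divided by the standard error). The function name `one_sample_t` and the data are made up for illustration.

```python
import math
import statistics

def one_sample_t(sample, reference):
    """Return the t-score and degrees of freedom for a single-sample t test."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_sd = statistics.stdev(sample)        # sample SD, n - 1 denominator
    standard_error = sample_sd / math.sqrt(n)
    t_score = (sample_mean - reference) / standard_error
    return t_score, n - 1                       # degrees of freedom = n - 1

# Hypothetical data: four measurements compared to a reference value of 3
t_score, df = one_sample_t([2, 4, 6, 8], reference=3)
print(round(t_score, 3), df)
```

The t-score is then compared against a t-distribution with `df` degrees of freedom to get the p-value.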
Paired measurements
are two measurements taken from the same sampling unit
paired-sample t test
is a statistical test that compares the difference between paired measurements of a numerical variable to a reference value. The null distribution is a t-distribution
two-sample t-test
is a statistical test that evaluates whether the mean of a numerical variable for one group is different from the mean of another group. The null distribution is a t-distribution
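A rough sketch of a two-sample t-score. The card does not say which variant is meant, so this assumes the pooled-variance (Student's) form, which treats both groups as having equal variance. The function name `two_sample_t` and the data are illustrative only.

```python
import math
import statistics

def two_sample_t(group_a, group_b):
    """Return the pooled-variance t-score and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Assumption: equal variances, so the two sample variances are pooled
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df

# Hypothetical data for two groups
t_score, df = two_sample_t([2, 4, 6], [5, 7, 9])
print(round(t_score, 3), df)
```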
expected contingency table
table of expected frequencies under the null hypothesis
independence in contingency tables
refers to the cells in the table having equal relative proportions across the levels of each variable independently
interaction in contingency tables
refers to the cells in the table not having equal relative proportions across levels of each variable
chi-squared distribution
the distribution of chi-squared scores expected from repeatedly sampling a statistical population where the null hypothesis is true. It is the null distribution for hypothesis testing with categorical data
chi-squared test
is a hypothesis test used with categorical data
chi-squared score
is the measure of the distance between two contingency tables. If the contingency tables are an observed and an expected table, then it measures the distance between the sample data and the null hypothesis
X^2
the measure of difference between two contingency tables
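The contingency-table cards above can be tied together with a small sketch: build the expected table under independence (row total times column total, divided by the grand total), then sum (observed - expected)^2 / expected over all cells. The function names and the counts are hypothetical.

```python
def expected_table(observed):
    """Expected frequencies under the null hypothesis of independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_squared_score(observed, expected):
    """Sum of (observed - expected)^2 / expected across every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical 2 x 2 table of counts
observed = [[10, 20], [30, 40]]
expected = expected_table(observed)
score = chi_squared_score(observed, expected)
print(expected, round(score, 4))
```

A score of zero means the observed table matches the expected table exactly; larger scores mean the data sit further from the null hypothesis.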
association
is a pattern whereby one variable increases (or decreases) with a change in another variable. There is no implied causation between the variables
Bivariate normal distribution
is a normal distribution for two numerical variables that can be used to describe a statistical population where there is an association between the variables. Often used to describe the set up for correlation tests
correlation coefficient
is a measure of association between two numerical variables.
correlation test
the statistical test used to evaluate a sample correlation coefficient against a null hypothesis
ρ = -1 (correlation coefficient)
perfect negative association
ρ = 0 (correlation coefficient)
no association
ρ = 1 (correlation coefficient)
perfect positive association
Pearson's correlation coefficient
is a measure of the strength and direction of the linear association between two numerical variables
r
the symbol for the sample correlation coefficient
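A minimal sketch of how a sample Pearson correlation coefficient could be computed by hand, assuming the standard formula (sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations). The function name `pearson_r` and the data are illustrative.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient for paired numerical data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical paired measurements
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))
```

Feeding in `y = [2 * v for v in x]` gives a coefficient of 1 (perfect positive association), and `y = [-v for v in x]` gives -1 (perfect negative association).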
dependent variable
response variable
independent variable
predictor variable
intercept
value of the response variable when the predictor variable is zero
Linear regression
statistical test used to evaluate whether changes in one numerical variable can predict changes in another numerical variable
link function
one of three parts of linear regression; connects the systematic component to the random component
predictor variable
numerical variable used to predict the response variable
random component
one of three parts of linear regression; describes the probability distribution for sampling error
residual
the difference between the observed data point and the predicted value
residual variance
average squared residual value across all data points
response variable
numerical variable predicted by the predictor variable
slope
the parameter in a linear regression that describes the amount that the response variable increases (or decreases) for every unit change in the predictor variable
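The intercept, slope, residual, and residual variance cards can be illustrated with a short least-squares sketch. It assumes ordinary least squares with one predictor, and it follows the card's definition of residual variance (the mean of the squared residuals); many texts divide by n - 2 instead. Names and data are made up.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    intercept = mean_y - slope * mean_x   # response value when predictor is 0
    return intercept, slope

# Hypothetical predictor (x) and response (y) values
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = fit_line(x, y)

# Residual = observed value minus predicted value
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

# Assumption: residual variance as the average squared residual (per the card)
residual_variance = sum(res ** 2 for res in residuals) / len(residuals)
print(round(intercept, 2), round(slope, 2), round(residual_variance, 2))
```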
statistical model
a mathematical model that incorporates both the relationship among variables and how the data are generated
sum of squares
the sum of the squared residuals across all data points in a linear regression; averaging it across the data points gives the residual variance
systematic component
one of three parts of a linear regression; describes the mathematical relationship that connects the predictor variable and the response variable
heteroscedasticity
term used to describe residual patterns that are not homoscedastic
homoscedasticity
an assumption of a linear regression stating that the residuals have equal variance across the predictor variable
independence (assumption)
an assumption of linear regression stating that the residuals are sequentially independent from each other
Linearity (assumption)
an assumption of linear regression stating that the response variable is a linear function of the predictor variable
Normality (assumption)
an assumption of linear regression stating that the residuals are normally distributed
Shapiro-Wilk test
statistical test to quantitatively evaluate the assumption that the residuals are normally distributed
F
used in the F-test to quantify the ratio of two variances
F-score
used in the F-test to quantify the ratio of two variances
F-test
statistical hypothesis test used to evaluate whether the variances of two groups are different
analysis of variance (ANOVA)
common name given to statistical tests based on the F-distribution
group variation
the variation between the group means and the overall grand mean
residual variation
the variation between the sampling units and the group means
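A sketch of how group variation and residual variation could combine into an F-score for a single-factor ANOVA, assuming the usual mean-square ratio (each sum of squares divided by its degrees of freedom). The function name `one_way_f` and the data are hypothetical.

```python
def one_way_f(groups):
    """F-score: group mean square divided by residual mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Group variation: group means versus the overall grand mean
    ss_group = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Residual variation: sampling units versus their own group mean
    ss_residual = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_group = len(groups) - 1
    df_residual = len(all_values) - len(groups)
    return (ss_group / df_group) / (ss_residual / df_residual)

# Hypothetical data: three groups of three sampling units each
f_score = one_way_f([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(f_score)
```

A large F-score means the group means differ by more than the within-group scatter would suggest.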
contrast statement
A test of the difference in means between two groups in an ANOVA
Family of contrasts
the set of all contrast statements used for a set of data
family-wise error rate
the type I error rate for the family of contrasts
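To see why the family-wise error rate matters, here is a small sketch that assumes the contrasts are independent, in which case the rate is 1 - (1 - alpha)^m for m contrasts. Independence is a simplifying assumption (pairwise contrasts on the same data are not truly independent), and the function names are illustrative.

```python
import math

def number_of_pairwise_contrasts(n_groups):
    """How many two-group comparisons exist among n_groups groups."""
    return math.comb(n_groups, 2)

def family_wise_error_rate(alpha, n_contrasts):
    """Chance of at least one type I error, assuming independent contrasts."""
    return 1 - (1 - alpha) ** n_contrasts

# Hypothetical setup: 4 groups, every pair compared at alpha = 0.05
m = number_of_pairwise_contrasts(4)
fwer = family_wise_error_rate(0.05, m)
print(m, round(fwer, 3))
```

Even with only a handful of groups, the chance of at least one false positive climbs well above the per-test alpha.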
post hoc tests
secondary tests used to evaluate which groups have different means in an ANOVA
Tukey HSD test
A type of post hoc test that evaluates all possible contrast statements
Additivity
when the response to the combination of two levels is simply the sum of the two
cell
the group of sampling units that corresponds to the joint level of two categorical variables
interaction
when the response to the combination of two levels is not the simple sum of the two
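Additivity and interaction can be checked on a 2 x 2 table of cell means with a simple difference of differences. This is only a sketch: the function name `interaction_contrast` and the cell means are made up, and it ignores sampling error entirely.

```python
def interaction_contrast(cell_means):
    """Difference of differences for a 2 x 2 table of cell means.

    Rows are the levels of factor A, columns are the levels of factor B.
    Zero means the effects are additive; non-zero suggests an interaction.
    """
    (a1b1, a1b2), (a2b1, a2b2) = cell_means
    effect_of_b_at_a1 = a1b2 - a1b1
    effect_of_b_at_a2 = a2b2 - a2b1
    return effect_of_b_at_a2 - effect_of_b_at_a1

# Hypothetical cell means
additive = [[10, 12], [15, 17]]       # B adds 2 at both levels of A
interacting = [[10, 12], [15, 22]]    # B adds 2 at A1 but 7 at A2
print(interaction_contrast(additive), interaction_contrast(interacting))
```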
interaction plot
a specialized plot that highlights the interaction pattern between two categorical variables
Main effects
another name for a categorical variable in a two-factor ANOVA
two-factor analysis of variance
(two-factor ANOVA) statistical test used to evaluate the change in a numerical variable across two categorical variables
Sources of Variation
are the partitioning of the data set by factor means, interactions and residuals that form the basis of F-tests
If the p value is greater than or equal to alpha we…
fail to reject null hypothesis
if p value is less than alpha we…
reject null hypothesis
tradeoff between error rates type I and II
when one increases, the other decreases; they are inversely related