hypothesis testing chi squared Flashcards
what is a parametric test
test to estimate at least one population
parameter from sample statistics
what is the assumption made w/ parametric tests
variable we have measured in the
sample is normally distributed in the population to
which we plan to generalize our findings
what is a non parametric test
a test that is distribution free,
no assumption on the distribution of the variable in the population
what does the choice of a statistical test depend on (5)
the Level of measurement for the dependent and
independent variables
Number of groups or dependent measures
Number of units of observation
Type of distribution
The population parameter of interest (mean, variance,
differences between means and/or variances)
examples of parametric and non parametric tests

define a normality test
measures a goodness of fit of a normal model to the data
- if the fit is poor, the data are not well modeled by
a normal distribution; this makes no judgment on
any underlying variable.
what is a normality test used for
to determine whether a data set is well modeled by a normal distribution, and to assess how likely it is that a random variable underlying the data set is normally distributed.
graphical methods of normality tests
comparing a histogram of the sample data (the empirical distribution) to a normal probability curve
the histogram should resemble a bell curve
list the tests of univariate normality
D’Agostino’s K-squared test
Jarque–Bera test
Anderson–Darling test
Cramér–von Mises criterion
Lilliefors test
Kolmogorov–Smirnov test
Shapiro–Wilk test
what is the Kolmogorov–Smirnov test
a nonparametric test of the equality of distributions that can be used to compare a sample with a reference distribution
(1-sample K–S test)
or
to compare 2 samples (2-sample K–S test)
- quantifies the distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of the 2 samples
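The distance in the 2-sample case can be sketched as below. This is a minimal illustration that uses only the standard library; the function names and the data are hypothetical, and it returns only the statistic D, not a p-value.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return a function giving the empirical CDF of `sample`."""
    ordered = sorted(sample)
    n = len(ordered)
    # Fraction of observations that are <= x.
    return lambda x: bisect_right(ordered, x) / n

def ks_two_sample_statistic(a, b):
    """Largest vertical gap between the two empirical CDFs."""
    f_a, f_b = ecdf(a), ecdf(b)
    # Both ECDFs are step functions that only change at observed values,
    # so checking the gap at every observed value is sufficient.
    return max(abs(f_a(x) - f_b(x)) for x in list(a) + list(b))

# Hypothetical samples, for illustration only.
d = ks_two_sample_statistic([1.0, 2.0, 3.0, 4.0], [3.5, 4.5, 5.5, 6.5])
```

A value of D near 0 means the two empirical distributions almost coincide; a value near 1 means they barely overlap.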
NULL HYPOTHESIS FOR K-S TEST
the sample is drawn from the reference distribution
(in the 1-sample case)
or
the samples are drawn from the same distribution
(in the 2-sample case).
K-S for testing normality of distributions
- samples are standardized and compared with a standard normal distribution
- this is equivalent to setting the mean and variance of the reference
distribution equal to the sample estimates
- using the sample estimates to define the specific reference
distribution changes the null distribution of the test
statistic
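A rough sketch of the standardize-then-compare step, using only the standard library. The function names and data are hypothetical. It returns only the distance D. Because the mean and standard deviation are estimated from the same sample, ordinary K–S critical values do not apply; the Lilliefors test listed earlier is the usual way to handle this.

```python
import math
from statistics import mean, stdev

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ks_normality_statistic(sample):
    """K-S distance between the standardized sample and N(0, 1)."""
    m, s = mean(sample), stdev(sample)
    z = sorted((x - m) / s for x in sample)
    n = len(z)
    d = 0.0
    for i, value in enumerate(z):
        cdf = normal_cdf(value)
        # The ECDF jumps from i/n to (i+1)/n at this point,
        # so the gap is checked on both sides of the jump.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

# Hypothetical measurements, for illustration only.
data = [2.1, 3.4, 1.9, 2.8, 3.0, 2.5, 2.7, 3.1]
d = ks_normality_statistic(data)
```

Standardizing makes D unchanged by shifting or rescaling the data, which is why the comparison can always be made against the standard normal curve.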
what is the chi-squared test
a test used to check for an association
between 2 categorical variables.
- H0: There is no association between the variables.
- HA: There is an association between the variables
what does it mean if two categorical variables are associated in a chi-squared test
the chance that an individual falls into a particular category for one variable depends upon the particular category they fall into for the other variable.
assumptions for the chi squared test
- A large sample of independent observations
- All expected counts should be ≥ 1 (no zeros)
- At least 80% of expected counts should be ≥ 5
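The two expected-count rules can be checked mechanically. The sketch below assumes the usual expected count under independence (row total × column total / grand total); the function names and the table are hypothetical.

```python
def expected_counts(table):
    """Expected counts under independence for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def check_expected_counts(table):
    """Return (all expected >= 1, at least 80% of expected >= 5)."""
    flat = [e for row in expected_counts(table) for e in row]
    all_at_least_1 = all(e >= 1 for e in flat)
    share_at_least_5 = sum(e >= 5 for e in flat) / len(flat)
    return all_at_least_1, share_at_least_5 >= 0.8

# Hypothetical 2 x 2 table of observed counts.
observed = [[12, 8], [5, 15]]
rules_ok = check_expected_counts(observed)
```

Note that the rules apply to the expected counts, not the observed ones. Independence of the observations cannot be checked from the table itself.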
define the chi-squared statistic
a test statistic that measures the difference between the observed and the expected counts, assuming independence.
- a large chi-squared value leads to rejecting the null hypothesis because the observed counts differ from the expected counts
- the p-value of the chi-squared test is the probability that the chi-squared statistic is as large as or larger than the value we obtained, if H0 is true.
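As a sketch, the statistic is the sum of (O − E)² / E over all cells. The p-value helper below is valid only for 1 degree of freedom (for example, a 2 x 2 table), where the upper tail has a closed form; other tables need a chi-squared distribution routine. The names and data are hypothetical.

```python
import math

def chi_squared_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # count expected under H0
            stat += (obs - exp) ** 2 / exp
    return stat

def p_value_one_df(stat):
    """P(chi-squared >= stat) when there is exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical 2 x 2 table of observed counts.
observed = [[12, 8], [5, 15]]
stat = chi_squared_statistic(observed)
p = p_value_one_df(stat)
```

A small p means a statistic this large would be unusual if the variables were truly independent.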
why is association not causation
observed association between two
variables might be due to the action of
a third, unobserved variable.
limitations of chi squared
- No expected count should be less than 1
- No more than 1/5 of the expected counts should be less than 5
- to fix this:
- collect larger samples
- combine your data for the smaller expected categories until their combined value is 5 or more
what is the Yates correction
- When there is only 1 degree of freedom, the regular chi-squared test should not be used
- Apply the Yates correction by
- subtracting 0.5 from the absolute value of each calculated O-E term,
- then continue as usual with the new corrected values
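The correction can be sketched by following the steps on this card literally: subtract 0.5 from each |O − E| before squaring. The function name and data are hypothetical, and the sketch assumes every |O − E| is at least 0.5.

```python
def yates_chi_squared(table):
    """Chi-squared statistic with 0.5 subtracted from each |O - E|."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            # Assumes |O - E| >= 0.5, as in the hypothetical table below.
            corrected = abs(obs - exp) - 0.5
            stat += corrected ** 2 / exp
    return stat

# Hypothetical 2 x 2 table (1 degree of freedom).
observed = [[12, 8], [5, 15]]
corrected_stat = yates_chi_squared(observed)
```

The corrected statistic is smaller than the uncorrected one, so the test becomes more conservative.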
what is Fisher’s exact test
- computes the exact probability under the null
hypothesis of obtaining the current distribution of frequencies across cells, or one that is more uneven.
- the test is often available only for 2 x 2 tables.
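A minimal sketch for a 2 x 2 table with cells a, b (first row) and c, d (second row). It assumes the common two-sided convention: with the row and column totals held fixed, sum the probabilities of every table that is no more likely than the observed one. The function names and data are hypothetical.

```python
from math import comb

def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2 x 2 table with fixed margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def fisher_exact_two_sided(a, b, c, d):
    """Total probability of tables at least as unlikely as the observed one."""
    p_observed = table_probability(a, b, c, d)
    row1, row2, col1 = a + b, c + d, a + c
    total = 0.0
    # x runs over every value the top-left cell can take given the margins.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        if p <= p_observed * (1 + 1e-9):  # small tolerance for float ties
            total += p
    return total

# Hypothetical 2 x 2 table: rows are groups, columns are outcomes.
p = fisher_exact_two_sided(3, 1, 1, 3)
```

Because every probability is computed exactly, no large-sample assumption about expected counts is needed.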
Mann-Whitney test
- observations from both groups are combined and ranked,
- with the average rank assigned in the case of ties
- If the populations are identical in location, the ranks should be
randomly mixed between the two samples.
null hypothesis of Mann-Whitney = The two sampled populations are equivalent in location (they have the same mean ranks).
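The combine-and-rank step can be sketched as below, with the average rank given to ties. The U statistics use the standard formula U1 = R1 − n1(n1 + 1)/2, where R1 is the rank sum of the first group. The names and data are hypothetical, and no p-value is computed.

```python
from collections import defaultdict

def average_ranks(values):
    """Rank from 1 upward; tied values share the average of their ranks."""
    positions = defaultdict(list)
    for pos, v in enumerate(sorted(values), start=1):
        positions[v].append(pos)
    mean_rank = {v: sum(p) / len(p) for v, p in positions.items()}
    return [mean_rank[v] for v in values]

def mann_whitney_u(x, y):
    """Return (U1, U2) computed from the ranks of the combined sample."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u1 = rank_sum_x - len(x) * (len(x) + 1) / 2
    u2 = len(x) * len(y) - u1
    return u1, u2

# Hypothetical scores for two independent groups.
u1, u2 = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

If the ranks are well mixed between the groups, U1 and U2 are both close to n1 × n2 / 2; a value near 0 means one group holds nearly all the low ranks.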
Kruskal-Wallis test for ordinal data, independent samples
- observations from all groups are combined and ranked,
- the average rank is assigned in the case of ties.
- If the populations are identical in location, the ranks
should be randomly mixed between the K samples.
null hypothesis for Kruskal-Wallis = The K sampled populations are equivalent in location.
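A sketch of the H statistic built from the pooled ranks, using the standard formula H = 12 / (N(N + 1)) × Σ(Rj² / nj) − 3(N + 1). No tie correction is applied to H itself. The names and data are hypothetical.

```python
from collections import defaultdict

def average_ranks(values):
    """Rank from 1 upward; tied values share the average of their ranks."""
    positions = defaultdict(list)
    for pos, v in enumerate(sorted(values), start=1):
        positions[v].append(pos)
    mean_rank = {v: sum(p) / len(p) for v, p in positions.items()}
    return [mean_rank[v] for v in values]

def kruskal_wallis_h(groups):
    """H statistic from the rank sums of K independent groups."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_j for this group
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Hypothetical ordinal scores for K = 3 independent groups.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

H grows as the groups' rank sums move apart, which is evidence against equal location.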
Ordinal data 2 related samples.
Wilcoxon signed rank test
- Two related variables.
- No assumptions about the shape of distributions of the variables.
- Takes into account information about the magnitude of differences within pairs
- gives more weight to pairs that show large differences than to pairs that show small differences.
- Based on the ranks of the absolute values of the
differences between the two variables.
null hypothesis for Wilcoxon signed rank = Two variables have the same distribution.
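The ranking of absolute differences can be sketched as below. It assumes the common convention of dropping pairs with a zero difference, and it returns the rank sums for positive and negative differences rather than a p-value. The names and data are hypothetical.

```python
from collections import defaultdict

def average_ranks(values):
    """Rank from 1 upward; tied values share the average of their ranks."""
    positions = defaultdict(list)
    for pos, v in enumerate(sorted(values), start=1):
        positions[v].append(pos)
    mean_rank = {v: sum(p) / len(p) for v, p in positions.items()}
    return [mean_rank[v] for v in values]

def wilcoxon_signed_rank(x, y):
    """Return (W+, W-): rank sums of positive and negative differences."""
    # Pairs with no difference carry no sign, so they are dropped (assumption).
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Hypothetical paired measurements (e.g. before and after).
w_plus, w_minus = wilcoxon_signed_rank([10, 12, 9, 15], [8, 13, 6, 11])
```

Larger differences receive larger ranks, which is how the test gives them more weight than small differences.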