Assumptions Flashcards
What are the assumptions of multiple regression?
Each subject is randomly sampled from the statistical population or at least the sample is representative of the population
Linearity - Increasing an X variable by one unit changes (increases or decreases) Y by the same amount at all values of X
No interaction among predictor variables - Increasing an X variable by one unit changes (increases or decreases) Y by a certain amount, regardless of the values of the other X variables
Independent observations - Knowing Y for any particular subject provides no information about Y in other subjects
Normal distribution - The distribution of the residuals must be Normal, at least approximately
Homoscedasticity - The SD of the Y values is always the same, regardless of the values of the X variables
What are the assumptions for testing hypotheses using linear regression?
Each subject (or XY pair) was randomly sampled from the population, or at least they are representative of the entire population
Each subject (or XY pair) was selected independently. Picking one subject from the population should not influence the chance of picking anyone else
The relationship between X and Y must be linear in the population
The equation defines a line that extends infinitely in both directions. No matter how high or low a value of X you propose, the equation can estimate (or predict) a Y-value
For each value of the independent variable (X), the distribution of the values of the dependent variable (Y) must be normal
The variance of the distribution of the dependent variable (Y) must be the same at all values of X. In other words, E has the same variance everywhere regardless of the value of X (= homoscedasticity)
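One rough way to look at the equal-variance assumption is to fit the line, compute the residuals, and compare their spread at low and high values of X. The sketch below is only an illustration: the data are made up, the helper names (fit_line, residual_sd_by_half) are hypothetical, and splitting X into two halves is an informal check, not a formal test.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def residual_sd_by_half(xs, ys):
    """SD of the residuals for the lower and upper halves of X."""
    slope, intercept = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))  # order the XY pairs by X
    residuals = [y - (intercept + slope * x) for x, y in pairs]
    half = len(residuals) // 2
    return stdev(residuals[:half]), stdev(residuals[half:])

# Made-up data in which the scatter around the line grows with X.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 9.0, 13.5, 12.0, 18.0]

low_sd, high_sd = residual_sd_by_half(xs, ys)
print(f"residual SD, low X: {low_sd:.2f}; high X: {high_sd:.2f}")
```

If the two SDs differ a lot, as they do for this made-up data set, the homoscedasticity assumption deserves a closer look (for example with a residual plot).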
What are the assumptions of the correlation coefficient?
Subjects are randomly selected from, or at least representative of, the larger population
Paired samples (or bivariate data). Each subject (or experimental unit) must have both X and Y values
Independent observations. Sampling one member of the population should not influence your chances of sampling any other subject
X and Y values must be measured independently
X values were measured and not controlled
Normal distribution. The X and Y values must each be sampled from populations that follow a Normal distribution at least approximately
All covariation must be linear. The correlation coefficient only applies to linear relationships, and hence would not be meaningful if Y increases as X increases up to a certain point, then Y decreases as X increases
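The linearity point can be seen with a small made-up example: when Y rises and then falls as X increases, the correlation coefficient can be close to zero even though Y depends completely on X. The sketch below computes Pearson's r by hand; the function name pearson_r and the data are illustrative assumptions.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
curved = [-(x ** 2) for x in xs]    # Y rises up to X = 0, then falls
straight = [2 * x + 1 for x in xs]  # Y is an exact linear function of X

r_curved = pearson_r(xs, curved)
r_straight = pearson_r(xs, straight)
print(r_curved, r_straight)
```

Here the straight-line data give r = 1, while the curved data give r = 0 despite the perfect (but nonlinear) relationship.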
What are the assumptions for contingency GOF tests?
Samples were randomly collected, or at least representative of the populations
The data must form a contingency table (values must be actual numbers not %s)
The sample size must not be too small (if df = 1, do not use the chi-squared if total # of subjects is s test if paired)
What are the assumptions for a 1-way GOF test?
Subjects are randomly selected from the population, or at least representative of the population
The sample data consist of frequency counts for the k different categories (i.e. not converted to proportions or %)
Each observation must be independent
Probabilities remain constant and independent during the experiment
The sample size must not be too small
GENERAL RULE: multinomial 1-way GOF test - the sample size should be large enough that no expected frequency (E) is <5
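The general rule can be checked before running the test: each expected frequency is E = n x p, where n is the total sample size and p is the hypothesized probability for that category. A minimal sketch, with hypothetical function names and made-up probabilities:

```python
def expected_counts(n, probabilities):
    """Expected frequency E = n * p for each of the k categories."""
    return [n * p for p in probabilities]

def sample_large_enough(n, probabilities, minimum=5):
    """True if no expected frequency falls below the minimum (5 by the rule above)."""
    return all(e >= minimum for e in expected_counts(n, probabilities))

# Made-up hypothesized probabilities for k = 3 categories.
probabilities = [0.5, 0.4, 0.1]

print(sample_large_enough(40, probabilities))  # smallest E is 40 * 0.1 = 4
print(sample_large_enough(60, probabilities))  # smallest E is 60 * 0.1 = 6
```

With 40 subjects the rarest category has an expected frequency below 5, so the rule is not met; with 60 subjects it is.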
What are the assumptions of a 2-way ANOVA test?
The variable has a normal distribution in all the populations
The variable has the same variance in all the populations
Subjects are randomly selected or at least representative of the population
Samples are obtained independently
Subjects within the samples are obtained independently
What are assumptions of 1-way ANOVA?
- The variable has a normal distribution in all the populations
- The variable has the same variance in all the populations (homoscedasticity)
- Subjects in each sample are randomly selected from, or at least representative of, their larger statistical population
- The samples were obtained independently (i.e. they are not paired)
- The subjects within each sample were obtained independently
- The different samples are from populations that are categorized in only one way
What are the assumptions of Paired t-Tests?
Subjects in the paired samples must be randomly selected from, or at least representative of, the larger (statistical) population
Samples must be paired or matched, based on the experimental design (decided before the data are collected)
Each pair must be selected independently of the others
The distribution of the differences between the two populations must approximate a Normal distribution
What are the assumptions of an unpaired t-Test?
Subjects in the samples are randomly selected from, or at least representative of, the larger statistical population
The 2 samples were obtained independently
Subjects within each sample were obtained independently
Data are sampled from populations for which the variable approximates a normal distribution
For the pooled t-test only: The standard deviations (SD) of the 2 populations must be identical
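The population SDs are never known exactly, so in practice the equal-SD assumption is judged from the sample SDs. The sketch below compares the two sample SDs and computes the pooled SD that the pooled t-test relies on. The data are made up and the function names are hypothetical. No cut-off for the ratio is suggested here, since any threshold would be a rule of thumb rather than part of the assumption itself.

```python
from math import sqrt
from statistics import stdev

def sd_ratio(a, b):
    """Ratio of the larger sample SD to the smaller one (1.0 means equal)."""
    sa, sb = stdev(a), stdev(b)
    return max(sa, sb) / min(sa, sb)

def pooled_sd(a, b):
    """Pooled SD: weighted combination of the two sample variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return sqrt(pooled_var)

# Made-up samples with the same mean but very different spread.
group_a = [4.0, 5.0, 6.0, 5.0, 5.0]
group_b = [1.0, 5.0, 9.0, 3.0, 7.0]

print(f"SD ratio: {sd_ratio(group_a, group_b):.2f}")
print(f"pooled SD: {pooled_sd(group_a, group_b):.2f}")
```

A ratio far from 1, as in this example, suggests the pooled SD may not describe either group well.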
What are the assumptions for Wilcoxon Signed-Ranks Test?
Pairs were randomly selected from, or at least representative of, the larger populations
Samples must be paired or matched based on experimental design before the data were collected
Each pair must be selected independently of the others
The population of differences (found from the pairs of data) has a distribution that is approximately symmetric
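The symmetry assumption applies to the differences, not to the original measurements, so the first step is to compute one difference per pair. The sketch below does that and then computes a simple moment-based skewness as a rough indicator (0 for a perfectly symmetric sample). The data and function names are illustrative assumptions, and skewness from a small sample is only a crude guide.

```python
from statistics import mean

def paired_differences(before, after):
    """One difference (after - before) per pair."""
    if len(before) != len(after):
        raise ValueError("every pair needs both values")
    return [a - b for b, a in zip(before, after)]

def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up paired measurements on six subjects.
before = [10, 12, 9, 11, 13, 10]
after = [12, 13, 12, 12, 17, 11]

diffs = paired_differences(before, after)
print(diffs, round(skewness(diffs), 2))
```

For these made-up pairs the differences are skewed to the right (positive skewness), which would be a reason to examine the symmetry assumption more carefully.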
What are the assumptions of the Mann-Whitney U Test?
Subjects are randomly selected from, or at least representative of, the larger populations
The two samples were obtained independently
Subjects within each sample were obtained independently
The populations do not have to follow any particular distribution, but the 2 distributions must have the same shape
What assumptions must be true to interpret the CI of a population mean?
- The subjects in the sample are randomly selected from the population, or at least are representative of the population.
- The variable is distributed in the statistical population in a Normal manner, at least approximately.
- All subjects are from the same (single) population.
- Each subject has been selected independently of the others.
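For reference, the CI these assumptions refer to is usually computed as mean ± t* × SD / sqrt(n). The sketch below is a minimal illustration with made-up data. The critical value 2.776 is the two-sided 95% value of the t distribution for df = 4, and it is passed in by hand because the Python standard library has no t distribution. The function name mean_ci is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci(sample, t_critical):
    """CI for the population mean: mean +/- t* x SD / sqrt(n)."""
    n = len(sample)
    centre = mean(sample)
    margin = t_critical * stdev(sample) / sqrt(n)
    return centre - margin, centre + margin

# Made-up sample of n = 5, so df = n - 1 = 4.
sample = [4.0, 5.0, 6.0, 5.0, 5.0]
low, high = mean_ci(sample, t_critical=2.776)  # 95% CI, df = 4
print(f"95% CI: {low:.2f} to {high:.2f}")
```

The interval is only interpretable as a 95% CI if the four assumptions listed above hold for the sample.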