Statistics Flashcards
Cluster sampling
Cluster sampling involves selecting units or groups of individuals from the population (e.g., schools, hospitals, clinics).
It contrasts with simple random sampling and stratified random sampling, which involve selecting individuals from the population.
Probability Sampling
When using probability sampling, each element in the target population has a known chance of being selected for inclusion in the sample.
Methods of probability sampling include:
- simple random sampling,
- stratified random sampling, and
- cluster sampling.
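The three methods listed above can be sketched with Python's standard-library `random` module. This is only an illustration: the population, the school grouping, and the function names are hypothetical, and the cluster version shown is a simple one-stage design in which every member of a selected cluster is included.

```python
import random

# Hypothetical population: 12 students nested in 4 schools (3 per school).
population = [(f"school_{s}", f"student_{s}_{i}") for s in range(4) for i in range(3)]

def simple_random_sample(units, n, rng):
    # Individuals are drawn directly from the whole population.
    return rng.sample(units, n)

def stratified_random_sample(units, n_per_stratum, rng):
    # Individuals are drawn at random from within every stratum (here, school).
    strata = {}
    for school, student in units:
        strata.setdefault(school, []).append((school, student))
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(units, n_clusters, rng):
    # Whole groups (schools) are selected; all of their members are included.
    clusters = sorted({school for school, _ in units})
    chosen = set(rng.sample(clusters, n_clusters))
    return [u for u in units if u[0] in chosen]

rng = random.Random(0)
srs = simple_random_sample(population, 4, rng)
stratified = stratified_random_sample(population, 1, rng)
clustered = cluster_sample(population, 2, rng)
```

Note that the stratified sample always contains someone from every school, while the cluster sample contains everyone from only some schools.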
Non-Parametric Tests
Nonparametric tests are inferential statistical tests used to analyze nominal or ordinal data (or interval or ratio data when the assumptions for a parametric test have not been met). They include:
- chi-square test
- Mann-Whitney U test
- Wilcoxon matched-pairs test
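As one example of a nonparametric test, the chi-square statistic for a table of frequencies can be computed by hand. The sketch below assumes a test of independence on a two-way table; the counts and the function name are hypothetical, and it returns only the statistic and degrees of freedom, not a p-value.

```python
def chi_square_independence(table):
    # table: list of rows of observed frequencies (nominal data).
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows are two groups, columns are two response categories.
observed = [[10, 20], [20, 10]]
chi2, df = chi_square_independence(observed)
```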
Benefits of Parametric Tests
An advantage of the parametric tests is that they are more “powerful” than the nonparametric tests.
They include the Student’s t-test and the analysis of variance.
Parametric Tests
Parametric tests are inferential statistical tests that are used when the data to be analyzed represent an interval or ratio scale and when certain assumptions about the population distribution(s) have been met - i.e., when scores on the variable of interest are normally distributed and when there is homoscedasticity (population variances are equal).
Normal Curve/Areas Under The Normal Curve
In a normal distribution,
- about 68% of observations fall between the scores that are plus and minus one standard deviation from the mean,
- about 95% between the scores that are plus and minus two standard deviations from the mean, and
- about 99.7% between the scores that are plus and minus three standard deviations from the mean.
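These areas can be checked with `statistics.NormalDist` from the Python standard library. The helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_within(k):
    # Proportion of observations within k standard deviations of the mean.
    return z.cdf(k) - z.cdf(-k)

areas = {k: area_within(k) for k in (1, 2, 3)}
# Roughly 0.683, 0.954, and 0.997 for k = 1, 2, 3.
```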
Experimentwise Error Rate
The experimentwise error rate (also known as the familywise error rate) is the probability of making at least one Type I error across all of the statistical comparisons in a study. (A Type I error is rejecting the null hypothesis when it is actually true, i.e., claiming an “effect” when there is no effect.)
As the number of statistical comparisons in a study increases, the experimentwise error rate increases.
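A rough sketch of how quickly the rate grows, assuming the comparisons are independent and each is tested at the same alpha (an assumption that real studies often violate). The function name is hypothetical.

```python
def familywise_error_rate(alpha, n_comparisons):
    # Probability of at least one Type I error across independent comparisons.
    return 1 - (1 - alpha) ** n_comparisons

rates = {c: familywise_error_rate(0.05, c) for c in (1, 3, 10)}
# With alpha = .05: about .05 for 1 comparison, .14 for 3, and .40 for 10.
```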
Mixed (Split Plot) ANOVA
The mixed ANOVA is a type of factorial ANOVA that is used when a study includes at least one between-groups independent variable and at least one within-subjects independent variable.
Cross-Validation/Shrinkage
Cross-validation refers to validating a correlation coefficient (e.g., a criterion-related validity coefficient) on a new sample. Because the same chance factors operating in the original sample are not operating in the subsequent sample, the correlation coefficient tends to “shrink” on cross-validation. In terms of the multiple correlation coefficient (R), shrinkage is greatest when the original sample is small and the number of predictors is large.
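The effect of sample size and number of predictors can be illustrated with the standard adjusted R-squared formula, which is a formula-based estimate of shrinkage and not cross-validation itself. The function name and the example values are hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    # n = sample size, k = number of predictors.
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

small_sample = adjusted_r_squared(0.50, n=20, k=5)    # about .32
large_sample = adjusted_r_squared(0.50, n=200, k=5)   # about .49
many_predictors = adjusted_r_squared(0.50, n=20, k=10)
```

With the same obtained R-squared of .50, the estimate drops much more when the sample is small or the number of predictors is large.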
One-Way ANOVA F Ratio
The one-way ANOVA yields an F-ratio that indicates whether any group means are significantly different. The F-ratio represents a measure of treatment effects plus error divided by a measure of error only (MSB/MSW). When the treatment has had an effect, the F-ratio tends to be larger than 1.0.
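The MSB/MSW ratio can be computed by hand for a small data set. The scores below are hypothetical, and the sketch returns only the F-ratio, not its significance.

```python
from statistics import mean

def one_way_f_ratio(groups):
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Between-groups sum of squares: treatment effects plus error.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: error only.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    msb = ss_between / df_between
    msw = ss_within / df_within
    return msb / msw

groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]  # hypothetical scores for three groups
f_ratio = one_way_f_ratio(groups)  # MSB = 21, MSW = 1, so F = 21
```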
One-Way ANOVA
The one-way ANOVA is a parametric statistical test used to compare the means of two or more groups when a study includes one IV and one DV that is measured on an interval or ratio scale.
Trend Analysis
Trend analysis is a type of analysis of variance that is used to assess linear and nonlinear trends when the independent variable is quantitative.
Sampling Distribution
How is it Used
The sampling distribution is used in inferential statistics to determine how likely it is to obtain a particular sample mean given
- the population mean,
- the population standard deviation,
- the sample size, and
- the level of significance.
Standard Error of the Mean
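The standard error of the mean is the standard deviation of the sampling distribution of the mean. It is equal to the population standard deviation divided by the square root of the sample size.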
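A quick worked example, using hypothetical values for the population standard deviation and the sample sizes:

```python
from math import sqrt

def standard_error_of_mean(population_sd, n):
    return population_sd / sqrt(n)

se_n25 = standard_error_of_mean(15, 25)    # 15 / 5 = 3.0
se_n100 = standard_error_of_mean(15, 100)  # 15 / 10 = 1.5
```

Quadrupling the sample size cuts the standard error in half.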
Sampling Distribution
Shape and Mean
- The sampling distribution is normally shaped (approximately so when samples are sufficiently large).
- Its mean is equal to the population mean.
Sampling Distribution of the Mean
Definition
The sampling distribution of the mean is the distribution of sample means that would be obtained if an infinite number of equal-size samples were randomly selected from the population and the mean for each sample was calculated.
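The definition can be approximated by simulation. The sketch below draws 2,000 samples rather than an infinite number, assumes a normally distributed population, and uses hypothetical parameter values.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)
MU, SIGMA, N = 100, 15, 25   # hypothetical population mean, SD, and sample size
N_SAMPLES = 2000             # finite stand-in for "an infinite number"

# Draw many equal-size random samples and record each sample's mean.
sample_means = [
    mean(rng.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(N_SAMPLES)
]

mean_of_means = mean(sample_means)   # should land close to MU
sd_of_means = stdev(sample_means)    # should land close to SIGMA / sqrt(N)
expected_se = SIGMA / N ** 0.5
```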
Dependent Variables
The dependent variable (DV) is the variable that is believed to be affected by the independent variable and is observed and measured.
Independent Variables
The independent variable (IV) is the variable that is believed to have an effect on the dependent variable and is varied or manipulated by the researcher in an experimental research study.
Each independent variable in a study must have at least two levels.
Scales Of Measurement
- nominal
- ordinal
- interval
- ratio
A nominal scale yields “frequency data” (the frequency of observations in each nominal category). Ordinal, interval, and ratio scales provide scale values or scores.
Negatively Skewed Distribution
In a negatively skewed distribution, the majority of scores are on the high side of the distribution, but a few are on the low (negative) side. The mode is greater than the median, which is greater than the mean.
Positively Skewed Distribution
In a positively skewed distribution, most scores are on the low side of the distribution, but a few scores are on the high (positive) side. The mean is greater than the median, which, in turn, is greater than the mode.
Skewed Distributions
Skewed distributions are asymmetrical distributions in which the majority of scores are located on one side of the distribution.
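The ordering of the mean, median, and mode in each kind of skew can be seen in two small data sets. The scores are hypothetical and were chosen so the pattern is easy to follow.

```python
from statistics import mean, median, mode

# Most scores are low, with a few high scores pulling the mean upward.
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]
# Mirror image: most scores are high, with a few low scores pulling the mean down.
negative_skew = [20 - x for x in positive_skew]

pos = (mean(positive_skew), median(positive_skew), mode(positive_skew))
neg = (mean(negative_skew), median(negative_skew), mode(negative_skew))
# pos: mean 5 > median 3 > mode 2
# neg: mean 15 < median 17 < mode 18
```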
Random Assignment
Random assignment involves randomly assigning subjects to treatment groups and is sometimes referred to as “randomization.”
It is considered the “hallmark” of true experimental research because it enables an investigator to conclude that any observed effect of an IV on the DV is due to the IV rather than to systematic error (e.g., pre-existing differences between the groups).
(Random assignment must not be confused with random selection, which refers to randomly selecting subjects from the population.)
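A minimal sketch of random assignment, assuming the subjects have already been selected. The subject labels, group names, and function name are hypothetical.

```python
import random

def randomly_assign(subjects, n_groups, rng):
    # Shuffle a copy, then deal subjects out to the groups in turn.
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]

subjects = [f"S{i}" for i in range(1, 21)]  # 20 already-selected subjects
treatment, control = randomly_assign(subjects, 2, random.Random(1))
```

Random selection, by contrast, would be the earlier step of drawing those 20 subjects from the population.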
Mode
The mode is the most frequently occurring score or category, and it is used as a measure of central tendency for nominal variables or variables that are being treated as nominal variables.
Median
The median is the middle score in a distribution when scores have been ordered from lowest to highest. It is used with ordinal data (and with interval and ratio data when the distribution is skewed or contains one or a few outliers).
Mean
The mean is the arithmetic average of a set of scores, and it can be used when scores represent an interval or ratio scale.
Measures of Central Tendency
The mean, median, and mode are the most commonly used measures of central tendency.
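The standard-library `statistics` module computes all three measures. The data below are hypothetical and are meant to show why the mode suits nominal categories and why the median is preferred when there is an outlier.

```python
from statistics import mean, median, mode

# Nominal data: only the mode is meaningful.
eye_colors = ["brown", "blue", "brown", "green", "brown"]
most_common = mode(eye_colors)

# Ratio data with one outlier: the mean is pulled toward the outlier.
incomes = [30, 32, 35, 38, 400]   # in thousands
income_mean = mean(incomes)       # 107
income_median = median(incomes)   # 35
```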
Size of Rejection Region is defined by
alpha
Retention Region
The retention region is the region of a sampling distribution that contains the values that are likely to be obtained simply as the result of sampling error. When an inferential statistical test indicates that an obtained sample value is in the retention region, the null hypothesis is retained and the alternative hypothesis is rejected.
The size of the retention region is equal to one minus alpha.
Rejection Region
The rejection region of a sampling distribution contains the sample values (e.g., means) that are unlikely to be obtained simply as the result of sampling error. When an inferential statistical test indicates that the obtained sample value falls in the rejection region, the null hypothesis is rejected and the alternative hypothesis is retained.
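The boundary between the two regions can be illustrated for one specific case. The sketch assumes a two-tailed z test, so alpha is split between the two tails; the function names are hypothetical.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha):
    # Value that cuts off alpha / 2 in each tail of the standard normal curve.
    return NormalDist().inv_cdf(1 - alpha / 2)

def decision(z_obtained, alpha=0.05):
    critical = two_tailed_critical_z(alpha)
    if abs(z_obtained) > critical:
        return "reject null"   # obtained value is in the rejection region
    return "retain null"       # obtained value is in the retention region

critical_05 = two_tailed_critical_z(0.05)  # about 1.96
```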
Statistical Power
Statistical power refers to the probability of rejecting a false null hypothesis.
Power cannot be directly controlled but can be increased by
- using a large sample,
- maximizing the effects of the IV,
- increasing the size of alpha, and
- reducing error.
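Each of the four factors can be seen in a power calculation for one simple case. The sketch assumes a one-tailed, one-sample z test with a known population standard deviation; the function name and all numeric values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu_null, mu_true, sigma, n, alpha):
    # Power of an upper-tailed one-sample z test when the true mean is mu_true.
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)
    shift = (mu_true - mu_null) / (sigma / sqrt(n))
    return 1 - z.cdf(critical - shift)

base = z_test_power(100, 105, sigma=15, n=25, alpha=0.05)          # about .51
larger_sample = z_test_power(100, 105, sigma=15, n=100, alpha=0.05)
larger_effect = z_test_power(100, 110, sigma=15, n=25, alpha=0.05)
larger_alpha = z_test_power(100, 105, sigma=15, n=25, alpha=0.10)
less_error = z_test_power(100, 105, sigma=10, n=25, alpha=0.05)
```

Each of the four variations yields higher power than the base case.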
Pretest Sensitization
Pretest sensitization occurs when pretesting affects how subjects react to the treatment.
It is a threat to external validity.
Reactivity
Reactivity occurs when subjects respond differently to a treatment because they know they are participating in a research study.
It is a threat to external validity.
Reducing Multiple Treatment Interference
Counterbalancing can be used to control multiple treatment interference. It involves administering the levels of the IV to different groups of subjects in different orders.
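One way to build the orders is complete counterbalancing, in which every possible order of the levels is used once. The sketch assumes a single IV with three levels; the level labels and group names are hypothetical.

```python
from itertools import permutations

levels = ["A", "B", "C"]  # hypothetical levels of one within-subjects IV

# Complete counterbalancing: every possible order, one per group of subjects.
orders = list(permutations(levels))
group_orders = {f"group_{i + 1}": order for i, order in enumerate(orders)}

# How often each level appears in each ordinal position across the groups.
position_counts = {
    (level, pos): sum(1 for order in orders if order[pos] == level)
    for level in levels
    for pos in range(len(levels))
}
```

Because each level appears equally often in each position, order effects are spread evenly across the levels. With many levels the number of orders grows quickly, so partial schemes are sometimes used instead.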