Statistics Flashcards
What is the difference between inferential and descriptive stats?
Inferential stats help us generalize (make inferences) about a wider population from a smaller sample. Many inferential tests assume that the characteristic being measured is normally distributed in the population and that the sample is randomly selected (a probability sample).
Descriptive stats organize and summarize data using numbers and graphs. They use data summaries (bar graphs, pie charts, etc.), measures of central tendency (mean, median, mode), and measures of variability (range, variance, and standard deviation).
What is the difference between a statistic and a parameter?
Parameter: a number that describes the whole population
Statistic: a number that describes a sample
What are univariate, bivariate, and multivariate analysis?
Uni-variate: analysis of one variable (diagrams)
Bi-variate: analysis of two variables, explores relationship between them and searches for correlations
Multi-variate: analysis of three or more variables; checks whether a relationship between two variables is spurious or works through an intervening variable
What are the types of multivariate analysis?
Common factor analysis and principal component analysis
What are the bell curve, platykurtic, leptokurtic, and skew?
Bell curve/normal curve: what usually occurs; the mean, median, and mode end up in the middle at about the same place, scores cluster at the middle
Platykurtic: flat relative to the normal curve
Leptokurtic: tall relative to the normal curve
Skew: where the tail of the distribution is; negative skew is when the tail is in the low numbers (down to the left), positive skew is when the tail is in the high numbers (down to the right)
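A quick way to check the direction of skew is to compute a skewness number and look at its sign. The sketch below uses the moment-based formula (third central moment divided by the variance raised to the power 1.5); the function name and the two score lists are made up for illustration.

```python
def skewness(values):
    """Moment-based skewness: positive = tail in the high numbers, negative = tail in the low numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

right_tail = [1, 1, 2, 2, 3, 10]  # hypothetical scores, one very high value
left_tail = [1, 8, 9, 9, 10, 10]  # hypothetical scores, one very low value

print(skewness(right_tail) > 0)  # True: positive skew
print(skewness(left_tail) < 0)   # True: negative skew
```

A perfectly symmetric set of scores gives a skewness of zero.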
What is the central limit theorem?
The bigger the sample, the better. As the sample size grows, the distribution of sample means will, in many situations, approach a normal curve, even when the population itself is not normal.
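The central limit theorem can be seen in a small simulation. This sketch draws repeated samples from a strongly skewed population (an exponential distribution with mean 1, chosen here only as an example) and collects the sample means; the function name, sample sizes, and seed are assumptions for illustration.

```python
import random
import statistics

def sample_means(sample_size, repeats=2000, seed=0):
    """Draw `repeats` samples of `sample_size` values and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(repeats)
    ]

small = sample_means(2)   # means of tiny samples: still skewed and widely spread
large = sample_means(50)  # means of larger samples: tighter and more bell shaped

print(statistics.fmean(large))                                # close to the population mean of 1
print(statistics.stdev(large) < statistics.stdev(small))      # True: bigger samples vary less
```

Plotting a histogram of `large` next to `small` would show the larger samples clustering symmetrically around the population mean.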
What is the normal curve? What percentages are important?
About 68.3% of its values fall within plus or minus one SD of the mean
About 95% of its values fall within plus or minus two SD of the mean
About 99.7% of its values fall within plus or minus three SD of the mean
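These percentages can be reproduced from the normal curve itself. A minimal sketch, assuming only Python's standard `math.erf`; the function name is made up for illustration. Note that two SD comes out at about 95.4%, which is usually rounded to 95%.

```python
import math

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```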
What is there to note about missing data/anomalous data?
If a case's data is completely missing, don’t use it. If data is only partially missing, it's the researcher's judgment call.
Anomalous data is data that appears “out of line”; such discrepancies may indicate measurement problems
What are the mean, median, and mode?
Mean: sum all values in the distribution, then divide by the total number of values
Median: the middle value when all values are put in order
Mode: Most frequently occurring value
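The three measures can be checked with Python's built-in `statistics` module. The scores below are made up for illustration.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical scores, already in order

mean = statistics.mean(scores)      # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 5
median = statistics.median(scores)  # even count, so average the two middle values: (3 + 5) / 2 = 4
mode = statistics.mode(scores)      # 3 occurs most often

print(mean, median, mode)
```

Here the mean is higher than the median because the single high score of 10 pulls it upward.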
What are the measurements of dispersion?
Standard deviation: the typical amount of variation around the mean (the square root of the variance)
Maximum/minimum
Range: highest value minus lowest value
Interquartile range: the range of the middle 50% of values, calculated after the highest and lowest 25% are removed
Variance: a measure of how much the values in a data set vary, calculated as the average squared deviation from the mean
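A short sketch of these measures using the `statistics` module. The scores are made up, and the population versions (`pvariance`, `pstdev`) are used here as an assumption; for a sample, `variance` and `stdev` divide by n - 1 instead. Quartile cut points also differ slightly between calculation methods, so the interquartile range is shown without a fixed expected value.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores with a mean of 5

value_range = max(scores) - min(scores)  # 9 - 2 = 7
variance = statistics.pvariance(scores)  # average squared deviation from the mean = 4
std_dev = statistics.pstdev(scores)      # square root of the variance = 2

# Cut the ordered data into four equal parts; q1 and q3 bound the middle 50%.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print(value_range, variance, std_dev, iqr)
```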
What is a z score?
A z score is the number of standard deviations a value lies above or below the mean
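As a formula, this is the value minus the mean, divided by the standard deviation. A minimal sketch with made-up numbers (the function name is hypothetical):

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

print(z_score(85, 70, 10))  # 1.5: the score is one and a half SD above the mean
print(z_score(60, 70, 10))  # -1.0: the score is one SD below the mean
```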
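What are the differences between ANOVA, the t test, and correlation?
T test: compares the means of two groups
Correlation: measures the strength and direction of the relationship between two variables; which coefficient to use depends on the level of data (nominal, ordinal, interval, or ratio)
ANOVA (analysis of variance): compares the means of three or more groups by comparing the variance between groups with the variance within groups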
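To make the ANOVA idea concrete, here is a sketch of a one-way ANOVA calculation done by hand: the variance between group means is divided by the variance within groups. The function name and the three groups are made up for illustration.

```python
def one_way_f(groups):
    """Between-group mean square divided by within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # How far each group's mean sits from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # How far each value sits from its own group's mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical scores for three groups
f = one_way_f(groups)
print(f)  # about 13: group means differ far more than values differ within groups
```

A large result suggests that at least one group mean differs from the others; it does not say which one.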
What are F, R, and T values?
T value: ratio of the difference between the means of the two sample sets to the variation that exists within them
R value: correlation coefficient that measures the strength of the relationship between two variables
F value: the ratio of two variances
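The r and t values can be computed by hand in a few lines. This is a sketch under stated assumptions: the r value uses the Pearson formula, and the t value uses the unequal-variance (Welch) form, which is only one of several versions of the t test. Function names and data are made up.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Correlation coefficient: -1 (perfect negative) through 0 (none) to +1 (perfect positive)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    co_variation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - my) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)

def t_value(a, b):
    """Difference between two sample means divided by the variation within the samples (Welch form)."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

group_a = [5, 6, 7, 8, 9]  # hypothetical scores, mean 7
group_b = [1, 2, 3, 4, 5]  # hypothetical scores, mean 3

print(t_value(group_a, group_b))                 # 4.0
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))     # about 1: perfect positive relationship
```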
What are P values?
Probability level
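Calculated by converting your test statistic (for example, into a z score) and finding how far out in the tail of its distribution it falls
Used to judge how likely a result at least as extreme as the observed one would be if chance alone were at work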
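For a statistic that has already been converted to a z score, the p value can be read off the normal curve. A minimal sketch, assuming a two-sided test; the function name is made up for illustration.

```python
import math

def two_sided_p_from_z(z):
    """Probability of a z score at least this far from zero, in either direction, by chance alone."""
    return math.erfc(abs(z) / math.sqrt(2))

print(round(two_sided_p_from_z(1.96), 3))  # 0.05
print(two_sided_p_from_z(0))               # 1.0: a result exactly at the mean is not unusual at all
```

A z score of about 1.96 corresponds to the common 0.05 probability level.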
What is the Null hypothesis?
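The null hypothesis states that there is no effect, difference, or relationship. Testing offers probability, not certainty, so the researcher must decide what probability level to accept before rejecting it.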