Statistics Review Flashcards
Descriptive statistics
A set of statistics used to organize and summarize the properties of a set of data.
Data matrix
A grid presenting collected data.
Frequency distribution
A table that gives a visual picture of the observations on a particular variable.
Frequency histogram
A data visualization technique showing how many of the cases in a batch of data scored each possible value, or range of values, on the variable.
Dot plot
A data visualization technique in which every data point for a given variable is represented.
Central tendency
A measure of what value the individual scores tend to center on.
Mode
The value of the most common score; the score that was received by more members of the group than any other.
Bimodal
A distribution with two modes; that is, two values tied as the most common score.
Multimodal
Having more than two modes; that is, more than two values tied as the most common score.
Median
The value at the middlemost score of a distribution of scores; the score that divides a frequency distribution into halves.
Most appropriate measure of central tendency when a set of data has outliers.
Mean
An arithmetic average; a measure of central tendency computed from the sum of all scores in a set of data, divided by the total number of scores.
Most common measure of central tendency (but not always the most appropriate way to measure central tendency).
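Worked example (a minimal sketch using Python's standard `statistics` module; the scores are made up for illustration). It shows how one outlier pulls the mean upward while the mode and median stay put:

```python
import statistics

# Hypothetical scores; 40 is an outlier.
scores = [2, 3, 3, 4, 5, 5, 5, 40]

mode = statistics.mode(scores)      # most common score
median = statistics.median(scores)  # middle of the sorted scores (average of the two middle values here)
mean = statistics.mean(scores)      # sum of all scores divided by the number of scores

# A bimodal batch: two values tie for most common.
modes = statistics.multimode([1, 1, 2, 2, 3])

print(mode, median, mean, modes)
```

With these scores the mean (8.375) is larger than all but one of the scores, which is why the median is usually preferred when outliers are present.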
Variance
A computation that quantifies how spread out the scores of a sample are around their mean; it is the square of the standard deviation.
Standard deviation
A computation that measures how far, on average, each score is in a data set from the mean.
Box plot
A data visualization technique that depicts a sample’s median, interquartile range (25th and 75th percentiles), and outliers. AKA box and whiskers plot.
Outlier
A score that stands out as either much higher or much lower than most of the other scores in a sample.
z score
A computation that describes how far an individual score is above or below the mean, in standard deviation units. Also called standardized score.
Z score formula
z = (X - M)/SD
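Worked example (a sketch, assuming SD is computed with N in the denominator; the function name `z_score` and the scores are illustrative):

```python
import statistics

def z_score(x, scores):
    """Return how far x is from the mean of scores, in standard deviation units."""
    m = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # assumption: SD with N in the denominator
    return (x - m) / sd

# Hypothetical scores with mean 5 and SD 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

z_high = z_score(9, scores)  # two SDs above the mean
z_low = z_score(4, scores)   # half an SD below the mean
print(z_high, z_low)
```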
Variance formula (compute sample SD)
SD^2 = Σ(X - M)^2/N
Standard deviation formula
SD = square root of (SD)^2
Variance formula (estimate pop SD)
SD^2 = Σ(X - M)^2/(N - 1)
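Worked example (a sketch of the two variance formulas and the SD formula; the function name `variance` and its `estimate_population` flag are illustrative, not standard names):

```python
import math

def variance(scores, estimate_population=False):
    """Sum of squared deviations from the mean, divided by N or by N - 1."""
    n = len(scores)
    m = sum(scores) / n
    sum_of_squares = sum((x - m) ** 2 for x in scores)
    if estimate_population:
        return sum_of_squares / (n - 1)  # estimating the population value
    return sum_of_squares / n            # describing the sample itself

# Hypothetical scores with mean 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

var_sample = variance(scores)                              # divides by N
var_estimate = variance(scores, estimate_population=True)  # divides by N - 1
sd_sample = math.sqrt(var_sample)                          # SD is the square root of the variance
print(var_sample, var_estimate, sd_sample)
```

Dividing by N - 1 always gives a slightly larger value than dividing by N, and the difference shrinks as N grows.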
r formula
r = Σ(z[x]z[y])/(N - 1)
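Worked example (a sketch of the r formula; it assumes the z scores are computed with the N - 1 version of SD, which is what keeps r between -1 and 1 when the sum is divided by N - 1; the function name and data are illustrative):

```python
import math

def pearson_r(xs, ys):
    """Average product of paired z scores, using N - 1 throughout."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Assumption: SD computed with N - 1 in the denominator.
    sdx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sdy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    products = [((x - mx) / sdx) * ((y - my) / sdy) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

xs = [1, 2, 3, 4, 5]
r_perfect = pearson_r(xs, [2, 4, 6, 8, 10])  # y is exactly 2 * x
r_partial = pearson_r(xs, [2, 4, 5, 4, 5])   # positive but imperfect relationship
print(r_perfect, r_partial)
```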
Cohen’s d
A measure of effect size indicating how far apart two group means are, in standard deviation units. Tells us how much overlap there is between the two sets of scores.
Cohen’s d formula
d = (M1 - M2)/SD[pooled]
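Worked example (a sketch; the card does not define SD[pooled], so this assumes the common definition that weights each group's N - 1 variance by its degrees of freedom; the function name and data are illustrative):

```python
import math
import statistics

def cohens_d(group1, group2):
    """Difference between two group means, in pooled standard deviation units."""
    n1, n2 = len(group1), len(group2)
    var1 = statistics.variance(group1)  # N - 1 in the denominator
    var2 = statistics.variance(group2)
    # Assumption: pooled SD weights each variance by its degrees of freedom.
    sd_pooled = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / sd_pooled

# Hypothetical scores for two groups whose means differ by 2 points.
d = cohens_d([4, 5, 6, 7, 8], [2, 3, 4, 5, 6])
print(round(d, 2))
```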
Inferential statistics
A set of techniques that use data from a sample to estimate what is happening in the population.
Point estimate
A single estimate based on our sample data of the true value in the population. May be a percentage, difference between means, or relationship between two variables.
Confidence interval (CI)
A given range indicated by a lower and upper value that is designed to capture the population value for some point estimate (e.g., percentage, difference, or correlation); a high proportion of CIs will capture the true population value.
Population Estimation
- Our research question is about the whole population.
- The quality of the sample data matters.
- The population value is unknown.
- Larger samples give more certain estimates.
- To get a better estimate, we should do more than one poll.
The steps of estimation and precision
- State a research question using terms such as “how much” or “to what extent”.
- Design a study: operationalize the question in terms of variables that are either manipulated or measured.
- Collect the data and compute the point estimate and confidence interval.
- Interpret the results in the context of your research question.
- If possible, conduct the study again and meta-analyze the results.
Margin of error of a percent estimate formula
SD * square root of (1/N) * 1.96
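Worked example (a sketch; it assumes that for a percentage, SD is the square root of p * (1 - p), where p is the sample proportion; the function name and poll numbers are made up):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Margin of error for a sample proportion p based on n respondents."""
    sd = math.sqrt(p * (1 - p))       # assumption: SD of a yes/no variable
    return sd * math.sqrt(1 / n) * z  # SD * square root of (1/N) * 1.96

# Hypothetical poll: 50% of 1,000 respondents say yes.
p = 0.50
moe = margin_of_error(p, 1000)
ci_lower, ci_upper = p - moe, p + moe  # 95% confidence interval around the point estimate

# The same percentage from a smaller sample gives a wider margin.
moe_small = margin_of_error(p, 100)
print(round(moe, 3), round(moe_small, 3))
```

Here the margin is about 3 percentage points with 1,000 respondents and about 10 with 100, which is the sense in which larger samples give more certain estimates.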
Standard error
The typical, or average, error researchers make when estimating a population value.
The constant associated with 95% confidence
1.96. This is the z score that marks off the middle 95% of a normal distribution.
Dependent samples design
A design in which each person has two scores because they were tested under two conditions, and we are interested in the difference between them. Also called a paired design.
Null hypothesis significance testing (NHST)
An inferential statistical technique in which a result is compared to a hypothetical population in which there is no relationship or no difference.
p value
In NHST, the probability of getting the result in a sample or one more extreme, by chance, if there is no relationship or difference in the population.
The NHST procedure
- Assume that there is no effect (null hypothesis)
- Collect data and calculate your result
- Calculate the probability of getting a result of that magnitude, or one even more extreme, if the null hypothesis is true
- Decide whether to reject or retain the null hypothesis
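Worked example (a sketch of the four steps above using coin flips; the null hypothesis is that the coin is fair, "more extreme" is taken to mean at least as far from the expected count in either direction, and the function name and counts are made up):

```python
from math import comb

def two_sided_p(heads, flips):
    """Probability of a head count at least this far from flips / 2 if the coin is fair."""
    expected = flips / 2
    observed_distance = abs(heads - expected)
    as_extreme = 0
    for k in range(flips + 1):
        if abs(k - expected) >= observed_distance:
            as_extreme += comb(flips, k)  # number of sequences with exactly k heads
    return as_extreme / 2 ** flips        # every sequence is equally likely under the null

alpha = 0.05                    # alpha level, chosen in advance
heads, flips = 16, 20           # hypothetical result: 16 heads in 20 flips
p = two_sided_p(heads, flips)   # probability of this result or one more extreme under the null
decision = "reject" if p < alpha else "retain"
print(round(p, 4), decision)
```

With 16 heads in 20 flips the p value is about .012, so the null hypothesis of a fair coin would be rejected at the .05 alpha level.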
Statistically significant
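In NHST, the conclusion assigned when p < .05; that is, when it is unlikely the result came from the null-hypothesis population. If the null hypothesis were true, a result this extreme would occur less than 5% of the time. (This is not the same as a less than 5% chance that the null hypothesis is true.)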
Not statistically significant
In NHST, the conclusion assigned when p > .05; that is, when it is likely the result came from the null-hypothesis population.
Alpha level
The value, determined in advance, at which researchers decide whether the p value obtained from a sample statistic is low enough to reject the null hypothesis or too high, and thus retain the null hypothesis.