Final Exam Flashcards

(58 cards)

1
Q

Descriptive statistics

A

Statistical tools to organize and summarize data
- information about a collection of observations (their central tendency)
- information about the variability in a set of observations
- information about the shape of a distribution of observations

2
Q

Inferential statistics

A

Statistical tools to generalize beyond collections (samples) of actual observations in order to make predictions and test hypotheses about the general population

3
Q

Population

A

Any complete collection of observations or potential observations (ENTIRE group of interest)
- population characteristics are called parameters
- μ, σ

4
Q

Real population

A

All potential observations are available at the time of sampling
- ex. anxiety scores of current participants in a meditation program

5
Q

Hypothetical population

A

One in which not all potential observations are available at the time of sampling

6
Q

Sample

A

Any smaller collection of actual observations drawn from a population
- sample characteristics are called statistics
- x̅, s

7
Q

Level of measurement

A

Specifies the extent to which a number, word, letter, etc. represents something in the world

8
Q

Nominal

A
  • Words, letters, or numerical codes
  • Observations are sorted into categories, no order
9
Q

Ordinal

A
  • Values have an inherent, logical order
  • No equal intervals
10
Q

Interval

A

The distance between consecutive points on the scale is the same all the way along the scale

11
Q

Ratio

A

Amounts or counts of quantitative data that reflect differences in degree based on equal intervals and a true zero

12
Q

Qualitative data

A

Consists of words, letters, or numerical codes that represent a class or category

13
Q

Quantitative data

A

Consists of numbers that represent an amount or a count

14
Q

Why is data type important?

A

We use different statistical tests depending on the type of data we have collected

15
Q

Frequency distribution

A

A collection of observations produced by sorting observations into classes and showing their frequency (f) of occurrence

16
Q

Ungrouped frequency distribution

A
  • Frequencies are tallied for each and every value
  • Each class has a single value
  • Only use these for data sets that have ≤ 20 values
17
Q

Grouped frequency distribution

A
  • Observations are sorted into classes of multiple values
  • Use for data sets with > 20 values
18
Q

Relative frequency

A

Shows the frequency of each class as a fraction (proportion) of the total frequency for the entire distribution
- frequency per class/total frequency

19
Q

Cumulative frequency

A

Shows the total number of observations in each class and in all lower-ranking classes
- add up the frequencies from the bottom (lowest-ranking class)

20
Q

Cumulative relative frequency

A

Shows the cumulative frequency of each class as a proportion of the total
- divide the cumulative frequency by the total
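
The frequency, relative frequency, cumulative frequency, and cumulative relative frequency columns can be built in one pass. This is only a minimal sketch: the `scores` list is made-up data, and the variable names are illustrative.

```python
from collections import Counter

scores = [3, 1, 2, 3, 3, 2, 1, 3, 2, 3]  # hypothetical ungrouped data

counts = Counter(scores)   # frequency (f) of each value
total = len(scores)        # total frequency

table = []
cumulative = 0
for value in sorted(counts):           # start from the lowest-ranking class
    f = counts[value]
    cumulative += f                    # add up from the bottom
    table.append((value, f, f / total, cumulative, cumulative / total))

# Columns: value, f, relative f, cumulative f, cumulative relative f
for row in table:
    print(row)
```

The last row's cumulative frequency should equal the total number of observations, and its cumulative relative frequency should equal 1.0, which is a quick check on the arithmetic.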

21
Q

Percentile rank

A

Percentage of scores in the entire distribution with equal or smaller values than that score
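
As a rough illustration, a percentile rank can be computed by counting the scores at or below the score of interest. The function name `percentile_rank` and the data are hypothetical.

```python
def percentile_rank(scores, score):
    """Percentage of scores that are equal to or smaller than `score`."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical exam scores
rank_of_80 = percentile_rank(scores, 80)  # 6 of the 10 scores are <= 80
print(rank_of_80)  # 60.0
```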

22
Q

Measures of central tendency

A

Means, medians, and modes

23
Q

Mean

A

The average
- sum of all scores/number of scores

24
Q

Median

A

The middle value when observations are ordered from smallest to largest (or vice versa)

25
Mode
The most frequent score in a distribution
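
The three measures of central tendency can be checked against each other with Python's standard `statistics` module. The data below are made up for illustration.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical scores, already in order

mean = statistics.mean(scores)      # sum of all scores / number of scores = 30 / 6
median = statistics.median(scores)  # even count, so average the two middle values (3 and 5)
mode = statistics.mode(scores)      # 3 occurs most often

print(mean, median, mode)
```

Here the mean (5) is pulled above the median (4.0) by the single high score of 10, which is the pattern expected in a positively skewed set of scores.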
26
Variability
The degree to which scores are spread out across a distribution
- range
- variance
- standard deviation
27
Range
Highest value – lowest value
28
Variance
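A measure of how far data points differ from the mean: the average squared deviation from the mean
- population: σ^2 = SS/N
- sample: s^2 = SS/(n - 1)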
29
Standard deviation
A measure of how dispersed the data are in relation to the mean; the square root of the variance
- population: σ = sqrt(SS/N)
- sample: s = sqrt(SS/(n - 1))
30
Sum of squares
A statistical measure of deviation from the mean
- population: SS = Σ(x - μ)^2
- sample: SS = Σ(x - x̅)^2
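
The sum of squares feeds directly into the variance and the standard deviation. The sketch below uses made-up scores and assumes the standard definitions: dividing SS by N for a population and by n - 1 for a sample, then taking the square root to get the standard deviation.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
n = len(scores)
mean = sum(scores) / n             # 40 / 8 = 5.0

# Sum of squares: squared deviations from the mean, added up
ss = sum((x - mean) ** 2 for x in scores)   # 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32

pop_variance = ss / n                  # treating the scores as a whole population
pop_sd = math.sqrt(pop_variance)       # population standard deviation (sigma)

sample_variance = ss / (n - 1)         # treating the scores as a sample
sample_sd = math.sqrt(sample_variance) # sample standard deviation (s)

print(ss, pop_variance, pop_sd)  # 32.0 4.0 2.0
```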
31
Negatively skewed distribution
The majority of observations are at the high end of the distribution, with few low scores (the tail points toward the low end)
- ex. retirement ages, scores on an easy test
32
Positively skewed distribution
Most scores are at the low end of the distribution, with few high scores
- ex. U.S. incomes, scores on a very difficult test
33
The normal distribution
- Most of the area under the curve falls in the middle
- No skew, a bell curve
- Symmetrical
- Mean = median = mode
- Half of the scores fall on each side of the mean
- Total area under the curve = 1.00 or 100%
- X-axis is in the original units of measurement (lbs, inches, mph)
- ex. IQ, height, weight
34
Standard normal distribution
- X-axis is in standard deviation units (x-axis can be turned into Z-scores)
- Mean is always 0
- Standard deviation is always 1
35
Z-score
A unit-free, standardized score that indicates how many standard deviations a score is above or below the mean
- can be positive or negative, unlike a standard deviation: scores above the mean are positive, scores below the mean are negative
- population: z = (x - μ)/σ
- sample: z = (x - x̅)/s
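
A minimal sketch of the z-score formula, using a hypothetical scale with a mean of 100 and a standard deviation of 15:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_above = z_score(130, mean=100, sd=15)  # 30 points above the mean
z_below = z_score(85, mean=100, sd=15)   # 15 points below the mean

print(z_above, z_below)  # 2.0 -1.0
```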
36
Table A / Z Table
Provides z-scores and their associated areas under the curve
37
How to use table A/the Z table
- Sketch the problem, know what you're looking for, and plan the solution
- Calculate the necessary z-scores
- Find the appropriate areas under the standard normal curve in table A
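
Table lookups can be double-checked in software. The sketch below swaps the printed table for the cumulative distribution function of `statistics.NormalDist` from Python's standard library; the scale (mean 100, standard deviation 15) is a made-up example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve: mean 0, standard deviation 1

# Step 1: plan. Hypothetical question: what proportion of scores falls below 130?
# Step 2: calculate the z-score.
z = (130 - 100) / 15       # 2.0

# Step 3: find the area under the curve (here via cdf instead of the table).
area_below = std_normal.cdf(z)   # area to the left of z, about .9772
area_above = 1 - area_below      # area to the right of z, about .0228

# Area between z = -1 and z = +1, about .6827
area_between = std_normal.cdf(1) - std_normal.cdf(-1)

print(round(area_below, 4), round(area_above, 4), round(area_between, 4))
```

Printed tables differ in which area they list (for example, mean to z versus beyond z), so it helps to confirm which column corresponds to `cdf` before comparing values.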
38
Correlation
The relationship between variables, and how paired values of two variables change together
- ex. height and weight, years of education and annual income, medication and anxiety
- described as positive or negative, and as strong, moderate, or weak
39
Positive correlations
As one variable increases, the other increases (as one decreases, the other also decreases)
40
Negative correlations
- As one variable increases, the other decreases
- As one variable decreases, the other increases
41
Scatterplot
A graph showing individual data points plotted as combinations of two variables
- useful for determining the direction of a relationship (negative or positive)
- useful for determining the strength of a relationship (strong, moderate, weak)
42
Pearson's r
The correlation coefficient; describes the strength and direction of the relationship
- r = (Σ ZxZy)/(n-1)
- ranges from -1 to +1
- direction indicated by sign (+ or -)
- strength indicated by value (0 = no relationship, ±1 = perfect relationship)
- 0 < |r| < .3 = weak correlation
- .3 < |r| < .7 = moderate correlation
- |r| > .7 = strong correlation
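
The z-score formula for r can be sketched as follows. The data are made up, the function names are illustrative, and the treatment of values exactly at .3 or .7 (which the card leaves undefined) is an assumption.

```python
import statistics

def pearson_r(xs, ys):
    """r = sum(Zx * Zy) / (n - 1), using sample standard deviations."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # stdev divides by n - 1
    z_products = [((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

def strength(r):
    """Label |r| with the card's cutoffs; boundary handling is an assumption."""
    size = abs(r)
    if size < 0.3:
        return "weak"
    if size < 0.7:
        return "moderate"
    return "strong"

xs = [1, 2, 3, 4, 5]  # hypothetical paired observations
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4), strength(r))  # 0.7746 strong
```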
43
Coefficient of determination (r^2)
The proportion of variance in one variable that is explained/predicted by its relationship with the other variable
- ex. r^2 = (.94)^2 = .88
- 88% of the variation in psych GRE scores is explained by the relationship between grades on a cognition final and psych GRE scores
- 1 - r^2 = 1 - .88 = .12, so 12% of the variation in psych GRE scores is NOT explained by that relationship
44
Linear regression
Plots a straight line through a cluster of dots on a scatterplot, and uses that line to predict the value of one variable from the value of another
45
Least squares regression line
The best-fitting line for a set of data: it minimizes the sum of the squared deviations (prediction errors) between each data point and the line
- Y’ = bx + a
- Y’ = predicted value
- x = value for which we are predicting y
- b = slope of the regression line = r(sqrt(SSy/SSx))
- a = y-intercept of the regression line = ȳ - bx̄
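
The slope and intercept formulas can be sketched as below. The data are made up, `fit_line` is a hypothetical helper, and r is computed here as SP/sqrt(SSx * SSy), which is algebraically equivalent to the z-score formula for Pearson's r.

```python
import math

def fit_line(xs, ys):
    """Return (b, a) for the least squares line Y' = b*x + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)   # sum of squares for x
    ss_y = sum((y - mean_y) ** 2 for y in ys)   # sum of squares for y
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # sum of products
    r = sp / math.sqrt(ss_x * ss_y)             # Pearson's r
    b = r * math.sqrt(ss_y / ss_x)              # slope
    a = mean_y - b * mean_x                     # y-intercept
    return b, a

xs = [1, 2, 3, 4, 5]  # hypothetical paired observations
ys = [2, 4, 5, 4, 5]
b, a = fit_line(xs, ys)
predicted = b * 6 + a   # predicted Y' for x = 6

print(round(b, 2), round(a, 2), round(predicted, 2))  # 0.6 2.2 5.8
```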
46
Standard error of the estimate
An estimate of the accuracy of predictions: roughly, the typical size of a prediction error
- Sy|x = sqrt((Σ (y - y’)^2)/(n-2))
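
A short sketch of the formula, assuming a regression line has already been fitted. The data, the slope `b = 0.6`, and the intercept `a = 2.2` are a made-up example.

```python
import math

def standard_error_of_estimate(xs, ys, b, a):
    """sqrt( sum((y - y')^2) / (n - 2) ), where y' = b*x + a."""
    n = len(xs)
    ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ss_residual / (n - 2))

xs = [1, 2, 3, 4, 5]   # hypothetical paired observations
ys = [2, 4, 5, 4, 5]
b, a = 0.6, 2.2        # hypothetical least squares slope and intercept for these data

see = standard_error_of_estimate(xs, ys, b, a)
print(round(see, 4))   # 0.8944
```

The squared prediction errors here add up to 2.4, and 2.4/(5 - 2) = 0.8, whose square root is about 0.89.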
47
Independent variable
A variable (or treatment) manipulated by the investigator in an experiment
48
Dependent variable
The variable believed to be influenced (changed) by the IV
49
Sampling distribution of the mean
Refers to the probability distribution of means for all possible random samples of a given size from some population
- mean = same as population mean
- shape will approximate a normal curve if sample size is sufficiently large (central limit theorem)
50
Standard error of the mean
The sampling distribution's standard deviation
- σx̅ = σ / √n
- measures variability in the sampling distribution
- extent to which sample means vary around their mean
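
A minimal sketch of the formula, with a hypothetical population standard deviation of 15. It also shows that quadrupling the sample size halves the standard error.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sem_25 = standard_error_of_mean(sigma=15, n=25)    # 15 / 5
sem_100 = standard_error_of_mean(sigma=15, n=100)  # 15 / 10

print(sem_25, sem_100)  # 3.0 1.5
```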
51
Null hypothesis
A statistical hypothesis that nothing special is going on in the sample with respect to a specific characteristic of the underlying population; the hypothesis of no difference
52
Alternative hypothesis
Opposite of null; states that the sample is special or different from the population
53
Significance level
Indicates how rare a sample mean must be to reject the null hypothesis
- α (alpha)
54
Type I error (α)
Rejecting a null hypothesis when it is in fact true
55
Type II Error (β)
Failing to reject (incorrectly retaining) a null hypothesis when it is in fact false
- β is the probability of making this error
56
Confidence interval
A range of values that, with a known degree of certainty, includes an unknown population characteristic
- x̅ ± (Zconf)(σx̅)
- Zconf is the critical z value used in the decision rule
- a 95% CI is a range of values that in the long run would contain the parameter of interest 95% of the time
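
The interval formula can be sketched as below, assuming the population standard deviation is known. The function name and the numbers are hypothetical, and the critical z value is computed with `statistics.NormalDist.inv_cdf` rather than read from a table.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """sample mean +/- z_conf * (sigma / sqrt(n)), for a known population sigma."""
    # Two-tailed critical z, e.g. about 1.96 for 95% confidence
    z_conf = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    sem = sigma / math.sqrt(n)     # standard error of the mean
    margin = z_conf * sem
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(sample_mean=105, sigma=15, n=25)
print(round(low, 2), round(high, 2))  # 99.12 110.88
```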
57
Cohen's d
Describes the observed mean difference in standard deviation units
- d = (mean 1 – mean 2)/standard deviation
- .2 = small
- .5 = medium
- .8 = large
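
A rough sketch of the calculation with made-up numbers. The card lists .2, .5, and .8 as benchmarks; treating them as lower cutoffs, and calling anything below .2 "negligible", are assumptions made only for this example.

```python
def cohens_d(mean_1, mean_2, sd):
    """Mean difference expressed in standard deviation units."""
    return (mean_1 - mean_2) / sd

def effect_size_label(d):
    """Label |d| using .2 / .5 / .8 as lower cutoffs (an assumption)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label not given on the card

d = cohens_d(mean_1=105, mean_2=100, sd=10)
print(d, effect_size_label(d))  # 0.5 medium
```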
58
T-test
Used when we don't know the population standard deviation and must estimate it from the sample
- t = (x̄ - μx̄)/Sx̄
- Sx̄ = estimated standard error = s/sqrt(n)
- x̄ = sample mean
- μx̄ = hypothesized population mean
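
A minimal sketch of the one-sample t ratio, using made-up data and a hypothetical hypothesized mean of 3. The function also returns the degrees of freedom (n - 1), which are needed to look up a critical t value but are not listed on the card.

```python
import math
import statistics

def one_sample_t(sample, hypothesized_mean):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n)); also returns df = n - 1."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (divides by n - 1)
    estimated_se = s / math.sqrt(n)       # estimated standard error of the mean
    t = (sample_mean - hypothesized_mean) / estimated_se
    return t, n - 1

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5
t, df = one_sample_t(sample, hypothesized_mean=3)
print(round(t, 4), df)  # 2.6458 7
```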