Statistics Flashcards

1
Q

nominal data

A

involves tallying people to see which non-ordered category each person falls into
e.g. sex, voting preference, ethnicity

2
Q

ordinal data

A

involves tallying people to see which ordered category each person falls into
group means cannot be calculated from ordinal data

3
Q

interval data

A

involves obtaining numerical scores for each person, where score values have equal intervals
either no zero score (e.g. IQ scores, t-scores) or zero is not absolute (e.g. temperature)
group mean can be calculated from interval data

4
Q

ratio data

A

involves obtaining numerical scores for each person, where scores have equal intervals and an absolute zero
e.g. savings in bank, scores on EPPP, number of children, weight
comparisons can be made across score values (e.g. $10 is twice as much as $5)

5
Q

measures of central tendency

A

mean, median, mode
best measure of central tendency typically the mean
when data skewed or there are some very extreme scores present, median preferable

6
Q

standard deviation

A

measure of average deviation (or spread) from the mean in a given set of scores
square root of the variance
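The relationship between the variance and the standard deviation can be sketched in Python; the score set below is hypothetical, purely for illustration:

```python
# Hypothetical set of scores, used only to illustrate the formulas
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Variance: the mean squared deviation from the mean
mean = sum(scores) / len(scores)
variance = sum((x - mean) ** 2 for x in scores) / len(scores)

# Standard deviation: the square root of the variance
sd = variance ** 0.5

print(variance)  # 4.0
print(sd)        # 2.0
```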

7
Q

variance

A

standard deviation squared

8
Q

range

A

crudest measure of variability

difference between highest and lowest value obtained

9
Q

positive skew

A

higher proportion of scores in the lower range of values
mode has lowest value, mean has highest value
(bump on left)

10
Q

negative skew

A

higher proportion of scores in the higher range of values
mean has lowest value, mode has highest value
(bump on right)

11
Q

kurtosis

A

how peaked a distribution is
leptokurtotic distribution - very sharp peak
platykurtotic - flattened

12
Q

norm-referenced score

A

provides information on how the person scored relative to the group
e.g. percentile rank

13
Q

criterion-referenced or domain-referenced score

A

provides information on how the person scored relative to a predetermined standard or criterion
e.g. percentage correct

14
Q

standard scores

A

based on the standard deviation of the sample

e.g. z-scores, t-scores, IQ scores, SAT scores, EPPP scores

15
Q

z-scores

A

mean of zero, SD of one
shape of z-score distribution always identical to shape of the raw score distribution
useful because they correspond directly to percentile ranks (ONLY IF distribution is normal) and are easy to calculate from raw score data
transforming raw scores into z-scores does not normalize distribution

16
Q

z-score formula

A

z=(score-mean)/(SD)
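A minimal sketch of this formula in Python; the IQ-style mean and SD below are illustrative values only:

```python
def z_score(score, mean, sd):
    # z = (score - mean) / SD
    return (score - mean) / sd

# e.g. a score of 115 on a scale with mean 100 and SD 15
print(z_score(115, 100, 15))  # 1.0
print(z_score(85, 100, 15))   # -1.0
```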

17
Q

standard error of the mean

A

if researcher were to take many, many samples of equal size and plot the mean IQ scores of these samples, researcher would get a normal distribution of means
any spread or deviation in these means is error
average amount of deviation = standard error of the mean

18
Q

standard error of the mean formula

A

SD(population) / SQRT (N)
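As a sketch in Python, assuming IQ-style numbers (population SD of 15, samples of size 25) purely for illustration:

```python
import math

def standard_error_of_mean(population_sd, n):
    # SEM = SD(population) / SQRT(N)
    return population_sd / math.sqrt(n)

# e.g. population SD = 15, sample size = 25
print(standard_error_of_mean(15, 25))  # 3.0
```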

19
Q

central limit theorem

A

assuming an infinite number of equal-sized samples are drawn from the population, and the means of these samples are plotted, a normal distribution of the means will result
tells researcher how likely it is that a particular mean will be obtained just by chance - can calculate whether the obtained mean is most likely due to treatment or experimental effects or to chance (sampling error, random error)
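The theorem can be illustrated with a small simulation, a sketch assuming a uniform (non-normal) population; the population range, sample size, and number of samples are arbitrary choices:

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Draw many equal-sized samples from a non-normal (uniform) population
# and collect the mean of each sample.
sample_means = [
    statistics.mean(random.uniform(0, 100) for _ in range(30))
    for _ in range(2000)
]

# The sample means cluster around the population mean (50), and their
# spread approaches the theoretical standard error: SD / sqrt(N).
population_sd = (100 ** 2 / 12) ** 0.5       # SD of a uniform(0, 100) population
theoretical_sem = population_sd / 30 ** 0.5  # about 5.27
print(statistics.mean(sample_means))
print(statistics.stdev(sample_means))
```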

20
Q

rejection region

A

aka region of unlikely values
size of rejection region corresponds to alpha level e.g. when alpha is .05, rejection region is 5% of curve
when obtained values fall in rejection region, null hypothesis rejected, researcher concludes treatment did have an effect

21
Q

Type I error

A

mistakenly rejecting null (differences found when they don’t exist)
corresponds to alpha

22
Q

Type II error

A

mistakenly accepting null (differences not found, but they do exist)
corresponds to beta

23
Q

power

A

defined as ability to correctly reject the null
increased when sample size is large, magnitude of intervention is large, random error is small, statistical test is parametric, test is one-tailed
power = 1-beta
as alpha increases, so does power

24
Q

non-parametric tests

A

e.g. Chi-square, Mann-Whitney, Wilcoxon

if DV is nominal or ordinal

25
Q

parametric tests

A

e.g. t-test, ANOVA
if DV is interval or ratio

26
Q

assumptions of parametric tests

A

homoscedasticity - there should be similar variability or SD in the different groups
data are normally distributed

27
Q

Kolmogorov-Smirnov test

A

same qualifications as independent samples or single sample t-test, except it's a non-parametric test
1 IV, 1 DV
1 or 2 independent groups

28
Q

Wilcoxon (signed rank)

A

same qualifications as matched t-test, except it's a non-parametric test
1 IV, 1 DV
2 correlated groups

29
Q

Kruskal-Wallis

A

same qualifications as 1-way ANOVA, except it's a non-parametric test
1 IV, 1 DV
>2 independent groups

30
Q

Friedman test

A

same qualifications as 1-way repeated measures ANOVA, except it's a non-parametric test
1 IV, 1 DV
>2 correlated groups

31
Q

single sample chi-square test: description and degrees of freedom

A

nominal data collected for one independent variable
e.g. 100 psychologists sampled as to voting preference
df = #columns - 1 (in example, 3 - 1 = 2 df)

32
Q

multiple sample chi-square

A

nominal data collected for two IVs
e.g. 100 psychologists sampled for voting preference and ethnicity
df = (#rows - 1)(#columns - 1)
in example, (3 - 1)(5 - 1) = 2 × 4 = 8
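The two df rules can be sketched in Python, reusing the cards' examples (3 voting-preference categories, 5 ethnicity categories):

```python
def single_sample_df(n_columns):
    # df = #columns - 1
    return n_columns - 1

def multiple_sample_df(n_rows, n_columns):
    # df = (#rows - 1)(#columns - 1)
    return (n_rows - 1) * (n_columns - 1)

print(single_sample_df(3))       # 2
print(multiple_sample_df(3, 5))  # 8
```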
33
Q

t-test for single sample

A

interval or ratio data collected for one group of subjects
df = N - 1

34
Q

t-tests for matched or correlated samples

A

interval or ratio data collected for two correlated groups of subjects
df = #pairs - 1

35
Q

t-tests for independent samples

A

interval or ratio data collected for two independent groups of subjects
df = N - 2

36
Q

one-way ANOVAs: dfs

A

df total = N - 1
df between groups = #groups - 1
df within groups = df total - df between

37
Q

One-Way ANOVA: F ratio

A

MS between / MS within
when F ratio is approximately 1, no significance
as F ratio gets above 2.0, typically considered to be significant

38
Q

One-Way ANOVA: mean squares

A

MS between = SS between / df between
MS within = SS within / df within
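The df, mean square, and F-ratio bookkeeping can be sketched in Python; the group counts and sums of squares below are made-up values for illustration only:

```python
# Assumed design: 3 groups of 10 subjects each (hypothetical)
n_total, n_groups = 30, 3

df_total = n_total - 1             # 29
df_between = n_groups - 1          # 2
df_within = df_total - df_between  # 27

# Assumed (hypothetical) sums of squares
ss_between, ss_within = 120.0, 540.0

ms_between = ss_between / df_between  # 60.0
ms_within = ss_within / df_within     # 20.0

f_ratio = ms_between / ms_within
print(f_ratio)  # 3.0 -- above 2.0, so typically considered significant
```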
39
Q

Post Hoc tests

A

Scheffe, followed by Tukey, provide most protection from Type I error (most conservative)
Fisher's LSD provides least protection from Type I error
Duncan, Dunnett, Newman-Keuls provide mid-range protection
REVERSE is true for Type II error

40
Q

assumptions of bivariate correlations

A

linear relationship
homoscedasticity - similar spread of scores across the entire scatterplot
unrestricted range

41
Q

Spearman's Rho or Kendall's Tau Correlation

A

ordinal (rank-ordered) X
ordinal (rank-ordered) Y

42
Q

Pearson's r Correlation

A

interval or ratio X
interval or ratio Y

43
Q

Point-Biserial Correlation

A

interval or ratio X
true dichotomy Y

44
Q

Biserial Correlation

A

interval or ratio X
artificial dichotomy Y

45
Q

Phi Correlation

A

true dichotomy X
true dichotomy Y

46
Q

Tetrachoric Correlation

A

artificial dichotomy X
artificial dichotomy Y

47
Q

Eta correlation

A

curvilinear relationship between X and Y

48
Q

zero-order correlation

A

most basic correlation
analyzes relationship between X and Y when it is believed that there are no extraneous variables affecting the relationship

49
Q

partial correlation (first-order correlation)

A

examines the relationship between X and Y with the effect of a third variable removed
e.g. if it is believed that parent education (third variable) affects both SAT and GPA, this variable could be measured and its effect removed from the correlation of SAT and GPA

50
Q

part (semipartial) correlation

A

examines relationship between X and Y with the influence of a third variable removed from only one of the original variables

51
Q

coefficient of multiple determination

A

R squared
index of the amount of variability in the criterion Y that is accounted for by the combination of all the predictors (Xs)

52
Q

multiple R

A

correlation between 2 or more IVs (Xs) and one DV (Y)
Y is always interval or ratio data, and at least one X is interval or ratio data

53
Q

multicollinearity

A

problem that occurs in multiple regression when predictors are highly correlated with one another and essentially redundant

54
Q

canonical R

A

extension of multiple R
correlation between two or more IVs (Xs) and two or more DVs (Ys)
e.g. examining relationship between time spent studying for EPPP (X1) and number of practice tests completed (X2) with score obtained on exam (Y1) and amount of subjective distress experienced while taking the exam (Y2)

55
Q

discriminant function analysis

A

special case of multiple regression used when there are two or more Xs and one Y
however, used when Y is nominal (categorical)

56
Q

loglinear analysis

A

aka logit analysis
used to predict categorical Y based on categorical Xs
e.g. if type of graduate school and sex were used to predict likelihood of passing or failing the EPPP

57
Q

path analysis

A

applies multiple regression techniques to testing a model that specifies causal links among variables

58
Q

structural equation modeling

A

enables researchers to make inferences about causation
e.g. LISREL (Linear Structural Relations)

59
Q

factor analysis

A

operates by extracting as many significant factors from the data as possible

60
Q

eigenvalues

A

in factor analysis, indicates the strength of a factor
<1.0 usually not considered significant
aka characteristic root

61
Q

factor loadings

A

correlation between a variable (e.g. item or subtest) and underlying factor
interpreted if they equal or exceed +/- .30

62
Q

orthogonal rotation

A

type of factor rotation
axes remain perpendicular (90 degrees)
always results in factors that have no correlation with one another
generally preferred because easier to interpret
communalities must be calculated

63
Q

communalities

A

calculated in orthogonal rotation
refers to how much of a test's variability is explained by the combination of all the factors
factor loadings are squared and added together
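A minimal sketch of the communality calculation in Python; the loadings below are hypothetical:

```python
# One test's hypothetical loadings on three factors
loadings = [0.60, 0.40, 0.10]

# Communality: square each factor loading, then add them together
communality = sum(l ** 2 for l in loadings)
print(round(communality, 2))  # 0.53
```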
64
Q

oblique rotation

A

type of factor rotation
angle between axes is non-perpendicular and factors are correlated
some argue that oblique rotations are preferable to orthogonal rotations because factors tend to be correlated in the real world

65
Q

principal components analysis

A

type of factor analysis used when one is trying to extract factors and there is no empirical or theoretical guidance on the values of the communalities
always results in a few unrelated factors, called components
factors empirically derived; researcher has no prior hypotheses
first factor (component) accounts for largest amount of variability, each additional component explaining somewhat less

66
Q

(principal) factor analysis

A

type of factor analysis
communality values would need to be ascertained before analysis

67
Q

Normal curve

A

See pic