Statistics Flashcards

1
Q

population

A

set of all individuals of interest in a study; a population is described by a parameter

2
Q

parameter

A

numerical value that describes a population; can be a single measurement or a set of measurements

3
Q

sample

A

set of individuals selected from a population, intended to be representative of the population in a study; a sample is described by a statistic

4
Q

statistic

A

numerical value that describes a sample; can be a single measurement or a set of measurements

5
Q

descriptive statistic

A

statistical procedures used to summarize, organize, and simplify data - make raw scores meaningful, e.g. mean, median, mode

6
Q

inferential statistics

A

techniques that allow us to study samples and then make generalizations about the population - infer from sample -> population

7
Q

sampling error

A

discrepancy (amount of error) that exists between a sample statistic and the corresponding population parameter - important to consider in inferential statistics

8
Q

construct

A

internal attributes/characteristics that cannot be directly observed but are useful for describing and explaining behavior - hypothetical, e.g. happiness

9
Q

operational definition

A

defines a construct in terms of observable behaviors, e.g. intelligence defined as performance on an IQ test

10
Q

nominal scale

A

categorical organization - can only measure qualitative differences, e.g. gender, country of origin, hair color

11
Q

ordinal scale

A

categories organized in a definite sequence, so differences indicate direction - but the distance between one rank and the next is not consistent, e.g. class rank, rating scale

12
Q

interval scale

A

ordered categories that are intervals of exactly the same size, with an arbitrary zero point - 0 does not mean the absence of the construct being measured, e.g. temperature on the Celsius scale

13
Q

ratio scale

A

interval scale with absolute zero point - can describe differences between categories in terms of ratios (one thing is 3 times larger than another) e.g. weight, height, speed

14
Q

discrete variables

A

separate, indivisible categories - whole numbers or specific categories - no decimals, e.g. 3 goals scored

15
Q

continuous variables

A

infinite number of possible values that fall between any two observed values - divisible into infinite number of fractional parts e.g. height

16
Q

real limits

A

boundaries of intervals for scores that are represented on a continuous number line - each score has two limits, halfway between adjacent scores (upper real limit, lower real limit), e.g. an observed value of 8 actually represents the range from 7.5 to 8.5 (kind of like rounding)

17
Q

correlational method

A

two variables observed to see if there is a relationship between the two

18
Q

experimental method

A

establishes a cause-and-effect relationship between variables - must manipulate one variable and observe a second - controlled research situation

19
Q

non-experimental method

A

a pre-existing variable determines group membership (e.g. those who have depression) - nothing is manipulated

20
Q

independent variable

A

manipulated variable - 2+ treatment conditions

21
Q

dependent variable

A

observed for changes to assess effect

22
Q

control

A

does not receive manipulated experimental treatment, baseline for comparison

23
Q

quasi-independent variable

A

groups not created by manipulating an independent variable - participant variable (male/female) - time variable (before/after)

24
Q

summation notation

A

a way to represent the sum of a set of scores: ∑ Xi, with i = 1 underneath the ∑ (the starting point of the scores) and n on top (the stopping point)
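
A tiny Python sketch of the same idea (scores are illustrative only):

```python
scores = [3, 1, 4, 2]                       # X1 ... Xn (illustrative)
n = len(scores)
total = sum(scores[i] for i in range(n))    # sum of Xi from i = 1 to n (0-based in Python)
print(total)                                # -> 10
```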

25
Q

µ

A

population mean

26
Q

x̄ (M)

A

sample mean

27
Q

σ

A

population standard deviation

28
Q

s

A

sample standard deviation

29
Q

σ²

A

population variance

30
Q

s²

A

sample variance

SS/(n - 1) (i.e. divided by df when working with a sample)

31
Q

P

A

proportion of the population that has a particular attribute

32
Q

p

A

proportion of the sample that has a particular attribute

33
Q

ρ

A

population correlation coefficient

34
Q

r

A

sample correlation coefficient

35
Q

N

A

number of elements (scores) in the population

36
Q

n

A

number of elements (scores) in the sample

37
Q

H0

A

null hypothesis

38
Q

H1

A

alternative hypothesis

39
Q

α

A

alpha; the probability of a type 1 error

40
Q

β

A

beta; the probability of a type 2 error

41
Q

type 1 error

A

incorrect rejection of a null hypothesis

false positive

thinking there is an effect when there isn't

42
Q

type 2 error

A

incorrectly retaining a false null

false negative

thinking there isn't an effect when there is one

43
Q

frequency distribution

A

organized tabulation of the number of individual scores located in each category on the scale of measurement - takes disorganized scores and places them in order from highest to lowest - see the entire set of scores at a glance - categories based on the measurement scale - can be a graph or table

44
Q

grouped frequency distribution

A

when the data cover a wide range of values and it is unrealistic to list individual scores - rule 1: ~10 class intervals - rule 2: relatively simple width (2, 5, 10) - rule 3: each interval starts with a score that is a multiple of the width - rule 4: all intervals should be the same width

45
Q

bar graph

A

uses horizontal or vertical bars to show comparisons among categories - nominal/ordinal

46
Q

ogive

A

curve of the cumulative frequency distribution or cumulative relative frequency distribution - express each simple frequency as a percentage of the total frequency - cumulate and plot these percentages (e.g. the lowest score makes up 5%, the next score makes up 6%, but the cumulative frequency is 11%, so that is what is plotted for score 2)

47
Q

polygon

A

a line drawn to join all the midpoints of the top bars of a histogram - like an ogive, but does not use cumulative frequencies or smooth lines - to convert to ogive, add up percentages before each bar

48
Q

histogram

A

an area diagram -> bars portray frequencies of possible values of a variable - continuous variables (this is why the bars touch) - set of rectangles along the intervals between class boundaries - areas proportional to the frequencies in corresponding classes

49
Q

population distributions

A

can't find absolute frequencies but can find relative frequencies, e.g. we don't know how many fish make up the population in a lake -> don't know how many trout or salmon, but after research we can say that there are twice as many trout as salmon

50
Q

percentile

A

score point below which a specified % of the scores in a distribution fall

  • compute the percent * N
  • round this figure so that it ends in .0 or .5 whichever is closer
  • if rounded value ends in .5 the desired centile is the next higher value, if ending in .0 split the difference with the next higher score
51
Q

percentile rank

A

percent of cases which fall below a specific point in the distribution

  • write down the exact limits of the interval which contains the score whose rank is to be obtained
  • interpolate between the cumulative percents to find the desired percentile rank

exact limit / cumulative %

Y / A (upper exact limit, cumulative % at that limit)

X / B (the score, its unknown percentile rank)

Z / C (lower exact limit, cumulative % at that limit)

(X - Z)/(Y - Z) = (B - C)/(A - C), solved for B - see the sketch below
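
A minimal sketch of that interpolation, using hypothetical limits and cumulative percentages (all numbers are illustrative, not from the cards):

```python
def percentile_rank(x, lower_limit, upper_limit, cum_pct_lower, cum_pct_upper):
    """Interpolate the rank B of score X inside one interval:
    (X - Z)/(Y - Z) = (B - C)/(A - C), solved for B."""
    fraction = (x - lower_limit) / (upper_limit - lower_limit)
    return cum_pct_lower + fraction * (cum_pct_upper - cum_pct_lower)

# hypothetical interval 6.5-8.5 with cumulative % of 40 at the bottom and 60 at the top
print(percentile_rank(7.0, 6.5, 8.5, 40, 60))  # -> 45.0
```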

52
Q

central tendency

A

descriptive statistical measure to determine a single score that defines the center of a distribution

goal: find one score that is most representative of the group

most common method of summarizing/describing distribution

53
Q

mean

A

average; sum of scores divided by the number of scores

appropriate when… no extreme outliers, no nominal scales

∑X/N

54
Q

median

A

the score that divides the distribution of scores exactly in half

appropriate when… there are extreme outliers, no nominal scales, skewed distribution

N/2 (half of the scores fall on each side of the median)

55
Q

mode

A

score or category that has the greatest frequency

appropriate when… you want answer to be correct as often as possible, nominal scales, discrete variables (hair color frequency)
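
A quick sketch of all three measures using Python's standard library (data are illustrative):

```python
from statistics import mean, median, mode

scores = [2, 3, 3, 4, 5, 9]   # small illustrative data set
print(mean(scores))           # sum of scores / number of scores -> 4.33...
print(median(scores))         # middle of the ordered scores -> 3.5
print(mode(scores))           # most frequent score -> 3
```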

56
Q

how is the mean affected when adding/removing a new score?

A

will change mean, unless score is the same as the mean

57
Q

how is the mean affected when adding/subtracting a constant to every score?

A

same constant is added/subtracted to the mean

e.g. 1,2,3 M = 2; now add 2 to each score: 3,4,5 M = 4

58
Q

how is the mean affected when scores are multiplied/divided by a constant?

A

mean changes in the same way

e.g. 1, 2, 3 M = 2; now multiply all scores by 2: 2, 4, 6 M = 4

59
Q

central tendency and its relation to symmetrical and skewed distributions

A

when choosing which measure is most valuable…

normal dist: all equal

skewed dist: median

negatively skewed: mean < median < mode

positively skewed: mode < median < mean

60
Q

variability

A

quantitative measure of the degree to which scores in a distribution are spread out or clustered together

no variability: no difference between scores

small variability: small difference

large variability: large difference

61
Q

range

A

the distance between the largest score and the smallest score

must compute in terms of real limits

problem: solely determined by the two extreme scores of the distribution
calculate: subtract the lowest number from the highest number

62
Q

inter-quartile range

A

ignores any extreme outlier scores -> measures the range covered by the middle 50% of the distribution

separates scores into 4 equal parts with “cuts” either between or on certain scores

interquartile range is the distance between Q1 and Q3 (cutting off the lowest 25% and the highest 25%)

calculate: order from least to greatest, find the median/middle number, calculate the median of the first half, calculate the median of the 2nd half, subtract the smaller value (Q1) from the larger value (Q3) - see the sketch below
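
A minimal sketch of the median-of-halves procedure described above (textbooks differ on how to handle the middle score; this version simply splits the ordered scores in half):

```python
def median(values):
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def interquartile_range(values):
    s = sorted(values)
    half = len(s) // 2
    q1 = median(s[:half])        # median of the lower half
    q3 = median(s[-half:])       # median of the upper half
    return q3 - q1

print(interquartile_range([1, 3, 5, 7, 9, 11, 13, 15]))  # Q1 = 4, Q3 = 12 -> 8
```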

63
Q

semi-interquartile range

A

half of the inter-quartile range

middle 25%

divide interquartile range in half

64
Q

standard deviation (SD)

A

most commonly used and most important measure of variability

takes into account all values of a variable

mean = reference point; measures variability by considering distance between each score and the mean

determines whether scores are generally near or far from mean, how much they deviate from the mean

65
Q

SS (sum of squared deviations) - population

A

∑(X - µ)²

find the deviation score: X - µ

compute this for each score, being mindful of +/-

square each deviation score: (X - µ)²

add up all the squared deviation scores: ∑(X - µ)²

this is SS

66
Q

variance - population

A

take SS and divide by N

∑(X - µ)² / N

large value = more variability = scores are more spread out = BAD

67
Q

standard deviation - population

A

take square root of variance

SS/N = σ² <- this is the variance

σ = √σ² <- standard deviation

68
Q

SS (sum of squared deviations) - sample

A

find the deviation score: X - M

compute for each score

square each deviation score: (X - M)²

add up all the squared deviation scores: ∑(X - M)² <- this is SS

69
Q

variance - sample

A

take SS and divide by n - 1

∑(X - M)² / (n - 1) = s²

70
Q

standard deviation - sample

A

take the square root of the variance for the standard deviation

s = √s²
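
A short sketch of both versions of the formulas from the last few cards, dividing SS by N for a population and by n - 1 for a sample (scores are illustrative):

```python
from math import sqrt

scores = [2, 4, 6, 8]                      # illustrative data
m = sum(scores) / len(scores)              # mean = 5
ss = sum((x - m) ** 2 for x in scores)     # sum of squared deviations = 20

pop_variance = ss / len(scores)            # SS / N       -> 5.0
pop_sd = sqrt(pop_variance)                # σ = √σ²

sample_variance = ss / (len(scores) - 1)   # SS / (n - 1) -> about 6.67
sample_sd = sqrt(sample_variance)          # s = √s²
```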

71
Q

unbiased statistic - how to correct?

A

unbiased statistic is an accurate representation of the population

n - 1 in sample variance will correct for bias in sample variability

72
Q

z-score

A

provides a precise description of a location in a distribution

describes the number of SDs from the mean

describes how common/exceptional a score is compared to others

positive z-score = above the mean, negative z-score = below the mean

73
Q

transforming z-scores

A

z = (X - M)/s transforms a raw score into a z-score; X = M + zs transforms a z-score back into a raw score

74
Q

standardizing distributions

A

compare scores across test forms

same shape as the original distribution (scores renamed, but same location)

e.g. z-score distribution

when transforming x scores to z-scores, new M = 0, new s = 1
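
A small sketch of standardizing a set of scores; after the transformation the mean is (approximately) 0 and the SD is 1 (data are illustrative):

```python
from statistics import mean, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]         # illustrative data
m, s = mean(scores), stdev(scores)

z_scores = [(x - m) / s for x in scores]  # z = (X - M)/s for every score

print(mean(z_scores), stdev(z_scores))    # -> approximately 0 and 1
```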

75
Q

probability

A

likelihood that something will happen

way to quantify randomness

smaller # -> less likely

over the long run

p = (# of certain outcome)/(#of all possible outcomes)

probability is similar to finding a percentile rank: the probability of having an IQ below 120 corresponds to the percentile rank of X = 120

76
Q

experiment (probability)

A

act of flipping a coin or rolling a die

77
Q

mutually exclusive events

A

cannot happen at the same time - rolling a 2 and a 6 on one die can't happen simultaneously

78
Q

independent random sampling

A

probability of being selected is independent of the individuals already selected

each individual in population has equal chance of being selected

ensures that the probability of particular outcome does not depend on previous outcomes

79
Q

sampling with replacement

A

returning selections back to the population

probability of picking out a red m&m is 1/10 - pick out an m&m, then replace it; the probability is still 1/10 instead of 1/9, 1/8, etc.

80
Q

Unit normal table for probabilities in a normal distribution

A

transform the score to a z-score: z = (X - M)/s (and X = M + zs to convert back)

look up in unit normal table - proportions are always positive, even if z-score is negative

negative z-score: tail is on the left, body on the right

positive z-score: tail on the right, body on the left

81
Q

distribution of sample means

A

set of means from all possible random samples (w/ replacement) of n from a population

the larger the n, the smaller the st. error of the mean (means from multiple trials) -> because there is less error between the sample mean and the population mean.

the more people in the study, the less error between the sample and the population

  • sample means should be centered around population mean
  • expected value of M = µ (the mean of the distribution of sample means equals the population mean)
  • the sample mean is an unbiased estimator of the population mean
  • distribution of sample means will approach a normal distribution even if original dist. is skewed.
82
Q

standard error of the mean

A

σM = σ/√n

83
Q

location of a sample mean in the distribution of sample means

A

each sample mean, M, has a location in the distribution of sample means

can be described in a z-score

calculate: z = (M - µ)/σM

(sample mean - population mean) divided by the standard error of the mean
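
A small numeric sketch of the last two cards, with hypothetical population values and sample result:

```python
from math import sqrt

mu, sigma = 100, 15               # hypothetical population mean and SD
n, M = 25, 106                    # hypothetical sample size and sample mean

standard_error = sigma / sqrt(n)  # σM = σ/√n -> 3.0
z = (M - mu) / standard_error     # z = (M - µ)/σM -> 2.0
```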

84
Q

hypothesis testing

A

determining whether the sample is representative of the population or merely the result of chance

85
Q

null hypothesis

A

suggests that there is no difference between groups

no effect

assume null hypothesis is true unless data prove otherwise

86
Q

alternative hypothesis

A

suggests there IS a difference between groups

there is an effect

87
Q

test statistic

A

# of standard errors the sample value is removed from the null value

used to determine whether to reject the null

compares your data with what is expected under the null

e.g. z-score

88
Q

alpha level

A

probability of making a type 1 error

decreasing the significance level -> decreases the chance of a type 1 error but increases the chance of a type 2 error

89
Q

critical region

A

composed of the extreme sample values that are very unlikely to be obtained if the null is true

boundaries determined by alpha level

if sample data fall in the critical region, null is rejected

calculate:

  1. define alpha
  2. use the unit normal table to find the z-score boundaries; sample z-scores larger (+) or smaller (-) than these boundaries fall in the critical region
90
Q

hypothesis testing steps

A
  1. state the hypotheses (one-tailed or two-tailed - e.g. lowers the response vs. has an effect)
  2. set the criteria
    - alpha level
    - find critical regions
  3. collect data and evaluate
    - calculate standard error
    - calculate z-score
  4. make a decision (see the sketch below)
    - reject null -> sample data in critical region, tx had an effect
    - fail to reject null -> sample data not in critical region, treatment doesn't have an effect
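
A sketch of the four steps as a z-test with hypothetical numbers (alpha = .05, two-tailed, critical z = ±1.96):

```python
from math import sqrt

mu, sigma = 100, 15                   # step 1: H0 says the treated population mean is still 100
critical_z = 1.96                     # step 2: critical boundaries for alpha = .05, two-tailed

n, M = 36, 106                        # step 3: collect data
standard_error = sigma / sqrt(n)      # 15/6 = 2.5
z = (M - mu) / standard_error         # (106 - 100)/2.5 = 2.4

if abs(z) > critical_z:               # step 4: make a decision
    print("reject H0 - sample mean falls in the critical region")
else:
    print("fail to reject H0")
```
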
91
Q

effect size

A

magnitude of the treatment effect

92
Q

Cohen’s D

A

.2 = small effect

.5 = medium effect

.8 = large effect

calculate: (µtx - µno tx) / standard deviation

93
Q

power

A

probability that the test will correctly reject the null hypothesis

helps determine # of participants needed

related to effect size -> higher effect size = higher chance of rejecting the null (both provide magnitude of tx effect)

decrease standard error between two distributions -> increase # of subjects

factors that affect power: sample size, alpha level, 1 tailed vs. 2 tailed

94
Q

r²

A

another way to calculate effect size - the amount of variability/percentage of variance accounted for

.01 = small effect

.09 = medium effect

.25 = large effect

95
Q

t - statistic

A

the z stat is used to test a hypothesis about an unknown population mean when the standard deviation is known

the t stat is used to test a hypothesis about an unknown population mean when the standard deviation is unknown

the only difference between t and z is the estimated standard error

calculate: t = (M - µ) / sM

the difference between the sample mean and the population mean divided by the difference expected by chance

96
Q

hypothesis testing using t - stat

A
  1. set up the hypotheses H0: M1 = M2; H1: M1 ≠ M2
  2. set the criteria
    - set alpha
    - find the critical region
  3. collect data and evaluate
    - calculate variance or SD (s² = SS/(n - 1) = SS/df)
    - calculate estimated standard error (sM = s/√n)
    - calculate the t-stat (t = (M - µ)/sM)
  4. make a decision (see the sketch below)
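
A sketch of those steps for a single-sample t-test on hypothetical data (H0: µ = 10); the resulting t would still be compared to a critical value from the t table with df = n - 1:

```python
from math import sqrt
from statistics import mean, stdev

sample = [12, 9, 14, 11, 13, 10, 12, 15]   # hypothetical scores
mu = 10                                    # value stated by H0

M = mean(sample)                 # sample mean -> 12
s = stdev(sample)                # uses SS/(n - 1) -> 2
sm = s / sqrt(len(sample))       # estimated standard error
t = (M - mu) / sm                # t statistic -> about 2.83
```
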
97
Q

percentage of variance explained - r2

A

r² = t²/(t² + df)

98
Q

independent measures t test

A

comparing means of 2 independent groups

uses separate sample for each of the tx populations compared

examine difference between population means of 2 independent groups

assumptions

  • independent observations -> one observation doesn't affect the probability of other observations
  • normal distribution
  • populations have equal variance -> homogeneity of variance
99
Q

hypothesis test for independent measure t-test

A
  1. state H0 and H1
    - H0: µ1 = µ2 OR µ1 - µ2 = 0
    - H1: µ1 ≠ µ2 OR µ1 - µ2 ≠ 0
  2. identify critical regions based on alpha
    - calculate total df (df = df1 + df2)
    - find critical region boundaries in the t distribution table
  3. evaluate assumptions
  4. compute statistics
    - pooled variance
    - estimated standard error
    - independent samples t statistic
  5. make decision regarding H0
    - independent measures t test gives us the total amount of error involved in using 2 sample means to estimate 2 population means
    - tells the average distance between the sample difference and the population difference
    - estimate the standard error using the sample standard deviation or variance; since there are two samples, we must average the two sample variances (see the sketch below)
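
A sketch of the computations in steps 4-5 for two small hypothetical groups, using the pooled variance and estimated standard error described in the next two cards:

```python
from math import sqrt
from statistics import mean

def ss(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)    # sum of squared deviations

g1 = [8, 10, 12, 10]                        # hypothetical group 1 (n1 = 4)
g2 = [4, 6, 5, 9]                           # hypothetical group 2 (n2 = 4)

df1, df2 = len(g1) - 1, len(g2) - 1
pooled_variance = (ss(g1) + ss(g2)) / (df1 + df2)       # (SS1 + SS2)/(df1 + df2)
est_se = sqrt(pooled_variance / len(g1) + pooled_variance / len(g2))
t = (mean(g1) - mean(g2)) / est_se          # compare to critical t with df = df1 + df2
```
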
100
Q

pooled variance

A

weighted average of the two sample variances: s²p = (SS1 + SS2)/(df1 + df2) - accounts for both samples' variability and feeds the estimated standard error.

101
Q

estimated standard error

A

s(M1 - M2) = √(s²p/n1 + s²p/n2) - uses the pooled variance and both sample sizes; estimates the average distance between (M1 - M2) and (µ1 - µ2)

102
Q

estimated Cohen's d - t-test

A

measures treatment effect

mean difference divided by the sample standard deviation (estimated, because the population σ is unknown in a t-test)

estimated d = (M - µ)/s

103
Q

repeated measures design

A

repeatedly measures same individuals to assess change (within-subjects)

  • same sample, test twice, before/after tx
  • same subjects are being tested under different conditions
104
Q

hypothesis testing repeated measures t - test

A

difference score (D) - change in an individual's score between the two measures

  1. state the null and alternative

H0: µD = 0

H1: µD ≠ 0

  2. select alpha and critical values
  3. compute the t statistic

(do not have to compute pooled variance because it is one group)

  • estimated standard error of the mean difference
  • dependent (repeated-measures) sample t statistic

  4. make your decision
105
Q

dependent sample t statistic

A

t = (MD - µD)/sMD, where MD is the mean difference score and sMD = sD/√n is the estimated standard error of the mean difference

106
Q

r2 for repeated measures

A

r² = t²/(t² + df), with df = n - 1 (same formula as for the single-sample t)

107
Q

repeated measures (adv./disad.)

A

advantages

  • allows researcher to exclude effects of individual differences (own control group)
  • requires fewer participants -> easier to recruit
  • study individuals over time

disadvantages

  • order effects
  • variance reduced
  • other things can affect -> history, maturity, attrition, testing, instrumentation
108
Q

independent measures (adv./disad.)

A

advantages

  • order effects is not a problem
  • does not require as many materials as repeated measures because different people are being studied, so you can reuse materials

disadvantages

  • individual differences
109
Q

correlation

A

measures and describes a relationship between two variables

110
Q

Pearson's correlation

A

measures the degree and direction of the linear relationship between two variables; r = SP/√(SSX · SSY)

111
Q

sum of products

A

definitional method: calculate the mean for X and Y, find the deviation scores (X - MX) and (Y - MY), multiply each pair of deviation scores, and add these products: SP = ∑(X - MX)(Y - MY)

computational formula: SP = ∑XY - (∑X)(∑Y)/n

112
Q

spearman correlation

A

spearman uses ranks, one or both variables are ordinal

d = difference in rank scores

tied scores?

  • list scores smallest to highest
  • assign rank
  • if tied, compute the mean of their ranked positions and assign this value as the final rank for each score
113
Q

linear equation

A

line of best fit

y = mx + b

  • m = slope of the line
  • b = y-intercept
114
Q

least squared error solution

A

approach in regression to find the approximate solution of overdetermined systems (sets of equations with more equations than unknowns)

115
Q

linear regression equation

A

all you need is slope and the y-intercept to create a line of best fit

y = bx + a

b = SP/SSX; a = MY - b·MX
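
A minimal sketch computing the slope and intercept from SP and SSX (the intercept formula a = MY - b·MX is the standard least-squares result, not stated on the original card); the data are illustrative:

```python
from statistics import mean

x = [1, 2, 3, 4, 5]              # hypothetical paired data
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
sp  = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))   # sum of products
ssx = sum((xi - mx) ** 2 for xi in x)                      # SS for X

b = sp / ssx                     # slope -> 0.6
a = my - b * mx                  # y-intercept
print(b, a)                      # line of best fit is roughly y = 0.6x + 2.2
```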

116
Q

ANOVA

A

used to evaluate the difference between two or more sample means by comparing variances

ANOVA is used because running multiple t-tests -> more error

compares between-treatment variance with within-treatment variance

advantage: performs all tests with one hypothesis and one alpha, avoiding the problem of an inflated experiment-wise alpha
hypotheses: null = all means are equal, alternative = there is at least one mean difference among the populations
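
A compact sketch of a one-way, between-subjects ANOVA on three hypothetical groups, computing the between- and within-treatment variances and their F-ratio:

```python
from statistics import mean

groups = [[4, 6, 5], [8, 9, 10], [5, 6, 7]]     # hypothetical treatment groups

grand_mean = mean([x for g in groups for x in g])
k = len(groups)
n_total = sum(len(g) for g in groups)

ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)

ms_between = ss_between / (k - 1)        # between-treatments variance
ms_within = ss_within / (n_total - k)    # within-treatments variance
F = ms_between / ms_within               # compare to the critical F(k - 1, N - k)
```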

117
Q

ANOVA factors

A

number of independent variables

between subjects = different subjects used for different levels of the factor

within subjects = same subjects used for the different levels of the factor

118
Q

ANOVA levels

A

number of conditions

119
Q

ANOVA between tx variance

A

measures differences caused by

  • systematic tx effects
  • random, unsystematic factors
120
Q

ANOVA within tx variance

A

measures differences caused by

  • random, unsystematic factors
121
Q

when are post tests necessary for ANOVAs?

A

post tests are used when significant results are found and when additional exploration of the differences among means is needed

they provide specific info on which means are significantly different from each other

122
Q

ANOVA effect size

A

r² = SSbetween/SStotal

  • this is the percentage of variance accounted for by the treatment
123
Q

Chi-square test

A

determines association between 2 categorical variables

  • used when scores violate the assumptions of a parametric test
    > not normally distributed
    > unequally high variances
    > undetermined or infinite scores
  • this test determines how well the obtained sample proportions fit the population proportions specified by the null hypothesis
    e.g. relationship between personality and color preference
124
Q

hypothesis test for chi-square goodness of fit

A

hypotheses

H0: equal proportions or no difference from a known population

Example: Men 50%, women 50%

H1: unequal proportions or a difference from known population

fo = observed frequency

  • represents real individuals
  • always whole numbers

fe = expected frequency (proportion times n)

  • predicted from the proportions in the null hypothesis and the sample size
  • defines an ideal, hypothetical sample distribution that would be obtained if the sample proportions were in perfect agreement with the proportions specified in the null

chi-square stat: χ² = ∑ (fo - fe)²/fe

df = C - 1 (C = # of categories)

use the table to determine if the stat is in the critical region (see the sketch below)
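
A small sketch of the goodness-of-fit computation with a hypothetical null of equal proportions across three categories:

```python
observed = [18, 30, 12]                              # fo: real individuals, whole numbers
n = sum(observed)
expected = [n / len(observed)] * len(observed)       # fe under H0 of equal proportions -> 20 each

chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = len(observed) - 1                               # C - 1 -> 2
print(chi_square, df)                                # about 8.4 with df = 2 -> compare to the table
```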

125
Q

differences between fo and fe

A

small

  • small value for chi-square
  • conclude there is a good fit between data and hypothesis
  • fail to reject null

large

  • large chi-square
  • reject the null
  • want a large value for chi square!
126
Q

chi square for independence

A

variables are independent when there is no consistent, predictable relationship between them

  • if two variables are independent -> the frequency distribution for one variable has the same shape at every category of the second variable
  • if there is no relationship between 2 variables (null) -> distributions have equal proportions (null)

each individual classified on each of the 2 variables

  • frequency distribution for sample tests hypothesis about corresponding frequency distribution for population
  • H0: distributions are the same (no differences, no relationship)
127
Q

phi-coefficient

A

.1 = small

.3 = medium

.5 = large

128
Q

Cramér's V

A

df    small    medium    large
1     .10      .30       .50
2     .07      .21       .35
3     .06      .17       .29

129
Q

percentage of variance accounted for - t-test

A

r² = t²/(t² + df)