Midterm #1 Flashcards
Statistical Origins
began 100-120 years ago with 4 guys in England
Statistical Origins
- **4 guys** (100-120 years ago in England)
**Francis Galton **
Karl Pearson
Ronald Fisher
William “Student” Gosset
Francis Galton
interested in **quantifying human variation **
- money man
- eugenics
Karl Pearson
wanted to show relationships between variables
- student of **Galton**
- fan of **Karl Marx**
- enemy = **Ronald Fisher**
Ronald Fisher
wanted to **test if something caused something **
- statistics & genetics
- studied causal relationships
- enemy: Karl Pearson
William “Student” Gosset
just wanted **everyone to get along**
- worked at brewery
Psychology & **Statistics **
- history/prevalence within psychology
When **Freudians & Behaviorists **ruled psych → no need for stats
**Personality, social, cognitive **psychologists created **demand **for statistics
- stats became… when? (2)
- debate (2), when?
→ became language of psychology in 1950s
→ 1980s: stats became more complex (computer rev.)
21st Century - debate **(quantitative vs. qualitative) **
- bigger debate around how we use stats
Definition of Statistics (2)
**Statistics **as:
- **collection **of **numerical facts **
- **methods **for dealing with **data **
(2) Types of Statistics
1) Descriptive
2) **Inferential **
**Inferential **statistics allow us to?
generalize from **samples **to **population **
Population
complete set of **individuals, objects **or **measurements **having some common characteristic
Parameter
any **characteristic **of a population that is measurable
Sample
**subset **of a population
Statistic
**number **resulting from **manipulation **of sample data
Scales **(4) **
NOIR
Nominal
Ordinal
Interval
Ratio
Nominal Scale
observation of **unordered variables **with **no ranking **to be inferred
Ordinal Scale
classes differ & indicate rank
Interval Scale
classes differ in **meaningful way **so arithmetic operations are possible
Ratio Scale
interval scale but with **meaningful zero point **
Grouping
**collapsing **scores into mutually exclusive classes defined by **grouping intervals **
Grouping Data
- **pros (3)**
- difficult to deal w/ large # of cases spread over many scores
- some scores have low frequency counts
- less data leads to greater comprehension
Grouping Data
- **cons (2) **
- info is lost when categories/data are combined
- categories can be **arbitrary **
Ungrouped Frequency Distribution
frequency distribution (table that displays frequency of various outcomes in a sample) that does NOT group data into intervals
Grouped Frequency Distribution
groups data into intervals of size i
- mutually exclusive & exhaustive
frequency is equal to the number of values that fall within this interval
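A minimal sketch of building a grouped frequency distribution, assuming integer scores and made-up exam data. The helper name `grouped_frequencies` and the choice of starting point and interval size are illustrative, not part of the course material.

```python
from collections import Counter

def grouped_frequencies(scores, start, width):
    """Return (lower, upper, frequency) for each interval of size `width`.

    Assumes integer scores that are all >= start, so every score lands in
    exactly one interval (mutually exclusive & exhaustive).
    """
    counts = Counter((s - start) // width for s in scores)
    n_bins = max(counts) + 1
    return [
        (start + k * width, start + (k + 1) * width - 1, counts.get(k, 0))
        for k in range(n_bins)
    ]

scores = [52, 55, 61, 63, 64, 70, 71, 78, 79, 85]  # hypothetical exam scores
table = grouped_frequencies(scores, start=50, width=10)
for lower, upper, f in table:
    print(f"{lower}-{upper}: f = {f}")
# 50-59: f = 2
# 60-69: f = 3
# 70-79: f = 4
# 80-89: f = 1
```

The frequencies add up to n = 10, which is a quick check that no score was dropped or counted twice.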
Cumulative Frequency Distribution
also includes cumulative frequency (cf), which indicates the number of values within the specified interval + # of values previously counted
ON GRAPH: highest point reached is total (n) # of values
Cumulative Percentage Distribution
also includes c%, which is cf/n x 100%
- shows the cumulative frequency as a percentage of the total (n) # of values
ON GRAPH: highest point reached is 100%
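A short sketch of how cf and c% could be computed from interval frequencies. The intervals and frequencies are made up for illustration.

```python
from itertools import accumulate

intervals = ["50-59", "60-69", "70-79", "80-89"]  # hypothetical intervals
freqs = [2, 3, 4, 1]                              # frequency in each interval
n = sum(freqs)                                    # total # of values

cum_freqs = list(accumulate(freqs))               # running total = cf
cum_pcts = [100 * cf / n for cf in cum_freqs]     # c% = cf/n x 100%

for label, f, cf, cp in zip(intervals, freqs, cum_freqs, cum_pcts):
    print(f"{label}: f = {f}, cf = {cf}, c% = {cp}")
# 50-59: f = 2, cf = 2, c% = 20.0
# 60-69: f = 3, cf = 5, c% = 50.0
# 70-79: f = 4, cf = 9, c% = 90.0
# 80-89: f = 1, cf = 10, c% = 100.0
```

The last cf equals n and the last c% equals 100, matching the highest points on the two graphs.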
IQ scores would be an example of data that are?
Interval
Percentile Ranks
form of cumulative percentage that indicate where scores fall in a distribution
How do percentile ranks work?
- e.g. PR = 10%
a score with a PR = 10% indicates that:
- its value is greater than 10% of all scores
- its value is less than 90% of all scores
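A rough sketch of a percentile rank calculation, assuming PR is defined as the percentage of scores strictly below the given score (some textbooks also count half of any tied scores, which this sketch does not do). The function name and data are illustrative.

```python
def percentile_rank(scores, x):
    """Percentage of scores in the distribution that fall below x."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)

scores = list(range(1, 21))       # hypothetical distribution: 1, 2, ..., 20
pr = percentile_rank(scores, 3)   # 2 of the 20 scores are below 3
print(pr)                         # 10.0
```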
Central Tendency
index of central location employed in the description of a frequency distribution
Mean
average taken by summing scores & dividing sum by # of scores
- point in a distribution about which summed deviations are equal to zero → Σ(x - x-bar) = 0
Mean **formula **
sample: x-bar = Σx/n
population: µ = Σx/N
Deviation score
score minus mean
x - (x-bar)
→ summed deviations from the mean = 0
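A quick check, on made-up scores, that deviation scores computed as x - (x-bar) sum to zero:

```python
scores = [2, 4, 4, 5, 10]                 # hypothetical sample
mean = sum(scores) / len(scores)          # x-bar = Σx/n = 25/5 = 5.0
deviations = [x - mean for x in scores]   # x - (x-bar) for each score

print(deviations)       # [-3.0, -1.0, -1.0, 0.0, 5.0]
print(sum(deviations))  # 0.0
```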
Sum of Square Deviation Scores
- aka?
- size?
- positive/negative?
aka SUM of SQUARES (SS)
- never negative
- SS from the mean are LESS than SS from any other number
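The two SS properties can be illustrated with a small sketch. The scores and the alternative reference points (4 and 6) are arbitrary choices for the example.

```python
def sum_of_squares(scores, center):
    """Sum of squared deviations of the scores from `center`."""
    return sum((x - center) ** 2 for x in scores)

scores = [2, 4, 4, 5, 10]               # hypothetical sample
mean = sum(scores) / len(scores)        # 5.0

ss_mean = sum_of_squares(scores, mean)  # 9 + 1 + 1 + 0 + 25 = 36.0
ss_four = sum_of_squares(scores, 4)     # 4 + 0 + 0 + 1 + 36 = 41
ss_six = sum_of_squares(scores, 6)      # 16 + 4 + 4 + 1 + 16 = 41

print(ss_mean, ss_four, ss_six)         # 36.0 41 41
```

Every term is a square, so SS cannot be negative, and moving the reference point away from the mean in either direction makes SS larger.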
If data is from population rather than sample?
use N instead of n → # of scores
μ instead of x-bar → mean
Median
score that divides distribution so that same # of scores lie on each side
Median is a special case of?
percentile rank (50th percentile)
Mode
score that occurs with greatest frequency
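For a concrete comparison, the three measures of central tendency can be computed with Python's standard `statistics` module. The scores are made up.

```python
import statistics

scores = [2, 4, 4, 5, 10]            # hypothetical sample with one high score

mean = statistics.mean(scores)       # Σx/n = 25/5
median = statistics.median(scores)   # middle score once sorted
mode = statistics.mode(scores)       # most frequent score

print(mean, median, mode)            # 5 4 4
```

The single high score (10) pulls the mean above the median and mode.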
____ is associated with ___ data
a) Mode
b) Median
c) Mean
a) nominal data
b) ordinal data
c) interval/ratio data
If data distribution is normal…
mean, median & mode are…
same value
If distribution of scores is NOT normal…
mean, median & mode….
fall in alphabetical order, starting from the tail
1) mean
2) median
3) mode
Non-normal distributions may be..
skewed positively or negatively
Which distributions have kurtosis?
distributions that are too light or heavy in the tails have kurtosis.
Do
- a) distributions with kurtosis
- b) skewed distribution
affect central tendency?
a) NO
b) YES
A score at the median is at the ____ percentile
50th
(true by definition for any distribution; in a normal distribution the mean and mode also fall there)
Variability
the dispersion of scores in a distribution
range
crude measure easily influenced by outliers
semi-interquartile range
less influenced by outliers but still crude
(75th - 25th) / 2
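A sketch comparing the range and the semi-interquartile range on made-up data with and without an outlier. It assumes Python 3.8+ for `statistics.quantiles`, whose default ("exclusive") method is only one of several conventions for locating the 25th and 75th percentiles, so hand calculations from a textbook may differ slightly.

```python
import statistics

def semi_interquartile_range(scores):
    """(75th percentile - 25th percentile) / 2."""
    q1, _, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
    return (q3 - q1) / 2

clean = [1, 2, 3, 4, 5, 6, 7]        # hypothetical scores
outlier = [1, 2, 3, 4, 5, 6, 70]     # same scores, top one replaced by an outlier

range_clean = max(clean) - min(clean)            # 6
range_outlier = max(outlier) - min(outlier)      # 69
siqr_clean = semi_interquartile_range(clean)     # (6 - 2) / 2 = 2.0
siqr_outlier = semi_interquartile_range(outlier) # still 2.0

print(range_clean, range_outlier, siqr_clean, siqr_outlier)
```

The outlier changes the range from 6 to 69 but leaves the semi-interquartile range unchanged.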
Standard Deviation & Variance (3)
- reflect…
- basis for…
- exploits…
- reflect dispersion of scores
- basis for all inferential statistics
- exploits mean as best measure of central tendency
Variance
quantitative measure of difference between scores in a distribution that describes degree to which scores are spread out/clustered together
Variance formula
S² = SS/(n-1)
= Σ(x-x̄)²/(n-1)
Standard Deviation
square root of variance
provides measure of standard/average distance from mean
Standard Deviation formula
S = √S²
= √[Σ(x-x̄)²/(n-1)]
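A worked sketch of the sample variance and standard deviation formulas on made-up scores, cross-checked against the standard `statistics` module (which also divides by n-1).

```python
import math
import statistics

scores = [2, 4, 4, 5, 10]                    # hypothetical sample
n = len(scores)
mean = sum(scores) / n                       # x-bar = 5.0

ss = sum((x - mean) ** 2 for x in scores)    # SS = Σ(x - x-bar)² = 36.0
variance = ss / (n - 1)                      # S² = SS/(n-1) = 9.0
sd = math.sqrt(variance)                     # S = √S² = 3.0

print(variance, sd)                          # 9.0 3.0
print(statistics.variance(scores), statistics.stdev(scores))
```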
Deviation **Method **formula
if you have **many **scores
SS = Σx² - (Σx)²/n
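As a check, this shortcut formula gives the same SS as summing squared deviations from the mean. The scores below are made up for the comparison.

```python
scores = [2, 4, 4, 5, 10]                    # hypothetical sample
n = len(scores)

sum_x = sum(scores)                          # Σx = 25
sum_x_sq = sum(x ** 2 for x in scores)       # Σx² = 161

ss_shortcut = sum_x_sq - sum_x ** 2 / n      # 161 - 625/5 = 36.0

mean = sum_x / n
ss_definition = sum((x - mean) ** 2 for x in scores)  # Σ(x - x-bar)² = 36.0

print(ss_shortcut, ss_definition)            # 36.0 36.0
```

The shortcut avoids computing a deviation for every score, which is why it helps when there are many scores.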
Z score
statistical measurement of a score’s relationship to the mean in a group of scores
Z score formula
Z = (x - µ)/σ
How are **Z scores **useful when given scores from different normal distributions?
can find the **z score **of the scores in order to facilitate comparison
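A small sketch of that comparison, using two hypothetical tests with made-up population means and standard deviations:

```python
def z_score(x, mu, sigma):
    """Z = (x - µ)/σ: distance of x from the mean in standard deviation units."""
    return (x - mu) / sigma

# Hypothetical: 85 on test A (µ = 75, σ = 10) vs. 60 on test B (µ = 50, σ = 5)
z_a = z_score(85, mu=75, sigma=10)   # 1.0
z_b = z_score(60, mu=50, sigma=5)    # 2.0

print(z_a, z_b)                      # 1.0 2.0
```

Both raw scores are 10 points above their means, but the score on test B is further above its mean in standard deviation units.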
Z formula is the ___ for many ___ ___
foundation for many inferential statistics
Z formula **numerator (x-µ) **reflects?
how score **deviates **from pop’n parameter (µ)
Z formula **denominator (σ) **reflects?
**variability **of scores in pop’n
the z formula **ratio **represents?
score **(z) **that can be compared to theoretical distribution (normal distribution)
When using **z scores **for a sample rather than individual score…
µ(x-bar) = µ of pop’n
σ(x-bar) ≠ σ
σ(x-bar) = σ/√n
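A sketch of the z score for a sample mean, where σ is replaced by the standard error σ/√n. The population values and sample mean are hypothetical.

```python
import math

mu = 100        # hypothetical pop'n mean
sigma = 15      # hypothetical pop'n standard deviation
n = 36          # sample size
x_bar = 106     # hypothetical sample mean

standard_error = sigma / math.sqrt(n)    # σ/√n = 15/6 = 2.5
z = (x_bar - mu) / standard_error        # (106 - 100)/2.5 = 2.4

print(standard_error, z)                 # 2.5 2.4
```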
When variables are not normally distributed..
use the Central Limit Theorem to address these issues
- Z x-bar test (z test for a sample mean)
Central Limit Theorem
the **means **of a large # of independent random samples will be approximately normally distributed regardless of underlying distribution, provided each sample is large enough
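A simulation sketch of this idea, assuming a strongly skewed population (exponential, with µ = 1 and σ = 1). The sample size, number of samples, and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

random.seed(0)        # fixed seed so the sketch is repeatable
n = 30                # size of each sample
num_samples = 2000    # number of independent samples drawn

# Draw many samples from a skewed population and keep each sample's mean
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to µ = 1
sd_of_means = statistics.stdev(sample_means)     # should be close to σ/√n
expected_sd = 1 / math.sqrt(n)                   # about 0.18

print(round(mean_of_means, 2), round(sd_of_means, 2), round(expected_sd, 2))
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual scores are heavily skewed.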
sampling distribution
theoretical distribution of possible values of some sample statistic that would occur if all possible samples of fixed size were drawn from a given population
If the sampling distribution takes the form of a normal distribution…
we can use the known properties of the normal distribution to make inferences
Null hypothesis
a general statement/default position that there is no relationship between two measured phenomena
Alternative hypothesis
the hypothesis used in hypothesis testing that is contrary to the null hypothesis.
- usually taken to be that observations are result of a real effect
In psychology, outcome is **unremarkable **if probability of outcome **by chance alone **is?
**greater than **5 in 100
(> 5 in 100)
Remarkable outcome if probability of occurrence by chance alone is …?
equal to or less than 5 in 100
(≤ 5 in 100)
Type **1 **error
α
when you **reject **null hypothesis that is true
Type **2 **error
β
**failing to reject **null hypothesis that is false
Statistical power
capacity to find something if it's there (probability of rejecting a false null hypothesis = 1 - β)
Logic of Testing
(6) questions to ask
1) what is the **appropriate statistic, **its distribution & its assumptions
2) null & alternative hypotheses
3) probability of making type 1 error
4) obtained value for test
5) critical value for test
6) decision regarding obtained value relative to critical value
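The six questions can be walked through with a sketch of a two-tailed z test for a sample mean. All numbers are hypothetical, and the choice of a two-tailed test is an assumption. `statistics.NormalDist` requires Python 3.8+.

```python
from math import sqrt
from statistics import NormalDist

# 1) statistic: z for a sample mean (assumes σ is known and the
#    sampling distribution of the mean is normal)
# 2) null: µ = 100; alternative: µ ≠ 100 (two-tailed)
mu_null = 100
sigma = 15          # hypothetical pop'n standard deviation
n = 36              # sample size
x_bar = 106         # hypothetical sample mean

# 3) probability of making a type 1 error
alpha = 0.05

# 4) obtained value for the test
z_obtained = (x_bar - mu_null) / (sigma / sqrt(n))    # 6/2.5 = 2.4

# 5) critical value for the test (two-tailed, so alpha is split in half)
z_critical = NormalDist().inv_cdf(1 - alpha / 2)      # about 1.96

# 6) decision: compare obtained value to critical value
reject_null = abs(z_obtained) > z_critical

print(round(z_obtained, 2), round(z_critical, 2), reject_null)  # 2.4 1.96 True
```

Here the obtained z exceeds the critical value, so the outcome would count as remarkable (probability by chance alone ≤ 5 in 100) and the null hypothesis would be rejected.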