PS1018 - statistics Flashcards
why do psychologists need statistics
to summarise/describe data
to generalise from samples to populations
variables
anything that can have different values
discrete variables
separate, countable values (no values in between)
continuous variables
can take any value within a range, so there are infinitely many (uncountable) possible values
categorical data
nominal and ordinal
numerical data
interval and ratio
nominal
- no rank/order
- mode is the most common measure of central tendency (CT)
- e.g. gender, eye colour
ordinal
- categories, but has an order/rank
- median most common measure of CT
- e.g. level of agreement
interval
- measured in numbers
- ordered, with equal intervals between points on the scale
- no true zero point (0 °C does not mean "no temperature")
- mean most common measure of CT
- e.g. temperature in °C
ratio
- ordered/ranked
- equal intervals between points on the scale
- true (absolute) zero point: zero means none of the quantity
- mean most common measure of CT
frequency histograms
graphical representation of distribution of a data set
unimodal distributions
only one most common score
bimodal distributions
two peaks (two modes), which need not be exactly equal in height
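A small sketch of how the bars of a frequency histogram and the mode(s) could be computed, using the example data from the sample variance cards. It relies only on Python's standard library (`collections.Counter`, and `statistics.multimode`, which needs Python 3.8+). Note that `multimode` only reports scores that are exactly tied for most common, whereas "bimodal" in a histogram usually just means two clear peaks.

```python
from collections import Counter
import statistics

def frequency_table(scores):
    """Return (score, count) pairs sorted by score: the bars of a frequency histogram."""
    return sorted(Counter(scores).items())

scores = [2, 6, 8, 3, 5, 7, 2, 1, 2]  # example data from the sample variance cards

# Text histogram: one '#' per occurrence of each score
for score, count in frequency_table(scores):
    print(f"{score}: {'#' * count}")

# multimode returns every score tied for the highest frequency:
# one value suggests unimodal, two values suggests bimodal
modes = statistics.multimode(scores)
print("modes:", modes)
```

Here the score 2 occurs three times and every other score once, so the data are unimodal with mode 2.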
positively skewed
tail goes towards the positive end (higher scores); most scores bunch at the low end
central tendency
the centre of a distribution, i.e. the typical score where most scores cluster (mean, median, mode)
variability
degree of ‘spread’ about an average
interquartile range
- find the median
- find first quartile (middle score of lower half of scores)
- find third quartile (middle score of upper half of scores)
- difference between 1st and 3rd quartiles
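The steps above can be sketched as follows. This assumes the "median of each half" method and, when n is odd, leaves the median itself out of both halves; that is one common textbook convention, and statistics software often uses other quartile definitions that give slightly different values.

```python
import statistics

def iqr(scores):
    """Interquartile range via the 'median of each half' method (needs at least 2 scores).

    Assumption: for odd n the overall median is excluded from both halves.
    Returns (Q1, Q3, IQR).
    """
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                # scores below the median
    upper = ordered[half + (n % 2):]      # scores above the median (skip it if n is odd)
    q1 = statistics.median(lower)         # 1st quartile: middle of the lower half
    q3 = statistics.median(upper)         # 3rd quartile: middle of the upper half
    return q1, q3, q3 - q1

print(iqr([2, 6, 8, 3, 5, 7, 2, 1, 2]))   # sorted: 1 2 2 2 | 3 | 5 6 7 8
```

For the example data the lower half is 1, 2, 2, 2 (Q1 = 2) and the upper half is 5, 6, 7, 8 (Q3 = 6.5), giving IQR = 4.5.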
s²
sample variance
sample variance - xᵢ
an individual score (the i-th value in the data set)
sample variance - x̄ (x-bar, a line on top of x)
sample mean
sample variance - n
sample size
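Putting these symbols together gives the standard sample variance formula, which the calculation steps that follow carry out term by term:

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
```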
how to calculate sample variance
- calculate the mean
- subtract mean from each data value
- square the results
- add results together
- divide this result by n-1
example data: 2, 6, 8, 3, 5, 7, 2, 1, 2
s² = 52 / 8 = 6.5 (mean = 4, sum of squared deviations = 52, n - 1 = 8)
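A minimal sketch of the five calculation steps above, checked against Python's built-in `statistics.variance`, which also divides by n - 1.

```python
import statistics

def sample_variance(scores):
    """Sample variance s², following the five calculation steps."""
    n = len(scores)
    mean = sum(scores) / n                      # step 1: calculate the mean
    deviations = [x - mean for x in scores]     # step 2: subtract the mean from each value
    squared = [d ** 2 for d in deviations]      # step 3: square the results
    total = sum(squared)                        # step 4: add them together
    return total / (n - 1)                      # step 5: divide by n - 1

data = [2, 6, 8, 3, 5, 7, 2, 1, 2]
print(sample_variance(data))        # 6.5  (mean = 4, sum of squares = 52, 52 / 8)
print(statistics.variance(data))    # 6.5  (standard library, same n - 1 formula)
```

Dividing by n - 1 rather than n makes s² an unbiased estimate of the population variance; `statistics.pvariance` is the divide-by-n version.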
sample standard deviation
square root of sample variance
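For the example data, the standard deviation is simply the square root of s² = 6.5. This sketch also shows that `statistics.stdev` gives the same value in one call.

```python
import math
import statistics

data = [2, 6, 8, 3, 5, 7, 2, 1, 2]
s2 = statistics.variance(data)      # sample variance, 6.5
s = math.sqrt(s2)                   # sample standard deviation
print(round(s, 2))                  # 2.55
print(math.isclose(s, statistics.stdev(data)))   # True: stdev computes it directly
```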
problems with summary statistics
hide information about the full distribution
don’t represent the whole data set
‘ignore’ information about individual differences
what is sample variance
measures how much the data points in a sample deviate from the sample mean
Standard Error of Mean (SEM)
SEM = SD / √n (standard deviation divided by the square root of the sample size)
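A minimal sketch of the SEM formula applied to the example data from the variance cards (SD ≈ 2.55, n = 9, so √n = 3).

```python
import math
import statistics

def standard_error(scores):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(scores) / math.sqrt(len(scores))

data = [2, 6, 8, 3, 5, 7, 2, 1, 2]
print(round(standard_error(data), 2))   # 0.85  (2.55 / 3)
```

Because n sits under a square root, quadrupling the sample size halves the SEM for the same SD.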
the raincloud plot
the richest data representation: shows individual data points, the shape of the distribution and a summary statistic together
raincloud plot - the cloud
smoothed representation of distribution
raincloud plot - summary plot
the mean
raincloud plot - the ‘rain’
individual data
what does a boxplot show
median
1st and 3rd quartile
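whiskers ('error bars') - extend to the most extreme data points within 1.5 × IQR of the 1st and 3rd quartiles; points beyond this are plotted individually as outliers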
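A sketch of the numbers behind a boxplot, using the 1.5 × IQR rule. It assumes the same "median of each half" quartile method as the IQR card; plotting libraries may compute quartiles slightly differently. The data are the example scores with a hypothetical extra score of 20 added so that one outlier appears.

```python
import statistics

def boxplot_summary(scores):
    """Median, quartiles, whisker ends and outliers for a boxplot (1.5 x IQR rule).

    Assumption: quartiles use the 'median of each half' method,
    excluding the overall median when n is odd.
    """
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[half + (n % 2):])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    return {
        "median": statistics.median(ordered),
        "q1": q1,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),   # most extreme points within the fences
        "outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }

summary = boxplot_summary([2, 6, 8, 3, 5, 7, 2, 1, 2, 20])   # 20 is hypothetical
print(summary)
```

Here Q1 = 2 and Q3 = 7, so the fences are at -5.5 and 14.5; the whiskers run from 1 to 8 and the score of 20 is flagged as an outlier.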
why use boxplots
- show skew and outliers
- richer than simple summary stats
histograms
show full distribution of data
probability of an event
number of outcomes in which the event occurs divided by the total number of possible outcomes (assuming all outcomes are equally likely)
probability rules - addition rule (A or B)
p(A or B) = p(A) + p(B)
- only applies when A & B are mutually exclusive (if A occurs, B cannot)
probability rules - multiplication rule (A and B)
p(A and B) = p(A) x p(B)
- only applies when A & B are independent (A occurring doesn’t affect B occurring)
Example: Bag of 100 marbles, containing 10 red, 30 green, 60 blue
- probability of red
10/100 = 0.1 (10%)
Example: Bag of 100 marbles, containing 10 red, 30 green, 60 blue
- probability of green
30/100 = 0.3 (30%)
Example: Bag of 100 marbles, containing 10 red, 30 green, 60 blue
- probability of blue
60/100 = 0.6 (60%)
Example: Bag of 100 marbles, containing 10 red, 30 green, 60 blue
- probability of either a red or blue marble in 1 pick
p(r or b) = p(r) + p(b) = 0.1 + 0.6 = 0.7 (70%)
Example: Bag of 100 marbles, containing 10 red, 30 green, 60 blue
- probability of red marble on 1st pick, then blue marble on 2nd pick (return marble after 1st pick)
p(r and b) = p(r) x p(b) = 0.1 x 0.6 = 0.06 (6%)
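A short sketch of the marble examples using exact fractions. On a single pick the colours are mutually exclusive, so the addition rule applies; because the first marble is returned, the two picks are independent, so the multiplication rule applies.

```python
from fractions import Fraction

bag = {"red": 10, "green": 30, "blue": 60}
total = sum(bag.values())   # 100 marbles

def p(colour):
    """Probability of drawing this colour in one pick: favourable / total outcomes."""
    return Fraction(bag[colour], total)

# Addition rule: red and blue cannot both happen on one pick
p_red_or_blue = p("red") + p("blue")

# Multiplication rule: with replacement, pick 1 does not affect pick 2
p_red_then_blue = p("red") * p("blue")

print(float(p("red")), float(p("green")), float(p("blue")))   # 0.1 0.3 0.6
print(float(p_red_or_blue))      # 0.7
print(float(p_red_then_blue))    # 0.06
```

Using `Fraction` keeps the arithmetic exact, which makes it easy to check that 0.1 x 0.6 is 0.06 (6%).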
What is the sign test
non-parametric statistical test
what is the purpose of the sign test
to determine whether there is a significant difference between medians of two related groups
when should you use the sign test
- data is paired or matched
- data is ordinal or does not meet parametric assumptions
- testing for a median difference
sign test - step 1
calculate the difference between paired observations
sign test - step 2
assign a sign to each difference (+, -, 0)
sign test - step 3
ignore cases with no differences
sign test - step 4
count the number of positive and negative signs
sign test - step 5
use the binomial distribution to test the null hypothesis (under H0, a + and a - are equally likely: p = 0.5)
what is the test statistic in the sign test?
the smaller of the counts of positive or negative signs
how to determine if the results are significant in a sign test
- compare the test statistic to the critical value from the binomial distribution table
- if it is less than or equal to the critical value, reject the null hypothesis
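A sketch of the whole sign test procedure, using an exact two-tailed binomial p-value instead of a printed table. The before/after scores are hypothetical, made up for illustration, and `sign_test` is an illustrative helper rather than a library function.

```python
from math import comb

def sign_test(before, after):
    """Two-tailed exact sign test for paired scores (hypothetical helper).

    Returns (n_plus, n_minus, test_statistic, p_value). Under H0 a '+' and a '-'
    are equally likely (p = 0.5), so the counts follow a binomial distribution.
    """
    diffs = [a - b for b, a in zip(before, after)]   # step 1: paired differences
    n_plus = sum(d > 0 for d in diffs)               # steps 2 and 4: count + signs
    n_minus = sum(d < 0 for d in diffs)              # ... and - signs
    n = n_plus + n_minus                             # step 3: zero differences are dropped
    x = min(n_plus, n_minus)                         # test statistic: the smaller count
    # P(X <= x) for X ~ Binomial(n, 0.5), doubled for a two-tailed test (capped at 1)
    tail = sum(comb(n, k) for k in range(x + 1)) / 2 ** n
    return n_plus, n_minus, x, min(1.0, 2 * tail)

# Hypothetical scores for 10 participants in two conditions
before = [5, 3, 6, 4, 7, 5, 6, 4, 5, 6]
after = [7, 5, 6, 6, 8, 7, 7, 5, 6, 5]
print(sign_test(before, after))   # (8, 1, 1, 0.0390625)
```

One pair is tied and dropped, leaving n = 9 with 8 pluses and 1 minus, so the test statistic is 1 and p ≈ 0.039 < 0.05. This agrees with the table approach: for n = 9 the two-tailed 5% critical value is 1, and 1 ≤ 1.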
advantages of the sign test
- simple to use
- doesn’t require normally distributed data
- suitable for small sample sizes
limitations of the sign test
- only considers the direction of change, ignoring magnitude
- tied values (zero differences) are excluded, which reduces the effective sample size
Wilcoxon matched-pairs test
- within-subjects test of differences
- two-condition experiments with ordinal data
- non-parametric
What variables do difference tests include?
A discrete independent variable (the groups/conditions) and a continuous dependent variable
What variables do relationship tests include?
Two continuous variables
Non-parametric test
data doesn’t have to be normally distributed
Parametric tests
assume the data are (approximately) normally distributed
why use Wilcoxon instead of sign test when possible?
Sign test ‘throws away’ size of differences, but Wilcoxon is sensitive to this
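A sketch of how the Wilcoxon T statistic uses the size of the differences through their ranks, on the same hypothetical before/after scores used to illustrate the sign test. It only computes T (the smaller rank sum); significance would still be judged against a Wilcoxon critical-value table, which is not reproduced here. `wilcoxon_t` is an illustrative helper, not a library function.

```python
def wilcoxon_t(before, after):
    """Wilcoxon matched-pairs T: the smaller of the positive and negative rank sums.

    Zero differences are dropped; tied absolute differences share the average rank.
    Returns (sum of positive ranks, sum of negative ranks, T).
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ordered = sorted(diffs, key=abs)          # rank by size, ignoring sign
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        ranks[abs(ordered[i])] = (i + 1 + j) / 2   # average of ranks i+1 .. j
        i = j
    w_plus = sum(ranks[abs(d)] for d in diffs if d > 0)
    w_minus = sum(ranks[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)

# Hypothetical scores for 10 participants in two conditions
before = [5, 3, 6, 4, 7, 5, 6, 4, 5, 6]
after = [7, 5, 6, 6, 8, 7, 7, 5, 6, 5]
print(wilcoxon_t(before, after))   # (42.0, 3.0, 3.0)
```

The sign test only sees "8 pluses vs 1 minus"; here the single negative difference is also one of the smallest (rank 3 of 9), so it contributes just 3 to the rank sums, which is exactly the size information the sign test discards.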
what test to use with two categorical variables
chi-square
what test to use with one numerical variable
one-sample t-test
what test to use with one numerical and one categorical variable
t-test (two groups) or ANOVA (three or more groups)
what test to use with two numerical variables
Spearman’s rho or Pearson’s correlation coefficient (r)