statistical analysis Flashcards
basic results section format of report
- restate hypothesis
- assumption check
- descriptives analysis (including visualisations and tables)
- inferential analysis
- accept/reject/retain hypothesis with brief interpretation
Descriptives
= used to compare conditions in relation to a hypothesis
- data is described in terms of point estimates (central tendency), spread, shape and outliers.
central tendency types
mean = most sensitive measure as its value is directly affected by every value in the dataset; most common.
median = used when the data contains extreme values. position = (N+1)/2 -> the value at this position in the sorted data (if N is even, average the two middle values)
mode = most frequent value, used for categorical data.
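A minimal sketch of the three measures using Python's standard `statistics` module. The `scores` list is made-up data with one extreme value, chosen only to show how the mean is pulled around while the median is not.

```python
import statistics

# Hypothetical data: one extreme value (40) among small scores.
scores = [2, 3, 3, 4, 5, 6, 40]

mean_score = statistics.mean(scores)      # affected by every value, including 40
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequent value

# Median by hand: position = (N + 1) / 2 in the sorted data (N is odd here).
position = (len(scores) + 1) // 2
median_by_position = sorted(scores)[position - 1]  # lists are 0-indexed

print(mean_score, median_score, mode_score)
```

Here the mean (9) sits above six of the seven scores, while the median (4) stays with the bulk of the data.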
shape of distribution
- if mean, median and mode are all equal -> symmetrical distribution (e.g. the normal distribution)
- negative skew = tail runs towards the lower values (peaks to the right)
- positive skew = tail runs towards the larger values (peaks to the left)
measures of spread
= the degree of dispersion or variability of values in a dataset
- range (subtract lowest from highest value)
- interquartile range (distance between the 25th and 75th percentiles)
- variance (average squared deviation from the mean; for a sample, the sum of squares is divided by n-1)
- standard deviation (square root of variance)
- all these refer to the sample
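The four measures can be sketched with the standard library as below. The `values` list is invented for illustration. Note that `statistics.variance` and `statistics.stdev` are the sample versions (dividing by n-1), and that quartile values depend on the interpolation method, so other software may report a slightly different IQR.

```python
import statistics

# Hypothetical sample.
values = [4, 8, 6, 5, 3, 7, 9, 6]

value_range = max(values) - min(values)          # highest minus lowest

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                                    # 75th minus 25th percentile

sample_variance = statistics.variance(values)    # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(values)             # square root of the sample variance

print(value_range, iqr, sample_variance, sample_sd)
```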
calculating outliers using IQR method
lower quartile - (IQR x 1.5) = lowest value that is not an outlier. Hence, anything below this will be an outlier.
- if lower quartile = 90 and IQR = 10 then 90 - (10 x 1.5) = 75, so any value less than 75 is an outlier; repeat for the upper quartile.
- upper quartile = 90 + 10 = 100, and 100 + (10 x 1.5) = 115, so any value greater than 115 is also an outlier.
IQR = length of the box in a boxplot
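The worked example on this card (lower quartile 90, IQR 10) can be checked with a short sketch. The function names `iqr_fences` and `is_outlier` are made up for illustration.

```python
def iqr_fences(lower_quartile, upper_quartile, multiplier=1.5):
    """Return the (lower, upper) cut-offs beyond which values count as outliers."""
    iqr = upper_quartile - lower_quartile
    lower_fence = lower_quartile - multiplier * iqr
    upper_fence = upper_quartile + multiplier * iqr
    return lower_fence, upper_fence


def is_outlier(value, lower_fence, upper_fence):
    """A value is an outlier if it falls outside the two fences."""
    return value < lower_fence or value > upper_fence


# Card example: lower quartile = 90, upper quartile = 100, so IQR = 10.
lower_fence, upper_fence = iqr_fences(90, 100)
print(lower_fence, upper_fence)  # 75.0 115.0
```

A value of 74 or 116 would be flagged, while 75 and 115 themselves are not.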
standard error of the mean
= a measure of how far the sample mean is likely to deviate from the population mean.
- is calculated as the SD of the sample / the square root of the number of values (SE = SD / √n)
- is represented by SE.
- as the sample size increases, SE will decrease as the uncertainty about the population mean decreases.
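A small sketch of the SE calculation, using invented data. The helper name `standard_error` is an assumption for illustration. Repeating the same values four times is an artificial way to get a larger sample with similar spread, just to show SE shrinking as n grows.

```python
import math
import statistics


def standard_error(sample):
    """SE of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


small_sample = [4, 8, 6, 5, 3, 7, 9, 6]   # hypothetical, n = 8
large_sample = small_sample * 4           # same values repeated, n = 32

se_small = standard_error(small_sample)
se_large = standard_error(large_sample)

print(round(se_small, 3), round(se_large, 3))  # the larger sample has the smaller SE
```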
probability
= the likelihood of the occurrence of an event or outcome
p = number of ways the event could arise/ number of possible outcomes
- expressed as a decimal between 0 and 1
joint probability
= the probability of two independent (unrelated) events occurring together
- calculated by multiplying together the probability of each individual event
replacement
= resetting the number of outcomes to the original value after an event occurs.
e.g. take one card out of a deck of 52 and the deck becomes 51
No replacement means the next draw is made from the remaining 51
Replacement means you put the card back so the deck is 52 again
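To make the card example concrete, here is a sketch of the joint probability of drawing two aces in a row, with and without replacement. Drawing aces is an assumed scenario, not from the card. Without replacement the second draw depends on the first, so the second factor is the probability of an ace given that one ace has already been removed.

```python
from fractions import Fraction


def p_two_aces(replacement):
    """Joint probability of drawing an ace, then another ace, from a 52-card deck."""
    first = Fraction(4, 52)            # 4 aces among 52 cards
    if replacement:
        second = Fraction(4, 52)       # card put back: deck is 52 again
    else:
        second = Fraction(3, 51)       # card kept out: 3 aces left among 51 cards
    return first * second              # multiply the individual probabilities


with_replacement = p_two_aces(replacement=True)
without_replacement = p_two_aces(replacement=False)

print(with_replacement, float(with_replacement))        # 1/169
print(without_replacement, float(without_replacement))  # 1/221
```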
null hypothesis significance testing (NHST)
= assuming that the null hypothesis is true, what is the probability of obtaining the value that we did, or a more extreme one? We always test the null hypothesis, even if our own hypothesis is not the null.
- we construct two hypotheses
1.) null hypothesis = no difference/relationship
2.) alternative hypothesis = a difference/relationship
level of significance = alpha (α)
= the pre-determined level of significance at which we reject the null hypothesis, usually .05; it is the cut-off line and the false positive error rate
rejection region = portion of a sampling distribution which includes samples with probabilities less than alpha (α).
z-score
= the number of standard deviations any particular score is away from the mean.
- relies on the assumption that the data is not heavily skewed.
- to calculate: subtract the mean from your value, and divide by SD of the dataset.
+ -> value is above the mean
- -> value is below the mean
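The calculation can be sketched as a one-line function. The mean of 100 and SD of 15 are assumed numbers for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies from the mean."""
    return (value - mean) / sd


# Hypothetical dataset with mean 100 and SD 15.
above = z_score(115, mean=100, sd=15)   # positive: value is above the mean
below = z_score(85, mean=100, sd=15)    # negative: value is below the mean

print(above, below)  # 1.0 -1.0
```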
confidence intervals - population mean
= range of values that is expected to capture the true value of a parameter (population mean) with a specified degree of confidence.
- they are an estimate
- 95% is most common
upper 95% CI value = mean + (1.96 x standard error)
lower 95% CI value = mean - (1.96 x standard error)
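A sketch of the two formulas above applied to an invented sample. The helper name `confidence_interval_95` is hypothetical. It uses the 1.96 multiplier from the card, which assumes the sampling distribution of the mean is approximately normal. With very small samples a t-based multiplier is often used instead.

```python
import math
import statistics


def confidence_interval_95(sample):
    """Return (lower, upper) 95% CI for the population mean using mean ± 1.96 x SE."""
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = 1.96 * standard_error
    return mean - margin, mean + margin


sample = [4, 8, 6, 5, 3, 7, 9, 6]   # hypothetical data, mean = 6
lower, upper = confidence_interval_95(sample)

print(round(lower, 2), round(upper, 2))
```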
one sample chi-square test
= tests whether the observed categorical frequency distribution differs from what would be expected by chance.
- assumes that each participant contributes one observation and there are two or more categorical outcomes.
- the hypothesis has to be testable, i.e. it predicts a difference that can be tested for significance.
- observed frequencies = the actual counts per category
- expected frequencies = calculated for each category as (1/k) x n (when all categories are expected to be equally likely)
where k is the number of categories and n is the total number of observations.
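As a quick sketch of the expected-frequency rule, assuming every category is equally likely under the null hypothesis. The counts are invented and the function name `expected_frequencies` is hypothetical.

```python
def expected_frequencies(observed):
    """Expected count per category under equal probabilities: (1 / k) x n."""
    k = len(observed)   # number of categories
    n = sum(observed)   # total number of observations
    return [n / k] * k


# Hypothetical observed counts for three categories (60 participants in total).
observed = [30, 20, 10]
expected = expected_frequencies(observed)

print(expected)  # [20.0, 20.0, 20.0]
```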
degrees of freedom
= the number of observations that are free to vary to produce a given outcome (known test statistic)
df = k - 1
k = number of conditions (categories)
chi squared value and the standard write up
= if our chi squared value is greater than the critical value, then we reject the null hypothesis and accept the alternative one.
- we found… χ²(df, N) = chi-square value, p-value, effect size
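The decision rule can be sketched end to end as below, assuming equal expected frequencies across categories. The observed counts are invented. The statistic is the usual sum of (observed - expected)² / expected, and the critical values are the standard chi-square table entries for α = .05. The names `CRITICAL_VALUES_05` and `chi_square_one_sample` are made up for this example.

```python
# Standard chi-square critical values at alpha = .05, keyed by degrees of freedom.
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_one_sample(observed):
    """Return (chi-square statistic, df, N) assuming equal expected frequencies."""
    k = len(observed)                 # number of categories
    n = sum(observed)                 # total number of observations
    expected = n / k                  # same expected count for every category
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, k - 1, n


observed = [30, 20, 10]               # hypothetical counts
statistic, df, n = chi_square_one_sample(observed)
reject_null = statistic > CRITICAL_VALUES_05[df]

print(f"chi-square({df}, N = {n}) = {statistic:.2f}, reject null: {reject_null}")
```

For these counts the statistic is 10.00 with df = 2, which exceeds the critical value of 5.991, so the null hypothesis would be rejected at α = .05. An exact p-value and an effect size would still need to be computed separately for the full write-up.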