STATISTICS REVISION Flashcards
rows, columns, cells
rows = observations
columns = variables
cells = value of a variable for a specific observation
CATEGORIES of responses - nominal, ordinal, quantitative
nominal - response categories cannot be placed in a specific order - e.g. ethnicity
ordinal - response categories can BE placed in RANK order - e.g. preference, levels
quantitative - interval and ratio = responses measured on a continuous scale with rank order, assuming uniform distance between responses - e.g. income, age, temperature
measures of central tendency
MEAN MEDIAN MODE
bimodal = two modes (multimodal = more than one mode)
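A minimal Python sketch of the three measures, using only the standard library. The ages are made up for illustration; two values tie for most frequent, so this sample has two modes.

```python
from statistics import mean, median, multimode

ages = [21, 22, 22, 25, 30, 30, 41]  # made-up sample, not real data

avg = mean(ages)          # sum of the values divided by n
mid = median(ages)        # middle value once the data are sorted
modes = multimode(ages)   # every value tied for most frequent

print(avg, mid, modes)
```

The single extreme age (41) pulls the mean above the median, which is why the median is often preferred for skewed data.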
measures of dispersion
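range = highest value minus lowest value - sensitive to outliers, so often not an accurate representation of the data
interquartile range = quartiles split the data into quarters; IQR = distance between the 1st and 3rd quartiles - shown in a box plot
variance
standard deviation = square root of variance = typical distance of an observation from the mean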
VARIANCE
deviations from mean - square each difference from the mean, then sum the squared differences and divide by n-1
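The steps on this card can be followed by hand in Python. This is a sketch with made-up scores; the variable names are illustrative only.

```python
from math import sqrt, isclose
from statistics import variance, stdev

scores = [4, 8, 6, 5, 7]  # made-up sample, not real data
n = len(scores)
ybar = sum(scores) / n    # sample mean

# Square each deviation from the mean, sum, then divide by n - 1.
squared_devs = [(y - ybar) ** 2 for y in scores]
s2 = sum(squared_devs) / (n - 1)   # sample variance
s = sqrt(s2)                       # standard deviation

# The library functions use the same n - 1 definition.
print(s2, variance(scores))
print(s, stdev(scores))
```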
proportions
number of observations in a category divided by the total number of observations
- code variables as numbers: 0 = no, 1 = yes
add up the values and divide by n (n = number of respondents)
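A short sketch of why the 0/1 coding works: the mean of the coded values is the proportion of "yes" answers. The responses are made up.

```python
responses = ["yes", "no", "yes", "yes", "no"]  # made-up answers

# Code each response as a number: 1 = yes, 0 = no.
coded = [1 if r == "yes" else 0 for r in responses]

n = len(coded)                     # number of respondents
proportion_yes = sum(coded) / n    # add up the values, divide by n

print(proportion_yes)
```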
histograms
frequency distributions for quantitative variables
value of variable = X AXIS
how often = Y AXIS
continuous variables - as the sample size grows, the histogram approaches a smooth curve
probability distributions
lists possible outcomes of an event and their probabilities - assigns a probability to each possible value of a random variable
probabilities SUM to 1
EMPIRICAL RULE -frequency distributions
higher standard deviation = greater variability
(typical distance from mean)
about 68% of observations fall between y-s and y+s
about 95% fall between y-2s and y+2s
all or nearly all fall between y-3s and y+3s
applies to BELL SHAPED distributions (y = mean, s = standard deviation)
Empirical rule - PROBABILITY DISTRIBUTIONS
normal distribution - SYMMETRY ABOUT MEAN
BELL SHAPED CURVE
68% WITHIN 1 STANDARD DEVIATION of the mean
95% of values within 2
99.7% within 3
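The percentages on this card can be checked against the standard normal distribution. This sketch uses Python's `statistics.NormalDist`; the helper name `share_within` is made up here. The first two shares come out close to 68% and 95%, and the third is closer to 99.7%.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def share_within(k):
    """Probability of falling within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```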
sampling distribution
distribution of the sample means from all possible samples of the same size
- by using this info - can predict how close a sample mean falls to the population mean
central limit theorem
as the sample size increases, the sampling distribution of the sample mean approximates the normal distribution
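A rough simulation can make the theorem concrete. This sketch assumes a skewed (exponential) population generated at random, and compares the sampling distribution of the mean for samples of size 5 and size 100. The population, the sizes, and the helper `sample_means` are all illustrative choices.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# A clearly non-normal (right-skewed) population, simulated for illustration.
population = [random.expovariate(1.0) for _ in range(50_000)]

def sample_means(n, draws=2_000):
    """Means of `draws` random samples, each of size n."""
    return [mean(random.sample(population, n)) for _ in range(draws)]

small = sample_means(5)
large = sample_means(100)

# Larger samples give means that cluster more tightly around the population mean.
print(stdev(small), stdev(large))

# For a roughly normal sampling distribution, about 95% of the sample means
# should lie within 2 standard deviations of their centre.
centre, spread = mean(large), stdev(large)
share = sum(abs(m - centre) <= 2 * spread for m in large) / len(large)
print(share)
```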
confidence intervals
a range of values within which a **parameter** is believed to fall, with a given level of confidence
from (point estimate - margin of error) to (point estimate + margin of error)
how to interpret confidence interval for a mean
“95% confident that the interval … contains the population mean age”, NOT “the population age is between …”
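A sketch of a 95% confidence interval for a mean, built as point estimate plus or minus margin of error. The summary numbers are made up, and the sketch assumes a sample large enough to use a z-score from the normal distribution; with a small sample a t-score would normally replace it.

```python
from math import sqrt
from statistics import NormalDist

# Made-up summary statistics for a sample of ages (assumptions, not real data).
n = 100        # sample size
ybar = 40.0    # sample mean = point estimate
s = 12.0       # sample standard deviation

se = s / sqrt(n)                        # standard error of the mean
z_score = NormalDist().inv_cdf(0.975)   # about 1.96 for 95% confidence
margin = z_score * se                   # margin of error

lower, upper = ybar - margin, ybar + margin
print(round(lower, 2), round(upper, 2))
```

The interval is a statement about the population mean age, not about the ages of individual people.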
STATISTICAL SIGNIFICANCE test
uses data to summarise the evidence about a hypothesis by COMPARING point estimates of the parameters with the values predicted by the hypothesis
5 parts of the significance test
ASSUMPTIONS, HYPOTHESES, TEST STATISTIC, P VALUE, ALPHA (α) LEVEL = SIGNIFICANCE LEVEL
ASSUMPTIONS -on what?
-type of data
-randomization
-population distribution
-sample size
null hypothesis?
a statement that the parameter takes a particular value
Alternative hypothesis
the parameter falls in some alternative range of values - this is the research hypothesis
proof by contradiction?
significance test analyses sample evidence about the NULL hypothesis by investigating whether the data contradict it - if the data would be unusual were the NULL true, REJECT the null
TEST statistic
summarises how far the estimate falls from the parameter value in the NULL hypothesis = number of standard errors between the estimate and the null hypothesis value
P VALUE
probability, assuming the NULL hypothesis is true, that the TEST STATISTIC equals the observed value or a value even more extreme
- SMALL p value = stronger evidence against NULL = supporting alternative
α-level (significance level)
reject null if the p value falls below a pre-specified cut-off point = boundary value
SMALLER α-level = stronger the evidence must be to reject null
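The parts of a significance test can be walked through in a few lines. This sketch is a two-sided, large-sample z test for a mean with made-up summary numbers; the null value, sample figures, and α = 0.05 are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Hypotheses (illustrative): H0: population mean = 38, Ha: population mean != 38.
null_value = 38.0

# Made-up sample summary (assumed large random sample).
n, ybar, s = 100, 40.0, 12.0

# Test statistic: number of standard errors between estimate and null value.
se = s / sqrt(n)
test_stat = (ybar - null_value) / se

# P value: probability, if H0 is true, of a test statistic at least this extreme.
p_value = 2 * (1 - NormalDist().cdf(abs(test_stat)))

# Decision: compare the P value with the pre-specified alpha level.
alpha = 0.05
reject_null = p_value < alpha

print(round(test_stat, 2), round(p_value, 3), reject_null)
```

Here the P value is larger than 0.05, so the null hypothesis is not rejected at that level; a looser cut-off such as 0.10 would lead to rejection.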