statistics Flashcards
types of variables
- qualitative: dichotomous (binary), nominal or ordinal
- quantitative: discrete or continuous
dichotomous
qualitative variable
data where every observation is in one of two categories (yes/no)
nominal
qualitative variable
- 3 or more categories; no inherent ordering
- ex cow breeds
ordinal
qualitative variable
- 3 or more categories with an inherent order
- ex: gum colour (normal, pale, white)
discrete data (counts)
quantitative variable
- can only take whole-number values
- ex: number of animals, heart rate, bacterial count
continuous
quantitative variable
can take any value within a defined range
obtained by measurement
ex: body weight, blood pressure, age, hormone concentration
descriptive statistics
used to explore patterns in the data and to validate/check the data
which statistics are appropriate depends on the type of data
mean
arithmetic average: the sum of the values divided by the number of values
median
line the values up in order and choose the one in the middle; not affected by a few extremes
- a better indicator of the typical value than the mean when the data are skewed or contain outliers
mode
most commonly observed value
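A minimal sketch of the three averages, using Python's standard `statistics` module. The litter sizes are made-up illustrative values (an assumption, not real data), with one deliberate extreme to show that the median moves much less than the mean.

```python
import statistics

# Hypothetical litter sizes; 30 is a deliberate extreme value.
values = [2, 3, 3, 4, 5, 6, 30]

mean = statistics.mean(values)      # sum of the values / number of values
median = statistics.median(values)  # middle value once the data are sorted
mode = statistics.mode(values)      # most frequently observed value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

Here the single extreme value pulls the mean well above the median, which stays at the middle observation.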
normal distribution
symmetric, bell-shaped distribution in which the mode, median and mean are very similar
percentile
- a number that indicates the percentage of values less than or equal to that number
- 50th percentile is the median
- 25th percentile means that 25% of the data are less than or equal to that value
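A small sketch of the percentile idea. The helper `percent_at_or_below` is a hypothetical name that applies the definition directly, and the weights are invented. Note that software uses several slightly different conventions for computing quartiles; this example assumes the `"inclusive"` method of `statistics.quantiles`.

```python
import statistics

def percent_at_or_below(values, x):
    """Percentage of values that are less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)

# Hypothetical body weights in kg.
weights = [18, 20, 21, 23, 25, 26, 28, 30, 34]

# n=4 splits the data into quarters: 25th, 50th and 75th percentiles.
q1, q2, q3 = statistics.quantiles(weights, n=4, method="inclusive")

print("25th, 50th, 75th percentiles:", q1, q2, q3)
print("50th percentile equals median:", q2 == statistics.median(weights))
print("% of values at or below the 25th percentile:", percent_at_or_below(weights, q1))
```

With so few observations the percentages only approximate 25/50/75; the match gets closer as the sample grows.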
box and whisker plot
the box spans the 25th to 75th percentile (lower to upper quartile); the line in the box is the median
any dots beyond the whiskers are outliers; by the usual convention these are values more than 3/2 times the interquartile range (box length) below the lower quartile or above the upper quartile
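A sketch of how box plot outliers are commonly flagged, assuming the widespread convention of fences at 1.5 times the interquartile range (IQR = upper quartile minus lower quartile) beyond the box. The function name `tukey_outliers` and the data are hypothetical, and plotting software may compute quartiles slightly differently.

```python
import statistics

def tukey_outliers(values):
    """Return values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                  # length of the box
    lower_fence = q1 - 1.5 * iqr   # anything below this is an outlier
    upper_fence = q3 + 1.5 * iqr   # anything above this is an outlier
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical body weights in kg; 60 is far from the rest.
data = [18, 20, 21, 23, 25, 26, 28, 30, 60]
outliers = tukey_outliers(data)
print("outliers:", outliers)
```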
variance and standard deviation are measures of
the spread of data around the mean
variance s^2
the sum of the squares of the difference of each of n values from the mean, divided by degrees of freedom (n-1)
so take the mean, then for each data point take that point minus the mean
square it
add up the squares, then divide by n-1
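The calculation can be sketched step by step as follows: deviations from the mean are squared, summed, and divided by n-1. The function name `sample_variance` and the data are illustrative assumptions; the result is cross-checked against the standard library's `statistics.variance`.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

data = [4, 6, 8, 10, 12]  # hypothetical measurements

s2 = sample_variance(data)   # variance s^2
s = math.sqrt(s2)            # standard deviation s = square root of variance

print("variance:", s2)
print("standard deviation:", s)
print("matches statistics.variance:", math.isclose(s2, statistics.variance(data)))
```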
standard deviation s
square root of variance
estimates the typical deviation of the n values from the mean
tells us how much variability can be expected among individuals
for normally distributed data, about 2/3 of the values will be within mean +/- one standard deviation
for normally distributed data, about 95% of values will be within mean +/- 2 standard deviations
standard error of the mean (SEM)
standard deviation / square root of the number sampled (s / sqrt(n))
indicates how close the sample mean is likely to be to the actual mean in the target population
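A minimal sketch of the SEM formula, using the sample standard deviation from the `statistics` module. The function name `standard_error` and the data are hypothetical.

```python
import math
import statistics

def standard_error(values):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))

data = [4, 6, 8, 10, 12]  # hypothetical measurements
sem = standard_error(data)
print("SEM:", sem)
```

Because n sits under a square root, quadrupling the sample size only halves the SEM.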
confidence interval (one sample only ie one type of experiment)
if you have an x% confidence interval, then out of every 100 samples you collect, about x of the intervals calculated from them will contain the actual population mean
NOT CORRECT: if I do an experiment today there is an x% chance I get the actual population mean
confidence interval example
mean +/- t * SEM
t will be given to us
gives a range
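A sketch of the interval calculation, assuming the t value is supplied from a table as the card says. The function name `confidence_interval` and the data are hypothetical; 2.776 is the tabulated two-sided 95% t value for 4 degrees of freedom (n - 1 with n = 5).

```python
import math
import statistics

def confidence_interval(values, t_value):
    """Return (lower, upper) as mean -/+ t * SEM for a supplied t value."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    margin = t_value * sem
    return mean - margin, mean + margin

data = [4, 6, 8, 10, 12]  # hypothetical measurements, n = 5
T_95_DF4 = 2.776          # tabulated t for 95% confidence, 4 degrees of freedom

low, high = confidence_interval(data, T_95_DF4)
print("95% confidence interval:", low, "to", high)
```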
null hypothesis
there is NO difference between groups
alternative hypothesis
hypothesis that there is a difference between groups
want to disprove the
null hypothesis
steps in hypothesis testing
1) from observed data, a test statistic is calculated
2) the probability (p-value) of observing a test statistic as large as or larger than the one observed, if the null hypothesis is true, is calculated
3) the p value is compared to a cut-off termed the level of significance (alpha); alpha should be small because we don’t want to reject the null hypothesis when it is true
p value
probability of observing a test statistic as large or larger than that observed, if null hypothesis is true
- if p is very small (below alpha, commonly 0.05), data this extreme would be unlikely if the null were true; reject the null hypothesis
- if p is large then the data are consistent with the null hypothesis
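The three steps of hypothesis testing can be illustrated with an exact permutation test. This is a swapped-in technique chosen because it needs no statistical tables, not one of the tests named in these cards. The function name `permutation_test`, the groups, and the choice of "absolute difference in means" as the test statistic are all assumptions for illustration.

```python
from itertools import combinations
from statistics import mean

def permutation_test(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    # Step 1: test statistic from the observed data.
    observed = abs(mean(group_a) - mean(group_b))

    # Step 2: if the null is true, group labels are interchangeable, so
    # count how many relabellings give a statistic as large or larger.
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    as_extreme = 0
    n_splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        stat = abs(sum_a / n_a - (total - sum_a) / n_b)
        n_splits += 1
        if stat >= observed - 1e-12:  # small tolerance for float rounding
            as_extreme += 1
    return observed, as_extreme / n_splits

# Hypothetical measurements for two groups.
group_a = [10, 11, 12, 13]
group_b = [20, 21, 22, 23]
ALPHA = 0.05  # level of significance

statistic, p_value = permutation_test(group_a, group_b)

# Step 3: compare the p value to the level of significance.
reject_null = p_value < ALPHA
print("test statistic:", statistic)
print("p value:", p_value)
print("reject null hypothesis:", reject_null)
```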
confidence interval (2 types of experiments)
- 2 groups; if the x% confidence interval for the difference between the means contains 0, the data are consistent with no difference between the groups, so we do not reject the null hypothesis
- if the confidence interval, ie the range, does NOT include zero, then it is unlikely that there is no difference between the groups, so we reject the null hypothesis
chi square
2 variables, each with 2 outcomes (yes/no, dichotomous)
- exposure: yes / no
- injury: yes / no
- ex: dog fights and injury
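A sketch of the chi-square calculation for a 2 x 2 table of counts: expected counts come from the row and column totals, and the statistic sums (observed - expected)^2 / expected over the four cells. The function name `chi_square_2x2` and the counts are invented, and no continuity correction is applied (an assumption; some software applies one by default). For 1 degree of freedom the upper-tail p value can be written with the complementary error function.

```python
import math

def chi_square_2x2(table):
    """Chi-square statistic and p value (1 df) for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper-tail probability of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = fought / did not fight,
# columns = injured / not injured.
table = [[30, 10], [20, 40]]
chi2, p_value = chi_square_2x2(table)
print("chi-square:", chi2)
print("p value:", p_value)
```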
one sample t test
- compares the mean of one group with a reference value; it does estimate the mean, but the more important output is the p value (this goes for all of these tests)
- one variable, one group
- continuous, normally dist
ex height in cows
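A sketch of the one sample t statistic, which is the difference between the sample mean and a reference value divided by the SEM. The function name `one_sample_t`, the heights, and the reference value of 5 are hypothetical. The p value would normally come from software or a t table; here the statistic is simply compared with the tabulated two-sided 5% critical value for 4 degrees of freedom (2.776).

```python
import math
import statistics

def one_sample_t(values, reference_mean):
    """t = (sample mean - reference mean) / SEM, with n - 1 degrees of freedom."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return (statistics.mean(values) - reference_mean) / sem

data = [4, 6, 8, 10, 12]  # hypothetical measurements, n = 5
T_CRITICAL_DF4 = 2.776    # tabulated two-sided 5% critical value, 4 df

t_stat = one_sample_t(data, reference_mean=5)
significant = abs(t_stat) > T_CRITICAL_DF4
print("t statistic:", t_stat)
print("significant at the 5% level:", significant)
```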
2 sample t test
- one variable, 2 groups
- continuous, normally dist
ex height in cows and humans
paired t test
- one variable 2 PAIRED groups
- continuous, normally dist
ex blood pressure in the same individuals on a high fibre diet vs a low fibre diet
anova analysis of variance
- one variable, 3 or more groups
- continuous, normally dist
- ex height in cows, humans and goats
wilcoxon’s signed rank test one sample
- one variable, one group
- continuous, NOT normally dist
wilcoxon’s rank sum test
one variable, 2 groups
continuous not normally distributed
wilcoxon’s signed rank test 2 matched pairs
one variable, 2 paired groups
continuous, not normally dist
kruskal wallis
- 1 variable, 3 or more groups
continuous, not normally dist
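The cards for continuous data above can be summarised as a lookup on three questions: how many groups, are the groups paired, and are the data normally distributed. The function `choose_test` below is a hypothetical study aid that encodes only what these cards state, not a complete decision guide.

```python
def choose_test(n_groups, paired=False, normal=True):
    """Pick a test for one continuous variable, following the cards above."""
    if n_groups == 1:
        return "one sample t test" if normal else "Wilcoxon signed rank test (one sample)"
    if n_groups == 2 and paired:
        return "paired t test" if normal else "Wilcoxon signed rank test (matched pairs)"
    if n_groups == 2:
        return "2 sample t test" if normal else "Wilcoxon rank sum test"
    if n_groups >= 3:
        return "ANOVA" if normal else "Kruskal-Wallis test"
    raise ValueError("n_groups must be at least 1")

print(choose_test(2, paired=True, normal=True))
print(choose_test(3, normal=False))
```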
kaplan-meier curve with log rank
estimates the proportion surviving over time; the log rank test compares the survival curves of 2 or more groups