Inference Flashcards
Types of Categorical data
- nominal = categories with no order
- ordinal = categories with order
types of numerical data
- discrete = whole numbers, counts
- continuous = measurements that can take any value within a range (e.g. height, weight)
Summary statistics for normal data
mean and standard deviation (s.d.)
summary statistics for non-normal data
median and interquartile range (Q3-Q1)
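Both sets of summary statistics can be sketched with Python's standard `statistics` module. The sample values are made up for illustration, and the "inclusive" quartile method is only one of several conventions, so other software may give slightly different quartiles.

```python
import statistics

# Hypothetical right-skewed sample: the value 30 is an outlier
values = [2, 4, 4, 5, 7, 9, 11, 30]

# Summary for (approximately) normal data
mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)

# Summary for non-normal data
median = statistics.median(values)
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range, Q3 - Q1

print(f"mean={mean}, sd={sd:.2f}, median={median}, IQR={iqr}")
```

In this sample the outlier pulls the mean above the median, which is why the median and IQR are preferred for skewed data.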
Correlation definition
a measure of the degree of linear association between two numerical variables
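As an illustration of "linear" in this definition, the sketch below computes Pearson's correlation coefficient by hand for two made-up data sets. The function name `pearson_r` and the data are assumptions for the example, not part of the original cards.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
linear = [2 * v + 1 for v in x]    # perfectly linear in x
curved = [(v - 3) ** 2 for v in x]  # depends on x, but symmetrically (U shape)

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)
print(r_linear, r_curved)
```

The curved data are completely determined by x, yet the coefficient is zero: correlation only captures the linear part of a relationship.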
when is correlation not appropriate?
in method comparison studies (testing the equivalence of two methods): a high correlation shows that the methods are associated, not that they agree
p-value definition
the probability of getting data at least as extreme as those observed, if the null hypothesis were true
comparing means of 2 normal samples
for numerical data from 2 independent groups, use a 2 sample t-test
coefficient of determination (R squared)
the proportion of the variability in y that is explained by x through the regression line
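A minimal sketch of how R squared can be computed for a simple least-squares line, assuming one predictor. The helper names `fit_line` and `r_squared` and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(x, y):
    """1 - (residual sum of squares / total sum of squares)."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_line(x, y)
r2 = r_squared(x, y)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.2f}")
```

Here the slope is the expected change in y for a 1 unit increase in x, and R squared is the share of the variation in y that the fitted line accounts for.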
Testing non-normal data
either
a. transform the data so that it is approximately normal (e.g. a log transform), then use a parametric test
b. use a non-parametric test such as Mann-Whitney
Testing qualitative data
- this comes down to testing proportions
- with a reasonable sample size, use a chi-squared test
- give a confidence interval for the difference or ratio of the 2 proportions
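One common way to get a confidence interval for the difference of two proportions is the Wald (normal approximation) interval, sketched below. This is an assumption about the method, since the card does not name one; `diff_proportions_ci` and the counts are made up for the example.

```python
import math
from statistics import NormalDist

def diff_proportions_ci(x1, n1, x2, n2, level=0.95):
    """Wald CI for p1 - p2, given successes x and sample sizes n per group."""
    p1 = x1 / n1
    p2 = x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se

# Hypothetical data: 45/100 successes in group 1, 30/100 in group 2
diff, lower, upper = diff_proportions_ci(45, 100, 30, 100)
print(f"difference={diff:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

If the interval excludes 0, the data are consistent with a real difference between the two proportions at that confidence level. The approximation is less reliable with small samples or proportions near 0 or 1.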
Testing quantitative, dependent (paired) data
- if the differences between the 2 paired samples are normal, use a paired t-test and CI
- if not normal, use the Wilcoxon signed rank test
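The paired t-test works on the within-pair differences. A minimal sketch of the test statistic, using made-up before/after measurements; the p-value would then come from a t distribution with n - 1 degrees of freedom, which the standard library does not provide.

```python
import math
import statistics

# Hypothetical paired measurements on the same 5 subjects
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 16]

# Reduce the two samples to one sample of differences
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
se_diff = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference

t_stat = mean_diff / se_diff
df = n - 1
print(f"mean difference={mean_diff}, t={t_stat:.3f}, df={df}")
```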
Type 1 error
- false positive result
- reject the null hypothesis when it is true
Type 2 error
- false negative result
- fail to reject the null hypothesis when it is false
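The two error rates can be made concrete for a simple case. The sketch below assumes a one-sided, one-sample z-test with a known standard deviation, chosen only because it needs nothing beyond the standard library; `z_test_error_rates` and its parameters are illustrative names.

```python
import math
from statistics import NormalDist

def z_test_error_rates(effect, sd, n, alpha=0.05):
    """Return (alpha, beta, power) for a one-sided one-sample z-test.

    effect: true shift of the mean away from the null value
    sd:     known population standard deviation
    n:      sample size
    alpha:  chosen type 1 error rate
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)     # rejection threshold under H0
    shift = effect * math.sqrt(n) / sd         # mean of the z statistic under the alternative
    beta = std_normal.cdf(z_crit - shift)      # P(fail to reject H0 | H0 false)
    return alpha, beta, 1 - beta

for n in (10, 50):
    alpha, beta, power = z_test_error_rates(effect=0.5, sd=1.0, n=n)
    print(f"n={n}: alpha={alpha}, beta={beta:.3f}, power={power:.3f}")
```

With the type 1 error rate held fixed, a larger sample lowers the type 2 error rate, so the test is more likely to detect a real effect.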
Confidence interval definition
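A range of values, calculated from the sample, that is likely to contain the true population value (e.g. the mean); for a 95% CI, 95% of intervals built this way from repeated samples would contain it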
what happens to CIs as sample size increases?
They get narrower, because the standard error shrinks as n grows
Difference between confidence and prediction intervals
- Confidence interval = a range for the population mean; a population estimate
- Prediction interval = a range for individual observations; where a single new value is expected to fall
- The prediction interval is wider than the confidence interval, as individual observations have greater variability than the mean
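The difference in width can be sketched for the simplest case, an interval around a sample mean. The formulas below use a normal approximation (z rather than t quantiles), which is an assumption made to stay within the standard library; the standard deviation of 10 is made up.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def ci_half_width(sd, n):
    """Half-width of an approximate 95% CI for the population mean."""
    return Z95 * sd / math.sqrt(n)

def pi_half_width(sd, n):
    """Half-width of an approximate 95% prediction interval for one new observation."""
    return Z95 * sd * math.sqrt(1 + 1 / n)

sd = 10.0  # hypothetical sample standard deviation
for n in (10, 100, 1000):
    print(n, round(ci_half_width(sd, n), 2), round(pi_half_width(sd, n), 2))
```

As n grows, the confidence interval keeps narrowing toward zero width, while the prediction interval levels off near 1.96 standard deviations because individual variability does not go away with more data.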
How to assess normality
- plot a histogram to assess skew
- use the Shapiro-Wilk test
T-test assumptions
- Sample was randomly selected
- Groups are independent (for a 2 sample test)
- Data are normally distributed
- Similar variance between groups
what do linear predictor coefficients represent?
how much y changes, on average, for a 1 unit increase in x (with any other predictors held constant)
Prediction interval definition
Range within which a single new observation of y is expected to fall, with 95% confidence
power definition
- power = 1 - beta, where beta is the probability of making a type 2 error
- the probability of correctly rejecting H0 when it is false
when to use a Mann-Whitney test
- comparing a quantitative variable between 2 independent groups
- the variable is not normally distributed in at least one group
assumptions of a 2 sample t test
- normality
- independence
- homogeneity of variance