QUIZ 2 Flashcards
discrete variables
(3)
dichotomous = binary (female/male)
categorical = nominal ( bus, bike, walk)
ordinal ( very satisfied –> neutral –> dissatisfied)
interval = numerical data
(i.e. differences are meaningful)
3
count
ratio ( height)
continuous (time)
measurement
precision vs accuracy
significant figures
describing distributions
univariate or bivariate
univariate
5 characteristics
normal
uniform
bimodal
U-shaped
skewed
descriptive stats
3
measures of central tendency
measures of variation
shape ( higher moments)
measures of central tendency
3
mean, median, mode
measures of variation
5
range, variance and standard deviation, quantiles, percentiles, interquartile range
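A minimal sketch of these measures using Python's standard `statistics` module. The sample values are made up for illustration, and `statistics.quantiles` is left on its default "exclusive" method, which is only one of several quantile conventions.

```python
import statistics

# Hypothetical sample (invented for illustration only).
sample = [2, 4, 4, 5, 7, 9, 11]

value_range = max(sample) - min(sample)          # largest minus smallest
variance = statistics.variance(sample)           # sample variance (n - 1 denominator)
std_dev = statistics.stdev(sample)               # square root of the sample variance
q1, q2, q3 = statistics.quantiles(sample, n=4)   # quartile cut points
iqr = q3 - q1                                    # interquartile range

print(value_range, variance, std_dev, (q1, q2, q3), iqr)
```

Percentiles come from the same call with `n=100`, which returns 99 cut points.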
shape (higher moments)
3
variance, skewness, kurtosis
confidence
shown by 3
standard error
confidence interval ( credible interval)
statistical significance
scientific/ economic/ clinical significance
likert scale example
strongly disagree–> disagree–> neutral –> agree –> strongly agree
(this data is categorical and ordinal; it is neither dichotomous nor continuous)
accuracy
true: consistent with the truth or the objective value (unbiased)
precise
detailed, specific, having low uncertainty, highly resolved
measurement e.g. the flood water is 35 cm deep, plus or minus half a cm
when we measure something in science, we always provide an estimate of our confidence along with the measurement
estimation e.g. ospreys live on average 38.4 +/- 5 years
when we statistically estimate something that was not measured directly, or infer something about the world through our research methods, we must also calculate a measure of confidence for what we report
communicating precision = probabilistic statements
between 31% and 37% with 95% confidence (i.e., ∈ [31%, 37%] 19 times out of 20)
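One way a statement like this could be produced is the normal-approximation (Wald) interval for a proportion. This is a sketch under assumptions: the poll of 340 "yes" answers out of 1000 is hypothetical, chosen so the result lands near 31% to 37%, and the function name is illustrative.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) interval for a sample proportion."""
    p_hat = successes / n
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical poll: 340 of 1000 respondents answer "yes".
low, high = proportion_confidence_interval(340, 1000)
print(f"between {low:.1%} and {high:.1%} with 95% confidence")
```

The approximation is usually considered reasonable for large samples with proportions away from 0 and 1; other interval formulas exist for small samples.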
displaying univariate qualitative (categorical) data
3
tables, bar charts, pie charts
displaying univariate continuous data
3
histogram, box plot, kernel density estimation (carpet plot)
bivariate plots: two categorical variables
1
paired bar plots
how to visualize –> bivariate plots : one categorical, one continuous variable
2
multiple histograms, box plots
visualize –> bivariate plots: two continuous variables
1
scatter plots
univariate measures of dispersion
5
range, standard deviation (average distance from the mean), coefficient of variation, index of dispersion, interquartile range
measures of shape
4
variance, skewness, kurtosis, L-moments
Skewness Left (negative), Right (positive)
(a) negative direction (mean < median < mode)
(b) positive direction (mode < median < mean)
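The direction of skew can be checked numerically. The sketch below uses the moment-based (population) skewness, m3 / m2^1.5, which is one common definition among several; the data are invented, and the left-tailed sample is simply the mirror image of the right-tailed one.

```python
import statistics

def moment_skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-tailed sample: one large value pulls the mean upward.
right_tailed = [12, 15, 15, 18, 20, 22, 45]
left_tailed = [-v for v in right_tailed]  # mirror image, so the tail points left

skew_right = moment_skewness(right_tailed)  # positive
skew_left = moment_skewness(left_tailed)    # negative, same magnitude

# For the right-tailed sample the ordering is mode < median < mean.
print(statistics.mode(right_tailed), statistics.median(right_tailed),
      statistics.mean(right_tailed), skew_right, skew_left)
```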
confidence interval
the range within which we would expect the value of the statistic to fall, if we were to repeat the study with a very large sample
what does confidence depend on?
sample spread, sample size, ( the nature of the statistic you’re estimating )
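For the sample mean, both dependencies show up in the standard error, s / sqrt(n). A small sketch with invented numbers: more spread raises the standard error, more observations lower it.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical samples with the same mean (10) but different spread.
narrow = [9, 10, 10, 11]
wide = [2, 8, 12, 18]
# Same values as `narrow`, repeated to mimic a sample four times larger.
narrow_large = narrow * 4

se_narrow = standard_error(narrow)
se_wide = standard_error(wide)
se_narrow_large = standard_error(narrow_large)

print(se_wide, se_narrow, se_narrow_large)
```

Other statistics (medians, proportions, regression coefficients) have their own standard error formulas, which is the third dependency the card mentions.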
credible intervals are a simpler concept from Bayesian stats
the interval within which an unobserved parameter value falls with a particular probability, given available data, model, and preexisting knowledge
- best estimate of the true value of the parameter
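A rough sketch of a credible interval for a proportion, computed by grid approximation. Everything specific here is an assumption for illustration: the uniform prior, the binomial model, the data (34 successes in 100 trials), and the function name.

```python
def credible_interval(successes, n, level=0.95, grid_size=2001):
    """Central credible interval for a proportion, uniform prior, grid approximation."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # With a uniform prior the posterior is proportional to the binomial likelihood.
    weights = [p ** successes * (1 - p) ** (n - successes) for p in grid]
    total = sum(weights)
    tail = (1 - level) / 2
    cumulative = 0.0
    low = high = None
    for p, w in zip(grid, weights):
        cumulative += w / total
        if low is None and cumulative >= tail:
            low = p            # first grid point past the lower tail
        if cumulative >= 1 - tail:
            high = p           # first grid point past the upper tail
            break
    return low, high

# Hypothetical data: 34 successes in 100 trials.
low, high = credible_interval(34, 100)
print(low, high)
```

Changing the prior changes the interval, which is how preexisting knowledge enters the calculation.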
properties of a univariate variable
ex: describe (aspects of) the distribution of an unordered list of sample measurements
multivariate properties
- two things measured for each individual sampled
- describes (aspects of) the relationship between them
univariate measures of central tendency
mean, median, mode
statistical significance
“within 3%, 19 times out of 20”
- closely related to confidence intervals = error bars
observed relationship is unlikely to be due to chance
scientific ( or economic) significance
an estimate of effect size put into context, for instance by competing effects, natural variation, or comparison to other costs or impacts
Determine H & H0.
Are these outcomes equally likely?
H: are the counts/frequencies of each category as expected?
H0: the probability of each group is equal (or equal to its expected value)
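This pair is usually tested with a chi-square goodness-of-fit statistic. A minimal sketch, assuming hypothetical counts for three categories (bus, bike, walk) and equal expected probabilities under H0; the critical value 5.991 is the standard chi-square cutoff for 2 degrees of freedom at the 5% level.

```python
def chi_square_statistic(observed, expected):
    """Sum over categories of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for bus, bike, walk (invented for illustration).
observed = [50, 30, 20]
total = sum(observed)
# Under H0 every category is equally likely.
expected = [total / len(observed)] * len(observed)

statistic = chi_square_statistic(observed, expected)

# Degrees of freedom = categories - 1 = 2; 5% critical value is 5.991.
CRITICAL_5PCT_DF2 = 5.991
reject_h0 = statistic > CRITICAL_5PCT_DF2

print(statistic, reject_h0)
```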
Determine H & H0.
are groups different from each other?
H: are the means of X different in the two groups
H0: the mean of Xa equals the mean of Xb
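A common statistic for this comparison is the two-sample t statistic. The sketch below uses Welch's version, which does not assume equal variances; that choice and the data are assumptions, and it stops at the statistic, since turning it into a p-value requires the t distribution.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic: difference in means over its estimated standard error."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

# Hypothetical measurements of X in two groups.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]

t_statistic = welch_t(group_a, group_b)
print(t_statistic)
```

A value far from zero is evidence against H0; a value near zero is what H0 predicts.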
Determine H & H0.
did the outcome change?
H: did the mean of outcome X change across the two measurements?
H0: the mean of X1 equals the mean of X2
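When the same individuals are measured twice, one common approach is a paired t statistic computed on the per-individual differences. A sketch with invented before/after values; the function name is illustrative.

```python
import math
import statistics

def paired_t(first, second):
    """Paired t statistic: mean of the differences over its standard error."""
    differences = [b - a for a, b in zip(first, second)]
    standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
    return statistics.mean(differences) / standard_error

# Hypothetical outcome X measured on the same five individuals at two times.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 14, 13]

t_statistic = paired_t(before, after)
print(t_statistic)
```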
Determine H & H0.
are these two outcomes correlated?
H: do X and Y vary together?
H0: X and Y are uncorrelated (orthogonal)
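The usual statistic here is the Pearson correlation coefficient. A sketch with made-up data; the second example shows that "uncorrelated" only rules out a linear association, since y depends exactly on x there.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(ss_x * ss_y)

# Hypothetical data: a perfect linear relationship.
r_linear = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Hypothetical data: y = x squared, symmetric around zero, so r is zero.
r_parabola = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_linear, r_parabola)
```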
Determine H & H0.
are these variables related
H: is there a linear relationship between X and Y
H0: the coefficient alpha on X in an OLS regression of Y on X (and Z…) is zero
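A sketch of the single-regressor case only (no additional Z variables), using the closed-form OLS solution. The data are invented; `slope` plays the role of the coefficient on X named in H0.

```python
def ols_fit(x, y):
    """Closed-form OLS for y = intercept + slope * x (one regressor only)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: roughly y = 1 + 2x with a little noise.
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.0]

intercept, slope = ols_fit(x, y)
print(intercept, slope)
```

Testing H0 would additionally need the standard error of the slope, which this sketch does not compute.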
Procedure for a statistical test
1 - based on theory and H0, choose a statistic of interest that you can calculate from the sample
2 - use theory to derive the distribution of that statistic's expected values under H0 (some assumptions must be made)
3 - calculate the actual value of the statistic from the sample
4 - state how likely that value is under the sampling method, given that H0 is true; if unlikely, reject H0
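The four steps can be traced in a small example: an exact binomial test of whether a coin is fair. This is a sketch under assumptions; the data (16 heads in 20 flips) and the function name are hypothetical, and the two-sided p-value uses the "outcomes no more likely than the observed one" convention.

```python
from math import comb

def binomial_two_sided_p(heads, flips, p_null=0.5):
    """Exact two-sided p-value for a binomial count under H0: P(heads) = p_null."""
    # Step 2: distribution of the statistic (number of heads) under H0.
    pmf = [comb(flips, k) * p_null ** k * (1 - p_null) ** (flips - k)
           for k in range(flips + 1)]
    # Step 4: total probability of outcomes at least as unlikely as the observed one.
    observed_probability = pmf[heads]
    return sum(p for p in pmf if p <= observed_probability * (1 + 1e-9))

# Step 1: H0 says the coin is fair; the statistic is the number of heads.
# Step 3: hypothetical observed value, 16 heads in 20 flips.
p_value = binomial_two_sided_p(16, 20)
reject_h0 = p_value < 0.05

print(p_value, reject_h0)
```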
What is the null hypothesis (H0)?
- what you are trying to disprove
- strong results eliminate the possibility of H0
- no difference, no change, small difference, no effect
reject the null if ( statistical significance)
the p-value is small (below the chosen significance level) or near zero
type I error for binary decisions
rejecting the null hypothesis when it is true ( false positive)
type II error
failing to reject the null hypothesis when it is false (false negative)
the probability of a type I error is determined by ____
significance level
- with a 99% confidence threshold (alpha = 0.01), a type I error is less likely than with a 95% confidence threshold (alpha = 0.05)
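A simulation can make this concrete. The sketch below repeatedly draws samples for which H0 is true (mean 0, known standard deviation 1) and counts how often a two-sided z-test rejects anyway. The setup, sample size, trial count, and seed are all assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist

def false_positive_rate(alpha, trials=4000, n=25, seed=0):
    """Share of simulated studies that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cutoff
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]  # H0 holds: true mean is 0
        z = (sum(sample) / n) * math.sqrt(n)          # sigma is known to be 1
        if abs(z) > critical:
            rejections += 1                           # a type I error
    return rejections / trials

rate_05 = false_positive_rate(0.05)  # should land near 0.05
rate_01 = false_positive_rate(0.01)  # should land near 0.01

print(rate_05, rate_01)
```

Both calls reuse the same seed, so they judge the same simulated samples with different cutoffs.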
the probability of a type II error can be computed for a particular test statistic (i.e. H0), if given ______
population distribution (parameterization, mean and SD)
sample size (N) and alpha ( chance of type I error)
hypothesis test power
1 - (the probability of a type II error)
AKA - the probability that we correctly reject the null, when the null is false
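For a simple case, power has a closed form. The sketch assumes a one-sided z-test of H0: mean = 0 against mean > 0 with known standard deviation; the function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_power(true_shift, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mean = 0 against mean > 0."""
    standard_normal = NormalDist()
    z_critical = standard_normal.inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is centred at true_shift * sqrt(n) / sigma.
    return 1 - standard_normal.cdf(z_critical - true_shift * math.sqrt(n) / sigma)

# Hypothetical scenario: true mean 0.5, standard deviation 1.
power_small_n = z_test_power(0.5, 1, n=10)
power_large_n = z_test_power(0.5, 1, n=25)
power_strict = z_test_power(0.5, 1, n=25, alpha=0.01)

print(power_small_n, power_large_n, power_strict)
```

Power rises with sample size and effect size, and falls when alpha is made stricter, which is the trade-off between type I and type II errors.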
frequentist statistics
developed before computers & calculators ( t-test, chi-square, F test)
Bayesian statistics
rapidly developing framework for using prior expectation + evidence to make bets
measurement validity
how well your metric captures the underlying concept you are trying to measure
internal validity
the degree to which the design of an experiment controls extraneous variables