Basic Statistical Concepts Flashcards
Stages in transforming data into information and evidence
1) Data
2) Information
3) Evidence
4) Knowledge
5) Decisions
6) Impact
To change data into information you must
compile, manage and analyze
to turn information into evidence you must
integrate, interpret and evaluate the information
making evidence into knowledge
format for presentation to planners and stakeholders
Knowledge to Decisions
Influence plans and decisions (planners and policy makers)
Making decisions have impact
implement decisions
Turn impact to data
monitor indicators for change
Arithmetic average of a distribution
Mean
Equation for mean
Mean = (Total score/ sample size)
Only measure of central tendency that can be manipulated algebraically
Mean
Most sensitive to skew and outliers
Mean
the value in the distribution that occurs most frequently
mode
Always located at the peak of the distribution when in graphs
Mode
Insensitive to extreme values or outliers
Mode
Middle value in an ordered array of data
Median
Divides the upper half of the distribution from the lower half
Median
Hardly affected by outliers
Median
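The cards on mean, median and mode can be checked with a small sketch. The scores below are made-up values chosen only for illustration; the point is how each measure reacts when one extreme value is added.

```python
from statistics import mean, median, mode

scores = [2, 3, 3, 4, 5]        # hypothetical sample, not from the flashcards
with_outlier = scores + [40]    # same sample plus one extreme value

# Mean = total score / sample size
mean_before, mean_after = mean(scores), mean(with_outlier)
# Median = middle value of the ordered data
median_before, median_after = median(scores), median(with_outlier)
# Mode = most frequent value
mode_before, mode_after = mode(scores), mode(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
print("mode:  ", mode_before, "->", mode_after)
```

In this example the mean moves a long way, the median moves only slightly and the mode does not move at all.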
Measurement of symmetry and the extent to which a distribution curve leans
Skewness
Arrangement of mean median and mode when a graph is positively skewed
Mode < Median < Mean
In a graph, when the tail extends to the right
Positively skewed
Skewness when higher values have lower frequencies
Positively skewed
In graphs, skewed to the left
Negatively skewed
In graphs, tail extends to the left
Negatively skewed
Arrangement of mean, median and mode when a graph is negatively skewed
Mode > Median > Mean
Higher values have higher frequencies
negatively skewed
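A quick way to see the ordering of mode, median and mean under skew is to compute them on two small data sets. Both lists are invented for illustration, and the ordering is a rule of thumb that holds for these examples rather than a guarantee for every data set.

```python
from statistics import mean, median, mode

# Hypothetical data with a long right tail (positively skewed)
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image of the same data, giving a long left tail (negatively skewed)
left_skewed = [16 - x for x in right_skewed]

pos = (mode(right_skewed), median(right_skewed), mean(right_skewed))
neg = (mode(left_skewed), median(left_skewed), mean(left_skewed))

print("positive skew (mode, median, mean):", pos)
print("negative skew (mode, median, mean):", neg)
```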
no skew, balanced tails
Bell curve
Two parameters that determine the normal curve distribution
mean
standard deviation
Measure of variability for the distance away from the mean
Standard deviation
Formula for number of SDs from the mean
Number of SDs from the mean = (X - Mean) / SD
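As a minimal sketch of the formula above, the function below returns how many SDs a value lies from the mean. The name z_score and the example numbers are assumptions made for illustration.

```python
def z_score(x, mean, sd):
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical example: a value of 130 when the mean is 100 and the SD is 15
z = z_score(130, 100, 15)
print(z)
```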
Percentage of area from -1SD to +1SD
68%
Percentage of area outside of -1SD to +1SD
32%
Percentage of the area between -2SD and +2SD
95%
What percentage of the observations would be more than ±2 SD away from the mean?
5%
What percentage of observations would lie within 3SD of the mean?
99.7%
1 SD above and below the mean
68%
2 SD above and below the mean
95%
3 SD above and below the mean
99.7%
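The 68%, 95% and 99.7% figures can be reproduced from the normal curve itself. The sketch below assumes a perfectly normal distribution and uses the standard identity that the area within k SDs of the mean equals erf(k / sqrt(2)); the helper name area_within is an assumption.

```python
import math

def area_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = area_within(k)
    # The area outside the band is whatever is left over
    print(f"within {k} SD: {inside:.4f}   outside: {1 - inside:.4f}")
```

The rounded flashcard values (68%, 95%, 99.7%) are approximations of these areas.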
The SD of the sampling distribution of the sample mean; indicates how far a sample mean is likely to fall from the population mean
Standard error
Equation for SE
SE = s / sqrt(n), where s = sample SD and n = sample size
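A minimal sketch of the SE equation, using the sample SD from Python's statistics module. The sample values and the function name standard_error are assumptions for illustration.

```python
import math
from statistics import stdev

def standard_error(sample):
    """SE of the mean: sample SD divided by the square root of the sample size."""
    return stdev(sample) / math.sqrt(len(sample))

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical measurements
se = standard_error(sample)
print(round(se, 4))
```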
Range within which the true magnitude of effects lies with a certain degree of assurance or confidence
Confidence interval
Disprove prevailing hypothesis, always set up as hypothesis of no effect (null)
Refutationism / Falsification
Effect of sample size on the confidence interval
Larger sample size -> more stable estimate -> narrower CI
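The link between sample size and CI width can be sketched numerically. This example assumes a 95% interval of the form mean ± 1.96 × SE (a normal approximation, close to the "2 SD" rule in the earlier cards) and a made-up sample SD of 10; half_width is a hypothetical helper name.

```python
import math

def half_width(sd, n, multiplier=1.96):
    """Half-width of an approximate 95% CI: multiplier * SD / sqrt(n)."""
    return multiplier * sd / math.sqrt(n)

sd = 10                       # assumed sample SD, for illustration only
sizes = [25, 100, 400]
widths = [half_width(sd, n) for n in sizes]

for n, w in zip(sizes, widths):
    print(f"n = {n:3d}: mean ± {w:.2f}")
```

Each fourfold increase in sample size halves the width of the interval in this sketch.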
Three threats to validity
Sampling variability
Confounding variables
Bias
Methods of evaluating sample variability
Hypothesis testing
Interval estimation
Samples drawn at random from a population may give varying measurements of statistical data
Sampling variability
Used to test for validity and address sampling variability
hypothesis testing
Steps of hypothesis testing
1) Set up two hypotheses (null and alternative)
2) Try to reject the null hypothesis
3)
Hypothesis wherein there is no relationship between exposure and outcome
Null hypothesis
Hypothesis wherein there is a relationship between exposure and outcome
Alternative hypothesis
Alternative hypothesis wherein direction is determined
One tailed
Alternative hypothesis wherein direction is not determined
Two tailed
Used to determine whether the hypotheses may be accepted or rejected
Test statistic
A statistical test whose test statistic is assumed to follow a normal distribution, used to determine whether two population means differ given a large sample size
Z test
Equation for Z test
z = (difference between groups) / (variability of the estimate)
Numerator of Z
magnitude of the difference between groups
Denominator of Z
the variability of the estimate
You want a large test statistic
big difference between groups, large sample size or
smaller variability of the estimate
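The cards on the numerator and denominator of z can be illustrated with one common large-sample form for two group means, where the variability of the estimate is taken as sqrt(s1²/n1 + s2²/n2). That specific denominator is an assumption here, since the flashcards do not spell out the equation, and all numbers are invented.

```python
import math

def z_two_means(mean1, mean2, sd1, sd2, n1, n2):
    """z = difference between groups / variability of the estimate (assumed form)."""
    difference = mean1 - mean2
    variability = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return difference / variability

baseline = z_two_means(105, 100, 15, 15, 100, 100)
bigger_difference = z_two_means(110, 100, 15, 15, 100, 100)
bigger_sample = z_two_means(105, 100, 15, 15, 400, 400)
smaller_variability = z_two_means(105, 100, 10, 10, 100, 100)

print(round(baseline, 3), round(bigger_difference, 3),
      round(bigger_sample, 3), round(smaller_variability, 3))
```

Each of the three changes listed on the card produces a larger z than the baseline.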
The probability of obtaining a result at least as extreme as the one observed by chance alone, assuming the null hypothesis is true
P value
You want a low p value
Larger test statistic
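To show why a larger test statistic gives a lower p value, the sketch below converts z into a two-tailed p value under a standard normal distribution, using the identity p = erfc(|z| / sqrt(2)). The function name two_tailed_p is an assumption.

```python
import math

def two_tailed_p(z):
    """Two-tailed p value for a z statistic under the standard normal curve."""
    return math.erfc(abs(z) / math.sqrt(2))

for z in (1.0, 1.96, 3.0):
    print(f"z = {z:4.2f}  ->  p = {two_tailed_p(z):.4f}")
```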
Study results say treatments differ but the truth is that they do not
Type 1 error (false positive)
Study results say treatments do not differ but the truth is they do
Type 2 error (false negative)
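A Type 1 error can be made concrete with a small simulation, offered here as a rough sketch rather than a formal demonstration. Two groups are drawn repeatedly from the same population, so any "significant" difference is a false positive. The cutoff of |z| > 1.96, the group size of 50, the number of trials and the seed are all assumptions chosen for illustration.

```python
import math
import random

def z_statistic(a, b):
    """Large-sample z for the difference between two sample means."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

rng = random.Random(42)   # fixed seed so the run is repeatable
trials = 2000
n = 50                    # observations per group
false_positives = 0

for _ in range(trials):
    # Both groups come from the same population: the null hypothesis is true
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    if abs(z_statistic(a, b)) > 1.96:
        false_positives += 1

type1_rate = false_positives / trials
print(type1_rate)
```

The false positive rate should land near 5%, which matches the share of observations lying more than about 2 SD from the mean.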