Stats Concepts Flashcards
What is a variable?
A feature of individual units within a study (e.g. people); something that we can observe or measure.
What are the four things an outcome could be?
An observation at one moment in time (e.g. attained weight of a baby at 6 months; mortality status, dead/alive), a time to an event that may or may not happen (e.g. time to death), a count, independent of time (e.g. number of measles cases), or a rate, dependent on time (e.g. number of deaths per 1000 person-years)
What is binary data?
Categorical (not numerical) data which only has 2 alternatives/options - an example would be dead/alive
What is nominal data?
Categorical (not numerical) data which has more than 2 alternatives/options but which has no natural order - classic examples are ethnic groups or blood type
What is ordinal data?
Categorical (not numerical) data which has more than 2 alternatives/options and a natural order to it; for example - hypertensive/borderline/normal/hypotensive
What is discrete data?
Numerical data (quantitative) that is a count - for example, number of measles cases
What is continuous data?
Numerical data (quantitative) where there is an infinite number of values the data can take - for example, blood pressure, weight, age
What is positively skewed data?
Data whose frequency distribution is skewed to the right on the x axis: most values cluster at the lower end and a long tail stretches towards the higher values
What is negatively skewed data?
Data whose frequency distribution is skewed to the left on the x axis: most values cluster at the higher end and a long tail stretches towards the lower values
What are some examples of continuous probability distributions?
Normal distribution, t-distribution, F-distribution, chi-squared distribution
What are some examples of discrete probability distributions?
Binomial, Poisson, discrete uniform
What is meant by the range of data?
Simply the highest score minus the lowest score; it is the spread of scores you would see in your sample
What is VARIANCE?
The average squared deviation from the mean
What does variance tell us?
On average, how spread out the scores are around the mean
What is STANDARD DEVIATION?
It is the square root of the variance
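The range, variance and standard deviation cards can be checked with a small sketch. The scores below are made up for illustration, and the function names are hypothetical. One assumption worth flagging: the "average" is taken by dividing by n for a whole population, whereas a sample variance usually divides by n - 1.

```python
import math

def variance(values, sample=True):
    """Average squared deviation from the mean.

    sample=True divides by n - 1 (the usual sample variance);
    sample=False divides by n (the population variance).
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1 if sample else n)

def standard_deviation(values, sample=True):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(values, sample))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up scores, mean = 5

data_range = max(scores) - min(scores)             # highest minus lowest
pop_var = variance(scores, sample=False)           # divide by n
pop_sd = standard_deviation(scores, sample=False)

print(data_range, pop_var, pop_sd)  # 7 4.0 2.0
```

Because the SD is in the same units as the data, it is usually easier to interpret than the variance.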
What happens to the mean in the context of a skewed distribution?
It no longer gives a good impression of the central tendency of the observations, because it is pulled towards the long tail
What is a more appropriate measure than the mean for skewed distributions?
When data are skewed, it is more appropriate to use the median and interquartile range (25th to 75th percentile) as your descriptive statistics. If data are positively skewed, a log transformation may also help
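A rough illustration of why the median and interquartile range suit skewed data, using Python's standard `statistics` module. The data are invented (imagine lengths of hospital stay in days), and quartile values depend on the convention used; other software may give slightly different cut points.

```python
import math
import statistics

# Made-up, positively skewed data: most values small, a few very large
stays = [1, 2, 2, 3, 3, 4, 5, 8, 20, 60]

mean_stay = statistics.mean(stays)      # pulled up by the long right tail
median_stay = statistics.median(stays)  # middle value, resistant to the tail

# quantiles(n=4) returns the three quartile cut points: Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(stays, n=4)

# A log transformation pulls in the long right tail
log_stays = [math.log(x) for x in stays]
log_mean = statistics.mean(log_stays)
log_median = statistics.median(log_stays)

print(f"mean = {mean_stay}, median = {median_stay}, IQR = {q1} to {q3}")
print(f"log scale: mean = {log_mean:.2f}, median = {log_median:.2f}")
```

On the raw scale the mean sits well above the median; on the log scale the two are much closer, which is the sense in which the transformation "helps".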
What is the only time the mean is appropriate?
When the data are (at least approximately) normally distributed
What are population values?
The true values of a measure in a population. They define the population.
• Mean, μ
• Standard deviation, σ
What is a sample statistic?
The value of the measure in a sample of the population. It is calculated from the observations in the sample
• Sample mean, x̄
• Sample standard deviation (SD), s
What is sampling error?
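The difference between a sample statistic and the true population value that arises because we observe only a sample rather than the whole population
• Information gained from a single sample is the “best estimate” of what is true in the population
• In truth, the sample statistic may be somewhat larger or smaller than the true population value (i.e. uncertainty)
• This is due to sampling error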
What is standard error?
It is a measure of the precision of the sample estimate. It indicates how far from the true (but unknown) population value the sample estimate is likely to be; basically, how large an error we are likely to be making.
The standard error of the mean is calculated as: SE = SD / √n = s / √n, where n is the sample size
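A minimal sketch of the standard error formula, with a hypothetical function name and invented numbers. It also shows a consequence of dividing by √n: quadrupling the sample size halves the standard error.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

# Same (made-up) sample SD, two different sample sizes
se_small = standard_error(sd=15, n=25)
se_large = standard_error(sd=15, n=100)

print(se_small, se_large)  # 3.0 1.5
```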
What are the two main methods of statistical inference?
Hypothesis testing (significance testing) and estimation (confidence intervals)
What does low SD indicate?
Data points are close to the mean
What does high SD indicate?
Data points are far from the mean
For normally distributed data, 95% of values lie within how many SDs of the mean?
2 (1.96 to be precise)
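The 1.96 figure can be checked against the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) and assumes the data really are normally distributed.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of a normal distribution lying within 1.96 SDs of the mean
coverage = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# Going the other way: the multiplier that leaves 2.5% in each tail
multiplier = standard_normal.inv_cdf(0.975)

print(round(coverage, 3), round(multiplier, 2))  # 0.95 1.96
```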
How do you calculate a CI range of values?
95% of effect estimates for ‘large’ samples have values between (effect - 1.96 x SE) and (effect + 1.96 x SE). So, for a 95% CI for a mean (or a difference in means), you would calculate:
Lower value = MEAN - (1.96 x SE)
Upper value = MEAN + (1.96 x SE)
95% CI (lower value to upper value)
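The calculation can be sketched as follows. The summary statistics are invented, the function name is hypothetical, and the 1.96 multiplier assumes a ‘large’ sample; for small samples a multiplier from the t-distribution is normally used instead.

```python
import math

def ci95(mean, sd, n):
    """Approximate 95% confidence interval for a mean (large-sample formula)."""
    se = sd / math.sqrt(n)       # standard error of the mean
    lower = mean - 1.96 * se
    upper = mean + 1.96 * se
    return lower, upper

# Made-up summary statistics: mean 120, SD 15, n = 100, so SE = 1.5
lower, upper = ci95(mean=120, sd=15, n=100)

print(f"95% CI ({lower:.2f} to {upper:.2f})")  # 95% CI (117.06 to 122.94)
```

Note that the interval is reported lower value first, then upper value.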
Describe the relationship between p values and confidence intervals
If the ‘no effect’ value falls outside the CI then the result is statistically significant
• Confidence intervals and P-values present complementary information
• Confidence intervals show the range within which the true treatment effect is likely to lie
• P-values measure the strength of the evidence against a hypothesis of particular interest: the null hypothesis
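A sketch of how the two pieces of information line up, assuming a large-sample (normal) approximation and a hypothetical effect estimate with its standard error. The function names are illustrative only. With these assumptions, the ‘no effect’ value falls outside the 95% CI exactly when the two-sided p-value is below 0.05.

```python
import math

def ci95(estimate, se):
    """Large-sample 95% confidence interval for an effect estimate."""
    return estimate - 1.96 * se, estimate + 1.96 * se

def two_sided_p_value(estimate, se, null_value=0.0):
    """Two-sided p-value from a z statistic (normal approximation)."""
    z = (estimate - null_value) / se
    # erfc(|z| / sqrt(2)) equals 2 x (1 - Phi(|z|)) for the standard normal
    return math.erfc(abs(z) / math.sqrt(2))

def null_outside_ci(estimate, se, null_value=0.0):
    """True if the 'no effect' value lies outside the 95% CI."""
    lower, upper = ci95(estimate, se)
    return null_value < lower or null_value > upper

# Two made-up results: (effect estimate, standard error)
for estimate, se in [(3.0, 1.0), (1.0, 1.0)]:
    p = two_sided_p_value(estimate, se)
    print(ci95(estimate, se), round(p, 4), null_outside_ci(estimate, se))
```

The first result has a CI that excludes 0 and a small p-value; the second has a CI that includes 0 and a p-value above 0.05.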
How is a CI determined?
CI is determined by the Standard Error, a measure that combines SD and sample size, n. Standard Error (SE) = SD / √n
What is the definition of a CI?
A confidence interval provides a range of plausible values for the POPULATION mean, not the SAMPLE mean
What does a p-value measure?
The strength of the evidence against a hypothesis of particular interest: the null hypothesis
What is the definition of incidence?
The number of NEW cases of a disease/condition arising in a population at risk of developing the disease/condition during a specified time period
How is cumulative incidence different from incidence rate?
Incidence rate uses person-time (the sum of the disease-free time at risk) as the denominator; cumulative incidence uses the number of people at risk of developing the disease/condition at the start of the specified time period
What is the definition of cumulative incidence?
• The cumulative incidence or risk of a disease is the probability that the disease occurs during a specified time period.
• Equivalently, cumulative incidence can be defined as the percentage of the at-risk population in which the disease occurs during a specified time period.
What is the question that cumulative incidence answers?
“What is the probability or chance that an individual develops the outcome in a defined period of time?”
When do we use the incidence rate (sometimes called incidence density) instead of simply the incidence?
When not all people are observed for the full time period, or not all are at risk for the full time period, we need to consider “person-time” at risk and report the incidence rate (also sometimes called incidence density)
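The difference between the two denominators can be sketched with a tiny hypothetical cohort. Each record holds the years a person was followed while at risk and whether they developed the disease; the numbers and names are invented.

```python
# Hypothetical cohort: (years at risk, developed disease?)
# The person followed for 3.5 years left the study without disease,
# which is the situation where person-time becomes necessary.
cohort = [
    (5.0, False),
    (2.5, True),
    (5.0, False),
    (1.0, True),
    (3.5, False),
]

new_cases = sum(1 for _, diseased in cohort if diseased)
people_at_risk_at_start = len(cohort)
person_years = sum(years for years, _ in cohort)

# Cumulative incidence (risk): new cases / people at risk at the start
cumulative_incidence = new_cases / people_at_risk_at_start

# Incidence rate: new cases / total person-time at risk
incidence_rate = new_cases / person_years

print(f"Cumulative incidence: {cumulative_incidence:.0%} over 5 years")
print(f"Incidence rate: {incidence_rate * 1000:.1f} per 1000 person-years")
```

Cumulative incidence is a proportion with no time units of its own, so the time period must be stated; the incidence rate carries person-time in its units.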
What question does incidence rate answer?
“At what rate are new cases of the disease occurring within the at-risk population?”
What is the definition of prevalence, and how is it calculated?
The proportion of a population that has the disease/condition at a given time t. Prevalence = Number of cases observed at time t / Total number of individuals at time t
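As a quick worked example of the formula, with made-up survey numbers and a hypothetical function name:

```python
def prevalence(cases_at_time_t, population_at_time_t):
    """Proportion of the population that has the condition at time t."""
    return cases_at_time_t / population_at_time_t

# Made-up survey: 30 existing cases among 600 people on the survey date
point_prevalence = prevalence(30, 600)

print(f"{point_prevalence:.1%}")  # 5.0%
```

Unlike incidence, the numerator counts all existing cases at time t, not just new ones.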
What question does prevalence answer?
“What fraction of the group is affected at this moment in time?”
What is the definition of a p value?
A p value is the probability of having observed our data (or more extreme data) given that the null hypothesis is true.