Statistical analysis Flashcards
Differentiate nominal, ordinal, discrete, and continuous data
Categorical
• Nominal: Values that fall into unordered categories. For example, gender, disease status (yes/no) and blood groups
• Ordinal: Categories that have an order. For example, pain scale (1-10), cancer staging (1-4)
Quantitative
• Discrete: Restricted to specific, separate values, usually whole-number counts. For example, number of live births, number of people who attended dental clinics
• Continuous: Can take any value along a continuous scale, such as height, weight and periodontal pocket depth
Be familiar with relative and cumulative frequencies
- Frequency is the number of times an event occurs
- Relative frequency: the number of times an event occurs divided by the total number of observations (or trials). Usually expressed as a proportion or a percentage
- Cumulative frequency: the running total of frequencies up to a given value; used to determine the number of observations that lie above (or below) a particular value in a data set
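A minimal Python sketch of a frequency table with relative and cumulative frequencies. The `frequency_table` helper and the tooth-count data are hypothetical, chosen only to illustrate the definitions above.

```python
from collections import Counter
from itertools import accumulate

def frequency_table(values):
    """Hypothetical helper: rows of (value, frequency, relative %, cumulative frequency)."""
    counts = Counter(values)
    total = len(values)
    ordered = sorted(counts)
    freqs = [counts[v] for v in ordered]
    cumulative = list(accumulate(freqs))  # running total: observations at or below each value
    return [(v, f, 100 * f / total, c) for v, f, c in zip(ordered, freqs, cumulative)]

# Hypothetical data: number of decayed teeth recorded for 10 patients
decayed = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1]
for row in frequency_table(decayed):
    print(row)
```

The cumulative frequency for the value 1 is 7, so 7 patients have one or fewer decayed teeth and 10 - 7 = 3 have more than one.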
Interpret measures of central tendency and variability; mean, median, mode, standard deviation, percentiles, interquartile range
Mean
• The arithmetic average: the sum of all values divided by the number of values
• It is greatly affected by outliers
Median
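• The middle value when the data are arranged in order (the average of the two middle values if there is an even number of observations)
• Preferred when there are extreme values or the data are skewed
• Not greatly affected by outliers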
Mode
• The number that occurs most often in the data set
Standard deviation
• A measure of how spread out numbers are from the mean
Percentiles
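• The value below which a given percentage of observations fall. For example, 90% of values lie below the 90th percentile; the median is the 50th percentile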
Interquartile range
• Covers the middle 50% of the data (between the 25th and 75th percentiles)
• IQR is Q3 minus Q1
• Tells how spread out the “middle” values are
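A short sketch using Python's standard `statistics` module to compute each measure on a hypothetical sample with one outlier. The `describe` helper and the waiting-time data are illustrative assumptions. Quartile values depend on the interpolation method; this uses the module's default (`"exclusive"`), and other software may give slightly different Q1/Q3.

```python
import statistics

def describe(values):
    """Hypothetical helper: central tendency and spread for one sample."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points (default 'exclusive' method)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "q1": q1,                        # 25th percentile
        "q3": q3,                        # 75th percentile
        "iqr": q3 - q1,                  # Q3 minus Q1
    }

# Hypothetical waiting times in minutes; 60 is an outlier
waits = [5, 7, 7, 8, 10, 12, 60]
print(describe(waits))
```

The single outlier pulls the mean up to about 15.6 minutes while the median stays at 8. Dropping the 60 moves the mean to about 8.2 but the median only to 7.5, which is why the median is preferred for skewed data.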
Understand the term confidence interval
- A range of values we are fairly sure our true value lies in
- 95% confidence interval = a range of values that we can be 95% confident contains the true population value
- Large sample size gives a narrow 95% CI = precise estimate of effect
- Small sample size gives a wide 95% CI = imprecise estimate of effect
- If the confidence intervals of two samples do not overlap, the difference between them is statistically significant. Overlapping intervals suggest, but do not prove, that the difference is not statistically significant
- If a confidence interval contains the no-effect value (0 for a difference, 1 for a ratio), then the result is not statistically significant
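A minimal sketch of how sample size changes the width of a 95% CI for a mean, assuming the normal approximation (mean ± 1.96 × SD/√n). For small samples a t-based interval is usually preferred. The function names and the pocket-depth numbers are hypothetical.

```python
import math
from statistics import NormalDist

def ci95_mean(mean, sd, n):
    """Approximate 95% CI for a mean: mean ± z * SD / sqrt(n) (normal approximation)."""
    z = NormalDist().inv_cdf(0.975)  # ≈ 1.96
    se = sd / math.sqrt(n)           # standard error shrinks as n grows
    return mean - z * se, mean + z * se

def is_significant(ci, no_effect_value):
    """Significant at the 5% level only if the CI excludes the no-effect value."""
    low, high = ci
    return not (low <= no_effect_value <= high)

# Hypothetical: mean reduction in pocket depth of 0.5 mm with SD 1.2 mm
small = ci95_mean(0.5, 1.2, n=16)   # wide interval that includes 0
large = ci95_mean(0.5, 1.2, n=400)  # narrow interval that excludes 0
print(small, is_significant(small, 0))
print(large, is_significant(large, 0))
```

The same effect size is not significant with 16 participants (about -0.09 to 1.09 mm) but is with 400 (about 0.38 to 0.62 mm).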
Interpret the meaning of prevalence
- The proportion of people in a population who have the disease at a given time
- Number of people affected / total population at that time
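A one-line worked example of the prevalence formula, using hypothetical survey numbers:

```python
def prevalence(affected, population):
    """Point prevalence: existing cases / total population at that time."""
    return affected / population

# Hypothetical survey: 120 of 800 adults have periodontitis on the survey day
p = prevalence(120, 800)
print(f"{p:.1%}")  # 15.0%
```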
Interpret incidence in terms of its meaning, how it is determined, cumulative incidence and incidence rate
Incidence
• Number of new cases of a disease that develop in an at-risk population over a specified period
How it is determined
• Determined by following at-risk individuals for a period of time to see who develops the disease
• Individuals who are not at risk (e.g. those who already have the disease) are excluded from the study
• Because incidence involves following people over time to see who becomes diseased, the length of follow-up has to be accounted for
Cumulative incidence
• Total number of new cases during the period / population at risk at the start of the period
Incidence rate
• Rate of occurrence of new cases over a given time
• Number of new cases/ person-time
• Each at-risk member contributes the time they were followed until they were diagnosed (or until follow-up ended, if they never developed the disease)
• Person-time is the sum of total time contributed by all subjects.
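A minimal sketch contrasting cumulative incidence with incidence rate on a hypothetical five-person cohort. Each person is recorded as (years followed, became a case); the data and function names are illustrative only.

```python
def cumulative_incidence(new_cases, at_risk_at_start):
    """New cases during the period / people at risk at the start of the period."""
    return new_cases / at_risk_at_start

def incidence_rate(follow_up):
    """New cases / total person-time.

    follow_up holds (years_followed, became_case) for each at-risk person; each person
    contributes time until diagnosis, or until follow-up ended if never diagnosed.
    """
    cases = sum(1 for _, became_case in follow_up if became_case)
    person_time = sum(years for years, _ in follow_up)
    return cases / person_time

# Hypothetical cohort of 5 disease-free people
cohort = [(2.0, True), (5.0, False), (5.0, False), (1.5, True), (4.5, False)]
new_cases = sum(1 for _, case in cohort if case)
cum_inc = cumulative_incidence(new_cases, len(cohort))  # 2 / 5 = 0.4 over the period
rate = incidence_rate(cohort)                           # 2 cases / 18 person-years
print(cum_inc, rate)
```

The incidence rate is about 0.11 cases per person-year (11 per 100 person-years); it uses the 18 person-years actually observed rather than assuming everyone was followed for the full 5 years.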
Understand the term “person-years”
1/35 person-years
• Means that 1 new case occurs for every 35 person-years of observation
• 1 new case is expected to occur if 35 people are followed for 1 year
• 1 new case is expected to occur if 5 people are followed for 7 years
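The person-years arithmetic above can be checked with exact fractions. This sketch assumes a constant incidence rate and that everyone is followed for the full period; `expected_cases` is a hypothetical helper.

```python
from fractions import Fraction

rate = Fraction(1, 35)  # 1 new case per 35 person-years

def expected_cases(rate, people, years):
    """Expected new cases = incidence rate × person-time (people × years each)."""
    return rate * people * years

print(expected_cases(rate, 35, 1))  # 1
print(expected_cases(rate, 5, 7))   # 1
```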
State the importance of understanding prevalence and incidence
- Prevalence helps determine needs of a community in treating that disease
- Incidence helps understand the causes of disease and the effectiveness of prevention programs
Interpret number needed to treat (NNT)
• Quantifies how many patients have to be given a new therapy for a particular duration so that one additional patient benefits compared with another therapy (or control)
• NNT = 1 / absolute risk reduction (rounded up to the next whole number)
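A small sketch of NNT from event counts in two trial arms. Exact fractions are used so that rounding up is not thrown off by floating-point error; the trial numbers and function name are hypothetical.

```python
import math
from fractions import Fraction

def number_needed_to_treat(events_control, n_control, events_treated, n_treated):
    """NNT = 1 / absolute risk reduction, rounded up to a whole patient."""
    risk_control = Fraction(events_control, n_control)
    risk_treated = Fraction(events_treated, n_treated)
    arr = risk_control - risk_treated  # absolute risk reduction
    if arr <= 0:
        raise ValueError("new therapy shows no absolute risk reduction")
    return math.ceil(1 / arr)

# Hypothetical trial: 40/200 controls vs 30/200 treated patients have the outcome
print(number_needed_to_treat(40, 200, 30, 200))  # 20
```

Here the ARR is 20% - 15% = 5%, so about 20 patients must receive the new therapy for one extra patient to benefit.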
Interpret Risk ratio and the values, including confidence intervals
Risk ratio
• Relative risk: the ratio of the probability of an event occurring in the exposed group to the probability of the event occurring in the non-exposed group
RR >1: exposed group has higher risk of getting outcome
RR = 1: no difference
RR < 1: exposed group has lower risk of getting outcome
If a confidence interval contains 1 = no statistical significance
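A sketch of a risk ratio with an approximate 95% CI, assuming the common log (Katz) method and no zero cells. The cohort counts and function name are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # ≈ 1.96

def risk_ratio(exp_events, exp_total, unexp_events, unexp_total):
    """Risk ratio with an approximate 95% CI (log method); assumes no zero cells."""
    rr = (exp_events / exp_total) / (unexp_events / unexp_total)
    se_log = math.sqrt(1 / exp_events - 1 / exp_total + 1 / unexp_events - 1 / unexp_total)
    low = math.exp(math.log(rr) - Z95 * se_log)
    high = math.exp(math.log(rr) + Z95 * se_log)
    return rr, (low, high)

# Hypothetical cohorts of 100 exposed and 100 unexposed people
print(risk_ratio(30, 100, 15, 100))  # RR = 2, CI about 1.15 to 3.48: excludes 1
print(risk_ratio(12, 100, 10, 100))  # RR = 1.2, CI contains 1: not significant
```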
Interpret Risk difference (attributable risk and absolute risk reduction) and the values, including confidence intervals
- Difference (subtraction) between the risk of an outcome in the exposed group and the unexposed group
- Attributable risk: the extra risk in the exposed group compared with the non-exposed group (when exposure increases risk)
- Absolute risk reduction: the lower risk in the exposed group compared with the non-exposed group (e.g. an intervention to prevent death from a disease)
0 = there is no difference in risk between the two groups
• If a confidence interval contains 0 = no statistical significance
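A sketch of a risk difference with an approximate 95% Wald CI (a simple large-sample method; other interval methods exist). The counts and function name are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # ≈ 1.96

def risk_difference(exp_events, exp_total, unexp_events, unexp_total):
    """Risk difference (exposed minus unexposed) with an approximate 95% Wald CI."""
    p1 = exp_events / exp_total
    p0 = unexp_events / unexp_total
    rd = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / exp_total + p0 * (1 - p0) / unexp_total)
    return rd, (rd - Z95 * se, rd + Z95 * se)

# Hypothetical cohorts of 100 exposed and 100 unexposed people
print(risk_difference(30, 100, 15, 100))  # RD = 0.15, CI excludes 0
print(risk_difference(12, 100, 10, 100))  # RD = 0.02, CI contains 0
```

A positive difference corresponds to attributable risk; when an intervention lowers risk the difference is negative, and its size is the absolute risk reduction.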
Interpret Odds Ratio (OR)
- Odds of a disease occurring in one group compared with the odds of it occurring in another group
- Odds compare events with non events. If a horse wins 2 out of every 5 races, its odds of winning are 2 to 3 (expressed as 2:3)
OR = 1: Exposure not associated with the odds of outcome
OR > 1: Exposure associated with higher odds of outcome
OR < 1: Exposure associated with lower odds of outcome
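A sketch of odds and an odds ratio from a 2x2 table, with an approximate 95% CI using the Woolf (log) method and assuming no zero cells. The table layout, counts, and function names are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # ≈ 1.96

def odds(events, total):
    """Odds = events / non-events (2 wins out of 5 races -> 2/3)."""
    return events / (total - events)

def odds_ratio(a, b, c, d):
    """OR = (a*d)/(b*c) with an approximate 95% CI (Woolf log method); assumes no zero cells.

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(or_) - Z95 * se_log)
    high = math.exp(math.log(or_) + Z95 * se_log)
    return or_, (low, high)

print(odds(2, 5))                  # the horse example: 2:3 ≈ 0.667
print(odds_ratio(40, 20, 60, 80))  # OR ≈ 2.67, CI about 1.42 to 5.02
```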
Understand and interpret probability value (P value)
- Null hypothesis: states that there is no difference or association between the variables being investigated, so any observed difference is due to chance
- If P value is low ==> the null must go!
- If p value is high ==> the null’s your guy!
- Thus a small p-value indicates strong evidence against the null hypothesis (a statistically significant result)
- P value ≤ 0.05 = difference is considered to be statistically significant
- P-value > 0.05 = Result is not statistically significant
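A sketch of turning a test statistic into a two-sided p-value, assuming a z statistic that follows a standard normal distribution under the null hypothesis. The function names are hypothetical.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal null distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p, alpha=0.05):
    """Apply the 0.05 cut-off from the card above."""
    return "significant" if p <= alpha else "not significant"

print(two_sided_p(2.5), decide(two_sided_p(2.5)))  # p ≈ 0.012: the null must go
print(two_sided_p(1.0), decide(two_sided_p(1.0)))  # p ≈ 0.32: the null stays
```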
Understand the Chi-square (χ²) test
- Only works for categorical data
- Tests whether there is a relationship (association) between two categorical variables
- Gives a ‘p’ value to help decide statistical significance
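A minimal sketch of the Pearson chi-square statistic for a table of counts, without Yates' continuity correction. The p-value shortcut below is only valid for 1 degree of freedom (a 2x2 table), where χ² is the square of a standard normal variable; larger tables need the chi-square distribution, for example via `scipy.stats.chi2_contingency`, which applies Yates' correction to 2x2 tables by default, so its numbers will differ slightly. The table and function names are hypothetical.

```python
import math
from statistics import NormalDist

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # count expected if no association
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

def p_value_df1(chi2):
    """p-value for 1 degree of freedom only: chi2 with 1 df is a squared standard normal."""
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))

# Hypothetical 2x2 table: rows = exposed / unexposed, columns = outcome yes / no
table = [[30, 70], [15, 85]]
chi2, dof = chi_square(table)
print(chi2, dof, p_value_df1(chi2))  # χ² ≈ 6.45, 1 df, p ≈ 0.011
```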
Understand T-test and ANOVA
T-tests use p-values to determine whether there is a statistically significant difference between the means of two groups. There are two types of t-tests:
Independent samples t-test
• Compares the means of two unrelated groups
Dependent (paired) samples t-test
• Compares means from the same group at different times (say, one year apart)
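A sketch of the two t statistics using only the standard library. The independent version assumes equal variances (Student's pooled t); the paired version is a one-sample t-test on the within-person differences. The p-value comes from the t distribution with the returned degrees of freedom, which the standard library does not provide; `scipy.stats.ttest_ind` and `scipy.stats.ttest_rel` return both the statistic and the p-value. Data and function names are hypothetical.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Student's independent-samples t statistic (pooled variance; assumes equal variances)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return t, n_a + n_b - 2  # statistic, degrees of freedom

def paired_t(before, after):
    """Dependent (paired) t statistic: one-sample t-test on the within-subject differences."""
    diffs = [x - y for x, y in zip(before, after)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical scores for two unrelated clinics
print(independent_t([5, 6, 7, 8, 9], [7, 8, 9, 10, 11]))  # t = -2.0 with 8 df
# Hypothetical plaque scores for the same 5 patients, one year apart
print(paired_t([4, 5, 6, 5, 7], [3, 5, 4, 4, 5]))         # t ≈ 3.21 with 4 df
```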
ANOVA:
• t-tests can only compare 2 means
• ANOVA can compare the means of more than 2 groups
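A sketch of the one-way ANOVA F statistic: the variance between group means divided by the variance within groups. A large F suggests at least one group mean differs. As with the t-test, the p-value needs the F distribution (for example `scipy.stats.f_oneway`, which returns F and p). Data and function name are hypothetical.

```python
import statistics

def one_way_anova_f(*groups):
    """One-way ANOVA F statistic and its (between, within) degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    means = [statistics.mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    ms_between = ss_between / (k - 1)  # variance between group means
    ms_within = ss_within / (n - k)    # pooled variance within groups
    return ms_between / ms_within, (k - 1, n - k)

# Hypothetical scores for three clinics
print(one_way_anova_f([5, 6, 7, 8, 9], [7, 8, 9, 10, 11], [9, 10, 11, 12, 13]))  # F = 8.0, (2, 12)
```

With exactly two groups, F equals the square of the pooled independent-samples t statistic, so ANOVA and the t-test give the same answer in that case.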