Descriptive STATS Flashcards
Validity
The degree to which a screening test or other data collection tool measures what it is intended to measure.
Categorical Data
Qualitative or discrete. Fit into mutually exclusive groups.
Nominal- categories lack any logical order (e.g., sex)
Ordinal- categories have a logical order (e.g., age group)
Quantitative Data
Numeric and measure concepts like “how many” or “how much”
Discrete- age in years
Continuous- temperature
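In a normal distribution, about 68% of values fall within 1 standard deviation of the mean.
About 95% of values fall within 2 standard deviations of the mean.
Values more than 3 standard deviations from the mean occur about 0.3% of the time.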
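The percentages for 1, 2, and 3 standard deviations can be checked against the normal distribution. This is a minimal sketch using Python's standard-library statistics.NormalDist, assuming the data are normally distributed; the helper name share_within is made up for illustration.

```python
from statistics import NormalDist

def share_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    inside = share_within(k)
    print(f"within {k} SD: {inside:.1%}   beyond {k} SD: {1 - inside:.1%}")
```

The flashcard figures are rounded versions of what this prints.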
Ratios
Converting to ratios allows us to incorporate denominators that help account for differences between the groups we want to compare, such as how long surveillance was performed or how many patients each group includes.
Proportions
Compares a numerator and denominator when the numerator is included in the denominator
Rates
A rate includes a unit of time, which shows how fast events are occurring.
Example denominators: patient days or central line days.
When calculating a rate, the denominator does not have to be related to the numerator as it does for a proportion, but it should include only the population at risk for the event counted in the numerator.
Including individuals who are not at risk in the denominator can make it appear that events of interest happen less often than they do.
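As a rough illustration of the rate calculation, the sketch below expresses events per 1,000 days at risk. All counts are hypothetical, and the helper name rate_per_1000 is made up for this example. The second call shows how a denominator that includes days when no central line was in place (not at risk) makes the rate look lower.

```python
def rate_per_1000(events: int, days_at_risk: int) -> float:
    """Events per 1,000 days at risk (e.g., central line days)."""
    return events * 1000 / days_at_risk

# Hypothetical unit: 4 central line infections in one month.
infections = 4
central_line_days = 2_000  # days patients actually had a central line
all_patient_days = 5_000   # also counts days with no central line in place

correct_rate = rate_per_1000(infections, central_line_days)
diluted_rate = rate_per_1000(infections, all_patient_days)

print(f"per 1,000 central line days: {correct_rate:.1f}")
print(f"per 1,000 patient days:      {diluted_rate:.1f}")
```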
Incidence proportion
Also known as cumulative incidence. A person-based calculation that incorporates the total population at risk who can be newly counted as cases during the specified time period.
Attack rates, which represent the risk of acquiring a disease during an outbreak, are also incidence proportions.
Incidence Rate
Also known as incidence density. Generally a more precise estimate of the impact of these events.
It incorporates the amount of time that each person was actually at risk rather than treating everyone as if they were at risk for the entire time period the way the incidence proportion does.
Example denominators: patient days or device days, such as urinary catheter days.
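The difference between the incidence proportion and the incidence rate can be seen by computing both from the same data. The cohort below is hypothetical (five catheterized patients), and the variable names are illustrative only.

```python
# Hypothetical cohort: (urinary catheter days at risk, developed infection?)
patients = [(10, False), (4, True), (7, False), (2, True), (7, False)]

cases = sum(1 for _, infected in patients if infected)
persons_at_risk = len(patients)
catheter_days = sum(days for days, _ in patients)

# Person-based: treats everyone as at risk for the whole period.
incidence_proportion = cases / persons_at_risk

# Time-based: uses the days each person was actually at risk.
incidence_rate = cases * 1000 / catheter_days

print(f"incidence proportion: {incidence_proportion:.0%}")
print(f"incidence rate: {incidence_rate:.1f} per 1,000 catheter days")
```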
Standardized rates are used to compare event rates of different groups, such as CAUTI rates for two hospitals.
To do this accurately, we must account for issues that could confound the comparison.
Rates are risk adjusted using direct or indirect standardization.
Indirect standardization
Uses standard event rates that are applied to each group’s population.
Example: the standardized infection ratio (SIR) used by CDC NHSN.
Direct Standardization
Uses a standard population to which the observed event rates of each group are applied.
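The two approaches can be contrasted with a small sketch. Everything here is hypothetical: the strata (ICU and ward), the rates, the day counts, and the helper name expected_events. It is a simplified illustration of the idea and not how NHSN actually derives its expected counts.

```python
def expected_events(rates_per_1000: dict, days: dict) -> float:
    """Apply stratum-specific rates (per 1,000 days) to stratum-specific days."""
    return sum(rates_per_1000[s] * days[s] / 1000 for s in rates_per_1000)

# Indirect: standard rates applied to the hospital's own population.
standard_rates = {"ICU": 3.0, "ward": 1.0}    # hypothetical benchmark rates
hospital_days = {"ICU": 2_000, "ward": 4_000}  # hospital's catheter days
observed = 12                                  # hospital's observed infections
sir = observed / expected_events(standard_rates, hospital_days)

# Direct: the hospital's observed rates applied to a standard population.
hospital_rates = {"ICU": 4.0, "ward": 1.0}     # hypothetical observed rates
standard_days = {"ICU": 1_000, "ward": 3_000}  # hypothetical standard population
direct_rate = (
    expected_events(hospital_rates, standard_days)
    * 1000 / sum(standard_days.values())
)

print(f"SIR (indirect): {sir:.2f}")
print(f"directly standardized rate: {direct_rate:.2f} per 1,000 days")
```

An SIR above 1 means more events were observed than the standard rates would predict for that population.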
Confounding (lurking) variables can imply a false association or hide a real one.
These are variables that affect the analysis findings but are not accounted for in the analysis.
Correlation
-1= a perfect negative relationship
0 no relationship
+1= a perfect positive relationship
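A correlation coefficient on this -1 to +1 scale can be computed by hand. The sketch below implements the Pearson correlation coefficient, which is an assumption since the card does not name a specific coefficient; the data and the helper name pearson_r are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

x = [1, 2, 3, 4]
rising = [2, 4, 6, 8]   # increases in step with x
falling = [8, 6, 4, 2]  # decreases in step with x

print(pearson_r(x, rising))
print(pearson_r(x, falling))
```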
Relative Risk (aka Risk Ratio), used with prospective studies
Compares the risk of an event occurring in an exposed group to the risk of it occurring in an unexposed group.
Incidence proportion:
Exposed group: 9 cases / 14 exposed = 64%
Unexposed group: 6 cases / 28 unexposed = 21%
RR = 64% / 21% = 3.0
RR values range from zero to infinity
“If someone is exposed to a specific risk factor, what is the risk they will have the outcome of interest?”
E.g., RR = 3: those exposed were three times as likely to have the outcome as those who were not exposed.
RR < 1: Risk of outcome higher in the group without exposure; hence, the risk factor appears protective.
RR = 1: Risk of outcome the same in both groups. No apparent impact.
RR > 1: Risk of outcome higher in the group with exposure. Positive association.
The farther the RR is from 1, the stronger the association.
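The relative risk calculation on this card can be reproduced in a few lines. The counts are the ones given on the card (9 of 14 exposed, 6 of 28 unexposed); the helper name relative_risk is made up for illustration.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

rr = relative_risk(9, 14, 6, 28)
print(f"risk exposed:   {9 / 14:.0%}")
print(f"risk unexposed: {6 / 28:.0%}")
print(f"RR = {rr:.1f}")
```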
Attributable Proportion formula
AP = (RR - 1) / RR
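A quick worked example of the formula, using RR = 3 from the relative risk card. The function name attributable_proportion is illustrative; the result is read as the share of risk among the exposed that is attributable to the exposure.

```python
def attributable_proportion(rr: float) -> float:
    """AP = (RR - 1) / RR, for a relative risk greater than zero."""
    return (rr - 1) / rr

ap = attributable_proportion(3.0)
print(f"AP = {ap:.1%}")  # about two thirds when RR = 3
```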
Odds ratio for case-control studies
If someone has the outcome of interest, what are the odds that they’ve been exposed to the risk factor?
When the outcome of interest is rare, the OR in a case-control study will approximate the RR in a cohort study.
OR = (odds of exposure in the group with the outcome, i.e., cases) / (odds of exposure in the group without the outcome, i.e., controls)
Odds of exposure = # of exposed divided by # of unexposed, calculated separately for each group.
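The odds ratio steps above can be sketched as follows. The case and control counts are hypothetical, and the helper names are made up for this example.

```python
def odds_of_exposure(exposed: int, unexposed: int) -> float:
    """Number exposed divided by number unexposed within one group."""
    return exposed / unexposed

def odds_ratio(case_exposed, case_unexposed, control_exposed, control_unexposed):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (
        odds_of_exposure(case_exposed, case_unexposed)
        / odds_of_exposure(control_exposed, control_unexposed)
    )

# Hypothetical case-control study.
cases = {"exposed": 30, "unexposed": 10}
controls = {"exposed": 20, "unexposed": 40}

or_value = odds_ratio(
    cases["exposed"], cases["unexposed"],
    controls["exposed"], controls["unexposed"],
)
print(f"odds of exposure, cases:    {odds_of_exposure(30, 10):.1f}")
print(f"odds of exposure, controls: {odds_of_exposure(20, 40):.1f}")
print(f"OR = {or_value:.1f}")
```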
Validity
“Does this measurement actually measure what it claims to?”
Sensitivity
80%
If a person has influenza, the test result will be positive 80% of the time.
If someone has the outcome, what is the likelihood the test will be positive?
(# of true positive results)/ (# of individuals with outcome) x 100%
Specificity
91%
If a person does not have influenza, the test result will be negative 91% of the time.
If someone does not have the outcome, what is the likelihood the test will be negative?
(# of true negatives)/ (# of individuals without the outcome) x 100%
Positive predictive Value (PPV)
85%
If the test result is positive, the patient will actually have influenza 85% of the time.
If the test result is positive, what is the likelihood that the person truly has the outcome?
(# of true positives)/ (# of individuals with positive results) x 100%
Negative predictive value (NPV)
87%
If the test result is negative, the patient will truly not have influenza 87% of the time.
If the test is negative, what is the likelihood that the person does not have the outcome?
(# of true negatives)/ (# of individuals with negative results) x 100%.
PPV and NPV are only accurate if the test is used during influenza season.
Sensitivity and specificity don’t change based on the time of year- they are characteristics of the test itself and the test is not changing.
PPV and NPV are impacted by the prevalence of the outcome in the population tested. During the off season, the flu test could have very different values for PPV and NPV.
The more of an outcome there is to find, the more likely it is that a positive result will be accurate.
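The effect of prevalence on PPV and NPV can be illustrated with the four count-based formulas from the cards. The sensitivity (80%) and specificity (91%) come from the cards; the two prevalence values, the population size, and the function names are assumptions made for this sketch. A prevalence of 39% was chosen because it gives a PPV close to the card's 85%.

```python
def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from the four 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def counts_for(prevalence, sensitivity=0.80, specificity=0.91, n=10_000):
    """Expected (tp, fp, fn, tn) in a hypothetical population of n people tested."""
    with_outcome = n * prevalence
    without_outcome = n - with_outcome
    tp = with_outcome * sensitivity
    tn = without_outcome * specificity
    return tp, without_outcome - tn, with_outcome - tp, tn

in_season = screening_metrics(*counts_for(prevalence=0.39))   # assumed prevalence
off_season = screening_metrics(*counts_for(prevalence=0.02))  # assumed prevalence

for label, m in (("in season", in_season), ("off season", off_season)):
    print(f"{label}: sens {m['sensitivity']:.0%}, spec {m['specificity']:.0%}, "
          f"PPV {m['ppv']:.0%}, NPV {m['npv']:.0%}")
```

Sensitivity and specificity are identical in both runs, while PPV falls sharply and NPV rises when the outcome is rare.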
Axes
Each graph has an x axis and a y axis, and certain data types are traditionally placed on specific axes.
X (horizontal) axis: Can be categorical or quantitative data. Time is traditionally put on the x axis (months or years).
Y (vertical) axis: Nearly always quantitative data (infection count or rate).
Histograms are used to examine the distribution of a variable; bar charts are used to compare groups.
Histograms have a quantitative variable along the x axis; bar charts have a categorical variable.
Lines connect a series of quantitative values, turning many data points into a single visual element.
Excellent for depicting time series data.
May also be used to link cumulative percent values in Pareto Charts.
Points
Points are helpful when y axis data do not have an assumed minimum of zero. They do not rely on size to emphasize the relationships being shown.
Pareto Chart
Combines a bar chart and a line chart to show how much each category contributes to the whole.
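The numbers behind a Pareto chart are the category counts sorted from largest to smallest (the bars) plus a running cumulative percent (the line). The sketch below computes that table without drawing anything; the infection counts are hypothetical.

```python
from itertools import accumulate

# Hypothetical infection counts by type.
counts = {"CAUTI": 12, "CLABSI": 5, "SSI": 20, "VAP": 3}

# Bars: categories ordered from most to least frequent.
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Line: running total expressed as a percent of all events.
total = sum(counts.values())
running = accumulate(count for _, count in ordered)
rows = [
    (name, count, cumulative * 100 / total)
    for (name, count), cumulative in zip(ordered, running)
]

for name, count, cum_pct in rows:
    print(f"{name:<7}{count:>4}{cum_pct:>8.1f}%")
```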
Scatterplot
Used to explore correlations between two quantitative variables.