Data and Statistics Flashcards
Define data
Data
“Raw” numbers
Counts of individual events or services
Collected at local, state, national, or international level
Data Sets
Data collected and arranged logically according to definite criteria
May/may not represent entire population
Define statistics
Statistics
- Data already analyzed and summarized
- Information presented as text, figures, graphs, or maps
- Not always available freely or publicly on the Web
- Most data on Websites is compiled statistics
What are some key features of health statistics?
- Population based
- Measure a wide range of health indicators for a population
- Entire U.S., state, county, city, zip code
- Often collected and analyzed over a period of time
- Include different types of data
- Vital (birth, death, marriage, divorce)
- Morbidity & mortality
- Use and cost of health care
What are some uses of health statistics?
- Provide key indicators about life and health in a particular region.
- Gauge disparities
- Measure progress
- Disease occurrence and potential
- Identify prevention targets
- Help with public health program planning and evaluation.
- Monitor progress
- Measure health care costs
- Mobilize activities
- Plan for resource allocation
- Used in creation of health policy and legislation.
What are some ways that you can assess the quality of data?
- Nature (source) of the data
- Availability of the data
- Validity and reliability of measures
- Completeness of population coverage
- Strengths and limitations of study design
Data will break your heart!
- Statistics are collected to meet the needs of the collector!
- Studies replicate previous findings
- Collected data is imperfect… but we still act on it
- It is collected by someone with a bias/incentive to lie
Define reliability
- Reliable = consistent
- Overall consistency of a measure
- A reliable measure is one that is relatively free from measurement error
- A scale is reliable if it yields consistent results over repeated applications in a short time frame
- Same survey given at two times of the day that gives different results (low correlation) for the same participants is not reliable
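One way to check test-retest reliability is to correlate the two administrations of the survey. The sketch below computes a Pearson correlation by hand; the scores and the helper name `pearson_r` are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical scores for the same five participants, surveyed twice in one day
morning = [12, 15, 9, 20, 17]
evening = [13, 14, 10, 19, 18]

r = pearson_r(morning, evening)
print(round(r, 2))  # a value near 1 suggests the survey is consistent
```

A low correlation between the two administrations would point to an unreliable measure.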
Define validity
- Validity = accuracy
- Scale is valid if it measures what it intends to measure without systematic error
- Is health status truly captured by the measure?
- Usually involves comparing two different measures of the same phenomenon (e.g., self-rating scales versus physician’s assessment)
What are three things that go into the completeness of data?
- Representativeness: degree to which a sample resembles a parent population
- Generalizability (external validity): ability to apply findings to a population that did not participate in the study
- Thoroughness: care taken to identify all cases of a given disease
How do you find data?
- Formulate the question
- Choose the best resource for the question
- Evaluate the results
- Repeat as often as necessary
List some sources of health data
- Statistics from vital registration system
- Reportable disease statistics
- Insurance data
- Clinical data sources
- School health programs
- Reports from health organizations (e.g., CDC, WHO), advocacy groups
- Economic data
What are the four types of data?
- Nominal
- Ordinal
- Interval
- Ratio
Define nominal data
a measurement scale consisting of qualitative categories whose values have no inherent statistical order or rank (e.g., categories of race/ethnicity, religion, or country of birth).
Define ordinal data
a measurement scale consisting of qualitative categories whose values have a distinct order but no numerical distance between their possible values (e.g., stage of cancer, I, II, III, or IV).
Define interval
a measurement scale consisting of quantitative categories whose values are measured on a scale of equally spaced units, but without a true zero point (e.g., date of birth).
Define ratio
a measurement scale consisting of quantitative categories whose values are intervals with a true zero point (e.g., height in centimeters or duration of illness).
Define normal distribution
a symmetric, bell-shaped distribution of a continuous variable in which the mean, median, and mode are equal and most observations cluster around the center.
What are skewed distributions?
Observations “clustered” at one end of the scale
The presence of outliers
CHECK PIC IN NOTES
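A quick way to spot skew is to compare the mean with the median: an outlier pulls the mean toward it, while the median barely moves. A small sketch with made-up lengths of hospital stay:

```python
from statistics import mean, median

# Hypothetical lengths of stay in days; most are short, one is an outlier
stays = [2, 3, 3, 4, 4, 5, 60]

stay_mean = mean(stays)      # pulled upward by the 60-day stay
stay_median = median(stays)  # middle value, unaffected by the outlier

print(stay_mean, stay_median)
# Mean well above the median suggests a right (positive) skew
print(stay_mean > stay_median)
```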
What are paranormal distributions
Distributions that don’t exist
Define confidence intervals
Goal: capture the true value (e.g., the true mean) most of the time.
A 95% confidence interval should include the true value about 95% of the time.
A 99% confidence interval should include the true value about 99% of the time.
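The "about 95% of the time" idea can be checked by simulation. The sketch below draws many samples from a population whose true mean is known, builds an interval around each sample mean, and counts how often the interval contains the true mean. The population values, the function name `coverage`, and the assumption that the standard deviation is known (so a z-interval applies) are all illustrative choices.

```python
import random
from math import sqrt

def coverage(z, trials=2000, n=30, true_mean=50.0, sd=10.0, seed=1):
    """Fraction of simulated intervals (mean +/- z * sd / sqrt(n))
    that contain the true mean. Assumes the population sd is known."""
    rng = random.Random(seed)
    half_width = z * sd / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        sample_mean = sum(sample) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials

print(coverage(1.96))   # z for a 95% interval; expect a fraction near 0.95
print(coverage(2.576))  # z for a 99% interval; expect a fraction near 0.99
```

The 99% interval is wider, which is the price of capturing the true value more often.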
Define p-value
The p-value is the probability that we would have seen our data (or something more extreme) just by chance, assuming there is no real effect.
P-values of
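As a worked illustration of "our data or something more unexpected", the sketch below computes an exact one-sided p-value for a coin-flip experiment: the chance of seeing at least `k` heads in `n` flips if the coin is fair. The scenario and the function name `p_value_at_least` are assumptions for the example, not part of the notes.

```python
from math import comb

def p_value_at_least(k, n, p=0.5):
    """Probability of k or more successes in n trials when each
    trial succeeds with probability p (one-sided binomial p-value)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical result: 9 heads in 10 flips of a supposedly fair coin
p = p_value_at_least(9, 10)
print(round(p, 4))  # small value: 9+ heads would rarely happen by chance
```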
Define the measures of central tendency
Mean: the average
Median: the middle value
Mode: the most frequent value
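All three measures are available in Python's standard `statistics` module. The values below are made up for illustration.

```python
from statistics import mean, median, mode

# Hypothetical number of clinic visits for five patients
visits = [2, 3, 3, 5, 7]

print(mean(visits))    # average: (2 + 3 + 3 + 5 + 7) / 5
print(median(visits))  # middle value once sorted
print(mode(visits))    # most frequent value
```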
What are measures of association?
Strength of the association between two variables, such as an exposure & an outcome (disease)
Two measures of association used most often are the risk ratio (RR) & the odds ratio (OR)
Interpretation of RR and OR:
RR or OR = 1: exposure has no association with disease
RR or OR > 1: exposure may be positively associated with disease
RR or OR < 1: exposure may be negatively associated with disease
Define the risk ratio
- Used when comparing outcomes of those who were exposed to something to those who were not exposed
- Calculated in cohort studies
- Cannot be calculated in case-control studies because the entire population at risk is not included in the study
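A minimal sketch of the risk ratio from cohort counts: the risk of disease among the exposed divided by the risk among the unexposed. The counts and the function name `risk_ratio` are hypothetical.

```python
def risk_ratio(exposed_sick, exposed_well, unexposed_sick, unexposed_well):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_sick / (exposed_sick + exposed_well)
    risk_unexposed = unexposed_sick / (unexposed_sick + unexposed_well)
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed got sick
rr = risk_ratio(30, 70, 10, 90)
print(round(rr, 2))  # above 1, so exposure may be positively associated
```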
Define the odds ratio
- Used in case-control studies
- Odds of exposure among cases divided by odds of exposure among controls
- Provides a rough estimate of the risk ratio, especially when the disease is rare
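A minimal sketch of the odds ratio from case-control counts, following the definition above: odds of exposure among cases divided by odds of exposure among controls. The counts and the function name `odds_ratio` are hypothetical.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls

# Hypothetical study: 40 of 100 cases and 20 of 100 controls were exposed
or_estimate = odds_ratio(40, 60, 20, 80)
print(round(or_estimate, 2))  # above 1, so exposure may be positively associated
```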
Look at the 2x2 table
http://hihg.med.miami.edu/code/http/modules/education/Design/images/Slide405042.jpg