Data Visualization Flashcards
Why do we need data description?
Raw data are hard to interpret on their own; summary measures condense them so that we can compare groups and units of different sizes.
What is the difference between a percentage and a proportion?
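Proportion: The part divided by the whole, standardized to a base of 1 (p = f / N, where f is the number of cases in a category and N is the total number of cases).
Percentage: The proportion multiplied by 100, standardized to a base of 100 (% = (f / N) × 100).
Both compare a part to the whole.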
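A minimal sketch of the two formulas, using hypothetical counts (12 of 30 students in some category). The function names are illustrative only. The percentage is computed as part × 100 / whole to avoid an extra floating-point rounding step.

```python
def proportion(part: int, whole: int) -> float:
    """Part divided by whole: a value on a base of 1."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


def percentage(part: int, whole: int) -> float:
    """The same comparison rescaled to a base of 100."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part * 100 / whole


# Hypothetical data: 12 of 30 students fall in the category of interest.
print(proportion(12, 30))  # 0.4
print(percentage(12, 30))  # 40.0
```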
When should percentages and proportions be avoided?
When the number of cases is small, because a change of just one or two cases produces a large jump in the result (with N = 5, a single case is worth 20 percentage points).
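To see why small N distorts percentages, one can compute how many percentage points a single case is worth, which is simply 100 / N. This is a small illustrative sketch; `percent_per_case` is a hypothetical helper name.

```python
def percent_per_case(n: int) -> float:
    """Percentage points contributed by one case when the total is n."""
    return 100 / n


# One case moves the percentage a lot when N is small, very little when N is large.
for n in (5, 20, 500):
    print(n, percent_per_case(n))
# 5 20.0
# 20 5.0
# 500 0.2
```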
What is a ratio?
A ratio compares one category of a variable to another category (part-to-part comparison).
Example:
“For every 1 male student, there are 1.2 female students.”
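A ratio divides the frequency of one category by the frequency of another. The sketch below reproduces the example above under an assumed set of hypothetical counts (60 female and 50 male students); `ratio` is an illustrative name.

```python
def ratio(f1: int, f2: int) -> float:
    """Frequency of one category divided by frequency of another (part to part)."""
    if f2 == 0:
        raise ValueError("the comparison category must have at least one case")
    return f1 / f2


# Hypothetical counts: 60 female and 50 male students.
print(ratio(60, 50))  # 1.2 female students for every male student
```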
What is a rate?
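A rate compares the number of actual occurrences of an event to the number of possible occurrences over a given period, and is usually multiplied by a power of 10 (such as 1,000 or 100,000) so the result is easier to read.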
Example:
Crude Death Rate (CDR) = (Number of deaths / Population) × 1,000
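The CDR formula can be sketched as a function with the power-of-10 base as a parameter. The figures (2,500 deaths in a population of 400,000) are hypothetical. Multiplying before dividing keeps this particular result exact in floating point.

```python
def crude_death_rate(deaths: int, population: int, base: int = 1_000) -> float:
    """Deaths per `base` people: (deaths / population) × base."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths * base / population


# Hypothetical data: 2,500 deaths in a population of 400,000 during one year.
print(crude_death_rate(2_500, 400_000))  # 6.25 deaths per 1,000 people
```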
What is a frequency distribution?
A summary that reports how often each value of a variable occurs.
Key properties:
Categories must be mutually exclusive and exhaustive.
Includes a title, category labels, the frequency (count) for each category, and the total number of cases (N).
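A minimal sketch of building a frequency distribution with Python's `collections.Counter`, using hypothetical survey responses. The hypothetical helper `frequency_distribution` rejects values that fall outside the supplied category list (so the categories are exhaustive); each response carries exactly one label, so the categories are mutually exclusive.

```python
from collections import Counter


def frequency_distribution(values, categories):
    """Return [(category, frequency)] in the given order, plus the total N."""
    counts = Counter(values)
    unknown = set(counts) - set(categories)
    if unknown:
        # Categories must be exhaustive: every value needs a home.
        raise ValueError(f"values outside the category list: {unknown}")
    table = [(cat, counts[cat]) for cat in categories]
    return table, sum(f for _, f in table)


# Hypothetical responses to a single survey item.
responses = ["Agree", "Disagree", "Agree", "Neutral", "Agree", "Disagree"]
table, n = frequency_distribution(responses, ["Agree", "Neutral", "Disagree"])

print("Table 1. Responses to survey item (hypothetical data)")
for cat, f in table:
    print(f"{cat:<10}{f:>3}")
print(f"{'Total (N)':<10}{n:>3}")
```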
How do frequency distributions differ by variable type?
Nominal & Ordinal: Simple tally of cases per category.
Interval/Ratio: Often grouped into class intervals, because listing every distinct value would produce too many categories to summarize.
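Grouping interval/ratio data can be sketched as tallying values into equal-width intervals. This assumes whole-number values (for example, ages in years) and equal interval widths; the ages and the helper name `group_into_intervals` are hypothetical.

```python
def group_into_intervals(values, start, width, count):
    """Tally whole-number values into `count` intervals of equal `width`.

    Returns (stated lower limit, stated upper limit, frequency) rows.
    """
    table = []
    for i in range(count):
        low = start + i * width
        high = low + width - 1  # stated upper limit for whole-number data
        f = sum(1 for v in values if low <= v <= high)
        table.append((low, high, f))
    return table


# Hypothetical ages of 12 respondents.
ages = [18, 19, 19, 21, 22, 22, 23, 25, 27, 29, 31, 34]
for low, high, f in group_into_intervals(ages, start=18, width=5, count=4):
    print(f"{low}-{high}: {f}")
# 18-22: 6
# 23-27: 3
# 28-32: 2
# 33-37: 1
```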
What are midpoints, stated limits, and real limits in frequency distributions?
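Midpoint: The value exactly halfway between the lower and upper limits of an interval, (lower + upper) / 2.
Stated limits: The interval boundaries as they appear in the table (e.g., 10-14).
Real limits: The boundaries that close the gaps between adjacent stated limits by treating the variable as continuous; for whole-number data, subtract 0.5 from the lower stated limit and add 0.5 to the upper one (10-14 becomes 9.5-14.5).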
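The three definitions can be checked numerically. This sketch assumes the unit of measurement is 1 (whole numbers), so real limits sit half a unit beyond the stated limits; `real_limits` and `midpoint` are illustrative names. The midpoint comes out the same whether it is computed from the stated or the real limits.

```python
def real_limits(low: float, high: float, unit: float = 1.0):
    """Widen stated limits by half a unit of measurement on each side."""
    half = unit / 2
    return low - half, high + half


def midpoint(low: float, high: float) -> float:
    """Value halfway between an interval's lower and upper limits."""
    return (low + high) / 2


stated = (10, 14)
real = real_limits(*stated)
print(real)               # (9.5, 14.5)
print(midpoint(*stated))  # 12.0
print(midpoint(*real))    # 12.0
```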
What are cumulative frequency and cumulative percentage?
The number (or percentage) of cases at or below each category. They are meaningful only when the categories are ordered, so they are used for ordinal or higher-level data.
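Cumulative columns are running totals down an ordered table. A minimal sketch using `itertools.accumulate`, assuming the rows are already sorted from the lowest to the highest category; the satisfaction counts are hypothetical.

```python
from itertools import accumulate


def cumulative_table(table):
    """Add cumulative frequency and cumulative percentage to ordered (category, f) rows."""
    n = sum(f for _, f in table)
    rows = []
    for (cat, f), cf in zip(table, accumulate(f for _, f in table)):
        rows.append((cat, f, cf, cf * 100 / n))
    return rows


# Hypothetical ordinal data, listed from lowest to highest category.
satisfaction = [("Low", 4), ("Medium", 10), ("High", 6)]
for row in cumulative_table(satisfaction):
    print(row)
# ('Low', 4, 4, 20.0)
# ('Medium', 10, 14, 70.0)
# ('High', 6, 20, 100.0)
```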
Why are graphs important in statistics?
They visually summarize data, show distribution shape, and highlight clustering or patterns.
When should you use a pie chart?
For nominal or ordinal variables.
When the number of categories is small (about four or fewer).
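A pie chart shows each category as a slice whose angle is its proportion × 360 degrees. Computing the angles makes it easier to see why few categories work best: with many categories the slices become thin and hard to compare. The party shares and the helper name `pie_angles` are hypothetical.

```python
def pie_angles(table):
    """Degrees of the circle given to each category: (f / N) × 360."""
    n = sum(f for _, f in table)
    return [(cat, f * 360 / n) for cat, f in table]


# Hypothetical vote counts for four categories.
parties = [("Party A", 45), ("Party B", 30), ("Party C", 15), ("Other", 10)]
for cat, deg in pie_angles(parties):
    print(f"{cat}: {deg:.1f}°")
# Party A: 162.0°
# Party B: 108.0°
# Party C: 54.0°
# Other: 36.0°
```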
When should you use a bar chart?
For nominal or ordinal variables.
When there are more categories (more than about four or five).
Use clustered bar charts to compare a variable across groups, such as the categories of a second variable.
When should you use a histogram?
For interval/ratio variables.
Unlike in a bar chart, the bars touch, showing that the underlying variable is continuous.
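One way to see why histogram bars touch: if each bar is drawn from the lower to the upper real limit of its interval, every bar's right edge coincides with the next bar's left edge. This sketch assumes whole-number data (unit of measurement 1, so real limits are stated limits ± 0.5); the grouped counts and the name `histogram_bars` are hypothetical.

```python
def histogram_bars(intervals, unit=1.0):
    """Convert (stated low, stated high, frequency) rows into bars spanning the real limits."""
    half = unit / 2
    return [(low - half, high + half, f) for low, high, f in intervals]


# Hypothetical grouped frequency distribution.
grouped = [(18, 22, 6), (23, 27, 3), (28, 32, 2), (33, 37, 1)]
bars = histogram_bars(grouped)
for left, right, f in bars:
    print(f"{left}-{right}: {f}")
# 17.5-22.5: 6
# 22.5-27.5: 3
# 27.5-32.5: 2
# 32.5-37.5: 1

# Adjacent bars share an edge, so there are no gaps between them.
print(all(prev[1] == nxt[0] for prev, nxt in zip(bars, bars[1:])))  # True
```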
What is a frequency polygon?
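A line graph for interval/ratio data that plots each interval's frequency as a point above the interval's midpoint and connects the points with straight lines. It conveys the same information as a histogram, using dots and lines instead of bars.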
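The points of a frequency polygon can be computed from a grouped table: one (midpoint, frequency) vertex per interval. The sketch also follows a common convention of closing the polygon on the baseline with zero-frequency points at the midpoints of the empty intervals just below and just above the data. It assumes equal-width, whole-number intervals; the data and the name `polygon_points` are hypothetical.

```python
def polygon_points(intervals):
    """Return (midpoint, frequency) vertices for a frequency polygon.

    Assumes equal-width intervals of whole numbers, given as
    (stated low, stated high, frequency) and ordered from lowest to highest.
    """
    width = intervals[0][1] - intervals[0][0] + 1
    points = [((low + high) / 2, f) for low, high, f in intervals]
    # Close the polygon at zero on the midpoints of the neighbouring empty intervals.
    first_mid, last_mid = points[0][0], points[-1][0]
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]


# Hypothetical grouped frequency distribution.
grouped = [(18, 22, 6), (23, 27, 3), (28, 32, 2), (33, 37, 1)]
print(polygon_points(grouped))
# [(15.0, 0), (20.0, 6), (25.0, 3), (30.0, 2), (35.0, 1), (40.0, 0)]
```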