Unit 1 Flashcards
Categorical data
Data that represents groups or labels (also called qualitative).
Quantitative data
Data that takes on numerical values representing counts or measured amounts.
Discrete vs. continuous variable
Discrete variables take on a countable number of values (with gaps between them), while continuous variables can take on any value in an interval (with no gaps).
Variable
A characteristic that changes from one individual to another
Frequency vs. Relative Frequency
Frequency is the number of individuals in each category, while relative frequency is the proportion or percent of individuals in each category.
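A minimal sketch of the difference, using a made-up sample of eight favorite colors (the data and variable names are assumptions for illustration only):

```python
from collections import Counter

# Hypothetical sample: favorite color reported by 8 students (made-up data)
colors = ["red", "blue", "red", "green", "blue", "red", "blue", "red"]

# Frequency: how many individuals fall in each category
frequency = Counter(colors)

# Relative frequency: each count divided by the total number of individuals
total = sum(frequency.values())
relative_frequency = {color: count / total for color, count in frequency.items()}

print(frequency["red"])           # 4
print(relative_frequency["red"])  # 0.5
```

The relative frequencies always add up to 1 (or 100%), which makes them easier to compare across groups of different sizes.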
Bar graph vs. Histogram
Bar graphs display categorical data (categories on the horizontal axis), while histograms display quantitative data (a number scale on the horizontal axis).
Which graphs most often display categorical data?
Bar graphs & pie charts
Dot plot vs. Stem & leaf plot
Dot plots use one dot per value above a number line to represent numerical data, while stem and leaf plots split each value into a stem (the leading digits) and a leaf (the final digit).
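As a rough sketch of how a stem and leaf plot is built, the helper below splits each value into a stem and a leaf. The function name `stem_and_leaf` and the scores are hypothetical, and the sketch assumes non-negative whole numbers:

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (leading digits) and leaf (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 85 -> stem 8, leaf 5
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical test scores (made-up data)
scores = [72, 85, 91, 78, 85, 69, 93]
plot = stem_and_leaf(scores)

for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
# 6 | 9
# 7 | 2 8
# 8 | 5 5
# 9 | 1 3
```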
Marginal distribution
The percent or proportion of individuals that have a specific value for one categorical variable, ignoring the other variable (uses the row or column totals of a two-way table).
Conditional distribution
The percent or proportion of individuals that have a specific value for one categorical variable among individuals who share the same value for another categorical variable (depends on the other variable; uses a single row or column of a two-way table).
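The two kinds of distribution can be compared on a small two-way table. This is only a sketch: the table, its categories, and the variable names are made up for illustration.

```python
# Hypothetical two-way table: rows = grade level, columns = preferred pet
table = {
    "junior": {"dog": 30, "cat": 20},
    "senior": {"dog": 10, "cat": 40},
}
pets = ["dog", "cat"]

grand_total = sum(sum(row.values()) for row in table.values())  # 100 individuals

# Marginal distribution of pet: column totals divided by the grand total
marginal_pet = {
    pet: sum(row[pet] for row in table.values()) / grand_total for pet in pets
}

# Conditional distribution of pet among seniors: one row divided by its row total
senior_total = sum(table["senior"].values())  # 50 seniors
conditional_pet_given_senior = {
    pet: table["senior"][pet] / senior_total for pet in pets
}

print(marginal_pet)                  # {'dog': 0.4, 'cat': 0.6}
print(conditional_pet_given_senior)  # {'dog': 0.2, 'cat': 0.8}
```

Here 40% of all students prefer dogs, but only 20% of seniors do. The marginal value uses everyone in the table, while the conditional value looks only at one group.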
Describing the distribution should include:
Shape, center, variability (spread), and unusual features.
Shapes:
Symmetric, skewed left (tail extends to the left, high point to the right), skewed right (tail extends to the right, high point to the left).
Unimodal (single peak), bimodal (two peaks), uniform (roughly flat, no clear peak).
Center
A typical or middle value of the data, usually measured by the mean or median (the most common value is the mode).
Variability
How spread out the data values are, measured by the range, interquartile range (IQR), or standard deviation.
Unusual Features
Outliers (individual values far from the rest of the data), gaps, and clusters.