Unit 1 Flashcards
Categorical data
Data that represents groups or labels (also called qualitative).
Quantitative data
Data that take numerical values representing measured or counted amounts.
Discrete vs. continuous variable
Discrete variables take on a countable number of values (with gaps), while continuous variables can take on any value in an interval (infinitely many values, with no gaps).
Variable
A characteristic that can change from one individual to another.
Frequency vs. Relative Frequency
Frequency is the number of individuals in each category, while relative frequency is the proportion or percent of individuals in each category.
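A minimal Python sketch of both ideas, using a small made-up list of survey answers (the data and variable names are hypothetical): `Counter` gives the frequency of each category, and dividing each count by the total gives the relative frequency.

```python
from collections import Counter

# Hypothetical data: favorite color of 10 students (made up for illustration)
colors = ["red", "blue", "blue", "green", "red", "blue", "red", "blue", "green", "blue"]

freq = Counter(colors)                          # frequency: count per category
n = len(colors)
rel_freq = {c: k / n for c, k in freq.items()}  # relative frequency: count / total

for c in sorted(freq):
    print(c, freq[c], rel_freq[c])
# blue 5 0.5
# green 2 0.2
# red 3 0.3
```

The relative frequencies always add up to 1 (100%), which is a quick check on the arithmetic.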
Bar graph vs. Histogram
Bar graphs display categorical data (categories on the horizontal axis, bars separated), while histograms display quantitative data (numerical intervals on the horizontal axis, bars touching).
Which graphs most often display categorical data?
Bar graphs & pie charts
Dot plot vs. Stem & leaf plot
Dot plots show each numerical value as a dot above a number line, while stem-and-leaf plots split each value into a stem (the leading digit or digits) and a leaf (the final digit).
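A rough Python sketch of how a stem-and-leaf plot is built, assuming non-negative whole numbers where the stem is everything but the ones digit and the leaf is the ones digit. The function name `stem_and_leaf` is hypothetical; it returns the plot as text lines.

```python
from collections import defaultdict

def stem_and_leaf(values):
    # Assumes non-negative whole numbers: stem = value // 10, leaf = ones digit
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so gaps in the data stay visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

for line in stem_and_leaf([75, 62, 81, 70, 99, 75]):
    print(line)
# 6 | 2
# 7 | 055
# 8 | 1
# 9 | 9
```

Sorting first keeps the leaves in increasing order on each stem, as in a hand-drawn plot.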
Marginal distribution
The percent or proportion of individuals that have a specific value for one categorical variable, ignoring the other variable (found from the row or column totals of a two-way table).
Conditional distribution
The percent or proportion of individuals that have a specific value for one categorical variable among individuals who share the same value of another categorical variable (found within a single row or column of a two-way table).
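To contrast the two distributions, here is a small Python sketch on a hypothetical two-way table (grade level by preferred snack; the counts are invented). The marginal distribution divides the column totals by the grand total, while the conditional distribution divides one row's counts by that row's total.

```python
# Hypothetical two-way table (made-up counts):
# rows = grade level, columns = preferred snack
table = {
    "9th":  {"chips": 20, "fruit": 30},
    "10th": {"chips": 35, "fruit": 15},
}
snacks = ["chips", "fruit"]

grand_total = sum(sum(row.values()) for row in table.values())  # 100

# Marginal distribution of snack: column total / grand total
marginal_snack = {s: sum(table[g][s] for g in table) / grand_total for s in snacks}

# Conditional distribution of snack given grade = "9th": row count / row total
row = table["9th"]
row_total = sum(row.values())
conditional_9th = {s: row[s] / row_total for s in snacks}

print(marginal_snack)    # {'chips': 0.55, 'fruit': 0.45}
print(conditional_9th)   # {'chips': 0.4, 'fruit': 0.6}
```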
Describing the distribution should include:
Shape, center, variability (spread), and unusual features.
Shapes:
Symmetric, skewed left (long tail to the left, peak toward the right), skewed right (long tail to the right, peak toward the left).
Unimodal (single peak), bimodal (two peaks), uniform (roughly flat, no clear peak).
Center
A typical or middle value of the data, usually measured by the mean or median.
Variability
How spread out the data are, measured by the range, IQR, or standard deviation.
Unusual Features
Outliers (individual values far from the rest of the data), gaps, clusters.
Mean
The average of all data values: the sum of the values divided by the number of values.
Median
The middle value of the ordered data: the middle value if the number of values is odd, or the average of the two middle values if it is even.
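Both rules can be sketched in a few lines of Python. The standard `statistics` module already provides `mean` and `median`; the functions below are written out only to show the rules on these cards, and the sample lists are made up.

```python
def mean(values):
    # Sum of all values divided by how many values there are
    return sum(values) / len(values)

def median(values):
    # Middle value of the ORDERED data; average the two middle values if n is even
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

print(mean([2, 4, 4, 5, 10]))     # 5.0
print(median([2, 4, 4, 5, 10]))   # 4
print(median([9, 1, 8, 3]))       # 5.5
```

Sorting first matters: in the last example the two middle values of the ordered list `[1, 3, 8, 9]` are 3 and 8.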
Q1
Median of the lower half of the ordered data (don’t include the overall median when the number of values is odd).
Q3
Median of the upper half of the ordered data (don’t include the overall median when the number of values is odd).
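A minimal Python sketch of this quartile method, assuming at least 2 data values. The function name `quartiles` is hypothetical; it uses `statistics.median` on each half. Calculators and software sometimes use other quartile conventions, so their Q1 and Q3 can differ slightly.

```python
from statistics import median

def quartiles(values):
    # Q1 and Q3 are the medians of the lower and upper halves of the ordered data.
    # With n // 2 values per half, the overall median is left out when n is odd.
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[-half:]
    return median(lower), median(upper)

print(quartiles([1, 3, 4, 6, 7, 8, 9]))   # (3, 8)
print(quartiles([2, 4, 5, 7, 9, 12]))     # (4, 9)
```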
Range
Maximum value minus minimum value in the dataset.
IQR
Q3 - Q1
(median of the second half minus median of the first half).
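Range and IQR are both "difference" measures of spread. Here is a short Python sketch, assuming at least 2 values and the same halves rule as the Q1/Q3 cards (overall median left out when n is odd). The function name `range_and_iqr` and the data are hypothetical.

```python
from statistics import median

def range_and_iqr(values):
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])    # median of lower half
    q3 = median(data[-half:])   # median of upper half
    data_range = data[-1] - data[0]   # max - min
    iqr = q3 - q1                     # Q3 - Q1
    return data_range, iqr

print(range_and_iqr([1, 3, 4, 6, 7, 8, 9]))   # (8, 5)
```

The range uses only the two most extreme values, while the IQR describes the spread of the middle half of the data.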
Standard deviation
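Roughly the typical distance of the data values from the mean: the square root of the sum of squared deviations from the mean, divided by n - 1.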
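A minimal Python sketch of the calculation, assuming the sample standard deviation (dividing by n - 1), which is the version normally used for sample data. Dividing by n instead gives the population standard deviation. The function name `sample_sd` and the data are illustrative.

```python
import math

def sample_sd(values):
    n = len(values)
    xbar = sum(values) / n                             # mean
    squared_devs = [(x - xbar) ** 2 for x in values]   # squared distance from mean
    return math.sqrt(sum(squared_devs) / (n - 1))      # sample SD uses n - 1

print(round(sample_sd([2, 4, 4, 4, 5, 5, 7, 9]), 3))   # 2.138
```

Here the mean is 5, the squared deviations add to 32, and the result is the square root of 32 / 7. Python's `statistics.stdev` uses the same n - 1 formula.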
Five-number summary
Minimum, Q1, median, Q3, maximum.
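A short Python sketch that returns the five numbers in order, assuming at least 2 values and the quartile rule from the Q1/Q3 cards. The function name `five_number_summary` and the data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])    # median of lower half
    q3 = median(data[-half:])   # median of upper half (overall median excluded if n is odd)
    return data[0], q1, median(data), q3, data[-1]

print(five_number_summary([9, 1, 7, 4, 8, 3, 6]))   # (1, 3, 6, 8, 9)
```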
Percentile
The percent of data values that are less than or equal to a given value
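This definition translates directly into Python: count the values at or below the given value, then divide by the number of values. The function name `percentile_rank` and the scores are made up for illustration.

```python
def percentile_rank(values, x):
    # Percent of data values less than or equal to x
    at_or_below = sum(1 for v in values if v <= x)
    return 100 * at_or_below / len(values)

scores = [62, 70, 75, 75, 80, 84, 88, 90, 95, 99]
print(percentile_rank(scores, 84))   # 60.0
```

Six of the ten scores are at or below 84, so 84 is at the 60th percentile.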
Z-score
(Data value - mean) / SD: the number of standard deviations a value lies above (positive) or below (negative) the mean.
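Here is a one-line Python version of the formula, using a hypothetical distribution with mean 70 and SD 10.

```python
def z_score(x, mean, sd):
    # How many standard deviations x lies above (+) or below (-) the mean
    return (x - mean) / sd

print(z_score(85, 70, 10))   # 1.5
print(z_score(62, 70, 10))   # -0.8
```

A score of 85 is 1.5 standard deviations above the mean. A score of 62 is 0.8 standard deviations below it.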