4. Data Visualization & Summarizing Data Flashcards
A visual dimension of a visualization that represents data
Aesthetic
Common types of data visualization
- Scatter plot
- Line graph
- Histogram
- Density chart
- Bar chart
- Stacked bar chart
- Pie chart (usually bad)
The variation in a single variable
Univariate statistics
The variation between two variables
Bivariate statistics
(type of data visualization)
relationship between two numeric variables
Scatter plot
(type of data visualization)
change in a numeric variable or proportion over time
Line graph
(type of data visualization)
univariate view of a numeric variable
Histogram
(type of data visualization)
differences in proportion or mean between categories
Bar chart
(type of data visualization)
proportions of various categories
Pie chart (usually bad)
What are the two types of Statistics?
- Descriptive: Describing a given dataset, assuming that those data are the population
- Inferential: Making inferences from a sample to a population; quantifying the amount of uncertainty around the values you calculate
What are the three measures of central tendency?
- Mean
- Median
- Mode
What are the measures of spread in numerically describing data? (4)
Range
Quartiles; Inter-Quartile Range (IQR)
Variance
Standard Deviation
Mean
Add up all the values and divide by the total number of values
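A minimal Python sketch of the mean as described on this card. The sample values and the function name `mean` are made up for illustration.

```python
# Hypothetical sample data, chosen only for illustration.
values = [2, 4, 4, 5, 10]

def mean(xs):
    """Add up all the values and divide by the number of values."""
    return sum(xs) / len(xs)

print(mean(values))  # 5.0
```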
An observation that is extreme compared to the rest of the observations.
Outlier
Median
Line up the values in order from lowest to highest and take the middle number; if there is an even number of observations, take the average of the two middle numbers
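The two cases (odd and even number of observations) can be sketched in Python as follows. The function name `median` and the inputs are illustrative assumptions, not part of the original card.

```python
def median(xs):
    """Sort the values and take the middle one (or average the two middle ones)."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))     # 3
print(median([7, 1, 3, 5]))  # 4.0
```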
Mode
In classic statistics parlance, the most common value. More loosely, a prominent peak in a distribution
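A small Python sketch of "the most common value", assuming a non-empty list. The helper name `modes` is hypothetical; it returns every value tied for the highest count, since a dataset can have more than one mode.

```python
from collections import Counter

def modes(xs):
    """Return all values that occur most often (assumes xs is non-empty)."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 2, 3, 3, 4]))  # [2, 3]
print(modes([5, 5, 6]))           # [5]
```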
The maximum value minus the minimum value.
Range
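As a quick illustration in Python, with made-up values:

```python
# Hypothetical sample data, chosen only for illustration.
values = [2, 4, 4, 5, 10]

# Range: the maximum value minus the minimum value.
value_range = max(values) - min(values)
print(value_range)  # 8
```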
The value below which a given percentage of the data falls
Percentile (e.g., at the 70th percentile for height, 70% of people are shorter than you)
Quartiles - What are Q1 and Q3?
Q1: the point where 25% of the data are below (i.e., the 25th percentile)
Q3: the point where 75% of the data are below (i.e., the 75th percentile)
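One way to compute quartiles in Python is `statistics.quantiles` from the standard library. This is a sketch with made-up data; note that several conventions for interpolating quartiles exist, so other tools may give slightly different cut points for the same data.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into four equal-probability groups,
# returning the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q2, q3)  # 2.5 5.0 7.5
```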
Inter-Quartile Range (IQR)
Q3 minus Q1; a span that covers the middle 50% of the data
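A short Python sketch of the IQR, using made-up data that includes one outlier. It also prints the range for comparison, to show that a single extreme value stretches the range but leaves the IQR unchanged here. The quartile convention is the default of `statistics.quantiles`; other conventions may give slightly different numbers.

```python
import statistics

# Hypothetical sample data with one extreme value (100).
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]

q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                        # Q3 minus Q1
value_range = max(data) - min(data)  # maximum minus minimum

print(iqr)          # 5.0
print(value_range)  # 99
```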
The average squared distance of the data points from the mean (standard deviation squared)
Variance
Standard Deviation
The square root of the variance, which puts the measure back into the original (unsquared) units of the data
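The relationship between variance and standard deviation can be sketched in Python as below. The data are made up, and this version treats the data as the whole population, so it divides by n.

```python
import math

# Hypothetical data, treated here as the full population.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)

# Variance: the average squared distance from the mean (units are squared).
variance = sum((x - mean) ** 2 for x in values) / len(values)

# Standard deviation: the square root of the variance (original units).
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```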
n-1
When the standard deviation is computed from a sample rather than the whole population, divide by n-1 instead of n
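A Python sketch contrasting the two divisors, with made-up data. In the standard library's `statistics` module, `variance` and `stdev` divide by n-1 (sample versions), while `pvariance` and `pstdev` divide by n (population versions).

```python
import math
import statistics

# Hypothetical data, treated here as a sample from a larger population.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)
mean = sum(sample) / n
squared_devs = sum((x - mean) ** 2 for x in sample)

sample_var = squared_devs / (n - 1)  # divide by n-1 for a sample
population_var = squared_devs / n    # divide by n for a full population

# The hand-computed values agree with the library functions.
assert math.isclose(sample_var, statistics.variance(sample))
assert math.isclose(population_var, statistics.pvariance(sample))

print(round(sample_var, 3))             # 4.571
print(round(math.sqrt(sample_var), 3))  # 2.138
print(population_var)                   # 4.0
```

Dividing by n-1 always gives a slightly larger value than dividing by n, and the gap shrinks as the sample grows.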
scatter plot
compares two numeric variables to each other
line graph
change in one numeric variable over time
histogram/box plot
univariate view of 1 numeric variable
which charts show us categorical variables?
bar chart, stacked bar chart, and pie chart