Lecture #2/#3 (Data & Central Tendencies) Flashcards
What is the most effective visualization for nominal data?
A pie chart
What is the most effective visualization for ordinal data?
A bar graph
A histogram is
a visualization that shows how frequently values fall into each class (bin) of the data
How should you group your data?
You should find logical break points, avoid empty classes, and use a method that communicates your data clearly
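The grouping advice above can be tried out with a small frequency table, which is the count behind each histogram bar. This is only a sketch: the function name `frequency_table`, the sample values and the class width of 10 are made up for illustration and are not from the lecture.

```python
from collections import Counter

def frequency_table(values, class_width, start=0):
    """Count how many values fall in each class [lower, upper)."""
    # Integer division assigns every value to a class index.
    counts = Counter((v - start) // class_width for v in values)
    table = {}
    for index in range(min(counts), max(counts) + 1):
        lower = start + index * class_width
        # Classes with no observations are kept with a count of 0
        # so that empty classes are easy to spot.
        table[(lower, lower + class_width)] = counts.get(index, 0)
    return table

# Hypothetical sample data.
values = [2, 4, 7, 11, 12, 13, 18, 21, 22, 29]
table = frequency_table(values, class_width=10)
empty_classes = [bounds for bounds, count in table.items() if count == 0]

for (lower, upper), count in table.items():
    print(f"{lower}-{upper}: {'#' * count}")
print("empty classes:", empty_classes)
```

If `empty_classes` is not empty, the class width is probably too narrow for the data.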
What is an advantage to scatter plots?
Scatter plots are useful for comparing two sets of data
Why is a time series plot useful?
It shows trends over time (think climate change graphs)
What are common mistakes with data visualization?
Presenting too much data, using misleading color contrast, choosing an inappropriate projection, zooming in on favorable data
Name the numerical summaries (basic, central tendencies and dispersion)
Min, max, range
Mean, median, mode, modality, skewness, kurtosis
Variance, standard deviation, coefficient of variation
A median is the
middle value of the sorted data (median = 5 in this data set: 1, 3, 3, 5, 6, 7, 8)
The mode represents
The most commonly observed value (mode = 3 in this data set: 1, 3, 3, 5, 6, 7, 8)
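The median and mode cards can be checked with Python's standard `statistics` module, using the same data set as the cards. The mean is included only for comparison.

```python
import statistics

data = [1, 3, 3, 5, 6, 7, 8]

median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most commonly observed value
mean_value = statistics.mean(data)      # arithmetic average, for comparison

print("median:", median_value)  # 5
print("mode:", mode_value)      # 3
print("mean:", round(mean_value, 2))
```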
The mean centre is the
The point given by the average of the X coordinates and the average of the Y coordinates (think political graph; where do Canadians average on the graph?)
What is the weighted mean centre?
The mean centre but accounting for the weight of each coordinate
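A minimal sketch of the mean centre and the weighted mean centre. The function names, the four points and the weights are hypothetical; the weights could stand for something like the population at each location.

```python
def mean_centre(points):
    """Average the X coordinates and the Y coordinates separately."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def weighted_mean_centre(points, weights):
    """Same idea, but each point counts in proportion to its weight."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return (x, y)

# Hypothetical locations on the corners of a square.
points = [(0, 0), (4, 0), (4, 4), (0, 4)]
weights = [1, 1, 1, 5]  # the last point carries most of the weight

print(mean_centre(points))                    # (2.0, 2.0)
print(weighted_mean_centre(points, weights))  # (1.0, 3.0)
```

The weighted result is pulled toward (0, 4) because that point has the largest weight.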
What is problematic about the mean centre?
The mean centre is sensitive to outliers
What does the median centre do?
The median centre is the point where the total distance to every other point is the lowest (best place to have a neighborhood hub)
The median centre is also known as
The Weber point or the point of Minimum Aggregate Travel (MAT)
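The median centre has no simple formula, so it is usually found by iteration. The sketch below uses Weiszfeld's algorithm, which is one common choice and is an assumption here, since the cards do not name a method. The points are made up, and skipping a data point that coincides with the current estimate is a simplification to avoid dividing by zero.

```python
import math

def total_distance(centre, points):
    """Aggregate travel: sum of straight-line distances to every point."""
    return sum(math.dist(centre, p) for p in points)

def median_centre(points, iterations=200, eps=1e-12):
    """Approximate the median centre with Weiszfeld's algorithm."""
    # Start from the mean centre and refine.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.dist((x, y), (px, py))
            if d < eps:
                continue  # simplification: skip a point we are sitting on
            # Closer points get larger weights (1 / distance).
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        if denom == 0.0:
            break
        x, y = num_x / denom, num_y / denom
    return (x, y)

# Hypothetical data: a small cluster plus one far-away outlier.
points = [(0, 0), (2, 0), (2, 2), (0, 2), (20, 20)]
mean_c = (sum(p[0] for p in points) / len(points),
          sum(p[1] for p in points) / len(points))
median_c = median_centre(points)

print("mean centre:", mean_c, "total distance:", total_distance(mean_c, points))
print("median centre:", median_c, "total distance:", total_distance(median_c, points))
```

The outlier drags the mean centre away from the cluster, while the median centre stays close to it and gives a smaller total distance.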
Measures of central tendency can be problematic because
they can hide the range of the data. Both of these sets have a mean of 50: (45, 50, 55) and (10, 50, 90)
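A quick check of the two sets from this card: the means match, but the ranges do not.

```python
import statistics

narrow = [45, 50, 55]
wide = [10, 50, 90]

narrow_mean = statistics.mean(narrow)
wide_mean = statistics.mean(wide)
narrow_range = max(narrow) - min(narrow)
wide_range = max(wide) - min(wide)

print("means:", narrow_mean, wide_mean)     # 50 and 50
print("ranges:", narrow_range, wide_range)  # 10 and 80
```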
Variance is
The mean of the squared differences from the mean
Standard deviation is
The square root of the variance
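The variance and standard deviation cards can be written out directly from their definitions. This sketch divides by n (the population form), which matches "the mean of the squared differences"; a sample variance would divide by n - 1 instead. The data reuse the (10, 50, 90) set from the earlier card.

```python
import math

def variance(values):
    """Mean of the squared differences from the mean (population form)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def standard_deviation(values):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(values))

data = [10, 50, 90]
var = variance(data)           # (1600 + 0 + 1600) / 3
sd = standard_deviation(data)

print("variance:", round(var, 2))
print("standard deviation:", round(sd, 2))
```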
What are some flaws of standard deviation?
Standard deviation is an absolute measure so it is unit specific
The coefficient of variation is
The standard deviation divided by the mean, reported as a percentage
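A small sketch of why the coefficient of variation gets around the unit problem. The same hypothetical heights are given in centimetres and in metres: the standard deviations differ, but the coefficient of variation is the same. The function names and data are made up for illustration.

```python
import math

def standard_deviation(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def coefficient_of_variation(values):
    """Standard deviation divided by the mean, as a percentage."""
    mean = sum(values) / len(values)
    return standard_deviation(values) / mean * 100

# Hypothetical heights, first in centimetres, then in metres.
heights_cm = [150, 160, 170, 180]
heights_m = [1.5, 1.6, 1.7, 1.8]

sd_cm = standard_deviation(heights_cm)
sd_m = standard_deviation(heights_m)
cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print("standard deviations:", round(sd_cm, 3), round(sd_m, 3))  # unit specific
print("coefficients of variation (%):", round(cv_cm, 3), round(cv_m, 3))
```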
A standard deviation ellipse is when
You take the standard deviation of the X coordinates and of the Y coordinates separately and use them to draw an ellipse around the mean centre
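A simplified sketch of the numbers behind a standard deviation ellipse: the mean centre plus the standard deviation along X and along Y. It assumes an ellipse aligned with the X and Y axes; GIS tools typically also rotate the ellipse to follow the direction of the point pattern, which is left out here. The points are hypothetical.

```python
import math

def standard_deviation(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# Hypothetical points, spread more along X than along Y.
points = [(1, 2), (3, 2), (5, 2), (3, 1), (3, 3)]
xs = [x for x, _ in points]
ys = [y for _, y in points]

centre = (sum(xs) / len(xs), sum(ys) / len(ys))  # mean centre
sd_x = standard_deviation(xs)  # half-width of the ellipse along X
sd_y = standard_deviation(ys)  # half-height of the ellipse along Y

print("centre:", centre)
print("sd_x:", round(sd_x, 3), "sd_y:", round(sd_y, 3))
```

A larger `sd_x` than `sd_y` means the ellipse is stretched along X.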
Kurtosis is
how flat or peaked the distribution is (PLM: Platykurtic, Leptokurtic, Mesokurtic)
Platykurtic is
a flat kurtosis (think Plat = Flat)
Leptokurtic is
a sharp kurtosis
Mesokurtic is
A normal kurtosis
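The three kurtosis cards can be tied to a number. The sketch below uses the moment-based excess kurtosis (fourth central moment divided by the squared variance, minus 3), which is a common definition but an assumption here, since the cards give no formula. Under that definition a normal distribution scores 0, flatter ones score below 0 and sharper ones above 0. The two data sets are made up.

```python
def excess_kurtosis(values):
    """Fourth central moment / variance squared, minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3

def describe(kurtosis, tolerance=0.1):
    # The tolerance for "close to normal" is an arbitrary choice.
    if kurtosis < -tolerance:
        return "platykurtic (flat)"
    if kurtosis > tolerance:
        return "leptokurtic (sharp)"
    return "mesokurtic (normal)"

flat = [1, 2, 3, 4, 5]                   # evenly spread values
peaked = [3, 3, 3, 3, 3, 3, 3, 3, 1, 5]  # most values at the centre

flat_k = excess_kurtosis(flat)
peaked_k = excess_kurtosis(peaked)

print(round(flat_k, 2), describe(flat_k))
print(round(peaked_k, 2), describe(peaked_k))
```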
Skewness is
how the highest point (peak) of a distribution differs from the centre value
Positive skewness and negative skewness differ because
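a positively skewed distribution starts off high (peak on the left, long tail to the right) and a negatively skewed one ends high (peak on the right, long tail to the left)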
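The sign of the skewness can be checked with a moment-based coefficient (third central moment divided by the variance raised to the power 1.5). This formula is an assumption, since the cards describe skewness only in words, and the two data sets are made up so that one is the mirror image of the other.

```python
def skewness(values):
    """Third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Most values are low, with a long tail to the right.
starts_high = [1, 1, 1, 2, 2, 3, 9]
# Mirror image: most values are high, with a long tail to the left.
ends_high = [9, 9, 9, 8, 8, 7, 1]

positive = skewness(starts_high)
negative = skewness(ends_high)

print("starts high:", round(positive, 3))  # greater than 0
print("ends high:", round(negative, 3))    # less than 0
```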