Data Analysis week 2 Flashcards
Why would we want to visualize data
To show patterns in the data and to summarize large quantities of data
When is a data point considered an outlier in R
If it lies more than 1.5 interquartile ranges (IQR) below the first quartile or more than 1.5 IQR above the third quartile.
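A minimal Python sketch of the 1.5 × IQR rule, assuming quartiles computed with `statistics.quantiles(..., method="inclusive")`, which interpolates the same way as R's default `quantile()`. R's `boxplot()` actually takes its hinges from `fivenum()`, so for small samples the cut-offs can differ slightly. The function name `iqr_outliers` is hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles (hypothetical helper)."""
    # method="inclusive" interpolates like R's default quantile() (type 7)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

data = [2, 3, 3, 4, 5, 5, 6, 7, 30]
print(iqr_outliers(data))  # [30]
```

Here Q1 = 3 and Q3 = 6, so the fences are -1.5 and 10.5 and only 30 falls outside them.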
What is the difference between a stripchart and a beeswarm
A beeswarm shows point density better: the y-offset (the amount of jitter) of each data point is adjusted to the local point density, so nearby points spread out instead of overlapping. In a jittered stripchart, each data point simply gets a random y-offset.
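To make the contrast concrete, here is a simplified Python sketch. The stripchart function gives every point an independent random y-offset. The swarm function is a deliberately crude stand-in for a real beeswarm layout (the R beeswarm package places points based on their plotted size so they do not overlap): it assumes points in the same x-bin are stacked symmetrically around y = 0, so the y-spread grows with local density. Both function names and the `bin_width`/`step` parameters are hypothetical.

```python
import random

def stripchart_offsets(values, width=0.3, seed=1):
    """Jittered stripchart: each point gets an independent random y-offset."""
    rng = random.Random(seed)
    return [rng.uniform(-width, width) for _ in values]

def swarm_offsets(values, bin_width=1.0, step=0.1):
    """Simplified swarm layout (hypothetical, not the beeswarm package's algorithm).

    Points in the same x-bin get offsets 0, +step, -step, +2*step, -2*step, ...
    so denser bins spread out further in y.
    """
    counts = {}
    offsets = []
    for v in values:
        b = int(v // bin_width)   # x-bin this point falls in
        k = counts.get(b, 0)      # points already placed in this bin
        counts[b] = k + 1
        if k == 0:
            offset = 0.0
        elif k % 2 == 1:
            offset = (k + 1) // 2 * step
        else:
            offset = -(k // 2) * step
        offsets.append(offset)
    return offsets

print(swarm_offsets([1.1, 1.2, 1.3, 5.0]))  # [0.0, 0.1, -0.1, 0.0]
```

The three close values around 1 fan out in y, while the isolated value at 5.0 stays on the centre line. The stripchart offsets ignore density entirely.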
What properties of data does a boxplot show
The range, the quartiles, and outliers
What does the box of the boxplot itself represent
The interquartile range (the middle 50% of the data). In a horizontal boxplot, the left edge of the box is the first quartile and the right edge is the third quartile. The line inside the box is the second quartile, i.e. the median.
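A short Python sketch of the numbers behind the box, assuming R-style (type 7) quartiles via `statistics.quantiles(..., method="inclusive")`. The helper name `box_stats` is hypothetical.

```python
from statistics import quantiles

def box_stats(values):
    """Return (Q1, median, Q3, IQR): the box edges, the inner line, and the box length."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1

print(box_stats(list(range(1, 10))))  # (3.0, 5.0, 7.0, 4.0)
```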
What are three ways of transforming data
Binning, log-transform and logit transform
What are four binning methods
Equal-width binning, frequency binning, custom binning and quantile binning
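A Python sketch contrasting two of these methods on skewed data: equal-width bins split the range into intervals of the same length, while quantile bins put roughly the same number of values in each bin. The helper names are hypothetical, and the quantile edges assume `statistics.quantiles(..., method="inclusive")`.

```python
from bisect import bisect_right
from statistics import quantiles

def equal_width_edges(values, k):
    """Edges of k bins of equal width spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]

def quantile_edges(values, k):
    """Edges of k bins that each hold roughly the same number of values."""
    inner = quantiles(values, n=k, method="inclusive")  # k - 1 cut points
    return [min(values)] + inner + [max(values)]

def assign_bins(values, edges):
    """0-based bin index per value; the maximum value goes in the last bin."""
    last = len(edges) - 2
    return [min(bisect_right(edges, v) - 1, last) for v in values]

data = [1, 2, 3, 4, 100]  # skewed: one large value
print(assign_bins(data, equal_width_edges(data, 2)))  # [0, 0, 0, 0, 1]
print(assign_bins(data, quantile_edges(data, 2)))     # [0, 0, 1, 1, 1]
```

With equal widths the single large value stretches the range, so four of the five values land in the first bin. Quantile edges split the data more evenly.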
What is binning
Binning divides continuous numeric data into intervals (bins), which turns the numeric data into ordered (ordinal) data.
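A small Python sketch of custom binning. It maps each number to the label of its interval, so the result is ordinal: the labels keep the order of the bins. The edges, labels and function name are hypothetical. The sketch uses half-open intervals [a, b), whereas R's `cut()` uses right-closed intervals (a, b] by default.

```python
from bisect import bisect_right

def custom_bin(values, edges, labels):
    """Map each value to the label of the interval [edges[i], edges[i+1]) it falls in."""
    if len(labels) != len(edges) - 1:
        raise ValueError("need exactly one label per interval")
    out = []
    for v in values:
        i = bisect_right(edges, v) - 1
        if i < 0 or i >= len(labels):
            raise ValueError(f"{v} lies outside the bin edges")
        out.append(labels[i])
    return out

ages = [3, 17, 25, 64, 80]
edges = [0, 18, 65, 120]
labels = ["child", "adult", "senior"]  # ordered from lowest to highest bin
print(custom_bin(ages, edges, labels))  # ['child', 'child', 'adult', 'adult', 'senior']
```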
What does log-transform do (especially to the axis)
It shows the data on a logarithmic scale. With a base-10 log transform, an axis value of 3.5 means the actual value is 10^3.5 (about 3162).
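The round trip between data values and log-axis positions, as a Python sketch assuming base 10. The income figures and helper names are hypothetical.

```python
import math

def to_log10(values):
    """Base-10 log transform; only defined for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs values > 0")
    return [math.log10(v) for v in values]

def from_log10(axis_value):
    """Back-transform a position on a log10 axis to the original scale."""
    return 10 ** axis_value

incomes = [12_000, 30_000, 45_000, 2_500_000]  # hypothetical skewed data
print([round(x, 2) for x in to_log10(incomes)])
print(round(from_log10(3.5), 1))  # 3162.3
```

On the log scale the largest value is no longer hundreds of times further out than the rest, so all four values fit comfortably on one axis.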
Why would you want to transform data
It can reveal structure in skewed data that is otherwise hard to see, for example when one extreme outlier squeezes the rest of the data into a small part of the plot.
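What does the logit transform do
It is applied to fractions (values between 0 and 1) and computes log(p / (1 - p)). This stretches out values close to 0 and close to 1, so both very small and very large fractions can be shown in the same graph.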
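A Python sketch of the logit and its inverse, assuming the natural log (a different base only rescales the axis). The function names are illustrative.

```python
import math

def logit(p):
    """log(p / (1 - p)); defined only for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("logit is only defined for 0 < p < 1")
    return math.log(p / (1 - p))

def inv_logit(x):
    """Map a logit-scale value back to a fraction."""
    return 1 / (1 + math.exp(-x))

for p in [0.001, 0.01, 0.5, 0.99, 0.999]:
    print(p, round(logit(p), 2))
```

The fractions 0.001 and 0.01 are almost indistinguishable on a linear 0 to 1 axis, but on the logit scale they are clearly separated. The scale is symmetric around 0.5, which maps to 0.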
What are three unwritten rules for visualizing data
Maintain graphical integrity, keep the data-ink ratio high, and avoid chartjunk.
What are two points in maintaining graphical integrity
Preserve proportionality, and let bar charts start at 0.
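A worked Python example of why bar charts should start at 0. When the value axis starts at a baseline above 0, the drawn bar heights no longer preserve the ratio between the values. The bar values and the helper name are hypothetical.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights for values a and b when the value axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)

a, b = 50, 55
print(apparent_ratio(a, b))               # 1.1: bar heights match the true ratio
print(apparent_ratio(a, b, baseline=45))  # 2.0: bar b looks twice as tall as bar a
```

A 10% difference is drawn as a 100% difference once the axis starts at 45, which breaks proportionality.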