Chapter 2: Data Visualization Flashcards
What is the correct way to make the points blue in a scatterplot?
- ggplot(persinj, aes(x = op_time, y = total, color = "blue")) + geom_point()
- ggplot(persinj, aes(x = op_time, y = total)) + geom_point(colour = "blue")
The second choice is the correct way.
Remember that the aes() function is used to map variables in our dataset to visual properties of the graph. With the first choice, ggplot2 treats "blue" as a variable with a single value, so every point gets the first default colour of the discrete scale (a reddish shade) and a legend labelled "blue" appears.
The mistake is mapping a constant inside aes(). To set an aesthetic to a constant value, pass it to the geom outside aes().
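The difference between mapping and setting can be checked directly. The sketch below uses a small made-up data frame (toy is a hypothetical stand-in for persinj, not the real data) and reads back the colours ggplot2 computed with layer_data():

```r
library(ggplot2)

# Hypothetical stand-in for persinj; the values are made up for illustration.
toy <- data.frame(op_time = c(1, 5, 10, 20), total = c(100, 250, 400, 900))

# Mapping inside aes(): "blue" is treated as a one-level variable.
p_mapped <- ggplot(toy, aes(x = op_time, y = total, color = "blue")) +
  geom_point()

# Setting outside aes(): every point is drawn in blue.
p_set <- ggplot(toy, aes(x = op_time, y = total)) +
  geom_point(colour = "blue")

# Read back the colours that ggplot2 actually assigned to the points.
mapped_colours <- unique(layer_data(p_mapped)$colour)
set_colours <- unique(layer_data(p_set)$colour)
```

Here set_colours holds "blue", while mapped_colours holds a default palette colour that is not blue.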
T/F: the aesthetics determine what relationships we want to see in the plot and the geoms determine how we want to see these relationships.
True
How could we use the “color” aesthetic the right way in a ggplot?
We could map color to a factor variable inside aes(), alongside two numeric variables on x and y.
This makes the points on the graph different colours depending on the value of the factor variable.
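Can you set colour = a categorical variable in the aesthetic mapping without converting it to a factor first?
Not if the variable is stored as numbers (e.g. coded 0/1): ggplot2 would treat it as continuous and draw a colour gradient. Convert it with the factor() function, e.g. color = factor(var). A variable that is already a factor or a character vector is treated as discrete as is.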
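A minimal sketch of why the conversion matters, assuming a made-up data frame in which the category is stored as the numbers 1, 2, 3 (toy and its columns are hypothetical):

```r
library(ggplot2)

# Hypothetical data: `group` is a category stored as numbers.
toy <- data.frame(
  x = 1:6,
  y = c(2, 4, 5, 7, 8, 11),
  group = c(1, 2, 3, 1, 2, 3)
)

# Without factor(): group is continuous, so the legend is a colour gradient.
p_numeric <- ggplot(toy, aes(x = x, y = y, color = group)) +
  geom_point()

# With factor(): group is discrete, so each level gets its own colour.
p_factor <- ggplot(toy, aes(x = x, y = y, color = factor(group))) +
  geom_point()

# Compare the colours assigned to the points in each version.
numeric_colours <- layer_data(p_numeric)$colour
factor_colours <- layer_data(p_factor)$colour
```

Printing p_numeric and p_factor side by side shows a gradient legend in the first and a three-level legend in the second.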
What are two common arguments of geom_smooth()?
method = "lm" and se = FALSE
What are the 4 common arguments used in geom_point()?
color
alpha = how transparent the points are (0 = fully transparent, 1 = fully opaque)
shape
size
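As an illustration, the four arguments can be set together as constants outside aes(). The data frame and the specific values below are made up:

```r
library(ggplot2)

# Hypothetical data for illustration only.
toy <- data.frame(x = c(1, 2, 3, 4), y = c(2, 3, 5, 4))

# All four arguments are set as constants, so they apply to every point.
p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(color = "darkgreen", alpha = 0.5, shape = 17, size = 3)

# The computed layer data carries the constants for each point.
point_data <- layer_data(p)
```

shape = 17 draws filled triangles, and alpha = 0.5 makes overlapping points easier to see.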
what are the 2 common arguments of geom_bar()?
fill
alpha
what are the 3 common arguments for geom_histogram()?
fill
alpha
bins = number of bins you want to use.
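A short sketch of these arguments in use for both geoms, with a made-up data frame (toy, category and value are hypothetical names):

```r
library(ggplot2)

# Hypothetical data: one categorical and one numeric column.
toy <- data.frame(
  category = c("a", "a", "b", "c", "c", "c"),
  value = c(1.2, 2.5, 2.7, 3.1, 4.8, 5.0)
)

# geom_bar() counts the rows in each category.
p_bar <- ggplot(toy, aes(x = category)) +
  geom_bar(fill = "steelblue", alpha = 0.7)

# geom_histogram() bins a numeric variable; bins controls how many bins.
p_hist <- ggplot(toy, aes(x = value)) +
  geom_histogram(fill = "steelblue", alpha = 0.7, bins = 4)

# Counts computed by each geom's default statistic.
bar_counts <- layer_data(p_bar)$count
hist_counts <- layer_data(p_hist)$count
```

Note that geom_bar() takes a categorical x, while geom_histogram() takes a numeric x.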
What curve is usually added on top of geom_point()?
geom_smooth(method = "lm", se = FALSE)
you are comparing 3 variables. two are numeric and one is a factor variable.
you want to make the colour of the points different based on the value of the factor variable.
you use geom_point() and geom_smooth() with se bounds.
How could you make sure that the SE bands are filled with the colour of the corresponding factor level (consistent with the points)?
ggplot(dataset, aes(x = numeric, y = numeric, color = factor(categorical), fill = factor(categorical))) + geom_point() + geom_smooth(method = "lm", se = TRUE)
This gives consistent colouring between the points, the smoothed lines and the SE bands, because we assigned the factor variable to both fill and colour.
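A runnable version of this card, assuming a made-up data frame with a 0/1 coded category (toy, x, y and group are hypothetical names):

```r
library(ggplot2)

# Hypothetical data: two groups with opposite trends.
toy <- data.frame(
  x = rep(1:5, times = 2),
  y = c(1.1, 2.3, 2.9, 4.2, 5.1, 5.0, 4.1, 3.2, 1.8, 1.2),
  group = rep(c(0, 1), each = 5)
)

# color and fill are both mapped to the factor in the global aes(),
# so the points, the fitted lines and the SE bands share one palette.
p <- ggplot(toy, aes(x = x, y = y, color = factor(group), fill = factor(group))) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE)

# Layer 1 is the points, layer 2 is the smooth.
point_data <- layer_data(p, 1)
smooth_data <- layer_data(p, 2)
```

Each group gets its own fitted line, and the band around each line is filled with that group's colour.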
you are comparing 3 variables. two are numeric and one is a factor variable.
you want to make the colour of the points different based on the value of the factor variable.
you use geom_point() and geom_smooth() with an se bound.
How could you make sure that there is only one smoothed line and one SE band for all data points?
ggplot(dataset, aes(x = numeric, y = numeric)) + geom_point(aes(color = factor(categorical))) + geom_smooth(method = "lm", se = TRUE)
- take the color mapping out of ggplot()
- put it into geom_point() as an aesthetic because we want the points to still be different colours
- keep se = TRUE in geom_smooth()
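The three steps above can be sketched with a made-up data frame (toy, x, y and group are hypothetical names). Because the color mapping lives only in geom_point(), geom_smooth() sees no grouping and fits a single line:

```r
library(ggplot2)

# Hypothetical data: two groups coded 0/1.
toy <- data.frame(
  x = rep(1:5, times = 2),
  y = c(1.1, 2.3, 2.9, 4.2, 5.1, 2.0, 3.1, 4.2, 4.8, 6.2),
  group = rep(c(0, 1), each = 5)
)

# The color mapping is local to geom_point(), so the smooth is not split.
p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(aes(color = factor(group))) +
  geom_smooth(method = "lm", se = TRUE)

# Points keep two colours; the smooth layer has a single group.
n_point_colours <- length(unique(layer_data(p, 1)$colour))
n_smooth_groups <- length(unique(layer_data(p, 2)$group))
```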
T/F: aesthetic mappings in the original ggplot() call will not be inherited by the geom functions.
False. By default they are inherited by every geom added to the plot.
What is faceting? What output does it provide when used? What function is used?
It's a convenient way to split our data into distinct groups based on the values of a categorical predictor.
Faceting displays the observations in separate plots, one for each value of the faceting variable, placed side by side to ease comparison.
the function used is facet_wrap()
How do you use the facet_wrap() function?
ggplot(dataset, aes(x = numeric, y = numeric, color = factor(categorical))) + geom_point() + geom_smooth(method = "lm", se = TRUE) +
facet_wrap(~ FACET_VAR, ncol = n)
facet_wrap(~ FACET_VAR, ncol = n) is added to the end of a ggplot
By default, all panels share the same scales so that they are easy to compare.
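A minimal faceting sketch, assuming a made-up data frame with a hypothetical region column as the faceting variable:

```r
library(ggplot2)

# Hypothetical data: region is the faceting variable.
toy <- data.frame(
  x = rep(1:4, times = 2),
  y = c(2, 4, 5, 7, 10, 9, 7, 6),
  region = rep(c("north", "south"), each = 4)
)

# One panel per value of region, laid out in two columns.
p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ region, ncol = 2)

# Each row of the layer data records the panel it is drawn in.
n_panels <- length(unique(layer_data(p)$PANEL))
```

If the shared scales hide detail in one panel, facet_wrap() also accepts scales = "free" to let each panel choose its own axes, at the cost of harder comparison.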
How do you add titles to a ggplot?
ggplot(dataset, aes(x = numeric, y = numeric, color = factor(categorical))) + geom_point() + geom_smooth(method = "lm", se = TRUE) +
labs(x = "x axis title", y = "y axis title", title = "main title")
Using the labs() function. The main title is set with title =, not main = (main is the argument used by base R plot()).
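A short sketch of labs() with the title argument, using a made-up data frame and hypothetical label text:

```r
library(ggplot2)

# Hypothetical data for illustration only.
toy <- data.frame(x = c(1, 2, 3, 4), y = c(2, 3, 5, 4))

# labs() sets the axis titles and the main title in one call.
p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  labs(x = "x axis title", y = "y axis title", title = "main title")

# The labels set explicitly are stored on the plot object.
plot_title <- p$labels$title
```

labs() also accepts subtitle and caption, and an entry such as color = "legend title" renames the legend of a mapped aesthetic.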