Week 4: Data Visualisation and Transformation Flashcards
How many shapes can you plot as a mapping? What happens if there are more than that many factors?
- By default ggplot2 uses only 6 shapes at a time; additional groups go unplotted (with a warning)
What are 2 functions to turn vectors into a dataframe?
How do you specify the type?
What is a difference between the 2 functions?
tibble(column1= , column2= )
data.frame(column1=, column2= )
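specify the type by wrapping the values, e.g. column1 = as.character(x)
before R 4.0.0, data.frame() converted character columns to factors unless you set stringsAsFactors = FALSE; since R 4.0.0 the default is FALSE. tibble() never converts strings to factors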
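A minimal sketch of both functions, assuming the tibble package is installed; the vectors x and y are hypothetical. stringsAsFactors is set explicitly so the result does not depend on the R version.

```r
library(tibble)

# Hypothetical example vectors
x <- c(1, 2, 3)
y <- c("a", "b", "c")

# tibble(): strings stay character; wrap a vector to set its type
t1 <- tibble(column1 = as.character(x), column2 = y)

# data.frame(): strings become factors only when stringsAsFactors = TRUE
# (TRUE was the default before R 4.0.0; FALSE is the default since then)
d_factor <- data.frame(column1 = x, column2 = y, stringsAsFactors = TRUE)
d_char   <- data.frame(column1 = x, column2 = y, stringsAsFactors = FALSE)

str(t1)
str(d_factor)
```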
How do you give a ranking to values in a vector x?
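min_rank(x) #gives 1 to the lowest value and so on; tied values share the lowest rank
min_rank(desc(x)) #gives 1 to the highest value
base R rank(x, ties.method = "min") is equivalent; plain rank(x) averages the ranks of ties instead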
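A small sketch comparing the ranking functions on a hypothetical vector with a tie, assuming dplyr is installed:

```r
library(dplyr)

x <- c(10, 30, 20, 20)  # hypothetical vector with a tie

min_rank(x)                    # 1 4 2 2: lowest value gets 1, ties share the lower rank
min_rank(desc(x))              # 4 1 2 2: highest value gets 1
rank(x, ties.method = "min")   # 1 4 2 2: same as min_rank(x)
rank(x)                        # 1.0 4.0 2.5 2.5: default averages ties
```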
How do you set axis limits in 2 ways?
+ xlim(0, 5) #values outside the limits become NA and are dropped before any stats are computed
+ coord_cartesian(xlim = c( , ))
this way doesn’t throw away the data outside the limits, used to zoom
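The difference shows up when a stat is computed from the data. Below is a sketch using the mpg dataset that ships with ggplot2; the 2 to 5 range is arbitrary. With xlim() the fitted line uses only the points inside the limits, while coord_cartesian() fits on all points and just zooms the view.

```r
library(ggplot2)

# mpg ships with ggplot2: displ = engine size, hwy = highway mpg
base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Scale limits: points with displ outside 2-5 are removed (with a warning),
# so the regression line is fitted only to the remaining points
p_limits <- base + xlim(2, 5)

# Coordinate limits: every point is used in the fit; the view is only zoomed
p_zoom <- base + coord_cartesian(xlim = c(2, 5))

print(p_limits)
print(p_zoom)
```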
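How do you define something (e.g. a plot) in the server, and how do you refer to it in the UI in Shiny?
Define a plot called distplot in the server by assigning to output$distplot, e.g. output$distplot <- renderPlot(...). In ui.R, refer to it by its name as a string in an output function, e.g. plotOutput("distplot")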
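A minimal single-file sketch, assuming the shiny package is installed; the slider input "bins" and the use of the built-in faithful data are illustrative choices, not from the original card. The name "distplot" links output$distplot in the server to plotOutput("distplot") in the UI.

```r
library(shiny)

# UI: refer to the server output by its name, as a string
ui <- fluidPage(
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
  plotOutput("distplot")
)

# Server: define the plot under the same name on the output object
server <- function(input, output) {
  output$distplot <- renderPlot({
    hist(faithful$waiting, breaks = input$bins)
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # uncomment to launch the app
```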
What code keeps the first 200 rows and all columns of a dataframe?
data[1:200,]
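A short sketch with a hypothetical 1000-row dataframe, showing the indexing form from the card alongside two common alternatives (base head() and dplyr slice()):

```r
library(dplyr)

# Hypothetical dataframe with 1000 rows
data <- data.frame(id = 1:1000, value = rnorm(1000))

first_a <- data[1:200, ]        # rows 1:200; blank column index = all columns
first_b <- head(data, 200)      # base R alternative
first_c <- slice(data, 1:200)   # dplyr alternative
```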
What is a good way to display a categorical variable vs a continuous variable?
boxplot
geom_freqpoly() with the categorical variable mapped to colour
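Both options sketched with the mpg dataset that ships with ggplot2 (class is categorical, hwy is continuous); the binwidth of 2 is an arbitrary choice.

```r
library(ggplot2)

# One box per category
p_box <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()

# One frequency polygon per category, distinguished by colour
p_freq <- ggplot(mpg, aes(x = hwy, colour = class)) +
  geom_freqpoly(binwidth = 2)

print(p_box)
print(p_freq)
```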
When should you use facet grid over mapping by colour?
As the number of points and the number of categories increase, facet_grid() becomes a better choice than mapping by colour
What are the layers of Wickham's grammar of graphics? (6)
- Aesthetics (position, shape, colour, …)
- Geometric objects (points, lines, bars, …)
- Scales (continuous, discrete, Cartesian coordinates, …)
- Facets (small multiples)
- Statistical transformation (identity, binning, median, …)
- Coordinate system (Cartesian, polar, parallel, …)
How do you replace unusual values with missing values in a dataframe?
mutate(column = ifelse(test, value1, value2))
test is a logical vector, e.g. column < 5
value1 is used where test is TRUE (use NA here to blank out the unusual values)
value2 is used where test is FALSE (usually the original column)
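A sketch assuming dplyr is installed; the data and the "unusual" cut-offs (below 1 or above 20) are hypothetical.

```r
library(dplyr)

# Hypothetical measurements where 0 and 99 are unusual values
data <- data.frame(y = c(3.1, 0, 4.5, 99, 5.2))

cleaned <- data %>%
  mutate(y = ifelse(y < 1 | y > 20, NA, y))  # unusual -> NA, otherwise keep y

cleaned$y  # 3.1 NA 4.5 NA 5.2
```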
How do you do a cumulative sum?
cumsum()
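A one-line illustration on a hypothetical vector: each element is the sum of all values up to that position.

```r
x <- c(1, 2, 3, 4)
cumsum(x)   # 1 3 6 10
```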
What is a high data-ink ratio?
a large share of the ink on the page is spent showing data rather than non-data decoration
How do you get a freqpoly to display density instead of count?
geom_freqpoly(aes(x = price, y = after_stat(density))) #older code writes y = ..density..
What are the modular arithmetic symbols?
%/% (integer division)
%% (remainder)
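A common use is splitting a time stored as an HHMM integer into hours and minutes; the dep_time values below are hypothetical.

```r
# Hypothetical departure times stored as HHMM integers
dep_time <- c(517, 1130, 2359)

hour   <- dep_time %/% 100   # integer division: 5 11 23
minute <- dep_time %% 100    # remainder:        17 30 59
```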
How do you reorder boxplots by median?
ggplot(data) + geom_boxplot(aes(x = reorder(xvariable, yvariable, FUN = median), y = yvariable))
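The same pattern with concrete columns, using the mpg dataset that ships with ggplot2 (class as the categorical x, hwy as the continuous y). The extra tapply() line is only there to check that the group medians come out in increasing order.

```r
library(ggplot2)

# Boxes ordered by the median hwy of each class
p <- ggplot(mpg) +
  geom_boxplot(aes(x = reorder(class, hwy, FUN = median), y = hwy)) +
  labs(x = "class")

# Check: medians in the new level order are non-decreasing
ordered_class <- reorder(mpg$class, mpg$hwy, FUN = median)
tapply(mpg$hwy, ordered_class, median)

print(p)
```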
What is a good way to display two categorical variables?
geom_count(), which draws a dot at each combination, sized by the number of observations
geom_tile() with the fill colour mapped to the count of each combination:
data %>% count(variable1, variable2) %>% ggplot(aes(x = variable1, y = variable2)) + geom_tile(aes(fill = n))
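Both approaches sketched with the diamonds dataset that ships with ggplot2, where cut and color are both categorical; dplyr is assumed for count().

```r
library(dplyr)
library(ggplot2)

# Dot size shows how many diamonds fall in each cut/color combination
p_count <- ggplot(diamonds) +
  geom_count(aes(x = cut, y = color))

# Tile fill shows the same counts
counts <- diamonds %>% count(cut, color)

p_tile <- ggplot(counts, aes(x = cut, y = color)) +
  geom_tile(aes(fill = n))

print(p_count)
print(p_tile)
```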
What is a different way of creating a matrix of plots?
library(GGally)
ggpairs(data, columns = c("column1", "column2", "column3"))
How do you remove a grouping to return to operations on ungrouped variables?
ungroup()
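A sketch with hypothetical data, assuming dplyr is installed: the first mean is computed per group, and after ungroup() the second mean is computed over all rows.

```r
library(dplyr)

# Hypothetical data with two groups
data <- data.frame(g = c("a", "a", "b"), x = c(1, 3, 10))

grouped <- data %>%
  group_by(g) %>%
  mutate(group_mean = mean(x))     # 2, 2, 10 (within each group)

ungrouped <- grouped %>%
  ungroup() %>%
  mutate(overall_mean = mean(x))   # 14 / 3 for every row

group_vars(grouped)    # "g"
group_vars(ungrouped)  # character(0)
```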
What do aggregate functions do when there are missing values? How do you change the default?
Gives NA when there are missing values
Use na.rm=TRUE to remove missing values prior to calculation
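A quick illustration on a hypothetical vector with one missing value:

```r
x <- c(2, NA, 4)

mean(x)                # NA
mean(x, na.rm = TRUE)  # 3
sum(x, na.rm = TRUE)   # 6
```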
What is the general format of a ggplot code?
ggplot(data = ) +
geom_*(mapping = aes(), stat = , position = ) +
coord_*() +
facet_*()
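One concrete instance of the general format, using the mpg dataset that ships with ggplot2. The particular geom, coordinate system and facet are only illustrative; the comments mark which part of the grammar each piece supplies.

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  geom_bar(
    mapping = aes(x = class, fill = drv),  # aesthetic mappings
    stat = "count",                        # statistical transformation
    position = "dodge"                     # position adjustment
  ) +
  coord_flip() +                           # coordinate system
  facet_wrap(~ year)                       # facets

print(p)
```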
What are the Gestalt principles of relatedness? (7)
- Proximity: things that are spatially near to one another seem to be related
- Similarity: things that look alike seem to be related
- Connection: things that are visually tied to one another seem to be related
- Continuity: partially hidden objects are completed into familiar shapes
- Closure: incomplete shapes are perceived as complete
- Figure and ground: visual elements are taken to be either in the foreground or in the background
- Common fate: elements sharing a direction of movement are perceived as a unit
What is the ranking of effectiveness for categorical attributes? (4)
- spatial region
- colour hue
- motion
- shape
What happens to missing values in geom_bar and geom_histogram?
geom_bar treats NA in a categorical variable as its own category and draws a bar for it
geom_histogram removes missing values (with a warning)
What are the logical operators in R?
& #and
| #or
! #not
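The operators combined inside dplyr::filter() on a hypothetical dataframe:

```r
library(dplyr)

# Hypothetical data
data <- data.frame(x = 1:6, g = c("a", "b", "a", "b", "a", "b"))

filter(data, x > 2 & g == "a")   # both conditions hold: x = 3, 5
filter(data, x < 2 | x > 5)      # either condition holds: x = 1, 6
filter(data, !(g == "a"))        # condition is false: x = 2, 4, 6
```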
Using facet grid, which variable should you put in the rows and which in the columns?
Put the variable with more unique levels in the columns, since plots usually have more room horizontally than vertically
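In facet_grid() the formula reads rows ~ columns. A sketch with the mpg dataset that ships with ggplot2, putting drv (3 levels) in the rows and class (7 levels) in the columns:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ class)   # rows ~ columns

print(p)
```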
What is the ranking of effectiveness for ordered attributes? (8)
- position on a common scale
- position on an unaligned scale
- length
- tilt/angle
- area
- depth
- colour luminance/colour saturation
- curvature/volume
How do you arrange rows in descending order?
arrange(data, desc(column1))
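A tiny example on a hypothetical dataframe, assuming dplyr is installed:

```r
library(dplyr)

data <- data.frame(column1 = c(2, 9, 5))  # hypothetical

arrange(data, desc(column1))  # column1: 9, 5, 2
```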
How do you get mean and standard deviation?
mean()
sd()
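Both functions on hypothetical values, directly and inside a summarise() call (assuming dplyr is installed). Note that sd() gives the sample standard deviation, dividing by n - 1.

```r
library(dplyr)

x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical values

mean(x)   # 5
sd(x)     # sqrt(32 / 7), about 2.14

# Inside a pipeline
data.frame(x = x) %>%
  summarise(mean_x = mean(x), sd_x = sd(x))
```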
How do you remove the missing values from a dataframe using filter?
filter(data, !is.na(column1)) #is.na() must be applied to a column, not to the whole dataframe
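A sketch with hypothetical data, assuming dplyr 1.0.4 or later for if_all(): the first call drops rows missing a value in one column, the second drops rows missing a value in any column.

```r
library(dplyr)

# Hypothetical data with missing values in different columns
data <- data.frame(a = c(1, NA, 3, 4), b = c("x", "y", NA, "z"))

# Drop rows where column a is missing
no_na_a <- filter(data, !is.na(a))

# Drop rows with a missing value in any column
no_na_any <- filter(data, if_all(everything(), ~ !is.na(.x)))
```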
What package is ggplot2 in?
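ggplot2 is its own package and part of the tidyverse: load it with library(ggplot2), or with library(tidyverse) together with the other core packages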
How do you change the colour to no colour?
colour = NA
When using the + in ggplot, where in the line must it go?
Can’t be at the beginning of a line. Must be at end of previous line
What does filter do with NA values?
filter() keeps only rows where the condition is TRUE, so rows where it evaluates to NA are dropped
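A small illustration on a hypothetical column: the comparison with NA gives NA, so that row is dropped unless you ask for it explicitly with is.na().

```r
library(dplyr)

data <- data.frame(x = c(1, NA, 3))  # hypothetical

filter(data, x > 1)              # keeps only x = 3; the NA row is dropped
filter(data, is.na(x) | x > 1)   # keeps the NA row as well
```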
How do you add a new column to a dataframe and keep only the new columns?
transmute(data, newcolumn = …)
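A sketch with hypothetical columns, assuming dplyr is installed. Since dplyr 1.1.0, transmute() is marked as superseded in favour of mutate() with .keep = "none", which gives the same result here.

```r
library(dplyr)

data <- data.frame(price = c(100, 250), carat = c(0.5, 1))  # hypothetical

new_only <- transmute(data, price_per_carat = price / carat)  # only the new column
new_only

# Equivalent with mutate()
mutate(data, price_per_carat = price / carat, .keep = "none")
```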