Week 4: Data Visualisation and Transformation Flashcards
How many shapes can you plot as a mapping? What happens if there are more than that many factors?
- 6 by default. Additional groups will go unplotted (ggplot2 gives a warning)
What are 2 codes to make vectors into a dataframe?
How do you specify the type?
What is a difference between the 2 codes?
tibble(column1= , column2= )
data.frame(column1=, column2= )
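can wrap a column in a conversion function, e.g. tibble(column1 = as.character(x))
data.frame automatically sets characters as factors in R versions before 4.0.0 unless you set stringsAsFactors=FALSE (from R 4.0.0 the default is FALSE); tibble never converts characters to factors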
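A minimal sketch of the difference, assuming the tibble package is installed. The vectors name and score are made up for illustration, and stringsAsFactors = TRUE is set explicitly so the old data.frame behaviour shows up on any R version.

```r
library(tibble)

# Hypothetical example vectors
name  <- c("a", "b", "c")
score <- c(1, 2, 3)

# tibble keeps strings as character; as.character() sets the column type explicitly
tb <- tibble(name = name, score = as.character(score))

# stringsAsFactors = TRUE reproduces the pre-4.0.0 default of data.frame()
df <- data.frame(name = name, score = score, stringsAsFactors = TRUE)

class(tb$name)   # "character"
class(tb$score)  # "character"
class(df$name)   # "factor"
```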
How do you give a ranking to values in a vector x?
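can also use rank(x), but by default it gives tied values their average rank; min_rank(x) is equivalent to rank(x, ties.method = "min")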
min_rank(x) #gives 1 to the lowest value and so on
min_rank(desc(x)) #gives 1 to the highest value
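A small sketch of how ties are handled, assuming dplyr is loaded. The vector x is a made-up example.

```r
library(dplyr)

# Hypothetical vector with a tie at 30
x <- c(10, 30, 20, 30)

min_rank(x)        # 1 3 2 3   (ties share the lowest rank)
min_rank(desc(x))  # 4 1 3 1   (rank 1 goes to the highest value)
rank(x)            # 1.0 3.5 2.0 3.5   (base R averages tied ranks by default)
```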
How do you set axis limits in 2 ways?
+ xlim(0,5) #drops the data outside the limits before plotting
+ coord_cartesian(xlim = c( , ))
this way doesn’t throw away the data outside the limits, used to zoom
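The two approaches can be compared side by side. This is a sketch with a made-up data frame, assuming ggplot2 is installed; the object names are illustrative only.

```r
library(ggplot2)

# Hypothetical data with one point far outside the range of interest
df <- data.frame(x = c(1, 2, 3, 10), y = c(1, 2, 3, 10))

base_plot <- ggplot(df, aes(x = x, y = y)) + geom_point()

# xlim() sets the scale limits: the point at x = 10 is dropped (with a warning when drawn)
clipped <- base_plot + xlim(0, 5)

# coord_cartesian() only zooms the view: all four points stay in the data
zoomed <- base_plot + coord_cartesian(xlim = c(0, 5))
```

The difference matters most when a statistic such as geom_smooth() is computed, because xlim() removes the outside points before the statistic is calculated.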
How do you define something e.g. a plot in the server and how do you refer to it in the ui in shiny?
# Define a plot called distplot in the server using output$distplot <- renderPlot({ ... }). In ui.R refer to it by its id as a string inside an output function, e.g. plotOutput("distplot")
code to keep the first 200 rows and all columns of a dataframe?
data[1:200,]
What is a good way to display a categorical variable vs a continuous variable?
boxplot
freqpoly, map by colour
When should you use facet grid over mapping by colour?
As the number of points and number of categories increases, facet grid becomes better
What are the layers of Wickham's grammar of graphics? (6)
- Aesthetics (position, shape, colour, …)
- Geometric objects (points, lines, bars, …)
- Scales (continuous, discrete, Cartesian coordinates, …)
- Facets (small multiples)
- Statistical transformation (identity, binning, median, …)
- Coordinate system (Cartesian, polar, parallel, …)
How do you replace unusual values with missing values in a dataframe?
mutate(column = ifelse(test, value1, value2))
test is a logical vector, e.g. column < 5
value1 is the value used where test is TRUE (use NA here to blank out the unusual values)
value2 is the value used where test is FALSE (usually the original column)
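As a sketch, assuming dplyr is loaded: the data frame, the column y and the cut-off of 20 are all made up for illustration.

```r
library(dplyr)

# Hypothetical data where 100 is an implausible value
df <- tibble(y = c(4, 5, 100, 6))

# Where the test is TRUE the value becomes NA, otherwise the original value is kept
cleaned <- df %>%
  mutate(y = ifelse(y > 20, NA, y))

cleaned$y  # 4 5 NA 6
```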
How do you do a cumulative sum?
cumsum()
What is a high data-ink ratio?
more data with less ink on the page
How do you get a freqpoly to display density instead of count?
geom_freqpoly(aes(x=price, y=..density..)) #in ggplot2 3.3.0 and later, y = after_stat(density) replaces ..density..
What are the modular arithmetic symbols?
%/% (integer division)
%% (remainder)
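A common use is splitting a number into parts. This sketch uses a made-up time stored as an integer in HHMM form.

```r
# Hypothetical departure time in HHMM form (5:17 am)
dep_time <- 517

hour   <- dep_time %/% 100  # integer division: 5
minute <- dep_time %% 100   # remainder: 17
```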
How do you reorder boxplots by median?
ggplot(data) + geom_boxplot(aes(x = reorder(xvariable, yvariable, FUN = median), y = yvariable))
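To see what reorder() does to the factor levels, here is a sketch with a made-up data frame, assuming ggplot2 is installed. The group medians are a = 6, b = 1.5 and c = 3.5.

```r
library(ggplot2)

# Hypothetical data: three groups with different medians
df <- data.frame(
  group = c("a", "a", "b", "b", "c", "c"),
  value = c(5, 7, 1, 2, 3, 4)
)

# reorder() sorts the factor levels by the median of value within each group
ordered_group <- reorder(df$group, df$value, FUN = median)
levels(ordered_group)  # "b" "c" "a"

# The same call inside aes() draws the boxplots in that order
p <- ggplot(df) +
  geom_boxplot(aes(x = reorder(group, value, FUN = median), y = value))
```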
What is a good way to display two categorical variables?
geom_count, creates a graph with different size dots at each intersection
geom_tile with fill aesthetic darker depending on combinations of each value:
data %>% count(variable1, variable2) %>% ggplot(aes(x = variable1, y = variable2)) + geom_tile(aes(fill = n))
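The pipeline above can be sketched on a tiny made-up data frame, assuming dplyr and ggplot2 are installed. The column names cut and colour are illustrative only.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data with two categorical variables
df <- tibble(
  cut    = c("good", "good", "fair", "fair", "fair"),
  colour = c("D", "E", "D", "D", "E")
)

# One row per combination that occurs, with the count in column n
counts <- df %>% count(cut, colour)

# Each tile is one combination; the fill shows how often it occurs
p <- ggplot(counts, aes(x = cut, y = colour)) +
  geom_tile(aes(fill = n))
```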
What is a different way of creating a matrix of plots?
library(GGally)
ggpairs(data, columns = c('column1', 'column2', 'column3'))
How do you remove a grouping to return to operations on ungrouped variables?
ungroup()
What do aggregate functions do when there are missing values? How do you change the default?
Gives NA when there are missing values
Use na.rm=TRUE to remove missing values prior to calculation
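A short sketch in base R with a made-up vector:

```r
# Hypothetical vector with one missing value
x <- c(1, NA, 3)

mean(x)                # NA, because the missing value propagates
mean(x, na.rm = TRUE)  # 2, because the NA is dropped before the calculation
sum(x, na.rm = TRUE)   # 4
```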
What is the general format of a ggplot code?
ggplot(data = ) +
geom(mapping=aes(), stat=, position=) +
coordinate function + facet function #e.g. coord_flip() + facet_wrap(~variable)
What are the Gestalt principles of relatedness? (7)
- Proximity: things that are spatially near to one another seem to be related
- Similarity: things that look alike seem to be related
- Connection: things that are visually tied to one another seem to be related
- Continuity: partially hidden objects are completed into familiar shapes
- Closure: incomplete shapes are perceived as complete
- Figure and ground: visual elements are taken to be either in the foreground or in the background
- Common fate: elements sharing a direction of movement are perceived as a unit
What is the ranking of effectiveness for categorical attributes? (4)
- spatial region
- colour hue
- motion
- shape
What happens to missing values in geom_bar and geom_histogram?
geom_bar creates its own bin for NAs
geom_histogram removes missing values
What are the logical operators in R?
& #and
| #or
! #not
Using facet grid, which variable should you put in the rows and which in the columns?
Put the variable with more levels in the columns
What is the ranking of effectiveness for ordered attributes? (8)
- position on a common scale
- position on an unaligned scale
- length
- tilt/angle
- area
- depth
- colour luminance/colour saturation
- curvature/volume
How do you arrange rows in descending order?
arrange(data, desc(column1))
How do you get mean and standard deviation?
mean()
sd()
How do you remove the missing values from a dataframe using filter?
filter(data, !is.na(column))
What package is ggplot2 in?
library(tidyverse)
How do you change the colour to no colour?
colour = NA
When using the + in ggplot, where in the line must it go?
Can’t be at the beginning of a line. Must be at end of previous line
What does filter do with NA values?
Excludes them automatically
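A sketch of how filter() treats NA, assuming dplyr is loaded. The data frame and object names are made up.

```r
library(dplyr)

# Hypothetical data with one missing value
df <- tibble(x = c(1, NA, 3))

# NA > 1 evaluates to NA, and filter() keeps only rows where the test is TRUE
over_one <- filter(df, x > 1)

# Ask for the NA rows explicitly to keep them
over_one_or_na <- filter(df, is.na(x) | x > 1)

# Drop the rows with missing values
complete <- filter(df, !is.na(x))
```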
How do you add a new column to a dataframe and keep only the new columns?
transmute(data, newcolumn = …)
How do you rename a variable?
rename(data, newname = variable1)
How do you get a count of how many x>10 in a variable? How do you get a proportion?
sum(x>10) gives the number of TRUEs in x>10
mean(x>10) gives the proportion of TRUEs in x>10
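This works because TRUE counts as 1 and FALSE as 0. A sketch with a made-up vector:

```r
# Hypothetical vector
x <- c(5, 12, 20, 8)

x > 10        # FALSE TRUE TRUE FALSE
sum(x > 10)   # 2   (number of values above 10)
mean(x > 10)  # 0.5 (proportion of values above 10)
```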
How do you select only certain columns of a dataframe?
select(data, variable1, variable2)
How do you create a table with a count for each value in a variable?
table(variable)
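What is Stevens' Psychophysical Power Law?
Describes how humans perceive sensations compared to how they actually change: the perceived magnitude is proportional to the stimulus intensity raised to a power that depends on the type of sensation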
What are some useful geoms? (15)
geom_point() #scatterplot
geom_density_2d() #contour lines from x and y positions
geom_histogram() #histogram, use binwidth to change size of bins
geom_density() #density plot
geom_rug() #adds rug marks of raw data on the axes
geom_boxplot() #boxplot
geom_bar() #barplot, automatically transforms variables to counts using stat_count
geom_col() #column graph, need x and y variables
geom_line() #line chart
geom_label(aes(x=, y=, label=)) #gives a label at the specified x and y position
geom_freqpoly() #same as histogram but uses lines instead of bars, good for overlapping data
geom_area() #area chart
geom_smooth(method='lm') #draws line of best fit. se=TRUE is default and displays the confidence band as well
geom_abline() #adds reference line from slope and intercept; default is slope 1, intercept 0
geom_ribbon(aes(x=, ymin=, ymax=)) #draws a band between lower and upper bounds e.g. confidence interval
What are two ways to add jitter to a plot?
geom_point(position="jitter") OR geom_jitter()
How do you use facet wrap?
facet_wrap(~variable)
create formula with ~, variable should be discrete
How do you get a grouped summary (mean for this question) of a variable grouped by another variable?
data %>% group_by(variable) %>% summarise(newname = mean(distance, na.rm=TRUE))
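A sketch of the grouped summary on a made-up data frame, assuming dplyr is loaded. The names flights, carrier and mean_distance are illustrative only.

```r
library(dplyr)

# Hypothetical data: two carriers, one missing distance
flights <- tibble(
  carrier  = c("A", "A", "B", "B"),
  distance = c(100, NA, 300, 500)
)

# One row per carrier; na.rm = TRUE stops the NA turning carrier A's mean into NA
by_carrier <- flights %>%
  group_by(carrier) %>%
  summarise(mean_distance = mean(distance, na.rm = TRUE))

by_carrier$mean_distance  # 100 400
```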
How do you remove the legend?
show.legend = FALSE
What is a good way to display two continuous variables? (3)
geom_point (scatterplot)
geom_bin2d() and geom_hex() divide the plane into 2d bins and use fill colour to display how many points are in each bin
use boxplot and divide one continuous variable into categories: geom_boxplot(aes(group = cut_width(x, width)))
What is a good way to visualise a single continuous variable? (2)
histogram, puts continuous variables into bins
density plot (smoothed histogram)
How do you select columns by matching them to names contained in a string vector? (3 ways)
!!vector #bang-bang operator, matches to names of variables in vector
all_of(vector)
any_of(vector)
#in all_of, all names must be present (otherwise error). in any_of they don't need to be, missing names are ignored
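A sketch of the difference between the two helpers, assuming dplyr is loaded. The data frame and the vector wanted are made up.

```r
library(dplyr)

# Hypothetical data frame and a character vector of column names
df <- tibble(a = 1, b = 2, c = 3)
wanted <- c("a", "c")

# all_of(): every name must exist in the data frame
strict <- select(df, all_of(wanted))

# any_of(): names that do not exist (here "z") are silently skipped
lenient <- select(df, any_of(c("a", "z")))

# select(df, all_of(c("a", "z"))) would stop with an error because "z" is missing
```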
What does the “group” mapping do in a geom?
Using aes(group= ) is the same as mapping a variable onto an aesthetic, it separates based on the discrete variable but does not specify the difference with a legend on the graph
What is a good way to visualise a single categorical variable?
bar chart
How do you rearrange the columns to put certain columns at the start?
select(data, variable5, variable7, everything()) #moves variables 5 and 7 to the start but still keeps all others
How do you facet a plot on a combination of 2 variables?
facet_grid(variable1~variable2)
How do you get the counts of each combination when comparing 2 categorical variables?
count(variable1, variable2)
gives number of combinations in a table
How do you exclude a column from a dataframe?
select(data, -(variable1))
How does geom_bar get the count for each variable? How do you override it?
Uses statistical transformation stat_count() to find the count of each value. Can override using stat = "identity", for example to use a y value instead of count
How do you change the order of rows?
arrange(data, column1, column2) #orders by column 1 then column 2 in ascending order
Can you map continuous variables to colour and shape?
Yes to colour, gives continuous scale. No to shape
What are 6 coordinate systems in ggplot?
coord_cartesian() #default
coord_flip() #switches the x and y axis
coord_map() #sets the aspect ratio correctly for maps
coord_quickmap() #sets the aspect ratio correctly for maps, quicker
coord_polar() #uses polar coordinates. use argument (theta="y") to create a pie chart. otherwise you get a bulls-eye chart
coord_fixed() #forces a specified ratio between the axes, default is 1
How do you “flatten” a graph?
fit a model, get the residuals, and plot the residuals against the x variable
How do you map an aesthetic to a function or expression of a variable?
E.g. colour = variable < 5. Gives one colour for TRUE and one for FALSE
How do you add a new column to a dataframe?
mutate(data, newcolumn = …)
How do you “straighten” a graph?
+ scale_y_log10() + scale_x_log10()
What are some possible aesthetic mappings? (9)
colour #changes colour,
size #changes size
alpha #changes transparency between 0 and 1
shape #changes shape
linetype #from 0-6. 0 is blank, 1 is solid, others are dotted or dashed
fill #fill colour of shape or curve
stroke #modify width of border. E.g. on shapes
fontface #character "plain", "bold", "italic", "bold.italic"
group #groups by a specified variable
What is the structure of a shiny app?
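ui (the layout shown to the user), server (a function of input and output that does the background work), and a call to shinyApp(ui, server) to combine them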
What is Anscombe's quartet?
Facet of 4 graphs with very different data but the same line of best fit (about y = 3 + 0.5x) and nearly identical summary statistics (correlation about 0.82, R-squared about 0.67, i.e. roughly 0.7)
How do you select columns that start with, end with, contain or match a string?
starts_with("abc")
ends_with("abc")
contains("abc")
matches("expression")
What are the disadvantages of r shiny? (5)
- Unhelpful errors
- Dependency hell
- Unreliable back end
- Need to run R on a server
- Reactivity can be slow
If you have different mappings in ggplot() and geom(aes()), which one will override?
The mapping in the geom overrides the global mapping in ggplot
What are the comparison operators in R?
<, <=, >, >=, !=, ==
How do you label the axes?
+ labs(y = "", x = "")
What is contained in the ui.R file in shiny?
provides schematic for how the application will be presented to the user. e.g. titlePanel(), sidebarLayout(), sidebarPanel(), mainPanel()
What is contained in the server.R file in shiny?
contains everything which should be done in the background
What is a violin plot?
Like a boxplot but shows the full density plots
What is the order of how humans perceive sensations? (6)
- electric shock
- saturation
- length
- area
- depth
- brightness
What do lead(x) and lag(x) do to a vector x?
lead(x) removes first number and adds NA to end
lag(x) adds NA to beginning and removes last number
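A sketch with a made-up vector, assuming dplyr is loaded. A typical use of lag() is computing the change from the previous value.

```r
library(dplyr)

# Hypothetical vector
x <- c(1, 3, 6)

lead(x)     # 3 6 NA   (each position sees the next value)
lag(x)      # NA 1 3   (each position sees the previous value)
x - lag(x)  # NA 2 3   (difference from the previous value)
```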
What is a facet?
Facets are subplots that each display a subset of the data
Does the aesthetic mapping go inside or outside the aes()?
If it gives information about a variable it goes inside the aes(). If not it must go outside the aes()
How do you call something from the input in the server in shiny?
anything from the input can be called in server using input$…
How do you select only certain rows of a dataframe based on their values?
filter(dataframe, value1==1, value2==4)
How do you determine if a value is a missing value in R?
is.na()
What are additional arguments to facet_wrap? (2)
nrow =
ncol =
specifies number of rows or columns
How do you get the max value of a variable and the location of the max value?
max(variable) #gives max value
which.max(variable) #gives location of max value
What are 3 distilled principles from Tufte?
- Maximise data-to-ink ratio
- Present more data without losing interpretability
- Use levels of detail
What is reactivity in Shiny?
The output depends on the user's input. Every output must be wrapped in a render function (e.g. renderPlot, renderText, renderTable), which watches the inputs it uses and updates the output when they change
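The Shiny cards can be tied together in one small app. This is a sketch only, assuming the shiny package is installed; the ids bins and distplot, the title and the use of the built-in faithful dataset are illustrative choices, not taken from the course material.

```r
library(shiny)

# ui: the layout shown to the user
ui <- fluidPage(
  titlePanel("Hypothetical histogram app"),
  sidebarLayout(
    sidebarPanel(
      # The value of this slider is available in the server as input$bins
      sliderInput("bins", "Number of bins:", min = 1, max = 50, value = 30)
    ),
    mainPanel(
      # Refers to output$distplot by its id, given as a string
      plotOutput("distplot")
    )
  )
)

# server: the background work
server <- function(input, output) {
  # renderPlot() re-runs whenever input$bins changes (reactivity)
  output$distplot <- renderPlot({
    hist(faithful$waiting, breaks = input$bins)
  })
}

# Combine the two parts; start the app with runApp(app)
app <- shinyApp(ui = ui, server = server)
```

With separate ui.R and server.R files the same two objects live in their own files and the shinyApp() call is not needed.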
What happens when you map an aesthetic (colour) to a bar chart? What 3 ways are there to change how this looks?
The bars are automatically stacked with colours for each object in the variable (position = “stack”)
position = "identity" #places each object exactly where it falls in the context of the graph. Bars will overlap, so use alpha
position = "fill" #works like stacking but makes each set of stacked bars the same height. Makes it easier to compare proportions across groups
position = "dodge" #places overlapping objects beside each other. Makes it easier to compare individual values
How do you use in and between in R?
%in% #in
between(x, left, right) #finds rows where x is between left and right
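A sketch with a made-up vector, assuming dplyr is loaded for between(). Both ends of between() are inclusive.

```r
library(dplyr)

# Hypothetical vector
x <- c(1, 5, 10)

x %in% c(1, 10)     # TRUE FALSE TRUE  (is each value one of the listed values?)
between(x, 5, 10)   # FALSE TRUE TRUE  (same as x >= 5 & x <= 10)
```

Inside filter() these become row conditions, e.g. filter(data, between(column1, 5, 10)).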
How do you split a variable into different sections and label the sections?
cut(variable, breaks = c(0,10,20), labels = c("", ""))
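A sketch in base R with made-up values and labels. Three break points give two intervals, so two labels are needed; by default each interval includes its upper bound.

```r
# Hypothetical values
ages <- c(5, 15, 10)

# Intervals are (0,10] and (10,20], so 10 falls in the first one
groups <- cut(ages, breaks = c(0, 10, 20), labels = c("low", "high"))

as.character(groups)  # "low" "high" "low"
```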
How do you load a csv into a dataframe in R?
read_csv('filepath')