Week 4: Data Visualisation and Transformation Flashcards
REVERSED
- Maximise data-to-ink ratio
- Present more data without losing interpretability
- Use levels of detail
What are 3 distilled principles from Tufte?
REVERSED
geom_point() #scatterplot
geom_density_2d() #contour lines from x and y positions
geom_histogram() #histogram, use binwidth to change size of bins
geom_density() #density plot
geom_rug() #adds rug marks of raw data on the axes
geom_boxplot() #boxplot
geom_bar() #barplot, automatically transforms variables to counts using stat_count
geom_col() #column graph, need x and y variables
geom_line() #line chart
geom_label(aes(x=, y=, label=)) #gives a label at the specified x and y position
geom_freqpoly() #same as histogram but uses lines instead of bars, good for overlapping data
geom_area() #area chart
geom_smooth(method = "lm") #draws line of best fit. se = TRUE is the default and also displays the confidence band around the line
geom_abline() #adds a reference line from a slope and intercept, defaults are slope = 1 and intercept = 0
geom_ribbon(aes(x = , ymin = , ymax = )) #draws a band between lower and upper bounds e.g. confidence interval
What are some useful geoms? (15)
REVERSED
filter(data, !is.na(column))
How do you remove the missing values from a dataframe using filter?
REVERSED
fit a model, get the residuals and plot the residuals against the explanatory variable
How do you “flatten” a graph?
REVERSED
more data with less ink on the page
What is a high data-ink ratio?
REVERSED
ungroup()
How do you remove a grouping to return to operations on ungrouped variables?
REVERSED
+ xlim(0,5) #drops the data outside the limits
+ coord_cartesian(xlim = c( , )) #doesn’t throw away the data outside the limits, used to zoom
How do you set axis limits in 2 ways?
REVERSED
%in% #in
between(x, left, right) #finds rows where x is between left and right
How do you use in and between in R?
REVERSED
Using aes(group = ) splits the data by a discrete variable, just as mapping that variable to an aesthetic such as colour would, but it adds no legend or other visual distinction between the groups
What does the “group” mapping do in a geom?
REVERSED
anything from the input can be called in server using input$…
How do you call something from the input in the server in shiny?
REVERSED
is.na()
How do you determine if a value is a missing value in R?
Define a plot called distplot in the server by assigning a render function to output$distplot. In ui.R, refer to it by its ID as a string inside an output function, e.g. plotOutput("distplot")
How do you define something e.g. a plot in the server and how do you refer to it in the ui in shiny?
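A minimal sketch of how the two halves connect, assuming a single-file app rather than separate ui.R and server.R files. The slider ID "bins" and the built-in faithful dataset are illustrative choices, not part of the original card.

```r
library(shiny)

# ui: lays out one input and one placeholder for the plot
ui <- fluidPage(
  sliderInput("bins", "Number of bins:", min = 1, max = 50, value = 30),
  plotOutput("distplot")  # refers to output$distplot by its ID string
)

# server: builds the plot, reading the slider value through input$bins
server <- function(input, output) {
  output$distplot <- renderPlot({
    hist(faithful$waiting, breaks = input$bins)
  })
}

# Bundle both halves; printing `app` (or calling runApp(app)) would launch it
app <- shinyApp(ui = ui, server = server)
```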
REVERSED
mutate(data, new_column = …)
How do you add a new column to a dataframe?
REVERSED
nrow =
ncol =
specifies number of rows or columns
What are additional arguments to facet_wrap? (2)
REVERSED
The output depends on the user’s input. All output must be wrapped in a render function (e.g. renderPlot()), which watches the inputs it uses and updates the output when they change
What is reactivity in Shiny?
REVERSED
min_rank(x) #gives 1 to the lowest value and so on
min_rank(desc(x)) #gives 1 to the highest value
How do you give a ranking to values in a vector x?
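A small sketch of both rankings, assuming dplyr is installed. The vector values are made up, with one tie to show that tied values share the lowest rank.

```r
library(dplyr)

x <- c(10, 30, 20, 30)

# Smallest value gets rank 1; tied values share the lowest available rank
rank_low_first <- min_rank(x)

# Wrapping in desc() reverses the order, so the largest value gets rank 1
rank_high_first <- min_rank(desc(x))
```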
REVERSED
filter(dataframe, value1==1, value2==4)
How do you select only certain rows of a dataframe based on their values?
REVERSED
geom_count, creates a graph with different size dots at each intersection
geom_tile with fill aesthetic darker depending on combinations of each value:
data %>% count(variable1, variable2) %>% ggplot(aes(x = variable1, y = variable2)) + geom_tile(aes(fill = n))
What is a good way to display two categorical variables?
REVERSED
The mapping in the geom overrides the global mapping in ggplot
If you have different mappings in ggplot() and geom(aes()), which one will override?
REVERSED
read_csv("filepath")
How do you load a csv into a dataframe in R?
REVERSED
As the number of points and number of categories increases, facet grid becomes better
When should you use facet grid over mapping by colour?
REVERSED
facet_wrap(~variable)
create formula with ~, variable should be discrete
How do you use facet wrap?
REVERSED
Four datasets that look very different when plotted but have nearly identical summary statistics: the same line of best fit (y = 3 + 0.5x), a correlation of about 0.82 and an R² of about 0.67
What is Anscombe’s quartet?
REVERSED
mutate(column = ifelse(test, value1, value2))
test is a logical vector, e.g. column < 5
value1 is the result where test is TRUE (use NA to blank out the unusual values)
value2 is the result where test is FALSE (usually the original column)
How do you replace unusual values with missing values in a dataframe?
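A minimal sketch of the card above, assuming dplyr is installed. The tibble, the column name y and the cutoff of 20 are hypothetical.

```r
library(dplyr)

measurements <- tibble(y = c(3, 25, 4, 100))

# Values above the (assumed) cutoff become NA; everything else is kept as-is
cleaned <- measurements %>%
  mutate(y = ifelse(y > 20, NA, y))
```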
REVERSED
Yes to colour, gives continuous scale. No to shape
Can you map continuous variables to colour and shape?
REVERSED
cut(variable, breaks = c(0, 10, 20), labels = c("", ""))
How do you split a variable into different sections and label the sections?
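A short base R sketch with made-up scores and labels. Three breaks define two intervals, (0,10] and (10,20], so two labels are needed. By default each interval includes its right end, so a value of exactly 10 falls in the first interval.

```r
scores <- c(2, 10, 15)

# Two intervals from three breaks, each given a hypothetical label
bands <- cut(scores, breaks = c(0, 10, 20), labels = c("low", "high"))
```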
REVERSED
+ scale_y_log10() + scale_x_log10()
How do you “straighten” a graph?
REVERSED
geom_freqpoly(aes(x = price, y = after_stat(density))) #older code uses y = ..density..
How do you get a freqpoly to display density instead of count?
REVERSED
cumsum()
How do you do a cumulative sum?
REVERSED
E.g. colour = variable <5. Gives true or false for colours
How do you map an aesthetic to a function or expression of a variable?
REVERSED
- position on a common scale
- position on an unaligned scale
- length
- tilt/angle
- area
- depth
- colour luminance/colour saturation
- curvature/volume
What is the ranking of effectiveness for ordered attributes? (8)
REVERSED
data %>% group_by(variable) %>% summarise(newname = mean(distance, na.rm=TRUE))
How do you get a grouped summary (mean for this question) of a variable grouped by another variable?
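A minimal sketch of a grouped mean, assuming dplyr is installed. The trips data, the grouping column mode and the name mean_distance are made up for illustration.

```r
library(dplyr)

trips <- tibble(
  mode     = c("bus", "bus", "train", "train"),
  distance = c(4, NA, 10, 20)
)

# One row per group; na.rm = TRUE drops the missing bus distance before averaging
by_mode <- trips %>%
  group_by(mode) %>%
  summarise(mean_distance = mean(distance, na.rm = TRUE))
```

Without na.rm = TRUE the bus group would come out as NA.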
REVERSED
- Unhelpful errors
- Dependency hell
- Unreliable back end
- Need to run R on a server
- Reactivity can be slow
What are the disadvantages of R Shiny? (5)
REVERSED
sum(x>10) gives the number of TRUEs in x>10
mean(x>10) gives the proportion of TRUEs in x>10
How do you get a count of how many x>10 in a variable? How do you get a proportion?
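This works because TRUE counts as 1 and FALSE as 0 in arithmetic. A base R sketch with a made-up vector:

```r
x <- c(4, 12, 15, 7, 30)

over_ten <- x > 10          # logical vector: FALSE TRUE TRUE FALSE TRUE
n_over <- sum(over_ten)     # number of values above 10
prop_over <- mean(over_ten) # proportion of values above 10
```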
REVERSED
arrange(data, desc(column1))
How do you arrange rows in descending order?
REVERSED
library(tidyverse) #ggplot2 is one of the core tidyverse packages; library(ggplot2) loads it on its own
What package is ggplot2 in?
REVERSED
mean()
sd()
How do you get mean and standard deviation?
REVERSED
colour = NA
How do you change the colour to no colour?
REVERSED
contains everything which should be done in the background
What is contained in the server.R file in shiny?
REVERSED
Put the variable with more levels in the columns
Using facet grid, which variable should you put in the rows and which in the columns?
REVERSED
arrange(data, column1, column2) #orders by column 1 then column 2 in ascending order
How do you change the order of rows?
REVERSED
Excludes them automatically: only rows where the condition is TRUE are kept, so rows where it evaluates to NA are dropped
What does filter do with NA values?
REVERSED
- electric shock
- saturation
- length
- area
- depth
- brightness
What is the order of sensations from most magnified to most compressed in human perception? (6)
REVERSED
Gives NA when there are missing values
Use na.rm=TRUE to remove missing values prior to calculation
What do aggregate functions do when there are missing values? How do you change the default?
REVERSED
transmute(data, newcolumn = …)
How do you add a new column to a dataframe and keep only the new columns?
REVERSED
geom_point (scatterplot)
geom_bin2d() and geom_hex() divide the plane into 2d bins and use fill colour to display how many points fall in each bin
use a boxplot and bin one continuous variable into categories: geom_boxplot(aes(group = cut_width(x, width)))
What is a good way to display two continuous variables? (3)
REVERSED
library(GGally)
ggpairs(data, columns = c("column1", "column2", "column3"))
What is a different way of creating a matrix of plots?
REVERSED
- spatial region
- colour hue
- motion
- shape
What is the ranking of effectiveness for categorical attributes? (4)
REVERSED
provides schematic for how the application will be presented to the user. e.g. titlePanel(), sidebarLayout(), sidebarPanel(), mainPanel()
What is contained in the ui.R file in shiny?
REVERSED
<, <=, >, >=, !=, ==
What are the comparison operators in R?
REVERSED
If it gives information about a variable it goes inside the aes(). If not it must go outside the aes()
Does the aesthetic mapping go inside or outside the aes()?
REVERSED
tibble(column1= , column2= )
data.frame(column1=, column2= )
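can use as.character(column1) to set the type
data.frame automatically converted characters to factors before R 4.0.0 unless you set stringsAsFactors=FALSE (from R 4.0.0 the default is FALSE). tibble never converts characters to factors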
What are 2 codes to make vectors into a dataframe?
How do you specify the type?
What is a difference between the 2 codes?
REVERSED
Facets are subplots that each display a subset of the data
What is a facet?
REVERSED
- Aesthetics (position, shape, colour, …)
- Geometric objects (points, lines, bars, …)
- Scales (continuous, discrete, Cartesian coordinates, …)
- Facets (small multiples)
- Statistical transformation (identity, binning, median, …)
- Coordinate system (Cartesian, polar, parallel, …)
What are the layers of Wickham’s grammar of graphics? (6)
REVERSED
Like a boxplot but shows the full density plots
What is a violin plot?
REVERSED
!!vector #bang-bang operator, matches to names of variables in vector
all_of(vector)
any_of(vector)
#with all_of, every name must be present or you get an error. with any_of, missing names are ignored
How do you select columns by matching them to names contained in a string vector? (3 ways)
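A small sketch of the difference between all_of() and any_of(), assuming dplyr is installed. The data frame and the non-existent column name "z" are made up.

```r
library(dplyr)

df <- tibble(a = 1, b = 2, c = 3)

# all_of(): every listed name must exist, otherwise select() raises an error
strict <- select(df, all_of(c("a", "c")))

# any_of(): names that are not in the data (here "z") are silently skipped
lenient <- select(df, any_of(c("a", "z")))
```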
REVERSED
- 6 shapes by default
- Additional groups will go unplotted
How many shapes can you plot as a mapping? What happens if there are more than that many factors?
REVERSED
+ labs(y = "", x = "")
How do you label the axes?
REVERSED
geom_point(position = "jitter") OR geom_jitter()
What are two ways to add jitter to a plot?
REVERSED
- Proximity: things that are spatially near to one another seem to be related
- Similarity: things that look alike seem to be related
- Connection: things that are visually tied to one another seem to be related
- Continuity: partially hidden objects are completed into familiar shapes
- Closure: incomplete shapes are perceived as complete
- Figure and ground: visual elements are taken to be either in the foreground or in the background
- Common fate: elements sharing a direction of movement are perceived as a unit
What are the Gestalt principles of relatedness? (7)
REVERSED
& #and
| #or
! #not
What are the logical operators in R?
REVERSED
coord_cartesian() #default
coord_flip() #switches the x and y axis
coord_map() #sets the aspect ratio correctly for maps
coord_quickmap() #sets the aspect ratio correctly for maps, quicker
coord_polar() #uses polar coordinates. use argument theta = "y" to create a pie chart. otherwise you get a bulls-eye chart
coord_fixed() #forces a specified ratio between the axes, default is 1
What are 6 coordinate systems in ggplot?
REVERSED
boxplot
freqpoly, map by colour
What is a good way to display a categorical variable vs a continuous variable?
REVERSED
count(variable1, variable2)
gives number of combinations in a table
How do you get the counts of each combination when comparing 2 categorical variables?
REVERSED
ui and server
What is the structure of a shiny app?
REVERSED
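Perceived sensation grows as a power of the physical intensity (S = I^n), so some stimuli are perceived as magnified (n > 1) and others as compressed (n < 1) compared to how they actually change
What is Stevens’ Psychophysical Power Law?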
REVERSED
Can’t be at the beginning of a line. Must be at end of previous line
When using the + in ggplot, where in the line must it go?
REVERSED
data[1:200,]
What is the code to keep the first 200 rows and all columns of a dataframe?
REVERSED
ggplot(data = ) +
geom(mapping=aes(), stat=, position=) +
coordinate_function() + facet_function() #optional coordinate system and faceting
What is the general format of a ggplot code?
REVERSED
rename(data, newname = variable1)
How do you rename a variable?
REVERSED
show.legend = FALSE
How do you remove the legend?
REVERSED
Uses the statistical transformation stat_count() to count the rows for each value of x. Can override using stat = "identity", for example to use a y value instead of the count
How does geom_bar get the count for each variable? How do you override it?
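A minimal sketch contrasting the two stats, assuming ggplot2 is installed. Both data frames and their column names are made up for illustration.

```r
library(ggplot2)

# Raw data: one row per observation, so geom_bar() counts rows per fruit
raw <- data.frame(fruit = c("apple", "apple", "pear"))
p_count <- ggplot(raw, aes(x = fruit)) +
  geom_bar()

# Pre-summarised data: the heights already exist, so use them directly
totals <- data.frame(fruit = c("apple", "pear"), n = c(2, 1))
p_identity <- ggplot(totals, aes(x = fruit, y = n)) +
  geom_bar(stat = "identity")
```

geom_col() is the shortcut for the second case, since it uses the y values as bar heights.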
REVERSED
facet_grid(variable1~variable2)
How do you facet a plot on a combination of 2 variables?
REVERSED
ggplot(data) + geom_boxplot(aes(x = reorder(xvariable, yvariable, FUN = median), y = yvariable))
How do you reorder boxplots by median?
REVERSED
geom_bar creates its own bar for NAs
geom_histogram removes missing values (with a warning)
What happens to missing values in geom_bar and geom_histogram?
REVERSED
max(variable) #gives max value
which.max(variable) #gives location of max value
How do you get the max value of a variable and the location of the max value?
REVERSED
bar chart
What is a good way to visualise a single categorical variable?
REVERSED
select(data, variable1, variable2)
How do you select only certain columns of a dataframe?
REVERSED
The bars are automatically stacked, with a colour for each level of the variable (position = "stack")
position = "identity" #places each object exactly where it falls in the context of the graph. Bars will overlap so should use alpha
position = "fill" #works like stacking but makes each set of stacked bars the same height. Makes it easier to compare proportions across groups
position = "dodge" #places overlapping objects beside each other. Makes it easier to compare individual values
What happens when you map an aesthetic (colour) to a bar chart? What 3 ways are there to change how this looks?
REVERSED
lead(x) removes first number and adds NA to end
lag(x) adds NA to beginning and removes last number
What do lead(x) and lag(x) do to a vector x?
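A short sketch of the shifting, assuming dplyr is installed. The dplyr:: prefix is used because base R also has a lag() function in the stats package that behaves differently.

```r
library(dplyr)

x <- c(1, 2, 3)

next_value <- dplyr::lead(x)  # each position holds the following value, NA at the end
prev_value <- dplyr::lag(x)   # each position holds the preceding value, NA at the start

# A common use: the change from the previous value
change <- x - prev_value
```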
REVERSED
select(data, variable5, variable7, everything()) #moves variables 5 and 7 to the start but still keeps all others
How do you rearrange the columns to put certain columns at the start?
REVERSED
colour #changes colour
size #changes size
alpha #changes transparency between 0 and 1
shape #changes shape
linetype #from 0-6. 0 is blank, 1 is solid, others are dotted or dashed
fill #fill colour of shape or curve
stroke #modify width of border. E.g. on shapes
fontface #character "plain", "bold", "italic", "bold.italic"
group #groups by a specified variable
What are some possible aesthetic mappings? (9)
REVERSED
histogram, puts a continuous variable into bins
density plot (smoothed histogram)
What is a good way to visualise a single continuous variable? (2)
REVERSED
select(data, -(variable1))
How do you exclude a column from a dataframe?
REVERSED
%/% (integer division)
%% (remainder)
What are the modular arithmetic symbols?
REVERSED
table(variable)
How do you create a table with a count for each value in a variable?
REVERSED
starts_with("abc")
ends_with("abc")
contains("abc")
matches("expression") #regular expression
How do you select columns that start with, end with, contain or match a string?