Week 4: Data Visualisation and Transformation Flashcards

1
Q

How many shapes can you plot as a mapping? What happens if there are more than that many factors?

A
6 by default. Additional groups will go unplotted (ggplot2 gives a warning)
2
Q

What are 2 codes to make vectors into a dataframe?
How do you specify the type?
What is a difference between the 2 codes?

A

tibble(column1= , column2= )
data.frame(column1=, column2= )

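to set the type, wrap the column in a conversion function, e.g. column1 = as.character( )

before R 4.0.0, data.frame automatically converted characters to factors unless you set stringsAsFactors=FALSE (from R 4.0.0 the default is FALSE). tibble never converts characters to factors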
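
A minimal sketch of both constructors, assuming the tibble package is installed. The vectors name and score are made-up illustration data.

```r
library(tibble)

# Hypothetical example vectors
name <- c("a", "b", "c")
score <- c(1, 2, 3)

# Base R: stringsAsFactors set explicitly so the result is the same on any R version
df <- data.frame(name = name, score = score, stringsAsFactors = FALSE)

# tibble: characters stay characters; a column's type can be converted while building
tb <- tibble(name = name, score = as.character(score))

class(df$name)   # "character"
class(tb$score)  # "character"
```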

3
Q

How do you give a ranking to values in a vector x?

A

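min_rank(x) #gives 1 to the lowest value and so on
min_rank(desc(x)) #gives 1 to the highest value

can also use base rank(x, ties.method = "min"), which handles ties the same way (the default rank(x) averages the ranks of tied values)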
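
A small worked example, assuming dplyr is installed. The vector x is made up and contains a tie so the difference from the default rank() is visible.

```r
library(dplyr)

x <- c(10, 30, 20, 30)

min_rank(x)                   # 1 3 2 3
min_rank(desc(x))             # 4 1 3 1
rank(x, ties.method = "min")  # 1 3 2 3
rank(x)                       # 1.0 3.5 2.0 3.5
```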

4
Q

How do you set axis limits in 2 ways?

A

+ xlim(0, 5)
this way drops the data outside the limits before anything is calculated or drawn

+ coord_cartesian(xlim = c( , ))
this way doesn’t throw away the data outside the limits, used to zoom

5
Q

How do you define something e.g. a plot in the server and how do you refer to it in the ui in shiny?

A
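Define a plot called distplot in the server by assigning a render function to output$distplot, e.g. output$distplot <- renderPlot()
In ui.R refer to it by its ID as a string inside an output function, e.g. plotOutput("distplot")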
6
Q

code to keep the first 200 rows and all columns of a dataframe?

A

data[1:200,]

7
Q

What is a good way to display a categorical variable vs a continuous variable?

A

boxplot
geom_freqpoly with the categorical variable mapped to colour

8
Q

When should you use facet grid over mapping by colour?

A

As the number of points and number of categories increases, facet grid becomes better

9
Q

What are the layers of Wickham’s grammar of graphics? (6)

A
  • Aesthetics (position, shape, colour, …)
  • Geometric objects (points, lines, bars, …)
  • Scales (continuous, discrete, Cartesian coordinates, …)
  • Facets (small multiples)
  • Statistical transformation (identity, binning, median, …)
  • Coordinate system (Cartesian, polar, parallel, …)
10
Q

How do you replace unusual values with missing values in a dataframe?

A

mutate(column = ifelse(test, value1, value2))
test is a logical vector, e.g. column < 5
value1 is the result where test is TRUE (use NA to mark the value as missing)
value2 is the result where test is FALSE (use column to keep the original value)
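
A minimal sketch of this pattern, assuming dplyr is installed. The height column and the cut-offs of 50 and 250 are hypothetical.

```r
library(dplyr)

# Hypothetical data: heights in cm, where 0 and 999 are recording errors
data <- tibble(height = c(172, 0, 181, 999, 165))

# Replace implausible heights with NA and keep every other value unchanged
cleaned <- data %>%
  mutate(height = ifelse(height < 50 | height > 250, NA, height))

cleaned$height  # 172 NA 181 NA 165
```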

11
Q

How do you do a cumulative sum?

A

cumsum()

12
Q

What is a high data-ink ratio?

A

more data with less ink on the page

13
Q

How do you get a freqpoly to display density instead of count?

A

geom_freqpoly(aes(x = price, y = after_stat(density)))
#older code writes y = ..density.., which is deprecated since ggplot2 3.4.0

14
Q

What are the modular arithmetic symbols?

A

%/% (integer division)
%% (remainder)
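
A short base R example of both operators. The HHMM-style dep_time value is a made-up illustration.

```r
# Hypothetical departure time stored as HHMM
dep_time <- 517

hour <- dep_time %/% 100    # 5
minute <- dep_time %% 100   # 17

# Minutes since midnight
minutes_since_midnight <- hour * 60 + minute  # 317
```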

15
Q

How do you reorder boxplots by median?

A

ggplot(data) + geom_boxplot(aes(x = reorder(xvariable, yvariable, FUN = median), y = yvariable))
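
A minimal sketch of how reorder() decides the order, assuming ggplot2 is installed. The data frame df and its columns group and value are hypothetical.

```r
library(ggplot2)

# Hypothetical data: three groups with medians a = 6, b = 2, c = 10
df <- data.frame(
  group = c("a", "a", "b", "b", "c", "c"),
  value = c(5, 7, 1, 3, 9, 11)
)

# reorder() returns a factor whose levels are sorted by the median of value
ordered_group <- reorder(df$group, df$value, FUN = median)
levels(ordered_group)  # "b" "a" "c"

# The same call inside aes() orders the boxplots from lowest to highest median
p <- ggplot(df) +
  geom_boxplot(aes(x = reorder(group, value, FUN = median), y = value))
```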

16
Q

What is a good way to display two categorical variables?

A

geom_count, creates a graph with different size dots at each intersection
geom_tile with fill aesthetic darker depending on combinations of each value:
data %>% count(variable1, variable2) %>% ggplot(aes(x = variable1, y = variable2)) + geom_tile(aes(fill = n))

17
Q

What is a different way of creating a matrix of plots?

A

library(GGally)
ggpairs(data, columns = c("column1", "column2", "column3"))

18
Q

How do you remove a grouping to return to operations on ungrouped variables?

A

ungroup()

19
Q

What do aggregate functions do when there are missing values? How do you change the default?

A

Gives NA when there are missing values
Use na.rm=TRUE to remove missing values prior to calculation
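
A short base R example with a made-up vector:

```r
x <- c(2, 4, NA, 6)

mean(x)                # NA
mean(x, na.rm = TRUE)  # 4
sum(x, na.rm = TRUE)   # 12
```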

20
Q

What is the general format of a ggplot code?

A

ggplot(data = <DATA>) +
<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>), stat = <STAT>, position = <POSITION>) +
<COORDINATE_FUNCTION> +
<FACET_FUNCTION>

21
Q

What are the Gestalt principles of relatedness? (7)

A
  • Proximity: things that are spatially near to one another seem to be related
  • Similarity: things that look alike seem to be related
  • Connection: things that are visually tied to one another seem to be related
  • Continuity: partially hidden objects are completed into familiar shapes
  • Closure: incomplete shapes are perceived as complete
  • Figure and ground: visual elements are taken to be either in the foreground or in the background
  • Common fate: elements sharing a direction of movement are perceived as a unit
22
Q

What is the ranking of effectiveness for categorical attributes? (4)

A
  1. spatial region
  2. colour hue
  3. motion
  4. shape
23
Q

What happens to missing values in geom_bar and geom_histogram?

A

geom_bar creates its own bin for NAs
geom_histogram removes missing values

24
Q

What are the logical operators in R?

A

& #and
| #or
! #not

25
Q

Using facet grid, which variable should you put in the rows and which in the columns?

A

Put the variable with more levels in the columns

26
Q

What is the ranking of effectiveness for ordered attributes? (8)

A
  1. position on a common scale
  2. position on an unaligned scale
  3. length
  4. tilt/angle
  5. area
  6. depth
  7. colour luminance/colour saturation
  8. curvature/volume
27
Q

How do you arrange rows in descending order?

A

arrange(data, desc(column1))

28
Q

How do you get mean and standard deviation?

A

mean()
sd()

29
Q

How do you remove the missing values from a dataframe using filter?

A

filter(data, !is.na(column1))
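
A minimal sketch, assuming dplyr is installed. The tibble data and its columns id and column1 are hypothetical.

```r
library(dplyr)

data <- tibble(id = 1:4, column1 = c(3.2, NA, 5.1, NA))

# Keep only the rows where column1 is not missing
complete <- filter(data, !is.na(column1))

complete$id  # 1 3
```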

30
Q

What package is ggplot2 in?

A

ggplot2 is part of the tidyverse: load it with library(tidyverse), or on its own with library(ggplot2)

31
Q

How do you change the colour to no colour?

A

colour = NA

32
Q

When using the + in ggplot, where in the line must it go?

A

Can’t be at the beginning of a line. Must be at end of previous line

33
Q

What does filter do with NA values?

A

Excludes them automatically: only rows where the condition is TRUE are kept, so rows where it evaluates to NA are dropped

34
Q

How do you add a new column to a dataframe and keep only the new columns?

A

transmute(data, newcolumn = …)

35
Q

How do you rename a variable?

A

rename(data, newname = variable1)

36
Q

How do you get a count of how many x>10 in a variable? How do you get a proportion?

A

sum(x > 10) gives the number of TRUEs in x > 10
mean(x > 10) gives the proportion of TRUEs in x > 10
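
A short base R example with a made-up vector. It works because TRUE counts as 1 and FALSE as 0.

```r
x <- c(4, 12, 15, 8, 20)

x > 10        # FALSE TRUE TRUE FALSE TRUE
sum(x > 10)   # 3
mean(x > 10)  # 0.6
```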

37
Q

How do you select only certain columns of a dataframe?

A

select(data, variable1, variable2)

38
Q

How do you create a table with a count for each value in a variable?

A

table(variable)

39
Q

What is Stevens’ Psychophysical Power Law?

A

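Describes how the perceived size of a sensation relates to its actual physical intensity: perceived sensation = (physical intensity)^n, where the exponent n depends on the type of sensation, so some changes are perceived as larger or smaller than they actually are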

40
Q

What are some useful geoms? (15)

A

geom_point() #scatterplot
geom_density_2d() #contour lines from x and y positions
geom_histogram() #histogram, use binwidth to change size of bins
geom_density() #density plot
geom_rug() #adds rug marks of raw data on the axes
geom_boxplot() #boxplot
geom_bar() #barplot, automatically transforms variables to counts using stat_count
geom_col() #column graph, need x and y variables
geom_line() #line chart
geom_label(aes(x=, y=, label=)) #gives a label at the specified x and y position
geom_freqpoly() #same as histogram but uses lines instead of bars, good for overlapping data
geom_area() #area chart
geom_smooth(method = "lm") #draws line of best fit. se=TRUE is default and also displays the confidence band around the line
geom_abline() #adds a reference line given by slope and intercept (defaults: slope 1, intercept 0)
geom_ribbon(aes(x=, ymin=, ymax=)) #draws a band between lower and upper bounds e.g. confidence interval

41
Q

What are two ways to add jitter to a plot?

A

geom_point(position = "jitter") OR geom_jitter()

42
Q

How do you use facet wrap?

A

facet_wrap(~variable)
create formula with ~, variable should be discrete

43
Q

How do you get a grouped summary (mean for this question) of a variable grouped by another variable?

A

data %>% group_by(variable) %>% summarise(newname = mean(distance, na.rm=TRUE))
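
A worked example of the same pipeline, assuming dplyr is installed. The carrier and distance data are made up and include one missing distance.

```r
library(dplyr)

# Hypothetical flights data with one missing distance
data <- tibble(
  carrier = c("AA", "AA", "UA", "UA", "UA"),
  distance = c(100, 300, 200, NA, 400)
)

# One row per carrier, with the mean of the non-missing distances
by_carrier <- data %>%
  group_by(carrier) %>%
  summarise(mean_distance = mean(distance, na.rm = TRUE))

by_carrier$mean_distance  # 200 300
```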

44
Q

How do you remove the legend?

A

show.legend = FALSE inside the geom, e.g. geom_point(show.legend = FALSE)

45
Q

What is a good way to display two continuous variables? (3)

A

geom_point (scatterplot)

geom_bin2d() and geom_hex() divide the plane into 2d bins and use fill colour to display how many points fall in each bin

use boxplot and divide one continuous variable into a categorical: geom_boxplot(aes(group = cut_width(x, width)))

46
Q

What is a good way to visualise a single continuous variable? (2)

A
histogram, puts continuous variables into bins 
density plot (smoothed histogram)
47
Q

How do you select columns by matching them to names contained in a string vector? (3 ways)

A

!!vector #bang-bang operator, matches to names of variables in vector

all_of(vector)
any_of(vector)
#with all_of, every name in the vector must be present or you get an error. with any_of, missing names are ignored
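
A minimal sketch of the difference, assuming dplyr is installed. The tibble data, the vector vars and the missing name "zzz" are hypothetical.

```r
library(dplyr)

data <- tibble(a = 1:2, b = 3:4, c = 5:6)
vars <- c("a", "c")

names(select(data, all_of(vars)))            # "a" "c"
names(select(data, any_of(c(vars, "zzz"))))  # "a" "c"  (missing "zzz" is ignored)
# select(data, all_of(c(vars, "zzz")))       # would raise an error
```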

48
Q

What does the “group” mapping do in a geom?

A

aes(group = ) splits the data by a discrete variable in the same way as mapping it to another aesthetic, but it does not add a legend or any other distinguishing feature to the graph

49
Q

What is a good way to visualise a single categorical variable?

A

bar chart

50
Q

How do you rearrange the columns to put certain columns at the start?

A
select(data, variable5, variable7, everything()) 
#moves variables 5 and 7 to the start but still keeps all others
51
Q

How do you facet a plot on a combination of 2 variables?

A

facet_grid(variable1~variable2)

52
Q

How do you get the counts of each combination when comparing 2 categorical variables?

A

count(data, variable1, variable2)
gives a table with the number of observations (n) for each combination
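
A small worked example, assuming dplyr is installed. The cut and colour values are made up.

```r
library(dplyr)

data <- tibble(
  cut = c("Good", "Good", "Ideal", "Ideal", "Ideal"),
  colour = c("D", "E", "D", "D", "E")
)

# One row per combination that occurs, with the count in column n
counts <- count(data, cut, colour)

counts$n  # 1 1 2 1
```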

53
Q

How do you exclude a column from a dataframe?

A

select(data, -(variable1))

54
Q

How does geom_bar get the count for each variable? How do you override it?

A

Uses the statistical transformation stat_count() to find the count of each value. Can override using stat = "identity", for example to use a y value instead of count

55
Q

How do you change the order of rows?

A
arrange(data, column1, column2) 
#orders by column 1 then column 2 in ascending order
56
Q

Can you map continuous variables to colour and shape?

A

Yes to colour, gives continuous scale. No to shape

57
Q

What are 6 coordinate systems in ggplot?

A

coord_cartesian() #default
coord_flip() #switches the x and y axis
coord_map() #sets the aspect ratio correctly for maps
coord_quickmap() #sets the aspect ratio correctly for maps, quicker
coord_polar() #uses polar coordinates. use argument theta = "y" to create a pie chart, otherwise you get a bulls-eye chart
coord_fixed() #forces a specified ratio between the axes, default is 1

58
Q

How do you “flatten” a graph?

A

get the residual and plot the data against the residual

59
Q

How do you map an aesthetic to an expression instead of a variable?

A

E.g. aes(colour = variable < 5). The expression is evaluated for each row and the points are coloured by whether it is TRUE or FALSE

60
Q

How do you add a new column to a dataframe?

A

mutate(data, newcolumn = …)

61
Q

How do you “straighten” a graph?

A

+ scale_y_log10() + scale_x_log10()

62
Q

What are some possible aesthetic mappings? (9)

A

colour #changes colour
size #changes size
alpha #changes transparency between 0 and 1
shape #changes shape
linetype #from 0-6. 0 is blank, 1 is solid, others are dotted or dashed
fill #fill colour of shape or curve
stroke #modify width of border. E.g. on shapes
fontface #character: "plain", "bold", "italic", "bold.italic"
group #groups by a specified variable

63
Q

What is the structure of a shiny app?

A

ui and server: the ui defines what the user sees, the server does the work in the background

64
Q

What is Anscombe’s quartet?

A

Four datasets, usually shown as a facet of 4 graphs, with very different data but nearly identical summary statistics: the same line of best fit (y = 3 + 0.5x) and the same correlation (about 0.82, i.e. R² of about 0.67)
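
The quartet ships with base R as the anscombe data frame (columns x1 to x4 and y1 to y4), so the shared statistics can be checked directly. A minimal sketch:

```r
# Correlation between x and y for each of the four datasets
correlations <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
round(correlations, 2)  # 0.82 0.82 0.82 0.82

# Intercept (row 1) and slope (row 2) of the fitted line for each dataset
coefs <- sapply(1:4, function(i) {
  coef(lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]]))
})
round(coefs, 1)  # intercept 3.0 and slope 0.5 in every column
```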

65
Q

How do you select columns that start with, end with, contain or match a string?

A

starts_with("abc")
ends_with("abc")
contains("abc")
matches("expression")

66
Q

What are the disadvantages of r shiny? (5)

A
  • Unhelpful errors
  • Dependency hell
  • Unreliable back end
  • Need to run R on a server
  • Reactivity can be slow
67
Q

If you have different mappings in ggplot() and geom(aes()), which one will override?

A

The mapping in the geom overrides the global mapping in ggplot

68
Q

What are the comparison operators in R?

A

<, <=, >, >=, !=, ==

69
Q

How do you label the axes?

A

+ labs(y = "", x = "")

70
Q

What is contained in the ui.R file in shiny?

A

provides schematic for how the application will be presented to the user. e.g. titlePanel(), sidebarLayout(), sidebarPanel(), mainPanel()

71
Q

What is contained in the server.R file in shiny?

A

contains everything which should be done in the background

72
Q

What is a violin plot?

A

Like a boxplot but shows the full density plots

73
Q

What is the order of how humans perceive sensations? (6)

A
  1. electric shock
  2. saturation
  3. length
  4. area
  5. depth
  6. brightness
74
Q

What do lead(x) and lag(x) do to a vector x?

A

lead(x) removes first number and adds NA to end
lag(x) adds NA to beginning and removes last number
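
A short example, assuming dplyr is installed (its lag() masks the base stats::lag()). The vector x is made up.

```r
library(dplyr)

x <- c(1, 3, 6, 10)

lead(x)     # 3 6 10 NA
lag(x)      # NA 1 3 6
x - lag(x)  # NA 2 3 4  (change from the previous value)
```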

75
Q

What is a facet?

A

Facets are subplots that each display a subset of the data

76
Q

Does the aesthetic mapping go inside or outside the aes()?

A

If it gives information about a variable it goes inside the aes(). If not it must go outside the aes()

77
Q

How do you call something from the input in the server in shiny?

A

anything from the input can be called in server using input$…

78
Q

How do you select only certain rows of a dataframe based on their values?

A

filter(dataframe, value1==1, value2==4)

79
Q

How do you determine if a value is a missing value in R?

A

is.na()

80
Q

What are additional arguments to facet_wrap? (2)

A

nrow =
ncol =
specifies number of rows or columns

81
Q

How do you get the max value of a variable and the location of the max value?

A

max(variable) #gives max value
which.max(variable) #gives location of max value

82
Q

What are 3 distilled principles from Tufte?

A
  • Maximise data-to-ink ratio
  • Present more data without losing interpretability
  • Use levels of detail
83
Q

What is reactivity in Shiny?

A

The output depends on the user’s input. All output must be wrapped in a render function (e.g. renderPlot()), which watches the inputs it uses and updates the output when they change
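
A minimal sketch of a reactive app, assuming the shiny package is installed. The IDs "bins" and "distplot" are illustrative choices, and faithful is a dataset that ships with base R.

```r
library(shiny)

# ui: layout plus one input (ID "bins") and one output placeholder (ID "distplot")
ui <- fluidPage(
  titlePanel("Histogram"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20)
    ),
    mainPanel(plotOutput("distplot"))
  )
)

# server: renderPlot() re-runs whenever input$bins changes
server <- function(input, output) {
  output$distplot <- renderPlot({
    hist(faithful$waiting, breaks = input$bins, main = NULL)
  })
}

# Build the app object; call runApp(app) in an interactive session to launch it
app <- shinyApp(ui = ui, server = server)
```

The server reads the slider with input$bins and writes the plot to output$distplot, and the ui refers to both by their string IDs.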

84
Q

What happens when you map an aesthetic (colour) to a bar chart? What 3 ways are there to change how this looks?

A

The bars are automatically stacked with colours for each object in the variable (position = "stack")

position = "identity" #places each object exactly where it falls in context of graph. Will include overlapping so should use alpha
position = "fill" #works like stacking but makes each set of stacked bars the same height. Makes it easier to compare proportions across groups
position = "dodge" #places overlapping objects beside each other. Makes it easier to compare individual values
85
Q

How do you use in and between in R?

A

%in% #in

between(x, left, right) #TRUE where x is between left and right (inclusive), use inside filter to find those rows
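
A small example of both inside filter(), assuming dplyr is installed. The month and temp data are made up.

```r
library(dplyr)

data <- tibble(
  month = 1:12,
  temp = c(2, 4, 8, 12, 16, 20, 23, 22, 18, 12, 7, 3)
)

# Rows whose month is one of the listed values
summer <- filter(data, month %in% c(6, 7, 8))

# Rows whose temp lies between 10 and 20, both ends included
mild <- filter(data, between(temp, 10, 20))

summer$month  # 6 7 8
mild$month    # 4 5 6 9 10
```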

86
Q

How do you split a variable into different sections and label the sections?

A

cut(variable, breaks = c(0, 10, 20), labels = c("", ""))
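
A short base R example. The age vector and the labels "low" and "high" are made up.

```r
age <- c(3, 10, 12, 18)

# Intervals are (0,10] and (10,20]: closed on the right by default
age_group <- cut(age, breaks = c(0, 10, 20), labels = c("low", "high"))

as.character(age_group)  # "low" "low" "high" "high"
```

Three breaks give two intervals, so there must be one label fewer than there are breaks.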

87
Q

How do you load a csv into a dataframe in R?

A

read_csv("filepath")