Week 4: Data Visualisation and Transformation Flashcards

1
Q

REVERSED

  • Maximise data-to-ink ratio
  • Present more data without losing interpretability
  • Use levels of detail
A

What are 3 distilled principles from Tufte?

2
Q

REVERSED

geom_point() #scatterplot
geom_density_2d() #contour lines from x and y positions
geom_histogram() #histogram, use binwidth to change size of bins
geom_density() #density plot
geom_rug() #adds rug marks of raw data on the axes
geom_boxplot() #boxplot
geom_bar() #barplot, automatically transforms variables to counts using stat_count
geom_col() #column graph, need x and y variables
geom_line() #line chart
geom_label(aes(x=, y=, label=)) #gives a label at the specified x and y position
geom_freqpoly() #same as histogram but uses lines instead of bars, good for overlapping data
geom_area() #area chart
geom_smooth(method = "lm") #draws line of best fit. se = TRUE is the default and also draws the confidence band around the line
geom_abline() #adds a reference line given by slope and intercept; with neither supplied it draws y = x (intercept 0, slope 1)
geom_ribbon(aes(x, ymin, ymax)) #draws a band between lower and upper bounds e.g. confidence interval

A

What are some useful geoms? (15)

3
Q

REVERSED

filter(data, !is.na(column))

A

How do you remove the missing values from a dataframe using filter?

4
Q

REVERSED

Fit a model, compute the residuals, and plot the residuals against the predictor

A

How do you “flatten” a graph?

5
Q

REVERSED

more data with less ink on the page

A

What is a high data-ink ratio?

6
Q

REVERSED

ungroup()

A

How do you remove a grouping to return to operations on ungrouped variables?

7
Q

REVERSED

+ xlim(0, 5)
this way drops the data outside the limits before plotting

+ coord_cartesian(xlim = c( , ))
this way doesn’t throw away the data outside the limits, used to zoom

A

How do you set axis limits in 2 ways?

8
Q

REVERSED

x %in% c(a, b) #TRUE where x equals any of the listed values

between(x, left, right) #TRUE where x is between left and right, inclusive

A

How do you use in and between in R?
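A small sketch of both helpers on a made-up vector, assuming dplyr is installed (between() comes from dplyr, %in% is base R). The values in x are hypothetical.

```r
library(dplyr)

x <- c(1, 5, 10, 15)          # hypothetical values

in_set   <- x %in% c(5, 15)   # TRUE where x is one of the listed values
in_range <- between(x, 5, 10) # TRUE where 5 <= x <= 10 (both ends included)

in_set    # FALSE  TRUE FALSE  TRUE
in_range  # FALSE  TRUE  TRUE FALSE
```

Either logical vector can be passed straight to filter() to keep the matching rows.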

9
Q

REVERSED

aes(group = ) splits the data by a discrete variable, just as mapping that variable to an aesthetic would, but it adds no legend or visual distinction between the groups

A

What does the “group” mapping do in a geom?

10
Q

REVERSED

anything from the input can be called in server using input$…

A

How do you call something from the input in the server in shiny?

11
Q

REVERSED

is.na()

A

How do you determine if a value is a missing value in R?

12
Q
# Define a plot called distplot in the server by assigning to output$distplot.
In ui.R, refer to it by its name as a string, e.g. plotOutput("distplot")
A

How do you define something e.g. a plot in the server and how do you refer to it in the ui in shiny?
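A minimal single-file sketch of how the two sides connect, assuming the shiny package is installed. The slider id bins, its range, and the use of the built-in faithful dataset are illustrative choices, not part of the cards.

```r
library(shiny)

# UI: declares an input called "bins" and a placeholder for the plot "distplot"
ui <- fluidPage(
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 30),
  plotOutput("distplot")
)

# Server: builds the plot and stores it under the same name on output
server <- function(input, output) {
  output$distplot <- renderPlot({
    # input$bins is read here, so the plot is redrawn when the slider moves
    hist(faithful$waiting, breaks = input$bins)
  })
}

app <- shinyApp(ui, server)  # run interactively with runApp(app)
```

The string passed to plotOutput() has to match the name assigned on output in the server.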

13
Q

REVERSED

mutate(data, new_column = …)

A

How do you add a new column to a dataframe?

14
Q

REVERSED

nrow =
ncol =
specifies number of rows or columns

A

What are additional arguments to facet_wrap? (2)

15
Q

REVERSED

The output depends on the user’s input. Every output must be wrapped in a render function (e.g. renderPlot()), which watches the inputs it uses and updates the output when they change

A

What is reactivity in Shiny?

16
Q

REVERSED

min_rank(x) #gives 1 to the lowest value and so on
min_rank(desc(x)) #gives 1 to the highest value

A

How do you give a ranking to values in a vector x?

17
Q

REVERSED

filter(dataframe, value1==1, value2==4)

A

How do you select only certain rows of a dataframe based on their values?

18
Q

REVERSED

geom_count, creates a graph with different size dots at each intersection
geom_tile with fill aesthetic darker depending on combinations of each value:
data %>% count(variable1, variable2) %>% ggplot(aes(x = variable1, y = variable2)) + geom_tile(aes(fill = n))

A

What is a good way to display two categorical variables?

19
Q

REVERSED

The mapping in the geom overrides the global mapping in ggplot

A

If you have different mappings in ggplot() and geom(aes()), which one will override?

20
Q

REVERSED

read_csv("filepath")

A

How do you load a csv into a dataframe in R?

21
Q

REVERSED

As the number of points and number of categories increases, facet grid becomes better

A

When should you use facet grid over mapping by colour?

22
Q

REVERSED

facet_wrap(~variable)
create formula with ~, variable should be discrete

A

How do you use facet wrap?

23
Q

REVERSED

Four datasets that look very different when plotted but share nearly identical summary statistics, including the same line of best fit (roughly y = 3 + 0.5x) and the same correlation (about 0.82)

A

What is Anscombe’s quartet?
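One way to check this yourself, sketched with the anscombe data frame that ships with base R (columns x1 to x4 and y1 to y4). The helper names sets, cors and coefs are mine.

```r
sets <- 1:4

# Correlation between x and y within each of the four datasets
cors <- sapply(sets, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})

# Intercept (row 1) and slope (row 2) of the least-squares line for each dataset
coefs <- sapply(sets, function(i) {
  coef(lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]]))
})

round(cors, 2)   # all four are about 0.82
round(coefs, 1)  # intercepts about 3, slopes about 0.5
```

Plotting each pair (for example with facet_wrap) is what reveals how different the four datasets really are.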

24
Q

REVERSED

mutate(column = ifelse(test, value1, value2))
test is a logical vector e.g. column < 5
value1 is used where test is TRUE
value2 is used where test is FALSE
to set unusual values to missing, use NA for value1 and the column itself for value2

A

How do you replace unusual values with missing values in a dataframe?
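A short sketch of the pattern, assuming dplyr is installed. The tibble measurements and the cut-off of 20 are made up for illustration.

```r
library(dplyr)

# Hypothetical data with one implausible value (58)
measurements <- tibble(y = c(4, 7, 58, 5))

# Where the test is TRUE the value becomes NA, otherwise it is kept as is
cleaned <- measurements %>%
  mutate(y = ifelse(y > 20, NA, y))

cleaned$y  # 4 7 NA 5
```

This keeps the row in the data, so the other columns of that observation are not lost.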

25
# REVERSED Yes to colour, gives continuous scale. No to shape
Can you map continuous variables to colour and shape?
26
# REVERSED cut(variable, breaks = c(0, 10, 20), labels = c("", ""))
How do you split a variable into different sections and label the sections?
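A quick illustration in base R, using made-up ages and labels. Three breaks define two intervals, so two labels are needed.

```r
age <- c(3, 8, 12, 19, 20)   # hypothetical values

# Intervals are (0,10] and (10,20]: the upper bound is included by default
age_group <- cut(age, breaks = c(0, 10, 20), labels = c("low", "high"))

table(age_group)  # low: 2, high: 3
```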
27
# REVERSED + scale_y_log10() + scale_x_log10()
How do you "straighten" a graph?
28
# REVERSED geom_freqpoly(aes(x = price, y = ..density..)) #from ggplot2 3.3.0 the preferred spelling is y = after_stat(density)
How do you get a freqpoly to display density instead of count?
29
# REVERSED cumsum()
How do you do a cumulative sum?
30
# REVERSED E.g. colour = variable < 5. Gives one colour for TRUE and another for FALSE
How do you map an aesthetic to a function of a variable?
31
# REVERSED 1. position on a common scale 2. position on an unaligned scale 3. length 4. tilt/angle 5. area 6. depth 7. colour luminance/colour saturation 8. curvature/volume
What is the ranking of effectiveness for ordered attributes? (8)
32
# REVERSED data %>% group_by(variable) %>% summarise(newname = mean(distance, na.rm = TRUE))
How do you get a grouped summary (mean for this question) of a variable grouped by another variable?
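The same pipeline on a tiny made-up table, assuming dplyr is installed. The tibble trips and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical data with one missing distance
trips <- tibble(
  carrier  = c("A", "A", "B", "B"),
  distance = c(100, 300, 50, NA)
)

# One row per carrier; na.rm = TRUE ignores the missing distance for carrier B
by_carrier <- trips %>%
  group_by(carrier) %>%
  summarise(mean_distance = mean(distance, na.rm = TRUE))

by_carrier  # A: 200, B: 50
```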
33
# REVERSED - Unhelpful errors - Dependency hell - Unreliable back end - Need to run R on a server - Reactivity can be slow
What are the disadvantages of R Shiny? (5)
34
# REVERSED sum(x > 10) gives the number of TRUEs in x > 10; mean(x > 10) gives the proportion of TRUEs in x > 10
How do you get a count of how many x > 10 in a variable? How do you get a proportion?
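A worked example in base R on a made-up vector, showing why this works: TRUE is treated as 1 and FALSE as 0.

```r
x <- c(4, 12, 15, 9, 30)     # hypothetical values

n_over_10    <- sum(x > 10)  # 3 values exceed 10
prop_over_10 <- mean(x > 10) # 3 out of 5 = 0.6
```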
35
# REVERSED arrange(data, desc(column1))
How do you arrange rows in descending order?
36
# REVERSED library(tidyverse) #ggplot2 is a core tidyverse package; library(ggplot2) loads it on its own
What package is ggplot2 in?
37
# REVERSED mean() sd()
How do you get mean and standard deviation?
38
# REVERSED colour = NA
How do you change the colour to no colour?
39
# REVERSED The server logic: everything that runs in the background to build the outputs from the inputs
What is contained in the server.R file in shiny?
40
# REVERSED Put the variable with more levels in the columns
Using facet grid, which variable should you put in the rows and which in the columns?
41
# REVERSED ``` arrange(data, column1, column2) #orders by column 1 then column 2 in ascending order ```
How do you change the order of rows?
42
# REVERSED Excludes them automatically: rows where the condition evaluates to NA are dropped along with the FALSE rows
What does filter do with NA values?
43
# REVERSED 1. electric shock 2. saturation 3. length 4. area 5. depth 6. brightness
What is the order of how humans perceive sensations? (6)
44
# REVERSED Gives NA when there are missing values. Use na.rm = TRUE to remove missing values prior to calculation
What do aggregate functions do when there are missing values? How do you change the default?
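For instance, in base R with a made-up vector:

```r
x <- c(2, 4, NA, 6)   # hypothetical values with one missing

with_na    <- mean(x)                # NA, because one value is unknown
without_na <- mean(x, na.rm = TRUE)  # 4, the mean of 2, 4 and 6
```

The same argument works for sum(), sd(), min(), max() and median().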
45
# REVERSED transmute(data, newcolumn = …)
How do you add a new column to a dataframe and keep only the new columns?
46
# REVERSED geom_point() (scatterplot); geom_bin2d() and geom_hex() divide the plane into 2d bins and use fill colour to show how many points fall in each bin; boxplots after binning one continuous variable into categories: geom_boxplot(aes(group = cut_width(x, width)))
What is a good way to display two continuous variables? (3)
47
# REVERSED library(GGally) ggpairs(data, columns = c('column1', 'column2', 'column3'))
What is a different way of creating a matrix of plots?
48
# REVERSED 1. spatial region 2. colour hue 3. motion 4. shape
What is the ranking of effectiveness for categorical attributes? (4)
49
# REVERSED provides schematic for how the application will be presented to the user. e.g. titlePanel(), sidebarLayout(), sidebarPanel(), mainPanel()
What is contained in the ui.R file in shiny?
50
# REVERSED <, >, <=, >=, !=, ==
What are the comparison operators in R?
51
# REVERSED If it gives information about a variable it goes inside the aes(). If not it must go outside the aes()
Does the aesthetic mapping go inside or outside the aes()?
52
# REVERSED tibble(column1 = , column2 = ) data.frame(column1 = , column2 = ) Set the type by converting each vector, e.g. column1 = as.character( ). Before R 4.0.0, data.frame converted characters to factors unless you set stringsAsFactors = FALSE; tibble never does
What are 2 codes to make vectors into a dataframe? How do you specify the type? What is a difference between the 2 codes?
53
# REVERSED Facets are subplots that each display a subset of the data
What is a facet?
54
# REVERSED - Aesthetics (position, shape, colour, …) - Geometric objects (points, lines, bars, …) - Scales (continuous, discrete, Cartesian coordinates, …) - Facets (small multiples) - Statistical transformation (identity, binning, median, …) - Coordinate system (Cartesian, polar, parallel, …)
What are the layers of Wickham’s grammar of graphics? (6)
55
# REVERSED Like a boxplot but shows the full density of the data in each group
What is a violin plot?
56
# REVERSED !!vector #bang-bang operator, matches the names stored in vector all_of(vector) #every name must be present, otherwise it is an error any_of(vector) #names that are not present are ignored
How do you select columns by matching them to names contained in a string vector? (3 ways)
57
# REVERSED 6. Additional groups will go unplotted
How many shapes can you plot as a mapping? What happens if there are more than that many factors?
58
# REVERSED + labs(y= "", x= "")
How do you label the axes?
59
# REVERSED geom_point(position = "jitter") OR geom_jitter()
What are two ways to add jitter to a plot?
60
# REVERSED - Proximity: things that are spatially near to one another seem to be related - Similarity: things that look alike seem to be related - Connection: things that are visually tied to one another seem to be related - Continuity: partially hidden objects are completed into familiar shapes - Closure: incomplete shapes are perceived as complete - Figure and ground: visual elements are taken to be either in the foreground or in the background - Common fate: elements sharing a direction of movement are perceived as a unit
What are the Gestalt principles of relatedness? (7)
61
# REVERSED & #and | #or ! #not
What are the logical operators in R?
62
# REVERSED coord_cartesian() #default coord_flip() #switches the x and y axis coord_map() #sets the aspect ratio correctly for maps coord_quickmap() #sets the aspect ratio correctly for maps, quicker coord_polar() #uses polar coordinates. use argument theta = "y" to create a pie chart. otherwise you get a bulls-eye chart coord_fixed() #forces a specified ratio between the axes, default is 1
What are 6 coordinate systems in ggplot?
63
# REVERSED boxplot; freqpoly with the categorical variable mapped to colour
What is a good way to display a categorical variable vs a continuous variable?
64
# REVERSED count(variable1, variable2) gives number of combinations in a table
How do you get the counts of each combination when comparing 2 categorical variables?
65
# REVERSED ui (ui.R) and server (server.R)
What is the structure of a shiny app?
66
# REVERSED The perceived magnitude of a sensation grows as a power of the actual stimulus intensity, so perception can change faster or slower than the stimulus itself
What is Stevens’ Psychophysical Power Law?
67
# REVERSED Can't be at the beginning of a line. Must be at end of previous line
When using the + in ggplot, where in the line must it go?
68
# REVERSED data[1:200,]
What code keeps the first 200 rows and all columns of a dataframe?
69
# REVERSED ggplot(data = ) + geom_function(mapping = aes(), stat = , position = ) + coordinate_function + facet_function
What is the general format of a ggplot code?
70
# REVERSED rename(data, newname = variable1)
How do you rename a variable?
71
# REVERSED show.legend = FALSE
How do you remove the legend?
72
# REVERSED Uses the statistical transformation stat_count() to count the rows for each value of x. Can override using stat = "identity", for example to use a y value instead of the count
How does geom_bar get the count for each value? How do you override it?
73
# REVERSED facet_grid(variable1 ~ variable2)
How do you facet a plot on a combination of 2 variables?
74
# REVERSED ggplot(data) + geom_boxplot(aes(x = reorder(xvariable, yvariable, FUN = median), y = yvariable))
How do you reorder boxplots by median?
75
# REVERSED geom_bar creates its own bar for NAs; geom_histogram removes missing values (with a warning)
What happens to missing values in geom_bar and geom_histogram?
76
# REVERSED max(variable) #gives max value which.max(variable) #gives location of max value
How do you get the max value of a variable and the location of the max value?
77
# REVERSED bar chart
What is a good way to visualise a single categorical variable?
78
# REVERSED select(data, variable1, variable2)
How do you select only certain columns of a dataframe?
79
# REVERSED The bars are automatically stacked, with one colour for each level of the variable (position = "stack") ``` position = "identity" #places each object exactly where it falls in context of graph. The bars will overlap, so use alpha position = "fill" #works like stacking but makes each set of stacked bars the same height. Makes it easier to compare proportions across groups position = "dodge" #places overlapping objects beside each other. Makes it easier to compare individual values ```
What happens when you map an aesthetic (colour) to a bar chart? What 3 ways are there to change how this looks?
80
# REVERSED lead(x) removes the first value and adds NA to the end; lag(x) adds NA to the beginning and removes the last value
What do lead(x) and lag(x) do to a vector x?
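A small sketch on a made-up vector, assuming dplyr is installed (loading it makes lag() refer to dplyr's version rather than stats::lag). The name step is mine.

```r
library(dplyr)

x <- c(10, 20, 30, 40)   # hypothetical values

lead(x)  # 20 30 40 NA
lag(x)   # NA 10 20 30

# A common use: the change from the previous value
step <- x - lag(x)  # NA 10 10 10
```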
81
# REVERSED ``` select(data, variable5, variable7, everything()) #moves variables 5 and 7 to the start but still keeps all others ```
How do you rearrange the columns to put certain columns at the start?
82
# REVERSED colour #changes colour size #changes size alpha #changes transparency between 0 and 1 shape #changes shape linetype #from 0-6. 0 is blank, 1 is solid, others are dotted or dashed fill #fill colour of shape or curve stroke #modify width of border. E.g. on shapes fontface #character "plain", "bold", "italic", "bold.italic" group #groups by a specified variable
What are some possible aesthetic mappings? (9)
83
# REVERSED histogram (puts a continuous variable into bins); density plot (smoothed histogram)
What is a good way to visualise a single continuous variable? (2)
84
# REVERSED select(data, -(variable1))
How do you exclude a column from a dataframe?
85
# REVERSED %/% (integer division) %% (remainder)
What are the modular arithmetic symbols?
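A typical use, sketched in base R: splitting a clock time stored as a single number into hours and minutes. The value 1435 (meaning 14:35) is made up.

```r
dep_time <- 1435             # hypothetical time stored as HHMM

hour   <- dep_time %/% 100   # integer division: 14
minute <- dep_time %% 100    # remainder: 35

minutes_since_midnight <- hour * 60 + minute  # 875
```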
86
# REVERSED table(variable)
How do you create a table with a count for each value in a variable?
87
# REVERSED starts_with("abc") ends_with("abc") contains("abc") matches("expression")
How do you select columns that start with, end with, contain or match a string?