Week 4: Data Visualisation and Transformation Flashcards

1
Q

How many shapes can you plot as a mapping? What happens if there are more than that many factors?

A
6 by default. Additional groups will go unplotted (ggplot2 warns that the shape palette can handle at most 6 discrete values)
2
Q

What are 2 codes to make vectors into a dataframe?
How do you specify the type?
What is a difference between the 2 codes?

A

tibble(column1= , column2= )
data.frame(column1=, column2= )

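wrap the column in a conversion function, e.g. as.character(column1)

before R 4.0.0, data.frame automatically converted characters to factors unless you set stringsAsFactors=FALSE; since R 4.0.0 FALSE is the default. tibble never converts characters to factors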
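A minimal sketch of both constructors, assuming the tibble package is installed. The column names and values are made up for illustration.

```r
library(tibble)

name  <- c("a", "b", "c")
score <- c(1, 2, 3)

# tibble() never converts character columns to factors
tb <- tibble(name = name, score = as.character(score))

# data.frame() keeps strings as characters when stringsAsFactors = FALSE
# (FALSE is the default from R 4.0.0 onwards)
df <- data.frame(name = name, score = score, stringsAsFactors = FALSE)

class(tb$score)  # "character", because of as.character()
class(df$name)   # "character"
```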

3
Q

How do you give a ranking to values in a vector x?

A

min_rank(x) #gives 1 to the lowest value and so on
min_rank(desc(x)) #gives 1 to the highest value

can also use base rank(x, ties.method = "min"), which is equivalent (plain rank(x) averages tied ranks instead)
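A small sketch of how these ranking functions treat ties, assuming dplyr is installed. The vector x is made up for illustration.

```r
library(dplyr)

x <- c(10, 30, 20, 30)

min_rank(x)                    # 1 3 2 3: ties share the smallest rank
min_rank(desc(x))              # 4 1 3 1: the highest value gets rank 1
rank(x, ties.method = "min")   # 1 3 2 3: base R equivalent of min_rank(x)
rank(x)                        # 1.0 3.5 2.0 3.5: the default averages tied ranks
```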

4
Q

How do you set axis limits in 2 ways?

A

+ xlim(0,5)

+ coord_cartesian(xlim = c( , ))
this way doesn’t throw away the data outside the limits, used to zoom
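A hedged sketch of the difference, assuming ggplot2 is installed and using its bundled mpg dataset. The choice of variables and the fitted line are only for illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm")

# xlim() drops rows with displ outside 0-5 before the line is fitted
p_clip <- p + xlim(0, 5)

# coord_cartesian() fits the line on all rows and only crops the view
p_zoom <- p + coord_cartesian(xlim = c(0, 5))
```

Printing p_clip and p_zoom side by side shows slightly different fitted lines, because only the first one discards the large-engine cars.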

5
Q

How do you define something e.g. a plot in the server and how do you refer to it in the ui in shiny?

A
Define a plot called distplot in the server using output$distplot.
In ui.R, refer to it by its ID as a string inside an output function, e.g. plotOutput("distplot")
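A minimal single-file sketch of the link between server and ui, assuming the shiny package is installed. The ID distplot comes from the card; the slider input n and the random data are hypothetical.

```r
library(shiny)

ui <- fluidPage(
  sliderInput("n", "Number of points", min = 10, max = 500, value = 100),
  plotOutput("distplot")   # the ui refers to the output by its ID string
)

server <- function(input, output) {
  # the server defines the output under the same ID, inside a render function
  output$distplot <- renderPlot({
    hist(rnorm(input$n), main = "Random sample")
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app) would launch the app in a browser
```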
6
Q

code to keep the first 200 rows and all columns of a dataframe?

A

data[1:200,]

7
Q

What is a good way to display a categorical variable vs a continuous variable?

A

boxplot
freqpoly, map by colour

8
Q

When should you use facet grid over mapping by colour?

A

As the number of points and number of categories increases, facet grid becomes better

9
Q

What are the layers of Wickham's grammar of graphics? (6)

A
  • Aesthetics (position, shape, colour, …)
  • Geometric objects (points, lines, bars, …)
  • Scales (continuous, discrete, Cartesian coordinates, …)
  • Facets (small multiples)
  • Statistical transformation (identity, binning, median, …)
  • Coordinate system (Cartesian, polar, parallel, …)
10
Q

How do you replace unusual values with missing values in a dataframe?

A

mutate(column = ifelse(test, value1, value2))
test is a logical vector, e.g. column<5
value1 is the result where test is TRUE (use NA here to set a missing value)
value2 is the result where test is FALSE (usually the original column)
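A short sketch of this pattern, assuming dplyr is installed. The data and the cut-off of 20 are hypothetical.

```r
library(dplyr)

# hypothetical data with one implausible measurement
data <- tibble(id = 1:4, y = c(4.2, 5.1, 58.9, 3.8))

cleaned <- data %>%
  mutate(y = ifelse(y > 20, NA, y))   # TRUE gives NA, FALSE keeps the value

cleaned$y   # 4.2 5.1 NA 3.8
```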

11
Q

How do you do a cumulative sum?

A

cumsum()

12
Q

What is a high data-ink ratio?

A

more data with less ink on the page

13
Q

How do you get a freqpoly to display density instead of count?

A

geom_freqpoly(aes(x=price, y=after_stat(density))) #older code writes y=..density..

14
Q

What are the modular arithmetic symbols?

A

%/% (integer division)
%% (remainder)
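A quick worked example in base R, converting a made-up number of minutes into hours and leftover minutes.

```r
minutes <- 135

hours     <- minutes %/% 60   # 2: integer division discards the remainder
remainder <- minutes %% 60    # 15: what is left over

c(hours, remainder)
```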

15
Q

How do you reorder boxplots by median?

A

ggplot(data) + geom_boxplot(aes(x = reorder(xvariable, yvariable, FUN = median), y = yvariable))

16
Q

What is a good way to display two categorical variables?

A

geom_count, creates a graph with different size dots at each intersection
geom_tile with fill aesthetic darker depending on combinations of each value:
data %>% count(variable1, variable2) %>% ggplot(aes(x = variable1, y = variable2)) + geom_tile(aes(fill = n))

17
Q

What is a different way of creating a matrix of plots?

A

library(GGally)
ggpairs(data, columns = c("column1", "column2", "column3"))

18
Q

How do you remove a grouping to return to operations on ungrouped variables?

A

ungroup()

19
Q

What do aggregate functions do when there are missing values? How do you change the default?

A

Gives NA when there are missing values
Use na.rm=TRUE to remove missing values prior to calculation
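A base R sketch of the default behaviour and the na.rm override, using a made-up vector.

```r
x <- c(2, 4, NA, 6)

mean(x)                 # NA: one missing value makes the result missing
mean(x, na.rm = TRUE)   # 4: computed from 2, 4 and 6
sum(x, na.rm = TRUE)    # 12
```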

20
Q

What is the general format of a ggplot code?

A

ggplot(data = DATA) +
GEOM_FUNCTION(mapping = aes(MAPPINGS), stat = STAT, position = POSITION) +
COORDINATE_FUNCTION +
FACET_FUNCTION

21
Q

What are the Gestalt principles of relatedness? (7)

A
  • Proximity: things that are spatially near to one another seem to be related
  • Similarity: things that look alike seem to be related
  • Connection: things that are visually tied to one another seem to be related
  • Continuity: partially hidden objects are completed into familiar shapes
  • Closure: incomplete shapes are perceived as complete
  • Figure and ground: visual elements are taken to be either in the foreground or in the background
  • Common fate: elements sharing a direction of movement are perceived as a unit
22
Q

What is the ranking of effectiveness for categorical attributes? (4)

A
  1. spatial region
  2. colour hue
  3. motion
  4. shape
23
Q

What happens to missing values in geom_bar and geom_histogram?

A

geom_bar creates its own bar for NAs
geom_histogram removes missing values

24
Q

What are the logical operators in R?

A

& #and
| #or
! #not

25
Using facet grid, which variable should you put in the rows and which in the columns?
Put the variable with more levels in the columns
26
What is the ranking of effectiveness for ordered attributes? (8)
1. position on a common scale
2. position on an unaligned scale
3. length
4. tilt/angle
5. area
6. depth
7. colour luminance/colour saturation
8. curvature/volume
27
How do you arrange rows in descending order?
arrange(data, desc(column1))
28
How do you get mean and standard deviation?
mean()
sd()
29
How do you remove the missing values from a dataframe using filter?
filter(data, !is.na(column1))
30
What package is ggplot2 in?
library(tidyverse)
31
How do you change the colour to no colour?
colour = NA
32
When using the + in ggplot, where in the line must it go?
Can't be at the beginning of a line. Must be at end of previous line
33
What does filter do with NA values?
Excludes them automatically: rows where the condition is NA are dropped, the same as rows where it is FALSE
34
How do you add a new column to a dataframe and keep only the new columns?
transmute(data, newcolumn = …)
35
How do you rename a variable?
rename(data, newname = variable1)
36
How do you get a count of how many x>10 in a variable? How do you get a proportion?
sum(x>10) gives the number of TRUEs in x>10
mean(x>10) gives the proportion of TRUEs in x>10
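A base R sketch with a made-up vector, showing that a logical vector is counted as 1s and 0s.

```r
x <- c(3, 12, 25, 8, 15)

n_over    <- sum(x > 10)    # 3: TRUE counts as 1, FALSE as 0
prop_over <- mean(x > 10)   # 0.6: 3 of the 5 values
```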
37
How do you select only certain columns of a dataframe?
select(data, variable1, variable2)
38
How do you create a table with a count for each value in a variable?
table(variable)
39
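What is Stevens's Psychophysical Power Law?
Perceived sensation grows as a power of the physical stimulus intensity (S = k·I^n), so how strongly humans perceive a change can differ from how much the stimulus actually changes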
40
What are some useful geoms? (15)
geom_point() #scatterplot
geom_density_2d() #contour lines from x and y positions
geom_histogram() #histogram, use binwidth to change size of bins
geom_density() #density plot
geom_rug() #adds rug marks of raw data on the axes
geom_boxplot() #boxplot
geom_bar() #barplot, automatically transforms variables to counts using stat_count
geom_col() #column graph, need x and y variables
geom_line() #line chart
geom_label(aes(x=, y=, label=)) #gives a label at the specified x and y position
geom_freqpoly() #same as histogram but uses lines instead of bars, good for overlapping data
geom_area() #area chart
geom_smooth(method="lm") #draws line of best fit. se=TRUE is default and also displays the confidence band around the line
geom_abline() #adds reference line, set with slope and intercept
geom_ribbon(x, ymin, ymax) #draws a band between lower and upper bounds e.g. confidence interval
41
What are two ways to add jitter to a plot?
geom_point(position="jitter") OR geom_jitter()
42
How do you use facet wrap?
facet_wrap(~variable)
create the formula with ~, variable should be discrete
43
How do you get a grouped summary (mean for this question) of a variable grouped by another variable?
data %>% group_by(variable) %>% summarise(newname = mean(distance, na.rm=TRUE))
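A runnable sketch of the grouped summary, assuming dplyr is installed. The data frame is made up, with one missing distance to show why na.rm=TRUE matters.

```r
library(dplyr)

# hypothetical data with one missing distance
data <- tibble(
  variable = c("a", "a", "b", "b"),
  distance = c(10, 20, NA, 40)
)

result <- data %>%
  group_by(variable) %>%
  summarise(newname = mean(distance, na.rm = TRUE))

result   # a: 15, b: 40
```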
44
How do you remove the legend?
show.legend = FALSE
45
What is a good way to display two continuous variables? (3)
geom_point (scatterplot)
geom_bin2d() and geom_hex() divide the plane into 2d bins and use fill colour to display how many points fall in each bin
use a boxplot and divide one continuous variable into categories: geom_boxplot(aes(group = cut_width(x, width)))
46
What is a good way to visualise a single continuous variable? (2)
histogram, puts continuous variables into bins
density plot (smoothed histogram)
47
How do you select columns by matching them to names contained in a string vector? (3 ways)
!!vector #bang-bang operator, matches to names of variables in vector
all_of(vector) #all names in vector must be present
any_of(vector) #names in vector don't all have to be present
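A short sketch contrasting all_of() and any_of(), assuming dplyr is installed. The data frame and the absent name "zzz" are hypothetical.

```r
library(dplyr)

data   <- tibble(a = 1:2, b = 3:4, c = 5:6)
vector <- c("a", "c")

select(data, all_of(vector))             # columns a and c
select(data, any_of(c(vector, "zzz")))   # a and c; the absent "zzz" is ignored
# select(data, all_of(c(vector, "zzz"))) would stop with an error
```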
48
What does the "group" mapping do in a geom?
aes(group = ) splits the data by a discrete variable, just as mapping that variable to another aesthetic would, but it adds no legend or other visual feature to tell the groups apart
49
What is a good way to visualise a single categorical variable?
bar chart
50
How do you rearrange the columns to put certain columns at the start?
select(data, variable5, variable7, everything()) #moves variables 5 and 7 to the start but still keeps all others
51
How do you facet a plot on a combination of 2 variables?
facet_grid(variable1~variable2)
52
How do you get the counts of each combination when comparing 2 categorical variables?
count(data, variable1, variable2) gives the number of rows for each combination in a table
53
How do you exclude a column from a dataframe?
select(data, -(variable1))
54
How does geom_bar get the count for each value of a variable? How do you override it?
Uses the statistical transformation stat_count() to count the rows for each value. Can override using stat = "identity", for example to use a y value instead of the count
55
How do you change the order of rows?
arrange(data, column1, column2) #orders by column 1 then column 2 in ascending order
56
Can you map continuous variables to colour and shape?
Yes to colour, gives continuous scale. No to shape
57
What are 6 coordinate systems in ggplot?
coord_cartesian() #default
coord_flip() #switches the x and y axis
coord_map() #sets the aspect ratio correctly for maps
coord_quickmap() #sets the aspect ratio correctly for maps, quicker
coord_polar() #uses polar coordinates. use argument (theta="y") to create a pie chart. otherwise you get a bulls-eye chart
coord_fixed() #forces a specified ratio between the axes, default is 1
58
How do you "flatten" a graph?
get the residual and plot the data against the residual
59
How do you map an aesthetic to a function?
E.g. colour = variable<5 inside aes(). Gives one colour for TRUE and another for FALSE
60
How do you add a new column to a dataframe?
mutate(data, newcolumn = …)
61
How do you "straighten" a graph?
+ scale_y_log10() + scale_x_log10()
62
What are some possible aesthetic mappings? (9)
colour #changes colour
size #changes size
alpha #changes transparency between 0 and 1
shape #changes shape
linetype #from 0-6. 0 is blank, 1 is solid, others are dotted or dashed
fill #fill colour of shape or curve
stroke #modify width of border. E.g. on shapes
fontface #character "plain", "bold", "italic", "bold.italic"
group #groups by a specified variable
63
What is the structure of a shiny app?
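A ui (ui.R) that lays out what the user sees and a server (server.R) that does the work in the background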
64
What is Anscombe's quartet?
Facet of 4 graphs with very different data but nearly identical summary statistics: the same line of best fit (y = 3 + 0.5x) and the same R² of about 0.67
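The quartet ships with base R as the anscombe data frame, so the claim can be checked directly. A sketch that fits a line to each of the four datasets; the object names fits, coefs and r2 are arbitrary.

```r
# anscombe has columns x1..x4 and y1..y4, one pair per dataset
fits <- lapply(1:4, function(i) {
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})

# one row per dataset: intercept and slope of the fitted line
coefs <- t(sapply(fits, coef))
round(coefs, 2)   # every row is close to intercept 3 and slope 0.5

# R-squared of each fit
r2 <- sapply(fits, function(f) summary(f)$r.squared)
round(r2, 2)      # about 0.67 for all four
```

Plotting the four pairs is what reveals how different the datasets really are, which is the point of the quartet.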
65
How do you select columns that start with, end with, contain or match a string?
starts_with("abc")
ends_with("abc")
contains("abc")
matches("expression") #regular expression
66
What are the disadvantages of r shiny? (5)
- Unhelpful errors
- Dependency hell
- Unreliable back end
- Need to run R on a server
- Reactivity can be slow
67
If you have different mappings in ggplot() and geom(aes()), which one will override?
The mapping in the geom overrides the global mapping in ggplot
68
What are the comparison operators in R?
<, <=, >, >=, !=, ==
69
How do you label the axes?
+ labs(y= "", x= "")
70
What is contained in the ui.R file in shiny?
provides schematic for how the application will be presented to the user. e.g. titlePanel(), sidebarLayout(), sidebarPanel(), mainPanel()
71
What is contained in the server.R file in shiny?
contains everything which should be done in the background
72
What is a violin plot?
Like a boxplot but shows the full density plots
73
What is the order of how humans perceive sensations? (6)
1. electric shock
2. saturation
3. length
4. area
5. depth
6. brightness
74
What do lead(x) and lag(x) do to a vector x?
lead(x) removes first number and adds NA to end
lag(x) adds NA to beginning and removes last number
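A small sketch of both shifts, assuming dplyr is installed. The vector x is made up.

```r
library(dplyr)

x <- c(1, 2, 3, 4)

lead(x)      # 2 3 4 NA: each position shows the next value
lag(x)       # NA 1 2 3: each position shows the previous value
x - lag(x)   # NA 1 1 1: change from the previous value
```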
75
What is a facet?
Facets are subplots that each display a subset of the data
76
Does the aesthetic mapping go inside or outside the aes()?
If it gives information about a variable it goes inside the aes(). If not it must go outside the aes()
77
How do you call something from the input in the server in shiny?
anything from the input can be called in server using input$…
78
How do you select only certain rows of a dataframe based on their values?
filter(dataframe, value1==1, value2==4)
79
How do you determine if a value is a missing value in R?
is.na()
80
What are additional arguments to facet_wrap? (2)
nrow = and ncol = specify the number of rows or columns
81
How do you get the max value of a variable and the location of the max value?
max(variable) #gives max value
which.max(variable) #gives location of max value
82
What are 3 distilled principles from Tufte?
- Maximise data-to-ink ratio
- Present more data without losing interpretability
- Use levels of detail
83
What is reactivity in Shiny?
The output depends on the user's input. All output must be wrapped in a render function such as renderPlot(), which watches the inputs it uses and updates the output when they change
84
What happens when you map an aesthetic (colour) to a bar chart? What 3 ways are there to change how this looks?
The bars are automatically stacked with colours for each object in the variable (position = "stack")
position = "identity" #places each object exactly where it falls in context of graph. Will include overlapping so should use alpha
position = "fill" #works like stacking but makes each set of stacked bars the same height. Makes it easier to compare proportions across groups
position = "dodge" #places overlapping objects beside each other. Makes it easier to compare individual values
85
How do you use in and between in R?
x %in% c(a, b) #TRUE where x matches any of the listed values
between(x, left, right) #finds rows where x is between left and right, inclusive
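A sketch of both helpers inside filter(), assuming dplyr is installed. The month and delay columns are hypothetical.

```r
library(dplyr)

data <- tibble(month = c(1, 5, 11, 12), delay = c(3, 40, 12, 25))

in_rows      <- filter(data, month %in% c(11, 12))     # month is 11 or 12
between_rows <- filter(data, between(delay, 10, 30))   # 10 <= delay <= 30

in_rows$month        # 11 12
between_rows$delay   # 12 25
```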
86
How do you split a variable into different sections and label the sections?
cut(variable, breaks = c(0, 10, 20), labels = c("", ""))
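A base R sketch of cut() with made-up ages and hypothetical labels. Three breaks define two intervals, so two labels are needed; by default each interval includes its upper bound.

```r
age <- c(3, 10, 15, 20)

# intervals are (0, 10] and (10, 20]
groups <- cut(age, breaks = c(0, 10, 20), labels = c("low", "high"))

as.character(groups)   # "low" "low" "high" "high"
```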
87
How do you load a csv into a dataframe in R?
read_csv("filepath")