Week 4: Data Visualisation and Transformation Flashcards
REVERSED
- Maximise data-to-ink ratio
- Present more data without losing interpretability
- Use levels of detail
What are 3 distilled principles from Tufte?
REVERSED
geom_point() #scatterplot
geom_density_2d() #contour lines from x and y positions
geom_histogram() #histogram, use binwidth to change size of bins
geom_density() #density plot
geom_rug() #adds rug marks of raw data on the axes
geom_boxplot() #boxplot
geom_bar() #bar chart, automatically counts the cases in each x category using stat_count
geom_col() #column chart, bar heights come directly from a y variable, so it needs both x and y
geom_line() #line chart
geom_label(aes(x=, y=, label=)) #gives a label at the specified x and y position
geom_freqpoly() #same as histogram but uses lines instead of bars, good for overlapping data
geom_area() #area chart
geom_smooth(method = "lm") #draws a line of best fit. se = TRUE is the default and also displays a shaded confidence band around the line
geom_abline() #adds a reference line set by intercept and slope, defaults to intercept = 0, slope = 1
geom_ribbon(aes(x =, ymin =, ymax =)) #shades the area between lower and upper bounds, e.g. a confidence interval
What are some useful geoms? (15)
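A minimal sketch layering a few of these geoms, assuming ggplot2 is installed and using its built-in mpg dataset. The abline slope of 10 is an arbitrary illustrative value.

```r
library(ggplot2)

# Scatterplot + linear fit with confidence band + dashed reference line
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  geom_abline(intercept = 0, slope = 10, linetype = "dashed")

# Histogram with an explicit bin width, plus rug marks for the raw values
h <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +
  geom_rug()
```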
REVERSED
filter(data, !is.na(variable)) #keeps only the rows where variable is not missing
How do you remove the missing values from a dataframe using filter?
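A small sketch, assuming dplyr is installed; the tibble and its column names are hypothetical.

```r
library(dplyr)

df <- tibble(id = 1:4, score = c(10, NA, 7, NA))

# Keep only the rows where score is not missing
complete <- filter(df, !is.na(score))
```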
REVERSED
fit a model to the trend, compute the residuals, and plot the residuals against the x variable
How do you “flatten” a graph?
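A sketch of flattening via residuals, assuming ggplot2 and dplyr are installed and using ggplot2's diamonds data. The log-log linear model is an illustrative choice of trend, and base residuals() is used rather than any particular helper package.

```r
library(ggplot2)
library(dplyr)

# Model the strong trend between carat and price on the log scale
fit <- lm(log(price) ~ log(carat), data = diamonds)

# Residuals are what remains once the trend is removed
diamonds2 <- diamonds %>% mutate(resid = residuals(fit))

# Plotting residuals against the predictor "flattens" the relationship
p <- ggplot(diamonds2, aes(x = carat, y = resid)) +
  geom_point(alpha = 0.1)
```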
REVERSED
a large share of the ink is used to show data, i.e. more data with less ink on the page
What is a high data-ink ratio?
REVERSED
ungroup()
How do you remove a grouping to return to operations on ungrouped variables?
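A sketch showing why ungroup() matters, assuming dplyr is installed; the data are hypothetical. The first total is computed per group, the second over all rows once the grouping is removed.

```r
library(dplyr)

df <- tibble(team = c("a", "a", "b"), pts = c(1, 3, 5))

share <- df %>%
  group_by(team) %>%
  mutate(team_total = sum(pts)) %>%   # computed within each team
  ungroup() %>%
  mutate(grand_total = sum(pts))      # computed over all rows again
```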
REVERSED
+ xlim(0, 5) #removes data outside the limits
+ coord_cartesian(xlim = c( , ))
this way doesn't throw away the data outside the limits, it just zooms in
How do you set axis limits in 2 ways?
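A sketch contrasting the two approaches, assuming ggplot2 is installed and using its mpg data; the limits 2 to 5 are arbitrary.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Scale limits: x values outside 2-5 become NA and are dropped when drawn
p_scale <- base + xlim(2, 5)

# Coordinate limits: every point is kept, the view is simply zoomed in
p_zoom <- base + coord_cartesian(xlim = c(2, 5))
```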
REVERSED
x %in% y #TRUE for each element of x that appears in y
between(x, left, right) #finds rows where x is between left and right, both ends inclusive
How do you use in and between in R?
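A sketch of both operators inside filter(), assuming dplyr is installed; the temperature data are made up for illustration.

```r
library(dplyr)

df <- tibble(month = 1:12,
             temp = c(5, 6, 9, 12, 15, 18, 20, 19, 16, 12, 8, 6))

# %in%: keep rows whose month is one of the listed values
summer <- filter(df, month %in% c(6, 7, 8))

# between(): keep rows where temp is in [10, 16], both ends inclusive
mild <- filter(df, between(temp, 10, 16))
```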
REVERSED
aes(group = ) splits the data into groups by a discrete variable, like mapping that variable onto an aesthetic, but the groups are not visually distinguished and no legend is added
What does the “group” mapping do in a geom?
REVERSED
Any input can be accessed in the server function as input$…, using the input's ID
How do you call something from the input in the server in shiny?
REVERSED
is.na()
How do you determine if a value is a missing value in R?
REVERSED
Define the plot in the server as output$distplot <- renderPlot(...). In the UI (ui.R), refer to it by its ID, e.g. plotOutput("distplot")
How do you define something e.g. a plot in the server and how do you refer to it in the ui in shiny?
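A minimal single-file sketch, assuming shiny is installed. It keeps the distplot ID from the card; the slider ID bins and the use of the built-in faithful data are hypothetical choices. The server reads the slider through input$bins and fills output$distplot, which the UI displays with plotOutput("distplot").

```r
library(shiny)

# UI: a slider input and a placeholder for the plot output
ui <- fluidPage(
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 30),
  plotOutput("distplot")              # refers to output$distplot by its ID
)

# Server: read the slider via input$bins and build output$distplot
server <- function(input, output) {
  output$distplot <- renderPlot({
    x <- faithful$waiting
    hist(x, breaks = input$bins)      # re-runs whenever input$bins changes
  })
}

# Creating the app object does not launch it; call runApp(app) to start it
app <- shinyApp(ui = ui, server = server)
```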
REVERSED
mutate(data, new_column = …) #column names can't contain spaces
How do you add a new column to a dataframe?
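A small sketch, assuming dplyr is installed; the column names are hypothetical. The new column is computed from existing ones.

```r
library(dplyr)

df <- tibble(distance = c(100, 250), air_time = c(20, 50))

# Add a speed column (distance per hour) derived from existing columns
df2 <- mutate(df, speed = distance / air_time * 60)
```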
REVERSED
nrow =
ncol =
specify the number of rows or columns of panels
What are additional arguments to facet_wrap? (2)
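A sketch using facet_wrap() with nrow, assuming ggplot2 is installed and using its mpg data, where class is a discrete variable with 7 values.

```r
library(ggplot2)

# One panel per car class, laid out over 2 rows
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class, nrow = 2)
```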
REVERSED
The output depends on the user's input. All outputs must be wrapped in a render*() function (e.g. renderPlot()), which watches the inputs it uses and re-runs when they change
What is reactivity in Shiny?
REVERSED
min_rank(x) #gives 1 to the lowest value and so on
min_rank(desc(x)) #gives 1 to the highest value
How do you give a ranking to values in a vector x?
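A short sketch, assuming dplyr is installed, showing how ties are handled: tied values share the lowest rank, and the next rank is skipped.

```r
library(dplyr)

x <- c(10, 20, 20, 5)
min_rank(x)        # returns 2 3 3 1
min_rank(desc(x))  # returns 3 1 1 4
```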
REVERSED
filter(dataframe, value1==1, value2==4)
How do you select only certain rows of a dataframe based on their values?
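A sketch, assuming dplyr is installed, with hypothetical columns value1 and value2. Comma-separated conditions in filter() are combined with AND, and | gives OR.

```r
library(dplyr)

df <- tibble(value1 = c(1, 1, 2, 1), value2 = c(4, 3, 4, 4))

# AND: both conditions must hold
both <- filter(df, value1 == 1, value2 == 4)

# OR: either condition may hold
either <- filter(df, value1 == 2 | value2 == 3)
```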
REVERSED
geom_count() draws a dot at each combination, sized by the number of observations
geom_tile() with the fill aesthetic mapped to the count of each combination:
data %>% count(variable1, variable2) %>% ggplot(aes(x = variable1, y = variable2)) + geom_tile(aes(fill = n))
What is a good way to display two categorical variables?
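A sketch of both options, assuming ggplot2 and dplyr are installed and using ggplot2's diamonds data, where cut and color are categorical.

```r
library(ggplot2)
library(dplyr)

# Option 1: dot size shows how many diamonds share each cut/colour combination
p_count <- ggplot(diamonds, aes(x = cut, y = color)) + geom_count()

# Option 2: count the combinations, then shade tiles by the count n
counts <- diamonds %>% count(color, cut)
p_tile <- ggplot(counts, aes(x = color, y = cut)) + geom_tile(aes(fill = n))
```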
REVERSED
The mapping in the geom overrides the global mapping in ggplot(), but only for that layer
If you have different mappings in ggplot() and geom(aes()), which one will override?
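A sketch, assuming ggplot2 is installed and using its mpg data: the first layer overrides the global y with cty, while the second layer inherits the global y = hwy.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(y = cty), colour = "red") +  # local y = cty overrides global y = hwy
  geom_point()                                # no local y, so this layer uses hwy
```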
REVERSED
read_csv("filepath") #from readr (part of the tidyverse), returns a tibble
How do you load a csv into a dataframe in R?
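A self-contained sketch, assuming readr 2.0 or later is installed (for the show_col_types argument). It writes a tiny temporary CSV first so there is a real file to read.

```r
library(readr)

# Write a small CSV to a temporary file so the example needs no external data
path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "a,1", "b,2"), path)

df <- read_csv(path, show_col_types = FALSE)   # returns a tibble
```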
REVERSED
As the number of points and the number of categories increase, facet grid becomes better than colour, because many overlapping colours are hard to tell apart
When should you use facet grid over mapping by colour?
REVERSED
facet_wrap(~variable)
create a formula with ~, the variable should be discrete
How do you use facet wrap?
REVERSED
Four datasets with nearly identical summary statistics (same means and variances, correlation of about 0.816, the same line of best fit y ≈ 3 + 0.5x and R² ≈ 0.67) that look very different when plotted
What is Anscombe's quartet?
REVERSED
mutate(column = ifelse(test, value1, value2))
test is a logical vector, e.g. column < 5
value1 is used where test is TRUE (use NA here to blank out unusual values)
value2 is used where test is FALSE (usually the original column)
How do you replace unusual values with missing values in a dataframe?
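A sketch, assuming dplyr is installed. The plausible range of 3 to 20 is an arbitrary cutoff chosen for illustration.

```r
library(dplyr)

df <- tibble(y = c(5, 7, 0, 60, 8))

# Values outside the assumed plausible range become NA, others are kept
df2 <- df %>% mutate(y = ifelse(y < 3 | y > 20, NA, y))
```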
REVERSED
Colour: yes, it gives a continuous gradient scale. Shape: no, ggplot2 throws an error because shapes are discrete
Can you map continuous variables to colour and shape?
REVERSED
cut(variable, breaks = c(0, 10, 20), labels = c("", "")) #needs one label per interval, i.e. one fewer than the number of breaks
How do you split a variable into different sections and label the sections?
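A base R sketch with made-up values. Three break points define two intervals, (0, 10] and (10, 20], so two labels are needed.

```r
x <- c(3, 12, 19, 7)

# Right-closed intervals by default: (0, 10] -> "low", (10, 20] -> "high"
groups <- cut(x, breaks = c(0, 10, 20), labels = c("low", "high"))
```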
REVERSED
+ scale_y_log10() + scale_x_log10()
How do you “straighten” a graph?
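A sketch, assuming ggplot2 is installed and using its diamonds data: price against carat curves upward, and putting both axes on a log10 scale makes the relationship much closer to a straight line.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  scale_x_log10() +   # x positions become log10(carat)
  scale_y_log10()     # y positions become log10(price)
```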
REVERSED
geom_freqpoly(aes(x = price, y = after_stat(density))) #older code writes y = ..density.., which is deprecated in recent ggplot2
How do you get a freqpoly to display density instead of count?
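A sketch, assuming ggplot2 3.3.0 or later (for after_stat()) and using its diamonds data. Density rescales each group's counts so that every group's area sums to 1, which makes groups of very different sizes comparable; binwidth = 500 is an arbitrary choice.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = price, y = after_stat(density))) +
  geom_freqpoly(aes(colour = cut), binwidth = 500)
```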
REVERSED
cumsum()
How do you do a cumulative sum?
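A base R sketch: each element of the result is the running total up to that position.

```r
x <- c(2, 5, 1, 4)
cumsum(x)   # returns 2 7 8 12
```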
REVERSED
Put an expression inside aes(), e.g. aes(colour = variable < 5). The colour is then mapped to the TRUE/FALSE result
How do you map an aesthetic to a function of a variable?
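A sketch, assuming ggplot2 is installed and using its mpg data: colour is mapped to the logical result of displ < 5, which gives two colours and a TRUE/FALSE legend.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = displ < 5))
```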
REVERSED
- position on a common scale
- position on an unaligned scale
- length
- tilt/angle
- area
- depth
- colour luminance/colour saturation
- curvature/volume
What is the ranking of effectiveness for ordered attributes? (8)
REVERSED
data %>% group_by(variable) %>% summarise(newname = mean(distance, na.rm=TRUE))
How do you get a grouped summary (mean for this question) of a variable grouped by another variable?
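A sketch with hypothetical data, assuming dplyr is installed. na.rm = TRUE drops the missing value before averaging group "a".

```r
library(dplyr)

df <- tibble(
  variable = c("a", "a", "b", "b"),
  distance = c(10, NA, 4, 6)
)

summary_df <- df %>%
  group_by(variable) %>%
  summarise(newname = mean(distance, na.rm = TRUE))
```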
REVERSED
- Unhelpful errors
- Dependency hell
- Unreliable back end
- Need to run R on a server
- Reactivity can be slow
What are the disadvantages of R Shiny? (5)
REVERSED
sum(x > 10) gives the number of TRUEs in x > 10 (TRUE counts as 1, FALSE as 0)
mean(x > 10) gives the proportion of TRUEs in x > 10
How do you get a count of how many x>10 in a variable? How do you get a proportion?
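A base R sketch with made-up values, relying on TRUE being treated as 1 and FALSE as 0 in arithmetic.

```r
x <- c(3, 15, 8, 22, 11)
sum(x > 10)    # returns 3
mean(x > 10)   # returns 0.6
```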