Week 2 Flashcards
What is rectangular data? Explain in detail
Rectangular data is data where variables are in columns and observations are in rows. A variable is a property, quantity or quality that can be measured. A value is the state of a variable when it is measured.
An observation is a collection of values, each belonging to a different variable.
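A minimal sketch of the idea in base R. The `patients` table and its columns are made up purely for illustration; they are not from the course data.

```r
# Hypothetical toy data: each row is one observation, each column one variable.
patients <- data.frame(
  id     = c(1, 2, 3),
  age    = c(34, 51, 27),
  smoker = c(TRUE, FALSE, FALSE)
)

n_obs  <- nrow(patients)   # number of observations (rows)
n_vars <- ncol(patients)   # number of variables (columns)

patients[2, ]              # one observation: all values for the second row
patients$age[2]            # one value: the state of `age` for that observation
```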
What do we use to visualise continuous variables?
A histogram. Think carat data.
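A short sketch, assuming "carat data" refers to the `carat` column of the `diamonds` dataset that ships with ggplot2. The bin width of 0.1 is an arbitrary choice for illustration.

```r
library(ggplot2)

# carat is continuous, so a histogram bins it and counts diamonds per bin.
p <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.1)

print(p)
```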
What are the principles of tidy data?
Each variable is contained in a column. Each observation is contained in a row. The data is in a single table. Long form makes the data easier to reshape; wide form is common for analysis.
What are the tidy verbs and what do they do?
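Gather / Spread: gather collects variables that have been spread across several columns into two columns, a key column (holding the old column names) and a value column (holding their values), which makes the table longer. Spread works the opposite way: it takes a key column and a value column and spreads them out into one column per key, which makes the table wider.
Unite / Separate: separate splits one column into several; unite combines several columns into one.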
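A small sketch of all four verbs using tidyr. The `wide` table (countries, years, case counts) is hypothetical and only there to show the round trip.

```r
library(tidyr)

# Hypothetical wide table: the variable "year" is stored in the column names.
wide <- data.frame(
  country = c("A", "B"),
  `2019`  = c(10, 20),
  `2020`  = c(11, 21),
  check.names = FALSE
)

# gather: collect the year columns into a key column and a value column.
long <- gather(wide, key = "year", value = "cases", -country)

# spread: the reverse, one column per distinct key.
back <- spread(long, key = year, value = cases)

# separate: split one column into two (here at character position 2).
split_up <- separate(long, year, into = c("century", "yy"), sep = 2)

# unite: paste the two columns back into one.
united <- unite(split_up, "year", century, yy, sep = "")
```

In current tidyr, `pivot_longer()` and `pivot_wider()` are the recommended replacements for `gather()` and `spread()`, but the older verbs still work.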
In the code below, which two functions create new text values with the leading "W" removed?
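library(tidyverse)
genes %>%
  gather(variable, expr, -id) %>%                     # wide to long, keeping id
  separate(variable, c("trt", "leftover"), "-") %>%   # split at the hyphen
  separate(leftover, c("time", "rep"), "\\.") %>%     # split at the literal dot
  mutate(trt = sub("W", "", trt)) %>%                 # drop the leading "W"
  mutate(rep = sub("R", "", rep))                     # drop the leading "R"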
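mutate() and sub(). sub() replaces the first match of "W" with an empty string, and mutate() stores the result back in the trt column.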
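To see what each step produces, here is the same pipeline run on a tiny made-up `genes` table. The column names (`WI-6.R1`, `WM-12.R2`) and expression values are assumptions chosen to match the pattern the code expects, not the real course data.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the genes data: treatment-time.replicate in the names.
genes <- data.frame(
  id         = c("gene1", "gene2"),
  `WI-6.R1`  = c(2.1, 1.8),
  `WM-12.R2` = c(2.4, 0.9),
  check.names = FALSE
)

tidy_genes <- genes %>%
  gather(variable, expr, -id) %>%
  separate(variable, c("trt", "leftover"), "-") %>%
  separate(leftover, c("time", "rep"), "\\.") %>%
  mutate(trt = sub("W", "", trt)) %>%   # "WI" becomes "I", "WM" becomes "M"
  mutate(rep = sub("R", "", rep))       # "R1" becomes "1", "R2" becomes "2"

tidy_genes
```

Note that `sub()` only replaces the first match in each string; `gsub()` would replace every match.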
What does the function read.fwf do? What does the argument c(11, 4, 2, 4, rep(c(5, 1, 1, 1), 31)) do in the function?
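read.fwf reads a fixed-width-format text file into a data frame. The vector is the widths argument: it gives the width, in characters, of each field. The first four fields are 11, 4, 2 and 4 characters wide, and they are followed by the pattern 5, 1, 1, 1 repeated 31 times, giving 128 fields in total.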
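A minimal sketch of `read.fwf` on a made-up two-line file. The field layout (3-character station, 4-character year, 2-character month) and the column names are assumptions for illustration only.

```r
# The widths vector from the flashcard: 4 leading fields plus 31 groups of 4.
widths <- c(11, 4, 2, 4, rep(c(5, 1, 1, 1), 31))
n_fields <- length(widths)   # 128

# Hypothetical fixed-width data written to a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ABC201501", "XYZ201612"), tmp)

# Each entry of `widths` says how many characters belong to that field.
df <- read.fwf(tmp, widths = c(3, 4, 2),
               col.names = c("station", "year", "month"))
unlink(tmp)

df
```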
What is the difference between [] and ()? What does [,c(1,2,3,4,seq(5,128,4))] do in the second line?
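[] is used for indexing vectors, matrices and data frames; () is used to call a function and pass it arguments. [,c(1,2,3,4,seq(5,128,4))] keeps every row (nothing is given before the comma) and selects columns 1 to 4 plus every fourth column starting at 5 (5, 9, 13, ..., 125).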
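A short sketch of the same indexing pattern on a small made-up matrix, followed by the column vector from the flashcard itself.

```r
# () calls a function: matrix() and seq() both take arguments in parentheses.
m <- matrix(1:20, nrow = 2)        # 2 rows, 10 columns, filled column by column

# [] indexes: empty before the comma means "all rows".
cols <- c(1, 2, seq(3, 10, 4))     # 1 2 3 7
subset_m <- m[, cols]

# The flashcard's selection: columns 1-4, then every 4th column from 5.
keep <- c(1, 2, 3, 4, seq(5, 128, 4))
length(keep)                       # 35 columns kept
```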