Week 2 Flashcards

Question 1

Q

What is rectangular data? Explain in detail

Answer

A

Rectangular data is data laid out as a table in which variables are in columns and observations are in rows. A variable is a property, quantity or quality that can be measured. A value is the state of a variable when it is measured.
An observation is a set of values measured on the same unit, each value belonging to a different variable.
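
As a rough illustration, rectangular data in R is usually held in a data frame. The `plants` data frame and its values below are made up for this card and are not from the course material.

```r
# Hypothetical example: three observations (rows) of three variables (columns)
plants <- data.frame(
  id     = c("p1", "p2", "p3"),
  height = c(12.5, 9.8, 15.1),
  colour = c("red", "white", "red")
)

names(plants)      # the variables
plants[2, ]        # one observation: one value for each variable
plants$height[2]   # a single value: the state of height for observation p2
```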

Question 2

Q

What do we use to visualise continuous variables?

Answer

A

A histogram. Think of the carat variable in the diamonds data.
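
A minimal sketch of such a histogram, assuming the "carat data" refers to the `carat` column of the `diamonds` data set that ships with ggplot2. The bin width is an arbitrary choice for illustration.

```r
library(ggplot2)  # provides ggplot() and the diamonds data

# carat is continuous, so the histogram bins it and counts observations per bin
p <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.1)  # binwidth picked arbitrarily
print(p)
```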

Question 3

Q

What are the principles of tidy data?

Answer

A

Each variable is contained in a column. Each observation is contained in a row. The data is in a single table. Long form makes the data easy to reshape in different ways.
Wide form is commonly used for analysis.

Question 4

Q

What are the tidy verbs and what do they do?

Answer

A

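Gather / Spread: gather collects values that have been spread across several columns into two columns, a key (the old column names) and a value (the cell contents), which makes the data longer. Spread works the opposite way: it spreads a key/value pair of columns across several columns, which makes the data wider. In both, you specify the key (identifier) and value columns.
Unite / Separate: unite combines several columns into one; separate splits one column into several.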
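
The four verbs can be sketched on a tiny table. The `wide` data frame and its column names are invented for illustration. `gather()` and `spread()` still exist in tidyr, although newer tidyr versions recommend `pivot_longer()` and `pivot_wider()` instead.

```r
library(tidyr)

# Hypothetical wide data: one column per year
wide <- data.frame(
  country = c("A", "B"),
  y2019   = c(10, 20),
  y2020   = c(11, 21)
)

# gather: collect the year columns into a key column and a value column
long <- gather(wide, key = "year", value = "count", -country)

# spread: the reverse, one new column per distinct key
wide_again <- spread(long, key = year, value = count)

# unite: paste two columns together into one
united <- unite(long, "country_year", country, year, sep = "_")

# separate: split that column back into two at the separator
separated <- separate(united, country_year,
                      into = c("country", "year"), sep = "_")
```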

Question 5

Q

What two functions create new text values that have no leading "W"?
library(tidyverse)
genes %>%
  gather(variable, expr, -id) %>%
  separate(variable, c("trt", "leftover"), "-") %>%
  separate(leftover, c("time", "rep"), "\\.") %>%
  mutate(trt = sub("W", "", trt)) %>%
  mutate(rep = sub("R", "", rep))

Answer

A

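mutate() and sub(). sub() replaces the first "W" in each trt value with an empty string, and mutate() stores the result back in the trt column.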
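
A small sketch of the two functions working together. The `genes` data from the card is not shown, so the labels below are made up.

```r
library(dplyr)

# Hypothetical treatment labels standing in for the trt column
d <- data.frame(trt = c("WI", "WM", "WW"), stringsAsFactors = FALSE)

# sub() replaces only the first match in each string, so "WW" becomes "W";
# mutate() overwrites the trt column with the result
d <- mutate(d, trt = sub("W", "", trt))
d$trt

# Anchoring the pattern with ^ restricts the match to a leading "W"
anchored <- sub("^W", "", c("WI", "AW"))
```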

Question 6

Q

What does the function read.fwf do? What does the argument c(11, 4, 2, 4, rep(c(5, 1, 1, 1), 31)) do in the function?

Answer

A

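read.fwf reads a file of fixed-width formatted data into a data frame. The argument is the widths vector: it sets the number of characters (text columns) that each field occupies. Here the first four fields are 11, 4, 2 and 4 characters wide, followed by 31 repeats of one 5-character field and three 1-character fields, giving 4 + 4 × 31 = 128 fields in total.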
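
A minimal sketch of `read.fwf` on a made-up two-line file. The field layout and column names below are assumptions for illustration, not the file from the card.

```r
# Hypothetical fixed-width lines: id (3 chars), year (4 chars), value (2 chars)
tmp <- tempfile(fileext = ".txt")
writeLines(c("abc201912",
             "def202034"), tmp)

# widths gives the number of characters in each field, in order
d <- read.fwf(tmp, widths = c(3, 4, 2),
              col.names = c("id", "year", "value"))

# The widths vector from the card describes 128 fields
widths <- c(11, 4, 2, 4, rep(c(5, 1, 1, 1), 31))
n_fields <- length(widths)
```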

Question 7

Q

What is the difference between [] and ()? What does [,c(1,2,3,4,seq(5,128,4))] do in the second line?

Answer

A

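[] is used for indexing vectors, matrices and data frames; () is used to call a function and pass it its input arguments. In [,c(1,2,3,4,seq(5,128,4))] the position before the comma is empty, so all rows are kept, and the part after the comma selects columns 1 to 4 together with every fourth column from 5 to 125.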
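
The column selection can be tried on a stand-in object. The matrix `m` below is hypothetical; it only mimics the 128 columns implied by the index.

```r
# Hypothetical matrix with 2 rows and 128 columns
m <- matrix(seq_len(2 * 128), nrow = 2)

# () calls functions: c() and seq() are given their arguments in parentheses
idx <- c(1, 2, 3, 4, seq(5, 128, 4))

# [] indexes: nothing before the comma keeps every row, idx picks the columns
sub_m <- m[, idx]
dim(sub_m)
```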

(7 cards)