Week 2 Flashcards

1
Q

What is rectangular data? Explain in detail

A

Rectangular data is data arranged so that variables are in columns and observations are in rows. A variable is a property, quantity or quality that can be measured. A value is the state of a variable when it is measured.
An observation is a collection of values measured on the same unit, each value belonging to a different variable.
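
As a minimal sketch of these terms, a small rectangular data set can be built as a base R data frame. The object name `plants` and all of its values are made up for illustration.

```r
# Hypothetical data: three observations (rows) of three variables (columns)
plants <- data.frame(
  species = c("fern", "moss", "ivy"),
  height  = c(42.5, 3.1, 120.0),
  flowers = c(FALSE, FALSE, TRUE)
)

# Each column is a variable; each row is an observation
names(plants)   # the variables
nrow(plants)    # the number of observations

# A value is a single cell: the state of one variable for one observation
plants$height[2]
```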

2
Q

What do we use to visualise continuous variables?

A

A histogram. Think carat data.
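
A rough sketch of a histogram in base R follows. The values are simulated stand-ins for a continuous variable such as carat, not real data, and the name `weights` is hypothetical.

```r
# Simulated continuous values standing in for a variable such as carat
set.seed(1)
weights <- rexp(500, rate = 1.25)

# hist() groups the values into bins and draws one bar per bin
h <- hist(weights, breaks = 20, main = "Simulated weights", xlab = "weight")

# Every value falls into exactly one bin, so the counts sum to 500
sum(h$counts)
```

If ggplot2 is loaded, the same kind of plot for its diamonds data could be drawn with `ggplot(diamonds, aes(x = carat)) + geom_histogram()`.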

3
Q

What are the principles of tidy data?

A

Each variable is contained in its own column. Each observation is contained in its own row. The data is held in a single table. Long form makes the data easier to reshape in different ways.
Wide form is commonly used for analysis.

4
Q

What are the tidy verbs and what do they do?

A

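Gather / Spread: gather collects values that have been spread across several columns into key-value pairs: a key column holding the old column names and a value column holding their values. Spread works the opposite way, turning a key-value pair back into one column per key. Both need the key (identifier) and value columns to be specified.
Unite / Separate: unite combines several columns into one; separate splits one column into several.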
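
The four verbs can be sketched on a tiny made-up table, assuming the tidyr package is installed. The data frame `wide` and its column names are hypothetical.

```r
library(tidyr)

# Hypothetical wide data: one column per year
wide <- data.frame(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(11, 22),
  check.names = FALSE
)

# gather(): collect the year columns into key-value pairs (longer data)
long <- gather(wide, key = "year", value = "cases", -country)

# spread(): the inverse, giving one column per key again (wider data)
wide_again <- spread(long, key = year, value = cases)

# unite(): combine two columns into one, joined by "_"
united <- unite(long, "country_year", country, year, sep = "_")

# separate(): split that column back into two
separated <- separate(united, country_year,
                      into = c("country", "year"), sep = "_")
```

In current tidyr releases gather() and spread() still work but are superseded by pivot_longer() and pivot_wider().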

5
Q

What two functions create new text values that have no leading "W"?

library(tidyverse)
genes %>%
  gather(variable, expr, -id) %>%                          # wide to long
  separate(variable, c("trt", "leftover"), "-") %>%        # split at "-"
  separate(leftover, c("time", "rep"), "\\.") %>%          # split at a literal "."
  mutate(trt = sub("W", "", trt)) %>%                      # drop the "W"
  mutate(rep = sub("R", "", rep))                          # drop the "R"

A

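mutate() and sub(). sub("W", "", trt) replaces the first "W" in each value with an empty string, and mutate() stores the result back in the trt column.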
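
A minimal base R sketch of what sub() does to such labels. The vectors `trt` and `rep_id` hold made-up values, not the actual genes data.

```r
# Hypothetical treatment and replicate labels
trt <- c("WI", "WM", "WO")
rep_id <- c("R1", "R2", "R3")

# sub(pattern, replacement, x) replaces the first match in each element
trt_clean <- sub("W", "", trt)
rep_clean <- sub("R", "", rep_id)

# sub() touches only the first match; gsub() replaces every match
first_only <- sub("W", "", "WWI")    # "WI"
all_matches <- gsub("W", "", "WWI")  # "I"
```

Inside a pipeline, mutate() applies the same call to a whole column and keeps the result in the data frame.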

6
Q

What does the function read.fwf do? What does the argument c(11, 4, 2, 4, rep(c(5, 1, 1, 1), 31)) do in the function?

A

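read.fwf reads a fixed-width-format file, in which each field occupies a fixed number of characters, into a data frame. The argument is the widths vector: it sets the number of characters belonging to each field. Here the first four fields are 11, 4, 2 and 4 characters wide, followed by the pattern 5, 1, 1, 1 repeated 31 times.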
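
A small sketch of read.fwf on a made-up file. The file contents, field widths and column names in the first part are hypothetical; only the last vector comes from the card.

```r
# Hypothetical fixed-width file: id (3 chars), year (4 chars), value (2 chars)
path <- tempfile(fileext = ".txt")
writeLines(c("AAA201912", "BBB202034"), path)

# widths gives the number of characters in each field, in order
dat <- read.fwf(path, widths = c(3, 4, 2),
                col.names = c("id", "year", "value"))

# The widths vector from the card describes 4 + 4 * 31 = 128 fields
card_widths <- c(11, 4, 2, 4, rep(c(5, 1, 1, 1), 31))
length(card_widths)
```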

7
Q

What is the difference between [] and ()? What does [,c(1,2,3,4,seq(5,128,4))] do in the second line?

A

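[] is used for indexing matrices, data frames and vectors; () is used with functions to supply their input arguments. In [,c(1,2,3,4,seq(5,128,4))] the empty position before the comma keeps every row, and the vector after the comma selects columns 1 to 4 plus every fourth column from 5 up to 125 (5, 9, 13, ..., 125).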
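
The column selection can be checked with a short sketch. The matrix `m` is a made-up stand-in whose entries equal their column number, so the selected columns are easy to read off.

```r
# seq(5, 128, 4) is 5, 9, 13, ..., 125: every fourth position from 5
cols <- c(1, 2, 3, 4, seq(5, 128, 4))
length(cols)   # 4 + 31 = 35 column positions

# Hypothetical 2-row, 128-column matrix; each entry is its column number
m <- matrix(rep(1:128, each = 2), nrow = 2)

# Empty row index keeps all rows; cols picks the columns
sub_m <- m[, cols]
dim(sub_m)
```

Here `()` appears in the calls to c(), seq(), rep() and matrix(), while `[]` does the indexing.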
