Lecture 3 Flashcards

Question 1

Q

Question 2

Q

What are the three criteria for a tidy dataset?

Answer

A

1. Each variable has its own column.

2. Each observation has its own row.

3. Each value has its own cell.

Question 3

Q

What are common signs of untidy data?

Answer

A

Column headers are values, not variable names.

Multiple variables stored in one column.

Variables stored across both rows and columns.

A single observational unit stored across multiple tables.

Question 4

Q

What is the function used to convert wide data to long data?

Question 5

Q

What is the function used to convert long data to wide data?
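The answers to the two reshaping cards are not recorded in this deck. As a hedged sketch only, assuming the lecture works with data.table (as the merge cards suggest): melt() reshapes wide data to long and dcast() reshapes long data to wide; tidyr offers pivot_longer() and pivot_wider() for the same tasks. The table and column names below are invented for illustration.

```r
library(data.table)

# Hypothetical wide table: the year values sit in the column headers
wide <- data.table(country = c("A", "B"),
                   `2019` = c(10, 20),
                   `2020` = c(11, 21))

# Wide -> long: "year" becomes a variable, one row per country-year pair
long <- melt(wide, id.vars = "country",
             variable.name = "year", value.name = "cases")

# Long -> wide: spread the year values back into separate columns
wide_again <- dcast(long, country ~ year, value.var = "cases")
```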

Question 6

Q

How do you split a column into multiple columns?

Answer

A

Use separate(), e.g., separate(data, col = "proportion", into = c("votes", "total_votes")).
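A minimal runnable sketch of the call above, assuming tidyr is installed and that the "proportion" column stores values such as "120/400". The example data is hypothetical.

```r
library(tidyr)

# Hypothetical data: "proportion" holds two variables as "votes/total_votes"
data <- data.frame(candidate = c("x", "y"),
                   proportion = c("120/400", "280/400"))

# The default sep splits on any run of non-alphanumeric characters, here "/"
tidy <- separate(data, col = "proportion",
                 into = c("votes", "total_votes"),
                 convert = TRUE)  # convert = TRUE turns the pieces into integers
```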

Question 7

Q

How do you combine multiple columns into one?

Answer

A

Use unite(), e.g., unite(data, col = "candidate", name, surname, sep = " ").
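A small sketch of the same call on made-up data, assuming tidyr is installed. By default unite() drops the input columns and puts the new column where the first of them was.

```r
library(tidyr)

# Hypothetical data with first name and surname in separate columns
data <- data.frame(name = c("Ada", "Alan"),
                   surname = c("Lovelace", "Turing"),
                   votes = c(120, 280))

# Paste name and surname together, separated by a single space
combined <- unite(data, col = "candidate", name, surname, sep = " ")
```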

Question 8

Q

What function is used to concatenate multiple tables?

Answer

A

rbindlist()
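As an illustration, rbindlist() from data.table stacks a list of tables row-wise. The sketch below uses two invented tables. The use.names and idcol arguments are optional extras, shown here because the columns are in a different order and the source table is worth recording.

```r
library(data.table)

dt1 <- data.table(id = 1:2, value = c("a", "b"))
dt2 <- data.table(value = c("c", "d"), id = 3:4)  # different column order

# use.names = TRUE matches columns by name; idcol records the source table
stacked <- rbindlist(list(first = dt1, second = dt2),
                     use.names = TRUE, idcol = "source")
```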

Question 9

Q

What are the four types of merges in data.table?

Answer

A

Inner Merge: Only matching rows from both tables.

Outer Merge: All rows from both tables with NAs for missing values.

Left Merge: All rows from the first table, with NAs where the second table has no match.

Right Merge: All rows from the second table, with NAs where the first table has no match.
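The four merge types can be sketched with merge() on two small data.tables. The tables and the "id" column are hypothetical. The all, all.x and all.y arguments select the merge type.

```r
library(data.table)

table1 <- data.table(id = c(1, 2, 3), x = c("a", "b", "c"))
table2 <- data.table(id = c(2, 3, 4), y = c("B", "C", "D"))

inner <- merge(table1, table2, by = "id", all = FALSE)   # ids 2, 3
outer <- merge(table1, table2, by = "id", all = TRUE)    # ids 1, 2, 3, 4
left  <- merge(table1, table2, by = "id", all.x = TRUE)  # ids 1, 2, 3
right <- merge(table1, table2, by = "id", all.y = TRUE)  # ids 2, 3, 4
```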

Question 10

Q

How do you perform an inner merge in data.table?

Answer

A

merge(table1, table2, by = "column", all = FALSE)

Question 11

Q

How do you merge two tables by multiple columns?

Answer

A

merge(table1, table2, by = c("col1", "col2"))
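A short sketch of a multi-column merge on invented data: a row is kept only when both col1 and col2 match, because the default is an inner merge.

```r
library(data.table)

table1 <- data.table(col1 = c("a", "a", "b"),
                     col2 = c(1, 2, 1),
                     x = c(10, 20, 30))
table2 <- data.table(col1 = c("a", "b", "b"),
                     col2 = c(1, 1, 2),
                     y = c("p", "q", "r"))

# Only the key pairs (a, 1) and (b, 1) appear in both tables
merged <- merge(table1, table2, by = c("col1", "col2"))
```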

Question 12

Q

Why is there no single tidy representation of a dataset?

Answer

A

The tidy representation depends on what counts as an observation and on the goal of the analysis.

Question 13

Q

What is the difference between back-end and front-end data needs?

Answer

A

Back-end: Data is normalized to avoid redundancy.

Front-end: Data may be combined for easier analysis, even with some redundancy.

Tidy Data (13 cards)