Lecture 3 Flashcards

Tidy Data

1
Q
A
2
Q

What are the three criteria for a tidy dataset?

A
1. Each variable has its own column.
2. Each observation has its own row.
3. Each value has its own cell.
3
Q

What are common signs of untidy data?

A

Column headers are values, not variable names.

Multiple variables stored in one column.

Variables stored across both rows and columns.

A single observational unit stored across multiple tables.

4
Q

What is the function used to convert wide data to long data?

A

melt()
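
A minimal sketch of the wide-to-long conversion, assuming melt() from data.table (the package named on the merge cards). The table `wide` and its column names are hypothetical, chosen so that the year column headers are values rather than variable names.

```r
library(data.table)

# Hypothetical wide table: one column per year, so the headers are values.
wide <- data.table(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(11, 21)
)

# Gather the year columns into a pair of columns: year (the old header)
# and cases (the old cell value). One row per country-year observation.
long <- melt(
  wide,
  id.vars = "country",
  measure.vars = c("2019", "2020"),
  variable.name = "year",
  value.name = "cases"
)
print(long)
```

Each row of `long` now holds one observation (a country in a year), which matches the tidy criteria.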

5
Q

What is the function used to convert long data to wide data?

A

dcast()
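
A minimal sketch of the long-to-wide conversion, assuming dcast() from data.table. The table `long` and its columns are hypothetical. The formula puts the row identifier on the left of `~` and the column whose values become new headers on the right.

```r
library(data.table)

# Hypothetical long table: one row per country-year observation.
long <- data.table(
  country = c("A", "A", "B", "B"),
  year = c("2019", "2020", "2019", "2020"),
  cases = c(10, 11, 20, 21)
)

# One row per country, one column per year, cells filled from `cases`.
wide <- dcast(long, country ~ year, value.var = "cases")
print(wide)
```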

6
Q

How do you split a column into multiple columns?

A

Use separate(), e.g., separate(data, col = "proportion", into = c("votes", "total_votes")).
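
A small sketch of the call above, assuming separate() comes from the tidyr package. The data frame `votes` and the "votes/total" string format are hypothetical. The `sep` and `convert` arguments are optional additions: `sep` names the delimiter explicitly and `convert = TRUE` turns the resulting strings into numbers.

```r
library(tidyr)

# Hypothetical input: two variables stored in one column as "votes/total".
votes <- data.frame(
  candidate = c("Ada Lovelace", "Alan Turing"),
  proportion = c("120/300", "180/300")
)

# Split `proportion` at "/" into two new columns; the original column
# is dropped by default.
split_votes <- separate(
  votes,
  col = "proportion",
  into = c("votes", "total_votes"),
  sep = "/",
  convert = TRUE
)
print(split_votes)
```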

7
Q

How do you combine multiple columns into one?

A

Use unite(), e.g., unite(data, col = "candidate", name, surname, sep = " ").
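
A small sketch of the call above, assuming unite() comes from the tidyr package. The data frame `people` is hypothetical.

```r
library(tidyr)

# Hypothetical input: a candidate's name split across two columns.
people <- data.frame(
  name = c("Ada", "Alan"),
  surname = c("Lovelace", "Turing"),
  votes = c(120L, 180L)
)

# Paste `name` and `surname` into one `candidate` column, separated by a
# space; the input columns are removed by default.
united <- unite(people, col = "candidate", name, surname, sep = " ")
print(united)
```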

8
Q

What function is used to concatenate multiple tables?

A

rbindlist()
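
A minimal sketch of stacking tables with rbindlist() from data.table. The tables `jan` and `feb` are hypothetical. The optional `idcol` argument records which table each row came from, using the list names.

```r
library(data.table)

# Hypothetical tables with the same columns, one per month.
jan <- data.table(id = 1:2, sales = c(100, 150))
feb <- data.table(id = 1:2, sales = c(120, 130))

# rbindlist() takes a list of tables and stacks them row-wise.
# With a named list, idcol stores the list name for every row.
all_sales <- rbindlist(list(jan = jan, feb = feb), idcol = "month")
print(all_sales)
```

Storing the source in a column keeps a single observational unit in one table instead of spreading it across several.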

9
Q

What are the four types of merges in data.table?

A

Inner Merge: Only matching rows from both tables.

Outer Merge: All rows from both tables, with NAs wherever the other table has no matching row.

Left Merge: All rows from the first table; columns from the second table are NA where there is no match.

Right Merge: All rows from the second table; columns from the first table are NA where there is no match.
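
The four merge types can be sketched with merge() on two small data.tables. The tables `dt1` and `dt2` are hypothetical; the `all`, `all.x` and `all.y` arguments select the merge type.

```r
library(data.table)

# Hypothetical tables that share ids 2 and 3 only.
dt1 <- data.table(id = c(1, 2, 3), x = c("a", "b", "c"))
dt2 <- data.table(id = c(2, 3, 4), y = c("B", "C", "D"))

inner <- merge(dt1, dt2, by = "id", all = FALSE)   # ids 2, 3
outer <- merge(dt1, dt2, by = "id", all = TRUE)    # ids 1, 2, 3, 4
left  <- merge(dt1, dt2, by = "id", all.x = TRUE)  # ids 1, 2, 3; y is NA for id 1
right <- merge(dt1, dt2, by = "id", all.y = TRUE)  # ids 2, 3, 4; x is NA for id 4

print(outer)
```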

10
Q

How do you perform an inner merge in data.table?

A

merge(table1, table2, by = "column", all = FALSE)

11
Q

How do you merge two tables by multiple columns?

A

merge(table1, table2, by = c("col1", "col2"))
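
A short sketch of a merge on two key columns with data.table. The tables `scores` and `fees` are hypothetical. A row is kept only when both `name` and `year` match, since the default is an inner merge.

```r
library(data.table)

# Hypothetical tables keyed by the pair (name, year).
scores <- data.table(
  name = c("Ada", "Ada", "Alan"),
  year = c(2019, 2020, 2019),
  score = c(90, 95, 80)
)
fees <- data.table(
  name = c("Ada", "Alan", "Alan"),
  year = c(2020, 2019, 2020),
  fee = c(50, 40, 45)
)

# Rows match only when name AND year agree in both tables.
merged <- merge(scores, fees, by = c("name", "year"))
print(merged)
```

When the key columns are named differently in the two tables, merge() also accepts `by.x` and `by.y` in place of `by`.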

12
Q

Why is there no single tidy representation of a dataset?

A

The tidy representation depends on what is treated as an observation and on the goal of the analysis.

13
Q

What is the difference between back-end and front-end data needs?

A

Back-end: Data is normalized to avoid redundancy.

Front-end: Data may be combined for easier analysis, even with some redundancy.
