Lecture 2 Flashcards

Data Wrangling with data.table in R

1
Q

What is the syntax for a data.table operation in R?

A

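DT[i, j, by], where:

i: Row conditions (which rows to keep or operate on)

j: Column operations (select, compute, or update columns)

by: Grouping (the column or columns to group by)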
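
A minimal sketch of the three slots working together. The table `flights` and its values are made up for illustration; the column names AIRLINE and DEPARTURE_TIME are borrowed from a later card.

```r
library(data.table)

# Hypothetical toy data, invented for this example
flights <- data.table(
  AIRLINE        = c("AA", "AA", "DL", "DL"),
  DEPARTURE_TIME = c(550, 700, 800, 900)
)

# i:  keep rows departing after 600
# j:  count the rows and average the departure time
# by: do this separately for each airline
res <- flights[DEPARTURE_TIME > 600,
               .(n = .N, mean_dep = mean(DEPARTURE_TIME)),
               by = AIRLINE]
print(res)
```

Read it as "take flights, filter rows with i, then compute j, grouped by by".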

2
Q

How do you create a data.table in R?

A

data.table(x = c(...), y = c(...)), with each argument giving one column, e.g., data.table(x = c(1, 2), y = c("a", "b"))

3
Q

What function converts a data.frame to a data.table?

A

as.data.table()
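
A short sketch of the conversion, using a made-up data.frame `df`. It assumes only that the data.table package is installed.

```r
library(data.table)

df <- data.frame(x = 1:3, y = c("a", "b", "c"))  # toy data.frame
dt <- as.data.table(df)                           # returns a new data.table

class(df)  # still a plain data.frame
class(dt)  # a data.table is also a data.frame
```

The original `df` is left unchanged; `dt` is a separate object.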

4
Q

Which function is used to load large data files efficiently into a data.table?

A

fread()
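
A small sketch of fread() in use. To keep it self-contained, the CSV content is passed as inline text (made-up values) rather than read from disk; the file path in the comment is a placeholder.

```r
library(data.table)

# Inline CSV text standing in for a file on disk (values invented)
csv_text <- "AIRLINE,DEPARTURE_TIME\nAA,700\nDL,830\n"

# For a real file this would be something like fread("path/to/file.csv")
DT <- fread(text = csv_text)

class(DT)  # the result is already a data.table
str(DT)    # column types are detected automatically
```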

5
Q

How can you access the 2nd row of a data.table?

A

DT[2]

6
Q

How can you subset rows using multiple conditions?

A

Use & for AND and | for OR, e.g., DT[AIRLINE == "AA" & DEPARTURE_TIME > 600]
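
A minimal sketch contrasting the two operators on a toy table (the values are invented for illustration):

```r
library(data.table)

DT <- data.table(
  AIRLINE        = c("AA", "AA", "DL", "UA"),
  DEPARTURE_TIME = c(550, 700, 800, 500)
)

both   <- DT[AIRLINE == "AA" & DEPARTURE_TIME > 600]  # AND: both must hold
either <- DT[AIRLINE == "AA" | DEPARTURE_TIME > 600]  # OR: at least one holds

nrow(both)    # 1 row: AA departing at 700
nrow(either)  # 3 rows: both AA flights plus DL at 800
```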

7
Q

How can you select rows where a column value is in a set of values?

A

Use %in%, e.g., DT[DESTINATION_AIRPORT %in% c("JFK", "LGA")]
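
A short sketch with made-up airport codes, including the negated form:

```r
library(data.table)

DT <- data.table(DESTINATION_AIRPORT = c("JFK", "LAX", "LGA", "ORD"))

nyc    <- DT[DESTINATION_AIRPORT %in% c("JFK", "LGA")]   # rows in the set
others <- DT[!DESTINATION_AIRPORT %in% c("JFK", "LGA")]  # rows not in the set

nyc$DESTINATION_AIRPORT     # "JFK" "LGA"
others$DESTINATION_AIRPORT  # "LAX" "ORD"
```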

8
Q

How can you ensure changes in a data.table do not affect the original data?

A

Use copy(), e.g., new_DT <- copy(DT)
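
Why copy() matters: a plain assignment gives a second name for the same table, so updates made by reference with := show up under both names. A minimal sketch with toy data:

```r
library(data.table)

DT <- data.table(x = 1:3)

alias <- DT          # plain assignment: both names refer to the same table
alias[, y := x * 2]  # the new column also appears in DT
names(DT)            # "x" "y"

safe <- copy(DT)     # independent copy
safe[, z := 0]       # only the copy gets column z
names(DT)            # still "x" "y"
```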

9
Q

How do you access a specific column by name?

A

DT[, COLUMN_NAME] (returns the column as a vector; DT$COLUMN_NAME does the same)

10
Q

How do you access multiple columns as a data.table?

A

DT[, .(col1, col2)]
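
A small sketch of the difference between getting a vector and getting a data.table back. The table and its values are made up; `.()` is data.table's alias for `list()`.

```r
library(data.table)

DT <- data.table(col1 = 1:3,
                 col2 = c("a", "b", "c"),
                 col3 = c(TRUE, FALSE, TRUE))

v   <- DT[, col1]           # bare name: plain vector
one <- DT[, .(col1)]        # wrapped in .(): one-column data.table
two <- DT[, .(col1, col2)]  # two-column data.table

is.data.table(v)    # FALSE
is.data.table(two)  # TRUE
```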

11
Q

How do you add a new column in a data.table?

A

Use :=, e.g., DT[, NEW_COLUMN := OLD_COLUMN * 2]
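
A minimal sketch on toy data. Note that := updates the table in place, so no reassignment with `<-` is needed.

```r
library(data.table)

DT <- data.table(OLD_COLUMN = c(1, 2, 3))

DT[, NEW_COLUMN := OLD_COLUMN * 2]  # adds the column by reference

names(DT)      # "OLD_COLUMN" "NEW_COLUMN"
DT$NEW_COLUMN  # 2 4 6
```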

12
Q

How do you remove a column in a data.table?

A

DT[, COLUMN_NAME := NULL]
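
A short sketch of dropping one column and then several at once. The column names a to d are placeholders invented for this example.

```r
library(data.table)

DT <- data.table(a = 1:2, b = 3:4, c = 5:6, d = 7:8)

DT[, b := NULL]            # drop a single column
DT[, c("c", "d") := NULL]  # drop several columns named in a character vector

names(DT)  # "a"
```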

13
Q

What does .N represent in data.table?

A

.N is a built-in variable that holds the number of rows in the table or, when by is used, in the current group.
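
A small sketch of .N in three common positions, using made-up data:

```r
library(data.table)

DT <- data.table(AIRLINE = c("AA", "AA", "DL"))

total       <- DT[, .N]                # rows in the whole table
per_airline <- DT[, .N, by = AIRLINE]  # rows per group, in a column named N
last_row    <- DT[.N]                  # used in i, .N picks the last row

total        # 3
per_airline  # AA has 2 rows, DL has 1
```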

13
Q

What is the purpose of the by argument in data.table?

A

It is used for grouped operations, e.g., DT[, .(mean_col = mean(COLUMN)), by = GROUP_COLUMN]

14
Q

How do you calculate the mean of a column in a data.table?

A

DT[, mean(COLUMN_NAME, na.rm = TRUE)]
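
A minimal sketch showing why na.rm = TRUE is included. The values are invented, with one missing entry:

```r
library(data.table)

DT <- data.table(COLUMN_NAME = c(10, 20, NA))

with_na    <- DT[, mean(COLUMN_NAME)]                # NA: one missing value spoils the mean
without_na <- DT[, mean(COLUMN_NAME, na.rm = TRUE)]  # 15: NA is dropped first
```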

15
Q

What is the output of rep(6:9, 2) ?
a. 6 7 8 9 6 7 8 9
b. 6 6 7 7 8 8 9 9
c. "6", "7", "8", "9", "6", "7", "8", "9"
d. "6", "6", "7", "7", "8", "8", "9", "9"

A

The correct answer is A

rep(6:9, 2)
[1] 6 7 8 9 6 7 8 9

Answer B is wrong because rep(6:9, 2) repeats the whole sequence "6 7 8 9" twice; it does not repeat each element individually.
Answers C and D are wrong because their elements are character strings, not integers.
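
For comparison, a short sketch of how the pattern in answer B could be produced, using the `each` argument of rep():

```r
a <- rep(6:9, 2)         # whole sequence twice:  6 7 8 9 6 7 8 9
b <- rep(6:9, each = 2)  # each element twice:    6 6 7 7 8 8 9 9

class(a)  # "integer", so the quoted options C and D cannot be right
```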

16
Q

What is the output of the following code: c(3 != sqrt(9), TRUE == (3 > 8))?
a. FALSE TRUE
b. TRUE
c. FALSE FALSE
d. FALSE

A

The correct answer is C

c(3 != sqrt(9), TRUE == (3 > 8))
[1] FALSE FALSE

NOTE: to follow the logic, evaluate the inner expressions first (sqrt(9) is 3 and 3 > 8 is FALSE), which simplifies the expression to:
c(3 != 3, TRUE == FALSE)
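
The same reasoning, sketched step by step so each piece can be run on its own:

```r
sqrt(9)          # 3
3 != sqrt(9)     # FALSE, because 3 is equal to 3
3 > 8            # FALSE
TRUE == (3 > 8)  # FALSE, because TRUE is not equal to FALSE

result <- c(3 != sqrt(9), TRUE == (3 > 8))
result           # FALSE FALSE
```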

17
Q

Let x <- c(1, 6, 3, 2). What is the output of sort(x)?
a. 1 2 3 6
b. 6 3 2 1
c. 1 4 3 2
d. 2 3 4 1