Data Wrangling Flashcards

1
Q

General Form :

DT[i, j, by]

A

Take DT, subset rows using i, then calculate j grouped by by

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

DT[3:5,] or DT[3:5]

A

Subsetting rows : Selects third to 5th row

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

DT[ V2 == “A”]

DT[ V2 %in% c(“A”, “C”)]

A

Selects all rows that have value A in column V2.

Select all rows that have the value A or C in column V2.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

DT[,V2]

A

Select 1 column in j. Column V2 is returned as a vector.

[1] “A” “B” “C” “A” “B” “C” …

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

DT[,.(V2,V3)]

A

Select several columns in j.Columns V2 and V3 are returned as a data.table.

V2 V3

1: A -1.1727
2: B -0.3825
3: C -1.0604

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

.() or list()

A

.() is an alias to list(). If .() is used, the returned value is a data.table. If .() is not used, the result is a vector.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

DT[,sum(V1)]

A

Call functions in j.Returns the sum of all elements of column V1 in a vector.

[1] 18

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

DT[,.(sum(V1),sd(V3))]

A

Computing on several columns. Returns the sum of all elements of column V1 and the standard deviation of V3 in a data.table.

V1 V2
1 : 18 0.7634655

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

DT[,.(Aggregate = sum(V1), Sd.V3 = sd(V3))]

A

Assigning column names to computed columns.
The same with previous card, but with new names.

Aggregate Sd.V3
1 : 18 0.7634655

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

DT[,.(V1, Sd.V3 = sd(V3))]

A

Columns gets recycled if different length. Selects column V1, and compute std. dev. of V3, which returns a single value and gets recycled.

V1              Sd.V3
1: 1         0.7634655 
2: 2       0.7634655 
... 
11: 1        0.7634655 
12: 2      0.7634655
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

DT[,{print(V2)
plot(V3)
NULL}]

A

Multiple expressions can be wrapped in curly braces. Print column V2 and plot V3.

[1] "A" "B" "C" "A" "B" "C" ...
#And a plot
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

DT[,.(V4.Sum = sum(V4)),by=V1]

A

Doing j by group. Calculates the sum of V4, for every group in V1.

V1 V4.Sum
1:1 36

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

DT[,.(V4.Sum = sum(V4)),by=.(V1,V2)]

A

Doing j by several groups using .(). The same with previous cards, but for every group in V1 and V2.

V1   V2    V4.Sum 

1: 1 A 8
2: 2 B 10
3: 1 C 12
4: 2 A 14
5: 1 B 16
6: 2 C 18

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

DT[,.(V4.Sum = sum(V4)),by=sign(V1-1)]

A

Call functions in by. Calculates the sum of V4, for every group in
sign(V1-1).

sign     V4.Sum

1: 0 36
2: 1 42

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

DT[,.(V4.Sum = sum(V4)),

by=.(V1.01 = sign(V1-1))]

A

Assigning new column names in by. Same as previous card, but with a new name for the variable we are grouping by.

V1.01 V4.Sum

1: 0 36
2: 1 42

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

DT[1:5, .(V4.Sum = sum(V4)), by=V1]

A

Grouping only on a subset by specifying i.Calculates the sum of V4, for every group in V1, after subsetting on the first five rows.
V1 V4.Sum
1: 1 2
2: 9 6

17
Q

DT[,.N,by=V1]

A

Using .N to get the total number of observations of each
group. Count the number of rows for every group in V1.

V1     N

1: 1 6
2: 2 6

18
Q

DT[, V1 := round(exp(V1),2)]

A

Adding/updating a column by reference using := in one line. Watch out: extra assignment (DT

19
Q

DT[, c(“V1”,”V2”) := list (round(exp(V1),2), LETTERS [4:6])]

A

Adding/updating several columns by reference using :=. Column V1 and V2 are updated by what is after :=.

Returns the result invisibly. Column V1 changed as above. Column V2 went from: [1] “A” “B” “C” “A” “B” “C” …to:[1] “D” “E” “F” “D” “E” “F” …

20
Q

DT[, ‘:=’ (V1 = round(exp(V1),2),

V2 = LETTERS[4:6])] [ ]

A

Using functional :=. Another way to write the same line as previous card, but easier to write comments side-by-side. Also, when [ ] is added the result is printed to the screen.
Same changes as line above this one, but the result is printed to the screen because of the [ ] at the end of the statement.

21
Q

DT[, V1 := NULL]

A

Remove a column instantly using :=. Removes column V1.

Returns the result invisibly. Column V1 became NULL.

22
Q

DT[, c(“V1”,”V2”) := NULL]

A

Remove several columns instantly using :=.
Removes columns V1 and V2.

Returns the result invisibly. Column V1 and V2 became NULL.

23
Q

2 DT[, Cols.chosen := NULL]

Cols.chosen = c(“A”,”B”)

A

2 Watch out: this deletes the column with column name Cols.chosen.

Wrap the name of a variable which contains column names in parenthesis to pass the contents of that variable to be deleted.

24
Q

setkey(DT,V2)

A
Use setkey() to set a key on a DT. The data is sorted on the column we specified by reference.
A key is set on column V2.
25
Q

DT[“A”]

A

Use keys like supercharged rownames to select rows. Returns all the rows where the key column (set to column V2 in the line above) has the value A.

V1     V2     V3         V4 

1: 1 A -1.1727 1
2: 2 A 0.6651 4
3: 1 A -1.0604 7
4: 2 A -0.3825 10

26
Q

DT[c(“A”,”C”)]

A

Returns all the rows where the key column (V2) has the value A or C.

    V1     V2     V3         V4 
1:   1       A     -1.1727      1 
2:  2      A      0.6651    4 
                     ...
7:   1      C     -1.1727      9 
8:   2     C      0.6651   12
27
Q

DT[“A”, mult =”first”]

DT[“A”, mult = “last”]

A

The mult argument is used to control which row that i matches to is returned, default is all.

Returns first row of all rows that match the value A in the key column (V2).

Returns last row of all rows that match the value A in the key column (V2).

V1     V2     V3         V4  1:   1       A     -1.1727      1 

V1     V2     V3          V4  1:   2       A     -0.3825   10