Five Key dplyr Verbs: Flashcards

Question 1

Q

filter()

Answer

A

keeps the rows for which the logical condition(s) evaluate to TRUE (rows where a condition is NA are dropped).

Keep rows for “Badu, Erkyah” or “Nas” in billboard (first five columns)
library(dplyr)
library(tidyr)  # provides the billboard dataset
billboard[1:5] |>
  filter(artist == "Badu, Erkyah" | artist == "Nas")

Question 2

Q

select()

Answer

A

selects columns based on their names.

Subset “artist” and “track” columns
billboard |>
  select(artist, track) |>
  head()

Question 3

Q

mutate()

Answer

A

creates new columns (or overwrites existing ones) computed from existing columns.

Compute weeks on the Billboard chart (number of non-NA wk columns per song)
billboard <- billboard |>
  rowwise() |>  # evaluate c_across() one song (row) at a time
  mutate(n_notNA = sum(!is.na(c_across(contains("wk"))))) |>
  ungroup()
hist(billboard$n_notNA)

Question 4

Q

summarize()

Answer

A

collapses a data frame into a single row of summary statistics; combined with group_by(), it returns one row per group.

Median weeks on the Billboard chart (uses n_notNA from Question 3)
billboard |> summarize(median_wks = median(n_notNA))

Artists with multiple Billboard songs
billboard[1:3] |>
  group_by(artist) |>
  filter(n() > 1) |>  # keep artists with more than one song (row)
  count(sort = TRUE)  # count() on grouped data counts rows per artist

Question 5

Q

arrange()

Answer

A

sorts rows by the values of one or more columns, ascending by default; wrap a column in desc() to sort it in descending order.

Sort by first entry date
billboard[1:3] |> arrange(date.entered)

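Longest-charting artist and song (column 80 is n_notNA, added in Question 3)
# all songs, longest-charting first
billboard[c(1, 2, 80)] |> arrange(desc(n_notNA))
# only the song(s) with the maximum number of weeks
billboard[c(1, 2, 80)] |> filter(n_notNA == max(n_notNA))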

Question 6

Q

pivot_longer()

Answer

A

Converts data from wide to long format (a tidyr function) by collapsing several columns into two: one holding the former column names (names_to) and one holding their values (values_to).
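A minimal sketch of pivot_longer() on a small billboard-style table, assuming dplyr and tidyr are installed. The chart_wide table, its artists, and its ranks are hypothetical, made up for illustration. The wk columns collapse into a week column and a rank column, and values_drop_na = TRUE drops the weeks a song was off the chart.

```r
library(dplyr)
library(tidyr)

# Hypothetical mini chart in wide format: one column per week (NA = not on chart)
chart_wide <- tibble(
  artist = c("Artist A", "Artist B"),
  track  = c("Song 1", "Song 2"),
  wk1    = c(87, 51),
  wk2    = c(82, NA),
  wk3    = c(72, NA)
)

# Collapse wk1..wk3 into "week" (former column names, "wk" prefix stripped)
# and "rank" (former cell values); drop weeks the song was off the chart
chart_long <- chart_wide |>
  pivot_longer(
    cols = starts_with("wk"),
    names_to = "week",
    names_prefix = "wk",
    names_transform = list(week = as.integer),
    values_to = "rank",
    values_drop_na = TRUE
  )

chart_long
```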

Question 7

Q

pivot_wider()

Answer

A

Converts data from long to wide format (a tidyr function) by spreading name-value pairs into separate columns: names_from supplies the new column names and values_from fills them.
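A minimal sketch of pivot_wider(), assuming dplyr and tidyr are installed; the chart_long table and its values are hypothetical. Each week value becomes a column (prefixed with "wk"), and song-week combinations that do not occur are filled with NA.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format chart data: one row per song-week
chart_long <- tibble(
  artist = c("Artist A", "Artist A", "Artist B"),
  track  = c("Song 1", "Song 1", "Song 2"),
  week   = c(1L, 2L, 1L),
  rank   = c(87, 82, 51)
)

# Spread the week/rank pairs out: one column per week, named wk1, wk2, ...
chart_wide <- chart_long |>
  pivot_wider(
    names_from = week,
    values_from = rank,
    names_prefix = "wk"
  )

chart_wide  # Artist B has no week-2 entry, so its wk2 is NA
```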

Question 8

Q

|> or %>%

Answer

A

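Pipe operator: passes the result on its left as the first argument of the function on its right, so data manipulation reads as a step-by-step flow. |> is the native pipe built into base R (4.1.0 and later); %>% comes from the magrittr package and is re-exported by dplyr.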
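A short comparison of nested calls and piped calls, assuming dplyr is installed; the scores table is hypothetical. All three versions produce the same result, but the piped ones read in the order the steps happen.

```r
library(dplyr)

# Hypothetical scores table
scores <- tibble(
  student = c("a", "b", "c", "d"),
  score   = c(90, 65, 78, 85)
)

# Nested calls read inside-out...
nested <- arrange(filter(scores, score > 70), desc(score))

# ...while the native pipe reads left to right: each result becomes
# the first argument of the next call
piped <- scores |>
  filter(score > 70) |>
  arrange(desc(score))

# magrittr's %>% (re-exported by dplyr) behaves the same way here
piped_magrittr <- scores %>%
  filter(score > 70) %>%
  arrange(desc(score))

identical(nested, piped)
```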

Question 9

Q

group_by()

Answer

A

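Returns a grouped copy of a table, grouped by one or more columns, so that later verbs such as summarize(), mutate(), and filter() operate within each group.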
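A minimal sketch of what group_by() changes, assuming dplyr is installed; the songs table is hypothetical. The rows themselves are untouched; group_by() only attaches grouping information, which summarize() then uses to return one row per artist.

```r
library(dplyr)

# Hypothetical songs table: one row per song
songs <- tibble(
  artist = c("X", "X", "Y", "Z", "Z", "Z"),
  weeks  = c(10, 4, 20, 3, 5, 7)
)

by_artist <- songs |> group_by(artist)

# The data are unchanged; only grouping metadata is added
group_vars(by_artist)  # returns "artist"
n_groups(by_artist)    # returns 3

# Downstream verbs now operate per group
artist_totals <- by_artist |>
  summarize(n_songs = n(), total_weeks = sum(weeks))

artist_totals
```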

Question 10

Q

count()

Answer

A

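Counts the number of rows in each group defined by the given variables; count(df, x) is roughly equivalent to df |> group_by(x) |> summarize(n = n()). Add sort = TRUE to list the largest groups first.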
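A minimal sketch comparing count() with its long-hand group_by() + summarize() form, assuming dplyr is installed; the songs table is hypothetical.

```r
library(dplyr)

# Hypothetical songs table: one row per song
songs <- tibble(
  artist = c("X", "X", "Y", "Z", "Z", "Z")
)

# Rows per artist, largest group first
song_counts <- songs |> count(artist, sort = TRUE)

# Long-hand version of the same count
song_counts_long <- songs |>
  group_by(artist) |>
  summarize(n = n()) |>
  arrange(desc(n))

song_counts
```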

Question 11

Q

distinct()

Answer

A

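Removes duplicate rows, keeping only unique ones. If columns are named, uniqueness is judged on those columns alone, and only they are returned unless .keep_all = TRUE.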
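A minimal sketch of the three common forms of distinct(), assuming dplyr is installed; the entries table is hypothetical and contains one exact duplicate row.

```r
library(dplyr)

# Hypothetical table with one exact duplicate (X / Song 1 appears twice)
entries <- tibble(
  artist = c("X", "X", "Y", "Y"),
  track  = c("Song 1", "Song 1", "Song 2", "Song 3")
)

# Drop exact duplicate rows: 3 unique rows remain
unique_rows <- entries |> distinct()

# Unique values of the named column only; other columns are dropped
unique_artists <- entries |> distinct(artist)

# .keep_all = TRUE keeps all columns, using the first row for each artist
first_per_artist <- entries |> distinct(artist, .keep_all = TRUE)

first_per_artist
```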

Question 12

Q