Five Key dplyr Verbs Flashcards
filter()
selects rows based on logical conditions.
Filter “Badu, Erkyah” and “Nas” in Billboard
billboard[1:5] |>
  filter(artist == "Badu, Erkyah" | artist == "Nas")
select()
selects columns based on their names.
Subset “artist” and “track” columns
billboard |>
  select(artist, track) |>
  head()
mutate()
generates new columns by applying functions to existing columns.
Compute weeks on Billboard chart
billboard <- billboard |>
  rowwise() |>
  # count the non-missing wk* columns in each row
  mutate(n_notNA = sum(!is.na(c_across(contains("wk"))))) |>
  ungroup()
hist(billboard$n_notNA)
summarize()
collapses rows into summary statistics; combine with group_by() for per-group summaries.
Median weeks on Billboard chart
billboard |> summarize(median_wks = median(n_notNA))
Artists with multiple Billboard songs
billboard[1:3] |>
  group_by(artist) |>
  filter(n() > 1) |>
  count(sort = TRUE)
arrange()
sorts rows by column values. Use desc() for descending order.
Sort by first entry date
billboard[1:3] |> arrange(date.entered)
Longest staying artist and song
# column 80 is n_notNA, the column added in the mutate() card
billboard[c(1, 2, 80)] |> arrange(desc(n_notNA))
billboard[c(1, 2, 80)] |> filter(n_notNA == max(n_notNA))
pivot_longer()
Converts data from wide to long format (tidyr) by combining multiple columns into two: one holding the former column names and one holding their values.
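Reshape the weekly rank columns into long format
A minimal sketch, assuming the billboard data that ships with tidyr. The output column names "week" and "rank" are choices made for this illustration, not names from the cards above.

```r
library(tidyr)  # pivot_longer() and the billboard dataset both come from tidyr

billboard_long <- billboard |>
  pivot_longer(
    cols = starts_with("wk"),  # the wide week columns wk1, wk2, ...
    names_to = "week",         # illustrative name: holds the old column names
    values_to = "rank",        # illustrative name: holds the cell values
    values_drop_na = TRUE      # drop weeks in which the song was not charted
  )
head(billboard_long)
```

Each row of the result is one song in one week, so a song that charted for 20 weeks contributes 20 rows.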
pivot_wider()
Converts data from long to wide format (tidyr) by spreading variable-value pairs into separate columns.
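Spread week-rank pairs back into columns
A small sketch on made-up data: the tracks and ranks below are hypothetical values chosen only to show the reshaping.

```r
library(tidyr)

# Hypothetical long-format data: one row per track per week
long <- data.frame(
  track = c("A", "A", "B", "B"),
  week  = c("wk1", "wk2", "wk1", "wk2"),
  rank  = c(10, 8, 50, 45)
)

wide <- long |>
  pivot_wider(
    names_from = week,   # values of `week` become the new column names
    values_from = rank   # values of `rank` fill those columns
  )
wide
```

The result has one row per track and one column per week, which is the inverse of what pivot_longer() does.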
|> or %>%
Pipe operator: passes the result of one step as the first argument of the next, so data manipulation reads as a step-by-step flow.
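Nested call versus piped call
A short sketch, assuming R 4.1 or later for the native |> pipe, showing that a piped chain is another way to write nested function calls. The variable names are illustrative only.

```r
library(dplyr)
library(tidyr)  # loaded only for the billboard dataset

# Nested form: read from the inside out
nested <- head(select(billboard, artist, track), 3)

# Piped forms: read top to bottom, one step per line
piped_native <- billboard |>
  select(artist, track) |>
  head(3)

piped_magrittr <- billboard %>%   # %>% is re-exported by dplyr from magrittr
  select(artist, track) %>%
  head(3)

identical(nested, piped_native)
identical(nested, piped_magrittr)
```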
group_by()
Creates a grouped copy of a table, grouped by the given columns, so that later verbs operate within each group.
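Group by artist, then summarize per group
A minimal sketch, assuming the billboard data from tidyr. Grouping by itself changes no values; it only marks the groups that a later verb such as summarize() will use. The name n_songs is illustrative.

```r
library(dplyr)
library(tidyr)  # loaded only for the billboard dataset

by_artist <- billboard |> group_by(artist)

is_grouped_df(by_artist)  # the data is unchanged but now carries grouping
group_vars(by_artist)     # names of the grouping columns

# summarize() now returns one row per artist instead of one row overall
songs_per_artist <- by_artist |>
  summarize(n_songs = n())
head(songs_per_artist)
```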
count()
Counts the number of rows in each group defined by the given variables.
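Count songs per artist
A brief sketch, assuming the billboard data from tidyr, that compares count() with the longer group_by() plus summarize() form it abbreviates.

```r
library(dplyr)
library(tidyr)  # loaded only for the billboard dataset

# Short form: one row per artist, with the count in column n
counted <- billboard |>
  count(artist, sort = TRUE)

# Longer equivalent built from group_by(), summarize(), and arrange()
manual <- billboard |>
  group_by(artist) |>
  summarize(n = n()) |>
  arrange(desc(n))

head(counted)
```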
distinct()
Removes duplicate rows, keeping only the unique combinations of values.
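Keep unique rows or unique values of one column
A small sketch on made-up data: the track labels are hypothetical and exist only to show which rows are dropped.

```r
library(dplyr)

# Hypothetical data with one fully duplicated row
toy <- data.frame(
  artist = c("Nas", "Nas", "Nas"),
  track  = c("A", "A", "B")
)

unique_rows    <- toy |> distinct()        # drops rows that repeat in every column
unique_artists <- toy |> distinct(artist)  # unique values of one column only

# .keep_all = TRUE keeps the other columns from the first matching row
first_per_artist <- toy |> distinct(artist, .keep_all = TRUE)
```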