Tidyverse Flashcards
Which Tidyverse package provides ways to ingest rectangular data? Which functions does it offer for this, and what class of object do they return?
The package is readr. Its reading functions are listed below; they all return a tibble.
read_delim(): general delimited files
read_csv(): comma separated (CSV) files
read_tsv(): tab separated files
read_fwf(): fixed width files
read_table(): tabular files where columns are separated by white-space.
read_log(): web log files
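A minimal sketch of the card above, assuming readr 2.0 or later, where a string wrapped in I() is treated as literal data rather than a file path. The sample data is made up for illustration.

```r
library(readr)

# Hypothetical inline CSV data; I() marks it as literal text, not a path
csv_text <- I("name,age\nLucas,12\nJose,45\nTales,42")
people <- read_csv(csv_text, show_col_types = FALSE)

# The same kind of data with a semicolon delimiter, read with read_delim()
people2 <- read_delim(I("name;age\nLucas;12\nJose;45"),
                      delim = ";", show_col_types = FALSE)

# Both results are tibbles
inherits(people, "tbl_df")
inherits(people2, "tbl_df")
```

With real files, the first argument would simply be the file path.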
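What are the 8 core packages in the Tidyverse and what is the purpose of each? (Since tidyverse 2.0.0, lubridate, for dates and times, is also loaded as a ninth core package.)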
- ggplot2 - creates graphics
- dplyr - data manipulation
- tidyr - helps create tidy data
- readr - reads rectangular data
- purrr - enhances R's functional programming toolkit for working with functions and vectors
- tibble - a modern, enhanced data frame
- stringr - functions for string manipulation
- forcats - tools for working with factors
How can I create a simple tibble like this one?
# A tibble: 3 x 2
  name    age
1 Lucas    12
2 Jose     45
3 Tales    42
As with data frames, call the function tibble() and provide one vector per column:
df1 <- tibble(name = c("Lucas", "Jose", "Tales"),
              age = c(12, 45, 42)
)
What is a tibble? What’s the class and type of a tibble object?
A tibble is an enhanced data frame that makes it easier to work with tidy, consistent data. It is the central data structure of the Tidyverse.
A tibble has type list and class tbl_df, which is a subclass of data.frame with different default behavior (for example, subsetting with [ always returns another tibble).
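The type and class can be checked directly. This is a small sketch that rebuilds the df1 tibble from the earlier card; the comparison with a plain data frame is added here only to illustrate one difference in default behavior.

```r
library(tibble)

df1 <- tibble(name = c("Lucas", "Jose", "Tales"),
              age = c(12, 45, 42))

typeof(df1)         # "list"
class(df1)          # "tbl_df" "tbl" "data.frame"
is.data.frame(df1)  # TRUE

# Single-column subsetting with [ keeps a tibble, while a plain
# data frame drops to a bare vector by default
tbl_col <- df1[, "age"]
df_col <- as.data.frame(df1)[, "age"]
is_tibble(tbl_col)  # TRUE
is.numeric(df_col)  # TRUE
```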
How can I convert a data frame into a tibble? What’s the package being used?
Use the function as_tibble() from the tibble package:
as_tibble(mtcars)
How can I sort tibble df1 by column age? Which package is being used?
Use the function arrange() from the dplyr package:
df1 %>%
  arrange(age)
How can I sort tibble df1 by column age in descending order? Which package is being used?
Use the functions arrange() and desc() from the dplyr package:
df1 %>%
  arrange(desc(age))
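The two sorting cards can be tried end to end with the small df1 tibble from earlier. This sketch only rebuilds that tibble and stores each result so the row order can be inspected.

```r
library(tibble)
library(dplyr)

df1 <- tibble(name = c("Lucas", "Jose", "Tales"),
              age = c(12, 45, 42))

# Ascending order by age
ascending <- df1 %>%
  arrange(age)
ascending$name   # "Lucas" "Tales" "Jose"

# Descending order by age
descending <- df1 %>%
  arrange(desc(age))
descending$name  # "Jose" "Tales" "Lucas"
```

Note that arrange() returns a new tibble; df1 itself is left unchanged.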
How can I select only the columns mpg, cyl and gear of tibble cars? What’s the package involved?
Use the function select() from the dplyr package:
cars %>%
  select(c("mpg", "cyl", "gear")) # with a character vector
or
cars %>%
  select(mpg, cyl, gear) # with bare column names, no vector or quotes
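A runnable sketch of both forms. It assumes that cars was created from the built-in mtcars data set, which the card does not state; this masks R's own built-in cars data set, which has different columns.

```r
library(tibble)
library(dplyr)

# Assumption: cars is mtcars converted to a tibble
cars <- as_tibble(mtcars)

by_string <- cars %>%
  select(c("mpg", "cyl", "gear"))

by_name <- cars %>%
  select(mpg, cyl, gear)

# Both forms pick the same three columns, in the order given
names(by_string)  # "mpg" "cyl" "gear"
names(by_name)    # "mpg" "cyl" "gear"
```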
How can I select rows from 2 to 5 in my tibble df1? What’s the required package?
Use the function slice() from the dplyr package:
df1 %>%
  slice(2:5)
Is it recommended to work with row names in tibbles? Why?
Generally, it is best to avoid row names, because they are basically a character column with different semantics than every other column. A tibble can carry row names (for example, after conversion from a regular data frame), but they are removed when it is subset with the [ operator.
If needed, the tibble package provides functions to convert row names to an explicit column and back: rownames_to_column() and column_to_rownames().
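A short sketch of that round trip, using the built-in mtcars data set, which stores car models as row names. The column name "model" is chosen here for illustration.

```r
library(tibble)

has_rownames(mtcars)  # TRUE

# Move the row names into an explicit column, then convert to a tibble
cars_tbl <- as_tibble(rownames_to_column(mtcars, var = "model"))
has_rownames(cars_tbl)  # FALSE

# Convert the explicit column back into row names (returns a data frame)
back <- column_to_rownames(cars_tbl, var = "model")
has_rownames(back)  # TRUE
```

Keeping the model names in a regular column means they survive subsetting, sorting and joins like any other variable.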