R Flashcards
argument
(r) information that a function needs in order to run
variable
representation of a value in R that can be stored for use later during programming (can also be called OBJECT)
vector
a group of data elements of the same type stored in a sequence in R
Pipe
a tool in R for expressing a sequence of multiple operations, represented with “%>%”; takes the output of one statement and makes it the input of the next statement
The 4 primary types of atomic vectors
logical (TRUE, FALSE), character (words), integer (1L, 2L, 3L), double (2.5, 4.561)
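One way to check these types is with base R's `typeof()`. The sketch below is only an illustration; the variable names are made up for this example.

```r
# One vector of each type, built with c()
flags  <- c(TRUE, FALSE)      # logical
words  <- c("cut", "carat")   # character
counts <- c(1L, 2L, 3L)       # integer (the L suffix marks an integer)
sizes  <- c(2.5, 4.561)       # double

# typeof() reports the underlying type of each vector
types <- c(typeof(flags), typeof(words), typeof(counts), typeof(sizes))
print(types)
```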
create a data frame
data.frame(x = c(1, 2, 3), y = c(1.4, 5.4, 10.4))
create a new folder
dir.create("destination_folder")
create a file
file.create("new_word_file.docx")
copy a file
file.copy("new_text_file.txt", "destination_folder")
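The three file cards above can be tried together. This is a minimal sketch that works inside a temporary directory (an assumption made here so that no real folders are touched); the `base`, `dest`, and `src` names are hypothetical.

```r
# Work inside a temporary directory so nothing real is touched
base <- tempfile("flashcards_")
dir.create(base)

dest <- file.path(base, "destination_folder")
src  <- file.path(base, "new_text_file.txt")

dir.create(dest)                 # create a new folder
file.create(src)                 # create an empty file
copied <- file.copy(src, dest)   # copy the file into the folder

# TRUE if the copy now exists inside the destination folder
print(file.exists(file.path(dest, "new_text_file.txt")))
```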
OR operator
| or ||
NOT operator
!
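In R the OR operator is written `|` (element-wise) or `||` (single values), and NOT is written `!`. A small sketch with made-up vectors:

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(FALSE, FALSE, TRUE)

either  <- x | y           # element-wise OR across both vectors
negated <- !x              # element-wise NOT
single  <- TRUE || FALSE   # || is meant for single TRUE/FALSE values

print(either)
print(negated)
print(single)
```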
common function to preview data (1st 6 rows)
head()
these functions return a summary: a high-level view of each column in your data, arranged horizontally
str() and glimpse()
function for returning a list of column names from dataset
colnames()
renaming a column
rename(diamonds, carat_new = carat, cut_new = cut)
summarizing your data
summarize(diamonds, mean_carat = mean(carat))
separates plots by a characteristic
+ facet_wrap(~cut)
code for a scatter plot of the diamonds dataset with carat on the x axis, price on the y axis, dots colored by cut, and a separate plot for each cut
ggplot(data = diamonds, aes(x = carat, y = price, color = cut)) +
geom_point() +
facet_wrap(~cut)
packages (R)
units of reproducible R code
vignette
documentation that acts as a guide to an R package
browseVignettes()
filter by vitamin c dose 0.5
filtered_tg
sort by tooth length (after a filter)
arrange(filtered_tg, len)
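The filter and sort cards above can be sketched end to end. The dataset is an assumption: the cards mention a vitamin C dose and a `len` column, which matches the built-in `ToothGrowth` data, so that is what is used here. The `sorted_tg` and `piped_tg` names are hypothetical.

```r
library(dplyr)

# ToothGrowth (assumed dataset) ships with R in the datasets package
filtered_tg <- filter(ToothGrowth, dose == 0.5)   # keep rows with dose 0.5
sorted_tg   <- arrange(filtered_tg, len)          # sort by tooth length

# The same two steps written as one pipeline
piped_tg <- ToothGrowth %>%
  filter(dose == 0.5) %>%
  arrange(len)

head(sorted_tg)
```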
Pipe operator shortcut (RStudio)
Ctrl + Shift + M (Cmd + Shift + M on Mac)
convert a date-time to a date
as_date() (in the lubridate package)
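A short sketch of that conversion, assuming the lubridate package is installed. The timestamp is made up for illustration.

```r
library(lubridate)

# Parse a date-time, then drop the time part
stamp <- ymd_hms("2021-01-20 16:00:00")   # date-time (UTC by default)
day   <- as_date(stamp)                   # date only

print(day)
```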
data frame
collection of columns
tibbles
streamlined data frames in the tidyverse; they never change the data types of your inputs (for example, number to string)
how to add a column to a dataframe
mutate(dataframe, column_new = column*100)
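As a concrete instance of the `mutate()` pattern above, here is a minimal sketch on a made-up data frame. The `prices`, `fraction`, and `percent` names are hypothetical.

```r
library(dplyr)

# Small illustrative data frame
prices <- data.frame(item = c("a", "b"), fraction = c(0.25, 0.5))

# mutate() returns a copy with the new column added; prices itself is unchanged
with_pct <- mutate(prices, percent = fraction * 100)

print(with_pct)
```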
install tidyverse
install.packages("tidyverse")
after you’re done installing tidyverse, what is the next step?
load it: library(tidyverse)
Tibbles
Print only the first 10 rows of a dataset by default.
Never change the names of your variables
or the data types of your inputs.
Part of the tidyverse.
how to read a csv file
read_csv()
import “hotel_bookings.csv” into R and save it as a data frame titled ‘bookings_df’
bookings_df <- read_csv("hotel_bookings.csv")
if you want to create another (smaller) data frame from the existing data frame (for example, with the "adr" and "adults" columns of the bookings_df data frame)
new_df <- select(bookings_df, adr, adults)
add a column to the dataframe: total = adr/adults
mutate(new_df, total = adr / adults)
skimr package
makes summarizing data really easy, lets you skim through it more quickly
janitor package
has functions for cleaning data
functions to get summaries of our dataframes
skim_without_charts(), glimpse(), head(), str(), select()
packages that simplify data cleaning tasks
skimr and janitor
select()
specifies certain columns or excludes columns
if you want all the columns in the penguins dataset EXCEPT the species column
penguins %>%
select(-species)
rename a column (in penguins dataset)
penguins %>%
rename(island_new = island)
make all columns uppercase (or lowercase)
rename_with(penguins, toupper) (or tolower)
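The `select()`, `rename()`, and `rename_with()` cards can be tried without the penguins data. The sketch below uses a tiny stand-in data frame (`birds`, a hypothetical name) with the same column names the cards mention.

```r
library(dplyr)

# Stand-in for the penguins dataset, for illustration only
birds <- data.frame(
  species = c("Adelie", "Gentoo"),
  island  = c("Torgersen", "Biscoe"),
  year    = c(2007, 2008)
)

no_species <- birds %>% select(-species)              # all columns except species
renamed    <- birds %>% rename(island_new = island)   # new_name = old_name
upper      <- rename_with(birds, toupper)             # apply toupper to every column name

names(upper)
```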
clean_names()
ensures column names contain only characters, numbers, and underscores
%%
returns remainder after division
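A quick arithmetic sketch of `%%`, shown next to base R's integer-division operator `%/%` for contrast. The variable names are made up.

```r
remainder <- 7 %% 3          # 7 = 2 * 3 + 1, so the remainder is 1
quotient  <- 7 %/% 3         # integer division gives 2
is_even   <- 10 %% 2 == 0    # a common use: test whether a number is even

print(remainder)
print(quotient)
print(is_even)
```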