Intro to R Flashcards
Install packages
install.packages("packagename")
Load packages
library(packagename)
Set working directory
setwd(location)
Load data from computer
read.csv("filename.csv")
Load a built-in dataset
data("datasetname")
Select specific value
datafile[row, column]
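A minimal sketch of picking out a single value in base R with square brackets, using the built-in mtcars dataset as a stand-in for your own data. The variable names are illustrative only.

```r
# mtcars ships with R, so no file needs to be loaded
first_value <- mtcars[1, 2]      # row 1, column 2 (the "cyl" column)
mpg_value <- mtcars[1, "mpg"]    # same idea, but the column is picked by name
mpg_column <- mtcars$mpg         # $ returns the whole column as a vector

print(first_value)
print(mpg_value)
```

Leaving the row or the column blank, as in mtcars[1, ] or mtcars[, 2], returns the whole row or column.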
Summary statistics
summary(datafile)
See the first or last n rows
head(datafile, n)
tail(datafile, n)
Omit NA values
na.omit(datafile)
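As a rough illustration of the three cards above, the sketch below uses the built-in airquality dataset, which contains some NA values. The variable names are made up for the example.

```r
# airquality ships with R and has missing values in some columns
first_rows <- head(airquality, 3)      # first 3 rows
last_rows <- tail(airquality, 3)       # last 3 rows
complete_rows <- na.omit(airquality)   # drops every row that has at least one NA

print(first_rows)
print(nrow(airquality))
print(nrow(complete_rows))
```

Note that na.omit() removes the whole row when any column is NA, so it can shrink the data more than expected.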
Remove columns
dplyr::select(dataset, -contains("colname"))
Filter values
filter(datafile, col <= value)
Histogram
hist(dataset$col)
Create new column/mutate using a formula
dataset %>% mutate(newcolname = formula)
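The remove, filter, and mutate cards can be chained into one pipeline. This is only a sketch: it assumes the dplyr package is already installed, and the names cars_small, wt_kg, and LBS_TO_KG are invented for the example.

```r
library(dplyr)

# mtcars stores weight (wt) in thousands of pounds
LBS_TO_KG <- 0.4536

cars_small <- mtcars %>%
  dplyr::select(-contains("gear")) %>%      # drop columns whose name contains "gear"
  filter(mpg <= 20) %>%                     # keep rows with mpg of 20 or less
  mutate(wt_kg = wt * 1000 * LBS_TO_KG)     # new column computed from an existing one

print(colnames(cars_small))
```

contains() matches any column whose name includes the text, so a short pattern can drop more columns than intended.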
Determine if columns are identical
identical(dataset$col1, dataset$col2)
Save data as CSV
write.csv(datafile, "newname.csv")
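write.csv() needs both the data to save and a file path. The sketch below writes the built-in iris dataset to a temporary folder and reads it back; the file name iris_copy.csv is just an example.

```r
# Build a path inside R's temporary directory so nothing is left in your project
out_path <- file.path(tempdir(), "iris_copy.csv")

# row.names = FALSE stops R from writing an extra column of row numbers
write.csv(iris, out_path, row.names = FALSE)

# Read the file back to confirm it was saved
iris_copy <- read.csv(out_path)
print(dim(iris_copy))
```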
returns the number of columns in the dataset
ncol(datafile)
returns the number of rows in the dataset
nrow(datafile)
returns the names of the columns
colnames(datafile)
provides information about the data types of the columns
str(datafile)
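A quick look at the four inspection functions above, run on the built-in iris dataset as an example.

```r
n_columns <- ncol(iris)       # 5
n_rows <- nrow(iris)          # 150
column_names <- colnames(iris)

print(column_names)
str(iris)                     # prints each column with its type and first few values
```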
Replicability
the ability to recreate your results using a different dataset
Reproducibility
the ability to recreate your results using the same dataset
IDE
Integrated development environment (e.g., RStudio)
Packages
extensions of the R language that allow you to run additional functions
R comes with built-in functions, but if you need functions beyond that built-in set, you can download packages
Clean code
making your code easy to understand and consistent, and therefore reusable
7 tenets of clean code
Meaningful Variable and Function Names: each name of the variable or function should convey a purpose, do not make it vague
Modularization: If the script is too long, break it up so it's easier to understand
Consistent Formatting: Choose a style and stick to it
Comments and Documentation: explain what you did and why
Avoid Magic Numbers and Hard Coding: Don’t use unexplained numeric values, assign them to a named constant or variable
Avoid Duplicate Code: put repeated logic in a function and reuse it instead of copying and pasting
The Single Responsibility Principle (SRP): Each script should have one clear purpose or responsibility
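Several of the tenets above can be seen in one small sketch: a meaningful function name, named constants instead of magic numbers, a comment explaining why, and a reusable function instead of repeated code. All names here (FREEZING_POINT_F, F_TO_C_RATIO, fahrenheit_to_celsius, air) are invented for illustration.

```r
# Named constants instead of unexplained numbers scattered through the code
FREEZING_POINT_F <- 32
F_TO_C_RATIO <- 5 / 9

# One function, one job: convert Fahrenheit to Celsius
fahrenheit_to_celsius <- function(temp_f) {
  (temp_f - FREEZING_POINT_F) * F_TO_C_RATIO
}

# airquality records Temp in Fahrenheit; work on a copy so the original is untouched
air <- airquality
air$temp_c <- fahrenheit_to_celsius(air$Temp)

print(head(air[, c("Temp", "temp_c")], 3))
```

Because the conversion lives in one function, a change to the formula only has to be made in one place.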