Importing data into R Flashcards
function to load in CSV files
read.csv("data.csv", stringsAsFactors = FALSE)
The file must be in your working directory, or you must give its path. stringsAsFactors controls whether strings are converted to factors; the default was TRUE before R 4.0.0 and is FALSE from R 4.0.0 onward
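A minimal sketch of the stringsAsFactors behaviour described on this card. The temporary file and its contents are made up for illustration; the argument is set explicitly in both calls so the result does not depend on the R version.

```r
# Write a small CSV to a temporary file so the example is self-contained
# (file name and contents are illustrative only).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,mpg", "civic,35", "tahoe,18"), csv_path)

# Strings stay character vectors.
as_chr <- read.csv(csv_path, stringsAsFactors = FALSE)
class(as_chr$name)

# Strings are converted to factors.
as_fct <- read.csv(csv_path, stringsAsFactors = TRUE)
class(as_fct$name)
```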
List the files in your working directory
dir()
import tab delimited data
read.delim(x, sep = "\t", header = TRUE) ("\t" is the tab character; both values shown are the defaults)
import any tabular data
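read.table(x, sep = "", header = FALSE, stringsAsFactors = FALSE, col.names = c("col1", "col2")) (sep = "" means any whitespace separates the fields; col.names takes a character vector of column names)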
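As a rough illustration of how read.delim and read.table relate, the sketch below reads the same tab-delimited file both ways. The file and its contents are hypothetical and created on the fly.

```r
# Create a small tab-delimited file (illustrative data only).
tsv_path <- tempfile(fileext = ".txt")
writeLines(c("type\tweight", "russet\t210", "fingerling\t45"), tsv_path)

# read.delim already defaults to sep = "\t" and header = TRUE.
potatoes <- read.delim(tsv_path)

# read.table defaults to sep = "" and header = FALSE,
# so both must be spelled out to get the same result.
potatoes2 <- read.table(tsv_path, sep = "\t", header = TRUE)

str(potatoes)
str(potatoes2)
```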
which.min
returns the index of the smallest value in a vector
ex: cars[which.min(cars$MPG), ] returns the row with the lowest MPG in the cars data frame
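A small sketch of the which.min card, using a made-up cars data frame (the column names and values are assumptions for illustration). If several rows tie for the minimum, which.min returns the first one.

```r
# Hypothetical data frame standing in for "cars".
cars <- data.frame(
  model = c("civic", "tahoe", "prius"),
  MPG   = c(35, 18, 52)
)

# which.min gives the position of the smallest value, not the value itself.
idx <- which.min(cars$MPG)

# Use that position to pull out the whole row.
lowest <- cars[idx, ]
lowest
```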
colClasses
an argument to read.table and its wrappers (read.csv, read.delim). Use it to specify the class of each column you are importing
ex: read.delim(x, sep = "\t", colClasses = c("character", "logical", "numeric"))
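One way to see why colClasses matters: an ID column with leading zeros is read as a number unless it is declared as character. The file below is invented for the example.

```r
# Tab-delimited file with an ID column that has leading zeros (illustrative).
path <- tempfile(fileext = ".txt")
writeLines(c("id\tactive\tscore", "007\tTRUE\t9.5", "042\tFALSE\t7"), path)

# One class per column, in file order.
typed <- read.delim(path, colClasses = c("character", "logical", "numeric"))

sapply(typed, class)
typed$id  # leading zeros are kept because the column is character
```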
Hadley’s data import package
readr
read_csv()
readr version of read.csv
read_csv("mydata")
loads data as a “tibble”
read.delim for readr
read_tsv() (tab-separated values), ex: read_tsv("potatoes.txt", col_names = c("type", "weight"))
col_types
argument to specify the variable classes in readr package
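A hedged sketch of col_types, assuming the readr package is installed. The file is created on the fly and its contents are made up. readr accepts either a compact string (one letter per column, here "c" for character and "d" for double) or a cols() specification.

```r
library(readr)

# Small tab-separated file without a header row (illustrative data).
path <- tempfile(fileext = ".txt")
writeLines(c("russet\t210", "fingerling\t45"), path)

# Compact form: one letter per column.
potatoes <- read_tsv(path, col_names = c("type", "weight"), col_types = "cd")

# Equivalent explicit form using cols().
potatoes2 <- read_tsv(
  path,
  col_names = c("type", "weight"),
  col_types = cols(type = col_character(), weight = col_double())
)
```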
read_delim
the main import function in the readr package. Similar to read.table
Must specify the file and delim arguments
ex: read_delim("cars.txt", delim = "\t", col_names = c("automaker", "mpg"))
skip
argument that skips lines at the start of the file; available in read.table and in the readr import functions.
ex: skip = 5 will skip the first 5 rows and then begin reading in data
n_max
specifies the number of rows you want to read in, often used with skip
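ex: read_delim("cars.txt", delim = "\t", skip = 2, n_max = 3, col_names = c("automaker", "mpg"))
skips the first two lines and reads in lines 3, 4, and 5. With the default col_names = TRUE, line 3 would instead be used as the header and lines 4, 5, and 6 read as data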
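To make the interaction of skip and n_max concrete, here is a sketch assuming readr is installed. The file contents are invented; col_names is supplied so that the first line read after skipping is treated as data rather than as a header.

```r
library(readr)

# Six-line tab-delimited file: one header line plus five data lines (illustrative).
path <- tempfile(fileext = ".txt")
writeLines(
  c("automaker\tmpg", "honda\t35", "chevy\t18", "toyota\t52", "ford\t24", "kia\t31"),
  path
)

# Skip lines 1 and 2, then read at most three lines (lines 3, 4, and 5).
subset_rows <- read_delim(
  path,
  delim = "\t",
  skip = 2,
  n_max = 3,
  col_names = c("automaker", "mpg")
)
subset_rows
```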
readxl
Hadley Wickham's Excel data import package
function to list the sheets in an Excel workbook (readxl package)
excel_sheets()
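A short sketch of excel_sheets(), assuming readxl is installed. It relies on the example workbook that ships with the package (located via readxl_example()), so no file of your own is needed; the sheet names are whatever that bundled file contains.

```r
library(readxl)

# Path to an example workbook bundled with readxl.
xlsx_path <- readxl_example("datasets.xlsx")

# Character vector of sheet names in the workbook.
sheets <- excel_sheets(xlsx_path)
sheets

# A sheet name can then be passed to read_excel().
first_sheet <- read_excel(xlsx_path, sheet = sheets[1])
```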