Reading/Writing Data Flashcards
What are two functions used to read tabular data?
read.table() and read.csv()
readLines
reads lines of a text file into an object
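Example (a minimal sketch; the temporary file and its contents are made up so the snippet is self-contained):

```r
# Write a small temporary file to read back.
path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), path)

# readLines() returns a character vector with one element per line.
lines <- readLines(path)

# The n argument limits how many lines are read.
first_two <- readLines(path, n = 2)
```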
source
reads R code files
inverse of dump
dget
reads a single R object stored as R code (deparsed text)
inverse of dput
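Example (a rough sketch of both round trips; the objects df, x, and y and the temporary files are made up for illustration):

```r
df <- data.frame(id = 1:3, label = c("a", "b", "c"))

# dput() writes R code that reconstructs one object; dget() evaluates it.
dput_path <- tempfile(fileext = ".R")
dput(df, file = dput_path)
df_copy <- dget(dput_path)

# dump() takes object *names* and can write several objects to one file;
# source() runs that file and recreates them under their original names.
x <- c(1.5, 2.5)
y <- "hello"
dump_path <- tempfile(fileext = ".R")
dump(c("x", "y"), file = dump_path)
rm(x, y)
source(dump_path, local = TRUE)  # local = TRUE: evaluate in the calling environment
```

Note the difference in how the result comes back: dget() returns the object, so you choose the name, while source() restores the names that were dumped.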
load
reading in saved workspaces
unserialize
reading in single R objects in binary form
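Example (a minimal sketch, assuming the writing counterparts save() and serialize(); the scores object is made up):

```r
scores <- c(alice = 90, bob = 85)

# save() stores named objects in a binary file; load() restores them
# under their original names.
rdata_path <- tempfile(fileext = ".RData")
save(scores, file = rdata_path)
rm(scores)
load(rdata_path)

# serialize() with connection = NULL turns a single object into a raw vector;
# unserialize() converts it back.
raw_bytes <- serialize(scores, connection = NULL)
restored <- unserialize(raw_bytes)
```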
read.table, file arg
name of the file or connection to be read
read.table, header arg
logical argument, indicates whether the first line of the file is a header line
read.table, sep arg
string indicating the column delimiter
read.table, colClasses arg
character vector indicating the class of each column of the dataset
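Example (a short sketch using all four arguments together; the comma-separated file and its column names are made up):

```r
# Create a small comma-separated file with a header line.
path <- tempfile(fileext = ".txt")
writeLines(c("id,name,score",
             "1,ann,3.5",
             "2,bob,4.0"), path)

dat <- read.table(
  file = path,                                   # file or connection to read
  header = TRUE,                                 # first line holds column names
  sep = ",",                                     # column delimiter
  colClasses = c("integer", "character", "numeric")  # one class per column
)
str(dat)
```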
What’s the default separator for read.table?
whitespace (sep = "", i.e. one or more spaces, tabs, or newlines)
If there are no comment lines in your file, what’s an easy optimization to make when passing args to read.table?
comment.char = ""
Why is it good to utilize the colClasses argument for large datasets?
Without it, R has to scan the data to work out the class of each column, which is slow for large files
If you supply the classes up front as a character vector, R can skip that step and may load the dataset roughly twice as fast
What’s the quick and dirty way to figure out the classes for your columns and pass that into the colClasses arg?
initial <- read.table("test", nrows = 100)
classes <- sapply(initial, class)
tabAll <- read.table("test", colClasses = classes)
How do you use the sapply() function, and what is its purpose?
sapply applies a function to each element of a list or vector and simplifies the result to a vector or matrix when possible
sapply(X, FUN), e.g. sapply(initial, class)
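Example (a minimal sketch; the values list is made up, and lapply() is shown only for contrast):

```r
values <- list(a = 1:3, b = 4:6, c = 7:9)

# Each call to mean() returns a single number, so sapply() simplifies
# the result to a named numeric vector.
means <- sapply(values, mean)

# lapply() performs the same calls but always returns a list.
means_list <- lapply(values, mean)

# With a vector input and an anonymous function:
squares <- sapply(1:4, function(i) i^2)
```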