Reading/Writing Data Flashcards
What are two functions used to read tabular data?
read.table() and read.csv()
readLines
reads lines of a text file into an object
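A minimal sketch of readLines in use. The temporary file and its contents are made up for illustration.

```r
# Write a small scratch file, then read it back line by line.
path <- tempfile(fileext = ".txt")   # hypothetical scratch file
writeLines(c("first line", "second line", "third line"), path)

all_lines <- readLines(path)         # character vector, one element per line
first_two <- readLines(path, n = 2)  # read at most two lines

print(all_lines)
print(first_two)
unlink(path)                         # clean up the scratch file
```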
source
reads R code files
inverse of dump
dget
reads a single R object back from its deparsed text representation
inverse of dput
load
reads saved workspaces back into the session
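A short sketch of a save/load round trip, assuming save() was used to write the file. The object names a and b are placeholders.

```r
path <- tempfile(fileext = ".RData")  # hypothetical scratch file
a <- 1:5
b <- c(x = "one", y = "two")

save(a, b, file = path)   # write the named objects in binary form
rm(a, b)                  # remove them so the reload is visible

loaded <- load(path)      # restores the objects and returns their names
print(loaded)
print(a)
unlink(path)
```

Note that load() restores objects under their original names, so it can silently overwrite objects of the same name in the current environment.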
unserialize
reads a single R object stored in binary form
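A minimal sketch pairing unserialize with serialize. Passing connection = NULL makes serialize return a raw vector instead of writing to a connection; the example object is made up.

```r
obj <- list(id = 1L, tags = c("a", "b"))   # hypothetical object

raw_bytes <- serialize(obj, connection = NULL)  # binary form as a raw vector
restored <- unserialize(raw_bytes)              # rebuild the object

print(class(raw_bytes))
print(identical(obj, restored))
```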
read.table, file arg
name of the file or connection to be read
read.table, header arg
logical argument, indicates whether the first line of the file is a header line
read.table, sep arg
string indicating the column delimiter
read.table, colClasses arg
character vector indicating the class of each column of the dataset
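The four arguments above can be sketched together in one call. The file contents and column names here are invented for illustration.

```r
# Create a small comma-separated file to read back.
path <- tempfile(fileext = ".csv")   # hypothetical scratch file
writeLines(c("id,name,score", "1,ann,9.5", "2,bob,7.25"), path)

dat <- read.table(
  file = path,                                      # file name or connection
  header = TRUE,                                    # first line holds column names
  sep = ",",                                        # column delimiter
  colClasses = c("integer", "character", "numeric") # one class per column
)

str(dat)
unlink(path)
```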
What’s the default separator for read.table?
whitespace (sep = "", meaning one or more spaces, tabs, or newlines)
If there are no comment lines in your file, what’s an easy optimization to make when passing args to read.table?
comment.char = ""
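A small sketch of what turning off comment handling changes. By default read.table treats "#" as the start of a comment, so this setting is only safe when the file really has no comment lines. The sample text is made up.

```r
raw_text <- "id note\n1 a#b\n2 c#d"   # hypothetical data containing '#'

# Default: everything after '#' on a line is discarded as a comment.
with_default <- read.table(text = raw_text, header = TRUE,
                           stringsAsFactors = FALSE)

# Comment handling off: '#' is kept as ordinary data.
no_comments <- read.table(text = raw_text, header = TRUE,
                          comment.char = "", stringsAsFactors = FALSE)

print(with_default$note)
print(no_comments$note)
```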
Why is it good to utilize the colClasses argument for large datasets?
Without it, R has to scan the data and work out the class of each column, which takes a long time on large files
If you pass in the character vector up front, R can skip that step and load the dataset roughly twice as fast
What’s the quick and dirty way to figure out the classes for your columns and pass that into the colClasses arg?
initial <- read.table("test", nrows = 100)
classes <- sapply(initial, class)
tabAll <- read.table("test", colClasses = classes)
How do you use the sapply() function, and what is its purpose?
sapply applies a function to each element of a list or vector and simplifies the result to a vector or matrix when possible
sapply(X, FUN), e.g. sapply(vector, function)
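A brief sketch of sapply on a vector and on a list. The inputs are made up; the second call mirrors how sapply is used to collect column classes.

```r
# Apply an anonymous function to each element of a vector.
squares <- sapply(1:5, function(x) x^2)

# Apply class() to each element of a list; the result keeps the names.
cols <- list(a = 1:3, b = c("x", "y"), c = c(TRUE, FALSE))  # hypothetical data
col_classes <- sapply(cols, class)

print(squares)
print(col_classes)
```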
How much memory will you need to read in a given file in R?
As a rule of thumb, about twice the memory the dataset itself occupies, because reading it in adds overhead
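The rule of thumb can be turned into a rough calculation. This sketch assumes every column is numeric (8 bytes per value); the row and column counts are hypothetical.

```r
# Hypothetical dimensions, chosen only for illustration.
n_rows <- 1500000
n_cols <- 120
bytes_per_numeric <- 8                 # R stores each double in 8 bytes

raw_bytes <- n_rows * n_cols * bytes_per_numeric
raw_gb <- raw_bytes / 2^30             # about 1.34 GB for the data alone
needed_gb <- 2 * raw_gb                # rule of thumb: about 2.68 GB

cat(sprintf("data: %.2f GB, suggested RAM: %.2f GB\n", raw_gb, needed_gb))
```

Character and factor columns have different footprints, so treat the result as an estimate only.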
What are the two main functions used for writing R data to file?
dump() and dput()
What are two of the benefits (and one detriment) of using dump and dput?
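They preserve metadata such as column classes for data frames, so you don’t have to read the data in from CSV and specify classes again
The output is plain text, so it is human-readable and plays well with UNIX tools and version control
One downside: text is not as space-efficient as a binary file would be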
How do you retrieve an object you’ve written to file using dput?
dget()
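A minimal dput/dget round trip, sketched with a made-up data frame to show that column classes survive the trip.

```r
path <- tempfile(fileext = ".R")   # hypothetical scratch file
df <- data.frame(a = 1:2, b = c("x", "y"), stringsAsFactors = FALSE)

dput(df, file = path)   # write the deparsed object as R code
df2 <- dget(path)       # evaluate that code to rebuild the object

print(readLines(path))  # the text representation on disk
print(sapply(df2, class))
unlink(path)
```

Unlike load(), dget() returns the object, so you choose the name it is assigned to.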
What’s a major distinction between dump and dput?
dput writes a single R object, whereas dump can write multiple R objects (given by name) to one file
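A short sketch of dump with several objects, read back with source. The object names x and y are placeholders, and the code is assumed to run at the top level of a session.

```r
path <- tempfile(fileext = ".R")   # hypothetical scratch file
x <- "foo"
y <- data.frame(a = 1:2, b = c("p", "q"), stringsAsFactors = FALSE)

dump(c("x", "y"), file = path)  # objects are passed by name, as strings
rm(x, y)                        # remove them so the restore is visible

source(path)                    # re-creates x and y under their original names
print(x)
print(y)
unlink(path)
```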
How do you remove objects from a given environment?
rm(object)
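A few common forms of rm, sketched with throwaway object names. The envir argument targets an environment other than the current one.

```r
a <- 1
b <- 2
rm(a)                 # remove one object by bare name
rm(list = "b")        # remove objects named in a character vector

e <- new.env()        # a separate environment for illustration
assign("tmp", 42, envir = e)
rm("tmp", envir = e)  # remove from that environment only

print(exists("a", inherits = FALSE))
print(exists("tmp", envir = e, inherits = FALSE))
# rm(list = ls()) would clear every object in the current environment.
```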
How do you print the first ten lines of a gzipped file called text.gz?
con <- gzfile("text.gz")
x <- readLines(con, 10)
close(con)
x
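A self-contained variant of the same idea: it first writes a gzipped file so the read can be run anywhere. The file path and its contents are made up.

```r
path <- tempfile(fileext = ".gz")     # hypothetical scratch file

con <- gzfile(path, "w")              # open for compressed writing
writeLines(paste("line", 1:25), con)
close(con)

con <- gzfile(path)                   # readLines opens it for reading
x <- readLines(con, n = 10)           # first ten lines only
close(con)                            # release the connection

print(x)
unlink(path)
```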