Reading/Writing Data Flashcards

Question 1

Q

What are two functions used to read tabular data?

Answer

A

read.table() and read.csv()
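A minimal sketch comparing the two on a small CSV. The file contents and the name path are illustrative assumptions. read.csv is read.table with header = TRUE and sep = "," preset.

```r
# Write a small illustrative CSV to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,90", "2,85"), path)

# read.csv presets header = TRUE and sep = ","
df1 <- read.csv(path)

# read.table needs those options spelled out
df2 <- read.table(path, header = TRUE, sep = ",")

identical(df1, df2)  # TRUE
```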

Question 2

Q

readLines

Answer

A

reads the lines of a text file (or connection) into a character vector, one element per line
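A short sketch, assuming an illustrative three-line file written to a temporary path. The n argument limits how many lines are read.

```r
path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), path)

lines <- readLines(path)         # character vector, one element per line
length(lines)                    # 3
lines[2]                         # "second line"

head2 <- readLines(path, n = 2)  # read only the first two lines
```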

Question 3

Q

source

Answer

A

reads and evaluates R code files
inverse of dump
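A round-trip sketch: dump writes R code that recreates the named objects, and source evaluates that file to bring them back. The objects x and y and the temporary path are illustrative.

```r
x <- c(1.5, 2.5)
y <- data.frame(a = 1:2, b = c("u", "v"))
path <- tempfile(fileext = ".R")

# dump takes a character vector of object names and writes R code to the file
dump(c("x", "y"), file = path)

rm(x, y)      # remove the originals
source(path)  # evaluates the file, re-creating x and y
```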

Question 4

Q

dget

Answer

A

reads an R object that was written to a text file by dput and recreates it
inverse of dput
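A sketch of the dput/dget pair on an illustrative named vector; the temporary path is an assumption. Without a file argument, dput prints the text representation to the console.

```r
v <- c(a = 1L, b = 2L)            # illustrative named integer vector
path <- tempfile(fileext = ".R")

dput(v)                # prints c(a = 1L, b = 2L)
dput(v, file = path)   # writes the same text to a file
v2 <- dget(path)       # parses and evaluates the file, rebuilding the object

identical(v, v2)       # TRUE
```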

Question 5

Q

load

Answer

A

reads in saved workspaces (.RData files written by save or save.image)
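A sketch of save and load with two illustrative objects. load restores the objects into the current environment and invisibly returns their names; save.image() does the same saving step for the whole workspace.

```r
a <- 1:3
b <- "hello"
path <- tempfile(fileext = ".RData")

save(a, b, file = path)  # write both objects to a binary .RData file
rm(a, b)

loaded <- load(path)     # restores a and b; returns their names
loaded                   # "a" "b"
```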

Question 6

Q

unserialize

Answer

A

reads in single R objects stored in binary (serialized) form
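A sketch of serialize and unserialize on an illustrative list. Passing connection = NULL makes serialize return a raw vector instead of writing to a connection; saveRDS and readRDS are the usual file-based counterparts for a single object.

```r
obj <- list(n = 42L, tags = c("x", "y"))

raw_bytes <- serialize(obj, connection = NULL)  # raw vector of bytes
class(raw_bytes)                                # "raw"

restored <- unserialize(raw_bytes)
identical(obj, restored)                        # TRUE
```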

Question 7

Q

read.table, file arg

Answer

A

name of the file or connection to be read

Question 8

Q

read.table, header arg

Answer

A

logical argument, indicates whether the first line of the file is a header line

Question 9

Q

read.table, sep arg

Answer

A

string indicating the column delimiter

Question 10

Q

read.table, colClasses arg

Answer

A

character vector indicating the class of each column of the dataset
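A sketch using the four arguments above together, assuming an illustrative semicolon-separated file with a header line.

```r
path <- tempfile(fileext = ".txt")
writeLines(c("name;age;height",
             "Ann;34;1.62",
             "Bob;29;1.80"), path)

people <- read.table(
  file = path,                                       # file name or connection
  header = TRUE,                                     # first line holds column names
  sep = ";",                                         # columns separated by semicolons
  colClasses = c("character", "integer", "numeric")  # one class per column
)
sapply(people, class)  # name "character", age "integer", height "numeric"
```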

Question 11

Q

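What’s the default separator for read.table?

Answer

A

sep = "", which means white space: one or more spaces, tabs, newlines, or carriage returns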
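With read.table's default sep = "", any run of spaces or tabs separates fields. A small sketch with illustrative data that mixes single spaces, repeated spaces, and a tab:

```r
path <- tempfile(fileext = ".txt")
# Fields separated by a mix of spaces, repeated spaces, and a tab
writeLines(c("x y   z", "1 2\t3"), path)

d <- read.table(path, header = TRUE)  # sep defaults to "" (white space)
dim(d)                                # 1 3
```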

Question 12

Q

If there are no comment lines in your file, what’s an easy optimization to make when passing args to read.table?

Answer

A

comment.char = ""
(this turns off the search for comments; the default is "#")
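Setting comment.char = "" tells read.table not to look for comments at all. It also changes the result if a # appears inside the data, as this sketch with illustrative data shows.

```r
path <- tempfile(fileext = ".txt")
writeLines(c("id code", "1 A#1", "2 B#2"), path)

# Default comment.char = "#": everything after # on a line is dropped
default_codes <- read.table(path, header = TRUE)$code
# comment.char = "" disables comment handling, so # is kept as data
nocomment_codes <- read.table(path, header = TRUE, comment.char = "")$code
```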

Question 13

Q

Why is it good to utilize the colClasses argument for large datasets?

Answer

A

Without colClasses, R has to inspect every column to work out its class, which takes a long time for large datasets
If you supply the classes up front, read.table can run much faster, often about twice as fast!

Question 14

Q

What’s the quick and dirty way to figure out the classes for your columns and pass that into the colClasses arg?

Answer

A

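# read the first 100 rows and let R guess each column's class
initial <- read.table("test", nrows = 100)
classes <- sapply(initial, class)
# read the whole file, passing in the guessed classes
tabAll <- read.table("test", colClasses = classes)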

Question 15

Q

How do you use the sapply() function, and what is its purpose?

Answer

A

sapply applies a function to each element of a list or vector and simplifies the result to a vector or matrix where possible
usage: sapply(X, FUN)
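A short sketch on an illustrative named list, contrasting sapply with lapply, which applies the same function but always returns a list.

```r
nums <- list(a = 1:3, b = c(2.5, 3.5), c = 10)

sapply(nums, mean)    # named numeric vector: a = 2, b = 3, c = 10
sapply(nums, length)  # named integer vector: a = 3, b = 2, c = 1

lapply(nums, length)  # same computation, but the result is a list
```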

Question 16

Q

How much memory will you need to read in a given file in R?

Answer

A

roughly twice the memory the dataset itself occupies, because of overhead during reading
estimate the dataset's size as rows × columns × 8 bytes (for numeric columns)
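The rule of thumb as arithmetic, assuming all columns are numeric (8 bytes each); the row and column counts are illustrative.

```r
rows <- 1500000
cols <- 120
bytes_per_numeric <- 8

data_bytes <- rows * cols * bytes_per_numeric  # 1.44e9 bytes
data_gb <- data_bytes / 2^30                   # about 1.34 GB
needed_gb <- 2 * data_gb                       # about 2.68 GB with the 2x rule of thumb
```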

Question 17

Q

What are the two main functions used for writing R data to file?

Answer

A

dump() and dput()

Question 18

Q

What are two of the benefits (and one drawback) of using dump and dput?

Answer

A

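they preserve metadata such as column classes, so you don’t have to specify classes again when reading the data back in (as you would with a CSV)
the output is plain text, which works well with version control and Unix text tools
one downside: text is not as space-efficient as a binary format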
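A sketch of the metadata point, using an illustrative data frame with a factor column whose level order matters. A CSV round trip loses the factor, while dput/dget restores it exactly.

```r
df <- data.frame(id = 1:2, grp = factor(c("lo", "hi"), levels = c("lo", "hi")))

# Round trip through CSV: the factor (and its level order) is lost
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)
class(from_csv$grp)  # "character" in R >= 4.0; a factor with alphabetical levels before that

# Round trip through dput/dget: classes and level order survive
txt_path <- tempfile(fileext = ".R")
dput(df, file = txt_path)
from_dput <- dget(txt_path)
levels(from_dput$grp)  # "lo" "hi"
```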

Question 19

Q

How do you retrieve an object you’ve written to file using dput?

Answer

A

with dget, e.g. y <- dget("y.R")

Question 20

Q

What’s a major distinction between dump and dput?

Answer

A

dput works on a single R object, whereas dump can write multiple R objects to one file (it takes a character vector of object names)

Question 21

Q

How do you remove objects from a given environment?

Answer

A

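rm(object) removes it from the current environment; use the envir argument to target another environment, e.g. rm(x, envir = env)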
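A sketch using an illustrative environment env created with new.env(). rm accepts quoted names or a character vector via list, and envir selects the environment.

```r
env <- new.env()
assign("tmp", 1:5, envir = env)
assign("keep", "yes", envir = env)

rm("tmp", envir = env)            # remove one object from env
after_one <- ls(env)              # "keep"

rm(list = ls(env), envir = env)   # remove everything in env
length(ls(env))                   # 0
```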

Question 22

Q

How do you print the first ten lines of a gzipped file called text.gz?

Answer

A

con <- gzfile("text.gz")
x <- readLines(con, n = 10)
close(con)
x
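A self-contained sketch that first writes an illustrative gzipped file to a temporary path (an assumption, standing in for text.gz) and then reads back only its first ten lines.

```r
path <- tempfile(fileext = ".gz")

# Write 25 illustrative lines to a gzip-compressed file
out <- gzfile(path, open = "w")
writeLines(paste("line", 1:25), out)
close(out)

# Read back only the first ten lines
con <- gzfile(path)
first10 <- readLines(con, n = 10)
close(con)
first10[10]  # "line 10"
```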