Reading/Writing Data Flashcards

1
Q

What are two functions used to read tabular data?

A

read.table() and read.csv()
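A minimal sketch of the two functions on the same file, assuming a small made-up CSV written to a temporary path (the data and variable names are hypothetical):

```r
# Write a tiny comma-separated file to a temporary location (made-up data).
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,9.5", "2,7.25"), path)

# read.csv() assumes a header row and a comma separator by default.
via_csv <- read.csv(path)

# read.table() needs both stated explicitly to parse the same file.
via_table <- read.table(path, header = TRUE, sep = ",")

# Compare the two results.
all.equal(via_csv, via_table)
```

read.csv() is essentially read.table() with different defaults, so the choice mostly comes down to which defaults fit the file.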

2
Q

readLines

A

reads the lines of a text file into a character vector, one element per line
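A short sketch, assuming a throwaway text file created just for the example:

```r
# Create a three-line text file in a temporary location (made-up content).
path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), path)

# Read every line: the result is a character vector, one element per line.
all_lines <- readLines(path)

# Read only the first two lines by passing n.
first_two <- readLines(path, n = 2)
```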

3
Q

source

A

reads in and evaluates R code files
inverse of dump
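A sketch of the dump/source round trip, assuming two made-up objects x and y and a temporary file:

```r
# Two hypothetical objects to write out.
x <- c(a = 1, b = 2)
y <- data.frame(id = 1:2, label = c("u", "v"))

# dump() takes the object NAMES as a character vector and writes R code.
path <- tempfile(fileext = ".R")
dump(c("x", "y"), file = path)

# Remove the originals, then recreate them by running the dumped code.
rm(x, y)
# local = TRUE evaluates in the calling environment
# (the global environment when run at top level).
source(path, local = TRUE)
```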

4
Q

dget

A

reads a single R object back in from its text (deparsed R code) representation
inverse of dput
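A sketch of the dput/dget round trip, assuming a small made-up data frame:

```r
# A hypothetical data frame with an integer column and a factor column.
df <- data.frame(id = 1:3, grp = factor(c("a", "b", "a")))

# dput() writes the object itself (not its name) as R code.
path <- tempfile(fileext = ".R")
dput(df, file = path)

# dget() returns the object, so assign it to whatever name you like.
restored <- dget(path)
sapply(restored, class)
```

Unlike source(), dget() does not create a variable on its own; the result has to be assigned.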

5
Q

load

A

reads saved workspaces (objects written with save() or save.image()) back into R
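A sketch pairing load() with save(), its usual counterpart (not named on the card); the objects a and b are made up:

```r
# Two hypothetical objects to save in binary form.
a <- 1:5
b <- "hello"

path <- tempfile(fileext = ".RData")
save(a, b, file = path)

# Remove them, then restore both under their original names.
rm(a, b)
restored_names <- load(path)  # load() returns the names it restored
```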

6
Q

unserialize

A

reading in single R objects in binary form
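A sketch of a round trip through serialize(), the writing counterpart, assuming a made-up list and an in-memory raw vector instead of a file:

```r
# A hypothetical object to convert to binary form.
obj <- list(n = 42L, tags = c("x", "y"))

# connection = NULL makes serialize() return a raw vector
# rather than writing to a connection.
raw_bytes <- serialize(obj, connection = NULL)

# unserialize() rebuilds the object from those bytes.
restored <- unserialize(raw_bytes)
```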

7
Q

read.table, file arg

A

name of the file or connection to be read

8
Q

read.table, header arg

A

logical argument, indicates whether the first line of the file is a header line

9
Q

read.table, sep arg

A

string indicating the column delimiter

10
Q

read.table, colClasses arg

A

character vector indicating the class of each column of the dataset
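A sketch that uses the file, header, sep, and colClasses arguments together, assuming a made-up semicolon-delimited file:

```r
# Hypothetical semicolon-delimited data with a header line.
path <- tempfile(fileext = ".txt")
writeLines(c("id;score;label", "1;9.5;a", "2;7.25;b"), path)

d <- read.table(
  file = path,                # file name (a connection also works)
  header = TRUE,              # first line holds the column names
  sep = ";",                  # column delimiter
  colClasses = c("integer", "numeric", "character")  # one class per column
)

# Check the class each column ended up with.
sapply(d, class)
```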

11
Q

What’s the default separator for read.table?

A

whitespace (sep = "", meaning one or more spaces, tabs, or newlines)
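A sketch of the default separator at work, assuming a made-up file that mixes runs of spaces and a tab:

```r
# Hypothetical file: columns separated by two spaces, a tab, and three spaces.
path <- tempfile(fileext = ".txt")
writeLines(c("name  score", "ann\t10", "bob   12"), path)

# With the default sep, any run of whitespace counts as one delimiter.
d <- read.table(path, header = TRUE)
dim(d)
```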

12
Q

If there are no comment lines in your file, what’s an easy optimization to make when passing args to read.table?

A

comment.char = ""
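A sketch of a side effect worth knowing: with comment.char = "", a # inside a field is kept as data instead of starting a comment. The file contents are made up:

```r
# Hypothetical file whose first column contains '#' characters.
path <- tempfile(fileext = ".txt")
writeLines(c("tag value", "a#1 10", "b#2 20"), path)

# Turning off comment handling keeps '#' as ordinary text.
d <- read.table(path, header = TRUE, comment.char = "")
d$tag
```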

13
Q

Why is it good to utilize the colClasses argument for large datasets?

A

Without it, R has to scan the data and work out the class of each column itself, which is slow for large files
If you pass in the character vector of classes up front, R can skip that step and often loads the dataset about twice as fast

14
Q

What’s the quick and dirty way to figure out the classes for your columns and pass that into the colClasses arg?

A

initial <- read.table("test", nrows=100)
classes <- sapply(initial, class)
tabAll <- read.table("test", colClasses = classes)
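A self-contained version of the same idea, assuming a made-up 500-row file with a header line (so header = TRUE is added here):

```r
# Write a hypothetical 500-row table to a temporary file.
path <- tempfile(fileext = ".txt")
write.table(
  data.frame(id = 1:500, value = seq(0.5, 250, by = 0.5)),
  path,
  row.names = FALSE
)

# Step 1: read a small sample and let R infer the classes.
initial <- read.table(path, header = TRUE, nrows = 100)
classes <- sapply(initial, class)

# Step 2: read the full file with the inferred classes supplied.
tab_all <- read.table(path, header = TRUE, colClasses = classes)
```

One caveat: classes inferred from the first 100 rows may not hold further down, for example a column that looks like integers at the top but has decimals later, in which case the full read fails and the classes need adjusting by hand.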

15
Q

How do you use the sapply() function, and what is its purpose?

A

sapply applies a function to each element of a list or vector and simplifies the result to a vector or matrix where it can
sapply(vector, function)
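A short sketch with made-up inputs, one list and one character vector:

```r
# Apply mean() to each element of a list; the result is a named numeric vector.
means <- sapply(list(a = 1:3, b = 4:6), mean)

# Apply nchar() to each string; the strings themselves become the names.
word_lengths <- sapply(c("read", "table", "csv"), nchar)
```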

16
Q

How much memory will you need to read in a given file in R?

A

as a rough rule of thumb, about twice the memory the dataset itself occupies, because of overhead while reading
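A back-of-the-envelope sketch of that estimate, assuming a hypothetical all-numeric data frame of 1,500,000 rows and 120 columns (numeric values take 8 bytes each):

```r
# Hypothetical dimensions; every column is assumed to be numeric.
n_rows <- 1500000
n_cols <- 120
bytes_per_numeric <- 8

# Raw size of the data in bytes, then in GiB (2^30 bytes).
raw_bytes <- n_rows * n_cols * bytes_per_numeric
raw_gb <- raw_bytes / 2^30   # roughly 1.34

# Rule of thumb from the card: budget about twice the raw size.
needed_gb <- 2 * raw_gb
```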

17
Q

What are the two main functions used for writing R data to file?

A

dump() and dput()

18
Q

What are two of the benefits (and one detriment) of using dump and dput?

A

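They preserve metadata such as column classes for data frames, so you don’t have to read the data in from CSV and specify classes, etc.
The output is plain text, so it is readable, editable, and works well with Unix tools and version control
One downside: text is not as space-efficient as a binary file would be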

19
Q

How do you retrieve an object you’ve written to file using dput?

A

dget()

20
Q

What’s a major distinction between dump and dput?

A

dput works on a single R object, whereas dump can write multiple R objects to a file

21
Q

How do you remove objects from a given environment?

A

rm(object); pass envir = to remove from an environment other than the current one
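A small sketch with made-up object names, showing removal by name and by character vector:

```r
# Two hypothetical objects.
tmp_a <- 1
tmp_b <- 2

# Remove one object by giving its name directly.
rm(tmp_a)

# Remove objects listed in a character vector.
rm(list = c("tmp_b"))

# Check that neither object exists in the current environment any more.
still_there <- c(exists("tmp_a", inherits = FALSE),
                 exists("tmp_b", inherits = FALSE))
```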

22
Q

How do you print the first ten lines of a gzipped file called text.gz?

A

con <- gzfile("text.gz")
x <- readLines(con, 10)
close(con)
x
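A self-contained sketch of the same pattern, assuming a made-up 25-line gzipped file created in a temporary location first:

```r
# Create a hypothetical gzipped file with 25 lines.
path <- tempfile(fileext = ".gz")
con <- gzfile(path, open = "w")
writeLines(sprintf("line %d", 1:25), con)
close(con)

# Open a connection to the compressed file and read the first ten lines.
con <- gzfile(path)
first_ten <- readLines(con, n = 10)
close(con)  # release the connection when done
first_ten
```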