Importing data into R Flashcards
function to load in CSV files
read.csv("data.csv", stringsAsFactors = FALSE)
The file must be in your working directory, or you must give its path. stringsAsFactors controls whether strings are converted to factors; the default was TRUE before R 4.0.0 and is FALSE from R 4.0.0 onward
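A minimal sketch of the read.csv() card. The file contents and the names csv_path, name, and mpg are made up for illustration; the file is written to a temporary location so the example does not depend on your working directory.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,mpg", "civic,32", "f150,18"), csv_path)

# stringsAsFactors = FALSE keeps the name column as character, not factor.
cars <- read.csv(csv_path, stringsAsFactors = FALSE)
str(cars)
```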
List the files in your working directory
dir()
import tab delimited data
read.delim(x, sep = "\t", header = TRUE), where "\t" is the tab character
import any tabular data
read.table(x, sep = "", header = FALSE, stringsAsFactors = TRUE, col.names = "")
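A small sketch comparing the two cards above on the same tab-delimited file. The data and the names tsv_path, potatoes_a, and potatoes_b are invented for illustration.

```r
# Build a small tab-delimited file in a temporary location.
tsv_path <- tempfile(fileext = ".txt")
writeLines(c("type\tweight", "russet\t1.2", "yukon\t0.8"), tsv_path)

# read.delim assumes tab separators and a header row by default.
potatoes_a <- read.delim(tsv_path)

# read.table needs both spelled out: its defaults are sep = "" and header = FALSE.
potatoes_b <- read.table(tsv_path, sep = "\t", header = TRUE)

identical(potatoes_a, potatoes_b)
```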
which.min
returns the index of the smallest value in a vector
ex: cars[which.min(cars$MPG), ] returns the row of the cars data frame that has the lowest MPG
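A quick sketch of the which.min card, using a made-up cars data frame (the model names and MPG values are assumptions for illustration only).

```r
# Hypothetical data frame standing in for "cars".
cars <- data.frame(
  model = c("civic", "f150", "prius"),
  MPG   = c(32, 18, 50)
)

idx <- which.min(cars$MPG)  # position of the smallest MPG value
cars[idx, ]                 # the whole row at that position
cars$MPG[idx]               # just the minimum value itself
```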
colClasses
an argument in the read.delim and read.table functions. Use it to specify the class of each column you are importing
ex: read.delim(x, sep = "", colClasses = c("character", "logical", "numeric"))
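A minimal sketch of colClasses in action. The file and the column names id, in_stock, and price are hypothetical; the point is that declaring the first column as "character" keeps leading zeros that would be lost if it were read as a number.

```r
# Temporary tab-delimited file with three columns of different types.
path <- tempfile(fileext = ".txt")
writeLines(c("id\tin_stock\tprice",
             "007\tTRUE\t1.5",
             "012\tFALSE\t2.25"), path)

# One class per column, in column order.
typed <- read.delim(path, colClasses = c("character", "logical", "numeric"))
sapply(typed, class)
```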
Hadley’s data import package
readr
read_csv()
readr version of read.csv
read_csv("mydata")
loads data as a “tibble”
readr version of read.delim
read_tsv("potatoes.txt", col_names = c("type", "weight")) (tsv = tab-separated values)
col_types
argument to specify the variable classes in readr package
read_delim
the main import function in the readr package. Similar to read.table
You must specify the file and delim arguments
ex: read_delim("cars.txt", delim = "\t", col_names = c("automaker", "mpg"))
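A sketch of read_delim() on a header-less tab-delimited file, assuming the readr package is installed. The file contents are invented, and col_types = "cd" (character, double) is added here only to show the col_types card alongside col_names.

```r
library(readr)

# Temporary file with no header row.
path <- tempfile(fileext = ".txt")
writeLines(c("toyota\t32", "ford\t18"), path)

# file and delim are required; col_names supplies the missing header.
cars <- read_delim(path, delim = "\t",
                   col_names = c("automaker", "mpg"),
                   col_types = "cd")
cars
```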
skip
skip rows in your import functions.
ex: skip = 5 will skip the first 5 rows and then begin reading in data
n_max
specifies the number of rows you want to read in, often used with skip
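ex: read_delim("cars.txt", delim = "\t", skip = 2, n_max = 3)
skips the first two lines and reads the next three rows of data (lines 3, 4, and 5 when col_names is supplied; with the default col_names = TRUE, line 3 is used as the header instead)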
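A sketch of skip and n_max together, assuming readr is installed. The file is made up: two note lines followed by four data lines. Column names are supplied explicitly so that the first line read after the skip is treated as data rather than as a header.

```r
library(readr)

path <- tempfile(fileext = ".txt")
writeLines(c("note line 1", "note line 2",
             "toyota\t32", "ford\t18", "honda\t35", "kia\t30"), path)

# Skip the two note lines, then read at most three rows.
cars <- read_delim(path, delim = "\t",
                   col_names = c("automaker", "mpg"),
                   col_types = "cd",
                   skip = 2, n_max = 3)
cars$automaker
```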
readxl
Hadley’s Excel data import package
function to list different sheets in excel: readxl package
excel_sheets()
read_excel()
import excel data into R
import data from the second sheet in an excel doc
read_excel("cars.xls", sheet = 2)
pop_list
Use lapply with readxl functions to read in all sheets of an Excel file at once. You must pass the path as a separate argument because the excel_sheets() function only returns the sheet names; it does not return the file path
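A sketch of the lapply pattern, assuming readxl is installed. Instead of a file of your own, it uses datasets.xlsx, an example workbook that ships with readxl and is located with readxl_example(). The name all_sheets is an arbitrary choice for the resulting list.

```r
library(readxl)

# Example workbook bundled with readxl.
path <- readxl_example("datasets.xlsx")

# excel_sheets() returns only the sheet names...
sheet_names <- excel_sheets(path)

# ...so the path is passed to read_excel() as a separate, named argument.
# Each sheet name is matched to read_excel()'s sheet argument.
all_sheets <- lapply(sheet_names, read_excel, path = path)
names(all_sheets) <- sheet_names

names(all_sheets)
```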
col_types
argument in read_excel. Specifies the data type of each column: "text", "numeric", "date", or "blank" (newer readxl versions call this last type "skip")
col_types = "blank"
read_excel will skip the import of a column with “blank” as col_type.
ex: read_excel("my.data.xlsx", col_types = c("numeric", "blank")) imports only column 1, as a numeric column, from a two-column Excel document
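A sketch of dropping columns at import time, assuming a recent readxl where the type is spelled "skip" rather than "blank". It reads the iris sheet of the datasets.xlsx workbook bundled with readxl, which has five columns; iris_part is an arbitrary name.

```r
library(readxl)

path <- readxl_example("datasets.xlsx")

# One col_types entry per column: keep the first and last, drop the middle three.
iris_part <- read_excel(path, sheet = "iris",
                        col_types = c("numeric", "skip", "skip", "skip", "text"))
names(iris_part)
```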
read_excel("data.xlsx", skip = 2)
skip the first two rows of an excel document and then begin importing data
XLConnect
a package that creates a bridge between an R session and Excel
XLConnect function that builds a bridge between R and Excel
loadWorkbook()
XLConnect function lists the available sheets in an excel workbook. Requires an XLConnect workbook object as first argument (created through loadWorkbook() function)
getSheets()
XLConnect function loads worksheets in as data. Requires an XLConnect workbook object as first argument
readWorksheet()
arguments in readWorksheet(my_book, sheet = 1, startCol = 1, endCol = 3, startRow = 1, endRow = 3)
imports data from the given sheet of the "my_book" workbook object, restricted to columns 1 to 3 and rows 1 to 3
XLConnect function to add new sheets to an Excel workbook object (the bridge created through loadWorkbook())
createSheet(workbook object, "new_SheetName")
add data to an XLConnect workbook
writeWorksheet(workbook object, new data, "sheet_to_write_to")
save an XLConnect workbook to a new file
saveWorkbook(my_workbook, "filename.xlsx")
rename a sheet using XLConnect
renameSheet(my_workbook, sheet = 1, newName = "cars")
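A sketch that strings the XLConnect cards together into one round trip, assuming XLConnect and its Java dependency are installed. The workbook is created in a temporary file (create = TRUE) rather than loaded from an existing one, and the sheet names and data are invented for illustration.

```r
library(XLConnect)

# Create a new workbook object backed by a temporary .xlsx path.
out_path <- tempfile(fileext = ".xlsx")
my_book <- loadWorkbook(out_path, create = TRUE)

# Add a sheet, write a small data frame to it, then rename it.
createSheet(my_book, "data")
new_data <- data.frame(automaker = c("toyota", "ford"), mpg = c(32, 18))
writeWorksheet(my_book, new_data, sheet = "data")
renameSheet(my_book, sheet = "data", newName = "cars")

# Nothing is on disk until the workbook is saved.
saveWorkbook(my_book, out_path)

# List the sheets, then read back the header row plus two data rows.
getSheets(my_book)
cars <- readWorksheet(my_book, sheet = "cars",
                      startRow = 1, endRow = 3,
                      startCol = 1, endCol = 2)
cars
```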