V3 Flashcards
Data analysis using R
data from R packages
- can be loaded with the data() function
- example: data(USArrests)
how to show a small part of a data set (code)
- head(dataset) - shows the first 6 rows by default
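A minimal sketch of the two cards above, using the USArrests data set that ships with base R. The variable name first_three is only for illustration.

```r
# USArrests comes with the datasets package, which is part of base R.
data(USArrests)

# Show the first 6 rows (the default), then keep only the first 3.
head(USArrests)
first_three <- head(USArrests, n = 3)
print(first_three)
```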
read data (code)
- read.table() and read.csv()
my data
row.names (argument of read.table())
a vector containing the row names, or
a single number giving the column of the table which contains the row names
col.names (argument of read.table())
a vector giving the column names
check.names (argument of read.table())
if TRUE, the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names
skip (argument of read.table())
the number of lines of the data file to skip before beginning to read data
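A small sketch of how these options can be combined in one read.table() call. The file content, the column names and the two preamble lines are made up for illustration.

```r
# Write a small sample file: two preamble lines to skip, then the data.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "exported by a hypothetical tool",
  "second line of preamble",
  "a 1 2.5",
  "b 2 3.5",
  "c 3 4.5"
), path)

df <- read.table(
  path,
  skip = 2,                               # ignore the two preamble lines
  col.names = c("id", "count", "value"),  # names for the three columns
  row.names = 1,                          # column 1 holds the row names
  check.names = TRUE                      # make sure names are valid (default)
)
print(df)
```

Because column 1 is used for the row names, the resulting data frame has only the two columns count and value.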
how to define what classes we expect in the different columns (code)
- numeric, logical, factor, character
- colClasses = c("character", "factor", …)
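A short sketch of colClasses with one column of each class listed above. The sample file and its column names are assumptions for the example.

```r
# Hypothetical sample file with four columns.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,group,score,passed",
  "ann,a,1.5,TRUE",
  "bob,b,2.0,FALSE"
), path)

# One class per column, in column order.
df <- read.csv(
  path,
  colClasses = c("character", "factor", "numeric", "logical")
)
str(df)
```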
how to get dimension of table (code)
dim()
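As a quick illustration, dim() on the built-in USArrests data frame returns the number of rows followed by the number of columns.

```r
data(USArrests)

d <- dim(USArrests)   # c(number of rows, number of columns)
print(d)

nrow(USArrests)       # same value as d[1]
ncol(USArrests)       # same value as d[2]
```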
load text files line by line, why and how (code)
readLines("my text.txt", n = 10) - read the first 10 lines
readLines("my text.txt", n = -1) - read everything
- for large amounts of row-wise data, the data might be too big to load at once
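One possible way to handle a file that is too big to load at once is to read it in chunks through an open connection. The sketch below uses a small temporary file and a chunk size of 10, both chosen only for illustration.

```r
# Hypothetical sample file with 25 lines.
path <- tempfile(fileext = ".txt")
writeLines(paste("row", 1:25), path)

first_ten  <- readLines(path, n = 10)   # only the first 10 lines
everything <- readLines(path, n = -1)   # the whole file

# Read the file in chunks of 10 lines through an open connection.
con <- file(path, "r")
total <- 0
repeat {
  chunk <- readLines(con, n = 10)       # continues where the last read stopped
  if (length(chunk) == 0) break         # nothing left to read
  total <- total + length(chunk)        # replace with real per-chunk work
}
close(con)
print(total)
```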
function to read a compressed (gzip) file
gzfile("data.gz", "r") - "r" for read ("w" for write)
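A minimal round trip with gzfile(), first in "w" mode and then in "r" mode. The temporary file name and the two lines of text are placeholders.

```r
path <- tempfile(fileext = ".gz")

# Write two lines into a gzip-compressed file.
con <- gzfile(path, "w")
writeLines(c("first line", "second line"), con)
close(con)

# Read them back through a read connection.
con <- gzfile(path, "r")
lines <- readLines(con)
close(con)
print(lines)
```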
how to use a URL as a connection
url()
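A sketch of passing a url() connection to a reading function. To keep it runnable offline, a file:// URL built from a temporary file stands in for a web address. This assumes a Unix-like absolute path, and an http:// or https:// address would be used in the same way.

```r
# Hypothetical sample file that plays the role of a remote resource.
path <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2", "3,4"), path)

# Assumption: Unix-like absolute path, so "file://" + path is a valid URL.
con <- url(paste0("file://", path))

# read.csv() opens the connection and closes it again when it is done.
df <- read.csv(con)
print(df)
```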
how to read an Excel file
- no native support for xlsx in R
- available in libraries: xlsx, which uses Java to read and write xlsx files
- slow and unreliable
- simpler: export the xlsx files as csv
loading binary files
readBin() - e.g. an image stored as a 2-dimensional array of pixels, each pixel defined by 3 bytes (B, G, R); remove the image header before reading the pixel data with readBin()
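A rough sketch of the idea with a made-up file layout: a 4-byte header followed by a 2 x 2 image with 3 bytes (B, G, R) per pixel. Real image formats have their own header sizes and layouts, so the numbers here are assumptions.

```r
path <- tempfile(fileext = ".bin")

# Hypothetical layout: 4 header bytes, then 2 x 2 pixels, 3 bytes per pixel.
header <- as.raw(c(0, 0, 0, 0))
pixels <- as.raw(c(
  255, 0, 0,    0, 255, 0,    # row 1: blue pixel, green pixel
  0, 0, 255,    10, 20, 30    # row 2: red pixel, mixed pixel
))
writeBin(c(header, pixels), path)

con <- file(path, "rb")
invisible(readBin(con, what = "raw", n = 4))    # read and discard the header
bytes <- readBin(con, what = "integer", n = 2 * 2 * 3,
                 size = 1, signed = FALSE)     # one unsigned byte per value
close(con)

# Index order: channel (B, G, R), column, row.
img <- array(bytes, dim = c(3, 2, 2))
print(img[, 1, 1])   # B, G, R values of the first pixel
```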
%in%
- allows you to filter: A %in% B
- returns a logical vector that is TRUE for each element of A that also occurs in B (matrices are treated as plain vectors)
A %in% B
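A small sketch of %in% used as a filter, first on plain vectors and then on the rows of the built-in USArrests data frame. The vectors A and B and the chosen state names are only examples.

```r
A <- c(1, 2, 3, 4, 5)
B <- c(2, 4, 6)

in_b <- A %in% B      # one TRUE/FALSE per element of A
print(in_b)

A[in_b]               # elements of A that also occur in B
A[!(A %in% B)]        # elements of A that do not occur in B

# Keep only the rows whose row name is in a set of allowed values.
data(USArrests)
subset_df <- USArrests[rownames(USArrests) %in% c("Texas", "Ohio"), ]
print(subset_df)
```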