Handling Data Flashcards
data()
- Function that lists R’s built-in data sets; data(name) loads a specific one.
- Many packages also ship their own built-in data sets, which data(package = "pkgname") lists.
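A short sketch of data(), using the base "datasets" package and its mtcars data set as the example.

```r
# data(package = ...) returns a listing object; the $results matrix holds
# one row per data set, with its name in the "Item" column
ds <- data(package = "datasets")
head(ds$results[, "Item"])

# Load a built-in data set by name into the current environment
data("mtcars")
head(mtcars)
```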
save()
- Function that allows the selective saving of objects.
- save(junk, junk2, file = "junky.RData") = will specifically save the objects junk and junk2 to an external file named junky.RData in the working directory.
- There does not need to be a relationship between the external name and its contents.
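A minimal sketch of selective saving. The objects are made up, and the file goes to tempdir() rather than the working directory so nothing is left behind. Only junk and junk2 end up in the file, even though junk3 also exists.

```r
junk  <- 1:10
junk2 <- c("a", "b", "c")
junk3 <- "not saved"

# The file name is arbitrary; it need not match the object names
path <- file.path(tempdir(), "junky.RData")
save(junk, junk2, file = path)

# Equivalent form that takes the object names as a character vector
save(list = c("junk", "junk2"), file = path)

# Inspect the file by loading it into a fresh environment
check <- new.env()
print(load(path, envir = check))   # names of the objects in the file
```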
load()
- Function that loads a saved R object.
- load("junky.RData") = reloads the objects present in junky.RData.
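A sketch of reloading. It first writes a stand-in junky.RData to tempdir(). load() restores objects under their original names, silently overwriting any existing objects with those names, and invisibly returns the names it restored.

```r
# Create a small file to reload (stands in for junky.RData)
path <- file.path(tempdir(), "junky.RData")
junk  <- 1:10
junk2 <- letters[1:3]
save(junk, junk2, file = path)

# Remove the objects, then bring them back from the file
rm(junk, junk2)
restored <- load(path)
print(restored)   # "junk" "junk2"
print(junk)
```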
load(url("website_url"))
- Nested functions that load a saved R data file (.RData) from a remote web address.
- Always check the results of a remote load by reviewing the Environment tab.
Reading Excel Files
- Useful packages: XLConnect, xlsx, gdata, readxl, etc.
- Function (from readxl) to read data from an Excel file into R: read_excel("file_name", sheet = number or name, col_names = TRUE, col_types = NULL, na = "", skip = number of rows to skip (default 0)).
- Can always learn more about this function using help(read_excel).
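A sketch of read_excel(), assuming the readxl package is installed. It uses readxl_example(), which returns the path of a sample workbook bundled with readxl, and reads its "mtcars" sheet with the defaults spelled out.

```r
library(readxl)   # assumes readxl is installed

# Path to an example workbook that ships with readxl
path <- readxl_example("datasets.xlsx")
excel_sheets(path)   # names of the sheets in the workbook

# Read one sheet by name (a sheet number also works)
cars_xl <- read_excel(path, sheet = "mtcars", col_names = TRUE,
                      col_types = NULL, na = "", skip = 0)
head(cars_xl)
```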
Reading Text Files
- Extensions: .txt, .csv, .dat, .tab.
- In RStudio, choosing Import Dataset in the Environment tab and selecting a local text file opens an import dialog that loads the file.
- If the file is a Web URL:
1) Enter the URL.
2) Choose heading “Yes” if variable names are present.
3) Strings as Factors unchecked.
4) Set encoding to automatic.
5) R should correctly identify the separator (e.g. tab separated).
- You can check your results using the View() function.
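The dialog settings correspond to arguments of read.delim(). A minimal sketch, assuming a small tab-separated file written to a temporary path stands in for the web URL (a URL string can be passed the same way).

```r
# Stand-in for the downloaded tab-separated data
path <- tempfile(fileext = ".txt")
writeLines(c("name\tage\tcity",
             "Ann\t34\tOslo",
             "Bo\t27\tLima"), path)

# Heading "Yes" -> header = TRUE; Strings as Factors unchecked ->
# stringsAsFactors = FALSE; encoding left at its default
people <- read.delim(path, header = TRUE, stringsAsFactors = FALSE)

str(people)
# View(people)   # spreadsheet-style viewer in RStudio
```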
How to Read Text Files (functions)
1) read.csv()
2) read.delim()
3) read.table()
Reading Non-conforming Yet Formatted Data
1) readLines()
2) scan()
read.csv("url_name")
- Function used for reading comma-separated files (typically with a .csv extension); assumes the first row holds the variable names (header = TRUE).
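A round-trip sketch of read.csv(). The data frame and temporary file are made up for the example.

```r
# Write a small CSV file, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(9.5, 7, 8.25)),
          csv_path, row.names = FALSE)

# Header row and comma separator are the defaults
scores <- read.csv(csv_path)
str(scores)
```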
read.delim("url_name")
- Function used for reading tab-delimited text files (separator sep = "\t" by default).
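A sketch contrasting read.delim() with read.csv() on the same tab-separated text (passed through the text argument instead of a file). With the wrong separator, every line collapses into a single column.

```r
tab_text <- "fruit\tcount\napple\t3\npear\t5"

fruit <- read.delim(text = tab_text)   # sep = "\t" by default
wrong <- read.csv(text = tab_text)     # looks for commas: one column
ncol(fruit)   # 2
ncol(wrong)   # 1
```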
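read.table("url_name")
- Function used for almost any type of text file, as long as a separator exists; set it with sep (the default, sep = "", splits on any white space). Note that header defaults to FALSE.
- More general than either read.csv or read.delim, which are read.table with different defaults.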
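A sketch of read.table() with a custom separator and with its defaults. The sample text is made up.

```r
# Semicolon-separated data with a header row: both must be stated
semi <- "id;city;temp\n1;Oslo;-3.5\n2;Lima;19"
weather <- read.table(text = semi, sep = ";", header = TRUE)

# Defaults: any run of white space separates fields, no header row,
# so columns are named V1, V2, ...
ws <- read.table(text = "1 Oslo -3.5\n2   Lima 19")
names(ws)   # "V1" "V2" "V3"
```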
readLines("url_name")
- Function that reads all or part of a text file (the n argument limits the number of lines), returning one character string per line.
- Useful for data files that are irregular, have no delimiter (such as commas or tabs) or do not conform to a standard format.
- Will read virtually any text file.
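A sketch of readLines() on a small, made-up irregular file. Because each line arrives as a plain string, values can be pulled out by hand, here with a regular expression.

```r
path <- tempfile(fileext = ".txt")
writeLines(c("# Station log, irregular format",
             "2021-03-01 temp=4.2 wind=NE",
             "2021-03-02 temp=5.0",
             "note: sensor reset"), path)

first_two <- readLines(path, n = 2)   # read only part of the file
all_lines <- readLines(path)          # read every line
length(all_lines)                     # 4

# Keep the lines mentioning temp= and extract the number after it
temp_lines <- grep("temp=", all_lines, value = TRUE)
temps <- as.numeric(sub(".*temp=([0-9.]+).*", "\\1", temp_lines))
```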
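scan("url_name")
- Function similar to readLines(), but reads the data as typed values (numeric by default) and can keep a record of the structure or patterning in the data (via the what argument) if you need to keep that information.
- More restrictive than readLines(): each field must match the expected type.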
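A sketch of scan() on a small, made-up file of space-separated records. A list template in what keeps the record structure (one list element per field). A plain character template returns a flat vector, and the numeric default fails on the text field.

```r
path <- tempfile(fileext = ".txt")
writeLines(c("Ann 34 1.62", "Bo 27 1.80"), path)

# Template gives each field's type: character, integer, double
recs <- scan(path, what = list(name = "", age = 0L, height = 0), quiet = TRUE)
str(recs)

# No record structure: every field becomes one element of a vector
flat <- scan(path, what = "", quiet = TRUE)

# Default what = double(): fails because "Ann" is not a number
bad <- try(scan(path, quiet = TRUE), silent = TRUE)
```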
NA / Missing Values
- A place within a vector may be reserved for the missing element by assigning the special value NA.
- Usually any operation involving NA results in an NA.
- All types of vectors (character, logical, numeric) can use NA to represent missing values.
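A sketch of NA behavior on small made-up vectors. Use is.na() to test for missing values, because x == NA just returns NA.

```r
x <- c(4, NA, 10)
x + 1                  # 5 NA 11: arithmetic with NA gives NA
sum(x)                 # NA
sum(x, na.rm = TRUE)   # 14: many summary functions can drop NAs
is.na(x)               # FALSE TRUE FALSE
TRUE || NA             # TRUE: one exception, the result does not depend on NA

# NA also works in character and logical vectors
chr <- c("a", NA, "c")
lgl <- c(TRUE, NA, FALSE)
```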
Numeric Vectors and NA Entries
- Besides NA, numeric vectors can contain the special values Inf and -Inf (positive and negative infinity) and NaN (not a number); is.na() is TRUE for both NA and NaN.
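A sketch of the special numeric values and the functions that detect them.

```r
v <- c(1, 0, -1) / 0   # Inf NaN -Inf
is.infinite(v)         # TRUE FALSE TRUE
is.nan(v)              # FALSE TRUE FALSE
is.na(v)               # FALSE TRUE FALSE: NaN also counts as missing
is.na(NA_real_) && !is.nan(NA_real_)   # TRUE: NA is not NaN
```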
na.omit() / complete.cases()
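- Functions for dropping observations with missing values: na.omit() returns the dataset with incomplete rows removed; complete.cases() returns a logical vector marking the rows with no missing values, which can be used to subset (df[complete.cases(df), ]).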
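A sketch comparing the two functions on a small made-up data frame. Both keep the same rows; na.omit() also records which rows it dropped in an "na.action" attribute.

```r
df <- data.frame(id = 1:4,
                 score = c(90, NA, 75, 88),
                 grade = c("A", "B", NA, "B"))

complete.cases(df)                 # TRUE FALSE FALSE TRUE
kept1 <- df[complete.cases(df), ]  # subset with the logical vector
kept2 <- na.omit(df)               # same rows, plus an "na.action" attribute
nrow(kept2)                        # 2
```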
Package foreign
- Package that allows us to read data files from other statistical software and file formats (DBF, Stata, Epi Info, Minitab, Octave, SPSS, SAS and Systat).
- SAS files must be in transport format (.xport file) for package foreign.
- SPSS files are read with read.spss(), which returns a list unless you set the option to.data.frame = TRUE.
- Stata files can be read up to Stata 12.
Package haven
- Package that will only read data files from Stata, SPSS and SAS.
- Can also be used to write Stata and SPSS files.
- SAS files can be read without conversion (.sas7bdat files).
- SPSS files are converted into data frames automatically.
- Stata files can be read up to Stata 13.
Stata
- A competing statistical software package.
- R can both read Stata files and write data to Stata format.
read.dta()
- Function in package foreign that can read local Stata files or remotely stored Stata files using a web address.
- Returns a data frame.
read_dta() + read_stata()
- Function in package haven that can read local Stata files or remotely stored Stata files using a web address.
- Returns a data frame (specifically a tibble).
write.dta()
- Function in package foreign that will export R data to Stata.
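A round-trip sketch with package foreign: export the built-in mtcars data to a temporary .dta file with write.dta(), then read it back with read.dta(). The temporary path is an assumption of the example. A web address could be given to read.dta() in place of the path.

```r
library(foreign)   # recommended package, part of standard R installs

path <- tempfile(fileext = ".dta")
write.dta(mtcars, path)   # export to Stata format

back <- read.dta(path)    # read it back as a data frame
str(back[, 1:3])
```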
write_dta()
- Function in package haven that will export R data to Stata.
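The same round trip with package haven, assuming haven is installed. read_dta() returns a tibble, and its columns can carry extra attributes (such as a Stata display format), so the check compares plain numeric values.

```r
library(haven)   # assumes haven is installed

path <- tempfile(fileext = ".dta")
write_dta(mtcars, path)   # export to Stata format

back <- read_dta(path)    # a tibble, which is also a data frame
class(back)
```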
SPSS
- A competing statistical software package.
- R can both read SPSS files and write data to SPSS format.
- Package haven does a much better job of reading these files than package foreign.
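A round-trip sketch for SPSS with haven, assuming haven is installed. The small survey data frame is made up. write_sav() stores the factor as labelled SPSS values, read_sav() returns it as a labelled column, and as_factor() turns the value labels back into an R factor.

```r
library(haven)   # assumes haven is installed

survey <- data.frame(id = 1:3,
                     group = factor(c("ctrl", "trt", "ctrl")))

path <- tempfile(fileext = ".sav")
write_sav(survey, path)   # export to SPSS .sav format

back <- read_sav(path)                 # labelled columns on import
back$group <- as_factor(back$group)    # value labels -> R factor
```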