Handling Data Flashcards
1
Q
data()
A
- Function that reveals R’s built-in data sets.
- Many packages ship their own built-in data sets; data(package = "package_name") lists them.
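- Example (a minimal sketch; mtcars and the datasets package are used only as convenient illustrations, they are not named on the card):

```r
# data() returns a listing of the data sets in the attached packages.
available <- data()
head(available$results[, c("Package", "Item")])

# Restrict the listing to one package.
in_datasets <- data(package = "datasets")
nrow(in_datasets$results)

# Load one built-in data set by name and inspect its structure.
data("mtcars")
str(mtcars)
```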
2
Q
save()
A
- Function that allows the selective saving of objects.
- save(junk, junk2, file = "junky.RData") = specifically saves the objects junk and junk2 to an external file named junky.RData in the working directory.
- The file name does not need to match the names of the objects it contains.
3
Q
load()
A
- Function that loads a saved R object.
- load("junky.RData") = reloads the objects present in junky.RData
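- Example (a minimal round-trip sketch; the object contents and the use of tempdir() instead of the working directory are assumptions made for illustration):

```r
# Two example objects; the names junk and junk2 follow the save() card.
junk  <- c(1, 2, 3)
junk2 <- data.frame(id = 1:2, label = c("a", "b"))

# Save both objects to a single file in a temporary directory.
rdata_path <- file.path(tempdir(), "junky.RData")
save(junk, junk2, file = rdata_path)

# Remove the objects, then restore them from the file.
rm(junk, junk2)
restored <- load(rdata_path)  # load() returns the names of the restored objects
print(restored)
```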
4
Q
load(url("website_url"))
A
- Nested functions that load an R data set (.RData file) from a remote location.
- Always check the results of a remote load by reviewing the Environment tab in RStudio.
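- Example (a sketch only: a file:// URL pointing at a temporary file stands in for a real web address so it can run offline; remote_demo and the path handling are assumptions, not part of the card):

```r
# Create a small .RData file that plays the role of the remote data set.
remote_demo <- data.frame(x = 1:3)
local_file <- tempfile(fileext = ".RData")
save(remote_demo, file = local_file)
rm(remote_demo)

# Turn the local path into a file:// URL (an http(s):// URL is used the same way).
path <- normalizePath(local_file, winslash = "/")
if (!startsWith(path, "/")) path <- paste0("/", path)  # Windows drive paths
demo_url <- paste0("file://", path)

# load() accepts a connection, so url() can be nested inside it.
con <- url(demo_url)
loaded_names <- load(con)
close(con)

# Check what arrived, as the card recommends.
print(loaded_names)
```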
5
Q
Reading Excel Files
A
- Useful packages: XLConnect, xlsx, gdata, readxl, etc.
- Function to read data from Excel into R (readxl package): read_excel("file_name", sheet = number or name, col_names = TRUE, col_types = NULL, na = "", skip = number of rows to skip)
- Can always learn more about this function using help(read_excel).
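- Example (a minimal sketch, assuming the readxl package is installed; it reads the sample workbook bundled with readxl rather than a file of your own):

```r
# The block is skipped when readxl is not installed.
if (requireNamespace("readxl", quietly = TRUE)) {
  # readxl_example() returns the path to a sample workbook shipped with readxl.
  xlsx_path <- readxl::readxl_example("datasets.xlsx")

  # List the sheet names before choosing one.
  print(readxl::excel_sheets(xlsx_path))

  # Read the first sheet; the first row supplies the column names.
  sheet1 <- readxl::read_excel(xlsx_path, sheet = 1, col_names = TRUE,
                               na = "", skip = 0)
  print(head(sheet1))
}
```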
6
Q
Reading Text Files
A
- Extensions: .txt, .csv, .dat, .tab.
- From the Import Dataset button on RStudio's Environment tab, a local text file can be loaded through the import dialog.
- If the file is a Web URL:
1) Enter the URL.
2) Choose heading “Yes” if variable names are present.
3) Strings as Factors unchecked.
4) Set encoding to automatic.
5) R will correctly identify the dataset as tab separated.
- You can check your results using the View() function.
7
Q
How to Read Text Files (functions)
A
1) read.csv()
2) read.delim()
3) read.table()
8
Q
Reading Non-conforming Yet Formatted Data
A
1) readLines()
2) scan()
9
Q
read.csv("url_name")
A
- Function used for reading comma-separated files (typically with a .csv extension).
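- Example (a minimal sketch; the file contents are made up and written to a temporary file so the call has something to read):

```r
# Write a small comma-separated file to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "ann,90", "bob,85"), csv_path)

# read.csv() assumes a header row and a comma separator by default.
scores <- read.csv(csv_path, stringsAsFactors = FALSE)
str(scores)
```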
10
Q
read.delim("url_name")
A
- Function used for reading delimited text files; the default separator is a tab.
11
Q
read.table("url_name")
A
- Function used for almost any type of text file as long as a separator exists; more general than read.csv() or read.delim(), which are wrappers around it with preset defaults.
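- Example (a minimal sketch with made-up data; it contrasts read.table(), where the header and separator are stated explicitly, with read.delim() and its tab default):

```r
# One tab-separated file and one semicolon-separated file.
tab_path  <- tempfile(fileext = ".txt")
semi_path <- tempfile(fileext = ".dat")
writeLines(c("id\tvalue", "1\t2.5", "2\t3.5"), tab_path)
writeLines(c("id;value", "1;2.5", "2;3.5"), semi_path)

# read.delim() defaults to header = TRUE and sep = "\t".
from_delim <- read.delim(tab_path)

# read.table() defaults to header = FALSE and whitespace separation,
# so both are spelled out here.
from_table_tab  <- read.table(tab_path, header = TRUE, sep = "\t")
from_table_semi <- read.table(semi_path, header = TRUE, sep = ";")

print(from_table_semi)
```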
12
Q
readLines("url_name")
A
- Function that will read all or part of a text file.
- Useful for data files that are irregular, have no delimiter (commas or a separator) or do not conform to a standard format.
- Will read virtually any file.
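- Example (a minimal sketch; the irregular file contents are invented for illustration):

```r
# An irregular text file: a title, a blank line, then unevenly formatted lines.
raw_path <- tempfile(fileext = ".txt")
writeLines(c("Survey notes, wave 2", "", "ann: 90 | 85", "bob: 70"), raw_path)

# Read the whole file; the result is a character vector, one element per line.
all_lines <- readLines(raw_path)
length(all_lines)

# Read only part of the file with the n argument.
first_line <- readLines(raw_path, n = 1)
print(first_line)
```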
13
Q
scan("url_name")
A
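- Function similar to readLines(), but it splits the input into fields and can keep the structure or patterning of the data (for example, field types given through the what argument) if you need that information.
- More restrictive than readLines(): the input must match the expected field types.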
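- Example (a minimal sketch with made-up files; it shows the default numeric read and a read that keeps a repeating name/score pattern through the what argument):

```r
# Whitespace-separated numbers spread unevenly over several lines.
values_path <- tempfile(fileext = ".txt")
writeLines(c("1 2 3", "4 5", "6"), values_path)

# Default: every field is read as a number into one vector.
nums <- scan(values_path, quiet = TRUE)

# A list passed as `what` describes the repeating fields and their types.
pairs_path <- tempfile(fileext = ".txt")
writeLines(c("ann 90", "bob 85"), pairs_path)
records <- scan(pairs_path, what = list(name = "", score = 0), quiet = TRUE)

print(nums)
print(records)
```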
14
Q
NA / Missing Values
A
- A place within a vector may be reserved for the missing element by assigning the special value NA.
- Usually any operation involving NA results in an NA.
- All types of vectors (character, logical, numeric) can use NA to represent missing values.
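- Example (a minimal sketch with made-up values, showing how NA propagates and how it appears in different vector types):

```r
# A numeric vector with one missing element.
x <- c(4, NA, 10)

x + 1                               # arithmetic keeps the NA in place
plain_mean <- mean(x)               # NA, because one element is missing
clean_mean <- mean(x, na.rm = TRUE) # drops the NA before averaging
missing_at <- is.na(x)              # logical vector marking missing elements

# NA also works in character and logical vectors.
chars <- c("a", NA)
flags <- c(TRUE, NA)
typeof(chars)
typeof(flags)
```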
15
Q
Numeric Vectors and NA Entries
A
- Numeric vectors can also contain the special values Inf and -Inf (positive and negative "infinity") and NaN (not a number).
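- Example (a minimal sketch of where these values come from and how to test for them):

```r
pos_inf <- 1 / 0    # Inf
neg_inf <- -1 / 0   # -Inf
not_num <- 0 / 0    # NaN

values <- c(1, pos_inf, neg_inf, not_num, NA)

finite_flags <- is.finite(values)  # TRUE only for ordinary numbers
nan_flags    <- is.nan(values)     # TRUE only for NaN, not for NA
na_flags     <- is.na(values)      # TRUE for both NaN and NA
```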
16
Q
na.omit() / complete.cases()
A
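- na.omit() removes observations (rows) with missing values from a dataset.
- complete.cases() returns a logical vector marking which observations have no missing values; use it as a row index to keep only those rows.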
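- Example (a minimal sketch; the survey data frame is invented for illustration):

```r
# Four observations; rows 2 and 3 each have one missing value.
survey <- data.frame(
  id    = 1:4,
  age   = c(23, NA, 31, 40),
  score = c(88, 92, NA, 75)
)

# complete.cases() flags the rows without any NA.
complete_rows <- complete.cases(survey)
kept_by_index <- survey[complete_rows, ]

# na.omit() drops the incomplete rows directly.
kept_by_omit <- na.omit(survey)

print(kept_by_omit)
```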
17
Q
Package foreign
A
- Package that allows us to read data files written by other statistical software (DBF, Stata, Epi Info, Minitab, Octave, SPSS, SAS and Systat).
- SAS files must be in transport format (XPORT, typically a .xpt file) for package foreign; read.xport() reads them.
- SPSS files are returned as a list by default; use read.spss() with the option to.data.frame = TRUE to get a data frame.
- Stata files can be read with read.dta() up to Stata 12.
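- Example (a minimal sketch, assuming the foreign package is installed; since no real Stata file is at hand, it writes a made-up data frame with write.dta() and reads it back; the .sav and .xpt file names in the comments are hypothetical):

```r
# The block is skipped when foreign is not installed.
if (requireNamespace("foreign", quietly = TRUE)) {
  example_df <- data.frame(id = c(1, 2, 3), weight = c(60.5, 72.1, 81.0))

  # Write a Stata file to a temporary path, then read it back.
  dta_path <- tempfile(fileext = ".dta")
  foreign::write.dta(example_df, dta_path)
  from_stata <- foreign::read.dta(dta_path)
  print(from_stata)
}

# The other readers follow the same pattern (hypothetical file names):
# spss_df <- foreign::read.spss("survey.sav", to.data.frame = TRUE)
# sas_df  <- foreign::read.xport("survey.xpt")
```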