Handling Data Flashcards

1
Q

data()

A
  • Function that reveals R’s built-in data sets.

  • Many packages also ship their own built-in data sets.

2
Q

save()

A
  • Function that allows the selective saving of objects.
  • save(junk, junk2, file = "junky.RData") = will specifically save the objects junk and junk2 to an external file named junky.RData in the working directory.
  • There does not need to be a relationship between the external name and its contents.
3
Q

load()

A
  • Function that reloads R objects previously saved with save().
  • load("junky.RData") = reloads the objects present in junky.RData.
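A minimal sketch of a save() and load() round trip. The object names follow the card; storing the file in tempdir() is an assumption made here so the working directory stays untouched.

```r
# Two small objects to save (contents are made up for illustration).
junk  <- c(1, 2, 3)
junk2 <- data.frame(id = 1:2, label = c("a", "b"))

# Assumption: write to a temporary folder instead of the working directory.
rdata_file <- file.path(tempdir(), "junky.RData")
save(junk, junk2, file = rdata_file)

# Remove the objects, then restore them from the file.
rm(junk, junk2)
restored <- load(rdata_file)  # load() returns the names of the restored objects
print(restored)
print(junk)
```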

4
Q

load(url("website_url"))

A
  • Nested functions that let you load an R data file (.RData) from a remote location.
  • Always check the results of your remote load by reviewing the Environment tab.
5
Q

Reading Excel Files

A
  • Useful packages: XLConnect, xlsx, gdata, readxl, etc.
  • Function (package readxl) to read data from Excel into R: read_excel("file_name", sheet = number or name, col_names = TRUE, col_types = NULL, na = "", skip = number of rows to skip, 0 by default)
  • Can always learn more about this function using help(read_excel).
6
Q

Reading Text Files

A
  • Extensions: .txt, .csv, .dat, .tab.
  • In RStudio, Import Dataset in the Environment tab will load a local text file through an import dialog.
  • If the file is a Web URL:
    1) Enter the URL.
    2) Choose heading “Yes” if variable names are present.
    3) Leave Strings as Factors unchecked.
    4) Set encoding to automatic.
    5) R should then identify the separator on its own (for example, tab separated).
  • You can check your results using the View() function.
7
Q

How to Read Text Files (functions)

A

1) read.csv()
2) read.delim()
3) read.table()
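A small sketch of the three functions reading the same data. The file names and contents are made up for illustration and written to tempdir() so the example is self-contained.

```r
# Write a tiny comma-separated file and a tab-separated file.
csv_file <- file.path(tempdir(), "example.csv")
tab_file <- file.path(tempdir(), "example.txt")
writeLines(c("name,score", "Ann,90", "Bob,85"), csv_file)
writeLines(c("name\tscore", "Ann\t90", "Bob\t85"), tab_file)

# read.csv(): comma separator, header = TRUE by default.
from_csv <- read.csv(csv_file)

# read.delim(): tab separator, header = TRUE by default.
from_delim <- read.delim(tab_file)

# read.table(): separator and header must be stated explicitly here.
from_table <- read.table(tab_file, header = TRUE, sep = "\t")

print(from_csv)
```

All three calls should give the same data frame; read.csv() and read.delim() only differ from read.table() in their default arguments.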

8
Q

Reading Non-conforming Yet Formatted Data

A

1) readLines()

2) scan()
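A hedged sketch of both functions on an irregular file. The file content is invented for illustration; rows have different numbers of fields, so the read.table() family would not fit well.

```r
# An irregular text file: a note line, then rows of unequal length.
irregular_file <- file.path(tempdir(), "irregular.txt")
writeLines(c("# survey notes", "Ann 90 85", "Bob 70"), irregular_file)

# readLines() returns one character string per line, with no parsing.
raw_lines <- readLines(irregular_file)
first_two <- readLines(irregular_file, n = 2)  # read only part of the file

# scan() splits on whitespace and returns the individual fields.
# what = "" asks for character values; skip = 1 ignores the note line.
fields <- scan(irregular_file, what = "", skip = 1, quiet = TRUE)

print(raw_lines)
print(fields)
```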

9
Q

read.csv("url_name")

A
  • Function used for reading comma-separated files (typically with a .csv extension).
10
Q

read.delim("url_name")

A
  • Function used for reading delimited text files; the default separator is a tab.
11
Q

read.table("url_name")

A
  • Function used for almost any type of text file, as long as a separator exists.
  • More general than either read.csv() or read.delim(), which call read.table() with different defaults.
12
Q

readLines("url_name")

A
  • Function that will read all or part of a text file.
  • Useful for data files that are irregular, have no delimiter (commas or a separator) or do not conform to a standard format.
  • Will read virtually any file.
13
Q

scan("url_name")

A
  • Function similar to readLines() but able to keep a record of the structure or patterning in the data if you need to keep that information.
  • More restrictive than readLines().
14
Q

NA / Missing Values

A
  • A place within a vector may be reserved for the missing element by assigning the special value NA.
  • Usually any operation involving NA results in an NA.
  • All types of vectors (character, logical, numeric) can use NA to represent missing values.
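A short sketch of how NA behaves in different vector types. The vectors are made up for illustration.

```r
# NA reserves a place for a missing element in any type of vector.
scores    <- c(90, NA, 85)          # numeric
names_vec <- c("Ann", NA, "Bob")    # character
flags     <- c(TRUE, NA, FALSE)     # logical

# Operations involving NA usually result in NA.
total <- sum(scores)

# Many functions can be told to ignore missing values.
total_known <- sum(scores, na.rm = TRUE)

print(total)
print(total_known)
print(is.na(names_vec))
```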
15
Q

Numeric Vectors and NA Entries

A
  • Besides NA, numeric vectors can contain the symbols Inf and -Inf (positive and negative “infinity”) and NaN (not a number).
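A brief sketch of where these special values come from and how they relate to NA; the variable names are chosen here for illustration.

```r
pos_inf <- 1 / 0     # Inf
neg_inf <- -1 / 0    # -Inf
not_num <- 0 / 0     # NaN

# NaN counts as missing for is.na(), but NA is not NaN.
nan_is_na <- is.na(not_num)
na_is_nan <- is.nan(NA)

# Infinite values are neither missing nor finite.
inf_is_na     <- is.na(pos_inf)
inf_is_finite <- is.finite(pos_inf)

print(c(pos_inf, neg_inf, not_num))
```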
16
Q

na.omit() / complete.cases()

A
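  • na.omit() removes observations (rows) with missing values from a dataset.
  • complete.cases() returns a logical vector marking the rows without missing values, which can then be used to keep only those rows.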
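A minimal sketch of both functions on a made-up data frame (the name survey and its columns are illustrative only).

```r
survey <- data.frame(
  name  = c("Ann", "Bob", "Cy"),
  score = c(90, NA, 85)
)

# na.omit() drops every row that contains a missing value.
cleaned <- na.omit(survey)

# complete.cases() flags the rows with no missing values.
complete <- complete.cases(survey)

# The flags can be used to filter the rows by hand.
filtered <- survey[complete, ]

print(cleaned)
print(complete)
```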
17
Q

Package foreign

A
  • Package that allows us to read data files written by competitors of R (DBF, Stata, Epi Info, Minitab, Octave, SPSS, SAS and Systat).
  • SAS files must be in transport (XPORT) format for package foreign.
  • SPSS files require the option to.data.frame = TRUE to be converted into data frames.
  • Stata files can be read up to Stata 12.
18
Q

Package haven

A
  • Package that will only read data files from Stata, SPSS and SAS.
  • Can also be used to write Stata and SPSS files.
  • SAS files can be read without conversion (.sas7bdat files).
  • SPSS files are converted into data frames automatically.
  • Stata files can be read up to Stata 13.
19
Q

Stata

A
  • A competing statistical programming software.

  • R can both read and write Stata files.

20
Q

read.dta()

A
  • Function in package foreign that can read local Stata files or remotely stored Stata files using a web address.
  • Returns a data frame.
21
Q

read_dta() + read_stata()

A

  • Functions in package haven that can read local Stata files or remotely stored Stata files using a web address.
  • Return a data frame.

22
Q

write.dta()

A
  • Function in package foreign that will export R data to Stata.
23
Q

write_dta()

A
  • Function in package haven that will export R data to Stata.
24
Q

SPSS

A
  • A competing statistical programming software.
  • R can both read and write SPSS files.
  • Package haven does a much better job of reading these files than package foreign.
25
Q

read.spss()

A
  • Function in package foreign that is used to convert SPSS data files into R objects.
  • Can read local or remotely stored files.
  • Needs the option to.data.frame = TRUE to make the resulting R object a data frame; otherwise read.spss() returns a list.
26
Q

read_spss() + read_sav()

A
  • Functions in package haven that read SPSS data files and convert them into data frames.
27
Q

write_sav()

A
  • Function in package haven that can write SPSS files.

  • Package foreign cannot write SPSS files.

28
Q

Differences Between Packages Foreign and Haven (SPSS)

A
  • Package foreign preserves the labels, but haven converts the variables into numbers and stores the labels alongside them.
  • Can convert these numbers back into labelled values (factors) using the haven function as_factor().
29
Q

SAS

A
  • A competing statistical programming software.
  • Package foreign can read SAS data files, but those files must be in a portable (transport) format created by SAS software, or the user must have a copy of SAS on the local computer.
30
Q

read_sas()

A
  • Function in package haven that imports standard SAS files directly to an R data frame.
  • Does not require SAS.
31
Q

write.table(object_to_save, file = "name_of_file", sep = "some separator, like tab (\t)", row.names = FALSE (will not add a column of row numbers))

A
  • Function that will coerce an R object to be a data frame (if it isn’t already one) and then save it as an external text file that can be imported into non-R applications.
  • Columns of the written data frame are space delimited by default, but the separator can be changed to tab, comma or virtually anything.
  • Missing values default to NA which can also be changed.
32
Q

write.csv(object_to_save, file = "name_of_file", row.names = FALSE)

A
  • Function that is a special case of write.table().
  • When used, the result will be a comma separated values file.
  • .csv files are the common format for data that is to be exchanged between software.
33
Q

Results of write.table() or write.csv()

A
  • Tab separated (delimited) files will usually have character values quoted and a “jagged” appearance, with what appear to be spaces between variables (fields).
  • Comma separated (delimited) files will also have character values quoted and look “jagged”, but the commas make the separation between the fields easier to see.
  • Both will typically have a first row which has the names of the columns.
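A small sketch of what the two functions actually write, assuming a made-up data frame named junk and temporary file names chosen for illustration.

```r
junk <- data.frame(name = c("Ann", "Bob"), score = c(90, NA))

tab_file <- file.path(tempdir(), "junk.txt")
csv_file <- file.path(tempdir(), "junk.csv")

# Tab separated, without the column of row names.
write.table(junk, file = tab_file, sep = "\t", row.names = FALSE)

# Comma separated; write.csv() fixes the separator for you.
write.csv(junk, file = csv_file, row.names = FALSE)

# Inspect the raw text that was written.
tab_lines <- readLines(tab_file)
csv_lines <- readLines(csv_file)
print(csv_lines)
```

The first line of each file holds the column names, character values are quoted, and the missing score is written as NA.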
34
Q

subset(x, subset, select, drop = FALSE, …)

A
  • Function that generates subsets of a data frame or matrix based upon certain conditions.
  • x: the data frame / object.
  • subset: a condition you want to impose on the rows.
  • select: which columns to keep.
  • drop: passed on to the [ indexing operator; controls whether the result may be simplified (for example, to a vector when one column remains).
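A minimal sketch of subset() on a made-up data frame; the column names are illustrative only.

```r
survey <- data.frame(
  name  = c("Ann", "Bob", "Cy"),
  age   = c(31, 25, 40),
  score = c(90, 70, 85)
)

# Keep the rows where score exceeds 80, and only two of the columns.
high <- subset(survey, subset = score > 80, select = c(name, score))

print(high)
```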
35
Q

Substitutions

A
  • Can change entries of a data frame or matrix directly by assigning the desired values to the indexed positions.
36
Q

is.na()

A
  • Function that identifies the NA’s in an object (returns TRUE where a value is missing); combined with indexing, it lets you replace all NA’s with something of your choice.
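A short sketch of replacing missing values with is.na() and indexing. The vector and the replacement value 0 are assumptions chosen for illustration.

```r
scores <- c(90, NA, 85, NA)

# is.na() returns TRUE at the positions that are missing.
missing <- is.na(scores)

# Assigning into those positions replaces every NA at once.
scores[missing] <- 0

print(missing)
print(scores)
```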
37
Q

$

A
  • Operator that can be used to add new columns to a data frame or to extract the column with the desired name.
  • data_frame_name$column_name
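A minimal sketch of both uses of $, assuming a made-up data frame named junk; the pass mark of 75 is invented for illustration.

```r
junk <- data.frame(Name = c("Ann", "Bob"), score = c(90, 70))

# Extract an existing column by name.
names_only <- junk$Name

# Assigning to a name that does not exist yet adds a new column.
junk$passed <- junk$score >= 75

print(junk)
```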
38
Q

Adding new columns + Combining vectors with cbind() + Adding new rows + Combining data frames with rbind()

A
  • Can use cbind() and rbind() to add new columns or rows to a data frame.
  • junk$Name
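A small sketch of growing a data frame in both directions. The data frame junk, the vector age and the extra row are made up for illustration.

```r
junk <- data.frame(Name = c("Ann", "Bob"), score = c(90, 70))

# cbind() adds a vector as a new column (it must have one value per row).
age   <- c(31, 25)
wider <- cbind(junk, age)

# rbind() adds rows; the new data frame needs the same column names.
new_row <- data.frame(Name = "Cy", score = 85)
longer  <- rbind(junk, new_row)

print(wider)
print(longer)
```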
39
Q

Merging Data Frames

A
  • Can merge data frames with merge(), but they must share matching information (for example, a common key column).
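A hedged sketch using base R's merge(); the two data frames and the key column id are assumptions made for illustration.

```r
people <- data.frame(id = c(1, 2, 4), name  = c("Ann", "Bob", "Dee"))
scores <- data.frame(id = c(1, 2, 3), score = c(90, 70, 85))

# By default only the rows whose key appears in both data frames are kept.
both <- merge(people, scores, by = "id")

# all = TRUE keeps unmatched rows too and fills the gaps with NA.
all_rows <- merge(people, scores, by = "id", all = TRUE)

print(both)
print(all_rows)
```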
40
Q

List Extraction Methods

A
  • Can use $, [] or [[]].
  • [] will return a sub-list containing the selected components.
  • [[]] and $ will extract the component itself (for example, a vector).
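A minimal sketch comparing the three extraction methods on a made-up list named record.

```r
record <- list(name = "Ann", scores = c(90, 85))

sub_list  <- record["scores"]     # still a list, of length 1
by_double <- record[["scores"]]   # the numeric vector itself
by_dollar <- record$scores        # same result as [["scores"]]

print(class(sub_list))
print(by_double)
```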
41
Q

Web Scraping

A
  • Needs the XML package.
  • Try the readHTMLTable() function first = parses an HTML page and retrieves the table elements.
  • Utilize readHTMLTable("url_name", stringsAsFactors = FALSE).
42
Q

unlist()

A
  • Function that converts a list into a vector.
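A short sketch of unlist() on a made-up list, showing how component names carry over to the resulting vector.

```r
record <- list(a = 1, b = c(2, 3))

# unlist() flattens the list into a single named vector.
flat <- unlist(record)

print(flat)
print(names(flat))
```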
43
Q

Package rvest

A
  • Package that also contains many tools to scrape webpages.