3 | R: Data Flashcards

Question

R: How can you make one dataframe out of two? Give an example

Answer 1

```
> load('../../../data/authors.RData')
> authors
   surname nationality deceased
1    Tukey          US      yes
2 Venables   Australia       no
3  Tierney          US       no
4   Ripley          UK       no
5   McNeil   Australia       no
> head(books,4)
      name                         title other.author
1    Tukey     Exploratory Data Analysis
2 Venables Modern Applied Statistics ...       Ripley
3  Tierney                     LISP-STAT
4   Ripley            Spatial Statistics
> merge(authors,books,by.x="surname",by.y="name")
   surname nationality deceased                         title other.author
1   McNeil   Australia       no     Interactive Data Analysis
2   Ripley          UK       no            Spatial Statistics
3   Ripley          UK       no         Stochastic Simulation
4  Tierney          US       no                     LISP-STAT
5    Tukey          US      yes     Exploratory Data Analysis
6 Venables   Australia       no Modern Applied Statistics ...       Ripley
```
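
The session above depends on a local .RData file, so here is a minimal self-contained sketch of the same idea. The two small data frames are made up for illustration and only mimic the authors/books data. The all = TRUE call is an optional extra, not part of the original card, showing how to keep rows without a match.

```r
# Hypothetical miniature versions of the two data frames
authors <- data.frame(surname = c("Tukey", "Ripley", "McNeil"),
                      nationality = c("US", "UK", "Australia"))
books <- data.frame(name = c("Tukey", "Ripley", "Ripley"),
                    title = c("Exploratory Data Analysis",
                              "Spatial Statistics",
                              "Stochastic Simulation"))

# Inner join: only authors that have at least one book
inner <- merge(authors, books, by.x = "surname", by.y = "name")

# all = TRUE also keeps unmatched rows (McNeil gets NA as title)
outer <- merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)
```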

Answer 2

matrices
* always 2-dimensional
* only one type (usually numeric)

Answer 3

> mt=as.matrix(mt)

Answer 4

No, this gives the error "$ operator is invalid for atomic vectors". Matrices are internally saved as vectors (very efficient), so the $ operator doesn't work. You must use brackets [ ] with col/row names or indices.
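
A minimal sketch of bracket indexing on a matrix. The matrix m and its dimnames are made up for illustration; the try() call is only there to show that $ fails without stopping the script.

```r
# Small made-up matrix with row and column names
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("a", "b"), c("x", "y", "z")))

col_by_name <- m[, "y"]   # column selected by name
cell_by_idx <- m[2, 3]    # single cell selected by row/column index

# m$y raises an error for a matrix; try() captures it instead of stopping
dollar_fails <- inherits(try(m$y, silent = TRUE), "try-error")
```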

Answer 5

* Using aggregate makes no sense here because we only have 1 type
* There is no column with categories
* Remember: all columns in a matrix must have the same type

Answer 6

> head(apply(mt,1,sum),8) # row sums (first 8 values)

> head(apply(mt,2,sum),5) # column sums (first 5 values)
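
The same pattern on a tiny made-up matrix, as a sketch: margin 1 runs the function over rows, margin 2 over columns. rowSums() and colSums() are base R shortcuts that are not mentioned on the card but return the same numbers.

```r
# Made-up 2 x 3 matrix; columns are (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("a", "b"), c("x", "y", "z")))

row_sums <- apply(m, 1, sum)   # margin 1 = one sum per row
col_sums <- apply(m, 2, sum)   # margin 2 = one sum per column
```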

Answer 7

lapply

list apply = apply a function to every list element

advantage: no explicit loop over the elements is needed --> more compact (and often faster) code

```
> childs=list(Fritz=c("Max","Moritz"),
+     Klaus=c("Otto","Emi","Karl","Lotta"))
> lapply(childs,length)
$Fritz
[1] 2

$Klaus
[1] 4

> lapply(childs,length)$Klaus
[1] 4
```

rapply (recursive apply, for nested lists)

```
> nc.childs=list(Fritz=list(Gerda=c("Max","Moritz"),Frieda=
+     c("Else")),Klaus=list(Marlene=c("Otto","Emi","Karl","Lotta")))
> rapply(nc.childs,length)
  Fritz.Gerda  Fritz.Frieda Klaus.Marlene
            2             1             4
```

Answer 8

```
> D %*% M
```

issue:

```
> N = D %*% M
> identical(M, N) # FALSE arrrghhh! Floating point issue!
[1] FALSE
> N == M
```

Rounding issues! The representation of floats is not 100% exact.

solution:

```
> all.equal(M, N) # internal small rounding
```
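
The matrices D and M from the card are not available here, so the following sketch uses made-up values that trigger the same effect. Wrapping all.equal() in isTRUE() is a common precaution, because all.equal() returns a description of the differences rather than FALSE when the objects differ.

```r
# Made-up matrices: mathematically equal, but N is computed from sums
M <- matrix(c(0.3, 0.6, 0.9, 1.2), nrow = 2)
N <- matrix(c(0.1 + 0.2, 0.2 + 0.4, 0.3 + 0.6, 0.4 + 0.8), nrow = 2)

same_bits  <- identical(M, N)          # exact, bit-for-bit comparison
same_value <- isTRUE(all.equal(M, N))  # comparison with a small tolerance
```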

Answer 9

table, ftable, apply, matrix (and then table)

Answer 10

* contingency tables for counts
* each combination of factor levels is counted
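
A small sketch of such a count. The survey data frame and its columns are invented for illustration.

```r
# Made-up data: two categorical variables
survey <- data.frame(
  gender = factor(c("F", "M", "F", "F", "M")),
  smoker = factor(c("no", "no", "yes", "no", "yes"))
)

# One cell per combination of factor levels, holding the number of rows
counts <- table(survey$gender, survey$smoker)
```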

Answer 11

A table is not a data frame but a contingency table, which contains the counts of items for the different categories

Answer 12

str: str displays the structure of R objects. It is mostly used for displaying the contents of a list. str() is an alternative to summary() for getting a compact overview, especially when the data set is huge or has more than two dimensions, e.g.:

```
> str(Titanic)
 'table' num [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
 - attr(*, "dimnames")=List of 4
  ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
  ..$ Sex     : chr [1:2] "Male" "Female"
  ..$ Age     : chr [1:2] "Child" "Adult"
  ..$ Survived: chr [1:2] "No" "Yes"
```

Answer 13

It 'flattens' data so that it will have 2 dimensions, e.g.:

```
> Titanic[c("1st","2nd"),"Male",,]
, , Survived = No

     Age
Class Child Adult
  1st     0   118
  2nd     0   154

, , Survived = Yes

     Age
Class Child Adult
  1st     5    57
  2nd    11    14

> ftable(Titanic[c("1st","2nd"),"Male",,])
          Survived  No Yes
Class Age
1st   Child          0   5
      Adult        118  57
2nd   Child          0  11
      Adult        154  14
```

Answer 14

prop.table(table)
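
A short sketch with made-up data. The margin argument, which is not mentioned on the card, computes the proportions per row (1) or per column (2) instead of over the whole table.

```r
# Overall proportions of a one-dimensional table
counts <- table(c("a", "b", "a", "a"))
shares <- prop.table(counts)

# Row-wise proportions of a two-dimensional table
tab2 <- table(c("x", "x", "y"), c("u", "v", "u"))
row_shares <- prop.table(tab2, margin = 1)   # each row sums to 1
```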

Answer 15

changes the scope of variables: within the parentheses of the current with() call, the inner variables (e.g. the columns of a data frame) can be used as if they were in the global scope
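
A minimal sketch, assuming a made-up data frame df: inside with() the columns can be named directly, and the result is the same as with the $ notation.

```r
# Made-up data frame
df <- data.frame(height = c(170, 180, 190), weight = c(65, 80, 95))

# Columns are visible by name only inside the with() call
bmi_with   <- with(df, weight / (height / 100)^2)

# Equivalent computation with explicit df$ prefixes
bmi_dollar <- df$weight / (df$height / 100)^2
```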

Answer 16

with, attach, detach

Answer 17

attach permanently imports the inner variables into the global scope (technically, it puts them on the search path).

detach forgets the imported variables.

Hint: don’t use attach and detach

Answer 18

intersect(), union(), setdiff(), setequal()
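
The four set functions on two made-up vectors, as a quick sketch:

```r
a <- c(1, 2, 3, 4)
b <- c(3, 4, 5)

both   <- intersect(a, b)             # elements in a and in b
either <- union(a, b)                 # elements in a or in b, no duplicates
only_a <- setdiff(a, b)               # elements of a that are not in b
same   <- setequal(a, c(4, 3, 2, 1))  # same elements, order is ignored
```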

Answer 19

* confusing feature of R
* && works on the first vector element only
* returns FALSE here: the condition is not TRUE (52)
* speeds things up by not going through long vectors
* but since R 4.3 this is an error
* good - I mostly did this by accident!! In many languages we use &&, but don't use it on vectors in R
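
A sketch of the safe alternatives, using a made-up vector x: & compares element by element, any() or all() collapse the result to a single value, and && is kept for single values such as conditions inside if().

```r
x <- c(5, 60, 70)

elementwise <- x > 10 & x < 65    # one TRUE/FALSE per element
any_hit <- any(x > 10 & x < 65)   # collapsed to a single TRUE/FALSE

# && with single values only; the right side is evaluated
# only if the left side is TRUE (short-circuit)
ok <- length(x) > 0 && x[1] < 10
```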

Answer 20

write.table → writes the data frame to a text file, here with tab as the separator

save → saves the object in binary format - can't be inspected from the terminal. About 1/3 of the size of the tabular file

```
> head(nym[order(nym$age),],n=2)
    place gender age home     time
116 23373   Male  18  MEX 408.3333
182  8823 Female  20  MEX 244.8833
> nym2=nym[order(nym$age),]
> write.table(nym2,file="nym2.tab",sep="\t",quote=FALSE)
> save(nym2,file="nym2.RData")
```
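
The nym data set is not available here, so this sketch uses a made-up data frame and temporary files to show the difference: the .tab file can be read as plain text, while the .RData file has to be loaded. Loading into a separate environment is an extra precaution, not part of the card, that avoids overwriting existing variables.

```r
# Made-up data frame and temporary file names
df <- data.frame(id = 1:3, home = c("MEX", "USA", "GER"))
tab_file   <- tempfile(fileext = ".tab")
rdata_file <- tempfile(fileext = ".RData")

write.table(df, file = tab_file, sep = "\t", quote = FALSE)  # plain text
save(df, file = rdata_file)                                  # binary

# The text file can be inspected line by line
first_line <- readLines(tab_file, n = 1)

# load() restores the object under its original name and returns that name
env <- new.env()
loaded_names <- load(rdata_file, envir = env)
```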

Answer 21

read.table always produces a data frame, so the result needs to be converted back to a matrix

```
> mt[1:2,1:4]
        RedMeat WhiteMeat Eggs Milk
Albania    10.1       1.4  0.5  8.9
Austria     8.9      14.0  4.3 19.9
> write.table(mt,file="protein-consumption2.tab",
+     sep="\t",quote=FALSE)
> mt2=read.table("protein-consumption2.tab", header=TRUE,
+     stringsAsFactors=TRUE)
> mt2[1:2,1:4]
        RedMeat WhiteMeat Eggs Milk
Albania    10.1       1.4  0.5  8.9
Austria     8.9      14.0  4.3 19.9
> class(mt)
[1] "matrix" "array"
> class(mt2)
[1] "data.frame"
> mt2=as.matrix(mt2)
> mt2[1:2,1:4]
        RedMeat WhiteMeat Eggs Milk
Albania    10.1       1.4  0.5  8.9
Austria     8.9      14.0  4.3 19.9
> class(mt2)
[1] "matrix" "array"
> sum(mt2-mt)
[1] 0
> identical(mt2,mt) # or use all.equal to be sure
[1] TRUE
```

Answer 22

use for tables with more than 2 dimensions - flattens them - better way to present data, e.g.:

```
> ftable(Titanic["1st",,,])
> write.ftable(ftable(Titanic["1st",,,]),
+     file="ftable.ftab")
> sam=read.ftable("ftable.ftab")
> sam
```

Answer 23

The file uses commas as decimal separators instead of the usual dot (.). When read.table() is used without specifying the decimal separator, R cannot interpret values such as 3,14 as numbers, so the affected columns are read as characters instead of numbers. The issue is fixed by explicitly setting dec=',' in read.table(), telling R that commas indicate decimal points.
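
A minimal sketch of the effect, using a made-up tab-separated snippet passed through the text argument instead of a real file:

```r
# Made-up tab-separated data with decimal commas
txt <- "x\ty\n1,5\t2,25\n3,0\t4,75\n"

# Without dec: the values cannot be parsed as numbers
wrong <- read.table(text = txt, header = TRUE, sep = "\t")

# With dec = ",": the columns become numeric
right <- read.table(text = txt, header = TRUE, sep = "\t", dec = ",")
```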

Answer 24

"Serialization Interface for Single Objects" - R's own data file format

saveRDS(object, file="filename.RDS") saves a single object to a file. readRDS("filename.RDS") returns the object so that it can be assigned to a variable of your choice (does not change any existing variables). Use saveRDS() and readRDS() for single objects when you want explicit assignment and to avoid accidental overwrites.

Yes, RDS files save only single objects - but you can put several objects into a list and save that!
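
A round-trip sketch with a temporary file. The list bundle and its elements are made up, and show how several objects can travel in one RDS file.

```r
rds_file <- tempfile(fileext = ".rds")

# Several objects bundled into one list
bundle <- list(numbers = 1:3, label = "demo")
saveRDS(bundle, file = rds_file)

# The variable name is chosen on reading; nothing else is overwritten
restored <- readRDS(rds_file)
```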

Answer 25

it will overwrite variables if they exist

Answer 26

many different packages for this, e.g.:

```
> install.packages('openxlsx')
> library(openxlsx)
> sample=read.xlsx("../../../data/sample.xlsx")
```

Answer 27

a. Tab files – you can use the read.table command

b. RData files – you can use the load command to import data in RData files

c. RDS files – readRDS is a built-in function

The commands load, read.table and readRDS can be used to import RData, Tab and RDS files into R without installing additional packages.

Answer 28

loads a single object into a given variable name

Answer 29

loads a single object without variable assignment

Answer 30

loads data from a flat text file

Answer 31

loads and executes R code from a flat file

Answer 32

loads old session R commands into the current session

Answer 33

order – indices will be returned (sort → values)
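
A short sketch of the difference on a made-up vector: indexing with the result of order() gives the same values as sort().

```r
age <- c(30, 18, 25)

idx    <- order(age)   # positions of the elements in sorted order
sorted <- sort(age)    # the sorted values themselves
```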

Answer 34

To get the average value of all columns of a numerical matrix we usually use the apply function together with the mean function, whereas for calculating group based means of a data frame for a numerical vector in this data frame against a factor vector in the data frame we use the aggregate function. To add new rows for both data structures we use the rbind function whereas for new columns we use the cbind function. How many rows and columns are in both structures we can find out using the dim function.
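
The functions named above, sketched on made-up data. The matrix m, the data frame df and its columns time and gender are invented for illustration.

```r
# Column means of a numeric matrix; columns are (1,2,3) and (4,5,6)
m <- matrix(1:6, nrow = 3)
col_means <- apply(m, 2, mean)

# Group-based means: numeric column against a factor column
df <- data.frame(time = c(10, 20, 30, 50),
                 gender = factor(c("F", "F", "M", "M")))
group_means <- aggregate(time ~ gender, data = df, FUN = mean)

# Adding a row and a column, then checking the dimensions with dim()
m_more_rows <- rbind(m, c(7, 8))
m_more_cols <- cbind(m, 7:9)
```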

Answer 35

To change the name of the last column of a data frame df to ‘last’ we use the following construct:

```
colnames(df)[length(df)]="last"
```

Answer 36

To remove the column with the name ‘last’ from the df data frame we use the following code:

```
df$last=NULL
```
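
Renaming and then removing the last column, sketched on a made-up data frame. ncol(df) is used here as a more explicit alternative; for a data frame it equals length(df).

```r
# Made-up data frame with three columns
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)

colnames(df)[ncol(df)] <- "last"   # rename the last column
names_after_rename <- colnames(df)

df$last <- NULL                    # assigning NULL drops the column
names_after_drop <- colnames(df)
```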

Answer 37

intersect()

Answer 38

To get the elements of vector 1 which are not in vector 2 we can use the setdiff command: setdiff()

Answer 39

To display tables with more than two dimensions we use the ftable command: ftable()

Answer 40

apply(), sum()

Answer 41

* save – Saves multiple R objects to file in binary format (.RData).
* saveRDS – Saves single R object to file in binary format (.rds), allowing selective loading.
* dev.copy2pdf – Copies current graphics device output to PDF file.
* write.ftable – Writes flat contingency table (ftable) to a text file.
* write.table – Exports df or matrix to a text file (CSV-like).
* save.image – Saves entire current R workspace (all objects) to .RData file.
* savehistory – Saves command history to a file (.Rhistory).