3 | R: Data Flashcards

1
Q

(POLL)

Return or print? To use the value of a function outside of that function what would you use at the end of your function?
- print
- return
- neither

A

return

2
Q

(POLL)

What does dim() return for a data frame?
- number of columns
- number of rows
- both, first columns then rows
- both, first rows then columns

A

both, first rows then columns

3
Q

(POLL)

Which statement displays the rows of a data frame df sorted by the column col?
- df[order(df$col)]
- df[order(df$col),]
- df[sort(df$col)]
- df[sort(df$col),]
- sort(df)

A

df[order(df$col),]

4
Q

(POLL)

To summarise a column of a data frame by one or more categories we use?
- aggregate
- apply
- print
- summary

A

aggregate

5
Q

(POLL)

What is the command to combine two data frames by a column that holds the same set of values in both?
- attach
- cbind
- join
- merge
- rbind

A

merge

6
Q

(POLL)

What is the command you would use to get the sum of all row values for a matrix?
- aggregate
- apply
- sum
- summary

A

apply

7
Q

(POLL)

To display tables with more than 2 dimensions we use:
- cat
- ftable
- summary
- table

A

ftable

(summary: might give unwanted info
table: maybe also)

8
Q

(POLL)

To write a single data frame to the file system in a compressed compact file we use …
- save
- save.image
- saveRDS
- writehistory()
- write.table

A

save, saveRDS

(save.image: saves the whole workspace!
writehistory(): not an R function; the command history is saved with savehistory()
write.table: uncompressed!)

9
Q

R:

Four groups of data sets we worked with?

A

● survey, nym.2002 - data frames with different column types
● authors, books - two data frames to be merged
● protein-consumption - matrix of percentages for eating
● Titanic - contingency table for people on the ship belonging to certain categories

10
Q

R:

How to return multiple objects in a function?

A

return(list(a, b, c, etc))
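
A minimal sketch of this pattern. The function name `range_stats` and its list elements are made up for illustration; naming the list elements lets the caller pick out each result with `$`.

```r
# Hypothetical helper: returns several results at once in a named list
range_stats = function(x) {
  return(list(min = min(x), max = max(x), mean = mean(x)))
}

res = range_stats(c(2, 4, 9))
res$max        # access one result by name
```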

11
Q

R:

How to create a data frame from the survey in a .tab file?

A
> survey = read.table("../../../data/survey-2019-11.tab",
+ header=TRUE, stringsAsFactors=TRUE)
12
Q

R:

How to check dimensions of a dataframe?

A

dim(dataframe)

13
Q

R:

Two ways to check number of rows in a data frame?

A
> dim(dataframe)[1]
> nrow(dataframe)
14
Q

R:

What is ordering? Code to order a dataframe?

A
  • order() gives the indices that put the elements into sorted order
  • it does not change the data frame itself

eg:
~~~
head(somedf[order(somedf$someCol),])
~~~

15
Q

R:

What is the difference between sorting and ordering?

A

sort - gives back the sorted values (as a new vector; the original only changes if you assign the result back)

order - gives back indices and does not make changes
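
A small sketch of the difference, using a made-up vector `x`. It also shows that indexing with the result of `order()` reproduces what `sort()` returns.

```r
x = c(30, 10, 20)

sorted_values = sort(x)    # the values in increasing order: 10 20 30
sort_index = order(x)      # the positions that sort x: 2 3 1

x[sort_index]              # same result as sort(x)
x                          # still 30 10 20: neither call modified x
```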

16
Q

R:

Basic aggregate() Usage

What is the general syntax of the aggregate() function in R?

A
aggregate(numeric_vector, by = list(categorical_vector), FUN, ...)
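
A minimal sketch of this call on made-up data (the vectors `height` and `group` are not from the course data sets). Naming the element inside `by = list(...)` gives the grouping column a readable name instead of the default `Group.1`.

```r
# Made-up data, only for illustration
height = c(160, 170, 180, 190)
group  = c("a", "a", "b", "b")

res = aggregate(height, by = list(group = group), FUN = mean)
res   # one row per group; the aggregated values are in column x
```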
17
Q

R:

How do you calculate the mean age by gender from the nym dataset?

A
aggregate(nym$age, by = list(nym$gender), mean)
18
Q

R:

How do you calculate the mean for place, age, and time, grouped by gender with trimmed mean (10%)?

A
aggregate(nym[, c('place', 'age', 'time')], by = list(nym$gender), mean, trim = 0.1)
19
Q

R:

How can you use with() to avoid $ notation in aggregate()?

A
with(survey, aggregate(cm, by = list(gender), mean))
20
Q

R:

How do you replace country codes in nym$home with USA for two-letter codes and World otherwise?

A
usa = as.character(nym$home)
usa[grep("^[A-Z][A-Z]$", nym$home)] = "USA"
usa[-grep("^[A-Z][A-Z]$", nym$home)] = "World"
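
One thing to watch with the negative index: if no element matches, `-grep(...)` is an empty index and nothing gets set to "World". A possible alternative, sketched here on a made-up stand-in for `nym$home`, uses the logical vector from `grepl()` instead.

```r
# Made-up stand-in for nym$home
home = c("NY", "GER", "CA", "MEX")

is_us = grepl("^[A-Z][A-Z]$", home)      # TRUE for two-letter codes
usa = ifelse(is_us, "USA", "World")
usa
```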
21
Q

R:

How do you calculate the mean place, age, and time, grouped by gender and home (USA/World)?

A
aggregate(nym[, c('place', 'age', 'time')], by = list(nym$gender, as.factor(usa)), mean)
22
Q

R:

nym dataset:

How do you count the number of observations for each gender-home combination?

A
aggregate(nym[, c('age')], by = list(nym$gender, as.factor(usa)), length)
23
Q

R:

Which function can you use to add columns to a data frame?

Give an example.

And rows?

A

using cbind()

> gender=c("male","female", "female","male")
> ages=c(12,23,22,11)
> df=data.frame(age=ages, gender=gender)
> colors=c("yellow","orange","yellow","green")
> df=cbind(df,color=colors)

rows analogously with rbind()

24
Q

R:

What can you do with cbind() and rbind()? When does this not work?

A

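They add rows (rbind) or columns (cbind) to a data frame.

They don't work if the dimensions do not match: cbind() needs matching numbers of rows, rbind() needs the same columns. smartbind() from the 'gtools' package can combine data frames whose columns differ.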
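
A short sketch of the failure case with two made-up data frames. `tryCatch()` is used here only so the example keeps running after the error.

```r
df1 = data.frame(age = c(12, 23), gender = c("male", "female"))
df2 = data.frame(age = 22)                      # gender column is missing

# rbind() signals an error because the columns do not match
result = tryCatch(rbind(df1, df2), error = function(e) "rbind failed")
result
```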

25
Q

R:

How can you make one dataframe out of two? Give an example

A
> load('../../../data/authors.RData')
> authors
   surname nationality deceased
1    Tukey          US      yes
2 Venables   Australia       no
3  Tierney          US       no
4   Ripley          UK       no
5   McNeil   Australia       no
> head(books,4)
      name                         title other.author
1    Tukey     Exploratory Data Analysis         <NA>
2 Venables Modern Applied Statistics ...       Ripley
3  Tierney                     LISP-STAT         <NA>
4   Ripley            Spatial Statistics         <NA>

> merge(authors,books,by.x="surname",by.y="name")
   surname nationality deceased                         title other.author
1   McNeil   Australia       no     Interactive Data Analysis         <NA>
2   Ripley          UK       no            Spatial Statistics         <NA>
3   Ripley          UK       no         Stochastic Simulation         <NA>
4  Tierney          US       no                     LISP-STAT         <NA>
5    Tukey          US      yes     Exploratory Data Analysis         <NA>
6 Venables   Australia       no Modern Applied Statistics ...       Ripley
26
Q

R:

Compare dataframes and matrices

A

matrices
- always 2 dimensional
- only 1 type (usually numeric)

data frames
- also 2 dimensional
- each column can have its own type

27
Q

R:

How can you convert a data frame to a matrix?

A

> mt=as.matrix(mt)

28
Q

R:

Can you use $ operator on matrices?

A

No

Error "$ operator is invalid for atomic vectors": matrices are stored internally as vectors (very efficient), so the $ operator doesn't work on them.

You must use square brackets [ ] with column/row names or indices.
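
A minimal sketch of bracket access on a made-up matrix `m` with row and column names.

```r
# Made-up matrix with row and column names (filled column by column)
m = matrix(1:4, nrow = 2,
           dimnames = list(c("r1", "r2"), c("a", "b")))

m[, "a"]       # column by name
m["r2", ]      # row by name
m[2, 1]        # single element by index
# m$a          # would fail: $ operator is invalid for atomic vectors
```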

29
Q

R:

Would you use summary or aggregate with matrices?

A
  • Using aggregate makes no sense here because we only have 1 type
  • There is no column with categories
  • remember all columns in a matrix must have the same type
30
Q

R:

How could you get sums of all columns or rows in a matrix?

A

> head(apply(mt,1,sum),8)
(row sums, first 8 values)

> head(apply(mt,2,sum),5)
(column sums, first 5 values)
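
The second argument of `apply()` is the margin: 1 for rows, 2 for columns. A small sketch on a made-up matrix, which also shows the base R shortcuts `rowSums()` and `colSums()` for this particular case:

```r
# Made-up 2 x 3 matrix (filled column by column)
m = matrix(1:6, nrow = 2)

row_sums = apply(m, 1, sum)   # margin 1 = rows
col_sums = apply(m, 2, sum)   # margin 2 = columns

# rowSums() and colSums() are shortcuts for exactly this case
rowSums(m)
colSums(m)
```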

31
Q

R:

What are some useful variants of apply? Examples of usage?

A

lapply

lapply ("list apply") = applies a function to every list element

advantage: no explicit loop over the elements needed -> more compact code

> childs=list(Fritz=c("Max","Moritz"),
+ Klaus=c("Otto","Emi","Karl","Lotta"))
> lapply(childs,length)
$Fritz
[1] 2
$Klaus
[1] 4
> lapply(childs,length)$Klaus
[1] 4

rapply

rapply ("recursive apply") = applies a function to every element of a nested list
~~~
> nc.childs=list(Fritz=list(Gerda=c("Max","Moritz"),Frieda=
+ c("Else")),Klaus=list(Marlene=c("Otto","Emi","Karl","Lotta")))
> rapply(nc.childs,length)
  Fritz.Gerda  Fritz.Frieda Klaus.Marlene
            2             1             4
~~~

32
Q

R:

How do you do matrix multiplication ? What issue can arise?

A
> D %*% M

issue:
~~~
> N = D %*% M
> identical(M, N) # FALSE arrrghhh! Floating point issue!
[1] FALSE
> N == M
~~~
Rounding issue: floating point numbers are not represented 100% exactly, so values that should be equal can differ in the last digits.

solution:
~~~
> all.equal(M, N) # compares with a small tolerance
~~~
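
A self-contained sketch of the same effect. The matrices `A`, `B`, `M` and `N` are made up here (they are not the `D` and `M` from the lecture): the product computes 0.1 + 0.2, which is compared with 0.3.

```r
# Made-up example: 0.1 + 0.2 computed via a matrix product
A = matrix(c(0.1, 0.2), nrow = 1)
B = matrix(c(1, 1), ncol = 1)
N = A %*% B
M = matrix(0.3)

identical(M, N)            # exact comparison, can be FALSE
isTRUE(all.equal(M, N))    # comparison with a small tolerance
```

Wrapping `all.equal()` in `isTRUE()` is useful inside `if()`, because `all.equal()` returns a description of the difference rather than FALSE when the objects differ.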

33
Q

R:

Different ways to create a table in R?

A

table
ftable
apply
matrix (and then as.table)

34
Q

R:

What is in a table?

A
  • contingency tables for counts
  • each combination of factor levels is counted
35
Q

R:

table vs data frame?

A

A table is not a data frame, but a contingency table that contains the counts of items for the different categories.

36
Q

R:

What is str()?

A

str displays the structure of an R object.

It is mostly used for displaying the contents of a list.

str() is a compact alternative to summary(), especially when the data set is huge or has more than two dimensions.

eg:
~~~
> str(Titanic)
 'table' num [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
 - attr(*, "dimnames")=List of 4
  ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
  ..$ Sex     : chr [1:2] "Male" "Female"
  ..$ Age     : chr [1:2] "Child" "Adult"
  ..$ Survived: chr [1:2] "No" "Yes"
~~~

37
Q

R:

What does ftable() do?

A

It 'flattens' a table with more than 2 dimensions so that it is displayed in 2 dimensions

eg:
~~~
> Titanic[c("1st","2nd"),"Male",,]
, , Survived = No

     Age
Class Child Adult
  1st     0   118
  2nd     0   154

, , Survived = Yes

     Age
Class Child Adult
  1st     5    57
  2nd    11    14

> ftable(Titanic[c("1st","2nd"),"Male",,])
            Survived  No Yes
Class Age
1st   Child            0   5
      Adult          118  57
2nd   Child            0  11
      Adult          154  14
~~~

38
Q

R:

How to get a proportion table?

A

prop.table(table)
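
A small sketch on a made-up 2 x 2 table of counts. Without a margin the proportions refer to the grand total; `margin = 1` gives proportions within each row.

```r
# Made-up 2 x 2 table of counts
tab = as.table(matrix(c(10, 30, 20, 40), nrow = 2,
                      dimnames = list(sex = c("f", "m"),
                                      answer = c("no", "yes"))))

prop.table(tab)               # proportions of the grand total
prop.table(tab, margin = 1)   # proportions within each row
```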

39
Q

R:

How to transpose a table?

40
Q

R:

What does with do ?

A

changes the scope of variables:

with() makes the inner variables (columns) of a data frame directly accessible by name, but only inside the expression within its parentheses

41
Q

R:

How can you change the scope of variables?

A

with, attach, detach

42
Q

R:

What do attach and detach do?

A

attach makes the inner variables of a data frame permanently accessible by name (until detach is called)

detach removes the attached variables again

Hint: don’t use attach and detach

43
Q

R:

What set operations are there? Name 4

A

intersect()

union()

setdiff()

setequal()
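
A quick sketch of the four functions on two made-up vectors:

```r
a = c(1, 2, 3, 4)
b = c(3, 4, 5)

intersect(a, b)        # in both: 3 4
union(a, b)            # in one or both: 1 2 3 4 5
setdiff(a, b)          # in a but not in b: 1 2
setequal(a, b)         # same set of elements? FALSE
```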

44
Q

R:

&& vs & and || vs | ?

A
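  • confusing feature of R
  • & and | compare vectors element by element
  • && and || evaluate only a single value and stop as soon as the result is known
  • this is a speed-up, because long vectors are not traversed
  • in older R versions && silently used only the first vector element; since R 4.3 a longer vector is an error
  • good - I mostly did this by accident‼

in many languages we use && everywhere, but in R use & when comparing whole vectors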
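
A minimal sketch of the two kinds of operators, using made-up values. The vector case uses `&`; the `&&` case uses single values only, so it behaves the same in old and new R versions.

```r
x = c(1, 5, 10)

in_range = x > 2 & x < 8        # element-wise: FALSE TRUE FALSE
any(in_range)                   # collapse to a single value for if()

n = 0
ok = (n != 0) && (10 / n > 1)   # left side is FALSE, right side is skipped
ok
```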

45
Q

R:

Reading and Saving data:
write.table and save - what’s the difference?

Give an example with nym

A

write.table → writes the data frame to a plain text file (here with a tab as the separator)

save → saves in a binary format that can’t be inspected from the terminal; about 1/3 of the size of the tabular file

> head(nym[order(nym$age),],n=2)
    place gender age home     time
116 23373   Male  18  MEX 408.3333
182  8823 Female  20  MEX 244.8833
> nym2=nym[order(nym$age),]
> write.table(nym2,file="nym2.tab",sep="\t",quote=FALSE)
> save(nym2,file="nym2.RData")
46
Q

R:

read.table, write.table - datatypes?

A

read.table always produces a data frame
- convert it with as.matrix() if a matrix is needed

~~~
> mt[1:2,1:4]
        RedMeat WhiteMeat Eggs Milk
Albania    10.1       1.4  0.5  8.9
Austria     8.9      14.0  4.3 19.9
> write.table(mt,file="protein-consumption2.tab",
+ sep="\t",quote=FALSE)
> mt2=read.table("protein-consumption2.tab", header=TRUE,
+ stringsAsFactors=TRUE)
> mt2[1:2,1:4]
        RedMeat WhiteMeat Eggs Milk
Albania    10.1       1.4  0.5  8.9
Austria     8.9      14.0  4.3 19.9
> class(mt)
[1] "matrix" "array"
> class(mt2)
[1] "data.frame"
> mt2=as.matrix(mt2)
> mt2[1:2,1:4]
        RedMeat WhiteMeat Eggs Milk
Albania    10.1       1.4  0.5  8.9
Austria     8.9      14.0  4.3 19.9
> class(mt2)
[1] "matrix" "array"
> sum(mt2-mt)
[1] 0
> identical(mt2,mt) # or use all.equal to be sure
[1] TRUE
~~~

47
Q

R:

read.ftable / write.ftable - when to use them? Benefit?

A

use them for tables with more than 2 dimensions: the table is flattened, which is a better way to present the data

eg
~~~
> ftable(Titanic["1st",,,])
> write.ftable(ftable(Titanic["1st",,,]),
+ file="ftable.ftab")
> sam=read.ftable("ftable.ftab")
> sam
~~~

48
Q

R:

dot, comma problems with read.table?

A

Some files use a comma (,) as the decimal separator instead of the usual dot (.).

When read.table() is used without specifying the decimal separator, R expects a dot, so values such as 1,5 are read as text (characters or factors) instead of numbers.

The issue is fixed by explicitly setting dec="," in read.table(), telling R that commas indicate decimal points.
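
A self-contained sketch of the problem and the fix. The file content is made up and passed in through the `text` argument of `read.table()` instead of a real file.

```r
# Made-up file content with decimal commas, passed in via text=
txt = "name\tvalue\na\t1,5\nb\t2,25"

wrong = read.table(text = txt, header = TRUE, sep = "\t")
class(wrong$value)     # not numeric

right = read.table(text = txt, header = TRUE, sep = "\t", dec = ",")
class(right$value)     # numeric
```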

49
Q

R:

what is RDS

when / how to use?

A

"Serialization Interface for Single Objects" - R’s own data file format

saveRDS(object, file="filename.RDS") saves a single object to a file.

readRDS("filename.RDS") loads the object into a variable (does not change any existing variables).

Use saveRDS() and readRDS() for single objects when you want explicit assignment and to avoid accidental overwrites.

Yes, RDS files save only single objects - but you can create lists!
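
A minimal round trip, sketched with a temporary file and a made-up data frame so that no real data is touched:

```r
# Temporary file so the sketch does not touch real data
f = tempfile(fileext = ".RDS")

df = data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))
saveRDS(df, file = f)

df_copy = readRDS(f)        # assigned to a name of our choice
identical(df, df_copy)

# Several objects can go into one RDS file as a list
saveRDS(list(data = df, note = "made-up example"), file = f)
both = readRDS(f)
names(both)
```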

50
Q

R:

what to be careful of with load()?

A

load() restores the objects under their saved names and silently overwrites existing variables with the same names
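
A small sketch of the overwrite, using a temporary file and a made-up variable `x`. Loading into a separate environment (the `envir` argument of `load()`) is one possible way to keep existing variables untouched.

```r
f = tempfile(fileext = ".RData")

x = 1
save(x, file = f)

x = 2                       # x changes after saving
load(f)                     # silently restores the saved x
x                           # 1 again: the value 2 is gone

# Loading into a separate environment leaves existing variables alone
x = 2
e = new.env()
load(f, envir = e)
x                           # still 2
e$x                         # the saved value 1
```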

51
Q

R:

How to load excel files?

A

many different packages for this

eg:
~~~
> install.packages('openxlsx')
> library(openxlsx)
> sample=read.xlsx("../../../data/sample.xlsx")
~~~

52
Q

(Quiz 1)
From which data sources can R directly import data without additional packages? Several answers are possible.
a. Tab files
b. RData files
c. RDS files
d. SQL Databases
e. Excel files

A

a. Tab files - you can use the read.table command
b. RData files - you can use the load command to import data in RData files
c. RDS files - readRDS is a built-in function
The commands load, read.table and readRDS can be used to import RData, Tab and RDS files into R without installing additional packages.

53
Q

(Quiz 1)
What does this command do?
readRDS

A

loads a single object into a given variable name

54
Q

(Quiz 1)
What does this command do?
load

A

loads one or more saved objects under their original names, without variable assignment

55
Q

(Quiz 1)
What does this command do?
read.table

A

loads data from a flat text file

56
Q

(Quiz 1)
What does this command do?
source

A

loads and executes R code from a flat file

57
Q

(Quiz 1)
What does this command do?
loadhistory

A

loads R commands from an old session into the current session

58
Q

(Quiz 2)
Within square brackets for sorting / ordering the data of a data frame, which is probably the better choice?
* order
* by
* sort

A

order – indices will be returned (sort → values)

59
Q

(Quiz 2)
To get the average value of all columns of a numerical matrix we usually use the ______ function together with the mean function, whereas for calculating group based means of a data frame for a numerical vector in this data frame against a factor vector in the data frame we use the ______ function. To add new rows for both data structures we use the ______ function whereas for new columns we use the ______ function. How many rows and columns are in both structures we can find out using the _____ function.

A

To get the average value of all columns of a numerical matrix we usually use the apply function together with the mean function, whereas for calculating group based means of a data frame for a numerical vector in this data frame against a factor vector in the data frame we use the aggregate function. To add new rows for both data structures we use the rbind function whereas for new columns we use the cbind function. How many rows and columns are in both structures we can find out using the dim function.

60
Q

(Quiz 2)
Complete the code.
To change the name of the last column of a data frame df to a name ‘last’ we use the following construct:
________________

A

To change the name of the last column of a data frame df to a name ‘last’ we use the following construct:
~~~
colnames(df)[length(df)]="last"
~~~

61
Q

(Quiz 2)
To remove the column with the name ‘last’ from the df data frame we use the following code:
________________

A

To remove the column with the name ‘last’ from the df data frame we use the following code:
~~~
df$last=NULL
~~~
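
Both column operations, renaming the last column and removing a column by name, can be tried on a made-up data frame. The sketch uses `ncol(df)`, which for a data frame gives the same value as `length(df)`.

```r
# Made-up data frame
df = data.frame(a = 1:2, b = 3:4, c = 5:6)

colnames(df)[ncol(df)] = "last"   # rename the last column
renamed = colnames(df)

df$last = NULL                    # assigning NULL removes the column
colnames(df)
```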

62
Q

(Quiz 2)
To combine two data frames based on a column with values which can be matched, we use the ______ command.

A

merge()

63
Q

(Quiz 2)
To get the elements of two vectors that are in both vectors, not in only one vector, we use the ______ command.

A

intersect()

64
Q

(Quiz 2)
To get the elements of two vectors that are in one or both vectors we use the ______ command

A

union()

65
Q

(Quiz 2)
To get the elements of vector 1 which are not in vector 2 we can use the ______ command

A

To get the elements of vector 1 which are not in vector 2 we can use the setdiff command
setdiff()

66
Q

(Quiz 2)
To create a contingency table out of two variables we use the ______ function

A

table()

67
Q

(Quiz 2)
To display tables with more than two dimensions we use the ______ command

A

To display tables with more than two dimensions we use the ftable command
ftable()

68
Q

(Quiz 2)
To extract two variables from a multidimensional table we can use the ______ command together with the ______ command

A

apply(), sum()
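
A sketch of this combination on the built-in Titanic table. The choice of margins 1 and 4 (Class and Survived) is only an example; the counts are summed over the remaining dimensions.

```r
# Titanic is a 4-dimensional table that ships with R
# Dimensions: 1 = Class, 2 = Sex, 3 = Age, 4 = Survived
class_by_survival = apply(Titanic, c(1, 4), sum)
class_by_survival
```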

69
Q

(Quiz 2)
To create a new contingency table out of four given numbers we can use the _____ command.

70
Q

(EXAM - VL3)

1- save
2- saveRDS
3- dev.copy2pdf
4- write.ftable
5- write.table
6- save.image
7- savehistory
?
(2024-2)

A

save – Saves multiple R objects to file in binary format (.RData).

saveRDS – Saves single R object to file in binary format (.rds), allowing selective loading.

dev.copy2pdf – Copies current graphics device output to PDF file.

write.ftable – Writes flat contingency table (ftable) to a text file.

write.table – Exports a data frame or matrix to a delimited text file (CSV-like).

save.image – Saves entire current R workspace (all objects) to .RData file.

savehistory – Saves command history to a file (.Rhistory)