3 | R: Data Flashcards
(POLL)
Return or print? To use a function's value outside of that function, what do you put at the end of the function?
- print
- return
- neither
return
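A minimal sketch of the difference, using two hypothetical helper functions (the names are made up for illustration). Note that print() itself returns its argument invisibly, so the printing version ends with invisible(NULL) to make the contrast explicit.

```r
# Hypothetical helpers, for illustration only.
square_print <- function(x) {
  print(x^2)        # shows the value on the console
  invisible(NULL)   # hands nothing useful back to the caller
}

square_return <- function(x) {
  return(x^2)       # hands the value back to the caller
}

printed  <- square_print(3)    # prints 9, but 'printed' is NULL
returned <- square_return(3)   # prints nothing, 'returned' is 9
returned + 1                   # the returned value can be reused
```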
(POLL)
What does dim() return for a data frame?
- number of columns
- number of rows
- both, first columns then rows
- both, first rows then columns
both, first rows then columns
(POLL)
To display the data frame df sorted by the column col, which statement is correct?
- df[order(df$col)]
- df[order(df$col),]
- df[sort(df$col)]
- df[sort(df$col),]
- sort(df)
df[order(df$col),]
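Why the comma matters can be sketched with a small, hypothetical data frame: the indices from order() go in the row position, and the empty column position keeps all columns. Without the comma the indices would be read as column numbers.

```r
# Hypothetical data frame, for illustration only.
df <- data.frame(col = c(3, 1, 2), name = c("c", "a", "b"))

idx <- order(df$col)   # row indices in ascending order of col
sorted <- df[idx, ]    # comma: idx selects rows, all columns are kept
sorted

# df itself is unchanged; assign the result to keep the sorted version.
```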
(POLL)
To summarise a column of a data frame by one or more categories we use?
- aggregate
- apply
- print
- summary
aggregate
(POLL)
What is the command to combine two data frames by a column that has the same set of values in both?
- attach
- cbind
- join
- merge
- rbind
merge
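A minimal sketch of merge(), assuming two small data frames that share a key column called name. The data and column names are invented for illustration and are not the course's authors/books data.

```r
# Hypothetical data; names and values are made up.
authors_demo <- data.frame(name = c("Smith", "Jones", "Lee"),
                           country = c("UK", "US", "KR"))
books_demo <- data.frame(name = c("Lee", "Smith", "Jones"),
                         title = c("Book C", "Book A", "Book B"))

# merge() matches rows on the shared key column, whatever their order.
combined <- merge(authors_demo, books_demo, by = "name")
combined
```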
(POLL)
What command would you use to get the sum of the values in each row of a matrix?
- aggregate
- apply
- sum
- summary
apply
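As a small illustration with a made-up matrix: the second argument of apply() picks the margin, 1 for rows and 2 for columns. Base R's rowSums() is a shortcut for the row case.

```r
m <- matrix(1:6, nrow = 2)        # rows are (1, 3, 5) and (2, 4, 6)

row_totals <- apply(m, 1, sum)    # margin 1 = rows:    9 12
col_totals <- apply(m, 2, sum)    # margin 2 = columns: 3 7 11

rowSums(m)                        # same totals as apply(m, 1, sum)
```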
(POLL)
To display tables with more than 2 dimensions we use:
- cat
- ftable
- summary
- table
ftable
(summary: might give unwanted information
table: possibly as well)
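A short sketch using the built-in Titanic table, which has four dimensions. The choice of row variables here is only one possible layout.

```r
# Titanic ships with base R as a 4-dimensional contingency table.
dim(Titanic)   # 4 2 2 2

# Flatten it: Class and Sex become rows, the remaining variables columns.
flat <- ftable(Titanic, row.vars = c("Class", "Sex"))
flat
```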
(POLL)
To write a single data frame to the file system as a compressed file we use …
- save
- save.image
- saveRDS
- writehistory()
- write.table
save, saveRDS
(save.image: saves the whole workspace!
writehistory()
write.table: uncompressed!)
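A minimal sketch of both correct answers, writing to temporary files so nothing in the working directory is touched. The data frame is made up for illustration.

```r
df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# saveRDS() stores one object, compressed by default, without its name.
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, file = rds_path)
restored <- readRDS(rds_path)   # assign to any name on reading

# save() stores the object together with its name; load() restores 'df'.
rda_path <- tempfile(fileext = ".RData")
save(df, file = rda_path)

identical(df, restored)
```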
R:
Which four groups of data sets did we work with?
● survey, nym.2002 - data frames with different column types
● authors, books - two data frames to be merged
● protein-consumption - matrix of percentages for eating
● Titanic - contingency table for people on the ship belonging to certain categories
R:
How to return multiple objects in a function?
return(list(a, b, c))  # wrap as many objects as needed in one list
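For instance, a hypothetical function (name and contents invented here) can return several results in a named list, and the caller picks them out with $ or [[ ]].

```r
# Hypothetical function, for illustration only.
describe <- function(x) {
  return(list(mean = mean(x), range = range(x), n = length(x)))
}

res <- describe(c(2, 4, 9))
res$mean      # 5
res$range     # 2 9
res[["n"]]    # 3
```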
R:
How to create a data frame from the survey in a .tab file?
> survey = read.table("../../../data/survey-2019-11.tab", header=TRUE, stringsAsFactors=TRUE)
R:
How to check dimensions of a dataframe?
dim(dataframe)
R:
Two ways to check number of rows in a data frame?
> dim(dataframe)[1]
> nrow(dataframe)
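A quick check on a made-up data frame shows that dim() gives rows first, then columns, and that both ways of counting rows agree.

```r
df <- data.frame(a = 1:5, b = letters[1:5])

dim(df)       # 5 2  (rows, then columns)
dim(df)[1]    # 5
nrow(df)      # 5
ncol(df)      # 2
```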
R:
What is ordering? What code orders a data frame?
- order() gives the indices that put the elements in sorted order
- it does not change the data frame
e.g.:
head(somedf[order(somedf$someCol),])
R:
What is the difference between sorting and ordering?
sort - returns the values themselves in sorted order
order - returns the indices that would sort the values
Neither changes the original object unless the result is assigned back.
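The difference is easiest to see on a small vector (values made up for illustration):

```r
x <- c(30, 10, 20)

sort(x)        # 10 20 30  (the values)
order(x)       # 2 3 1     (the positions of those values in x)
x[order(x)]    # 10 20 30  (indexing with order() reproduces sort())
x              # 30 10 20  (x is unchanged by either call)
```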
R:
Basic aggregate() Usage
What is the general syntax of the aggregate() function in R?
aggregate(numeric_vector, by = list(categorical_vector), FUN, ...)
R:
How do you calculate the mean age by gender from the nym dataset?
aggregate(nym$age, by = list(nym$gender), mean)
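The same call can be tried on a tiny stand-in data frame. The runners data below is invented and is not the nym data set; naming the list element only gives the grouping column a readable header.

```r
# Hypothetical stand-in data, not the real nym data set.
runners <- data.frame(
  gender = c("Female", "Male", "Female", "Male"),
  age    = c(30, 40, 50, 20)
)

by_gender <- aggregate(runners$age, by = list(gender = runners$gender), FUN = mean)
by_gender
#   gender  x
# 1 Female 40
# 2   Male 30
```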
R:
How do you calculate the 10% trimmed mean of place, age, and time, grouped by gender?
aggregate(nym[, c('place', 'age', 'time')], by = list(nym$gender), mean, trim = 0.1)
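Extra arguments after FUN are passed on to it, which is how trim reaches mean(). A sketch with invented data, where group A contains one outlier:

```r
# Hypothetical data: group A has an outlier (100), group B does not.
scores <- data.frame(
  group = rep(c("A", "B"), each = 10),
  value = c(1:9, 100, 1:10)
)

plain   <- aggregate(scores$value, by = list(group = scores$group), mean)
trimmed <- aggregate(scores$value, by = list(group = scores$group), mean, trim = 0.1)

plain$x     # 14.5 5.5  (the outlier pulls A's mean up)
trimmed$x   #  5.5 5.5  (10% cut from each end removes it)
```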
R:
How can you use with() to avoid $ notation in aggregate()?
with(survey, aggregate(cm, by = list(gender), mean))
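That both spellings give the same result can be checked on a small made-up data frame (survey_demo is a stand-in, not the course's survey data):

```r
# Hypothetical stand-in for the survey data.
survey_demo <- data.frame(gender = c("F", "M", "F", "M"),
                          cm     = c(160, 180, 170, 190))

long  <- aggregate(survey_demo$cm, by = list(survey_demo$gender), mean)
short <- with(survey_demo, aggregate(cm, by = list(gender), mean))

all.equal(long, short)   # the two results match
```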
R:
How do you replace country codes in nym$home with USA for two-letter codes and World otherwise?
usa = as.character(nym$home)
usa[grep("^[A-Z][A-Z]$", nym$home)] = "USA"
usa[-grep("^[A-Z][A-Z]$", nym$home)] = "World"
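One caveat with negative grep() indices: if nothing matches, grep() returns an empty vector and the negative index selects nothing, so no value would be set to "World". A possible alternative, sketched here on a made-up vector rather than nym$home, uses grepl() with ifelse():

```r
# Hypothetical home codes, for illustration only.
home <- c("NY", "GER", "CA", "KEN")

is_two_letter <- grepl("^[A-Z][A-Z]$", home)   # TRUE for two-letter codes
usa <- ifelse(is_two_letter, "USA", "World")
usa   # "USA" "World" "USA" "World"
```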
R:
How do you calculate the mean place, age, and time, grouped by gender and home (USA/World)?
aggregate(nym[, c('place', 'age', 'time')], by = list(nym$gender, as.factor(usa)), mean)
R:
nym dataset:
How do you count the number of observations for each gender-home combination?
aggregate(nym[, c('age')], by = list(nym$gender, as.factor(usa)), length)
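Counting with length works because every group's values are simply counted. A sketch on invented data, with table() as a cross-check:

```r
# Hypothetical data, not the real nym data set.
demo <- data.frame(
  gender = c("F", "F", "M", "M", "M"),
  home   = c("USA", "World", "USA", "USA", "World"),
  age    = c(30, 41, 25, 36, 52)
)

counts <- aggregate(demo$age,
                    by = list(gender = demo$gender, home = demo$home),
                    FUN = length)
counts                            # one row per gender-home combination

table(demo$gender, demo$home)     # same counts as a cross table
```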
R:
Which function can you use to add columns to a data frame?
Give an example.
And which function adds rows?
Use cbind():
> gender=c("male","female","female","male")
> ages=c(12,23,22,11)
> df=data.frame(age=ages, gender=gender)
> colors=c("yellow","orange","yellow","green")
> df=cbind(df,color=colors)
Rows are added analogously with rbind().
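The row case might look like this, assuming the new row is a one-row data frame with the same column names (values invented):

```r
df <- data.frame(age = c(12, 23), gender = c("male", "female"))

# The new row must have the same columns as df.
new_row <- data.frame(age = 31, gender = "female")
df <- rbind(df, new_row)

nrow(df)   # 3
```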
R:
What can you do with cbind() and rbind()? When does this not work?
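They add rows or columns to a data frame.
They do not work if the dimensions do not match (cbind() needs the same number of rows, rbind() the same columns), but smartbind() from the ‘gtools’ package can row-bind data frames whose columns differ.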
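The failure can be reproduced in base R with two made-up data frames whose columns differ. tryCatch() is used only so the sketch runs to the end; smartbind() is not shown because it requires the gtools package to be installed.

```r
a <- data.frame(x = 1:2, y = c("p", "q"))
b <- data.frame(x = 3)                      # lacks column y

# rbind() stops with an error because the columns do not match.
result <- tryCatch(rbind(a, b), error = function(e) "rbind failed")
result
```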