Exam Questions Flashcards
(EXAM - VL1)
What is S3 in R? Explain.
(2024-2)
- simpler / informal / lightweight / more flexible OOP system
- generic functions –> objects have different behaviours based on class
- Unlike S4: does not enforce strict definition of objects/methods.
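A minimal sketch of S3 dispatch. The generic `speak` and the classes `dog` and `cat` are made up for illustration:

```r
# Hypothetical generic: dispatches on the class attribute of its first argument
speak <- function(x, ...) UseMethod("speak")

# Methods are ordinary functions named generic.class
speak.dog <- function(x, ...) paste(x$name, "says Woof")
speak.cat <- function(x, ...) paste(x$name, "says Meow")
speak.default <- function(x, ...) "..."

# An S3 object is a base object plus a class attribute; no formal definition needed
rex <- structure(list(name = "Rex"), class = "dog")
tom <- structure(list(name = "Tom"), class = "cat")

speak(rex)   # "Rex says Woof"
speak(tom)   # "Tom says Meow"
speak(42)    # no method for numeric, falls back to speak.default
```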
(EXAM - VL2)
Write a function that calculates the waist-to-hip ratio (as a new measurement instead of BMI). The user should be able to choose the number of decimal places of the result; the default should be 3.
(2024-2, 2020 similar)
wth = function(w, h, digits = 3) {
  whr <- w / h
  return(round(whr, digits))
}
(could add check if hip >0)
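One possible version with the input check, as a sketch. The name `whr_checked` and the error message are assumptions, not part of the exam answer:

```r
# Waist-to-hip ratio with a basic check on the hip value (hypothetical helper)
whr_checked <- function(waist, hip, digits = 3) {
  if (any(hip <= 0, na.rm = TRUE)) {
    stop("hip must be greater than 0")
  }
  round(waist / hip, digits)
}

whr_checked(80, 100)              # 0.8
whr_checked(85, 97, digits = 1)   # 0.9
```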
(EXAM - VL2)
Write a function which calculates the BMI.
BMI = kg/m^2
(2020)
bmi = function(w, h, d = 2) {
  bmi = w / (h^2)
  return(round(bmi, d))
}
(could add check if height is > 0)
(EXAM - VL2)
Write a function tmean which calculates a trimmed mean by removing the highest and lowest value of a vector and thereafter calculating the mean of the remaining vector elements. Return the normal mean if only one or two values are given. You can ignore the NA problem, but you are not allowed to use the mean function of R :( (NA problem: bonus points)
(2024-1, 2024-sample)
tmean = function(x) {
  n = length(x)
  if (n <= 2) {
    return(sum(x) / n)
  }
  x_sorted = sort(x)
  x_trim = x_sorted[-c(1, n)]
  trim_mean = sum(x_trim) / length(x_trim)
  return(trim_mean)
}
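For the NA bonus, one possible sketch. The function name `tmean_na` and the `na.rm` argument are assumptions modelled on how base R functions usually handle missing values:

```r
# Trimmed mean that can drop NAs first (na.rm is an added, hypothetical argument)
tmean_na <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  } else if (anyNA(x)) {
    return(NA_real_)          # like mean(): an NA in the input gives NA
  }
  n <- length(x)
  if (n <= 2) {
    return(sum(x) / n)        # plain mean for one or two values
  }
  x_sorted <- sort(x)
  sum(x_sorted[-c(1, n)]) / (n - 2)
}

tmean_na(c(1, 2, 3, 4, 100))                    # 3
tmean_na(c(1, 2, NA, 3, 4, 100), na.rm = TRUE)  # 3
tmean_na(c(1, 2, NA, 3))                        # NA
```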
(EXAM - VL2)
Write a function gmean which calculates the geometric mean of a vector. You can ignore the NA problem. Below is the formula for the geometric mean.
(2018)
gmean = function(v) {
  n = length(v)
  p = prod(v)
  return(p^(1/n))
}
(EXAM - VL2)
What is the three-dot (…) operator in R? (2024-1)
(VL2)
Three dots, or “ellipsis” argument
Allows a function to accept additional arguments without defining them explicitly in the function signature.
→ makes functions more flexible and adaptable to different situations.
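A small sketch of both common uses of `...`: passing extra arguments on to another function, and capturing them as a list. The helper names `mean_rounded` and `count_args` are made up:

```r
# Hypothetical wrapper: anything in ... is forwarded unchanged to mean()
mean_rounded <- function(x, digits = 2, ...) {
  round(mean(x, ...), digits)
}

v <- c(1.234, 5.678, NA)
mean_rounded(v)                # NA, because mean() sees the NA
mean_rounded(v, na.rm = TRUE)  # 3.46, na.rm travels through ... to mean()

# ... can also be captured as a list inside the function
count_args <- function(...) length(list(...))
count_args(1, "a", TRUE)       # 3
```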
(EXAM - VL2)
Write a function is_even() that returns TRUE if a number is even and FALSE if it’s odd.
(ct)
is_even = function(x) { return(x %% 2 == 0) }
(EXAM - VL2)
Write a function factorial_custom() that calculates the factorial of a number without using factorial().
(ct)
factorial_custom = function(n) {
  if (n == 0) {
    return(1)
  } else {
    result = 1
    for (i in 1:n) {
      result = result * i
    }
    return(result)
  }
}
(EXAM - VL2)
Annotate this code:
gentoo=as.data.frame(penguins[penguins$species=="Gentoo",])
dim(gentoo)
cWeight=cut(adelie$body_mass_g,breaks=quantile(adelie$body_mass_g,c(0,1/3,2/3,1),na.rm=TRUE),include.lowest=TRUE)
table(cWeight)
table(cWeight,adelie$sex)
...
(2024-1)
gentoo=as.data.frame(penguins[penguins$species=="Gentoo",])
→ extracts the rows for the Gentoo species and converts the filtered data into a data frame
dim(gentoo)
→ Display number of rows, columns of Gentoo subset
cWeight=cut(adelie$body_mass_g,breaks=quantile(adelie$body_mass_g,c(0,1/3,2/3,1),na.rm=TRUE),include.lowest=TRUE)
→ categorises Adelie penguins’ body mass into 3 weight groups of about equal size, with breaks at the 0, 1/3, 2/3 and 1 quantiles
→ include.lowest=TRUE ensures the lowest value is included in the first category
→ numeric to categorical
table(cWeight)
→ create frequency table showing count of observations in each weight category
table(cWeight,adelie$sex)
→ create contingency table showing count of observations for each weight category, split by sex
…
(EXAM - VL2)
Fill in the blanks:
\_\_\_\_\_\_(palmerpenguins)
\_\_\_\_\_\_(penguins)
with(penguins,\_\_\_\_\_\_\_(body_mass_g ~ sex*species,col=c("salmon","skyblue")))
\_\_\_\_\_\_\_\_\_\_(body_mass_g ~ sex + species, data = penguins, FUN = mean, na.rm = TRUE)
(2024-1)
[what does this do?]
aggregate - calculates Mean Body Mass by Sex & Species
_library_(palmerpenguins)
_data_(penguins)
with(penguins, _boxplot_(body_mass_g ~ sex*species,col=c("salmon","skyblue")))
_aggregate_(body_mass_g ~ sex + species, data = penguins, FUN = mean, na.rm = TRUE)
(EXAM - VL2)
The Titanic data set contains data about the Titanic from 1912. Given are different categories and the survival of passengers and crew members.
What does the following R code mean? Explain the commands and the output:
> options(width=70)
(This is now more of a command assignment task, so you have to place commands like names, str, dim, data into empty command fields)
(2024-sample)
→ sets max number of chars / line when displaying output in console.
Why Use It?
- controls how wide printed output appears in console.
- helps format long outputs (e.g., df, lists, matrices) -> prevent wrapping in messy way.
- working with narrow terminal windows / printing wide tables.
(EXAM - VL2)
The Titanic data set contains data about the Titanic from 1912. Given are different categories and the survival of passengers and crew members.
What does the following R code mean? Explain the commands and the output:
> ftable(Titanic[1:3,,,])
(This is now more of a command assignment task, so you have to place commands like names, str, dim, data into empty command fields)
(2024-sample)
ftable()
→ creates flat contingency table (instead of displaying multi-level format)
Titanic[1:3,,,]
→ selects first 3 levels of “Class” variable (i.e., 1st, 2nd, 3rd), keeps all levels of other dims.
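A short sketch on the built-in Titanic table that shows what the subset keeps and how `ftable()` lays it out. The variable name `passengers` is made up:

```r
data(Titanic)                     # built-in 4D contingency table

dim(Titanic)                      # 4 2 2 2
passengers <- Titanic[1:3, , , ]  # keep classes 1st, 2nd, 3rd; drop Crew
dim(passengers)                   # 3 2 2 2

# Flatten: by default the last dimension (Survived) becomes the columns
ftable(passengers)

# row.vars / col.vars choose which dimensions go where
ftable(passengers, row.vars = c("Class", "Sex"), col.vars = "Survived")
```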
(EXAM - VL2)
The Titanic data set contains data about the Titanic from 1912. Given are different categories and the survival of passengers and crew members.
What does the following R code mean? Explain the commands and the output:
> names(dimnames(Titanic))
[1] "Class" "Sex" "Age" "Survived"
(This is now more of a command assignment task, so you have to place commands like names, str, dim, data into empty command fields)
(2024-sample)
dimnames(Titanic)
→ retrieves the dim names (or labels) of Titanic dataset.
names(dimnames(Titanic))
→ extracts just names of these dims, returning:
**[1] "Class" "Sex" "Age" "Survived"**
→ Titanic is a 4D contingency table with these 4 categorical dims:
“Class” → Passenger class (1st, 2nd, 3rd, Crew)
“Sex” → Male, Female
“Age” → Child, Adult
“Survived” → No, Yes
(EXAM - VL2)
You are analyzing the flipper lengths of Adelie and Chinstrap penguins. Fill in the missing parts (bold) in the R code below:
library(palmerpenguins)
\_\_\_\_\_\_(penguins)
adelie_chinstrap = penguins[penguins$species \_\_\_\_\_ c("Adelie", "Chinstrap"), ]
boxplot(flipper_length_mm ~ \_\_\_\_\_\_\_\_\_\_ * sex, data=adelie_chinstrap, col=c("lightblue", "pink"))
_aggregate_(flipper_length_mm ~ species, data=adelie_chinstrap, \_\_\_\_\_\_, \_\_\_\_\_\_)
Annotate code and summarise the findings shortly (2022)
More code, fill in the gaps in code (2022)
describe code (2020)
Fill in the gaps on the code below using these R commands Options, by, colnames, TRUE, dim, read.table, with, (2024-2)
(ct)
library(palmerpenguins)
_data_(penguins)
adelie_chinstrap = penguins[penguins$species _%in%_ c("Adelie", "Chinstrap"), ]
boxplot(flipper_length_mm ~ _species_ * sex, data=adelie_chinstrap, col=c("lightblue", "pink"))
aggregate(flipper_length_mm ~ species, data=adelie_chinstrap, _mean_, _na.rm=TRUE_)
(EXAM - VL2)
The dataset mtcars contains data on different car models.
The following R code filters cars with more than 6 cylinders (mtcars$cyl) and calculates the max horsepower (mtcars$hp) by number of gears (mtcars$gear). Fill in the blanks:
\_\_\_\_\_\_(mtcars)
high_cyl = mtcars\_\_\_\_\_\_
aggregate(\_\_\_\_\_\_, data=high_cyl, \_\_\_\_\_\_)
Annotate code and summarise the findings shortly (2022)
More code, fill in the gaps in code (2022)
describe code (2020)
Fill in the gaps on the code below using these R commands Options, by, colnames, TRUE, dim, read.table, with, (2024-2)
(ct)
_data_(mtcars)
high_cyl = mtcars_[mtcars$cyl > 6, ]_
dim(high_cyl)
aggregate(_hp ~ gear_, data=high_cyl, _max_)
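Running the completed code on the built-in data, as a quick check of what it returns. The name `res` is only used here to hold the result:

```r
data(mtcars)
high_cyl <- mtcars[mtcars$cyl > 6, ]   # only the 8-cylinder cars remain
dim(high_cyl)                          # 14 rows, 11 columns

# Formula interface: hp grouped by gear, summarised with max()
res <- aggregate(hp ~ gear, data = high_cyl, FUN = max)
res
#   gear  hp
# 1    3 245
# 2    5 335
```

No 8-cylinder car in mtcars has 4 gears, so that group does not appear in the output.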
(EXAM - VL2)
You were investigating two light schedule treatments (trt1, trt2) against normal light conditions, 12 hours of continuous light (ctrl), on the daily dry weight increments of plants.
Please explain the R analysis below and the final result.
> data(PlantGrowth)
> dim(PlantGrowth)
[1] 30  2
> head(PlantGrowth,n=3)
  weight group
1   4.17  ctrl
2   5.58  ctrl
3   5.18  ctrl
> with(PlantGrowth, aggregate(weight,by=list(group),max))
  Group.1    x
1    ctrl 6.11
2    trt1 6.03
3    trt2 6.31
> PlantGrowth[PlantGrowth$weight>quantile(PlantGrowth$weight,0.9),]
   weight group
4    6.11  ctrl
21   6.31  trt2
28   6.15  trt2
(2018)
with(): saves having to write PlantGrowth$ again and again
> data(PlantGrowth) # loads the dataset
> dim(PlantGrowth) # displays dims of dataset -> rows, cols
[1] 30  2
> head(PlantGrowth,n=3) # displays first 3 rows of the dataset
  weight group
1   4.17  ctrl
2   5.58  ctrl
3   5.18  ctrl
> with(PlantGrowth, aggregate(weight,by=list(group),max)) # aggregate: summarises numeric data per group, here the max weight of each treatment group
  Group.1    x
1    ctrl 6.11
2    trt1 6.03
3    trt2 6.31
> PlantGrowth[PlantGrowth$weight>quantile(PlantGrowth$weight,0.9),] # selects plants with weights above the 90th percentile (i.e., top 10%)
   weight group
4    6.11  ctrl
21   6.31  trt2
28   6.15  trt2
Final Result
trt2 has highest recorded weight (6.31).
2 of the 3 plants above the 90th percentile are trt2 -> trt2 might have had a stronger effect on growth than the other conditions (no significance test was done here).
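The same analysis can be sketched with the formula interface of `aggregate()`, which keeps the column names instead of `Group.1` and `x`. The names `cutoff` and `top` are made up:

```r
data(PlantGrowth)

# Group-wise maxima via the formula interface
aggregate(weight ~ group, data = PlantGrowth, FUN = max)

# Threshold used for the filter: 90th percentile of all weights
cutoff <- quantile(PlantGrowth$weight, 0.9)
top <- PlantGrowth[PlantGrowth$weight > cutoff, ]

table(top$group)    # how many of the heaviest plants fall in each group
```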
(EXAM - VL3)
Describe how you would transform a quantitative variable into a qualitative one with around equal sized classes. Explain shortly why such approach could be useful. (3 points)
(2024-2, 2024-sample)
- Sort data
- Define # of bins (e.g., 3 or 4).
- Split the data at quantiles so that each bin holds about the same number of observations (equal-width intervals would not give equal-sized classes).
- Label bins (e.g., “Low”, “Medium”, “High”).
in R:
cut() function; name the levels with its labels argument
~~~
data_cat <- cut(data, breaks = 3, labels = c("Low", "Medium", "High"))
~~~
(breaks = 3 gives equal-width intervals; for equal-sized classes -> use quantile() breaks)
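A sketch of the quantile-based split on made-up data. The vector `x` and the labels are assumptions:

```r
set.seed(1)                               # reproducible made-up data
x <- rnorm(90, mean = 50, sd = 10)

# Breaks at the 0, 1/3, 2/3 and 1 quantiles give three classes of about equal size
brks <- quantile(x, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE)
x_cat <- cut(x, breaks = brks, labels = c("Low", "Medium", "High"),
             include.lowest = TRUE)

table(x_cat)                              # roughly 30 observations per class
```

Without `include.lowest = TRUE` the minimum of `x` would fall outside the first interval and become NA.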
Why it’s useful:
- Simplifies interpretation.
- Facilitates comparisons (e.g., with chi-square tests).
- Works for non-normally distributed data.
(EXAM - VL3)
1- save
2- saveRDS
3- dev.copy2pdf
4- write.ftable
5- write.table
6- save.image
7- savehistory
What do these commands do?
(2024-2)
save – Saves multiple R objects to file in binary format (.RData).
saveRDS – Saves single R object to file in binary format (.rds); readRDS restores it under any name you choose.
dev.copy2pdf – Copies current graphics device output to PDF file.
write.ftable – Writes flat contingency table (ftable) to a text file.
write.table – Exports df or matrix to a text file (CSV-like).
save.image – Saves entire current R workspace (all objects) to .RData file.
savehistory – Saves command history to a file (.Rhistory)
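A round-trip sketch for three of these commands, written to a temporary directory. All object and file names are made up:

```r
df <- data.frame(id = 1:3, grp = c("a", "b", "c"))
v  <- c(1.5, 2.5)

tmp <- tempdir()                                   # scratch directory for the demo
rdata_file <- file.path(tmp, "objs.RData")
rds_file   <- file.path(tmp, "df.rds")
txt_file   <- file.path(tmp, "df.txt")

save(df, v, file = rdata_file)                     # several objects, names are stored
saveRDS(df, file = rds_file)                       # one object, name is not stored
write.table(df, file = txt_file, sep = "\t",
            row.names = FALSE, quote = FALSE)      # plain text, readable outside R

rm(df, v)
load(rdata_file)                                   # restores df and v under their old names
df_copy <- readRDS(rds_file)                       # assign to any name you like
df_txt  <- read.table(txt_file, header = TRUE, sep = "\t")
```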
(EXAM - VL3)
Explain similarities and differences between the R commands
- data
- read.table
- source
- load. (3 points)
(This is now more of an a, b, c, d, e assignment task!)
(2024-sample)
Similarities:
All these commands bring something from outside into the current R session (data, saved objects, or code).
Differences:
- data(): Loads datasets bundled with R or packages
- read.table(): Reads external text files (CSV, tab-delimited) into R as df.
- source(): Executes R script file
- load(): Loads R objects (saved as .RData or .rda files) into environment.
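The four commands side by side, as a sketch using temporary files. The file contents and the object names `scores` and `answer` are made up:

```r
# data(): a dataset shipped with R or a package
data(mtcars)

# read.table(): parse a text file into a data frame
txt_file <- tempfile(fileext = ".txt")
writeLines(c("name score", "anna 1.5", "ben 2.0"), txt_file)
scores <- read.table(txt_file, header = TRUE)

# source(): run the R code in a script; objects it creates appear in the workspace
script_file <- tempfile(fileext = ".R")
writeLines("answer <- 6 * 7", script_file)
source(script_file)
answer                      # 42

# load(): restore objects saved earlier with save()
rdata_file <- tempfile(fileext = ".RData")
save(scores, file = rdata_file)
rm(scores)
load(rdata_file)            # scores is back under its original name
```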
(EXAM - VL3)
Describe what these do:
- table
- readRDS
- data.frame
(2022)
- table - creates a frequency table of categorical variables.
- readRDS - Reads a single R object saved in .rds format into R.
- data.frame - Creates a data frame, a table-like structure for storing data in R.
(EXAM - VL3)
How would you transform a categorical variable into a numerical one? How might such an approach be useful?
(2024-2)
[probably they mean the other way around? but:]
df$category_num = as.numeric(as.factor(df$category))
why useful?
- Enables statistical and machine learning models to process categorical data.
- Helps in finding patterns and relationships in data.
- Allows numerical operations like computing correlations.
(EXAM - VL3)
what is
- save
- save.image
- saveRDS
(2022, 2020)
- save: saves R objects to a file, typically in .RData or .rda format
- save.image: Saves entire current R workspace (all objects) to .RData file.
- saveRDS: saves single R object to a file in .rds format.
(EXAM - VL3)
Describe how you would transform a quantitative variable into a qualitative one with around equal sized classes. Explain shortly why such approach could be useful. (2024-sample) (3 points)
(probably mean the other way around? but here we go)
Using numeric encoding: Assign each category a unique number (e.g., “Low” = 1, “Medium” = 2, “High” = 3).
In R, use as.numeric(factor(variable)) to convert categorical values to numbers.
Using one-hot encoding: Convert each category into a binary column (0 or 1).
In R, use model.matrix(~ variable - 1) for one-hot encoding.
Usefulness:
Enables use of categorical data in machine learning algorithms that require numerical input.
Makes it easier to calculate statistics like correlation or regression when dealing with categorical data.
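Both encodings on a made-up vector, as a sketch. The names `size`, `size_f` and `onehot` are assumptions:

```r
size <- c("Low", "High", "Medium", "Low")          # made-up categorical data

# Set the level order explicitly; otherwise levels are sorted alphabetically
size_f <- factor(size, levels = c("Low", "Medium", "High"))
as.numeric(size_f)                                 # 1 3 2 1

# One-hot encoding: one 0/1 column per level
onehot <- model.matrix(~ size_f - 1)
colnames(onehot)                                   # "size_fLow" "size_fMedium" "size_fHigh"
```

Numeric codes imply an order and equal spacing between categories, so they suit ordinal variables; one-hot columns avoid that assumption.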
(EXAM - VL5)
Explain Will Rogers Phenomenon (2024-2, 2024-1)
Will Rogers: “When the Okies left Oklahoma and moved to California, they raised average intelligence in both states.” (Term coined by Feinstein et al., 1985.)
CHATGPT:
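- moving an individual from one group to another, where the individual is below the avg of the group they leave but above the avg of the group they join
- –> raises AVG of both groups
- even though no actual improvement has happened.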
Example:
Imagine two classes:
- Low achievers: Average grade = 50
- High achievers: Average grade = 80
If we move a student with grade 60 from the high achievers to the low achievers:
- The low achievers’ new avg increases (60 is above their previous avg of 50).
- The high achievers’ new avg also increases (the removed 60 was below their previous avg of 80).
common in medicine, ecology, and statistics, where reclassification makes both groups look better without actual improvement.
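A numeric sketch with made-up grades: the moved student scores below the average of the group they leave and above the average of the group they join, so both averages rise while no individual grade changes.

```r
low  <- c(40, 50, 60)      # made-up grades, mean 50
high <- c(60, 80, 100)     # made-up grades, mean 80

# Move the 60 from the high group to the low group
low_new  <- c(low, 60)
high_new <- high[-1]

c(before = mean(low),  after = mean(low_new))    # 50 -> 52.5
c(before = mean(high), after = mean(high_new))   # 80 -> 90

# The overall mean across both groups is unchanged
mean(c(low, high)) == mean(c(low_new, high_new)) # TRUE
```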