VL 9 Flashcards
Control Flow: if / else if / else syntax?
question: is cond TRUE or FALSE?
if (cond1) {
# if cond1 is TRUE
# do something …
} else if (cond2) {
# if cond2 is TRUE
# do something …
} else {
# neither cond1 nor cond2 is TRUE
# do something else …
}
example:
> binf=readRDS('pbinf-2022-08.RDS')
> survey=binf$data$survey
> if (nrow(survey)>320) {
+   print('new data of 2017 added already')
+ } else {
+   print('new data of 2017 not added yet')
+ }
Programming Loops: for(!), while, (repeat)
for (i in vector) {
# do something for every element in vector
}
while (cond) {
# do something while cond is TRUE
}
repeat {
if (cond) { break }
# do something at least once
}
example:
> for (i in 1:nrow(survey)) {
+   if (is.na(survey[i,'cm'])) {
+     next
+   }
+   if (survey[i,'cm']>197) {
+     print(survey[i,1:6])
+   }
+ }
95% of the time you use for!
Useful Operators in R
- Mathematical/comparison: *, /, +, -, <, >, ==, …
- Logical: & (and), | (or), %in% (in), ! (not), …
- Your own: '%ni%' <- Negate('%in%')
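The last bullet can be tried out directly: %ni% is simply the negation of %in% ("not in"). The example values here are illustrative only:

```r
# Define the negated %in% operator from the card above
'%ni%' <- Negate('%in%')

x <- c("A", "B", "C")
x %in% c("A", "C")   # TRUE FALSE TRUE
x %ni% c("A", "C")   # FALSE TRUE FALSE
```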
Structure of a function in R.
(Write your own function)
myCV = function (x) {}
myCV -> name of the function (whatever you like)
= -> assignment operator
function -> function keyword
(x) -> parameter / argument
{} -> the implementation / function body
example: CV function
myCV = function (x) {
  cv = 100*sd(x, na.rm=TRUE)/mean(x, na.rm=TRUE)
  return(cv)
}
Always add an explicit return() to your functions, just in case.
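A quick check of the CV function with a small made-up vector (redefined here so the snippet is self-contained; the numbers are purely illustrative):

```r
myCV = function (x) {
  cv = 100*sd(x, na.rm=TRUE)/mean(x, na.rm=TRUE)
  return(cv)
}
heights = c(170, 180, 175, NA, 165)   # NA is ignored thanks to na.rm=TRUE
myCV(heights)                         # about 3.74 (%)
```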
The … argument?
- takes any additional arguments and delegates them to an inner function call
- example: my.barplot, a light blue barplot that always has a box around it
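A minimal sketch of such a my.barplot: the name and defaults follow the card, and all further arguments are delegated to barplot() via the ... argument:

```r
my.barplot = function (height, ...) {
  barplot(height, col = "lightblue", ...)  # fixed light blue colour
  box()                                    # always draw a box around the plot
}
# extra arguments such as main= and names.arg= are passed straight through
my.barplot(c(3, 7, 5), main = "Example", names.arg = c("a", "b", "c"))
```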
CHEAT SHEET (Spicker)!
Correlation
- observe the association between two numerical variables
- if two numerical variables are associated we say they are correlated
- the correlation coefficient is a quantity that describes the strength of the association
Observation
- individuals with high amounts of C20-22 fatty acids also have higher insulin sensitivity
- two variables vary together in the same direction
- there is a lot of covariation or correlation
- direction and magnitude of a correlation can be quantified with the correlation coefficient r
- value range [-1, +1]
- value 0: no covariation
- negative value: as one variable's values increase, the other's decrease
- positive value: both change in the same direction
- values of 1 or -1: points lie exactly on a straight line
Interpretation of r?
The Pearson correlation coefficient, denoted as “r,” measures the strength and direction of a linear relationship between two continuous variables. It ranges from -1 to +1, where positive values indicate a positive correlation, negative values indicate a negative correlation, and values close to 0 indicate a weak or no correlation. It is commonly used to assess relationships between variables in various fields of study.
-> Don't combine two populations in one correlation!
-> Pearson correlation is sensitive to outliers
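In R, cor() gives r and cor.test() adds a p-value and 95% confidence interval. A short sketch with simulated data (not the course survey):

```r
set.seed(1)
x = rnorm(50)
y = 0.7*x + rnorm(50, sd = 0.5)  # y depends linearly on x
cor(x, y)                        # Pearson r (the default method)
ct = cor.test(x, y)              # r plus p-value and 95% CI
ct$p.value < 0.05                # significant correlation
```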
What does the squared correlation r^2 mean?
- r^2 is often also called the coefficient of determination
- r^2 is between 0 and 1, and smaller than |r| (for |r| < 1)
- r^2 is interpreted as the fraction of variance that is shared between the variables
R-squared (coefficient of determination) measures how well a regression model fits the data. It ranges from 0 to 1, where 1 means a perfect fit, and 0 means no fit. It shows the proportion of the dependent variable variance explained by the independent variable(s) in the model.
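For a simple linear regression y ~ x, r^2 is exactly the R-squared reported by lm(), which can be checked directly (simulated data):

```r
set.seed(42)
x = rnorm(40)
y = 2*x + rnorm(40)
r2  = cor(x, y)^2                       # squared Pearson correlation
fit = lm(y ~ x)                         # simple linear regression
all.equal(r2, summary(fit)$r.squared)   # TRUE: identical quantities
```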
What’s the Spearman Rank Correlation and when use it?
- Spearman correlation is more robust against outliers!
- the (Pearson) correlation with one outlier is not significant!!
- Spearman correlation is calculated on the ranks of the values, not on the values directly
- it's a non-parametric test
- it does not assume a normal distribution of the data
- it is more conservative
- if in doubt, use Spearman correlation
The Spearman rank correlation (ρ) measures the strength and direction of the monotonic relationship between two variables. It is used when the relationship is non-linear, ordinal, or when data contains outliers. It is a non-parametric alternative to Pearson correlation.
When to use Spearman and when Pearson?
- Normal distribution and no outliers -> Pearson
- Non-normal:
1. try to normalise your data if possible -> Pearson
2. if you can't normalise the data -> Spearman or Kendall's tau (even more robust to outliers)
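The outlier sensitivity is easy to demonstrate: one extreme point drags Pearson's r down while the rank-based Spearman and Kendall coefficients stay high (simulated data):

```r
set.seed(7)
x = rnorm(30)
y = x + rnorm(30, sd = 0.3)      # strong linear relationship
x = c(x, 10); y = c(y, -10)      # add one extreme outlier
cor(x, y, method = "pearson")    # badly distorted by the outlier
cor(x, y, method = "spearman")   # still high: ranks are robust
cor(x, y, method = "kendall")    # even more robust
```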
Effect size: r and rs
- Pearson's r and Spearman's rs are quite similar in their values
- but rs^2 is the proportion of shared rank variance
- Kendall's τ is numerically different: about 66-75% of r or rs; don't square it
- r of 0.1: small effect, 1% of variance
- r of 0.3: medium effect, 9% of variance
- r of 0.5: large effect, 25% of variance
What is partial correlation?
Partial correlation is a statistical method that measures the relationship between two variables while controlling for the influence of other variables. It allows assessing the direct association between the two variables of interest, removing the effects of confounding factors.
Remember: Male and female mixture ..
e.g: partial correlation of body height and weight after removing the effect of sex
When we control for the control variable(s) on the relationship between variable 1 and variable 2, we find the following (in)significant partial correlation:
r(df) = …, 95%CI = […,….], p < ….
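The height/weight/sex example can be sketched with base R alone: correlate the residuals after regressing each variable on the control variable. All data below are simulated and the numbers purely illustrative; for real analyses a package such as ppcor provides a ready-made pcor.test() with p-values:

```r
set.seed(3)
n = 100
sex = rep(0:1, each = n/2)                  # control variable (0/1)
height = 165 + 12*sex + rnorm(n, sd = 6)
weight = -40 + 0.6*height + 5*sex + rnorm(n, sd = 5)
cor(height, weight)              # raw r, inflated by the sex mixture
res.h = resid(lm(height ~ sex))  # height with the sex effect removed
res.w = resid(lm(weight ~ sex))  # weight with the sex effect removed
cor(res.h, res.w)                # partial correlation of height and weight given sex
```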
What is Mutual Information?
Mutual information measures the degree of dependence or shared information between two random variables. It quantifies how much knowing one variable reduces uncertainty about the other. High mutual information indicates strong dependence, while low or zero mutual information suggests independence. It is used in various fields, including machine learning and feature selection.
- Pearson correlation works only for linear relationships between two variables
- mutual information works for any relationship between two variables
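A minimal mutual-information estimate via discretization, base R only; real analyses would use a package such as infotheo, and the bin count of 5 is an arbitrary choice. Note how Pearson misses the non-linear dependence of x and x^2 while MI detects it:

```r
mi = function (x, y, bins = 5) {
  xd = cut(x, bins); yd = cut(y, bins)           # discretize both variables
  pxy = table(xd, yd)/length(x)                  # joint distribution
  px = rowSums(pxy); py = colSums(pxy)           # marginal distributions
  nz = pxy > 0                                   # avoid log(0)
  sum(pxy[nz] * log2(pxy[nz]/outer(px, py)[nz])) # MI in bits
}
set.seed(9)
x = rnorm(500)
cor(x, x^2)   # near 0: Pearson misses the non-linear dependence
mi(x, x^2)    # positive: MI detects the dependence
```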