VL 9 Flashcards
Control Flow: if / else if / else syntax?
question: is cond TRUE or FALSE?
if (cond1) {
# if cond1 is TRUE
# do something …
} else if (cond2) {
# if cond2 is TRUE
# do something …
} else {
# neither cond1 nor cond2 is TRUE
# do something else …
}
example:
> binf=readRDS('pbinf-2022-08.RDS')
> survey=binf$data$survey
> if (nrow(survey)>320) {
+   print('new data of 2017 added already')
+ } else {
+   print('new data of 2017 not added yet')
+ }
Programming Loops: for(!), while, (repeat)
for (i in vector) {
# do something for every element in vector
}
while (cond) {
# do something while cond is TRUE
}
repeat {
if (cond) { break }
# do something at least once
}
example:
> for (i in 1:nrow(survey)) {
+   if (is.na(survey[i,'cm'])) {
+     next
+   }
+   if (survey[i,'cm']>197) {
+     print(survey[i,1:6])
+   }
+ }
95% of the time you use for!
Useful Operators in R
- Mathematical/comparison: *, /, +, -, <, >, ==, …
- Logical: & (and), | (or), %in% (in), ! (not), …
- Your own: '%ni%' <- Negate('%in%')
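The last bullet can be tried out directly: %ni% is simply the negation of %in% ("not in"). The example values here are illustrative only:

```r
# Define the negated %in% operator from the card above
'%ni%' <- Negate('%in%')

x <- c("A", "B", "C")
x %in% c("A", "C")   # TRUE FALSE TRUE
x %ni% c("A", "C")   # FALSE TRUE FALSE
```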
Structure of a function in R.
(Write your own function)
myCV = function (x) {}
myCV -> name of the function (whatever you like)
= -> assignment operator
function -> function keyword
(x) -> parameter / argument
{} -> the implementation / function body
example: CV function
myCV = function (x) {
  cv = 100*sd(x, na.rm=TRUE)/mean(x, na.rm=TRUE)
  return(cv)
}
Always add an explicit return() to your functions, just in case.
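A quick check of the CV function with a small made-up vector (redefined here so the snippet is self-contained; the numbers are purely illustrative):

```r
myCV = function (x) {
  cv = 100*sd(x, na.rm=TRUE)/mean(x, na.rm=TRUE)
  return(cv)
}
heights = c(170, 180, 175, NA, 165)   # NA is ignored thanks to na.rm=TRUE
myCV(heights)                         # about 3.74 (%)
```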
The … argument?
- takes any additional arguments and delegates them to an inner function call
- example: my.barplot, a light blue barplot that always has a box around it
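A minimal sketch of such a my.barplot: the name and defaults follow the card, and all further arguments are delegated to barplot() via the ... argument:

```r
my.barplot = function (height, ...) {
  barplot(height, col = "lightblue", ...)  # fixed light blue colour
  box()                                    # always draw a box around the plot
}
# extra arguments such as main= and names.arg= are passed straight through
my.barplot(c(3, 7, 5), main = "Example", names.arg = c("a", "b", "c"))
```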
CHEAT SHEET (Spicker)!
Correlation
- observe the association between two numerical variables
- if two numerical variables are associated we say they are correlated
- the correlation coefficient is a quantity that describes the strength of the association
Observation
- individuals with high amounts of C20-22 fatty acids also have higher insulin sensitivity
- two variables vary together in the same direction
- there is a lot of covariation or correlation
- direction and magnitude of a correlation can be quantified with the correlation coefficient r
- value range [-1, +1]
- value 0: no covariation
- negative value: as one variable's values increase, the other's decrease
- positive value: both change in the same direction
- values of 1 or -1: points lie exactly on a straight line
Interpretation of r?
The Pearson correlation coefficient, denoted as “r,” measures the strength and direction of a linear relationship between two continuous variables. It ranges from -1 to +1, where positive values indicate a positive correlation, negative values indicate a negative correlation, and values close to 0 indicate a weak or no correlation. It is commonly used to assess relationships between variables in various fields of study.
-> Don't combine two populations in one correlation!
-> Pearson correlation is sensitive to outliers
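In R, cor() gives r and cor.test() adds a p-value and 95% confidence interval. A short sketch with simulated data (not the course survey):

```r
set.seed(1)
x = rnorm(50)
y = 0.7*x + rnorm(50, sd = 0.5)  # y depends linearly on x
cor(x, y)                        # Pearson r (the default method)
ct = cor.test(x, y)              # r plus p-value and 95% CI
ct$p.value < 0.05                # significant correlation
```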
What does the squared correlation r^2 mean?
- r^2 is often also called the coefficient of determination
- r^2 is between 0 and 1, and smaller than |r| (for |r| < 1)
- r^2 is interpreted as the fraction of variance that is shared between the variables
R-squared (coefficient of determination) measures how well a regression model fits the data. It ranges from 0 to 1, where 1 means a perfect fit, and 0 means no fit. It shows the proportion of the dependent variable variance explained by the independent variable(s) in the model.
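For a simple linear regression y ~ x, r^2 is exactly the R-squared reported by lm(), which can be checked directly (simulated data):

```r
set.seed(42)
x = rnorm(40)
y = 2*x + rnorm(40)
r2  = cor(x, y)^2                       # squared Pearson correlation
fit = lm(y ~ x)                         # simple linear regression
all.equal(r2, summary(fit)$r.squared)   # TRUE: identical quantities
```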
What’s the Spearman Rank Correlation and when use it?
- Spearman correlation is more robust against outliers!
- the (Pearson) correlation with one outlier is not significant!!
- Spearman correlation is calculated on the ranks of the values, not on the values directly
- it's a non-parametric test
- it does not assume a normal distribution of the data
- it is more conservative
- if in doubt, use Spearman correlation
The Spearman rank correlation (ρ) measures the strength and direction of the monotonic relationship between two variables. It is used when the relationship is non-linear, ordinal, or when data contains outliers. It is a non-parametric alternative to Pearson correlation.
When to use Spearman and when Pearson?
- Normal distribution and no outliers -> Pearson
- Non-normal:
1. try to normalise your data if possible -> Pearson
2. if you can't normalise the data -> Spearman or Kendall's tau (even more robust to outliers)
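The outlier sensitivity is easy to demonstrate: one extreme point drags Pearson's r down while the rank-based Spearman and Kendall coefficients stay high (simulated data):

```r
set.seed(7)
x = rnorm(30)
y = x + rnorm(30, sd = 0.3)      # strong linear relationship
x = c(x, 10); y = c(y, -10)      # add one extreme outlier
cor(x, y, method = "pearson")    # badly distorted by the outlier
cor(x, y, method = "spearman")   # still high: ranks are robust
cor(x, y, method = "kendall")    # even more robust
```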
Effect size: r and rs
- Pearson's r and Spearman's rs are quite similar in their values
- but rs^2 is the proportion of shared rank variance
- Kendall's τ is numerically different: about 66-75% of r or rs; don't square it
- r of 0.1: small effect, 1% of variance
- r of 0.3: medium effect, 9% of variance
- r of 0.5: large effect, 25% of variance
What is partial correlation?
Partial correlation is a statistical method that measures the relationship between two variables while controlling for the influence of other variables. It allows assessing the direct association between the two variables of interest, removing the effects of confounding factors.
Remember: Male and female mixture ..
e.g: partial correlation of body height and weight after removing the effect of sex
When we control for the control variable(s) on the relationship between variable 1 and variable 2, we find the following (in)significant partial correlation:
r(df) = …, 95%CI = […,….], p < ….
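The height/weight/sex example can be sketched with base R alone: correlate the residuals after regressing each variable on the control variable. All data below are simulated and the numbers purely illustrative; for real analyses a package such as ppcor provides a ready-made pcor.test() with p-values:

```r
set.seed(3)
n = 100
sex = rep(0:1, each = n/2)                  # control variable (0/1)
height = 165 + 12*sex + rnorm(n, sd = 6)
weight = -40 + 0.6*height + 5*sex + rnorm(n, sd = 5)
cor(height, weight)              # raw r, inflated by the sex mixture
res.h = resid(lm(height ~ sex))  # height with the sex effect removed
res.w = resid(lm(weight ~ sex))  # weight with the sex effect removed
cor(res.h, res.w)                # partial correlation of height and weight given sex
```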
What is Mutual Information?
Mutual information measures the degree of dependence or shared information between two random variables. It quantifies how much knowing one variable reduces uncertainty about the other. High mutual information indicates strong dependence, while low or zero mutual information suggests independence. It is used in various fields, including machine learning and feature selection.
- Pearson correlation works only for linear relationships between two variables
- mutual information works for any relationship between two variables
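A minimal mutual-information estimate via discretization, base R only; real analyses would use a package such as infotheo, and the bin count of 5 is an arbitrary choice. Note how Pearson misses the non-linear dependence of x and x^2 while MI detects it:

```r
mi = function (x, y, bins = 5) {
  xd = cut(x, bins); yd = cut(y, bins)           # discretize both variables
  pxy = table(xd, yd)/length(x)                  # joint distribution
  px = rowSums(pxy); py = colSums(pxy)           # marginal distributions
  nz = pxy > 0                                   # avoid log(0)
  sum(pxy[nz] * log2(pxy[nz]/outer(px, py)[nz])) # MI in bits
}
set.seed(9)
x = rnorm(500)
cor(x, x^2)   # near 0: Pearson misses the non-linear dependence
mi(x, x^2)    # positive: MI detects the dependence
```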