Statistics VII - Exam Questions I Flashcards
What is the difference between 1-way ANOVA and 2-way ANOVA?
The 2-way is an extension of the 1-way ANOVA to include different categorical independent variables on one dependent variable. (1-way ANOVA uses just one independent variable)
What does ANOVA mean?
Analysis of Variance
What is an interaction term?
If two predictor variables affect the outcome
variable in a way that is non-additive, we
need to include an interaction term in the
model to capture this effect.
What’s a Bonferroni correction?
The Bonferroni correction is a multiple-comparison correction used when several dependent or independent statistical tests are being performed simultaneously.
How to do a Bonferroni correction?
To perform a Bonferroni correction, divide the critical P value (α) by the number of comparisons being made. For example, if 10 hypotheses are being tested, the new critical P value would be α/10. The statistical power of the study is then calculated based on this modified P value.
Interpret the following regression formula:
y = 0.8 + 1.5 x
There is a positive correlation between x and y.
What does a correlation coefficient of 0.6 mean?
There is a positive correlation.
Example with 3 variables and 2 group measures á: What multivariate test can we use for testing the group means?
MANOVA
Example with 3 variables and 2 group measures á: What univariate test can we use for testing the group means?
Student’s t-test
Can we do a PCA when all given eigenvalues are nearly equal?
no, because that would mean that it’s isotrop and the variance is in all direction the same and there is no direction of a largest variance.
What are common / specific factors in the context of factor analysis?
Common factors have influence on several variables, specific factors influence only one variable.
What is the difference multiple and multivariate regression?
Multiple regression takes several independent variables and only one dependent variable.
Multivariate regression takes one independent variable and several dependent variables.
What is the first principal component?
The first principal component is defined as the direction of the greatest variance.
What does it mean to standardize a variable and when should one do so?
It means to center the variable around its mean and using SD as a unit. In this way, variables with different units can be compared. This is for example done in the calculation of the correlation.
What is the difference between covariance and correlation?
Covariance can only be used with variables in the same units. Correlation uses standardized variables.
A normally distributed variable has a mean of 8.3 and a variance of 4.1. One specimen has the value 14.6 for this variable. How many standard deviations away from the mean is this specimen? How to interpret this?
(14.6 - 8.3) / √4.1
What is Mahalanobis distance?
Mahalanobis distance is like a multivariate version of the z-score: it provides a way to measure distances that takes into account the scale of the data.
The Mahalanobis distance has the following properties:
It accounts for the fact that the variances in each direction are different.
It accounts for the covariance between variables.
It reduces to the familiar Euclidean distance for uncorrelated variables with unit variance.
What is quadratic classification?
A quadratic classifier is used in machine learning and statistical classification to separate measurements of two or more classes of objects or events by a quadric surface. It is a more general version of the linear classifier.
What are the assumptions of discriminant function analysis?
The assumptions of discriminant analysis are the same as those for MANOVA:
- Multivariate normality
- Homogenity of variance/covariance
- Multicolinearity
- Independence
What is a leave-one-out cross validation?
(LOOCV) involves using a single observation from the original sample as the validation data, and the remaining observations as the training data. This is repeated such that each observation in the sample is used once as the validation data. This is the same as a K-fold cross-validation with K being equal to the number of observations in the original sampling.
How long is the vector v = (2, 5, 3, 1)?
||v|| = √(4 + 25 + 9 +1) = √39
Transform the vector v = (2, 8, 7) into a unit vector (Einheitsvektor)!
||v|| = √117
->
v0 = (2/√117, 8/√117, 7/√117)
The first PC is the direction of the greatest variance. What is the definition for the second PC?
It is the direction of the second greatest variance and it is orthogonal to the first PC.
In a PCA the 3 eigenvalues of the covariance matrix are I1 = 1120.3, I2 = 3.7, I3 = 1.1.
How much variance in the data set is explained by the first, second and third PC?
PC1: 1120.3/1125.1 = 99.57 %
PC2: 3.7/1125.1 = 0.3 %
PC3: 1.1/1125.1 = …