Interpreting Data with Advanced Statistical Models Flashcards
A friend is trying to build a classifier for a startup. She first tries logistic regression and clearly underfits. She then tries an SVM and has the same issue. Finally, she tries logistic regression with fifth-degree polynomial features and clearly overfits. What should she do?
Try an SVM with Gaussian Kernel and a higher C value
Try an SVM with Linear Kernel and a higher C value
Try an SVM with Linear Kernel and a lower C value
Try an SVM with Gaussian Kernel and a lower C value
Try an SVM with Gaussian Kernel and a higher C value
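To see why a Gaussian (RBF) kernel helps where a linear model underfits, here is a minimal sketch assuming scikit-learn, on synthetic concentric-circle data (all names below are illustrative, not from the card):

```python
# Sketch: a linear SVM underfits data that is not linearly separable,
# while an RBF ("Gaussian") kernel SVM captures the nonlinear boundary.
from sklearn.svm import SVC
from sklearn.datasets import make_circles

X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)

linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
rbf_svm = SVC(kernel="rbf", C=1.0).fit(X, y)  # larger C = weaker regularization

print(linear_svm.score(X, y))  # near 0.5: no linear boundary separates circles
print(rbf_svm.score(X, y))     # near 1.0: kernel captures the nonlinearity
```

Raising C reduces regularization and lets the fit become more flexible, which is why the answer pairs the Gaussian kernel with a higher C when the baseline underfits.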
In which of the following unsupervised learning techniques do you think it is most important to feature scale, if different variables have different scales?
PCA
Anomaly Detection
K-means
Hierarchical clustering
PCA
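A small sketch of why scaling matters for PCA, assuming scikit-learn and a synthetic two-column dataset (the 1000x scale factor is an illustrative choice):

```python
# Sketch: without scaling, PCA's first component is dominated by whichever
# variable happens to have the largest scale, not the most structure.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 500),      # unit-scale variable
                     rng.normal(0, 1000, 500)])  # same kind of noise, 1000x scale

raw = PCA(n_components=2).fit(X)
scaled = PCA(n_components=2).fit(StandardScaler().fit_transform(X))

print(raw.explained_variance_ratio_)     # ~[1.0, 0.0]: big-scale column wins
print(scaled.explained_variance_ratio_)  # ~[0.5, 0.5]: both columns contribute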
For a medical trial to detect melanoma, you have a dataset of several patients with numerous variables and the progression of the disease. You try to find some combination of variables that predicts melanoma with high precision.
You first run PCA to reduce the dimensionality, then you run linear regression. You find that the variance explained is low because there are a couple of possible outliers, so you run an algorithm to detect outliers. Which type of learning is each step in this pipeline?
Linear Regression: Unsupervised Learning, PCA: Unsupervised Learning, Outlier detection: Supervised Learning
Linear Regression: Supervised Learning, PCA: Supervised Learning, Outlier detection: Unsupervised Learning
Linear Regression: Supervised Learning, PCA: Unsupervised Learning, Outlier detection: Supervised Learning
Linear Regression: Supervised Learning, PCA: Unsupervised Learning, Outlier detection: Unsupervised Learning
Linear Regression: Supervised Learning, PCA: Unsupervised Learning, Outlier detection: Unsupervised Learning
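A sketch of the pipeline above, assuming scikit-learn and synthetic data; `IsolationForest` stands in for "an algorithm to detect outliers" (the card does not name one). Note which steps need the label y (supervised) and which use only X (unsupervised):

```python
# Sketch: PCA and outlier detection fit on X alone (unsupervised);
# linear regression needs the target y (supervised).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=200)

X_reduced = PCA(n_components=3).fit_transform(X)  # unsupervised: X only
outlier_flags = IsolationForest(random_state=0).fit_predict(X)  # unsupervised: X only
model = LinearRegression().fit(X_reduced, y)      # supervised: needs y

print(X_reduced.shape)     # (200, 3)
print(set(outlier_flags))  # {-1, 1}; -1 marks suspected outliers
```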
You optimize a model with gradient descent and get the following loss curve. How would you assess its quality?
[PICTURE]
As batch gradient descent uses all the training data at every step, the loss vs. iterations curve should be monotonically decreasing. Therefore, this curve is incorrect: either GD is badly implemented or GD was not used.
Since the loss goes up sometimes, you may be overshooting. Try reducing the learning rate.
It is normal for the loss to go up sometimes. You see this in SGD, so you are OK since you reached a minimum at 20 iterations.
Use a larger learning rate to reach the optimum in fewer iterations
As batch gradient descent uses all the training data at every step, the loss vs. iterations curve should be monotonically decreasing. Therefore, this curve is incorrect: either GD is badly implemented or GD was not used.
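The monotonic-loss claim can be checked with a tiny sketch: full-batch gradient descent on a convex least-squares problem with a suitably small learning rate (all data below is synthetic and illustrative):

```python
# Sketch: batch gradient descent on least squares. With a small enough
# learning rate, the loss never increases from one iteration to the next.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

theta = np.zeros(3)
lr = 0.01
losses = []
for _ in range(50):
    residual = X @ theta - y
    losses.append((residual ** 2).mean())
    theta -= lr * 2 * X.T @ residual / len(y)  # full-batch gradient step

print(all(b <= a for a, b in zip(losses, losses[1:])))  # True: monotone descent
```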
You run the following model in a regression problem: z = a*x + b*y + c*x*y. You get a significant c value greater than 0. What could this indicate?
Nothing, since you do not check significance of coefficients in multiple linear regression
That a quadratic model will give even better results, since R2 will be higher
That you have a significant interaction between x and y. This indicates collinearity and you need to run multivariate techniques to reduce the dimension
That the model is wrong since it does not comply with parsimony principle
That you have a significant interaction between x and y. This indicates collinearity and you need to run multivariate techniques to reduce the dimension
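Fitting an interaction term and testing its significance can be sketched with plain NumPy OLS (the data and the true coefficient c = 2.0 below are invented for illustration):

```python
# Sketch: fit z = a*x + b*y + c*x*y by least squares and compute the
# classical OLS t-statistic for the interaction coefficient c.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.normal(size=200)
z = 1.5 * x - 0.8 * y + 2.0 * x * y + rng.normal(scale=0.5, size=200)

# design matrix: intercept, main effects, and the interaction x*y
X = np.column_stack([np.ones_like(x), x, y, x * y])
beta, *_ = np.linalg.lstsq(X, z, rcond=None)

resid = z - X @ beta
sigma2 = resid @ resid / (len(z) - X.shape[1])
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[3, 3])
t_stat = beta[3] / se

print(beta[3])  # near the true c = 2.0
print(t_stat)   # large |t|: the interaction term is significant
```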
Mark performs simple linear and quadratic regression on a given problem. However, in the rush of presenting the results, he forgets which model is which! Can you help him decide what to do next? The data you have is:
[PICTURE]
Model 1: Linear Model
Model 2: Quadratic Model
Evaluation: Model 1 has a high bias problem
Model 1: Linear Model
Model 2: Quadratic Model
Evaluation: Model 2 has a high bias problem
Model 1: Quadratic Model
Model 2: Linear model
Evaluation: Model 2 has a high variance problem
Model 1: Quadratic Model
Model 2: Linear model
Evaluation: Model 1 has a high variance problem
Model 1: Quadratic Model
Model 2: Linear model
Evaluation: Model 1 has a high variance problem
For an important medical trial to detect melanoma, you have a dataset of several patients with numerous variables and the progression of the disease. You are trying to find some combination of variables that predicts melanoma with high precision.
You first run PCA to reduce the dimensionality, then run linear regression. You find that the variance explained is low because there are a couple of possible outliers. What would you do?
Check the significance of the model with ANOVA; low variance explained is a sign of a non-significant linear regression
As PCA outputs principal components with the most variance explained, you don’t have outliers after that step. You need a more complex regression
Using clustering, you could find possible outliers and check manually
Dimensionality reduction takes away outliers, so you must replace PCA with a more robust algorithm
Using clustering, you could find possible outliers and check manually
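A sketch of the clustering-based outlier check, assuming scikit-learn; the two "planted" outlier points and the 99th-percentile threshold are illustrative choices:

```python
# Sketch: flag points far from their K-means centroid as candidate
# outliers, then inspect them manually.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[0, 0], scale=1.0, size=(150, 2))
cluster_b = rng.normal(loc=[10, 10], scale=1.0, size=(150, 2))
X = np.vstack([cluster_a, cluster_b, [[20.0, -10.0], [-15.0, 15.0]]])  # 2 planted outliers

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
suspects = np.where(dists > np.percentile(dists, 99))[0]

print(suspects)  # includes indices 300 and 301, the planted outliers
```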
You have a scatter plot of data that you want to classify. You try the following logistic regression: h(x) = g(theta_0 + theta_1*x_1 + theta_2*x_2), where g is the sigmoid function, but fail to achieve high accuracy. What should you do?
[PICTURE]
You should continue as you are, since accuracy is not a great metric
Adding quadratic terms may help, because in that space, the data becomes linearly separable
You should switch to a neural network
You should try to classify with SVM
Adding quadratic terms may help, because in that space, the data becomes linearly separable
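A sketch of the quadratic-features trick, assuming scikit-learn and synthetic concentric-circle data (a standard stand-in for the card's unseen scatter plot):

```python
# Sketch: logistic regression on raw (x1, x2) fails on circular data, but
# adding quadratic terms (x1^2, x1*x2, x2^2) makes it linearly separable
# in the expanded feature space.
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

plain = LogisticRegression().fit(X, y)
X_quad = PolynomialFeatures(degree=2).fit_transform(X)
quad = LogisticRegression(max_iter=1000).fit(X_quad, y)

print(plain.score(X, y))      # ~0.5: no linear boundary works
print(quad.score(X_quad, y))  # ~1.0: x1^2 + x2^2 separates the circles
```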
Regularization adds a term to the cost function to shrink the parameters. You forget which model corresponds to each value of lambda = 0.01, 0.1, 1. Can you match each value of lambda to its corresponding final model?
[PICTURE]
Model 1: lambda=1
Model 2: lambda=0.1
Model 3: lambda=0.01
Model 1: lambda=0.01
Model 2: lambda=1
Model 3: lambda=0.1
Model 1: lambda=0.01
Model 2: lambda=0.1
Model 3: lambda=1
Model 1: lambda=0.1
Model 2: lambda=1
Model 3: lambda=0.01
Model 1: lambda=0.1
Model 2: lambda=1
Model 3: lambda=0.01
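The effect of lambda can be sketched with ridge regression in scikit-learn (where lambda is called `alpha`); the sine data and degree-9 features below are illustrative:

```python
# Sketch: larger lambda penalizes the parameters harder, so the fitted
# coefficient vector shrinks, giving a smoother, less flexible model.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(60, 1))
y = np.sin(3 * x).ravel() + rng.normal(scale=0.1, size=60)
X = PolynomialFeatures(degree=9).fit_transform(x)

norms = {lam: np.linalg.norm(Ridge(alpha=lam).fit(X, y).coef_)
         for lam in (0.01, 0.1, 1.0)}
print(norms)  # coefficient norm shrinks as lambda grows
```

This is why the wiggliest model corresponds to the smallest lambda and the flattest to the largest.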
What does the Naive Bayes classifier optimize?
The variance of the classes to be similar
A robust classification above great fit
The normality of the posterior probability
The posterior probability of a class, based on previous events, using different ways of calculating those probabilities
The posterior probability of a class, based on previous events, using different ways of calculating those probabilities
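A sketch with scikit-learn's Gaussian variant of Naive Bayes on a tiny invented two-class dataset, showing the posterior probabilities the classifier maximizes:

```python
# Sketch: GaussianNB picks the class with the highest posterior probability,
# computed via Bayes' rule under a feature-independence assumption.
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],   # class 0
              [4.0, 4.0], [4.1, 3.9], [3.8, 4.2]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

nb = GaussianNB().fit(X, y)
posterior = nb.predict_proba([[1.1, 1.0]])[0]  # P(class | features)

print(posterior)                 # heavily favors class 0
print(nb.predict([[3.9, 4.0]]))  # predicts class 1
```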