Modeling Flashcards

1
Q

Tell me about how you designed a model for a past employer or client.

A

Answer

2
Q

What are your favorite data visualization techniques?

A

Answer

3
Q

How would you effectively represent data with 5 dimensions?

A

Answer

4
Q

How is k-NN different from k-means clustering?

A

k-NN, or k-nearest neighbors, is a supervised classification algorithm (it can also be used for regression), where k is an integer giving the number of neighboring labelled data points that vote on the classification of a given observation. K-means is an unsupervised clustering algorithm, where k is an integer giving the number of clusters to be created from the given, unlabelled data.
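A minimal, pure-Python sketch contrasting the two algorithms. The function names `knn_predict` and `kmeans` are hypothetical and chosen for illustration. The k-means part assumes the standard Lloyd's assign-then-update loop, initialised with k randomly chosen data points. Note that `knn_predict` needs labels while `kmeans` never looks at them.

```python
import math
import random
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Supervised: classify `query` by majority vote of its k nearest labelled points."""
    # Rank every training point by Euclidean distance to the query.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    # The k closest neighbours vote; the most common label wins.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def kmeans(points, k, iterations=100, seed=0):
    """Unsupervised: partition unlabelled `points` into k clusters (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.5, 0.5), k=3))  # uses the labels
print(kmeans(points, k=2))                          # ignores the labels entirely
```

Here k plays two different roles. In `knn_predict` it is how many neighbours vote. In `kmeans` it is how many groups the data is split into.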

5
Q

How would you create a logistic regression model?

A

Answer

6
Q

Have you used a time series model? Do you understand cross-correlations with time lags?

A

Answer

7
Q

Explain the 80/20 rule, and tell me about its importance in model validation.

A

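“People usually tend to start with a 80-20% split (80% training set – 20% test set) and split the training set once more into a 80-20% ratio to create the validation set.” With those ratios, 64% of the data is used for training and 16% for validation (tuning and model selection). The remaining 20% is held out as a test set that is used only once, at the end, to estimate how the final model performs on unseen data.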
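A minimal sketch of the two-stage split, assuming the data fits in a Python list. `train_val_test_split` is a hypothetical helper, not a library function; scikit-learn users would typically call `train_test_split` twice instead.

```python
import random


def train_val_test_split(records, test_frac=0.2, val_frac=0.2, seed=42):
    """Hold out a test set, then carve a validation set out of what remains for training."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # random order, so splits don't follow any sort order
    # First 80/20 split: 20% of all records become the untouched test set.
    n_test = round(len(shuffled) * test_frac)
    test, train_full = shuffled[:n_test], shuffled[n_test:]
    # Second 80/20 split, applied to the training portion only.
    n_val = round(len(train_full) * val_frac)
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(1000))
print(len(train), len(val), len(test))  # 640 160 200
```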

8
Q

Explain what precision and recall are. How do they relate to the ROC curve?

A

Recall describes what percentage of actual positives the model predicts as positive. Precision describes what percentage of the model's positive predictions were correct. The ROC curve plots recall (the true positive rate) against the false positive rate, which is 1 - specificity, as the classification threshold varies. Specificity is the percentage of actual negatives that the model predicts as negative. Precision does not appear on the ROC curve; it is instead plotted against recall in a precision-recall curve. Recall, precision, and the ROC curve are all used to judge how useful a given classification model is.
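A small sketch that makes these definitions concrete. It computes precision, recall, and specificity from 0/1 labels, and traces ROC points by sweeping the decision threshold down through the scores. The helper names and the example data are hypothetical, and edge cases (for example a class with no examples) are handled only minimally.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_specificity(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0    # correct share of positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0       # share of actual positives found
    specificity = tn / (tn + fp) if tn + fp else 0.0  # share of actual negatives found
    return precision, recall, specificity


def roc_points(y_true, scores):
    """(false positive rate, recall) pairs as the threshold sweeps down through every score."""
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        _, recall, specificity = precision_recall_specificity(y_true, y_pred)
        points.append((1 - specificity, recall))
    return points


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(precision_recall_specificity(y_true, y_pred))
print(roc_points(y_true, scores))
```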

9
Q

Explain the difference between L1 and L2 regularization methods.

A

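“A regression model that uses L1 regularization technique is called Lasso Regression and model which uses L2 is called Ridge Regression. The key difference between these two is the penalty term.” Lasso adds the sum of the absolute values of the coefficients (λΣ|w_j|) to the loss. This can shrink some coefficients exactly to zero, so it also performs feature selection. Ridge adds the sum of the squared coefficients (λΣw_j²), which shrinks all coefficients toward zero without eliminating any of them.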
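A worked illustration of the penalty difference, under simplifying assumptions. Assume an orthonormal design matrix (XᵀX = I), with ½‖y − Xw‖² + λ‖w‖₁ as the Lasso objective and ½‖y − Xw‖² + (λ/2)‖w‖₂² as the Ridge objective. Under those assumptions each Lasso coefficient is the ordinary least squares coefficient soft-thresholded by λ, while each Ridge coefficient is the least squares coefficient divided by 1 + λ. The helper names are hypothetical.

```python
import math


def lasso_orthonormal(ols_coefs, lam):
    """Lasso under X^T X = I: soft-threshold each least squares coefficient by lam."""
    return [math.copysign(max(abs(b) - lam, 0.0), b) for b in ols_coefs]


def ridge_orthonormal(ols_coefs, lam):
    """Ridge under X^T X = I: scale each least squares coefficient by 1 / (1 + lam)."""
    return [b / (1.0 + lam) for b in ols_coefs]


def penalties(weights, lam):
    """The two penalty terms added to the squared-error loss (scaling as assumed above)."""
    l1 = lam * sum(abs(w) for w in weights)
    l2 = (lam / 2) * sum(w * w for w in weights)
    return l1, l2


ols = [3.0, -0.5, 0.2]
print(lasso_orthonormal(ols, lam=1.0))  # small coefficients become exactly zero
print(ridge_orthonormal(ols, lam=1.0))  # every coefficient shrinks, none reaches zero
```

With λ = 1, Lasso keeps only the large coefficient (reduced from 3.0 to 2.0) and zeroes the other two. Ridge halves all three.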

10
Q

In your opinion, which is more important when designing a machine learning model: model performance or model accuracy?

A

Answer

11
Q

Is it better to spend five days developing a 90-percent accurate solution or 10 days for 100-percent accuracy?

A

Answer

12
Q

What are some situations where a general linear model fails?

A

Answer

13
Q

Do you think 50 small decision trees are better than a large one? Why?

A

Answer

14
Q

Is it better to have too many false positives or too many false negatives?

A

Answer

15
Q

Your data science team must build a binary classifier, and the number one criterion is the fastest possible scoring at deployment. It may even be deployed in real time. Which technique will likely produce the model that is fastest for the deployment team to use on new cases?

random forest
logistic regression
KNN
deep neural network

A

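To predict a new value:

Random Forest - the value has to be fed through every tree in the forest, and then a voting rule is applied to the trees' outputs

KNN - distances have to be computed between the value and all n stored training observations

Deep Neural Network - the value has to pass through every layer, each of which requires a matrix multiplication

Logistic Regression - the value needs only one weighted sum of its features, which is then fed into the sigmoid

Logistic regression is therefore much quicker to score at deployment.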
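A minimal sketch of why the scoring costs differ. It assumes already-trained weights for logistic regression and a stored training set for k-NN; both helper names are hypothetical. Logistic regression touches each of the d features once. k-NN must compute n distances, each over d features, before it can vote.

```python
import math


def score_logistic(weights, bias, x):
    """Logistic regression scoring: one O(d) weighted sum, then the sigmoid."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    # Numerically stable sigmoid: avoids math.exp overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score_knn(train_points, train_labels, x, k):
    """k-NN scoring: n distances over d features each (O(n * d)) before any vote."""
    nearest = sorted(range(len(train_points)),
                     key=lambda i: math.dist(train_points[i], x))[:k]
    return sum(train_labels[i] for i in nearest) / k  # fraction of neighbours labelled 1


print(score_logistic([0.8, -0.4], bias=0.1, x=[1.0, 2.0]))
print(score_knn([(0, 0), (1, 1), (5, 5), (6, 6)], [0, 0, 1, 1], (5.5, 5.5), k=2))
```

The logistic scorer's cost does not grow with the size of the training set. The k-NN scorer's cost grows linearly with it, which is what rules k-NN out for real-time use here.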
