Interview Questions Flashcards

Question

What is lasso regression?

Answer 1

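A form of L1 regularization: adds a penalty to the OLS loss equal to lambda * the sum of the absolute values of the coefficients. Coefficients can shrink to exactly zero, so it also acts as feature selection.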
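As a rough illustration of the penalty, the sketch below computes a lasso-style loss by hand. The function name `lasso_loss` and the parameter `lam` (the penalty strength) are made up for this example, and the loss uses a plain sum of squared errors; libraries often scale that term differently.

```python
def lasso_loss(X, y, coefs, intercept, lam):
    """Sum of squared errors plus lam * sum of absolute coefficient values."""
    sse = 0.0
    for row, target in zip(X, y):
        pred = intercept + sum(c * v for c, v in zip(coefs, row))
        sse += (target - pred) ** 2
    l1_penalty = lam * sum(abs(c) for c in coefs)
    return sse + l1_penalty


# Toy data where y = 2 * x1 exactly; the second feature is noise.
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
y = [2.0, 4.0, 6.0]

# A perfect fit that zeroes out the second coefficient pays only the penalty.
sparse_loss = lasso_loss(X, y, [2.0, 0.0], 0.0, lam=1.0)
# A fit that spreads weight across both features pays error plus penalty.
dense_loss = lasso_loss(X, y, [1.0, 1.0], 0.0, lam=1.0)
```

Here `sparse_loss` is smaller than `dense_loss`, which is the kind of pressure that pushes unhelpful coefficients to zero.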

Answer 2

Independent variables are independent of each other (no multicollinearity). Independent variables are linearly related to the dependent variable. Residuals are normally distributed.

Answer 3

Aggregating the results of models trained on bootstrapped samples of the data (samples drawn with replacement).
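A minimal sketch of the idea, assuming the "model" is just the mean of its sample (a stand-in chosen to keep the example short, not a realistic learner). The names `bootstrap_sample`, `bagged_prediction` and `mean_model` are hypothetical.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def bagged_prediction(data, fit, n_models, seed=0):
    """Fit one model per bootstrap sample and average their outputs."""
    rng = random.Random(seed)
    predictions = [fit(bootstrap_sample(data, rng)) for _ in range(n_models)]
    return sum(predictions) / n_models


def mean_model(sample):
    # Stand-in "model": predicts the mean of whatever it was trained on.
    return sum(sample) / len(sample)


data = [2.0, 4.0, 6.0, 8.0]
estimate = bagged_prediction(data, mean_model, n_models=50)
```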

Answer 4

Start with a weak model and add small additive models to optimize a loss function.

Answer 5

Like Gradient Boosting on steroids. Parallelization, tree tuning, cross-validation, regularization + efficient algorithms.

Answer 6

Linear predicts continuous values, logistic predicts binary results (as a probability between 0 and 1). Mathematically, logistic regression is a linear model whose output y is passed through the sigmoid function (1 / (1 + e^-y)).
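The relationship can be sketched in a few lines. The function names and the 0.5 threshold are assumptions for illustration; this version makes no attempt at numerical stability for very large negative inputs.

```python
import math


def sigmoid(y):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))


def predict_proba(features, weights, bias):
    # Same linear combination that linear regression would output...
    linear_output = bias + sum(w * x for w, x in zip(weights, features))
    # ...passed through the sigmoid to get a probability.
    return sigmoid(linear_output)


def predict_class(features, weights, bias, threshold=0.5):
    return 1 if predict_proba(features, weights, bias) >= threshold else 0
```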

Answer 7

Support Vector Machine. Uses Support Vector Classifiers, but also applies the Kernel Trick (treating the data as if it were in a higher-dimensional space without actually transforming it) to find classifiers that separate the data better.

Answer 8

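A way of using SVM on data in higher dimensions without the computational overload. A kernel function returns the dot product two points would have in the higher-dimensional space, computed directly from the original points, so the mapping itself is never calculated.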
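To make the dot-product idea concrete, the sketch below compares a degree-2 polynomial kernel with the explicit higher-dimensional mapping it stands in for. The choice of kernel and the names `explicit_map` and `poly_kernel` are assumptions for this example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def explicit_map(v):
    """Map d features to d*d features: every pairwise product v_i * v_j."""
    return [vi * vj for vi in v for vj in v]


def poly_kernel(a, b):
    """Degree-2 polynomial kernel: computed in the original space only."""
    return dot(a, b) ** 2


a = [1, 2, 3]
b = [4, 5, 6]

# Expensive route: transform both points to 9 dimensions, then take the dot product.
slow = dot(explicit_map(a), explicit_map(b))
# Kernel route: one 3-dimensional dot product, then square it.
fast = poly_kernel(a, b)
```

Both routes give the same number, but the kernel route never builds the 9-dimensional vectors.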

Answer 9

Working backwards from the error in the output to see how much each weight contributed to it, and therefore what changes to the weights would move the output towards the right results. Updates are usually done in batches (mini-batch stochastic gradient descent).

Answer 10

K-Means & Gaussian Mixture Models.

Answer 11

Support Vector Classifiers, k-Nearest Neighbours & Naive Bayes.

Answer 12

PCA, UMAP and t-SNE

Answer 13

Data that is rare or unique, e.g. identification numbers.

Answer 14

% of true positives, out of all those identified as positive.

Answer 15

% of true positives, out of all those actually positive.

Answer 16

A combination (the harmonic mean) of precision & recall: 2 * (precision * recall) / (precision + recall)
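The three metrics from the last few cards can be computed from raw counts as below. The counts are invented for the example, and the functions assume the denominators are non-zero.

```python
def precision(tp, fp):
    """True positives out of everything predicted positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """True positives out of everything actually positive."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * (p * r) / (p + r)


# Made-up confusion counts: 8 true positives, 2 false positives, 8 false negatives.
p = precision(tp=8, fp=2)   # 0.8
r = recall(tp=8, fn=8)      # 0.5
f1 = f1_score(p, r)
```

The F1 score lands closer to the lower of the two inputs, which is why it penalizes a model that is strong on only one of them.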

Answer 17

Precision/Recall, F1 Score & ROC AUC

Answer 18

Receiver Operating Characteristic, Area Under the Curve

Answer 19

(R)MSE, MAPE & R2

Answer 20

Similar to return, except it hands back intermediate results one at a time: the function keeps its state in memory between calls and resumes where it left off.
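Assuming this card refers to Python's `yield` keyword, a small generator shows the behaviour. The function name `running_total` is just for illustration.

```python
def running_total(values):
    """Yield the cumulative sum after each value instead of returning once."""
    total = 0
    for v in values:
        total += v
        yield total  # pause here; `total` is remembered for the next call


gen = running_total([5, 10, 20])
first = next(gen)    # 5
second = next(gen)   # 15, the generator resumed with total == 5
rest = list(gen)     # [35]
```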

Answer 21

Condenses a function into a line.

Answer 22

Parallel processing on steroids for data processing. Maps the documents across clusters, Shuffles the results into associated piles, Reduces the piles to the required result.
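The Map, Shuffle and Reduce steps above can be sketched on a single machine with a word count. This is only the logic, with hypothetical function names; a real cluster would run the map and reduce phases in parallel across nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all values that share a key into one pile."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each pile to a single result."""
    return {key: sum(values) for key, values in groups.items()}


documents = ["the cat sat", "the dog sat", "the end"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(mapped))
```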

Answer 23

A tuple is an immutable object. Immutable meaning the object in memory cannot be changed.
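A short demonstration, with variable names chosen for the example. Note that immutability applies to the tuple itself: a mutable object stored inside a tuple can still be changed.

```python
point = (1, 2)

# Assigning to an element of a tuple raises TypeError.
try:
    point[0] = 10
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# "Changing" a tuple really means building a new one.
moved = point + (3,)

# The tuple cannot be rebound to other items, but a list inside it can mutate.
nested = ([1], 2)
nested[0].append(5)
```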

Answer 24

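Bias: Error from the model failing to capture the true relationship, which shows up as a poor fit even on the training set (low bias = accurate center). Variance: How much the model's fit changes from one dataset to another, which shows up as a gap between training and testing performance (low variance = tight circle).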

Answer 25

Step 1 - Turn the data into a graph network (usually with k-nearest neighbours as the links). Step 2 - Turn the graph network into a Laplacian Matrix. Step 3 - Find the eigenvectors/values of the matrix. Step 4 - Split the data by the eigenvector of the 2nd smallest eigenvalue. Step 4.5 - If the data cannot be split in two, can use the eigenvectors of several of the smallest eigenvalues + k-means to find multiple clusters.

Answer 26

A matrix view of a graph, where for any node n, entry (n, n) = the number of neighbours or the total value of neighbour weights. And entry (n, m) = the negative value of the connection between n and m (0 if they are not connected).
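A minimal sketch of building such a matrix from an edge list, assuming an undirected graph with nodes numbered from 0. The function name `laplacian` and the edge format `(node_a, node_b, weight)` are assumptions for this example.

```python
def laplacian(num_nodes, weighted_edges):
    """Build the graph Laplacian as a list of rows."""
    L = [[0.0] * num_nodes for _ in range(num_nodes)]
    for a, b, w in weighted_edges:
        # Diagonal: total weight of connections touching each node.
        L[a][a] += w
        L[b][b] += w
        # Off-diagonal: negative weight of the connection between a and b.
        L[a][b] -= w
        L[b][a] -= w
    return L


# A simple chain 0 - 1 - 2 with unit weights.
chain = laplacian(3, [(0, 1, 1.0), (1, 2, 1.0)])
```

Every row of the result sums to zero, which is a quick sanity check on a Laplacian built this way.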

Answer 27

A technique in which you "encode" an input (x) into a latent-space representation (h), then "decode" it back into a reconstruction (r) of the original. The idea being that if r ends up being close to x, and h is smaller than x, then you are able to compress x into its key features.

Answer 28

Generative Adversarial Network. An unsupervised learning model (generator) "generates" new data based on a given sample of real data. A supervised model (discriminator) then looks to predict "real" or "generated" on this data. After each iteration, the discriminator becomes better at telling real from generated, and the generator becomes better at fooling the discriminator.

Answer 29

Long Short-Term Memory. A variant of an RNN, except with an ability to make more complex decisions on what previous knowledge to keep / delete.

Answer 30

The impact the early layers have on the final output dramatically decreases as the number of layers increases, making them very difficult to tune through back-propagation (as their impact on the final result is so small). Also an issue in RNNs, where the "recurring" data quickly vanishes.
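A back-of-the-envelope sketch of the effect, assuming sigmoid activations and all weights equal to 1 (both simplifications chosen for this example). The sigmoid's derivative is at most 0.25, and back-propagation multiplies one such factor per layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def gradient_scale(num_layers, pre_activation=0.0, weight=1.0):
    """Factor by which a gradient shrinks on its way back through the layers."""
    scale = 1.0
    for _ in range(num_layers):
        scale *= weight * sigmoid_derivative(pre_activation)
    return scale


one_layer = gradient_scale(1)     # 0.25, the best case for a sigmoid
ten_layers = gradient_scale(10)   # 0.25 ** 10, under one millionth
```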

Answer 31

Extra-Trees stands for EXTRemely randomized Trees. Similar to Random Forests, except the optimal cut-off point for a split isn't searched for: one is simply picked at random.


(55 cards)