Interview Questions Flashcards

1
Q

What are the steps of a Data Science project?

A
Define, refine and measure the problem statement
Gather and understand the data
Prepare the data
Build the models
Evaluate the models
Deploy into production
Track performance
2
Q

How do you multiply matrices?

A

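Scalar x Matrix = Each element is multiplied by the scalar.

Matrix x Matrix = The number of columns of the first matrix has to equal the number of rows of the second (C1 = R2). Each entry of the result is the dot product of a row of the first with a column of the second, i.e. the sum of the element-wise products.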
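
A minimal sketch of both operations in plain Python, assuming matrices are stored as lists of rows. The function names and the example matrices are illustrative only.

```python
def scalar_times_matrix(k, m):
    """Multiply every element of matrix m (a list of rows) by the scalar k."""
    return [[k * value for value in row] for row in m]


def matmul(a, b):
    """Multiply matrix a (n x p) by matrix b (p x q), giving an n x q matrix."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match: columns of a vs rows of b")
    # zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


a = [[1, 2, 3],
     [4, 5, 6]]    # 2 x 3
b = [[7, 8],
     [9, 10],
     [11, 12]]     # 3 x 2

print(scalar_times_matrix(2, a))  # [[2, 4, 6], [8, 10, 12]]
print(matmul(a, b))               # [[58, 64], [139, 154]]
```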

3
Q

What is the normal distribution?

A

Symmetric, bell-shaped curve where c. 68% of the data falls within 1 s.d. of the mean and c. 95% within 2 s.d.
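
The 68% and 95% figures can be checked with the standard library. This is a small sketch using `statistics.NormalDist`. The helper name `share_within` is made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(round(share_within(1), 4))  # 0.6827
print(round(share_within(2), 4))  # 0.9545
```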

4
Q

What is the log-normal distribution?

A

The distribution of a positive variable whose logarithm is normally distributed, i.e. taking the log of the numbers gives a normal distribution.

5
Q

Four methods to check for normality?

A

Visually (histogram), Skewness (0), Kurtosis (3) & Q-Q plot

6
Q

Four common distributions in the Exponential family and why they are Exponential?

A

Gaussian, Poisson, Binomial & Gamma.
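All can be expressed in the form:
f(x | beta) = exp( eta(beta) * T(x) - A(beta) + B(x) )
where eta(beta) is the natural parameter, T(x) the sufficient statistic and A(beta) the log-normalizer.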

7
Q

What is the Dirichlet distribution?

A

A multivariate version of the beta distribution.

8
Q

What is a Gaussian distribution?

A

Another name for the normal distribution: bell-shaped around the mean and symmetric on both sides.

9
Q

What is a Poisson distribution?

A

Part of the exponential family. The distribution of the number of events occurring in a fixed interval of time. Used to calculate the probability of seeing k events in that interval, given the average number of events.

E.g. if a bus arrives on average once every 5 minutes and you are interested in the probability of seeing 6 buses in an hour, then lambda = 60/5 = 12.

P(x) = e^-u * u^x / x!
where u is your average (lambda) and x is the number of events you’re predicting.
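
As a worked version of the bus example, here is a minimal sketch of the Poisson probability mass function, which divides by x!. The names `poisson_pmf`, `lam` and `p_six` are illustrative.

```python
from math import exp, factorial


def poisson_pmf(x, u):
    """Probability of exactly x events when the average number of events is u."""
    return exp(-u) * u ** x / factorial(x)


lam = 60 / 5                   # one bus every 5 minutes -> 12 buses per hour
p_six = poisson_pmf(6, lam)    # probability of exactly 6 buses in the hour
print(round(p_six, 4))         # roughly 0.0255
```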

10
Q

What is the Binomial distribution?

A

The distribution of the number of successes in n independent repetitions, each with the same probability of success.

E.g. rolling a 6 has 1/6 probability. If I rolled a die 60 times, the highest probability (peak of the curve) is 10 occurrences, with lower probabilities of having 9 or 11 occurrences.
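
The dice example can be checked directly. This is a small sketch of the binomial probability mass function using `math.comb`. The variable names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success chance p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 60, 1 / 6
probabilities = {k: binomial_pmf(k, n, p) for k in range(n + 1)}
peak = max(probabilities, key=probabilities.get)

print(peak)  # 10
for k in (9, 10, 11):
    print(k, probabilities[k])
```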

11
Q

What is a gamma distribution?

A

A two-parameter distribution with shape a and scale beta. Only takes positive real values and is often used as a prior for a parameter of another distribution, such as the lambda in a Poisson distribution.

E.g. if we want to calculate the probability of k buses arriving at a bus station in an hour, but we’re unsure what lambda (mean) is, we can assume a gamma prior for lambda with shape a and scale beta.

12
Q

What are summary statistics?

A

Statistics that can be used to capture meaningful aspects of the data. These may include:

Mean, Median, Mode, Quartiles, Kurtosis, Skewness

13
Q

What is Skewness and how is it calculated?

A

A measure of the asymmetry of the data around the mean, calculated from the 3rd standardized moment: Sum of (Xi - Xu)^3 / ((N-1) * sigma^3). 0 indicates no skewness.

14
Q

What is Kurtosis and how is it calculated?

A

A measure of how heavy the tails are compared to the normal distribution. Calculated from the 4th standardized moment: Sum of (Xi - Xu)^4 / (N * sigma^4). A value of 3 matches the normal distribution (i.e. no excess kurtosis).
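
A minimal sketch of both moment calculations (skewness from the previous card and kurtosis from this one). It assumes the population form, which divides by N throughout. Sample versions use slightly different denominators. The function names and data are illustrative.

```python
def central_moment(data, order):
    """Average of (x - mean) ** order over the data (population form, divides by N)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** order for x in data) / len(data)


def skewness(data):
    """3rd central moment divided by sigma^3 (sigma^2 is the 2nd central moment)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data):
    """4th central moment divided by sigma^4. A normal distribution gives 3."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

print(skewness(symmetric))     # 0.0, no skew
print(skewness(right_skewed))  # positive, long right tail
print(kurtosis(symmetric))     # about 1.7, lighter tails than a normal
```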

15
Q

How to calculate sample size?

A

Either conduct a small experiment or make valid assumptions. Then use Cochran’s formula to estimate a sample size: n = Z^2 * p * (1 - p) / e^2, where Z is the desired Z value, p is the assumed proportion and e is the desired precision (margin of error).
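
A small sketch of Cochran's formula, assuming a two-sided confidence level and rounding up to a whole number of respondents. The function and parameter names are illustrative.

```python
from math import ceil
from statistics import NormalDist


def cochran_sample_size(confidence, p, margin):
    """Sample size for estimating a proportion p to within +/- margin."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


# p = 0.5 is the most conservative assumption (largest sample size).
print(cochran_sample_size(0.95, 0.5, 0.05))  # 385
```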

16
Q

What is Power Analysis?

A

Calculating the probability of a hypothesis test finding an effect, if it exists.

17
Q

What is a confounding variable?

A

A variable which impacts both the dependent and other independent variables.

18
Q

How do you calculate a confidence interval?

A

Point estimate +- the desired critical value (from the Z or t table) * the standard error, where the standard error of a mean is the standard deviation of the data / sqrt(n).
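
A minimal sketch for the confidence interval of a mean. It assumes a Z critical value. For a small sample like this one a t value would normally be used instead, which the standard library does not provide. The function name and measurements are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) for the mean, using a Z critical value."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = stdev(sample) / sqrt(len(sample))
    centre = mean(sample)
    return centre - z * standard_error, centre + z * standard_error


sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]  # hypothetical measurements
low, high = mean_confidence_interval(sample)
print(low, high)
```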

19
Q

What is a Type I and Type II error?

A
Type I (False Positive) - Rejecting the null hypothesis when it is actually true (detecting an effect that is not there).
Type II (False Negative) - Failing to reject the null hypothesis when it is actually false (missing an effect that is there).
20
Q

Difference between Z-test, T-test, F-test & ANOVA test?

A

Z-tests and T-tests are used to test hypotheses about a point estimate, typically to estimate a mean or compare means. A Z-test requires the population s.d. to be known, whilst a T-test estimates it from the sample.

An F-test is used to test whether two samples have equal variances. ANOVA (Analysis of Variance) is a type of F-test that compares systematic (between-group) variance with error (within-group) variance to test whether the means of several groups differ.

21
Q

How do you conduct A/B testing?

A

Choose a single change to make.
Choose a sample size (low if risky)
Randomly assign users to experience change.
Gather data.
T-test the difference between the two groups at a 95-99% confidence level.

22
Q

What is a permutation test?

A

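A resampling method, related to bootstrapping. Tests whether two groups come from the same population by repeatedly shuffling the group labels and checking how often the shuffled data gives a difference at least as large as the one observed.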
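
A minimal sketch of one common form, a two-sample permutation test on the difference in means. The function name, the number of permutations, the seed and the data are all assumptions made for the example.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a p-value for the difference in means by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_a = pooled[:len(group_a)]
        shuffled_b = pooled[len(group_a):]
        # Count shuffles whose difference is at least as large as the real one.
        if abs(mean(shuffled_a) - mean(shuffled_b)) >= observed:
            extreme += 1
    return extreme / n_permutations


control = [10.1, 10.4, 9.8, 10.2, 10.0, 10.3]    # hypothetical data
variant = [12.0, 12.3, 11.9, 12.2, 12.1, 11.8]   # hypothetical data
print(permutation_test(control, variant))  # small p-value: the groups differ
```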

23
Q

What is multicollinearity?

A

Two or more independent variables being highly correlated with each other, i.e. not actually independent.

24
Q

What is ridge regression?

A

A form of L2 regularization. Adds a penalty to the OLS loss (lambda * sum of the squared coefficients). Shrinks the coefficients towards zero, but they never reach it.

25
Q

What is lasso regression?

A

A form of L1 regularization. Adds a penalty to the OLS loss (lambda * sum of the absolute coefficients). Coefficients can reach exactly zero.
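
The difference between the two penalties (ridge on the previous card, lasso here) shows up even with one feature. This is a sketch assuming a single feature, no intercept, and a loss of squared errors plus `lam` times the penalty. Under those assumptions both slopes have a closed form. The names and data are hypothetical.

```python
from math import copysign


def _sums(x, y):
    """Return (sum of x*y, sum of x*x), the two quantities the slopes depend on."""
    return sum(a * b for a, b in zip(x, y)), sum(a * a for a in x)


def ridge_slope(x, y, lam):
    """Minimises sum((y - b*x)^2) + lam * b^2: shrinks b but never to exactly 0."""
    sxy, sxx = _sums(x, y)
    return sxy / (sxx + lam)


def lasso_slope(x, y, lam):
    """Minimises sum((y - b*x)^2) + lam * |b|: soft-thresholding, can give exactly 0."""
    sxy, sxx = _sums(x, y)
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    return copysign(shrunk, sxy) / sxx


x = [-2, -1, 0, 1, 2]
y = [-1.9, -1.1, 0.1, 0.9, 2.1]   # roughly y = x

for lam in (0, 10, 30):
    print(lam, ridge_slope(x, y, lam), lasso_slope(x, y, lam))
```

With `lam = 30` the ridge slope is still positive (about 0.25) while the lasso slope is exactly 0.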

26
Q

What are the assumptions of linear regression?

A

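Independent variables are not highly correlated with each other. The relationship between the independent variables and the dependent variable is linear. Residuals are independent, have constant variance and are normally distributed.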

27
Q

What is bagging?

A

Bootstrap aggregating: training a model on each of several bootstrapped samples of the data and aggregating their results.

28
Q

What is gradient boosting?

A

Start with a weak model and add small additive models to optimize a loss function.

29
Q

What is the XGBoost package?

A

Like Gradient Boosting on steroids. Parallelization, tree pruning, cross-validation, regularization + efficient algorithms.

30
Q

What is the difference between Linear and Logistic regression? What is the mathematical difference?

A

Linear predicts continuous values, logistic predicts binary results (as a probability). Mathematically, logistic regression is linear regression with the sigmoid function (1 / (1+e^-y)) applied to the output.
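
A minimal sketch of that relationship: the logistic prediction is the linear prediction passed through the sigmoid. The intercept and slope values are hypothetical, not fitted to any data.

```python
from math import exp


def linear_prediction(x, intercept, slope):
    """Plain linear model: any real number can come out."""
    return intercept + slope * x


def sigmoid(y):
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + exp(-y))


def logistic_prediction(x, intercept, slope):
    """Linear model followed by the sigmoid: output reads as a probability."""
    return sigmoid(linear_prediction(x, intercept, slope))


for x in (-3, 0, 3):
    print(x, linear_prediction(x, 0.0, 2.0), logistic_prediction(x, 0.0, 2.0))
```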

31
Q

What is an SVM?

A

Support Vector Machine.

Builds on Support Vector Classifiers, but also uses the Kernel Trick (working as if the data were in a higher-dimensional space without actually transforming it) to find a better separating boundary.

32
Q

What is the kernel trick?

A

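A way of using an SVM as if the data had been mapped to a higher dimension, without the computational overload. The kernel function returns the dot product the points would have in the higher-dimensional space, computed directly from the original features.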
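
To make the dot-product idea concrete, here is a small sketch with a degree-2 polynomial kernel on 2-D points. The kernel and the explicit 3-D feature map are chosen purely for illustration. Both routes give the same number, but the kernel never builds the higher-dimensional vectors.

```python
from math import isclose, sqrt


def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def feature_map(point):
    """Explicitly map a 2-D point into the 3-D space of degree-2 terms."""
    x1, x2 = point
    return (x1 * x1, sqrt(2) * x1 * x2, x2 * x2)


def polynomial_kernel(p, q):
    """Same dot product as in the 3-D space, computed from the 2-D points."""
    return dot(p, q) ** 2


p, q = (1.0, 2.0), (3.0, 4.0)
explicit = dot(feature_map(p), feature_map(q))
via_kernel = polynomial_kernel(p, q)
print(explicit, via_kernel)  # both are 121 (up to floating-point rounding)
```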

33
Q

Explain back propagation.

A

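Working backwards from the error at the output to see how much each weight contributed to it (using the chain rule), and therefore what changes to the weights would be required to achieve the right results. Updates are usually done in batches (mini-batch stochastic gradient descent).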

34
Q

Name two types of unsupervised classification techniques?

A

K-Means & Gaussian Mixture Models.

35
Q

Name three types of supervised classification techniques?

A

Support Vector Classifiers, k-Nearest Neighbours & Naive Bayes.

36
Q

Name three types of dimensionality reduction techniques?

A

PCA, UMAP and t-SNE

37
Q

What is high-cardinality data?

A

Data with a very large number of distinct values, many of them rare or unique, e.g. identification numbers.

38
Q

What is precision?

A

% of true positives, out of all those identified as positive.

39
Q

What is recall?

A

% of true positives, out of all those actually positive.

40
Q

What is an F1 score?

A

A combination of precision & recall (their harmonic mean): 2 * ( (precision * recall) / (precision + recall) )
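
A minimal sketch computing precision, recall and F1 from labels, assuming 1 marks the positive class and that there is at least one actual and one predicted positive (no zero-division handling). The labels are made up.

```python
def precision_recall_f1(actual, predicted):
    """Return (precision, recall, f1) for binary labels where 1 is positive."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == 1 and p == 1)
    false_pos = sum(1 for a, p in pairs if a == 0 and p == 1)
    false_neg = sum(1 for a, p in pairs if a == 1 and p == 0)
    precision = true_pos / (true_pos + false_pos)   # of those flagged positive
    recall = true_pos / (true_pos + false_neg)      # of those actually positive
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1


actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(actual, predicted)
print(precision, recall, f1)  # 2/3, 1/2 and 4/7
```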

41
Q

Name three types of classification evaluation metrics?

A

Precision/Recall, F1 Score & ROC AUC

42
Q

What does ROC AUC stand for?

A

Receiver Operating Characteristic, Area Under the Curve

43
Q

Name three types of regression evaluation metrics?

A

(R)MSE, MAPE & R2

44
Q

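What does the yield keyword do in Python?

A

Similar to return, except it returns intermediate results: the function pauses, holds its state in memory and resumes from the same point when the next value is requested. A function that uses yield is a generator.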
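
A small illustration, using a made-up generator called `running_total`. Each `next` call resumes the function where it paused, with `total` still in memory.

```python
def running_total(numbers):
    """Yield the cumulative sum after each number instead of returning once."""
    total = 0
    for n in numbers:
        total += n
        yield total   # hand back an intermediate result, then pause here


gen = running_total([1, 2, 3])
print(next(gen))   # 1
print(next(gen))   # 3
print(list(gen))   # [6], only the values not yet consumed
```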

45
Q

What does Lambda do in Python?

A

Creates a small anonymous function from a single expression, condensing it into one line.
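
For comparison, a short sketch showing the same function written with `def` and with `lambda`, plus the typical use as a throwaway sort key. The names are illustrative.

```python
def square_def(x):
    return x * x


square_lambda = lambda x: x * x   # same behaviour, written as one expression

print(square_def(4), square_lambda(4))  # 16 16

# Lambdas are most often passed inline, e.g. as a sort key.
words = ["banana", "fig", "cherry"]
by_length = sorted(words, key=lambda w: len(w))
print(by_length)  # ['fig', 'banana', 'cherry']
```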

46
Q

What is MapReduce?

A

Parallel processing on steroids for large datasets. Maps the documents across a cluster, Shuffles the results into associated piles, Reduces the piles to the required result.
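
The three phases can be sketched with a word count. This runs in a single process purely to show the data flow. A real MapReduce framework would spread the map and reduce work across machines. The function names and documents are illustrative.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: gather all values that share a key into one pile."""
    piles = defaultdict(list)
    for key, value in pairs:
        piles[key].append(value)
    return piles


def reduce_phase(piles):
    """Reduce: collapse each pile to a single result (here, a count)."""
    return {key: sum(values) for key, values in piles.items()}


documents = ["the cat sat", "the dog sat", "the end"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)  # {'the': 3, 'cat': 1, 'sat': 2, 'dog': 1, 'end': 1}
```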

47
Q

What is the difference between a List and a Tuple in Python?

A

A list is mutable, whereas a tuple is an immutable object. Immutable meaning the object in memory cannot be changed after it is created.
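
A short demonstration with made-up variable names: item assignment works on a list but raises `TypeError` on a tuple.

```python
numbers_list = [1, 2, 3]
numbers_list[0] = 99          # lists can be changed in place
print(numbers_list)           # [99, 2, 3]

numbers_tuple = (1, 2, 3)
try:
    numbers_tuple[0] = 99     # tuples cannot
except TypeError as error:
    print("tuple rejected the change:", error)

print(numbers_tuple)          # (1, 2, 3)
```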

48
Q

What is bias/variance?

A

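Bias: Systematic error from a model that cannot capture the true relationship, seen as a poor fit even on the training set (how far the center of the shots is from the bullseye).
Variance: How much the fit changes between datasets, seen as a much worse fit on the testing set than on the training set (how tight the circle of shots is).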

49
Q

What is spectral clustering?

A

Step 1 - Turn the data into a graph network (usually through k-nearest neighbours as the links).
Step 2 - Turn the graph network into a Laplacian Matrix.
Step 3 - Find the eigenvectors/values of the matrix.
Step 4 - Split the data by the eigenvector of the 2nd smallest eigenvalue (e.g. by the sign of its entries).
Step 4.5 - If the data cannot be split in two, can use multiple eigenvectors + k-means to find multiple clusters.

50
Q

What is a Laplacian Matrix?

A

A matrix view of a graph, where each diagonal entry (n, n) = the number of neighbours of node n, or the total value of its neighbour weights.
And each off-diagonal entry (n, m) = the negative value of the connection between n and m (0 if they are not connected).
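
A minimal sketch that builds this matrix from a weighted, undirected edge list. The function name and the example graph (a path 0 - 1 - 2 - 3 with unit weights) are illustrative.

```python
def laplacian(n_nodes, edges):
    """Build the graph Laplacian from (node_a, node_b, weight) edges."""
    matrix = [[0] * n_nodes for _ in range(n_nodes)]
    for a, b, weight in edges:
        matrix[a][a] += weight   # diagonal: total weight of a's connections
        matrix[b][b] += weight
        matrix[a][b] -= weight   # off-diagonal: negative connection weight
        matrix[b][a] -= weight
    return matrix


edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
L = laplacian(4, edges)
for row in L:
    print(row)
# [1, -1, 0, 0]
# [-1, 2, -1, 0]
# [0, -1, 2, -1]
# [0, 0, -1, 1]
```

Every row sums to zero, which is a quick sanity check on a Laplacian.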

51
Q

What is an auto-encoder?

A

A technique in which you “encode” an input (x) into a latent-space representation (h), then “decode” it back to the original form (r).

The idea being that if r ends up being close to x, and h is smaller than x, then you are able to compress x into its key features.

52
Q

What is a GAN model?

A

Generative Adversarial Network.

An unsupervised learning model “generates” new data based on a given sample of real data.

A supervised model (discriminator) then looks to predict “real” or “generated” on this data.

After each iteration, the supervised model becomes better at telling real from generated, and the generator becomes better at fooling the discriminator.

53
Q

What is an LSTM model?

A

Long Short-Term Memory.

A variant of an RNN, with gates that give it the ability to make more complex decisions on what previous knowledge to keep / delete.

54
Q

What is the vanishing gradient problem?

A

The impact the early layers have on the final output decreases dramatically as the number of layers increases, so the gradients reaching them become tiny. This makes them very difficult to tune through back-propagation (as their impact on the final result is so small).

Also an issue in RNN where the “recurring” data quickly vanishes.

55
Q

What is an extra-trees classifier?

A

Extra-Trees stands for Extremely Randomized Trees. Similar to Random Forests, except the optimal cut-off point for a split is not searched for: one is simply picked at random.