Interview Flashcards

1
Q

What is linear regression?

A

It is a linear approach to modelling the relationship between one or more explanatory variables and a response variable. With one explanatory variable it is simple linear regression; with several it is multiple linear regression. Example: relating the number of COVID tests to the number of cases.

2
Q

What are the assumptions of linear regression?

A

1) There should be a linear relationship between the explanatory and response variables. Check with a scatter plot: the points should follow a roughly straight line, unlike a non-linear relationship such as time vs actual COVID cases.
2) The explanatory variables should not exhibit multicollinearity. Check the variance inflation factor (VIF) and aim for no more than 2.5; a VIF of 2.5 means the variance of that coefficient is inflated by a factor of 2.5.
3) Homoscedasticity: the errors have constant variance. Plot the residuals against the fitted values and look for an even spread.
4) For any fixed value of the explanatory variables, the response is normally distributed.

3
Q

What metrics can you use to evaluate linear regression models?

A

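R2: the percentage of variation in the response explained by the model (a relative measure of fit)
Adjusted R2: R2 penalised for the number of parameters, so that adding variables does not automatically look like an improvement; it helps guard against overfitting (also a relative measure of fit)

Mean Squared Error (MSE): the average of the squared differences between predicted and actual values (an absolute measure of model fit)

Root Mean Square Error (RMSE): the square root of the MSE, a measure of the typical distance between actual and predicted values, in the units of the response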
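
As a rough illustration of how these metrics relate, here is a minimal sketch in plain Python. The function name `regression_metrics`, the `n_features` argument and the example numbers are made up for illustration; they are not from any particular library.

```python
import math

def regression_metrics(actual, predicted, n_features):
    """Return R2, adjusted R2, MSE and RMSE for paired lists of values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Residual sum of squares: squared gaps between actual and predicted values.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: squared gaps between actual values and their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R2 penalises each extra explanatory variable.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    mse = ss_res / n
    rmse = math.sqrt(mse)
    return {"r2": r2, "adj_r2": adj_r2, "mse": mse, "rmse": rmse}

# Made-up example values, purely for illustration.
actual = [3.0, 5.0, 7.0, 9.0, 11.0]
predicted = [2.5, 5.5, 7.0, 9.5, 10.5]
metrics = regression_metrics(actual, predicted, n_features=1)
print(metrics)
```

Adjusted R2 comes out slightly below R2 here, which is the penalty for the one explanatory variable.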

4
Q

What is the difference between an absolute fit and a relative fit?

A

A relative fit compares the fitted model to the null model; an absolute fit just looks at how close the model's predictions are to the data.

5
Q

What is overfitting?

A

It's where we've fit our model to the training data too well, so it will struggle to generalise to other data. Example with prime ministers. It shows up as test accuracy being lower than training accuracy.

6
Q

How do you deal with overfitting?

A

Cross-validation, more data (can help find the signal), removing irrelevant input features, early stopping (monitor validation performance over the iterations and stop when it stops improving), regularisation techniques (pruning, dropout, adding a penalty term to the cost function) and ensembling.

https://elitedatascience.com/overfitting-in-machine-learning

7
Q

How do you deal with underfitting?

A

Increase the number of features (or otherwise make the model more flexible). More data on its own won't help.

8
Q

How do you evaluate the performance of a classifier?

A

A confusion matrix is a good place to start. Accuracy is the number of correct predictions divided by the total number of predictions. Null accuracy is what you would have got by always predicting the most frequent class, and it is the baseline to compare accuracy against.

https://www.ritchieng.com/machine-learning-evaluate-classification-model/
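
A minimal sketch of these three ideas for a binary classifier, in plain Python. The helper names and the example labels are made up for illustration.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive=1):
    """Count (TP, FP, FN, TN) for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def null_accuracy(actual):
    """Accuracy obtained by always predicting the most frequent class."""
    most_common_count = Counter(actual).most_common(1)[0][1]
    return most_common_count / len(actual)

# Made-up labels: 7 negatives and 3 positives.
actual    = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
predicted = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
print(confusion_counts(actual, predicted))
print(accuracy(actual, predicted), null_accuracy(actual))
```

In this made-up case the accuracy equals the null accuracy, so the classifier is doing no better than always guessing the majority class.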

9
Q

What is a type I error?

A

False positive: rejecting a null hypothesis that is actually true.

10
Q

What is a type II error?

A

False negative: failing to reject a null hypothesis that is actually false.

11
Q

How does cross-validation work?

A

Split the data into k folds. Hold one fold out as the test set and train on the remaining k-1 folds. Repeat so that each fold is used as the test set exactly once, then average the k test scores.
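
A minimal sketch of k-fold splitting in plain Python. The names `k_fold_indices`, `cross_validate` and the `fit_and_score` callback are hypothetical, chosen for illustration; the callback stands in for whatever trains a model on the training indices and returns a score on the test indices.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slicing with a step of k spreads the samples across k near-equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_validate(n_samples, k, fit_and_score):
    """Average the test score returned by fit_and_score over the k folds."""
    scores = [fit_and_score(train, test)
              for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

# Each of the 10 samples lands in exactly one test fold.
for train, test in k_fold_indices(10, 5):
    print(sorted(test), "held out;", len(train), "used for training")
```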

12
Q

How to handle missing data?

A

It depends on the circumstances. Why is it missing? Is it missing at random, or is there a reason for it not being there? How much of it is missing? Missingness can itself be useful information (e.g. an internet outage), so don't write off a dataset just because it is full of missing data, which is something I hear a lot. Some algorithms, such as XGBoost, deal with missing values themselves. Otherwise: mean/median imputation for a continuous column, most-frequent-value imputation for a categorical column, or K-NN imputation (computationally expensive).
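
A minimal sketch of the simple imputation strategies, in plain Python. The `impute` function is hypothetical, and using `None` as the missing-value marker is an assumption made for illustration.

```python
import statistics

def impute(values, strategy="median"):
    """Replace None entries using the mean, median or most frequent value."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "most_frequent":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

# Continuous column: the median is less affected by the extreme value 100.
ages = impute([20, None, 30, 100], strategy="median")
# Categorical column: fill with the most frequent category.
colours = impute(["red", None, "blue", "red"], strategy="most_frequent")
print(ages, colours)
```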

13
Q

What are the main types of machine learning?

A

Supervised, unsupervised, semi-supervised and reinforcement learning

14
Q

What is precision vs recall?

A

Precision is TP/(TP+FP): true positives / all predicted positives (the percentage of returned results that are relevant). "What proportion of positive identifications was actually correct?"

Recall is TP/(TP+FN): true positives / all actual positives (the percentage of all relevant results that were returned). "What proportion of actual positives was identified correctly?"
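
The two formulas can be sketched directly from confusion-matrix counts. The counts below are made up for illustration.

```python
def precision(tp, fp):
    """Of everything predicted positive, what fraction was actually positive?"""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everything actually positive, what fraction did we predict positive?"""
    return tp / (tp + fn)

# Made-up counts: 8 true positives, 2 false positives, 4 false negatives.
tp, fp, fn = 8, 2, 4
p = precision(tp, fp)  # 8 out of 10 predicted positives were right
r = recall(tp, fn)     # 8 out of 12 actual positives were found
print(p, r)
```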

15
Q

How is F1 defined?

A

F1 = 2 / (1/precision + 1/recall) = 2 * precision * recall / (precision + recall), i.e. the harmonic mean of precision and recall
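
A minimal sketch of F1 as the harmonic mean of precision and recall. Returning 0 when either input is 0 is a convention assumed here to avoid dividing by zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision == 0 or recall == 0:
        return 0.0  # assumed convention: avoids division by zero
    return 2 / (1 / precision + 1 / recall)

# Made-up values for illustration.
f1 = f1_score(0.8, 0.5)
print(f1)
```

Because it is a harmonic mean, F1 is pulled towards the lower of the two values, so a model cannot score well by maximising only one of them.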

16
Q

How is deep learning different to machine learning?

A

Artificial intelligence is any technique that enables machines to mimic human behavior.
Machine learning is a subset of AI that uses statistical methods to enable machines to improve with experience.

Deep learning is a subset of ML that uses multi-layer neural networks to simulate human-like decision making.

Deep learning does the feature engineering for you. It generally performs poorly with small amounts of data but excels with large amounts of data.

17
Q

What open source datasets have you used?

A

ONS postcode centroids and local authority shape files

18
Q

What is selection bias?

A

Selection bias is error introduced by drawing a non-random sample from the population.

Example: selecting only phone calls and ignoring web traffic etc.

19
Q

How do you go about influencing technical and non-technical audiences?

A

For non-technical audiences, use analogies in terms they understand. Keep in mind why you are telling them something and what they need to get from it. Use storytelling, an example, etc.

20
Q

What is ANOVA?

A

Analysis of variance asks whether the samples come from populations with different means. A one-way ANOVA accounts for one factor (with 2+ levels); a two-way ANOVA investigates two factors (each with 2+ levels) at the same time.

21
Q

How does a one-way ANOVA work?

A

A hypothesis test with a single factor (categorical variable) that compares three or more sample means (for two, use a t-test). The null hypothesis is that there is no difference in means; the alternative hypothesis is that at least one mean differs. Compute the variance within the samples and the variance between the sample means, then form the F statistic as the ratio 'between-group variability' / 'within-group variability'.

  • Does age, sex or income have an effect on whether someone becomes prime minister?
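
The F statistic described above can be sketched in plain Python. The function name and the three small groups are made up for illustration; turning F into a p-value needs the F distribution, which is left out here.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a list of groups (each a list of numbers)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group variability: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group variability: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)      # k - 1 degrees of freedom
    ms_within = ss_within / (n_total - k)  # n - k degrees of freedom
    return ms_between / ms_within

# Made-up samples for three levels of one factor.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(f_stat)
```

A large F means the group means differ by more than the spread within the groups would suggest.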
22
Q

What are the assumptions of ANOVA?

A

The responses for each factor level have a normal population distribution.
These distributions have the same variance.
The data are independent.

23
Q

Explain K-Means

A

We can select K using domain knowledge or empirically. Initialise K points at random positions in the feature space; these are the cluster centroids. Calculate the Euclidean distance from each observation to every centroid and assign the observation to the closest one. Then move each centroid to the mean of its assigned observations, and repeat the two steps until the assignments stop changing.

Inertia is the sum of squared distances from each sample to its cluster centroid; lower values of inertia are better.

To choose K, draw an elbow plot of inertia vs number of clusters.
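
A minimal K-means sketch in plain Python. It assumes the centroids are initialised at K randomly chosen observations (one common choice, not necessarily the one meant above) and alternates assignment and centroid-update steps until the assignments stop changing. The function names and example points are made up for illustration.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, n_iterations=100, seed=0):
    """Return (centroids, labels, inertia) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Assumption: initialise the centroids at k randomly chosen observations.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(n_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    # Inertia: sum of squared distances from each point to its centroid.
    inertia = sum(squared_distance(p, centroids[lab])
                  for p, lab in zip(points, labels))
    return centroids, labels, inertia

# Two well-separated made-up blobs.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels, inertia = k_means(points, k=2)
print(labels, inertia)
```

Running this for several values of `k` and plotting the returned inertia gives the elbow plot.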

24
Q

How do you choose K?

A

Use an elbow plot (K vs total within-cluster sum of squares) and look for the characteristic elbow, or use silhouette analysis.

25
Q

Give examples of where a false negative is more important than a false positive

A

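COVID-19 testing: a false negative tells an infected person they are clear, so they may go on to infect others.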

26
Q

What is logistic regression?

A

A classification model that relates the input variables to the probability of a binary outcome, by passing a linear combination of the inputs through the logistic (sigmoid) function.

27
Q

What is the null hypothesis and how do we state it?

A

In a statistical test, it is the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error.

Stated informally: the observed patterns are due to random chance.

28
Q

What is and how do you deal with heteroskedasticity?

A

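An uneven distribution of errors: the variance of the errors is not constant across the range of the explanatory variables. Common remedies include transforming the response (e.g. taking logs), weighted least squares, or heteroskedasticity-robust standard errors.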

30
Q

What is statistical power and how do you calculate it?

A

Power is the probability of not making a type II error, i.e. power = 1 - beta, where beta is the probability of a type II error.
To increase power:
1. Increase the effect size (the difference between the null and alternative values) to be detected
2. Increase the sample size(s)
3. Decrease the variability in the sample(s)
4. Increase the significance level (alpha) of the test

31
Q

How do you find the correlation between a categorical variable and a continuous variable?

A

You can’t; at least, not if the categorical variable has more than two levels. If it has two levels, you can use point biserial correlation.

But, with a categorical variable that has three or more levels, the notion of correlation breaks down. Correlation is a measure of the linear relationship between two variables. That makes no sense with a categorical variable.

There are ways to measure the relationship between a continuous and a categorical variable; probably the closest to correlation is a log-linear model. Regression imposes a dependent and an independent variable, which correlation does not.

32
Q

What is a p-value?

A

In statistics, the p-value is the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct.

It is not the probability that the null hypothesis is correct, although it is often misread that way.

34
Q

Explain the bias/variance tradeoff

A

High bias leads to underfitting and high variance leads to overfitting. Reducing one tends to increase the other, so the aim is a middle ground that is just right; hence there is a tradeoff.

35
Q

How do you deal with imbalance?

A

Use the right evaluation metrics (not just accuracy). Undersample or oversample the data (the k-fold split must be done before oversampling, otherwise copies of the same artificial points land in both the training and test folds and we overfit to them).

If the minority class is the one you care about, oversample it or undersample the majority class.

Increase the cost of misclassifying the minority class.

36
Q

What is the difference between a box plot and a histogram?

A

Whilst both show the distribution of data, they communicate it differently. Histograms show us the shape of the distribution; box plots show us the quartiles and the Tukey fences and are better for comparing multiple distributions side by side.

37
Q

Compare logistic regression to random forest

A

Random forest doesn't assume a linear relationship.

Logistic regression is more explainable and scales better.

38
Q

Assume you need to generate a predictive model using multiple regression. Explain how you intend to validate this model

A

Adjusted R2: adding more variables never decreases the plain R2 value, so use adjusted R2, which penalises extra variables

Cross Validation

39
Q

When would you use random forests vs SVM, and why?

A

Random forests allow you to determine the feature importance directly. SVMs don't offer this as readily.

Random forests are much quicker and simpler to build than an SVM.

For multi-class classification problems, SVMs require a one-vs-rest method, which is less scalable and more memory intensive.

40
Q

What is the difference between union and union all in SQL?

A

UNION returns only the distinct rows from the combined results; UNION ALL keeps every row, including duplicates.
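
A small sketch of the difference using Python's built-in `sqlite3` module with an in-memory database. The tables `a` and `b` and the city names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (city TEXT);
    CREATE TABLE b (city TEXT);
    INSERT INTO a VALUES ('Leeds'), ('York');
    INSERT INTO b VALUES ('York'), ('Bath');
""")

# UNION removes the duplicate 'York'.
union_rows = conn.execute(
    "SELECT city FROM a UNION SELECT city FROM b ORDER BY city"
).fetchall()

# UNION ALL keeps both copies of 'York'.
union_all_rows = conn.execute(
    "SELECT city FROM a UNION ALL SELECT city FROM b ORDER BY city"
).fetchall()
conn.close()

print(union_rows)
print(union_all_rows)
```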

41
Q

Why is dimension reduction important?

A

1) It reduces storage space
2) Removal of multi-collinearity improves the interpretation of the parameters of the machine learning model
3) It becomes easier to visualize the data when reduced to very low dimensions such as 2D or 3D
4) It avoids the curse of dimensionality

42
Q

Why is Naive Bayes so bad? How would you improve a spam detection algorithm that uses naive Bayes?

A

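It assumes the input features are independent of each other given the class, which rarely holds in practice; in spam detection, words tend to occur together. Example: garden flavoured ice cream. One way to improve such a detector is to use features that capture word combinations (e.g. n-grams) rather than single words.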

43
Q

What are the drawbacks of a linear model?

A

A linear model holds some strong assumptions that may not be true in application. It assumes a linear relationship, multivariate normality, no or little multicollinearity, no auto-correlation, and homoscedasticity

A linear model can’t be used for discrete or binary outcomes.

You can’t vary the model flexibility of a linear model.

44
Q

What is the significance of a Cost/Loss function?

A

It is the function that tells us how badly our model maps X to y; training aims to minimise it.

46
Q

When should you use precision-recall curve over ROC?

A

When the dataset is imbalanced, because a ROC curve can look overly optimistic when negatives vastly outnumber positives.

https://www.quora.com/What-is-the-difference-between-a-ROC-curve-and-a-precision-recall-curve-When-should-I-use-each

47
Q

Explain what resampling methods are and why they are useful. Also explain their limitations

A

Classical parametric statistical tests compare observed statistics to theoretical sampling distributions. Resampling is a data-driven, rather than theory-driven, methodology based on repeated sampling within the same sample.

Resampling refers to methods for doing one of the following:
1) Estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of the available data (jackknifing) or drawing randomly with replacement from a set of data points (bootstrapping)
2) Exchanging labels on data points when performing significance tests (permutation tests, also called exact tests, randomization tests, or re-randomization tests)
3) Validating models by using random subsets (bootstrapping, cross-validation)
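
As an illustration of bootstrapping, here is a minimal sketch of a percentile bootstrap confidence interval in plain Python. The function name `bootstrap_ci`, the default of 2000 resamples and the example data are assumptions made for illustration.

```python
import random
import statistics

def bootstrap_ci(data, statistic=statistics.median, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        statistic(rng.choices(data, k=n))  # draw n points with replacement
        for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled estimates.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Made-up sample.
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 15]
low, high = bootstrap_ci(data)
print(low, high)
```

The interval is built only from the observed sample, which is the data-driven aspect described above; it also means the result can only be as representative as that sample.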

48
Q

Give an example of an unsupervised learning technique for continuous data

A

Dimensionality reduction, e.g. principal component analysis (PCA)

49
Q

How can you deal with outliers?

A

Ignore them, remove them, or apply a log transform. You might want to keep them depending on the business problem, e.g. in cyber security the outliers can be the events of interest.

50
Q

If I repeat a cluster analysis will I get the same result?

A

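Not necessarily. An algorithm such as K-means starts from random centroids and can converge to a local minimum rather than the global one, so repeated runs can give different clusters.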

51
Q

What is the difference between a bar graph and a histogram?

A

A bar graph is for discrete data whereas a histogram is for continuous data

53
Q

How does KNN work?

A

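It classifies a new point by finding the k closest training points (e.g. by Euclidean distance) and taking a majority vote of their labels (or averaging their values for regression). This implicitly creates a decision boundary.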
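
As a rough sketch of how a k-nearest-neighbours classifier makes a prediction, in plain Python: find the k training points closest to the query and take a majority vote of their labels. The function name and the training data are made up for illustration; `math.dist` needs Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbours."""
    # Sort the training points by Euclidean distance to the query.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up training data: class "a" near the origin, class "b" near (5, 5).
train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_points, train_labels, (0.5, 0.5)))
print(knn_predict(train_points, train_labels, (5.5, 5.5)))
```

The decision boundary is never computed explicitly; it is simply where the majority vote flips from one class to the other.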

54
Q

What are the residuals in linear regression?

A

The vertical distances between the observed points and the fitted line (observed value minus predicted value)

55
Q

What is a cost function?

A

A measure of how badly our model maps x to y

56
Q

What is the difference between long and wide data?

A

Long data has one row per observation of each variable, with one column naming the variable and another holding its value. Wide data has one column per feature.

57
Q

What are parametric vs non-parametric models?

A

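A parametric model is an ML model that captures all the information about its predictions in a finite, fixed number of parameters. A non-parametric model does not fix the number of parameters in advance; its complexity can grow with the amount of training data (e.g. KNN, decision trees).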

58
Q

What is meant by the term confidence interval?

A

Range of plausible values for an unknown parameter

59
Q

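Draw a graph of precision vs accuracy. Which is more precise, a 95% confidence interval or a 99% one?

A

95%. A 95% interval is narrower and therefore more precise; a 99% interval is wider, so it is more likely to contain the true value but is less precise.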

60
Q

How do you calculate a z-score? (Use for questions comparing two results with different means and SDs.)

A

z = (x - mu) / sigma, where x is the value, mu is the mean and sigma is the standard deviation
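
A small worked example of using z-scores to compare two results that come from distributions with different means and standard deviations. The exam scores are made up for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd

# Hypothetical exam results with different means and standard deviations.
z_maths = z_score(70, mean=60, sd=5)
z_physics = z_score(80, mean=75, sd=10)
print(z_maths, z_physics)
```

Although the physics mark is higher in raw terms, the maths mark is further above its own mean once the spread is taken into account.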

61
Q

What is a standard deviation?

A

A measure of dispersion: the square root of the variance, i.e. roughly the typical distance of values from the mean

62
Q

How does hierarchical clustering work?

A

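Agglomerative hierarchical clustering starts with every observation as its own cluster, then repeatedly merges the two nearest clusters until a single cluster (or the desired number of clusters) remains.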

65
Q

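How would you use a chi-square test for feature selection in machine learning?

A

The chi-square test checks the independence of two categorical variables by comparing the observed proportions in each category with those expected under independence.

For feature selection, test each categorical feature against the target and keep the features for which independence is rejected.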

66
Q

Why are outliers a problem?

A

They inflate the variance of the data, and therefore the standard error of estimates, and can distort the fitted model.

67
Q

How does CNN work?

A

This will be your ‘favourite’

68
Q

How does scaling work?

A

Juice analogy. Normalization bounds values to a fixed range, e.g. 0-1; standardisation rescales values to zero mean and a variance of 1.

Feature scaling is essential for machine learning algorithms that calculate distances between data points:

KNN
K-means
Principal component analysis

Whereas random forest (rules) and naive Bayes (weights) are unaffected by scaling.

Feature scaling also speeds up gradient descent.
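
A minimal sketch of the two kinds of scaling in plain Python. The function names and example values are made up for illustration, and the population standard deviation is an assumed choice.

```python
import statistics

def min_max_normalise(values):
    """Rescale values to the range 0-1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardise(values):
    """Rescale values to zero mean and unit variance."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumed choice)
    return [(v - mean) / sd for v in values]

values = [10, 20, 30, 40, 50]
normalised = min_max_normalise(values)
standardised = standardise(values)
print(normalised)
print(standardised)
```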

69
Q

How do one-sample and two-sample t-tests differ?

A

One-sample: is the mean of the sample different from a given value?

Two-sample: is the mean of one sample different from the mean of the other sample?

71
Q

Fisher's exact test vs chi-square test

A

The chi-square test assumes a large sample size (its p-value is an approximation).
Fisher's exact test computes an exact p-value, so it suits small samples.

72
Q

What is a confounding variable?

A

A confounding variable, also called a confounder or confounding factor, is a third variable in a study examining a potential cause-and-effect relationship that influences both the supposed cause and the supposed effect.

73
Q

How do you deal with confounding?

A

Blocking: make sure equal proportions of the confounding variable are in the treatment and control groups.

75
Q

What is statistical significance vs effect size?

A

Statistical significance is how confident we are that an effect is real rather than due to chance. The effect size is how much difference that effect makes. You can quantify effect size using Cohen's d.

77
Q

What values can power take?

A

Between 0 and 1: at 0 the test will never detect a true effect, at 1 it always will. As power increases, the probability of a type II error decreases.

78
Q

What is a q-q plot?

A

plotting the quantiles of a variable against theoretical quantiles of a normal distribution will give a straight line if the variable is normally distributed

79
Q

What is the problem of missing data in ML?

A

It tends to introduce bias - skewing results and reducing accuracy

80
Q

Compare ridge and lasso regression

A

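Both add a penalty on the size of the coefficients to the least-squares cost. Ridge uses an L2 penalty (sum of squared coefficients), which shrinks coefficients towards zero without eliminating them. Lasso uses an L1 penalty (sum of absolute coefficients), which can set some coefficients exactly to zero and so performs feature selection. See the scikit-learn series.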