Client Interview Flashcards

1
Q

Why do clients want to work with Kubrick consultants?

A
  • Excellent attitudes
  • Critical thinkers
  • Showcase commercial value & insight
  • Excellent and intelligent questioners
  • Trained well, and are proud of it!
2
Q

Tell me a bit about yourself.

A
  • Education
  • Experience
  • Interests
  • Why data?
3
Q

Why are you interested in working with data?

A

SPECIFIC TO MICHAEL:
I have always been fascinated by data.
Because data is central to Physics.
Because it fits my interests AND abilities

4
Q

What are the Kubrick Core values?

A

Innovation over complacency

Adaptability over inflexibility

Collaboration over isolation

Diversity over homogeneity

Evidence over bias

5
Q

Tell me a bit about the machine learning engineer role.

A

A machine learning engineer applies data science algorithms to solve real-world challenges like price prediction modelling, image classification, and sentiment analysis.

You have to have knowledge not just of the implementation of ML algorithms, but their underlying workings, so that you can mould them to the problem better.

You also have to be a combination of a data scientist, a data engineer, and a cloud engineer.

6
Q

How can you see which features impact the model? (Relate this to p-values.)

A

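Many metrics can be used to evaluate a machine learning model and the contribution of each feature.

For linear and logistic regression, each coefficient has a p-value, which tests the null hypothesis that the coefficient is zero. A small p-value suggests that the feature has a real effect on the response.

Tree models also have a metric called importance.
This is the sum of the improvements to the loss function made by the splits on a single feature.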

You may also perform feature selection:

  • Coefficient comparison
  • Correlation comparison
  • Best subset selection
  • Forward/backward subset selection
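
Correlation comparison, from the list above, can be sketched in a few lines of plain Python. This is a minimal illustration and not a full feature-selection procedure: the function names and the toy housing data are assumptions made up for the example, and correlation only captures linear relationships.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target):
    """Return (name, correlation) pairs, strongest absolute correlation first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: price is exactly 3 * floor_area, door_number is noise.
features = {
    "floor_area": [50, 60, 80, 100, 120],
    "door_number": [7, 2, 9, 4, 5],
}
price = [150, 180, 240, 300, 360]

ranking = rank_features(features, price)
for name, score in ranking:
    print(f"{name}: {score:+.3f}")
```

Features near the bottom of the ranking are candidates for removal, although a weakly correlated feature can still matter in combination with others.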
7
Q

How would you containerize your model?

A

Containerisation involves encapsulating or packaging up software code and all its dependencies so that it can run uniformly and consistently on any infrastructure.

You might have to list all dependencies and packages which were used, as well as the version of Python used at the time.
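
As a rough sketch of that listing step, the standard library can report the interpreter version and the installed packages in a pinned `name==version` form, which a container build could then install from. The function names here are hypothetical, and in practice a tool such as `pip freeze` does a similar job.

```python
import sys
from importlib import metadata


def python_version():
    """Return the running interpreter version as 'major.minor.micro'."""
    return "{}.{}.{}".format(*sys.version_info[:3])


def pinned_requirements():
    """Return sorted 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:
            continue  # skip distributions with unreadable metadata
        name = meta.get("Name")
        version = meta.get("Version")
        if name and version:
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    print(f"# Python {python_version()}")
    print("\n".join(pinned_requirements()))
```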

8
Q

Talk me through your projects.

A

SPECIFIC TO MICHAEL:

At Uni:
Dissertation: Investigating the effect of dusting thin films on the magnetisation of the material, with respect to the material thickness and composition with the goal that the magnetisation could be tuned in three dimensions.

For Anthropos:
Fall Risk Index: Ground-up research and subsequent development of an index which could determine the probability that an older person would have a fall. The data came from sensors placed around the home. The index had to be developed without any assistance. In the end, the project was reasonably successful, but is still in the early stages of development.

For Kubrick:
EDGAR: Automatic sourcing and subsequent sentiment analysis of annual 10-K reports so that the relationship between broadly negative sentiment and stock price could be investigated.

9
Q

Why join Kubrick and why a career in Machine Learning? What have you enjoyed most during the training?

A

SPECIFIC TO MICHAEL:
Why join Kubrick:
-Because I am interested in becoming a machine learning engineer, and the training interested me significantly
-Because the company is incredibly successful and is only increasing in size and scope

Why machine learning:

  • I find it utterly fascinating
  • I want a career in something I will enjoy
  • The industry has so much potential

What have I enjoyed the most:

  • Gaining a deep understanding of the machine learning algorithms; I find it incredibly satisfying to understand a complex function completely
  • The quality of the teaching in general
10
Q

Explain what a normal distribution is and the parameters associated with it.

A

A normal distribution is a bell-shaped probability distribution that is symmetric about the mean.

The parameters are:
Mean: the centre of the curve and the expected value of the data
Standard deviation: the breadth of the curve, i.e. how widely the data spreads around the mean
Variance: the square of the standard deviation
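
These parameters can be explored with `statistics.NormalDist` from the Python standard library. The mean of 170 and standard deviation of 10 below are made-up values (think of heights in cm), chosen only to illustrate the symmetry and the share of data within one and two standard deviations.

```python
from statistics import NormalDist

# Hypothetical example: mean 170, standard deviation 10.
dist = NormalDist(mu=170.0, sigma=10.0)

variance = dist.variance  # the square of the standard deviation

# Symmetry about the mean: equal density 10 units below and above it.
density_below = dist.pdf(160.0)
density_above = dist.pdf(180.0)

# Share of the distribution within one and two standard deviations of the mean.
within_one_sd = dist.cdf(dist.mean + dist.stdev) - dist.cdf(dist.mean - dist.stdev)
within_two_sd = dist.cdf(dist.mean + 2 * dist.stdev) - dist.cdf(dist.mean - 2 * dist.stdev)

print(f"variance = {variance}")
print(f"within 1 sd = {within_one_sd:.4f}, within 2 sd = {within_two_sd:.4f}")
```

Roughly 68% of the distribution lies within one standard deviation of the mean and roughly 95% within two, whatever the values of the mean and standard deviation.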

11
Q

How would you take a Jupyter notebook and then deploy the code?

A

The notebook can be run on the client side, but code in that form is not easily distributed.

The code should be rewritten as scripts or modules, perhaps in VS Code, and pushed to a repository on GitHub.

It can then be deployed on a platform like Microsoft Azure, after passing the appropriate pipelines and/or unit tests.
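
One possible shape for that step, sketched under assumptions: logic that lived in a notebook cell is moved into a plain function inside a module, with a unit test that a pipeline can run before deployment. The function `scale_to_unit_range` and its test are hypothetical examples, not part of any particular project.

```python
def scale_to_unit_range(values):
    """Min-max scale a list of numbers to the range [0, 1].

    Stands in for logic that might previously have lived in a notebook cell.
    """
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


def test_scale_to_unit_range():
    """Unit test that a CI pipeline could run before deployment."""
    assert scale_to_unit_range([10, 20, 30]) == [0.0, 0.5, 1.0]
    assert scale_to_unit_range([5, 5]) == [0.0, 0.0]


if __name__ == "__main__":
    test_scale_to_unit_range()
    print("all tests passed")
```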

12
Q

How does a Python dictionary work in the backend?

A

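A dictionary is a mutable collection of key-value pairs, implemented as a hash table. Since Python 3.7 it preserves insertion order.

Each key is hashed, and the hash value is used to locate the element in the data structure, so lookups do not need to scan every pair. Keys must therefore be hashable, which in practice means immutable types such as strings, numbers, and tuples.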
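
The idea of using a hash value to find an element can be illustrated with a toy hash map. This is a teaching sketch only: the class `ToyHashMap` is hypothetical and resolves collisions with per-bucket lists, whereas CPython's real `dict` uses open addressing and resizes as it fills.

```python
class ToyHashMap:
    """A minimal hash map using separate chaining (a list per bucket)."""

    def __init__(self, n_buckets=8):
        self._buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # The key's hash picks the bucket, so only that bucket is searched.
        return self._buckets[hash(key) % len(self._buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)


toy = ToyHashMap()
toy.set("model", "logistic")
toy.set("model", "tree")      # same key: the value is replaced
toy.set(("row", 1), 0.75)     # tuples are hashable, so they can be keys
print(toy.get("model"), toy.get(("row", 1)))
```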

13
Q

What is the difference between a list and a tuple?

A

A list is mutable, which means that its values can be changed after its creation. A tuple is immutable: once created, its elements cannot be reassigned.

In the backend, assigning a list to a second name does not copy it: both names point to the same object in memory. If the list is changed through one name, the change is visible through every reference to it.
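
A short sketch of both behaviours, with variable names chosen only for illustration:

```python
original = [1, 2, 3]
alias = original              # a second name for the same list object
alias.append(4)               # the change is visible through both names

independent = original.copy()  # an explicit copy is a separate object
independent.append(5)          # does not affect the original list

point = (1, 2)
try:
    point[0] = 99              # tuples do not support item assignment
except TypeError:
    tuple_is_immutable = True
else:
    tuple_is_immutable = False

print(original, independent, point, tuple_is_immutable)
```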

14
Q

Give some examples of how you would check if data was fit for modelling?

A

Quantity - is there enough?
Quality:
-Is there a bias in the data?
-Is it representative?
-Will it suffer from covariate drift or non-stationarity?
-Is it relevant to the problem being solved?

15
Q

What is Logistic Regression?

A

A way to perform classification into a binary category.

A linear function of the input features, with one weight per feature plus an intercept, is placed inside a sigmoid function, which converts its output to a probability between 0 and 1. The weights and intercept are optimised during training.

A cutoff point on the probability is used to define the classification.
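
The prediction step can be sketched as follows. This covers only prediction, not the training that finds the weights; the weights, intercept, and inputs are made-up values and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, intercept):
    """Probability of the positive class: sigmoid of the linear function."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def classify(features, weights, intercept, cutoff=0.5):
    """Return class 1 if the probability reaches the cutoff, else class 0."""
    return 1 if predict_proba(features, weights, intercept) >= cutoff else 0


# Hypothetical fitted parameters for a two-feature model.
weights = [0.8, -0.4]
intercept = -1.0

print(predict_proba([5.0, 1.0], weights, intercept))
print(classify([5.0, 1.0], weights, intercept))
print(classify([0.0, 2.0], weights, intercept))
```

Raising the cutoff makes the model more conservative about predicting the positive class, which trades recall for precision.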

16
Q

How did you get a First in your degree?

A

SPECIFIC TO MICHAEL:

  • Natural ability/interest
  • Hard work/long hours
  • Unearned Societal Advantage
17
Q

What do you think of the tech stack we’re using?

A

A tech stack is the combination of technologies a company uses to build and run an application or project.

A stack is appropriate for a company if it is:
Sufficient - i.e. has everything you need
Efficient - not too large or too small
Cost efficient - not too expensive

18
Q

We want to find leaks of methane gas in a gas processing facility. We are investing in a network of sensors to record methane concentrations around the facility. We want to use this data to automatically detect and locate leaks.

How would you go about tackling this problem?

A