Client Interview Flashcards

Question 1

Q

Why do clients want to work with Kubrick consultants?

Answer

A

Excellent attitudes
Critical thinkers
Showcase commercial value & insight
Excellent and intelligent questioners
Trained well, and are proud of it!

Question 2

Q

Tell me a bit about yourself.

Answer

A

Education
Experience
Interests
Why data?

Question 3

Q

Why are you interested in working with data?

Answer

A

SPECIFIC TO MICHAEL:
I have always been fascinated by data.
Because data is central to Physics.
Because it fits my interests AND abilities

Question 4

Q

What are the Kubrick Core values?

Answer

A

Innovation over complacency

Adaptability over inflexibility

Collaboration over isolation

Diversity over homogeneity

Evidence over bias

Question 5

Q

Tell me a bit about the machine learning engineer role.

Answer

A

A machine learning engineer applies data science algorithms to solve real-world challenges like price prediction modelling, image classification, and sentiment analysis.

You need to know not just how to implement ML algorithms but also how they work underneath, so that you can adapt them to the problem at hand.

You also have to be a combination of a data scientist, a data engineer, and a cloud engineer.

Question 6

Q

How can you see which features impact the model? (Relate this to p-values.)

Answer

A

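Many metrics can be used to judge how much each feature contributes to a machine learning model.

In a linear model, each coefficient has a p-value that tests the null hypothesis that the coefficient is zero. A small p-value suggests that the feature has a statistically significant relationship with the target.

Tree models also provide a metric called feature importance.
It sums, over every split that uses a given feature, the improvement to the loss function for which that split is responsible.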

You may also perform feature selection:

Coefficient comparison
Correlation comparison
Best subset selection
Forward/backward subset selection
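
As a minimal sketch of the correlation comparison option, the snippet below ranks features by the absolute value of their Pearson correlation with the target. The feature names and numbers are made up purely for illustration, and the approach only captures linear, one-feature-at-a-time relationships.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: three candidate features and one target.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "size": [2.0, 4.1, 5.9, 8.2, 10.0],
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
    "age": [5.0, 4.2, 2.9, 2.1, 0.8],
}

# Strongest linear relationship (positive or negative) first.
ranked = sorted(
    ((name, pearson(values, target)) for name, values in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

A feature with a correlation near zero, such as "noise" here, is a candidate for removal, although it may still matter in combination with other features.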

Question 7

Q

How would you containerize your model?

Answer

A

Containerisation involves encapsulating or packaging up software code and all its dependencies so that it can run uniformly and consistently on any infrastructure.

In practice, this means listing every dependency and package the model uses, with pinned versions, as well as the version of Python used at the time.
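
One way to capture that list is sketched below, assuming the model runs in the current Python environment. The function name `describe_environment` is hypothetical; it records the Python version and a pinned `name==version` line for each installed package, which is the information a container image definition would need in order to reproduce the environment.

```python
import platform
from importlib import metadata


def describe_environment():
    """Return the Python version and pinned 'name==version' lines."""
    python_version = platform.python_version()
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip distributions with broken metadata
            pins.add(f"{name}=={version}")
    return python_version, sorted(pins, key=str.lower)


python_version, pins = describe_environment()
print(f"Python {python_version}")
for pin in pins:
    print(pin)
```

This lists everything installed, not only what the model imports, so the output would normally be trimmed before use.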

Question 8

Q

Talk me through your projects.

Answer

A

SPECIFIC TO MICHAEL:

At Uni:
Dissertation: Investigating the effect of dusting thin films on the magnetisation of the material, with respect to the material thickness and composition with the goal that the magnetisation could be tuned in three dimensions.

For Anthropos:
Fall Risk Index: Ground-up research and subsequent development of an index which could determine the probability that an older person would have a fall. The data came from sensors placed around the home. The index had to be developed without any assistance. In the end, the project was reasonably successful, but is still in the early stages of development.

For Kubrick:
EDGAR: Automatic sourcing and subsequent sentiment analysis of annual 10-K reports, so that the relationship between broadly negative sentiment and stock price could be investigated.

Question 9

Q

Why join Kubrick and why a career in Machine Learning? What have you enjoyed most during the training?

Answer

A

SPECIFIC TO MICHAEL:
Why join Kubrick:
-Because I am interested in becoming a machine learning engineer, and the training interested me significantly
-Because the company is incredibly successful and is only increasing in size and scope

Why machine learning:

I find it utterly fascinating
I want a career in something I will enjoy
The industry has so much potential

What have I enjoyed the most:

Gaining a deep understanding of the machine learning algorithms. I find it incredibly satisfying to understand a complex function completely.
The quality of the teaching in general

Question 10

Q

Explain what a normal distribution is and the parameters associated with it.

Answer

A

A normal distribution is a bell-shaped probability distribution that is symmetric about the mean.

The main parameters are:
Mean: the centre of the curve and the expected value of the data
Standard deviation: the width of the curve, i.e. how far the data typically spread from the mean
Variance: the square of the standard deviation (an alternative way of stating the spread, not an independent third parameter)
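
A short sketch using `statistics.NormalDist` from the Python standard library shows how these parameters relate. The mean of 170 and standard deviation of 10 are arbitrary example values.

```python
from statistics import NormalDist

# Arbitrary example parameters: mean 170, standard deviation 10.
dist = NormalDist(mu=170.0, sigma=10.0)

# The variance is the square of the standard deviation.
variance = dist.variance

# Symmetry about the mean: equal density at equal distances either side.
density_below = dist.pdf(dist.mean - 10.0)
density_above = dist.pdf(dist.mean + 10.0)

# Probability mass within one standard deviation of the mean.
within_one_sigma = dist.cdf(dist.mean + dist.stdev) - dist.cdf(dist.mean - dist.stdev)

print(variance, density_below, density_above, round(within_one_sigma, 4))
```

Roughly 68% of the probability mass lies within one standard deviation of the mean, whatever the values of the two parameters.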

Question 11

Q

How would you take a Jupyter notebook and then deploy the code?

Answer

A

The code can be run client-side in the notebook, but it is not easily distributed in that form.

The code should be moved into scripts or modules, perhaps written in VS Code, and pushed to a repository on GitHub.

It can then be deployed on a platform like Microsoft Azure, once it has passed the appropriate pipelines and/or unit tests.
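
As a rough illustration of the first steps, the sketch below shows notebook logic moved into a plain function with a unit test that a pipeline could run before deployment. The function `clean_prices` and its behaviour are hypothetical stand-ins for whatever the notebook actually computes.

```python
import unittest


def clean_prices(raw_prices):
    """Drop missing or negative prices and round the rest to 2 decimal places."""
    return [round(p, 2) for p in raw_prices if p is not None and p >= 0]


class CleanPricesTest(unittest.TestCase):
    """A test a CI pipeline could run on every push to the repository."""

    def test_drops_invalid_values(self):
        self.assertEqual(clean_prices([1.234, None, -5.0, 2.0]), [1.23, 2.0])

    def test_empty_input(self):
        self.assertEqual(clean_prices([]), [])
```

Keeping the logic in importable functions, rather than notebook cells, is what makes it testable and deployable in this way.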

Question 12

Q

How does a Python dictionary work in the backend?

Answer

A

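A dictionary is a mutable collection of key-value pairs, implemented as a hash table. Keys must be hashable (in practice, immutable), while values can be any object. Since Python 3.7, dictionaries preserve insertion order.

Each key is passed through a hash function, and the resulting hash value determines where the entry is stored, so elements can be found without scanning the whole structure.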
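
The hash-based lookup can be illustrated with a toy hash map. This is a simplified sketch, not how CPython implements `dict`: it resolves collisions with per-bucket lists (separate chaining) and never resizes, whereas the real implementation uses open addressing and grows as it fills. The class name `ToyHashMap` is made up for the example.

```python
class ToyHashMap:
    """Minimal hash map: fixed number of buckets, collisions kept in lists."""

    def __init__(self, n_buckets=8):
        self._buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # hash() raises TypeError for unhashable keys such as lists.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)


table = ToyHashMap()
table.put("mean", 0.0)
table.put("stdev", 1.0)
table.put("mean", 5.0)  # replaces the earlier value
print(table.get("mean"), table.get("stdev"))
```

Only one bucket is searched per lookup, which is why dictionary access stays fast on average as the number of entries grows.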

Question 13

Q

What is the difference between a list and a tuple?

Answer

A

A list is mutable, which means that it is possible to change its values after its creation. A tuple is immutable: once created, its elements cannot be reassigned.

In the backend, assigning a list to a second name does not copy it: both names point to the same object in memory. If the list is changed through one name, the change is visible through every reference to it.
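
A small sketch of this behaviour, using arbitrary example values:

```python
original = [1, 2, 3]
alias = original           # no copy: both names refer to the same list
alias.append(4)            # the change is visible through 'original' too

independent = list(original)  # an explicit (shallow) copy is a new object
independent.append(5)         # does not affect 'original'

point = (1, 2, 3)
try:
    point[0] = 99          # tuples do not support item assignment
except TypeError as err:
    tuple_error = str(err)

print(original, independent, tuple_error)
```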

Question 14

Q

Give some examples of how you would check whether data is fit for modelling.

Answer

A

Quantity - is there enough?
Quality:
-Is there a bias in the data?
-Is it representative?
-Will it suffer from covariate shift or non-stationarity?
-Is it relevant to the problem being solved?
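
A few of these checks can be started with simple summaries. The sketch below, with a hypothetical `summarise_dataset` helper and made-up rows, reports the number of rows (quantity), the fraction of missing values, and the share of each class label (one possible sign of bias). It does not address representativeness, drift, or relevance, which need domain knowledge and comparison against other data.

```python
from collections import Counter


def summarise_dataset(rows, label_key):
    """Basic quantity and quality summary for a list of dict records."""
    n_rows = len(rows)
    n_cells = sum(len(row) for row in rows)
    n_missing = sum(value is None for row in rows for value in row.values())
    label_counts = Counter(row[label_key] for row in rows)
    return {
        "n_rows": n_rows,
        "missing_fraction": n_missing / n_cells if n_cells else 0.0,
        "class_share": {label: count / n_rows for label, count in label_counts.items()},
    }


# Made-up records: one feature column and a binary label.
rows = [
    {"feature": 0.7, "label": 0},
    {"feature": None, "label": 0},
    {"feature": 1.4, "label": 1},
    {"feature": 0.9, "label": 0},
]

summary = summarise_dataset(rows, "label")
print(summary)
```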

Question 15

Q

What is Logistic Regression?

Answer

A

A way to perform classification into a binary category.

A linear function of the input features, with one coefficient (gradient) per feature plus an intercept, is placed inside a sigmoid function so as to convert it to a probability. The coefficients and intercept are optimised during training.

A cutoff point is used to define the classification.
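
The prediction step can be sketched in a few lines. The weights, intercept, and inputs below are arbitrary illustrative values rather than a fitted model, and the training step that would optimise them is left out.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, intercept):
    """Probability of the positive class for one observation."""
    linear_part = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(linear_part)


def classify(features, weights, intercept, cutoff=0.5):
    """Turn the probability into a 0/1 label using the cutoff point."""
    return int(predict_proba(features, weights, intercept) >= cutoff)


# Illustrative (not fitted) parameters for a two-feature model.
weights = [1.0, -2.0]
intercept = 0.0

probability = predict_proba([3.0, 1.0], weights, intercept)
print(round(probability, 3), classify([3.0, 1.0], weights, intercept))
```

Raising the cutoff makes the model more conservative about predicting the positive class, so the same observation can receive a different label.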

Question 16

Q

How did you get a First in your degree?

Answer

A

SPECIFIC TO MICHAEL:

Natural ability/interest
Hard work/long hours
Unearned societal advantage

Question 17

Q

What do you think of the tech stack we’re using?

Answer

A

A tech stack is the combination of technologies a company uses to build and run an application or project.

It would be appropriate for a company if the stack is:
Sufficient - i.e. has everything you need
Efficient - not too large or too small
Cost-efficient - not too expensive

Question 18

Q

We want to find leaks of methane gas in a gas processing facility. We are investing in a network of sensors to record methane concentrations around the facility. We want to use this data to automatically detect and locate leaks.

How would you go about tackling this problem?

Answer

A

(18 cards)