Client Interview Flashcards
Why do clients want to work with Kubrick consultants?
- Excellent attitudes
- Critical thinkers
- Showcase commercial value & insight
- Excellent and intelligent questioners
- Trained well, and are proud of it!
Tell me a bit about yourself.
- Education
- Experience
- Interests
- Why data?
Why are you interested in working with data?
SPECIFIC TO MICHAEL:
I have always been fascinated by data.
Because data is central to physics.
Because it fits both my interests AND my abilities.
What are the Kubrick Core values?
Innovation over complacency
Adaptability over inflexibility
Collaboration over isolation
Diversity over homogeneity
Evidence over bias
Tell me a bit about the machine learning engineer role.
A machine learning engineer applies data science algorithms to solve real-world challenges like price prediction modelling, image classification, and sentiment analysis.
You need to understand how ML algorithms work underneath, not just how to implement them, so that you can adapt them to fit the problem better.
The role also combines the skills of a data scientist, a data engineer, and a cloud engineer.
How can you see which features impact the model? (Relate this to p-values.)
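Metrics such as R² or accuracy evaluate the model as a whole, so judging individual features needs feature-level measures. In linear and logistic regression, each coefficient comes with a p-value. This is the probability of getting an estimate at least that far from zero if the feature truly had no effect. A small p-value (conventionally below 0.05) suggests the feature has a statistically significant relationship with the target, given the other features in the model.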
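For a linear model, p-values can be computed directly from an ordinary least squares fit. This is a minimal sketch, assuming NumPy and SciPy are available and using made-up synthetic data. The helper name `ols_p_values` is hypothetical. In practice, a library such as statsmodels reports these values for you.

```python
import numpy as np
from scipy import stats


def ols_p_values(X, y):
    """Fit ordinary least squares and return (coefficients, p-values).

    Hypothetical helper for illustration. The first entry of each array is
    the intercept; the rest follow the column order of X.
    """
    n = X.shape[0]
    X1 = np.column_stack([np.ones(n), X])  # add an intercept column
    k = X1.shape[1]
    XtX_inv = np.linalg.inv(X1.T @ X1)
    beta = XtX_inv @ X1.T @ y  # least-squares coefficients
    residuals = y - X1 @ beta
    sigma2 = residuals @ residuals / (n - k)  # unbiased residual variance
    se = np.sqrt(sigma2 * np.diag(XtX_inv))  # standard error of each coefficient
    t_stats = beta / se
    # Two-sided test of H0: coefficient == 0, using a t distribution with n - k dof
    p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - k)
    return beta, p_values


# Synthetic example (assumed data): only the first feature drives y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
beta, p_values = ols_p_values(X, y)
print(beta, p_values)
```

Here the first feature's p-value should be tiny. The p-value for the unrelated second feature will usually be large.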
Tree-based models also provide a feature importance score.
It sums the improvement in the splitting criterion (for example, the reduction in impurity or loss) over every split that uses a given feature. The scores are usually normalised so that they add up to 1.
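scikit-learn exposes these impurity-based scores as the `feature_importances_` attribute of its tree ensembles. This is a minimal sketch, assuming scikit-learn is installed. The synthetic data and feature names are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumed): y depends strongly on feature 0, weakly on feature 1,
# and not at all on feature 2
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances: one score per feature, normalised to sum to 1
importances = model.feature_importances_
for name, score in zip(["x0", "x1", "x2"], importances):
    print(f"{name}: {score:.3f}")
```

Impurity-based importances can favour features with many distinct values. A common cross-check is scikit-learn's `permutation_importance` (in `sklearn.inspection`).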
You may also perform feature selection:
- Coefficient comparison
- Correlation comparison
- Best subset selection
- Forward/backward stepwise selection
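Forward and backward stepwise selection are available in scikit-learn as `SequentialFeatureSelector`. This is a minimal sketch, assuming scikit-learn is installed. The data is synthetic, with only features 0 and 2 informative, and choosing two features is an illustrative choice.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed): only features 0 and 2 are informative
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

# Forward selection: start with no features and greedily add the one that
# most improves cross-validated R^2, stopping at n_features_to_select
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="forward", cv=5
)
selector.fit(X, y)
selected = selector.get_support(indices=True)
print(selected)
```

Setting `direction="backward"` reverses the process: it starts from all features and removes the least useful one at each step.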
How would you containerise your model?
Containerisation involves encapsulating or packaging up software code and all its dependencies so that it can run uniformly and consistently on any infrastructure.
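You list every package the model depends on, pinned to the versions you used (for example in a requirements.txt file), along with the version of Python used at the time.
A Dockerfile then starts from a Python base image, installs those dependencies, copies in the model and its serving code, and defines the command that runs it. The built image runs the same way on any machine with Docker.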
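Pinning exact versions can be automated. This minimal sketch uses only the standard library. It records the installed version of each named package and the running Python version, then renders a basic Dockerfile. The package list, the `predict.py` entry point, and the helper names are hypothetical. The `python:<version>-slim` base images are official Docker Hub images, but check which tag suits your project.

```python
import sys
from importlib import metadata


def pin_requirements(packages):
    """Return requirements.txt lines of the form 'name==version'.

    Packages that are not installed are emitted as comments so the gap is
    visible instead of guessing a version.
    """
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {name}: not installed in this environment")
    return lines


def current_python_version():
    """Major.minor version of the running interpreter, e.g. '3.11'."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


def render_dockerfile(python_version, entrypoint):
    """Build a minimal Dockerfile (hypothetical layout: code copied into /app)."""
    return "\n".join([
        f"FROM python:{python_version}-slim",
        "WORKDIR /app",
        "COPY requirements.txt .",
        "RUN pip install --no-cache-dir -r requirements.txt",
        "COPY . .",
        f'CMD ["python", "{entrypoint}"]',
    ]) + "\n"


if __name__ == "__main__":
    # Hypothetical dependency list for a scikit-learn model
    print("\n".join(pin_requirements(["numpy", "pandas", "scikit-learn"])))
    print(render_dockerfile(current_python_version(), "predict.py"))
```

Save these two outputs as requirements.txt and Dockerfile. Running `docker build -t my-model .` in that folder then builds the image (the tag `my-model` is hypothetical).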
Talk me through your projects.
SPECIFIC TO MICHAEL:
At Uni:
Dissertation: Investigating the effect of dusting thin films on the magnetisation of a material with respect to its thickness and composition, with the goal of tuning the magnetisation in three dimensions.
For Anthropos:
Fall Risk Index: Ground-up research and subsequent development of an index which could determine the probability that an older person would have a fall. The data came from sensors placed around the home. The index had to be developed without any assistance. In the end, the project was reasonably successful, but is still in the early stages of development.
For Kubrick:
EDGAR: Automatic sourcing and subsequent sentiment analysis of annual 10-K reports, so that the relationship between broadly negative sentiment and stock price could be investigated.
Why join Kubrick and why a career in Machine Learning? What have you enjoyed most during the training?
SPECIFIC TO MICHAEL:
Why join Kubrick:
- Because I am interested in becoming a machine learning engineer, and the training programme strongly appealed to me
- Because the company is incredibly successful and continues to grow in size and scope
Why machine learning:
- I find it utterly fascinating
- I want a career in something I will enjoy
- The industry has so much potential
What have I enjoyed the most:
- Gaining a deep understanding of machine learning algorithms: I find it incredibly satisfying to understand a complex function completely
- The quality of the teaching in general
Explain what a normal distribution is and the parameters associated with it.
A normal distribution is a bell-shaped probability distribution that is symmetric about the mean.
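It is fully described by two parameters, the mean and the standard deviation (the variance follows from the latter):
Mean (μ): the centre of the curve and the expected value of the distribution; for a normal distribution it also equals the median and the mode
Standard deviation (σ): the spread of the curve, i.e. how far values typically lie from the mean; a larger σ gives a wider, flatter curve. About 68% of values lie within one standard deviation of the mean, 95% within two, and 99.7% within three.
Variance (σ²): the square of the standard deviation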
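Python's standard library provides `statistics.NormalDist`, which makes these parameters easy to check. This is a minimal sketch. The parameters μ = 5 and σ = 2 are arbitrary example values. The density is also written out from its formula, f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)), for comparison.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density written out from the formula, for comparison."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


dist = NormalDist(mu=5.0, sigma=2.0)  # assumed example parameters

print(dist.mean, dist.stdev, dist.variance)  # variance is sigma squared

# Symmetry about the mean: equal density one unit either side
print(dist.pdf(6.0), dist.pdf(4.0), normal_pdf(6.0, 5.0, 2.0))

# Roughly 68% / 95% / 99.7% of values lie within 1 / 2 / 3 standard deviations
for k in (1, 2, 3):
    share = dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)
    print(f"within {k} sd: {share:.3f}")
```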
How would you take a Jupyter notebook and then deploy the code?
Running the notebook locally is fine for exploration, but it is hard to share, version, or run in production.
The code should be refactored into plain Python modules (written in an editor such as VS Code) and pushed to a repository on GitHub.
It can then be deployed to a cloud platform such as Microsoft Azure once it has passed the appropriate CI/CD pipelines and unit tests.
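The first step, moving code out of the notebook, can be partly automated: Jupyter's own `jupyter nbconvert --to script` command does this. To show what it involves, here is a minimal standard-library sketch. It reads the notebook JSON and keeps only the code cells. Notebook format 4 stores cells in a top-level `cells` list, each cell having a `cell_type` and a `source`. The function names are hypothetical, and the handling of IPython magics is deliberately simplistic.

```python
import json


def notebook_to_script(nb):
    """Concatenate the code cells of an nbformat-4 notebook dict into one script.

    IPython magics ('%...') and shell escapes ('!...') are not valid Python,
    so they are commented out rather than copied verbatim.
    """
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines or one string
            source = "".join(source)
        lines = [
            f"# {line}" if line.lstrip().startswith(("%", "!")) else line
            for line in source.splitlines()
        ]
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


def convert_file(notebook_path, script_path):
    """Read a .ipynb file and write the extracted code to a .py file."""
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(script_path, "w", encoding="utf-8") as f:
        f.write(notebook_to_script(nb))
```

The extracted script still needs tidying into functions and covering with unit tests before it is committed and handed to the pipeline.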
How does a python dictionary work in the backend?
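A dictionary is a hash table of key-value pairs, not a list. Keys must be hashable, which in practice means immutable types such as strings, numbers, or tuples of these. Values can be anything, including mutable objects.
Each key's hash value is used to pick a slot in an underlying array, so lookups, insertions, and deletions take O(1) time on average. Since Python 3.7, dictionaries also preserve insertion order.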
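To make the hashing idea concrete, here is a heavily simplified hash table. It is not CPython's actual implementation, which uses a compact layout and a perturbation-based probing sequence. It only illustrates the principle: a key's hash picks a slot, collisions are resolved by probing, and the table grows before it fills up. The class name and the 2/3 load-factor threshold are illustrative choices.

```python
class ToyHashMap:
    """Simplified open-addressing hash table using linear probing (illustrative only)."""

    def __init__(self, capacity=8):
        self._slots = [None] * capacity  # each slot is None or a (key, value) pair
        self._size = 0

    def _find_slot(self, key):
        # Start at the slot given by the key's hash, then probe linearly
        # until we hit the key itself or an empty slot
        index = hash(key) % len(self._slots)  # hash() raises TypeError for unhashable keys
        while self._slots[index] is not None and self._slots[index][0] != key:
            index = (index + 1) % len(self._slots)
        return index

    def __setitem__(self, key, value):
        # Grow before the table is more than 2/3 full so probes stay short
        if (self._size + 1) * 3 > len(self._slots) * 2:
            self._resize()
        index = self._find_slot(key)
        if self._slots[index] is None:
            self._size += 1  # new key; overwriting an existing key keeps the size
        self._slots[index] = (key, value)

    def __getitem__(self, key):
        slot = self._slots[self._find_slot(key)]
        if slot is None:
            raise KeyError(key)
        return slot[1]

    def __len__(self):
        return self._size

    def _resize(self):
        # Double the array and re-insert everything, since slot positions depend on its length
        old_slots = self._slots
        self._slots = [None] * (len(old_slots) * 2)
        self._size = 0
        for slot in old_slots:
            if slot is not None:
                self[slot[0]] = slot[1]


ages = ToyHashMap()
ages["alice"] = 31
ages["bob"] = 27
print(ages["alice"], len(ages))
```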
What is the difference between a list and a tuple?
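A list is mutable, which means it is possible to change its values after its creation (append, remove, or reassign items). A tuple is immutable: once created, its items cannot be added, removed, or reassigned.
In the backend, assigning a list to another variable does not copy it; both names point to the same object in memory. If the list is changed through one name, the change appears through every reference to it. Tuples cannot be changed in place, so sharing them is safe, and a tuple of hashable items can be used as a dictionary key.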
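A short demonstration of both points: list aliasing and tuple immutability. The variable names are arbitrary.

```python
# Lists are mutable: assignment binds a second name to the SAME object
scores = [1, 2, 3]
alias = scores
alias.append(4)
print(scores)            # the change is visible through both names
print(alias is scores)   # True: one object, two references

# An explicit copy creates a new, independent list
independent = list(scores)
independent.append(5)
print(scores, independent)

# Tuples are immutable: item assignment raises TypeError
point = (1, 2)
try:
    point[0] = 10
except TypeError as err:
    print("TypeError:", err)

# Because they are immutable (and hashable if their items are), tuples can be dict keys
distances = {(0, 0): 0.0, (3, 4): 5.0}
print(distances[(3, 4)])
```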
Give some examples of how you would check if data was fit for modelling?
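Quantity: is there enough data for the model being trained?
Quality:
- Is there bias in the data?
- Is it representative of the population the model will be used on?
- Are there missing values, duplicates, or obvious errors?
- Will it suffer from covariate drift or non-stationarity?
- Is it relevant to the problem being solved?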
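Some of these checks can be scripted. This is a minimal sketch, assuming pandas and SciPy are available. It reports the row count against a minimum, the fraction of missing values per column, and the class balance. It then uses a two-sample Kolmogorov-Smirnov test as one possible check for covariate drift in a single numeric feature. The `min_rows` and `alpha` thresholds, the helper names, and the example columns `age` and `churned` are all hypothetical. Bias, representativeness, and relevance still need human judgement.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def readiness_report(df, target, min_rows=1000):
    """Quick quantity/quality summary of a training table.

    min_rows is an assumed threshold; what counts as 'enough' depends on the
    model and the problem.
    """
    return {
        "n_rows": len(df),
        "enough_rows": len(df) >= min_rows,
        "missing_fraction": df.isna().mean().to_dict(),  # share of NaNs per column
        "class_balance": df[target].value_counts(normalize=True).to_dict(),
    }


def has_drifted(train_values, new_values, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test on one feature.

    A p-value below alpha suggests the feature's distribution differs between
    the training data and newer data (possible covariate drift).
    """
    return bool(ks_2samp(train_values, new_values).pvalue < alpha)


# Hypothetical example table
df = pd.DataFrame({"age": [34, 51, np.nan, 29], "churned": [0, 1, 1, 1]})
report = readiness_report(df, target="churned")
print(report)

# Simulated drift: newer data has a shifted mean
rng = np.random.default_rng(0)
train_feature = rng.normal(0, 1, 2000)
print(has_drifted(train_feature, rng.normal(1, 1, 2000)))
```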
What is Logistic Regression?
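A classification method for binary outcomes.
A linear function of the input features (an intercept plus one coefficient per feature, as in linear regression)
is placed inside a sigmoid function, which converts it into a probability between 0 and 1. The coefficients are fitted by maximising the likelihood (equivalently, minimising log loss) rather than by least squares.
A cutoff point (commonly 0.5) converts the probability into a class label.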
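This from-scratch sketch in NumPy shows the three pieces: a linear function, a sigmoid, and a cutoff. It fits the weights by plain gradient descent on the mean log loss. The learning rate, iteration count, and synthetic data are illustrative assumptions. In practice you would use a library implementation such as scikit-learn's `LogisticRegression`, which uses dedicated solvers and applies regularisation by default.

```python
import numpy as np


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def add_intercept(X):
    """Prepend a column of ones so the first weight acts as the intercept."""
    return np.column_stack([np.ones(len(X)), X])


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit weights by batch gradient descent on the mean log loss (illustrative settings)."""
    X1 = add_intercept(X)
    w = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X1 @ w)             # predicted probabilities
        grad = X1.T @ (p - y) / len(y)  # gradient of the mean log loss
        w -= lr * grad
    return w


def predict(X, w, threshold=0.5):
    """Apply the cutoff to turn probabilities into 0/1 labels."""
    return (sigmoid(add_intercept(X) @ w) >= threshold).astype(int)


# Synthetic, linearly separable data (assumed for the example)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

w = fit_logistic(X, y)
accuracy = (predict(X, w) == y).mean()
print(f"weights: {w}, training accuracy: {accuracy:.2f}")
```

Raising or lowering `threshold` trades false positives against false negatives. This is why the cutoff is often tuned rather than left at 0.5.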