ML_Production (Quora) Flashcards
domain definition
- millions of Q&A
- millions of users
- thousands of topics
- main features: relevance, quality, demand
common ML algos
- logistic regression
- ElasticNets
- matrix factorization
- random forests
- DL
implicit vs explicit feedback
- implicit feedback is denser, since it is available for all users (watching a movie is implicit feedback; rating it is explicit)
- better correlated with A/B tests
- but may not correlate with long term user retention
- solution: combine implicit+explicit feedback
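One way to sketch that combination: blend the dense implicit signal with the sparse explicit one into a single preference score, falling back to implicit-only when no rating exists. The weights and the watch-fraction/rating encoding below are illustrative assumptions, not a recipe from the card.

```python
# Hypothetical blend of implicit + explicit feedback into one score.
# Weights w_implicit / w_explicit are assumptions for illustration.

def blended_score(watch_fraction, rating=None, w_implicit=0.4, w_explicit=0.6):
    """watch_fraction: implicit signal in [0, 1] (e.g. fraction of a movie watched).
    rating: explicit 1-5 star rating, or None if the user never rated."""
    implicit = watch_fraction              # dense: available for every user
    if rating is None:
        return implicit                    # sparse explicit signal missing: fall back
    explicit = (rating - 1) / 4            # normalize 1-5 stars to [0, 1]
    return w_implicit * implicit + w_explicit * explicit
```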
model learning dependencies
it will learn according to:
- training data
- target function/variable
- metric used
using ensembles
- flexible in using many different models
- flexible in using many approaches
- treat each model as a feature and add it to the ensemble (i.e. in a linear ‘supermodel’)
- avoid feedback loops
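The "each model as a feature" idea can be sketched as a tiny logistic 'supermodel' over base-model scores. The base models, toy data, and plain gradient-descent training below are stand-in assumptions; real systems would plug in actual rankers and a proper training pipeline.

```python
# Minimal sketch: base-model outputs become features of a linear/logistic
# "supermodel". base_model_a/b are hypothetical stand-ins.
import math

def base_model_a(x):   # stand-in for e.g. a matrix-factorization score
    return x[0]

def base_model_b(x):   # stand-in for e.g. a random-forest score
    return x[1]

def train_supermodel(data, epochs=500, lr=0.5):
    """data: list of (x, label). Learns linear weights over base-model outputs."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = [base_model_a(x), base_model_b(x)]    # models as features
            z = sum(wi * f for wi, f in zip(w, feats)) + b
            p = 1 / (1 + math.exp(-z))                    # logistic supermodel
            g = p - y                                     # gradient of log-loss
            w = [wi - lr * g * f for wi, f in zip(w, feats)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = w[0] * base_model_a(x) + w[1] * base_model_b(x) + b
    return 1 / (1 + math.exp(-z))
```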
feature engineering
main characteristics of a feature:
- reusable (across models)
- transformable (applying different functions)
- interpretable
- reliable (easy to monitor, to fix bugs)
ML system goals
strive for all:
- allow for experiments
- reusable
- easy-to-use
- flexible
- scalable
- performant
- use same tools in production and research
- implement abstraction layers for easy access
model easy to debug?
important because:
- determines the model used
- gives answers when something fails
- determines the features to use
- determines the selections of tools for its implementation
distributed machine learning?
most practical ML can be done on a multi-core machine with:
- data sampling
- offline schemes
- efficient parallel code
optimizing computation must take into account:
- costs
- latency
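For the data-sampling point, one standard single-machine technique (my choice of example, not named on the card) is reservoir sampling: a uniform sample of fixed size from a stream too large to hold in memory.

```python
# Reservoir sampling (Algorithm R): uniform sample of k items from a stream
# of unknown length, using O(k) memory on a single machine.
import random

def reservoir_sample(stream, k, seed=None):
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)           # keep item with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir
```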
hyperparameter optimization
- using Bayesian optimization (GP) better than CV
- tools like spearmint, AutoML, hyperopt
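The core idea behind those tools can be illustrated with a deliberately tiny sequential model-based search: fit a cheap surrogate to past evaluations, then pick the next point by trading off predicted value against unexplored distance. This is a toy (nearest-neighbor surrogate, not a real Gaussian process); all constants are assumptions.

```python
# Toy sequential model-based optimization, illustrating the exploit/explore
# idea behind Bayesian optimization. Not a real GP; numbers are assumptions.
import random

def toy_bayes_opt(objective, lo, hi, n_iter=30, n_candidates=200, kappa=0.3, seed=0):
    rng = random.Random(seed)
    xs = [lo, hi, (lo + hi) / 2]                 # a few initial evaluations
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        best_cand, best_acq = None, None
        for _ in range(n_candidates):
            c = rng.uniform(lo, hi)
            d, y_near = min((abs(c - x), y) for x, y in zip(xs, ys))
            acq = y_near - kappa * d             # surrogate mean minus exploration bonus
            if best_acq is None or acq < best_acq:
                best_cand, best_acq = c, acq
        xs.append(best_cand)                     # evaluate the most promising point
        ys.append(objective(best_cand))
    i = min(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]
```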
presentation bias
- users can only click on what the app shows, and what is shown is chosen by the model's own predictions, creating a feedback loop
- address it, for example, by modeling the probability that a user examines and clicks a given position, and correcting for it
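One common correction along those lines (my example, hedged) is inverse propensity weighting: down-weight clicks at prominent positions by the estimated probability that the user even examined that slot. The per-position probabilities below are made-up assumptions.

```python
# Inverse propensity weighting for position bias. EXAMINE_PROB values are
# illustrative assumptions; real systems estimate them from logged data.

EXAMINE_PROB = {1: 0.9, 2: 0.6, 3: 0.4, 4: 0.25, 5: 0.15}

def debiased_click_value(clicked, position):
    """Weight a click by 1 / P(examined | position), so items shown low on
    the page are not unfairly penalized in the training data."""
    if not clicked:
        return 0.0
    return 1.0 / EXAMINE_PROB[position]
```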
collaborative filtering at a glance
a couple ways to do it:
- user-similarity: cosine sim using vector representation on common rated items
- item-similarity: cosine sim using vector representation on common users’ ratings
problems:
- cold start: no info to begin with
- popularity bias: tends to recommend popular items
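The user-similarity variant above can be sketched directly: cosine similarity restricted to the items both users have rated, with zero similarity when there is no overlap (which is exactly the cold-start problem). The toy ratings in the test are assumptions.

```python
# User-similarity collaborative filtering: cosine over commonly rated items.
import math

def user_cosine(ratings_a, ratings_b):
    """ratings_*: dict mapping item -> rating for one user."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0                       # cold start: nothing to compare
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)
```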
hybridization methods
- weighted models based on importance
- switching model used based on situation
- mixed: results presented together
- feature combination (from diff sources for models) for input of a single model
- cascade, feature augmentation: using output of one technique as input of another
learning to rank approaches
- pointwise: using regression or classification (logistic regression)
- pairwise: minimize the inversions in ranking
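The pairwise objective can be made concrete by counting the inversions themselves: pairs where the ground-truth relevance prefers one item but the model's scores disagree. A pairwise learner minimizes a loss over exactly these pairs; the counting helper below is an illustrative sketch.

```python
# Count ranking inversions: pairs (i, j) where relevance says i beats j
# but the model's scores do not reflect that ordering.

def count_inversions(scores, relevance):
    n = len(scores)
    inversions = 0
    for i in range(n):
        for j in range(n):
            if relevance[i] > relevance[j] and scores[i] <= scores[j]:
                inversions += 1          # model disagrees with ground truth
    return inversions
```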