ML_Production (Quora) Flashcards

1
Q

domain definition

A
  • millions of questions and answers
  • millions of users
  • thousands of topics
  • main features: relevance, quality, demand
2
Q

common ML algos

A
  • logistic regression
  • Elastic Net
  • matrix factorization
  • random forests
  • deep learning (DL)
3
Q

implicit vs explicit feedback

A
  • implicit feedback (e.g., watching a movie) is denser than explicit feedback (e.g., rating a movie) because it is available for all users
  • implicit feedback correlates better with A/B test results
  • but it may not correlate with long-term user retention
  • solution: combine implicit and explicit feedback
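One way to combine the two signals is to build a single training target per (user, item) pair from both. The sketch below rests on assumptions chosen only for illustration. The implicit signal is a hypothetical fraction of a video watched (0 to 1), and the explicit signal is an optional 1-5 star rating. The blending weight `alpha` and the fallback to the implicit signal when no rating exists are choices made for the example, not a prescribed method.

```python
from typing import Optional


def blended_target(watch_fraction: float, rating: Optional[int], alpha: float = 0.5) -> float:
    """Blend implicit and explicit feedback into one target in [0, 1].

    Hypothetical scheme: watch_fraction is implicit feedback (0..1);
    rating is explicit 1-5 star feedback, or None if the user did not rate.
    """
    implicit = min(max(watch_fraction, 0.0), 1.0)  # clamp to [0, 1]
    if rating is None:
        # Most pairs have no explicit rating: fall back to the denser implicit signal.
        return implicit
    explicit = (rating - 1) / 4.0  # map 1..5 stars onto 0..1
    return alpha * implicit + (1.0 - alpha) * explicit


# Watched the whole movie but rated it 1 star vs. only implicit data available.
print(blended_target(1.0, 1))     # 0.5
print(blended_target(0.8, None))  # 0.8
```

The blended target lets a model trained on dense implicit data still be pulled toward what users say they value. This is the part that may matter more for long-term retention.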
4
Q

model learning dependencies

A

what a model learns depends on:

  • training data
  • target function/variable
  • metric used
5
Q

using ensembles

A
  • flexible: can combine many different models
  • flexible: can combine many different approaches
  • treat each model's output as a feature of the ensemble (e.g., in a linear ‘supermodel’)
  • avoid feedback loops
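The "linear supermodel" idea is usually called stacking. Each base model's prediction becomes one input feature, and a linear model learns how much weight to give each one. The sketch below is a minimal pure-Python version built on two assumptions. First, the base-model predictions come from held-out data, because fitting the blender on the base models' own training data would overfit. Second, the toy data is synthetic and exists only to show the mechanics.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_linear_blender(base_preds: List[List[float]], targets: List[float]) -> List[float]:
    """Least-squares weights [bias, w_1, ..., w_k] over k base-model predictions.

    base_preds[i] holds the k base-model predictions for held-out example i.
    """
    rows = [[1.0] + p for p in base_preds]  # prepend a constant column for the bias
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def blend(weights: List[float], preds: List[float]) -> float:
    """Score one example with the fitted linear supermodel."""
    return weights[0] + sum(w * p for w, p in zip(weights[1:], preds))


# Toy held-out data: scores from two base models and synthetic, exactly linear labels.
preds = [[0.2, 0.4], [0.5, 0.1], [0.9, 0.8], [0.3, 0.6], [0.7, 0.2]]
labels = [0.1 + 0.7 * a + 0.3 * b for a, b in preds]
weights = fit_linear_blender(preds, labels)
print([round(w, 3) for w in weights])  # [0.1, 0.7, 0.3]
```

With this design, adding a new model to the ensemble only means adding one more column and refitting the blender.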
6
Q

feature engineering

A

main characteristics of a good feature:

  • reusable (across models)
  • transformable (different functions can be applied to it)
  • interpretable
  • reliable (easy to monitor and to debug)
7
Q

ML system goals

A

strive for all of the following:

  • support experimentation
  • reusable
  • easy to use
  • flexible
  • scalable
  • performant
  • use the same tools in research and production
  • implement abstraction layers for easy access
8
Q

model easy to debug?

A

debuggability is important because it:

  • determines the model used
  • helps find the cause when something fails
  • determines the features used
  • determines the selection of tools used to implement it
9
Q

distributed machine learning?

A
most practical ML can be done on a single multi-core machine by using:
- data sampling
- offline schemes
- efficient parallel code
- optimized computation
must also take into account:
- costs
- latency
10
Q

hyperparameter optimization

A
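  • Bayesian optimization (e.g., with Gaussian processes) usually finds good hyperparameters in fewer evaluations than a grid search scored by cross-validation
  • tools: Spearmint, AutoML, hyperopt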
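To make the idea concrete, here is a minimal one-dimensional Bayesian optimization loop in pure Python. A Gaussian process with an RBF kernel is fit to the losses observed so far. The next hyperparameter to try is the untried candidate with the lowest confidence bound, computed as the posterior mean minus a multiple of the standard deviation. This is a sketch under several assumptions:

- the kernel length scale is fixed;
- candidates come from a discrete grid;
- the acquisition is a confidence bound rather than, e.g., expected improvement;
- the toy `validation_loss` is a hypothetical stand-in for a real cross-validated score.

```python
import math
from typing import Callable, List, Optional, Tuple


def rbf(a: float, b: float, length: float = 0.2) -> float:
    """Squared-exponential (RBF) kernel with unit prior variance."""
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def propose(xs: List[float], ys: List[float], candidates: List[float],
            beta: float = 2.0, noise: float = 1e-6) -> Optional[float]:
    """Return the untried candidate with the lowest GP bound: mean - beta * std."""
    k_mat = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
             for i, a in enumerate(xs)]
    alpha = solve(k_mat, ys)  # K^{-1} y, shared by all candidates
    best_c, best_score = None, math.inf
    for c in candidates:
        if c in xs:
            continue  # skip hyperparameter values already evaluated
        k_star = [rbf(a, c) for a in xs]
        v = solve(k_mat, k_star)  # K^{-1} k*
        mean = sum(k * w for k, w in zip(k_star, alpha))
        var = max(rbf(c, c) - sum(k * w for k, w in zip(k_star, v)), 0.0)
        score = mean - beta * math.sqrt(var)  # optimistic estimate of the loss
        if score < best_score:
            best_c, best_score = c, score
    return best_c


def bayes_opt(objective: Callable[[float], float], candidates: List[float],
              init: List[float], n_iter: int = 8) -> Tuple[float, float]:
    """Minimize objective over candidates; returns (best_x, best_y)."""
    xs = list(init)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        nxt = propose(xs, ys, candidates)
        if nxt is None:  # every candidate has been tried
            break
        xs.append(nxt)
        ys.append(objective(nxt))
    best_y, best_x = min(zip(ys, xs))
    return best_x, best_y


def validation_loss(reg: float) -> float:
    """Hypothetical stand-in for a cross-validated loss of one hyperparameter."""
    return (reg - 0.3) ** 2


grid = [i / 50 for i in range(51)]
best_reg, best_loss = bayes_opt(validation_loss, grid, init=[0.0, 0.5, 1.0])
print(best_reg, best_loss)
```

Each step spends one expensive evaluation where the model is either promising or uncertain. A grid search, by contrast, evaluates every point regardless of what earlier results showed.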

11
Q

presentation bias

A
  • users can only click on what the app shows them, and what is shown is decided by the model's own predictions, so click data is biased toward the model's past choices
  • address it, for example, by modeling the probability that a user clicks on a given position and correcting for it
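One common way to correct for position bias is inverse propensity weighting. Each click is weighted by 1 / P(position examined), so clicks on rarely seen low positions count for more. In the sketch below, the per-position examination probabilities are hypothetical numbers. In practice they would have to be estimated, for example from a small randomized experiment that shuffles the result order.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical probabilities that a user examines each display position (1-based).
PROPENSITY = {1: 1.0, 2: 0.5, 3: 0.25}


def debiased_clicks(log: List[Tuple[str, int, bool]]) -> Dict[str, float]:
    """Inverse-propensity-weighted click totals per item.

    Each log entry is (item_id, position_shown, clicked).
    """
    scores: Dict[str, float] = defaultdict(float)
    for item, pos, clicked in log:
        if clicked:
            scores[item] += 1.0 / PROPENSITY[pos]  # rarely examined positions weigh more
    return dict(scores)


log = [("a", 1, True), ("a", 1, False), ("b", 3, True), ("c", 2, False)]
print(debiased_clicks(log))  # {'a': 1.0, 'b': 4.0}
```

Raw counts would rank a and b equally (one click each). The weighted scores give more credit to b's click at position 3, because users rarely look that far down.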
12
Q

collaborative filtering at a glance

A

a couple of ways to do it:
- user similarity: cosine similarity between users' rating vectors over the items both have rated
- item similarity: cosine similarity between items' rating vectors over the users who rated both
problems:
- cold start: no information about new users or items to begin with
- popularity bias: tends to recommend popular items
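Both similarity variants reduce to a cosine over co-rated entries. A minimal sketch follows. It assumes ratings are stored as a nested dict `ratings[user][item]`, a hypothetical layout chosen for illustration. Item similarity is the same computation applied to the transposed dict.

```python
import math
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # ratings[user][item] = score


def cosine_on_common(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity restricted to the keys present in both vectors."""
    common = u.keys() & v.keys()
    dot = sum(u[k] * v[k] for k in common)
    nu = math.sqrt(sum(u[k] ** 2 for k in common))
    nv = math.sqrt(sum(v[k] ** 2 for k in common))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # nothing in common (e.g. cold start): no evidence of similarity
    return dot / (nu * nv)


def transpose(ratings: Ratings) -> Ratings:
    """Turn ratings[user][item] into ratings[item][user] for item-item similarity."""
    out: Ratings = {}
    for user, items in ratings.items():
        for item, r in items.items():
            out.setdefault(item, {})[user] = r
    return out


ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2},
    "carol": {"m3": 1},
}
user_sim = cosine_on_common(ratings["alice"], ratings["bob"])  # over m1, m2
items = transpose(ratings)
item_sim = cosine_on_common(items["m1"], items["m2"])  # over alice, bob
print(user_sim, item_sim)
```

Because raw star ratings are all positive, these cosines come out close to 1. Centering each user's ratings on that user's mean is a common refinement.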

13
Q

hybridization methods

A
  • weighted: combine the models' scores with weights based on their importance
  • switching: choose which model to use based on the situation
  • mixed: results from different models presented together
  • feature combination: features from different sources feed a single model
  • cascade, feature augmentation: the output of one technique is used as input to another
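The first two strategies are easy to express in code. This sketch assumes two hypothetical scorers, a collaborative-filtering score and a popularity score, plus an illustrative cold-start threshold. In practice the weights and the threshold would be tuned on held-out data.

```python
from typing import Callable, Dict

Scorer = Callable[[str, str], float]  # (user_id, item_id) -> relevance score


def weighted_hybrid(scorers: Dict[str, Scorer], weights: Dict[str, float]) -> Scorer:
    """Weighted hybrid: a fixed linear combination of each model's score."""
    def score(user: str, item: str) -> float:
        return sum(weights[name] * s(user, item) for name, s in scorers.items())
    return score


def switching_hybrid(cf: Scorer, fallback: Scorer, n_ratings: Callable[[str], int],
                     min_ratings: int = 5) -> Scorer:
    """Switching hybrid: use CF only when the user has enough history."""
    def score(user: str, item: str) -> float:
        return cf(user, item) if n_ratings(user) >= min_ratings else fallback(user, item)
    return score


# Toy components, made up for illustration.
def cf_score(user: str, item: str) -> float:
    return 0.9 if item == "x" else 0.1


def popularity_score(user: str, item: str) -> float:
    return 0.5


history = {"veteran": 20, "newcomer": 0}  # number of ratings per user

weighted = weighted_hybrid({"cf": cf_score, "pop": popularity_score}, {"cf": 0.8, "pop": 0.2})
switching = switching_hybrid(cf_score, popularity_score, lambda u: history.get(u, 0))
print(weighted("veteran", "x"))    # 0.8 * 0.9 + 0.2 * 0.5
print(switching("newcomer", "x"))  # popularity fallback for a cold-start user
```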
14
Q

learning to rank approaches

A
  • pointwise: regression or classification on each item independently (e.g., logistic regression)
  • pairwise: minimize the number of inversions (pairs ranked in the wrong order)
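The two approaches differ in what the loss looks at:

- A pointwise loss compares each item's score with its own label. Here it uses squared error, the regression variant.
- A pairwise loss looks only at pairs whose labels differ, and penalizes the model when the less relevant item scores higher.

The inversion count itself is not differentiable. The sketch therefore uses the logistic pairwise loss log(1 + exp(-(s_i - s_j))) as a smooth surrogate for it, as in RankNet. The scores and labels are made up.

```python
import math
from typing import List


def pointwise_squared_loss(scores: List[float], labels: List[int]) -> float:
    """Pointwise: mean squared error of each score against its own label."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


def inversions(scores: List[float], labels: List[int]) -> int:
    """Count pairs where the more relevant item is not scored strictly higher."""
    n = len(scores)
    return sum(1 for i in range(n) for j in range(n)
               if labels[i] > labels[j] and scores[i] <= scores[j])


def pairwise_logistic_loss(scores: List[float], labels: List[int]) -> float:
    """Pairwise: smooth surrogate for inversions, log(1 + exp(-(s_i - s_j)))."""
    n = len(scores)
    return sum(math.log1p(math.exp(-(scores[i] - scores[j])))
               for i in range(n) for j in range(n) if labels[i] > labels[j])


labels = [2, 1, 0]       # graded relevance of three results
good = [3.0, 1.0, -1.0]  # ordering consistent with the labels
bad = [-1.0, 1.0, 3.0]   # fully reversed ordering
print(inversions(good, labels), inversions(bad, labels))  # 0 3
print(pointwise_squared_loss(good, labels))  # nonzero although the ranking is perfect
```

The last line shows the motivation for pairwise methods. The pointwise loss penalizes `good` for its absolute score values even though its ordering has no inversions.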