Hard ML Problems Flashcards
What are the difficulties of clustering?
It is challenging to determine what action to take based on a cluster. You can try to assign a meaning to each cluster, but this can be tricky because the model might not group items by criteria that you find intuitive.
What is an alternative approach when clustering?
One alternative is to label some items before you cluster, and then propagate those labels to the other items in each cluster.
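A minimal sketch of the idea, assuming cluster assignments already exist (for example from k-means) and using a simple majority vote among the hand-labeled items in each cluster. The function `propagate_labels` and the "spam"/"ham" labels are hypothetical; graph-based label propagation algorithms are more sophisticated than this.

```python
from collections import Counter

def propagate_labels(cluster_ids, seed_labels):
    """Give every item the majority seed label of its cluster (hypothetical helper).

    cluster_ids: one cluster assignment per item, e.g. from k-means.
    seed_labels: dict mapping item index -> label for the few hand-labeled items.
    Items in clusters that contain no labeled seed get None.
    """
    # Tally the seed labels that fall into each cluster.
    votes = {}
    for idx, label in seed_labels.items():
        votes.setdefault(cluster_ids[idx], Counter())[label] += 1
    # Majority label per cluster; ties go to the label seen first.
    cluster_label = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
    return [cluster_label.get(c) for c in cluster_ids]

clusters = [0, 0, 1, 1, 1, 2]
seeds = {0: "spam", 2: "ham", 3: "ham"}
print(propagate_labels(clusters, seeds))  # ['spam', 'spam', 'ham', 'ham', 'ham', None]
```

Note that cluster 2 gets no label at all: propagation only helps where at least one item in the cluster was labeled.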
What is one option for deciding what counts as an anomaly, so that you can get labeled data?
One option is to define a heuristic and use it to label anomalies. However, once you’ve defined this heuristic, you might as well use the heuristic in your production system, since an ML model can’t beat the heuristic used to train it.
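As an illustration, here is one possible heuristic, a z-score rule; the rule, the threshold, and the name `heuristic_labels` are assumptions for this sketch, not something the card prescribes. The labels it produces are exactly what a model trained on them would try to imitate, so calling the function directly in production gives the same answers without a model.

```python
from statistics import mean, pstdev

def heuristic_labels(values, threshold=3.0):
    """Hypothetical heuristic: label a value 1 (anomaly) if it lies more than
    `threshold` population standard deviations from the mean, else 0."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: nothing stands out.
        return [0] * len(values)
    return [1 if abs(v - mu) / sigma > threshold else 0 for v in values]

# With only 7 points the largest possible z-score is about 2.45, so a lower
# threshold is used here for the demo.
print(heuristic_labels([10, 11, 9, 10, 12, 10, 50], threshold=2.0))  # [0, 0, 0, 0, 0, 0, 1]
```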
What are correlations?
mutual relationships or connections between two or more things
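One common way to quantify a linear correlation between two numeric variables is the Pearson correlation coefficient, which ranges from -1 (perfectly opposed) to 1 (perfectly aligned). A minimal pure-Python sketch follows; Python 3.10+ also provides `statistics.correlation` for the same quantity. A high value only measures association and says nothing about which variable, if either, causes the other.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance numerator and the two sums of squared deviations.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson([1, 2, 3], [3, 2, 1]))        # -1.0
```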
What is causation?
one event or factor causing another
Why is causation hard to determine?
It is easy to see that something happened, but much harder to understand why it happened.
You can’t determine causation from only __ __.
observational data
How can you determine causation?
Run an experiment. For example, to learn whether seeing a review causes a change in what users do, you would need to compare users who didn't see the review with similar users who did. In general, you need to intervene in the world (run an experiment) to determine causation; you can't see it in purely observational data.
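A small simulation of why the experiment matters. Everything in it is hypothetical: the purchase model, the hidden `interest` variable, and the 0.10 effect size are invented for illustration. In the observational setting, interested users are more likely both to read the review and to buy, so comparing readers with non-readers mixes the review's effect with the effect of interest. Randomly assigning who sees the review breaks that link.

```python
import random

TRUE_EFFECT = 0.10  # hypothetical: the review adds 10 points to purchase probability

def buys(interest, saw_review, rng):
    """Hypothetical purchase model: interest and the review both raise the odds."""
    p = 0.1 + 0.5 * interest + (TRUE_EFFECT if saw_review else 0.0)
    return rng.random() < p

def difference_in_means(rows):
    """Purchase rate of users who saw the review minus rate of those who didn't."""
    seen = [b for s, b in rows if s]
    unseen = [b for s, b in rows if not s]
    return sum(seen) / len(seen) - sum(unseen) / len(unseen)

def simulate(n=100_000, randomized=False, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        interest = rng.random()  # hidden confounder, uniform in [0, 1)
        if randomized:
            saw = rng.random() < 0.5  # experiment: a coin flip decides exposure
        else:
            saw = rng.random() < interest  # observational: interested users seek out reviews
        rows.append((saw, buys(interest, saw, rng)))
    return difference_in_means(rows)

print(f"observational estimate: {simulate(randomized=False):.3f}")
print(f"experimental estimate:  {simulate(randomized=True):.3f}")
```

Under these assumptions the observational comparison is expected to land near 0.27, overstating the true effect, while the randomized comparison should land close to the true 0.10.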
If you have no __ to train a model, then machine learning cannot help you.
data