W2-Introduction to Machine Learning in Production Flashcards

1
Q

An ML algorithm with low average error isn’t good enough. True/False

A

True. A machine learning system may have low average test set error, but if its performance on a set of disproportionately important examples isn’t good enough, the system will still not be acceptable for production deployment. Example: navigational queries vs. informational or transactional queries in web search. The challenge is that average test set accuracy tends to weight all examples equally, whereas in web search some queries are disproportionately important.

One thing you could do is give these examples a higher weight. That could work for some applications, but in my experience, just changing the weights of different examples doesn’t always solve the entire problem.
(Video: Why low average error isn’t good enough, 00:32)
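The re-weighting idea mentioned above can be sketched as a weighted error metric. The `is_navigational` flag and the 5x weight below are hypothetical, purely for illustration:

```python
import numpy as np

def weighted_error(y_true, y_pred, weights):
    """Average error where each example carries its own weight."""
    errors = (y_true != y_pred).astype(float)
    return float(np.sum(weights * errors) / np.sum(weights))

# Hypothetical setup: navigational queries get 5x the weight of others.
y_true = np.array([1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1])            # one navigational query misclassified
is_navigational = np.array([False, False, True, False])
weights = np.where(is_navigational, 5.0, 1.0)
```

With uniform weights the error is 0.25, but weighting the navigational miss raises it to 0.625, surfacing the problem that plain average accuracy hides.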

2
Q

How do we use human-level performance in determining a baseline?

A

Using human-level performance, sometimes abbreviated HLP, gives you a point of comparison or a baseline that helps you decide where to focus your efforts.

3
Q

The best practices for establishing a baseline are quite different depending on whether you’re working on unstructured or structured data. True/False

A

True

4
Q

How can we get started on modeling for an ML project?

A

To get started on this first step of coming up with a model, here are some suggestions.

When I’m starting on a machine learning project, I almost always start with a quick literature search to see what’s possible, so you can look at online courses, blogs, and open source projects. My advice, if your goal is to build a practical production system and not to do research: don’t obsess about finding the latest, greatest algorithm.

5
Q

Do you need to take into account deployment constraints, such as compute constraints, when picking a model?

A

Yes, if a baseline is already established and you want to deploy the model.
No, if the goal is establishing a baseline and seeing what’s possible.

6
Q

When trying out a learning algorithm for the first time, before running it on all your data, I would urge you to run a few quick sanity checks on your code and your algorithm. Give an example of such a sanity check.

A

For example, I will usually try to overfit a very small training dataset before spending hours, or sometimes even overnight or days, training the algorithm on a large dataset. Maybe even try to make sure you can fit one training example, especially if the output is complex, to see if the algorithm works at all.

The advantage of this is that you may be able to train your algorithm on one or a small handful of examples in just minutes or maybe even seconds, and this lets you find bugs much more quickly.
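The overfit-a-tiny-dataset check might be sketched like this, using a small from-scratch logistic regression; the model and the two-example dataset are hypothetical stand-ins for whatever you are actually training:

```python
import numpy as np

def can_overfit_tiny_set(X, y, steps=500, lr=0.5):
    """Sanity check: train a from-scratch logistic regression on a
    handful of examples and verify training accuracy reaches 100%."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        z = X @ w + b
        p = 1.0 / (1.0 + np.exp(-z))
        grad = p - y                       # dLoss/dz for cross-entropy
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    preds = (X @ w + b > 0).astype(int)
    return bool((preds == y).all())

# Two linearly separable examples: a healthy implementation should fit them.
X_tiny = np.array([[0.0, 1.0], [1.0, 0.0]])
y_tiny = np.array([0, 1])
```

If this check fails, there is a bug to find before any large-scale training run is worth starting.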

7
Q

By brainstorming different tags to analyze during error analysis, you can segment your data into different categories and then use some questions to try to decide what to prioritize working on. Give examples of these questions.

A

What fraction of errors have that tag?
Of all the data with that tag, what fraction is misclassified?
What fraction of all the data have that tag?
How much room for improvement is there for that tag?
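The first three questions above can be computed mechanically once each example carries its tags and a correctness flag. The dict schema here is a hypothetical one, purely for illustration:

```python
def tag_statistics(examples, tag):
    """Answer the error-analysis fraction questions for one tag.
    Each example is a dict with 'tags' (a set) and 'correct' (a bool)."""
    errors = [e for e in examples if not e["correct"]]
    tagged = [e for e in examples if tag in e["tags"]]
    tagged_errors = [e for e in tagged if not e["correct"]]
    return {
        "fraction_of_errors_with_tag": len(tagged_errors) / len(errors) if errors else 0.0,
        "fraction_of_tagged_misclassified": len(tagged_errors) / len(tagged) if tagged else 0.0,
        "fraction_of_data_with_tag": len(tagged) / len(examples),
    }

# Hypothetical speech-recognition error analysis with a "car_noise" tag.
examples = [
    {"tags": {"car_noise"}, "correct": False},
    {"tags": {"car_noise"}, "correct": True},
    {"tags": set(), "correct": False},
    {"tags": set(), "correct": True},
]
```

The fourth question (room for improvement) needs an external baseline such as HLP, so it cannot be read off the tagged data alone.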

8
Q

How can we prioritize the tags from error analysis to work on?

A

How much room for improvement there is
How frequently that tag appears
How easy it is to improve performance in that tag
How important it is to improve performance in that tag
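A rough way to combine the first two criteria is to multiply how often a tag appears by its room for improvement, which bounds the possible gain in overall accuracy. The numbers below are hypothetical:

```python
def potential_improvement(fraction_of_data, room_for_improvement):
    """Upper bound on overall accuracy gain from fixing one tag:
    the fraction of data carrying the tag times how far that tag's
    accuracy trails a baseline such as HLP."""
    return fraction_of_data * room_for_improvement

# Hypothetical: "car noise" covers 4% of the data and trails HLP by 4 points,
# so fixing it entirely lifts overall accuracy by at most 0.16 points.
gain = potential_improvement(0.04, 0.04)
```

Ease and importance of each tag remain judgment calls layered on top of this arithmetic.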

9
Q

After error analysis, once you’ve decided that there’s a category, or maybe a few categories where you want to improve the average performance, one fruitful approach is to consider ____ or ____ for that one, or maybe a small handful of categories.

A

adding data
improving the quality of that data

10
Q

What is performance auditing?

A

Based on a brainstormed list of ways the ML system might go wrong, you can then establish metrics to assess performance against these issues on the appropriate slices of data, e.g. accuracy across different genders and ethnicities for a speech recognition system.
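Auditing on slices of data might look like the sketch below; the labels and group names are hypothetical:

```python
import numpy as np

def accuracy_by_slice(y_true, y_pred, groups):
    """Compute accuracy separately on each slice (e.g. per accent)."""
    return {
        g: float((y_true[groups == g] == y_pred[groups == g]).mean())
        for g in np.unique(groups)
    }

# Hypothetical audit: overall accuracy hides that slice "b" lags slice "a".
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 1, 1, 0, 0])
groups = np.array(["a", "a", "a", "b", "b", "b"])
```

Here the overall accuracy is 5/6, yet slice "a" is perfect while slice "b" is not, which is exactly the kind of gap a performance audit is meant to surface.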

11
Q

What’s the difference between model-centric and data-centric development?

A

In the model-centric view, you hold the data fixed and iteratively improve the code or the model.

In the data-centric view, we think of the quality of the data as paramount, and you can use tools such as error analysis or data augmentation to systematically improve data quality. For many applications, I find that if your data is good enough, there are multiple models that will do just fine.

12
Q

As a framework for data augmentation, I encourage you to think of how you can create realistic examples that the algorithm does ____ on and humans or other baselines do ____ on.

A

poorly, well

13
Q

Now, one way that some people do data augmentation is to generate an augmented dataset, then train the learning algorithm and see if the algorithm does better on the dev set; then fiddle around with the parameters for data augmentation, train the learning algorithm again, and so on. Is this an efficient way of doing data augmentation? Why?

A

This turns out to be quite inefficient, because every time you change your data augmentation parameters, you need to train your learning algorithm all over again, and this can take a long time.

Specifically, here’s a checklist you might go through when you are generating new data.

  • One, does it sound realistic? You want your audio to actually sound like realistic audio of the sort you want your algorithm to perform on.
  • Two, is the x → y mapping clear? In other words, can humans still recognize what was said?
  • Three, is the algorithm currently doing poorly on this new data?
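For the audio case, one common augmentation is mixing background noise into speech at a controlled signal-to-noise ratio, which helps keep point one (realism) in check. This is a minimal sketch assuming same-length float waveforms; the clips below are white-noise stand-ins for real recordings:

```python
import numpy as np

def mix_noise(speech, noise, snr_db=10.0):
    """Mix background noise into a speech clip at a target SNR (in dB)."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale noise so 10*log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Hypothetical clips: white-noise stand-ins for real waveforms.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)   # ~1 s at 16 kHz
noise = rng.standard_normal(16000)
augmented = mix_noise(speech, noise, snr_db=10.0)
```

Sweeping `snr_db` is one concrete "data augmentation parameter" you would otherwise fiddle with blindly between full retraining runs.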
14
Q

What’s a data iteration loop in data-centric AI development?

A

You may have heard of the term model iteration, which refers to iteratively training a model, using error analysis, and then trying to decide how to improve the model. Taking a data-centric approach to AI development, it is sometimes useful to instead use a data iteration loop, where you hold the model fixed and repeatedly take the data, train your learning algorithm, do error analysis, and, as you go through this loop, focus on how to add data or improve the quality of the data.

15
Q

For a lot of machine learning problems, the training set and the dev and test set distributions start out reasonably similar. But if you’re using data augmentation, you’re adding to specific parts of the training set, such as adding lots of data with cafe noise, so your training set may now come from a very different distribution than the dev set and the test set. Is this going to hurt your learning algorithm’s performance?

A

Usually the answer is no with some caveats when you’re working on unstructured data problems.

If you are working on an unstructured data problem, your model is large (such as a neural network with large capacity and low bias), and the mapping from x to y is clear (meaning that, given only the input x, humans can make accurate predictions), then it turns out that adding accurately labeled data rarely hurts accuracy.

16
Q

Is data augmentation used for structured data as well? if not, what is done instead to fix the problems detected via error analysis?

A

No. For structured data problems, you usually have a fixed set of data and features, making it hard to use data augmentation or collect new data (the restaurant and vegetarian-customers example in the video).

Instead, adding features can be a more fruitful way to improve the performance of the algorithm and fix problems identified through error analysis.

17
Q

Over the last several years, there’s been a trend in product recommendations of a shift from collaborative filtering approaches to content-based filtering approaches. True/False; define each method of recommendation.

A

True
Collaborative filtering is, loosely, an approach that looks at the user, tries to figure out who is similar to that user, and then recommends things to you that people like you also liked.

In contrast, a content-based filtering approach will tend to look at you as a person, look at the description of the restaurant or its menu and other information about it, and see whether that restaurant is a good match for you or not.
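A content-based score can be as simple as the similarity between a user's preference vector and an item's feature vector; the feature names and values below are made up for illustration:

```python
import numpy as np

def content_based_score(user_pref, item_features):
    """Content-based filtering sketch: cosine similarity between a
    user's preference vector and an item's feature vector."""
    num = float(user_pref @ item_features)
    denom = float(np.linalg.norm(user_pref) * np.linalg.norm(item_features))
    return num / denom if denom else 0.0

# Hypothetical restaurant features: [vegetarian_options, price_level, spiciness]
user = np.array([1.0, 0.2, 0.8])
veggie_place = np.array([1.0, 0.3, 0.7])
steakhouse = np.array([0.0, 0.9, 0.2])
```

For this user, the vegetarian-friendly restaurant scores higher than the steakhouse, because the score uses item content rather than other users' behavior, so it works even for brand-new items with no ratings (one reason for the trend toward content-based approaches).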

18
Q

Error analysis can be harder on structured data problems if there is no good baseline, such as human-level performance, to compare to; human-level performance is hard to establish for structured data because it’s really difficult for people to recommend good restaurants, even to each other. True/False

A

True

But error analysis can still discover ideas for improvement, as can user feedback and benchmarking against competitors.