W2-Introduction to Machine Learning in Production Flashcards
An ML algorithm with low average error isn’t necessarily good enough for production. True/False
True. A machine learning system may have low average test set error, but if its performance on a set of disproportionately important examples isn’t good enough, the system will still not be acceptable for production deployment. Example: informational or transactional queries vs. navigational queries in web search. The challenge is that average test set accuracy weights all examples equally, whereas in web search some queries are disproportionately important.
One thing you could do is give these examples a higher weight. That can work for some applications, but in my experience, just changing the weights of different examples doesn’t always solve the entire problem.
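For instance, here is a minimal sketch of example weighting, assuming a scikit-learn-style classifier; the QUERY_TYPE_WEIGHTS values are hypothetical and would come from your own judgment about which query types matter most.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-example weights: navigational queries get a higher weight
# because getting them wrong is disproportionately costly.
QUERY_TYPE_WEIGHTS = {"navigational": 5.0, "informational": 1.0, "transactional": 1.0}

def fit_with_weights(X, y, query_types):
    """Train a classifier that pays more attention to the important examples."""
    sample_weight = np.array([QUERY_TYPE_WEIGHTS[t] for t in query_types])
    model = LogisticRegression(max_iter=1000)
    # Most scikit-learn estimators accept a per-example sample_weight in fit().
    model.fit(X, y, sample_weight=sample_weight)
    return model
```

Even with such weighting, the average metric can still hide failures on rare but critical examples, which is why it doesn’t always solve the entire problem.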
How do we use Human Level Performance to determine a baseline?
Using Human Level Performance, sometimes abbreviated HLP, gives you a point of comparison or a baseline that helps you decide where to focus your efforts.
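For example, a small sketch of using HLP as a baseline, with made-up per-category error rates:

```python
# Hypothetical per-category error rates for the model and estimates of HLP.
hlp_error   = {"clear speech": 0.01, "car noise": 0.04, "low bandwidth": 0.07}
model_error = {"clear speech": 0.03, "car noise": 0.05, "low bandwidth": 0.07}

for category, h in hlp_error.items():
    gap = model_error[category] - h
    print(f"{category:15s} gap to HLP = {gap:.2%}")
# Large gaps (e.g. clear speech) suggest where to focus; near-zero gaps suggest
# the model may already be close to the ceiling for that category.
```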
The best practices for establishing a baseline are quite different depending on whether you’re working on unstructured or structured data. True/False
True
How can we get started on modeling for an ML project?
To get started on this first step of coming up with a model, here are some suggestions.
When I’m starting on a machine learning project, I almost always begin with a quick literature search to see what’s possible: look at online courses, blogs, and open-source projects. My advice, if your goal is to build a practical production system rather than to do research, is: don’t obsess about finding the latest, greatest algorithm.
Do you need to take deployment constraints, such as compute constraints, into account when picking a model?
- Yes, if a baseline has already been established and your goal is to build and deploy a model.
- No, if the goal is only to establish a baseline and see what’s possible.
When trying out a learning algorithm for the first time, before running it on all your data, run a few quick sanity checks on your code and your algorithm. Give an example of such a sanity check.
For example, I will usually try to overfit a very small training dataset before spending hours, or sometimes even overnight or days, training the algorithm on a large dataset. You might even make sure you can fit a single training example, especially if the output is complex, to see if the algorithm works at all.
The advantage is that you may be able to train your algorithm on one or a small handful of examples in just minutes or even seconds, which lets you find bugs much more quickly.
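A quick sanity-check sketch of this idea in Keras (the shapes, sizes, and architecture are made up): try to drive the training loss to roughly zero on a handful of examples before training on the full dataset.

```python
import numpy as np
import tensorflow as tf

# Tiny synthetic "dataset": 4 examples, 20 features, binary labels.
X_tiny = np.random.rand(4, 20).astype("float32")
y_tiny = np.array([0, 1, 0, 1])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

history = model.fit(X_tiny, y_tiny, epochs=200, verbose=0)
print("final tiny-set loss:", history.history["loss"][-1])
# If the loss doesn't approach zero here, there is likely a bug in the data
# pipeline, the labels, or the model code -- no point in a long training run yet.
```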
By brainstorming different tags to analyze during error analysis, you can segment your data into different categories and then use a few questions to decide what to prioritize working on. Give examples of these questions (a sketch of computing them follows the list).
- What fraction of errors have that tag?
- Of all the data with that tag, what fraction is misclassified?
- What fraction of all the data have that tag?
- How much room for improvement is there for that tag?
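Here is a sketch of computing the first three fractions, assuming a hypothetical pandas DataFrame with one row per dev-set example, a boolean is_error column, and one boolean column per tag:

```python
import pandas as pd

# Toy example: 6 dev-set examples, one error flag, one tag column.
df = pd.DataFrame({
    "is_error":   [True, True, False, True, False, False],
    "cafe_noise": [True, False, False, True, False, True],
})

tag = "cafe_noise"
frac_of_errors_with_tag   = df.loc[df.is_error, tag].mean()    # fraction of errors that have the tag
frac_of_tag_misclassified = df.loc[df[tag], "is_error"].mean() # of data with the tag, fraction misclassified
frac_of_data_with_tag     = df[tag].mean()                     # fraction of all data with the tag
print(frac_of_errors_with_tag, frac_of_tag_misclassified, frac_of_data_with_tag)
```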
How can we prioritize which tags from the error analysis to work on?
- How much room for improvement there is
- How frequently that tag appears
- How easy it is to improve performance on that tag
- How important it is to improve performance on that tag
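One common way to combine the first two criteria is to multiply the gap to HLP by the tag’s frequency, which estimates the potential gain in overall accuracy; a sketch with illustrative numbers:

```python
# tag: (current_accuracy, hlp_accuracy, fraction_of_data) -- illustrative values.
tags = {
    "clean speech": (0.94, 0.95, 0.60),
    "car noise":    (0.89, 0.93, 0.04),
    "cafe noise":   (0.87, 0.89, 0.30),
}

for tag, (acc, hlp, freq) in tags.items():
    potential_gain = (hlp - acc) * freq  # max accuracy gained on the whole dev set
    print(f"{tag:12s} potential overall accuracy gain ~ {potential_gain:.2%}")
# Ease and importance of improvement still require human judgment on top of this.
```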
After error analysis, once you’ve decided that there’s a category, or maybe a few categories, where you want to improve the average performance, one fruitful approach is to consider ____ or ____ for that one category or small handful of categories.
- adding data
- improving the quality of that data
What is performance auditing?
Based on a brainstormed list of ways the ML system might go wrong, you can establish metrics to assess performance against these issues on the appropriate slices of data, e.g. accuracy across different genders and ethnicities for a speech recognition system.
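For instance, a small sketch of auditing accuracy per slice, assuming a hypothetical results DataFrame of ground-truth labels, predictions, and the attribute used for slicing:

```python
import pandas as pd

results = pd.DataFrame({
    "y_true": [1, 0, 1, 1, 0, 1],
    "y_pred": [1, 0, 0, 1, 0, 1],
    "gender": ["f", "m", "f", "m", "f", "m"],
})

accuracy_by_slice = (
    results.assign(correct=results.y_true == results.y_pred)
           .groupby("gender")["correct"].mean()
)
print(accuracy_by_slice)
# Large gaps between slices are exactly the kind of issue a performance audit
# should surface before deployment.
```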
What’s the difference between model-centric and data-centric development?
In the model-centric view, you hold the data fixed and iteratively improve the code or the model.
In the data-centric view, the quality of the data is paramount, and you can use tools such as error analysis and data augmentation to systematically improve the data quality. For many applications, I find that if your data is good enough, there are multiple models that will do just fine.
As a framework for data augmentation, I encourage you to think of how you can create realistic examples that the algorithm does ____ on and humans or other baselines do ____ on.
poorly, well
Now, one way some people do data augmentation is to generate an augmented data set, train the learning algorithm, and see if it does better on the dev set; then fiddle with the data augmentation parameters, train again, and so on. Is this an efficient way to do data augmentation? Why or why not?
This turns out to be quite inefficient, because every time you change your data augmentation parameters you need to train your network, or whatever learning algorithm you use, all over again, and this can take a long time.
Specifically, here’s a checklist you might go through when you are generating new data.
- One, does it sound realistic? You want your audio to actually sound like realistic audio of the sort you want your algorithm to perform on.
- Two, is the x to y mapping clear? In other words, can humans still recognize what was said?
- Three, is the algorithm currently doing poorly on this new data?
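As a concrete illustration, here is a minimal sketch of noise-based augmentation for speech (the waveforms and the snr_db parameter are illustrative): mix background cafe noise into clean speech at a controlled signal-to-noise ratio, then apply the checklist above to the result.

```python
import numpy as np

def mix_noise(speech, noise, snr_db=10.0):
    """Return speech mixed with background noise at roughly the requested SNR (dB)."""
    noise = noise[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Fake waveforms just to make the sketch runnable; in practice these would be
# real recordings loaded from audio files.
speech = np.sin(np.linspace(0, 100, 16000))
cafe_noise = np.random.randn(16000) * 0.1
augmented = mix_noise(speech, cafe_noise, snr_db=10.0)
# Checklist: does `augmented` still sound realistic? Can a human still
# transcribe it (clear x -> y mapping)? Does the current model get it wrong?
```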
What’s a data iteration loop in data-centric AI development?
You may have heard of the term model iteration, which refers to iteratively training a model, doing error analysis, and then deciding how to improve the model. When taking a data-centric approach to AI development, it’s sometimes useful to instead use a data iteration loop, where you repeatedly take the data and a fixed model, train your learning algorithm, do error analysis, and, as you go through this loop, focus on how to add data or improve the quality of the data.
For a lot of machine learning problems, the training set and the dev and test sets start out with reasonably similar distributions. But if you’re using data augmentation, you’re adding to specific parts of the training set, such as adding lots of data with cafe noise, so your training set may now come from a very different distribution than the dev set and test set. Is this going to hurt your learning algorithm’s performance?
Usually the answer is no, with some caveats, when you’re working on unstructured data problems.
If you are working on an unstructured data problem, your model is large (such as a neural network with large capacity and low bias), and the mapping from x to y is clear (meaning that, given only the input x, humans can make accurate predictions), then adding accurately labeled data rarely hurts accuracy.