W3-Introduction to Machine Learning in Production Flashcards

1
Q

What can we do to improve label consistency?

A
  • Have multiple labelers label the same examples.
  • When there’s a disagreement, have MLEs or subject matter experts discuss the definition of y to reach an agreement.
  • If labelers believe that x doesn’t contain enough information, consider changing x.
  • Iterate until it’s hard to significantly increase agreement.
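
As a rough illustration of tracking agreement across iterations, here is a minimal sketch. The function names, labeler names, and example labels are hypothetical; simple pairwise agreement is only one of several possible consistency measures.

```python
from itertools import combinations


def agreement_rate(labels_by_labeler):
    """Fraction of (example, labeler-pair) comparisons where the two labels match."""
    matches = 0
    total = 0
    for a, b in combinations(list(labels_by_labeler), 2):
        for ya, yb in zip(labels_by_labeler[a], labels_by_labeler[b]):
            matches += (ya == yb)
            total += 1
    return matches / total if total else 0.0


def disputed_examples(labels_by_labeler):
    """Indices of examples on which the labelers do not all agree."""
    columns = zip(*labels_by_labeler.values())
    return [i for i, col in enumerate(columns) if len(set(col)) > 1]


# Hypothetical labels from three labelers on the same four examples.
labels = {
    "labeler_a": ["scratch", "ok", "scratch", "ok"],
    "labeler_b": ["scratch", "ok", "ok", "ok"],
    "labeler_c": ["scratch", "scratch", "ok", "ok"],
}

print(agreement_rate(labels))     # overall pairwise agreement
print(disputed_examples(labels))  # examples worth discussing
```

The disputed examples are the ones to bring to the discussion about the definition of y; re-running the agreement rate after each revision of the instructions shows whether agreement is still increasing.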
2
Q

We raise Human Level Performance (HLP) by improving ____, and that ultimately results in better learning algorithm performance as well.

A

label consistency

3
Q

When the ground truth label is externally defined, such as by a medical biopsy, HLP gives an estimate of the Bayes error (irreducible error) in predicting the outcome of that medical test.
But there are also a lot of problems where the ground truth is just another human label. True/False

A

True

When the ground truth is relative, that is, just one inspector’s label compared with another inspector’s, it may be more useful to look at why the two inspectors don’t agree.

4
Q

To summarize, when the ground truth label y comes from a human, HLP being quite a bit less than 100 percent may just indicate that ____.

A

the labeling instructions or labeling convention is ambiguous.

5
Q

HLP is not used for structured data at all. True/False

A

False. Structured data problems are less likely to involve human labelers, so HLP is used less frequently there. But there are exceptions.

6
Q

To make sure your system is maintainable, especially when a piece of data upstream ends up needing to be changed, it can be very helpful to do two things, what are they?

A

Keep track of data provenance as well as data lineage.

Meta-data, data provenance and lineage 04:05

Data provenance refers to where the data came from. For example: who did you purchase the spam IP addresses from?

Lineage refers to the sequence of steps needed to get to the end of the pipeline.
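
One way to picture the two ideas together is a small record that carries both. This is only a sketch: the `DatasetRecord` class, the dataset names, the vendor name, and the step names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    name: str
    provenance: str                               # where the data came from
    lineage: list = field(default_factory=list)   # ordered pipeline steps so far

    def derive(self, new_name, step):
        """Return the record for a dataset produced by applying `step` to this one."""
        return DatasetRecord(new_name, self.provenance, self.lineage + [step])


# Hypothetical pipeline: raw vendor data -> de-duplicated -> joined features.
raw = DatasetRecord("spam_ips_raw", provenance="vendor_x (hypothetical)")
clean = raw.derive("spam_ips_dedup", "drop_duplicates")
features = clean.derive("spam_features", "join_with_user_logs")

print(features.provenance)  # still points back to the original source
print(features.lineage)     # the steps needed to reproduce this dataset
```

If the upstream data later has to change, the lineage list shows which steps must be re-run to rebuild everything downstream.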

7
Q

Metadata can be very useful for error analysis: it helps you spot unexpected effects, or tags or categories of data with unusually poor performance, which can suggest how to improve your system. True/False

A

True
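
A minimal sketch of this kind of error analysis follows, assuming each example stores its metadata in a dictionary. The `accuracy_by_tag` helper, the `factory` tag, and the toy examples are hypothetical.

```python
from collections import defaultdict


def accuracy_by_tag(examples, tag_key):
    """Group examples by one metadata field and compute accuracy per group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        tag = ex["meta"][tag_key]
        total[tag] += 1
        correct[tag] += (ex["pred"] == ex["label"])
    return {tag: correct[tag] / total[tag] for tag in total}


# Hypothetical predictions tagged with the factory the image came from.
examples = [
    {"label": 1, "pred": 1, "meta": {"factory": "A"}},
    {"label": 0, "pred": 0, "meta": {"factory": "A"}},
    {"label": 1, "pred": 0, "meta": {"factory": "B"}},
    {"label": 0, "pred": 0, "meta": {"factory": "B"}},
]

print(accuracy_by_tag(examples, "factory"))
```

A tag whose accuracy is well below the rest is a candidate for closer inspection or for collecting more data.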

8
Q

When your dataset is small, having ____ train, dev, and test sets can significantly improve your machine learning development process.

9
Q

What questions do we need to answer in the scoping step of a project?

A

The scoping process should answer three questions: Which project or projects should we work on? What are the metrics for success? What resources (data, time, people) are needed to execute the project?

10
Q

What are the steps of the scoping process?

A

1- Brainstorm business problems (not AI problems)
2- Brainstorm AI solutions
3- Determine feasibility and value of potential solutions
4- Determine milestones
5- Allocate budget resources

11
Q

Before you’ve started on the Machine Learning Project, how do you know if this thing can even be built?

A

One way to get a quick sense of feasibility is to use an external benchmark, such as the research literature or other publications, or evidence that another company, even a competitor, has managed to build it before.

Beyond that, build a two-by-two matrix of cases. One axis is the data type: unstructured data (such as speech or images) versus structured data (such as transaction records). The other axis is new (you’re trying to build a system to do a task for the first time) versus existing (some system, machine learning or not, already carries out the task, and you’re scoping out an improvement to it).
