W3- Machine Learning Data Lifecycle in Production Flashcards

1
Q

In the event of unexpected pipeline behavior or errors, metadata can be leveraged to analyze the lineage of pipeline components and to help you debug issues. True/False

A

True

MLMD helps you understand and analyze all the interconnected parts of your ML pipeline, instead of analyzing them in isolation.
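
A minimal sketch of that kind of lineage query with the MLMD Python client, assuming you already hold a `store` client and the `artifact_id` of a suspicious artifact, and that the pipeline recorded plain INPUT/OUTPUT events:

```python
from ml_metadata.proto import metadata_store_pb2

# Events link artifacts to the executions that consumed or produced them.
events = store.get_events_by_artifact_ids([artifact_id])
producer_ids = [e.execution_id for e in events
                if e.type == metadata_store_pb2.Event.OUTPUT]

# One step back in the lineage: which artifacts fed the producing run?
input_events = store.get_events_by_execution_ids(producer_ids)
parent_ids = [e.artifact_id for e in input_events
              if e.type == metadata_store_pb2.Event.INPUT]

for parent in store.get_artifacts_by_id(parent_ids):
    print(parent.id, parent.type_id, parent.uri)
```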

2
Q

In addition to the executor, where your code runs, each component also includes two additional parts: the driver and the publisher. What do these three parts do?

A

The executor is where the work of the component is done, and that’s what makes different components different. Whatever input the executor needs is provided by the driver, which retrieves it from the metadata store. Finally, the publisher pushes the results of running the executor back into the metadata store. Most of the time, you won’t need to customize the driver or the publisher.
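
If you do customize a component, it is usually the executor you replace. A rough sketch, assuming TFX 1.x where the base class lives in tfx.dsl.components.base (the module path has moved between releases); MyExecutor is a hypothetical name:

```python
from tfx.dsl.components.base import base_executor


class MyExecutor(base_executor.BaseExecutor):
    """Executor: the part that actually does the component's work."""

    def Do(self, input_dict, output_dict, exec_properties):
        # input_dict:  input artifacts, already resolved by the driver
        #              from the metadata store.
        # output_dict: output artifacts; the publisher records them back
        #              into the metadata store after Do() returns.
        # exec_properties: runtime parameters for this run.
        pass  # transform inputs into outputs here
```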

3
Q

What is ML Metadata (MLMD)?

A

MLMD is a library for tracking and retrieving metadata associated with ML developer and data scientist workflows.
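
A minimal sketch of using MLMD on its own with a local SQLite backend (the file name is a placeholder):

```python
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

# Configure a SQLite-backed metadata store; create the file if it is missing.
connection_config = metadata_store_pb2.ConnectionConfig()
connection_config.sqlite.filename_uri = 'metadata.sqlite'
connection_config.sqlite.connection_mode = 3  # READWRITE_OPENCREATE

store = metadata_store.MetadataStore(connection_config)
print(store.get_artifact_types())  # empty on a fresh store
```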

4
Q

MLMD can be used as an integral part of an ML pipeline or it can be used independently. When integrated with an ML pipeline, you have to explicitly interact with MLMD. True/False

A

False. MLMD can be used as an integral part of an ML pipeline, or it can be used independently. However, when integrated with an ML pipeline, you may not even interact with MLMD explicitly.

5
Q

Objects stored in MLMD are referred to as ____. MLMD stores the properties of each artifact in a ____, and stores large objects like datasets on disk, in a file system, or in a block store.

A

artifacts;
relational database
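
To make the split concrete: only the artifact’s properties and URI go into the relational database, while the payload stays wherever the URI points. A sketch assuming a store handle created as in the earlier MLMD sketch; the DataSet type and split property are illustrative:

```python
from ml_metadata.proto import metadata_store_pb2

# Register an artifact type with one typed property.
data_type = metadata_store_pb2.ArtifactType()
data_type.name = 'DataSet'
data_type.properties['split'] = metadata_store_pb2.STRING
data_type_id = store.put_artifact_type(data_type)

# The artifact row stores metadata only; the data itself lives at the URI.
data_artifact = metadata_store_pb2.Artifact()
data_artifact.type_id = data_type_id
data_artifact.uri = '/data/train'  # large object stays on disk
data_artifact.properties['split'].string_value = 'train'
[data_artifact_id] = store.put_artifacts([data_artifact])
```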

6
Q

When you’re working with ML Metadata, you need to know how data flows between successive components. Each step in this data flow is described through an entity that you need to be familiar with. At the highest level, MLMD defines data entities that can be considered as units:

Artifacts
Executions
Contexts

Define each.

A

Artifact: An artifact is data going in as input or generated as output of a component.

Execution: Each execution is a record of any component run during the ML pipeline workflow, along with its associated runtime parameters.

Context: Artifacts and executions can be clustered together for each type of component separately. This grouping is referred to as the context.
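
These three entities map directly onto the MLMD query API. A small sketch, again assuming a store handle:

```python
# Artifacts: data flowing in and out of components.
for artifact in store.get_artifacts():
    print('artifact ', artifact.id, artifact.uri)

# Executions: individual component runs with their runtime parameters.
for execution in store.get_executions():
    print('execution', execution.id, execution.type_id)

# Contexts: groupings of artifacts and executions, e.g. one per pipeline run.
for ctx in store.get_contexts():
    print('context  ', ctx.id, ctx.name)
```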

7
Q

Definition of:
Feature store
Data Warehouse
Data Lake

A

Feature store: Central repository for storing documented, curated, and access-controlled features, specifically for ML
Data Warehouse: Subject-oriented repository for structured data, optimized for fast reads
Data Lake: Repository of data stored in its raw, natural format

8
Q

As data evolves during its life cycle, does addressing “Monitoring model and data provenance” help ML pipelines to operate properly?

C2-W3-Quiz

A

No. Monitoring provenance is an important aspect of ML pipelines, but it will not help in coping with evolving data.
Me: Evolving data basically means the data changes, and those changes need to be addressed; monitoring provenance doesn’t address the challenges created by data evolution.

9
Q

TFX components interact with each other by getting artifact information from the metadata store. True/False

C2-W3-Assignment

A

True
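
A sketch of what this looks like in a notebook, assuming TFX 1.x and a CSV directory at the placeholder path ./data. StatisticsGen never receives files from CsvExampleGen directly; it receives an artifact reference that its driver resolves from the metadata store:

```python
from tfx.components import CsvExampleGen, StatisticsGen
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

# InteractiveContext creates an ephemeral MLMD store behind the scenes.
context = InteractiveContext()

example_gen = CsvExampleGen(input_base='./data')
context.run(example_gen)

# The 'examples' output channel points at Examples artifacts in the metadata store.
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
context.run(statistics_gen)
```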

10
Q

What does ImportSchemaGen do?

C2-W3-Assignment

A

ImportSchemaGen is a TFX component that imports an existing schema file into the pipeline as a Schema artifact, so downstream components can consume it.
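
A short usage sketch, assuming TFX 1.x, a hand-curated schema at the placeholder path schema/schema.pbtxt, and the context and statistics_gen from the previous card’s sketch:

```python
from tfx.components import ExampleValidator, ImportSchemaGen

# Bring the curated schema into the pipeline as a Schema artifact.
schema_gen = ImportSchemaGen(schema_file='schema/schema.pbtxt')
context.run(schema_gen)

# Downstream components can now consume the imported schema.
example_validator = ExampleValidator(
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'])
context.run(example_validator)
```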
