W3- Machine Learning Data Lifecycle in Production Flashcards
In the event of unexpected pipeline behavior or errors, metadata can be leveraged to analyze the lineage of pipeline components and to help you debug issues. True/False
True
MLMD helps you understand and analyze all the interconnected parts of your ML pipeline, instead of analyzing them in isolation.
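For example, here is a minimal lineage-debugging sketch with the MLMD client (the SQLite path and the ‘Examples’ artifact type are assumptions): starting from an artifact, events lead back to the executions that produced or consumed it.

```python
from ml_metadata import metadata_store
from ml_metadata.proto import metadata_store_pb2

# Connect to the pipeline's metadata store (SQLite path is assumed).
config = metadata_store_pb2.ConnectionConfig()
config.sqlite.filename_uri = 'metadata.sqlite'
config.sqlite.connection_mode = 3  # READWRITE_OPENCREATE
store = metadata_store.MetadataStore(config)

# Trace one Examples artifact (assumes the pipeline has already run).
artifact = store.get_artifacts_by_type('Examples')[0]

# Events connect artifacts to the executions that produced/consumed them.
events = store.get_events_by_artifact_ids([artifact.id])
execution_ids = {event.execution_id for event in events}
for execution in store.get_executions_by_id(execution_ids):
    print(execution.id, execution.type_id, execution.last_known_state)
```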
In addition to the executor, where your code runs, each component includes two additional parts: the driver and the publisher. What do these three parts do?
The executor is where the work of the component is done, and it’s what makes different components different. Whatever input the executor needs is provided by the driver, which fetches it from the metadata store. Finally, the publisher pushes the results of running the executor back into the metadata store. Most of the time, you won’t need to customize the driver or publisher.
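For intuition, a sketch using TFX’s Python function component API, where you write only the executor body and TFX supplies the driver and publisher (the component name and logic here are made up):

```python
from tfx import v1 as tfx

# You write only the executor logic. When this runs in a pipeline, the
# framework-supplied driver resolves `examples` from the metadata store,
# and the publisher records `copy` back into it.
@tfx.dsl.components.component
def ExamplePassThrough(
    examples: tfx.dsl.components.InputArtifact[
        tfx.types.standard_artifacts.Examples],
    copy: tfx.dsl.components.OutputArtifact[
        tfx.types.standard_artifacts.Examples],
):
    # Executor body: read from examples.uri, write under copy.uri.
    import shutil
    shutil.copytree(examples.uri, copy.uri, dirs_exist_ok=True)
```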
What’s MLMD?
MLMD is a library for tracking and retrieving metadata associated with ML developer and data scientist workflows.
MLMD can be used as an integral part of an ML pipeline or it can be used independently. When integrated with an ML pipeline, you have to explicitly interact with MLMD. True/False
False. MLMD can be used as an integral part of an ML pipeline, or it can be used independently. However, when integrated with an ML pipeline, you may not even interact with MLMD explicitly.
Objects which are stored in MLMD are referred to as ____. MLMD stores the properties of each artifact in a ____ and stores large objects like datasets on disk, in a file system, or in a block store.
artifacts
relational database
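A sketch of this split, following the MLMD getting-started pattern (the type name, property, and URI are illustrative): the artifact’s properties land in the database, while the payload stays at a URI on disk.

```python
from ml_metadata import metadata_store
from ml_metadata.proto import metadata_store_pb2

config = metadata_store_pb2.ConnectionConfig()
config.sqlite.filename_uri = 'metadata.sqlite'
config.sqlite.connection_mode = 3  # READWRITE_OPENCREATE
store = metadata_store.MetadataStore(config)

# Register an artifact type with a typed property.
data_type = metadata_store_pb2.ArtifactType()
data_type.name = 'DataSet'
data_type.properties['split'] = metadata_store_pb2.STRING
data_type_id = store.put_artifact_type(data_type)

# The artifact row holds properties and a URI; the dataset itself
# lives at that URI on disk, not inside the database.
artifact = metadata_store_pb2.Artifact()
artifact.type_id = data_type_id
artifact.uri = 'path/to/dataset'  # assumed location
artifact.properties['split'].string_value = 'train'
[artifact_id] = store.put_artifacts([artifact])
```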
When you’re working with ML metadata, you need to know how data flows between successive components. Each step in this data flow is described through an entity that you need to be familiar with. At the highest level of MLMD, there are three data entities that can be considered as units:
Artifacts
Executions
Contexts
Define each.
An artifact is data going in as input or generated as output of a component.
Each execution is a record of any component run during the ML pipeline workflow, along with its associated runtime parameters.
Artifacts and executions can be clustered together for each type of component separately. This grouping is referred to as the context.
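Continuing in the same vein, a sketch (all names illustrative) that records an execution and groups it under a context:

```python
from ml_metadata import metadata_store
from ml_metadata.proto import metadata_store_pb2

config = metadata_store_pb2.ConnectionConfig()
config.sqlite.filename_uri = 'metadata.sqlite'
config.sqlite.connection_mode = 3  # READWRITE_OPENCREATE
store = metadata_store.MetadataStore(config)

# Execution: a record of a component run with runtime properties.
trainer_type = metadata_store_pb2.ExecutionType()
trainer_type.name = 'Trainer'
trainer_type.properties['state'] = metadata_store_pb2.STRING
trainer_type_id = store.put_execution_type(trainer_type)

execution = metadata_store_pb2.Execution()
execution.type_id = trainer_type_id
execution.properties['state'].string_value = 'RUNNING'
[execution_id] = store.put_executions([execution])

# Context: groups artifacts and executions, e.g. one experiment run.
experiment_type = metadata_store_pb2.ContextType()
experiment_type.name = 'Experiment'
experiment_type_id = store.put_context_type(experiment_type)

experiment = metadata_store_pb2.Context()
experiment.type_id = experiment_type_id
experiment.name = 'exp-1'  # context names must be unique per type
[experiment_id] = store.put_contexts([experiment])

# Associate the execution with the context.
association = metadata_store_pb2.Association()
association.execution_id = execution_id
association.context_id = experiment_id
store.put_attributions_and_associations([], [association])
```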
Definition of:
Feature store
Data Warehouse
Data Lake
Feature store: Central repository for storing documented, curated, and access-controlled features, specifically for ML
Data Warehouse: Subject-oriented repository for structured data, optimized for fast read
Data Lake: Repository of data stored in its raw, natural format
As data evolves during its life cycle, does addressing “Monitoring model and data provenance” help ML pipelines to operate properly?
C2-W3-Quiz
No. Monitoring provenance is an important aspect of ML pipelines, but it will not help in coping with evolving data.
Me: Evolving data means the data itself changes, and those changes need to be addressed; monitoring provenance doesn’t address any of the challenges created by data evolution.
TFX components interact with each other by getting artifact information from the metadata store. True/False
C2-W3-Assignment
True
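For instance, a minimal sketch (the input path is assumed): StatisticsGen consumes the examples channel of CsvExampleGen, and at run time that channel is resolved to concrete artifacts through the metadata store.

```python
from tfx import v1 as tfx

# Two standard TFX components. The `examples` channel below is resolved
# through the metadata store at run time: StatisticsGen looks up the
# Examples artifacts that CsvExampleGen published, rather than
# receiving them directly.
example_gen = tfx.components.CsvExampleGen(input_base='data/')
statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs['examples'])
```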
What does ImportSchemaGen do?
C2-W3-Assignment
ImportSchemaGen is a TFX component to import a schema file into the pipeline.
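A minimal usage sketch (the schema path is an assumption):

```python
from tfx import v1 as tfx

# Import a hand-curated schema file as a Schema artifact that
# downstream components (e.g. ExampleValidator) can consume.
schema_importer = tfx.components.ImportSchemaGen(
    schema_file='path/to/schema.pbtxt')  # assumed path
```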