Feature Store Flashcards
Online feature store
Low-latency key-value (KV) store that holds the latest values of precomputed features
Online = latest
Offline = historical
Offline feature store
Holds all historical values of features to be used for training and batch inference
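The split between "latest" and "historical" can be seen in a toy in-memory model. This is a plain-Python sketch, not Tecton code. The class names, the `orders_7d` feature, and the `user_42` key are hypothetical. Real deployments use a low-latency database for the online store and a warehouse or data lake for the offline store, not dicts.

```python
from collections import defaultdict


class OfflineStore:
    """Append-only history: every (timestamp, value) ever written per key."""

    def __init__(self):
        self._rows = defaultdict(list)  # (entity_id, feature) -> [(ts, value), ...]

    def write(self, entity_id, feature, ts, value):
        self._rows[(entity_id, feature)].append((ts, value))

    def history(self, entity_id, feature):
        return sorted(self._rows[(entity_id, feature)])


class OnlineStore:
    """Key-value store keeping only the most recent value per key."""

    def __init__(self):
        self._latest = {}  # (entity_id, feature) -> (ts, value)

    def write(self, entity_id, feature, ts, value):
        current = self._latest.get((entity_id, feature))
        # Ignore out-of-order writes that are older than the stored value.
        if current is None or ts >= current[0]:
            self._latest[(entity_id, feature)] = (ts, value)

    def get(self, entity_id, feature):
        entry = self._latest.get((entity_id, feature))
        return None if entry is None else entry[1]


online, offline = OnlineStore(), OfflineStore()
for ts, count in [(1, 3), (2, 5), (3, 4)]:
    online.write("user_42", "orders_7d", ts, count)
    offline.write("user_42", "orders_7d", ts, count)

print(online.get("user_42", "orders_7d"))       # 4
print(offline.history("user_42", "orders_7d"))  # [(1, 3), (2, 5), (3, 4)]
```

The online store answers "what is the value now?" with a single lookup, while the offline store keeps every value so training data can be rebuilt as of any past time.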
Batch inference
aka offline inference
Generate many predictions all at once
Example: Netflix recommendation system. If recommendations are generated in batch each night, a new user will not see personally tailored recommendations until after the next nightly run.
Online inference
Real-time inference
Dynamic inference
Generate predictions in real time upon request
Can generate predictions for never-before-seen data (new users)
Example: DoorDash estimated delivery time. It is computed when the order is placed, not by a batch job that ran the night before!
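The cold-start difference between batch and online inference can be sketched in a few lines. This is an illustrative toy, not Netflix's system: `model`, the user names, and the genre-based profile are all hypothetical, and "nightly" is simulated by running the batch step once up front.

```python
# Hypothetical stand-in for a trained model: recommends from a user profile.
def model(user_profile):
    # Recommend genres the user has watched, else fall back to generic picks.
    watched = user_profile.get("genres", [])
    return watched[:2] if watched else ["popular", "trending"]


profiles = {"alice": {"genres": ["drama", "comedy", "sci-fi"]}}

# Batch (offline) inference: score every known user once (e.g. nightly)
# and store the results for later lookup.
nightly_predictions = {user: model(p) for user, p in profiles.items()}


def serve_batch(user_id):
    # Users who signed up after the batch run have no precomputed entry.
    return nightly_predictions.get(user_id)


def serve_online(user_id, request_profile):
    # Online inference: call the model at request time with fresh input,
    # so a brand-new user still gets a tailored prediction.
    return model(request_profile)


print(serve_batch("alice"))                         # ['drama', 'comedy']
print(serve_batch("bob"))                           # None (new user)
print(serve_online("bob", {"genres": ["horror"]}))  # ['horror']
```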
Materialization
Process of precomputing feature data by executing a feature pipeline and publishing the results to the online and offline feature stores
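A minimal sketch of one materialization run, assuming a hypothetical daily "total order amount per user" feature and plain Python containers in place of real stores. It shows the key idea: a single pipeline run writes the same computed values to both stores.

```python
from datetime import date

# Hypothetical raw event data: (user_id, day, order_amount).
raw_orders = [
    ("u1", date(2024, 1, 1), 20.0),
    ("u1", date(2024, 1, 2), 35.0),
    ("u2", date(2024, 1, 2), 12.5),
]

offline_store = []  # append-only rows: (user_id, day, feature_value)
online_store = {}   # user_id -> latest feature_value


def feature_pipeline(rows, day):
    """Hypothetical transformation: total order amount per user on `day`."""
    totals = {}
    for user_id, d, amount in rows:
        if d == day:
            totals[user_id] = totals.get(user_id, 0.0) + amount
    return totals


def materialize(day):
    """Run the pipeline for one day and publish to both stores.

    Assumes days are materialized in chronological order, so overwriting
    the online entry always leaves the latest value in place.
    """
    for user_id, value in feature_pipeline(raw_orders, day).items():
        offline_store.append((user_id, day, value))  # keep full history
        online_store[user_id] = value                # overwrite with latest


materialize(date(2024, 1, 1))
materialize(date(2024, 1, 2))
print(online_store)        # {'u1': 35.0, 'u2': 12.5}
print(len(offline_store))  # 3
```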
What is a Tecton feature view?
A feature view defines one or more features whose values are generated when the feature view’s transformations run
Tecton entity
A collection of join keys used when multiple features are joined together
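To make "join keys" concrete, here is a plain-Python sketch of two feature sets that share an entity being joined on that entity's keys. It does not use the Tecton SDK: the `USER_ENTITY` dict, the feature names, and `join_on_entity` are hypothetical illustrations of the concept only.

```python
# Hypothetical entity: a named set of join keys.
USER_ENTITY = {"name": "user", "join_keys": ["user_id"]}

# Two hypothetical feature sets keyed by the same entity.
orders_features = [{"user_id": "u1", "orders_7d": 4}, {"user_id": "u2", "orders_7d": 1}]
profile_features = [{"user_id": "u1", "account_age_days": 120}]


def join_on_entity(entity, left, right):
    """Left-join two lists of feature rows on the entity's join keys."""
    keys = entity["join_keys"]
    index = {tuple(r[k] for k in keys): r for r in right}
    joined = []
    for row in left:
        # Rows with no match on the right keep only their own features.
        match = index.get(tuple(row[k] for k in keys), {})
        joined.append({**row, **match})
    return joined


for row in join_on_entity(USER_ENTITY, orders_features, profile_features):
    print(row)
# {'user_id': 'u1', 'orders_7d': 4, 'account_age_days': 120}
# {'user_id': 'u2', 'orders_7d': 1}
```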
What are the benefits of a feature store?
Uses one feature definition for both training and serving, which avoids training/serving skew
Reuse features across models (feature discovery)
Manage feature lineage and versioning
Orchestration of feature compute
Storage of features
What triggers rematerialization of feature values?
Changes to pipelines (transforms and entities)
Feature Store vs Feature Platform
Feature stores typically store and serve features (meaning they have an API for low-latency retrieval)
A feature platform also includes things like defining, testing, orchestrating, monitoring, and managing features
Tecton workspace
Cloud environment where the Tecton feature repository is applied to update the workspace configuration
Live workspaces materialize feature data and are intended for serving
Development workspaces do not materialize feature data
Spine
DataFrame of join keys and timestamps (optionally with extra columns such as labels) that identifies which historical feature values to read from the offline store, e.g. when building a training dataset
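A spine drives a point-in-time lookup: for each row, fetch the feature value as it was at that row's timestamp, so no future data leaks into training. The sketch below uses plain lists instead of a DataFrame and assumes "as of" means the latest value with timestamp at or before the spine timestamp; the data, the `clicks_total` feature, and `point_in_time_lookup` are hypothetical.

```python
import bisect

# Hypothetical offline store history: user_id -> [(timestamp, value)], sorted by time.
offline_history = {
    "u1": [(10, 1), (20, 3), (30, 6)],
    "u2": [(15, 2)],
}

# Spine: the rows we want features for (join key + timestamp + label).
spine = [
    {"user_id": "u1", "ts": 25, "label": 1},
    {"user_id": "u2", "ts": 12, "label": 0},
    {"user_id": "u1", "ts": 30, "label": 0},
]


def point_in_time_lookup(history, ts):
    """Return the latest value with timestamp <= ts, or None if none exists yet."""
    times = [t for t, _ in history]
    i = bisect.bisect_right(times, ts)
    return history[i - 1][1] if i > 0 else None


training_rows = [
    {**row, "clicks_total": point_in_time_lookup(offline_history.get(row["user_id"], []), row["ts"])}
    for row in spine
]
for r in training_rows:
    print(r)
```

Note that `u2` gets `None` at `ts=12` because its first value only appears at `ts=15`; using that later value would be label leakage.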
Batch Feature View
Reads from a batch data source and materializes features on a schedule
Stream Feature View
Transforms a stream data source and materializes features in near real time
On Demand Feature View
Runs transformations at request time on request data and/or features from batch or stream feature views
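The request-time idea can be sketched without the Tecton SDK: a transformation runs per request, combining a value only known at request time with a precomputed feature read from the online store. The feature names, `transaction_vs_average`, and `get_features` are hypothetical, and a dict stands in for the online store.

```python
# Hypothetical precomputed feature, as if materialized by a batch feature view.
online_store = {"u1": {"avg_transaction_amount_30d": 50.0}}


def transaction_vs_average(request, precomputed):
    """Hypothetical on-demand transformation: runs per request, never materialized.

    Combines the current transaction amount (request data) with a
    precomputed feature from the online store.
    """
    avg = precomputed.get("avg_transaction_amount_30d")
    if not avg:  # missing or zero average: ratio is undefined
        return {"amount_ratio": None}
    return {"amount_ratio": request["amount"] / avg}


def get_features(user_id, request):
    precomputed = online_store.get(user_id, {})
    return {**precomputed, **transaction_vs_average(request, precomputed)}


print(get_features("u1", {"amount": 125.0}))
# {'avg_transaction_amount_30d': 50.0, 'amount_ratio': 2.5}
print(get_features("u9", {"amount": 10.0}))
# {'amount_ratio': None}
```

Because the ratio depends on the live transaction amount, it cannot be precomputed on a schedule; only its input (the 30-day average) is materialized ahead of time.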