Lecture 4: Data Mesh Flashcards
Reasons for Data Mesh
- Fail to bootstrap
- Fail to scale resources
- Fail to scale consumers
- Fail to materialize data-driven value
Operational Data Plane
- Running the business
- Serving the users
Analytical Data Plane
- Optimising the business
- Improving the user experience
Timeline
Data Warehouse -> Data Lake -> Multi-model data architecture on cloud
Easiest way to scale a monolithic solution
- Technical decomposition
- Ingest, process, serve
- This leads to teams that each own a single stage, so a change cannot be made locally and has to cross team boundaries
Data Engineers in Monolithic Architecture
- Data engineers are siloed inside the monolithic piece in the middle, which is built and run by highly specialized people
- This leads to disconnected execution: work starts at one end and never finishes
Monolithic architecture
- Centralised Architecture
- Hyper-specialized Silo Delivery
- Disconnected execution
Data Mesh principles
- Domain-driven Data Ownership Architecture
- Data as a product
- Self-service Infrastructure as a Platform
- Federated Computational Governance
Domain-driven Data Ownership Architecture
- Bring ownership to the domains
Flowing data -> Serving data at source
One canonical model -> Multiple models
Source of truth -> Most relevant copy
Pipeline as a first-class concern -> Pipeline as the domain's internal implementation
Technology-driven composition -> Domain-oriented distribution
Data as a product
- Ownership comes with accountability and responsibility of sharing that data as a product
- Discoverability & Understanding
Data as an asset -> Data as a product
Byproduct -> Product
Data as an input of compute -> Data and compute as one unit
Self-service Infrastructure as a Platform
- Addressing the cost of ownership -> Enable autonomy
- Self-serve platform empowers the data domain teams to create and share valuable and useful data with less overhead
Big platforms -> Protocols
Specialists -> Generalists
Imperative -> Declarative
Mechanisms -> Experiences
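The "Imperative -> Declarative" shift can be illustrated with a minimal sketch: the domain team only declares what its data product should look like, and the platform derives the provisioning steps. The names `DataProductSpec` and `plan`, and all field names, are hypothetical and chosen for illustration only; they do not belong to any real platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProductSpec:
    """Hypothetical declarative description written by a domain team."""
    name: str
    domain: str
    output_format: str = "parquet"
    retention_days: int = 30
    tags: tuple = ()


def plan(spec: DataProductSpec) -> list[str]:
    """Hypothetical platform side: derive provisioning steps from the spec."""
    steps = [
        f"create storage for {spec.domain}/{spec.name} ({spec.output_format})",
        f"set retention to {spec.retention_days} days",
        f"register {spec.domain}/{spec.name} in the catalogue",
    ]
    if spec.tags:
        steps.append("attach tags: " + ", ".join(spec.tags))
    return steps


# The domain team states the desired outcome, not the steps to reach it.
orders = DataProductSpec(name="daily_orders", domain="sales", tags=("pii",))
for step in plan(orders):
    print(step)
```

The domain team never writes the provisioning steps itself, which is one way to read "Specialists -> Generalists" on the same card.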
Federated computational governance
- Enable interoperability
- Enable ecosystems to play
Centralised team of experts -> Federated team
Responsibility for data quality -> Responsibility for defining aspects of security
Responsibility for canonical data modelling -> Responsibility for modelling polysemes
Measure success based on data volume -> Measure success based on value generated
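One possible reading of "computational" governance is that global rules are expressed as code and checked automatically against every data product, while the domains stay free in everything the rules do not cover. The sketch below assumes a data product is described by a plain dictionary; the policy functions, the field names `owner` and `sensitivity`, and the allowed sensitivity levels are all hypothetical.

```python
from typing import Callable, Optional

# A policy inspects a data product descriptor and returns
# a violation message, or None when the product complies.
Policy = Callable[[dict], Optional[str]]


def has_owner(product: dict) -> Optional[str]:
    return None if product.get("owner") else "missing owner"


def declares_sensitivity(product: dict) -> Optional[str]:
    allowed = {"public", "internal", "confidential"}  # assumed levels
    if product.get("sensitivity") in allowed:
        return None
    return "sensitivity level not declared"


# Agreed by the federated team, applied to every domain's products.
GLOBAL_POLICIES: list[Policy] = [has_owner, declares_sensitivity]


def check(product: dict) -> list[str]:
    """Run all global policies and collect the violations."""
    violations = []
    for policy in GLOBAL_POLICIES:
        message = policy(product)
        if message is not None:
            violations.append(message)
    return violations


print(check({"name": "daily_orders", "owner": "sales", "sensitivity": "internal"}))
print(check({"name": "clickstream"}))
```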
What kind of shift is Data Mesh?
Data Mesh is a paradigm shift in architecture that brings the operational and analytical planes together and connects them with a feedback loop
Challenging task from the governance perspective
Balancing global governance (central authority) with local governance (autonomous components)
Enforcing GDPR compliance of data processing across Data Mesh
A data product should implement a set of mechanisms (e.g., checking access tokens) that make its data processing GDPR-aware, and provide a set of policy enforcement points for configuring those mechanisms. These capabilities should be part of the management or control interface (port) of the data product.
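The card above can be sketched as a data product with two ports: a control port where a policy is configured, and an output port that enforces it on every read. This is a minimal illustration, not a compliance implementation. The class and method names, the purpose-based policy, and the token check (a dictionary lookup standing in for real token validation) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical policy set through the control port."""
    allowed_purposes: set = field(default_factory=set)
    masked_fields: set = field(default_factory=set)


class DataProduct:
    def __init__(self, records, valid_tokens):
        self._records = records
        self._valid_tokens = valid_tokens  # token -> declared purpose
        self._policy = AccessPolicy()      # denies everything by default

    def configure(self, policy: AccessPolicy) -> None:
        """Control (management) port: set the policy to enforce."""
        self._policy = policy

    def read(self, token: str) -> list:
        """Output port: the policy enforcement point for every read."""
        purpose = self._valid_tokens.get(token)
        if purpose is None:
            raise PermissionError("unknown or missing access token")
        if purpose not in self._policy.allowed_purposes:
            raise PermissionError(f"purpose '{purpose}' is not permitted")
        # Mask the configured fields before data leaves the product.
        return [
            {k: ("***" if k in self._policy.masked_fields else v)
             for k, v in row.items()}
            for row in self._records
        ]


product = DataProduct(
    records=[{"customer": "Alice", "country": "DE", "total": 42}],
    valid_tokens={"tok-analytics": "analytics", "tok-ads": "marketing"},
)
product.configure(
    AccessPolicy(allowed_purposes={"analytics"}, masked_fields={"customer"})
)
print(product.read("tok-analytics"))
# prints [{'customer': '***', 'country': 'DE', 'total': 42}]
```

A read with `"tok-ads"` raises `PermissionError`, because the purpose "marketing" is not in the configured policy. Keeping the policy behind `configure` lets the rules change without touching the read path.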