Lecture 4: Data Mesh Flashcards

1
Q

Reasons for Data Mesh

A
  • Fail to bootstrap
  • Fail to scale resources
  • Fail to scale consumers
  • Fail to materialize data-driven value
2
Q

Operational Data Plane

A
  • Running the business
  • Serving the users

3
Q

Analytical Data Plane

A
  • Optimising the business
  • Improving the user experience

4
Q

Timeline

A

Data Warehouse -> Data Lake -> Multi-model data architecture on cloud

5
Q

Easiest way to scale a monolithic solution

A
  • Technical decomposition
  • Ingest, process, serve
  • This leads to teams that are each responsible for a single task, so a change never stays local to one team
6
Q

Data Engineers in Monolithic Architecture

A
  • Data engineers are siloed and expected to work inside the monolithic piece in the middle, which is built by highly specialized people
  • This leads to disconnected execution: work starts at one end and never finishes
7
Q

Monolithic architecture

A
  • Centralised Architecture
  • Hyper-specialized Silo Delivery
  • Disconnected execution
8
Q

Data Mesh principles

A
  • Domain-driven Data Ownership Architecture
  • Data as a product
  • Self-service Infrastructure as a Platform
  • Federated Computational Governance
9
Q

Domain-driven Data Ownership Architecture

A
  • Bring ownership to the domains

Flowing data -> Serving data at source
One canonical model -> Multiple models
Source of truth -> Most relevant copy
Pipeline as first concern -> Pipeline as a domain's internal implementation
Technology-driven composition -> Domain-oriented distribution

10
Q

Data as a product

A
  • Ownership comes with accountability and responsibility of sharing that data as a product
  • Discoverability & Understanding

Data as an asset -> Data as a product
Byproduct -> Product
Data as an input of compute -> Data and compute as one unit

11
Q

Self-service Infrastructure as a Platform

A
  • Addressing the cost of ownership -> Enable autonomy
  • Self-serve platform empowers the data domain teams to create and share valuable and useful data with less overhead

Big platforms -> Protocols
Specialists -> Generalists
Imperative -> Declarative
Mechanisms -> Experiences

12
Q

Federated computational governance

A
  • Enable interoperability
  • Enable ecosystems to play

Centralised team of experts -> federated team
Responsibility for data quality -> Responsibility for defining aspects of security
Responsibility for canonical data modelling -> Responsibility for modelling polysemes
Measure success based on data volume -> Measure success based on value generated

13
Q

What kind of shift is Data Mesh?

A

Data Mesh is a paradigm shift in architecture that brings the operational and analytical planes together with a feedback loop

14
Q

Challenging task seen from the Governance Perspective

A

Balance between global (central authority) and local governance (autonomous components)

15
Q

Enforcing GDPR compliance of data processing across Data Mesh

A

A data product should implement a set of mechanisms (e.g., checking an access token) that add GDPR-aware processing of data, and provide a set of policy enforcement points for configuring those mechanisms. These capabilities should be part of the management or control interface (port) of the data product.
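
Example (a minimal sketch, not a reference implementation): the idea of a policy enforcement point on a control port could look roughly like the Python below. All names here (AccessToken, DataProduct, configure_policy, read) are hypothetical and chosen for illustration; masking is only one possible GDPR-aware mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class AccessToken:
    """Hypothetical token: who is asking and for which processing purposes."""
    subject: str
    purposes: frozenset


@dataclass
class DataProduct:
    """Toy data product with a data port (read) and a control port (configure_policy)."""
    records: list
    # Policy set through the control port: purposes that may read the data.
    allowed_purposes: set = field(default_factory=set)
    # Fields treated as personal data and masked on output.
    personal_fields: set = field(default_factory=set)

    def configure_policy(self, allowed_purposes, personal_fields):
        """Control port: governance configures the policy, the product enforces it."""
        self.allowed_purposes = set(allowed_purposes)
        self.personal_fields = set(personal_fields)

    def read(self, token, purpose):
        """Data port: the policy enforcement point is checked on every read."""
        if purpose not in token.purposes or purpose not in self.allowed_purposes:
            raise PermissionError(f"purpose {purpose!r} is not permitted for {token.subject}")
        # Mask personal fields so consumers only see non-personal attributes.
        return [
            {k: ("***" if k in self.personal_fields else v) for k, v in row.items()}
            for row in self.records
        ]


product = DataProduct(records=[{"email": "a@example.com", "country": "NL"}])
product.configure_policy(allowed_purposes={"analytics"}, personal_fields={"email"})
token = AccessToken(subject="marketing-team", purposes=frozenset({"analytics"}))
print(product.read(token, "analytics"))  # [{'email': '***', 'country': 'NL'}]
```

Keeping the policy separate from the read path means the policy can be changed through the control port without touching the domain team's data logic.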

16
Q

Managed Spark service should provide .. to the domain team

A
  1. Programmatic API
  2. Multi-tenancy
  3. Support Generalists
17
Q

Standardization efforts

A
  • Data product interface (OpenAPI)
  • Data Query Language (SQL)
  • Data Modeling (ER)
18
Q

Cross-cutting policies that the federated governance system should enforce globally

A
  1. Data privacy and protection
  2. Data Quality
  3. Data access control and audit
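
Example (a rough sketch under stated assumptions): "computational" governance suggests that such global policies can be expressed as code and checked automatically against every data product. The descriptor fields, the policy functions, and the 0.95 completeness threshold below are all hypothetical, picked only to mirror the three policies above.

```python
from dataclasses import dataclass


@dataclass
class ProductDescriptor:
    """Hypothetical metadata a domain team publishes for its data product."""
    name: str
    pii_fields_masked: bool
    completeness: float       # share of non-null values, between 0 and 1
    access_log_enabled: bool


# Global policies: each returns a violation message, or None when the product complies.
def privacy_policy(p):
    return None if p.pii_fields_masked else "personal data is not masked"


def quality_policy(p, min_completeness=0.95):  # threshold is an assumed example value
    return None if p.completeness >= min_completeness else "completeness below threshold"


def audit_policy(p):
    return None if p.access_log_enabled else "access logging is disabled"


GLOBAL_POLICIES = [privacy_policy, quality_policy, audit_policy]


def check(product):
    """Run every global policy and collect the violations."""
    return [msg for policy in GLOBAL_POLICIES if (msg := policy(product)) is not None]


orders = ProductDescriptor("orders", pii_fields_masked=True,
                           completeness=0.91, access_log_enabled=True)
print(check(orders))  # ['completeness below threshold']
```

The policies are defined once at the global level, while each domain stays free to decide how its product satisfies them.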
19
Q

Data Mesh Approach

A
  • Cultural change
  • Alignment across operational and analytic data domains
  • A distributed architecture for on-prem and multi-cloud data