Deep Dive - Change Data Capture Flashcards

1
Q

What is Change Data Capture (CDC)?

A

CDC reads a database's transaction log and streams each change, as it happens, to a downstream target such as Kafka, Kinesis, or another database. It is used to share database changes with downstream apps or datastores. It improves on repeatedly running batch queries against the same database, which adds load and only sees changes at the next poll.

For MySQL, the transaction log is called binlog (Binary Log). It is an append-only log that records all data-manipulating operations (INSERT, UPDATE, DELETE).

A CDC process monitors the binlog and propagates the changes to downstream systems (Kafka, another DB).
Debezium is an open-source tool that does this.

CDC is good for tracking DELETEs, which are often hard to detect by querying the original DB, because a deleted row simply no longer appears in the results.
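
To make the DELETE point concrete, here is a minimal sketch of a downstream consumer applying change events to a replica. The event shape ("op", "before", "after") is loosely modeled on Debezium's change event envelope; the "id"/"name" row layout and the apply_change helper are hypothetical.

```python
# Simplified change events, loosely modeled on Debezium's envelope.
# The row shape ("id", "name") is an assumption for illustration.
def apply_change(replica, event):
    """Apply one change event to an in-memory replica keyed by primary key."""
    op = event["op"]
    if op in ("c", "u"):          # create or update: upsert the new row image
        row = event["after"]
        replica[row["id"]] = row
    elif op == "d":               # delete: only the old row image carries the key
        replica.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown op: {op!r}")
    return replica

events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "ada"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "bob"}},
    {"op": "u", "before": {"id": 1, "name": "ada"}, "after": {"id": 1, "name": "Ada"}},
    {"op": "d", "before": {"id": 2, "name": "bob"}, "after": None},
]

replica = {}
for event in events:
    apply_change(replica, event)
print(replica)
```

The delete arrives as an explicit event, so the replica can drop row 2. A batch query against the source would just return one row fewer, with nothing to say which row went away.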

2
Q

What are some use cases for CDC?

A
  1. Data replication for analytics. An Insights team may want to query data about sales, cart views, and other events without loading the primary DB. CDC can replicate changes from the primary DB to a separate analytics DB.
  2. Event-driven triggers (the ChatGPT question you were asked). Only call the prompt service when the Prompts table has a new entry.
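
The second use case can be sketched as a filter over the change stream: react only to inserts on one table. This is a rough illustration, not a real connector; the "table" field, the new_prompt_rows helper, and call_prompt_service are all hypothetical names.

```python
def new_prompt_rows(events, table="prompts"):
    """Yield the inserted row from each create event on the given table."""
    for event in events:
        if event["table"] == table and event["op"] == "c":
            yield event["after"]

calls = []

def call_prompt_service(row):
    # Hypothetical stand-in for the real prompt service call.
    calls.append(row["id"])

stream = [
    {"table": "prompts", "op": "c", "after": {"id": 10, "text": "hello"}},
    {"table": "users",   "op": "c", "after": {"id": 7}},                    # other table: ignored
    {"table": "prompts", "op": "u", "after": {"id": 10, "text": "hello!"}}, # update: ignored
    {"table": "prompts", "op": "c", "after": {"id": 11, "text": "world"}},
]

for row in new_prompt_rows(stream):
    call_prompt_service(row)
print(calls)
```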
3
Q

How fast are CDC solutions?

A

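Near real-time. Log-based CDC can typically deliver a change downstream within a few milliseconds of the commit, though end-to-end latency depends on the connector, the network, and the destination.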

4
Q

What are some limitations of log based CDC?

A

Depending on the database and CDC tool, some operations (e.g. ALTER, TRUNCATE) may not be captured as row-level changes in the transaction log.

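If the destination DB (or the CDC consumer) is down, the source must retain its transaction log until the consumer catches up. If the log is purged first, the missed changes are lost and the destination needs a fresh snapshot. Retaining the log for a long outage also uses disk space on the source.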
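
The retention issue can be illustrated with a toy log that keeps only its newest entries. This is a sketch under stated assumptions: RetainedLog and its integer positions are hypothetical and do not reflect how any particular database tracks or purges its log.

```python
from collections import deque

class RetainedLog:
    """Append-only log that keeps only the newest `retention` entries."""

    def __init__(self, retention):
        self.entries = deque(maxlen=retention)  # older entries fall off the front
        self.next_pos = 0

    def append(self, change):
        self.entries.append((self.next_pos, change))
        self.next_pos += 1

    def read_from(self, pos):
        """Return changes at positions >= pos, or raise if they were purged."""
        oldest = self.entries[0][0] if self.entries else self.next_pos
        if pos < oldest:
            raise LookupError(f"position {pos} purged; oldest retained is {oldest}")
        return [change for p, change in self.entries if p >= pos]

log = RetainedLog(retention=3)
for i in range(5):
    log.append(f"change-{i}")

caught_up = log.read_from(3)      # consumer that was only briefly behind
try:
    log.read_from(0)              # consumer that was down too long
    needs_resnapshot = False
except LookupError:
    needs_resnapshot = True
print(caught_up, needs_resnapshot)
```

A consumer that resumes from a position still in the log loses nothing. One that resumes from a purged position cannot replay the gap and has to rebuild from a full snapshot.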
