3_Pub/Sub Flashcards
1
Q
Tightly-Coupled System
Tightly (directly) coupled systems are more likely to fail.
A
2
Q
Loosely-Coupled System
Loosely coupled systems with a ‘buffer’ between components scale better and have better fault tolerance.
A
3
Q
What is Cloud Pub/Sub?
- Global-scale messaging buffer/coupler.
- Serverless, NoOps (fully managed), global availability, auto-scaling.
- Decouples senders and receivers.
- Real-time or batch.
- 500 million messages per second
- 1TB/s of data
A
4
Q
Pub/Sub Terminology
Topics, Messages, Publishers, Subscribers, Message Store
Message data is base64-encoded and must be 10 MB or less
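A minimal sketch of the encoding and size rule above, using only the standard library. It builds the JSON body shape used by the REST publish call (a `messages` list whose entries carry base64 `data` and optional `attributes`). The helper name `build_publish_body` and the reading of the limit as 10,000,000 bytes are assumptions for illustration.

```python
import base64
import json

# Assumption: the size limit on this card is read as 10,000,000 bytes.
MAX_MESSAGE_BYTES = 10 * 1000 * 1000


def build_publish_body(payload: bytes, attributes=None) -> str:
    """Return a JSON publish body with the payload base64-encoded.

    Hypothetical helper; it only prepares the body and sends nothing.
    """
    if len(payload) > MAX_MESSAGE_BYTES:
        raise ValueError("payload exceeds the message size limit")
    message = {"data": base64.b64encode(payload).decode("ascii")}
    if attributes:
        message["attributes"] = attributes
    return json.dumps({"messages": [message]})
```

Usage: `build_publish_body(b"hello", {"source": "sensor-1"})` returns a JSON string whose `data` field decodes back to `b"hello"`.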
A
5
Q
Push and Pull
- Pub/Sub can either push messages to subscribers, or subscribers can pull messages from Pub/Sub (default).
- Push = lower latency, more real-time.
- Push subscribers must be Webhook endpoints that accept POST over HTTPS.
- Pull is ideal for large volumes of messages, and uses batch delivery.
- Pull is preferred if efficiency and throughput of message processing is required.
- In push delivery, one message per request is sent.
- Pulled messages must be acknowledged.
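A sketch of what a push (Webhook) endpoint does with one delivery, assuming the usual push envelope: a JSON object with a `message` field whose `data` is base64-encoded. The function name `handle_push` is hypothetical, and no web framework is used; it takes the raw POST body and returns a status code plus the decoded text. A success status acts as the acknowledgement, while an error status should lead to redelivery.

```python
import base64
import json
from typing import Tuple


def handle_push(request_body: bytes) -> Tuple[int, str]:
    """Decode one push delivery and return (HTTP status, decoded text).

    Hypothetical handler: 204 signals success (ack), 400 signals a bad request.
    """
    try:
        envelope = json.loads(request_body)
        message = envelope["message"]
        text = base64.b64decode(message.get("data", "")).decode("utf-8")
    except (ValueError, KeyError, TypeError, AttributeError):
        # Malformed JSON, missing fields, or invalid base64/UTF-8.
        return 400, ""
    return 204, text
```

Because push sends one message per request, the handler never has to loop over a batch; a pull subscriber would instead receive several messages at once and acknowledge each one.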
A
6
Q
IAM
- IAM allows for controlling access at project, topic or subscription level
- Admin/Owner: project, snapshot, subscription, topic level
- Editor: project, snapshot, subscription, topic level
- Viewer: project, snapshot, subscription, topic level
- Publisher: topic level
- Subscriber: snapshot, subscription, topic level
- Using service accounts is a best practice
- Grant per-topic or per-subscription permissions
- Grant limited access to publish or consume messages.
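A small sketch of granting limited, per-topic access as described above. It edits an IAM policy represented as a plain dictionary (a `bindings` list of role/members pairs) and makes no API calls. The helper `add_binding` and the service account address are hypothetical; `roles/pubsub.publisher` is the predefined Publisher role.

```python
def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an IAM policy dict with member added to role.

    Hypothetical helper; the input policy is left unmodified.
    """
    bindings = [
        {"role": b["role"], "members": list(b["members"])}
        for b in policy.get("bindings", [])
    ]
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            break
    else:
        # No binding for this role yet, so create one.
        bindings.append({"role": role, "members": [member]})
    return {**policy, "bindings": bindings}


# Hypothetical service account that may only publish to one topic.
topic_policy = add_binding(
    {},
    "roles/pubsub.publisher",
    "serviceAccount:ingest@example-project.iam.gserviceaccount.com",
)
```

Attaching the resulting policy to a single topic, rather than to the project, keeps the service account from publishing anywhere else.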
A
7
Q
At Least Once Delivery
- Each message is delivered at least once for every subscription.
- Undelivered messages are deleted after the message retention duration (range is from 10 minutes to 7 days, with 7 days being default).
- Messages published before a subscription is created will not be delivered to that subscription.
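At-least-once delivery means a subscriber can see the same message more than once, so handlers are usually written to be idempotent. A minimal sketch, assuming duplicates carry the same message ID and that an in-memory set is acceptable for tracking them (a real deployment would likely need shared, bounded storage). The class name `IdempotentHandler` is hypothetical.

```python
class IdempotentHandler:
    """Run a callback at most once per message ID (hypothetical sketch)."""

    def __init__(self, process):
        self._process = process
        self._seen = set()  # IDs of messages already processed

    def handle(self, message_id: str, data: bytes) -> bool:
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._process(data)
        # Record the ID only after processing succeeds, so a failed
        # attempt is retried when the message is redelivered.
        self._seen.add(message_id)
        return True
```

Either way the caller can acknowledge the message afterwards; a duplicate is simply acknowledged without being processed again.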
A
8
Q
Out of Order Messaging
- Messages may arrive from multiple sources out of order.
- Pub/Sub does not guarantee message ordering by default.
- Dataflow is where out of order messages are processed/resolved.
- It is possible to add message attributes to help with ordering, e.g. timestamps.
- Consider alternatives for transactional ordering.
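A sketch of the timestamp-attribute idea above: the publisher attaches an event time to each message, and the consumer sorts a received batch by it. The attribute name `event_time` and the ISO 8601 format are assumptions. This only orders messages already in hand; handling late arrivals across batches is the windowing problem that Dataflow addresses.

```python
from datetime import datetime


def order_by_event_time(messages, attribute="event_time"):
    """Return messages sorted by an ISO 8601 timestamp attribute.

    Each message is a dict with an "attributes" dict; the attribute
    name is a hypothetical convention chosen by the publisher.
    """
    return sorted(
        messages,
        key=lambda m: datetime.fromisoformat(m["attributes"][attribute]),
    )


received = [
    {"data": "second", "attributes": {"event_time": "2024-01-01T00:00:02+00:00"}},
    {"data": "first", "attributes": {"event_time": "2024-01-01T00:00:01+00:00"}},
]
ordered = order_by_event_time(received)
```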
A
9
Q
Subscription Lifecycle
- Subscriptions expire after 31 days of inactivity.
- New subscriptions with the same name have no relationship to the previous subscription.
- A snapshot of the subscription is the easiest way to safeguard against a bad application deployment, because it provides point-in-time recovery. If the previous version of the application needs to be redeployed, the subscription can be rolled back to the point in time of the snapshot, and all messages that were unacknowledged at that point or published afterwards will be re-processed.
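A toy in-memory model of the snapshot and roll-back behaviour described above. It is not the Pub/Sub API; the class `ToySubscription` and its methods are hypothetical and only illustrate that a snapshot captures the acknowledgement state, so seeking back to it makes later messages deliverable again.

```python
class ToySubscription:
    """Toy model of a subscription's acknowledgement state (not the real API)."""

    def __init__(self):
        self._messages = []  # retained messages, in publish order
        self._acked = set()  # indexes of acknowledged messages

    def publish(self, data):
        self._messages.append(data)

    def pull(self):
        """Return (index, data) for every message not yet acknowledged."""
        return [(i, m) for i, m in enumerate(self._messages) if i not in self._acked]

    def ack(self, index):
        self._acked.add(index)

    def snapshot(self):
        """Capture the current acknowledgement state."""
        return frozenset(self._acked)

    def seek(self, snapshot):
        """Roll the acknowledgement state back to a snapshot."""
        self._acked = set(snapshot)
```

Taking a snapshot just before a deployment and calling `seek` after a failed release would cause everything the new version acknowledged to be delivered again to the redeployed old version.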
A
10
Q
Common Applications
A
11
Q
Connecting Kafka to GCP
Does Pub/Sub replace Kafka?
- Not always
- Hybrid workloads:
- Interact with existing tools and frameworks
- Don’t need the global/scaling capabilities of Pub/Sub
- Can use both: Kafka for on-premises and Pub/Sub for GCP in same data pipeline
How do we connect Kafka to GCP?
Overview on Connectors:
- Open-source plugins that connect Kafka to GCP
- Kafka Connect: one optional “connector service”
- Exist to connect Kafka directly to Pub/Sub, Dataflow and BigQuery (among others)
Additional Terms
- Source connector: an upstream connector that streams from something into Kafka
- Sink connector: a downstream connector that streams from Kafka into something
A