Data Pipelines on cloud Streaming Flashcards

Question 1

Q

Log

Answer

A

Append-only data structure; applications reading from it can ignore the details of the source.
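
A minimal sketch of the idea, assuming a hypothetical in-memory `AppendOnlyLog` class (not part of the original cards): records can only be appended and then read back by offset, and a reader does not need to know which source wrote them.

```python
class AppendOnlyLog:
    """Hypothetical in-memory log: records are appended, never updated or removed."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Add a record at the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset=0):
        """Return all records from the given offset onward, in append order."""
        return list(self._records[offset:])


log = AppendOnlyLog()
log.append({"source": "web", "event": "click"})
log.append({"source": "mobile", "event": "view"})
events_from_start = log.read(0)  # both records, in the order they were appended
```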

Question 2

Q

Unified log

Answer

A

Collects events from many source systems and lets applications operate on these event streams as they wish.

Events are automatically deleted after a certain time; read once.

Question 3

Q

Distributed Unified Log

Answer

A

Log lives across a cluster of machines

Good for scalability and durability (replication)

Question 4

Q

Ordered events

Answer

A

Events in a shard (partition of unified log) have sequential IDs unique to their shard, meaning the ordering is local
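
To illustrate why the ordering is only local, here is a small sketch assuming a hypothetical `ShardedLog` class in which every shard keeps its own counter. The same sequential ID can appear in two different shards, so IDs are only comparable within one shard.

```python
from itertools import count


class ShardedLog:
    """Hypothetical log split into shards, each with its own sequence counter."""

    def __init__(self, shard_count):
        self.shards = [[] for _ in range(shard_count)]
        self._counters = [count(0) for _ in range(shard_count)]

    def append(self, shard_id, event):
        """Store the event in one shard and return its shard-local sequential ID."""
        seq = next(self._counters[shard_id])
        self.shards[shard_id].append((seq, event))
        return seq


sharded = ShardedLog(shard_count=2)
first_in_shard_0 = sharded.append(0, "a")
first_in_shard_1 = sharded.append(1, "b")   # gets the same ID as "a", in another shard
second_in_shard_0 = sharded.append(0, "c")
```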

Question 5

Q

Single-Event processing

Answer

A

A single event produces zero or more output events.
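
A possible illustration, using made-up event fields (`type`, `order_id`, `amount`) and a hypothetical `process_single` function: each input event is handled on its own and yields zero, one, or several output events.

```python
def process_single(event):
    """Return a list of zero or more output events for one input event."""
    if event.get("type") != "purchase":
        return []  # filtered out: zero output events

    outputs = [{"type": "purchase_validated", "order_id": event["order_id"]}]
    if event["amount"] > 1000:
        # One input event can fan out into several output events.
        outputs.append({"type": "large_order_alert", "order_id": event["order_id"]})
    return outputs
```

No state is kept between calls, which is what distinguishes this from multiple-event processing.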

Question 6

Q

Multiple-event processing

Answer

A

Multiple events collectively produce zero or more output events.

Examples: aggregating events, pattern matching, reordering.
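
As a sketch of the aggregation case, assuming events are dictionaries with a made-up `user` field and a hypothetical `count_events_per_user` helper: a batch of input events is combined into one summary event per user.

```python
from collections import Counter


def count_events_per_user(events):
    """Aggregate a batch of events into one count event per user."""
    counts = Counter(event["user"] for event in events)
    return [
        {"type": "user_event_count", "user": user, "count": n}
        for user, n in sorted(counts.items())
    ]


batch = [{"user": "ann"}, {"user": "bob"}, {"user": "ann"}]
summary = count_events_per_user(batch)
```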

Question 7

Q

Amazon Kinesis Data Streams

Answer

A

Real-time streaming service for ingesting and processing large data streams. A stream is composed of shards and can be scaled by splitting or merging them.

Question 8

Q

Shard

Answer

A

Unit of capacity in Kinesis Data Streams.

Each shard provides 1 MB/s of data input.
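
A rough sizing sketch based on the 1 MB/s input figure above. `shards_needed` is a hypothetical helper, and it deliberately ignores other per-shard limits such as record counts and read throughput.

```python
import math

SHARD_WRITE_MB_PER_SEC = 1  # per-shard input capacity stated on this card


def shards_needed(ingest_mb_per_sec):
    """Estimate how many shards are required for a given input rate in MB/s."""
    return max(1, math.ceil(ingest_mb_per_sec / SHARD_WRITE_MB_PER_SEC))
```

For example, an input rate of 4.5 MB/s would need at least 5 shards under this assumption.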

Question 9

Q

Partition Key

Answer

A

User-defined key that determines how records are distributed across the shards of a Kinesis stream (supports load balancing).
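
Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch below imitates this with a hypothetical `shard_for_key` function, under the simplifying assumption that the hash space is divided evenly among the shards (real streams can have uneven ranges after resharding).

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly sized hash ranges."""
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = HASH_SPACE // shard_count
    # Clamp so that values in the leftover tail of the hash space go to the last shard.
    return min(hash_value // range_size, shard_count - 1)
```

Records with the same partition key always land in the same shard, which is what keeps their relative order intact.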

Question 10

Q

Re-sharding

Answer

A

Process of splitting or merging shards to scale a data stream either up or down
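
One way to picture resharding is as operations on hash key ranges. The sketch below represents a shard as a hypothetical `(start, end)` tuple and uses made-up helper names; it splits at the midpoint, which is only one possible choice of split point.

```python
def split_shard(hash_range):
    """Split one shard's hash key range into two adjacent child ranges."""
    start, end = hash_range
    mid = (start + end) // 2
    return (start, mid), (mid + 1, end)


def merge_shards(left, right):
    """Merge two adjacent hash key ranges into a single parent range."""
    if left[1] + 1 != right[0]:
        raise ValueError("only adjacent shards can be merged")
    return (left[0], right[1])


children = split_shard((0, 99))   # scale up: one shard becomes two
parent = merge_shards(*children)  # scale down: two shards become one
```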

Question 11

Q

Data Blob

Answer

A

The actual data being streamed through Kinesis

Question 12

Q

Sequence Number

Answer

A

Number assigned to each record by the shard to maintain order within the shard

Question 13

Q

Retention period

Answer

A

Length of time that records remain accessible in a Kinesis stream before being deleted. The default is 24 hours; it can be extended up to 7 days, and long-term retention allows up to 365 days.

Question 14

Q

Kinesis Data Firehose

Answer

A

Service that automatically delivers streaming data to destinations such as Amazon S3, Redshift, and Elasticsearch

Question 15

Q

AWS Lambda

Answer

A

Serverless compute service (FaaS) used to build modular back-end systems. It is event-driven and can process streaming data without any server infrastructure to manage: it scales automatically with the size of the data stream, and you pay only for compute time.
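
A minimal sketch of a Python Lambda handler consuming a batch from a Kinesis stream. Kinesis delivers each record's payload base64-encoded under `Records[i].kinesis.data`; the assumption that producers write JSON, and the `print` placeholder, are illustrative only.

```python
import base64
import json


def lambda_handler(event, context):
    """Decode every Kinesis record in the batch and report how many were handled."""
    processed = 0
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        item = json.loads(payload)  # assumption: producers write JSON payloads
        print(item)                 # placeholder for real processing logic
        processed += 1
    return {"processed": processed}
```

The function keeps no state between invocations, so the platform is free to run as many copies in parallel as the stream requires.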

Question 16

Q

What are some of the operational and administrative activities that Lambda takes care of for you so you can focus only on your code?

Answer

A

Balancing memory and CPU
Provisioning capacity
Monitoring fleet health
Applying security patches

Question 17

Q

FaaS

Answer

A

Write single-purpose, stateless functions.

(Each function does just one thing; that's why it's modular.)

Question 18

Q

Data Pipeline Pattern

Answer

A

Reusable architectural solution to a recurring problem in data pipeline design

Question 19

Q

Command Pattern

Answer

A

The Command Pattern encapsulates a request as an object, allowing it to be executed later or passed around in the system without knowing when or how it will be executed.
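
A small sketch of the pattern, using a hypothetical `ResizeImageCommand` class: the request and its parameters are captured in an object when it is created, and whoever holds the object can run it later by calling `execute()` without knowing the details.

```python
class ResizeImageCommand:
    """Hypothetical command: stores everything needed to run the request later."""

    def __init__(self, path, width):
        self.path = path
        self.width = width

    def execute(self):
        # Placeholder for real work; returns a description of what would be done.
        return f"resized {self.path} to {self.width}px"


# Commands are created now, passed around as plain objects, and executed later.
pending = [ResizeImageCommand("a.png", 100), ResizeImageCommand("b.png", 200)]
results = [command.execute() for command in pending]
```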

Question 20

Q

Pipes and filters pattern

Answer

A

It decomposes a complex process into a sequence of smaller, manageable steps (filters) connected by pipes, where each filter processes and passes data to the next
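
In Python, generators are one convenient way to sketch this: each filter is a small function, and chaining the calls acts as the pipe. The filter names and the `LEVEL,message` line format below are made up for illustration.

```python
def parse(lines):
    """Filter 1: turn raw 'LEVEL,message' lines into [level, message] rows."""
    for line in lines:
        yield line.strip().split(",")


def keep_errors(rows):
    """Filter 2: pass through only the messages of ERROR rows."""
    for level, message in rows:
        if level == "ERROR":
            yield message


def to_upper(messages):
    """Filter 3: normalize messages to upper case."""
    for message in messages:
        yield message.upper()


raw = ["INFO,started\n", "ERROR,disk full\n", "ERROR,timeout\n"]
result = list(to_upper(keep_errors(parse(raw))))
```

Each filter can be tested, replaced, or reordered independently, because it only depends on the shape of the data flowing through the pipe.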

Question 21

Q

Messaging Pattern

Answer

A

It decouples different parts of a system by using a queue to send and receive messages, allowing independent processing and handling of tasks.
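
A minimal in-process sketch using the standard library's `queue.Queue` as a stand-in for a real message broker. The producer only knows about the queue, the consumer only knows about the queue, and a `None` sentinel (an illustrative convention) tells the consumer to stop.

```python
import queue
import threading

messages = queue.Queue()
handled = []


def consumer():
    """Take messages off the queue until the stop sentinel arrives."""
    while True:
        message = messages.get()
        if message is None:  # sentinel: no more messages
            break
        handled.append(f"handled {message}")


worker = threading.Thread(target=consumer)
worker.start()

# The producer never calls the consumer directly; it just enqueues messages.
for order_id in (1, 2, 3):
    messages.put(order_id)
messages.put(None)
worker.join()
```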

Question 22

Q

Priority queue pattern

Answer

A

It allows for tasks or messages to be prioritized, ensuring that high-priority items are processed first, while lower-priority tasks wait.
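
A possible sketch built on the standard library's `heapq`, with a hypothetical `PriorityTaskQueue` class. It assumes that a lower number means a higher priority, and uses a counter as a tie-breaker so tasks with equal priority come out in insertion order.

```python
import heapq
from itertools import count


class PriorityTaskQueue:
    """Hypothetical queue: lower priority number is served first, FIFO on ties."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker that preserves insertion order

    def put(self, priority, task):
        heapq.heappush(self._heap, (priority, next(self._order), task))

    def get(self):
        """Remove and return the highest-priority task."""
        return heapq.heappop(self._heap)[2]


tasks = PriorityTaskQueue()
tasks.put(5, "send newsletter")
tasks.put(1, "process payment")
tasks.put(5, "rebuild report")
order = [tasks.get() for _ in range(3)]
```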

(22 cards)