Data Pipelines on cloud Streaming Flashcards

1
Q

Log

A

Append-only data structure; applications reading it can ignore the details of where each record came from
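
A minimal sketch of the idea, assuming an in-memory list stands in for the log. The class name `AppendOnlyLog` and its methods are hypothetical, chosen only for illustration.

```python
class AppendOnlyLog:
    """Hypothetical in-memory log: records are appended and read, never changed."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Add a record at the end and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return a copy of every record at or after the given offset."""
        return list(self._records[offset:])


log = AppendOnlyLog()
log.append({"type": "click"})
log.append({"type": "purchase"})
```

Readers only need an offset to resume from; they never need to know which system wrote a record.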

2
Q

Unified log

A

Collects events from many source systems and lets applications operate on these event streams as they wish

Events are automatically deleted after a certain time; reading an event does not remove it, so each application reads the stream at its own pace

3
Q

Distributed Unified Log

A

Log lives across a cluster of machines

Good for scalability and durability (replication)

4
Q

Ordered events

A

Events in a shard (partition of unified log) have sequential IDs unique to their shard, meaning the ordering is local
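
Local ordering can be sketched as below, assuming each shard keeps its own counter. `ShardedLog` is a hypothetical name, not part of any library.

```python
from itertools import count


class ShardedLog:
    """Hypothetical log split into shards, each with its own ID sequence."""

    def __init__(self, shard_count):
        self.shards = [[] for _ in range(shard_count)]
        self._next_id = [count() for _ in range(shard_count)]

    def append(self, shard, event):
        """Store the event in one shard and return its shard-local ID."""
        seq = next(self._next_id[shard])
        self.shards[shard].append((seq, event))
        return seq


sharded = ShardedLog(shard_count=2)
first_a = sharded.append(0, "a")   # ID 0 in shard 0
first_b = sharded.append(1, "b")   # ID 0 in shard 1 as well
second_a = sharded.append(0, "c")  # ID 1 in shard 0
```

The same ID appears in both shards, so IDs say nothing about the order of events in different shards.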

5
Q

Single-Event processing

A

A single input event produces zero or more output events
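
A small sketch of this, assuming events are plain dictionaries. The field names (`user_id`, `type`) and the function name are made up for illustration.

```python
def process_single_event(event):
    """Look at one event in isolation and return zero or more new events."""
    if "user_id" not in event:
        return []  # invalid event is filtered out: zero outputs
    outputs = [dict(event, processed=True)]  # enriched copy: one output
    if event.get("type") == "purchase":
        # A purchase also triggers a follow-up event: two outputs
        outputs.append({"type": "send_receipt", "user_id": event["user_id"]})
    return outputs
```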

6
Q

Multiple-event processing

A

Multiple events collectively produce zero or more events.

Examples: aggregating events, pattern matching, reordering
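
Aggregation is the simplest case to sketch. The example below assumes a batch (window) of dictionary events with a `type` field; both the field and the function name are hypothetical.

```python
from collections import Counter


def count_by_type(events):
    """Collapse a batch of events into one summary event per event type."""
    counts = Counter(event["type"] for event in events)
    return [{"type": t, "count": n} for t, n in sorted(counts.items())]
```

An empty batch yields no summary events, which matches the "zero or more" wording.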

7
Q

Amazon Kinesis Data Streams

A

Real-time streaming service for ingesting and processing large data streams. A stream is composed of shards and is scaled by splitting or merging them

8
Q

Shard

A

Unit of capacity in Kinesis data streams.

Each shard provides 1 MB/s of data input (up to 1,000 records per second) and 2 MB/s of data output
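
A rough sizing sketch based on the per-shard write limit. It also assumes the 1,000 records per second write limit per shard, which is not stated on the card; the function name is hypothetical, and read throughput is ignored.

```python
import math

MB_PER_SEC_PER_SHARD = 1.0       # write limit from the card
RECORDS_PER_SEC_PER_SHARD = 1000  # assumed per-shard record limit


def shards_needed(write_mb_per_sec, records_per_sec):
    """Estimate how many shards are needed to absorb a given write load."""
    by_bytes = math.ceil(write_mb_per_sec / MB_PER_SEC_PER_SHARD)
    by_records = math.ceil(records_per_sec / RECORDS_PER_SEC_PER_SHARD)
    return max(1, by_bytes, by_records)
```

Whichever limit is hit first decides the count: many tiny records can need more shards than their total size suggests.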

9
Q

Partition Key

A

User-defined key that determines how records are distributed across the shards of a Kinesis stream (this is what spreads the load)
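
Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch below imitates this under a simplifying assumption: the shards split the hash space evenly. The function names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces 128-bit values


def hash_key(partition_key):
    """Map a partition key to an integer in the 128-bit hash space."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key, shard_count):
    """Pick a shard index, assuming shards evenly divide the hash space."""
    width = HASH_SPACE // shard_count
    return min(hash_key(partition_key) // width, shard_count - 1)
```

The same key always lands on the same shard, so all records for one key keep their relative order.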

10
Q

Re-sharding

A

Process of splitting or merging shards to scale a data stream either up or down
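
Splitting and merging can be pictured as operations on hash key ranges. This is a simplified model, assuming a shard is just an inclusive `(start, end)` range and that a split happens at the midpoint; the helper names are hypothetical and no AWS API is called.

```python
def split_shard(start, end):
    """Split one inclusive hash key range into two child ranges."""
    mid = (start + end) // 2
    return (start, mid), (mid + 1, end)


def merge_shards(left, right):
    """Merge two adjacent ranges back into a single range."""
    if left[1] + 1 != right[0]:
        raise ValueError("only adjacent shards can be merged")
    return (left[0], right[1])
```

Splitting adds capacity (scale up); merging two lightly used neighbours removes it (scale down).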

11
Q

Data Blob

A

The actual data being streamed through Kinesis

12
Q

Sequence Number

A

Number assigned to each record by the shard to maintain order within shard

13
Q

Retention period

A

Length of time that records stay readable in a Kinesis stream before being deleted: 24 hours by default, extendable to 7 days, and with long-term retention up to 365 days

14
Q

Kinesis Data Firehose

A

Service that automatically delivers streaming data to AWS services such as S3, Redshift, and Elasticsearch

15
Q

AWS Lambda

A

Serverless computing (FaaS), used to build modular back-end systems. It can process streaming data without any server infrastructure to manage: it is event-driven, scales automatically with the size of the data stream, and you pay only for compute time
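
A minimal sketch of a function triggered by a Kinesis stream. Lambda passes the batch as `event["Records"]`, with each data blob base64-encoded under `record["kinesis"]["data"]`. The function name `handler` and the assumption that every payload is JSON are illustrative choices, not requirements.

```python
import base64
import json


def handler(event, context):
    """Decode every Kinesis record in the batch and report what was seen."""
    decoded = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        decoded.append(json.loads(payload))  # assumes JSON payloads
    return {"processed": len(decoded), "events": decoded}
```

The function keeps no state between calls, so the platform can run as many copies in parallel as the stream needs.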

16
Q

What are some of the operational and administrative activities that Lambda takes care of for you so you can focus on only your code?

A

Balancing memory and CPU
Provisioning capacity
Monitoring fleet health
Applying security patches

17
Q

FaaS

A

Write single-purpose stateless functions

(each function does just one thing, which is why it's modular)

18
Q

Data Pipeline Pattern

A

A reusable architectural solution to a recurring problem in the design of data pipeline software

19
Q

Command Pattern

A

The Command Pattern encapsulates a request as an object, allowing it to be executed later or passed around in the system without knowing when or how it will be executed.
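
A minimal sketch, assuming a command is any object with an `execute()` method. `AppendLineCommand` and the `lines` list are hypothetical names used only for this example.

```python
class AppendLineCommand:
    """A request captured as an object: what to do, but not when to do it."""

    def __init__(self, target, text):
        self.target = target
        self.text = text

    def execute(self):
        self.target.append(self.text)


lines = []
# Commands are created now and can be stored, queued, or passed around...
pending = [AppendLineCommand(lines, "first"), AppendLineCommand(lines, "second")]
# ...and executed later by code that knows nothing about their contents.
for command in pending:
    command.execute()
```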

20
Q

Pipes and filters pattern

A

It decomposes a complex process into a sequence of smaller, manageable steps (filters) connected by pipes, where each filter processes and passes data to the next
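
One way to sketch this is with generators: each generator is a filter, and passing one generator into the next acts as the pipe. The three steps and the `name,amount` input format are made up for illustration.

```python
def parse(lines):
    """Filter 1: turn raw text lines into lists of fields."""
    for line in lines:
        yield line.strip().split(",")


def keep_valid(rows):
    """Filter 2: drop rows that are not a name plus a numeric amount."""
    for row in rows:
        if len(row) == 2 and row[1].isdigit():
            yield row


def to_amounts(rows):
    """Filter 3: convert the amount field to an integer."""
    for name, amount in rows:
        yield name, int(amount)


def run_pipeline(lines):
    """Connect the filters; each one consumes the previous one's output."""
    return list(to_amounts(keep_valid(parse(lines))))
```

Each filter can be tested, replaced, or reordered without touching the others.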

21
Q

Messaging Pattern

A

It decouples different parts of a system by using a queue to send and receive messages, allowing independent processing and handling of tasks.
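
A small in-process sketch, assuming Python's `queue.Queue` stands in for a real message broker and `None` is used as a stop signal. The function name `run_demo` is hypothetical.

```python
import queue
import threading


def run_demo(messages):
    """Send messages through a queue to a consumer running in its own thread."""
    q = queue.Queue()
    received = []

    def consumer():
        while True:
            msg = q.get()
            if msg is None:  # stop signal chosen for this sketch
                break
            received.append(msg.upper())

    worker = threading.Thread(target=consumer)
    worker.start()
    for msg in messages:  # the producer only knows about the queue
        q.put(msg)
    q.put(None)
    worker.join()
    return received
```

The producer never calls the consumer directly, so either side can be changed or scaled without the other noticing.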

22
Q

Priority queue pattern

A

It allows for tasks or messages to be prioritized, ensuring that high-priority items are processed first, while lower-priority tasks wait.
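
A minimal sketch using the standard `heapq` module, assuming a lower number means a higher priority. `PriorityTaskQueue` is a hypothetical name; the counter keeps tasks of equal priority in arrival order.

```python
import heapq
from itertools import count


class PriorityTaskQueue:
    """Hypothetical queue that always hands out the most urgent task first."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker: earlier tasks win on equal priority

    def push(self, priority, task):
        heapq.heappush(self._heap, (priority, next(self._order), task))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```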