AWS Data Collections Flashcards
AWS Collection Types
- Real-Time
- Near Real-Time
- Batch
Real-Time Collection Services
Kinesis Data Streams
SQS
IoT
Near Real-Time Collection Services
Kinesis Data Firehose
Database Migration Service (DMS)
Batch - Historical Analytics Services
Snowball
Data Pipeline
Explain the Kinesis Data Streams service
Managed service that allows you to collect, process, and analyze real-time streaming data from various sources such as IoT, mobile devices, server logs, social networks, and other real-time data sources.
Producers Kinesis Data Streams
Applications, Client, SDK, KPL, Kinesis Agent
Consumers Kinesis Data Streams
Apps (KCL, SDK), Lambda, Kinesis Data Firehose and Kinesis Data Analytics
Kinesis Data Streams Capacity Modes
Provisioned Mode
On-Demand Mode
In Kinesis Data Streams, each shard in Provisioned Mode gets…
1 MB/s or 1,000 records per second
In On-Demand Mode, Kinesis provisions a default capacity of…
4 MB/s or 4,000 records per second
Kinesis Data Streams Security
- IAM policies to control access
- Encryption in flight using HTTPS endpoints
- Encryption at rest using KMS
- Encryption/decryption of data on the client side
- VPC Endpoints
- Monitor API calls using CloudTrail
Explain Kinesis Producer SDK - PutRecord(s)
APIs used: PutRecord (one record) and PutRecords (many records)
PutRecords uses…
Batching, which increases throughput and reduces the number of HTTP requests
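The batching idea above can be sketched in Python. This is a minimal illustration of splitting records into PutRecords-sized calls, assuming the documented limits of 500 records and 5 MB per call; `batch_records` is a hypothetical helper, and a real producer would pass each batch to the API rather than just counting them.

```python
# Sketch: split records into PutRecords-sized batches.
# PutRecords accepts up to 500 records and 5 MB per call; this only
# shows the batching logic, not the real API call.

MAX_RECORDS_PER_CALL = 500
MAX_BYTES_PER_CALL = 5 * 1024 * 1024

def batch_records(records):
    """Yield lists of records that each fit in one PutRecords call."""
    batch, batch_bytes = [], 0
    for rec in records:
        size = len(rec["Data"]) + len(rec["PartitionKey"].encode())
        if batch and (len(batch) >= MAX_RECORDS_PER_CALL
                      or batch_bytes + size > MAX_BYTES_PER_CALL):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(rec)
        batch_bytes += size
    if batch:
        yield batch

records = [{"Data": b"x" * 1024, "PartitionKey": str(i)} for i in range(1200)]
batches = list(batch_records(records))
print(len(batches))  # 3 batches: 500 + 500 + 200 records
```

Sending 3 HTTP requests instead of 1,200 is where the efficiency gain comes from.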
Kinesis Producer SDK - what happens if we go over the limits?
A ProvisionedThroughputExceeded exception is thrown
Managed AWS sources for Kinesis Data Streams
CloudWatch Logs, AWS IoT, Kinesis Data Analytics
To send data asynchronously to Kinesis via API, use the…
Kinesis Producer Library (KPL)
Where does the Kinesis Producer Library submit metrics?
CloudWatch, for monitoring
KPL batching introduces some delay, configured with…
RecordMaxBufferedTime (default 100 ms)
Define the features of the Kinesis Agent
- Monitors log files and sends them to Kinesis Data Streams
- Java-based agent
- Installed in Linux server environments
Data Collection Services
Amazon Kinesis
AWS IoT Core
AWS Snowball
SQS
DMS
Direct Connect
What is ProvisionedThroughputExceeded?
An exception that can occur in Kinesis when the application exceeds the provisioned throughput limit defined for the stream.
Causes of ProvisionedThroughputExceeded exceptions
- Exceeding the MB/s or TPS limit for any shard
- A hot shard (e.g., a bad partition key sends too much data to one partition)
Solutions for ProvisionedThroughputExceeded exceptions
- Retries with backoff
- Increase shards (scaling)
- Ensure your partition key is a good one
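The first solution, retries with backoff, can be sketched in Python. This is a minimal simulation: `put_with_retries` and `flaky_put` are hypothetical names, and in real code the inner call would be the Kinesis put operation and the exception would come from the AWS SDK.

```python
import random
import time

# Sketch: retries with exponential backoff for a throttled Kinesis call.
class ProvisionedThroughputExceeded(Exception):
    pass

def put_with_retries(put_fn, max_retries=5, base_delay=0.1, sleep=time.sleep):
    for attempt in range(max_retries + 1):
        try:
            return put_fn()
        except ProvisionedThroughputExceeded:
            if attempt == max_retries:
                raise
            # Exponential backoff with jitter: ~0.1s, 0.2s, 0.4s, ...
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))

# Simulated call that is throttled twice, then succeeds.
calls = {"n": 0}
def flaky_put():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ProvisionedThroughputExceeded()
    return "ok"

result = put_with_retries(flaky_put, sleep=lambda s: None)
print(result)  # -> ok, after two throttled attempts
```

Backoff spreads the retries out so the producer stops hammering a shard that is already at its limit.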
How does the Kinesis Producer Library (KPL) gain efficiency?
Batching, introducing some delay with RecordMaxBufferedTime (default 100 ms)
Kinesis Producer Library – When not to use it
- The KPL can incur an additional processing delay of up to RecordMaxBufferedTime within the library (user-configurable)
- Larger values of RecordMaxBufferedTime result in higher packing efficiencies and better performance
Kinesis Agent functions
- Monitors log files and sends them to Kinesis Data Streams
- Java-based agent, built on top of the KPL
- Installed in Linux-based server environments
Kinesis Agent Features
- Write from multiple directories and to multiple streams
- Routing feature based on directory / log file
- Pre-process data before sending to streams (single line, CSV to JSON, log to JSON…)
- The agent handles file rotation, checkpointing, and retry upon failures
- Emits metrics to CloudWatch for monitoring
Elements of Kinesis Consumers Classic
- Kinesis SDK
- Kinesis Client Library (KCL)
- Kinesis Connector Library
- 3rd-party libraries: Spark, Log4J Appenders, Flume, Kafka Connect…
- Kinesis Firehose
- AWS Lambda
Features Kinesis Consumer SDK - GetRecords
- Classic Kinesis - records are polled by consumers from a shard
- Each shard has 2 MB total aggregate throughput
- GetRecords returns up to 10 MB of data (then throttles for 5 seconds) or up to 10,000 records
- Maximum of 5 GetRecords API calls per shard per second = 200 ms latency
- If 5 consumer applications consume from the same shard, each consumer can poll once per second and receive less than 400 KB/s
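The arithmetic behind the last bullet can be sketched in Python (using decimal units, as the flashcard does; `per_consumer` is a hypothetical helper for illustration).

```python
# Sketch: classic (shared) consumer throughput per shard.
# A shard allows 2 MB/s of aggregate reads and 5 GetRecords calls
# per second, shared across all consumers reading that shard.

SHARD_READ_KBPS = 2000        # 2 MB/s aggregate read limit
GETRECORDS_CALLS_PER_SEC = 5

def per_consumer(n_consumers):
    """Per-consumer share of the shard's read limit and poll rate."""
    return (SHARD_READ_KBPS / n_consumers,
            GETRECORDS_CALLS_PER_SEC / n_consumers)

kbps, polls = per_consumer(5)
print(kbps, polls)  # 400.0 KB/s and 1.0 poll/s per consumer
```

This is exactly the limitation that Enhanced Fan-Out removes by giving each consumer its own 2 MB/s pipe.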
Kinesis Connector Library write data to:
- Amazon S3
- DynamoDB
- Redshift
- ElasticSearch
Throughput per consumer per shard in Kinesis Enhanced Fan-Out
2 MB/s
With Enhanced Fan-Out, 20 consumers on one shard means…
40 MB/s per shard (2 MB/s each)
Latency tolerated by Standard Consumers
~200 ms
Latency with Enhanced Fan-Out consumers
~70 ms
Kinesis Data Firehose destinations
S3, Redshift, Elasticsearch, Splunk
True or False: Spark / KCL read from KDF
False
You can stream CloudWatch Logs into…
- Kinesis Data Streams
- Kinesis Data Firehose
- AWS Lambda
Data Stream write capacity on-demand maximum
200 MiB/sec and 200,000 records/second
Data Stream read capacity on-demand maximum per consumer
400 MiB/second
Data Stream write capacity in provisioned mode
1 MiB/second and 1,000 records/second
Data Stream read capacity in provisioned mode
2 MiB/second
SQS Use cases
- Order processing
- Image processing
- Auto-scaling queues according to messages
- Buffer and batch messages for future processing
- Request offloading
Kinesis Data Streams use cases
- Fast log and event data collection and processing
- Real-time metrics and reports
- Mobile data capture
- Real-time data analytics
- Gaming data feed
- Complex stream processing
- Data feed from the "Internet of Things"
Kinesis Auto Scaling Features
- Not native to Kinesis
- The API call to change the number of shards is UpdateShardCount
- Auto-scaling can be implemented with Lambda
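The scaling decision such a Lambda would make can be sketched in Python. This is a minimal illustration under the provisioned-mode write limit of 1 MB/s per shard; `target_shards` and the 50% headroom factor are hypothetical choices, not AWS API names.

```python
import math

# Sketch: computing a target shard count before resizing a stream.
WRITE_MBPS_PER_SHARD = 1  # provisioned-mode write limit per shard

def target_shards(observed_write_mbps, headroom=1.5):
    """Shards needed to absorb the observed write rate with headroom."""
    return max(1, math.ceil(observed_write_mbps * headroom
                            / WRITE_MBPS_PER_SHARD))

shards = target_shards(4.0)
print(shards)  # 6 shards for 4 MB/s of writes with 50% headroom
# A Lambda would then call UpdateShardCount with TargetShardCount=6.
```

Keeping headroom above the observed rate avoids immediately re-triggering throttling after a traffic spike.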
IoT Overview
- We deploy IoT devices (‘Things’)
- We configure them and retrieve data from them
SQS Limit per message sent
256 KB
SQS how to send large messages
Use SQS Extended Client (Java Library)
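The pattern behind the SQS Extended Client can be sketched in Python: payloads over the 256 KB limit are stored externally (S3 in the real Java library) and only a pointer travels through the queue. The dict-based "bucket" and the `send` helper here are stand-ins for illustration, not the real client.

```python
import uuid

# Sketch: the payload-offloading pattern used by the SQS Extended Client.
SQS_MAX_BYTES = 256 * 1024
bucket = {}  # stand-in for an S3 bucket

def send(body: bytes):
    """Return the message actually placed on the queue."""
    if len(body) <= SQS_MAX_BYTES:
        return {"kind": "inline", "body": body.decode()}
    key = str(uuid.uuid4())
    bucket[key] = body                      # offload the large payload
    return {"kind": "pointer", "s3_key": key}

small = send(b"hello")
big = send(b"x" * (300 * 1024))
print(small["kind"], big["kind"])  # inline pointer
```

The consumer side does the inverse: on receiving a pointer message, it fetches the real payload from storage before processing.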
SQS use cases
- Decouple applications
- Buffer writes to a database
- Handle large loads of messages coming in
SQS can be integrated with…
- Auto Scaling through CloudWatch!
SQS max in-flight messages (standard queue)
120,000
SQS Message content format
XML, JSON, Unformatted text
SQS FIFO queues support a maximum of how many messages per second?
3,000 messages per second (using batching)
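Where that number comes from can be shown with a one-line calculation: FIFO queues support 300 send operations per second, and each operation can batch up to 10 messages.

```python
# Sketch: SQS FIFO throughput math.
FIFO_OPS_PER_SEC = 300       # send operations per second
MAX_MESSAGES_PER_BATCH = 10  # messages per SendMessageBatch call

max_messages_per_sec = FIFO_OPS_PER_SEC * MAX_MESSAGES_PER_BATCH
print(max_messages_per_sec)  # 3000
```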
SQS Pricing mode
- Pay per API Request
- Pay per network usage
SQS Types Security
- Encryption in flight using the HTTPS endpoint
- SSE (Server Side Encryption) using KMS
- IAM policy
- SQS queue access policy
IoT messages use which protocols?
MQTT, WebSockets, or HTTP 1.1
Database Migration Service (DMS)
Quickly and securely migrate databases to AWS; resilient and self-healing
DMS Sources
- On-premises and EC2 instance databases: Oracle, MS SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, SAP, DB2
- Azure: Azure SQL Database
- Amazon RDS: all, including Aurora
- Amazon S3
DMS Targets
- On-premises and EC2 instance databases: Oracle, MS SQL Server, MySQL, MariaDB, PostgreSQL, SAP
- Amazon RDS
- Amazon Redshift
- Amazon DynamoDB
- Amazon S3
- ElasticSearch Service
- Kinesis Data Streams
- DocumentDB
DMS tool to convert your database schema from one engine to another
Schema Conversion Tool (SCT)
Direct Connect (DX)
Provides a dedicated private connection from a remote network to your VPC
Use cases Direct Connect
- Increase bandwidth throughput - working with large data sets – lower cost
- More consistent network experience - applications using real-time data feeds
- Hybrid Environments (on prem + cloud)
Direct Connect Gateway
If you want to set up Direct Connect to one or more VPCs in many different regions (same account), you must use a Direct Connect Gateway
Direct Connect – Connection Types
Dedicated Connections
Hosted Connections
Services AWS Snow Family
Snowcone, Snowball Edge, Snowmobile
Data Migration Services Snow Family
Snowcone, Snowball Edge, Snowmobile
Edge Computing services
Snowcone, Snowball Edge
Snowball Edge Storage Optimized capacity
80 TB of HDD capacity
Snowball Edge Compute Optimized capacity
42 TB of HDD capacity
AWS Snowcone capacity
8 TB
Use cases of Edge Computing
- Preprocess data
- Machine learning at the edge
- Transcoding media streams
Snow Family – Edge Computing
Snowcone (smaller)
Snowball Edge – Compute Optimized
Snowball Edge – Storage Optimized
AWS OpsHub
Software you install on your computer / laptop to manage your Snow Family devices
Amazon MSK is:
Managed Streaming for Apache Kafka
MSK – Configurations
- Choose the number of AZs (3 – recommended, or 2)
- Choose the VPC & Subnets
- The broker instance type (ex: kafka.m5.large)
- The number of brokers per AZ (can add brokers later)
- Size of your EBS volumes (1 GB – 16 TB)
MSK – Security
- Encryption
- Network Security
- Authentication & Authorization
MSK Authentication & Authorization (important):
- Define who can read/write to which topics
- Mutual TLS (AuthN) + Kafka ACLs (AuthZ)
- SASL/SCRAM (AuthN) + Kafka ACLs (AuthZ)
- IAM Access Control (AuthN + AuthZ)
MSK – Monitoring
- CloudWatch Metrics
- Prometheus (Open-Source Monitoring)
- Broker Log Delivery
MSK Broker Log Delivery options
- Delivery to CloudWatch Logs
- Delivery to Amazon S3
- Delivery to Kinesis Data Streams
MSK Connect
- You can deploy any Kafka Connect connector to MSK Connect as a plugin
MSK Data is Stored on…
EBS volumes
Producers Examples MSK
Kinesis, IoT, RDS
Consumers examples MSK
EMR, S3, SageMaker, Kinesis, RDS
MSK: size of your EBS volumes
1 GB – 16 TB
Kinesis Producers components
- Kinesis SDK
- Kinesis Producer Library (KPL)
- Kinesis Agent
- Libraries: Spark, Log4J Appenders, Flume, Kafka Connect, NiFi…
What is Kinesis Data Streams?
AWS streaming service that enables real-time data ingestion, processing, and analysis.
What are Kinesis Data Streams Producers?
The component responsible for real-time data ingestion.
What are Kinesis Data Streams Consumers?
The component responsible for processing and analyzing data; reads data from one or more shards.
What is Kinesis Data Analytics?
A service that lets you process and analyze real-time data using standard SQL queries.