AWS Data Collections Flashcards

1
Q

AWS Collection Types

A
  • Real-Time Collection
  • Near Real-Time Collection
  • Batch
2
Q

Real-Time Collection Services

A

Kinesis Data Streams
SQS
IoT

3
Q

Near Real-Time Collection Services

A

Kinesis Data Firehose
Database Migration Service (DMS)

4
Q

Batch - Historical Analytics Services

A

Snowball
Data Pipeline

5
Q

Explain the Kinesis Data Streams service

A

Managed service that allows you to collect, process and analyze real-time streaming data from various sources such as IoT, mobile devices, server logs, social networks and other real-time data sources.

6
Q

Kinesis Data Streams producers

A

Applications and clients (SDK), KPL, Kinesis Agent

7
Q

Kinesis Data Streams consumers

A

Apps (KCL, SDK), Lambda, Kinesis Data Firehose and Kinesis Data Analytics

8
Q

Kinesis Data Streams capacity modes

A

Provisioned Mode
On-Demand Mode

9
Q

In Kinesis Data Streams Provisioned Mode, each shard gets…

A

1 MB/s or 1000 records per second

10
Q

In Kinesis On-Demand Mode, the default capacity provisioned is…

A

4 MB/s in or 4000 records per second

11
Q

Kinesis Data Streams security points

A

IAM - control access

Encryption in flight using HTTPS endpoints

KMS encryption at rest (see the sketch below)
Encryption/decryption of data on the client side

VPC Endpoints

Monitor API calls using CloudTrail
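
The at-rest encryption point can be switched on per stream from the API. A minimal boto3 sketch; the stream name and the AWS-managed key alias are assumptions for illustration, not values from the cards.

```python
# Minimal sketch: enabling server-side KMS encryption on an existing stream.
# "my-stream" and the key alias are placeholders.
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

kinesis.start_stream_encryption(
    StreamName="my-stream",
    EncryptionType="KMS",
    KeyId="alias/aws/kinesis",  # default AWS-managed key for Kinesis
)
```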

12
Q

Explain the Kinesis Producer SDK - PutRecord(s)

A

APIs used: PutRecord (one record) and PutRecords (many records)
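
A minimal boto3 sketch of both calls; the stream name and payloads are assumptions for illustration.

```python
# Minimal sketch: PutRecord (one record) vs. PutRecords (a batch) with the boto3 Kinesis client.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# One record per HTTP request
kinesis.put_record(
    StreamName="my-stream",
    Data=json.dumps({"event": "click"}).encode(),
    PartitionKey="user-42",
)

# Many records in a single HTTP request (batching)
batch = [
    {"Data": json.dumps({"event": "click", "i": i}).encode(), "PartitionKey": f"user-{i}"}
    for i in range(10)
]
response = kinesis.put_records(StreamName="my-stream", Records=batch)
print("Failed records:", response["FailedRecordCount"])
```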

13
Q

PutRecords uses…

A

Batching, which increases throughput by using fewer HTTP requests

14
Q

PutRecords uses batching, which means…

A

fewer HTTP requests

15
Q

Kinesis Producer SDK - If we go over the limits

A

We get a ProvisionedThroughputExceeded exception

16
Q

Managed AWS sources for Kinesis Data Streams

A

CloudWatch Logs, AWS IoT, Kinesis Data Analytics

17
Q

To send data asynchronously to Kinesis via the API, we use…

A

Kinesis Producer Library (KPL)

18
Q

Where does the Kinesis Producer Library submit metrics?

A

CloudWatch for monitoring

19
Q

KPL batching introduces some delay, controlled by…

A

RecordMaxBufferedTime (default 100 ms)

20
Q

Define the Kinesis Agent's features

A

Monitors log files and sends them to Kinesis Data Streams
Java-based agent
Installed in Linux server environments

21
Q

Data Collection Services

A

Amazon Kinesis
AWS IoT Core
AWS Snowball
SQS
DMS
Direct Connect

22
Q

What is ProvisionedThroughputExceeded?

A

An exception that can occur in Kinesis when the application exceeds the provisioned throughput limit defined for the stream.

23
Q

Causes of ProvisionedThroughputExceeded exceptions

A

Exceeding the MB/s or TPS limit for any shard

A hot shard (for example, a bad partition key sends too much data to one partition)

24
Q

Solution for ProvisionedThroughputExceeded Exceptions

A
  • Retries with backoff (see the sketch after this list)
  • Increase shards (scaling)
  • Ensure your partition key is a good one
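
A minimal retry-with-exponential-backoff sketch in boto3; the stream name, payload, and backoff settings are assumptions for illustration.

```python
# Minimal sketch: retrying PutRecord with exponential backoff when the shard limit is hit.
import time
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

for attempt in range(5):
    try:
        kinesis.put_record(
            StreamName="my-stream",
            Data=b'{"event": "click"}',
            PartitionKey="user-42",
        )
        break
    except kinesis.exceptions.ProvisionedThroughputExceededException:
        time.sleep(2 ** attempt * 0.1)  # 100 ms, 200 ms, 400 ms, ...
else:
    raise RuntimeError("still throttled after 5 attempts")
```
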
25
Q

Influence of Kinesis Producer Library (KPL) batching

A

Introduces some delay, controlled by RecordMaxBufferedTime (default 100 ms)

26
Q

Kinesis Producer Library – When not to
use

A
  • The KPL can incur an additional processing delay of up to RecordMaxBufferedTime within the library (user configurable)
  • Larger values of RecordMaxBufferedTime result in higher packing efficiencies and better performance
27
Q

Kinesis Agent functions

A
  • Monitor Log files and sends them to Kinesis Data Streams
  • Java-based agent, built on top of KPL
  • Install in Linux-based server environments
28
Q

Kinesis Agent features

A
  • Write from multiple directories and multiple streams
  • Routing feature based on directory / log file
  • Pre-process data before sending to streams (single line, CSV to JSON, log to JSON…)
  • The agent handles file rotation, checkpointing, and retry upon failures
  • Emits metrics to CloudWatch for monitoring
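
The routing and pre-processing behaviour is driven by the agent's JSON configuration file (typically /etc/aws-kinesis/agent.json). A hedged sketch that generates a minimal config from Python; the log path and stream name are placeholders, and the field names reflect the agent's flow settings as I recall them, so treat them as an assumption.

```python
# Sketch only: generating a minimal Kinesis Agent configuration file.
# "filePattern" / "kinesisStream" are the agent's flow settings as I recall them;
# the log path and stream name are placeholders.
import json

agent_config = {
    "cloudwatch.emitMetrics": True,               # emit agent metrics to CloudWatch
    "flows": [
        {
            "filePattern": "/var/log/app/*.log",  # directory / files to monitor
            "kinesisStream": "my-stream",         # destination Kinesis Data Stream
        }
    ],
}

with open("/etc/aws-kinesis/agent.json", "w") as f:
    json.dump(agent_config, f, indent=2)
```
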
29
Q

Kinesis classic consumers

A
  • Kinesis SDK
  • Kinesis Client Library (KCL)
  • Kinesis Connector Library
  • 3rd party libraries: Spark,
    Log4J Appenders, Flume,
    Kafka Connect…
  • Kinesis Firehose
  • AWS Lambda
30
Q

Features Kinesis Consumer SDK - GetRecords

A
  • Classic Kinesis: records are polled by consumers from a shard
  • Each shard has 2 MB total aggregate throughput
  • GetRecords returns up to 10 MB of data (then throttled for 5 seconds) or up to 10,000 records
  • Maximum of 5 GetRecords API calls per shard per second = 200 ms latency
  • If 5 consumer applications consume from the same shard, each consumer can poll once per second and receives less than 400 KB/s
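
A minimal polling sketch with boto3, assuming a single-shard stream; the stream name and shard ID are placeholders.

```python
# Minimal sketch: classic consumer polling one shard with GetRecords.
# The sleep keeps the loop under the 5 GetRecords calls per shard per second limit.
import time
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

iterator = kinesis.get_shard_iterator(
    StreamName="my-stream",
    ShardId="shardId-000000000000",
    ShardIteratorType="TRIM_HORIZON",   # start from the oldest available record
)["ShardIterator"]

while iterator:
    result = kinesis.get_records(ShardIterator=iterator, Limit=1000)
    for record in result["Records"]:
        print(record["PartitionKey"], record["Data"])
    iterator = result.get("NextShardIterator")
    time.sleep(0.2)                     # ~5 polls/second/shard maximum
```
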
31
Q

The Kinesis Connector Library writes data to:

A
  • Amazon S3
  • DynamoDB
  • Redshift
  • ElasticSearch
32
Q

Each consumer, per shard, in Kinesis Enhanced Fan-Out gets…

A

2 MB/s

33
Q

With Kinesis Enhanced Fan-Out, how many consumers and how many MB/s per shard?

A

20 consumers × 2 MB/s = 40 MB/s per shard
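
Each registered consumer gets its own 2 MB/s per shard, pushed over HTTP/2. A hedged boto3 sketch of registering a fan-out consumer and subscribing to a shard; the stream ARN, consumer name, and shard ID are placeholders.

```python
# Sketch only: enhanced fan-out with the boto3 Kinesis client.
# ARN, consumer name, and shard ID are placeholders; the consumer must be
# ACTIVE (not CREATING) before SubscribeToShard succeeds.
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")
stream_arn = "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream"

consumer = kinesis.register_stream_consumer(
    StreamARN=stream_arn, ConsumerName="my-fanout-consumer"
)["Consumer"]

subscription = kinesis.subscribe_to_shard(
    ConsumerARN=consumer["ConsumerARN"],
    ShardId="shardId-000000000000",
    StartingPosition={"Type": "LATEST"},
)

for event in subscription["EventStream"]:   # a subscription lasts up to 5 minutes
    for record in event["SubscribeToShardEvent"]["Records"]:
        print(record["PartitionKey"], record["Data"])
```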

34
Q

Latency tolerated by standard consumers

A

~ 200 ms

35
Q

Latency requirements for enhanced (fan-out) consumers

A

~ 70 ms

36
Q

Kinesis Data Firehose destinations

A

S3, Redshift, Elasticsearch, Splunk
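
Producers write to Firehose with PutRecord / PutRecordBatch and Firehose delivers to those destinations. A minimal boto3 sketch; the delivery stream name and payload are assumptions.

```python
# Minimal sketch: writing one record to a Kinesis Data Firehose delivery stream.
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

firehose.put_record(
    DeliveryStreamName="my-delivery-stream",
    Record={"Data": b'{"event": "click"}\n'},   # trailing newline helps when the target is S3
)
```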

37
Q

True or False: Spark / KCL read from KDF

A

False

38
Q

You can stream CloudWatch Logs into…

A
  • Kinesis Data Streams
  • Kinesis Data Firehose
  • AWS Lambda
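
Streaming is configured with a subscription filter on the log group. A hedged boto3 sketch; the log group name, destination ARN, and IAM role are placeholders.

```python
# Sketch only: subscription filter streaming a log group into Kinesis Data Streams.
import boto3

logs = boto3.client("logs", region_name="us-east-1")

logs.put_subscription_filter(
    logGroupName="/aws/app/my-app",
    filterName="to-kinesis",
    filterPattern="",   # empty pattern forwards every log event
    destinationArn="arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
    roleArn="arn:aws:iam::123456789012:role/CWLtoKinesisRole",
)
```
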
39
Q

Data Streams on-demand mode maximum write capacity

A

200 MiB/sec and 200,000 records/second

40
Q

Data Streams on-demand mode maximum read capacity per consumer

A

400 MiB/second

41
Q

Data Streams write capacity per shard in provisioned mode

A

1 MiB/second and 1,000 records/second

42
Q

Data Streams read capacity per shard in provisioned mode

A

2 MiB/second
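
With those per-shard figures, the provisioned shard count is simple arithmetic. A small worked sketch; the target throughput numbers are made up for illustration.

```python
# Worked example: shards needed in provisioned mode for a hypothetical workload.
# Per-shard write limits: 1 MiB/s or 1,000 records/s.
import math

incoming_mib_per_s = 6.0       # assumed workload
incoming_records_per_s = 4500  # assumed workload

shards = max(
    math.ceil(incoming_mib_per_s / 1.0),
    math.ceil(incoming_records_per_s / 1000.0),
)
print(shards)  # -> 6: write bandwidth is the binding limit here
```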

43
Q

SQS Use cases

A
  • Order processing
  • Image Processing
  • Auto scaling queues according to messages
  • Buffer and batch messages for future processing
  • Request Offloading
44
Q

Kinesis Data Streams use cases

A

  • Fast log and event data collection and processing
  • Real-time metrics and reports
  • Mobile data capture
  • Real-time data analytics
  • Gaming data feed
  • Complex stream processing
  • Data feed from “Internet of Things”

46
Q

Kinesis Auto Scaling features

A
  • Is not native to Kinesis
  • The API call to change the number of shards is UpdateShardCount (see the sketch below)
  • Auto-scaling with Lambda
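
A minimal sketch of the UpdateShardCount call; the stream name and target count are assumptions. A Lambda function triggered by a CloudWatch alarm could issue the same call to approximate auto scaling.

```python
# Minimal sketch: resharding a stream with UpdateShardCount.
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

kinesis.update_shard_count(
    StreamName="my-stream",
    TargetShardCount=4,
    ScalingType="UNIFORM_SCALING",   # the only supported scaling type
)
```
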
47
Q

IoT Overview

A
  • We deploy IoT devices (‘Things’)
  • We configure them and retrieve data from them
48
Q

SQS Limit per message sent

A

256 KB

49
Q

SQS how to send large messages

A

Use SQS Extended Client (Java Library)
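
The extended client itself is a Java library, but the pattern it implements (store the payload in S3, send only a pointer through SQS) can be hand-rolled. A hedged boto3 sketch of that pattern, not the extended client API; bucket, key, and queue URL are placeholders.

```python
# Sketch of the large-message pattern: payload in S3, small pointer message in SQS.
import json
import uuid
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

payload = b"x" * (5 * 1024 * 1024)          # 5 MB, far above the 256 KB SQS limit
key = f"large-messages/{uuid.uuid4()}"

s3.put_object(Bucket="my-bucket", Key=key, Body=payload)
sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",
    MessageBody=json.dumps({"s3Bucket": "my-bucket", "s3Key": key}),
)
```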

50
Q

SQS use cases

A
  • Decouple applications
  • Buffer writes to a database
  • Handle large loads of messages coming in
51
Q

SQS can be integrated with…

A
  • Auto Scaling through CloudWatch!
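
A hedged sketch of the wiring: a CloudWatch alarm on queue depth that fires a scaling action. The queue name and scaling policy ARN are placeholders.

```python
# Sketch only: CloudWatch alarm on SQS queue depth driving an Auto Scaling policy.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
scale_out_policy_arn = "<your scale-out policy ARN>"  # placeholder

cloudwatch.put_metric_alarm(
    AlarmName="sqs-backlog-high",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "my-queue"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=2,
    Threshold=1000,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[scale_out_policy_arn],
)
```
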
52
Q

SQS maximum in-flight messages (being processed by consumers)

A

120,000

53
Q

SQS Message content format

A

XML, JSON, Unformatted text

54
Q

SQS FIFO queues support maximum messages per second

A

3,000 messages per second (using
batching)

55
Q

SQS pricing model

A
  • Pay per API Request
  • Pay per network usage
56
Q

SQS security types

A
  • Encryption in flight using the HTTPS endpoint
  • SSE (Server-Side Encryption) using KMS (see the sketch below)
  • IAM policy
  • SQS queue access policy
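
SSE with KMS can be set when the queue is created. A minimal boto3 sketch; the queue name and key alias are assumptions.

```python
# Minimal sketch: creating an SQS queue with server-side encryption via KMS.
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")

sqs.create_queue(
    QueueName="my-encrypted-queue",
    Attributes={
        "KmsMasterKeyId": "alias/aws/sqs",        # AWS-managed key for SQS
        "KmsDataKeyReusePeriodSeconds": "300",    # how long a data key is cached
    },
)
```
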
57
Q

IoT messages use which protocols?

A

MQTT, WebSockets or HTTP 1.1
protocols

58
Q

Database Migration Service (DMS)

A

Quickly and securely migrate databases to AWS; resilient and self-healing

59
Q

DMS Sources

A
  • On-Premise and EC2
    instances databases: Oracle,
    MS SQL Server, MySQL,
    MariaDB, PostgreSQL,
    MongoDB, SAP, DB2
  • Azure: Azure SQL Database
  • Amazon RDS: all including
    Aurora
  • Amazon S3
60
Q

DMS Targets

A

  • On-Premise and EC2 instance databases: Oracle, MS SQL Server, MySQL, MariaDB, PostgreSQL, SAP
  • Amazon RDS
  • Amazon Redshift
  • Amazon DynamoDB
  • Amazon S3
  • ElasticSearch Service
  • Kinesis Data Streams
  • DocumentDB
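
Once source and target endpoints and a replication instance exist, the migration runs as a replication task. A hedged boto3 sketch; all ARNs are placeholders.

```python
# Sketch only: starting a DMS migration with a replication task.
# Endpoints and the replication instance must already exist; ARNs are placeholders.
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

dms.create_replication_task(
    ReplicationTaskIdentifier="mysql-to-redshift",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",
    MigrationType="full-load-and-cdc",   # full load, then ongoing replication
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
```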

61
Q

DMS: convert your database's schema from one engine to another using…

A

Schema Conversion Tool (SCT)

62
Q

Direct Connect (DX)

A

Provides a dedicated private connection from a remote network to your VPC

63
Q

Use cases Direct Connect

A
  • Increase bandwidth throughput - working with large data sets – lower cost
  • More consistent network experience - applications using real-time data feeds
  • Hybrid Environments (on prem + cloud)
64
Q

Direct Connect Gateway

A

If you want to set up a Direct Connect connection to one or more VPCs in many different regions (same account), you must use a Direct Connect Gateway

65
Q

Direct Connect – Connection Types

A

Dedicated Connections
Hosted Connections

66
Q

AWS Snow Family services

A

Snowcone, Snowball Edge, Snowmobile

67
Q

Snow Family data migration devices

A

Snowcone, Snowball Edge, Snowmobile

68
Q

Edge Computing services

A

Snowcone, Snowball Edge

69
Q

Snowball Edge Storage Optimized capacity

A

80 TB of HDD capacity

70
Q

Snowball Edge Compute Optimized capacity

A

42 TB of HDD capacity

71
Q

AWS Snowcone capacity

A

8 TB

72
Q

Use cases of Edge Computing

A
  • Preprocess data
  • Machine learning at the edge
  • Transcoding media streams
73
Q

Snow Family – Edge Computing

A

Snowcone (smaller)
Snowball Edge – Compute Optimized
Snowball Edge – Storage Optimized

74
Q

AWS OpsHub

A

Software you install on your computer / laptop to manage your Snow Family devices

75
Q

Amazon MSK is:

A

Managed Streaming for
Apache Kafka

76
Q

MSK – Configurations

A
  • Choose the number of AZ
    (3 – recommended, or 2)
  • Choose the VPC & Subnets
  • The broker instance type
    (ex: kafka.m5.large)
  • The number of brokers per
    AZ (can add brokers later)
  • Size of your EBS volumes
    (1GB – 16TB)
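
The same choices map onto the CreateCluster API. A hedged boto3 sketch; the Kafka version, subnet IDs, and security-group ID are assumptions.

```python
# Sketch only: creating an MSK cluster with the configuration choices above.
# Subnet / security-group IDs are placeholders; the Kafka version is an assumption.
import boto3

msk = boto3.client("kafka", region_name="us-east-1")

msk.create_cluster(
    ClusterName="my-msk-cluster",
    KafkaVersion="3.5.1",                      # assumed version
    NumberOfBrokerNodes=3,                     # one broker per AZ here
    BrokerNodeGroupInfo={
        "InstanceType": "kafka.m5.large",
        "ClientSubnets": ["subnet-aaa", "subnet-bbb", "subnet-ccc"],  # 3 AZs
        "SecurityGroups": ["sg-123"],
        "StorageInfo": {"EbsStorageInfo": {"VolumeSize": 100}},       # GiB per broker
    },
)
```
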
77
Q

MSK – Security

A
  • Encryption
  • Network Security
  • Authentication & Authorization
78
Q

MSK Authentication & Authorization
(important):

A
  • Define who can read/write to which topics
  • Mutual TLS (AuthN) + Kafka ACLs (AuthZ)
  • SASL/SCRAM (AuthN) + Kafka ACLs
    (AuthZ)
  • IAM Access Control (AuthN + AuthZ)
79
Q

MSK – Monitoring

A
  • CloudWatch Metrics
  • Prometheus (Open-Source Monitoring)
  • Broker Log Delivery
80
Q

MSK Broker Log Delivery options

A
  • Delivery to CloudWatch Logs
  • Delivery to Amazon S3
  • Delivery to Kinesis Data Streams
81
Q

MSK Connect

A
  • You can deploy any Kafka Connect connectors to MSK Connect as a
    plugin
82
Q

MSK Data is Stored on…

A

EBS volumes

83
Q

MSK producer examples

A

Kinesis, IoT, RDS

84
Q

MSK consumer examples

A

EMR, S3, SageMaker, Kinesis, RDS

85
Q

MSK: size of your EBS volumes

A

1GB - 16TB

86
Q

Kinesis Producers components

A
  • Kinesis SDK
  • Kinesis Producer Library (KPL)
  • Kinesis Agent
  • 3rd-party libraries: Spark, Log4J Appenders, Flume, Kafka Connect, NiFi…
87
Q

What is Kinesis Data Streams?

A

An AWS streaming service that enables real-time data ingestion, processing, and analysis.

88
Q

What are Kinesis Data Streams producers?

A

The component responsible for ingesting data in real time.

89
Q

What are Kinesis Data Streams consumers?

A

The component responsible for processing and analyzing the data; it reads data from one or more shards.

90
Q

What is Kinesis Data Analytics?

A

A service that lets you process and analyze real-time data using standard SQL queries.