Domain 4: Analysis Flashcards
RANDOM_CUT_FOREST
Kinesis Data Analytics SQL function for anomaly detection on numeric columns in a stream
Kinesis Firehose Buffer Limits
Buffer size: 1 to 128 MB
Buffer interval: 60 to 900 seconds
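The two buffer limits above can be written as a simple range check. This is a minimal sketch, assuming hypothetical names (`validate_buffer_hints`, `size_mb`, `interval_seconds`); it only encodes the ranges on this card and does not call any AWS API.

```python
# Ranges taken from the flashcard above; all names here are illustrative.
BUFFER_SIZE_MB = (1, 128)
BUFFER_INTERVAL_SECONDS = (60, 900)


def validate_buffer_hints(size_mb: int, interval_seconds: int) -> list:
    """Return a list of problems; an empty list means both hints are in range."""
    problems = []
    lo, hi = BUFFER_SIZE_MB
    if not lo <= size_mb <= hi:
        problems.append(f"size_mb={size_mb} is outside {lo}-{hi} MB")
    lo, hi = BUFFER_INTERVAL_SECONDS
    if not lo <= interval_seconds <= hi:
        problems.append(f"interval_seconds={interval_seconds} is outside {lo}-{hi} s")
    return problems


print(validate_buffer_hints(5, 300))    # both values in range
print(validate_buffer_hints(0, 1000))   # both values out of range
```

Firehose delivers a batch when whichever of the two limits is reached first, so both values matter for latency.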
Kinesis Data Analytics Supported Sources
Kinesis Streams and Kinesis Firehose
Kinesis Data Analytics Supported Destinations
Kinesis Streams, Kinesis Firehose, Lambda
What happens if a record arrives late to a Kinesis Data Analytics application?
Record is written to the error stream
In what form does Kinesis Data Analytics provision capacity?
Kinesis Processing Units
How much memory is provided per KPU?
4GB
What is the default limit on the number of KPUs per Kinesis Data Analytics application?
8
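Combining the two KPU cards gives the memory available to an application at the figure of 8 KPUs. A small worked example, assuming a hypothetical helper name (`application_memory_gb`); the constants come straight from the cards above.

```python
MEMORY_GB_PER_KPU = 4  # from the card above
DEFAULT_KPUS = 8       # from the card above


def application_memory_gb(kpus: int = DEFAULT_KPUS) -> int:
    """Total memory in GB for an application running on `kpus` KPUs."""
    if kpus < 1:
        raise ValueError("an application needs at least one KPU")
    return kpus * MEMORY_GB_PER_KPU


print(application_memory_gb())   # 8 KPUs x 4 GB each
print(application_memory_gb(2))  # 2 KPUs x 4 GB each
```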
What is the name of the visualization tool in the Elastic Stack?
Kibana
Is Elasticsearch serverless?
No, you still have to scale servers
What should Elasticsearch NOT be used for?
- OLTP (RDS or DynamoDB instead)
- Ad-Hoc Querying (Athena instead)
How can data be imported into Elasticsearch?
Kinesis, DynamoDB, Logstash, Beats, Elasticsearch API
What query engine does Athena use?
Presto
What data formats does Athena support?
CSV, JSON, Parquet, ORC, Avro
Is Athena serverless?
Yes
Does Athena support unstructured data?
Yes
Which data formats are columnar?
ORC and Parquet
Which data formats are splittable?
ORC, Parquet, Avro
Which notebooks can Athena integrate with?
Jupyter, Zeppelin, RStudio
What is the cost rate for Athena?
$5 per TB scanned
Do cancelled queries count toward Athena charges?
Yes
Do failed queries count toward Athena charges?
No
What data format will be the most cost effective in Athena?
Columnar (ORC, Parquet)
Does Athena charge for DDL processing?
No
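The Athena pricing cards above can be combined into one estimate. This is a rough sketch, not an official calculator: the function name `query_cost_usd` and the status labels are illustrative, a binary terabyte (1024^4 bytes) is assumed, and any per-query minimums or rounding are ignored.

```python
PRICE_PER_TB_USD = 5.0      # from the card above
BYTES_PER_TB = 1024 ** 4    # assumption: binary terabyte


def query_cost_usd(bytes_scanned: int, status: str = "SUCCEEDED", is_ddl: bool = False) -> float:
    """Estimate the charge for one query using the rules on the cards above.

    - DDL statements are not charged.
    - Failed queries are not charged.
    - Cancelled queries are charged for the data scanned before cancellation.
    """
    if is_ddl or status == "FAILED":
        return 0.0
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


one_tb = 1024 ** 4
print(query_cost_usd(one_tb))                           # full 1 TB scan
print(query_cost_usd(one_tb // 2, status="CANCELLED"))  # cancelled after half a TB
print(query_cost_usd(one_tb, status="FAILED"))          # failed query
```

Since the charge follows bytes scanned, a columnar format that lets a query read only the columns it references lowers `bytes_scanned` and therefore the cost.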
How can Athena results be encrypted?
Encrypt at rest in S3 using SSE-S3, SSE-KMS, CSE-KMS
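The three options on this card correspond to the `EncryptionOption` values `SSE_S3`, `SSE_KMS`, and `CSE_KMS` in Athena's result configuration. A minimal sketch that only builds that configuration dictionary; the helper name `result_configuration`, the bucket, and the key ARN shown are hypothetical, and no AWS call is made.

```python
from typing import Optional

# Options from the card above, spelled as the Athena API spells them.
ENCRYPTION_OPTIONS = {"SSE_S3", "SSE_KMS", "CSE_KMS"}


def result_configuration(output_location: str, option: str, kms_key: Optional[str] = None) -> dict:
    """Build a ResultConfiguration-shaped dict for encrypted query results."""
    if option not in ENCRYPTION_OPTIONS:
        raise ValueError(f"unknown encryption option: {option}")
    encryption = {"EncryptionOption": option}
    if option != "SSE_S3":
        # Both KMS-based options need a key to be named.
        if kms_key is None:
            raise ValueError(f"{option} requires a KMS key")
        encryption["KmsKey"] = kms_key
    return {"OutputLocation": output_location, "EncryptionConfiguration": encryption}


# Hypothetical bucket and key ARN, for illustration only.
print(result_configuration("s3://example-results-bucket/athena/", "SSE_S3"))
print(result_configuration(
    "s3://example-results-bucket/athena/",
    "SSE_KMS",
    kms_key="arn:aws:kms:us-east-1:111122223333:key/example",
))
```

A dictionary of this shape is what one would pass as `ResultConfiguration` to boto3's `start_query_execution`, assuming boto3 and credentials are available.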
Can Athena access S3 in another account?
Yes
How are Athena results encrypted in transit?
Transport Layer Security (TLS)
Is Redshift Serverless or Fully Managed?
Fully Managed
What is the maximum number of compute nodes in a Redshift cluster?
128
What are the two types of compute nodes that can be selected for a Redshift cluster?
- Dense Storage (DS): uses HDDs for large size at low cost
- Dense Compute (DC): uses SSDs and lots of memory for faster performance at a higher cost
How many HDDs on a ds2.xlarge Redshift compute node?
3 for a total of 2TB storage
How many HDDs on a ds2.8xlarge Redshift compute node?
24 for a total of 16TB storage
How much SSD storage and RAM on a dc2.large Redshift compute node?
160GB SSD storage, 15GB RAM
How much SSD storage and RAM on a dc2.8xlarge Redshift compute node?
2.6TB SSD, 244GB RAM
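The per-node figures above multiply out to a raw cluster capacity. A small worked example, assuming hypothetical names (`NODE_STORAGE_TB`, `cluster_storage_tb`); the storage sizes and the 128-node ceiling are the ones on these cards, and the real limit for a given node type may be lower.

```python
# Per-node storage in TB, as listed on the cards above.
NODE_STORAGE_TB = {
    "ds2.xlarge": 2.0,
    "ds2.8xlarge": 16.0,
    "dc2.large": 0.16,
    "dc2.8xlarge": 2.6,
}
MAX_COMPUTE_NODES = 128  # from the card above; per-type limits may be lower


def cluster_storage_tb(node_type: str, node_count: int) -> float:
    """Raw storage of a cluster: per-node storage times number of compute nodes."""
    if not 1 <= node_count <= MAX_COMPUTE_NODES:
        raise ValueError(f"node_count must be between 1 and {MAX_COMPUTE_NODES}")
    return NODE_STORAGE_TB[node_type] * node_count


print(cluster_storage_tb("ds2.8xlarge", 128))  # 16 TB x 128 nodes
print(cluster_storage_tb("ds2.xlarge", 3))     # 2 TB x 3 nodes
```

At 16 TB per node and 128 nodes, the largest dense storage cluster works out to 2048 TB, or about 2 PB of raw storage.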
What determines the number of Node Slices on a Compute Node?
The size of the Compute Node
What kind of data storage does Redshift use for high performance?
Columnar