Analysis Flashcards
From which sources can Kinesis Analytics obtain its input?
- MySQL and Kinesis Data Streams
- DynamoDB and Kinesis Firehose delivery streams
- Kinesis Data Streams and Kinesis Firehose delivery streams
- Kinesis Data Streams and DynamoDB
Kinesis Data Streams and Kinesis Firehose delivery streams (Kinesis Analytics can only read from Kinesis streams, but both Data Streams and Firehose delivery streams are supported.)
After real-time analysis has been performed on the input source, where may you send the processed data for further processing?
Kinesis Data Stream or Firehose (While you might in turn connect S3 or Redshift to your Kinesis Analytics output stream, Kinesis Analytics must have a stream as its input, and a stream or Lambda function as its output.)
If a record arrives late to your application during stream processing, what happens to it?
The record is written to the error stream
You have heard from your AWS consultant that Amazon Kinesis Data Analytics elastically scales the application to accommodate the data throughput. What is the default capacity of the processing application in terms of memory?
32 GB (Kinesis Data Analytics provisions capacity in the form of Kinesis Processing Units (KPU). A single KPU provides you with the memory (4 GB) and corresponding computing and networking. The default limit for KPUs for your application is eight.)
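The 32 GB figure follows from the two numbers in the answer above (4 GB per KPU, default limit of eight KPUs). A minimal sketch of that arithmetic; the constant and function names are chosen here for illustration and are not part of any AWS API:

```python
# Values taken from the flashcard answer above; names are illustrative only.
MEMORY_PER_KPU_GB = 4   # memory provided by a single KPU
DEFAULT_KPU_LIMIT = 8   # default KPU limit per application


def default_memory_gb(kpus=DEFAULT_KPU_LIMIT, per_kpu_gb=MEMORY_PER_KPU_GB):
    """Return the total memory (in GB) available for the given KPU count."""
    return kpus * per_kpu_gb


print(default_memory_gb())  # 32
```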
You have configured data analytics and have been streaming the source data to the application. You have also configured the destination correctly. However, even after waiting for a while, you are not seeing any data come up in the destination. What might be a possible cause?
- Issue with IAM role
- Mismatched name for the output stream
- Destination service is currently unavailable
- Any of the above
Any of the above
How can you ensure maximum security for your Amazon ES cluster?
- Bind with a VPC
- Use security groups
- Use IAM policies
- Use access policies associated with the Elasticsearch domain creation
- All of the above
All of the above
As recommended by AWS, you are going to ensure you have dedicated master nodes for high performance. As a user, what can you configure for the master nodes?
- The count and instance types of the master nodes
- The EBS volume associated with the node
- The upper limit of network traffic / bandwidth
- All of the above
The count and instance types of the master nodes
Which are supported ways to import data into your Amazon ES domain?
- Directly from an RDS instance
- Via Kinesis, Logstash, and Elasticsearch’s APIs
- Via Kinesis, SQS, and Beats
- Via SQS, Firehose, and Logstash
Via Kinesis, Logstash, and Elasticsearch’s APIs
What can you do to prevent data loss due to nodes within your ES domain failing?
Maintain snapshots of the Elasticsearch Service domain (Amazon ES creates daily snapshots to S3 by default, and you can create them more often if you wish.)
You are going to set up an Amazon ES cluster and have it configured in your VPC. You want your customers outside your VPC to visualize the logs reaching the ES domain using Kibana. How can this be achieved?
- Use a reverse proxy
- Use a VPN
- Use VPC
- Use VPC Direct Connect
- Any of the above
Any of the above
As a Big Data analyst, you need to query/analyze data from a set of CSV files stored in S3. Which of the following serverless services helps you with this?
- AWS Glacier
- AWS EMR
- AWS Athena
- AWS Redshift
AWS Athena
What are two columnar data formats supported by Athena?
Parquet and ORC
Your organization is querying JSON data stored in S3 using Athena, and wishes to reduce costs and improve performance with Athena. What steps might you take?
Convert the data from JSON to ORC format, and analyze the ORC data with Athena
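The card does not say how the conversion is done. One possible route, assumed here, is an Athena CREATE TABLE AS SELECT (CTAS) statement that rewrites the JSON-backed table as ORC. The sketch below only builds the SQL text; the table names and the S3 location are hypothetical placeholders, and the helper function is not part of any AWS library:

```python
def build_orc_ctas(source_table, target_table, s3_location):
    """Build an Athena CTAS statement that rewrites a table in ORC format.

    All three arguments are caller-supplied placeholders; nothing here
    talks to AWS.
    """
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (format = 'ORC', external_location = '{s3_location}')\n"
        f"AS SELECT * FROM {source_table}"
    )


# Hypothetical names, for illustration only.
query = build_orc_ctas("logs_json", "logs_orc", "s3://example-bucket/logs-orc/")
print(query)
```

Once the ORC table exists, queries that select only a few columns read less data than the same queries against the JSON files, which is where the cost and performance gain comes from.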
When using Athena, you are charged separately for using the AWS Glue Data Catalog. True or False?
True
Which of the following statements is NOT TRUE regarding Athena pricing?
- Amazon Athena charges you for cancelled queries
- Amazon Athena charges you for failed queries
- You will be charged less when using a columnar format
- Amazon Athena is priced per query and charges based on the amount of data scanned by the query
Amazon Athena charges you for failed queries
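The pricing rules in this card can be sketched as a small cost estimator: failed queries are not billed, while succeeded and cancelled queries are billed on the data scanned. The price per terabyte is passed in as a parameter because the card does not state it; the value used in the example is an assumption, and the sketch ignores any per-query minimum or rounding:

```python
BYTES_PER_TB = 1024 ** 4


def billable_bytes(status, bytes_scanned):
    """Return the bytes that count toward the bill for one query.

    Failed queries are not charged; succeeded and cancelled queries are
    charged for the data they scanned.
    """
    if status == "FAILED":
        return 0
    return bytes_scanned


def query_cost(status, bytes_scanned, price_per_tb):
    """Estimate the cost of one query, ignoring minimums and rounding."""
    return billable_bytes(status, bytes_scanned) / BYTES_PER_TB * price_per_tb


# price_per_tb=5.0 is an assumed figure for illustration only.
print(query_cost("SUCCEEDED", BYTES_PER_TB, 5.0))
print(query_cost("CANCELLED", BYTES_PER_TB // 2, 5.0))
print(query_cost("FAILED", BYTES_PER_TB, 5.0))
```

Because the cost scales with bytes scanned, a columnar format that lets a query skip unneeded columns lowers the bill directly.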