Data Warehousing Flashcards
Is Redshift good for ELT?
Yes
Can a Lambda function be triggered by AWS IoT?
Yes
Can a Lambda function be triggered by Kinesis?
Yes
Can Apache Spark notebooks run on EMR?
Yes
Can Apache Spark read from S3?
Yes
Can Apache Zeppelin be used to visualize data in Amazon Redshift?
Yes
Is Redshift a columnar database?
Yes
Is Redshift MPP?
Yes
Is Redshift ANSI SQL Compliant?
Yes
In addition to data compression and columnar storage, how is I/O reduced in Redshift?
Zone maps. A zone map exists for each 1 MB block and consists of in-memory metadata that tracks the minimum and maximum values within the block. If a column is sorted (for example, a date column), Redshift can use the zone maps to skip blocks whose value range cannot match the query, so the relevant blocks are found faster. Unlike a conventional database, Amazon Redshift does not use indexes.
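The pruning idea behind zone maps can be sketched in a few lines of Python. This is an illustration only, not Redshift's implementation: the `Block` class, `build_blocks`, and `blocks_to_scan` are hypothetical names, and a "block" here is just a small list of integers rather than 1 MB of column data.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Block:
    """A toy storage block holding some values of one column."""
    values: List[int]

    @property
    def zone(self) -> Tuple[int, int]:
        # The "zone map" entry: minimum and maximum value in the block.
        return (min(self.values), max(self.values))


def build_blocks(values: Sequence[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks, in storage order."""
    return [Block(list(values[i:i + block_size]))
            for i in range(0, len(values), block_size)]


def blocks_to_scan(blocks: Sequence[Block], lo: int, hi: int) -> List[int]:
    """Return indexes of blocks whose [min, max] range overlaps [lo, hi]."""
    return [i for i, b in enumerate(blocks)
            if b.zone[1] >= lo and b.zone[0] <= hi]


# Sorted column: each block covers a narrow range, so most blocks are skipped.
sorted_blocks = build_blocks(list(range(100)), block_size=10)

# Same values stored round-robin: every block spans almost the full range.
scattered = [(i % 10) * 10 + i // 10 for i in range(100)]
scattered_blocks = build_blocks(scattered, block_size=10)
```

With the sorted column, a range predicate such as `25 <= x <= 34` touches only the two blocks that can contain matches. With the scattered layout, every block's minimum and maximum straddle the range, so nothing can be skipped.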
Can Redshift Clusters be managed via API?
Yes
Does Redshift support ODBC and JDBC?
Yes
Describe the Redshift architecture.
One leader node that communicates with multiple compute nodes, which hold the data.
Does Redshift encrypt data at rest?
Yes, with AES-256
Does Amazon Redshift take care of key management?
Yes
Anti-Patterns for Redshift
Small datasets, OLTP, Unstructured data, BLOB data
What are the two methods for sending data to Kinesis Firehose?
PutRecord and PutRecordBatch
What is the maximum size of a Firehose PutRecord record?
1,000 KB per record, before base64 encoding
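A producer that uses PutRecordBatch has to group records so that each call stays within the service limits. The sketch below shows one way to do that grouping in plain Python. The per-record limit comes from the card above. The 500-records-per-call and 4 MB-per-call limits are assumptions based on the documented PutRecordBatch quotas and should be checked against the current documentation. `batch_records` is a hypothetical helper, and no AWS call is made.

```python
from typing import Iterable, Iterator, List

MAX_RECORD_BYTES = 1000 * 1024       # per-record limit from the card above
MAX_BATCH_RECORDS = 500              # assumed PutRecordBatch record-count limit
MAX_BATCH_BYTES = 4 * 1024 * 1024    # assumed PutRecordBatch total-size limit


def batch_records(records: Iterable[bytes]) -> Iterator[List[bytes]]:
    """Yield lists of records that each fit in one PutRecordBatch call."""
    batch: List[bytes] = []
    batch_bytes = 0
    for record in records:
        if len(record) > MAX_RECORD_BYTES:
            raise ValueError("record exceeds the per-record size limit")
        # Start a new batch if adding this record would break either limit.
        if batch and (len(batch) >= MAX_BATCH_RECORDS
                      or batch_bytes + len(record) > MAX_BATCH_BYTES):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += len(record)
    if batch:
        yield batch
```

Each yielded list could then be passed to a single PutRecordBatch request by whatever client library the producer uses.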
What is the Kinesis Agent?
A stand-alone Java application that sends data to Kinesis Streams and Kinesis Firehose. It can be installed on Linux servers.
Can the Kinesis Agent monitor multiple files and write to multiple streams?
Yes
What is the max buffer size for Kinesis Firehose?
3 MB
Can Kinesis Firehose invoke a Lambda function?
Yes
Why should a record separator be added to Kinesis Stream data?
Kinesis stream bundles records together. If you don’t add a record separator, you can’t split the records later.
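The effect of a missing separator is easy to reproduce. The sketch below assumes JSON records and a newline as the separator. `encode_record` and `split_delivered_object` are hypothetical helper names, and the "delivered object" is simulated by concatenating bytes.

```python
import json
from typing import Dict, List


def encode_record(event: Dict) -> bytes:
    """Serialize one event as single-line JSON followed by a newline."""
    return (json.dumps(event) + "\n").encode("utf-8")


def split_delivered_object(blob: bytes) -> List[Dict]:
    """Split a concatenated blob back into events, one per line."""
    return [json.loads(line)
            for line in blob.decode("utf-8").split("\n")
            if line]


events = [{"id": 1, "msg": "first"}, {"id": 2, "msg": "second"}]

# With a separator, the concatenated bytes can be split apart again.
with_separator = b"".join(encode_record(e) for e in events)

# Without one, the records run together as '{...}{...}' and cannot be
# parsed as a single JSON document.
without_separator = b"".join(json.dumps(e).encode("utf-8") for e in events)
```

Calling `json.loads` on `without_separator` fails because the second object is unexpected extra data after the first.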
What are buffer sizes for S3?
1 MB - 128 MB
What are the buffer intervals for S3?
60 to 900 Seconds
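Buffer size and buffer interval work together: data is delivered when whichever threshold is reached first. A minimal sketch of that rule, using the ranges from the two cards above, follows. The class name `S3BufferingHints` and its defaults of 5 MB and 300 seconds are assumptions for illustration, not values read from any AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class S3BufferingHints:
    """Toy model of size/interval buffering for delivery to S3."""
    size_mb: int = 5              # assumed default, for illustration
    interval_seconds: int = 300   # assumed default, for illustration

    def __post_init__(self) -> None:
        # Ranges taken from the flashcards: 1-128 MB and 60-900 seconds.
        if not 1 <= self.size_mb <= 128:
            raise ValueError("size_mb must be between 1 and 128")
        if not 60 <= self.interval_seconds <= 900:
            raise ValueError("interval_seconds must be between 60 and 900")

    def should_flush(self, buffered_bytes: int, elapsed_seconds: float) -> bool:
        """Flush when either the size or the time threshold is reached."""
        size_reached = buffered_bytes >= self.size_mb * 1024 * 1024
        time_reached = elapsed_seconds >= self.interval_seconds
        return size_reached or time_reached
```

A low-volume stream is therefore flushed by the interval, while a high-volume stream is flushed by the size threshold long before the interval expires.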
Can Kinesis Firehose dynamically raise the buffer size?
Yes
What does the Redshift COPY command do?
Loads data from DynamoDB or S3 into an existing Redshift table
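For reference, a COPY statement for an S3 source can be assembled as shown below. This is a sketch only: `build_copy_statement` is a hypothetical helper, the table name, bucket, and role ARN in the usage lines are placeholders, and the choice of `FORMAT AS JSON 'auto'` is an assumption about the input files. The function only builds the SQL text and does not connect to a cluster.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement that loads JSON files from S3."""
    if not _IDENTIFIER.match(table):
        raise ValueError("table must be a simple SQL identifier")
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    for value in (s3_uri, iam_role_arn):
        if "'" in value:
            raise ValueError("quotes are not allowed in literals")
    return (f"COPY {table} FROM '{s3_uri}' "
            f"IAM_ROLE '{iam_role_arn}' FORMAT AS JSON 'auto';")


# Placeholder values, for illustration only.
statement = build_copy_statement(
    "events",
    "s3://example-bucket/firehose/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
```

The target table must already exist, as the card states. COPY appends rows to it and does not create it.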
Before you send a record to Kinesis Firehose, what do you need to do?
Flatten the record into a single JSON object and make sure it is UTF-8 encoded
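One reading of "flatten" is to collapse nested fields into a single-level JSON object on one line. The sketch below takes that reading, which is an assumption, and uses dotted key names as an arbitrary convention. `flatten` and `to_firehose_payload` are hypothetical helper names.

```python
import json
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent_key: str = "",
            sep: str = ".") -> Dict[str, Any]:
    """Collapse nested dictionaries into one level with dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def to_firehose_payload(record: Dict[str, Any]) -> bytes:
    """Serialize a flattened record as compact, single-line UTF-8 JSON."""
    text = json.dumps(flatten(record), ensure_ascii=False,
                      separators=(",", ":"))
    return text.encode("utf-8")


nested = {"device": {"id": 7, "loc": {"lat": 1.5}}, "ok": True}
payload = to_firehose_payload(nested)
```

Because `json.dumps` escapes newline characters inside strings, the payload always occupies a single line, which keeps it compatible with a newline record separator.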
What is the Elasticsearch buffer size range?
1 MB to 100 MB
What is the buffer interval for Elasticsearch?
60 to 900 seconds
Describe Kinesis Analytics.
A SQL-based service that queries and aggregates data in a stream and sends the output to a Kinesis stream or a Lambda function
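Kinesis Analytics expresses aggregations in SQL, but what a typical windowed aggregate computes can be shown in Python. The sketch below counts events per key in fixed, non-overlapping (tumbling) windows. It emulates the idea only. `tumbling_counts` is a hypothetical function, and events are assumed to be `(timestamp_seconds, key)` pairs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_counts(events: Iterable[Tuple[int, str]],
                    window_seconds: int) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key) over tumbling windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


stream = [(0, "a"), (59, "a"), (60, "a"), (61, "b")]
per_minute = tumbling_counts(stream, window_seconds=60)
```

Each `(window_start, key)` entry corresponds to one output row that would be emitted to the destination when its window closes.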
What is the maximum time a Lambda Function can run?
15 minutes (900 seconds). The limit was 5 minutes before October 2018.
How do Kinesis Stream and Kinesis Firehose differ?
Kinesis Streams. The more customizable option, Streams is best suited for developers building custom applications or streaming data for specialized needs. That customizability, however, comes with manual scaling and provisioning. Data is typically available in a stream for 24 hours, but for an additional cost, users can extend availability to seven days.
Kinesis Firehose. The simpler approach, Firehose loads data streams directly into AWS products for processing. Scaling is handled automatically, up to gigabytes per second, and Firehose supports batching, encrypting, and compressing data. Firehose can deliver to S3, Elasticsearch Service, or Redshift, where data can be copied for processing through additional services.
What are some destinations for Kinesis Analytics?
Firehose, Streams, S3, Redshift, Elasticsearch
Can data be enriched via Kinesis Stream?
Yes, but the reference data must be stored in S3, from which an in-application reference table is created
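Enrichment with a reference table amounts to a lookup join between each streaming record and static data. The sketch below emulates that in Python with a CSV string standing in for the file stored in S3. `load_reference_table` and `enrich` are hypothetical names, and the column names are made up for the example.

```python
import csv
import io
from typing import Any, Dict


def load_reference_table(csv_text: str, key: str) -> Dict[str, Dict[str, str]]:
    """Index reference rows by the value of their key column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def enrich(record: Dict[str, Any], reference: Dict[str, Dict[str, str]],
           key: str) -> Dict[str, Any]:
    """Add reference columns to a record; unmatched records pass through."""
    match = reference.get(str(record.get(key)), {})
    extra = {k: v for k, v in match.items() if k != key}
    return {**record, **extra}


# Stand-in for a reference file that would live in S3.
reference_csv = "device_id,site\n7,plant-a\n8,plant-b\n"
reference = load_reference_table(reference_csv, "device_id")
enriched = enrich({"device_id": 7, "temp": 21.5}, reference, "device_id")
```

Records whose key has no match in the reference data are returned unchanged, which mirrors a left join.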
What is a common use case for Kinesis Stream?
Read streaming data, analyze and aggregate it, and deliver the results to EMR or Redshift