AI, Machine Learning, Analytics Technology and Services Flashcards
What is Redshift?
Amazon Redshift is a data warehousing service for reporting and analytics that can store and query petabytes of data.
What does Redshift allow you to do with multiple sources of data?
Redshift lets you combine data from multiple sources in one place so that you can run analytics across all of it.
What does MPP stand for and what does it mean in the context of Redshift?
MPP stands for massively parallel processing. In the context of Redshift, it means that the work of a complex query is divided up and executed in parallel.
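The idea behind massively parallel processing can be sketched in a few lines: split the rows into slices, let each worker aggregate its own slice, then combine the partial results. This is only an illustration of the pattern, not how Redshift is implemented; the names `partial_sum` and `parallel_sum` are hypothetical, and the workers here are threads on one machine.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(rows):
    """Aggregate one slice of rows (the job of a single worker)."""
    return sum(rows)


def parallel_sum(rows, workers=4):
    """Split rows into slices, aggregate each slice in parallel, then combine."""
    size = max(1, -(-len(rows) // workers))  # ceiling division, at least 1
    slices = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, slices))
    # The final step combines the per-slice results into one answer.
    return sum(partials)
```

For example, `parallel_sum(list(range(1, 101)))` returns 5050, the same answer a single worker would produce, but computed from four partial sums.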
What are the benefits of using Redshift for OLAP?
Redshift is designed for Online Analytical Processing (OLAP), which makes it well suited to analytics and reporting. It also automates data management tasks, including backup, replication, and scaling without downtime.
What is Redshift Serverless?
Redshift Serverless is a serverless option for Redshift that removes the need to manage any infrastructure. It automatically provisions and scales the underlying capacity.
What are some use cases for Redshift?
Use cases for Redshift include complex querying and reporting for businesses that analyze large volumes of data, integration with data lakes to query structured and unstructured data, and operational analytics for time-sensitive decisions based on real-time data.
What are the advantages of using Redshift Serverless for unpredictable workloads?
Because Redshift Serverless provisions and scales capacity automatically, it suits unpredictable workloads: there is no infrastructure to manage, so you can focus on analyzing your data.
What is a data lake and how can Redshift integrate with it?
A data lake is a central repository of structured and unstructured data, often stored in Amazon S3. Redshift can integrate with a data lake so that you can query that data from Redshift.
What type of workload is Redshift designed for?
Redshift is designed for business intelligence workloads, specifically reporting and analytics.
What are some automated data management features provided by Redshift?
Redshift provides automated backup, replication, and scaling without downtime.
What is the main purpose of Kinesis?
The main purpose of Kinesis is to collect, process, and analyze streaming data in real time.
What does the name “Kinesis” mean and why is it a fitting name for this service?
“Kinesis” is a Greek word that means movement or motion. It is a fitting name for this service because Kinesis deals with data that is in motion, moving from one place to another.
How does Kinesis Data Streams store and retain data?
Kinesis Data Streams stores data in shards, which are sequences of data records. Data is retained for 24 hours by default, and retention can be extended to a maximum of 365 days.
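The retention limits are easier to remember in hours, since that is the unit the service uses for this setting. The sketch below is a plain arithmetic check, not an AWS call; the helper names are hypothetical, and it assumes 24 hours is also the lowest value allowed.

```python
DEFAULT_RETENTION_HOURS = 24
MAX_RETENTION_HOURS = 365 * 24  # 365 days expressed in hours


def validate_retention_hours(hours):
    """Return hours if it lies within the allowed range, else raise ValueError."""
    if not DEFAULT_RETENTION_HOURS <= hours <= MAX_RETENTION_HOURS:
        raise ValueError(
            f"retention must be between {DEFAULT_RETENTION_HOURS} "
            f"and {MAX_RETENTION_HOURS} hours"
        )
    return hours


def is_still_retained(record_age_hours, retention_hours=DEFAULT_RETENTION_HOURS):
    """True if a record of the given age is still inside the retention window."""
    return record_age_hours < validate_retention_hours(retention_hours)
```

With the default setting, a record that is 30 hours old has already expired, so `is_still_retained(30)` is `False`; with retention raised to 48 hours, `is_still_retained(30, 48)` is `True`.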
What is the difference between streaming data and static data?
Streaming data refers to data that is generated continuously by multiple data sources or producers, while static data is data that is stored on disk, in S3, or in a database.
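The difference shows up in how a computation is written. With static data the whole data set is available up front; with streaming data the result has to be updated as each value arrives. A minimal sketch, using hypothetical function names and a plain Python iterator standing in for a real stream:

```python
def batch_average(values):
    """Static data: every value is already stored, so compute once."""
    return sum(values) / len(values)


def running_average(stream):
    """Streaming data: yield an updated average each time a value arrives."""
    total = 0.0
    count = 0
    for value in stream:
        total += value
        count += 1
        yield total / count
```

For the values 10, 20, 30, `batch_average` returns 20.0 once, while `running_average` yields 10.0, 15.0, and 20.0 in turn.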
Give examples of the types of data that can be handled by Kinesis.
Examples of data that can be handled by Kinesis include financial transactions, stock prices, in-game data, social media feeds, location tracking data, IoT sensor data, clickstream data, and application log files.
What are shards in Kinesis Data Streams?
Shards are the storage units of a Kinesis data stream. A stream is made up of one or more shards, each shard holds a sequence of data records, and each record is assigned a unique sequence number.
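A toy model can make the shard idea concrete. The sketch below goes slightly beyond the card: it relies on the fact that Kinesis Data Streams picks a shard by taking the MD5 hash of a record's partition key, and it assumes the shards split the hash space evenly. `ToyStream` is a hypothetical class, and the simple counter it uses for sequence numbers is a simplification of the real ones.

```python
import hashlib
from itertools import count

HASH_SPACE = 2 ** 128  # an MD5 digest is a 128-bit value


class ToyStream:
    """In-memory stand-in for a stream made up of one or more shards."""

    def __init__(self, shard_count):
        self.shards = [[] for _ in range(shard_count)]
        self._next_sequence = count(1)

    def shard_index(self, partition_key):
        """Map a partition key to a shard via its MD5 hash."""
        digest = hashlib.md5(partition_key.encode("utf-8")).digest()
        hash_key = int.from_bytes(digest, "big")
        # Assumption: each shard owns an equal slice of the hash space.
        return hash_key * len(self.shards) // HASH_SPACE

    def put_record(self, partition_key, data):
        """Append a record to its shard and return (shard index, sequence number)."""
        index = self.shard_index(partition_key)
        sequence = next(self._next_sequence)
        self.shards[index].append((sequence, partition_key, data))
        return index, sequence
```

Records that share a partition key always land in the same shard, which is what keeps them in order relative to each other.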
What role do data consumers play in the Kinesis architecture?
Data consumers in the Kinesis architecture consume data from the shards and process it. They can perform various actions on the data, such as running algorithms, analyzing sentiment, or generating recommendations.
Give examples of actions that data consumers can perform on the data.
Data consumers can perform actions such as running algorithms on stock prices, sentiment analysis on social media feeds, or analyzing clickstream data to generate product recommendations.
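As an illustration of the clickstream case, the sketch below counts which products are viewed by the same user and recommends the most frequent companions. It is a deliberately naive approach chosen for brevity, not a description of any AWS feature; the function names and the `(user_id, product_id)` event shape are assumptions.

```python
from collections import Counter, defaultdict


def build_co_views(clicks):
    """Count how often two products are viewed by the same user.

    clicks is an iterable of (user_id, product_id) events in arrival order.
    """
    viewed_by_user = defaultdict(set)
    co_views = defaultdict(Counter)
    for user_id, product_id in clicks:
        seen = viewed_by_user[user_id]
        if product_id in seen:
            continue  # repeat views by the same user are not counted again
        for other in seen:
            co_views[product_id][other] += 1
            co_views[other][product_id] += 1
        seen.add(product_id)
    return co_views


def recommend(co_views, product_id, top_n=2):
    """Return up to top_n products most often viewed alongside product_id."""
    return [product for product, _ in co_views[product_id].most_common(top_n)]
```

Given the events `("u1", "book")`, `("u1", "lamp")`, `("u2", "book")`, `("u2", "lamp")`, `("u2", "pen")`, calling `recommend(co_views, "book")` returns `["lamp", "pen"]`.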
What are some possible destinations for data after it has been processed by data consumers?
After being processed by data consumers, the data can be sent to permanent storage destinations such as DynamoDB, S3, Amazon EMR (Elastic MapReduce), or Redshift.
Explain the main purpose of Kinesis Data Streams and Kinesis Video Streams.
Kinesis Data Streams is designed for general streaming data, while Kinesis Video Streams is designed specifically for streaming video from connected video devices.
What is Kinesis Data Firehose?
Kinesis Data Firehose, also known as Kinesis Firehose, is a fully managed service that allows you to capture, transform, and load data streams into AWS data stores for near real-time analytics.
What are the primary functions of Kinesis Data Firehose?
The primary functions of Kinesis Data Firehose are capturing, transforming, and loading data.
How does Kinesis Data Firehose handle varying data volumes?
Kinesis Data Firehose dynamically adjusts its resources to handle varying data volumes, scaling automatically.
What is the typical processing time for data in Kinesis Data Firehose?
Kinesis Data Firehose processes and delivers data within 60 seconds for timely insights.
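The "within 60 seconds" figure follows from how Firehose delivers data: it buffers incoming records and sends a batch once a size or time threshold is reached. The sketch below imitates that behavior in memory. `ToyBuffer` and its default thresholds are hypothetical, and unlike the real service it only checks the time limit when a new record arrives.

```python
class ToyBuffer:
    """Collect records and flush a batch when it is big enough or old enough."""

    def __init__(self, max_bytes=1024, max_seconds=60):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records = []
        self.size = 0
        self.opened_at = None  # arrival time of the first buffered record
        self.delivered = []    # stands in for the destination, e.g. S3

    def add(self, record, now):
        """Buffer one record (bytes) that arrived at time now (seconds)."""
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.size += len(record)
        too_big = self.size >= self.max_bytes
        too_old = now - self.opened_at >= self.max_seconds
        if too_big or too_old:
            self.flush()

    def flush(self):
        """Deliver the current batch, if any, and start a new one."""
        if self.records:
            self.delivered.append(list(self.records))
        self.records = []
        self.size = 0
        self.opened_at = None
```

Passing timestamps in explicitly keeps the sketch deterministic: a batch is delivered either when the buffered bytes reach `max_bytes` or when a record arrives at least `max_seconds` after the batch was opened.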
Is there any data retention in Kinesis Data Firehose?
No, Kinesis Data Firehose does not retain data.
Can you transform data with Kinesis Data Firehose before loading it into storage?
Yes, you can transform and customize the data using AWS Lambda before loading it into permanent storage.
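A transformation function receives a batch of base64-encoded records and must return each one with the same `recordId`, a `result` status, and the re-encoded `data`. The handler below is a minimal sketch of that contract, assuming the incoming records are JSON objects; the added `source` field and its value are hypothetical.

```python
import base64
import json


def lambda_handler(event, context):
    """Add a field to each JSON record and hand the batch back to Firehose."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["source"] = "firehose-demo"  # hypothetical enrichment
            line = json.dumps(payload) + "\n"    # newline-delimited JSON
            data = base64.b64encode(line.encode("utf-8")).decode("utf-8")
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": data}
            )
        except (ValueError, TypeError):
            # Records that cannot be parsed are returned unchanged and marked failed.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

The handler can be exercised locally by passing it a dictionary shaped like `{"records": [{"recordId": "1", "data": "<base64 text>"}]}` and `None` for the context.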
What tools can be used for analytics after data is loaded by Kinesis Data Firehose?
Business intelligence tools can be used for analytics after data is loaded into its final destination by Kinesis Data Firehose.
What monitoring tools are integrated with Kinesis Data Firehose?
Kinesis Data Firehose includes integrated monitoring with CloudWatch.
What happens if there is an error in data processing within Kinesis Data Firehose?
Kinesis Data Firehose has automatic error retries if something goes wrong.
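Automatic retries generally follow a simple pattern: try the delivery, and on failure wait and try again, up to some limit. The sketch below shows that pattern with exponential backoff. It is not Firehose's actual retry policy; the function name, attempt limit, and delays are all assumptions chosen for illustration.

```python
import time


def deliver_with_retries(deliver, batch, max_attempts=4, base_delay=0.5,
                         sleep=time.sleep):
    """Call deliver(batch), retrying with exponential backoff on failure.

    The sleep parameter can be replaced to avoid real waiting in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return deliver(batch)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
```

If a delivery fails twice and then succeeds, the function waits 0.5 seconds and then 1 second before returning the successful result.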
Does Kinesis Data Firehose retain data temporarily?
No. Kinesis Data Firehose only buffers records briefly before delivering them; it does not store data for later reads or replay.
What does a data lake refer to in the context of Kinesis Data Firehose?
In this context, a data lake is a large-scale central repository, often in Amazon S3, into which Kinesis Data Firehose delivers streaming data for later analysis.
What AWS service can you use to transform data in Kinesis Data Firehose?
You can use AWS Lambda to transform data in Kinesis Data Firehose.
What are some use cases for Kinesis Data Firehose?
Use cases include real-time analytics, feeding data into data lakes, log data management, and IoT data integration.
What are some common destinations for data after processing in Kinesis Data Firehose?
Common destinations include Amazon S3, Amazon Redshift, and Amazon OpenSearch Service.
What is the difference between Kinesis Data Streams and Kinesis Data Firehose?
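Kinesis Data Streams captures streaming data and stores it in shards for a retention period so that consumers can process it, whereas Kinesis Data Firehose captures, transforms, and loads data continuously into data stores without retaining it. Streaming video is handled by a separate service, Kinesis Video Streams.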
What is Amazon Athena?
Amazon Athena is an interactive query service that enables you to run standard SQL queries on data stored in Amazon S3.
What type of queries can you run with Amazon Athena?
You can run standard SQL queries with Amazon Athena.
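In practice a query is submitted together with the database to run it against and an S3 location for the results. The sketch below only builds that request as a dictionary so it can be run without AWS access; the helper name, database, table, and bucket are hypothetical, while the keys match the parameters of the Athena `StartQueryExecution` API.

```python
def build_athena_request(sql, database, output_location):
    """Assemble the parameters for an Athena StartQueryExecution call."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical database, table, and bucket names, for illustration only.
params = build_athena_request(
    "SELECT page, COUNT(*) AS views FROM clickstream GROUP BY page",
    database="analytics_demo",
    output_location="s3://example-athena-results/queries/",
)

# With boto3 installed and AWS credentials configured, the request could be
# sent like this (left commented out so the sketch runs offline):
#   import boto3
#   boto3.client("athena").start_query_execution(**params)
```

Keeping the request construction separate from the network call makes the query easy to inspect and test before anything is sent to AWS.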
What is a key feature of Amazon Athena regarding infrastructure?
Amazon Athena is serverless, meaning there is no infrastructure to provision or manage.