AWS Overview Flashcards
What is AWS Glue?
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.
You can create and run an ETL job with a few clicks in the AWS Management Console.
You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata in the AWS Glue Data Catalog.
Once cataloged, your data is immediately searchable, queryable, and available for ETL.
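As a rough illustration of "pointing AWS Glue to your data", the sketch below builds the parameters for a Glue crawler that scans an S3 path and writes table metadata to a catalog database. The crawler name, role ARN, database name, and bucket path are all hypothetical placeholders, and the actual API call is kept in a separate function because it needs boto3 and valid AWS credentials.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for the Glue CreateCrawler API call."""
    return {
        "Name": name,
        "Role": role_arn,            # IAM role the crawler assumes
        "DatabaseName": database,    # Data Catalog database for discovered tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(request):
    """Create the crawler, then start it. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical values, for illustration only.
request = build_crawler_request(
    name="sales-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    database="sales_db",
    s3_path="s3://example-bucket/sales/",
)
```

Calling create_and_start_crawler(request) in a configured AWS account would create and run the crawler; the tables it discovers would then appear in the named Data Catalog database.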
What is AWS Lake Formation?
AWS Lake Formation is a service that makes it easy to set up a secure data lake in days.
A data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis.
A data lake enables you to break down data silos and combine different types of analytics to gain insights and guide better business decisions.
However, setting up and managing data lakes today involves a lot of manual, complicated, and time-consuming tasks.
This work includes loading data from diverse sources, monitoring those data flows, setting up partitions, turning on encryption and managing keys, defining transformation jobs and monitoring their operation, reorganizing data into a columnar format, configuring access control settings, deduplicating redundant data, matching linked records, granting access to data sets, and auditing access over time.
Creating a data lake with Lake Formation is as simple as defining where your data resides and what data access and security policies you want to apply.
Lake Formation then collects and catalogs data from databases and object storage, moves the data into your new S3 data lake, cleans and classifies data using machine learning algorithms, and secures access to your sensitive data.
Your users can then access a centralized catalog of data which describes available data sets and their appropriate usage.
Your users then leverage these data sets with their choice of analytics and machine learning services, like Amazon EMR for Apache Spark, Amazon Redshift, Amazon Athena, Amazon SageMaker, and Amazon QuickSight.
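To make "defining where your data resides and what data access and security policies you want to apply" more concrete, here is a minimal sketch of the two request shapes involved: registering an S3 location with Lake Formation and granting a principal SELECT on a catalog table. The bucket ARN, role ARN, database, and table names are hypothetical, and the functions that actually call AWS are separated out because they need boto3 and credentials.

```python
def build_register_request(s3_arn):
    """Arguments for the Lake Formation RegisterResource call (where the data lives)."""
    return {"ResourceArn": s3_arn, "UseServiceLinkedRole": True}


def build_select_grant(principal_arn, database, table):
    """Arguments for the GrantPermissions call (who may read which table)."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }


def apply_requests(register_request, grant_request):
    """Send both requests. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders above work without boto3

    lf = boto3.client("lakeformation")
    lf.register_resource(**register_request)
    lf.grant_permissions(**grant_request)


# Hypothetical values, for illustration only.
register_request = build_register_request("arn:aws:s3:::example-bucket/sales")
grant_request = build_select_grant(
    principal_arn="arn:aws:iam::123456789012:role/AnalystRole",
    database="sales_db",
    table="orders",
)
```

A principal granted SELECT this way could then query the table through an integrated service such as Amazon Athena, subject to the permissions Lake Formation enforces.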
What is Amazon Managed Streaming for Apache Kafka (Amazon MSK)?
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that makes it easier for you to build and run applications that use Apache Kafka to process streaming data.
Apache Kafka is an open-source platform for building real-time streaming data pipelines and applications.
With Amazon MSK, you can use Apache Kafka APIs to populate data lakes, stream changes to and from databases, and power machine learning and analytics applications.
What is AWS Step Functions?
AWS Step Functions lets you coordinate multiple AWS services into