Domain 1 Solutions Flashcards
Helps you set up a secure data lake and govern, secure, and globally share data for ML and analytics. Manages fine-grained access control for data in S3 and metadata in the Glue Data Catalog with its own permissions model that augments IAM
Lake Formation
Preferred storage option
S3
Used to build, train, and deploy ML models
SageMaker
A file system service that speeds up training jobs by serving your S3 data to SageMaker at high speeds
FSx for Lustre
A training data source that lets you launch training jobs directly from the service, without moving data, for faster training start times
EFS
Block-level storage device that you can attach to your instances and use as you would use a physical hard drive
EBS
An ETL service that categorizes, cleans, enriches, and moves data between various data stores; used for batch ingestion and automates data discovery
Glue
This batch ingestion service reads historical data from source systems, such as relational database management systems, data warehouses, and NoSQL databases, at any desired interval
DMS
Batch ingestion service that automates various ETL tasks that involve complex workflows
Step Functions
Uses the Kinesis Producer Library (KPL) to write to a Kinesis data stream
Kinesis Data Streams
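A minimal sketch of how a record written to a Kinesis data stream ends up on a shard: the service takes the MD5 hash of the record's partition key, treats it as a 128-bit integer, and picks the shard whose hash key range contains that value. The helper names, the big-endian interpretation of the digest, and the evenly split ranges below are assumptions for illustration; real ranges come from the stream's own shard description.

```python
import hashlib

HASH_SPACE = 2 ** 128  # an MD5 digest is a 128-bit value


def even_shard_ranges(shard_count):
    """Split the hash space into contiguous, near-equal ranges (hypothetical layout)."""
    width = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * width
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + width - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")  # assumption: big-endian integer
    for index, (start, end) in enumerate(ranges):
        if start <= hashed <= end:
            return index
    raise ValueError("hash outside all shard ranges")


ranges = even_shard_ranges(4)
for key in ["sensor-1", "sensor-2", "sensor-1"]:
    print(key, "->", shard_for_key(key, ranges))
```

Records that share a partition key always map to the same shard, which is why a skewed key choice can overload one shard.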
Batches and compresses data to generate incremental views, and runs custom transformation logic using Lambda before delivering the incremental view to S3
Kinesis Firehose
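A rough sketch of the Lambda transformation step: Firehose passes the function a batch of base64-encoded records and expects each one back with the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and the transformed `data`. The JSON payload shape and the `message` field are hypothetical, and the sample event is built locally only to exercise the handler.

```python
import base64
import json


def handler(event, context):
    """Upper-case a hypothetical `message` field in every buffered record."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["message"] = str(payload.get("message", "")).upper()
            # Newline-delimit records so the delivered S3 object is line-oriented.
            encoded = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": encoded.decode("utf-8"),
            })
        except (ValueError, AttributeError, TypeError):
            # Hand the original bytes back and mark the record as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}


def _encode(obj):
    return base64.b64encode(json.dumps(obj).encode("utf-8")).decode("utf-8")


sample_event = {"records": [
    {"recordId": "1", "data": _encode({"message": "hello"})},
    {"recordId": "2", "data": base64.b64encode(b"not json").decode("utf-8")},
]}
result = handler(sample_event, None)
print([r["result"] for r in result["records"]])
```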
Easiest way to process and transform data streaming through Kinesis Data Streams or Firehose using SQL; provides insights in near real time from incremental streams before storing them in S3
Kinesis Data Analytics
Used to ingest/analyze video/audio data
Kinesis Video Streams
A distributed data store optimized for ingesting and processing streaming data in real-time. Used to publish and subscribe to streams of records, effectively store streams of records in the order in which records were generated, and process streams of records in real time
Apache Kafka
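To make the publish/subscribe and ordering ideas concrete, here is a toy in-memory log. It is not a Kafka client: `TopicLog`, `publish`, and `poll` are hypothetical names, and real Kafka guarantees ordering per partition rather than per topic. The sketch only shows that records are appended in arrival order and that each consumer group tracks its own read offset.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log per topic with one read offset per consumer group."""

    def __init__(self):
        self._records = defaultdict(list)
        self._offsets = defaultdict(int)  # (topic, group) -> next offset to read

    def publish(self, topic, record):
        """Append a record and return the offset it was stored at."""
        self._records[topic].append(record)
        return len(self._records[topic]) - 1

    def poll(self, topic, group, max_records=10):
        """Return the next unread records for this group, in stored order."""
        start = self._offsets[(topic, group)]
        batch = self._records[topic][start:start + max_records]
        self._offsets[(topic, group)] = start + len(batch)
        return batch


log = TopicLog()
for event in ["click", "view", "purchase"]:
    log.publish("events", event)

first_batch = log.poll("events", "analytics", max_records=2)
second_batch = log.poll("events", "analytics")
audit_batch = log.poll("events", "audit")  # an independent group starts at offset 0
print(first_batch, second_batch, audit_batch)
```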
Supports many instance types that have proportionally high CPU with increased network performance, which is well suited for HPC (high-performance computing) applications
EMR
Customers can store a single source of data in Amazon S3 and perform ad hoc analysis
Athena
Uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and machine learning to deliver the best price performance at any scale
Redshift
Provides a protocol of data processing and node task distribution and management and uses algorithms to split datasets into subsets and distribute them across nodes in a compute cluster
Spark
A managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data
EMR
Used to build a visual dashboard for metrics
QuickSight
An open-source Java software framework that supports massive data processing across a cluster of instances. Uses various processing models, such as MapReduce, to distribute processing across multiple instances and also uses a distributed file system called HDFS to store data across multiple instances
Hadoop
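The MapReduce model mentioned above can be sketched in a single process as a word count. This is plain Python, not Hadoop: the function names and the `workers` parameter are hypothetical, and the "nodes" are just list slices, but the map, shuffle, and reduce phases follow the same shape.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of lines."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, workers=2):
    # Split the input into one chunk per simulated worker.
    chunks = [lines[i::workers] for i in range(workers)]
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


counts = word_count(["big data", "Big cluster", "data data"])
print(counts)
```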
A serverless, fully managed NoSQL database with single-digit millisecond performance at any scale that addresses the need to overcome the scaling and operational complexities of relational databases
DynamoDB
A service that allows you to visually prepare and clean your data, normalize your data, and run a number of different feature transforms on the dataset without writing code
Glue DataBrew
An agnostic, free, open-source command line tool that works on top of Git repositories
Data Version Control (DVC)
Allows users to leverage Hadoop MapReduce using a SQL interface, enabling analytics at a massive scale, in addition to distributed and fault-tolerant data warehousing
Hive
_____ is an AI service that makes it easy for users to implement image or video analysis workflows into their applications. It aims to leverage Amazon’s vast experience in using deep learning for various image-based workloads such as image classification, object detection, detection of text in image, facial recognition, sentiment, and most recently, public safety.
Amazon Rekognition
_____ is an AI service that allows you to quickly extract intelligence from documents such as financial reports, medical records, tax forms, and university application forms beyond simple optical character recognition (OCR). With this, you don’t have to build deep learning computer vision models to extract text, forms, or tables from PDF documents; this will do that for you, so you can focus on using the extracted information for downstream business tasks.
Amazon Textract
_____ converts speech to text, and leverages the same technologies powering Amazon Alexa but is available as a transcription service that allows you to transcribe your voice data without any prior machine learning knowledge.
Amazon Transcribe
Translates text from various languages.
Amazon Translate
Converts text to speech (TTS)
Amazon Polly
_____ is an AWS service, powered by natural language understanding (NLU) and automatic speech recognition (ASR), that allows users to build and deploy conversational interfaces for their applications. With this, you can build a tailored and personalized experience for your customers to engage with your platform without any deep learning expertise.
Amazon Lex
_____ allows you to build your own search application using natural language that provides highly relevant responses to user queries as you would get from a human expert within your organization.
Amazon Kendra
_____ is a machine learning service that allows businesses to rapidly develop personalized recommendation systems to provide a better customer experience to their end customers.
Amazon Personalize
_____ is an AI service that uses both statistical and deep learning–based algorithms to provide highly accurate forecasts. As with personalization, it draws on Amazon's experience as a major retailer and cloud services provider.
Amazon Forecast
_____ provides a set of natural language processing–based APIs to pretrained and custom models that can extract insights from text. It can analyze a document for entities, key phrases, PII, language, sentiment, and syntax.
Amazon Comprehend
_____ uses program analysis and machine learning built from millions of lines of Java and Python code from the Amazon codebase to provide intelligent recommendations for improving code performance and quality. It consists of two main services: Reviewer and Profiler.
Amazon CodeGuru
_____ is used to get a secondary human review of a low-confidence prediction from machine learning models. It works out of the box with Amazon Rekognition and Textract, but you can also use it with your own custom ML models. It is usually used when you want to review low-confidence predictions or to audit a random sample of predictions regardless of confidence levels.
Amazon Augmented AI (or A2I)
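The review logic described on this card can be sketched as a simple routing rule: send a prediction to a human when its confidence is below a threshold, and otherwise send a small random sample for audit. This is not the A2I API; `needs_human_review`, the 0.8 threshold, and the 5% audit rate are hypothetical values chosen for illustration.

```python
import random


def needs_human_review(confidence, threshold=0.8, audit_rate=0.05, rng=random):
    """Flag low-confidence predictions, plus a random sample of the rest."""
    if confidence < threshold:
        return True
    # Audit a fraction of confident predictions regardless of their score.
    return rng.random() < audit_rate


rng = random.Random(0)  # seeded so repeated runs route the same way
predictions = [("invoice", 0.97), ("receipt", 0.62), ("tax form", 0.88)]
for label, confidence in predictions:
    route = "human" if needs_human_review(confidence, rng=rng) else "auto"
    print(label, confidence, route)
```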
_____ lets you create robotics applications at scale using the Robot Operating System (ROS) framework and extends this to other cloud services like SageMaker for machine learning. It provides you with a robot development environment on the cloud, with simulation capabilities to test these robots on the cloud.
AWS RoboMaker