Amazon Cloud Practitioner Flashcards
AWS Management Console
Access and manage Amazon Web Services through the AWS Management Console, a simple and intuitive
user interface. You can also use the AWS Console Mobile Application to quickly view resources on the go.
AWS Command Line Interface
The AWS Command Line Interface (CLI) is a unified tool to manage your AWS services. With just one tool
to download and configure, you can control multiple AWS services from the command line and automate
them through scripts.
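As a sketch of that scripted workflow, a few representative commands are shown below. The bucket name and file are hypothetical placeholders, and `aws configure` prompts for your own credentials:

```
# One-time setup: store credentials and a default region (values are your own).
aws configure

# List your S3 buckets, then copy a local file into a (hypothetical) bucket.
aws s3 ls
aws s3 cp report.csv s3://my-example-bucket/reports/report.csv

# The same tool manages other services, e.g. describing EC2 instances as JSON.
aws ec2 describe-instances --output json
```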
Amazon Athena
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using
standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the
queries that you run.
Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying
using standard SQL. Most results are delivered within seconds. With Athena, there’s no need for complex
extract, transform, and load (ETL) jobs to prepare your data for analysis. This makes it easy for anyone
with SQL skills to quickly analyze large-scale datasets.
Athena is out-of-the-box integrated with AWS Glue Data Catalog, allowing you to create a unified
metadata repository across various services, crawl data sources to discover schemas and populate your
Catalog with new and modified table and partition definitions, and maintain schema versioning.
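As an illustration of that point-define-query flow, the sketch below defines a schema over CSV files already in S3 and then queries them with standard SQL. The bucket, table, and column names are hypothetical:

```sql
-- Define a schema over existing CSV files in S3 (no data is loaded or moved).
CREATE EXTERNAL TABLE clicks (
  user_id string,
  page    string,
  ts      timestamp
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-example-bucket/clickstream/';

-- Query it immediately with standard SQL; you pay only for this query.
SELECT page, COUNT(*) AS views
FROM clicks
GROUP BY page
ORDER BY views DESC
LIMIT 10;
```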
Amazon CloudSearch
Amazon CloudSearch is a managed service in the AWS Cloud that makes it simple and cost-effective to set up, manage, and scale a search solution for your website or application.
Amazon CloudSearch supports 34 languages and popular search features such as highlighting, autocomplete, and geospatial search.
Amazon Elasticsearch Service
makes it easy to deploy, secure, operate, and scale Elasticsearch to search,
analyze, and visualize data in real-time. With Amazon Elasticsearch Service, you get easy-to-use APIs
and real-time analytics capabilities to power use-cases such as log analytics, full-text search, application
monitoring, and clickstream analytics, with enterprise-grade availability, scalability, and security. The
service offers integrations with open-source tools like Kibana and Logstash for data ingestion and
visualization. It also integrates seamlessly with other AWS services such as Amazon Virtual Private Cloud
(Amazon VPC), AWS Key Management Service (AWS KMS), Amazon Kinesis Data Firehose, AWS Lambda,
AWS Identity and Access Management (IAM), Amazon Cognito, and Amazon CloudWatch, so that you can
go from raw data to actionable insights quickly.
Amazon EMR
Amazon EMR is the industry-leading cloud big data platform for processing vast amounts of data
using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi,
and Presto. Amazon EMR makes it easy to set up, operate, and scale your big data environments by
automating time-consuming tasks like provisioning capacity and tuning clusters. With EMR you can
run petabyte-scale analysis at less than half of the cost of traditional on-premises solutions and over 3x
faster than standard Apache Spark. You can run workloads on Amazon EC2 instances, on Amazon Elastic
Kubernetes Service (EKS) clusters, or on-premises using EMR on AWS Outposts.
Amazon FinSpace
is a data management and analytics service purpose-built for the financial services
industry (FSI). FinSpace reduces the time you spend finding and preparing petabytes of financial data to
be ready for analysis from months to minutes.
Financial services organizations analyze data from internal data stores like portfolio, actuarial, and
risk management systems as well as petabytes of data from third-party data feeds, such as historical
securities prices from stock exchanges. It can take months to find the right data, get permissions to
access the data in a compliant way, and prepare it for analysis.
FinSpace removes the heavy lifting of building and maintaining a data management system for financial
analytics. With FinSpace, you collect data and catalog it by relevant business concepts such as asset class,
risk classification, or geographic region. FinSpace makes it easy to discover and share data across your
organization in accordance with your compliance requirements. You define your data access policies in
one place and FinSpace enforces them while keeping audit logs to allow for compliance and activity
reporting. FinSpace also includes a library of 100+ functions, like time bars and Bollinger bands, for you
to prepare data for analysis.
Amazon Kinesis
makes it easy to collect, process, and analyze real-time, streaming data so you can get
timely insights and react quickly to new information. Amazon Kinesis offers key capabilities to cost-effectively process streaming data at any scale, along with the flexibility to choose the tools that best
suit the requirements of your application. With Amazon Kinesis, you can ingest real-time data such
as video, audio, application logs, website clickstreams, and IoT telemetry data for machine learning,
analytics, and other applications. Amazon Kinesis enables you to process and analyze data as it arrives
and respond instantly instead of having to wait until all your data is collected before the processing can
begin.
Amazon Kinesis Data Analytics
is the easiest way to analyze streaming data, gain actionable insights,
and respond to your business and customer needs in real time. Amazon Kinesis Data Analytics reduces
the complexity of building, managing, and integrating streaming applications with other AWS services.
SQL users can easily query streaming data or build entire streaming applications using templates and an
interactive SQL editor. Java developers can quickly build sophisticated streaming applications using open
source Java libraries and AWS integrations to transform and analyze data in real-time.
Amazon Kinesis Data Analytics takes care of everything required to run your queries continuously and
scales automatically to match the volume and throughput rate of your incoming data.
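As a rough sketch of the SQL experience, the fragment below counts records per minute in a tumbling window. The destination stream and column names are illustrative; `SOURCE_SQL_STREAM_001` follows the service's conventional default input stream name, but treat the exact names as assumptions:

```sql
-- Illustrative continuous query: per-minute counts over an incoming stream.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (page VARCHAR(64), views INTEGER);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
  SELECT STREAM "page", COUNT(*) AS views
  FROM "SOURCE_SQL_STREAM_001"
  GROUP BY "page",
           STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '1' MINUTE);
```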
Amazon Kinesis Video Streams
Amazon Kinesis Video Streams makes it easy to securely stream video from connected devices to AWS
for analytics, machine learning (ML), playback, and other processing. Kinesis Video Streams automatically
provisions and elastically scales all the infrastructure needed to ingest streaming video data from
millions of devices. It also durably stores, encrypts, and indexes video data in your streams, and allows
you to access your data through easy-to-use APIs. Kinesis Video Streams enables you to playback
video for live and on-demand viewing, and quickly build applications that take advantage of computer
vision and video analytics through integration with Amazon Rekognition Video, and libraries for ML
frameworks such as Apache MXNet, TensorFlow, and OpenCV.
Amazon Redshift
is the most widely used cloud data warehouse. It makes it fast, simple, and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools.
It allows you to run complex analytic queries against terabytes to petabytes of structured and semi-structured data, using sophisticated query optimization, columnar storage on high-performance storage,
and massively parallel query execution. Most results come back in seconds. You can start small for just
$0.25 per hour with no commitments and scale out to petabytes of data for $1,000 per terabyte per
year, less than a tenth the cost of traditional on-premises solutions.
Amazon QuickSight
is a fast, cloud-powered business intelligence (BI) service that makes it easy for you
to deliver insights to everyone in your organization. QuickSight lets you create and publish interactive
dashboards that can be accessed from browsers or mobile devices. You can embed dashboards into your
applications, providing your customers with powerful self-service analytics. QuickSight easily scales to
tens of thousands of users without any software to install, servers to deploy, or infrastructure to manage.
AWS Data Exchange
AWS Data Exchange makes it easy to find, subscribe to, and use third-party data in the cloud. Qualified
data providers include category-leading brands such as Reuters, who curate data from over 2.2 million
unique news stories per year in multiple languages; Change Healthcare, who process and anonymize
more than 14 billion healthcare transactions and $1 trillion in claims annually; Dun & Bradstreet, who
maintain a database of more than 330 million global business records; and Foursquare, whose location
data is derived from 220 million unique consumers and includes more than 60 million global commercial
venues.
Once subscribed to a data product, you can use the AWS Data Exchange API to load data directly into
Amazon S3 and then analyze it with a wide variety of AWS analytics and machine learning services.
For example, property insurers can subscribe to data to analyze historical weather patterns to calibrate
insurance coverage requirements in different geographies; restaurants can subscribe to population and
location data to identify optimal regions for expansion; academic researchers can conduct studies on
climate change by subscribing to data on carbon dioxide emissions; and healthcare professionals can
subscribe to aggregated data from historical clinical trials to accelerate their research activities.
For data providers, AWS Data Exchange makes it easy to reach the millions of AWS customers migrating
to the cloud by removing the need to build and maintain infrastructure for data storage, delivery, billing,
and entitling.
AWS Data Pipeline
is a web service that helps you reliably process and move data between different
AWS compute and storage services, as well as on-premises data sources, at specified intervals. With AWS
Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and
efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon EMR.
AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant,
repeatable, and highly available. You don’t have to worry about ensuring resource availability, managing
inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure
notification system. AWS Data Pipeline also allows you to move and process data that was previously
locked up in on-premises data silos.
AWS Glue
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue provides all the capabilities needed for data integration so that you can start analyzing your data and putting it to use in minutes instead of months.
Data integration is the process of preparing and combining data for analytics, machine learning, and application development. It involves multiple tasks, such as discovering and extracting data from various sources; enriching, cleaning, normalizing, and combining data; and loading and organizing data in databases, data warehouses, and data lakes. These tasks are often handled by different types of users that each use different products.
AWS Glue provides both visual and code-based interfaces to make data integration easier. Users can easily find and access data using the AWS Glue Data Catalog. Data engineers and ETL (extract, transform, and load) developers can visually create, run, and monitor ETL workflows with a few clicks in AWS Glue Studio. Data analysts and data scientists can use AWS Glue DataBrew to visually enrich, clean, and normalize data without writing code. With AWS Glue Elastic Views, application developers can use familiar Structured Query Language (SQL) to combine and replicate data across different data stores.
AWS Lake Formation
is a service that makes it easy to set up a secure data lake in days. A data lake is
a centralized, curated, and secured repository that stores all your data, both in its original form and
prepared for analysis. A data lake enables you to break down data silos and combine different types of
analytics to gain insights and guide better business decisions.
However, setting up and managing data lakes today involves a lot of manual, complicated, and time-consuming tasks. This work includes loading data from diverse sources, monitoring those data flows,
setting up partitions, turning on encryption and managing keys, defining transformation jobs and
monitoring their operation, re-organizing data into a columnar format, configuring access control
settings, deduplicating redundant data, matching linked records, granting access to data sets, and
auditing access over time.
Creating a data lake with Lake Formation is as simple as defining where your data resides and what data
access and security policies you want to apply. Lake Formation then collects and catalogs data from
databases and object storage, moves the data into your new Amazon S3 data lake, cleans and classifies
data using machine learning algorithms, and secures access to your sensitive data. Your users can then
access a centralized catalog of data which describes available data sets and their appropriate usage. Your
users then leverage these data sets with their choice of analytics and machine learning services, like
Amazon EMR for Apache Spark, Amazon Redshift, Amazon Athena, SageMaker, and Amazon QuickSight.
Amazon Managed Streaming for Apache Kafka
Amazon MSK
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that makes
it easy for you to build and run applications that use Apache Kafka to process streaming data. Apache
Kafka is an open-source platform for building real-time streaming data pipelines and applications.
With Amazon MSK, you can use Apache Kafka APIs to populate data lakes, stream changes to and from
databases, and power machine learning and analytics applications.
Apache Kafka clusters are challenging to set up, scale, and manage in production. When you run Apache
Kafka on your own, you need to provision servers, configure Apache Kafka manually, replace servers
when they fail, orchestrate server patches and upgrades, architect the cluster for high availability, ensure
data is durably stored and secured, set up monitoring and alarms, and carefully plan scaling events to
support load changes. Amazon MSK makes it easy for you to build and run production applications on
Apache Kafka without needing Apache Kafka infrastructure management expertise. That means you
spend less time managing infrastructure and more time building applications.
With a few clicks in the Amazon MSK console you can create highly available Apache Kafka clusters
with settings and configuration based on Apache Kafka’s deployment best practices. Amazon MSK
automatically provisions and runs your Apache Kafka clusters. Amazon MSK continuously monitors
cluster health and automatically replaces unhealthy nodes with no downtime to your application. In
addition, Amazon MSK secures your Apache Kafka cluster by encrypting data at rest.
AWS Step Functions
is a fully managed service that makes it easy to coordinate the components of
distributed applications and microservices using visual workflows. Building applications from individual
components that each perform a discrete function lets you scale easily and change applications quickly.
Step Functions is a reliable way to coordinate components and step through the functions of your
application. Step Functions provides a graphical console to arrange and visualize the components of
your application as a series of steps. This makes it simple to build and run multi-step applications.
Step Functions automatically triggers and tracks each step, and retries when there are errors, so your
application runs in order and as expected. Step Functions logs the state of each step, so when things do
go wrong, you can diagnose and debug problems quickly. You can change and add steps without even
writing code, so you can easily evolve your application and innovate faster.
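Step Functions workflows are defined in Amazon States Language, a JSON format. The minimal two-state sketch below shows one task flowing into the next, with an automatic retry on failure; the Lambda ARNs are placeholders:

```json
{
  "Comment": "Minimal two-step workflow sketch; the Lambda ARNs are placeholders.",
  "StartAt": "ProcessOrder",
  "States": {
    "ProcessOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ProcessOrder",
      "Retry": [{ "ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 2 }],
      "Next": "NotifyCustomer"
    },
    "NotifyCustomer": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:NotifyCustomer",
      "End": true
    }
  }
}
```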
Amazon AppFlow
is a fully managed integration service that enables you to securely transfer data
between Software-as-a-Service (SaaS) applications like Salesforce, Zendesk, Slack, and ServiceNow, and
AWS services like Amazon S3 and Amazon Redshift, in just a few clicks. With Amazon AppFlow, you can
run data flows at enterprise scale at the frequency you choose - on a schedule, in response to a business
event, or on demand. You can configure data transformation capabilities like filtering and validation to
generate rich, ready-to-use data as part of the flow itself, without additional steps. Amazon AppFlow
automatically encrypts data in motion, and allows users to restrict data from flowing over the public
Internet for SaaS applications that are integrated with AWS PrivateLink, reducing exposure to security
threats.
Amazon EventBridge
is a serverless event bus that makes it easier to build event-driven applications
at scale using events generated from your applications, integrated Software-as-a-Service (SaaS)
applications, and AWS services. EventBridge delivers a stream of real-time data from event sources such
as Zendesk or Shopify to targets like AWS Lambda and other SaaS applications. You can set up routing
rules to determine where to send your data to build application architectures that react in real-time to
your data sources with event publisher and consumer completely decoupled.
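A routing rule matches events against a JSON event pattern. For example, a pattern shaped like the one below (shown as a sketch of a commonly documented case) would match EC2 instance-termination events, and a rule carrying it could forward those events to a Lambda target:

```json
{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": {
    "state": ["terminated"]
  }
}
```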
Amazon Managed Workflows for Apache Airflow
MWAA
Amazon Managed Workflows for Apache Airflow (MWAA) is a managed orchestration service for Apache Airflow that makes it easier to set up and operate end-to-end data pipelines in the cloud at scale. Apache Airflow is an open-source tool used to programmatically author, schedule, and monitor sequences of processes and tasks referred to as “workflows.” With Managed Workflows, you can use Airflow and Python to create workflows without having to manage the underlying infrastructure for scalability, availability, and security. Managed Workflows automatically scales its workflow execution capacity to meet your needs, and is integrated with AWS security services to help provide you with fast and secure access to data.
Amazon MQ
is a managed message broker service for Apache ActiveMQ and RabbitMQ that makes it
easy to set up and operate message brokers in the cloud. Message brokers allow different software
systems, often using different programming languages and running on different platforms, to communicate
and exchange information. Amazon MQ reduces your operational load by managing the provisioning,
setup, and maintenance of ActiveMQ and RabbitMQ, popular open-source message brokers. Connecting
your current applications to Amazon MQ is easy because it uses industry-standard APIs and protocols for
messaging, including JMS, NMS, AMQP, STOMP, MQTT, and WebSocket. Using standards means that in
most cases, there’s no need to rewrite any messaging code when you migrate to AWS.
Amazon Simple Notification Service
is a highly available, durable, secure, fully managed
pub/sub messaging service that enables you to decouple microservices, distributed systems, and
serverless applications. Amazon SNS provides topics for high-throughput, push-based, many-to-many
messaging. Using Amazon SNS topics, your publisher systems can fan out messages to a large number of
subscriber endpoints for parallel processing, including Amazon SQS queues, AWS Lambda functions, and
HTTP/S webhooks. Additionally, SNS can be used to fan out notifications to end users using mobile push,
SMS, and email.
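The fan-out model can be sketched locally in plain Python. This is not the SNS API, only an illustration of how one publish reaches every subscriber independently:

```python
# A local, in-memory sketch of the pub/sub fan-out model SNS implements.
# This is NOT the SNS API; it only illustrates one publish, many deliveries.

class Topic:
    def __init__(self, name):
        self.name = name
        self.subscribers = []  # stand-ins for SQS queues, Lambda functions, webhooks

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        # A single publish is delivered to every subscriber independently.
        for deliver in self.subscribers:
            deliver(message)

received = []
orders = Topic("orders")
orders.subscribe(lambda m: received.append(("queue", m)))   # stand-in for an SQS queue
orders.subscribe(lambda m: received.append(("lambda", m)))  # stand-in for a Lambda function
orders.publish("order-123")
print(received)  # [('queue', 'order-123'), ('lambda', 'order-123')]
```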
Amazon Simple Queue Service
is a fully managed message queuing service that enables
you to decouple and scale microservices, distributed systems, and serverless applications. SQS eliminates
the complexity and overhead associated with managing and operating message oriented middleware,
and empowers developers to focus on differentiating work. Using SQS, you can send, store, and receive
messages between software components at any volume, without losing messages or requiring other
services to be available. Get started with SQS in minutes using the AWS console, Command Line
Interface or SDK of your choice, and three simple commands.
SQS offers two types of message queues. Standard queues offer maximum throughput, best-effort
ordering, and at-least-once delivery. SQS FIFO queues are designed to guarantee that messages are
processed exactly once, in the exact order that they are sent.
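The FIFO guarantees can be illustrated with a small local sketch. This is not the SQS API, and the deduplication logic below is a simplified stand-in for what FIFO queues do with message deduplication IDs:

```python
# A local sketch of FIFO queue semantics: strict ordering plus deduplication.
# This is NOT the SQS API; standard queues, by contrast, offer best-effort
# ordering and at-least-once delivery (a message may be delivered twice).

from collections import deque

class FifoQueue:
    def __init__(self):
        self._messages = deque()
        self._seen_dedup_ids = set()

    def send(self, body, dedup_id):
        # FIFO queues drop a message whose deduplication ID was already seen.
        if dedup_id in self._seen_dedup_ids:
            return False
        self._seen_dedup_ids.add(dedup_id)
        self._messages.append(body)
        return True

    def receive(self):
        # Messages come out in exactly the order they were sent.
        return self._messages.popleft() if self._messages else None

q = FifoQueue()
q.send("first", dedup_id="a")
q.send("first", dedup_id="a")   # duplicate: silently dropped
q.send("second", dedup_id="b")
print(q.receive(), q.receive())  # first second
```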
Amazon Simple Workflow Service
helps developers build, run, and scale background
jobs that have parallel or sequential steps. You can think of Amazon SWF as a fully-managed state
tracker and task coordinator in the cloud. If your application’s steps take more than 500 milliseconds to
complete, you need to track the state of processing; if you also need to recover or retry when a task fails,
Amazon SWF can help you.
Amazon Sumerian
lets you create and run virtual reality (VR), augmented reality (AR), and 3D
applications quickly and easily without requiring any specialized programming or 3D graphics expertise.
With Sumerian, you can build highly immersive and interactive scenes that run on popular hardware
such as Oculus Go, Oculus Rift, HTC Vive, HTC Vive Pro, Google Daydream, and Lenovo Mirage as well
as Android and iOS mobile devices. For example, you can build a virtual classroom that lets you train
new employees around the world, or you can build a virtual environment that enables people to tour
a building remotely. Sumerian makes it easy to create all the building blocks needed to build highly
immersive and interactive 3D experiences including adding objects (e.g. characters, furniture, and
landscape), and designing, animating, and scripting environments. Sumerian does not require specialized
expertise, and you can design scenes directly from your browser.
Amazon Managed Blockchain
is a fully managed service that makes it easy to create and manage
scalable blockchain networks using the popular open source frameworks Hyperledger Fabric and
Ethereum.
Blockchain makes it possible to build applications where multiple parties can execute transactions
without the need for a trusted, central authority. Today, building a scalable blockchain network with
existing technologies is complex to set up and hard to manage. To create a blockchain network, each
network member needs to manually provision hardware, install software, create and manage certificates
for access control, and configure networking components. Once the blockchain network is running, you
need to continuously monitor the infrastructure and adapt to changes, such as an increase in transaction
requests, or new members joining or leaving the network.
Amazon Managed Blockchain is a fully managed service that allows you to set up and manage a scalable
blockchain network with just a few clicks. Amazon Managed Blockchain eliminates the overhead required
to create the network, and automatically scales to meet the demands of thousands of applications
running millions of transactions. Once your network is up and running, Managed Blockchain makes
it easy to manage and maintain your blockchain network. It manages your certificates, lets you easily
invite new members to join the network, and tracks operational metrics such as usage of compute,
memory, and storage resources. In addition, Managed Blockchain can replicate an immutable copy of
your blockchain network activity into Amazon Quantum Ledger Database (QLDB), a fully managed
ledger database. This allows you to easily analyze the network activity outside the network and gain
insights into trends.