Services and Features that make me say "huh?" Flashcards
catch up on tech I haven't encountered
AWS Data Exchange
data set catalog.
Customers subscribe to a data product, then use the API to load it into S3 and analyze.
Data providers easily reach customers w/o needing to build/maintain infra for data storage, delivery, billing, and entitling.
AWS Data Pipeline
web service to process and move data between AWS compute/storage services and on-prem sources at specified intervals.
Access, transform, process at scale, transfer from S3, RDS, DynamoDB, EMR.
Create complex data processing workloads that are fault tolerant, repeatable, and highly available.
Amazon EMR
Big Data platform for processing vast amounts of data using open source tools such as Apache (Spark, Hive, HBase, Flink, Hudi), and Presto.
Automates capacity provisioning and cluster tuning.
< half cost of on-prem, 3x faster than traditional Spark.
Run workloads on EC2, EKS, or on-prem with “Amazon EMR on AWS Outposts”
AWS Lake Formation
Data Lake = centralized, curated, secured repo that stores all your data in original form and prepped for analysis.
Simply define where data resides, define data access and security policies.
LF:
- collects and catalogs from DBs and object storage
- moves it to S3 data lake
- cleans and classifies w/ ML algorithms
- secures access to sensitive data
Users leverage data sets with choice of analytics/ML, such as
- Amazon EMR for Apache Spark
- Amazon Redshift
- Amazon Athena
- SageMaker AI
- Amazon QuickSight
Amazon Managed Streaming for Apache Kafka (Amazon MSK)
Apache Kafka: open-source platform for building real-time streaming data pipelines and apps. Kafka clusters are hard to set up, scale, and manage.
MSK: creates a highly available cluster for you, replaces unhealthy nodes, encrypts data at rest.
Users use Kafka APIs to populate data lakes, stream changes to/from DBs, power ML/analytics apps
Amazon OpenSearch Service
easy to deploy, secure, operate, and scale OpenSearch to search, analyze, and visualize data in real-time.
APIs for log analytics, full-text search, app monitoring, clickstream analytics
Integrates with OpenSearch Dashboards and Logstash for data ingestion / visualization.
Integrates with VPC, KMS, Data Firehose, Lambda, IAM, Cognito, CloudWatch
Also has a Serverless option, allowing you to run petabyte-scale workloads w/o managing/scaling your own clusters.
Also has a “vector engine for Amazon OpenSearch Serverless”: adds simple, scalable, high-performing vector storage and search for ML-augmented search experiences and gen AI apps. Use cases: image search, doc search, music retrieval, product recs, video search, location-based search, fraud detection, anomaly detection.
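To make "vector search" concrete, here is a minimal sketch of the idea in plain Python: items are stored as embedding vectors and a query returns the nearest ones by cosine similarity. This is a brute-force illustration only, not how the OpenSearch vector engine is implemented; the document ids and 3-dimensional vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k document ids whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical embeddings; real ones usually have hundreds of dimensions.
index = {
    "guitar-song": [0.9, 0.1, 0.0],
    "piano-song": [0.8, 0.3, 0.1],
    "tax-form": [0.0, 0.2, 0.9],
}

print(top_k([1.0, 0.2, 0.0], index))
```

Scanning every vector like this gets slow as the index grows, which is why production vector stores generally rely on approximate nearest-neighbor indexes instead.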
Amazon QuickSight
fast, cloud-powered BI service.
Create and publish interactive dashboards for browsers and mobile devices.
Embed dashboards into applications, for customer self-service analytics.
scales to 10k+ users w/o any software to install, servers to deploy, or infra to manage.
Amazon Redshift
cloud data warehouse (w/ CLEAN data, as opposed to a data lake).
fast, simple cost-effective analysis w/ SQL or BI tools, on TB to PB of structured and semi-structured data.
- sophisticated query optimization
- columnar storage on high-performance storage
- massively parallel query execution
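Why columnar storage helps analytics, as a rough sketch: an aggregate over one field only has to scan that field's column instead of every whole row. The table and field names below are hypothetical, and this toy says nothing about Redshift's actual on-disk format.

```python
# Hypothetical sales rows; field names are invented for illustration.
rows = [
    {"order_id": 1, "region": "EU", "amount": 20.0},
    {"order_id": 2, "region": "US", "amount": 35.5},
    {"order_id": 3, "region": "EU", "amount": 12.5},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    return {key: [row[key] for row in rows] for key in rows[0]}


columns = to_columns(rows)

# Row layout: every whole row is visited just to read one field.
total_from_rows = sum(row["amount"] for row in rows)

# Column layout: only the "amount" column is scanned.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```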
scale from $0.25/hour to $1,000/terabyte/year (less than 1/10th the cost of traditional on-prem solutions)
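To put the two price points on the same yearly footing, a quick arithmetic sketch. The rates are the ones quoted on this card and may be out of date; the 50 TB figure is just an example input.

```python
HOURS_PER_YEAR = 24 * 365  # non-leap year


def annual_cost(hourly_rate, hours=HOURS_PER_YEAR):
    """Yearly cost of something billed by the hour and left running all year."""
    return hourly_rate * hours


def storage_cost_per_year(terabytes, rate_per_tb_year=1000):
    """Yearly cost at a flat per-terabyte-per-year rate."""
    return terabytes * rate_per_tb_year


print(annual_cost(0.25))          # 2190.0
print(storage_cost_per_year(50))  # 50000
```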
Amazon AppFlow
managed integration service, transfers data between SaaS apps eg Salesforce, Zendesk, Slack, ServiceNow and AWS services eg S3, Redshift.
data flows at enterprise scale in frequency of: on a schedule, in response to biz event, or on demand.
does data transformation eg filtering, validation, to produce ready-to-use data
encrypts data in motion, and can restrict from public internet w/ PrivateLink
AWS AppSync
serverless backend for mobile, web, and enterprise apps. Makes it easy to build data driven apps by securely handling all app data mgmt tasks eg online/offline data access, data sync, data manipulation across multiple data sources.
Uses GraphQL, an API query language
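For a feel of what a GraphQL request looks like, a minimal sketch: the client names exactly the fields it wants, and the operation is serialized as JSON with `query` and `variables` keys. The `getTodo` query and its fields are a hypothetical schema, and nothing here calls AppSync.

```python
import json

# Hypothetical schema: a `getTodo` query returning `id`, `title`, and `done`.
QUERY = """
query GetTodo($id: ID!) {
  getTodo(id: $id) {
    id
    title
    done
  }
}
"""


def build_request_body(query, variables):
    """Serialize a GraphQL operation the way it is commonly sent over HTTP POST."""
    return json.dumps({"query": query, "variables": variables})


body = build_request_body(QUERY, {"id": "42"})
print(body)
```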
AWS Batch
for devs, scientists, engineers to run 100k+ compute jobs. AWS Batch provisions the optimal qty/type of instances (eg CPU or mem-optimized), based on volume and resource req’s of batch jobs submitted.
plans, schedules, runs batch computing workloads across “full range of AWS compute services and features” eg EC2 and Spot.
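A toy sketch of the "optimal qty/type of instances" idea: given a job's CPU and memory needs, pick the cheapest instance type that fits. This is only an assumption about the general shape of such a decision, not AWS Batch's actual algorithm; the catalog names, sizes, and prices are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: int
    hourly_cost: float


# Hypothetical catalog; none of these are real instance types or prices.
CATALOG = [
    InstanceType("compute-small", 4, 8, 0.17),
    InstanceType("compute-large", 16, 32, 0.68),
    InstanceType("memory-small", 4, 32, 0.25),
    InstanceType("memory-large", 16, 128, 1.00),
]


def pick_instance(vcpus_needed, memory_gib_needed, catalog=CATALOG):
    """Return the cheapest instance type that satisfies the job, or None."""
    candidates = [
        t for t in catalog
        if t.vcpus >= vcpus_needed and t.memory_gib >= memory_gib_needed
    ]
    return min(candidates, key=lambda t: t.hourly_cost, default=None)


print(pick_instance(2, 24).name)  # memory-heavy job
print(pick_instance(8, 16).name)  # CPU-heavy job
```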
AWS Outposts
AWS servers in quarter, half-rack, or full-rack units.
two variants:
* VMware Cloud: use same VMware control plane / APIs you use on your infra
* AWS-native: use same APIs / control plane you already use in AWS Cloud
order from AWS Management Console
AWS Serverless Application Repository
free to use code samples, components, and whole apps. Only pay for AWS resources used by the apps you deploy.
each app is packaged with an AWS Serverless Application Model (AWS SAM) template, defining resources used.
public apps include a link to source code
can publish your own for use with team, org, or public
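What a SAM template "defining resources used" roughly looks like, sketched as a Python dict and dumped as JSON (SAM templates are more often written in YAML). The resource name `HelloFunction`, the handler, the runtime, and the code path are all hypothetical.

```python
import json

# Minimal SAM-style template. The Transform line marks it as a SAM template;
# the function's name and property values are placeholders.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Transform": "AWS::Serverless-2016-10-31",
    "Resources": {
        "HelloFunction": {
            "Type": "AWS::Serverless::Function",
            "Properties": {
                "Handler": "app.handler",
                "Runtime": "python3.12",
                "CodeUri": "./src",
            },
        }
    },
}


def resource_types(template):
    """List the resource types a template declares."""
    return sorted(r["Type"] for r in template["Resources"].values())


print(json.dumps(template, indent=2))
print(resource_types(template))
```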
VMware Cloud on AWS
jointly developed offering, for orgs to migrate and extend on-prem VMware vSphere-based envs to AWS Cloud, running on EC2 bare metal infra.
not available globally, but adds new regions each release.
seamlessly integrates AWS Cloud with VMware tech eg vSphere, vSAN, NSX, vCenter Server
AWS Wavelength
AWS infra embedded in telecom providers’ data centers at the edge of 5G networks, so mobile edge computing apps get better latency/bandwidth.
Amazon ECS Anywhere
ECS on customer managed infra
Amazon EKS Anywhere
EKS on customer managed infra
Amazon EKS Distro
open source k8s distro, so you can run it on your own infra w/o being tied to AWS’ update schedule.
Amazon DocumentDB (with MongoDB compatibility)
fast, scalable, highly available fully managed Binary JSON (BSON) doc db service that supports MongoDB workloads.
Amazon Keyspaces (for Apache Cassandra)
Apache Cassandra is an (old) open-source, NoSQL db designed to store data for apps that require fast read/write. Can store user profile information for online games, device metadata for IoT apps, or records for events.
AK for AC is a scalable, highly available, managed Apache Cassandra compatible db service.
serverless, automatically scale tables up and down in response to app traffic, serve 1k+ requests / second
Cassandra itself was originally dev’d at Facebook and open-sourced in 2008
Amazon Neptune
Amazon Quantum Ledger Database (Amazon QLDB)
Amazon QLDB
Amazon Quantum Ledger Database
AWS Amplify
AWS Device Farm
Amazon Pinpoint
Amazon Comprehend
Amazon Forecast
Amazon Fraud Detector
Amazon Kendra
Amazon Lex
Amazon SageMaker
Amazon Textract
AWS Compute Optimizer
AWS Control Tower
AWS License Manager
AWS Proton
Amazon Elastic Transcoder
Amazon Kinesis Video Streams
AWS Application Discovery Service
AWS Application Migration Service
AWS DataSync
AWS Migration Hub
AWS Artifact
AWS Audit Manager
AWS CloudHSM
Amazon Detective
AWS Firewall Manager
AWS Network Firewall
Amazon Macie
AWS Resource Access Manager (AWS RAM)
AWS Secrets Manager
Amazon FSx (for all types)
AWS Storage Gateway