AWS Practice Test Flashcards
VPC Flow Logs reporting use case 1
-publish to S3 in Apache Parquet format and analyze with Athena
Dual-layer server-side encryption with KMS keys (DSSE-KMS)
-encrypts data at rest
-two layers of encryption; better than single-layer for sensitive data
S3 Object Lock in compliance mode
-prevents deletion by any user, including the root user
-retention period sets how long the object can’t be deleted
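
A toy model of the retention rule above, not the S3 API: it assumes a hypothetical `can_delete` helper and made-up dates, and only illustrates that in compliance mode the caller's identity does not matter until the retain-until date has passed.

```python
from datetime import datetime, timedelta, timezone


def can_delete(retain_until: datetime, now: datetime, is_root: bool = False) -> bool:
    """Compliance mode sketch: deletion is blocked for everyone until retention expires.

    `is_root` is accepted only to show that it is deliberately ignored.
    """
    return now > retain_until


# Hypothetical object locked on 2024-01-01 with a 365-day retention period.
locked_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
retain_until = locked_at + timedelta(days=365)

during_retention = datetime(2024, 6, 1, tzinfo=timezone.utc)
after_retention = datetime(2025, 6, 1, tzinfo=timezone.utc)

print(can_delete(retain_until, during_retention, is_root=True))
print(can_delete(retain_until, after_retention))
```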
Kinesis Data Streams KCL (Kinesis Client Library)
-consumer reads from the stream and writes to a destination such as S3
-buffer is where records are aggregated before being flushed; improves performance
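
The buffering idea can be sketched in plain Python. This is a generic illustration, not KCL code: `BufferedWriter`, `flush_size`, and `sink` are hypothetical names, and the sink here is just a list standing in for a destination such as S3.

```python
class BufferedWriter:
    """Aggregate records in memory and hand them to the sink in batches."""

    def __init__(self, flush_size, sink):
        self.flush_size = flush_size  # number of records that triggers a flush
        self.sink = sink              # callable that receives a list of records
        self.buffer = []

    def add(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.flush_size:
            self.flush()

    def flush(self):
        # Write whatever is buffered as one batch, then start a new buffer.
        if self.buffer:
            self.sink(list(self.buffer))
            self.buffer.clear()


batches = []
writer = BufferedWriter(flush_size=3, sink=batches.append)
for record in range(7):
    writer.add(record)
writer.flush()  # flush the final partial batch

print(batches)
```

Seven records reach the sink in three writes instead of seven, which is where the performance gain comes from.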
Kinesis: randomly generated partition key
-helps distribute data evenly across shards
-use when seeing WriteProvisionedThroughputExceeded
-symptom: “some shards heavily used, and others idle”
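
A rough simulation of why random keys help. Kinesis maps each partition key to a 128-bit value with MD5 and routes the record to the shard whose hash key range contains it. The sketch assumes the shards split the hash space evenly; `shard_for`, `distribution`, and the key `device-1` are hypothetical.

```python
import hashlib
import uuid
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard an evenly split stream would route this key to."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


def distribution(keys, shard_count):
    """Count how many records land on each shard."""
    return Counter(shard_for(key, shard_count) for key in keys)


random_keys = [str(uuid.uuid4()) for _ in range(10_000)]
hot_keys = ["device-1"] * 10_000  # every record reuses one key

even = distribution(random_keys, shard_count=4)
skewed = distribution(hot_keys, shard_count=4)

print(sorted(even.items()))
print(sorted(skewed.items()))
```

With a single repeated key every record lands on one shard, while random keys spread across all four.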
Glacier Vault Lock policy
-protects archives in an S3 Glacier vault from being deleted
HDFS and scaling (EMR)
-instance fleets = managed scaling only
-uniform instance groups = custom automatic scaling supported
MWAA (Amazon Managed Workflows for Apache Airflow)
-orchestration with open-source support
EMR security
-can set encryption for data in transit and at rest
SageMaker Data Wrangler
-visualize and prepare data
-can query data from Athena
-can connect to external databases using JDBC
Athena and partitions: file manually deleted from S3
-if Athena queries error, run ALTER TABLE DROP PARTITION for the stale partition
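
A minimal sketch of building that statement. The helper `drop_partition_sql`, the table `access_logs`, and the partition columns are hypothetical, and it assumes string-typed partition columns with trusted values (no escaping is done).

```python
def drop_partition_sql(table: str, partition: dict) -> str:
    """Build an Athena DDL statement that removes one stale partition."""
    spec = ", ".join(f"{column} = '{value}'" for column, value in partition.items())
    # IF EXISTS avoids an error if the partition was already removed.
    return f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({spec})"


statement = drop_partition_sql("access_logs", {"year": "2024", "month": "01"})
print(statement)
```

This only removes the partition metadata from the catalog; the S3 files were already deleted by hand.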
EBS volume size
-gp2 = performance tied to size
-no custom IOPS
gp3 = IOPS independent of size
-cost effective for underused volumes
-maintains the same performance
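
The gp2 versus gp3 difference as arithmetic, assuming the published baselines: gp2 delivers 3 IOPS per GiB (minimum 100, maximum 16,000), while gp3 starts at 3,000 IOPS at any size. The function names are hypothetical.

```python
import math

GP3_BASELINE_IOPS = 3_000  # gp3 baseline, independent of volume size


def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, floored at 100 and capped at 16,000."""
    return min(max(3 * size_gib, 100), 16_000)


def gp2_size_needed(target_iops: int) -> int:
    """Smallest gp2 volume (GiB) whose baseline reaches target_iops."""
    return math.ceil(target_iops / 3)


print(gp2_baseline_iops(100))              # small gp2 volume
print(gp2_size_needed(GP3_BASELINE_IOPS))  # gp2 size needed to match the gp3 baseline
```

A mostly empty gp2 volume that was oversized just to reach its IOPS target is the "underused volume" case: gp3 gives the same IOPS without paying for the extra capacity.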
AWS Glue Data Catalog ResourceNumberLimitExceededException during version update
-increase the “versions” quota
-delete older “versions”
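
One way the "delete older versions" option could look. The sketch takes a boto3 Glue client (for example `boto3.client("glue")`) as the `glue` argument and uses its `get_table_versions` and `batch_delete_table_version` calls. The helper name, `keep` default, and the assumption that version IDs are numeric strings are mine.

```python
def prune_table_versions(glue, database: str, table: str, keep: int = 5) -> list:
    """Delete all but the `keep` newest versions of a Glue table; return the deleted IDs."""
    version_ids = []
    kwargs = {"DatabaseName": database, "TableName": table}
    while True:
        page = glue.get_table_versions(**kwargs)
        # Assumption: VersionId values are numeric strings such as "12".
        version_ids += [int(v["VersionId"]) for v in page["TableVersions"]]
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token

    stale = sorted(version_ids, reverse=True)[keep:]
    # The batch delete call accepts a limited number of IDs, so send chunks of 100.
    for start in range(0, len(stale), 100):
        chunk = stale[start:start + 100]
        glue.batch_delete_table_version(
            DatabaseName=database,
            TableName=table,
            VersionIds=[str(v) for v in chunk],
        )
    return stale
```

Raising the quota fixes the error immediately; pruning like this keeps it from coming back.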
Authenticating in Redshift with a third-party identity provider (IdP)
-first step: register the provider from within Redshift
-Redshift provides native IdP federation
-after registering, can configure clusters to use the IdP for auth
DMS CDC troubleshooting
-CDCLatencySource: check this metric in CloudWatch and determine if the number is high
-CDCIncomingChanges: change events (inserts, updates, deletes) waiting to be applied to the target
Redshift data sharing
-can’t share to an AZ, only across Regions
Glue crawler and table creation: how to prevent multiple tables? Keep the same:
-S3 partition structure (prefix)
-file type (CSV, Parquet, etc.)
-compression type
-schema
Encrypting Data Catalog metadata and objects
-turn on encryption in the catalog settings for the entire catalog
-only supports symmetric customer managed keys
Graviton instances and EMR
-best practice to use a mix of On-Demand and Spot
EMR storage, ephemeral vs persistent: which file system to use?
-HDFS for ephemeral storage, which is lost when the instance is terminated
-S3 for persistent, more permanent storage
S3 Standard-Infrequent Access (Standard-IA)
-for infrequently accessed data that still needs near-real-time retrieval, at lower cost
OpenSearch
-think full-text search and analytics
-example: application images with ML extracting metadata values; analysts need to search apps by name, date, or text
SQS queue
-dead-letter queue: unprocessed messages move here instead of being lost
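
A sketch of the queue attribute that attaches a dead-letter queue. The `RedrivePolicy` attribute with `deadLetterTargetArn` and `maxReceiveCount` is the SQS mechanism; the helper name, ARN, and receive count below are placeholders.

```python
import json


def redrive_attributes(dlq_arn: str, max_receive_count: int = 5) -> dict:
    """Build the Attributes dict that points a source queue at its dead-letter queue."""
    policy = {
        "deadLetterTargetArn": dlq_arn,
        # After this many failed receives, SQS moves the message to the DLQ.
        "maxReceiveCount": str(max_receive_count),
    }
    return {"RedrivePolicy": json.dumps(policy)}


# Hypothetical ARN, for illustration only.
attributes = redrive_attributes("arn:aws:sqs:us-east-1:123456789012:orders-dlq")
print(attributes["RedrivePolicy"])
```

With boto3 the result would be passed as `Attributes` to `set_queue_attributes` on the source queue.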
3 ways to increase performance when reading a Kinesis stream with Lambda
-test different parallelization factor settings
-Lambda as consumer with enhanced fan-out
-increase number of shards for the stream
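
Two of these levers multiply together: Lambda can process up to (shards × parallelization factor) batches at once, with the factor configurable from 1 to 10 per shard. A small sketch with a hypothetical helper name:

```python
def max_concurrent_batches(shards: int, parallelization_factor: int = 1) -> int:
    """Upper bound on batches Lambda processes concurrently for one stream."""
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("parallelization factor must be between 1 and 10")
    return shards * parallelization_factor


print(max_concurrent_batches(4))      # default factor
print(max_concurrent_batches(4, 10))  # same shards, maximum factor
print(max_concurrent_batches(8, 10))  # more shards and maximum factor
```

Enhanced fan-out is the separate lever: it gives the consumer dedicated read throughput per shard instead of sharing it with other consumers.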
AWS KMS keys: SSE-KMS (server-side encryption)
-Redshift COPY command can still read the SSE-KMS encrypted objects from S3
Redshift SUPER column type
-stores nested JSON
-can use PartiQL to query