Tutorials Dojo Flashcards

1
Q

AWS Data Exchange

A

-third-party datasets, delivered to S3
-accessed via the API (e.g., GetDataSet)

2
Q

Redshift concurrency scaling
and workload management (WLM)

A

-concurrency scaling handles many concurrent users and unpredictable load, e.g., BI workloads
-WLM can set query priority
-manual WLM: up to 8 queues, each queue with a max concurrency of 50

3
Q

Athena workgroups

A

-organize and manage queries
-can use Apache Spark for analytics
-security and access control

4
Q

AWS DataSync

A

-transfers data from on-premises storage to AWS storage services such as S3

5
Q

S3 event notification

A

-event type = ObjectCreated, for example
-can trigger Lambda, filtered by key prefix or suffix (e.g., suffix .csv)
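
The suffix filter above can be sketched as a notification configuration. This is a minimal sketch: the dict follows the shape that boto3's `put_bucket_notification_configuration` takes as `NotificationConfiguration`, the ARN is hypothetical, and `matches_filter` is only a local stand-in for the matching S3 performs on its side.

```python
# Hypothetical function ARN; replace with a real one.
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:process-csv"

notification_config = {
    "LambdaFunctionConfigurations": [
        {
            "LambdaFunctionArn": LAMBDA_ARN,
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {
                "Key": {"FilterRules": [{"Name": "suffix", "Value": ".csv"}]}
            },
        }
    ]
}


def rule_accepts(object_key: str, rule: dict) -> bool:
    """Check one prefix/suffix rule against an object key."""
    if rule["Name"] == "suffix":
        return object_key.endswith(rule["Value"])
    return object_key.startswith(rule["Value"])


def matches_filter(object_key: str, config: dict) -> bool:
    """Local stand-in: True if any Lambda entry would fire for this key."""
    for entry in config["LambdaFunctionConfigurations"]:
        rules = entry.get("Filter", {}).get("Key", {}).get("FilterRules", [])
        if all(rule_accepts(object_key, rule) for rule in rules):
            return True
    return False
```

With this configuration, an upload to reports/sales.csv would invoke the function, while reports/readme.txt would not.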

6
Q

AWS Glue for Ray

A

-job type for scaling Python and AI workloads using native Python libraries
-Ray datasets are based on Apache Arrow

7
Q

Amazon Managed Service for Apache Flink

A

-for real-time, time-series analysis
-sliding windows for fixed-size intervals that overlap
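
To make the overlap concrete, here is a plain-Python sketch of sliding-window aggregation. It is not the Flink API; the function name, the (timestamp, value) event shape, and the choice to skip windows that would start before time 0 are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Window = Tuple[int, int]


def sliding_window_sums(
    events: List[Tuple[int, float]], size: int, slide: int
) -> Dict[Window, float]:
    """Sum values per window [start, start + size), one window every `slide` seconds."""
    sums: Dict[Window, float] = {}
    for ts, value in events:
        # Latest slide-aligned window start at or before the event.
        start = ts - (ts % slide)
        # Walk back through every aligned window that still covers ts.
        while start > ts - size:
            if start >= 0:  # simplification: ignore windows starting before 0
                key = (start, start + size)
                sums[key] = sums.get(key, 0.0) + value
            start -= slide
    return dict(sorted(sums.items()))


# 10-second windows that advance every 5 seconds.
result = sliding_window_sums([(1, 1.0), (6, 2.0), (11, 4.0)], size=10, slide=5)
```

Because the slide is shorter than the window size, the event at second 6 is counted in both the (0, 10) and (5, 15) windows.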

8
Q

S3 Object Lambda

A

-adds your own code to S3 GET requests, transforming data as it is retrieved
-on the fly

9
Q

AWS Graviton instances

A

-custom AWS-designed Arm processors for best price performance

10
Q

Lambda provisioned concurrency

A

-keeps execution environments pre-initialized so functions scale without cold-start latency

11
Q

Redshift data sharing

A

-share read access across clusters, workgroups, accounts, regions
-live data

12
Q

What to check when a Step Functions state machine fails at a step?

A

-the state machine's IAM role

13
Q

Glue’s sensitive data detection feature

A

-automatically detects PII and can redact it

14
Q

S3 VPC gateway endpoint

A

-specified as the target of a route in the route table for traffic destined for S3

15
Q

Athena federated query

A

-data source connectors run on Lambda
-queries NoSQL, SQL, Timestream, and other sources

16
Q

Kinesis real-time reporting with Redshift

A

-create an external schema for the data stream
-create a materialized view on that schema with auto refresh

17
Q

STL_ALERT_EVENT_LOG

A

-Redshift system view that flags potential query performance issues and suggests solutions

18
Q

Glue resource policy

A

-controls access to Data Catalog resources; think finance and HR each running their own ETL and accessing only their own databases

19
Q

S3 access point

A

-for multiple application access
-for cross-account access
-works with bucket policy

20
Q

MSCK REPAIR TABLE

A

-Athena command to run after new Hive-style partitions are added to a table's S3 location
-makes new partitions visible but does not necessarily speed up performance

21
Q

EFS and Lambda

A

-a Lambda function can mount an EFS file system for shared, persistent storage

22
Q

Improve Kinesis performance when processing

A

-add shards
-configure parallelization (e.g., the Lambda parallelization factor)
-register the Lambda function as a consumer with enhanced fan-out
-exponential backoff and retry?
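
The last item can be sketched as a retry wrapper with exponential backoff and jitter. This is a generic sketch, not tied to any Kinesis client: `ThroughputExceeded` is a hypothetical stand-in for a throttling error, and the base delay, cap, and attempt count are illustrative values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for a throttling error from the stream."""


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0) -> float:
    """Full-jitter delay: a random value in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def call_with_retry(
    op: Callable[[], T],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `op`, retrying throttling errors with a growing, jittered delay."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

Backoff reduces pressure on a throttled shard but does not raise throughput; adding shards or enhanced fan-out addresses the capacity side.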

23
Q

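Glue catalog partition predicate (DynamicFrame)
&
pushdown predicate

A

-catalog partition predicate: server-side partition filtering during frame creation (before any data is loaded)
-faster than client-side filtering, where data is loaded into memory first

-pushdown predicate also filters on partition columns, but only after the partition list has been retrieved from the catalog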
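
As a rough sketch of where each predicate goes, the helper below builds the keyword arguments for `glueContext.create_dynamic_frame.from_catalog`. The helper name, database, table, and predicate strings are hypothetical; `push_down_predicate` and the `catalogPartitionPredicate` entry in `additional_options` are the Glue option names. The call itself only works inside a Glue job, so it appears as a comment.

```python
from typing import Optional


def from_catalog_kwargs(
    database: str,
    table_name: str,
    catalog_predicate: Optional[str] = None,
    push_down: Optional[str] = None,
) -> dict:
    """Build kwargs for create_dynamic_frame.from_catalog (hypothetical helper)."""
    kwargs: dict = {"database": database, "table_name": table_name}
    if push_down:
        # Applied after the partition list is fetched from the catalog.
        kwargs["push_down_predicate"] = push_down
    if catalog_predicate:
        # Applied by the Data Catalog itself (server-side pruning).
        kwargs["additional_options"] = {
            "catalogPartitionPredicate": catalog_predicate
        }
    return kwargs


kwargs = from_catalog_kwargs(
    "sales_db", "orders", catalog_predicate="year='2024'", push_down="month='06'"
)
# Inside a Glue job (assumes an existing glueContext):
# frame = glueContext.create_dynamic_frame.from_catalog(**kwargs)
```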

24
Q

Transient EMR clusters

A

-think batch jobs
-cluster is created for the job, then terminated when it finishes

26
Q

SQS settings

A

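DelaySeconds -how long a new message waits before it becomes visible in the queue

VisibilityTimeout -hides a received message from other consumers, preventing it from being received and processed more than once at the same time

maxReceiveCount -number of times a message can be received before it is moved to the dead-letter queue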
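
A minimal sketch of how these settings fit together as queue attributes. The dict uses the attribute names SQS accepts when creating a queue (values are strings, and maxReceiveCount lives inside the JSON RedrivePolicy); the ARN, the values, and the `should_move_to_dlq` helper are hypothetical.

```python
import json

# Hypothetical dead-letter queue ARN.
DLQ_ARN = "arn:aws:sqs:us-east-1:123456789012:orders-dlq"

queue_attributes = {
    "DelaySeconds": "30",        # new messages stay hidden for 30 s
    "VisibilityTimeout": "120",  # a received message stays hidden for 120 s
    "RedrivePolicy": json.dumps(
        {"deadLetterTargetArn": DLQ_ARN, "maxReceiveCount": "3"}
    ),
}


def should_move_to_dlq(receive_count: int, attributes: dict) -> bool:
    """Local illustration: a message goes to the DLQ once its receive
    count exceeds maxReceiveCount."""
    policy = json.loads(attributes["RedrivePolicy"])
    return receive_count > int(policy["maxReceiveCount"])
```

With these values, a message that fails processing three times is moved to the dead-letter queue on the next receive attempt instead of being delivered again.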

27
Q

Athena notebooks

A

-interactive python coding environment
-run Apache Spark code interactively

28
Q

CloudWatch container insights

A

-for microservices and container apps

29
Q

OpenSearch storage

A

-hot = fastest access, most expensive
-UltraWarm = less frequently accessed, cheaper
-cold = infrequently accessed; data is attached to UltraWarm when it needs to be queried
30
Q

Stored procedures and Aurora

A

-a stored procedure in Aurora can invoke a Lambda function, e.g., when a loan is approved

31
Q

MSK kafka ACLs

A

-microservices
-control which apps can read from or write to which topics

32
Q

CloudTrail data events vs management events

A

-data events = operations on or within a resource, e.g., Lambda function executions, S3 PutObject
-management events = control-plane operations, e.g., creating or deleting resources

33
Q

Glue DataBrew masking techniques

A

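-substitution = replaces a value with another realistic value, e.g., Aron becomes Donny
-probabilistic encryption = different ciphertext each time
-nulling out or deletion = replaces the value with null or removes it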

35
Q

Athena Partition projection

A

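-speeds up queries on heavily partitioned tables by computing partitions from table properties, so only the relevant subsets are read
-good fit when the table is already partitioned and the data keeps growing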

36
Q

Redshift distribution style

A

-EVEN = rows spread evenly across nodes. Good when there are no joins or no clear distribution key

-KEY = rows with the same key value are stored together. Good for queries that frequently filter or join on a specific column

-ALL = full copy on each node. Best for small, static tables

-AUTO = Redshift picks the style and may change it over time. Use when the best style is not clear
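
The first three styles can be illustrated with a toy simulation. This is only a sketch: it models nodes rather than Redshift's slices, uses `zlib.crc32` as a stand-in for Redshift's internal hash, and the `distribute` function and sample rows are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional


def distribute(
    rows: List[dict], style: str, num_nodes: int = 4, key: Optional[str] = None
) -> Dict[int, List[dict]]:
    """Assign rows to nodes the way EVEN, KEY, and ALL roughly behave."""
    nodes: Dict[int, List[dict]] = defaultdict(list)
    for i, row in enumerate(rows):
        if style == "EVEN":
            nodes[i % num_nodes].append(row)  # round-robin
        elif style == "KEY":
            # Same key value always hashes to the same node.
            h = zlib.crc32(str(row[key]).encode())
            nodes[h % num_nodes].append(row)
        elif style == "ALL":
            for n in range(num_nodes):  # full copy everywhere
                nodes[n].append(row)
        else:
            raise ValueError(f"unsupported style: {style}")
    return dict(nodes)


rows = [{"customer_id": i % 3, "amount": i} for i in range(12)]
even = distribute(rows, "EVEN")
by_key = distribute(rows, "KEY", key="customer_id")
everywhere = distribute(rows, "ALL")
```

With KEY, every row for a given customer_id lands on one node, which is what lets a join on that column avoid moving rows between nodes. The cost is possible skew when a few key values dominate.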

37
Q

SageMaker Canvas

A

-no-code visual canvas.
-simplifies whole process from cleaning to prediction

38
Q

Redshift vacuum commands

A

VACUUM FULL -the default; sorts rows and reclaims disk space (same as plain VACUUM)

VACUUM DELETE ONLY -reclaims disk space without sorting, so it does not speed up queries

VACUUM REINDEX -analyzes the interleaved sort key distribution, then performs a full vacuum

VACUUM SORT ONLY -sorts without reclaiming disk space. Used when rows are unsorted but space is not an issue

39
Q

SageMaker workflows/lineage tracking

A

-records the steps in an ML workflow
-visually, think of the Step Functions workflow editor

40
Q

CloudWatch Contributor Insights & DynamoDB

A

-shows the most accessed and most throttled keys, giving a view of DynamoDB traffic trends

41
Q

DynamoDB partition key cardinality

A

-when throttling occurs, use a high-cardinality partition key so items and requests are spread more evenly across partitions
-addresses hot partition issues
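
A small simulation shows why cardinality matters. This sketch is not DynamoDB's actual partitioning: it hashes keys with SHA-256 into a hypothetical 8 partitions, and the key formats are made up for illustration.

```python
import hashlib
from collections import Counter
from typing import Iterable


def partition_for(key: str, num_partitions: int = 8) -> int:
    """Map a partition key to a partition via a hash (stand-in for DynamoDB's)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def busiest_share(keys: Iterable[str], num_partitions: int = 8) -> float:
    """Fraction of all requests that land on the single busiest partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return max(counts.values()) / sum(counts.values())


# Low cardinality: only two distinct key values across 10,000 requests.
low = [f"status#{i % 2}" for i in range(10_000)]
# High cardinality: every request uses a distinct key value.
high = [f"order#{i}" for i in range(10_000)]

low_share = busiest_share(low)
high_share = busiest_share(high)
```

With only two distinct values, at least half of the traffic hits one partition no matter how many partitions exist; with distinct keys the busiest partition receives close to an even share.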

42
Q

RDS Performance Insights

A

-gathers database performance metrics (database load) to help find bottlenecks