Analytics Flashcards

1
Q

What does Glue create when it scans your unstructured data in S3?

A

It creates metadata which can be used to query the data.

2
Q

What is Hive?

A

It allows you to run SQL-like queries on EMR.

3
Q

Can you import a Hive metastore into AWS Glue?

A

Yes. You can also import AWS Glue metadata into Hive.

4
Q

How can you increase the performance of a Spark Job?

A

Provision additional DPUs (Data processing units).

5
Q

How can you determine how many DPUs you will need for your job?

A

Enable job metrics to understand the maximum capacity in DPUs that you will need.

6
Q

Where are Glue errors reported?

A

Cloudwatch

7
Q

How can you schedule Glue jobs?

A

The Glue Scheduler. This is the most straightforward approach.

8
Q

What is a DynamicFrame in AWS Glue?

A

A collection of DynamicRecords.

9
Q

What is a DynamicRecord in AWS Glue?

A

They are records that are self-describing and have a schema.

10
Q

Using native AWS Glue functionality, how can you drop fields or null fields?

A

DropFields or DropNullFields transformation
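Conceptually, DropNullFields strips null-valued fields from each record. A minimal pure-Python sketch of that behavior (an illustration, not the AWS Glue API):

```python
# Pure-Python sketch of what Glue's DropNullFields transformation does
# conceptually: remove keys whose values are null (None) from each record.
def drop_null_fields(records):
    """Return records with None-valued fields removed."""
    return [{k: v for k, v in rec.items() if v is not None} for rec in records]

records = [
    {"id": 1, "name": "alice", "email": None},
    {"id": 2, "name": None, "email": "b@example.com"},
]
cleaned = drop_null_fields(records)
# cleaned[0] == {"id": 1, "name": "alice"}
```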

11
Q

Using AWS Glue, how can you select a subset of records during your ETL process?

A

Using filter transformation

12
Q

How can you enrich your data from another source in AWS Glue?

A

Use the join transformation

13
Q

What does the Map transformation in AWS Glue do?

A

It allows you to add fields, delete fields, and perform external lookups.

14
Q

What does the FindMatches ML transformation do in AWS Glue?

A

It identifies duplicate or matching records in your dataset, even when the records do not have a common identifier.

15
Q

What format conversions can AWS Glue support?

A

CSV, JSON, Avro, Parquet, ORC, XML

16
Q

What does AWS Glue ResolveChoice do?

A

It resolves ambiguities in your DynamicFrame and returns a new one. An example is two fields with the same name.

17
Q

How do you update your Glue Data Catalog?

A

You can re-run the crawler or have a script use enableUpdateCatalog / updateBehavior

18
Q

What are AWS Glue Development endpoints?

A

They allow you to use a notebook to develop your ETL script. They are launched in a VPC and can be used with SageMaker notebook or Zeppelin.

19
Q

What do AWS Glue job bookmarks do?

A

They keep track of where you left off so you do not reprocess old data. Bookmarks work with S3 sources and relational databases, but only pick up new rows in a database, not updated ones, and the primary key must be sequential.

20
Q

Can you start a step function from an AWS Glue event?

A

Yes

21
Q

How are you billed using AWS Glue?

A

You are billed by the second.

22
Q

How are AWS Glue development endpoints billed?

A

By the minute

23
Q

If you want to use engines like Hive or Pig, what AWS service is the best fit?

A

EMR. Glue is based on Spark

24
Q

Can AWS Glue process streaming data?

A

Yes. It can do this from Kinesis or Kafka.

25
Q

Can AWS Glue clean and transform streaming data in-flight?

A

yes

26
Q

What is AWS Glue Studio?

A

It is a visual interface for ETL workflows

27
Q

Where can you view the status of AWS Glue Jobs running?

A

In the Glue Studio Monitoring console

28
Q

What is AWS Glue Data Quality?

A

It evaluates your data against rules that you set. It uses DQDL (Data Quality Definition Language) for custom rules.

29
Q

What is AWS Glue DataBrew?

A

A visual preparation tool for transforming data.

30
Q

What are Glue Data Brew sources?

A

S3, data warehouse, database

31
Q

Where does Glue Data Brew output data?

A

S3

32
Q

What is a recipe in Data Brew?

A

It is a saved set of transformations that can be applied to any dataset

33
Q

Can you define data quality rules in Data Brew?

A

Yes

34
Q

How does Redshift and Snowflake integrate with Data Brew?

A

You can use custom SQL to create datasets

35
Q

Does DataBrew integrate with KMS?

A

Yes, but only with customer master keys (KMS CMKs).

36
Q

Can you schedule a job in Data Brew?

A

Yes

37
Q

How can you remove PII in DataBrew?

A

Substitution

Shuffling

Deterministic encryption

Probabilistic encryption

NULL out or delete

Mask out

Hash

38
Q

What are Amazon EventBridge batch conditions?

A

An event fires only when a specified number of events, or a time window in seconds, is exceeded.

39
Q

What is AWS Lake Formation?

A

It makes it easy to set up a secure data lake in days.

40
Q

What can you do in Lake Formation?

A

Anything that you can do in Glue. It is built on Glue.

41
Q

What AWS services can query lake formation?

A

Athena, Redshift, and EMR

42
Q

Can you have multiple accounts accessing Lake Formation?

A

Yes. The recipient must be a data lake administrator. You can leverage AWS RAM for this as well.

43
Q

Does Lake Formation support manifests?

A

No

44
Q

What are AWS Lake Formation Governed Tables?

A

They support ACID transactions across multiple tables. This cannot be changed once enabled. Also works with Kinesis streaming data.

45
Q

How does Lake Formation optimize storage performance?

A

Automatic compaction

46
Q

How can you control access to Lake Formation data?

A

Granular row and column level access

47
Q

Other than IAM, what else can Lake Formation tie into for permissions?

A

SAML or external AWS accounts

48
Q

What are Lake formation policy tags attached to?

A

Databases, tables, or columns

49
Q

What are AWS Lake Formation Filters?

A

They provide column-, row-, or cell-level security. This is done when granting SELECT permissions on tables.

50
Q

What is AWS Athena?

A

A query service for your data in S3. The data stays in S3.

51
Q

Is Athena serverless?

A

yes

52
Q

What data formats are splitable for parallel processing in Athena?

A

ORC, Parquet, and Avro

53
Q

What are Athena Workgroups?

A

They organize users, teams, and applications into groups. You can control access and track costs by workgroup. They integrate with IAM, CloudWatch, and SNS.

54
Q

Can you set query limits in Athena by using workgroups?

A

Yes. You can limit how much data each query scans.

55
Q

Are Athena canceled queries billable?

A

Yes. Only failed queries are not billable.

56
Q

Are CREATE / ALTER / DROP queries billable in Athena?

A

No

57
Q

What can you do to save money querying data with Athena?

A

Use a columnar format such as ORC or Parquet. You will scan less data.
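Because Athena bills per data scanned, scanning less data with a columnar format directly cuts the bill. A back-of-the-envelope sketch (the $5-per-TB-scanned price is an assumption here; check current AWS pricing):

```python
# Rough Athena cost comparison. PRICE_PER_TB is an assumed price,
# not an authoritative figure.
PRICE_PER_TB = 5.00

def query_cost(tb_scanned):
    """Cost in dollars for a query scanning tb_scanned terabytes."""
    return round(tb_scanned * PRICE_PER_TB, 4)

# A query over 1 TB of raw CSV vs. the same data in Parquet, where
# column pruning means only ~10% of the bytes are scanned:
csv_cost = query_cost(1.0)      # 5.0
parquet_cost = query_cost(0.1)  # 0.5
```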

58
Q

Do large files perform better in Athena?

A

Yes. A small number of large files performs better than a large number of small files.

59
Q

What should you run when you partition after the fact in Athena?

A

Run MSCK REPAIR TABLE

60
Q

If you want to ensure your table is ACID compliant in Athena, what table type should you use?

A

Iceberg (table_type = 'ICEBERG')

61
Q

What are Athena time travel operations?

A

You can query the table as it existed at an earlier point in time, recovering recently deleted data with a SELECT statement.
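A sketch of constructing such a time-travel query against an Iceberg table in Athena. The table name and timestamp are hypothetical; the statement would be submitted through the Athena console or an API client:

```python
# Build an Athena (Iceberg) time-travel SELECT. The FOR TIMESTAMP AS OF
# clause reads the table as of a past point in time. Names here are
# illustrative, not from a real schema.
def time_travel_query(table, timestamp_utc):
    return (
        f"SELECT * FROM {table} "
        f"FOR TIMESTAMP AS OF TIMESTAMP '{timestamp_utc}'"
    )

sql = time_travel_query("orders", "2024-01-01 00:00:00")
```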

62
Q

What should you do if your ACID transactions in Athena are getting slower over time?

A

Run the OPTIMIZE command (REWRITE DATA USING BIN_PACK) to compact the table's small files.

63
Q

How granular does Athena get with permissions?

A

Database and table level.

64
Q

What can you use to query Spark directly?

A

Spark SQL

65
Q

Does Spark have machine learning capabilities?

A

Yes, using MLLib

66
Q

Can you process streaming data with Spark?

A

Yes. It integrates with Kinesis and Kafka

67
Q

Can you change data formats in Athena?

A

Yes, using CTAS and the format attribute.
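A sketch of building such a CTAS statement that rewrites a table into Parquet. The table and bucket names are hypothetical; the SQL would be run in Athena:

```python
# Construct an Athena CTAS statement that converts a table's format.
# Names here are illustrative only.
def ctas_to_parquet(new_table, source_table, s3_location):
    return (
        f"CREATE TABLE {new_table} "
        f"WITH (format = 'PARQUET', external_location = '{s3_location}') "
        f"AS SELECT * FROM {source_table}"
    )

sql = ctas_to_parquet("sales_parquet", "sales_csv", "s3://my-bucket/sales/")
```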

68
Q

What is Spark Structured Streaming?

A

It continuously appends streaming data to an unbounded table, which you query using windows of time.
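The windowed-query idea can be sketched in plain Python (an illustration of tumbling time windows, not the Spark API):

```python
# Tumbling-window aggregation sketch: bucket events by fixed-size
# time windows and count per window.
from collections import defaultdict

def count_by_window(events, window_seconds):
    """events: (epoch_seconds, value) pairs -> event count per window start."""
    counts = defaultdict(int)
    for ts, _value in events:
        window_start = ts - (ts % window_seconds)
        counts[window_start] += 1
    return dict(counts)

events = [(0, "a"), (5, "b"), (61, "c")]
windows = count_by_window(events, 60)
# {0: 2, 60: 1}
```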

69
Q

Can Spark support Redshift?

A

Yes

70
Q

Can you run a Jupyter notebook with Spark within the Athena console?

A

Yes

71
Q

What is AWS EMR?

A

A managed Hadoop framework that runs on EC2. It includes Spark, HBase, Presto, Flink, Hive, and more.

72
Q

What are EMR notebooks

A

Browser based development in a notebook.

73
Q

What are the EMR node types?

A

Master, Core, and Task node

74
Q

Where does data persist in EMR?

A

On the core nodes in HDFS

75
Q

Do EMR task nodes store data?

A

No

76
Q

What is a good strategy to reduce EMR costs?

A

Use spot instances for task nodes since they do not persist data.

77
Q

What is a transient EMR cluster?

A

One that terminates once all the steps are complete. Good for cost savings.

78
Q

What is a long running cluster?

A

One that must be manually terminated. A good use of reserved instances for cost savings.

79
Q

When do you configure frameworks and applications in EMR?

A

When the cluster is launched.

80
Q

How can you run EMR jobs directly?

A

By connecting to the master node or submitting jobs via ordered steps in the console.

81
Q

When using EMR, where can you store data for it to persist?

A

S3

82
Q

How can you schedule the start of your EMR cluster?

A

The AWS Data Pipeline

83
Q

Is HDFS persistent?

A

No.

84
Q

What is the default block size in HDFS?

A

128MB

85
Q

What is EMRFS?

A

It allows you to access S3 as if it were HDFS.

86
Q

What is EMRFS Consistent View?

A

It used DynamoDB to track object consistency in S3. It is largely unnecessary now that S3 is strongly consistent.

87
Q

Can you use EBS for HDFS?

A

Yes, but it will be ephemeral, and volumes can only be attached when launching the cluster.

88
Q

How is EMR billed?

A

By the hour

89
Q

How do you increase processing capacity in EMR?

A

You can add task nodes on the fly as long as you do not also need to increase storage capacity.

90
Q

How do you increase processing and storage capacity in EMR?

A

Resize the cluster core nodes.

91
Q

What is EMR Managed Scaling?

A

Adds core nodes and then task nodes up to the max units specified. It also scales down to your configured value.

92
Q

When scaling down, which EMR nodes get removed first.

A

Spot nodes (task and then core)

93
Q

Can you specify the resources needed for your job in EMR Serverless?

A

Yes. If you do not configure this, EMR calculates the values on its own.

94
Q

Is EMR multi-region?

A

No

95
Q

How do EMR Serverless application lifecycles move from step to step?

A

Via API calls. The transitions are not automatic.

96
Q

Can EMR run on EKS?

A

Yes. It can run alongside other applications.

97
Q

What is a record made up of in Kinesis Data Streams when it is sent from the producer?

A

A partition key and a data blob (up to 1 MB).

98
Q

How fast is a shard in Kinesis Data Streams when being sent from the producer to the stream?

A

1 MB per second or 1,000 messages per second, per shard.

99
Q

What is a record made up of in Kinesis Data Streams when it is sent to the consumer?

A

A partition key, a sequence number, and a data blob (up to 1 MB).

100
Q

How fast is a shard in Kinesis Data Streams when being sent from the stream to the consumer?

A

Shared (classic) mode
2 MB per second per shard, shared across all consumers.

Enhanced fan-out mode
2 MB per second per shard, per consumer

101
Q

What is the maximum retention for a Kinesis Data Stream?

A

Between 1 and 365 days

102
Q

Can you replay data in Kinesis?

A

Yes.

103
Q

What is the provisioned capacity mode in Kinesis Data Streams?

A

You choose the number of shards and scale manually.

104
Q

What is the on-demand capacity mode in Kinesis Data Streams?

A

It automatically scales based on observed throughput. The default write capacity is 4 MB/s or 4,000 records per second.

105
Q

How can you increase throughput using the Kinesis Producer SDK?

A

Use the PutRecords API for batching.
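PutRecords accepts up to 500 records per call, so a producer chunks its backlog into batches. A sketch of that client-side chunking (generic Python; in a real producer each chunk would be passed to the SDK's put_records call):

```python
# Client-side batching sketch for the Kinesis PutRecords API,
# which takes at most 500 records per call.
MAX_RECORDS_PER_CALL = 500

def chunk_records(records, size=MAX_RECORDS_PER_CALL):
    """Split a list of records into PutRecords-sized batches."""
    return [records[i:i + size] for i in range(0, len(records), size)]

batches = chunk_records(list(range(1200)))
# 3 batches: 500, 500, 200 records
```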

106
Q

What is the best use case for the Kinesis Producer SDK?

A

Workloads with low throughput that can tolerate higher latency.

107
Q

What managed AWS sources send to Kinesis Data Streams?

A

CloudWatch, AWS IoT, and Kinesis Data Analytics

108
Q

What APIs are included in the Kinesis Producer Library?

A

Synchronous and asynchronous

109
Q

Does Kinesis Producer Library support record compression?

A

No

110
Q

How do you add delay in Kinesis Producer Library batching?

A

RecordMaxBufferedTime

111
Q

Can Apache Spark consume Kinesis Data Streams?

A

Yes

112
Q

What is the maximum amount of data returned by the Kinesis SDK GetRecords function?

A

10 MB of data or up to 10,000 records.

113
Q

What is the Maximum GetRecords API calls per shard per second?

A

5

114
Q

What is checkpointing in the Kinesis Client Library?

A

It marks your progress.

115
Q

When you are checkpointing using the KCL and you receive the ExpiredIteratorException, what does this mean?

A

You need to increase the WCU (write capacity units) of the DynamoDB checkpoint table.

116
Q

What is the Kinesis Connector Library?

A

It sends data to S3, DynamoDB, Redshift, OpenSearch, etc. It runs on an EC2 instance and is more or less deprecated.

117
Q

Why is Kinesis Enhanced Fanout fast?

A

It uses HTTP/2 to push to consumers.

118
Q

What is the latency when Kinesis Enhanced Fanout is enabled?

A

Less than 70ms

119
Q

When should you use Kinesis Standard Consumers?

A

When there is a low number of consumers

You can tolerate 200ms latency

Cost effective

120
Q

When should you use Kinesis Enhanced Fan Out Consumers?

A

When you have multiple consumer applications for the same stream

Low Latency

Higher Cost

121
Q

What is the default limit of consumers per data stream when using enhanced fan-out in Kinesis?

A

20, but you can file a service request to increase it.

122
Q

What happens when you split a hot shard in Kinesis?

A

Two new shards are created

The old shard will go away when the data expires
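When splitting, you supply a new starting hash key that divides the parent shard's hash key range. Kinesis hash keys span 0 to 2^128 - 1; the midpoint split below sketches the common "split a hot shard evenly" case:

```python
# Compute the NewStartingHashKey for an even two-way shard split.
# Kinesis partition hash keys range over 0 .. 2**128 - 1.
MAX_HASH_KEY = 2**128 - 1

def split_point(start, end):
    """Midpoint hash key dividing [start, end] into two equal halves."""
    return start + (end - start) // 2

# Splitting a shard that covers the full hash key range:
mid = split_point(0, MAX_HASH_KEY)
```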

123
Q

What happens when you merge a shard in Kinesis?

A

One shard is created

The old shards will go away when the data expires

124
Q

What can cause out of order shards in Kinesis?

A

Resharding can cause this. Make sure you read entirely from the parent shard before reading from the new child shards. This logic is built into the KCL.

125
Q

Can Kinesis Resharding be done in Parallel?

A

No

126
Q

How many resharding operations can be performed at once?

A

One. This is a problem when you have thousands of shards.

127
Q

What can cause duplicates from your Kinesis Producer?

A

Network Timeouts. Use unique IDs to deduplicate records on the consumer side.
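That consumer-side deduplication can be sketched as follows: the producer embeds a unique ID in each record, and the consumer skips IDs it has already seen. In production the "seen" set would live in a durable store; this illustration keeps it in memory:

```python
# Consumer-side deduplication sketch using producer-assigned unique IDs.
def deduplicate(records, seen=None):
    """Drop records whose 'id' has already been processed."""
    seen = set() if seen is None else seen
    unique = []
    for rec in records:
        if rec["id"] not in seen:
            seen.add(rec["id"])
            unique.append(rec)
    return unique

batch = [{"id": "r1"}, {"id": "r2"}, {"id": "r1"}]  # r1 resent after a network timeout
deduped = deduplicate(batch)
```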

128
Q

What use cases can cause a consumer duplicate in Kinesis?

A

A worker terminates unexpectedly

A worker instance is added or removed

Shards are merged or split

The application is deployed

129
Q

What can you do to fix duplicate consumer records in Kinesis?

A

Make your application idempotent

Handle duplicates at the final destination

130
Q

When using a Kinesis Data Stream, how can you transform the data before storing it in S3?

A

With a Lambda in Kinesis Data Firehose

131
Q

Can Kinesis Firehose write to Redshift?

A

Yes. It loads the data to S3 first and then issues a COPY command.

132
Q

Can Kinesis Data Firehose write to OpenSearch?

A

Yes

133
Q

Can Firehose deliver to custom locations?

A

Yes as long as there is an HTTP endpoint

134
Q

Can you store data sent into Kinesis Firehose?

A

Yes. You can back up all records, or only the failed ones, to S3.

135
Q

What is the minimum latency for Firehose?

A

60 seconds

136
Q

Can Firehose perform data conversions?

A

Limited, but yes: JSON to Parquet or ORC, and only for S3 destinations. Other conversions are done using a Lambda.

137
Q

Can Firehose compress your data before sending it to S3?

A

Yes, using GZIP, ZIP, or Snappy.

138
Q

Can Spark or Kinesis Client Library read from Data Firehose?

A

No

139
Q

What determines when records are sent in Kinesis Data Firehose?

A

The buffer size and buffer time. Whichever limit is hit first.
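That "whichever limit is hit first" rule can be sketched as a simple predicate. The thresholds below are illustrative, not Firehose's actual defaults:

```python
# Firehose-style buffering rule: deliver the batch when either the
# buffer size or the buffer interval is reached, whichever comes first.
# size_limit / interval values here are illustrative assumptions.
def should_flush(buffered_bytes, elapsed_seconds,
                 size_limit=5 * 1024 * 1024, interval=60):
    return buffered_bytes >= size_limit or elapsed_seconds >= interval

flush_a = should_flush(6 * 1024 * 1024, 10)   # size limit hit first -> True
flush_b = should_flush(1024, 30)              # neither limit hit -> False
```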

140
Q

What are the minimum values for Kinesis buffer size and time?

A

Buffer size is a few MB

Buffer time is 1 minute

141
Q

If you need real-time data made searchable using kinesis, what would you use?

A

Kinesis streams with a Lambda to send the data to OpenSearch

142
Q

What is a Cloudwatch subscription filter?

A

A subscription filter lets you send CloudWatch Logs data to other AWS services such as Lambda or Kinesis Data Streams.

143
Q

Can Kinesis Data Analytics send to Lambda?

A

Yes. This can be used to encrypt, translate to another format, aggregate rows, etc..

144
Q

What can Kinesis Data Analytics integrate with that Firehose cannot?

A

DynamoDB, Aurora, SNS, SQS, and CloudWatch

145
Q

What is Kinesis Data Analytics now called?

A

Managed Service for Apache Flink

146
Q

What can you use in Managed Service for Apache Flink to access SQL?

A

Table API

147
Q

What are some good use cases for Managed Service for Apache Flink?

A

Streaming ETL

Continuous metric generation

Responsive analytics

148
Q

What is Kinesis Analytics Schema Discovery?

A

It infers the schema from your streaming data in real time.

149
Q

What is Kinesis Data Analytics RANDOM_CUT_FOREST?

A

It detects anomalies in your data.

150
Q

What is AWS MSK?

A

Managed streaming for Apache Kafka. An alternative to Kinesis.

151
Q

What is the maximum message size for AWS MSK?

A

10 MB, which is much larger than Kinesis Data Streams at 1 MB.

152
Q

Can you persist data in AWS MSK?

A

Yes. It uses EBS volumes and is more flexible than Kinesis Data Streams.

153
Q

Can you control who writes to a topic in AWS MSK?

A

Yes. This can be done using:

Mutual TLS and Kafka ACLs

IAM Access control

SASL/SCRAM and Kafka ACLs

154
Q

What is AWS MSK Connect?

A

It lets you connect to other AWS services for delivery, such as S3, Redshift, OpenSearch, etc.

155
Q

Can AWS MSK be Serverless?

A

Yes.

156
Q

What is AWS OpenSearch?

A

Formerly known as Elasticsearch. Petabyte-scale analysis and reporting.

157
Q

What are good use cases for OpenSearch?

A

Full-Text searching

Log analytics

Application Monitoring

Security Analytics

158
Q

What are Types in OpenSearch?

A

They define the schema and mapping shared by documents.

159
Q

What are Indices in OpenSearch?

A

A collection of documents. Indices contain inverted indices that let you search across everything within them at once.

160
Q

What is the structure of an index in OpenSearch?

A

They are split into shards and documents are hashed to a particular shard. Shards can be on different nodes in a cluster.
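Hash-based document routing can be sketched as follows. Real OpenSearch routing uses a murmur3 hash of the routing value; md5 stands in here purely for illustration:

```python
# Sketch of hash-routing a document ID to a shard, the idea behind
# how an index distributes documents. md5 is a stand-in hash; the
# real implementation differs.
import hashlib

def route_to_shard(doc_id, num_shards):
    """Deterministically map a document ID to one of num_shards shards."""
    digest = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    return digest % num_shards

shard = route_to_shard("doc-42", 5)
assert 0 <= shard < 5  # always lands on a valid shard
```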

161
Q

Can you offload reads in OpenSearch?

A

Yes, using replicas.

162
Q

What is a domain in OpenSearch?

A

It is essentially the cluster.

163
Q

How can you back your data up in OpenSearch?

A

Snapshot to S3

164
Q

Does OpenSearch support resource or identity based policies?

A

Both. It also supports request signing and IP based policies.

165
Q

How can you allow access to opensearch through a VPC to external users?

A

Using Cognito/SAML, a reverse proxy, SSH tunneling, VPC Direct Connect, or a VPN.

166
Q

What type of storage does an OpenSearch data node use by default?

A

Hot storage. This is an instance store or EBS volume.

167
Q

What is UltraWarm storage in OpenSearch?

A

It uses S3 and Caching

Best for indices with few writes (log data / immutable data)

slower performance

requires a dedicated master node

168
Q

What is Cold storage in OpenSearch?

A

Uses S3

Best for periodic research or forensic analysis on older data

Must have dedicated master node

UltraWarm must also be enabled

169
Q

Can data in OpenSearch be migrated between storage types?

A

Yes

170
Q

What is Index State Management in OpenSearch?

A

Automates index management policies:

Automates snapshots

Deletes indices after a period of time

Moves indices from hot to cold over time

Reduces replica count

171
Q

How often are Index State Management policies run in OpenSearch?

A

Every 30 - 48 minutes

172
Q

What are index Rollups in OpenSearch?

A

They roll up old data into summarized indices. The new index may have fewer fields. Good for saving on storage.

173
Q

What are index transforms in OpenSearch?

A

Like rollups, but the purpose is to create a different view for analyzing the data differently.

174
Q

Can you replicate data across clusters in OpenSearch?

A

Yes.

175
Q

What is a follower index in OpenSearch?

A

It pulls from the leader index to replicate data.

176
Q

How do you copy indices from cluster to cluster on demand in OpenSearch?

A

Remote Reindex

177
Q

What is the best practice for master nodes?

A

Have three

178
Q

What should you do when you see JVM memory pressure errors in OpenSearch?

A

Delete old or unused indices.

179
Q

What is a big pro for OpenSearch Serverless?

A

On-Demand autoscaling

180
Q

What are the two collection types in OpenSearch Serverless?

A

search or time series

181
Q

What are the sources of QuickSight?

A

Redshift, Aurora, RDS, Athena, OpenSearch, IoT Analytics, your own database, and raw files such as CSV, Excel, and log files.

182
Q

Can QuickSight perform ETL?

A

Very light ETL.

183
Q

What is QuickSight SPICE?

A

Your datasets get imported into SPICE. Each user gets 10 GB of SPICE. It accelerates large queries.

184
Q

What happens when importing data from Athena to Spice takes more than 30 minutes?

A

It times out.

185
Q

What is a good use case for QuickSight?

A

Ad-hoc exploration and visualization

Dashboards and KPIs

186
Q

Does Quicksight support MFA?

A

Yes

187
Q

Does QuickSight support row and column level security?

A

Yes. Row level security is available in standard, but column level security is only available in the enterprise edition.

188
Q

What data security permissions need to be added to Quicksight?

A

You need to make sure QuickSight can access your data.

You need to create IAM policies that restrict what data in S3 users can see.

189
Q

Can QuickSight access Redshift data in other regions?

A

No. QuickSight can only access Redshift data in the same region.

190
Q

How do you access Redshift data in another region using QuickSight Standard?

A

Use an inbound security group rule that allows access to Redshift from the QuickSight IP range.

191
Q

If you want to keep QuickSight in a private VPC, what version do you need?

A

Enterprise Edition

192
Q

How do you access Redshift data in another region using QuickSight Enterprise?

A

Use private subnets and peering connections, tied together with route tables. This can also be used for cross-account access via Transit Gateway.

193
Q

If you want to use an Active Directory connector for quicksight, what version do you need?

A

Enterprise edition

194
Q

Can you use customer managed keys in Quicksight?

A

Not in the Standard edition. The Enterprise edition allows you to use KMS.

195
Q

What is Quicksight Q?

A

An NLP interface on top of QuickSight

196
Q

Can Spice be added to a user?

A

Yes. It is billed per additional GB of SPICE needed.

197
Q

Is encryption at-rest included in the standard version of QuickSight?

A

No

198
Q

Can you embed dashboards into 3rd party apps using QuickSight?

A

Yes, using the JavaScript SDK.

199
Q

What needs to be done for embedded dashboards to work on a 3rd party site using QuickSight?

A

Domain Whitelisting

200
Q

What ML capabilities does QuickSight have?

A

Anomaly detection

Forecasting: seasonality and trends over time; imputes missing values

Autonarratives: a story of your data in paragraph format

Suggested insights: helps decide which feature is right for your dataset
