General Knowledge Flashcards

1
Q

What is Redshift?

A

A petabyte-scale, fully managed data warehouse

2
Q

Is Redshift designed for OLAP or OLTP?

A

It is specifically designed for OLAP.

3
Q

A cluster is made up of what?

A

A leader node and one or more compute nodes

4
Q

What is the maximum number of compute nodes you can have in a cluster?

A

128 max per cluster

5
Q

Can clusters have more than one database?

A

Yes, a cluster can contain one or more databases

6
Q

What node stores user data?

A

The compute node

7
Q

What node is responsible for managing communication with the client programs?

A

The leader node

8
Q

What node develops execution plans?

A

The leader node

9
Q

What node has its own memory, CPU, and attached disk storage?

A

The compute node.

10
Q

What are the two node types in Redshift?

A

DS (Dense Storage) node types. These use HDDs and are the low-cost option. They come in two sizes: xlarge and 8xlarge.

DC (Dense Compute) node types. These are used to create high-performance data warehouses. They use SSDs and come in large and 8xlarge sizes.

11
Q

What is a node slice?

A

A partition of a compute node's memory, CPU, and disk that processes a portion of the workload assigned to that node.

12
Q

How is the number of node slices determined?

A

By the size of the node

13
Q

What does Redshift Spectrum do?

A

It can query exabytes of data in S3 without loading it.

14
Q

What compression does Redshift Spectrum support?

A

Gzip and Snappy

15
Q

Why is Redshift Spectrum so fast?

A

It uses massively parallel processing (MPP), columnar data storage, and column compression

16
Q

Does Redshift Spectrum scale?

A

Yes, it scales to handle more parallel processing

17
Q

How large is the block size in Redshift?

A

1 MB

18
Q

Can you change a column's compression encoding after the table is created?

A

No. You cannot change the column compression after the table is created.

19
Q

Does Redshift replicate?

A

Yes, within the cluster

20
Q

Where does Redshift back up to?

A

S3, and backups can be asynchronously replicated to another region

21
Q

Does Redshift have automated snapshots?

A

Yes.

22
Q

What happens when a drive or node fails?

A

It is automatically replaced

23
Q

How many AZs can a single cluster span?

A

One. Redshift is a single-AZ service

24
Q

If your Redshift cluster goes offline because the AZ is down and you need to query your data ASAP, how can you make this happen?

A

You can restore the cluster using the data stored in S3 into a new AZ that is not being impacted.

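A minimal boto3 sketch of that recovery path. The cluster name, snapshot name, and Availability Zone below are hypothetical placeholders; the point is that RestoreFromClusterSnapshot lets you restore into an unaffected AZ.

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Restore the latest snapshot (stored durably in S3) into a healthy AZ.
redshift.restore_from_cluster_snapshot(
    ClusterIdentifier="analytics-restored",         # new cluster to create
    SnapshotIdentifier="analytics-latest-snapshot", # hypothetical snapshot name
    AvailabilityZone="us-east-1b",                  # an AZ that is not impacted
)
```
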
25
Q

How does Redshift scale?

A

Redshift scales both horizontally and vertically

26
Q

What happens, from a process perspective, when you scale Redshift?

A

A new cluster is created while your old one remains available for reads. The CNAME is flipped, and data is moved in parallel to the new compute nodes.

27
Q

How does the distribution style Auto work?

A

Redshift chooses the distribution style automatically, based on the size of the data

28
Q

How does the distribution style Even work?

A

Rows are distributed across slices in round-robin fashion

29
Q

How does the distribution style “Key” Work?

A

Rows are distributed based on the values in one column

30
Q

How does the distribution style All work?

A

The entire table is copied to every node.

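A minimal sketch of cards 27-30 as CREATE TABLE DDL, executed here with psycopg2 (Redshift speaks the PostgreSQL wire protocol). The endpoint, credentials, and table names are hypothetical.

```python
import psycopg2

# Hypothetical Redshift endpoint and credentials.
conn = psycopg2.connect(host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="awsuser", password="...")
cur = conn.cursor()

# AUTO: Redshift picks the style based on the size of the data.
cur.execute("CREATE TABLE staging_events (event_id INT, payload VARCHAR(256)) DISTSTYLE AUTO;")

# EVEN: rows are spread across slices round-robin.
cur.execute("CREATE TABLE web_logs (log_id INT, url VARCHAR(256)) DISTSTYLE EVEN;")

# KEY: rows with the same DISTKEY value land on the same slice.
cur.execute("CREATE TABLE orders (order_id INT, customer_id INT DISTKEY, total DECIMAL(10,2));")

# ALL: the entire table is copied to every node (good for small dimension tables).
cur.execute("CREATE TABLE dim_country (country_code CHAR(2), name VARCHAR(64)) DISTSTYLE ALL;")

conn.commit()
```
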
31
Q

What are Redshift Sort Keys?

A

They are similar to indexes in a traditional relational database

32
Q

What is a single column sort key?

A

A sort key on a single column, such as a date column

33
Q

What is a compound sort key?

A

A sort key made up of all the columns listed in the sort key definition, in the order they are listed

34
Q

What is the default sort key type in Redshift?

A

Compound

35
Q

What is an interleaved sort key?

A

A sort key that gives equal weight to every column.

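A minimal DDL sketch of cards 32-35 (single-column, compound, and interleaved sort keys); the endpoint and table names are hypothetical.

```python
import psycopg2

conn = psycopg2.connect(host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="awsuser", password="...")
cur = conn.cursor()

# Single-column sort key, e.g. on a date column.
cur.execute("CREATE TABLE sales_daily (sale_date DATE, amount DECIMAL(10,2)) SORTKEY (sale_date);")

# Compound sort key (the default type): the first listed column carries the most weight.
cur.execute("""CREATE TABLE sales_compound (sale_date DATE, region VARCHAR(32), amount DECIMAL(10,2))
               COMPOUND SORTKEY (sale_date, region);""")

# Interleaved sort key: every listed column gets equal weight.
cur.execute("""CREATE TABLE sales_interleaved (sale_date DATE, region VARCHAR(32), amount DECIMAL(10,2))
               INTERLEAVED SORTKEY (sale_date, region);""")

conn.commit()
```
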
36
Q

What is the COPY command?

A

A command that loads data into Redshift by reading from multiple data files or multiple data streams simultaneously

37
Q

What are the sources of the COPY command?

A

S3, EMR, DynamoDB, remote hosts with SSH

38
Q

When using S3 as a source for the COPY command, what is required other than IAM permissions?

A

A Manifest

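A minimal COPY sketch that loads gzip-compressed CSV files listed in a manifest from S3; the bucket, manifest key, IAM role ARN, and table name are hypothetical.

```python
import psycopg2

conn = psycopg2.connect(host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="awsuser", password="...")
cur = conn.cursor()

# The manifest lists the exact S3 files to load; the IAM role grants read access.
cur.execute("""
    COPY web_logs
    FROM 's3://my-bucket/web-logs/load.manifest'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    MANIFEST
    GZIP
    CSV;
""")
conn.commit()
```
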
39
Q

What does the “UNLOAD” command do?

A

It allows you to export data from Redshift into files in S3

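A minimal UNLOAD sketch (hypothetical bucket, role, and query) that exports a query result to S3 as Parquet files:

```python
import psycopg2

conn = psycopg2.connect(host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="awsuser", password="...")
cur = conn.cursor()

# Write the result set to S3 as Parquet files using the given key prefix.
cur.execute("""
    UNLOAD ('SELECT order_id, customer_id, total FROM orders WHERE total > 100')
    TO 's3://my-bucket/exports/orders_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
    FORMAT AS PARQUET;
""")
conn.commit()
```
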
40
Q

What is Enhanced VPC Routing?

A

It forces all traffic from COPY and UNLOAD commands to use the VPC for communication rather than going over the internet

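A small boto3 sketch of enabling this on an existing cluster; the cluster name is hypothetical, and the EnhancedVpcRouting flag on ModifyCluster is what routes COPY/UNLOAD traffic through the VPC.

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Force COPY and UNLOAD traffic through the cluster's VPC instead of the internet.
redshift.modify_cluster(
    ClusterIdentifier="analytics-cluster",
    EnhancedVpcRouting=True,
)
```
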
41
Q

What is the prime use case for the COPY command?

A

To copy data into Redshift from an external source

42
Q

How do you move data within Redshift?

A

SELECT INTO or CREATE TABLE AS

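A minimal sketch of both statements against hypothetical tables; data already inside Redshift moves with plain SQL rather than COPY.

```python
import psycopg2

conn = psycopg2.connect(host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="awsuser", password="...")
cur = conn.cursor()

# CREATE TABLE AS (CTAS): build a new table from a query result.
cur.execute("CREATE TABLE big_orders AS SELECT * FROM orders WHERE total > 1000;")

# SELECT INTO: the same idea with alternative syntax.
cur.execute("SELECT * INTO small_orders FROM orders WHERE total <= 1000;")

conn.commit()
```
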
43
Q

Can the “COPY” command decrypt data?

A

Yes. It can decrypt data as it is loaded from S3, using hardware-accelerated SSL to keep it fast.

44
Q

The “COPY” command can speed up data transfers by using which compression formats?

A

gzip, lzop, and bzip2

45
Q

What does the automatic compression option do when using the “COPY” command?

A

It analyzes data being loaded and figures out the optimal compression to use for storing it.

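A short COPY sketch (hypothetical bucket, role, and table) showing automatic compression analysis on a load of lzop-compressed files:

```python
import psycopg2

conn = psycopg2.connect(host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="awsuser", password="...")
cur = conn.cursor()

# COMPUPDATE ON samples the incoming rows and picks column encodings
# when loading into an empty table.
cur.execute("""
    COPY page_views
    FROM 's3://my-bucket/page-views/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    LZOP
    COMPUPDATE ON;
""")
conn.commit()
```
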
46
Q

What is a narrow table?

A

A table with lots of rows and few columns

47
Q

What is the best practice to load a narrow table?

A

Load it with a single COPY transaction if possible.

48
Q

What are the main steps in copying an encrypted snapshot to another region in AWS?

A
  1. Create a KMS key in the destination region.
  2. Create a snapshot copy grant in the destination region.
  3. Specify the KMS key ID for which you are creating the copy grant in the destination region.
  4. Enable cross-region snapshot copying on the source cluster, using the copy grant you just created (a boto3 sketch follows below).
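
A boto3 sketch of those four steps. The regions, key description, grant name, and cluster identifier are hypothetical.

```python
import boto3

DEST_REGION = "us-west-2"

# 1. Create a KMS key in the destination region.
kms = boto3.client("kms", region_name=DEST_REGION)
key_id = kms.create_key(Description="Redshift snapshot copies")["KeyMetadata"]["KeyId"]

# 2 + 3. Create a snapshot copy grant in the destination region,
# specifying that KMS key ID.
redshift_dest = boto3.client("redshift", region_name=DEST_REGION)
redshift_dest.create_snapshot_copy_grant(
    SnapshotCopyGrantName="encrypted-copy-grant",
    KmsKeyId=key_id,
)

# 4. On the source cluster, enable cross-region snapshot copy using that grant.
redshift_src = boto3.client("redshift", region_name="us-east-1")
redshift_src.enable_snapshot_copy(
    ClusterIdentifier="analytics-cluster",
    DestinationRegion=DEST_REGION,
    SnapshotCopyGrantName="encrypted-copy-grant",
)
```
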
49
Q

What is DBLINK?

A

It allows you to connect Redshift to a PostgreSQL database

50
Q

Can you use the COPY command to load data from EC2 / EMR?

A

Yes, you would use the remote host / ssh option

51
Q

How do you automate data going in and out of tables in Redshift?

A

AWS Data Pipeline

52
Q

What tool can be used to help migrate data into Redshift?

A

The AWS Database Migration Service

53
Q

What is Redshift Workload Management (WLM)?

A

It prioritizes short, fast queries so they are not blocked by long, slow ones. This is configured with query queues: one for long-running jobs and another for short-running ones.

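A rough boto3 sketch of defining two manual WLM queues through a cluster parameter group. The parameter group name is hypothetical, and the queue fields shown (query_group, query_concurrency, max_execution_time) are assumptions about the wlm_json_configuration schema; check the current documentation before relying on them.

```python
import json
import boto3

# Assumed wlm_json_configuration fields: a short-query queue with a 60s
# timeout and a long-query queue with lower concurrency.
queues = [
    {"query_group": ["short"], "query_concurrency": 5, "max_execution_time": 60000},
    {"query_group": ["long"], "query_concurrency": 2},
]

redshift = boto3.client("redshift", region_name="us-east-1")
redshift.modify_cluster_parameter_group(
    ParameterGroupName="analytics-params",   # hypothetical parameter group
    Parameters=[{
        "ParameterName": "wlm_json_configuration",
        "ParameterValue": json.dumps(queues),
    }],
)
```
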
54
Q

What is concurrency scaling?

A

It automatically adds cluster capacity to handle an increase in concurrent read queries.

55
Q

What is the maximum number of queues you can configure in Automatic Workload Management?

A

8 in total. The default is 5 with even memory allocation

56
Q

What is the default concurrency for Manual Workload Management?

A

Five queries at once in a single queue

57
Q

What is the maximum concurrency level for Manual Workload Management?

A

Fifty

58
Q

What is query queue hopping in Manual Workload Management?

A

Queries that timeout in one queue can “hop” to the next to try again. The second queue could have a higher timeout value.

59
Q

Can you use CREATE TABLE AS Statements with Short Query Acceleration (SQA)?

A

Yes. SQA can only be used with CREATE TABLE AS statements and read-only queries

60
Q

Can you configure how long “short” is for Short Query Acceleration?

A

Yes. You can.

61
Q

What does “VACUUM” do in Redshift?

A

It recovers space from deleted rows

62
Q

What does “VACUUM FULL” do?

A

This is the default. It re-sorts the rows and reclaims space from deleted rows.

63
Q

What does “VACUUM DELETE” do?

A

It only reclaims space from deleted rows

64
Q

What does “VACUUM SORT ONLY” do?

A

It re-sorts the table but does not reclaim disk space

65
Q

What does “VACUUM REINDEX” do?

A

It re-analyzes the sort key columns for interleaved sort keys

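A minimal sketch of cards 61-65 against hypothetical tables; note that VACUUM cannot run inside a transaction block, hence autocommit.

```python
import psycopg2

conn = psycopg2.connect(host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="awsuser", password="...")
conn.autocommit = True   # VACUUM cannot run inside a transaction block
cur = conn.cursor()

cur.execute("VACUUM FULL orders;")                 # default: re-sort and reclaim space
cur.execute("VACUUM DELETE ONLY orders;")          # reclaim space from deleted rows only
cur.execute("VACUUM SORT ONLY orders;")            # re-sort rows, leave deleted space alone
cur.execute("VACUUM REINDEX sales_interleaved;")   # re-analyze interleaved sort key columns
```
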
66
Q

What is Elastic Resize?

A

It allows you to quickly add or remove nodes of the same type

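A small boto3 sketch of an elastic resize on a hypothetical cluster; leaving Classic=False requests the fast elastic path.

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Change the node count without changing the node type.
redshift.resize_cluster(
    ClusterIdentifier="analytics-cluster",
    NumberOfNodes=4,
    Classic=False,   # False = elastic resize rather than classic resize
)
```
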
67
Q

What node instance type has decoupled compute and storage?

A

The RA3 node type (SSD-based)

68
Q

What is Redshift Data Lake Export?

A

You can query Redshift and unload the results to a data lake in S3 in Apache Parquet format, which is fast to export and compact to store

69
Q

What is Cross-Region Data Sharing?

A

The ability to share data across Redshift clusters without needing to copy it. This works across accounts and regions, and only works with RA3 node types.

70
Q

What is AQUA (Advanced Query Accelerator)?

A

It sits between Redshift and S3 and provides accelerated query performance, up to 10 times faster at no extra cost

71
Q

What certificates are required if you want to use an HSM?

A

Both a client and a server certificate are required

72
Q

Can you enable HSM-encryption on your existing cluster after creation?

A

No. You need to create a new encrypted cluster first and then migrate data to it.

73
Q

What do the “GRANT” and “REVOKE” commands do ?

A

They provide a way to manage access at the table level for a user or group.

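A minimal sketch of table-level access control; the table, user, and group names are hypothetical.

```python
import psycopg2

conn = psycopg2.connect(host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="awsuser", password="...")
cur = conn.cursor()

cur.execute("GRANT SELECT ON TABLE orders TO analyst_jane;")             # one user
cur.execute("GRANT SELECT, INSERT ON TABLE orders TO GROUP etl_users;")  # a group
cur.execute("REVOKE INSERT ON TABLE orders FROM GROUP etl_users;")       # take it back

conn.commit()
```
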
74
Q

Can Redshift be serverless?

A

Yes. It can automatically provision and scale capacity for your workloads, and you pay only when it is in use

75
Q

What are good uses for Redshift serverless?

A

Ad hoc business analytics, and dev and test environments

76
Q

How is Redshift serverless billed?

A

It is billed per RPU (Redshift Processing Unit) per second, plus storage.

77
Q

Does Redshift Spectrum work with Serverless?

A

No.

78
Q

Does Workload Management work with Serverless?

A

No

79
Q

Does serverless have a public endpoint?

A

No. You can only connect from within the VPC

80
Q

Can you query Redshift through the AWS console?

A

Yes. You can use the Query Editor

81
Q

What is an external schema?

A

It allows you to connect to an external data source such as an AWS Glue Data Catalog or an RDS instance. You can query the data without loading it into your data warehouse.

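A minimal sketch of creating an external schema over a Glue Data Catalog database; the schema, database, and role names are hypothetical.

```python
import psycopg2

conn = psycopg2.connect(host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="awsuser", password="...")
cur = conn.cursor()

# Map a Glue Data Catalog database into Redshift so it can be queried in place.
cur.execute("""
    CREATE EXTERNAL SCHEMA spectrum_logs
    FROM DATA CATALOG
    DATABASE 'weblogs'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;
""")
conn.commit()
```
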
82
Q

Does Redshift audit logging support KMS?

A

No, only S3-managed keys