General Knowledge Flashcards

(82 cards)

1
Q

What is Redshift?

A

A petabyte-scale, fully managed data warehouse

2
Q

Is Redshift designed for OLAP or OLTP?

A

It is specifically designed for OLAP.

3
Q

A cluster is made up of what?

A

A leader node and one or more compute nodes

4
Q

What is the maximum number of compute nodes you can have in a cluster?

A

128 max per cluster

5
Q

Can clusters have more than one database?

A

Yes, a cluster can contain one or more databases

6
Q

What node stores user data?

A

The compute node

7
Q

What node is responsible for managing communication with the client programs?

A

The leader node

8
Q

What node develops execution plans?

A

The leader node

9
Q

What node has its own memory, CPU, and attached disk storage?

A

The compute node.

10
Q

What are the two node types in Redshift?

A

DS (Dense Storage) node types: use HDDs and are the low-cost option. Available in two sizes: xlarge and 8xlarge.

DC (Dense Compute) node types: used to create high-performance data warehouses. These use SSDs and come in large and 8xlarge sizes.

11
Q

What is a node slice?

A

A partition of a node's memory, CPU, and disk space that processes a portion of the workload assigned to that node.

12
Q

How is the number of node slices determined?

A

By the size of the node

13
Q

What does Redshift Spectrum do?

A

It can query exabytes of data in S3 without loading it.

14
Q

What compression does Redshift Spectrum support?

A

Gzip and Snappy

15
Q

Why is Redshift Spectrum so fast?

A

It uses massively parallel processing (MPP), columnar data storage, and column compression

16
Q

Does Redshift Spectrum scale?

A

Yes, it scales to handle more parallel processing

17
Q

How large is the blocksize for Redshift?

A

1 MB

18
Q

Can you change the column compression after the table is created?

A

No. You cannot change the column compression after the table is created.

19
Q

Does Redshift replicate?

A

Yes, within the cluster

20
Q

Where does Redshift backup to?

A

S3 and it can be asynchronously replicated to another region

21
Q

Does Redshift have automated snapshots?

A

Yes.

22
Q

What happens when a drive or node fails?

A

It is automatically replaced

23
Q

How many AZs can a single cluster span?

A

One. Redshift is a single-AZ service

24
Q

If your Redshift cluster goes offline because the AZ is down and you need to query your data ASAP, how can you make this happen?

A

You can restore the cluster using the data stored in S3 into a new AZ that is not being impacted.

25
How does Redshift scale?
Redshift scales both horizontally and vertically
26
What happens when you scale Redshift, from a process perspective?
A new cluster is created while your old one remains available for reads. The CNAME is flipped, and data is moved in parallel to the new compute nodes.
27
How does the distribution style Auto work?
Redshift chooses the distribution style automatically, based on the size of the table data
28
How does the distribution style Even work?
Rows are distributed across slices in a round-robin fashion
29
How does the distribution style "Key" Work?
Rows are distributed based on one column
30
How does the distribution style All work?
The entire table is copied to every node.
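The four distribution styles above are declared at table creation. A minimal sketch, with hypothetical table and column names:

    -- AUTO: Redshift picks the style based on table size
    CREATE TABLE sales_auto (sale_id INT, store_id INT, amount DECIMAL(10,2)) DISTSTYLE AUTO;
    -- EVEN: rows spread round-robin across slices
    CREATE TABLE sales_even (sale_id INT, store_id INT, amount DECIMAL(10,2)) DISTSTYLE EVEN;
    -- KEY: rows placed according to the values of one column
    CREATE TABLE sales_key (sale_id INT, store_id INT, amount DECIMAL(10,2)) DISTSTYLE KEY DISTKEY (store_id);
    -- ALL: a full copy of the table on every node
    CREATE TABLE stores_all (store_id INT, store_name VARCHAR(100)) DISTSTYLE ALL;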
31
What are Redshift Sort Keys?
They are similar to indexes in a traditional relational database
32
What is a single column sort key?
A sort key defined on a single column, such as a date column
33
What is a compound sort key?
A sort key made up of all the columns listed in the sort key definition, in the order they are listed
34
What is the default sort type in Redshift?
Compound
35
What is an interleaved sort key?
A sort key that gives equal weight to every column.
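The sort key styles above are also declared at table creation. A minimal sketch, with hypothetical table and column names:

    -- Single-column sort key
    CREATE TABLE events_single (event_id INT, event_date DATE, user_id INT) SORTKEY (event_date);
    -- Compound sort key (the default when multiple columns are listed)
    CREATE TABLE events_compound (event_id INT, event_date DATE, user_id INT) COMPOUND SORTKEY (event_date, user_id);
    -- Interleaved sort key (equal weight to each listed column)
    CREATE TABLE events_interleaved (event_id INT, event_date DATE, user_id INT) INTERLEAVED SORTKEY (event_date, user_id);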
36
What is the COPY command?
A command that loads data into Redshift by reading from multiple data files or multiple data streams simultaneously
37
What are the sources of the COPY command?
S3, EMR, DynamoDB, remote hosts with SSH
38
When using S3 as a source for the COPY command, what is required other than IAM permissions?
A Manifest
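A sketch of a COPY from S3 using a manifest file; the table, bucket, manifest path, and IAM role ARN are hypothetical:

    COPY sales
    FROM 's3://my-bucket/manifests/sales.manifest'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    MANIFEST;   -- treat the S3 object as a manifest listing the actual data files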
39
What does the "UNLOAD" command do?
It allows you to export data from Redshift into files in S3
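A sketch of an UNLOAD to S3, with a hypothetical query, bucket prefix, and IAM role:

    UNLOAD ('SELECT * FROM sales WHERE sale_date >= ''2023-01-01''')
    TO 's3://my-bucket/exports/sales_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    GZIP            -- compress the output files
    PARALLEL ON;    -- write multiple files in parallel across slices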
40
What is Enhanced VPC Routing?
It forces all traffic from COPY and UNLOAD commands to go through the VPC rather than over the internet
41
What is the prime use case for the COPY command?
To copy data into Redshift from an external source
42
How do you move data within Redshift?
SELECT INTO or CREATE TABLE AS
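A sketch of both forms, using hypothetical table names:

    -- CREATE TABLE AS (CTAS): creates and populates a new table from a query
    CREATE TABLE sales_2023 AS
    SELECT * FROM sales WHERE sale_year = 2023;

    -- SELECT INTO: an equivalent way to materialize a query result into a new table
    SELECT * INTO sales_2024 FROM sales WHERE sale_year = 2024;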
43
Can the "COPY" command decrypt data?
Yes. It can decrypt data as it is loaded from S3. It uses hardware accelerated SSL to keep it fast.
44
The "COPY" command can speed up data transfers by using which compression formats?
GZIP, LZOP, and BZIP2
45
What does the automatic compression option do when using the "COPY" command?
It analyzes data being loaded and figures out the optimal compression to use for storing it.
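A sketch of a COPY that combines compressed input with automatic compression analysis on an initial load into an empty table; the table, file path, and IAM role are hypothetical:

    COPY sales
    FROM 's3://my-bucket/data/sales.csv.gz'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    GZIP              -- the source files are gzip-compressed
    COMPUPDATE ON;    -- analyze the data and apply optimal column compression encodings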
46
What is a narrow table?
A table with many rows and few columns
47
What is the best practice to load a narrow table?
Load it with a single COPY transaction, if possible.
48
What are the main steps in copying an encrypted snapshot to another region in AWS?
1. Create a KMS key in the destination region. 2. Create a snapshot copy grant in the destination region. 3. Specify the KMS key ID for which you are creating the copy grant. 4. In the source region, enable copying of snapshots and specify the copy grant you just created.
49
What is DBLINK?
It allows you to connect Redshift to a PostgreSQL database
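One common pattern runs the query from the PostgreSQL side using the standard dblink extension. A rough sketch, with a hypothetical connection string, query, and column list:

    CREATE EXTENSION IF NOT EXISTS dblink;

    SELECT *
    FROM dblink('host=my-cluster.abc123.us-east-1.redshift.amazonaws.com port=5439 dbname=dev user=awsuser password=MySecret1',
                'SELECT store_id, SUM(amount) FROM sales GROUP BY store_id')
         AS t(store_id INT, total NUMERIC);   -- dblink requires an explicit column definition list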
50
Can you use the COPY command to load data from EC2 / EMR?
Yes, you would use the remote host / SSH option
51
How do you automate data going in and out of tables in Redshift?
AWS Data Pipeline
52
What tool can be used to help migrate data into Redshift?
The AWS Database Migration Service
53
What is Redshift Workload Management (WLM)?
It prioritizes short, fast queries so they are not blocked by long, slow ones. This is configured using query queues: one for long-running queries and another for short ones.
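With manual WLM, one way to route a query to a specific queue from SQL is to label the session with a query group that the queue is configured to match; the group name and query below are hypothetical:

    SET query_group TO 'reports';   -- route subsequent queries to the queue matching 'reports'
    SELECT COUNT(*) FROM sales;
    RESET query_group;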
54
What is concurrency scaling?
It automatically adds capacity to handle an increase in concurrent read queries.
55
What is the maximum number of queues you can configure in Automatic Workload Management?
8 in total. The default is 5 with even memory allocation
56
What is the default concurrency for Manual Workload Management?
Five queries at once in a single queue
57
What is the maximum concurrency level for Manual Workload Management?
Fifty
58
What is query queue hopping in Manual Workload Management?
Queries that timeout in one queue can "hop" to the next to try again. The second queue could have a higher timeout value.
59
Can you use CREATE TABLE AS Statements with Short Query Acceleration (SQA)?
Yes. It can only be used with CREATE TABLE AS statements and read-only queries
60
Can you configure how long "short" is for Short Query Acceleration?
Yes. You can.
61
What does "VACUUM" do in Redshift?
It recovers space from deleted rows
62
What does "VACUUM FULL" do?
This is the default. It re-sorts rows and reclaims space from deleted rows.
63
What does "VACUUM DELETE" do?
It only reclaims space from deleted rows
64
What does "VACUUM SORT ONLY" do?
It re-sorts the table but does not reclaim disk space
65
What does "VACUUM REINDEX" do?
It re-analyzes the interleaved sort key columns and then performs a full vacuum
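A sketch of the four variants above, run against a hypothetical table named sales:

    VACUUM FULL sales;          -- default: re-sort rows and reclaim space from deleted rows
    VACUUM DELETE ONLY sales;   -- reclaim space from deleted rows only
    VACUUM SORT ONLY sales;     -- re-sort rows without reclaiming space
    VACUUM REINDEX sales;       -- re-analyze interleaved sort keys, then run a full vacuum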
66
What is Elastic Resize?
It allows you to quickly add or remove nodes of the same type
67
What node instance type has decoupled compute and storage?
The RA3 node type. It uses SSD-based managed storage.
68
What is Redshift Data Lake Export?
You can query Redshift and unload the results to a data lake in S3 in Apache Parquet format, which is fast and compact
69
What is Cross-Region Data Sharing?
The ability to share data across Redshift clusters without needing to copy it. This works across accounts and Regions. It only works with RA3 node types.
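A sketch of the data sharing SQL, with hypothetical share, table, and namespace identifiers:

    -- On the producer cluster
    CREATE DATASHARE salesshare;
    ALTER DATASHARE salesshare ADD SCHEMA public;
    ALTER DATASHARE salesshare ADD TABLE public.sales;
    GRANT USAGE ON DATASHARE salesshare TO NAMESPACE 'consumer-namespace-guid';

    -- On the consumer cluster
    CREATE DATABASE sales_share_db FROM DATASHARE salesshare OF NAMESPACE 'producer-namespace-guid';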
70
What is AQUA?
AQUA (Advanced Query Accelerator) sits between Redshift and S3 and provides accelerated query performance, up to 10 times faster, at no extra cost
71
What certificates are required if you want to use an HSM?
The client and server certificate are required
72
Can you enable HSM-encryption on your existing cluster after creation?
No. You need to create a new encrypted cluster first and then migrate data to it.
73
What do the "GRANT" and "REVOKE" commands do?
They provide a way to manage access at the table level for a user or group.
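A sketch with a hypothetical table and user group:

    GRANT SELECT ON TABLE sales TO GROUP analysts;
    REVOKE INSERT, UPDATE, DELETE ON TABLE sales FROM GROUP analysts;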
74
Can Redshift be serverless?
Yes. It automatically provisions and scales capacity for your workloads, and you pay only when it is in use
75
What are good uses for Redshift serverless?
Ad hoc business analytics, and dev and test environments
76
How is Redshift serverless billed?
It is billed by Redshift Processing Units (RPUs) per second, plus storage.
77
Does Redshift Spectrum work with Serverless?
No.
78
Does Workload Management work with Serverless?
No
79
Does serverless have a public endpoint?
No. You can only connect from within the VPC
80
Can you query Redshift through the AWS console?
Yes. You can use the Query Editor
81
What is an external schema?
It allows you to connect to an external data source, such as an AWS Glue Data Catalog or an RDS instance, and query the data without loading it into your data warehouse
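A sketch of creating an external schema over an AWS Glue Data Catalog database and querying it in place; the schema, database, IAM role, and table names are hypothetical:

    CREATE EXTERNAL SCHEMA spectrum_schema
    FROM DATA CATALOG
    DATABASE 'spectrum_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;   -- create the Glue database if it does not already exist

    SELECT COUNT(*) FROM spectrum_schema.sales_events;   -- data stays in S3; nothing is loaded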
82
Can Redshift audit logging support KMS?
No, only S3 managed keys