Redshift Flashcards
What is Redshift?
Petabyte-scale data warehouse service
Starts at $0.25 / hour with no commitment
Scales to a petabyte or more for $1,000 / terabyte per year, less than one-tenth the cost of most other data warehouse solutions
OLAP
Online Analytical Processing
one transaction pulls in large numbers of records
Data warehousing uses a different architecture
columnar
Redshift Configuration (Nodes)
Start with Single Node
Grow to Multi Node
What is in a Redshift multi-node configuration?
Leader Node
Manages client connections, receives queries
Compute Node
stores data, performs queries, computations
How many compute nodes can Redshift have?
Up to 128
Columnar data storage: why is it efficient?
only columns involved in queries are processed
Columnar data stored sequentially on storage media
Requires fewer I/Os
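A minimal sketch of why a columnar layout needs fewer reads, using a made-up three-column table (the column names and values are assumptions for illustration, not Redshift internals). Summing one column touches every field in the row layout but only one sequence in the column layout.

```python
# Toy table; the column names and values are invented for illustration.
rows = [
    {"order_id": 1, "region": "east", "amount": 10.0},
    {"order_id": 2, "region": "west", "amount": 25.5},
    {"order_id": 3, "region": "east", "amount": 4.5},
]

# Row layout: each record is read whole, so every field is touched.
row_values_read = 0
row_total = 0.0
for record in rows:
    row_values_read += len(record)
    row_total += record["amount"]

# Column layout: each column is kept as its own contiguous sequence,
# so a query on "amount" reads only that sequence.
columns = {name: [record[name] for record in rows] for name in rows[0]}
col_values_read = len(columns["amount"])
col_total = sum(columns["amount"])

print(row_values_read, col_values_read)  # values touched in each layout
```

Both layouts give the same total; the column layout reads one third as many values for this three-column table.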
Describe Redshift’s compression
Columnar data can be compressed more than row-based data because similar values of the same type are stored sequentially on disk
Redshift supports multiple compression techniques; it samples your data and selects the best one
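A toy sketch of one such technique, run-length encoding, to show why sequentially stored column values compress well. The helper name `run_length_encode` and the sample column are assumptions for illustration; this is not Redshift's implementation.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of repeated values into a (value, count) pair."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# A single column stored sequentially tends to contain long runs.
region_column = ["east", "east", "east", "west", "west", "east"]
encoded = run_length_encode(region_column)

print(encoded)
```

Six stored values shrink to three pairs here; the same values interleaved with other columns in a row layout would not form runs.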
Describe Redshift’s Massively Parallel Processing
automatically distributes loads across all nodes
Makes it easy to add nodes, maintain fast performance as data grows
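A rough sketch of the idea behind distributing work across nodes: rows are assigned to nodes by hashing a key, each node computes a partial result, and the partials are combined. The function `assign_node`, the node count, and the use of CRC32 are assumptions chosen for illustration, not how Redshift actually places data.

```python
import zlib


def assign_node(key, node_count):
    """Pick a node for a row by hashing its key (hypothetical scheme)."""
    return zlib.crc32(str(key).encode("utf-8")) % node_count


node_count = 4  # assumed cluster size for the example
node_rows = {node: [] for node in range(node_count)}

# Spread 100 made-up order ids across the nodes.
for order_id in range(1, 101):
    node_rows[assign_node(order_id, node_count)].append(order_id)

# Each node sums its own share; the partial sums are then combined.
partial_sums = [sum(ids) for ids in node_rows.values()]
total = sum(partial_sums)

print(total)
```

Adding nodes in this sketch only changes `node_count`; each node then holds and scans a smaller share of the rows.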
Describe Redshift pricing for compute
Compute Node Hours
Total hours you run across all compute nodes for billing period
Billed 1 unit per node per hour
A 3-node cluster running for one 30-day month = 3 × 24 × 30 = 2,160 instance hours
Not charged for leader node
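The billing arithmetic above can be checked with a short sketch. The function name `compute_node_hours` is hypothetical, and a 30-day month is assumed; only compute nodes are counted, since the leader node is not charged.

```python
def compute_node_hours(compute_nodes, days, hours_per_day=24):
    """Billable hours: one unit per compute node per hour (leader node excluded)."""
    return compute_nodes * days * hours_per_day


# 3 compute nodes running for an assumed 30-day month.
hours = compute_node_hours(3, 30)
print(hours)
```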
Describe Redshift pricing for backup and data transfer
You’re billed for backups and for data transfer within a VPC (not outside a VPC)
Redshift security
Encrypted in transit using SSL
Encrypted at rest using AES-256
By default, Redshift takes care of key management for you
You can manage your own keys with HSM or AWS KMS
Is it multi-AZ?
No
Only available in one AZ
Can restore snapshots to other AZs if an outage occurs