Redshift Flashcards
What is Redshift?
A fully managed, clustered petabyte-scale data warehouse.
What DB is Redshift based upon?
PostgreSQL
What database connection is Redshift compatible with?
ODBC and JDBC
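As a rough illustration of what a JDBC connection string for Redshift looks like, here is a minimal sketch. The helper name, the cluster endpoint, and the database name are hypothetical; the `jdbc:redshift://` prefix and default port 5439 match the Amazon Redshift JDBC driver, but check your cluster's console for the exact URL.

```python
def redshift_jdbc_url(endpoint: str, database: str, port: int = 5439) -> str:
    """Build a JDBC URL of the form jdbc:redshift://endpoint:port/database."""
    return f"jdbc:redshift://{endpoint}:{port}/{database}"


# Hypothetical endpoint and database name, for illustration only.
url = redshift_jdbc_url(
    "examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com", "dev"
)
print(url)
```

A SQL client that speaks JDBC would take this string together with a user name and password.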
What is the up-front cost of using Redshift?
None for on-demand usage; like most AWS services, it is pay-as-you-go.
What Redshift features allow for complex queries?
Massively parallel processing (MPP)
Columnar data storage
What Redshift feature allows you to query data files in Amazon S3 directly?
Redshift Spectrum
How does columnar data storage improve Redshift?
It drastically reduces disk I/O: a query reads only the columns it needs rather than entire rows, so far less data is loaded from disk.
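To make the I/O saving concrete, here is a toy sketch (not how Redshift stores blocks internally) that counts how many values must be read to sum one column under a row layout versus a column layout. The table, its column names, and the function names are made up for illustration.

```python
# Three rows of a hypothetical sales table: (order_id, region, amount).
rows = [
    (1, "us-east", 120.0),
    (2, "eu-west", 80.0),
    (3, "us-east", 200.0),
]

# The same data laid out column by column.
columns = {
    "order_id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def total_amount_row_store(rows):
    """Sum the amount column when whole rows must be read."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # every field of the row is read
        total += row[2]
    return total, values_read


def total_amount_column_store(columns):
    """Sum the amount column when only that column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


row_total, row_reads = total_amount_row_store(rows)
col_total, col_reads = total_amount_column_store(columns)
print(row_total, row_reads)  # 400.0 9
print(col_total, col_reads)  # 400.0 3
```

Both layouts give the same answer, but the column layout touches a third of the values here; the gap grows with the number of columns in the table.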
What is parallel processing?
Each query is split into segments that run simultaneously across every core on every node.
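The idea can be sketched in a few lines: split the data into slices, let each worker compute a partial result, then combine the partial results. This is only a structural illustration under simple assumptions; the function names and the slice count are hypothetical, and Python threads do not give the true multi-core, multi-node parallelism that Redshift provides.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(values, slice_count):
    """Distribute values round-robin across slice_count slices."""
    return [values[i::slice_count] for i in range(slice_count)]


def parallel_sum(values, slice_count=4):
    """Sum each slice in its own worker, then combine the partial sums."""
    slices = split_into_slices(values, slice_count)
    with ThreadPoolExecutor(max_workers=slice_count) as pool:
        partial_sums = list(pool.map(sum, slices))
    return sum(partial_sums)


amounts = list(range(1, 101))
result = parallel_sum(amounts)
print(result)  # 5050
```

The final combine step plays the role of a coordinator gathering results from the workers.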
What is a data lake?
A repository of a variety of data, on top of which you place a framework or technology to make use of the data.
What types of frameworks or technologies are sometimes used in data lakes?
Machine learning, analytics, on-prem data movements, real-time data movements.
What are some benefits of data lakes?
Query raw data without extensive preprocessing.
Shortens the time from data collection to data value.
Identifies correlations between disparate data sets.
What is Redshift Spectrum?
A Redshift feature that allows fast, complex analysis of objects stored in the cloud, such as data files in Amazon S3.
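As a hedged sketch of what using Spectrum involves, the Python below only assembles the SQL statements you would send to Redshift: an external schema, an external table pointing at S3, and a query. The schema, table, column, bucket, and IAM role names are all hypothetical placeholders; verify the exact options against the Redshift SQL reference before running anything.

```python
def external_schema_sql(schema, catalog_db, iam_role_arn):
    """SQL that registers an external schema backed by a data catalog."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_sql(schema, table, s3_location):
    """SQL that describes files in S3 as a table (columns are made up)."""
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        "    order_id INTEGER,\n"
        "    region VARCHAR(32),\n"
        "    amount DECIMAL(10,2)\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )


# Placeholder identifiers, for illustration only.
schema_sql = external_schema_sql(
    "spectrum", "spectrum_db", "arn:aws:iam::123456789012:role/ExampleSpectrumRole"
)
table_sql = external_table_sql("spectrum", "sales", "s3://example-bucket/sales/")
query_sql = "SELECT region, SUM(amount) FROM spectrum.sales GROUP BY region;"

for statement in (schema_sql, table_sql, query_sql):
    print(statement, end="\n\n")
```

The data itself never moves into the cluster; the external table is only metadata describing where the files live and how to read them.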