Redshift Flashcards
What are the two different theories behind the name “Redshift”?
Either it is named after redshift, the astronomical effect used with Hubble’s law to determine the distance of galaxies, or it refers to “shifting” away from Oracle, whose brand color is red.
Describe Redshift.
Petabyte-scale data warehouse.
Fully managed service.
Extremely cost effective compared to Teradata or Netezza.
Works with almost every BI tool out of the box (because it is JDBC and ODBC compatible) and is PostgreSQL compatible.
Uses massively parallel processing (MPP) and columnar data storage (optimized for complex analytic queries).
Redshift Spectrum, a newer feature, lets you query data directly in Amazon S3 without first loading it into Redshift tables.
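The parallel-processing idea can be sketched in Python as a simplified analogy, not Redshift’s actual implementation: the data is split round-robin into hypothetical “slices”, each slice computes a partial aggregate, and a leader step combines the partials. The names split_into_slices, partial_aggregate and parallel_average are made up for illustration, and the threads only stand in for compute nodes (Python threads do not give true CPU parallelism for this kind of work).

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(values, n_slices):
    # Round-robin distribution of rows across slices (an analogy, not Redshift code).
    return [values[i::n_slices] for i in range(n_slices)]


def partial_aggregate(slice_values):
    # Each "slice" computes a local sum and count over only its own rows.
    return sum(slice_values), len(slice_values)


def parallel_average(values, n_slices=4):
    slices = split_into_slices(values, n_slices)
    # Threads stand in for compute nodes working on their slices at the same time.
    with ThreadPoolExecutor(max_workers=n_slices) as pool:
        partials = list(pool.map(partial_aggregate, slices))
    # The "leader" step combines the partial results into the final answer.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(parallel_average(list(range(1, 101))))  # 50.5
```

Returning a sum and a count from each slice, rather than a per-slice average, is what lets the leader step combine the partials into an exact overall average.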
JDBC
Java Database Connectivity: an API (application programming interface) for the Java programming language that defines how a client may access a database. It is part of the Java Standard Edition platform from Oracle Corporation.
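As a sketch of what a BI tool hands to a JDBC driver when connecting to Redshift, the helper below builds a URL in the jdbc:redshift://endpoint:port/database form, with 5439 as the default Redshift port. The function name and the cluster endpoint in the usage line are hypothetical; a real Java client would pass such a URL to the Amazon Redshift JDBC driver.

```python
def redshift_jdbc_url(endpoint: str, database: str, port: int = 5439) -> str:
    """Build a JDBC URL of the form jdbc:redshift://endpoint:port/database."""
    return f"jdbc:redshift://{endpoint}:{port}/{database}"


# Hypothetical cluster endpoint and database name, for illustration only.
url = redshift_jdbc_url("examplecluster.abc123.us-west-2.redshift.amazonaws.com", "dev")
print(url)
```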
ODBC
Open Database Connectivity: a standard API (application programming interface) for accessing database management systems. The designers of ODBC aimed to make it independent of database systems and operating systems.
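Python’s DB-API (PEP 249) plays a role similar to JDBC and ODBC: code written against its generic connection and cursor calls can run against different databases by swapping the driver. In this sketch the standard-library sqlite3 module stands in for a real warehouse driver, and the sales table and total_sales function are hypothetical. One caveat: drivers differ in parameter placeholder style (sqlite3 uses ?), so fully portable code still has to account for that.

```python
import sqlite3


def total_sales(conn) -> float:
    """Sum the amount column using only generic DB-API (PEP 249) calls."""
    cur = conn.cursor()
    cur.execute("SELECT SUM(amount) FROM sales")
    (total,) = cur.fetchone()
    cur.close()
    return total


# sqlite3 ships with Python, so it stands in for a real warehouse driver here.
conn = sqlite3.connect(":memory:")
# Connection.execute is a sqlite3 convenience shortcut, used only for setup.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("east", 10.0), ("west", 32.5)])
print(total_sales(conn))  # 42.5
```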
Columnar Data Stores
A columnar database is optimized for fast retrieval of columns of data, typically in analytical applications. Column-oriented storage is an important factor in analytic query performance: because a query reads only the columns it references, it drastically reduces overall disk I/O and the amount of data that must be loaded from disk.
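A toy comparison of the two layouts, assuming a hypothetical three-column table held in memory: summing one column from a row store means reading every value of every record, while a column store reads only that column. Counting values read is a stand-in for the disk blocks a real columnar engine avoids reading.

```python
# Hypothetical three-column table.
rows = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 32.5},
    {"id": 3, "region": "east", "amount": 7.5},
]

# Row-oriented layout: all values of one record are stored together.
row_store = [(r["id"], r["region"], r["amount"]) for r in rows]

# Column-oriented layout: all values of one column are stored together.
column_store = {
    "id": [r["id"] for r in rows],
    "region": [r["region"] for r in rows],
    "amount": [r["amount"] for r in rows],
}


def sum_amount_row_store(store):
    total, values_read = 0.0, 0
    for record in store:
        values_read += len(record)  # the whole record is read to get one field
        total += record[2]
    return total, values_read


def sum_amount_column_store(store):
    amounts = store["amount"]  # only the referenced column is read
    return sum(amounts), len(amounts)


print(sum_amount_row_store(row_store))        # (50.0, 9)
print(sum_amount_column_store(column_store))  # (50.0, 3)
```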
Data Lake
A large repository of raw data on top of which you put a framework to make sense of it and put it to use.
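The “framework on top” idea is often called schema-on-read. A minimal sketch, assuming hypothetical raw click data stored as plain CSV text: the data stays in its raw form, and types are applied only at query time by a schema that lives outside the data. The names RAW_CLICKS, SCHEMA and read_with_schema are made up for illustration.

```python
import csv
import io

# Raw data as it might land in a data lake: untyped CSV text (hypothetical sample).
RAW_CLICKS = """user_id,page,ms_on_page
1,/home,1200
2,/pricing,3400
1,/pricing,800
"""

# The "framework on top": a schema applied only when the data is read.
SCHEMA = {"user_id": int, "page": str, "ms_on_page": int}


def read_with_schema(raw_text, schema):
    reader = csv.DictReader(io.StringIO(raw_text))
    for record in reader:
        # Cast each raw string field to the type the schema declares.
        yield {col: cast(record[col]) for col, cast in schema.items()}


records = list(read_with_schema(RAW_CLICKS, SCHEMA))
pricing_time = sum(r["ms_on_page"] for r in records if r["page"] == "/pricing")
print(pricing_time)  # 4200
```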
How does Redshift shorten the distance from collecting data to making sense of the data?
Query raw data without extensive pre-processing.
Reduce the time from data collection to data value.
Identify correlations between disparate data sets.