Database Design Flashcards
OLTP
Online Transaction Processing (OLTP)
operational system
- purpose is to support frequent transactions
- Enables the real-time execution of large numbers of database transactions by large numbers of people
- enables multi-user access to the same data while ensuring data integrity
- rapid data processing
OLTP systems are designed for use by frontline workers (e.g., cashiers, bank tellers, part desk clerks) or for customer self-service applications (e.g., online banking, e-commerce, travel reservations).
Google Cloud SQL = optimized for OLTP workloads
OLAP
Online Analytical Processing (OLAP)
operational system optimized for conducting complex data analysis / supporting decision making
Enable complex queries
Typical read-intensive (rather than writing data)
Amazon Redshift = a data warehouse designed for OLAP
Database Transaction
A change, insertion, deletion or query of data in a database
In OLTP, a defining characteristic of a database transaction is its atomicity (indivisibility).
A transaction either succeeds as a whole or fails - it cannot remain in a pending state
How do OLTP and OLAP data work together
OLTP systems often serve as a source of information for OLAP systems
OLTP data is usually stored in an operational database, that is pulled and cleaned to create an OLAP data warehouse.
Without transactional data, data analysis can’t be done.
Analyses from OLAP systems inform business transactions, which can provide a basis for making improvements to the OLTP system
How are OLAP systems and OLTP Systems Different?
OLTP systems use relational databases that can accommodate many concurrent users, frequent queries and updates
OLAP systems typically use data warehouses, which are optimized for complex queries involving multiple data facts from current and historical data
OLTP workloads involve a balance of read and write; OLAP workloads are read-intensive.
OLTP systems require frequent or concurrent backups; OLAP systems can be backed up far less frequently.
Structured Data
Structured data is data whose elements are addressable for effective analysis. It is defined by schemas where datatypes and tables are defined.
Easier to analyze, but less flexible and not very scalable
EX: excel files, POS data
Semi-Structured Data
Semi-structured data is information that has some organizational properties that can make it easier to analyze, but doesn’t follow a larger schema.
Often has an ad-hoc, self describing nature
Examples: JSON, XML
Unstructured Data
Unstructured data is not organized in a predefined manner and doesn’t follow a model. It is not a good fit for most relational databases.
Most data in the world is unstructured.
Examples: photos, chat logs
Traditional Databases
Traditional databases are based on a fixed schema that is static in nature
Used for storing real-time relational, structured data for OLTP Systems
Schema-on-write - requires that data be validated against a schema before being written to the database
Example: MySQL
Data Warehouses
Data warehouses are used for analyzing archived, structured data (OLAP)
Optimized for analytics, reading and aggregating data
Oftentimes read-only
Example: BigQuery
Blob Storage
BLOB = Binary Large Object Storage
Cloud storage for unstructured data
Blob storage keeps masses of data in storage areas called data lakes
Data Lakes
Data lakes are used for storing data of all structures without a set schema. Lets you store data without knowing what you’re going to use it for in the future
Cheaper because it uses object storage
Schema-on-read
You need to catalog the data or it can become a data swamp
You can run big-data analytics on data stored in a data lake using Apache Spark.
Example: Google Cloud Storage
Object Storage
A computer storage architecture designed to handle large amounts of unstructured data
It designates data as distinct units, bundled w/ metadata and a unique ID that can be used to locate and access each unit
schema-on-write
vs
schema-on-read
In traditional databases, the table’s schema is imposed during the load time. Data that doesn’t match the schema is rejected.
Schema-on-read doesn’t require data to match any internal schema when it is loaded. The schema is created only once you are reading the data. More commonly used with data lakes
MPP
Massively Parallel Processing
A storage structure designed to handle multiple operations simultaneously by several processing units. Each processing unit works independently with it’s own operating system and dedicated memory.
This is a data warehouse architecture
Object Storage
Object storage is a data storage architecture for handling large amounts of unstructured data. This data does not conform to, or cannot be easily organized into a traditional relational database with columns and rows
Database Design
Database design is a collection of steps that help create, implement, and maintain a business’s data management systems.
The primary purpose of designing a database is to produce physical and logical models of designs for the proposed database system
Database Models
Database models provide high-level specs for database structure
The most popular model is the relational model, which defines rows as records and columns as attributes