Database Architecture Flashcards
Architecture
components of a computer system and the relationships between components
Tools
Directly interact with database users and administrators, and send queries to the query processor
Query Processor
Manages connections from multiple users and compiles queries into low-level instructions for the storage engine
Storage Engine/Storage Manager
Executes instructions, manages indexes, and interacts with the file system
File System
Accesses data on storage media. The file system contains both system and user data, such as log files, tables, and indexes
MySQL Server/Enterprise
MySQL server - free, MySQL enterprise - paid
Monitor
Collects and displays information on CPU, memory, and index utilization, as well as queries and results.
Audit
Keeps track of all database changes. It also tracks the time of change and who made the change.
Utility Programs
Program suite includes 30 tools grouped into five categories. Installation, Client, Administrative, Developer, and miscellaneous tools.
Connection
Link between tools and the query processor. Each connection specifies a database name, server address, logon name, and password.
Execution plan
Detailed, low-level sequence of steps that specify exactly how to process a query
Query Parser
Check each query for syntax errors and converts valid queries to an internal representation
Query Optimizer
Reads the internal representation, generates alternative execution plans, estimated execution times, and selects the fastest plan. Estimates are based on heuristics and statistics about data, like the number of rows in each table and the number of values in each column. These statistics are maintained in the data dictionary described below.
Cache Manager
Query processor layer has a cache manager that stores reusable information in main memory
Buffer Manager
Retains data blocks from the file system for possible reuse to reduce access time
Buffer
Data blocks retained in an area of main memory
Least Recently Used/LRU
LRU algorithm tracks the time each block was last accessed and, when space is needed, discards ‘stale’ blocksC
Catalog, Data dictionary
Directory of tables, columns, keys, indexes, and other objects in a relational database
Tiers
Computers linked by a network are often grouped in layers, called tiers, and arranged in a hierarhcy.
Single-Tier Architecture
Consists of a personal or corporate computer connected directly to monitors
Multi-tier Architecture
Complex corporate and government applications have increasingly been implemented in a multi-tier architecture
Web Architecture
Consists of web browsers and web servers communicating over the internet. Web browsers on the top tier, webservers on the middle tier, application servers, and then services on the bottom
On-premise
Installed and run on a customer computers
Cloud Services
Computer services on a lower tier of a web architecture. Cloud services are made available over the internet to customers for a monthly fee
Infrastructure-as-a-service/IaaS
provides computer processing, memory, and storage media, as if the customer were renting a computer
Platform-as-a-service/PaaS
provides tool and services, such as a databases, application development tools, and messaging services
Virtual Machine/VM
Software layer that emulates a complete, independent computing environment.
Cloud Database
database offered as a PaaS cloud service
Parallel Computer
Computer consists of multiple processor managed by a single operating system instance
Shared Memory
Processor share the same memory and storage media
Shared Storage
Processor share storage media only
Shared nothing
Processor share neither memory nor storage media
Local Area Network
Consists of cable extending over a small area, typically within one facility. LAN usually use the Ethernet communication protocol.
Wide Area Network
WAN spans multiple facilities in different geographic locations, separated by many miles. WAN communicate via cables, satellite, or telephone lines, often using internet communication protocols.
CLuster
Group of nodes connected by a local area network, managed by separate operating systems instances, and coordinated by specialized cluster management software
Node
Group of computers connected by either a local or wide area network
Parallel Database
Runs on a parallel computer or cluster
Distributed database
runs on multiple computers connected by a wide area network
Distributed transcation
Updates data on multiple nodes of a distributed database. Either all nodes or no nodes must be successfully updated
Two-phase Commit
Phase 1.1, central transaction coordinator notifies all participating nodes of the required updates. 1.2. Participating nodes receive the notification, store the update in a local log, and send a confirmation message. Phase 2.1 begins when the transaction coordinator receives confirmation from all participating nodes. The transition coordinator now instructs all nodes to commit. 2.2 participating nodes receive the commit message, commit the update to the database, and notify the transaction coordinator of success.
Local Transaction
Updates data on a single node of a distribute ddatabase
Synchronous
Distributed transaction are synchronous, since the updates occur at the same time from the perspective of the database user
Asynchronous
updates in separate local transactions are asynchronous
eventually consistent
Databases that use local rather than distributed transactions are called eventually consistent
Consistent
Database conforms to all rules at all times
Available
live nodes must respond to queries at all times
Network partition
When a network error prevents nodes from communicating
Partition-tolerant
Continues to function when a network partition occurs
CAP theorem
Distributed datatbase cannot simultaneously be Consistent, Available, and Partition-tolerant. Distributed database can guarantee any two, but not all three. of these properties.
Replica
Copy of an entire database, a table or a subset of table data
Replicated Database
Maintains two or more replicas on seperate storage devices
Storage arrays
Manage replicas internally, without database intervention
Primary/Secondary
designates one node as primary. All updates are first applied to the primary node in local transactions, Secondary nodes are updated after the primary node commits, with independent local transactions.
Group replication
Applies updates to any node in a group. If a node detects a conflict, an algorithm determines which transaction commits and which rolls back
Central Catalog
Entire catalog resides on a single node.
Replicated catalog
Copy of the catalog resides on each node
Operational Data
Conduct daily business functions
Analytic Data
Data that is used to understand, manage, and plan the business
Reporting Data/Decision support Data.
Data that is reporting data or decision support data
Data warehouse
Seperate database optimized for analytics rather than operations
Data mart
Data warehouse designed for a specific business area, such as sales, human resources, or product development
Extract-transform-load/ETL
Five-step process commonly referred to as the extract-transform-load or ETL process
ETL tools
ETL is a time-consuming process so most organizations use special software products to minimize coding.
Dimensional Design/Star Schema
consists of fact and dimension tables
fact table
Contains numeric data used to measure business performance, such as sales revenue or number of employees. Each row in a fact table consists of numeric fact columns and foreign keys that reference dimension tables.
Dimension table
Contains textual data that described the fact data, such as product line, organizational unit, and geographical region
Dimension Hierarhhcy
Sequence of columns in which each column has a one-many relationship to the next column
Date dimension
Each row corresponds to a day. Contains 36500 rows (100 years x 365days per year)
Time Deimension
Each row corresponds to a minute of the day. Contains 1440 rows (24 hours x 60 minute per hour)
Type 2 Design for slowly chanign dimension
Start and end foreign keys to the fact table
In-memory Database
Database that stores data in main memory, instead of or in addition to storage media
Embedded Database/In-Process Database
Database that is packaged with a programming language. Embedded database and application program execute together in a single software process
SQLite
dominant embedded relational database
SQL Server Compact
Embedded database form microsoft, discontinued support after 2021
Libmysqld
Discontinued version that configured MySQL as embedded.
Federated Database
Collection of two or more participating databases underneath a coordinating software layer. The participating databases are autonomous and heterogeneous
Autonomous datbase
Operates independently of other participating databases. It is administered and can be queried as if the database were not part of a federated database
Heterogeneous databases
Run under different database systems or have incompatible schema
Middleware
coordinating sfotware layer, since the software lies between application programs and database software
Global Catalog
Directory of participating database objects, such as tables, columns, and indexes
Databse Wrapper
Converts the decomposed queries to the appropriate syntax for each participating database
SQL/Management of External Data/SQL/MED
Extension of the SQL standard for federated databases
Nickname
Database name for a participating database object, such as tables and columns
User Mapping
Associates a federated database user with a participating database user
Data Lake
Analytic database of raw, unprocessed data copied from multiple data sources. Data lakes share some characteristics of data warehouses and some characteristic of federated databases
BackTick
Delimits literals that represent identifiers which allows spaces and reserved words to be used as identifiers.
components of a computer system and the relationships between components
Architecture
Directly interact with database users and administrators, and send queries to the query processor
Tools
Manages connections from multiple users and compiles queries into low-level instructions for the storage engine
Query Processor
Executes instructions, manages indexes, and interacts with the file system
Storage Engine/Storage Manager
Accesses data on storage media. The file system contains both system and user data, such as log files, tables, and indexes
File System
MySQL server - free, MySQL enterprise - paid
MySQL Server/Enterprise
Collects and displays information on CPU, memory, and index utilization, as well as queries and results.
Monitor
Keeps track of all database changes. It also tracks the time of change and who made the change.
Audit
Program suite includes 30 tools grouped into five categories. Installation, Client, Administrative, Developer, and miscellaneous tools.
Utility Programs
Link between tools and the query processor. Each connection specifies a database name, server address, logon name, and password.
Connection
Detailed, low-level sequence of steps that specify exactly how to process a query
Execution plan
Check each query for syntax errors and converts valid queries to an internal representation
Query Parser
Reads the internal representation, generates alternative execution plans, estimated execution times, and selects the fastest plan. Estimates are based on heuristics and statistics about data, like the number of rows in each table and the number of values in each column. These statistics are maintained in the data dictionary described below.
Query Optimizer
Query processor layer has a cache manager that stores reusable information in main memory
Cache Manager
Retains data blocks from the file system for possible reuse to reduce access time
Buffer Manager
Data blocks retained in an area of main memory
Buffer
LRU algorithm tracks the time each block was last accessed and, when space is needed, discards ‘stale’ blocksC
Least Recently Used/LRU
Directory of tables, columns, keys, indexes, and other objects in a relational database
Catalog, Data dictionary
Computers linked by a network are often grouped in layers, called tiers, and arranged in a hierarhcy.
Tiers
Consists of a personal or corporate computer connected directly to monitors
Single-Tier Architecture
Complex corporate and government applications have increasingly been implemented in a multi-tier architecture
Multi-tier Architecture
Consists of web browsers and web servers communicating over the internet. Web browsers on the top tier, webservers on the middle tier, application servers, and then services on the bottom
Web Architecture
Installed and run on a customer computers
On-premise
Computer services on a lower tier of a web architecture. Cloud services are made available over the internet to customers for a monthly fee
Cloud Services
provides computer processing, memory, and storage media, as if the customer were renting a computer
Infrastructure-as-a-service/IaaS
provides tool and services, such as a databases, application development tools, and messaging services
Platform-as-a-service/PaaS
Software layer that emulates a complete, independent computing environment.
Virtual Machine/VM
database offered as a PaaS cloud service
Cloud Database
Computer consists of multiple processor managed by a single operating system instance
Parallel Computer
Processor share the same memory and storage media
Shared Memory
Processor share storage media only
Shared Storage
Processor share neither memory nor storage media
Shared nothing
Consists of cable extending over a small area, typically within one facility. LAN usually use the Ethernet communication protocol.
Local Area Network
WAN spans multiple facilities in different geographic locations, separated by many miles. WAN communicate via cables, satellite, or telephone lines, often using internet communication protocols.
Wide Area Network
Group of nodes connected by a local area network, managed by separate operating systems instances, and coordinated by specialized cluster management software
CLuster
Group of computers connected by either a local or wide area network
Node
Runs on a parallel computer or cluster
Parallel Database
runs on multiple computers connected by a wide area network
Distributed database
Updates data on multiple nodes of a distributed database. Either all nodes or no nodes must be successfully updated
Distributed transcation
Phase 1.1, central transaction coordinator notifies all participating nodes of the required updates. 1.2. Participating nodes receive the notification, store the update in a local log, and send a confirmation message. Phase 2.1 begins when the transaction coordinator receives confirmation from all participating nodes. The transition coordinator now instructs all nodes to commit. 2.2 participating nodes receive the commit message, commit the update to the database, and notify the transaction coordinator of success.
Two-phase Commit
Updates data on a single node of a distribute ddatabase
Local Transaction
Distributed transaction are synchronous, since the updates occur at the same time from the perspective of the database user
Synchronous
updates in separate local transactions are asynchronous
Asynchronous
Databases that use local rather than distributed transactions are called eventually consistent
eventually consistent
Database conforms to all rules at all times
Consistent
live nodes must respond to queries at all times
Available
When a network error prevents nodes from communicating
Network partition
Continues to function when a network partition occurs
Partition-tolerant
Distributed datatbase cannot simultaneously be Consistent, Available, and Partition-tolerant. Distributed database can guarantee any two, but not all three. of these properties.
CAP theorem
Copy of an entire database, a table or a subset of table data
Replica
Maintains two or more replicas on seperate storage devices
Replicated Database
Manage replicas internally, without database intervention
Storage arrays
designates one node as primary. All updates are first applied to the primary node in local transactions, Secondary nodes are updated after the primary node commits, with independent local transactions.
Primary/Secondary
Applies updates to any node in a group. If a node detects a conflict, an algorithm determines which transaction commits and which rolls back
Group replication
Entire catalog resides on a single node.
Central Catalog
Copy of the catalog resides on each node
Replicated catalog
Conduct daily business functions
Operational Data
Data that is used to understand, manage, and plan the business
Analytic Data
Data that is reporting data or decision support data
Reporting Data/Decision support Data.
Seperate database optimized for analytics rather than operations
Data warehouse
Data warehouse designed for a specific business area, such as sales, human resources, or product development
Data mart
Five-step process commonly referred to as the extract-transform-load or ETL process
Extract-transform-load/ETL
ETL is a time-consuming process so most organizations use special software products to minimize coding.
ETL tools
consists of fact and dimension tables
Dimensional Design/Star Schema
Contains numeric data used to measure business performance, such as sales revenue or number of employees. Each row in a fact table consists of numeric fact columns and foreign keys that reference dimension tables.
fact table
Contains textual data that described the fact data, such as product line, organizational unit, and geographical region
Dimension table
Sequence of columns in which each column has a one-many relationship to the next column
Dimension Hierarhhcy
Each row corresponds to a day. Contains 36500 rows (100 years x 365days per year)
Date dimension
Each row corresponds to a minute of the day. Contains 1440 rows (24 hours x 60 minute per hour)
Time Deimension
Start and end foreign keys to the fact table
Type 2 Design for slowly chanign dimension
Database that stores data in main memory, instead of or in addition to storage media
In-memory Database
Database that is packaged with a programming language. Embedded database and application program execute together in a single software process
Embedded Database/In-Process Database
dominant embedded relational database
SQLite
Embedded database form microsoft, discontinued support after 2021
SQL Server Compact
Discontinued version that configured MySQL as embedded.
Libmysqld
Collection of two or more participating databases underneath a coordinating software layer. The participating databases are autonomous and heterogeneous
Federated Database
Operates independently of other participating databases. It is administered and can be queried as if the database were not part of a federated database
Autonomous datbase
Run under different database systems or have incompatible schema
Heterogeneous databases
coordinating sfotware layer, since the software lies between application programs and database software
Middleware
Directory of participating database objects, such as tables, columns, and indexes
Global Catalog
Converts the decomposed queries to the appropriate syntax for each participating database
Databse Wrapper
Extension of the SQL standard for federated databases
SQL/Management of External Data/SQL/MED
Database name for a participating database object, such as tables and columns
Nickname
Associates a federated database user with a participating database user
User Mapping
Analytic database of raw, unprocessed data copied from multiple data sources. Data lakes share some characteristics of data warehouses and some characteristic of federated databases
Data Lake
Delimits literals that represent identifiers which allows spaces and reserved words to be used as identifiers.
BackTick