Database Architecture Flashcards
Architecture
components of a computer system and the relationships between components
Tools
Directly interact with database users and administrators, and send queries to the query processor
Query Processor
Manages connections from multiple users and compiles queries into low-level instructions for the storage engine
Storage Engine/Storage Manager
Executes instructions, manages indexes, and interacts with the file system
File System
Accesses data on storage media. The file system contains both system and user data, such as log files, tables, and indexes
MySQL Server/Enterprise
MySQL server - free, MySQL enterprise - paid
Monitor
Collects and displays information on CPU, memory, and index utilization, as well as queries and results.
Audit
Keeps track of all database changes. It also tracks the time of change and who made the change.
Utility Programs
Program suite includes 30 tools grouped into five categories. Installation, Client, Administrative, Developer, and miscellaneous tools.
Connection
Link between tools and the query processor. Each connection specifies a database name, server address, logon name, and password.
Execution plan
Detailed, low-level sequence of steps that specify exactly how to process a query
Query Parser
Check each query for syntax errors and converts valid queries to an internal representation
Query Optimizer
Reads the internal representation, generates alternative execution plans, estimated execution times, and selects the fastest plan. Estimates are based on heuristics and statistics about data, like the number of rows in each table and the number of values in each column. These statistics are maintained in the data dictionary described below.
Cache Manager
Query processor layer has a cache manager that stores reusable information in main memory
Buffer Manager
Retains data blocks from the file system for possible reuse to reduce access time
Buffer
Data blocks retained in an area of main memory
Least Recently Used/LRU
LRU algorithm tracks the time each block was last accessed and, when space is needed, discards ‘stale’ blocksC
Catalog, Data dictionary
Directory of tables, columns, keys, indexes, and other objects in a relational database
Tiers
Computers linked by a network are often grouped in layers, called tiers, and arranged in a hierarhcy.
Single-Tier Architecture
Consists of a personal or corporate computer connected directly to monitors
Multi-tier Architecture
Complex corporate and government applications have increasingly been implemented in a multi-tier architecture
Web Architecture
Consists of web browsers and web servers communicating over the internet. Web browsers on the top tier, webservers on the middle tier, application servers, and then services on the bottom
On-premise
Installed and run on a customer computers
Cloud Services
Computer services on a lower tier of a web architecture. Cloud services are made available over the internet to customers for a monthly fee
Infrastructure-as-a-service/IaaS
provides computer processing, memory, and storage media, as if the customer were renting a computer
Platform-as-a-service/PaaS
provides tool and services, such as a databases, application development tools, and messaging services
Virtual Machine/VM
Software layer that emulates a complete, independent computing environment.
Cloud Database
database offered as a PaaS cloud service
Parallel Computer
Computer consists of multiple processor managed by a single operating system instance
Shared Memory
Processor share the same memory and storage media
Shared Storage
Processor share storage media only
Shared nothing
Processor share neither memory nor storage media
Local Area Network
Consists of cable extending over a small area, typically within one facility. LAN usually use the Ethernet communication protocol.
Wide Area Network
WAN spans multiple facilities in different geographic locations, separated by many miles. WAN communicate via cables, satellite, or telephone lines, often using internet communication protocols.
CLuster
Group of nodes connected by a local area network, managed by separate operating systems instances, and coordinated by specialized cluster management software
Node
Group of computers connected by either a local or wide area network
Parallel Database
Runs on a parallel computer or cluster
Distributed database
runs on multiple computers connected by a wide area network
Distributed transcation
Updates data on multiple nodes of a distributed database. Either all nodes or no nodes must be successfully updated
Two-phase Commit
Phase 1.1, central transaction coordinator notifies all participating nodes of the required updates. 1.2. Participating nodes receive the notification, store the update in a local log, and send a confirmation message. Phase 2.1 begins when the transaction coordinator receives confirmation from all participating nodes. The transition coordinator now instructs all nodes to commit. 2.2 participating nodes receive the commit message, commit the update to the database, and notify the transaction coordinator of success.
Local Transaction
Updates data on a single node of a distribute ddatabase
Synchronous
Distributed transaction are synchronous, since the updates occur at the same time from the perspective of the database user
Asynchronous
updates in separate local transactions are asynchronous
eventually consistent
Databases that use local rather than distributed transactions are called eventually consistent
Consistent
Database conforms to all rules at all times
Available
live nodes must respond to queries at all times
Network partition
When a network error prevents nodes from communicating
Partition-tolerant
Continues to function when a network partition occurs
CAP theorem
Distributed datatbase cannot simultaneously be Consistent, Available, and Partition-tolerant. Distributed database can guarantee any two, but not all three. of these properties.
Replica
Copy of an entire database, a table or a subset of table data
Replicated Database
Maintains two or more replicas on seperate storage devices
Storage arrays
Manage replicas internally, without database intervention
Primary/Secondary
designates one node as primary. All updates are first applied to the primary node in local transactions, Secondary nodes are updated after the primary node commits, with independent local transactions.
Group replication
Applies updates to any node in a group. If a node detects a conflict, an algorithm determines which transaction commits and which rolls back
Central Catalog
Entire catalog resides on a single node.
Replicated catalog
Copy of the catalog resides on each node
Operational Data
Conduct daily business functions
Analytic Data
Data that is used to understand, manage, and plan the business
Reporting Data/Decision support Data.
Data that is reporting data or decision support data
Data warehouse
Seperate database optimized for analytics rather than operations
Data mart
Data warehouse designed for a specific business area, such as sales, human resources, or product development
Extract-transform-load/ETL
Five-step process commonly referred to as the extract-transform-load or ETL process
ETL tools
ETL is a time-consuming process so most organizations use special software products to minimize coding.
Dimensional Design/Star Schema
consists of fact and dimension tables
fact table
Contains numeric data used to measure business performance, such as sales revenue or number of employees. Each row in a fact table consists of numeric fact columns and foreign keys that reference dimension tables.
Dimension table
Contains textual data that described the fact data, such as product line, organizational unit, and geographical region
Dimension Hierarhhcy
Sequence of columns in which each column has a one-many relationship to the next column
Date dimension
Each row corresponds to a day. Contains 36500 rows (100 years x 365days per year)