Fundamentals Flashcards
Database
Structured set of data that is accessible in many ways
A collection of tables and their relationships
(Data + rules on data)
(can run multiple queries without affecting the data structure)
Data Anomalies
Data management problems resulting from poor data structure
Insert Anomaly
Error prohibiting insertion
Update Anomaly
Error from updates OR when multiple updates are required
Risk of missing some items
Delete Anomaly
Deletes other info
Solving poor database structure
Separate data into different tables BUT keep relationships the same
Adding a redundant column removes anomalies (only in relational database)
E. F Codd
A relational Model of Data for Large Shared Data Banks
Donald Chamberlin & Raymond BOyce
SEQUEL /SQL (Structured English Query Language)
Peter Chen
The entity-relationship model-toward a unified view of data
Larry Ellison
Oracle
Jim Gray
The Transaction Concept Virtues and Limitations
Database Management System
A software package which allows to define, store, use and maintain a database
Database System
Database Management System (DBMS) + Database
Hardware + Software (OS, DBMS, Application programs and utility) +
People +
Procedures +
Data
Problems with file approach to data management
- Redundant data => waste of storage space
- Inconsistent data
- Strong dependency between applications and data
- Concurrent actions lead to inconsistent state of data
- Difficult to integrate various applications
Types of Data
Raw & Metadata
Metadata
Data descriptions
Stored in data dictionary in DBMS
Database vs File Approach
DBMS manages both and metadata in catalog
Application-data independence
View (data subset) can be defined
Concurrency control to support multiple users
File Approach
File only contains data; metadata stored in applications
Application - structural data dependence (need to code to retrieve new information)
No views possible (unless duplication)
No multi user support
Lengthy development times
Complex system administration
Data model
A clear description of the data concepts, their relationships and various data constraints that together make up the content of the database
Should provide a formal and perfect mapping of the real world
Conceptual Data Model
High - level data concepts, close to how the business user perceives the data
Logical Data Model
Concepts that may be understood by business users but are not too far removed from physical data organisation
Physical Data Model
Low- level concepts that describe the datas physical storage details
Database schema
Conceptual organisation of the entire database as view by the database administrator
Specified during database design
Not expected to change frequently
Stored in the catalog
Database state
The data in the database at a particular moment (set of instances)
Changes on ongoing basis
The three - schema architecture
External Schema
Conceptual Schema
Internal Schema
Changes in one layer should have minimal impact on the others
External Schema
Describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group
Conceptual Schema
Specifies data concepts, characteristics, relationships, integrity rules and behaviour
Internal Schema
Specifies how the data are stored/ organised physically (e.g. indexes, access paths)
Benefits Three-Schema Architecture
- Efficiency (size)
- Maintainability (ease of updating, reduces redundancy)
- Performance (importance of data)
- Security (sensitive information, authorisation)
Data Dictionary / Catalog
Heart of DBMS
System database with metadata
Contains definitions for external schema, conceptual schema and physical schema
Database Designer
Designs conceptual schema
Database administrator
Designs external and physical shcema
Application Developer
Develops database applications in a programming language
Business user
Makes use of data stored in database via applications (makes queries on the database)
DBMS vendors
Companies selling DBMS i.e. Oracle, Microsoft
Data Definition Language (DDL)
Language used by the database administrator (DBA) to define the database’s logical, internal and external schemas
Data Manipulation Language (DML)
Language used to retrieve, insert, delete and modify data
DML statements can be embedded in a general-purpose programming language or entered interactively through a front-end querying tool
Relational Database Systems (RDBMS)
Stores data in tables so it can be used in relation to other stored databases
SQL is both a DML & DBA in RDBMS
Data and functional independance
Changes in data definitions have minimal to no impact on applications
Physical (computer) Data Independance
Neither applications nor external or conceptual schema must be changed when changes are made to data storage specifications i.e. different storage locations
DBMS provides interfaces between the conceptual and physical data models
Logical (human) Data Independance
Software applications are minimally impacted by changes in the conceptual schema
i.e. adding new data concepts, characteristics, relationships
DBMS provides interfaces between conceptual and external schema
Function
Interface (signature) : name of the function and its arguments
Method
Implementation: specifies how the function should be executed
Functional Independance
Implementation (method) can change without impact on software applications
Data Redundancy
When the same data is stored unnecessarily at different places
May be desired -> increase performance in distributed environments
DBMS manages redundancy (updates all consistently = less error prone; no user intervention)
- > synchronization
- > consistency
Data Integrity Rules
Enforce correctness of data
Embedded in applications (in file approach)
Specified as part of conceptual schema and stored in catalog (database approach)
Syntactical Rules (syntax)
Specify how data is represented and stored i.e.
Customer ID should be numeric
Birthdate should be DD/MM/YYY
Semantical Rules
Focuses on the meaning of the data i.e.
Account balance should be >0
Customer ID should be unique
When are integrity rules enforced?
Whenever anything is updated (database)
When applications are accessing files (file based approach)
Concurrency Control
Concurrent execution of database programs allows for good performance
DBMS ACID properties
Ensures database transaction are processed in a reliable way (must be done together)
ATOMIC
CONSISTENT
ISOLATED
DURABLE
Data Security
Authority -> read access vs write access
Accessibility -> whole database or certain parts
Logins and passwords assigned to users (rules stored in catalog)
Backups Abilities
In case of loss of data due to errors (hardware/network/ bugs)
Full or incremental backups
Allows restoration of data after loss or damage
Performance Utilities
Distributing data storage
Tuning indices (faster queries)
Tuning queries (improve application performance)
Optimizing buffer management
Buffer manager
Software layer that is responsible for brining pages from physical disk to main memory as needed
Divides main memory into a collection of pages known as buffer pool
Data Management Importance
Data management should be
- Scalable
- Accessible
- Accurate
- Secure
- Consistent
- Permanent
Purpose of a database
Provides structure to datasets
Table
Describes a formalized repeating list of data
Consists of rows and columns
Unique Key
Allows the identification of a particular row in a table
A table can have more than one unique key
Can accept only one NULL value for a column
Synthetic Key / Surrogate Key
DBMS generated primary key
Primary Key
A column of table which uniquely identifies each role in a table
Only one is allowed in a table
No duplicate or NULL values allowed
Allows connection of different tables in a database
Foreign Key
A column or group of columns in a relational database table that provides a link between data in two tables
It is a column(s) that references a column (most often a primary key) of another table
Junction or Linking Table
Maps two or more tables together by referencing the primary keys of each data table
Used to establish many-to-many relationships
Transactions
A unit of work
Happens in full or not at all
Acid: Atomic
Execute a transaction ->all or nothing property
If transaction fails, changes reverts back to original state
Acid: Consistency
Transaction must take the database from one consistent state to another
Acid : Isolation
Data and the transaction isolated when transaction occurs and until it is completed
Acid : Durability
Robust transaction
SQL
A declarative query language, not a procedural or imperative language
Describe what you want, no need to specify algorithms to retrieve data
DO CRUD
CRUD
Create
Read
Update
Delete
Knowledge
The body of information and facts about a specific subject.
Implies understanding and insight of the information as it applies to an environment (given context)
Data Management
A process that focuses on data collection, storage and retrieval (CRUD)
Advantages of DBMS
Improved data sharing
Improved data security (better enforcement of data security & privacy)
Better data intergration
Minimized data inconsistency
Improved data access
Improved decision making
Increased end-user productivity
Query
A question or task asked by a end user of a database in the form of SQL code
Ad-hoc Query
A spur of he moment question
Data inconsistency
A condition in which different version of the same data yield different results
Database design
The process that yields the description of the database structure
DBMS Function
- Data dictionary management
- Data storage management
- Data transformation and presentation
- Multiuser access control
- Backup and recovery management
- Data Integrity management
7, Database access languages and application programming interfaces
- Database communication interfaces
Entity
A place, person, thing, concept or event for which data can be stored
Attribute
Characteristic of an entity (aka column -> file approach)
Relationship
Association among entities
Constraint
A restriction placed on the data
Database subschema
Portion of the database “seen” by the application programs
Data Manipulation Language
Defines the environment in which data can be managed and is used to work with the data in the database
Data definition Language
Enables the database administrator to define the schema components
Tuple
A table row in a relation (aka row in file approach)
Why Databases
Data is pervasive
Data is ubiquitous
Databases allow storing and sharing of data in an easy and secure way
Islands of Information
Data in scattered data locations
Data Integrity
A condition in which the data in the database complies with all entity and referential integrity constraints
Relational Data Model
Allows designer to foccs on logical representation of the data and its relationships rather than on physical storage details
Composite Key
Key that is composed of more than one attribute
Key Attribute
An attribute that is part of a key
Candidate Key
A minimal superkey, a key that does not contain a subset of attributes
A Candidate Key can be any column or a combination of columns that can qualify as unique key in database.
Full functional dependency
a primary key chose can be a candidate key
Entity integrity
Condition in which each row in the table has its own unique identity
Primary key must contain unique values
No key attribute in the primary key can contain null
Null
Absence of any data or value
Secondary key
A key used strictly for data retrieval purposes
Flags
Special codes implemented by designers to trigger a required response
Superkey
An attribute or attributes that uniquely identify each entity in a table
can uniquely identity each attribute in the row
Select / Restrict
Unary operator
Yields values for all rows found in the table that satisfy a given condition
horizontal
Project
All values for selected attributes in the order which they are requested (vertical)
Intersect
Yields rows that appear in both tables
Table must be union compatible i.e. numeric and numeric
Difference
Yields all rows in one table that are not found in the other table (substracts one table from another)
Natural join
Links tables by selecting only the rows with common values in their common attributes