Databases Flashcards
Characteristics of a database
Data is logically related (entities with attributes and relations to other entities)
Self-describing (schema/metadata)
Data abstraction (separation of internal and external representation)
3-level way of representing a database management system
External level: the user’s view
Internal level: The technical part
Conceptual level: How the two interact
Advantages and disadvantages of using a DBMS
Advantages: control over data redundancy, consistency, standards, improved accessibility, improved accessibility and productivity (can extract more information from the same amount of data), improved maintenance, scaling
Disadvantages: complexity, size, performance (sometimes worse than single-purpose systems), single point of failure
How does the relational data model work
A relation is a table with rows and columns
A column is called an attribute (or field)
A row is called a record
Two tables are often related by a third table (e.g. Students and Courses are related by a StudentCourses table)
What is Entity-Relationship Modelling?
A way of communicating databases graphically: non-technical, free of ambiguities
Examples include crow’s feet notation and UML
Specifies attributes, relationships, constraints
What is normalisation?
The process of reducing data redundancy and minimising the number of attributes across all tables. This helps avoid update anomalies and saves space.
Creating a table with SQL
CREATE TABLE Staff (staffNo VARCHAR(5), LName VARCHAR(15), salary DECIMAL(7,2));
Inserting records into a table with SQL
INSERT INTO Staff VALUES (“SG16”, “Brown”, 8300);
Querying a table with SQL
SELECT staffNo, LName, Salary
FROM Staff
WHERE salary > 10000;
Purpose of three-level architecture
Realises data abstraction: Users can access same data but have different customised views
Users can change views without affecting other users
Internal structure can be changed without affecting users
External level
Users’ view of the database
Multiple different views of the same data (e.g. 20 Jan ‘22 vs 2022-01-20)
Adapted to specific needs
Different entities, attributes, relationships
Possibly derived/calculated/combined (e.g. getting age from date of birth)
Conceptual level
Community view of the database, shared by all users
Describes what data is stored (entities, attributes, relationships, constraints, security/integrity information)
Internal level
Physical representation of the DB
Describes how the data is stored to achieve optimal runtime performance and storage utilisation
Interface to operating system
Compression and encryption
DB schema
The description of the database, the result of the design process
Should not change
Multiple external subschemata (different views)
One conceptual schema (all entitie/attritutes/relationships)
One internal schema (low-level description, storage, indexes etc)
DB instance
The data in the database at a particular point in time, changes whenever data is edited
Logical data independence
External schemata remain the same if we change the logical structure on the conceptual level
Physical data independence
Conceptual schema remains the same if we change the internal schema (data structures, algorithms)
Conceptual design phase
Construct a first, high-level model of the data
Uses entity-relationship modelling
Identify the entities, attributes, relationships and constraints
Based on users’ requirements
Independent of any physical considerations
Logical design phase
Construct the relational data model (the actual tables, e.g. Students, Courses, StudentCourses)
Normalisation takes place
Physical design
Describe the implementation of the logical design (storage structures, access methods, security)
Optimise for performance
Relation
Table with rows and columns (e.g. Student)
Attribute
Named column in a table (e.g. surname)
Field
Tuple
A row in the table (e.g. S002, Jane, Doe)
Record
Domain of an attribute
Set of allowed values (e.g. positive integers, strings starting with capital letter)
Cell
Value of specific attribute in tuple
Degree
Number of attributes/columns
Cardinality
Number of tuples (rows)
When is a relation normalised?
Every cell has exactly one value, no repetition of identical tuples (rows)
NULL
Absence of a value
Represents an attribute value that is currently unknown or not applicable
Has to be handled in a special way
What is a key
A set of attributes whose values uniquely determine a tuple
Candidate key: Candidate Key is a collection of attributes or a single attribute (single column) that has the ability to uniquely identify records in the table (fully functionally determines all other attributes)
Primary key: Like a candidate key but there’s only one of them
Alternate key: Any candidate key which isn’t the primary key
Simple key: A key that consists of a single attribute
Composite key: A key that consists of multiple attributes
Foreign key
A set of attributes in a table that refers to the primary key of another table (e.g. mentorNo in Student)
Integrity constraints
Domain constraints: attribute values have to lie within their respective domain
Entity integrity: Attributes of the primary key cannot be NULL
Referential integrity: Foreign keys must match some primary key or be NULL
Prevents deleting a tuple if it’s referenced somewhere else
What is a view?
A virtual relation (not stored physically)
Derived from one or more base relations
Computed upon request
Main use: showing customised information, with dynamic quantities (age from date of birth)
ER modelling vs Relational Data Model
ER modelling: Object-based, visual representation, used to understand data requirements, high-level and non-technical, intuitive
Relational Data Model: Record-based, mathematically sound, used to define database schema, low-level technical, not very intuitive
Entity in ER modelling
A thing that should be explicitly represented (e.g. a student, a course, a teacher), represented as a labelled rectangle
Relationships in ER modelling
A named association between two or more entity types (e.g. student attends course, person is a citizen of country, represented as a labelled line, triangle represents reading direction
Constraints in ER modelling
min..max
* = no constraint
“Teacher 1..1 is head of 0..1 School” means that a teacher can be head of 0-1 schools, and all schools must have 1 head
Cardinality of a..b <assoc.> c..d is b:d
1:1 = one-to-one, 1:* = one-to-many, *.* = many-to-many. a and c indicate participation</assoc.>
Attributes in ER modelling
Properties of an entity type
Represented as the lower part of an entity rectangle, primary key underlined
Names use camelCaseNotation
Types of attribute
Simple: one component
Composite: multiple components, indented
Single-valued: Only one value for each entity (e.g. name)
Multi-valued: multiple values for each entity (e.g. phone number), range given as [min..max]
Derived: Computed from other attributes (e.g. age), prefixed with “/”
How to turn an ER model into a Relational Data Model
1:1 relationship: Add foreign key to (either) mandatory entity, if both are mandatory then combine into one relation
1:* relationship: Add foreign key to “many” side (e.g. teacherId as foreign key in Course)
Handling many-to-many relationships (:)
Create separate relation with two foreign keys (e.g. Attendance relation containing studentId and courseId). Connect it with two one-to-many relationships.
Entity1 A:B action C:D Entity2 => Entity1 1:1 action C:D EntityLink A:B 1:1 Entity2
Steps for constructing an ER model
- Identify entities and relationships from real world scenario
- Draw entities and connect them
- Add constraints
- Resolve many-to-many relationships
- Add attributes and define primary keys
Top-down vs Bottom-up approach to designing databases
Top-down: Entity-Relationship model
Start from data requirements, use a graphical description of the database, identify entities in the real world
Bottom-up: Start with examples of actual data in tables, analyse relationships between attributes and redesign the tables in a better way
What does it mean to redesign tables in a “better way”?
No redundancy: Wastes storage space, may lead to inconsistencies
Examples of data redundancy
Multi-valued attributes which need to be stored in separate rows
Dependencies (e.g. collegeAddress is dependent on collegeName)
Examples of anomalies
Insertion anomaly: New records need to duplicate existing data
Modification anomaly: Modifying only one value leads to inconsistencies
Deletion anomaly: Deleting one piece of data and losing other data in the process
Decomposing a table
Avoids modification anomalies
Student(id, forename, surname, collegeName, collegeAddress) => Student(id, forename, surname, college) + College(college, collegeAddress)
college in Student is a foreign key
Functional dependencies
If A and B are two sets of attributes in a relation, and each value of A is associated with exactly one value of B, then B is functionally dependent on A and A determines B (A => B)
Informally this means that if you know the value(s) of A then you know the value(s) of B, and A is the determinant of B
Types of functional dependency
Full functional dependency: B is functionally dependent on A and not on any smaller subset of A
Partial functional dependency: B is functionally dependent on A and at least one smaller subset of A
Transitive functional dependency: If A => B and B => C then C is transitively dependent on A via B
Closure of a set of dependencies
Given a set of functional dependencies F, its closure F+ is all the functional dependencies that are implied by the dependencies in F
Armstrong’s axioms for computing F+ from F
Reflexivity: if B is a subset of A, then A => B
(id, forename, surname) => (forename, surname)
Augmentation: if A => B, then A+C => B+C
(id) => (college)
therefore
(id, forename) => (college, forename)
Transitivity: if A => B and B => C, then A => C
(id) => (college) and (college) => (collegeAddress)
therefore
(id) => (collegeAddress)
(De)composition: if A => B,C then A => B and A => C and vice versa
Composition: if A=>B and C=>D, then A,C => B,D
Finding the closure F+ of F
Start with all dependencies in F
Repeatedly apply Armstrong’s axioms
Add any new dependencies discovered along the way
1st Normal Form
No repeating groups: No attributes or groups of attributes with multiple values for a single occurrence of the primary key (i.e. multi-valued variable)
If there is a fixed number of repetitions, you can use separate attributes (e.g. phoneType + phoneNumber => mobilePhone + workPhone)
General solution for converting to 1st normal form
Flattening: Fill empty cells with repeating values
Create separate relation
2nd Normal Form (only relevant for relations with composite keys)
A table is in 2nd Normal Form if it’s in 1NF and has no partial functional dependencies: every non-key attribute is dependent on the whole primary key (no partially dependent attributes)
Converting to 2nd normal form
Remove the partially dependent attributes and put then in a new relation along with a copy of their determinant
3rd Normal Form
No transitive dependencies, meaning every dependency must be fully/directly dependent on the primary key(s)
Converting to 3rd Normal Form
Remove the transitively dependent attributes and put them in a new relation along with a copy of their determinant
Two components of SQL
Data Definition Language: Defines the schema for each relation, defines the domain of each attribute, specifies integrity constraints (e.g. primary/foreign keys)
Data Manipulation Language: Inserts/updates/deletes/retrieves data from the database. The query language is the part of the DML that involves retrieval
SELECT Statement syntax
SELECT (DISTINCT) <List> FROM <table name>
Optional:
WHERE conditions
GROUP BY columnList
HAVING conditions
ORDER BY columnList (ASC/DESC)</List>
SELECT Statement
FROM = specifies the table(s) to be used
WHERE = filters the rows according to a condition
GROUP BY = forms groups of rows with the same column value
HAVING = filters groups according to a condition
SELECT = specifies which columns should appear in the output
ORDER BY = specifies order of output
“SELECT DISTINCT salary” removes all but one of the rows which share the same salary
SELECT example
SELECT staffNo, fName, lName, europoorSalary/12 AS monthlySalary
FROM Staff
WHERE europoorSalary/12 < 5000
(can’t reuse monthlySalary alias)
SELECT example with derived query
SELECT staffNo, fName, lName, salary AS yearlySalary, (SELECT yearlySalary)/12 AS monthlySalary
FROM Staff
WHERE salary/12< 1500
(SELECT yearlySalary) is a derived query that allows the alias to be reused
Constructing conditions
=, <, >, <=, =>, <> (not equal)
IN (setItem1, setItem2, …)
LIKE ‘pattern’
IS (NOT) NULL
x BETWEEN y AND z (inclusive)
AND, OR, NOT
Using the IN operator
SELECT fName, lName, salary, position
FROM Staff
WHERE position IN (‘Manager’, ‘Supervisor’)
The last line is equivalent to:
WHERE position=’Manager’ OR position=’Supervisor’
Using the LIKE operator (pattern matching)
SQL has two special pattern-matching symbols: % (equivalent to * in cmd.exe), and _ (equivalent to a single arbitrary character)
LIKE ‘H%’ means that the first character is H, but the rest can be anything
LIKE ‘H___’ means there are 4 characters and the first is H
NOT LIKE ‘%e’ means the last character is not “e”
Example: SELECT fName, lName FROM Staff WHERE fName LIKE ‘J%’
Combining conditions
SELECT fName, lName, position, salary
FROM Staff
WHERE position NOT IN (‘Manager’, ‘Supervisor’) AND (fName LIKE ‘J%’ OR salary > 10000)
ORDER BY example
SELECT fName, lName, gender
FROM Staff
ORDER BY gender ASC, fName DESC
Aggregate functions
Operates on a single column and returns a single value:
SUM(column)
AVG(column)
MIN(column)
MAX(column)
COUNT(column) (excludes NULL)
COUNT(*) (number of rows in a table, including NULL)
Example: SELECT SUM(salary)/COUNT(salary) FROM Staff
GROUP BY clause
GROUP BY is used in combination with aggregate functions
It groups rows by one or more attributes and then applies aggregate functions to each group separately
Example:
SELECT position, AVG(salary)
FROM Staff
GROUP BY position
ORDER BY AVG(salary) DESC
The “GROUP BY position” makes the AVG(salary) column display AVG(SELECT salary FROM Staff WHERE position=’BBCbullorwhatever’)
Inner join
a JOIN b ON <matching_criterion>
Merges a and b and excludes all rows which do not fulfill <matching_criterion></matching_criterion></matching_criterion>
SELECT fName, lName, street
FROM Staff JOIN Branch ON Staff.branchNo = Branch.branchNo
Lists the names of staff and the street of the branch they work at (merges Staff and Branch and excludes all rows where the branchNo values are not the same)
Inner join example with aliases
SELECT A.fName, A.lName, A.salary, B.fName, B.lName, B.salary
FROM Staff A JOIN Staff B
ON B.salary <= A.salary and B.staffNo <> A.staffNo
WHERE A.fName LIKE ‘A%’
Outer join
a LEFT JOIN b ON: Includes missing records from the left table
a RIGHT JOIN b ON: Includes missing records from the right table
a FULL JOIN b ON: Includes missing records from both tables
Subqueries
Allow you to reuse the result of one query in another query
SELECT staffNo, fName, lName, position
FROM Staff
WHERE branchNo =
( SELECT branchNo
FROM Branch
WHERE street = ‘163 Main St’ )
The inner select only returns a single value (‘B003’) so this is a single-value subquery
Multi-valued subquery
SELECT staffNo, fName, lName, position, salary
FROM Staff
WHERE salary > SOME (SELECT salary FROM Staff WHERE branchNo = ‘B003’)
This displays all staff whose salary is higher than the salary of at least one member of staff at branch B003
SOME can also be replaced with “ALL”
INSERT syntax
INSERT INTO TableName [(columnList)]
VALUES (valueList)
Example:
INSERT INTO Staff(staffNo, fName, lName, salary)
VALUES (‘SG16’, ‘Alan’, ‘Brown’, 10000)
UPDATE syntax
UPDATE TableName
SET columnName=dataValue
WHERE <condition></condition>
Example:
UPDATE Staff
SET salary = salary*1.03
DELETE syntax
DELETE FROM TableName
WHERE <condition></condition>
Example:
DELETE FROM Staff
WHERE branchNo=’B003’
Deleting a whole table is done with DROP TABLE Staff
Domain types in SQL
CHAR(n): character string of fixed length n
VARCHAR(n): character string of variable length at most n
BIT(n): bit string of fixed length n
INTEGER
SMALLINT: -32768 <= x < 32768
NUMERIC(p,d): A decimal number with at most p digits and d decimal digits
Creating a domain type
CREATE DOMAIN Postcode AS <BaseType> CHECK (<condition>)</condition></BaseType>
CREATE TABLE syntax
CREATE TABLE TableName(Attribute Domain [integrity and referential constraints]
Constraints:
PRIMARY KEY (keyName)
FOREIGN KEY (keyName) REFERENCES (otherTable)
Specifying what to do with a foreign key when the corresponding primary key is updated/deleted:
ON (UPDATE/DELETE) (CASCADE/SET NULL/SET DEFAULT/NO ACTION)
CASCADE = update foreign key/delete tuple
DROP TABLE syntax
DROP TABLE TableName RESTRICT: The drop is rejected if there are other objects whose existence depends on TableName
DROP TABLE TableName CASCADE: The drop is forced and all dependent objects are dropped recursively (default option)
ALTER TABLE syntax
Changes the schema of a relation
Add new attributes: ALTER TABLE Person ADD salary INTEGER
Remove attributes: ALTER TABLE Person DROP married CASCADE
Change attribute characteristics: ALTER TABLE Person ALTER gender SET DEFAULT ‘F’
Defining views
A view is a table that is derived from one or more other tables and is computed on demand
Example:
CREATE VIEW Developers AS
SELECT name, project
FROM Employee
WHERE department = ‘Development’
Types of database language
Declarative (like SQL): Specifies what data to retrieve
Procedural: Specifies how to retrieve data
Relational calculus and algebra
Relations are considered as sets of elements. An element is a tuple of attribute values, and each row in a table specifies one element of the set
Relational calculus (declarative): Uses set theoretic expressions (∀x and ∃y for example) to define new sets based on existing ones
Relational algebra (procedural): Uses operators to construct a new set from one or more existing ones
These two can be used to define/create the same sets. For every calculus expression there are one or more algebraic expressions.
Relational calculus
Predicate: A function that returns true/false when arguments are given
Proposition: An expression made of up of AND, OR, NOT etc.
Example: {x|P(x) ^ Q(x)} means the set of all x such that both P(x) AND Q(x) are true
OR = v, NOT = ¬
Tuple relational calculus
Variables’ values are tuples (entries)
To specify that tuple s should be in relation Staff, use as a predicate: Staff(s)
Use s.element to access named elements in tuples
∀ = for all, ∃ = there exists
{t.A|R(t) ^ t.B > 0} means “Select all A from relation R(A, B) for which the value of B is greater than 0”
Domain relational calculus
Variables’ values are attribute values (cells)
Example: {a | (∃b)(R(a, b) ∧ b > 0)} means select all A from the relation R(a, b) for which the value of B is greater than 0
Relational algebra
Procedural: specifies operations to construct a new set from existing ones
Unary operations: selection, projection, renaming
Binary operations: union, set difference, cartesian product
Selection in relational algebra
σ_condition_(R)
SQL equivalent: SELECT * FROM R WHERE <condition></condition>
Projection in relational algebra
Π_col1, col2, …_(R)
Selects the specified subset of attributes/columns from R (without duplicates)
SQL equivalent: SELECT DISTINCT col1, col2, …, FROM R
Union in relational algebra
A ∪ B
Selects tuples that are in A or B
SQL equivalent: SELECT … UNION SELECT …
Example: Π_city_(Branch) ∪ Π_city_(Property)
This lists all cities where there is either a branch or a property
Set difference in relational algebra
A - B
Selects tuples that are in A but not in B
SQL equivalent: SELECT … MINUS/EXCEPT SELECT …
Example: Π_city_(Property) – Π_city_(Branch)
This lists all cities where there is a property but not a branch
Intersection
A ∩ B
Selects tuples that are in A and in B
SQL equivalent: SELECT … INTERSECT SELECT …
Equivalent to A - (A - B) or B - (B - A)
Union compatibility
To compute A (∪/∩/-) B, the schemata of A and B must match: same number of attributes and each pair of matching attributes must have the same domain
Cartesian product in relational algebra
A x B
Generates all possible combinations of tuples in A and B by concatenating them, resulting in Ra * Rb tuples with Ca + Cb attributes
No compatibility requirements
Attributes with the same name are prefixed with the relation name
Corresponding SQL:
SELECT * FROM A, B
or
SELECT * FROM A CROSS JOIN B
Rename in relational algebra
ρ_Y(A, B, C, …)_(X) returns X renamed as Y, with attributes renamed to A, B, C, …
Division in relational algebra
A ÷ B
Only keeps tuples from A that exist in all possible combinations with B
Find combinations of A x B that are not in A
Project to attributes that are only in A
Remove them from A
Joining in relational algebra
A [INNER] JOIN B ON <criterion>
A_⨝condition_B
Equijoin: contains an = sign
Natural join: equijoin over all common attributes
Semijoin: inner join and project to attributes of the left relation</criterion>
LEFT OUTER JOIN = ⟕
RIGHT OUTER JOIN = ⟖
FULL OUTER JOIN = ⟗