databases Flashcards
top down database design
create ER diagrams (graphical description): start with a set of system requirements and identify entities and attributes then construct relational data model i.e. tables for entities
bottom up database design
start with initial tables for entities and their attributes –> redesign in a “better” way: trickier for larger databases so need a formalization of process
purpose of normalization
- relations represent real world entities
- single valued columns
- avoid redundancy
- easier to update and maintain correctly without anomalies
no redundancy
each data item is stored only once: -minimises space required, - simplifies maintenance
consequence of redundancy
- takes up unnecessary space
- when modifying data, have to modify in different places
- risk of inconsistencies
exception for redundancy
foreign keys: act as pointers
what causes redundancy
- set valued attributes –> multiple rows in corresponding table. e.g. more than one attribute for a particular entry in the table
- dependency e.g. postcode and town are dependent on each other
define redundancy
repeating data in multiple different locations
modification anomaly
failure to maintain all existing instances of a specific value
deletion anomaly
losing other values as a side effect of deleting data. e.g. deleting staff member and deleting info of their workplace simultaneously if unnormalised
insertion anomaly
when adding more data items, much more irrelevant data needs to be added. adding rows forces us to add info about other entities –> can lead to inconsistency
e.g. adding details of new surgery, with no staff, add null to staff fields
overcoming redundancy
schema refinement (decomposition): use 2+ relations to store the original data. can be done manually for smaller tables but formalisation needed for larger DB
functional dependencies specify…
- specify which are candidate, primary and foreign keys
- specify which attributes to combine in the new tables
simple key
key consists of only one attribute
composite key
key consists of several attributes
candidate key
minimal set of attributes whose values uniquely identify tuples.
functionally determines all attributes in a relation
primary key
the candidate key selected to identify rows uniquely within a table
what is functional dependency
describes relationship between two attributes in the same relation
if B is functionally dependent on A
Represent as A -> B
If we know attribute values for A, then we know the unique attribute values of B.
Each value of A is associated with exactly one value of B
determinant
set of attributes on the left hand side
dependent
set of attributes on the right hand side
full functional dependency
B is not functionally dependent on any proper subset of A, i.e. B is functionally dependent on ALL the primary key if composite key, not just individual parts
partial functional dependency
B remains functionally dependent on at least one proper subset of A
transitive functional dependency
if A->B and B->C functional dependencies exist, then the functional dependency A -> C also exists = transitive
how to determine functional dependencies
either -obvious/common sense
or -require discussion/specification from customers
closure, F+ (+ is raised), of a set of functional dependencies, F
set of all functional dependencies implied by dependencies in F. inference rules required to compute this
Armstrong’s axioms (complete and sound)
- reflexivity (if B is a subset of A, then A -> B)
- augmentation (A -> B then A, C -> B, C)
- transitivity (if A ->B and B-> C then A->C)
what does it mean by inference rules being complete
given a set X functional dependencies, all dependencies implied by X can be derived from X using these rules
what does it mean by inference rules being sound
no additional functional dependencies (not implied by X) can be derived from these rules
further rules derived from armstrong’s axioms
- decomposition: if A -> B, C then A -> B and A -> C
- union: if A -> B and A -> C then A -> B, C
- composition: if A -> B and C -> D then A, C -> B, D
proof of union rule using armstrong’s axioms
A-> B: augmentation with C: A, C -> B, C
A -> C: augmentation with A: A, A -> A, C ( = A -> A,C)
transitivity from above 2 dependencies: A -> B, C
pseudocode for computing the closure F+
F+ = F (initialisation)
repeat until F+ doesn’t change any more:
apply rules of reflexivity and augmentation to each functional dependency, f in set F+, and add these new func. dep. to F+.
for each f1, f2: if imply a func dep f3 using transitivity then add to F+
F = set of transitive dependencies
A = set of attributes in a relation
what is A+
set of all attributes that can be implied by the attributes of A using functional dependencies from F
how to compute A+
similarly to F+, start with functional dependencies of F, with attributes A on the LHS and repeatedly apply Armstrongs axioms
candidate key in terms of relational dependencies
minimal set of attributes such that A+ (under F+) includes/ which functionally determine all attributes in a relation
1NF
- no repeating groups i.e. an attribute that occurs with multiple values for a single occurrence of the primary key
- no identical rows
why not repeat columns horizontally to get into 1NF
- waste of space, not all column spaces used
- harder to query and reference specific columns
- need a fixed upper limit on number of repetitions
why not one long string with all attributes for 1NF
-harder to query
1NF: one table solution
-repeat data for each attribute so each belongs to one entry, but redundant data (i.e. fill in blanks/flatten the table)
1NF: two table solution
-place repeating data in a separate relation, with original primary key as a foreign key. iterate until no repeated groups remain
2NF
- 1NF
- no partial functional dependencies: every non-key attribute is dependent on the whole of the composite primary key
How to place in 2NF
-remove partially dependent attributes and place in separate relation with the copy of their determinant
3NF
- 2NF
- no transitive functional dependencies: no non-key attribute is transitively dependent on the primary key: all attributes are dependent on the key, the whole key and nothing but the key
How to place in 3NF
-remove transitively dependent attributes and place in new relation. take the attributes of their determinant as the primary key in the new table
how to find primary key
- intuitive observations from data
- computing functional dependency closure
how to depict functional dependencies
-dependency diagram
Limitations of file based approach
- Data duplication in different programs: different values/formats/waste of space
- Incompatibility: programs written in different languages- difficult to access others’ files
- Data dependencies: structure of the file is defined in the code of the program
- Separation and isolation of data: programs maintain their own data and users may be unaware of useful data in other programs
- Limited queries: each program has a defined purpose, for different queries, more application programs have to be designed
- Hard to recover when crashes occur
- Inefficient searching large data sets
- Many users: controlling simultaneous access = locking+ can’t hide sensitive data = inefficient
- Not synchronised: inconsistencies+lost updates
reason for limitations of file based approach
- definition of data is embedded in the application program (instead of stored separately and independently)
- no central control over the access/manipulation of data
database
a large collection of logically related data, designed to meet the needs of an organisation:
- single repository of data
- minimum duplication
- large databases have a data dictionary (metadata repository i.e. data describing the data)
DBMS
a software system that enables the user to define/create/maintain/control access to the DB
basic features of DBMS
- DDL (data definition language): defines the database- specify the structure, data types + constraints
- DML (data manipulation language): allows users to insert/delete/update/retrieve data from the database. Query language is the part of this that retrieves data. unlimited queries (adv.) Efficiency: good DBMS can answer SQL queries quickly