Week 3 Notes Flashcards
Redundancy causes three kinds of anomolies
update, insertion, and deletion
Redundancy (redundant storage) stores the same data repeatedly
problem: requires more storage and more labor enter the data
update anomaly
if you change one relational variable in one area, you must change the redundant data everywhere it is stored or data will be inconsistent
deletion anomaly
you lose the last copy of some data when you delete other data
insertion anomaly
you cannot insert some data without inserting other data that may not be applicable
normalization of a relational database
process of developing tables, columns, and the dependencies of columns upon one another so they satisfy the normalization rules
each rule builds upon previous ones
a database is in the third normal form if it satisfies the first three rules
goals of normalization (similar to advantages of a relational database)
minimize redundant data, reducing inconsistent data, avoiding insert/ delete/ and update anomalies, and reducing maintenance
requirements for a table to be relational
- each table has a unique name for an owner in the database
- each column has a unique name, data type, and a domain
- each row in the table has a unique combo of values (data integrity)/ or primary key with a unique combo of values
- order of rows and columns is arbitrary
- the value at each row/column is atomic
- NO repeating columns of similar data
dependencies that cause redundancies
- NO repeating columns of similar data = first normal form (1NF)
- NO partial dependencies = (2NF)
- NO transitive dependencies = 3NF
- NO non-key attributes determine the part of the key = boyce/ codd normal form (BCNF)
- NO non-trivial multi-valued dependencies = 4NF
- NO join dependencies = 5NF
- NO remaining dependencies = domain/ key normal form (DKNF)
candidate key
minimal set of attributes which can uniquely identify a tuple
- value unique and not null for every tuple
- there can be more than one candidate key in a relation
super key
set of attributes which can uniquely identify a tuple
- adding zero or more attributes to candidate key generates super key
- candidate is super key but vice versa not true
alternate key
candidate key other than the primary key
surrogate keys
artificial key which uniquely identifies each record
- generated right before a record is inserted into a table
Normalization rule 1
no repeating groups with similar info appear in a table
- procedure: identify the repeating group, remove the repeating group, and move the repeating group into a new table (w copy of column that identifies that group)
normalization rule 2
all non-primary-key columns depend on all parts of the primary key
normalization rule 3
all non-primary columns depend only on the primary key
boyce-codd normal form (BCNF)
table can be in 3NF, but not BCNF
- requirements: 3NF form & for any dependencies, A -> B, A should be a super key
4NF
BCNF form and does not have multi-valued dependency
higher levels of normalization vs query performance
common to violate normalization rules to improve performance or convenience
- typically design is in 3NF
denormalization from 3NF to 2NF only if
- performance with realistic test data is unsatisfactory
- performance cannot be made satisfactory with hardware improvements
- performances cannot be made satisfactory by software improvements
- database designer and builder understand tradeoff (improved performance v losses)
data dictionary
table within a database that lists metadata for all the tables within the database (also may include constraints)
- can also be used as planning tool by database engineer
system catalog
data dictionary may be derived from SC.
- detailed system-created database whose tables store database characteristics
can describe all objects in the database
sometimes, SC is the DD
retrieving or selecting data affects neither raw or metadata stored but
may change format or values displayed (if data is manipulated)