Midterm Flashcards
What are key differences between databases and file systems?
Data separation and isolation – file systems store in separate files, special programs must be written; databases store all data together and has algorithms to automatically relate and combine data
Data duplication – in files, data is often duplicated, difficult to update; in databases, there is minimal data duplications
Program/data independence – in files, if a file structure is updated, all programs must be updated. In databases , the format is stored as part of the db, and application programs only need to identify data needed, DBMS does access
File compatibility – files may have to be changed to compatible format, DMBS does all the data access not programs
Ability to Represent the User’s Perspective of Data – in files, separate files don’t store relationships, Databases store relationships
Disadvantages to file systems
Application program dependency
Data duplication
Separated and isolated data
Long development times
Increased maintenance requirements
Properties/Definition of a Relational Database
Shared, Self Describing, Organized Collection of Logically-Related Persistent Data
Possible types of data that can be stored in a database
Text
Numbers
Date/Time
Graphics and images
Sound and video
Programs and other objects
What are the four components of a relational database?
Application metadata – data about the structure of forms, reports, and queries
Metadata - data about the structure of the DB
Index – an index aids in searching
User data – data tracked by users
Database Management System (DBMS)
A DBMS is a data storage and retrieval system which permits data to be stored non- redundantly while making it appear to the user as if the data is well integrated.
To implement a database schema a Database Management System must be used.
What are the three DBMS subsystems and their purpose?
Design Tools subsystem = software that is used to create tables, forms, queries, and reports; provides interfaces for built-in programming languages.
Run-Time subsystem = processes the components created w/Design Tools; executes & displays data in forms, etc. & provides access for standard
programming languages (C++, Java, etc.).
DBMS engine = software that facilitates the transfer of data between the database and the design and run-time subsystems: handles requests to access the database tables, does transaction management, record locking, backup & recovery.
Disadvantages of databases
Database Administrators (DBAs) and other specialized personnel
needed
Installation and management costs
Conversion costs
Need for explicit backup and recovery
Political problems / organizational conflict
Summary steps of database development lifecycle
- Create a Data Model from gathered requirements -Entity-Relationship Model
- Transpose Data Model into Relation(s)
- Normalize Relations
- Create the Relational Schema within Database Management System.
(Metadata) - Define forms, queries, reports, menus, if supported within DBMS or
external application programs. - Populate with User Data.
- Maintenance
DBMS independent vs dependent
independent -> HIGH-LEVEL or conceptual level (e.g., E-R diagram)
dependent -> LOW-LEVEL or physical level(e.g., file layouts/structures,
indexing, OS access strategies)
DBMS
Data Modeling
define the logical structure of a database and are the fundamental entities used to introduce abstraction in a DBMS. Data models define how data are connected to each other and how they are processed and stored inside
the system.
Used as a means of communication between database developer and client
What is an attribute?
data item that is used to describe an entity.
show by ovals and connected to the entity rectangle with a line
Store vs derived attr
Stored – actually stored
Derived – the data value is calculated (dotted circle) age (calc by DOB)
What is an entity class?
Something the user wants to track.
Typically nouns.
Represented by a rectangle with a singular name
What is an entity instance?
The actual occurrence of the data
Usually never shown
Simple vs Composite attributes
Simple – composed of one piece of data
composite – an attribute composed of other attributes (like an address)
Not shown in crow’s foot notation
Single vs mutli-valued attr
Single – one single piece of value
Multi (double circle) – an attribute that stores multiple data values (skills = carpentry, plumbing, painting)
What is identifier?
Most entities will have one
One or more attributes that identify an entity instance
If two or more, composite identifier
Can be unique or non-unique
Denoted by underline in Chen Notation
What are the elements of ER Models?
Entity
Attribute
Relationship
Chen vs Crow’s Feet
Crow’s foot notation and Chen notation are both used in entity-relationship diagrams to represent the relationships between entities in a database. The main difference between the two notations lies in the way they represent cardinality and optionality in relationships. In Crow’s foot notation, cardinality and optionality are represented using different symbols such as crow’s feet and dashes. In Chen notation, cardinality and optionality are represented using the use of different line styles and symbols. Both notations serve the same purpose of visually representing the structure of a database, but they differ in their visual representation of cardinality and optionality in relationships.
What are the three types of database relationships?
One-to-one (rare)
one-to-many
many-to-many
What is identifier?
Most entities will have one
One or more attributes that identify an entity instance
If two or more, composite identifier
Can be unique or non-unique
Denoted by underline in Chen Notation, Crow’s -> appears in the identifier area
Composite identifier
If an identifier contains two or more attributes it is called a composite identifier
Steps to creating a database
- Create a Data Model from gathered requirements.
- Transpose Data Model into Relation(s).
- Normalize Relations.
- Implement the relational schema within a Database Management System (DBMS). (Metadata)
- Define forms, queries, reports, menus, etc. (Application Metadata), if supported within DBMS or external application programs.
- Populate database with user data.
- Maintenance
¡ Change happens … expect it … plan for it as best you can!
Transposing Process
- The name of the entity becomes the name of the “relation”
- Attributes of the entity become attributes of the “relation”
(Except for any composite attributes (shows a logical connection) and derived attributes (not
physically stored))
Relational Schema Notation (one-dimensional view)
§ RELATION(attr1, attr2, attr3, …)
§ Example: A(2, 3, 4, 5) - The identifier of the entity becomes the “proposed” primary key of the “relation”, denoted by
underlining
Relation
A two-dimensional structure that holds data pertaining to an object of interest.
Columns are attributes, rows are tuples of information or an entity instance.
Natural vs Surrogate Primary Key
A Surrogate primary key exists only to create a unique identifier.
What is a primary key?
A column, or group of columns, used to identify a row.
Column values, when grouped must be unique
May not contain null
must be minimal
Two categories of SQL statements and what statements fall under that category
Data Definition Language (DDL)
Deals primarily with metadata
CREATE…
DROP…
ALTER
Data Manipulation Language (DML)
Deals primarily with user data
INSERT
SELECT
UPDATE
DELETE
SELECT Statement structure
Shows user data in a table
SELECT * | attributes_to_include
FROM tableName;
NULL Values
Means unknown or nothing
§ Does NOT mean an empty string(‘’) or a string of a space (‘ ‘)
What is normalization, and what part of the development process is normalization?
Representation level.
Normalization: decomposing relations to avoid anomalies when inserting, updating or deleting data.
Also serves to reduce redundancy of the data
Functional Dependency
Used to show a relationship between attributes within a relation
Functional Dependency: If an attribute, (A), determines the value of another attribute, (B), then A functionally determines the value of B.
Notation: A->B
Therefore if we know the value of A, the value of B can be determined
Candidate Key
A candidate key functionally determines all other attributes in the relation
Since each tuple in a relation must be unique, there is a unique candidate key for each
tuple
A relation could have several possible unique candidate keys
Each determinant (simple or composite) that is eligible to serve in the role of primary key
are called candidate keys. You decide to use one of them as the primary key
How to determine primary key
The primary key is an attribute, or group of attributes, that will uniquely identify a tuple from all other tuples in a relation
The primary key functionally determines all other attributes in the relation
There can only be one primary key for a relation
That primary key can be composed of more than one attribute (Composite Primary Key)
Before a “proposed primary key” (Step #2) can be confirmed, any candidate keys must be
determined through an examination of functional dependencies for the relation (Step 3.B)
If there is only one candidate key, it will become the primary key
If there are multiple candidate keys, then one must be selected to become the primary key
Sometimes one candidate key will be “better” than the others, other times, candidates may be “equivalent”
A primary key is AWALYS a candidate key. A candidate key may, or may not, become a
primary key.
What is a database anomaly?
A database anomaly is a fault in a database that can occur when data is stored in a flat database or due to poor planning. Anomalies can occur when data is stored multiple times unnecessarily, or when all data is stored in a single table
Three types of database anomalies
Insertion anomaly
When it is not possible to insert new data into a table due to missing information. For example, if you try to insert a supplier who has no order.
Deletion anomaly
When deleting data unintentionally results in the loss of other important data. For example, if you delete the last single order of a supplier, the data of the supplier will also be lost.
Update anomaly
When updating a single data value requires multiple rows to be modified. For example, to update a supplier, you need to update in each and every purchase order of the supplier.
What is a determinant?
Determinant – the attribute(s) on the left side of a functional dependency
What are composite keys?
Composite Candidate Key: A candidate key made up of
more than one attribute.
Composite Primary Key: A primary key made up of more
than one attribute.
Second Normal Form
A relation is in second normal form if it is in first normal form, and
all of its non-key attributes are dependent on all of the primary
key.
Violation: partial dependency – when you have a non-key attribute that is dependent on only part of a composite pk (primary key).
What is a foreign key, and how do you identify it and a reference statement?
Establishes a connection, or relationship, between relations.
When a primary key of one relation appears in any other relation, it is a foreign key.
Denoted by italics (when typing) or dashed underline (when handwriting)
CONTACT_INFO(contactID) must exist in CONTACT(contactID)
What is referential integrity?
A reference statement enforces referential integrity.
Referential Integrity – the set of values that can be used as values for a foreign key value MUST be in the set of values from the respective primary key.
Example: CONTACT_INFO(contactID) must exist in CONTACT(contactID
Third Normal Form
A relation is in third normal form if it is in 2NF and is free from
transitive dependencies.
Violation: The 2nd half of a Transitive Dependency: a non-key
attribute is functionally dependent on another non-key attribute
A -> B -> C
where A is the primary key, and B and C are not.
BCNF
A relation is in BCNF if, and only if, all determinants are a candidate key (which would include primary key).
Determinant: the attribute(s) on the left side of a functional dependency.
Candidate Key: attribute(s) that could serve as a primary key, but for some reason was not selected to serve as the primary key