Exam 1 Flashcards
Functions of a Database Management System (DBMS)
Enables users to create and maintain a database. Allows for:
- Defining: specifying data types, structures and constraints
- Constructing: storing the data on some storage medium controlled by the DBMS
- Manipulating: querying the database to retrieve data, updating to reflect changes in the miniworld, generating reports
- Sharing:allows multiple users or programs access
Query vs. transaction
Query causes data to be retrieved
Transaction causes data to be read or written
Other important functions of a DBMS
Protection: System and security protection
Maintenance: Allow the system to evolve as requirements change
Database System
Database + DBMS
Database Structure
File = Collection of records
Data records = Collection of data elements
Data elements = the different pieces of data
Data types = the specific type of each piece
Database Approach vs. File-Processing Approach
File-processing refers to old practice of each user or application having its own data.
Database approach is better because:
- Self-describing nature of a database system
- Insulation between programs and data, and data abstraction (i.e. program-data independence)
- Support of multiple views of the data
- Sharing of data and multiuser transaction processing
Meta-data
The complete definition of database structure and constraints.
Structure of each file, the type and storage format for each data item, various constraints on the data.
Self-describing data
Data stored as item names and values in one structure. Used in NOSQL
Program-Data Independence
The structure of the data files is stored in the DBMS catalog, separately from the access programs (so a change to the data structure doesn’t break the access programs functioning)
Program-Operation Independence
An operation = method or function
An “interface” to an operation = the name of the method and parameters it takes.
The “implementation” of the operation = The part that takes action. Because the operation can change like a black box, you can change the workings but use the same interface. This is basically data-abstraction.
Data Abstraction allows Program-Operation Independence.
What allows program-data independence and program-operation independence?
Data Abstraction
Conceptual Representation (i.e. how the user might imagine the data or operations to work) ignores the implementation details required to actually perform operations.
Data Model
The conceptual representation of the data that hides implementation details
Catalog
Data might be stored by length and byte references.
Catalog turns that into the rows and columns we understand.
Online Transaction Processing (OLTP)
Ensure that, booking airline seats, say, an agent can only access the same seat on at a time.
Is basically concurrency control software.
Transactions and their 2 properties
Accessing or updating the database.
Must execute correctly while thousands of other transactions take place.
“Isolation” ensures each transaction holds, separate from others.
“Atomicity” ensures either all operations in the transaction takes place or none do.
Redundancy - problems with
1) Have to perform updates on multiple systems
2) Wastes extra storage space
3) “Inconsistency” - two pieces of data that should be the same are different, because one was updated differently from the other
Data Normalization
Norm of storing a giving piece of information (like name) in only one place in the database.
Controlled Redundance
“Denormalization” of data (put in multiple places) so that it can be retrieved all from one file, for example.
Can control against inconsistency by running checks the data hasn’t drifted.
Database Schema
Description of the database.
Specified during design, it is expected not to change.
Steps of database design
1) Collect and analyse requirements - Getting user-defined operations/transactions that will be applied to the database
2) Create a conceptual schema - concise description of the data requirements
3) Logical design/data model mapping - actual implementation of the database
4) Physical design - internal storage structures, file organizations indexes, access paths and physical design parameters
Entity
A thing or object with real world and independent existence.
*Attributes* describe properties of an entity. e.g. EMPLOYE entity maybe described by name, age, address, etc.
Types of Entity Attributes
Simple (or atomic) = can’t be broken down vs…
Composite = Can be divided into parts (e.g. parts of an address)
Single Valued = age (only one value at a time) vs…
Multi-valued = colors on a car (i.e. can hav multiple values at once)
Stored (birth date) vs…
Derived (age derived by birth date)
Entity Type vs. Entity Set/Collection
Entity type defines (is the name of) a collection of entities with the same attributes (the intension).
Entity set/collection is a set of actual entities that are part of the same entity type (the extension).
Key Attribute
The attributes that, together, establish the uniqueness constraint (i.e. that an entity is the one they say it is).
No concept of a “primary key” like in relational database (even though these keys can perform that role).
Meta data
The structure and constraint info that is stored in the catalog so the dbms can refer to the schema
Intension vs extension
Intension is Schema.
Extension is DB state.
Three-Schema architecture
Internal Level or Schema = physical storage and access paths.
Conceptual Level = structure of DB for users (e.g. columns and types).
View or external level = hides part of db so user only sees what’s needed.
The DBMS does “mapping’s” between these layers
Logical data independence
The ability to change the conceptual Schema (i.e. the columns and types) without changing the view or application schema.
Hard, because the application usually is referring to things as defined in the conceptual Schema.
Physical Data Independence
Capacity to change the internal Schema (e.g. how the data is laid out on the drive) without changing the conceptual schema.
Easier because data abstraction allows conceptual schema to stay the same. If conceptual schema is the same, an application doesn’t have to change.
How to achieve logical and physical data independence?
The levels don’t have to change, just the *mappings*
Data Definition Language (DDL)
If no clear levels, DDL defines conceptual and internal schemes.
If there are clear levels, it defines conceptual.
Impedance Mismatch Problem
Data structure of the DBMS not matching that of the programming language
Steps of database design
1) Collect and analyse requirements - Getting user-defined operations/transactions that will be applied to the database
2) Create a conceptual schema - concise description of the data requirements
3) Logical design/data model mapping - actual implementation of the database
4) Physical design - internal storage structures, file organizations indexes, access paths and physical design parameters
Functional Data Model
When two entities represent their relationship by having each other as attributes.
For example: DEPARTMENT might have employee and EMPLOYEE might have department.
These have to be inverses of each other.
Recursive or self-referencing relationships
When the same entity is on both sides of a relationship. For example: SUPERVISION has two employee entities involved: one for employee and one for supervisor
Add elements on diagram:
1) Add arrows to show direction of the relationship
2) Add roles names to name what role each party plays in that relationship
Cardinality Ratios
The constraint on a binary relationship that mimics the “miniworld” constraint. For example: in WORKSFOR the ratio of DEPARTMENT:EMPLOYEE of 1:N means that an employee can only work for one department but many employees can work for a department.
It is always the max number.
Participation Constraint or Minimum cardinality constraint
Specifies the minimum number of relationships an entity has to participate in.
When an entity has to have AT LEAST some relationship to another entity.
For example, every employee should be under a department.
If it is a full set of employees, it is “total participation” if it is only some (e.g. in MANAGES only some employees manage the department) it is called partial.
This is depicted by a double or bolded relationship line.
Structural Constraints
The cardinality ratio (i.e. the max relationship) or participation constraints (min relationships) are, collectively, called structural constraints
How to name entities
Entities are singular, not plural.
Entities and relationships are all caps.
Attributes start with a cap.
Roles are lower case.
Relationships read left to right, top to bottom.
What do these shapes mean (see image)?

Top to bottom:
Entity
Weak Entity
Relationship
Identifying Relationship
What do these shapes mean (see image)?

Attribute
Key Attribute
Multivalued Attribute
Composite Attribute
Derived Attribute
What do these shapes mean (see image)?

Total Participation of E2 in R
Cardinatlity Ration 1:N for E1 : E2 in R
Structural Constraint (min, max) on Participation of E in R
Weak Entity vs. Strong Entity types
A weak entity types don’t have a key attribute of its own
A regular or strong entity type does have their own key attribute
A weak entity will often have a partial key which uniquely identifies them WITHIN THEIR OWNER entity
Identifying a Weak Entity Type
Entities belonging to a weak entity type are identified by being related to another entity of a strong entity type + having their own partial key attribute
For example, DEPENDENT entity might be a DEPENDENT_OF an EMPLOYEE. Two DEPENDENTs might have the same info. But they are still separate entities because the exist only in relation to EMPLOYEE.
Weak entities are depicted by double lined boxes and their “identifying relationship” is depcited as a double lined diamond.
Given EMPLOYEE will likely only have one DEPENDENT with the same First_name, First_name is a partial key
Weak Entity Type relationships
The Strong Entity Type which owns the Weak Entity Type is called the “identifying” or “owner” entity type.
The two entities have a “identifying relationship”.
The weak entity type has “total participation constraint (existence dependency)” with respect to its identifying relationship.
ANSI/SPARC Diagram

Conceptual Schema
Describes all conceptually relevant, general, time invariant structural aspects of reality (e.g. the column names you don’t expect to change once you start gathering data).
Does not describe data representation or physical organisation and access.
External Schema
Describes a parts of the information in the conceptual schema in a form convenient to a particular user group’s view.
It is derived from the conceptual schema.
Basically creates a new view from a query.
Internal Schema
Describes how the info in the conceptual schema is physically organised for optimal performance.
For example, for fast retrieval, some data might be best represented as a trie whereas other data might be stored as a tree.
Physical Data Indepedence
Measure of how much the internal schema can change without affecting the applications.
Basically amounts to letting you change the storage of an entity go from a trie to a tree, say, without having to notify the application.
This arises because the conceptual schema doesn’t change.
Logical Data Independance
Measure of how much you can change the conceptual schema, without changing the applications that run on the database.
Basically can you change the conceptual schema without affecting the application… can but hard because a view based on a query would have to change if the conceptual schema changes.
Much harder than physical data independence.
System Metadata = critical for the DBMS
Where data came from
How data were changed
How data are stored
How data are mapped
Who owns data
Who can access data
Data usage history
Data usage statistics
Business Metadata = critical in a data warehouse
What data are available
Where data are located
What the data mean
How to access the data
Predefined reports
Predefined queries
How current the data are
Metadata Chart (Lesson 2, Lecture 35)
Watch lecture to see how parts relate

Relational Model - Data Structures
Columns have names
of columns = degree of the table
Rows = cardinality
Table name, column names and data types = schema
Schema = stable over time
Integrity vs. Consistency
Integrity = Database reflects reality well
Consistency = the database lacks internal conflicts
Entity type names have to be unique: T/F?
True
Property Values can be what form?
Lexical (words), visible (photo), audible (sound)
They are things that name other things (e.g. email is a thing but also the name of the user)
The value of an identifying property has how many instances?
It has at most one instance of the identified entity
(Every entity must be uniquely referenceable)
Composite Property
A property made up of multiple other properties.
For example: Name may be made up of first name + last name
Multi-valued property types
Shown by a double ellipse
Means the entity can have multiple values at once.
For example: user/interests could be 3 things: beer, games, reading
Partial Function
When entities on either side of a relationship don’t necessarily participate in the relationship.
For example: users linked by “MARRIED TO” might include some users not married.
Shown by single, unboldd line.
Total Function
When all entities in a set must belong in a relationship.
Shown by a bolded line in lectures
Double line in book.
Is many to many (N:M) relationship a function?
No. A function maps one independent value to a dependent value.
So in math terms, many to many is a “relationship”
Can a ternary relationship (three entitites all connected via one, central, super-relationship) be represented by a conjugation of binary relationships (a series of unilateral relationships between all three of the entities)?
No. The binary relationships each specify a relationship, but can’t specify an instance that includes all three relationships.
For example, a user can be on a team, a team at an event, a user at an event (binary relationships)… doesn’t mean that the user was on the team at that event (ternary relationship).
A double diamond means?
An identifying relationship with a weak entity type owned by a strong entity type.
The weak entity can only be identified by its ‘partial identifier’ plus the key of the strong entity type.
NOTE: If there is a chain of weak entities, each being owned by the other, ALL the partial identifies up to the strong entity are needed to identify it.
Supertypes and subtypes
A subtype has to be a member of the supertype.
A “d” denotes disjoint, meaning member of one subtype can’t also be of the other subtype.
An “o” denotes overlap, meaning a member of one subtype can also be of the other subtype.
Subtypes seem to have arrows pointing to the supertype.
Sub types and super types - Inheritance
If a supertype USER has properties email and name, all subtypes will inherit those properties.
BUT, subtypes might ALSO have their own properties, that the supertype won’t have.
Union entity
Union implies that
a) entity is a subtype of the other entity types (i.e. every member of the subtype is in ONE of the supertypes)
b) that there is NO intersection between the supertypes… the subtype doesn’t belong to BOTH supertypes… only one
Abstraction in EER
Classification - Supported
Aggregation - Not supported by EER
Generalisation - Supported
<check></check>
Why are there no EER database products?
Lack of closed query based language
(was in lecture 32, lesson 4)
Components of any data or relational model
Data structures
Constraints
Operations:
1) Algebra
2) Calculus (either tuple (SQL) or domain calculus (QBE))
Does the order of columns or rows matter?
NO! Big deal
Arrow pointing from one entity to the other?
Means that the first entity is a subset of the one it points to
Constraints - Primary Key
Means that in the entity, it can’t be Null
Also, if any other entity has that property, it must be a subset of the entity that has that as a primary key
Specialization
The sub-classes entities of a superclass are a sepcialization.
For example university might have PERSON superclass with specializations of EMPLOYEE, ALUM, STUDENT.
Specialization hierarchy vs. Lattice
Hierarchy implies every subclass only has one superclass.
Lattice implies a subclass can have more than one superclass.
For example: A STUDENTASSISTANT is a subclass under both STUDENT and EMPLOYEE.
Two constraints for specialization
Disjointness Constraint = subclasses can either be disjoint (members can only belong to one subclass) or overlapping (members can belong to multiple subclasses)
Completeness Constraint = a superclass can have total specialization (every member must be in at least one subclass) or can have partial specialization (a member need not be in any subclass)
Shared Subclass
The name for when a subclass has more than one superclass.
They have multiple inheritance in that they inherit attributes from their many superclasses.
What to do if only single inheritance is allowed?
You have to create other entities that represent the combined entities.
For example, if you have EMPLOYEE, ALUM and STUDENT you might need E, A, S, EA, ES, AS and EAS.
Union type or category
An entity can be a SUBSET from a union of other superclass entities (it might not have all items in the superclasses).
For example: three owner types of a vehicle could be BANK, PERSON, COMPANY. They can form a union called OWNER which has a circle with a “u” in it and a union symbol.
A member of OWNER will be ONE OF the superclass types.
NOTE: This is distinct from shared subclasses with multiple inheritance. In those, a member of a subclass entity belongs to ALL superclass entities (e.g. STUDENT + EMPLOYEE is TEACHINGASSISTANT)
Total Union vs. Partial Union
A total union is symbolized by the bold or double line.
It means it has all the items that are the union of its superclasses.
Thoughts on Models
- Models are means of communicating
- Users of models must have some shared knowledge
- Emphasizes some aspects
- Has a language
- Can be erroneous
- Can have aspects that don’t exist in real life (contour lines)
When to use a DBMS?
- Data intensive apps
- Persistent storage of data
- Centralized control of data
- Control of redundancy
- Control of consistency and integrity
- Multi user support
- Sharing of data
- Data documentation
- Data independence
- Control of access and security
- Backup and recovery
When not to use a DBMS?
- When investment in software, hardware, experince is too high
- When generality not needed (overhead for security or concurrency too high)
- Data and apps are simple and stable
- real-time requirements can’t be met
- Multi-user not needed
Model represents?
Perception of structures of reality.
What does data modeling comprise (from lecture)?
Fix perceptions of reality (EER good for this)
Represent the perception (Relational good for this)
Aside: we then select aspects of model to fix and abstract them.
Components of Data Model
- Data Structures
- Constraints
- Operations
- Keys and identifiers
- Integrity and consistency
- Null values
- Surrogates
Hierarchical Model
As distinct from the relational model or entity model.
Was the first database model. Was used on an IBM computer in the ’70s. Still used in XML today.
Name-based vs. surrogate-based representations
Name-based = The rows depend on names for things like email or name or address… if you get something with different names, could be a different person OR the person just changed their details
Surrogate based uses a unique identifier that can’t change for the person. So if other fields change, you still know it is the same person.