Exam 1 Flashcards
Naming rules
Singular names are used for entities
Nouns are entity types, verbs are relationship types
Complex attributes
More than one residence with multiple parts of the address
What are important aspects of a model?
Only emphasizes selected aspects of the real world
Is described in some language
can be erroneous
may have features that do not exist in reality
Relationship
Defines a relationship set
Example: “WORKS_FOR” relationship
Degree of relationship is the number of participants
What may be a reason NOT to use a Database Management System?
- Need to support multiple users
- Need for persistent storage of data
- The overhead of providing security, concurrency control, recovery and integrity control
- Need for data independence
- The overhead of providing security, concurrency control, recovery and integrity control
Differences between databases and file processing
Each user maintains separate files and programs to manipulate those files
Describe the difference between schema and data
A schema describes the intention of the data. The data describes the extension. Data is information about reality.
What is atomicity? (in terms of DBMS transactions)
either everything in a transaction is executed or nothing is
What is a schema
A structure described in a formal language supported by the database management system (DBMS). The term “schema” refers to the organization of data as a blueprint of how the database is constructed
What is a naive end user?
querying and updating (posting withdrawals at a bank)
What are the workers behind the scenes?
They are not interested in the content of the database.
DBMS system designers and implementer.
Transaction
executing program that includes database access (reading or writing)
How to draw an attribute?
oval attached by straight line to entity
Ternary degree
Supplier, part, project
Program-operation independence
the user can operate on the data by using functions regardless of how the operations are implemented
What is controlling redundancy
Make sure data is not in more than one place.
DBMS interfaces
Menu based interface
Apps for mobile devices
Forms based interface
GUI
Natural language interfaces
Keyword-based database search
Speech input and output
Interfaces for parametric use
More advanced interfaces for the DBA (privileged commands)
What definition languages do SQL represent?
SQL represents DDL, VDL, and DML. SDL used to be part of it but was removed
Entity set / entity collection
All EMPLOYEE entities in the company
Entity type describes schema (intention) for a set of entities
Completeness (total) constraint (optional)
Every entity in a superclass must be a member of at least one subclass
Union Types
OWNER can be a COMPANY, a BANK, or a PERSON
It can be one of multiple entity types, but not more than one at once.
Only inherits one of the types, not all of them
Some models don’t have union sets
Total specialization
if every entity must be a member of a subclass
What is a casual end user of a database
occasionally use data but need help
NULL values
Doesn’t apply, for example: maiden name for a man or apartment number for a single family home
Both known and unknown nulls
Generalization
Generalizing several classes into a common super class Superclass is an IS-A relationship
How to draw key attribute
they have their names underlined
Program-data independence
In file processing, changes to the file require changes to the application programs. Databases have program-data independence.
Database system utilities
Loading: loading data files. Conversion tools to transfer from one DBMS to another
Backup: backup and restore
Database storage reorganization: for optimizing performance
Performance monitoring: database usage and gives stats
Recursive relationships
An employee can supervise another employee
In simplest terms, what is a database?
A model of reality
Key attributes
Key uniqueness constraint
“Name” can be unique key of COMPANY since no two companies can have the same name
PERSON can have a social security number
Some entities have more than one key
Association
Objects from several independent classes
IS-ASSOCIATED-WITH
Knowledge representation
accurately modeling a domain of knowledge by creating an ontology
Give examples of constraints:
emails must be unique
emails are not allowed to be null
birthdate must be after 1900
hometown must be a city in the US
View
a perspective of data or virtual data of the database
How to draw a partial key?
underlined with dotted line
What is a collection of related data?
A database
Total or partial category
Total category
Union of all entities in its superclass
Partial category
Subset of the union
Design choices for ER conceptual design
Relationships can start as entities, and then be elevated as the design evolves
The inverse is true too
How to draw weak attribute?
Double rectangle with entity in double diamonds
Binary degree
One employee to one department
Stored vs derived
attributes
Stored is storing the actual name
Derived is like calculating the age from the birth date
Two Types of DML?
High level DML (non-procedural) Like SQL Set at a time Called the query language End users typically use this
Low level (procedural)
Also called record-at-a-time dml
More like a programming language
Data Model
an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world entities.
Defining a database
specifying the data types, structures, constraints
Conceptual representation
no details about how data is stored
Name 3 data models we will use in this class
Entity-Relationship model
Relational Model
Hierarchical Model
What is physical data independence?
A measure of how much the internal schema can change without affecting the application programs.
The ability to change the schema at lower level (in ANSI/SPARC three level architecture) without affecting the schema at the next higher level is called data independence.
What is an entity?
A real world object or concept (employee or project)
Conceptual Schema?
Conceptual level describes the structure of the whole database for a group of users.
Conceptual schema is a representation of the entire content of the database.
These schema contains all the information to build relevant external records.
It hides the internal details of physical storage.
Logical data independence
Change conceptual schema without having to change external schema
View of data report isn’t changed by how the grade_report file is structured
What is the software process
Analysis, Specification, Design, Implementation
Data Abstraction
Hides lower level data structures to only see essential views.
3 Types of Schema?
External Schema: Use of data. As viewed by a user.
Conceptual Schema: Meaning of data
Internal Schema: Storage of data
External schema for user views.
Conceptual schema integrates external schemata.
Internal schema that defines physical storage structures.
Single vs multi-valued attributes
Single valued is age
Multi valued is like list of cities visited, names of siblings
Data model can be described using a class diagram using UML
Class: entity Operation (what an object can do) Domain: data type Association: relationship Link attribute: relationship attribute Class: entity Operation (what an object can do) Domain: data type Association: relationship Link attribute: relationship attribute
Database Protection
protecting against malfunction
Client Module
a program that runs on a pc or mobile device. Has a user friendly UI to access information stored in the database
Basic client/server architectures
Define specialized servers with specific functionality
For example, file server, print server, web, email servers
How does a system analyst use a database?
requirements of end users to develop specs for software
Data definition language (DDL):
Define both internal and conceptual schemas
Passed into a DDL compiler
Server Module
Handles data storage, search, etc
What does a database administrator do?
Responsible for authorizing database access, monitoring its use, and dealing with software and hardware
Deals with performance, security breaches, etc
Subtype
Employee can be general type, but they can be specialized into secretary, technician, manager, etc These are subclasses or subtypes of the employee Subclasses have specialized attributes that their parent class doesn’t have
Concurrency control
allows for multiple users to connect and update at the same time
For example, multiple airline agents booking seats at the same time. Called online transaction processing
Entity Relationship Diagram
Shows entities, attributes and attributes
Attributes are simple vs composite, single vs multi values, stored vs derived
What are the stages of database design?
Requirements specification and analysis: what is needed
Conceptual design
Logical design
Physical design
Role names and recursive relationships
Role name: what the relationship means
Employee = employee/worker Department = department/employer
What are advantages of a DBMS system?
good for data intensive apps has persistent storage of data has centralized control of data has control of redundancy has control of consistency and integrity has multiple user support has sharing of data data documentation data independence control of access and security backup and control
Manipulating a database
querying and changing data
Impedance mismatch problem?
data structures provided by the DBMS were incompatible with the programming languauge’s data structures
Constructing a database
storing data into a database
What is a key?
keys are uniqueness constraints
4 possible constraints:
Disjoint total
Disjoint partial
Overlapping total
Overlapping partial
Physical / Low Level data model
Details on how data is stored on the system.
How to draw a component attribute
they are connected together
Conceptual / High Level Data Models
Concepts close to the way many users perceive data.
Uses entities, attributes, and relationships.
What is the Database Design Process?
Requirements collection and analysis
(Know data requirements and functional requirements)
Create conceptual schema for the database (conceptual design)
Logical design, data model mapping (create a database in the DBMS system)
Physical design
(Internal storage is specified)
Identification
Some classes are uniquelty indentificable by an identifier
Identification distinguishes between classes and objects, and relates them to their real world counterparts
Based on system of unique names
How were early databases designed differently?
Early databases mixed conceptual relationships with physical storage
Grades were stored next to student record
Difficult to reorganize and to add new types of queries
Only provided programming language interfaces
Maintaining a database?
allowing the system to evolve as requirements change over time
How to draw strong entity
rectangular box
Specialization
Classifying a class into specialized subclasses
View definition language (VDL):
Specifies user views and their mappings to conceptual schemas
What does a Database Designer do?
Identifies data, designs structures to represent and store the data
Develops views of data
Participation constraint
Minimum number of relationships
Employee must work for one company
Metadata
data about the data
Entity and Attributes differences?
Entity is a thing or object in the real world
Attribute: property that describes the entity
DBMS component modules
Database and DBMS catalog are stored on disk controlled by the OS
Have own buffer management and stored data manager
Casual users will run interactive query interface
Query compiler changes them to an internal form
Query optimizer optimizes a query
Precompiler extracts DML commands
Runtime database processor: executes privileged commands, query plans, canned transactions
Concurrency control, backup and recovery systems as separate modules
Client program is on a separate computer. DBMS is on database server
Centralized DBMS architectures
On older systems, all parts of the DBMS system were on one machine
What is a surrogate?
A surrogate is a system generated artificial identifier for an entity. It represents an entity of the real world inside the database. It is immutable by the application programs. It is a unique identifier.
How to draw entity relationship diagram
Regular strong entities are drawn in rectangular boxes
Relationships are diamond shaped boxes attached to participating entities with straight lines
Attributes are shown in ovals, attached by straight line to entity
Component attributes are connected together
Key attributes have their names underlined
Weak attributes are double rectangles with entities in double diamonds
Partial key underlined with dotted line
DBMS data manipulation language (DML)
retrieval, insert, deletion, modification
Bottom-up conceptual synthesis
We have staff, faculty, alumnus, student. And we create a generalized person from this list of existing entity types
Referential integrity
A feature provided by relational database management systems (RDBMS’s) that prevents users or applications from entering inconsistent data. Most RDBMS’s have various referential integrity rules that you can apply when you create a relationship between two tables.
Which of the statements does NOT describe differences between a database schema and a database state?
- A database schema is relatively stable over time while the database state changes over time
- A database state is relatively stable over time while the database schema changes over time
- A database schema described the structure of the data that constitutes the database state
- The database state can only be changed under the structures and rules described in the database schema
- A database state is relatively stable over time while the database schema changes over time
Dynamic Aspect
A set of valid user defined operations allowed on database objects. For instance, compute GPA which can be applied to a student object.
Disjointness constraints (optional)
Subclasses of the specialization must be disjoint sets
This means an entity can be a member of at most one subclass
What is a Database Management System?
- A model of structures of reality
- A model of processes of reality
- A software system allowing you to define and use models of structures of reality
- An application system that accesses a database
- A software system allowing you to define and use models of structures of reality
Three Schema Architecture
Internal level has internal schema to describe physical storage
Conceptual level has conceptual schema
Structure of database for community of users. Hides physical storage data
External schema (view level) hides database, only shows high level data
Representational Data Model
An in between point between high and low level data models. Concepts that can be understood by end users but are not far away to the way data is stored.
What is a standalone user?
maintain personal databases
Top down conceptual refinement
Specialize a superclass into subclasses (employee into assistant, manager, etc)
What is a sophisticated end user?
engineers, scientists, analysts, etc
Issues when coming up with a database design
Many specializations and subclasses can cause a model to become cluttered, so only choose the most important ones If a subclass has few specific attributes, it can probably be merged into the superclass or added as a type Unions should be avoided unless necessary Overlapping and partial is the default constraint
Properties of a database?
Represents the real world
Coherent with inherent meaning
Built for a specific purpose
Carnality ratio
Maximum number of relationship instances an entity can participate in
Only can work for one department
1-1, 1-many, many-many
Information Flow Diagram
Information flow showing boundaries of the system. NOT Control flow. Documents connect to tasks. Never connect two documents, and never connect two tasks.`
Describe process modeling
The use of a model to represent processes of reality
processes may be embedded in program code or executed ad-hoc
External Schema?
External level is related to the data which is viewed by individual end users.
This level includes a no. of user views or external schemas.
This level is closest to the user.
External view describes the segment of the database that is required for a particular user group and hides the rest of the database from that user group.
When would we not want to use a DBMS?
When the initial investment in hardware, software, and training is too high
the generality of a database is not needed (overhead for security, concurrency, recovery, etc)
data and applications are simple and stable
real-time requirements cannot be met by it
multiple user access is not needed
Database Normalization?
Storing data in only one place in the database
Facts that can be recorded and have implicit meaning
data
Three-Tier and n-Tier architectures
Adds an intermediate layer between client and database server
Middle tier is called application server or web server
Accepts requests and translates them to database requests
Can add multiple layers for additional abstraction
Additional implications of using databases
Potential to enforce standards Reduced application development time Flexibility Available up to date information Economies of scale
Internal Schema?
Physical level describes the physical storage structure of data in database.
It is also known as Internal Level.
This level is very close to physical storage of data.
At lowest level, it is stored in the form of bits with the physical addresses on the secondary storage device.
At highest level, it can be viewed in the form of files.
The internal schema defines the various stored data types. It uses a physical data model.
How does an application programmer use a database?
Implement specs as programs, test, debug, document, maintain, etc the frontend for using databases
User-defined specialization
if it’s up to the user not the attribute (defined manually for each entity)
Classification
Assigning similar objects to object classes Instantiation: is a member of, or is an instance of Exception objects: objects that differ from other objects in a class Class properties: properties that apply to a class as a whole and not to individual objects EER cannot represent instances (only superclass, subclass)
What is an access path?
Search structure that makes the search more efficient (directory structure).
What does physical data independence allow you to do?
- Change the Conceptual Schema without changing the Internal Schema
- Change the Internal Schema without changing the Conceptual Schema
- Change the External Schema without changing the Conceptual Schema
- Change the Application Programs without changing the Internal Schema
- Change the Internal Schema without changing the Conceptual Schema
What is data consistency?
Does the database have any internal conflicts? If not, then we have good consistency.
Classification of Database Management Systems
By Data model
By Number of users
By Number of sites (centralized or distributed database)
By Homogeneous vs Heterogeneous
By Cost
By Types of access path
What is data abstraction?
Data abstraction is the reduction of a particular body of data to a simplified representation of the whole. Allows for program-data independence and program-operation independence.
Composite / molecular object
A combination of other types of classes
What is data modeling?
fixing a perception of structures of reality and representing this perception
shared subclass
Subclass with more than one superclass
Basic Operations
A standard set of behaviors for retrieval and updates.
What do database operations do
Operations support change and retrieval of data
insert a new user
select emails and birthdate
What are the actors on the scene?
Database Administrators
Database Designers
End Users
Software Developers
They are interested in the specific content of the database
How to draw a relationship?
a diamond
What is an attribute?
A property about an entity (name of person, gender, address, etc).
Tools, application environments, and communication facilities
Data dictionary system
Stores design decisions, usage standards, program descriptions (such as confluence)
Application development environment
Powerbuilder, jbuilder
Communication software
Allows people to log into databases to do work
How to classify DB by data model?
Relational data model (main) Database is a collection of tables Usually in SQL Object data model (not widespread used) Data is a collection of objects and their properties Have methods (kind of like C++ classes) Key-value storage (NOSQL) Document based, graph-based, column-based, key-based Gives unique key to each value Document data model Data is documents Graph data model Data is a series of nodes and edges Column data model Columns of rows scattered on disk network data models (legacy) Represents data as record types Hierarchical (legacy) Data is represented as a tree No standard language Object-relational DBMS Tree structured (XML) Standard on the web. Elements with tags
Attribute-defined specialization
If all subclasses in a specialization have their membership on the same attribute of the superclass (job type for example)
Partial specialization
entity doesn’t have to be part of any subclass
What are advantages of the DBMS approach?
Controlling redundancy
Access Control
Persistent storage for program objects
Storage Structures for efficient query processing
Provides for backup and recovery
Allows for multiple user interfaces
Represents complex relationships among data
Enforcing data integrity
Permitting inferencing and actions using rules and triggers
Specialization hierarchies
There may be more than one level of specialization
Employee can subclass into programmer which can subclass into front-end developer
Every subclass must only have one class/subclass relationship (single inheritance)
Storage definition language (SDL):
Specify internal schema
Mapping between the two is specified in any of them
Database Application Development Methodology Assumptions:
Business Processes are Known
Documents are Known
Tasks are Known
System Boundary is known
One database schema unifying all views can be designed (difficult)
What is a relationship?
An association between entities. For example, an employee entity works on a project entity.
What is isolation? (in terms of DBMS transactions)
Isolation: each transaction is separate
Having some redundant data to improve query performance?
Controlled redundancy
Sharing a database
allowing multiple users to access
Ontologies
An ontology is a specification or conceptualization
Conceptualization, a set of concepts and relationships that are part of reality
Specification: language used to specify the conceptualization
Aggregation
Building composite obhejts from their componenent parts
IS-A-PART-OF, or IS-A-COMPONENT-OF
Car is a combination of seats, a steering wheel, and wheels
What is a constraint?
Constraints express rules that cannot be expressed by the data structures alone.
Attributes of relationship types
Works_on can have number of hours an employee worked on a project
Generalization
If we have objects like a car and truck, we can generalize certain common features into a generalized super class
What do anteaters eat?
ants
Self Describing Data Model
Where the data storage combines the description of the data with the data itself (metadata + data)
XML, key-value stores, and NOSQL
What do end users of a database do? How do they use a database?
Their job requires access to the database for querying, updating, generating reports, etc
Value set (domain)
The range of values that may be used
DBMS?
Database management system
A computerized system that allows users to create and maintain a database
Software system for defining, constructing, manipulating, and sharing databases
Schemas
Need to distinguish between description of the database and the database itself
Database diagram: description of database
Eah object in the schema is a schema construct
The database in each moment in time is called a database state or snapshot. Current set of occurrences or instances
When we define a new database, we only define its schema (we have an empty state, no data)
Every time a database is updated, we get a new database state
Schema is not supposed to change frequently
Schema evolution: when adding new field of data (such as birthday field to a student object)
Specialization lattice
Every subclass can have more than one parent (multiple inheritance)
What is logical data independence?
Logical data independence is a measure of how much the conceptual schema can change without affecting the application programs.
The ability to change the conceptual schema, i.e, the overall database view without affecting the next higher level external schema, i.e. end users’ view of the database is logical data independence.
Data independence
Capacity to change schema at one level without having to change schema at another
Composite vs Simple attributes
Simple attributes are basic strings, values, etc
Composite attributes have multiple parts like different parts of an address (can be subdivided)
Abstraction concepts
Classification, identification, specialization, aggregation
Software Development Process when it comes to Datbases
Information Flow Diagram
Tasks
ER Diagram
Abstract code with SQL
Relational Schema
PHP code with SQL
MySQL Relational Platform
What is data integrity?
Integrity means does data reflect reality well?
Physical data independence
Ability to change internal schema without having to change conceptual schema
For example, improve performance of updating
Data model is a data abstraction that supports what
data abstraction that supports the conceptual representation. Hides storage and implementation details
Two-tier client
User interface and application programs were first moved to the user side
On server side, the query server (SQL server)
Client program communicates with server using ODBC (open database connectivity) via an API
For Java, JDBC is used
Predicate defined subclass (condition defined)
If a job type is “secretary”, they could be part of the SECRETARY subclass
Weak Entity Types
Entities that do not have key attributes of their own
Depends on a strong entity to exist
Partial key: can uniquely identify weak entities that are related to the same owner