Quiz 1 Flashcards
Relational Model
Collection of Tables to represent both data and relationships
Tables
Have unique names and fixed set of columns.
AKA “Relations:
Relation
In math terminology, it is a set of tuples
Tuple
In laymens terms it is referred to as a row but can also be described as a sequence (or list) of values.
Attribute
Column
Domain of the attribute
Set of allowed values for an attribute
Atomicity
indivisible
phone numbers and addresses are not atomic
Null value
A special value that is unknown or does not exist.
e.g. in the instructor table, someone might not have a phone number and therefore that field will be left as null for that person.
They can cause difficulties when we access or update databases so we try to eliminate them as much as possible
Relation Schema
A list of attributes and their corresponding domains.
e.g.
Instructor ( ID, name, dept_name, salary)
Department (dept_name, building, budget)
Relation instances
Current values of a relation
Does order of tuples matter?
No
Should two tuples have the same value for all attributes?
No
SuperKey
A set of one or more attributes that uniquely identify a tuple
e.g. ID or Name but ID is better because someone can have the same name
If K is a SuperKey then:
Any superset of K is also a superKey. So if ID is a superkey then any combination of ID and name is also a superkey.
Candidate Key
Special kind of superkey (minimal superkey). Any proper subset of candidate key can’t be a superkey.
e.g. ID and name are possible candidate keys because all other subsets of that key is not a superkey, which in this case is just zero.
Primary Key
A candidate key
Chosen by DB designer
its attributes values are never, or very rarely changed.
e.g. SSN
and ID or unique identitifier might not be good when say two enterprisees merge and then id values may overlap but in our instructor table/relation that is the best we got…
Primary Key Relation Schema
department(dept_name, buildings, budget)
Primary key underlined and before all other attributes
Foreign Key
dept_name is a primary key for the department relation but also exists in the instructor relation so it is therefore referred to as a Foreign key. The department relation is being referenced by the instructor relation.
Referential Integrity
An attribute in referencing relation should be an attribute in referenced relation.
Foreign key constraints are most common form of referential integrity.
Relational Query languages
A language in which a user requests information from a database.
Higher level than standard programming language
Two types:
Procedural:user instructs system to perform a sequence of operations on DB. e.g. Relational Algebgra
Nonprocedural: User describes desired information without giving a procedure. e.g. SQL
Relational Algebra
A set of operations on relations
Parallel to usual Algebra
Usual Algebra examples
Operations: addition, subtraction or multiplication
Input: One or more numbers
Output: A number
e.g. z = 3x + 4y + 10
Relational Algebra examples
Operations: select, project, union, set difference, cartesian product, rename, set intersection, natural join, assignment
Input: one or more relations
Output: A relation (table)
sigma_salary>=85000(instructor)
σ (selection)
Return rows of the input relation that satisfy the predicate.
σ_salary>=85000(instructor)
π (projection)
Output specified attributes from all rows of the input relation. Remove duplicate tuples from the output.
π_ID,salary(instructor)
BowTie (Natural Join)
Output pairs of rows from the two input relations (regardless of whether or not they have the same values on common attributes)
Instructor BowTie deparment
x (cartesian product)
Ouput all pairs or rows from the two input relations (regardless of whether or not they have the same values on common attributes)
Instructor x department
U (Union)
Output the union of tuples from the two input relations.
π_name(instructor)U π_name(student)
Enitity-Relationship (ER) Model
E-R data model hekps to design DB schema.
Object based data models
Extends E-R model with encapsulation, methods/functions, Object Identity
Problems with Relation Model?
Support limited data type.
- Complex domain requires complex data types (e.g. nested record structures, multivalued attributes, and inheritance)
2. Difficulty in accessing data from C++ or Java Programs
XML
Extensible Markup Language
A great way to exchange data, not just documents
Database
Collection of structured and interrelated data
File Systems
Organizations used to store data in file systems but not anymore!
File System drawbacks
Get more complex over time. e.g. adding new majors and departments
Data redundancy: Can have the same data in two files
Data Inconsistency: May be updated in one department but not the other
Difficulty in accessing data: have to parse through everything to get the data that you want which takes time or you are always writing new programs to solve the problem
Data Isolation: Data can be scattered in different files or files could be in different formats which makes retrieving the data difficult
Integrity Problems: if new constraints are added or existing ones need to be changed then you might run into issues with aforementioned formatting and what not.
What happens if system fails during the transaction? Chaos
Concurrent-Access Anomalies
Security Problems: people having access to information that they shouldn’t be able to view. Hard to protect/limit in file Systems.
DBMS
A collection of interrelated data (database) and a set of programs that allow users to access and modify these data.
AKA Database System
Examples: Oracle, Microsoft SQL server, IBM DB2, MySQL, PostgreSQL
DBMS Top part
Users and interfaces
Who is who and who is allowed to do what
DBMS Bottom Part
Disk Storage Data is stored here Indices Data Dictionary (Metadata) Statistical Data DBMS hides details of how data stored & maintained which is important for Data Abstraction
DBMS Middle part
Query processor helps DBMS to simplify and facilitate access to data.
Storage manager is important because DB typically requires a large storage space.
Query Processor
Processes the Queries
Query: A statement requesting information
Queries are represented by a language (DB language)
Two parts:
- Data definition Language (DDL)
- Data Manipulation Language (DML)
Four components of Storage Manager
- Buffer Manager: fetches data from disk into main memory and decides what data to cache in the main memory.
- File manager: Manages space allocation on disk storages
- Authorization and integrity manager: Tests for satisfaction of integrity constraints and checks authority of users access to data.
- Transaction Manager: A unit of program that accesses and updates data items.
Ensures ACID properties
Transactions are initiated by SQL or programming language using ODBC/JDMC
ACID
Atomicity, Consistency, Isolation, and Durability
Atomicity: all or none transaction
Consistency: preserves consistency of DB, e.g. Value sum of the balances of A and B must be preserved.
Durability: After a successful funds transfer, new values of A and B must persist, even if system fails.
Isolation: Each transaction is unaware of other transactions executing concurrently in system.
For two transactions Ti and Tl, it appears to Ti that either Tj finished execution before Ti started or Tj started execution after Ti finished
DBMS structure
- More complexity
- Data redundancy
- Data Inconsistency
- Difficulty in accessing data
- Data Isolation
- Integrity Problem
- Atomicity Problem
- Concurrent-Access anomalies
- Security Problems
Database Architecture
Influenced by computer system on which DB is running.
Centralized vs Distributed:
Database is located on single computer or data is physically divided among several several computers connected by a network