Test 2 Notes Flashcards
Relational algebra
a theory that uses algebraic structures with well-founded semantics for modeling data and defining queries.
Sets
a set is a collection of elements
{a,b,c},{a,d,e,f} ← no duplications
Bags
collection of elements with duplications
Query
a function over relations; Q(R1,…,Rn) = Rresult
Because the result of a query is a relation it can be used as input to another query. → nested queries
Set Operations
Set union: It is the set of all elements in the collection.
Set intersection: it is the set of all objects that are members of both sets A and B
Set difference; S-T is the set that consists of elements of S which are not elements of T
Relational Algebra Operators
Selection (sigma): takes the horizontal subset of rows of a single table
Projection (pi): takes a vertical subset from the columns of a single table
Cross product (X)
Join (infinity): a combination of cross product, selection, and projection
Union (U)
Set diff (-)
Intersection(Upside down U)
Selection operator
Selection operator, (sigma) is to specify the rows to be retained from input relation. It takes the horizontal subset of rows of a single table that satisfies a particular condition.
Projection
(pi): vertical subset from columns of single table
Cross product(X)
Merges both tables together
Join (infinite looking thing)
combination of cross product, selection, and projection.
Union
A UNION B, this will simply add A and B together, but they must be union compatible. To combine result-set of two or more SELECT statements.
Compatibility test
A and B have same number of fields or attributes, and field in each schema has same type.
Set difference
A and B must be compatible once again, defines relation consisting of tubles in A but not in B.
Intersection
Must be compatible, gets a relation consisting of set of all tuple which occur in both A and B
Database design consists of
Requirement analysis, Architecture design, Implementation, Testing, Maintenance.
Database Design Process
Phase 1: Conceptual Model
Phase 2: Relational Model
Phase 3: Normalization
Phase 4: Physical Schema
Entities
An entity is an object or a class of real world objects having common characteristics and properties.
Attribute
An attribute is a characteristic of an entity or relationship
Relationship
an association among two or more entities
Every entity must have a primary key (T/F)
True
ERD (Entity Relationship Diagram)
a diagram that shows relationships of entity sets stored in a database. It is a conceptual diagram. Crucial to categorize what are entities in db design.
Relational schema
a set of relational tables and associated items that are related to one another
Entity-relationship model
A high level data model used to determine data elements and relationship for specified system
Relational model
a model that represents the database as a collection of relations. A relation is nothing but a table.
ER Model shows real world objects, for anyone, entities attributes relationships and arrows, conceptual or high level.
(T/F)
True
Relational Model does not show objects in tables and how they relate, for programmers, tables, columns, domains, Representational. (T/F)
False
Conceptual model procedure
Step 1: identify entities
Step 2: decide attributes for entity
Step 3: decide datatype for attribute
Step 4: build entity relationships diagrams
Step 5: Create relational schemes - design primary and foreign keys
Step 6: create physical database
Step 7: develop test cases for testing
Weak entities
cannot exist by itself. Depends on strong entity to ensure existence. Has partial discriminator key (no primary key), double rectangle.
Strong entity
does not depend on any other entity for existence. Is also known as an independent entity (Single rectangle)
Key attribute
uniquely identifies an entity
Composite attribute
combination of other attributes
Multivalued attribute
holds multiple values, for example a person can have several phone numbers
Derived attribute
Dynamic and derives from other attributes, for example persons age is derived from date of birth
Normalization
involves decomposing relations with anomalies to produce smaller well-structured relations.
Types of anomalies
Redundancy
Update anomalies
Delete anomalies
Insert anomalies
Inheritance
enables you to share attributes between entities
Functional dependency
dependent and determinant. X–> Y. X is determinant and Y is dependent
Armstrongs axioms
1: if Y ⊆ X, then X → Y (trivial FD)
2: if X→ Y and Y→ Z, then X→ Z (transitivity)
3: if X→ Y and X→ Z, then X→ YZ (composition) , 4. opposite of composition is (decomposition)
Full functional dependency
if determinant attribute functionally determines all other attributes in a table
Partial functional dependency
if a determinant attribute functionally determines only some of attributes in a table.
Full FD is not desirable in practical DB design. (T/F)
False
Advantages of FD
Avoids data redundancy.
Helps maintain quality of data in database
Helps defined meanings and constraints of database
Helps identify bad designs
Helps find facts regarding db design
1NF
a property that indicates a table in relational db satisfies following six req.
1. Each column must have unique name
2. Order of rows and order of columns doesnt matter
3. Each column must have single data type
4. No two rows can contain identical values
5. Each column must contain single value
6. Columns cannot contain repeating groups
2NF
Indicates table is in 1NF and non key attributes are fully functional dependent on primary key
3NF
is a property that indicates a table in relational db is in 2ND and constraints no transitive dependencies
Decomposition is used to
eliminate some problems of bad design like anomalies, inconsistencies, and redundancy.
Types of decomposition
Lossless decomposition: is a decomposition that does not lose any information after the decomposition. Is lossless if natural joins of all the decomp give original relation.
Dependency Preserving decomposition: in which each functional dependency X→ Y specified in F appeared directly in one of the relation schemas R1 in the decomposed relations (i=1, 2, …n)
Unnecessary decomposition: no redundancy; schema is more complicated (and uid is stored twice)
Partitioning
Database process where very large tables are divided into multiple smaller parts
Types of partitioning
Vertical partitioning
Horizontal partitioning
Advantages of database partitioning
Improve scalability
Improve performance
Provide operational flexibility
Improve availability
Vertical Partitioning
divides a table into multiple tables that contain fewer columns.
Vertical partitioning advantages and disadvantages
Advantages:
Speeds up queries that touch only a small fraction of columns
Single column can be compressed effectively, reducing disk I/O
Disadvantages:
Updates are expensive!
Need many joins to access many columns
Repeated key columns add overhead
Horizontal Partitioning (sharding)
divides a table into multiple tables that contain the same number of columns, but fewer rows.
Horizontal partitioning advantages and disadvantages
Advantages:
Efficiency, Better performance, Security
Disadvantages:
Improper horizontal partitions may cause performance issue
Transaction
A series of DB queries, the execution of a sequence of one or more operations on a shared database to perform some higher level function.
Concurrency
execution of several transactions at same time
Scheduling
mechanism that makes sure all transactions run in proper order and in proper states
Atomic action
an indivisible sequence of primitive operations without interruption
Write-read conflict
known as dirty/inconsistent read. A transaction reads value written by another transaction that has not yet committed.
Read-write conflict
known as unrepeatable read. Another transaction modifies that value in between the two reads.
Write-write conflict
called lost update. The second one to write the value overwrite the first change.
ACID Properties
Atomicity: Either all changes performed by a transaction occur or none occurs.
Consistency: A transaction as a whole does not violate integrity
Isolation: Transactions appear to execute one after the other in sequence
Durability: If a transaction commits, its changes will survive failures
State of transactions
Active: the transaction is executing
Partially Committed: A transaction enters this state after performing its final operation
Committed: after successful completion checks
Failed: when the normal execution can no longer proceed.
Aborted: after the transaction has been rolled back
Types of scheduling
Serializable Schedule and Concurrent Schedule
Serial Schedule
is a type of schedule where one transaction is executed completely before starting another transaction
A serializable schedule always leaves the database in a consistent state (T/F)
True
Concurrent Schedule
a type of DBMS schedule in which many transactions can run concurrently
Serial Schedule Advantages
Always gives guarantee for data consistency
Concurrent Schedule Advantages
Reduce waiting time. Improve responses time and throughput.
Serial Schedule Disadvantages
High average waiting time. Low response time an low throughput could be possible.
Concurrent Schedule Disadvantages
Possible data inconsistency. Some time too much context switching.
Two operations conflict if and only if
They are by different transactions, and
They are on the same object, and
And at least one of them is a write.
Locking Scheduler
Each element has a unique lock
Each transaction must first acquire the lock before reading/writing that element
If the lock is taken by another transaction, then wait
The transaction must release the lock after completing its work.
By using lock scheduler we ensure conflict-free operation
Two Phase Locking
2PL is a concurrency control method which divides execution phase of a transaction into two parts: Growing Phase and Shrinking Phase
2PL Protocol
In transactions, all lock requests must precede all unlock requests
Always obtain a S (shared) lock on object before reading.
Always obtain an X (exclusive) lock on object before riting
If a X lock on object, no other locks (S or X) can be obtained on that object
Shared lock
prohibits any other process from requesting a write lock on specified parts of file.
Exclusive lock: gives process exclusive access for writing to the specified part of file.
Recoverable schedule
a schedule is recoverable if each transaction commits only after all transactions from which it has read and has committed in a certain order.
Causes of recovery
data corruption, system failures, data center outage
Principles in recovery
Write-ahead log = A file that records every action of all running transactions
Force log entries to disk
After crash, recovery manager reads log entries and finds out exactly which transactions were in flight