Database Theory and Design Flashcards

1
Q

Define a Database (DB)

A

An organized collection of related data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Define a Database Management System (DBMS)

A

Software that manages and controls access to the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Define Database application

A

Program that interacts with the database at some point in its execution

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Define a Database System

A

A collection of programs that interact with the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are two ways of storing and managing data?

A
  1. File-based System
  2. Database System
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Define a File-based System

A

A group of software tools that help users do different tasks. Each program stores and manages its own data separately

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Define a Database System

A

Software that helps store, manage, and organize data in an easy-to-access way. It allows users to save, update, and retrieve data quickly and securely

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are the advantages of a File-based approach?

A

User Control
- End user has total control of the stored data
- Can modify application or data at will

Applications and data file can be optimised for particular tasks

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are the disadvantages of a File-based approach?

A
  • Data duplication
  • Data dependence
  • Incompatible file formats
  • Proliferation of application programs (APPS)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is Data Duplication

A

If a record is duplicated in each of the 3 files, a change to the record requires a change to all 3 files.

Integrity of student records may be compromised if the files are not all updated - which file holds the correct information.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is Data Dependence

A

The way data is stored affects how it can be used, making the data and the program tightly connected

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are imcompatible file formats

A

If application programs are separatly implemented to data files, then the data files may be incompatible with the application program

Meaning the data files would have to be translated to fit the application program

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is the Proliferation of application programs?

A

If you wanted more information out of data files than initially anticipated

You would have to add more application programs to handle new queries

Leading to the proliferation of files and application programs that each official has to handle

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

How does a database approach address the limitation of an application program?

A

The data is stored in a DB, which is then accessed by DBMS.

When officials need info from the DB they write queries in SQL to communicate with the DBMS which in turn gives them what they need

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is a DBMS

A

Software that interacts with users’ application programs and the DB

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Define a Database Management System (DBMS)

A

Software system that enables the user to define, create, maintain and control access to the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What does an End-user do?

A

Run applications to perform specific database operations

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What does an Application Developer do?

A

Use a programming language to provide the required functionality for the end users

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What does a Database Designer do?

A

Ensures the database is well-organized, secure, and works efficiently

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What does a Database Administrator (DBA) do?

A

Responsible for implementation and monitoring of the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What are the 13 advantages of a DBMS?

A
  • Control of data redundancy
  • Data Consistency
  • Sharing of data
  • Improved data integrity
  • Improved Security
  • Enforcement of Standards
  • Economy of Scale
  • Balance of conflicting requirements
  • Improved data accessibility and responsiveness
  • Increased Productivity
  • Improved maintenance through data independence
  • Increased concurrency
  • Backup and Recovery services
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What is Control of Data Redundancy?

A

Since all data is now stored in a single database, there are no unnecessary multiple copies of data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What is Data Consistency?

A

Only one copy of each data item

(changes only have to be made to 1 item, therefore if there was multiple and they werent all changed data wouldnt be consistent)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

How would an organization share data to people inside it?

A

Since data is common to the organization, it’s easy for all authorized users to have access to the same data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
How would you improve data integrity in a database?
Organizational data is stored in one place, it's easy to enforce consistency rules that shouldn't be violated
26
How would you improve security in a database?
Database administrator can easily put in place security and access measures using the DBMS
27
What data standards could be enforced in a database?
Since all data is in the same database, its easy to enforce data standards such as: - Data formats - Naming conventions - Documentation standards - Update procedures - Access rules
28
How is Economy of Scale used in Databases?
Combining an entire organization's data into one database and creating a set of applications that work on this one source results in saving costs
29
In a database how would you balance conflicting requirements?
Since the DBA has an organizational-wide view of all operations, they can optimize the design and operational use of the database to provide the best use of resources as a whole as opposed to satisfying one department at the expense of another
30
What is improved data accessibility and responsiveness in a database?
Common database provides easy data access to all authorized people within orgranization Also, the DBMS provides SQL to enable end-users to easily construct ad hoc queries without detailed programming knowledge
31
How could productivity be improved in a database?
Can be improved by using a DBMS, which handles low-level file operations and provides SQL for user-friendly query development, reducing the need for extensive programming.
32
What is improved maintenance through data independence in databases?
A DBMS separates data management from application programs - so if changes are made to underyling data structure, the DBMS doesnt have to change application programs. The DBMS will provide the necessary mapping between application programa and data stored on the database
33
What is concurrency in a database?
Many DBMSs allow users to access the database simultanoeusly without any conflict
34
What are Backup and recovery services in databases?
DBMS have facilities to enable data recovery following a failure They can also automatically backup the data stored in the database
35
What are the disadvantages of a DBMS?
- Complex: require those using them to fully understand them - Size: Large piece of software that requires alot of memory to run - Cost of DBMS: Cost varies depending on if its single user or a big organization. Maintenance and operations costs also need to be factored. - Cost of conversion: Data conversion costs as well as training and recruiting costs - Performance: Written for general use, so some applications aren't as fast - Greater impact of failure: Centralization increases system vulnerability. Failure may result in everyone being affected in an organization
36
What are the 3 levels in Three-level Architecture?
External Level - The way users percieve data Conceptual Level - Provide both the mapping and desired independence Internal Level - The way DBMS and the OS perceive the data
37
What are the two objectives of Three-level Architecture?
- Data Abstraction - Data Independence
38
What is Data Abstraction?
Hide storage details and present the users with a conceptual view of the database
39
What is Data Independence?
ability to change the database's internal structure (the lower levels) without impacting the higher levels, such as the applications or user interfaces that interact with the data.
40
What is Logical Data Independence?
The ability to change the structure of a database's logical schema (the way data is organized and represented to users) without affecting the applications or queries that use that data.
41
What is Physical Data Independence?
Changing the way data is stored in a database without affecting how the data is stored or viewed by users or applications.
42
What is a Database schema?
The description of the structure of the database
43
What is a Database state?
The content of a DB at a moment in time
44
What is a Data model?
How data is stored, related, and accessed A set of concepts to describe - Data and relationships among data - Data constraints
45
What are the 3 categories of Data model?
Conceptual data model Logical data model Physical data model
46
What is a Conceptual data model?
Identifies the high-level data structure (Independent of DBMS, application programs, physical considerations)
47
What is a Logical data model?
Describes the data in terms of data structures Independent of a particular DBMS product and storage technology
48
What are examples of Logical data models?
Hierarchical Network Relational
49
What is a Hierarchical data model?
Data is organized in a tree-like structure Each node has 1 parent
50
What is a Network data model?
Data is organized as a graph One node can have more than one parent node
51
What is a Relational data model
Data is stored in tables. Each table, called a relation, consists of rows and columns, much like a spreadsheet
52
What is a Physical data model?
Describe how data is stored in the computer, representing record structures, record openings, and access paths Highly dependent on target DBMS
53
What is Motivation in Databases?
One of the most difficult aspect of database design is that designers, programmers and end-users tend to view data in different ways Need a model for communication that is non-technical and free of ambiguities
54
What are the 3 main notations used for an ER Model?
1. Chen Notation 2. Crow's feet Notation 3. UML Notation
55
What does an ER Model consist of?
- Entity - Relationship - Attribute - Constraint
56
Define an Entity.
A group of objects with the same properties
57
Define a Relationship.
Meaningful associations among two or more entities
58
Define a Degree of Relationship.
The number of participating entity types in a relationship
59
What are 4 Degree of Relationship types?
- Degree one: recursive - Degree two: binary - Degree three: ternary - Degree four: quaternary
60
What is a Multiple Relationship?
Two entities are associated through more than one relationship
61
Define an Attribute.
A property of an entity or a relationship
62
Define a Candidate Key
The minimal number of attribute(s) whose value(s) uniquely identify each entity occurrence - A candidate key cannot contain a null
63
Define a Primary Key
A unique identifier for each record in a database table
64
What are the principles of choosing a Primary key?
- Attribute length - Minimal number of attributes required - Future certainty of uniqueness
65
Do relationships have candidate keys?
NO
66
What is a Weak Entity Type?
Existence-dependent on some other entity - Each entity occurrence cannot be uniquely identified using only the attributes assoicated with that entity type
67
Define Multiplicity
The number of possible occurrences of an entity type that may relate to a single occurrence of an associated entity type through a particular relationship
68
What are the 3 relationship types?
1. One-to-one (1:1) 2. One-to-many(1 : *) 3. Many-to-many (* : *)
69
What is a one-to-one relationship?
Each record in one table is linked to exactly one record in another table.
70
What is a one-to-many relationship?
A record in one table can be linked to multiple records in another table.
71
What is a many-to-many relationship?
Multiple records in one table can be linked to multiple records in another table.
72
What is Multiplicity for complex relationships?
It's a way of showing the "how many" in a relationship between entities
73
What is Cardinality?
The maximum values for the multiplicity ranges on either side of the relationship
74
What is Participation?
The minimum values for the multiplicity ranges on either side of the relationship
75
What is the difference between Optional and Mandatory Participation?
Optional (if 0) Mandatory (if 1 or more)
76
What is a fan trap?
When two one-to-many relationships connect in a way that causes incorrect data interpretation.
77
Define Specialisation
The process of maximizing the differences between members of an entity by identifying their distinguishing characteristics
78
Define Generalisation
The process of minimizing the differences between entities by identifying their common characteristics
79
What are the participation constraints?
- Optional - Mandatory
80
What are the disjoint constraints?
- And (nondisjoint) - Or (disjoint)
81
Define Aggregation
An abstraction through which relationships are treated as higher-level entities
82
Define a Relation
A two-dimensional table that has specific characteristics
83
What are 4 terminologies for describing a table?
Attribute - A column that defines a specific piece of information about each record (row). Each attribute represents a data field Tuple - A row that represents a single record or entry. Each tuple contains specific data for all the attributes (columns) in the table. Degree - number of attributes (columns) it has Cardinality - number of tuples (rows) it contains
84
What is a Candidate Key?
The minimal set of attributes whose values uniquely identify each tuple
85
What is a Foreign Key?
An attribute, or set of attributes, within one relation that matches the primary key of another relation
86
List the key properties of Relations
A relation has a name that is distinct from all other relation names in the database Each cell of relation contains exactly one atomic (single) value Each attribute has a distinct name The values of an attribute are all from the same domain Each tuple is distinct - there are no duplicate tuples Order of attributes and tuples has no significance
87
Define Relation Schema
A named relation defined by the relation name (table name) and attributes
88
Define Relation Instance/State
A set of tuples from a relation schema
89
Define Entity Intregrity
Uniqueness : No two tuples can have identical values for candidate keys Not-Null: No attribute of a candidate key can be a NULL
90
What is Referential Intregrity
If a foreign key exists in a relation - Either the foreign key value matches a primary key value in its home relation - Or the foreign key is NULL
91
Define a Domain
Set of allowable values for one or more attributes
92
What are Domain Constraints?
Enables RDBMS to carry out checks on data errors and to determine the range of operations that can be carried out on the domain
93
Define a View
A subset of base relations
94
Define a Base Relation
A base relation is an actual table in the database, not a view or derived table Represents a real-world entity or concept
95
What are Features of Views?
Virtual/derived relation. It does not necessarily exist in the database, but can be produced upon request. Dynamic. The changes made to the base relation(s) that affect the view are immediately reflected in the view. Generated by applying appropriate relational operations (e.g., 𝜎, 𝜋, x, ⋈, ….)
96
What are Strong Entity Relation Schemas?
A schema that represents a strong entity in a database. A strong entity is an entity that can be uniquely identified by its own attributes, without needing any reference to another entity. - For composite attributes, include only the constituent simple attributes
97
Explain a One-to-Many Binary Relationship
One record in a table is related to many records in another table - The many side (child entity) receive the foreign key
98
Explain a One-to-One Binary Relationship with mandatory participation on both sides
Combine the two entities to form one relation The primary key of any one of the two relations is chosen to be the priamry key, whilst the other becomes the alternate key
99
Explain a One-to-One Binary Relationship with mandatory participation on one side
The entity with mandatory participation receives the foreign key
100
Explain a One-to-One Binary Relationship with optional participation on both sides
Foreign key in either side is fine
101
Explain a Many-to-Many Binary Relationship
Create a new relation as an associative entity Primary key: both of these primary keys of its associate entities, may add other attributes
101
Explain a Recursive Relationship
Same rules as binary relationships - (1:1), (1:*) recursive relationships are implemented by foreign keys - (*:*) recursive relationships are implemented by creating a new relation
102
Explain a Complex Relationship
involves more than two entities participating in a relationship
103
In a Superclass/Subclass Relationship what can you do?
1. Only keep the subclasses 2. Only keep the superclass 3. Keep both
104
Explain a Weak Entity 1:1 relationship
In a weak entity relationship, a 1:1 relationship means that each instance of the weak entity is associated with exactly one instance of the strong entity and vice versa. Key Points: Weak Entity: Cannot exist without a strong entity; it relies on it for its identification. Strong Entity: Has its own primary key.
105
Explain a Weak Entity 1:* relationship
In a weak entity relationship, a 1:* relationship means that each instance of the weak entity is associated with one instance of the strong entity, but one instance of the strong entity can be associated with multiple instances of the weak entity. Key Points: Weak Entity: Cannot exist independently; relies on the strong entity for identification. Strong Entity: Has its own primary key.
106
Define a Multi-valued attribute
An attribute that can have more than one value. For example, a person can have multiple phone numbers.
107
What is Normalisation?
A technique for producing a set of relations with desired properties - Remove redundancy - Remove potential for insertion, modification, deletion anomalies
108
Define an Insertion Anomaly
Insertion of data without necessary conditions having been met. Example: Storing a persons credit card in a transaction log without the transaction having ever occured
109
Define a Modification Anomaly
Occurs in databases when data is not properly organized, leading to issues with updating, inserting, or deleting records
110
Define a Deletion Anomaly
Deleting data that should be kept
111
Define a Functional dependency
If A and B are attributes of relation R, if each value of A is assoicated with exactly one value of B, B is said to be functionally dependent (FD) on A. - Denoted: A → B - A is the determinant (Note: A or B is not limited to a single attribute, could be a set of attributes)
112
What is a Transitive dependency?
If A → B and B → C is transitive dependent on A via B: A → C
113
What is a Partial functional dependency?
Given A → B, if removing some attribute(s) from A, the dependency still holds
114
What is a Full functional dependency?
If A → B and B is not functionally dependent on any subset of A
115
Define 1st normal form
Remove repeating groups
116
Define 2nd normal form
Remove partial dependencies
117
Define 3rd normal form
Remove transitive dependencies
118
Define Boyce-Codd normal form (BCNF)
Every determinant is a candidate key (Boyce-Codd Normal Form (BNCF) is also known has 3.5 Normal Form)
119
Define Algebra
In its most general form, algebra is the study of mathematical symbols and the rules of manipulating these symbols
120
What is Relational Algebra and how does it work?
An algebra whose operands are relations Relational algebra provides a mean to query the data in a database and to modify the data - One of the query language - SQL incorporates relational algebra
121
What are all the Relational Operations?
Unary Operations - Projection (𝜋) - Selection (𝜎) - Rename (𝜌) Set Operations - Union (∪) - Intersection (∩) - Set difference (−) - Cartesian Product (X) Binary Operations - Join (⋈) - Outer Join (⋉, ⋊) - Division (÷) Aggregate and Grouping operations (ℑ)
122
Define Projection (𝜋)
- 𝝅<𝒂𝒕𝒕𝒓𝒊𝒃𝒖𝒕𝒆 𝒍𝒊𝒔𝒕>(𝑹): Produce a new relation that has only some of 𝑅‘s columns - Eliminate duplicate tuples, if any.
123
Define Selection (𝜎)
- 𝝈<𝒔𝒆𝒍𝒆𝒄𝒕 𝒄𝒐𝒏𝒅𝒊𝒕𝒊𝒐𝒏>(𝑹): Produce a new relation that contains only those tuples of R that satisfy the condition. - The relation has the same schema as the original schema.
124
Define Rename (𝜌)
- 𝝆𝑺 𝑬 : Rename the expression E to S - 𝝆𝑺(𝒂𝟏,𝒂𝟐,…,𝒂𝒏) 𝑬 : Rename the relation E to S, and rename the attributes as 𝑎1, 𝑎2, … , 𝑎𝑛.
125
Define Union (∪)
A relation that contains tuples from both sets with no duplicates
126
Define Intersection (∩)
A relation that contains tuples that are only from both sets.
127
Define Set Difference (−)
A relation that contains tuples from one set but not the other
128
Define Cartesian Product (X)
A relation that is the concatenation of every tuple of Relation X with every tuple of relation Y
129
Define Join Operations (⋈)
Performs selection over the Cartesian product of two relations
130
What are the 4 types of Join Operations?
1. Theta join (Θ-join) 2. Equijion 3. Natural join 4. Outer join
131
Define Theta join (Θ-join)?
- 𝑹 ⋈<𝒋𝒐𝒊𝒏 𝒄𝒐𝒏𝒅𝒊𝒕𝒊𝒐𝒏> 𝑺: Defines a relation that contains all combinations of tuples from 𝑅 and 𝑆 that satisfy the join condition - 𝑅 ⋈<𝑗𝑜𝑖𝑛 𝑐𝑜𝑛𝑑𝑖𝑡𝑖𝑜𝑛> 𝑆 = 𝜎<𝑗𝑜𝑖𝑛 𝑐𝑜𝑛𝑑𝑖𝑡𝑖𝑜𝑛>(𝑅 × 𝑆)
132
Define Equijoin
Produces all the combinations of tuples from that satisfy a join condition with only equality comparisons.
133
Define Natural Join
A way to combine two tables in a database based on common columns 𝑹 ⋈ 𝑺: an equijoin of the two relations R and S over all common attributes.
134
What type of language is SQL and what does that mean?
A Declarative Language: Focuses on the what, not how Also a free-format, case insensitive language
135
What is a DCL (Data Control Language)?
Permissions and access control
136
What is a TCL (Transaction Control Language)?
Managing Transactions (commit, rollback)
137
What are 5 basic selection conditions in SQL?
1. Comparison: =, <> (!=), <, <=, >=, >, OR, AND, NOT 2. Range: BETWEEN ... AND, NOT BETWEEN ... AND 3. Set membership: IN, NOT IN 4. Pattern match: LIKE, NOT LIKE 5. IS NULL, IS NOT NULL
138
What do aggregate functions do and what are the 5 aggregate functions?
Perfom calculations on multiple rows 1. COUNT - counts rows 2. MIN - finds min 3. MAX - finds max 4. SUM - adds up values 5. AVG - finds average
139
What are 5 key clauses for the SELECT statement in SQL?
1. DISTINCT: Eliminates duplicate rows. 2. WHERE: Filters rows based on conditions. 3. GROUP BY: Groups rows sharing a common value. 4. HAVING: Filters grouped rows (like WHERE but for aggregated data). 5. ORDER BY: Specifies sorting of results.
140
What does GROUP BY do in SQL?
Groups customers by a condition For example: Groups customers by country and counts the number in each SELECT COUNT(CustomerID), Country FROM Customers GROUP BY Country;
141
What does HAVING do in SQL?
Used to filter data Filters groups where count > 5: SELECT COUNT(CustomerID), Country FROM Customers GROUP BY Country HAVING COUNT(CustomerID) > 5;
142
Define a subquery
A query inside another query For example: SELECT CustomerName FROM Customers WHERE Country IN ( SELECT Country FROM Orders WHERE TotalSales > 5000 );
143
Define every type of JOIN in SQL
- INNER JOIN: Retrieves matching rows between two tables. - LEFT JOIN: Retrieves all rows from the left table and matching rows from the right table. - RIGHT JOIN: Retrieves all rows from the right table and matching rows from the left table. - SELF JOIN: Joins a table to itself.
144
What does INSERT TO do in SQL?
Adds new rows to a table Example: INSERT INTO Customers (CustomerName, Country) VALUES ('John Doe', 'USA');
145
What does UPDATE do in SQL?
Modifies existing data in a table Example: UPDATE Customers SET City = 'Frankfurt' WHERE CustomerID = 1;
146
What does DELETE do in SQL?
Removes rows from a table Example: DELETE FROM Orders WHERE OrderDate < CURRENT_DATE - 30;
147
What do Aliases do in SQL?
Temporary names for columns or tables Example: SELECT o.OrderID AS "Order ID", c.CustomerName AS "Customer" FROM Orders o JOIN Customers c ON o.CustomerID = c.CustomerID;
148
What are all the Data Types in SQL?
Numeric Types: - INT(n), TINYINT(n), DECIMAL(p,s), FLOAT, DOUBLE, etc. String Types: - CHAR(n), VARCHAR(n), TEXT, etc. Date/Time Types: - DATE: YYYY-MM-DD format. - TIME: hh:mm:ss format. - DATETIME, TIMESTAMP: YYYY-MM-DD hh:mm:ss format.
149
What is the syntax for CREATE TABLE?
CREATE TABLE Students ( ... );
150
What is the Primary key in a table?
A column that uniquely identifies rows in a table
151
What is the Foreign key in a table?
A key that links to the PRIMARY KEY of another table
152
What is ALTER TABLE used for?
To change the contents of a table such as adding, modifying, dropping etc. Example: ALTER TABLE table_name ADD COLUMN new_column data_type [constraint];
153
What is DROP TABLE used for?
Deletes a table and all its data permanently Example: DROP TABLE table_name;
154
How does the UNIQUE constraint work?
Used to check data is not a duplicate Example: CREATE TABLE Suppliers ( supplier_id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255) NOT NULL, UNIQUE(name) );
155
How does the CHECK constraint work?
Used to check if a condition is true Example: CREATE TABLE Parts ( part_no INT PRIMARY KEY, description VARCHAR(40), price DECIMAL(12,2) NOT NULL CHECK(price > 0), cost DECIMAL(12,2) NOT NULL CHECK(price >= cost) );
156
What does AUTO_INCREMENT do?
Automatically generates a unique number for each new row. Commonly used for primary keys Example: CREATE TABLE Persons ( PersonID INT NOT NULL AUTO_INCREMENT PRIMARY KEY, FirstName VARCHAR(255) NOT NULL, LastName VARCHAR(255) );
157
What is an Index?
A data structure that speeds up data retrieval but can slightly slow down updates.
158
Why use an Index?
1. Speeds up SELECT queries. 2. Improves performance for frequently searched columns.
159
What is a View?
A virtual table created from the result of an SQL query.
160
What are the benefits of a View?
1. Simplifies complex queries. 2. Makes data updates reflect in all linked views (updatability).
161
What does COMMIT do ?
Saves all changes permanently
162
What does ROLLBACK do?
Cancels all uncommitted changes
163
What are all the commands for DML (Data Manipulation Language) ?
1. SELECT: Retrieves data. 2. INSERT INTO: Adds new rows. 3. UPDATE: Modifies existing rows. 4. DELETE: Removes rows.
164
What are all the commands for DDL (Data Defintion Language) ?
1. CREATE TABLE: Defines a new table. 2. ALTER TABLE: Modifies table structure. 3. DROP TABLE/VIEW/INDEX: Deletes tables, views, or indexes. 4. Constraints: NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY.
165
What is Physical Database Design (PDD)?
- Focuses on implementing a database on storage (e.g., hard drive). - Defines how data is stored, indexed, and organized for performance.
166
What are Key Steps in Implementation in PDD?
- Create database schemas (using SQL DDL commands). - Populate data with SQL DML (e.g., INSERT). - Use indexes, constraints, and file organization for optimization.
167
What is the first step in Physical Data Design?
1. Translate Logical Data Model for Target DBMS - Design Base Relations: - Define tables, columns, primary keys, and surrogate keys - Derived Data: - Decide whether to store derived data - General Constraints: - Use CHECK, stored procedures, and triggers for validation.
168
What is the second step in Physical Data Design?
2. Design File Organizations and Indexes - File Organizations: - Choose how data is stored: heap (unordered) or sequential (ordered). - Indexes: - Use clustered indexes (physical order) or non-clustered indexes (logical order). - Disk Space Estimation: - Calculate storage needs based on table size and structure.
169
What is the third (final) step in Physical Data Design?
3. Design User Views and Security - User Views: - Define views for specific user roles and queries. - Security Mechanisms: - Use access control, encryption, and permissions.
170
What is a Surrogate key?
An artificial key added to the table With Surrogate Key: - OrderDetails(OrderDetailID, OrderID, ProductID, Quantity) Without Surrogate Key - Uses natural data to uniquely identify rows: - OrderDetails(OrderID, ProductID, Quantity)
171
When should you use surrogate keys?
- When natural keys are too complex or unstable. - To avoid duplication across tables.
172
What is Derived Data?
Data that is calculated based on other data in the database.
173
What is a constraint?
A rule that enforces data integrity in a table
174
What can be used for File Organization?
Heap Files: - Data is stored unordered (faster for insertion). Sequential Files: - Data is stored in order (faster for specific queries). MySQL Storage Engines: - InnoDB: Default engine for transactions, foreign keys, and performance.
175
What are the two types of indexes and what do they do?
Clustered Index: Physical data stored in index order, one per table Non-Clustered Index: Logical ordering, multiple per table
176
How would you control redundancy?
Denormalization: - Improves read performance at the cost of write performance. - Used when updates are rare, but reads are frequent.
177
What is a Transaction?
A transaction is a group of database operations (insert, update, delete, etc.) that are treated as a single logical unit
178
What does ACID stand for and what does each word mean ?
Atomicity - All or nothing: The transaction is fully completed, or no changes are made. Consistency - Transactions transform the database from one consistent state to another. Isolation - The operations of one transaction are invisible to others until it is completed. Durability - Once a transaction is committed, changes are permanently saved, even in the event of a system failure.
179
How does SQL support transactions?
If any operation in a transaction fails, the entire transaction is rolled back to ensure data integrity.
180
Define a Schedule
A schedule is the order of operations for concurrent transactions
181
What are the types of Schedules and how do they work?
1. Serial Schedule: - Transactions are executed one after the other. - Ensures data consistency but is less efficient. 2. Non-Serial Schedule: - Operations of multiple transactions are interleaved. - Requires careful testing for conflicts.
182
What are two desired properties of a Schedule?
- Serializability: The schedule should be equivalent to a serial execution. - Recoverability: The schedule should ensure no data is lost due to failed transactions.
183
When do Conflicting Operations Occur?
Occurs if two transactions: - Access the same data - At least one is a WRITE operation
184
How to test for Serializability?
Build a precedence graph: - Each node represents a transaction. - Draw edges for conflicting operations. Check for cycles: - If cycles exist, the schedule is not serializable.
185
Define Recoverability
A schedule is recoverable if, for every transaction T2 that reads data written by T1, T1 commits before T2
186
Define Concurrency Control
Managing simultaneous operations on a database to prevent conflicts and maintain consistency
187
What are the goals of Concurrency Control?
1. Schedule Transactions: Avoid interference between them 2. Guarantee Serializability: Ensure the results are as if transactions were executed one after the other
188
What are the techniques of Concurrency Control?
- Pessimistic: Assume conflicts are likely; prevent them proactively (e.g., locking, timestamping) - Optimistic: Assume conflicts are rare; resolve only if conflicts occur at commit time
189
What is Locking?
A mechanism to prevent multiple transactions from accessing the same data simultaneously in a conflicting way
190
Where can Locks be Applied?
Entire database, file, page, record, or even a single field value
191
What are two lock types?
- Read Lock (Shared): Multiple transactions can read, but no writes allowed. - Write Lock (Exclusive): Only one transaction can write.
192
What are the key rules for locking?
1. A transaction must issue read_lock or write_lock before reading. 2. A transaction must issue write_lock before writing. 3. Locks must be released after operations (unlock).
193
What is Two-Phase Locking (2PL)?
A locking protocol ensuring serializability by dividing a transaction into two phases: 1. Growing Phase: Acquire locks, no releases. 2. Shrinking Phase: Release locks, no new acquisitions. Benefits: Guarantees serializability
194
What is a Deadlock?
A situation where two or more transactions are stuck waiting for each other to release locks. Example: Transaction 1 locks X, waits for Y. Transaction 2 locks Y, waits for X.
195
How to handle Deadlocks?
1. Timeouts: Abort a transaction if it waits too long. 2. Detection and Recovery: - Build a Wait-For Graph (WFG): - Nodes: Transactions. - Edges: Dependencies (T1 → T2 means T1 waits for T2). - A cycle indicates a deadlock. - Abort transactions involved in the deadlock. 3. Prevention: - Lock all resources before execution (Conservative 2PL). - Use timestamps to prioritize transactions (e.g., older transactions wait for younger ones).
196
What are 2 Deadlock Prevention Techniques?
1. Wait-Die Algorithm: - Older transactions can wait for younger ones. - Younger transactions roll back if they must wait for older ones. 2. Wound-Wait Algorithm: - Older transactions force younger ones to roll back. - Younger transactions can wait for older ones.
197
What is Timestamping?
- Assigns a unique timestamp to each transaction to order operations. - Older transactions are given priority during conflicts. Key Concept: - Ensures serializability by using timestamps to determine the execution order of transactions.
198
What is Optimistic Concurrency Control?
- Assumes conflicts are rare. - Performs checks at the end of the transaction to detect conflicts.
199
What is the process of Optimistic Concurrency Control?
1. Execute transaction without locks. 2. Before committing, check if conflicts occurred. 3. If yes, roll back and restart the transaction. When to Use: Suitable for systems with low contention for resources.
200
What are the 3 types of Storage Structure (Not secondary storage types)?
Primary Storage: - Volatile (e.g., main memory, cache memory). - Data operations occur here but is lost on power failure. Secondary Storage: - Non-volatile (e.g., magnetic disk, optical disk, flash drives). - Used for storing the database permanently. Stable Storage: - Data is replicated across multiple non-volatile media (e.g., RAID). - Ensures reliability and protects against failures.
201
Define Backup
A periodic copy of the database stored securely to restore data in case of failure.
202
What are the types of Backup?
1. Full Backup: Entire database. 2. Differential Backup: Changes since the last full backup. 3. Incremental Backup: Changes since the last incremental backup.
203
Define a Recovery Technique
Restoring the database to a consistent state after a failure
204
What are the 3 types of Recovery Techniques and how do they work?
1. Log-Based Recovery: - Uses transaction logs to record database changes: - Before-Image (value before update). - After-Image (value after update). - Components: , , . 2. Checkpointing: - Marks a "safe point" to reduce recovery time. - Recovery process: - Redo: Transactions committed after the checkpoint. - Undo: Transactions active during failure. 3. Shadow Paging: - Maintains two page tables: - Shadow table (unchanged). - Current table (updated during transactions). - Advantages: No undo logs needed. - Disadvantages: High overhead for large updates.
205
What are the Key Threats for Database Security?
1. Confidentiality: Prevent unauthorized access. 2. Integrity: Ensure data is accurate and unaltered. 3. Availability: Ensure data is accessible when needed.
206
What are the Causes of Security Risks for Database Security?
- Unauthorized users. - Malicious/accidental errors by authorized users. - Errors by database administrators (DBAs).
207
Define both Database Access Control Methods
1. Discretionary Access Control (DAC): - Users specify access rights. 2. Mandatory Access Control (MAC): - Bell-LaPadula Model: - "No Read Up": Cannot read data above clearance level - "No Write Down": Cannot write data below clearance level
208
Define RAID
- Combines multiple disks for: - Improved reliability (e.g., mirroring). - Increased performance (e.g., striping). - Enhanced availability with error correction.
209
What are the 3 types of Database Security Controls?
1. Access Control 2. Views 3. RAID
210
Define the Role of the Database Administrator (DBA)
1. Manage account creation and user access. 2. Assign and revoke privileges. 3. Implement security policies. 4. Ensure data integrity and performance optimization.
211
What does DBA stand for
Database Administrator
212
Define a Distributed Database
A collection of multiple interconnected databases: - Physically spread across various locations. - Communicate via a network.
213
What are the Desired characteristics of a Distributed Database?
Distribution/Location Transparency: - Users see the database as one logical entity, no matter its physical location. Local Autonomy: - Local databases manage their own data independently, even if remote systems are down.
214
What are the types of Distributed Databases?
Homogeneous: - All sites use the same DBMS. Heterogeneous: - Different DBMSs at each site, with gateways translating between them.
215
Define Fragmentation in a Distributed Database
Dividing a table into smaller subsets (fragments) stored across different sites
216
What are the 3 types of Fragmentation in a Distributed Database?
1. Horizontal: Subsets of rows. 2. Vertical: Subsets of columns (attributes). 3. Hybrid: A mix of horizontal and vertical fragmentation.
217
Define Data Allocation
Assigning fragments to specific sites (optimal allocation is complex)
218
What are the Replication Options in a Distributed Database?
1. Full Replication: Entire database copied at all sites (better reliability). 2. Non-Redundant Allocation: Each fragment stored at one site (saves storage).
219
What are the Pros and Cons of a Distributed Database?
Advantages: 1. Mirrors organizational structure. 2. Simplifies system expansion. 3. Improves reliability and availability. 4. Enhances response time and performance. Disadvantages: 1. Complex design and implementation. 2. Ensuring security and integrity is harder. 3. Lack of standardization.
220
Define a Data Warehouse
A database designed for analysis and decision-making: - Separate from operational databases. - Stores historical, consolidated data (not updated frequently).
221
What are the 4 Key Features of a Data Warehouse?
1. Subject-Oriented: Focused on specific areas (e.g., sales, transactions). 2. Integrated: Combines data from multiple sources. 3. Time-Variant: Captures data changes over time. 4. Non-Volatile: Previous data isn’t altered when new data is added.
222
Operational Database (OLTP) vs Data Warehouse (OLAP)
Feature |Operational Database (OLTP)| Data Warehouse (OLAP) ________________________________________________________________________________ Data | Current, dynamic | Historical, static Purpose | Transaction-driven | Analysis-driven Detail | Primitive, detailed | Summarized, consolidated User | Operational users | Managerial users Decision Support | Day-to-day operations | Strategic decisions
223
How does Data Warehouse Architecture work?
ETL Process: 1. Extract: Retrieve data from source systems. 2. Transform: Convert it into a usable format. 3. Load: Store it into the warehouse.
224
Define OLAP (Online Analytical Processing)
Uses multi-dimensional data views to provide quick access to strategic insights Key Features: OLAP cubes analyze data across dimensions (time, region, product etc.)
225
What is Data Mining?
Process of discovering patterns, trends, and correlations in data using: 1. Statistical methods. 2. Mathematical models. 3. Artificial intelligence techniques.