DataBases Flashcards

Question 1

Q

Definition of a database

Answer

A

an organised collection of
related data

Question 2

Q

Definition of DBMS

Answer

A

Database Management System (DBMS):
software that manages & controls access to
the database

Question 3

Q

What is a database application?

Answer

A

Database Application: program that interacts
with the database at some point in its
execution

Question 4

Q

What is a database system?

Answer

A

Database System: a collection of programs that interact with the database

Question 5

Q

Define a file based & database system and the difference between them

Answer

A

File-based system: A collection of application programs that perform services for the end-users. Each program defines and manages its own data.
* Database system: A shared collection of logically related data and its description, designed to meet the information needs of an organisation.

Question 6

Q

What are advantages and disadvantages of a file-based approach?

Answer

A

Advantages:
* User control:
― End user has total control of the stored data
― Can modify application or data at will
* Applications and data file can be optimised for particular
tasks

Disadvantages:
* Data Duplication
* Data Dependence
* Incompatible file formats
* Proliferation of application programs (APPS)

Question 7

Q

Define data duplication

Answer

A

A record is duplicated in > 1 files
― A change in details requires updates to all relevant files
* Integrity of records may be compromised if the files are not all updated
― i.e. Which file holds the correct information?

Question 8

Q

Define data dependence and the two types

Answer

A

Definition:

Data dependence: Refers to the relationship between database operations based on their reliance on each other’s results.
Types:

RAW Dependence (Read-after-Write):

Operation depends on the result of a preceding write operation.
Example: Retrieving data after it has been updated.
WAR Dependence (Write-after-Read):

Write operation depends on the result of a preceding read operation.
Example: Updating data based on a prior retrieval.

Question 9

Q

What is the proliferation of apps and name a problem that occurs with it

Answer

A

In time, users might want to get more information from their data files than what they originally anticipated when they initially wrote the programs
* To do so they would need to add more application programs to handle the new queries
* This leads to a proliferation of files and application programs which each user has to handle

This can be addressed via the use of DBMS

Question 10

Q

Name 5 advantages of using the DBMS

Answer

A

Control of data redundancy
Data Consistency
Sharing of Data
Improved data integrity
Improved Security
Enforcement of Standards
Economy of Scale
Balance of conflicting requirements
Improved data accessibility and responsiveness
Increased Productivity
Improved maintenance through data independence
Increased concurrency
Backup and Recovery services

Question 11

Q

explain how the DBMS helps with the economy of scale

Answer

A

Combining the entire organisation’s data into one database and
creating a set of applications that work on this one source
results in cost savings
― For instance, only one information system department and one set of computer systems in the organisation will be responsible for the
database as opposed to separate information systems in the different departments
― This is much cheaper than having many separate small departmental information systems units

Question 12

Q

explain how the DBMS helps with improved maintenance through data
independence

Answer

A

A DBMS separates data management from the application
programs
― Hence if changes are made to the underlying data structure, the
DBMS can handle this without changing application programs
The DBMS will provide the necessary mapping between
application program and data stored on the database (using
system catalogue)

Question 13

Q

What are some disadvantages of DBMS (6 total)

Answer

A

Complexity
― DBMSs are complex pieces of software and hence require those
operating and using them to fully understand them if they are to be used to the best advantage
Size
― The complexity and breadth of functionality of the DBMS makes it a large piece of software that requires a large amount of memory to run efficiently
Cost of DBMS
― DBMS cost varies depending on whether it is for a single user or a big organisation. Maintenance and operations costs also need to be factored in. For large organisations with lots of data, hardware costs (e.g. hard disks, high performance processors, memory) are
considerable
Cost of conversion
― When converting from a file-based system to a DBMS, there are data conversion costs, hardware and software procurement costs, as well as costs of training and recruiting personnel
Performance
― The DBMS is written for general use. Hence some applications may not run as fast as customised file-based applications
Greater impact of failure
― Centralisation increases system vulnerability. Failure of the DBMS may result in everyone being affected within an organisation

Question 14

Q

What are the three levels present in three level database architecture(ANSI-SPARC)

Answer

A

External level
Conceptual level
Internal level

Question 15

Q

Describe the External level of databases

Answer

A

It deals with how data is presented to and accessed by end users or applications.
Users at this level define their individual views of the database, specifying what data they need and how they want it to be displayed.
Multiple external views can exist, tailored to the specific needs and requirements of different user groups or applications.
Changes to the external level, such as modifications to views or access patterns, do not affect the underlying database structure.

Question 16

Q

Describe the conceptual level

Answer

A

The conceptual level represents the abstract, logical structure of the entire database.
It defines the relationships among data elements, the constraints, and the integrity rules that must be maintained.
The schema at this level provides a comprehensive, unified view of the entire database, independent of any specific user’s perspective.
Modifications at the conceptual level, such as changes to the data model or structure, impact all external views that rely on it.
The conceptual level acts as a bridge between the external and internal levels, providing a logical representation that abstracts the physical storage details.

Question 17

Q

Describe the Internal level

Answer

A

The internal level is the lowest layer, dealing with the physical storage and retrieval of data.
It is concerned with how data is stored, indexed, and processed at the machine level.
The internal schema defines the storage structures, access paths, and techniques used for efficient data retrieval.
Changes at this level, such as modifications to indexing or storage methods, do not affect the external or conceptual levels.
The internal level provides a level of abstraction over the physical storage details, allowing changes in storage technology without impacting the higher levels.

Question 18

Q

Define data abstraction

Answer

A

Hide storage details and present the users with a conceptual view of the database
Data abstraction in databases refers to the process of simplifying complex details and presenting a high-level view of data to users and applications. It involves creating abstract representations of data that hide the implementation details and complexities of the underlying database structure

Question 19

Q

Define data independence

Answer

A

Data independence in databases refers to the ability to modify the database schema or organization without affecting the applications that use the data.

Question 20

Q

What is the difference between physical and logical data independence

Answer

A

Physical data independence hides how data is stored, allowing changes to storage structure without affecting users or applications while Logical data independence shields users from changes in the database structure, permitting modifications without disrupting user interactions.

Question 21

Q

what is a database schema

Answer

A

Database schema: the description of the structure of the
database
Database schema: the description of the structure of the
database

Question 22

Q

What is a conceptual data model?

Answer

A

Identifies the high level data structure
* Independent of DBMS, application programs, physical considerations.

Question 23

Q

What is a Logical data model

Answer

A

Describes the data in terms of data structures such as
― Records (nodes) and sets (edges)
e.g., Hierarchical model, network model
― Relational tables and columns
E.g., Relational model
― Object-oriented classes
E.g., Object-oriented model, object-relational model
Independent of a particular DBMS product and storage
technology.

Question 24

Q

What is a physical data model

Answer

A

Describe how data is stored in the computer, representing
record structures, record orderings, and access paths.
Highly dependent on the target DBMS

Question 25

Q

Define a relationship and what a degree of relationship means

Answer

A

Relationship (type): meaningful associations among
two or more entities.
* Degree of Relationship: The number of participating
entity types in a relationship e.g Binary if 2, ternary if three and Quaternary if 4

Question 26

Q

What is a recursive relationship?

Answer

A

Degree one: recursive
― A relationship between occurrences of the same entity type
― E.g., AcademicStaff (professor) supervises AcademicStaff (lecturer).

Question 27

Q

Define a candidate Key and a primary key

Answer

A

Candidate Key: The minimal number of attribute(s) whose
value(s) uniquely identify each entity occurrence
― A candidate key can not contain a null

Primary Key: The candidate key that is selected to uniquely identify
each occurrence of an entity type
* Principles of choosing the primary key:
― Attribute length
― Minimal number of attributes required
― The future certainty of uniqueness

Question 28

Q

What is a strong/weak entity type?

Answer

A

Weak Entity Type: existence-dependent on some other
entity
― Each entity occurrence cannot be uniquely identified using only the attributes associated with that entity type

strong is not that ig

Question 29

Q

Define multiplicity

Answer

A

Multiplicity: the number of possible occurrences of an
entity type that may relate to a single occurrence of an
associated entity type through a particular relationship

e.g that 1..* stuff

Representation of Multiplicity
Constraint
Meaning
0..1 Zero or one entity occurrence
1..1 (or just 1) Exactly one entity occurrence
0..* (or just ) Zero or many entity occurrences
1.. One or many entity occurrences
3..7 Minimum of 3 up to a maximum of 7entity
occurrences
1, 5, 10-12 1 or 5 or 10 to 12 entity occurrences

Question 30

Q

What is cardinality and Participation in multiplicity ?

Answer

A

Cardinality: the maximum values for the multiplicity ranges
on either side of the relationship.
― One-to-one (1:1)
― One-to-many (1:)
― Many-to-many (:*)

Participation: the minimum values for the multiplicity
ranges on either side of the relationship
― Optional participation (if 0)
― Mandatory participation (if 1 or more)

Question 31

Q

Which side of the relationship does participation appear on?

Answer

A

The participation of entities appears on the opposite side of the relationship - google if u confused it’s kinda wierd

Question 32

Q

What is a fan trap and when may it occur?

Answer

A

A fan trap is a type of join path between three tables when a “one-to-many” join links a table which is in turn linked by another “one-to-many” join. The fanning out effect of “one-to-many” joins can cause incorrect results to be returned when a query includes objects based on both tables.

Question 33

Q

What is a chasm trap and when may it occur?

Answer

A

A chasm trap is a type of join path between three tables when two “many-to-one” joins converge on a single table, and there is no context in place that separates the converging join paths.

Question 34

Q

What are 9 steps of conceptual database design

Answer

A

[Step 1] Identify entity types
[Step 2] Identify relationship types
[Step 3] Identify and associate attributes with entity and
relationship types
[Step 4] Determine primary keys for entity types
[Step 5] Identify the multiplicity constraints
[Step 6] Consider use of enhanced modeling (optional)
[Step 7] Check model for redundancy, traps
[Step 8] Validate conceptual model against user
transactions
[Step 9] Review conceptual data model with user

Question 35

Q

Explain specialization in terms of Entities

Answer

A

Specialisation: The process of maximizing the differences between members of an entity by identifying their distinguishing characteristics. e.g making more entities from one(Manager, SalesPersonnel and Secretary from Staff)

Question 36

Q

Explain generalization in terms of Entities

Answer

A

Generalisation: The process of minimizing the differences between entities by identifying their common characteristics.

Question 37

Q

What are some properties of relations

Answer

A

A relation has a name that is distinct from all other relation
names in the database
Each cell of the relation contains exactly one atomic (single) value
Each attribute has a distinct name
The values of an attribute are all from the same domain
Each tuple is distinct – there are no duplicate tuples
The order of attributes has no significance
The order of tuples has no significance

Question 38

Q

What is a relation schema and a relation instance

Answer

A

Relation schema: a named relation defined by the relation name
(table name) and attributes.
― 𝑅(𝐴1, 𝐴2, … , 𝐴𝑛)
Student (stdID, stdEmail, stdFstName, stdSecName, gender, programCode)
Primary Key: stdID, Alternate Key: stdEmail
Foreign Key: programCode references Program(programCode)

Relation instance/state: a set of tuples from a relation schema.
― r(R)
^This is like an actual database table

Question 39

Q

Name the three Integrity constraints put on attribute values

Answer

A

― Entity integrity
― Referential integrity
― Domain constrain

Question 40

Q

Define Entity Integrity

Answer

A

― Uniqueness: No two tuples can have identical values for candidate keys.
― Not-Null: No attribute of a candidate key can be a NULL.
* Null: A value of an attribute is currently unknown or not applicable
for this tuple.

Question 41

Q

Define referential integrity

Answer

A

Referential integrity: If a foreign key exists in a relation,
― either the foreign key value matches a primary key value in its home relation
― or the foreign key value is NULL

Question 42

Q

Define Domain constraints

Answer

A

Domain Constraints: enables the RDBMS to carry out checks
on data errors and to determine the range of operations
that can be carried out on the domain
― E.g., if the attribute Age is specified to be in the range16-30, any number outside would give an error
― E.g., If the attribute Name is to hold a set of characters, then any
multiplication or division operations on the attribute values are illegal

Question 43

Q

Define views

Answer

A

View: a subset of base relation(s)
Base relation: A named relation
corresponding to an entity in the conceptual
schema, whose tuples are physically stored
in the database

Question 44

Q

Name some features of views

Answer

A

Virtual/derived relation. It does not necessarily exist in the
database, but can be produced upon request.
* Dynamic. The changes made to the base relation(s) that affect the
view are immediately reflected in the view.
* Generated by applying appropriate relational operations (e.g., 𝜎,
𝜋, x, ⋈, ….)

Question 45

Q

What is the purpose of views in databases

Answer

A

Security mechanism
Customize user’s need
Simplify complex operations on the base relations

Question 46

Q

How are composite attributes represented in relation schemas

Answer

A

For composite attributes, include only the constituent simple attributes.
So for example staffName which is a composite attribute containing staffFName and staffLName would be ignored and the relation schema would look like this:
AcademicStaff(staffID, staffFName,staffLname, academicPosition, gender,
DOB)

Question 47

Q

What is Normalisation and what is the purpose of it?

Answer

A

Normalization: A technique for producing a set of relations with
desired properties.
― remove redundancy
― remove potential for insertion, modification, deletion anomalies

Question 48

Q

If A and B are attributes of relation R, when is B functionally dependent on A? Out of A and B which one is the determinant

Answer

A

if each value of A is associated with exactly one value of B, this is denoted A -> B
In this case A is the determinant

Question 49

Q

If A -> B and B -> C where is the transitive dependency

Answer

A

C is transitively dependent on A via B … A -> C

Question 50

Q

Given A -> B what would make this a partial functional dependency?

Answer

A

if by removing some attribute(s) from A, the dependency still
holds.

Question 51

Q

Given A -> B what would make this a full functional dependency

Answer

A

B is not functionally dependent on any subset of A.

Question 52

Q

What should be done to a database to put it into first normal form?

Answer

A

Repeating groups must be removed, a primary key must exist and duplicate rows must be removed. Additionaly there must be no order dependency

Question 53

Q

What should be done to a database to put it into second normal form?

Answer

A

All partial dependencies must be removed

Question 54

Q

Define a partial dependency

Answer

A

Partial Dependency occurs when a non-prime attribute is functionally dependent on part of a candidate key.

Question 55

Q

What should be done to a database to put it into third normal form

Answer

A

All transitive dependencies must be removed

Question 56

Q

Define a transitive dependency

Answer

A

A transitive dependency refers to some non-prime attribute other than the candidate key that depends on another non-prime attribute that is dependent entirely on the candidate key.
^this is hard to remember but it’s not that hard to actually do

Show_ID Telecast_ID Telecast_Type CD_Cost ($)
F08 S09 Thriller 50
F03 S05 Romantic 30
F05 S09 Comedy 20

In this case Telecast_Type needs to be removed cos it depends on telecast ID which in turn depends on Show ID

Question 57

Q

What needs to occur for a database to be in Boyce-Codd Normal Form(BCNF)

Answer

A

Every determinant must be a candidate key
or
Every non-prime attribute must be functionally dependent on the entire primary key

Question 58

Q

What are the Unary operations that are represented by these symbols 𝜋, 𝜎, 𝜌

Answer

A

― Projection (𝜋 )
― Selection (𝜎)
― Rename (𝜌)

Question 59

Q

What are the set operations that are represented by these symbols
∪, ∩, −, X

Answer

A

― Union (∪)
― Intersection (∩)
― Set difference (−)
― Cartesian Product (X)

Question 60

Q

What are the Binary operations represented by these symbols
⋈, (⋉, ⋊), ÷

Answer

A

― Join (⋈)
― Outer Join (⋉, ⋊)
― Division (÷)

Question 61

Q

How is the projection operator used?𝜋

Answer

A

𝝅<𝒂𝒕𝒕𝒓𝒊𝒃𝒖𝒕𝒆>(𝑹): Produce a new relation that has only some of
𝑅‘s columns</𝒂𝒕𝒕𝒓𝒊𝒃𝒖𝒕𝒆>

e.g
𝝅ID, Name, Salary(Lecturer)

Removes duplicate tuples

Question 62

Q

how is the selection operator used? 𝜎

Answer

A

𝝈<𝒔𝒆𝒍𝒆𝒄𝒕>(𝑹): Produce a new relation that contains only
those tuples of R that satisfy the condition.</𝒔𝒆𝒍𝒆𝒄𝒕>

e.g
𝝈𝒅𝒆𝒑𝒕𝑰𝑫=′𝟎𝟐′∧ 𝒂𝒈𝒆>𝟓𝟎(𝑳𝒆𝒄𝒕𝒖𝒓𝒆𝒓) or 𝝈barName=’Sues’(Sells)

Question 63

Q

How is the rename operator used? 𝜌

Answer

A

𝝆𝑺 𝑬 : Rename the expression E to S
* 𝝆𝑺(𝒂𝟏,𝒂𝟐,…,𝒂𝒏) 𝑬 : Rename the relation E to S,
and also re-names the attributes as 𝑎1, 𝑎2, … , 𝑎𝑛.

e.g
Q: List all the professors, rename the output relation as Professor:
𝜌𝑃𝑟𝑜𝑓𝑒𝑠𝑠𝑜𝑟(𝜎𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛=′𝑝𝑟𝑜𝑓𝑒𝑠𝑠𝑜𝑟′(𝐿𝑒𝑐𝑡𝑢𝑟𝑒𝑟))

Question 64

Q

How is the union operator used? ∪

Answer

A

Will take the union of both tables (R U S ), both tables must have the same relation schema for this to work

Answer 65

A

Will take the intersections of both tables (R ∩ S ), both tables must have the same relation schema for this to work

Answer 66

A

Will take the difference (R - S), records that exist in both R and S are remove from the R table and then that’s outputted. Both tables must have the same relation schema for this to work
𝑅 − 𝑆 ≠ 𝑆 − 𝑅

Answer 67

A

This one is kinda weird
* 𝑹 × 𝑺: Produces a relation that is the
concatenation of every tuple of relation 𝑅 with
every tuple of relation 𝑆.
―If both 𝑅 and 𝑆 have the same attribute 𝐴, then use: 𝑅. 𝐴
and 𝑆. 𝐴

Basically just stick S on the back of R and if they have the same attribute then call it R.attribute and S.attribute. Like for example Bar.barName and Sells.barName

Answer 68

A

― Theta join (Θ-join)
― Equijoin
― Natural join
― Outer join

Answer 69

A

𝑹 ⋈<𝒋𝒐𝒊𝒏> 𝑺: Defines a relation that contains
all combinations of tuples from 𝑅 and 𝑆 that satisfy the
join condition</𝒋𝒐𝒊𝒏>
𝑅 ⋈<𝑗𝑜𝑖𝑛> 𝑆 = 𝜎<𝑗𝑜𝑖𝑛>(𝑅 × 𝑆)</𝑗𝑜𝑖𝑛></𝑗𝑜𝑖𝑛>

Answer 70

A

Equijoin: Produces all the combinations of tuples
from that satisfy a join condition with only
equality comparisons.
―Equijoin is a special case of theta-join.

Answer 71

A

𝑹 ⋈ 𝑺: an equijoin of the two relations R and S
over all common attributes.

Answer 72

A

SQL is a standard language for interacting with a relational
database

Answer 73

A

It retrieves and updates statements:
-Select
-Insert
-Update
-Delete

Answer 74

A

Define DB structure & control access:
― CREATE
― ALTER
― DROP
― GRANT
― REVOKE

Answer 75

A

Comparison: =, <> (!=), <, <=, >=, >, OR, AND, NOT
Range: BETWEEN … AND, NOT BETWEEN … AND
Set membership: IN, NOT IN
Pattern match: LIKE, NOT LIKE
IS NULL, IS NOT NULL

Answer 76

A

COUNT
MIN
MAX
SUM
AVG

Answer 77

A

SELECT *
FROM Products
WHERE Price BETWEEN 10 AND 15;
-Note you can add a not in front of between if needed

Answer 78

A

SELECT *
FROM Customers
WHERE Country IN (‘Germany’, ‘France’, ‘UK’);
-Note you can add a not before IN for the opposite result

Answer 79

A

SELECT *
FROM Customers
WHERE CustomerName LIKE ‘a%’;

Answer 80

A

% – represents zero, one, or multiple characters
_ – represents a single character

Answer 81

A

SELECT CustomerName
FROM Customers
WHERE Address IS NULL ;

-Not it must be IS NULL or IS NOT NULL, the IS is necessary

Answer 82

A

COUNT
MIN
MAX
SUM
AVG

Answer 83

A

SELECT COUNT(DISTINCT Country)
FROM Customers;

Answer 84

A

SELECT MIN(Price), MAX(Price), AVG(Price)q
FROM Products;

Answer 85

A

GROUP BY: forms group of rows with the same column value.
HAVING: filters the groups subject to some condition

This is an example of where they are used in a query

SELECT [DISTINCT] column_name(s)
FROM table_name(s)
[WHERE condition]
[GROUP BY column_name(s)]
[HAVING condition]
[ORDER BY column_name(s)];

Answer 86

A

SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;

Answer 87

A

― The grouping columns
― Aggregate functions.

An example is this:
SELECT COUNT(CustomerID), Country, City
FROM Customers
GROUP BY Country, City;

Answer 88

A

When aggregate functions are involved;
-HAVING clause always includes at least one aggregate function.
-WHERE clause can not include aggregate function.

Answer 89

A

An alias is a temporary name assigned to a table or a column for the duration of a query. Aliases are used to make the output of a query more readable or to shorten lengthy table or column names

Answer 90

A

Subquery (or nested query) : a complete SELECT statement
embedded in another query.
― In WHERE, or HAVING clause

SELECT *
FROM Customers
WHERE Country = (SELECT Country
FROM Customers
WHERE CustomerName= ‘Ernst Handel’);

Answer 91

A

Used in cases where the subquery will output more than one value
If the inner SELECT statement outputs a single value, use =
If multiple values, use IN

Answer 92

A

The EXISTS operator returns true if the subquery returns one or more rows. It doesn’t matter what the actual values of the rows are; the existence of any row is sufficient for the EXISTS condition to be true.

Answer 93

A

Inner joins in SQL are used to combine rows from two or more tables based on a related column between them. The result includes only the rows where the specified condition is met.

Answer 94

A

Outer joins in SQL are used to retrieve rows from one or more tables, even if there is no match between the joined tables. Unlike inner joins, which return only the matching rows, outer joins return all rows from at least one of the tables, with the non-matching rows filled with NULL values for columns from the table that does not have a corresponding match.

Answer 95

A

― LEFT JOIN: returns all rows from the left table
(table1), with the matching rows in the right table
(table2). The result is NULL in the right side when
there is no match.

― RIGHT JOIN: …

― FULL JOIN: Retain both the left table and right table
rows. The result is NULL for those no matches.

Answer 96

A

SELECT o.OrderID, c.CustomerName, o.OrderDate
FROM Orders o
Natural JOIN Customers c;

Answer 97

A

A self join in SQL is a specific case where a table is joined with itself. This is useful when you want to compare rows within the same table, treating it as if it were two separate tables. To perform a self join, you typically use aliases to differentiate between the columns from the same table

Answer 98

A

Insert adds new rows to the table
Syntax is:
INSERT INTO <table>
(Column1, Column2)
VALUES
(19, ‘ThatValue’),
(23, ‘ThisValue’);

or with a whole record the columns are not needed
INSERT INTO Orders
VALUES
(10447, 90, 6, ‘2017-05-25’,3);

Answer 99

A

Example 1 : Give all the products’ price a 3% increase
UPDATE Products
SET Price=Price*1.03;

Example 2

UPDATE Customers
SET ContactName = ‘Alfred Schmidt’,
City= ‘Frankfurt’
WHERE CustomerID = ‘001’;

Answer 100

A

DELETE FROM <table>
WHERE <condition></condition>

Answer 101

A

CREATE TABLE IF NOT EXISTS Tasks (
task_id INT AUTO_INCREMENT PRIMARY KEY,
title VARCHAR(255) NOT NULL,
start_date DATE,
due_date DATE,
status TINYINT NOT NULL,
description TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

IF NOT EXISTS is optional

Answer 102

A

Numeric types
-INT, TINYINT, DECIMAL, FLOAT, DOUBLE
string types;
-CHAR, VARCHAR, TEXT
Dat/time types
-DATE
-TIME
-DATETIME

Answer 103

A

– NOT NULL
– UNIQUE
– PRIMARY KEY
– FOREIGN KEY
– CHECK

Answer 104

A

NOT NULL: ensures values stored in a column are
not NULL.

Answer 105

A

UNIQUE: ensures values in a column or a group of
columns are different
used like :

CREATE TABLE table_name(
…,
column_name data_type UNIQUE,
…
);

Answer 106

A

CREATE TABLE table_name(
…,
[CONSTRAINT constraint_name] UNIQUE(Column_list)
);

Answer 107

A

CHECK: ensures that values stored in a column or group
of columns satisfy a Boolean expression.

used like :
CREATE TABLE table_name(
…,
column_name data_type CHECK(Boolean_expression),
…
);

Answer 108

A

PRIMARY KEY
― Implicitly means NOT NULL + UNIQUE
― A table can have one and only one primary key.

CREATE TABLE table_name(
…,
column_name data_type PRIMARY KEY,
…
);

Answer 109

A

FOREIGN KEY: refers to the PRIMARY KEY in another table

-The domain of the foreign key must be the same as the domain used in the referenced table

CREATE TABLE Products(
productId INT AUTO_INCREMENT PRIMARY KEY,
productName VARCHAR(100) NOT NULL,
categoryId INT,
CONSTRAINT fk_category FOREIGN KEY (categoryId)
REFERENCES categories(categoryId)
) ENGINE=INNODB;

Answer 110

A

― CASCADE: If a row from the parent table is deleted or updated, the
values of the matching rows in the child table automatically deleted or
updated.
― SET NULL: When referenced data in the parent key is deleted or updated,
all rows in the child table that depend on those values have their foreign
keys set to NULL.
― RESTRICT/NO ACTION: Reject the delete/update operation from the
parent table, if a row from the parent table has a matching row in the
child table.

Answer 111

A

ALTER TABLE Persons
ADD COLUMN dateOfBirth date NOT NULL,
ADD COLUMN joinDate date default CURRENT_DATE();

ALTER TABLE Persons
DROP COLUMN dateOfBirth;

Answer 112

A

ALTER ALTER TABLE Vehicles
MODIFY year SMALLINT NOT NULL,
MODIFY color VARCHAR(20) NULL AFTER make;

Answer 113

A

Add a table constraint:
ALTER TABLE table_name
ADD [CONSTRAINT constraint_name] UNIQUE/CHECK/PRIMARY KEY/FOREIGN KEY (…);

drop a table constraint
ALTER TABLE table_name
DROP CONSTRAINT constraint_name;

drop primary/ foreign key
ALTER TABLE table_name
DROP PRIMARY KEY;

ALTER TABLE table_name
DROP FOREIGN KEY constraint_name;

Answer 114

A

ALTER TABLE Persons
ADD CONSTRAINT UC_person UNIQUE(ID,lastName);

Answer 115

A

DROP TABLE Shippers;

Answer 116

A

Auto-increment allows a unique number to be generated
automatically when a new record is inserted into a table.
― Typically for primary key
― In MySQL, assign AUTO_INCREMENT to a column

example:

CREATE TABLE Persons (
personId int NOT NULL AUTO_INCREMENT PRIMARY KEY,
lastName varchar(255) NOT NULL,
firstName varchar(255),
Age int
);

Answer 117

A

ALTER TABLE Persons AUTO_INCREMENT 100;
^add a number on the end ¯_(ツ)_/¯

Answer 118

A

A data structure such as B-Tree that improves the speed of
data retrieval on a table at the cost of additional writes and
storage to maintain it.

Note: Updating a table with indexes takes more time than updating a table without. So, only create indexes on columns that will be frequently searched.

Answer 119

A

INDEX ( attributeName )
like :
CREATE TABLE Persons (
personid int NOT NULL AUTO_INCREMENT PRIMARY KEY,
lastName varchar(255) NOT NULL,
firstName varchar(255),
age int,
INDEX (lastName)
);

add to existing attribute:

CREATE INDEX idx_persons ON Persons(lastName);

Answer 120

A

CREATE UNIQUE INDEX emp_u1
ON EMP (eName, job);

This will Index and ensure that each combination of eName and job are unique.

Answer 121

A

CREATE [OR REPLACE] VIEW ViewName [column1,…] AS
SELECT …;

example horizontal view
CREATE VIEW Emp_D20 AS
SELECT * FROM Emp
WHERE deptNo = 20;

example vertical view
CREATE VIEW Emp_NoSalary AS
SELECT empNo, eName, job, mgr, hireDate, comm, deptNo
FROM Emp;

Answer 122

A

CREATE VIEW Num_Emp_Job(deptNo,job,cnt) AS
SELECT deptNo,job,COUNT(*)
FROM Emp
GROUP BY deptNo, job
ORDER BY deptNo;

Answer 123

A

All updates to a base table are immediately reflected in all views
that encompass that base table. Similarly, we may expect that if a
view is updated then the base table(s) will reflect that change.

― For a view to be updatable, the DBMS must be able to trace any row or column back to its row or column in the base table. e.g SELECT statement defining the view must be simple, meaning it doesn’t involve complex expressions, aggregates, GROUP BY, DISTINCT, JOINs, or subqueries

Answer 124

A

COMMIT
― Causes all outstanding changes that you have made to become
permanent.
ROLLBACK
― Causes all uncommitted changes to be discarded

Answer 125

A

Physical realization of the database and application designs
― Use DDL to create database schemas
― Use DML to populate the data
― Develop the application programs
Database transactions are implemented using the DML, possibly
embedded in a host programming language.

Answer 126

A

The process of producing a description of the implementation
of the database on secondary storage.
― Describe the base relations, file organizations, and indexes used to
achieve efficient access to the data, and any associate integrity
constraints and security measures.

Answer 127

A

― Logical database design
* Minimally in 3rd normal form
― Transaction analyses
* Most frequent transactions
* Most complex or resource-insensitive transactions
* Distributions of transactions, over time
* Most critical transactions to the applications
― Which DBMS software to use

Answer 128

A

STEP1: Translate the logical data model for target DBMS
― 1.1 Design base relations
― 1.2 Design representation of derived data
― 1.3 Design general constraints
STEP2: Design file organizations and indexes
― 2.1 Choose file organizations
― 2.2 Choose indexes
― 2.3 Estimate disk space requirement
STEP3: Design user views
STEP4: Design security mechanisms
STEP5: Consider controlled redundancy
STEP6: Monitor and tune the operational system

Answer 129

A

Defining Tables and Columns: including naming objects,
choosing data types and lengths, and handling null, default values
Defining Primary Keys: may considering the use of surrogate keys
Handling referential integrity for foreign keys.

^This is basically this thingy , OrderDetails(OrderID, ProductID, Quantity)

Answer 130

A

A surrogate key is a unique identifier assigned to each record or row in a database table. Unlike natural keys, which are based on existing data attributes and have some inherent meaning, surrogate keys are system-generated and lack any meaningful information about the data they represent. Surrogate keys are typically used as primary keys in database tables.

Answer 131

A

Remember derived data is the data that’s calculated based on the database and not physically stored

Use a generated column
Create a view
Use triggers*
Triggers are stored programs executed automatically to
respond to specific events such as inserting, updating or
deleting data.
Calculate it each time

Answer 132

A

A database is stored as a collection of files. A database file is
partitioned into fixed-length storage units called blocks.
What is a block?

Block/Page: A contiguous sequence of sectors from a single
track

Answer 133

A

-Primary storage is volatile ( e.g main memory/cache memory
-Primary storage is where the operations happen
-Data blocks can be moved from primary to secondary storage
-Secondary storage is where the entire database is stored e.g(magnetic disk, flash drives etc)
-Stable storage is where the database is backed up (e.g RAID or a Redundant Array of Independent Disks)
-Stable storage is always non-volatile with independent failure modes

Answer 134

A

The physical arrangement of data in a file

Answer 135

A

― Heap (unsorted) files: Records are placed on disk in no particular
order
― Sequential (ordered) files: records are ordered by the value of a
specific attribute.
― Hash files: Records are placed on disk according to a hash function

Answer 136

A

Clustered Index:
With a clustered index the rows are stored physically on the
disk in the same order as the index.
― Only one clustered index is allowed for each table
― If the indexed attribute chosen is a key of a relation, the index is
called primary index.

Non-Clustered(secondary) index:
An index which maintains a
logical ordering of data rows without altering the physical
ordering of the rows.
― A table could have several secondary indexes
― Foreign keys are good candidates for non clustering indexes.

Answer 137

A

Denormalization is the process of trying to improve the read
performance of a database, at the expense of losing some write
performance, by adding redundant copies of data or by
grouping data.

― If performance is unsatisfactory and a relation has a low update and a very high query rate.

Answer 138

A

They can hold data derived in columns from other tables - stops it having to be derived freshly each time

Decompose very large relations into smaller and more
manageable pieces.

Answer 139

A

Transaction: an executing program that forms a logical unit
of database operations
A transaction includes one or more database operations, e.g.,
insertion, deletion, updating, retrieval.

Answer 140

A

Atomicity: All or nothing
e.g shouldn’t take money from A without giving it to B
Consistency: transforms the database from one
consistent state to another consistent state
e.g money isn’t lost or gained

Isolation: partial effects of incomplete transactions
should not be visible to other transactions
e.g. other transactions shouldn’t see such a change until completion.

Durability: successfully committed transactions are
permanently recorded in the database, not lost, even in
the event of a system failure
e.g. After completion, such a change in A and B is saved in the database.

Answer 141

A

START TRANSACTION;

<operations>
COMMIT;

e.g

START TRANSACTION;

UPDATE Accounts
SET balance = balance - 100
WHERE account = 'A';

UPDATE Accounts
SET balance = balance + 100
WHERE account = 'B’;

COMMIT;
</operations>

Answer 142

A

Schedule: an ordering of operations for concurrent transactions. There are Serial schedules and Non-serial schedules

Serial schedule: a schedule in which the operations for concurrent
transactions are not interleaved.
* But, unacceptable in practice.

Answer 143

A

Serializability and Recoverability

Answer 144

A

Conflicting operations: if two operations belong to different
transactions, access the same database item, and at least one
operation is a writing operation.

Answer 145

A

A schedule is considered recoverable if, in the presence of failures (such as crashes or errors), it is possible to restore the database to a consistent state.

a schedule, where, for each pair of transactions Ti and Tj, if Tj reads a
data item previously written by Ti, then Ti should commit before Tj.

Answer 146

A

Concurrency Control is the process of managing simultaneous
operations on the database without having them interfere with
one another.

Answer 147

A

― Pessimistic: Assumes that conflict is likely and take steps to prevent it.
* Locking, timestamping
― Optimistic: Assumes that conflict is unlikely and only checks for it
when transaction commits

Answer 148

A

Locking: When a transaction is accessing the database, a lock may
deny access to other transactions to prevent incorrect results.
― A read operation must acquire a shared/read lock
― An update operation must acquire an exclusive/write lock

Answer 149

A

Locks can be applied to:
― The whole database
― A file
― A page/a disk block
― A record
― A field value of a record

Answer 150

A

A transaction must issue read_lock(X), or write_lock(X) before
any read(X).

Answer 151

A

― A transaction must issue write_lock(X) before any write(X)

Answer 152

A

― A transaction must issue unlock(X) after read(X) or write(X)

Answer 153

A

Using shared/exclusive locking scheme does not
guarantee serializability. More strict protocols are
required, e.g. two-phase locking (2PL).

Answer 154

A

Two-Phase Locking (2PL): All locking operations precede the first
unlock operation in a transaction.
― Growing/Expanding phase: during which new locks on items can be acquired but none can be released;
― Shrinking phase: during which existing locks can be released but no new locks can be acquired.
* Advantage: it assures serializability

Answer 155

A

It can cause a deadlock which is a circular situation where each of two (or more) transactions are waiting to acquire locks that are held by the other.

Answer 156

A

A transaction will wait for a (database defined) period
to acquire a lock.
* If this time runs out then the whole transaction is
rolled back and restarted.

Answer 157

A

A Wait-for Graph (WFG) is constructed
― node for each transaction
― directed edge from transaction T1 to transaction T2 if T1 is waiting to lock an item currently held by T2
A deadlock exists if the graph contains a cycle.

Answer 158

A

Once deadlock is detected, DBMS needs abort one or more
of the transactions.
-The transaction with the minimum cost should be rolled back
Starvation: the same transaction is always chosen as a victim,
and the transaction can never complete.

Answer 159

A

Wait-die algorithm
― only older transactions can wait for younger ones
Wound-wait algorithm
― only younger transactions can wait for older ones

Answer 160

A

Timestamp: A unique identifier created by the DBMS that
indicates the relative starting time of a transaction.
― Either a logical counter or the system clock.
* Timestamping: a concurrency control protocol that orders
transactions in such a way that older transactions get priority in
the event of conflict

Answer 161

A

Optimistic techniques assume that conflict is rare.
― Before a transaction commits, a check is performed to determine
whether conflict has occurred.
― If yes, roll back and restart.

Answer 162

A

A backup is a copy of the database taken periodically and stored in a safe place which allows the database to be restored with an acceptable loss of data

Full backup
― The whole database.
Differential backup
― All changes made since the last full backup
Incremental
― All changes made since the last incremental backup`

Answer 163

A

Recovery: The process of restoring the database to a correct
state in the event of a failure.

Two main recovery techniques
― Log-based
― Shadow paging

Answer 164

A

Log: a sequence of records, which maintains the records of
actions performed by transactions, recovery can be performed by reversing these operations

Answer 165

A

Checkpoints are scheduled at predetermined intervals and
involve the following operations:
― Write all log records in main memory to secondary storage
― Write the modified blocks in the buffers to secondary storage.
― Write a checkpoint record to the log file. This record contains
transactions which are active at the time of the checkpo

Answer 166

A

When a failure occurs,
― The system check the log file backwards from the end to the last
checkpoint.
― REDO all the transactions that committed since the last checkpoint.
― UNDO all the transactions that were active at the time of the crash.

Answer 167

A

Two types of DB updating schemes:
― Deferred update: Updates are not written to the database until after a transaction has reached its commit point.
― Immediate update: Updates are applied to the database as they occur without waiting to reach the commit point.

Answer 168

A

Maintain two tables during the lifetime of a transaction:
― A current page table
― A shadow page table

During a transaction the current page is copied into a shadow page table and the shadow page table is never modified during transaction execution. If the transaction fails the current page table is discarded and the shadow table is used instead

Answer 169

A

Advantage:
― The overhead of maintaining the log file is eliminated.
― The recovery is significantly faster since there is no need for
UNDO or REDO operations.

Disadvantage:
― Data fragmentation
― Need for periodic garbage collection to reclaim inaccessible
blocks

Answer 170

A

Caused by
– Unauthorized users
– Authorized users
– Programmers
- DBA

Answer 171

A

― Confidentiality: the protection of data from unauthorized disclosure
― Integrity: the requirement that information is protected from improper
modification.
― Availability

Answer 172

A

― Discretionary Access Control (DAC)
* The owner of the object specifies which subjects can access the object.
* Used by most commercial DBMSs
* Supported by SQL: GRANT and REVOKE

Answer 173

A

Mandatory Access Control (MAC)
* The system (not the users) specifies which subjects can access specific data objects.
* Not supported by SQL

Answer 174

A

GRANT INSERT
ON classicmodels.*
TO rfc@localhost;

Note: the opposite of GRANT is REVOKE

Answer 175

A

A subject S is only allowed to read an object O if it has a higher level of clearance than the object

― Class(O): the classification of an object O (relation, tuple, column, view, operation)
― Class(S): the clearance of a subject S (user, account, program)

Answer 176

A

Top Secret (TS) > Secret (S) > Confidential (C) > Unclassified (U)

Answer 177

A

The view mechanism provides a powerful and flexible security
mechanism by hiding parts of the database from certain users.

Answer 178

A

― Disk
― Database
― Network traffic

Answer 179

A

Rather than preventing certain actions, record who did what.
― Keep track of all operations from each user through each login
session.
― The audit trail (or system log) itself must be secured

Answer 180

A

A data storage technique that combines multiple physical disk drive
components into one logical unit
*improves reliability
* increases performance
* improves availability

Answer 181

A

Data striping: the data is segmented into equal-size partitions (the
striping unit) which are transparently distribute across multiple
disks.
― Improves overall I/O performance by allowing multiple I/Os to be serviced in parallel.

Answer 182

A

Each byte may have a parity bit associated with it that records whether the number of 1s is even or odd.

― Even parity: number of 1’s including parity bit is even
* E.g., 0110110 → 01101100, 0101100 → 01011001
― Odd parity: number of 1’s including parity bit is odd
* E.g., 0110110 → 01101101, 0101100 → 01011000
― Improves reliability

Answer 183

A

― RAID 0 : striping
― RAID 1: mirroring
― RAID 10
― RAID 5

Answer 184

A

― Pros: improved I/O performance
― Cons: complete vulnerability to drive failures

Answer 185

A

― Pros: Fast read, availability improvement
― Cons: Write throughput is always slower because every drive
must be updated. Most expensive storage solution.

Answer 186

A

― Pros: improved I/O performance, rebuilding time is fast if one drive fails.
― Cons: Still expensive because of mirroring.

Answer 187

A

― The parity data is also striped across all the disk.
― Pros: Resilience to a single drive failure, fast reading.
― Cons: Writing is slower (due to the parity needed be calculated), complex technology.

Answer 188

A

Distributed database: a collection of multiple interconnected
databases, which are
― spread physically across various locations
― communicate via a computer network.

Answer 189

A

Distribution/location transparency
― Users perceive the database as a single, logical entity.
Local autonomy
― Local data is locally owned and managed.
― Each local database operated independently.
― Local systems are able to operate effectively even if remote databases
are not available.

Answer 190

A

Homogeneous
― All sites use the same type of DBMS.

Heterogeneous
― Sites run different DBMSs.
― Translation using gateways: convert the language and model of each different DBMS into the language and model of the relational system.

Answer 191

A

Fragmentation is to divide a table into a set of smaller tables to be
stored in different sites. The subsets of the table are
called fragments.
― Horizontal
* Subsets of rows are stored at different sites.
― Vertical
* Subsets of attributes are stored at different sites.
― Hybrid (Mixed)

Answer 192

A

Correctness rules:
― Completeness: Each data item can be found in R must appear in at least one fragment.

― Reconstruction
* Horizontal: 𝑅 = 𝑅1 ∪ 𝑅2 … ∪ 𝑅𝑛
* Vertical: 𝑅 = 𝑅1 ⋈ 𝑅2 … ⋈ 𝑅𝑛

― Disjointness
* Horizontal: The fragments are disjoint.
* Vertical: The fragments are disjoint except for the primary key

Answer 193

A

Data allocation: The process of assigning each fragment to a
particular site in the distributed system.

Answer 194

A

Fully replication
― A complete copy of the DB at each site
Non-redundant allocation
― Each fragment is stored at exactly one site
Partial replication
― Certain transactions that access particular parts of the DB are mostly submitted at a particular site, that fragment can be allocated at that site only.
― Data that is accessed at multiple sites can be replicated at those sites. If many updates are performed, it may limit replication.

Answer 195

A

Reflects organizational structure
Easier expansion
Increased reliability and availability
Better response/Improved performance

Answer 196

A

Complex to design and implementation
Harder to control security and integrity
Lack of standard and experience

Answer 197

A

Data warehouse is a database, which
– is kept separate from the organization’s operational database,
– no frequent updating
– possesses consolidated historical data
– helps the organization to analyse its business

Answer 198

A

― Subject Oriented − e.g., a sales data warehouse, or a credit card
transaction data warehouse, etc.
― Integrated − data combined from multiple sources.
― Time Variant − looking for changes over time
― Non-volatile − the previous data is not erased when new data is added to it. Frequent changes in operational database is not reflected in the data warehouse.

Answer 199

A

Online Analytical Processing (OLAP):
― Uses multi-dimensional view of aggregate data to provide quick access to strategic information