Database Theory and Design Flashcards

1
Q

Define a Database (DB)

A

An organized collection of related data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Define a Database Management System (DBMS)

A

Software that manages and controls access to the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Define Database application

A

Program that interacts with the database at some point in its execution

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Define a Database System

A

A collection of programs that interact with the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are two ways of storing and managing data?

A
  1. File-based System
  2. Database System
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Define a File-based System

A

A group of software tools that help users do different tasks. Each program stores and manages its own data separately

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Define a Database System

A

Software that helps store, manage, and organize data in an easy-to-access way. It allows users to save, update, and retrieve data quickly and securely

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are the advantages of a File-based approach?

A

User Control
- End user has total control of the stored data
- Can modify application or data at will

Applications and data file can be optimised for particular tasks

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are the disadvantages of a File-based approach?

A
  • Data duplication
  • Data dependence
  • Incompatible file formats
  • Proliferation of application programs (APPS)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is Data Duplication

A

If a record is duplicated in each of the 3 files, a change to the record requires a change to all 3 files.

Integrity of student records may be compromised if the files are not all updated - which file holds the correct information.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is Data Dependence

A

The way data is stored affects how it can be used, making the data and the program tightly connected

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are imcompatible file formats

A

If application programs are separatly implemented to data files, then the data files may be incompatible with the application program

Meaning the data files would have to be translated to fit the application program

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is the Proliferation of application programs?

A

If you wanted more information out of data files than initially anticipated

You would have to add more application programs to handle new queries

Leading to the proliferation of files and application programs that each official has to handle

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

How does a database approach address the limitation of an application program?

A

The data is stored in a DB, which is then accessed by DBMS.

When officials need info from the DB they write queries in SQL to communicate with the DBMS which in turn gives them what they need

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is a DBMS

A

Software that interacts with users’ application programs and the DB

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Define a Database Management System (DBMS)

A

Software system that enables the user to define, create, maintain and control access to the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What does an End-user do?

A

Run applications to perform specific database operations

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What does an Application Developer do?

A

Use a programming language to provide the required functionality for the end users

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What does a Database Designer do?

A

Ensures the database is well-organized, secure, and works efficiently

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What does a Database Administrator (DBA) do?

A

Responsible for implementation and monitoring of the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What are the 13 advantages of a DBMS?

A
  • Control of data redundancy
  • Data Consistency
  • Sharing of data
  • Improved data integrity
  • Improved Security
  • Enforcement of Standards
  • Economy of Scale
  • Balance of conflicting requirements
  • Improved data accessibility and responsiveness
  • Increased Productivity
  • Improved maintenance through data independence
  • Increased concurrency
  • Backup and Recovery services
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What is Control of Data Redundancy?

A

Since all data is now stored in a single database, there are no unnecessary multiple copies of data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What is Data Consistency?

A

Only one copy of each data item

(changes only have to be made to 1 item, therefore if there was multiple and they werent all changed data wouldnt be consistent)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

How would an organization share data to people inside it?

A

Since data is common to the organization, it’s easy for all authorized users to have access to the same data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

How would you improve data integrity in a database?

A

Organizational data is stored in one place, it’s easy to enforce consistency rules that shouldn’t be violated

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

How would you improve security in a database?

A

Database administrator can easily put in place security and access measures using the DBMS

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

What data standards could be enforced in a database?

A

Since all data is in the same database, its easy to enforce data standards such as:
- Data formats
- Naming conventions
- Documentation standards
- Update procedures
- Access rules

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

How is Economy of Scale used in Databases?

A

Combining an entire organization’s data into one database and creating a set of applications that work on this one source results in saving costs

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

In a database how would you balance conflicting requirements?

A

Since the DBA has an organizational-wide view of all operations, they can optimize the design and operational use of the database to provide the best use of resources as a whole as opposed to satisfying one department at the expense of another

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

What is improved data accessibility and responsiveness in a database?

A

Common database provides easy data access to all authorized people within orgranization

Also, the DBMS provides SQL to enable end-users to easily construct ad hoc queries without detailed programming knowledge

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

How could productivity be improved in a database?

A

Can be improved by using a DBMS, which handles low-level file operations and provides SQL for user-friendly query development, reducing the need for extensive programming.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

What is improved maintenance through data independence in databases?

A

A DBMS separates data management from application programs
- so if changes are made to underyling data structure, the DBMS doesnt have to change application programs.

The DBMS will provide the necessary mapping between application programa and data stored on the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

What is concurrency in a database?

A

Many DBMSs allow users to access the database simultanoeusly without any conflict

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

What are Backup and recovery services in databases?

A

DBMS have facilities to enable data recovery following a failure

They can also automatically backup the data stored in the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
35
Q

What are the disadvantages of a DBMS?

A
  • Complex: require those using them to fully understand them
  • Size: Large piece of software that requires alot of memory to run
  • Cost of DBMS: Cost varies depending on if its single user or a big organization. Maintenance and operations costs also need to be factored.
  • Cost of conversion: Data conversion costs as well as training and
    recruiting costs
  • Performance: Written for general use, so some applications aren’t
    as fast
  • Greater impact of failure: Centralization increases system vulnerability. Failure may result in everyone being affected in an organization
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
36
Q

What are the 3 levels in Three-level Architecture?

A

External Level
- The way users percieve data

Conceptual Level
- Provide both the mapping and desired independence

Internal Level
- The way DBMS and the OS perceive the data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
37
Q

What are the two objectives of Three-level Architecture?

A
  • Data Abstraction
  • Data Independence
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
38
Q

What is Data Abstraction?

A

Hide storage details and present the users with a conceptual view of the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
39
Q

What is Data Independence?

A

ability to change the database’s internal structure (the lower levels) without impacting the higher levels, such as the applications or user interfaces that interact with the data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
40
Q

What is Logical Data Independence?

A

The ability to change the structure of a database’s logical schema (the way data is organized and represented to users) without affecting the applications or queries that use that data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
41
Q

What is Physical Data Independence?

A

Changing the way data is stored in a database without affecting how the data is stored or viewed by users or applications.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
42
Q

What is a Database schema?

A

The description of the structure of the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
43
Q

What is a Database state?

A

The content of a DB at a moment in time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
44
Q

What is a Data model?

A

How data is stored, related, and accessed

A set of concepts to describe
- Data and relationships among data
- Data constraints

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
45
Q

What are the 3 categories of Data model?

A

Conceptual data model
Logical data model
Physical data model

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
46
Q

What is a Conceptual data model?

A

Identifies the high-level data structure

(Independent of DBMS, application programs, physical considerations)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
47
Q

What is a Logical data model?

A

Describes the data in terms of data structures

Independent of a particular DBMS product and storage technology

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
48
Q

What are examples of Logical data models?

A

Hierarchical

Network

Relational

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
49
Q

What is a Hierarchical data model?

A

Data is organized in a tree-like structure

Each node has 1 parent

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
50
Q

What is a Network data model?

A

Data is organized as a graph

One node can have more than one parent node

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
51
Q

What is a Relational data model

A

Data is stored in tables. Each table, called a relation, consists of rows and columns, much like a spreadsheet

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
52
Q

What is a Physical data model?

A

Describe how data is stored in the computer, representing record structures, record openings, and access paths

Highly dependent on target DBMS

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
53
Q

What is Motivation in Databases?

A

One of the most difficult aspect of database design is that designers, programmers and end-users tend to view data in different ways

Need a model for communication that is non-technical and free of ambiguities

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
54
Q

What are the 3 main notations used for an ER Model?

A
  1. Chen Notation
  2. Crow’s feet Notation
  3. UML Notation
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
55
Q

What does an ER Model consist of?

A
  • Entity
  • Relationship
  • Attribute
  • Constraint
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
56
Q

Define an Entity.

A

A group of objects with the same properties

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
57
Q

Define a Relationship.

A

Meaningful associations among two or more entities

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
58
Q

Define a Degree of Relationship.

A

The number of participating entity types in a relationship

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
59
Q

What are 4 Degree of Relationship types?

A
  • Degree one: recursive
  • Degree two: binary
  • Degree three: ternary
  • Degree four: quaternary
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
60
Q

What is a Multiple Relationship?

A

Two entities are associated through more than one relationship

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
61
Q

Define an Attribute.

A

A property of an entity or a relationship

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
62
Q

Define a Candidate Key

A

The minimal number of attribute(s) whose value(s) uniquely identify each entity occurrence
- A candidate key cannot contain a null

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
63
Q

Define a Primary Key

A

A unique identifier for each record in a database table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
64
Q

What are the principles of choosing a Primary key?

A
  • Attribute length
  • Minimal number of attributes required
  • Future certainty of uniqueness
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
65
Q

Do relationships have candidate keys?

A

NO

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
66
Q

What is a Weak Entity Type?

A

Existence-dependent on some other entity
- Each entity occurrence cannot be uniquely identified using only the attributes assoicated with that entity type

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
67
Q

Define Multiplicity

A

The number of possible occurrences of an entity type that may relate to a single occurrence of an associated entity type through a particular relationship

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
68
Q

What are the 3 relationship types?

A
  1. One-to-one (1:1)
  2. One-to-many(1 : *)
  3. Many-to-many (* : *)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
69
Q

What is a one-to-one relationship?

A

Each record in one table is linked to exactly one record in another table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
70
Q

What is a one-to-many relationship?

A

A record in one table can be linked to multiple records in another table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
71
Q

What is a many-to-many relationship?

A

Multiple records in one table can be linked to multiple records in another table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
72
Q

What is Multiplicity for complex relationships?

A

It’s a way of showing the “how many” in a relationship between entities

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
73
Q

What is Cardinality?

A

The maximum values for the multiplicity ranges on either side of the relationship

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
74
Q

What is Participation?

A

The minimum values for the multiplicity ranges on either side of the relationship

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
75
Q

What is the difference between Optional and Mandatory Participation?

A

Optional (if 0)
Mandatory (if 1 or more)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
76
Q

What is a fan trap?

A

When two one-to-many relationships connect in a way that causes incorrect data interpretation.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
77
Q

Define Specialisation

A

The process of maximizing the differences between members of an entity by identifying their distinguishing characteristics

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
78
Q

Define Generalisation

A

The process of minimizing the differences between entities by identifying their common characteristics

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
79
Q

What are the participation constraints?

A
  • Optional
  • Mandatory
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
80
Q

What are the disjoint constraints?

A
  • And (nondisjoint)
  • Or (disjoint)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
81
Q

Define Aggregation

A

An abstraction through which relationships are treated as higher-level entities

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
82
Q

Define a Relation

A

A two-dimensional table that has specific characteristics

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
83
Q

What are 4 terminologies for describing a table?

A

Attribute - A column that defines a specific piece of information about each record (row). Each attribute represents a data field

Tuple - A row that represents a single record or entry. Each tuple contains specific data for all the attributes (columns) in the table.

Degree - number of attributes (columns) it has

Cardinality - number of tuples (rows) it contains

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
84
Q

What is a Candidate Key?

A

The minimal set of attributes whose values uniquely identify each tuple

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
85
Q

What is a Foreign Key?

A

An attribute, or set of attributes, within one relation that matches the primary key of another relation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
86
Q

List the key properties of Relations

A

A relation has a name that is distinct from all other relation names in the database
Each cell of relation contains exactly one atomic (single) value
Each attribute has a distinct name
The values of an attribute are all from the same domain
Each tuple is distinct - there are no duplicate tuples
Order of attributes and tuples has no significance

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
87
Q

Define Relation Schema

A

A named relation defined by the relation name (table name) and attributes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
88
Q

Define Relation Instance/State

A

A set of tuples from a relation schema

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
89
Q

Define Entity Intregrity

A

Uniqueness : No two tuples can have identical values for candidate keys

Not-Null: No attribute of a candidate key can be a NULL

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
90
Q

What is Referential Intregrity

A

If a foreign key exists in a relation
- Either the foreign key value matches a primary key value in its home relation
- Or the foreign key is NULL

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
91
Q

Define a Domain

A

Set of allowable values for one or more attributes

92
Q

What are Domain Constraints?

A

Enables RDBMS to carry out checks on data errors and to determine the range of operations that can be carried out on the domain

93
Q

Define a View

A

A subset of base relations

94
Q

Define a Base Relation

A

A base relation is an actual table in the database, not a view or derived table

Represents a real-world entity or concept

95
Q

What are Features of Views?

A

Virtual/derived relation. It does not necessarily exist in the database, but can be produced upon request.

Dynamic. The changes made to the base relation(s) that affect the view are immediately reflected in the view.

Generated by applying appropriate relational operations (e.g., 𝜎, 𝜋, x, ⋈, ….)

96
Q

What are Strong Entity Relation Schemas?

A

A schema that represents a strong entity in a database. A strong entity is an entity that can be uniquely identified by its own attributes, without needing any reference to another entity.
- For composite attributes, include only the constituent simple attributes

97
Q

Explain a One-to-Many Binary Relationship

A

One record in a table is related to many records in another table
- The many side (child entity) receive the foreign key

98
Q

Explain a One-to-One Binary Relationship with mandatory participation on both sides

A

Combine the two entities to form one relation
The primary key of any one of the two relations is chosen to be the priamry key, whilst the other becomes the alternate key

99
Q

Explain a One-to-One Binary Relationship with mandatory participation on one side

A

The entity with mandatory participation receives the foreign key

100
Q

Explain a One-to-One Binary Relationship with optional participation on both sides

A

Foreign key in either side is fine

101
Q

Explain a Many-to-Many Binary Relationship

A

Create a new relation as an associative entity
Primary key: both of these primary keys of its associate entities, may add other attributes

101
Q

Explain a Recursive Relationship

A

Same rules as binary relationships
- (1:1), (1:) recursive relationships are implemented by foreign keys
- (
:*) recursive relationships are implemented by creating a new relation

102
Q

Explain a Complex Relationship

A

involves more than two entities participating in a relationship

103
Q

In a Superclass/Subclass Relationship what can you do?

A
  1. Only keep the subclasses
  2. Only keep the superclass
  3. Keep both
104
Q

Explain a Weak Entity 1:1 relationship

A

In a weak entity relationship, a 1:1 relationship means that each instance of the weak entity is associated with exactly one instance of the strong entity and vice versa.

Key Points:
Weak Entity: Cannot exist without a strong entity; it relies on it for its identification.
Strong Entity: Has its own primary key.

105
Q

Explain a Weak Entity 1:* relationship

A

In a weak entity relationship, a 1:* relationship means that each instance of the weak entity is associated with one instance of the strong entity, but one instance of the strong entity can be associated with multiple instances of the weak entity.

Key Points:
Weak Entity: Cannot exist independently; relies on the strong entity for identification.
Strong Entity: Has its own primary key.

106
Q

Define a Multi-valued attribute

A

An attribute that can have more than one value. For example, a person can have multiple phone numbers.

107
Q

What is Normalisation?

A

A technique for producing a set of relations with desired properties
- Remove redundancy
- Remove potential for insertion, modification, deletion anomalies

108
Q

Define an Insertion Anomaly

A

Insertion of data without necessary conditions having been met.

Example:
Storing a persons credit card in a transaction log without the transaction having ever occured

109
Q

Define a Modification Anomaly

A

Occurs in databases when data is not properly organized, leading to issues with updating, inserting, or deleting records

110
Q

Define a Deletion Anomaly

A

Deleting data that should be kept

111
Q

Define a Functional dependency

A

If A and B are attributes of relation R, if each value of A is assoicated with exactly one value of B, B is said to be functionally dependent (FD) on A.
- Denoted: A → B
- A is the determinant

(Note: A or B is not limited to a single attribute, could be a set of attributes)

112
Q

What is a Transitive dependency?

A

If A → B and B → C is transitive dependent on A via B: A → C

113
Q

What is a Partial functional dependency?

A

Given A → B, if removing some attribute(s) from A, the dependency still holds

114
Q

What is a Full functional dependency?

A

If A → B and B is not functionally dependent on any subset of A

115
Q

Define 1st normal form

A

Remove repeating groups

116
Q

Define 2nd normal form

A

Remove partial dependencies

117
Q

Define 3rd normal form

A

Remove transitive dependencies

118
Q

Define Boyce-Codd normal form (BCNF)

A

Every determinant is a candidate key

(Boyce-Codd Normal Form (BNCF) is also known has 3.5 Normal Form)

119
Q

Define Algebra

A

In its most general form, algebra is the study of mathematical symbols and the rules of manipulating these symbols

120
Q

What is Relational Algebra and how does it work?

A

An algebra whose operands are relations

Relational algebra provides a mean to query the data in a database and to modify the data
- One of the query language
- SQL incorporates relational algebra

121
Q

What are all the Relational Operations?

A

Unary Operations
- Projection (𝜋)
- Selection (𝜎)
- Rename (𝜌)
Set Operations
- Union (∪)
- Intersection (∩)
- Set difference (−)
- Cartesian Product (X)
Binary Operations
- Join (⋈)
- Outer Join (⋉, ⋊)
- Division (÷)
Aggregate and Grouping operations (ℑ)

122
Q

Define Projection (𝜋)

A
  • 𝝅<𝒂𝒕𝒕𝒓𝒊𝒃𝒖𝒕𝒆>(𝑹): Produce a new relation that has only some of 𝑅‘s columns</𝒂𝒕𝒕𝒓𝒊𝒃𝒖𝒕𝒆>
  • Eliminate duplicate tuples, if any.
123
Q

Define Selection (𝜎)

A
  • 𝝈<𝒔𝒆𝒍𝒆𝒄𝒕>(𝑹): Produce a new relation that contains only those tuples of R
    that satisfy the condition.</𝒔𝒆𝒍𝒆𝒄𝒕>
  • The relation has the same schema as the original schema.
124
Q

Define Rename (𝜌)

A
  • 𝝆𝑺 𝑬 : Rename the expression E to S
  • 𝝆𝑺(𝒂𝟏,𝒂𝟐,…,𝒂𝒏) 𝑬 : Rename the relation E to S, and rename the attributes as 𝑎1, 𝑎2, …
    , 𝑎𝑛.
125
Q

Define Union (∪)

A

A relation that contains tuples from both sets with no duplicates

126
Q

Define Intersection (∩)

A

A relation that contains tuples that are only from both sets.

127
Q

Define Set Difference (−)

A

A relation that contains tuples from one set but not the other

128
Q

Define Cartesian Product (X)

A

A relation that is the concatenation of every tuple of Relation X with every tuple of relation Y

129
Q

Define Join Operations (⋈)

A

Performs selection over the Cartesian product of two relations

130
Q

What are the 4 types of Join Operations?

A
  1. Theta join (Θ-join)
  2. Equijion
  3. Natural join
  4. Outer join
131
Q

Define Theta join (Θ-join)?

A
  • 𝑹 ⋈<𝒋𝒐𝒊𝒏> 𝑺: Defines a relation that contains all combinations of tuples
    from 𝑅 and 𝑆 that satisfy the join condition</𝒋𝒐𝒊𝒏>
  • 𝑅 ⋈<𝑗𝑜𝑖𝑛> 𝑆 = 𝜎<𝑗𝑜𝑖𝑛>(𝑅 × 𝑆)</𝑗𝑜𝑖𝑛></𝑗𝑜𝑖𝑛>
132
Q

Define Equijoin

A

Produces all the combinations of tuples from that satisfy a join condition with only equality comparisons.

133
Q

Define Natural Join

A

A way to combine two tables in a database based on common columns

𝑹 ⋈ 𝑺: an equijoin of the two relations R and S over all common attributes.

134
Q

What type of language is SQL and what does that mean?

A

A Declarative Language: Focuses on the what, not how

Also a free-format, case insensitive language

135
Q

What is a DCL (Data Control Language)?

A

Permissions and access control

136
Q

What is a TCL (Transaction Control Language)?

A

Managing Transactions (commit, rollback)

137
Q

What are 5 basic selection conditions in SQL?

A
  1. Comparison: =, <> (!=), <, <=, >=, >, OR, AND, NOT
  2. Range: BETWEEN … AND, NOT BETWEEN … AND
  3. Set membership: IN, NOT IN
  4. Pattern match: LIKE, NOT LIKE
  5. IS NULL, IS NOT NULL
138
Q

What do aggregate functions do and what are the 5 aggregate functions?

A

Perfom calculations on multiple rows

  1. COUNT - counts rows
  2. MIN - finds min
  3. MAX - finds max
  4. SUM - adds up values
  5. AVG - finds average
139
Q

What are 5 key clauses for the SELECT statement in SQL?

A
  1. DISTINCT: Eliminates duplicate rows.
  2. WHERE: Filters rows based on conditions.
  3. GROUP BY: Groups rows sharing a common value.
  4. HAVING: Filters grouped rows (like WHERE but for aggregated data).
  5. ORDER BY: Specifies sorting of results.
140
Q

What does GROUP BY do in SQL?

A

Groups customers by a condition

For example: Groups customers by country and counts the number in each
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;

141
Q

What does HAVING do in SQL?

A

Used to filter data

Filters groups where count > 5:
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5;

142
Q

Define a subquery

A

A query inside another query

For example:
SELECT CustomerName
FROM Customers
WHERE Country IN (
SELECT Country
FROM Orders
WHERE TotalSales > 5000
);

143
Q

Define every type of JOIN in SQL

A
  • INNER JOIN: Retrieves matching rows between two tables.
  • LEFT JOIN: Retrieves all rows from the left table and matching rows from the right table.
  • RIGHT JOIN: Retrieves all rows from the right table and matching rows from the left table.
  • SELF JOIN: Joins a table to itself.
144
Q

What does INSERT TO do in SQL?

A

Adds new rows to a table

Example:
INSERT INTO Customers (CustomerName, Country)
VALUES (‘John Doe’, ‘USA’);

145
Q

What does UPDATE do in SQL?

A

Modifies existing data in a table

Example:
UPDATE Customers
SET City = ‘Frankfurt’
WHERE CustomerID = 1;

146
Q

What does DELETE do in SQL?

A

Removes rows from a table

Example:
DELETE FROM Orders
WHERE OrderDate < CURRENT_DATE - 30;

147
Q

What do Aliases do in SQL?

A

Temporary names for columns or tables

Example:
SELECT o.OrderID AS “Order ID”, c.CustomerName AS “Customer”
FROM Orders o
JOIN Customers c ON o.CustomerID = c.CustomerID;

148
Q

What are all the Data Types in SQL?

A

Numeric Types:
- INT(n), TINYINT(n), DECIMAL(p,s), FLOAT, DOUBLE, etc.

String Types:
- CHAR(n), VARCHAR(n), TEXT, etc.

Date/Time Types:
- DATE: YYYY-MM-DD format.
- TIME: hh:mm:ss format.
- DATETIME, TIMESTAMP: YYYY-MM-DD hh:mm:ss format.

149
Q

What is the syntax for CREATE TABLE?

A

CREATE TABLE Students (

);

150
Q

What is the Primary key in a table?

A

A column that uniquely identifies rows in a table

151
Q

What is the Foreign key in a table?

A

A key that links to the PRIMARY KEY of another table

152
Q

What is ALTER TABLE used for?

A

To change the contents of a table such as adding, modifying, dropping etc.

Example:
ALTER TABLE table_name
ADD COLUMN new_column data_type [constraint];

153
Q

What is DROP TABLE used for?

A

Deletes a table and all its data permanently

Example: DROP TABLE table_name;

154
Q

How does the UNIQUE constraint work?

A

Used to check data is not a duplicate

Example:
CREATE TABLE Suppliers (
supplier_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
UNIQUE(name)
);

155
Q

How does the CHECK constraint work?

A

Used to check if a condition is true

Example:
CREATE TABLE Parts (
part_no INT PRIMARY KEY,
description VARCHAR(40),
price DECIMAL(12,2) NOT NULL CHECK(price > 0),
cost DECIMAL(12,2) NOT NULL CHECK(price >= cost)
);

156
Q

What does AUTO_INCREMENT do?

A

Automatically generates a unique number for each new row. Commonly used for primary keys

Example:
CREATE TABLE Persons (
PersonID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
FirstName VARCHAR(255) NOT NULL,
LastName VARCHAR(255)
);

157
Q

What is an Index?

A

A data structure that speeds up data retrieval but can slightly slow down updates.

158
Q

Why use an Index?

A
  1. Speeds up SELECT queries.
  2. Improves performance for frequently searched columns.
159
Q

What is a View?

A

A virtual table created from the result of an SQL query.

160
Q

What are the benefits of a View?

A
  1. Simplifies complex queries.
  2. Makes data updates reflect in all linked views (updatability).
161
Q

What does COMMIT do ?

A

Saves all changes permanently

162
Q

What does ROLLBACK do?

A

Cancels all uncommitted changes

163
Q

What are all the commands for DML (Data Manipulation Language) ?

A
  1. SELECT: Retrieves data.
  2. INSERT INTO: Adds new rows.
  3. UPDATE: Modifies existing rows.
  4. DELETE: Removes rows.
164
Q

What are all the commands for DDL (Data Defintion Language) ?

A
  1. CREATE TABLE: Defines a new table.
  2. ALTER TABLE: Modifies table structure.
  3. DROP TABLE/VIEW/INDEX: Deletes tables, views, or indexes.
  4. Constraints: NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY.
165
Q

What is Physical Database Design (PDD)?

A
  • Focuses on implementing a database on storage (e.g., hard drive).
  • Defines how data is stored, indexed, and organized for performance.
166
Q

What are Key Steps in Implementation in PDD?

A
  • Create database schemas (using SQL DDL commands).
  • Populate data with SQL DML (e.g., INSERT).
  • Use indexes, constraints, and file organization for optimization.
167
Q

What is the first step in Physical Data Design?

A
  1. Translate Logical Data Model for Target DBMS
    • Design Base Relations:
      • Define tables, columns, primary keys, and surrogate keys
    • Derived Data:
      • Decide whether to store derived data
    • General Constraints:
      • Use CHECK, stored procedures, and triggers for validation.
168
Q

What is the second step in Physical Data Design?

A
  1. Design File Organizations and Indexes
    • File Organizations:
      • Choose how data is stored: heap (unordered) or sequential (ordered).
    • Indexes:
      • Use clustered indexes (physical order) or non-clustered indexes (logical order).
    • Disk Space Estimation:
      • Calculate storage needs based on table size and structure.
169
Q

What is the third (final) step in Physical Data Design?

A
  1. Design User Views and Security
    • User Views:
      • Define views for specific user roles and queries.
    • Security Mechanisms:
      • Use access control, encryption, and permissions.
170
Q

What is a Surrogate key?

A

An artificial key added to the table

With Surrogate Key:
- OrderDetails(OrderDetailID, OrderID, ProductID, Quantity)
Without Surrogate Key - Uses natural data to uniquely identify rows:
- OrderDetails(OrderID, ProductID, Quantity)

171
Q

When should you use surrogate keys?

A
  • When natural keys are too complex or unstable.
  • To avoid duplication across tables.
172
Q

What is Derived Data?

A

Data that is calculated based on other data in the database.

173
Q

What is a constraint?

A

A rule that enforces data integrity in a table

174
Q

What can be used for File Organization?

A

Heap Files:
- Data is stored unordered (faster for insertion).
Sequential Files:
- Data is stored in order (faster for specific queries).
MySQL Storage Engines:
- InnoDB: Default engine for transactions, foreign keys, and performance.

175
Q

What are the two types of indexes and what do they do?

A

Clustered Index: Physical data stored in index order, one per table
Non-Clustered Index: Logical ordering, multiple per table

176
Q

How would you control redundancy?

A

Denormalization:
- Improves read performance at the cost of write performance.
- Used when updates are rare, but reads are frequent.

177
Q

What is a Transaction?

A

A transaction is a group of database operations (insert, update, delete, etc.) that are treated as a single logical unit

178
Q

What does ACID stand for and what does each word mean ?

A

Atomicity
- All or nothing: The transaction is fully completed, or no changes are made.
Consistency
- Transactions transform the database from one consistent state to another.
Isolation
- The operations of one transaction are invisible to others until it is completed.
Durability
- Once a transaction is committed, changes are permanently saved, even in the
event of a system failure.

179
Q

How does SQL support transactions?

A

If any operation in a transaction fails, the entire transaction is rolled back to ensure data integrity.

180
Q

Define a Schedule

A

A schedule is the order of operations for concurrent transactions

181
Q

What are the types of Schedules and how do they work?

A
  1. Serial Schedule:
    • Transactions are executed one after the other.
    • Ensures data consistency but is less efficient.
  2. Non-Serial Schedule:
    • Operations of multiple transactions are interleaved.
    • Requires careful testing for conflicts.
182
Q

What are two desired properties of a Schedule?

A
  • Serializability: The schedule should be equivalent to a serial execution.
  • Recoverability: The schedule should ensure no data is lost due to failed transactions.
183
Q

When do Conflicting Operations Occur?

A

Occurs if two transactions:
- Access the same data
- At least one is a WRITE operation

184
Q

How to test for Serializability?

A

Build a precedence graph:
- Each node represents a transaction.
- Draw edges for conflicting operations.
Check for cycles:
- If cycles exist, the schedule is not serializable.

185
Q

Define Recoverability

A

A schedule is recoverable if, for every transaction T2 that reads data written by T1, T1 commits before T2

186
Q

Define Concurrency Control

A

Managing simultaneous operations on a database to prevent conflicts and maintain consistency

187
Q

What are the goals of Concurrency Control?

A
  1. Schedule Transactions: Avoid interference between them
  2. Guarantee Serializability: Ensure the results are as if transactions were executed
    one after the other
188
Q

What are the techniques of Concurrency Control?

A
  • Pessimistic: Assume conflicts are likely; prevent them proactively (e.g., locking,
    timestamping)
  • Optimistic: Assume conflicts are rare; resolve only if conflicts occur at commit time
189
Q

What is Locking?

A

A mechanism to prevent multiple transactions from accessing the same data simultaneously in a conflicting way

190
Q

Where can Locks be Applied?

A

Entire database, file, page, record, or even a single field value

191
Q

What are two lock types?

A
  • Read Lock (Shared): Multiple transactions can read, but no writes allowed.
  • Write Lock (Exclusive): Only one transaction can write.
192
Q

What are the key rules for locking?

A
  1. A transaction must issue read_lock or write_lock before reading.
  2. A transaction must issue write_lock before writing.
  3. Locks must be released after operations (unlock).
193
Q

What is Two-Phase Locking (2PL)?

A

A locking protocol ensuring serializability by dividing a transaction into two phases:

  1. Growing Phase: Acquire locks, no releases.
  2. Shrinking Phase: Release locks, no new acquisitions.

Benefits: Guarantees serializability

194
Q

What is a Deadlock?

A

A situation where two or more transactions are stuck waiting for each other to release locks.

Example:
Transaction 1 locks X, waits for Y. Transaction 2 locks Y, waits for X.

195
Q

How to handle Deadlocks?

A
  1. Timeouts: Abort a transaction if it waits too long.
  2. Detection and Recovery:
    • Build a Wait-For Graph (WFG):
      • Nodes: Transactions.
      • Edges: Dependencies (T1 → T2 means T1 waits for T2).
      • A cycle indicates a deadlock.
    • Abort transactions involved in the deadlock.
  3. Prevention:
    • Lock all resources before execution (Conservative 2PL).
    • Use timestamps to prioritize transactions (e.g., older transactions wait for younger
      ones).
196
Q

What are 2 Deadlock Prevention Techniques?

A
  1. Wait-Die Algorithm:
    • Older transactions can wait for younger ones.
    • Younger transactions roll back if they must wait for older ones.
  2. Wound-Wait Algorithm:
    • Older transactions force younger ones to roll back.
    • Younger transactions can wait for older ones.
197
Q

What is Timestamping?

A
  • Assigns a unique timestamp to each transaction to order operations.
  • Older transactions are given priority during conflicts.

Key Concept:
- Ensures serializability by using timestamps to determine the execution order of
transactions.

198
Q

What is Optimistic Concurrency Control?

A
  • Assumes conflicts are rare.
  • Performs checks at the end of the transaction to detect conflicts.
199
Q

What is the process of Optimistic Concurrency Control?

A
  1. Execute transaction without locks.
  2. Before committing, check if conflicts occurred.
  3. If yes, roll back and restart the transaction.

When to Use: Suitable for systems with low contention for resources.

200
Q

What are the 3 types of Storage Structure (Not secondary storage types)?

A

Primary Storage:
- Volatile (e.g., main memory, cache memory).
- Data operations occur here but is lost on power failure.
Secondary Storage:
- Non-volatile (e.g., magnetic disk, optical disk, flash drives).
- Used for storing the database permanently.
Stable Storage:
- Data is replicated across multiple non-volatile media (e.g., RAID).
- Ensures reliability and protects against failures.

201
Q

Define Backup

A

A periodic copy of the database stored securely to restore data in case of failure.

202
Q

What are the types of Backup?

A
  1. Full Backup: Entire database.
  2. Differential Backup: Changes since the last full backup.
  3. Incremental Backup: Changes since the last incremental backup.
203
Q

Define a Recovery Technique

A

Restoring the database to a consistent state after a failure

204
Q

What are the 3 types of Recovery Techniques and how do they work?

A
  1. Log-Based Recovery:
    • Uses transaction logs to record database changes:
      • Before-Image (value before update).
      • After-Image (value after update).
    • Components: <Transaction>, <Commit>, <Abort>.</Abort></Commit></Transaction>
  2. Checkpointing:
    • Marks a “safe point” to reduce recovery time.
    • Recovery process:
      • Redo: Transactions committed after the checkpoint.
      • Undo: Transactions active during failure.
  3. Shadow Paging:
    • Maintains two page tables:
      • Shadow table (unchanged).
      • Current table (updated during transactions).
    • Advantages: No undo logs needed.
    • Disadvantages: High overhead for large updates.
205
Q

What are the Key Threats for Database Security?

A
  1. Confidentiality: Prevent unauthorized access.
  2. Integrity: Ensure data is accurate and unaltered.
  3. Availability: Ensure data is accessible when needed.
206
Q

What are the Causes of Security Risks for Database Security?

A
  • Unauthorized users.
  • Malicious/accidental errors by authorized users.
  • Errors by database administrators (DBAs).
207
Q

Define both Database Access Control Methods

A
  1. Discretionary Access Control (DAC):
    • Users specify access rights.
  2. Mandatory Access Control (MAC):
    • Bell-LaPadula Model:
      • “No Read Up”: Cannot read data above clearance level
      • “No Write Down”: Cannot write data below clearance level
208
Q

Define RAID

A
  • Combines multiple disks for:
    • Improved reliability (e.g., mirroring).
    • Increased performance (e.g., striping).
    • Enhanced availability with error correction.
209
Q

What are the 3 types of Database Security Controls?

A
  1. Access Control
  2. Views
  3. RAID
210
Q

Define the Role of the Database Administrator (DBA)

A
  1. Manage account creation and user access.
  2. Assign and revoke privileges.
  3. Implement security policies.
  4. Ensure data integrity and performance optimization.
211
Q

What does DBA stand for

A

Database Administrator

212
Q

Define a Distributed Database

A

A collection of multiple interconnected databases:
- Physically spread across various locations.
- Communicate via a network.

213
Q

What are the Desired characteristics of a Distributed Database?

A

Distribution/Location Transparency:
- Users see the database as one logical entity, no matter its physical location.

Local Autonomy:
- Local databases manage their own data independently, even if remote systems are
down.

214
Q

What are the types of Distributed Databases?

A

Homogeneous:
- All sites use the same DBMS.
Heterogeneous:
- Different DBMSs at each site, with gateways translating between them.

215
Q

Define Fragmentation in a Distributed Database

A

Dividing a table into smaller subsets (fragments) stored across different sites

216
Q

What are the 3 types of Fragmentation in a Distributed Database?

A
  1. Horizontal: Subsets of rows.
  2. Vertical: Subsets of columns (attributes).
  3. Hybrid: A mix of horizontal and vertical fragmentation.
217
Q

Define Data Allocation

A

Assigning fragments to specific sites (optimal allocation is complex)

218
Q

What are the Replication Options in a Distributed Database?

A
  1. Full Replication: Entire database copied at all sites (better reliability).
  2. Non-Redundant Allocation: Each fragment stored at one site (saves storage).
219
Q

What are the Pros and Cons of a Distributed Database?

A

Advantages:
1. Mirrors organizational structure.
2. Simplifies system expansion.
3. Improves reliability and availability.
4. Enhances response time and performance.

Disadvantages:
1. Complex design and implementation.
2. Ensuring security and integrity is harder.
3. Lack of standardization.

220
Q

Define a Data Warehouse

A

A database designed for analysis and decision-making:
- Separate from operational databases.
- Stores historical, consolidated data (not updated frequently).

221
Q

What are the 4 Key Features of a Data Warehouse?

A
  1. Subject-Oriented: Focused on specific areas (e.g., sales, transactions).
  2. Integrated: Combines data from multiple sources.
  3. Time-Variant: Captures data changes over time.
  4. Non-Volatile: Previous data isn’t altered when new data is added.
222
Q

Operational Database (OLTP) vs Data Warehouse (OLAP)

A

Feature |Operational Database (OLTP)| Data Warehouse (OLAP)
________________________________________________________________________________
Data | Current, dynamic | Historical, static
Purpose | Transaction-driven | Analysis-driven
Detail | Primitive, detailed | Summarized, consolidated
User | Operational users | Managerial users
Decision Support | Day-to-day operations | Strategic decisions

223
Q

How does Data Warehouse Architecture work?

A

ETL Process:
1. Extract: Retrieve data from source systems.
2. Transform: Convert it into a usable format.
3. Load: Store it into the warehouse.

224
Q

Define OLAP (Online Analytical Processing)

A

Uses multi-dimensional data views to provide quick access to strategic insights

Key Features: OLAP cubes analyze data across dimensions (time, region, product etc.)

225
Q

What is Data Mining?

A

Process of discovering patterns, trends, and correlations in data using:
1. Statistical methods.
2. Mathematical models.
3. Artificial intelligence techniques.