database design Flashcards

Question 1

Q

What is indexing?

Answer

A

A persistent data structure which can be used to quickly locate rows in a table based on the values of one or several given attributes.

Question 2

Q

What are the gains and losses of indexing?

Answer

A

The gain of indexing:
- Searches in the database goes faster

The cost of an index:
- extra storage space
- index creation takes time - but only done once
- index maintenance - index must be updated – when the data in the table changes (might be a problem).

Question 3

Q

What does a full table scan mean? Is it good or bad?

Answer

A

It means that the database needs to search through all of the existing rows to find the row of your query.

This happens if you do not have indexing and the attribute your looking for is not unique. If the table has many rows it is not time efficient.

Question 4

Q

Why can it be a bad idea to use indexing for an attribute?

Answer

A

If you put an index over an attribute that means that 90% of the rows are selected anyway it could mean that it takes longer than to just do a full table scan.

Question 5

Q

How can you find out how many rows the database has to search through for your query?

Answer

A

put “explain” in front of your query.

Question 6

Q

Should you put indexing over primary and foreign keys?

Answer

A

No, because they are already unique and the database will find that row right away. Same thing for unique attributes.

Question 7

Q

What is the syntax for creating indexes?

Answer

A

CREATE INDEX index_name ON table_name(column1, column2…);

Question 8

Q

Explain what is required for first normal form

Answer

A

Composite attributes or attributes with different values are not allowed in relation.

A relation must have a primary key.
Nested relations are not allowed - the solution is to create new tables.

Multivariate attributes (repeating attributes) are not allowed.

Question 9

Q

Why do we not like redundancy in relational tables?

Answer

A

We do not like redundancy in relational tables because if we want to change one value of one row then we have to make changes in many places.

Question 10

Q

Why do we not like data-replication in relational models?

Answer

A

If we want to delete or change for example the name of something then we have to do it in many places which is inefficient. It is better to create a new relation with a foreign key to the first table where you had the redundancy.

Question 11

Q

Explain what is required for second normal form?

Answer

A

1NF + each non-key attribute must depend on the entire primary key: avoid partial dependencies.

If there is a composite primary key then the attributes needs to have a functional dependency to both.

If the primary key is not in composite form then it automatically fulfills 2NF.

Question 12

Q

Explain what is required for third normal form.

Answer

A

2NF + All non-key attributes will be dependent on the primary key.

If this is not the case it is called transitive dependency and we have to split the table and make a new table. The primary key in the “transitive table” is used as FK in the other table.

So to find the 3NF you have to find the transitive dependencies and split those into new tables.

If your relational database is in 3NF then the database is highly optimized.

Question 13

Q

What is a prime attribute and a non-prime attribute.

Answer

A

Prime attribute = an attribute belongs to at least one candidate key

Non-prime attribute = An attribute that doesn’t belong to any candidate key.

Question 14

Q

What is normalization?

Answer

A

The concept of “fixing your design” is called normalization.

“ The process of making something normal”
The process of breaking bad relations into smaller better relations(tables).

The problems with insert,update and delete will be resolved.

Question 15

Q

How do you choose a good primary key?

Answer

A

From your candidate keys you should choose an attribute that is an integer because an int will outperform a str.

Choose as primary key the smallest of the candidate keys in terms of size in bytes. Ints are smaller than str.

Question 16

Q

How do we map a many-many relationship in the ER diagram?

Answer

A

When you have a many to many relationships in the ER diagram you have to create a new relation that has two foreign keys (one primary key from each entity).

For example:
An employee can work on many projects and a project can have many employees.

Create one table for each entity and one for the relationship WORKS_FOR that has the PK from both entities as foreign keys. If the relationship has an attribute then the relationship table should have a column for that too.

Question 17

Q

What can go wrong with the INSERT command?

Answer

A

INSERT to add information to your tables.

INSERT can violate:
- domain constraint: if the value already exists -
- Referential integrity: FK (that you’re trying to put in) is not present
- Entity integrity: PK value is null.

Question 18

Q

What can go wrong with the DELETE command?

Answer

A

You can not delete a row whose information is used as a key in another table. MySQL would reject the deletion.

A solution could be to put null in the other table that uses the information (SET NULL). Another solution could be to delete all the rows in tables that use that information (CASCADE).

Question 19

Q

What can go wrong with the UPDATE command?

Answer

A

Is basically INSERT and DELETE combined. All the things that can go wrong with insert and delete can go wrong with update.

Question 20

Q

What is a candidate key?

Answer

A

Candidate key = unique attributes that could be used as primary keys.

Question 21

Q

What different data structures for indexing are there? what can they be used for?

Answer

A

Band B
Btrees
These two are good for =, >, < and BETWEEN.

Hash-tables can also be used for indexing but only works for =.

Question 22

Q

Does indexing over last_name work when you query for ‘%g’ or ‘g%’?

Answer

A

No, you need to query for the full last name for the index to work. It will do a full table scan for this query.

Question 23

Q

These queries are used a lot in a database, on which attributes should you put index?
attribute A is primary key and the attributes are A-D

UPDATE R SET D=? WHERE B=?
SELECT D FROM R WHERE C=?

Answer

A

On B and C because they are filtered frequently and these queries will then go faster.

Question 24

Q

What is the use of a primary key and foreign key?

What are the entity integrity restraints for these?

Answer

A

The primary key value is used to uniquely identify each tuple (row) in a relation(table). PK is used to be referenced from another tuple as FK.

The primary key cannot be null. Although attributes of a relation (except PK) can have null values.

Question 25

Q

What can be the problem with doing two transactions in parallel?

How do we fix that problem?

Answer

A

The two transaction happen in parallel and the updated value is not correct because they are not taking each other into account. This is called the lost update problem.

To avoid this a transaction schedule must be serialized. It could be slow because of this.

Question 26

Q

What is a transaction?

Answer

A

a series of operations that need to be executed as a single unit of work that transforms the database from one consistent state to another.

Question 27

Q

What is ACID?

Answer

A

A set of required properties of transactions:

Atomicity = all statements of a transactions are executed or none.

Consistency = all database constraints are satisfied after a transaction is executed.

Isolation = The result of executing concurrent transactions is the same as if the transactions were executed serially.

Durability = ensures that changes made to the database (transactions) that are successfully committed will survive permanently, even in the case of system failures

Question 28

Q

In transactions, what is commit and rollback?

Answer

A

commit = making the effects of the transactions permanent.
Rollback = aborting the transaction (because of failure ect).

Question 29

Q

What is a dirty read problem?

Answer

A

A dirty read problem is what happens when someones reads data as I am updating it in a transaction and the update has not yet been committed.

Question 30

Q

What is a relational database?

Answer

A

Data stored in table format.

Question 31

Q

What is the purpose of an ER model?

Answer

A

To make a blueprint of all entities, attributes ect. of the database we are going to design.

Organize how the data will be stored.

Question 32

Q

What is an entity?

Answer

A

A specific table(thing or object). Holds set/collection of objects.

Question 33

Q

What is an attribute?

Answer

A

A feature of a table or entity. I.g. the columns of the tables.

Each attribute has a data type (int, str, boolean ect.)

Question 34

Q

What is a derived attribute?

Answer

A

You can find the value from another attribute. Ex. you can get age from date of birth.

Question 35

Q

What is the use of a primary key?

Answer

A

A unique attribute that is used to locate a single row in a table.

Question 36

Q

What is a composite key?

Answer

A

A composite key is a unique combination of attributes that are used to locate a single row in a table.

Question 37

Q

What are the key constrains?

Answer

A

If a relation has several candidate keys, one is chosen arbitrarily as primary key.

The primary key value is used to uniquely identify each tuple (row) in a relation(table).

PK is used to be referenced from another tuple as FK.

Question 38

Q

What are the entity integrity constrains?

Answer

A

The primary key cannot be null.

Although attributes of a relation (except PK) can have null values.

Question 39

Q

What are the Referential integrity constraints (FK rules)?

Answer

A

The value in the FK columns can be either a value of an existing PK value from another relation or a null.

Question 40

Q

How can you make sure that an attribute that you want unique cannot be added twice?

Answer

A

Make a constraint by using CREATE TRIGGER or CREATE ASSERTION.

Question 41

Q

What does functional dependency mean?

Answer

A

A functional dependency is a relationship between two sets of attributes in a database, where one set (the determinant) determines the values of the other set (the dependent).

Question 42

Q

How can we find functional dependencies?

Answer

A

If one attribute has only unique values then other attributes are going to be functionally dependent on that attribute.

“A functional dependency is an association between two attributes. We say there is a functional dependency from attribute A to an attribute B if and only if for each value of A there can be at most one value for B “

Question 43

Q

What type of attributes are there?

Answer

A

simple - single/atomic values

derived - derived from another attribute: age from date of
birth.

composite - combination of simple attributes. Ex. name. parenthesis in ER.

multi-valued - Multiple values for a attribute. Ex. phone number. Double ovals in ER.

Question 44

Q

What is a recursive relationship?

Answer

A

A relationship between the same entity.
Ex. an employee can both be and have a boss.

Question 45

Q

What is a weak entity?

Answer

A

An entity without a key, but it has partial-keys.

It depends on another entity.

Question 46

Q

How is a transaction written?

Answer

A

BEGIN;
…
SQLcode
…
COMMIT; ROLLBACK;

Question 47

Q

Why do we need to write rollback in a transaction code?

Answer

A

Because if something happens before the transaction is committed it rolls back.

It is a way to undo a transaction that is not yet permanent.

Either all of the code runs or none of it.

Question 48

Q

Why is the isolation property in ACID often relaxed?

Answer

A

To reduce the overhead and increase the concurrency.

There are weaker isolation levels that allow some serializability violations in order to achieve higher performance.

Isolation level is set per transaction and reflects what read phenomena could happen in the transaction.

Question 49

Q

What is concurrency?

Answer

A

Enabling two or more users to retrieve information from the database at the same time without affecting data integrity.

Question 50

Q

What is the difference between an ER diagram and an EER diagram?

Answer

A

EER is an enhanced ER diagram.

Is has additional concepts such that allow for specialization: subclasses and superclass.

Question 51

Q

What is specialization?

Answer

A

The process of defining subclasses.

A subtype is a relationship and the subclass inherits all of the attributes and relationships of the superclass.

Question 52

Q

What is generalization?

Answer

A

It is the reverse of specialization. Several classes with common features generalize into a superclass.

Question 53

Q

What is the difference between disjoint and overlapping in specialization?

Answer

A

disjoint (d) = an entry is a member of at most one subclass.

overlapping (o) = an entry is a member of one or more subclass.

Question 54

Q

What is the difference between total and partial in specialization?

Answer

A

Total (double line to (d) or (o) from superclass) = every entry is a member of a subclass.

Partial (single line) = not every entry is part of a superclass.

Question 55

Q

What does it mean when the “U” sign is used when there is no specialization?

Answer

A

It means that for example some of the employees are managers if the sign is used between entities employee and manager.

Question 56

Q

What is attribute-defined specialization?

Answer

A

Specialization is based on one of the attributes of the superclass.

The attribute name is written in the specialization.

Question 57

Q

What are the characteristics for relational tables?

Answer

A

Atomic values (basic types like str int).
No multivalued data
Each value in a tuple must come from the domain of the attribute (set of values allowed in an attribute).
Value can be NULL to represent unknown/unavailable ect.

Question 58

Q

What is a foreign key?

Answer

A

A relation uses another relation’s
primary key as an attribute.

Question 59

Q

What are the basic operations for changing a database?

Answer

A

INSERT: a new tuple
DELETE: an existing tuple
MODIFY: an existing tuple

New database state arises after every operation and the database must be in a valid state after every operation.

Question 60

Q

What are the three different constraints used to maintain a valid state database?

Answer

A

key constraints

entity integrity constraints

referential integrity (FK) constraints

Question 61

Q

How can we create tables for one-one and one-many relationships in the ER diagram?

Answer

A

one-one relationship:
Foreign key. Use the entity with total participation (mandatory) and use the primary key from that table as foreign key in the optional table.

one-many relationship:
Foreign key. Use the “one-side” of the relationship. Use the primary key from that table as the foreign key in the “many-side” table.

Question 62

Q

How can you map a multivalued attribute?

Answer

A

For a multivalued attribute, create a new table. This table should have:
- The attribute itself
- PK of original table as FK

You could also use a composite key in the new table of the original PK and the multivalued attribute.

Question 63

Q

How can you map an entity in the relational tables if the entity is weak? What is the primary key of the new relation?

Answer

A

Create a new table for the weak entity.

The primary key in the new relation is the PK of the owner entity composite with the partial key of the weak entity. The PK of the owner entity is the FK.

Question 64

Q

How do you map superclasses and subclasses in the relational database if the specialization is attribute defined?

Answer

A

Create a relation for the superclass and relations for each subclass.

The tables for the subclasses each have the superclass PK as FK and the attributes associated with the subtype.

You could also combine attributes of superclass and subclass to a single relation.

Answer 65

A

If the generalization has total participation and is disjoined then create relation for each subclass with the superclass PK.

Answer 66

A

Create one relation for the superclass and add all attributes of superclass and subclass.

Also add boolean flags to indicate subclasses, one for each subclass.

Answer 67

A

Normalization is the process of breaking bad tables into smaller better tables.

We do this to avoid anomalies with INSERT, DELETE and UPDATE. Essentially to make the DB design better.

Answer 68

A

If a non-key attribute is functionally dependent on another non-key attribute we call this a transitive dependency. This is not allowed in 3NF.

Answer 69

A

Fulfill 2NF. Draw the functional dependencies and find the transitive dependencies.

Decompose the transitive dependencies into new relations.

The primary key in the new relation should also be in the original table to reference to the new relation.

Answer 70

A

Look at the un-normalized attributes and find the repeating ones.

Remove these and create a new relation for them and take the original keys with them.

Create a new unique key for the repeatable attributes aswell.

Answer 71

A

Look at the relations with more than one key. Check that all attributes depend on all keys.

If one attribute does not, remove it together with the key of which it does depend to a new relation.

Answer 72

A

When we want to keep track of an attribute that is tied to both entities. For example:

An employee can be a manager and we want to keep track of the starting date for manager. There could be an attribute to the relation “is” to the entities “employee” and “manager”.

Or if a runner runs a race and we want to keep track of starttime, the relationship between “runner” and “race” has attribute “starttime”.

Answer 73

A

No nested attributes
No replicative values
No multivalued values
No composite attribute
Must have a primary key

Answer 74

A

The original table has ssn, employee name, birthday, department number and department name.

Department number and department name are transitive dependencies.

New table has department number as PK and department name and the rest of the attributes from the original table remains in another table with department number as FK.

Answer 75

A

We have partial dependencies which means that we have to decompose the table to make it into 2NF.

employee name is not dependent on project number so we make a new relation with ssn and employee name. PK ssn.

project name and project manager is only dependent on project number so we move those to a new relation. PK project number.

Answer 76

A

Create a relation for each subclass with PK from superclass and all of the subclass attributes.

Only works for total participation.

Answer 77

A

The lost update is a concurrency problem that can happen if multiple transactions is trying to update the same data at the same time and some of the updates are lost and the final value is wrong.

The solution is to serialize the transactions.

Answer 78

A

Dirty read
Non-repeatable reads
Phantom reads

These problems can occur at different isolation levels that are set at each transaction. Usually we try to relax the isolation to increase concurrency but it increases the risk for the read phenomena.

Answer 79

A

Concurrency issue.

It describes a situation where a transaction reads a piece of data, and when it attempts to read the same data again later within the same transaction, it gets a different value.

The data read during a transaction is subject to modification by other transactions, leading to inconsistency in the values observed by the same transaction at different points in time.

To ensure that this cannot happen you need to set the isolation level to repeatable read or serializable.

Answer 80

A

Concurrency issue.

Refers to a situation where a transaction reads a set of rows that satisfy a certain condition, and when it attempts to read the same set of rows later within the same transaction, it observes additional rows that were inserted by other transactions.

To ensure that this cannot happen you need to set the isolation level for the transaction to serializable.

Answer 81

A

Read uncommitted
- All may occur.

Read committed
- Phantom reads
- non-repatable reads

Repeatable read
- Phantom reads

Serializable

Answer 82

A

No we would need to look at the tables in several states.

Answer 83

A

It will sum the values in T1.C2 and T2.C2 and then multiply them but only where T1.C1 = T2.C1.

It groups by the values by T1.C1.

Answer 84

A

If one transaction reads a value and the updates it and in parallel another transaction reads that same value and updates it again before the first update is done then that update will be lost and the final value will be wrong. .

Answer 85

A

The operations need to be the same and you need to check that the read and write operations come in the right order to ensure that there is no concurrency issue.

You need to make sure that you do not write and read values in the wrong order.

Answer 86

A

When you join two tables sql first finds all possible pairs between the tables based on the join condition and then filters those based on the where clause.