database design Flashcards

1
Q

What is indexing?

A

A persistent data structure which can be used to quickly locate rows in a table based on the values of one or several given attributes.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the gains and losses of indexing?

A

The gain of indexing:
- Searches in the database goes faster

The cost of an index:
- extra storage space
- index creation takes time - but only done once
- index maintenance - index must be updated – when the data in the table changes (might be a problem).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What does a full table scan mean? Is it good or bad?

A

It means that the database needs to search through all of the existing rows to find the row of your query.

This happens if you do not have indexing and the attribute your looking for is not unique. If the table has many rows it is not time efficient.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Why can it be a bad idea to use indexing for an attribute?

A

If you put an index over an attribute that means that 90% of the rows are selected anyway it could mean that it takes longer than to just do a full table scan.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

How can you find out how many rows the database has to search through for your query?

A

put “explain” in front of your query.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Should you put indexing over primary and foreign keys?

A

No, because they are already unique and the database will find that row right away. Same thing for unique attributes.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is the syntax for creating indexes?

A

CREATE INDEX index_name ON table_name(column1, column2…);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Explain what is required for first normal form

A

Composite attributes or attributes with different values are not allowed in relation.

A relation must have a primary key.
Nested relations are not allowed - the solution is to create new tables.

Multivariate attributes (repeating attributes) are not allowed.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Why do we not like redundancy in relational tables?

A

We do not like redundancy in relational tables because if we want to change one value of one row then we have to make changes in many places.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Why do we not like data-replication in relational models?

A

If we want to delete or change for example the name of something then we have to do it in many places which is inefficient. It is better to create a new relation with a foreign key to the first table where you had the redundancy.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Explain what is required for second normal form?

A

1NF + each non-key attribute must depend on the entire primary key: avoid partial dependencies.

If there is a composite primary key then the attributes needs to have a functional dependency to both.

If the primary key is not in composite form then it automatically fulfills 2NF.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Explain what is required for third normal form.

A

2NF + All non-key attributes will be dependent on the primary key.

If this is not the case it is called transitive dependency and we have to split the table and make a new table. The primary key in the “transitive table” is used as FK in the other table.

So to find the 3NF you have to find the transitive dependencies and split those into new tables.

If your relational database is in 3NF then the database is highly optimized.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is a prime attribute and a non-prime attribute.

A

Prime attribute = an attribute belongs to at least one candidate key

Non-prime attribute = An attribute that doesn’t belong to any candidate key.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is normalization?

A

The concept of “fixing your design” is called normalization.

“ The process of making something normal”
The process of breaking bad relations into smaller better relations(tables).

The problems with insert,update and delete will be resolved.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How do you choose a good primary key?

A

From your candidate keys you should choose an attribute that is an integer because an int will outperform a str.

Choose as primary key the smallest of the candidate keys in terms of size in bytes. Ints are smaller than str.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

How do we map a many-many relationship in the ER diagram?

A

When you have a many to many relationships in the ER diagram you have to create a new relation that has two foreign keys (one primary key from each entity).

For example:
An employee can work on many projects and a project can have many employees.

Create one table for each entity and one for the relationship WORKS_FOR that has the PK from both entities as foreign keys. If the relationship has an attribute then the relationship table should have a column for that too.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What can go wrong with the INSERT command?

A

INSERT to add information to your tables.

INSERT can violate:
- domain constraint: if the value already exists -
- Referential integrity: FK (that you’re trying to put in) is not present
- Entity integrity: PK value is null.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What can go wrong with the DELETE command?

A

You can not delete a row whose information is used as a key in another table. MySQL would reject the deletion.

A solution could be to put null in the other table that uses the information (SET NULL). Another solution could be to delete all the rows in tables that use that information (CASCADE).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What can go wrong with the UPDATE command?

A

Is basically INSERT and DELETE combined. All the things that can go wrong with insert and delete can go wrong with update.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What is a candidate key?

A

Candidate key = unique attributes that could be used as primary keys.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What different data structures for indexing are there? what can they be used for?

A

Band B
Btrees
These two are good for =, >, < and BETWEEN.

Hash-tables can also be used for indexing but only works for =.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Does indexing over last_name work when you query for ‘%g’ or ‘g%’?

A

No, you need to query for the full last name for the index to work. It will do a full table scan for this query.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

These queries are used a lot in a database, on which attributes should you put index?
attribute A is primary key and the attributes are A-D

UPDATE R SET D=? WHERE B=?
SELECT D FROM R WHERE C=?

A

On B and C because they are filtered frequently and these queries will then go faster.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

What is the use of a primary key and foreign key?

What are the entity integrity restraints for these?

A

The primary key value is used to uniquely identify each tuple (row) in a relation(table). PK is used to be referenced from another tuple as FK.

The primary key cannot be null. Although attributes of a relation (except PK) can have null values.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

What can be the problem with doing two transactions in parallel?

How do we fix that problem?

A

The two transaction happen in parallel and the updated value is not correct because they are not taking each other into account. This is called the lost update problem.

To avoid this a transaction schedule must be serialized. It could be slow because of this.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

What is a transaction?

A

a series of operations that need to be executed as a single unit of work that transforms the database from one consistent state to another.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

What is ACID?

A

A set of required properties of transactions:

Atomicity = all statements of a transactions are executed or none.

Consistency = all database constraints are satisfied after a transaction is executed.

Isolation = The result of executing concurrent transactions is the same as if the transactions were executed serially.

Durability = ensures that changes made to the database (transactions) that are successfully committed will survive permanently, even in the case of system failures

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

In transactions, what is commit and rollback?

A

commit = making the effects of the transactions permanent.
Rollback = aborting the transaction (because of failure ect).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

What is a dirty read problem?

A

A dirty read problem is what happens when someones reads data as I am updating it in a transaction and the update has not yet been committed.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

What is a relational database?

A

Data stored in table format.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

What is the purpose of an ER model?

A

To make a blueprint of all entities, attributes ect. of the database we are going to design.

Organize how the data will be stored.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

What is an entity?

A

A specific table(thing or object). Holds set/collection of objects.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

What is an attribute?

A

A feature of a table or entity. I.g. the columns of the tables.

Each attribute has a data type (int, str, boolean ect.)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

What is a derived attribute?

A

You can find the value from another attribute. Ex. you can get age from date of birth.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
35
Q

What is the use of a primary key?

A

A unique attribute that is used to locate a single row in a table.

36
Q

What is a composite key?

A

A composite key is a unique combination of attributes that are used to locate a single row in a table.

37
Q

What are the key constrains?

A

If a relation has several candidate keys, one is chosen arbitrarily as primary key.

The primary key value is used to uniquely identify each tuple (row) in a relation(table).

PK is used to be referenced from another tuple as FK.

38
Q

What are the entity integrity constrains?

A

The primary key cannot be null.

Although attributes of a relation (except PK) can have null values.

39
Q

What are the Referential integrity constraints (FK rules)?

A

The value in the FK columns can be either a value of an existing PK value from another relation or a null.

40
Q

How can you make sure that an attribute that you want unique cannot be added twice?

A

Make a constraint by using CREATE TRIGGER or CREATE ASSERTION.

41
Q

What does functional dependency mean?

A

A functional dependency is a relationship between two sets of attributes in a database, where one set (the determinant) determines the values of the other set (the dependent).

42
Q

How can we find functional dependencies?

A

If one attribute has only unique values then other attributes are going to be functionally dependent on that attribute.

“A functional dependency is an association between two attributes. We say there is a functional dependency from attribute A to an attribute B if and only if for each value of A there can be at most one value for B “

43
Q

What type of attributes are there?

A

simple - single/atomic values

derived - derived from another attribute: age from date of
birth.

composite - combination of simple attributes. Ex. name. parenthesis in ER.

multi-valued - Multiple values for a attribute. Ex. phone number. Double ovals in ER.

44
Q

What is a recursive relationship?

A

A relationship between the same entity.
Ex. an employee can both be and have a boss.

45
Q

What is a weak entity?

A

An entity without a key, but it has partial-keys.

It depends on another entity.

46
Q

How is a transaction written?

A

BEGIN;

SQLcode

COMMIT; ROLLBACK;

47
Q

Why do we need to write rollback in a transaction code?

A

Because if something happens before the transaction is committed it rolls back.

It is a way to undo a transaction that is not yet permanent.

Either all of the code runs or none of it.

48
Q

Why is the isolation property in ACID often relaxed?

A

To reduce the overhead and increase the concurrency.

There are weaker isolation levels that allow some serializability violations in order to achieve higher performance.

Isolation level is set per transaction and reflects what read phenomena could happen in the transaction.

49
Q

What is concurrency?

A

Enabling two or more users to retrieve information from the database at the same time without affecting data integrity.

50
Q

What is the difference between an ER diagram and an EER diagram?

A

EER is an enhanced ER diagram.

Is has additional concepts such that allow for specialization: subclasses and superclass.

51
Q

What is specialization?

A

The process of defining subclasses.

  • A subtype is a relationship and the subclass inherits all of the attributes and relationships of the superclass.
52
Q

What is generalization?

A

It is the reverse of specialization. Several classes with common features generalize into a superclass.

53
Q

What is the difference between disjoint and overlapping in specialization?

A

disjoint (d) = an entry is a member of at most one subclass.

overlapping (o) = an entry is a member of one or more subclass.

54
Q

What is the difference between total and partial in specialization?

A

Total (double line to (d) or (o) from superclass) = every entry is a member of a subclass.

Partial (single line) = not every entry is part of a superclass.

55
Q

What does it mean when the “U” sign is used when there is no specialization?

A

It means that for example some of the employees are managers if the sign is used between entities employee and manager.

56
Q

What is attribute-defined specialization?

A

Specialization is based on one of the attributes of the superclass.

The attribute name is written in the specialization.

57
Q

What are the characteristics for relational tables?

A
  • Atomic values (basic types like str int).
  • No multivalued data
  • Each value in a tuple must come from the domain of the attribute (set of values allowed in an attribute).
  • Value can be NULL to represent unknown/unavailable ect.
58
Q

What is a foreign key?

A

A relation uses another relation’s
primary key as an attribute.

59
Q

What are the basic operations for changing a database?

A

INSERT: a new tuple
DELETE: an existing tuple
MODIFY: an existing tuple

New database state arises after every operation and the database must be in a valid state after every operation.

60
Q

What are the three different constraints used to maintain a valid state database?

A

key constraints

entity integrity constraints

referential integrity (FK) constraints

61
Q

How can we create tables for one-one and one-many relationships in the ER diagram?

A

one-one relationship:
Foreign key. Use the entity with total participation (mandatory) and use the primary key from that table as foreign key in the optional table.

one-many relationship:
Foreign key. Use the “one-side” of the relationship. Use the primary key from that table as the foreign key in the “many-side” table.

62
Q

How can you map a multivalued attribute?

A

For a multivalued attribute, create a new table. This table should have:
- The attribute itself
- PK of original table as FK

You could also use a composite key in the new table of the original PK and the multivalued attribute.

63
Q

How can you map an entity in the relational tables if the entity is weak? What is the primary key of the new relation?

A

Create a new table for the weak entity.

The primary key in the new relation is the PK of the owner entity composite with the partial key of the weak entity. The PK of the owner entity is the FK.

64
Q

How do you map superclasses and subclasses in the relational database if the specialization is attribute defined?

A

Create a relation for the superclass and relations for each subclass.

The tables for the subclasses each have the superclass PK as FK and the attributes associated with the subtype.

You could also combine attributes of superclass and subclass to a single relation.

65
Q

How can you map generalization in the relational database?

A

If the generalization has total participation and is disjoined then create relation for each subclass with the superclass PK.

66
Q

How do you map an EER to relational database if the EER has overlapping generalization?

A

Create one relation for the superclass and add all attributes of superclass and subclass.

Also add boolean flags to indicate subclasses, one for each subclass.

67
Q

What is normalization and why do we need it?

A

Normalization is the process of breaking bad tables into smaller better tables.

We do this to avoid anomalies with INSERT, DELETE and UPDATE. Essentially to make the DB design better.

68
Q

What is a transitive dependency?

A

If a non-key attribute is functionally dependent on another non-key attribute we call this a transitive dependency. This is not allowed in 3NF.

69
Q

How do we achieve 3NF?

A

Fulfill 2NF. Draw the functional dependencies and find the transitive dependencies.

Decompose the transitive dependencies into new relations.

The primary key in the new relation should also be in the original table to reference to the new relation.

70
Q

How do we go from un-normalized form to 1NF
?

A

Look at the un-normalized attributes and find the repeating ones.

Remove these and create a new relation for them and take the original keys with them.

Create a new unique key for the repeatable attributes aswell.

71
Q

How do we go from 1NF to 2NF?

A

Look at the relations with more than one key. Check that all attributes depend on all keys.

If one attribute does not, remove it together with the key of which it does depend to a new relation.

72
Q

When is it appropriate for a relationship to have an attribute in an ER diagram?

A

When we want to keep track of an attribute that is tied to both entities. For example:

An employee can be a manager and we want to keep track of the starting date for manager. There could be an attribute to the relation “is” to the entities “employee” and “manager”.

Or if a runner runs a race and we want to keep track of starttime, the relationship between “runner” and “race” has attribute “starttime”.

73
Q

Explain what is needed for 1NF

A
  • No nested attributes
  • No replicative values
  • No multivalued values
  • No composite attribute
  • Must have a primary key
74
Q

Give an example of how we decompose a table with transitive dependencies.

A

The original table has ssn, employee name, birthday, department number and department name.

Department number and department name are transitive dependencies.

New table has department number as PK and department name and the rest of the attributes from the original table remains in another table with department number as FK.

75
Q

A table has:
ssn and project number as composite PK, employee name, hours, project name and project manager.

How do we make it into 2NF?

A

We have partial dependencies which means that we have to decompose the table to make it into 2NF.

employee name is not dependent on project number so we make a new relation with ssn and employee name. PK ssn.

project name and project manager is only dependent on project number so we move those to a new relation. PK project number.

76
Q

How do we map an EER that has generalization?

A

Create a relation for each subclass with PK from superclass and all of the subclass attributes.

Only works for total participation.

77
Q

What is a lost update problem? What is the solution?

A

The lost update is a concurrency problem that can happen if multiple transactions is trying to update the same data at the same time and some of the updates are lost and the final value is wrong.

The solution is to serialize the transactions.

78
Q

What are the different read phenomenas that can happen. What are they dependent on?

A

Dirty read
Non-repeatable reads
Phantom reads

These problems can occur at different isolation levels that are set at each transaction. Usually we try to relax the isolation to increase concurrency but it increases the risk for the read phenomena.

79
Q

What is the non-repeatable phenomena?

A

Concurrency issue.

It describes a situation where a transaction reads a piece of data, and when it attempts to read the same data again later within the same transaction, it gets a different value.

The data read during a transaction is subject to modification by other transactions, leading to inconsistency in the values observed by the same transaction at different points in time.

To ensure that this cannot happen you need to set the isolation level to repeatable read or serializable.

80
Q

What is the phantom read phenomena? What can you do to avoid it?

A

Concurrency issue.

Refers to a situation where a transaction reads a set of rows that satisfy a certain condition, and when it attempts to read the same set of rows later within the same transaction, it observes additional rows that were inserted by other transactions.

To ensure that this cannot happen you need to set the isolation level for the transaction to serializable.

81
Q

What are the different isolation levels and which phenomena can happen is you choose them?

A

Read uncommitted
- All may occur.

Read committed
- Phantom reads
- non-repatable reads

Repeatable read
- Phantom reads

Serializable

82
Q

By looking at just one table in one state in a database, can we with certainty say that some functional dependencies hold?

A

No we would need to look at the tables in several states.

83
Q

SELECT T1.C1, SUM(T1.C2*T2.C2) FROM T1, T2
WHERE T1.C1 = T2.C1
GROUP BY T1.C1

What will this query do?

A

It will sum the values in T1.C2 and T2.C2 and then multiply them but only where T1.C1 = T2.C1.

It groups by the values by T1.C1.

84
Q

By looking at the schemas of transactions, how can we know if they will suffer from a lost update problem?

A

If one transaction reads a value and the updates it and in parallel another transaction reads that same value and updates it again before the first update is done then that update will be lost and the final value will be wrong. .

85
Q

By looking at the schemas of two transactions, how do you know if one of them is serializable to the other?

A

The operations need to be the same and you need to check that the read and write operations come in the right order to ensure that there is no concurrency issue.

You need to make sure that you do not write and read values in the wrong order.

86
Q

How does a joined query work?

A

When you join two tables sql first finds all possible pairs between the tables based on the join condition and then filters those based on the where clause.