database design Flashcards
What is indexing?
A persistent data structure which can be used to quickly locate rows in a table based on the values of one or several given attributes.
What are the gains and losses of indexing?
The gain of indexing:
- Searches in the database goes faster
The cost of an index:
- extra storage space
- index creation takes time - but only done once
- index maintenance - index must be updated – when the data in the table changes (might be a problem).
What does a full table scan mean? Is it good or bad?
It means that the database needs to search through all of the existing rows to find the row of your query.
This happens if you do not have indexing and the attribute your looking for is not unique. If the table has many rows it is not time efficient.
Why can it be a bad idea to use indexing for an attribute?
If you put an index over an attribute that means that 90% of the rows are selected anyway it could mean that it takes longer than to just do a full table scan.
How can you find out how many rows the database has to search through for your query?
put “explain” in front of your query.
Should you put indexing over primary and foreign keys?
No, because they are already unique and the database will find that row right away. Same thing for unique attributes.
What is the syntax for creating indexes?
CREATE INDEX index_name ON table_name(column1, column2…);
Explain what is required for first normal form
Composite attributes or attributes with different values are not allowed in relation.
A relation must have a primary key.
Nested relations are not allowed - the solution is to create new tables.
Multivariate attributes (repeating attributes) are not allowed.
Why do we not like redundancy in relational tables?
We do not like redundancy in relational tables because if we want to change one value of one row then we have to make changes in many places.
Why do we not like data-replication in relational models?
If we want to delete or change for example the name of something then we have to do it in many places which is inefficient. It is better to create a new relation with a foreign key to the first table where you had the redundancy.
Explain what is required for second normal form?
1NF + each non-key attribute must depend on the entire primary key: avoid partial dependencies.
If there is a composite primary key then the attributes needs to have a functional dependency to both.
If the primary key is not in composite form then it automatically fulfills 2NF.
Explain what is required for third normal form.
2NF + All non-key attributes will be dependent on the primary key.
If this is not the case it is called transitive dependency and we have to split the table and make a new table. The primary key in the “transitive table” is used as FK in the other table.
So to find the 3NF you have to find the transitive dependencies and split those into new tables.
If your relational database is in 3NF then the database is highly optimized.
What is a prime attribute and a non-prime attribute.
Prime attribute = an attribute belongs to at least one candidate key
Non-prime attribute = An attribute that doesn’t belong to any candidate key.
What is normalization?
The concept of “fixing your design” is called normalization.
“ The process of making something normal”
The process of breaking bad relations into smaller better relations(tables).
The problems with insert,update and delete will be resolved.
How do you choose a good primary key?
From your candidate keys you should choose an attribute that is an integer because an int will outperform a str.
Choose as primary key the smallest of the candidate keys in terms of size in bytes. Ints are smaller than str.
How do we map a many-many relationship in the ER diagram?
When you have a many to many relationships in the ER diagram you have to create a new relation that has two foreign keys (one primary key from each entity).
For example:
An employee can work on many projects and a project can have many employees.
Create one table for each entity and one for the relationship WORKS_FOR that has the PK from both entities as foreign keys. If the relationship has an attribute then the relationship table should have a column for that too.
What can go wrong with the INSERT command?
INSERT to add information to your tables.
INSERT can violate:
- domain constraint: if the value already exists -
- Referential integrity: FK (that you’re trying to put in) is not present
- Entity integrity: PK value is null.
What can go wrong with the DELETE command?
You can not delete a row whose information is used as a key in another table. MySQL would reject the deletion.
A solution could be to put null in the other table that uses the information (SET NULL). Another solution could be to delete all the rows in tables that use that information (CASCADE).
What can go wrong with the UPDATE command?
Is basically INSERT and DELETE combined. All the things that can go wrong with insert and delete can go wrong with update.
What is a candidate key?
Candidate key = unique attributes that could be used as primary keys.
What different data structures for indexing are there? what can they be used for?
Band B
Btrees
These two are good for =, >, < and BETWEEN.
Hash-tables can also be used for indexing but only works for =.
Does indexing over last_name work when you query for ‘%g’ or ‘g%’?
No, you need to query for the full last name for the index to work. It will do a full table scan for this query.
These queries are used a lot in a database, on which attributes should you put index?
attribute A is primary key and the attributes are A-D
UPDATE R SET D=? WHERE B=?
SELECT D FROM R WHERE C=?
On B and C because they are filtered frequently and these queries will then go faster.
What is the use of a primary key and foreign key?
What are the entity integrity restraints for these?
The primary key value is used to uniquely identify each tuple (row) in a relation(table). PK is used to be referenced from another tuple as FK.
The primary key cannot be null. Although attributes of a relation (except PK) can have null values.
What can be the problem with doing two transactions in parallel?
How do we fix that problem?
The two transaction happen in parallel and the updated value is not correct because they are not taking each other into account. This is called the lost update problem.
To avoid this a transaction schedule must be serialized. It could be slow because of this.
What is a transaction?
a series of operations that need to be executed as a single unit of work that transforms the database from one consistent state to another.
What is ACID?
A set of required properties of transactions:
Atomicity = all statements of a transactions are executed or none.
Consistency = all database constraints are satisfied after a transaction is executed.
Isolation = The result of executing concurrent transactions is the same as if the transactions were executed serially.
Durability = ensures that changes made to the database (transactions) that are successfully committed will survive permanently, even in the case of system failures
In transactions, what is commit and rollback?
commit = making the effects of the transactions permanent.
Rollback = aborting the transaction (because of failure ect).
What is a dirty read problem?
A dirty read problem is what happens when someones reads data as I am updating it in a transaction and the update has not yet been committed.
What is a relational database?
Data stored in table format.
What is the purpose of an ER model?
To make a blueprint of all entities, attributes ect. of the database we are going to design.
Organize how the data will be stored.
What is an entity?
A specific table(thing or object). Holds set/collection of objects.
What is an attribute?
A feature of a table or entity. I.g. the columns of the tables.
Each attribute has a data type (int, str, boolean ect.)
What is a derived attribute?
You can find the value from another attribute. Ex. you can get age from date of birth.