Module 12 - Data Storage Design Flashcards
What are the 2 basic types of data storage?
Files
Databases
What are files used for?
Optimizing a specific transaction:
Master files
look-up files
transaction files
audit files
history files
What are databases for?
Databases are for storing a collection of files
What is a relational database?
A database that uses primary keys and foreign keys. It is a collection of tables.
SQL is used for CRUD
The DMBS ensures referential integrity i.e. all foreign key elements cannot be null.
What is a multidimensional database?
Used in data warehousing (collecting data from a wide range of sources)
Allows facts to be aggregated quickly on various dimensions.
What is a NoSQL Database?
Think of firebase, collections -> documents
Wide column stores
File strengths and weaknesses:
Good performance
Good for short term data storage
Redundant data; data must be updated using programs
What is an advantage and disadvantage of Legacy databases?
Very mature products
Outdated, not efficient, limited future
Ouch.
Relational database pros and cons
Leader in database market.
Uses SQL.
Lower level database with flexibility, the opposite of a no SQL database that is more abstracted.
Cannot handle complex data
Object databases pros and cons
Can handle complex data.
Limited market
Multidimensional database pros and cons
Can provide business facts quickly.
Highly specialized use.
NoSQL database pros and cons
Good for huge varied datasets.
New in the market
Specialized use
Entities –> tables
Each row represents a single entity instance
Each column represents an attribute
All values in a column have the same data format.
Each column has a range of values knows as the attribute domain
The order of rows and columns do not matter
Each table must have a primary key
What is a primary key?
An identifier composed of one or more attributes.
A table can have multiple candidates for a primary key.
1:M Relationship using PKs and FKs
Put the PK of the 1: in the table of the :M as a foreign key on the far left
You can do this because every instance of the many will refer to the one and only entity.
What is a foreign key?
The primary key of an outside table to create a COMMON ATTRIBUTE (the FK).
Referential Integrity
Every reference to an entity instance through an FK is technically valid, it can be either null or a valid value.
Null values should be avoided as much as possible
This can happen in relationships where there can be one or no entities.
1:1 Relationship with PKs and FKs
We add the FK in the middle of the table where it makes sense
M:N Relationship with PKs and FKs
Relationships are implemented through common attributes, what common attributes should we define for M:N?
We create an intersection or bridge entity, then we add the FKs to it on the far left.
I think this effectively creates two 1:M relationships
Why should null values in a database be avoided?
Because they threaten data integrity, think of comp 1800 and missing images.
What is the best way to optimize data storage for efficiency?
Normalization
What is denormalization?
Adding back redundant attributes to avoid aggregation (combing two tables together to access values from multiple tables)
What is indexing
Similar to a library catalogue, index keys POINT to table rows.
Indexing requires overhead, they need to be updated whenever data is inserted, deleted, or changed.