Database Design / Case Study Flashcards
What database relationship do you know?
1 to 1: so one entity associating with another. An example would be an employee with a particular work location
1-to-many: when one entity is associated with many others. an example would be if we continue a work location and all the employees that work there
Many to 1: when many entities are associated with one. An exmaple all of the students working on one project
Many to Many: when many entities are associated with many others. customers and products are a good one. Many products can be associated with many customers
Difference between snowflake and star schema and when you might use each?
A star schema is a data model where dimensional and fact tables are connected in a star like shape, for instance:
lets say an e-commerce site. it could have many different dimension tables. One for product. One for customer. One for store and one for date. these would link to a sales fact table. That would have the id’s of all the dimension tables and the purchase amount. The shape of this would resemble a star hence the name.
A snowflake schema splits dimensional tables into further dimension tables. So you will have more tables usually with an id and then name. each dimension is split until there is no repetition of values exceot for identifiers like id’s. So snowflake schemas are normalized whereas star chemas are denormaizled
Advantages:
Star schema has simple query writing.A snowflake schema might need multiple joins so there is a higher processing overhead.
However, star schemas take less space because they have data redundancy
Snowflake has better data integrity because it is normalized where as star schemas on update or delete can only update or delete one of the redundant rows
Generally, star schemas are easier to setup and query. Snowflake is easier to maintain.
What is normalization? Denormalization?
Data normalization is organizing and formatting data to appear similar across all records and fields. Data normalization helps provide analysts with more efficient and precise navigation, removing duplicate data and maintaining referential integrity.
On the other hand, Denormalization is a database technique in which redundant data is added to one or more tables. This technique can optimize performance by reducing the need for costly joins.
How would you handle data loss during a migration?
1) Define the specific data required for migration
2) Avoid migrating data that is no longer needed
3) run business analysis to determine data quality rules
4) understand the different rules of the source system you are moving to and test for that
5) perform real time verifications during the migration
6) Define a clear flow for data, error reporting, and rerun procedures
7) define the correct order of migration depending on the source.
8)Batch migration so maybe you split into 3 parts and test each time
- What are the three main types of data model techniques?
relational, dimensional, and entity-relationship
relational data model uses tables with columns and rows
entity relationship - a model with rel world entities and the relaitonships between them. An employee in an employee database
Dimensional modeling is the tehcnique of data model to use fact and dimension tables. This is to optomize for the SELECT operation on SQL.
What is normalization? Denormalization?
Normalization is removing duplicate data and making data appear similair across all records and fields. This makes it easier to update data correctly, maintains a higher data integrity.
Denormalizaiton is a database technique in which redundant data is added to one or more tables to make querying faster because you ned to do less joins.
What are advantages and disadvantages of a NoSQL database?
NoSQL has superior scalability and superior performation and can handle all types of data. Unstructured, semi-structured and structured. It is easier to spin up as you dont need to define schema.
If you have a huge huge amount of data like an Uber then you would probably be using this somewhere. Or if you are rapidly changing schema’s. NoSQL now though is a bit of a misnomer as like MongoDB does have querying and relational references.
relational databases are more consistent during writes because there is a rigid schema behind them. meaning your data is more consistent.
Generally I feel that relational databases get cheaper and better for performance. If you dont have to use NoSQL then dont it will make your life and your data organization easier. If you need to cut costs and have the scale then you can.
The biggest question is can you let go of dara consistency for faster writes? like yfor you guys i would think no way because its financial informaiton.
structured vs unsctructured vs semistructred data?
ACID
Acid is an acronym for a database system that optimizes around atomicity, consistency, isolation and durability.
Automicity - the entire process ush finsih or else the file, document, transaction reverts back to its prior state
Consistency - any transaction you complete in the database follows rules you or others gave
Isolation - Transactions happen seperately from one another. This is tricky because it slows your sysetems down but adds security and data protection.
Durability - the database stores all the information on permanent storage.
What is a lake-house?
a new data management architecture that combines the scale of data lakes with the ACID stable transactions of a data warehouses. Databricks offers a data lakehouse.
Its i also think a marketing strategy for alot of companies.
With what database design patterns do you have the most experience
Lazy loading for sure. That us ow we have made our app alot faster. CAlls to the DB dont happen until the user initiates them.
How would you design a relational database of customer data?
You are analyzing auto insurance data and find that the marriage attribute column is marked TRUE for all customers.
How far back does this bug go? Is this a star or snowflake schema? If I add a new row is this populating?
What database optimizations might you consider for a Tinder-style app?
O okay. So first off querying the whole database for user_id of the swiped or somehtign woudl be awful. So fist partition by zipcode since you are only going to be swiping by those in your zipcode.
Secondly make one transactional table that just dealt with all the information the UI would need. That way we can make that info super fast and likely we dont need that much info.
All the BI would go to a different database for machine learning.
I see the biggest challenge being swiping with a Yes/No value and how best to store that data. I think that would be a good example of where you coudl use NoSQL because its a huge amount of very simple not complicated data. You might have to use NoSQL for that if every swipe is a new row.
How would you design a relational database of customer data?
1) gather staheholder input and determine the purpose of the database. What kind of analysis will be performed? how oftern will that analysis be performed ?
2) I would then write out the schema I needed. Grab test data and start writing queries and
3) after that I would clean the data. Add columns and compress the size of necessary data.
4) then have a stakeholder test
5) then ship