Introduction Flashcards
What is the difference between the Star and Snowflake schema?
A star schema has a single fact table at the center of the model, with one level of dimension tables arranged around it in a star pattern. Because there is only one level of dimension tables, star schemas are very easy to query.
A snowflake schema is simply a more complex version of a star schema.
The main difference is that the dimension tables can have multiple levels.
Each additional level of dimension tables adds joins, which can greatly increase the amount of processing required to run queries.
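Example: a minimal sketch of the extra join a snowflake schema needs, using Python's built-in sqlite3 module. All table and column names (fact_sales, dim_product_star, dim_product_snow, dim_category) and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database holding both layouts side by side (hypothetical tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star: one denormalized product dimension, category stored inline.
    CREATE TABLE dim_product_star (
        product_id INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);

    -- Snowflake: category moved to its own second-level dimension.
    CREATE TABLE dim_category (
        category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product_snow (
        product_id INTEGER PRIMARY KEY, product_name TEXT,
        category_id INTEGER REFERENCES dim_category(category_id));

    -- One fact table shared by both examples.
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);

    INSERT INTO dim_product_star VALUES
        (1, 'Laptop', 'Electronics'), (2, 'Phone', 'Electronics'), (3, 'Desk', 'Furniture');
    INSERT INTO dim_category VALUES (10, 'Electronics'), (20, 'Furniture');
    INSERT INTO dim_product_snow VALUES
        (1, 'Laptop', 10), (2, 'Phone', 10), (3, 'Desk', 20);
    INSERT INTO fact_sales VALUES
        (1, 1, 1000.0), (2, 2, 500.0), (3, 3, 250.0), (4, 1, 1200.0);
""")

# Star schema: a single join from the fact table to the dimension.
STAR_QUERY = """
    SELECT p.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_star p ON p.product_id = f.product_id
    GROUP BY p.category_name
    ORDER BY p.category_name
"""

# Snowflake schema: the same question needs a second join to reach the category.
SNOWFLAKE_QUERY = """
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_snow p ON p.product_id = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
"""

star_result = conn.execute(STAR_QUERY).fetchall()
snowflake_result = conn.execute(SNOWFLAKE_QUERY).fetchall()
print(star_result)
print(snowflake_result)
```

Both queries return the same totals per category; the snowflake version just has to walk one more level of dimension tables to get there.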
Are star schemas usually normalized?
No.
Star schemas are usually not normalized: with only one level of dimension tables, each dimension keeps all of its attributes in a single table, so values are repeated.
In a snowflake schema the dimension tables are normalized, because data in the second- and third-level dimension tables can be joined to the higher-level dimension tables. This largely removes the need for duplicate data in the database.
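Example: a rough sketch of how splitting a dimension into two levels removes duplicate values. The product rows and the normalize helper are hypothetical and only illustrate the idea.

```python
# A denormalized (star-style) dimension: the category name repeats per product.
denormalized_products = [
    {"product_id": 1, "product_name": "Laptop", "category_name": "Electronics"},
    {"product_id": 2, "product_name": "Phone", "category_name": "Electronics"},
    {"product_id": 3, "product_name": "Desk", "category_name": "Furniture"},
]


def normalize(rows):
    """Split one flat dimension into a product table and a category table."""
    category_ids = {}  # category_name -> generated category_id
    dim_product = []
    for row in rows:
        # Reuse the id if the category was seen before, otherwise assign the next one.
        category_id = category_ids.setdefault(
            row["category_name"], len(category_ids) + 1
        )
        dim_product.append(
            {
                "product_id": row["product_id"],
                "product_name": row["product_name"],
                "category_id": category_id,
            }
        )
    dim_category = [
        {"category_id": cid, "category_name": name}
        for name, cid in category_ids.items()
    ]
    return dim_product, dim_category


dim_product, dim_category = normalize(denormalized_products)

# Count how often the text 'Electronics' is stored in each layout.
copies_before = sum(
    r["category_name"] == "Electronics" for r in denormalized_products
)
copies_after = sum(r["category_name"] == "Electronics" for r in dim_category)
print(copies_before, copies_after)
```

After the split, each category name is stored once and products refer to it by id, which is the trade the snowflake schema makes in exchange for the extra join.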
Define the main differences between SQL and NoSQL
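SQL Databases
Fixed schema
Vertically scalable
Table based
ACID (atomicity, consistency, isolation, durability)
NoSQL Databases
Flexible schema (structured, semi-structured, or unstructured data)
Horizontally scalable
Types: document, key-value, graph
BASE (basically available, soft state, eventual consistency)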
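Example: a small sketch of the fixed versus flexible schema point. sqlite3 stands in for a SQL database, and a plain dict of JSON strings stands in for a document store; it is not a real NoSQL client, and the customers table and its fields are invented for the example.

```python
import json
import sqlite3

# SQL side: the table definition fixes which columns a row may have.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (1, "Ada"))

try:
    # 'twitter' is not part of the schema, so the database refuses the insert.
    conn.execute(
        "INSERT INTO customers (id, name, twitter) VALUES (?, ?, ?)",
        (2, "Grace", "@grace"),
    )
    fixed_schema_rejected = False
except sqlite3.OperationalError:
    fixed_schema_rejected = True

# Document side (stand-in): each record is a self-describing JSON document,
# so two records in the same collection can carry different fields.
document_store = {}
document_store["1"] = json.dumps({"name": "Ada"})
document_store["2"] = json.dumps(
    {"name": "Grace", "twitter": "@grace", "tags": ["navy"]}
)

loaded = {key: json.loads(value) for key, value in document_store.items()}
print(fixed_schema_rejected, loaded["2"])
```

To store the extra field in the SQL table, the schema would first have to be changed (for example with ALTER TABLE), whereas the document simply carries it.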
Define the key differences between ETL and ELT processes
ETL
Extract (Data Factory)
Transform (Databricks)
Load (into SQL)
ELT
Extract (into a staging area with Data Factory)
Load (PolyBase into the Data Lake)
Transform (Databricks)
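Example: a minimal sketch of the two orderings, extract-transform-load versus extract-load-transform. Plain Python lists stand in for the SQL target and the data lake; nothing here calls Data Factory, Databricks, or PolyBase, and the function names and sample rows are hypothetical.

```python
def extract():
    """Pretend source system: raw rows with messy text and string amounts."""
    return [
        {"name": " ada ", "amount": "10"},
        {"name": "GRACE", "amount": "25"},
    ]


def transform(rows):
    """Clean names and convert amounts to integers."""
    return [
        {"name": row["name"].strip().title(), "amount": int(row["amount"])}
        for row in rows
    ]


def run_etl(warehouse):
    # Transform happens before the load: only cleaned rows reach the target.
    warehouse.extend(transform(extract()))


def run_elt(lake):
    # Raw rows are loaded as-is; transformation is deferred.
    lake.extend(extract())


warehouse, lake = [], []
run_etl(warehouse)
run_elt(lake)

# In the second ordering the transform runs later, only when the data is needed.
transformed_on_demand = transform(lake)
print(warehouse)
print(lake)
```

The end result is the same cleaned data, but in the second flow the raw rows remain available in the lake and the transform cost is only paid for data that is actually used.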
What are the key benefits of ETL vs ELT?
ETL (mainly SQL)
Not good for loading massive amounts of data
May transform data that is never used
ELT (Synapse/Cosmos)
Transform only when needed
Needs good governance
Define Data Democratization and Data Governance
Data Democratization
Data democratization is a measure of how accessible your data is.
It means making data accessible to the people who need it, not just to IT or through IT.
Data Governance
Data democratization can be a bad thing if it is applied without controls. The other side of that coin is data governance: deciding who should have access to what data, how the data is stored, and how it is controlled or released to other interested parties.
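Example: one very simplified way to picture "who should have access to what data" is a policy table that is checked before any read. The dataset names, roles, and the can_read helper are all hypothetical; real governance tooling covers much more (storage, lineage, release processes).

```python
# Hypothetical policy: dataset name -> roles allowed to read it.
ACCESS_POLICY = {
    "sales_summary": {"analyst", "manager", "engineer"},  # widely shared
    "customer_pii": {"engineer"},  # restricted
}


def can_read(role, dataset):
    """Return True only if the policy explicitly grants the role access."""
    return role in ACCESS_POLICY.get(dataset, set())


print(can_read("analyst", "sales_summary"))
print(can_read("analyst", "customer_pii"))
```

Datasets that are missing from the policy are denied by default, which keeps broad access (democratization) an explicit decision rather than an accident.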