Deck 1 Flashcards
Define a Star Schema
A data warehouse design consisting of one or more fact tables, with a single table for each dimension. The dimension keys in the fact tables are connected to the dimension tables through primary key and foreign key relationships.
What is a Fact Table?
A fact table is a table of keys and measurements: foreign keys that reference the dimension tables, plus the numeric measures being analyzed.
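Example (a minimal sketch, not a production design): the star schema and fact table cards above can be illustrated with an in-memory SQLite database. The table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per dimension, each with a primary key.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, iso_date TEXT, year INTEGER);

-- Fact table: foreign keys to the dimensions plus numeric measurements.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    quantity    INTEGER,
    amount      REAL
);

INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO dim_date VALUES (20240101, '2024-01-01', 2024), (20240102, '2024-01-02', 2024);
INSERT INTO fact_sales VALUES (1, 20240101, 3, 30.0), (2, 20240101, 1, 25.0), (1, 20240102, 2, 20.0);
""")

# A typical star-schema query: join the fact table to one dimension and aggregate.
rows = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(rows)  # [('Gadget', 25.0), ('Widget', 50.0)]
```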
Define a Snowflake Schema
Similar to the star schema, but it uses normalization, which splits the dimension data into additional tables. The splitting reduces redundancy and avoids wasted storage. A snowflake schema is easy to manage but more complex to design and understand.
7 Key differences between Star and Snowflake Schema
- A star schema contains just one dimension table per dimension, while a snowflake schema may have a dimension table plus sub-dimension tables for one dimension.
- Normalization is used in a snowflake schema, which eliminates data redundancy. By contrast, normalization is not performed in a star schema, which results in data redundancy.
- Star schema is simple, easy to understand and involves less intricate queries. On the contrary, snowflake schema is hard to understand and involves complex queries.
- The data model approach used in a star schema is top-down whereas snowflake schema uses bottom-up.
- A star schema uses fewer joins. A snowflake schema, on the other hand, uses a larger number of joins.
- A star schema consumes more space than a snowflake schema.
- Executing a query against a star schema takes less time. Conversely, a snowflake schema takes more time because of the additional joins.
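Example (a sketch under stated assumptions): the join and redundancy differences listed above can be seen by modelling the same product dimension both ways in SQLite. All table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the category name is repeated on every product row (redundant).
CREATE TABLE star_product (product_key INTEGER PRIMARY KEY, name TEXT, category_name TEXT);

-- Snowflake: the category is normalized into its own sub-dimension table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE snow_product (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);

CREATE TABLE fact_sales (product_key INTEGER, amount REAL);

INSERT INTO star_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'), (3, 'Manual', 'Books');
INSERT INTO dim_category VALUES (10, 'Hardware'), (20, 'Books');
INSERT INTO snow_product VALUES (1, 'Widget', 10), (2, 'Gadget', 10), (3, 'Manual', 20);
INSERT INTO fact_sales VALUES (1, 30.0), (2, 25.0), (3, 10.0);
""")

# Star schema: one join reaches the category name.
star_sql = """
    SELECT p.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN star_product AS p ON p.product_key = f.product_key
    GROUP BY p.category_name ORDER BY p.category_name
"""

# Snowflake schema: the same question needs a second join.
snow_sql = """
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN snow_product AS p ON p.product_key = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name ORDER BY c.category_name
"""

star_rows = conn.execute(star_sql).fetchall()
snow_rows = conn.execute(snow_sql).fetchall()
print(star_rows == snow_rows)  # True: same answer, different number of joins
```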
What is a deadlock?
A system is in a deadlock state if there exists a set of transactions such that every transaction in the set is waiting for another transaction in the set. None of the transactions can make progress in such a situation. The only remedy for this undesirable condition is for the system to take some drastic action, such as rolling back some of the transactions involved in the deadlock. There are two methods for dealing with deadlock: (1) deadlock prevention and (2) deadlock detection and recovery.
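Example (a minimal sketch of the detection approach): one common way to detect a deadlock is to look for a cycle in a wait-for graph, where an edge T1 -> T2 means T1 is waiting for T2. The function name `find_deadlock` and the transaction names are hypothetical, and real database engines use their own implementations.

```python
def find_deadlock(wait_for):
    """Return a list of transactions that form a waiting cycle, or None.

    wait_for maps each transaction to the transactions it is waiting on.
    Uses a depth-first search; a cycle exists if we reach a transaction
    that is still on the current search path.
    """
    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(txn):
        state[txn] = ON_PATH
        path.append(txn)
        for other in wait_for.get(txn, ()):
            other_state = state.get(other, UNSEEN)
            if other_state == ON_PATH:
                # Found a cycle: everything from `other` onward is deadlocked.
                return path[path.index(other):]
            if other_state == UNSEEN:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        state[txn] = DONE
        return None

    for txn in list(wait_for):
        if state.get(txn, UNSEEN) == UNSEEN:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


# T1, T2, and T3 wait on each other in a circle; T4 merely waits on T1.
wait_for = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"], "T4": ["T1"]}
cycle = find_deadlock(wait_for)
print(cycle)  # ['T1', 'T2', 'T3']
```

Recovery would then roll back one transaction from the returned cycle (the "victim") so the others can proceed.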
What is the definition of GDPR?
The General Data Protection Regulation (GDPR) is a legal framework that sets guidelines for the collection and processing of personal information from individuals who live in the European Union (EU).
What is the definition of CCPA?
The California Consumer Privacy Act (CCPA) is a comprehensive consumer protection law that took effect on January 1, 2020. In the wake of the CCPA’s passage, approximately 15 other states introduced their own CCPA-like privacy legislation, and similar proposals are being considered at the federal level.
What is a ‘Dirty Read’?
A dirty read (aka uncommitted dependency) occurs when a transaction is allowed to read data from a row that has been modified by another running transaction and not yet committed.
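Example (a toy simulation, not how any real database engine is implemented): the class below keeps committed data separate from uncommitted changes so the effect of a dirty read is visible. `ToyDatabase` and its methods are hypothetical names.

```python
class ToyDatabase:
    """Tiny in-memory store that tracks uncommitted writes per transaction."""

    def __init__(self, data):
        self.committed = dict(data)
        self.pending = {}  # transaction id -> {key: uncommitted value}

    def write(self, txn_id, key, value):
        # The change stays pending until the transaction commits.
        self.pending.setdefault(txn_id, {})[key] = value

    def read(self, key, allow_dirty):
        # With allow_dirty=True the reader can see other transactions'
        # uncommitted changes, which is exactly a dirty read.
        if allow_dirty:
            for changes in self.pending.values():
                if key in changes:
                    return changes[key]
        return self.committed[key]

    def commit(self, txn_id):
        self.committed.update(self.pending.pop(txn_id, {}))

    def rollback(self, txn_id):
        self.pending.pop(txn_id, None)


db = ToyDatabase({"balance": 100})
db.write("T1", "balance", 50)                   # T1 modifies the row, not yet committed

dirty = db.read("balance", allow_dirty=True)    # 50: value that may never exist
clean = db.read("balance", allow_dirty=False)   # 100: last committed value

db.rollback("T1")                               # T1 aborts
after = db.read("balance", allow_dirty=True)    # 100: the dirty value was never real
print(dirty, clean, after)                      # 50 100 100
```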
What is a Galaxy Schema?
A Galaxy Schema contains two or more fact tables that share dimension tables. It is also called a Fact Constellation Schema. The schema is viewed as a collection of stars, hence the name Galaxy Schema.
What is a Star Cluster Schema?
A star cluster schema combines attributes of the star schema and the snowflake schema.
What is Cardinality?
In the context of an ERD (entity-relationship diagram), cardinality refers to the count of instances that are allowed or necessary in a relationship between entities, that is, how many rows of one entity can or must be linked to a row of another entity.
Two types:
Minimum - The minimum number of instances that are required in the relationship.
Maximum - The maximum number of instances that are allowed in the relationship.
What is Network Density?
The “Network Density” metric is commonly calculated as the number of actual connections divided by the number of possible connections. There are 9 actual connections and 56 possible connections in the example data, resulting in a Network Density value of .1607, which depending on the context could be considered low or high.
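Example (a short sketch of the arithmetic): the 56 possible connections are consistent with a directed network of 8 members (8 × 7 = 56). The member count is inferred here rather than stated on the card, and `network_density` is a hypothetical helper name.

```python
def network_density(actual_connections, member_count, directed=True):
    """Actual connections divided by possible connections."""
    possible = member_count * (member_count - 1)  # every ordered pair of distinct members
    if not directed:
        possible //= 2                            # unordered pairs for undirected networks
    return actual_connections / possible


density = network_density(actual_connections=9, member_count=8)
print(round(density, 4))  # 0.1607
```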
What is Network Centralization?
The “Network Centralization” metric tells us how “centered” the network is around the member(s) of the network with the highest number of connections. In a network with three members, this metric is of little value, but in a network with thousands or millions of connections, knowing the person or people the network is centralized around is meaningful to our understanding of the network. In the data driving my implementation, Jane is involved in four of the nine transactions, which is commonly calculated as (4 / 9) = .444. This would be considered a high value in most cases, so you could say that the total network is highly centralized (around Jane).
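Example (a sketch of the calculation described above): the card's source data is not shown, so the transaction list below is made up to match its figures (Jane in four of nine transactions). Every name other than Jane is hypothetical.

```python
from collections import Counter

# Hypothetical (sender, receiver) pairs: nine transactions, Jane in four of them.
transactions = [
    ("Jane", "Bob"), ("Jane", "Ann"), ("Carl", "Jane"), ("Dee", "Jane"),
    ("Bob", "Ann"), ("Carl", "Dee"), ("Eve", "Finn"), ("Finn", "Gus"), ("Gus", "Eve"),
]

# Count how many transactions each member takes part in, on either side.
involvement = Counter()
for sender, receiver in transactions:
    involvement[sender] += 1
    involvement[receiver] += 1

top_member, top_count = involvement.most_common(1)[0]
centralization = top_count / len(transactions)
print(top_member, round(centralization, 3))  # Jane 0.444
```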
What is Network Homophily?
The “Network Homophily” metric describes the degree to which connected nodes share similar characteristics, i.e. are connected nodes largely alike? The richer the source data is, the more important and interesting this metric can be as the row count increases. This metric is of particular interest to marketers.
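Example (one simple way to put a number on homophily, offered as an assumption since the card gives no formula): the share of connections whose two nodes have the same value for some attribute. The node names, categories, and edges are hypothetical.

```python
# Hypothetical attribute for each node.
category = {"A": "toys", "B": "toys", "C": "books", "D": "books"}

# Hypothetical connections between nodes.
edges = [("A", "B"), ("A", "C"), ("C", "D"), ("B", "D")]

# Count the connections that join two nodes with the same category.
same_category = sum(1 for u, v in edges if category[u] == category[v])
homophily = same_category / len(edges)
print(homophily)  # 0.5
```

A value near 1 would mean connected nodes are mostly alike; a value near 0 would mean they mostly differ.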
What is “In Degree”?
Switching to node-specific metrics: the “In Degree” metric is the count of incoming connections to a node from other nodes in the network. The “Out Degree” metric is the count of outgoing connections from a single node to other nodes in the network. These two metrics are often used to help analysts and marketers understand how “social” products within particular retail categories are with products in similar or different retail categories.
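Example (a minimal sketch): both degree metrics can be counted directly from a list of directed connections. The edge list and node names are hypothetical.

```python
from collections import Counter

# Hypothetical directed connections as (source, target) pairs.
edges = [
    ("Jane", "Bob"), ("Jane", "Ann"), ("Carl", "Jane"),
    ("Dee", "Jane"), ("Bob", "Ann"),
]

out_degree = Counter(source for source, _ in edges)  # outgoing connections per node
in_degree = Counter(target for _, target in edges)   # incoming connections per node

print(in_degree["Jane"], out_degree["Jane"])  # 2 2
print(in_degree["Ann"], out_degree["Ann"])    # 2 0
```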