Database Design Flashcards

Question

Dimensions (Dimensional Modeling)

Answer 1

A dimension table holds the descriptive attributes (dimensions) of a fact in a dimensional model. Example: an order_id in a fact table leads to an orders dimension table with order_id, order_time, order_price, etc.

Answer 2

The fact table is the primary table in a dimensional model. It holds the measures plus foreign keys that reference the primary keys of the dimension tables. Examples: 1. Orders table: order_id, customer_id, store_id,

Answer 3

Star schemas are the simplest form of the dimensional model. They consist of fact and dimension tables. Fact tables hold records of metrics, which are described further by dimension tables. A star schema is denormalized.
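A minimal sketch of a star schema, using Python's built-in sqlite3 module. The table and column names (fact_orders, dim_customer, dim_store, amount) are hypothetical and chosen only for illustration; the fact table carries one foreign key per dimension plus a numeric measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical dimension tables: descriptive attributes only.
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);

-- Hypothetical fact table: foreign keys to each dimension plus a measure.
CREATE TABLE fact_orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    amount REAL
);

INSERT INTO dim_customer VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO dim_store VALUES (10, 'Austin'), (20, 'Boston');
INSERT INTO fact_orders VALUES
    (100, 1, 10, 25.0), (101, 2, 10, 40.0), (102, 1, 20, 15.0);
""")

# Aggregate the measure, grouped by an attribute from one dimension.
sales_by_city = conn.execute("""
    SELECT s.city, SUM(f.amount)
    FROM fact_orders AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
print(sales_by_city)
```

Every dimension is exactly one join away from the fact table, which is what gives the schema its star shape.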

Answer 4

The snowflake schema is an extension of the star schema. In a star schema, each dimension extends only one level out from the fact table, whereas in a snowflake schema the dimension tables are split into further sub-dimension tables and extend over several levels. A snowflake schema is more normalized than a star schema.
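To make the difference concrete, here is a small sketch in which a store dimension is normalized into city and state tables, so the dimension spans several levels. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical snowflaked dimension: store -> city -> state.
CREATE TABLE dim_state (state_id INTEGER PRIMARY KEY, state_name TEXT);
CREATE TABLE dim_city (
    city_id INTEGER PRIMARY KEY,
    city_name TEXT,
    state_id INTEGER REFERENCES dim_state(state_id)
);
CREATE TABLE dim_store (
    store_id INTEGER PRIMARY KEY,
    city_id INTEGER REFERENCES dim_city(city_id)
);
CREATE TABLE fact_orders (
    order_id INTEGER PRIMARY KEY,
    store_id INTEGER REFERENCES dim_store(store_id),
    amount REAL
);

INSERT INTO dim_state VALUES (1, 'Texas'), (2, 'Massachusetts');
INSERT INTO dim_city VALUES (1, 'Austin', 1), (2, 'Dallas', 1), (3, 'Boston', 2);
INSERT INTO dim_store VALUES (10, 1), (20, 2), (30, 3);
INSERT INTO fact_orders VALUES (100, 10, 25.0), (101, 20, 40.0), (102, 30, 15.0);
""")

# Reaching the state name takes three joins instead of one.
sales_by_state = conn.execute("""
    SELECT st.state_name, SUM(f.amount)
    FROM fact_orders AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    JOIN dim_city AS c ON c.city_id = s.city_id
    JOIN dim_state AS st ON st.state_id = c.state_id
    GROUP BY st.state_name
    ORDER BY st.state_name
""").fetchall()
print(sales_by_state)
```

Each state name is stored once, at the cost of a longer join chain for queries that need it.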

Answer 5

Normalization is a database technique that divides tables into smaller tables and connects them via relationships. The goal is to reduce redundancy and increase data integrity. To normalize a table, you identify repeating groups of data and create new tables for them.
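The splitting step can be sketched in plain Python. This is only an illustration under assumed data: the flat_orders rows and the normalize helper are hypothetical, and the repeating group is taken to be the customer name and state.

```python
# Hypothetical denormalized rows: customer details repeat on every order.
flat_orders = [
    {"order_id": 1, "customer_name": "Ada", "customer_state": "California", "amount": 25.0},
    {"order_id": 2, "customer_name": "Ada", "customer_state": "California", "amount": 40.0},
    {"order_id": 3, "customer_name": "Grace", "customer_state": "Texas", "amount": 15.0},
]


def normalize(rows):
    """Move the repeating customer group into its own table."""
    customer_ids = {}  # (name, state) -> generated customer_id
    orders = []
    for row in rows:
        key = (row["customer_name"], row["customer_state"])
        # Assign the next id the first time a customer is seen.
        customer_id = customer_ids.setdefault(key, len(customer_ids) + 1)
        orders.append(
            {"order_id": row["order_id"], "customer_id": customer_id, "amount": row["amount"]}
        )
    customers = [
        {"customer_id": cid, "name": name, "state": state}
        for (name, state), cid in customer_ids.items()
    ]
    return customers, orders


customer_table, order_table = normalize(flat_orders)
print(customer_table)
print(order_table)
```

After the split, each customer's state is stored once and orders refer to it through customer_id.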

Answer 6

Pros:
- Saves space because it prevents us from repeating data
- Enforces data integrity through referential integrity (e.g. no mix of "CA" and "California")
- Works well for OLTP because it prioritizes safe, fast insertion of data
Cons:
- Slower queries because reads require joins across many tables
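The "CA vs. California" point can be illustrated with a foreign key constraint. This is a sketch using sqlite3 with hypothetical states and customers tables; note that SQLite only enforces foreign keys once PRAGMA foreign_keys is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Hypothetical lookup table holding the only valid spellings of a state.
conn.execute("CREATE TABLE states (state_code TEXT PRIMARY KEY, state_name TEXT)")
conn.execute(
    "CREATE TABLE customers ("
    " customer_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " state_code TEXT NOT NULL REFERENCES states(state_code))"
)
conn.execute("INSERT INTO states VALUES ('CA', 'California')")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'CA')")

try:
    # 'California' is not a valid state_code, so the insert is refused.
    conn.execute("INSERT INTO customers VALUES (2, 'Grace', 'California')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(rejected, customer_count)
```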

Answer 7

Update Anomaly: Data inconsistency that arises when updating a database with redundancies
Insertion Anomaly: When we are unable to insert a new record because of missing attributes
Deletion Anomaly: When deletion of record(s) causes unintentional loss of data
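Two of these anomalies can be reproduced with a small denormalized list in Python. The orders rows and their field names are hypothetical, made up for this illustration.

```python
# Hypothetical denormalized table: the customer's email repeats per order.
orders = [
    {"order_id": 1, "customer": "Ada", "email": "ada@old.example"},
    {"order_id": 2, "customer": "Ada", "email": "ada@old.example"},
    {"order_id": 3, "customer": "Grace", "email": "grace@example.com"},
]

# Update anomaly: only one of the redundant copies gets changed.
orders[0]["email"] = "ada@new.example"
ada_emails = {o["email"] for o in orders if o["customer"] == "Ada"}

# Deletion anomaly: removing Grace's only order also removes
# the only record that Grace exists at all.
orders = [o for o in orders if o["order_id"] != 3]
known_customers = {o["customer"] for o in orders}

print(ada_emails)
print(known_customers)
```

With a separate customers table, the email would live in one place and deleting an order would not touch the customer record.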

Answer 8

A view is a virtual table that is not part of the physical schema. The query (not the resulting data) is stored in the database. You can query a view as you would a normal table, without having to type out the same query over and over. Views take up no storage aside from the query statement itself. Views are also useful for access control and for masking the complexity of queries, which is especially helpful for highly normalized schemas.
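A short sketch of a view in sqlite3, with hypothetical customers and orders tables and a hypothetical customer_totals view. Because only the query is stored, the view reflects new rows as soon as they are inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (100, 1, 25.0), (101, 1, 40.0);

-- The view stores this query, not its result.
CREATE VIEW customer_totals AS
    SELECT c.name AS name, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name;
""")

query = "SELECT name, total FROM customer_totals ORDER BY name"
before = conn.execute(query).fetchall()

# A new order shows up in the view without any refresh step.
conn.execute("INSERT INTO orders VALUES (102, 2, 15.0)")
after = conn.execute(query).fetchall()

print(before)
print(after)
```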

Answer 9

Materialized views store the query results rather than the query. They must be refreshed, or you risk querying stale data. Materialized views are very useful in data warehouses, where data is updated less frequently and stale results are therefore less of a concern. They also help with queries that have long execution times.
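SQLite has no built-in materialized views, so the sketch below only emulates one: the query result is copied into an ordinary table and rebuilt on demand. The order_summary table and the refresh_order_summary helper are hypothetical names, not a database feature. Databases with native support (PostgreSQL, for example) provide dedicated statements for this instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL);
INSERT INTO orders VALUES (1, 25.0), (2, 40.0);
""")

SUMMARY_QUERY = "SELECT order_count, total FROM order_summary"


def refresh_order_summary(connection):
    """Emulated refresh: rebuild the stored result from the base table."""
    connection.executescript("""
        DROP TABLE IF EXISTS order_summary;
        CREATE TABLE order_summary AS
            SELECT COUNT(*) AS order_count, SUM(amount) AS total FROM orders;
    """)


refresh_order_summary(conn)

# New data arrives after the result was stored.
conn.execute("INSERT INTO orders VALUES (3, 15.0)")
conn.commit()

stale = conn.execute(SUMMARY_QUERY).fetchone()  # still the old result
refresh_order_summary(conn)
fresh = conn.execute(SUMMARY_QUERY).fetchone()  # reflects the new order

print(stale, fresh)
```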

Answer 10

Partitioning breaks a table into segments, which can make it easier to manage and query your data. It can improve query performance and control costs by reducing the number of bytes read by a query. Partitioning is part of the physical data model, because the same data is distributed over several physical entities.

Answer 11

Vertical partitioning splits a table by its columns. This can be done even if the table is already fully normalized.
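As a rough illustration, a hypothetical products table can be split so that a rarely read, bulky column lives in its own table that shares the primary key. The names product_core and product_details are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Frequently read columns.
CREATE TABLE product_core (
    product_id INTEGER PRIMARY KEY, name TEXT, price REAL
);
-- Rarely read, bulky column, keyed by the same product_id.
CREATE TABLE product_details (
    product_id INTEGER PRIMARY KEY REFERENCES product_core(product_id),
    long_description TEXT
);
INSERT INTO product_core VALUES (1, 'Lamp', 19.5), (2, 'Desk', 120.0);
INSERT INTO product_details VALUES
    (1, 'A very long description of the lamp'),
    (2, 'A very long description of the desk');
""")

# The common query touches only the narrow table.
prices = conn.execute(
    "SELECT name, price FROM product_core ORDER BY product_id"
).fetchall()

# The full row is rebuilt with a join only when it is needed.
full_row = conn.execute("""
    SELECT c.name, c.price, d.long_description
    FROM product_core AS c
    JOIN product_details AS d ON d.product_id = c.product_id
    WHERE c.product_id = 1
""").fetchone()

print(prices)
print(full_row)
```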

Answer 12

Horizontal partitioning splits a table based on its rows.
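Row-based (horizontal) partitioning can be sketched in plain Python. The orders rows and the choice of order year as the partition key are assumptions made for this example; real databases usually express this declaratively in the table definition.

```python
from collections import defaultdict

# Hypothetical rows; the partition key is the year of order_date.
orders = [
    {"order_id": 1, "order_date": "2022-11-03", "amount": 25.0},
    {"order_id": 2, "order_date": "2023-01-15", "amount": 40.0},
    {"order_id": 3, "order_date": "2023-06-30", "amount": 15.0},
]


def partition_by_year(rows):
    """Group rows into one partition per order year."""
    partitions = defaultdict(list)
    for row in rows:
        year = int(row["order_date"][:4])
        partitions[year].append(row)
    return dict(partitions)


partitions = partition_by_year(orders)

# A query for 2023 reads only the 2023 partition, not every row.
rows_scanned = len(partitions.get(2023, []))
total_2023 = sum(row["amount"] for row in partitions.get(2023, []))

print(sorted(partitions), rows_scanned, total_2023)
```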

Answer 13

Sharding is horizontal partitioning in which the partitions are spread over several machines. For example, user data can be partitioned by the geographic location of the users.
PRO: A query can be easily directed to the correct shard, meaning less scanning of the data
CON: Can lead to unbalanced load, for example if the distribution of users across regions is uneven
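The routing benefit and the imbalance risk can both be seen in a small sketch. The region codes, shard names, and user rows below are hypothetical; a real system would map each shard to a separate machine.

```python
from collections import Counter

# Hypothetical mapping from user region to the shard that stores it.
SHARD_BY_REGION = {"us": "shard-us", "eu": "shard-eu", "apac": "shard-apac"}


def shard_for(user):
    """Route a user to a shard based on geographic region."""
    return SHARD_BY_REGION[user["region"]]


users = [
    {"user_id": 1, "region": "us"},
    {"user_id": 2, "region": "us"},
    {"user_id": 3, "region": "us"},
    {"user_id": 4, "region": "eu"},
    {"user_id": 5, "region": "apac"},
]

# PRO: a lookup for one user goes straight to a single shard.
target = shard_for(users[3])

# CON: an uneven user distribution produces an uneven load.
load = Counter(shard_for(user) for user in users)

print(target)
print(load.most_common())
```

Hashing the user id instead of using the region would even out the load, but a regional query would then have to visit every shard.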

Answer 14

Rarely accessed partitions can be moved to a slower medium. Partitioning also optimizes indices, which increases the chance that heavily used indices remain accessible in memory.

Answer 15

Relational Database Management System (RDBMS): based on the relational model of data and queried with SQL. Beneficial when working with structured data that benefits from a predefined schema. Example: MySQL.

Answer 16

A non-relational database management system. It is less structured than a relational database and is often document-centered rather than table-centered. It also offers greater flexibility.

Answer 17

Key-value stores (Redis)
Document databases (MongoDB)
Columnar: best suited for analyzing large datasets (Cassandra)
Graph: used to store data best represented as a graph (InfiniteGraph)

Answer 18

A methodology consisting of a set of practices and tools that automate the work of software developers.

Answer 19

An open-source platform for creating, scheduling, and monitoring data workflows (ETL / ELT).

Answer 20

An open-source framework for running, testing, and documenting SQL queries.

Answer 21

Dashboards for viewing and visualizing analytics.

Answer 22

scalable object storage

Answer 23

Redshift - Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes,

Answer 24

Continuous Integration / Continuous Deployment: a method that introduces automation into the stages of app development and deployment.

(48 cards)