S2-m6 Flashcards
Operational Data Storage (ODS)
A repository of transactional data from multiple sources and is often an interim area between a data source and data warehouses. Ex) captured transactional data an be related to operational activities such as customer order, sales, or vendor payments
-smaller and are frequently overwritten as transactions are modified
Data Warehouse
Very large data repositories that are centralized and used for reporting and analysis rather than for transactional purposes
-pulls data directly from enterprise systems or from a ODS
Data Mart
Like a warehouse but is more focused on a specific purpose such as a marketing or logistics and is often a subset of a warehouse
Data Lake
A repository similar to a warehouse, but it contains both structured and unstructured data, with data being intis natural or raw format
-does not have a predefined structure or schema
Relational Database
The most efficient and effective method to store data
Benefits of a Relational database
storing data in a normalized, relational database helps to reasonably assure:
-Completeness
-No redundancy
-Business rules enforcement
-Communication and integration of Business Processes
Tables
Tables are organized structures that establish columns and rows to store specific types of data records. Each represents and object
The existence of more than one table in a relational database is the first differentiator between flat files and relational databases; relational databases are made up of at least two tables that are related.
Attributes
Columns in a relational database Describe the characteristics or properties desired to be known about each entity. Must be unique to that table. Three types of columns: primary keys, foreign keys, and descriptive attributes
Records
Rows contains info about one entity within the table
Fields
A space created at the intersection of a column and row in a table in which data is entered.
Data types
Each attribute in a table has a designated data type. Specify how the data is stored and indicate how the data in a given attribute can be analyzed
Primary Key
A primary key attribute is required in every table. It typically made up of one column. The purpose of the primary key is to help solidify that each row in the table is unique
-rarely descriptive; instead, a collection of letters or sequential numbers
-like id numbers
Composite Primary Key
When more than one attribute is necessary to function as a unique identifier
Foreign Key
Are attributes in one table that are also primary keys in another table
-same customer ID may appear multiple times in the sales orders table because a single customer can place more than one order
- link between a primary key in one table and a foreign key in another table is what creates a relationship between tables
Data Dictionaries
A type of metadata - data about data
Summaries info about the data in a database to make it easier to work with the data and understand how it can be used
Normalization
a database design technique that reduces data redundancy and eliminated undesirable characteristics. Normalization rules divide larger tables into smaller tables and link them using relationships. Purpose is to eliminate redundancy
First Normal Form 1NF Criteria
- Each cell in a table must contain only once piece of information. One value may be in a column
- Each record in every table must be uniquely identified. Accomplished with a primary key,
Second Normal Form 2NF
Follows once a table is in 1NF. Requires all non-key attributes in a table to depend on the entire primary key. 2nF is particularly meaningful for tables that have composite primary keys.
Third Normal Form 3NF
Once the table is in 1NF and 2NF the next step is to ascertain that each column n a table describes only the PK. 3NF wants to establish that non of the none key attributes depend on other non key attributes
What is a database schema?
a set of instructions to tell the database engine how to organize data to be in compliance with the data models. defines the actual structure of the database, including the tables, columns, and relationships between the data entities. Actual implementation and execution of the design
What is a data model?
Conceptual representations of the data structures in an information system and are not restricted to relational databases only. A high level of design of the data structures in an info system
A conceptual data model
a high level big picture representation of the data structure in an info system
A logical data model
more detailed representation of the data structures in an info system at the level of the data itself, providing more detail than a conceptual model.
A physical data model
most detailed representation. Specify how the data will be stored in the database.
Fact Table
contain measures or metrics, which are referred to as facts. these facts measure the business, such as sales, cost of goods, or profit. Do not contain descriptive elements about the business but do contain foreign keys
Dimension Table
Contain descriptive or contextual data for measures such as dates, product names and customer names
Star Schema
Most common schema for dimensional modeling and the simplest. Fact table in the center and the dimension tables arranged around it
Snowflake Schema
Dimension tables further normalized. The dimension tables are broken down to related tables, rather than a single table. More complex and flexible. A balance between a normalized schema and a star schema