Information Management and BI 1(b) Flashcards
What is “Data Modeling”?
- Process of prescribing data format, must be standardized for efficient processing
- Provides means for defining and analyzing data
- Dev process incorporate business rules into data model
- Data Modeling s/b foundation for reporting, lead to effective design of data to be captured, stored and reported
What is the difference btwn “Unstructured” vs “Structured” Data?
(1) Unstructured data:
- No prescribed format
- Capture, store and process in free form
(2) Structured data:
- Prescribed format
- Data organized and stored into files, records and fields w/ criteria for data entered into that format
- Ex: ERP transaction data
- Data must be Structured for computers to process data properly
Name 3 “Data Types”:
(1) Numbers - used for quantities, amts and
dollar amts
(2) Alphanumeric - Address( letters, numbers, characters used in data value)
(3) Dates - describe specific date format
What is “Data Normalization”?
- Transactional flat file
- Remove certain
factors to eliminate data anomalies - Outcomes of normalized data is to maximize efficiency of data model, increases integrity associated w/ capturing and managing data
- Eliminate data redundancy, leave any data field only once, except for duplication of key fields to establish relationships btwn files
- Integrating data from diff systems more efficient if integration data model is normalized
- Data files susceptible to data anomalies represents significant IT risk in that system
What are 3 types of “Conceptual Data Modeling” tools?
(1) Entity-relationship (ER) model
(2) Relational model
(3) Unified Modeling Language (UML)
What is the “ER Model” and what does it depict?
- A Conceptual Data Modeling tools
- Depicts relationship btwn files and records data files in a top-down fashion
- Results = ER diagrams
What are “ER diagrams”?
- Depicts file, type of interaction w/ another file and relationship btwn records in one file to the other
What is a “Relational Model”?
- A Conceptual Data Modeling tool
- Illustrates data files as tables w/ arrows from primary to foreign keys to illustrate relationship btwn data files
- Advantage of a Relational Model is that files are depicted as a table, similar to electronic spreadsheet and fields are depicted as columns, records as rows
What is Unified Modeling Language (UML)?
- Most recent Conceptual Data Modeling tool
- Associated w/ object-oriented programming (OOP), newer programming languages like C sharp and Java
- Both static and dynamic aspects can be depicted in a UML diagram
- UML is more technical than
ER and Relational Models
Name 2 “Data File” forms:
(1) Flat file
- All transactional data in a single record (row)
(2) Databases
- Transactional data in separate files using a structured process (outcome of data normalization)
What are 2 kinds of structure for data in a DW?
- Dimensional Data (“Star”)
2. Normalized Data (“Snowflake”)
What is the “Snowflake” Approach?
- Normalization of Data
- Tables grouped by subject areas (customers, products, employees, purchases)
- Dozens of tables then linked by foreign/primary keys
- Advantage: Adding data from systems to a Snowflake DW is straightforward
What are the Disadvantages of the “Snowflake” Approach?
- Disadvantages:
(1) Difficult to join data from different sources to obtain more meaningful info
(2) Difficult to access info w/out thorough knowledge of data sources and DW structure
What is the “Star” Approach?
- Dimensional Data
- Transactional data designed and defined as facts (numeric data) or dimensions (ref info that adds meaning to the facts)
- Includes various relevant data that describes the numeric data (dimensions)
- Ex: Sale transaction has quantities sold and dollar amts (“facts”)
- Includes sale date, region sold, sales rep, inv category
- Advantage: Easier for users to understand than Snowflake Approach
What are Disadvantages of the “Star” Approach?
- Disadvantages:
(1) Load data from diff systems and databases complicated to get it into dimensional form
(2) Difficult to modify DW structure if entity changes business processes or captures data - Struggle w/ “one way” dimension definitions
- Must refer to and use city, state and zip code as CSZ, and must be 1 field, not 3