Exam Flashcards
Define Information Systems and its computational part.
Information system - people and tools used by a company to acquire and distribute information resources.
Computer System - the digital part of the information system.
Explain the relation between a DBMS (Database Management System) with the underlying concepts behind it such as a Data Model, Database and Operational Data.
DBMS - software that manages large data volumes. The software system is used to store, retrieve and run queries on data.
Definition of other concepts:
- Data model: set of concepts used to describe data and their associations.
- Database: collection of data that models a portion of the business.
- Operational data: transactional data (data generated by operations carried to within business processes).
Relation with DBMS: the DBMS manages the interactions with Databases containing operational and contextual data.
Draw the line between Enterprise Resource Platform (ERP) and Customer Relationship Manager (CRM) and how they relate.
- Enterprise Resource Platform (ERP): provides solutions to cover a large part of the enterprise.
- Customer Relationship Manager (CRM): helps manage communication with customers.
The relationship between both concepts can be relied on the use of ERP from the CRM in order to improve operations and customer relations.
Define Business Intelligence.
A set of tools and techniques that enable a company to transform its business data into timely and accurate information for the decision process and then into knowledge about the business.
Define the different phases of the BI pyramid and what each group of phases represent in relation to the meaning of the data.
1 - Data phase
- Operational Applications (data sources)
- feeding of data
- mostly dirty data
2 & 3 - Information phases
- Reporting
- cleaned operational data
- operative reporting - OLAP & Dashboards (data warehouse)
- Dashboards: set of reports
- Strategical reporting
- “Everyday tasks”
4, 5 - Knowledge phases
- Data Mining (learning models)
- “Search for hidden knowledge in huge amounts of data”
- Analyze patters (“beer and dypers” example) - “What if analysis” (simulation models)
- Forecasting (normal activity) vs Perturbation (discounts, change of course)
- Prediction for the future
- Possibility of cannibalization - Decision
Draw the difference between OLTP and OLAP
- OLTP (On-Line Transactional Processing) is a type of data processing that consists of executing a number of transactions occurring concurrently. Any kind of digital interaction or engagement with a business.
- OLAP (On-Line Analytical Processing) is a software technology to analyze business data from different points of view.
Describe the process of Data Warehousing. Also mention the requirements of the Warehousing process and the requirements of the Data Warehousing itself.
Data Warehousing is a collection of methods, techniques and tools used to support knowledge workers, to conduct data analyses that help with performing decision-making processes and improving information resources.
- Requirements of the Warehousing Process:
1. accessibility
2. integration
3. query flexibility
4. information conciseness
5. multidimensional representation
6. correctness and completeness - Requirements of Data Warehousing:
1. subject-oriented (different context in every database contained in the DW)
2. integrated and consistent (filtering processes, integration between different DBs, temporal evolution - “comparing data of today with data from 25 years ago is not so useful as comparing this year’s data with last year’s data.”)
3. non volatile (99% of transactions are reads, users just read, writing only though ETL)
Describe the ETL process.
ETL (Extract, Transform, Load) is a data integration process that combines data from multiple data sources into a single, consistent data store that is loaded into a data warehouse or other target system. Operational data and external data go through an ETL tool and are stored in data warehouse.
Define a data mart
A subset or an aggregation of the data stored to a primary data warehouse. It includes a set of information pieces relevant to a specific business area, corporate department, or category of users, which allows those users to quickly access critical insights without wasting time searching through an entire data warehouse.
Draw the line between the two different types of Data Warehouse architecture and define their components.
- Two-tier architecture (Source layer => ETL Tools => DW Layer => ): contains different data marts and storage for metadata (data that provides a context with details such as the source, type, owner, relationships, …)
- Three-tier architecture (all the two-tier architecture components + ODS (Operational Data Store): operational data obtained after integrating and cleaning source data. As result, data is integrated, consistent, appropriate, current and detailed).
Identify the different architectures related to the data marts.
- Independent Data marts:
- does not achieve a business scenario utility => data marts does not share data between them - Data Mart Bus
- enterprise view
- conformed dimensions (concepts of primary relevance for the business domain are shared by most data marts) - Hub-And-Spoke
- industry standard
- ODS architecture
- vs Data Mart Bus = harder to set up, ODS architecture should analyze all data coming from the sources. - Federation
- good for highly dynamic environments (mergers and acquisitions)
Define Hierarchy, Cubes, Multidimensional Cube and its relation to measures, facts and dimensions.
Hierarchy can be seen as each dimension being the root of a hierarcht of attributed used to aggregate measure values. It’s a sequence of attributes rooted in a dimension and connected by a many to one relationship.
Cubes are a metaphor for the storage of facts. These cubes consists of measures, dimensions and hierarchies.
Measures tend to be numerical values to quantify the fact from different points of view.
Facts are every business aspects/events that are desired to be monitored. Facts are stored in cubes.
Dimensions are axis of a cube to analyze measure values.
Describe the OLAP Operators.
- Roll-up (zoom out) => start from a cube and return an aggregation
- Drill-down => reverse of roll-up (disaggregate data)
- Slice => filtering based on a specific dimension value
- Dice => filtering based on specific dimension values for all dimension
- Pivoting (change the format of data) => row to columns or columns to rows
- Drill-across => “a join to connect events from different cubes”
Describe the multiple concepts of the Multidimensional Model.
Multidimensional Model is the key for representing and querying information in a Data Warehouse.
Fact is a concept relevant to decision-making processes. A fact expresses many-to-many relationship between its dimensions in the Dimension Fact Model.
Measure is a numerical property of a fact and describes a quantitative fact aspect that is relevant to analysis.
Dimension is a fact property with finite domain and describe an analysis coordinate of the fact.
Describe the normal forms in few words.
1 Normal form = no table column can have tables as values.
2 Normal form = no functional dependencies
3 Normal Form = no transitive dependencies.