Exam Flashcards

1
Q

Define Information Systems and its computational part.

A

Information system - people and tools used by a company to acquire and distribute information resources.

Computer System - the digital part of the information system.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Explain the relation between a DBMS (Database Management System) with the underlying concepts behind it such as a Data Model, Database and Operational Data.

A

DBMS - software that manages large data volumes. The software system is used to store, retrieve and run queries on data.

Definition of other concepts:
- Data model: set of concepts used to describe data and their associations.
- Database: collection of data that models a portion of the business.
- Operational data: transactional data (data generated by operations carried to within business processes).

Relation with DBMS: the DBMS manages the interactions with Databases containing operational and contextual data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Draw the line between Enterprise Resource Platform (ERP) and Customer Relationship Manager (CRM) and how they relate.

A
  • Enterprise Resource Platform (ERP): provides solutions to cover a large part of the enterprise.
  • Customer Relationship Manager (CRM): helps manage communication with customers.

The relationship between both concepts can be relied on the use of ERP from the CRM in order to improve operations and customer relations.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Define Business Intelligence.

A

A set of tools and techniques that enable a company to transform its business data into timely and accurate information for the decision process and then into knowledge about the business.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Define the different phases of the BI pyramid and what each group of phases represent in relation to the meaning of the data.

A

1 - Data phase

  1. Operational Applications (data sources)
    - feeding of data
    - mostly dirty data

2 & 3 - Information phases

  1. Reporting
    - cleaned operational data
    - operative reporting
  2. OLAP & Dashboards (data warehouse)
    - Dashboards: set of reports
    - Strategical reporting
    - “Everyday tasks”

4, 5 - Knowledge phases

  1. Data Mining (learning models)
    - “Search for hidden knowledge in huge amounts of data”
    - Analyze patters (“beer and dypers” example)
  2. “What if analysis” (simulation models)
    - Forecasting (normal activity) vs Perturbation (discounts, change of course)
    - Prediction for the future
    - Possibility of cannibalization
  3. Decision
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Draw the difference between OLTP and OLAP

A
  • OLTP (On-Line Transactional Processing) is a type of data processing that consists of executing a number of transactions occurring concurrently. Any kind of digital interaction or engagement with a business.
  • OLAP (On-Line Analytical Processing) is a software technology to analyze business data from different points of view.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Describe the process of Data Warehousing. Also mention the requirements of the Warehousing process and the requirements of the Data Warehousing itself.

A

Data Warehousing is a collection of methods, techniques and tools used to support knowledge workers, to conduct data analyses that help with performing decision-making processes and improving information resources.

  • Requirements of the Warehousing Process:
    1. accessibility
    2. integration
    3. query flexibility
    4. information conciseness
    5. multidimensional representation
    6. correctness and completeness
  • Requirements of Data Warehousing:
    1. subject-oriented (different context in every database contained in the DW)
    2. integrated and consistent (filtering processes, integration between different DBs, temporal evolution - “comparing data of today with data from 25 years ago is not so useful as comparing this year’s data with last year’s data.”)
    3. non volatile (99% of transactions are reads, users just read, writing only though ETL)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Describe the ETL process.

A

ETL (Extract, Transform, Load) is a data integration process that combines data from multiple data sources into a single, consistent data store that is loaded into a data warehouse or other target system. Operational data and external data go through an ETL tool and are stored in data warehouse.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Define a data mart

A

A subset or an aggregation of the data stored to a primary data warehouse. It includes a set of information pieces relevant to a specific business area, corporate department, or category of users, which allows those users to quickly access critical insights without wasting time searching through an entire data warehouse.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Draw the line between the two different types of Data Warehouse architecture and define their components.

A
  • Two-tier architecture (Source layer => ETL Tools => DW Layer => ): contains different data marts and storage for metadata (data that provides a context with details such as the source, type, owner, relationships, …)
  • Three-tier architecture (all the two-tier architecture components + ODS (Operational Data Store): operational data obtained after integrating and cleaning source data. As result, data is integrated, consistent, appropriate, current and detailed).
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Identify the different architectures related to the data marts.

A
  1. Independent Data marts:
    - does not achieve a business scenario utility => data marts does not share data between them
  2. Data Mart Bus
    - enterprise view
    - conformed dimensions (concepts of primary relevance for the business domain are shared by most data marts)
  3. Hub-And-Spoke
    - industry standard
    - ODS architecture
    - vs Data Mart Bus = harder to set up, ODS architecture should analyze all data coming from the sources.
  4. Federation
    - good for highly dynamic environments (mergers and acquisitions)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Define Hierarchy, Cubes, Multidimensional Cube and its relation to measures, facts and dimensions.

A

Hierarchy can be seen as each dimension being the root of a hierarcht of attributed used to aggregate measure values. It’s a sequence of attributes rooted in a dimension and connected by a many to one relationship.

Cubes are a metaphor for the storage of facts. These cubes consists of measures, dimensions and hierarchies.

Measures tend to be numerical values to quantify the fact from different points of view.

Facts are every business aspects/events that are desired to be monitored. Facts are stored in cubes.

Dimensions are axis of a cube to analyze measure values.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Describe the OLAP Operators.

A
  1. Roll-up (zoom out) => start from a cube and return an aggregation
  2. Drill-down => reverse of roll-up (disaggregate data)
  3. Slice => filtering based on a specific dimension value
  4. Dice => filtering based on specific dimension values for all dimension
  5. Pivoting (change the format of data) => row to columns or columns to rows
  6. Drill-across => “a join to connect events from different cubes”
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Describe the multiple concepts of the Multidimensional Model.

A

Multidimensional Model is the key for representing and querying information in a Data Warehouse.

Fact is a concept relevant to decision-making processes. A fact expresses many-to-many relationship between its dimensions in the Dimension Fact Model.

Measure is a numerical property of a fact and describes a quantitative fact aspect that is relevant to analysis.

Dimension is a fact property with finite domain and describe an analysis coordinate of the fact.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Describe the normal forms in few words.

A

1 Normal form = no table column can have tables as values.

2 Normal form = no functional dependencies

3 Normal Form = no transitive dependencies.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Draw line between between ROLAP and MOLAP.

A

ROLAP = stores data in columns and rows (relational tables), retrieves information on user demand.

MOLAP = uses a multidimensional cube that accesses stored data through various combinations (pre-computed, pre-summarized and stored)

17
Q

Make the distinction between primary and secondary events.

A

A primary event is a particular occurrence of a fact, identified by one n-ple made up of a value for each dimension. A value for each measure is associated with each primary event.

Secondary event that aggregates all of the corresponding primary events. Each secondary event is associated with a value for each measure that sums up all the values of the same measure in the corresponding primary events

18
Q

Describe the possible distinct scenarios in the DFM (shared hierarchy, …)

A
  • Descriptive attributes = stores additional information about a dimensional attribute.
  • Shared hierarchy = a shorthand to denote that a part of a hierarchy is replicated two or more times in a fact schema.
  • Convergence = two dimensional attributes are connected by two or more distinct directed paths, and all of them represent the same functional dependency.
  • Multiple arcs = models a many-to-many association between two dimensional attributes.
  • Incomplete hierarchies = a hierarchy where, for some instances, one or more aggregation levels are missing (because they are unknown or undefined).
  • Recursive hierarchies = The loop is a directed cycle in your hierarchy. It is a roll-up within an attribute of the hierarchy.
19
Q

Describe the Measure types and classification of measures (temporay vs non tempory hierarchies)

A
  • Temporal hierarchies vs Non-temporal hierarchies => possible aggregations that makes sense depending on the type of measure applied.
  • Temporary scenarios
    • Flow measures = refer to a timeframe, at the end of which they are evaluated cumulatively
      • the number of products sold in a day, monthly receipts, yearly number of births
      • Allow all aggregations in temporal and non-temporal hierarchies
    • Level measures = are evaluated at particular times.
      • the number of products in inventory, the number of inhabitants in a city
      • no SUM aggregation in temporal hierarchies
      • allow all aggregations on non-temporal hierarchies
    • Unit measures = are evaluated at particular times but are expressed in relative terms.
      • product unit price, discount percentage, currency exchange
      • do not allow SUM in both temporal and non-temporal hierarchies
20
Q

Define View-materialization.

A

View - table containing aggregated data.

View-Materialization - caching a frequent query in the hard disk.

21
Q

Define aggregate navigators.

A

Components of multidimensional engines in charge of reformulating OLAP queries on the cheapest view.

22
Q

Define Dynamic Hierarchies / Slowly Changing Dimensions, its Temporal scenarios and types.

A

Dynamic Hierarchies are hierarchies that progressively apply changes to its components.

  • Temporal scenarios:
  • up-to-date
    • querying data, the result is based on the current event
    • “a store when the manager changed in the middle of the year. When querying the total sales of the store by manager, the new manager will be given all the sales”
  • historical truth
    • events based on the time it occurred
    • “if the store manager changes, when querying the total sales by manager, two rows will be returned mentioning the sales of the previous and current manager”.
  • rollback
    • query a specific configuration at a previous time
  • Types
    • Type I
      • up-to-date scenario supported
      • All the events, including past ones, are always interpreted from the viewpoint of the current hierarchy instance, without tracking previous instances
      • only regular star schema required
    • Type II
      • historical truth supported
      • events stored into a fact table have to be
        associated with the hierarchy instance that
        was valid when that event took place
      • regular star schema with addition of new tuples into appropriated dimension table for the tracking
    • Type III
      • supports every temporal scenario (up-to-date, historical truth and rollback)
      • requires timestamps specifying the validity of the interval & initial surrogate key of the initial “master key”