Lecture 1 Flashcards
Introduction to DSS and data warehousing
decision support systems
any computerized system that processes and analyzes data and supports decision-making in an organization
business intelligence (BI)
- an umbrella term that combines architectures, tools, databases, analytical tools, applications, and methodologies
- information and knowledge that enables decision making
- relates to understanding preferences, coping with competition, identifying opportunities, enhancing efficiency
- uses tools such as data warehousing, knowledge management, queries, analysis, data mining, visualization
business analytics (BA)
transforming data into meaningful information or knowledge to support business decision-making
data
- internal or external
- structured or unstructured (more than 80%)
- items that are the most elementary descriptions of things, events and transactions
DSS types
Passive DSS
Supports decision-making processes, but it does not offer
explicit suggestions on decisions or solutions
DSS types
Active DSS
Offers suggestions and solutions
DSS types
Collaborative DSS
Operates interactively and allows decision-makers to
modify, integrate, or refine suggestions given by the system.
Suggestions are sent back to the system for validation
DSS types
Model-driven DSS
Enhances management of statistical, financial, optimization,
and simulation models.
DSS types
Communication-driven DSS
Supports a group of people working on a common task.
DSS types
Data-driven DSS
Enhances the access and management of time series of
corporate and external data.
DSS types
Document-driven DSS
Manages and processes nonstructured data in many formats
DSS types
Knowledge-driven DSS
Provides problem-solving features in the form of facts, rules,
and procedures
information
organized data that has meaning and vakue
knowledge
processed data or information that is applicable to a business decision problem
BA method types
descriptive analytics
- use data to understand past & present
- results in well defined problems and opportunities
- “what happened and what is happening?”
- business reporting, dashboards, scoreboards, data warehousing, OLAP, performance dashboard
BA method types
predictive analytics
- predict future behavior (states and conditions) based on past performance
- “what will happen and why will it happen?”
- data mining, text mining, web/media mining, forecasting
BA method types
prescriptive analytics
- make decisions or recommendations to achieve the best performance
- “what should I do and why should I do it?”
- optimization, simulation, decision modeling, expert systems/knowledge based systems/rule based systems, (AI)
descriptive analytics
performance dashboard
provides a comprehensive visual view of corporate performance measures, trends, and exceptions
descriptive analytics
OLAP
online analytical processing
OLTP
online transactional processing
ETL staging
- extraction (getting the data)
- transformation (cleaning the data)
- loading (storing the data in a relevant environment)
getting the raw data to workable data is the most time consuming part of BI/BA/DSS; other difficulties with ETL are:
* temporary storage of data
* very complex data
* development and maintenance are time consuming
* extract & load frequencies
data warehousing
a collection of methods, techniques, and tools used to support people to conduct data analyses that help with performing decision-making processes and improving information resources
metrics
Perspectives (time, location, product, etc) that are relevant for management, and are used in multi-dimensional models.
We try to prevent information overload (about 6-7 metrics because of cognitive limitations)
data warehouse process
a set of tasks that turn operational data into decision-making support information
* accessibility; to users not very familiar with IT and data structures
* integration of data; on the basis of a standard enterprise model
* query flexibility; to maximize the advantages obtained from the existing information
* information conciseness; allowing for target-oriented and effective analyses
* multidimensional representation; giving users an intuitive and manageable view of information
* correctness and completeness; of integrated data
data warehouse
a collection of data that supports decision-making processes
* subject-oriented (for client’s analytic needs, NOT daily operations/transaction processing, excludes data that is not useful for decision making)
* integrated (uses multiple data sources to provide a unified view of all data -> cleans and integrates data -converts it to data warehouse- to ensure consistency)
* time-variant: displays snapshots of history (evolution over a long ass time)
* is not volatile (read-only database; no operational updates of data - only loading and accessing)
a data warehouse exists of tables and relationships between tables (“key relationships”), with a new database next to existing ones purely for decision making
DBMS
database management system
operational database
software that is designed to allow users to easily define, modify, retrieve, and manage data in real-time
operational database properties
- thousands of users
- workload consists of preset transactions
- can access and write to hundreds of records
- goal depends on applications
- data is detailed, and both numeric and text
- data integration is application based
- quality graded on integrity
- only current data
- continuous updates
- normalized model
- optimized for OLTP access to subset of database
data warehouse properties
- hundreds of users
- workload consists of specific analysis queries
- can access millions of records, mostly read-only
- goal is decision-making support
- RDBMS
- data is summed up, mainly numeric
- data integration is subject based (on the wishes of the decision makers)
- high quality, graded on consistency
- current and historical data
- periodical updates (no up-to-date data)
- denormalized and multidimensional models (+star/snowflake models)
- pre-aggregation/pre-clustering/unstructured
- optimized for OLTP access to most of database
data warehouse needs to be a seperate database because?
- historic data: decision systems requires historical data which operational databases do not typically maintain
- data integration & consolidation: data warehousing requires aggregation (clustering) & summarization of data from different heterogenous sources
- data quality: different sources use inconsistent data representations, codes and formats which have to be reconciled for decision-making
and also queries on ERPs are too taxing
ERPs consist of
everyday operational/transactional processes of the company; one of the data sources in a data warehouse
* different platforms (every side and vendor has their own dialect of SQL)
* different databases
* process/product oriented (not decision oriented)
* internal/external data
* structured/unstructured data
* no integrated data
* inconsistent data/low quality
* limited historic data
* transaction processing performance is key (no space for analytical/BI queries!!)
RDBMS
relational database management system
data mart
dependent/independent
If the data mart feed is dependent on the central data warehouse, the data is synced. If not, the data mart is often connected to some other component (XML) on the back-end.
data warehouse framework
data mart
- focused on a single subject or line of business
- subset of data warehouse (less dimensions, less history, less detail, less sources)
- user group/application oriented
- dependent & independent
variants: ROLAP database & MOLAP multi-dimensional cube
data warehouse framework
BI front-end applications
- query & reporting
- OLAP (ROLAP, MOLAP, HOLAP, etc)
- data mining (statistics, decision trees, neural networks, etc)
- data visualization (graphs, animation, etc)
trends are integrated BI-suits and enterprise information portals
data warehouse framework
meta data/master data
very important
* data about data: location, meaning, applied transformations, update frequency, access rights, data models, view definitions, etc
* different heterogenous sources (terrible for maintenance)
alternatives: shared repository or XML (interchange using standards)
churning
propensity to buy
fraud detection
customer segmentation
single-layer architecture
two-layer architecture