01 Data Science for Business Flashcards
What is Big Data?
- Big Data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making and process automation.
- The three V’s: volume, velocity and variety.
What is the general procedure of Data Science regarding extracting knowledge from data?
- It is assumed that extracting useful knowledge from data to solve business problems can be done systematically.
- IT is used to compile large data sets. The data is then analyzed to identify correlations that help predict variables.
- Results from the analysis need to be generalizable (i.e., avoid overfitting the dataset).
- Meaningful decisions require identifying the contexts in which data are created, analyzed and used.
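The generalization point above can be illustrated with a simple holdout check. This is a minimal sketch, assuming invented advertising-spend and sales figures (hypothetical data, not from the course material): a straight line is fitted on part of the data, and the error is then measured on points the model never saw. The helper names `fit_line` and `mean_squared_error` are illustrative only.

```python
# Hypothetical data: (ad_spend, sales) pairs invented for illustration.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8),
        (5, 10.1), (6, 12.2), (7, 13.8), (8, 16.1)]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(points, slope, intercept):
    """Average squared gap between observed and predicted y."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)


# Hold out every fourth point; it is not used during fitting.
train = [p for i, p in enumerate(data) if i % 4 != 3]
test = [p for i, p in enumerate(data) if i % 4 == 3]

slope, intercept = fit_line(train)
train_error = mean_squared_error(train, slope, intercept)
test_error = mean_squared_error(test, slope, intercept)

print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"train MSE={train_error:.3f} test MSE={test_error:.3f}")
```

If the error on the held-out points is much larger than the error on the training points, the model has probably fitted quirks of the training data rather than a pattern that generalizes.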
Describe the relation of decision making and data:
- Traditionally, many decisions are made based on the “gut feeling” of executives.
- Nowadays, data is available in sufficient quantity and granularity to let more decisions be based on facts.
- Top-down decision making is supplemented by bottom-up data analytics.
Why are Data Science capabilities regarded as a strategic asset?
- The strategic value of data science (compiling and analyzing data) is that it can:
  - Transform existing operations to be more efficient
  - Transform entire business models and generate new ways to earn profit/market share
Give some examples of data science solutions for business problems:
- Classification and class probability estimation (predict which class an individual belongs to, and how likely)
- Regression (estimate numerical values for an individual)
- Similarity matching (identify similar individuals)
- Clustering (group individuals by similarity, e.g., customer segments)
- Profiling (typical behaviours of individuals/groups)
- Link prediction (connections among data items)
- Data reduction (replace large data sets with smaller data sets)
- Causal modeling (what events influence each other)
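One of the tasks listed above, similarity matching, can be sketched in a few lines. This is a minimal illustration, assuming a made-up customer table with two numeric attributes (the names, values, and the helper `most_similar` are all hypothetical). Similarity is measured here with plain Euclidean distance, which is only one of several possible choices.

```python
import math

# Hypothetical customers: (age, purchases per year), invented for illustration.
customers = {
    "anna": (25, 4),
    "ben": (27, 5),
    "carla": (52, 20),
    "dev": (49, 18),
}


def most_similar(name, table):
    """Return the other customer with the smallest Euclidean distance."""
    target = table[name]
    others = (other for other in table if other != name)
    return min(others, key=lambda other: math.dist(target, table[other]))


print(most_similar("anna", customers))
print(most_similar("carla", customers))
```

In practice the attributes would usually be rescaled first, so that an attribute with a large numeric range does not dominate the distance.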
Name some (5) approaches for data science:
- Statistics
- Database querying
- Data Warehousing
- Regression analysis
- Machine learning, data mining
Briefly describe the evolution of analytical information systems:
- 1960s: MIS (Management Information System): efficient data processing, integrated information systems, vision of automated decision making.
- 1970s: DSS (Decision Support System): statistical algorithms, “what if” analysis, complex and rigid structures, databases.
- 1980s: EIS (Executive Information System): multidimensional modeling, transaction processing and decision support, top management decisions.
- 1990s: DWH (Data Warehouse): integration of diverse data, interactive and customized reports/OLAP, historical data.
- 2000s: BI (Business Intelligence): KPI systems, balanced scorecards, analytical applications, data mining.
What are the key differences between model-driven vs data-driven decision making?
- Model-driven decision making relies on predictions derived from statistics and operations research techniques.
- Data-driven decision making relies on making the data available in one place (in a DWH).