ADM Flashcards

1
Q

Characteristics of Big Data:

- 4V’s and what they mean

A

Volume: there is a large amount of information

Variety: information comes in different forms, formats and types (financial data, marketing data, transactional data, images, videos, text, etc)

Velocity: rate of change. Data is constantly changing and growing. E.g. the stock market: prices change in a matter of seconds.

Veracity: reliability and how accurate the data is.

2
Q

Characteristics of Big Data:

- alternative for 4V’s

A
  • Multiple sources of data (unstructured + semi-structured usually)
  • Multiple users
  • Multiple and unanticipated applications
3
Q

Challenge of Big Data:

What is the premise?

A

By managing the 4V’s of big data, we can make better decisions that could improve the company’s competitiveness, efficiency, profitability, etc.

Alternatively: Value lies in extracting knowledge from data

4
Q

Business Intelligence

Definition:

A

“Business Intelligence (BI) is an umbrella term that includes the applications,
infrastructure and tools, and best practices that enable access to and
analysis of information to improve and optimize decisions and
performance” (Gartner group)

5
Q

Business intelligence:

What kind of data does Business Intelligence use?

A

Business intelligence reveals insights from raw data.

Ex: Target (identifying pregnant women), Visa (predicting divorces)

6
Q

Formula for Precision = ?

A

Precision = True Positive / (True Positive + False Positive)

(all positive predictions)
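
A minimal sketch of the formula in Python. The function name `precision`, the zero-division convention and the counts are illustrative assumptions, not part of the course material.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of positive predictions that are actually positive."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        # No positive predictions at all; returning 0.0 is a convention assumed here.
        return 0.0
    return true_positives / predicted_positives


# Hypothetical counts: 8 correct positive predictions, 2 incorrect ones.
print(precision(8, 2))  # 0.8
```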

7
Q

Formula for Recall = ?

A

Recall = True Positive / (True Positive + False Negative)

(all positive cases in reality)
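
A minimal sketch that computes recall from two label lists. The function name `recall` and the sample labels are illustrative assumptions.

```python
def recall(actual: list[bool], predicted: list[bool]) -> float:
    """Share of the actual positive cases that were predicted as positive."""
    pairs = list(zip(actual, predicted))
    true_positives = sum(1 for a, p in pairs if a and p)
    false_negatives = sum(1 for a, p in pairs if a and not p)
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        # No positive cases in reality; returning 0.0 is a convention assumed here.
        return 0.0
    return true_positives / actual_positives


# Hypothetical labels: 4 real positives, of which 3 were found.
actual = [True, True, True, True, False, False]
predicted = [True, True, True, False, True, False]
print(recall(actual, predicted))  # 0.75
```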

8
Q

Evaluating BI knowledge

A

BI knowledge is typically in the form of patterns.

Pattern quality:
- Objective evaluation: based on the statistical strength of the findings
- Subjective evaluation: based on human judgement and expectations (expected vs unexpected, actionable vs unactionable)
9
Q

Characteristics of BI data

A

Historic:
- Data describing changes of a phenomenon throughout time

Aggregate:
- Data representing a larger population

10
Q

Common BI Systems Architecture

A

Data Store:
- a duplicate of the transaction processing system. This is done so that we can process the data without overloading the transaction processing system.

Data Warehouse:
- central warehouse where all the current and historic data of an organization is collected. Difficult and time consuming to integrate and make sense of the data.

Data Marts:
- subset of data from data warehouse where the views are tailored for specific applications.
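
As a rough sketch of the warehouse/mart relationship, the example below models a data mart as a SQL view over a warehouse table, using Python's built-in `sqlite3` module. The table, view and column names and all figures are hypothetical; real BI stacks usually use dedicated warehouse technology.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding current and historic sales data.
conn.execute(
    "CREATE TABLE warehouse_sales "
    "(region TEXT, product TEXT, year INTEGER, revenue REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("EU", "laptop", 2022, 100.0),
        ("EU", "laptop", 2023, 150.0),
        ("US", "laptop", 2023, 200.0),
        ("EU", "phone", 2023, 50.0),
    ],
)

# Data mart: a tailored subset (EU only, aggregated per year) for one application.
conn.execute(
    "CREATE VIEW mart_eu_sales AS "
    "SELECT year, SUM(revenue) AS total_revenue "
    "FROM warehouse_sales WHERE region = 'EU' GROUP BY year"
)

rows = conn.execute(
    "SELECT year, total_revenue FROM mart_eu_sales ORDER BY year"
).fetchall()
print(rows)  # [(2022, 100.0), (2023, 200.0)]
```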

11
Q

Human roles in BI data management

A

Data owner: person who is ultimately accountable.

Data steward: person who is responsible for managing data output

Data user: uses data for applications and negotiates with the data owner for access (eg analyst)

12
Q

Data Quality

Information quality:
- General consideration

A

Information quality is dependent on application of data.

Financial analysis of Fortune 500 companies vs auditing financial statements.

13
Q

Information Quality is evaluated in terms of the following dimensions: (also describe each one)

A
  • Accuracy: data represents the correct state of the real world
  • Reliability/ consistency: dependability of the output information or correctness of the analyzed data
  • Timeliness (current, now): whether data is up-to-date and available on time
  • Completeness: ability of the information system to represent every relevant state of the real world system.
14
Q

What is visual data analytics and when to use it.

A

“users performing analytical reasoning facilitated by interactive visual interfaces”.

Visual analytics is performed when users need to derive insight from data, especially when the specifications of a problem domain are not well defined.

15
Q

Information Quality is evaluated in terms of what dimensions? Explain each one

A
  • Accuracy: data represents the correct state of the real world
  • Reliability/ consistency: dependability of the output information, or correctness of the analyzed data
  • Timeliness (currency): whether data is up-to-date and available on time
  • Completeness: ability of the information system to represent every relevant state of the real world system.
16
Q

What are the Data Abstraction Levels? Briefly describe them.

A

Conceptual level:
- understanding and communicating regarding a specific application domain. It models business concepts and their relationships

Logical level:
- deciding how to structure the data so that it becomes suitable for the application in the information system

Physical level:
- considers how data is stored and transmitted between systems and takes the technology infrastructure into consideration.

17
Q

Data Abstraction Level:

Logical level

A

Data structure is defined in form of classification (abstraction mechanism)

ex: all items in my office vs Birds

18
Q

Theoretical background on classification.

A
  • classification is not inherent to real world phenomena, it is an artifact of the human mind.
  • Classes are created in order to comprehend phenomena by grouping them based on similarity (Lakoff 1987)
  • Classes are cognitive shortcuts that act as heuristics.
19
Q

Predefined classification of data. What is it?

A

Users infer and reason about the data using the definitions provided by the data designer. This implies there is one correct way of looking at things.

20
Q

Non-classified data. What is it?

A

Stored data instances and their properties free of pre-defined classification.

Users define their own classifications based on properties of interest and on demand.
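
A minimal sketch of the idea: instances are stored only with their properties, and each user builds a class on demand by stating the properties of interest. The sample instances, property names and the helper `classify` are hypothetical.

```python
# Instances stored with properties only; no class such as "Bird" is assigned.
instances = [
    {"name": "sparrow", "has_wings": True, "can_fly": True, "lays_eggs": True},
    {"name": "penguin", "has_wings": True, "can_fly": False, "lays_eggs": True},
    {"name": "bat", "has_wings": True, "can_fly": True, "lays_eggs": False},
]


def classify(items, **required):
    """Return the names of items whose properties match all required values."""
    return [
        item["name"]
        for item in items
        if all(item.get(prop) == value for prop, value in required.items())
    ]


# Two users, two different on-demand classifications of the same data.
flyers = classify(instances, can_fly=True)
winged_egg_layers = classify(instances, has_wings=True, lays_eggs=True)
print(flyers)             # ['sparrow', 'bat']
print(winged_egg_layers)  # ['sparrow', 'penguin']
```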

21
Q

Resource Description Framework (RDF)

A

Subject -“predicate”-> Object

We do not assign any classification to the objects.
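
The subject-predicate-object shape can be sketched with plain Python tuples. This is not an RDF library, and the names `alice`, `bob`, `acme`, `worksFor` and `knows` are made up for illustration.

```python
# Each statement is a (subject, predicate, object) triple; nothing is assigned to a class.
triples = [
    ("alice", "worksFor", "acme"),
    ("alice", "knows", "bob"),
    ("bob", "worksFor", "acme"),
]


def objects_of(subject, predicate):
    """All objects linked to a subject through a given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects_with(predicate, obj):
    """All subjects linked to a given object through a given predicate."""
    return [s for s, p, o in triples if p == predicate and o == obj]


print(objects_of("alice", "knows"))       # ['bob']
print(subjects_with("worksFor", "acme"))  # ['alice', 'bob']
```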

22
Q

Non-classified Data Usage users:

A

Content consumers: users who are familiar with the domain represented in the data source, but who do not design the classification scheme

Content generators: users generating digital info (eg. comments, reviews, posts, etc)

23
Q

Exploitation of information

A

“routine execution of knowledge” (close ended)

24
Q

Exploration of information

A

search for novel and innovative ways of doing things (open ended question)

25
Q

Metadata

Definition

A

Metadata is data about data.

Metadata is “structured, encoded data that describe
characteristics of information-bearing entities to aid in the
identification, discovery, assessment, and management of
the described entities” (according to the American Library
Association - Zuiderwijk et al. 2012)

26
Q

Types of metadata

A

Business metadata: focus on content and condition of data from a business perspective (under abstraction level)

Technical metadata: info about tech details of data, systems that store data, processes that move data (logical and physical data levels)

Operational metadata: details of processing and accessing of data (ex data access hierarchies and clearance levels)

27
Q

Application of metadata

A

It is the foundation of DM.

Within organization: what the definition of a certain concept is (customer, patient, etc)

Between organizations: how to reconcile data that might be defined differently

For security and data quality purposes: we need to know the lineage or provenance of the data.

28
Q

Reference data

A

is used to relate data in a DB to information beyond the boundaries of the organization.

29
Q

Metadata vs Reference data

A

Metadata: defines and describes the type of info in a db. Data about data

Reference data: composed of static facts that are shared by interacting entities (to harmonize usage) ex: ISBN, country codes, etc
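
A small sketch of the difference, assuming a hypothetical customer table with a `country_code` column. The metadata describes the column; the reference data is the shared list of valid codes (a three-entry excerpt of ISO 3166-1 alpha-2 here).

```python
# Metadata: data about data. It describes a column, not any customer.
column_metadata = {
    "name": "country_code",
    "type": "string",
    "description": "ISO 3166-1 alpha-2 code of the customer's country",
}

# Reference data: static, shared facts used to harmonize values (excerpt only).
COUNTRY_CODES = {"NL": "Netherlands", "DE": "Germany", "US": "United States"}


def is_valid_country(code: str) -> bool:
    """Check a value against the reference data."""
    return code in COUNTRY_CODES


# Hypothetical records; "XX" is not in the reference list above.
records = [
    {"customer": "A", "country_code": "NL"},
    {"customer": "B", "country_code": "XX"},
]
invalid = [
    r["customer"]
    for r in records
    if not is_valid_country(r[column_metadata["name"]])
]
print(invalid)  # ['B']
```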

30
Q

Master data

A

Provides the best version of the truth about a concept.

Useful when same concept differs between BUs of the same org.

31
Q

Metadata repository elements:

A
  • Business Glossary (business metadata)
  • Data dictionaries (technical metadata)
  • Reference and master data repositories
32
Q

What is Ontology?

A

Ontology is a branch of philosophy that describes the
order and structure of reality in the broadest sense
possible (Angeles 1981).

33
Q

Universal Ontology

A

A universal ontology (like BWW) describes the order and structure of the real world.

34
Q

Domain Ontology

A

Domain ontologies form the foundation of a knowledge base.

Domain ontologies are used in diverse fields such as biomedical information, systems engineering and semantic web to represent domain knowledge (or the business metadata)

35
Q

Application of domain ontology

A

Create common understanding of information:
- facilitate interoperability

Enable repurposing of domain knowledge

Make the rules and assumptions of the domain more explicit.

36
Q

Good data supports the following:

A
  • know which data is used for IM
  • build new products with data
  • identify personal data to keep clients’ data safe
  • delete data to reduce data server load (without good data, you don’t know which data can be deleted)
37
Q

AI supports the following:

A
  • make sure the analysis is unbiased and ethical
  • repeatable AI
  • software products based on AI
  • know which data is used for AI (where it comes from and whether it has changed)
38
Q

Developments in AI and data management

A

Rise of data strategies facilitates value extraction from new data and AI products.

Regulations are catching up.

Platforms share and integrate data with third parties

AI replaces software.

39
Q

Data integration

A

Data integration is fundamental for achieving business objectives.

Ex: DB -> reporting tool -> monthly insights into sales.

40
Q

How should companies be in control of data integration?

A
  • Know the real source of data (unaltered quality data and definitions)
  • Know who is accountable for data quality at that source. Know who owns the data (consistent data is key for repeatable AI)
  • Know how to use the data (based on definitions). Avoid comparing apples to pears
  • Know if the data is changed at every handover
  • Know if the data you get fits into your system
  • Know if a bias & ethics procedure is followed for data extractions -> create trusted AI
41
Q

Data integration: high level responsibilities (people and what they do)

A

Data owner: responsible for good quality data

Data engineer: determines controls in accordance with requirements, and also builds the controls

Report owner: responsible for qualitative report requirements

BI/AI specialist: determines which data, report template & analysis are used, and performs the analysis

42
Q

Consultancy: who is your target audience?

A

CFO: wants reliable reporting (report owner + data owner)

Head of sales: wants new insights to find new customers, keep existing ones (report owner + data owner)

CEO: responsible for sales and new products (often report owner)

CDO: responsible for data control and AI (not a data owner)

Lead data scientist: AI innovation through trusted data & algorithms

43
Q

What does Enterprise Resource Planning (ERP) do?

A

ERP attempts to integrate all departments and functions across a company onto a single computer system that can serve all those different departments’ particular needs.

  • automates and integrates the majority of business processes within an organization
  • enables sharing data and common practices across the enterprise
  • provides real-time access
44
Q

ERP Implementation cost.

What is the initial investment for:

Large companies = ?

Small organizations = ?

A

Large companies: $100M - 200M

Smaller organizations: more than $1M

45
Q

ERP implementation cost breakdown per categories

A
Consulting (30%)
Hardware (25%)
Software (15%)
Deployment team (15%)
Training (15%)
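
As a worked example of the percentages above, the sketch below splits a total budget across the five categories. The function name `split_budget` and the 100M total are hypothetical.

```python
# Cost shares from the card, as whole percentages.
COST_SHARE_PERCENT = {
    "Consulting": 30,
    "Hardware": 25,
    "Software": 15,
    "Deployment team": 15,
    "Training": 15,
}


def split_budget(total: int) -> dict[str, float]:
    """Allocate a total implementation budget across the cost categories."""
    return {
        category: total * percent / 100
        for category, percent in COST_SHARE_PERCENT.items()
    }


# Hypothetical total budget of 100 million.
breakdown = split_budget(100_000_000)
print(breakdown["Consulting"])  # 30000000.0
print(breakdown["Hardware"])    # 25000000.0
```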
46
Q

ERP implementation duration?

A

Typically 1-4 years

47
Q

Benefits of ERP systems (ordered by importance)

A

1) inventory reduction
2) Personnel reduction
3) productivity improvements
4) order management improvements
5) Financial close cycle reduction
6) IT cost reduction

48
Q

ERP challenges (based on Deloitte):

A

People:

  • change management
  • training

Process:

  • Process reengineering
  • requiring ongoing support

Technology:

  • Software functionality
  • Upgrades
49
Q

Name some important ERP vendors

A

SAP

Oracle

Microsoft (Microsoft Dynamics)

50
Q

Important aspects of ERP and customization

A

ERP is about integration and standardization.

There will always be situations in which ERP systems and business process will not fit.

In this case we need to find out if we should either change the business processes a bit or customize the ERP software.

51
Q

ERP configuration vs customization. What does each one mean?

A

Configuration: about module selection. Covers anticipated variability.

Customization: source code-based adaptations. Meant to cover unanticipated variability.

52
Q

What does ERP vanilla implementation mean and what are reasons to consider it?

A

Uses modules as intended (no customization)

Reasons to consider:

  • straightforward
  • no skill or experience in building or changing systems
  • standardizing (fewer complications and cheaper)
53
Q

Customization of ERP. What companies should do it? What are benefits and drawbacks?

A

Companies with highly skilled IT devs can more easily do it.

Benefits: minimize risk of project failure

Drawbacks: when upgrading, each modification needs to be analyzed. Upgrade can turn into re-implementation. Literature suggests NO CUSTOMIZATION

54
Q

About ERP:

Institutional theory and misalignments.

What are Imposed structures?

A

Imposed structures are the result of external demands made on the organization by authoritative sources such as gov, professions and established industry practice.

55
Q

About ERP:

Institutional theory and misalignments.

What are voluntary structures?

A

Voluntary structures are developed as a result of an organization’s history and experience, strategy and management preferences.

56
Q

About ERP:

Ontology and misalignment.

Premise:

Ontological perspective on misalignments:

A

For an information system to be stable, its structure must represent a good mapping to the real world it seeks to model.

From an ontological perspective, misalignments are instances where crucial aspects of the real world are not adequately represented by the model embedded in the package.

57
Q

About ERP:

What is deep-structure?

A

Deep structure conveys the core meaning of the real-world system that the information system is intended to model.

Ex: the DB structure of a children’s hospital has no way to link mother and child.

58
Q

About ERP:

What is surface structure?

A

Surface structure is concerned with how real-world meanings are conveyed through the interface between the information system and its users (dialogues and report format).

Ex: customer service needs to see a client transaction and a summary of past transactions on the same screen when interacting with a customer.

59
Q

About ERP:

Challenge: adapt the business processes or customize the source code?

A

Answer: customize for imposed deep misalignments, and adapt for voluntary deep, imposed surface, and voluntary surface misalignments.
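
Read as a decision rule over the two dimensions from the earlier cards (imposed vs voluntary, deep vs surface), the answer reduces to: customize only when the misalignment is both imposed and deep. A minimal sketch under that reading; the function name and return strings are illustrative assumptions.

```python
def resolve_misalignment(source: str, structure: str) -> str:
    """Suggest a resolution for an ERP misalignment.

    source: "imposed" or "voluntary" (institutional dimension)
    structure: "deep" or "surface" (ontological dimension)
    """
    if source not in {"imposed", "voluntary"}:
        raise ValueError(f"unknown source: {source!r}")
    if structure not in {"deep", "surface"}:
        raise ValueError(f"unknown structure: {structure!r}")
    if source == "imposed" and structure == "deep":
        return "customize the ERP package"
    return "adapt the business process"


print(resolve_misalignment("imposed", "deep"))       # customize the ERP package
print(resolve_misalignment("voluntary", "surface"))  # adapt the business process
```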

60
Q

What is the first stage in developing a data management capability?

A

First step is to define a shared vision of the future using business models.

Define a strategic end goal!

The business strategy drives the data strategy - not the other way around!

61
Q

Roadmap data management capability

A

Vision -> Analysis -> Portfolio -> Execution

62
Q

What is the second stage in developing a data management capability?

A

Identify gap between end goal and present capacities.

Use an IT strategic impact grid to analyze the strategic value of the IT solutions. This looks at companies’ IT needs from two dimensions: IT reliability vs IT novelty

63
Q

IT strategic impact grid:

support mode

A

(low novelty - low reliability): don’t waste money

64
Q

IT strategic impact grid:

factory mode

A

(low novelty - high reliability): even small disruptions lead to loss of business. “Don’t cut corners”

65
Q

IT strategic impact grid:

Turnaround mode

A

(high novelty - low reliability): new tech promises major process improvements. “Don’t screw up”

66
Q

IT strategic impact grid:

Strategic mode:

A

(high novelty - high reliability): innovation is the name of the game. Spend what it takes and monitor results like crazy (Amazon, Google, Facebook, Apple)
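
The four modes of the grid can be summarized as a lookup over the two dimensions. A minimal sketch; the function name `it_mode` and the boolean encoding of "high" vs "low" are assumptions made for illustration.

```python
def it_mode(high_novelty: bool, high_reliability: bool) -> str:
    """Map the two grid dimensions to the mode named on the cards."""
    modes = {
        (False, False): "support",
        (False, True): "factory",
        (True, False): "turnaround",
        (True, True): "strategic",
    }
    return modes[(high_novelty, high_reliability)]


print(it_mode(high_novelty=False, high_reliability=True))  # factory
print(it_mode(high_novelty=True, high_reliability=True))   # strategic
```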