ADM Flashcards
Characteristics of Big Data:
- 4V’s and what they mean
Volume: there is a large amount of information
Variety: information comes in different forms, formats and types (financial data, marketing data, transactional data, images, videos, text, etc)
Velocity: rate of change. Data is constantly changing and growing. Eg: stock market: prices change in a matter of seconds.
Veracity: reliability and how accurate the data is.
Characteristics of Big Data:
- alternative for 4V’s
- Multiple sources of data (unstructured + semi-structured usually)
- Multiple users
- Multiple and unanticipated applications
Challenge of Big Data:
What is the premise?
By managing the 4V’s of big data, we can make better decisions that could improve the company’s competitiveness, efficiency, profitability, etc.
Alternatively: Value lies in extracting knowledge from data
Business Intelligence
Definition:
“Business Intelligence (BI) is an umbrella term that includes the applications,
infrastructure and tools, and best practices that enable access to and
analysis of information to improve and optimize decisions and
performance” (Gartner group)
Business intelligence:
What kind of data does Business Intelligence use?
Business intelligence reveals insights from raw data.
ex: Target (identifying pregnant customers), Visa (predicting divorces)
Formula for Precision = ?
Precision = True Positive / (True Positive + False Positive)
(all positive predictions)
Formula for Recall = ?
Recall = True Positive / (True Positive + False Negative)
(all positive cases in reality)
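A minimal sketch of both formulas in Python, assuming the counts of true positives, false positives and false negatives are already known. The function names and the example counts are made up for illustration, and returning 0.0 when the denominator is zero is just one possible convention.

```python
def precision(tp: int, fp: int) -> float:
    """Share of all positive predictions that are actually positive."""
    total_predicted_positive = tp + fp
    return tp / total_predicted_positive if total_predicted_positive else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of all real positive cases that were predicted as positive."""
    total_actual_positive = tp + fn
    return tp / total_actual_positive if total_actual_positive else 0.0


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives
p = precision(8, 2)  # 8 / (8 + 2)
r = recall(8, 4)     # 8 / (8 + 4)
```

Precision drops when the model raises false alarms; recall drops when it misses real positive cases.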
Evaluating BI knowledge
BI knowledge is typically in the form of patterns.
Pattern quality:
- Objective evaluation based on statistical strength of findings
- Subjective evaluation based on human judgement and expectations: expected vs unexpected, actionable vs non-actionable.
Characteristics of BI data
Historic:
- Data describing changes of a phenomenon throughout time
Aggregate:
- Data representing a larger population
Common BI Systems Architecture
Data Store:
- a duplicate of the transaction processing system. This is done so that we can process the data without overloading the transaction processing system.
Data Warehouse:
- central warehouse where all the current and historic data of an organization is collected. Integrating the data and making sense of it is difficult and time-consuming.
Data Marts:
- subset of data from the data warehouse, with views tailored to specific applications.
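A rough sketch of the warehouse/mart relationship in Python, assuming the warehouse can be pictured as a list of rows. The rows, column names and the build_data_mart helper are all hypothetical; real BI systems would do this with database views or ETL jobs.

```python
# Hypothetical warehouse rows: current and historic data from all departments
warehouse = [
    {"year": 2022, "department": "marketing", "region": "EU", "revenue": 120},
    {"year": 2023, "department": "marketing", "region": "US", "revenue": 150},
    {"year": 2023, "department": "finance", "region": "EU", "revenue": 300},
]


def build_data_mart(rows, department, columns):
    """Return a subset of the warehouse tailored to one application:
    only the rows of one department, and only the columns it needs."""
    return [
        {col: row[col] for col in columns}
        for row in rows
        if row["department"] == department
    ]


# A mart for the marketing team: fewer rows and fewer columns than the warehouse
marketing_mart = build_data_mart(warehouse, "marketing", ["year", "region", "revenue"])
```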
Human roles in BI data management
Data owner: person who is ultimately accountable.
Data steward: person who is responsible for managing data output
Data user: uses data for applications and negotiates with the data owner for access (eg: an analyst)
Data Quality
Information quality:
- General consideration
Information quality is dependent on application of data.
ex: financial analysis of Fortune 500 companies vs auditing financial statements. Each application places different demands on the data.
Information Quality is evaluated in terms of the following dimensions: (also describe each one)
- Accuracy: data represents the correct state of the real world
- Reliability/consistency: dependability of the output information, or correctness of the analyzed data
- Timeliness (currency): whether data is up-to-date and available on time
- Completeness: ability of the information system to represent every relevant state of the real-world system.
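Two of these dimensions can be approximated with simple checks. The sketch below is only a simplification: it treats completeness as the share of filled-in values for one field and timeliness as "updated within a maximum age". The records, field names and thresholds are hypothetical.

```python
from datetime import date

# Hypothetical customer records; None marks a missing value
records = [
    {"customer": "A", "email": "a@example.com", "updated": date(2024, 1, 10)},
    {"customer": "B", "email": None, "updated": date(2023, 6, 1)},
]


def completeness(rows, field):
    """Fraction of rows where `field` is filled in (a proxy for completeness)."""
    filled = sum(1 for row in rows if row.get(field) is not None)
    return filled / len(rows) if rows else 0.0


def is_timely(row, today, max_age_days):
    """True if the row was updated within `max_age_days` of `today`."""
    return (today - row["updated"]).days <= max_age_days


email_completeness = completeness(records, "email")
fresh = [r["customer"] for r in records if is_timely(r, date(2024, 1, 31), 30)]
```

What counts as "complete enough" or "fresh enough" depends on the application, which is why the threshold is a parameter and not a fixed value.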
What is visual data analytics and when should it be used?
“users performing analytical reasoning facilitated by interactive visual interfaces”.
Visual analytics is used when users need to derive insight from data and when the specifications of the problem domain are not well defined.
What are the Data Abstraction Levels? Briefly describe them.
Conceptual level:
- used for understanding and communicating about a specific application domain. It models business concepts and their relationships.
Logical level:
- deciding how to structure the data so that it becomes suitable for the application in the information system
Physical level:
- considers how data is stored and transmitted between systems and takes the technology infrastructure into consideration.
Data Abstraction Level:
Logical level
Data structure is defined in the form of classification (abstraction mechanism)
ex: all items in my office vs Birds
Theoretical background on classification.
- classification is not inherent to real world phenomena; it is an artifact of the human mind.
- Classes are created in order to comprehend phenomena by grouping them based on similarity (Lakoff 1987)
- Classes are cognitive shortcuts that act as heuristics.
Predefined classification of data. What is it?
Users infer and reason about the data using the definitions provided by the data designer. -> assumes there is one correct way of looking at things
Non-classified data. What is it?
Stored data instances and their properties free of pre-defined classification.
Users define their own classifications based on properties of interest and on demand.
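A small sketch of this idea in Python: instances are stored with their properties only, and each user builds a class on demand from the properties they care about. The instances, property names and the classify helper are made up for illustration.

```python
# Instances stored with their properties only; no class label is assigned
instances = [
    {"name": "robin", "has_wings": True, "can_fly": True, "lays_eggs": True},
    {"name": "penguin", "has_wings": True, "can_fly": False, "lays_eggs": True},
    {"name": "bat", "has_wings": True, "can_fly": True, "lays_eggs": False},
]


def classify(items, **required):
    """Group instances on demand: keep those matching every property of interest."""
    return [
        item["name"]
        for item in items
        if all(item.get(prop) == value for prop, value in required.items())
    ]


# Two users, two different classifications over the same stored data
flyers = classify(instances, can_fly=True)
winged_egg_layers = classify(instances, has_wings=True, lays_eggs=True)
```

Neither grouping is "the" correct one; each reflects the properties one user was interested in at the time.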
Resource Description Framework (RDF)
Subject -“predicate”-> Object
We do not assign any classification to the objects.
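The triple structure can be sketched with plain Python tuples. This is not an RDF library, only an illustration of the subject-predicate-object shape; the names in the triples and both helper functions are hypothetical.

```python
# Each statement is a (subject, predicate, object) triple
triples = [
    ("alice", "worksFor", "acme"),
    ("alice", "knows", "bob"),
    ("bob", "worksFor", "acme"),
]


def objects_of(subject, predicate, store):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in store if s == subject and p == predicate]


def subjects_with(predicate, obj, store):
    """Return every subject that points to `obj` through `predicate`."""
    return [s for s, p, o in store if p == predicate and o == obj]


alice_knows = objects_of("alice", "knows", triples)
acme_staff = subjects_with("worksFor", "acme", triples)
```

Note that nothing says "alice is a Person" or "acme is a Company"; the data only records how things relate, and any grouping is left to the user.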
Users of non-classified data:
Content consumers: users who are familiar with the domain represented in the data source, but who do not design the classification scheme
Content generators: users generating digital info (eg: comments, reviews, posts, etc)
Exploitation of information
“routine execution of knowledge” (close ended)
Exploration of information
search for novel and innovative ways of doing things (open ended)
Metadata
Definition
Metadata is data about data.
Metadata is “structured, encoded data that describe
characteristics of information-bearing entities to aid in the
identification, discovery, assessment, and management of
the described entities” (according to the American Library
Association - Zuiderwijk et al. 2012)
Types of metadata
Business metadata: focuses on the content and condition of data from a business perspective (conceptual abstraction level)
Technical metadata: info about the technical details of data, the systems that store data, and the processes that move data (logical and physical abstraction levels)
Operational metadata: details of the processing and accessing of data (ex: data access hierarchies and clearance levels)