Intro to Big Data Final Flashcards
What is business intelligence?
An umbrella term that combines architectures, tools, databases, analytical tools, applications and methodologies.
What is the major objective of business intelligence?
To enable interactive, sometimes real-time, access to data and to give business managers and analysts the ability to conduct appropriate analyses
What is the process of BI based upon?
Transformation of data into information, then into decisions, and finally into actions
Who came up with the term Business Intelligence, and when?
Gartner Group in the mid-1990s
What are the four major components of a BI system?
A data warehouse (DW)
Business analytics
Business performance management (BPM)
User interface / dashboard
What legislation requires business leaders to document their business processes and sign off on their legitimacy?
Sarbanes-Oxley Act
What is BI not?
Transaction processing
What is OLTP?
Online transaction processing: a system that handles a company’s routine, ongoing business. OLTP systems store operational data such as SCM and CRM data.
What is OLAP?
Online analytical processing: systems that use a DW to analyze data
What is BAM?
Business activity management
What are shells?
Preprogrammed tools where all you have to do is insert your numbers
What is the definition of analytics?
The process of developing actionable decisions or recommendations for actions based on insights generated from historical data
What are the three levels of Business Analytics?
Descriptive, Predictive, Prescriptive
What are descriptive analytics?
Reporting analytics, knowing what is happening in the organization and understanding some underlying trends and causes of such occurrences
What are predictive analytics?
They aim to determine what is likely to happen in the future
What are prescriptive analytics?
The goal is to recognize what is going on, as well as the likely forecast, and to make decisions that achieve the best performance possible (also known as decision or normative analytics)
What is a data warehouse?
DW is a pool of data produced to support decision making; it is also a repository of current and historical data of potential interest to managers throughout the organization
What is Teradata?
A data warehousing vendor whose name symbolizes the ability to manage terabytes (trillions of bytes) of data
What are the characteristics of Data Warehousing?
- Subject oriented (comprehensive view of the organization)
- Integrated
- Time variant (time series)
- Nonvolatile
What is a data mart?
A smaller version of a DW that focuses on a particular subject or department
What is a dependent data mart?
A data mart created directly from the data warehouse
What is an independent data mart?
Small warehouse designed for a strategic business unit or a department but its source is not an EDW
What is an Operational Data Store?
Provides a fairly recent form of customer information file
Updated throughout the course of business operations
Used for short-term decisions
What are oper marts?
created when operational data needs to be analyzed multidimensionally
What is EDW?
Enterprise data warehouse
large-scale data warehouse that is used across the enterprise for decision support
What is Metadata?
Data about data
describe the structure of and some meaning about data
usually either technical or business metadata
What are the text mining techniques?
- Term frequency-Inverse document frequency
- Named entity recognition
- Topic modeling
- Event extraction
What is TF-IDF?
Term frequency-inverse document frequency
Measures how frequently a word appears in a document relative to how often it appears across the whole set of documents
Used to build classifiers or predictive models
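A minimal sketch of the TF-IDF idea in plain Python. It assumes one common weighting variant (tf = count / document length, idf = log(N / df)); other variants exist, and the function name `tf_idf` and the sample documents are made up for illustration.

```python
import math

def tf_idf(docs):
    """Return one {term: score} dict per document.

    docs is a list of token lists. Assumed variant:
    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    scores = []
    for doc in docs:
        doc_scores = {}
        for term in set(doc):
            tf = doc.count(term) / len(doc)
            idf = math.log(n_docs / df[term])
            doc_scores[term] = tf * idf
        scores.append(doc_scores)
    return scores

# Hypothetical three-document corpus, already tokenized.
docs = [
    "data warehouse stores data".split(),
    "data mining finds patterns".split(),
    "text mining finds themes".split(),
]
scores = tf_idf(docs)
```

In the first document, "warehouse" scores higher than "data" even though "data" occurs twice, because "data" also appears in another document and is therefore less distinctive.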
What is NER?
Named entity recognition
Recognizes nouns and could be used to extract persons, organizations, locations, dates, monetary amounts
What is topic modeling?
Identifies dominant themes in a vast array of documents
What is Latent Dirichlet Allocation?
A topic model that treats each document as a mixture of topics and automatically clusters words into those topics
What is probabilistic latent semantic indexing?
A topic model that models the probability of words and documents co-occurring
What is event extraction?
A step further than NER and harder
It looks at the relationship between nouns
looks at kinds of inferences that can be made from incidents in the text
What is the text mining process?
- Establish the Corpus: Collect & Organize the Domain Specific Unstructured Data
- Create the Term-Document Matrix: Introduce the structure to the Corpus
- Extract Knowledge: Discover Novel Patterns from the T-D Matrix
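The first two steps above can be sketched in a few lines of Python. This is only an illustration under simple assumptions (whitespace tokenization, raw counts, documents as rows); the name `term_document_matrix` and the two-sentence corpus are hypothetical.

```python
def term_document_matrix(corpus):
    """Build a count matrix with one row per document and one column per term."""
    # Step 1: the corpus is a list of raw text strings; tokenize naively.
    tokenized = [doc.lower().split() for doc in corpus]
    # Step 2: the sorted vocabulary gives the matrix its column structure.
    vocabulary = sorted({term for doc in tokenized for term in doc})
    matrix = [[doc.count(term) for term in vocabulary] for doc in tokenized]
    return vocabulary, matrix

corpus = [
    "big data needs big storage",
    "data mining finds patterns",
]
vocabulary, matrix = term_document_matrix(corpus)
for row in matrix:
    print(row)
```

Knowledge extraction (step 3) would then run on `matrix`, for example by weighting the counts or clustering the rows.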
What is Web Usage Mining?
extraction of information from data generated through web page visits and transactions
What is the goal of sentiment analysis?
What do people feel about a certain topic?
What are the characteristics of Big Data?
Volume, Variety, Velocity, Variability, Veracity, Value
What is Hadoop?
An open source framework for storing and analyzing massive amounts of distributed, unstructured data
What are the Big Data core technologies?
MapReduce + Hadoop
What is MapReduce?
A programming model that distributes processing of very large multi-structured data files across a large cluster of ordinary machines/processors. Developed and popularized by Google.
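A single-process sketch of the MapReduce model, using the classic word-count example. It only imitates the map, shuffle, and reduce phases in ordinary Python and is not Hadoop's API; all function names here are illustrative assumptions.

```python
from collections import defaultdict

def map_phase(chunk):
    """Mapper: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(chunks):
    # On a real cluster each chunk would be mapped on a different machine;
    # here the chunks are processed sequentially in one process.
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big", "data cluster"])
print(counts)
```

Because each mapper sees only its own chunk and each reducer sees only one key, both phases can be spread across many ordinary machines.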
What are data mining characteristics?
- Source of data for DM is often a consolidated data warehouse
- DM environment is usually a client-server or a web-based information systems architecture
- Data for DM is mostly structured, but can also include unstructured (soft) data such as text
- The miner is often an end user
What are the most common standard processes for data mining?
- CRISP-DM
- SEMMA
What is CRISP-DM?
Cross-Industry Standard Process for Data Mining
What is SEMMA?
Sample, Explore, Modify, Model, and Assess
What are the phases for data mining?
- Define the problem
- Identify required data
- Prepare and pre-process
- Model the data
- Train and test
- Verify and deploy
What is simple split?
Splitting the data into two mutually exclusive sets: training (~70%) and testing (~30%)
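A minimal sketch of a simple split in Python, assuming the records fit in memory and can be shuffled at random. The function name `simple_split` and the fixed seed are illustrative choices, not part of any particular tool.

```python
import random

def simple_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records and cut them into training and testing sets."""
    shuffled = list(records)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    # Slicing at one cut point guarantees the two sets do not overlap.
    return shuffled[:cut], shuffled[cut:]

train, test = simple_split(range(10))
print(len(train), len(test))
```

Shuffling before cutting matters when the data is ordered (for example by date or class), since otherwise the test set would not resemble the training set.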
What are the data mining methods?
- Classification
- Regression
- Clustering
- Association Rule Mining
What are the types of Business Reporting?
- Metric Management Reports
- Visualizations/Dashboard
- Balanced Scorecard
What does a balanced scorecard do?
Translates an organization’s financial, customer, internal process, and learning and growth objectives into a set of actionable initiatives.
What is the definition of data?
A collection of facts usually obtained as a result of experiences, observations, or experiments
What are inferential statistics?
Drawing inferences about the population based on sample data
What is a histogram?
A frequency chart
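As a rough illustration, a histogram can be built by counting how many values fall into each equal-width bin. The function name `histogram`, the bin width, and the sample ages below are made up for the example.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical sample data.
ages = [21, 23, 25, 31, 34, 38, 39, 47]
bin_width = 10
freq = histogram(ages, bin_width)
for start, count in freq.items():
    # Draw one '#' per observation to make a text-mode frequency chart.
    print(f"{start}-{start + bin_width - 1}: {'#' * count}")
```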
What is Kurtosis?
Measures the peakedness (tall, skinny shape) of a distribution
What is skewness?
Measure of asymmetry
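Both measures can be computed from central moments. The sketch below assumes the population (biased) formulas and reports excess kurtosis (kurtosis minus 3, so a normal distribution scores about 0); statistical packages often apply sample-size corrections that this version omits.

```python
def central_moment(values, k):
    """Average of (value - mean) raised to the k-th power."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)

def skewness(values):
    """Third standardized moment: 0 for symmetric data, > 0 for a long right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth standardized moment minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 1, 10]
print(skewness(symmetric), skewness(right_tailed))
print(excess_kurtosis(symmetric))
```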
What is regression?
A part of inferential statistics, used to characterize the relationship between explanatory (input) and response (output) variables
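A minimal sketch of simple linear regression with one explanatory variable, fitted by ordinary least squares. The function name `fit_line` and the ad-spend and sales figures are hypothetical.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: explanatory variable (input) and response variable (output).
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
slope, intercept = fit_line(ad_spend, sales)
print(slope, intercept)
```

Here the slope says how much the response changes for each one-unit increase in the explanatory variable.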