Intro to Big Data Final Flashcards

1
Q

What is business intelligence?

A

An umbrella term that combines architectures, tools, databases, analytical tools, applications and methodologies.

2
Q

What is the major objective of business intelligence?

A

To enable interactive, sometimes real-time, access to data and to give business managers and analysts the ability to conduct appropriate analyses

3
Q

What is the process of BI based upon?

A

Transformation of data into information, then into decisions, and finally into actions

4
Q

Who came up with the term Business Intelligence, and when?

A

Gartner Group in the mid-1990s

5
Q

What are the four major components of a BI system?

A

A DW
Business analytics
BPM
User interface / dashboard

6
Q

What legislation requires business leaders to document their business processes and sign off on their legitimacy?

A

Sarbanes-Oxley Act

7
Q

What is BI not?

A

Transaction processing

8
Q

What is OLTP?

A

Online transaction processing: a system that handles a company’s routine, ongoing business. OLTP systems store SCM and CRM data.

9
Q

What is OLAP?

A

Online analytical processing: systems that analyze data, typically drawing on a data warehouse (DW)
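
As a rough illustration of the OLTP/OLAP contrast, the sketch below uses Python's built-in sqlite3 module. The `orders` table and its values are made up, and a single in-memory table stands in for both roles; in practice OLAP queries usually run against a separate data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: small writes recording one business event at a time.
with conn:
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("north", 120.0), ("south", 80.0), ("north", 50.0)],
    )

# OLAP-style work: a read-only aggregate that scans many rows at once.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
)
print(totals)
```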

10
Q

What is BAM?

A

Business activity management

11
Q

What are shells?

A

Preprogrammed tools where all you have to do is insert your numbers

12
Q

What is the definition of analytics?

A

The process of developing actionable decisions or recommendations for actions based on insights generated from historical data

13
Q

What are the three levels of Business Analytics?

A

Descriptive, Predictive, Prescriptive

14
Q

What are descriptive analytics?

A

Reporting analytics, knowing what is happening in the organization and understanding some underlying trends and causes of such occurrences

15
Q

What are predictive analytics?

A

They aim to determine what is likely to happen in the future

16
Q

What are prescriptive analytics?

A

The goal is to recognize what is going on, as well as the likely forecast, and to make decisions that achieve the best performance possible (also known as decision or normative analytics)

17
Q

What is a data warehouse?

A

DW is a pool of data produced to support decision making; it is also a repository of current and historical data of potential interest to managers throughout the organization

18
Q

What is Teradata?

A

The name symbolizes the ability to manage terabytes (trillions of bytes) of data

19
Q

What are the characteristics of Data Warehousing?

A
Subject oriented (comprehensive view of org)
Integrated
Time variant (time series)
Nonvolatile
20
Q

What is a data mart?

A

A smaller version of a DW that focuses on a particular subject or department

21
Q

What is a dependent data mart?

A

DM created directly from the Data Warehouse

22
Q

What is an independent data mart?

A

A small warehouse designed for a strategic business unit or a department, whose source is not an EDW

23
Q

What is an Operational Data Store?

A

Provides a fairly recent form of customer information file
Updated throughout the course of business operations
Used for short-term decisions

24
Q

What are oper marts?

A

Data marts created when operational data needs to be analyzed multidimensionally

25
Q

What is EDW?

A

Enterprise data warehouse

A large-scale data warehouse that is used across the enterprise for decision support

26
Q

What is Metadata?

A

Data about data
Describes the structure of, and some meaning about, data
Usually classified as either technical or business metadata

27
Q

What are the text mining techniques?

A
  1. Term frequency-Inverse document frequency
  2. Named entity recognition
  3. Topic modeling
  4. Event extraction
28
Q

What is TF-IDF?

A

Term frequency-Inverse document frequency
Weighs how frequently a word appears in a document against how common it is across the whole set of documents
Used to build classifiers or predictive models
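
A minimal sketch of one common TF-IDF variant, assuming whitespace tokenization, tf = term count / document length, and idf = log(N / document frequency). The `tf_idf` function and the sample documents are illustrative only; libraries often use smoothed or normalized variants.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "sales rose in the north region",
    "sales fell in the south region",
    "the warehouse stores historical sales data",
]
scores = tf_idf(docs)
```

Under these assumptions, a word that appears in every document (such as "sales") gets a weight of zero, while a word unique to one document (such as "north") gets the highest weight in that document.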

29
Q

What is NER?

A

Named entity recognition

Recognizes nouns and could be used to extract persons, organizations, locations, dates, monetary amounts

30
Q

What is topic modeling?

A

Identifies dominant themes in a vast array of documents

31
Q

What is Latent Dirichlet Allocation?

A

A topic modeling technique in which words are automatically clustered into topics and each document is treated as a mixture of topics

32
Q

What is probabilistic latent semantic indexing?

A

A topic modeling technique that models word-document co-occurrence using probability

33
Q

What is event extraction?

A

A step further than NER, and harder
Looks at the relationships between nouns
Looks at the kinds of inferences that can be made from incidents in the text

34
Q

What is the text mining process?

A
  1. Establish the Corpus: Collect & Organize the Domain Specific Unstructured Data
  2. Create the Term-Document Matrix: Introduce the structure to the Corpus
  3. Extract Knowledge: Discover Novel Patterns from the T-D Matrix
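
The three steps above can be sketched on a toy scale as follows. The corpus is made up, and "knowledge extraction" is reduced here to finding terms that recur across documents; real projects would apply classification, clustering, or similar methods to the matrix.

```python
from collections import Counter

# Step 1: establish the corpus (a tiny, made-up collection of reviews).
corpus = [
    "late delivery and damaged box",
    "fast delivery great price",
    "damaged item late refund",
]

# Step 2: create the term-document matrix (one row per term, one column
# per document, each cell holding the term's count in that document).
counts = [Counter(doc.split()) for doc in corpus]
terms = sorted(set(term for c in counts for term in c))
td_matrix = {term: [c[term] for c in counts] for term in terms}

# Step 3: extract knowledge; here, the terms found in two or more documents.
recurring = [
    term for term, row in td_matrix.items()
    if sum(1 for n in row if n > 0) >= 2
]
print(recurring)
```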
35
Q

What is Web Usage Mining?

A

The extraction of information from data generated through web page visits and transactions

36
Q

What is the goal of sentiment analysis?

A

To determine how people feel about a certain topic

37
Q

What are the characteristics of Big Data?

A
Volume
Variety
Velocity
Variability
Veracity
Value
38
Q

What is Hadoop?

A

An open source framework for storing and analyzing massive amounts of distributed, unstructured data

39
Q

What are the Big Data core technologies?

A

MapReduce + Hadoop

40
Q

What is MapReduce?

A

A programming model that distributes processing of very large multi-structured data files across a large cluster of ordinary machines/processors. Developed and popularized by Google.
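
A single-process sketch of the MapReduce idea, using the classic word-count example. The function names `map_phase`, `shuffle`, and `reduce_phase` are illustrative, not part of any framework's API; a real cluster would run the map and reduce calls in parallel on different machines.

```python
from collections import defaultdict


def map_phase(chunk):
    # Emit a (key, value) pair for every word in one input chunk.
    return [(word, 1) for word in chunk.split()]


def shuffle(pairs):
    # Group all values by key, as a framework would between the two phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)


chunks = ["big data big cluster", "data cluster data"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(word_counts)
```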

41
Q

What are data mining characteristics?

A
  1. Source of data for DM is often a consolidated data warehouse
  2. DM environment is usually a client-server or a web-based information systems architecture
  3. Data for DM can be structured, semi-structured, or unstructured
  4. The miner is often an end user
42
Q

What are the most common standard processes for data mining?

A
  1. CRISP-DM
  2. SEMMA

43
Q

What is CRISP-DM?

A

Cross-Industry Standard Process for Data Mining

44
Q

What is SEMMA?

A

Sample, Explore, Modify, Model, and Assess

45
Q

What are the phases for data mining?

A
  1. Define the problem
  2. Identify required data
  3. Prepare and pre-process
  4. Model the data
  5. Train and test
  6. Verify and deploy
46
Q

What is simple split?

A

Splitting the data into two mutually exclusive sets: training (~70%) and testing (~30%)
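
A minimal sketch of a simple split using only the standard library. The function name `simple_split`, the fixed seed, and the 100 placeholder records are assumptions for illustration.

```python
import random


def simple_split(records, train_fraction=0.7, seed=42):
    """Shuffle the records and cut them into two non-overlapping sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train_set, test_set = simple_split(range(100))
print(len(train_set), len(test_set))
```

Shuffling before cutting matters when the records are stored in a meaningful order (for example by date), since an unshuffled cut would put only the most recent rows in the test set.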

47
Q

What are the data mining methods?

A
  1. Classification
  2. Regression
  3. Clustering
  4. Association Rule Mining
48
Q

What are the types of Business Reporting?

A
  1. Metric Management Reports
  2. Visualizations/Dashboard
  3. Balanced Scorecard
49
Q

What does a balanced scorecard do?

A

Translates an organization’s financial, customer, internal process, and learning and growth objectives into a set of actionable initiatives.

50
Q

What is the definition of data?

A

A collection of facts usually obtained as a result of experiences, observations, or experiments

51
Q

What are inferential statistics?

A

Drawing inferences about the population based on sample data

52
Q

What is a histogram?

A

A frequency chart
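
As a small illustration, the sketch below counts how many values fall into each fixed-width bin and prints a text version of the chart. The `histogram` helper, the bin width, and the ages are made up for the example.

```python
from collections import Counter


def histogram(values, bin_width):
    # Map each value to the lower edge of its bin, then count per bin.
    return Counter((v // bin_width) * bin_width for v in values)


ages = [21, 25, 29, 33, 35, 38, 41, 47]
bins = histogram(ages, bin_width=10)
for lower_edge in sorted(bins):
    # One '#' per observation in the bin.
    print(f"{lower_edge}-{lower_edge + 9}: {'#' * bins[lower_edge]}")
```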

53
Q

What is Kurtosis?

A

A measure of how peaked (tall and skinny) a distribution is

54
Q

What is skewness?

A

Measure of asymmetry
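
A minimal sketch of skewness and kurtosis, assuming the population (biased) moment-based definitions: skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 - 3, where mk is the k-th central moment. Statistical packages may apply sample-size corrections, so their results can differ slightly. The sample lists are made up.

```python
def central_moment(values, k):
    # Average of the k-th power of deviations from the mean.
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    # Subtracting 3 makes a normal distribution score zero.
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 10]
print(skewness(symmetric), skewness(right_tailed))
print(excess_kurtosis(symmetric))
```

The symmetric list has zero skewness, while the list with one large value has positive skewness (a long right tail).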

55
Q

What is regression?

A

A part of inferential statistics, used to characterize the relationship between explanatory (input) and response (output) variables
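
A minimal sketch of simple linear regression with one explanatory variable, fitted by ordinary least squares. The `fit_line` helper and the ad-spend and revenue figures are made up for illustration.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Covariance-like and variance-like sums of deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


ad_spend = [1, 2, 3, 4, 5]    # explanatory (input) variable
revenue = [3, 5, 7, 9, 11]    # response (output) variable
slope, intercept = fit_line(ad_spend, revenue)
print(slope, intercept)
```

Here the data lie exactly on a line, so the fit recovers a slope of 2 and an intercept of 1; real data would leave residual error around the fitted line.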