Quiz 1 Prep Flashcards

1
Q

Having more information than we can turn into knowledge; much of it is superfluous.

A

Information Overload

2
Q

Examples of __________
o Abundant information about the 2016 presidential election, yet very little knowledge about what led to this outcome!
o Daily information about the wars, yet very little knowledge about the underlying causes!

A

Information Overload

3
Q

Causes of poor data quality (4)

A
  • Platform Availability
  • Formality
  • Cost
  • Competitive Advantage
4
Q

Examples of platforms in data (4)

A

patient portals, social media platforms, online forums, e-commerce rating/review platforms

5
Q

Aspects of platform availability (2)

A

empowered users and automated processes

6
Q

Information provides firms with a _________.

A

competitive advantage

7
Q

Acquiring high quality data is _________.

A

expensive

8
Q

Abundant data is generally cheap but _________.

A

low quality

9
Q

Today’s data characteristics (2)

A

Volume and variety

10
Q

Information overload can lead to ___________.

A

poor decision making

11
Q

Characteristics of information overload (3)

A
  • Complexity
  • Substitution
  • Attention Deficit
12
Q

Substituting low-quality data for high-quality data.

A

Information overload - substitution

13
Q

Example of information overload - substitution

A

Tinder

14
Q

Characteristics of Information Overload: _______
• The illusion of multi-tasking
• When we are exposed to a lot of data, our cognitive power is reduced
• Distraction (Low cognitive power)

A

Attention Deficit

15
Q

Characteristics of Information Overload: ______
• Time needed to consider all offered options (millions)
• Fear to miss important data needed for decision making (FOMO)

A

Complexity

16
Q

Example of information overload: ______

buying a house

A

complexity

17
Q

Characteristics of Paradigm Shift (4)

A
  • Get expert advice
  • Get information from a trusted source
  • Get wisdom of the crowds
  • Get information from a connected source
18
Q

Three characteristics of analytics

A
  1. Recommender Systems – Classification & Prediction
  2. Pattern Recognition
  3. Anomaly Detection algorithms
19
Q

Most companies have plenty of ____, but not enough _______.

A

data; knowledge

20
Q

Companies collect data about customers, products, sales but still lack knowledge to (3):

A
  • Identify products, customers and sales channels that return the highest profit margins.
  • Forecast variations in buying patterns across different types of customers.
  • Predict customer churn
21
Q

Organizations are being compelled to capture, understand, and harness their _____ to support _________ in order to improve __________.

A

data; decision making; business operations

22
Q

The extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions.

A

Business Analytics

23
Q

Two Consequences of the Information Age

A
  1. Every business process generates data.
  2. Every business needs analytics to remain competitive.

24
Q

Idiosyncrasies of Business Analytics (3)

A
  1. The Data
  2. The Users and Sponsors
  3. The Methodology
25
Q

Intelligent use of _______ results in the following:
• better understanding of how technological, economic, and marketplace shifts affect business performance
• ability to consistently and reliably distinguish between effective and ineffective interventions
• efficient use of assets, reduced waste in supplies, and better management of time and resources
• risk reduction via measurable outcomes and reproducible findings
• early detection of market trends hidden in massive data
• continuous improvement in decision making over time

A

analytics

26
Q

A lot of people confuse analytics with _______.

A

simple reporting

27
Q

Examples of proactive analytical investigation (5)

A
  • inferential statistics
  • experimentation
  • empirical validation
  • forecasting
  • optimization
28
Q

____________ answers questions such as:
• What does a change in the market mean for my targets?
• What do other factors tell me about what I can expect from my target?
• What is the best combination of factors to give me the most efficient use of resources and maximum profitability?
• What is the highest price the market will tolerate?
• What will happen in six months if I do nothing? What if I implement an alternative strategy?

A

Proactive analytical investigation

29
Q

Business Applications for Data Analytics (7)

A
  • Churn analysis
  • Cross-selling
  • Fraud detection
  • Risk management
  • Customer segmentation
  • Targeted ads
  • Sales forecast
30
Q

Data Mining Tasks (6)

A
  • Classification
  • Association
  • Regression
  • Forecasting
  • Sequence Analysis
  • Deviation Analysis
31
Q

________ is an iterative process.

A

Knowledge discovery

32
Q

the core of the knowledge discovery process.

A

Data mining

33
Q

The Knowledge Discovery Process (KDD) (8 Steps)

A
  1. Data Collection
  2. Data Cleaning and Transformation
  3. Model Building
  4. Model Assessment
  5. Reporting
  6. Prediction (Scoring)
  7. Application Integration
  8. Model Management
34
Q

Two types of reports:

A
  • Findings
  • Prediction or forecast

35
Q

How can we use data mining models? (3)

A
  • Insight
  • Prediction
  • Description
36
Q

Assign items to a discrete class based on training data.

A

Classification

37
Q

Data Mining Approaches (2)

A
  • Supervised Learning (Estimation and Classification)
  • Unsupervised Learning (Clustering)

38
Q
o	Step 1: The training data contains target class information for each record.
o	Step 2: New records are classified based on the models developed on the training data.
A

Steps of supervised learning
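
The two steps above can be sketched in a few lines of Python. This is only an illustration: the nearest-centroid rule is an assumption chosen for brevity (the cards do not name a specific algorithm), and the data and the function names `train` and `classify` are hypothetical.

```python
from collections import defaultdict

def train(records):
    """Step 1: learn one centroid per target class from labeled records."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (x, y), label in records:
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1
    return {label: (s[0] / counts[label], s[1] / counts[label])
            for label, s in sums.items()}

def classify(model, point):
    """Step 2: assign a new record to the class with the closest centroid."""
    px, py = point
    return min(model, key=lambda c: (model[c][0] - px) ** 2 + (model[c][1] - py) ** 2)

# Hypothetical training data: (age, income in $1000s) with a known target class.
training = [((25, 30), "churn"), ((30, 35), "churn"),
            ((50, 90), "stay"), ((55, 100), "stay")]
model = train(training)
print(classify(model, (28, 32)))
```

The key point is that the class labels are present in the training data; the model never sees a label for the new record.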

39
Q

Systems designed to generate personalized recommendations to users for products and services.

A

Recommender Systems

40
Q

CRM

A

Customer relationship management

41
Q

How can large firms with millions of customers know customers individually?

A

Through analytical CRM - scoring and prediction

42
Q

Recommender Systems – Design 1 (2)

A

● Most Popular Recommendations

● Contextual Recommendations

43
Q

Recommendations based on transactions made by the entire population.

A

Most Popular Recommendations
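
A minimal sketch of this idea, assuming transactions are available as simple lists of item names (the data and the function name `most_popular` are hypothetical):

```python
from collections import Counter

def most_popular(transactions, n=2):
    """Rank items by how many transactions across the whole population contain them."""
    counts = Counter(item for basket in transactions for item in set(basket))
    # Sort by count (descending), then by name, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]

transactions = [["ink", "paper"], ["ink", "pens"], ["ink", "paper", "stapler"]]
top_items = most_popular(transactions)
print(top_items)
```

Every user receives the same list, which is what separates this design from personalized recommendations.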

44
Q

Newspapers recommend top stories

A

Example of Most Popular Recommendations

45
Q

Types of Contextual Recommendations

A
  • Time-Based
  • Location-Based

46
Q

Staples.com: Time to re-order ink

A

Example of time-based recommendation

47
Q

At the mall, could see ads/coupons of the store closest to you

A

Example of location-based recommendation

48
Q

Recommender Systems – Design 2 (2)

A
  • Personalized Recommendations
  • Based on transaction history

49
Q

Types of recommendations based on transaction history

A
  • Content-based
  • Collaborative-filtering
  • Social marketing
50
Q

• Recommend similar items
• Domain specific OR problem specific
• “People who have bought this, also bought that”

A

Examples of content-based (market-basket) analysis
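
The "also bought" idea can be illustrated by counting which items appear together in the same basket. This is a rough sketch with hypothetical data and function names; it uses raw co-occurrence counts only, with none of the support or confidence thresholds a fuller market-basket analysis would apply.

```python
from collections import Counter
from itertools import combinations

def co_purchases(baskets):
    """Count how often each pair of items appears in the same basket."""
    pairs = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            pairs[(a, b)] += 1
    return pairs

def also_bought(pairs, item):
    """Return items bought together with `item`, most frequent first."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]

baskets = [["bread", "butter"], ["bread", "butter", "jam"], ["bread", "milk"]]
print(also_bought(co_purchases(baskets), "bread"))
```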

51
Q

• “Customers like you purchased these products”
• Examples: Amazon, Netflix

A

Examples of collaborative-filtering
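
One way to picture "customers like you" is to find the most similar other customer and suggest what they bought. The sketch below assumes Jaccard similarity over purchase sets, which is an illustrative choice rather than what any particular retailer uses; the customers and the function names are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two purchase sets: shared items divided by all items."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(purchases, user):
    """Suggest items bought by the most similar other customer but not yet by `user`."""
    mine = purchases[user]
    others = [u for u in purchases if u != user]
    # The name is a tie-breaker so the result is deterministic.
    nearest = max(others, key=lambda u: (jaccard(mine, purchases[u]), u))
    return sorted(purchases[nearest] - mine)

purchases = {
    "ana": {"book", "lamp", "desk"},
    "ben": {"book", "lamp", "chair"},
    "cy": {"tent", "stove"},
}
print(recommend(purchases, "ana"))
```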

52
Q

the nontrivial extraction of implicit, previously unknown, and potentially useful information from data.

A

Knowledge discovery

53
Q

_______ usually takes the most effort.

A

Data preparation

54
Q

A lot of people underestimate the efforts needed for ________.

A

pre-processing

55
Q

_________ & _________ are very important and take a lot of time and effort

A

Data collection; pre-processing

56
Q

Type of learning that includes a target variable or label.

A

Supervised learning

57
Q

Type of learning that does not include a target variable or label.

A

Unsupervised learning

58
Q
  • The classification of the training data is unknown.
  • The aim is to construct a set of clusters, given the data.

A

Unsupervised learning (clustering)
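
A small sketch of clustering without labels, assuming a one-dimensional k-means with fixed starting centroids so the result is reproducible (the algorithm choice, the data, and the name `k_means_1d` are assumptions for illustration):

```python
def k_means_1d(values, centroids, iterations=10):
    """Group unlabeled values around the nearest of the given starting centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters, centroids

spend = [10, 12, 11, 95, 100, 105]   # hypothetical monthly spend, no class labels
clusters, centers = k_means_1d(spend, centroids=[0.0, 50.0])
print(clusters, centers)
```

No target class is supplied anywhere; the two groups emerge from the data alone.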

59
Q

Review all data available for data mining.

Assess and explore data.

A

Data Understanding

60
Q

What are my available data sources? (4)

A
  • Corporate data sources
  • External data
  • Free external sources (census bureau)
  • Paid External Sources (Syndicated Databases)
61
Q

Characteristics of data selection (4)

A
  • Find available data sources
  • Find metadata
  • Clearly identify business objective prior to data selection
  • Avoid data sets with averages
62
Q

Data types (3)

A
  • Numeric or continuous data
  • Symbolic data
  • String data
63
Q

A data type used in programming, like an integer or floating-point type, but used to represent text rather than numbers. It consists of a sequence of characters that can also include spaces and digits.

A

String data

64
Q

Examples of numeric or continuous data

A
  • Integer: age, income
  • Real: claim amount

65
Q

Examples of _______
• Flag / dichotomy / binary variable: only has two categorical values (Yes/No, True/False, Vote/no Vote, Response/No Response)
• Categorical variable: has more than two categorical values
  o Unordered: Region, Plan type, Product code
  o Ordered: satisfaction rating (very satisfied to not very satisfied)

A

symbolic data

66
Q

Examples of ___________
• Frequency Distributions, Pie Charts, Bar Charts help identify potential problems (for example we have no Hispanics in our dataset)
• Use statistical techniques for continuous variables, for example scatter diagrams (age=200)

A

Analyzing Data Distribution
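
Both checks can be sketched with the standard library: a frequency distribution that reveals a category with no records, and a range check that flags an implausible continuous value such as age=200. The data, the expected categories, and the bounds are hypothetical.

```python
from collections import Counter

def frequency_distribution(values):
    """Count how often each category occurs."""
    return Counter(values)

def out_of_range(values, low, high):
    """Return values outside the plausible [low, high] range."""
    return [v for v in values if not low <= v <= high]

regions = ["east", "west", "east", "east"]   # hypothetical records
ages = [34, 51, 200, 28]

expected = {"east", "west", "north", "south"}
missing_categories = sorted(expected - set(frequency_distribution(regions)))
print(missing_categories)          # categories with no records at all
print(out_of_range(ages, 0, 120))  # implausible ages
```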

67
Q

Gain insight into data.

A

Data exploration

68
Q
  • Report key facts on historical data
  • Aid in understanding the data

A

Pros of Summary Statistics

69
Q
  • Find simple relationship
  • Statistics value can be deceiving

A

Cons of Summary Statistics
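
A quick illustration of how a summary statistic can deceive, using hypothetical incomes with one extreme value:

```python
from statistics import mean, median

incomes = [40, 42, 45, 47, 1000]   # hypothetical, in $1000s; one extreme value
average = mean(incomes)
typical = median(incomes)
print(average)   # pulled far upward by the single outlier
print(typical)   # closer to what most records look like
```

Reported alone, the mean suggests a much wealthier group than four of the five records support.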

70
Q

To increase the accuracy of data mining, we have to perform _____________.

A

data preprocessing

71
Q

Real-world data are (3):

A
  • Incomplete
  • Noisy
  • Inconsistent
72
Q

Data Quality Problems (11)

A
  • Missing data
  • Data out of range
  • Duplicate data
  • Invalid data
  • Bad format data
  • Out of sequence data
  • Mixed data
  • Unformatted data
  • Incomplete data
  • Truncated data
  • Transposition errors
73
Q

multiple observations for the same occurrence.

A

duplicate data
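
A minimal sketch of removing duplicates, assuming each observation is a tuple and that identical tuples describe the same occurrence (the order records and the name `drop_duplicates` are hypothetical):

```python
def drop_duplicates(records):
    """Keep the first observation of each occurrence, preserving order."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique

orders = [("A100", "2021-03-05"), ("A101", "2021-03-05"), ("A100", "2021-03-05")]
unique_orders = drop_duplicates(orders)
print(unique_orders)
```

Real duplicates are often near-matches rather than exact copies, which this exact-match sketch would not catch.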

74
Q

data that is just plain wrong, incorrect.

A

invalid data

75
Q

fields that should be formatted in a particular manner are not. For example, date fields that should be in ddmmyy format are recorded as ddmmmyyyy.

A

bad format data
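
The date example can be repaired by trying each known format in turn. This sketch assumes only the two formats named on the card and English month abbreviations (`%b` depends on the locale); the function name `normalize_date` is hypothetical.

```python
from datetime import datetime

def normalize_date(text):
    """Return the date as ddmmyy, accepting ddmmyy or ddmmmyyyy input."""
    for fmt in ("%d%m%y", "%d%b%Y"):
        try:
            return datetime.strptime(text, fmt).strftime("%d%m%y")
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date format: {text!r}")

print(normalize_date("05Mar2021"))
print(normalize_date("050321"))
```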

76
Q

where data should be in a particular sequence, such as account number order, but the data is unsorted or incorrectly sorted.

A

out of sequence data

77
Q

in some cases differing types of data or even multiple data files are intermixed.

A

mixed data

78
Q

data that should have been recorded in a particular format, such as an address where the street name and number goes in one area, the city goes in another, the country in still another and the post or zip code in yet another. If all the information is combined in one field, this makes processing difficult.

A

unformatted data

79
Q

files or records within a file that have missing fields or periods of coverage.

A

incomplete data

80
Q

part of the record or file has been cut off or truncated. For example, data recorded from 08:00-09:00 only containing data recorded up to 08:45 due to lack of space to store all of the information.

A

Truncated data

81
Q

when data is copied from a source, some of it is copied incorrectly.

A

transposition errors

82
Q

The value ranges of attributes (features) differ, so one feature might overpower another.

A

Data Normalization
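
As a sketch, min-max scaling is one common way to put features on the same range; the cards do not say which normalization method the course uses, so treat the choice (and the sample values) as an assumption.

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # a constant column carries no spread
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, 40, 60]
incomes = [30000, 60000, 90000]
print(min_max_scale(ages))
print(min_max_scale(incomes))
```

After scaling, age and income vary over the same range, so income no longer dominates a distance calculation just because its raw numbers are larger.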

83
Q

Characteristics of Transforming Data (4)

A
  • Consolidate Data
  • Convert formats to fit algorithm
  • Data normalization
  • Scaling data values
84
Q

Preparing your data (3)

A

o Manage missing values
o Handle extreme or unusual values
o Use nonnumeric inputs
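
Each of the three tasks above can be sketched with a small helper. The specific choices here (median imputation, clipping to fixed bounds, one-hot encoding) are common options picked for illustration, not methods stated on the card; the data and function names are hypothetical.

```python
from statistics import median

def fill_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def clip(values, low, high):
    """Cap extreme values at the given bounds."""
    return [min(max(v, low), high) for v in values]

def one_hot(values):
    """Turn a nonnumeric column into 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

ages = fill_missing([25, None, 40, 200])
clipped_ages = clip(ages, 0, 100)
plan_flags = one_hot(["gold", "basic", "gold"])   # columns: basic, gold
print(clipped_ages)
print(plan_flags)
```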

85
Q

Select a subset of records from data set based on selection criteria

A

Sampling and selecting data

86
Q

Sampling methods

A
  • Simple random sample
  • Stratified random sample
  • Over sample
87
Q

Sampling by segment

A

Stratified random sample
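
A minimal sketch of sampling by segment, assuming the same fraction is drawn from every segment and at least one record is kept per segment (both are assumptions; the data and the name `stratified_sample` are hypothetical):

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction of records at random from every segment."""
    rng = random.Random(seed)   # fixed seed so the sketch is repeatable
    segments = defaultdict(list)
    for record in records:
        segments[key(record)].append(record)
    sample = []
    for name in sorted(segments):
        group = segments[name]
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

customers = [("east", i) for i in range(8)] + [("west", i) for i in range(2)]
sample = stratified_sample(customers, key=lambda r: r[0], fraction=0.5)
print(sample)
```

Unlike a simple random sample, the small "west" segment is guaranteed to be represented.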

88
Q

Characteristics of __________
• Remove irrelevant, weakly relevant, and redundant attributes
• Attribute selection
• Often little degradation in predictive performance or even better performance

A

Feature selection
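
As one very simple instance of removing irrelevant attributes, the sketch below drops columns whose values never vary. This covers only the most obvious case; the cards do not specify a selection method, and the table and the name `drop_constant_columns` are hypothetical.

```python
from statistics import pvariance

def drop_constant_columns(table):
    """Keep only columns whose values vary; a constant column cannot help prediction."""
    return {name: col for name, col in table.items() if pvariance(col) > 0}

table = {"age": [25, 40, 31], "country_code": [1, 1, 1], "income": [30, 90, 55]}
selected = list(drop_constant_columns(table))
print(selected)
```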

89
Q

In many cases the information that is lost by ___________ is made up for by a more accurate ___________ in the lower-dimensional space

A

discarding variables; mapping/sampling