Quiz 1 Flashcards

1
Q

Nature of Big Data

A

Volume, Velocity, & Variety

  • Volume: Large-capacity data storage is not only a data integration problem but also a critical challenge for analysis.
  • Velocity: Data’s latency, availability, and liquidity become critical; velocity indicates the speed of data changes as well as the need for timely data access and processing.
  • Variety: Firms have more new data for analysis, such as social media, mobile data, various databases that store hierarchical data, text records, e-mail, metering data, video, images, audio, stock ticker data, and financial transactions.
2
Q

Digital curation

A

Digital curation generally refers to the process of establishing and developing long-term repositories of digital assets for current and future reference by researchers, scientists, historians, and scholars.

3
Q

Role of IT in Organizations

A

Support
Enhance Effectiveness & Efficiency
Value Creation

4
Q

Data Pyramid

A

Bottom up, Data -> Information -> Knowledge

5
Q

Organization and Information Systems

A

A growing interdependence between a firm’s information systems and its business capabilities and operations:

Changes in strategy, rules, and business processes increasingly require changes in hardware, software, databases, and telecommunications; often, what an organization can do depends on what its information systems will permit it to do.

6
Q

IT Empowers…

A

Suppliers, Firm, Customers

7
Q

Identifying Valuable Customers: RFM Analysis

A

Recency, Frequency, & Monetary

“Valuable customers” are more important to a firm.
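As a rough illustration of the three RFM measures, the sketch below computes recency (days since the last purchase), frequency (number of purchases), and monetary value (total spend) per customer. The function name `rfm_scores`, the tuple layout of a transaction, and the sample data are all assumptions made for this example; the card itself does not specify how the measures are computed or scored.

```python
from datetime import date


def rfm_scores(transactions, today):
    """Return {customer: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, purchase_date, amount) tuples
    (a hypothetical layout chosen for this sketch).
    """
    summary = {}
    for customer, purchased_on, amount in transactions:
        last, count, total = summary.get(customer, (None, 0, 0.0))
        # Track the most recent purchase date seen so far.
        if last is None or purchased_on > last:
            last = purchased_on
        summary[customer] = (last, count + 1, total + amount)
    return {
        customer: ((today - last).days, count, total)
        for customer, (last, count, total) in summary.items()
    }


# Hypothetical sample data, for illustration only.
sample = [
    ("alice", date(2024, 1, 5), 40.0),
    ("alice", date(2024, 3, 1), 60.0),
    ("bob", date(2023, 11, 20), 15.0),
]
scores = rfm_scores(sample, today=date(2024, 3, 11))
```

Under this sketch, a customer with low recency, high frequency, and high monetary value would be a candidate "valuable customer"; how the three numbers are combined into a ranking is left open here.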

8
Q

Identifying Valuable Customers: Clustering Analysis in N-Dimensional Space

A
Demographic analysis
• Customer profiling
• Customer behavior analysis
->
Target marketing
• Medium choice
• Channel design
->
Facilitating transactions
• Enhanced CRM
->
Computational algorithm for clustering analysis
9
Q

Data-Driven CRM: An Integrated Approach

A
Customer Life Cycle: Acquire -> Enhance -> Retain
Partial Functional Solutions: Direct marketing; Cross-sell and up-sell; Proactive services; Sales force automation; Customer support
Complete Integrated Solution: Integrated CRM System and Applications
Cross-functional processes break down functional silos!
10
Q

Data and Firm Performance

A
Data (Information) Visibility 
->
Data (Information) Accessibility
->
Data (Information) Analytics Capability
->
Information Velocity
11
Q

Data quality:

A

The totality of features and characteristics of data that bear on their ability to satisfy the given business purposes.

12
Q

Data Quality: Common Dimensions/Measures

A
  • Accuracy
  • Completeness
  • Consistency
  • Timeliness
  • Accessibility
  • Believability
  • Interpretability
13
Q

Data Governance

A

Data governance establishes a formal structure and processes by which an organization manages all important issues surrounding data, including data quality as measured by accuracy, completeness, consistency, security, and availability.

14
Q

Data Mining: A General Process

A
Selection
->
Preprocessing
->
Transformation
->
Mining
->
Interpretation/Evaluation
15
Q

System Development Projects: Key Deliverables

A
  • Fully tested, operational system that meets business requirements, such as management reports, user queries, and business analyses.
  • Documentation of essential system designs and the data dictionary.
  • User manuals; e.g., important management reports and common user queries.
  • Documentation describing how the system relates to other relevant systems and how they can be integrated.
  • Key contacts for system operations support and enhancement.
16
Q

Business-Driven Analysis of Technology Needs

A

Business Objectives
Business Strategies, Activities, and Operations
Information Requirements
Core Systems Functionalities
Information Systems need prioritization and evaluation

17
Q

Key Challenges in IT Management

A
  • A young technology (discipline).
  • Rapid technological advancements and changes.
  • Deep penetration in all aspects of organizations.
  • Widening business-technology gap.
  • Increasing specialization and sub-specialization.
  • Shifting focuses, disruptive technology.
18
Q

IT and Modern Business Organizations: Summary

A

• IT greatly empowers the firm, its employees, and its managers!
• Increasingly, IT deeply penetrates all aspects of an organization, which now cannot operate or compete without IT.
• Effective investment in, utilization of, and management of IT determine firm performance and competitiveness.
• IT management is a management challenge, not a pure technology problem!
• IT enhances a firm’s business value and decision making.
• IT supports and enables self-service models.
• Business managers must recognize and address the business-technology gap in their organization; their commitment and passion are critical to realizing the full business value of IT.

20
Q

Business Intelligence

A

Business intelligence (BI) refers to the use of technology and statistical techniques to gather and analyze large amounts of data to support business decision making; i.e., discovering important patterns and phenomena for interpretation and business actions.

21
Q

Data mining:

A

A common technological/computational approach to extract business intelligence from vast amounts of (high-quality) data.

22
Q

Classification

A

Classification recognizes patterns that describe the group to which an item belongs by examining existing items that have been classified and by inferring a set of rules.

23
Q

Clustering

A

Clustering works in a manner similar to classification, but it is used when no groups have yet been defined.

24
Q

Association pattern/rule analysis

A

Association pattern/rule analysis (market basket analysis) discovers interesting co-occurrence of items from a set of transactions, each of which contains a collection of items.
• Analysis of retail transactions (e.g., items purchased in a transaction) helps vendors identify which products customers are likely to purchase together.

25
Q

Support of X→Y

A

Support of X→Y is obtained by dividing the number of transactions that contain both X and Y (X∩Y) by the total number of transactions in the data.

26
Q

Confidence of X→Y

A

Confidence of X→Y is obtained by dividing the number of transactions that contain both X and Y (X∩Y) by the number of transactions that contain X.
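The support and confidence of a rule X→Y can be sketched in a few lines. The function names, the representation of each transaction as a Python set, and the `baskets` sample are assumptions made for this illustration. Returning 0.0 when no transaction contains X is also just a convention chosen here.

```python
def support(transactions, x, y):
    """Share of all transactions that contain every item of X and of Y."""
    needed = set(x) | set(y)
    hits = sum(1 for t in transactions if needed <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Share of the transactions containing X that also contain Y."""
    x_items = set(x)
    needed = x_items | set(y)
    with_x = sum(1 for t in transactions if x_items <= set(t))
    with_both = sum(1 for t in transactions if needed <= set(t))
    # Assumption: report 0.0 when X never occurs, to avoid dividing by zero.
    return with_both / with_x if with_x else 0.0


# Hypothetical market baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support(baskets, {"bread"}, {"milk"})        # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of 3 bread baskets
```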

27
Q

Clustering analysis:

A

A process to partition (segment) a group of objects into multiple distinct clusters (subgroups) in an N-dimensional space, such that all the members in one cluster are similar to each other and distinctively different from the members of any other cluster, according to a particular similarity measure/metric (similarity being the opposite of the distance between objects).

28
Q

Clustering Analysis for Market Segmentation: Basic Steps

A

  1. Formulate the segmentation problem and select the variables that we want to use as the basis for clustering.
  2. Compute the distance between customers along the selected variables.
  3. Apply the clustering procedure to the chosen distance measure.
  4. Decide the number of clusters.
  5. Map and interpret clusters and draw conclusions; e.g., illustrative techniques include perceptual maps, which help a firm interpret the resulting clusters.
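The distance-computation step can be sketched as a pairwise distance matrix. This example assumes Euclidean distance and interval variables that are already on comparable scales; the card does not name a specific distance measure. The function name `distance_matrix` and the sample customers are hypothetical.

```python
import math


def distance_matrix(customers):
    """Pairwise Euclidean distances between customer vectors."""
    n = len(customers)
    return [
        [math.dist(customers[i], customers[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical customers described by two selected variables.
customers = [(1.0, 2.0), (4.0, 6.0), (1.0, 2.0)]
distances = distance_matrix(customers)
```

In practice, variables measured in very different units are usually rescaled first, so that no single variable dominates the distance.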
29
Q

Interval variables:

A

An interval variable contains continuous measurements that follow a linear scale (e.g., height, weight, temperature, cost, etc.); it is required that intervals keep the same importance throughout the scale.

30
Q

Ordinal variables:

A

An ordinal variable takes on more than two states; for example, you may ask someone to indicate his or her liking of a product according to the following categories: 1=detest, 2=dislike, 3=indifferent, 4=like and 5=admire; in an ordinal variable, the different states are ordered in a meaningful sequence but the interval between any two consecutive states may not be equally distanced.

31
Q

Nominal Variables:

A

A nominal variable can take on more than two states; for example, the eye color of a person can be blue, brown, green, or grey; these states may be coded as 1, 2, …, M, but their order and the interval between any two states do not have any semantic meaning.

32
Q

Binary Variables:

A

Binary variables have only two possible states; for example, a person’s gender is either female or male; a binary variable can be viewed as a special case of a nominal variable.

33
Q

k-Means Clustering: Example

A

The centroid of a cluster is the average of all the data points in that cluster.

34
Q

K-Means clustering: General Process

A
  1. Choose the number of clusters, k.
  2. Generate k random points as cluster centroids.
  3. Assign each point to the nearest cluster centroid.
  4. Recompute each cluster’s centroid.
  5. Repeat steps 3-4 until some convergence criterion is satisfied.
    A typical convergence criterion is that the assignment of customers to clusters has not changed over multiple iterations.
35
Q

k-Means Clustering Algorithm

A

Suppose that n objects described by the attribute vectors {x1, x2, …, xn} are to be partitioned into k clusters, where k < n. Let mi be the mean of the vectors in cluster i. Object y is assigned to cluster i if the distance between y and mi is the minimum over all cluster means.
Randomly initialize the means m1, m2, …, mk
Repeat
    Use the means to classify all objects into clusters
    For i = 1 to k
        Replace mi with the mean of all objects in cluster i
    End-for
Until there is no change in any mean
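A minimal Python sketch of this loop follows. It makes several assumptions not stated on the card: squared Euclidean distance, initial means drawn from the data points themselves, a fixed `seed` for repeatability, a `max_iter` safety cap, and keeping the old mean when a cluster ends up empty. All function and variable names are hypothetical.

```python
import random


def mean(points):
    """Component-wise average of a non-empty list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    means = rng.sample(points, k)  # assumption: start from k of the data points
    clusters = []
    for _ in range(max_iter):
        # Classify every object into the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, means[i]))
            clusters[nearest].append(p)
        # Replace each mean with the mean of its cluster (keep it if empty).
        new_means = [mean(c) if c else means[i] for i, c in enumerate(clusters)]
        if new_means == means:  # no change in any mean: converged
            break
        means = new_means
    return means, clusters


# Hypothetical 2-D data with two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, clusters = k_means(points, k=2)
```

Because the starting means are random, different seeds can in general lead to different final clusters; running the procedure several times and comparing results is a common precaution.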

36
Q

Classification Analysis

A
  • Classification is the process that establishes classes with attributes from a set of instances (called training examples). The class of an instance must be one from a finite set of pre-determined class values, while attributes of the instance are descriptors of the instance that are likely to affect its outcome class value.
  • Classification techniques: ID3 and its descendants (such as C4.5), CN2, AQ family, backpropagation neural network, etc.
37
Q

Neural Network Architecture

A
  • Processing units are grouped into linear arrays (layers).
  • A neural network always has an input layer and an output layer, and may or may not have “hidden” layers.
  • Each processing unit processes its inputs and produces a single output value; this processing is known as the unit’s activation function.
  • For an input node, the activation function simply passes its value to the output of the node.
  • For a non-input node, the activation function has two parts: a combination function and a transfer function.
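The two-part activation of a non-input node can be sketched as follows. The card does not say which combination or transfer function is used; this example assumes a weighted sum (plus a bias term) as the combination function and the logistic sigmoid as the transfer function, both common choices. All names are hypothetical.

```python
import math


def combination(inputs, weights, bias):
    """Combination function: weighted sum of the inputs plus a bias (assumed)."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def transfer(net):
    """Transfer function: logistic sigmoid (assumed), mapping to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def activation(inputs, weights, bias):
    """Activation of a non-input node: transfer applied to the combination."""
    return transfer(combination(inputs, weights, bias))


def input_activation(value):
    """Activation of an input node: pass the value through unchanged."""
    return value


# Hypothetical inputs and weights whose weighted sum is exactly zero.
output = activation([1.0, 2.0], [0.5, -0.25], bias=0.0)
```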
38
Q

Confusion Matrix

A

A confusion matrix shows the number of correct and incorrect predictions made by the classification model compared to the actual outcomes.

A confusion matrix is a table that categorizes predictions according to whether they match the actual value.
• The class of interest is known as the positive class.
• All others are known as negative.
• A spam classifier example:
• The positive class is spam, as this is the outcome we hope to detect.
• We can then imagine the confusion matrix as shown in the diagram to the right.

39
Q

True Positive, False Negative and Accuracy

A
• False Positive (FP): Incorrectly classified as the class of interest; a.k.a. error of commission, Type I error, or false alarm.
• False Negative (FN): Incorrectly classified as not the class of interest; a.k.a. error of omission, Type II error.
• Accuracy: Overall correctness of the model, calculated as the sum of correct classifications divided by the total number of classifications.
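To make the four outcomes concrete, the sketch below tallies TP, FP, TN, and FN from paired actual and predicted labels and then computes accuracy. The function names and the spam/"ham" (non-spam) sample labels are assumptions made for this illustration.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, TN, and FN for one class of interest."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct if actually positive, else a false alarm.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if actually positive, else correct.
            counts["FN" if a == positive else "TN"] += 1
    return counts


def accuracy(counts):
    """Correct classifications divided by all classifications."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


# Hypothetical labels; "ham" stands for non-spam messages.
actual = ["spam", "spam", "ham", "ham", "ham", "spam"]
predicted = ["spam", "ham", "ham", "spam", "ham", "spam"]
counts = confusion_counts(actual, predicted, positive="spam")
model_accuracy = accuracy(counts)
```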
40
Q

Precision and Recall

A

• Precision: Portion of instances predicted to be a class that actually belong to this class
• How many instances are predicted to be positive? Predicted Positive = TP + FP
• How many of the Predicted Positive instances actually belong to the positive class? True Positive
• Precision (positive) = TP / (TP + FP)
• Precision (negative) = TN / (TN + FN)
• Recall: The ability to correctly classify instances belonging to this class
• How many instances belong to the actual positive class? Actual Positive = TP + FN
• How many of the actual positive instances are classified correctly? TP
• Recall (positive) = TP / (TP + FN)
• Recall (negative) = TN / (TN + FP)
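The four formulas translate directly into code. In the sketch below, the function names, the dictionary keys, and the example counts are hypothetical. Returning 0.0 when a denominator is zero is an assumed convention, not something the card specifies.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 if the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def precision_recall(tp, fp, tn, fn):
    """Precision and recall for both the positive and the negative class."""
    return {
        "precision_positive": safe_div(tp, tp + fp),
        "precision_negative": safe_div(tn, tn + fn),
        "recall_positive": safe_div(tp, tp + fn),
        "recall_negative": safe_div(tn, tn + fp),
    }


# Hypothetical counts, for illustration only.
metrics = precision_recall(tp=40, fp=10, tn=45, fn=5)
```

With these counts, precision for the positive class is 40 / 50 and recall for the positive class is 40 / 45, which shows that the two measures can differ for the same model.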