Question 1

With either CRISP-DM or SEMMA, it is important to fully understand which of the following aspects before preparing the data and selecting analysis techniques:
- the surrounding socioeconomic climate
- business goals
- underlying issues for the business
- political implications

Accepted Answer

- the surrounding socioeconomic climate
- business goals
- underlying issues for the business

Question 2

What is a popular systematic approach to managing and conducting data mining projects?

Accepted Answer

Cross-Industry Standard Process for Data Mining (CRISP-DM) Methodology

Question 3

Why is the CRISP-DM methodology often preferred to other methodologies?

Accepted Answer

Because of its emphasis on business goals and objectives before the data are prepared and analysis techniques are chosen

Question 4

Data mining uses many kinds of computational algorithms to identify hidden patterns and relationships in data. For developing predictive models, one tends to employ ____ data mining techniques.

Accepted Answer

supervised

Question 5

What is the key distinction for supervised data mining techniques? When are they best used?

Accepted Answer

A target variable is identified. Best used for developing predictive models.

Question 6

What is the key distinction for unsupervised data mining techniques? When are they best used?

Accepted Answer

No target variable is identified. Effective for data exploration, dimension reduction, and pattern recognition.

Question 7

What are common applications of supervised data mining?

Accepted Answer

- classification model (target variable is categorical; predicts the class membership of new cases)
- prediction model (target variable is numerical; predicts the target value for a new case)

Question 8

What is the term used to describe computer systems that demonstrate human-like intelligence and cognitive functions, such as deduction, pattern recognition, and the interpretation of complex data?

Accepted Answer

artificial intelligence

Question 9

What are common applications of unsupervised data mining?

Accepted Answer

- dimension reduction (converting high-dimensional data into data with a smaller number of variables)
- pattern recognition (recognizing patterns in data using machine learning techniques)

Question 10

What are the 6 phases of the CRISP-DM methodology?

Accepted Answer

1. business understanding
2. data understanding
3. data preparation
4. modeling
5. evaluation
6. deployment

Question 11

____ measures gauge whether a group of observations are similar or dissimilar to one another

Accepted Answer

similarity

Question 12

What is one of the most widely used measures for evaluating similarity with numerical variables? It is defined as the length of a straight line between two observations.

Accepted Answer

Euclidean distance
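As a minimal sketch in Python (the helper name `euclidean_distance` and the sample observations are hypothetical), the distance between two observations with values x1..xn and y1..yn is the square root of the sum of squared differences, sqrt((x1 - y1)^2 + ... + (xn - yn)^2):

```python
import math

def euclidean_distance(a, b):
    """Hypothetical helper: length of the straight line between two
    observations, each given as a sequence of numerical variable values."""
    if len(a) != len(b):
        raise ValueError("observations must have the same number of variables")
    # Sum the squared differences variable by variable, then take the square root
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two made-up observations measured on three numerical variables
obs1 = [1.0, 2.0, 3.0]
obs2 = [4.0, 6.0, 3.0]
print(euclidean_distance(obs1, obs2))  # 5.0
```

Because the result depends on the units of each variable, numerical variables are commonly standardized before distances are computed.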

Question 13

Data ____ is the process of dividing a data set into a training, a validation, and, in some situations, an optional test data set.

Accepted Answer

partitioning

Question 14

What is the formula for the matching coefficient?

Accepted Answer

number of matching variables / total number of variables
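The matching coefficient is typically applied to categorical variables. A minimal sketch, using a hypothetical `matching_coefficient` helper and made-up customer records described by four categorical variables:

```python
def matching_coefficient(a, b):
    """Hypothetical helper: number of variables on which two observations
    match, divided by the total number of variables."""
    if len(a) != len(b) or not a:
        raise ValueError("observations must be non-empty and the same length")
    # Count the variables where both observations have the same category
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

# Made-up customers: gender, location, owns a car, marital status
cust1 = ["female", "urban", "yes", "married"]
cust2 = ["female", "rural", "yes", "single"]
print(matching_coefficient(cust1, cust2))  # 0.5
```

Here the customers match on two of four variables, so the coefficient is 2/4 = 0.5; a value of 1 means the observations match on every variable.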

Question 15

What are the 3 partitions created in data partitioning?

Accepted Answer

- training set
- validation set
- test set (optional)
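A minimal sketch of random data partitioning, assuming the data are a simple list of records. The function name `partition`, its parameters, and the 50/30/20 proportions are illustrative assumptions, not values prescribed by these cards:

```python
import random

def partition(records, train_frac=0.5, valid_frac=0.3, seed=1):
    """Hypothetical helper: randomly split records into training, validation,
    and optional test sets. Whatever remains after the training and
    validation fractions becomes the test set."""
    if train_frac <= 0 or valid_frac < 0 or train_frac + valid_frac > 1:
        raise ValueError("invalid partition fractions")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # empty if the fractions sum to 1
    return train, valid, test

data = list(range(100))
train, valid, test = partition(data)
print(len(train), len(valid), len(test))  # 50 30 20
```

The training set is used to build candidate models, the validation set to compare them, and the optional test set to estimate how the chosen model performs on data it has not seen.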

Chapter 11: Intro to Data Mining Flashcards
