Chapter 1 - Introduction Flashcards

1
Q

What is data mining?

A

The process of automatically discovering useful information in large data repositories.

2
Q

What are the 3 parts of the KDD process?

A
  1. Data preprocessing
  2. Data Mining
  3. Postprocessing
3
Q

What is "closing the loop"?

A

Integrating data mining results into decision support systems.

4
Q

What challenges motivated data mining?

A
Scalability
High dimensionality
Heterogeneous and complex data
Data ownership and distribution
Non-traditional analysis
5
Q

What are the origins of data mining?

A

Statistics

AI, Machine Learning, Pattern Recognition

6
Q

What are the two categories of data mining tasks?

A
  1. Predictive tasks
  2. Descriptive tasks

7
Q

What is the definition of predictive data mining tasks?

A

Attempting to predict a dependent variable given independent variables

8
Q

What is the definition of descriptive data mining tasks?

A

Attempting to find patterns (trends, correlations, clusters, trajectories, anomalies) in data

9
Q

What are the 4 core data mining tasks?

A
  1. Predictive modelling
  2. Association analysis
  3. Cluster analysis
  4. Anomaly detection
10
Q

What are the two types of predictive modelling tasks?

A
  1. Classification - used for discrete target variables
  2. Regression - used for continuous target variables
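
As a rough illustration of the difference, the sketch below uses toy data invented for this card. A 1-nearest-neighbour rule predicts a discrete label, and a least-squares line predicts a continuous value. The function names and the data are hypothetical and are only one of many ways to do either task.

```python
# Toy illustration: classification predicts a discrete label,
# regression predicts a continuous value. All data is made up.

def classify_1nn(train, x):
    """Return the label of the training point closest to x (1-nearest neighbour)."""
    nearest_x, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Classification: hours studied -> pass/fail (discrete target)
labelled = [(1.0, "fail"), (2.0, "fail"), (6.0, "pass"), (8.0, "pass")]
predicted_label = classify_1nn(labelled, 7.0)

# Regression: hours studied -> exam score (continuous target)
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]
slope, intercept = fit_line(hours, scores)
predicted_score = slope * 5.0 + intercept
```

The first prediction is a category, the second is a number on a continuous scale; that is the whole distinction the card is drawing.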

11
Q

Explain if the following is a data mining task:

Dividing the customers of a company according to their gender

A

No. This is a simple database query.

12
Q

Explain if the following is a data mining task:

Dividing the customers of a company according to their profitability.

A

No. This is an accounting calculation, followed by the application of a threshold. However, predicting the profitability of a new customer would be data mining.

13
Q

Explain if the following is a data mining task:

Computing the total sales of a company.

A

No. This is simple accounting.

14
Q

Explain if the following is a data mining task:

Sorting a student database based on student identification numbers

A

No. This is a simple database query.

15
Q

Explain if the following is a data mining task:

Predicting the outcomes of tossing a (fair) pair of dice.

A

No. Since the dice are fair, this is a probability calculation.
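
To see why no mining is needed here, the sketch below simply enumerates the 36 equally likely outcomes and computes the exact probability of each sum. The variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair six-sided dice
# and tally how often each sum occurs.
faces = range(1, 7)
sum_counts = Counter(a + b for a, b in product(faces, repeat=2))

# Exact probability of each sum; no data or learning is involved.
sum_probability = {total: Fraction(count, 36) for total, count in sum_counts.items()}

most_likely_sum = max(sum_probability, key=sum_probability.get)
```

Everything follows from the known fairness of the dice, so there is nothing to discover from historical records.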

16
Q

Explain if the following is a data mining task:

Predicting the future stock price of a company using historical records.

A

Yes. We would attempt to create a model that can predict the continuous value of the stock price. This is an example of the area of data mining known as predictive modelling. We could use regression for this modelling, although researchers in many fields have developed a wide variety of techniques for predicting time series.

17
Q

Explain if the following is a data mining task:

Monitoring the heart rate of a patient for abnormalities.

A

Yes. We would build a model of the normal behavior of heart rate and raise an alarm when an unusual heart behavior occurred. This would involve the area of data mining known as anomaly detection. This could also be considered as a classification problem if we had examples of both normal and abnormal heart behavior.
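
A minimal sketch of the "model of normal behavior" idea, assuming the model is just the mean and standard deviation of healthy readings and that anything more than three standard deviations away counts as unusual. The readings, the threshold, and the function names are all hypothetical; real monitors use far richer models.

```python
from statistics import mean, stdev

def fit_normal_model(normal_readings):
    """Summarise normal behaviour by its mean and sample standard deviation."""
    return mean(normal_readings), stdev(normal_readings)

def is_anomalous(reading, model, threshold=3.0):
    """Flag a reading whose z-score exceeds the (assumed) threshold."""
    mu, sigma = model
    return abs(reading - mu) / sigma > threshold

# Hypothetical resting heart rates (beats per minute) recorded while healthy
normal_bpm = [68, 72, 70, 71, 69, 73, 70, 67]
model = fit_normal_model(normal_bpm)

# Raise an alarm only for readings far outside the normal range
alarms = [bpm for bpm in [70, 74, 120, 38] if is_anomalous(bpm, model)]
```

Note that only normal examples are used to build the model, which is what separates this from the classification framing mentioned in the answer.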

18
Q

Explain if the following is a data mining task:

Monitoring seismic waves for earthquake activities.

A

Yes. In this case, we would build a model of different types of seismic wave behavior associated with earthquake activities and raise an alarm when one of these different types of seismic activity was observed. This is an example of the area of data mining known as classification.

19
Q

Explain if the following is a data mining task:

Extracting the frequencies of a sound wave.

A

No. This is signal processing.

20
Q

Explain whether or not data privacy is an important issue:

Census data collected from 1900–1950.

A

No

21
Q

Explain whether or not data privacy is an important issue:

IP addresses and visit times of Web users who visit your Website.

A

Yes

22
Q

Explain whether or not data privacy is an important issue:

Images from Earth-orbiting satellites.

A

No

23
Q

Explain whether or not data privacy is an important issue:

Names and addresses of people from the telephone book.

A

No

24
Q

Explain whether or not data privacy is an important issue:

Names and email addresses collected from the Web.

A

No

25
Q

What are the 4 components of data preprocessing?

A
  1. Feature selection
  2. Dimensionality reduction
  3. Normalization
  4. Data subsetting
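
Of these four steps, normalization is the easiest to show in a few lines. The sketch below assumes min-max scaling to the range [0, 1], which is one common choice; the card does not say which method is meant, and the function name and data are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    low, high = min(values), max(values)
    if high == low:
        # All values identical: no spread to rescale, so map everything to 0.0
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

# Hypothetical attribute: annual income in thousands
incomes = [20, 35, 50, 80]
normalized = min_max_normalize(incomes)
```

Putting attributes on a common scale keeps one with large raw values, such as income, from dominating another with small values, such as age, in distance-based methods.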
26
Q

What are the 3 components of postprocessing?

A
  1. Filtering patterns
  2. Visualization
  3. Pattern interpretation
27
Q

Scalability is a motivating challenge for data mining. Explain what it is.

A

Data sets are too large for traditional algorithms to process in a reasonable amount of time.

28
Q

High Dimensionality is a motivating challenge for data mining. Explain what it is.

A

Data sets have too many attributes or dimensions

29
Q

Heterogeneous and Complex data is a motivating challenge for data mining. Explain what it is.

A

Data sets with attributes of different types

30
Q

Data ownership and distribution is a motivating challenge for data mining. Explain what it is.

A

Data is distributed across different geographical locations and may be owned by different entities

31
Q

Non-traditional Analysis is a motivating challenge for data mining. Explain what it is.

A

The need to automate hypothesis generation & testing

32
Q

Explain what type of data mining task the following is an example of:

identifying customers that will respond to a marketing campaign

A

Predictive modelling

33
Q

Explain what type of data mining task the following is an example of:

forecasting disturbances in the Earth’s ecosystem

A

Predictive modelling

34
Q

Explain what type of data mining task the following is an example of:

Judging whether a patient has a particular disease based on the results of medical tests

A

Predictive modelling

35
Q

Explain what type of data mining task the following is an example of:

finding groups of genes that have related functionality

A

Association analysis

36
Q

Explain what type of data mining task the following is an example of:

identifying web pages that are accessed together

A

Association analysis

37
Q

Explain what type of data mining task the following is an example of:

understanding the relationships between different elements of Earth’s climate system

A

Association analysis

38
Q

Explain what type of data mining task the following is an example of:

determining which products customers will frequently buy together

A

Association analysis
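
One simple way to picture this task is to count how often each pair of items appears in the same basket (its support) and keep the pairs above a cutoff. The baskets, the 0.75 cutoff, and the function name below are made up for illustration; full association analysis also derives rules and confidence values.

```python
from collections import Counter
from itertools import combinations

def pair_support(transactions):
    """Return the fraction of transactions that contain each pair of items."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) gives each unordered pair one canonical form
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: count / total for pair, count in counts.items()}

# Hypothetical market baskets
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
]
support = pair_support(baskets)

# Pairs bought together in at least 75% of baskets
frequent_pairs = sorted(p for p, s in support.items() if s >= 0.75)
```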

39
Q

Explain what type of data mining task the following is an example of:

grouping sets of related customers

A

Cluster analysis
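
As a hedged sketch of what grouping related customers can look like, the code below runs a tiny one-dimensional k-means on invented spending figures. K-means is only one of many clustering algorithms, and the starting centroids, data, and function name are assumptions made for this example.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Very small 1-D k-means: alternate assignment and centroid update."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign each value to its nearest centroid
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical annual spend per customer
spend = [12, 15, 14, 90, 95, 88]
centroids, groups = kmeans_1d(spend, centroids=[0, 100])
```

No labels are supplied; the low-spend and high-spend groups emerge from the data alone, which is what makes this a descriptive rather than a predictive task.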

40
Q

Explain what type of data mining task the following is an example of:

finding areas of the ocean that have a significant impact on the Earth’s climate

A

Cluster analysis

41
Q

Explain what type of data mining task the following is an example of:

compressing data

A

Cluster analysis

42
Q

Explain what type of data mining task the following is an example of:

detection of fraud

A

Anomaly detection

43
Q

Explain what type of data mining task the following is an example of:

detecting network intrusions

A

Anomaly detection

44
Q

Explain what type of data mining task the following is an example of:

detection of unusual patterns of disease

A

Anomaly detection

45
Q

Explain what type of data mining task the following is an example of:

detecting ecosystem disturbances

A

Anomaly detection

46
Q

What is the purpose of preprocessing?

A

To transform raw input data into an appropriate format for subsequent analysis.