Chapter 1 - Introduction Flashcards

1
Q

What is data mining?

A

The process of automatically discovering useful information in large data repositories.

2
Q

What are the 3 parts of the KDD process?

A
  1. Data preprocessing
  2. Data Mining
  3. Postprocessing
3
Q

What is "closing the loop"?

A

Integrating data mining results into decision support systems.

4
Q

What challenges motivated data mining?

A
Scalability
High dimensionality
Heterogeneous and complex data
Data ownership and distribution
Non-traditional analysis
5
Q

What are the origins of data mining?

A

Statistics

AI, Machine Learning, Pattern Recognition

6
Q

What are the two categories of data mining tasks?

A
  1. Predictive tasks
  2. Descriptive tasks

7
Q

What is the definition of predictive data mining tasks?

A

Attempting to predict a dependent variable given independent variables

8
Q

What is the definition of descriptive data mining tasks?

A

Attempting to find patterns (trends, correlations, clusters, trajectories, anomalies) in data

9
Q

What are the 4 core data mining tasks?

A
  1. Predictive modelling
  2. Association analysis
  3. Cluster analysis
  4. Anomaly detection
10
Q

What are the two types of predictive modelling tasks?

A
  1. Classification - used for discrete target variables
  2. Regression - used for continuous target variables
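
The difference between the two can be sketched in a few lines of Python. This is a toy illustration, not a recommended method: the data, the nearest-neighbour rule, and the helper names `nearest_neighbor_classify` and `fit_line` are all assumptions made up for this card.

```python
# Toy illustration only: the data and helper names are invented for this card.

def nearest_neighbor_classify(train, x):
    """Classification: return the discrete label of the closest training point."""
    closest = min(train, key=lambda pair: abs(pair[0] - x))
    return closest[1]

def fit_line(points):
    """Regression: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Discrete target: (hours studied, "pass" or "fail")
labelled = [(1.0, "fail"), (2.0, "fail"), (6.0, "pass"), (8.0, "pass")]
predicted_label = nearest_neighbor_classify(labelled, 7.0)

# Continuous target: (hours studied, exam score)
scored = [(1.0, 52.0), (2.0, 54.0), (3.0, 56.0), (4.0, 58.0)]
slope, intercept = fit_line(scored)
predicted_score = slope * 5.0 + intercept
```

The classifier returns one of a fixed set of labels, while the fitted line returns a number on a continuous scale.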

11
Q

Explain if the following is a data mining task:

Dividing the customers of a company according to their gender

A

No. This is a simple database query.

12
Q

Explain if the following is a data mining task:

Dividing the customers of a company according to their profitability.

A

No. This is an accounting calculation, followed by the application of a threshold. However, predicting the profitability of a new customer would be data mining.
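
A minimal sketch of why this is only a calculation plus a threshold, assuming invented customer records and an assumed cut-off (`THRESHOLD` and `split_by_profit` are hypothetical names):

```python
# Hypothetical customer records: (name, revenue, cost). All values are invented.
customers = [("Ann", 1200.0, 700.0), ("Bob", 300.0, 450.0), ("Cy", 900.0, 880.0)]
THRESHOLD = 100.0  # assumed cut-off for calling a customer "profitable"

def split_by_profit(records, threshold):
    """Compute profit = revenue - cost, then apply a fixed threshold."""
    high, low = [], []
    for name, revenue, cost in records:
        (high if revenue - cost >= threshold else low).append(name)
    return high, low

high, low = split_by_profit(customers, THRESHOLD)
```

Nothing is learned from the data here; the rule is fixed in advance, which is why the card does not count it as data mining.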

13
Q

Explain if the following is a data mining task:

Computing the total sales of a company.

A

No. This is simple accounting.

14
Q

Explain if the following is a data mining task:

Sorting a student database based on student identification numbers

A

No. This is a simple database query.

15
Q

Explain if the following is a data mining task:

Predicting the outcomes of tossing a (fair) pair of dice.

A

No. Since the dice are fair, this is a probability calculation.
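
As an illustration, the full distribution of the sum of two fair six-sided dice can be computed by enumeration, with no data involved. The function name `two_dice_sum_distribution` is an assumption for this sketch.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def two_dice_sum_distribution():
    """Enumerate all 36 equally likely outcomes and count each sum."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}

dist = two_dice_sum_distribution()
```

Every probability follows from the assumption of fairness, so there is no pattern to discover from observations.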

16
Q

Explain if the following is a data mining task:

Predicting the future stock price of a company using historical records.

A

Yes. We would attempt to create a model that can predict the continuous value of the stock price. This is an example of the area of data mining known as predictive modelling. We could use regression for this modelling, although researchers in many fields have developed a wide variety of techniques for predicting time series.
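
One very simple way to use regression on historical records is to predict the next value from the current one. The sketch below assumes invented prices and a hypothetical helper `fit_ar1`; it is a toy model, not a realistic forecasting method.

```python
def fit_ar1(series):
    """Least-squares fit of next = a * current + b over consecutive pairs."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

prices = [100.0, 102.0, 104.0, 106.0, 108.0]  # invented historical prices
a, b = fit_ar1(prices)
next_price = a * prices[-1] + b  # one-step-ahead prediction
```

The model is learned from past data and then applied to an unseen case, which is what makes this predictive modelling.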

17
Q

Explain if the following is a data mining task:

Monitoring the heart rate of a patient for abnormalities.

A

Yes. We would build a model of the normal behavior of heart rate and raise an alarm when an unusual heart behavior occurred. This would involve the area of data mining known as anomaly detection. This could also be considered as a classification problem if we had examples of both normal and abnormal heart behavior.
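
A minimal sketch of the "model of normal behavior" idea, assuming a z-score rule, invented heart-rate readings, and a hypothetical helper `find_anomalies`:

```python
from statistics import mean, stdev

def find_anomalies(baseline, readings, z_threshold=3.0):
    """Flag readings more than z_threshold standard deviations from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return [r for r in readings if abs(r - mu) / sigma > z_threshold]

normal_bpm = [70, 72, 68, 71, 69, 73, 70, 71]  # invented "normal" heart rates
new_bpm = [70, 74, 120, 69, 40]                # invented new readings
alarms = find_anomalies(normal_bpm, new_bpm)
```

The threshold of 3 standard deviations is an assumption; a real monitor would need a far more careful model of normal heart behavior.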

18
Q

Explain if the following is a data mining task:

Monitoring seismic waves for earthquake activities.

A

Yes. In this case, we would build a model of different types of seismic wave behavior associated with earthquake activities and raise an alarm when one of these different types of seismic activity was observed. This is an example of the area of data mining known as classification.

19
Q

Explain if the following is a data mining task:

Extracting the frequencies of a sound wave.

A

No. This is signal processing.
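
For illustration, frequencies can be extracted with a fixed transform rather than a learned model. The sketch below uses a direct discrete Fourier transform on a synthetic 5 Hz sine wave; the sample rate, the signal, and the name `dft_magnitudes` are assumptions.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude of each DFT bin from 0 up to the Nyquist bin (direct O(n^2) sum)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]

SAMPLE_RATE = 64  # samples per second (assumed)
signal = [math.sin(2 * math.pi * 5 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
magnitudes = dft_magnitudes(signal)
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
dominant_hz = peak_bin * SAMPLE_RATE / len(signal)
```

The transform is the same for every input, so nothing is being discovered from a data repository.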

20
Q

Explain whether or not data privacy is an important issue:

Census data collected from 1900–1950.

21
Q

Explain whether or not data privacy is an important issue:

IP addresses and visit times of Web users who visit your Website.

22
Q

Explain whether or not data privacy is an important issue:

Images from Earth-orbiting satellites.

23
Q

Explain whether or not data privacy is an important issue:

Names and addresses of people from the telephone book.

24
Q

Explain whether or not data privacy is an important issue:

Names and email addresses collected from the Web.

25
What are the 4 components of data preprocessing?
  1. Feature selection
  2. Dimensionality reduction
  3. Normalization
  4. Data subsetting
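As one example of the normalization step, the sketch below rescales an attribute to the range [0, 1]. Min-max scaling is only one common choice and is an assumption here; the card does not say which method is meant, and `min_max_normalize` is a hypothetical helper.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

incomes = [20000.0, 35000.0, 50000.0]  # invented attribute values
scaled = min_max_normalize(incomes)
```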
26
What are the 3 components of postprocessing?
  1. Filtering patterns
  2. Visualization
  3. Pattern interpretation
27
Scalability is a motivating challenge for data mining. Explain what it is.
Data sets are too large for conventional algorithms to process in a reasonable time
28
High Dimensionality is a motivating challenge for data mining. Explain what it is.
Data sets have too many attributes or dimensions
29
Heterogeneous and Complex data is a motivating challenge for data mining. Explain what it is.
Data sets with attributes of different types
30
Data ownership and distribution is a motivating challenge for data mining. Explain what it is.
Data is stored in different geographical locations and may be owned by different organizations
31
Non-traditional Analysis is a motivating challenge for data mining. Explain what it is.
The need to automate hypothesis generation & testing
32
Explain what type of data mining task the following is an example of: identifying customers that will respond to a marketing campaign
Predictive modelling
33
Explain what type of data mining task the following is an example of: forecasting disturbances in the Earth's ecosystem
Predictive modelling
34
Explain what type of data mining task the following is an example of: Judging whether a patient has a particular disease based on the results of medical tests
Predictive modelling
35
Explain what type of data mining task the following is an example of: finding groups of genes that have related functionality
Association analysis
36
Explain what type of data mining task the following is an example of: identifying web pages that are accessed together
Association analysis
37
Explain what type of data mining task the following is an example of: understanding the relationships between different elements of Earth's climate system
Association analysis
38
Explain what type of data mining task the following is an example of: determining which products customers will frequently buy together
Association analysis
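A minimal sketch of the idea, assuming invented shopping baskets and a hypothetical helper `frequent_pairs` that reports item pairs whose support (fraction of baskets containing both items) meets a chosen minimum:

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs appearing in enough transactions."""
    counts = Counter()
    for basket in transactions:
        # Sort and de-duplicate so each pair is counted once per basket.
        counts.update(combinations(sorted(set(basket)), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

baskets = [  # invented transactions
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]
pairs = frequent_pairs(baskets, min_support=0.5)
```

Counting every pair directly only works for small item sets; it is shown here purely to make the notion of co-occurrence concrete.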
39
Explain what type of data mining task the following is an example of: grouping sets of related customers
Cluster analysis
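As a rough sketch, customers could be grouped by a single attribute with a one-dimensional k-means loop. The spend figures, the starting centers, and the helper `kmeans_1d` are assumptions for illustration; k-means is just one of many clustering methods.

```python
def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning values to the nearest center and re-averaging."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

spend = [10.0, 12.0, 11.0, 95.0, 100.0, 105.0]  # invented monthly spend per customer
centers, clusters = kmeans_1d(spend, centers=[0.0, 50.0])
```

No labels are supplied; the groups emerge from how close the values are to each other.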
40
Explain what type of data mining task the following is an example of: find areas of the ocean that have a significant impact on the Earth's climate
Cluster analysis
41
Explain what type of data mining task the following is an example of: compressing data
Cluster analysis
42
Explain what type of data mining task the following is an example of: detection of fraud
Anomaly detection
43
Explain what type of data mining task the following is an example of: detection of network intrusions
Anomaly detection
44
Explain what type of data mining task the following is an example of: detection of unusual patterns of disease
Anomaly detection
45
Explain what type of data mining task the following is an example of: detecting ecosystem disturbances
Anomaly detection
46
What is the purpose of preprocessing?
To transform raw input data into an appropriate format for subsequent analysis