Chapter 1 - Introduction Flashcards
What is data mining?
The process of automatically discovering useful information in large data repositories.
What are the 3 parts of the KDD process?
- Data preprocessing
- Data Mining
- Postprocessing
What is "closing the loop"?
Integrating data mining results into decision support systems.
What challenges motivated data mining?
- Scalability
- High dimensionality
- Heterogeneous and complex data
- Data ownership and distribution
- Non-traditional analysis
What are the origins of data mining?
Statistics
AI, Machine Learning, Pattern Recognition
What are the two categories of data mining tasks?
- Predictive tasks
- Descriptive tasks
What is the definition of predictive data mining tasks?
Attempting to predict a dependent variable given independent variables
What is the definition of descriptive data mining tasks?
Attempting to find patterns (trends, correlations, clusters, trajectories, anomalies) in data
What are the 4 core data mining tasks?
- Predictive modelling
- Association analysis
- Cluster analysis
- Anomaly detection
What are the two types of predictive modelling tasks?
- Classification - used for discrete target variables
- Regression - used for continuous target variables
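A minimal sketch of the difference, using only the Python standard library. The toy data (hours studied, exam scores, pass/fail labels) and the helper names `fit_line` and `nearest_label` are hypothetical, chosen for illustration; least squares and 1-nearest-neighbour are just two simple stand-ins for regression and classification.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (regression)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def nearest_label(xs, labels, query):
    """Return the label of the training point closest to query (classification)."""
    closest = min(range(len(xs)), key=lambda i: abs(xs[i] - query))
    return labels[closest]


# Hypothetical toy data: hours studied per student.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]          # continuous target -> regression
passed = ["fail", "fail", "pass", "pass"]  # discrete target -> classification

slope, intercept = fit_line(hours, scores)
predicted_score = slope * 5.0 + intercept            # a real number
predicted_class = nearest_label(hours, passed, 3.6)  # one of a fixed set of labels
```

The regression output can be any real value, while the classifier can only return a label it has already seen.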
Explain if the following is a data mining task:
Dividing the customers of a company according to their gender
No. This is a simple database query.
Explain if the following is a data mining task:
Dividing the customers of a company according to their profitability.
No. This is an accounting calculation, followed by the application of a threshold. However, predicting the profitability of a new customer would be data mining.
Explain if the following is a data mining task:
Computing the total sales of a company.
No. This is simple accounting.
Explain if the following is a data mining task:
Sorting a student database based on student identification numbers
No. This is a simple database query.
Explain if the following is a data mining task:
Predicting the outcomes of tossing a (fair) pair of dice.
No. Since the dice are fair, this is a probability calculation.
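To illustrate why no mining is needed here, the full distribution of the sum of two fair six-sided dice can be enumerated directly, without any data. This is a sketch; the variable names `counts` and `dist` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair six-sided dice.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Exact probability of each possible sum.
dist = {total: Fraction(n, 36) for total, n in counts.items()}

most_likely = max(dist, key=dist.get)
```

Everything about the outcome is known from the assumption of fairness, so there is no pattern to learn from historical tosses.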
Explain if the following is a data mining task:
Predicting the future stock price of a company using historical records.
Yes. We would attempt to create a model that can predict the continuous value of the stock price. This is an example of the area of data mining known as predictive modelling. We could use regression for this modelling, although researchers in many fields have developed a wide variety of techniques for predicting time series.
Explain if the following is a data mining task:
Monitoring the heart rate of a patient for abnormalities.
Yes. We would build a model of the normal behavior of heart rate and raise an alarm when an unusual heart behavior occurred. This would involve the area of data mining known as anomaly detection. This could also be considered as a classification problem if we had examples of both normal and abnormal heart behavior.
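One very simple way to sketch this idea is a z-score rule: estimate the mean and standard deviation from readings assumed to be normal, then flag readings that fall too far from the mean. The sample heart rates, the 3-standard-deviation threshold, and the function names `fit_baseline` and `is_anomalous` are assumptions made for illustration, not a clinical method.

```python
from statistics import mean, stdev


def fit_baseline(normal_readings):
    """Model 'normal' behaviour as a mean and sample standard deviation."""
    return mean(normal_readings), stdev(normal_readings)


def is_anomalous(reading, baseline, threshold=3.0):
    """Flag readings more than threshold standard deviations from the mean."""
    mu, sigma = baseline
    return abs(reading - mu) > threshold * sigma


# Hypothetical resting heart rates (beats per minute) assumed to be normal.
normal_bpm = [70, 72, 68, 71, 69, 73, 70, 67]
baseline = fit_baseline(normal_bpm)

# New readings to monitor; any reading outside the normal band raises an alarm.
incoming = [71, 74, 120, 66, 38]
alarms = [bpm for bpm in incoming if is_anomalous(bpm, baseline)]
```

Only normal examples are needed to fit the baseline. With labelled abnormal examples as well, the same problem could instead be treated as classification.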
Explain if the following is a data mining task:
Monitoring seismic waves for earthquake activities.
Yes. In this case, we would build a model of different types of seismic wave behavior associated with earthquake activities and raise an alarm when one of these different types of seismic activity was observed. This is an example of the area of data mining known as classification.