1: Chapter 1 (Textbook) Flashcards

1
Q

Define Data Mining.

A

Data mining is the process of discovering interesting patterns and knowledge from large amounts of data.

2
Q

What is Knowledge Discovery in Data (KDD)?

A

Knowledge Discovery in Data (KDD) refers to the overall process of extracting knowledge from data: data preparation, the search for patterns, knowledge evaluation, and refinement.

3
Q

Explain Data Cleaning.

A

Data cleaning involves the removal of noise and inconsistent data from the database to prepare high-quality data.
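
As a rough illustration of this card, the sketch below cleans a tiny made-up record set. The field names, the 0 to 120 age range used to flag noise, and the choice to fill missing ages with the median are all assumptions for the example, not part of the textbook definition.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "age": 34, "city": "Boston"},
    {"id": 2, "age": None, "city": "boston "},
    {"id": 3, "age": -5, "city": "Chicago"},  # impossible age, treated as noise
    {"id": 4, "age": 41, "city": "CHICAGO"},
]


def clean(records, min_age=0, max_age=120):
    """Drop noisy rows, fill missing ages, and normalize city spellings."""
    valid_ages = [
        r["age"] for r in records
        if r["age"] is not None and min_age <= r["age"] <= max_age
    ]
    fill_value = median(valid_ages)  # assumed strategy for missing ages
    cleaned = []
    for r in records:
        age = r["age"]
        if age is not None and not (min_age <= age <= max_age):
            continue  # discard rows whose age is out of range
        cleaned.append({
            "id": r["id"],
            "age": fill_value if age is None else age,
            "city": r["city"].strip().title(),  # resolve inconsistent casing
        })
    return cleaned


cleaned = clean(raw)
```

Whether to drop, correct, or impute a bad value depends on the analysis task; this sketch simply picks one option for each case.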

4
Q

What is a Data Warehouse?

A

A data warehouse is a central repository of information, collected from multiple sources and stored under a unified schema at a single site to support management’s decision-making process.

5
Q

Describe Data Integration.

A

Data integration involves combining data from multiple sources into a coherent data store to provide a unified view of these data.
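
A minimal sketch of the idea, assuming two hypothetical sources that identify customers under different field names (`customer_id` in one, `cust` in the other). The unified schema shown here is invented for illustration.

```python
# Source 1: a hypothetical CRM export.
crm = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]

# Source 2: a hypothetical order log that uses a different key name.
orders = [
    {"cust": 1, "total": 30.0},
    {"cust": 1, "total": 12.5},
    {"cust": 3, "total": 8.0},
]


def integrate(customers, order_rows):
    """Combine both sources into one record per customer."""
    unified = {
        c["customer_id"]: {
            "customer_id": c["customer_id"],
            "name": c["name"],
            "order_total": 0.0,
        }
        for c in customers
    }
    for o in order_rows:
        # Customers seen only in the order log get a record with no name.
        row = unified.setdefault(
            o["cust"],
            {"customer_id": o["cust"], "name": None, "order_total": 0.0},
        )
        row["order_total"] += o["total"]
    return sorted(unified.values(), key=lambda r: r["customer_id"])


view = integrate(crm, orders)
```

Real integration work also has to reconcile conflicting values and units across sources, which this sketch does not attempt.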

6
Q

What does Data Selection entail?

A

Data selection is retrieving relevant data from the database based on the analysis task.
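
One way to picture this is as a row filter plus a column projection. The table, column names, and the helper `select` below are hypothetical.

```python
# Hypothetical sales table; only some rows and columns matter for the task.
rows = [
    {"region": "EU", "year": 2023, "product": "A", "revenue": 120.0, "notes": "promo"},
    {"region": "US", "year": 2023, "product": "A", "revenue": 200.0, "notes": ""},
    {"region": "EU", "year": 2022, "product": "B", "revenue": 90.0, "notes": ""},
]


def select(table, predicate, columns):
    """Keep rows that satisfy the predicate, projected onto the given columns."""
    return [{c: r[c] for c in columns} for r in table if predicate(r)]


# Analysis task (assumed): EU revenue by product for 2023.
eu_2023 = select(
    rows,
    lambda r: r["region"] == "EU" and r["year"] == 2023,
    ["product", "revenue"],
)
```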

7
Q

Define Data Transformation.

A

Data transformation is the process of converting data into appropriate forms for mining.
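
Min-max normalization is one common transformation; the card itself does not name a specific technique, so treat this as a single example among many (others include aggregation and discretization). The `incomes` values are made up.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


incomes = [30_000, 45_000, 60_000]
scaled = min_max_normalize(incomes)
```

Rescaling like this keeps attributes with large ranges from dominating distance-based methods.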

8
Q

What is Pattern Evaluation in data mining?

A

Pattern evaluation involves identifying the truly interesting patterns representing knowledge.
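
A simple, hedged sketch: one objective way to judge interestingness is to keep only patterns above minimum support and confidence thresholds. The candidate rules, their scores, and the threshold values below are hypothetical, and other interestingness measures exist.

```python
# Hypothetical mined rules with precomputed scores.
candidates = [
    {"rule": "bread -> milk", "support": 0.50, "confidence": 0.67},
    {"rule": "milk -> butter", "support": 0.25, "confidence": 0.33},
    {"rule": "butter -> bread", "support": 0.50, "confidence": 1.00},
]


def interesting(patterns, min_support=0.4, min_confidence=0.6):
    """Return the rules that meet both (assumed) thresholds."""
    return [
        p["rule"]
        for p in patterns
        if p["support"] >= min_support and p["confidence"] >= min_confidence
    ]


kept = interesting(candidates)
```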

9
Q

What does Knowledge Presentation involve in data mining?

A

Knowledge presentation uses visualization and knowledge representation techniques to present the mined knowledge to users, making it understandable and useful.

10
Q

Explain the difference between Data Characterization and Data Discrimination.

A

Data characterization aims to provide a general description of a dataset, focusing on main characteristics. Data discrimination compares the features of one class of data against another to highlight differences.
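
The contrast can be sketched with averages: characterization summarizes one class, while discrimination reports how a target class differs from a contrasting class. The customer data, the `segment` labels, and both helper functions are hypothetical.

```python
from statistics import mean

# Hypothetical customers labeled with a segment (the class).
customers = [
    {"segment": "frequent", "age": 30, "spend": 500.0},
    {"segment": "frequent", "age": 40, "spend": 700.0},
    {"segment": "occasional", "age": 50, "spend": 100.0},
    {"segment": "occasional", "age": 60, "spend": 200.0},
]


def characterize(rows, segment):
    """Summarize the general features of a single class."""
    members = [r for r in rows if r["segment"] == segment]
    return {
        "avg_age": mean(r["age"] for r in members),
        "avg_spend": mean(r["spend"] for r in members),
    }


def discriminate(rows, target, contrast):
    """Report how the target class differs from the contrasting class."""
    t = characterize(rows, target)
    c = characterize(rows, contrast)
    return {key: t[key] - c[key] for key in t}


profile = characterize(customers, "frequent")
difference = discriminate(customers, "frequent", "occasional")
```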

11
Q

What are the typical applications of Data Mining?

A

Typical applications include business intelligence, web search engines, market analysis, and healthcare data analysis, where the extracted patterns and insights inform decisions and strategy.

12
Q

What challenges does data mining face?

A

Challenges include handling big data, integrating diverse data types, mining knowledge in multidimensional space, and ensuring privacy and security of data.

13
Q

Define “Association Analysis” in data mining.

A

Association analysis is a type of data mining that involves finding interesting associations or correlation relationships among a large set of data items.
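
Associations between items are commonly scored with support and confidence. The sketch below computes both over a made-up set of shopping transactions; the item names and the helper functions are assumptions for illustration.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


s = support({"bread", "milk"}, transactions)
c = confidence({"bread"}, {"milk"}, transactions)
```

Here bread and milk appear together in 2 of 4 baskets (support 0.5), and 2 of the 3 bread baskets also contain milk (confidence about 0.67).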

14
Q

What is “Classification” in data mining?

A

Classification is the process of finding a model that describes and distinguishes data classes or concepts for the purpose of being able to use the model to predict the class of objects whose class label is unknown.
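
To make the idea concrete, here is a minimal nearest-neighbor classifier: the "model" is simply the labeled training set, and an unlabeled object receives the label of its closest training example. This is one of many classification techniques, chosen here for brevity; the data and labels are invented.

```python
import math

# Hypothetical labeled training examples: (features, class label).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((5.0, 5.0), "high"),
    ((6.0, 5.5), "high"),
]


def predict(point, examples):
    """Return the label of the training example nearest to the point."""
    _, label = min(examples, key=lambda ex: math.dist(point, ex[0]))
    return label


label_a = predict((1.2, 1.1), training)
label_b = predict((5.5, 5.0), training)
```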

15
Q

Define “Regression” in the context of data mining.

A

Regression is used to predict missing or unavailable numerical data values, rather than class labels, by modeling continuous-valued functions.
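
A small worked sketch, assuming a single input variable and an ordinary least-squares line as the continuous-valued function. The data points are made up so that they lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# Predict the unavailable numeric value at x = 5.
predicted = slope * 5 + intercept
```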

16
Q

What is “Clustering”?

A

Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups.
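
As an illustration, the sketch below runs a stripped-down k-means with two clusters on one-dimensional data. Seeding the centers with the minimum and maximum values and fixing the iteration count are simplifying assumptions; the function name `two_means` and the data are hypothetical.

```python
def two_means(points, iterations=10):
    """Split 1-D points into two clusters by alternating assignment and averaging."""
    centers = [min(points), max(points)]  # assumed seeding strategy
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            # Assign each point to the nearer center.
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centers, clusters = two_means(points)
```

No class labels are supplied; the grouping emerges from similarity alone, which is what separates clustering from classification.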

17
Q

Explain “Outliers” in data mining.

A

Outliers are data objects that do not comply with the general behavior or model of the data. They can be seen as exceptions or anomalies.
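
One simple way to flag such objects, assuming roughly bell-shaped numeric data, is a z-score test: mark values that lie more than a chosen number of standard deviations from the mean. The threshold of 2.0 and the sample values are assumptions for this sketch.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


readings = [10, 11, 9, 10, 12, 10, 50]
outliers = find_outliers(readings)
```

Because the outlier itself inflates the mean and standard deviation, more robust rules (for example, ones based on the median) are often preferred on small samples.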

18
Q

What are the primary steps involved in the data mining process?

A

The primary steps include data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation.

19
Q

What role does “Data Cleaning” play in data mining?

A

Data cleaning helps in improving the quality of data by removing noise and handling missing or inconsistent data.

20
Q

How is “Data Integration” important in data mining?

A

Data integration is crucial as it combines data from different sources, providing a unified view that can be more effectively analyzed.

21
Q

Why is “Data Selection” important?

A

Data selection is critical because it involves choosing the relevant portion of data necessary for the mining process, thus ensuring efficiency and effectiveness.

22
Q

Describe the significance of “Data Transformation” in the mining process.

A

Data transformation converts data into formats suitable for mining, facilitating easier and more effective analysis.

23
Q

What does “Pattern Evaluation” entail?

A

Pattern evaluation involves determining which patterns produced by the data mining process are actually interesting and potentially useful.

24
Q

What is the goal of “Knowledge Presentation”?

A

The goal of knowledge presentation is to visualize and present the results of data mining in an understandable manner to the end-user.

25
Q

What is the purpose of “Association Analysis”?

A

The purpose of association analysis is to find patterns, associations, or relationships among sets of items in large datasets.

26
Q

How is “Classification” used in real-world applications?

A

Classification is widely used in applications such as credit scoring, disease diagnosis, and assigning customers to predefined segments.

27
Q

In what scenarios is “Regression” used in data mining?

A

Regression is used in scenarios such as predicting housing prices, stock prices, and temperatures.

28
Q

What practical uses does “Clustering” have?

A

Clustering is used in market research, pattern recognition, image analysis, and genetic clustering in bioinformatics.

29
Q

How can identifying “Outliers” be beneficial?

A

Identifying outliers can help in fraud detection, network security, and fault detection in manufacturing processes.

30
Q

What challenges are associated with “Outliers” in data analysis?

A

Challenges include distinguishing noise from genuine outliers and deciding how to handle outliers: whether to discard them or analyze them separately.