1: Chapter 1 (Textbook) Flashcards
Define Data Mining.
Data mining is the process of discovering interesting patterns and knowledge from large amounts of data.
What is Knowledge Discovery from Data (KDD)?
Knowledge Discovery from Data (KDD) refers to the overall process that includes data preparation, the search for patterns, knowledge evaluation, and refinement.
Explain Data Cleaning.
Data cleaning involves the removal of noise and inconsistent data from the database to prepare high-quality data.
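A minimal sketch of data cleaning in Python, using only the standard library. The record layout, the `clean_records` name, and the 0 to 120 plausibility range for ages are assumptions made for illustration; they do not come from the textbook.

```python
def clean_records(records):
    """Drop noisy records and normalize inconsistent city spellings."""
    cleaned = []
    for rec in records:
        age = rec.get("age")
        # Treat a missing or out-of-range age as noise (the range is an assumption).
        if age is None or not (0 <= age <= 120):
            continue
        # Resolve inconsistent spellings: trim whitespace and normalize case.
        city = rec["city"].strip().title()
        cleaned.append({"age": age, "city": city})
    return cleaned


raw = [
    {"age": 34, "city": " new york"},
    {"age": -5, "city": "Boston"},      # implausible value
    {"age": None, "city": "Boston"},    # missing value
    {"age": 51, "city": "NEW YORK "},
]
cleaned = clean_records(raw)
```

Here the two noisy records are dropped and both remaining city values end up spelled the same way.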
What is a Data Warehouse?
A data warehouse is a central repository of information, collected from multiple sources and stored under a unified schema at a single site to support management’s decision-making process.
Describe Data Integration.
Data integration involves combining data from multiple sources into a coherent data store to provide a unified view of these data.
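One way to picture data integration is merging two sources that name the same key differently. The sketch below is illustrative only: the two sources (`crm`, `billing`), their field names, and the `integrate` helper are all hypothetical.

```python
def integrate(crm_rows, billing_rows):
    """Combine two sources into one view keyed by a shared customer id."""
    unified = {}
    for row in crm_rows:
        # The CRM source calls the key "cust_id".
        unified[row["cust_id"]] = {
            "customer_id": row["cust_id"],
            "name": row["name"],
            "total_spent": 0.0,
        }
    for row in billing_rows:
        # The billing source calls the same key "customerID".
        entry = unified.setdefault(
            row["customerID"],
            {"customer_id": row["customerID"], "name": None, "total_spent": 0.0},
        )
        entry["total_spent"] += row["amount"]
    return [unified[key] for key in sorted(unified)]


crm = [{"cust_id": 1, "name": "Ada"}, {"cust_id": 2, "name": "Grace"}]
billing = [
    {"customerID": 1, "amount": 40.0},
    {"customerID": 1, "amount": 10.0},
    {"customerID": 3, "amount": 5.0},
]
unified_view = integrate(crm, billing)
```

Customers that appear in only one source are kept, with the missing fields left at a default value.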
What does Data Selection entail?
Data selection retrieves from the database only the data relevant to the analysis task.
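As a rough illustration, data selection can be thought of as filtering rows and keeping only the needed attributes. The `select` helper and the sample rows below are hypothetical.

```python
def select(rows, columns, predicate):
    """Keep rows that satisfy the predicate, projected onto the given columns."""
    return [{col: row[col] for col in columns} for row in rows if predicate(row)]


sales = [
    {"id": 1, "region": "EU", "amount": 120, "note": "gift"},
    {"id": 2, "region": "US", "amount": 80, "note": ""},
    {"id": 3, "region": "EU", "amount": 45, "note": ""},
]
# Assumed analysis task: study sale amounts in the EU region only.
eu_amounts = select(sales, ["id", "amount"], lambda row: row["region"] == "EU")
```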
Define Data Transformation.
Data transformation is the process of converting data into appropriate forms for mining.
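Min-max normalization is one common transformation (chosen here as an example; the flashcard does not name a specific technique). A minimal sketch, with a hypothetical `min_max_normalize` helper:

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so map every one to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


incomes = [10, 20, 30]
scaled = min_max_normalize(incomes)
```

Rescaling attributes to a common range keeps attributes with large values from dominating distance-based mining methods.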
What is Pattern Evaluation in data mining?
Pattern evaluation identifies the truly interesting patterns that represent knowledge, based on interestingness measures.
What does Knowledge Presentation involve in data mining?
Knowledge presentation uses visualization and knowledge representation techniques to present the mined knowledge to users, making it understandable and useful.
Explain the difference between Data Characterization and Data Discrimination.
Data characterization aims to provide a general description of a dataset, focusing on main characteristics. Data discrimination compares the features of one class of data against another to highlight differences.
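The contrast can be sketched with simple summary statistics. The customer classes, the `spend` attribute, and both helper names are assumptions for illustration; real characterization and discrimination typically use richer summaries.

```python
from statistics import mean


def characterize(rows, attribute):
    """Summarize one class of data on a single attribute."""
    values = [row[attribute] for row in rows]
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def discriminate(target_rows, contrast_rows, attribute):
    """Compare a target class with a contrasting class via the gap in means."""
    target_mean = mean([row[attribute] for row in target_rows])
    contrast_mean = mean([row[attribute] for row in contrast_rows])
    return target_mean - contrast_mean


frequent_buyers = [{"spend": 100}, {"spend": 140}]
rare_buyers = [{"spend": 20}, {"spend": 40}]

summary = characterize(frequent_buyers, "spend")            # describes one class
gap = discriminate(frequent_buyers, rare_buyers, "spend")   # contrasts two classes
```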
What are the typical applications of Data Mining?
Typical applications include business intelligence, web search engines, market analysis, and healthcare data analysis, where the extracted patterns and insights can significantly influence decisions and strategies.
What challenges does Data Mining face?
Challenges include handling big data, integrating diverse data types, mining knowledge in multidimensional space, and ensuring the privacy and security of data.
Define “Association Analysis” in data mining.
Association analysis is a type of data mining that involves finding interesting associations or correlation relationships among a large set of data items.
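Associations are usually scored with support and confidence. These two measures are standard in association rule mining but are not named in this flashcard, and the basket data and function names below are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies in this data
    return support(transactions, set(antecedent) | set(consequent)) / base


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
s = support(baskets, {"bread", "milk"})       # 2 of 4 baskets
c = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
```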
What is “Classification” in data mining?
Classification is the process of finding a model that describes and distinguishes data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown.
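A nearest-centroid classifier is one very simple model of this kind (an illustrative choice; the flashcard does not prescribe a method). The training samples, labels, and function names are hypothetical.

```python
from collections import defaultdict


def fit_centroids(samples):
    """Learn the model: one mean feature vector (centroid) per class label."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(centroids[label], features))


training = [
    ((1.0, 1.0), "low_risk"),
    ((2.0, 3.0), "low_risk"),
    ((8.0, 9.0), "high_risk"),
    ((9.0, 7.0), "high_risk"),
]
model = fit_centroids(training)
label_a = predict(model, (7.0, 8.0))
label_b = predict(model, (2.0, 2.0))
```

The labeled samples play the role of objects with known class labels; `predict` is then applied to objects whose label is unknown.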
Define “Regression” in the context of data mining.
Regression is used to predict missing or unavailable numerical data values, rather than class labels, by modeling continuous-valued functions.
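A minimal sketch of regression, assuming the simplest case of a straight line fitted by ordinary least squares to one input variable. The sample data and function name are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


years_experience = [1, 2, 3, 4]
salary = [3, 5, 7, 9]  # made-up numeric target values
slope, intercept = fit_line(years_experience, salary)

# Predict the unavailable numeric value for a new input.
predicted = slope * 5 + intercept
```

Unlike classification, the prediction is a continuous number, not a class label.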