Chapter 1 Flashcards
How much human-made information does Google estimate exists in the world today?
Google estimates that there are 300 exabytes (300 followed by 18 zeros) of human-made information in the world today.
How does the amount of human-made information today compare to just four years ago?
Only four years ago, there were just 30 exabytes of human-made information, which means we’ve seen a tenfold increase in a relatively short span of time.
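The tenfold figure, and the yearly growth rate it would imply, can be checked with a few lines of arithmetic. This is a minimal sketch that assumes steady compound growth over the four years, which the estimate itself does not claim; the variable names are illustrative.

```python
# Figures from the flashcard, in exabytes.
exabytes_then = 30    # four years ago
exabytes_now = 300    # today
years = 4

# Overall growth factor across the whole period.
growth_factor = exabytes_now / exabytes_then

# Implied yearly factor, assuming (hypothetically) steady compound growth.
annual_factor = growth_factor ** (1 / years)

print(f"overall growth: {growth_factor:.0f}x")
print(f"implied yearly growth: {(annual_factor - 1) * 100:.0f}%")
```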
What has led to the explosive growth of available data volume in today’s world?
The explosive growth of available data volume is a result of the computerization of society and the rapid development of powerful data collection and storage tools.
Why is data mining important in today’s world?
Data mining is important because we live in a world where vast amounts of data are collected daily, and analyzing this data is crucial for gaining insights and making informed decisions in various fields.
Big data consists of:
- Network
- Collection
- Storage
- Research
- Analysis
- Volume
- Visualization
- Cloud technology
What is the relationship between data, information, and knowledge in the context of data mining?
Data is the raw facts, while information involves patterns and relationships within data. Knowledge, on the other hand, is the understanding of a subject gained by synthesizing information to identify historical patterns and future trends.
Why is data mining essential in today’s business world, and what types of data do businesses typically deal with?
Data mining is essential in the business world because companies handle vast data sets, including sales transactions, stock trading records, product descriptions, sales promotions, company profiles, and customer feedback. For instance, large retailers like Wal-Mart manage hundreds of millions of transactions per week across their global branches.
Can you explain the concept of the Data Pyramid and its components briefly?
The Data Pyramid represents the hierarchy from raw data at the base to wisdom at the top. It includes data, information (patterns and relationships within data), and knowledge (understanding gained from synthesizing information). Wisdom involves using knowledge to make informed decisions.
What is the significance of understanding historical patterns and future trends in the context of data mining?
Understanding historical patterns and future trends is significant in data mining because it allows businesses to make informed decisions, optimize strategies, and respond effectively to changing market conditions based on the insights gained from data analysis.
How did data mining contribute to President Obama’s victory in the 2012 presidential election?
Data mining helped identify likely voters and predict polling outcomes, enabling efficient allocation of campaign resources.
What is the primary purpose of developing data mining tools?
The primary purpose of developing data mining tools is to uncover valuable information from large datasets and transform it into organized knowledge.
What factors have fueled the remarkable growth of data mining and knowledge discovery?
Factors contributing to the growth of data mining include data warehousing, increased data access from web sources, global economic competition, improved computing power, and the availability of commercial data mining software.
How is data mining characterized, and what makes it promising?
Data mining is characterized as a field that turns data into knowledge and is described as young, dynamic, and promising due to its ability to extract valuable insights from data.
What are the key factors that have driven the remarkable growth in the field of data mining and knowledge discovery?
The key factors driving the remarkable growth in data mining and knowledge discovery include data warehousing, increased access to data from web sources, competitive pressures in a globalized economy, the tremendous growth in computing power and storage capacity, and the availability of commercial data mining software suites.
Why might the term “knowledge mining” be less accurate in describing the process of extracting insights from large datasets?
The term “knowledge mining” may be less accurate because it may not fully convey the emphasis on extracting insights from large volumes of data. This is why the term “knowledge discovery from data,” or KDD, is often used instead.
In the knowledge discovery process, what does the term “Data Warehouses” typically refer to?
“Data Warehouses” typically refers to centralized storage systems for large amounts of data.
What does data mining involve in the knowledge discovery process?
In the knowledge discovery process, data mining involves discovering patterns and valuable insights from extensive datasets.
After data mining, what is the subsequent step in the process of knowledge discovery?
The subsequent step after data mining in the process of knowledge discovery is identifying patterns within the data.
What is the ultimate outcome of the knowledge discovery process, particularly in relation to the identified patterns?
The ultimate outcome of the knowledge discovery process, especially regarding the identified patterns, is the generation of valuable knowledge.
What is the primary objective of data mining?
The primary objective of data mining is to discover meaningful new correlations, patterns, and trends within large datasets.
Can you name some of the technologies and techniques commonly used in data mining?
Some of the technologies and techniques commonly used in data mining include pattern recognition technologies, statistical methods, and mathematical techniques.
What Kinds of Data Can Be Mined?
As a general technology, data mining can be applied to any kind of data as long as the data are meaningful for a target application. The most basic forms of data for mining applications are database data, data warehouse data, and transactional data.
What can data mining systems analyze when mining relational databases?
Data mining systems, when mining relational databases, can analyze trends or data patterns. For example, they can predict credit risk based on customer data or detect deviations in sales.
How can data mining be applied to predict credit risk for new customers?
Data mining can predict credit risk for new customers by analyzing factors such as income, age, and previous credit information.
What is the role of data mining in detecting deviations in sales data?
Data mining plays a role in detecting deviations in sales data by identifying items with sales that significantly differ from what is expected compared to the previous year. These deviations can then be further investigated.
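The deviation check described above can be sketched in a few lines. This is a simplified illustration, not a standard algorithm: it treats last year's sales as the expected value and flags any item whose relative change exceeds a threshold. The item names, figures, and the 50% threshold are all hypothetical.

```python
# Hypothetical yearly unit sales per item: (last_year, this_year).
sales = {
    "laptop": (1000, 1040),
    "printer": (400, 150),
    "monitor": (600, 630),
    "webcam": (200, 520),
}

def find_deviations(sales, threshold=0.5):
    """Return items whose relative change from last year exceeds the threshold."""
    flagged = {}
    for item, (last_year, this_year) in sales.items():
        change = (this_year - last_year) / last_year
        if abs(change) > threshold:
            flagged[item] = change
    return flagged

deviations = find_deviations(sales)
for item, change in deviations.items():
    print(f"{item}: {change:+.1%} versus last year")
```

The flagged items are only candidates for further investigation; the sketch says nothing about why the sales changed.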
Why are relational databases considered important in the context of data mining?
Relational databases are important because they are among the most commonly available and richest information repositories, which makes them a significant source of data for data mining analysis.
What is the primary function of a data warehouse?
The primary function of a data warehouse is to serve as a repository for information collected from multiple sources, which is stored under a unified schema and typically resides at a single site.
How does the content of a record in a transactional database differ from that in a data warehouse?
In a transactional database, each record captures a transaction, such as a customer’s purchase or a flight booking, and includes a unique transaction identity number (trans ID) and a list of the items involved in the transaction. In contrast, a data warehouse may store data from various sources related to transactions, including additional information like item descriptions, details about salespeople, or branch information.
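The contrast between the two record types can be made concrete with a small sketch. It assumes a hypothetical `Transaction` class and two made-up lookup tables standing in for the extra sources a warehouse might integrate; none of these names come from a real system.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Transaction:
    trans_id: str       # unique transaction identity number
    items: List[str]    # IDs of the items involved in the transaction

# Hypothetical transactional records.
transactions = [
    Transaction("T100", ["I1", "I3"]),
    Transaction("T200", ["I2"]),
]

# Hypothetical extra sources that a warehouse might bring together.
item_descriptions = {"I1": "laptop", "I2": "printer", "I3": "mouse"}
branch_of_transaction = {"T100": "Vancouver", "T200": "Chicago"}

def to_warehouse_rows(transactions):
    """Flatten transactions into rows enriched with item and branch details."""
    rows = []
    for t in transactions:
        for item_id in t.items:
            rows.append({
                "trans_id": t.trans_id,
                "item_id": item_id,
                "item_description": item_descriptions[item_id],
                "branch": branch_of_transaction[t.trans_id],
            })
    return rows

warehouse_rows = to_warehouse_rows(transactions)
for row in warehouse_rows:
    print(row)
```

Each transactional record holds only an ID and an item list, while every warehouse row carries the same set of fields drawn from several sources.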
What is the typical structure of data in a data warehouse?
Data in a data warehouse is typically structured under a unified schema, bringing together data from various sources into a single, organized repository.
What kind of information is usually stored in a transactional database?
Transactional databases typically store information related to individual transactions, including transaction details such as the items purchased, customer information, and transaction identity numbers.
What distinguishes a data warehouse from a transactional database in terms of its purpose?
The primary purpose of a data warehouse is to consolidate and store data from multiple sources for analysis and reporting, whereas a transactional database primarily focuses on capturing and managing individual transactions in real-time.
What types of data fall into the category of spatial data?
Besides textual data, what other forms of data are included in hypertext and multimedia data?
Hypertext and multimedia data encompass data types such as images, videos, and audio, in addition to textual data.
How can data mining tasks be categorized based on their objectives?
Data mining tasks can be categorized into two main categories: descriptive and predictive.
What is the main goal of descriptive mining tasks?
Descriptive mining tasks aim to characterize properties of the data in a target dataset.
What is the primary objective of predictive mining tasks?
Predictive mining tasks involve performing induction on current data in order to make predictions about future or unknown data.
What is the primary purpose of data mining functionalities?
Data mining functionalities are used to specify the types of patterns to be discovered in data mining tasks.
What kinds of patterns can be mined using data mining functionalities?
Data mining functionalities can be used to mine various kinds of patterns, including outlier detection, association rules, classification, clustering, and regression.
What are outliers in a dataset, and how can they be described?
Outliers in a dataset are data objects that do not conform to the general behavior or model of the data. They are data points that are considerably different from the rest of the data.
Provide examples of real-world applications where outlier mining or anomaly mining is valuable.
Outlier mining or anomaly mining is valuable in applications such as credit card fraud detection and network intrusion detection, where identifying unusual patterns is crucial for security and fraud prevention.
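One simple way to flag values that differ considerably from the rest is a z-score test. The sketch below is only one of many possible outlier detection techniques and is not prescribed by the flashcards; the transaction amounts and the threshold of 2.0 are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical card transaction amounts; one is far from the rest.
amounts = [42.0, 38.5, 45.0, 40.0, 39.5, 41.0, 950.0, 43.5]

def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

outliers = zscore_outliers(amounts)
print(outliers)
```

A z-score test is sensitive to the very outliers it is looking for, because they inflate the mean and standard deviation, so real fraud or intrusion detection systems generally rely on more robust methods.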
What is the primary goal of classification in data mining?
The primary goal of classification in data mining is to find a model or function that describes and distinguishes data classes.
For example, a credit card transaction can be normal or fraudulent.
Similarly, an email can be normal or spam.
How is the derived classification model represented, and what is its practical use?
The derived classification model can be represented in various forms, including classification rules (IF-THEN rules), decision trees, mathematical formulae, or neural networks. It is used to predict the class label of objects for which the class label is unknown.
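A classification model expressed as IF-THEN rules might look like the sketch below. The rules are hand-written for illustration rather than learned from labeled training data, and the function name, features, and cut-off values are all hypothetical.

```python
def classify_transaction(amount, country_matches_home, attempts_last_hour):
    """Apply hand-written IF-THEN rules and return a class label."""
    # IF the amount is very large AND the country is unusual THEN fraudulent.
    if amount > 5000 and not country_matches_home:
        return "fraudulent"
    # IF there were many attempts in the last hour THEN fraudulent.
    if attempts_last_hour > 10:
        return "fraudulent"
    # Otherwise fall through to the default class.
    return "normal"

# Predict the class label of transactions whose label is unknown.
label_a = classify_transaction(120.0, True, 1)
label_b = classify_transaction(9000.0, False, 2)
print(label_a, label_b)
```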
What is the primary distinction between classification and regression in data mining?
Classification predicts categorical (discrete, unordered) labels, while regression models continuous-valued functions. Regression is used to predict missing or unavailable numerical data values, rather than discrete class labels.
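To illustrate the regression side of this distinction, the sketch below fits a straight line by ordinary least squares and uses it to predict a numeric value. Least squares is one common regression technique chosen here as an assumption, and the spend and sales figures are made up.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend versus units sold.
spend = [1.0, 2.0, 3.0, 4.0]
units = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(spend, units)

# Numeric prediction for a spend value that is not in the data.
predicted_units = slope * 5.0 + intercept
print(slope, intercept, predicted_units)
```

The output is a continuous number, whereas a classifier given the same input would return a discrete label.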
What does the term “prediction” encompass in the context of data mining?
In the context of data mining, the term “prediction” encompasses both numeric prediction (regression) and class label prediction (classification).
What is the primary objective of Association Rule Mining?
The primary objective of Association Rule Mining is to discover hidden relationships or associations between items in a dataset, often represented as “if-then” rules.
What is market basket analysis, and how does Association Rule Mining play a role in it?
Market basket analysis is an application of Association Rule Mining that helps retailers understand customer purchasing patterns and optimize product placement.
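Association rules are commonly scored with two measures, support and confidence, which the flashcards do not name but which are standard in market basket analysis. The sketch below computes both for the rule "if bread then milk" over a handful of hypothetical baskets.

```python
# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A retailer might act only on rules whose support and confidence both exceed chosen minimums, for example by placing the two products near each other.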
Which Technologies Are Used in Data Mining?
- Machine Learning
- Pattern recognition
- Visualization
- Algorithms
- High performance computing
- Information retrieval
- Data warehouse
- Database system
- Statistics
- Applications
Which Kinds of Applications Are Targeted?
- Telecommunication Industry
- Credit Card companies
- Insurance companies
- Retail & Marketing
- Medical companies
- Pharmaceutical
What are some of the key objectives or applications of Business Intelligence in various industries?
Business Intelligence is used to maximize the return on marketing campaigns, detect fraudulent transactions, automate the loan application process, and identify and treat the most valued customers.
What is the first phase in the CRISP-DM process?
The first phase in the CRISP-DM process is the Business Understanding Phase.
In which phase of CRISP-DM is data exploration and initial data collection performed?
Data exploration and initial data collection are performed in the Data Understanding Phase of CRISP-DM.
What is the primary goal of the Data Preparation Phase in CRISP-DM?
The primary goal of the Data Preparation Phase in CRISP-DM is to clean, transform, and preprocess the data for modeling.
During which phase of CRISP-DM are machine learning algorithms applied to the prepared data?
Machine learning algorithms are applied to the prepared data during the Modeling Phase of CRISP-DM.
In CRISP-DM, when is the model’s performance assessed and validated?
The model’s performance is assessed and validated in the Evaluation Phase of CRISP-DM.
What is the final phase of CRISP-DM where the results of the data mining process are put into practical use?
The final phase of CRISP-DM where the results are put into practical use is the Deployment Phase.
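The six phases from the preceding cards can be written down as an ordered enumeration. This sketch treats the process as strictly linear for simplicity, whereas real projects often loop back to earlier phases; the class and function names are illustrative.

```python
from enum import IntEnum

class CrispDmPhase(IntEnum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6

def next_phase(phase):
    """Return the phase that follows, or None after deployment."""
    if phase is CrispDmPhase.DEPLOYMENT:
        return None
    return CrispDmPhase(phase + 1)

# List the phases in order.
for phase in CrispDmPhase:
    print(phase.value, phase.name.replace("_", " ").title())
```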
What are data sets made up of?
Data sets are made up of data objects and their attributes.
What is the correspondence between data objects and attributes in a database?
The rows of a database correspond to the data objects, and the columns correspond to the attributes.
What is an attribute in data mining?
An attribute is a data field, representing a characteristic or feature of a data object.
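The row and column correspondence can be seen in a tiny in-memory table. This is a minimal sketch with a hypothetical customer data set, where each dictionary is one data object and each key is an attribute.

```python
# Hypothetical customer table: each dict is one data object (a row).
customers = [
    {"customer_id": 1, "age": 34, "income": 52000},
    {"customer_id": 2, "age": 51, "income": 87000},
    {"customer_id": 3, "age": 27, "income": 39000},
]

# The attributes are the column names shared by every data object.
attributes = list(customers[0].keys())

# Reading one attribute across all data objects gives a column.
ages = [row["age"] for row in customers]

print(attributes)
print(ages)
```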