Week 4 Flashcards
The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases.
What is Data Mining?
The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases.
What is the most critical ingredient for data mining?
Data
What types of data can be used?
Structured and unstructured
What is often the data source?
A consolidated data warehouse
Who is often the data miner?
An end user
What is essential for data mining tools?
The capabilities and ease of use of the tools.
What are predictions in data mining?
Tell the nature of future occurrences of certain events based on what has happened in the past, such as predicting the winner of the Super Bowl or forecasting the absolute temperature of a particular day.
What are associations in data mining?
Find the commonly co-occurring groupings of things, such as beer and diapers going together in market-basket analysis.
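Example (a minimal sketch, not a full association-rule algorithm): one simple way to surface co-occurring items is to count how often each pair of items shares a basket and divide by the number of baskets, a quantity usually called support. The transactions and the helper name pair_support below are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, invented for illustration only.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
    {"bread", "chips"},
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each item pair."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so each pair is counted under one canonical order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


support = pair_support(transactions)
top_pair = max(support, key=support.get)
print(top_pair, support[top_pair])
```

In this toy data the beer and diapers pair appears in three of the five baskets, so it has the highest support.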
What are clusters in data mining?
Identify natural groupings of things based on their known characteristics, such as assigning customers to different segments based on their demographics and past purchase behaviors.
What types of techniques are part of prediction?
Classification and Regression
What types of techniques are part of association?
Link analysis and Sequence analysis
What types of techniques are part of clustering?
Outlier analysis
What do DM and statistics each start with?
DM starts with a loosely defined discovery statement; statistics starts with a well-defined proposition and hypothesis.
What set of data do DM and statistics each use?
DM uses all existing data to discover novel patterns and relationships; statistics collects a sample of data to test the hypothesis.
What are measures of dispersion?
Measures of the degree of variation in a given variable, such as the range, variance, and standard deviation.
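Example (a small sketch using Python's standard statistics module): three common dispersion measures computed for one variable. The values are made up, and the population forms of variance and standard deviation are an assumption here; the sample forms would be statistics.variance and statistics.stdev.

```python
import statistics

# Hypothetical observations of a single variable.
values = [4, 8, 6, 5, 7]

# Range: distance between the largest and smallest observation.
value_range = max(values) - min(values)

# Population variance: mean squared deviation from the mean.
variance = statistics.pvariance(values)

# Population standard deviation: square root of the variance.
std_dev = statistics.pstdev(values)

print(value_range, variance, round(std_dev, 3))
```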
What is regression used for?
Regression is used to characterize relationships between explanatory (input) and response (output) variables.
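Example (a minimal sketch of simple linear regression with one input variable, fit by ordinary least squares): the slope and intercept describe how the response changes with the explanatory variable. The data and the names fit_line, hours, and scores are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations of x and y from their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations of x from its mean.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied (input) and exam score (output).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 61, 63, 70]

slope, intercept = fit_line(hours, scores)
print(slope, intercept)
```

With these invented numbers, each additional hour is associated with a 4.5-point higher score.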
What does R² provide?
Information about the fit of the model: the higher the R², the more of the variability in the response variable is explained by the model.
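Example (a sketch of the usual definition, R² = 1 - SS_res / SS_tot): comparing predictions that track the data closely against predictions that always return the mean. The numbers and the helper name r_squared are illustrative assumptions.

```python
def r_squared(actual, predicted):
    """Return 1 - SS_res / SS_tot for paired actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: variability the model fails to explain.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: variability of the data around its mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical observed values and two sets of predictions.
actual = [3.0, 5.0, 7.0, 9.0]
close_fit = [3.1, 4.9, 7.2, 8.8]
mean_only = [6.0, 6.0, 6.0, 6.0]

r2_close = r_squared(actual, close_fit)
r2_mean = r_squared(actual, mean_only)
print(round(r2_close, 3), round(r2_mean, 3))
```

Predicting the mean for every observation explains none of the variability, so its R² is 0, while the close fit scores near 1.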