Introduction Flashcards

Question 1

Q

What is Data Mining?

Answer

A

Large quantities of data are collected
The data contains interesting patterns

Data Mining helps to

discover patterns in data
use the patterns for decision making

Question 2

Q

Examples of large data sources that set the foundation for data mining

Answer

A

Law enforcement agencies -> terrorist detection
Facebook -> interest and behavior of its users
Sloan Digital Sky Survey -> predict type of sky object

Question 3

Q

What is the problem with the large amount of data available?

Answer

A

A lot of data is collected
Only a small amount can be looked at by humans
We are interested in the patterns, and not the data itself

Question 4

Q

Definition of Data Mining

Answer

A

Exploration & analysis of large quantities of data in order to discover meaningful patterns

Question 5

Q

Data Mining methods

Answer

A

Detect interesting patterns
Support human decision making with patterns
Predict the outcome of a future observation based on patterns

Question 6

Q

Origins of data mining - relation of data mining to other areas

Answer

A

Combination of those areas:

Statistics
Machine Learning / AI
Database Systems

Question 7

Q

Origins of data mining - motivating challenges

Answer

A

Large amount of data
High dimensionality of data
Heterogeneous and complex data
Exploratory analysis that goes beyond the hypothesize-and-test paradigm

Question 8

Q

What are the two types of data mining tasks, and what is the corresponding ML terminology?

Answer

A

Descriptive Tasks (Unsupervised)
Goal: Find patterns in the data

Predictive Tasks (Supervised)
Goal: Predict unknown values of a variable
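
The difference can be sketched on toy data. This is a minimal illustration, assuming invented customer ages and two hypothetical helpers (split_into_two_groups, predict_label); it is not a standard library routine. The descriptive part sees no labels and only looks for structure, while the predictive part uses known labels to guess an unknown one.

```python
# Toy data (invented for illustration): customer ages.
ages = [21, 23, 25, 58, 61, 64]

# Descriptive (unsupervised): no labels are given; we only look for structure.
# Here: split the sorted values at the largest gap between neighbours.
def split_into_two_groups(values):
    ordered = sorted(values)
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]

young, old = split_into_two_groups(ages)

# Predictive (supervised): labels are known for past observations and
# are used to predict the unknown label of a new observation.
labelled = [(21, "student"), (23, "student"), (25, "student"),
            (58, "retiree"), (61, "retiree"), (64, "retiree")]

def predict_label(age, training_data):
    # 1-nearest-neighbour: copy the label of the closest known example.
    nearest = min(training_data, key=lambda pair: abs(pair[0] - age))
    return nearest[1]

prediction = predict_label(30, labelled)
```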

Question 9

Q

Data Mining Tasks

Answer

A

Cluster Analysis (Descriptive)
Classification (Predictive)
Regression (Predictive)
Association Analysis (Descriptive)

Question 10

Q

The most used methods in practice

Answer

A

Regression
Decision Trees
Clustering
Random Forests

Question 11

Q

The steps of the data mining process

Answer

A

1) Data selection
2) Data preprocessing
3) Data transformation
4) Data mining
5) Interpretation / Evaluation of patterns
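
The five steps can be read as a chain of functions. The sketch below is only an illustration under stated assumptions: the records, the spend cut-off of 200, and all function names (select, preprocess, transform, mine, evaluate) are invented, and the "mining" step is a deliberately simple threshold search.

```python
# Hypothetical raw records: (customer_id, age, monthly_spend); None = missing.
raw = [
    (1, 25, 120.0),
    (2, None, 80.0),
    (3, 40, 300.0),
    (3, 40, 300.0),   # duplicate record
    (4, 60, 310.0),
]

def select(records):
    # 1) Data selection: keep only the attributes relevant to the task.
    return [(age, spend) for _, age, spend in records]

def preprocess(records):
    # 2) Data preprocessing: drop rows with missing values and duplicates.
    cleaned = []
    for row in records:
        if None not in row and row not in cleaned:
            cleaned.append(row)
    return cleaned

def transform(records):
    # 3) Data transformation: binarize spend into high (>= 200) or not.
    return [(age, spend >= 200) for age, spend in records]

def mine(records):
    # 4) Data mining: find the age threshold that best separates high spenders.
    best_threshold, best_hits = None, -1
    for threshold, _ in records:
        hits = sum((age >= threshold) == high for age, high in records)
        if hits > best_hits:
            best_threshold, best_hits = threshold, hits
    return best_threshold, best_hits

def evaluate(pattern, records):
    # 5) Interpretation / evaluation: how often does the pattern hold?
    # (Measured on the same data here; a real project would use held-out data.)
    _, hits = pattern
    return hits / len(records)

data = transform(preprocess(select(raw)))
pattern = mine(data)
accuracy = evaluate(pattern, data)
```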

Question 12

Q

Questions that come up for data selection

Answer

A

What data is useful for the task?
What data is available?
How good is the data quality?

Question 13

Q

What does exploration / profiling mean?

Answer

A

Develop initial understanding of the data
Calculate basic summarization statistics
Visualize the data
Identify data problems (outliers, missing values, duplicate records)
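
A first profiling pass might look like the sketch below. It assumes invented (id, value) records and uses the 1.5 * IQR rule as one common convention for flagging outliers; the card itself does not prescribe a rule. Visualization is left out.

```python
import statistics

# Invented records: (id, value); None marks a missing value.
records = [("a", 12.0), ("b", 14.0), ("c", 13.0), ("d", None),
           ("e", 15.0), ("c", 13.0), ("f", 90.0), ("g", 14.0)]

values = [v for _, v in records]
present = [v for v in values if v is not None]

# Missing values.
missing_count = len(values) - len(present)

# Basic summary statistics.
mean = statistics.mean(present)
median = statistics.median(present)

# Duplicate records: identical rows that occur more than once.
duplicate_records = sorted({r for r in records if records.count(r) > 1})

# Outliers via the 1.5 * IQR rule (an assumed convention).
q1, _, q3 = statistics.quantiles(present, n=4)
iqr = q3 - q1
outliers = [v for v in present if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
```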

Question 14

Q

What does preprocessing and transformation mean?

Answer

A

Transform data into a representation that is suitable for the chosen data mining method

Data integration and preparation take 70-80% of the time in a data mining project

Question 15

Q

Important transformation aspects

Answer

A

Scale of attributes (nominal, ordinal, numeric)
Number of dimensions (represent the relevant information with fewer attributes)
Amount of data (determines hardware requirements)

Question 16

Q

Transformation methods

Answer

A

Discretization and binarization
feature subset selection / dimensionality reduction
attribute transformation
aggregation, sampling
integrate data from multiple sources
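
Two of the methods above, discretization and binarization, can be sketched as follows. The data, the age boundaries (18 and 65), and the helper names (discretize, one_hot) are assumptions made for illustration only.

```python
# Invented attribute values.
ages = [5, 17, 30, 45, 70]
colours = ["red", "green", "red", "blue", "green"]

def discretize(value, boundaries, labels):
    # Discretization: map a numeric value to the first interval it falls into.
    for upper, label in zip(boundaries, labels):
        if value < upper:
            return label
    return labels[-1]

age_groups = [discretize(a, [18, 65], ["child", "adult", "senior"]) for a in ages]

def one_hot(value, categories):
    # Binarization: one 0/1 attribute per category.
    return [1 if value == c else 0 for c in categories]

categories = sorted(set(colours))
binary_rows = [one_hot(c, categories) for c in colours]
```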

Question 17

Q

Concept of Data mining

Answer

A

Input: Preprocessed Data
Output: Model / Patterns

Apply data mining method
Evaluate resulting model
Iterate (change the parameter settings / use other methods, improve preprocessing, increase quality of training data)
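
The apply / evaluate / iterate loop can be sketched with a tiny k-nearest-neighbour classifier. The data and the candidate values of k are invented for illustration, and k-NN is just one possible method, chosen here because it fits in a few lines.

```python
from collections import Counter

# Invented, already preprocessed data: (feature, label).
train = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (2.6, "b"),
         (6.0, "b"), (6.5, "b"), (7.0, "b")]
test = [(1.2, "a"), (2.4, "a"), (6.2, "b")]

def knn_predict(x, training_data, k):
    # Apply the method: majority label among the k nearest training points.
    nearest = sorted(training_data, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(training_data, test_data, k):
    # Evaluate the resulting model on data not used for training.
    correct = sum(knn_predict(x, training_data, k) == y for x, y in test_data)
    return correct / len(test_data)

# Iterate: try several parameter settings and keep the best one.
results = {k: accuracy(train, test, k) for k in (1, 3, 5)}
best_k = max(results, key=results.get)
```

In practice the loop would also cover the other options named on the card, such as switching methods or improving the preprocessing.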

Question 18

Q

Description of the deployment step

Answer

A

Use the model in the business context

- Keep iterating to improve the model

Question 19

Q

How do data scientists spend their days?

Answer

A

60% cleaning data
19% collecting data sets
9% mining data for patterns

Question 20

Q

Common data mining software

Answer

A

Python
RapidMiner
scikit-learn
SQL
Anaconda

Question 21

Q

Advantages of RapidMiner

Answer

A

Visual modeling of data mining pipelines

- Faster learning curve for applying data mining methods

Question 22

Q

Different attribute types

Answer

A

Categorical (qualitative)

Nominal: Values of the attribute can only be distinguished from one another (equal or unequal)
Ordinal: Values of the attribute can be ordered (>,

(22 cards)