Lecture 1 Flashcards

1
Q

What is data mining?

A
identifying
novel
potentially useful
ultimately understandable patterns in data.
(Piatetsky-Shapiro)
2
Q

What fields is data mining a combination of?

A
Machine learning
Application domain
Databases
Statistics
Visualization
3
Q

What is the difference between data mining and statistics?

A

datasets in data mining typically are:

  • samples
  • larger
  • noisier, incomplete, heterogeneous

statistics

  • often deals with the whole population
  • often is concerned with hypothesis testing
4
Q

What is big data?

A

data that is too large to be analyzed with today’s resources

5
Q

What are some questions that we can answer with data mining techniques?

A

Is this object a star or galaxy?
Are customers likely to buy bread together with milk?
What is the value for a particular stock going to be in …
What book is a customer likely going to buy?

6
Q

What are the 7 main data mining tasks?

A
Sequential pattern discovery
Outlier detection
Association rule discovery
Regression
Clustering
Visualization
Classification
7
Q

Describe the data mining task “classification”

A

Assigning a category to each object in a data set

8
Q

Describe the data mining task “visualization”

A

Visualizing the data. What does the data look like?

9
Q

Describe the data mining task “clustering”

A

Determining groups of objects in a data set.
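
As a minimal sketch of how groups can be determined, the snippet below runs plain k-means on a few made-up one-dimensional points. k-means is only one clustering technique among many, and the data, the starting centers, and the helper name `kmeans_1d` are assumptions for illustration.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on 1-D points, starting from the given centers."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its old center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Made-up data with two visually obvious groups.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, groups = kmeans_1d(points, centers=[1.0, 10.0])
print(centers, groups)
```

Note that the number of groups (here two, via the two starting centers) has to be chosen up front in this sketch.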

10
Q

Describe the data mining task “Association Rule Discovery”

A

Determining which objects belong together in a data set.
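
One common way to quantify "belonging together" is the support and confidence of a rule such as bread → milk. The sketch below computes both over a handful of made-up shopping baskets. The baskets and the helper names `support` and `confidence` are assumptions for illustration.

```python
# Toy market baskets (made-up data for illustration only).
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) over the baskets."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
print(rule_support, rule_confidence)
```

Here bread and milk appear together in half of the baskets, and two out of the three baskets containing bread also contain milk.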

11
Q

Describe the data mining task “Outlier Detection”

A

Determining which objects do not belong with the rest.
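
A minimal sketch of one simple way to flag objects that do not belong: mark values that lie more than a chosen number of standard deviations from the mean (a z-score rule). The readings, the threshold of 2, and the helper name `find_outliers` are assumptions for illustration. Other outlier detection methods exist.

```python
from statistics import mean, pstdev

# Made-up sensor readings with one suspicious value.
readings = [10, 11, 9, 10, 12, 10, 50]


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds `threshold`."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - centre) / spread > threshold]


outliers = find_outliers(readings)
print(outliers)
```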

12
Q

Describe the data mining task “Sequential Pattern Discovery”

A

Determining what happens in the data over time.

13
Q

Describe the data mining task “Regression”

A

Assigning a numerical value to each object in the data
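
As a minimal sketch, the snippet below fits a straight line by ordinary least squares and uses it to assign a numerical value to a new object. Linear regression is only one regression technique. The data and the helper name `fit_line` are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data lying exactly on the line y = 2x.
sizes = [1, 2, 3, 4]
prices = [2, 4, 6, 8]

slope, intercept = fit_line(sizes, prices)
prediction = slope * 5 + intercept  # value assigned to a new object with x = 5
print(slope, intercept, prediction)
```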

14
Q

Explain the data mining process

A
Collect data
Prepare data
Build model
Evaluate model
Deploy model
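
The five steps above can be sketched end to end on a toy problem. Everything here is an assumption for illustration: the made-up temperature readings, the normal/abnormal labels, and the very simple threshold model. A real project would use far more data and a proper learning algorithm.

```python
# 1. Collect data: (temperature, label) pairs; None marks a missing reading.
raw = [(20.0, "normal"), (21.0, "normal"), (None, "normal"), (35.0, "abnormal"),
       (36.0, "abnormal"), (22.0, "normal"), (34.0, "abnormal")]

# 2. Prepare data: drop incomplete records, then hold out the last two for testing.
clean = [(t, label) for t, label in raw if t is not None]
train, test = clean[:-2], clean[-2:]


def mean(values):
    return sum(values) / len(values)


# 3. Build model: a threshold halfway between the two class means.
normal_mean = mean([t for t, label in train if label == "normal"])
abnormal_mean = mean([t for t, label in train if label == "abnormal"])
threshold = (normal_mean + abnormal_mean) / 2


def predict(temperature):
    return "abnormal" if temperature > threshold else "normal"


# 4. Evaluate model: accuracy on the held-out records.
accuracy = mean([1.0 if predict(t) == label else 0.0 for t, label in test])

# 5. Deploy model: in practice `predict` would be called on new readings.
print(threshold, accuracy, predict(30.0))
```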
15
Q

How must data be prepared for neural networks?

A

wants numbers (categorical attributes must be transformed)
likes data to be scaled
does not like noisy data, especially for small datasets

16
Q

How must data be prepared for decision trees?

A

work better with discrete attributes that have a small number of possible values
do not care about scaling
can handle irrelevant or redundant attributes, although these may lead to large trees

17
Q

Can nearest-neighbour data mining techniques handle noise well?

A

Nearest-neighbour approaches can handle noise if a certain parameter (the number of neighbours, k) is adjusted
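
A minimal sketch of why the number of neighbours matters: with k = 1 a single mislabelled training point decides the answer, while a larger k lets the surrounding points outvote it. The training data, the query value, and the helper name `knn_predict` are assumptions for illustration.

```python
from collections import Counter


def knn_predict(training, query, k):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(training, key=lambda pair: abs(pair[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up 1-D training set; the point at 1.5 carries a noisy label.
training = [
    (1.0, "a"), (1.2, "a"), (1.4, "a"),
    (1.5, "b"),  # noise: sits in the middle of the "a" region
    (3.0, "b"), (3.2, "b"), (3.4, "b"),
]

with_k1 = knn_predict(training, 1.55, k=1)  # follows the noisy point
with_k3 = knn_predict(training, 1.55, k=3)  # the noisy point is outvoted
print(with_k1, with_k3)
```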

18
Q

How must data be prepared for distance-based approaches?

A

do not work well if the attributes are equally weighted and typically work with numerical data only
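
One way to read the weighting remark: when raw numerical attributes enter a distance unchanged, an attribute measured on a large scale dominates the result. The sketch below uses made-up customer records (age in years, income in dollars) and min-max scaling, which is one common remedy and an assumption here, not necessarily the preparation the lecture had in mind.

```python
from math import dist

# Made-up customers: (age in years, income in dollars).
customers = {
    "a": (25, 50_000),
    "b": (60, 50_500),
    "c": (26, 52_000),
    "d": (40, 90_000),
}


def min_max_scale(records):
    """Rescale every attribute to the range [0, 1].

    Assumes each attribute takes at least two distinct values.
    """
    columns = list(zip(*records.values()))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return {
        name: tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for name, row in records.items()
    }


scaled = min_max_scale(customers)

# Nearest neighbour of customer "a" before and after scaling.
raw_nearest = min("bcd", key=lambda n: dist(customers["a"], customers[n]))
scaled_nearest = min("bcd", key=lambda n: dist(scaled["a"], scaled[n]))
print(raw_nearest, scaled_nearest)
```

On the raw values, income differences swamp the 35-year age gap, so "b" looks closest to "a". After scaling, "c" is closest.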

19
Q

Can expectation-maximization data mining techniques deal with missing data?

A

Expectation-maximization approaches can deal with missing data, but k-means techniques require missing values to be substituted first
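
A minimal sketch of substituting missing values before running a technique such as k-means. It fills each gap with the mean of the observed values in the same column, which is one simple substitution strategy among several. The data and the helper name `impute_column_means` are assumptions for illustration.

```python
def impute_column_means(rows):
    """Replace each None with the mean of the observed values in its column."""
    columns = list(zip(*rows))
    means = []
    for col in columns:
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]


# Made-up records; None marks a missing value.
rows = [[1.0, 2.0], [None, 4.0], [3.0, None]]
filled = impute_column_means(rows)
print(filled)
```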

20
Q

Why are data transformations used in data mining?

A

data must be transformed into an appropriate type for specific data mining techniques to work properly.

e.g. neural networks only accept numerical values
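
A minimal sketch of one such transformation: one-hot encoding, which turns a categorical attribute into numerical columns that a neural network can consume. The colour values and the helper name `one_hot_encode` are assumptions for illustration.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector with a single 1."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(categories)
print(encoded)
```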

21
Q

What are the two types of data mining techniques?

A

Predictive (supervised)

Descriptive (unsupervised)

22
Q

Describe predictive (supervised) data mining techniques

A

predict (discrete or continuous) class attributes based on other attribute values

this is like learning from a teacher

23
Q

Describe descriptive (unsupervised) data mining techniques

A

discover structure of data without prior knowledge of class labels.

24
Q

Is the following data mining technique descriptive or predictive?

Classification

A

Predictive

25
Q

Is the following data mining technique descriptive or predictive?

Visualization

A

Descriptive

26
Q

Is the following data mining technique descriptive or predictive?

Clustering

A

Descriptive

27
Q

Is the following data mining technique descriptive or predictive?

Association Rule Discovery

A

Descriptive

28
Q

Is the following data mining technique descriptive or predictive?

Outlier detection

A

Predictive or Descriptive

29
Q

Which data mining technique is the following:

is this object a star or a galaxy?

A

Classification

30
Q

Which data mining technique is the following:

What book is this customer likely to buy? Are there additional books we should recommend?

A

Clustering & Visualization

31
Q

Which data mining technique is the following:

How many groups of customers are there in the data we collected?

A

Clustering & Visualization

32
Q

Which data mining technique is the following:

Are customers likely to buy bread together with milk?

A

Association rule discovery

33
Q

Which data mining technique is the following:

Does the traffic we currently see in our network contain any malicious packets?

A

Outlier detection

34
Q

What is a class label?

A

A label which identifies the class of the observation in a data set.

A class is just a way to identify a specific type of observation.

An example might be a data set containing both normal and abnormal readings from a sensor where each sensor reading is a combination of measurements (e.g. temperature, humidity, etc.).

The class label for a given sensor reading, in this case, would be normal or abnormal and the measurements would have to meet certain criteria to be identified as either class.

This would be a two-class situation. Data sets can also have more than two classes of observations, or none at all.

35
Q

What is an application domain?

A

An application domain is simply the domain under study.
For example, you might be looking at data mining techniques for biological data, industry, or astronomy, to name a few. In these cases biology, industry, and astronomy in turn would be the application domains for each of the previously mentioned data sources.