Lecture 7 - BI and Data Mining 2 Flashcards

1
Q

Overfitting results in?

A

Decision trees that are more complex than necessary

2
Q

Once a tree overfits, the training error no longer provides?

A

A good estimate of how well the tree will perform on previously unseen records

3
Q

What can cross-validation be used for?

A

Cross-validation can be used to compare the performance of different machine learning models on the same data set

4
Q

What is leave-one-out?

A

k-fold cross-validation where k = number of samples, so each sample is held out exactly once as the test set
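
The relationship between k-fold and leave-one-out can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the function names `k_fold_splits` and `leave_one_out_splits` are made up for this example, and the folds are taken in index order without shuffling.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def leave_one_out_splits(n_samples):
    """Leave-one-out is k-fold with k equal to the number of samples."""
    return k_fold_splits(n_samples, n_samples)


for train, test in leave_one_out_splits(4):
    print("train:", train, "test:", test)
```

Each model being compared would be trained on `train` and scored on `test` for every split, and the scores averaged.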

5
Q

What is bootstrapping?

A

Random sampling with replacement

6
Q

Describe the bootstrap method.

A

Refers to random sampling with replacement.
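
Sampling with replacement can be illustrated with the standard `random` module. This is a minimal sketch; `bootstrap_sample` is a hypothetical helper name, and drawing as many items as the original data set contains is a common convention assumed here rather than something the card states.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]


rng = random.Random(0)  # fixed seed, only so that reruns give the same sample
data = list(range(10))
sample = bootstrap_sample(data, rng)
print("original:", data)
print("bootstrap sample:", sample)
```

Because items are put back after each draw, the sample may contain some records several times and omit others entirely.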

7
Q

What is a neural network?

A

An artificial neural network consists of a pool of simple processing units that communicate by sending signals to each other over a large number of weighted connections.

8
Q

What is the test stage in a NN?

A

Each unit performs a relatively simple job:
it receives input from neighbors or external sources and uses it to compute an output signal, which is propagated to other units

9
Q

What is the learning stage in a NN?

A

The task of adjusting the connection weights

10
Q

What is a feed-forward network?

A

Where the data flow from input to output units is strictly feed-forward.
The data processing can extend over multiple layers of units, but no feedback connections or connections between the units of the same layer are present.
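
The strictly forward data flow can be sketched as a loop over layers. This is an illustrative sketch only: the sigmoid activation, the bias terms, and all weight values below are assumptions chosen for the example, not part of the card.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """One layer: each unit sums its weighted inputs, adds a bias, applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def feed_forward(inputs, layers):
    """Pass the signal through the layers in order; nothing flows backward."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal


# Hypothetical network: 2 inputs -> 2 hidden units -> 1 output, made-up weights.
layers = [
    ([[0.5, -0.5], [0.3, 0.8]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]
print(feed_forward([1.0, 0.0], layers))
```

Each unit reads only the outputs of the previous layer, so there are no feedback connections and no connections within a layer.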

11
Q

Perceptron?

A

A single-layer feed-forward network consisting of one or more output neurons, each of which is connected through a weight w to all of the inputs x

12
Q

How does learning in Perceptrons work?

A

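The weights of the network are modified during the learning phase: whenever the output differs from the target, each weight is adjusted in proportion to the error and to its input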

13
Q

Perceptron convergence theorem?

A

If there exists a set of connection weights w* that is able to perform the transformation y = d(x), the perceptron learning rule will converge to some solution in a finite number of steps for any initial choice of the weights
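
The convergence behaviour can be observed on a small linearly separable problem. The sketch below is an assumption-laden illustration: it uses a 0/1 threshold unit with a bias, a learning rate of 1, zero initial weights, and the logical AND function as training data; none of these choices come from the card.

```python
def predict(weights, bias, x):
    """Threshold unit: output 1 if the weighted sum is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples, learning_rate=1.0, max_epochs=100):
    """Apply the perceptron learning rule until an epoch makes no mistakes."""
    n_inputs = len(samples[0][0])
    weights, bias = [0.0] * n_inputs, 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                # Move each weight in proportion to the error and its input.
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if mistakes == 0:
            return weights, bias, epoch
    raise RuntimeError("no convergence; the data may not be linearly separable")


and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias, epochs = train_perceptron(and_gate)
print("weights:", weights, "bias:", bias, "epochs:", epochs)
```

For data with no separating weights, such as XOR, no epoch is ever mistake-free and the function gives up after `max_epochs`.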

14
Q

What is Backpropagation?

A

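A learning rule for multi-layer networks with non-linear activation functions: the error at the output units is propagated backward through the network to adjust the weights of the hidden units. It is needed because multi-layer networks with a linear activation can classify only linearly separable inputs or, in the case of function approximation, represent only linear functions.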
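
The limitation of linear activations can be checked numerically: two stacked linear layers compute exactly what a single layer with the product weight matrix computes. The sketch below assumes layers without bias terms, and the weight values are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_layer(matrix, vector):
    """A layer with linear (identity) activation and no bias."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


w_hidden = [[1, 2], [0, -1], [3, 1]]  # 2 inputs -> 3 hidden units
w_output = [[1, 0, 2], [-1, 1, 0]]    # 3 hidden units -> 2 outputs
x = [4, 5]

two_layers = apply_layer(w_output, apply_layer(w_hidden, x))
one_layer = apply_layer(matmul(w_output, w_hidden), x)
print(two_layers, one_layer)  # the two results are identical
```

Adding hidden layers therefore gains nothing unless the activation is non-linear, which is the setting backpropagation is used for.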

15
Q

What is SVM?

A

Support Vector Machine

16
Q

Properties of SVM?

A

Flexibility in choosing a similarity function
Sparseness of solution when dealing with large data sets
Ability to handle large feature spaces
Overfitting can be controlled by soft margin approach

17
Q

SVM applications?

A

Text categorization
Image classification
Bioinformatics
Hand-written character recognition

18
Q

Machine learning focuses on?

A

Prediction, based on known properties learned from the training data

19
Q

Data mining focuses on?

A

The discovery of previously unknown properties in the data. This is the analysis step of Knowledge Discovery in Databases

20
Q

Data mining uses many?

A

Machine learning methods but often with a slightly different goal in mind

21
Q

Machine learning also employs?

A

Data mining methods, as unsupervised learning or as a preprocessing step to improve learner accuracy

22
Q

What is Cluster Analysis used for in Data Mining?

A

Used for automatic identification of natural groupings of things
Employs unsupervised learning
Learns the clusters of things from past data, then assigns new instances to them
There is no output/target variable

23
Q

What is the k-means clustering algorithm?

A

k: pre-determined number of clusters

Algorithm Step 0: determine the value of k

24
Q

Steps of k-means?

A

Step 1: Randomly generate k points as initial cluster centers
Step 2: Assign each point to the nearest cluster center
Step 3: Re-compute the new cluster centers

25
Q

What is k-means repetition step?

A

Repeat steps 2 and 3 (assignment and re-computation of centers) until some convergence criterion is met
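
The whole k-means loop can be sketched in plain Python. This is one possible reading of the steps, under stated assumptions: initial centers are k distinct data points chosen at random, distance is squared Euclidean, the convergence criterion is "no assignment changed", and an empty cluster keeps its previous center. The sample points are made up.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, rng, max_iterations=100):
    # Initial centers: k distinct data points picked at random (an assumption).
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iterations):
        # Assign each point to the nearest cluster center.
        new_assignment = [nearest_center(p, centers) for p in points]
        # Convergence criterion: stop when no assignment changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Re-compute each center as the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = mean_point(members)
    return centers, assignment


points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centers, assignment = k_means(points, k=2, rng=random.Random(0))
print("centers:", centers)
print("assignment:", assignment)
```

Other convergence criteria, such as a maximum center movement below a threshold, are equally common; the result can also depend on the random initial centers.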

26
Q

What is cluster analysis?

A

Finding groups of objects such that the objects in a group will be similar to one another and different from the objects in the other groups

27
Q

Applications of Cluster Analysis?

A

Understanding
- Group related documents for browsing; group genes and proteins that have similar functionality

Summarization
- Reduce the size of large data sets

28
Q

What is a clustering?

A

A set of clusters

29
Q

What are the types of Clusterings?

A

Partitional Clustering

Hierarchical Clustering

30
Q

What is Partitional Clustering?

A

A division of data objects into non-overlapping subsets such that each data object is in exactly one subset

31
Q

What is Hierarchical Clustering?

A

A set of nested clusters organized as a hierarchical tree

32
Q

What is association rule learning?

A

A rule-based machine learning method for discovering interesting relations between variables in large databases
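
How "interesting" a rule is gets quantified in practice with measures such as support and confidence. The card does not name them, so treat the sketch below as a supplementary illustration; the market-basket transactions are invented.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
print(support({"diapers", "beer"}, transactions))
print(confidence({"diapers"}, {"beer"}, transactions))
```

`confidence` divides by the antecedent's support, so it raises `ZeroDivisionError` if the antecedent never occurs in the data.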

33
Q

Challenges of Frequent Itemset mining?

A

Multiple scans of transaction database
Huge number of candidates
Tedious workload of support counting for candidates
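
These three costs show up directly in a level-wise (Apriori-style) search. The sketch below is a simplified illustration, not a tuned implementation: it scans the whole transaction list once per itemset size, generates candidates by brute-force combination, and uses an absolute count threshold `min_count`. The function name and the sample data are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Level-wise search: count candidates of size 1, 2, ... until none is frequent."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent, size = {}, 1
    while candidates:
        # One full scan of the transaction database per level.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Next candidates: larger itemsets whose every smaller subset is frequent.
        size += 1
        pool = sorted({item for c in level for item in c})
        candidates = [
            frozenset(combo)
            for combo in combinations(pool, size)
            if all(frozenset(sub) in level for sub in combinations(combo, size - 1))
        ]
    return frequent


transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
for itemset, count in frequent_itemsets(transactions, min_count=2).items():
    print(sorted(itemset), count)
```

Even with four items the search counts ten single and pair candidates before pruning takes effect, which hints at how quickly candidates multiply on real data.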