Lecture 7 - BI and Data Mining 2 Flashcards by Deivis D

Overfitting results in?

Decision trees that are more complex than necessary

Training error no longer provides?

A good estimate of how well the tree will perform on previously unseen records
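
A small experiment makes this concrete. The sketch below uses a hypothetical memorising model, not the lecture's method. It uses 1-nearest-neighbour as a stand-in for a tree grown until every leaf holds a single record. The data is synthetic and one-dimensional, and its labels are flipped with an assumed 20% probability.

```python
import random

def make_data(n, noise, rng):
    """Synthetic records: label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label  # label noise that a good model should NOT learn
        data.append((x, label))
    return data

def nearest_neighbour_predict(train, x):
    """Memorising model: copy the label of the closest training record."""
    return min(train, key=lambda rec: abs(rec[0] - x))[1]

def error_rate(train, records):
    wrong = sum(nearest_neighbour_predict(train, x) != y for x, y in records)
    return wrong / len(records)

rng = random.Random(0)
train = make_data(200, noise=0.2, rng=rng)
test = make_data(200, noise=0.2, rng=rng)

train_error = error_rate(train, train)  # every record is its own nearest neighbour
test_error = error_rate(train, test)    # previously unseen records
print(f"training error: {train_error:.2f}, test error: {test_error:.2f}")
```

The training error is zero by construction, while the test error stays well above zero. The model has fitted the label noise, so its training error says little about unseen records.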

What can the cross-validation technique be used for?

Cross-validation can be used to compare the performance of different machine learning models on the same data set
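
A minimal sketch of k-fold cross-validation, assuming a hypothetical `train_and_score(train, test)` callback that fits a model on `train` and returns its accuracy on `test`. Comparing two models then means running both over the same folds and comparing the mean scores. The two toy "models" and the data set are made up for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Record = Tuple[float, int]

def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle record indices once and deal them into k (nearly) equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data: Sequence[Record], k: int,
                   train_and_score: Callable[[List[Record], List[Record]], float]) -> float:
    """Each fold is the test set exactly once; the other k-1 folds form the training set."""
    folds = k_fold_indices(len(data), k)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [data[j] for j in range(len(data)) if j not in held_out]
        test = [data[j] for j in test_idx]
        scores.append(train_and_score(train, test))
    return sum(scores) / k  # mean accuracy over the k folds

def majority_class(train, test):
    """Baseline model: always predict the most common training label."""
    labels = [y for _, y in train]
    guess = max(set(labels), key=labels.count)
    return sum(y == guess for _, y in test) / len(test)

def threshold_rule(train, test):
    """Second model: predict 1 when x > 0.5 (ignores the training data)."""
    return sum(int(x > 0.5) == y for x, y in test) / len(test)

data = [(i / 20, int(i / 20 > 0.5)) for i in range(20)]
for name, model in [("majority", majority_class), ("threshold", threshold_rule)]:
    print(name, cross_validate(data, k=5, train_and_score=model))
```

Because both models see identical folds, any difference in their mean scores comes from the models rather than from a lucky split.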

What is leave-one-out?

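A special case of k-fold cross-validation in which k equals the number of samples: each record is held out once as the test set while the model is trained on all the remaining records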
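
A minimal sketch of leave-one-out, where each of the n folds holds a single record. It assumes a 1-nearest-neighbour classifier (our choice, not the lecture's) and a made-up six-record data set.

```python
from typing import List, Tuple

Record = Tuple[float, int]

def one_nn(train: List[Record], x: float) -> int:
    """Predict the label of the closest training record."""
    return min(train, key=lambda rec: abs(rec[0] - x))[1]

def leave_one_out_accuracy(data: List[Record]) -> float:
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every record except the held-out one
        correct += one_nn(train, x) == y
    return correct / len(data)  # n folds, each of size 1

data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
print(leave_one_out_accuracy(data))
```

The model is retrained n times, which is why leave-one-out becomes expensive on large data sets.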

What is bootstrapping?

Random sampling with replacement

Describe bootstrap method?

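Draw n records at random with replacement from a data set of n records and use them as the training set: some records are picked several times, others not at all. The records that were never picked can be used as the test set.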
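
A minimal sketch of drawing one bootstrap sample, using only Python's standard `random` module. The 1000-record data set of plain integers is an assumption for illustration.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) records uniformly at random WITH replacement."""
    n = len(data)
    picked = [rng.randrange(n) for _ in range(n)]
    train = [data[i] for i in picked]
    chosen = set(picked)
    # Records that were never drawn can serve as the test set.
    test = [data[i] for i in range(n) if i not in chosen]
    return train, test

rng = random.Random(42)
data = list(range(1000))
train, test = bootstrap_sample(data, rng)
print(len(train), len(set(train)), len(test))
```

The training sample always has n entries, but only about 63.2% of them are distinct for large n. Each record is missed by all n draws with probability (1 - 1/n)^n, which approaches 1/e.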

What are neural networks?

An artificial neural network consists of a pool of simple processing units that communicate by sending signals to each other over a large number of weighted connections.

What is a test stage in NN?

Each unit performs a relatively simple job:
receive input from neighbors or external sources and use this to compute an output signal which is propagated to other units
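
The job of a single unit can be sketched as a weighted sum of its incoming signals passed through an activation function. This is a minimal sketch. The sigmoid activation and the bias term are assumptions, since the card does not name them, and the weights are made up.

```python
import math
from typing import Sequence

def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))

def unit_output(inputs: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Weighted sum of the incoming signals, squashed into (0, 1) and sent on to other units."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(s)

print(unit_output([1.0, 0.5], [0.4, -0.8]))  # weighted sum is 0.4 - 0.4 = 0.0, so the output is 0.5
```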

What is the learning stage in NN?

The adjustment of the connection weights

What are feed-forward networks?

Networks in which the data flow from input to output units is strictly feed-forward.
The data processing can extend over multiple layers of units, but no feedback connections or connections between units of the same layer are present.
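
A minimal sketch of a forward pass through such a network, assuming sigmoid units and a hypothetical weight layout in which each row holds one unit's input weights followed by its bias. The 2-2-1 weights are made up for illustration.

```python
import math
from typing import List

Layer = List[List[float]]  # one row per unit: weights for each input, then the bias

def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))

def forward(x: List[float], layers: List[Layer]) -> List[float]:
    """Propagate the signal strictly from the input layer towards the output layer."""
    activation = x
    for layer in layers:
        # Each unit reads only the previous layer's outputs:
        # no feedback and no connections between units of the same layer.
        activation = [
            sigmoid(sum(w * a for w, a in zip(row[:-1], activation)) + row[-1])
            for row in layer
        ]
    return activation

hidden = [[1.0, 1.0, 0.0], [-1.0, -1.0, 0.0]]  # 2 inputs -> 2 hidden units
output = [[2.0, -2.0, 0.0]]                    # 2 hidden units -> 1 output unit
print(forward([0.0, 0.0], [hidden, output]))
```

Adding a layer only means appending another list of rows, which reflects how processing "can extend over multiple layers".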

Perceptron?

A single-layer feed-forward network consisting of one or more output neurons, each of which is connected to all inputs x through weights w

How does learning in Perceptrons work?

The weights of the network are modified during the learning phase: when the output y differs from the desired output d, the weights are adjusted to reduce the difference

Perceptron convergence theorem?

If there exists a set of connection weights w* that is able to perform the transformation y = d(x), the perceptron learning rule will converge to some solution in a finite number of steps for any initial choice of the weights
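
A minimal sketch of the perceptron learning rule, assuming a 0/1 threshold unit, a learning rate `eta` of 0.1 and the update w ← w + eta·(d − y)·x. The lecture's exact notation may differ. AND is linearly separable, so suitable weights w* exist and training stops after a finite number of epochs. XOR has no such weights, so the loop never reaches an error-free epoch.

```python
from typing import List, Tuple

Sample = Tuple[Tuple[float, float], int]

def predict(w: List[float], b: float, x: Tuple[float, float]) -> int:
    """Threshold unit: fire (1) when the weighted sum exceeds 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def train_perceptron(samples: List[Sample], eta: float = 0.1, max_epochs: int = 100):
    w, b = [0.0, 0.0], 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, d in samples:
            y = predict(w, b, x)
            if y != d:
                # Perceptron rule: move the weights towards the desired output d.
                w[0] += eta * (d - y) * x[0]
                w[1] += eta * (d - y) * x[1]
                b += eta * (d - y)
                errors += 1
        if errors == 0:
            return w, b, epoch  # converged: every sample classified correctly
    return w, b, None  # no convergence within max_epochs

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(train_perceptron(AND))
print(train_perceptron(XOR)[2])  # None: XOR is not linearly separable
```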

What is Backpropagation?

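A learning rule for multi-layer feed-forward networks: the errors of the hidden units are found by propagating the errors of the output units backwards through the network, and all weights are then adjusted to reduce the error. It is needed because multi-layer networks with a linear activation can classify only linearly separable inputs or, in the case of function approximation, represent only linear functions. Networks with nonlinear hidden units overcome this, but they need a way to train the hidden-layer weights.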
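
A minimal sketch of backpropagation for a 2-3-1 sigmoid network trained on XOR. It assumes squared-error loss, full-batch gradient descent, a learning rate of 0.5 and random initial weights. All of these are illustrative choices, not the lecture's. Depending on the seed, such a small network can stall in a local minimum instead of solving XOR, so the sketch only claims that the error goes down.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_xor(hidden=3, eta=0.5, epochs=5000, seed=1):
    """2-input, `hidden`-unit, 1-output sigmoid network trained on XOR by backpropagation."""
    rng = random.Random(seed)
    # W1[j] = [weight from x1, weight from x2, bias] of hidden unit j
    W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    # W2 = weights from each hidden unit to the output unit, plus the output bias
    W2 = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    losses = []
    for _ in range(epochs):
        gW1 = [[0.0] * 3 for _ in range(hidden)]
        gW2 = [0.0] * (hidden + 1)
        loss = 0.0
        for (x1, x2), d in data:
            # Forward pass
            h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in W1]
            y = sigmoid(sum(W2[j] * h[j] for j in range(hidden)) + W2[-1])
            loss += 0.5 * (d - y) ** 2
            # Backward pass: error signal of the output unit...
            delta_out = (y - d) * y * (1 - y)
            gW2[-1] += delta_out
            for j in range(hidden):
                gW2[j] += delta_out * h[j]
                # ...propagated back to hidden unit j through its outgoing weight
                delta_h = delta_out * W2[j] * h[j] * (1 - h[j])
                gW1[j][0] += delta_h * x1
                gW1[j][1] += delta_h * x2
                gW1[j][2] += delta_h
        losses.append(loss)  # loss measured before this epoch's update
        # Gradient-descent step on every weight
        for j in range(hidden):
            for i in range(3):
                W1[j][i] -= eta * gW1[j][i]
        for j in range(hidden + 1):
            W2[j] -= eta * gW2[j]
    return losses

losses = train_xor()
print(f"loss before training: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```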

What is SVM?

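Support Vector Machine: a supervised learning method that separates two classes with the hyperplane that has the largest margin to the nearest training points (the support vectors)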

Properties of SVM?

Flexibility in choosing a similarity function
Sparseness of solution when dealing with large data sets
Ability to handle large feature spaces
Overfitting can be controlled by the soft-margin approach
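
The soft-margin idea can be sketched as minimising 0.5·‖w‖² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b)) with plain subgradient descent. This teaching sketch assumes a linear kernel, a made-up two-class data set, and a fixed learning rate and epoch count. Production solvers typically work on the dual problem (for example with SMO) and support nonlinear kernels, which this sketch does not. Only points with yᵢ(w·xᵢ + b) < 1 contribute to the hinge part of the update, a small echo of the sparseness property.

```python
from typing import List, Tuple

def decision(w: List[float], b: float, x: List[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X: List[List[float]], y: List[int], C: float = 1.0,
                     lr: float = 0.01, epochs: int = 1000) -> Tuple[List[float], float]:
    """Minimise 0.5*||w||^2 + C * sum(max(0, 1 - y_i * (w.x_i + b))) by subgradient descent."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b, [1 if decision(w, b, x) > 0 else -1 for x in X])
```

A smaller C tolerates more margin violations (a softer margin). A larger C penalises them more heavily.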

SVM applications?

Text categorization
Image classification
Bioinformatics
Hand-written character recognition

Machine learning focuses on?

Prediction, based on known properties learned from the training data

Data mining focuses on?

The discovery of previously unknown properties in the data. This is the analysis step of Knowledge Discovery in Databases (KDD)

Data mining uses many?

Machine learning methods but often with a slightly different goal in mind

Machine learning also employs?

Data mining methods, either as unsupervised learning or as a preprocessing step to improve learner accuracy

What is Cluster Analysis used for in Data Mining?

Used for automatic identification of natural groupings of things
Employs unsupervised learning
Learns the clusters of things from past data, then assigns new instances to them
There is no output/target variable

What is the k-means clustering algorithm?

K: pre-determined number of clusters

Algorithm Step 0: Determine the value of K

Steps of k-means?

Step 1: Randomly generate k points as initial cluster centers
Step 2: Assign each point to the nearest cluster center
Step 3: Re-compute the new cluster centers

What is the k-means repetition step?

Repeat steps 2 and 3 until some convergence criterion is met (for example, the assignment of points to clusters no longer changes)
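
A minimal sketch of the k-means steps above for 2-D points, using only the standard library (`math.dist` needs Python 3.8 or later). It assumes a common variant of Step 1 that picks k existing data points as the initial centers instead of arbitrary random coordinates. Convergence here means that no point changed cluster. The six points are made up.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]

def k_means(points: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial cluster centers
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest cluster center
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_assignment == assignment:
            break  # convergence criterion: no point changed cluster
        assignment = new_assignment
        # Step 3: re-compute each center as the mean of its assigned points
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, assignment

pts = [(1.0, 1.0), (1.5, 2.0), (1.0, 2.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers, labels = k_means(pts, k=2)
print(centers, labels)
```

The result depends on the initial centers, so in practice k-means is often run several times with different seeds.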

What is cluster analysis?

Finding groups of objects such that the objects in a group will be similar to one another and different from the objects in the other groups

Applications of Cluster Analysis?

Understanding - group related documents for browsing; group genes and proteins that have similar functionality
Summarization - reduce the size of large data sets

What is a clustering?

A set of clusters

What are the types of Clusterings?

Partitional Clustering
Hierarchical Clustering

What is Partitional Clustering?

A division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset

What is Hierarchical Clustering?

A set of nested clusters organized as a hierarchical tree
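
One common way to build such a tree is agglomerative clustering: start with one cluster per point and repeatedly merge the two closest clusters. A minimal sketch, assuming single linkage (the distance between the closest members), 1-D points and distinct input values. The card names none of these.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

def single_link(a: FrozenSet[float], b: FrozenSet[float]) -> float:
    """Distance between two clusters = distance between their closest members."""
    return min(abs(x - y) for x in a for y in b)

def agglomerative(points: List[float]) -> List[Tuple[FrozenSet[float], float]]:
    """Repeatedly merge the two closest clusters; the merge order defines the tree."""
    clusters = [frozenset([p]) for p in points]  # assumes distinct point values
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: single_link(*pair))
        dist = single_link(a, b)
        clusters.remove(a)
        clusters.remove(b)
        merged = a | b
        clusters.append(merged)
        merges.append((merged, dist))  # each merged cluster contains earlier ones
    return merges

merges = agglomerative([1.0, 2.0, 6.0, 7.0, 15.0])
for cluster, dist in merges:
    print(sorted(cluster), dist)
```

Cutting the sequence of merges at some distance threshold yields an ordinary partitional clustering, which shows how the two types relate.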

What is Association Rule?

A rule-based machine learning method for discovering interesting relations between variables in large databases
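
A rule X → Y is usually judged by two measures that the card does not define. Support is the fraction of transactions containing X ∪ Y. Confidence is support(X ∪ Y) / support(X). A minimal sketch on a small made-up basket data set:

```python
from typing import List, Set

def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs: Set[str], rhs: Set[str], transactions: List[Set[str]]) -> float:
    """Estimate of P(rhs | lhs) for the rule lhs -> rhs."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
rule_lhs, rule_rhs = {"milk", "diapers"}, {"beer"}
print(support(rule_lhs | rule_rhs, baskets))    # 2 of 5 baskets -> 0.4
print(confidence(rule_lhs, rule_rhs, baskets))  # 0.4 / 0.6, about 0.667
```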

Challenges of Frequent Itemset mining?

Multiple scans of the transaction database
Huge number of candidates
Tedious workload of support counting for candidates
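
All three challenges are visible in a level-wise frequent-itemset miner. This minimal sketch follows the Apriori approach, which the card does not name; we assume it as the standard example. It uses an absolute minimum support count and a small made-up basket data set. Each level costs one full scan of the database and a support count for every candidate. Candidates are pruned when any of their subsets is infrequent.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

def apriori(transactions: List[Set[str]], min_count: int):
    """Level-wise frequent-itemset mining; returns itemset counts and the number of DB scans."""
    frequent: Dict[FrozenSet[str], int] = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    scans = 0
    k = 1
    while candidates:
        scans += 1  # challenge 1: one full scan of the transaction database per level
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:  # challenge 3: support counting for every candidate
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Challenge 2: join frequent (k-1)-itemsets into k-item candidates (can be huge),
        # pruning any candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(frozenset(s) in level for s in combinations(union, k - 1)):
                candidates.add(union)
    return frequent, scans

baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent, scans = apriori(baskets, min_count=3)
print(scans, sorted(sorted(s) for s in frequent))
```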

(33 cards)