Exam 3 Flashcards
What is unsupervised learning (clustering)?
- the class labels of the training data are unknown
- given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
What do decision trees do?
identify ways to split a data set
What does a decision tree start with?
Root Node
What predicts discrete labels?
classification
What predicts a continuous quantity or value?
regression
What does multi-class classification require?
that each sample be assigned to exactly one class (out of three or more possible classes)
What is a small portion of a decision tree called?
sub-tree
Types of classification algorithms in machine learning? (5)
- linear classifiers
- k-nearest neighbors
- decision trees
- support vector machines
- neural networks
The data used to build a classification model is called…
Training Data
In supervised learning, training data includes both ____ and _____
input & desired output
Validation data is used for…
evaluating (testing) the model on data it was not trained on, e.g., to tune parameters such as k in KNN
For SVM, the trick is to do ____ ______ data mapping
high-dimensional
The effectiveness of SVM depends on…
- kernel selection
- parameters (the kernel's parameters and the soft-margin parameter C)
SVMs are a useful alternative to which model?
ANN (artificial neural networks)
Dividing the data into distinct groups so that points within a group are very similar is the main point of which model?
K-means clustering
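A minimal sketch of the k-means idea (Lloyd's algorithm): alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its group, until nothing moves. The function name `kmeans`, the 2-D `(x, y)` tuple input, the random initialisation from the data points, and the toy data are all assumptions for illustration, not any library's API.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on 2-D points given as (x, y) tuples; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                xs, ys = zip(*cluster)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # no centroid moved, so the assignments are stable
        centroids = new_centroids
    return centroids, clusters


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(data, k=2)
print(centroids)
print(clusters)
```

Note that k (the number of clusters) must be chosen up front, and different initialisations can lead to different final clusters on less well-separated data.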
Example of a non-probabilistic binary linear classifier
SVM (the kernel method extends it to non-linear classification)
In supervised learning, training data is accompanied by…
class labels indicating the class of each observation
The mathematical methods for choosing the best split are… (2)
Entropy & Information Gain
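Entropy measures how mixed the class labels at a node are; information gain is how much a split lowers that entropy (parent entropy minus the size-weighted entropy of the children). A small sketch, assuming labels are plain Python lists; the helper names and example labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child partitions."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 5 + ["no"] * 5
pure_split = [["yes"] * 5, ["no"] * 5]  # each child holds a single class
mixed_split = [["yes"] * 3 + ["no"] * 3, ["yes"] * 2 + ["no"] * 2]  # children as mixed as the parent
print(information_gain(parent, pure_split))   # high gain: the split is informative
print(information_gain(parent, mixed_split))  # near-zero gain: the split tells us nothing
```

The attribute whose split gives the highest information gain is chosen for the node.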
For regression trees (continuous target), the splitting method is by…
reduction in variance
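For a regression tree, a candidate split can be scored by how much it lowers the variance of the target values. A sketch assuming population variance (`statistics.pvariance`) and children weighted by size; `variance_reduction` and the toy targets are hypothetical.

```python
from statistics import pvariance


def variance_reduction(parent, children):
    """Parent variance minus the size-weighted variance of the child partitions."""
    n = len(parent)
    weighted = sum(len(child) / n * pvariance(child) for child in children)
    return pvariance(parent) - weighted


targets = [1.0, 2.0, 9.0, 10.0]
good_split = [[1.0, 2.0], [9.0, 10.0]]  # separates low and high values
bad_split = [[1.0, 9.0], [2.0, 10.0]]   # mixes low and high values
print(variance_reduction(targets, good_split))  # larger reduction, so this split is preferred
print(variance_reduction(targets, bad_split))
```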
What is overfitting?
The model is too specific to the training data and may have poor accuracy on unseen samples
Two approaches to avoid overfitting in decision trees
pre-pruning & post-pruning
The basic algorithm for decision trees is
recursive partitioning (top-down recursive divide-and-conquer manner)
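Top-down recursive divide-and-conquer can be sketched in a few lines: pick the best attribute at the current node, split the rows by its values, and recurse on each subset until a node is pure or no attributes remain. This is an ID3-style sketch assuming categorical attributes and information gain as the split criterion; `build_tree`, the dict-per-row format, and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def build_tree(rows, labels, attrs):
    """Top-down recursive partitioning on categorical attributes (ID3-style sketch).

    rows: list of dicts mapping attribute -> value; labels: class of each row.
    Returns a class label (leaf) or a tuple (attribute, {value: subtree}).
    """
    # Stop when the node is pure or no attributes remain; predict the majority class.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]

    def gain(attr):
        # Information gain of splitting this node's rows on `attr`.
        groups = {}
        for row, lab in zip(rows, labels):
            groups.setdefault(row[attr], []).append(lab)
        remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        return entropy(labels) - remainder

    best = max(attrs, key=gain)  # divide: split on the attribute with the highest gain
    branches = {}
    for value in dict.fromkeys(row[best] for row in rows):  # distinct values, first-seen order
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        # conquer: recurse on the subset with this value, without reusing `best`
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    return (best, branches)


rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rain", "windy": "no"}, {"outlook": "rain", "windy": "yes"}]
labels = ["play", "play", "stay", "stay"]
print(build_tree(rows, labels, ["windy", "outlook"]))
```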
For a continuous attribute, the ______ between each pair of adjacent (sorted) values is typically considered as a possible split point
midpoint
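For a continuous attribute, candidate thresholds can be generated by sorting the distinct values and taking the midpoint of each adjacent pair. A short sketch; the function name and example values are hypothetical.

```python
def candidate_split_points(values):
    """Midpoints between adjacent distinct values of a continuous attribute, after sorting."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


print(candidate_split_points([3, 1, 4, 1, 5]))  # [2.0, 3.5, 4.5]
```

Each midpoint is then scored (e.g., by information gain or Gini index) as a test of the form "value <= threshold".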
A random forest uses the ____ ____ to construct its decision trees
Gini index
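The Gini index (Gini impurity) of a node is 1 minus the sum of squared class proportions; a split is scored by the size-weighted impurity of its children, and lower is better. (For example, scikit-learn's `RandomForestClassifier` uses `criterion="gini"` by default.) A sketch with hypothetical helper names:

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(children):
    """Size-weighted Gini impurity of the partitions produced by a split (lower is better)."""
    n = sum(len(child) for child in children)
    return sum(len(child) / n * gini(child) for child in children)


print(gini(["a", "a", "b", "b"]))               # 0.5, maximally mixed for two classes
print(weighted_gini([["a", "a"], ["b", "b"]]))  # 0.0, a perfect split
```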
Trees represent knowledge in the form of _________ rules
IF-THEN
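Each root-to-leaf path of a tree reads as one IF-THEN rule: the tests along the path are ANDed into the IF part and the leaf's class is the THEN part. A sketch assuming a hypothetical nested-tuple tree format (a leaf is a class label; an internal node is `(attribute, {value: subtree})`) and a made-up "play outside?" tree:

```python
def tree_to_rules(node, conditions=()):
    """Yield one IF-THEN rule per root-to-leaf path.

    A leaf is a class label (str); an internal node is (attribute, {value: subtree}).
    """
    if isinstance(node, str):
        cond = " AND ".join(f"{attr} = {val}" for attr, val in conditions) or "TRUE"
        yield f"IF {cond} THEN class = {node}"
        return
    attr, branches = node
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, conditions + ((attr, value),))


# Hypothetical tree for a "play outside?" decision.
tree = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rain": ("windy", {"true": "no", "false": "yes"}),
})
for rule in tree_to_rules(tree):
    print(rule)
```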
The motivation for SVM is to categorize new, unseen objects into two separate groups based on their ______ and _______
properties & a set of known examples that are already categorized
What is one of the key areas in machine learning?
Kernel Methods
What are the two key concepts of SVM?
- maximize the margin
- the kernel trick
What are supervised learning models, with associated learning algorithms, that analyze data and recognize patterns?
Support Vector Machines (SVM)
How do you choose the best separating hyperplane in SVM?
Choose the hyperplane that maximizes the margin between the classes
What are the data points (vectors) that the margin lines touch called?
Support Vectors
Large value of parameter C = _____ margin
Small
Small value of parameter C = _____ margin
Large
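The C trade-off can be seen in the soft-margin SVM objective, 0.5*||w||^2 + C * (sum of hinge losses): a large C punishes margin violations heavily (favouring a small margin that fits every point), while a small C puts more weight on a large margin 2/||w||. This sketch only evaluates the objective for two hand-picked hyperplanes on made-up 1-D data; it does not train an SVM, and the helper names are hypothetical.

```python
import math


def soft_margin_objective(w, b, X, y, C):
    """Soft-margin SVM primal objective: 0.5*||w||^2 + C * sum of hinge losses."""
    reg = 0.5 * sum(wj * wj for wj in w)
    hinge = sum(
        max(0.0, 1 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
        for xi, yi in zip(X, y)
    )
    return reg + C * hinge


def margin_width(w):
    """Distance between the two margin lines, 2 / ||w||."""
    return 2 / math.sqrt(sum(wj * wj for wj in w))


# 1-D toy data: labels are -1/+1; the point at 0.5 sits close to the boundary.
X = [[-2.0], [2.0], [0.5]]
y = [-1, 1, 1]
wide = ([0.5], 0.0)    # margin width 4, but the point at 0.5 falls inside the margin
narrow = ([2.0], 0.0)  # margin width 1, no margin violations

for C in (0.1, 10.0):
    scores = {name: soft_margin_objective(w, b, X, y, C)
              for name, (w, b) in {"wide": wide, "narrow": narrow}.items()}
    print(C, min(scores, key=scores.get))  # the hyperplane with the lower objective wins
```

With C = 0.1 the wide-margin hyperplane has the lower objective; with C = 10 the narrow one does.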
How is distance measured for KNN?
Euclidean distance (the most common choice)
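For two points p and q with n features each, the Euclidean distance is the square root of the summed squared feature differences; the second line is a worked example with made-up points (1, 2) and (4, 6):

```latex
d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}

d\big((1, 2), (4, 6)\big) = \sqrt{(1 - 4)^2 + (2 - 6)^2} = \sqrt{9 + 16} = 5
```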
What does the KNN algorithm assume?
that similar things exist in close proximity
What is the K value in KNN?
K is the number of nearest existing data points (neighbors) that are compared to the new data point
How is a new data point classified in KNN?
Its K closest neighbors are found, and it is assigned to the category held by the majority of those neighbors
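A minimal sketch of KNN classification by majority vote, assuming Euclidean distance (`math.dist`) and points given as tuples; `knn_classify` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train_X, train_y, query, k=3):
    """Assign `query` the majority class among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest_labels = [train_y[i] for i in order[:k]]
    # most_common(1) returns [(label, count)] for the most frequent label.
    return Counter(nearest_labels).most_common(1)[0][0]


train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(train_X, train_y, (2, 2)))  # A (all 3 nearest neighbors are class A)
print(knn_classify(train_X, train_y, (7, 8)))  # B
```

There is no training step: all the work happens at prediction time, when distances to every stored point are computed.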
What happens if K is too small?
the model could be sensitive to noise
What happens if K is too large?
neighborhood might include points from other classes
Should the value of k be even or odd?
odd, to avoid ties (in two-class problems)
Which model makes NO ASSUMPTIONS about the underlying data distribution?
KNN (it is non-parametric)
Typically choose the value of k which has the lowest ____ _____ in _____ data
error rate; validation
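Choosing k by validation error can be sketched as: for each candidate k, classify every validation point using only the training data, count the mistakes, and keep the k with the lowest error rate. The made-up data below plants one noisy "B" point inside cluster A, so k = 1 is fooled by the noise while a larger k is not; the function names and the candidate list are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training points nearest to `query`.

    `train` is a list of (point, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def validation_error(train, val, k):
    """Fraction of validation samples that k-NN misclassifies."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in val)
    return wrong / len(val)


def choose_k(train, val, candidates=(1, 3, 5, 7)):
    """Candidate k with the lowest validation error (ties go to the k listed first)."""
    return min(candidates, key=lambda k: validation_error(train, val, k))


# Toy data: cluster A near (0, 0) with one noisy "B" point inside it, cluster B near (5, 5).
train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((1, 1), "A"), ((0.5, 0.5), "B"),
         ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"), ((6, 6), "B")]
val = [((0.6, 0.5), "A"), ((5.5, 5.4), "B")]
for k in (1, 3, 5, 7):
    print(k, validation_error(train, val, k))
print("chosen k:", choose_k(train, val))
```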
When using KNN for regression (predicting a continuous value), the model uses the…
average of the K nearest neighbors' response values
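For KNN regression, the majority vote is replaced by the mean of the neighbors' response values. A sketch assuming Euclidean distance and an unweighted average; `knn_regress` and the toy data are hypothetical.

```python
import math


def knn_regress(train_X, train_y, query, k=3):
    """Predict a continuous value as the mean response of the k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest_responses = [train_y[i] for i in order[:k]]
    return sum(nearest_responses) / len(nearest_responses)


train_X = [(1,), (2,), (3,), (10,), (11,)]
train_y = [10.0, 20.0, 30.0, 100.0, 110.0]
print(knn_regress(train_X, train_y, (2.2,)))  # 20.0, the mean of 20.0, 30.0 and 10.0
```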