Exam 3 Flashcards

1
Q

What is unsupervised learning (clustering)?

A
  • the class labels of the training data are unknown
  • given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
2
Q

What do decision trees do?

A

identify ways to split a data set

3
Q

What does a decision tree start with?

A

Root Node

4
Q

What predicts discrete labels?

A

classification

5
Q

What predicts continuous quantities or values?

A

regression

6
Q

What does multi-class classification require?

A

each sample is assigned to exactly one class (out of more than two possible classes)

7
Q

What is a small portion of a decision tree called?

A

sub-tree

8
Q

Types of classification algorithms in machine learning? (4)

A
  • linear classifiers
    - k-nearest-neighbors (often listed here, although k-NN is instance-based rather than linear)
  • decision trees
  • support vector machines
  • neural networks
9
Q

The data used to build a classification model is called…

A

Training Data

10
Q

In supervised learning, training data includes both ____ and _____

A

input & desired output

11
Q

Validation data is used for…

A

evaluating (testing) the model while it is being tuned, e.g., choosing hyperparameters such as k in KNN

12
Q

For SVM the trick is to do ____ ______ data mapping

A

high dimensional
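To illustrate why mapping data into a higher-dimensional space helps, here is a minimal Python sketch on hypothetical 1-D toy data: no single threshold on x separates the two classes, but after the explicit mapping phi(x) = (x, x²) a threshold on the x² coordinate (a straight line in the new 2-D space) does. The helper `separable_by_threshold` is made up for this example; a real kernel SVM uses the kernel trick so that phi never has to be computed explicitly.

```python
# Hypothetical toy data: the +1 class sits between two -1 points,
# so no single threshold on x can separate the classes.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1, 1, 1, -1]

def separable_by_threshold(values, labels):
    """Return True if some threshold t (in either direction) separates the labels."""
    pts = sorted(zip(values, labels))
    # Candidate thresholds: midpoints between adjacent sorted values.
    candidates = [(a[0] + b[0]) / 2 for a, b in zip(pts, pts[1:])]
    for t in candidates:
        for sign in (1, -1):
            if all((1 if sign * (v - t) > 0 else -1) == y for v, y in pts):
                return True
    return False

# Explicit mapping into 2-D: phi(x) = (x, x**2).
mapped = [(x, x * x) for x in xs]

print(separable_by_threshold(xs, ys))                      # False
print(separable_by_threshold([m[1] for m in mapped], ys))  # True
```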

13
Q

The effectiveness of SVM depends on…

A
  • selection of the kernel
  • parameters (the kernel's parameters and the soft-margin parameter C)
14
Q

SVMs are a useful alternative to which model?

A

ANN (artificial neural networks)

15
Q

Dividing the data into distinct groups so that points within a group are very similar is the main goal of what model?

A

K-means clustering
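A minimal sketch of k-means in plain Python, using Lloyd's algorithm (the standard iterative version), assuming points stored as numeric tuples; `kmeans` and the toy data are hypothetical. No class labels are used, which is what makes it unsupervised: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. `math.dist` needs Python 3.8+.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on a list of numeric tuples; no class labels are used."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(iters):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of the points assigned to it
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved, so assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

On this toy data the two clusters come out as the three points near the origin and the three near (10, 10); which one is listed first depends on the random initialization.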

16
Q

Example of a non-probabilistic binary linear classifier

A

SVM (in its basic form; with the kernel method it can also perform non-linear classification)

17
Q

In supervised learning, training data is accompanied by…

A

class labels indicating the class of each observation

18
Q

The mathematical methods of choosing the best split are… (2)

A

Entropy & Information Gain
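As a rough sketch of how these two quantities are computed: entropy is -Σ p·log2(p) over the class proportions p in a node, and information gain is the parent node's entropy minus the size-weighted entropy of the child nodes after a split. The function names and toy labels below are illustrative, not from any particular library.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the class proportions p."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the child nodes."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

parent = ["yes", "yes", "no", "no"]
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0 (perfect split)
print(information_gain(parent, [["yes", "no"], ["yes", "no"]]))  # 0.0 (useless split)
```

The split with the highest information gain is the one chosen.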

19
Q

For regression decision trees (continuous target), the splitting method is by…

A

reduction in variance
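A minimal sketch of reduction in variance, assuming a continuous target: the score of a split is the variance of the parent's target values minus the size-weighted variance of the children's, and larger is better. The names and numbers are hypothetical.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(parent, children):
    """Parent variance minus the size-weighted variance of the child nodes."""
    n = len(parent)
    return variance(parent) - sum(len(ch) / n * variance(ch) for ch in children)

parent = [1, 1, 9, 9]
print(variance_reduction(parent, [[1, 1], [9, 9]]))  # 16.0 (children are uniform)
print(variance_reduction(parent, [[1, 9], [1, 9]]))  # 0.0 (split does not help)
```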

20
Q

What is Overfitting?

A

The model fits the training data too closely (including its noise) and may have poor accuracy on unseen samples

21
Q

Two approaches to avoid overfitting in decision trees

A

pre-pruning & post-pruning

22
Q

The basic algorithm for decision trees is

A

recursive partitioning (top-down recursive divide-and-conquer manner)
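The top-down, divide-and-conquer procedure can be sketched as below. This is a simplified illustration rather than any specific textbook algorithm: it assumes numeric features, picks the split with the highest information gain among midpoints of adjacent sorted values, and stops at pure nodes or at a hypothetical `max_depth` limit (one simple form of pre-pruning). All names and data are made up for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, depth=0, max_depth=3):
    """Pick the best split, partition the rows, and recurse on each part.

    Returns a class label (leaf) or a dict {"feature", "threshold", "left", "right"}.
    """
    majority = Counter(labels).most_common(1)[0][0]
    # Stop on a pure node or at the depth limit (a simple pre-pruning rule).
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    base = entropy(labels)
    best = None  # (information gain, feature index, threshold)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint of adjacent values as a candidate split point
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    if best is None:  # all rows have identical features, so no split is possible
        return majority
    _, f, t = best
    left_idx = [i for i, r in enumerate(rows) if r[f] <= t]
    right_idx = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left_idx], [labels[i] for i in left_idx], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right_idx], [labels[i] for i in right_idx], depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root node down to a leaf by following the threshold tests."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

rows = [(1,), (2,), (3,), (10,), (11,), (12,)]
labels = ["a", "a", "a", "b", "b", "b"]
tree = build_tree(rows, labels)
print(tree)  # {'feature': 0, 'threshold': 6.5, 'left': 'a', 'right': 'b'}
```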

23
Q

Typically, the ______ between each pair of adjacent values (of a sorted continuous attribute) is considered as a possible split point

A

midpoint

24
Q

Random forests use the ____ ____ to construct decision trees

A

Gini index
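A minimal sketch of the Gini index (impurity) of a node, 1 - Σ p², where p runs over the class proportions, plus the size-weighted Gini of a split (lower is better). The function names and labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(children):
    """Size-weighted Gini index of the child nodes produced by a split."""
    n = sum(len(ch) for ch in children)
    return sum(len(ch) / n * gini(ch) for ch in children)

print(gini(["a", "a", "b", "b"]))            # 0.5 (maximally mixed for two classes)
print(gini_split([["a", "a"], ["b", "b"]]))  # 0.0 (pure children)
```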

25
Q

Trees represent knowledge in the form of _________ rules

A

IF-THEN
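Each root-to-leaf path in a tree becomes one IF-THEN rule, with the tests along the path joined by AND. A minimal sketch, assuming a hypothetical nested-dict tree format (an internal node is {"attribute": ..., "branches": {value: subtree}} and a leaf is just a class label) and a made-up toy tree:

```python
def tree_to_rules(tree, conditions=()):
    """Return one IF-THEN rule string per root-to-leaf path."""
    if not isinstance(tree, dict):  # a leaf holds the class label
        lhs = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {lhs} THEN class = {tree}"]
    rules = []
    for value, subtree in tree["branches"].items():
        rules.extend(tree_to_rules(subtree, conditions + ((tree["attribute"], value),)))
    return rules

# Hypothetical toy tree.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity", "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"attribute": "wind", "branches": {"strong": "no", "weak": "yes"}},
    },
}

rules = tree_to_rules(tree)
for rule in rules:
    print(rule)
```

The first rule printed is `IF outlook = sunny AND humidity = high THEN class = no`.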

26
Q

The motivation for SVM is to categorize new unseen objects into two separate groups based on their ______ and _______

A

Properties & a Set of Known Examples already categorized

27
Q

What is one of the key areas in machine learning?

A

Kernel Methods

28
Q

What are the two key concepts of SVM?

A
  • maximize the margin
  • the kernel trick
29
Q

What are supervised learning models, with associated learning algorithms, that analyze data and recognize patterns?

A

Support Vector Machines (SVM)

30
Q

How do you choose the best separating hyperplane in SVM?

A

Choose the hyperplane that maximizes the margin between classes
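A minimal sketch of the margin criterion, assuming hypothetical 2-D toy data and a few hand-picked candidate hyperplanes w·x + b = 0. A real SVM finds the optimal hyperplane by solving an optimization problem rather than comparing candidates. Here `margin` returns the distance from the hyperplane to the closest point (negative if some point is on the wrong side); this is half the width between the two margin lines, and the points that attain it are the support vectors.

```python
import math

def margin(w, b, points, labels):
    """Smallest signed distance y * (w.x + b) / ||w|| from the hyperplane to any point."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, y in zip(points, labels))

points = [(1.0, 1.0), (2.0, 0.0), (4.0, 4.0), (5.0, 3.0)]
labels = [-1, -1, 1, 1]

# Hypothetical candidates (w, b) for the lines x1 = 3, x2 = 2, and x1 + x2 = 5.
candidates = [((1.0, 0.0), -3.0), ((0.0, 1.0), -2.0), ((1.0, 1.0), -5.0)]

# Keep the candidate whose closest point is farthest away (the maximum margin).
best = max(candidates, key=lambda c: margin(c[0], c[1], points, labels))
print(best, margin(best[0], best[1], points, labels))  # x1 + x2 = 5, margin 3/sqrt(2)
```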

31
Q

What are the data points (vectors) that the margin lines touch known as?

A

Support Vectors

32
Q

Large value of parameter C = _____ margin

A

Small

33
Q

Small value of parameter C = _____ margin

A

Large
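The C trade-off shows up in the standard primal soft-margin objective, 0.5·w² + C·Σ max(0, 1 - y·f(x)). The sketch below uses hypothetical 1-D data and compares just two hand-picked classifiers f(x) = w·x + b instead of solving the optimization: a large C punishes margin violations heavily and picks the narrower, violation-free margin, while a small C picks the wider margin even though some points fall inside it. All names are illustrative.

```python
def soft_margin_objective(w, b, C, xs, ys):
    """0.5 * w**2 + C * (sum of hinge losses max(0, 1 - y * (w*x + b)))."""
    hinge = sum(max(0.0, 1 - y * (w * x + b)) for x, y in zip(xs, ys))
    return 0.5 * w * w + C * hinge

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1, -1, 1, 1]
candidates = [(1.0, 0.0), (0.5, 0.0)]  # margin widths 2/|w|: 2.0 and 4.0

def chosen_margin_width(C):
    """Pick the candidate with the lowest objective and return its margin width 2/|w|."""
    w, b = min(candidates, key=lambda c: soft_margin_objective(c[0], c[1], C, xs, ys))
    return 2 / abs(w)

print(chosen_margin_width(10.0))  # 2.0 (large C -> small margin)
print(chosen_margin_width(0.1))   # 4.0 (small C -> large margin)
```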

34
Q

How is distance measured for KNN?

A

Typically Euclidean distance (other metrics, such as Manhattan distance, are also used)

35
Q

What does the KNN algorithm assume?

A

similar things exist in close proximity

36
Q

What is the K value in KNN?

A

K is the number of nearest existing data points (neighbors) that the new data point is compared to

37
Q

How are data points assigned in KNN?

A

The new point is compared to its “K” closest neighbors and assigned to the category that holds the majority among those neighbors
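A minimal sketch of KNN classification with Euclidean distance and majority voting; `knn_classify` and the toy data are hypothetical. `math.dist` needs Python 3.8+.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Assign `query` to the majority class among its k nearest training points."""
    neighbors = sorted(zip(train_points, train_labels),
                       key=lambda pl: math.dist(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_labels = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_classify(train_points, train_labels, (2, 2), k=3))      # red
print(knn_classify(train_points, train_labels, (6.5, 6.5), k=3))  # blue
```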

38
Q

What happens when K is too small?

A

the prediction could be sensitive to noise points

39
Q

What happens if K is too large?

A

neighborhood might include points from other classes

40
Q

Should the value of k be even or odd?

A

odd, to avoid ties in the majority vote (with two classes)

41
Q

Which model makes NO ASSUMPTIONS about the underlying data distribution (i.e., is non-parametric)?

A

KNN

42
Q

Typically choose the value of k which has the lowest ____ _____ in _____ data

A

error rate; validation
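A minimal sketch of choosing k by validation error rate, on hypothetical 1-D data that contains one noisy training point. With k = 1 that noise point decides both nearby validation queries (K too small is sensitive to noise), while k = 3 and k = 5 classify every validation point correctly. Ties in error rate go to the smaller k here, which is an assumption of this sketch.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k):
    """Majority class among the k nearest training points (Euclidean distance)."""
    neighbors = sorted(zip(train_points, train_labels),
                       key=lambda pl: math.dist(pl[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def error_rate(train_points, train_labels, val_points, val_labels, k):
    """Fraction of validation points that KNN with this k misclassifies."""
    wrong = sum(knn_classify(train_points, train_labels, q, k) != y
                for q, y in zip(val_points, val_labels))
    return wrong / len(val_labels)

train_points = [(1,), (2,), (3,), (3.5,), (4,), (6,), (7,), (8,), (9,)]
train_labels = ["a", "a", "a", "b", "a", "b", "b", "b", "b"]  # (3.5,) is a noisy "b"
val_points = [(3.4,), (3.6,), (7.5,)]
val_labels = ["a", "a", "b"]

errors = {k: error_rate(train_points, train_labels, val_points, val_labels, k)
          for k in (1, 3, 5)}
best_k = min(errors, key=errors.get)  # first k (smallest) with the lowest error rate
print(errors, best_k)  # best_k is 3
```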

43
Q

When using KNN for regression (predicting a continuous value), the model uses the…

A

average of the K nearest neighbors' response values
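A minimal sketch of KNN regression, where the prediction is the mean response of the k nearest neighbors; `knn_regress` and the numbers are hypothetical.

```python
import math

def knn_regress(train_points, train_values, query, k=3):
    """Predict a continuous value as the mean response of the k nearest neighbors."""
    neighbors = sorted(zip(train_points, train_values),
                       key=lambda pv: math.dist(pv[0], query))[:k]
    return sum(v for _, v in neighbors) / len(neighbors)

train_points = [(1,), (2,), (3,), (10,)]
train_values = [10.0, 20.0, 30.0, 100.0]

print(knn_regress(train_points, train_values, (2,), k=3))  # 20.0 = (10 + 20 + 30) / 3
```

The far-away point (10,) with response 100.0 is not among the 3 nearest neighbors, so it does not affect the average.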