Final Review Flashcards

1
Q

Who developed the VC dimension?

A

Dr. Vladimir Vapnik and Dr. Alexey Chervonenkis

2
Q

What does VC stand for?

A

Vapnik-Chervonenkis

3
Q

Why was the VC dimension developed?

A

To help scientists develop machine learning models that are better at classifying data

4
Q

What is the VC dimension?

A

The Vapnik-Chervonenkis dimension of a hypothesis space is the maximum number of points that can be “shattered” by hypotheses from that space

5
Q

In terms of VC dimension, what does “shattering” mean?

A

“Shattering” means that for every possible way of labeling a set of points (with binary labels), there is a hypothesis in the space that correctly classifies the points according to that labeling. For example, linear classifiers in the plane can shatter 3 non-collinear points but not any 4 points, so their VC dimension is 3.

6
Q

Is either of these true? Why or why not?

The VC dimension of the hypothesis class of circles is lower than that of squares.

The VC dimension of the hypothesis class of squares is lower than that of circles.

A

No, because both have a VC dimension of 3.

7
Q

Is this true? Why or why not?

The VC dimension of the hypothesis class of rings is higher than that of circles.

A

Yes, because rings have a VC dimension of 4 while circles have a VC dimension of 3.

8
Q

What are 2 important notes when discussing the VC dimension of a hypothesis space?

A
  • Knowing whether the hypothesis class is centered on the origin
  • Knowing that the VC dimension is NOT always equal to 1 + (number of dimensions)
9
Q

Is either of these true? Why or why not?

The VC dimension of circles is higher than that of squares.

The VC dimension of squares is higher than that of circles.

A

No, because both have a VC dimension of 3.

10
Q

Who invented Bayes’ Rule?

A

Thomas Bayes

11
Q

What was the goal of Bayes’ Rule?

A

To solve the problem of inverse probability - inferring the probability of causes (hypotheses) from observed effects (data)

12
Q

What is Bayes’ Rule?

A

A fundamental theorem in probability theory for updating the probability of a hypothesis based on new evidence

13
Q

What is the formula for Bayes’ Rule?

A

P(h|D) = [P(D|h) * P(h)] / P(D)
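
A minimal numeric sketch of the formula in use; the disease-testing numbers below (1% prior, 95% sensitivity, 10% false-positive rate) are invented for illustration:

```python
# Hypothetical numbers for a disease test
p_h = 0.01              # P(h): prior probability of the hypothesis (disease)
p_d_given_h = 0.95      # P(D|h): likelihood of a positive test given disease
p_d_given_not_h = 0.10  # false-positive rate

# P(D): total probability of the data across both hypotheses
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)

# P(h|D): posterior belief after seeing a positive test
p_h_given_d = (p_d_given_h * p_h) / p_d
print(round(p_h_given_d, 3))  # ~0.088
```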

14
Q

In Bayes’ Rule, what is P(h|D) and what does it represent?

A

The posterior probability of the hypothesis given the data.

It represents our updated belief in the hypothesis after seeing the data.

15
Q

In Bayes’ Rule, what is P(D|h) and what does it represent?

A

P(D|h) is the likelihood, which is the probability of observing the data assuming the hypothesis is true.

It tells us how likely the observed data is under the assumption of the hypothesis.

16
Q

In Bayes’ Rule, what is P(h) and what does it represent?

A

P(h) is the prior probability of the hypothesis before observing any data.

It represents our belief in the hypothesis based on prior knowledge or assumptions.

17
Q

In Bayes’ Rule, what is P(D) and what does it represent?

A

P(D) is the marginal likelihood (evidence), which normalizes the posterior distribution by ensuring the total probability sums to 1.

It represents the total probability of observing the data across all possible hypotheses.

18
Q

For Bayes’ Rule, when would we ignore P(D)?

A

When we are only interested in finding the hypothesis that maximizes the posterior probability P(h|D). Since P(D) is the same for every hypothesis, it does not change which hypothesis maximizes the posterior.

19
Q

What is MLE?

A

Maximum Likelihood Estimation is a way to estimate the parameters of a model by finding the values that make the observed data most likely
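
A small sketch with invented coin-flip data, showing that the parameter maximizing the Bernoulli likelihood coincides with the sample mean:

```python
import numpy as np

data = np.array([1, 1, 0, 1, 1, 1, 0, 1, 0, 1])  # made-up flips: 7 heads

# Evaluate the log-likelihood on a grid of candidate parameters
thetas = np.linspace(0.01, 0.99, 99)
log_liks = [np.sum(data * np.log(t) + (1 - data) * np.log(1 - t))
            for t in thetas]

theta_mle = thetas[np.argmax(log_liks)]
print(theta_mle, data.mean())  # both ~0.7
```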

20
Q

What does MLE mean in the context of binary classification?

A

This means finding the parameters of the hypothesis h(x) that make the observed outcomes y most likely.

21
Q

What is MSE? How is it written?

A

Mean Squared Error
MSE = (1/n) * Σ (y_i - h(x_i))^2
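
A one-line numpy check with made-up values:

```python
import numpy as np

y = np.array([3.0, -0.5, 2.0, 7.0])   # true values (made up)
h_x = np.array([2.5, 0.0, 2.0, 8.0])  # predictions (made up)

mse = np.mean((y - h_x) ** 2)  # (1/n) * Σ (y_i - h(x_i))^2
print(mse)  # 0.375
```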

22
Q

Why is MSE unsuitable for binary classification?

A
  • Binary classification treats targets as discrete (0 or 1), while MSE assumes targets are continuous values.
  • MSE doesn’t penalize incorrect predictions made with high confidence as strongly as cross-entropy loss does.
  • MSE focuses on minimizing the distance between predicted and actual values while ignoring the probability estimates that classification models produce, which can result in poor performance (see the sketch below).
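
A small sketch contrasting the two penalties on a single confidently wrong prediction (the numbers are invented):

```python
import numpy as np

y_true = 1.0   # actual label
y_pred = 0.01  # model confidently predicts the wrong class

mse = (y_true - y_pred) ** 2                  # bounded above by 1
cel = -(y_true * np.log(y_pred)
        + (1 - y_true) * np.log(1 - y_pred))  # grows without bound

print(round(mse, 3), round(cel, 3))  # 0.98 vs 4.605
```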
23
Q

What function is typically used for a binary classification neural network’s output layer? Why?

A

A sigmoid activation function is typically used because it outputs a value between 0 and 1, which can be interpreted as a probability and evaluated with cross-entropy loss against the actual binary label.

24
Q

Describe Cross-Entropy Loss (CEL)

A
  • Used in binary classification
  • Measures the difference between the predicted probabilities and the true binary labels
  • Directly handles probabilistic predictions
  • Strongly penalizes wrong predictions made with high confidence
25
Q

Describe Mean Squared Error (MSE)

A
  • Typically used in regression tasks where the target variable is continuous
  • Measures the squared difference between predicted values and true values
  • Treats output as continuous, making it LESS suitable for binary classification
26
Q

Describe Neural Networks with Sum of Squared Error (SSE)

A
  • If trained for a binary classification task, it may treat the output as continuous rather than probabilistic
  • For binary classification, use CEL (Cross-Entropy Loss) with a sigmoid output layer
27
Q

What is a limitation of K-Means?

A
  • K-Means struggles with non-spherical clusters
  • K-Means requires a pre-defined number of clusters
  • K-Means may converge to local minima
28
Q

What are the steps of spectral clustering?

A
  • Construct a similarity graph
  • Compute the graph Laplacian
  • Compute the eigenvalues and eigenvectors
  • Perform clustering in the reduced eigenspace (see the sketch below)
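
A minimal numpy sketch of the four steps, assuming a dense RBF similarity graph and the unnormalized Laplacian (practical implementations often use k-nearest-neighbor graphs and a normalized Laplacian):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    # 1. Construct a similarity graph (dense RBF affinities)
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0)

    # 2. Compute the (unnormalized) graph Laplacian L = D - W
    D = np.diag(W.sum(axis=1))
    L = D - W

    # 3. Compute eigenvalues/eigenvectors; keep the k smallest
    eigvals, eigvecs = np.linalg.eigh(L)
    embedding = eigvecs[:, :k]

    # 4. Perform clustering in the reduced eigenspace
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
```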
29
Q

What is the kernel trick?

A

This trick implicitly maps data into a higher-dimensional space where linear separation is possible, while only computing inner products (kernel values) in the original space
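
A tiny demonstration of the trick for the degree-2 polynomial kernel k(x, y) = (x·y)^2 on made-up 2-D vectors: the kernel value equals an inner product in a 3-D feature space, computed without ever building that space:

```python
import numpy as np

x = np.array([1.0, 2.0])  # made-up 2-D points
y = np.array([3.0, 4.0])

# Explicit feature map for the degree-2 polynomial kernel:
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2) lives in 3-D space
def phi(v):
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

explicit = phi(x) @ phi(y)  # inner product in the mapped space
kernel = (x @ y) ** 2       # same value, no explicit mapping needed
print(explicit, kernel)     # both 121.0
```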

30
Q

Why does the kernel trick matter for concentric circles?

A

It allows spectral clustering to separate points that are not linearly separable in the original space, such as the inner and outer rings of concentric circles
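
A sketch on scikit-learn's toy concentric-circles data (the gamma value is a guess; results can vary with parameters):

```python
from sklearn.datasets import make_circles
from sklearn.cluster import KMeans, SpectralClustering

X, _ = make_circles(n_samples=300, factor=0.5, noise=0.05, random_state=0)

# K-Means tends to split the plane into two halves
km_labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)

# Spectral clustering with an RBF affinity can recover inner vs. outer ring
sc_labels = SpectralClustering(n_clusters=2, affinity="rbf",
                               gamma=15.0).fit_predict(X)
```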

31
Q

What is the primary advantage of Isomap over traditional MDS?

A

Isomap approximates geodesic distances along the data manifold rather than straight-line Euclidean distances, capturing global structure
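
A minimal scikit-learn sketch on the classic swiss-roll dataset (the parameter values are illustrative, not from the deck):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

# Isomap approximates geodesic distances via shortest paths on a
# k-nearest-neighbor graph, then embeds them with classical MDS
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
```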

32
Q

Advantages and Limitations of Isomap

A

Advantage:
- Captures global structure effectively, especially for data with a natural manifold shape

Limitations:
- Computationally intensive on large datasets
- Struggles with non-manifold high-dimensional data

33
Q

What is the core concept of Laplacian Eigenmaps?

A

Laplacian eigenmaps use spectral graph theory to focus on local neighborhood relationships, preserving local structure

34
Q

What is the process for Laplacian Eigenmaps?

A
  • Build a weighted similarity graph of the data
  • Compute the graph’s Laplacian matrix
  • Perform eigenvalue decomposition to generate a low-dimensional embedding (see the sketch below)
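
scikit-learn's SpectralEmbedding implements this pipeline; a minimal sketch (the random input and parameter values are placeholders):

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding

X = np.random.default_rng(0).normal(size=(200, 10))  # placeholder data

# Builds a weighted k-nearest-neighbor similarity graph, forms the graph
# Laplacian, and embeds the data using its eigenvectors
emb = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
                        n_neighbors=10).fit_transform(X)
```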
35
Q

Laplacian Eigenmaps are best suited for preserving what type of structure?

A

Local neighborhood structure

36
Q

Name applications of Laplacian Eigenmaps

A
  • Image Processing
  • Clustering in High-Dimensional Data
  • Sensor Data Analysis
  • Gene Expression Data
37
Q

Advantages and Limitations of Laplacian Eigenmaps

A

Advantage: Efficient for preserving local structures, which is useful for clustering tasks

Limitation: May lose global relationships because it focuses solely on local neighborhoods
