5 - Birds of a Feather Flashcards

1
Q

What was the Cholera Inquiry Committee’s report primarily about?

A

A severe cholera outbreak in a London parish in 1854

The report highlighted the impact of the outbreak, particularly in the Soho area.

2
Q

Who was a notable member of the Cholera Inquiry Committee?

A

John Snow

Snow was a physician known for his contributions to anesthesiology and epidemiology.

3
Q

What hypothesis did John Snow propose regarding cholera?

A

Cholera was a waterborne disease

This hypothesis was supported by the clustering of the outbreak around a specific water pump.

4
Q

What did John Snow’s map of Soho illustrate?

A

The locations of cholera deaths and water pumps

The map included a dotted line indicating the area affected by cholera.

5
Q

What is a Voronoi cell?

A

A region defined such that any point inside is closer to a specific seed than to any other seed

In Snow’s context, the seed was a water pump.
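
As an illustrative sketch (pump names other than Broad Street and all coordinates are invented), a point belongs to a pump's Voronoi cell exactly when that pump is its nearest seed:

```python
# Illustrative only: pump names besides Broad Street and all coordinates
# are invented for this sketch.
import math

pumps = {
    "Broad Street": (0.0, 0.0),
    "Pump B": (3.0, 1.0),
    "Pump C": (-2.0, 2.5),
}

def nearest_pump(point):
    """Return the pump whose Voronoi cell contains `point` (nearest seed)."""
    return min(pumps, key=lambda name: math.dist(point, pumps[name]))

print(nearest_pump((0.5, 0.5)))  # a house closest to the Broad Street pump
```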

6
Q

What did Snow’s inner dotted line represent?

A

Equidistant points from the Broad Street pump and surrounding pumps

It helped demonstrate the relationship between deaths and proximity to water sources.

7
Q

What modern concept is illustrated by Snow’s analysis of the cholera outbreak?

A

Nearest neighbor search algorithms

This concept is fundamental in various fields, including machine learning.

8
Q

What is Manhattan distance?

A

A measure of distance based on grid-like paths, summing the absolute differences of coordinates

It contrasts with Euclidean distance, which measures straight-line distance.
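
The two measures side by side, as a minimal sketch:

```python
import math

def manhattan(p, q):
    # Sum of absolute coordinate differences: distance along grid paths.
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    # Straight-line distance between the two points.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

p, q = (0, 0), (3, 4)
print(manhattan(p, q))  # 7: three blocks east plus four blocks north
print(euclidean(p, q))  # 5.0: the straight line
```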

9
Q

What historical figure is known for significant contributions to optics and vision?

A

Abu Ali al-Hasan Ibn al-Haytham (Alhazen)

Alhazen’s work transformed the understanding of vision during the Islamic Golden Age.

10
Q

What is the ‘faculty of discrimination’ according to Alhazen?

A

The cognitive process that compares what is seen to stored memories

This process aids in recognizing objects.

11
Q

What algorithm is associated with the concept of nearest neighbors?

A

Nearest Neighbor (NN) rule

This algorithm was formally analyzed in the 1950s and is crucial for pattern recognition.

12
Q

True or False: Alhazen’s theories on vision were widely accepted in his time.

A

False

His ideas were revolutionary compared to the prevailing theories at the time.

13
Q

Fill in the blank: John Snow’s analysis of the cholera outbreak led to the inspection of the _______.

A

Broad Street pump

This inspection revealed the contamination source related to the cholera outbreak.

14
Q

What did the Cholera Inquiry Committee find regarding death rates in the ‘Cholera area’?

A

Death rates were over 10 percent: about 1,000 deaths for every 10,000 residents

This statistic highlights the severity of the outbreak.

15
Q

What was a key innovation in Snow’s mapping technique?

A

Annotated map showing the correlation between cholera deaths and water pump locations

This visualization was groundbreaking for epidemiology.

16
Q

How did Snow demonstrate that distance affected cholera infection rates?

A

He showed that deaths decreased as distance from the Broad Street pump increased

This finding was crucial in establishing the waterborne theory.

17
Q

What does the nearest neighbor rule help classify?

A

Data as belonging to one category or another

18
Q

Who is associated with the initial concept of the nearest neighbor rule?

A

Alhazen

19
Q

What mathematical concept is used to represent points in a 2D or 3D coordinate system?

A

Vectors

20
Q

How can a 7×9 image be represented mathematically?

A

As a 63-dimensional vector

21
Q

What do the pixels in a 7×9 image represent in terms of values?

A

0 for white and 1 for black
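
A sketch of the flattening this implies: reading the 7×9 grid row by row produces the 63-dimensional vector described above (the stroke drawn here is arbitrary).

```python
# 7 pixels wide, 9 pixels tall; 0 = white, 1 = black.
rows, cols = 9, 7
image = [[0] * cols for _ in range(rows)]
image[4] = [0, 1, 1, 1, 1, 1, 0]  # an arbitrary horizontal stroke

# Flatten row by row into one 63-dimensional vector.
vector = [pixel for row in image for pixel in row]
print(len(vector))  # 63
```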

22
Q

What happens when you draw a numeral on a touch screen?

A

The pattern is stored as a 63-bit long number

23
Q

What is the significance of clustering in the context of the nearest neighbor rule?

A

Vectors representing similar patterns cluster near each other in 63D space

24
Q

What is the main task of a machine learning algorithm when given a new unlabeled pattern?

A

To determine which known category it belongs to (in the running example, the digit 2 or the digit 8)

25
Q

What is the nearest neighbor rule based on?

A

Finding the point nearest to a new unlabeled vector in hyperdimensional space
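
A minimal sketch of that rule, with an invented two-dimensional training set standing in for the hyperdimensional one:

```python
import math

# Invented labeled training points: coordinates and labels are illustrative.
training = [
    ((1.0, 1.0), "2"),
    ((1.2, 0.8), "2"),
    ((5.0, 5.0), "8"),
    ((4.8, 5.2), "8"),
]

def classify_1nn(point):
    """Label `point` with the label of its single closest training point."""
    nearest = min(training, key=lambda item: math.dist(point, item[0]))
    return nearest[1]

print(classify_1nn((1.1, 1.0)))  # "2": closest stored vector is labeled 2
```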

26
Q

When was the nearest neighbor rule first mathematically mentioned?

A

In a 1951 technical report by Fix and Hodges

27
Q

What is a key feature of the nearest neighbor algorithm regarding data distribution?

A

It does not make any assumptions about the underlying data distribution

28
Q

What is a potential issue with using only one nearest neighbor?

A

Overfitting

29
Q

What is the recommended number of nearest neighbors to avoid ties in classification?

A

An odd number

30
Q

What happens when the nearest neighbor algorithm is applied with three neighbors?

A

It uses majority voting to classify the new data point

31
Q

What is the effect of increasing the number of nearest neighbors?

A

The boundary becomes smoother and more generalized

32
Q

What does overfitting refer to in machine learning?

A

The algorithm fitting too closely to the training data, including noise

33
Q

What is the trade-off when avoiding overfitting in a classifier?

A

Some misclassifications may occur in the training dataset

34
Q

What is the primary goal of the nearest neighbor algorithm?

A

To classify new data points based on proximity to labeled data

35
Q

Fill in the blank: Each point in a 3D coordinate system is represented by a _______.

A

vector

36
Q

True or False: The nearest neighbor algorithm can only classify linearly separable data.

A

False

Because it makes no assumptions about the data distribution, it can form highly nonlinear decision boundaries.

37
Q

What is overfitting in the context of classifiers?

A

Overfitting occurs when a classifier fits the training data too closely, including its noise, which degrades performance on unseen data.

38
Q

Why is it desirable for a classifier to not overfit the training data?

A

A classifier that does not overfit is likely to make fewer errors when tested with unseen data.

39
Q

What is the Bayes optimal classifier?

A

The Bayes optimal classifier achieves the lowest error rate any classifier can, assuming access to the underlying probability distributions of the data.

40
Q

How does the nearest neighbor algorithm differ from the Bayes optimal classifier?

A

The nearest neighbor algorithm makes fewer assumptions about the underlying distributions and relies solely on the available data.

41
Q

What is the significance of Jensen’s inequality and the dominated convergence theorem in the context of the nearest neighbor algorithm?

A

These mathematical results were significant in developing the intuition and proofs needed for the nearest neighbor algorithm’s efficacy.

42
Q

What is the 1-nearest neighbor (1-NN) rule?

A

The 1-NN rule classifies a new data point based on the closest point in the training dataset.

43
Q

What happens when the 1-NN algorithm is applied to a new penguin with a given bill depth?

A

The algorithm will assign the new penguin the species of the single closest training point; with only one neighbor there is no vote to take.

44
Q

What is the relationship between the number of samples and the performance of the k-NN algorithm?

A

As the number of samples increases, the k-NN algorithm’s performance approaches that of the Bayes optimal classifier.

45
Q

What is the curse of dimensionality?

A

The curse of dimensionality refers to the challenges and inefficiencies that arise when analyzing data in high-dimensional spaces.

46
Q

How does the dimensionality of data affect the number of samples in a specific region?

A

As dimensionality increases, the probability of finding samples in a defined region decreases significantly.

47
Q

What is a nonparametric model?

A

A nonparametric model has no fixed number of parameters and uses all instances of training data for inference.

48
Q

What are the steps involved in the k-NN algorithm?

A
1. Store all instances of sample data.
2. Calculate the distance from the new data point to each stored sample.
3. Sort the distances and rearrange the labels accordingly.
4. Classify based on the majority label among the k nearest neighbors.
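
The four steps can be sketched directly (the training data here is invented):

```python
import math
from collections import Counter

# 1. Store all instances of sample data (invented points and labels).
samples = [((0.0, 0.0), "cat"), ((0.5, 0.5), "cat"),
           ((4.0, 4.0), "dog"), ((4.5, 4.0), "dog")]

def knn_classify(new_point, k=3):
    # 2. Calculate the distance from the new point to every stored sample.
    distances = [(math.dist(new_point, x), label) for x, label in samples]
    # 3. Sort the distances, carrying the labels along.
    distances.sort()
    # 4. Classify by the majority label among the k nearest neighbors.
    top_labels = [label for _, label in distances[:k]]
    return Counter(top_labels).most_common(1)[0][0]

print(knn_classify((0.2, 0.1)))  # "cat"
```
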
49
Q

True or False: The k-NN algorithm requires a fixed number of parameters.

A

False

k-NN is nonparametric: it stores all training instances rather than a fixed set of parameters.

50
Q

What is the mathematical relationship between the k-NN algorithm and the Bayes optimal classifier as the sample size increases?

A

The performance of the k-NN algorithm approaches that of the Bayes optimal classifier as the sample size increases.

51
Q

What is the primary disadvantage of the k-NN algorithm?

A

It requires increasing amounts of computational power and memory as the size of datasets grows.

52
Q

Fill in the blank: The k-NN algorithm classifies a new data point as ______ if the majority of its nearest neighbors are labeled as that class.

A

the same class

53
Q

What happens to the chance of finding a data point as the number of features increases to 1,000 or more?

A

The chance of finding a data point within a unit hypercube rapidly diminishes.

54
Q

What is a unit hypercube?

A

A unit hypercube is a geometric figure where the length of each side is equal to 1.

55
Q

What does Julie Delon mean by ‘In high dimensional spaces, nobody can hear you scream’?

A

It refers to the difficulty of finding data points in high-dimensional spaces.

56
Q

How can the problem of the curse of dimensionality be mitigated?

A

By increasing the number of data samples, but this must grow exponentially with the number of dimensions.
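
A quick illustration of that exponential growth: for a point sampled uniformly in the unit hypercube, the chance of landing in a fixed sub-cube of side 0.5 is 0.5 to the power d, so keeping even one expected hit requires 2 to the power d samples.

```python
def expected_samples(d, side=0.5):
    """Samples needed for one expected hit inside a sub-cube of given side."""
    return (1 / side) ** d

for d in (1, 3, 10, 100):
    p = 0.5 ** d
    print(f"d={d:>3}: P(hit) = {p:.3g}, samples needed = {expected_samples(d):.3g}")
```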

57
Q

What is the k-NN algorithm?

A

A machine learning algorithm that calculates distances between a new data point and each sample in the training dataset.

58
Q

What is the assumption behind the k-NN algorithm regarding data points?

A

Similar points have smaller distances between them than dissimilar points.

59
Q

What happens to distances between data points in high-dimensional space?

A

The behavior of distances becomes counterintuitive, affected by the volumes of hyperspheres and hypercubes.

60
Q

What is the volume of a unit sphere in higher dimensions?

A

The volume tends to zero as the number of dimensions increases.
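
This can be checked with the closed-form volume of the unit d-ball, pi**(d/2) / Gamma(d/2 + 1), which peaks around d = 5 and then shrinks toward zero:

```python
import math

def unit_ball_volume(d):
    """Volume of the d-dimensional ball of radius 1."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

for d in (2, 3, 5, 10, 20):
    print(d, round(unit_ball_volume(d), 4))
```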

61
Q

What is the volume of a unit hypercube regardless of dimensionality?

A

The volume is always 1.

62
Q

How does the number of vertices in a hypercube change with dimensions?

A

The number of vertices is 2 raised to the power of the number of dimensions (2^d).

63
Q

In a 3D unit cube, how far are the vertices from the origin?

A

Up to √3 ≈ 1.73 units: the farthest vertex of a unit cube lies at (1, 1, 1), farther from the origin than the opposite faces, which are 1 unit away.

64
Q

What happens to the volume of the unit hypersphere as dimensions increase?

A

Most of the volume of the hypercube ends up near the vertices, and the internal volume occupied by the hypersphere vanishes.

65
Q

What is the consequence of data points populating the corners of the hypercube?

A

Most corners are devoid of data points, leading to points being almost equidistant from each other.

66
Q

What is principal component analysis (PCA)?

A

A technique used to reduce high-dimensional data to a lower-dimensional space while preserving variation.
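
A minimal two-dimensional sketch of the idea, using toy data and the closed-form eigendecomposition of a 2×2 covariance matrix (real use would rely on a library routine):

```python
import math

# Toy data lying roughly along the diagonal y = x.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]

# Center the data.
n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
centered = [(x - mx, y - my) for x, y in data]

# Covariance matrix entries [[sxx, sxy], [sxy, syy]].
sxx = sum(x * x for x, _ in centered) / n
syy = sum(y * y for _, y in centered) / n
sxy = sum(x * y for x, y in centered) / n

# Leading eigenvalue via the quadratic formula, then its eigenvector:
# the direction of maximum variance (the first principal component).
lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
v = (sxy, lam - sxx)
norm = math.hypot(*v)
direction = (v[0] / norm, v[1] / norm)
print(direction)  # roughly (0.7, 0.7): the data varies along the diagonal
```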

67
Q

What does Bellman suggest about the curse of dimensionality?

A

Significant results can still be obtained despite the curse.

68
Q

Fill in the blank: The k-NN algorithm works best for ______ data.

A

low-dimensional

69
Q

True or False: The volume of a unit hypersphere increases as the dimensionality increases.

A

False

The volume tends to zero as the number of dimensions increases.