5 - Birds of a Feather Flashcards
What was the Cholera Inquiry Committee’s report primarily about?
A severe cholera outbreak in a London parish in 1854
The report highlighted the impact of the outbreak, particularly in the Soho area.
Who was a notable member of the Cholera Inquiry Committee?
John Snow
Snow was a physician known for his contributions to anesthesiology and epidemiology.
What hypothesis did John Snow propose regarding cholera?
Cholera was a waterborne disease
This hypothesis was supported by the clustering of the outbreak around a specific water pump.
What did John Snow’s map of Soho illustrate?
The locations of cholera deaths and water pumps
The map included a dotted line indicating the area affected by cholera.
What is a Voronoi cell?
A region defined such that any point inside is closer to a specific seed than to any other seed
In Snow’s context, the seed was a water pump.
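A minimal Python sketch of the idea, with made-up pump coordinates: each location is assigned to the closest pump (seed), and all locations assigned to the same pump form that pump's Voronoi cell.

```python
import math

# Hypothetical pump locations (x, y) -- illustrative, not Snow's actual coordinates.
pumps = {"Broad Street": (0.0, 0.0), "Pump B": (3.0, 1.0), "Pump C": (-2.0, 4.0)}

def nearest_pump(house, pumps):
    """Return the pump whose seed point is closest to the house.

    Every house that maps to the same pump lies inside that pump's Voronoi cell.
    """
    return min(pumps, key=lambda name: math.dist(house, pumps[name]))

print(nearest_pump((1.0, 0.5), pumps))  # -> 'Broad Street'
```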
What did Snow’s inner dotted line represent?
Points equidistant from the Broad Street pump and the surrounding pumps
It helped demonstrate the relationship between deaths and proximity to water sources.
What modern concept is illustrated by Snow’s analysis of the cholera outbreak?
Nearest neighbor search algorithms
This concept is fundamental in various fields, including machine learning.
What is Manhattan distance?
A measure of distance based on grid-like paths, summing the absolute differences of coordinates
It contrasts with Euclidean distance, which measures straight-line distance.
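The two metrics side by side in a short Python sketch (the example points are arbitrary):

```python
def manhattan(p, q):
    """Sum of absolute coordinate differences (grid-like path length)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    """Straight-line distance between the two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

p, q = (1, 2), (4, 6)
print(manhattan(p, q))  # 3 + 4 = 7
print(euclidean(p, q))  # sqrt(9 + 16) = 5.0
```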
What historical figure is known for significant contributions to optics and vision?
Abu Ali al-Hasan Ibn al-Haytham (Alhazen)
Alhazen’s work transformed the understanding of vision during the Islamic Golden Age.
What is the ‘faculty of discrimination’ according to Alhazen?
The cognitive process that compares what is seen to stored memories
This process aids in recognizing objects.
What algorithm is associated with the concept of nearest neighbors?
Nearest Neighbor (NN) rule
This algorithm was formally analyzed in the 1950s and is crucial for pattern recognition.
True or False: Alhazen’s theories on vision were widely accepted in his time.
False
His ideas were revolutionary compared to the prevailing theories at the time.
Fill in the blank: John Snow’s analysis of the cholera outbreak led to the inspection of the _______.
Broad Street pump
This inspection revealed the contamination source related to the cholera outbreak.
What did the Cholera Inquiry Committee find regarding death rates in the ‘Cholera area’?
Deaths exceeded 10 percent of the population, roughly 1,000 for every 10,000 residents
This statistic highlights the severity of the outbreak.
What was a key innovation in Snow’s mapping technique?
Annotated map showing the correlation between cholera deaths and water pump locations
This visualization was groundbreaking for epidemiology.
How did Snow demonstrate that distance affected cholera infection rates?
He showed that deaths decreased as distance from the Broad Street pump increased
This finding was crucial in establishing the waterborne theory.
What does the nearest neighbor rule help classify?
Data as belonging to one category or another
Who is associated with the initial concept of the nearest neighbor rule?
Alhazen
What mathematical concept is used to represent points in a 2D or 3D coordinate system?
Vectors
How can a 7×9 image be represented mathematically?
As a 63-dimensional vector
What do the pixels in a 7×9 image represent in terms of values?
0 for white and 1 for black
What happens when you draw a numeral on a touch screen?
The pattern is stored as a 63-bit number (equivalently, a 63-dimensional binary vector)
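A small Python sketch of that representation, using an invented 7×9 grid: flattening it row by row yields the 63-dimensional binary vector.

```python
# A 7x9 grid of pixels: 0 = white, 1 = black (the marked pixels are arbitrary).
grid = [[0] * 7 for _ in range(9)]
grid[2][3] = 1
grid[3][3] = 1

# Flatten row by row into a single 63-dimensional vector (equivalently, 63 bits).
vector = [pixel for row in grid for pixel in row]
print(len(vector))  # 63
```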
What is the significance of clustering in the context of the nearest neighbor rule?
Vectors representing similar patterns cluster near each other in 63D space
What is the main task of a machine learning algorithm when given a new unlabeled pattern?
To determine which category it belongs to (in the numeral example, whether it is a 2 or an 8)
What is the nearest neighbor rule based on?
Finding the labeled point nearest to the new unlabeled vector in high-dimensional space
When was the nearest neighbor rule first mathematically mentioned?
In a 1951 technical report by Fix and Hodges
What is a key feature of the nearest neighbor algorithm regarding data distribution?
It does not make any assumptions about the underlying data distribution
What is a potential issue with using only one nearest neighbor?
Overfitting
What is the recommended number of nearest neighbors to avoid ties in classification?
An odd number (for two-class classification)
What happens when the nearest neighbor algorithm is applied with three neighbors?
It uses majority voting to classify the new data point
What is the effect of increasing the number of nearest neighbors?
The boundary becomes smoother and more generalized
What does overfitting refer to in machine learning?
The algorithm fitting too closely to the training data, including noise
What is the trade-off when avoiding overfitting in a classifier?
Some misclassifications may occur in the training dataset
What is the primary goal of the nearest neighbor algorithm?
To classify new data points based on proximity to labeled data
Fill in the blank: Each point in a 3D coordinate system is represented by a _______.
[x, y, z]
True or False: The nearest neighbor algorithm can only classify linearly separable data.
False
What is overfitting in the context of classifiers?
Overfitting occurs when a classifier fits the training data too closely, capturing noise as well as signal, which typically hurts performance on unseen data.
Why is it desirable for a classifier to not overfit the training data?
A classifier that does not overfit is likely to make fewer errors when tested with unseen data.
What is the Bayes optimal classifier?
The Bayes optimal classifier is the best any classifier can achieve, assuming full access to the underlying probability distributions of the data.
How does the nearest neighbor algorithm differ from the Bayes optimal classifier?
The nearest neighbor algorithm makes fewer assumptions about the underlying distributions and relies solely on the available data.
What is the significance of Jensen’s inequality and the dominated convergence theorem in the context of the nearest neighbor algorithm?
These mathematical results were important in developing both the intuition behind the nearest neighbor algorithm and the proofs of its efficacy.
What is the 1-nearest neighbor (1-NN) rule?
The 1-NN rule classifies a new data point based on the closest point in the training dataset.
What happens when the 1-NN algorithm is applied to a new penguin with a given bill depth?
The algorithm assigns the new penguin the species of the single closest penguin in the training data, as measured by bill depth.
What is the relationship between the number of samples and the performance of the k-NN algorithm?
As the number of samples increases, the k-NN algorithm’s performance approaches that of the Bayes optimal classifier.
What is the curse of dimensionality?
The curse of dimensionality refers to the challenges and inefficiencies that arise when analyzing data in high-dimensional spaces.
How does the dimensionality of data affect the number of samples in a specific region?
As dimensionality increases, the probability of finding samples in a defined region decreases significantly.
What is a nonparametric model?
A nonparametric model has no fixed number of parameters and uses all instances of training data for inference.
What are the steps involved in the k-NN algorithm?
1. Store all instances of the sample data. 2. Calculate the distance from the new data point to every stored sample. 3. Sort the distances and rearrange the labels accordingly. 4. Classify the new point by the majority label among its k nearest neighbors.
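A minimal Python sketch of those four steps (the tiny training set, labeled with the numerals 2 and 8, is invented for illustration):

```python
import math
from collections import Counter

def knn_classify(new_point, samples, labels, k=3):
    """Classify new_point by majority vote among its k nearest stored samples."""
    # 1. All training instances are simply kept in `samples` / `labels`.
    # 2. Compute the distance from the new point to every stored sample.
    distances = [math.dist(new_point, s) for s in samples]
    # 3. Sort the distances and rearrange the labels accordingly.
    ordered = sorted(zip(distances, labels))
    # 4. Take the majority label among the k nearest neighbors.
    k_labels = [label for _, label in ordered[:k]]
    return Counter(k_labels).most_common(1)[0][0]

samples = [(1.0, 1.0), (1.2, 0.9), (4.0, 4.2), (4.1, 3.8), (3.9, 4.0)]
labels  = ["2", "2", "8", "8", "8"]
print(knn_classify((1.1, 1.0), samples, labels, k=3))  # -> '2'
```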
True or False: The k-NN algorithm requires a fixed number of parameters.
False
What is the mathematical relationship between the k-NN algorithm and the Bayes optimal classifier as the sample size increases?
The performance of the k-NN algorithm approaches that of the Bayes optimal classifier as the sample size increases.
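A related classical result (Cover and Hart, 1967) makes this precise for the 1-NN rule in the two-class case; the bound below is the standard statement, with R the asymptotic 1-NN error rate and R* the Bayes error:

```latex
% Cover-Hart bound (1967), two-class case: as the number of samples grows,
% the asymptotic 1-NN error rate R is at most twice the Bayes error R*.
R^* \le R \le 2R^*(1 - R^*) \le 2R^*
```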
What is the primary disadvantage of the k-NN algorithm?
It requires increasing amounts of computational power and memory as the size of datasets grows.
Fill in the blank: The k-NN algorithm classifies a new data point as ______ if the majority of its nearest neighbors are labeled as that class.
the same class
What happens to the chance of finding a data point as the number of features increases to 1,000 or more?
The chance of finding a data point inside any given unit hypercube of the feature space rapidly diminishes.
What is a unit hypercube?
A unit hypercube is a geometric figure where the length of each side is equal to 1.
What does Julie Delon mean by ‘In high dimensional spaces, nobody can hear you scream’?
It refers to how sparse and isolated data points become in high-dimensional spaces.
How can the problem of the curse of dimensionality be mitigated?
By increasing the number of data samples, but the number of samples needed grows exponentially with the number of dimensions.
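A small sketch of that exponential growth, under the illustrative assumption that the data is spread uniformly over a cube of side 3 in every dimension: the fraction of that space covered by a fixed unit hypercube is (1/3)^d, so the expected number of samples needed before one lands inside it grows as 3^d.

```python
# Fraction of a side-3 cube occupied by a unit hypercube, and the expected
# number of uniform samples needed to place one point inside it (illustrative).
for d in (1, 2, 3, 10, 100):
    fraction = (1 / 3) ** d
    print(d, fraction, round(1 / fraction))
```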
What is the k-NN algorithm?
A machine learning algorithm that calculates distances between a new data point and each sample in the training dataset.
What is the assumption behind the k-NN algorithm regarding data points?
Similar points have smaller distances between them than dissimilar points.
What happens to distances between data points in high-dimensional space?
Distances behave counterintuitively, a consequence of how the volumes of hyperspheres and hypercubes change as dimensionality grows.
What is the volume of a unit sphere in higher dimensions?
The volume tends to zero as the number of dimensions increases.
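This can be checked with the standard closed-form volume of the unit d-ball, V_d = π^(d/2) / Γ(d/2 + 1) (a textbook formula, not taken from the flashcards above):

```python
import math

def unit_ball_volume(d):
    """Volume of the unit d-dimensional ball: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

for d in (2, 3, 5, 10, 20, 50):
    print(d, unit_ball_volume(d))
# The volume peaks around d = 5 and then shrinks toward zero.
```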
What is the volume of a unit hypercube regardless of dimensionality?
The volume is always 1.
How does the number of vertices in a hypercube change with dimensions?
The number of vertices is 2 raised to the power of the number of dimensions (2^d).
In a 3D cube centered at the origin with faces 1 unit away, how far are the vertices from the origin?
√3 ≈ 1.73 units, farther from the origin than the faces of the cube, which are 1 unit away.
What happens to the volume of the unit hypersphere as dimensions increase?
It shrinks toward zero: most of the hypercube's volume ends up near the vertices, and the fraction occupied by the inscribed hypersphere vanishes.
What is the consequence of data points populating the corners of the hypercube?
Most corners are devoid of data points, leading to points being almost equidistant from each other.
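A quick numerical sketch of that near-equidistance (using NumPy with synthetic uniform data; the sample size of 200 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

for d in (2, 10, 100, 1000):
    points = rng.uniform(size=(200, d))   # 200 random points in the unit hypercube
    query = rng.uniform(size=d)
    dists = np.linalg.norm(points - query, axis=1)
    # As d grows, the ratio of nearest to farthest distance approaches 1:
    print(d, round(dists.min() / dists.max(), 3))
```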
What is principal component analysis (PCA)?
A technique used to reduce high-dimensional data to a lower-dimensional space while preserving variation.
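A minimal PCA sketch using NumPy's SVD on synthetic data (the sizes below are illustrative): center the data, find the directions of greatest variance, and project onto the top few.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))         # 100 samples in a 50-dimensional space (synthetic)

# Center the data, then use the SVD to find the directions of greatest variance.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

k = 2                                   # keep the top-2 principal components
X_reduced = X_centered @ Vt[:k].T       # project the data onto them
print(X_reduced.shape)                  # (100, 2)
```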
What does Bellman suggest about the curse of dimensionality?
Significant results can still be obtained despite the curse.
Fill in the blank: The k-NN algorithm works best for ______ data.
low-dimensional
True or False: The volume of a unit hypersphere increases as the dimensionality increases.
False