12 - Terra Incognita Flashcards

1
Q

What is the addition operation described in the context of modular arithmetic?

A

Addition is performed modulo 97, meaning sums wrap around so that results always stay between 0 and 96.

2
Q

How is the sum expressed in modulo-97 addition?

A

sum = x + (some non-negative multiple of 97), where 0 ≤ x ≤ 96; that is, x is the remainder after dividing the ordinary sum by 97.
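A minimal Python sketch of this wrap-around rule (the function name is illustrative):

```python
# Modulo-97 addition: the sum wraps around so the result stays in [0, 96].
MOD = 97

def add_mod97(a: int, b: int) -> int:
    """Return (a + b) mod 97."""
    return (a + b) % MOD

# The ordinary sum can always be written as x + k*97 with 0 <= x <= 96.
print(add_mod97(50, 60))  # 110 = 13 + 1*97, so the result is 13
print(add_mod97(96, 1))   # 97 = 0 + 1*97, so the result is 0
```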

3
Q

What does ‘grokking’ refer to in the context of neural networks?

A

‘Grokking’ describes a neural network achieving a deep understanding of the underlying pattern in its data, going beyond memorization to genuine generalization.

4
Q

What is the relationship between the number of parameters in a neural network and its performance?

A

More parameters can lead to overfitting, while fewer parameters can lead to underfitting.

5
Q

What is overfitting in machine learning?

A

Overfitting occurs when a model learns details and noise in the training data to the extent that it negatively impacts its performance on new data.

6
Q

What is underfitting in machine learning?

A

Underfitting occurs when a model is too simple to capture the underlying pattern of the data.

7
Q

Describe the bias-variance trade-off.

A

High bias leads to underfitting, while high variance leads to overfitting. The goal is to find a balance between the two.

8
Q

What is the role of the test dataset in machine learning?

A

The test dataset is used to evaluate the model’s performance on unseen data, indicating its ability to generalize.

9
Q

What happens when a model is too complex?

A

It may overfit the training data, leading to poor performance on test data due to capturing noise.

10
Q

What is a simple model’s performance on noisy data?

A

A simple model may ignore the noise but also miss the underlying pattern, leading to high training and test errors.

11
Q

Fill in the blank: The number of ________ in a model determines its complexity and capacity.

A

[parameters]

12
Q

True or False: A model with too few parameters will have a high risk of overfitting.

A

False. A model with too few parameters risks underfitting, not overfitting.

13
Q

What is the consequence of a model that tracks every variation in the training data?

A

It leads to overfitting and poor generalization to test data.

14
Q

What does the capacity of a hypothesis class refer to?

A

It refers to the range of functions that a model can approximate based on its parameters.

15
Q

What is the universal approximation theorem?

A

It states that a neural network with sufficient neurons can approximate any function.

16
Q

How does model complexity affect training risk?

A

Training risk decreases as model complexity increases, up to a point, after which it may increase due to overfitting.

17
Q

What is the consequence of using a very simple model on complex data?

A

It will likely result in high training and test errors due to underfitting.

18
Q

Fill in the blank: The goal of an ML engineer is to find the sweet spot between ________ and variance.

A

[bias]

19
Q

What is one of the most significant challenges in model selection?

A

Determining the right level of complexity to avoid underfitting and overfitting.

20
Q

What happens to test error when a model is too complex?

A

The test error tends to increase due to overfitting.

21
Q

Describe the performance of a model that has a high bias.

A

It will likely underfit the training data and perform poorly on test data.

22
Q

What does a highly complex, nonlinear model do during training?

A

It minimizes training errors but may generalize poorly to test data.

23
Q

What happens to the capacity of the hypothesis class when the number of parameters is increased?

A

It increases the capacity of the hypothesis class.

24
Q

What does the dashed curve in the figure represent?

A

The training risk, or the risk that the model makes errors on the training dataset.

25
Q

What is the effect of extremely simple models on training data?

A

They do badly because they are underfitting the data.

26
Q

What happens to the training risk as model complexity increases?

A

It decreases, eventually reaching zero once the model is complex enough to interpolate, i.e., overfit, the training data.

27
Q

What does the solid curve in the figure represent?

A

The risk of error during testing.

28
Q

What is the ‘Goldilocks zone’ in model selection?

A

The optimal balance between underfitting and overfitting.

29
Q

What is the relationship between minimizing test error and generalization ability?

A

Test error is an estimate of generalization error, so minimizing test error is taken as evidence that the model generalizes well to unseen data.

30
Q

What is the conventional wisdom regarding deep neural networks and their parameters?

A

They are over-parameterized and should not generalize well.

31
Q

What surprising observation did Neyshabur and colleagues make about deep neural networks?

A

Increasing the size of the network did not cause it to overfit the training data.

32
Q

What phenomenon occurs when noise is introduced into the dataset?

A

The model accommodates the noise and may still achieve zero training error.

33
Q

What does it mean when a model is said to ‘shatter’ the training data?

A

It fits the training data perfectly, including any noise.

34
Q

What did Neyshabur and colleagues find regarding test error with noisy data?

A

Test error continued decreasing as network size increased past the size required for zero training error.

35
Q

What potential explanation did Neyshabur and colleagues suggest for this behavior?

A

Implicit regularization by stochastic gradient descent.

36
Q

What did Chiyuan Zhang and colleagues conclude about neural networks in their 2016 paper?

A

The effective capacity of several successful neural network architectures is large enough to shatter the training data.

37
Q

What is ‘benign overfitting’?

A

The phenomenon where models can fit noisy data without significant overfitting.

38
Q

What is the role of hyperparameters in neural network training?

A

They are set by engineers before training begins and influence model architecture and training process.

39
Q

What are feedforward neural networks?

A

Networks where information flows one way from input to output.

40
Q

What are recurrent neural networks known for?

A

Allowing feedback connections to remember previous inputs.

41
Q

What is the backpropagation algorithm used for?

A

Training neural networks by minimizing loss.

42
Q

What is a loss function in the context of neural networks?

A

A function that calculates the error made by the network.
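A minimal sketch of one common loss function, mean squared error (names are illustrative; real networks use many others, such as cross-entropy):

```python
# Mean squared error: the average of squared differences between the
# network's predictions and the true targets.
def mse_loss(predictions, targets):
    """Lower loss means fewer or smaller errors; a perfect fit gives zero."""
    assert len(predictions) == len(targets)
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

print(mse_loss([1.0, 2.0], [1.0, 2.0]))  # perfect predictions: loss is 0.0
print(mse_loss([3.0], [1.0]))            # error of 2 squared: loss is 4.0
```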

43
Q

What is the purpose of regularization in neural networks?

A

To prevent overfitting by controlling model complexity.

44
Q

What is one method of explicit regularization?

A

Preventing the values for the weights from getting too large.
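A sketch of this idea as L2 weight decay, assuming a penalty term added to the loss (function names and the value of lam are illustrative):

```python
# L2 weight decay: add a penalty proportional to the squared weights,
# so gradient descent is pushed toward smaller weight values.
def l2_penalty(weights, lam=0.01):
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam=0.01):
    return data_loss + l2_penalty(weights, lam)

# Large weights inflate the total loss, so the optimizer shrinks them.
small = regularized_loss(1.0, [0.1, 0.2], lam=0.1)  # 1.005
big = regularized_loss(1.0, [5.0, 5.0], lam=0.1)    # 6.0
print(small < big)  # True
```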

47
Q

What is a method to prevent overfitting in neural networks?

A

Randomly drop some connections during training

This technique reduces the number of effective parameters.
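A minimal sketch of this technique (inverted dropout, with illustrative names; real frameworks apply it per layer during training only):

```python
import random

# Dropout: during training, each unit's output is kept with probability
# p_keep and zeroed otherwise; kept outputs are scaled by 1/p_keep so
# the expected activation is unchanged (the "inverted dropout" convention).
def dropout(activations, p_keep=0.5, rng=random):
    out = []
    for a in activations:
        if rng.random() < p_keep:
            out.append(a / p_keep)  # survives, scaled up
        else:
            out.append(0.0)         # connection dropped this step
    return out

random.seed(0)
print(dropout([1.0, 1.0, 1.0, 1.0], p_keep=0.5))  # each value is 0.0 or 2.0
```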

48
Q

What is required for an activation function to work with backpropagation?

A

The activation functions must be differentiable

Some functions, like ReLU, are not differentiable at specific points but can still be used.
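A sketch of the ReLU case: it is not differentiable at exactly x = 0, but a subgradient convention makes backpropagation work in practice (the choice of 0 at x = 0 is one common convention):

```python
# ReLU and the derivative used in practice by backpropagation.
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Not defined at x == 0 in the strict sense; frameworks simply pick
    # a subgradient (here: 0) and training proceeds without trouble.
    return 1.0 if x > 0 else 0.0

print(relu(-2.0), relu(3.0))       # 0.0 3.0
print(relu_grad(-2.0), relu_grad(3.0))  # 0.0 1.0
```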

49
Q

What is the primary difference between supervised learning and unsupervised learning?

A

Supervised learning requires labeled training data, while unsupervised learning does not

In unsupervised learning, the algorithm identifies clusters without explicit labels.

50
Q

What is self-supervised learning?

A

A method that creates implicit labels from unlabeled data without human involvement

This approach has led to significant advancements in AI, such as ChatGPT.

51
Q

Who developed a deep neural network solution for pattern analysis at UC Berkeley in 2014?

A

Jitendra Malik and colleagues

Their work focused on the PASCAL VOC dataset.

52
Q

What does R-CNN stand for?

A

Region-based Convolutional Neural Network

R-CNN outperformed existing methods in object detection after being fine-tuned.

53
Q

What was the initial concern of Alexei Efros about R-CNN?

A

Why a network trained on ImageNet could detect object boundaries well after fine-tuning

Efros believed the CNN needed general information from ImageNet for effective boundary detection.

54
Q

What was the outcome of the bet between Efros and Malik?

A

Efros lost the bet when R-CNN remained the best for object detection

The bet was about achieving object detection without human annotations.

55
Q

How do large language models (LLMs) like GPT-3 learn?

A

They predict masked words in sentences from a large corpus of text

The learning process involves calculating loss and updating parameters.
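A toy sketch of that loss calculation for a single masked position, assuming the model has already produced a probability for each vocabulary word (names and the three-word vocabulary are illustrative):

```python
import math

# Self-supervised objective sketch: the model assigns a probability to
# every word in the vocabulary at the masked position; the loss is the
# negative log-probability of the true word (cross-entropy).
def masked_word_loss(probs, true_index):
    return -math.log(probs[true_index])

# Confidently predicting the right word gives a low loss.
confident = masked_word_loss([0.01, 0.98, 0.01], true_index=1)  # ~0.02
unsure = masked_word_loss([0.4, 0.2, 0.4], true_index=1)        # ~1.61
print(confident < unsure)  # True
```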

56
Q

What is the goal of the masked auto-encoder (MAE) developed by Kaiming He and colleagues?

A

To generate unmasked images from masked input images

The MAE learns latent representations of key features in images.

57
Q

What is double descent in machine learning?

A

A phenomenon where increasing model capacity leads to improved performance beyond interpolation

It includes a first descent to a minimum test error, followed by an ascent, and a second descent.

58
Q

What does the term ‘terra incognita’ refer to in the context of deep learning?

A

The unexplored mathematical underpinnings of observed behaviors in over-parameterized neural networks

This contrasts with the well-understood behavior in under-parameterized regimes.

59
Q

What tension exists in the machine learning community according to Tom Goldstein?

A

The tension between theoretical and experimental approaches in machine learning

This has implications for the development of new models and understanding their behavior.

60
Q

What is a significant challenge when dealing with the loss function of deep neural networks?

A

The loss function is non-convex with many local minima

This complicates the process of finding the global minimum.

61
Q

True or False: The loss landscape of deep neural networks is well understood.

A

False

There are conflicting theories about the existence of local minima in the loss landscape.

62
Q

What is the nature of the loss landscape in deep neural networks?

A

It is extremely complicated and may contain local minima or global minima

The landscape’s complexity is a significant challenge for theorists.

63
Q

What did Goldstein’s empirical study reveal about local minima in neural networks?

A

Neural networks can get stuck in not-so-good local minima where the loss is non-zero

This occurs despite the networks being over-parameterized.

64
Q

What is stochastic gradient descent?

A

A method where gradient descent is performed using small batches of training data

It approximates the descent direction rather than using the exact steepest descent.
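A minimal sketch of minibatch SGD fitting a one-parameter linear model (all names and hyperparameters are illustrative):

```python
import random

# Stochastic gradient descent: each update uses the gradient computed on
# a small random batch of examples, not on the full training set.
def sgd_fit_slope(xs, ys, lr=0.01, batch_size=4, steps=500, seed=0):
    rng = random.Random(seed)
    w = 0.0  # single parameter: model is y ≈ w * x
    data = list(zip(xs, ys))
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        # gradient of mean squared error with respect to w on this batch
        grad = sum(2 * (w * x - y) * x for x, y in batch) / batch_size
        w -= lr * grad  # step in the (approximate) descent direction
    return w

xs = [float(i) for i in range(1, 11)]
ys = [3.0 * x for x in xs]       # data generated with true slope 3
print(sgd_fit_slope(xs, ys))     # converges close to 3.0
```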

65
Q

What does the term ‘grokking’ refer to in the context of neural networks?

A

The phenomenon where a neural network learns to generalize beyond mere memorization after extensive training

It involves understanding deeper patterns in the data.

66
Q

What is a transformer in machine learning?

A

A type of architecture especially suited for processing sequential data

Examples include LLMs like ChatGPT.

67
Q

How did the transformer network used by Power’s team learn to add numbers?

A

It was trained on a table of modulo-97 addition examples and learned to represent numbers in a high-dimensional space

The learning process involved predicting masked numbers in the addition equations.
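A sketch of what that training table looks like (the tuple representation is illustrative; the actual experiments encode each equation as a token sequence with the answer masked):

```python
# The grokking experiments' training data: all equations "a + b = c"
# under modulo-97 arithmetic.
MOD = 97

def make_addition_table():
    """All 97 * 97 equations as (a, b, (a + b) mod 97) triples."""
    return [(a, b, (a + b) % MOD) for a in range(MOD) for b in range(MOD)]

table = make_addition_table()
print(len(table))          # 9409 equations in total
print((96, 96) + (table[0][2],))  # smallest entry's answer for reference
```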

68
Q

What happens when a transformer network stops training after reaching zero training loss?

A

It likely interpolates the training data and memorizes it

This results in poor performance on unseen test data.

69
Q

What is a phase change in the context of grokking?

A

The transition from memorizing a table of answers to understanding underlying knowledge

It is likened to physical phase changes, such as water turning to ice.

70
Q

What is the significance of Minerva in language models?

A

It was the first LLM to correctly answer about 50% of high school-level math questions in the MATH dataset

Minerva used a large language model architecture to predict answers based on token sequences.

71
Q

How does Minerva generate answers to math questions?

A

It converts the question into a sequence of tokens and predicts the answer token by token

This raises questions about whether it is reasoning or merely pattern matching.

72
Q

What are AI winters?

A

Periods of stagnation in AI research due to lack of progress or overhyped expectations

Notable AI winters occurred in the late 1960s, 1970s, and late 1980s.

73
Q

What does Goldstein argue about the current state of AI research?

A

We may still be experiencing an AI winter regarding tasks that involve text comprehension and logical reasoning

There’s ongoing debate about the effectiveness of neural networks alone in achieving true AI.

74
Q

What was the training approach used for both PaLM and Minerva?

A

They were trained using self-supervised learning to predict masked tokens

This method does not involve explicit reasoning or problem-solving.

75
Q

Fill in the blank: The addition operation in the training data used by Power’s team was _______.

A

modulo-97 addition