w6 Flashcards
The presentation contrasts ELIZA and modern LLMs. How does the transition from symbolic programming to artificial neural networks impact the interpretability and ethical concerns of these systems?
Neural networks operate as “black boxes,” making their decision-making processes difficult to interpret, unlike symbolic systems with explicit rules. This lack of transparency raises ethical concerns about accountability, bias, and fairness in AI applications.
Explain why benchmarking in AI is considered “broken,” and propose a cognitive psychology-inspired approach to better evaluate LLMs’ abilities.
Benchmarks fail because LLMs often exploit dataset shortcuts without true understanding. A psychology-inspired approach would use hypothesis-driven evaluations that mimic human cognitive tasks, such as theory-of-mind tests adapted for token prediction, as suggested in the presentation.
How do the challenges of the “last 10%” problem reflect broader limitations in AI generalization and understanding?
The “last 10%” highlights AI’s struggle with variability, context sensitivity, and edge cases, which require a depth of reasoning and adaptability that current systems lack. This reflects their reliance on patterns rather than conceptual understanding, as discussed in Chapter 13.
How might LLMs’ lack of grounding in physical or social experiences affect their ability to handle theory-of-mind tasks?
Without grounding, LLMs cannot form causal or intentional models, leading to superficial responses in theory-of-mind tasks. Their outputs might mimic understanding but lack the depth required for accurate social reasoning.
How do content-sensitive patterns observed in reasoning tasks challenge traditional views of symbolic reasoning, and what does this mean for AI’s potential?
Content-sensitive patterns suggest that LLMs can mimic human-like reasoning without explicitly following symbolic rules. This challenges traditional views by proposing that statistical models may develop novel forms of reasoning, distinct from human cognition.
What is a key reason benchmarking is considered “broken” for evaluating LLMs, as discussed in the presentation?
a) Benchmarks fail to measure computational efficiency.
b) LLMs solve benchmarks using superficial patterns rather than deeper understanding.
c) Benchmarks only evaluate explicit bias and not implicit bias.
d) Benchmarks rely on human testers, which introduces variability.
b) LLMs solve benchmarks using superficial patterns rather than deeper understanding.
Why does the presentation argue that targeted evaluation is preferable to standard benchmarking for LLMs?
a) It requires less computational power.
b) It aligns with the principle of avoiding sweeping conclusions about LLMs.
c) It allows for faster model fine-tuning.
d) It eliminates biases in the training data.
b) It aligns with the principle of avoiding sweeping conclusions about LLMs.
According to the Mitchell-Krakauer article, why is “scale is all you need” considered a controversial claim?
a) It dismisses the need for diverse training data.
b) It overlooks the importance of model interpretability.
c) It assumes that increasing model size will lead to genuine understanding.
d) It disregards the role of emergent abilities in smaller models.
c) It assumes that increasing model size will lead to genuine understanding.
What principle is emphasized in the presentation to evaluate LLMs’ theory-of-mind abilities?
a) Using explicit rule-based reasoning tasks.
b) Translating cognitive tasks into token prediction tasks.
c) Measuring emotional alignment with human responses.
d) Ensuring the model has not seen the test during training.
b) Translating cognitive tasks into token prediction tasks.
Which of the following is NOT a challenge identified in Chapter 13’s “last 10%” problem?
a) Speech recognition systems handling unknown words.
b) Machine translation systems interpreting idiomatic expressions.
c) Object detection systems failing on common objects.
d) AI models understanding nuanced contextual meaning.
c) Object detection systems failing on common objects.
True or False: According to the presentation, modern LLMs like GPT-4 are interpretable due to their self-learning mechanisms.
False
True or False: The Mitchell-Krakauer article argues that LLMs rely on statistical patterns and lack grounding in physical and social experiences.
True
True or False: Chapter 13 suggests that the “last 10%” problem for speech recognition is primarily caused by computational inefficiency.
False
True or False: The presentation highlights that implicit bias in LLMs can persist even in models explicitly fine-tuned to eliminate explicit bias.
True
True or False: Benchmarking AI on standard datasets is still considered the best way to measure understanding and reasoning abilities.
False
Three key principles of LLM psychology
- Transform cognitive task into word prediction task
- Consider (and control for) the training data
- Avoid sweeping conclusions (and sweeping questions)
explain what the 1st principle means (LLMs as next-token prediction machines)
At their core, Large Language Models (LLMs) are prediction machines designed to compute the likelihood of the next token (word or character) given a sequence of prior tokens.
This principle emphasizes that all behaviors exhibited by LLMs, such as reasoning or answering questions, stem from this fundamental task.
Implication: To fairly evaluate LLMs, any cognitive or reasoning task must be reframed as a next-token prediction problem.
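A minimal sketch of what this reframing can look like in practice, assuming the Hugging Face `transformers` library and the small `gpt2` checkpoint (any causal LM could be substituted); the false-belief prompt and candidate answers are illustrative assumptions, not items from the presentation:

```python
# Sketch: turning a false-belief ("theory of mind") question into a
# next-token prediction problem. Model, prompt, and candidates are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = ("Sally puts her ball in the basket and leaves the room. "
          "Anne moves the ball to the box. Sally will look for the ball in the")
candidates = [" basket", " box"]  # hypothetical false-belief probe

with torch.no_grad():
    inputs = tokenizer(prompt, return_tensors="pt")
    next_token_logits = model(**inputs).logits[0, -1]   # distribution over the next token
    log_probs = torch.log_softmax(next_token_logits, dim=-1)

for cand in candidates:
    tok_id = tokenizer.encode(cand)[0]                  # first token of the candidate
    print(cand, log_probs[tok_id].item())
# The "cognitive" question (where will Sally look?) is answered purely by
# comparing which continuation receives the higher probability.
```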
explain principle 2 - Consider the training data
Modern LLMs are trained on astronomical amounts of data, often without full transparency regarding the datasets.
This introduces the possibility that models may have encountered test cases during training (data contamination).
Implication: Evaluations of LLMs must account for the training data to avoid overestimating their generalization abilities.
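One deliberately simplified way to think about controlling for contamination is a verbatim n-gram overlap check between a test item and the training corpus. This is a toy sketch with placeholder data, not a method from the presentation; real analyses also have to handle near-duplicates and paraphrases:

```python
# Toy contamination check: does any long n-gram of a test item appear verbatim
# in the training corpus? The corpus and test item below are placeholders.
def ngrams(text: str, n: int = 8) -> set[str]:
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(test_item: str, training_docs: list[str], n: int = 8) -> bool:
    test_grams = ngrams(test_item, n)
    return any(test_grams & ngrams(doc, n) for doc in training_docs)

training_docs = ["(assume this is one of billions of training documents)"]
print(is_contaminated("Sally puts her ball in the basket and leaves the room.", training_docs))
# If the test text was seen during training, success on it tells us little
# about generalization.
```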
explain principle 3 - Avoid sweeping conclusions (and sweeping questions)
LLMs’ behaviors should not lead to overgeneralized claims about their capabilities or limitations.
Example: A failure in one context does not imply a lack of understanding, just as a success does not equate to humanlike reasoning.
Implication: Researchers should adopt a nuanced approach, avoiding extreme skepticism or overconfidence, and focus on specific abilities in well-defined contexts.
Machine psychology
Machine psychology is the study of artificial systems, such as large language models (LLMs), to understand their behavior and capabilities through psychological principles.
- It involves analyzing their outputs (e.g., reasoning, language use) as emergent properties of their design (e.g., next-token prediction) and training data, while emphasizing that these systems do not think or understand like humans.
- It focuses on evaluating machine “cognition” using tools and frameworks from human psychology but tailored to the limitations and mechanics of AI systems.
can you explain nativism vs emergentism/connectionism
- Nativism suggests that LLMs might succeed at certain tasks because their architecture mimics innate principles, like statistical learning frameworks.
- Emergentism/Connectionism, by contrast, frames LLM abilities as arising from their exposure to vast amounts of training data and the learned patterns within, rather than any “innate” programming of specific cognitive structures.
- LMs can produce fluent text — but do they actually know the rules of grammar?
✅ Yes!
If the model knows grammar, then: P(grammatical) > P(ungrammatical)
what's the problem with this logic?
- Problem: many factors affect word probability — beyond grammar!
- Solution: use minimal pairs, i.e. pairs of sentences with a minimal difference (see the sketch below)!
- Ideally: the sentences do not occur in the training data (syntactic generalisation)
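A hedged sketch of such a minimal-pair test, again assuming `transformers` and `gpt2`; the subject-verb agreement pair is a standard illustrative example, not necessarily one used in the presentation:

```python
# Minimal-pairs sketch: a model "knows" subject-verb agreement if it assigns
# higher total log-probability to the grammatical member of the pair.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_log_prob(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)   # token t predicts token t+1
    targets = ids[0, 1:]
    return log_probs.gather(1, targets.unsqueeze(1)).sum().item()

grammatical = "The keys to the cabinet are on the table."
ungrammatical = "The keys to the cabinet is on the table."   # minimal difference: are/is
print(sentence_log_prob(grammatical) > sentence_log_prob(ungrammatical))  # expected: True
```

Because the two sentences differ only in the agreement morpheme, other confounds on word probability are largely held constant.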
But are they truly reasoning, or are they just parroting and pattern matching?
Traditional (symbolic) view
► LLMs only use “simple heuristics” rather than true abstract reasoning
► Their apparent reasoning is just pattern matching from training data
Emergentist (connectionist) view
► Human reasoning is not logical, but content-sensitive and contextual
► These reasoning patterns emerge naturally from DNN/LLM training
False dichotomy: they do both
True or False: Both humans and LLMs perform much better when the content supports the conclusion!
True
Language and Grammar:
LLMs demonstrate remarkable fluency and coherence in language generation, capable of producing grammatically correct sentences.
They excel at capturing statistical patterns of language but lack true semantic understanding, relying on next-token prediction rather than grounded comprehension.
Reasoning:
LLMs can perform logical and analogical reasoning tasks, but their success depends on statistical patterns in the training data rather than genuine cognitive processes.
They often fail on abstract or novel reasoning tasks requiring true generalization.
Theory of Mind:
While LLMs can simulate behaviors resembling a theory of mind (e.g., predicting others’ beliefs), these are surface-level approximations derived from language patterns.
They lack the experiential and embodied grounding required for genuine mental state attribution.
Bias and Prejudice:
LLMs inherit biases present in their training data, reflecting and amplifying societal stereotypes and prejudices.
Addressing these biases requires careful dataset curation, fine-tuning, and transparency in deployment.
ELIZA vs LLMs vs the human mind
| ELIZA | Large Language Models | The human mind |
| --- | --- | --- |
| Symbolic computer program | Artificial neural network | Biological neural network |
| Simple (~250 lines of code) | Complex (billions of parameters) | Complex (~100B units, 100T connections) |
| Human designed | Self-taught (machine learning) | Self-taught (evolution + learning) |
| Human-interpretable | Non-interpretable (“black box”) | Non-interpretable (biological “black box”) |
what are the behaviorist and cognitivist approaches to understanding LLMs, and why are we switching to the cognitivist approach?
The behaviorist approach to understanding LLMs focuses on evaluating observable outputs, such as whether the model can mimic human-like behavior, without considering the internal cognitive processes. In contrast, the cognitivist approach seeks to understand the internal workings of the model, such as how it represents and manipulates information. The shift toward cognitivism arises from the limitations of behaviorism, as it fails to address the actual cognitive capabilities of LLMs, such as reasoning and generalization, making cognitivism more suitable for understanding and improving these models.
why is the dichotomy of true believers vs skeptics a false dichotomy in the context of LLM understanding
Because understanding is not a single, monolithic concept.
can LLMs learn many abstract syntactic rules?
yes
- GPT-style transformer LLMs learn human-like, abstract syntactic knowledge, even from only 40M words (~ the training data of a 4-year-old)