lecture 11 Flashcards
limitations of rule-based systems
- high precision but low recall
- tedious to build and maintain
why end-to-end supervised learning
= learn to perform a task directly, mapping raw input data to the desired output
if a phenomenon is systematic, it should be learnable from examples
the learning algorithm infers the relevant features and tendencies itself when shown enough examples
black box
we don’t know what the system learned
Internally, the system applies learned patterns to new data, but we, as users, cannot always see or understand the specific logic it used to reach its conclusions. This opacity forces us to conduct selective, detailed error analysis to understand where and why the algorithm makes mistakes.
bias
- systematic error
- evaluation metrics do not reveal bias
- systems may perform well on the test set, but this is not indicative of how they will perform in the real world
–> better performance does not necessarily mean better understanding
consequences of bias
- behavior on new data in the real world might be worse (the system doesn't generalize)
- ethical issues
- scientific questions/hypothesis testing: better performance does not necessarily mean better understanding
natural language inference (NLI)
- premise and hypothesis
- does the hypothesis follow from the premise
- entailment, contradiction, neutral
- general means of assessing how well a system can understand natural language
NLI architecture
- the premise and hypothesis sentences are processed separately by their respective sentence encoders
–> the output of the premise encoder is labeled 'u' and the output of the hypothesis encoder is labeled 'v'
- the encoded vectors 'u' and 'v' are then combined and passed through a fully connected layer, which aims to capture the relationship between the two sentences
- a softmax function is applied to produce the final classification ('entailment', 'contradiction', or 'neutral') based on the combined encoded information (a minimal sketch below)
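A minimal PyTorch sketch of this siamese setup. The BiLSTM encoder, the dimensions, and the InferSent-style combination [u; v; |u−v|; u*v] are assumptions, since the card only says the vectors are "combined":

```python
import torch
import torch.nn as nn

class NLIClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # shared sentence encoder (a BiLSTM here; any encoder would do)
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(8 * hidden_dim, hidden_dim),  # fully connected layer
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),     # entailment/contradiction/neutral
        )

    def encode(self, tokens):
        # max-pool over time to get a fixed-size sentence vector
        output, _ = self.encoder(self.embed(tokens))
        return output.max(dim=1).values

    def forward(self, premise, hypothesis):
        u = self.encode(premise)      # premise representation u
        v = self.encode(hypothesis)   # hypothesis representation v
        combined = torch.cat([u, v, torch.abs(u - v), u * v], dim=1)
        # returns logits; softmax is applied by nn.CrossEntropyLoss (or explicitly)
        return self.classifier(combined)
```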
possible shortcuts for NLI
- predicting the majority class
- superficial similarities between premise and hypothesis
–> e.g., exact word matches, similar phrases (see the overlap sketch after this list)
–> can a model perform well using only BOW representations?
- accidental characteristics of the hypotheses
–> certain unintended patterns or features in the hypotheses can skew predictions, leading models to latch onto these incidental cues instead of performing a true semantic analysis
–> solution: use hypothesis-only baselines
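To make the superficial-similarity shortcut concrete, here is a hypothetical word-overlap heuristic; a model that internally learns something like this can score well on NLI without any semantic analysis (the threshold and labels are illustrative):

```python
def overlap_baseline(premise: str, hypothesis: str, threshold: float = 0.8) -> str:
    """Predict 'entailment' whenever most hypothesis words also occur in the premise."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    overlap = len(p & h) / len(h)
    return "entailment" if overlap >= threshold else "neutral"

print(overlap_baseline("A man is sleeping on a couch", "A man is sleeping"))
```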
hypothesis-only baselines
train a state-of-the-art model on the hypotheses only
- each hypothesis is processed by a sentence encoder, which transforms it into a vector representation, labeled 'v', which is then passed through a fully connected layer
- analyze which characteristics of the hypotheses the model exploits (a minimal sketch below)
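A minimal sketch of the idea; the lecture trains a state-of-the-art model, but a simple bag-of-words classifier on toy stand-in data illustrates the logic:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in data: the premises are deliberately never used.
hypotheses = ["A person is asleep", "Nobody is eating",
              "Someone is outdoors", "A cat sits indoors"]
labels = ["entailment", "contradiction", "neutral", "contradiction"]

baseline = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
baseline.fit(hypotheses, labels)

# If this premise-blind classifier beats the majority baseline on held-out
# data, the hypotheses contain label-revealing artifacts (e.g., negation
# words such as 'nobody' correlating with 'contradiction').
print(baseline.predict(["Nobody is outdoors"]))
```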
lessons about shortcuts
- performance does not tell the full story
- a majority baseline is not a sufficiently high bar (computed below)
- the process of dataset creation can introduce unwanted patterns that models may exploit: if certain words or phrases are more commonly associated with specific outcomes because of how the data was labeled, models will pick up on these regularities
- it is often simpler for models to exploit these regularities than to solve the actual task; this shortcut undermines the purpose of training models to understand and process natural language accurately
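For reference, the majority baseline is just the frequency of the most common label (toy numbers below); clearing it says little, which is why it is not a sufficient control:

```python
from collections import Counter

# Hypothetical label distribution of a test set
labels = ["entailment", "entailment", "neutral", "contradiction", "entailment"]
majority_label, count = Counter(labels).most_common(1)[0]
print(f"always predicting '{majority_label}' scores {count / len(labels):.0%}")
```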
model interpretability
- structural analysis
- challenge datasets
structural analysis of a network
- what are the main components in a network?
–> input layer, hidden layers, output layer
- diagnostic classification or probing: test whether the learned representations carry information
analyze representations of a neural network
train a classifier to predict specific information from the representation
diagnostic classification/probing
- select a linguistic property/feature (e.g., POS tags)
- extract the representations (activations) from a specific layer of the neural network
- to test whether these representations carry useful information, train a classifier that takes them as inputs and the labels of the linguistic property (e.g., noun, verb, etc.) as outputs
- evaluate its performance
–> if the classifier performs well, it suggests that the representations from that layer encode significant information about the selected linguistic property (a minimal sketch below)
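A minimal probing sketch; the random arrays stand in for real layer activations and gold POS tags, which would come from the model and corpus under study:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed inputs: per-token representations from one layer
# (shape: n_tokens x hidden_dim) and the gold POS tag of each token.
rng = np.random.default_rng(0)
reps = rng.normal(size=(1000, 64))          # stand-in for real activations
pos_tags = rng.choice(["NOUN", "VERB", "ADJ"], size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(reps, pos_tags, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# High held-out accuracy would suggest the layer encodes POS information;
# on this random stand-in data it stays near chance.
print("probe accuracy:", probe.score(X_te, y_te))
```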
amnesic probing
- use a diagnostic classifier to identify information
- remove that information from the representations
- compare model performance on a task before and after removal
–> e.g., remove syntactic information and check whether the model can still perform the language modeling task (masked word prediction)
–> addresses a limitation of standard probing: a probe can detect information the model never actually uses; amnesic probing tests whether the information matters for behavior (sketch below)
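A sketch of the removal step, assuming the linear nullspace-projection approach; this shows a single projection, whereas the full method iterates with freshly trained probes until the property is unrecoverable:

```python
import numpy as np

def remove_direction(reps: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Project representations onto the nullspace of one probe direction w,
    removing the linear information that direction carries."""
    w = w / np.linalg.norm(w)
    return reps - np.outer(reps @ w, w)  # subtract each vector's component along w

# e.g., w could come from a trained linear probe (probe.coef_[0]); after
# projection, rerun masked word prediction and compare performance.
```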
challenge sets
gaining insights into the internal workings of a neural network
- select carefully chosen inputs
- observe the outputs
- gain insight into what happens inside the black box by systematically analyzing how different inputs affect the outputs (minimal-pair sketch below)
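A sketch of how such an evaluation might be scripted; `model_score` is a hypothetical stand-in for the black-box system under study, and the minimal pair is an assumed example:

```python
def model_score(sentence: str) -> float:
    """Placeholder for the system under study; replace with the real
    model's score (e.g., sentence log-probability)."""
    return -len(sentence)  # dummy value, illustration only

# Minimal pair differing only in subject-verb agreement:
good = "The dog that the cats chase runs."
bad = "The dog that the cats chase run."
print("prefers grammatical:", model_score(good) > model_score(bad))
```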