Hugging Face ecosystem | HF NLP course | 7. Main NLP Tasks | Priority Flashcards
[q] How would you inspect the class names of a token classification dataset?
label_names = raw_datasets["train"].features["ner_tags"].feature.names
[q] What are the basic steps (3) for converting texts to token IDs, with aligned labels, before the model can make sense of them?
– Apply a function to tokenize and align labels for each split of the dataset with map().
– Write a function that combines tokenization and label alignment for the examples of one split.
– Write a function to align labels with tokens for one example.
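The innermost step above can be sketched as follows (a minimal version, assuming word-level labels where each B-XXX tag has an odd id and its matching I-XXX tag is id + 1, as in conll2003):

```python
def align_labels_with_tokens(labels, word_ids):
    # Map word-level NER labels onto subword tokens.
    # Special tokens (word_id is None) get -100 so the loss ignores them;
    # inner subwords of a word turn B-XXX (odd id) into I-XXX (id + 1).
    new_labels = []
    current_word = None
    for word_id in word_ids:
        if word_id is None:
            new_labels.append(-100)
        elif word_id != current_word:
            # Start of a new word: keep its label as-is
            current_word = word_id
            new_labels.append(labels[word_id])
        else:
            # Same word continued: B-XXX becomes I-XXX
            label = labels[word_id]
            if label % 2 == 1:
                label += 1
            new_labels.append(label)
    return new_labels
```

With labels [1, 0] (e.g. B-PER, O) and word_ids [None, 0, 0, 1, None], this yields [-100, 1, 2, 0, -100]: the split word's second subword gets the I-PER id.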
[q] How do you pad the labels the exact same way as the inputs so that they stay the same size?
from transformers import DataCollatorForTokenClassification
data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)
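What that label padding amounts to can be illustrated with a small stand-in (pad_labels is a hypothetical helper for illustration, not part of transformers; the real collator also pads the input tensors and handles attention masks):

```python
def pad_labels(batch_labels, pad_to=None):
    # Mirror the collator's treatment of labels: pad every label list
    # to the batch's max length with -100, so the padded positions
    # are ignored by the loss, exactly like special tokens.
    length = pad_to or max(len(labels) for labels in batch_labels)
    return [labels + [-100] * (length - len(labels)) for labels in batch_labels]
```

For a batch [[0, 1], [0]], this returns [[0, 1], [0, -100]]: the shorter example is padded up to the batch length with the ignored index.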
[q] How is a metric loaded for token classification?
!pip install seqeval
import evaluate
metric = evaluate.load("seqeval")
[q] What are the basic steps in a compute_metrics() function that takes the arrays of predictions and labels, and returns a dictionary with the metric names and values?
* Take the argmax of the logits to get predictions.
* Convert integer indices to label strings, ignoring special tokens (-100).
* Call metric.compute() on the predictions and labels.

import numpy as np

def compute_metrics(eval_preds):
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)

    # Remove ignored index (special tokens) and convert to labels
    true_labels = [[label_names[l] for l in label if l != -100] for label in labels]
    true_predictions = [
        [label_names[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    all_metrics = metric.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": all_metrics["overall_precision"],
        "recall": all_metrics["overall_recall"],
        "f1": all_metrics["overall_f1"],
        "accuracy": all_metrics["overall_accuracy"],
    }
[q] How do you set up an Accelerator with a model to train?
from accelerate import Accelerator
accelerator = Accelerator()
model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader
)
[q] What does a postprocess() function need to do during a token classification model’s training?
It takes the model's predictions and labels and converts them to lists of label strings, which is the format our metric object expects.
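A minimal sketch of such a function, assuming label_names holds the dataset's class names (the three-label set below is just an illustration) and -100 marks ignored positions:

```python
label_names = ["O", "B-PER", "I-PER"]  # hypothetical label set for illustration

def postprocess(predictions, labels):
    # Turn batches of label ids into nested lists of label strings,
    # dropping the -100 positions (special tokens / padding),
    # which is the format seqeval's metric.compute() expects.
    true_labels = [
        [label_names[l] for l in label if l != -100] for label in labels
    ]
    true_predictions = [
        [label_names[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    return true_predictions, true_labels
```

For predictions [[0, 1, 2, 0]] and labels [[-100, 1, 2, -100]], both outputs are [["B-PER", "I-PER"]]: only the positions with real labels survive.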