Deep Learning Fundamentals - Hugging Face Flashcards

1
Q

What is Hugging Face?

A

Hugging Face is a leading technology company in natural language processing (NLP) and machine learning, known for its open-source library, Transformers, which provides access to state-of-the-art NLP models and tools.

2
Q

What is Tokenization? And Tokens?

A

Tokenization is like cutting a sentence into individual pieces, such as words or characters, to make it easier to analyze.

Tokens are the pieces you get after cutting up text during tokenization. They can be words, parts of words, or even single letters.

These tokens are converted to numerical values for models to understand.

Code Example:
~~~
from transformers import BertTokenizer

# Initialize the tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# See how many tokens are in the vocabulary
print(tokenizer.vocab_size)
# 30522

# Tokenize the sentence
tokens = tokenizer.tokenize("I heart Generative AI")

# Print the tokens
print(tokens)
# ['i', 'heart', 'genera', '##tive', 'ai']

# Show the token ids assigned to each token
print(tokenizer.convert_tokens_to_ids(tokens))
# [1045, 2540, 11416, 6024, 9932]

~~~
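
To see why "Generative" ends up as two pieces, here is a minimal sketch of greedy longest-match subword splitting, the general idea behind WordPiece-style tokenizers. The function name `wordpiece_tokenize`, the toy vocabulary, and its ids are hypothetical and chosen only for illustration; the real BERT tokenizer also handles punctuation, casing rules, and special tokens.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one lowercase word into subword tokens."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no piece matched, so the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary (token -> id); the ids are made up for illustration.
toy_vocab = {"[UNK]": 0, "i": 1, "heart": 2, "genera": 3, "##tive": 4, "ai": 5}

sentence = "I heart Generative AI"
tokens = []
for word in sentence.lower().split():
    tokens.extend(wordpiece_tokenize(word, toy_vocab))

token_ids = [toy_vocab[t] for t in tokens]
print(tokens)     # ['i', 'heart', 'genera', '##tive', 'ai']
print(token_ids)  # [1, 2, 3, 4, 5]
```

Because "generative" is not in the toy vocabulary as a whole word, the sketch falls back to the longest prefix it knows ("genera") and then a continuation piece ("##tive").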

3
Q

Why are Hugging Face models so useful? What does the no_grad method mean?

A

Hugging Face models provide a quick way to get started using models trained by the community. With only a few lines of code, you can load a pre-trained model and start using it on tasks such as sentiment analysis.

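torch.no_grad() is a PyTorch context manager that turns off gradient tracking. Wrapping the forward pass in it means that the model is being used only for prediction, not for training, which also saves memory and computation.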

Code Example:
~~~
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Load a pre-trained sentiment analysis model
model_name = "textattack/bert-base-uncased-imdb"
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize the input sequence
tokenizer = BertTokenizer.from_pretrained(model_name)
inputs = tokenizer("I love Generative AI", return_tensors="pt")

# Make prediction (no gradients are needed for inference)
with torch.no_grad():
    outputs = model(**inputs).logits
    probabilities = torch.nn.functional.softmax(outputs, dim=1)
    predicted_class = torch.argmax(probabilities, dim=1).item()

# Display sentiment result
if predicted_class == 1:
    print(f"Sentiment: Positive ({probabilities[0][1] * 100:.2f}%)")
else:
    print(f"Sentiment: Negative ({probabilities[0][0] * 100:.2f}%)")
# Sentiment: Positive (88.68%)
~~~
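
The softmax and argmax steps in the example above can be worked through by hand. The following is a small pure-Python sketch, not the PyTorch implementation; the logit values are hypothetical and were picked only to make the arithmetic easy to follow.

```python
import math


def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum does not change the result but avoids overflow.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one input: index 0 = negative, index 1 = positive.
logits = [-1.0, 1.0]
probabilities = softmax(logits)

# argmax: the index of the largest probability is the predicted class.
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
label = "Positive" if predicted_class == 1 else "Negative"

print(f"Sentiment: {label} ({probabilities[predicted_class] * 100:.2f}%)")
# Sentiment: Positive (88.08%)
```

The model outputs one logit per class; softmax rescales them so they can be read as probabilities, and argmax picks the most likely class.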

4
Q

Why is the Hugging Face Datasets library so good?

A

The Hugging Face Datasets library is a tool for loading and processing many kinds of data, such as text and images, efficiently and easily. It is fast and memory-efficient because datasets are stored in the Apache Arrow format and memory-mapped from disk instead of being loaded fully into RAM, which makes it well suited to large projects.

5
Q

What are Hugging Face trainers?

A

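Hugging Face trainers (the Trainer class in the Transformers library) offer a simplified approach to training and fine-tuning models, including generative AI models, by handling the training loop for you.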

6
Q

What is Truncating and Padding? Why do we use padding in machine learning models?

A

Truncating: This refers to shortening longer pieces of text to fit a certain size limit.

Padding: Adding extra data to shorter texts to reach a uniform length for processing.

We use padding to ensure that all inputs in a batch have the same length, so they can be processed together as a single tensor.
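
As a rough illustration of both operations, here is a minimal sketch on plain lists of token ids. The helper name `pad_or_truncate`, the `pad_id` of 0, and the attention mask are assumptions made for this example; real tokenizers do this for you and also take care of special tokens.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (ids, attention_mask), both exactly max_length long."""
    ids = token_ids[:max_length]        # truncating: drop tokens past the limit
    mask = [1] * len(ids)               # 1 marks a real token
    missing = max_length - len(ids)
    ids = ids + [pad_id] * missing      # padding: fill up to the limit
    mask = mask + [0] * missing         # 0 marks a padding position
    return ids, mask


# Two sequences of different lengths (illustrative token ids).
batch = [
    [1045, 2540],
    [1045, 2540, 11416, 6024, 9932],
]

padded = [pad_or_truncate(ids, max_length=4) for ids in batch]
for ids, mask in padded:
    print(ids, mask)
# [1045, 2540, 0, 0] [1, 1, 0, 0]
# [1045, 2540, 11416, 6024] [1, 1, 1, 1]
```

After this step every sequence has length 4, so the batch forms a rectangular grid, and the mask lets a model ignore the padding positions.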
