9 Flashcards
- Which statement is not correct?
a. BERT, GPT, and Llama are all large language models.
b. ChatGPT uses an encoder-based architecture.
c. LSTMs are a special kind of RNN.
d. Bidirectional LSTMs read a sentence from left to right and from right to left; in that way, they are already capable of understanding words in their context to some extent.
b. ChatGPT uses an encoder-based architecture.
- Which sentence is correct?
a. Stemming will retain the stem of a word and make sure that the remaining word is an existing one.
b. (De)capitalization, emoji removal, and transformers are preprocessing methods.
c. Bag of Words is an advanced approach for featurizing documents that is not straightforward and is very inexpensive when there is a large vocabulary.
d. TF-IDF stands for Term Frequency Inverse Document Frequency; this technique takes into account how often words occur in a document as well as the number of documents in which they occur, and normalizes for the latter.
d. TF-IDF stands for Term Frequency Inverse Document Frequency; this technique takes into account how often words occur in a document as well as the number of documents in which they occur, and normalizes for the latter.
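As a quick refresher for the TF-IDF card above, here is a minimal sketch using scikit-learn's TfidfVectorizer (assuming scikit-learn is installed; the toy documents are illustrative only):

```python
# TF-IDF sketch: term frequency weighted by inverse document frequency.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs can be pets",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)   # sparse matrix: one row per document, one column per term

# Terms that appear in many documents (e.g. "the", "sat") are down-weighted by the
# inverse document frequency; terms that are rare across documents score higher.
print(vectorizer.get_feature_names_out())
print(X.toarray().round(2))
```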
- Which statement about BERT is not correct?
a. BERT is pretrained using two training steps: masked language modeling and next sentence prediction.
b. Masked language modeling means that words are randomly masked out and the model has to predict which word should be at the masked position.
c. BERT is not able to understand words in their context.
d. BERT is pretrained in an unsupervised manner on enormous amounts of data.
c. BERT is not able to understand words in their context.
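A small fill-mask sketch of masked language modeling, assuming the Hugging Face transformers library is installed and can download the bert-base-uncased checkpoint:

```python
# Masked language modeling sketch: BERT predicts the token hidden behind [MASK],
# using the context on both sides of the mask.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for candidate in fill_mask("The capital of France is [MASK]."):
    print(candidate["token_str"], round(candidate["score"], 3))
```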
- RNN stands for:
a. Recurrent Natural Network
b. Retrainable Neural Network
c. Recurrent Neural Network
d. Random Natural Network
c. Recurrent Neural Network
- Which statement is correct?
a. A neural network typically only consists of an input and output layer.
b. CamemBERT is a French version of BERT specialized in classifying text about types of cheese.
c. RoBERTa is an encoder-based model.
d. BERT is pretrained on labeled data using two pretraining steps.
c. RoBERTa is an encoder-based model.
- Natural Language processing can be defined as follows:
a. Natural language processing is a collection of computational techniques for automatic analysis and representation of human languages.
b. Natural language processing is the processing of sounds produced in human language in order to identify and respond to those sounds.
c. Natural Language Processing is the field of study focused solely on the creation of chatbots like ChatGPT.
d. Natural Language Processing is the field where units of textual information, such as documents, are found by matching a user’s search terms.
a. Natural language processing is a collection of computational techniques for automatic analysis and representation of human languages.
- What sentence is correct?
a. One-hot encodings are an example of sparse distributed representations.
b. Bag of words only uses 1-grams to featurize documents.
c. Stopword removal is a preprocessing method that removes non-existent words from sentences.
d. Using stemming, the word ‘stopping’ would become ‘stop’.
a. One-hot encodings are an example of sparse distributed representations.
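A minimal one-hot encoding sketch over a toy vocabulary, showing why these vectors are sparse (exactly one entry is 1, all others are 0); plain Python, no libraries assumed:

```python
# One-hot encoding sketch: each word maps to a sparse vector with a single 1.
vocabulary = ["train", "car", "light", "candle"]

def one_hot(word, vocab):
    vec = [0] * len(vocab)          # one dimension per word in the vocabulary
    vec[vocab.index(word)] = 1      # a single 1 at the word's index, all other entries stay 0
    return vec

print(one_hot("car", vocabulary))   # [0, 1, 0, 0]
```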
- Which sentence is not correct?
a. If the vocabulary is (train, car, light, candle), possible one-hot encodings are candle=[1 0 0 0], light=[0 1 0 0], car=[0 0 1 0], train=[0 0 0 1].
b. Dense distributed representations typically have a vocabulary size that is smaller than the dimension of the representation.
c. Continuous Bag of Words and Skip-gram are two possible architectures for creating Word2Vec word embeddings.
d. Pretrained word embeddings are vector representations of words that also include syntactic and semantic word relationships.
b. Dense distributed representations typically have a vocabulary size that is smaller than the dimension of the representation.
- Which sentence is correct?
a. The continuous bag of words architecture predicts the context based on a word.
b. Simple methods like logistic regression with TF-IDF are not useful anymore now that large language models exist.
c. Changing the word ‘changing’ to ‘chang’ is an example of lemmatization.
d. Pretrained word embeddings trained with the Word2Vec architecture can make analogies such as vec(Paris) - vec(France) + vec(Belgium) = vec(Brussels).
d. Pretrained word embeddings trained with the Word2Vec architecture can make analogies such as vec(Paris) - vec(France) + vec(Belgium) = vec(Brussels).
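A sketch of the analogy arithmetic with pretrained Word2Vec vectors, assuming gensim is installed and the word2vec-google-news-300 model can be downloaded; the top result is expected, not guaranteed, to be Brussels:

```python
# Word2Vec analogy sketch: vec(Paris) - vec(France) + vec(Belgium) ~ vec(Brussels).
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")   # pretrained Word2Vec KeyedVectors

# most_similar adds the "positive" vectors, subtracts the "negative" ones,
# and returns the nearest words to the resulting vector.
print(vectors.most_similar(positive=["Paris", "Belgium"], negative=["France"], topn=3))
```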
- Which statement is not correct?
a. GloVe captures global corpus statistics, and therefore addresses Word2Vec’s disadvantage of only taking into account local context.
b. FastText can make word representations for unknown words because the word representations are based on the full word.
c. Word2Vec can only make representations for known words that are in the vocabulary, FastText addresses this disadvantage.
d. Bag of Words, TF-IDF, and pretrained word embeddings are used to encode words in sentences.
b. FastText can make word representations for unknown words because the word representations are based on the full word.
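A small sketch of why option b is the incorrect one: FastText builds vectors from character n-grams rather than the full word, which is exactly what lets it handle unknown words. Assumes gensim is installed; the tiny corpus and the made-up word are illustrative only:

```python
# FastText sketch: an out-of-vocabulary word still gets a vector via character n-grams.
from gensim.models import FastText

sentences = [["the", "cat", "sat"], ["the", "dog", "barked"], ["cats", "and", "dogs"]]
model = FastText(sentences, vector_size=10, window=2, min_count=1, epochs=10)

# "catdog" never occurs in the training data, yet FastText composes a vector for it
# from its character n-grams; a plain Word2Vec model would raise a KeyError here.
print(model.wv["catdog"][:5])
```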
- Which of the following methods captures the most context of the sentence?
a. Logistic regression + TF-IDF
b. Recurrent neural network
c. Bidirectional LSTM
d. RoBERTa
d. RoBERTa
- Which statement about Bidirectional LSTMs is correct?
a. A Bidirectional LSTM is a type of Recurrent Neural Network.
b. Bidirectional LSTMs look at words on an individual level, without looking at the sentence as a whole.
c. Bidirectional LSTMs only read sentences from left to right.
d. A Bidirectional LSTM consists of one LSTM layer.
a. A Bidirectional LSTM is a type of Recurrent Neural Network.
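A minimal Bidirectional LSTM classifier sketch in Keras, assuming TensorFlow is installed; the vocabulary size and layer sizes are arbitrary placeholders:

```python
# Bidirectional LSTM sketch: one LSTM reads left-to-right, another reads right-to-left,
# and their outputs are combined before the classification head.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None,)),                       # variable-length sequences of token ids
    layers.Embedding(input_dim=10_000, output_dim=64),   # token ids -> dense word vectors
    layers.Bidirectional(layers.LSTM(32)),               # forward and backward LSTM passes
    layers.Dense(1, activation="sigmoid"),               # binary classification head
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```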
- Which statement is not correct?
a. Transformer architectures can be used for chatbots, text classification, and machine translation.
b. Positional encodings of words are considered in a transformer architecture.
c. Attention mechanisms are located in the output layer of a transformer.
d. During training, the ground truth is used as input for the decoder part of the transformer.
c. Attention mechanisms are located in the output layer of a transformer.
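A plain-NumPy sketch of scaled dot-product attention, the mechanism that sits inside every encoder and decoder layer of a transformer rather than only in the output layer; shapes are illustrative:

```python
# Scaled dot-product attention sketch: softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # query-key similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the key positions
    return weights @ V                                 # weighted sum of the value vectors

seq_len, d_k = 4, 8
Q, K, V = (np.random.randn(seq_len, d_k) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)     # (4, 8)
```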
- Which sentence is correct?
a. In the pretraining step next sentence prediction, the next sentence is generated based on the previous one.
b. Finetuning BERT happens in an unsupervised manner using task-specific data.
c. RoBERTa differs from BERT as RoBERTa is decoder-based and BERT encoder-based.
d. RobBERT is a Dutch variant of the BERT model.
d. RobBERT is a Dutch variant of the BERT model.
- Which statement is correct?
a. GPT is based on the ChatGPT architecture.
b. Jailbreaking is circumventing limits that were placed on the model.
c. BERT is an autoregressive model.
d. Prompt engineering is the field of studies that learns to simulate conversations with users.
b. Jailbreaking is circumventing limits that were placed on the model.