Machine Learning Flashcards
Which of the following are the advantages of transformers over a recurrent sequence model?
a) better at learning long-range dependencies
b) slower to train and run on modern hardware
c) require many fewer parameters to achieve similar results
d) none of the above
a) better at learning long-range dependencies
Which of these parts of the self-attention operation are calculated by passing inputs through an MLP?
a) values
b) keys
c) queries
d) all the above
d) all the above
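In practice, the queries, keys, and values of a Transformer are each produced by a learned linear projection of the same input (the simplest case of the MLP mentioned above). A minimal single-head sketch in NumPy; the weight matrices here are random stand-ins for learned parameters:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention: queries, keys, and values are all
    projections of the same input sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # learned projections
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # scaled dot-product
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)   # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                     # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)      # (4, 8)
```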
What is the field of natural language processing (NLP)?
a) computer science
b) artificial intelligence
c) linguistics
d) all of the mentioned
d) all of the mentioned
What are the main challenges of NLP?
a) handling ambiguity of sentences
b) handling tokenization
c) handling POS tagging
d) All of the mentioned
a) handling ambiguity of sentences
What is machine translation?
a) Converts one human language to another
b) Converts human language to machine language
c) Converts any human language to English
d) Converts machine language to human language
a) Converts one human language to another
Which of the following are areas where NLP can be useful?
a) automatic text summarization
b) automatic question-answering systems
c) information retrieval
d) all the mentioned
d) all the mentioned
Which of the following properties will a good position encoding ideally have?
a) unique for all positions
b) relative distances are independent of absolute sequence position
c) well-defined for arbitrary sequence lengths
d) all the above
d) all the above
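For reference, the sinusoidal encoding from the original Transformer paper has all three of these properties. A minimal NumPy sketch, assuming an even model dimension:

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    """Sinusoidal encoding: unique per position, defined for any length,
    and relative offsets correspond to fixed linear transforms."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles)    # even dimensions use sine
    enc[:, 1::2] = np.cos(angles)    # odd dimensions use cosine
    return enc

print(sinusoidal_encoding(50, 16).shape)  # (50, 16)
```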
Which of the following includes the major tasks of NLP?
a) automatic summarization
b) discourse analysis
c) machine translation
d) all the mentioned
d) all the mentioned
Neural machine translation was based on encoder-decoder _____
a) RNNs
b) LSTMs
c) both a & b
d) neither a nor b
c) both a & b
The encoder LSTM is used to process the _____ sentence.
a) input
b) output
c) function
d) All the above
a) input
What type of neural network is an autoencoder?
a) supervised neural network
b) unsupervised neural network
c) semi-supervised neural network
d) reinforcement neural network
b) unsupervised neural network
What type of data can the autoencoder apply dimensionality reduction on?
a) linear data
b) nonlinear data
c) both a & b
d) none of the above
c) both a & b
A module that compresses data into an encoded representation that is typically several orders of magnitude smaller than the input data.
a) The encoder
b) Bottleneck
c) The decoder
d) None of the above
a) The encoder
A module that contains the compressed knowledge representation and is considered the most important part of the autoencoder network.
a) the encoder
b) bottleneck
c) the decoder
d) None of the above
b) bottleneck
A module that helps the network “decompress” the knowledge representations and reconstructs the data back from its encoded form.
a) input layer
b) bottleneck
c) output layer
d) none of the above
c) output layer
What type of autoencoder works by penalizing the activation of some neurons in hidden layers?
a) Sparse autoencoder
b) Variational autoencoder
c) Deep autoencoder
d) Convolution autoencoders
a) Sparse autoencoder
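A minimal sketch of that penalty, here in PyTorch with an L1 term on the hidden activations; the layer sizes and penalty weight are illustrative assumptions, not values from the cards:

```python
import torch
import torch.nn as nn

# Tiny sparse autoencoder: the bottleneck is as wide as the input,
# so sparsity comes from penalizing hidden activations, not from
# reducing the number of nodes.
enc = nn.Sequential(nn.Linear(64, 64), nn.Sigmoid())
dec = nn.Linear(64, 64)

x = torch.randn(32, 64)
h = enc(x)
recon = dec(h)
sparsity_weight = 1e-3  # illustrative choice
loss = nn.functional.mse_loss(recon, x) + sparsity_weight * h.abs().mean()
loss.backward()
```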
Which of the following is done by a deep autoencoder?
a) image reconstruction
b) image colorization
c) image search
d) image denoising
c) image search
Which of the following is done by a convolution autoencoder?
a) data compression
b) image search
c) information retrieval
d) image colorization
d) image colorization
Which of the following is an autoencoder application?
a) watermark removing
b) dimensionality reduction
c) image generation
d) all the above
d) all the above
Which autoencoder doesn't require reducing the number of bottleneck nodes?
a) sparse autoencoder
b) deep autoencoder
c) variational autoencoder
d) None of the above
a) sparse autoencoder
In NLP, bidirectional context is supported by which of the following embeddings?
a) word2vec
b) BERT
c) GloVe
d) All the above
b) BERT
In which model is a given token's input representation the sum of its token, segment, and position embeddings?
a) ELMO
b) GPT
c) BERT
d) none of the above
c) BERT
BERT-Base contains _____ encoder layers.
a) 12
b) 24
c) 36
d) 48
a) 12
BERT-Large contains _____ encoder layers.
a) 12
b) 24
c) 36
d) 48
b) 24
BERT aims at tackling various NLP tasks such as _____
a) question answering
b) language inference
c) text summarization
d) all of the mentioned
d) all of the mentioned
The BERT model is pre-trained on relatively generic tasks, namely _____
a) masked language modeling (MLM)
b) next sentence prediction
c) a and b
d) none of the mentioned
c) a and b
_______ is the task of hiding a word in a sentence and then having the program predict the hidden (masked) word based on its context.
a) Masked language modeling (MLM)
b) Next sentence prediction
c) Sequence classification
d) Named entity recognition (NER)
a) Masked language modeling (MLM)
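Masked language modeling is easy to try if the Hugging Face transformers library is installed (the snippet downloads bert-base-uncased on first use):

```python
from transformers import pipeline

# Predict the hidden (masked) word from its context.
fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```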
_______ is the task of having the program predict whether two given sentences have a logical, sequential connection or whether their relationship is simply random.
a) Masked language modeling (MLM)
b) Next sentence prediction
c) Sequence classification
d) Named entity recognition (NER)
b) Next sentence prediction
BERT can process text ______
a) left-to-right
b) right-to-left
c) both
d) none of the mentioned
c) both
BERT was created and published in 2018 by ______
a) Amazon
b) Microsoft
c) IBM
d) Google
d) Google
What is the difference between a CNN and an ANN?
a) A CNN has one or more layers of convolution units, each of which receives its input from multiple units.
b) A CNN uses a simpler algorithm than an ANN.
c) They complete each other, so to use an ANN you need to start with a CNN.
d) A CNN is the easiest way to use neural networks.
a) A CNN has one or more layers of convolution units, each of which receives its input from multiple units.
The data is fed into the model and the output of each layer is obtained; this step is called _____.
a) Feed forward
b) Feed backward
c) Input layer
d) Output layer
a) Feed forward
How many common types of pooling layers are there?
a) 5
b) 2
c) 3
d) 4
b) 2
_____ computes the output volume by computing the dot product between all filters and image patches.
a) Input layer
b) Convolution layer
c) Activation function layer
d) Pool layer
b) Convolution layer
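A sketch of that computation as a "valid" cross-correlation in NumPy, with a hypothetical 3x3 vertical-edge kernel standing in for a learned filter:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation: the dot product of the kernel with every
    image patch, which is how a convolution layer builds its output volume."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
edge = np.array([[1.0, 0.0, -1.0]] * 3)   # simple vertical-edge kernel
print(conv2d(img, edge))                  # (3, 3) output volume
```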
What is back propagation?
a) it is another name given to the curvy function in the perceptron
b) it is the transmission of error back through the network to adjust the inputs
c) it is the transmission of error back through the network to allow weights to be adjusted so that the network can learn
d) all of the mentioned
c) it is the transmission of error back through the network to allow weights to be adjusted so that the network can learn
Which of the following functions can be used as an activation function in the output layer if we wish to predict the probabilities of k classes (p1, p2, …, pk) such that the sum of the probabilities over all k classes equals 1?
a) RELU
b) Sigmoid
c) Softmax
d) Tanh
c) Softmax
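A numerically stable softmax sketch in NumPy, showing that the outputs are positive and sum to 1:

```python
import numpy as np

def softmax(z):
    """Stable softmax: subtract the max before exponentiating, then
    normalize so the outputs can be read as class probabilities."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
p = softmax(logits)
print(p, p.sum())   # probabilities summing to 1 (up to float error)
```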
Which of the following would have a constant input in each epoch of training a deep learning model?
a) Weight between input and hidden layer
b) Weight between hidden and output layer
c) Biases of all hidden layer neurons
d) Activation function of output layer
a) Weight between input and hidden layer
Which of the following neural network training challenges can be solved using
batch normalization?
a) overfitting
b) underfitting
c) training is too slow
d) none of the mentioned
c) training is too slow
The number of nodes in the input layer is 10 and in the hidden layer is 5. The maximum number of connections from the input layer to the hidden layer is:
a) 50
b) Less than 50
c) More than 50
d) None of the mentioned
a) 50
Is Deep Learning a specialized subset of machine learning?
a) true
b) false
a) true
_____ are models, used to generate data similar to the data on which they are trained, by destroying training data through the successive addition of gaussian noise, and then learning to recover the data by reversing this noising process.
a) Federated learning.
b) Attention learning.
c) CNN.
d) Diffusion models.
d) Diffusion models.
What is the goal of training a diffusion model?
a) Learn the reverse process
b) Learn to understand the image
c) Extract the image features
d) Classify the images
a) Learn the reverse process
One of the benefits of diffusion models is _____
a) scalability
b) not requiring adversarial training.
c) parallelizability
d) all of the above.
d) all of the above.
In general, a diffusion model consists of _____ main processes.
a) 5
b) 4
c) 3
d) 2
d) 2
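The two processes are the fixed forward (noising) process and the learned reverse (denoising) process. A sketch of the closed-form forward sample x_t ~ q(x_t | x_0) under the standard DDPM parameterization, with an illustrative linear variance schedule:

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # illustrative linear schedule
x0 = rng.normal(size=(8,))
print(forward_diffuse(x0, t=999, betas=betas, rng=rng))  # ~pure noise
```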
A diffusion model is trained by finding the reverse Markov transitions that _____ the likelihood of the training data.
a) Maximize
b) Minimize.
c) Increase.
d) Decrease.
a) Maximize
For the reverse process in the diffusion model, we must choose _____
a) the Sobel filter
b) the Laplacian operator
c) the thresholding method
d) the Gaussian distribution parameterization
d) the Gaussian distribution parameterization
The transition distributions in the Markov chain are Gaussian, where the forward process requires a ______, and the reverse process parameters are learned.
a) variance schedule
b) Laplacian operator
c) the Gaussian distribution parameterization
d) none of the mentioned
a) variance schedule
A diffusion model is parameterized as a Markov chain, meaning that the latent variables depend only on the _____ timestep.
a) previous or following
b) previous
c) following
d) none of the mentioned
a) previous or following
A _____ is used to obtain log-likelihoods across pixel values as the last step in the reverse diffusion process.
a) KL divergences
b) simplified training objective
c) U-Net-like.
d) discrete decoder.
d) discrete decoder.
Diffusion models can be applied to _____
a) image denoising
b) super-resolution.
c) image generation.
d) all of the above.
d) all of the above.
What is the main goal of federated learning?
a) to train a single machine learning model on a centralized dataset
b) to train multiple machine learning models on decentralized datasets
c) to train a single machine learning model on decentralized datasets
d) to train multiple machine learning models on a centralized dataset
c) to train a single machine learning model on decentralized datasets
How does federated learning differ from traditional machine learning?
a) federated learning requires less data
b) federated learning requires more computational resources
c) federated learning requires less communication bandwidth
d) federated learning raises more data privacy concerns
d) federated learning raises more data privacy concerns
What is an advantage of federated learning compared to traditional centralized training?
a) it is more accurate
b) it is faster
c) it requires less data
d) it allows for decentralized data to be used
d) it allows for decentralized data to be used
How is data privacy protected in federated learning?
a) data is encrypted before being sent to the centralized server
b) data is never shared with any other parties
c) data remains on the individual devices and is only used for model training
d) data is aggregated and anonymized before being used for model training
c) data remains on the individual devices and is only used for model training
In federated learning, who is responsible for training the model?
a) a centralized server
b) a third-party organization
c) individual clients
d) the data owner
c) individual clients
Key benefits of federated learning include _____
a) it involves more diverse data.
b) it’s secure.
c) it yields real-time predictions.
d) all of the above
d) all of the above
What are the challenges of federated learning?
a) efficient communication across the federated network.
b) managing heterogeneous systems in the same networks.
c) privacy concerns and privacy-preserving methods.
d) all of the above
d) all of the above
How does federated learning work?
a) Transfer of weights and biases to cloud server
b) Transfer of data to cloud server
c) Transfer of model to cloud server
d) Transfer of user info to cloud
a) Transfer of weights and biases to cloud server
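A minimal sketch of the server-side step (FedAvg-style weighted averaging of client weights); the shapes, client count, and dataset sizes are illustrative assumptions:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Server-side FedAvg: average each weight tensor across clients,
    weighted by local dataset size. Raw data never leaves the clients;
    only weights and biases are transferred."""
    total = sum(client_sizes)
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

rng = np.random.default_rng(0)
# Four clients, each holding one weight matrix and one bias vector.
clients = [[rng.normal(size=(3, 3)), rng.normal(size=3)] for _ in range(4)]
global_model = fed_avg(clients, client_sizes=[100, 50, 200, 150])
print([p.shape for p in global_model])  # [(3, 3), (3,)]
```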
Is federated learning more efficient than standard ML techniques for a large number of devices?
a) True
b) False
c) Depends on use case
d) Cannot say
a) True
Federated learning is ______
a) Supervised
b) Unsupervised
c) Reinforcement learning.
d) None of the above
b) Unsupervised
What is the basic concept of recurrent neural network?
a) use a loop between inputs and outputs in order to achieve the better prediction.
b) use recurrent features from dataset to find the best answers.
c) use previous inputs to find the next output according to the training set.
d) use loops between the most important features to predict next output.
c) use previous inputs to find the next output according to the training set.
Another RNN issue is called 'vanishing gradients'. What is that?
a) when the values of a gradient are too small and the model joins in a loop because of that.
b) when the values of a gradient are too big and the model stops learning or takes way too long because of that.
c) when the values of a gradient are too small and the model stops learning or takes way too long because of that.
d) when the values of a gradient are too big and the model joins in a loop because of that.
c) when the values of a gradient are too small and the model stops learning or takes way too long because of that.
What is an LSTM?
a) LSTM networks are an extension for recurrent neural networks, which basically extends their memory. therefore, it is well suited to learn from important experiences that have very low time lags in between
b) LSTM networks are an extension for recurrent neural networks, which basically extends their memory. therefore, it is not recommended to use it, unless you are using a small dataset.
c) LSTM networks are an extension for recurrent neural networks, which basically extends their memory. therefore, it is well suited to learn from important experiences that have long-time lags in between
d) LSTM networks are an extension for recurrent neural networks, which basically shorten their memory. therefore, it is well suited to learn from important experiences that have very low time lags in between
c) LSTM networks are an extension for recurrent neural networks, which basically extends their memory. therefore, it is well suited to learn from important experiences that have long-time lags in between
The network that involves backward links from output to the input and hidden layers is called _________
a) self-organizing maps
b) perceptron
c) recurrent neural network
d) multi layered perceptron
c) recurrent neural network
RNNs Stands for?
a) Recurrent neural networks
b) Report neural networks
c) Receives neural networks
d) Recording neural networks
a) Recurrent neural networks
What is the activation function used in the forget gate?
a) Sigmoid
b) Tanh
c) RELU
d) None of the above
a) Sigmoid
How many gates are there in an LSTM?
a) 3
b) 5
c) 4
d) 2
a) 3
______: when the points in the dataset are dependent on the other points in the dataset.
a) continuous data
b) discrete data
c) sequential data
d) ordinal data
c) sequential data
_______ helps to identify important elements that need to be added to the cell state.
a) Forget gate
b) Input gate
c) Output gate
d) None of the above
b) Input gate
LSTMs are used in ______
a) speech recognition
b) music composition
c) time series prediction
d) all of the above
d) all of the above
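For reference, one step of an LSTM cell with its three sigmoid gates (forget, input, output) and tanh candidate, sketched in NumPy with random stand-in weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step. W maps [h; x] to the three gates plus the candidate;
    forget/input/output gates use a sigmoid, the candidate uses tanh."""
    z = W @ np.concatenate([h, x]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o, g = sigmoid(f), sigmoid(i), sigmoid(o), np.tanh(g)
    c = f * c + i * g      # forget old state, add important new elements
    h = o * np.tanh(c)     # expose a filtered view of the cell state
    return h, c

d_in, d_h = 4, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * d_h, d_h + d_in))
b = np.zeros(4 * d_h)
h = c = np.zeros(d_h)
x = rng.normal(size=d_in)
h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)    # (8,) (8,)
```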
What should be the aim of the training procedure in Boltzmann machines of feedback networks?
a) to capture inputs
b) to feedback the captured outputs
c) to capture the behavior of system
d) none of the mentioned
d) none of the mentioned
What does a Boltzmann machine consist of?
a) fully connected network with both hidden and visible units
b) asynchronous operation
c) stochastic update
d) all of the mentioned
d) all of the mentioned
By using which method does a Boltzmann machine reduce the effect of additional stable states?
a) No such method exists
b) Simulated annealing
c) Hopfield reduction
d) None of the mentioned
b) Simulated annealing
For which other task can a Boltzmann machine be used?
a) pattern mapping
b) feature mapping
c) classification
d) pattern association
d) pattern association
The presence of false minima will have what effect on the probability of error in recall?
a) Directly
b) Inversely
c) No effect
d) Directly or Inversely
a) Directly
What happens when we use mean field approximation with Boltzmann learning?
a) It slows down
b) It gets speeded up
c) Nothing happens
d) it may speed up or slow down
b) It gets speeded up
In Boltzmann learning, which algorithm can be used to arrive at equilibrium?
a) Hopfield
b) mean field
c) Hebb
d) none of the mentioned
d) none of the mentioned
The visible units in a restricted Boltzmann machine are not connected to each other.
a) True
b) False
a) True
What are the two layers of a restricted Boltzmann machine called?
a) input and output layers
b) recurrent and convolution layers
c) activation and threshold layers
d) hidden and visible layers
d) hidden and visible layers
A deep belief network is a stack of restricted Boltzmann machines.
a) True
b) False
a) True
The main and most important feature of an RNN is _________.
a) visible state
b) hidden state
c) present state
d) None of these
b) hidden state
An RNN remembers each and every piece of information through ________.
a) Work
b) Time
c) Hours
d) Memory
b) Time
To create a numerical representation of our text-based dataset, we generate two lookup tables. What are they?
a) one that maps characters to numbers
b) one that maps numbers back to characters
c) one that identifies unique characters present in the text
d) both a & b
d) both a & b
_______ occurs when the gradients become very small and tend towards zero.
a) Exploding gradients
b) Vanishing gradients
c) Long short-term memory networks
d) Gated recurrent unit networks.
b) Vanishing gradients
On what parameters can the change in the weight vector depend?
a) learning parameters
b) input vector
c) learning signal
d) all of the mentioned
d) all of the mentioned
________ occurs when the gradients become too large due to back-propagation.
a) Exploding gradients
b) Vanishing gradients
c) Long short-term memory networks
d) Gated recurrent unit networks
a) Exploding gradients
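A common remedy for exploding gradients is clipping by global norm, sketched here in NumPy:

```python
import numpy as np

def clip_by_norm(grads, max_norm=1.0):
    """Rescale gradients whose global norm exceeds max_norm, a common
    remedy for exploding gradients in RNN training."""
    norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grads]

grads = [np.array([30.0, 40.0])]          # norm 50, far too large
print(clip_by_norm(grads, max_norm=5.0))  # [array([3., 4.])] -> norm 5
```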
If a competitive network can perform feature mapping, then what can that network be called?
a) self-excitatory
b) self-inhibitory
c) self-organization
d) none of the mentioned
c) self-organization
Why do we need biological neural networks?
a) to solve tasks like machine vision & natural language processing
b) to apply heuristic search methods to find solutions of problem
c) to make smart human interactive & user-friendly system
d) all of the mentioned
d) all of the mentioned
What is the auto-association task in neural networks?
a) find relation between 2 consecutive inputs
b) related to storage & recall task
c) predicting the future inputs
d) None of the mentioned
b) related to storage & recall task
What is unsupervised learning?
a) features of group explicitly stated
b) number of groups may be known
c) neither features nor number of groups is known
d) none of the mentioned
c) neither features nor number of groups is known
XLNet is an ________ language model which outputs the joint probability of a sequence of tokens, based on the Transformer architecture with recurrence.
a) Auto-regressive
b) Auto-encoding
c) Objective
d) Bidirectional
a) Auto-regressive
XLNet is "generalized" because it captures bi-directional context by means of a mechanism called ____
a) PLM
b) BERT
c) TRANSFORMER-XL
d) MLM
a) PLM
______ keeps track of the position of each token in a sequence.
a) pretrain-finetune discrepancy
b) transformer-xl
c) positional encoding
d) segment recurrence
c) positional encoding
______ caches the hidden state of the first segment in memory at each layer and updates attention accordingly; it allows reuse of memory for each segment.
a) pretrain-finetune discrepancy
b) transformer-xl
c) positional encoding
d) segment recurrence
d) segment recurrence
The attention weights determined by a simple feedforward neural network are computed using ____
a) query
b) keys
c) values
d) all of the above
d) all of the above
_____: traditional methods predict the current token given the previous 'n' tokens, or predict the current token given all tokens after it.
a) Bidirectional
b) Masked language modeling
c) XLNet
d) BERT
a) Bidirectional
______ is a neural network architecture that can model bidirectional contexts in text data using the Transformer.
a) BERT
b) XLNet
c) MLM
d) PLM
a) BERT
A disadvantage of BERT is that it corrupts the input with _______ and suffers from a pretrain-finetune discrepancy.
a) Mask
b) PLM
c) MLM
d) All of above
a) Mask
XLNet is the latest and greatest model to emerge from the booming field of natural language processing (NLP).
a) True
b) False
a) True
XLNet is "generalized".
a) True
b) False
a) True
The attention mechanism has changed the way we work with deep learning algorithms.
a) true
b) false
a) true
An advantage of transformers over recurrent sequence models is that they are slower to train and run on modern hardware.
a) true
b) false
b) false
Fields like NLP and Computer Vision have been revolutionized by the attention mechanism
a) true
b) false
a) true
Attention is an interface connecting the encoder and decoder that provides the decoder with information.
a) true
b) false
a) true
The encoder LSTM or RNN units produce the words in a sentence one after another.
a) true
b) false
b) false
The Encoder reads the input sentence and tries to make sense of it
a) true
b) false
a) true
The LSTM is supposed to capture the Long-Range dependency better than the RNN
a) true
b) false
a) true
RNNs can't remember longer sentences and sequences.
a) true
b) false
a) true
If the Encoder makes a bad summary, the translation will be also bad
a) true
b) false
a) true
The decoder is used to process the entire input sentence and decode it into a context vector.
a) true
b) false
b) false
Autoencoders belong to supervised neural networks.
a) true
b) false
b) false
The bottleneck is the most important part of the network.
a) true
b) false
a) true
Convolutional autoencoders can do image reconstruction.
a) true
b) false
a) true
A deep autoencoder is composed of two symmetrical deep-belief networks.
a) true
b) false
a) true
Deep autoencoders can't do image search.
a) true
b) false
b) false
Sparse autoencoders offer us an alternative method for introducing an information bottleneck without requiring a reduction in the number of nodes.
a) true
b) false
a) true
Sparse autoencoders work by penalizing the activation of neurons in the input layer.
a) true
b) false
b) false
Autoencoders can denoise images.
a) true
b) false
a) true
Autoencoders can’t be used to reduce dimensionality
a) true
b) false
b) false
The encoder is the module that helps the network "decompress" the knowledge representations and reconstructs the data back from its encoded form.
a) true
b) false
b) false
BERT (Bidirectional Encoder Representations from Transformers) is a recent paper published by researchers at Amazon AI Language.
a) true
b) false
b) false
BERT doesn't read the text input sequentially.
a) true
b) false
a) true
BERT allows transfer learning on existing pretrained models and hence can be custom-trained for a specific subject.
a) true
b) false
a) true
In BERT, the relationship between all words in a sentence is modeled irrespective of their position.
a) true
b) false
a) true
BERT uses a unidirectional language model for producing word embeddings.
a) true
b) false
b) false
BERT is not an open-source machine learning framework for NLP.
a) true
b) false
b) false
BERT does not understand human language as it is spoken naturally.
a) true
b) false
b) false
BERT is expected to have a large impact on voice search as well as text-based search.
a) true
b) false
b) false
The same word can have multiple word embeddings with BERT.
a) True
b) False
a) True
BERT is a deep bidirectional, supervised language representation
a) true
b) false
b) false
Pooling is an up-sampling operation that reduces the dimensionality of the feature map.
a) true
b) false
b) false
The ReLU operation is applied to each pixel and replaces all the negative pixel values in the feature map with zero.
a) true
b) false
a) true
Pooling or spatial pooling layers are also called sub-sampling layers.
a) true
b) false
a) true
Pooling reduces the dimensionality of each feature map by retaining the most important information.
a) true
b) false
a) true
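A NumPy sketch of the two operations from the cards above: ReLU zeroes the negative values in a feature map, and 2x2 max pooling then halves each spatial dimension while retaining the strongest activations:

```python
import numpy as np

def relu(fmap):
    """Replace negative values in the feature map with zero."""
    return np.maximum(fmap, 0)

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2: keep the strongest activation
    in each window, halving each spatial dimension."""
    h, w = fmap.shape
    trimmed = fmap[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1., -2., 3., 0.],
                 [-1., 5., -3., 2.],
                 [0., 1., 2., -4.],
                 [6., -1., 0., 1.]])
print(max_pool2x2(relu(fmap)))   # [[5. 3.] [6. 2.]]
```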
The aim of the fully connected layer is to use the low-level features of the input image produced by the convolutional and pooling layers.
a) true
b) false
b) false
The hyperparameters for a pooling layer are filter size, stride, and max or average pooling.
a) true
b) false
a) true
When we apply a filter of 1×1, then there is no reduction in the size of the image and hence there is no loss of information.
a) true
b) false
a) true
Flattening means that every neuron in the previous layer is connected to each neuron in the next layer.
a) true
b) false
b) false
ReLU introduces linearity to the network, and the generated output is a rectified feature map.
a) true
b) false
b) false
A convolutional layer receives a set of input feature maps (IFM) and generates a set of output feature maps (OFM).
a) true
b) false
a) true
Diffusion models work by destroying training data through the successive addition of Laplacian noise, and then learning to recover the data by reversing this noising process.
a) true
b) false
b) false
A discrete decoder is used to obtain log-likelihoods across pixel values as the last step in the reverse diffusion process.
a) true
b) false
a) true
A diffusion model is a latent variable model which maps to the latent space Sobel using a fixed chain.
a) true
b) false
b) false
The goal of training a Diffusion model is to learn the reverse process
a) true
b) false
a) true
The transition distributions in the Markov chain are Gaussian, which depends only on the forward process.
a) true
b) false
b) false
A diffusion model is parameterized as a Markov chain, meaning that the latent variables x1, …, xT depend only on the previous (or following) timestep.
a) true
b) false
a) true
For the reverse process in the diffusion model, we must choose a variance schedule.
a) true
b) false
b) false
The transition distributions in the Markov chain are Gaussian, where the forward process requires a variance schedule, and the reverse process parameters are learned.
a) true
b) false
a) true
Cascade diffusion models (like Stable Diffusion) apply the diffusion process on a smaller latent space for computational efficiency, using a variational autoencoder for the up- and down-sampling.
a) true
b) false
b) false
Diffusion models can be applied to image denoising, inpainting, super-resolution, and image generation.
a) true
b) false
a) true
Federated Learning is not used to improve the privacy and security of machine learning models.
a) true
b) false
b) false
Federated Learning requires the use of a centralized server.
a) true
b) false
b) false
Federated Learning can’t be used to train models on data that is distributed across multiple devices, such as Smartphones or IoT devices.
a) true
b) false
b) false
Federated Learning requires the use of a centralized database.
a) true
b) false
b) false
Federated Learning can’t be used to improve the privacy of machine learning models by keeping sensitive data on individual devices.
a) true
b) false
b) false
Federated Learning is a Type of machine learning that allows multiple parties to train a model without sharing their data.
a) true
b) false
a) true
Federated Learning requires participating devices to have high computational power.
a) true
b) false
b) false
Federated learning enables participants to train local models cooperatively on local data without disclosing sensitive data to a central cloud server.
a) true
b) false
a) true
Federated Learning can’t be used to train deep learning models.
a) true
b) false
b) false
Federated Learning can be used to train models on data that is distributed across multiple devices in real-time.
a) true
b) false
a) true
In Sequential Data, the points in the dataset are dependent on the other points in the dataset.
a) true
b) false
a) true
A Timeseries is a common example of Sequential Data, with each point
reflecting an observation at a certain point in time.
a) true
b) false
a) true
The crucial element to remember about sequence models is that the data we're working with are independently and identically distributed (i.i.d.) samples.
a) true
b) false
b) false
Sequence models are the machine learning models that input or output sequences of data.
a) true
b) false
a) true
Structured data includes text streams, audio clips, video clips, and time-series data.
a) true
b) false
b) false
The conventional feedforward artificial neural networks can deal with sequential data and can be trained to hold knowledge about the past.
a) true
b) false
b) false
Traditional RNNs are excellent at capturing long-range dependencies.
a) true
b) false
b) false
LSTMs are explicitly designed to avoid the long-term dependency problem.
a) true
b) false
a) true
The input gate controls what information should be forgotten.
a) true
b) false
b) false
The input gate helps to identify important elements that need to be added to the cell state.
a) true
b) false
a) true
RBMs are a supervised learning technique
a) true
b) false
b) false
An RBM isn't restricted to having only the connections between the visible and the hidden units.
a) true
b) false
b) false
An RBM performs discriminative learning similar to what happens in a classification problem.
a) true
b) false
b) false
If the number of visible nodes = nV and the number of hidden nodes = nH, then the number of connections in an RBM = nV * nH.
a) true
b) false
a) true
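A quick check of that count with illustrative sizes: the only connections in an RBM run between the visible and hidden layers, so the weights form a single nV x nH matrix:

```python
import numpy as np

n_visible, n_hidden = 6, 3
W = np.zeros((n_visible, n_hidden))  # one weight per visible-hidden pair
print(W.size)                        # 18 connections = nV * nH
```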
Boltzmann machines are non-deterministic generative deep learning models with 3 types of nodes: visible, hidden and output nodes
a) true
b) false
b) false
Boltzmann machines fall into the class of unsupervised learning.
a) true
b) false
a) true
Sparse autoencoders introduce an information bottleneck by reducing the number of nodes in hidden layers.
a) true
b) false
b) false
The idea is to encourage the network to learn an encoding and decoding which relies only on activating a small number of neurons.
a) true
b) false
a) true
To implement an undercomplete autoencoder, constrain the number of nodes present in the hidden layer(s) of the neural network.
a) true
b) false
a) true
Autoencoders are not capable of learning nonlinear manifolds (a continuous, non-intersecting surface).
a) true
b) false
b) false
A Neural Network with multiple hidden layers and Sigmoid nodes can form non-linear decision boundaries.
a) true
b) false
a) true
Neural networks compute non-convex functions of their parameters.
a) true
b) false
a) true
For Logistic Regression, with parameters optimized using a Stochastic Gradient method, setting parameters to 0 is an acceptable initialization.
a) true
b) false
a) true
For arbitrary Neural Networks, with weights optimized using a Stochastic Gradient method, setting weights to 0 is an acceptable initialization.
a) true
b) false
b) false
Given a design matrix X ∈ R^(n×d) where d ≪ n, if we project our data onto a k-dimensional subspace using PCA where k equals the rank of X, we recreate a perfect representation of our data with no loss.
a) true
b) false
a) true
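A NumPy sketch of that claim: build a rank-3 design matrix, project onto k = rank(X) principal components, and verify the reconstruction is exact up to floating-point error. The sizes are illustrative:

```python
import numpy as np

# A rank-3 design matrix X (n = 100 samples, d = 10 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 10))

Xc = X - X.mean(axis=0)                  # center before PCA
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = np.linalg.matrix_rank(Xc)            # k = rank of the centered data
X_hat = Xc @ Vt[:k].T @ Vt[:k] + X.mean(axis=0)
print(k, np.allclose(X, X_hat))          # 3 True: lossless reconstruction
```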
Hierarchical clustering methods require a predefined number of clusters, much like k-means.
a) true
b) false
b) false
Given a predefined number of clusters k, globally minimizing the k-means objective function is NP-hard.
a) true
b) false
a) true
A random forest is an ensemble learning method that attempts to lower the bias error of decision trees.
a) true
b) false
b) false
Bagging algorithms attach weights w1…wn to a set of n weak learners; they re-weight the learners and convert them into strong ones. Boosting algorithms draw n sample distributions (usually with replacement) from an original dataset for learners to train on.
a) true
b) false
b) false
Using cross-validation to select hyperparameters will guarantee that our model does not overfit.
a) true
b) false
b) false
Bidirectionality is achieved by a technique called "masked language modeling".
a) true
b) false
a) true
BERT overcomes this shortcoming, in that it considers previous and next tokens to predict the current token.
a) true
b) false
a) true
XLNet is not the latest and greatest model to emerge from the booming field of natural language processing (NLP).
a) true
b) false
b) false
XLNet is not “generalized” because it captures Bidirectional context by means of a mechanism called “Permutation Language Modeling”.
a) true
b) false
b) false
XLNet is not a generalized autoregressive model where the next token is dependent on all previous tokens.
a) true
b) false
b) false
XLNet is the idea of capturing bidirectional context by training an autoregressive model on all possible permutations of the words in a sentence.
a) true
b) false
b) false
XLNet integrates the idea of autoregressive models and bidirectional context modeling, while overcoming the disadvantages of BERT.
a) true
b) false
a) true
Autoregressive (AR) Language Modeling and Autoencoding (AE) have been the two most successful pretraining objectives.
a) true
b) false
a) true
There are proposed methods used in XLNet, such as its training objective: permutation language modeling.
a) true
b) false
a) true
For both BERT and XLNet, partial prediction plays a role in reducing optimization difficulty by only predicting tokens with sufficient context.
a) true
b) false
a) true