Machine Learning Flashcards

1
Q

Which of the following are the advantages of transformers over a recurrent sequence model?
a) better at learning long-range dependencies
b) slower to train and run on modern hardware
c) require many fewer parameters to achieve similar results
d) none of the above

A

a) better at learning long-range dependencies

2
Q

Which of these parts of the self-attention operation are calculated by passing inputs through an MLP?
a) values
b) keys
c) queries
d) all the above

A

d) all the above
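
Note: a minimal sketch of how queries, keys, and values are all derived from the same inputs. It assumes a single linear projection per role (the names Wq, Wk, Wv and the helper functions are illustrative, not from the deck) and uses scaled dot-product attention.

```python
import math

def matvec(W, x):
    # Multiply matrix W (a list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def normalize_scores(scores):
    # Softmax over attention scores, shifted by the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X, Wq, Wk, Wv):
    # Queries, keys, and values are all projections of the same inputs X.
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        weights = normalize_scores(scores)
        # Each output is a weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical projection weights
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = self_attention(X, identity, identity, identity)
```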

3
Q

Natural language processing (NLP) is a field of which of the following?
a) computer science
b) artificial intelligence
c) linguistics
d) all of the mentioned

A

d) all of the mentioned

4
Q

What is the main challenge of NLP?
a) handling ambiguity of sentences
b) handling tokenization
c) handling pos-tagging
d) All of the mentioned

A

a) handling ambiguity of sentences

5
Q

What is machine translation?
a) Converts one human language to another
b) Converts human language to machine language
c) Converts any human language to English
d) Converts machine language to human language

A

a) Converts one human language to another

6
Q

Choose from the following areas where NLP can be useful.
a) automatic text summarization
b) automatic question-answering systems
c) information retrieval
d) all the mentioned

A

d) all the mentioned

7
Q

Which of the following properties will a good position encoding ideally have?
a) unique for all positions
b) relative distances are independent of absolute sequence position
c) well-defined for arbitrary sequence lengths
d) all the above

A

d) all the above
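
Note: one encoding with these properties is the sinusoidal scheme. The sketch below assumes the common base of 10000 and an even d_model; the function name is illustrative.

```python
import math

def positional_encoding(pos, d_model):
    # Even indices use sine, odd indices use cosine; each sine/cosine pair
    # shares one frequency, and frequencies are geometrically spaced.
    pe = []
    for i in range(d_model):
        angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

first = positional_encoding(0, 4)
far = positional_encoding(100000, 4)  # defined for arbitrarily large positions
```

Because each pair contributes cos of the position difference to a dot product, the similarity between two encodings depends only on their relative distance, not on where they sit in the sequence.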

8
Q

Which of the following includes the major tasks of NLP?
a) automatic summarization
b) discourse analysis
c) machine translation
d) all the mentioned

A

d) all the mentioned

9
Q

Neural machine translation was based on encoder-decoder _____
a) RNNs
b) LSTMs
c) both a & b
d) neither a nor b

A

c) both a & b

10
Q

The encoder LSTM is used to process the _____ sentence.
a) input
b) output
c) function
d) All the above

A

a) input

11
Q

What type of neural network is an autoencoder?
a) Supervised neural network
b) unsupervised neural network
c) semi-supervised neural network
d) reinforcement neural network

A

b) unsupervised neural network

12
Q

What type of data can the autoencoder apply dimensionality reduction on?
a) linear data
b) nonlinear data
c) both a & b
d) none of the above

A

c) both a & b

13
Q

A module that compresses data into an encoded representation that is typically several orders of magnitude smaller than the input data.
a) The encoder
b) Bottleneck
c) The decoder
d) None of the above

A

a) The encoder

14
Q

A module that contains the compressed knowledge representation and is considered the most important part of the autoencoder network.
a) the encoder
b) bottleneck
c) the decoder
d) None of the above

A

b) bottleneck

15
Q

A module that helps the network “decompress” the knowledge representations and reconstructs the data back from its encoded form.
a) input layer
b) bottleneck
c) output layer
d) none of the above

A

c) output layer

16
Q

What type of autoencoder works by penalizing the activation of some neurons in hidden layers?
a) Sparse autoencoder
b) Variational autoencoder
c) Deep autoencoder
d) Convolution autoencoders

A

a) Sparse autoencoder

17
Q

Which of the following is done by a deep autoencoder?
a) image reconstruction
b) image colorization
c) image search
d) image denoising

A

c) image search

18
Q

Which of the following is done by a convolution autoencoder?
a) data compression
b) image search
c) information retrieval
d) image colorization

A

d) image colorization

19
Q

Which of the following is an autoencoder application?
a) watermark removing
b) dimensionality reduction
c) image generation
d) all the above

A

d) all the above

20
Q

Which autoencoder doesn’t require reducing the bottleneck nodes?
a) sparse autoencoder
b) deep autoencoder
c) variational autoencoder
d) None of the above

A

a) sparse autoencoder

21
Q

In NLP, bidirectional context is supported by which of the following embeddings?
a) WORD2VEC
b) BERT
c) GLOVE
d) All the above

A

b) BERT

22
Q

In which model is a token's input representation the sum of its token, segment, and position embeddings?
a) ELMO
b) GPT
c) BERT
d) none of the above

A

c) BERT

23
Q

BERT Base contains _____ encoder layers
a) 12
b) 24
c) 36
d) 48

A

a) 12

24
Q

BERT Large contains _____ encoder layers
a) 12
b) 24
c) 36
d) 48

A

b) 24

25
Q

BERT aims at tackling various NLP tasks such as _____
a) question answering
b) language inference
c) text summarization
d) all of the mentioned

A

d) all of the mentioned

26
Q

The BERT model is pre-trained on which relatively generic tasks?
a) masked language modeling (MLM)
b) next sentence prediction
c) a and b
d) none of the mentioned

A

c) a and b

27
Q

_____ is to hide a word in a sentence and then have the program predict which word has been hidden (masked) based on the hidden word’s context.
a) Masked language modeling (MLM)
b) Next sentence prediction
c) Sequence classification
d) Named entity recognition (NER)

A

a) Masked language modeling (MLM)
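
Note: a simplified sketch of how masked-language-modeling training pairs can be built. It assumes every selected token is replaced by a "[MASK]" placeholder with probability 0.15; the function name, the seed, and this single-replacement rule are illustrative simplifications rather than BERT's exact recipe.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    # Returns the corrupted sequence plus per-position labels:
    # the original token where masking happened, None elsewhere.
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

tokens = "the cat sat on the mat and looked at the dog".split()
masked, labels = mask_tokens(tokens)
```

The model is then trained to predict each non-None label from the surrounding, unmasked context.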

28
Q

_____ is to have the program predict whether two given sentences have a logical, sequential connection or whether their relationship is simply random.
a) Masked language modeling (MLM)
b) Next sentence prediction
c) Sequence classification
d) Named entity recognition (NER)

A

b) Next sentence prediction

29
Q

BERT can process text _____
a) left-to-right
b) right-to-left
c) both
d) none of the mentioned

A

c) both

30
Q

BERT was created and published in 2018 by _____
a) Amazon
b) Microsoft
c) IBM
d) Google

A

d) Google

31
Q

What is the difference between CNN and ANN?
a) A CNN has one or more layers of convolution units, each of which receives its input from multiple units.
b) A CNN uses a simpler algorithm than an ANN.
c) They complete each other, so to use an ANN, you need to start with a CNN.
d) A CNN is the easiest way to use neural networks.

A

a) A CNN has one or more layers of convolution units, each of which receives its input from multiple units.

32
Q

Data is fed into the model and the output of each layer is computed. What is this step called?
a) Feed forward
b) Feed backward
c) Input layer
d) Output layer

A

a) Feed forward

33
Q

How many common types of pooling layers are there?
a) 5
b) 2
c) 3
d) 4

A

b) 2

34
Q

Which layer computes the output volume by computing the dot product between all filters and image patches?
a) Input layer
b) Convolution layer
c) Activation function layer
d) Pool layer

A

b) Convolution layer

35
Q

What is back propagation?
a) it is another name given to the curvy function in the perceptron
b) it is the transmission of error back through the network to adjust the inputs
c) it is the transmission of error back through the network to allow weights to be adjusted so that the network can learn
d) all of the mentioned

A

c) it is the transmission of error back through the network to allow weights to be adjusted so that the network can learn

36
Q

Which of the following functions can be used as an activation function in the output layer if we wish to predict the probabilities of n classes (p1, p2, …, pn) such that the sum of p over all n classes equals 1?
a) RELU
b) Sigmoid
c) Softmax
d) Tanh

A

c) Softmax
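
Note: a minimal sketch of softmax, showing why its outputs can be read as class probabilities. The example logits are made up for illustration.

```python
import math

def softmax(logits):
    # Subtracting the max keeps exp() from overflowing without changing the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # positive values that sum to 1
```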

37
Q

Which of the following would have a constant input in each epoch of training a deep learning model?
a) Weight between input and hidden layer
b) Weight between hidden and output layer
c) Biases of all hidden layer neurons
d) Activation function of output layer

A

a) Weight between input and hidden layer

38
Q

Which of the following neural network training challenges can be solved using batch normalization?
a) overfitting
b) underfitting
c) training is too slow
d) none of the mentioned

A

c) training is too slow

39
Q

The number of nodes in the input layer is 10 and in the hidden layer is 5. What is the maximum number of connections from the input layer to the hidden layer?
a) 50
b) Less than 50
c) More than 50
d) None of the mentioned

A

a) 50

40
Q

Is Deep Learning a specialized subset of machine learning?
a) true
b) false

A

a) true

41
Q

_____ are models used to generate data similar to the data on which they are trained, by destroying training data through the successive addition of Gaussian noise, and then learning to recover the data by reversing this noising process.
a) Federated learning.
b) Attention learning.
c) CNN.
d) Diffusion models.

A

d) Diffusion models.
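
Note: a minimal sketch of the forward (noising) half of this process only. It assumes a linear variance schedule from 1e-4 to 0.02 over 1000 steps and uses the closed form that samples the noised data at step t directly from the clean data; all names and values here are illustrative assumptions.

```python
import math
import random

def alpha_bar(betas, t):
    # Cumulative product of (1 - beta_s) for the first t steps.
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod

def forward_noise(x0, betas, t, rng):
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1).
    abar = alpha_bar(betas, t)
    return [math.sqrt(abar) * x + math.sqrt(1.0 - abar) * rng.gauss(0.0, 1.0) for x in x0]

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]  # assumed schedule
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]
xT = forward_noise(x0, betas, T, rng)  # almost pure Gaussian noise
```

As t grows, the cumulative product shrinks toward zero, so the original signal is progressively destroyed; the learned part of a diffusion model is the reverse of this chain.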

42
Q

What is the goal of training a diffusion model?
a) Learn the reverse process
b) Learn to understand the image
c) Extract the image features
d) Classify the images

A

a) Learn the reverse process

43
Q

One of the benefits of diffusion models is _____
a) scalability
b) not requiring adversarial training.
c) parallelizability
d) all of the above.

A

d) all of the above.

44
Q

In general, a diffusion model consists of _____ main processes
a) 5
b) 4
c) 3
d) 2

A

d) 2

45
Q

A diffusion model is trained by finding the reverse Markov transitions that _____ the likelihood of the training data.
a) Maximize
b) Minimize.
c) Increase.
d) Decrease.

A

a) Maximize

46
Q

For the reverse process in the diffusion model, we must choose _____
a) the Sobel filter
b) the Laplacian operator
c) the thresholding method
d) the Gaussian distribution parameterization

A

d) the Gaussian distribution parameterization

47
Q

The transition distributions in the Markov chain are Gaussian, where the forward process requires a _____, and the reverse process parameters are learned.
a) variance schedule
b) Laplacian operator
c) the Gaussian distribution parameterization
d) none of the mentioned

A

a) variance schedule

48
Q

A diffusion model is parameterized as a Markov chain, meaning that its latent variables depend only on the _____ timestep
a) previous or following
b) previous
c) following
d) none of the mentioned

A

a) previous or following

49
Q

A _____ is used to obtain log-likelihoods across pixel values as the last step in the reverse diffusion process.
a) KL divergence
b) simplified training objective
c) U-Net-like network
d) discrete decoder

A

d) discrete decoder

50
Q

Diffusion models can be applied to _____
a) image denoising
b) super-resolution.
c) image generation.
d) all of the above.

A

d) all of the above.

51
Q

What is the main goal of federated learning?
a) to train a single machine learning model on a centralized dataset
b) to train multiple machine learning models on decentralized datasets
c) to train a single machine learning model on decentralized datasets
d) to train multiple machine learning models on a centralized dataset

A

c) to train a single machine learning model on decentralized datasets

52
Q

How does federated learning differ from traditional machine learning?
a) federated learning requires less data
b) federated learning requires more computational resources
c) federated learning requires less communication bandwidth
d) federated learning places greater emphasis on data privacy

A

d) federated learning places greater emphasis on data privacy

53
Q

What is an advantage of federated learning compared to traditional centralized training?
a) it is more accurate
b) it is faster
c) it requires less data
d) it allows for decentralized data to be used

A

d) it allows for decentralized data to be used

54
Q

How is data privacy protected in federated learning?
a) data is encrypted before being sent to the centralized server
b) data is never shared with any other parties
c) data remains on the individual devices and is only used for model training
d) data is aggregated and anonymized before being used for model training

A

c) data remains on the individual devices and is only used for model training

55
Q

In federated learning, who is responsible for training the model?
a) a centralized server
b) a third-party organization
c) individual clients
d) the data owner

A

c) individual clients

56
Q

Key benefits of federated learning include _____
a) it involves more diverse data.
b) it’s secure.
c) it yields real-time predictions.
d) all of the above

A

d) all of the above

57
Q

What are the challenges of federated learning?
a) efficient communication across the federated network.
b) managing heterogeneous systems in the same networks.
c) privacy concerns and privacy-preserving methods.
d) all of the above

A

d) all of the above

58
Q

How does federated learning work?
a) Transfer of weights and biases to cloud server
b) Transfer of data to cloud server
c) Transfer of model to cloud server
d) Transfer of user info to cloud

A

a) Transfer of weights and biases to cloud server
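
Note: a minimal sketch of the server-side step, assuming a weighted-average aggregation in the style of federated averaging. Clients send only their locally trained parameters and local dataset sizes; the function name and the numbers are illustrative.

```python
def federated_average(client_weights, client_sizes):
    # Weighted average of client parameter vectors, weighted by local dataset size.
    # Raw training data never leaves the clients; only parameters are sent.
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]

clients = [[1.0, 2.0], [3.0, 4.0]]  # parameters after local training
sizes = [1, 3]                      # number of local examples per client
global_weights = federated_average(clients, sizes)
```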

59
Q

Is federated learning more efficient than standard ML techniques for a large number of devices?
a) True
b) False
c) Depends on use case
d) Cannot say

A

a) True

60
Q

Federated learning is _____
a) Supervised
b) Unsupervised
c) Reinforcement learning.
d) None of the above

A

b) Unsupervised

61
Q

What is the basic concept of a recurrent neural network?
a) use a loop between inputs and outputs in order to achieve the better prediction.
b) use recurrent features from dataset to find the best answers.
c) use previous inputs to find the next output according to the training set.
d) use loops between the most important features to predict next output.

A

c) use previous inputs to find the next output according to the training set.

62
Q

Another RNN issue is called “vanishing gradients”. What is that?
a) when the values of a gradient are too small and the model joins in a loop because of that.
b) when the values of a gradient are too big and the model stops learning or takes way too long because of that.
c) when the values of a gradient are too small and the model stops learning or takes way too long because of that.
d) when the values of a gradient are too big and the model joins in a loop because of that.

A

c) when the values of a gradient are too small and the model stops learning or takes way too long because of that.
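
Note: a toy illustration of why gradients shrink (or blow up) over many time steps. It assumes a scalar recurrent weight and an activation whose derivative is at most 0.25, as for the sigmoid; the function name and numbers are illustrative.

```python
def gradient_magnitude(weight, steps, act_grad=0.25):
    # Each step back in time multiplies the gradient by
    # |recurrent weight| * activation derivative.
    grad = 1.0
    for _ in range(steps):
        grad *= abs(weight) * act_grad
    return grad

vanishing = gradient_magnitude(weight=1.0, steps=20)  # factor 0.25 per step
exploding = gradient_magnitude(weight=8.0, steps=20)  # factor 2.0 per step
```

A per-step factor below 1 drives the gradient toward zero, so early time steps barely learn; a factor above 1 makes it grow exponentially instead.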

63
Q

What is an LSTM?
a) LSTM networks are an extension of recurrent neural networks that extends their memory. They are therefore well suited to learning from important experiences separated by very short time lags.
b) LSTM networks are an extension of recurrent neural networks that extends their memory. They are therefore not recommended unless you are using a small dataset.
c) LSTM networks are an extension of recurrent neural networks that extends their memory. They are therefore well suited to learning from important experiences separated by long time lags.
d) LSTM networks are an extension of recurrent neural networks that shortens their memory. They are therefore well suited to learning from important experiences separated by very short time lags.

A

c) LSTM networks are an extension of recurrent neural networks that extends their memory. They are therefore well suited to learning from important experiences separated by long time lags.

64
Q

The network that involves backward links from output to the input and hidden layers is called _________
a) self-organizing maps
b) perceptron
c) recurrent neural network
d) multi layered perceptron

A

c) recurrent neural network

65
Q

What does RNN stand for?
a) Recurrent neural networks
b) Report neural networks
c) Receives neural networks
d) Recording neural networks

A

a) Recurrent neural networks

66
Q

What is the activation function used in the forget gate of an LSTM?
a) Sigmoid
b) Tanh
c) RELU
d) None of the above

A

a) Sigmoid

67
Q

How many gates are there in an LSTM?
a) 3
b) 5
c) 4
d) 2

A

a) 3
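
Note: a minimal sketch of one LSTM step showing the three gates (forget, input, output). It uses scalar weights for readability; the parameter layout and values are illustrative assumptions, not a library API.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    # p maps each name to (w_x, w_h, b); scalars stand in for weight matrices.
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b
    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much of the candidate to add
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate values (not a gate)
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

params = {name: (0.5, 0.5, 0.0) for name in ("forget", "input", "output", "candidate")}
h, c = lstm_step(1.0, 0.0, 0.0, params)
```

All three gates use the sigmoid so that their outputs lie between 0 and 1 and act as soft switches.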

68
Q

_____ is data in which the points in the dataset depend on other points in the dataset.
a) continuous data
b) discrete data
c) sequential data
d) ordinal data

A

c) sequential data

69
Q

_____ helps to identify important elements that need to be added to the cell state.
a) Forget gate
b) Input gate
c) Output gate
d) None of the above

A

b) Input gate

70
Q

LSTMs are used in _____
a) speech recognition
b) music composition
c) time series prediction
d) all of the above

A

d) all of the above

71
Q

What should be the aim of the training procedure in a Boltzmann machine of feedback networks?
a) to capture inputs
b) to feedback the captured outputs
c) to capture the behavior of system
d) none of the mentioned

A

d) none of the mentioned

72
Q

What does a Boltzmann machine consist of?
a) fully connected network with both hidden and visible units
b) asynchronous operation
c) stochastic update
d) all of the mentioned

A

d) all of the mentioned

73
Q

By which method does a Boltzmann machine reduce the effect of additional stable states?
a) No such method exists
b) Simulated annealing
c) Hopfield reduction
d) None of the mentioned

A

b) Simulated annealing

74
Q

For which other task can a Boltzmann machine be used?
a) pattern mapping
b) feature mapping
c) classification
d) pattern association

A

d) pattern association

75
Q

Presence of false minima will have what effect on probability of error in recall?
a) Directly
b) Inversely
c) No effect
d) Directly or Inversely

A

a) Directly

76
Q

What happens when we use mean field approximation with Boltzmann learning?
a) It slows down
b) It speeds up
c) Nothing happens
d) It may speed up or slow down

A

b) It speeds up

77
Q

In Boltzmann learning, which algorithm can be used to arrive at equilibrium?
a) Hopfield
b) mean field
c) Hebb
d) none of the mentioned

A

d) none of the mentioned

78
Q

The visible units in a restricted Boltzmann machine are not connected to each other.
a) True
b) False

A

a) True

79
Q

What are the two layers of a restricted Boltzmann machine called?
a) input and output layers
b) recurrent and convolution layers
c) activation and threshold layers
d) hidden and visible layers

A

d) hidden and visible layers

80
Q

A deep belief network is a stack of restricted Boltzmann machines.
a) True
b) False

A

a) True

81
Q

The main and most important feature of an RNN is _____
a) visible state
b) hidden state
c) present state
d) None of these

A

b) hidden state

81
Q

An RNN remembers every piece of information through _____
a) Work
b) Time
c) Hours
d) Memory

A

b) Time

82
Q

To create a numerical representation of our text-based dataset, we generate two lookup tables. What are they?
a) maps character to numbers
b) maps numbers back to characters
c) identify unique characters present in text
d) both a & b

A

d) both a & b
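
Note: a minimal sketch of the two lookup tables for character-level text. The sample string and variable names are illustrative.

```python
text = "hello world"

# Unique characters, sorted so the mapping is deterministic.
chars = sorted(set(text))

# Table 1: character -> number. Table 2: number -> character.
char_to_idx = {ch: i for i, ch in enumerate(chars)}
idx_to_char = {i: ch for ch, i in char_to_idx.items()}

encoded = [char_to_idx[ch] for ch in text]
decoded = "".join(idx_to_char[i] for i in encoded)
```

The first table turns text into model inputs; the second turns predicted indices back into readable characters.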

83
Q

_____ occurs when the gradients become very small and tend towards zero.
a) Exploding gradients
b) Vanishing gradients
c) Long short-term memory networks
d) Gated recurrent unit networks.

A

b) Vanishing gradients

84
Q

On what parameters can the change in the weight vector depend?
a) learning parameters
b) input vector
c) learning signal
d) all of the mentioned

A

d) all of the mentioned

85
Q

_____ occurs when the gradients become too large due to back-propagation.
a) Exploding gradients
b) Vanishing gradients
c) Long short-term memory networks
d) Gated recurrent unit networks

A

a) Exploding gradients

86
Q

If a competitive network can perform feature mapping, what can that network be called?
a) self-excitatory
b) self-inhibitory
c) self-organization
d) none of the mentioned

A

c) self-organization

87
Q

Why do we need biological neural networks?
a) to solve tasks like machine vision & natural language processing
b) to apply heuristic search methods to find solutions of problem
c) to make smart human interactive & user-friendly system
d) all of the mentioned

A

d) all of the mentioned

88
Q

What is the auto-association task in neural networks?
a) find relation between 2 consecutive inputs
b) related to storage & recall task
c) predicting the future inputs
d) None of the mentioned

A

b) related to storage & recall task

89
Q

What is unsupervised learning?
a) features of group explicitly stated
b) number of groups may be known
c) neither the features nor the number of groups is known
d) none of the mentioned

A

c) neither the features nor the number of groups is known

90
Q

XLNet is an _____ language model which outputs the joint probability of a sequence of tokens based on the transformer architecture with recurrence.
a) Auto-regressive
b) Auto-Negressive
c) Objective
d) Bidirectional

A

a) Auto-regressive

91
Q

XLNet is “generalized” because it captures bidirectional context by means of a mechanism called _____
a) PLM
b) BERT
c) TRANSFORMER-XL
d) MLM

A

a) PLM

92
Q

_____ keeps track of the position of each token in a sequence.
a) pretrain-finetune discrepancy
b) transformer-xl
c) positional encoding
d) segment recurrence

A

c) positional encoding

93
Q

_____ caches the hidden state of the first segment in memory in each layer and updates attention accordingly. It allows reuse of memory for each segment.
a) pretrain-finetune discrepancy
b) transformer-xl
c) positional encoding
d) segment recurrence

A

d) segment recurrence

94
Q

The attention weights determined by a simple feed-forward neural network are _____
a) query
b) keys
c) values
d) all of the above

A

d) all of the above

95
Q

_____ traditional methods predict the current token given the previous “n” tokens, or predict the current token given all tokens after it.
a) Bidirectional
b) Masked language modeling
c) XLNet
d) BERT

A

a) Bidirectional

96
Q

_____ is a neural network architecture that can model bidirectional contexts in text data using a transformer.
a) BERT
b) XLNet
c) MLM
d) PLM

A

a) BERT

97
Q

A disadvantage of BERT is it corrupts the input with _______ and suffers from pretrain-finetune discrepancy.
a) Mask
b) PLM
c) MLM
d) All of above

A

a) Mask

98
Q

XLNet is the latest and greatest model to emerge from the booming field of natural language processing (NLP)
a) True
b) False

A

a) True

99
Q

XLNet is “generalized”
a) True
b) False

A

a) True

100
Q

The attention mechanism has changed the way we work with deep learning algorithms
a) true
b) false

A

a) true

101
Q

An advantage of transformers over recurrent sequence models is that they are slower to train and run on modern hardware
a) true
b) false

A

b) false

102
Q

Fields like NLP and Computer Vision have been revolutionized by the attention mechanism
a) true
b) false

A

a) true

103
Q

Attention is an interface connecting the encoder and decoder that provides the decoder with information
a) true
b) false

A

a) true

104
Q

The encoder LSTM or RNN units produce the words in a sentence one after another
a) true
b) false

A

b) false

105
Q

The Encoder reads the input sentence and tries to make sense of it
a) true
b) false

A

a) true

106
Q

The LSTM is supposed to capture long-range dependencies better than the RNN
a) true
b) false

A

a) true

107
Q

RNNs can’t remember longer sentences and sequences
a) true
b) false

A

a) true

108
Q

If the encoder makes a bad summary, the translation will also be bad
a) true
b) false

A

a) true

109
Q

The decoder is used to process the entire input sentence and encode it into a context vector
a) true
b) false

A

b) false

110
Q

Autoencoders belong to supervised neural networks
a) true
b) false

A

b) false

111
Q

The bottleneck is the most important part of the autoencoder network
a) true
b) false

A

a) true

112
Q

Convolution autoencoders can do image reconstruction
a) true
b) false

A

a) true

113
Q

A deep autoencoder is composed of two symmetrical deep-belief networks
a) true
b) false

A

a) true

114
Q

Deep Autoencoders can’t do image search
a) true
b) false

A

b) false

115
Q

Sparse autoencoders offer us an alternative method for introducing an information bottleneck without requiring a reduction in the number of nodes
a) true
b) false

A

a) true

116
Q

Sparse autoencoders work by penalizing the activation of neurons in the input layer
a) true
b) false

A

b) false

117
Q

Autoencoders can denoise images
a) true
b) false

A

a) true

118
Q

Autoencoders can’t be used to reduce dimensionality
a) true
b) false

A

b) false

119
Q

The encoder is the module that helps the network “decompress” the knowledge representations and reconstructs the data back from its encoded form
a) true
b) false

A

b) false

120
Q

BERT (Bidirectional Encoder Representations from Transformers) is a recent paper published by researchers at Amazon AI Language.
a) true
b) false

A

b) false

121
Q

BERT doesn’t read the text input sequentially.
a) true
b) false

A

a) true

122
Q

BERT allows transfer learning on the existing pretrained models and hence can be custom trained for a specific subject.
a) true
b) false

A

a) true

123
Q

In BERT, the relationship between all words in a sentence is modeled irrespective of their position.
a) true
b) false

A

a) true

124
Q

BERT uses a unidirectional language model for producing word embeddings.
a) true
b) false

A

b) false

125
Q

BERT is not an open-source machine learning framework for NLP.
a) true
b) false

A

b) false

126
Q

BERT does not understand human language as it is spoken naturally.
a) true
b) false

A

b) false

127
Q

BERT is expected to have a large impact on voice search as well as text-based search.
a) true
b) false

A

a) true

128
Q

The same word can have multiple word embeddings with BERT.
a) True
b) False

A

a) True

129
Q

BERT is a deep bidirectional, supervised language representation
a) true
b) false

A

b) false

130
Q

Pooling is an up-sampling operation that reduces the dimensionality of the feature map.
a) true
b) false

A

b) false

131
Q

The ReLU operation is applied to each pixel and replaces all the negative pixel values in the feature map with zero
a) true
b) false

A

a) true

132
Q

Pooling (or spatial pooling) layers are also called sub-sampling layers
a) true
b) false

A

a) true

133
Q

Pooling reduces the dimensionality of each feature map while retaining the most important information
a) true
b) false

A

a) true

134
Q

The aim of the fully connected layer is to use the low-level features of the input image produced by the convolutional and pooling layers
a) true
b) false

A

b) false

135
Q

The hyperparameters for a pooling layer are filter size, stride, and max or average pooling
a) true
b) false

A

a) true
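
Note: a minimal sketch of a pooling layer with exactly these hyperparameters (filter size, stride, and max or average mode). It assumes no padding; the function name and the sample feature map are illustrative.

```python
def pool2d(fmap, size=2, stride=2, mode="max"):
    # Slide a size x size window over the feature map and keep one number per window.
    rows = (len(fmap) - size) // stride + 1
    cols = (len(fmap[0]) - size) // stride + 1
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            window = [fmap[r * stride + i][c * stride + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

fmap = [[1, 3, 2, 4],
        [5, 6, 1, 2],
        [7, 2, 9, 1],
        [3, 4, 5, 6]]
max_pooled = pool2d(fmap, mode="max")      # 4x4 -> 2x2
avg_pooled = pool2d(fmap, mode="average")  # 4x4 -> 2x2
```

Both modes halve each spatial dimension here, which is why pooling is described as down-sampling.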

136
Q

When we apply a filter of 1×1, then there is no reduction in the size of the image and hence there is no loss of information.
a) true
b) false

A

a) true

137
Q

Flattening means that every neuron in the previous layer is connected to each neuron in the next layer
a) true
b) false

A

b) false

138
Q

ReLU introduces linearity to the network, and the generated output is a rectified feature map
a) true
b) false

A

b) false

139
Q

A convolutional layer receives a set of input feature maps (IFM) and generates a set of output feature maps (OFM).
a) true
b) false

A

a) true

140
Q

Diffusion models work by destroying training data through the successive addition of Laplacian noise, and then learning to recover the data by reversing this noising process.
a) true
b) false

A

b) false

141
Q

A discrete decoder is used to obtain log-likelihoods across pixel values as the last step in the reverse diffusion process.
a) true
b) false

A

a) true

142
Q

A diffusion model is a latent variable model which maps to the latent space Sobel using a fixed chain.
a) true
b) false

A

b) false

143
Q

The goal of training a Diffusion model is to learn the reverse process
a) true
b) false

A

a) true

144
Q

The transition distributions in the Markov chain are Gaussian and depend only on the forward process.
a) true
b) false

A

b) false

145
Q

A diffusion model is parameterized as a Markov chain, meaning that our latent variables x1, …, xt depend only on the previous (or following) timestep.
a) true
b) false

A

a) true

146
Q

For the reverse process in the diffusion model, we must choose a variance schedule.
a) true
b) false

A

b) false

147
Q

The transition distributions in the Markov chain are Gaussian, where the forward process requires a variance schedule, and the reverse process parameters are learned.
a) true
b) false

A

a) true
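
Note: a minimal sketch of the forward (noising) chain with a fixed variance schedule, assuming the common DDPM-style Gaussian transition x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I). The linear schedule, its endpoints, and the three-value toy "image" are illustrative assumptions; the learned reverse process is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fixed variance schedule beta_1..beta_T (endpoint values are assumptions)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * t for t in range(num_steps)]


def forward_step(x_prev, beta_t, rng):
    """One Gaussian transition: shrink the signal and add noise with variance beta_t."""
    scale = math.sqrt(1.0 - beta_t)
    noise_std = math.sqrt(beta_t)
    return [scale * value + noise_std * rng.gauss(0.0, 1.0) for value in x_prev]


rng = random.Random(0)
betas = linear_beta_schedule(num_steps=1000)
x = [1.0, -1.0, 0.5]  # toy "image" with three pixel values
for beta_t in betas:
    x = forward_step(x, beta_t, rng)

# Fraction of the original signal that survives all steps: prod(sqrt(1 - beta_t)).
signal_scale = math.prod(math.sqrt(1.0 - b) for b in betas)
print(signal_scale)
```

Nothing in the forward chain is trained; only the schedule is chosen. After enough steps the surviving signal fraction is close to zero, so the sample is approximately pure Gaussian noise.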

148
Q

Cascade diffusion models (like Stable Diffusion) apply the diffusion process on a smaller latent space for computational efficiency, using a variational autoencoder for the up- and down-sampling.
a) true
b) false

A

b) false

149
Q

Diffusion models can be applied to image denoising, inpainting, super-resolution, and image generation.
a) true
b) false

A

a) true

150
Q

Federated Learning is not used to improve the privacy and security of machine learning models.
a) true
b) false

A

b) false

151
Q

Federated Learning requires the use of a centralized server.
a) true
b) false

A

b) false

152
Q

Federated Learning can’t be used to train models on data that is distributed across multiple devices, such as smartphones or IoT devices.
a) true
b) false

A

b) false

153
Q

Federated Learning requires the use of a centralized database.
a) true
b) false

A

b) false

154
Q

Federated Learning can’t be used to improve the privacy of machine learning models by keeping sensitive data on individual devices.
a) true
b) false

A

b) false

155
Q

Federated Learning is a type of machine learning that allows multiple parties to train a model without sharing their data.
a) true
b) false

A

a) true

156
Q

Federated Learning requires participating devices to have high computational power.
a) true
b) false

A

b) false

157
Q

Federated Learning enables participants to train local models cooperatively on local data without disclosing sensitive data to a central cloud server.
a) true
b) false

A

a) true
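
Note: one common way to combine locally trained models is federated averaging, sketched below under simplifying assumptions: each participant sends only a flat list of weights and its local sample count, and an aggregator (here a plain function) averages them. The function name and the numbers are hypothetical, and other setups (for example without a central aggregator) exist.

```python
def federated_average(client_weights, client_sizes):
    """Average client weight vectors, weighting each by its local sample count."""
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(num_params)
    ]


# Hypothetical updates from three devices; only weights leave each device, never raw data.
client_weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
client_sizes = [10, 30, 60]
global_weights = federated_average(client_weights, client_sizes)
print(global_weights)
```

Clients with more local data pull the global model further toward their weights, while their training data itself stays on the device.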

158
Q

Federated Learning can’t be used to train deep learning models.
a) true
b) false

A

b) false

159
Q

Federated Learning can be used to train models on data that is distributed across multiple devices in real-time.
a) true
b) false

A

a) true

160
Q

In sequential data, the points in the dataset depend on other points in the dataset.
a) true
b) false

A

a) true

161
Q

A time series is a common example of sequential data, with each point reflecting an observation at a certain point in time.
a) true
b) false

A

a) true

162
Q

The crucial element to remember about sequence models is that the data we’re working with are independently and identically distributed (i.i.d.) samples.
a) true
b) false

A

b) false

163
Q

Sequence models are the machine learning models that input or output sequences of data.
a) true
b) false

A

a) true

164
Q

Structured data includes text streams, audio clips, video clips, and time-series data.
a) true
b) false

A

b) false

165
Q

Conventional feedforward artificial neural networks can deal with sequential data and can be trained to hold knowledge about the past.
a) true
b) false

A

b) false

166
Q

Traditional RNNs are excellent at capturing long-range dependencies.
a) true
b) false

A

b) false

167
Q

LSTMs are explicitly designed to avoid the long-term dependency problem.
a) true
b) false

A

a) true

168
Q

The input gate controls what information should be forgotten.
a) true
b) false

A

b) false

169
Q

The input gate helps to identify important elements that need to be added to the cell state.
a) true
b) false

A

a) true
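
Note: to make the gate roles concrete, here is a sketch of one step of a scalar LSTM cell using the standard gate equations. The parameter layout (a dict mapping each gate name to input weight, recurrent weight, and bias) and the chosen numbers are assumptions for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell; params maps gate name -> (w_x, w_h, bias)."""

    def pre_activation(name):
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h_prev + bias

    forget = sigmoid(pre_activation("forget"))          # how much old cell state to keep
    input_gate = sigmoid(pre_activation("input"))       # how much of the candidate to add
    candidate = math.tanh(pre_activation("candidate"))  # new content proposed for the cell
    output = sigmoid(pre_activation("output"))          # how much cell state to expose
    c = forget * c_prev + input_gate * candidate
    h = output * math.tanh(c)
    return h, c


# Hypothetical parameters: biases chosen so the forget gate is nearly closed
# and the input gate is nearly open, which overwrites the old cell state.
params = {
    "forget": (0.0, 0.0, -20.0),
    "input": (0.0, 0.0, 20.0),
    "candidate": (0.0, 0.0, 20.0),
    "output": (0.0, 0.0, 20.0),
}
h, c = lstm_cell_step(x=0.3, h_prev=0.0, c_prev=5.0, params=params)
print(h, c)
```

Here the old cell state (5.0) is discarded by the forget gate, and the new cell state comes almost entirely from what the input gate admits.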

170
Q

RBMs are a supervised learning technique
a) true
b) false

A

b) false

171
Q

An RBM isn’t restricted to having connections only between the visible and the hidden units.
a) true
b) false

A

b) false

172
Q

An RBM performs discriminative learning, similar to what happens in a classification problem.
a) true
b) false

A

b) false

173
Q

If the number of visible nodes is nV and the number of hidden nodes is nH, then the number of connections in an RBM is nV * nH.
a) true
b) false

A

a) true
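
Note: the count in this card follows from the RBM being a bipartite graph, with every visible unit connected to every hidden unit and no connections within a layer. The sketch below also shows, for comparison, the count for an unrestricted Boltzmann machine under the assumption that every pair of units is connected. Function names are hypothetical.

```python
def rbm_connection_count(n_visible, n_hidden):
    """Bipartite graph: one connection per (visible, hidden) pair."""
    return n_visible * n_hidden


def full_boltzmann_connection_count(n_visible, n_hidden):
    """Assumes every pair of units is connected, regardless of layer."""
    n = n_visible + n_hidden
    return n * (n - 1) // 2


rbm_edges = rbm_connection_count(6, 3)
full_edges = full_boltzmann_connection_count(6, 3)
print(rbm_edges, full_edges)
```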

174
Q

Boltzmann machines are non-deterministic generative deep learning models with 3 types of nodes: visible, hidden and output nodes
a) true
b) false

A

b) false

175
Q

Boltzmann machines fall into the class of unsupervised learning.
a) true
b) false

A

a) true

176
Q

Sparse autoencoders introduce an information bottleneck by reducing the number of nodes in the hidden layers.
a) true
b) false

A

b) false

177
Q

The idea of a sparse autoencoder is to encourage the network to learn an encoding and decoding that rely on activating only a small number of neurons.
a) true
b) false

A

a) true
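
Note: one way to encourage sparse activations is to add a penalty on the hidden activations to the reconstruction loss. The sketch below assumes an L1 penalty (a KL-divergence penalty is another common choice); the function name, the weight 0.1, and the values are illustrative.

```python
def sparse_autoencoder_loss(inputs, reconstruction, hidden_activations, sparsity_weight=0.1):
    """Squared reconstruction error plus an L1 penalty on the hidden activations."""
    reconstruction_error = sum((x - r) ** 2 for x, r in zip(inputs, reconstruction))
    l1_penalty = sum(abs(h) for h in hidden_activations)
    return reconstruction_error + sparsity_weight * l1_penalty


# Same reconstruction quality, but the second code activates more hidden units.
sparse_loss = sparse_autoencoder_loss([1.0, 0.0], [0.5, 0.0], [0.0, 2.0, 0.0])
dense_loss = sparse_autoencoder_loss([1.0, 0.0], [0.5, 0.0], [1.0, 2.0, 1.0])
print(sparse_loss, dense_loss)
```

The hidden layer does not need fewer nodes than the input; the penalty, not the layer size, provides the constraint.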

178
Q

To implement an undercomplete autoencoder, constrain the number of nodes present in the hidden layer(s) of the neural network.
a) true
b) false

A

a) true

179
Q

Autoencoders are not capable of learning nonlinear manifolds (continuous, non-intersecting surfaces).
a) true
b) false

A

b) false

180
Q

A neural network with multiple hidden layers and sigmoid nodes can form non-linear decision boundaries.
a) true
b) false

A

a) true

181
Q

Neural Networks compute non-convex functions of their parameters.
a) true
b) false

A

b) false

182
Q

For logistic regression, with parameters optimized using a stochastic gradient method, setting parameters to 0 is an acceptable initialization.
a) true
b) false

A

a) true

183
Q

For arbitrary neural networks, with weights optimized using a stochastic gradient method, setting weights to 0 is an acceptable initialization.
a) true
b) false

A

b) false
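
Note: a small sketch of why all-zero weights are a problem once hidden units are involved. It assumes a tiny network (two inputs, two sigmoid hidden units, one linear output, squared-error loss, no biases) chosen only for illustration. With zero weights both hidden units receive identical gradients, so gradient updates cannot make them differ.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradients(x, target, w_hidden, w_out):
    """Squared-error gradients for a net with sigmoid hidden units and a linear output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y = sum(w * h for w, h in zip(w_out, hidden))
    d_y = 2.0 * (y - target)  # derivative of (y - target)^2 with respect to y
    grad_out = [d_y * h for h in hidden]
    grad_hidden = [
        [d_y * w_out[j] * hidden[j] * (1.0 - hidden[j]) * xi for xi in x]
        for j in range(len(w_hidden))
    ]
    return grad_hidden, grad_out


x, target = [1.0, -2.0], 1.0
w_hidden = [[0.0, 0.0], [0.0, 0.0]]  # two hidden units, all-zero weights
w_out = [0.0, 0.0]
grad_hidden, grad_out = gradients(x, target, w_hidden, w_out)
print(grad_hidden, grad_out)
```

Logistic regression has no hidden units to keep distinct, which is why zero initialization is acceptable there.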

184
Q

Given a design matrix X ∈ R^(n×d) where d ≪ n, if we project our data onto a k-dimensional subspace using PCA where k equals the rank of X, we recreate a perfect representation of our data with no loss.
a) true
b) false

A

a) true

185
Q

Hierarchical clustering methods require a predefined number of clusters, much like k-means.
a) true
b) false

A

b) false

186
Q

Given a predefined number of clusters k, globally minimizing the k-means objective function is NP-hard.
a) true
b) false

A

a) true
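
Note: because global minimization is hard, k-means is usually run with a heuristic such as Lloyd's algorithm, which only guarantees a local optimum. The sketch below (function names and the four toy points are assumptions) shows the same data reaching two different objective values from two different initializations.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k])) for p in points]


def lloyd(points, centers, num_iters=10):
    """Alternate assignment and mean-update steps; returns centers, labels, objective."""
    centers = [tuple(c) for c in centers]
    for _ in range(num_iters):
        labels = assign(points, centers)
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = tuple(sum(coord) / len(members) for coord in zip(*members))
    labels = assign(points, centers)
    objective = sum(sq_dist(p, centers[label]) for p, label in zip(points, labels))
    return centers, labels, objective


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
_, good_labels, good_objective = lloyd(points, [(0.0, 0.0), (10.0, 0.0)])
_, bad_labels, bad_objective = lloyd(points, [(5.0, 0.0), (5.0, 1.0)])
print(good_objective, bad_objective)
```

The second initialization converges to a poor local optimum, which is one reason k-means is commonly restarted from several initializations.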

187
Q

A random forest is an ensemble learning method that attempts to lower the bias error of decision trees.
a) true
b) false

A

b) false

188
Q

Bagging algorithms attach weights w1…wn to a set of n weak learners. They re-weight the learners and convert them into strong ones. Boosting algorithms draw n sample distributions (usually with replacement) from an original data set for learners to train on.
a) true
b) false

A

b) false
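
Note: the card swaps the two definitions. Drawing samples with replacement is the bagging step, and re-weighting weak learners is the boosting step. Below is a minimal sketch of the bagging side only, with hypothetical helper names and a toy data set; the learners themselves are left out.

```python
import random


def bootstrap_samples(dataset, num_learners, rng):
    """Draw one same-sized sample per learner, with replacement (the bagging step)."""
    return [[rng.choice(dataset) for _ in dataset] for _ in range(num_learners)]


def majority_vote(predictions):
    """Combine the learners' class predictions by taking the most common label."""
    return max(set(predictions), key=predictions.count)


rng = random.Random(0)
dataset = [("x1", 0), ("x2", 1), ("x3", 1), ("x4", 0), ("x5", 1)]
samples = bootstrap_samples(dataset, num_learners=3, rng=rng)
vote = majority_vote(["spam", "ham", "spam"])
print(len(samples), vote)
```

Each learner would be trained on its own bootstrap sample, and the ensemble prediction is the vote across learners.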

189
Q

Using cross-validation to select hyperparameters will guarantee that our model does not overfit.
a) true
b) false

A

b) false

190
Q

In BERT, bidirectionality is achieved by a technique called “masked language modeling”.
a) true
b) false

A

a) true
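
Note: a simplified sketch of the masking step behind masked language modeling. Some tokens are hidden, and the model is asked to predict them from the tokens on both sides. The function name, the masking probability, and the sentence are assumptions; the scheme in the BERT paper is more involved (it does not always substitute the mask token).

```python
import random


def mask_tokens(tokens, mask_prob, rng, mask_token="[MASK]"):
    """Replace a random subset of tokens with a mask; return inputs and prediction targets."""
    masked, targets = [], {}
    for position, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets[position] = token  # the model must recover this token
        else:
            masked.append(token)
    return masked, targets


rng = random.Random(0)
tokens = ["the", "cat", "sat", "on", "the", "mat"]
masked, targets = mask_tokens(tokens, mask_prob=0.3, rng=rng)
print(masked, targets)
```

Because a masked position can sit anywhere in the sentence, its prediction can use both the preceding and the following tokens.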

191
Q

BERT overcomes this shortcoming in that it considers both previous and next tokens to predict the current token.
a) true
b) false

A

a) true

192
Q

XLNet is not the latest and greatest model to emerge from the booming field of natural language processing (NLP).
a) true
b) false

A

b) false

193
Q

XLNet is not “generalized” because it captures bidirectional context by means of a mechanism called “permutation language modeling”.
a) true
b) false

A

b) false

194
Q

XLNet is not a generalized autoregressive model where the next token is dependent on all previous tokens.
a) true
b) false

A

b) false

195
Q

XLNet is the idea of capturing bidirectional context by training an autoregressive model on all possible permutations of words in a sentence.
a) true
b) false

A

b) false

196
Q

XLNet integrates the ideas of autoregressive models and bidirectional context modeling while overcoming the disadvantages of BERT.
a) true
b) false

A

a) true
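
Note: a toy sketch of the permutation language modeling idea referred to in these cards. Tokens keep their original positions, but the order in which they are predicted is permuted, so a token can be conditioned on words from both sides while the model stays autoregressive. The sentence, the chosen order, and the helper name are assumptions for illustration.

```python
def prediction_contexts(tokens, order):
    """For one factorization order, list (target token, visible context tokens) pairs."""
    pairs = []
    for step, position in enumerate(order):
        visible = sorted(order[:step])  # positions already predicted in this order
        pairs.append((tokens[position], [tokens[p] for p in visible]))
    return pairs


tokens = ["New", "York", "is", "big"]
order = (2, 0, 3, 1)  # one hypothetical factorization order over positions
pairs = prediction_contexts(tokens, order)
for target, context in pairs:
    print(target, "<-", context)
```

In this order, "York" is predicted last and sees "New" to its left as well as "is" and "big" to its right, without any mask token in the input.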

197
Q

Autoregressive (AR) language modeling and autoencoding (AE) have been the two most successful pretraining objectives.
a) true
b) false

A

a) true

198
Q

The methods proposed in XLNet include the permutation language modeling objective.
a) true
b) false

A

a) true

199
Q

For both BERT and XLNet, partial prediction plays a role in reducing optimization difficulty by only predicting tokens with sufficient context.
a) true
b) false

A

a) true