Machine Learning Flashcards
Which of the following are advantages of transformers over a recurrent sequence model?
a) better at learning long-range dependencies
b) slower to train and run on modern hardware
c) require many fewer parameters to achieve similar results
d) none of the above
a) better at learning long-range dependencies
Which of these parts of the self-attention operation are calculated by passing inputs through an MLP?
a) values
b) keys
c) queries
d) all the above
d) all the above
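As an illustration, here is a minimal numpy sketch of single-head self-attention with toy sizes; the matrices W_q, W_k, and W_v stand in for the learned (linear/MLP) layers that produce queries, keys, and values from the same inputs:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                       # toy sizes (assumed)
X = rng.normal(size=(seq_len, d_model))       # input token embeddings

# Learned projections: queries, keys, AND values are all computed
# by passing the same inputs through learned layers.
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d_model)           # scaled dot-product attention
attn = softmax(scores, axis=-1)
output = attn @ V                             # weighted sum of the values
print(output.shape)                           # (4, 8)
```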
What is the field of natural language processing (NLP)?
a) computer science
b) artificial intelligence
c) linguistics
d) all of the mentioned
d) all of the mentioned
What is the main challenge of NLP?
a) handling ambiguity of sentences
b) handling tokenization
c) handling POS tagging
d) All of the mentioned
a) handling ambiguity of sentences
What is machine translation?
a) Converts one human language to another
b) Converts human language to machine language
c) Converts any human language to English
d) Converts machine language to human language
a) Converts one human language to another
Which of the following are areas where NLP can be useful?
a) automatic text summarization
b) automatic question-answering systems
c) information retrieval
d) all the mentioned
d) all the mentioned
Which of the following properties will a good position encoding ideally have?
a) unique for all positions
b) relative distances are independent of absolute sequence position
c) well-defined for arbitrary sequence lengths
d) all the above
d) all the above
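As an illustration, the sinusoidal encoding from the original Transformer paper satisfies all three properties; a minimal numpy sketch with arbitrary toy sizes:

```python
import numpy as np

def sinusoidal_positions(max_len, d_model):
    # Unique per position, defined for any sequence length, and relative
    # offsets correspond to fixed linear transformations of the encoding.
    pos = np.arange(max_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

print(sinusoidal_positions(50, 16).shape)  # (50, 16)
```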
Which of the following includes the major tasks of NLP?
a) automatic summarization
b) discourse analysis
c) machine translation
d) all the mentioned
d) all the mentioned
Neural machine translation was based on encoder-decoder _____
a) RNNs
b) LSTMs
c) both a & b
d) neither a nor b
c) both a & b
The encoder LSTM is used to process the _____ sentence.
a) input
b) output
c) function
d) All the above
a) input
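A minimal PyTorch sketch of the encoder side of such a model, with toy vocabulary and layer sizes as assumptions: the encoder LSTM reads the embedded input sentence, and its final state would seed the decoder that generates the output sentence.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim = 1000, 32, 64   # toy sizes (assumed)

embedding = nn.Embedding(vocab_size, emb_dim)
encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

# A batch containing one "input sentence" of 5 token ids.
src = torch.tensor([[4, 27, 311, 9, 2]])
_, (h_n, c_n) = encoder(embedding(src))   # the encoder summarizes the input sentence
# (h_n, c_n) would initialize the decoder LSTM that produces the output sentence.
print(h_n.shape)                          # torch.Size([1, 1, 64])
```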
What type of neural network is an autoencoder?
a) Supervised neural network
b) unsupervised neural network
c) semi-supervised neural network
d) reinforcement neural network
b) unsupervised neural network
On what type of data can an autoencoder perform dimensionality reduction?
a) linear data
b) nonlinear data
c) both a & b
d) none of the above
c) both a & b
A module that compresses data into an encoded representation that is typically several orders of magnitude smaller than the input data.
a) The encoder
b) Bottleneck
c) The decoder
d) None of the above
a) The encoder
A module that contains the compressed knowledge representation and is considered the most important part of the autoencoder network.
a) the encoder
b) bottleneck
c) the decoder
d) None of the above
b) bottleneck
A module that helps the network “decompress” the knowledge representation and reconstruct the data from its encoded form.
a) input layer
b) bottleneck
c) output layer
d) none of the above
c) output layer
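A minimal PyTorch sketch showing the encoder, the bottleneck, and the decoder together; the 784-dimensional input and 32-dimensional bottleneck are arbitrary assumptions:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, in_dim=784, bottleneck_dim=32):   # sizes are assumptions
        super().__init__()
        # Encoder: compresses the input into a much smaller code.
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, bottleneck_dim))
        # Decoder: "decompresses" the code and reconstructs the input.
        self.decoder = nn.Sequential(nn.Linear(bottleneck_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, x):
        code = self.encoder(x)        # bottleneck: the compressed representation
        return self.decoder(code)

x = torch.randn(8, 784)
x_hat = Autoencoder()(x)
print(x_hat.shape)                    # torch.Size([8, 784])
```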
What type of autoencoder works by penalizing the activation of some neurons in hidden layers?
a) Sparse autoencoder
b) Variational autoencoder
c) Deep autoencoder
d) Convolution autoencoders
a) Sparse autoencoder
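A hedged sketch of one common form of that penalty: an L1 activity term on the hidden activations added to the reconstruction loss. The layer sizes and penalty weight are assumptions.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
decoder = nn.Linear(256, 784)

x = torch.randn(8, 784)
h = encoder(x)                              # hidden activations
x_hat = decoder(h)

recon = nn.functional.mse_loss(x_hat, x)
sparsity = 1e-3 * h.abs().mean()            # penalize active hidden neurons
loss = recon + sparsity                     # sparse-autoencoder objective (sketch)
```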
Which of the following is done by a deep autoencoder?
a) image reconstruction
b) image colorization
c) image search
d) image denoising
c) image search
Which of the following is done by a convolution autoencoder?
a) data compression
b) image search
c) information retrieval
d) image colorization
d) image colorization
Which of the following is an autoencoder application?
a) watermark removing
b) dimensionality reduction
c) image generation
d) all the above
d) all the above
Which autoencoder doesn’t require reducing the number of nodes in the bottleneck?
a) sparse autoencoder
b) deep autoencoder
c) variational autoencoder
d) None of the above
a) sparse autoencoder
In NLP, bidirectional context is supported by which of the following embeddings?
a) Word2Vec
b) BERT
c) GloVe
d) All the above
b) BERT
In which model is a given token’s input representation the sum of its token, segment, and position embeddings?
a) ELMO
b) GPT
c) BERT
d) none of the above
c) BERT
BERT Base contains _____ encoder layers
a) 12
b) 24
c) 36
d) 48
a) 12
BERT Large contains _____ encoder layers
a) 12
b) 24
c) 36
d) 48
b) 24
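For reference, the two published sizes can be summarized in a small dict (layer counts, hidden sizes, and head counts follow the BERT paper; parameter counts are approximate):

```python
# Published BERT configurations; parameter counts are approximate.
bert_configs = {
    "bert-base":  {"encoder_layers": 12, "hidden_size": 768,  "attention_heads": 12, "params": "~110M"},
    "bert-large": {"encoder_layers": 24, "hidden_size": 1024, "attention_heads": 16, "params": "~340M"},
}
for name, cfg in bert_configs.items():
    print(name, cfg)
```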
BERT aims at tackling various NLP tasks such as _____
a) question answering
b) language inference
c) text summarization
d) all of the mentioned
d) all of the mentioned
The BERT model is pre-trained on relatively generic tasks such as _____
a) masked language modeling (MLM)
b) next sentence prediction
c) a and b
d) none of the mentioned
c) a and b
_______ is to hide a word in a sentence and then have the program predict what word has been hidden (masked) based on the hidden word’s context.
a) Masked language modeling (MLM)
b) Next sentence prediction
c) Sequence classification
d) Named entity recognition (NER)
a) Masked language modeling (MLM)
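A toy sketch of how MLM training pairs can be built: hide a fraction of the tokens and keep the originals as prediction targets. The 15% masking rate follows BERT; the whitespace “tokenizer” is a simplification, not BERT’s actual tokenizer.

```python
import random

MASK, MASK_RATE = "[MASK]", 0.15   # 15% masking rate, as in BERT
rng = random.Random(0)

def make_mlm_example(sentence):
    tokens = sentence.split()                  # toy whitespace "tokenizer"
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < MASK_RATE:
            inputs.append(MASK)                # hide the word...
            labels.append(tok)                 # ...and keep it as the target
        else:
            inputs.append(tok)
            labels.append(None)                # not predicted
    return inputs, labels

print(make_mlm_example("the cat sat on the mat and purred softly"))
```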
_______ is to have the program predict whether two given sentences have a logical, sequential connection or whether their relationship is simply random.
a) Masked language modeling (MLM)
b) Next sentence prediction
c) Sequence classification
d) Named entity recognition (NER)
b) Next sentence prediction
BERT can process text ______
a) left-to-right
b) right-to-left
c) both
d) none of the mentioned
c) both
BERT was created and published in 2018 by ______
a) Amazon
b) Microsoft
c) IBM
d) Google
d) Google
What is the difference between CNN and ANN?
a) CNN has one or more layers of convolution units, which receive its input from multiple units.
b) CNN uses a simpler algorithm than ANN.
c) They complete each other, so to use ANN, you need to start with CNN.
d) CNN is the easiest way to use neural networks.
a) CNN has one or more layers of convolution units, which receive its input from multiple units.
The data is fed into the model and the output from each layer is obtained. This step is called _____.
a) Feed forward
b) Feed backward
c) Input layer
d) Output layer
a) Feed forward
How many common types of pooling layers are there?
a) 5
b) 2
c) 3
d) 4
b) 2
_____ computes the output volume by computing the dot product between all filters and image patches.
a) Input layer
b) Convolution layer
c) Activation function layer
d) Pool layer
b) Convolution layer
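A minimal numpy sketch of that operation for a single channel and a single filter (valid padding, stride 1): each output value is the dot product between the filter and one image patch.

```python
import numpy as np

def conv2d_valid(image, kernel):
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kH, j:j + kW]
            out[i, j] = np.sum(patch * kernel)   # dot product of filter and patch
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 0.], [0., -1.]])
print(conv2d_valid(image, kernel).shape)         # (4, 4)
```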
What is back propagation?
a) it is another name given to the curvy function in the perceptron
b) it is the transmission of error back through the network to adjust the inputs
c) it is the transmission of error back through the network to allow weights to be adjusted so that the network can learn
d) all of the mentioned
c) it is the transmission of error back through the network to allow weights to be adjusted so that the network can learn
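A minimal numpy sketch of that idea on a single linear layer: the prediction error is propagated back into a gradient, and the weights are adjusted with it so the model learns. The data, learning rate, and step count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w

w = np.zeros(3)                       # weights to be learned
lr = 0.1
for _ in range(200):
    y_hat = X @ w                     # forward pass
    error = y_hat - y                 # prediction error
    grad = X.T @ error / len(X)       # error propagated back to the weights
    w -= lr * grad                    # weights adjusted so the model learns
print(np.round(w, 2))                 # close to [ 2. -1.  0.5]
```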
Which of the following functions can be used as an activation function in the output layer if we wish to predict the probabilities of n classes (p1, p2, …, pn) such that the sum of the probabilities over all n classes equals 1?
a) ReLU
b) Sigmoid
c) Softmax
d) Tanh
c) Softmax
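A small numpy sketch: softmax turns arbitrary scores into class probabilities that sum to 1.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())                     # probabilities over the classes, summing to 1
```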
Which of the following would have a constant input in each epoch of training a deep learning model?
a) Weight between input and hidden layer
b) Weight between hidden and output layer
c) Biases of all hidden layer neurons
d) Activation function of output layer
a) Weight between input and hidden layer
Which of the following neural network training challenges can be solved using batch normalization?
a) overfitting
b) underfitting
c) training is too slow
d) none of the mentioned
c) training is too slow
The number of nodes in the input layer is 10 and in the hidden layer is 5. What is the maximum number of connections from the input layer to the hidden layer?
a) 50
b) Less than 50
c) More than 50
d) None of the mentioned
a) 50
Is Deep Learning a specialized subset of machine learning?
a) true
b) false
a) true
_____ are models used to generate data similar to the data on which they are trained, by destroying training data through the successive addition of Gaussian noise and then learning to recover the data by reversing this noising process.
a) Federated learning.
b) Attention learning.
c) CNN.
d) Diffusion models.
d) Diffusion models.
What is the goal of training a diffusion model?
a) Learn the reverse process
b) Learn to understand the image
c) Extract the image features
d) Classify the images
a) Learn the reverse process
One of the benefits of the diffusion model is _____
a) scalability
b) not requiring adversarial training.
c) parallelizability
d) all of the above.
d) all of the above.
In general, a diffusion model consists of _____ main processes.
a) 5
b) 4
c) 3
d) 2
d) 2
A diffusion model is trained by finding the reverse Markov transitions that _____ the likelihood of the training data.
a) Maximize
b) Minimize.
c) Increase.
d) Decrease.
a) Maximize
For the reverse process in the diffusion model, we must choose the _____
a) the Sobel filter
b) Laplacian operator
c) thresholding method
d) the Gaussian distribution parameterization
d) the Gaussian distribution parameterization
The transition distributions in the Markov chain are Gaussian, where the forward process requires a ______ and the reverse process parameters are learned.
a) variance schedule
b) Laplacian operator
c) the Gaussian distribution parameterization
d) none of the mentioned
a) variance schedule
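A numpy sketch of the forward (noising) process under a linear variance schedule, in the DDPM style; the schedule endpoints and the toy “image” are assumptions. The closed form of q(x_t | x_0) lets us sample the partially destroyed data at any step t directly.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)          # linear variance schedule (DDPM-style)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def q_sample(x0, t):
    """Sample x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

x0 = rng.normal(size=(8, 8))                # a toy "image"
x_500 = q_sample(x0, 500)                   # partially destroyed by Gaussian noise
```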
Our diffusion model is parameterized as a Markov chain, meaning that our latent variables depend only on the _____ timestep
a) previous or following
b) previous
c) following
d) none of the mentioned
a) previous or following
A _____ is used to obtain log-likelihoods across pixel values as the last step in the reverse diffusion process.
a) KL divergences
b) simplified training objective
c) U-Net-like.
d) discrete decoder.
d) discrete decoder.
Diffusion models can be applied to _____
a) image denoising
b) super-resolution.
c) image generation.
d) all of the above.
d) all of the above.
What is the main goal of federated learning?
a) to train a single machine learning model on a centralized dataset
b) to train multiple machine learning models on decentralized datasets
c) to train a single machine learning model on decentralized datasets
d) to train multiple machine learning models on a centralized dataset
c) to train a single machine learning model on decentralized datasets
How does federated learning differ from traditional machine learning?
a) federated learning requires less data
b) federated learning requires more computational resources
c) federated learning requires less communication bandwidth
d) federated learning requires more data privacy concerns
d) federated learning requires more data privacy concerns
What is an advantage of federated learning compared to traditional centralized training?
a) it is more accurate
b) it is faster
c) it requires less data
d) it allows for decentralized data to be used
d) it allows for decentralized data to be used
How is data privacy protected in federated learning?
a) data is encrypted before being sent to the centralized server
b) data is never shared with any other parties
c) data remains on the individual devices and is only used for model training
d) data is aggregated and anonymized before being used for model training
c) data remains on the individual devices and is only used for model training
In federated learning, who is responsible for training the model?
a) a centralized server
b) a third-party organization
c) individual clients
d) the data owner
c) individual clients
Key benefits of federated learning include _____
a) it involves more diverse data.
b) it’s secure.
c) it yields real-time predictions.
d) all of the above
d) all of the above
What are the challenges of federated learning?
a) efficient communication across the federated network.
b) managing heterogeneous systems in the same networks.
c) privacy concerns and privacy-preserving methods.
d) all of the above
d) all of the above
How does federated learning work?
a) Transfer of weights and biases to cloud server
b) Transfer of data to cloud server
c) Transfer of model to cloud server
d) Transfer of user info to cloud
a) Transfer of weights and biases to cloud server
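A toy numpy sketch of this idea in the spirit of federated averaging: each client trains locally on its own data and sends only updated weights to the server, which averages them into a new global model. The linear model, learning rate, and synthetic data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])            # assumed "true" model behind the toy data

# Each client holds its own private dataset; raw data never leaves the device.
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 3))
    clients.append((X, X @ true_w))

def local_update(w, X, y, lr=0.1, steps=20):
    """A client trains the received model locally and returns only the weights."""
    w = w.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(X)
        w -= lr * grad
    return w

global_w = np.zeros(3)                          # model kept by the central server
for _ in range(10):                             # federated rounds
    client_weights = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(client_weights, axis=0)  # server averages the received weights
print(np.round(global_w, 2))                    # approaches [ 1. -2.  0.5]
```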
Is federated learning more efficient than standard ML techniques for a large number of devices?
a) True
b) False
c) Depends on use case
d) Cannot say
a) True
Federated learning is ______
a) Supervised
b) Unsupervised
c) Reinforcement learning.
d) None of the above
b) Unsupervised
What is the basic concept of recurrent neural network?
a) use a loop between inputs and outputs in order to achieve the better prediction.
b) use recurrent features from dataset to find the best answers.
c) use previous inputs to find the next output according to the training set.
d) use loops between the most important features to predict next output.
c) use previous inputs to find the next output according to the training set.
Another RNN issue is called ‘vanishing gradients’. What is that?
a) when the values of a gradient are too small and the model joins in a loop because of that.
b) when the values of a gradient are too big and the model stops learning or takes way too long because of that.
c) when the values of a gradient are too small and the model stops learning or takes way too long because of that.
d) when the values of a gradient are too big and the model joins in a loop because of that.
c) when the values of a gradient are too small and the model stops learning or takes way too long because of that.
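A tiny numeric illustration of why this happens: backpropagation through many timesteps multiplies many per-step factors, so factors slightly below 1 shrink the gradient toward zero (and factors above 1 blow it up).

```python
import numpy as np

steps = 50
print(np.prod(np.full(steps, 0.9)))   # ~0.005 -> the gradient vanishes
print(np.prod(np.full(steps, 1.1)))   # ~117   -> the gradient explodes
```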
What is an LSTM?
a) LSTM networks are an extension of recurrent neural networks that basically extend their memory; therefore, they are well suited to learning from important experiences that have very short time lags in between.
b) LSTM networks are an extension of recurrent neural networks that basically extend their memory; therefore, they are not recommended unless you are using a small dataset.
c) LSTM networks are an extension of recurrent neural networks that basically extend their memory; therefore, they are well suited to learning from important experiences that have long time lags in between.
d) LSTM networks are an extension of recurrent neural networks that basically shorten their memory; therefore, they are well suited to learning from important experiences that have very short time lags in between.
c) LSTM networks are an extension of recurrent neural networks that basically extend their memory; therefore, they are well suited to learning from important experiences that have long time lags in between.
The network that involves backward links from output to the input and hidden layers is called _________
a) self-organizing maps
b) perceptron
c) recurrent neural network
d) multi-layered perceptron
c) recurrent neural network
What does RNN stand for?
a) Recurrent neural networks
b) Report neural networks
c) Receives neural networks
d) Recording neural networks
a) Recurrent neural networks
What is the activation function used in forget gate?
a) Sigmoid
b) Tanh
c) ReLU
d) None of the above
a) Sigmoid
How many gates are there in an LSTM?
a) 3
b) 5
c) 4
d) 2
a) 3
_____ is data where the points in the dataset are dependent on other points in the dataset.
a) continuous data
b) discrete data
c) sequential data
d) ordinal data
c) sequential data
_____ helps to identify important elements that need to be added to the cell state.
a) Forget gate
b) Input gate
c) Output gate
d) None of the above
b) Input gate
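A minimal numpy sketch of a single LSTM cell step (toy sizes and random weights are assumptions): the forget, input, and output gates all use the sigmoid, and the input gate decides what gets added to the cell state.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d_in, d_hid = 4, 6                               # toy sizes (assumed)
W = {g: rng.normal(scale=0.1, size=(d_hid, d_in + d_hid)) for g in "fiog"}

def lstm_step(x, h, c):
    z = np.concatenate([x, h])
    f = sigmoid(W["f"] @ z)                      # forget gate (sigmoid)
    i = sigmoid(W["i"] @ z)                      # input gate: what to add to the cell state
    o = sigmoid(W["o"] @ z)                      # output gate
    g = np.tanh(W["g"] @ z)                      # candidate cell contents
    c = f * c + i * g                            # update the cell state
    h = o * np.tanh(c)                           # new hidden state
    return h, c

h, c = np.zeros(d_hid), np.zeros(d_hid)
h, c = lstm_step(rng.normal(size=d_in), h, c)
print(h.shape, c.shape)                          # (6,) (6,)
```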
LSTMs are used in _____
a) speech recognition
b) music composition
c) time series prediction
d) all of the above
d) all of the above
What should be the aim of the training procedure in a Boltzmann machine of feedback networks?
a) to capture inputs
b) to feedback the captured outputs
c) to capture the behavior of system
d) none of the mentioned
c) to capture the behavior of system
What does a Boltzmann machine consist of?
a) fully connected network with both hidden and visible units
b) asynchronous operation
c) stochastic update
d) all of the mentioned
d) all of the mentioned
By using which method does a Boltzmann machine reduce the effect of additional stable states?
a) No such method exists
b) Simulated annealing
c) Hopfield reduction
d) None of the mentioned
b) Simulated annealing
For which other task can a Boltzmann machine be used?
a) pattern mapping
b) feature mapping
c) classification
d) pattern association
d) pattern association
How does the presence of false minima affect the probability of error in recall?
a) Directly
b) Inversely
c) No effect
d) Directly or Inversely
a) Directly
What happens when we use mean-field approximation with Boltzmann learning?
a) It slows down
b) It speeds up
c) Nothing happens
d) It may speed up or slow down
b) It speeds up
In Boltzmann learning, which algorithm can be used to arrive at equilibrium?
a) Hopfield
b) mean field
c) Hebb
d) none of the mentioned
d) none of the mentioned
All the visible units in a restricted Boltzmann machine are not connected to each other.
a) True
b) False
a) True
What are the two layers of a restricted Boltzmann machine called?
a) input and output layers
b) recurrent and convolution layers
c) activation and threshold layers
d) hidden and visible layers
d) hidden and visible layers
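A toy numpy sketch of that bipartite structure (sizes and weights are assumptions): one visible layer and one hidden layer connected only across layers, with no connections inside either layer; a single Gibbs-sampling step is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3                               # toy sizes (assumed)
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))    # only visible <-> hidden links
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

v = rng.integers(0, 2, size=n_visible).astype(float)     # a binary visible vector

# One Gibbs step: sample hidden given visible, then visible given hidden.
p_h = sigmoid(v @ W + b_h)
h = (rng.random(n_hidden) < p_h).astype(float)
p_v = sigmoid(h @ W.T + b_v)
v_new = (rng.random(n_visible) < p_v).astype(float)
print(v, "->", v_new)
```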
A deep belief network is a stack of restricted Boltzmann machines.
a) True
b) False
a) True