Machine Learning Flashcards
Which of the following are the advantages of transformers over a recurrent sequence model?
a) better at learning long-range dependencies
b) slower to train and run on modern hardware
c) require many fewer parameters to achieve similar results
d) none of the above
a) better at learning long-range dependencies
Which of these parts of the self-attention operation are calculated by passing inputs through an MLP?
a) values
b) keys
c) queries
d) all the above
d) all the above
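In practice, the queries, keys, and values of a Transformer are each produced by a learned linear projection of the same input (the simplest case of the MLP mentioned above). A minimal single-head sketch in NumPy; the weight matrices here are random stand-ins for learned parameters:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention: queries, keys, and values are all
    projections of the same input sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # learned projections
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # scaled dot-product
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)   # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                     # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)      # (4, 8)
```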
What is the field of natural language processing (NLP)?
a) computer science
b) artificial intelligence
c) linguistics
d) all of the mentioned
d) all of the mentioned
What are the main challenges of NLP?
a) handling ambiguity of sentences
b) handling tokenization
c) handling POS tagging
d) All of the mentioned
a) handling ambiguity of sentences
What is machine translation?
a) Converts one human language to another
b) Converts human language to machine language
c) Converts any human language to English
d) Converts machine language to human language
a) Converts one human language to another
Which of the following are areas where NLP can be useful?
a) automatic text summarization
b) automatic question-answering systems
c) information retrieval
d) all the mentioned
d) all the mentioned
Which of the following properties will a good position encoding ideally have?
a) unique for all positions
b) relative distances are independent of absolute sequence position
c) well-defined for arbitrary sequence lengths
d) all the above
d) all the above
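For reference, the sinusoidal encoding from the original Transformer paper has all three of these properties. A minimal NumPy sketch, assuming an even model dimension:

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    """Sinusoidal encoding: unique per position, defined for any length,
    and relative offsets correspond to fixed linear transforms."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles)    # even dimensions use sine
    enc[:, 1::2] = np.cos(angles)    # odd dimensions use cosine
    return enc

print(sinusoidal_encoding(50, 16).shape)  # (50, 16)
```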
Which of the following includes the major tasks of NLP?
a) automatic summarization
b) discourse analysis
c) machine translation
d) all the mentioned
d) all the mentioned
Neural machine translation was based on encoder-decoder _____
a) RNNs
b) LSTMs
c) both a & b
d) neither a nor b
c) both a & b
The encoder LSTM is used to process the _____ sentence.
a) input
b) output
c) function
d) All the above
a) input
What type of neural network is an autoencoder?
a) supervised neural network
b) unsupervised neural network
c) semi-supervised neural network
d) reinforcement neural network
b) unsupervised neural network
What type of data can the autoencoder apply dimensionality reduction on?
a) linear data
b) nonlinear data
c) both a & b
d) none of the above
c) both a & b
A module that compresses data into an encoded representation that is typically several orders of magnitude smaller than the input data.
a) The encoder
b) Bottleneck
c) The decoder
d) None of the above
a) The encoder
A module that contains the compressed knowledge representation and is considered the most important part of the autoencoder network.
a) the encoder
b) bottleneck
c) the decoder
d) None of the above
b) bottleneck
A module that helps the network “decompress” the knowledge representations and reconstructs the data back from its encoded form.
a) input layer
b) bottleneck
c) output layer
d) none of the above
c) output layer
What type of autoencoder works by penalizing the activation of some neurons in hidden layers?
a) Sparse autoencoder
b) Variational autoencoder
c) Deep autoencoder
d) Convolution autoencoders
a) Sparse autoencoder
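A minimal sketch of that penalty, here in PyTorch with an L1 term on the hidden activations; the layer sizes and penalty weight are illustrative assumptions, not values from the cards:

```python
import torch
import torch.nn as nn

# Tiny sparse autoencoder: the bottleneck is as wide as the input,
# so sparsity comes from penalizing hidden activations, not from
# reducing the number of nodes.
enc = nn.Sequential(nn.Linear(64, 64), nn.Sigmoid())
dec = nn.Linear(64, 64)

x = torch.randn(32, 64)
h = enc(x)
recon = dec(h)
sparsity_weight = 1e-3  # illustrative choice
loss = nn.functional.mse_loss(recon, x) + sparsity_weight * h.abs().mean()
loss.backward()
```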
Which of the following is done by a deep autoencoder?
a) image reconstruction
b) image colorization
c) image search
d) image denoising
c) image search
Which of the following is done by a convolution autoencoder?
a) data compression
b) image search
c) information retrieval
d) image colorization
d) image colorization
Which of the following is an autoencoder application?
a) watermark removing
b) dimensionality reduction
c) image generation
d) all the above
d) all the above
Which autoencoder doesn't require reducing the number of bottleneck nodes?
a) sparse autoencoder
b) deep autoencoder
c) variational autoencoder
d) None of the above
a) sparse autoencoder
In NLP, bidirectional context is supported by which of the following embeddings?
a) word2vec
b) BERT
c) GloVe
d) All the above
b) BERT
In which model is a given token's input representation the sum of its token, segment, and position embeddings?
a) ELMO
b) GPT
c) BERT
d) none of the above
c) BERT
BERT-Base contains _____ encoder layers.
a) 12
b) 24
c) 36
d) 48
a) 12
BERT-Large contains _____ encoder layers.
a) 12
b) 24
c) 36
d) 48
b) 24
BERT aims at tackling various NLP tasks such as _____
a) question answering
b) language inference
c) text summarization
d) all of the mentioned
d) all of the mentioned
The BERT model is pre-trained on relatively generic tasks, namely _____
a) masked language modeling (MLM)
b) next sentence prediction
c) a and b
d) none of the mentioned
c) a and b
_______ is the task of hiding a word in a sentence and then having the program predict the hidden (masked) word based on its context.
a) Masked language modeling (MLM)
b) Next sentence prediction
c) Sequence classification
d) Named entity recognition (NER)
a) Masked language modeling (MLM)
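Masked language modeling is easy to try if the Hugging Face transformers library is installed (the snippet downloads bert-base-uncased on first use):

```python
from transformers import pipeline

# Predict the hidden (masked) word from its context.
fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```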
_______ is the task of having the program predict whether two given sentences have a logical, sequential connection or whether their relationship is simply random.
a) Masked language modeling (MLM)
b) Next sentence prediction
c) Sequence classification
d) Named entity recognition (NER)
b) Next sentence prediction
BERT can process text ______
a) left-to-right
b) right-to-left
c) both
d) none of the mentioned
c) both
BERT was created and published in 2018 by ______
a) Amazon
b) Microsoft
c) IBM
d) Google
d) Google
What is the difference between a CNN and an ANN?
a) A CNN has one or more layers of convolution units, each of which receives its input from multiple units.
b) A CNN uses a simpler algorithm than an ANN.
c) They complete each other, so to use an ANN you need to start with a CNN.
d) A CNN is the easiest way to use neural networks.
a) A CNN has one or more layers of convolution units, each of which receives its input from multiple units.
The data is fed into the model and the output of each layer is obtained; this step is called _____.
a) Feed forward
b) Feed backward
c) Input layer
d) Output layer
a) Feed forward
How many common types of pooling layers are there?
a) 5
b) 2
c) 3
d) 4
b) 2
_____ computes the output volume by computing the dot product between all filters and image patches.
a) Input layer
b) Convolution layer
c) Activation function layer
d) Pool layer
b) Convolution layer
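A sketch of that computation as a "valid" cross-correlation in NumPy, with a hypothetical 3x3 vertical-edge kernel standing in for a learned filter:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation: the dot product of the kernel with every
    image patch, which is how a convolution layer builds its output volume."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
edge = np.array([[1.0, 0.0, -1.0]] * 3)   # simple vertical-edge kernel
print(conv2d(img, edge))                  # (3, 3) output volume
```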
What is back propagation?
a) it is another name given to the curvy function in the perceptron
b) it is the transmission of error back through the network to adjust the inputs
c) it is the transmission of error back through the network to allow weights to be adjusted so that the network can learn
d) all of the mentioned
c) it is the transmission of error back through the network to allow weights to be adjusted so that the network can learn
Which of the following functions can be used as an activation function in the output layer if we wish to predict the probabilities of k classes (p1, p2, …, pk) such that the sum of the probabilities over all k classes equals 1?
a) RELU
b) Sigmoid
c) Softmax
d) Tanh
c) Softmax
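A numerically stable softmax sketch in NumPy, showing that the outputs are positive and sum to 1:

```python
import numpy as np

def softmax(z):
    """Stable softmax: subtract the max before exponentiating, then
    normalize so the outputs can be read as class probabilities."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
p = softmax(logits)
print(p, p.sum())   # probabilities summing to 1 (up to float error)
```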
Which of the following would have a constant input in each epoch of training a deep learning model?
a) Weight between input and hidden layer
b) Weight between hidden and output layer
c) Biases of all hidden layer neurons
d) Activation function of output layer
a) Weight between input and hidden layer
Which of the following neural network training challenges can be solved using
batch normalization?
a) overfitting
b) underfitting
c) training is too slow
d) none of the mentioned
c) training is too slow
The number of nodes in the input layer is 10 and in the hidden layer is 5. The maximum number of connections from the input layer to the hidden layer is:
a) 50
b) Less than 50
c) More than 50
d) None of the mentioned
a) 50
Is Deep Learning a specialized subset of machine learning?
a) true
b) false
a) true
_____ are models, used to generate data similar to the data on which they are trained, by destroying training data through the successive addition of gaussian noise, and then learning to recover the data by reversing this noising process.
a) Federated learning.
b) Attention learning.
c) CNN.
d) Diffusion models.
d) Diffusion models.
What is the goal of training a diffusion model?
a) Learn the reverse process
b) Learn to understand the image
c) Extract the image features
d) Classify the images
a) Learn the reverse process
One of the benefits of diffusion models is _____
a) scalability
b) not requiring adversarial training.
c) parallelizability
d) all of the above.
d) all of the above.
In general, a diffusion model consists of _____ main processes.
a) 5
b) 4
c) 3
d) 2
d) 2
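The two processes are the fixed forward (noising) process and the learned reverse (denoising) process. A sketch of the closed-form forward sample x_t ~ q(x_t | x_0) under the standard DDPM parameterization, with an illustrative linear variance schedule:

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # illustrative linear schedule
x0 = rng.normal(size=(8,))
print(forward_diffuse(x0, t=999, betas=betas, rng=rng))  # ~pure noise
```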
A diffusion model is trained by finding the reverse Markov transitions that _____ the likelihood of the training data.
a) Maximize
b) Minimize.
c) Increase.
d) Decrease.
a) Maximize
For the reverse process in the diffusion model, we must choose _____
a) the Sobel filter
b) the Laplacian operator
c) the thresholding method
d) the Gaussian distribution parameterization
d) the Gaussian distribution parameterization
The transition distributions in the Markov chain are Gaussian, where the forward process requires a ______, and the reverse process parameters are learned.
a) variance schedule
b) Laplacian operator
c) the Gaussian distribution parameterization
d) none of the mentioned
a) variance schedule
A diffusion model is parameterized as a Markov chain, meaning that the latent variables depend only on the _____ timestep.
a) previous or following
b) previous
c) following
d) none of the mentioned
a) previous or following
A _____ is used to obtain log-likelihoods across pixel values as the last step in the reverse diffusion process.
a) KL divergences
b) simplified training objective
c) U-Net-like.
d) discrete decoder.
d) discrete decoder.
Diffusion models can be applied to _____
a) image denoising
b) super-resolution.
c) image generation.
d) all of the above.
d) all of the above.
What is the main goal of federated learning?
a) to train a single machine learning model on a centralized dataset
b) to train multiple machine learning models on decentralized datasets
c) to train a single machine learning model on decentralized datasets
d) to train multiple machine learning models on a centralized dataset
c) to train a single machine learning model on decentralized datasets
How does federated learning differ from traditional machine learning?
a) federated learning requires less data
b) federated learning requires more computational resources
c) federated learning requires less communication bandwidth
d) federated learning raises more data privacy concerns
d) federated learning raises more data privacy concerns
What is an advantage of federated learning compared to traditional centralized training?
a) it is more accurate
b) it is faster
c) it requires less data
d) it allows for decentralized data to be used
d) it allows for decentralized data to be used
How is data privacy protected in federated learning?
a) data is encrypted before being sent to the centralized server
b) data is never shared with any other parties
c) data remains on the individual devices and is only used for model training
d) data is aggregated and anonymized before being used for model training
c) data remains on the individual devices and is only used for model training
In federated learning, who is responsible for training the model?
a) a centralized server
b) a third-party organization
c) individual clients
d) the data owner
c) individual clients
Key benefits of federated learning include _____
a) it involves more diverse data.
b) it’s secure.
c) it yields real-time predictions.
d) all of the above
d) all of the above
What are the challenges of federated learning?
a) efficient communication across the federated network.
b) managing heterogeneous systems in the same networks.
c) privacy concerns and privacy-preserving methods.
d) all of the above
d) all of the above
How does federated learning work?
a) Transfer of weights and biases to cloud server
b) Transfer of data to cloud server
c) Transfer of model to cloud server
d) Transfer of user info to cloud
a) Transfer of weights and biases to cloud server
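A minimal sketch of the server-side step (FedAvg-style weighted averaging of client weights); the shapes, client count, and dataset sizes are illustrative assumptions:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Server-side FedAvg: average each weight tensor across clients,
    weighted by local dataset size. Raw data never leaves the clients;
    only weights and biases are transferred."""
    total = sum(client_sizes)
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

rng = np.random.default_rng(0)
# Four clients, each holding one weight matrix and one bias vector.
clients = [[rng.normal(size=(3, 3)), rng.normal(size=3)] for _ in range(4)]
global_model = fed_avg(clients, client_sizes=[100, 50, 200, 150])
print([p.shape for p in global_model])  # [(3, 3), (3,)]
```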
Is federated learning more efficient than standard ML techniques for a large number of devices?
a) True
b) False
c) Depends on use case
d) Cannot say
a) True
Federated learning is ______
a) Supervised
b) Unsupervised
c) Reinforcement learning.
d) None of the above
b) Unsupervised
What is the basic concept of recurrent neural network?
a) use a loop between inputs and outputs in order to achieve the better prediction.
b) use recurrent features from dataset to find the best answers.
c) use previous inputs to find the next output according to the training set.
d) use loops between the most important features to predict next output.
c) use previous inputs to find the next output according to the training set.
Another RNN issue is called 'vanishing gradients'. What is that?
a) when the values of a gradient are too small and the model joins in a loop because of that.
b) when the values of a gradient are too big and the model stops learning or takes way too long because of that.
c) when the values of a gradient are too small and the model stops learning or takes way too long because of that.
d) when the values of a gradient are too big and the model joins in a loop because of that.
c) when the values of a gradient are too small and the model stops learning or takes way too long because of that.
What is an LSTM?
a) LSTM networks are an extension for recurrent neural networks, which basically extends their memory. therefore, it is well suited to learn from important experiences that have very low time lags in between
b) LSTM networks are an extension for recurrent neural networks, which basically extends their memory. therefore, it is not recommended to use it, unless you are using a small dataset.
c) LSTM networks are an extension for recurrent neural networks, which basically extends their memory. therefore, it is well suited to learn from important experiences that have long-time lags in between
d) LSTM networks are an extension for recurrent neural networks, which basically shorten their memory. therefore, it is well suited to learn from important experiences that have very low time lags in between
c) LSTM networks are an extension for recurrent neural networks, which basically extends their memory. therefore, it is well suited to learn from important experiences that have long-time lags in between
The network that involves backward links from output to the input and hidden layers is called _________
a) self-organizing maps
b) perceptron
c) recurrent neural network
d) multi layered perceptron
c) recurrent neural network
RNNs Stands for?
a) Recurrent neural networks
b) Report neural networks
c) Receives neural networks
d) Recording neural networks
a) Recurrent neural networks
What is the activation function used in the forget gate?
a) Sigmoid
b) Tanh
c) RELU
d) None of the above
a) Sigmoid
How many gates are there in an LSTM?
a) 3
b) 5
c) 4
d) 2
a) 3
______: when the points in the dataset are dependent on the other points in the dataset.
a) continuous data
b) discrete data
c) sequential data
d) ordinal data
c) sequential data
_______ helps to identify important elements that need to be added to the cell state.
a) Forget gate
b) Input gate
c) Output gate
d) None of the above
b) Input gate
LSTMs are used in ______
a) speech recognition
b) music composition
c) time series prediction
d) all of the above
d) all of the above
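For reference, one step of an LSTM cell with its three sigmoid gates (forget, input, output) and tanh candidate, sketched in NumPy with random stand-in weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step. W maps [h; x] to the three gates plus the candidate;
    forget/input/output gates use a sigmoid, the candidate uses tanh."""
    z = W @ np.concatenate([h, x]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o, g = sigmoid(f), sigmoid(i), sigmoid(o), np.tanh(g)
    c = f * c + i * g      # forget old state, add important new elements
    h = o * np.tanh(c)     # expose a filtered view of the cell state
    return h, c

d_in, d_h = 4, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * d_h, d_h + d_in))
b = np.zeros(4 * d_h)
h = c = np.zeros(d_h)
x = rng.normal(size=d_in)
h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)    # (8,) (8,)
```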
What should be the aim of the training procedure in Boltzmann machines of feedback networks?
a) to capture inputs
b) to feedback the captured outputs
c) to capture the behavior of system
d) none of the mentioned
d) none of the mentioned
What does a Boltzmann machine consist of?
a) fully connected network with both hidden and visible units
b) asynchronous operation
c) stochastic update
d) all of the mentioned
d) all of the mentioned
By using which method does a Boltzmann machine reduce the effect of additional stable states?
a) No such method exists
b) Simulated annealing
c) Hopfield reduction
d) None of the mentioned
b) Simulated annealing
For which other task can a Boltzmann machine be used?
a) pattern mapping
b) feature mapping
c) classification
d) pattern association
d) pattern association
The presence of false minima will have what effect on the probability of error in recall?
a) Directly
b) Inversely
c) No effect
d) Directly or Inversely
a) Directly
What happens when we use mean field approximation with Boltzmann learning?
a) It slows down
b) It gets speeded up
c) Nothing happens
d) it may speed up or slow down
b) It gets speeded up
In Boltzmann learning, which algorithm can be used to arrive at equilibrium?
a) Hopfield
b) mean field
c) Hebb
d) none of the mentioned
d) none of the mentioned
The visible units in a restricted Boltzmann machine are not connected to each other.
a) True
b) False
a) True
What are the two layers of a restricted Boltzmann machine called?
a) input and output layers
b) recurrent and convolution layers
c) activation and threshold layers
d) hidden and visible layers
d) hidden and visible layers
A deep belief network is a stack of restricted Boltzmann machines.
a) True
b) False
a) True
The main and most important feature of an RNN is _________.
a) visible state
b) hidden state
c) present state
d) None of these
b) hidden state
An RNN remembers each and every piece of information through ________.
a) Work
b) Time
c) Hours
d) Memory
b) Time
To create a numerical representation of our text-based dataset, we generate two lookup tables. What are they?
a) one that maps characters to numbers
b) one that maps numbers back to characters
c) one that identifies unique characters present in the text
d) both a & b
d) both a & b
_______ occurs when the gradients become very small and tend towards zero.
a) Exploding gradients
b) Vanishing gradients
c) Long short-term memory networks
d) Gated recurrent unit networks.
b) Vanishing gradients
On what parameters can the change in the weight vector depend?
a) learning parameters
b) input vector
c) learning signal
d) all of the mentioned
d) all of the mentioned
________ occurs when the gradients become too large due to back-propagation.
a) Exploding gradients
b) Vanishing gradients
c) Long short-term memory networks
d) Gated recurrent unit networks
a) Exploding gradients
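A common remedy for exploding gradients is clipping by global norm, sketched here in NumPy:

```python
import numpy as np

def clip_by_norm(grads, max_norm=1.0):
    """Rescale gradients whose global norm exceeds max_norm, a common
    remedy for exploding gradients in RNN training."""
    norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grads]

grads = [np.array([30.0, 40.0])]          # norm 50, far too large
print(clip_by_norm(grads, max_norm=5.0))  # [array([3., 4.])] -> norm 5
```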
If a competitive network can perform feature mapping, then what can that network be called?
a) self-excitatory
b) self-inhibitory
c) self-organization
d) none of the mentioned
c) self-organization
Why do we need biological neural networks?
a) to solve tasks like machine vision & natural language processing
b) to apply heuristic search methods to find solutions of problem
c) to make smart human interactive & user-friendly system
d) all of the mentioned
d) all of the mentioned
What is the auto-association task in neural networks?
a) find relation between 2 consecutive inputs
b) related to storage & recall task
c) predicting the future inputs
d) None of the mentioned
b) related to storage & recall task
What is unsupervised learning?
a) features of group explicitly stated
b) number of groups may be known
c) neither features nor number of groups is known
d) none of the mentioned
c) neither features nor number of groups is known
XLNet is an ________ language model which outputs the joint probability of a sequence of tokens, based on the Transformer architecture with recurrence.
a) Auto-regressive
b) Auto-encoding
c) Objective
d) Bidirectional
a) Auto-regressive
XLNet is "generalized" because it captures bi-directional context by means of a mechanism called ____
a) PLM
b) BERT
c) TRANSFORMER-XL
d) MLM
a) PLM
______ keeps track of the position of each token in a sequence.
a) pretrain-finetune discrepancy
b) transformer-xl
c) positional encoding
d) segment recurrence
c) positional encoding
______ caches the hidden state of the first segment in memory at each layer and updates attention accordingly; it allows reuse of memory for each segment.
a) pretrain-finetune discrepancy
b) transformer-xl
c) positional encoding
d) segment recurrence
d) segment recurrence
The attention weights determined by a simple feedforward neural network are computed using ____
a) query
b) keys
c) values
d) all of the above
d) all of the above
_____: traditional methods predict the current token given the previous 'n' tokens, or predict the current token given all tokens after it.
a) Bidirectional
b) Masked language modeling
c) XLNet
d) BERT
a) Bidirectional
______ is a neural network architecture that can model bidirectional contexts in text data using the Transformer.
a) BERT
b) XLNet
c) MLM
d) PLM
a) BERT
A disadvantage of BERT is that it corrupts the input with _______ and suffers from a pretrain-finetune discrepancy.
a) Mask
b) PLM
c) MLM
d) All of above
a) Mask
XLNet is the latest and greatest model to emerge from the booming field of natural language processing (NLP).
a) True
b) False
a) True
XLNet is "generalized".
a) True
b) False
a) True
The attention mechanism has changed the way we work with deep learning algorithms.
a) true
b) false
a) true
An advantage of transformers over recurrent sequence models is that they are slower to train and run on modern hardware.
a) true
b) false
b) false
Fields like NLP and Computer Vision have been revolutionized by the attention mechanism
a) true
b) false
a) true
Attention is an interface connecting the encoder and decoder that provides the decoder with information.
a) true
b) false
a) true
The encoder LSTM or RNN units produce the words in a sentence one after another.
a) true
b) false
b) false
The Encoder reads the input sentence and tries to make sense of it
a) true
b) false
a) true
The LSTM is supposed to capture the Long-Range dependency better than the RNN
a) true
b) false
a) true
RNNs can't remember longer sentences and sequences.
a) true
b) false
a) true
If the Encoder makes a bad summary, the translation will be also bad
a) true
b) false
a) true
The decoder is used to process the entire input sentence and decode it into a context vector.
a) true
b) false
b) false
Autoencoders belong to supervised neural networks.
a) true
b) false
b) false
The bottleneck is the most important part of the network.
a) true
b) false
a) true
Convolutional autoencoders can do image reconstruction.
a) true
b) false
a) true
A deep autoencoder is composed of two symmetrical deep-belief networks.
a) true
b) false
a) true
Deep autoencoders can't do image search.
a) true
b) false
b) false
Sparse autoencoders offer us an alternative method for introducing an information bottleneck without requiring a reduction in the number of nodes.
a) true
b) false
a) true
Sparse autoencoders work by penalizing the activation of neurons in the input layer.
a) true
b) false
b) false
Autoencoders can denoise images.
a) true
b) false
a) true
Autoencoders can’t be used to reduce dimensionality
a) true
b) false
b) false
The encoder is the module that helps the network "decompress" the knowledge representations and reconstructs the data back from its encoded form.
a) true
b) false
b) false
BERT (Bidirectional Encoder Representations from Transformers) is a recent paper published by researchers at Amazon AI Language.
a) true
b) false
b) false
BERT doesn't read the text input sequentially.
a) true
b) false
a) true
BERT allows transfer learning on existing pretrained models and hence can be custom-trained for a specific subject.
a) true
b) false
a) true
In BERT, the relationship between all words in a sentence is modeled irrespective of their position.
a) true
b) false
a) true
BERT uses a unidirectional language model for producing word embeddings.
a) true
b) false
b) false
BERT is not an open-source machine learning framework for NLP.
a) true
b) false
b) false
BERT does not understand human language as it is spoken naturally.
a) true
b) false
b) false
BERT is expected to have a large impact on voice search as well as text-based search.
a) true
b) false
b) false
The same word can have multiple word embeddings with BERT.
a) True
b) False
a) True
BERT is a deep bidirectional, supervised language representation
a) true
b) false
b) false
Pooling is an up-sampling operation that reduces the dimensionality of the feature map.
a) true
b) false
b) false
The ReLU operation is applied to each pixel and replaces all the negative pixel values in the feature map with zero.
a) true
b) false
a) true
Pooling or spatial pooling layers are also called sub-sampling layers.
a) true
b) false
a) true
Pooling reduces the dimensionality of each feature map by retaining the most important information.
a) true
b) false
a) true
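A NumPy sketch of the two operations from the cards above: ReLU zeroes the negative values in a feature map, and 2x2 max pooling then halves each spatial dimension while retaining the strongest activations:

```python
import numpy as np

def relu(fmap):
    """Replace negative values in the feature map with zero."""
    return np.maximum(fmap, 0)

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2: keep the strongest activation
    in each window, halving each spatial dimension."""
    h, w = fmap.shape
    trimmed = fmap[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1., -2., 3., 0.],
                 [-1., 5., -3., 2.],
                 [0., 1., 2., -4.],
                 [6., -1., 0., 1.]])
print(max_pool2x2(relu(fmap)))   # [[5. 3.] [6. 2.]]
```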
The aim of the fully connected layer is to use the low-level features of the input image produced by the convolutional and pooling layers.
a) true
b) false
b) false
The hyperparameters for a pooling layer are filter size, stride, and max or average pooling.
a) true
b) false
a) true
When we apply a filter of 1×1, then there is no reduction in the size of the image and hence there is no loss of information.
a) true
b) false
a) true
Flattening means that every neuron in the previous layer is connected to each neuron in the next layer.
a) true
b) false
b) false
ReLU introduces linearity to the network, and the generated output is a rectified feature map.
a) true
b) false
b) false
A convolutional layer receives a set of input feature maps (IFM) and generates a set of output feature maps (OFM).
a) true
b) false
a) true
Diffusion models work by destroying training data through the successive addition of Laplacian noise, and then learning to recover the data by reversing this noising process.
a) true
b) false
b) false
A discrete decoder is used to obtain log-likelihoods across pixel values as the last step in the reverse diffusion process.
a) true
b) false
a) true
A diffusion model is a latent variable model which maps to the latent space Sobel using a fixed chain.
a) true
b) false
b) false
The goal of training a Diffusion model is to learn the reverse process
a) true
b) false
a) true
The transition distributions in the Markov chain are Gaussian, which depends only on the forward process.
a) true
b) false
b) false
A diffusion model is parameterized as a Markov chain, meaning that the latent variables x1, …, xT depend only on the previous (or following) timestep.
a) true
b) false
a) true
For the reverse process in the diffusion model, we must choose a variance schedule.
a) true
b) false
b) false
The transition distributions in the Markov chain are Gaussian, where the forward process requires a variance schedule, and the reverse process parameters are learned.
a) true
b) false
a) true
Cascade diffusion models (like Stable Diffusion) apply the diffusion process on a smaller latent space for computational efficiency, using a variational autoencoder for the up- and down-sampling.
a) true
b) false
b) false
Diffusion models can be applied to image denoising, inpainting, super-resolution, and image generation.
a) true
b) false
a) true
Federated Learning is not used to improve the privacy and security of machine learning models.
a) true
b) false
b) false
Federated Learning requires the use of a centralized server.
a) true
b) false
b) false
Federated Learning can’t be used to train models on data that is distributed across multiple devices, such as Smartphones or IoT devices.
a) true
b) false
b) false
Federated Learning requires the use of a centralized database.
a) true
b) false
b) false
Federated Learning can’t be used to improve the privacy of machine learning models by keeping sensitive data on individual devices.
a) true
b) false
b) false
Federated Learning is a Type of machine learning that allows multiple parties to train a model without sharing their data.
a) true
b) false
a) true
Federated Learning requires participating devices to have high computational power.
a) true
b) false
b) false
Federated learning enables participants to train local models cooperatively on local data without disclosing sensitive data to a central cloud server.
a) true
b) false
a) true
Federated Learning can’t be used to train deep learning models.
a) true
b) false
b) false
Federated Learning can be used to train models on data that is distributed across multiple devices in real-time.
a) true
b) false
a) true
In Sequential Data, the points in the dataset are dependent on the other points in the dataset.
a) true
b) false
a) true
A Timeseries is a common example of Sequential Data, with each point
reflecting an observation at a certain point in time.
a) true
b) false
a) true
The crucial element to remember about sequence models is that the data we're working with are independently and identically distributed (i.i.d.) samples.
a) true
b) false
b) false
Sequence models are the machine learning models that input or output sequences of data.
a) true
b) false
a) true
Structured data includes text streams, audio clips, video clips, and time-series data.
a) true
b) false
b) false
The conventional feedforward artificial neural networks can deal with sequential data and can be trained to hold knowledge about the past.
a) true
b) false
b) false
Traditional RNNs are excellent at capturing long-range dependencies.
a) true
b) false
b) false
LSTMs are explicitly designed to avoid the long-term dependency problem.
a) true
b) false
a) true
The input gate controls what information should be forgotten.
a) true
b) false
b) false
The input gate helps to identify important elements that need to be added to the cell state.
a) true
b) false
a) true
RBMs are a supervised learning technique
a) true
b) false
b) false
An RBM isn't restricted to having only the connections between the visible and the hidden units.
a) true
b) false
b) false
An RBM performs discriminative learning similar to what happens in a classification problem.
a) true
b) false
b) false
If the number of visible nodes = nV and the number of hidden nodes = nH, then the number of connections in an RBM = nV * nH.
a) true
b) false
a) true
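A quick check of that count with illustrative sizes: the only connections in an RBM run between the visible and hidden layers, so the weights form a single nV x nH matrix:

```python
import numpy as np

n_visible, n_hidden = 6, 3
W = np.zeros((n_visible, n_hidden))  # one weight per visible-hidden pair
print(W.size)                        # 18 connections = nV * nH
```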
Boltzmann machines are non-deterministic generative deep learning models with 3 types of nodes: visible, hidden and output nodes
a) true
b) false
b) false
Boltzmann machines fall into the class of unsupervised learning.
a) true
b) false
a) true
Sparse autoencoders introduce an information bottleneck by reducing the number of nodes in hidden layers.
a) true
b) false
b) false
The idea is to encourage the network to learn an encoding and decoding which relies only on activating a small number of neurons.
a) true
b) false
a) true
To implement an undercomplete autoencoder, constrain the number of nodes present in the hidden layer(s) of the neural network.
a) true
b) false
a) true
Autoencoders are not capable of learning nonlinear manifolds (a continuous, non-intersecting surface).
a) true
b) false
b) false
A Neural Network with multiple hidden layers and Sigmoid nodes can form non-linear decision boundaries.
a) true
b) false
a) true
Neural networks compute non-convex functions of their parameters.
a) true
b) false
a) true
For Logistic Regression, with parameters optimized using a Stochastic Gradient method, setting parameters to 0 is an acceptable initialization.
a) true
b) false
a) true
For arbitrary Neural Networks, with weights optimized using a Stochastic Gradient method, setting weights to 0 is an acceptable initialization.
a) true
b) false
b) false
Given a design matrix X ∈ R^(n×d) where d ≪ n, if we project our data onto a k-dimensional subspace using PCA where k equals the rank of X, we recreate a perfect representation of our data with no loss.
a) true
b) false
a) true
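A NumPy sketch of that claim: build a rank-3 design matrix, project onto k = rank(X) principal components, and verify the reconstruction is exact up to floating-point error. The sizes are illustrative:

```python
import numpy as np

# A rank-3 design matrix X (n = 100 samples, d = 10 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 10))

Xc = X - X.mean(axis=0)                  # center before PCA
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = np.linalg.matrix_rank(Xc)            # k = rank of the centered data
X_hat = Xc @ Vt[:k].T @ Vt[:k] + X.mean(axis=0)
print(k, np.allclose(X, X_hat))          # 3 True: lossless reconstruction
```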
Hierarchical clustering methods require a predefined number of clusters, much like k-means.
a) true
b) false
b) false
Given a predefined number of clusters k, globally minimizing the k-means objective function is NP-hard.
a) true
b) false
a) true
A random forest is an ensemble learning method that attempts to lower the bias error of decision trees.
a) true
b) false
b) false
Bagging algorithms attach weights w1…wn to a set of n weak learners; they re-weight the learners and convert them into strong ones. Boosting algorithms draw n sample distributions (usually with replacement) from an original dataset for learners to train on.
a) true
b) false
b) false
Using cross-validation to select hyperparameters will guarantee that our model does not overfit.
a) true
b) false
b) false
Bidirectionality is achieved by a technique called "masked language modeling".
a) true
b) false
a) true
BERT overcomes this shortcoming, in that it considers previous and next tokens to predict the current token.
a) true
b) false
a) true
XLNet is not the latest and greatest model to emerge from the booming field of natural language processing (NLP).
a) true
b) false
b) false
XLNet is not “generalized” because it captures Bidirectional context by means of a mechanism called “Permutation Language Modeling”.
a) true
b) false
b) false
XLNet is not a generalized autoregressive model where the next token is dependent on all previous tokens.
a) true
b) false
b) false
XLNet is the idea of capturing bidirectional context by training an autoregressive model on all possible permutations of the words in a sentence.
a) true
b) false
b) false
XLNet integrates the idea of autoregressive models and bidirectional context modeling, while overcoming the disadvantages of BERT.
a) true
b) false
a) true
Autoregressive (AR) Language Modeling and Autoencoding (AE) have been the two most successful pretraining objectives.
a) true
b) false
a) true
There are proposed methods used in XLNet, such as its training objective: permutation language modeling.
a) true
b) false
a) true
For both BERT and XLNet, partial prediction plays a role in reducing optimization difficulty by only predicting tokens with sufficient context.
a) true
b) false
a) true