deck_15595778 Flashcards

1
Q

What is a perceptron?
What does it do?

A
  • an artificial neuron that can be used for binary classification
  • it receives input signals through weighted connections, sums those weighted inputs to compute its activation level, and “fires” by outputting a 1 if the total exceeds a given threshold. Otherwise, it outputs a 0.
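A minimal sketch of that computation in Python (the names are illustrative, not from any particular library):

    def perceptron_output(xs, ws, threshold):
        # weighted sum of the inputs
        total = sum(w * x for w, x in zip(ws, xs))
        # "fire" only if the total exceeds the threshold
        return 1 if total > threshold else 0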
2
Q

What happens during training of a perceptron?

A

During training, a series of small adjustments is made to the connection weights and threshold using the perceptron’s learning rate.

3
Q

Where do the inputs to the perceptron in a classification task come from?

A

the inputs are the feature values, and the output is the classification label

4
Q

What happens during the training phase if the output of the perceptron is wrong?

A

the threshold and the weights are adjusted according to the learning rate

5
Q

Output was 0, target was 1
what happens to the threshold and weights?

A

lower the threshold
raise the weights
(inputs that are 0 are effectively ignored, since their weight update is x * lr = 0)

6
Q

Output was 1, target was 0
what happens to the threshold and weights?

A

raise the threshold
lower the weights
(inputs that are 0 are effectively ignored, since their weight update is x * lr = 0)

7
Q

What type of classifier is a perceptron?

A

A linear classifier.
Perceptrons create a straight-line decision boundary in the feature space.
They will only succeed (converge) if the data is linearly separable.

8
Q

What does it mean to train a perceptron

A

finding the coefficients of a linear equation
the coefficients are the connection weights

9
Q

What are the differences between the Perceptron learning algorithm and Stochastic Gradient Descent

A
  • both can be used for classification
  • SGD finds an optimal solution based on a loss function that aims to ‘center’ the decision boundary between the classes
  • the Perceptron learning algorithm will find a solution if one exists, but not necessarily the best solution
10
Q

what is the Perceptron Learning Algorithm

A
  • initialize the weights (ws), threshold (t), and learning rate (lr)
  • repeat until done:
        for each training example (xs, target):
            compute the perceptron output: 1 if sum(ws * xs) > t, else 0
            if output < target:
                for each (w, x) in (ws, xs):
                    w = w + x * lr
                t = t - lr
            else if output > target:
                for each (w, x) in (ws, xs):
                    w = w - x * lr
                t = t + lr
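A runnable Python version of the same loop (the AND data, epoch count, and function name are illustrative):

    import numpy as np

    def train_perceptron(X, targets, lr=0.1, epochs=20):
        ws = np.zeros(X.shape[1])   # weights
        t = 0.0                     # threshold
        for _ in range(epochs):                      # "repeat until done"
            for xs, target in zip(X, targets):
                output = 1 if ws @ xs > t else 0
                if output < target:                  # under-fired: 0 instead of 1
                    ws += xs * lr
                    t -= lr
                elif output > target:                # over-fired: 1 instead of 0
                    ws -= xs * lr
                    t += lr
        return ws, t

    # toy usage: logical AND is linearly separable, so this converges
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    ws, t = train_perceptron(X, np.array([0, 0, 0, 1]))
    print([1 if ws @ x > t else 0 for x in X])       # expected: [0, 0, 0, 1]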
11
Q

Initializing the weights and threshold of a perceptron:
why is starting with random weights useful in a multi-layer perceptron?

A

Starting with random weights is useful because, depending on where you start, you might converge on a better or worse solution.

12
Q

When do we stop repeating the perceptron Learning Algorithm

A
  • after a set number of epochs, or
  • when the perceptron reaches a high level of accuracy, or
  • when the perceptron hasn’t improved its accuracy in a while, or
  • some other method or combination of methods.
13
Q

when adjusting the weights of a perceptron Learning Algorithm, why do we normalize the data?

A

normalize the data before training so that the weights are all adjusted at about the same rate

14
Q

what is Randomized Presentation in a perceptron Learning Algorithm

A

you can randomize the presentation order of the examples within each epoch instead of presenting them in the same order every time.
This can stop the network from getting stuck in a suboptimal solution (especially with more complex multi-layer perceptrons)

15
Q

what is Batch Learning in a perceptron Learning Algorithm

A

instead of updating after every example, you compute the output for the entire batch of examples in the training set, and then update the weights only once per epoch based on the outputs that were wrong
- this lets you write very short code with numpy (see the sketch below) and might prevent the network from getting stuck in a suboptimal solution
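One possible numpy sketch of a single batch-update epoch (the function and variable names are illustrative, not from any library):

    import numpy as np

    def batch_epoch(X, targets, ws, t, lr):
        outputs = (X @ ws > t).astype(int)   # outputs for every example at once
        errors = targets - outputs           # +1 where under-fired, -1 where over-fired
        ws = ws + lr * (X.T @ errors)        # one accumulated weight update per epoch
        t = t - lr * errors.sum()            # threshold moves opposite to the weights
        return ws, t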

16
Q

What is the bias of a perceptron

A

the bias is a weighted connection to an extra input that is always set to 1 for every example; its weight is updated along with the other weights.
Mathematically, using a bias is the same as using an adjustable threshold:
the bias is the negation of the threshold, since sum(ws * xs) > t is the same test as sum(ws * xs) + b > 0 with b = -t
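A tiny sketch of that equivalence (illustrative names):

    def output_with_bias(xs, ws, bias):
        # the bias input is always 1, so it contributes bias * 1 to the sum;
        # with bias = -threshold this is the same test as total > threshold
        total = sum(w * x for w, x in zip(ws, xs)) + bias
        return 1 if total > 0 else 0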

17
Q

How can you overcome the limitation that a perceptron can only converge on a solution if the data is linearly separable?

A

use a multi-layer perceptron

18
Q

What is a multi-layer perceptron (MLP)?

A

An MLP is a network of artificial neurons arranged in layers. Each individual neuron is similar to a perceptron.

19
Q

What does each neuron do in a MLP?

A

accumulates input from the weighted connections and produces an output using an activation function

20
Q

Why are MLPs sometimes referred to as feedforward networks

A

because activation always flows in one direction, from the input layer to the output layer

21
Q

What are MLPs with a large number of hidden layers called?

A

Deep networks
Training a deep network is referred to as deep learning

22
Q

What makes a network fully connected (or dense)?

A

each unit is connected to every neuron in the previous and next layers

23
Q

Can multi layer perceptron classifiers learn a decision boundary of any shape?

A

Only if you have the right configuration of hidden layers, the right activation function, luck in choosing the random starting weights, and enough time and computational power to complete the training.
If the configuration isn’t right, training might never converge.

24
Q

What is the limitation of the output neurons of an MLP?

A

they’re limited to linear combinations of the output from the previous layer.

25
Q

What does it mean if an MLP has to classify data that is not linearly separable?

A

the hidden layers must be performing computations on the inputs that yield a new, linearly separable representation of the problem to present to the output layer.
Hidden layers operate a bit like the kernel functions of SVMs

26
Q

Why can’t we use the perceptron rule to train MLPs?

A

the perceptron rule is based on the difference between the actual and target output, but we don’t know in advance what a hidden unit’s output should be, so we can’t use that method for the hidden layers

27
Q

What is backpropagation?

A

a method used in neural networks to update the weights of neurons by propagating the error signal from the output layer back to the input layer through hidden layers

28
Q

What is the most popular way to train the weights of an MLP?

A

stochastic gradient descent with backpropagation.

29
Q

what is the problem of local minima

A

Because networks are initialized with small random values for the connection weights, each run might end up in a better or worse state.

    \
     \      /\
      \    /  \      /
       \__/    \    /
      local     \  /
       min       \/
            global minimum

Your network can get stuck in a suboptimal state (a local minimum) and fail to converge to an optimal state.
- always train several networks with the same architecture and choose the network with the best performance

30
Q

Why does the threshold function not have a derivative?

A

the threshold function’s derivative is 0 everywhere it is defined (the function is flat on both sides of the threshold), and at the jump itself the derivative does not exist, so there is no slope for gradient-based training to follow

31
Q

What are the activation functions of an MLP

A
  • Logistic sigmoid
    outputs values between 0 and 1
  • Hyperbolic tangent (tanh)
    outputs values between -1 and 1
  • Rectified Linear Unit (ReLU)
    f(x) = max(0, x)
    if the activation value is negative, make it 0;
    if the activation value is positive, use it as-is.
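The three functions in numpy (standard definitions):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))   # squashes to (0, 1)

    def tanh(x):
        return np.tanh(x)                 # squashes to (-1, 1)

    def relu(x):
        return np.maximum(0, x)           # negatives become 0, positives pass through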
32
Q

Explain Non-Binary Classification using an MLP

A

To use an MLP for non-binary (multi-class) classification:
- add an output node for every class
- add a softmax layer that rescales the output values so they sum to 1 and can be read as class probabilities; the class with the highest probability becomes the predicted label
e.g., 0.2 for class A, 0.5 for class B, 0.3 for class C
the label is class B
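A minimal numpy softmax (standard definition; the example scores are made up):

    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))   # subtracting the max avoids overflow; result is unchanged
        return e / e.sum()

    print(softmax(np.array([2.0, 2.9, 2.4])))   # three class scores -> probabilities summing to 1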

33
Q

Explain Regression using an MLP

A

Perceptrons can’t be used for regression because they always output a 0 or 1.
But with an MLP you can have a single output unit with the identity activation function: f(a) = a.
The output unit sums up all the outputs from the previous layer;
the output is compared to the target and training happens in much the same way.
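In sklearn this corresponds to MLPRegressor, whose output unit uses the identity activation (a minimal sketch; the toy data and layer size are illustrative):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    X = np.random.rand(100, 2)          # toy features
    y = X[:, 0] + 2 * X[:, 1]           # toy numeric target
    reg = MLPRegressor(hidden_layer_sizes=(50,), max_iter=1000).fit(X, y)
    print(reg.predict(X[:3]))           # real-valued outputs, not just 0/1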

34
Q

What are the hyperparameters you can use when training a multi-layer perceptron

A

Feature Scaling:
- MLPs are sensitive to feature scale
- the recommendation is to scale all the data to between -1 and 1

Hidden Layer Configuration:
Consider:
1. You can look at the output of each hidden neuron as an interim classification computed along the way; the network learns from these interim classifications. (You usually won’t inspect this directly.)
2. If a layer has more neurons than the one before, it can transform the input by adding new meta-features. Increasing the dimensionality this way can make a linear separation possible in the higher-dimensional space.
3. If a layer has fewer neurons than the layer before, it sends less information forward to the next layer. If some features are redundant, or if it’s useful to combine features, this can be a good thing; otherwise it might hurt performance. Reducing dimensions too quickly creates a bottleneck in the network, so increase and decrease layer sizes gradually.
4. More layers means more epochs are needed to train the network.
5. An approach that often works well is to start with a large hidden layer (bigger than the input layer) and then slowly reduce the number of units in each layer until you get to the output.
6. When considering the size of the next layer, think in multiples of the previous layer. For example, with 1000 units in the previous layer, consider increasing by 50% (to 1500) or decreasing by 25% (to 750), but don’t drop suddenly from 1000 to 10 units (a 99% reduction); that might make it hard to learn.

Activation Function:
- ReLU, tanh, sigmoid, linear (experiment to see which works best)

Learning Rate:
- Larger values might lead to faster convergence; smaller values might yield higher accuracy but are more likely to get stuck in a local minimum
- usually start with a value like 0.001 (not too small or too big a step) and experiment from there
- if your model does not converge and the loss jumps around a lot, reduce the learning rate
- an adaptive learning rate, adjusted based on the loss, is often the best choice

Batch size and shuffling:
- With a larger batch size, you accumulate error signal over a larger number of examples before adjusting the weights
- With a smaller batch size, the weights jump around a lot more, and this can help you avoid getting stuck in a local minimum.
- larger batch size can lead to faster learning.
- shuffle data between each epoch - helps avoid getting stuck in a local minimum

Stopping Condition:
- max number of epochs (max_iter in sklearn)
- when the error rate is only changing by a small amount (tol in sklearn)

Regularization:
- Regularization refers to a set of mathematical techniques applied to the backpropagation algorithm to avoid overfitting.
- it tries to prevent the weights from becoming too specific to the training data (alpha in sklearn)

Type of Backpropagation:
- stochastic gradient descent with backpropagation
- the solver parameter in sklearn
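How these map onto sklearn’s MLPClassifier (a sketch; the specific values are illustrative, not recommendations):

    from sklearn.neural_network import MLPClassifier

    clf = MLPClassifier(
        hidden_layer_sizes=(100, 50),   # hidden layer configuration
        activation='relu',              # relu, tanh, logistic, or identity
        learning_rate_init=0.001,       # starting learning rate
        learning_rate='adaptive',       # shrink the rate when progress stalls (sgd solver only)
        batch_size=32,                  # examples per weight update
        shuffle=True,                   # reshuffle data between epochs
        max_iter=200,                   # stopping condition: max epochs
        tol=1e-4,                       # stopping condition: minimum improvement
        alpha=0.0001,                   # regularization strength
        solver='sgd',                   # type of backpropagation-based training
    )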

35
Q

What is Deep Learning

A

Deep learning is machine learning that involves very large datasets and deep neural networks with many hidden layers.
Architectures for deep learning: multi-layer perceptrons (MLPs), recurrent neural networks (RNNs), generative adversarial networks (GANs), convolutional neural networks (CNNs), and transformers.

36
Q

What is transfer learning and why do we use it

A

Training a deep neural network is expensive, so developers often opt to fine-tune a network that has already been partially or fully trained on a standard learning task, hoping that the learning from the original task can be re-purposed, or “transferred”, to the task they’re interested in

37
Q

What technical developments have enabled the boom in deep learning

A

ACCESS TO BIG DATASETS
- Deep learning networks require massive amounts of data to train effectively; we now have access to freely available text data from the web and social media
ACCESS TO FAST HARDWARE
- Graphics Processing Units (GPUs), with their large onboard memory caches and massively parallel architectures, can be repurposed for training neural networks
TECHNICAL DEVELOPMENTS
- solutions to the vanishing gradients problem: new activation functions (like ReLU), new approaches to connection-weight initialization, Batch Normalization (the input to each layer is rescaled prior to processing), etc.

38
Q

Supervised Learning vs Unsupervised Learning

A

Supervised learning: the training data consists of sets of features paired with correct outputs (either class labels or numerical values). The task of the learning algorithm is to find a model that does a good job of predicting the correct outputs for previously unseen examples.
Unsupervised learning: the data contains lists of feature values, but no “correct” output is given and there are no solved examples to give to the learner. You don’t know exactly what you are looking for, but you do know what sort of thing you are looking for.

39
Q

What do all approaches to automatic clustering have in common

A

there is always a distance calculation involved at the heart of all automatic clustering approaches

40
Q

Advantage / Disadvantage of k-Means Clustering

A

Advantage: easy to implement
Disadvantages: slow on large data sets, does not always find the optimal solution on the first run (you must do multiple runs and take the best)

41
Q

How does the k-Means algorithm work?

A
  • based on finding the centroids of a set of clusters
  • chooses k centroids randomly, then assigns every point in the sample data to its closest centroid
  • adjusts the centroids by computing the mean of the feature values in each cluster - those mean values become the new centroids
  • then reassigns every point to the new closest centroid
  • this process repeats until there are no further improvements to be had
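A compact numpy sketch of that loop (illustrative and unoptimized; it also assumes no cluster ever ends up empty, which real implementations must handle):

    import numpy as np

    def kmeans(X, k, iters=100):
        rng = np.random.default_rng()
        centroids = X[rng.choice(len(X), k, replace=False)]  # pick k data points
        for _ in range(iters):
            # assign each point to its closest centroid
            dists = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
            labels = dists.argmin(axis=1)
            # recompute each centroid as the mean of its cluster
            new = np.array([X[labels == i].mean(axis=0) for i in range(k)])
            if np.allclose(new, centroids):   # no further improvement
                break
            centroids = new
        return centroids, labels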
42
Q

How do you measure cluster quality

A
  • inertia, or the Sum of Squared Errors (SSE)
    SSE measures the distance between each point and its assigned centroid, squares the results, and adds them up.
    Squaring emphasizes points that are farther from their centroid.
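Computing it in numpy (array names are illustrative; labels holds each point’s assigned centroid index, as in the k-means sketch above):

    import numpy as np

    def sse(X, centroids, labels):
        diffs = X - centroids[labels]   # vector from each point to its centroid
        return np.sum(diffs ** 2)       # square the distances and add them up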
43
Q

How can you choose centroids

A
  • a random point within the feature space
  • randomly from among the data points
  • analyze the dataset and choose deliberately
44
Q

How does k-Means clustering make predictions

A

compute the distance of the new data point to each centroid - the closest centroid represents the cluster to which the new data point is assigned
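A short numpy sketch of that prediction (illustrative names):

    import numpy as np

    def predict(point, centroids):
        # index of the centroid closest to the new data point
        return int(np.argmin(np.linalg.norm(centroids - point, axis=1)))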

45
Q

How do you find the right k value for k-Means clustering?

A

Graph the SSE for different values of k and look for the elbow in the graph; that is the right value for k.
If there is no clear elbow, you’ll have to think carefully about the particular problem you are trying to solve and what you want to get out of the clustering algorithm.
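A sketch of collecting the SSE curve with sklearn (the toy data is illustrative; plot sses against k and look for the elbow):

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.rand(200, 2)   # substitute your own feature matrix
    sses = [KMeans(n_clusters=k, n_init=10).fit(X).inertia_ for k in range(1, 11)]
    # inertia_ is sklearn's name for the SSE of the fitted clustering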

46
Q

What are the parameters in SKLearn for k-Means clustering?

A

Initializing the Centroids
- random, or the default ‘smart’ initialization (k-means++), which spaces the initial centroids out nicely
Number of Runs
- by default it performs 10 runs (n_init) and remembers the best solution from those runs
Stopping Condition
- tolerance factor (tol): if the improvement in inertia is less than tol, it stops
- max_iter: stops after a set number of iterations
Verbosity
- verbose controls the level of log output the algorithm generates during operation
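The same parameters in code (the values shown are sklearn’s usual defaults, as far as I know):

    from sklearn.cluster import KMeans

    km = KMeans(
        n_clusters=3,          # k
        init='k-means++',      # the 'smart' spaced-out initialization
        n_init=10,             # number of runs; the best one is kept
        max_iter=300,          # stop after this many iterations
        tol=1e-4,              # stop when inertia improves by less than this
        verbose=0,             # log output level
    )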