08_neural networks Flashcards
How do kNN, linear models and tree-based models really learn?
not iteratively
kNN: computes distances and compares the distribution of unseen data points with the distribution of seen data points
linear models: fitted to the seen data based on the task
tree-based models: identify and memorize patterns relevant to the task
With which three components does the human brain work?
Neurons (nerve cells)
Dendrites (connect neurons)
Axons (long-distance connections)
–> neurons are inter-connected forming a dense network
How is information passed through neurons in the human brain?
through electrical signals
connected neurons absorb the incoming signals and process them. some of them will fire, but not all.
–> cascade of signals
What do we need for neural networks to represent the deep cascade of the layers of neurons in a human brain?
input data, which is processed in the network's hidden layers to generate output data
What is a fully connected network?
a neural network where each neuron is connected to all neurons in the previous layer and all neurons in the following layer
How can a fully connected neural network be characterized?
- number of layers (depth)
- number of neurons in each layer
- number of input variables (= number of neurons in the first layer)
- number of output variables (= number of neurons in the final layer)
How does a fully connected neural network work?
- vector-valued input data is provided to the network, one value per neuron in the input layer
- all inputs are seen by each neuron in the next layer
- each neuron processes the incoming information, firing (1) under some conditions and staying silent (0) otherwise
- repeat
- output is generated in the final layer
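As an illustration, a minimal NumPy sketch of one forward pass through a small fully connected network; the layer sizes, random weights and the step activation are illustrative assumptions, not part of the original card:

```python
import numpy as np

def step(z):
    # fire (1) if the weighted input is positive, otherwise stay silent (0)
    return (z > 0).astype(float)

rng = np.random.default_rng(0)

# illustrative sizes: 3 inputs -> 4 hidden neurons -> 2 outputs
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

x = np.array([0.5, -1.0, 2.0])   # one value per neuron in the input layer
h = step(W1 @ x + b1)            # every input is seen by each hidden neuron
y = step(W2 @ h + b2)            # repeat: output generated in the final layer
print(y)
```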
How does a neural network act in general terms?
acts as a function approximator
- any mathematical function can be approximated
Can we implement artificial neural networks to learn specific tasks?
yes, through connectionism: everything is connected with everything
What are two problems we have to solve before we can implement artificial neural networks?
1) how to implement neurons?
2) how to train the network?
How does a general neuron work?
the number of inputs might differ from the number of outputs - so what function does the neuron compute?
takes in a vector of values, processes them and returns a binary signal based on its learned behavior, which is then passed on to all neurons in the following layer
What is part of the function of a perceptron?
input vector x
weight vector w
bias value b
–> the perceptron computes w · x + b; if the resulting value is greater than zero, the neuron fires, otherwise not
the step function is called the activation function: it introduces non-linearity into the output of the perceptron
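A minimal sketch of a single perceptron in Python, assuming NumPy; the AND-gate weights are hand-picked for illustration:

```python
import numpy as np

def perceptron(x, w, b):
    # weighted sum of the inputs plus bias; fire only if it exceeds zero
    return 1 if np.dot(w, x) + b > 0 else 0

# hypothetical weights implementing a logical AND of two binary inputs
w, b = np.array([1.0, 1.0]), -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), w, b))
```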
What can a single perceptron be considered as?
a linear classifier
How do we train a perceptron?
the perceptron learning rule: weights are adjusted by a step size called the LEARNING RATE
by iteratively running this rule over the training data multiple times, the weights can be learned so that the model performs properly
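A short sketch of the perceptron learning rule, assuming NumPy and a toy linearly separable task (logical AND):

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    # classic perceptron learning rule; lr is the learning rate (step size)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):                 # run over the data multiple times
        for xi, yi in zip(X, y):
            pred = 1 if np.dot(w, xi) + b > 0 else 0
            w += lr * (yi - pred) * xi      # adjust weights by lr * error
            b += lr * (yi - pred)
    return w, b

# toy linearly separable data (logical AND), chosen for illustration
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(w, b)  # the learned weights separate the two classes
```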
What is a major limitation of individual perceptrons?
inability to reproduce the logical exclusive-or (XOR) function!
- because single perceptrons are simply linear classifiers
multi-layer perceptrons concatenate layers of perceptrons, which makes them much more powerful
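To illustrate why the extra layer helps, a sketch of a two-layer perceptron that computes XOR; the weights are hand-chosen for illustration, not learned: one hidden unit computes OR, the other AND, and the output fires for OR-but-not-AND, which is exactly XOR:

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

for x in np.array([[0, 0], [0, 1], [1, 0], [1, 1]]):
    h = step(np.array([x[0] + x[1] - 0.5,      # hidden unit 1: OR
                       x[0] + x[1] - 1.5]))    # hidden unit 2: AND
    out = step(np.array([h[0] - h[1] - 0.5]))  # fires for OR and not AND
    print(x, int(out[0]))                      # prints the XOR truth table
```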
What does MLP stand for?
multi-layer perceptron
What are MLPs?
simple feed-forward neural networks (information traverses the graph in only one direction)
- fully-connected
- can learn more complex relations from data than single perceptrons, each layer adds NON-LINEARITIES that increase the model’s capacity
- modern MLPs utilize additional layers and other non-linear activation functions that support the learning process
What is the function behind a neuron?
the neuron fires if w · x + b > 0
What do artificial neurons compute?
dot-product between input vectors and learned weights
and produce an output signal that propagates through all deep layers
What is a perceptron?
simple artificial neuron that produces a binary output
What is a multi-layer perceptron?
an early fully-connected neural network
What does an activation function do?
defines when a neuron “fires”
non-linearity increases the model's capacity
What is a simple step function?
g(x) = 1 if x > 0, else 0
to define whether a neuron fires or not
What are advantages and disadvantages of the step function?
+ simple to implement
+ computationally inexpensive
- only binary (discrete) output
- no gradient
What is the sigmoid function?
σ(x) = exp(x) / (1 + exp(x))
What are advantages and disadvantages of the sigmoid function?
+ continuous non-linear function
+ gradient defined
- asymmetric output value range [0, 1]
- computationally expensive
What is the tanh function?
tanh(x) = sinh(x) / cosh(x)
What are advantages and disadvantages for the tanh function?
+ continuous non-linear function
+ gradient defined
+ symmetric output value range [-1, 1]
- computationally expensive
What is the ReLu function?
rectified linear unit function
ReLU(x) = x if x > 0, else 0 (i.e., max(0, x))
What are advantages of the ReLU function?
+ continuous non-linear function
+ gradient defined, and simple to compute
+ computationally inexpensive
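For comparison, minimal NumPy implementations of the four activation functions discussed above:

```python
import numpy as np

def step(x):
    return np.where(x > 0, 1.0, 0.0)  # binary output, no useful gradient

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # smooth, output in (0, 1)

def tanh(x):
    return np.tanh(x)                 # smooth, symmetric output in (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # cheap to compute, gradient is 0 or 1

x = np.linspace(-3, 3, 7)
for f in (step, sigmoid, tanh, relu):
    print(f.__name__, np.round(f(x), 2))
```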
Why is it important for the activation function to be differentiable?
we need the gradient of the loss with respect to the weights to be computable
therefore, the step function is not a good choice: its gradient is zero everywhere except at the jump, where it is undefined
Why is the ReLU used most often?
Sigmoid, Tanh and ReLU roughly lead to similar results, but the ReLU is computationally the most efficient
What should a good activation function be?
continuously differentiable
non-linear
computationally inexpensive
What enables deep neural networks to learn complex tasks?
the non-linearity of activation functions
What is the Least squares fitting in linear regression?
a convex optimization problem:
there is only one solution to the problem, and it is by definition the best solution
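A short NumPy sketch of this: because the problem is convex, the unique solution can be found in closed form, with no iterative weight updates (the toy data below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # toy design matrix
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)  # noisy linear targets

# convexity means one global optimum; it can be computed directly
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w, 2))  # close to true_w
```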
How do we modify the neural networks weights to reduce the loss?
- random changes (possible but not very goal-oriented)
- backpropagation (we check for every single weight how changing it would affect the loss)
How can we modify each individual weight parameter?
based on computed gradients
w_i = w_i - α · ∂L/∂w_i (the gradient of the loss with respect to weight w_i)
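As a worked example, the update rule applied once to toy weight and gradient values:

```python
import numpy as np

alpha = 0.1                    # learning rate
w = np.array([0.8, -0.3])      # current weights (toy values)
grad = np.array([0.5, -0.2])   # dL/dw from backpropagation
w = w - alpha * grad           # w_i = w_i - alpha * dL/dw_i
print(w)                       # [0.75, -0.28]
```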
What is a learning rate?
α, the step size for the modifications to the weights
What is stochastic gradient descent?
iterative process, depends on the random selection of mini-batches
following the gradients in the weight space to the lowest loss value
–> allows us to find the minimum of the loss in an iterative process
What happens if we use a small learning rate?
it will take a long time to reach the global minimum; we could also get stuck in a local minimum
What happens if we use a large learning rate?
it is possible that we overshoot and miss the global minimum,
and convergence becomes unlikely
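A small illustration of the cards above: gradient descent on the toy loss L(w) = (w - 3)^2, whose gradient is 2·(w - 3) and whose minimum is at w = 3, run with three illustrative learning rates:

```python
def descend(lr, steps=25, w=0.0):
    # repeatedly follow the negative gradient of L(w) = (w - 3)**2
    for _ in range(steps):
        w -= lr * 2 * (w - 3)
    return w

print(descend(lr=0.01))  # small lr: still far from 3 after 25 steps
print(descend(lr=0.3))   # moderate lr: converges close to 3
print(descend(lr=1.1))   # too large: overshoots and diverges
```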
How do neural networks learn?
learn patterns from data to perform specific tasks
early layers extract low-level signals with spatial significance
later layers interpret these signals and provide semantic significance
–> end-to-end learning
What does Stochastic gradient descent (SGD) do?
it uses the gradients computed with backpropagation to update network weight parameters iteratively to reduce the model’s loss
What is key to a meaningful training process in neural networks?
ability to compute the gradient of the loss function
with respect to every single network weight parameter
this is achieved through a process called backpropagation
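A hand-sized sketch of backpropagation through a single sigmoid neuron with a squared-error loss, assuming NumPy; it shows how the chain rule yields the gradient for every weight:

```python
import numpy as np

x = np.array([0.5, -1.0])   # toy input
w = np.array([0.2, 0.4])    # toy weights
b, target = 0.1, 1.0

z = np.dot(w, x) + b        # forward pass
p = 1 / (1 + np.exp(-z))    # sigmoid activation
loss = (p - target) ** 2

# backward pass: chain rule, step by step
dloss_dp = 2 * (p - target)
dp_dz = p * (1 - p)                 # derivative of the sigmoid
grad_w = dloss_dp * dp_dz * x       # gradient w.r.t. every weight
grad_b = dloss_dp * dp_dz           # gradient w.r.t. the bias
print(grad_w, grad_b)
```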
What is the neural network training pipeline?
1) run 1 epoch: sample batches (input data x and target data y) from the training dataset, and for each batch:
- evaluate the model on the batch input data (prediction) in a forward pass
- compute the loss on the prediction and the target y
- compute the weight gradients with backpropagation
- modify the weights based on the gradients and the learning rate
- repeat for all batches
2) repeat for a number of epochs, monitoring training and validation loss + metrics
3) stop before overfitting sets in
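A hedged sketch of this pipeline in PyTorch; the model architecture, data and hyperparameters are toy placeholders, not part of the original card:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# toy model: a small fully connected network with a ReLU non-linearity
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

X = torch.randn(256, 4)  # toy training inputs
y = torch.randn(256, 1)  # toy targets
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

for epoch in range(10):          # step 2: repeat for a number of epochs
    for xb, yb in loader:        # step 1: sample mini-batches
        pred = model(xb)         # forward pass (prediction)
        loss = loss_fn(pred, yb) # loss on prediction and target
        optimizer.zero_grad()
        loss.backward()          # weight gradients via backpropagation
        optimizer.step()         # modify weights (gradients x learning rate)
    # in practice: also track a validation loss here and stop before overfitting
    print(epoch, loss.item())
```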
What do you see in the curves of the training and the validation loss in well-trained neural network models?
if the validation loss decreases more slowly than the training loss but does not start to rise again after some iterations, the model is well-trained