ML-Final Flashcards

1
Q

What is a threshold logic unit (TLU)?

A

A simple model of a neuron.

Each input value is multiplied with the corresponding weight value, and these weighted values are then summed.

If the weighted sum of the inputs is larger than a certain threshold value, the output is set to one; otherwise it is set to zero.
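
A minimal sketch of the unit described above. The function name, the weights, and the threshold are illustrative assumptions, chosen here so that the unit behaves like a logical AND:

```python
def threshold_logic_unit(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0


# Hypothetical parameters: with these values only the input (1, 1) fires.
and_weights = [1.0, 1.0]
and_threshold = 1.5
outputs = [
    threshold_logic_unit([a, b], and_weights, and_threshold)
    for a in (0, 1)
    for b in (0, 1)
]
```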

2
Q

What is a weight parameter?

A

A value representing the ‘strength’ of a connection.

3
Q

What is a perceptron?

A

A node whose output is calculated by applying an activation function (also called a gain function, transfer function, or output function) to the weighted sum of its inputs.

4
Q

Give examples of activation functions (also called gain, transfer, or output functions).

A

Sigmoid (logistic function), tanh

5
Q

Why do we add a bias term to a perceptron?

A

A bias lets the perceptron shift its decision boundary away from the origin, so that its predictions can fit the data better.

6
Q

What are the similarities between an SVM and a perceptron?

A

Both learn a linear decision boundary; a linear SVM can be seen as a special case of a perceptron.

7
Q

What is the difference between Deep Learning and SVM?

A

An SVM solves the optimization problem using specific, pre-chosen transformations of the feature space.

Deep learning aims to learn the appropriate transformations from the data.

8
Q

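What is the delta term (delta rule)?

A

δ = (y^(i) − y) y (1 − y)

Here y^(i) is the target for training example i and y is the output of the node; the factor y (1 − y) is the derivative of a sigmoid output.

The delta rule is a gradient descent learning rule for updating the weights of the inputs.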
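
A small sketch of one delta-rule update for a single sigmoid node, assuming a squared-error loss. The learning rate, the starting weights, and the helper names are hypothetical and not part of the card:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def delta_rule_step(weights, inputs, target, learning_rate=0.5):
    """One gradient-descent step; returns the new weights and the output before the update."""
    y = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    delta = (target - y) * y * (1.0 - y)  # the delta term from the card
    new_weights = [w + learning_rate * delta * x for w, x in zip(weights, inputs)]
    return new_weights, y


weights = [0.0, 0.0]
inputs = [1.0, 1.0]  # the second input is fixed at 1 and acts as a bias input
outputs_seen = []
for _ in range(200):
    weights, y = delta_rule_step(weights, inputs, target=1.0)
    outputs_seen.append(y)
```

With these assumed values the output starts at 0.5 and moves toward the target of 1 as the updates are repeated.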

9
Q

Why is a multilayer feedforward network a universal function approximator?

A

For a given function f, there is guaranteed to be a neural network such that, for every possible input x, the network outputs (an approximation of) the value f(x).

Given enough hidden nodes, any such function can be approximated with arbitrary precision by these networks.

10
Q

What is error backpropagation (backpropagation)?

A

The calculation of the gradient proceeds backwards through the network: the gradient of the final layer of weights is calculated first and the gradient of the first layer of weights is calculated last.

11
Q

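Batch, mini-batch and online learning

A

Online (one example per update): the noisy updates help to avoid local minima.
Mini-batch (a small subset per update): suited to large datasets.
Batch (the whole training set per update): needs a lot of memory.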

12
Q

“No free lunch” theorem

A

There is no one model that works best for every problem.

The assumptions of a great model for one problem may not hold for another problem.

13
Q

What is cross entropy (the negative log probability)? What is it used for?

A

The negative log probability that the current model assigns to the labels, weighted by the true probability of each label.

H(p, q) = − sum_y [ p(y) log q(y) ]

p: the true nature of the data (the given labels)
q: the model; the neural network represents the probability q(y|x; w)

It is used as the loss function from which the learning rule is derived.

14
Q

What is the KL-divergence equivalent to? What is it related to? What is it used for in neural networks?

A

It is related to the cross entropy:
H(p, q) = H(p) + KL(p||q)

H(p) does not depend on the model q, so minimizing the cross entropy is equivalent to minimizing the KL-divergence.

Both are closely related to the maximum (log) likelihood principle.

It is used as a loss from which the learning rule is derived.
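
The identity H(p, q) = H(p) + KL(p||q) can be checked numerically. This is only an illustrative sketch: the two distributions below are made-up values, with p standing for the true label distribution and q for a model's prediction:

```python
import math


def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.7, 0.2, 0.1]  # hypothetical true distribution
q = [0.5, 0.3, 0.2]  # hypothetical model distribution

# The gap is zero up to floating-point rounding.
gap = cross_entropy(p, q) - (entropy(p) + kl_divergence(p, q))
```

Because entropy(p) is fixed by the data, lowering cross_entropy(p, q) by changing q lowers kl_divergence(p, q) by the same amount.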

15
Q

What is the softmax function? Why and where is it used in a neural network?

A

The softmax function is a generalization of the logistic function that “squeezes” the outputs into the range (0, 1) so that they sum to 1.

It is used to highlight the largest values and suppress values that are significantly below the maximum value.

It is used in the final layer of a neural network.
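
A minimal softmax sketch in plain Python. Subtracting the maximum before exponentiating is a common numerical-stability trick and an addition not mentioned in the card; the input scores are made up:

```python
import math


def softmax(scores):
    shift = max(scores)  # does not change the result, avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])  # hypothetical final-layer scores
```

Each entry of probs lies in (0, 1), the entries sum to 1, and the largest score receives the largest share.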

16
Q

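How are neural networks related to probabilistic regression?

A

Through the loss function: the network represents a probability distribution over the labels, and it is trained by minimizing the cross entropy (equivalently, the KL-divergence).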

17
Q

What is the relationship between maximizing the log probability and the cross entropy?

A

We want to maximize the log probability of the labels given the data. Since the cross entropy is the negative of this, maximizing the log probability is equivalent to minimizing the cross entropy.

18
Q

What is deep learning?

A

Deep learning basically refers to neural networks with many layers.

19
Q

What is a filter in CNN?

A

It is a vector (or small array) of weights describing a pattern.

20
Q

What is convolution?

A

Convolution is the operation of multiplying the filter with a patch of the input element by element and adding up the results, repeated while shifting the filter across the input.

21
Q

What is a stride in CNN?

A

It is the number of positions by which the filter is shifted at each step of the convolution.
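
A one-dimensional sketch of the multiply-add-shift operation with a stride, assuming no padding and, as is usual in CNNs, no flipping of the filter. The signal and the filter are made-up values:

```python
def convolve1d(signal, kernel, stride=1):
    """Slide the filter along the signal, multiplying and adding at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


signal = [0, 0, 1, 1, 0, 0]
edge_filter = [-1, 1]  # hypothetical filter that responds to a step up or down

stride_1 = convolve1d(signal, edge_filter, stride=1)
stride_2 = convolve1d(signal, edge_filter, stride=2)
```

With stride 1 the filter visits five positions; with stride 2 it visits only three, so the output is shorter.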

22
Q

What is a pooling operation in convolutional neural networks and why is this operation important?

A

Pooling takes the average or the maximum of the previous output over a certain area of the filtered image.

It compresses the image into a smaller, higher-level representation.

This is usually called downsampling. The operation is important because it reduces the dimensionality of the features and the computational cost.

It also helps to prevent overfitting.
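
A sketch of max and average pooling over non-overlapping 2×2 areas. The function name and the 4×4 feature map are illustrative assumptions:

```python
def pool2d(image, size=2, mode="max"):
    """Pool non-overlapping size x size areas of a 2D list using max or average."""
    reduce = max if mode == "max" else (lambda values: sum(values) / len(values))
    rows, cols = len(image), len(image[0])
    return [
        [
            reduce([image[r + dr][c + dc] for dr in range(size) for dc in range(size)])
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
max_pooled = pool2d(feature_map, size=2, mode="max")
avg_pooled = pool2d(feature_map, size=2, mode="avg")
```

Both results are 2×2, so the 16 input values are reduced to 4.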

23
Q

Briefly explain ‘dropout’ and why it is used in deep networks.


A

Dropout: randomly (e.g. with probability p = 0.5) ignoring hidden nodes for a specific input during learning; the ignored nodes are temporarily turned off.

It is used because it is a regularization technique that helps to prevent overfitting.
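
A sketch of dropout applied to a list of hidden activations. It assumes the common "inverted" variant, which rescales the surviving activations during training so that nothing needs to change at test time; that rescaling and the function signature are assumptions, not something stated in the card:

```python
import random


def dropout(activations, p_drop=0.5, training=True, rng=random):
    """Zero each activation with probability p_drop during training."""
    if not training:
        return list(activations)  # every node is active at test time
    keep = 1.0 - p_drop
    # Surviving activations are scaled by 1/keep (inverted dropout, an assumption).
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


hidden = [1.0] * 1000
dropped = dropout(hidden, p_drop=0.5, training=True, rng=random.Random(0))
unchanged = dropout(hidden, training=False)
```

A different random subset of nodes is turned off on every call, so the network cannot rely on any single hidden node.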

24
Q

Sparse representation, comprised representation and fully distributed representation.

A

???

25
Q

What is an autoencoder? Why is it useful?

A

An autoencoder is a neural network that tries to reconstruct its input.

It is a feature extraction algorithm: it helps us find a representation for our data.

26
Q

What is the relation between Ridge regression and a Gaussian prior?

A

Ridge regression uses L2 regularization, and L2 regularization is equivalent to placing a Gaussian prior on the weights.
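
A short derivation sketch of this equivalence. It assumes a linear model with Gaussian noise of variance σ² and an independent zero-mean Gaussian prior of variance τ² on each weight; these symbols, and λ, are introduced here and are not in the card:

```latex
\begin{aligned}
\hat{w}_{\mathrm{MAP}}
  &= \arg\max_{w} \; \log p(y \mid X, w) + \log p(w) \\
  &= \arg\max_{w} \; -\frac{1}{2\sigma^{2}} \lVert y - Xw \rVert^{2}
                     -\frac{1}{2\tau^{2}} \lVert w \rVert^{2} \\
  &= \arg\min_{w} \; \lVert y - Xw \rVert^{2} + \lambda \lVert w \rVert^{2},
     \qquad \lambda = \frac{\sigma^{2}}{\tau^{2}}
\end{aligned}
```

The last line is the ridge regression objective, so a narrower prior (smaller τ²) corresponds to stronger regularization.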

27
Q

What is batch normalization?

A

Batch normalization: normalizing the input to each hidden layer over each mini-batch.
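
A sketch of the normalization step for one mini-batch, where each row is an example and each column is a feature. The scale gamma, the shift beta, and the small constant eps are standard additions that the card does not mention and are assumptions here:

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature (column) to zero mean and unit variance over the mini-batch."""
    n = len(batch)
    n_features = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(n_features)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(n_features)
    ]
    return [
        [
            gamma * (row[j] - means[j]) / math.sqrt(variances[j] + eps) + beta
            for j in range(n_features)
        ]
        for row in batch
    ]


mini_batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # made-up inputs to a layer
normalized = batch_norm(mini_batch)
```

After normalization both columns have the same scale, even though the second input column is ten times larger than the first.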

28
Q

What are skip connections in a neural network?

A

Connections that let the signal skip (bypass) one or more layers in the network, such as convolutional layers.

29
Q

What is a recurrent neural network (RNN), and where does the term ‘recurrent’ come from? What is it used for?

A

Recurrent neural networks perform the same task for every element of a sequence, which is where the term ‘recurrent’ comes from.

They are used for sequence processing, e.g. machine translation.

30
Q

Explain what backpropagation-through-time is in an RNN.

A

????

31
Q

What is a gated neural network?

A

A gated recurrent network has an extra memory state (the gated state) that is carried from the current step to the next step.

A forget gate and a write gate can modify its value.

Examples of such networks are the LSTM (long short-term memory) network and the gated recurrent unit (GRU).

32
Q

What is a Boltzmann machine? What is the challenge with it?

A

A special form of recurrent network in which the connections between nodes are symmetric.

The challenge is finding practical training rules.

33
Q

What is reinforcement learning (RL)?

A

A learning system in which an agent takes actions and learns from the rewards it receives.

34
Q

In reinforcement learning, what is a policy? 


A

A policy in reinforcement learning is used to determine the action to take in each state.

35
Q

What are the RL challenges?

A

1. Credit assignment
2. Exploration versus exploitation trade-off

36
Q

What is the Markov condition, as in a Markov decision process? (It concerns the transition function in RL.)

A

The transition function depends only on the previous state and the intended action from that state, not on any earlier history.

37
Q

What is the reward function in RL?

A

r_{t+1} = ρ(s_t, a_t)

It returns the value of the reward when the agent enters state s_{t+1} by taking action a_t from state s_t.

38
Q

What is a policy in RL?

A

A policy in reinforcement learning is used to determine the action to take in each state. Policy: a_t = π(s_t)

39
Q

Value function and optimal value function

A

Both are based on the reward and the discounted reward.
The state-action value function tells us how good action a is in state s.

Value function (state-action): Q^π(s, a)
Value function (state): V^π(s) = Q^π(s, π(s))
Optimal value function: V*(s) = max_a Q*(s, a)
40
Q

Optimal policy

A

Optimal policy: π*(s) = arg max_a Q*(s, a)
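
A tiny sketch of reading the optimal value and the optimal policy off a table of Q* values. The states, actions, and numbers are entirely made up for illustration:

```python
# Hypothetical optimal state-action values Q*(s, a) for two states and two actions.
q_star = {
    "s0": {"left": 0.2, "right": 0.9},
    "s1": {"left": 0.5, "right": 0.1},
}


def optimal_value(q, state):
    """V*(s) = max over a of Q*(s, a)."""
    return max(q[state].values())


def optimal_policy(q, state):
    """pi*(s) = arg max over a of Q*(s, a)."""
    return max(q[state], key=q[state].get)


best_actions = {s: optimal_policy(q_star, s) for s in q_star}
```

The max returns the value of the best action, while the arg max returns the action itself.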

41
Q

What is Model-based Reinforcement Learning ?

A

We assume that the agent has a model of the environment and its behaviour: it knows the reward function ρ(s, a) and the transfer (transition) function τ(s, a).

42
Q

What is Model-free Reinforcement Learning ?

A

???

43
Q

SARSA

A

??

44
Q

Q-learning

A

??

45
Q

Explain the difference between the SARSA and Q-learning algorithms.

A

SARSA is an on-policy approach to RL.

Its update target contains the term γ Q(s_{t+1}, a_{t+1}), where a_{t+1} is the action that the current policy actually selects in the next state, so the values it learns depend on the policy being followed.

The name stands for State-Action-Reward-State-Action.

Q-learning is an off-policy approach to RL.

Its update target contains the term γ max_{a′} Q(s_{t+1}, a′) instead, which is the part that differs from SARSA.
Here the update does not depend on how the next action is actually selected, which means the values learned in Q-learning do not depend on the policy being followed.
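
The two update targets can be compared side by side. This sketch assumes the usual tabular form of both updates with a learning rate alpha; alpha, the table layout, and all numbers are assumptions that do not come from the card:

```python
def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # On-policy: uses the action a_next that the policy actually chose.
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Off-policy: uses the best action in s_next, whatever is taken next.
    target = r + gamma * max(q[s_next].values())
    q[s][a] += alpha * (target - q[s][a])


def make_q():
    """Hypothetical Q table with two states and two actions."""
    return {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 1.0, "right": 2.0}}


q_sarsa = make_q()
sarsa_update(q_sarsa, "s0", "right", r=0.0, s_next="s1", a_next="left")

q_qlearn = make_q()
q_learning_update(q_qlearn, "s0", "right", r=0.0, s_next="s1")
```

In this made-up step the policy picks the exploratory action "left" in s1, so SARSA backs up the lower value 1.0 while Q-learning backs up the maximum value 2.0.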

46
Q

epsilon-greedy policy ?

A

??

47
Q

What is the difference between on-policy and off-policy?

A

????

48
Q

basic Bellman equation?

A

???

49
Q

What can we learn by comparing SARSA and Q-learning?

A

SARSA takes exploratory actions into account, so it will avoid mistakes caused by exploration; Q-learning still has the ability to learn while following a different exploration policy.

50
Q

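What is the reward function in RL? What is the transfer function?

A

The reward function ρ(s, a) gives the reward for taking action a in state s; the transfer function τ(s, a) gives the state that results from taking action a in state s.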

51
Q

What is the non-Markovian condition?

A

The non-Markovian condition is the case in which the next state depends on a series of previous states and actions.

52
Q

Temporal difference ?

A

???