College 1 Flashcards
What are the types of artificial neurons?
- perceptron
- sigmoid neuron
Finish the sentence:
The perceptron takes several ..(1).. inputs and produces a single ..(2).. output.
1. binary
2. binary
What determines whether the perceptron neuron’s output is 0 or 1?
The neuron’s output, 0 or 1, is determined by whether the weighted sum is less than or greater than some threshold value.
If w ⋅ x + b > 0, the output is 1; otherwise it is 0.
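The threshold rule above can be sketched in a few lines; the weights, bias, and inputs here are illustrative values, not from the course material.

```python
# A minimal perceptron: output 1 if the weighted sum plus bias exceeds 0, else 0.
def perceptron(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0

# Example with two binary inputs and hypothetical weights.
print(perceptron([1, 0], w=[2.0, -1.0], b=-1.5))  # weighted sum 0.5 > 0, so output 1
print(perceptron([0, 0], w=[2.0, -1.0], b=-1.5))  # weighted sum -1.5 <= 0, so output 0
```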
What’s the difference between a perceptron and a sigmoid neuron?
- With perceptrons, a small change in the weights or bias of any single perceptron in the network can sometimes cause the output of that perceptron to completely flip, say from 0 to 1. Sigmoid neurons are modified so that small changes in their weights and bias cause only a small change in their output.
- Just like a perceptron, the sigmoid neuron has inputs, x1, x2, … But unlike perceptron inputs, these can take on any value between 0 and 1.
- The output is not 0 or 1. Instead, it’s σ (w ⋅ x + b), where σ is called the sigmoid function or the logistic function. (If you want a binary output, you can for example decide to interpret <0.5 as 0.)
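A sketch of the sigmoid neuron, showing the key property from the card: a small change in a weight produces only a small change in the output (example values are illustrative).

```python
import math

def sigmoid(z):
    # The sigmoid (logistic) function: maps any real z smoothly into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# A small change in a weight now changes the output only slightly,
# instead of possibly flipping it from 0 to 1 as with a perceptron.
print(sigmoid_neuron([1, 0], w=[2.00, -1.0], b=-1.5))  # sigmoid(0.50), about 0.622
print(sigmoid_neuron([1, 0], w=[2.01, -1.0], b=-1.5))  # sigmoid(0.51), about 0.625
```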
Define: multilayer perceptron (MLP)
A network with an input layer, one or more hidden layers, and an output layer is sometimes called a multilayer perceptron (MLP), despite being made up of sigmoid neurons, not perceptrons.
Define: feedforward neural networks
Neural networks where the output from one layer is used as input for the next layer, with no feedback loops.
Define: recurrent neural network
- Artificial neural networks in which feedback loops are possible.
- The idea in these models is to have neurons which fire for some limited duration of time, before becoming inactive. That firing can stimulate other neurons, which may fire a little while later, also for a limited duration.
- That causes still more neurons to fire, and so over time we get a cascade of neurons firing.
- Loops don’t cause problems in such a model, since a neuron’s output only affects its input at some later time, not instantaneously.
Define: cost / loss / objective function
A cost / loss / objective function quantifies how well the network's output approximates the desired output y(x) over all training inputs x. Training means finding weights and biases that minimize this function.
How does gradient descent work?
- You want to find a point where the cost function C achieves its global minimum.
- We try this by randomly choosing a starting point and computing derivatives. In practice we compute the gradient separately for every training example and average them.
- We step in the direction that gives the largest immediate decrease of C, i.e. opposite to the gradient (the vector of partial derivatives).
- The size of the step is controlled by the learning rate.
- We take the step and compute derivatives again, repeating until (approximate) convergence.
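The loop above can be sketched on a toy one-dimensional cost function; the cost C(v) = (v − 3)² and the learning rate are illustrative choices, not from the card.

```python
# Gradient descent on the toy cost C(v) = (v - 3)^2, whose derivative is 2(v - 3).
def gradient_descent(start, eta=0.1, steps=100):
    v = start
    for _ in range(steps):
        grad = 2 * (v - 3)   # derivative of C at the current point
        v -= eta * grad      # step opposite the gradient; eta is the learning rate
    return v

print(gradient_descent(0.0))  # converges toward the minimum at v = 3
```

With a learning rate that is too large the updates can overshoot and diverge; too small and convergence is needlessly slow.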
What is the difference between plain gradient descent and stochastic gradient descent?
- Stochastic gradient descent can speed up learning
- SGD picks out a randomly chosen mini-batch of training inputs.
- The true gradient ∇C is estimated by computing the gradient for each input in the mini-batch and averaging over this small sample.
- This is repeated until all training inputs are exhausted, which is said to complete an epoch of training. Then we start another epoch.
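One epoch of SGD can be sketched as follows; the toy per-example cost and all numeric values are illustrative assumptions.

```python
import random

# One SGD epoch: shuffle the data, split it into mini-batches, and for each
# batch estimate the true gradient by averaging the per-example gradients.
def sgd_epoch(w, data, grad_fn, eta=0.1, batch_size=2):
    random.shuffle(data)
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        grad = sum(grad_fn(w, x) for x in batch) / len(batch)
        w -= eta * grad
    return w

# Toy per-example cost (w - x)^2 with gradient 2(w - x); the combined cost
# is minimized at the mean of the data.
data = [1.0, 2.0, 3.0, 4.0]
w = 0.0
for epoch in range(50):
    w = sgd_epoch(w, data, lambda w, x: 2 * (w - x))
print(w)  # fluctuates around the mean of the data, 2.5
```

Because each mini-batch gives only a noisy estimate of ∇C, the final value hovers near the minimum rather than landing on it exactly.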
Define: online / incremental learning
SGD with a minibatch of size 1.
What does the backpropagation algorithm do?
The backpropagation algorithm is a fast way of computing the gradient of the cost function.
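For a single sigmoid neuron the backpropagated gradient is just the chain rule, and it can be checked against a numerical derivative; the quadratic cost and the input values here are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Backprop for one sigmoid neuron with quadratic cost C = (a - y)^2 / 2.
# The chain rule gives dC/dw = (a - y) * a * (1 - a) * x, and dC/db likewise.
def gradients(x, y, w, b):
    z = w * x + b
    a = sigmoid(z)                  # forward pass
    delta = (a - y) * a * (1 - a)   # backward pass: error term at the neuron
    return delta * x, delta         # dC/dw, dC/db

def cost(x, y, w, b):
    return 0.5 * (sigmoid(w * x + b) - y) ** 2

# Sanity check: compare with a central finite-difference approximation.
x, y, w, b, eps = 1.5, 1.0, 0.4, -0.2, 1e-6
dw, db = gradients(x, y, w, b)
num_dw = (cost(x, y, w + eps, b) - cost(x, y, w - eps, b)) / (2 * eps)
print(abs(dw - num_dw) < 1e-8)  # True: backprop matches the numerical gradient
```

The speed advantage of backpropagation is that it computes all partial derivatives in one backward sweep, instead of one finite-difference evaluation per parameter.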
Explain the relation between:
- deep learning
- representation learning
- machine learning
- AI
Deep learning is a kind of representation learning, which is in turn a kind of machine learning, which is used for many but not all approaches to AI.
Define: Knowledge base
approach to AI
Achieve AI by hard-coding knowledge about the world in formal languages. A computer can reason automatically about statements in these formal languages using logical inference rules.
Define: Machine learning
The ability of systems to acquire their own knowledge by extracting patterns from raw data. Simple machine learning algorithms depend heavily on the representation of the data they are given. Each piece of information included in the representation (e.g. of a patient in a medical diagnosis task) is known as a feature. Many artificial intelligence tasks can be solved by designing the right set of features to extract for that task, then providing these features to a simple machine learning algorithm.
- E.g. logistic regression, naïve Bayes
Define: representation learning
An approach to use machine learning to discover not only the mapping from representation to output but also the representation itself.
- E.g. shallow autoencoders
Define: shallow autoencoder
An autoencoder is the combination of an encoder function, which converts the input data into a different representation, and a decoder function, which converts the new representation back into the original format.
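The encoder/decoder composition can be illustrated with a toy example. In a real autoencoder both functions are learned; here the code (a 2-bit index for a 4-dimensional one-hot input) is hand-picked purely to show the structure.

```python
# Autoencoder structure: decode(encode(x)) should reconstruct x.
def encode(x):
    # Compress a 4-dimensional one-hot vector into a 2-bit representation.
    i = x.index(1)
    return [i // 2, i % 2]

def decode(h):
    # Expand the 2-bit representation back into the original one-hot format.
    i = 2 * h[0] + h[1]
    return [1 if j == i else 0 for j in range(4)]

x = [0, 0, 1, 0]
print(decode(encode(x)) == x)  # True: the representation preserves the input
```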
Define: Deep learning
Deep learning represents the world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts, and more abstract representations computed in terms of less abstract ones. It is the study of models that involve a greater amount of composition of either learned functions or learned concepts than traditional machine learning does.
Deep learning resolves a problem by breaking the desired complicated mapping into a series of nested simple mappings, each described by a different layer of the model. The input is presented at the visible layer, so named because it contains the variables that we are able to observe. Then a series of hidden layers extracts increasingly abstract features from the image. These layers are called “hidden” because their values are not given in the data; instead the model must determine which concepts are useful for explaining the relationships in the observed data.
- E.g. MLPs