Topic 2 Flashcards
Data Mining & Machine Learning: Introduction
Perceptron neurons
A perceptron takes several binary inputs, x1, x2, ..., and produces a single binary output.
Weights
real numbers expressing the importance of the respective inputs to the output
Threshold value
neuron’s output, 0 or 1, is determined by whether the weighted sum is less than or greater than some threshold value
Layer
The outputs of a first layer can feed a 2nd layer, the 2nd a 3rd, and so on, creating more nuanced, abstract decisions.
Bias
bias = -threshold, a measure of how easy it is to get the perceptron to output a 1. Or to put it in more biological terms, the bias is a measure of how easy it is to get the perceptron to fire.
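A minimal sketch of the perceptron rule in Python (the function name perceptron_output and the use of NumPy are illustrative assumptions, not from the source):

```python
import numpy as np

def perceptron_output(x, w, b):
    # Fire (output 1) when the weighted sum of the inputs plus the bias
    # is positive; stay quiet (output 0) otherwise.
    return 1 if np.dot(w, x) + b > 0 else 0
```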
NAND gate
Any computation can be built from NAND gates, and a perceptron can implement a NAND gate.
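For instance, weights of -2 on each input with a bias of 3 are one standard choice that reproduces NAND; a quick check reusing the perceptron_output sketch above:

```python
# NAND truth table via a perceptron (weights -2, -2 and bias 3).
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(x, (-2, -2), 3))
# Prints 1, 1, 1, 0 -- only the (1, 1) input is rejected, as NAND requires.
```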
Input layer
input layer perceptrons are really special units that are simply defined to output the desired values.
Learning algorithms
can automatically tune the weights and biases of a network of artificial neurons. This tuning happens in response to external stimuli, without direct intervention by a programmer.
Sigmoid neuron
more tunable than perceptrons: small changes to their weights and biases cause only small changes to their output; also called logistic neurons
Sigmoid function
sigma(z) = 1 / (1 + exp(-z))
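A direct transcription of the formula in Python (a sketch; the vectorized NumPy form is an assumption of convenience):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real-valued z (or array of z's) smoothly into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))
```

sigmoid(0) equals 0.5; large positive z approaches 1 and large negative z approaches 0, so it behaves like a smoothed step function.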
Activation function
the function a neuron applies to its weighted input to produce its output; the perceptron's step function and the sigmoid function are examples
Input neurons
the neurons making up the input layer
Output neurons
the neuron(s) making up the output layer
hidden layer
layers between input and output layers
Multilayer perceptrons
or MLPs, another name for multiple-layer networks
Feedforward neural networks
open loop design, all neurons feed in a single direction, no feedback loops
Recurrent networks
feedback loops are possible; neurons fire for some limited duration of time before becoming quiescent
Cost function
quantifies how well the algorithm is performing; the goal is to minimize it; also called a “loss” or “objective” function
Quadratic cost function
Also called the Mean Squared Error (MSE) function. Smooth cost functions make gradual tuning of weights and biases easier than discrete measures such as classification accuracy.
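A sketch of the quadratic cost in Python (the 1/2 factor is a common convention that simplifies the derivative; the function and variable names are illustrative):

```python
import numpy as np

def quadratic_cost(predictions, targets):
    # Mean squared error over n training examples, halved by convention
    # so the factor of 2 cancels when differentiating.
    predictions, targets = np.asarray(predictions), np.asarray(targets)
    return np.sum((predictions - targets) ** 2) / (2 * len(targets))
```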
Gradient descent algorithm
a minimization algorithm that seeks the minimum by calculating derivatives (the slope) and stepping “downhill”
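A toy illustration in Python (the cost C(w) = (w - 3)^2 is a hypothetical example, not from the source):

```python
# Gradient descent on a toy one-dimensional cost C(w) = (w - 3)**2,
# whose minimum sits at w = 3.
def grad_C(w):
    return 2 * (w - 3)       # the derivative dC/dw

w, eta = 0.0, 0.1            # initial guess and learning rate
for _ in range(100):
    w -= eta * grad_C(w)     # step "downhill" along the negative gradient
print(w)                     # approximately 3.0
```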
stochastic gradient descent
Much faster than computing the gradient over the full training set: select a small number of randomly chosen training inputs (a mini-batch) and compute the gradient of the cost over just that sample to estimate the true gradient.
learning rate
a small, positive parameter that sets the step size, i.e. how quickly the algorithm moves along the gradient
Mini-batch
a random sample of training inputs used in stochastic gradient descent
Epoch
one complete pass through the entire training set
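A sketch tying mini-batches, epochs, and the learning rate together (names like grad_fn and the default values are illustrative assumptions):

```python
import numpy as np

def sgd(inputs, grad_fn, w, eta=0.1, batch_size=10, epochs=5):
    # One epoch = one full pass over the shuffled training inputs.
    inputs = list(inputs)
    for _ in range(epochs):
        np.random.shuffle(inputs)                 # re-randomize each epoch
        for k in range(0, len(inputs), batch_size):
            batch = inputs[k:k + batch_size]      # one mini-batch
            w = w - eta * grad_fn(w, batch)       # gradient estimated on the batch
    return w
```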
Validation set
data held out from training, used to validate that the algorithm hasn’t overfit and will work on unseen data
hyper-parameters
parameters not selected by the learning algorithm itself but set beforehand by the practitioner, e.g. the learning rate
Deep neural networks
a many-layer structure (two or more hidden layers), with early layers answering very simple and specific questions about the input image, and later layers building up a hierarchy of ever more complex and abstract concepts.