Class 8 Flashcards
deep learning
broad family of techniques for ML in which the hypotheses take the form of a complex algebraic circuit with tunable connection strengths
neural networks
networks trained by deep learning methods
feedforward network
neural network with connections only in 1 direction – forms a DAG with designated input and output nodes
recurrent network
neural network that feeds its intermediate or final outputs back into its own inputs
universal approximation theorem
states that a network with just 2 layers of computation, a nonlinear 1st layer followed by a linear 2nd layer, can approximate any continuous function to an arbitrary degree of accuracy (given enough units in the nonlinear layer)
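A minimal sketch of such a two-layer network in pure Python (the weights here are hypothetical, chosen by hand rather than learned): a nonlinear ReLU layer followed by a linear layer. With two hidden units it reproduces |x| exactly, since |x| = relu(x) + relu(-x).

```python
def two_layer_net(x, W1, b1, W2, b2):
    # First layer: nonlinear (ReLU applied to each hidden unit's weighted sum)
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    # Second layer: linear combination of the hidden activations
    return sum(w * h for w, h in zip(W2, hidden)) + b2

# Hand-picked weights: two hidden units computing relu(x) and relu(-x)
W1, b1 = [[1.0], [-1.0]], [0.0, 0.0]
W2, b2 = [1.0, 1.0], 0.0
print(two_layer_net([3.0], W1, b1, W2, b2))   # 3.0
print(two_layer_net([-2.0], W1, b1, W2, b2))  # 2.0
```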
activation function
nonlinear function applied to a unit's weighted input sum to produce its output (e.g., ReLU, sigmoid, tanh); without it, stacked layers would collapse into a single linear function
relu
rectified linear unit – activation function defined as ReLU(x) = max(0, x)
softplus
smooth version of ReLU: softplus(x) = log(1 + e^x)
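Both activations in a few lines of stdlib Python, to make the definitions concrete:

```python
import math

def relu(x):
    # Rectified linear unit: zero for negative inputs, identity otherwise
    return max(0.0, x)

def softplus(x):
    # Smooth approximation of ReLU: log(1 + e^x)
    return math.log1p(math.exp(x))

print(relu(-2.0), relu(3.0))    # 0.0 3.0
print(round(softplus(0.0), 4))  # 0.6931  (= ln 2)
```

Note that softplus(0) = ln 2 ≈ 0.69 while ReLU(0) = 0; the two functions converge for large |x|.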
vanishing gradient
problem in which error signals shrink toward zero as they are propagated back through many layers of the network, so early layers learn very slowly
automatic differentiation
applies rules of calculus in a systematic way to calculate gradients for any numeric program
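One simple way to see this mechanically is forward-mode automatic differentiation with dual numbers: each value carries its derivative along with it, and every arithmetic operation applies the corresponding calculus rule (sum rule, product rule). This is a toy sketch, not how deep learning libraries implement it (they use reverse mode), but the principle is the same:

```python
class Dual:
    # A (value, derivative) pair; arithmetic ops apply calculus rules.
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)   # sum rule
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val,
                    self.dot * o.val + self.val * o.dot)  # product rule
    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1   # analytically, f'(x) = 6x + 2

x = Dual(4.0, 1.0)   # seed: derivative of x with respect to itself is 1
y = f(x)
print(y.val, y.dot)  # 57.0 26.0
```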
one hot encoding
encoding of a non-numeric (categorical) attribute as a vector with one position per category, containing a single 1 and 0s elsewhere
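A one-line sketch of the idea, using a hypothetical color attribute:

```python
def one_hot(value, categories):
    # Map a categorical value to a vector with a single 1
    return [1 if c == value else 0 for c in categories]

colors = ["red", "green", "blue"]
print(one_hot("green", colors))  # [0, 1, 0]
```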
convolutional neural network
neural network that contains spatially local connections, with weight patterns replicated across locations
kernel
pattern of weights that is replicated across multiple local regions
convolution
process of applying the kernel to the pixels of the image
stride
size of the step that the kernel takes across an image
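The three cards above (kernel, convolution, stride) can be sketched together in one dimension, treating one row of pixels as a list. Each output is the dot product of the kernel with one local region, and the stride sets how far the kernel steps between regions:

```python
def convolve1d(signal, kernel, stride=1):
    # Slide the kernel across the input, stepping `stride` positions;
    # each output value is a dot product over one local region.
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]

image_row = [1, 2, 3, 4, 5, 6]
kernel = [1, 0, -1]   # a simple edge-detecting weight pattern
print(convolve1d(image_row, kernel, stride=1))  # [-2, -2, -2, -2]
print(convolve1d(image_row, kernel, stride=2))  # [-2, -2]
```

Note how doubling the stride halves the number of outputs, which is exactly the downsampling effect described below.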
receptive field
the portion of the sensory input that can affect a given neuron's activation
pooling layer
layer in a neural network that summarizes a set of adjacent units from the preceding layer with a single value
downsampling
process of making the stride larger – coarsens the resulting image
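A max-pooling sketch, assuming the common case where the window size equals the stride, so the layer both summarizes and downsamples:

```python
def max_pool1d(values, size=2, stride=2):
    # Summarize each window of adjacent units with a single value (the max);
    # a stride equal to the window size downsamples the input.
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, stride)]

activations = [1, 5, 2, 8, 3, 3]
print(max_pool1d(activations))  # [5, 8, 3]
```

Average pooling works the same way with `sum(window) / size` in place of `max`.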
tensors
multidimensional arrays of numbers – generalize vectors (1-D) and matrices (2-D) to any number of dimensions
feature map
the output activations produced by applying one filter (kernel) across the input – stored as one slice of the layer's output tensor
channels
the dimension of a tensor that indexes different features, e.g. the red, green, and blue values of an input pixel, or the different feature maps of an internal layer
residual networks
neural networks that avoid the vanishing gradient problem in very deep networks by adding skip connections, which let a layer's input bypass it and be added directly to its output
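A scalar toy sketch of a residual block (real blocks use weight matrices and multiple layers; the single weight `w` here is just for illustration). The output is f(x) + x, so even when f contributes nothing, the input still passes through the identity path:

```python
def relu(x):
    return max(0.0, x)

def residual_block(x, w):
    # Output = f(x) + x: the skip connection adds the input back,
    # so the signal (and its gradient) can flow through the identity
    # path even when f's contribution is tiny or zeroed out.
    fx = relu(w * x)
    return fx + x

print(residual_block(2.0, 0.5))   # 3.0  (f(x) = 1.0, plus the skipped x = 2.0)
print(residual_block(2.0, -1.0))  # 2.0  (f zeroed by ReLU; input passes through)
```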
batch normalization
improves the rate of convergence of SGD by rescaling the values generated at the internal layers of the network, using statistics computed over the examples within each minibatch
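The core rescaling step, sketched for a single unit's values across one minibatch (the learned scale-and-shift parameters that a full batch-norm layer also applies are omitted here):

```python
import math

def batch_norm(batch, eps=1e-5):
    # Rescale one unit's values across a minibatch to roughly
    # zero mean and unit variance; eps avoids division by zero.
    m = sum(batch) / len(batch)
    var = sum((v - m) ** 2 for v in batch) / len(batch)
    return [(v - m) / math.sqrt(var + eps) for v in batch]

out = batch_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 3) for v in out])  # [-1.342, -0.447, 0.447, 1.342]
```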
neural architecture search
techniques for exploring the state space of possible network architectures – often using a neural network to find the best neural network