Class 8 Flashcards
deep learning
broad family of techniques for ML in which the hypotheses take the form of a complex algebraic circuit with tunable connection strengths
neural networks
networks trained by deep learning methods
feedforward network
neural network with connections in only one direction – forms a DAG with designated input and output nodes
recurrent network
neural network that feeds its intermediate or final outputs back into its own inputs
universal approximation theorem
states that a network with just two layers of computation – the first nonlinear, the second linear – can approximate any continuous function to an arbitrary degree of accuracy
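A minimal NumPy sketch of the two-layer form the theorem refers to – a nonlinear layer followed by a linear one. The tanh nonlinearity, width 16, and random weights are illustrative assumptions, not a trained approximator:

```python
import numpy as np

def two_layer_net(x, W1, b1, W2, b2):
    """First layer nonlinear (here tanh), second layer linear."""
    hidden = np.tanh(W1 @ x + b1)   # nonlinear layer
    return W2 @ hidden + b2         # linear layer

# Illustrative: random weights, width-16 hidden layer mapping R^1 -> R^1
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 1)), rng.normal(size=16)
W2, b2 = rng.normal(size=(1, 16)), rng.normal(size=1)
print(two_layer_net(np.array([0.5]), W1, b1, W2, b2))
```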
activation function
nonlinear function applied to the weighted sum of a unit's inputs to produce the unit's output (activation)
relu
rectified linear unit: ReLU(x) = max(0, x)
softplus
smooth version of ReLU: softplus(x) = log(1 + e^x)
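Both activation functions in a few lines of NumPy (a sketch; `np.logaddexp` is just a numerically stable way to compute log(1 + e^x)):

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x)
    return np.maximum(0.0, x)

def softplus(x):
    # softplus(x) = log(1 + e^x), a smooth approximation to ReLU
    return np.logaddexp(0.0, x)

xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(xs))      # [0.  0.  0.  0.5 2. ]
print(softplus(xs))  # approaches relu(x) for large |x|
```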
vanishing gradient
error signals are extinguished as they are propagated back through the network
automatic differentiation
applies rules of calculus in a systematic way to calculate gradients for any numeric program
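A minimal forward-mode sketch using dual numbers – one simple way to implement autodiff; deep learning frameworks typically use reverse mode instead:

```python
from dataclasses import dataclass
import math

@dataclass
class Dual:
    """Dual number (value, derivative) for forward-mode autodiff."""
    val: float
    dot: float  # derivative with respect to the input

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def sin(d):
    # chain rule: (sin u)' = cos(u) * u'
    return Dual(math.sin(d.val), math.cos(d.val) * d.dot)

# d/dx [x * sin(x)] at x = 2.0; seed dot = 1 for the input variable
x = Dual(2.0, 1.0)
y = x * sin(x)
print(y.val, y.dot)  # derivative is sin(2) + 2*cos(2)
```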
one hot encoding
encoding of a categorical (non-numeric, e.g. string-valued) attribute as a vector with one position per possible value: the position for the actual value is 1 and all others are 0
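A tiny illustrative encoder (the `colors` example is assumed, not from the cards):

```python
def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector, one position per category."""
    return [1 if c == value else 0 for c in categories]

colors = ["red", "green", "blue"]
print(one_hot("green", colors))  # [0, 1, 0]
```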
convolutional neural network
neural network that contains spatially local connections, with patterns of weights replicated across local regions
kernel
pattern of weights that is replicated across multiple local regions
convolution
process of applying the kernel to the pixels of the image
stride
size of the step that the kernel takes across an image
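A naive NumPy sketch of convolution with a stride; real libraries use heavily optimized implementations, and the 1×2 difference kernel is just an illustrative choice:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Apply a kernel across an image (valid padding, no kernel flipping)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # same (replicated) weights applied at each local region
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)
edge = np.array([[1.0, -1.0]])   # illustrative 1x2 difference kernel
print(conv2d(image, edge, stride=1))
```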
receptive field
the portion of the sensory input that can affect a neuron's activation
pooling layer
layer in a neural network that summarizes a set of adjacent units from the preceding layer with a single value
downsampling
process of making the stride larger – coarsens the resulting image
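A sketch of max pooling with stride 2, which both summarizes adjacent units and downsamples; the sizes here are illustrative:

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Summarize each size x size block with its maximum; stride > 1 downsamples."""
    h, w = feature_map.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            block = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = block.max()
    return out

fmap = np.arange(16.0).reshape(4, 4)
print(max_pool(fmap))  # 2x2 output: resolution coarsened by the stride
```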
tensors
multidimensional arrays with any number of dimensions (vectors and matrices generalized)
feature map
output activations produced by applying a given kernel (filter) across the input – stored as one slice of the output tensor
channels
dimension of a tensor along which features are indexed – e.g., the red, green, and blue values of each pixel, or the feature maps produced by different kernels
residual networks
neural networks that avoid the problem of vanishing gradients in very deep networks by using skip connections, so each layer perturbs rather than replaces the previous layer's representation
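A sketch of a single residual (skip-connection) block, assuming a one-layer ReLU perturbation f; real residual blocks typically stack convolutions and normalization:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W):
    """Skip connection: output = x + f(x), so the layer perturbs x rather
    than replacing it, and gradients can flow through the identity path."""
    return x + relu(W @ x)  # f here is a single illustrative ReLU layer

rng = np.random.default_rng(0)
x = rng.normal(size=8)
W = rng.normal(size=(8, 8)) * 0.1
print(residual_block(x, W))
```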
batch normalization
improves the rate of convergence of SGD by rescaling the values generated at the internal layers of the network, using the examples within each minibatch
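A sketch of the core batch-norm computation over a minibatch; gamma and beta are learned parameters in practice, fixed here for illustration:

```python
import numpy as np

def batch_norm(z, gamma=1.0, beta=0.0, eps=1e-5):
    """Rescale each unit's values across the minibatch to zero mean and
    unit variance, then shift/scale by the learned beta and gamma."""
    mu = z.mean(axis=0)    # per-unit mean over the minibatch
    var = z.var(axis=0)    # per-unit variance over the minibatch
    z_hat = (z - mu) / np.sqrt(var + eps)
    return gamma * z_hat + beta

minibatch = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=(32, 4))
normed = batch_norm(minibatch)
print(normed.mean(axis=0).round(6), normed.std(axis=0).round(3))
```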
neural architecture search
used to explore the state space of possible network architectures – using a neural network to find the best neural network
weight decay
adding a penalty on the size of the weights to the loss function (the neural-network form of regularization)
dropout
technique for introducing noise at training time by randomly deactivating units, which forces the model to become more robust (similar in spirit to training an ensemble, as in bagging)
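A sketch of inverted dropout, the common formulation in which activations are rescaled at training time so no change is needed at test time:

```python
import numpy as np

def dropout(activations, rate=0.5, training=True, seed=0):
    """Randomly zero each unit with probability `rate` during training;
    dividing by (1 - rate) keeps the expected activation unchanged."""
    if not training:
        return activations
    rng = np.random.default_rng(seed)
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

acts = np.ones(10)
print(dropout(acts))                  # surviving units scaled up to 2.0
print(dropout(acts, training=False))  # identity at test time
```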
recurrent neural networks
neural networks that, unlike feedforward networks, allow cycles in the computation graph
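A sketch of the basic RNN recurrence – the same weights are reused at every time step; the sizes and random weights are illustrative:

```python
import numpy as np

def rnn_step(h_prev, x, W_hh, W_xh, b):
    """One recurrence: the new hidden state depends on the previous state
    and the current input, reusing the same weights at every time step."""
    return np.tanh(W_hh @ h_prev + W_xh @ x + b)

rng = np.random.default_rng(0)
W_hh, W_xh, b = rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), np.zeros(4)
h = np.zeros(4)
for x in rng.normal(size=(5, 3)):   # a sequence of five 3-dim inputs
    h = rnn_step(h, x, W_hh, W_xh, b)
print(h)
```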
markov assumption
RNNs assume the current state depends only on a finite set of previous states
gating units
vectors that control the flow of information in an LSTM via elementwise multiplication with the corresponding information vector
unsupervised learning
takes a set of unlabeled examples; may try to learn a new representation of the data (e.g., features of an image), or may try to learn a generative model of the data's distribution
generator
network that maps values from a noise source to outputs, producing samples from the learned distribution
discriminator
network that classifies inputs as real (from the training set) or fake (from the generator)
generative adversarial network
pair of networks – a generator and a discriminator – that combine to form a generative system
transfer learning
occurs when experience with one learning task helps an agent learn better on another task
multitask learning
form of transfer learning where we simultaneously train a model on multiple objectives
deep reinforcement learning
reinforcement learning in which the agent's value function or policy is represented by a multilayer computation graph (deep network)