Chapter 2: The Math of Neural Networks Flashcards
core building block of a neural network
layer
Neural layers do what?
extract representations out of the data fed to them
Chaining together successive neural layers creates what?
Progressive data distillation
Dense Neural layer
Fully connected neural layer
Parts of the compilation step
- A loss function
- An optimizer
- Metrics to monitor during training and testing (a compile sketch follows below)
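A minimal sketch of the compilation step in Keras; the model architecture, optimizer, loss, and metric below are illustrative choices, not the only options:

```python
from keras import models, layers

# Illustrative two-layer classifier (sizes are arbitrary examples)
model = models.Sequential()
model.add(layers.Dense(32, activation='relu', input_shape=(784,)))
model.add(layers.Dense(10, activation='softmax'))

model.compile(optimizer='rmsprop',              # the optimizer
              loss='categorical_crossentropy',  # the loss function
              metrics=['accuracy'])             # monitored during training and testing
```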
Multidimensional NumPy arrays are also called what?
Tensors
What are tensors?
Tensors are a generalization of matrices to an arbitrary number of dimensions: a container for data, almost always numerical data
Alternate name for dimension in the context of tensors?
Axis
Term for the number of axes of a tensor?
its rank
Tensor that contains only 1 number
Scalar
An array of numbers
Vector
1D Tensor
Vector
0D Tensor
Scalar
How many axes do vectors have?
1
What is a vector’s dimension?
the number of entries along its axis. NOT the same as a tensor's dimensionality (its number of axes); see the sketch below
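A quick illustration of the distinction, using NumPy:

```python
import numpy as np

x = np.array([12, 3, 6, 14, 7])
print(x.ndim)   # 1    -> a rank-1 tensor (one axis)
print(x.shape)  # (5,) -> five entries along that axis
# x is a 5-dimensional *vector*, but only a 1D *tensor*;
# a 5D tensor would have five axes instead.
```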
An array of vectors
Matrix
2D Tensor?
Matrix
3 Key attributes of a Tensor
- Number of axes
- Shape
- Data type
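These three attributes can be read directly off a NumPy array; the shape below is an arbitrary example:

```python
import numpy as np

x = np.zeros((3, 4, 5))  # an example rank-3 tensor
print(x.ndim)   # 3         -> number of axes (rank)
print(x.shape)  # (3, 4, 5) -> size along each axis
print(x.dtype)  # float64   -> data type of the entries
```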
Tensor Shape
A tuple of integers that describes how many entries the tensor has along each axis
Tensor Axis convention
- Axis 0: the samples axis
- Axis 1: the features axis (for 2D data) or the timesteps axis (for 3D timeseries data)
- Axis 2: the features axis (for 3D timeseries data)
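Illustrative shapes following these conventions (the sizes are made up):

```python
import numpy as np

# 2D data: (samples, features)
vector_data = np.zeros((1000, 16))
# 3D timeseries data: (samples, timesteps, features)
timeseries_data = np.zeros((1000, 128, 16))
```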
Element-wise operations
applied independently to each entry in the tensors being considered
vectorized implementations
implementations of element-wise operations that are amenable to massively parallel execution; in practice NumPy delegates them to highly optimized low-level routines
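A sketch contrasting a naive element-wise loop with its vectorized NumPy equivalent; `naive_relu` follows the style of the chapter's examples:

```python
import numpy as np

def naive_relu(x):
    # Element-wise: the operation is applied independently to each entry.
    assert len(x.shape) == 2
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0.)
    return x

x = np.random.random((2, 3)) - 0.5
# Vectorized equivalent: one call, dispatched to optimized parallel routines.
assert np.allclose(naive_relu(x), np.maximum(x, 0.))
```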
Geometric Interpretation of Deep Learning
Neural networks consist entirely of chains of tensor operations, and all of these tensor operations are just geometric transformations of the input data
Differentiable
A function whose derivative can be computed (e.g., smooth and continuous)
Gradient
The derivative of a tensor operation: the generalization of the concept of a derivative to functions of multidimensional inputs
Gradient and Loss reduction
Loss is reduced by shifting the weights slightly in the opposite direction of the gradient (see the sketch below)
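A single hypothetical update step, sketched with NumPy; the shapes, learning rate, and gradient values are made up for illustration:

```python
import numpy as np

learning_rate = 0.01
W = np.random.random((4, 2))       # current weight tensor
grad_W = np.random.random((4, 2))  # stand-in for the gradient of the loss w.r.t. W
W -= learning_rate * grad_W        # step opposite the gradient -> loss decreases slightly
```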
Mini-batch stochastic gradient descent
The process of drawing random batches of a dataset and using them, one batch at a time, to update the weights of a neural network
batch stochastic gradient descent
Running each update step on all of the available data at once (the full batch) rather than on a random mini-batch
What role does momentum play in SGD?
It addresses two issues: 1. convergence speed and 2. local minima (see the sketch below)
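A toy sketch of momentum on a 1D loss, loss(w) = w**2; the hyperparameters are illustrative, and this is classical momentum rather than any specific library's implementation:

```python
def grad(w):
    return 2 * w  # analytic gradient of loss(w) = w**2

w = 5.0               # arbitrary starting point
velocity = 0.0
momentum = 0.9        # illustrative momentum factor
learning_rate = 0.1

for _ in range(100):
    # Velocity accumulates past gradients, speeding convergence and
    # helping the update roll past small local minima.
    velocity = momentum * velocity - learning_rate * grad(w)
    w += velocity

print(w)  # approaches the minimum at w = 0
```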
What gives rise to the backpropagation algorithm?
Applying the chain rule to the computation of the gradient values of a neural network, which lets us compute the derivative of the loss with respect to each parameter
backpropagation
Starts with the final loss value and works backward from the top layers to the bottom layers, applying the chain rule to compute the contribution that each parameter had to the loss value (see the worked example below)
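A tiny worked example of the chain rule that backpropagation applies, with toy functions chosen for clarity:

```python
def g(x): return 3 * x   # g'(x) = 3
def f(u): return u ** 2  # f'(u) = 2u

x = 2.0
u = g(x)                 # forward pass
y = f(u)

# Backward pass: start from the output and multiply local derivatives.
dy_du = 2 * u            # derivative of f at u
du_dx = 3                # derivative of g at x
dy_dx = dy_du * du_dx    # chain rule: d f(g(x)) / dx = 18x -> 36.0 at x = 2
print(dy_dx)
```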
Where does the knowledge of the network lie?
in the weight tensors (attributes of the layers)
layer compatibility
The idea that every layer will only accept input tensors of a certain shape and will return output tensors of a certain shape (see the sketch below)
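A sketch of compatibility in Keras; the layer sizes are arbitrary examples:

```python
from keras import models, layers

model = models.Sequential()
# Accepts only inputs with 784 features; outputs tensors with 32 features.
model.add(layers.Dense(32, input_shape=(784,)))
# Must accept 32-feature inputs; Keras infers this from the layer above.
model.add(layers.Dense(16))
```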
network space
the network topology you choose constrains your space of possibilities (the hypothesis space): the family of functions the network can learn
Optimizer
determines how the network will be updated based on the loss function. It implements a specific variant of stochastic gradient descent
Keras backend engines
currently just TensorFlow, Theano, and the Microsoft Cognitive Toolkit (CNTK)
Keras and hardware
Can run on either CPU or GPU
Keras on CPU
TensorFlow uses a low-level library for tensor operations called Eigen
Keras on GPU
TensorFlow uses a library of well-optimized deep-learning operations called the NVIDIA CUDA Deep Neural Network library (cuDNN)
Ways to define a Keras model
The Sequential class or the functional API (see the sketch below)
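The same toy model defined both ways; the layer sizes are illustrative:

```python
from keras import models, layers

# 1. The Sequential class
seq_model = models.Sequential()
seq_model.add(layers.Dense(32, activation='relu', input_shape=(784,)))
seq_model.add(layers.Dense(10, activation='softmax'))

# 2. The functional API: layers are called on tensors
input_tensor = layers.Input(shape=(784,))
x = layers.Dense(32, activation='relu')(input_tensor)
output_tensor = layers.Dense(10, activation='softmax')(x)
func_model = models.Model(inputs=input_tensor, outputs=output_tensor)
```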