Final Exam Review Flashcards
back propagation -
Training Process:
1. Forward Pass:
The input data is passed through the neural network layer by layer to produce an output. The output is compared to the actual target values, and the error is calculated.
2. Backward Pass (Backpropagation):
The algorithm then works backward through the network. It calculates the gradient of the error with respect to the weights of the network. This is done using the chain rule of calculus. The gradients indicate how much the error would increase or decrease if the weights were adjusted.
3. Weight Update:
The weights of the network are then updated in the opposite direction of the calculated gradients. This process is repeated iteratively, adjusting the weights to minimize the error. (A minimal code sketch of these steps follows.)
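A minimal sketch of these three steps in Python, using a toy one-layer model with made-up data (the sizes and numbers are assumptions for illustration, not from the course material):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 3))            # 8 toy training examples with 3 features
    y = X @ np.array([1.0, -2.0, 0.5])     # targets produced by known "true" weights
    w = np.zeros(3)                        # weights the model has to learn
    lr = 0.1                               # learning rate

    for epoch in range(200):
        pred = X @ w                       # 1. forward pass: produce an output
        err = pred - y                     # compare output to targets -> error
        grad = 2 * X.T @ err / len(y)      # 2. backward pass: gradient of MSE w.r.t. w (chain rule)
        w -= lr * grad                     # 3. weight update: step opposite to the gradient

    print(w)                               # approaches [1.0, -2.0, 0.5] as the error shrinks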
how do we train a neural network to be intelligent enough to do tasks like prediction and classification?
back propagation.
back propagation function -
it is through back propagation that we update the weights in the matrix so that the network gets a good representation of the information - that's the neural network approach
with back propagation, how do you actually do the weight update?
The weights are updated in the opposite direction of the computed gradients. The learning rate determines the step size of this update.
This process is typically repeated for multiple iterations (epochs) until the model converges to a set of weights that minimizes the loss.
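As a tiny sketch of the update rule itself (the numbers are made up purely to show the rule):

    import numpy as np

    weights = np.array([0.50, -1.20, 0.30])    # current weights (illustrative values)
    gradients = np.array([0.10, -0.40, 0.05])  # gradient of the loss w.r.t. each weight
    learning_rate = 0.01                       # step size

    weights = weights - learning_rate * gradients  # step in the opposite direction of the gradient
    print(weights)                                 # [0.499, -1.196, 0.2995]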
what conceptual things happen during back propagation
Training Process:
1. Forward Pass
2. Backward Pass (Backpropagation)
3. Weight Update
the difference between the predicted value and the actual value
- Backpropagation takes the difference between the predicted value and the actual value and uses that error term to adjust each node’s weights.
- The process works backwards from the final layers to earlier layers, one layer at a time, and computes the contribution that each weight in the given layer had in the loss value.
- The algorithm that uses these gradients to minimize the loss is called "gradient descent": it iteratively moves the weights in the direction of greatest improvement in prediction, i.e. the steepest decrease in the loss (a toy sketch follows below).
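A toy, self-contained sketch of gradient descent on an assumed one-weight loss, loss(w) = (w - 3)^2, just to show the iterative movement toward the minimum:

    w = 0.0                     # arbitrary starting weight
    lr = 0.1                    # learning rate (step size)

    for step in range(50):
        grad = 2 * (w - 3)      # derivative of (w - 3)**2 with respect to w
        w -= lr * grad          # move in the direction that decreases the loss

    print(w)                    # ends up close to 3.0, the value that minimizes the loss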
back propagation speed
the step size (learning rate) controls how fast learning happens: larger steps (big vs little) move toward the optimal point faster, but steps that are too large can overshoot it, while very small steps converge slowly
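An illustrative comparison on an assumed toy loss, loss(w) = w^2, showing the trade-off between big and little steps:

    def run(learning_rate, steps=20):
        w = 5.0                         # starting weight
        for _ in range(steps):
            w -= learning_rate * 2 * w  # gradient of w**2 is 2*w
        return w

    print(run(0.1))   # small steps: converges slowly (still about 0.06 after 20 steps)
    print(run(0.4))   # larger steps: converges much faster (essentially 0)
    print(run(1.1))   # steps that are too large: overshoots and diverges (magnitude keeps growing)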
word embedding -
Word embedding is a technique in natural language processing (NLP) and machine learning that represents words as vectors of real numbers. These vectors capture semantic relationships between words, allowing words with similar meanings to have similar vector representations. In other words, word embedding is a way to map words to dense vectors of real numbers, often in a continuous vector space.
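A tiny illustration with hand-made vectors (invented numbers, not learned embeddings) of what "similar words get similar vectors" means:

    import numpy as np

    embeddings = {                              # made-up 3-dimensional word vectors
        "king":  np.array([0.80, 0.65, 0.10]),
        "queen": np.array([0.78, 0.70, 0.12]),
        "apple": np.array([0.10, 0.05, 0.90]),
    }

    def cosine(a, b):                           # cosine similarity between two vectors
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine(embeddings["king"], embeddings["queen"]))  # high: related meanings
    print(cosine(embeddings["king"], embeddings["apple"]))  # much lower: unrelated words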
epoch
one Epoch is when an ENTIRE dataset is passed forward and backward through the neural network only ONCE.
An epoch refers to one complete pass through the entire training dataset during the training of a machine learning model.
batch
one batch contains the training examples used in one weight update (common recommendation: no more than 32)
A batch is a subset of the training dataset that is processed together in one iteration.
Iteration
number of iterations (batches) = total training data / batch size
An iteration, in the context of training, refers to one update of the model’s weights.
batch size calculation based on code
Number of Iterations = Total Dataset Size / Batch Size
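A short calculation with assumed numbers:

    import math

    total_dataset_size = 60000      # assumed number of training examples
    batch_size = 32                 # assumed batch size

    iterations_per_epoch = math.ceil(total_dataset_size / batch_size)
    print(iterations_per_epoch)     # 1875 weight updates in one full pass over the data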
name 3 word embedding techniques.
One-hot Vector
TF-IDF
Word2Vec
GloVe
fastText
ELMo
Attention Mechanism – BERT
XLNet
why do we need to do word embedding in a neural network approach?
word embedding is crucial because it represents words as vectors in a continuous vector space. This helps capture semantic relationships between words, enabling the network to understand context, similarities and differences.
GloVe (Global Vectors for Word Representation):
GloVe is an unsupervised learning algorithm that learns word representations by examining global word co-occurrence statistics. It creates embeddings by factorizing the logarithm of the word co-occurrence matrix.
Word2Vec (Word to Vector):
Developed by Google, Word2Vec represents words as dense vectors in a continuous vector space. It uses neural networks to learn word embeddings based on the context in which words appear.
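A minimal training sketch, assuming the gensim library (version 4.x) is available; the tiny corpus is made up, and real embeddings need far more text:

    from gensim.models import Word2Vec

    sentences = [["the", "cat", "sat", "on", "the", "mat"],
                 ["the", "dog", "sat", "on", "the", "rug"]]

    model = Word2Vec(sentences, vector_size=32, window=2, min_count=1, sg=1, epochs=100)

    print(model.wv["cat"][:5])           # first few dimensions of the dense vector for "cat"
    print(model.wv.most_similar("cat"))  # nearest neighbours in the embedding space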
FastText:
Developed by Facebook, FastText extends Word2Vec by representing each word as a bag of character n-grams. This allows it to generate embeddings for out-of-vocabulary words and capture morphological information.
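A sketch of the out-of-vocabulary behaviour, again assuming gensim 4.x and a made-up corpus:

    from gensim.models import FastText

    sentences = [["natural", "language", "processing"],
                 ["language", "models", "process", "text"]]

    model = FastText(sentences, vector_size=32, window=3, min_count=1, epochs=50)

    print(model.wv["language"][:5])   # vector for a word seen during training
    print(model.wv["languages"][:5])  # still works: built from character n-grams of an unseen word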
TF-IDF (Term Frequency-Inverse Document Frequency):
While not a neural embedding technique, TF-IDF is a traditional method for representing words based on their importance in a document or a corpus. It is commonly used in information retrieval.
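A small sketch of turning documents into TF-IDF vectors, assuming a recent scikit-learn is available:

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["the cat sat on the mat",
            "the dog chased the cat",
            "dogs and cats are pets"]

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(docs)          # sparse matrix: documents x vocabulary terms

    print(vectorizer.get_feature_names_out())   # the learned vocabulary
    print(X.toarray().round(2))                 # TF-IDF weight of each term in each document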
ELMo (Embeddings from Language Models):
ELMo generates word embeddings by considering the context in which words appear in a sentence. It uses a deep, context-dependent bidirectional LSTM (Long Short-Term Memory) model.
BERT (Bidirectional Encoder Representations from Transformers):
BERT is a transformer-based model that considers bidirectional context information for word embeddings. It has been highly successful in various NLP tasks and captures complex linguistic patterns.
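A minimal sketch of getting contextual embeddings from BERT, assuming the Hugging Face transformers library (and PyTorch) and the public "bert-base-uncased" checkpoint:

    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
    outputs = model(**inputs)

    print(outputs.last_hidden_state.shape)  # (1, number of tokens, 768): one contextual vector per token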
XLNet:
Building on BERT's success, XLNet uses a permutation language modeling objective to capture bidirectional context information. It overcomes some limitations of BERT, particularly in modeling dependencies between the predicted (masked) tokens.
CNN -
CNN stands for Convolutional Neural Network. It is a type of artificial neural network designed for processing structured grid data, such as images. CNNs are particularly effective in computer vision tasks, including image recognition, object detection, and image classification.
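A minimal sketch of a small CNN, assuming PyTorch; the layer sizes are arbitrary choices for 28x28 grayscale images and 10 classes:

    import torch
    from torch import nn

    model = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolution: slide filters over the image
        nn.ReLU(),
        nn.MaxPool2d(2),                             # downsample 28x28 -> 14x14
        nn.Conv2d(16, 32, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),                             # 14x14 -> 7x7
        nn.Flatten(),
        nn.Linear(32 * 7 * 7, 10),                   # classification head: 10 class scores
    )

    x = torch.randn(4, 1, 28, 28)   # a dummy batch of 4 images
    print(model(x).shape)           # torch.Size([4, 10])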