Architecture, Machine Learning & Neural Networks Flashcards
Problems to be addressed
- Latency
- Linguistic nuances
- Architecture
- Dataset
- Processing power
- Ethical challenges
Latency is being experienced because
the chatbot’s response time is slow, damaging the customer experience
Linguistic nuances are being experienced because
The chatbot’s language model is struggling to respond appropriately to ambiguous statements.
Architecture issues are being experienced because
the chatbot’s architecture is too simplistic and unable to handle complex language
The dataset of the chatbot is an issue because
The chatbot’s training data is not diverse enough, leading to poor accuracy in understanding and responding to customer queries
Processing power is an issue because
The system’s computational capability is limited
There are ethical challenges involved since
the chatbot does not always give appropriate advice and is prone to revealing personal info from its training dataset
Machine learning and 3 examples
This is when we combine data and algorithms so a system can learn from examples and make predictions about future behavior. Three examples include:
- Image recognition
- Speech Recognition
- Recommendation systems
What is the process behind setting up machine learning?
- Select a machine learning model (linear regression, k-Nearest Neighbors, decision trees, etc.)
- Train the model on input data paired with the correct output for each input, so it learns which characteristics of the input predict a specific output
- Use the model by feeding it new input data
- Receive the predicted output
- Improve the model by analyzing its outputs (see the sketch below)
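A minimal sketch of this workflow using scikit-learn's k-Nearest Neighbors classifier (one of the models named above); the tiny dataset is made up purely for illustration:

```python
# Minimal sketch of the ML workflow above, using scikit-learn's k-NN.
# The toy dataset here is invented for illustration only.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Input data: [height_cm, weight_kg]; labels: 0 = cat, 1 = dog
X = [[25, 4], [30, 5], [24, 4], [60, 25], [55, 22], [65, 30]]
y = [0, 0, 0, 1, 1, 1]

# 1. Select a model
model = KNeighborsClassifier(n_neighbors=3)

# 2. Train the model with inputs and their known outputs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
model.fit(X_train, y_train)

# 3-4. Use the model on new input data and receive predictions
predictions = model.predict(X_test)

# 5. Improve the model by analyzing its outputs (e.g. measure accuracy)
print(accuracy_score(y_test, predictions))
```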
Basic concept of Neural Networks
Neural networks are like a simplified version of the human brain, designed to recognize patterns and make sense of different types of information
What do neural networks help with?
They help in things like identifying categories, predicting outcomes, or finding patterns in data, just like how we learn to categorize or recognize things in everyday life
How do neural networks learn?
These networks improve over time by learning from examples, just like how we learn from experience. The more examples they get, the better they become at their tasks.
Input layers NN
The input layer accepts data either for training purposes or to make a prediction. It only takes in real numbers.
Hidden Layers NN
The hidden layers are responsible for actually deciding what the output is for a given input; they are also where most of the adjusting done during training (to weights and biases) takes place
Output layers NN
Outputs the final prediction
Weights NN
The weights, along with the biases, are what determine the output of a neural network. Weights determine the strength and direction (positive or negative) of the connection between neurons, transforming the input data as it passes through the NN.
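To make this concrete, here is a minimal NumPy sketch of a single neuron: the weights scale each input, a bias (covered later in this deck) is added, and the result goes through an activation function. All numbers are made up:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up weights and inputs for a single neuron with 3 inputs.
x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.8, -0.2, 0.1])   # weights: strength/direction of each connection
b = 0.3                          # bias (shifts the activation function)

# The neuron's output: activation(weighted sum of inputs + bias)
output = sigmoid(np.dot(w, x) + b)
print(output)
```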
Loss in NN
This is a quantifiable measure of the difference between what the NN produced and what the expected output should be. It is calculated using a loss function such as mean squared error.
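For example, mean squared error can be computed like this (toy numbers):

```python
import numpy as np

y_true = np.array([1.0, 0.0, 1.0])   # expected outputs
y_pred = np.array([0.8, 0.2, 0.6])   # what the NN actually produced

mse = np.mean((y_true - y_pred) ** 2)  # mean squared error
print(mse)  # 0.08
```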
Gradient in NN
The gradient is a measure of how sensitive the loss is to changes in the weights and biases. It is calculated as the derivative of the loss function with respect to those parameters.
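A quick way to see "sensitivity of loss to a change in weight" in action: approximate the gradient numerically with finite differences. The one-parameter toy loss here is made up for illustration:

```python
def loss(w):
    # Toy loss: squared error of a 1-parameter model y = w * x
    x, y_true = 2.0, 4.0
    return (w * x - y_true) ** 2

# Numerical approximation of dLoss/dw at w = 1.0
w, eps = 1.0, 1e-6
grad = (loss(w + eps) - loss(w - eps)) / (2 * eps)
print(grad)  # ~ -8.0; the analytic gradient is 2*(w*x - y)*x = 2*(2-4)*2 = -8
```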
Why do we calculate loss?
To understand how much to adjust the weights and biases in the network.
What is a derivative
A function that represents the rate of change of another function.
Partial derivatives
Used with functions of multiple variables, like f(x, y, z). The partial derivative measures how such a function changes with respect to one variable while keeping all the other variables constant.
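A quick worked example using sympy (the function f is made up for illustration):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + 3*y

# Partial derivative w.r.t. x: treat y as a constant
print(sp.diff(f, x))  # 2*x*y

# Partial derivative w.r.t. y: treat x as a constant
print(sp.diff(f, y))  # x**2 + 3
```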
Gradient Descent Function NN
Once we calculate the gradient of our NN, we can put this into a gradient descent function, which is an algorithm that minimizes the loss function, ultimately allowing us to find the weights and biases for every connection that will give us the result we want.
The algorithm then automatically updates the weights and biases in the NN.
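A minimal sketch of gradient descent minimizing the toy one-parameter loss from the gradient card above (the learning rate and starting point are arbitrary choices):

```python
def loss_grad(w):
    # Analytic gradient of the toy loss (w*2 - 4)**2 with respect to w
    return 4 * (2 * w - 4)

w = 0.0             # arbitrary starting weight
learning_rate = 0.05

for step in range(50):
    w -= learning_rate * loss_grad(w)  # step in the direction opposite the gradient

print(w)  # converges toward 2.0, where the loss is minimized
```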
Backpropagation
Backpropagation is part of the training process in a neural network. It is the algorithm used to compute the gradients (sensitivities of the loss function to changes in weights and biases) for each layer, starting from the output layer and propagating backward through the network. These gradients are then used by the gradient descent algorithm to update the weights and biases, minimizing the loss function step by step.
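To make the mechanics concrete, here is a minimal NumPy sketch of backpropagation through one hidden layer. The tiny network shape and all numbers are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Tiny network: 2 inputs -> 2 hidden neurons -> 1 output
x = np.array([0.5, -0.3])        # input
y_true = 1.0                     # expected output

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2))     # input -> hidden weights
b1 = np.zeros(2)
W2 = rng.normal(size=2)          # hidden -> output weights
b2 = 0.0

# Forward pass
h = sigmoid(x @ W1 + b1)         # hidden activations
y_pred = sigmoid(h @ W2 + b2)    # output
loss = (y_pred - y_true) ** 2    # squared-error loss

# Backward pass: chain rule, starting from the output layer
d_ypred = 2 * (y_pred - y_true)          # dLoss/dy_pred
d_z2 = d_ypred * y_pred * (1 - y_pred)   # back through the output sigmoid
dW2 = d_z2 * h                           # gradients for W2
db2 = d_z2
d_h = d_z2 * W2                          # propagate back to the hidden layer
d_z1 = d_h * h * (1 - h)                 # back through the hidden sigmoid
dW1 = np.outer(x, d_z1)                  # gradients for W1
db1 = d_z1

# Gradient descent then updates each parameter, e.g.:
W2 -= 0.1 * dW2
```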
Backpropagation requires _________ of the different gradients from each layer, meaning it needs to use the derivative rule known as the ________ rule
multiplication, chain
In a NN, each gradient’s calculation is based on the gradient of the ________ layer, starting from the output layer
previous
Vanishing gradient problem and why?
This is when the gradient gets smaller and smaller the deeper you go back into the hidden layers, until the gradient becomes so small the NN doesn’t have much to optimize with. It occurs due to the nature of how gradients are computed using the chain rule and activation functions like sigmoid, whose derivatives are often less than 1 (the sigmoid’s derivative is at most 0.25). When many values less than 1 get multiplied together, the product gets smaller and smaller, hence, a vanishing gradient.
It can also be caused simply by very small initial weights
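A quick numerical illustration: the sigmoid’s derivative peaks at 0.25, and the chain rule multiplies roughly one such factor per layer, so the gradient shrinks rapidly with depth:

```python
# sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) peaks at 0.25.
# Multiplying one such factor per layer (as the chain rule requires)
# drives the gradient toward zero as the network gets deeper.
factor = 0.25
for depth in [1, 5, 10, 20]:
    print(depth, factor ** depth)
# 1  0.25
# 5  0.0009765625
# 10 ~9.5e-07
# 20 ~9.1e-13
```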
Activation function and 3 examples
This is a function that is applied to the output of a neuron in a NN to introduce non-linearity. Three examples (sketched in code after this list):
- Sigmoid
- ReLU
- Tanh
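A minimal sketch of the three activation functions listed above:

```python
import numpy as np

# Three common activation functions applied to a neuron's raw output z
def sigmoid(z):
    return 1 / (1 + np.exp(-z))   # squashes z into (0, 1)

def relu(z):
    return np.maximum(0, z)       # zero for negatives, identity for positives

def tanh(z):
    return np.tanh(z)             # squashes z into (-1, 1)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), relu(z), tanh(z))
```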
Why are non-linear neuron outputs important?
They allow complex representations and the ability to capture complicated patterns and relationships. If the neuron outputs were linear, then no matter how many layers the NN had, it would collapse into a single linear model, which cannot capture complex relationships.
i.e. it lets the network understand and predict more complex stuff
epoch
This is one complete pass of all batches of the training data through the NN. More epochs generally improve the model’s fit to the training data and its ability to make predictions, though too many can lead to overfitting.
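A toy, runnable sketch of how epochs relate to passes over the data (the tiny dataset and learning rate are made up; each "batch" here is a single example):

```python
# One epoch = one full pass over every batch of the training data.
# Toy example: fit y = w*x by gradient descent, epoch by epoch.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # (input, target) pairs
w = 0.0
lr = 0.05

for epoch in range(20):               # 20 epochs
    for x, y_true in data:            # every example seen once per epoch
        grad = 2 * (w * x - y_true) * x
        w -= lr * grad

print(w)  # approaches 2.0 as the epochs accumulate
```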
How would a NN generally work in a categorization process?
It would take the input, and as it goes through each hidden layer, recognize characteristics of the input. At the end, it would take the characteristics and put them together to categorize the data or make a prediction about what it is.
Big problem with layers and accuracy in NN
The more layers, the more precise the predictions can be; however, more layers also mean a higher likelihood of the vanishing gradient problem
Bias NN
Parameter added to the weighted input that shifts the activation function, changing the threshold at which the neuron activates and thus allowing it to process more complex patterns.
Essentially, this is what allows for flexibility in the neuron’s output.
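A quick numerical look at that shift: with the same weight and input, changing only the bias moves the sigmoid’s output (numbers made up):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w, x = 1.0, 0.0
for b in [-2.0, 0.0, 2.0]:
    print(b, sigmoid(w * x + b))
# b = -2 -> ~0.12, b = 0 -> 0.5, b = 2 -> ~0.88:
# the bias shifts where the neuron starts to activate
```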