Feedforward Neural Networks 1 Flashcards
1
Q
Neural Network
A
- Has a single hidden layer (contrast with deep networks, which have two or more)
- Can separate complex (non-linearly separable) feature spaces, which linear functions cannot
- Suited to simpler problems where the relationship between input and output is relatively direct
2
Q
Why can’t we use Linear Model for everything?
A
Linear models can’t learn FEATURE COMBINATIONS.
3
Q
Linear Features
A
- Each feature is treated independently (not connected to any other feature)
- In text classification, the weight of each word is learned separately and the weights are summed at prediction time (see the sketch below)
- Typical of linear models such as logistic regression
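A minimal sketch of this independent, summed scoring (the words and weights here are hypothetical, not from a trained model):

```python
import math

# Hypothetical per-word weights learned by logistic regression
# (positive weight -> evidence for the positive class).
weights = {"great": 1.2, "boring": -1.5, "movie": 0.1}
bias = 0.0

def score(tokens):
    # Each word contributes independently; contributions are summed.
    return bias + sum(weights.get(t, 0.0) for t in tokens)

def prob_positive(tokens):
    # The logistic (sigmoid) function squashes the linear score into a probability.
    return 1.0 / (1.0 + math.exp(-score(tokens)))

print(prob_positive("a great movie".split()))   # high
print(prob_positive("a boring movie".split()))  # low
```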
4
Q
Feature Combinations
A
- Combining two or more features to create NEW, MORE COMPLEX features
- Captures non-linear interactions, polynomial features, or higher-order relationships (see the example below)
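For illustration, one simple way to build combined features is to append pairwise products of the raw features (a hand-rolled sketch; feature-engineering libraries offer richer versions):

```python
from itertools import combinations

def add_pairwise_products(x):
    # x: list of raw feature values.
    # Append the product of every pair as a new, more complex feature.
    return x + [a * b for a, b in combinations(x, 2)]

print(add_pairwise_products([2.0, 3.0, 5.0]))
# [2.0, 3.0, 5.0, 6.0, 10.0, 15.0]
```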
5
Q
XOR
A
- A simple non-linear function
- Demonstrates that linear models cannot solve non-linearly separable problems: no single line separates its outputs (see the sketch below)
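A minimal sketch of a one-hidden-layer network that computes XOR, with hand-picked (not learned) weights:

```python
import numpy as np

def step(z):
    # Hard threshold activation: 1 if the input is positive, else 0.
    return (z > 0).astype(float)

def xor(x1, x2):
    x = np.array([x1, x2], dtype=float)
    # Hidden unit 1 computes OR, hidden unit 2 computes AND.
    h = step(np.array([[1.0, 1.0], [1.0, 1.0]]) @ x + np.array([-0.5, -1.5]))
    # Output: OR minus twice AND, thresholded -> XOR.
    return step(np.array([1.0, -2.0]) @ h - 0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, int(xor(a, b)))  # prints 0, 1, 1, 0
```

The hidden layer is what makes this possible: it builds the feature combinations (OR, AND) that no single linear function can express.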
6
Q
Deep Neural Networks
A
- Uses two or more hidden layers
- Can separate even more complex feature spaces
- Used for tasks requiring learning from large amounts of unstructured data
7
Q
Immediate Conjunctive Features
A
- A specific combination of two or more features that creates a NEW feature
- Simple, direct conjunctions of existing features: the new feature fires only when all of its components do (see the example below)
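A toy conjunctive feature (the feature names are made up):

```python
def conj(f1, f2):
    # Immediate conjunction: the new feature is 1 only if both inputs are 1.
    return int(f1 and f2)

# E.g. "contains 'not'" AND "contains 'good'" as one new feature.
has_not, has_good = 1, 1
print(conj(has_not, has_good))  # 1 -> a negated-sentiment signal
```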
8
Q
softmax()
A
- Converts the raw scores from the output layer into probabilities that sum to one
- Works by exponentiating each element of the input vector and then normalizing (see the sketch below)
- “Different weights, same feature”
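A standard numpy sketch; subtracting the maximum score before exponentiating is a common numerical-stability trick:

```python
import numpy as np

def softmax(scores):
    # Exponentiate each score, then normalize so the outputs sum to one.
    exps = np.exp(scores - np.max(scores))  # shift for numerical stability
    return exps / exps.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())  # approx [0.659 0.242 0.099], sums to 1.0
```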
9
Q
Feedforward Neural Network
A
- A neural network whose connections between nodes form no cycles (information flows forward only)
- Neurons in each layer are fully connected to neurons in the next layer
- Each neuron applies an activation function to its weighted inputs
- After producing the output, the network computes a loss function and backpropagates (a forward pass is sketched below)
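A minimal forward pass for one hidden layer in numpy (the layer sizes and random weights are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input(3) -> hidden(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # hidden(4) -> output(2)

def forward(x):
    h = np.tanh(W1 @ x + b1)          # hidden layer with activation
    z = W2 @ h + b2                   # raw output scores (logits)
    exps = np.exp(z - z.max())
    return exps / exps.sum()          # softmax -> class probabilities

print(forward(np.array([1.0, 0.5, -0.2])))
```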
10
Q
Loss Function
A
Quantifies the difference between the predicted output and the actual target (see the cross-entropy sketch below)
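For classification, the usual choice is cross-entropy: the negative log probability the model assigns to the true class (a sketch):

```python
import numpy as np

def cross_entropy(probs, target):
    # Loss is small when the predicted probability of the true class is high.
    return -np.log(probs[target])

print(cross_entropy(np.array([0.7, 0.2, 0.1]), target=0))  # ~0.357 (good prediction)
print(cross_entropy(np.array([0.1, 0.2, 0.7]), target=0))  # ~2.303 (bad prediction)
```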
11
Q
Backpropagation
A
Gradients of the loss function with respect to each weight are computed (by the chain rule, layer by layer) and the weights are adjusted in the direction opposite the gradient (see the update sketch below)
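The resulting update rule, sketched for plain gradient descent (the learning rate here is an arbitrary choice):

```python
import numpy as np

def sgd_step(weights, grads, lr=0.1):
    # Move each weight a small step in the direction opposite its gradient.
    return weights - lr * grads

w = np.array([0.5, -1.0])
g = np.array([0.2, -0.4])   # gradients of the loss w.r.t. w
print(sgd_step(w, g))       # [0.48 -0.96]
```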
12
Q
Log Likelihood
A
- Simply the natural logarithm of the likelihood function
- Likelihoods involve products of probabilities
- The log turns these products into sums, simplifying optimization and avoiding numerical underflow (see below)
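A quick numeric illustration: the product of many small probabilities underflows to zero, while the sum of their logs stays representable:

```python
import math

probs = [0.01] * 200  # 200 independent observations, each with prob 0.01

likelihood = math.prod(probs)                     # underflows
log_likelihood = sum(math.log(p) for p in probs)  # stays finite

print(likelihood)       # 0.0 (floating-point underflow)
print(log_likelihood)   # about -921.0
```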
13
Q
Likelihood
A
- Quantifies how well the model explains the observed data, given particular parameter values
- Measures the probability of observing the data under the model
14
Q
Why maximize log likelihood?
A
- Finds the neural network parameters that best explain the observed data
- Higher values indicate the model predicts the observed data more accurately, which also makes it useful for model evaluation
- Is often combined with regularization terms to prevent overfitting
15
Q
Gradient of the Loss Function
A
- Used by optimization algorithms such as gradient descent
- Tells us how to adjust each parameter to minimize the loss
- A positive gradient indicates that increasing the parameter will increase the loss
- A negative gradient indicates that increasing the parameter will decrease the loss (see the numeric example below)
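A tiny numeric illustration of the sign rule, using the toy loss L(w) = (w - 3)^2:

```python
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)   # dL/dw

w = 5.0
print(grad(w))          # +4.0: positive gradient, so increasing w raises the loss
w = w - 0.5 * grad(w)   # step against the gradient
print(w, loss(w))       # lands at the minimum, w = 3.0, loss = 0.0
```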