Let's Go: Flashcards
What learning approach does linear regression use?
a. supervised
b. unsupervised
c. reinforcement
d. deep
e. meta-heuristics
f. convolutional
Linear regression is a supervised learning approach.
What is the aim of linear regression?
Linear regression tries to fit a line to historical (training) data so that values can be predicted for new inputs.
The equation for a line is Y = MX + C. What variables does linear regression change in order to minimize cost?
Y = MX + C
With P0 = C (the intercept) and P1 = M (the slope), the hypothesis is written as:
Y = P0 + P1X
Linear regression changes P0 and P1 to minimize cost.
What is cost defined as in linear regression? Short
Cost measures how poorly the hypothesis (line) fits the data, e.g. the sum of squared differences between predicted and actual values.
What is the hypothesis defined as in linear regression? Short
The hypothesis in linear regression just means the predicted function (line).
What are two algorithms that are often used in linear regression?
The two algorithms often used in linear regression are gradient descent and the normal equation. Gradient descent iteratively changes P0 and P1 to find the minimum of the cost; the normal equation computes the optimal P0 and P1 directly, without iteration.
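A minimal sketch of gradient descent on the hypothesis Y = P0 + P1X, minimizing a mean-squared-error cost (function and variable names are illustrative, not from the slides):

```python
# Gradient descent sketch for the hypothesis y = p0 + p1 * x,
# minimizing the mean squared error cost. Illustrative only.

def gradient_descent(xs, ys, lr=0.01, epochs=1000):
    p0, p1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # error of each prediction under the current hypothesis
        errors = [(p0 + p1 * x) - y for x, y in zip(xs, ys)]
        # partial derivatives of the MSE cost w.r.t. p0 and p1
        grad_p0 = (2 / n) * sum(errors)
        grad_p1 = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        # step downhill on the cost surface
        p0 -= lr * grad_p0
        p1 -= lr * grad_p1
    return p0, p1

# Example: data lying on the line y = 2x + 1
p0, p1 = gradient_descent([0, 1, 2, 3], [1, 3, 5, 7])
```

After enough iterations p0 and p1 approach the true intercept (1) and slope (2).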
What learning approach does K-NN (K-Nearest Neighbours) use?
a. supervised
b. unsupervised
c. reinforcement
d. deep
e. meta-heuristics
f. convolutional
Supervised learning
Does K-NN (K-Nearest Neighbours) classify or regress? How does it do that? Short
Classification algorithm that determines the class of a given point by looking at its K nearest neighbours. Whichever class occurs most often among them is assigned to the new ('predicted') data point.
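A toy sketch of the majority-vote idea (a real implementation would use something like scikit-learn's `KNeighborsClassifier`; names here are illustrative):

```python
# Toy K-NN classifier: find the k nearest training points to a query,
# then take a majority vote over their labels.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    # train: list of ((x, y), label) pairs
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))
    labels = [label for _, label in nearest[:k]]
    # majority vote among the k nearest neighbours
    return Counter(labels).most_common(1)[0][0]

train = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_predict(train, (1, 1)))  # majority of the 3 nearest is "A"
```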
How does a regression differ from a classification? Short
Regression predicts a continuous value (the position on the other axis of the graph).
Classification predicts the discrete group/class a point on the graph belongs to.
What learning approach does K-means use?
a. supervised
b. unsupervised
c. reinforcement
d. deep
e. meta-heuristics
f. convolutional
Unsupervised learning
Is K-means good for regression or classification?
K-means is a clustering algorithm, so it is mainly suited to classification (grouping points).
It could arguably be used for regression if the data forms a line: split the line into K groups and estimate the value of the other axis from the cluster a point falls into.
Does K-means need labeled data?
No. Unsupervised learning…
How does K-means work? Explain the algo in short.
Find K groups (clusters) within the data, each represented by a centroid. The groups found can then be used to classify new data.
The algo:
a. Start with K centroids (placed randomly)
b. Assign each data point to its nearest centroid
c. Move each centroid to the mean of the points assigned to it
d. Repeat steps b and c until a stopping criterion is met
(i.e., no data points change clusters, the sum of the distances is minimized, or some maximum number of iterations is reached).
Randomize the starting points and rerun the algo to find other results, then choose the best result.
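The steps above can be sketched in one dimension (illustrative code, not from the slides):

```python
# Minimal 1-D K-means sketch following steps a-d above.
import random

def kmeans(points, k, iters=100):
    centroids = random.sample(points, k)            # a. random start
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                            # b. assign to nearest centroid
            i = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[i].append(p)
        new = [sum(c) / len(c) if c else centroids[i]
               for i, c in enumerate(clusters)]     # c. move to cluster mean
        if new == centroids:                        # stop: nothing changed
            break
        centroids = new
    return sorted(centroids)

print(kmeans([1.0, 1.1, 0.9, 10.0, 10.2, 9.8], 2))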
Does K-means always give a result?
K-means is guaranteed to converge to a result; it does not shift around forever. That result MAY be a local optimum.
Does K-NN (K-Nearest Neighbours) need labeled data?
Yes. Supervised learning…
Does linear regression need labeled data?
Yes. Supervised learning…
What is the ‘search space’ in regards to neural networks? Short
The search space is the set of all possible ‘answers’ (candidate solutions) for a problem.
What are the most basic activation functions?
Hard limit (step function): the simplest! Converts its input to either 0 or 1; if the input is above some threshold it becomes a 1, otherwise a 0.
Linear: returns its input unchanged, or scaled by a constant.
Sigmoidal: more advanced, but essential to solving useful problems with ANNs; based on an S-shaped curve that squashes the input into the range (0, 1).
ReLU (Rectified Linear Unit): outputs the input if it is positive, otherwise 0, i.e. max(0, x).
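The four functions above can be written directly (a sketch; threshold and scale parameters are illustrative):

```python
# Sketches of the basic activation functions described above.
import math

def hard_limit(x, threshold=0.0):
    return 1.0 if x > threshold else 0.0   # step function: 0 or 1

def linear(x, scale=1.0):
    return scale * x                       # identity, optionally scaled

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))     # S-curve, output in (0, 1)

def relu(x):
    return max(0.0, x)                     # Rectified Linear Unit
```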
What is feed forward in regards to neural networks? Short
Feed forward refers to data flowing in one direction through the network: each node multiplies its inputs by its weights, sums them, applies an activation function, and passes the result on to the connected nodes in the next layer.
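A single feed-forward step for one neuron might look like this (illustrative sketch; ReLU is used here just as an example activation):

```python
# One feed-forward step: weighted sum of inputs plus bias, then activation.
def neuron_output(inputs, weights, bias):
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return max(0.0, z)  # ReLU activation as an example

print(neuron_output([1.0, 2.0], [0.5, 0.5], 0.0))  # → 1.5
```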
What are ‘learning rules’ in regards to neural networks? Name & explain them. Detailed
TLDR: Change weights based on the correctness of the prediction.
Perceptron Learning Rule: states that the algorithm will automatically learn the optimal weight coefficients. Single-layer perceptrons can learn only linearly separable patterns. Basically the same idea as error back-propagation, but with a single layer there is nothing to propagate through, so it has a different definition.
Error back-propagation rule: the training samples are fed through one by one, and for each, the actual output is compared to the correct output. The resulting error is propagated backwards and used to alter the weights between all neurons throughout the network.
Should datasets be split into multiple groups? Which ones and what ratio? Short
Datasets should be split into three groups with the ratio 70/20/10: 70% for training, 20% for evaluation (validation), and 10% for testing. Evaluation happens after every epoch to check the model's current output and detect overfitting. The test data, in contrast, is used after the model is trained to see how it performs on completely new data.
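A sketch of the 70/20/10 split (function name and seed are illustrative; shuffling first avoids ordering bias):

```python
# Split a dataset 70/20/10 into train/validation/test sets.
import random

def split_dataset(data, seed=42):
    data = data[:]                        # copy so the caller's list is untouched
    random.Random(seed).shuffle(data)     # shuffle before splitting
    n = len(data)
    train = data[: int(0.7 * n)]
    val   = data[int(0.7 * n): int(0.9 * n)]
    test  = data[int(0.9 * n):]
    return train, val, test

train, val, test = split_dataset(list(range(100)))
print(len(train), len(val), len(test))  # 70 20 10
```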
What are the effects of a NN (neural network) having a high ‘bias’? Short
A model with high bias tends to be smaller and/or simpler. If too much so, it fails to capture the relationships within the data, leading to lower accuracy. In other words, it’s biased toward seeing different values in the training data as too similar, lacking the ability to capture the variations.
What are the effects of a NN (neural network) having high ‘variance’? Short
A model with high variance tends to be larger and/or more complex. If too much so, it sees even slightly different samples as vastly different, giving inaccurate predictions, as it learns from the noise in the data. In other words, it sees a lot of variance in training data, even if some samples are very similar to each other.
Regularization techniques are used to balance bias and variance, and prevent overfitting. What are the ones covered in the slides? Detail
Lasso (L1)
Least Absolute Shrinkage and Selection Operator
Prevents overfitting by adding a penalty term to the loss function.
Remember, the loss function is the way the error is calculated.
It detects the synaptic weights associated with less important features, and drives them to zero value.
It therefore carries out feature selection internally.
Leads to smaller models
Ridge (L2)
Similar to L1, it adds an extra term to the loss function, which is related to the squares of the weights.
Unlike L1, which drives weights to zero, L2 drives those associated with less important features down to smaller values.
More computationally efficient than L1.
Dropout
Doesn’t change the loss function, but the network architecture itself.
Neurons in each layer are randomly “shut down” (or ignored), and the selection is changed every epoch.
It forces the network to learn more robust features and has been shown to increase generalization.
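The L1 and L2 penalty terms above can be sketched directly (the lambda values are illustrative hyperparameters, not from the slides):

```python
# L1 (Lasso) and L2 (Ridge) penalty terms added to a base loss.
def l1_penalty(weights, lam=0.01):
    # sum of absolute weights: pushes unimportant weights to exactly zero
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam=0.01):
    # sum of squared weights: shrinks unimportant weights toward zero
    return lam * sum(w * w for w in weights)

def regularized_loss(base_loss, weights, penalty=l2_penalty):
    return base_loss + penalty(weights)
```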
What are genetic algorithms? short
Genetic algorithms are algorithms that mimic evolution to ‘train’ a solution. It’s easy to get a good result using your domain-specific knowledge, as you can encode it directly into the starting population.
How does a genetic algorithm work? Simple step by step
GA procedure at a high level:
1. Initialize a population of solutions (chromosomes).
2. Evaluate each chromosome in the population using an objective function.
3. Create new solutions from the previous population by applying the reproduction & modification operations.
4. Replace the old population with the new one just created.
5. Repeat steps 2 to 4 for a number of iterations (generations).
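The steps above can be sketched on the classic "OneMax" toy problem, maximizing the number of 1-bits in a chromosome (all parameters here are illustrative):

```python
# Toy GA following steps 1-5: maximize the number of 1-bits ("OneMax").
import random

def run_ga(length=20, pop_size=30, generations=50, seed=0):
    rng = random.Random(seed)
    fitness = lambda c: sum(c)                       # objective function
    pop = [[rng.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]                 # 1. initialize population
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)          # 2. evaluate & rank
        new_pop = pop[:2]                            # elitism: keep top 2 as-is
        while len(new_pop) < pop_size:
            p1, p2 = rng.sample(pop[:10], 2)         # reproduction: pick fit parents
            cut = rng.randrange(1, length)           # 3. single-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.1:                   # 3. occasional mutation
                i = rng.randrange(length)
                child[i] ^= 1
            new_pop.append(child)
        pop = new_pop                                # 4. replace old population
    return max(pop, key=fitness)                     # best after 5. (iterations)

best = run_ga()
```

After 50 generations the best chromosome is at or near the all-ones optimum.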
What is reproduction in a genetic algorithm? Short
Reproduction is the process in which individual solutions are ranked according to their objective function values (i.e., performance) and selected accordingly for the next generation.
What is crossover in a genetic algorithm? Short
The idea of crossover is that the genetic material (i.e. binary patterns) of parents is combined to produce new solutions (children) that will ‘hopefully’ benefit from the strengths of both parents. A biased roulette wheel (fitness-proportionate selection) is used to select the parents. Crossover is achieved by exchanging coding bits between the two ‘mated’ strings.
What is elitism in a genetic algorithm? Short
Take the top x percent of performers and include them in the next population without any reproduction/mutation.
What is mutation in a genetic algorithm? Short
This is the occasional random alteration of the value of a bit in the string.
The mutation operation plays the role of occasionally providing new material to add to the diversity of the search. This gives the algorithm the ability to potentially find a new/better local optimum.
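Single-point crossover and bit-flip mutation on bit strings can be sketched as (illustrative; the mutation rate is an assumed hyperparameter):

```python
# Single-point crossover and bit-flip mutation on bit-string chromosomes.
import random

def crossover(parent1, parent2, point):
    # exchange coding bits after the crossover point
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def mutate(chromosome, rate=0.01, rng=random):
    # occasionally flip a bit to add diversity to the search
    return [b ^ 1 if rng.random() < rate else b for b in chromosome]

c1, c2 = crossover([1, 1, 1, 1], [0, 0, 0, 0], 2)
print(c1, c2)  # [1, 1, 0, 0] [0, 0, 1, 1]
```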