Midterm 2 Flashcards

Question 1

Q

What is artificial intelligence?

Answer

A

A computer program that mimics the intelligence of humans

Question 2

Q

What is machine learning?

Answer

A

A technique a computer can use to learn complex rules (patterns) from data, rather than having those rules explicitly programmed

Question 3

Q

What is deep learning?

Answer

A

A machine learning technique based on artificial neural networks with many layers, inspired by the neurons in the brain

Question 4

Q

List the three types of machine learning and explain them

Answer

A

Unsupervised learning - no labels or feedback are given to the algorithm
Supervised learning - every training example has a label
Reinforcement learning - the agent receives a reward or punishment for each action

Question 5

Q

How does supervised learning training work?

Answer

A

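The training set is a collection of labelled examples {(x_i, y_i)}, i = 1..N, where each x_i is a feature vector with D dimensions and y_i is its label. The algorithm uses these examples to learn a model that predicts the label of a new feature vector.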

Question 6

Q

What is k-nearest neighbors?

Answer

A

Classifies a new example by finding the k training examples closest to it (those with the most similar feature values) and assigning the class most common among those neighbors
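A minimal k-nearest neighbors sketch in plain Python, assuming numeric feature tuples, Euclidean distance and a simple majority vote. The function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training examples."""
    # Euclidean distance from the query to every training example
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep only the k closest neighbours
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
train_y = ["red", "red", "red", "blue", "blue", "blue"]
print(knn_predict(train_X, train_y, (1.2, 1.4)))  # red
print(knn_predict(train_X, train_y, (8.7, 8.2)))  # blue
```

The value of k is a design choice. A small k follows the training data very closely. A larger k smooths the decision.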

Question 7

Q

What is linear regression?

Answer

A

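Supervised learning used to predict a continuous numerical target by fitting a linear function of the features to the data. It enables us to identify a linear trend, and outliers stand out as points far from the fitted line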
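A sketch of one-feature linear regression using the ordinary least squares closed form. The points far from the line are flagged through their residuals. The data, including the outlier at x = 3, is made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = w * x + b for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept makes the line pass through the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


# Hypothetical data with one obvious outlier at x = 3
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 30, 8, 10]
w, b = fit_line(xs, ys)
# Residual = distance of each point from the fitted line; the largest one flags the outlier
residuals = [y - (w * x + b) for x, y in zip(xs, ys)]
outlier_index = max(range(len(xs)), key=lambda i: abs(residuals[i]))
print(w, b, outlier_index)
```

Least squares is itself sensitive to outliers, because they pull the fitted line toward them.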

Question 8

Q

What is binary classification?

Answer

A

Supervised learning where the objective is to assign every instance to one of 2 classes (e.g. with logistic regression)

Question 9

Q

What is multi-class classification?

Answer

A

Supervised learning into 3 or more discrete classes. It can be transformed into multiple binary classification problems:
- one vs all (OvA)

Question 10

Q

Explain one vs all

Answer

A

A separate binary classifier is trained for each class. Each classifier labels its own class as positive and all other classes as negative. The final assignment is the class whose classifier gives the highest confidence score.
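A sketch of the one-vs-all wrapper. It accepts any binary trainer that returns a confidence-scoring function. The toy trainer `train_centroid_scorer` is hypothetical and stands in for a real binary classifier such as logistic regression. Its confidence is how much closer x is to the positive-class centroid than to the negative-class centroid.

```python
import math


def train_one_vs_all(X, y, train_binary):
    """Train one binary scorer per class: that class is positive (1), all others negative (0)."""
    return {c: train_binary(X, [1 if label == c else 0 for label in y]) for c in sorted(set(y))}


def predict_one_vs_all(scorers, x):
    # Final assignment: the class whose binary classifier gives the highest confidence score
    return max(scorers, key=lambda c: scorers[c](x))


def train_centroid_scorer(X, targets):
    """Hypothetical toy binary classifier.

    Confidence = distance to the negative-class centroid minus distance to the
    positive-class centroid, so larger values mean "more likely positive".
    """
    def centroid(points):
        return [sum(coords) / len(points) for coords in zip(*points)]

    pos = centroid([x for x, t in zip(X, targets) if t == 1])
    neg = centroid([x for x, t in zip(X, targets) if t == 0])
    return lambda x: math.dist(x, neg) - math.dist(x, pos)


X = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
y = ["a", "a", "b", "b", "c", "c"]
scorers = train_one_vs_all(X, y, train_centroid_scorer)
print(predict_one_vs_all(scorers, (0.2, 0.5)))  # a
```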

Question 11

Q

What is a decision boundary?

Answer

A

A boundary which partitions the underlying feature space into regions corresponding to different class labels

Question 12

Q

What is linearly separable data?

Answer

A

Data is linearly separable when 2 classes can be perfectly separated by a single linear boundary (a line in 2D, a plane in 3D, a hyperplane in higher dimensions)

Question 13

Q

What is the difference between a simple decision boundary and a complex one?

Answer

A

Simple: the boundary comes from a linear or low-degree polynomial function, so it is smooth.
Complex: an irregular decision boundary, such as one generated by a decision tree.

Question 14

Q

What is logistic regression?

Answer

A

A binary (0/1) classification algorithm that determines the probability that a given instance x_i belongs to the positive class, by applying the logistic function to a weighted sum of its features

Question 15

Q

Explain the logistic function

Answer

A

Maps any real-valued input to the open interval (0, 1) using σ(t) = 1 / (1 + e^(-t)). It is called a squashing function because it maps a wide input domain (all real numbers) to a constrained output range.
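A sketch of the logistic function and of how logistic regression uses it to turn a weighted sum into a probability. The branch on the sign of t is one common way to avoid `math.exp` overflow. The weights in the usage line are hypothetical.

```python
import math


def logistic(t):
    """Logistic (sigmoid) function: squashes any real t into the open interval (0, 1)."""
    # Two algebraically equal forms, chosen by sign so math.exp never overflows
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Logistic regression: probability that feature vector x belongs to the positive class."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return logistic(z)


print(logistic(0.0))                                # 0.5
print(predict_proba([2.0, -1.0], 0.5, [1.0, 1.0]))  # logistic(1.5), about 0.82
```

In floating point, very large |t| rounds to exactly 0.0 or 1.0, even though the mathematical range is open.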

Question 16

Q

What is underfitting?

Answer

A

The model is too simple to capture the patterns in the data, so it cannot classify accurately. A model is underfitting if it performs poorly on both the training and test data, and adding more data doesn't correct the issue

Question 17

Q

What is overfitting?

Answer

A

The model is too complex for the given problem (e.g. a tall decision tree, or a deep and wide neural network). It fits the training set, including its noise, so closely that it performs excellently on the training set but poorly on the test set. Using too many features can also cause it

Question 18

Q

Explain learning curves

Answer

A

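Plots of the model's performance, measured with root mean square error (RMSE), on both the training and test (or validation) sets as a function of training set size (or training iteration). The gap between the two curves helps diagnose underfitting versus overfitting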

Question 19

Q

What is the Bias/Variance trade off?

Answer

A

Bias -> error from overly simplistic models; high bias = underfitting
Variance -> error from overly complex models that are sensitive to fluctuations in the training data; high variance = overfitting
Trade-off -> decreasing one typically increases the other, so aim for the balance that generalizes best to new data

Question 20

Q

Explain the confusion matrix

Answer

A

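A table comparing predicted labels with actual labels. For binary classification it displays the true positives, false negatives, false positives and true negatives. For multi-class problems it has one row and one column per label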
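A sketch that builds a binary confusion matrix from lists of actual and predicted labels. The layout is an assumption: rows are actual classes and columns are predicted classes, with the positive class first. Libraries differ on this ordering.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Binary confusion matrix as [[TP, FN], [FP, TN]] (rows = actual, columns = predicted)."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1  # positive correctly predicted positive
            else:
                fn += 1  # positive wrongly predicted negative
        else:
            if predicted == positive:
                fp += 1  # negative wrongly predicted positive
            else:
                tn += 1  # negative correctly predicted negative
    return [[tp, fn], [fp, tn]]


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_matrix(y_true, y_pred))  # [[3, 1], [1, 3]]
```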

Question 21

Q

What is accuracy?

Answer

A

The ratio of correctly predicted instances to the total number of predictions: accuracy = (TP + TN) / (TP + TN + FP + FN)

Question 22

Q

What is precision?

Answer

A

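The ratio of true positives to all positive predictions: precision = TP / (TP + FP). In other words, of everything the model labelled positive, how much was actually positive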
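A sketch computing accuracy and precision from confusion-matrix counts. The counts are hypothetical. Returning 0.0 when nothing was predicted positive is a convention chosen here, because precision is undefined in that case.

```python
def accuracy(tp, tn, fp, fn):
    """Correct predictions divided by all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """True positives divided by everything predicted positive (TP + FP)."""
    # Convention (an assumption here): report 0.0 when nothing was predicted positive
    return tp / (tp + fp) if tp + fp > 0 else 0.0


# Hypothetical counts from a confusion matrix
tp, tn, fp, fn = 40, 30, 10, 20
print(accuracy(tp, tn, fp, fn))  # 0.7
print(precision(tp, fp))         # 0.8
```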

Question 23

Q

Explain the holdout method

Answer

A

Allocate roughly 80% of your dataset for training and reserve the remaining 20% for testing
- Training error: should generally be low; if it is not, something is wrong with the model or training setup (e.g. underfitting)
- Generalization error: the error rate observed when the model is evaluated on new, unseen data (estimated using the test set)
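A sketch of an 80/20 holdout split. The data is shuffled first so the test set is not just the last rows of the file. The seed and the function name are illustrative choices.

```python
import random


def holdout_split(data, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data, then reserve `test_ratio` of it for testing."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_ratio)
    test, train = shuffled[:n_test], shuffled[n_test:]
    return train, test


train, test = holdout_split(range(100))
print(len(train), len(test))  # 80 20
```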

Question 24

Q

What is cross validation?

Answer

A

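A method for evaluating models and improving their performance (e.g. by tuning hyperparameters). It involves partitioning the dataset into multiple subsets, training on some of them and validating on the rest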

Question 25

Q

Explain k-fold cross validation

Answer

A

Divide the dataset into k equally sized folds
Training and validation - in each of the k iterations, one fold is used as the validation set and the remaining k - 1 folds as the training set
Evaluation - the model's performance is evaluated in each iteration, resulting in k performance measures
Aggregation - summary statistics (e.g. mean and standard deviation) are calculated from the k performance measures
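A sketch of the four k-fold steps. `train_and_score` is a hypothetical callback that trains on the given training indices and returns a score on the validation indices. For simplicity this version does not shuffle. Real data should usually be shuffled (or stratified) before splitting into folds.

```python
import statistics


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for each of the k folds; every index is validated exactly once."""
    indices = list(range(n_samples))
    # Fold sizes differ by at most one when n_samples is not divisible by k
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def cross_validate(n_samples, k, train_and_score):
    """Run k iterations and aggregate the k scores into a mean and standard deviation."""
    scores = [train_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return statistics.mean(scores), statistics.pstdev(scores)


folds = list(k_fold_indices(10, 3))
print([len(val) for _, val in folds])  # [4, 3, 3]
```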

Question 26

Q

What are the benefits of k-fold cross-validation compared to a normal train/test split?

Answer

A

A more reliable estimate of model performance, because every example is used for validation exactly once
Better generalization and reduced variability of the performance estimate
Works very well for hyperparameter tuning

Question 27

Q

Challenges of k-fold cross-validation

Answer

A

Computationally costly - training is already expensive, and k-fold repeats it k times
Class imbalance - folds may not represent minority classes (if one fold contains a disproportionate share of one class, it can skew training or validation)
Error prone

Question 28

Q

What is a hyperparameter?

Answer

A

A hyperparameter is a configuration external to the model that is set prior to the training process and dictates the learning process

Question 29

Q

Grid search

Answer

A

Exhaustively enumerates all combinations of the candidate hyperparameter values
For each combination, train on the training set and evaluate on the validation set; keep the combination that performs best
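A sketch of grid search using `itertools.product` to enumerate every combination. `train_and_evaluate` is a hypothetical callback that trains on the training set and returns a validation score (higher is better). A made-up scoring function stands in for real training.

```python
from itertools import product


def grid_search(param_grid, train_and_evaluate):
    """Evaluate every combination in param_grid; return the best params and score (higher is better)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        # Hypothetical callback: train on the training set, return a score on the validation set
        score = train_and_evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy stand-in for "train then validate": the score peaks at lr = 0.1, depth = 3
def fake_validation_score(params):
    return -(params["lr"] - 0.1) ** 2 - (params["depth"] - 3) ** 2


best, score = grid_search({"lr": [0.01, 0.1, 1.0], "depth": [1, 3, 5]}, fake_validation_score)
print(best)  # {'lr': 0.1, 'depth': 3}
```

The number of runs is the product of the list lengths (9 here). That product grows quickly as hyperparameters are added.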

Question 30

Q

Data augmentation

Answer

A

A technique used to increase the diversity of a dataset by applying various transformations to the existing data (e.g. flipping, rotating or cropping images)

Question 31

Q

What is one-hot encoding?

Answer

A

A technique that converts a categorical variable into a binary vector with one position per category: the position of the item's category is 1 and all others are 0 (e.g. with five categories, instead of labelling something as category 5 it becomes [0, 0, 0, 0, 1])
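A minimal one-hot encoder, assuming the full list of categories is known in advance. Unknown values raise an error rather than being silently encoded.

```python
def one_hot(value, categories):
    """Binary vector with a single 1 at the position of `value` in `categories`."""
    vector = [0] * len(categories)
    vector[categories.index(value)] = 1  # raises ValueError for an unknown category
    return vector


print(one_hot("green", ["red", "green", "blue"]))  # [0, 1, 0]
print(one_hot(5, [1, 2, 3, 4, 5]))                 # [0, 0, 0, 0, 1]
```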

Question 32

Q

Explain why one-hot encoding is beneficial

Answer

A

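Encoding categories as plain integers (1, 2, 3, ...) would imply an order and magnitude that don't exist (e.g. that category 3 is "greater" than category 1), which biases the model; one-hot encoding avoids this. The cost is that it increases the dimensionality of the feature vectors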

Question 33

Q

What is binning (feature engineering)?

Answer

A

Placing continuous values into discrete bins (categories), e.g. ages into child, teen, adult and senior
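A sketch of binning ages with `bisect_right`. The bin edges below are hypothetical cut-offs chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical bin edges: child < 13, teen 13-19, adult 20-64, senior >= 65
AGE_EDGES = [13, 20, 65]
AGE_LABELS = ["child", "teen", "adult", "senior"]


def age_bin(age):
    """Map a continuous age onto one of the discrete bins."""
    # bisect_right counts how many edges are <= age, which is the bin index
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


print([age_bin(a) for a in [7, 13, 19, 20, 64, 65, 80]])
# ['child', 'teen', 'teen', 'adult', 'adult', 'senior', 'senior']
```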

Question 34

Q

What is normalization?

Answer

A

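A scaling technique that converts the actual range of a feature's values into a standard range, typically [0, 1] or [-1, 1] (for [0, 1]: x' = (x - min) / (max - min)). It accelerates optimization, because algorithms perform best when feature values are within similar ranges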
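A sketch of min-max normalization into an arbitrary target range. Mapping a constant feature to the lower bound is a choice made here to avoid dividing by zero.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid dividing by zero
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


print(min_max_normalize([10, 20, 30]))           # [0.0, 0.5, 1.0]
print(min_max_normalize([0, 5, 10], -1.0, 1.0))  # [-1.0, 0.0, 1.0]
```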

Question 35

Q

What is standardization?

Answer

A

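Rescales each feature to have a mean of 0 and a standard deviation of 1 (z-score: x' = (x - μ) / σ). It does not change the shape of the distribution, so it does not make a feature normally distributed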
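A sketch of z-score standardization. It uses the population standard deviation (`statistics.pstdev`). Using the sample standard deviation is an equally common alternative.

```python
import statistics


def standardize(values):
    """Z-score: subtract the mean and divide by the (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:  # constant feature: every value equals the mean
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The output keeps the same skew as the input (the long tail toward 9 is still there). Only the location and scale change.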

Question 36

Q

Standardization or Normalization?

Answer

A

-> Use standardization for unsupervised learning, or if a feature resembles a normal distribution
-> Use standardization if a feature has outliers (extremely high or low values), since it handles them better; otherwise, use normalization

Question 37

Q

What is data imputation?

Answer

A

Data imputation -> the process of replacing missing values in a dataset using statistics or machine learning

Question 38

Q

Data imputation strategies

Answer

A

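Mean, median or mode replacement -> replace each missing value with the feature's mean, median or mode
Special value method -> replace missing values with a value outside the feature's normal range, which signals that the value was missing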
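A sketch of both imputation strategies. It assumes missing entries are stored as `None` and that -1 is outside the feature's normal range (e.g. for ages). Both assumptions are for illustration only.

```python
import statistics


def impute(values, strategy="mean", special_value=-1):
    """Return a copy of `values` with missing entries (None) filled in."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    elif strategy == "special":
        # Special value method: a value outside the feature's normal range marks "was missing"
        fill = special_value
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


ages = [20, None, 30, 40, None, 30]
print(impute(ages, "mean"))     # [20, 30.0, 30, 40, 30.0, 30]
print(impute(ages, "special"))  # [20, -1, 30, 40, -1, 30]
```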

Question 39

Q

What is class imbalance?

Answer

A

A scenario where the number of instances in one class significantly outnumbers the instances of another one -> the model becomes biased towards the dominant majority class

Question 40

Q

Explain the solutions to class imbalance

Answer

A

Oversampling the minority class -> can lead to overfitting and poor performance on the test data

Undersampling the majority class -> losing information about the majority class can lead to underfitting

Synthetic data -> generate new, artificial minority-class examples
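A sketch of the first strategy, random oversampling. Randomly chosen minority examples are duplicated until every class matches the largest one. Because the added rows are exact copies, this illustrates why oversampling can lead to overfitting.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen examples of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(members))  # exact copies, hence the overfitting risk
            y_out.append(label)
    return X_out, y_out


X = list(range(10))
y = [0] * 8 + [1] * 2  # class 1 is the minority
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # Counter({0: 8, 1: 8})
```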

Question 41

Q

Deep learning - how it works

Answer

A

Machine learning technique that can be applied to supervised learning, unsupervised learning and reinforcement learning
It is inspired by biological neurons and uses multiple layers of connected artificial neurons to learn from data, e.g. to classify things

Question 42

Q

Explain the layers of a neural network

Answer

A

Input layer: where the data enters the network; its size corresponds to the number of attributes (features) in the data
Hidden layers: intermediate layers that transform the data into representations useful for the task
Output layer: produces the final result, e.g. the predicted class

Question 43

Q

What is an activation function in relation to neural networks?

Answer

A

An activation function is applied to the output of each neuron (the weighted sum of its inputs plus bias) and introduces non-linearity into the neural network; without it, a stack of layers would collapse into a single linear function. Loosely, a neuron "fires" (is activated) when the input passed to it exceeds a threshold

Question 44

Q

Explain the three common activation functions

Answer

A

Sigmoid -> (exponential-based logistic function, 1 / (1 + e^(-x))) produces outputs in (0, 1)
Tanh -> (hyperbolic tangent) produces outputs in (-1, 1)
ReLU -> (rectified linear unit, max(0, x)) produces outputs in the interval [0, infinity)
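A sketch of the three activation functions as scalar Python functions. `math.tanh` is the standard-library hyperbolic tangent. The sigmoid is written in a form that avoids overflow for large negative inputs.

```python
import math


def sigmoid(x):
    """Logistic function, output in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # rearranged for negative x so exp never overflows
    return e / (1.0 + e)


def tanh(x):
    """Hyperbolic tangent, output in (-1, 1); equals 2 * sigmoid(2x) - 1."""
    return math.tanh(x)


def relu(x):
    """Rectified linear unit: max(0, x), output in [0, infinity)."""
    return max(0.0, x)


for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), tanh(x), relu(x))
```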

Question 45

Q

Explain the universal approximation theorem

Answer

A

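A feedforward neural network with a single hidden layer containing enough neurons (and a non-linear activation function) can approximate any continuous function on a closed, bounded domain to arbitrary accuracy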

Question 46

Q

What is back-propagation

Answer

A

A learning procedure which repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector.

Question 47

Q

Explain the backpropagation steps

Answer

A

1. Initialization
2. Forward pass
3. Compute loss
4. Backpropagation (backward pass and weight update)
5. Repeat steps 2 to 4

The algorithm stops either after a fixed number of epochs or when a convergence criterion is satisfied
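A minimal sketch of the training loop for a tiny network: one hidden layer of sigmoid units, one sigmoid output, squared-error loss, and weights updated after every example. The network size, learning rate, epoch count, seed and XOR data are all hypothetical choices. Whether this particular run fully learns XOR depends on the random initialization, so only the falling loss is relied on.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_network(data, n_hidden=3, lr=0.5, epochs=5000, seed=0):
    """Train a 1-hidden-layer sigmoid network with per-example backpropagation.

    Returns the trained parameters and the mean loss of each epoch.
    """
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Initialization: small random weights break symmetry; biases start at zero
    W1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    b2 = 0.0
    history = []
    for _ in range(epochs):  # repeat the steps below; stop after a fixed number of epochs
        epoch_loss = 0.0
        for x, y in data:
            # Forward pass: weighted sum plus bias, then activation, layer by layer
            h = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(n_hidden)]
            out = sigmoid(sum(W2[j] * h[j] for j in range(n_hidden)) + b2)
            # Compute loss: squared error between prediction and target
            epoch_loss += 0.5 * (out - y) ** 2
            # Backward pass: chain rule at the output layer first ...
            delta_out = (out - y) * out * (1.0 - out)
            # ... then propagate the error to the hidden layer (uses W2 before it is updated)
            delta_hid = [delta_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            # Update weights and biases using the gradients and the learning rate
            for j in range(n_hidden):
                W2[j] -= lr * delta_out * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * delta_hid[j] * x[i]
                b1[j] -= lr * delta_hid[j]
            b2 -= lr * delta_out
        history.append(epoch_loss / len(data))
    return (W1, b1, W2, b2), history


xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
params, history = train_network(xor)
print(history[0], history[-1])  # the mean loss should fall over training
```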

Question 48

Q

Explain the initialization step

Answer

A

Initializing the weights and biases of the neural network
1. Zero initialization - all weights are 0
- doesn't work well because it fails to break symmetry: all neurons in a layer produce identical outputs and receive identical updates
2. Random initialization - weights are initialized randomly using uniform or normal distributions

Question 49

Q

Explain the forward pass step

Answer

A

Data is passed to the first layer -> for each hidden layer, compute the activations by taking the weighted sum of the inputs plus the bias -> followed by an activation function
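A sketch of the forward pass. Each layer takes, for every neuron, the weighted sum of its inputs plus its bias and then applies an activation. The 2-2-1 network and its hand-picked weights are hypothetical, chosen so the result is easy to check by hand.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense_forward(inputs, weights, biases, activation):
    """One layer: for each neuron, weighted sum of the inputs plus its bias, then the activation."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the input through each (weights, biases, activation) layer in turn."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense_forward(activations, weights, biases, activation)
    return activations


# Hypothetical 2-2-1 network with hand-picked weights
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0], relu),  # hidden layer: 2 inputs -> 2 neurons
    ([[2.0, -1.0]], [0.5], identity),               # output layer: 2 -> 1
]
print(forward([3.0, 1.0], layers))  # [1.5]
```

By hand: the hidden activations are relu(3 - 1) = 2 and relu(1.5 + 0.5 + 1) = 3, and the output is 2*2 - 3 + 0.5 = 1.5.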

Question 50

Q

Explain the compute loss step

Answer

A

Calculate the error (loss) using a suitable function by comparing predicted values with actual target values -> A smaller loss indicates that predicted values are closer to the actual target values

Question 51

Q

Explain the Backwards pass step

Answer

A

Output layer: compute the gradient of the loss with respect to the output layer's weights and biases using the chain rule of calculus

Hidden layers: propagate the error backwards through the network, layer by layer. For each layer, compute the gradient of the loss with respect to its weights and biases

Update the weights and biases: adjust the weights and biases using the calculated gradients and a learning rate

Question 52

Q

What is gradient descent

Answer

A

An optimization algorithm used to minimize the loss function by iteratively taking steps in the direction of steepest descent, as defined by the negative gradient
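A sketch of gradient descent on a one-variable function, f(x) = (x - 3)^2, whose gradient 2(x - 3) is known in closed form. The function and step settings are hypothetical. A second call with a too-large learning rate shows the updates overshooting and diverging.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient: x <- x - lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def grad_f(x):
    # Gradient of f(x) = (x - 3)^2; the minimum is at x = 3
    return 2 * (x - 3)


x_min = gradient_descent(grad_f, x0=0.0)
print(x_min)  # very close to 3.0

x_bad = gradient_descent(grad_f, x0=0.0, lr=1.1, steps=50)
print(x_bad)  # far from 3: each step overshoots by more than the last
```

With lr = 0.1, each step shrinks the distance to 3 by a factor of 0.8. With lr = 1.1, it multiplies the distance by -1.2.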

Question 53

Q

Explain weight initialization and how vanishing gradients were solved

Answer

A

Random initialization breaks the symmetry and allows for effective learning

Glorot (Xavier) initialization fixes it for sigmoid and tanh by keeping the variance of the signal roughly constant across layers

He initialization is optimal for ReLU and its variants
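A sketch of the two initialization schemes for a fully connected layer. Glorot uniform draws from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). He normal draws from a normal distribution with standard deviation sqrt(2 / fan_in). This sketch uses a plain normal distribution, although some libraries use a truncated one. The layer sizes are hypothetical.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng):
    """Glorot (Xavier) uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]


def he_normal(fan_in, fan_out, rng):
    """He normal: N(0, std^2) with std = sqrt(2 / fan_in), suited to ReLU layers."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


rng = random.Random(0)
W_tanh = glorot_uniform(256, 128, rng)  # weights for a 256 -> 128 tanh layer
W_relu = he_normal(256, 128, rng)       # weights for a 256 -> 128 ReLU layer
```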

Question 54

Q

Explain how learning rate is important to optimization

Answer

A

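The learning rate determines the step size during optimization: too small and training converges very slowly; too large and the updates overshoot the minimum and may diverge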

Question 55

Q

Explain the hierarchy of concepts

Answer

A

Each layer detects patterns from the output of the layer preceding it -> in other words the network uncovers patterns of patterns

Question 56

Q

What is a convolutional neural network (CNN)?

Answer

A

Crucial pattern information is often local (e.g. an edge in the top-left corner)
Convolutional layers reduce the number of parameters significantly because each neuron is not fully connected to the preceding layer, only to the neurons in its receptive field

Question 57

Q

What is a kernel in the context of convolutional neural networks?

Answer

A

A kernel is a small matrix (e.g. n x n) that slides over input data such as an image to perform convolution. At each position, the values in the kernel are multiplied by the values in the region of the input they overlap, and all the products are summed into a single scalar value. The kernel is moved across the entire image (typically one pixel at a time); the resulting output matrix is the feature map
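A sketch of sliding a kernel over an image with stride 1 and no padding. Each output value is the sum of the element-wise products between the kernel and the patch it overlaps. Like most deep learning libraries, it does not flip the kernel. The image and the 2 x 2 edge-detecting kernel are hypothetical.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution with stride 1 and no padding.

    Like most deep learning libraries, this does not flip the kernel
    (strictly speaking it is cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel by the patch it overlaps and sum into one scalar
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]  # hypothetical hand-made edge detector
print(convolve2d(image, vertical_edge))  # [[0, 2, 0], [0, 2, 0]]
```

The feature map is largest exactly where the dark-to-bright vertical edge sits. In a CNN, the kernel values would be learned rather than hand-made.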

Question 58

Q

What is a receptive field

Answer

A

Each unit is connected to the neurons in its receptive field -> unit (i, j) in layer l is connected to the units in rows i to i + fh - 1 and columns j to j + fw - 1 of layer l - 1, where fh and fw are the height and width of the receptive field

Question 59

Q

What is padding?

Answer

A

Zero padding -> to keep a layer the same height and width as the previous layer, the input grid is padded with zeros around its border -> this also lets the kernel cover the units at the edges of the input
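A sketch of zero padding a 2D grid by p units on every side. With a 3 x 3 kernel, stride 1 and p = 1, an n x n input yields an n x n output (n + 2 - 3 + 1 = n). This is how padding keeps layers the same size.

```python
def zero_pad(grid, p):
    """Surround a 2D grid with p rows/columns of zeros on every side."""
    width = len(grid[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]            # top border
    for row in grid:
        padded.append([0] * p + list(row) + [0] * p)    # left and right borders
    padded.extend([0] * width for _ in range(p))        # bottom border
    return padded


padded = zero_pad([[1, 2], [3, 4]], 1)
print(padded)  # [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
```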

Question 60

Q

Explain the stride

Answer

A

Stride -> it is possible to connect a larger layer (l - 1) to a smaller one (l) by spacing out the receptive fields, i.e. skipping units. The shift from one receptive field to the next is called the stride
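A sketch of how stride shrinks the output along one axis. It uses the standard size formula (n + 2p - f) // s + 1, where n is the input size, f the window size, p the padding and s the stride. A helper lists where each receptive field starts.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Number of positions a size-f window can take along an input of size n."""
    return (n + 2 * padding - f) // stride + 1


def window_starts(n, f, stride=1, padding=0):
    """Start index (in the padded input) of each receptive field along one axis."""
    return list(range(0, n + 2 * padding - f + 1, stride))


print(conv_output_size(7, 3, stride=1), window_starts(7, 3, stride=1))  # 5 [0, 1, 2, 3, 4]
print(conv_output_size(7, 3, stride=2), window_starts(7, 3, stride=2))  # 3 [0, 2, 4]
```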

Question 61

Q

What are filters?

Answer

A

A window of size fh x fw is moved over the output of layer l - 1, referred to as the input feature map
- For each location, the element-wise product between the extracted patch and a matrix of the same size, known as the convolution kernel or filter, is computed and summed

Question 62

Q

Explain the kernel parameters and where they originate

Answer

A

The parameters of the kernel are learned through backpropagation allowing the network to optimize its feature extraction capabilities based on the training data

Question 63

Q

What is a feature map?

Answer

A

In a CNN, the output of a convolution operation is the feature map

Question 64

Q

What is the bias term?

Answer

A

A single bias term (one per filter) is added uniformly to all entries of the feature map -> this bias helps adjust the activation level

Answer 65

A

Basically a convolutional layer, except that it has no weights; instead it applies an aggregation function, normally max or mean -> each neuron in a pooling layer is connected to the neurons in its receptive field
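A sketch of a pooling layer that aggregates each window with max or mean and has no learned weights. The 2 x 2 window, stride 2 and the small feature map are illustrative choices.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Pooling layer: aggregate each size x size window with max or mean (no learned weights)."""
    def aggregate(values):
        return max(values) if mode == "max" else sum(values) / len(values)

    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            aggregate([feature_map[i * stride + di][j * stride + dj]
                       for di in range(size) for dj in range(size)])
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


fm = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
print(pool2d(fm))               # [[4, 2], [2, 8]]
print(pool2d(fm, mode="mean"))  # [[2.5, 1.0], [1.25, 6.5]]
```

The 4 x 4 map shrinks to 2 x 2, which is the dimensionality reduction pooling provides.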

Answer 66

A

Dimensionality reduction -> reduces the spatial dimensions of the input feature maps, decreasing the number of parameters and the computational load in later layers

Feature extraction -> summarizes each region, discarding less important details

Translation invariance -> the network becomes less sensitive to small shifts in the input

Noise reduction -> smooths noise through aggregation

Answer 67

A

Observability: partially or fully observable
Agent composition: single or multiple agents
Predictability: deterministic or stochastic
State dependency: stateless or stateful
Temporal dynamics: static or dynamic
State representation: discrete or continuous

Answer 68

A

A collection of states (state space) -> an initial state where the agent begins -> one or more goal states -> a set of actions available in each state -> a transition model that determines the next state based on the current state and action

Answer 69

A

Search that uses heuristic functions to estimate the cost of reaching the goal from a given state

Answer 70

A

An improvement on breadth-first search -> uses heuristics to prioritize nodes that seem closer to the goal -> it uses a priority queue sorted by estimated cost
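A sketch of the search described above, implemented as greedy best-first search. The frontier is a priority queue (`heapq`) ordered only by the heuristic estimate h(n). The empty 5 x 5 grid and its Manhattan-distance heuristic are a hypothetical example. Greedy best-first search is not guaranteed to find the shortest path in general.

```python
import heapq
from itertools import count


def greedy_best_first(start, is_goal, successors, h):
    """Always expand the frontier node with the lowest heuristic estimate h(n); return a path or None."""
    tie = count()  # tie-breaker so heapq never has to compare states directly
    frontier = [(h(start), next(tie), start)]  # priority queue sorted by estimated cost
    parent = {start: None}
    while frontier:
        _, _, state = heapq.heappop(frontier)
        if is_goal(state):
            path = []
            while state is not None:  # walk back through parents to rebuild the path
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:  # skip states that were already reached
                parent[nxt] = state
                heapq.heappush(frontier, (h(nxt), next(tie), nxt))
    return None


# Hypothetical example: move on an empty 5 x 5 grid from (0, 0) to (4, 4)
GOAL = (4, 4)


def grid_successors(cell):
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= r + dr < 5 and 0 <= c + dc < 5:
            yield (r + dr, c + dc)


path = greedy_best_first((0, 0), lambda s: s == GOAL, grid_successors,
                         lambda s: abs(s[0] - GOAL[0]) + abs(s[1] - GOAL[1]))
print(len(path) - 1)  # 8 moves
```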

Answer 71

A

Specific to the 8-tile problem (8-puzzle) -> calculates the sum of the distances of the tiles from their goal positions (counted in horizontal plus vertical moves)
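A sketch of this heuristic. It assumes the distance is Manhattan distance (rows plus columns) and that the blank is not counted. The encoding is also an assumption made for illustration: the board is a 9-tuple read row by row, with 0 for the blank.

```python
def manhattan_heuristic(state, goal):
    """Sum over tiles of |row - goal_row| + |col - goal_col|; the blank (0) is not counted."""
    # Assumed encoding: a tuple of 9 entries read row by row, with 0 for the blank
    goal_pos = {tile: divmod(i, 3) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        row, col = divmod(i, 3)
        goal_row, goal_col = goal_pos[tile]
        total += abs(row - goal_row) + abs(col - goal_col)
    return total


goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)
print(manhattan_heuristic((1, 2, 3, 4, 5, 6, 0, 7, 8), goal))  # 2: tiles 7 and 8 are each one step away
```

Each move slides one tile one square, so this sum never overestimates the number of moves left. That property makes it a safe estimate for a priority queue.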
