Module 7 Flashcards
Overview of logistic regression
- Used to estimate the probability that an event will occur as a function of other variables
- Can be considered a classifier as well
Describe the inputs and outputs of logistic regression
Input - Variables can be continuous or discrete
Output - A set of coefficients that indicate the relative impact of each driver, plus a linear expression for predicting the log-odds of the outcome as a function of the drivers
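A minimal sketch of these inputs and outputs using scikit-learn; the drivers and data below are made-up toy values:

```python
# Toy logistic regression: continuous drivers in, coefficients and
# event probabilities out. Data here is illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5, 1.0], [1.5, 0.2], [3.0, 2.5], [2.2, 3.1]])  # drivers
y = np.array([0, 0, 1, 1])                                      # event occurred?

model = LogisticRegression().fit(X, y)
print(model.coef_, model.intercept_)   # relative impact of each driver
print(model.predict_proba(X)[:, 1])    # estimated probability of the event
```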
List logistic regression use cases
- Probability of an event
- Binary classification
- Multi-class classification
What is the goal of logistic regression?
- Predict the true proportion of successes, pi, at any value of the predictor
- pi = (# of successes) / (# of trials)
Describe Y, X, and pi in the binary logistic regression model
Y = binary response; X = quantitative predictor; pi = proportion of successes at a given value of X
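Written out, the model these cards describe takes the standard form (log-odds linear in X):

```latex
\log\frac{\pi}{1-\pi} = \beta_0 + \beta_1 X
\qquad\Longleftrightarrow\qquad
\pi = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}
```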
Logistic regression Pros
- Explanatory value
- Robust
- Concise
- Easy to score data
- returns good probability estimates
- preserves summary stats of training data
Logistic Regression Cons
- Does not handle missing values well
- Does not work well with discrete drivers that have many distinct values
- Cannot handle variables that affect the outcome in a discontinuous way (step functions)
- Assumes each variable affects the log-odds linearly and additively
Describe Neural Network Concept
- constructed and implemented to model the human brain
- performs pattern matching, classification, and similar tasks that are difficult for traditional computers
Describe an artificial neural network
- possesses a large number of processing elements called nodes/neurons operating in parallel
- neurons are connected by links
- each link has a weight associated with its input signal
- each neuron has internal state called activation level
What are the components of a single-layer neural network
Input layer, hidden layer, and output layer; the parameters are weights, and the intercepts are called biases
What are A_k and g(z) in a neural network
A_k are the activations in the hidden layer
g(z) is the activation function - popular choices are the sigmoid and the rectified linear unit (ReLU)
The activations are typically non-linear derived features of the inputs
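A minimal NumPy sketch of one hidden layer computing activations; sizes and values are illustrative only:

```python
# One hidden layer: linear combination of inputs, then a non-linear g(z).
import numpy as np

def relu(z):
    return np.maximum(0.0, z)        # rectified linear activation g(z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid activation g(z)

X = np.array([0.2, -1.0, 0.5])       # input features
W = np.random.randn(4, 3)            # weights for 4 hidden units
b = np.zeros(4)                      # biases (intercepts)

A = relu(W @ X + b)                  # activations: non-linear derived features
print(A)
```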
Describe the output layer of an ANN and how the model is fit
- The output activation function encodes the softmax function
- Fit the model by minimizing the cross-entropy (the negative multinomial log-likelihood)
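A short sketch of the softmax output layer and the cross-entropy it is fit by, using toy scores:

```python
# Softmax turns output-layer scores into class probabilities; the loss
# is the cross-entropy (negative multinomial log-likelihood).
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])        # output-layer scores for 3 classes
probs = softmax(z)                   # class probabilities summing to 1

y_true = np.array([1, 0, 0])         # one-hot true class
loss = -np.sum(y_true * np.log(probs))
print(probs, loss)
```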
Describe how CNN works
- builds up an image in a hierarchical fashion
- hierarchy is constructed through convolution and pooling layers
- Edges and simple shapes are recognized first and pieced together to form larger shapes and, eventually, the target image
Describe the convolution filter (how it is learned, and the score it produces)
- filters are learned during training
- Input image and filter are combined using the dot product to get a score
- the score is high if the sub-image of the input image is similar to the filter
What is the idea of convolution, its result, and the weight in the filters?
- the idea is to find common patterns that occur in different parts of the image
- Result is a new feature map
- weights are learned by the network
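A hand-rolled NumPy sketch of a single convolution; the filter values here are made up, where in practice they would be learned:

```python
# Slide a small filter over the image and take dot products; high scores
# mark sub-images that resemble the filter. The result is a feature map.
import numpy as np

image = np.random.rand(6, 6)         # toy grayscale image
filt = np.array([[1.0, -1.0],        # 2x2 filter (illustrative values)
                 [1.0, -1.0]])

h = image.shape[0] - filt.shape[0] + 1
w = image.shape[1] - filt.shape[1] + 1
feature_map = np.zeros((h, w))       # the new feature map
for i in range(h):
    for j in range(w):
        sub = image[i:i+2, j:j+2]    # sub-image under the filter
        feature_map[i, j] = np.sum(sub * filt)  # dot-product score
print(feature_map)
```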
What are Pooling and its adv
- each non-overlapping 2 x 2 block is replaced by its maximum
- sharpens feature identification
- allows for locational invariance
- reduces dimensions by a factor of 4
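A small NumPy sketch of 2 x 2 max pooling on a toy feature map:

```python
# Each non-overlapping 2x2 block is replaced by its maximum,
# reducing the dimensions by a factor of 4.
import numpy as np

fmap = np.arange(16.0).reshape(4, 4)          # toy 4x4 feature map
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)                                 # 2x2: 16 values -> 4
```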
Describe the architecture of CNN
- many convolve + pool layers
- filters are typically small (3x3)
- Each filter creates a new channel in the convolution layer
- As pooling reduces size, the number of filters/channels increases
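A hypothetical Keras sketch of this convolve + pool pattern; layer counts and sizes are illustrative, not a prescribed architecture:

```python
# Small 3x3 filters; channels increase as pooling shrinks the image.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(16, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),          # halves height and width
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),          # smaller maps, more channels
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.summary()
```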
How to create features X to characterize the document?
Use Bag of words
What is a bag of words
- A bag of words counts unigrams (single words, ignoring order)
- Identify the 10K most frequently occurring words
- Create a binary vector of length 10K for each document and put a 1 in every position whose corresponding word occurs in the document
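A sketch of the binary scoring step with scikit-learn's CountVectorizer, assuming a recent scikit-learn; toy documents, so the vocabulary is far smaller than 10K:

```python
# Binary bag-of-words: 1 if the word occurs in the document, else 0.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the movie was great", "the movie was terrible"]
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())
print(X.toarray())                     # one binary vector per document
```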
What is a recurrent neural network?
- builds a model that takes into account the sequential nature of the data and builds up a memory of the past
What is each observation in an RNN, and what is the target Y?
- The feature for each observation is a sequence of vectors
- The target Y is a single variable such as sentiment, or a one-hot vector for multiclass classification
- Y can also be a sequence
Describe the architecture of an RNN. What does it represent?
- The hidden layer is a sequence of vectors A_l, each computed from the current input X_l and the previous activation A_{l-1}, and producing an output O_l
- the same weights W, U, and B are used at each step
- represents an evolving model updated as each element is processed
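A minimal NumPy sketch of this unrolled architecture, reusing the same weights at every step; shapes and values are illustrative:

```python
# One RNN layer unrolled over a sequence: A_l depends on X_l and A_{l-1}.
import numpy as np

def rnn_forward(X_seq, W, U, B, beta):
    A = np.zeros(W.shape[0])                 # initial hidden state A_0
    for X in X_seq:                          # process the sequence in order
        A = np.tanh(W @ X + U @ A)           # same W, U reused at each step
        O = beta @ A + B                     # output at this step
    return O                                 # final output (e.g., sentiment)

X_seq = [np.random.randn(5) for _ in range(4)]   # sequence of 4 vectors
W, U = np.random.randn(8, 5), np.random.randn(8, 8)
beta, B = np.random.randn(8), 0.0
print(rnn_forward(X_seq, W, U, B, beta))
```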
How to increase accuracy for an RNN
add LSTM units - long short-term memory
What is autocorrelation
the correlation of all pairs (y_t, y_{t-l}) of values that are l time units (lags) apart
What is the RNN forecaster similar to
Autoregression procedure
When to use deep learning
- Image classification and modeling, medical imaging
- Speech modeling, language, and time-series forecasting
- when the signal to noise ratio is high
- use simpler models like AR(5) or glmnet if you can
When does fitting a neural network become difficult
When the objective is nonconvex; the solution is gradient descent
Implementing gradient descent for a nonconvex objective
- Start with a guess theta^0 for all parameters and set t = 0
- Iterate until the objective fails to decrease, taking a downhill step each time
How to find a direction that points downhill in gradient descent?
- Step along the negative gradient (the vector of partial derivatives): delta = -rho * grad R(theta^t), where rho is the learning rate
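A toy NumPy sketch of this loop; the objective and learning rate are made up:

```python
# Gradient descent: start from a guess, step downhill along the negative
# gradient, and stop when the objective fails to decrease.
import numpy as np

def R(theta):                        # toy objective
    return (theta ** 2).sum() + np.sin(3 * theta).sum()

def grad_R(theta):                   # vector of partial derivatives
    return 2 * theta + 3 * np.cos(3 * theta)

theta = np.array([2.0, -1.5])        # initial guess, t = 0
rho = 0.05                           # learning rate
for t in range(200):
    new_theta = theta - rho * grad_R(theta)   # downhill step
    if R(new_theta) >= R(theta):     # objective failed to decrease
        break
    theta = new_theta
print(theta, R(theta))
```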
What does Backpropagation use
- R is a sum so the gradient is the sum of gradients
- Backpropagation uses chain rule for differentiation
What is slow learning
- Gradient descent with a small learning rate converges slowly
- Use early stopping for regularization
What is stochastic gradient descent
- Rather than computing each gradient step with all of the data, use minibatches drawn at random
What is an epoch
One full pass through the training data; it amounts to n / (minibatch size) minibatch updates
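A sketch of minibatch SGD on a toy least-squares problem; each outer loop is one epoch:

```python
# SGD: shuffle the data each epoch and update on random minibatches.
import numpy as np

n, batch_size = 1000, 100
X, y = np.random.randn(n, 3), np.random.randn(n)
theta, rho = np.zeros(3), 0.01

for epoch in range(5):                        # one epoch = one pass over data
    order = np.random.permutation(n)
    for start in range(0, n, batch_size):     # n/batch_size updates per epoch
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ theta - yb) / len(idx)  # least-squares gradient
        theta -= rho * grad
print(theta)
```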
What is Regularization
shrinks the weights at each layer; two popular forms are dropout and data augmentation
What is dropout learning
- at each update, randomly remove units with probability phi and scale up the weights of those retained to compensate
- other units stand in for those removed
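A sketch of (inverted) dropout, assuming the usual 1/(1 - phi) rescaling of the retained units:

```python
# Dropout: zero out each unit with probability phi, rescale the survivors.
import numpy as np

phi = 0.5                                  # dropout probability
A = np.random.rand(8)                      # hidden-layer activations
mask = (np.random.rand(8) > phi)           # keep each unit w.p. 1 - phi
A_dropped = A * mask / (1 - phi)           # survivors stand in for the rest
print(A_dropped)
```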
What is data augmentation, how does it relate to ridge, and when is it effective?
- make many copies of each (x, y) and add small Gaussian noise to the x copies, leaving y unchanged
- this makes the fit robust, and is equivalent to ridge regularization in OLS
- effective with SGD
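A sketch of the augmentation recipe: noisy copies of x, labels untouched; sizes are illustrative:

```python
# Noise-based augmentation: replicate each (x, y), perturb x, keep y.
import numpy as np

X = np.random.randn(50, 4)
y = np.random.randn(50)
copies, sigma = 5, 0.1

X_aug = np.repeat(X, copies, axis=0) + sigma * np.random.randn(50 * copies, 4)
y_aug = np.repeat(y, copies)               # labels are not modified
print(X_aug.shape, y_aug.shape)
```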
What is double descent
- with neural networks better to have too many hidden units than too few
- running stochastic gradient descent till zero training error gives a good out-of-sample error
- Increasing layers and training to zero error gives better out-of-sample error
In a wide linear model (p > n), what does SGD with a small step size lead to?
The minimum-norm solution
What is minimum norm
The zero-residual solution with the smallest norm (when p > n there are many zero-residual solutions)
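A quick NumPy check of this, using the pseudoinverse, which returns the minimum-norm zero-residual solution when p > n:

```python
# Wide linear model: the pseudoinverse solution fits exactly (zero
# residuals) and has the smallest norm among all exact solutions.
import numpy as np

n, p = 20, 100                             # p > n
X = np.random.randn(n, p)
y = np.random.randn(n)

beta = np.linalg.pinv(X) @ y               # minimum-norm solution
print(np.allclose(X @ beta, y))            # zero residuals: True
print(np.linalg.norm(beta))                # smallest norm among solutions
```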
What is similar to the ridge path?
Stochastic gradient flow
Which signal-to-noise regime is less prone to overfitting?
A high signal-to-noise ratio