Module 7 Flashcards

1
Q

Overview of logistic regression

A
  • Used to estimate the probability that an event will occur as a function of other variables
  • Can be considered a classifier as well
2
Q

Describe the inputs and outputs of logistic regression

A

Input - predictor variables, which can be continuous or discrete
Output - a set of coefficients that indicate the relative impact of each driver, plus a linear expression for predicting the log-odds of the outcome as a function of the drivers

3
Q

List logistic regression use cases

A
  1. Estimating the probability of an event
  2. Binary classification
  3. Multi-class classification
4
Q

What is the goal of logistic regression?

A
  • Predict the true proportion of successes, pi, at any value of the predictor
  • pi = (# of successes) / (# of trials)
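A minimal sketch of this in Python: the estimated probability of success is the inverse logit of a linear expression in the predictor. The coefficients b0 and b1 here are made up for illustration, not fitted values.

```python
import math

def predict_probability(x, b0, b1):
    """Estimate pi, the probability of success, from the log-odds b0 + b1*x."""
    log_odds = b0 + b1 * x                 # the linear expression in the driver
    return 1 / (1 + math.exp(-log_odds))   # inverse logit maps log-odds to (0, 1)

# Log-odds of 0 correspond to a probability of exactly 0.5
print(predict_probability(0.0, b0=0.0, b1=1.5))  # 0.5
```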
5
Q

Describe Y, X, and pi in the binary logistic regression model

A
Y = binary response
X = quantitative predictor
pi = probability of success
6
Q

Logistic regression Pros

A
  • Explanatory value
  • Robust
  • Concise
  • Easy to score data
  • Returns good probability estimates
  • Preserves summary stats of training data
7
Q

Logistic Regression Cons

A
  • Does not handle missing values well
  • Does not work well with discrete drivers that have many distinct values
  • Cannot handle variables that affect the outcome in a discontinuous way (step functions)
  • Assumes each variable affects the log-odds of the outcome linearly and additively
8
Q

Describe Neural Network Concept

A
  • Constructed and implemented to model the human brain
  • Performs pattern matching, classification, and other tasks that are difficult for traditional computers

9
Q

Describe an artificial neural network

A
  • Possesses a large number of processing elements, called nodes or neurons, operating in parallel
  • Neurons are connected by links
  • Each link has a weight associated with the input signal
  • Each neuron has an internal state called its activation level
10
Q

What are the components of a single-layer neural network?

A

An input layer, a hidden layer, and an output layer; the parameters are the weights, and the intercepts are called biases

11
Q

What are A_k and g(z) in a neural network?

A

A_k are the activations in the hidden layer
g(z) is the activation function - popular choices are the sigmoid and the rectified linear unit (ReLU)
The activations A_k are typically non-linear derived features
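The two activation functions named above can be sketched in a few lines of Python:

```python
import math

def sigmoid(z):
    # Squashes any real z into the interval (0, 1)
    return 1 / (1 + math.exp(-z))

def relu(z):
    # Rectified linear unit: passes positive values through, zeroes out negatives
    return max(0.0, z)

print(sigmoid(0.0), relu(-3.0), relu(2.0))  # 0.5 0.0 2.0
```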

12
Q

Describe the output layer of an ANN and how the model is fit

A
  • The output activation function encodes the softmax function
  • Fit the model by minimizing the cross-entropy (the negative multinomial log-likelihood)
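A small Python sketch of the softmax output activation and the cross-entropy loss it is trained with; the scores here are made-up logits, not outputs of a real network.

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability, then normalize exponentials
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    # Negative log-likelihood of the correct class
    return -math.log(probs[true_class])

probs = softmax([2.0, 1.0, 0.1])   # class probabilities summing to 1
loss = cross_entropy(probs, 0)     # small when the true class gets high probability
```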

13
Q

Describe how a CNN works

A
  • Builds up an image in a hierarchical fashion
  • The hierarchy is constructed through convolution and pooling layers
  • Edges and simple shapes are recognized first, then pieced together to form the target image
14
Q

Describe the convolution filter (how it is learned, and its score)

A
  • Filters are learned during training
  • The input image and the filter are combined using a dot product to get a score
  • The score is high if a sub-image of the input image is similar to the filter
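A toy Python sketch of the dot-product score, using a hypothetical hand-written 2x2 filter rather than a learned one:

```python
def convolution_score(patch, kernel):
    # Dot product of an image patch with a filter (kernel) of the same shape
    return sum(
        patch[i][j] * kernel[i][j]
        for i in range(len(kernel))
        for j in range(len(kernel[0]))
    )

kernel = [[1, 0], [0, 1]]       # hypothetical filter (would be learned in training)
matching = [[1, 0], [0, 1]]     # patch similar to the filter -> high score
different = [[0, 1], [1, 0]]    # dissimilar patch -> low score
print(convolution_score(matching, kernel), convolution_score(different, kernel))  # 2 0
```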
15
Q

What is the idea of convolution, its result, and the weight in the filters?

A
  • The idea is to find common patterns that occur in different parts of the image
  • The result is a new feature map
  • Weights are learned by the network
16
Q

What is pooling, and what are its advantages?

A
  • Each nonoverlapping 2 x 2 block is replaced by its maximum
  • Sharpens feature identification
  • Allows for locational invariance
  • Reduces the dimensions by a factor of 4
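A minimal Python sketch of 2x2 max pooling on a small made-up image:

```python
def max_pool_2x2(image):
    """Replace each nonoverlapping 2x2 block with its maximum."""
    pooled = []
    for i in range(0, len(image), 2):
        row = []
        for j in range(0, len(image[0]), 2):
            block = [image[i][j], image[i][j + 1],
                     image[i + 1][j], image[i + 1][j + 1]]
            row.append(max(block))
        pooled.append(row)
    return pooled

image = [[1, 3, 2, 0],
         [4, 2, 1, 1],
         [0, 1, 5, 6],
         [2, 2, 7, 3]]
print(max_pool_2x2(image))  # [[4, 2], [2, 7]] -- 16 values reduced to 4
```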
17
Q

Describe the architecture of CNN

A
  • Many convolve + pool layers
  • Filters are typically small (e.g., 3 x 3)
  • Each filter creates a new channel in the convolution layer
  • As pooling reduces the size, the number of filters/channels increases
18
Q

How do you create features X to characterize a document?

A

Use a bag of words

19
Q

What is a bag of words

A
  • A bag of words consists of unigrams (single words)
  • Identify the 10,000 most frequently occurring words
  • Create a binary vector of length 10,000 for each document, scoring a 1 in every position where the corresponding word occurs
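A small sketch in Python, using a tiny made-up vocabulary in place of the 10,000 most frequent words:

```python
def bag_of_words(document, vocabulary):
    """Binary vector: 1 wherever a vocabulary word occurs in the document."""
    words = set(document.lower().split())
    return [1 if w in words else 0 for w in vocabulary]

vocab = ["good", "bad", "movie", "plot"]  # illustrative stand-in for the top 10K words
print(bag_of_words("A good movie with a good plot", vocab))  # [1, 0, 1, 1]
```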
20
Q

What is a recurrent neural network?

A
  • Builds a model that takes into account the sequential nature of the data and builds up a memory of the past
21
Q

What is each observation in an RNN, and what is the target Y?

A
  • The feature for each observation is a sequence of vectors
  • The target Y is a single variable, such as a sentiment label, or a one-hot vector for multi-class classification
  • Y can also be a sequence
22
Q

Describe the architecture of an RNN. What does it represent?

A
  • The hidden layer is a sequence of vectors A_l; each receives the input X_l and the previous activation A_(l-1) and produces an output O_l
  • The same weights W, U, and B are used at each step
  • Represents an evolving model that is updated as each element of the sequence is processed
23
Q

How can you increase the accuracy of an RNN?

A

Add LSTM (long short-term memory) units

24
Q

What is autocorrelation

A

The correlation of all pairs of observations in a sequence that are a fixed lag apart, i.e., the correlation of the series with a lagged version of itself

25
Q

What is the RNN forecaster similar to

A

Autoregression procedure

26
Q

When to use deep learning

A
  • Image classification, modeling, medical applications
  • Speech modeling, language, forecasting
  • When the signal-to-noise ratio is high
  • Use simpler models like AR(5) or glmnet if you can
27
Q

When does fitting a neural network become difficult?

A

When the objective is non-convex; the solution is to use gradient descent on the non-convex objective.

28
Q

How is gradient descent implemented for a non-convex objective?

A
  • Start with a guess for all parameters and set t = 0
  • Iterate until the objective fails to decrease: find a small step that moves the objective downhill, then increment t

29
Q

How to find a direction that points downhill on the gradient descent?

A
  • Use the gradient vector (the vector of partial derivatives): step in the direction of the negative gradient, scaled by rho, the learning rate
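A minimal Python sketch of gradient descent on a toy convex objective; the objective, learning rate, and step count here are all illustrative choices.

```python
def gradient_descent(grad, theta, rho=0.1, steps=50):
    """Repeatedly step against the gradient; rho is the learning rate."""
    for _ in range(steps):
        theta = theta - rho * grad(theta)  # move in the downhill direction
    return theta

# Toy convex objective R(theta) = (theta - 3)^2, with gradient 2 * (theta - 3)
theta_hat = gradient_descent(lambda t: 2 * (t - 3), theta=0.0)
print(theta_hat)  # converges toward the minimizer theta = 3
```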
30
Q

What does Backpropagation use

A
  • R is a sum, so the gradient is the sum of gradients
  • Backpropagation uses the chain rule for differentiation

31
Q

What is slow learning

A
  • Gradient descent with a small learning rate fits the model slowly and gradually
  • Early stopping is then used as a form of regularization

32
Q

What is stochastic gradient descent

A
  • Rather than computing the gradient using all the data, use minibatches drawn at random
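A sketch of how random minibatches might be drawn in Python; the data and batch size are made up for illustration.

```python
import random

def minibatches(data, batch_size, seed=0):
    """Shuffle the data once, then yield random minibatches of observations."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    for i in range(0, len(shuffled), batch_size):
        yield shuffled[i:i + batch_size]

# 100 observations with batch size 10 -> one pass is 10 minibatch updates
batches = list(minibatches(list(range(100)), batch_size=10))
print(len(batches))  # 10
```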
33
Q

What is an epoch

A

A full pass through the training data; it amounts to the number of minibatch updates needed so that all n observations are used once

34
Q

What is Regularization

A

Shrinks the weights at each layer; two forms are dropout and data augmentation

35
Q

What is dropout learning

A
  • At each SGD update, randomly remove units with probability phi and scale up the weights of the units retained to compensate
  • The surviving units stand in for those removed
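A rough Python sketch of the dropout idea, scaling the retained units by 1 / (1 - phi) so their expected value is unchanged; phi = 0.5 here is an arbitrary example value.

```python
import random

def dropout(activations, phi, rng):
    """Zero each unit with probability phi; scale survivors by 1 / (1 - phi)."""
    return [0.0 if rng.random() < phi else a / (1 - phi) for a in activations]

rng = random.Random(0)
out = dropout([1.0, 1.0, 1.0, 1.0], phi=0.5, rng=rng)
# each entry is either 0.0 (dropped) or 2.0 (retained and scaled up)
```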
36
Q

What are ridge regularization and data augmentation, and when are they effective?

A
  • Make many copies of each (x, y), adding a small amount of Gaussian noise to the x copies while leaving y unchanged
  • This makes the fit robust; it is equivalent to ridge regularization in OLS
  • Effective with SGD
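A small Python sketch of this kind of augmentation; the noise level and number of copies are made-up example values.

```python
import random

def augment(x, y, copies=3, sigma=0.05, seed=0):
    """Make noisy copies of the features x; the label y is left unchanged."""
    rng = random.Random(seed)
    return [([xi + rng.gauss(0, sigma) for xi in x], y) for _ in range(copies)]

pairs = augment([1.0, 2.0], y=1)  # three jittered feature vectors, all with label 1
```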
37
Q

What is double descent

A
  • With neural networks, it is better to have too many hidden units than too few
  • Running stochastic gradient descent until zero training error often gives a good out-of-sample error
  • Increasing the number of layers and training to zero error can give an even better out-of-sample error
38
Q

In a wide linear model (p > n), what does SGD with a small step size lead to?

A

The minimum-norm solution

39
Q

What is minimum norm

A

The zero-residual solution with the smallest norm

40
Q

what is similar to the ridge path

A

Stochastic gradient flow

41
Q

Which signal-to-noise regime is less prone to overfitting?

A

high signal-to-noise ratio