W2-Machine Learning Modeling Pipelines in Production Flashcards

1
Q

Neural network models, as part of the training process, will learn to ignore features that don’t provide predictive information by reducing their weights to zero or close to zero. Is this an efficient model in terms of resource usage and performance?

A

While this is true, the result is not an efficient model.

Resource usage: Much of the model can end up being shut off when running inference to generate predictions, but those unused parts of the model are still there. They take up space and consume compute resources as the model server traverses the computation graph.

Performance: Those unneeded features can also introduce noise into the data, which can degrade model performance.

In general, you shouldn’t just throw everything at your model and rely on your training process to determine which features are actually useful.

2
Q

Why are some feature representations, such as one-hot encoding, problematic for working with text in high-dimensional spaces?

A

They tend to produce very sparse representations that do not scale well.

One way to overcome this is to use an embedding layer: the sentences are tokenized and each token is mapped to a dense vector of float values.
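A minimal sketch of an embedding layer in TensorFlow/Keras; the vocabulary size, embedding dimension, and token ids below are made up for illustration:

```python
import tensorflow as tf

# Assumed vocabulary size and embedding dimension, for illustration only.
vocab_size, embed_dim = 1000, 16

embedding = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)

# Hypothetical token ids for one short, already-tokenized sentence.
token_ids = tf.constant([[4, 27, 311, 9]])

vectors = embedding(token_ids)
print(vectors.shape)  # (1, 4, 16): each token maps to a dense 16-dim float vector
```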

3
Q

Why can high-dimensional data be an issue (for distance-based models)?

A

In extreme cases, where we have more features than observations, we run the risk of massively overfitting our model.

But in the more general case, when we have too many features, observations become harder to cluster. Too many dimensions can cause every observation in your dataset to appear equidistant from all the others. Because clustering uses a distance measure such as Euclidean distance to quantify the similarity between observations, this is a big problem.

Ultimately, having more dimensions often means our model is less efficient.

The higher the dimension, the bigger the feature space, the sparser the data points (this follows from the Euclidean distance formula, but it holds for other distance measures too), and the harder the generalization (with more chance of learning noise).
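A small sketch of this "distance concentration" effect using random data and SciPy's pdist (point counts and dimensions are arbitrary): as the dimension grows, the gap between the smallest and largest pairwise distance shrinks relative to the mean distance.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((200, d))                      # 200 random points in d dimensions
    dists = pdist(X)                              # all pairwise Euclidean distances
    spread = (dists.max() - dists.min()) / dists.mean()
    print(f"d={d:4d}  (max - min) / mean distance = {spread:.2f}")
```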

4
Q

Regardless of which modeling approach you’re using, increasing dimensionality causes another problem, especially for classification, known as the Hughes effect. What is it?

A

This is a phenomenon in which classification performance improves as the number of features increases, until we reach an optimum where we have enough features. Beyond that point, adding more features while keeping the training set the same size degrades the classifier’s performance.

5
Q

TruncatedSVD needs a number of components strictly lower than the number of features. True/False

C3-W2-Lab2

A

True
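A quick scikit-learn sketch (synthetic data, arbitrary sizes) showing that n_components must stay below the number of features:

```python
from sklearn.datasets import make_classification  # synthetic stand-in data
from sklearn.decomposition import TruncatedSVD

X, _ = make_classification(n_samples=100, n_features=20, random_state=0)

# n_components must be strictly less than the number of features (20 here).
svd = TruncatedSVD(n_components=10, random_state=0)
X_reduced = svd.fit_transform(X)
print(X_reduced.shape)  # (100, 10)

# TruncatedSVD(n_components=20).fit(X) would fail: 20 is not < n_features.
```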

6
Q

There are several ways to choose a k-dimensional subspace.

For example, in classification tasks, you typically want to have the maximum separation among classes. ____ generally works well for that.

For regression, you want to maximize the correlation between the projected data and the output, where ____ works well. Finally, in unsupervised tasks, we typically want to retain as much of the variance as possible; ____ is the most widely used technique for doing that.

A

Linear discriminant analysis, or LDA
Partial least squares, or PLS
Principal component analysis, or PCA
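A hedged scikit-learn sketch of the three techniques on toy datasets (the dataset choices and number of components are just for illustration):

```python
from sklearn.datasets import load_iris, load_diabetes
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA

# Classification: LDA projects onto directions that maximize class separation.
X_cls, y_cls = load_iris(return_X_y=True)
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_cls, y_cls)

# Regression: PLS finds components that covary strongly with the target.
X_reg, y_reg = load_diabetes(return_X_y=True)
pls = PLSRegression(n_components=2).fit(X_reg, y_reg)
X_pls = pls.transform(X_reg)

# Unsupervised: PCA keeps the directions of maximum variance.
X_pca = PCA(n_components=2).fit_transform(X_cls)

print(X_lda.shape, X_pls.shape, X_pca.shape)  # (150, 2) (442, 2) (150, 2)
```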

7
Q

What does ICA (Independent Component Analysis) do?

Chat-GPT

A

ICA (Independent Component Analysis) is a linear transformation method that seeks to find a set of independent, non-Gaussian source signals from a set of mixed signals.
ICA is particularly useful for separating signals that are mixed in a way that is not readily apparent.
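A small sketch using scikit-learn's FastICA to unmix two synthetic source signals (the signals and mixing matrix are invented for illustration):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent source signals (a sine wave and a square wave), mixed linearly.
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]

A = np.array([[1.0, 0.5],    # assumed mixing matrix
              [0.5, 1.0]])
X = S @ A.T                  # observed mixed signals

# FastICA tries to recover the independent, non-Gaussian sources from X.
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)
print(S_est.shape)  # (2000, 2): estimated sources (up to order and scale)
```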

8
Q

Independent component analysis separates a multivariate signal into additive components that are maximally independent. Often, ICA is not used for reducing dimensionality but for separating superimposed signals. True/False

A

True

ICA further assumes that there exist independent source signals, S, and that the observed signals, Y, are a linear combination (mixture) of them.

9
Q

NMF (Non-negative Matrix Factorization), like PCA, is a dimensionality reduction technique. In contrast to PCA, however, NMF models are ____.

C3-W2-Lab2

A

interpretable

NMF expresses samples as combinations of interpretable parts. For example, it represents documents as combinations of topics, and images in terms of commonly occurring visual patterns.
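A brief scikit-learn sketch (toy documents, an assumed two topics) showing how NMF factors a non-negative TF-IDF matrix into interpretable parts:

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["cats purr and sleep", "dogs bark and run",
        "cats and dogs are pets", "stocks rise and markets fall"]

# TF-IDF values are non-negative, so NMF can factor them into a
# document-topic matrix (W) and a topic-word matrix (H).
X = TfidfVectorizer().fit_transform(docs)
nmf = NMF(n_components=2, random_state=0)
W = nmf.fit_transform(X)    # each row: how much of each "topic" a document uses
H = nmf.components_         # each row: the words that make up a topic
print(W.shape, H.shape)
```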

10
Q

Can the NMF method of dimensionality reduction be used on every dataset?

C3-W2-Lab2

A

No, NMF can’t be applied to every dataset. It requires that the sample features be non-negative, i.e. greater than or equal to 0.

11
Q

What is quantization?

A

Quantization involves transforming a model into an equivalent representation that uses parameters and computations at a lower precision.

12
Q

Quantization improves the model’s execution performance and efficiency, and it can often result in higher model accuracy. True/False

A

False. Quantization improves the model’s execution performance and efficiency, but it can often result in lower model accuracy.

13
Q

You can do quantization during training NOT after the model has been trained. True/False

A

False. You can do quantization during training or after the model has been trained.

14
Q

What’s Post-training quantization?

A

Post-training quantization is a conversion technique that can reduce model size while also improving CPU and hardware accelerator latency with little degradation in model accuracy.

Post-training quantization can result in a loss of accuracy, particularly for smaller networks, but it is often fairly negligible.

15
Q

What does post-training quantization do?

A

What post-training quantization basically does is convert, or more precisely quantize, the weights from floating-point numbers to integers in an efficient way.

By doing this, you can gain up to three times lower latency without taking a major hit on accuracy.
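A minimal sketch of post-training quantization with the TensorFlow Lite converter; the toy Keras model below is just a placeholder for an already-trained model:

```python
import tensorflow as tf

# Placeholder for a model that has already been trained.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])

# Post-training quantization: the converter stores weights at reduced
# precision, shrinking the model and typically speeding up inference.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```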

16
Q

What can we do if the loss of accuracy from post-training quantization is too great and we still want to use quantization?

A

If the loss of accuracy is too great, consider using quantization aware training. However, doing so requires modifications during model training to add fake quantization nodes, while post-training quantization techniques are fairly simple.
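A minimal sketch of quantization aware training with the TensorFlow Model Optimization toolkit; the toy model and the commented training call are placeholders:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder for an unquantized Keras model defined elsewhere.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])

# quantize_model wraps the layers with fake-quantization nodes so training
# learns weights that stay accurate after quantization.
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(optimizer="adam", loss="mse")
# qat_model.fit(x_train, y_train, epochs=...)  # then train as usual
```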

17
Q

What’s pruning?

A

Pruning is a method to increase the efficiency of models by removing parts of the model that do not contribute substantially to producing accurate results.
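A minimal sketch of weight pruning with the TensorFlow Model Optimization toolkit; the model and the pruning schedule values are assumed for illustration:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder model for illustration.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])

# prune_low_magnitude wraps the model so that, during training, the weights
# with the smallest magnitudes are progressively zeroed out (up to 80% here).
pruning_params = {
    "pruning_schedule": tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.8, begin_step=0, end_step=1000)
}
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)
pruned_model.compile(optimizer="adam", loss="mse")
# Training requires the pruning callback:
# pruned_model.fit(x, y, callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
```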

18
Q

Name 3 methods that reduce model size and latency, making models ideal for edge and IoT devices.

C3-W2-Lab3

A
  1. post-training quantization
  2. quantization aware training
  3. weight pruning