W2-Machine Learning Modeling Pipelines in Production Flashcards

1
Q

Neural network models, as part of the training process, will learn to ignore features that don’t provide predictive information by reducing their weights to zero or close to zero. Is this an efficient model in terms of resource usage and performance?

A

While this is true, the result is not an efficient model.

Resource usage: Much of the model can end up being shut off when running inference to generate predictions, but those unused parts of the model are still there. They take up space and consume compute resources as the model server traverses the computation graph.

Performance: Those unneeded features can also introduce noise into the data, which can degrade model performance.

In general, you shouldn’t just throw everything at your model and rely on your training process to determine which features are actually useful.

2
Q

Why are some feature representations, such as one-hot encoding, problematic for working with text in high-dimensional spaces?

A

They tend to produce very sparse representations that do not scale well.

One way to overcome this is to use an embedding layer: the sentences are tokenized and each token is mapped to a dense vector of float values.
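A minimal sketch of an embedding layer in TensorFlow/Keras; the vocabulary size, embedding dimension, and token ids below are made up for illustration:

```python
import tensorflow as tf

# Assumed vocabulary size and embedding dimension, for illustration only.
vocab_size, embed_dim = 1000, 16

embedding = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)

# Hypothetical token ids for one short, already-tokenized sentence.
token_ids = tf.constant([[4, 27, 311, 9]])

vectors = embedding(token_ids)
print(vectors.shape)  # (1, 4, 16): each token maps to a dense 16-dim float vector
```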

3
Q

Why can high-dimensional data be an issue (for distance-based models)?

A

In extreme cases, where we have more features than observations, we run the risk of massively overfitting our model.

But in the more general case, when we have too many features, observations become harder to cluster. Too many dimensions can cause every observation in your dataset to appear equidistant from all the others. Because clustering uses a distance measure such as Euclidean distance to quantify the similarity between observations, this is a big problem.

Ultimately, having more dimensions often means our model is less efficient.

The higher the dimension, the bigger the feature space, the sparser the data points (this follows from the Euclidean distance formula, but it holds for other distance measures too), and the harder the generalization (with more chance of learning noise).
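A small sketch of this "distance concentration" effect using random data and SciPy's pdist (point counts and dimensions are arbitrary): as the dimension grows, the gap between the smallest and largest pairwise distance shrinks relative to the mean distance.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((200, d))                      # 200 random points in d dimensions
    dists = pdist(X)                              # all pairwise Euclidean distances
    spread = (dists.max() - dists.min()) / dists.mean()
    print(f"d={d:4d}  (max - min) / mean distance = {spread:.2f}")
```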

4
Q

Regardless of which modeling approach you’re using, increasing dimensionality causes another problem, especially for classification, known as the Hughes effect. What is it?

A

This is a phenomenon in which classification performance improves as the number of features increases, until we reach an optimum where we have enough features. Beyond that point, adding more features while keeping the training set the same size degrades the classifier’s performance.

5
Q

TruncatedSVD needs a number of components strictly lower than the number of features. True/False

C3-W2-Lab2

A

True
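A quick scikit-learn sketch (synthetic data, arbitrary sizes) showing that n_components must stay below the number of features:

```python
from sklearn.datasets import make_classification  # synthetic stand-in data
from sklearn.decomposition import TruncatedSVD

X, _ = make_classification(n_samples=100, n_features=20, random_state=0)

# n_components must be strictly less than the number of features (20 here).
svd = TruncatedSVD(n_components=10, random_state=0)
X_reduced = svd.fit_transform(X)
print(X_reduced.shape)  # (100, 10)

# TruncatedSVD(n_components=20).fit(X) would fail: 20 is not < n_features.
```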

6
Q

There are several ways to choose a k-dimensional subspace.

For example, in classification tasks, you typically want to have the maximum separation among classes. ____ generally works well for that.

For regression, you want to maximize the correlation between the projected data and the output, where ____ works well. Finally, in unsupervised tasks, we typically want to retain as much of the variance as possible; ____ is the most widely used technique for doing that.

A

Linear discriminant analysis, or LDA
Partial least squares, or PLS
Principal component analysis, or PCA
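A hedged scikit-learn sketch of the three techniques on toy datasets (the dataset choices and number of components are just for illustration):

```python
from sklearn.datasets import load_iris, load_diabetes
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA

# Classification: LDA projects onto directions that maximize class separation.
X_cls, y_cls = load_iris(return_X_y=True)
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_cls, y_cls)

# Regression: PLS finds components that covary strongly with the target.
X_reg, y_reg = load_diabetes(return_X_y=True)
pls = PLSRegression(n_components=2).fit(X_reg, y_reg)
X_pls = pls.transform(X_reg)

# Unsupervised: PCA keeps the directions of maximum variance.
X_pca = PCA(n_components=2).fit_transform(X_cls)

print(X_lda.shape, X_pls.shape, X_pca.shape)  # (150, 2) (442, 2) (150, 2)
```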

7
Q

What does ICA (Independent Component Analysis) do?

Chat-GPT

A

ICA (Independent Component Analysis) is a linear transformation method that seeks to find a set of independent, non-Gaussian source signals from a set of mixed signals.
ICA is particularly useful for separating signals that are mixed in a way that is not readily apparent.
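A small sketch using scikit-learn's FastICA to unmix two synthetic source signals (the signals and mixing matrix are invented for illustration):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent source signals (a sine wave and a square wave), mixed linearly.
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]

A = np.array([[1.0, 0.5],    # assumed mixing matrix
              [0.5, 1.0]])
X = S @ A.T                  # observed mixed signals

# FastICA tries to recover the independent, non-Gaussian sources from X.
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)
print(S_est.shape)  # (2000, 2): estimated sources (up to order and scale)
```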

8
Q

Independent component analysis separates a multivariate signal into additive components that are maximally independent. Often, ICA is not used for reducing dimensionality but for separating superimposed signals. True/False

A

True

ICA further assumes that there exist independent source signals, S, and that the observed signals, Y, are a linear combination (mixture) of them.

9
Q

NMF (Non-negative Matrix Factorization), like PCA, is a dimensionality reduction technique. In contrast to PCA, however, NMF models are ____.

C3-W2-Lab2

A

interpretable

NMF expresses samples as combinations of interpretable parts. For example, it represents documents as combinations of topics, and images in terms of commonly occurring visual patterns.
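A brief scikit-learn sketch (toy documents, an assumed two topics) showing how NMF factors a non-negative TF-IDF matrix into interpretable parts:

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["cats purr and sleep", "dogs bark and run",
        "cats and dogs are pets", "stocks rise and markets fall"]

# TF-IDF values are non-negative, so NMF can factor them into a
# document-topic matrix (W) and a topic-word matrix (H).
X = TfidfVectorizer().fit_transform(docs)
nmf = NMF(n_components=2, random_state=0)
W = nmf.fit_transform(X)    # each row: how much of each "topic" a document uses
H = nmf.components_         # each row: the words that make up a topic
print(W.shape, H.shape)
```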

10
Q

Can the NMF method of dimensionality reduction be used on every dataset?

C3-W2-Lab2

A

No, NMF can’t be applied to every dataset. It requires that the sample features be non-negative, i.e. greater than or equal to 0.

11
Q

What is quantization?

A

Quantization involves transforming a model into an equivalent representation that uses parameters and computations at a lower precision.

12
Q

Quantization improves the model’s execution performance and efficiency, and it can often result in higher model accuracy. True/False

A

False. Quantization improves the model’s execution performance and efficiency, but it can often result in lower model accuracy.

13
Q

You can do quantization during training NOT after the model has been trained. True/False

A

False. You can do quantization during training or after the model has been trained.

14
Q

What’s Post-training quantization?

A

Post-training quantization is a conversion technique that can reduce model size while also improving CPU and hardware accelerator latency with little degradation in model accuracy.

Post-training quantization can result in a loss of accuracy, particularly for smaller networks, but it is often fairly negligible.

15
Q

What does post-training quantization do?

A

What post-training quantization basically does is convert, or more precisely quantize, the weights from floating-point numbers to integers in an efficient way.

By doing this, you can gain up to three times lower latency without taking a major hit on accuracy.
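A minimal sketch of post-training quantization with the TensorFlow Lite converter; the toy Keras model below is just a placeholder for an already-trained model:

```python
import tensorflow as tf

# Placeholder for a model that has already been trained.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])

# Post-training quantization: the converter stores weights at reduced
# precision, shrinking the model and typically speeding up inference.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```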

16
Q

What can we do if the loss of accuracy from post-training quantization is too great and we still want to use quantization?

A

If the loss of accuracy is too great, consider using quantization aware training. However, doing so requires modifications during model training to add fake quantization nodes, while post-training quantization techniques are fairly simple.
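A minimal sketch of quantization aware training with the TensorFlow Model Optimization toolkit; the toy model and the commented training call are placeholders:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder for an unquantized Keras model defined elsewhere.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])

# quantize_model wraps the layers with fake-quantization nodes so training
# learns weights that stay accurate after quantization.
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(optimizer="adam", loss="mse")
# qat_model.fit(x_train, y_train, epochs=...)  # then train as usual
```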

17
Q

What’s pruning?

A

Pruning is a method to increase the efficiency of models by removing parts of the model that do not contribute substantially to producing accurate results.
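A minimal sketch of weight pruning with the TensorFlow Model Optimization toolkit; the model and the pruning schedule values are assumed for illustration:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder model for illustration.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])

# prune_low_magnitude wraps the model so that, during training, the weights
# with the smallest magnitudes are progressively zeroed out (up to 80% here).
pruning_params = {
    "pruning_schedule": tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.8, begin_step=0, end_step=1000)
}
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)
pruned_model.compile(optimizer="adam", loss="mse")
# Training requires the pruning callback:
# pruned_model.fit(x, y, callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
```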

18
Q

Name 3 methods that reduce model size and latency, making models ideal for edge and IoT devices.

C3-W2-Lab3

A
  1. post-training quantization
  2. quantization aware training
  3. weight pruning