Machine Learning - Basics Flashcards

Question 1

Q

What is Machine Learning?

Answer

A

Machine learning (ML) is the process by which computers learn from data to make decisions that meet a goal, without being explicitly programmed with the decision rules.

Question 2

Q

What are some common applications of machine learning?

Answer

A

Machine learning algorithms are often used to:
* Learn and automate human processes (e.g., support request routing)
* Optimize outcomes (e.g., provide information that helps reduce costs)
* Predict outcomes (e.g., house prices)
* Model complex relationships (e.g., non-linear, multivariate relationships)
* Learn patterns in data (e.g., fraud detection)

Question 3

Q

What is labeled data and what is it used for?

Answer

A

Labeled data is data that includes the value of the target variable for each instance.

Labeled data allows us to train supervised machine learning algorithms.
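
As a minimal illustration, labeled data can be pictured as feature values paired with a known target for each instance. The house records and field names below are made up for this sketch.

```python
# Hypothetical labeled dataset: each instance pairs features with a known target.
labeled_houses = [
    {"area_m2": 50, "bedrooms": 1, "price": 150_000},
    {"area_m2": 80, "bedrooms": 2, "price": 240_000},
    {"area_m2": 120, "bedrooms": 3, "price": 360_000},
]

TARGET = "price"  # the target variable (the label) in this illustration


def split_features_and_target(rows, target):
    """Separate each instance into its features (X) and its label (y)."""
    X = [{k: v for k, v in row.items() if k != target} for row in rows]
    y = [row[target] for row in rows]
    return X, y


X, y = split_features_and_target(labeled_houses, TARGET)
```

A supervised algorithm would then learn a mapping from `X` to `y`.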

Question 4

Q

What are the most common types of algorithms that use supervised learning?

Answer

A

The most common types of supervised learning are regression (predicting a numeric value) and classification (predicting a category).
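
The contrast can be sketched with two tiny models: a least-squares line for regression and a nearest-centroid rule for classification. Both are illustrative choices on made-up one-dimensional data, not the only algorithms for either task.

```python
def fit_line(xs, ys):
    """Regression: closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_centroids(xs, labels):
    """Classification: compute the mean feature value of each class."""
    sums, counts = {}, {}
    for x, label in zip(xs, labels):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(x, centroids):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Regression predicts a number.
slope, intercept = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
predicted_value = slope * 5 + intercept

# Classification predicts a category.
centroids = fit_centroids([1, 2, 8, 9], ["small", "small", "large", "large"])
predicted_class = classify(3, centroids)
```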

Question 5

Q

What are the most common types of algorithms that use unsupervised learning?

Answer

A

The most common uses of unsupervised machine learning are:
* Clustering
* Dimensionality reduction
* Association-rule mining (discovers patterns rather than predicting outcomes; useful for recommender systems, market analysis, and pattern mining)

Common association-rule mining algorithms are Apriori, ECLAT, and FP-Growth.
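
Association rules are usually scored with support and confidence. The sketch below computes both on made-up shopping baskets. It shows only the metrics, not Apriori, ECLAT, or FP-Growth themselves, which are strategies for searching itemsets efficiently.

```python
# Hypothetical transactions (shopping baskets) for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


bread_support = support({"bread"}, transactions)
bread_to_milk = confidence({"bread"}, {"milk"}, transactions)
```

Here bread appears in 3 of 4 baskets, and 2 of those 3 also contain milk.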

Question 6

Q

What is the difference between online and offline learning?

Answer

A

Online learning refers to updating a model incrementally as each new data point (or small batch) arrives. It suits real-time systems and streaming data, and is often used when the full dataset is too large to fit in memory or when the data is constantly evolving.
Offline learning refers to learning from the data in one batch. To learn from new data, the model must be retrained on a new batch that includes both the old and the new data. This is more stable, but not ideal for rapidly changing environments.
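
The difference can be sketched with the simplest possible "model", a mean, standing in for a real learner. The class and function names are hypothetical.

```python
class OnlineMean:
    """Online: update the estimate from one new value without storing old data."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        self.count += 1
        self.mean += (x - self.mean) / self.count  # incremental mean update
        return self.mean


def batch_mean(values):
    """Offline: recompute the estimate from the full dataset every time."""
    return sum(values) / len(values)


stream = [4.0, 8.0, 6.0, 2.0]

online = OnlineMean()
for x in stream:
    online.update(x)  # one cheap update per arriving data point

offline_result = batch_mean(stream)  # needs all old and new data at once
```

Both approaches reach the same estimate here; the online version simply never needs the whole dataset in memory.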

Question 7

Q

What is reinforcement learning?

Answer

A

Reinforcement learning involves an algorithm (the agent) that makes decisions by interacting with an environment and receives rewards or penalties for its actions.

In RL, the algorithm is not told the correct action (i.e., there is no label). Instead, it tries actions and observes the rewards or penalties that result.

Over time, it learns to choose actions that maximize the expected cumulative reward.

For example, a robot could use reinforcement learning to learn that walking forward into a wall is bad, but turning away from a wall and walking is good.
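
The robot example can be sketched with tabular Q-learning, used here as one common RL algorithm (the card does not name a specific one). The two-state environment, the reward values, and all settings below are assumptions made for illustration.

```python
import random

STATES = ("facing_wall", "facing_open")
ACTIONS = ("forward", "turn")


def step(state, action):
    """Toy environment: return (next_state, reward) for an action."""
    if action == "turn":
        next_state = "facing_open" if state == "facing_wall" else "facing_wall"
        return next_state, 0.0
    if state == "facing_wall":
        return "facing_wall", -1.0  # walked into the wall: penalty
    return "facing_open", 1.0  # walked forward freely: reward


def train(steps=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error (no correct actions are given)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = "facing_wall"
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore: try a random action
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        next_state, reward = step(state, action)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Move the estimate toward reward + discounted future value.
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


q_table = train()
policy = greedy_policy(q_table)
```

With these settings the learned policy turns when facing the wall and walks forward otherwise.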

Question 8

Q

What is the difference between a model parameter and a learning hyperparameter?

Answer

A

A model parameter is part of the final model itself and is learned from the data, e.g., the slope in a linear model.

A learning hyperparameter controls how the model parameters are learned and is set before training, e.g., the learning rate, penalty terms, or the number of features to include in a weak predictor.
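
A minimal sketch of the distinction, assuming a one-parameter linear model fit by gradient descent on made-up data: `slope` is the model parameter that gets learned, while `learning_rate` and `n_steps` are hyperparameters chosen beforehand.

```python
def fit_slope(xs, ys, learning_rate=0.01, n_steps=500):
    """Fit y = slope * x by gradient descent on mean squared error."""
    slope = 0.0  # model parameter: learned from the data
    n = len(xs)
    for _ in range(n_steps):
        # Gradient of the mean squared error with respect to the slope.
        gradient = sum(2 * (slope * x - y) * x for x, y in zip(xs, ys)) / n
        slope -= learning_rate * gradient  # hyperparameter controls step size
    return slope


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

learned_slope = fit_slope(xs, ys, learning_rate=0.01, n_steps=500)
```

On this data the slope converges to about 3. Raising `learning_rate` to 0.2 makes each update overshoot, so the same code diverges; the model family is unchanged, only the way it is learned.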

Question 9

Q

What is overfitting?

Answer

A

Overfitting occurs when a model makes much better predictions on data it has seen (data included in the training set) than on unseen data (data not included in the training set).
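
An extreme case makes the gap visible: a model that simply memorizes its training data. The lookup-table model and the numbers below are hypothetical.

```python
def train_lookup_model(xs, ys):
    """Memorize the training pairs; fall back to the training mean elsewhere."""
    table = dict(zip(xs, ys))
    fallback = sum(ys) / len(ys)
    return lambda x: table.get(x, fallback)


def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
new_x, new_y = [5, 6], [10.1, 11.9]  # data not included in the training set

model = train_lookup_model(train_x, train_y)
train_error = mean_squared_error(model, train_x, train_y)
new_error = mean_squared_error(model, new_x, new_y)
```

The training error is zero because every training point is memorized, while the error on the new points is large.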

Question 10

Q

Explain the following plot. [Plot not shown: training and validation loss per epoch, with a red vertical line.]

Answer

A

The plot shows the loss (i.e., error) on the training and validation sets through each epoch of model training.

The red vertical line shows the point where the model begins overfitting (the validation loss starts to increase while the training loss continues to decrease).
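
The position of such a line can be estimated from the loss curves as the epoch with the lowest validation loss. The loss values below are invented to mimic the shape described.

```python
# Hypothetical per-epoch losses (index 0 is the first epoch).
train_loss = [1.00, 0.70, 0.50, 0.38, 0.30, 0.24, 0.20]
val_loss = [1.05, 0.80, 0.62, 0.55, 0.57, 0.63, 0.70]


def best_epoch(val_loss):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(val_loss)), key=lambda i: val_loss[i])


overfitting_starts_after = best_epoch(val_loss)
```

After that epoch the validation loss rises while the training loss keeps falling, which is where training could reasonably be stopped.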

Question 11

Q

How can you combat overfitting?

Answer

A

A few ways of combating overfitting are:
* Simplify the model (often done by changing the hyperparameters)
* Select a different model
* Use more training data
* Gather better-quality data

Question 12

Q

What is a validation set and why use one?

Answer

A

A validation set is a set of data that is used to evaluate a model’s performance during training/model selection. After models are trained, they are evaluated on the validation set to select the best possible model.

It must never be used directly for training the model.

It must also not be used as the test data set because we’ve biased our model selection toward working well on this data, even though the model was not directly trained on it.
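
Model selection on a validation set can be sketched as follows, assuming three candidate models of the form y = slope * x that were already fit elsewhere. The slopes and validation data are made up.

```python
def mean_squared_error(slope, xs, ys):
    """Error of the model y = slope * x on the given data."""
    return sum((slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical candidates: slopes of three models already fit on training data.
candidate_slopes = [1.0, 2.0, 3.0]

# Hypothetical validation set, never used to fit the candidates.
val_x = [1.0, 2.0, 3.0]
val_y = [2.1, 3.9, 6.2]

validation_errors = {s: mean_squared_error(s, val_x, val_y) for s in candidate_slopes}
best_slope = min(validation_errors, key=validation_errors.get)
```

Because `best_slope` was chosen for doing well on this particular data, its validation error is an optimistic estimate of how it will perform on new data.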

Question 13

Q

What is a test set and why use one?

Answer

A

A test set is a set of data not used during training or validation.

The model’s performance is evaluated on the test set to estimate how well it will generalize to new data.
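
A minimal sketch of setting aside a test set alongside the training and validation sets. The function name, the 60/20/20 proportions, and the seed are assumptions for illustration.

```python
import random


def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the rows, then cut them into three non-overlapping sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    n_val = int(len(rows) * val_fraction)
    test_rows = rows[:n_test]  # touched only once, for the final evaluation
    val_rows = rows[n_test:n_test + n_val]  # used for model selection
    train_rows = rows[n_test + n_val:]  # used to fit the model
    return train_rows, val_rows, test_rows


train_rows, val_rows, test_rows = train_val_test_split(range(10))
```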

Question 14

Q

What is cross validation and why is it useful?

Answer

A

Cross validation is a technique for estimating model performance more reliably. It rotates which data is held out from model training to be used as the validation data.

Several models are trained and evaluated, with every piece of data held out from exactly one model. The average performance of all the models is then calculated.

It is a more reliable way to validate models but is more computationally costly, e.g., 5-fold cross validation requires training and validating 5 models instead of 1.
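
The rotation can be sketched as below. The helper names are hypothetical, the folds are taken in order without shuffling (real data would usually be shuffled first), and a mean predictor stands in for a real model.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) so each index is held out exactly once."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def cross_validate(xs, ys, k, fit, score):
    """Train and score one model per fold, then average the scores."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return sum(scores) / len(scores)


def fit_mean(xs, ys):
    """Stand-in model: always predict the mean of the training targets."""
    return sum(ys) / len(ys)


def mse(model, xs, ys):
    return sum((model - y) ** 2 for y in ys) / len(ys)


xs = [0, 1, 2, 3, 4, 5]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
average_error = cross_validate(xs, ys, k=3, fit=fit_mean, score=mse)
```

With `k=3`, three models are trained and each data point is used for validation exactly once.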

(14 cards)