Machine Learning Flashcards

1
Q

What is Machine Learning (ML)?

A

The study of computer algorithms that improve automatically through experience and the use of data. It is a part of artificial intelligence (AI).

2
Q

How does ML work?

A

Machine learning algorithms build a model based on sample data, known as “training data”, in order to make
predictions or decisions without being explicitly programmed to do so.

3
Q

How do you formalize ML?

A

ML can be described as finding a function H such that Y = H(X), where the goal is to find the simplest H that predicts Y from the input X with a given prediction accuracy.

4
Q

What do you call the performance of H in matching Y using X?

A

The objective function

5
Q

How is the objective function defined?

A

Obj(H) = L(H) + Ω(H)

where L(H) is the matching error
and Ω(H) is the regularization term, i.e. the complexity of H
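
A minimal sketch of this formula, assuming a hypothetical one-variable model H(x) = a*x + b, mean squared error as the matching error L(H), and an L2 penalty lam * a**2 as Ω(H). These choices and the names `objective` and `lam` are illustrative assumptions; other losses and penalties are also common.

```python
def predict(a: float, b: float, x: float) -> float:
    """Hypothetical model H(x) = a*x + b."""
    return a * x + b

def matching_error(a: float, b: float, xs: list, ys: list) -> float:
    """L(H): mean squared error between predictions and targets (assumed choice)."""
    return sum((predict(a, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def complexity(a: float, lam: float) -> float:
    """Omega(H): L2 penalty on the slope (assumed choice); lam sets its weight."""
    return lam * a ** 2

def objective(a: float, b: float, xs: list, ys: list, lam: float = 0.1) -> float:
    """Obj(H) = L(H) + Omega(H)."""
    return matching_error(a, b, xs, ys) + complexity(a, lam)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]           # generated by y = 2x + 1
print(objective(2.0, 1.0, xs, ys))  # perfect fit, so only the penalty 0.1 * 2**2 remains
```

Even a model that matches the data perfectly pays a price for its complexity, which is what lets the objective prefer simpler models.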

6
Q

What does ML consist of in terms of the objective function?

A

Minimizing Obj(H), which gives the best compromise between prediction accuracy and model complexity
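
One way to carry out that minimization, sketched under illustrative assumptions: H(x) = a*x + b, mean squared error for L(H) and lam * a**2 for Ω(H). Plain gradient descent is used only as an example optimizer, and `fit`, `lam`, `lr` and `steps` are hypothetical names and settings.

```python
def fit(xs, ys, lam=0.0, lr=0.05, steps=2000):
    """Minimize Obj(a, b) = mean((a*x + b - y)^2) + lam * a^2 by gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the matching-error term (mean squared error)
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        # Gradient of the complexity term (only the slope is penalized here)
        grad_a += 2 * lam * a
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]    # y = 2x + 1
print(fit(xs, ys))           # close to (2.0, 1.0): no complexity penalty
print(fit(xs, ys, lam=1.0))  # smaller slope: accuracy is traded for simplicity
```

With lam=1.0 the minimum moves to a slope of about 1.11 and an intercept of about 2.33: raising lam makes complexity more expensive, so the optimizer accepts a worse fit in exchange for a smaller parameter.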

7
Q

What are the main categories of Machine Learning?

A

Supervised: classification & regression
Unsupervised: clustering, association & dimension reduction (generalization)

8
Q

What is the difference between supervised and unsupervised ML?

A

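Supervised: the training data is labeled (pre-categorized), so each example comes with the answer the model should learn to predict

Unsupervised: the data is not labeled, so the algorithm has to find structure (such as groups) on its own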
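
To make the contrast concrete, the sketch below compares a supervised nearest-centroid model, which computes class centers from given labels, with unsupervised 1-D k-means, which has to discover the groups without them. Both algorithms, the data and the labels are illustrative choices, not methods prescribed by this card.

```python
from statistics import mean

def nearest(centroids, x):
    """Index of the centroid closest to x (1-D distance)."""
    return min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))

def fit_supervised(xs, labels):
    """Supervised: labels are given, so each class centroid is the mean of its labeled points."""
    return {c: mean(x for x, l in zip(xs, labels) if l == c) for c in sorted(set(labels))}

def kmeans_1d(xs, k=2, iters=10):
    """Unsupervised: no labels; k-means alternates assigning points and moving centroids."""
    lo, hi = min(xs), max(xs)
    # Spread the initial guesses evenly between min and max (assumes k >= 2)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            clusters[nearest(centroids, x)].append(x)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids

xs = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["small", "small", "small", "large", "large", "large"]
print(fit_supervised(xs, labels))  # centroids learned from the given labels
print(kmeans_1d(xs, k=2))          # similar centroids found without any labels
```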

9
Q

What are the main ML applications/tasks?

A

Forecasting and classification

10
Q

What are the main categories of ML engines?

A

-Linear/non-linear regressions
-Random forests and boosted trees
-Deep learning and neural networks

11
Q

What is a linear regression?

A

It models the relationship between two variables, Y and X, where X explains Y, such that:

Y = aX + b

where a = Cov(Y, X) / Var(X)
and b = E(Y) - aE(X)

(Remember: Y is what you want to predict and X is the explanatory variable.)
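
A direct translation of these formulas into code, as a sketch with made-up data (`fit_line` is a hypothetical helper name). Population covariance and variance are used; the 1/n factors cancel in the ratio, so sample versions give the same slope as long as both use the same denominator.

```python
def fit_line(xs, ys):
    """Fit Y = aX + b with a = Cov(Y, X) / Var(X) and b = E(Y) - a*E(X)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and variance
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]  # made-up data, roughly y = 2x + 1
a, b = fit_line(xs, ys)
print(f"Y = {a:.2f}X + {b:.2f}")
```

For this data the script prints `Y = 1.97X + 1.09`.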

12
Q

What do you need for the regression to be valid?

A

The residuals (the differences between the observed and predicted Y) should be normally distributed with a mean of 0
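
A quick check of the mean condition, sketched with made-up data: the line is fitted with the Cov/Var formulas from the linear-regression card and the residuals are averaged. For least squares with an intercept the residual mean is 0 by construction (up to rounding), so in practice the more informative check is normality, for example with a Q-Q plot or the Shapiro-Wilk test (`scipy.stats.shapiro`), which this sketch does not perform.

```python
from statistics import mean, pstdev

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]  # made-up data

# Fit Y = aX + b with a = Cov(Y, X) / Var(X) and b = E(Y) - aE(X)
mx, my = mean(xs), mean(ys)
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

# Residuals: the part of each observed Y that the line does not explain
res = [y - (a * x + b) for x, y in zip(xs, ys)]
print(mean(res))    # approximately 0 (only floating-point rounding remains)
print(pstdev(res))  # spread of the residuals
```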

13
Q

What are the steps in training AI predictive models?

A

Building the model
Training the model on sample data
Testing the model on different sample data
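
The three steps can be sketched as follows, assuming a linear model, made-up noisy data generated from y = 2x + 1, and a 75/25 train/test split; `fit_line` and `mse` are hypothetical helper names. The model is built and trained on one part of the sample, then scored on data it has not seen.

```python
import random

def fit_line(xs, ys):
    """Build and train a linear model with a = Cov/Var and b = E(Y) - aE(X)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def mse(model, xs, ys):
    """Mean squared error of the model on a data sample."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

random.seed(0)  # reproducible made-up data
data = [(x, 2 * x + 1 + random.gauss(0, 0.5)) for x in range(20)]
random.shuffle(data)
train, test = data[:15], data[15:]  # hold out 25% of the sample for testing

model = fit_line([x for x, _ in train], [y for _, y in train])
print("train MSE:", mse(model, *zip(*train)))
print("test MSE: ", mse(model, *zip(*test)))
```

Keeping the test sample out of training is what makes its error an honest estimate of performance on new data.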

14
Q

What is one of the main challenges in training ML algorithms?

A

Avoiding overfitting, where the model fits the training data sample so closely that it only works on that sample and performs poorly on new data

15
Q

How do you avoid overfitting?

A

You keep the model as simple as possible (few parameters)
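
A small illustration of why fewer parameters help, using made-up data: a 2-parameter line and a 6-parameter polynomial (Lagrange interpolation through every training point) are fitted to the same noisy sample, then evaluated on new inputs just beyond the training range. The data, the test points and the helper names are assumptions chosen for the example.

```python
def fit_line(xs, ys):
    """Simple model: 2 parameters, a = Cov/Var and b = E(Y) - aE(X)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def fit_interpolating_poly(xs, ys):
    """Complex model: degree n-1 polynomial (n parameters) through every training point."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)  # Lagrange basis factor
            total += term
        return total
    return p

def mse(model, xs, ys):
    """Mean squared error of a model (a callable) on a data sample."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up training sample: true relation y = x, plus alternating noise of +/-0.5
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(train_x)]
# New, noise-free data the models have not seen
test_x = [6.0, 7.0]
test_y = [6.0, 7.0]

for name, fit in [("line", fit_line), ("degree-5 polynomial", fit_interpolating_poly)]:
    model = fit(train_x, train_y)
    print(f"{name}: train MSE={mse(model, train_x, train_y):.3f}, "
          f"test MSE={mse(model, test_x, test_y):.3f}")
```

The polynomial reproduces the training noise exactly (zero training error) and pays for it with a large error on the new points, while the simpler line stays close to the true relation.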

16
Q

What is the trade-off in training a predictive engine?

A

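Between training error and model complexity: a more complex model fits the training data better but tends to raise the testing error (overfitting)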

17
Q

What is a decision tree?

A

A tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.
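
A decision tree can be represented as linked nodes that each test one feature, sketched here with a hypothetical two-level tree for deciding whether to play tennis (the features, thresholds and outcomes are invented for illustration). The second function writes the same tree as plain `if` statements, which is the sense in which a tree displays an algorithm made only of conditionals.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A decision node tests one feature against a threshold; a leaf holds an outcome."""
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch taken when feature <= threshold
    right: Optional["Node"] = None  # branch taken when feature > threshold
    outcome: Optional[str] = None   # set only on leaves

def decide(node: Node, sample: dict) -> str:
    """Follow the conditional tests from the root until a leaf is reached."""
    while node.outcome is None:
        node = node.left if sample[node.feature] <= node.threshold else node.right
    return node.outcome

def decide_as_ifs(sample: dict) -> str:
    """The same hypothetical tree written as plain conditional statements."""
    if sample["humidity"] <= 70.0:
        return "play"
    if sample["wind"] <= 20.0:
        return "play"
    return "stay home"

# Hypothetical tree: split on humidity first, then on wind speed
tree = Node("humidity", 70.0,
            left=Node(outcome="play"),
            right=Node("wind", 20.0,
                       left=Node(outcome="play"),
                       right=Node(outcome="stay home")))

print(decide(tree, {"humidity": 65, "wind": 30}))  # play
print(decide(tree, {"humidity": 80, "wind": 30}))  # stay home
```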

18
Q

What are the risks inherent to AI?

A

For data: biased samples, mistaking correlation for causality, missing features, changes in patterns over time

For algorithms: lack of explainability, overfitting, design flaws, lack of contextual sensitivity and common sense