Working With Data Flashcards

1
Q

Why can’t raw data sometimes be used directly in ML modelling?

A

Because ML models require the data to be numeric, while raw data often contain text, categories, or missing values.
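One common technique (not the only one) for turning a text category into numbers is one-hot encoding. Below is a minimal plain-Python sketch. The function name `one_hot_encode` and the sample colours are hypothetical.

```python
def one_hot_encode(values):
    """Map each category to a 0/1 vector with one position per distinct category."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # mark the position of this value's category
        encoded.append(row)
    return categories, encoded

colours = ["red", "green", "red", "blue"]
cats, rows = one_hot_encode(colours)
print(cats)  # ['blue', 'green', 'red']
print(rows)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```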

2
Q

What is data preparation?

A

It can be defined as the transformation of raw data into a form more suitable for the model (for example, converting it into numbers).

3
Q

Which tasks does data preparation include?

A

Data cleaning, feature selection, data transforms, feature engineering (creating new variables from existing data), and dimensionality reduction.
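This sketch shows two of these tasks in miniature. The first is a data transform: standardization rescales a column to mean 0 and standard deviation 1. The second is feature engineering: deriving a new price-per-square-metre column from two existing ones. The function names and house data are hypothetical.

```python
import statistics

def standardize(column):
    """Data transform: rescale a numeric column to mean 0 and standard deviation 1."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)  # population standard deviation
    return [(x - mean) / std for x in column]

def add_ratio_feature(rows, numerator, denominator, new_name):
    """Feature engineering: return copies of the rows with a new variable built from two existing ones."""
    return [{**row, new_name: row[numerator] / row[denominator]} for row in rows]

heights = [150.0, 160.0, 170.0, 180.0]
print(standardize(heights))  # values spread symmetrically around 0

houses = [{"price": 300_000, "area_m2": 100}, {"price": 200_000, "area_m2": 50}]
print(add_ratio_feature(houses, "price", "area_m2", "price_per_m2"))
```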

4
Q

What is data cleaning?

A

It is the process of fixing or removing incorrect data in a dataset that could affect the analysis; if the data is garbage, the predictions will also be unreliable.
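This sketch shows three typical cleaning rules: drop exact duplicates, drop rows with missing values, and drop impossible values. The plausible age range of 0 to 120 is an assumption chosen for illustration, and the records are made up.

```python
def clean(records):
    """Drop duplicate rows, rows with missing values, and rows with impossible ages."""
    seen = set()
    cleaned = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue  # exact duplicate of a row already seen
        seen.add(key)
        if any(v is None for v in rec.values()):
            continue  # missing value
        if not (0 <= rec["age"] <= 120):  # assumed plausible range
            continue  # incorrect value
        cleaned.append(rec)
    return cleaned

raw = [
    {"name": "Ann", "age": 34},
    {"name": "Ann", "age": 34},    # duplicate
    {"name": "Bob", "age": None},  # missing
    {"name": "Cid", "age": -5},    # impossible
    {"name": "Dee", "age": 51},
]
print(clean(raw))  # only Ann and Dee remain
```

Whether to remove or repair a bad row (for example, filling a missing value) depends on the dataset. Removal is just the simplest choice to show.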

5
Q

What is an outlier?

A

It is a data point that lies significantly far from the mean of the other data; it may be either meaningful or meaningless for the model.
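The "distance from the mean" idea can be measured in standard deviations, which is the z-score. This minimal sketch flags values beyond a threshold. The threshold of 2 is an assumption (3 is also common), and the readings are made up.

```python
import statistics

def find_outliers(data, threshold=2.0):
    """Return values whose distance from the mean exceeds `threshold` standard deviations."""
    mean = statistics.mean(data)
    std = statistics.pstdev(data)
    if std == 0:
        return []  # all values identical: nothing stands out
    return [x for x in data if abs(x - mean) / std > threshold]

readings = [10, 12, 11, 13, 12, 11, 50]
print(find_outliers(readings))  # [50]
```

Flagging is not the same as deleting. Whether 50 is an error or a real, important observation still has to be decided.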

6
Q

What is overfitting?

A

It is a common problem in ML where a model learns the training data so closely that it cannot perform well on data that does not come from the training dataset (unseen data).
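This is a deliberately extreme toy illustration. A hypothetical "memorizer" model stores every training pair, so it reaches zero training error but generalizes badly. A least-squares straight line does much better on unseen inputs. The data are made up: roughly y = 2x with small noise.

```python
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_memorizer(xs, ys):
    """Extreme overfitting: store every training pair; unseen inputs get the training mean."""
    table = dict(zip(xs, ys))
    fallback = sum(ys) / len(ys)
    return lambda x: table.get(x, fallback)

def fit_line(xs, ys):
    """Least-squares straight line: a simpler model that cannot memorize the noise."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: (my - slope * mx) + slope * x

train_x = [0, 1, 2, 3, 4, 5]
train_y = [0.5, 1.8, 4.3, 5.7, 8.4, 9.9]  # roughly y = 2x plus a little noise
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]        # inputs never seen during training
test_y = [1.0, 3.0, 5.0, 7.0, 9.0]

for name, fit in [("memorizer", fit_memorizer), ("line", fit_line)]:
    model = fit(train_x, train_y)
    train_err = mse(train_y, [model(x) for x in train_x])
    test_err = mse(test_y, [model(x) for x in test_x])
    print(f"{name}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")
```

The memorizer's training error is exactly 0, yet its test error is far higher than the line's. That gap between training and test error is the symptom of overfitting.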

7
Q

What are the features in an ML model?

A

They are the input variables of the model; in a tabular dataset they are the columns (excluding the target column), so the number of features is the number of those columns.
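This sketch uses a small made-up table with hypothetical column names. It splits the columns into features (X) and the target to predict (y).

```python
# Each row is one example; each column is one variable.
columns = ["area_m2", "rooms", "age_years", "price"]
rows = [
    [120, 4, 10, 310_000],
    [80, 3, 25, 190_000],
    [60, 2, 5, 175_000],
]
target = "price"  # the value to predict is not a feature

feature_names = [c for c in columns if c != target]
X = [[row[columns.index(c)] for c in feature_names] for row in rows]  # feature matrix
y = [row[columns.index(target)] for row in rows]                      # target vector
print(feature_names, len(feature_names))  # ['area_m2', 'rooms', 'age_years'] 3
```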

8
Q

What is supervised learning?
What is its common use?

A

It is an approach where a computer algorithm is trained on input data that has been labeled with the desired output.
It is used for both classification and regression.
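A one-nearest-neighbour classifier is one of the simplest supervised learners: it predicts the label of the closest labeled training example. This is a minimal sketch. The measurements, labels, and function name are hypothetical.

```python
import math

def nearest_neighbour_predict(train_points, train_labels, query):
    """Classify `query` with the label of the closest labeled training point (1-NN)."""
    best = min(range(len(train_points)), key=lambda i: math.dist(train_points[i], query))
    return train_labels[best]

# Labeled training data: (height_cm, weight_kg) paired with a human-provided label
points = [(150, 50), (155, 55), (180, 85), (185, 90)]
labels = ["small", "small", "large", "large"]
print(nearest_neighbour_predict(points, labels, (152, 52)))  # small
print(nearest_neighbour_predict(points, labels, (182, 88)))  # large
```

Without the `labels` list, this method would have nothing to learn from. That dependence on labels is what makes it supervised.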

9
Q

What is unsupervised learning?

A

It is an approach where the computer algorithm is fed unlabeled data and the goal is to detect patterns or similarities in the dataset (for example, by clustering).
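Clustering is a typical unsupervised task. This plain k-means sketch groups unlabeled 2-D points by alternating two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. Initializing with the first k points is a naive, deterministic simplification, and the data are made up.

```python
import math

def k_means(points, k, iterations=100):
    """Plain k-means on a list of coordinate tuples."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

data = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]  # no labels given
labels, centroids = k_means(data, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```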

10
Q

What is linear regression and what is the formula?

A

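Linear regression is an ML technique used for modelling the relationship between a dependent variable and one or more independent variables; the aim is to make predictions.
FORMULA
$\hat{y} = w_0 + w_1 x$ (one independent variable), or $\hat{y} = w_0 + w_1 x_1 + \dots + w_p x_p$ with $p$ independent variables, where $w_0$ is the intercept and $w_1, \dots, w_p$ are the weights.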
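For the one-variable model $\hat{y} = w_0 + w_1 x$, the weights can be estimated with the ordinary least-squares closed form. This is one estimation method among several. In this minimal sketch, the data are made up to lie exactly on $y = 3x + 2$, so the fit should recover those values.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y_hat = w0 + w1 * x (one independent variable)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # w1 = covariance(x, y) / variance(x); the 1/N factors cancel out
    w1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
    w0 = mean_y - w1 * mean_x  # the fitted line passes through (mean_x, mean_y)
    return w0, w1

xs = [1.0, 2.0, 3.0, 4.0]    # independent variable
ys = [5.0, 8.0, 11.0, 14.0]  # dependent variable
w0, w1 = fit_simple_linear_regression(xs, ys)
print(w0, w1)          # 2.0 3.0
print(w0 + w1 * 5.0)   # prediction for a new input x = 5: 17.0
```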

11
Q

What are the steps for finding a model? Name and explain them.

A

Prediction, training (parameter estimation), and hyperparameter setting.
1) Prediction: we start by applying a model with randomly chosen parameters to previously unseen data (the same happens with neural networks); the results will be poor.
2) Training: we adjust the parameters to fit the data, for example with Bayesian inference.
3) Hyperparameters: we set the hyperparameters, which define the structure of the model, before the training phase.
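This sketch maps the three steps onto a one-variable linear model. The card names Bayesian inference for training; this sketch instead swaps in gradient descent on the mean squared error, a different and simpler estimation technique. The learning rate, epoch count, seed, and data are assumptions chosen for illustration.

```python
import random

def predict(w, b, x):
    """Step 1 (prediction): output of the current linear model."""
    return w * x + b

def train(xs, ys, learning_rate, epochs, seed=0):
    """Step 2 (training): estimate w and b by gradient descent on the mean squared error."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random start, so first predictions are poor
    n = len(xs)
    for _ in range(epochs):
        errors = [predict(w, b, x) - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))  # d(MSE)/dw
        grad_b = 2 / n * sum(errors)                             # d(MSE)/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Step 3 (hyperparameters): chosen before training, not learned from the data.
LEARNING_RATE = 0.05
EPOCHS = 2000

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = train(xs, ys, LEARNING_RATE, EPOCHS)
print(round(w, 3), round(b, 3))  # close to 2 and 1
```

Changing the hyperparameters changes the outcome without touching the data. A learning rate that is too large makes the updates diverge, and one that is too small needs many more epochs.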

12
Q

What is a loss function?
What is its formula?

A

The loss function, also known as the cost function, quantifies how well the model's predictions ($\hat{y}_n$) match the actual outputs ($y_n$); the goal is to minimize this distance.
FORMULA (mean squared error)
$L = \frac{1}{N} \sum_{n=1}^{N} (y_n - \hat{y}_n)^2$
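This is a minimal sketch of the mean squared error, written term by term from the formula. The function name and sample values are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Loss = (1/N) * sum over n of (y_n - y_hat_n)^2."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)

actual = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 8.0]
print(mean_squared_error(actual, predicted))  # (0.25 + 0 + 1) / 3 = 0.4166...
```

Squaring makes every error positive and penalizes large errors more than small ones. A perfect model scores 0.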
