Working With Data Flashcards

Question 1

Q

Why can’t raw data sometimes be used directly in ML modelling?

Answer

A

Because most ML models require numerical input, raw data (e.g., text or categories) must first be converted into numbers.
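A minimal sketch of one common conversion, one-hot encoding, which turns a categorical column into numbers. The function name `one_hot_encode` and the `colors` data are hypothetical and exist only for illustration.

```python
def one_hot_encode(values):
    """Map each category to a 0/1 vector containing a single 1 (one-hot encoding)."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # mark the column of this value's category
        encoded.append(row)
    return categories, encoded

# Hypothetical raw (non-numeric) data
colors = ["red", "green", "red", "blue"]
categories, encoded = one_hot_encode(colors)
# categories -> ['blue', 'green', 'red']
# encoded    -> [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Each category becomes its own 0/1 column, so the model does not read an artificial order into the categories.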

Question 2

Q

What is data preparation?

Answer

A

It can be defined as the transformation of raw data into a form that is more suitable for the model (for example, numbers).

Question 3

Q

Which tasks include data preparation?

Answer

A

Data cleaning, feature selection, data transforms, feature engineering (creating new variables from existing data) and dimensionality reduction.
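A minimal sketch of two of these tasks: feature engineering (deriving a new "area" variable from existing width and length columns) and a data transform (min-max scaling to the range [0, 1]). The column names and values are hypothetical.

```python
def min_max_scale(values):
    """Rescale numbers linearly to [0, 1] using (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical existing columns
widths = [2.0, 3.0, 4.0]
lengths = [5.0, 5.0, 10.0]

# Feature engineering: a new variable built from existing ones
areas = [w * l for w, l in zip(widths, lengths)]  # [10.0, 15.0, 40.0]

# Data transform: put the new feature on a common 0..1 scale
scaled_areas = min_max_scale(areas)  # [0.0, 0.1666..., 1.0]
```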

Question 4

Q

What is data cleaning?

Answer

A

It is the process of fixing or removing incorrect, incomplete or duplicate data that could affect the analysis; if the data is garbage, the predictions will be unreliable too ("garbage in, garbage out").
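A minimal cleaning sketch, assuming records stored as Python dicts. It only shows the "removing" side (dropping duplicates, missing values and impossible values); fixing values, e.g. filling in a missing age, is the other option. The field names and the age limits `min_age`/`max_age` are hypothetical.

```python
def clean(records, min_age=0, max_age=120):
    """Drop records with missing or impossible ages, then drop exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        age = rec["age"]
        if age is None or not (min_age <= age <= max_age):
            continue  # missing or invalid value: remove the row
        key = (rec["name"], age)
        if key in seen:
            continue  # exact duplicate of a row already kept
        seen.add(key)
        cleaned.append(rec)
    return cleaned

# Hypothetical raw data with typical problems
raw = [
    {"name": "Ann", "age": 34},
    {"name": "Ann", "age": 34},    # duplicate
    {"name": "Bob", "age": None},  # missing value
    {"name": "Cid", "age": -5},    # impossible value
    {"name": "Dee", "age": 41},
]
cleaned = clean(raw)  # keeps only Ann (once) and Dee
```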

Question 5

Q

What is an outlier?

Answer

A

It is a data point that lies significantly far from the other observations (e.g., far from their mean); it may be meaningful (a genuine rare case) or meaningless (noise or a measurement error) for the model.
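A minimal sketch of flagging outliers by their distance from the mean, measured in standard deviations (the z-score). The threshold of 2.0 is an assumption for this toy example; 3.0 is also common, but in small samples z-scores cannot grow very large, so a lower threshold is used here. Whether a flagged point should be dropped is a separate, domain-specific decision.

```python
import statistics

def find_outliers(data, threshold=2.0):
    """Return values whose z-score (distance from the mean in std units) exceeds threshold."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    if sd == 0:
        return []  # all values identical: no outliers
    return [x for x in data if abs(x - mean) / sd > threshold]

# Hypothetical measurements; 95 is far from the rest
data = [10, 12, 11, 13, 12, 11, 10, 95]
outliers = find_outliers(data)  # [95]
```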

Question 6

Q

What is overfitting?

Answer

A

It is a common problem in ML where a model learns the training data so closely (including its noise) that it cannot perform well on new data that was not part of the training set (unseen data).
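An extreme toy illustration, assuming a hypothetical "model" that simply memorizes every training pair. Its training error is zero, but on unseen inputs it can only fall back to the average training output, so its test error is large. Comparing training and test error like this is the usual way overfitting shows up. The `Memorizer` class and the data are made up for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

class Memorizer:
    """Hypothetical toy model that stores the training data exactly (extreme overfitting)."""

    def fit(self, xs, ys):
        self.table = dict(zip(xs, ys))
        self.fallback = sum(ys) / len(ys)  # used for inputs never seen in training
        return self

    def predict(self, xs):
        return [self.table.get(x, self.fallback) for x in xs]

# Toy data roughly following y = 2x, with a little noise
train_x, train_y = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]
test_x, test_y = [6, 7], [12.0, 14.1]

model = Memorizer().fit(train_x, train_y)
train_error = mse(train_y, model.predict(train_x))  # 0.0: perfect on training data
test_error = mse(test_y, model.predict(test_x))     # large: fails on unseen data
```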

Question 7

Q

What are the features in an ML model?

Answer

A

They are the input variables the model uses to make predictions; in a tabular dataset, each feature is a column, so the number of features equals the number of input columns.
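A minimal sketch of separating features from the target in a small table, assuming each row is a Python dict. The housing columns (`area_m2`, `rooms`, `price`) are hypothetical.

```python
# Hypothetical toy dataset: each dict is one row (one example)
rows = [
    {"area_m2": 50, "rooms": 2, "price": 150_000},
    {"area_m2": 80, "rooms": 3, "price": 230_000},
    {"area_m2": 120, "rooms": 4, "price": 340_000},
]
target = "price"

# Every column except the target is a feature
feature_names = [col for col in rows[0] if col != target]  # ['area_m2', 'rooms']
X = [[row[f] for f in feature_names] for row in rows]      # feature matrix: one column per feature
y = [row[target] for row in rows]                          # values the model should predict
```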

Question 8

Q

What is supervised learning?
What is it commonly used for?

Answer

A

It is an approach where an algorithm is trained on input data that has been labeled with the correct output.
It is commonly used for both classification and regression.
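A minimal supervised classification sketch using a 1-nearest-neighbour rule (chosen here only for brevity): the labeled training examples are used to predict the label of a new point. The points and labels are hypothetical; with numeric labels instead of class names, the same idea would be a regression.

```python
import math

def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training example closest to x (1-nearest neighbour)."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]

# Hypothetical labeled training data: inputs plus the correct output for each
train_X = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
train_y = ["small", "small", "large", "large"]

label_a = nearest_neighbor_predict(train_X, train_y, (2.0, 1.5))  # 'small'
label_b = nearest_neighbor_predict(train_X, train_y, (8.5, 9.0))  # 'large'
```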

Question 9

Q

What is unsupervised learning?

Answer

A

It is an approach where the algorithm is fed unlabeled data and the goal is to detect patterns or similarities in the dataset (for example, groups of similar data points).
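A minimal unsupervised sketch: k-means clustering on one-dimensional data. No labels are given; the algorithm finds the groups itself, and the returned `labels` are cluster numbers it assigned, not outputs supplied in advance. The evenly spaced initialization and the data are assumptions chosen to keep the example deterministic.

```python
def kmeans_1d(points, k=2, max_iter=100):
    """Toy k-means for 1-D data: returns (centroids, cluster labels)."""
    lo, hi = min(points), max(points)
    # Deterministic start (assumption): k centroids evenly spaced from min to max
    centroids = [lo + i * (hi - lo) / max(k - 1, 1) for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: abs(p - centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels

# Hypothetical unlabeled data with two natural groups
points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centroids, labels = kmeans_1d(points)  # ([1.5, 10.5], [0, 0, 0, 1, 1, 1])
```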

Question 10

Q

What is linear regression and what is the formula?

Answer

A

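Linear regression is an ML technique for modelling the relationship between a dependent variable and one or more independent variables with a linear function; the aim is to make predictions.
FORMULA
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$, where $\beta_0$ is the intercept, $\beta_1, \dots, \beta_p$ are the coefficients and $\varepsilon$ is the error term (with a single variable: $y = \beta_0 + \beta_1 x + \varepsilon$).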
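A minimal sketch of simple linear regression (one independent variable) fitted with ordinary least squares, using the closed-form slope and intercept. The function name and the toy data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x with one independent variable."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the common normalizing factor cancels
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    # The fitted line passes through the point (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Toy data lying exactly on y = 1 + 2x
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
b0, b1 = fit_simple_linear_regression(xs, ys)  # (1.0, 2.0)
prediction = b0 + b1 * 10                      # predicted y for a new x = 10: 21.0
```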

Question 11

Q

What are the steps for finding a model? List and explain them.

Answer

A

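Prediction, training (parameter estimation) and hyperparameter setting.
1) Prediction: we start by applying a model with randomly initialized parameters to the data (as with a neural network before training, the results will be poor).
2) Training (parameter estimation): we adjust the parameters so that the predictions fit the data, for example by minimizing a loss function or by using Bayesian inference.
3) Hyperparameters: before the training phase, we set the hyperparameters, which control the structure of the model and the training process.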
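A minimal sketch of the three steps for a one-variable linear model $\hat{y} = w x + b$. For parameter estimation it uses gradient descent on the mean squared error, not Bayesian inference; the hyperparameters `learning_rate` and `epochs`, the seed and the data are assumptions for this toy example.

```python
import random

def predict(w, b, xs):
    # Step 1 (prediction): apply the model y_hat = w * x + b to the inputs
    return [w * x + b for x in xs]

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def train(xs, ys, learning_rate=0.05, epochs=2000, seed=0):
    # Step 3 (hyperparameters): learning_rate and epochs are fixed before training starts
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random initial parameters
    n = len(xs)
    initial_loss = mse(ys, predict(w, b, xs))      # poor fit before training
    # Step 2 (training / parameter estimation): gradient descent on the MSE loss
    for _ in range(epochs):
        errors = [p - y for p, y in zip(predict(w, b, xs), ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, initial_loss, mse(ys, predict(w, b, xs))

# Toy data generated from y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b, loss_before, loss_after = train(xs, ys)  # w close to 2, b close to 1
```

Changing a hyperparameter (for example, a much larger learning rate) changes how training behaves without being learned from the data itself.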

Question 12

Q

What is a loss function?
What is the formula?

Answer

A

The loss function, also known as the cost function, quantifies how well the model's predictions ($\hat{y}_n$) match the actual outputs ($y_n$); the goal is to minimize this distance.
FORMULA (mean squared error)
$L = \frac{1}{N} \sum_{n=1}^{N} (y_n - \hat{y}_n)^2$
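A minimal sketch of the mean squared error formula, with a small worked example. The function name and the numbers are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """L = (1/N) * sum over n of (y_n - y_hat_n)^2."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of the same length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)

actual = [3.0, 5.0, 2.0]     # y_n
predicted = [2.5, 5.0, 4.0]  # y_hat_n
loss = mean_squared_error(actual, predicted)  # (0.25 + 0 + 4) / 3 ≈ 1.4167
```

Squaring makes every error positive and penalizes large errors more than small ones.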

(12 cards)