Working With Data Flashcards
Why can’t raw data sometimes be used directly in ML modelling?
Because ML models require the data to be numeric.
What is data preparation?
It is the transformation of raw data into a form more suitable for the model (for example, numbers).
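As a minimal sketch of turning raw data into numbers, the snippet below one-hot encodes a text column in plain Python. The `one_hot_encode` helper and the colour values are hypothetical and chosen only for illustration; one-hot encoding is just one of several possible encodings.

```python
def one_hot_encode(values):
    """Map each category to a 0/1 vector (hypothetical helper, not a library API)."""
    categories = sorted(set(values))  # sorted so the column order is reproducible
    encoded = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, encoded


# Made-up raw text column that a model could not use directly.
raw_colours = ["red", "green", "red", "blue"]
categories, encoded = one_hot_encode(raw_colours)
print(categories)  # column names of the encoded data
print(encoded)     # one numeric row per original value
```

Each original value becomes a row of zeros with a single 1 in the column of its category.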
Which tasks does data preparation include?
Data cleaning, feature selection, data transforms, feature engineering (deriving new variables from existing data) and dimensionality reduction.
What is data cleaning?
It is the process of fixing or removing incorrect data from the dataset that could affect the analysis; if the data is garbage, the predictions will be unreliable too.
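A minimal sketch of the "removing" half of data cleaning, assuming a made-up table of people and an illustrative rule that an age must be present and between 0 and 120. The `clean_rows` helper and the rule itself are assumptions, not a standard.

```python
def clean_rows(rows):
    """Keep only rows whose age is present and plausible (illustrative rule)."""
    cleaned = []
    for row in rows:
        age = row.get("age")
        if age is None:          # missing value: drop the row
            continue
        if not 0 <= age <= 120:  # impossible value: drop the row
            continue
        cleaned.append(row)
    return cleaned


# Made-up raw data containing one missing and one impossible age.
raw_rows = [
    {"name": "Ann", "age": 34},
    {"name": "Bob", "age": None},
    {"name": "Cid", "age": -5},
    {"name": "Dee", "age": 51},
]
cleaned_rows = clean_rows(raw_rows)
print([row["name"] for row in cleaned_rows])
```

In practice a row might be fixed (for example, by filling in a missing value) rather than dropped; which is better depends on the dataset.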
What is an outlier?
It is a data point that lies significantly far from the rest of the data, for example far from their mean; it can be either meaningful or meaningless for the model.
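One common way to make "significantly distant from the mean" concrete is a z-score: the distance from the mean measured in standard deviations. The sketch below assumes a threshold of 2 standard deviations, which is an arbitrary illustrative choice, and uses made-up measurements.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - centre) / spread > threshold]


# Made-up measurements with one value far from the others.
measurements = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
outliers = find_outliers(measurements)
print(outliers)
```

Whether a flagged value should be removed is a separate decision, since an outlier may carry real information.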
What is overfitting?
It is a common problem in ML where a model learns the training data so closely that it cannot perform well on data that does not come from the training set (external data).
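An extreme, deliberately simplified illustration: a hypothetical `MemorisingModel` that stores every training pair reproduces the training data perfectly but has learned nothing it can apply to new inputs. The data and the fallback rule (predict the training mean for unseen inputs) are assumptions made for this sketch.

```python
def mse(targets, predictions):
    """Mean squared error between two equally long lists."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


class MemorisingModel:
    """Stores every training pair; unseen inputs get the training mean."""

    def fit(self, xs, ys):
        self.table = dict(zip(xs, ys))
        self.fallback = sum(ys) / len(ys)
        return self

    def predict(self, xs):
        return [self.table.get(x, self.fallback) for x in xs]


# Made-up data that roughly follows y = 2x.
train_x, train_y = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
test_x, test_y = [5, 6], [10.1, 11.9]

model = MemorisingModel().fit(train_x, train_y)
train_error = mse(train_y, model.predict(train_x))  # zero: the data was memorised
test_error = mse(test_y, model.predict(test_x))     # large: nothing generalises
print(train_error, test_error)
```

The gap between a very low training error and a much higher error on unseen data is the usual symptom of overfitting.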
What are the features in an ML model?
They are the input variables of the model; in a tabular dataset each feature is a column, so the number of features is the number of input columns.
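A small sketch of how features relate to columns, assuming a made-up housing table in which `price` is the value to predict and every other column is a feature. All column names are hypothetical.

```python
# Made-up dataset: each dict is one row, each key is one column.
dataset = [
    {"size_m2": 50, "rooms": 2, "price": 150},
    {"size_m2": 80, "rooms": 3, "price": 240},
]

target = "price"  # the column the model should predict (assumption)
feature_names = [column for column in dataset[0] if column != target]

# X holds one list of feature values per row; y holds the targets.
X = [[row[name] for name in feature_names] for row in dataset]
y = [row[target] for row in dataset]

print(feature_names, len(feature_names))
print(X, y)
```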
What is supervised learning?
What is it commonly used for?
It is an approach where an algorithm is trained on input data that has been labeled for that purpose.
It is used for both classification and regression.
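A minimal classification sketch, assuming a nearest-centroid classifier (one simple supervised method among many) and made-up labelled 2-D points. The helper names `train_centroids` and `classify` are hypothetical.

```python
def train_centroids(points, labels):
    """Average the labelled points of each class (nearest-centroid training)."""
    sums, counts = {}, {}
    for (x, y), label in zip(points, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {
        label: (sx / counts[label], sy / counts[label])
        for label, (sx, sy) in sums.items()
    }


def classify(centroids, point):
    """Return the label whose centroid is closest to `point`."""
    def squared_distance(centre):
        return (point[0] - centre[0]) ** 2 + (point[1] - centre[1]) ** 2

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Labelled training data: the labels are what makes this supervised.
points = [(1, 1), (2, 1), (8, 9), (9, 8)]
labels = ["small", "small", "large", "large"]

centroids = train_centroids(points, labels)
prediction = classify(centroids, (7, 7))
print(prediction)
```

For regression the labels would be numbers rather than class names, and the model would predict a number.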
What is unsupervised learning?
It is an approach where the algorithm is fed unlabeled data and its task is to detect patterns or similarities in the dataset.
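A minimal clustering sketch, assuming k-means in one dimension with two clusters and fixed starting centres. The data, the starting centres and the iteration count are illustrative assumptions. No labels are given; the groups emerge from the values alone.

```python
def k_means_1d(values, centres, iterations=10):
    """Tiny 1-D k-means: alternate between assignment and centre update."""
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for value in values:
            # Assign each value to the index of its nearest centre.
            nearest = min(range(len(centres)), key=lambda i: abs(value - centres[i]))
            clusters[nearest].append(value)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [
            sum(cluster) / len(cluster) if cluster else centres[i]
            for i, cluster in enumerate(clusters)
        ]
    return centres, clusters


# Unlabelled, made-up measurements that form two visible groups.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centres, clusters = k_means_1d(values, centres=[0.0, 5.0])
print(centres)
print(clusters)
```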
What is linear regression and what is the formula?
Linear regression is an ML technique for modelling the relationship between a dependent variable and one or more independent variables; the aim is to make predictions. In its simplest form the model is y = w_0 + w_1 x, where w_0 is the intercept and w_1 the slope.
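A minimal sketch for the one-variable case (a straight line with an intercept and a slope), assuming ordinary least squares as the fitting method, which the card does not specify, and made-up data that lie exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [1, 2, 3, 4]  # independent variable
ys = [3, 5, 7, 9]  # dependent variable, generated as y = 1 + 2x

intercept, slope = fit_line(xs, ys)
prediction = intercept + slope * 5  # predict y for a new x
print(intercept, slope, prediction)
```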
What are the steps for finding a model? List and explain them.
Prediction, training (parameter estimation) and hyperparameter selection.
1) We start by applying a randomly initialised model to previously unseen data (the same holds for NNs; the result will be poor).
2) We adjust the parameters on the training data, for example with Bayesian inference.
3) We set the hyperparameters, which determine the structure of the model, before the training phase.
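The three steps can be sketched on a one-parameter model y = w * x. Several things here are assumptions made to keep the sketch short: training uses plain gradient descent instead of Bayesian inference, the starting weight is fixed at 0 rather than drawn at random so the run is reproducible, and the hyperparameters shown (learning rate and number of steps) are training settings rather than structural ones.

```python
def predict(weight, xs):
    """Step 1, prediction: apply the model y = weight * x."""
    return [weight * x for x in xs]


def mean_squared_error(targets, predictions):
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def train(xs, ys, weight, learning_rate, steps):
    """Step 2, training: adjust the weight by gradient descent on the squared error."""
    n = len(xs)
    for _ in range(steps):
        gradient = sum(2 * (weight * x - y) * x for x, y in zip(xs, ys)) / n
        weight -= learning_rate * gradient
    return weight


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # made-up data with true weight 2

# 1) Prediction with an untrained model: the result is poor.
initial_weight = 0.0
error_before = mean_squared_error(ys, predict(initial_weight, xs))

# 3) Hyperparameters are fixed before the training phase starts.
learning_rate, steps = 0.05, 200

# 2) Training (parameter estimation).
trained_weight = train(xs, ys, initial_weight, learning_rate, steps)
error_after = mean_squared_error(ys, predict(trained_weight, xs))
print(error_before, error_after, trained_weight)
```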
What is a loss function?
What is the formula?
The loss function, also known as the cost function, quantifies how well the model’s predictions (\hat{y}_n) match the actual outputs (y_n); the goal is to minimise this distance.
FORMULA (mean squared error)
L = \frac{1}{N} \sum_{n=1}^{N} (y_n - \hat{y}_n)^2
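The formula above can be computed directly. A minimal sketch with made-up values, where `actual` holds the true outputs and `predicted` holds the model's predictions (both names are chosen for this example):

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    total = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return total / len(actual)


actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
loss = mean_squared_error(actual, predicted)  # (1 + 0 + 4) / 3
print(loss)
```

A perfect model gives a loss of 0, and larger errors are penalised more heavily because the differences are squared.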