Machine Learning Flashcards
Learn basics of ML
What does ML allow us to do?
It lets us use data from past observations to predict future outcomes or values.
e.g. ice cream sales predicted from historical sales and weather records
What is a function in ML terms?
An ML model encapsulates a function that calculates an output value from one or more input values.
The process of defining the function is known as training.
Using the function to predict new values is known as inferencing.
ML Steps for Training and Inferencing
- Obtain training data of past observations. Each observation has the attributes, or features (x), of the thing being observed and the known output, or label (y): [x1, x2, x3], y. Many such observations are needed.
e.g. for the ice cream sales case, the features could be temperature, rainfall, and wind speed, and the number sold would be the label
- An algorithm is applied to the data to determine the relationship between the features and the label. The specific algorithm depends on the problem, but the main goal is to fit a function to the data.
- The result of the algorithm is a model that captures this function: y = f(x). We can now use it to make predictions.
- The training phase is complete. Now we can perform inferencing: use the function to get a predicted value, written y hat (ŷ).
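The training and inferencing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the observations are made up, there is a single feature (temperature), and the algorithm chosen is ordinary least squares for a straight line. The names `train`, `model`, and `y_hat` are hypothetical.

```python
# Made-up past observations: temperature in °C (feature x) and ice creams sold (label y).
temperatures = [20.0, 22.0, 25.0, 28.0, 30.0]
units_sold = [40.0, 44.0, 50.0, 56.0, 60.0]

def train(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    # The returned function plays the role of the trained model f in y = f(x).
    return lambda x: slope * x + intercept

model = train(temperatures, units_sold)  # training: fit a function to the data
y_hat = model(26.0)                      # inferencing: predict for a new temperature
print(y_hat)
```

Because the made-up data lies exactly on the line y = 2x, the prediction for 26 °C comes out as 52 ice creams. Real data would be noisier and would usually have more than one feature.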
What are the different types of Supervised ML?
Regression and Classification (Binary and Multiclass)
What is an example of Unsupervised ML?
Clustering
Supervised ML
Supervised ML is when the training data includes both feature values and labels
Regression
The label predicted by the model is a numeric value.
e.g. ice cream sales, fuel efficiency, property price
Binary Classification
The label indicates whether the observed item is or isn't an instance of something.
e.g. is a patient diabetic, based on weight, age, and blood measurements
e.g. will a customer default on a loan, based on income, age, and credit history
Multiclass Classification
An extension of binary classification in which the label is one of more than two possible classes.
e.g. which of 3 species a penguin belongs to, based on physical measurements
e.g. the genre of a movie, based on cast, director, and budget
Unsupervised ML
Involves training models on data that consists only of feature values, without any known labels. The ML algorithm determines relationships between the features of the observations in order to group them.
Clustering
The most common form of unsupervised ML.
A clustering algorithm identifies similarities between observations based on their features and groups them into discrete clusters.
e.g. group similar flowers based on size, number of leaves, and number of petals
It is similar to multiclass classification, except that clustering has no predefined labels.
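As a rough illustration of clustering, the sketch below groups made-up flower measurements with k-means. The notes do not name a specific clustering algorithm, so k-means is an assumption chosen for its simplicity. The function `k_means`, the data, and the naive initialisation (first k points as starting centroids) are all hypothetical.

```python
def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (assignments, centroids)."""
    # Naive initialisation (an assumption): the first k points are the starting centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        assignments = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

# Made-up flowers: (size in cm, number of petals). No labels are provided.
flowers = [(1.0, 5), (1.2, 5), (0.9, 6), (5.0, 12), (5.3, 13), (4.8, 12)]
assignments, centroids = k_means(flowers, k=2)
print(assignments)
```

The cluster numbers in the output are arbitrary identifiers, not labels with meaning. It is up to the person analysing the result to decide what each group represents.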
Regression Model process
- Split the data into a subset used to train the model and a subset held back to validate the trained model.
- Use a regression algorithm, such as linear regression, to fit the training data to a model.
- Use the held-back validation data to test the model by predicting labels for its features.
- Compare the actual labels of the validation data against the predicted labels, then aggregate the differences to see how well the model performs.
This process can be repeated with different algorithms and parameters.
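The split, fit, and validate loop can be sketched as follows. This is a minimal example under assumptions: the observations are made up, 30% of the data is held back, and a one-feature least-squares line stands in for the regression algorithm. The helpers `split` and `fit_linear` are hypothetical names.

```python
import random

# Made-up observations: (temperature in °C, ice creams sold).
observations = [(18, 35), (20, 41), (21, 42), (23, 47), (25, 49),
                (26, 53), (28, 55), (29, 59), (31, 61), (33, 66)]

def split(data, validation_fraction=0.3, seed=0):
    """Shuffle a copy of the data and hold back a fraction for validation."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

def fit_linear(pairs):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in pairs)
             / sum((x - mean_x) ** 2 for x, _ in pairs))
    return slope, mean_y - slope * mean_x

# Step 1: split. Step 2: fit on the training subset only.
training, validation = split(observations)
slope, intercept = fit_linear(training)

# Step 3: predict labels for the held-back features.
predicted = [slope * x + intercept for x, _ in validation]
actual = [y for _, y in validation]

# Step 4: aggregate the differences (here as the mean absolute difference).
mean_abs_diff = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
print(f"Mean absolute difference on validation data: {mean_abs_diff:.2f}")
```

Changing `seed`, `validation_fraction`, or the fitting function and rerunning is one simple way to repeat the process with different parameters.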
What are some evaluation metrics for regression?
Mean Absolute Error (MAE)
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
Coefficient of Determination (R^2)
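For reference, here is one way these four metrics could be computed from lists of actual and predicted labels. The function name `regression_metrics` and the sample values are made up for illustration.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE, and R^2 for paired actual and predicted labels."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n    # average absolute error
    mse = sum(e ** 2 for e in errors) / n    # average squared error
    rmse = math.sqrt(mse)                    # same units as the label
    mean_actual = sum(actual) / n
    ss_res = sum(e ** 2 for e in errors)                   # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total variation
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Made-up validation labels and model predictions.
actual = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 73.0, 79.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

For these values, MAE is 2.0, MSE is 4.5, RMSE is about 2.12, and R^2 is 0.964. Lower is better for the three error metrics, while an R^2 closer to 1 means the model explains more of the variation in the labels.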
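What is calculated in a classification model?
Probability values for class membership, rather than numeric label values as in regression.
For binary classification, the model calculates a probability between 0 and 1, which is then compared to a threshold to give a class label of 1 or 0 (yes/no, true/false).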
What kind of shape does the function for binary classification take?
It takes a sigmoid shape, like an S, with values between 0 and 1. One algorithm that produces this kind of function is logistic regression, which despite its name is used for classification (the top and bottom parts of the curve loosely resemble logarithmic curves).
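The S-shaped function can be written down directly. The sketch below assumes a single feature and hypothetical parameters `weight` and `bias`, picked by hand rather than learned, and a threshold of 0.5 for turning the probability into a 1 or 0 label.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters, as if logistic regression had already been trained.
weight, bias = 1.5, -6.0

def predict_probability(x):
    """Probability that the observation with feature x belongs to class 1."""
    return sigmoid(weight * x + bias)

def predict_label(x, threshold=0.5):
    """Turn the probability into a class label of 1 or 0."""
    return 1 if predict_probability(x) >= threshold else 0

for x in [0, 2, 4, 6, 8]:
    print(x, round(predict_probability(x), 3), predict_label(x))
```

Small feature values give probabilities near 0, large ones give probabilities near 1, and the curve passes through 0.5 where `weight * x + bias` is zero (x = 4 with these parameters).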