Integrative Programming 5 Flashcards

1
Q

Linear Regression

A

a simple algorithm that models the linear relationship between inputs and a continuous numerical output variable

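A minimal sketch in Python with scikit-learn (assumed available); the toy data below is made up for illustration:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[1.0], [2.0], [3.0], [4.0]])  # one input feature per row
    y = np.array([2.1, 4.0, 6.2, 7.9])          # continuous target, roughly y = 2x

    model = LinearRegression().fit(X, y)        # fit the line y = wx + b
    print(model.coef_, model.intercept_)        # learned slope and intercept
    print(model.predict([[5.0]]))               # prediction for an unseen input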
2
Q

Linear Regression Use Cases

A
  • stock price prediction
  • house price prediction
  • customer lifetime value prediction
3
Q

Linear Regression Advantages

A
  • explainable method
  • interpretable results via its coefficients
  • faster to train than other machine learning models
4
Q

Linear Regression Disadvantages

A
  • assumes a linear relationship between input and output data
  • sensitive to outliers
  • can underfit small, high-dimensional data
5
Q

Logistic Regression

A

an algorithm that models the linear relationship between inputs and categorical outputs (1 or 0)

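A minimal sketch with scikit-learn, again on made-up data, showing binary (1 or 0) prediction:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
    y = np.array([0, 0, 0, 1, 1, 1])            # binary class labels

    clf = LogisticRegression().fit(X, y)
    print(clf.predict([[2.0]]))                 # predicted class for a new input
    print(clf.predict_proba([[2.0]]))           # probability of each class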
6
Q

Logistic Regression Use Cases

A
  • credit risk score prediction
  • customer churn prediction
7
Q

Logistic Regression Advantages

A
  • interpretable and explainable
  • less prone to overfitting when using regularization
  • applicable for multi-class prediction
8
Q

Logistic Regression Disadvantages

A
  • assumes a linear relationship between input and output
  • can overfit small, high-dimensional data
9
Q

Lasso Regression

A

Part of the regression family: it penalizes features that have low predictive outcomes by shrinking their coefficients closer to zero. Can be used for classification or regression.

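A minimal sketch with scikit-learn showing the coefficient shrinkage; the data is synthetic and alpha (the L1 penalty strength) is an illustrative value:

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))              # 10 features, most uninformative
    y = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)

    lasso = Lasso(alpha=0.1).fit(X, y)
    print(lasso.coef_)                          # most coefficients shrunk to exactly zero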
10
Q

Lasso Regression Use Cases

A
  • predicting house prices
  • predicting clinical outcomes based on health data
11
Q

Lasso Regression Advantages

A
  • less prone to overfitting
  • can handle high dimensional data
  • no need for manual feature selection
12
Q

Lasso Regression Disadvantages

A

can lead to poor interpretability as it can keep highly correlated variables

13
Q

Ridge Regression

A

Part of the regression family: it penalizes features that have low predictive outcomes by shrinking their coefficients closer to zero. Can be used for classification or regression.

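A minimal scikit-learn sketch on synthetic multicollinear data; alpha is an illustrative penalty strength:

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    X[:, 1] = X[:, 0] + rng.normal(scale=0.01, size=100)  # two nearly identical columns
    y = X[:, 0] + rng.normal(scale=0.1, size=100)

    ridge = Ridge(alpha=1.0).fit(X, y)
    print(ridge.coef_)                          # coefficients shrunk, but none exactly zero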
14
Q

Ridge Regression Use Cases

A
  • predictive maintenance for automobiles
  • sales revenue prediction
15
Q

Ridge Regression Advantages

A
  • less prone to overfitting
  • best suited when data suffers from multicollinearity
  • explainable and interpretable
16
Q

Ridge Regression Disadvantages

A
  • all predictors are kept in the final model
  • doesn't perform feature selection
18
Q

Decision Tree

A

makes decision rules on the features to produce predictions; it can be used for classification or regression

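A minimal scikit-learn sketch; printing the tree makes the learned decision rules visible:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(tree))                    # the if/else rules the tree learned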
19
Q

Decision Tree Use Cases

A
  • customer churn prediction
  • credit score modeling
  • disease prediction
20
Q

Decision Tree Advantages

A
  • explainable and interpretable
  • can handle missing values
21
Q

Decision Tree Disadvantages

A
  • prone to overfitting
  • sensitive to outliers
22
Q

Random Forest

A

an ensemble learning method that combines the output of multiple decision trees

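A minimal scikit-learn sketch; each of the 100 trees votes and the majority class wins:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print(forest.predict(X[:3]))                # majority vote across the trees
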
23
Q

Random Forest Use Cases

A
  • credit score modeling
  • house price prediction
24
Q

Random Forest Advantages

A
  • reduces overfitting
  • higher accuracy than other models
25
Q

Random Forest Disadvantages

A
  • not very interpretable
  • higher training complexity
26
Q

Gradient Boosting Regression

A

employs boosting to make predictive models from an ensemble of weak predictive learners

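A minimal scikit-learn sketch; each new tree is a weak learner fit to the errors of the ensemble so far:

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
    gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1).fit(X, y)
    print(gbr.predict(X[:3]))                   # predictions from the boosted ensemble
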
27
Q

Gradient Boosting Regression Use Cases

A
  • prediction of car emissions
  • prediction of ride-hailing fare amounts
28
Q

Gradient Boosting Regression Advantages

A
  • better accuracy than other models
  • handles multicollinearity
  • can handle non-linear relationships
29
Q

Gradient Boosting Regression Disadvantages

A
  • sensitive to outliers, which can result in overfitting
  • computationally expensive and high complexity
30
Q

XGBoost

A
  • a gradient boosting algorithm that is efficient and flexible
  • can be used for classification or regression (see the sketch below)
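A minimal sketch assuming the xgboost package is installed; it exposes a scikit-learn style interface:

    from sklearn.datasets import make_classification
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=200, random_state=0)
    clf = XGBClassifier(n_estimators=100).fit(X, y)
    print(clf.predict(X[:3]))                   # boosted-tree class predictions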
31
Q

XGBoost Use Cases

A
  • claims processing in insurance
  • churn prediction
32
Q

XGBoost Advantages

A
  • provides accurate results
  • captures non-linear relationships
33
Q

XGBoost Disadvantages

A
  • hyperparameter tuning complexity
  • does not perform well on sparse datasets
34
Q

LightGBM Regressor

A

a gradient boosting framework designed to be more efficient than other implementations

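A minimal sketch assuming the lightgbm package is installed; LGBMRegressor also follows the scikit-learn interface:

    from sklearn.datasets import make_regression
    from lightgbm import LGBMRegressor

    X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)
    reg = LGBMRegressor(n_estimators=100).fit(X, y)
    print(reg.predict(X[:3]))                   # fast boosted-tree regression
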
35
Q

LightGBM Regressor Use Cases

A
  • predicting flight time for airlines
  • predicting cholesterol level based on health data
36
Q

Tree-based Models

A
  1. decision tree
  2. random forest
  3. gradient boosting regression
  4. XGBoost
  5. LightGBM Regressor
37
Q

LightGBM Regressor Advantages

A
  • can handle large amounts of data
  • computationally efficient
  • low memory usage
38
Q

LightGBM Regressor Disadvantages

A
  • can overfit data due to high complexity
  • hyperparameter tuning can be complex
39
Q

Clustering Models

A
  1. k-means
  2. hierarchical clustering
  3. Gaussian mixture model
40
Q

K-means

A

K-Means is the most widely used clustering approach; it determines K clusters based on Euclidean distances.

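A minimal scikit-learn sketch on two synthetic blobs; K (n_clusters) must be chosen up front:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.5, (50, 2)),   # blob around (0, 0)
                   rng.normal(5, 0.5, (50, 2))])  # blob around (5, 5)

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.cluster_centers_)                    # one centroid per cluster
    print(km.labels_[:5])                         # cluster assignment per point
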
41
Q

K-Means Use Cases

A
  • customer segmentation
  • recommendation system
42
Q

K-Means Advantages

A
  • scales to large datasets
  • simple to interpret and implement
  • produces tight clusters
43
Q

K-Means Disadvantages

A
  • requires the expected number of clusters from the beginning
  • has trouble with varying cluster sizes and densities
44
Q

Hierarchical Clustering

A
  • bottom-up approach where each data point starts as its own cluster
  • the two closest clusters are merged iteratively (see the sketch below)
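A minimal sketch using scipy's agglomerative (bottom-up) linkage; the data is synthetic:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])

    Z = linkage(X, method="ward")                    # iteratively merges the closest clusters
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 flat clusters
    print(labels)

The linkage matrix Z is exactly what scipy's dendrogram function plots.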
45
Q

Hierarchical Clustering Use Cases

A
  • fraud detection
  • document clustering based on similarities
46
Q

Hierarchical Clustering Advantages

A
  • no need to specify the number of clusters
  • resulting dendrogram is informative
47
Q

Hierarchical Clustering Disadvantages

A
  • not always the best-performing clustering method
  • cannot handle large datasets
48
Q

Gaussian Mixture Model

A

a probabilistic model for modeling normally distributed clusters within a dataset

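A minimal scikit-learn sketch on two synthetic Gaussian blobs; note the soft (probabilistic) assignments:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])

    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
    print(gmm.predict_proba(X[:3]))              # probability of belonging to each cluster
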
49
Q

Gaussian Mixture Model Advantages

A
  • computes the probability of an observation belonging to a cluster
  • can identify overlapping clusters
  • can provide more accurate results than k-means
50
Q

Gaussian Mixture Model Disadvantages

A
  • requires complex tuning
  • requires setting the number of mixture components or clusters
51
Q

Gaussian Mixture Model Use Cases

A
  • customer segmentation
  • recommendation system
52
Q

Apriori Algorithm

A

Rule-based approach that identifies the most frequent itemsets in a given dataset, using prior knowledge of frequent itemset properties

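A minimal sketch assuming the mlxtend library; the transactions are made up for illustration:

    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    transactions = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"]]
    te = TransactionEncoder()
    df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

    frequent = apriori(df, min_support=0.5, use_colnames=True)  # frequent itemsets
    print(association_rules(frequent, metric="confidence", min_threshold=0.6))
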
53
Q

Apriori Algorithm Use Cases

A
  • product placement
  • recommendation engine
  • promotional optimization
54
Q

Apriori Algorithm Advantages

A
  • results are interpretable and intuitive
  • exhaustive approach as it finds all rules