Integrative Programming 5 Flashcards

1
Q

Linear Regression

A

a simple algorithm that models a linear relationship between inputs and a continuous numerical output variable

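A minimal sketch of fitting one, assuming scikit-learn and toy data:

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: one input feature, continuous output
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 8.1])

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   # slope and intercept of the fitted line
print(model.predict([[5.0]]))          # prediction for an unseen input
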
2
Q

Linear Regression Use Cases

A
  • stock price prediction
  • house price prediction
  • customer lifetime value prediction
3
Q

Linear Regression Advantages

A
  • explainable method
  • interpretable results via its coefficients
  • faster to train than other machine learning models
4
Q

Linear Regression Disadvantages

A
  • assumes linear relationship between input and output data
  • sensitive to outliers
  • can underfit small, high-dimensional data
5
Q

Logistic Regression

A

an algorithm that models the linear relationship between inputs and a categorical output (1 or 0)

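A minimal sketch, assuming scikit-learn and toy binary-labeled data:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: one feature, binary class labels
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.5]]))        # predicted class: 0 or 1
print(clf.predict_proba([[2.5]]))  # probability of each class
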
6
Q

Logistic Regression Use Cases

A
  • credit risk score prediction
  • customer churn prediction
7
Q

Logistic Regression Advantages

A
  • interpretable and explainable
  • less prone to overfitting when using regularization
  • applicable to multi-class prediction
8
Q

Logistic Regression Disadvantages

A
  • assumes linear relationship between input and output
  • can overfit small, high-dimensional data
9
Q

Lasso Regression

A

Part of the regression family; it penalizes features that have low predictive power by shrinking their coefficients toward zero via an L1 penalty, which can shrink some coefficients exactly to zero. Can be used for classification or regression.

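A minimal sketch, assuming scikit-learn; alpha sets the strength of the L1 penalty:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.random((100, 10))                 # 10 features; only 2 matter
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(100)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)  # irrelevant features are shrunk to exactly 0
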
10
Q

Lasso Regression Use Cases

A
  • predicting house prices
  • predicting clinical outcomes based on health data
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Lasso Regression Advantages

A
  • less prone to overfitting
  • can handle high-dimensional data
  • no need for feature selection
12
Q

Lasso Regression Disadvantages

A

can lead to poor interpretability as it can keep highly correlated variables

13
Q

Ridge Regression

A

Part of the regression family; it penalizes features that have low predictive power by shrinking their coefficients closer to zero via an L2 penalty, without eliminating any of them entirely. Can be used for classification or regression.

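A minimal sketch, assuming scikit-learn; alpha sets the strength of the L2 penalty:

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = X @ np.array([1.0, 0.5, -2.0, 0.0, 3.0]) + 0.1 * rng.standard_normal(100)

ridge = Ridge(alpha=1.0).fit(X, y)
print(ridge.coef_)  # coefficients shrink toward 0 but none are dropped
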
14
Q

Ridge Regression Use Cases

A
  • predictive maintenance for automobiles
  • sales revenue prediction
15
Q

Ridge Regression Advantages

A
  • less prone to overfitting
  • best suited when the data suffers from multicollinearity
  • explainable and interpretable
16
Q

Ridge Regression Disadvantages

A
  • all predictors are kept in the final model
  • doesn't perform feature selection
18
Q

Decision Tree

A

makes decision rules on the features to produce predictions; can be used for classification or regression

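A minimal sketch, assuming scikit-learn and its bundled iris dataset:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)  # cap depth to limit overfitting
print(export_text(tree))  # the learned decision rules, printed as text
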
19
Q

Decision Tree Use Cases

A
  • customer churn prediction
  • credit score modeling
  • disease prediction
20
Q

Decision Tree Advantages

A
  • explainable and interpretable
  • can handle missing values
21
Q

Decision Tree Disadvantages

A
  • prone to overfitting
  • sensitive to outliers
22
Q

Random Forest

A

an ensemble learning method that combines the output of multiple decision trees
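
A minimal sketch, assuming scikit-learn and its bundled iris dataset:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict(X[:3]))  # each prediction is a vote over 100 trees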

23
Q

Random Forest Use Cases

A
  • credit score modeling
  • house price prediction
24
Q

Random Forest Advantages

A
  • reduces overfitting
  • higher accuracy than other models
25
Q

Random Forest Disadvantages

A
  • not very interpretable
  • higher training complexity
26
Q

Gradient Boosting Regression

A

employs boosting to make predictive models from an ensemble of weak predictive learners
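
A minimal sketch, assuming scikit-learn and synthetic regression data:

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1).fit(X, y)
print(gbr.predict(X[:3]))  # each weak learner corrects the previous ones' errors
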
27
Q

Gradient Boosting Regression Use Cases

A
  • prediction of car emissions
  • prediction of ride-hailing fare amounts
28
Q

Gradient Boosting Regression Advantages

A
  • better accuracy than many other models
  • handles multicollinearity
  • can handle non-linear relationships
29
Q

Gradient Boosting Regression Disadvantages

A
  • sensitive to outliers, which can lead to overfitting
  • computationally expensive and highly complex
30
Q

XGBoost

A

an efficient and flexible implementation of gradient boosting; can be used for classification or regression
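
A minimal sketch, assuming the xgboost package and synthetic data from scikit-learn:

from sklearn.datasets import make_regression
from xgboost import XGBRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
model = XGBRegressor(n_estimators=100, max_depth=3, learning_rate=0.1).fit(X, y)
print(model.predict(X[:3]))
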
31
Q

XGBoost Use Cases

A
  • insurance claims processing
  • churn prediction
32
Q

XGBoost Advantages

A
  • provides highly accurate results
  • captures non-linear relationships
33
Q

XGBoost Disadvantages

A
  • hyperparameter tuning complexity
  • cannot perform well on sparse datasets
34
Q

LightGBM Regressor

A

a gradient boosting framework designed to be more efficient than other implementations
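
A minimal sketch, assuming the lightgbm package and synthetic data from scikit-learn:

from sklearn.datasets import make_regression
from lightgbm import LGBMRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)
model = LGBMRegressor(n_estimators=100).fit(X, y)
print(model.predict(X[:3]))
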
35
Q

LightGBM Regressor Use Cases

A
  • predicting flight times for airlines
  • predicting cholesterol levels based on health data
36
Q

Tree-Based Models

A
  1. decision tree
  2. random forest
  3. gradient boosting regression
  4. XGBoost
  5. LightGBM Regressor
37
Q

LightGBM Regressor Advantages

A
  • can handle large amounts of data
  • computationally efficient
  • low memory usage
38
Q

LightGBM Regressor Disadvantages

A
  • can overfit data due to high complexity
  • hyperparameter tuning can be complex
39
Q

Clustering Models

A
  1. K-Means
  2. hierarchical clustering
  3. Gaussian mixture model
40
Q

K-Means

A

the most widely used clustering approach; it determines K clusters based on Euclidean distances
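
A minimal sketch, assuming scikit-learn and two synthetic blobs:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # K chosen up front
print(km.cluster_centers_)  # centroids found via Euclidean distance
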
41
Q

K-Means Use Cases

A
  • customer segmentation
  • recommendation systems
42
Q

K-Means Advantages

A
  • scales to large datasets
  • simple to interpret and implement
  • produces tight clusters
43
Q

K-Means Disadvantages

A
  • requires the expected number of clusters from the beginning
  • has trouble with varying cluster sizes and densities
44
Q

Hierarchical Clustering

A
  • bottom-up approach where each data point starts as its own cluster
  • the two closest clusters are then merged iteratively
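
A minimal sketch of the bottom-up (agglomerative) variant, assuming SciPy:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])

Z = linkage(X, method="ward")  # iteratively merges the two closest clusters
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the dendrogram into 2 clusters
print(labels)
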
45
Q

Hierarchical Clustering Use Cases

A
  • fraud detection
  • document clustering based on similarity
46
Q

Hierarchical Clustering Advantages

A
  • no need to specify the number of clusters
  • the resulting dendrogram is informative
47
Q

Hierarchical Clustering Disadvantages

A
  • doesn't always produce the best clustering
  • cannot handle large datasets
48
Q

Gaussian Mixture Model

A

a probabilistic model for finding normally distributed clusters within a dataset
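
A minimal sketch, assuming scikit-learn and two synthetic blobs:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.predict_proba(X[:3]))  # probability of belonging to each cluster
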
49
Q

Gaussian Mixture Model Advantages

A
  • computes the probability of an observation belonging to each cluster
  • can identify overlapping clusters
  • provides more accurate results than K-Means
50
Q

Gaussian Mixture Model Disadvantages

A
  • requires complex tuning
  • requires setting the number of mixture components or clusters
51
Q

Gaussian Mixture Model Use Cases

A
  • customer segmentation
  • recommendation systems
52
Q

Apriori Algorithm

A

a rule-based approach that identifies the most frequent itemsets in a given dataset, using prior knowledge of frequent itemset properties
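
A minimal sketch, assuming the mlxtend package and a tiny made-up basket table:

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot encoded transactions: rows are baskets, columns are items
baskets = pd.DataFrame(
    [[1, 1, 0], [1, 1, 1], [0, 1, 1], [1, 0, 1]],
    columns=["bread", "milk", "eggs"],
).astype(bool)

itemsets = apriori(baskets, min_support=0.5, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "confidence"]])
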
53
Q

Apriori Algorithm Use Cases

A
  • product placement
  • recommendation engines
  • promotional optimization
54
Q

Apriori Algorithm Advantages

A
  • results are interpretable and intuitive
  • exhaustive approach, as it finds all rules