Integrative Programming 5 Flashcards

1
Q

Linear Regression

A

a simple algorithm that models the linear relationship between inputs and a continuous numerical output variable

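A minimal sketch in Python with scikit-learn (assumed available); the toy data below is made up for illustration:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[1.0], [2.0], [3.0], [4.0]])  # one input feature per row
    y = np.array([2.1, 4.0, 6.2, 7.9])          # continuous target, roughly y = 2x

    model = LinearRegression().fit(X, y)        # fit the line y = wx + b
    print(model.coef_, model.intercept_)        # learned slope and intercept
    print(model.predict([[5.0]]))               # prediction for an unseen input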
2
Q

Linear Regression Use Cases

A
  • stock price prediction
  • house price prediction
  • customer lifetime value prediction
3
Q

Linear Regression Advantages

A
  • explainable method
  • interpretable results via its coefficients
  • faster to train than other machine learning models
4
Q

Linear Regression Disadvantages

A
  • assumes a linear relationship between input and output data
  • sensitive to outliers
  • can underfit small, high-dimensional data
5
Q

Logistic Regression

A

an algorithm that models the linear relationship between inputs and categorical outputs (1 or 0)

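A minimal sketch with scikit-learn, again on made-up data, showing binary (1 or 0) prediction:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
    y = np.array([0, 0, 0, 1, 1, 1])            # binary class labels

    clf = LogisticRegression().fit(X, y)
    print(clf.predict([[2.0]]))                 # predicted class for a new input
    print(clf.predict_proba([[2.0]]))           # probability of each class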
6
Q

Logistic Regression Use Cases

A
  • credit risk score prediction
  • customer churn prediction
7
Q

Logistic Regression Advantages

A
  • interpretable and explainable
  • less prone to overfitting when using regularization
  • applicable for multi-class prediction
8
Q

Logistic Regression Disadvantages

A
  • assumes a linear relationship between input and output
  • can overfit small, high-dimensional data
9
Q

Lasso Regression

A

Part of the regression family: it penalizes features that have low predictive outcomes by shrinking their coefficients closer to zero. Can be used for classification or regression.

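A minimal sketch with scikit-learn showing the coefficient shrinkage; the data is synthetic and alpha (the L1 penalty strength) is an illustrative value:

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))              # 10 features, most uninformative
    y = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)

    lasso = Lasso(alpha=0.1).fit(X, y)
    print(lasso.coef_)                          # most coefficients shrunk to exactly zero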
10
Q

Lasso Regression Use Cases

A
  • predicting house prices
  • predicting clinical outcomes based on health data
11
Q

Lasso Regression Advantages

A
  • less prone to overfitting
  • can handle high dimensional data
  • no need for manual feature selection
12
Q

Lasso Regression Disadvantages

A

can lead to poor interpretability as it can keep highly correlated variables

13
Q

Ridge Regression

A

Part of the regression family: it penalizes features that have low predictive outcomes by shrinking their coefficients closer to zero. Can be used for classification or regression.

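A minimal scikit-learn sketch on synthetic multicollinear data; alpha is an illustrative penalty strength:

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    X[:, 1] = X[:, 0] + rng.normal(scale=0.01, size=100)  # two nearly identical columns
    y = X[:, 0] + rng.normal(scale=0.1, size=100)

    ridge = Ridge(alpha=1.0).fit(X, y)
    print(ridge.coef_)                          # coefficients shrunk, but none exactly zero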
14
Q

Ridge Regression Use Cases

A
  • predictive maintenance for automobiles
  • sales revenue prediction
15
Q

Ridge Regression Advantages

A
  • less prone to overfitting
  • best suited when data suffers from multicollinearity
  • explainable and interpretable
16
Q

Ridge Regression Disadvantages

A
  • all predictors are kept in the final model
  • doesn't perform feature selection
18
Q

Decision Tree

A

makes decision rules on the features to produce predictions; it can be used for classification or regression

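A minimal scikit-learn sketch; printing the tree makes the learned decision rules visible:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(tree))                    # the if/else rules the tree learned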
19
Q

Decision Tree Use Cases

A
  • customer churn prediction
  • credit score modeling
  • disease prediction
20
Q

Decision Tree Advantages

A
  • explainable and interpretable
  • can handle missing values
21
Q

Decision Tree Disadvantages

A
  • prone to overfitting
  • sensitive to outliers
22
Q

Random Forest

A

an ensemble learning method that combines the output of multiple decision trees

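A minimal scikit-learn sketch; each of the 100 trees votes and the majority class wins:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print(forest.predict(X[:3]))                # majority vote across the trees
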
23
Q

Random Forest Use Cases

A
  • credit score modeling
  • house price prediction
24
Q

Random Forest Advantages

A
  • reduces overfitting
  • higher accuracy than other models
25
Q

Random Forest Disadvantages

A
  • not very interpretable
  • higher training complexity
26
Q

Gradient Boosting Regression

A

employs boosting to make predictive models from an ensemble of weak predictive learners

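A minimal scikit-learn sketch; each new tree is a weak learner fit to the errors of the ensemble so far:

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
    gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1).fit(X, y)
    print(gbr.predict(X[:3]))                   # predictions from the boosted ensemble
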
27
Q

Gradient Boosting Regression Use Cases

A
  • prediction of car emissions
  • prediction of ride-hailing fare amounts
28
Q

Gradient Boosting Regression Advantages

A
  • better accuracy than other models
  • handles multicollinearity
  • can handle non-linear relationships
29
Q

Gradient Boosting Regression Disadvantages

A
  • sensitive to outliers, which can result in overfitting
  • computationally expensive and high complexity
30
Q

XGBoost

A
  • a gradient boosting algorithm that is efficient and flexible
  • can be used for classification or regression (see the sketch below)
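A minimal sketch assuming the xgboost package is installed; it exposes a scikit-learn style interface:

    from sklearn.datasets import make_classification
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=200, random_state=0)
    clf = XGBClassifier(n_estimators=100).fit(X, y)
    print(clf.predict(X[:3]))                   # boosted-tree class predictions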
31
Q

XGBoost Use Cases

A
  • claims processing in insurance
  • churn prediction
32
Q

XGBoost Advantages

A
  • provides accurate results
  • captures non-linear relationships
33
Q

XGBoost Disadvantages

A
  • hyperparameter tuning complexity
  • does not perform well on sparse datasets
34
Q

LightGBM Regressor

A

a gradient boosting framework designed to be more efficient than other implementations

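A minimal sketch assuming the lightgbm package is installed; LGBMRegressor also follows the scikit-learn interface:

    from sklearn.datasets import make_regression
    from lightgbm import LGBMRegressor

    X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)
    reg = LGBMRegressor(n_estimators=100).fit(X, y)
    print(reg.predict(X[:3]))                   # fast boosted-tree regression
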
35
Q

LightGBM Regressor Use Cases

A
  • predicting flight time for airlines
  • predicting cholesterol level based on health data
36
Q

Tree-based Models

A
  1. decision tree
  2. random forest
  3. gradient boosting regression
  4. XGBoost
  5. LightGBM Regressor
37
Q

LightGBM Regressor Advantages

A
  • can handle large amounts of data
  • computationally efficient
  • low memory usage
38
Q

LightGBM Regressor Disadvantages

A
  • can overfit data due to high complexity
  • hyperparameter tuning can be complex
39
Q

Clustering Models

A
  1. k-means
  2. hierarchical clustering
  3. Gaussian mixture model
40
Q

K-means

A

K-Means is the most widely used clustering approach; it determines K clusters based on Euclidean distances.

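A minimal scikit-learn sketch on two synthetic blobs; K (n_clusters) must be chosen up front:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.5, (50, 2)),   # blob around (0, 0)
                   rng.normal(5, 0.5, (50, 2))])  # blob around (5, 5)

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.cluster_centers_)                    # one centroid per cluster
    print(km.labels_[:5])                         # cluster assignment per point
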
41
Q

K-Means Use Cases

A
  • customer segmentation
  • recommendation system
42
Q

K-Means Advantages

A
  • scales to large datasets
  • simple to interpret and implement
  • produces tight clusters
43
Q

K-Means Disadvantages

A
  • requires the expected number of clusters from the beginning
  • has trouble with varying cluster sizes and densities
44
Q

Hierarchical Clustering

A
  • bottom-up approach where each data point starts as its own cluster
  • the two closest clusters are merged iteratively (see the sketch below)
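A minimal sketch using scipy's agglomerative (bottom-up) linkage; the data is synthetic:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])

    Z = linkage(X, method="ward")                    # iteratively merges the closest clusters
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 flat clusters
    print(labels)

The linkage matrix Z is exactly what scipy's dendrogram function plots.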
45
Q

Hierarchical Clustering Use Cases

A
  • fraud detection
  • document clustering based on similarities
46
Q

Hierarchical Clustering Advantages

A
  • no need to specify the number of clusters
  • resulting dendrogram is informative
47
Q

Hierarchical Clustering Disadvantages

A
  • not always the best-performing clustering method
  • cannot handle large datasets
48
Q

Gaussian Mixture Model

A

a probabilistic model for modeling normally distributed clusters within a dataset

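A minimal scikit-learn sketch on two synthetic Gaussian blobs; note the soft (probabilistic) assignments:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])

    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
    print(gmm.predict_proba(X[:3]))              # probability of belonging to each cluster
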
49
Q

Gaussian Mixture Model Advantages

A
  • computes the probability of an observation belonging to a cluster
  • can identify overlapping clusters
  • can provide more accurate results than k-means
50
Q

Gaussian Mixture Model Disadvantages

A
  • requires complex tuning
  • requires setting the number of mixture components or clusters
51
Q

Gaussian Mixture Model Use Cases

A
  • customer segmentation
  • recommendation system
52
Q

Apriori Algorithm

A

Rule-based approach that identifies the most frequent itemsets in a given dataset, using prior knowledge of frequent itemset properties

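A minimal sketch assuming the mlxtend library; the transactions are made up for illustration:

    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    transactions = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"]]
    te = TransactionEncoder()
    df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

    frequent = apriori(df, min_support=0.5, use_colnames=True)  # frequent itemsets
    print(association_rules(frequent, metric="confidence", min_threshold=0.6))
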
53
Q

Apriori Algorithm Use Cases

A
  • product placement
  • recommendation engine
  • promotional optimization
54
Q

Apriori Algorithm Advantages

A
  • results are interpretable and intuitive
  • exhaustive approach as it finds all rules