Unit 5: Random Forests and Ensemble Learning Flashcards

1
Q

What is a Random Forest, and how does it function?

A

Random Forest: An ensemble learning method that uses multiple decision trees to improve accuracy.
Functionality:

Bagging: Samples data with replacement to create diverse decision trees.
Majority Voting: Final prediction is made based on the majority vote from all trees.
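The bagging and voting steps above can be sketched in plain Python. This is a toy illustration, not a real implementation: one-feature decision stumps stand in for full decision trees, and the data set is made up.

```python
import random
from collections import Counter

def bootstrap_sample(data):
    # Bagging step: sample with replacement so each tree sees different data.
    return [random.choice(data) for _ in data]

def train_stump(sample):
    # Toy "tree": a one-feature threshold stump fit to its bootstrap sample.
    best_thr, best_acc = None, -1.0
    for thr in sorted({x for x, _ in sample}):
        acc = sum((1 if x >= thr else 0) == y for x, y in sample) / len(sample)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return lambda x, thr=best_thr: 1 if x >= thr else 0

def forest_predict(trees, x):
    # Majority vote across all trees decides the final label.
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]

random.seed(0)
data = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]  # (feature, label) pairs
trees = [train_stump(bootstrap_sample(data)) for _ in range(25)]
print(forest_predict(trees, 2), forest_predict(trees, 8))
```

Because each stump is trained on a different bootstrap draw, individual trees disagree near the class boundary, but the vote is stable.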
2
Q

What are the advantages of using Random Forest over a single decision tree?

A

Advantages:

Reduced Overfitting: Random Forest mitigates overfitting compared to a single decision tree.
Improved Accuracy: Combines multiple trees for a more robust model.
Feature Importance: Provides insights into feature significance in predictions.
    Importance Calculation: Based on the decrease in accuracy when a feature is permuted.
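The permutation-based importance calculation can be illustrated with a minimal sketch: shuffle one feature column and measure the drop in accuracy. The model and data below are hypothetical stand-ins, not a trained forest.

```python
import random

def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature_idx, seed=0):
    # Importance = baseline accuracy minus accuracy after shuffling one
    # feature column, which breaks that feature's link to the labels.
    baseline = accuracy(model, X, y)
    column = [row[feature_idx] for row in X]
    random.Random(seed).shuffle(column)
    X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
              for row, v in zip(X, column)]
    return baseline - accuracy(model, X_perm, y)

# Hypothetical model: uses feature 0 only; feature 1 is pure noise.
model = lambda row: 1 if row[0] >= 5 else 0
X = [[1, 9], [2, 1], [3, 7], [7, 2], [8, 8], [9, 3]]
y = [0, 0, 0, 1, 1, 1]

print(permutation_importance(model, X, y, 0))  # informative feature: accuracy usually drops
print(permutation_importance(model, X, y, 1))  # ignored feature: no drop at all
```

Since the toy model never looks at feature 1, permuting it changes nothing, which is exactly the signal the importance score captures.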
3
Q

Explain different ensemble learning techniques and their applications.

A

Ensemble Learning Techniques:

Bagging (Bootstrap Aggregating): Reduces variance by averaging predictions from multiple models (e.g., Random Forest).
Boosting: Sequentially trains models, each correcting errors of the previous one (e.g., AdaBoost, Gradient Boosting).
Stacking: Combines predictions from multiple models using another model to improve performance.
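The variance-reduction idea behind bagging-style voting can be shown with three hypothetical weak classifiers, each wrong on a different single input; the majority vote corrects all three individual mistakes.

```python
from collections import Counter

truth = lambda x: 1 if x >= 5 else 0   # the target concept on inputs 0..9

# Three made-up weak classifiers, each wrong on one distinct input.
model_a = lambda x: 1 - truth(x) if x == 0 else truth(x)
model_b = lambda x: 1 - truth(x) if x == 5 else truth(x)
model_c = lambda x: 1 - truth(x) if x == 9 else truth(x)
models = (model_a, model_b, model_c)

def vote(x):
    # Bagging-style combination: majority vote across the ensemble.
    return Counter(m(x) for m in models).most_common(1)[0][0]

xs = range(10)
for m in models:
    print(sum(m(x) == truth(x) for x in xs) / 10)  # each model alone: 0.9
print(sum(vote(x) == truth(x) for x in xs) / 10)   # majority vote: 1.0
```

Boosting and stacking combine models differently (sequential reweighting, and a learned meta-model, respectively), but the payoff is the same: the ensemble outperforms its members.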
4
Q

What are common evaluation metrics for assessing model performance?

A

Evaluation Metrics:

Accuracy: The ratio of correctly predicted instances to total instances.
    Equation: Accuracy = (TP + TN) / (TP + TN + FP + FN) (where TP, TN, FP, and FN are True Positives, True Negatives, False Positives, and False Negatives, respectively).
Precision: The ratio of correctly predicted positive observations to the total predicted positives.
    Equation: Precision = TP / (TP + FP)
Recall (Sensitivity): The ratio of correctly predicted positive observations to all actual positives.
    Equation: Recall = TP / (TP + FN)
F1 Score: The harmonic mean of precision and recall.
    Equation: F1 Score = 2 · (Precision · Recall) / (Precision + Recall)
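The four metrics translate directly into code. The confusion-matrix counts below are made-up numbers for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    # Direct translation of the four formulas above.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical confusion-matrix counts.
acc, prec, rec, f1 = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(acc)   # 85 / 100 = 0.85
print(prec)  # 40 / 45 ≈ 0.889
print(rec)   # 40 / 50 = 0.8
print(f1)    # ≈ 0.842
```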
5
Q

Mean

A

Mean (Average):
Mean = (1/n) ∑_{i=1}^{n} x_i

Where x_i are the data points and n is the number of data points.
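A quick check of the formula against Python's statistics module (the data values are arbitrary):

```python
import statistics

data = [2, 4, 4, 4, 6, 8]
print(sum(data) / len(data))   # the formula applied directly: 28 / 6 ≈ 4.667
print(statistics.mean(data))   # the stdlib gives the same value
```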
6
Q

Median:

For an ordered dataset, the median is the middle value. If there is an even number of observations:
A

Median = (x_{n/2} + x_{n/2 + 1}) / 2
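Python's statistics.median applies exactly this rule when the number of observations is even:

```python
import statistics

odd = [1, 3, 5]
even = [1, 3, 5, 7]
print(statistics.median(odd))   # 3, the middle value
print(statistics.median(even))  # 4.0, i.e. (3 + 5) / 2
```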

7
Q

Mode:

The most frequently occurring value in a dataset.
A

Linear Regression Equation:
y = m·x + b
Where y is the predicted value, m is the slope, x is the feature, and b is the y-intercept.
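A one-line sketch of the prediction step, with made-up slope and intercept values:

```python
def predict(x, m=2.0, b=1.0):
    # y = m*x + b; slope and intercept here are hypothetical, not fitted.
    return m * x + b

print(predict(3))  # 2.0 * 3 + 1.0 = 7.0
```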

Confusion Matrix:
    Useful for evaluating classification models:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
    Where:
        TP = True Positives
        TN = True Negatives
        FP = False Positives
        FN = False Negatives

Precision and Recall:
    Precision:
Precision = TP / (TP + FP)
    Recall (Sensitivity):
Recall = TP / (TP + FN)

Unit 3: Neural Networks

Activation Function (Sigmoid):
σ(x) = 1 / (1 + e^(−x))
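The sigmoid translates directly to code; note how it maps 0 to 0.5 and saturates toward 1 for large inputs:

```python
import math

def sigmoid(x):
    # σ(x) = 1 / (1 + e^(−x))
    return 1 / (1 + math.exp(-x))

print(sigmoid(0))   # 0.5, the midpoint
print(sigmoid(10))  # close to 1: large inputs saturate
```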

Mean Squared Error (MSE) Loss Function:
MSE = (1/n) ∑_{i=1}^{n} (y_i − ŷ_i)²
    Where y_i is the true value and ŷ_i is the predicted value.
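A direct implementation of the MSE formula, using made-up true and predicted values:

```python
def mse(y_true, y_pred):
    # MSE = (1/n) * sum((y_i - ŷ_i)^2)
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)

print(mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))  # (0 + 0.25 + 1) / 3 ≈ 0.417
```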

Unit 4: Support Vector Machines and Flexible Discriminants

SVM Decision Function:
f(x) = ∑_{i=1}^{n} α_i · y_i · K(x_i, x) + b
    Where:
        α_i are the weights,
        y_i are the class labels,
        K is the kernel function.
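The decision function can be sketched with a linear kernel K(u, v) = u·v on a 1-D toy problem. The support vectors, weights, and bias below are invented for illustration, not the output of any training procedure.

```python
def svm_decision(x, support_vectors, alphas, labels, kernel, b):
    # f(x) = sum_i alpha_i * y_i * K(x_i, x) + b
    return sum(a * y * kernel(xi, x)
               for a, y, xi in zip(alphas, labels, support_vectors)) + b

linear = lambda u, v: u * v   # linear kernel for a 1-D toy problem

# All values below are made up for illustration, not learned by training.
sv, alpha, y, b = [1.0, 3.0], [0.5, 0.5], [-1, 1], -1.0
score = svm_decision(4.0, sv, alpha, y, linear, b)
print(score)  # -2.0 + 6.0 - 1.0 = 3.0; positive sign → positive class
```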

Unit 5: Random Forests and Ensemble Learning

Entropy for Information Gain:
H(S) = −∑_{i=1}^{c} p_i · log₂(p_i)
    Where p_i is the proportion of class i in set S.

Gini Index:
Gini(S) = 1 − ∑_{i=1}^{c} p_i²
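Both impurity measures are one-liners over the class proportions:

```python
import math

def entropy(proportions):
    # H(S) = -sum(p_i * log2(p_i)); terms with p = 0 contribute nothing.
    return -sum(p * math.log2(p) for p in proportions if p > 0)

def gini(proportions):
    # Gini(S) = 1 - sum(p_i^2)
    return 1 - sum(p * p for p in proportions)

print(entropy([0.5, 0.5]))  # 1.0 -> maximum impurity for two classes
print(gini([0.5, 0.5]))     # 0.5
print(gini([1.0]))          # 0.0 -> a pure node
```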

General Statistical Concepts

Central Limit Theorem:
    States that the distribution of the sample means approaches a normal distribution as the sample size increases, regardless of the population's distribution.
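A quick simulation (fair-die rolls, a decidedly non-normal population) illustrates the theorem: the mean of many sample means lands close to the population mean of 3.5.

```python
import random
import statistics

random.seed(42)

# Population: single fair-die rolls -- uniform on {1..6}, not normal at all.
def sample_mean(n):
    return statistics.mean(random.randint(1, 6) for _ in range(n))

# Draw many sample means; their distribution piles up around μ = 3.5.
means = [sample_mean(50) for _ in range(2000)]
print(statistics.mean(means))   # close to 3.5
print(statistics.stdev(means))  # much smaller than the population stdev (≈ 1.71)
```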

Hypothesis Testing:
    Test statistic:
z = (x̄ − μ) / (s / √n)
    Where:
        x̄ = sample mean,
        μ = population mean,
        s = sample standard deviation,
        n = sample size.
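A direct implementation of the test statistic with made-up sample numbers:

```python
import math

def z_statistic(x_bar, mu, s, n):
    # z = (x̄ − μ) / (s / √n)
    return (x_bar - mu) / (s / math.sqrt(n))

print(z_statistic(52, 50, 8, 16))  # 2 / (8 / 4) = 1.0
```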


8
Q

Variance:

A

Variance (s²) = (1 / (n − 1)) · ∑_{i=1}^{n} (x_i − x̄)²

Where x̄ is the mean.
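A check of the sample-variance formula against the stdlib (arbitrary data):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
x_bar = sum(data) / len(data)
s2 = sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)  # the formula above
print(s2)                         # 32 / 7 ≈ 4.571
print(statistics.variance(data))  # same sample variance from the stdlib
```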
