Unit 5: Random Forests and Ensemble Learning Flashcards
What is a Random Forest, and how does it function?
Random Forest: An ensemble learning method that uses multiple decision trees to improve accuracy.
Functionality:
- Bagging: Each tree is trained on a bootstrap sample of the data (sampled with replacement), producing diverse decision trees.
- Majority Voting: The final prediction is based on the majority vote across all trees.
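To make the mechanics concrete, here is a minimal sketch of training a Random Forest with scikit-learn; the dataset and parameter values are illustrative, not prescribed by these flashcards.

```python
# Minimal Random Forest sketch (illustrative dataset and parameters).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators = number of bootstrapped trees; each tree votes on the class.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# predict()/score() use the majority vote across all trees.
print("Test accuracy:", forest.score(X_test, y_test))
```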
What are the advantages of using Random Forest over a single decision tree?
Advantages:
- Reduced Overfitting: Random Forest mitigates overfitting compared to a single decision tree.
- Improved Accuracy: Combining multiple trees yields a more robust model.
- Feature Importance: Provides insight into how much each feature contributes to predictions.
- Importance Calculation: Based on the decrease in accuracy when a feature's values are permuted.
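A minimal sketch of the permutation-importance calculation with scikit-learn's `permutation_importance`, which shuffles one feature at a time and reports the resulting score drop; the dataset and settings are illustrative.

```python
# Permutation importance sketch (illustrative dataset and settings).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Shuffle each feature n_repeats times and measure the mean accuracy drop.
result = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
for name, score in zip(load_iris().feature_names, result.importances_mean):
    print(f"{name}: {score:.3f}")
```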
Explain different ensemble learning techniques and their applications.
Ensemble Learning Techniques:
- Bagging (Bootstrap Aggregating): Reduces variance by averaging predictions from multiple models trained on bootstrap samples (e.g., Random Forest).
- Boosting: Trains models sequentially, with each model correcting the errors of the previous one (e.g., AdaBoost, Gradient Boosting).
- Stacking: Combines the predictions of multiple base models using another model (a meta-model) to improve performance.
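All three techniques are available in scikit-learn; the sketch below compares them on one dataset. The estimator choices and parameters are assumptions for illustration only.

```python
# Sketch of bagging, boosting, and stacking (illustrative estimator choices).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    # Bagging: many trees on bootstrap samples, predictions combined by vote.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50),
    # Boosting: models trained sequentially, each reweighting prior errors.
    "boosting": AdaBoostClassifier(n_estimators=50),
    # Stacking: a logistic-regression meta-model combines base predictions.
    "stacking": StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier()),
                    ("lr", LogisticRegression(max_iter=5000))],
        final_estimator=LogisticRegression(max_iter=5000),
    ),
}

for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```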
What are common evaluation metrics for assessing model performance?
Evaluation Metrics:
- Accuracy: The ratio of correctly predicted instances to total instances.
  Equation: Accuracy = \frac{TP + TN}{TP + TN + FP + FN}
  (where TP, TN, FP, and FN are True Positives, True Negatives, False Positives, and False Negatives, respectively)
- Precision: The ratio of correctly predicted positive observations to the total predicted positives.
  Equation: Precision = \frac{TP}{TP + FP}
- Recall (Sensitivity): The ratio of correctly predicted positive observations to all actual positives.
  Equation: Recall = \frac{TP}{TP + FN}
- F1 Score: The harmonic mean of precision and recall.
  Equation: F1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}
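A quick arithmetic sketch of all four metrics computed directly from confusion-matrix counts; the counts themselves are made up for illustration.

```python
# Computing the four metrics from confusion-matrix counts (made-up numbers).
tp, tn, fp, fn = 40, 45, 5, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 85/100 = 0.85
precision = tp / (tp + fp)                          # 40/45 ≈ 0.889
recall = tp / (tp + fn)                             # 40/50 = 0.80
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.842

print(accuracy, precision, recall, f1)
```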
Mean (Average):
Mean = \frac{\sum_{i=1}^{n} x_i}{n}
Where x_i are the data points and n is the number of data points.
Median:
For an ordered dataset, the median is the middle value. If there is an even number of observations:
Median = \frac{x_{n/2} + x_{n/2+1}}{2}
Mode:
The most frequently occurring value in a dataset.
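A minimal sketch of all three central-tendency measures using Python's standard library; the sample data is invented.

```python
# Mean, median, and mode with the standard library (invented data).
import statistics

data = [2, 3, 3, 5, 7, 8, 9, 9, 9, 12]

print("mean:", statistics.mean(data))      # sum(data) / len(data) = 6.7
print("median:", statistics.median(data))  # mean of 5th and 6th values = 7.5
print("mode:", statistics.mode(data))      # most frequent value = 9
```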
Linear Regression Equation:
y = mx + b
Where y is the predicted value, m is the slope, x is the feature, and b is the y-intercept.
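A small sketch fitting m and b by least squares with NumPy; the data points are invented for illustration.

```python
# Fitting y = m*x + b by least squares (invented data points).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# np.polyfit with degree 1 returns [slope, intercept].
m, b = np.polyfit(x, y, 1)
print(f"y = {m:.2f}x + {b:.2f}")  # roughly y = 1.95x + 0.15
```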
Confusion Matrix: Useful for evaluating classification models.
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}
Where: TP = True Positives, TN = True Negatives, FP = False Positives, FN = False Negatives.
Precision and Recall:
Precision = \frac{TP}{TP + FP}
Recall (Sensitivity): Recall = \frac{TP}{TP + FN}
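A sketch of extracting these counts from scikit-learn's `confusion_matrix`; the label vectors are made up.

```python
# Reading TP/TN/FP/FN from a scikit-learn confusion matrix (made-up labels).
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
# Precision = tp / (tp + fp); Recall = tp / (tp + fn)
```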
Unit 3: Neural Networks
Activation Function (Sigmoid):
\sigma(x) = \frac{1}{1 + e^{-x}}
Mean Squared Error (MSE) Loss Function:
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
Where y_i is the true value and \hat{y}_i is the predicted value.
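Both formulas are one-liners in NumPy; the sketch below uses illustrative inputs.

```python
# Sigmoid activation and MSE loss in NumPy (illustrative values).
import numpy as np

def sigmoid(x):
    """sigma(x) = 1 / (1 + e^(-x)); squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def mse(y_true, y_pred):
    """Mean of squared differences between targets and predictions."""
    return np.mean((y_true - y_pred) ** 2)

print(sigmoid(0.0))                                     # 0.5
print(mse(np.array([1.0, 2.0]), np.array([1.5, 2.5])))  # 0.25
```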
Unit 4: Support Vector Machines and Flexible Discriminants
SVM Decision Function:
f(x) = \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b
Where: \alpha_i are the learned weights, y_i are the class labels, K is the kernel function, and b is the bias term.
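A sketch of evaluating this decision function by hand with an RBF kernel; the support vectors, labels, dual weights, and gamma value are all made up for illustration.

```python
# Evaluating the SVM decision function with an RBF kernel (made-up support
# vectors, labels, and dual weights; gamma is an assumed kernel parameter).
import numpy as np

def rbf_kernel(xi, x, gamma=0.5):
    """K(xi, x) = exp(-gamma * ||xi - x||^2)."""
    return np.exp(-gamma * np.sum((xi - x) ** 2))

def decision_function(support_vectors, labels, alphas, b, x):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b; its sign gives the class."""
    return sum(a * y * rbf_kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b

support_vectors = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]])
labels = np.array([1, -1, 1])
alphas = np.array([0.7, 0.9, 0.2])
b = 0.1

print(decision_function(support_vectors, labels, alphas, b,
                        np.array([1.0, 1.0])))
```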
Unit 5: Random Forests and Ensemble Learning
Entropy for Information Gain:
H(S) = -\sum_{i=1}^{c} p_i \log_2(p_i)
Where p_i is the proportion of class i in set S.
Gini Index:
Gini(S) = 1 - \sum_{i=1}^{c} p_i^2
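A short sketch computing both impurity measures from class proportions; the example split is made up.

```python
# Entropy and Gini index from class proportions (made-up split).
import numpy as np

def entropy(p):
    """H(S) = -sum(p_i * log2(p_i)); 0 * log(0) is treated as 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def gini(p):
    """Gini(S) = 1 - sum(p_i^2)."""
    return 1.0 - np.sum(p ** 2)

p = np.array([0.5, 0.5])  # a perfectly mixed binary split
print(entropy(p))         # 1.0 (maximum for two classes)
print(gini(p))            # 0.5 (maximum for two classes)
```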
General Statistical Concepts
Central Limit Theorem: States that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the population's distribution.
Hypothesis Testing: Test statistic:
z = \frac{\bar{x} - \mu}{s / \sqrt{n}}
Where: \bar{x} = sample mean, \mu = population mean, s = sample standard deviation, n = sample size.
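A quick worked sketch of the z statistic; the sample summary numbers are invented.

```python
# Computing the z test statistic (invented sample summary numbers).
import math

x_bar = 52.0  # sample mean
mu = 50.0     # hypothesized population mean
s = 8.0       # sample standard deviation
n = 64        # sample size

z = (x_bar - mu) / (s / math.sqrt(n))
print(z)  # (52 - 50) / (8 / 8) = 2.0
```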
Variance:
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
Where \bar{x} is the mean.
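A one-line sketch of the sample variance in NumPy; the data is made up, and `ddof=1` selects the n − 1 denominator from the formula above.

```python
# Sample variance with NumPy (made-up data; ddof=1 gives the n-1 denominator).
import numpy as np

data = np.array([4.0, 7.0, 9.0, 12.0])
print(np.var(data, ddof=1))  # sum((x - 8)^2) / 3 = (16+1+1+16)/3 ≈ 11.333
```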