Class Five Flashcards
What is Boosting in machine learning?
Boosting is a machine learning technique that combines multiple weak learners (models) to create a strong learner. It sequentially trains models, giving more weight to misclassified instances to improve overall prediction accuracy.
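As an illustration, here is a minimal sketch of boosting using scikit-learn's AdaBoostClassifier, whose default weak learner is a depth-1 decision tree (a "stump"). The synthetic dataset and parameter values are placeholders, not from the flashcards:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Toy dataset; any binary classification data would do.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 weak learners trained sequentially; each round reweights the
# training instances that the previous learners misclassified.
boosted = AdaBoostClassifier(n_estimators=50, random_state=0)
boosted.fit(X_train, y_train)
print("test accuracy:", boosted.score(X_test, y_test))
```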
What is Gradient Boosting?
Gradient Boosting is a boosting algorithm that builds an ensemble of weak prediction models in a stage-wise manner, where each new model corrects the errors made by the previous models by minimizing a loss function using gradient descent.
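A minimal sketch with scikit-learn's GradientBoostingClassifier (synthetic data and illustrative hyperparameters, chosen only for the example):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 shallow trees is fit to the negative gradient of the
# loss with respect to the current ensemble's predictions, so every
# stage corrects the errors made by the stages before it.
gbm = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
gbm.fit(X_train, y_train)
print("test accuracy:", gbm.score(X_test, y_test))
```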
What are the advantages of Gradient Boosting?
Advantages of Gradient Boosting include high prediction accuracy, handling of complex data relationships, and ability to capture interactions between features.
What are the limitations of Gradient Boosting?
Limitations of Gradient Boosting include potential overfitting if the model is too complex, sensitivity to noisy data, and longer training time compared to other algorithms.
What is Support Vector Machine (SVM)?
Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression tasks. It finds an optimal hyperplane that separates data points of different classes with the maximum margin.
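A minimal sketch of a linear SVM with scikit-learn (synthetic data; the kernel and C value are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear SVM finds the hyperplane separating the classes with the
# maximum margin; scaling the features first is standard practice.
svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
```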
What are the advantages of Support Vector Machines (SVM)?
Advantages of SVM include effectiveness in high-dimensional spaces, robustness against overfitting, and versatility through the choice of kernel functions.
- Training is relatively easy: the optimization problem is convex, so there are no local optima.
- Scales well to high-dimensional data.
- The trade-off between classifier complexity and error can be controlled explicitly.
What are the limitations of Support Vector Machines (SVM)?
Limitations of SVM include sensitivity to the choice of kernel function and hyperparameters, computational complexity on large datasets, and difficulty handling noisy or overlapping data. In particular, the method's efficiency depends heavily on choosing a suitable kernel function.
What is the C parameter in SVM?
- C is the trade-off between training error and the flatness of the solution.
- It determines how heavily outliers (misclassified points) are penalized in the optimization.
- Aim: keep the training error small while still generalizing well.
- Larger C means less training error but risks losing generalization (overfitting).
- Smaller C means a flatter (smoother) classifier, possibly at the cost of more training error.
- Grid search can be used to estimate C (see the sketch below).
- RBF-SVM has two parameters: C and gamma, which controls the radius (width) of the RBF kernel.
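A minimal sketch of estimating C and gamma by cross-validated grid search with scikit-learn (the data and parameter grid are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cross-validated grid search over C and gamma for an RBF-SVM.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)
print("best parameters:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```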
What is sampling error?
Sampling error is the difference between the characteristics observed in a sample and the true characteristics of the population it represents. It arises due to random sampling variation.
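A small NumPy demonstration (the population values are made up for the example): the gap between a random sample's mean and the true population mean is the sampling error.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Population": a million values with a known true mean.
population = rng.normal(loc=50.0, scale=10.0, size=1_000_000)
true_mean = population.mean()

# A random sample's mean differs from the true mean purely by chance;
# that difference is the sampling error.
sample = rng.choice(population, size=100, replace=False)
print("sampling error:", sample.mean() - true_mean)
```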
What is sampling bias?
Sampling bias occurs when the sample used in a study or analysis is not representative of the entire population, leading to systematic errors and inaccurate generalizations.
What are Type 1 and Type 2 errors?
Type 1 error, also known as a false positive, occurs when a true null hypothesis is incorrectly rejected. Type 2 error, or false negative, happens when a false null hypothesis is incorrectly retained.
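A minimal simulation (assumed setup, not from the flashcards) that makes the Type 1 error rate concrete: when the null hypothesis is actually true, testing at significance level 0.05 falsely rejects it about 5% of the time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
false_positives = 0
n_trials = 10_000

for _ in range(n_trials):
    # The null hypothesis is TRUE here: the data really have mean 0.
    data = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p = stats.ttest_1samp(data, popmean=0.0)
    if p < alpha:  # rejecting a true null = Type 1 error
        false_positives += 1

print("Type 1 error rate:", false_positives / n_trials)  # approx. 0.05
```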
What is a p-value?
A p-value is a statistical measure that helps determine the significance of results in hypothesis testing. It represents the probability of observing the data or more extreme results if the null hypothesis is true.
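A small hypothetical example (the counts are made up): testing whether a coin is fair after observing 60 heads in 100 flips, using scipy.stats.binomtest (SciPy 1.7+).

```python
from scipy import stats

# H0: the coin is fair (p = 0.5). We observed 60 heads in 100 flips.
result = stats.binomtest(60, n=100, p=0.5, alternative="two-sided")

# The p-value is the probability of a result at least this extreme
# if the null hypothesis (a fair coin) is true.
print("p-value:", result.pvalue)  # approx. 0.057
```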
What are the limitations of p-values?
Limitations of p-values include reliance on arbitrary thresholds for significance, susceptibility to sample size effects, and potential misinterpretation leading to erroneous conclusions.
How can sampling errors be reduced?
Sampling errors can be reduced by increasing the sample size, ensuring random sampling, and minimizing non-response rates to obtain a more representative sample of the population.
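A short simulation (same made-up population as the earlier sketch) showing that the average sampling error shrinks as the sample size grows, roughly in proportion to one over the square root of n:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=50.0, scale=10.0, size=1_000_000)
true_mean = population.mean()

# Average absolute sampling error over repeated draws, for growing n.
for n in (10, 100, 1000, 10000):
    errors = [
        abs(rng.choice(population, size=n, replace=False).mean() - true_mean)
        for _ in range(200)
    ]
    print(f"n={n:>5}: mean |error| = {np.mean(errors):.3f}")
```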
How can sampling bias be addressed?
Sampling bias can be addressed by using appropriate sampling techniques (e.g., stratified sampling), ensuring diverse and unbiased participant selection, and accounting for potential biases in data analysis.
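A minimal sketch of stratified sampling with scikit-learn's train_test_split (the imbalanced labels are simulated for the example): stratifying preserves the class proportions, so the sample stays representative of the population.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Imbalanced labels: about 90% class 0, 10% class 1.
y = (rng.random(1000) < 0.1).astype(int)
X = rng.normal(size=(1000, 3))

# stratify=y preserves the class proportions in both splits, avoiding
# a split whose class distribution misrepresents the population.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print("population class-1 rate:", y.mean())
print("test-split class-1 rate:", y_te.mean())
```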