Class Five Flashcards
What is Boosting in machine learning?
Boosting is a machine learning technique that combines multiple weak learners (models) to create a strong learner. It sequentially trains models, giving more weight to misclassified instances to improve overall prediction accuracy.
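As an illustration, here is a minimal sketch of boosting using scikit-learn's AdaBoostClassifier, whose default weak learner is a depth-1 decision tree (a "stump"). The synthetic dataset and parameter values are placeholders, not from the flashcards:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Toy dataset; any binary classification data would do.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 weak learners trained sequentially; each round reweights the
# training instances that the previous learners misclassified.
boosted = AdaBoostClassifier(n_estimators=50, random_state=0)
boosted.fit(X_train, y_train)
print("test accuracy:", boosted.score(X_test, y_test))
```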
What is Gradient Boosting?
Gradient Boosting is a boosting algorithm that builds an ensemble of weak prediction models in a stage-wise manner, where each new model corrects the errors made by the previous models by minimizing a loss function using gradient descent.
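A minimal sketch with scikit-learn's GradientBoostingClassifier (synthetic data and illustrative hyperparameters, chosen only for the example):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 shallow trees is fit to the negative gradient of the
# loss with respect to the current ensemble's predictions, so every
# stage corrects the errors made by the stages before it.
gbm = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
gbm.fit(X_train, y_train)
print("test accuracy:", gbm.score(X_test, y_test))
```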
What are the advantages of Gradient Boosting?
Advantages of Gradient Boosting include high prediction accuracy, handling of complex data relationships, and ability to capture interactions between features.
What are the limitations of Gradient Boosting?
Limitations of Gradient Boosting include potential overfitting if the model is too complex, sensitivity to noisy data, and longer training time compared to other algorithms.
What is Support Vector Machine (SVM)?
Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression tasks. It finds an optimal hyperplane that separates data points of different classes with the maximum margin.
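A minimal sketch of a linear SVM with scikit-learn (synthetic data; the kernel and C value are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear SVM finds the hyperplane separating the classes with the
# maximum margin; scaling the features first is standard practice.
svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
```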
What are the advantages of Support Vector Machines (SVM)?
Advantages of SVM include effectiveness in high-dimensional spaces, robustness against overfitting, and versatility through the choice of kernel functions.
- Training is relatively easy: the optimization problem is convex, so there are no local optima.
- Scales well to high-dimensional data.
- The trade-off between classifier complexity and error can be controlled explicitly.
What are the limitations of Support Vector Machines (SVM)?
Limitations of SVM include sensitivity to the choice of kernel function and hyperparameters, computational complexity on large datasets, and difficulty handling noisy or overlapping data. In particular, the method's efficiency depends heavily on choosing a suitable kernel function.
What is the C parameter in SVM?
- C is the trade-off between training error and the flatness of the solution.
- It determines how heavily outliers (misclassified points) are penalized in the optimization.
- Aim: keep the training error small while still generalizing well.
- Larger C means less training error but risks losing generalization (overfitting).
- Smaller C means a flatter (smoother) classifier, possibly at the cost of more training error.
- Grid search can be used to estimate C (see the sketch below).
- RBF-SVM has two parameters: C and gamma, which controls the radius (width) of the RBF kernel.
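A minimal sketch of estimating C and gamma by cross-validated grid search with scikit-learn (the data and parameter grid are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cross-validated grid search over C and gamma for an RBF-SVM.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)
print("best parameters:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```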
What is sampling error?
Sampling error is the difference between the characteristics observed in a sample and the true characteristics of the population it represents. It arises due to random sampling variation.
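A small NumPy demonstration (the population values are made up for the example): the gap between a random sample's mean and the true population mean is the sampling error.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Population": a million values with a known true mean.
population = rng.normal(loc=50.0, scale=10.0, size=1_000_000)
true_mean = population.mean()

# A random sample's mean differs from the true mean purely by chance;
# that difference is the sampling error.
sample = rng.choice(population, size=100, replace=False)
print("sampling error:", sample.mean() - true_mean)
```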
What is sampling bias?
Sampling bias occurs when the sample used in a study or analysis is not representative of the entire population, leading to systematic errors and inaccurate generalizations.
What are Type 1 and Type 2 errors?
Type 1 error, also known as a false positive, occurs when a true null hypothesis is incorrectly rejected. Type 2 error, or false negative, happens when a false null hypothesis is incorrectly retained.
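A minimal simulation (assumed setup, not from the flashcards) that makes the Type 1 error rate concrete: when the null hypothesis is actually true, testing at significance level 0.05 falsely rejects it about 5% of the time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
false_positives = 0
n_trials = 10_000

for _ in range(n_trials):
    # The null hypothesis is TRUE here: the data really have mean 0.
    data = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p = stats.ttest_1samp(data, popmean=0.0)
    if p < alpha:  # rejecting a true null = Type 1 error
        false_positives += 1

print("Type 1 error rate:", false_positives / n_trials)  # approx. 0.05
```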
What is a p-value?
A p-value is a statistical measure that helps determine the significance of results in hypothesis testing. It represents the probability of observing the data or more extreme results if the null hypothesis is true.
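A small hypothetical example (the counts are made up): testing whether a coin is fair after observing 60 heads in 100 flips, using scipy.stats.binomtest (SciPy 1.7+).

```python
from scipy import stats

# H0: the coin is fair (p = 0.5). We observed 60 heads in 100 flips.
result = stats.binomtest(60, n=100, p=0.5, alternative="two-sided")

# The p-value is the probability of a result at least this extreme
# if the null hypothesis (a fair coin) is true.
print("p-value:", result.pvalue)  # approx. 0.057
```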
What are the limitations of p-values?
Limitations of p-values include reliance on arbitrary thresholds for significance, susceptibility to sample size effects, and potential misinterpretation leading to erroneous conclusions.
How can sampling errors be reduced?
Sampling errors can be reduced by increasing the sample size, ensuring random sampling, and minimizing non-response rates to obtain a more representative sample of the population.
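A short simulation (same made-up population as the earlier sketch) showing that the average sampling error shrinks as the sample size grows, roughly in proportion to one over the square root of n:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=50.0, scale=10.0, size=1_000_000)
true_mean = population.mean()

# Average absolute sampling error over repeated draws, for growing n.
for n in (10, 100, 1000, 10000):
    errors = [
        abs(rng.choice(population, size=n, replace=False).mean() - true_mean)
        for _ in range(200)
    ]
    print(f"n={n:>5}: mean |error| = {np.mean(errors):.3f}")
```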
How can sampling bias be addressed?
Sampling bias can be addressed by using appropriate sampling techniques (e.g., stratified sampling), ensuring diverse and unbiased participant selection, and accounting for potential biases in data analysis.
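A minimal sketch of stratified sampling with scikit-learn's train_test_split (the imbalanced labels are simulated for the example): stratifying preserves the class proportions, so the sample stays representative of the population.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Imbalanced labels: about 90% class 0, 10% class 1.
y = (rng.random(1000) < 0.1).astype(int)
X = rng.normal(size=(1000, 3))

# stratify=y preserves the class proportions in both splits, avoiding
# a split whose class distribution misrepresents the population.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print("population class-1 rate:", y.mean())
print("test-split class-1 rate:", y_te.mean())
```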