Class Seven Flashcards
What is Bayesian statistics?
Bayesian statistics is an approach to statistical inference that uses Bayes’ theorem to update prior beliefs or knowledge based on observed data, resulting in posterior probability distributions.
P(A|B) = P(B|A) * P(A) / P(B)
where P(A) is the prior, P(B|A) the likelihood, P(B) the evidence, and P(A|B) the posterior.
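As a quick sanity check of the formula, here is a minimal numeric sketch in Python; the diagnostic-test numbers (prevalence, sensitivity, false-positive rate) are entirely made up for illustration.

```python
# Hypothetical diagnostic test: A = has disease, B = test is positive.
p_disease = 0.01            # prior P(A): 1% prevalence
p_pos_given_disease = 0.95  # likelihood P(B|A): test sensitivity
p_pos_given_healthy = 0.10  # P(B|not A): false-positive rate

# Evidence P(B) via the law of total probability.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior P(A|B): probability of disease given a positive test.
posterior = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) = {posterior:.3f}")  # ~0.088
```

Even with a 95% sensitive test, the low prior keeps the posterior below 9%, which is exactly the prior-updating behavior the definition describes.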
What are the advantages of Bayesian statistics?
Advantages of Bayesian statistics include the ability to incorporate prior knowledge, flexibility in handling complex models, and the interpretation of results as probabilities.
What are the limitations of Bayesian statistics?
Limitations of Bayesian statistics include the need to specify prior distributions, computational complexity for complex models, and potential subjectivity in the choice of priors.
What are probabilistic classifiers?
Probabilistic classifiers are machine learning models that assign class labels to instances and provide a probability or likelihood estimate of the assigned label based on observed features or attributes.
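A minimal sketch of what that interface looks like in practice, using scikit-learn's LogisticRegression purely as one example of a probabilistic classifier (the toy dataset is generated just for the demo):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data generated only for the demo.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

print(clf.predict(X[:3]))        # hard class labels, e.g. [0 1 0]
print(clf.predict_proba(X[:3]))  # one probability per class; rows sum to 1
```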
What is the Naive Bayes classifier?
The Naive Bayes classifier is a probabilistic classifier based on Bayes' theorem and the assumption that features are independent given the class. It computes the posterior probability of each class and assigns the instance to the class with the highest posterior.
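A minimal sketch with scikit-learn's MultinomialNB; the tiny spam/not-spam corpus and its labels are invented purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["free prize money", "meeting schedule today",
        "win free cash now", "project status meeting"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (made-up toy data)

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # word counts as features
clf = MultinomialNB().fit(X, labels)

new = vectorizer.transform(["free meeting today"])
print(clf.predict(new))        # hard label: class with the highest posterior
print(clf.predict_proba(new))  # posterior probability for each class
```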
What are the advantages of the Naive Bayes classifier?
Advantages of the Naive Bayes classifier include simplicity, fast training and prediction times, and the ability to handle high-dimensional data.
What are the limitations of the Naive Bayes classifier?
Limitations of the Naive Bayes classifier include the assumption of feature independence (which may not hold in practice), sensitivity to irrelevant features, and the potential for poor performance on imbalanced datasets.
What is the difference between Naive Bayes and decision trees?
Naive Bayes:
* Combines all features simultaneously.
* Training: one pass over the data to count.
* Relies on a conditional independence assumption.
* Testing: looks at all features.
* New data: just update the counts.
* Accuracy: good if the features are almost independent given the label (e.g., text).
Decision trees:
* A sequence of rules, each based on one feature.
* Training: one pass over the data per level of depth.
* Greedy splitting as an approximation.
* Testing: only looks at the features used in the rules.
* New data: might need to change the tree.
* Accuracy: good if simple rules based on individual features work (e.g., "symptoms").
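A rough side-by-side of the two models from the comparison above, trained on the same generated toy data (the dataset and the max_depth=3 setting are arbitrary demo choices):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Same split, two very different fitting strategies.
for model in (GaussianNB(), DecisionTreeClassifier(max_depth=3)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, round(model.score(X_te, y_te), 3))
```

Which one wins depends on the data, matching the accuracy notes in the comparison.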
What is class imbalance in machine learning?
Class imbalance refers to a situation where the distribution of class labels in a dataset is uneven, with one class having significantly fewer instances than the others.
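A quick way to check for imbalance is simply to count label frequencies; the 95%/5% split below is hypothetical:

```python
from collections import Counter

y = [0] * 950 + [1] * 50  # hypothetical labels: 95% majority, 5% minority
counts = Counter(y)
print(counts)                            # Counter({0: 950, 1: 50})
print(counts[1] / sum(counts.values()))  # minority share: 0.05
```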
What is SMOTE (Synthetic Minority Over-sampling Technique)?
SMOTE is a technique for addressing class imbalance that generates synthetic minority-class instances by interpolating between existing minority-class instances and their nearest minority-class neighbors.
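A minimal sketch using the SMOTE implementation in the imbalanced-learn library (installed separately as imbalanced-learn); the imbalanced toy dataset is generated just for the demo:

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Imbalanced toy data: roughly 90% majority, 10% minority.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print(Counter(y))  # e.g. roughly Counter({0: 900, 1: 100})

# k_neighbors sets how many minority neighbors interpolation can draw on.
X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print(Counter(y_res))  # both classes now equally represented
```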
What are the advantages of SMOTE?
Advantages of SMOTE include the ability to balance class distribution, the generation of diverse synthetic samples, and the potential improvement of minority class prediction performance.
What are the limitations of SMOTE?
Limitations of SMOTE include the potential creation of noisy or unrealistic synthetic instances, sensitivity to the choice of neighbors, and difficulties in handling overlapping or borderline cases.
What is ADASYN (Adaptive Synthetic Sampling)?
ADASYN is an extension of SMOTE that adaptively decides how many synthetic samples to generate for each minority-class instance based on how difficult that instance is to learn, measured by the share of majority-class examples among its nearest neighbors.
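A minimal sketch using imbalanced-learn's ADASYN, with the same generated toy data as the SMOTE example; note that ADASYN's neighbor parameter is n_neighbors rather than SMOTE's k_neighbors:

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import ADASYN

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# n_neighbors controls the neighborhood used to estimate instance difficulty.
X_res, y_res = ADASYN(n_neighbors=5, random_state=0).fit_resample(X, y)
print(Counter(y_res))  # roughly balanced; counts need not be exactly equal
```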
What are the advantages of ADASYN?
Advantages of ADASYN include its ability to focus on more challenging minority class instances, better handling of overlapping classes, and potential performance improvement over SMOTE in imbalanced datasets.
What are the limitations of ADASYN?
Limitations of ADASYN include the potential generation of noisy synthetic instances, sensitivity to the choice of neighbors, and the need for careful parameter tuning.