Classification Flashcards

1
Q

what is classification

A

Classification is to determine P(H|X) (i.e., the posterior probability): the probability that the hypothesis H holds given the observed data sample X.

2
Q

define the different probabilities involved in Bayesian classification

A

● P(H) (prior probability): the initial probability of the hypothesis
○ E.g., X will buy a computer, regardless of age or income
● P(X) (evidence): the probability that the sample data is observed
● P(X|H) (likelihood): the probability of observing the sample X, given that the hypothesis holds
○ E.g., given that X will buy a computer, the probability that X has a given age and income
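For reference, these quantities combine via Bayes' theorem (standard result; notation matches the cards above):

```latex
% Posterior = likelihood times prior, normalized by the evidence.
P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}
```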

3
Q

briefly define the naive Bayes classifier

A

It assumes class-conditional independence between attributes, which eases the calculation: the likelihood P(X|C) factors into a product of per-attribute probabilities.
If an attribute is continuous, a Gaussian distribution is assumed for it;
if an attribute is categorical, P(x_k|C_i) is estimated as the fraction of tuples of class C_i that take the value x_k.
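A minimal Python sketch of this scheme, assuming rows are dicts of attribute -> value and the caller says which attributes are continuous (all names and smoothing constants are illustrative, not from the deck):

```python
import math
from collections import Counter, defaultdict

def train(rows, labels, continuous_attrs):
    # P(C): class priors estimated from label frequencies
    priors = {c: n / len(labels) for c, n in Counter(labels).items()}
    stats = defaultdict(dict)  # stats[class][attr] -> Gaussian params or counts
    for c in priors:
        class_rows = [r for r, y in zip(rows, labels) if y == c]
        for attr in rows[0]:
            vals = [r[attr] for r in class_rows]
            if attr in continuous_attrs:
                mu = sum(vals) / len(vals)
                var = sum((v - mu) ** 2 for v in vals) / len(vals) or 1e-9
                stats[c][attr] = ("gauss", mu, var)
            else:
                stats[c][attr] = ("cat", Counter(vals), len(vals))
    return priors, stats

def predict(x, priors, stats):
    def log_posterior(c):
        total = math.log(priors[c])
        # conditional independence: per-attribute log-likelihoods just add up
        for attr, v in x.items():
            kind, *params = stats[c][attr]
            if kind == "gauss":
                mu, var = params
                total += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
            else:
                counts, n = params
                total += math.log(counts.get(v, 0) / n or 1e-12)  # zero counts: next card
        return total
    return max(priors, key=log_posterior)
```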

4
Q

how to deal with zero probabilities in naive Bayes

A

Laplacian correction (Laplace smoothing): add 1 to each count so that no conditional probability estimate is zero; with many tuples this barely changes the estimates.
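A worked example with illustrative counts: suppose a class C contains 1000 tuples, with income = low in 0, = medium in 990, and = high in 10 of them.

```latex
% Without correction, P(low|C) = 0/1000 zeroes out the whole product.
% Add 1 to each of the 3 counts and 3 to the denominator:
P(\text{low}\mid C)    = \frac{0+1}{1000+3} = \frac{1}{1003},\quad
P(\text{medium}\mid C) = \frac{990+1}{1000+3} = \frac{991}{1003},\quad
P(\text{high}\mid C)   = \frac{10+1}{1000+3} = \frac{11}{1003}
```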

5
Q

how to measure accuracy

A

We can rely on:
● precision (exactness): what % of tuples that the classifier labelled as positive are actually positive
● recall (completeness): what % of positive tuples the classifier labelled as positive
● F-measure: the harmonic mean of precision and recall
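In terms of true positives (TP), false positives (FP), and false negatives (FN), the standard formulas are:

```latex
\text{precision} = \frac{TP}{TP + FP},\qquad
\text{recall} = \frac{TP}{TP + FN},\qquad
F_1 = \frac{2\cdot\text{precision}\cdot\text{recall}}{\text{precision} + \text{recall}}
```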

6
Q

what is an ensemble method

A
An ensemble for classification is a composite model, made up of a combination of classifiers.
● The individual classifiers vote, and the ensemble returns a class label prediction based on the collection of votes, as sketched below.
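A minimal voting sketch in Python, assuming each trained model exposes a predict(x) method (a hypothetical interface, not specified by the card):

```python
from collections import Counter

# Majority vote: tally each classifier's predicted label for x.
def ensemble_predict(classifiers, x):
    votes = Counter(clf.predict(x) for clf in classifiers)
    return votes.most_common(1)[0][0]   # label with the most votes wins
```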
7
Q

what is the difference between bagging and boosting

A

Bagging is an ensemble method in which each classifier's prediction gets an equal-weight vote.
In boosting, a weight is assigned to each training tuple: after classifier Mi is learned, the weights are updated so that the classifiers built afterwards pay more attention to the tuples that Mi misclassified.
Boosting is usually more accurate than bagging, but it risks overfitting the misclassified data.
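A rough sketch of one AdaBoost-style weight update (one common boosting scheme; the card does not pin down a specific variant, and all names are illustrative):

```python
import math

# One boosting round: raise the weights of tuples that classifier Mi
# misclassified, lower the rest, then renormalize.
def reweight(weights, preds, truth):
    err = sum(w for w, p, t in zip(weights, preds, truth) if p != t) / sum(weights)
    err = min(max(err, 1e-12), 1 - 1e-12)    # guard the degenerate 0/1 cases
    alpha = 0.5 * math.log((1 - err) / err)  # Mi's say in the final weighted vote
    new = [w * math.exp(alpha if p != t else -alpha)
           for w, p, t in zip(weights, preds, truth)]
    total = sum(new)
    return [w / total for w in new], alpha
```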

8
Q

why use random forests

A

Bagging does not work very well on decision trees, because the generated trees are strongly correlated. The idea is to randomly choose only L out of the D attributes as split candidates at each node; when L is too small, new candidate attributes can instead be built as random linear combinations of the existing attributes.
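A sketch of the per-node random attribute selection, assuming attributes are given as a plain list (parameter names are illustrative):

```python
import random

# Each split considers only L randomly chosen attributes, which
# de-correlates trees grown from different bootstrap samples.
def candidate_attributes(attributes, L):
    return random.sample(attributes, k=min(L, len(attributes)))
```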

9
Q

what is the disadvantage of the naive Bayes classifier

A

It assumes class-conditional independence between attributes; in practice, dependencies exist between variables, so there is a loss of accuracy.

10
Q

what is the holdout method

A

The idea consists of splitting the data into two parts, a training set and a test set: build the classifier on the first and measure accuracy on the second. The issue is that a single split may leave the two parts unbalanced (different class distributions), which distorts the accuracy estimate.
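A minimal holdout sketch; the 2/3 train / 1/3 test ratio is a common convention assumed here, not taken from the card:

```python
import random

def holdout(data, train_frac=2 / 3, seed=0):
    rnd = random.Random(seed)
    shuffled = data[:]          # copy so the caller's list stays untouched
    rnd.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]   # (training set, test set)
```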

11
Q

what is cross validation

A

When we cannot afford to split our data into two fixed parts, we split the dataset into k chunks (folds): take one chunk for testing and the others for training, and keep iterating until every chunk has been used for testing once. This includes leave-one-out (k equals the number of tuples) and k-fold; when class imbalance is a concern, we may want to stratify, so that every fold has roughly the same class distribution as the whole dataset.
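A plain (unstratified) k-fold sketch over n tuple indices; stratification would additionally deal the indices into folds per class:

```python
# Yield (train, test) index splits; each fold is the test set exactly once.
def k_fold(n, k):
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size
```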

12
Q

what is the idea of bootstrapping

A

Perform n samplings with replacement from the n available tuples, so some tuples appear several times in the bootstrap sample and others not at all. It works well for small datasets: use the bootstrapped data for training and the tuples that were left out (about 36.8% on average) for testing.
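A sketch of this split, assuming the dataset fits in a list (names are illustrative):

```python
import random

# n draws with replacement form the training set; tuples never drawn
# (~36.8% of them on average) form the test set.
def bootstrap_split(data, seed=0):
    rnd = random.Random(seed)
    n = len(data)
    chosen = [rnd.randrange(n) for _ in range(n)]
    train = [data[i] for i in chosen]
    left_out = sorted(set(range(n)) - set(chosen))
    return train, [data[i] for i in left_out]
```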
