Supervised Classifiers Flashcards
What is knowledge representation?
A view of what the machine learning algorithm has learned. It may take the form of a set of rules or a probability distribution.
Steps in developing a machine learning application
1) Collect Data
2) Prepare the input data
3) Analyze the input data
4) Train the algorithm
5) Test the algorithm
What are k-nearest neighbor pros and cons?
Pros: High accuracy, insensitive to outliers, no assumptions about data
Cons: Computationally expensive, requires lots of memory
Describe k-nearest-neighbors.
We have an existing set of example data, our training set, and we have a label for every example, so we know what class each piece of data falls into. When we’re given a new piece of data without a label, we compare it to every piece of existing data, take the k most similar pieces (the nearest neighbors), and look at their labels. The class with the majority vote among those k neighbors wins.
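A minimal sketch of this procedure in Python with NumPy; the toy dataset, labels, and function name are just illustrative assumptions:

```python
import numpy as np
from collections import Counter

def knn_classify(x, training_data, training_labels, k=3):
    """Classify point x by majority vote among its k nearest neighbors."""
    # Euclidean distance from x to every training point
    distances = np.sqrt(((training_data - x) ** 2).sum(axis=1))
    # Indices of the k closest training points
    nearest = distances.argsort()[:k]
    # Majority vote among their labels
    votes = Counter(training_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy example: two clusters labeled 'A' and 'B'
data = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(knn_classify(np.array([0.1, 0.2]), data, labels, k=3))  # -> 'B'
```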
What’s the formula to normalize and scale everything to the range 0 to 1?
newValue = (oldValue - min)/(max-min)
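A quick sketch of applying this column-wise with NumPy (the sample values are made up):

```python
import numpy as np

def min_max_normalize(data):
    """Scale each column of data to the range 0 to 1."""
    mins = data.min(axis=0)
    maxs = data.max(axis=0)
    return (data - mins) / (maxs - mins)

values = np.array([[10.0, 400.0], [20.0, 800.0], [15.0, 600.0]])
print(min_max_normalize(values))
# [[0.  0. ]
#  [1.  1. ]
#  [0.5 0.5]]
```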
What is the Euclidean distance between two vectors?
sqrt( (Xa0 - Xb0)^2 + (Xa1 - Xb1) ^2 + … )
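For instance (arbitrary example vectors):

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
# sqrt((1-4)^2 + (2-6)^2) = sqrt(9 + 16) = 5.0
print(np.sqrt(((a - b) ** 2).sum()))  # 5.0
```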
What are the pros and cons of decision trees?
Pros: Computationally cheap to use, easy for humans to understand learned results, missing values OK, can deal with irrelevant features.
Cons: Prone to overfitting.
Describe decision tree creation.
1) Decide which feature best splits the data first.
2) Split the dataset into subsets. The subsets then traverse down the branches of the first decision node.
a) If all the data on a branch belongs to the same class, it is properly classified and you don’t need to split it further.
b) If not, repeat the splitting process on that subset.
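A rough sketch of this recursive procedure in Python, assuming rows whose last element is the class label and an entropy-based (information gain) split criterion; the helper names and toy dataset are illustrative:

```python
from collections import Counter
from math import log2

def entropy(rows):
    """Shannon entropy of the class labels (last column) in rows."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def split(rows, feature, value):
    """Rows where the given feature equals value, with that feature removed."""
    return [row[:feature] + row[feature + 1:] for row in rows if row[feature] == value]

def best_feature(rows):
    """Feature whose split yields the largest information gain."""
    base = entropy(rows)
    n_features = len(rows[0]) - 1
    gains = []
    for f in range(n_features):
        values = set(row[f] for row in rows)
        remainder = sum(len(sub) / len(rows) * entropy(sub)
                        for sub in (split(rows, f, v) for v in values))
        gains.append((base - remainder, f))
    return max(gains)[1]

def create_tree(rows, feature_names):
    labels = [row[-1] for row in rows]
    if labels.count(labels[0]) == len(labels):   # a) all one class: stop splitting
        return labels[0]
    if len(rows[0]) == 1:                        # no features left: majority vote
        return Counter(labels).most_common(1)[0][0]
    f = best_feature(rows)                       # 1) choose the best split
    name = feature_names[f]
    remaining = feature_names[:f] + feature_names[f + 1:]
    return {name: {v: create_tree(split(rows, f, v), remaining)   # b) recurse on subsets
                   for v in set(row[f] for row in rows)}}

# Toy dataset: [can survive without surfacing, has flippers, label]
rows = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
print(create_tree(rows, ['no surfacing', 'flippers']))
```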
What is Shannon entropy or just entropy for short?
The expected value of the information.
Information is defined as follows. If you’re classifying something that can take on multiple values, the information for Xi is defined as l(Xi) = -log2(p(Xi)), where p(Xi) is the probability of choosing this class.
To calculate entropy, you take the expected value of the information over all possible values of the class. This gives us:
H = - SUM(i = 1 to n) ( p(Xi)*log2(p(Xi)) ) where n is the number of classes.
Give an example of Shannon entropy.
A fair coin toss has 1 shannon (bit) of entropy: information entropy is the log-base-2 of the number of equally likely outcomes, and a single toss has two. With two coins there are four possible outcomes, and the entropy is two bits.
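A quick check of both cases using the entropy formula from the previous card (the function name is illustrative):

```python
from math import log2

def entropy(probs):
    """H = -sum(p * log2(p)) over the outcome probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))                # one fair coin toss -> 1.0 bit
print(entropy([0.25, 0.25, 0.25, 0.25]))  # two fair coins -> 2.0 bits
```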
Naive Bayes Pros and Cons
Pros: Works with a small amount of data, handles multiple classes.
Cons: Sensitive to how input data is prepared.
What is underflow?
Multiplying many very small numbers (such as probabilities): the product becomes so small that it rounds down to zero in floating-point arithmetic, giving an incorrect answer.
How do we get around underflow?
Take the natural logarithm of the product. Recall from algebra that ln(a*b) = ln(a) + ln(b), so multiplying many small probabilities becomes adding their logarithms, which avoids underflow.
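A small demonstration of the underflow and the log fix (the probability values are arbitrary):

```python
import math

probs = [1e-5] * 100           # many small conditional probabilities

product = 1.0
for p in probs:
    product *= p
print(product)                 # 0.0 -- the product underflows to zero

log_sum = sum(math.log(p) for p in probs)
print(log_sum)                 # about -1151.3 -- still comparable across classes
```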
What’s the difference between a set-of-words model vs bag-of-words model?
In a set-of-words model, the presence or absence of each word is a feature, so a word is counted at most once per document. In a bag-of-words model, each word can occur multiple times, and the feature records how many.
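A small sketch of building both vector types over an assumed vocabulary list (names and data are illustrative):

```python
def words_to_vec(vocab, words, bag=False):
    """Set-of-words: 1 if the word is present; bag-of-words: count occurrences."""
    vec = [0] * len(vocab)
    for w in words:
        if w in vocab:
            i = vocab.index(w)
            vec[i] = vec[i] + 1 if bag else 1
    return vec

vocab = ['my', 'dog', 'is', 'cute', 'stupid']
doc = ['my', 'dog', 'my', 'dog', 'is', 'cute']
print(words_to_vec(vocab, doc))            # set-of-words: [1, 1, 1, 1, 0]
print(words_to_vec(vocab, doc, bag=True))  # bag-of-words: [2, 2, 1, 1, 0]
```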
What is hold-out cross validation?
When you randomly select a portion of the data for the training set and use the remaining portion as the test set.
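A minimal sketch of a random hold-out split (the 20% test fraction and names are just an example):

```python
import random

def holdout_split(data, test_fraction=0.2, seed=42):
    """Randomly hold out a fraction of the data as the test set."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (training set, test set)

examples = list(range(10))
train, test = holdout_split(examples)
print(train, test)   # 8 training examples and 2 test examples
```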