Lecture 6 Flashcards
Bayesian networks (repeated)
• Treat Anthrax as the output Y that we would like to predict, based on input variables X1 through X4 representing Cough, Fever, Difficulty Breathing, Wide Mediastinum
• Suppose that we're given a dataset with lots of subjects, their corresponding inputs and whether or not they had Anthrax.
• How do we turn this into a model that can predict for a new subject whether he or she has Anthrax?
Naïve Bayes classifier
• Generative model of the distribution of the input features X1 through Xp ("effect") given the class Y ("cause"): P(X1, …, Xp | Y)
• Use Bayes' rule to compute the probability of each class given inputs: P(Y | X1, …, Xp) is proportional to P(X1, …, Xp | Y) P(Y)
• Naïve Bayes: inputs are assumed to be conditionally independent given the class: P(X1, …, Xp | Y) = P(X1 | Y) × P(X2 | Y) × … × P(Xp | Y)
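A minimal sketch of how these two equations turn counts into a classifier; the training examples and class names below are made up for illustration, not from the lecture:

from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (x, y) with x a tuple of feature values."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for x, y in examples:
        class_counts[y] += 1
        for k, value in enumerate(x):
            feature_counts[(k, y)][value] += 1
    n = len(examples)
    def predict(x):
        # P(Y | X1..Xp) is proportional to P(Y) * prod_k P(Xk | Y)
        scores = {}
        for y, ny in class_counts.items():
            score = ny / n  # P(Y)
            for k, value in enumerate(x):
                score *= feature_counts[(k, y)][value] / ny  # P(Xk | Y)
            scores[y] = score
        total = sum(scores.values())
        return {y: s / total for y, s in scores.items()} if total else scores
    return predict

# Usage: four binary symptoms -> anthrax yes/no
predict = train_naive_bayes([
    ((1, 1, 1, 1), "anthrax"),
    ((0, 1, 0, 0), "no anthrax"),
    ((0, 0, 0, 0), "no anthrax"),
])
print(predict((0, 1, 0, 0)))  # {'anthrax': 0.0, 'no anthrax': 1.0}

Note that a single unseen feature value zeroes out a whole class here; that is exactly the problem the Laplace estimator on the next card fixes.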
Laplace estimator
• Just counting fails dramatically when n_i or n_ij happens to be zero
• Add pseudocounts: act as if you start with one example for all possible options you'd like to estimate the probability for, yielding (n_i + 1)/(N + c) and (n_ij + 1)/(n_i + v), where N is the number of examples, c the number of classes, and v the number of values the feature can take
• Goes back to Laplace's sunrise problem
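The add-one estimate as a sketch in code (v, the number of possible feature values, is assumed to be known):

def laplace_estimate(n_ij, n_i, v):
    """Estimate P(X = j | Y = i) from counts, with one pseudocount
    for each of the v possible values of X, so it is never zero."""
    return (n_ij + 1) / (n_i + v)

# Without smoothing an unseen value gets probability 0/5 = 0; with
# smoothing it gets 1/7, and a value seen 5 out of 5 times gets 6/7.
print(laplace_estimate(0, 5, 2), laplace_estimate(5, 5, 2))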
Learning decision trees
• Representation is a decision tree
• Bias is towards simple decision trees
• Search through the space of decision trees, from simple decision trees to more complex ones
Decision trees
A decision tree (for a particular output feature) is a tree where:
• Each nonleaf node is labeled with an input feature
• The arcs out of a node labeled with feature A are labeled with each possible value of the feature A
• The leaves of the tree are labeled with a point prediction of the output feature
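One way to write such a tree down in code (nested tuples; the features, values, and the anthrax example are illustrative):

# A nonleaf node is ("split", feature, {value: subtree}); a leaf is ("leaf", prediction).
tree = ("split", "fever",
        {True:  ("split", "wide_mediastinum",
                 {True: ("leaf", "anthrax"), False: ("leaf", "no anthrax")}),
         False: ("leaf", "no anthrax")})

def classify(tree, example):
    kind, *rest = tree
    if kind == "leaf":
        return rest[0]                     # point prediction at a leaf
    feature, children = rest
    return classify(children[example[feature]], example)  # follow the arc

print(classify(tree, {"fever": True, "wide_mediastinum": False}))  # no anthrax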
Issues in learning decision trees
• Given some training examples, which decision tree should be generated?
• A decision tree can represent any discrete function of the input features
• You need a bias. For example, prefer the smallest tree. Least depth? Fewest nodes? Which trees are the best predictors of unseen data?
• How should you go about building a decision tree? The space of decision trees is too big for a systematic search for the smallest decision tree
Searching for a good decision tree
• The input is a set of input features, a target feature and a set of training examples
• Either:
- Stop and return a value for the target feature (or a distribution over target feature values)
- Choose an input feature to split on. For each value of this feature, build a subtree for those examples with this value for the input feature
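A minimal sketch of this recursion, assuming discrete features, examples given as dicts, and majority-vote leaves; taking features[0] is a placeholder for scoring candidate splits with entropy or the Gini coefficient (next card):

from collections import Counter

def learn_tree(examples, features, target):
    labels = [e[target] for e in examples]
    # Stop: all labels agree, or nothing left to split on -> return a value
    if len(set(labels)) == 1 or not features:
        return ("leaf", Counter(labels).most_common(1)[0][0])
    feature = features[0]  # placeholder; score candidates with Gini/entropy instead
    children = {}
    for value in {e[feature] for e in examples}:
        subset = [e for e in examples if e[feature] == value]
        children[value] = learn_tree(subset, [f for f in features if f != feature], target)
    return ("split", feature, children)

The trees this produces have the same nested-tuple shape as the classify sketch on the earlier card, so the two can be used together.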
Gini coefficient
• Alternative to entropy
• Simpler to compute, often better performance
• One minus the sum of squared probabilities: Gini = 1 − Σ_k p_k²
• Lower is better
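The formula in code, taking the probabilities to be label frequencies within a split:

from collections import Counter

def gini(labels):
    """1 minus the sum of squared class probabilities; 0 for a pure split."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["a", "a", "a"]))       # 0.0, pure
print(gini(["a", "b", "a", "b"]))  # 0.5, maximally mixed for two classes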
Handling overfitting
• Decision trees can easily overfit the data when noise and correlations in the training set are not reflected in the data as a whole
• To prevent overfitting:
- restrict the splitting, and split only when the split makes sense (statistically)
- allow unrestricted splitting and prune the resulting tree where it makes unwarranted distinctions
- learn multiple trees and average them (e.g., random forest)
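A sketch of the third strategy, reusing the learn_tree and classify sketches from the earlier cards; this shows only bootstrap resampling plus majority vote, whereas real random forests also subsample the candidate features at each split:

import random
from collections import Counter

def random_forest(examples, features, target, n_trees=10):
    trees = []
    for _ in range(n_trees):
        # Bootstrap: resample the training set with replacement
        boot = random.choices(examples, k=len(examples))
        trees.append(learn_tree(boot, features, target))
    def predict(example):
        votes = Counter(classify(t, example) for t in trees)
        return votes.most_common(1)[0][0]  # "average" the trees by majority vote
    return predict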
Linear models
Many popular models for supervised learning are essentially linear in the input features: linear regression, perceptron, logistic regression, linear SVM, …
Linear regression
• Fit a linear function through the data.
• Goal: find parameters w that minimize Error(w) (for linear regression, typically the sum of squared errors Σ_e (y_e − ŷ_e(w))² over the training examples e)
Minimizing the error
1. Find the minimum analytically
- Effective when it can be done (e.g., for linear regression)
- Quite exceptional
2. Find the minimum iteratively
- Works for larger classes of problems
- Many different generic and specialized algorithms
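Both routes for linear regression, as a sketch in numpy with made-up data; the analytic route solves the least-squares problem in one call, the iterative route takes gradient steps on the squared error:

import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # first column: intercept
y = np.array([1.0, 2.0, 3.0])

# 1. Analytically: least-squares solution in one call
w_exact, *_ = np.linalg.lstsq(X, y, rcond=None)

# 2. Iteratively: gradient descent on Error(w) = sum of squared errors
w = np.zeros(2)
for _ in range(1000):
    grad = 2 * X.T @ (X @ w - y)  # gradient of the squared error
    w -= 0.01 * grad              # small step downhill
print(w_exact, w)  # both approach [1, 1]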
Linear classifier
• Assume we are doing binary classification, with classes {0, 1}
• There is no point in making a prediction of less than 0 or greater than 1
Sigmoid
• Smoothed step function: σ(x) = 1/(1 + e^(−x))
• Used for logistic regression and in multi-layer perceptrons
• Provides more information on the seriousness of an error ("just wrong" vs "complete nonsense")
• Gradient defined everywhere: σ′(x) = σ(x)(1 − σ(x))
Logistic regression model
Classify whether you're interested in reading a book, depending on whether it's new or old, long or short, and whether you're at home or not
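A sketch of this model for the book example: the sigmoid applied to a linear combination of the three binary inputs. The weights below are invented for illustration; in practice they would be learned from data.

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def p_interested(is_new, is_long, at_home, w=(-1.0, 2.0, -1.5, 1.0)):
    """P(interested) = sigmoid(w0 + w1*is_new + w2*is_long + w3*at_home).
    The weights are made up here; they would be fit to data."""
    w0, w1, w2, w3 = w
    return sigmoid(w0 + w1 * is_new + w2 * is_long + w3 * at_home)

print(p_interested(is_new=1, is_long=0, at_home=1))  # ~0.88
print(p_interested(is_new=0, is_long=1, at_home=0))  # ~0.08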