Lecture 6 Flashcards

1
Q

Bayesian networks (repeated)

A

Treat Anthrax as the output Y that we would like to predict, based on input variables X1 through X4 representing Cough, Fever, Difficulty Breathing, and Wide Mediastinum.
• Suppose that we're given a dataset with lots of subjects, their corresponding inputs, and whether or not they had Anthrax.
• How do we turn this into a model that can predict for a new subject whether he or she has Anthrax?

2
Q

Naïve Bayes classifier

A

Generative model of the distribution of the input features X1 through Xp ("effect") given the class Y ("cause").
• Use Bayes' rule to compute the probability of each class given the inputs: P(Y | X1, …, Xp) is proportional to P(X1, …, Xp | Y) · P(Y).
• Naïve Bayes: the inputs are assumed to be conditionally independent given the class: P(X1, …, Xp | Y) = Π(k=1..p) P(Xk | Y).
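The two formulas above can be sketched in a few lines of code. This is a minimal illustration, not code from the lecture; the disease/symptom probabilities below are made-up numbers.

```python
# Naive Bayes posterior for binary inputs:
# P(Y=y | x) is proportional to P(Y=y) * prod_k P(Xk=xk | Y=y).

def naive_bayes_posterior(prior, likelihoods, x):
    """prior: dict y -> P(Y=y); likelihoods: dict y -> list of
    P(Xk=1 | Y=y); x: list of observed 0/1 feature values."""
    scores = {}
    for y, p_y in prior.items():
        score = p_y
        for p_k, x_k in zip(likelihoods[y], x):
            score *= p_k if x_k == 1 else 1.0 - p_k
        scores[y] = score
    # Normalize so the class probabilities sum to one
    # (the denominator of Bayes' rule).
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}

# Hypothetical numbers: the disease is rare (prior 1%), but the two
# observed symptoms are much more likely given the disease.
prior = {1: 0.01, 0: 0.99}
likelihoods = {1: [0.9, 0.8], 0: [0.2, 0.1]}
posterior = naive_bayes_posterior(prior, likelihoods, [1, 1])
```

Observing both symptoms raises the probability of class 1 well above its 1% prior, even though class 0 remains more likely overall.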

3
Q

Laplace estimator

A

Just counting fails dramatically when ni or nij happens to be zero.
• Add pseudocounts: act as if you start with one example for every possible option you'd like to estimate the probability for, yielding estimates of the form (ni + 1) / (n + c), where c is the number of possible values.
• Goes back to Laplace's sunrise problem.
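As a small sketch of the pseudocount idea (illustrative, not from the lecture): an option observed zero times still receives a nonzero probability, and the smoothed estimates over all options still sum to one.

```python
def laplace_estimate(count, total, n_options):
    """Pseudocount estimate: pretend one extra imaginary example per
    option, so a zero count never yields a zero probability."""
    return (count + 1) / (total + n_options)

# A value never seen among 10 examples (3 possible values) still gets
# probability (0 + 1) / (10 + 3) instead of 0.
p_unseen = laplace_estimate(0, 10, 3)
```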

4
Q

Learning decision trees

A

Representation is a decision tree.
• Bias is towards simple decision trees.
• Search through the space of decision trees, from simple decision trees to more complex ones.

5
Q

Decision trees

A

A decision tree (for a particular output feature) is a tree where:
• Each non-leaf node is labeled with an input feature.
• The arcs out of a node labeled with feature A are labeled with each possible value of the feature A.
• The leaves of the tree are labeled with a point prediction of the output feature.

6
Q

Issues in learning decision trees

A

Given some training examples, which decision tree should be generated?
• A decision tree can represent any discrete function of the input features.
• You need a bias. For example, prefer the smallest tree. Least depth? Fewest nodes? Which trees are the best predictors of unseen data?
• How should you go about building a decision tree? The space of decision trees is too big for a systematic search for the smallest decision tree.

7
Q

Searching for a good decision tree

A

The input is a set of input features, a target feature, and a set of training examples.
• Either:
- Stop and return a value for the target feature (or a distribution over target feature values), or
- Choose an input feature to split on. For each value of this feature, build a subtree for those examples with this value for the input feature.
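The recursive procedure above can be sketched as follows. This is an illustrative skeleton, not the lecture's code: to stay short it splits on the first remaining feature rather than choosing one by an impurity measure.

```python
from collections import Counter

def build_tree(examples, features):
    """examples: list of (dict feature -> value, label) pairs.
    Returns a label (leaf) or (feature, {value: subtree}) (split)."""
    labels = [y for _, y in examples]
    if len(set(labels)) == 1 or not features:
        # Stop: return the majority label as the point prediction.
        return Counter(labels).most_common(1)[0][0]
    feature = features[0]  # in practice: pick by entropy/Gini gain
    branches = {}
    for value in {x[feature] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[feature] == value]
        branches[value] = build_tree(subset, features[1:])
    return (feature, branches)

# Tiny made-up dataset: two binary features, a label per example.
examples = [({"long": 1, "new": 1}, "read"),
            ({"long": 1, "new": 0}, "skip"),
            ({"long": 0, "new": 0}, "read")]
tree = build_tree(examples, ["long", "new"])
```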

8
Q

Gini coefficient

A

Alternative to entropy.
• Simpler to compute, often better performance.
• One minus the sum of squared probabilities: Gini = 1 − Σi pi².
• Lower is better.
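The "one minus sum of squared probabilities" rule is a one-liner:

```python
def gini(probs):
    """Gini impurity: one minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probs)

# A pure node scores 0 (best); a uniform 50/50 split scores 0.5
# (worst for two classes) - lower is better.
```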

9
Q

Handling overfitting

A

Decision trees can easily overfit the data when noise and correlations in the training set are not reflected in the data as a whole.
• To prevent overfitting:
- restrict the splitting, and split only when the split makes (statistical) sense;
- allow unrestricted splitting and prune the resulting tree where it makes unwarranted distinctions;
- learn multiple trees and average them (e.g., a random forest).

10
Q

Linear models

A

Many popular models for supervised learning are essentially linear in the input features: linear regression, the perceptron, logistic regression, the linear SVM, …

11
Q

Linear regression

A

Fit a linear function through the data.

Goal: find the parameters w that minimize Error(w).
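For one input feature, the error-minimizing parameters have a well-known closed form. A minimal sketch (illustrative data, not from the lecture):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w0 + w1*x: the closed-form solution
    that minimizes the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    w1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    w0 = mean_y - w1 * mean_x
    return w0, w1

# Data generated exactly by y = 1 + 2x, so the fit recovers (1, 2).
w0, w1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```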

12
Q

Minimizing the error

A

1. Find the minimum analytically
- Effective when it can be done (e.g., for linear regression)
- Quite exceptional
2. Find the minimum iteratively
- Works for larger classes of problems
- Many different generic and specialized algorithms
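The iterative option can be sketched with gradient descent on the squared error of a line fit (a hypothetical example; the learning rate and step count are arbitrary choices, not values from the lecture):

```python
def gradient_descent_step(w, xs, ys, lr):
    """One gradient step on Error(w) = sum of squared errors
    for the prediction y_hat = w[0] + w[1]*x."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        err = (w[0] + w[1] * x) - y
        g0 += 2 * err        # d Error / d w0
        g1 += 2 * err * x    # d Error / d w1
    # Move against the gradient, scaled by the learning rate.
    return [w[0] - lr * g0, w[1] - lr * g1]

# Data generated by y = 1 + 2x; repeated steps approach w = [1, 2].
xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]
w = [0.0, 0.0]
for _ in range(2000):
    w = gradient_descent_step(w, xs, ys, 0.01)
```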

13
Q

Linear classifier

A

Assume we are doing binary classification, with classes {0, 1}.
• There is no point in making a prediction of less than 0 or greater than 1.

14
Q

Sigmoid

A

Smoothed step function.
• Used for logistic regression and in multi-layer perceptrons.
• Provides more information on the seriousness of an error ("just wrong" vs. "complete nonsense").
• Gradient defined everywhere.
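The sigmoid itself is one line; it squashes any real number into (0, 1):

```python
import math

def sigmoid(z):
    """Smoothed step function: maps any real z into (0, 1),
    with a well-defined gradient everywhere."""
    return 1.0 / (1.0 + math.exp(-z))
```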

15
Q

Logistic regression model

A

Classify whether you're interested in reading a book, depending on whether it's new or old, long or short, and whether you're at home or not.
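Given weights, the model's output is the sigmoid of a weighted sum of the inputs. The weights below are hypothetical (new books and being at home count in favor, long books against), just to show the calculation:

```python
import math

def logistic_predict(weights, bias, x):
    """P(interested = 1 | x) = sigmoid(bias + w . x)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights for the inputs (new, long, at_home).
weights, bias = [2.0, -1.5, 1.0], -0.5

# A new, short book while at home: z = -0.5 + 2.0 + 1.0 = 2.5.
p = logistic_predict(weights, bias, [1, 0, 1])
```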

16
Q

Summary: basic models

A

Naïve Bayes
- generative model for classification; Bayes' rule to compute class probabilities
- parameters learned using (pseudo)counting
• Decision tree
- greedily build a tree by splitting on single features
- very flexible, but therefore a high risk of overfitting when not treated with care
• Linear regression
- restrictive, linear model for function fitting
- weights can be learned iteratively using gradient descent
• Logistic regression
- used for classification (silly name!); basically linear regression with a sigmoid
- learned iteratively using gradient descent

17
Q

Learning goals: supervised learning

A

− Show how to learn a naïve Bayes classifier from a data set
− Explain why it's called "Bayes" and "naïve"
− Read off how a decision tree classifies new data points
− Explain criteria for impurity, such as the entropy and the Gini coefficient, and what they are used for
− Show how to learn a decision tree from a data set
− Discuss ways to prevent overfitting in decision trees
− Explain the difference between linear and logistic regression
− Calculate and interpret the output of a linear/logistic regression model when given the weights and the inputs
− Explain the principles behind gradient descent