Lecture 6 Flashcards
Bayesian networks (repeated)
• Treat Anthrax as the output Y that we would like to predict, based on input variables X1 through X4 representing Cough, Fever, Difficulty Breathing, Wide Mediastinum
• Suppose that we're given a dataset with lots of subjects, their corresponding inputs and whether or not they had Anthrax.
• How do we turn this into a model that can predict for a new subject whether he or she has Anthrax?
Naïve Bayes classifier
• Generative model of the distribution of the input features X1 through Xp ("effect") given the class Y ("cause"): P(X1, …, Xp | Y)
• Use Bayes' rule to compute the probability of each class given inputs: P(Y | X1, …, Xp) is proportional to P(X1, …, Xp | Y) P(Y)
• Naïve Bayes: inputs are assumed to be conditionally independent given the class: P(X1, …, Xp | Y) = P(X1 | Y) × P(X2 | Y) × … × P(Xp | Y)
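A minimal sketch of how these two equations turn counts into a classifier; the training examples and class names below are made up for illustration, not from the lecture:

from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (x, y) with x a tuple of feature values."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for x, y in examples:
        class_counts[y] += 1
        for k, value in enumerate(x):
            feature_counts[(k, y)][value] += 1
    n = len(examples)
    def predict(x):
        # P(Y | X1..Xp) is proportional to P(Y) * prod_k P(Xk | Y)
        scores = {}
        for y, ny in class_counts.items():
            score = ny / n  # P(Y)
            for k, value in enumerate(x):
                score *= feature_counts[(k, y)][value] / ny  # P(Xk | Y)
            scores[y] = score
        total = sum(scores.values())
        return {y: s / total for y, s in scores.items()} if total else scores
    return predict

# Usage: four binary symptoms -> anthrax yes/no
predict = train_naive_bayes([
    ((1, 1, 1, 1), "anthrax"),
    ((0, 1, 0, 0), "no anthrax"),
    ((0, 0, 0, 0), "no anthrax"),
])
print(predict((0, 1, 0, 0)))  # {'anthrax': 0.0, 'no anthrax': 1.0}

Note that a single unseen feature value zeroes out a whole class here; that is exactly the problem the Laplace estimator on the next card fixes.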
Laplace estimator
• Just counting fails dramatically when n_i or n_ij happens to be zero
• Add pseudocounts: act as if you start with one example for all possible options you'd like to estimate the probability for, yielding (n_i + 1)/(N + c) and (n_ij + 1)/(n_i + v), where N is the number of examples, c the number of classes, and v the number of values the feature can take
• Goes back to Laplace's sunrise problem
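The add-one estimate as a sketch in code (v, the number of possible feature values, is assumed to be known):

def laplace_estimate(n_ij, n_i, v):
    """Estimate P(X = j | Y = i) from counts, with one pseudocount
    for each of the v possible values of X, so it is never zero."""
    return (n_ij + 1) / (n_i + v)

# Without smoothing an unseen value gets probability 0/5 = 0; with
# smoothing it gets 1/7, and a value seen 5 out of 5 times gets 6/7.
print(laplace_estimate(0, 5, 2), laplace_estimate(5, 5, 2))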
Learning decision trees
• Representation is a decision tree
• Bias is towards simple decision trees
• Search through the space of decision trees, from simple decision trees to more complex ones
Decision trees
A decision tree (for a particular output feature) is a tree where:
• Each nonleaf node is labeled with an input feature
• The arcs out of a node labeled with feature A are labeled with each possible value of the feature A
• The leaves of the tree are labeled with a point prediction of the output feature
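One way to write such a tree down in code (nested tuples; the features, values, and the anthrax example are illustrative):

# A nonleaf node is ("split", feature, {value: subtree}); a leaf is ("leaf", prediction).
tree = ("split", "fever",
        {True:  ("split", "wide_mediastinum",
                 {True: ("leaf", "anthrax"), False: ("leaf", "no anthrax")}),
         False: ("leaf", "no anthrax")})

def classify(tree, example):
    kind, *rest = tree
    if kind == "leaf":
        return rest[0]                     # point prediction at a leaf
    feature, children = rest
    return classify(children[example[feature]], example)  # follow the arc

print(classify(tree, {"fever": True, "wide_mediastinum": False}))  # no anthrax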
Issues in learning decision trees
• Given some training examples, which decision tree should be generated?
• A decision tree can represent any discrete function of the input features
• You need a bias. For example, prefer the smallest tree. Least depth? Fewest nodes? Which trees are the best predictors of unseen data?
• How should you go about building a decision tree? The space of decision trees is too big for a systematic search for the smallest decision tree
Searching for a good decision tree
• The input is a set of input features, a target feature and a set of training examples
• Either:
- Stop and return a value for the target feature (or a distribution over target feature values)
- Choose an input feature to split on. For each value of this feature, build a subtree for those examples with this value for the input feature
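A minimal sketch of this recursion, assuming discrete features, examples given as dicts, and majority-vote leaves; taking features[0] is a placeholder for scoring candidate splits with entropy or the Gini coefficient (next card):

from collections import Counter

def learn_tree(examples, features, target):
    labels = [e[target] for e in examples]
    # Stop: all labels agree, or nothing left to split on -> return a value
    if len(set(labels)) == 1 or not features:
        return ("leaf", Counter(labels).most_common(1)[0][0])
    feature = features[0]  # placeholder; score candidates with Gini/entropy instead
    children = {}
    for value in {e[feature] for e in examples}:
        subset = [e for e in examples if e[feature] == value]
        children[value] = learn_tree(subset, [f for f in features if f != feature], target)
    return ("split", feature, children)

The trees this produces have the same nested-tuple shape as the classify sketch on the earlier card, so the two can be used together.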
Gini coefficient
• Alternative to entropy
• Simpler to compute, often better performance
• One minus the sum of squared probabilities: Gini = 1 − Σ_k p_k²
• Lower is better
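The formula in code, taking the probabilities to be label frequencies within a split:

from collections import Counter

def gini(labels):
    """1 minus the sum of squared class probabilities; 0 for a pure split."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["a", "a", "a"]))       # 0.0, pure
print(gini(["a", "b", "a", "b"]))  # 0.5, maximally mixed for two classes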
Handling overfitting
• Decision trees can easily overfit the data when noise and correlations in the training set are not reflected in the data as a whole
• To prevent overfitting:
- restrict the splitting, and split only when the split makes sense (statistically)
- allow unrestricted splitting and prune the resulting tree where it makes unwarranted distinctions
- learn multiple trees and average them (e.g., random forest)
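A sketch of the third strategy, reusing the learn_tree and classify sketches from the earlier cards; this shows only bootstrap resampling plus majority vote, whereas real random forests also subsample the candidate features at each split:

import random
from collections import Counter

def random_forest(examples, features, target, n_trees=10):
    trees = []
    for _ in range(n_trees):
        # Bootstrap: resample the training set with replacement
        boot = random.choices(examples, k=len(examples))
        trees.append(learn_tree(boot, features, target))
    def predict(example):
        votes = Counter(classify(t, example) for t in trees)
        return votes.most_common(1)[0][0]  # "average" the trees by majority vote
    return predict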
Linear models
Many popular models for supervised learning are essentially linear in the input features: linear regression, perceptron, logistic regression, linear SVM, …
Linear regression
• Fit a linear function through the data.
• Goal: find parameters w that minimize Error(w) (for linear regression, typically the sum of squared errors Σ_e (y_e − ŷ_e(w))² over the training examples e)
Minimizing the error
1. Find the minimum analytically
- Effective when it can be done (e.g., for linear regression)
- Quite exceptional
2. Find the minimum iteratively
- Works for larger classes of problems
- Many different generic and specialized algorithms
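Both routes for linear regression, as a sketch in numpy with made-up data; the analytic route solves the least-squares problem in one call, the iterative route takes gradient steps on the squared error:

import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # first column: intercept
y = np.array([1.0, 2.0, 3.0])

# 1. Analytically: least-squares solution in one call
w_exact, *_ = np.linalg.lstsq(X, y, rcond=None)

# 2. Iteratively: gradient descent on Error(w) = sum of squared errors
w = np.zeros(2)
for _ in range(1000):
    grad = 2 * X.T @ (X @ w - y)  # gradient of the squared error
    w -= 0.01 * grad              # small step downhill
print(w_exact, w)  # both approach [1, 1]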
Linear classifier
• Assume we are doing binary classification, with classes {0, 1}
• There is no point in making a prediction of less than 0 or greater than 1
Sigmoid
• Smoothed step function: σ(x) = 1/(1 + e^(−x))
• Used for logistic regression and in multi-layer perceptrons
• Provides more information on the seriousness of an error ("just wrong" vs "complete nonsense")
• Gradient defined everywhere: σ′(x) = σ(x)(1 − σ(x))
Logistic regression model
Classify whether you're interested in reading a book, depending on whether it's new or old, long or short, and whether you're at home or not
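A sketch of this model for the book example: the sigmoid applied to a linear combination of the three binary inputs. The weights below are invented for illustration; in practice they would be learned from data.

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def p_interested(is_new, is_long, at_home, w=(-1.0, 2.0, -1.5, 1.0)):
    """P(interested) = sigmoid(w0 + w1*is_new + w2*is_long + w3*at_home).
    The weights are made up here; they would be fit to data."""
    w0, w1, w2, w3 = w
    return sigmoid(w0 + w1 * is_new + w2 * is_long + w3 * at_home)

print(p_interested(is_new=1, is_long=0, at_home=1))  # ~0.88
print(p_interested(is_new=0, is_long=1, at_home=0))  # ~0.08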