Module 6 Flashcards
Describe the supervised learning problem
- Outcome measurement Y (dependent variable, response, target)
- Vector of p predictor measurements X (inputs, regressors, covariates, independent variables)
What are X and Y in regression/classification problems
Regression problem
- Y is quantitative (e.g., price, blood pressure)
Classification problem
- Y takes values in a finite unordered set (classes, e.g., true/false)
Both problems come with training data: observed instances of the data (x1, y1), …, (xN, yN)
List objectives of supervised learning (AUA)
- Accurately predict unseen test cases
- Understand which inputs affect the outcome and how
- Assess the quality of our predictions and inferences
Describe unsupervised learning
- No outcome variable, just a set of predictors/features measured on a set of samples
- Objective is fuzzier: find groups of samples that behave similarly
- Difficult to tell how well you are doing
- Useful as pre-processing for supervised learning
Describe Statistical Learning vs ML
ML is a subset of AI
SL is a subfield of stats
ML has a greater emphasis on large-scale applications and prediction accuracy
SL emphasizes models and their interpretability, precision, and uncertainty
Describe the regression function
- Is also defined for vector X: f(x) = f(x1, x2, x3) = E(Y | X1 = x1, X2 = x2, X3 = x3)
- Is the ideal/optimal predictor of Y with regard to mean-squared prediction error: f(x) = E(Y | X = x) minimizes the error
- ε = Y − f(x) is the irreducible error: even knowing f(x), prediction errors remain because of the distribution of Y values at each x
- Mean-squared prediction error = reducible error + irreducible error:
E[(Y − f̂(X))² | X = x] = [f(x) − f̂(x)]² + Var(ε)
where [f(x) − f̂(x)]² is the reducible part (improve f̂) and Var(ε) is irreducible
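A quick way to see the decomposition is to simulate it. A minimal NumPy sketch, where the true f, the fitted f̂, and the noise level are all made-up assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: 2.0 + 3.0 * x        # true regression function (assumed for the demo)
f_hat = lambda x: 1.5 + 3.2 * x    # some imperfect fitted model (assumed)
sigma = 0.5                        # standard deviation of the irreducible noise eps

x = 1.0
y = f(x) + rng.normal(0.0, sigma, size=100_000)  # many draws of Y at X = x

mse = np.mean((y - f_hat(x)) ** 2)       # estimates E[(Y - f_hat(X))^2 | X = x]
reducible = (f(x) - f_hat(x)) ** 2       # [f(x) - f_hat(x)]^2
irreducible = sigma ** 2                 # Var(eps)
print(mse, reducible + irreducible)      # the two values nearly coincide
```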
Describe nearest-neighbor averaging
- f̂(x) = Ave(Y | X ∈ N(x)), where N(x) is a neighborhood of x
- Works well when p is small (p ≤ 4) and the sample size N is large
- Can be lousy when p is large, due to the curse of dimensionality: nearest neighbors tend to be far away in high dimensions
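A minimal NumPy sketch of nearest-neighbor averaging (the helper name knn_regress and the toy data are my own, not from the course):

```python
import numpy as np

def knn_regress(x0, X, y, k=5):
    """Nearest-neighbor averaging: f_hat(x0) = Ave(y_i | x_i in N(x0))."""
    dists = np.linalg.norm(X - x0, axis=1)   # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]          # indices of the k closest points, i.e. N(x0)
    return y[nearest].mean()                 # average their responses

# toy usage: p = 2, comfortably inside the p <= 4 guideline
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 2))
y = X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.1, 200)
print(knn_regress(np.array([0.5, 0.5]), X, y))
```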
Describe the linear model
f(X) = β0 + β1X1 + β2X2 + … + βpXp
- A parametric model
- Specified in terms of p + 1 parameters
- Almost never correct, but serves as a good and interpretable approximation to the unknown true function
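A minimal least-squares sketch of fitting the p + 1 parameters (the coefficients and data are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, 2.0, -1.0, 0.5])   # beta_0 .. beta_3 (assumed)
y = beta_true[0] + X @ beta_true[1:] + rng.normal(0, 0.3, n)

# Fit the p + 1 parameters by least squares: prepend an intercept column of ones.
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(beta_hat)   # approximately recovers beta_true
```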
Trade-offs of the linear model (PGP)
- Prediction accuracy vs. interpretability: linear models are easy to interpret
- Good fit vs. over/under-fit
- Parsimony vs. black box: prefer a simpler model with fewer variables when it performs comparably
Describe assessing model accuracy
Compute the average squared prediction error over Te (fresh test data) rather than Tr (training data), to avoid bias toward overfit models:
MSE_Te = Ave_{i∈Te} [yi − f̂(xi)]²
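As a sketch, the test MSE is a one-line helper (the function name and usage are illustrative):

```python
import numpy as np

def test_mse(y_te, y_pred):
    """MSE_Te = Ave_{i in Te} (y_i - f_hat(x_i))^2, computed on held-out data."""
    return float(np.mean((np.asarray(y_te) - np.asarray(y_pred)) ** 2))

# hypothetical usage: evaluate on the test set Te, never on the training set Tr
# err = test_mse(y_test, model_predictions_on_test)
```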
Describe Bias Variance Trade-off
- As the flexibility of f̂ increases, its variance increases and its bias decreases
- Choosing the flexibility based on average test error amounts to a bias-variance trade-off
Describe Classification Problem (BAU)
- Response variable Y is qualitative
- Goals are to:
1) Build a classifier that assigns a class label from C to a future unlabeled observation X
2) Assess uncertainty in each classification
3) Understand the roles of different predictors among X
Is there an ideal C(X)?
- Let pk(x) = Pr(Y = k | X = x), k = 1, 2, …, K. These are the conditional class probabilities at x.
The Bayes optimal classifier at x is
C(x) = j if pj(x) = max{p1(x), p2(x), …, pK(x)}
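The Bayes rule is just an argmax over the conditional class probabilities; a tiny illustrative sketch (function name and probabilities are made up):

```python
import numpy as np

def bayes_classify(p):
    """Given conditional class probabilities p_k(x) for k = 1..K,
    return the class j with the largest probability (1-indexed)."""
    return int(np.argmax(np.asarray(p))) + 1

print(bayes_classify([0.2, 0.5, 0.3]))   # -> 2
```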
Classification details (MBS)
- Measure performance using the misclassification error rate:
Err_Te = Ave_{i∈Te} I[yi ≠ Ĉ(xi)]
- The Bayes classifier has the smallest error
- Support vector machines (SVMs) build structured models for C(x)
Describe Tree based models
- Can be used for both regression and classification
- Involve stratifying or segmenting the predictor space into a number of simple regions
- Since the splitting rules can be summarized in a tree, these are known as decision tree methods
Describe Pros and Cons of tree-based methods
- Simple and useful for interpretation
- Not competitive with the best supervised learning approaches in terms of prediction accuracy
- Combining many trees can dramatically improve prediction accuracy, at the cost of some interpretability
Details of tree building process
- Divide the predictor space into J distinct, non-overlapping regions R1, …, RJ
- For every observation that falls in region Rj, make the same prediction: the mean of the response values of the training observations in Rj
- Goal is to find boxes R1, …, RJ that minimize RSS = Σ_{j=1}^{J} Σ_{i∈Rj} (yi − ŷRj)²
- Since considering every possible partition is infeasible, take a top-down, greedy approach: recursive binary splitting
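One greedy step of recursive binary splitting can be sketched directly: scan every predictor j and cutpoint s and keep the pair that minimizes RSS (the helper name and structure are my own, not the course's code):

```python
import numpy as np

def best_split(X, y):
    """One greedy step of recursive binary splitting: choose predictor j and
    cutpoint s whose two half-planes {X_j < s} and {X_j >= s} give the lowest
    total RSS, where each region predicts its own mean response."""
    best_j, best_s, best_rss = None, None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] < s], y[X[:, j] >= s]
            if len(left) == 0 or len(right) == 0:
                continue  # skip degenerate splits with an empty region
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if rss < best_rss:
                best_j, best_s, best_rss = j, s, rss
    return best_j, best_s, best_rss
```

Applying best_split recursively to each resulting region (until a stopping rule) yields the full tree.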
Describe classification tree
- Used to predict qualitative response
- Predict that each observation belongs to the most commonly occurring class of training observations in its region
Details of classification tree
- uses recursive binary splitting
- Uses the classification error rate rather than RSS: E = 1 − max_k(p̂mk)
- p̂mk = proportion of training observations in the mth region that are from the kth class
- Two other measures are preferable in practice: the Gini index and the deviance
Describe Gini index
G = Σ_{k=1}^{K} p̂mk(1 − p̂mk)
- Takes on a small value if all of the p̂mk are close to 0 or 1
- A measure of node purity: a small value indicates the node contains predominantly observations from a single class
- similar to cross-entropy
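A small sketch of the Gini index for one region, assuming the region's training labels are given as an array (function name is my own):

```python
import numpy as np

def gini(labels):
    """G = sum_k p_mk * (1 - p_mk) over the K classes present in region m.
    Near 0 when the region is dominated by a single class (high purity)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()                # the proportions p_mk
    return float(np.sum(p * (1 - p)))

print(gini([0, 0, 0, 1]))   # mixed region -> larger Gini (0.375)
print(gini([1, 1, 1, 1]))   # pure region  -> 0.0
```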
Tree 10-fold / N-fold cross-validation
- Divide the dataset into 10 (or N) parts; use 9 (N − 1) parts as the training set and 1 part as the test set
- Repeat the process 10 (N) times so that every part is used once for testing, and average the results
- Stratified sampling is used to divide the dataset so each fold preserves the class proportions
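A minimal sketch of N-fold cross-validation. It uses plain random folds for brevity (a stratified split would additionally preserve class proportions), and fit_predict is an assumed user-supplied callable, not a library API:

```python
import numpy as np

def k_fold_error(X, y, fit_predict, k=10, seed=0):
    """k-fold CV: each of the k parts serves once as the test fold while the
    other k-1 parts form the training set; average the fold error rates."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errs = []
    for i in range(k):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        y_hat = fit_predict(X[tr], y[tr], X[te])   # train on k-1 parts, predict held-out part
        errs.append(np.mean(y_hat != y[te]))       # misclassification rate on this fold
    return float(np.mean(errs))
```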
Evaluation Measures
Accuracy = (TP + TN) / (TP + TN + FP + FN)
True positive rate = TP / (TP + FN)
False positive rate = FP / (FP + TN)
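As a sketch, the three measures computed from the four confusion-matrix counts (function name and example counts are made up):

```python
def eval_measures(tp, tn, fp, fn):
    """Accuracy, true positive rate, and false positive rate
    from the counts of a 2x2 confusion matrix."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn)    # true positive rate
    fpr = fp / (fp + tn)    # false positive rate
    return accuracy, tpr, fpr

print(eval_measures(tp=40, tn=45, fp=5, fn=10))   # hypothetical counts
```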
Issues with decision trees
- Missing values: assign the most common attribute value, or the most common value among training examples with the same class
- Overfitting: accuracy is high on training data but low on test data
- Fix overfitting with reduced-error pruning: remove a sub-tree and replace it with a leaf node, keeping the change if accuracy on held-out data does not drop
Describe unsupervised learning
- Only observe features X1, X2, …, Xp
- Not interested in prediction, since there is no associated response variable Y
Goals of unsupervised learning
- Discover interesting things about the measurements: informative patterns, subgroups, etc.
- Two main methods: clustering and principal components analysis
Challenges of unsupervised learning
- More subjective than supervised learning; there is no simple goal such as prediction
Advantage of unsupervised learning
- Of growing importance
- It is often easier to obtain unlabeled data than labeled data
Describe clustering
- Techniques for finding subgroups, or clusters, in a dataset
- Seek similarity patterns: observations within a group are similar to each other, while groups differ from one another
- Must define what makes observations similar or different
Clustering applications
- Clustering similar data
- Discovering communities
- Grouping crash reports
Details of k means clustering
- Each observation belongs to at least one cluster
- No observation belongs to more than one cluster (the clusters partition the observations)
- A good clustering is one for which the within-cluster variation is as small as possible
- Thus, minimize Σ_{k=1}^{K} WCV(Ck) over the cluster assignments
How to define within cluster variation
- Typically squared Euclidean distance:
WCV(Ck) = (1/|Ck|) Σ_{i,i′∈Ck} Σ_{j=1}^{p} (xij − xi′j)²
K-Means clustering algorithm
- Randomly assign an initial cluster (1 to K) to each observation
- Iterate until the cluster assignments stop changing:
- compute each cluster's centroid (the vector of the p feature means for its observations)
- assign each observation to the cluster whose centroid is closest in Euclidean distance
- Not guaranteed to reach the global minimum; run it several times from different random starts (see the sketch below)
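A compact NumPy sketch of the algorithm above. The empty-cluster re-seeding and the iteration cap are my own pragmatic additions, not part of the textbook algorithm:

```python
import numpy as np

def k_means(X, K, n_iter=100, seed=0):
    """Plain K-means: random initial assignments, then alternate centroid
    updates and nearest-centroid reassignment until assignments stabilize."""
    rng = np.random.default_rng(seed)
    assign = rng.integers(0, K, size=len(X))     # step 1: random initial cluster per observation
    for _ in range(n_iter):
        # step 2a: compute each cluster centroid (re-seed any empty cluster)
        centroids = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                              else X[rng.integers(len(X))] for k in range(K)])
        # step 2b: assign each observation to its nearest centroid (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):   # assignments stopped changing
            break
        assign = new_assign
    return assign, centroids

# toy usage with different random starts, keeping any one result
X = np.random.default_rng(1).normal(size=(100, 2))
labels, centers = k_means(X, K=3, seed=7)
```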