Classification Flashcards by Jacob Fritz

What is the process of modeling? (3 things)

1. Express the real-life situation as math
2. Analyze the math
3. Turn the math back into a real-life solution

Model meaning in analytics

"Model" can mean any of three things, from general to specific:
-regression
-regression based on size, weight, and distance
-regression estimate = 37 + 81 × Size + 76 × Weight + 4 × Distance
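
Of the three senses, only the last is specific enough to compute with. A minimal sketch, assuming the coefficients on this card; the function name `regression_estimate` and the sample inputs are made up for illustration.

```python
def regression_estimate(size: float, weight: float, distance: float) -> float:
    """Evaluate the fitted model from the card: 37 + 81*size + 76*weight + 4*distance."""
    return 37 + 81 * size + 76 * weight + 4 * distance


# Hypothetical inputs: 37 + 81*1 + 76*2 + 4*3
print(regression_estimate(size=1, weight=2, distance=3))  # 282
```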

What is classification?

putting things into categories

Data table vocab

row - data point
column - attributes, features, covariate, predictor, factor, variable
response / outcome - the “answer” for each data point

what is structured data?

data that can be described and stored in a structured way
ex: quantitative - credit score, age, sales
categorical - m/f, hair color

what is unstructured data

data that is not easily described and stored
ex- written text

Data Types

quantitative - #s w/ meaning
ex: sales, age, temp, income
categorical - #s w/o meaning
ex: zip codes - higher/lower not meaningful
binary data - (subset of categorical)
-only 2 values
ex: m/f, on/off, t/f
-can sometimes be treated as a quantitative measure

Data Relations

Unrelated
-no relationship between data points
ex: different customers, loan applications

Time series
same data recorded over time
-often recorded @ equal intervals
ex: daily sales, stock prices, child’s height on each birthday

Support Vector Machine line information

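m = # data points (rows)
n = # attributes (columns)
$x_{ij}$ = value of the jth attribute (column) for the ith data point
$y_i$ = response for row i (+1 or -1 in the SVM formulas)

line
$a_1 x_1 + a_2 x_2 + \dots + a_n x_n + a_0 = 0$ OR $\sum_{j=1}^{n} a_j x_j + a_0 = 0$

for classification, the two margin lines are $a_1 x_1 + a_2 x_2 + \dots + a_n x_n + a_0 = \pm 1$ (any number works; 1 is used here)

you could also say $(a_1 x_{i1} + a_2 x_{i2} + \dots + a_n x_{in} + a_0)\, y_i \ge 1$ for every data point i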
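
The margin condition can be checked one data point at a time. A minimal sketch, assuming responses are coded as +1 / -1; the classifier `x1 + x2 - 3 = 0` and the sample points are hypothetical, not from the deck.

```python
def line_value(a, a0, x):
    """Left-hand side of the line equation: a1*x1 + ... + an*xn + a0."""
    return sum(aj * xj for aj, xj in zip(a, x)) + a0


def satisfies_margin(a, a0, x, y):
    """True when the point is on the correct side of its margin line."""
    return line_value(a, a0, x) * y >= 1


a, a0 = [1.0, 1.0], -3.0  # hypothetical classifier: x1 + x2 - 3 = 0

print(satisfies_margin(a, a0, [3.0, 2.0], +1))  # (3 + 2 - 3) * 1 = 2.0   -> True
print(satisfies_margin(a, a0, [1.0, 0.5], -1))  # (1 + 0.5 - 3) * -1 = 1.5 -> True
print(satisfies_margin(a, a0, [2.0, 1.5], +1))  # (2 + 1.5 - 3) * 1 = 0.5  -> False
```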

Distance between support vectors

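$2 / \sqrt{\sum_{j=1}^{n} a_j^2}$

the $a_j$ are the coefficients $a_1, \dots, a_n$, which are the same for both margin lines (the two lines differ only in the constant on the right-hand side, +1 vs. -1)

Note:
m = # data points (rows)
n = # attributes (columns)
$x_{ij}$ = value of the jth attribute (column) for the ith data point
$y_i$ = response for row i

line
$a_1 x_{i1} + a_2 x_{i2} + \dots + a_n x_{in} + a_0 = 0$ OR $\sum_{j=1}^{n} a_j x_{ij} + a_0 = 0$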
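
A quick numeric check of the distance formula 2 / sqrt(sum of squared coefficients). This is only a sketch; the function name and the coefficient values are made up for illustration.

```python
import math


def margin_width(a):
    """Distance between the two margin lines: 2 / sqrt(a1^2 + ... + an^2)."""
    return 2 / math.sqrt(sum(aj ** 2 for aj in a))


print(margin_width([3.0, 4.0]))  # 2 / sqrt(9 + 16) = 2 / 5 = 0.4
print(margin_width([1.5, 2.0]))  # halving the coefficients doubles the width: 0.8
```

Note that the intercept a0 does not appear: it moves the lines but does not change how far apart they are.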

How do you maximize the margin in SVM?

minimize over $a_0, \dots, a_n$: $\sum_{j=1}^{n} a_j^2$
subject to $(a_1 x_{i1} + a_2 x_{i2} + \dots + a_n x_{in} + a_0)\, y_i \ge 1$ for each data point i

Note:
m = # data points (rows)
n = # attributes (columns)
$x_{ij}$ = value of the jth attribute (column) for the ith data point
$y_i$ = response for row i

line
$a_1 x_{i1} + a_2 x_{i2} + \dots + a_n x_{in} + a_0 = 0$ OR $\sum_{j=1}^{n} a_j x_{ij} + a_0 = 0$

How to calculate error in svm?

Correct side of line - $\left(\sum_{j=1}^{n} a_j x_{ij} + a_0\right) y_i - 1 \ge 0$
Wrong side of line - $\left(\sum_{j=1}^{n} a_j x_{ij} + a_0\right) y_i - 1 < 0$
-the amount by which it is less than 0 is the error

Note:
m = # data points (rows)
n = # attributes (columns)
$x_{ij}$ = value of the jth attribute (column) for the ith data point
$y_i$ = response for row i

line
$a_1 x_{i1} + a_2 x_{i2} + \dots + a_n x_{in} + a_0 = 0$ OR $\sum_{j=1}^{n} a_j x_{ij} + a_0 = 0$

Svm error for data point

$\max\left(0,\; 1 - \left(\sum_{j=1}^{n} a_j x_{ij} + a_0\right) y_i\right)$

SVM total error

$\sum_{i=1}^{m} \max\left(0,\; 1 - \left(\sum_{j=1}^{n} a_j x_{ij} + a_0\right) y_i\right)$
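
The per-point error and the total error can be sketched in a few lines. This assumes responses coded as +1 / -1; the classifier and the four data points are hypothetical, chosen so the arithmetic is easy to follow.

```python
def point_error(a, a0, x, y):
    """max(0, 1 - (a1*x1 + ... + an*xn + a0) * y) for one data point."""
    return max(0.0, 1 - (sum(aj * xj for aj, xj in zip(a, x)) + a0) * y)


def total_error(a, a0, X, Y):
    """Sum of the per-point errors over all m data points."""
    return sum(point_error(a, a0, x, y) for x, y in zip(X, Y))


a, a0 = [1.0, 1.0], -3.0  # hypothetical classifier: x1 + x2 - 3 = 0
X = [[3.0, 2.0], [1.0, 0.5], [2.0, 1.5], [2.0, 2.0]]
Y = [1, -1, 1, -1]

for x, y in zip(X, Y):
    print(x, y, point_error(a, a0, x, y))  # errors: 0.0, 0.0, 0.5, 2.0
print(total_error(a, a0, X, Y))  # 2.5
```

The third point is on the right side of the classifier but inside the margin, so it still picks up a small error; the fourth is misclassified and picks up a larger one.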

SVM margin denominator

$\sum_{j=1}^{n} a_j^2$ (the margin is $2 / \sqrt{\sum_{j=1}^{n} a_j^2}$, so a smaller sum means a larger margin)

SVM Equation

minimize over $a_0, \dots, a_n$: $\sum_{i=1}^{m} \max\left(0,\; 1 - \left(\sum_{j=1}^{n} a_j x_{ij} + a_0\right) y_i\right) + \lambda \sum_{j=1}^{n} a_j^2$

What happens when lambda (SVM) increases and decreases

Lambda controls the tradeoff between error and margin. As lambda increases, the importance of a larger margin outweighs avoiding mistakes on data points.

As lambda decreases toward 0, the margin term drops toward 0 and the importance of correctly classifying data points outweighs having a large margin.
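
The tradeoff can be seen by scoring two candidate classifiers with the SVM objective (total error plus lambda times the sum of squared coefficients) at a small and a large lambda. A minimal sketch: the one-attribute data set, the two candidates, and the lambda values are all hypothetical, and no actual optimization is run.

```python
def svm_objective(a, a0, X, Y, lam):
    """Total error plus lambda times the margin term (sum of squared coefficients)."""
    error = sum(
        max(0.0, 1 - (sum(aj * xj for aj, xj in zip(a, x)) + a0) * y)
        for x, y in zip(X, Y)
    )
    margin_term = sum(aj ** 2 for aj in a)
    return error + lam * margin_term


# Hypothetical data: two points far from 0 and two points close to it.
X = [[-2.0], [2.0], [0.5], [-0.5]]
Y = [-1, 1, 1, -1]

wide = ([0.5], 0.0)    # small coefficient: wide margin, two points fall inside it
narrow = ([2.0], 0.0)  # large coefficient: narrow margin, no error at all

for lam in (0.1, 1.0):
    obj_wide = svm_objective(*wide, X, Y, lam)
    obj_narrow = svm_objective(*narrow, X, Y, lam)
    better = "wide" if obj_wide < obj_narrow else "narrow"
    print(f"lambda={lam}: wide={obj_wide:.3f}, narrow={obj_narrow:.3f} -> {better} margin preferred")
```

With the small lambda the error-free narrow margin scores lower (better); with the large lambda the wide margin wins despite its errors.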

what is a support vector?

point that holds up the shape
-can support from sides or top
-can have more than one line

Support vector machine model

-determines support vectors automatically from the data (hence machine)

Where is the classifier in relation to the support vectors

the classifier is halfway between the two parallel margin lines that pass through the support vectors

How can you weight an svm to be more conservative in a direction?

-hard classification - if giving a bad loan is 2x as bad as not giving a good loan, we could adjust the intercept to 2/3 (a0 - 1) + 1/3 (a0 + 1) = a0 - 1/3
-soft classification - add a multiplier to each error term: >1 for more costly errors and <1 for less costly errors
Note: the intercept can be anywhere between a0 - 1 and a0 + 1 without making any mistakes on the data (the line is still within the margin)
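
The hard-classification adjustment is a weighted average of the two extreme intercepts. A minimal sketch: the card only gives the 2x case, so the generalization to an arbitrary `cost_ratio` is an assumption, as is the function name. Which end of the range is the conservative one depends on how the classes are labeled.

```python
def shifted_intercept(a0, cost_ratio):
    """Weighted average of the extreme intercepts a0 - 1 and a0 + 1.

    cost_ratio is how many times worse one kind of error is than the other
    (assumed generalization; cost_ratio = 2 reproduces the card's 2/3 and 1/3 weights).
    """
    w = cost_ratio / (cost_ratio + 1)
    return w * (a0 - 1) + (1 - w) * (a0 + 1)


print(shifted_intercept(5.0, 2))  # 2/3 * 4 + 1/3 * 6, i.e. a0 - 1/3
print(shifted_intercept(5.0, 1))  # equal costs leave the intercept at a0
```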

When you maximize the margin what are you doing?

minimizing the sum of squares of the coefficients

Do you need to scale data for SVM? (needs some work)

Yes! We minimize the sum of squares of the coefficients in order to maximize the margin, and comparing coefficients that way only makes sense when the attributes are on the same order of magnitude.

In svm what does it mean when a coefficients value is close to 0?

the corresponding attribute is not relevant for classification. For example, if the classifier is a vertical line, the attribute on the vertical axis does not matter.

Scaling equation

$x_{\min,j} = \min_i x_{ij}$
$x_{\max,j} = \max_i x_{ij}$
for each data point i: $x_{ij}^{\text{scaled}} = \dfrac{x_{ij} - x_{\min,j}}{x_{\max,j} - x_{\min,j}}$
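
Min-max scaling of a single attribute (column) can be sketched as follows. The function name is made up, and raising an error for a constant column is an added assumption, since the formula would divide by zero there.

```python
def min_max_scale(column):
    """Scale one attribute to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        raise ValueError("cannot scale a constant column")
    return [(x - lo) / (hi - lo) for x in column]


scaled = min_max_scale([5, 12, 27, 29])
print(scaled)  # smallest value maps to 0.0, largest to 1.0
```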

Standardization equation

factor j has mean $\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_{ij}$ (m = # data points) and standard deviation $\sigma_j$
for each data point i: $x_{ij}^{\text{standardized}} = \dfrac{x_{ij} - \mu_j}{\sigma_j}$
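
Standardizing one attribute (column) can be sketched with the standard library. The card does not say which standard deviation to use; this sketch assumes the sample standard deviation (`statistics.stdev`), and the sample values are made up.

```python
import statistics


def standardize(column):
    """Subtract the column mean and divide by the column standard deviation."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # assumption: sample sd; pstdev would be the population sd
    return [(x - mean) / sd for x in column]


z = standardize([2.0, 4.0, 4.0, 6.0])
print(z)
print(statistics.mean(z), statistics.stdev(z))  # approximately 0 and 1
```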

General scaling

$x_{ij}^{\text{scaled}[a,b]} = x_{ij}^{\text{scaled}[0,1]} \cdot (b - a) + a$
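
Scaling to a general range can be sketched in two steps: scale to [0, 1], then stretch and shift. This assumes the convention that a is the lower bound and b the upper bound, so 0 maps to a and 1 maps to b; the function name and sample values are made up.

```python
def scale_to_range(column, a, b):
    """Scale one attribute to [a, b] by way of [0, 1]."""
    lo, hi = min(column), max(column)
    unit = [(x - lo) / (hi - lo) for x in column]  # values in [0, 1]
    return [u * (b - a) + a for u in unit]         # 0 -> a, 1 -> b


print(scale_to_range([0, 5, 10], 200, 800))  # [200.0, 500.0, 800.0]
```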

what does scaling do?

gets all values between 0 and 1 (or any other lower and upper bounds)

what does standardization do?

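rescales each attribute to a common mean and standard deviation, commonly mean = 0 and sd = 1 (as for a standard normal distribution); it shifts and stretches the data but does not change the shape of its distribution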

When to use scaling?

-data required within a bounded range
-ex: neural networks
-optimization models that need bounded data
-batting avgs 0-1
-RGB color intensities 0-255
-SAT scores 200-800
you can always try both and see what works best

When to use standardization?

ex:
-principal component analysis
-clustering
you can always try both and see what works best

How does KNN work?

KNN finds the k closest points to the new data point and counts how many of them belong to each class. The most common class among those k points becomes the new data point's class.
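
A minimal sketch of the counting step, assuming straight-line (Euclidean) distance and made-up training points. Ties between classes are broken by whichever class appears first among the nearest neighbors, which is a property of this sketch rather than part of KNN itself.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Return the most common class among the k training points closest to query."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # straight-line distance
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training data: class "a" near the origin, class "b" near (5, 5).
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (0.5, 0.5), k=3))  # a
print(knn_predict(points, labels, (5.0, 5.5), k=3))  # b
```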

KNN considerations

-which type of distance to use?
  -straight-line distance
  -weighted distances
-unimportant attributes can be removed (when weight is close to 0)
-what is a good k value? (validation)

Which of these is a data point? A survey of 25 people recorded each person's family size and type of car
-the 14th person's family size and car type
-the 14th person's family size
-the car type of each person

-the 14th person's family size and car type
A data point is all the information about one observation

Which of these is structured data?
-a person's Twitter feed
-the amount of money in a person's bank account

-the amount of money in a person's bank account
Every entry will be a number of dollars and cents

Which of these is time series data?
-avg cost of a house in the US every year since 1820
-the height of each professional basketball player in the NBA at the start of the season

-avg cost of a house in the US every year since 1820
The same thing measured at yearly time intervals

Which term measures error in classifying all of the data points

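$\sum_{i=1}^{m} \max\left(0,\; 1 - \left(\sum_{j=1}^{n} a_j x_{ij} + a_0\right) y_i\right)$ (outer sum over the m data points, inner sum over the n attributes)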

When you are multiplying your error term, would a higher number favor or disfavor classification errors?

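disfavor - a higher multiplier makes each classification error cost more in the objective, so the model works harder to avoid those errors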

Which dataset is scaled between 0 and 1?
-5, 12, 27, 29
-0.0, 0.2, 0.6, 1.0
-0.3, 0.4, 0.7, 0.75

-0.0, 0.2, 0.6, 1.0

What is the purpose of classification models?

putting things into categories -differentiate

Classification Flashcards

(41 cards)