Classification Flashcards
What is the process of modeling? (3 things)
- Express the real-life situation as math
- Analyze the math
- Turn the math back into a real-life solution
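What can "model" mean in analytics? (3 things)
- the general technique: regression
- the technique applied to specific attributes: regression based on size, weight, and distance
- the specific fitted equation: regression estimate = 37 + 81*size + 76*weight + 4*distance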
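A minimal sketch of the third meaning, the specific fitted equation. The coefficients come from the card above; the function name and the input values are hypothetical and chosen only for illustration.

```python
def regression_estimate(size, weight, distance):
    """Evaluate the example model from the card: 37 + 81*size + 76*weight + 4*distance."""
    return 37 + 81 * size + 76 * weight + 4 * distance

# Hypothetical inputs, not taken from any real data set.
estimate = regression_estimate(size=2, weight=3, distance=10)
print(estimate)
```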
What is classification?
putting things into categories
Data table vocab
row - data point
column - attribute, feature, covariate, predictor, factor, variable
response/outcome - the “answer” for each data point
What is structured data?
data that can be described and stored in a structured way
ex: quantitative - credit score, age, sales
categorical - m/f, hair color
What is unstructured data?
data that is not easily described and stored
ex: written text
Data Types
quantitative - #s w/ meaning
ex: sales, age, temp, income
categorical - #s w/o meaning, or non-numeric categories
ex: zip codes - higher/lower not meaningful
Binary data - (subset of categorical)
-only 2 values
ex: m/f, on/off, t/f
-can sometimes be treated as a quantitative measure
Data Relations
Unrelated
-no relationship between data points
ex: different customers, loan applications
Time series
same data recorded over time
-often recorded @ equal intervals
ex: daily sales, stock prices, child’s height on each birthday
Support Vector Machine line information
m = # data points(rows)
n = # attributes(columns)
xij = jth column of ith data point
yi = response for row i
line
a1x1 + a2x2 + ... + anxn + a0 = 0 OR Sum from j = 1 to n of ajxj + a0 = 0
for classification, the two margin lines are a1x1 + a2x2 + ... + anxn + a0 = +1 and = -1 (any constant would work; 1 is used here)
you could also say (a1xi1 + a2xi2 + ... + anxin + a0)yi >= 1 for each data point i
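A minimal sketch of how the line is used for classification, assuming responses are coded as +1 and -1. The function names, coefficients, and points are hypothetical.

```python
def line_value(a, a0, x):
    """Compute a1*x1 + a2*x2 + ... + an*xn + a0 for one data point x."""
    return sum(aj * xj for aj, xj in zip(a, x)) + a0

def classify(a, a0, x):
    """Return +1 or -1 depending on which side of the line x falls."""
    return 1 if line_value(a, a0, x) >= 0 else -1

def satisfies_margin(a, a0, x, y):
    """True when (a1*x1 + ... + an*xn + a0) * y >= 1 for a point with response y."""
    return line_value(a, a0, x) * y >= 1

# Hypothetical line: x1 + x2 - 3 = 0
a, a0 = [1.0, 1.0], -3.0
print(classify(a, a0, [3.0, 2.0]))               # line value is 2
print(classify(a, a0, [0.5, 0.5]))               # line value is -2
print(satisfies_margin(a, a0, [2.0, 1.5], 1))    # line value is 0.5, inside the margin
```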
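Distance between the two margin lines (the lines through the support vectors)
2 / sqrt(sum from j = 1 to n of (aj)^2)
aj = the coefficients a1...an, which are the same for both margin lines; only the right-hand side (+1 or -1) differs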
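The distance formula can be sketched as below. The function name and the coefficient values are hypothetical; a0 does not appear because it shifts the lines without changing the gap between them.

```python
import math

def margin_width(a):
    """Distance between the +1 and -1 margin lines: 2 / sqrt(sum of aj^2)."""
    return 2 / math.sqrt(sum(aj ** 2 for aj in a))

# Hypothetical coefficients a1 = 3, a2 = 4, so sqrt(9 + 16) = 5.
print(margin_width([3.0, 4.0]))
# Doubling every coefficient halves the margin.
print(margin_width([6.0, 8.0]))
```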
How do you maximize the margin in SVM?
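minimize over a0…an: sum from j = 1 to n of (aj)^2, subject to (a1xi1 + a2xi2 + ... + anxin + a0)yi >= 1 for each data point i
-a smaller sum of (aj)^2 means a wider margin, since margin = 2 / sqrt(sum of (aj)^2)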
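A rough illustration of the objective and the constraint, assuming responses coded as +1 and -1. It only compares a few hand-picked candidate lines; it is not an optimizer, and the data, names, and candidates are all hypothetical.

```python
def margin_objective(a):
    """Quantity to minimize: sum of aj^2 (smaller means a wider margin)."""
    return sum(aj ** 2 for aj in a)

def is_feasible(a, a0, X, Y):
    """Check (a1*xi1 + ... + an*xin + a0) * yi >= 1 for every data point i."""
    return all(
        (sum(aj * xj for aj, xj in zip(a, x)) + a0) * y >= 1
        for x, y in zip(X, Y)
    )

# Hypothetical, linearly separable data.
X = [[3, 3], [4, 3], [1, 1], [0, 2]]
Y = [1, 1, -1, -1]

# Hand-picked candidate lines, given as (coefficients a1..an, intercept a0).
candidates = {
    "tight": ([1.0, 1.0], -4.0),
    "wide": ([0.5, 0.5], -2.0),
    "too_wide": ([0.25, 0.25], -1.0),  # violates the constraint
}

feasible = {k: v for k, v in candidates.items() if is_feasible(v[0], v[1], X, Y)}
best = min(feasible, key=lambda k: margin_objective(feasible[k][0]))
print(sorted(feasible), best)
```

Among the feasible candidates, the one with the smallest sum of squared coefficients is the one with the widest margin.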
How to calculate error in svm?
Correct side of line - (Sum from j = 1 to n of ajxij + a0)yi - 1 >= 0
Wrong side of line - (Sum from j = 1 to n of ajxij + a0)yi - 1 < 0
-the amount it’s less than 0 is the error
Svm error for data point
max( 0, 1 - (Sum from j = 1 to n of ajxij + a0)yi )
SVM total error
sum from i = 1 to m of max( 0, 1 - (Sum from j = 1 to n of ajxij + a0)yi )
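The per-point and total error can be sketched as below, assuming responses coded as +1 and -1. The function names, the line, and the data are hypothetical.

```python
def point_error(a, a0, x, y):
    """Error for one data point: max(0, 1 - (sum of aj*xij + a0) * yi)."""
    value = sum(aj * xj for aj, xj in zip(a, x)) + a0
    return max(0.0, 1 - value * y)

def total_error(a, a0, X, Y):
    """Sum of the per-point errors over all m data points."""
    return sum(point_error(a, a0, x, y) for x, y in zip(X, Y))

# Hypothetical line x1 + x2 - 3 = 0 and four hypothetical points.
a, a0 = [1.0, 1.0], -3.0
X = [[3.0, 2.0], [2.0, 1.5], [0.0, 0.0], [2.0, 2.0]]
Y = [1, 1, -1, -1]

errors = [point_error(a, a0, x, y) for x, y in zip(X, Y)]
print(errors)
print(total_error(a, a0, X, Y))
```

The second point is on the correct side but inside the margin, so it still contributes a small error; the fourth point is on the wrong side and contributes the most.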
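SVM margin denominator
sqrt(sum from j = 1 to n of (aj)^2)
-minimizing sum from j = 1 to n of (aj)^2 (without the square root) maximizes the margin just the same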
SVM Equation
minimize over a0…an: sum from i = 1 to m of max( 0, 1 - (Sum from j = 1 to n of ajxij + a0)yi ) + lambda * sum from j = 1 to n of (aj)^2
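A minimal sketch of how lambda trades off error against margin, assuming responses coded as +1 and -1. It evaluates the objective for two hand-picked lines rather than solving the minimization; the data, names, and lambda values are hypothetical.

```python
def svm_objective(a, a0, X, Y, lam):
    """Total error plus lambda times the sum of aj^2."""
    error = sum(
        max(0.0, 1 - (sum(aj * xj for aj, xj in zip(a, x)) + a0) * y)
        for x, y in zip(X, Y)
    )
    penalty = lam * sum(aj ** 2 for aj in a)
    return error + penalty

# Hypothetical data and two hand-picked candidate lines.
X = [[3, 3], [4, 3], [1, 1], [0, 2]]
Y = [1, 1, -1, -1]
tight = ([1.0, 1.0], -4.0)     # no error, narrower margin
wide = ([0.25, 0.25], -1.0)    # some error, wider margin

for lam in (0.1, 10.0):
    print(lam, svm_objective(*tight, X, Y, lam), svm_objective(*wide, X, Y, lam))
```

With the small lambda the error term dominates and the zero-error line scores lower; with the large lambda the margin term dominates and the wider-margin line scores lower.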