Classification Flashcards
What is the process of modeling (3 things)
- Real life situation expressed as math
- analyze the math
- turn math back into real life solution
Model meaning in analytics
-regression
-regression based on size weight and distance
-regression estimate = 37 + 81 * size + 76 * weight + 4 * distance
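A quick sketch of the third meaning, the regression estimate as a formula you can evaluate. The function name and the input values are made up for illustration; only the coefficients come from the card.

```python
def regression_estimate(size, weight, distance):
    """Evaluate the example model: 37 + 81*size + 76*weight + 4*distance."""
    return 37 + 81 * size + 76 * weight + 4 * distance


# Hypothetical inputs, chosen only to show the arithmetic.
print(regression_estimate(size=2, weight=1, distance=10))
```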
What is classification?
putting things into categories
Data table vocab
row - data point
column - attributes, features, covariate, predictor, factor, variable
response /outcome - “answer” or outcome
what is structured data
data that can be described and stored in a structured way
ex: quantitative credit score, age, sales
categorical - m/f, hair color
what is unstructured data
data that is not easily described and stored
ex- written text
Data Types
quantitative - #s w/ meaning
ex sales age temp income
categorical - # w/o meaning
ex zip codes - higher /lower not meaningful
Binary data - (subset of categorical)
-only 2 values
ex: m/f on off t/f
-can sometimes be treated as a quantitative measure
Data Relations
Unrelated
-no relationship between data points
ex: different customers, loan applications
Time series
same data recorded over time
-often recorded @ equal intervals
ex: daily sales, stock prices, child’s height on each birthday
Support Vector Machine line information
m = # data points(rows)
n = # attributes(columns)
xij = jth column of ith data point
yi = response for row i
line
a1x1 + a2x2 + ... + anxn + a0 = 0 OR sum from j = 1 to n of (aj xj) + a0 = 0
for classification, the two margin lines are a1x1 + a2x2 + ... + anxn + a0 = +1 and = -1 (any number works; 1 is used here)
you could also say (a1xi1 + a2xi2 + ... + anxin + a0) yi >= 1 for each data point i
Distance between support vectors
2 / sqrt(sum from j = 1 to n of (aj)^2)
aj are the coefficients a1 ... an; the two margin lines are parallel, so they share the same coefficients
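A small sketch of the distance formula, assuming the coefficients a1 ... an are passed as a list without the intercept a0. The function name is hypothetical.

```python
import math


def margin_width(coefficients):
    """Distance between the two margin lines: 2 / sqrt(sum of aj^2).

    `coefficients` holds a1..an only; the intercept a0 does not affect the width.
    """
    return 2 / math.sqrt(sum(a * a for a in coefficients))


# With a1 = 3 and a2 = 4, sqrt(9 + 16) = 5, so the width is 2 / 5.
print(margin_width([3, 4]))
```

Smaller coefficients give a wider margin, which is why maximizing the margin turns into minimizing the sum of squares.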
How do you maximize the margin in SVM?
minimize over a0 ... an: sum from j = 1 to n of (aj)^2, subject to (a1xi1 + a2xi2 + ... + anxin + a0) yi >= 1 for each data point i
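A sketch of what the hard-margin problem is checking, not a solver. It assumes labels yi are +1 or -1 and that each data point is a tuple of attribute values; the function names and the toy data are made up.

```python
def satisfies_hard_margin(a0, a, points, labels):
    """True if (a1*xi1 + ... + an*xin + a0) * yi >= 1 for every data point i."""
    for x, y in zip(points, labels):
        score = sum(aj * xj for aj, xj in zip(a, x)) + a0
        if score * y < 1:
            return False
    return True


def sum_of_squares(a):
    """The quantity being minimized: sum of (aj)^2 over a1..an."""
    return sum(aj * aj for aj in a)


# Toy data: one point in each class.
points = [(2, 0), (-2, 0)]
labels = [1, -1]

# Both candidates satisfy the constraints; the second has the smaller
# sum of squares, so it gives the wider margin.
for a in ([1, 0], [0.5, 0], [0.25, 0]):
    print(a, satisfies_hard_margin(0, a, points, labels), sum_of_squares(a))
```

The last candidate has the smallest sum of squares but breaks the constraint, so it is not allowed.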
How to calculate error in svm?
Correct side of line - (sum from j = 1 to n of (aj xij) + a0) yi - 1 >= 0
Wrong side of line - (sum from j = 1 to n of (aj xij) + a0) yi - 1 < 0
-the amount it’s less than 0 is the error
Svm error for data point
max(0, 1 - (sum from j = 1 to n of (aj xij) + a0) yi)
SVM total error
sum from i = 1 to m of max(0, 1 - (sum from j = 1 to n of (aj xij) + a0) yi)
SVM margin denominator
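sqrt(sum from j = 1 to n of (aj)^2)
-the SVM equation uses the sum of squares without the square root, since minimizing either one maximizes the margin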
SVM Equation
minimize over a0 ... an: sum from i = 1 to m of max(0, 1 - (sum from j = 1 to n of (aj xij) + a0) yi) + lambda * sum from j = 1 to n of (aj)^2
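The error term of the SVM equation can be sketched in a few lines. This assumes labels are +1 or -1 and each data point is a tuple of attribute values; the function names and the toy data are made up.

```python
def point_error(a0, a, x, y):
    """Error for one data point: max(0, 1 - (sum of aj*xij + a0) * yi)."""
    score = sum(aj * xj for aj, xj in zip(a, x)) + a0
    return max(0, 1 - score * y)


def total_error(a0, a, points, labels):
    """Sum of the per-point errors over all m data points."""
    return sum(point_error(a0, a, x, y) for x, y in zip(points, labels))


# Toy data: the middle point sits inside the margin, so it has some error.
points = [(2, 0), (0.5, 0), (-1, 0)]
labels = [1, 1, -1]
for x, y in zip(points, labels):
    print(x, y, point_error(0, [1, 0], x, y))
print("total:", total_error(0, [1, 0], points, labels))
```

A point that is on the correct side and outside the margin contributes 0, so only points inside the margin or on the wrong side add to the total.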
What happens when lambda (SVM) increases and decreases
Lambda controls the tradeoff between error and margin. As lambda increases, the importance of a larger margin outweighs mistakes on data points.
As lambda decreases toward 0, the margin term drops toward 0, and the importance of correctly classifying data points outweighs having a large margin.
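To see the tradeoff with numbers, here is a sketch that scores two candidate classifiers with the error term plus lambda times the sum of squared coefficients. The data, the two candidates, and the function names are invented for illustration; a real SVM would search over all coefficients rather than compare two.

```python
def objective(a0, a, points, labels, lam):
    """Total error plus lam times the sum of squared coefficients a1..an."""
    error = 0.0
    for x, y in zip(points, labels):
        score = sum(aj * xj for aj, xj in zip(a, x)) + a0
        error += max(0.0, 1 - score * y)
    return error + lam * sum(aj * aj for aj in a)


# One attribute, three data points (toy values).
points = [(2.0,), (0.5,), (-2.0,)]
labels = [1, 1, -1]

wide = (0.0, [0.5])    # wide margin, but the middle point falls inside it
narrow = (0.0, [2.0])  # narrow margin, no error on any point


def preferred(lam):
    """Name of the candidate with the lower objective value for this lambda."""
    wide_cost = objective(*wide, points, labels, lam)
    narrow_cost = objective(*narrow, points, labels, lam)
    return "wide" if wide_cost < narrow_cost else "narrow"


for lam in (0.01, 1.0):
    print(lam, preferred(lam))
```

With the small lambda the zero-error candidate wins; with the larger lambda the wide-margin candidate wins even though it makes a margin mistake.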
what is a support vector?
point that holds up the shape
-can support from sides or top
-can have more than one line
Support vector machine model
-determines support vectors automatically from the data (hence machine)
Where is the classifier in relation to the support vectors
the classifier lies between the two parallel lines that pass through the support vectors (halfway between them by default)
How can you weight an svm to be more conservative in a direction?
-hard classification - if giving a bad loan is 2x as bad as not giving a good loan, we could adjust the intercept to 2/3 (a0 - 1) + 1/3 (a0 + 1), which equals a0 - 1/3
- soft classification - add a multiplier to your error term >1 for more costly errors and <1 for less costly errors
Note: the intercept can be between a0 -1 and a0+1 without making any mistakes on the data (line is still within margin)
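The 2/3 and 1/3 weights can be written as a weighted average of a0 - 1 and a0 + 1. The sketch below generalizes the 2x case to any cost ratio; that generalization, the function name, and the choice of which margin line gets the larger weight are assumptions (the direction depends on which class is labeled +1).

```python
def adjusted_intercept(a0, cost_ratio):
    """Weighted average of a0 - 1 and a0 + 1.

    cost_ratio = 2 gives weights 2/3 and 1/3, matching the loan example.
    cost_ratio = 1 leaves the intercept at a0 (the middle of the margin).
    """
    weight = cost_ratio / (cost_ratio + 1)
    return weight * (a0 - 1) + (1 - weight) * (a0 + 1)


print(adjusted_intercept(a0=5, cost_ratio=2))  # close to 5 - 1/3
print(adjusted_intercept(a0=5, cost_ratio=1))  # stays at 5
```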
When you maximize the margin what are you doing?
minimizing the sum of squares of the coefficients
Do you need to scale data for SVM? (needs some work)
Yes! We minimize the sum of squares of the coefficients to maximize the margin, and that only treats the attributes fairly when they are on the same order of magnitude.
In SVM, what does it mean when a coefficient’s value is close to 0?
the attribute is not relevant for classification, similar to when your classifier is a vertical line, meaning one attribute does not matter
Scaling equation
xminj = min over i of xij
xmaxj = max over i of xij
for each data point i:
xij scaled = (xij - xminj) / (xmaxj - xminj)
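A minimal sketch of the scaling equation for one column (attribute j). The function name and the sample values are made up; the guard for a constant column is an added assumption, since the formula would divide by zero there.

```python
def min_max_scale(column):
    """Scale one attribute to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        raise ValueError("all values are equal, so the column cannot be scaled")
    return [(x - lo) / (hi - lo) for x in column]


# The smallest value maps to 0 and the largest to 1.
print(min_max_scale([200, 500, 800]))
```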
Standardization equation
factor j has mean j = (sum from i = 1 to m of xij) / m
factor j has standard deviation sd j
for each data point i:
xij standardized = (xij - mean j) / sd j
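A sketch of standardizing one column with the standard library. The card does not say whether sd is the population or the sample standard deviation; this example assumes the population version (`statistics.pstdev`). The function name and data are made up.

```python
import statistics


def standardize(column):
    """Standardize one attribute: (x - mean) / sd.

    Assumption: sd is the population standard deviation.
    """
    mean = statistics.mean(column)
    sd = statistics.pstdev(column)
    if sd == 0:
        raise ValueError("all values are equal, so the column cannot be standardized")
    return [(x - mean) / sd for x in column]


# This column has mean 5 and population sd 2.
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
```

After standardizing, the column has mean 0 and sd 1, but the values are not confined to a fixed range.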
General scaling
xij scaled [a,b] = xij scaled [0,1] * (b - a) + a
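A sketch of scaling one column to an arbitrary range, using the mapping scaled01 * (b - a) + a so that 0 lands on a and 1 lands on b. The function name and the sample values are made up.

```python
def scale_to_range(column, a, b):
    """Scale one attribute to [a, b] by first scaling it to [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        raise ValueError("all values are equal, so the column cannot be scaled")
    return [((x - lo) / (hi - lo)) * (b - a) + a for x in column]


# The smallest value maps to a = 200 and the largest to b = 800.
print(scale_to_range([0, 5, 10], 200, 800))
```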
what does scaling do?
gets all values between 0 and 1 (or any other bounds)
what does standardization do?
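rescaling each attribute to a common center and spread (often described as scaling to a normal distribution, although the shape of the distribution does not change)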
-commonly mean = 0 and sd = 1
When to use scaling?
-data required within bounded range
-ex: neural networks
-optimization models that need bounded data
-batting avgs 0 -1
rgb color intensities 0 -255
sat scores 200-800
you can always try both and see what works best
When to use standardization?
ex - principal component analysis
-clustering
you can always try both and see what works best
How does KNN work?
KNN finds the k closest points to the new data point and counts how many belong to each class. The most common class becomes the new data point’s class.
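A minimal sketch of that idea, assuming straight-line (Euclidean) distance and a simple majority vote. The function name and the toy data are made up, and ties are broken by whichever class was seen first among the neighbors.

```python
import math
from collections import Counter


def knn_classify(points, labels, query, k):
    """Return the most common class among the k points closest to `query`."""
    # Pair each distance with its label, then sort by distance only.
    by_distance = sorted(
        ((math.dist(p, query), label) for p, label in zip(points, labels)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Two small clusters of toy points.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_classify(points, labels, query=(2, 2), k=3))
print(knn_classify(points, labels, query=(6, 5), k=3))
```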
KNN considerations
-which type of distance to use?
-straight line distance
-weighted distances
-unimportant attributes can be removed (when weight is close to 0)
-what is a good k value (validation)
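One common way to weight a distance is to multiply each attribute's squared difference by a weight before summing. The card does not give a formula, so this particular form, the function name, and the values are assumptions.

```python
import math


def weighted_distance(p, q, weights):
    """Weighted straight-line distance: sqrt(sum of wj * (pj - qj)^2)."""
    return math.sqrt(sum(w * (pj - qj) ** 2 for w, pj, qj in zip(weights, p, q)))


p, q = (0, 0), (3, 100)
print(weighted_distance(p, q, [1, 1]))  # both attributes count
print(weighted_distance(p, q, [1, 0]))  # second attribute has weight 0
```

With a weight of 0 the second attribute has no effect on the distance, which is why an attribute with a weight close to 0 can be removed.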
Which of these is a datapoint?
A survey of 25 people recorded each person’s family size and type of car
-The 14th person’s family size and car type
-14th person’s family size
-the car type of each person
-The 14th person’s family size and car type
A data point is all the information about one observation
Which of these is structured data?
-a persons twitter feed
-the amount of money in a persons bank account
-the amount of money in a persons bank account
every entry will be a number of dollars and cents
Which of these is time series data?
-avg cost of a house in us every year since 1820
-the height of each professional basketball player in the nba at the start of the season
-avg cost of a house in us every year since 1820
the same thing measured at yearly time intervals
Which term measures error in classifying all of the data points
sum from i = 1 to m of max(0, 1 - (sum from j = 1 to n of (aj xij) + a0) yi)
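When you multiply your error term by a number, would a higher number favor or disfavor classification errors?
disfavor - a higher multiplier makes those errors more costly, so the model works harder to avoid them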
Which dataset is scaled between 0 and 1?
- 5,12,27,29
-0.0,0.2,0.6,1.0
-0.3,0.4,0.7,0.75
0.0,0.2,0.6,1.0
What is the purpose of classification models?
putting things into categories
-differentiate