Classification Flashcards
What is a feature space?
A coordinate space used to represent the input examples for a given problem, with one coordinate for each descriptive feature
Eager Learning Classification Strategy
- The classifier builds a full model during an initial training phase, to use later when new query examples arrive.
- More offline setup work, less work at run-time.
- Generalises before seeing the query example.
Lazy Learning Classification Strategy
- Classifier keeps all the training examples for later use.
- Little work is done offline; the classifier waits for new query examples.
- Focuses on the local space around each query example.
What learning strategy does KNN Classifier use and how?
Lazy. kNN identifies the k most similar previous examples from the training set for which a label has already been assigned, using some distance function, and predicts the majority label among those neighbours
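A minimal sketch of the idea, assuming numeric features, Euclidean distance and a simple majority vote. The name knn_predict and the toy data are illustrative only.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label for `query` from a list of (features, label) pairs."""
    # No model is built in advance: all the work happens at query time.
    neighbours = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    # Majority vote among the labels of the k nearest examples.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((5.0, 5.0), "b"), ((5.5, 4.5), "b")]
print(knn_predict(train, (1.1, 1.0), k=3))  # a
```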
How does Weighted kNN differ from the regular model?
It uses weighted voting: the votes of closer neighbours carry more weight.
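A minimal sketch of weighted voting, assuming inverse-distance weights (one common choice, not the only one). The name weighted_knn_predict, the eps parameter and the toy data are illustrative.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Predict a label where each neighbour's vote is weighted by 1 / distance."""
    neighbours = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    scores = defaultdict(float)
    for features, label in neighbours:
        # Assumed weighting scheme; eps avoids division by zero for exact matches.
        scores[label] += 1.0 / (math.dist(features, query) + eps)
    return max(scores, key=scores.get)

train = [((0.0,), "a"), ((3.0,), "b"), ((3.2,), "b")]
# An unweighted vote with k=3 would pick "b" (two votes to one);
# the single close "a" neighbour outweighs the two distant "b" neighbours.
print(weighted_knn_predict(train, (1.0,), k=3))  # a
```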
Is there a “best” distance measure?
No, the choice of distance measure is highly problem-dependent
What is the difference between a local distance function (LDF) and a global distance function (GDF)?
LDFs measure the distance between two examples based on a single feature, whereas GDFs are based on the combination of the local distances across all features
Define the overlap function (measuring distance)
Returns 0 if the two values for a feature are equal and 1 otherwise
Define Hamming Distance (measuring distance)
A GDF that is the sum of the overlap distances across all features
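A short sketch of the two definitions above, assuming examples are tuples of categorical values of equal length. The function names and example values are illustrative.

```python
def overlap(a, b):
    """Local distance: 0 if the two feature values are equal, 1 otherwise."""
    return 0 if a == b else 1

def hamming(p, q):
    """Global distance: sum of the overlap distances across all features."""
    return sum(overlap(a, b) for a, b in zip(p, q))

p = ("red", "small", "round")
q = ("red", "large", "square")
print(hamming(p, q))  # 2
```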
Define Absolute Difference (measuring distance)
Absolute value of the difference between two examples' values for a numeric feature; summed across several features it gives a global distance
Define Absolute Difference for ordinal features (measuring distance)
Calculate the absolute value of the difference between the two values' positions in the ordered list of possible values
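A small sketch for the ordinal case, assuming the ordered list of possible values is known in advance. The name ordinal_distance and the sizes list are illustrative.

```python
def ordinal_distance(a, b, ordered_values):
    """Absolute difference between the positions of a and b in the ordered list."""
    return abs(ordered_values.index(a) - ordered_values.index(b))

sizes = ["small", "medium", "large"]
print(ordinal_distance("small", "large", sizes))  # 2
```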
Define Euclidean Distance, and give the formula
- “Straight line” distance between two points in a feature space
- Calculated as the square root of the sum of squared differences over each feature f, for a pair of examples p and q.
ED(p, q) = sqrt(SUM_f (q_f - p_f)^2)
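A direct sketch of the Euclidean distance, assuming p and q are equal-length sequences of numbers. The function name is illustrative.

```python
import math

def euclidean(p, q):
    """Square root of the sum of squared per-feature differences."""
    return math.sqrt(sum((qf - pf) ** 2 for pf, qf in zip(p, q)))

print(euclidean((0, 0), (3, 4)))  # 5.0
```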
What are Heterogeneous (Diverse) Distance Functions?
GDFs created from different local distance functions, using an appropriate function for each feature
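A minimal sketch, assuming the local distances are combined by a plain sum and that numeric features are already scaled to [0, 1] so they are comparable with overlap distances. All names here are illustrative.

```python
def overlap(a, b):
    """Local distance for categorical features."""
    return 0 if a == b else 1

def abs_diff(a, b):
    """Local distance for numeric features."""
    return abs(a - b)

def heterogeneous_distance(p, q, local_fns):
    """Apply one local distance function per feature and sum the results."""
    return sum(fn(a, b) for fn, a, b in zip(local_fns, p, q))

# Feature 0 is categorical, feature 1 is numeric.
local_fns = [overlap, abs_diff]
print(heterogeneous_distance(("red", 0.25), ("blue", 0.75), local_fns))  # 1.5
```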
Min-max normalisation formula
z_i = (x_i - min(x)) / (max(x) - min(x))
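A short sketch of min-max normalisation for one feature. Mapping a constant feature to 0.0 is an assumption made here to avoid dividing by zero; the function name is illustrative.

```python
def min_max_normalise(xs):
    """Rescale the values of one feature to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        # Assumption: a constant feature carries no information, so map it to 0.
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]

print(min_max_normalise([10, 20, 30]))  # [0.0, 0.5, 1.0]
```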
Standard Normalisation Formula
z_i = (x_i - μ) / σ, where μ is the mean and σ the standard deviation of the feature
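A short sketch of standard normalisation for one feature, assuming the population standard deviation is used (the sample standard deviation is an equally valid choice) and that the feature is not constant. The function name is illustrative.

```python
import statistics

def standardise(xs):
    """Rescale one feature to zero mean and unit standard deviation."""
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)  # assumption: population standard deviation
    return [(x - mu) / sigma for x in xs]

# Mean is 5 and population standard deviation is 2 for this data.
print(standardise([2, 4, 4, 4, 5, 5, 7, 9]))
```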