Stampen Flashcards
K-nearest neighbors (KNN) requires three things
Set of stored records
Distance metric to compute the distance between records
Value of k, the number of nearest neighbors to use
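The three ingredients above can be put together in a few lines. This is a minimal sketch, not a reference implementation: the function names `euclidean` and `knn_predict`, the choice of Euclidean distance, majority voting, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Distance metric: straight-line distance between two numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(records, labels, query, k=3, distance=euclidean):
    """Classify `query` by majority vote among its k nearest stored records."""
    # Rank every stored record by its distance to the query
    ranked = sorted(zip(records, labels), key=lambda pair: distance(pair[0], query))
    # Keep the labels of the k closest records and return the most common one
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data, invented for illustration (hypothetical records and labels)
records = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
prediction = knn_predict(records, labels, (1.1, 1.0), k=3)
```

Every prediction sorts all stored records, which is why KNN gets slow on large datasets.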
Advantages of KNN
Simple method
Lack of parametric assumptions
Good performance with large data sets
Disadvantages of KNN
Can be time consuming
Prohibits real-time prediction on large datasets
Curse of dimensionality
Advantages of naive Bayes
Computationally efficient
Good classification prediction with many predictors
Disadvantages of naive Bayes
Requires a large number of records to obtain good results
Independence assumption may not hold for some variables
Only categorical data
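To see where the independence assumption and the categorical-data restriction come from, here is a minimal sketch of a count-based naive Bayes classifier. The function names, the toy weather data, and the absence of smoothing are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class frequencies of each categorical value."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (predictor index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def predict_naive_bayes(model, row):
    """Return the class with the highest prior times product of conditionals."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Independence assumption: conditionals are simply multiplied
            score *= value_counts[(i, label)][value] / count
        scores[label] = score
    return max(scores, key=scores.get)

# Toy data, invented for illustration (hypothetical predictors: weather, temperature)
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(rows, labels)
prediction = predict_naive_bayes(model, ("sunny", "hot"))
```

A value never seen with a class gives that class a score of zero, which is one reason many records are needed for good results.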
Snowflake schema characteristics
Less restricted
Ability to store aggregations
Smaller data volumes
However,
Not easily understood by end users
High number of joins
Not a predictable framework
Dashboard conforms to three levels
Perception
Comprehension of the current situation
Projection of future status
Pros of decision trees
Easily understood
Easy to generate rules
Cons of decision trees
May suffer from overfitting
Does not handle correlated features well
Can become quite large (needs pruning)
Does not handle streaming data easily
When to stop expanding decision tree nodes
Stop expanding when all records in the node have the same class value
Stop expanding when all the records have similar attribute values
Stop if expansion does not improve the impurity measure
Watch out for overfitting; look for the knee point
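The first and third stopping rules can be checked mechanically. This is a minimal sketch assuming Gini impurity as the impurity measure; the function names `gini` and `should_stop` and the `min_gain` threshold are hypothetical. The second rule (similar attribute values) is not covered here because it needs the attribute data, not only the class labels.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 0.0 when all records share one class."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def should_stop(parent, children, min_gain=0.0):
    """Return True if the node should not be expanded with the candidate split.

    parent:   class labels of the records at the node
    children: list of label lists, one per branch of the candidate split
    """
    # Rule: all records already have the same class value
    if len(set(parent)) <= 1:
        return True
    # Rule: the split does not improve impurity (weighted over the children)
    n = len(parent)
    weighted = sum(len(child) / n * gini(child) for child in children)
    return gini(parent) - weighted <= min_gain

pure_node = should_stop(["a", "a", "a"], [["a"], ["a", "a"]])
useful_split = should_stop(["a", "a", "b", "b"], [["a", "a"], ["b", "b"]])
useless_split = should_stop(["a", "b", "a", "b"], [["a", "b"], ["a", "b"]])
```

Raising `min_gain` above zero stops expansion earlier, which is one simple guard against overfitting.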
4 ways to validate clusters
Cluster interpretability
Cluster stability (clusters do not change much when rows are altered)
Cluster separation (within-cluster and between-cluster distances are well separated)
Number of clusters (a useful number)
Advantages of hierarchical clustering
Does not require specifying the number of clusters
Purely data driven
Dendrograms easy to understand and interpret
Limitations of hierarchical clustering
Requires the computation and storage of an n x n distance matrix
Makes only one pass through the data
Tends to have low stability
Results depend on the chosen distance metric
Hierarchical clustering sensitive to outliers
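The n x n distance matrix limitation is easy to see in code. This is a minimal sketch assuming Euclidean distance and in-memory Python lists; the function name `distance_matrix` and the sample points are invented for illustration.

```python
import math

def distance_matrix(points):
    """Build the full symmetric n x n matrix of pairwise Euclidean distances."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])  # available in Python 3.8+
            matrix[i][j] = matrix[j][i] = d      # symmetric: fill both halves
    return matrix

# Toy points, invented for illustration
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
matrix = distance_matrix(points)

# Storage grows with the square of the number of records
entries_small = len(matrix) * len(matrix[0])   # 3 records -> 9 entries
entries_large = 100_000 * 100_000              # 100,000 records -> 10 billion entries
```

Doubling the number of records quadruples the number of stored distances, so the matrix quickly becomes impractical for large datasets.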
Dimension hierarchy characteristics
Time related
Unbalanced
Multiple branches of different types
Conforming dimensions