Regression and Clustering Flashcards
1
Q
Ridge (L2)
A
Shrinks coefficients towards 0, but never to exactly 0.
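A minimal sketch of the shrinkage, assuming the closed-form ridge solution $w = (X^\top X + \lambda I)^{-1} X^\top y$ with no intercept. The function name `ridge_fit`, the synthetic data, and the values of `lam` are illustrative assumptions, not part of the card.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge weights (no intercept): solve (X^T X + lam*I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Synthetic data (assumed for illustration): 3 features, known true weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

# A larger penalty gives smaller weights, but none of them becomes exactly 0.
for lam in (0.0, 10.0, 1000.0):
    w = ridge_fit(X, y, lam)
    print(lam, w, np.linalg.norm(w))
```

The norm of the weights should fall as `lam` grows, while every individual weight stays non-zero.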
1
Q
Lasso (L1)
A
Shrinks coefficients, and some become exactly 0.
- Acts like embedded feature selection.
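A small sketch of why some coefficients hit exactly 0, under a simplifying assumption: the feature columns are orthonormal ($X^\top X = I$) and the objective is $\tfrac{1}{2}\lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1$. In that special case the lasso solution is the soft-thresholded least-squares solution. The name `soft_threshold` and the example numbers are hypothetical.

```python
import numpy as np

def soft_threshold(w, lam):
    """Move each weight towards 0 by lam; weights with |w| <= lam become 0."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

# Hypothetical least-squares weights for an orthonormal design.
w_ols = np.array([3.0, -0.5, 0.2, -2.0])
lam = 1.0

w_lasso = soft_threshold(w_ols, lam)   # small weights are zeroed out
w_ridge = w_ols / (1.0 + lam)          # ridge (penalty lam/2 * ||w||^2) only rescales

print("lasso:", w_lasso)
print("ridge:", w_ridge)
```

The two weights with magnitude below `lam` are removed by the lasso step, which is the feature-selection effect. The ridge weights are all scaled down but stay non-zero.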
2
Q
Regularisation
A
If the weights $w$ are not regularised, they can grow very large ("explode"), which leads to overfitting. Regularisation adds a penalty on the size of $w$ to keep it under control.
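As a sketch, the penalty can be written as an extra term in a least-squares objective. The symbols $X$ (feature matrix), $y$ (targets), and $\lambda \ge 0$ (penalty strength) are assumptions introduced here, and other scalings of the loss are also common.

$$
\text{Ridge (L2):}\quad \min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2
$$

$$
\text{Lasso (L1):}\quad \min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1
$$

With $\lambda = 0$ both reduce to ordinary least squares. A larger $\lambda$ penalises large $w$ more heavily.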
3
Q
K-Means steps
A
- Start with k random cluster centres/centroids
- Assign each object to the nearest centroid
- Compute the new centroid for each cluster as the mean of the objects assigned to the cluster
- Repeat steps 2 and 3 until the centroids no longer change
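The steps above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation. It assumes Euclidean distance, initial centroids drawn from the data points, and that an empty cluster keeps its old centroid. The function name `kmeans` and the parameters `max_iter` and `seed` are hypothetical.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the starting centroids (assumption).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each object to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: new centroid = mean of the objects assigned to the cluster.
        # An empty cluster keeps its previous centroid (assumption).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Usage on two well-separated synthetic blobs (illustrative data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               rng.normal(5.0, 0.1, size=(20, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids)
```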
4
Q
K-Means pros
A
Advantages:
- Simple
- Flexible
- Scales well to large datasets (many features and samples)
5
Q
K-Means cons
A
Disadvantages:
- Have to specify K
- Does not handle categorical data directly (a mean is not defined for categories)
- Must be re-run to obtain a clustering for each different K
- Stochastic (non-deterministic). Having different starting centroids will produce different results.
- Usually converges to a local optimum, not necessarily the global one
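One common way to soften the last two points is to run K-Means from several random starts and keep the run with the lowest within-cluster sum of squares. The sketch below assumes scikit-learn is available and uses its `KMeans` class with `init="random"` and `n_init=1` so that each run is a single random start. The synthetic data, the choice of 3 clusters, and the 5 seeds are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (assumed for illustration): three blobs in 2D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2))
               for c in ((0, 0), (4, 0), (2, 3))])

# One K-Means run per seed; different starting centroids may end in different optima.
inertias = []
for seed in range(5):
    km = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(X)
    inertias.append(km.inertia_)  # within-cluster sum of squared distances

best_seed = int(np.argmin(inertias))
print(inertias, "best seed:", best_seed)
```

Setting `n_init` to a value above 1 asks scikit-learn to do these restarts internally and keep the best run.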