Week 11 Flashcards
What was the goal of supervised learning? What are the pros of it?
The goal was to build a model that approximates the conditional mean of Y well out-of-sample, while avoiding overfitting.
Pro: we can relatively easily tell whether the model is doing a good or bad job – just look at the out-of-sample forecast risk estimates or some other performance measures.
What does the data in unsupervised learning look like? What is the goal of unsupervised learning?
In the unsupervised learning scenario, we only have data on a P-dimensional set of features X.
The goal is then to extract interesting and hopefully low-dimensional information from these data.
What is the main challenge of unsupervised learning?
It is very subjective and hard to assess in many situations.
No clear goal of analysis – it is not a prediction problem.
What are some methods for unsupervised learning?
- Density estimation
- Principal component analysis
- Clustering
- Topic models
- Graphical models
What is clustering? What are the clustering algorithms?
A collection of methods for finding subgroups, or clusters, in the data at hand.
Trying to break density functions into more bite-sized bits.
Algorithms: K-means, hierarchical.
How do we define similarity in clustering?
Proximity/closeness.
What does K-means clustering do? What kind of data is it designed for?
It seeks to partition a data set into K non-overlapping clusters, using Euclidean distance as the measure of closeness.
Continuous data (considerations similar to kNN).
What is the K-means clustering algorithm process?
1) Randomly assign a number, from 1 to K, to each of the observations. These serve as initial cluster assignments for the observations.
2) Iterate until the cluster assignments stop changing:
- For each of the K clusters, compute the cluster centroid. The k-th cluster centroid is the vector of the p feature means for the observations in the k-th cluster.
- Assign each observation to the cluster whose centroid is closest (in Euclidean distance).
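The two-step loop above can be sketched in Python. This is a minimal illustration, not a library implementation; the function names `kmeans` and `kmeans_best` are invented for this sketch:

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm: random initial assignments, then
    alternate centroid updates and reassignments until stable."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, K, size=len(X))          # step 1: random assignments
    for _ in range(n_iter):
        # step 2a: k-th centroid = vector of feature means in cluster k
        centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                              else X[rng.integers(len(X))]  # reseed empty clusters
                              for k in range(K)])
        # step 2b: reassign each observation to the closest centroid (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                     # assignments stopped changing
        labels = new_labels
    sse = ((X - centroids[labels]) ** 2).sum()        # objective: within-cluster SSE
    return labels, centroids, sse

def kmeans_best(X, K, restarts=10):
    """Rerun with different random initializations and keep the lowest-SSE
    fit, since any single run only finds a local minimum."""
    return min((kmeans(X, K, seed=s) for s in range(restarts)),
               key=lambda r: r[2])
```

The restart wrapper addresses the dependence on the initial random assignment: each run converges (the objective weakly decreases at every step), but possibly to a different local minimum.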
The algorithm is guaranteed to _____________________________at each step, since the ______ minimize the _____________, and reallocating observations closer can only improve the objective.
decrease the value of the objective function
means
sum of squared errors
What type of minimum (local/global) does the algorithm find, and why? How do we find a better one?
A local minimum, not the global one.
Results depend on the initial assignment.
rerun the algorithm multiple times with different initial randomizations and select the solution with the smallest objective.
Will we always benefit from standardising data for kmeans?
It may or may not help –
we need to try both and see which makes more sense.
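A tiny numeric illustration of why standardization can matter (the numbers are invented for demonstration): when one feature has a much larger scale, it dominates the Euclidean distance, and z-scoring changes which observations look close.

```python
import numpy as np

# Two features on very different scales: income (dollars) and age (years).
X = np.array([[30_000.0, 25.0],
              [31_000.0, 60.0],
              [90_000.0, 26.0]])

# Unscaled: Euclidean distance is dominated by the income column,
# so rows 0 and 1 look "close" despite very different ages.
d01 = np.linalg.norm(X[0] - X[1])
d02 = np.linalg.norm(X[0] - X[2])

# Z-score each column; now the two features get roughly equal weight,
# and row 0 ends up closer to row 2 (similar age) than to row 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
z01 = np.linalg.norm(Z[0] - Z[1])
z02 = np.linalg.norm(Z[0] - Z[2])
```

Neither answer is "right" in general – which scaling makes more sense depends on the problem, hence the trial-and-error advice.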
How is hierarchical clustering better than K-means?
No need to commit to a specific choice of K; it results in a nice tree-based representation.
What is the nice-looking tree-based representation called?
Dendrogram.
What is the general process behind hierarchical clustering?
build a series of clusters in which each higher level cluster is formed by merging lower level clusters.
Lowest level – single observation; highest level – whole sample.
What are the key approaches to hierarchical clustering? How are clusters formed?
- Agglomerative (bottom-up, fuses observations; MOST COMMON)
- Divisive (top-down, splits existing clusters)
Clusters are formed based on a dissimilarity measure, with the method being quite sensitive to the choice of this measure.
What is the agglomerative clustering algorithm process?
- Start with N observations and a dissimilarity measure for all (N choose 2) = N(N-1)/2 pairwise dissimilarities (i.e., compute the Euclidean distance between all possible pairs of points).
- For i = N, N-1, …, 2:
- Identify the pair of clusters that are least dissimilar and fuse them to form a new cluster. The dissimilarity between these clusters indicates the height in the dendrogram at which the fusion occurs.
- Compute the new pairwise inter-cluster dissimilarities among the i-1 remaining clusters.
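The fusion loop can be sketched directly. This brute-force version assumes complete linkage; real implementations (e.g. `scipy.cluster.hierarchy.linkage`) are far more efficient:

```python
import numpy as np

def agglomerative(X, linkage="complete"):
    """Bottom-up clustering: start with N singleton clusters, repeatedly
    fuse the least-dissimilar pair, and record the fusion heights
    (the heights a dendrogram would display)."""
    clusters = [[i] for i in range(len(X))]
    # All N(N-1)/2 pairwise Euclidean distances, computed once up front.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pair_d = D[np.ix_(clusters[a], clusters[b])]
                # Complete linkage: inter-cluster dissimilarity is the
                # LARGEST pairwise distance between the two clusters
                # (single linkage would take the smallest instead).
                d = pair_d.max() if linkage == "complete" else pair_d.min()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] \
                   + [clusters[a] + clusters[b]]
    return merges
```

Each entry of `merges` records which two clusters were fused and at what height, which is exactly the information a dendrogram draws.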
What is linkage?
A way to measure dissimilarity between groups of observations (clusters).
___________ linkages are typically preferred by statisticians.
____ are commonly used in genetics
Complete, single and average
centroids
What kind of clusters does the different linkage produce?
Complete and average linkage tend to produce balanced clusters with some meaningful structure.
Single linkage tends to produce single-leaf clusters – erratic.
Centroid linkage is common in genomics applications.
Clustering methods are somewhat similar to PCA in trying to _______________, but focus on ___________ rather than ________________.
extract signal from a large dataset
commonality
variance decomposition.
What can hierachical clustering be used as?
starting point for further study, preferably on a new dataset
Can we use cross-validation for k means clustering?
No –
because there is no Y variable to validate predictions against.
What does the first term represent in information criteria?
Fit / criterion function.
What does the second term represent in information criteria?
The penalty, related to the number of clusters used and the number of variables.
What does df represent?
Degrees of freedom;
df = K × dim(x)
What do BIC, AIC, and AICc stand for?
BIC: Bayesian information criterion
AIC: Akaike information criterion
AICc: Akaike information criterion corrected for small-sample bias
What is an information criterion? Which is the ideal IC?
A measure of sample fit.
The lowest IC wins (if you plot the IC against the number of clusters, it should be the minimum point).
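One common way to put this together for choosing K is a BIC-style criterion: fit term plus penalty, with df = K × dim(x) as in the notes. The exact functional form below (n·log(SSE/n) for the fit term) is an assumption for illustration; courses and texts vary:

```python
import numpy as np

def kmeans_sse(X, K, seed):
    """Tiny Lloyd's algorithm returning the within-cluster SSE (the 'fit' term)."""
    rng = np.random.default_rng(seed)
    cents = X[rng.choice(len(X), K, replace=False)]   # init from data points
    for _ in range(50):
        labels = np.linalg.norm(X[:, None] - cents[None], axis=2).argmin(axis=1)
        cents = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                          else cents[k] for k in range(K)])
    return ((X - cents[labels]) ** 2).sum()

def bic(X, K, restarts=10):
    """BIC-style criterion: fit term + complexity penalty.
    Assumed form: n * log(SSE/n) + df * log(n), with df = K * dim(x)."""
    n, p = X.shape
    sse = min(kmeans_sse(X, K, s) for s in range(restarts))  # best of restarts
    df = K * p
    return n * np.log(sse / n) + df * np.log(n)
```

On data with two well-separated groups, plotting `bic(X, K)` against K should bottom out at K = 2: the fit term keeps falling as K grows, but the penalty eventually dominates.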
Can information criteria be negative?
Yes.
What is the most common method/approach to hierarchical clustering?
Agglomerative.
What is dissimilarity?
Typically the Euclidean distance.
Why is centroid linkage not ideal for econometric implementation?
Centroid linkage can produce inversions:
after a fusion, the merged cluster is represented by its centroid, which may sit closer to another cluster than any of the original observations did, so a later fusion can occur at a lower dendrogram height than an earlier one – making the dendrogram hard to interpret.
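The inversion can be seen with three points forming a near-equilateral triangle (invented coordinates): the first fusion happens at height 2.0, but the merged pair's centroid then sits only 1.8 from the third point, so the second fusion occurs below the first.

```python
import numpy as np

# Three points: A and B are the closest pair (distance 2.0);
# A-C and B-C are both sqrt(4.24) ~ 2.06.
A, B, C = np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.array([1.0, 1.8])

h1 = np.linalg.norm(A - B)            # first fusion: {A, B} at height 2.0
centroid_AB = (A + B) / 2             # centroid linkage represents {A, B} by (1, 0)
h2 = np.linalg.norm(centroid_AB - C)  # second fusion at 1.8 < 2.0: an inversion
```

With complete, single, or average linkage this cannot happen: fusion heights are monotonically non-decreasing.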
Should you cluster using all variables?
No –
include a few at a time.
Should we standardize or scale the data?
No single answer –
trial and error: try both and see which makes more sense.