Week 11 Flashcards

1
Q

What was the goal of supervised learning? What are its pros?

A

The goal was to build a model that approximates the conditional mean of Y well out-of-sample while avoiding overfitting.

Pro: you can relatively easily tell whether the model is doing a good or bad job – just look at the out-of-sample forecast risk estimates or some other performance measures.

2
Q

What does the data in unsupervised learning look like? What is the goal of unsupervised learning?

A

In the unsupervised learning scenario, we only have data on a P-dimensional set of features X.

The goal is then to extract interesting and hopefully low-dimensional information from these data.

3
Q

What is the main challenge of unsupervised learning?

A

It is very subjective and hard to assess in many situations.

There is no clear goal of analysis – it is not a prediction problem.

4
Q

What are some methods for unsupervised learning?

A
  • Density estimation
  • Principal component analysis
  • Clustering
  • Topic models
  • Graphical models
5
Q

What is clustering? What are the clustering algorithms?

A

A collection of methods for finding subgroups, or clusters, in the data at hand.

They try to break density functions into more bite-sized bits.

Algorithms: K-means, hierarchical clustering.

6
Q

How do we define similarity in clustering?

A

Proximity/closeness.

7
Q

What does K-means clustering do? What kind of data is it designed for?

A

It seeks to partition a data set into K non-overlapping clusters using the Euclidean distance as a measure of closeness.

Continuous data (considerations similar to KNN).

8
Q

What is the K-means clustering algorithm?

A

1) Randomly assign a number, from 1 to K, to each of the observations. These serve as initial cluster assignments for the observations.

2) Iterate until the cluster assignments stop changing:
  • For each of the K clusters, compute the cluster centroid. The k-th cluster centroid is the vector of the p feature means for the observations in the k-th cluster.
  • Assign each observation to the cluster whose centroid is closest (in Euclidean distance).
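A minimal NumPy sketch of this procedure, assuming continuous features in an (n, p) array X (function and variable names are my own; a production version would also handle clusters that become empty):

```python
import numpy as np

def k_means(X, K, seed=0):
    """Naive K-means on an (n, p) array X with K clusters."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, K, size=len(X))      # 1) random initial assignments
    while True:
        # 2a) centroid of each cluster = vector of the p feature means
        centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        # 2b) reassign each observation to the closest centroid (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):    # stop when assignments stop changing
            return labels, centroids
        labels = new_labels
```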
9
Q

The algorithm is guaranteed to _____________________________ at each step, since the ______ minimize the _____________, and reallocating observations to closer centroids can only improve the objective.

A

decrease the value of the objective function
means
sum of squared errors
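For reference, the objective being decreased is the within-cluster sum of squared errors, which can be written as

$$\min_{C_1,\dots,C_K} \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \bar{x}_k \rVert^2, \qquad \bar{x}_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i .$$

The centroid step minimizes this over the means for fixed assignments, and the reassignment step minimizes it over assignments for fixed means, so each iteration weakly decreases the objective.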

10
Q

What type of minimum (local/global) does the algorithm find, and why? How can we find a better minimum?

A

Local, not global.
Results depend on the initial assignment.
Rerun the algorithm multiple times with different initial randomizations and select the solution with the smallest objective.
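For instance, scikit-learn's KMeans does exactly this via its n_init argument (a usage sketch; X is an assumed feature matrix and the number of clusters is illustrative):

```python
from sklearn.cluster import KMeans

# Run K-means with 20 different random initializations; sklearn keeps the
# run with the smallest objective (inertia_ = within-cluster SSE).
km = KMeans(n_clusters=3, n_init=20, random_state=0).fit(X)
print(km.inertia_)  # objective value of the best run
```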

11
Q

Will we always benefit from standardising data for K-means?

A

May or may not.

Need to try both and see which makes more sense.
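A quick way to try both, sketched with scikit-learn (X is again an assumed feature matrix; the choice of 3 clusters is illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

raw_labels = KMeans(n_clusters=3, n_init=20, random_state=0).fit_predict(X)

Xs = StandardScaler().fit_transform(X)  # each feature: mean 0, variance 1
std_labels = KMeans(n_clusters=3, n_init=20, random_state=0).fit_predict(Xs)

# Compare the two partitions substantively; there is no automatic
# criterion for which scaling "wins".
```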

12
Q

How is hierarchical clustering better than K-means?

A

No need to commit to a specific choice of K, and it results in a nice-looking tree representation.

13
Q

What is the nice-looking tree-based representation called?

A

Dendrogram

14
Q

What is the general process behind hierarchical clustering?

A

Build a series of clusters in which each higher-level cluster is formed by merging lower-level clusters.

Lowest level – single observations; highest level – the whole sample.

15
Q

What are the key approaches to hierarchical clustering? How are clusters formed?

A
  1. Agglomerative (bottom-up, fuses observations; MOST COMMON)
  2. Divisive (top-down, splits existing clusters)

Clusters are formed based on a dissimilarity measure, with the method being quite sensitive to the choice of this measure.

16
Q

What is the agglomerative clustering algorithm?

A
  1. Start with N observations and a dissimilarity measure of all C(N, 2) = N(N-1)/2 ("N choose 2") pairwise dissimilarities (compute the Euclidean distance between every possible pair of points).
  2. For i = N, N-1, ..., 2:
    - Identify the pair of clusters that are least dissimilar and fuse them into a new cluster. The dissimilarity between these clusters indicates the height in the dendrogram at which the fusion occurs.
    - Compute the new pairwise inter-cluster dissimilarities among the i-1 remaining clusters.
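A sketch of this recursion using SciPy, which computes the pairwise Euclidean distances and records each fusion and its height (toy data; the choice of complete linkage is an assumption):

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

X = np.random.default_rng(0).normal(size=(30, 2))  # toy (N, p) data

# linkage() starts from the N(N-1)/2 pairwise Euclidean distances and
# repeatedly fuses the least dissimilar pair of clusters, recording the
# dissimilarity (dendrogram height) of each fusion.
Z = linkage(X, method="complete")

dendrogram(Z)  # plot the resulting tree
plt.show()
```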
17
Q

What is linkage?

A

A way to measure dissimilarity between groups of observations (clusters).

18
Q

___________ linkages are typically preferred by statisticians.

____ are commonly used in genetics

A

Complete, single and average
centroids

19
Q

What kind of clusters does the different linkage produce?

A

Complete and average linkage tend to produce balanced clusters with some meaningful structure.

Single linkage tends to produce clusters in which single leaves are fused one at a time – erratic.

Centroid linkage is common in genomics applications.
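With SciPy the linkage is just the method argument, so these behaviours are easy to compare on the same data (a sketch; the toy data and the cut into 3 clusters are arbitrary choices of mine):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.random.default_rng(0).normal(size=(30, 2))      # toy data, as before

for method in ["complete", "average", "single", "centroid"]:
    Z = linkage(X, method=method)                      # full agglomerative tree
    labels = fcluster(Z, t=3, criterion="maxclust")    # cut the tree into 3 clusters
    print(method, np.bincount(labels)[1:])             # cluster sizes (balance check)
```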

20
Q

Clustering methods are somewhat similar in trying to _______________, but focus on ___________ rather than ________________

A

extract signal from a large dataset
commonality
variance decomposition.

21
Q

What can hierarchical clustering be used as?

A

A starting point for further study, preferably on a new dataset.

22
Q

Can we use cross-validation for K-means clustering?

A

No – because there is no Y variable.

23
Q

What does the first term represent in information criteria?

A

Fit / criterion function

24
Q

What does the second term represent in information criteria?

A

Penalty, related to the number of clusters you use and the number of variables.

25
Q

What does df represent?

A

Degrees of freedom
df = K × dim(x)
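A sketch of how this df enters an information criterion for K-means, assuming the common Gaussian-style criterion function n·log(SSE/n) for the fit term (the exact form used in the course notes may differ):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_ic(X, K):
    """AIC and BIC for a K-means fit, with df = K * dim(x)."""
    n, p = X.shape
    km = KMeans(n_clusters=K, n_init=20, random_state=0).fit(X)
    fit = n * np.log(km.inertia_ / n)  # first term: fit / criterion function
    df = K * p                         # degrees of freedom, df = K x dim(x)
    aic = fit + 2 * df                 # second term: complexity penalty
    bic = fit + np.log(n) * df         # BIC penalizes complexity more heavily
    return aic, bic

# "Lowest IC wins": pick the K at the minimum of the IC curve, e.g.
# best_K = min(range(2, 11), key=lambda K: kmeans_ic(X, K)[1])
```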

26
Q

What do BIC, AIC, and AICc stand for?

A

BIC: Bayesian information criterion
AIC: Akaike information criterion

AICc: Akaike information criterion corrected for small-sample bias

27
Q

What are information criteria? Which is the ideal IC?

A

A measure of sample fit.

The lowest IC wins (if you plot the IC against the number of clusters, it should be the minimum point).

28
Q

Can information criteria be negative?

A

Yes

29
Q

What is the most common method/approach to hierarchical clustering?

A

Agglomerative

30
Q

What is dissimilarity?

A

Euclidean distance

31
Q

Why is centroid linkage not ideal for econometric implementation?

A

Centroid linkage is not good because of inversions: a fusion can occur at a height below that of an earlier fusion, so the dendrogram heights are not monotone.

As you add more observations to a cluster, its centroid moves, which can pull in observations neighbouring the newly added ones even though they are not from the cluster.

32
Q

Should you cluster using all variables?

A

No – include a few at a time.

33
Q

Should we standardize data or scale data?

A

No definitive answer.

Trial and error.