Week 11 Flashcards

1
Q

What was the goal of supervised learning? What are its pros?

A

The goal was to build a model that approximates the conditional mean of Y well out-of-sample while avoiding overfitting.

Pro: you can relatively easily tell whether the model is doing a good or bad job – just look at the out-of-sample forecast risk estimates or some other performance measures.

2
Q

What does the data in unsupervised learning look like? What is the goal of unsupervised learning?

A

In the unsupervised learning scenario, we only have data on a P-dimensional set of features X.

The goal is then to extract interesting and hopefully low-dimensional information from these data.

3
Q

What is the main challenge of unsupervised learning?

A

It is very subjective and hard to assess in many situations.

There is no clear goal of analysis – it is not a prediction problem.

4
Q

What are some methods for unsupervised learning?

A
  • Density estimation
  • Principal component analysis
  • Clustering
  • Topic models
  • Graphical models
5
Q

What is clustering? What are the clustering algorithms?

A

A collection of methods for finding subgroups, or clusters, in the data at hand.

They try to break density functions into more bite-sized bits.

Algorithms: K-means, hierarchical clustering.

6
Q

How do we define similarity in clustering?

A

Proximity/closeness.

7
Q

What does K-means clustering do? What kind of data is it designed for?

A

It seeks to partition a data set into K non-overlapping clusters using the Euclidean distance as a measure of closeness.

Continuous data (considerations similar to KNN).

8
Q

What is the K-means clustering algorithm?

A

1) Randomly assign a number, from 1 to K, to each of the observations. These serve as initial cluster assignments for the observations.

2) Iterate until the cluster assignments stop changing:
  • For each of the K clusters, compute the cluster centroid. The k-th cluster centroid is the vector of the p feature means for the observations in the k-th cluster.
  • Assign each observation to the cluster whose centroid is closest (in Euclidean distance).
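A minimal NumPy sketch of this procedure, assuming continuous features in an (n, p) array X (function and variable names are my own; a production version would also handle clusters that become empty):

```python
import numpy as np

def k_means(X, K, seed=0):
    """Naive K-means on an (n, p) array X with K clusters."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, K, size=len(X))      # 1) random initial assignments
    while True:
        # 2a) centroid of each cluster = vector of the p feature means
        centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        # 2b) reassign each observation to the closest centroid (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):    # stop when assignments stop changing
            return labels, centroids
        labels = new_labels
```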
9
Q

The algorithm is guaranteed to _____________________________ at each step, since the ______ minimize the _____________, and reallocating observations to closer centroids can only improve the objective.

A

decrease the value of the objective function
means
sum of squared errors
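For reference, the objective being decreased is the within-cluster sum of squared errors, which can be written as

$$\min_{C_1,\dots,C_K} \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \bar{x}_k \rVert^2, \qquad \bar{x}_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i .$$

The centroid step minimizes this over the means for fixed assignments, and the reassignment step minimizes it over assignments for fixed means, so each iteration weakly decreases the objective.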

10
Q

What type of minimum (local/global) does the algorithm find, and why? How can we find a better minimum?

A

Local, not global.
Results depend on the initial assignment.
Rerun the algorithm multiple times with different initial randomizations and select the solution with the smallest objective.
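For instance, scikit-learn's KMeans does exactly this via its n_init argument (a usage sketch; X is an assumed feature matrix and the number of clusters is illustrative):

```python
from sklearn.cluster import KMeans

# Run K-means with 20 different random initializations; sklearn keeps the
# run with the smallest objective (inertia_ = within-cluster SSE).
km = KMeans(n_clusters=3, n_init=20, random_state=0).fit(X)
print(km.inertia_)  # objective value of the best run
```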

11
Q

Will we always benefit from standardising data for K-means?

A

May or may not.

Need to try both and see which makes more sense.
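A quick way to try both, sketched with scikit-learn (X is again an assumed feature matrix; the choice of 3 clusters is illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

raw_labels = KMeans(n_clusters=3, n_init=20, random_state=0).fit_predict(X)

Xs = StandardScaler().fit_transform(X)  # each feature: mean 0, variance 1
std_labels = KMeans(n_clusters=3, n_init=20, random_state=0).fit_predict(Xs)

# Compare the two partitions substantively; there is no automatic
# criterion for which scaling "wins".
```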

12
Q

How is hierarchical clustering better than K-means?

A

No need to commit to a specific choice of K, and it results in a nice-looking tree representation.

13
Q

What is the nice-looking tree-based representation called?

A

Dendrogram

14
Q

What is the general process behind hierarchical clustering?

A

Build a series of clusters in which each higher-level cluster is formed by merging lower-level clusters.

Lowest level – single observations; highest level – the whole sample.

15
Q

What are the key approaches to hierarchical clustering? How are clusters formed?

A
  1. Agglomerative (bottom-up, fuses observations; MOST COMMON)
  2. Divisive (top-down, splits existing clusters)

Clusters are formed based on a dissimilarity measure, with the method being quite sensitive to the choice of this measure.

16
Q

What is the agglomerative clustering algorithm?

A
  1. Start with N observations and a dissimilarity measure of all C(N, 2) = N(N-1)/2 ("N choose 2") pairwise dissimilarities (compute the Euclidean distance between every possible pair of points).
  2. For i = N, N-1, ..., 2:
    - Identify the pair of clusters that are least dissimilar and fuse them into a new cluster. The dissimilarity between these clusters indicates the height in the dendrogram at which the fusion occurs.
    - Compute the new pairwise inter-cluster dissimilarities among the i-1 remaining clusters.
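A sketch of this recursion using SciPy, which computes the pairwise Euclidean distances and records each fusion and its height (toy data; the choice of complete linkage is an assumption):

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

X = np.random.default_rng(0).normal(size=(30, 2))  # toy (N, p) data

# linkage() starts from the N(N-1)/2 pairwise Euclidean distances and
# repeatedly fuses the least dissimilar pair of clusters, recording the
# dissimilarity (dendrogram height) of each fusion.
Z = linkage(X, method="complete")

dendrogram(Z)  # plot the resulting tree
plt.show()
```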
17
Q

What is linkage?

A

A way to measure dissimilarity between groups of observations (clusters).

18
Q

___________ linkages are typically preferred by statisticians.

____ are commonly used in genetics

A

Complete, single and average
centroids

19
Q

What kind of clusters does the different linkage produce?

A

Complete and average linkage tend to produce balanced clusters with some meaningful structure.

Single linkage tends to produce clusters in which single leaves are fused one at a time – erratic.

Centroid linkage is common in genomics applications.
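With SciPy the linkage is just the method argument, so these behaviours are easy to compare on the same data (a sketch; the toy data and the cut into 3 clusters are arbitrary choices of mine):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.random.default_rng(0).normal(size=(30, 2))      # toy data, as before

for method in ["complete", "average", "single", "centroid"]:
    Z = linkage(X, method=method)                      # full agglomerative tree
    labels = fcluster(Z, t=3, criterion="maxclust")    # cut the tree into 3 clusters
    print(method, np.bincount(labels)[1:])             # cluster sizes (balance check)
```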

20
Q

Clustering methods are somewhat similar in trying to _______________, but focus on ___________ rather than ________________

A

extract signal from a large dataset
commonality
variance decomposition.

21
Q

What can hierarchical clustering be used as?

A

A starting point for further study, preferably on a new dataset.

22
Q

Can we use cross-validation for K-means clustering?

A

No – because there is no Y variable.

23
Q

What does the first term represent in information criteria?

A

Fit / criterion function

24
Q

What does the second term represent in information criteria?

A

Penalty, related to the number of clusters you use and the number of variables.

25
Q

What does df represent?

A

Degrees of freedom
df = K × dim(x)
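A sketch of how this df enters an information criterion for K-means, assuming the common Gaussian-style criterion function n·log(SSE/n) for the fit term (the exact form used in the course notes may differ):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_ic(X, K):
    """AIC and BIC for a K-means fit, with df = K * dim(x)."""
    n, p = X.shape
    km = KMeans(n_clusters=K, n_init=20, random_state=0).fit(X)
    fit = n * np.log(km.inertia_ / n)  # first term: fit / criterion function
    df = K * p                         # degrees of freedom, df = K x dim(x)
    aic = fit + 2 * df                 # second term: complexity penalty
    bic = fit + np.log(n) * df         # BIC penalizes complexity more heavily
    return aic, bic

# "Lowest IC wins": pick the K at the minimum of the IC curve, e.g.
# best_K = min(range(2, 11), key=lambda K: kmeans_ic(X, K)[1])
```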

26
Q

What do BIC, AIC, and AICc stand for?

A

BIC: Bayesian information criterion
AIC: Akaike information criterion

AICc: Akaike information criterion corrected for small-sample bias

27
Q

What are information criteria? Which is the ideal IC?

A

A measure of sample fit.

The lowest IC wins (if you plot the IC against the number of clusters, it should be the minimum point).

28
Q

Can information criteria be negative?

A

Yes

29
Q

What is the most common method/approach to hierarchical clustering?

A

Agglomerative

30
Q

What is dissimilarity?

A

Euclidean distance

31
Q

Why is centroid linkage not ideal for econometric implementation?

A

Centroid linkage is not good because of inversions: a fusion can occur at a height below that of an earlier fusion, so the dendrogram heights are not monotone.

As you add more observations to a cluster, its centroid moves, which can pull in observations neighbouring the newly added ones even though they are not from the cluster.

32
Q

Should you cluster using all variables?

A

No – include a few at a time.

33
Q

Should we standardize data or scale data?

A

No definitive answer.

Trial and error.