Final Prep Flashcards
2nd Half Semester Information
<p>Unsupervised Learning</p>
<p>Make sense of unlabeled data.
Data description: find a more compact way to describe the data.
Contrast with supervised learning, which does function approximation on labeled data.</p>
<p>Basic Clustering Problem</p>
<p>Given:
- a set of objects X
- inter-object distances D(x, y), with D(x, y) = D(y, x)
Output:
- a partition function P_D, where P_D(x) = P_D(y) exactly when x and y belong to the same cluster
- P_D acts as a compact descriptor: all equivalent objects share one cluster label</p>
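A tiny sketch just to make the notation concrete; the points and the hand-built partition below are made-up placeholders:

```python
import numpy as np

# Made-up objects and their symmetric inter-object distances D(x, y) = D(y, x).
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# A hand-built partition function P_D: object index -> cluster label.
# Two objects are "equivalent" exactly when P_D gives them the same label.
P_D = {0: 0, 1: 0, 2: 1, 3: 1}
print(P_D[0] == P_D[1])  # True: same cluster
print(P_D[1] == P_D[2])  # False: different clusters
```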
<p>Single Linkage Clustering (SLC)</p>
<p>Start with n objects and all pairwise distances between them.
- Treat each object as its own cluster
- Repeatedly connect the two closest points that are not already in the same cluster; to be SLC, the inter-cluster distance must be the distance between the closest pair of points, one from each cluster
- Keep merging the closest clusters until only k clusters remain
Distances only matter between items that are not yet linked. A scipy sketch follows below.</p>
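A rough sketch using scipy's single-linkage implementation (the toy points are made up; `t=2` asks for k = 2 clusters):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# scipy's "single" method repeatedly merges the two clusters whose
# closest pair of points is nearest, exactly as described above.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [9.0, 0.0]])

Z = linkage(X, method="single")                   # hierarchical agglomerative merges
labels = fcluster(Z, t=2, criterion="maxclust")   # stop once only k = 2 clusters remain
print(labels)
```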
<p>Hierarchical Agglomerative Cluster Structure</p>
<p>Used to represent the hierarchy of merges produced by SLC: each merge adds a level to the tree.</p>
<p>Running Time of SLC</p>
<p>For n points and k clusters: O(n^3). Each merge scans the O(n^2) pairwise distances to find the closest pair, and roughly n/2 merges are needed. Similar to the minimum spanning tree problem.</p>
<p>k-means clustering algorithm</p>
<p>-pick k "centers" (at random)
- each center claims center
- recompute the centers by averaging clustered
- repeat until converged
- this center could be a point in the collection or a point in the space. This is almost like creating kNN</p>
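A minimal NumPy sketch of the loop above (the function name, defaults, and initialization from the data are my own choices):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain k-means sketch: pick k centers at random, let each center claim
    its nearest points, recompute centers as cluster means, repeat."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # centers drawn from the data
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        assignment = dists.argmin(axis=1)
        # Update step: move each center to the mean of the points it claimed.
        new_centers = np.array([
            X[assignment == j].mean(axis=0) if np.any(assignment == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, assignment
```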
<p>Euclidean space</p>
<p>In geometry, a two- or three-dimensional space in which the axioms and postulates of Euclidean geometry apply; also, a space in any finite number of dimensions, in which points are designated by coordinates (one for each dimension) and the distance between two points is given by a distance formula.</p>
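The distance formula referred to above, for two points p and q in d dimensions:

$$ d(p, q) = \sqrt{\sum_{i=1}^{d} (p_i - q_i)^2} $$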
<p>k-means Proof</p>
<p>P(x): partition/cluster assigned to object x
C_i: set of all points assigned to cluster i
center_i = (1/|C_i|) * Σ_{y ∈ C_i} y
The iteration alternates P(x) → centers → P(x): assignments determine centers, which determine new assignments.
</p>
<p>K-Means as Optimization</p>
<p>Configurations (inputs): the centers and the partition P
Scores: error grows with each object's distance from its center, E(P, center) = Σ_x ||center_P(x) - x||^2
Neighborhood: configurations where either P is fixed and the centers move, or the centers are fixed and P changes
</p>
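A small sketch of the score computation; the function name and array shapes are my own assumptions (X is (n, d), centers is (k, d), assignment holds each point's cluster index):

```python
import numpy as np

def kmeans_error(X, centers, assignment):
    """Score a k-means configuration: sum of squared distances
    from each point to the center of its assigned cluster."""
    return float(np.sum((X - centers[assignment]) ** 2))
```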
<p>Randomized Optimization most like K-Means</p>
<p>Hill Climbing
Each step moves to a configuration at least as good as the one before: we keep moving to neighbors that score slightly better.</p>
<p>Properties of K-means clustering</p>
<p>- Each iteration is polynomial: O(kn)
- Finite (but exponential) number of iterations: O(k^n), the number of ways of assigning points to partitions
  in practice convergence takes far fewer iterations
- Error decreases (if ties are broken consistently)
- Can get stuck in local optima
</p>
<p>Stop K-means stuck mitigation</p>
<p>- Random restarts
- Review the data up front to pick initial centers that are far apart</p>
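A sketch of both mitigations using scikit-learn (the data here is a placeholder): `n_init` runs several random restarts and keeps the lowest-error run, and `init="k-means++"` spreads the initial centers apart.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(100, 2))  # placeholder data

# n_init=10 -> 10 random restarts, keep the run with the lowest error;
# init="k-means++" -> choose initial centers that are far apart.
km = KMeans(n_clusters=3, n_init=10, init="k-means++", random_state=0)
labels = km.fit_predict(X)
```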
<p>Equal Distance Points between Clusters</p>
<p>A point that is equally distant from two clusters may end up assigned to either one; ties should be broken consistently.
</p>
<p>Soft Clustering</p>
<p>Assume the data were generated by:
1) Selecting one of k Gaussians (with known variance)
2) Sampling x_i from that Gaussian
3) Repeating n times
Task: find a hypothesis h = (mu_1, ..., mu_k), the k means, that maximizes the probability of the data (maximum likelihood). A sampling sketch follows below.</p>
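A sketch of that generative process in one dimension (k, n, the variance, and the hidden means here are made-up placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, sigma = 3, 500, 1.0                 # known, equal variances
true_means = np.array([-5.0, 0.0, 5.0])   # hidden means (placeholders)

# The generative story: pick one of k Gaussians, then sample x_i from it, n times.
z = rng.integers(k, size=n)               # which Gaussian produced each point (hidden)
x = rng.normal(loc=true_means[z], scale=sigma)
```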
<p>Maximum Likelihood Gaussian</p>
<p>- The ML mean of the gaussian is the mean of the data
- does not apply for k different means, just for one
- the k different means is based on assigning each point hidden variables to determine what gaussian it is a part of</p>
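For the single-Gaussian case, the familiar result:

$$ \mu_{ML} = \arg\max_{\mu} \prod_{i=1}^{n} P(x_i \mid \mu) = \frac{1}{n} \sum_{i=1}^{n} x_i $$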
<p>Expectation Maximization</p>
<p>Tick-tock between expectation (define z from mu: compute the expected cluster memberships given the current means) and maximization (define mu from z: re-estimate the means from those memberships). A sketch is given below.</p>
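A minimal sketch of EM for a 1-D mixture of k Gaussians, assuming equal mixing weights and a known, shared variance (the function name and defaults are my own):

```python
import numpy as np

def em_gmm_1d(x, k, sigma=1.0, iters=50, seed=0):
    """EM sketch for a 1-D mixture of k Gaussians with known, equal variance.
    E-step: soft memberships z from the current means mu.
    M-step: new means mu from the memberships z."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)       # initial guesses for the means
    for _ in range(iters):
        # E-step: P(point i came from Gaussian j), up to a shared constant,
        # computed in a numerically stable way before normalizing each row.
        logp = -0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2
        resp = np.exp(logp - logp.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: each mean is the responsibility-weighted average of the data.
        mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
    return mu, resp
```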
<p>Properties of EM</p>
<p>- Monotonically non-decreasing likelihood
  it always moves to something at least as good
- Has a chance to not converge (in practice it does)
- Will not diverge
  (in contrast, k-means has only finitely many configurations and so always converges)
- Can get stuck
  it can settle in a local optimum where the assignment looks good but is not the best; fix with random restarts
- Works with any distribution
  domain knowledge determines the E and M steps for the chosen distribution</p>
<p>Clustering Properties</p>
<p>- Richness
  any clustering of the inputs can be produced by some choice of distances
- Scale invariance
  doubling or halving all distances does not change how the points should be clustered
- Consistency
  shrinking point-to-point distances within clusters and expanding cluster-to-cluster distances does not change the clustering. This is an application of domain knowledge.</p>
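Stated a little more formally, with P_D denoting the clustering produced from distance matrix D (this is the standard formulation of these three properties):

$$ \text{Richness: } \forall\, \text{partitions } C\ \exists\, D \text{ such that } P_D = C $$
$$ \text{Scale invariance: } \forall\, D,\ \forall\, \alpha > 0:\ P_{\alpha D} = P_D $$
$$ \text{Consistency: if } D' \text{ shrinks intra-cluster and expands inter-cluster distances of } P_D, \text{ then } P_{D'} = P_D $$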
<p>Impossibility Theorem</p>
<p>Richness, scale invariance, and consistency cannot all hold at the same time for a single clustering algorithm</p>
<p>Feature Selection</p>
<p>Two reasons:
1) Knowledge discovery
   improves interpretability and insight
2) Curse of dimensionality
   the amount of data needed grows exponentially with the number of features</p>
<p>How hard is it to reduce a problem from N features to m features</p>
<p>This is an exponentially hard problem: there are 2^N possible subsets, or (N choose m) subsets of exactly m features. It is NP-hard, and can be related to the 3-SAT NP-hard problem.</p>
<p>Filtering- Features</p>
<p>Run the features through some search/selection criterion, then hand the chosen subset to the learning algorithm. The flow is forward only, with no feedback from the learner. The criterion can look at the labels, so something like information gain can be used; a decision tree could even act as the filter (keep the features it splits on).
Pro
  Speed
Cons
  Looks at features in isolation, even though a feature may only matter in combination with others
  Ignores the learner.</p>
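A rough scikit-learn sketch of this flow (the synthetic dataset is a placeholder, and mutual information stands in for information gain):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=20, n_informative=4, random_state=0)

# Filter: score each feature against the labels, keep the top m,
# then train the learner on the reduced data. No feedback loop.
filt = SelectKBest(mutual_info_classif, k=5)
X_reduced = filt.fit_transform(X, y)
clf = DecisionTreeClassifier(random_state=0).fit(X_reduced, y)
```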
<p>Wrapping - Features</p>
<p>Search over subsets of features, hand each subset to the learning algorithm, and feed the resulting score back into the search.
Pro
  Takes into account the learner and its model bias
Con
  Very slow, since the learner's training time sits inside the search loop.</p>
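A rough sketch of a forward-selection wrapper, assuming scikit-learn >= 0.24 for SequentialFeatureSelector (the dataset and choice of learner are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=20, n_informative=4, random_state=0)

# Wrapper: the search proposes a feature subset, the learner is trained and
# cross-validated on it, and the score feeds back into the search.
learner = KNeighborsClassifier(n_neighbors=3)
sfs = SequentialFeatureSelector(learner, n_features_to_select=5, direction="forward", cv=5)
sfs.fit(X, y)
print(sfs.get_support())  # mask of the selected features
```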