Classification, Clustering, And Feature Learning Flashcards by Andrew Legg

When is classification used?

When relating categorical response variables to predictor variables.

What is a supervised process?

Processes that require a training dataset that shows how particular attributes match up to an outcome of interest.

E.g. a set of data where we know for certain whether each individual has a disease, and various attributes have been measured for each individual.

A model is built from this training data and then tested on separate test data.

What is an unsupervised process?

A process that doesn’t require a training dataset or any predefined outcomes.

What are some methods of classification?

1) Logistic regression
2) Random forest classification
3) Support vector machines

What does the logistic function do?

It transforms a linear function of the predictor into a value between 0 and 1. The result is an S-shaped curve on a y scale that runs from 0 to 1.

This is useful in a binary classification problem, where the outcome you want to predict takes a value of either 0 or 1 (no or yes).

This allows a continuous predictor variable on the X axis and the probability of classification 1 (rather than 0) on the y axis.
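
The Python sketch below makes this card concrete. It assumes a single predictor and hypothetical coefficients `b0` and `b1`, which the card does not give. It shows the logistic function turning the linear predictor `b0 + b1 * x` into a value between 0 and 1.

```python
import math


def logistic(z: float) -> float:
    """Squash any real number z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical intercept and slope, chosen only for illustration.
b0, b1 = -4.0, 1.0

for x in [0, 2, 4, 6, 8]:
    p = logistic(b0 + b1 * x)  # linear function of x mapped to a probability
    print(f"x = {x}: P(y = 1) = {p:.3f}")
```

At x = 4 the linear predictor is 0, so the probability is exactly 0.5. For very large negative z, `math.exp` would overflow, and a production implementation would guard against that.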

What does the X axis of a logistic regression curve show?

The continuous predictor variable.

What does the y axis of a logistic regression curve show?

The probability of achieving classification 1 (the probability of classification 0 is one minus this).

The predictions (on the y axis) for each value of the predictor variable (X axis) give a probability for observing the outcome we have coded as “1” for that value of the predictor variable.
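
The next sketch shows where those y-axis values come from. It fits a one-predictor logistic regression by plain batch gradient descent on the log-loss, which is an assumption made for readability. Statistical software fits the same model by maximum likelihood with a faster algorithm (in R, for example, `glm(y ~ x, family = binomial)`). The data, learning rate and epoch count are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_1d(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = logistic(b0 + b1 * x) by batch gradient descent on the log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = logistic(b0 + b1 * x) - y  # log-loss gradient w.r.t. the linear predictor
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical training data: the outcome coded 1 becomes more common as x grows.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic_1d(xs, ys)
probs = [logistic(b0 + b1 * x) for x in xs]  # y-axis value for each X-axis value
print([round(p, 2) for p in probs])
```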

How do support vector machines achieve classification?

They map each training data point to a point in k-dimensional space, where k is the number of attributes of the data.

Once the data are plotted in k-dimensional space, a hyperplane is placed so that it separates the known categories while achieving the maximum separation (margin) between them.
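
The following is a rough Python sketch of a linear SVM. It assumes training by stochastic subgradient descent on the hinge loss with L2 regularisation. That is only one of several ways to fit an SVM: libraries such as LIBSVM instead solve the dual problem and also support kernels. The function names, the learning rate `lr`, the regularisation strength `lam` and the toy data are all hypothetical, and labels are coded +1/-1.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=500):
    """Minimise lam/2 * ||w||^2 + hinge loss with per-point subgradient steps.

    Each point is a list of k attribute values; labels are +1 or -1.
    """
    k = len(points[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            if y * (dot(w, x) + b) < 1:  # inside the margin or on the wrong side
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:  # safely outside the margin: only shrink w (widens the margin)
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Classify by which side of the hyperplane w.x + b = 0 the point falls on."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical data with k = 2 attributes per point.
points = [[2, 2], [3, 3], [3, 2], [-2, -2], [-3, -1], [-2, -3]]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print(w, b)
```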

What is a hyperplane?

A hyperplane has one dimension fewer than the k-dimensional space it sits in (a line in 2-D space, a plane in 3-D space). This lets the hyperplane slice through the k-dimensional space, separating the categories.
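
In symbols, a hyperplane can be written as below. Here w (a normal vector) and b (an offset) are notation introduced for illustration and do not appear on the original card. A point is classified by which side of H it falls on. When w and b are scaled so that the closest training points satisfy |w·x + b| = 1, the width of the margin is 2/||w||. For example, with k = 2, w = (1, 1) and b = -3, the hyperplane is the line x1 + x2 = 3.

```latex
H = \{\, \mathbf{x} \in \mathbb{R}^k : \mathbf{w}^\top \mathbf{x} + b = 0 \,\}, \qquad \dim H = k - 1

\hat{y}(\mathbf{x}) = \operatorname{sign}\!\left(\mathbf{w}^\top \mathbf{x} + b\right), \qquad \text{margin width} = \frac{2}{\lVert \mathbf{w} \rVert}
```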

What does random forest classification do?

The algorithm takes the training dataset and uses it to build a set of decision trees, each based on a subset of the training data.

To classify a new data point, every decision tree in the forest classifies it.

Each tree then votes for how the data point should be classified.

The proportion of votes for each class gives a measure of the probability that the data point belongs to that class.
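
Below is a simplified Python sketch of the bagging-and-voting idea. For brevity, it assumes each "tree" is a decision stump (a depth-1 tree on one randomly chosen feature) grown on a bootstrap sample of the rows. Real random forests grow much deeper trees and sample features at every split. All names and data are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature):
    """Depth-1 decision tree: split one feature at the threshold with fewest training errors."""
    best = None
    for t in sorted(set(row[feature] for row in X)):
        left = [lab for row, lab in zip(X, y) if row[feature] <= t]
        right = [lab for row, lab in zip(X, y) if row[feature] > t]
        left_lab = majority(left)
        right_lab = majority(right) if right else left_lab
        errors = sum(lab != left_lab for lab in left) + sum(lab != right_lab for lab in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_lab, right_lab)
    _, t, left_lab, right_lab = best
    return lambda row: left_lab if row[feature] <= t else right_lab


def fit_forest(X, y, n_trees=25, seed=0):
    """Each tree sees a bootstrap sample of the rows and one randomly chosen feature."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample rows with replacement
        feature = rng.randrange(len(X[0]))
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return trees


def predict_proba(trees, row):
    """Every tree votes; the share of votes per class estimates its probability."""
    votes = Counter(tree(row) for tree in trees)
    return {label: count / len(trees) for label, count in votes.items()}


# Hypothetical training data: two well-separated classes with two attributes each.
X = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 8], [9, 8], [8, 9], [9, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
trees = fit_forest(X, y)
print(predict_proba(trees, [9, 9]))
```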

What is an advantage of random forest classification?

Random forest algorithms also offer embedded feature selection.

We can therefore look at the measures the algorithm uses to determine the importance of each variable.

What is mean decrease accuracy?

R randomly permutes the values of each variable of interest and measures how the classification accuracy is affected.

The greater the decrease in accuracy when a variable is randomly permuted, the more important that variable is.
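
The Python sketch below shows the general permutation-importance idea on held-out data with one fixed model. This is an assumption made for simplicity: R's randomForest does the permutation on each tree's out-of-bag samples and averages over trees. The classifier and data are hypothetical. The model only looks at feature 0, so permuting feature 1 cannot change its accuracy.

```python
import random


def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each column of X is randomly permuted."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical fitted classifier that only uses feature 0.
model = lambda row: 1 if row[0] > 5 else 0
X = [[1, 7], [2, 3], [3, 9], [4, 1], [6, 8], [7, 2], [8, 6], [9, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(permutation_importance(model, X, y))
```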

What is mean decrease Gini?

Gini impurity measures how homogeneous (pure) or heterogeneous (mixed) the classes in a node are. A node containing only one class has a Gini impurity of 0.

Each split in each decision tree is associated with a decrease in Gini impurity from the parent node to its descendant nodes.

Adding up these decreases every time a node is split on a particular variable, averaged over all the trees in the forest, gives a measure of that variable's importance.
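
Here is a small Python sketch of the quantities involved. It assumes one common definition: the decrease for a split is the parent's impurity minus the size-weighted impurity of its child nodes. The function names and labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, children):
    """Impurity of the parent minus the size-weighted impurity of its child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * gini(child) for child in children)
    return gini(parent) - weighted


parent = ["A", "A", "B", "B"]
print(gini(parent))                                      # 0.5 (maximally mixed for 2 classes)
print(gini_decrease(parent, [["A", "A"], ["B", "B"]]))  # 0.5 (perfect split)
print(gini_decrease(parent, [["A", "B"], ["A", "B"]]))  # 0.0 (useless split)
```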

What is a test data set? And what can it be used for in classification?

A test dataset is data held back from training, for which the true classification of each data point is known. Testing a model against it shows how well the model's predictions match the true classifications.
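
A minimal Python sketch of holding back a test set, assuming a simple random hold-out split. The function `train_test_split` here is a hypothetical stand-in (scikit-learn has its own function of the same name in `sklearn.model_selection`), and the data are made up.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold back a fraction of them as test data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training data, test data)


# Hypothetical (feature, known class) pairs.
data = [(x, int(x > 5)) for x in range(1, 13)]
train, test = train_test_split(data)
print(len(train), len(test))
```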

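Draw a confusion matrix.

Rows are the actual classes and columns are the predicted classes:
Actual positive: true positives (TP) | false negatives (FN)
Actual negative: false positives (FP) | true negatives (TN)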
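
A short Python sketch that fills in the four cells of a binary confusion matrix from actual and predicted classes. It assumes the positive class is coded 1. The function name and data are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FN, FP, TN) for a binary classifier."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1  # actually positive, predicted negative
        elif p == positive:
            fp += 1  # actually negative, predicted positive
        else:
            tn += 1
    return tp, fn, fp, tn


# Hypothetical true classes and model predictions.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fn, fp, tn = confusion_counts(actual, predicted)
print(f"Actual positive: TP={tp} | FN={fn}")
print(f"Actual negative: FP={fp} | TN={tn}")
```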

What is sensitivity?

The proportion of actual positives that the test manages to detect.

True positives/(true positives + false negatives)

What is specificity?

The proportion of actual negatives that the test correctly identifies as negative.

True negatives/(true negatives + false positives)
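
The two formulas on these cards, written as a Python sketch. The counts are hypothetical.

```python
def sensitivity(tp, fn):
    """True positives / (true positives + false negatives)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negatives / (true negatives + false positives)."""
    return tn / (tn + fp)


# Hypothetical counts read off a confusion matrix.
tp, fn, fp, tn = 40, 10, 5, 45
print(sensitivity(tp, fn))  # 0.8
print(specificity(tn, fp))  # 0.9
```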

What are ROC curves used for?

To study the predictive performance of a binary classifier across different values of its discrimination threshold.

What is a discrimination threshold?

The threshold at which the classification algorithm deems a data point to belong to one category or another

What are the axes of a ROC curve?

Sensitivity (y axis) vs 1 - specificity (X axis)
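
The Python sketch below traces the points of a ROC curve by sweeping the discrimination threshold over every distinct score. It assumes a data point is predicted positive when its score is at or above the threshold. The scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per distinct threshold.

    A data point is predicted positive (1) when its score >= the threshold.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


# Hypothetical classifier scores (e.g. predicted probabilities) and true classes.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"1 - specificity = {fpr:.2f}, sensitivity = {tpr:.2f}")
```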

What is the AUC of a perfect classifier?

1

What is the AUC of a useless classifier?

0.5
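
A Python sketch of the AUC. It uses the standard equivalence that the area under the ROC curve equals the probability that a randomly chosen positive is scored higher than a randomly chosen negative, with ties counting one half. The scores below are hypothetical.

```python
def auc(scores, labels):
    """P(random positive outscores random negative); ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0: perfect classifier
print(auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))  # 0.5: no better than chance
```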

Is clustering supervised or unsupervised?

Unsupervised

Is classification supervised or unsupervised?

Supervised

Why would you want to cluster?

When we believe there might be groups within the dataset but we are not sure how many groups or what their characteristics might be.

What are some methods of clustering?

1) K-Means clustering | 2) Hierarchical clustering

What is the goal of k-means clustering?

To assign n data points to k clusters so as to minimise the within-cluster sum of squares (the amount of variation within each cluster).
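
A minimal Python sketch of k-means using Lloyd's algorithm. Each point is assigned to its nearest centroid, each centroid is moved to the mean of its points, and the two steps repeat until nothing changes. It assumes squared Euclidean distance and several random restarts, keeping the run with the lowest within-cluster sum of squares. The names and data are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Return (wcss, labels, centroids) for the best of n_init random starts."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centroids = [list(p) for p in rng.sample(points, k)]  # k distinct starting points
        for _ in range(max_iter):
            # Assignment step: nearest centroid for every point.
            labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
            # Update step: move each centroid to the mean of its members.
            new = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                new.append([sum(c) / len(members) for c in zip(*members)] if members else centroids[j])
            if new == centroids:  # converged
                break
            centroids = new
        wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
        if best is None or wcss < best[0]:
            best = (wcss, labels, centroids)
    return best


# Hypothetical 2-D data with two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
wcss, labels, centroids = kmeans(points, 2)
print(labels, round(wcss, 3))
```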

What is the goal of hierarchical clustering and how is it done?

1) It builds a tree of data items based on a predefined distance metric between data points. Items that are closer to each other appear closer in the tree.
2) Each data point starts off in its own cluster. The algorithm iteratively merges the closest pair of clusters until you arrive at the number of clusters you wanted.
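
The merging procedure in step 2 can be sketched in Python as below. This sketch assumes Euclidean distance and single linkage, where the distance between two clusters is that of their closest pair of members. Complete and average linkage are common alternatives. It is a naive, slow version meant for tiny datasets, and the names and data are hypothetical.

```python
import math


def agglomerative(points, n_clusters):
    """Start with one cluster per point; repeatedly merge the closest pair."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):  # single linkage: distance between the closest members
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return clusters


# Hypothetical points: indices 0-2 and 3-5 form two tight groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(agglomerative(points, 2))
```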

How do you decide the right number of clusters?

1) Elbow method | 2) BIC and AIC (the preferred model, i.e. the one with the most appropriate number of clusters, is the one with the minimum BIC/AIC)
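
For reference, the standard definitions are shown below. Here p is the number of estimated parameters, n is the number of observations and L-hat is the maximised likelihood. Both require a likelihood, so for choosing a number of clusters they are applied to model-based clustering (for example, one Gaussian mixture model fitted per candidate number of clusters). With these sign conventions lower is better. Note that some software, such as the R package mclust, reports BIC with the opposite sign, so larger is better there.

```latex
\mathrm{AIC} = 2p - 2\ln\hat{L}, \qquad \mathrm{BIC} = p\ln n - 2\ln\hat{L}
```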

What does the elbow method plot?

Number of clusters on X axis | Within-cluster sum of squares on y axis

How does the elbow method work?

There comes a point where increasing the number of clusters gives hardly any more explanatory power. This is the "elbow" on the graph, after which there is hardly any further decrease in the within-cluster sum of squares, and that number of clusters is chosen.
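
The Python sketch below computes the numbers behind an elbow plot. To keep it exact and short, it assumes one-dimensional data and a tiny dataset. In 1-D, the best k-means partition splits the sorted values into contiguous groups, so every set of cut points can be tried. That brute-force approach is only viable for a handful of points. The data are hypothetical.

```python
from itertools import combinations


def wcss(groups):
    """Within-cluster sum of squares for a list of groups of numbers."""
    total = 0.0
    for g in groups:
        m = sum(g) / len(g)
        total += sum((x - m) ** 2 for x in g)
    return total


def best_wcss_1d(values, k):
    """Smallest WCSS over all ways to cut the sorted values into k contiguous groups."""
    xs = sorted(values)
    best = float("inf")
    for cuts in combinations(range(1, len(xs)), k - 1):
        bounds = (0,) + cuts + (len(xs),)
        best = min(best, wcss([xs[a:b] for a, b in zip(bounds, bounds[1:])]))
    return best


def elbow_curve(values, max_k):
    return {k: best_wcss_1d(values, k) for k in range(1, max_k + 1)}


data = [1, 2, 3, 10, 11, 12, 20, 21, 22]  # hypothetical: three clear groups
for k, w in elbow_curve(data, 5).items():
    print(k, round(w, 2))  # the drop flattens sharply after k = 3
```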

How can you tell whether a clustering algorithm's output matches known categories?

Use the adjusted Rand index.

What is the adjusted Rand Index?

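The Rand index (RI) compares two partitions, X and Y, of the same items by counting pairs of items:
a = # pairs of items in the same cluster in both partitions
b = # pairs of items in different clusters in both partitions
c = # pairs in the same cluster in partition X but different clusters in partition Y
d = # pairs in different clusters in partition X but the same cluster in partition Y
RI = (a + b) / (a + b + c + d)
The adjusted Rand index corrects the RI for the agreement expected by chance: ARI = (RI - expected RI) / (maximum RI - expected RI).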
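
A Python sketch of both indices. The Rand index is computed by counting pairs exactly as on the card. The adjusted Rand index uses the usual contingency-table form of the chance correction (Hubert and Arabie). The function names and example labelings are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(x, y):
    """(a + b) / (a + b + c + d), counting every pair of items once."""
    agree = total = 0
    for i, j in combinations(range(len(x)), 2):
        agree += (x[i] == x[j]) == (y[i] == y[j])  # pair is of type a or b
        total += 1
    return agree / total


def adjusted_rand_index(x, y):
    """Chance-corrected Rand index from the contingency table of the two labelings.

    Undefined (division by zero) when both labelings are trivial, e.g. one cluster each.
    """
    n = len(x)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(x, y)).values())
    sum_x = sum(comb(c, 2) for c in Counter(x).values())
    sum_y = sum(comb(c, 2) for c in Counter(y).values())
    expected = sum_x * sum_y / comb(n, 2)
    max_index = (sum_x + sum_y) / 2
    return (sum_cells - expected) / (max_index - expected)


# Cluster labels are arbitrary names, so a relabelled copy still scores 1.
print(adjusted_rand_index([0, 0, 1, 1, 2, 2], [5, 5, 3, 3, 4, 4]))
print(rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
print(adjusted_rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
```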

What is the best score for an adjusted Rand Index?

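The higher the better: 1 is the best (perfect agreement). A score near 0 means agreement no better than chance, and the index can be negative.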

What is feature learning?

Feature learning transforms raw data into learned features. Each learned feature draws on several of the measured features, combining elements of them into a single new feature.

How can feature learning be done?

Feature learning can be done through principal component analysis (PCA).

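What is PCA?

The data are projected onto their principal components: orthogonal directions in space that account for the data's variation, ordered from the direction with the most variance to the least. Because each principal component is a combination of the original features, it can incorporate multiple features at the same time.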
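
A minimal Python sketch of finding the first principal component. It centres the data, builds the sample covariance matrix and applies power iteration. This assumes the data are not constant and that the all-ones starting vector is not orthogonal to the first component; libraries usually use an SVD instead. Each point's projection (its "score") on this component is one learned feature that combines the measured features. The names and data are hypothetical.

```python
import math


def first_principal_component(data, iters=200):
    """Return (unit direction of PC1, each row's score along it)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):  # power iteration converges to the top eigenvector
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centred]
    return v, scores


# Hypothetical data with two measured features lying along the line y = 2x.
data = [(1, 2), (2, 4), (3, 6), (4, 8)]
v, scores = first_principal_component(data)
print(v, scores)
```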

(37 cards)