theory lecture 6 Flashcards
Association Rule Mining
looks at which items occur in your data and how often. it focuses on frequency. eg. used to determine which items in a supermarket are often bought together, so they can be placed together in the shop. the occurrence frequency is calculated from the data, both for single items and for combinations of items.
Clustering
looks at the distance (dissimilarity) between data points. points that are close together belong to the same cluster. it finds "natural" groupings of instances given unlabelled data. it is unsupervised.
association rules
show, for example, that customers who buy diapers also tend to buy beer. this association may not go both ways: customers who buy beer do not necessarily buy diapers just as often.
support
the probability that a particular item, or combination of items, occurs in a transaction in your data set, i.e. the fraction of transactions that contain it.
confidence
shows the probability that if you buy an apple, you also buy beer. it is a conditional probability: the probability of beer given apple, calculated as support(apple and beer) / support(apple).
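a minimal sketch of how support and confidence could be computed by counting. the baskets below are made-up toy data and the function names `support` and `confidence` are hypothetical, chosen only for illustration.

```python
# Hypothetical toy transactions; each set is one shopping basket.
transactions = [
    {"diapers", "beer", "apple"},
    {"diapers", "beer"},
    {"beer", "crisps"},
    {"apple", "beer"},
    {"diapers", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# The same pair of items gives a different confidence in each direction.
diapers_to_beer = confidence({"diapers"}, {"beer"}, transactions)
beer_to_diapers = confidence({"beer"}, {"diapers"}, transactions)
```

in this toy data the two directions give different values, which illustrates why an association rule may not go both ways.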
clustering around centroids
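if you know how many clusters to expect (say three), the algorithm can pick that many random points and assume these as the centroids of the clusters. the algorithm then calculates the distance from each data point to each centroid, assigns every point to its nearest centroid, and moves each centroid to the centre of its assigned points. this is repeated until the centroids stop moving.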
centroid
the centre of a cluster: the mean of the points in the cluster, which is the point at which the sum of squared distances to the points of the cluster is minimised.
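k-means clustering algorithm
(1) start with k random points as centroids (2) determine the distance from each data point to each centroid and assign the point to the nearest one (3) move each centroid to the mean of its assigned points and repeat steps 2 and 3 until you end up with distinct clusters that no longer change. note: k nearest neighbour is a different method; it is a supervised classifier, not a clustering algorithm.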
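the steps above can be sketched in plain Python as follows. this is an illustrative version under stated assumptions, not a reference implementation: squared Euclidean distance is assumed, the initial centroids are sampled from the data points, and the names `sq_dist`, `kmeans`, `max_iter` and `seed` are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # (1) start with k random data points as the initial centroids
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # (2) assign every point to its nearest centroid
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # clusters no longer change
        assignment = new_assignment
        # (3) move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, assignment

# Made-up toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = kmeans(points, k=2)
```

the result depends on the random starting points; on data that is less clearly separated, different seeds can give different clusters.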
neural nets
a method for classification that can learn more complex decision regions and can therefore be more accurate. it can also overfit the data and find patterns in random noise. its decision boundaries can be curved shapes instead of straight lines.
neural networks
loosely based on the workings of the brain, which they imitate. the weights on the connections show the importance of each input.
Processing Element
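part of the neural network (a neuron) that connects inputs to an output. it computes a weighted sum of its inputs and passes it through a nonlinear function. by combining enough processing elements, a neural net can approximate any continuous function to arbitrary accuracy, so in principle it can represent the same decision functions as many other learning methods.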
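a minimal sketch of a single processing element. the notes only say that a nonlinear function is used; the sigmoid here is an assumed choice, and the names `sigmoid`, `processing_element` and `bias` are hypothetical.

```python
import math

def sigmoid(z):
    """Assumed nonlinear function; squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def processing_element(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the nonlinearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical weights: the first input pushes the output up,
# the second input pushes it down.
weights = [2.0, -1.0]
out_none = processing_element([0.0, 0.0], weights, bias=0.0)
out_first = processing_element([1.0, 0.0], weights, bias=0.0)
out_second = processing_element([0.0, 1.0], weights, bias=0.0)
```

the sign and size of each weight determine how strongly, and in which direction, that input moves the output.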