Machine Learning Concepts Flashcards
Active Learning
Active learning is a special case of semi-supervised machine learning in which a learning algorithm can interactively query the user (or some other information source) to obtain the desired outputs at new data points. In the statistics literature it is sometimes called optimal experimental design. There are situations in which unlabeled data is abundant but manual labeling is expensive; in such a scenario, learning algorithms can actively query the user or teacher for labels. This type of iterative supervised learning is called active learning. Since the learner chooses the examples, the number of examples required to learn a concept can often be much lower than in ordinary supervised learning.
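A minimal sketch of pool-based uncertainty sampling, assuming scikit-learn is available; the dataset, seed-set size, and query budget are illustrative choices, not a prescribed setup:

```python
# Pool-based active learning via uncertainty sampling (illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
labeled = list(range(10))                       # small initial labeled set
pool = [i for i in range(500) if i not in labeled]

model = LogisticRegression()
for _ in range(20):                             # hypothetical query budget
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    # Query the pool point the model is least certain about
    # (predicted probability closest to 0.5).
    query = pool[int(np.argmin(np.abs(proba[:, 1] - 0.5)))]
    labeled.append(query)                       # oracle reveals y[query]
    pool.remove(query)
```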
Association Rules
Detect relationships or associations between specific values of categorical variables in large data sets. The classic application is market basket analysis: uncovering hidden patterns such as “customers who order product A often also order product B or C” or “employees who said positive things about initiative X also frequently complain about issue Y but are happy with issue Z.” Rules are typically scored by support (how often the items co-occur) and confidence (how often the consequent holds when the antecedent does).
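A toy sketch of scoring a candidate rule by support and confidence; the baskets and the rule itself are made up for illustration:

```python
# Score a candidate association rule A -> B on hypothetical baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]

def support(itemset):
    # Fraction of baskets containing every item in the itemset.
    return sum(itemset <= b for b in baskets) / len(baskets)

antecedent, consequent = {"bread"}, {"milk"}
conf = support(antecedent | consequent) / support(antecedent)
print(support(antecedent | consequent), conf)   # 0.5, 0.666...
```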
Bayes Theorem
P(A | B) = P(B | A) * P(A) / P(B). Here P(A) is the prior probability of A, estimated as the number of instances with the given value divided by the total number of instances. P(B) is often ignored because the equation is typically used in a probability ratio comparing two different values of A, for which P(B) is the same and cancels out.
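A small sketch of the ratio usage described above, with hypothetical spam-filter numbers; P(B) never needs to be computed because it cancels:

```python
# Compare two candidate values A1, A2 given the same evidence B.
def posterior_ratio(p_b_given_a1, p_a1, p_b_given_a2, p_a2):
    # P(A1|B) / P(A2|B) = [P(B|A1) P(A1)] / [P(B|A2) P(A2)]
    # (the shared P(B) denominator cancels)
    return (p_b_given_a1 * p_a1) / (p_b_given_a2 * p_a2)

# e.g. spam vs. ham given that the word "free" appears (made-up numbers):
print(posterior_ratio(0.30, 0.2, 0.05, 0.8))    # 1.5 -> spam more likely
```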
Bayesian Networks
Bayesian networks are a graphical formalism for representing the structure of a probabilistic model, i.e. the ways in which the random variables may depend on each other. Intuitively, they are good at representing domains with a causal structure, and the edges in the graph determine which variables directly influence which other variables. They can be equivalently viewed as representing a factorization structure of the joint probability distribution, or as encoding a set of conditional independence assumptions about the distribution.
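A hand-coded sketch of the factorization view, using the classic rain/sprinkler/wet-grass example; all probability values are made up:

```python
# Tiny Bayesian network: Rain -> Sprinkler, and both -> WetGrass.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},     # P(S | R)
               False: {True: 0.4, False: 0.6}}
P_wet = {(True, True): 0.99, (True, False): 0.9,    # P(W=True | S, R)
         (False, True): 0.8, (False, False): 0.0}

def joint(r, s, w):
    # Factorization encoded by the graph:
    # P(R, S, W) = P(R) * P(S | R) * P(W | S, R)
    pw = P_wet[(s, r)] if w else 1 - P_wet[(s, r)]
    return P_rain[r] * P_sprinkler[r][s] * pw

# Sanity check: the eight joint entries sum to 1.
print(sum(joint(r, s, w) for r in (True, False)
          for s in (True, False) for w in (True, False)))
```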
Deeply understand the top 20 concepts
…
Discriminative vs. Generative
Discriminative models learn to discriminate between different inputs. For example, classifying images as containing a dog or not containing a dog is a discriminative task; an example of a discriminative model is a support-vector machine. Generative models usually involve probabilities, and their distinguishing feature is that you can generate new data from them. For example, if you estimated one probability distribution over images containing dogs and a different distribution over images not containing dogs, you would have modeled the situation generatively, and you could sample from these distributions to produce new images of dogs (or new images not containing dogs). To use this generative model for a discriminative task: given an image, check which of the two distributions assigns it higher probability, and output that as your result. Thus there is a distinction between a discriminative model and a discriminative task: it may be possible to use generative models for discriminative tasks.
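A sketch of the distinction, assuming scikit-learn: Gaussian naive Bayes fits class-conditional densities you could in principle sample from (generative), while logistic regression models the class posterior directly (discriminative), yet both solve the same discriminative task:

```python
# Generative vs. discriminative model on the same toy classification task.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, random_state=0)
generative = GaussianNB().fit(X, y)              # models P(x | class) P(class)
discriminative = LogisticRegression().fit(X, y)  # models P(class | x) directly

# Both can be used for the discriminative task of classification:
print(generative.predict(X[:5]), discriminative.predict(X[:5]))
```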
Elastic Nets
…
Ensemble Learning
Machine learning approach that combines the predictions from many different algorithms; the combined vote from the ensemble provides a more robust and accurate predictive output than any single algorithm can muster.
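A sketch of a voting ensemble, assuming scikit-learn; the three base models are arbitrary illustrative choices:

```python
# Majority-vote ensemble over three different model families.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, random_state=0)
ensemble = VotingClassifier([
    ("lr", LogisticRegression()),
    ("rf", RandomForestClassifier(random_state=0)),
    ("nb", GaussianNB()),
], voting="hard")                 # hard voting = majority vote
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```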
Factor Analysis
Used as a variable-reduction technique to identify groups of clustered variables: observed variables are modeled as linear combinations of a smaller number of latent factors plus noise. (submitted by Vincent Granville)
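A sketch assuming scikit-learn's FactorAnalysis; the synthetic data (two hidden factors driving six observed variables) is fabricated so the recovered loadings have a known structure:

```python
# Recover latent factors behind a set of correlated observed variables.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))                       # 2 hidden factors
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.1 * rng.normal(size=(100, 6))  # 6 observed vars

fa = FactorAnalysis(n_components=2).fit(X)
# Rows of components_ show which observed variables cluster on each factor.
print(fa.components_.round(2))
```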
Feature Scaling
Different features have different effects on the accuracy of classifications. A scaling factor can be applied to each feature to reduce its influence on the classification, possibly to zero. These scaling factors can be optimized not only to improve classifications but also to reveal the relative importance of the features.
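A sketch of the per-feature scaling idea above; the weight vector is hypothetical, and in practice it could be tuned (e.g., by cross-validation) rather than fixed by hand:

```python
# Apply per-feature scaling weights before a distance-based classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
weights = np.array([1.0, 0.5, 0.0, 1.0, 0.25])  # hypothetical importances;
                                                # 0.0 removes a feature entirely

score = cross_val_score(KNeighborsClassifier(), X * weights, y).mean()
print(score)
```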
Feature Selection
…
Frequentist vs. Bayesian
The Bayesian view is essentially that everything should be done with Bayes’ rule: computing posterior probabilities by multiplying priors with likelihoods. In a Bayesian approach, you usually have a posterior distribution over models; if you then want to use the model for something, like making a prediction, you integrate over your posterior distribution of models to get a sort of “expected value” of the quantity you are trying to predict. “Frequentist” is often used to mean simply “not Bayesian.” In a frequentist approach, you typically find a single “best” model for the problem you are trying to solve and use that best model to make the prediction. I believe there is a relationship between the frequentist approach and discriminative models, and likewise between the Bayesian approach and generative models.
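A coin-flip sketch of the contrast, using a Beta-Bernoulli model where the integration over models can be done in closed form; the counts and the uniform Beta(1, 1) prior are illustrative:

```python
# Frequentist point estimate vs. Bayesian posterior-predictive average.
heads, flips = 3, 4

# Frequentist: pick the single best model (the MLE) and predict with it.
mle = heads / flips                       # P(next = heads) = 0.75

# Bayesian: Beta(1, 1) prior -> Beta(1 + heads, 1 + tails) posterior.
# The posterior-predictive P(next = heads) is the posterior mean,
# i.e. an average over all models weighted by their posterior probability.
a, b = 1 + heads, 1 + (flips - heads)
posterior_predictive = a / (a + b)        # (1+3)/(2+4) = 0.666...

print(mle, posterior_predictive)
```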
Generative Approach
Models the measurements in each class. It is more work, but it can exploit more prior knowledge, needs less data, is more modular, and can handle missing or corrupted data. Methods include mixture models and Hidden Markov Models.
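A sketch of one listed method, assuming scikit-learn: fit a mixture model to each class's measurements, then classify by comparing class-conditional likelihoods (equal class priors assumed for simplicity; all settings are illustrative):

```python
# Generative classification with one Gaussian mixture model per class.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.mixture import GaussianMixture

X, y = make_classification(n_samples=300, random_state=0)
models = {c: GaussianMixture(n_components=2, random_state=0).fit(X[y == c])
          for c in (0, 1)}

# score_samples returns log P(x | class); pick the class that models x best.
log_lik = np.stack([models[c].score_samples(X) for c in (0, 1)], axis=1)
pred = log_lik.argmax(axis=1)
print((pred == y).mean())
```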
Graph Databases
Graph databases use graph structures, with nodes, edges, and properties, for data storage (a graph being a finite set of nodes together with ordered pairs of them, the edges). They provide index-free adjacency, meaning that every element holds a direct link to its neighboring elements, so traversals do not require index lookups.
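A toy Python illustration of index-free adjacency (not a real graph database): each node stores direct references to its neighbors, so traversal follows links rather than consulting a global index; the tiny social graph is made up:

```python
# Each node carries direct references to its neighbors.
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []        # direct links; no index lookup needed

    def link(self, other):
        self.neighbors.append(other)

alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
alice.link(bob)
bob.link(carol)

# Two-hop traversal by chasing references:
print([n.name for friend in alice.neighbors for n in friend.neighbors])
# ['carol']
```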
HDPs or other Bayesian non-parametric model
…