Section 1 Introduction Flashcards
Define statistical machine learning
Statistical machine learning encompasses automatic learning and data analysis procedures. A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Name two ensemble methods
Bagging (random forests) and stacking
Name two supervised learning methods
Classification and regression
Name three unsupervised learning methods
Pattern search, clustering, and dimension reduction (or generalization)
What are the two general purposes of the tasks performed by models?
Unsupervised: extract patterns of interest from a collection of input variables X.
Supervised: describe the generating process of a target variable Y as accurately as possible.
Define unsupervised learning
Learn interesting characteristics of the data without any particular guide. Learning is from the input variables X alone, without a target. The aim is to find interpretable patterns, define low-dimensional representations, and/or detect associations which characterise the data under investigation.
Define supervised learning
A target is available, which guides the learning/predictive process and the extraction of useful structure. Learning is from target and input, D = {y, X}.
Describe what makes a function relating the target variable to the input variables “good” in supervised learning
A good function minimises the discrepancy between the observed values of the target and the values output by the model. The discrepancy is measured by a loss function.
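As a minimal sketch of how a loss function measures discrepancy, the example below uses mean squared error, which is only one possible choice of loss. The function name squared_error_loss and the numbers are illustrative assumptions, not part of the course material.

```python
import numpy as np


def squared_error_loss(y_observed, y_predicted):
    """Mean squared discrepancy between observed and predicted values."""
    y_observed = np.asarray(y_observed, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    return float(np.mean((y_observed - y_predicted) ** 2))


# Illustrative values (assumed): one target vector, two candidate model outputs
y = [1.0, 2.0, 3.0]
good_fit = [1.1, 1.9, 3.2]   # close to the observed values
poor_fit = [3.0, 0.0, 5.0]   # far from the observed values

loss_good = squared_error_loss(y, good_fit)
loss_poor = squared_error_loss(y, poor_fit)

# The "better" function is the one with the smaller loss
print(loss_good < loss_poor)
```

Under this loss, the candidate whose outputs lie closer to the observed values receives the smaller score, which is what "good" means in the answer above.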
Explain what training and test data are
The parameters of the function are learned (estimated) by minimising the loss on the available data, “training” data.
Performance is evaluated using the test data, which are data points not seen by the machine learning algorithm during training.
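The split between training and test data can be sketched as follows. This is a hedged illustration on synthetic data: the linear generating process, the 80/20 split, and the helper names design and mse are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y = 2x + 1 plus Gaussian noise
n = 100
X = rng.uniform(-1.0, 1.0, size=(n, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=n)

# Hold out 20% of the points as test data; the rest is training data
idx = rng.permutation(n)
test_idx, train_idx = idx[:20], idx[20:]


def design(X):
    """Add an intercept column to the inputs."""
    return np.column_stack([np.ones(len(X)), X])


# Learn the parameters by minimising squared-error loss on the training data only
beta, *_ = np.linalg.lstsq(design(X[train_idx]), y[train_idx], rcond=None)


def mse(X, y, beta):
    """Mean squared error of the fitted linear function."""
    return float(np.mean((y - design(X) @ beta) ** 2))


train_mse = mse(X[train_idx], y[train_idx], beta)
test_mse = mse(X[test_idx], y[test_idx], beta)   # points never used in fitting
print(train_mse, test_mse)
```

The test error is the more honest estimate of performance, because the fitted parameters were never adjusted to those points.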
Explain three kinds of interpretable and/or usable summaries of the data that unsupervised learning can produce
Clustering: detecting groups of similar units.
Association learning: inferring meaningful associations, e.g. association rule mining or graphical models.
Embedding learning: reducing the dimension of the data, e.g. PCA.
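Two of these summaries, dimension reduction and clustering, can be sketched with NumPy alone. The data are synthetic, and the choices below (PCA computed through an SVD, a plain k-means loop with k = 2, a deterministic initialisation) are assumptions made for illustration rather than the methods prescribed by the course.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (assumed): two well-separated groups of 50 units in 5 dimensions
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 5))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 5))
X = np.vstack([group_a, group_b])

# Dimension reduction: PCA via the SVD of the centred data
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                  # 2-dimensional representation of each unit
explained = s**2 / np.sum(s**2)         # share of variance per component

# Clustering: k-means with k = 2, started from the first unit
# and the unit farthest from it (a simple deterministic initialisation)
far = np.argmax(np.linalg.norm(X - X[0], axis=1))
centres = np.array([X[0], X[far]])
for _ in range(10):
    # Assign each unit to its nearest centre, then recompute the centres
    dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)
    centres = np.array([X[labels == k].mean(axis=0) for k in range(2)])

print(scores.shape, explained[0], np.bincount(labels))
```

No target variable is used anywhere: both the low-dimensional scores and the group labels are inferred from X alone.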