Section 1 Introduction Flashcards

1
Q

Define statistical machine learning

A

Statistical machine learning encompasses automatic learning and data analysis procedures. A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

2
Q

Name two ensemble methods

A

Bagging (e.g. random forests) and stacking

3
Q

Name two supervised learning methods

A

Classification, Regression

4
Q

Name three unsupervised learning methods

A

Pattern search, clustering, dimension reduction or generalization

5
Q

What are the two general purposes of the tasks performed by models?

A

Unsupervised: extract patterns of interest from a collection of input variables X.
Supervised: describe the generating process of a target variable Y as accurately as possible.

6
Q

Define unsupervised learning

A

Learning interesting characteristics of the data without any particular guide, that is, learning from the input variables X without a target. The aim is to find interpretable patterns, define low-dimensional representations, and/or detect associations that characterise the data under investigation.

7
Q

Define supervised learning

A

A target is available, which guides the learning/predictive process and the extraction of useful structure. Learning uses both the target and the inputs, D = {y, X}.

8
Q

Describe what makes a function relating the target variable to the input variables “good” in supervised learning

A

A good function minimises the discrepancy between the observed values and the values output by the model. This discrepancy is measured by a loss function.
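
As a small illustration of measuring discrepancy with a loss function, the sketch below uses the mean squared error, which is only one possible choice of loss. The function name `squared_error_loss` and the two sets of predictions are made up for this example and are not part of the course material.

```python
import numpy as np

def squared_error_loss(y_true, y_pred):
    """Mean squared discrepancy between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Observed target values (illustrative numbers only).
y = np.array([1.0, 2.0, 3.0])

# Outputs of two hypothetical models for the same inputs.
pred_a = np.array([1.0, 2.0, 3.0])  # reproduces the observations exactly
pred_b = np.array([2.0, 3.0, 4.0])  # every prediction is off by 1

loss_a = squared_error_loss(y, pred_a)
loss_b = squared_error_loss(y, pred_b)

# The model with the smaller loss is the "better" function under this loss.
print(loss_a, loss_b)
```

Under this loss the first set of predictions is preferred, because its discrepancy from the observed values is smaller.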

9
Q

Explain what training and test data are

A

Training data: the available data on which the parameters of the function are learned (estimated) by minimising the loss.
Test data: data points not seen by the machine learning algorithm during training, used to evaluate performance.
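
The training/test idea can be sketched as follows, assuming synthetic data, a straight-line model fitted by least squares, and an 80/20 split. All of these choices, and the helper names `design` and `mse`, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y depends linearly on x, plus noise (assumed for the demo).
n = 100
X = rng.uniform(-1, 1, size=(n, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=n)

# Random 80/20 split into training and test indices.
idx = rng.permutation(n)
train_idx, test_idx = idx[:80], idx[80:]

def design(X):
    """Add an intercept column to the inputs."""
    return np.column_stack([np.ones(len(X)), X])

# Learn the parameters by minimising squared loss on the training data only.
beta, *_ = np.linalg.lstsq(design(X[train_idx]), y[train_idx], rcond=None)

def mse(X, y):
    """Mean squared error of the fitted line on the given data."""
    return float(np.mean((y - design(X) @ beta) ** 2))

train_mse = mse(X[train_idx], y[train_idx])
test_mse = mse(X[test_idx], y[test_idx])  # evaluated on unseen points
print(train_mse, test_mse)
```

The test error is the more honest estimate of performance, since the test points played no role in estimating `beta`.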

10
Q

Explain three different summaries of the data that unsupervised learning can produce to give an interpretable and/or usable summary of the data

A

Detecting groups of similar units: clustering.
Inferring meaningful associations: association rule mining, graphical models, etc.
Reducing the dimension of the data: embedding learning (e.g. PCA).
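
To make the dimension-reduction case concrete, here is a minimal PCA sketch computed through the singular value decomposition of the centred data. The synthetic three-variable data set is an assumption chosen so that most of the variation lies along a single direction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic inputs X with no target: three variables, two of them
# strongly related, the third mostly noise (assumed for the demo).
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2 * t + rng.normal(scale=0.05, size=200),
    rng.normal(scale=0.05, size=200),
])

# Centre the data, then take the SVD; the rows of Vt are the principal axes.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Share of total variance captured by each principal component.
explained = s**2 / np.sum(s**2)

# Low-dimensional representation: project onto the first two components.
Z = Xc @ Vt[:2].T
print(explained.round(3), Z.shape)
```

Here the first component captures almost all of the variance, so the three original variables can be summarised by one or two derived coordinates.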
