Topic 3 Flashcards

Question 1

Q

what does each symbol stand for in the equation y = f(x)

Answer

A

y is output (label/target)
f is prediction function
x is input (feature)

Question 2

Q

what is the process of collecting good training data

Answer

A

data collection
data cleaning
data labelling
data pre-processing (filtering, scaling, etc)
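
A minimal sketch of the cleaning and scaling steps, assuming a single numeric feature with one missing value. The data and the choice of standardisation (zero mean, unit variance) are illustrative assumptions, not part of the card.

```python
from statistics import mean, pstdev

# Hypothetical raw measurements; None marks a missing value.
raw = [12.0, None, 15.0, 11.0, 14.0]

# Cleaning: drop examples with missing values.
clean = [v for v in raw if v is not None]

# Scaling: standardise to zero mean and unit (population) standard deviation.
mu, sigma = mean(clean), pstdev(clean)
scaled = [(v - mu) / sigma for v in clean]
```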

Question 3

Q

good training data is …..

Answer

A

large, correctly labelled, reliable, diverse

Question 4

Q

what do weights do in a model

Answer

A

they are parameters of a model that determine strength and direction of relationship between the features and the target
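
As an illustration, assuming a simple linear model (the function name and numbers are hypothetical): the sign of each weight gives the direction of the relationship, and its magnitude gives the strength.

```python
def predict(features, weights, bias=0.0):
    """Linear model: weighted sum of the features plus a bias."""
    return bias + sum(w * x for w, x in zip(weights, features))

weights = [3.0, -0.5]  # strong positive effect, weak negative effect

base = predict([1.0, 1.0], weights)
bumped_first = predict([2.0, 1.0], weights)   # first feature raised by 1
bumped_second = predict([1.0, 2.0], weights)  # second feature raised by 1

# Raising a feature by 1 shifts the prediction by exactly its weight.
print(bumped_first - base, bumped_second - base)
```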

Question 5

Q

what is the goal of training

Answer

A

to minimise a loss function by updating the weights
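
A minimal sketch of this idea, assuming a one-weight model y ≈ w * x, a mean-squared-error loss, and plain gradient descent. The data, learning rate, and step count are illustrative assumptions.

```python
# Toy data generated with a true weight of 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def mse(w):
    """Loss: mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w):
    """Derivative of the loss with respect to the weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0
learning_rate = 0.01
losses = []
for _ in range(500):
    losses.append(mse(w))
    w -= learning_rate * grad(w)  # step against the gradient
```

Each update moves the weight in the direction that lowers the loss, so w approaches 2 and the recorded losses shrink.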

Question 6

Q

ridge regression

Answer

A

method of estimating the coefficients of multiple-regression models in scenarios where the independent variables (inputs) are highly correlated. Avoids overfitting through regularisation (a penalty on the size of the coefficients)
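
A rough sketch of the effect, assuming two nearly identical features, no intercept, and the standard closed-form solution of (XᵀX + λI) w = Xᵀy written out by hand for the 2x2 case. The data, the helper name ridge_fit_2d, and the value λ = 1 are illustrative assumptions.

```python
def ridge_fit_2d(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for two features, no intercept."""
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    p = sum(r[0] * t for r, t in zip(X, y))
    q = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b
    # Explicit inverse of the symmetric 2x2 matrix [[a, b], [b, d]].
    return ((d * p - b * q) / det, (a * q - b * p) / det)

# The second feature is almost a copy of the first (highly correlated).
X = [(1.0, 1.01), (2.0, 1.98), (3.0, 3.02), (4.0, 3.99)]
y = [2.0, 4.1, 5.9, 8.0]

ols = ridge_fit_2d(X, y, lam=0.0)    # no penalty: ordinary least squares
ridge = ridge_fit_2d(X, y, lam=1.0)  # penalised fit
print(ols, ridge)
```

Without the penalty the two weights come out large with opposite signs. With it they shrink to similar moderate values.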

Question 7

Q

what do decision trees do

Answer

A

divide the feature space into a set of hypercubes, each classified as signal (+1) or background (-1)
- each region can be fitted with a constant to represent the data in that region
- can continue to sub-divide the data until some minimum number of examples is left in each sub-division
- output of a decision tree is either +1 or -1
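
A small sketch of these points, assuming greedy axis-aligned splits chosen by misclassification count (real implementations usually use Gini impurity or entropy). The function names, the min_leaf parameter, and the data are hypothetical.

```python
def majority(labels):
    # Leaf constant: +1 if signal is at least as common as background.
    return 1 if sum(labels) >= 0 else -1

def errors(labels):
    # Examples misclassified if the region predicts its majority label.
    return sum(l != majority(labels) for l in labels)

def build(points, labels, min_leaf=1):
    """Recursively split regions until pure or too small to split."""
    if len(set(labels)) == 1 or len(points) < 2 * min_leaf:
        return majority(labels)
    best = None
    for axis in range(len(points[0])):
        for threshold in sorted({p[axis] for p in points})[1:]:
            left = [l for p, l in zip(points, labels) if p[axis] < threshold]
            right = [l for p, l in zip(points, labels) if p[axis] >= threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = errors(left) + errors(right)
            if best is None or cost < best[0]:
                best = (cost, axis, threshold)
    if best is None:
        return majority(labels)
    _, axis, threshold = best
    lo = [(p, l) for p, l in zip(points, labels) if p[axis] < threshold]
    hi = [(p, l) for p, l in zip(points, labels) if p[axis] >= threshold]
    return (axis, threshold,
            build([p for p, _ in lo], [l for _, l in lo], min_leaf),
            build([p for p, _ in hi], [l for _, l in hi], min_leaf))

def predict(tree, point):
    # Walk down the splits until a leaf constant (+1 or -1) is reached.
    while isinstance(tree, tuple):
        axis, threshold, below, above = tree
        tree = below if point[axis] < threshold else above
    return tree

points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
labels = [-1, -1, -1, 1, 1, 1]
tree = build(points, labels)
```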

Question 8

Q

negatives of decision trees

Answer

A

a single tree is susceptible to overtraining (overfitting)

Question 9

Q

what is a random forest

Answer

A

constructed from an ensemble of individual decision trees
each tree uses a randomly selected subset of the feature space, and the minimum node size is usually set to 1, so each individual tree classifies its training examples almost exactly
the mode (classification) or mean (regression) of the ensemble is the output of the random forest
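
A minimal sketch of the aggregation step only, assuming three hand-written stand-in "trees" that each look at a different part of the feature space. Real forests grow their trees from random subsets rather than hard-coding them.

```python
from statistics import mean, mode

# Hypothetical stand-ins for trained trees, each returning +1 or -1.
trees = [
    lambda x: 1 if x[0] > 0.5 else -1,          # uses feature 0 only
    lambda x: 1 if x[1] > 0.5 else -1,          # uses feature 1 only
    lambda x: 1 if x[0] + x[1] > 1.2 else -1,   # uses both features
]

def forest_classify(trees, x):
    """Classification: the most common vote across the ensemble."""
    return mode([tree(x) for tree in trees])

def forest_regress(tree_outputs):
    """Regression: the average of the individual tree outputs."""
    return mean(tree_outputs)

vote_a = forest_classify(trees, (0.9, 0.2))
vote_b = forest_classify(trees, (0.9, 0.8))
average = forest_regress([2.0, 3.0, 4.0])
```

Using an odd number of trees avoids tied votes in the two-class case.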

Question 10

Q

what is clustering

Answer

A

unsupervised machine learning technique designed to group unlabelled examples based on their similarity to each other
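
One way to sketch this, assuming k-means on one-dimensional data with similarity measured by absolute distance. The card does not name an algorithm, so the choice of k-means, the starting centres, and the data are all assumptions.

```python
def kmeans_1d(values, centres, iterations=10):
    """Alternate between assigning points and recomputing centres."""
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for v in values:
            # Assign each value to its nearest centre.
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Move each centre to the mean of its group (unchanged if empty).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups

values = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]  # unlabelled examples
centres, groups = kmeans_1d(values, centres=[0.0, 5.0])
```

No labels are used at any point: the two groups emerge only from how close the values are to each other.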