Topic 3 Flashcards

1
Q

What does each symbol stand for in the equation y = f(x)?

A

y is output (label/target)
f is prediction function
x is input (feature)

2
Q

What is the process of collecting good training data?

A
  1. data collection
  2. data cleaning
  3. data labelling
  4. data pre-processing (filtering, scaling, etc)
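The cleaning and pre-processing steps can be sketched in a few lines. This is only an illustration under stated assumptions: the data are made up, "cleaning" is taken to mean dropping rows with a missing value, and "scaling" is taken to mean standardising to zero mean and unit standard deviation. The helper names `clean` and `standardise` are hypothetical.

```python
from statistics import mean, pstdev

def clean(rows):
    """Cleaning (assumed here): drop any row that contains a missing value."""
    return [row for row in rows if None not in row]

def standardise(column):
    """Scaling (assumed here): shift to zero mean, divide by the standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(value - mu) / sigma for value in column]

# Made-up raw data: (feature_a, feature_b); one row has a missing entry.
raw = [(1.0, 10.0), (2.0, None), (3.0, 30.0), (5.0, 50.0)]

cleaned = clean(raw)                                  # the incomplete row is dropped
scaled_a = standardise([row[0] for row in cleaned])   # feature_a on a common scale
```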
3
Q

Good training data is …

A

large, correctly labelled, reliable, diverse

4
Q

What do weights do in a model?

A

they are parameters of a model that determine the strength and direction of the relationship between each feature and the target
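A minimal sketch of this idea, assuming a linear model (the card does not specify the model type). The weights, bias, and inputs are made-up numbers and `predict` is a hypothetical helper: the sign of a weight gives the direction of the effect and its magnitude gives the strength.

```python
def predict(weights, bias, features):
    """Linear model (an assumption): y = bias + sum of weight * feature."""
    return bias + sum(w * x for w, x in zip(weights, features))

weights = [2.0, -0.5]   # made-up: first feature strong and positive, second weak and negative
bias = 1.0

base = predict(weights, bias, [1.0, 1.0])
first_up = predict(weights, bias, [2.0, 1.0])    # raise only the first feature by 1
second_up = predict(weights, bias, [1.0, 2.0])   # raise only the second feature by 1

effect_first = first_up - base     # equals the first weight
effect_second = second_up - base   # equals the second weight
```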

5
Q

What is the goal of training?

A

to minimise a loss function by updating the weights
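One way to picture this is gradient descent on a mean-squared-error loss. The sketch below assumes a one-feature linear model, made-up data lying on y = 2x + 1, and an arbitrary learning rate and step count; none of these come from the card.

```python
def mse(w, b, xs, ys):
    """Loss function (assumed): mean squared error of the predictions w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, lr=0.05, steps=2000):
    """Repeatedly nudge the weights against the gradient of the loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]          # made-up targets generated from y = 2x + 1

loss_before = mse(0.0, 0.0, xs, ys)
w, b = train(xs, ys)
loss_after = mse(w, b, xs, ys)     # far smaller than loss_before
```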

6
Q

ridge regression

A

a method of estimating the coefficients of multiple-regression models when the independent variables (inputs/features) are highly correlated. It adds a regularisation penalty on the size of the coefficients, which helps avoid overfitting
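A small sketch of the effect, assuming two features, no intercept, and the standard closed-form solution w = (XᵀX + λI)⁻¹ Xᵀy. The data are made up so that the two features are almost identical, and `ridge_fit_2d` is a hypothetical helper written for this example.

```python
def ridge_fit_2d(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for two features, no intercept."""
    # Entries of the symmetric 2x2 matrix X^T X + lam * I
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    # Entries of the vector X^T y
    p = sum(r[0] * t for r, t in zip(X, y))
    q = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b
    return ((d * p - b * q) / det, (a * q - b * p) / det)

# Made-up data: the second feature is almost a copy of the first.
X = [(1.0, 1.01), (2.0, 1.98), (3.0, 3.02), (4.0, 3.99)]
y = [2.0, 4.1, 5.9, 8.0]

w_plain = ridge_fit_2d(X, y, lam=0.0)   # ordinary least squares: large, opposite-sign weights
w_ridge = ridge_fit_2d(X, y, lam=1.0)   # ridge: smaller weights shared between the features

def norm(w):
    return (w[0] ** 2 + w[1] ** 2) ** 0.5
```

With λ = 0 the near-duplicate features make the fit unstable; the penalty term shrinks the coefficients toward a more balanced solution.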

7
Q

What do decision trees do?

A

divide the feature space into a set of hypercubes (axis-aligned boxes), each classified as signal (+1) or background (-1)
- each region can be fitted with a constant to represent the data in that region
- the data can keep being subdivided until some minimum number of examples is left in each subdivision
- the output of a (classification) decision tree is either +1 or -1
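The sketch below shows only the first subdivision: a single axis-aligned cut (a "stump") chosen to misclassify the fewest training examples. A full tree would repeat the same search inside each resulting region. The data, the error criterion, and the helper names `fit_stump` and `predict_stump` are assumptions made for illustration.

```python
def fit_stump(points, labels):
    """Search every feature and threshold for the cut with the fewest errors."""
    best = None
    for feature in range(len(points[0])):
        for threshold in sorted({p[feature] for p in points}):
            for sign in (+1, -1):
                # One side of the cut is labelled `sign`, the other `-sign`.
                preds = [sign if p[feature] > threshold else -sign for p in points]
                errors = sum(pred != label for pred, label in zip(preds, labels))
                if best is None or errors < best[0]:
                    best = (errors, feature, threshold, sign)
    return best[1:]

def predict_stump(stump, point):
    """Return +1 (signal) or -1 (background) for one point."""
    feature, threshold, sign = stump
    return sign if point[feature] > threshold else -sign

# Made-up 2-feature examples: background (-1) at low x0, signal (+1) at high x0.
points = [(1.0, 5.0), (2.0, 1.0), (6.0, 4.0), (7.0, 2.0)]
labels = [-1, -1, +1, +1]

stump = fit_stump(points, labels)
signal_like = predict_stump(stump, (6.5, 0.0))
background_like = predict_stump(stump, (0.5, 9.0))
```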

8
Q

negatives of decision trees

A
  • a single tree is susceptible to overtraining (overfitting)
9
Q

What is a random forest?

A
  • they are constructed from an ensemble of individual trees
  • each tree uses a randomly selected subset of the feature space, and the minimum node size is usually set to 1, so each individual tree classifies its own training examples almost perfectly
  • the mode (classification) or mean (regression) of the ensemble is the output of the random forest
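The two ingredients, random feature subsets and combining the trees' outputs, can be sketched as follows. The per-tree outputs here are made-up placeholders rather than the results of trained trees, and the feature count, tree count, and subset size are arbitrary assumptions.

```python
import random
from statistics import mean, mode

rng = random.Random(0)                  # fixed seed so the sketch is repeatable
n_features, n_trees, subset_size = 6, 5, 2

# Each tree is given its own random subset of feature indices.
feature_subsets = [
    sorted(rng.sample(range(n_features), subset_size)) for _ in range(n_trees)
]

# Hypothetical outputs of the five trees for one example.
class_votes = [+1, -1, +1, +1, -1]           # classification: one vote per tree
regression_values = [2.0, 2.5, 3.0, 2.5, 2.0]  # regression: one estimate per tree

forest_class = mode(class_votes)         # most common vote
forest_value = mean(regression_values)   # average estimate
```

Using an odd number of trees avoids tied votes in two-class problems.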
10
Q

What is clustering?

A
  • unsupervised machine learning technique designed to group unlabelled examples based on their similarity to each other
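As one concrete example of grouping by similarity, here is a minimal k-means sketch. K-means is just one clustering technique, chosen here for illustration; the card does not name a specific algorithm. The points, the starting centroids, the iteration count, and the helper names are all assumptions.

```python
def dist2(p, q):
    """Squared Euclidean distance, used as the (dis)similarity measure."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(cluster):
    """Mean position of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            centroid(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Made-up unlabelled points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
```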