Topic 3 Flashcards
what does each letter stand for in this equation y = f(x)
y is output (label/target)
f is the prediction function (the model)
x is input (feature)
what is the process of collecting good training data
- data collection
- data cleaning
- data labelling
- data pre-processing (filtering, scaling, etc.)
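A minimal sketch of the cleaning and pre-processing steps. It assumes toy (feature, label) pairs where None marks a missing value, and uses standardisation (zero mean, unit variance) as one example of scaling. The function names are hypothetical.

```python
from statistics import mean, pstdev

def clean(rows):
    """Data cleaning: drop any example with a missing feature or label (None)."""
    return [r for r in rows if None not in r]

def standardise(values):
    """Pre-processing: scale a feature to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant feature: nothing to scale
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical collected and labelled examples: (feature, label)
raw = [(1.0, 1), (2.0, 1), (None, -1), (3.0, -1), (4.0, None)]
cleaned = clean(raw)                               # keeps 3 complete examples
features = standardise([x for x, _ in cleaned])
labels = [y for _, y in cleaned]
```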
good training data is …..
large, correctly labelled, reliable, diverse
what do weights do in a model
they are the parameters of a model that determine the strength and direction of the relationship between each feature and the target
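To illustrate, here is a sketch assuming a simple linear model y = f(x) = w · x + b with made-up weights. The sign of each weight gives the direction of its feature's effect, and its size gives the strength.

```python
def predict(weights, bias, x):
    """Linear model: weighted sum of the features plus a bias term."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

# Hypothetical weights: feature 0 pushes the output up strongly,
# feature 1 pushes it down weakly.
weights = [3.0, -0.5]
bias = 1.0

base = predict(weights, bias, [1.0, 1.0])   # 3.0 - 0.5 + 1.0 = 3.5
up_0 = predict(weights, bias, [2.0, 1.0])   # +1 in feature 0 adds 3.0
up_1 = predict(weights, bias, [1.0, 2.0])   # +1 in feature 1 subtracts 0.5
```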
what is the goal of training
to minimise a loss function by updating the weights
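A minimal sketch of training, assuming gradient descent as the optimiser and mean squared error as the loss for a one-feature linear model. The learning rate and number of epochs are arbitrary choices for this toy data.

```python
def mse(w, b, xs, ys):
    """Mean squared error loss."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, lr=0.05, epochs=2000):
    w, b = 0.0, 0.0  # initial weights
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the MSE loss with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Update the weights by stepping against the gradient to reduce the loss
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = train(xs, ys)       # converges towards w = 2, b = 1
```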
ridge regression
a method of estimating the coefficients of multiple-regression models in scenarios where the independent variables (inputs/features) are highly correlated. It avoids overfitting through regularisation: an L2 penalty on the size of the weights is added to the loss
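A sketch of ridge regression for two features with no intercept, using the standard closed form w = (XᵀX + λI)⁻¹Xᵀy. The parameter name `lam` (λ, the regularisation strength) and the toy data are assumptions. The two features are perfectly correlated, so with `lam = 0` (ordinary least squares) XᵀX is singular and there is no unique solution. Any `lam > 0` gives a unique answer that splits the weight between the features.

```python
def ridge_2d(X, y, lam):
    """Ridge regression for two features via w = (X^T X + lam*I)^-1 X^T y."""
    # Entries of the symmetric matrix X^T X + lam * I = [[a, b], [b, d]]
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    # Entries of X^T y
    p = sum(r[0] * t for r, t in zip(X, y))
    q = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b  # zero when lam = 0 and the features are collinear
    # Explicit inverse of the 2x2 matrix applied to (p, q)
    return ((d * p - b * q) / det, (a * q - b * p) / det)

# Hypothetical data: feature 1 is an exact copy of feature 0, target y = 2 * x0
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
y = [2.0, 4.0, 6.0]
w = ridge_2d(X, y, lam=1.0)  # equal weights of 28/29 each
```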
what do decision trees do
they divide the feature space of the data into a set of hypercubes that are classified as signal (+1) or background (-1)
- each region can be fitted with a constant to represent the data in that region
- can continue to sub-divide the data until some minimum number of examples are left in each sub division
- the output of the decision tree is either +1 or -1
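A minimal sketch of a classification tree. It assumes axis-aligned splits (each split cuts one feature at a threshold, producing hypercube regions) chosen by counting misclassified examples, with each leaf predicting the majority label of its region. Real implementations usually score splits with Gini impurity or entropy instead. `min_size` plays the role of the minimum number of examples per subdivision.

```python
from collections import Counter

def majority(labels):
    # Constant fitted to a region: its most common label (+1 or -1)
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, min_size=1):
    # Stop when the region is pure or holds min_size examples or fewer
    if len(set(y)) == 1 or len(y) <= min_size:
        return majority(y)
    best = None
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= threshold]
            right = [i for i, row in enumerate(X) if row[f] > threshold]
            if not left or not right:
                continue
            # Misclassifications if each side predicts its own majority label
            errors = 0
            for side in (left, right):
                side_labels = [y[i] for i in side]
                m = majority(side_labels)
                errors += sum(1 for lab in side_labels if lab != m)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left, right)
    if best is None:  # no split separates the examples (duplicate points)
        return majority(y)
    _, f, threshold, left, right = best
    return (f, threshold,
            build_tree([X[i] for i in left], [y[i] for i in left], min_size),
            build_tree([X[i] for i in right], [y[i] for i in right], min_size))

def tree_predict(tree, x):
    # Walk down the splits until reaching a leaf (+1 or -1)
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if x[f] <= threshold else right
    return tree

# Hypothetical data: signal (+1) sits where both features are large
X = [[0.2, 0.2], [0.2, 0.8], [0.8, 0.2], [0.8, 0.8], [0.9, 0.9]]
y = [-1, -1, -1, 1, 1]
tree = build_tree(X, y)
```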
negatives of decision trees
- a single tree is susceptible to overtraining (overfitting)
what is a random forest
- they are constructed from an ensemble of individual trees
- each tree uses a randomly selected subset of the feature space, and the minimum node size is usually set to 1, so each individual tree classifies its training examples almost perfectly
- the mode (classification) or mean (regression) of the ensemble is the output of the random forest
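A minimal sketch of a random forest classifier. It follows the card: each tree is trained on one random subset of the features, grown down to a minimum node size of 1, and the forest outputs the mode of the tree votes (for regression the mean would be used instead). Many implementations instead redraw the feature subset at every split. The function names, `n_trees`, `n_features` and `seed` are hypothetical. The split rule is a simple misclassification count.

```python
import random
from collections import Counter

def mode(labels):
    return Counter(labels).most_common(1)[0][0]

def grow(X, y, features):
    # Minimum node size 1: split until a node is pure or holds one example
    if len(set(y)) == 1 or len(y) <= 1:
        return mode(y)
    best = None
    for f in features:  # only this tree's feature subset is considered
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            errors = 0
            for side in (left, right):
                labels = [y[i] for i in side]
                m = mode(labels)
                errors += sum(1 for lab in labels if lab != m)
            if best is None or errors < best[0]:
                best = (errors, f, t, left, right)
    if best is None:  # the chosen features cannot separate these examples
        return mode(y)
    _, f, t, left, right = best
    return (f, t,
            grow([X[i] for i in left], [y[i] for i in left], features),
            grow([X[i] for i in right], [y[i] for i in right], features))

def tree_predict(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def train_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    all_features = list(range(len(X[0])))
    # Each tree sees only a randomly selected subset of the features
    return [grow(X, y, rng.sample(all_features, n_features))
            for _ in range(n_trees)]

def forest_predict(forest, x):
    # Classification: the mode of the individual tree predictions
    return mode([tree_predict(tree, x) for tree in forest])

# Hypothetical data: signal (+1) is high in both features
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.7], [0.9, 0.9]]
y = [-1, -1, -1, 1, 1, 1]
forest = train_forest(X, y)
```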
what is clustering
- unsupervised machine learning technique designed to group unlabelled examples based on their similarity to each other
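One common clustering algorithm is k-means, sketched below. The card does not name a specific method, so k-means is an assumption here. Points are unlabelled 2-D tuples, similarity is Euclidean distance, and the starting centroids are passed in explicitly (hypothetical choice; they are often picked at random).

```python
import math

def kmeans(points, centroids, n_iter=10):
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[k]
            for k, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabelled data forming two groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, [points[0], points[3]])
```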