Supervised Learning Flashcards
Includes the topics: Input representation, hypothesis class, version space, VC Dimension, PAC, Noise, Learning multiple classes, model selection and generalization.
What do you mean by input representation for the problem?
A real-world problem can have a large number of input features, but not all of them are important or relevant.
Only the features that are significant for assigning the class labels need to be considered. These selected input features constitute the input representation for the given problem.
What is a hypothesis?
It is a statement or a proposition that explains the given set of facts or observations.
What is a hypothesis space?
It is the set of hypotheses for a problem.
What do you mean by consistency?
A hypothesis h is said to be consistent with a set of training examples if h(x) = c(x) for every example x in the set,
where h(x) is the label predicted by the hypothesis and
c(x) is the actual class label of x.
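The condition h(x) = c(x) can be checked mechanically. The following is a minimal sketch; the training set and the two candidate hypotheses are made up for illustration and are not part of the original cards.

```python
def is_consistent(h, examples):
    """Return True if h(x) == c(x) for every (x, c(x)) pair in examples."""
    return all(h(x) == label for x, label in examples)

# Hypothetical training set: each pair is (x, c(x)).
training = [(1, False), (3, True), (4, True), (7, False)]

# Two hypothetical hypotheses over the same inputs.
h_interval = lambda x: 2 <= x <= 5   # labels x positive inside [2, 5]
h_threshold = lambda x: x >= 2       # labels x positive from 2 upwards

print(is_consistent(h_interval, training))   # True
print(is_consistent(h_threshold, training))  # False: x = 7 is misclassified
```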
What is a version space?
It is the set of all hypotheses in the hypothesis space that are consistent with the set of training examples.
The version space lies between the most general and the most specific hypotheses.
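For a small, finite hypothesis space the version space can be found by simple enumeration. The sketch below assumes a hypothetical space of threshold hypotheses h_t(x) = (x >= t) for t = 0..10 and an invented training set; neither comes from the original cards.

```python
def make_threshold(t):
    """Hypothesis h_t: label x positive when x >= t."""
    return lambda x: x >= t

# Hypothetical hypothesis space: one threshold hypothesis per integer 0..10.
hypothesis_space = {t: make_threshold(t) for t in range(0, 11)}

# Hypothetical training set of (x, c(x)) pairs.
training = [(2, False), (4, False), (7, True), (9, True)]

# The version space keeps every hypothesis that agrees with all examples.
version_space = [
    t for t, h in hypothesis_space.items()
    if all(h(x) == label for x, label in training)
]

print(version_space)  # [5, 6, 7]
```

Here t = 5 is the most general member (it labels the most points positive) and t = 7 is the most specific one; every threshold between them is also consistent.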
When can we say that a hypothesis is consistent?
A hypothesis is consistent if it correctly classifies all training examples.
What do you mean by the Most General Hypothesis (G)?
A hypothesis h is said to be the most general hypothesis if it covers none of the negative examples and there is no other hypothesis h' that covers no negative examples such that h' is more general than h.
What do you mean by the Most Specific Hypothesis (S)?
A hypothesis h is said to be the most specific hypothesis if it covers all of the positive examples and none of the negative examples, and there is no other hypothesis h' with the same property such that h' is more specific than h.
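The boundaries S and G can be computed directly for a toy problem. This sketch assumes interval hypotheses [lo, hi] over the integers 0..10, takes "consistent" to mean covering every positive and no negative example, and restricts both S and G to consistent hypotheses; the data and helper names are hypothetical.

```python
positives = [4, 5, 6]   # hypothetical positive examples
negatives = [1, 9]      # hypothetical negative examples

# Hypothetical hypothesis space: all integer intervals [lo, hi] within 0..10.
candidates = [(lo, hi) for lo in range(11) for hi in range(lo, 11)]

def covers(h, x):
    lo, hi = h
    return lo <= x <= hi

def consistent(h):
    return (all(covers(h, x) for x in positives)
            and not any(covers(h, x) for x in negatives))

def more_general(h1, h2):
    """True if h1 is strictly more general than h2 (h1's interval contains h2's)."""
    return h1 != h2 and h1[0] <= h2[0] and h2[1] <= h1[1]

version_space = [h for h in candidates if consistent(h)]

# S: consistent hypotheses that are not more general than any other consistent one.
S = [h for h in version_space
     if not any(more_general(h, other) for other in version_space)]
# G: consistent hypotheses with no consistent hypothesis more general than them.
G = [h for h in version_space
     if not any(more_general(other, h) for other in version_space)]

print(S)  # [(4, 6)]
print(G)  # [(2, 8)]
```

Every consistent interval lies between these two: it contains [4, 6] and is contained in [2, 8].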
What is noise?
Noise is any unwanted anomaly in the data.
How does noise arise?
The factors affecting the creation of noise are:
1. Imprecision in recording the input attributes, which may shift the data points in the input space.
2. Errors in labelling the data points.
3. Neglecting attributes that are relevant to the prediction of labels.
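The first two sources can be simulated on synthetic data to see how they perturb a training set. This is only an illustrative sketch: the function names, the Gaussian noise model and the flip rate are assumptions, not something prescribed by the cards.

```python
import random

def add_attribute_noise(xs, sigma, rng):
    """Source 1: imprecise recording shifts each input by Gaussian noise."""
    return [x + rng.gauss(0.0, sigma) for x in xs]

def add_label_noise(labels, flip_rate, rng):
    """Source 2: labelling errors flip each boolean label with probability flip_rate."""
    return [(not y) if rng.random() < flip_rate else y for y in labels]

rng = random.Random(0)              # fixed seed so the run is repeatable
xs = [1.0, 2.0, 3.0, 4.0]           # hypothetical clean inputs
labels = [False, False, True, True] # hypothetical clean labels

noisy_xs = add_attribute_noise(xs, sigma=0.3, rng=rng)
noisy_labels = add_label_noise(labels, flip_rate=0.25, rng=rng)

print(noisy_xs)
print(noisy_labels)
```

The third source, neglected attributes, cannot be simulated by perturbing the data; it corresponds to dropping a relevant column before training.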
What are the effects of noise?
- Distorts the data.
- Leads to wrong predictions.
- Reduces the accuracy of the model.
- Increases the complexity of the induced classifier.
- Increases the training time.
Which methods are used for learning multiple classes?
- One-against-all (one-vs-all)
- One-against-one (one-vs-one)
Explain the one-vs-all approach.
In this approach, n binary classification models are trained in parallel, one for each of the n output classes. Each model learns to separate its own class from all the remaining classes, on the assumption that such a separation always exists.
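A minimal one-vs-all sketch follows. The binary learner is a toy nearest-centroid scorer chosen only to keep the example self-contained, and the data and function names are hypothetical; any binary classifier that returns a score could take its place.

```python
def mean(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def train_binary(positive, negative):
    """Toy binary scorer: a higher score means closer to the positive centroid."""
    cp, cn = mean(positive), mean(negative)
    return lambda x: sq_dist(x, cn) - sq_dist(x, cp)

def train_one_vs_all(data):
    """data maps class -> list of points; returns one scorer per class."""
    models = {}
    for cls, pts in data.items():
        # All examples of the other classes act as the negative class.
        rest = [p for other, ps in data.items() if other != cls for p in ps]
        models[cls] = train_binary(pts, rest)
    return models

def predict_one_vs_all(models, x):
    # Pick the class whose "this class vs the rest" model is most confident.
    return max(models, key=lambda cls: models[cls](x))

data = {
    "a": [(0.0, 0.0), (1.0, 0.0)],
    "b": [(5.0, 5.0), (6.0, 5.0)],
    "c": [(0.0, 6.0), (1.0, 6.0)],
}
models = train_one_vs_all(data)
print(len(models))                              # 3, one model per class
print(predict_one_vs_all(models, (0.5, 0.5)))   # a
```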
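Explain the one-vs-one approach.
One-vs-one is an alternative to one-vs-all in which a separate binary classifier is trained for each pair of classes, so n classes require n(n-1)/2 classifiers. The number of models therefore grows quadratically rather than linearly with the number of classes, and the predicted class is the one that receives the majority of the votes from the pairwise classifiers. In general, one-vs-one is more expensive than one-vs-all. Because each pairwise classifier is trained only on the examples of its two classes, it should only be adopted when training every classifier on the complete data set is not preferred.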
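The pairwise training and majority vote can be sketched as follows. As an assumption made for brevity, each pairwise model is a toy nearest-centroid rule; the data and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def mean(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def train_pair(points_a, points_b):
    """Toy pairwise model: True means the point is closer to the first class."""
    ca, cb = mean(points_a), mean(points_b)
    return lambda x: sq_dist(x, ca) <= sq_dist(x, cb)

def train_one_vs_one(data):
    """One model per unordered pair of classes, each trained on two classes only."""
    return {
        (a, b): train_pair(data[a], data[b])
        for a, b in combinations(sorted(data), 2)
    }

def predict_one_vs_one(models, x):
    # Each pairwise model casts one vote; the class with most votes wins.
    votes = Counter(a if prefers_a(x) else b
                    for (a, b), prefers_a in models.items())
    return votes.most_common(1)[0][0]

data = {
    "a": [(0.0, 0.0), (1.0, 0.0)],
    "b": [(5.0, 5.0), (6.0, 5.0)],
    "c": [(0.0, 6.0), (1.0, 6.0)],
    "d": [(6.0, 0.0), (7.0, 0.0)],
}
models = train_one_vs_one(data)
print(len(models))                              # 6 = 4 * 3 / 2 pairwise models
print(predict_one_vs_one(models, (0.5, 0.5)))   # a
```

With 4 classes, one-vs-all would train 4 models while one-vs-one trains 6; the gap widens quickly as the number of classes grows.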
What do you mean by model selection?
It is the process of selecting a model for a problem. It may include selecting appropriate algorithms, choosing the set of input features, or choosing the initial values for certain parameters. It has also been described as the process of selecting the right inductive bias.
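One common way to carry out model selection is to compare candidate models on held-out validation data. The sketch below assumes a 1-D k-nearest-neighbour classifier and chooses k from a small candidate list; the data set (including one deliberately noisy label) and all names are hypothetical.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def error_rate(train, held_out, k):
    wrong = sum(knn_predict(train, x, k) != y for x, y in held_out)
    return wrong / len(held_out)

# Hypothetical (x, label) data; (2.0, 1) plays the role of a noisy label.
train = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 1),
         (2.5, 0), (3.0, 1), (3.5, 1), (4.0, 1)]
validation = [(0.8, 0), (1.8, 0), (2.2, 0), (2.8, 1), (3.8, 1)]

candidate_ks = [1, 3, 5]
errors = {k: error_rate(train, validation, k) for k in candidate_ks}

# Select the candidate with the lowest validation error (first one on ties).
best_k = min(candidate_ks, key=lambda k: errors[k])

print(errors)
print(best_k)
```

Choosing k is one instance of choosing an inductive bias: a small k lets the model follow individual (possibly noisy) points, while a larger k assumes the labels vary smoothly.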