TDS - Top 30 Questions Flashcards
What is the difference between supervised and unsupervised machine learning?
Supervised machine learning: Supervised machine learning requires labelled training data; the model learns a mapping from the input features to the known target labels. Unsupervised machine learning: Unsupervised machine learning doesn't require labelled data; the model finds structure, such as clusters, in the inputs alone.
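As a minimal sketch of the difference, assuming scikit-learn is available: the supervised model is fitted on features and labels, while the unsupervised model is fitted on features alone (the data set here is an illustrative assumption).

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = LogisticRegression().fit(X, y)      # supervised: needs features AND labels
print(clf.predict(X[:5]))                 # predicts the learned labels

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # unsupervised: features only
print(km.labels_[:5])                     # cluster assignments discovered from X alone
```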
What is the bias-variance trade-off?
Bias:
Bias is the error introduced in your model due to oversimplification by the machine learning algorithm; it can lead to underfitting. When you train your model, the model makes simplified assumptions to make the target function easier to understand. Low-bias machine learning algorithms: Decision Trees, k-NN and SVM. High-bias machine learning algorithms: Linear Regression, Logistic Regression.
Variance:
Variance is the error introduced in your model due to an overly complex machine learning algorithm: the model learns noise from the training data set as well, and performs badly on the test data set. It can lead to high sensitivity and overfitting. Normally, as you increase the complexity of your model, you will see a reduction in error due to the lower bias of the model. However, this only happens up to a particular point: as you continue to make your model more complex, you end up overfitting it, and your model will start suffering from high variance.
Bias-variance trade-off:
The goal of any supervised machine learning algorithm is to have low bias and low variance in order to achieve good prediction performance. The k-nearest neighbours algorithm has low bias and high variance, but the trade-off can be changed by increasing the value of k, which increases the number of neighbours that contribute to the prediction and in turn increases the bias of the model. The support vector machine algorithm has low bias and high variance, but the trade-off can be changed by increasing the C parameter, which influences the number of margin violations allowed in the training data and thereby increases the bias while decreasing the variance. There is no escaping the relationship between bias and variance in machine learning: increasing the bias will decrease the variance, and increasing the variance will decrease the bias.
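A hedged sketch of the k-NN side of this trade-off, on an illustrative synthetic data set: very small k fits the training data almost perfectly (high variance), while very large k smooths the predictions (higher bias).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 5, 25, 100):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    # A large train/test gap at small k signals high variance (overfitting).
    print(f"k={k:3d}  train={knn.score(X_tr, y_tr):.2f}  test={knn.score(X_te, y_te):.2f}")
```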
What are exploding gradients?
Gradient:
Let's first understand what a gradient is. A gradient is the direction and magnitude calculated during the training of a neural network; it is used to update the network weights in the right direction and by the right amount. Exploding gradients are a problem where large error gradients accumulate and result in very large updates to the neural network's weights during training. At the extreme, the values of the weights can become so large that they overflow and result in NaN values. This makes your model unstable and unable to learn from your training data.
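A minimal numpy sketch (my own illustration, not from the text) of why gradients explode: backpropagation multiplies the gradient by a weight matrix at every layer, so weights with norm greater than one compound exponentially.

```python
import numpy as np

rng = np.random.default_rng(0)
grad = np.ones(10)                          # gradient arriving at the last layer
W = rng.normal(scale=1.5, size=(10, 10))    # deliberately large weights (assumption)

for layer in range(50):
    grad = W.T @ grad                       # chain rule through one linear layer
    if layer % 10 == 0:
        print(f"layer {layer:2d}: gradient norm = {np.linalg.norm(grad):.3e}")
```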
What is a confusion matrix?
The confusion matrix is a 2x2 table that contains the 4 outputs provided by a binary classifier. Various measures, such as error rate, accuracy, specificity, sensitivity, precision and recall, are derived from it. A data set used for performance evaluation is called a test data set; it should contain both the correct labels and the predicted labels. The predicted labels will be exactly the same as the correct labels only if the performance of the binary classifier is perfect; in real-world scenarios, the predicted labels usually match only part of the observed labels. A binary classifier predicts every data instance of a test data set as either positive or negative, which produces four outcomes:
True positive (TP): correct positive prediction
False positive (FP): incorrect positive prediction
True negative (TN): correct negative prediction
False negative (FN): incorrect negative prediction
Basic measures derived from the confusion matrix, with P = TP + FN (actual positives) and N = TN + FP (actual negatives):
Error Rate = (FP + FN) / (P + N)
Accuracy = (TP + TN) / (P + N)
Sensitivity (Recall or True positive rate) = TP / P
Specificity (True negative rate) = TN / N
Precision (Positive predicted value) = TP / (TP + FP)
F-Score (harmonic mean of precision and recall) = (1 + b²)(Precision × Recall) / (b² × Precision + Recall), where b is commonly 0.5, 1 or 2.
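A small sketch computing the measures above directly from the four cells; the counts are assumed for illustration.

```python
TP, FP, TN, FN = 40, 10, 45, 5     # assumed counts for illustration
P, N = TP + FN, TN + FP            # actual positives and actual negatives

error_rate  = (FP + FN) / (P + N)
accuracy    = (TP + TN) / (P + N)
sensitivity = TP / P               # recall / true positive rate
specificity = TN / N               # true negative rate
precision   = TP / (TP + FP)

b = 1                              # b = 1 gives the usual F1 score
f_score = (1 + b**2) * precision * sensitivity / (b**2 * precision + sensitivity)
print(accuracy, sensitivity, specificity, precision, f_score)
```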
Explain how a ROC curve works.
The ROC curve is a graphical representation of the contrast between the true positive rate and the false positive rate at various classification thresholds. It is often used as a proxy for the trade-off between sensitivity (the true positive rate) and the false positive rate.
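A hedged sketch of how the curve is obtained in practice with scikit-learn (model and data are illustrative assumptions): roc_curve sweeps the decision threshold and returns the false/true positive rate at each one.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, scores)   # one (FPR, TPR) pair per threshold
print("AUC:", roc_auc_score(y_te, scores))       # area under the ROC curve
```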
What is Selection Bias?
Selection bias occurs when the sample obtained is not representative of the population intended to be analysed. It is the bias introduced by selecting individuals, groups or data for analysis in such a way that proper randomisation is not achieved; it is sometimes referred to as the selection effect. The phrase "selection bias" most often refers to the distortion of a statistical analysis resulting from the method of collecting samples. If selection bias is not taken into account, some conclusions of the study may not be accurate.
Explain the SVM ML Algorithm in detail
SVM stands for support vector machine. It is a supervised machine learning algorithm which can be used for both regression and classification. If you have n features in your training data set, SVM plots each instance as a point in n-dimensional space, with the value of each feature being the value of a particular coordinate. SVM then uses hyperplanes to separate the different classes, based on the provided kernel function.
What are support vectors in SVM?
Support vectors are the data points that lie closest to the decision boundary. In the usual diagram of a linear SVM, thinner lines on either side of the classifier mark the distance from the boundary to these closest points (the support vectors), and the distance between the two thin lines is called the margin.
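A minimal sketch (toy data assumed) showing how a fitted scikit-learn SVM exposes its support vectors:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

print("support vectors per class:", clf.n_support_)
print(clf.support_vectors_[:3])    # the data points lying closest to the margin
```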
What are the different kernel functions in SVM?
There are four types of kernels in SVM:
Linear kernel
Polynomial kernel
Radial basis kernel
Sigmoid kernel
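A hedged comparison of the four kernels, assuming scikit-learn's SVC, which accepts each of them through its kernel parameter (the data set is an illustrative assumption):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

for kernel in ("linear", "poly", "rbf", "sigmoid"):   # rbf = radial basis
    score = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
    print(f"{kernel:8s} mean accuracy: {score:.2f}")
```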
Explain the Decision Tree algorithm in detail.
A decision tree is a supervised machine learning algorithm mainly used for regression and classification. It breaks a data set down into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision tree can handle both categorical and numerical data.
What is Entropy and Information gain in the Decision tree algorithm?
The core algorithm for building decision trees is called ID3. ID3 uses entropy and information gain to construct a decision tree.
Entropy:
A decision tree is built top-down from a root node and involves partitioning the data into homogeneous subsets. ID3 uses entropy to check the homogeneity of a sample: if the sample is completely homogeneous the entropy is zero, and if the sample is equally divided it has an entropy of one.
Information Gain:
Information gain is based on the decrease in entropy after a data set is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain.
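A small sketch (my own illustration, not part of ID3 itself) computing binary entropy and the information gain of one candidate split:

```python
import math

def entropy(pos, neg):
    """Binary entropy of a sample with pos/neg class counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

parent = entropy(10, 10)                              # equally divided: entropy 1.0
children = 0.5 * entropy(8, 2) + 0.5 * entropy(2, 8)  # split children, weighted by size
print("information gain:", parent - children)         # decrease in entropy from the split
```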
What is pruning in a Decision Tree?
When we remove the sub-nodes of a decision node, the process is called pruning: the opposite of splitting.
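The text does not name a pruning algorithm; as one common example, scikit-learn implements cost-complexity pruning through the ccp_alpha parameter, where larger values remove more sub-nodes and yield a smaller tree (a sketch under that assumption):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

for alpha in (0.0, 0.01, 0.05):
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X, y)
    print(f"ccp_alpha={alpha:.2f}  leaves={tree.get_n_leaves()}")  # fewer leaves = more pruning
```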
What is Ensemble Learning?
Ensemble learning is the art of combining a diverse set of learners (individual models) to improve the stability and predictive power of the model. Ensemble learning has many types, but the two most popular techniques are described below.
Bagging:
Bagging tries to implement similar learners on small sample populations and then takes a mean of all the predictions. In generalised bagging, you can use different learners on different populations. As you would expect, this helps to reduce the variance error.
Boosting:
Boosting is an iterative technique which adjusts the weight of an observation based on the last classification: if an observation was classified incorrectly, the technique tries to increase the weight of that observation, and vice versa. Boosting in general decreases the bias error and builds strong predictive models; however, boosted models may overfit the training data.
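A hedged sketch of both techniques via scikit-learn's built-in ensembles (data set and sizes are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)  # averages many learners: lowers variance
boosting = AdaBoostClassifier(n_estimators=50)                          # reweights mistakes: lowers bias

for name, model in (("bagging", bagging), ("boosting", boosting)):
    print(name, cross_val_score(model, X, y, cv=5).mean())
```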
What is a Random Forest? How does it work?
Random forest is a versatile machine learning method capable of performing both regression and classification tasks. It is also used for dimensionality reduction, and it handles missing values and outlier values. It is a type of ensemble learning method in which a group of weak models combine to form a powerful model. In a random forest, we grow multiple trees as opposed to a single tree. To classify a new object based on its attributes, each tree gives a classification, and the forest chooses the classification having the most votes (over all the trees in the forest); in the case of regression, it takes the average of the outputs of the different trees.
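A minimal sketch (illustrative data) of the voting behaviour described above: each tree votes and the forest returns the majority class.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

sample = X[:1]
votes = [tree.predict(sample)[0] for tree in forest.estimators_]  # one vote per tree
print("votes for class 1:", int(sum(votes)), "of", len(votes))
print("forest prediction:", forest.predict(sample)[0])            # the majority vote
```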
What cross-validation technique would you use on a time series data set?
Instead of using k-fold cross-validation, you should be aware of the fact that a time series is not randomly distributed data: it is inherently ordered chronologically. In the case of time series data, you should use techniques like forward chaining, where you model on past data and then test on the data that follows it:
fold 1: training [1], test [2]
fold 2: training [1 2], test [3]
fold 3: training [1 2 3], test [4]
fold 4: training [1 2 3 4], test [5]
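A hedged sketch of forward chaining with scikit-learn's TimeSeriesSplit, which produces exactly this kind of expanding-window fold:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)   # 10 time-ordered observations (assumed)

for fold, (train_idx, test_idx) in enumerate(TimeSeriesSplit(n_splits=4).split(X), 1):
    # training always precedes testing in time, as in the folds listed above
    print(f"fold {fold}: training {train_idx.tolist()}, test {test_idx.tolist()}")
```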