Types of ML Flashcards
What is supervised learning?
Supervised learning is a type of machine learning where models are trained on labeled data, meaning each input has a corresponding correct output.
What are the two main types of supervised learning?
The two main types are regression (predicting continuous values) and classification (predicting discrete categories).
What is regression in supervised learning?
Regression is a type of supervised learning where the model predicts a continuous numerical value based on input data.
What is classification in supervised learning?
Classification is a type of supervised learning where the model assigns input data to predefined categories or labels.
Give an example of a regression problem.
Predicting house prices based on features like area, number of rooms, and location.
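A minimal regression sketch in plain Python. It assumes a single feature (area) and made-up prices, and it fits a straight line by ordinary least squares. The data and the function name `fit_line` are hypothetical; a real house-price model would also use features such as number of rooms and location.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labeled training data: area in square metres -> price
areas = [50, 80, 100, 120]
prices = [110_000, 170_000, 210_000, 250_000]

slope, intercept = fit_line(areas, prices)
predicted = slope * 90 + intercept  # continuous output for an unseen 90 m² house
print(slope, intercept, predicted)
```

The output is a number on a continuous scale, which is what makes this a regression problem rather than a classification problem.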
Give an example of a classification problem.
Spam detection, where emails are classified as ‘spam’ or ‘not spam’.
What is the key difference between regression and classification?
Regression deals with continuous outputs, while classification deals with discrete category outputs.
What are some common regression algorithms?
Linear Regression, Polynomial Regression, Ridge Regression, Lasso Regression.
What are some common classification algorithms?
Logistic Regression, Decision Trees, Random Forest, Support Vector Machines (SVM), Naive Bayes.
What is the loss function used in linear regression?
Mean Squared Error (MSE) is commonly used in linear regression to measure prediction errors.
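MSE is the average of the squared differences between the true values and the predictions, MSE = (1/n) Σ (yᵢ − ŷᵢ)². A minimal sketch, with a hypothetical helper name and made-up values:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.5]
y_pred = [2.5, 5.0, 3.5]
mse = mean_squared_error(y_true, y_pred)
print(mse)  # (0.25 + 0 + 1.0) / 3
```

Squaring makes large errors count much more than small ones, so a single badly predicted point can dominate the loss.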
Which loss function is used in classification problems?
Cross-Entropy Loss (Log Loss) is commonly used for classification problems.
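A sketch of the binary case, assuming labels are 0/1 and the model outputs the probability of class 1. The clipping constant `eps` is an assumption added to avoid `log(0)`; the multi-class version sums −y·log(p) over all classes instead of the two terms used here.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean log loss for binary labels (0/1) and predicted P(class = 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never receives 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0, 1]
confident_right = [0.9, 0.1, 0.8]
confident_wrong = [0.1, 0.9, 0.2]
loss_right = binary_cross_entropy(labels, confident_right)
loss_wrong = binary_cross_entropy(labels, confident_wrong)
print(loss_right, loss_wrong)
```

Confident wrong predictions are penalized far more heavily than confident correct ones, which is why this loss suits probabilistic classifiers.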
What is overfitting in supervised learning?
Overfitting occurs when a model learns patterns, including noise, that are specific to the training data, so it performs well on the training data but poorly on unseen data.
How can overfitting be prevented?
Use techniques such as regularization (Lasso, Ridge), cross-validation (to detect overfitting and choose model complexity), pruning (for decision trees), and collecting more training data.
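Cross-validation needs a way to split the data into folds. A minimal sketch of k-fold index splitting without shuffling; the function name `k_fold_indices` is hypothetical, and real pipelines usually shuffle the data first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation (no shuffling)."""
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print(len(train_idx), val_idx)
```

Each sample is used for validation exactly once. Training on `train_idx` and scoring on `val_idx` for every fold, then averaging, estimates performance on unseen data; a large gap between training and validation scores is a sign of overfitting.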
What is underfitting in supervised learning?
Underfitting occurs when a model is too simple to capture patterns in the data, leading to poor performance on both training and test data.
What is logistic regression, and why is it used for classification?
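Logistic Regression is a statistical model used for binary classification, despite its name. It passes a weighted sum of the input features through the sigmoid (logistic) function to predict a probability between 0 and 1, then applies a threshold (commonly 0.5) to assign a class.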
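A prediction-only sketch, assuming the weights and bias have already been learned (training, usually by minimizing cross-entropy, is omitted). The parameter values and the two-feature "email" are hypothetical.

```python
import math

def sigmoid(z):
    """Map any real number to (0, 1); the two branches avoid overflow in exp()."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, x):
    """P(class = 1 | x) for a logistic regression model with given parameters."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def predict(weights, bias, x, threshold=0.5):
    """Turn the probability into a class label by thresholding."""
    return 1 if predict_proba(weights, bias, x) >= threshold else 0

# Hypothetical learned parameters for a 2-feature spam filter
weights = [1.5, -0.8]
bias = -0.2
email = [2.0, 1.0]
p = predict_proba(weights, bias, email)
print(p, predict(weights, bias, email))  # z = 2.0, so p is about 0.88 and the label is 1
```

With a 0.5 threshold, the predicted class flips exactly where the weighted sum plus bias equals zero, so logistic regression has a linear decision boundary.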
What is the role of a decision boundary in classification?
A decision boundary is the surface in feature space where the model's predicted class changes; inputs on different sides of it are assigned to different classes.
What is a confusion matrix?
A confusion matrix is a table used to evaluate a classification model by comparing actual and predicted classes. For a binary problem it counts true positives, false positives, true negatives, and false negatives.
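A minimal sketch that builds a confusion matrix from two label lists. It uses rows for actual classes and columns for predicted classes; that orientation is a choice made here, and tools differ, so check which convention a library uses. The function name and the data are hypothetical.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows = actual class, columns = predicted class, in the order given by labels."""
    counts = Counter(zip(y_true, y_pred))  # missing pairs count as 0
    return [[counts[(actual, pred)] for pred in labels] for actual in labels]

labels = ["spam", "not spam"]
actual = ["spam", "spam", "not spam", "not spam", "not spam", "spam"]
predicted = ["spam", "not spam", "not spam", "spam", "not spam", "spam"]
matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

The diagonal holds correct predictions; every off-diagonal cell shows which class was mistaken for which.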
What are precision, recall, and F1-score in classification?
Precision = TP / (TP + FP) is the fraction of positive predictions that are correct; recall = TP / (TP + FN) is the fraction of actual positives the model finds; F1-score is the harmonic mean of the two, 2 × precision × recall / (precision + recall).
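A sketch computing all three from binary labels. Returning 0.0 when a denominator is zero is an assumption made here; libraries handle that edge case in different ways. The function name and the data are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean: low if either precision or recall is low
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)  # TP = 2, FP = 1, FN = 2
```

Here precision is 2/3 and recall is 1/2, and F1 (4/7) sits below their arithmetic mean, pulled toward the weaker of the two.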
What is multi-class classification?
Multi-class classification refers to problems with more than two possible output categories, where each instance is assigned exactly one of them.
What is multi-label classification?
In multi-label classification, each instance can belong to several classes at once (for example, a news article tagged both ‘politics’ and ‘economy’), unlike single-label classification, where each instance has exactly one class.
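The difference between multi-class and multi-label shows up in how the targets are encoded. A sketch with a hypothetical set of three news categories: multi-class targets are one-hot (exactly one 1), while multi-label targets are multi-hot (any number of 1s).

```python
CLASSES = ["politics", "economy", "sport"]  # hypothetical label set

def one_hot(label):
    """Multi-class target: exactly one class is active."""
    return [1 if c == label else 0 for c in CLASSES]

def multi_hot(labels):
    """Multi-label target: any subset of the classes can be active."""
    return [1 if c in labels else 0 for c in CLASSES]

single = one_hot("sport")
multiple = multi_hot({"politics", "economy"})
print(single, multiple)  # [0, 0, 1] [1, 1, 0]
```

Because of this, a common approach for multi-label problems is to predict each class independently (for example, one sigmoid output per class) rather than choosing a single winning class.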
What is class imbalance, and how do you handle it?
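Class imbalance occurs when one class significantly outnumbers the others. It can be handled by oversampling the minority class, undersampling the majority class, generating synthetic minority samples with techniques like SMOTE, or weighting classes differently in the loss function.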
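A sketch of the simplest option, random oversampling: minority samples are duplicated at random until every class matches the largest one. The function name, the fixed seed, and the data are assumptions for illustration. SMOTE differs in that it creates new points by interpolating between a minority sample and its nearest minority neighbours instead of copying existing ones.

```python
import random

def random_oversample(X, y, seed=0):
    """Duplicate random minority-class samples until all classes match the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for xi in rows + extra:
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out

X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = [0, 0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(y_bal.count(0), y_bal.count(1))  # 5 5
```

Resampling should be applied only to the training split; the test set keeps its original class distribution so that evaluation reflects real data.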
What is Ridge Regression, and how does it help?
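Ridge Regression is a regularized form of linear regression that reduces overfitting by adding an L2 penalty (the sum of squared coefficients, scaled by a strength parameter) to the loss function. This shrinks coefficients toward zero but does not set them exactly to zero.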
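A one-feature sketch showing the shrinkage. It assumes the objective Σ(y − w·x)² + λ·w² on centred data with an unpenalized intercept; setting the derivative to zero gives w = Σxy / (Σx² + λ). With many features, the same idea gives the closed form w = (XᵀX + λI)⁻¹Xᵀy. The function name `ridge_1d` and the data are hypothetical.

```python
def ridge_1d(xs, ys, lam):
    """Ridge slope and intercept for one feature.

    Centres x and y so the (unpenalized) intercept drops out, then uses
    w = sum(x*y) / (sum(x^2) + lam) on the centred data.
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    xc = [x - mean_x for x in xs]
    yc = [y - mean_y for y in ys]
    w = sum(a * b for a, b in zip(xc, yc)) / (sum(a * a for a in xc) + lam)
    b = mean_y - w * mean_x
    return w, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]  # exactly y = 2x
slopes = {lam: ridge_1d(xs, ys, lam)[0] for lam in (0.0, 10.0, 1000.0)}
print(slopes)  # slope falls from 2.0 toward 0 as lam grows, but never reaches 0
```

With λ = 0 this reduces to ordinary least squares; larger λ trades a worse fit on the training data for smaller, more stable coefficients.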
What is Lasso Regression?
Lasso Regression is a regularized form of linear regression that adds an L1 penalty (the sum of absolute coefficient values) to the loss function. The penalty can set some coefficients exactly to zero, which performs feature selection.
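A sketch of cyclic coordinate descent, one common way to fit Lasso. It assumes the objective 0.5 · Σ(y − Xw)² + λ · Σ|wⱼ| and centred features and target (no intercept); libraries scale these terms differently, so their penalty strengths are not directly comparable with `lam` here. The soft-thresholding step is what produces exact zeros. All names and data are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimise 0.5 * sum((y - X w)^2) + lam * sum(|w_j|) one coordinate at a time.

    Assumes the columns of X and y are already centred (no intercept term)
    and that no column is all zeros.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(n_iters):
        for j in range(n_features):
            rho = 0.0  # correlation of feature j with the residual that ignores feature j
            z = 0.0    # squared norm of feature j
            for row, target in zip(X, y):
                pred_without_j = sum(w[k] * row[k] for k in range(n_features) if k != j)
                rho += row[j] * (target - pred_without_j)
                z += row[j] ** 2
            w[j] = soft_threshold(rho, lam) / z
    return w

# Centred data: y depends strongly on feature 0 and only weakly on feature 1
X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [-5.9, -3.1, 0.0, 2.9, 6.1]
w_plain = lasso_coordinate_descent(X, y, lam=0.0)
w_lasso = lasso_coordinate_descent(X, y, lam=1.0)
print(w_plain, w_lasso)  # about [3.0, 0.1], then [2.9, 0.0]
```

With λ = 1 the weak feature's coefficient is set exactly to zero while the strong one is only shrunk, which is the feature-selection behaviour described above.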
What are some real-world applications of supervised learning?
Spam detection, fraud detection, medical diagnosis, stock price prediction, speech recognition, and image classification.
What evaluation metrics are used for regression models?
Common metrics include Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R²).
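A sketch computing the three metrics on made-up values, with a hypothetical function name. R² = 1 − SS_res / SS_tot compares the model's squared error with that of always predicting the mean: 1 is a perfect fit, 0 is no better than the mean, and negative values are worse than the mean. The sketch assumes the targets are not all identical, otherwise SS_tot is zero.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE and R-squared for paired lists of targets and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e ** 2 for e in errors)               # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)    # total sum of squares
    return {
        "MSE": ss_res / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1 - ss_res / ss_tot,
    }

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

MAE is in the same units as the target and is less sensitive to large errors than MSE, while R² is unitless, which makes it easier to compare across problems.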