Machine Learning Flashcards
What is overfitting in machine learning, and how can it be prevented?
Overfitting occurs when a model learns the training data too well, including noise and random fluctuations, resulting in poor generalization to unseen data. It can be prevented through techniques such as cross-validation, regularization (e.g., L1 or L2 regularization), early stopping, and using simpler models.
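A minimal sketch of two of these defenses using scikit-learn (assuming it is installed; the dataset and hyperparameter values are illustrative, not tuned):

```python
# Illustrative sketch: L2 regularization and early stopping in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# L2 regularization: smaller C means a stronger penalty on large weights.
l2_model = LogisticRegression(penalty="l2", C=0.1).fit(X_train, y_train)

# Early stopping: halt training when the validation score stops improving.
es_model = SGDClassifier(early_stopping=True, validation_fraction=0.2,
                         n_iter_no_change=5, random_state=0).fit(X_train, y_train)

print(l2_model.score(X_test, y_test), es_model.score(X_test, y_test))
```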
Explain the difference between supervised and unsupervised learning.
In supervised learning, the model learns from labeled data, where each example is associated with a target label. The goal is to learn a mapping from input features to target labels. In unsupervised learning, the model learns from unlabeled data, aiming to discover hidden patterns or structures within the data without explicit guidance.
What evaluation metrics would you use for a binary classification problem?
Common evaluation metrics for binary classification include accuracy, precision, recall (sensitivity), F1-score, specificity, area under the ROC curve (AUC-ROC), and area under the precision-recall curve (AUC-PR).
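A brief sketch of computing these metrics with scikit-learn; the label and score arrays are small illustrative examples, not real model output:

```python
# Illustrative sketch: common binary-classification metrics in scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, average_precision_score)

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]                   # ground-truth labels
y_pred  = [0, 1, 1, 1, 0, 0, 1, 0]                   # hard predictions
y_score = [0.2, 0.6, 0.8, 0.9, 0.4, 0.1, 0.7, 0.3]   # predicted probabilities

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("AUC-ROC  :", roc_auc_score(y_true, y_score))          # needs scores, not labels
print("AUC-PR   :", average_precision_score(y_true, y_score))
```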
What are the advantages and disadvantages of decision trees?
Decision trees are interpretable and easy to understand, making them suitable for explaining decision-making processes. They can handle both numerical and categorical data and require minimal data preprocessing. However, decision trees are prone to overfitting, especially with complex datasets, and may not generalize well to unseen data.
Explain the bias-variance tradeoff.
The bias-variance tradeoff refers to the tradeoff between the error due to bias and the error due to variance in machine learning models. High bias models are overly simplistic and may underfit the data, while high variance models are overly complex and may overfit the data. Finding the right balance between bias and variance is crucial for achieving good generalization performance.
What is cross-validation, and why is it important?
Cross-validation is a technique used to assess the performance of machine learning models by splitting the dataset into multiple subsets (folds), training the model on all but one fold, and evaluating it on the held-out fold, rotating so that each fold serves as the validation set exactly once. It helps estimate how well the model will generalize to unseen data and reduces the risk of an overly optimistic evaluation from a single train/test split.
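A minimal sketch of 5-fold cross-validation with scikit-learn; the model and dataset are illustrative choices:

```python
# Illustrative sketch: 5-fold cross-validation in scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Each fold is held out once while the model trains on the other four.
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())
```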
Describe the difference between batch gradient descent, stochastic gradient descent, and mini-batch gradient descent.
Batch gradient descent computes the gradient of the cost function with respect to the parameters using the entire training dataset.
Stochastic gradient descent (SGD) computes the gradient using only one randomly chosen training example at a time, making each update much cheaper but noisier than a full-batch update.
Mini-batch gradient descent computes the gradient using a subset (mini-batch) of the training dataset, striking a balance between the efficiency of SGD and the stability of batch gradient descent.
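A minimal NumPy sketch of mini-batch gradient descent for linear regression, assuming squared-error loss; setting batch_size to the dataset size recovers batch gradient descent, and batch_size=1 recovers SGD:

```python
# Illustrative sketch: mini-batch gradient descent for linear regression.
import numpy as np

def minibatch_gd(X, y, lr=0.01, epochs=100, batch_size=32, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(X))              # reshuffle each epoch
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            # Gradient of the mean squared error over the mini-batch.
            grad = 2 * X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)
            w -= lr * grad                         # gradient step
    return w

X = np.random.default_rng(1).normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
print(minibatch_gd(X, y))   # should approach true_w
```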
Explain the difference between generative and discriminative models.
Generative models learn the joint probability distribution of the input features and the target labels, allowing them to generate new samples similar to the training data. Discriminative models, on the other hand, directly learn the decision boundary between different classes without modeling the underlying probability distribution.
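A small illustrative contrast in scikit-learn: Gaussian naive Bayes is a generative model (it models the joint distribution p(x, y)), while logistic regression is discriminative (it models p(y | x) directly). The dataset choice is arbitrary:

```python
# Illustrative sketch: generative vs. discriminative classifiers.
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
generative = GaussianNB().fit(X, y)                      # models p(x, y)
discriminative = LogisticRegression(max_iter=1000).fit(X, y)  # models p(y | x)
print(generative.score(X, y), discriminative.score(X, y))
```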
What are the key components of a support vector machine (SVM)?
The key components of an SVM include the kernel function, which computes similarities between data points in a (possibly high-dimensional) feature space; the margin, the distance between the decision boundary and the nearest data points (the support vectors); and the regularization parameter, which controls the tradeoff between maximizing the margin and minimizing classification errors.
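A minimal scikit-learn sketch showing where the kernel and the regularization parameter C appear; the values are illustrative:

```python
# Illustrative sketch: kernel and regularization parameter of an SVM.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# kernel="rbf" defines similarity in the implicit feature space;
# C trades a wide margin against classification errors on the training data.
svm = SVC(kernel="rbf", C=1.0).fit(X, y)
print("support vectors per class:", svm.n_support_)
```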
What is the curse of dimensionality, and how does it affect machine learning algorithms?
The curse of dimensionality refers to the phenomenon where the volume of the feature space grows exponentially with the number of dimensions. Data becomes increasingly sparse in that space, making it difficult for machine learning algorithms to learn effectively, especially with limited training examples. Dimensionality reduction techniques such as PCA or feature selection can help mitigate this issue.
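A minimal sketch of dimensionality reduction with PCA in scikit-learn; the dataset and the choice of 10 components are illustrative:

```python
# Illustrative sketch: reducing dimensionality with PCA.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)          # 64-dimensional inputs
pca = PCA(n_components=10).fit(X)
X_reduced = pca.transform(X)
print(X.shape, "->", X_reduced.shape)
print("variance retained:", pca.explained_variance_ratio_.sum())
```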
Explain the concept of feature engineering.
Feature engineering involves creating new features or transforming existing features to improve the performance of machine learning models. This may include techniques such as scaling, normalization, encoding categorical variables, creating interaction terms, and extracting relevant information from raw data.
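A brief sketch of a few common feature-engineering steps using pandas and scikit-learn; the toy DataFrame and its column names are illustrative assumptions:

```python
# Illustrative sketch: scaling, one-hot encoding, and interaction terms.
import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder, PolynomialFeatures

df = pd.DataFrame({"age": [25, 32, 47],
                   "income": [40e3, 55e3, 90e3],
                   "city": ["NY", "SF", "NY"]})

scaled = StandardScaler().fit_transform(df[["age", "income"]])            # scaling
encoded = OneHotEncoder().fit_transform(df[["city"]]).toarray()           # encoding
interactions = PolynomialFeatures(degree=2, interaction_only=True,
                                  include_bias=False).fit_transform(df[["age", "income"]])
print(scaled.shape, encoded.shape, interactions.shape)
```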
What is ensemble learning, and why is it useful?
Ensemble learning involves combining multiple base learners to improve the overall performance of the model. This can be achieved through techniques such as bagging, boosting, and stacking. Ensemble methods are useful because they reduce variance (bagging) or bias (boosting), increase model robustness, and often generalize better than any individual base learner.
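A minimal sketch comparing a bagging-style ensemble (random forest) with a boosting-style ensemble (gradient boosting) in scikit-learn; the dataset and settings are illustrative:

```python
# Illustrative sketch: bagging vs. boosting ensembles of decision trees.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

bagging = RandomForestClassifier(n_estimators=200, random_state=0)   # bagged trees
boosting = GradientBoostingClassifier(random_state=0)                # sequential boosting

print(cross_val_score(bagging, X, y, cv=5).mean())
print(cross_val_score(boosting, X, y, cv=5).mean())
```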
Describe the difference between K-means clustering and hierarchical clustering.
K-means clustering is a partitioning algorithm that divides the data into K clusters by iteratively assigning data points to the nearest cluster centroid and updating each centroid to the mean of the points assigned to it. Hierarchical clustering, on the other hand, builds a hierarchy of clusters by iteratively merging (agglomerative) or splitting (divisive) clusters based on their similarity, producing a dendrogram that can be cut at any level to obtain the desired number of clusters.
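A minimal sketch running K-means and agglomerative (hierarchical) clustering on the same illustrative data with scikit-learn:

```python
# Illustrative sketch: K-means vs. agglomerative clustering.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
hierarchical = AgglomerativeClustering(n_clusters=3).fit(X)

print(kmeans.labels_[:10])
print(hierarchical.labels_[:10])
```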
What is regularization, and why is it important in machine learning?
Regularization is a technique used to prevent overfitting by adding a penalty term to the loss function, which discourages overly complex models. Common regularization techniques include L1 regularization (Lasso), which encourages sparsity in the model parameters, and L2 regularization (Ridge), which penalizes the squared magnitude of the parameters.
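A minimal sketch contrasting L1 (Lasso) and L2 (Ridge) regularization in scikit-learn; the alpha values are illustrative:

```python
# Illustrative sketch: sparsity from L1 vs. shrinkage from L2.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso tends to drive many coefficients exactly to zero; Ridge only shrinks them.
print("zero coefficients (Lasso):", np.sum(lasso.coef_ == 0))
print("zero coefficients (Ridge):", np.sum(ridge.coef_ == 0))
```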
What is the difference between a hyperparameter and a parameter in machine learning models?
Hyperparameters are configuration settings that are external to the model and are typically set before the learning process begins (e.g., learning rate, regularization parameter). Parameters, on the other hand, are internal to the model and are learned from the training data (e.g., weights and biases in neural networks).
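A small illustrative example: the hyperparameter C is chosen before fitting, while the weights learned during fitting are the model's parameters:

```python
# Illustrative sketch: hyperparameter (set by us) vs. parameters (learned).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

model = LogisticRegression(C=0.5, max_iter=5000)   # C: hyperparameter, set beforehand
model.fit(X, y)
print(model.coef_.shape, model.intercept_)          # parameters learned from the data
```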
Explain the concept of cross-entropy loss and its role in classification tasks.
Cross-entropy loss, also known as log loss, measures the difference between the predicted probability distribution and the true probability distribution of the target labels. It is commonly used as the loss function for binary and multiclass classification tasks, where the goal is to minimize the cross-entropy between the predicted and true labels.
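A minimal sketch computing binary cross-entropy by hand in NumPy and checking it against scikit-learn's log_loss; the labels and probabilities are illustrative:

```python
# Illustrative sketch: binary cross-entropy (log loss).
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.2, 0.7, 0.6])   # predicted probability of class 1

bce = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
print(bce, log_loss(y_true, y_prob))      # the two values should match
```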