Extras Flashcards
Describe the difference and relationship between artificial intelligence, machine learning, and
deep learning.
Artificial Intelligence (AI): The broader field of machines mimicking human intelligence.
Machine Learning (ML): A subset of AI, where machines learn from data to perform tasks.
Deep Learning (DL): A subset of ML, using deep neural networks for complex data tasks.
What is bootstrapping?
Bootstrapping is a resampling technique in statistics and machine learning: samples are repeatedly drawn with replacement from a dataset to estimate population parameters or assess model stability. It creates many datasets from one, which is useful for generating confidence intervals or improving model accuracy.
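As a minimal sketch of the idea (toy normal data; NumPy assumed), the bootstrap distribution of the sample mean yields a percentile confidence interval:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=200)  # toy sample

n_boot = 1000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    # draw a sample of the same size, WITH replacement
    resample = rng.choice(data, size=len(data), replace=True)
    boot_means[i] = resample.mean()

# 95% confidence interval for the mean, from the bootstrap distribution
lo, hi = np.percentile(boot_means, [2.5, 97.5])
```

Each resample stands in for "another dataset we might have collected", which is what makes the interval meaningful.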
When does undersampling take place?
Undersampling is applied to imbalanced datasets to balance the class distribution by reducing the number of instances in the majority class, preventing bias in machine learning models.
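A minimal random-undersampling sketch on toy data (plain NumPy; dedicated libraries such as imbalanced-learn offer more refined strategies):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy imbalanced dataset: 90 majority (class 0), 10 minority (class 1)
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

majority_idx = np.where(y == 0)[0]
minority_idx = np.where(y == 1)[0]

# keep only as many majority rows as there are minority rows
keep = rng.choice(majority_idx, size=len(minority_idx), replace=False)
idx = np.concatenate([keep, minority_idx])
X_bal, y_bal = X[idx], y[idx]
```

The balanced set is much smaller, which is the main cost of undersampling.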
When does oversampling take place?
Oversampling is applied to imbalanced datasets to balance the class distribution by increasing the number of instances in the minority class, helping improve the performance of machine learning models.
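The mirror-image sketch for random oversampling, again on toy data with plain NumPy (methods like SMOTE synthesise new points instead of duplicating):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy imbalanced dataset: 90 majority (class 0), 10 minority (class 1)
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

majority_idx = np.where(y == 0)[0]
minority_idx = np.where(y == 1)[0]

# duplicate minority rows (sampled with replacement) until the classes match
extra = rng.choice(minority_idx, size=len(majority_idx) - len(minority_idx),
                   replace=True)
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
```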
What is bagging?
Bagging, short for Bootstrap Aggregating, is an ensemble machine learning technique that combines multiple models (often decision trees), each trained on a different subset of the training data obtained through bootstrapping (random sampling with replacement). It reduces variance and improves the accuracy and robustness of predictions by averaging or voting on the models' outputs. One of the best-known bagging algorithms is Random Forest, which applies this approach to decision trees.
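The bootstrap-then-average recipe can be sketched with simple straight-line fits as the base learner (np.polyfit on toy noisy data; real bagging would usually use trees):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2 * x + rng.normal(0, 0.3, size=50)  # noisy linear data, true slope 2

n_models = 25
preds = np.zeros((n_models, len(x)))
for m in range(n_models):
    idx = rng.choice(len(x), size=len(x), replace=True)  # bootstrap sample
    coefs = np.polyfit(x[idx], y[idx], deg=1)            # fit one base model
    preds[m] = np.polyval(coefs, x)

bagged_pred = preds.mean(axis=0)  # aggregate by averaging the models
```

Each bootstrap fit wobbles with its sample; the average is noticeably more stable, which is exactly the variance reduction bagging is after.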
What is boosting?
Boosting is an ensemble machine learning technique that combines multiple weak learners (typically shallow decision trees or other simple models) into a strong learner. Unlike bagging, boosting focuses on correcting the errors of previous models: instances misclassified in one iteration receive more weight, so subsequent models pay more attention to those cases. Popular boosting algorithms include AdaBoost, Gradient Boosting, and XGBoost, which build models iteratively and adaptively adjust weights to improve overall predictive accuracy.
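A from-scratch AdaBoost sketch on toy 1-D data, with decision stumps as the weak learners; the reweighting line is where misclassified points gain influence:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy data: class +1 for x > 0.5, with a few labels flipped as noise
x = rng.uniform(0, 1, 80)
y = np.where(x > 0.5, 1, -1)
y[:4] *= -1

def fit_stump(x, y, w):
    """Pick the threshold/polarity with the lowest weighted error."""
    best = (np.inf, None, None)
    for thr in np.unique(x):
        for pol in (1, -1):
            pred = np.where(x > thr, pol, -pol)
            err = w[pred != y].sum()
            if err < best[0]:
                best = (err, thr, pol)
    return best

w = np.full(len(x), 1 / len(x))   # start with uniform instance weights
stumps = []
for _ in range(10):
    err, thr, pol = fit_stump(x, y, w)
    err = max(err, 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak learner
    pred = np.where(x > thr, pol, -pol)
    w *= np.exp(-alpha * y * pred)          # upweight misclassified instances
    w /= w.sum()
    stumps.append((alpha, thr, pol))

# strong learner: sign of the weighted vote over all stumps
ensemble = np.sign(sum(a * np.where(x > t, p, -p) for a, t, p in stumps))
accuracy = (ensemble == y).mean()
```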
What is principal component analysis?
PCA is a technique to reduce the number of features in data while preserving important information. It identifies new axes (principal components) to simplify complex data.
How does principal component analysis work?
PCA works by finding new axes (principal components) that capture the most variance in data. It transforms data into a lower-dimensional space, reducing noise and redundancy while retaining important information.
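This can be sketched with NumPy's SVD on toy correlated 2-D data, where one axis carries almost all the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
# correlated 2-D data: the second feature is mostly a copy of the first
x1 = rng.normal(size=300)
X = np.column_stack([x1, x1 + rng.normal(scale=0.1, size=300)])

Xc = X - X.mean(axis=0)                 # centre the data first
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / (S**2).sum()         # fraction of variance per component

Z = Xc @ Vt[0]                          # 1-D projection onto the first PC
```

Here the first principal component captures nearly all the variance, so the 2-D data can be represented in one dimension with little information loss.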
How do machines learn mapping functions?
Machines learn mapping functions through a process called training. During training, machines use input data and known output labels to adjust internal parameters (weights and biases) in a way that minimizes the difference between predicted outputs and actual outputs. This optimization process helps the machine learn the mapping function that can accurately predict outputs for new, unseen inputs. Common techniques for training include gradient descent and backpropagation in neural networks.
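A minimal gradient-descent sketch of this process, learning the mapping y = 3x + 1 by minimising mean squared error on toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3 * x + 1                      # the mapping we want the machine to learn

w, b = 0.0, 0.0                    # internal parameters (weight and bias)
lr = 0.1                           # learning rate
for _ in range(500):
    pred = w * x + b
    grad_w = 2 * ((pred - y) * x).mean()   # gradient of MSE w.r.t. w
    grad_b = 2 * (pred - y).mean()         # gradient of MSE w.r.t. b
    w -= lr * grad_w
    b -= lr * grad_b
```

After training, w and b have converged close to the true values 3 and 1; backpropagation generalises this same update to the many layered parameters of a neural network.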
What is agglomerative clustering?
Agglomerative clustering is a type of hierarchical clustering that starts with each data point as its own cluster and iteratively merges the closest pairs of clusters.
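A minimal single-linkage sketch on six 1-D points in plain Python/NumPy (libraries such as scipy.cluster.hierarchy implement this properly):

```python
import numpy as np

# two well-separated 1-D groups
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])

# start with each point as its own cluster (stored as index lists)
clusters = [[i] for i in range(len(points))]

def single_link(a, b):
    """Cluster distance = distance between the closest pair of members."""
    return min(abs(points[i] - points[j]) for i in a for j in b)

# repeatedly merge the two closest clusters until two remain
while len(clusters) > 2:
    pairs = [(single_link(clusters[i], clusters[j]), i, j)
             for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    _, i, j = min(pairs)
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]
```

Stopping at two clusters is a choice; recording every merge instead yields the full hierarchy (dendrogram).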
What are latent variable models?
Latent variable models involve observed and hidden (latent) variables. They uncover underlying structures in data. Observed variables are directly measured, while latent variables are inferred from data to explain relationships. Examples include Factor Analysis, PCA, Latent Class Analysis, and Structural Equation Modeling, used to simplify complex data and find hidden factors.
What is classification?
Classification is a machine learning task where data is categorized into predefined classes or labels. It involves training a model on labeled examples to learn patterns and then using it to classify new, unlabeled data. Common applications include spam email detection, image classification, and sentiment analysis.
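As a toy illustration of train-then-classify, a 1-nearest-neighbour classifier on invented 1-D data (the feature values and spam/ham labels are made up for illustration):

```python
import numpy as np

# labeled training examples: a single numeric feature per email
X_train = np.array([1.0, 1.2, 0.8, 4.0, 4.2, 3.9])
y_train = np.array(["spam", "spam", "spam", "ham", "ham", "ham"])

def classify(x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    return y_train[np.abs(X_train - x).argmin()]

label = classify(1.1)  # a new, unlabeled input
```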
Why use K-means clustering?
K-means clustering is used in data analysis to partition data points into K distinct, non-overlapping clusters based on their similarity. It’s useful for various purposes:
Pattern Discovery: K-means helps discover hidden patterns or structures within data, making it easier to understand complex datasets.
Data Compression: By grouping similar data points, K-means can reduce the dimensionality of data, making it more manageable.
Anomaly Detection: It can help identify outliers or anomalies as they often don’t fit well into any cluster.
Recommendation Systems: K-means is used to segment users or items into clusters, aiding in recommendation algorithms.
Image Compression: In image processing, K-means can be applied to compress images by reducing the number of colors.
Market Segmentation: Businesses use K-means to segment customers into groups with similar behavior, aiding in targeted marketing.
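The assignment/update loop at the heart of K-means can be sketched in NumPy on two toy blobs:

```python
import numpy as np

rng = np.random.default_rng(0)
# two well-separated 2-D blobs
a = rng.normal([0, 0], 0.3, size=(50, 2))
b = rng.normal([4, 4], 0.3, size=(50, 2))
X = np.vstack([a, b])

k = 2
centroids = X[rng.choice(len(X), k, replace=False)]  # random initialisation
for _ in range(20):
    # assignment step: each point joins its nearest centroid
    dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    labels = dists.argmin(axis=1)
    # update step: each centroid moves to the mean of its assigned points
    centroids = np.array([
        X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
        for c in range(k)
    ])
```

K-means is sensitive to initialisation, so production implementations (e.g. scikit-learn's KMeans) run multiple restarts and smarter seeding such as k-means++.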
What is the target concept in concept learning?
The target concept is the specific idea or pattern we aim to discover or predict from data. It represents what we want to learn or infer from the available information. For example, in spam email detection, the target concept is distinguishing between spam and non-spam emails based on various characteristics.
What is a feature in terms of concept feature spaces?
In the context of concept feature spaces, a “feature” refers to a specific attribute or characteristic that is used to describe or represent an object or data point. Features are the individual properties or variables used to define the position or location of data points within the feature space. These features help differentiate and categorize objects, making them essential for various machine learning and data analysis tasks. For example, in image recognition, features could include pixel values, color histograms, or edge detection results, which collectively describe the visual characteristics of an image.