Extras Flashcards
Describe the difference and relationship between artificial intelligence, machine learning, and
deep learning.
Artificial Intelligence (AI): The broader field of machines mimicking human intelligence.
Machine Learning (ML): A subset of AI, where machines learn from data to perform tasks.
Deep Learning (DL): A subset of ML, using deep neural networks for complex data tasks.
What is bootstrapping?
Bootstrapping is a resampling technique in statistics and machine learning: samples are repeatedly drawn with replacement from a dataset to estimate population parameters or assess model stability. It creates many datasets from one, which is useful for generating confidence intervals or improving model accuracy.
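As a minimal sketch of the idea (toy normal data; NumPy assumed), the bootstrap distribution of the sample mean yields a percentile confidence interval:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=200)  # toy sample

n_boot = 1000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    # draw a sample of the same size, WITH replacement
    resample = rng.choice(data, size=len(data), replace=True)
    boot_means[i] = resample.mean()

# 95% confidence interval for the mean, from the bootstrap distribution
lo, hi = np.percentile(boot_means, [2.5, 97.5])
```

Each resample stands in for "another dataset we might have collected", which is what makes the interval meaningful.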
When does undersampling take place?
Undersampling is applied to imbalanced datasets to balance the class distribution by reducing the number of instances in the majority class, preventing bias in machine learning models.
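A minimal random-undersampling sketch on toy data (plain NumPy; dedicated libraries such as imbalanced-learn offer more refined strategies):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy imbalanced dataset: 90 majority (class 0), 10 minority (class 1)
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

majority_idx = np.where(y == 0)[0]
minority_idx = np.where(y == 1)[0]

# keep only as many majority rows as there are minority rows
keep = rng.choice(majority_idx, size=len(minority_idx), replace=False)
idx = np.concatenate([keep, minority_idx])
X_bal, y_bal = X[idx], y[idx]
```

The balanced set is much smaller, which is the main cost of undersampling.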
When does oversampling take place?
Oversampling is applied to imbalanced datasets to balance the class distribution by increasing the number of instances in the minority class, helping improve the performance of machine learning models.
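The mirror-image sketch for random oversampling, again on toy data with plain NumPy (methods like SMOTE synthesise new points instead of duplicating):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy imbalanced dataset: 90 majority (class 0), 10 minority (class 1)
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

majority_idx = np.where(y == 0)[0]
minority_idx = np.where(y == 1)[0]

# duplicate minority rows (sampled with replacement) until the classes match
extra = rng.choice(minority_idx, size=len(majority_idx) - len(minority_idx),
                   replace=True)
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
```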
What is bagging?
Bagging, short for Bootstrap Aggregating, is an ensemble machine learning technique that combines multiple models (often decision trees), each trained on a different subset of the training data obtained through bootstrapping (random sampling with replacement). It reduces variance and improves the accuracy and robustness of predictions by averaging or voting on the models' outputs. One of the best-known bagging algorithms is Random Forest, which applies this approach to decision trees.
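The bootstrap-then-average recipe can be sketched with simple straight-line fits as the base learner (np.polyfit on toy noisy data; real bagging would usually use trees):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2 * x + rng.normal(0, 0.3, size=50)  # noisy linear data, true slope 2

n_models = 25
preds = np.zeros((n_models, len(x)))
for m in range(n_models):
    idx = rng.choice(len(x), size=len(x), replace=True)  # bootstrap sample
    coefs = np.polyfit(x[idx], y[idx], deg=1)            # fit one base model
    preds[m] = np.polyval(coefs, x)

bagged_pred = preds.mean(axis=0)  # aggregate by averaging the models
```

Each bootstrap fit wobbles with its sample; the average is noticeably more stable, which is exactly the variance reduction bagging is after.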
What is boosting?
Boosting is an ensemble machine learning technique that combines multiple weak learners (typically shallow decision trees or other simple models) into a strong learner. Unlike bagging, boosting focuses on correcting the errors of previous models: instances misclassified in one iteration receive more weight, so subsequent models pay more attention to those cases. Popular boosting algorithms include AdaBoost, Gradient Boosting, and XGBoost, which build models iteratively and adaptively adjust weights to improve overall predictive accuracy.
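A from-scratch AdaBoost sketch on toy 1-D data, with decision stumps as the weak learners; the reweighting line is where misclassified points gain influence:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy data: class +1 for x > 0.5, with a few labels flipped as noise
x = rng.uniform(0, 1, 80)
y = np.where(x > 0.5, 1, -1)
y[:4] *= -1

def fit_stump(x, y, w):
    """Pick the threshold/polarity with the lowest weighted error."""
    best = (np.inf, None, None)
    for thr in np.unique(x):
        for pol in (1, -1):
            pred = np.where(x > thr, pol, -pol)
            err = w[pred != y].sum()
            if err < best[0]:
                best = (err, thr, pol)
    return best

w = np.full(len(x), 1 / len(x))   # start with uniform instance weights
stumps = []
for _ in range(10):
    err, thr, pol = fit_stump(x, y, w)
    err = max(err, 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak learner
    pred = np.where(x > thr, pol, -pol)
    w *= np.exp(-alpha * y * pred)          # upweight misclassified instances
    w /= w.sum()
    stumps.append((alpha, thr, pol))

# strong learner: sign of the weighted vote over all stumps
ensemble = np.sign(sum(a * np.where(x > t, p, -p) for a, t, p in stumps))
accuracy = (ensemble == y).mean()
```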
What is principal component analysis?
PCA is a technique to reduce the number of features in data while preserving important information. It identifies new axes (principal components) to simplify complex data.
How does principal component analysis work?
PCA works by finding new axes (principal components) that capture the most variance in data. It transforms data into a lower-dimensional space, reducing noise and redundancy while retaining important information.
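This can be sketched with NumPy's SVD on toy correlated 2-D data, where one axis carries almost all the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
# correlated 2-D data: the second feature is mostly a copy of the first
x1 = rng.normal(size=300)
X = np.column_stack([x1, x1 + rng.normal(scale=0.1, size=300)])

Xc = X - X.mean(axis=0)                 # centre the data first
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / (S**2).sum()         # fraction of variance per component

Z = Xc @ Vt[0]                          # 1-D projection onto the first PC
```

Here the first principal component captures nearly all the variance, so the 2-D data can be represented in one dimension with little information loss.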
How do machines learn mapping functions?
Machines learn mapping functions through a process called training. During training, machines use input data and known output labels to adjust internal parameters (weights and biases) in a way that minimizes the difference between predicted outputs and actual outputs. This optimization process helps the machine learn the mapping function that can accurately predict outputs for new, unseen inputs. Common techniques for training include gradient descent and backpropagation in neural networks.
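A minimal gradient-descent sketch of this process, learning the mapping y = 3x + 1 by minimising mean squared error on toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3 * x + 1                      # the mapping we want the machine to learn

w, b = 0.0, 0.0                    # internal parameters (weight and bias)
lr = 0.1                           # learning rate
for _ in range(500):
    pred = w * x + b
    grad_w = 2 * ((pred - y) * x).mean()   # gradient of MSE w.r.t. w
    grad_b = 2 * (pred - y).mean()         # gradient of MSE w.r.t. b
    w -= lr * grad_w
    b -= lr * grad_b
```

After training, w and b have converged close to the true values 3 and 1; backpropagation generalises this same update to the many layered parameters of a neural network.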
What is agglomerative clustering?
Agglomerative clustering is a type of hierarchical clustering that starts with each data point as its own cluster and iteratively merges the closest pairs of clusters.
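A minimal single-linkage sketch on six 1-D points in plain Python/NumPy (libraries such as scipy.cluster.hierarchy implement this properly):

```python
import numpy as np

# two well-separated 1-D groups
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])

# start with each point as its own cluster (stored as index lists)
clusters = [[i] for i in range(len(points))]

def single_link(a, b):
    """Cluster distance = distance between the closest pair of members."""
    return min(abs(points[i] - points[j]) for i in a for j in b)

# repeatedly merge the two closest clusters until two remain
while len(clusters) > 2:
    pairs = [(single_link(clusters[i], clusters[j]), i, j)
             for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    _, i, j = min(pairs)
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]
```

Stopping at two clusters is a choice; recording every merge instead yields the full hierarchy (dendrogram).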
What are latent variable models?
Latent variable models involve observed and hidden (latent) variables. They uncover underlying structures in data. Observed variables are directly measured, while latent variables are inferred from data to explain relationships. Examples include Factor Analysis, PCA, Latent Class Analysis, and Structural Equation Modeling, used to simplify complex data and find hidden factors.
What is classification?
Classification is a machine learning task where data is categorized into predefined classes or labels. It involves training a model on labeled examples to learn patterns and then using it to classify new, unlabeled data. Common applications include spam email detection, image classification, and sentiment analysis.
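As a toy illustration of train-then-classify, a 1-nearest-neighbour classifier on invented 1-D data (the feature values and spam/ham labels are made up for illustration):

```python
import numpy as np

# labeled training examples: a single numeric feature per email
X_train = np.array([1.0, 1.2, 0.8, 4.0, 4.2, 3.9])
y_train = np.array(["spam", "spam", "spam", "ham", "ham", "ham"])

def classify(x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    return y_train[np.abs(X_train - x).argmin()]

label = classify(1.1)  # a new, unlabeled input
```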
Why use K-means clustering?
K-means clustering is used in data analysis to partition data points into K distinct, non-overlapping clusters based on their similarity. It’s useful for various purposes:
Pattern Discovery: K-means helps discover hidden patterns or structures within data, making it easier to understand complex datasets.
Data Compression: By grouping similar data points, K-means can reduce the dimensionality of data, making it more manageable.
Anomaly Detection: It can help identify outliers or anomalies as they often don’t fit well into any cluster.
Recommendation Systems: K-means is used to segment users or items into clusters, aiding in recommendation algorithms.
Image Compression: In image processing, K-means can be applied to compress images by reducing the number of colors.
Market Segmentation: Businesses use K-means to segment customers into groups with similar behavior, aiding in targeted marketing.
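The assignment/update loop at the heart of K-means can be sketched in NumPy on two toy blobs:

```python
import numpy as np

rng = np.random.default_rng(0)
# two well-separated 2-D blobs
a = rng.normal([0, 0], 0.3, size=(50, 2))
b = rng.normal([4, 4], 0.3, size=(50, 2))
X = np.vstack([a, b])

k = 2
centroids = X[rng.choice(len(X), k, replace=False)]  # random initialisation
for _ in range(20):
    # assignment step: each point joins its nearest centroid
    dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    labels = dists.argmin(axis=1)
    # update step: each centroid moves to the mean of its assigned points
    centroids = np.array([
        X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
        for c in range(k)
    ])
```

K-means is sensitive to initialisation, so production implementations (e.g. scikit-learn's KMeans) run multiple restarts and smarter seeding such as k-means++.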
What is the target concept in concept learning?
The target concept is the specific idea or pattern we aim to discover or predict from data. It represents what we want to learn or infer from the available information. For example, in spam email detection, the target concept is distinguishing between spam and non-spam emails based on various characteristics.
What is a feature in terms of concept feature spaces?
In the context of concept feature spaces, a “feature” refers to a specific attribute or characteristic that is used to describe or represent an object or data point. Features are the individual properties or variables used to define the position or location of data points within the feature space. These features help differentiate and categorize objects, making them essential for various machine learning and data analysis tasks. For example, in image recognition, features could include pixel values, color histograms, or edge detection results, which collectively describe the visual characteristics of an image.