Feature engineering Flashcards
Feature engineering
Feature engineering is the process of transforming raw data into features that better represent the underlying problem to predictive models, resulting in improved model accuracy on unseen data. It is an essential step that requires a mix of domain knowledge, intuition, and some trial and error; when done well, it can significantly improve the performance of machine learning models.
- Definition
Feature engineering is a crucial step in the machine learning pipeline that involves creating new features or modifying existing features to improve machine learning model performance.
- Importance
The performance of machine learning models heavily depends on the quality of the features in the dataset. Even sophisticated models cannot compensate for irrelevant or uninformative features. Good feature engineering can often make the difference between a poor model and an excellent one.
- Domain Knowledge
Incorporating domain knowledge can help in creating features that make machine learning algorithms work better. By understanding the context of the problem, one can create relevant features that capture essential aspects of the problem.
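For instance, in a health-care setting, knowing that body mass index (BMI) relates to many outcomes suggests combining height and weight into a single feature. A minimal sketch, assuming a hypothetical dataset with `height_cm` and `weight_kg` columns:

```python
import pandas as pd

# Hypothetical health dataset; column names are illustrative.
df = pd.DataFrame({
    "height_cm": [170, 160, 182],
    "weight_kg": [72, 55, 95],
})

# Domain knowledge: BMI = weight (kg) / height (m)^2 combines two raw
# measurements into one feature known to be predictive of health outcomes.
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2
print(df)
```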
- Categorical Encoding
Many machine learning models require their input data to be in numerical format. Categorical variables (such as 'color' or 'city') are typically converted to numbers using techniques like one-hot encoding, label encoding, or target encoding.
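A minimal sketch of one-hot and label encoding with pandas (the columns are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "city": ["Paris", "Oslo", "Oslo"],
})

# One-hot encoding: one binary indicator column per category.
one_hot = pd.get_dummies(df, columns=["color", "city"])
print(one_hot)

# Label encoding: one integer code per category (the ordering is arbitrary,
# so this suits tree-based models better than linear ones).
df["color_code"] = df["color"].astype("category").cat.codes
print(df)
```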
- Handling Missing Values
Missing data is a common problem in real-world datasets. Techniques to handle missing data include imputation (filling missing values with statistical measures like mean or median) and creating an indicator feature to highlight when a value was missing.
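A minimal sketch of both techniques with pandas, using a hypothetical `income` column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [42000.0, np.nan, 58000.0, np.nan]})

# Create the indicator first, so the fact that a value was missing
# survives the imputation step.
df["income_was_missing"] = df["income"].isna().astype(int)

# Median imputation; it is more robust to outliers than the mean.
df["income"] = df["income"].fillna(df["income"].median())
print(df)
```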
- Feature Scaling
Certain machine learning algorithms, such as SVMs, k-nearest neighbors (KNN), neural networks, and linear or logistic regression trained with gradient descent or regularization, work best when the input features are on similar scales. Techniques like min-max scaling and standardization are used to scale the features.
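A minimal sketch of both scalers using scikit-learn:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Min-max scaling maps each feature to the [0, 1] range.
print(MinMaxScaler().fit_transform(X))

# Standardization centers each feature at 0 with unit variance.
print(StandardScaler().fit_transform(X))
```

In practice, fit the scaler on the training set only and reuse it to transform validation and test data, so no information leaks from held-out data.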
- Feature Transformation
Features can be transformed to better fit the assumptions of a machine learning algorithm. Common transformations include logarithmic, square-root, and power (e.g., squaring) transformations.
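A minimal sketch applying log and square-root transforms with NumPy (the `price` column is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [10.0, 100.0, 1000.0, 10000.0]})

# log1p computes log(1 + x): it compresses a right-skewed tail
# and handles zero values safely.
df["log_price"] = np.log1p(df["price"])

# Square root is a milder variance-stabilizing transform.
df["sqrt_price"] = np.sqrt(df["price"])
print(df)
```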
- Feature Selection
Feature selection involves selecting the most useful features to train your machine learning model. This can reduce overfitting, improve accuracy, and reduce training time. Methods include correlation coefficients, chi-square test, mutual information, and feature importance from tree-based models.
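A minimal sketch of one of these methods, using scikit-learn's SelectKBest with mutual information on the built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest mutual information with the target.
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(selector.get_support())  # boolean mask of the kept features
print(X.shape, "->", X_selected.shape)
```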
- Feature Extraction
Feature extraction reduces the dimensionality of high-dimensional data by deriving a smaller set of new features from the original ones. Techniques like Principal Component Analysis (PCA), t-SNE, and UMAP are used for feature extraction.
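A minimal sketch using scikit-learn's PCA on the built-in digits dataset:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # 64 pixel features per image

# Project onto the directions of maximum variance, keeping enough
# components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)
```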
- Time-Series Specific
In time-series problems, features are often engineered from date-time variables, such as hour of day, day of week, quarter of year, month, year, etc.
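A minimal sketch extracting such features with pandas datetime accessors:

```python
import pandas as pd

df = pd.DataFrame({"timestamp": pd.to_datetime([
    "2024-01-15 08:30", "2024-06-03 17:45", "2024-11-21 23:10",
])})

# Decompose the datetime into model-friendly numeric features.
df["hour"] = df["timestamp"].dt.hour
df["day_of_week"] = df["timestamp"].dt.dayofweek  # Monday = 0
df["month"] = df["timestamp"].dt.month
df["quarter"] = df["timestamp"].dt.quarter
df["year"] = df["timestamp"].dt.year
print(df)
```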