Feature Engineering Flashcards

1
Q

What are the 3 main goals of the analyze stage in machine learning?

A
  1. Understand the response variable and how it's structured: is it continuous or categorical?
  2. Explore the predictor variables.
  3. Feature engineering.
2
Q

What are the 3 general categories of feature engineering?

A

feature selection, feature extraction, feature transformation.

3
Q

Explain the concept of feature engineering and its importance in machine learning.

A

Feature engineering involves selecting, transforming, and creating relevant features from raw data to improve model performance. It plays a crucial role in enhancing model accuracy.

4
Q

What is feature engineering in machine learning?

A

Feature engineering is the process of using practical, statistical, and data science knowledge to select, transform, or extract characteristics, properties, and attributes from raw data for the construction of machine learning models.

5
Q

How does feature engineering improve a model's performance?

A
  • By resolving issues with the structure of the data.
  • Well-structured data helps the model detect predictive signals more easily.
6
Q

Is feature engineering dependent on the type of data used?

A

Yes, the process of feature engineering is highly dependent on the type of data you’re working with.

7
Q

What is the difference between feature engineering and Exploratory Data Analysis (EDA)?

A
  • Feature engineering goes a step beyond EDA.
  • EDA involves exploring the data.
  • Feature engineering involves selecting, extracting, or transforming variables or features from datasets for the construction of machine learning models.
8
Q

What is the goal of feature selection in feature engineering?

A

To select the features in the data that contribute the most to predicting your response variable. This usually involves dropping features that do not help in making a prediction.

9
Q

What does feature transformation involve in machine learning?

A
  • Modifying existing features to improve accuracy when training the model.
  • This could involve changing data from numerical to categorical, or creating new categories based on the data.
10
Q

Can you give an example of feature transformation?

A

Suppose your data includes exact temperatures, but you only need a feature that indicates whether it's hot, cold, or temperate. To make that transformation, you could define cutoff points for the data, such as classifying anything above 80°F as hot, anything below 70°F as cold, and anything in between as temperate.
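The cutoffs described above can be sketched with `pandas.cut` (column names and example values are illustrative assumptions):

```python
import pandas as pd

# Hypothetical exact-temperature readings in °F (illustrative values)
df = pd.DataFrame({"temp_f": [65, 72, 85, 90, 68]})

# Apply the cutoffs described above: below 70°F is cold,
# above 80°F is hot, and anything in between is temperate
df["temp_category"] = pd.cut(
    df["temp_f"],
    bins=[-float("inf"), 70, 80, float("inf")],
    labels=["cold", "temperate", "hot"],
)
```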

11
Q

What is feature extraction in machine learning?

A

Taking multiple features and combining them into a new one to improve the accuracy of the algorithm.

For example, creating a new variable that becomes true if the temperature is warm and the humidity is high, and false otherwise.
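The warm-and-humid example could be expressed in pandas like this (column names and cutoffs are assumptions, not from the source):

```python
import pandas as pd

# Hypothetical weather observations; column names and cutoffs are assumptions
df = pd.DataFrame({"temp_f": [85, 60, 82], "humidity": [0.9, 0.8, 0.3]})

# Extract a new boolean feature that is True only when it is
# both warm (above 80°F) and humid (above 0.7)
df["warm_and_humid"] = (df["temp_f"] > 80) & (df["humidity"] > 0.7)
```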

12
Q

How does feature extraction benefit machine learning models?

A

By creating new features that capture important information in a format that's more understandable for the model, potentially leading to improved accuracy.

13
Q

What is Feature Selection in machine learning?

A

Feature Selection is the process of picking variables from a dataset that will be used as predictor variables for a model.

The goal is to find the predictive and interactive features, exclude redundant and irrelevant features, and thereby improve model performance.

14
Q

What are the three types of features in Feature Selection?

A
  • Predictive
  • Interactive
  • Irrelevant
15
Q

How does Feature Selection fit into the PACE workflow?

A

Feature Selection occurs at multiple stages of the PACE workflow:

  • Plan phase: define your problem and decide on a target variable to predict.
  • Analyze phase: after exploratory data analysis, it might become clear that some features are not suitable for modeling.
  • Construct phase: find the smallest set of predictive features that still results in good overall model performance.
16
Q

Why is Feature Selection important in model simplicity and explainability?

A
  • Models with fewer but more predictive features are simpler; simpler models are generally more stable and easier to understand.
  • Hence, data professionals often base final model selection not solely on score, but also on model simplicity and explainability.
17
Q

How do data professionals perform Feature Selection during the Construct phase?

A

They typically use statistical methodologies to determine which features to keep and which to drop. This could be as simple as ranking the model’s feature importance and keeping only the top percentage of them, or keeping the top features that account for a certain percentage of the model’s predictive signal.
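As a minimal sketch of the "rank feature importances and keep the top ones" approach, using a random forest on a synthetic dataset (the dataset, model choice, and top-3 cutoff are assumptions for illustration):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy dataset: 10 candidate features, only a few carry real signal
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(10)])

model = RandomForestClassifier(random_state=0).fit(X, y)

# Rank features by the model's importance scores and keep the top 3
importances = pd.Series(model.feature_importances_, index=X.columns)
top_features = importances.sort_values(ascending=False).head(3).index.tolist()
```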

18
Q

Predictive Feature

A

Features that by themselves contain information useful to predict the target

19
Q

Interactive Feature

A

Features that are not useful by themselves to predict the target variable, but become predictive in conjunction with other features

20
Q

Irrelevant Feature

A

Features that don’t contain any useful information to predict the target

21
Q

What is a redundant feature in machine learning?

A

A redundant feature is a feature that is highly correlated with other features and therefore does not provide the model with any new information.
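One common way to spot redundant features is a correlation matrix. A small sketch, using a temperature stored in both °F and °C as the redundant pair (all values and column names are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
temp_f = rng.uniform(50, 100, size=100)

df = pd.DataFrame({
    "temp_f": temp_f,
    "temp_c": (temp_f - 32) * 5 / 9,   # same information: redundant
    "humidity": rng.uniform(0, 1, size=100),
})

# A correlation matrix exposes the redundancy: temp_f and temp_c
# are perfectly correlated, so one of them adds no new information
corr = df.corr()
```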

22
Q

Can predictive features also be redundant and when?

A

Yes, predictive features can also be redundant if they are highly correlated with other features.

23
Q

What is the goal of feature selection?

A

The goal of feature selection is to find the predictive and interactive features while excluding redundant and irrelevant features.

24
Q

What is feature transformation?

A

Feature transformation is a process where you take existing features in the dataset, and alter them so that they’re better suited to be used for training the model.

25
Q

What is log normalization and when is it used?

A

Log normalization is a type of feature transformation used to handle continuous variables with skewed distributions. It involves taking the log of a skewed feature, reducing the skew and making the data better for modeling.
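A minimal sketch of log normalization with NumPy (the example values are assumptions; `log1p` is used rather than plain `log` so zeros are handled safely):

```python
import numpy as np

# A heavily right-skewed feature (e.g. purchase amounts)
skewed = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])

# log1p computes log(1 + x), which compresses the large values
# while preserving their order, reducing the skew
log_feature = np.log1p(skewed)
```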

26
Q

What is scaling in the context of feature transformation?

A

Scaling is a type of feature transformation where you adjust the range of a feature’s values by applying a normalization function to them. This helps prevent features with very large values from having undue influence over a model compared to features with smaller values but which may be equally important as predictors.

27
Q

What is the difference between normalization and standardization in feature scaling?

A
  • Normalization transforms data to fall within the range [0, 1].
  • Standardization transforms each value within a feature so that they collectively have a mean of zero and a standard deviation of one.
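The two approaches correspond to scikit-learn's `MinMaxScaler` and `StandardScaler` (a small sketch on a made-up single-column array):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0]])

# Normalization: rescale values into the range [0, 1]
X_norm = MinMaxScaler().fit_transform(X)

# Standardization: rescale to mean 0 and standard deviation 1
X_std = StandardScaler().fit_transform(X)
```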
28
Q

What is encoding in feature transformation?

A

Encoding is a form of feature transformation that involves the process of converting categorical data to numerical data. This enables machine learning models to interpret and process the categorical data.
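One common encoding technique is one-hot encoding, sketched here with pandas (the `color` column is an illustrative assumption):

```python
import pandas as pd

# A categorical feature a model cannot use directly
df = pd.DataFrame({"color": ["red", "blue", "red", "green"]})

# One-hot encoding converts it into numeric indicator columns
encoded = pd.get_dummies(df, columns=["color"])
```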

29
Q

What is the benefit of scaling in feature transformation?

A

This helps prevent features with very large values from having undue influence over a model compared to features with smaller values but which may be equally important as predictors.

30
Q

What is feature extraction in machine learning?

A
  • Producing new features from existing ones, rather than simply changing ones that already exist,
  • with the goal of having features that deliver more predictive power to your model.
31
Q

How can feature extraction improve a machine learning model?

A
  • by providing more relevant and insightful data for the model to learn from.
  • By transforming or combining existing features, we might reveal hidden relationships or patterns that help the model make more accurate predictions.
32
Q

Can you provide an example of feature extraction?

A

Yes, consider a feature called “Date of Last Purchase,” which contains information about when a customer last purchased something from the company. Instead of giving the model raw dates, a new feature can be extracted called “Days Since Last Purchase.” This could tell the model how long it has been since a customer has bought something, giving insight into the likelihood that they’ll buy something again in the future.
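The "Days Since Last Purchase" extraction can be sketched in pandas (the dates and reference date are illustrative assumptions):

```python
import pandas as pd

# Hypothetical raw purchase dates; the reference date is an assumption
df = pd.DataFrame({
    "date_of_last_purchase": pd.to_datetime(["2024-01-01", "2024-03-15"]),
})

# Extract a numeric feature the model can use directly
reference_date = pd.Timestamp("2024-04-01")
df["days_since_last_purchase"] = (
    reference_date - df["date_of_last_purchase"]
).dt.days
```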

33
Q

Can features be extracted from multiple variables?

A

Yes, features can also be extracted from multiple variables. For example, there are two variables: “Days Since Last Purchase” and “Price of Last Purchase.” A new variable could be created from these by dividing the price by the number of days since the last purchase, creating a new variable altogether.
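The two-variable extraction above can be sketched like this (values and the derived column name are illustrative):

```python
import pandas as pd

# Two existing features (illustrative values)
df = pd.DataFrame({
    "days_since_last_purchase": [10, 5],
    "price_of_last_purchase": [50.0, 20.0],
})

# Combine them into a new feature: price divided by days since purchase
df["price_per_day"] = (
    df["price_of_last_purchase"] / df["days_since_last_purchase"]
)
```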

34
Q

What is the key difference between feature selection, feature transformation and feature extraction?

A

Feature Selection = dropping any and all unnecessary or unwanted features from the dataset
Feature Transformation = editing features into a form where they’re better for training the model
Feature Extraction = creating brand new features from other features that already exist in the dataset

35
Q

Why is feature transformation important to consider?

A

Many types of models are designed in a way that requires the data coming in to be numerical.
So, transforming categorical features into numerical features is an important step.

36
Q

How do you use a scaler?

A

You must fit the scaler to the training data, then transform both the training data and the test data using that same fitted scaler.

# Import the scaler function
from sklearn.preprocessing import MinMaxScaler

# Instantiate the scaler
scaler = MinMaxScaler()

# Fit the scaler to the training data only
scaler.fit(X_train)

# Scale the training data
X_train = scaler.transform(X_train)

# Scale the test data using the same fitted scaler
X_test = scaler.transform(X_test)