Feature Engineering Flashcards

1
Q

What are the 3 main goals of the analyze stage in machine learning?

A
  1. Understand the response variable and how it's structured: is it continuous or categorical?
  2. Explore the predictor variables.
  3. Feature engineering.
2
Q

What are the 3 general categories of feature engineering?

A

feature selection, feature extraction, feature transformation.

3
Q

Explain the concept of feature engineering and its importance in machine learning.

A

Feature engineering involves selecting, transforming, and creating relevant features from raw data to improve model performance. It plays a crucial role in enhancing model accuracy.

4
Q

What is feature engineering in machine learning?

A

Feature engineering is the process of using practical, statistical, and data science knowledge to select, transform, or extract characteristics, properties, and attributes from raw data for the construction of machine learning models.

5
Q

How does feature engineering improve a model's performance?

A
  • By resolving issues with the structure of the data.
  • Well-structured data helps the model detect predictive signals more easily.
6
Q

Is feature engineering dependent on the type of data used?

A

Yes, the process of feature engineering is highly dependent on the type of data you’re working with.

7
Q

What is the difference between feature engineering and Exploratory Data Analysis (EDA)?

A
  • Feature engineering goes a step beyond EDA.
  • EDA involves exploring the data.
  • Feature engineering involves selecting, extracting, or transforming variables or features from datasets for the construction of machine learning models.
8
Q

What is the goal of feature selection in feature engineering?

A

To select the features in the data that contribute the most to predicting your response variable. This usually involves dropping features that do not help in making a prediction.

9
Q

What does feature transformation involve in machine learning?

A
  • Modifying existing features to improve accuracy when training the model.
  • This could involve changing data from numerical to categorical, or creating new categories based on the data.
10
Q

Can you give an example of feature transformation?

A

Suppose your data includes exact temperatures, but you only need a feature that indicates whether it's hot, cold, or temperate. To make that transformation, you could define cutoff points for the data, such as classifying anything above 80°F as hot, anything below 70°F as cold, and anything in between as temperate.
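The cutoffs described above can be sketched with `pandas.cut` (column names and example values are illustrative assumptions):

```python
import pandas as pd

# Hypothetical exact-temperature readings in °F (illustrative values)
df = pd.DataFrame({"temp_f": [65, 72, 85, 90, 68]})

# Apply the cutoffs described above: below 70°F is cold,
# above 80°F is hot, and anything in between is temperate
df["temp_category"] = pd.cut(
    df["temp_f"],
    bins=[-float("inf"), 70, 80, float("inf")],
    labels=["cold", "temperate", "hot"],
)
```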

11
Q

What is feature extraction in machine learning?

A

Taking multiple features and combining them into a new one to improve the accuracy of the algorithm.

For example, creating a new variable that becomes true if the temperature is warm and the humidity is high, and false otherwise.
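The warm-and-humid example could be expressed in pandas like this (column names and cutoffs are assumptions, not from the source):

```python
import pandas as pd

# Hypothetical weather observations; column names and cutoffs are assumptions
df = pd.DataFrame({"temp_f": [85, 60, 82], "humidity": [0.9, 0.8, 0.3]})

# Extract a new boolean feature that is True only when it is
# both warm (above 80°F) and humid (above 0.7)
df["warm_and_humid"] = (df["temp_f"] > 80) & (df["humidity"] > 0.7)
```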

12
Q

How does feature extraction benefit machine learning models?

A

By creating new features that capture important information in a format that's more understandable for the model, potentially leading to improved accuracy.

13
Q

What is Feature Selection in machine learning?

A

Feature Selection is the process of picking variables from a dataset that will be used as predictor variables for a model.

The goal is to find the predictive and interactive features, exclude redundant and irrelevant features, and thereby improve model performance.

14
Q

What are the three types of features in Feature Selection?

A
  • Predictive
  • Interactive
  • Irrelevant
15
Q

How does Feature Selection fit into the PACE workflow?

A

Feature Selection occurs at multiple stages of the PACE workflow:

  • Plan phase: define your problem and decide on a target variable to predict.
  • Analyze phase: after exploratory data analysis, it might become clear that some features are not suitable for modeling.
  • Construct phase: find the smallest set of predictive features that still results in good overall model performance.
16
Q

Why is Feature Selection important in model simplicity and explainability?

A
  • Models with fewer but more predictive features are simpler; simpler models are generally more stable and easier to understand.
  • Hence, data professionals often base final model selection not solely on score, but also on model simplicity and explainability.
17
Q

How do data professionals perform Feature Selection during the Construct phase?

A

They typically use statistical methodologies to determine which features to keep and which to drop. This could be as simple as ranking the model’s feature importance and keeping only the top percentage of them, or keeping the top features that account for a certain percentage of the model’s predictive signal.
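As a minimal sketch of the "rank feature importances and keep the top ones" approach, using a random forest on a synthetic dataset (the dataset, model choice, and top-3 cutoff are assumptions for illustration):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy dataset: 10 candidate features, only a few carry real signal
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(10)])

model = RandomForestClassifier(random_state=0).fit(X, y)

# Rank features by the model's importance scores and keep the top 3
importances = pd.Series(model.feature_importances_, index=X.columns)
top_features = importances.sort_values(ascending=False).head(3).index.tolist()
```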

18
Q

Predictive Feature

A

Features that by themselves contain information useful to predict the target

19
Q

Interactive Feature

A

Features that are not useful by themselves to predict the target variable, but become predictive in conjunction with other features

20
Q

Irrelevant Feature

A

Features that don’t contain any useful information to predict the target

21
Q

What is a redundant feature in machine learning?

A

A redundant feature is a feature that is highly correlated with other features and therefore does not provide the model with any new information.
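One common way to spot redundant features is a correlation matrix. A small sketch, using a temperature stored in both °F and °C as the redundant pair (all values and column names are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
temp_f = rng.uniform(50, 100, size=100)

df = pd.DataFrame({
    "temp_f": temp_f,
    "temp_c": (temp_f - 32) * 5 / 9,   # same information: redundant
    "humidity": rng.uniform(0, 1, size=100),
})

# A correlation matrix exposes the redundancy: temp_f and temp_c
# are perfectly correlated, so one of them adds no new information
corr = df.corr()
```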

22
Q

Can predictive features also be redundant and when?

A

Yes, predictive features can also be redundant if they are highly correlated with other features.

23
Q

What is the goal of feature selection?

A

The goal of feature selection is to find the predictive and interactive features while excluding redundant and irrelevant features.

24
Q

What is feature transformation?

A

Feature transformation is a process where you take existing features in the dataset, and alter them so that they’re better suited to be used for training the model.

25
Q

What is log normalization and when is it used?

A

Log normalization is a type of feature transformation used to handle continuous variables with skewed distributions. It involves taking the log of a skewed feature, reducing the skew and making the data better for modeling.
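A minimal sketch of log normalization with NumPy (the example values are assumptions; `log1p` is used rather than plain `log` so zeros are handled safely):

```python
import numpy as np

# A heavily right-skewed feature (e.g. purchase amounts)
skewed = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])

# log1p computes log(1 + x), which compresses the large values
# while preserving their order, reducing the skew
log_feature = np.log1p(skewed)
```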

26
Q

What is scaling in the context of feature transformation?

A

Scaling is a type of feature transformation where you adjust the range of a feature’s values by applying a normalization function to them. This helps prevent features with very large values from having undue influence over a model compared to features with smaller values but which may be equally important as predictors.

27
Q

What is the difference between normalization and standardization in feature scaling?

A
  • Normalization transforms data to fall within the range [0, 1].
  • Standardization transforms each value within a feature so that they collectively have a mean of zero and a standard deviation of one.
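The two approaches correspond to scikit-learn's `MinMaxScaler` and `StandardScaler` (a small sketch on a made-up single-column array):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0]])

# Normalization: rescale values into the range [0, 1]
X_norm = MinMaxScaler().fit_transform(X)

# Standardization: rescale to mean 0 and standard deviation 1
X_std = StandardScaler().fit_transform(X)
```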
28
Q

What is encoding in feature transformation?

A

Encoding is a form of feature transformation that involves the process of converting categorical data to numerical data. This enables machine learning models to interpret and process the categorical data.
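One common encoding technique is one-hot encoding, sketched here with pandas (the `color` column is an illustrative assumption):

```python
import pandas as pd

# A categorical feature a model cannot use directly
df = pd.DataFrame({"color": ["red", "blue", "red", "green"]})

# One-hot encoding converts it into numeric indicator columns
encoded = pd.get_dummies(df, columns=["color"])
```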

29
Q

What is the benefit of scaling in feature transformation?

A

This helps prevent features with very large values from having undue influence over a model compared to features with smaller values but which may be equally important as predictors.

30
Q

What is feature extraction in machine learning?

A
  • Producing new features from existing ones, rather than simply changing ones that already exist,
  • with the goal of having features that deliver more predictive power to your model.
31
Q

How can feature extraction improve a machine learning model?

A
  • by providing more relevant and insightful data for the model to learn from.
  • By transforming or combining existing features, we might reveal hidden relationships or patterns that help the model make more accurate predictions.
32
Q

Can you provide an example of feature extraction?

A

Yes, consider a feature called “Date of Last Purchase,” which contains information about when a customer last purchased something from the company. Instead of giving the model raw dates, a new feature can be extracted called “Days Since Last Purchase.” This could tell the model how long it has been since a customer has bought something, giving insight into the likelihood that they’ll buy something again in the future.
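The "Days Since Last Purchase" extraction can be sketched in pandas (the dates and reference date are illustrative assumptions):

```python
import pandas as pd

# Hypothetical raw purchase dates; the reference date is an assumption
df = pd.DataFrame({
    "date_of_last_purchase": pd.to_datetime(["2024-01-01", "2024-03-15"]),
})

# Extract a numeric feature the model can use directly
reference_date = pd.Timestamp("2024-04-01")
df["days_since_last_purchase"] = (
    reference_date - df["date_of_last_purchase"]
).dt.days
```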

33
Q

Can features be extracted from multiple variables?

A

Yes, features can also be extracted from multiple variables. For example, there are two variables: “Days Since Last Purchase” and “Price of Last Purchase.” A new variable could be created from these by dividing the price by the number of days since the last purchase, creating a new variable altogether.
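The two-variable extraction above can be sketched like this (values and the derived column name are illustrative):

```python
import pandas as pd

# Two existing features (illustrative values)
df = pd.DataFrame({
    "days_since_last_purchase": [10, 5],
    "price_of_last_purchase": [50.0, 20.0],
})

# Combine them into a new feature: price divided by days since purchase
df["price_per_day"] = (
    df["price_of_last_purchase"] / df["days_since_last_purchase"]
)
```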

34
Q

What is the key difference between feature selection, feature transformation and feature extraction?

A

Feature Selection = dropping any and all unnecessary or unwanted features from the dataset
Feature Transformation = editing features into a form where they’re better for training the model
Feature Extraction = creating brand new features from other features that already exist in the dataset

35
Q

Why is feature transformation important to consider?

A

Many types of models are designed in a way that requires the data coming in to be numerical.
So, transforming categorical features into numerical features is an important step.

36
Q

How do you use a scaler?

A

You must fit the scaler to the training data, then transform both the training data and the test data using that same fitted scaler.

# Import the scaler function
from sklearn.preprocessing import MinMaxScaler

# Instantiate the scaler
scaler = MinMaxScaler()

# Fit the scaler to the training data only
scaler.fit(X_train)

# Scale the training data
X_train = scaler.transform(X_train)

# Scale the test data using the same fitted scaler
X_test = scaler.transform(X_test)