Chapter 1 Continued Flashcards
What is the purpose of data transformation?
Data transformation aims to map the values of an attribute to new replacement values.
What are the common techniques used in data transformation?
Common techniques for data transformation include data normalization, standardization, and conversion.
Why might you want to combine attributes during data transformation?
Combining attributes can create more useful ratios or relationships between them.
How does scaling data benefit data mining algorithms?
Scaling attributes to the same approximate scale improves the performance of many data mining algorithms and results in better models.
What is the purpose of data conversion in data transformation?
Data conversion can involve converting categorical data to numeric values and discretizing continuous data, making it more intuitive and improving algorithm performance.
Why is feature scaling important in data mining?
Feature scaling is essential in data mining because variables with widely varying ranges can lead to biases in the results, favoring attributes with larger ranges.
What is the primary goal of feature scaling in data mining?
The main objective of feature scaling is to ensure that all variables or features are within the same scale, preventing attributes with large ranges from dominating those with smaller ranges.
What is feature scaling?
Feature scaling is the process of transforming the values of variables so that all features lie on a comparable scale.
What are the two common methods of feature scaling?
Normalization and standardization are two common methods of feature scaling.
What is normalization?
Normalization scales the values of a feature to a range between 0 and 1.
What is standardization?
Standardization scales the values to have a mean of 0 and a standard deviation of 1.
When is normalization useful?
Normalization is useful when the distribution of the feature is not Gaussian.
When is standardization useful?
Standardization is useful when the distribution of the feature is Gaussian.
Why are normalization and standardization used?
Both techniques are used to improve the performance of machine learning algorithms by ensuring that all features have equal importance.
How is Min-Max Normalization calculated for a value like $73,000 in the income range of $12,000 to $98,000?
To normalize $73,000 using Min-Max Normalization, you calculate it as (73,000 - 12,000) / (98,000 - 12,000), which results in approximately 0.709.
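A minimal sketch of the calculation above; the income range and the $73,000 value come from the card, while the function name `min_max_normalize` is just for illustration:

```python
def min_max_normalize(x, x_min, x_max):
    """Scale x into [0, 1] relative to the observed min and max."""
    return (x - x_min) / (x_max - x_min)

# Income example from the card: $73,000 in the range $12,000..$98,000
scaled = min_max_normalize(73_000, 12_000, 98_000)
print(round(scaled, 3))  # 0.709
```

Note that the minimum of the range maps to 0 and the maximum maps to 1, which is exactly the [0, 1] target range described earlier.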
What is the formula for Z-Score Standardization?
The formula for Z-Score Standardization is (x - mean) / standard deviation (sd), where x represents the data point.
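The Z-Score formula can be sketched as follows; the sample incomes are hypothetical, chosen only to show that the standardized values end up with mean 0 and standard deviation 1:

```python
import statistics

def z_score(x, mean, sd):
    """Standardize x: distance from the mean in standard-deviation units."""
    return (x - mean) / sd

# Hypothetical income sample (illustrative values only)
incomes = [12_000, 40_000, 55_000, 73_000, 98_000]
mu = statistics.mean(incomes)
sd = statistics.stdev(incomes)

standardized = [z_score(x, mu, sd) for x in incomes]
```

After standardization, the values are centered at 0, so a positive z-score means the data point lies above the mean and a negative one below it.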
What is data conversion?
Changing data from one format to another
What are some DM techniques that can handle categorical variables without transforming them?
Naïve Bayes and decision tree
Other techniques (such as neural nets and regression) require only numeric inputs.
What is data conversion encoding?
Mapping categorical values to numeric codes, e.g., ordinal to numeric
How is a single categorical variable with m categories typically transformed?
m-1 dummy variables
Why is data conversion important?
To make data usable across different systems or applications
Why do we need to convert nominal fields into numeric values for techniques like neural nets and regression?
These techniques require only numeric inputs
How can ordinal data be converted to numbers?
By assigning numbers that preserve the natural order of the categories
What values do the dummy variables take?
0 or 1
What is the purpose of using dummy variables when converting a categorical predictor with k ≥ 3 possible values?
To create k - 1 dummy variables and use the unassigned category as the reference category.
What is an example of converting ordinal data to numbers?
Grade: A → 4.0, A- → 3.7, B+ → 3.3, B- → 3.0
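The grade mapping above can be sketched as a simple lookup table; the point values come directly from the card, and because higher grades get higher numbers, the natural order is preserved:

```python
# Ordinal-to-numeric mapping from the card; ordering is preserved
grade_points = {"A": 4.0, "A-": 3.7, "B+": 3.3, "B-": 3.0}

grades = ["A-", "B+", "A"]
numeric = [grade_points[g] for g in grades]
print(numeric)  # [3.7, 3.3, 4.0]
```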
What does a value of 1 represent in a dummy variable?
The "yes" category
What is discretization?
Transforming continuous attributes into discrete ones
In data mining, many learning methods, such as association rules, can handle only discrete attributes.
What is the problem when creating dummy variables for nominal to numeric data conversion with many values?
Too many variables
How many dummy variables can be created for a nominal variable with few values?
One fewer than the number of categories; for example, a nominal variable with three values needs two dummy variables
Why is discretization necessary in data mining?
To handle learning methods that can only handle discrete attributes
What is the solution to reduce the number of variables when converting nominal to numeric data with many values?
Combine similar values and create dummy variables for the new values
Can you give an example of discretization?
Transforming the age attribute into child and adult categories
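The age example above can be sketched as a two-category discretization; the cutoff of 18 is an assumption, since the card does not state where "child" ends and "adult" begins:

```python
def age_group(age, adult_cutoff=18):
    """Discretize a continuous age into two categories (cutoff is an assumption)."""
    return "child" if age < adult_cutoff else "adult"

ages = [7, 17, 18, 42]
print([age_group(a) for a in ages])  # ['child', 'child', 'adult', 'adult']
```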
Example: Color = Red, Yellow, or Green.
What does it mean if C_red = 0 and C_yellow = 0?
The color is green
What is the purpose of C_red dummy variable?
To represent if the color is red or not
What is the purpose of C_yellow dummy variable?
To represent if the color is yellow or not
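The Color example above can be sketched as follows; the names C_red and C_yellow follow the cards, and Green serves as the reference category encoded as (0, 0):

```python
def encode_color(color):
    """Map Red/Yellow/Green to (C_red, C_yellow); Green is the reference (0, 0)."""
    c_red = 1 if color == "Red" else 0
    c_yellow = 1 if color == "Yellow" else 0
    return (c_red, c_yellow)

for c in ["Red", "Yellow", "Green"]:
    print(c, encode_color(c))
# Red -> (1, 0), Yellow -> (0, 1), Green -> (0, 0)
```

With three categories, only 3 - 1 = 2 dummy variables are needed, since the reference category is identified by both dummies being 0.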
How many common methods are there for binning numerical predictors?
Four
What is the purpose of binning numerical variables?
To discretize data into categories
What are the 3 categories created using equal width binning?
Low, Medium, High
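Equal-width binning can be sketched as below; the function name and the sample values are illustrative, and each bin spans an equal slice of the value range:

```python
def equal_width_bins(values, k=3, labels=("Low", "Medium", "High")):
    """Split the range of `values` into k equal-width bins and label each value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    binned = []
    for v in values:
        idx = min(int((v - lo) / width), k - 1)  # clamp the max value into the last bin
        binned.append(labels[idx])
    return binned

print(equal_width_bins([1, 2, 5, 6, 9, 10]))
# ['Low', 'Low', 'Medium', 'Medium', 'High', 'High']
```

Because the bins are defined by the value range rather than by the number of records, equal-width bins can end up with very different record counts when the data are skewed.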