Lecture 8 (done) Flashcards
Data transformation
The data are transformed or consolidated into forms appropriate for mining, so that:
1-The resulting mining process may be more efficient.
2-The patterns found may be easier to understand.
Data Transformation Strategies:
1-Attribute construction
2-Aggregation
3-Normalization
4-Discretization
Attribute construction (or feature construction)
1-New attributes are constructed and added from the given set of attributes to help the mining process.
2-Can help improve accuracy and understanding of structure in high dimensional data.
For example, we may wish to add the attribute area based on the attributes height and width.
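The area example can be sketched in a few lines of Python. This is only an illustration: the record layout (a list of dicts) and the helper name add_area are assumptions, not something taken from the lecture.

```python
def add_area(records):
    """Return new records with a constructed 'area' attribute.

    Assumes each record is a dict with numeric 'height' and 'width'
    (a hypothetical layout chosen for this sketch).
    """
    return [{**r, "area": r["height"] * r["width"]} for r in records]


rectangles = [
    {"height": 2.0, "width": 3.0},
    {"height": 4.0, "width": 0.5},
]
with_area = add_area(rectangles)  # original records are left unchanged
```

The new attribute is derived only from existing attributes, so it adds no new information but may make a pattern easier for a mining algorithm to find.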
Aggregation
Summary or aggregation operations are applied to the data.
Typically used in constructing a data cube for data analysis at multiple abstraction levels.
For example, the daily sales data may be aggregated to compute monthly and annual total amounts.
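A minimal sketch of the daily-to-monthly-and-annual roll-up, assuming the daily sales arrive as (date string, amount) pairs with dates written as YYYY-MM-DD. The function name aggregate_sales and the sample figures are made up for illustration.

```python
from collections import defaultdict


def aggregate_sales(daily_sales):
    """Sum daily amounts into monthly and annual totals.

    daily_sales: iterable of ("YYYY-MM-DD", amount) pairs (assumed format).
    Returns (monthly, annual) dicts keyed by (year, month) and year.
    """
    monthly = defaultdict(float)
    annual = defaultdict(float)
    for date, amount in daily_sales:
        year, month, _day = date.split("-")
        monthly[(year, month)] += amount
        annual[year] += amount
    return dict(monthly), dict(annual)


# Illustrative data only.
daily_sales = [
    ("2023-01-05", 100.0),
    ("2023-01-20", 50.0),
    ("2023-02-01", 25.0),
    ("2024-01-01", 10.0),
]
monthly_totals, annual_totals = aggregate_sales(daily_sales)
```

Each level (day, month, year) corresponds to one abstraction level of the kind a data cube would expose.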
Normalization:
1-The attribute data are scaled so as to fall within a smaller range, such as [-1, 1] or [0.0, 1.0].
2-Helps avoid dependence on the choice of measurement units.
3-Normalizing the data attempts to give all attributes an equal weight.
Data Normalization Methods:
1-min-max normalization
2-z-score normalization
3-normalization by decimal scaling
Min-Max Normalization
slide 7
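The slide itself is not reproduced in these notes. As a sketch, the commonly used min-max formula maps a value v to (v - min) / (max - min) * (new_max - new_min) + new_min. The function name and the sample values below are assumptions for illustration.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        raise ValueError("all values are equal; the range is zero")
    span = old_max - old_min
    return [(v - old_min) / span * (new_max - new_min) + new_min for v in values]


incomes = [12000, 73600, 98000]           # illustrative values
scaled = min_max_normalize(incomes)        # target range [0.0, 1.0]
centered = min_max_normalize([0, 5, 10], -1.0, 1.0)  # target range [-1, 1]
```

Min-max normalization preserves the relationships among the original values, but a later value outside the original [min, max] would fall outside the target range.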
Z-Score Normalization
slide 8
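The slide is not reproduced here. A sketch of the usual z-score formula, v' = (v - mean) / standard deviation, follows. Using the population standard deviation (statistics.pstdev) rather than the sample one is an assumption of this sketch, and the function name is made up.

```python
import statistics


def z_score_normalize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std; an assumption here
    if std == 0:
        raise ValueError("standard deviation is zero")
    return [(v - mean) / std for v in values]


# Illustrative data with mean 5 and population standard deviation 2.
z = z_score_normalize([2, 4, 4, 4, 5, 5, 7, 9])
```

Z-score normalization is often preferred when the actual minimum and maximum are unknown or when outliers would dominate a min-max scaling.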
Normalization by Decimal Scaling
slide 9
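The slide is not reproduced here. As usually defined, decimal scaling divides each value by 10^j, where j is the smallest integer such that the largest absolute scaled value is below 1. The sketch below assumes that definition; the function name and sample values are illustrative.

```python
def decimal_scaling_normalize(values):
    """Divide by the smallest power of ten that brings every |v| below 1.

    Returns (scaled_values, j), where each scaled value is v / 10**j.
    """
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values], j


# Largest absolute value is 986, so j = 3 and values are divided by 1000.
scaled, j = decimal_scaling_normalize([-986, 917])
```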
Data discretization
Transforms numeric data by mapping values to interval labels or concept labels.
slide 10
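A small sketch of mapping a numeric attribute to concept labels. The attribute (age), the cutoffs, and the labels are all hypothetical choices for illustration, not values from the slide.

```python
import bisect

# Hypothetical cutoffs: below 21 is "youth", 21 to 60 is "adult", 61+ is "senior".
CUTOFFS = [21, 61]
LABELS = ["youth", "adult", "senior"]


def age_to_concept(age):
    """Map a numeric age to a concept label using the cutoffs above."""
    return LABELS[bisect.bisect_right(CUTOFFS, age)]


labels = [age_to_concept(a) for a in [20, 21, 60, 61]]
```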
Discretization techniques
1-Binning
2-Histogram analysis
3-Cluster analysis
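Of the techniques listed, binning is the simplest to sketch. The example below assumes equal-width binning (one common variant; equal-frequency binning is another) and replaces each value with its interval. The function name is made up.

```python
def equal_width_bins(values, n_bins):
    """Replace each value with the (left, right) interval of its equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        raise ValueError("all values are equal; cannot form bins")
    intervals = []
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        left = lo + idx * width
        intervals.append((left, left + width))
    return intervals


binned = equal_width_bins([0, 5, 10, 15, 20], n_bins=2)
```

Binning needs no class information, so it is an unsupervised discretization technique.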
For nominal data: concept hierarchy generation can be used to transform the data into multiple levels of granularity.
Example: the street attribute can be generalized to higher-level concepts, such as city or country.
slide 11
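A sketch of generalizing street to city or country through a concept hierarchy. The lookup tables and place names are placeholders invented for this example; in practice the hierarchy would come from the schema or from domain experts.

```python
# Placeholder hierarchy: street -> city -> country (illustrative names only).
STREET_TO_CITY = {"Street A": "City X", "Street B": "City X", "Street C": "City Y"}
CITY_TO_COUNTRY = {"City X": "Country P", "City Y": "Country Q"}


def generalize(street, level):
    """Return the street's value at the requested hierarchy level."""
    if level == "street":
        return street
    city = STREET_TO_CITY[street]
    if level == "city":
        return city
    if level == "country":
        return CITY_TO_COUNTRY[city]
    raise ValueError(f"unknown level: {level}")


streets = ["Street A", "Street B", "Street C"]
by_city = [generalize(s, "city") for s in streets]
by_country = [generalize(s, "country") for s in streets]
```

Moving up the hierarchy reduces the number of distinct values, which trades detail for more general patterns.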
Concept Hierarchy Generation for Nominal Data
slide 12-13