IICS CONCEPTS Flashcards
What is the difference between Connected and Unconnected Lookup ?
What are some common uses for them ?
Connected Lookup– Is a part of the mapping data flow. With connected lookups you can have multiple return values. That is you can pass multiple values from the same row in the lookup table out of the lookup transformation.
Common uses for them are
* Finding a name based on a number
* Finding a value based on a range of dates
* Finding a value based on multiple conditions
Unconnected Lookup: Transformation that exists separate from the data flow in the mapping.
Common uses for them are:
- Testing the results of a lookup in a expression
- Filtering records based on the lookup results
- Marking records for update based on the result of a lookup ( for example updating slowly changing dimensions table)
- Calling the same lookup multiple times in one mapping.
What is a Normalizer ?
It is an active transformation that transforms one incoming row to multiple output rows
When the Normalizer transformation receives a row that contains multiple-occurring data, it returns a row for each instance of the multiple-occurring data.
For example, a relational source includes four fields with quarterly sales data. You can configure a Normalizer transformation to generate a separate output row for each quarter.
What are the normalizer properties ?
Desribe them as well ?
- Normalized Fields tab: It define the multiple-occurring fields and specify additional fields that you want to use in the mapping.
- Field Mapping tab: It connect the incoming fields to the normalized fields.
What is GeneratedColumnID (GCID) and GeneratedKey (GK) ?
GeneratedColumnID (GCID): Generates an incremental value starting with 1 for each occurrence of a multiple-occurring data within the row. Resets back to 1 for next row.
GeneratedKey (GK): Generate a key for each incoming row. The key starts with one and is incremented by one for each processed row. It is an optional field.
What is SCD ?
In data management and data warehousing, a slowly changing dimension (SCD) is a dimension that consists of relatively static data that can change slowly but unexpectedly, rather than on a regular schedule. Some examples of specific slowly changing dimensions are entities in the form of names of geographic locations, customers or products.
What are the 6 types of SCD and how are they used ?
Type 0 – Fixed Dimension
-No changes allowed, dimension never changes
Type 1 – No History
-Update record directly, there is no record of historical values, only current state
Type 2 – Row Versioning
-Track changes as version records with current flag & active dates and other metadata
** Type 3 – Previous Value column**
-Track change to a specific attribute, add a column to show the previous value, which is updated as further changes occur
Type 4 – History Table
-Show current value in dimension table but track all changes in separate table
Type 6 – Hybrid SCD
-Utilise techniques from SCD Types 1, 2 and 3 to track change.
SCD 0,1,2 are widely used. There is no SCD 5
WHAT IS
WHAT IS SCD2 ?
What is a Dimension Table ?
SCD type 2 will store the entire history in the dimension table.
In SCD type 2 effective date, the dimension table will have Start_Date (Begin_Date) and End_Date as the fields.
If the End_Date is Null, then it indicates the current row.
SCD type 2 retains the full history of values.
When the value of a chosen attribute changes, the current record is closed.
A new record is created with the changed data values and this new record becomes the current record.
Dimension table having the detailed information about the product.
Dimension tables are typically small, ranging from a few to several thousand rows.
It provides the context /descriptive information for a fact table measurements and it’s structure is smaller than Fact table.
Surrogate Key is used to prevent the primary key (pk) violation (store historical data).
What is a Fact table ?
Fact table having the aggregate information about the product.
Fact tables can grow very large, with millions or even billions of rows.
Fact table is a table that contains summarized numerical and historical data (facts) and a multipart index composed of foreign keys from the primary keys of related dimension tables.
It size is larger than Dimension table.
What are the transformations that is active and which transformation is passive ?
Active transformations: Filter, Aggregator, joiner, rank, router and source qualifier
Passive transformations: Expression, Sorter, Sequence Generator, Lookup
How can we make joiner as a passive transformation ?
By selecting the Full join
How can we make sorter transformation as active transformation ?
By using Distinct option
How many methods we have to achieve SCD2 and which method is mostly used ?
There are 3 methods
Method 1 ——- FLAG
Method 2 ——– Version
Method 3———- Effective date
what are the cloud services ?
Iaas?
Paas?
Saas?
- (infrastructure as a Service) third type of computing across the cloud, Iaas provides a virtualized platform.
- (Platform as a Service) Provides computers platforms for users of cloud computing
- (Software as a service) applications are hosted on cloud servers and users access them over the internet
What is data intergration used for in the cloud ? What are the different tasks that are used for batch jobs ? Define them as well
What are the rare tasks that get used as well ?
Developing Batch jobs
* Mapping task - This is similar to the PowerCenter mapping designer. This is used to define the data flow logic to process the data.
* Replication task - this can be used to replicate more than one object from a source to a target. You can either use full load, incremental load options for replicating the data.
* **Synchronization task **- This is used to synchronize the data between the source and target. It provides DML operations like insert, update, upsert and delete. Using this task, we can sync only one object data at a time.
* Taskflows - You can combine other tasks and run them together in a flow. You can run these tasks in sequence, parallel or based on a decision.
The rare tasks are
Powercenter task
Integration
Cleansing
Warehousing
What does the filter transformation do ? What happens when the filter condition is true or false ?
What exactly is a filter ? And how does it work best in mapping effiency
- The Filter transformation filters data out of the data flow based on a specified filter condition.
- To improve job performance, place the Filter transformation close to mapping sources to remove unnecessary data from the data flow.
Filter condition is an expression that returns TRUE or FALSE.
- When the filter condition returns TRUE for a row, the Filter transformation passes the row to the rest of the data flow.
- When the filter condition returns FALSE, the Filter transformation drops the row.
You can use all the operators in filter especially this one too != which
What is an expression transformation ? What else do you use the transformation for ?
The Expression transformation calculates values within a single row. Use the Expression transformation to perform non-aggregate calculations.
You might use an Expression transformation to
- concatenate first and last names,
- change FIRST_NAME to upper case or lower case,
- change LAST_NAME to lower case or upper case,
- add number of months in date column etc.
you are manipulating rows