Sunday, November 17, 2013

Aggregator Transformation

1.       Aggregator Transformation
a)      Connected and Active Transformation
b)      Allow us to perform aggregate calculations such as Avg. and Sum.
c)       It allows us to perform calculations on groups.

Properties of Aggregator Transformation: -

1.       Aggregate Expression
2.       Group by port
3.       Sorted Input
4.       Aggregate cache

 Below images shows Ports and Properties of Aggregator transformation.
Aggregator Transformation Group By

Aggregator Transformation Sorted Input


1)       Aggregate Expressions
(a)  Entered in an output port.
(b) Can include non-aggregate expressions and conditional clauses.

(A)   Aggregate Function : - The transformation language includes the following Aggregate functions:
    AVG, COUNT, MAX, MIN, SUM
    FIRST, LAST
    MEDIAN, PERCENTILE, STDDEV, VARIANCE

There are three variations of aggregate function
(i)     Single Level Aggregate Function e.g. SUM(QUANTITY)
(ii)   Nested Aggregate Function e.g. MAX(COUNT(ITEM))
(iii) Conditional Clause: - use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE  e.g. SUM( COMMISSION, COMMISSION > 100 )

       While using Nested Aggregate function exercise below rules
1.       You can include multiple single level functions or multiple nested functions in output port of an aggregator.
2.       You cannot have both types of functions together.
3.       If you need to create both singe level and nested functions create separate aggregator transformation.
       
(B)   Non-Aggregate Functions
        We can also use non-aggregate functions in the aggregate expression.
                                      IIF ( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0))

2)     Group by Ports
(a) The Aggregator transformation lets you define groups for aggregations, rather than performing the aggregation across all input data.
(b) When grouping data, The Integration Service typically returns the last row of each group (or the last row received) with the result of the aggregation, unless otherwise specified.
For e.g. The following Aggregator transformation groups first by STORE_ID and then by ITEM
Aggregator Transformation Image
If you send the following data through this Aggregator transformation
STORE_ID
ITEM
QTY
PRICE
101
'battery'
3
2.99
101
'battery'
1
3.19
101
'battery'
2
2.59
101
'AAA'
2
2.45
201
'battery'
1
1.99
201
'battery'
4
1.59
301
'battery'
1
2.45

The Integration Service performs the aggregate calculation on the following unique groups
STORE_ID
ITEM
101
'battery'
101
'AAA'
201
'battery'
301
'battery'

The Integration Service then passes the last row received, along with the results of the aggregation, as follows
STORE_ID
ITEM
QTY
PRICE
SALE_PER_STORE
101
'battery'
2
2.59
17.34
101
'AAA'
2
2.45
4.9
201
'battery'
4
1.59
8.35
301
'battery'
1
2.45
2.45


3)    Using Sorted Input
(a) Use to improve session performance.
(b) To use sorted input, we must pass data to the Aggregator transformation sorted by group by port, in ascending or descending order.
(c) If the option is checked but we are not passing sorted data to the transformation, then the session fails.

Do not use Sorted Input if following conditions are true: -
(i)      The aggregate expression uses nested aggregate functions.
(ii)     The session uses incremental aggregation.
(iii)   Source data is data driven.

To use sorted input, you pass sorted data through the Aggregator. Data must be sorted in the following ways:
(i)       By the Aggregator group by ports, in the order they appear in the Aggregator transformation.
(ii)     Using the same sort order configured for the session. If data is not in strict ascending or descending order based on session sort order, , the Integration Service fails the session . For example, if you configure a session to use a French sort order, data passing into the Aggregator transformation must be sorted using the French sort order
For relational and file sources, use the Sorter transformation to sort data in the mapping before passing it to the Aggregator transformation
If the session uses relational sources, you can also use the Number of Sorted Ports option in the Source Qualifier transformation to sort group by columns in the source database

(4)   Aggregator Index Cache & Data Cache:

(a) The Power Center Server stores data in the aggregate cache until it completes Aggregate calculations.
(b) It stores group values in an index cache and row data in the data cache. If the Power Center Server requires more space, it stores overflow values in cache files.
(A)   Aggregator Index Cache: The index cache holds group information from the group by ports. If we are     using Group By on DEPTNO, then this cache stores values 10, 20, 30 etc.

All Group by Columns are in AGGREGATOR INDEX CACHE. E.g. DEPTNO
(B)   Aggregator Data Cache: DATA CACHE is generally larger than the AGGREGATOR INDEX CACHE.
               Data Cache contains:
a)      Variable ports if any
b)      Non group by input/output ports.
c)       Non group by input ports used in non-aggregate output expression.
d)      Port containing aggregate function

Example of Aggregator: - To calculate MAX, MIN, AVG and SUM of salary of EMP table.

Aggregator Mappinng
1.       EMP will be source table.
2.       Create a target table EMP_AGG_EG in target designer. Table should contain DEPTNO, MAX_SAL, MIN_SAL, AVG_SAL and SUM_SAL
3.       Create the shortcuts in your folder.

Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give mapping name. e.g. m_aggregator
4. Drag EMP from source in mapping.
5. Click Transformation -> Create -> Select AGGREGATOR from list. Give name and click Create. Now click done.
6. Pass SAL and DEPTNO only from SQ_EMP to AGGREGATOR Transformation.
7. Edit AGGREGATOR Transformation. Go to Ports Tab
8. Create 4 output ports: OUT_MAX_SAL, OUT_MIN_SAL, OUT_AVG_SAL, and OUT_SUM_SAL
9. Open Expression Editor one by one for all output ports and give the calculations e.g. MAX(SAL), MIN(SAL), AVG(SAL),SUM(SAL)
10. Click Apply -> Ok.
11. Drag target table now.
12. Connect the output ports from Rank to target table.
13. Click Mapping -> Validate
14. Repository -> Save

Create Session and Workflow. Run the Workflow and see the data in target table

Dept_No
SAL
MAX_SAL
MIN_SAL
AVG_SAL
Sum_SAL
10
1300
5000
1300
2917
8750
20
3000
3000
800
2175
10875
30
950
2850
950
1567
9400

No comments:

Post a Comment