PSK DWH Material
DWH-Informatica Material
Version 1.0
REVISION HISTORY
Table of Contents
1 Introduction
1.1 Purpose
2 ORACLE
2.1 DEFINITIONS
NORMALIZATION
First Normal Form
Second Normal Form
Third Normal Form
Boyce-Codd Normal Form
Fourth Normal Form
ORACLE SET OF STATEMENTS
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Querying Language (DQL)
Data Control Language (DCL)
Transactional Control Language (TCL)
Syntaxes
ORACLE JOINS
Equi Join/Inner Join
Non-Equi Join
Self Join
Natural Join
Cross Join
Outer Join
Left Outer Join
Right Outer Join
Full Outer Join
What's the difference between View and Materialized View?
View
Materialized View
Inline view
Indexes
Why are hints required?
Explain Plan
Stored Procedure
Packages
Triggers
Data files Overview
2.2 IMPORTANT QUERIES
3 DWH CONCEPTS
What is BI?
4 ETL-INFORMATICA
4.1 Informatica Overview
4.2 Informatica Scenarios
4.3 Development Guidelines
4.4 Performance Tips
4.5 Unit Test Cases (UTP)
5 UNIX
1 Introduction
1.1 Purpose
The purpose of this document is to provide detailed information about Oracle, DWH concepts, Informatica and UNIX, based on real-time project experience.
2 ORACLE
2.1 DEFINITIONS
Organizations can store data on various media and in different formats, such as a hard-copy document or a spreadsheet. A database management system stores, retrieves, and modifies data in the database on request. There are four main types of databases: hierarchical, network, relational, and object-relational.
NORMALIZATION:
Some Oracle databases were modeled according to the rules of normalization
that were intended to eliminate redundancy.
Page 3 of 134
www.pskinfo.com
Obviously, the rules of normalization require you to understand your relationships and functional dependencies.
First Normal Form:
A row is in first normal form (1NF) if all underlying domains contain atomic values only.
Second Normal Form:
A row is in second normal form (2NF) if, and only if, it is in first normal form and every non-key attribute is fully dependent on the key. In practice this means the table either does not have a composite primary key (the primary key cannot be subdivided into separate logical entities), or all the non-key columns are functionally dependent on the entire primary key.
Third Normal Form:
A row is in third normal form (3NF) if and only if it is in second normal form and attributes that do not contribute to a description of the primary key are moved into a separate table. An example is creating look-up tables.
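As a simple illustration (a sketch, assuming an emp table that originally repeated the department name and location on every row), the look-up table approach moves those descriptive attributes into their own table:
CREATE TABLE dept_lookup (
  deptno  NUMBER(2) PRIMARY KEY,
  dname   VARCHAR2(30),
  loc     VARCHAR2(30)
);
-- emp keeps only the deptno column and references the look-up table
ALTER TABLE emp ADD CONSTRAINT fk_emp_dept
  FOREIGN KEY (deptno) REFERENCES dept_lookup (deptno);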
Boyce-Codd Normal Form:
Boyce-Codd Normal Form (BCNF) is a further refinement of 3NF. In his later writings Codd refers to BCNF as 3NF. A row is in Boyce-Codd normal form if, and only if, every determinant is a candidate key. Most entities in 3NF are already in BCNF.
ORACLE SET OF STATEMENTS:
Data Definition Language (DDL):
Create
Alter
Drop
Truncate
Data Manipulation Language (DML):
Insert
Update
Delete
Data Querying Language (DQL):
Select
Data Control Language (DCL)
Grant
Revoke
Transactional Control Language (TCL)
Commit
Rollback
Savepoint
Syntaxes:
Refreshing a materialized view:
EXECUTE DBMS_SNAPSHOT.REFRESH('MV_EMP_PK', 'F');   -- fast refresh
or
EXECUTE DBMS_MVIEW.REFRESH('MV_COMPLEX', 'C');     -- complete refresh
A materialized view such as EBIBDRO.HWMD_MTH_ALL_METRICS_CURR_VIEW can also be defined with the REFRESH COMPLETE clause in its CREATE MATERIALIZED VIEW ... AS SELECT statement.
Case Statement:
Select NAME,
       (CASE
          WHEN CLASS_CODE = 'Subscription' THEN ATTRIBUTE_CATEGORY
          ELSE TASK_TYPE
        END) TASK_TYPE,
       CURRENCY_CODE
From EMP;
Decode():
Select empname, Decode(address, 'HYD', 'Hyderabad', 'Bang', 'Bangalore', address) as address from emp;
Procedure (skeleton):
CREATE OR REPLACE PROCEDURE <procedure_name> (
  cust_id_IN IN NUMBER,
  ...
)
AS
BEGIN
  ...
END;
Trigger (skeleton):
CREATE OR REPLACE TRIGGER <trigger_name>
  AFTER INSERT OR UPDATE ON <table_name>
  REFERENCING NEW AS NEW OLD AS OLD
  FOR EACH ROW
DECLARE
BEGIN
  IF <condition> THEN
    ...
  ELSE
    -- Exec procedure
    update_sysdate;
  END IF;
END;
ORACLE JOINS:
Equi join
Non-equi join
Self join
Natural join
Cross join
Outer join
Left outer
Right outer
Full outer
USING CLAUSE
ON CLAUSE
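For the Equi Join/Inner Join listed above, a sketch on the same emp and dept sample tables used in the examples that follow (the '=' operator in the join condition is what makes it an equi join):
Ex: SQL> select empno,ename,job,dname,loc from emp e,dept d where e.deptno=d.deptno;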
Non-Equi Join
A join which contains an operator other than ‘=’ in the joins condition.
Ex: SQL> select empno,ename,job,dname,loc from emp e,dept d where
e.deptno > d.deptno;
Self Join
A self join is a join of a table to itself; the same table appears twice in the FROM clause with different aliases.
Natural Join
Cross Join
Outer Join
Outer join gives the non-matching records along with matching records.
Left Outer Join
This will display all matching records, plus the records in the left-hand side table that are not in the right-hand side table.
Ex: SQL> select empno,ename,job,dname,loc from emp e left outer join dept
d on(e.deptno=d.deptno);
Or
SQL> select empno,ename,job,dname,loc from emp e,dept d where e.deptno=d.deptno(+);
Right Outer Join
This will display all matching records, plus the records in the right-hand side table that are not in the left-hand side table, as the example below shows.
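Following the pattern of the left outer join example above, the equivalent right outer join can be written as (a sketch on the same emp and dept tables; in the (+) form the operator moves to the left-hand side):
Ex: SQL> select empno,ename,job,dname,loc from emp e right outer join dept d on(e.deptno=d.deptno);
Or
SQL> select empno,ename,job,dname,loc from emp e,dept d where e.deptno(+)=d.deptno;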
Full Outer Join
This will display all matching records and the non-matching records from both tables.
Ex: SQL> select empno,ename,job,dname,loc from emp e full outer join dept d on(e.deptno=d.deptno);
View:
A view is a virtual table based on a SQL query; it does not store data itself, only the query definition.
– Contains functions or groups of data.
Materialized View:
We can keep aggregated data in a materialized view. We can schedule the MV to refresh, but a table can't. An MV can be created based on multiple tables.
Inline view:
If we write a select statement in from clause that is nothing but inline view.
Ex:
Get dept wise max sal along with empname and emp no.
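A sketch of such a query on the emp sample table (the inline view in the FROM clause computes the department-wise maximum salary):
SQL> select e.empno, e.ename, e.sal, e.deptno
     from emp e,
          (select deptno, max(sal) max_sal from emp group by deptno) m
     where e.deptno = m.deptno
     and e.sal = m.max_sal;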
What is the difference between view and materialized view?
DELETE
The DELETE command is used to remove rows from a table. A WHERE clause
can be used to only remove some rows. If no WHERE condition is specified, all
rows will be removed. After performing a DELETE operation you need to
COMMIT or ROLLBACK the transaction to make the change permanent or to
undo it.
TRUNCATE
TRUNCATE removes all rows from a table. The operation cannot be rolled back. As such, TRUNCATE is faster and doesn't use as much undo space as a DELETE.
DROP
The DROP command removes a table from the database. All the tables' rows,
indexes and privileges will also be removed. The operation cannot be rolled
back.
ROWID
A globally unique identifier for a row in a database. It is created at the time the row is inserted into a table, and destroyed when it is removed from a table. It has the format 'BBBBBBBB.RRRR.FFFF', where BBBBBBBB is the block number, RRRR is the slot (row) number, and FFFF is a file number.
ROWNUM
For each row returned by a query, the ROWNUM pseudo column returns a
number indicating the order in which Oracle selects the row from a table or set
of joined rows. The first row selected has a ROWNUM of 1, the second has 2,
and so on.
You can use ROWNUM to limit the number of rows returned by a query, as in
this example:
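For instance, a sketch that returns only the first 10 rows from the emp sample table:
SQL> select * from emp where rownum <= 10;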
Rowid: A globally unique identifier for a row in a database. It is created at the time the row is inserted into the table, and destroyed when it is removed from the table.
Rownum: For each row returned by a query, ROWNUM returns a number indicating the order in which Oracle selects the row from a table or set of joined rows.
SELECT column_list
FROM table
[WHERE condition]
[GROUP BY group_by_expression]
[HAVING group_condition]
[ORDER BY column];
Both the WHERE clause and the HAVING clause can be used to filter data. The WHERE clause is used to restrict rows, but it cannot be used to restrict groups; to restrict groups you use the HAVING clause.
MERGE Statement
You can use merge command to perform insert and update in a single
command.
On (s1.no=s2.no)
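A minimal sketch of the MERGE syntax, assuming two tables s1 and s2 that both have NO and NAME columns (matching the ON clause fragment above):
MERGE INTO s1
USING s2
ON (s1.no = s2.no)
WHEN MATCHED THEN
  UPDATE SET s1.name = s2.name
WHEN NOT MATCHED THEN
  INSERT (no, name) VALUES (s2.no, s2.name);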
Sub Query:
Example:
Select deptno, ename, sal from emp a where sal in (select sal from Grade where sal_grade='A' or sal_grade='B');
Example:
Find all employees who earn more than the average salary in their department.
Select * from employees A
where salary > (select avg(salary) from employees B
                where B.department_id = A.department_id
                group by B.department_id);
EXISTS:
Example (IN):
Select * from emp where deptno in (select deptno from dept);
Example (correlated subquery):
Select e.* from emp e where sal >= (select avg(sal) from emp a where a.deptno = e.deptno group by a.deptno);
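The first example above can equally be written with EXISTS (a sketch):
Select * from emp e where exists (select 1 from dept d where d.deptno = e.deptno);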
Indexes:
1. Bitmap indexes are most appropriate for columns having low distinct
values—such as GENDER, MARITAL_STATUS, and RELATION. This
assumption is not completely accurate, however. In reality, a bitmap
index is always advisable for systems in which data is not frequently
updated by many concurrent systems. In fact, as I'll demonstrate here,
a bitmap index on a column with 100-percent unique values (a column
candidate for primary key) is as efficient as a B-tree index.
7. The table is large and most queries are expected to retrieve less than 2
to 4 percent of the rows
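For the bitmap index discussion in point 1 above, a sketch of creating a bitmap index (assuming a GENDER column on the emp sample table), alongside an ordinary B-tree index for comparison:
CREATE BITMAP INDEX emp_gender_bix ON emp (gender);
CREATE INDEX emp_ename_idx ON emp (ename);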
Why are hints required?
It is a perfectly valid question to ask why hints should be used. Oracle comes with an optimizer that promises to optimize a query's execution plan. When this optimizer is really doing a good job, no hints should be required at all. Sometimes, however, the statistics it relies on may be out of date. In this case, a hint could help.
You should first get the explain plan of your SQL and determine what changes
can be done to make the code operate without using hints if possible.
However, hints such as ORDERED, LEADING, INDEX, FULL, and the various AJ and SJ hints can take a wild optimizer and give you optimal performance.
The ANALYZE statement can be used to gather statistics for a specific table,
index or cluster. The statistics can be computed exactly, or estimated based on
a specific number of rows, or a percentage of rows:
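For example (sketches of the three forms mentioned, using the emp sample table):
ANALYZE TABLE emp COMPUTE STATISTICS;
ANALYZE TABLE emp ESTIMATE STATISTICS SAMPLE 1000 ROWS;
ANALYZE TABLE emp ESTIMATE STATISTICS SAMPLE 10 PERCENT;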
Hint categories:
ALL_ROWS
One of the hints that 'invokes' the Cost based optimizer
ALL_ROWS is usually used for batch processing or data warehousing
systems.
(/*+ ALL_ROWS */)
FIRST_ROWS
One of the hints that 'invokes' the Cost based optimizer
FIRST_ROWS is usually used for OLTP systems.
CHOOSE
One of the hints that 'invokes' the Cost based optimizer
This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on the statistics gathered.
Hints for parallel execution, e.g. (/*+ parallel(a,4) */); specify the degree as either 2, 4 or 16.
Additional Hints
HASH
Hashes one table (full scan) and creates a hash index for that table. Then
hashes other table and uses hash index to find corresponding records.
Therefore not suitable for < or > join conditions.
/*+ use_hash */
ORDERED- This hint forces tables to be joined in the order specified. If you
know table X has fewer rows, then ordering it first may speed execution in a
join.
PARALLEL (table, instances): This specifies that the operation is to be done in parallel.
If an index cannot be created or cannot be used (for example, when the WHERE clause uses LIKE, NOT IN, >, < or <>), then we go for /*+ parallel(table, 8) */ for SELECT and UPDATE statements.
Explain Plan:
Explain plan tells us whether the query is using indexes properly, what the cost of the query is, and whether it is doing a full table scan; based on these statistics we can tune the query.
The explain plan process stores data in the PLAN_TABLE. This table can be located in the current schema or a shared schema and is created in SQL*Plus as follows:
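A sketch of the typical sequence (utlxplan.sql, which ships with Oracle, creates PLAN_TABLE; DBMS_XPLAN.DISPLAY formats the stored plan):
SQL> @?/rdbms/admin/utlxplan.sql
SQL> EXPLAIN PLAN FOR
     select e.empno, e.ename, d.dname from emp e, dept d where e.deptno = d.deptno;
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);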
What is your tuning approach if SQL query taking long time? Or how do u
tune SQL query?
If a query is taking a long time, first I will run the query through Explain Plan; the explain plan process stores data in the PLAN_TABLE.
It gives us the execution plan of the query, for example whether the query is using the relevant indexes on the joining columns or whether indexes to support the query are missing.
If the joining columns don't have indexes, the query will do a full table scan; if it is a full table scan the cost will be higher, so I will create indexes on the joining columns and rerun the query, which should give better performance. We also need to analyze the tables if they were last analyzed long back. The ANALYZE statement can be used to gather statistics for a specific table, index or cluster.
If there is still a performance issue then I will use HINTS; a hint is nothing but a clue. We can use hints like:
ALL_ROWS
One of the hints that 'invokes' the Cost based optimizer
ALL_ROWS is usually used for batch processing or data warehousing
systems.
FIRST_ROWS
One of the hints that 'invokes' the Cost based optimizer
FIRST_ROWS is usually used for OLTP systems.
CHOOSE
One of the hints that 'invokes' the Cost based optimizer
This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on the statistics gathered.
HASH
Hashes one table (full scan) and creates a hash index for that table. Then
hashes other table and uses hash index to find corresponding records.
Therefore not suitable for < or > join conditions.
/*+ use_hash */
Stored Procedure:
What are the differences between stored procedures and triggers?
Stored procedures must be called explicitly by the user in order to execute, but a trigger is called implicitly based on the events defined on the table.
Using a stored procedure we can access and modify data present in many tables.
Stored procedures are not run automatically; they have to be called explicitly by the user. But triggers get executed when the particular event associated with the table gets fired.
Packages:
A package contains several procedures and functions that process related transactions.
A package is a group of related procedures and functions, together with the cursors and variables they use, stored together in the database as a unit.
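A minimal sketch of a package specification and body (the package, procedure, and column names here are illustrative):
CREATE OR REPLACE PACKAGE emp_pkg AS
  PROCEDURE raise_sal (p_empno IN NUMBER, p_pct IN NUMBER);
END emp_pkg;
/
CREATE OR REPLACE PACKAGE BODY emp_pkg AS
  PROCEDURE raise_sal (p_empno IN NUMBER, p_pct IN NUMBER) IS
  BEGIN
    UPDATE emp SET sal = sal * (1 + p_pct / 100) WHERE empno = p_empno;
  END raise_sal;
END emp_pkg;
/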
Triggers:
Oracle lets you define procedures called triggers that run implicitly when an
INSERT, UPDATE, or DELETE statement is issued against the associated table
Triggers are similar to stored procedures. A trigger stored in the database can
include SQL and PL/SQL
Types of Triggers
INSTEAD OF Triggers
Row Triggers
A row trigger is fired each time the table is affected by the triggering
statement. For example, if an UPDATE statement updates multiple rows of a
table, a row trigger is fired once for each row affected by the UPDATE
statement. If a triggering statement affects no rows, a row trigger is not run.
When defining a trigger, you can specify the trigger timing--whether the trigger
action is to be run before or after the triggering statement. BEFORE and AFTER
apply to both statement and row triggers.
BEFORE and AFTER triggers fired by DML statements can be defined only on
tables, not on views.
Stored Procedure vs Function:
- A stored procedure may or may not return values; a function should return at least one value, and can return more than one value using OUT arguments.
- A stored procedure accepts any number of IN and OUT arguments; a function conventionally accepts only IN arguments.
- Stored procedures are mainly used to process tasks; functions are mainly used to compute values.
- A stored procedure cannot be invoked from SQL statements (e.g. SELECT); a function can be invoked from SQL statements (e.g. SELECT).
- A stored procedure can affect the state of the database using COMMIT; a function cannot affect the state of the database.
Table Space:
A database is divided into one or more logical storage units called tablespaces.
Tablespaces are divided into logical units of storage called segments.
Control File:
2.2 IMPORTANT QUERIES
Query to find duplicate records:
Select empno, count (*) from EMP group by empno having count (*)>1;
Query to delete duplicate records:
Delete from EMP where rowid not in (select max (rowid) from EMP group by empno);
UNION
select
emp_id,
max(decode(row_id,0,address))as address1,
max(decode(row_id,1,address)) as address2,
max(decode(row_id,2,address)) as address3
group by emp_id
Other query:
select
emp_id,
max(decode(rank_id,1,address)) as add1,
max(decode(rank_id,2,address)) as add2,
max(decode(rank_id,3,address))as add3
from
group by
emp_id
5. Rank query:
Select empno, ename, sal, r from (select empno, ename, sal, rank () over (order
by sal desc) r from EMP);
The DENSE_RANK function works like the RANK function except that it assigns consecutive ranks:
Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over (order by sal desc) r from emp);
Or
Select * from (select * from EMP order by sal desc) where rownum<=5;
8. 2nd highest Sal:
Select empno, ename, sal, r from (select empno, ename, sal, dense_rank ()
over (order by sal desc) r from EMP) where r=2;
9. Top sal:
Select * from EMP where sal= (select max (sal) from EMP);
11.Hierarchical queries
Starting at the root, walk from the top down, and eliminate employee Higgins in the result, but process his child rows.
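A sketch of such a hierarchical query on the standard HR employees table (START WITH/CONNECT BY perform the top-down walk; the WHERE clause removes Higgins from the result while his child rows are still processed):
SELECT employee_id, last_name, manager_id
FROM employees
WHERE last_name <> 'Higgins'
START WITH manager_id IS NULL
CONNECT BY PRIOR employee_id = manager_id;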
3 DWH CONCEPTS
What is BI?
Business Intelligence refers to a set of methods and techniques that are used by
organizations for tactical and strategic decision making. It leverages methods
and technologies that focus on counts, statistics and business objectives to
improve business performance.
The objective of Business Intelligence is to better understand customers and
improve customer service, make the supply and distribution chain more
efficient, and to identify and address business problems and opportunities
quickly.
What is a Data Warehouse?
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data that supports management decision making. In terms of design, a data warehouse and a data mart are almost the same.
Subject Oriented:
Data is organized around the major subjects of the enterprise (such as customer, product and sales) rather than around the applications that produce it.
Integrated:
Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
Time-variant:
All data in the data warehouse is identified with a particular time period.
Non-volatile:
Data is stable in a data warehouse. More data is added but data is never
removed.
What is a DataMart?
A data mart is a subset of the data warehouse that is focused on a particular subject area or department. In terms of design, a data warehouse and a data mart are almost the same.
What is a Factless Fact Table?
A fact table that contains only the primary keys from the dimension tables, and does not contain any measures, is called a factless fact table.
What is a Schema?
A schema is the arrangement of fact and dimension tables in the warehouse, for example a star schema or a snowflake schema.
What is Grain?
In data warehousing, grain refers to the level of detail available in a given fact table as well as to the level of detail provided by a star schema.
It is usually given as the number of records per key within the table. In general, the grain of the fact table is the grain of the star schema.
Star schema is a data warehouse schema where there is only one "fact table" and many denormalized dimension tables.
The fact table contains the primary keys of all the dimension tables (as foreign keys) and other columns of additive, numeric facts.
What is the difference between snowflake and star schema?
Star schema: The star schema is the simplest data warehouse schema. Only one join establishes the relationship between the fact table and any one of the dimension tables.
Snowflake schema: The snowflake schema is a more complex data warehouse model than a star schema. Since there are relationships between the dimension tables, it has to do many joins to fetch the data.
A "fact" is a numeric value that a business wishes to count or sum. A
"dimension" is essentially an entry point for getting at the facts. Dimensions are
things of interest to the business.
A set of level properties that describe a specific aspect of a business, used for
analyzing the factual measures.
Types of facts?
Additive: Additive facts are facts that can be summed up through all of
the dimensions in the fact table.
Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions in the fact table.
What is Granularity?
Principle: create fact tables with the most granular data possible to support
analysis of the business process.
In Data warehousing grain refers to the level of detail available in a given fact
table as well as to the level of detail provided by a star schema.
It is usually given as the number of records per key within the table. In general,
the grain of the fact table is the grain of the star schema.
Facts: Facts must be consistent with the grain; all facts are at a uniform grain.
Dimensions: Each dimension associated with the fact table must take on a single value for each fact row.
Dimensional Model
Slowly Changing Dimension (SCD): a dimension in which the attributes of a particular row change over time, for example an employee whose designation changes or whose department changes.
Conformed Dimensions (CD): these dimensions are something that is built once
in your model and can be reused multiple times with different fact tables. For
example, consider a model containing multiple fact tables, representing
different data marts. Now look for a dimension that is common to these fact tables. In this example, let's consider that the product dimension is common and hence can be reused by creating shortcuts and joining the different fact tables. Some examples are the time dimension, customer dimension and product dimension.
Junk Dimension: when you consolidate lots of small dimensions, instead of having hundreds of small dimensions that each hold a few records and clutter your database with mini 'identifier' tables, all records from these small dimension tables are loaded into one dimension table, and we call this the junk dimension table (since we are storing all the junk in this one table). For example, a company might have a handful of manufacturing plants, a handful of order types, and so on, and we can consolidate them into one dimension table called the junk dimension table.
Degenerated Dimension: an item that is in the fact table but is stripped of its description, because the description belongs in a dimension table, is referred to as a degenerated dimension. Since it looks like a dimension but is really in the fact table and has been degenerated of its description, it is called a degenerated dimension.
Degenerated Dimension: a dimension which is located in fact table known as
Degenerated dimension
Dimensional Model:
Data modeling
There are three levels of data modeling. They are conceptual, logical, and
physical. This section will explain the difference among the three, the order
with which each one is created, and how to go from one level to the other.
Conceptual data model: includes the important entities and the relationships among them; no attribute is specified and no primary key is specified.
Logical data model: includes all entities and relationships, and all attributes and the primary key for each entity are specified.
At this level, the data modeler attempts to describe the data in as much detail
as possible, without regard to how they will be physically implemented in the
database.
In data warehousing, it is common for the conceptual data model and the
logical data model to be combined into a single step (deliverable).
The steps for designing the logical data model are as follows:
6. Normalization.
At this level, the data modeler will specify how the logical data model will be
realized in the database schema.
1. Convert entities into tables.
9. http://www.learndatamodeling.com/dm_standard.htm
The differences between a logical data model and a physical data model are shown below.
Logical Data Model      Physical Data Model
Entity                  Table
Attribute               Column
Definition              Comment
Below is the SQ (Source Qualifier) query for one of the dimension table loads.
EDIII – Logical Design
[Logical design diagram: staging tables (ACW_DF_FEES_STG, ACW_PCBA_APPROVAL_STG, ACW_DF_APPROVAL_STG) load fact tables (ACW_DF_FEES_F, ACW_PCBA_APPROVAL_F, ACW_DF_APPROVAL_F), which reference the dimension tables ACW_ORGANIZATION_D, ACW_USERS_D, ACW_PRODUCTS_D, ACW_PART_TO_PID_D, ACW_SUPPLY_CHANNEL_D and the EDW_TIME_HIERARCHY.]
EDII– Physical Design
[Physical design diagram: the same tables with their physical columns and data types, e.g. ACW_PRODUCTS_D (PRODUCT_KEY NUMBER(10) PK, PRODUCT_NAME CHAR(30), BUSINESS_UNIT_ID NUMBER(10), ...), ACW_DF_APPROVAL_STG, ACW_DF_APPROVAL_F, ACW_PART_TO_PID_D and ACW_SUPPLY_CHANNEL_D.]
Types of SCD Implementation:
Type 1 Slowly Changing Dimension
In Type 1, the new information simply overwrites the original information. After Christina moved from Illinois to California, the new information replaces the original record, and we have the following table:
Advantages:
- This is the easiest way to handle the Slowly Changing Dimension problem,
since there is no need to keep track of the old information.
Disadvantages:
- All history is lost. By applying this methodology, it is not possible to trace back in history.
Usage:
Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical changes.
Type 2 Slowly Changing Dimension
After Christina moved from Illinois to California, we add the new information as
a new row into the table:
Advantages:
- This allows us to accurately keep all historical information.
Disadvantages:
- This will cause the size of the table to grow fast. In cases where the number of
rows for the table is very high to start with, storage and performance can
become a concern.
Usage:
Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.
In Type 3 Slowly Changing Dimension, there will be two columns to indicate the
particular attribute of interest, one indicating the original value, and one
indicating the current value. There will also be a column that indicates when
the current value becomes active.
Customer Key
Name
Original State
Current State
Effective Date
After Christina moved from Illinois to California, the original information gets
updated, and we have the following table (assuming the effective date of
change is January 15, 2003):
Advantages:
- This does not increase the size of the table, since new information is updated.
- This allows us to keep some part of history.
Disadvantages:
- Type 3 will not be able to keep all history where an attribute is changed more
than once. For example, if Christina later moves to Texas on December 15,
2003, the California information will be lost.
Usage:
Type III slowly changing dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will only occur a finite number of times.
If the target and source databases are different and the target table volume is high (it contains some millions of records), then without a staging table we need to design the Informatica mapping using a lookup to find out whether the record exists in the target table or not. Since the target has huge volumes, it is costly to create the cache and it will hit performance.
If we create staging tables in the target database, we can simply do an outer join in the source qualifier to determine insert/update; this approach will give good performance.
Data cleansing, also known as data scrubbing, is the process of ensuring that a
set of data is correct and accurate. During data cleansing, records are checked
for accuracy and consistency.
Data cleansing
Data merging
Data scrubbing
My understanding of an ODS is that it is a replica of the OLTP system, and the need for it is to reduce the burden on the production system (OLTP) while fetching data for loading targets. Hence it is a mandatory requirement for every warehouse.
So every day do we transfer data to ODS from OLTP to keep it up to date?
OLTP is a sensitive database; it should not allow multiple heavy select statements, as they may impact performance, and if something goes wrong while fetching data from OLTP to the data warehouse it will directly impact the business.
A surrogate key is any column or set of columns that can be declared as the
primary key instead of a "real" or natural key. Sometimes there can be several
natural keys that could be declared as the primary key, and these are all called
candidate keys. So a surrogate is a candidate key. A table could actually have
more than one surrogate key, although this would be unusual. The most
common type of surrogate key is an incrementing integer, such as an auto
increment column in MySQL, or a sequence in Oracle, or an identity column in
SQL Server.
4 ETL-INFORMATICA
4.1 Informatica Overview
Informatica is a powerful ETL (Extraction, Transformation, and Loading) tool developed by Informatica Corporation. Informatica comes with the following clients to perform various tasks: PowerCenter Designer, Workflow Manager, Workflow Monitor and Repository Manager.
Informatica Transformations:
Mapplet:
A mapplet is a reusable object that contains a set of transformations, so that the same transformation logic can be used in multiple mappings. When you add transformations to a mapplet, keep the following restrictions in mind; a mapplet cannot contain:
o Normalizer transformations
o COBOL sources
o XML sources
o Target definitions
o Other mapplets
System Variables
$$$SessStartTime returns the initial system date value on the machine hosting
the Integration Service when the server initializes a session. $$$SessStartTime
returns the session start time as a string value. The format of the string
depends on the database you are using.
Session: A session is a set of instructions that tells informatica Server how to
move data from sources to targets.
Filter: The Filter transformation is used to filter the data based on a single condition and pass it to the next transformation.
Router: The Router transformation is used to route the data based on multiple conditions and pass it to the next transformations. A Router transformation has the following groups:
1) Input group
2) User-defined output groups
3) Default group
Lookup: The Lookup transformation is used to look up data in a flat file or a relational table. Lookups can be:
1) Connected
2) Unconnected
Connected Lookup: Passes multiple output values to another transformation. Link lookup/output ports to another transformation.
Unconnected Lookup: Passes one output value to another transformation. The lookup/output/return port passes the value to the transformation calling the :LKP expression.
Lookup Caches:
When configuring a lookup cache, you can specify any of the following options:
Persistent cache
Static cache
Dynamic cache
Shared cache
Dynamic cache: When you use a dynamic cache, the PowerCenter Server
updates the lookup cache as it passes rows to the target.
If you configure a Lookup transformation to use a dynamic cache, you can only
use the equality operator (=) in the lookup condition.
The NewLookupRow port can take the following values:
0 - The Integration Service does not update or insert the row in the cache.
1 - The Integration Service inserts the row into the cache.
2 - The Integration Service updates the row in the cache.
Static cache: It is a default cache; the PowerCenter Server doesn’t update the
lookup cache as it passes rows to the target.
Persistent cache: If the lookup table does not change between sessions,
configure the Lookup transformation to use a persistent lookup cache. The
PowerCenter Server then saves and reuses cache files from session to session,
eliminating the time required to read the lookup table.
Dynamic cache: The NewLookupRow port is enabled automatically. The best example of where we need to use a dynamic cache is this: suppose the first record and the last record coming from the source are the same, but there is a change in the address. What the Informatica mapping has to do here is insert the first record and update the last record in the target table.
Static cache: If we use a static lookup, the first record will go to the lookup and check the lookup cache; based on the condition it will not find a match, so it returns a null value, and the router sends that record to the insert flow. But this record is still not available in the cache memory, so when the last record comes to the lookup it again checks the cache, does not find a match, and again returns a null value. It again goes to the insert flow through the router, but it is supposed to go to the update flow, because the cache did not get refreshed when the first record was inserted into the target table.
Rank: The Rank transformation allows you to select only the top or bottom
rank of data. You can use a Rank transformation to return the largest or
smallest numeric value in a port or group.
Sequence Generator: The Sequence Generator transformation is used to
generate numeric key values in sequential order.
Union Transformation:
The Union transformation is a multiple input group transformation that you can
use to merge data from multiple pipelines or pipeline branches into one
pipeline branch. It merges data from multiple sources similar to the UNION ALL
SQL statement to combine the results from two or more SQL statements.
Similar to the UNION ALL statement, the Union transformation does not remove duplicate rows. Input groups should have a similar structure.
1) Mapping level
2) Session level.
Aggregator Transformation:
Transformation type:
Active
Connected
Aggregate cache: The Integration Service stores data in the aggregate cache
until it completes aggregate calculations. It stores group values in an index
cache and row data in the data cache.
Group by port: Indicate how to create groups. The port can be any input,
input/output, output, or variable port. When grouping data, the Aggregator
transformation outputs the last row of each group unless otherwise specified.
Sorted input: Select this option to improve session performance. To use sorted
input, you must pass data to the Aggregator transformation sorted by group by
port, in ascending or descending order.
Aggregate Expressions:
Aggregate Functions
(AVG,COUNT,FIRST,LAST,MAX,MEDIAN,MIN,PERCENTAGE,SUM,VARIANCE and
STDDEV)
When you use any of these functions, you must use them in an expression
within an Aggregator transformation.
Use sorted input to increase mapping performance, but we need to sort the data before sending it to the Aggregator transformation.
SQL Transformation
Transformation type:
Active/Passive
Connected
For example, you might need to create database tables before adding new
transactions. You can create an SQL transformation to create the tables in a
workflow. The SQL transformation returns database errors in an output port.
You can configure another workflow to run if the SQL transformation returns
no errors.
When you create an SQL transformation, you configure the following options:
Script mode. The SQL transformation runs ANSI SQL scripts that are externally
located. You pass a script name to the transformation with each input row. The
SQL transformation outputs one row for each input row.
Query mode. The SQL transformation executes a query that you define in a
query editor. You can pass strings or parameters to the query to define
dynamic queries or change the selection parameters. You can output multiple
rows when the query has a SELECT statement.
Database type. The type of database the SQL transformation connects to.
Script Mode
An SQL transformation configured for script mode has the following default
ports:
ScriptName (Input): Receives the name of the script to execute for the current row.
ScriptResult (Output): Returns PASSED if the script execution succeeds for the row; otherwise contains FAILED.
ScriptError (Output): Returns errors that occur when a script fails for a row.
Java Transformation
Transformation type:
Active/Passive
Connected
For example, you can define transformation logic to loop through input rows
and generate multiple output rows based on a specific condition. You can also
use expressions, user-defined functions, unconnected transformations, and
mapping variables in the Java code.
Transaction Control Transformation
Transformation type:
Active
Connected
PowerCenter lets you control commit and roll back transactions based on a set
of rows that pass through a Transaction Control transformation. A transaction
is the set of rows bound by commit or roll back rows. You can define a
transaction based on a varying number of input rows. You might want to define
transactions based on a group of rows ordered on a common key, such as
employee ID or order entry date.
Within a session. When you configure a session, you configure it for user-
defined commit. You can choose to commit or roll back a transaction if the
Integration Service fails to transform or write any row to the target.
When you run the session, the Integration Service evaluates the expression for
each row that enters the transformation. When it evaluates a commit row, it
commits all rows in the transaction to the target or targets. When the
Integration Service evaluates a roll back row, it rolls back all rows in the
transaction from the target or targets.
If the mapping has a flat file target you can generate an output file each time
the Integration Service starts a new transaction. You can dynamically name
each target flat file.
Transaction control
expression
The expression contains values that represent actions the Integration Service
performs based on the return value of the condition. The Integration Service
evaluates the condition on a row-by-row basis. The return value determines
whether the Integration Service commits, rolls back, or makes no transaction
changes to the row. When the Integration Service issues a commit or roll back
based on the return value of the expression, it begins a new transaction. Use the following built-in variables in the Expression Editor when you create a transaction control expression: TC_CONTINUE_TRANSACTION (the default), TC_COMMIT_BEFORE, TC_COMMIT_AFTER, TC_ROLLBACK_BEFORE and TC_ROLLBACK_AFTER.
Joiner vs Lookup:
- In a Joiner, on multiple matches it will return all matching records; in a Lookup it will return either the first record, the last record, any value, or an error value.
- We can't apply any filters along with the join condition in a Joiner transformation; in a Lookup transformation we can apply filters along with the lookup conditions using the lookup query override.
Source Qualifier vs Lookup:
- A source qualifier will push all the matching records, whereas in a Lookup we can restrict whether to return the first value, the last value, or any value.
- When both the source and the lookup table are in the same database we can use a source qualifier; when the source and lookup table exist in different databases then we need to use a Lookup.
Source Qualifier vs Joiner:
- We use a source qualifier to join tables if the tables are in the same database; we use a Joiner to join tables that are in different databases.
- In a source qualifier we can use any type of join between two tables, whereas in a Joiner we can't use anything other than its four join types (normal, master outer, detail outer and full outer).
Stopped:
You choose to stop the workflow or task in the Workflow Monitor or through pmcmd. The Integration Service stops processing the task and all other tasks in its path. The Integration Service continues running concurrent tasks, like backend stored procedures.
Abort:
You choose to abort the workflow or task in the Workflow Monitor or through
pmcmd. The Integration Service kills the DTM process and aborts the task.
2nd Approach
Use Mod() function in routers based on Seq.Next values we can route the data
into multiple targets.
1) Yes. One of my mappings was taking 3-4 hours to process 40 million rows into a staging table; we didn't have any transformation inside the mapping, it was a 1-to-1 mapping. There was nothing to optimize in the mapping itself, so I created session partitions using key range on the effective date column. It improved performance a lot: rather than 4 hours it was running in 30 minutes for the entire 40 million rows. Using partitions, the DTM creates multiple reader and writer threads.
2) There was one more scenario where I got very good performance at the mapping level. Rather than using a lookup transformation, if we can do an outer join in the source qualifier query override, this will give good performance when both the lookup table and the source are in the same database. If the lookup table has huge volumes then creating the cache is costly.
4) If any mapping is taking a long time to execute, first we need to look into the source and target statistics in the monitor for the throughput, and also find out where exactly the bottleneck is by looking at the busy percentage in the session log; that tells us which transformation is taking more time. If the source query is the bottleneck, then it will show at the end of the session log as "query issued to database", which means there is a performance issue in the source query, and we need to tune the query.
If we look into the session log it shows the busy percentage; based on that we need to find out where the bottleneck is.
***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] ****
[ACW_PCBA_APPROVAL_F1, ACW_PCBA_APPROVAL_F] has completed: Total
Run Time = [0.806521] secs, Total Idle Time = [0.000000] secs, Busy Percentage
= [100.000000]
Suppose I have to load 40 lakh records into the target table and the workflow is taking about 10-11 hours to finish. I've already increased the cache size to 128MB, and there are no joiners, just lookups and expression transformations.
Ans: In this case, drop the constraints and indexes on the target table before you run the session and re-create them after the load completes.
What is Constraint based loading in informatica?
Generally what it does is load the data into the parent table first, and then load it into the child table.
Let's assume we have imported some source and target definitions into a shared folder, and we are using those source and target definitions in other folders as shortcuts in some mappings.
If any modifications occur in the backend (database) structure, like adding new columns or dropping existing columns in either the source or the target, and we re-import the definition into the shared folder, those new changes are automatically reflected in all folders/mappings wherever we used those source or target definitions.
If we don't have a primary key on the target table, we can perform updates using the Target Update Override option. By default, the Integration Service updates target tables based on key values. However, you can override the default UPDATE statement for each target in a mapping. You might want to update the target based on non-key columns.
You can override the WHERE clause to include non-key columns. For example,
you might want to update records for employees named Mike Smith only. To
do this, you edit the WHERE clause as follows:
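A sketch of such an override (the T_SALES table and its columns are illustrative; :TU references the values coming from the ports in the target instance):
UPDATE T_SALES
SET DATE_SHIPPED = :TU.DATE_SHIPPED,
    TOTAL_SALES = :TU.TOTAL_SALES
WHERE EMP_NAME = :TU.EMP_NAME AND EMP_NAME = 'Mike Smith'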
If you modify the UPDATE portion of the statement, be sure to use :TU to
specify ports.
4) Because it is a mapping variable, it stores the max last_upd_date value in the repository; in the next run our source qualifier query will fetch only the records updated or inserted after the previous run.
Logic in the mapping variable is
Logic in the SQ is
In the expression, assign the max last update date value to the variable using the SETMAXVARIABLE() function.
Logic in the update strategy is below
Approach_2: Using parameter file
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_
WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_AUSTRI]
$DBConnection_Source=DMD2_GEMS_ETL
$DBConnection_Target=DMD2_GEMS_ETL
Main mapping
Sql override in SQ Transformation
Workflow Design
3. Create two stored procedures, one to update cont_tbl_1 with the session start time; set the stored procedure type property to Source Pre-load.
SCD Type-II Effective-Date Approach
Update the previous record eff-end-date with sysdate and insert as a new
record with source data.
Once we fetch the record from the source qualifier, we send it to a lookup to find out whether the record is present in the target or not, based on the source primary key column.
Once we find a match in the lookup, we take the SCD columns from the lookup and the source columns from the SQ into an expression transformation.
If the source and target data is same then I can make a flag as ‘S’.
If the source and target data is different then I can make a flag as ‘U’.
If source data does not exists in the target that means lookup returns null
value. I can flag it as ‘I’.
Based on the flag values in router I can route the data into insert and
update flow.
Complex Mapping
The source file directory contains files older than 30 days, with timestamps in the file names.
For this requirement if I hardcode the timestamp for source file name it
will process the same file every day.
Then I am going to use the parameter file to supply the values to session
variables ($InputFilename).
This mapping will update the parameter file with appended timestamp to
file name.
I make sure to run this parameter file update mapping before my actual
mapping.
How to handle errors in informatica?
We need to send those records to flat file after completion of 1st session
run. Shell script will check the file size.
If the file size is greater than zero then it will send email notification to
source system POC (point of contact) along with deno zero record file and
appropriate email subject and body.
If file size<=0 that means there is no records in flat file. In this case shell
script will not send any email notification.
Or
We are expecting a not null value for one of the source column.
Source qualifier will select the data from the source table.
Parameter file it will supply the values to session level variables and mapping
level variables.
Session level variables
$DBConnection_Source
$DBConnection_Target
$InputFile
$OutputFile
Variable
Parameter
What is the difference between mapping level and session level variables?
Flat files are of two types:
1) Delimited
2) Fixed width
In fixed width we need to know the file format first, i.e. how many characters to read for each particular column.
If the file contains the header then in definition we need to skip the first row.
List file:
If you want to process multiple files with the same structure, we don't need multiple mappings and multiple sessions.
We can use one mapping one session using list file option.
First we need to create the list file for all the files. Then we can use this file in
the main mapping.
It is a text file; below is the format for a parameter file. We place this file in the Unix box where we have installed our Informatica server.
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_
WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_AUSTRI]
$InputFileName_BAAN_SALE_HIST=/interface/dev/etl/apo/srcfiles/HS_025_20070921
$DBConnection_Target=DMD2_GEMS_ETL
$$CountryCode=AT
$$CustomerNumber=120165
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_
WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_BELUM]
$DBConnection_Sourcet=DEVL1C1_GEMS_ETL
$OutputFileName_BAAN_SALES=/interface/dev/etl/apo/trgfiles/HS_002_20070921
$$CountryCode=BE
$$CustomerNumber=101495
Power Center 8.X Architecture.
Developer Changes:
• Client applications are the same, but work on top of the new services
framework
4) grid concept is additional feature
8) concurrent cache creation and faster index building are additional feature
in lookup transformation
13) Flat file names can be populated to the target while processing through a list file.
14) For flat files, headers and footers can be populated using advanced options at the session level in 8.x.
Effective in version 8.0, you create and configure a grid in the Administration
Console. You configure a grid to run on multiple nodes, and you configure one
Integration Service to run on the grid. The Integration Service runs processes
on the nodes in the grid to distribute workflows and sessions. In addition to
running a workflow on a grid, you can now run a session on a grid. When you
run a session or workflow on a grid, one service process runs on each available
node in the grid.
[Pictorial representation of workflow execution: the Integration Service (IS) starts the Integration Service Process (ISP), the Load Balancer dispatches tasks, and the DTM manages the data from source system to target system within memory and disk.]
Data Transformation Manager
The Integration Service starts one or more Integration Service processes to run
and monitor workflows. When we run a workflow, the ISP starts and locks the
workflow, runs the workflow tasks, and starts the process to run sessions. The
functions of the Integration Service Process are,
Load Balancer
the node relative to the total physical memory size called Maximum
Memory %. The maximum number of running Session and Command
tasks allowed for each Integration Service process running on the node
called Maximum Processes
2. The Load Balancer dispatches all tasks to the node that runs the master
Integration Service process
1. The Load Balancer verifies which nodes are currently running and enabled
2. The Load Balancer identifies nodes that have the PowerCenter resources
required by the tasks in the workflow
3. The Load Balancer verifies that the resource provision thresholds on each
candidate node are not exceeded. If dispatching the task causes a
threshold to be exceeded, the Load Balancer places the task in the
dispatch queue, and it dispatches the task later
4. The Load Balancer selects a node based on the dispatch mode
When the workflow reaches a session, the Integration Service Process starts
the DTM process. The DTM is the process associated with the session task. The
DTM process performs the following tasks:
Adds partitions to the session when the session is configured for dynamic
partitioning.
Sends a request to start worker DTM processes on other nodes when the
session is configured to run on a grid.
Runs post-session stored procedures, SQL, and shell commands and sends
post-session email
4.2 Informatica Scenarios:
How to load the source records into three targets in round-robin fashion?
We can do this using a sequence generator by setting end value = 3 and enabling the cycle option. Then in the router take 3 groups:
In the 1st group specify the condition as seq next value = 1 and pass those records to the 1st target.
Similarly, in the 2nd group specify the condition as seq next value = 2 and pass those records to the 2nd target.
In the 3rd group specify the condition as seq next value = 3 and pass those records to the 3rd target.
Since we have enabled the cycle option, after reaching the end value the sequence generator will start again from 1; for the 4th record seq next value is 1, so it will go to the 1st target.
I want to generate the separate file for every State (as per state, it should
generate file).It has to generate 2 flat files and name of the flat file is
corresponding state name that is the requirement.
Below is my mapping.
Source:
AP 2 HYD
AP 1 TPT
KA 5 BANG
KA 7 MYSORE
KA 3 HUBLI
This functionality was added in Informatica 8.5 onwards; in earlier versions it was not there.
We can achieve it with the use of a Transaction Control transformation and the special "FileName" port in the target file.
In order to generate the target file names from the mapping, we should make
use of the special "FileName" port in the target file. You can't create this
special port from the usual New port button. There is a special button with
label "F" on it to the right most corner of the target flat file when viewed in
"Target Designer".
When you have different sets of input data with different target files created,
use the same instance, but with a Transaction Control transformation which
defines the boundary for the source sets.
In the target flat file there is an option in the Columns tab, i.e. FileName as a column. When you click that, one non-editable column gets created in the metadata of the target.
Implementation Procedure:
Double-click on the target definition and click on the Ports tab; on the right side there is a label 'F'. Click on that label and a new port (FileName) is automatically created.
b)Mapping overview:
2.sorter transformation
3.Expression transformation
c) ports&expression in Expression transformation
d)condition in transaction control Transformation:
e) linking between Transaction control and target ports:
At session level specify any name as a output file name with valid output
directory or path.
3) How to concatenate row data through informatica?
Source:
Ename EmpNo
Stev 100
methew 100
John 101
Tom 101
Target:
Ename          EmpNo
Stev,methew    100
John,Tom       101
Approach 1: If the record doesn't exist, do an insert into the target. If it already exists, then get the corresponding Ename value from the lookup, concatenate it in an expression with the current Ename value, and then update the target Ename column using an update strategy.
Approach 2: Sort the data in the SQ based on the EmpNo column, then use an expression to store the previous record's information using variable ports; after that use a router to insert the record if it is the first occurrence, otherwise update the target Ename with the concatenated value of the previous name and the current name.
Implementation Procedure:
a)Mapping overview:
SELECT EMP.NO, EMP.NAME FROM EMP order by EMP.NO
Conditions:
V1=iif(NO != no_v,'i','u' )
Prename= iif(NO=no_v,concat(concat1,concat(',',NAME)),NAME)
c)Router transformations conditions:
4) How to generate Sequence numbers without Seq generator
transformation .
Solution:
We can use a mapping variable and one variable port for increment purposes in an expression, and assign the incremented value to the mapping variable using the SETMAXVARIABLE() function.
a)Mapping overview:
$$SEQ_NO
In 1st Variable port ( SEQ_NO_v ) expression:
SETMAXVARIABLE ($$SEQ_NO,INC_v)
$$SEQ_NO + 1
Create one output port and assign first variable port(SEQ_NO_v) and link to
Target Surrogate key column.
1) How to send Unique (Distinct) records into One target and duplicates
into another tatget?
Source:
Ename EmpNo
stev 100
Stev 100
john 101
Mathew 102
Output:
Target_1:
Ename EmpNo
Stev 100
John 101
Mathew 102
Target_2:
Ename EmpNo
Stev 100
Approach 1: If the record doesn't exist, do an insert into Target_1. If it already exists, then send it to Target_2 using a router.
Approach 2: Sort the data in the SQ based on the EmpNo column, then use an expression to store the previous record's information using variable ports; after that use a router to route the data into the targets: if it is the first occurrence send it to the first target, and if it has already been inserted send it to Target_2.
a. How to process multiple flat files into a single target table through Informatica if all the files have the same structure?
We can process all the flat files through one mapping and one session using a file list. First we need to create the list file for all the flat files using a UNIX script; the extension of the list file is .LST (see the sketch below).
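A minimal UNIX sketch for building the list file (the directory, file pattern and list file name are illustrative):
  ls -1 /data/incoming/emp_*.dat > /data/incoming/emp_files.LST
The session then points to emp_files.LST as the source file name, with the source file type set to Indirect so the Integration Service reads each file named inside the list.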
If both workflows exist in the same folder, we can create two worklets rather than creating two workflows, and control the order within a single workflow.
Setting the dependency between these two workflows using a shell script is one approach.
If the workflows exist in different folders or in different repositories, then we can use the approaches below.
As soon as the first workflow completes, it creates a zero-byte file (indicator file).
If the indicator file is not available, we wait for 5 minutes and then check for the indicator again; we continue this loop for 5 more attempts, i.e. roughly 30 minutes in total.
If the file still does not exist after 30 minutes, we send out an email notification. A shell sketch of this polling logic follows.
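A minimal shell sketch of that polling logic (the indicator file path, mail address and the way the dependent job is started are all illustrative):

#!/bin/ksh
IND_FILE=/data/flags/wf1_done.ind      # zero-byte file created by the first workflow
count=0
while [ $count -lt 6 ]
do
  if [ -f "$IND_FILE" ]
  then
     echo "Indicator file found - starting the dependent workflow"
     # start the second workflow here, e.g. with pmcmd startworkflow
     exit 0
  fi
  sleep 300                            # wait 5 minutes before checking again
  count=`expr $count + 1`
done
echo "Indicator file $IND_FILE not found after 30 minutes" | mailx -s "Workflow dependency failed" support@example.com
exit 1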
Alternatively, we can put an Event Wait task before the actual session in the second workflow to wait for the indicator file. If the file is available the session runs; otherwise the Event Wait task waits indefinitely until the indicator file becomes available.
How to load the cumulative salary into the target?
Solution:
Using variable ports in an Expression transformation we can load the cumulative salary into the target.
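A sketch of the variable-port logic (port names are illustrative; SAL is assumed to be the incoming salary column, and the running total starts at 0 because numeric variable ports initialise to 0):
  SAL        (input)
  v_CUM_SAL  (variable) = v_CUM_SAL + SAL
  o_CUM_SAL  (output)   = v_CUM_SAL
Link o_CUM_SAL to the cumulative salary column of the target.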
SQL Transformation:
E. How to generate multiple records in the target based on a source column value?
Solution:
We can use a SQL transformation in Query mode, passing the source column value into the query dynamically.
Source table:
Name    Address   No
Kiran   Tpt       2
Somu    Kkd       3
Target table:
Name    Address   No
Kiran   Tpt       2
Kiran   Tpt       2
Somu    Kkd       3
Somu    Kkd       3
Somu    Kkd       3
Below is the query used in the SQL transformation (Query mode), where ?NUM1? is the parameter bound from the source number column:
SELECT NAME, ADDR, NUM
FROM (SELECT A.NAME, A.ADDR, A.NUM FROM EMP A, EMP B WHERE A.NUM = ?NUM1?)
WHERE ROWNUM <= ?NUM1?
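If the source is Oracle, an alternative query for the same row multiplication is sometimes used (a sketch; EMP, its columns and the upper bound of 100 are assumptions):
SELECT E.NAME, E.ADDR, E.NUM
FROM   EMP E,
       (SELECT LEVEL AS L FROM DUAL CONNECT BY LEVEL <= 100) T
WHERE  T.L <= E.NUM
Each EMP row is repeated NUM times, as long as NUM does not exceed the generated upper bound.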
4.3 Development Guidelines
General Development Guidelines
The starting point of development is the logical model created by the Data Architect. This logical model forms the foundation for the metadata, which will be continuously maintained throughout the Data Warehouse Development Life Cycle (DWDLC). The logical model is derived from the requirements of the project. At the completion of the logical model, technical documentation is produced defining the sources, targets, requisite business rule transformations, mappings and filters. This documentation serves as the basis for creating the Extraction, Transformation and Loading objects that actually move the data from the application sources into the Data Warehouse/Data Mart.
To start development on any data mart, you should have the following things set up by the Informatica Load Administrator:
Transformation Specifications
While estimating the time required to develop mappings, the following rule of thumb applies.
It is an accepted best practice to always load a flat file into a staging table before any transformations are done on the data in the flat file.
Always use LTRIM, RTRIM functions on string columns before loading data into
a stage table.
You can also use the UPPER function on string columns, but before using it you need to ensure that the data is not case sensitive (e.g. 'ABC' is different from 'Abc'). For example:
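A typical Expression transformation port for this cleanup might look like the following (in_CUST_NAME is an illustrative input port name):
  UPPER(LTRIM(RTRIM(in_CUST_NAME)))
The same expression also works in a SQL override if the trimming is pushed to the database instead.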
If you are loading data from a delimited file then make sure the delimiter is not
a character which could appear in the data itself. Avoid using comma-
separated files. Tilde (~) is a good delimiter to use.
Failure Notification
Once in production, your sessions and batches need to send out a notification to the support team when they fail. You can do this by configuring an email task at the session level (for example, the on-failure email option).
Port Standards:
Input Ports – It will be necessary to change the names of input ports for Lookups, Expressions and Filters where ports might have the same name. If ports do have the same name, they will default to having a number appended after the name. Change this default to a prefix of "in_". This will allow you to keep track of input ports throughout your mappings.
Prefixed with: IN_
Variable Ports – Variable ports within a transformation should be prefixed with "v_". This will allow the developer to distinguish between input/output and variable ports. For more explanation of variable ports see the section "VARIABLES".
Prefixed with: V_
Output Ports – Prefixed with: O_
Quick Reference
Aggregator AGG_<Purpose>
Expression EXP_<Purpose>
Filter FLT_<Purpose>
Rank RNK_<Purpose>
Router RTR_<Purpose>
Mapplet MPP_<Purpose>
4.4 Performance Tips
Tune mappings and sessions for performance so that sessions run within the available load window.
1. Cache lookups if the lookup table has fewer than 500,000 rows, and don't cache lookups on tables with more than 500,000 rows.
3. If a value is used in multiple ports, calculate the value once (in a variable)
and reuse the result instead of recalculating it for multiple ports.
7. Avoid using Stored Procedures, and call them only once during the
mapping if possible.
8. Remember to turn off Verbose logging after you have finished debugging.
9. Use default values where possible instead of using IIF(ISNULL(X), <default>, X) logic in Expression ports.
10. When overriding the Lookup SQL, always ensure you put a valid ORDER BY clause in the SQL. This makes the database perform the ordering, rather than the Informatica Server, while building the cache.
16. Define the source with the smaller number of rows as the master source in Joiner transformations, since this reduces the search time and also the cache size.
19. If the lookup table is on the same database as the source table, join the tables in the Source Qualifier transformation itself instead of using a Lookup transformation, if possible.
20. If the lookup table does not change between sessions, configure the Lookup transformation to use a persistent lookup cache. The Informatica Server saves and reuses the cache files from session to session, eliminating the time required to read the lookup table.
24. Reduce the number of rows being cached by using the Lookup SQL Override option to add a WHERE clause to the default SQL statement, for example as in the sketch below.
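A sketch of such a lookup SQL override (the table, columns and filter are illustrative; the trailing comment is the usual way to suppress the ORDER BY that the Integration Service appends, after supplying our own ORDER BY on the lookup ports):
SELECT CUST_ID, CUST_NAME, CUST_STATUS
FROM   CUSTOMER_DIM
WHERE  CUST_STATUS = 'ACTIVE'
ORDER BY CUST_ID --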
Testing regimens:
1. Unit Testing
2. Functional Testing
Later test phases validate the application as it would function in production; this includes security, volume and stress testing.
UTP Template:
Columns: Step | Description | Test Conditions | Expected Results | Actual Results | Pass or Fail (P or F) | Tested By
Interface: SAP-CMS Interfaces

Step 1
Description: Check that the total count of records fetched from the source tables matches the total records in the PRCHG table for a particular session timestamp.
Test Conditions:
  SOURCE: SELECT count(*) FROM XST_PRCHG_STG
  TARGET: SELECT count(*) FROM PRCHG
Expected Results: Both the source and target table load record counts should match.
Actual Results: Same as expected.
Pass or Fail: Pass
Tested By: Stev

Step 2
Description: Check whether all the target columns are getting populated correctly with source data.
Test Conditions:
  SELECT PRCHG_ID, PRCHG_DESC, DEPT_NBR, EVNT_CTG_CDE, PRCHG_TYP_CDE, PRCHG_ST_CDE
  FROM T_PRCHG
  MINUS
  SELECT PRCHG_ID, PRCHG_DESC, DEPT_NBR, EVNT_CTG_CDE, PRCHG_TYP_CDE, PRCHG_ST_CDE
  FROM PRCHG
Expected Results: The source-minus-target query should return zero records.
Actual Results: Same as expected.
Pass or Fail: Pass
Tested By: Stev

Step 3
Description: Check the insert strategy used to load records into the target table.
Test Conditions: Identify one record from the source which is not in the target table, then run the session.
Expected Results: It should insert the record into the target table with the source data.
Actual Results: Same as expected.
Pass or Fail: Pass
Tested By: Stev

Step 4
Description: Check the update strategy used to load records into the target table.
Test Conditions: Identify one record from the source which is already present in the target table with a different PRCHG_ST_CDE or PRCHG_TYP_CDE value, then run the session.
Expected Results: It should update the existing record in the target table with the source data.
Actual Results: Same as expected.
Pass or Fail: Pass
Tested By: Stev
5 UNIX
cd /pmar/informatica/pc/pmserver/
2) If we are supposed to process flat files using Informatica but those files exist on a remote server, then we have to write a script to FTP them onto the Informatica server before we start processing those files.
3) File watch: if the indicator file is available in the specified location then we start our Informatica jobs; otherwise we send an email notification using the mailx command saying that the previous jobs did not complete successfully.
4) Using a shell script, update the parameter file with the session start time and end time (see the sketch below). This kind of scripting knowledge I do have; if any new UNIX requirement comes up, I can Google for the solution and implement it the same way.
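A minimal sketch for point 4, writing a timestamp into a workflow parameter file (the folder, workflow, session and parameter names are all illustrative):

#!/bin/ksh
PARAM_FILE=/data/pmparams/wf_load_emp.param
# standard parameter file section header: [Folder.WF:workflow.ST:session]
echo '[MyFolder.WF:wf_load_emp.ST:s_m_load_emp]'        >  $PARAM_FILE
echo '$$LOAD_START_TIME='"$(date '+%m/%d/%Y %H:%M:%S')" >> $PARAM_FILE
# the end time can be appended in the same way after the session finishes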
Basic Commands:
cat file1 – displays the contents of file1 (using cat > file1 you can create a non-zero-byte file: type the content and end with Ctrl+D).
cat file1 file2 > all – combines file1 and file2 into the file 'all' (the file is created if it doesn't exist).
cat file1 >> file2 – appends the contents of file1 to file2.
> redirects output from standard out (the screen) to a file, printer, or whatever you like.
ps -A – lists all running processes.
crontab command:
The crontab command is used to schedule jobs. You must have permission from the UNIX administrator to run this command. Jobs are scheduled using five fields, as follows:
Minutes (0-59)  Hour (0-23)  Day of month (1-31)  Month (1-12)  Day of week (0-6, where 0 is Sunday)
So, for example, if you want to schedule a job that runs the script named backup_jobs in the /usr/local/bin directory on Sunday (day 0) at 22:25 on the 15th of the month, the entry in the crontab file will be (* represents all values):
25 22 15 * 0 /usr/local/bin/backup_jobs
who | wc -l – counts the number of users currently logged in.
ls -l | grep '^d' – lists only the directories in the current directory.
Pipes:
The pipe symbol "|" is used to direct the output of one command to the input
of another.
To display hidden files:
ls -a
find command:
find -name aaa.txt – finds all the files named aaa.txt in the current directory and below.
find / -name vimrc – finds all the files named 'vimrc' anywhere on the system.
find /usr/local/games -name "*xpilot*" – finds all files whose names contain the string 'xpilot' within the '/usr/local/games' directory tree.
sed (the usual sed command for global string search and replace):
If you want to replace 'foo' with the string 'bar' globally in a file:
sed 's/foo/bar/g' inputfile > outputfile
You can find out what shell you are using with the command:
echo $SHELL
The first line of a shell script, for example:
#!/usr/bin/sh
or
#!/bin/ksh
tells the script which interpreter to use. As you know, the bash shell has some specific features that other shells do not have, and vice versa; the same applies to Perl, Python and other languages. In short, it tells your shell which shell (interpreter) to use when executing the statements in your shell script.
Interactive History
A feature of bash and tcsh (and sometimes other shells): you can use the up and down arrow keys to recall and re-run previously typed commands, and the history command to list them.
Basics of the vi editor
Opening a file:
vi filename
Creating text – edit modes (these keys enter an editing mode so you can type the text of your document):
r  Replace 1 character
R  Replace mode (until Esc is pressed)
Deletion of text:
x  Delete the character under the cursor
dd Delete the current line
Saving and quitting:
:w! existing.file  Overwrite an existing file with the file currently being edited.
:q  Quit.
You have successfully completed the Data Warehousing training. Best of luck!