Data Warehousing Basics

Data warehousing involves integrating data from multiple sources into a central repository to support analysis and decision making. A data warehouse has four key characteristics: it is subject-oriented, integrated, time-variant, and nonvolatile. There are typically two main stages: ETL (extract, transform, load) and report generation. ETL moves data from operational systems into the data warehouse, where it can then be analyzed using reporting tools. Common data warehouse architectures include enterprise data warehouses, data marts, and operational data stores. Dimensional modeling is often used, with common approaches being star schemas, snowflake schemas, and hybrid schemas.

Uploaded by

baba


Data warehousing Basics

Definition of data warehousing?

A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of
data in support of management's decision-making process.

Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about your
company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse,
you can answer questions like "Who was our best customer for this item last year?" This ability to
define a data warehouse by subject matter (sales, in this case) makes the data warehouse subject
oriented.
Integrated
Integration is closely related to subject orientation. Data warehouses must put data from
disparate sources into a consistent format. They must resolve such problems as naming conflicts
and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change. This is
logical because the purpose of a warehouse is to enable you to analyze what has occurred.
Time Variant
In order to discover trends in business, analysts need large amounts of data. This is very
much in contrast to online transaction processing (OLTP) systems, where performance
requirements demand that historical data be moved to an archive. A data warehouse's focus on
change over time is what is meant by the term time variant.
2. How many stages are there in data warehousing?
A data warehouse generally involves two stages:
• ETL
• Report generation
ETL
ETL is short for extract, transform, load: three database functions that are combined into one tool.
• Extract: the process of reading data from a source database.
• Transform: the process of converting the extracted data from its previous form into the
required form.
• Load: the process of writing the data into the target database.

ETL is used to migrate data from one database to another, to build data marts and data
warehouses, and to convert databases from one format to another.
It retrieves data from various operational databases, transforms it into useful
information, and finally loads it into the data warehouse.
Common ETL tools:
1. Informatica
2. Ab Initio
3. DataStage
4. BODI (Business Objects Data Integrator)
5. Oracle Warehouse Builder
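The three ETL steps can be sketched in a few lines of Python. This is only an illustration, not how any of the tools listed above work internally: the source rows, the `sales` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the target warehouse.

```python
import sqlite3

# Hypothetical source rows; in practice these would be read from an operational database.
SOURCE_ROWS = [
    {"cust": " alice ", "amount_cents": 1250},
    {"cust": "BOB", "amount_cents": 300},
]

def extract():
    """Extract: read rows from the source."""
    return list(SOURCE_ROWS)

def transform(rows):
    """Transform: tidy customer names and convert cents to currency units."""
    return [(r["cust"].strip().title(), r["amount_cents"] / 100) for r in rows]

def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse database
load(conn, transform(extract()))
loaded = conn.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall()
```

Keeping the three steps as separate functions mirrors the way ETL tools let each stage be designed and tested on its own.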
Report generation
Report generation uses OLAP (online analytical processing), a set of specifications
that allows client applications to retrieve data for analytical processing.
An OLAP tool is a specialized tool that sits between a database and the user in order to provide various analyses of
the data stored in the database.
It is a reporting tool that generates reports useful for decision support by top-level
management.
Common OLAP tools:

1. Business Objects
2. Cognos
3. MicroStrategy
4. Hyperion
5. Oracle Express
6. Microsoft Analysis Services

3. What are the types of data warehousing?

EDW (Enterprise data warehouse)
• It provides a central database for decision support throughout the enterprise.
• It is a collection of data marts.
Data mart
• It is a subset of a data warehouse.
• It is a subject-oriented database that supports the needs of individual departments in an
organization.
• It is also called a high performance query structure (HPQS).
• It supports a particular line of business, such as sales or marketing.
ODS (Operational data store)
• It is an integrated view of operational databases designed to support operational
monitoring.
• It is a collection of operational data sources designed to support transaction processing.
• Data is refreshed in near real time and used for day-to-day business activity.
• It is an intermediate layer between OLTP and OLAP that helps to create instant reports.
4. What types of modeling are involved in data warehouse architecture?

5. What are the types of approach in DWH?

Bottom-up approach: first develop the data marts, then integrate these data marts into the
EDW.
Top-down approach: first develop the EDW, then from that EDW develop the data marts.
Bottom up
OLTP -> ETL -> Data mart -> DWH -> OLAP
Top down
OLTP -> ETL -> DWH -> Data mart -> OLAP
Top down
• The cost of initial planning and design is high.
• It takes longer, typically more than a year.
Bottom up
• Data marts are planned and designed without waiting for the global warehouse design.
• Data marts deliver immediate results.
• It tends to take less time to implement.
• Errors in critical modules are detected earlier.
• Benefits are realized in the early phases.
• For these reasons it is often considered the better approach.
Data Modeling Types:
• Conceptual Data Modeling
• Logical Data Modeling
• Physical Data Modeling
• Dimensional Data Modeling
1. Conceptual Data Modeling
• A conceptual data model includes all major entities and relationships but does not contain
much detail about attributes. It is often used in the initial
planning phase.
• A conceptual data model is created by gathering business requirements from various sources
such as business documents and discussions with functional teams, business analysts, smart
management experts, and the end users who do the reporting on the database. Data modelers
create the conceptual data model and forward it to the functional team for review.
• Conceptual data modeling gives the functional and technical teams an idea of
how business requirements will be projected in the logical data model.
2. Logical Data Modeling
• This is the actual implementation and extension of a conceptual data model. A logical
data model includes all required entities, attributes, key groups, and relationships
that represent business information and define business rules.
3. Physical Data Modeling
• A physical data model includes all required tables, columns, relationships, and database
properties for the physical implementation of databases. Database performance, indexing
strategy, physical storage, and denormalization are important parameters of a physical model.
Logical vs. Physical Data Modeling
Logical Data Model | Physical Data Model
Represents business information and defines business rules | Represents the physical implementation of the model in a database
Entity | Table
Attribute | Column
Primary Key | Primary Key Constraint
Alternate Key | Unique Constraint or Unique Index
Inversion Key Entry | Non-Unique Index
Rule | Check Constraint, Default Value
Relationship | Foreign Key
Definition | Comment

Dimensional Data Modeling

• A dimensional model consists of fact and dimension tables.
• It is an approach to designing database schemas.
Types of Dimensional modeling
• Star schema
• Snowflake schema
• Starflake schema (or) Hybrid schema
• Multi star schema
What is Star Schema?
• A star schema is a logical database design that contains a centrally located fact table
surrounded by one or more dimension tables.
• Because the design looks like a star, it is called a star schema.
• A dimension table contains primary keys and textual descriptions.
• It contains de-normalized business information.
• A fact table contains a composite key and measures.
• The measures are key performance indicators used to evaluate
enterprise performance in terms of success and failure.
• E.g.: total revenue, product sales, discount given, number of customers.
• To generate a meaningful report, the report should draw on at least one dimension table and one fact
table.
The advantages of star schema
• Fewer joins
• Improved query performance
• Slicing down
• Easy understanding of data
Disadvantage:
• Requires more storage space

Example of Star Schema:
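A star schema can be sketched with Python's built-in sqlite3 module. The table and column names here (`dim_product`, `dim_time`, `fact_sales`) and the sample rows are hypothetical and chosen only for illustration: one fact table holds a composite key and a measure, and each dimension is reached with a single join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: a primary key plus de-normalized descriptive columns.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- Fact table: composite key made of dimension keys, plus a measure.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    time_key INTEGER REFERENCES dim_time(time_key),
    revenue REAL,
    PRIMARY KEY (product_key, time_key)
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Furniture');
INSERT INTO dim_time VALUES (1, 2023, 1), (2, 2023, 2);
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 15.0), (2, 1, 40.0);
""")

# One join from the fact table to a dimension is enough to report by category.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```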

Snowflake Schema
• A snowflake schema is a star schema in which the dimension tables are split into one or more further dimension tables.
• The de-normalized dimension tables are split into normalized dimension tables.
Example of Snowflake Schema:
• Consider a snowflake schema with 4 dimension tables, 4
lookup tables and 1 fact table. The hierarchies (category, branch, state, and
month) are broken out of the dimension tables (PRODUCT, ORGANIZATION,
LOCATION, and TIME) respectively and stored separately in the lookup tables.

• It increases the number of joins, which leads to poorer performance when retrieving data.

• Some organizations normalize the dimension tables to save space.
• Since dimension tables take up relatively little space, the snowflake approach can often be
avoided.
• Bitmap indexes cannot be used effectively.

Important aspects of Star Schema & Snow Flake Schema

• In a star schema, every dimension has a primary key.
• In a star schema, a dimension table does not have any parent table,
whereas in a snowflake schema a dimension table has one or more parent tables.
• In a star schema, hierarchies for the dimensions are stored in the dimension table itself,
whereas in a snowflake schema hierarchies are broken out into separate tables. These
hierarchies help to drill down the data from the topmost level to the lowermost
level.
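The extra join that a snowflake schema introduces can be seen in a small sqlite3 sketch. The names `lkp_category`, `dim_product`, and `fact_sales` are hypothetical; the category hierarchy is moved out of the product dimension into its own parent (lookup) table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Parent (lookup) table holding the hierarchy level broken out of the dimension.
CREATE TABLE lkp_category (category_key INTEGER PRIMARY KEY, category TEXT);
-- Normalized dimension table that points at its parent table.
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name TEXT,
    category_key INTEGER REFERENCES lkp_category(category_key)
);
CREATE TABLE fact_sales (product_key INTEGER, revenue REAL);
INSERT INTO lkp_category VALUES (1, 'Stationery'), (2, 'Furniture');
INSERT INTO dim_product VALUES (1, 'Pen', 1), (2, 'Pencil', 1), (3, 'Lamp', 2);
INSERT INTO fact_sales VALUES (1, 10.0), (2, 5.0), (3, 40.0);
""")

# Reporting by category now needs two joins instead of one.
rows = conn.execute("""
    SELECT c.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN lkp_category c ON c.category_key = p.category_key
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()
```

The category name is stored once per category rather than once per product, which is the space saving that normalization buys at the cost of the additional join.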
Star flake schema (or) Hybrid Schema
• A hybrid schema is a combination of the star and snowflake schemas.

Multi Star schema

• Multiple fact tables share a set of dimension tables.
• Conformed dimensions are simply reusable dimensions:
dimensions that are used multiple times or in multiple data marts
and are common across those data marts.
Measure Types (or) Types of Facts
• Additive: measures that can be summed up across all dimensions.
o Ex: Sales revenue
• Semi-additive: measures that can be summed up across some dimensions but not
others.
o Ex: Current balance
• Non-additive: measures that cannot be summed up across any of the dimensions.
o Ex: Student attendance
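The difference between additive and semi-additive measures is easiest to see with account balances. The sketch below uses made-up end-of-day balances: summing across accounts for one day gives a meaningful total, while summing one account across days does not, so the latest value is taken instead.

```python
# Hypothetical end-of-day balances as (account, day, balance).
balances = [
    ("A", 1, 100), ("A", 2, 120),
    ("B", 1, 50), ("B", 2, 30),
]

# Summing across the account dimension for a single day is meaningful.
total_day2 = sum(balance for _, day, balance in balances if day == 2)

# Summing across the time dimension is not: 100 + 120 is not a balance
# that account A ever held. Keep the latest balance per account instead.
latest = {}
for account, day, balance in sorted(balances, key=lambda row: row[1]):
    latest[account] = balance
```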
Surrogate Key
• Joins between fact and dimension tables should be based on surrogate keys.
• The keys should carry no business meaning: users should not be able to obtain any information by looking at them.
• These keys should be simple integers.
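A minimal sketch of surrogate key assignment follows. The class name `SurrogateKeyGenerator` and the sample natural keys are hypothetical; the point is only that each distinct source (natural) key is mapped to a meaningless sequential integer, and the same natural key always receives the same surrogate key.

```python
from itertools import count

class SurrogateKeyGenerator:
    """Maps natural (source-system) keys to simple sequential integers."""

    def __init__(self):
        self._next = count(1)   # sequence 1, 2, 3, ...
        self._keys = {}         # natural key -> surrogate key

    def key_for(self, natural_key):
        # Assign a new integer only the first time a natural key is seen.
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next)
        return self._keys[natural_key]

gen = SurrogateKeyGenerator()
keys = [gen.key_for(k) for k in ["CUST-0042", "CUST-0007", "CUST-0042"]]
```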

A sample data warehouse schema

WHY IS A STAGING AREA NEEDED FOR DWH?
• A staging area is needed to clean operational data before loading it into the data warehouse.
• Cleansing here includes merging data that comes from different sources.
• It is the area where most of the ETL work is done.
Data Cleansing
• It is used to remove duplicates.
• It is used to correct wrong email addresses.
• It is used to identify missing data.
• It is used to convert data types.
• It is used to capitalize names and addresses.
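Several of these cleansing steps can be sketched in plain Python. The records and the `cleanse` function are hypothetical, and the email handling only normalizes case and whitespace; genuinely correcting wrong addresses would need rules that are not covered here.

```python
# Hypothetical raw records arriving in the staging area.
raw = [
    {"name": "john smith", "email": "John@Example.com ", "age": "34"},
    {"name": "john smith", "email": "john@example.com", "age": "34"},
    {"name": "mary jones", "email": None, "age": "29"},
]

def cleanse(rows):
    seen, clean, missing = set(), [], []
    for row in rows:
        # Normalize the email (trim and lowercase) when one is present.
        email = row["email"].strip().lower() if row["email"] else None
        record = {
            "name": row["name"].title(),   # capitalize names
            "email": email,
            "age": int(row["age"]),        # convert the data type
        }
        if email is None:
            missing.append(record["name"])  # identify missing data
        key = (record["name"], email)
        if key in seen:                      # remove duplicates
            continue
        seen.add(key)
        clean.append(record)
    return clean, missing

clean, missing = cleanse(raw)
```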
Types of Dimensions:
There are four types of dimensions:
• Conformed dimensions
• Junk (garbage) dimensions
• Degenerate dimensions
• Slowly changing dimensions
• A conformed dimension is one that can be shared by multiple fact tables or multiple data marts.
• A junk (garbage) dimension groups miscellaneous flag values into a single dimension.
• A degenerate dimension is dimensional in nature but exists in the fact table (e.g., invoice number).
It is neither a fact nor strictly a dimension attribute, but it is useful for some kinds
of analysis. Such values are kept as attributes in the fact table.
Degenerate dimension: a column in the key section of the fact table that has no
associated dimension table but is used for reporting and analysis. Such a column is
called a degenerate dimension or line item dimension.
For example, suppose a fact table has customer_id, product_id, branch_id, employee_id, bill_no,
and date in the key section and price, quantity, amount in the measure section. In this fact table,
bill_no is a single value with no associated dimension table. Instead of
creating a separate dimension table for that single value, we can include it in the fact table to
improve performance. Here the column bill_no is a degenerate dimension or line item
dimension.
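The bill_no example can be sketched as follows. The fact rows are made up, and only a few of the key columns are shown; bill_no lives directly in the fact table and can still be used to group measures, without any lookup into a dimension table.

```python
from collections import defaultdict

# Hypothetical fact rows: (customer_id, product_id, bill_no, quantity, amount).
fact_sales = [
    (1, 10, "B-1001", 2, 20.0),
    (1, 11, "B-1001", 1, 5.0),
    (2, 10, "B-1002", 3, 30.0),
]

# Group the amount measure by the degenerate dimension bill_no.
amount_per_bill = defaultdict(float)
for _customer_id, _product_id, bill_no, _quantity, amount in fact_sales:
    amount_per_bill[bill_no] += amount
```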
1. What is a Data Warehouse?
A Data Warehouse is a collection of data marts representing historical data from different
operational data sources (OLTP). The data from these OLTP sources is structured and optimized for querying
and data analysis in the Data Warehouse.

2. What is a Data Mart?

A Data Mart is a subset of a data warehouse that can provide data for reporting and analysis
on a section, unit or department such as the Sales Dept or HR Dept. Data Marts are sometimes also
called HPQS (Higher Performance Query Structure).

3. What is OLAP?
OLAP stands for Online Analytical Processing. It uses database tables (Fact and Dimension
tables) to enable multidimensional viewing, analysis and querying of large amount of data.

4. What is OLTP?
OLTP stands for Online Transaction Processing. All databases other than data warehouse databases
are OLTP databases. They use a normalized schema structure and are
designed for recording the daily operations and transactions of a business.
Difference between OLTP and OLAP
No. | OLTP | OLAP
1 | Application oriented (e.g., a purchase order is a functionality of an application) | Subject oriented (subjects such as customer, product, item, time)
2 | Used to run the business | Used to analyze the business
3 | Detailed data | Summarized data
4 | Repetitive access | Ad-hoc access
5 | Few records accessed at a time (tens), simple query | Large volumes accessed at a time (millions), complex query
6 | Small database | Large database
7 | Current data | Historical data
8 | Clerical user | Knowledge user
9 | Row-by-row loading | Bulk loading
10 | Time invariant | Time variant
11 | Normalized data | De-normalized data
12 | E-R schema | Star schema

5. What are Dimensions?

Dimensions are categories by which summarized data can be viewed. For example a profit
Fact table can be viewed by a time dimension.

6. What are Conformed Dimensions?

Conformed dimensions are dimensions that are reusable and fixed in nature, for example the customer, time, and geography
dimensions.

7. What are Fact Tables?

A Fact Table is a table that contains summarized numerical data (facts) and historical data. The
fact table has a foreign key-primary key relationship with each dimension table. The fact table maintains the
information in 3rd normal form.
A star schema is defined as a logical database design in which there is a
centrally located fact table surrounded by one or more dimension tables. This design
is best suited for a data warehouse or data mart.

8. What are the types of Facts?

The types of Facts are as follows.
1. Additive Facts: A Fact which can be summed up for any of the dimension available in the
fact table.
2. Semi-Additive Facts: A Fact which can be summed up to a few dimensions and not for all
dimensions available in the fact table.
3. Non-Additive Fact: A Fact which cannot be summed up for any of the dimensions available
in the fact table.

9. What are the types of Fact Tables?

The types of Fact Tables are:
1. Cumulative Fact Table: This type of fact table generally describes what has
happened over a period of time. It contains additive facts.
2. Snapshot Fact Table: This type of fact table deals with a particular period of
time. It contains non-additive and semi-additive facts.

10. What is Grain of Fact?

The Grain of Fact is defined as the level at which the fact information is stored in a fact table.
This is also called as Fact Granularity or Fact Event Level.

11. What is a Factless Fact Table?

A fact table that does not contain any facts is called a factless fact table. Generally, when we need
to combine two data marts, one data mart will have a factless fact table and the other a
common fact table.

12. What are Measures?

Measures are numeric data based on columns in a fact table.

13. What are Cubes?

Cubes are data processing units composed of fact tables and dimensions from the data
warehouse. They provide multidimensional analysis.

14. What are Virtual Cubes?

These are combination of one or more real cubes and require no disk space to store them.
They store only definition and not the data.

15. What is a Star schema design?

A Star schema is defined as a logical database design in which there will be a centrally located
fact table which is surrounded by at least one or more dimension tables. This design is best suited for
Data Warehouse or Data Mart.

16. What is Snow Flake schema Design?

In a Snow Flake design the dimension table (de-normalized table) will be further divided into
one or more dimensions (normalized tables) to organize the information in a better structural format.
To design snow flake we should first design star schema design.

17. What is Operational Data Store [ODS] ?

It is a collection of integrated databases designed to support operational monitoring. Unlike
the OLTP databases, the data in the ODS are integrated, subject oriented and enterprise wide data.

18. What is Denormalization?

Denormalization means deliberately storing redundant (duplicate) data in a table. Dimension tables follow
the denormalization method together with the surrogate key technique.

19. What is Surrogate Key?

A Surrogate Key is a sequence generated key which is assigned to be a primary key in the
system (table).

20. What are the client components of Informatica 7.1.1?

Informatica 7.1.1 Client Components:
1. Informatica Designer
2. Informatica Work Flow Manager
3. Informatica Work Flow Monitor
4. Informatica Repository Manager
5. Informatica Repository Server Administration Console.

21. What are the server components of Informatica 7.1.1?

Informatica 7.1.1 Server Components:
1. Informatica Server
2. Informatica Repository Server.

22. What is Metadata?

Data about data is called as Metadata. The Metadata contains the definition of a data.

23. What is a Repository?

Repository is a centrally stored container which stores the metadata, which is used by the
Informatica Power center server and Power Center client tools. The Informatica stores Repository in
relational database format.

Informatica 7.1.1 Repository has 247 database objects

Informatica 6.1.1 Repository has 172 database objects
Informatica 5.1.1 Repository has 145 database objects
Informatica 4.1.1 Repository has 111 database objects

24. What is Data Acquisition Process?

The process of extracting data from different source systems (operational databases),
integrating and transforming the data into a homogeneous format, and loading it into the target
warehouse database. It is simply called ETL (Extraction, Transformation and Loading). Data
acquisition process designs are given different names by different ETL vendors:
Informatica ----> Mapping
DataStage ----> Job
Ab Initio ----> Graph

25. What are the GUI based ETL tools?

The following are the GUI based ETL tools:
1. Informatica
2. DataStage
3. Data Junction
4. Oracle Warehouse Builder
5. Abinitio
6. Business Object Data Integrator
7. Cognos Decision Stream.

26. What are programmatic based ETL tools?

1. Pl/Sql
2. SAS BASE
3. SAS ACCESS
4. Tera Data Utilities
a. BTEQ
b. Fast Load
c. Multi Load
d. Fast Export
e. T (Trickle) Pump

27. What is a Transformation?

A transformation is a repository object that generates, modifies, or passes data.
Transformations in a mapping represent the operations the PowerCenter Server performs on the data.
Data passes into and out of transformations through ports that you link in a mapping or mapplet.
Transformations can be active or passive. An active transformation can change the number of rows
that pass through it. A passive transformation does not change the number of rows that pass through
it.
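The active/passive distinction can be illustrated outside Informatica with two plain Python functions. These are analogies, not PowerCenter code: `expression` and `filter_rows` are hypothetical names, and the rows are made up. The first returns exactly one row per input row; the second may return fewer.

```python
rows = [
    {"item": "pen", "price": 2.0, "qty": 3},
    {"item": "lamp", "price": 40.0, "qty": 0},
]

def expression(rows):
    """Passive: one output row per input row, with a derived column added."""
    return [{**r, "total": r["price"] * r["qty"]} for r in rows]

def filter_rows(rows):
    """Active: rows that fail the condition are dropped, so the row count can change."""
    return [r for r in rows if r["qty"] > 0]

passed = expression(rows)    # same number of rows as the input
kept = filter_rows(passed)   # possibly fewer rows than the input
```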

28. The following are detailed descriptions of the transformations available in Informatica.

Transformation | Type | Description
Aggregator | Active / Connected | Performs aggregate calculations.
Application Source Qualifier | Active / Connected | Represents the rows that the PowerCenter Server reads from an application, such as an ERP source, when it runs a session.
Custom | Active or Passive / Connected | Calls a procedure in a shared library or DLL.
Expression | Passive / Connected | Calculates a value.
External Procedure | Passive / Connected or Unconnected | Calls a procedure in a shared library or in the COM layer of Windows.
Filter | Active / Connected | Filters data.
Input | Passive / Connected | Defines mapplet input rows. Available in the Mapplet Designer.
Joiner | Active / Connected | Joins data from different databases or flat file systems.
Lookup | Passive / Connected or Unconnected | Looks up values.
Normalizer | Active / Connected | Source qualifier for COBOL sources. Can also be used in the pipeline to normalize data from relational or flat file sources.
Output | Passive / Connected | Defines mapplet output rows. Available in the Mapplet Designer.
Rank | Active / Connected | Limits records to a top or bottom range.
Router | Active / Connected | Routes data into multiple transformations based on group conditions.
Sequence Generator | Passive / Connected | Generates primary keys.
Sorter | Active / Connected | Sorts data based on a sort key.
Source Qualifier | Active / Connected | Represents the rows that the PowerCenter Server reads from a relational or flat file source when it runs a session.
Stored Procedure | Passive / Connected or Unconnected | Calls a stored procedure.
Transaction Control | Active / Connected | Defines commit and rollback transactions.
Union | Active / Connected | Merges data from different databases or flat file systems.
Update Strategy | Active / Connected | Determines whether to insert, delete, update, or reject rows.
XML Generator | Active / Connected | Reads data from one or more input ports and outputs XML through a single output port.
XML Parser | Active / Connected | Reads XML from one input port and outputs data to one or more output ports.
XML Source Qualifier | Active / Connected | Represents the rows that the PowerCenter Server reads from an XML source when it runs a session.

29. What are features of Informatica Repository Server?

Features of Informatica Repository Server.

1. Informatica client applications and the Informatica Server access the repository
database tables through the Repository Server.
2. An Informatica client connects to the Repository Server through its host name/IP address
and port number.
3. The Repository Server can manage multiple repositories on different machines on the
network.
4. For each repository database registered with the Repository Server, it configures and
manages a Repository Agent process.
5. The Repository Agent is a multi-threaded process that performs the actions needed to
retrieve, insert, and update metadata in the repository database tables.

30. How many types of Repositories are there?

There are three types of Repositories:
1. Standalone Repository
2. Global Repository
3. Local Repository

31. What are the types of metadata stored in Repository?

The following types of metadata are stored in Repository:
1. Database connections
2. Global objects
3. Mappings
4. Mapplets
5. Multi-dimensional metadata
6. Reusable Transformations
7. Sessions and Batches
8. Shortcuts
9. Source Definitions
10. Target definitions
11. Transformations

32. What are the types of locks in Repository?

There are five types of Locks in Repository:
1. Read Lock
2. Write Lock
3. Execute Lock
4. Fetch Lock
5. Save Lock

33. What are Repository objects which we can export?

We can export the following Repository objects:
1. Sources
2. Targets
3. Transformations
4. Mapplets
5. Mappings
6. Sessions

34. What is a Work Flow?

A Work Flow is a set of instructions on how to execute tasks such as sessions, emails and shell
commands. A WorkFlow is created from Workflow Manager.

35. What are actions which can be performed by pmcmd command?

We can perform the following actions with pmcmd:
1. Check whether the Informatica server is running
2. Start and stop sessions and batches
3. Recover sessions.
4. Stop the Informatica server.
pmcmd returns zero on success and non zero on failure

36. What is commit interval?

A commit interval is the interval at which the Informatica Server commits data to relational
targets during a session.
37. What is the use of Stored Procedure Transformation?
We use the Stored Procedure Transformation for populating and maintaining the database.

38. What is the use of partitioning the sessions?

The partitioning of session increases the session performance by reducing the time period of
reading the source data and loading the data into the target.

39. What are the uses of Lookup Transformation?

The Lookup Transformation is useful for:
1. Getting a related value from a table using a key column value
2. Updating a slowly changing dimension table
3. Checking whether records already exist in the table.

40. What is Polling?

It displays the update information about the session in the monitor window. The monitor
window displays the status of each session when you poll the Informatica Server.

41. What is a Parameter File?

The parameter File is used to define the values of the parameters and variables used in a
session. It is a file created in a notepad and saved with .prm extension.

42. What is Metadata Reporter?

It is a web based application that enables you to run reports against the repository metadata.
With a metadata reporter you can access information about your repository without having knowledge
of SQL.

43. What is meant by Lookup Cache?

The Informatica Server builds a cache in memory when it processes the first row of data in a
cached Lookup transformation.

44. What is a Load Manager?

The Load Manager is the primary Informatica Server process. It performs the following tasks:
• Manages session and batch scheduling
• Locks the session and reads the session properties
• Reads the parameter file
• Expands the server and session variables and parameters
• Verifies permissions and privileges

45. What are the tasks performed by Sequence Generator Transformation?

1. Create keys
2. Replace missing values
3. Cycle through a sequential range of numbers.

46. What is the end value of the Sequence Generator?

The end value of the Sequence Generator is 2147483647.

47. What are variables supplied by the Transaction Control Transformation?

1. TC_COMMIT_BEFORE
2. TC_COMMIT_AFTER
3. TC_ROLLBACK_BEFORE
4. TC_ROLLBACK_AFTER
5. TC_CONTINUE_TRANSACTION [Default]

48. How to implement Update Strategy?

To implement the Update Strategy Transformation, the source and target tables should have
primary keys so that records can be compared to find the latest changes.

49. What are constants of Update Strategy Transformation?

The constants of Update Strategy Transformation are:

1. DD_INSERT - 0
2. DD_UPDATE - 1
3. DD_DELETE - 2
4. DD_REJECT - 3
DD Stands For Data Driven

50. What are the benefits of Star Schema Design?

• Fewer tables
• Designed for analysis across time
• Simplified joins
• Less database space
• Supports drilling on reports

51. What is Data Scrubbing?

Data scrubbing is the process of cleaning up the junk in legacy data to make it
accurate and useful. Simply put, it is making good data out of bad data.
52. What are Bad Rows (Rejected Rows)?
The Informatica Server dumps the bad or rejected rows sent out by a
transformation into a text file named tablename.bad.

53. The Normalizer Transformation is mainly used to extract and format the Cobol files.

54. We can apply “Distinct” clause only in Source Qualifier and Sorter Transformations.

55. What are the types of Data Modeling?

1. Conceptual Modeling
2. Physical Modeling
3. Logical Modeling

56. What is Forward Engineering?

Using the Erwin tool, the data modeler converts the .SQL script (the logical structure of the tables)
into physical tables at the database level. This is called Forward Engineering.

57. What is common use of creating a Factless Fact Table?

The most common use of creating a Factless fact table is to capture date transaction events.

58. What are the different sources of Source systems of Data Warehouse?
1. RDBMS
2. Flat Files
3. XML Files
4. SAP R/3
5. PeopleSoft
6. SAP BW
7. Web Methods
8. Web Services
9. Siebel
10. Cobol Files
11. Legacy Systems.

59. You cannot use XML source qualifier in a mapplet and Joiner and Normalizer
Transformations.

60. What are the Session Partitions types?

1. Round-robin
2. Hash keys
3. Key range
4. Pass-through
5. Database partitioning
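Two of these partition types can be sketched in plain Python to show the idea. This is an analogy only, not how the Informatica Server implements partitioning: the function names are hypothetical, and `zlib.crc32` is used as a stand-in hash function. Round-robin spreads rows evenly; hash-key partitioning sends all rows with the same key value to the same partition.

```python
import zlib

def round_robin(rows, n):
    """Distribute rows evenly across n partitions in turn."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_keys(rows, n, key):
    """Send every row with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        bucket = zlib.crc32(str(row[key]).encode()) % n
        parts[bucket].append(row)
    return parts

rows = [{"id": i, "region": r} for i, r in enumerate(["N", "S", "N", "E", "S"])]
rr = round_robin(rows, 2)
hk = hash_keys(rows, 2, "region")
```

Hash-key partitioning matters when a later step groups or aggregates by that key, because all rows of one group then sit in a single partition.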

61. You cannot use Incremental Aggregation when the mapping includes an Aggregator
transformation with Transaction transformation scope.

62.While importing source definition the metadata that will be imported are:
1. Source Name
2. Database Location
3. Column Names
4. Data Types
5. Key Constraints

63. We can stop a Batch in two ways:

1. From the Server Manager
2. With the pmcmd command
64. What is a Batch and what are the types of Batches?
A grouping of sessions is known as a Batch. There are two types of batches:

1. Sequential
2. Concurrent

65. What is a tracing level and what are the types of tracing level?

Tracing level represents the amount of information that the Informatica Server writes to a log file.
Types of tracing levels are:

1. Normal
2. Terse
3. Verbose Init
4. Verbose Data

66. What is the default join that source qualifier provides?

Inner Join

67. Types of Slowly Changing Dimensions:

1. Type – I (Recent updates)
2. Type – II (Full historical information)
3. Type – III (Partial historical information)
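The three slowly changing dimension types can be contrasted with a small Python sketch. The dimension row, the column names (`sk`, `current`, `previous_city`, and so on), and the function names are hypothetical; the first type overwrites the value, the second adds a new row version, and the third keeps the previous value in an extra column.

```python
import copy

# Hypothetical customer dimension with a single current row.
BASE = [{"sk": 1, "customer_id": "C1", "city": "Pune",
         "start_date": "2020-01-01", "end_date": None, "current": True}]

def apply_type1(dim, customer_id, new_city):
    """Overwrite the attribute; only the most recent value survives."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row["city"] = new_city
    return dim

def apply_type2(dim, customer_id, new_city, change_date):
    """Close the current row and append a new version (full history)."""
    for row in dim:
        if row["customer_id"] == customer_id and row["current"]:
            row["current"] = False
            row["end_date"] = change_date
    dim.append({"sk": len(dim) + 1, "customer_id": customer_id, "city": new_city,
                "start_date": change_date, "end_date": None, "current": True})
    return dim

def apply_type3(dim, customer_id, new_city):
    """Keep only the previous value in an extra column (partial history)."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row["previous_city"] = row["city"]
            row["city"] = new_city
    return dim

type1 = apply_type1(copy.deepcopy(BASE), "C1", "Mumbai")
type2 = apply_type2(copy.deepcopy(BASE), "C1", "Mumbai", "2024-06-01")
type3 = apply_type3(copy.deepcopy(BASE), "C1", "Mumbai")
```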

68. What are Update Strategy’s target table options?

1. Update as Update: Updates each row flagged for update if it exists in the table.
2. Update as Insert: Inserts a new row for each update.
3. Update else Insert: Updates if row exists, else inserts.
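The "update else insert" behaviour can be sketched with sqlite3. This is an illustration of the idea, not the Informatica implementation: the `customer` table and the `update_else_insert` function are hypothetical. The row is updated if it exists; if no row was affected, it is inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Pune')")

def update_else_insert(conn, cust_id, city):
    # Try the update first; rowcount reports how many rows were changed.
    cur = conn.execute("UPDATE customer SET city = ? WHERE id = ?", (city, cust_id))
    if cur.rowcount == 0:
        # No existing row, so insert instead.
        conn.execute("INSERT INTO customer VALUES (?, ?)", (cust_id, city))

update_else_insert(conn, 1, "Mumbai")  # existing row: updated
update_else_insert(conn, 2, "Delhi")   # missing row: inserted
result = conn.execute("SELECT id, city FROM customer ORDER BY id").fetchall()
```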

69. What is a Data in a database this include the source of tables, the meaning of the keys
and the relationship between the tables.

70. In Conceptual Modeling and Logical modeling the tables are called as entities.

71. What does a Mapping document contains?

The Mapping document contains the following information :
1. Source Definition – from where the database has to be loaded
2. Target Definition – to where the database has to be loaded
3. Business Logic – what logic has to be implemented in staging area.

72. What does the Top Down Approach say?

The Top Down Approach was coined by Bill Inmon. According to this approach, "First we
need to implement the Enterprise data warehouse by extracting the data from individual departments,
and from the Enterprise data warehouse develop subject-oriented databases called Data Marts."

73. What does the Bottom Up Approach or Ralph Kimball Approach say?
The Bottom Up Approach was coined by Ralph Kimball. According to this approach,
"First we need to develop subject-oriented databases called Data Marts, then integrate all the Data
Marts to develop the Enterprise data warehouse."

74. Who is the first person in the organization to start the Data Warehouse project?
The first person to start the Data Warehouse project in an organization is the Business Analyst.

75. What is Dimensional Modeling?

Dimensional Modeling is a high-level methodology used to implement the star schema
structure; it is carried out by the Data Modeler.

76. What are the types of OLAPs ?

1. DOLAP: OLAP tools that work with desktop databases are called DOLAP. Example:
Cognos EP 7 Series, Business Objects, and MicroStrategy.
2. ROLAP: OLAP tools that work with relational databases are called ROLAP. Example:
Business Objects, MicroStrategy, Cognos ReportNet, and BRIO.
3. MOLAP: OLAP tools that create multidimensional structures called cubes
are called MOLAP. Example: Cognos ReportNet.
4. HOLAP: OLAP tools that combine the features of ROLAP and MOLAP are called
HOLAP. Example: Cognos ReportNet.

77. What is the extension of Repository backup?

The extension of the Repository backup is .rep

78. Which join is not supported by Joiner Transformation?

The non-equi joins are not supported by joiner Transformation.

79. What is SQL Override?

Applying the join condition in the Source Qualifier is called SQL override.

80. What is Rank Index?

When you create a Rank Transformation, a “Rank Index” port is created by default. It stores
the ranking position of each row, up to the number of ranks specified.
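
A minimal Python sketch of what a rank index column could look like, assuming it holds the position (1, 2, 3, ...) of each row returned by a top-N ranking. The function name `rank_top_n` and the column name `RANKINDEX` are illustrative, and ties are not given any special treatment here.

```python
def rank_top_n(rows, key, n):
    """Return the top `n` rows by `key`, each tagged with its rank position."""
    # Highest value first, then keep only the requested number of ranks.
    ordered = sorted(rows, key=lambda r: r[key], reverse=True)[:n]
    # Copy each row and add its position as the rank index.
    return [dict(r, RANKINDEX=i) for i, r in enumerate(ordered, start=1)]


sales = [
    {"name": "a", "sales": 10},
    {"name": "b", "sales": 30},
    {"name": "c", "sales": 20},
]
top_two = rank_top_n(sales, "sales", 2)
```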

81. What is Sort Key?

The column on which the sorting takes place in the Sorter Transformation is called the “Sort
Key” column.

82. What is default group in Router Transformation?

In the Router Transformation, rows that do not satisfy any user-defined group condition are
captured by the default group, from which they can be passed on to a target table.
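
The idea of user-defined groups plus a default group can be sketched in Python as follows. This is an illustration under assumptions: group conditions are modelled as plain predicate functions, and the name `route` and the group label `DEFAULT` are hypothetical.

```python
def route(rows, groups):
    """Send each row to every group whose condition it meets.

    `groups` maps a group name to a predicate. Rows that meet no
    condition are collected in the DEFAULT group.
    """
    out = {name: [] for name in groups}
    out["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                out[name].append(row)
                matched = True
        if not matched:
            out["DEFAULT"].append(row)
    return out


routed = route(
    [{"amount": 50}, {"amount": 500}, {"amount": -1}],
    {
        "SMALL": lambda r: 0 <= r["amount"] < 100,
        "LARGE": lambda r: r["amount"] >= 100,
    },
)
```

Here the row with a negative amount satisfies neither condition, so it ends up in `routed["DEFAULT"]`.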

83. What is an Unconnected Transformation?

A transformation that does not take part in the mapping data flow is called an Unconnected
Transformation.

84. What is a Connected Transformation?

A transformation that takes part in the mapping data flow is called a Connected Transformation.
By default, all transformations are connected transformations.

85. Which Transformation is responsible to maintain updates in warehouse database?

Update Strategy Transformation.

86. What are the caches contained by the Look up Transformation?

1. Static Lookup cache
2. Dynamic Lookup Cache
3. Persistent Lookup Cache
4. Data cache
5. Index cache

87. What are the Direct and Indirect methods in the Flat file extraction?
In the direct method we extract the flat file by using its own metadata. In the indirect method we
extract all the flat files by using one flat file’s metadata.
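
The difference can be sketched in Python. This is an assumption-laden illustration, not Informatica behaviour: the indirect method is modelled as a list file that names several data files, all parsed with one shared column definition. The names `COLUMNS`, `read_direct` and `read_indirect` are hypothetical.

```python
import csv

# One shared column definition (metadata) used for every file.
COLUMNS = ["id", "name"]


def read_direct(path, columns=COLUMNS):
    """Direct method: read a single flat file with its column definition."""
    with open(path, newline="") as f:
        return [dict(zip(columns, record)) for record in csv.reader(f)]


def read_indirect(list_path, columns=COLUMNS):
    """Indirect method: `list_path` names one data file per line;
    every listed file is read with the same column definition."""
    rows = []
    with open(list_path) as f:
        for line in f:
            data_path = line.strip()
            if data_path:  # skip blank lines in the list file
                rows.extend(read_direct(data_path, columns))
    return rows
```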

88. What is a Mapplet?

A Mapplet is a type of metadata object that contains a set of reusable transformation logic which
can be reused in multiple mappings. A mapplet contains one Mapplet Input Transformation and one
Mapplet Output Transformation.

89. What is the basic difference between a reusable transformation and a mapplet?
A mapplet is a set of reusable transformation logic, whereas a reusable transformation is built from
a single transformation.

90. What is a Target Load Plan?
The Target Load Plan is the order in which we should load the targets to implement the Data
Acquisition Process.

91. What is Constraint Based Load ordering?

Constraint Based Load ordering specifies the loading order of the dimension tables based on
the constraints defined on those tables. Constraint Based Load ordering is used for
implementing snowflake schema data loading.
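
One way to picture constraint-based ordering is as a dependency sort: a table that is referenced by a foreign key must be loaded before the table that references it. The Python sketch below uses the standard library `graphlib` module (Python 3.9+). The table names and the function `load_order` are made up for illustration; this is not how Informatica computes the order internally.

```python
from graphlib import TopologicalSorter


def load_order(foreign_keys):
    """Return table names so that every parent comes before its children.

    `foreign_keys` maps each table to the set of tables it references.
    """
    return list(TopologicalSorter(foreign_keys).static_order())


# Hypothetical snowflake chain: product -> category -> department.
order = load_order({
    "product": {"category"},
    "category": {"department"},
    "department": set(),
})
```

With this input `department` is placed before `category`, which is placed before `product`.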

92. How many loading criteria are there?

There are three types of loading criteria:
1. Parallel loading
2. Sequential loading
3. Control flow loading

93. What is File Watch Event?

The Event Wait activity of a session has an event called File Watch, which watches whether the
file has been copied or not.

94. What is worklet?

The worklet is a group of sessions. To execute the worklet we have to create the workflow.

95. Why do we use the Stored Procedure transformation?

For populating and maintaining databases.

96. Why do we partition a session in Informatica?

Partitioning improves session performance by reducing the time taken to read the
source and load the data into the target.

97. Why do we use the Lookup transformation?

Look up Transformations can access data from relational tables that are not sources in
mapping. With Lookup transformation, we can accomplish the following tasks.

98. Which transformation should we use to normalize the COBOL and relational sources?
The Normalizer transformation. When you drag a COBOL source into the Designer workspace, the
Normalizer transformation automatically appears, creating input and output ports for every column in the source.

99. Which tool do you use to create and manage sessions and batches and to monitor and stop
the Informatica server?
Informatica Server Manager.

100. What are the types of data that pass between the Informatica server and a stored
procedure?
There are three types of data:
1. Input/output parameter
2. Return Values
3. Status code

101. What are the groups available in Router Transformation?

1. User defined group
2. Default group

102. What are join types in Joiner Transformation?

The joins supplied by the Joiner Transformation are:
1. Normal Join
2. Master Outer Join
3. Detail Outer Join
4. Full Outer Join

103. What are the designer tools available for creation of Transformations?
1. Mapping Designer
2. Transformation Developer
3. Mapplet Designer

104. What are the basic needs to join two sources in Source Qualifier?
The two source tables should have a primary key – foreign key relationship and the two source
tables should have matching data types.

105. What is a Status code?

Status code provides error handling facility during the session execution.

106. What is Data Driven?

Data Driven is the setting that tells the Informatica Server to insert, delete, update or reject
each row according to the flag set by the Update Strategy Transformation.
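
A small Python sketch of row-level flagging. The four constant names come from Informatica's Update Strategy expressions, and the numeric values follow the commonly documented ones, but treat them as an assumption here. The business rules inside `flag_row` (a missing `id` is rejected, a `deleted` marker means delete) are purely hypothetical.

```python
# Flag constants, named after the Update Strategy constants.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def flag_row(row, existing_keys):
    """Decide per row which database operation should be applied."""
    if row.get("id") is None:
        return DD_REJECT          # unusable row
    if row.get("deleted"):
        return DD_DELETE          # source marks the row as removed
    if row["id"] in existing_keys:
        return DD_UPDATE          # key already present in the target
    return DD_INSERT              # new key


flags = [flag_row(r, {1}) for r in (
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Delhi"},
    {"id": None},
)]
```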

107. What are the tasks to be done to partition a session?

1. Configure the session to partition the source data
2. Install Informatica on a machine with multiple CPUs

108. In which circumstances does Informatica create a reject file (bad file)?
1. When it encounters DD_REJECT in the Update Strategy Transformation.
2. When a row violates database constraints.
3. When a field in the row was truncated or overflowed.

109. In a sequential batch, can you run a session if the previous session fails?
Yes, by setting the option “Always runs the session”.

110. In how many ways can you create ports?

Two ways:
1. Drag the port from another transformation
2. Click the add button on the ports tab.

111. How can you stop the batch?

By using server manager or pmcmd.

112. How can you improve session performance in the Aggregator transformation?
Use sorted input.
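
Why sorted input helps can be shown with a short Python sketch. This is a general illustration of the technique, not Informatica internals: with unsorted rows every group total must be held until the input ends, while with rows sorted by the group key each group can be finished and released as soon as the key changes. The function names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_unsorted(rows, key, value):
    """Keep a running total per group; nothing is final until the end."""
    totals = {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0) + r[value]
    return totals


def aggregate_sorted(rows, key, value):
    """Rows must already be sorted by `key`. Each (group, total) pair is
    yielded as soon as that group ends, so only one group is held at a time."""
    for k, group in groupby(rows, key=itemgetter(key)):
        yield k, sum(r[value] for r in group)


rows = [
    {"region": "E", "amt": 5},
    {"region": "W", "amt": 7},
    {"region": "E", "amt": 1},
]
sorted_rows = sorted(rows, key=itemgetter("region"))
streamed = dict(aggregate_sorted(sorted_rows, "region", "amt"))
```

Both functions produce the same totals; the difference is only in how much has to be cached while the rows are being read.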

113. Can you use the mapping parameters or variables created in one mapping into any
other reusable transformation?
Yes, because a reusable transformation is not contained within any mapplet or mapping.

114. Can you use the mapping parameters or variables created in one mapping into another
mapping?
No.
We can use mapping parameters or variables in any transformation of the same mapping or
mapplet in which you have created mapping parameters or variables.

115. Can you start a session inside a batch individually?

We can start an individual session only in a sequential batch. In a concurrent batch
we cannot do this.

116. Can you start batches with in a batch?

You cannot. If you want to start a batch that resides in a batch, create a new independent
batch and copy the necessary sessions into the new batch.

117. Can you generate reports in Informatica?

Yes. By using Metadata reporter we can generate reports in Informatica.

118. Can you copy the batches?
No.

119. After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single
source qualifier, can you map these three ports directly to the target?
No. Unless you join those three ports in the source qualifier, you cannot map them
directly.

120. What are Target Types on the Server?

Target Types are File, Relational and ERP.

121. What is the Aggregator transformation?

The Aggregator transformation allows you to perform aggregate calculations, such as averages and
sums.

122. What are Target Options on the Servers?

Target Options for the File target type are FTP File, Loader and MQ. There are no target options for
the ERP target type.

123. How do you identify existing rows of data in the target table using lookup
transformation?
You can identify existing rows of data using an unconnected Lookup transformation.

124. What is Code Page used for?

A code page is used to identify characters that might be in different languages. If you are
importing Japanese data into a mapping, you must select the Japanese code page for the source data.
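
The effect of choosing the right or wrong code page can be demonstrated in plain Python. This is an illustration only: `shift_jis` is used as one example of a Japanese code page, `latin-1` as an example of a mismatched one, and the helper `decode_source` is hypothetical.

```python
def decode_source(raw_bytes: bytes, code_page: str) -> str:
    """Interpret raw source bytes using the given code page."""
    return raw_bytes.decode(code_page)


japanese = "日本語"
raw = japanese.encode("shift_jis")        # bytes as they might sit in a source file

right = decode_source(raw, "shift_jis")   # matching code page: text is recovered
wrong = decode_source(raw, "latin-1")     # wrong code page: unreadable characters
```

Decoding with the matching code page returns the original text, while the mismatched code page silently produces different characters, which is why the code page of the source data has to be selected correctly.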

125. What is a source qualifier?

It represents all data queried from the source.

126. Where should you place the flat file to import the flat file definition to the designer?
Place it in Local folder.

127. What are the settings that you use to configure the joiner transformation?
1. Master and detail source
2. Type of join
3. Condition of the join

128. What are the session parameters?

Session parameters are like mapping parameters: they represent values you might want to change
between sessions, such as database connections or source files.

129. What are the methods for creating reusable transformations?

There are two methods:
1. Design it in the transformation developer.
2. Promote a standard transformation from the mapping designer. After you add a transformation to
the mapping, you can promote it to the status of reusable transformation.

130. What are the joiner caches?

When a Joiner transformation occurs in a session, the Informatica Server reads all the records
from the master source and builds index and data caches based on the master rows. After building the
caches, the Joiner transformation reads records from the detail source and performs joins.
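
The two phases described above (cache the master rows, then stream the detail rows against the cache) resemble a hash join. The Python sketch below shows that pattern for a normal (inner) join. It is a simplified model: a single dictionary stands in for both the index and data caches, and the function name `joiner` is hypothetical.

```python
from collections import defaultdict


def joiner(master_rows, detail_rows, key):
    """Normal join: cache all master rows, then probe with each detail row."""
    # Phase 1: read every master row and cache it under its join key.
    cache = defaultdict(list)
    for m in master_rows:
        cache[m[key]].append(m)
    # Phase 2: read detail rows one at a time and look up matching master rows.
    for d in detail_rows:
        for m in cache.get(d[key], []):
            yield {**m, **d}


master = [{"dept_id": 10, "dept": "Sales"}, {"dept_id": 20, "dept": "HR"}]
detail = [{"emp": "A", "dept_id": 10}, {"emp": "B", "dept_id": 30}]
joined = list(joiner(master, detail, "dept_id"))
```

Because only the master side is cached, the smaller source is usually the better choice for the master.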

131. What are the different options used to configure sequential batches?
There are two options:
1. Run the session only if previous session completes successfully.
2. Always runs the session.

132. What are the data movement modes in Informatica?

Data movement modes determine how the Informatica server handles character data. You
choose the data movement mode in the Informatica server configuration settings. Two types of data
movement modes are available in Informatica:
1. ASCII mode
2. Unicode mode

133. What is difference between stored procedure transformation and external procedure
transformation?
Inner equi join.

134. What is the difference between the stored procedure transformation and the external procedure
transformation?
In a stored procedure transformation, the procedure is compiled and executed in a
relational data source. You need a database connection to import the stored procedure into your
mapping. In an external procedure transformation, the procedure or function is executed
outside the data source, so you need to build it as a DLL to access it from your mapping. No
database connection is needed for an external procedure transformation.

135. To achieve session partitioning, what are the necessary tasks you have to do?
1. Configure the session to partition source data.
2. Install the Informatica server on a machine with multiple CPUs.

136. Performance tuning in Informatica?

The goal of performance tuning is to optimize session performance so that sessions run during the
available load window for the Informatica server.

137. When can a session fail?

The session fails when:
1. The server cannot allocate enough system resources
2. The session exceeds the maximum number of sessions
3. The server cannot obtain an execute lock for the session
4. The server encounters database errors
5. Network related errors occur

138. In how many ways can you update a relational source definition?
There are two ways to update a relational source definition:
1. Edit the definition
2. Reimport the definition

139. How many ways you can create a Reusable Transformation?

There are two ways to create a Reusable Transformation
1. By designing it in the transformation developer
2. By promoting the already existing Transformation to reusable from its properties.

140. What is an aggregator cache in the Aggregator transformation?

The aggregator transformation stores the data in the aggregator cache until it completes
aggregate calculations.

141. What is data cache and Index cache?

When you use aggregator transformation in your mapping then Informatica server creates
Data and Index cache in memory to process the transformation.

142. What is a Mapping Variable?

A Mapping Variable represents a value that can change throughout the session. The
Informatica server saves the value of the mapping variable in the repository at the end of the session and
uses it for the next session run.
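
The save-and-reuse behaviour can be mimicked in Python for an incremental load. Everything here is a stand-in: a JSON file plays the role of the repository, `run_session` is a hypothetical function, and `$$LastLoadedId` is an illustrative variable name.

```python
import json
import os


def run_session(source_rows, state_path, var_name="$$LastLoadedId"):
    """Load only rows newer than the value saved by the previous run."""
    # Read the value saved at the end of the previous run, if any.
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    last_id = state.get(var_name, 0)

    # Process only the rows that arrived since the last run.
    new_rows = [r for r in source_rows if r["id"] > last_id]

    # Save the updated value so the next run can start from it.
    if new_rows:
        state[var_name] = max(r["id"] for r in new_rows)
    with open(state_path, "w") as f:
        json.dump(state, f)
    return new_rows
```

Running the function twice against the same source returns all rows the first time and none the second time, because the saved value carries over between runs.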

143. In which scenarios is the Update Strategy Transformation best suited?
Within a session: When you configure a session, you can instruct the Informatica server to
either treat all records in the same way (treat all as insert, treat all as update, or treat all as delete) or use
instructions coded into the session mapping to flag records for different database operations.

Within a mapping: Within a mapping, you use the Update Strategy transformation to flag
records for insert, delete, update or reject.

144. What are the types of mappings in the Getting Started Wizard?

1. Simple pass through mapping: Loads a static fact or dimension table by inserting all
rows. Use this mapping to drop all existing data from the table before loading new data into it.
2. Slowly growing target: Loads a slowly growing fact or dimension table by inserting new rows.
Use this mapping to load new data without disturbing the existing data.

145. How can you recognize whether or not the data is added in the table in Type – II
dimension?
1. By version number
2. By flag value
3. By effective date range
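
The three markers can be seen together in a small Python sketch of a Type II update. This is an illustration under assumptions: the column names `version`, `current_flag`, `begin_date` and `end_date`, and the function `apply_scd2`, are hypothetical, and a real dimension would normally use only one of the three techniques.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, today):
    """Keep history: close the current row and add a new version on change."""
    current = next(
        (r for r in dimension
         if r[key] == incoming[key] and r["current_flag"] == 1),
        None,
    )
    changed = current is None or any(
        current[col] != val for col, val in incoming.items()
    )
    if not changed:
        return dimension              # nothing new to record

    version = 1
    if current is not None:
        current["current_flag"] = 0   # flag value marks the old row as history
        current["end_date"] = today   # effective date range is closed
        version = current["version"] + 1
    dimension.append({
        **incoming,
        "version": version,           # version number identifies the new row
        "current_flag": 1,
        "begin_date": today,
        "end_date": None,
    })
    return dimension


dim = []
apply_scd2(dim, {"cust_id": 1, "city": "Pune"}, "cust_id", date(2024, 1, 1))
apply_scd2(dim, {"cust_id": 1, "city": "Mumbai"}, "cust_id", date(2024, 6, 1))
```

After the second call the dimension holds two rows for the same customer: the older one closed off, and the newer one with a higher version number and the current flag set.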

146. Why do you use Repository connectivity?

Each time you edit or schedule the session, the Informatica server communicates directly with
the repository to check whether or not the session and users are valid.

147. What are the data movement modes in Informatica?

The data movement modes determine how the Informatica server handles character data. There
are two types of data movement modes:
1. ASCII mode
2. Unicode mode

148. Can you copy the session to a different folder or Repository?

Yes. By using the copy session wizard you can copy a session to a different folder or
repository. But you should first copy the mapping of that session before you copy the session.

149. What is the difference between partitioning of relational target and partitioning of file
target?
If you partition a session with a relational target, the Informatica server creates multiple
connections to the target database to write target data concurrently. If you partition a session with a file
target, the Informatica server creates one target file for each partition.

150. What are the Transformations that restrict the partition of sessions?
1. Advanced External Transformation
2. External Procedure Transformation
3. Aggregator Transformation
4. Joiner Transformation
5. Normalizer Transformation
6. XML Targets

151. What is a Power Center Repository?

The Power Center Repository allows you to share metadata across repositories to create a
data mart domain. In a data mart domain, you can create a single global repository to store metadata
used across an enterprise and a number of local repositories to share the global metadata as needed.
