Data Warehousing Basics
Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about your
company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse,
you can answer questions like "Who was our best customer for this item last year?" This ability to
define a data warehouse by subject matter (sales, in this case) makes the data warehouse subject
oriented.
Integrated
Integration is closely related to subject orientation. Data warehouses must put data from
disparate sources into a consistent format. They must resolve such problems as naming conflicts
and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
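As a rough illustration of integration, the sketch below merges records from two sources that disagree on a column name and on the unit of measure. The source layouts, field names, and sample values are hypothetical and exist only for this example.

```python
# Hypothetical records from two source systems that disagree on
# column names and on the unit of measure for weight.
SOURCE_A = [{"cust_id": 1, "weight_kg": 2.5}]
SOURCE_B = [{"CustomerID": 2, "weight_lb": 11.0}]

LB_TO_KG = 0.45359237  # kilograms per international pound


def integrate(source_a, source_b):
    """Return rows in one consistent format: customer_id, weight_kg."""
    rows = []
    for rec in source_a:
        rows.append({"customer_id": rec["cust_id"], "weight_kg": rec["weight_kg"]})
    for rec in source_b:
        rows.append({
            "customer_id": rec["CustomerID"],  # resolve the naming conflict
            "weight_kg": round(rec["weight_lb"] * LB_TO_KG, 3),  # resolve the unit conflict
        })
    return rows


integrated = integrate(SOURCE_A, SOURCE_B)
```

After this step every row uses the same key name and the same unit, whichever system it came from.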
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change. This is
logical because the purpose of a warehouse is to enable you to analyze what has occurred.
Time Variant
In order to discover trends in business, analysts need large amounts of data. This is very
much in contrast to online transaction processing (OLTP) systems, where performance
requirements demand that historical data be moved to an archive. A data warehouse's focus on
change over time is what is meant by the term time variant.
2. How many stages are there in data warehousing?
Data warehousing generally includes two stages:
ETL
Report Generation
ETL
Short for extract, transform, load: three database functions that are combined into one tool.
Extract -- the process of reading data from a source database.
Transform -- the process of converting the extracted data from its previous form into the
required form.
Load -- the process of writing the data into the target database.
ETL is used to migrate data from one database to another, to build data marts and data
warehouses, and to convert databases from one format to another.
It retrieves data from various operational databases, transforms it into useful
information, and finally loads it into the data warehousing system.
Examples of ETL tools:
1. Informatica
2. Ab Initio
3. DataStage
4. BODI (BusinessObjects Data Integrator)
5. Oracle Warehouse Builder
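The extract, transform, and load steps described in this section can be sketched in a few lines. This is only a minimal illustration and not how any of the listed tools work internally; the CSV layout, the orders table, and the function names are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from an operational database.
SOURCE_CSV = "order_id,amount,currency\n1,10.50,usd\n2,7.25,usd\n"


def extract(text):
    """Extract: read rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and normalize values into the required form."""
    return [(int(r["order_id"]), float(r["amount"]), r["currency"].upper()) for r in rows]


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


target = sqlite3.connect(":memory:")
load(target, transform(extract(SOURCE_CSV)))
loaded = target.execute(
    "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
).fetchall()
```

Keeping the three steps as separate functions mirrors the definition above: each step can be replaced without touching the others.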
Report generation
Report generation uses OLAP (online analytical processing), a set of specifications
that allows client applications to retrieve data for analytical processing.
An OLAP tool is a specialized tool that sits between a database and the user in order to provide
various analyses of the data stored in the database.
It is a reporting tool that generates reports useful for decision support for top-level
management.
Examples of OLAP tools:
1. Business Objects
2. Cognos
3. MicroStrategy
4. Hyperion
5. Oracle Express
6. Microsoft Analysis Services
3. What is OLAP?
OLAP stands for Online Analytical Processing. It uses database tables (fact and dimension
tables) to enable multidimensional viewing, analysis, and querying of large amounts of data.
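To make the fact and dimension idea concrete, here is a small sketch using an in-memory SQLite database. The table names, columns, and figures are made up for illustration; a real warehouse would be far larger and would usually be queried through an OLAP tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Toys');
INSERT INTO dim_date VALUES (10, 2022), (11, 2023);
INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 11, 150.0), (2, 11, 40.0), (2, 11, 60.0);
""")

# View the sales measure along two dimensions at once: product category and year.
totals = conn.execute("""
SELECT p.category, d.year, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_date d ON d.date_id = f.date_id
GROUP BY p.category, d.year
ORDER BY p.category, d.year
""").fetchall()
```

The fact table holds the measures (amount) and the dimension tables hold the attributes used to slice them (category, year).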
4. What is OLTP?
OLTP stands for Online Transaction Processing. Databases other than data warehouse
databases are generally OLTP databases. They use a normalized schema structure and are
designed for recording the daily operations and transactions of a business.
Differences Between OLTP and OLAP
   OLTP                                        OLAP
1  Application oriented (e.g., a purchase      Subject oriented (subjects such as
   order is a function of an application)      customer, product, item, time)
2  Used to run the business                    Used to analyze the business
5  Few records accessed at a time (tens),      Large volumes accessed at a time
   simple queries                              (millions), complex queries
6  Small database                              Large database
Application Source Qualifier (Active / Connected)
    Represents the rows that the PowerCenter Server reads from an application, such as an
    ERP source, when it runs a session.
XML Generator (Active / Connected)
    Reads data from one or more input ports and outputs XML through a single output port.
XML Parser (Active / Connected)
    Reads XML from one input port and outputs data to one or more output ports.
XML Source Qualifier (Active / Connected)
    Represents the rows that the PowerCenter Server reads from an XML source when it runs
    a session.
1. DD_INSERT - 0
2. DD_UPDATE - 1
3. DD_DELETE - 2
4. DD_REJECT - 3
DD stands for Data Driven.
53. The Normalizer transformation is mainly used to extract and format COBOL files.
54. We can apply “Distinct” clause only in Source Qualifier and Sorter Transformations.
58. What are the different source systems of a data warehouse?
1. RDBMS
2. Flat Files
3. XML Files
4. SAP R/3
5. PeopleSoft
6. SAP BW
7. webMethods
8. Web Services
9. Siebel
10. COBOL Files
11. Legacy Systems
59. You cannot use XML Source Qualifier, Joiner, or Normalizer transformations in a
mapplet.
61. You can use incremental aggregation only when the mapping includes an Aggregator
transformation.
62. When you import a source definition, the following metadata is imported:
1. Source Name
2. Database Location
3. Column Names
4. Data Types
5. Key Constraints
1. Sequential
2. Concurrent
1. Normal
2. Verbose
3. Verbose Init
4. Verbose Data
69. What is a Data in a database this include the source of tables, the meaning of the keys
and the relationship between the tables.
70. In conceptual and logical modeling, tables are called entities.
73. What does the Bottom-Up Approach, or Ralph Kimball Approach, say?
The Bottom-Up Approach was coined by Ralph Kimball. According to this approach, we first
develop subject-oriented databases called "data marts" and then integrate all the data
marts to develop the enterprise data warehouse.
74. Who is the first person in the organization to start the Data Warehouse project?
The first person to start the Data Warehouse project in an organization is the Business Analyst.
87. What are the Direct and Indirect methods in the Flat file extraction?
In the direct method, we extract a flat file by using its own metadata. In the indirect method, we
extract all the flat files by using one flat file's metadata.
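The difference between the two methods can be sketched as follows. This is an illustration only: it assumes the indirect method is driven by a list file naming several flat files that share one layout, and the file names, column names, and helper functions are hypothetical.

```python
import csv
import os
import tempfile

COLUMNS = ["id", "name"]  # the one shared flat-file definition (metadata)


def read_direct(path):
    """Direct method: read a single flat file using its definition."""
    with open(path, newline="") as handle:
        return [dict(zip(COLUMNS, row)) for row in csv.reader(handle)]


def read_indirect(list_path):
    """Indirect method: read every file named in a list file with the same definition."""
    rows = []
    with open(list_path) as handle:
        for line in handle:
            path = line.strip()
            if path:
                rows.extend(read_direct(path))
    return rows


# Build two small data files and a list file in a temporary directory.
workdir = tempfile.mkdtemp()
paths = []
for name, content in [("a.csv", "1,Ann\n"), ("b.csv", "2,Bob\n3,Cy\n")]:
    path = os.path.join(workdir, name)
    with open(path, "w", newline="") as handle:
        handle.write(content)
    paths.append(path)
list_path = os.path.join(workdir, "files.lst")
with open(list_path, "w") as handle:
    handle.write("\n".join(paths) + "\n")

direct_rows = read_direct(paths[0])
indirect_rows = read_indirect(list_path)
```

The indirect reader never needs a second definition: every file in the list is parsed with the same column layout.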
89. What is the basic difference between reusable transformation and mapplet?
A mapplet is a set of reusable transformation logic, whereas a reusable transformation consists of
a single transformation's logic.
90. What is a Target Load Plan?
The Target Load Plan is the order in which the targets should be loaded to implement the Data
Acquisition Process.
98. Which transformation should we use to normalize the COBOL and relational sources?
The Normalizer transformation. When you drag a COBOL source into the Designer workspace, the
Normalizer transformation automatically appears, creating input and output ports for every
column in the source.
99. Which tool do you use to create and manage sessions and batches and to monitor and stop
the Informatica server?
The Informatica Server Manager.
100. What are the types of data that pass between the Informatica server and a stored
procedure?
There are three types of data:
1. Input/output parameter
2. Return Values
3. Status code
104. What are the basic requirements for joining two sources in a Source Qualifier?
The two source tables should have a primary key – foreign key relationship and the two source
tables should have matching data types.
108. In which circumstances does the Informatica server create a reject file (bad file)?
1. When it encounters DD_REJECT in an Update Strategy transformation.
2. When a row violates database constraints.
3. When a field in the row was truncated or overflowed.
109. In a sequential batch, can you run a session if the previous session fails?
Yes, by setting the option to always run the session.
113. Can you use the mapping parameters or variables created in one mapping in any
other reusable transformation?
Yes, because a reusable transformation is not contained within any mapplet or mapping.
114. Can you use the mapping parameters or variables created in one mapping in another
mapping?
No.
You can use mapping parameters or variables in any transformation of the same mapping or
mapplet in which you created them.
119. After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single
Source Qualifier, can you map these three ports directly to the target?
No. Unless you join those three ports in the Source Qualifier, you cannot map them
directly.
123. How do you identify existing rows of data in the target table using a Lookup
transformation?
You can identify existing rows of data by using an unconnected Lookup transformation.
126. Where should you place the flat file to import the flat file definition to the Designer?
Place it in a local folder.
127. What are the settings that you use to configure the joiner transformation?
1. Master and detail source
2. Type of join
3. Condition of the join
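The three settings can be pictured with a small in-memory join. This is a sketch rather than the Joiner transformation's actual implementation: it assumes an equality condition, assumes the master rows are the ones held in memory, and treats "master_outer" as keeping detail rows that have no matching master row. All function and field names are hypothetical.

```python
def joiner(master, detail, condition, join_type="normal"):
    """Join detail rows to master rows.

    master, detail: lists of dicts.
    condition: (master_key, detail_key) pair for an equality join.
    join_type: "normal" keeps only matching rows; "master_outer" (as assumed
    here) also keeps detail rows that have no matching master row.
    """
    master_key, detail_key = condition
    cache = {}
    for m in master:  # index the master source by the join column
        cache.setdefault(m[master_key], []).append(m)
    out = []
    for d in detail:
        matches = cache.get(d[detail_key], [])
        for m in matches:
            out.append({**m, **d})
        if not matches and join_type == "master_outer":
            out.append(dict(d))
    return out


departments = [{"dept_id": 1, "dept_name": "HR"}]           # master source
employees = [{"emp": "Ann", "emp_dept": 1},                  # detail source
             {"emp": "Bob", "emp_dept": 2}]

normal_rows = joiner(departments, employees, ("dept_id", "emp_dept"))
outer_rows = joiner(departments, employees, ("dept_id", "emp_dept"), "master_outer")
```

Choosing the smaller source as the master keeps the in-memory index small in a design like this one.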
131. What are the different options used to configure sequential batches?
There are two options:
1. Run the session only if the previous session completes successfully.
2. Always run the session.
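The effect of the two options can be sketched with a simple runner. This is a simplified model, not Informatica's scheduler: it applies one option to the whole batch, and the function name and status strings are assumptions made for the example.

```python
def run_sequential_batch(sessions, always_run=False):
    """Run sessions in order and return a list of (name, status).

    sessions: list of (name, callable) pairs; a callable returns True on success.
    always_run: False -> run a session only if the previous one succeeded;
                True  -> always run the session.
    """
    results = []
    previous_ok = True
    for name, run in sessions:
        if not previous_ok and not always_run:
            results.append((name, "skipped"))
            continue
        previous_ok = bool(run())
        results.append((name, "succeeded" if previous_ok else "failed"))
    return results


batch = [("s1", lambda: True), ("s2", lambda: False), ("s3", lambda: True)]
strict_results = run_sequential_batch(batch)
lenient_results = run_sequential_batch(batch, always_run=True)
```

With the first option a failure stops everything after it; with the second, later sessions still run.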
133. What is difference between stored procedure transformation and external procedure
transformation?
Inner equi join.
134. What is the difference between the Stored Procedure transformation and the External
Procedure transformation?
In the case of a Stored Procedure transformation, the procedure is compiled and executed in a
relational data source. You need a database connection to import the stored procedure into your
mapping. In an External Procedure transformation, the procedure or function is executed
outside the data source; that is, you need to build it as a DLL to access it in your mapping. No
database connection is needed for an External Procedure transformation.
135. To achieve session partitioning, what are the necessary tasks you have to do?
1. Configure the session to partition source data.
2. Install the Informatica server on a machine with multiple CPUs.
138. In how many ways can you update a relational source definition?
There are two ways to update a relational source definition:
1. Edit the definition
2. Reimport the definition
143. In which scenarios is the Update Strategy transformation best suited?
Within a session: When you configure a session, you can instruct the Informatica server to
either treat all records in the same way (treat all as insert, treat all as update, or treat all as
delete) or use instructions coded into the session's mapping to flag records for different
database operations.
Within a mapping: Within a mapping, you use the Update Strategy transformation to flag
records for insert, update, delete, or reject.
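Flagging rows inside a mapping can be pictured with the DD_ constants listed earlier (DD_INSERT = 0, DD_UPDATE = 1, DD_DELETE = 2, DD_REJECT = 3). The sketch below is not Informatica expression syntax; the flagging rule, the "deleted" marker, and the dictionary used as a target are hypothetical.

```python
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def flag_row(row, existing_keys):
    """Hypothetical rule: reject rows without a key, delete rows marked deleted,
    update rows whose key already exists, and insert everything else."""
    if row.get("id") is None:
        return DD_REJECT
    if row.get("deleted"):
        return DD_DELETE
    return DD_UPDATE if row["id"] in existing_keys else DD_INSERT


def apply_rows(rows, target):
    """Apply flagged rows to target (a dict keyed by id); return the rejected rows."""
    rejected = []
    for row in rows:
        flag = flag_row(row, target.keys())
        if flag == DD_REJECT:
            rejected.append(row)
        elif flag == DD_DELETE:
            target.pop(row["id"], None)
        else:  # DD_INSERT or DD_UPDATE
            target[row["id"]] = row
    return rejected


target_table = {1: {"id": 1, "name": "old"}}
incoming = [
    {"id": 1, "name": "new"},      # existing key -> update
    {"id": 2, "name": "x"},        # new key -> insert
    {"id": None, "name": "bad"},   # no key -> reject
    {"id": 2, "deleted": True},    # marked deleted -> delete
]
rejected_rows = apply_rows(incoming, target_table)
```

Treating all records the same way at the session level corresponds to skipping `flag_row` and applying one fixed flag to every row.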
145. How can you recognize whether or not data has been added to the table in a Type II
dimension?
1. By version number
2. By flag value
3. By effective date range
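Each of the three markers can be checked with a short lookup. The sample rows, column names, and helper functions below are hypothetical and only illustrate how a version number, a flag value, and an effective date range each identify the current or historical row.

```python
from datetime import date

# Hypothetical Type II dimension rows for one customer.
history = [
    {"customer_id": 7, "city": "Pune", "version": 1, "current_flag": "N",
     "effective_from": date(2020, 1, 1), "effective_to": date(2021, 6, 30)},
    {"customer_id": 7, "city": "Delhi", "version": 2, "current_flag": "Y",
     "effective_from": date(2021, 7, 1), "effective_to": None},
]


def latest_by_version(rows):
    """The highest version number marks the most recently added row."""
    return max(rows, key=lambda r: r["version"])


def current_by_flag(rows):
    """The flag value marks which rows are current."""
    return [r for r in rows if r["current_flag"] == "Y"]


def as_of(rows, day):
    """Return the row whose effective date range contains day.

    An effective_to of None is treated as an open-ended (still current) range.
    """
    for r in rows:
        if r["effective_from"] <= day and (r["effective_to"] is None or day <= r["effective_to"]):
            return r
    return None
```

For example, `as_of(history, date(2020, 5, 1))` selects the Pune row, while the version and flag checks both select the Delhi row.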
149. What is the difference between partitioning a relational target and partitioning a file
target?
If you partition a session with a relational target, the Informatica server creates multiple
connections to the target database to write target data concurrently. If you partition a session with a
file target, the Informatica server creates one target file for each partition.
150. What are the transformations that restrict the partitioning of sessions?
1. Advanced External Procedure Transformation
2. External Procedure Transformation
3. Aggregator Transformation
4. Joiner Transformation
5. Normalizer Transformation
6. XML Targets