A Comprehensive Guide to Informatica PowerCenter and Tableau: From Data Integration to Visual Analytics

This guide provides an in-depth exploration of two pivotal tools in the data management and business intelligence landscape: Informatica PowerCenter for robust data integration and ETL (Extract, Transform, Load) processes, and Tableau for dynamic data visualization and analytics. It aims to equip data professionals and students with a thorough understanding of their architectures, core functionalities, development practices, and optimization strategies.

Part 1: Mastering Informatica PowerCenter for Data Integration

Section 1: Introduction to ETL and Informatica PowerCenter

The journey of data from raw, disparate sources to actionable insights often involves complex integration processes. Central to these processes is the concept of ETL, and Informatica PowerCenter stands as a leading platform for implementing these data workflows.

● Core ETL Concepts (Extract, Transform, Load): ETL is a foundational data integration methodology comprising three distinct phases: Extract, Transform, and Load.

1. Extract: This initial phase involves the collection of data from one or multiple originating systems. These sources can be heterogeneous, ranging from databases and flat files to enterprise applications. During extraction, it is common practice to apply validation rules to the incoming data. This early testing ensures that the data meets certain predefined requirements of its eventual destination. Data that fails these initial validation checks is typically rejected and does not proceed to subsequent stages, preventing the propagation of errors. This proactive approach to data quality, initiating checks at the point of ingestion, can significantly reduce complications and resource expenditure in later processing phases. The clear definition of what constitutes "valid" data, established from business requirements before ETL development commences, is therefore paramount.

2. Transform: Once extracted, data undergoes transformation. This phase is critical for processing the data to ensure its values and structure conform consistently with the intended use case and the schema of the target repository. The transformation stage can encompass a wide array of operations, including but not limited to aggregators, data masking, expression evaluations, joiner logic, filtering, lookups, ranking, routing, union operations, XML processing, normalization (H2R - Hierarchical to Relational, R2H - Relational to Hierarchical), and interactions with web services. These processes collectively serve to normalize, standardize, cleanse, and filter the data, rendering it fit for consumption in analytics, business functions, and other downstream activities.

3. Load: The final phase involves moving the transformed data into a permanent target system. This destination could be a target database, data warehouse, data mart, data store, data hub, or a data lake, located either on-premises or in the cloud.

ETL pipelines are generally considered most appropriate for handling smaller datasets that necessitate complex and often multi-step transformations. For scenarios involving larger, often unstructured datasets where transformations might be less complex or deferred, an ELT (Extract, Load, Transform) approach is frequently preferred.
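To make the three phases concrete, the sketch below walks a handful of rows through extract-time validation, a simple transformation, and a load into a SQLite target. It is illustrative only; the field names, validation rules, and SQLite target are assumptions for the example, not PowerCenter code.

```python
import sqlite3

# Illustrative source rows; in practice these would come from a file or database.
SOURCE_ROWS = [
    {"order_id": "101", "customer_id": " ac-7 ", "amount": "250.5"},
    {"order_id": "102", "customer_id": "", "amount": "80"},      # fails validation
    {"order_id": "103", "customer_id": "bd-2", "amount": "-5"},  # fails validation
]

def validate(row):
    """Extract-phase check: reject rows missing keys or carrying negative amounts."""
    try:
        return bool(row["order_id"]) and bool(row["customer_id"].strip()) and float(row["amount"]) >= 0
    except (KeyError, ValueError):
        return False

def transform(row):
    """Transform: standardize values to fit the target schema."""
    return (int(row["order_id"]), row["customer_id"].strip().upper(), round(float(row["amount"]), 2))

def load(rows, conn):
    """Load: write transformed rows to the target table in one transaction."""
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id TEXT, amount REAL)")

accepted = [r for r in SOURCE_ROWS if validate(r)]
rejected = [r for r in SOURCE_ROWS if not validate(r)]
load([transform(r) for r in accepted], conn)
print(f"loaded={len(accepted)} rejected={len(rejected)}")  # loaded=1 rejected=2
```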
This distinction implies that platforms like Informatica PowerCenter excel in environments where data requires significant reshaping, cleansing, or enrichment before it can be effectively utilized for analytical purposes, aligning well with traditional data warehousing and the creation of integrated data environments. ● Overview of Informatica PowerCenter and its Role in Data IntegrationInformatica PowerCenter is an enterprise-grade data integration platform extensively used for implementing ETL operations. Its primary function is to facilitate backend data operations, such as cleaning up data, modifying data based on a predefined set of business rules, or simply loading bulk data from various sources to designated targets. The platform is designed to manage and execute data movement and transformation processes, enabling organizations to integrate data from diverse systems. This includes capabilities for bulk data movement and Change Data Capture (CDC), which allows for the identification and processing of only the data that has changed since the last extraction, optimizing efficiency.The utility of Informatica PowerCenter extends to any scenario where a data system exists and requires backend operations to be performed on its data. This broad applicability across various industries and data environments, without being confined to specific database types or applications, underscores its versatility. The platform's design for high configurability allows it to meet a wide spectrum of business rules for data cleansing, modification, and loading, positioning it as a comprehensive rather than a niche solution for data integration challenges. ● PowerCenter Architecture: Domain, Nodes, Repository Service, Integration ServiceInformatica PowerCenter's architecture is fundamentally a Service-Oriented Architecture (SOA), which promotes modularity and scalability by organizing functionalities into distinct, interacting services. This design is crucial for handling the large data volumes and complex processing loads typical in enterprise environments, as services can be distributed across multiple machines.Key architectural components include: ○ Informatica Domain: This is the primary administrative unit within PowerCenter. It acts as a collection of one or more nodes and services, organized into folders and sub-folders for administrative convenience. The domain facilitates communication and management across its constituent parts. ■ Nodes: A node is a logical representation of a physical server machine where Informatica services and processes run. A domain can comprise multiple nodes, allowing for distributed processing and high availability. ■ Gateway Node: Within a domain, one or more nodes can be designated as gateway nodes. These nodes are responsible for receiving requests from various PowerCenter client tools (like Designer, Workflow Manager) and routing them to the appropriate service (e.g., Repository Service, Integration Service) running on potentially different nodes within the domain. ○ PowerCenter Repository and Repository Service: ■ The PowerCenter Repository is a relational database (e.g., Oracle, SQL Server, DB2) that serves as the central metadata store for the PowerCenter environment. It contains definitions for all objects created within PowerCenter, such as source and target metadata, mapping logic, transformation rules, workflow configurations, and operational metadata. 
The integrity and availability of this repository are paramount, as any loss or corruption would severely impact the PowerCenter environment. ■ The Repository Service is a dedicated application service that manages connections from PowerCenter client tools and the Integration Service to the PowerCenter repository. It is a multi-threaded process responsible for fetching, inserting, and updating metadata in the repository tables, thereby ensuring metadata consistency. The Repository Service also handles object locking to prevent concurrent modifications and can manage object versions if version control is enabled. This version control capability is a significant feature for robust development lifecycle management, enabling rollback, auditing, and collaborative development efforts. ○ Integration Service: This is the core execution engine of Informatica PowerCenter. When a workflow is initiated, the Integration Service reads the workflow metadata (including mapping details and session configurations) from the repository. It then executes the tasks defined in the workflow, performing the actual data extraction, transformation, and loading operations between source and target systems. The Integration Service generates detailed logs of its operations. The clear architectural separation between metadata management (handled by the Repository Service) and data processing (handled by the Integration Service) is a key design principle. This allows for optimized performance characteristics for different aspects of ETL operations. The Repository Service's performance is critical during development and deployment phases when metadata is frequently accessed, while the Integration Service's performance (dependent on CPU, memory, and network bandwidth for data movement) is crucial during the actual execution of ETL jobs. Section 2: Navigating PowerCenter Client Tools Informatica PowerCenter provides a suite of client tools, each designed for specific aspects of the ETL development and management lifecycle. These tools connect to the PowerCenter domain to interact with the Repository and Integration Services. ● PowerCenter Designer: Defining Sources, Targets, and Creating MappingsThe PowerCenter Designer is the primary development environment where ETL developers create the core logic for data transformation. Its main purpose is to define mappings, which specify how data is extracted from sources, transformed, and loaded into targets.Key functionalities and components within the Designer include : ○ Source Analyzer: Used to import or manually create source definitions. The Designer can connect to a wide array of source types, including relational databases (e.g., Oracle, SQL Server), flat files (delimited or fixed-width), COBOL files (facilitating integration with mainframe systems), and Microsoft Excel files. Importing metadata directly from these sources, rather than manual definition, significantly reduces development time and minimizes errors. This capability to handle diverse source types underscores PowerCenter's adaptability in heterogeneous enterprise data landscapes. ○ Target Designer: Used to import or manually create target definitions. Similar to sources, this tool allows developers to define the structure of the data's destination. ○ Mapping Designer: This is the canvas where mappings are visually constructed. Developers drag source and target definitions onto the workspace and then introduce and configure various transformations to define the data flow and logic. 
○ Mapplet Designer: Allows the creation of mapplets, which are reusable sets of transformations. These mapplets can then be incorporated into multiple mappings, promoting modularity and consistency in ETL logic. The concept of reusability is fundamental to efficient ETL development, as common routines (e.g., data cleansing, standardization) can be built once and deployed across many processes, simplifying maintenance and ensuring uniformity. ○ Transformation Developer: Used to create reusable transformations. The Designer also provides a range of supporting tools and options, such as general configuration settings, customizable toolbars, workspace navigation aids, data preview capabilities, and features for managing versioned objects if version control is enabled in the repository. ● PowerCenter Workflow Manager: Building Workflows and TasksWhile the Designer focuses on what data transformations occur, the PowerCenter Workflow Manager is used to define how and when these transformations are executed. It is the tool for building, scheduling, and managing workflows, which are the executable units in PowerCenter.The Workflow Manager interface is typically organized into three main tabs : ○ Task Developer: This area is used to create various types of reusable tasks. Common tasks include: ■ Session Task: This is a fundamental task that links a specific mapping (created in Designer) to physical data sources and targets through connection objects. It also defines runtime properties for the mapping's execution, such as memory allocation, commit intervals, and error handling. A mapping cannot run without being encapsulated in a session task. ■ Command Task: Allows the execution of operating system commands or scripts as part of the workflow (e.g., pre-processing file checks, post-processing archival scripts). ■ Email Task: Used to send automated email notifications about workflow status (e.g., success, failure, warnings). ■ Other tasks include Timer, Event-Wait, Decision, Assignment, etc., providing rich orchestration capabilities. The availability of such diverse task types allows PowerCenter workflows to automate complex data processes that extend beyond simple data movement. ○ Worklet Designer: Worklets are reusable groups of tasks. Similar to mapplets for transformations, worklets allow for the modularization of workflow logic. A common sequence of tasks can be defined in a worklet and then used in multiple workflows. ○ Workflow Designer: This is where workflows are constructed by adding and linking tasks and worklets. Workflows define the order of execution, dependencies between tasks, and conditional logic for the data processing pipeline. The separation of mapping design and workflow orchestration is a deliberate architectural choice. It allows ETL developers to concentrate on data logic within mappings, while operations teams or schedulers can manage the execution flow, dependencies, and error handling at the workflow level without needing to delve into the detailed transformation logic of each mapping. ● PowerCenter Workflow Monitor: Tracking and Reviewing ExecutionThe PowerCenter Workflow Monitor serves as the operational dashboard for observing and managing the execution of PowerCenter workflows and sessions. 
It provides real-time and historical views of job status.Key features include : ○ Multiple Views: Offers different perspectives on job execution, such as: ■ Task View: Displays workflow runs in a report-like format, showing details like status, start/completion times, and the node on which tasks executed. ■ Gantt Chart View: Provides a chronological, graphical representation of workflow runs, illustrating task durations and dependencies. This view is particularly useful for identifying performance bottlenecks within a workflow by visualizing which tasks are taking the longest or where dependencies are causing delays. ○ Interactive Control: The Monitor is not merely a passive display; it allows for active management of running jobs. Users can stop, abort, or restart workflows and individual tasks. This capability is critical for production support, enabling intervention if a job encounters issues or needs to be paused. ○ Log Access: Provides access to session and workflow logs, which contain detailed information about the execution, including errors, warnings, and performance statistics. ● Repository Manager: Managing Repository ObjectsThe Repository Manager is an administrative client tool used for managing objects and metadata within the PowerCenter repository. Its functions are crucial for maintaining an organized, secure, and version-controlled ETL environment, especially in larger teams or more complex deployments.Typical tasks performed in the Repository Manager include: ○ Folder Management: Creating and organizing repository folders to structure ETL projects. ○ Deployment: Managing the deployment of PowerCenter objects (mappings, workflows, etc.) between different environments (e.g., from development to testing, and then to production). ○ Security Management: Defining users, groups, and permissions to control access to repository objects. This ensures that developers and operators only have access to the objects and functionalities relevant to their roles, preventing unauthorized modifications and maintaining data governance. ○ Version Control Management: If version control is enabled, managing object versions, viewing version history, and comparing versions. Section 3: Designing and Developing Mappings Mappings are the heart of Informatica PowerCenter, defining the detailed logic for data extraction, transformation, and loading. Effective mapping design is crucial for data accuracy, performance, and maintainability. ● Understanding Mapping Components and Data FlowA mapping in PowerCenter is a visual representation of the data flow from source(s) to target(s), incorporating various transformations along the way. The key components of a mapping include: ○ Source Definitions: These represent the structure and properties of the source data (e.g., tables, files). ○ Target Definitions: These represent the structure and properties of the destination for the transformed data. ○ Transformations: These are objects that modify, cleanse, aggregate, join, or route data as it flows through the mapping. PowerCenter offers a rich library of transformations. ○ Links (or Connectors): These connect the ports of sources, transformations, and targets, defining the path of data flow through the mapping. The data generally flows from left to right: from source definitions, through a series of transformations, and finally into target definitions. 
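As a rough analogy for this left-to-right flow, the sketch below chains plain Python generator functions standing in for a Filter, an Expression, and an Aggregator between a source list and a target list. The column names and logic are invented for illustration, not taken from any real mapping.

```python
from itertools import groupby
from operator import itemgetter

# Illustrative source rows standing in for a source definition.
source_rows = [
    {"region": "EAST", "status": "Active", "sales": 100.0},
    {"region": "EAST", "status": "Closed", "sales": 40.0},
    {"region": "WEST", "status": "Active", "sales": 75.0},
]

def filter_active(rows):
    """Filter-style step: pass only rows meeting the condition."""
    return (r for r in rows if r["status"] == "Active")

def add_commission(rows):
    """Expression-style step: derive a new column for each row."""
    for r in rows:
        yield {**r, "commission": round(r["sales"] * 0.1, 2)}

def aggregate_by_region(rows):
    """Aggregator-style step: one output row per group-by key."""
    keyed = sorted(rows, key=itemgetter("region"))
    for region, grp in groupby(keyed, key=itemgetter("region")):
        yield {"region": region, "total_sales": sum(r["sales"] for r in grp)}

# The target is just a list here, standing in for a target definition.
target = list(aggregate_by_region(add_commission(filter_active(source_rows))))
print(target)  # [{'region': 'EAST', 'total_sales': 100.0}, {'region': 'WEST', 'total_sales': 75.0}]
```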
The visual nature of this design, where developers drag, drop, and link these objects, makes even complex data flows relatively intuitive to understand and maintain compared to purely code-based ETL solutions. This visual paradigm can lower the entry barrier for ETL development and improve collaboration among team members. However, for very intricate mappings, careful organization, clear naming conventions, and the use of reusable components like mapplets are essential to prevent visual clutter and maintain clarity. ● In-Depth Look at Key TransformationsInformatica PowerCenter provides a wide array of transformations to handle diverse data integration requirements. Understanding their purpose, type (active/passive, connected/unconnected), and performance characteristics is vital for effective mapping design. An active transformation can change the number of rows passing through it (e.g., Filter, Aggregator), while a passive transformation does not change the row count (e.g., Expression). A connected transformation is part of the direct data flow, while an unconnected transformation is called from another transformation (typically an Expression) as needed. ○ Source Qualifier (SQ) Transformation ■ Type: Active, Connected. ■ Function: The Source Qualifier transformation represents the rows that the Integration Service reads from relational or flat file sources when a session runs. It is automatically added to a mapping when such a source is dragged into the Mapping Designer. Its primary roles include converting source-specific data types to Informatica's native data types and providing a powerful mechanism to customize or override how data is fetched from the source. ■ Key Features & Use Cases: ■ SQL Override: For relational sources, developers can provide a custom SQL query to replace the default query generated by PowerCenter. This is extremely useful for performing joins of multiple tables directly in the source database, applying complex filtering logic at the database level, or calling database functions. ■ Source Filter: Allows specifying a filter condition that is applied when reading data from the source, reducing the number of rows brought into the mapping pipeline. ■ Number of Sorted Ports: Can be used to indicate that the incoming data from a relational source is already sorted by specific columns, which can optimize downstream transformations like Aggregators or Joiners that benefit from sorted input. ■ Select Distinct: Option to retrieve only unique rows from the source. ■ Performance Considerations: Using the Source Qualifier to filter or join data at the source database level is often a critical performance optimization technique. Databases are generally highly optimized for these operations. Processing data within the database before it even enters the PowerCenter data pipeline reduces network traffic and the load on the Integration Service. For instance, if joining several tables from the same Oracle database, a SQL override in the Source Qualifier is typically much more efficient than bringing each table into the mapping separately and using multiple Joiner transformations. ○ Expression Transformation ■ Type: Passive, Connected. ■ Function: The Expression transformation is used for performing row-level calculations and manipulations. It processes data on a record-by-record basis without altering the number of rows passing through it. ■ Key Features & Use Cases: ■ Deriving new columns based on calculations involving other ports (e.g., Price * Quantity = Total_Amount). 
■ Concatenating strings (e.g., FirstName | | ' ' | | LastName = FullName). ■ Converting data types using built-in functions. ■ Implementing conditional logic using functions like IIF or DECODE. ■ Calling unconnected Lookup transformations or stored procedures. ■ Using variables to store values across rows for more complex calculations (e.g., calculating running totals, though dedicated transformations might be better for some aggregate scenarios). ■ Design Considerations: Expression transformations are fundamental for most data cleansing and enrichment tasks. While they are powerful, embedding overly complex logic within a single Expression transformation can make the mapping difficult to understand and debug. It is often advisable to break down very intricate calculations into several simpler Expression transformations or to utilize internal variables within an Expression transformation for better readability and intermediate value checking. ○ Filter Transformation ■ Type: Active, Connected. ■ Function: The Filter transformation routes rows that meet a specified condition to downstream transformations, discarding rows that do not satisfy the condition. ■ Key Features & Use Cases: ■ Removing irrelevant data based on business rules (e.g., Status = 'Active', Order_Amount > 0). ■ Splitting data flow based on a single condition (for multiple conditions, a Router is often preferred). ■ Performance Considerations: A crucial best practice is to place Filter transformations as early as possible in the mapping data flow, ideally right after the Source Qualifier (or even incorporate the filter logic within the SQ itself if feasible). By eliminating unwanted rows at the beginning of the pipeline, the volume of data processed by subsequent, potentially more resource-intensive transformations is reduced, leading to significant performance improvements. ○ Aggregator Transformation ■ Type: Active, Connected. ■ Function: The Aggregator transformation performs aggregate calculations, such as SUM, AVG, COUNT, MIN, MAX, on groups of data. It processes input rows, groups them based on specified "Group By" ports, and then outputs a single row for each group containing the aggregated values. ■ Key Features & Use Cases: ■ Calculating summary statistics (e.g., total sales per region, average order value per customer). ■ Removing duplicate records by grouping by all ports and selecting the first or last record in each group. ■ Performance Considerations: The Aggregator is a stateful transformation that uses an "aggregate cache" (memory and potentially disk) to store group information and intermediate results during processing. The size and management of this cache are critical for performance, particularly with large datasets or high-cardinality group-by keys. ■ Sorted Input: Providing input data that is already sorted on the "Group By" ports significantly improves Aggregator performance. When input is sorted, the Aggregator can process groups sequentially and finalize calculations for a group once all its rows have been received, reducing the amount of data that needs to be held in the cache simultaneously. This often involves placing a Sorter transformation before the Aggregator. ■ Limit Ports: Limiting the number of connected input/output or output ports can reduce the amount of data the Aggregator stores in its data cache. ■ Filter Before Aggregating: Filtering data before it reaches the Aggregator reduces unnecessary aggregation operations. ○ Joiner Transformation ■ Type: Active, Connected. 
■ Function: The Joiner transformation is used to join data from two separate input pipelines (sources) within a mapping based on a specified join condition. It is particularly useful for joining heterogeneous sources (e.g., a flat file and a relational table, or tables from different database instances) or for joining data streams that have been transformed differently within the same mapping. ■ Key Features & Use Cases: ■ Supports various join types: Normal (Inner) Join, Master Outer Join (Left Outer Join on Detail), Detail Outer Join (Right Outer Join on Detail), and Full Outer Join. ■ Requires designating one input source as the "Master" and the other as the "Detail." ■ Performance Considerations: ■ Database Joins: When joining tables from the same relational database, it is generally more performant to perform the join within the Source Qualifier transformation (using a SQL override) or in a pre-session SQL command rather than using a Joiner transformation. The Joiner introduces processing overhead within the Integration Service. ■ Master Source Selection: For an unsorted Joiner transformation, designate the source with fewer rows as the master source. The Joiner typically caches the master source and streams the detail source; a smaller master cache is more efficient. For a sorted Joiner, designate the source with fewer duplicate key values as the master source to optimize cache usage. ■ Sorted Input: Configuring the Joiner transformation to use sorted input data (sorted on the join key columns from both sources) can significantly improve session performance, especially for large datasets, as it allows the Integration Service to minimize disk input and output by using a sort-merge join algorithm. ○ Rank Transformation ■ Type: Active, Connected. ■ Function: The Rank transformation is used to select or rank rows based on a specific port's value within groups. It can identify the top or bottom N rows for each group. ■ Key Features & Use Cases: ■ Finding top N or bottom N records (e.g., top 5 salespersons per region, bottom 3 performing products). ■ Ranking all rows within groups based on a certain criteria. ■ Performance Considerations: Rank is an active transformation because it typically filters the data to return only the ranked subset. It also requires caching to hold group information and perform the ranking calculations. Similar to the Aggregator, its performance can be influenced by cache size and the cardinality of its group-by ports. ○ Lookup Transformation ■ Type: Can be Active or Passive; Connected or Unconnected. ■ Function: The Lookup transformation is used to retrieve values from a lookup source (which can be a relational table, flat file, or even a target definition) based on a condition matching input data. It is commonly used to enrich data, validate values, or retrieve related information. ■ Key Features & Use Cases: ■ Retrieving a description for a code (e.g., looking up Product_Name based on Product_ID). ■ Checking if a record exists in another table. ■ Implementing Slowly Changing Dimensions (SCDs). ■ Performance Considerations: Lookup performance is heavily dependent on its caching strategy and proper database indexing. ■ Caching: Lookups can be cached (static, dynamic, persistent) or uncached. ■ Uncached Lookups query the lookup source for every input row, which can be very slow for large input datasets. 
■ Cached Lookups load the lookup source data into memory or disk cache at the start of the session (or when first called for dynamic cache), significantly speeding up subsequent lookups. ■ Persistent Cache allows the cache file created by one session to be reused by other sessions, which is highly efficient for static or infrequently changing lookup data. ■ Indexing: If the lookup source is a database table, ensure that the columns used in the lookup condition are indexed in the database. This drastically speeds up the queries PowerCenter sends to fetch lookup data, whether for building the cache or for uncached lookups. ■ SQL Override: When using a SQL override in a Lookup transformation, it's a best practice to suppress the ORDER BY clause (e.g., by adding -- at the end of the query if the database supports it as a comment) unless sorting is explicitly needed for the lookup logic, as it can add unnecessary overhead. ○ Router Transformation ■ Type: Active, Connected. ■ Function: The Router transformation is used to test input data against multiple conditions and route rows that meet these conditions to different downstream data flows or output groups. It has one input group and multiple output groups (one for each user-defined condition, plus a default group for rows not meeting any condition). ■ Key Features & Use Cases: ■ Splitting a single data stream into multiple streams based on different criteria (e.g., routing customers to different target tables based on their region or purchase history). ■ Performance Considerations: Using a single Router transformation is generally more efficient than using multiple Filter transformations to achieve the same conditional splitting of data. This is because the Router reads the input data only once and evaluates all conditions, whereas multiple Filters would each process the full set of incoming rows (or the output of the preceding filter). ○ Other Essential Transformations: ■ Sequence Generator Transformation: (Passive, Connected) Generates a unique sequence of numbers (e.g., for creating surrogate primary keys). ■ Update Strategy Transformation: (Active, Connected) Flags rows for how they should be treated by the target (e.g., insert, update, delete, reject). This is essential for loading data into targets that require more than simple inserts. It's often recommended to minimize the number of Update Strategy transformations if possible, perhaps by consolidating logic. ■ Normalizer Transformation: (Active, Connected) Used primarily with COBOL sources or to pivot rows into columns or columns into rows (denormalizing or normalizing data structures). ■ Transaction Control Transformation: (Active, Connected) Provides fine-grained control over commit and rollback operations within a mapping. This allows developers to define transaction boundaries based on data conditions or row counts, which is crucial for data integrity and recovery strategies, especially when loading large volumes of data or dealing with complex dependencies. For instance, one might commit after processing a complete set of related parent-child records.

The following table summarizes key PowerCenter transformations:

| Transformation Name | Type (Active/Passive, Connected/Unconnected) | Primary Function/Use Case | Key Performance Considerations |
| --- | --- | --- | --- |
| Source Qualifier (SQ) | Active, Connected | Represents data read from relational/flat files; allows SQL override, filtering, sorting at source. | Use SQL override for database-side joins/filtering. Filter early. |
| Expression | Passive, Connected | Performs row-level calculations, data manipulation, and calls unconnected transformations. | Break down complex logic. Numeric operations are faster than string. |
| Filter | Active, Connected | Removes rows that do not meet a specified condition. | Place as early as possible in the mapping to reduce data volume for subsequent transformations. |
| Aggregator | Active, Connected | Performs aggregate calculations (SUM, AVG, COUNT, etc.) on groups of data. | Use sorted input. Filter data before aggregating. Optimize cache size. Limit connected ports. |
| Joiner | Active, Connected | Joins data from two heterogeneous sources or data streams within a mapping. | Perform joins in database via SQ if sources are homogeneous. Designate smaller/less duplicate-key source as master. Use sorted input. Optimize cache size. |
| Rank | Active, Connected | Ranks rows within groups and can filter for top/bottom N rows. | Optimize cache size. Group by appropriate fields. |
| Lookup | Active/Passive, Connected/Unconnected | Looks up values in a source (table, file) based on input data. | Use caching (static, dynamic, persistent). Ensure database indexes on lookup condition columns. Suppress ORDER BY in lookup SQL override. |
| Router | Active, Connected | Splits a single data stream into multiple output streams based on multiple conditions. | More efficient than multiple Filter transformations for mutually exclusive conditions, as input is read once. |
| Sequence Generator | Passive, Connected | Generates unique numeric sequences, often for surrogate keys. | Generally efficient; ensure appropriate cache size for sequence values if high concurrency. |
| Update Strategy | Active, Connected | Flags rows for insert, update, delete, or reject for target loading. | Minimize the number of Update Strategy transformations if possible. Logic is usually driven by comparing source to target. |
| Normalizer | Active, Connected | Pivots rows to columns or columns to rows, often for COBOL sources or denormalizing/normalizing relational data. | Primarily used for specific data structures; understand its impact on row count. |
| Transaction Control | Active, Connected | Defines commit and rollback points within a mapping based on data-driven conditions. | Essential for data integrity in complex loads; plan transaction boundaries carefully. |

● Mapping Parameters and Variables: Mapping parameters and variables are user-defined values that enhance the flexibility and reusability of mappings by allowing them to behave dynamically without direct modification of their design. ○ Mapping Parameters: These are values that are set before a session starts and remain constant throughout the session's execution. They are often used for values that might change between environments (dev, test, prod) or between different runs of the same mapping, such as file paths, database connection names, or date range filters. ○ Mapping Variables: These are values that can change during a session's execution. The Integration Service saves the final value of a mapping variable in the repository at the end of a successful session, and this value can then be used in subsequent session runs. Mapping variables are commonly used for implementing incremental data loads (e.g., storing the last processed timestamp or maximum ID) or for passing values between tasks in a workflow.
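A minimal sketch of the incremental-load pattern that a mapping variable typically supports is shown below, with a local JSON file standing in for the repository-persisted variable value. The file name, field names, and timestamps are assumptions for illustration only.

```python
import json
import os

STATE_FILE = "last_run_state.json"   # stands in for the repository-saved variable value

def read_last_watermark(default="1970-01-01T00:00:00"):
    """Read the persisted 'mapping variable' value left by the previous successful run."""
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)["last_modified_ts"]
    return default

def save_watermark(value):
    """Persist the final value, as the Integration Service does at session end."""
    with open(STATE_FILE, "w") as f:
        json.dump({"last_modified_ts": value}, f)

# Illustrative source data with a modification timestamp.
source = [
    {"id": 1, "modified_ts": "2024-05-01T10:00:00"},
    {"id": 2, "modified_ts": "2024-05-02T09:30:00"},
]

watermark = read_last_watermark()
delta = [r for r in source if r["modified_ts"] > watermark]   # incremental extract
print(f"processing {len(delta)} changed rows since {watermark}")

if delta:  # only advance the watermark after a successful run
    save_watermark(max(r["modified_ts"] for r in delta))
```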
Both parameters and variables are typically defined within a mapping and their values are often supplied at runtime through a parameter file. Parameter files are text files that list parameter and variable names along with their desired values for a specific session run. Using parameter files is a best practice as it decouples configuration values from the mapping/session design, making deployments and modifications easier to manage. ● Best Practices for Mapping DesignAdhering to best practices in mapping design is crucial for developing ETL processes that are not only correct but also performant and maintainable. Many of these practices focus on minimizing data processing, simplifying logic, and leveraging database strengths. ○ Optimize Data Flow Early: ■ Use active transformations that reduce the number of records (like Filters or Source Qualifiers with filtering conditions) as early as possible in the data flow. This minimizes the volume of data processed by subsequent transformations. ■ Connect only the necessary ports between transformations. Unused ports can still consume memory and add minor overhead. ○ Leverage Database Capabilities: ■ When joining tables from the same relational database, prefer using a SQL override in the Source Qualifier transformation over a Joiner transformation within the mapping. Databases are generally more efficient at performing joins. ■ Similarly, apply filtering conditions in the Source Qualifier whenever possible. ○ Transformation-Specific Optimizations: ■ For Joiner transformations, if joining unsorted data, designate the source with fewer rows as the master source. If joining sorted data, designate the source with fewer duplicate key values as the master. ■ For Lookup transformations, ensure that the columns in the lookup condition are indexed in the database. Use persistent caches for frequently accessed, static lookup tables to avoid repeated database queries across sessions. ■ When comparing values, numeric operations are generally faster than string operations. If possible, convert flags or codes to integers for comparison. ○ Simplify Logic and Maintainability: ■ Replace complex filter conditions within a Filter transformation with a simpler flag (e.g., 'Y'/'N'). The logic to set this flag can be encapsulated within an upstream Expression transformation, making the filter condition itself straightforward and often more performant. ■ Avoid unnecessary data type conversions between compatible types, as these can slow down performance. ■ A practical technique for improving maintainability is to include placeholder Expression transformations immediately after source qualifiers and just before target definitions. These placeholders, initially passing through all ports without modification, can help preserve port links if the source/target definitions change (e.g., a column is added or its data type is altered), potentially saving significant rework in complex mappings. Section 4: Workflow and Session Management Once mappings are designed in the PowerCenter Designer, they need to be incorporated into executable units called sessions, which are then orchestrated by workflows within the PowerCenter Workflow Manager. ● Creating and Configuring Workflows and SessionsWorkflows are the primary objects that are scheduled and run by the Integration Service. They define the sequence of tasks, dependencies, and control flow for an ETL process. A session is a specific type of task within a workflow that executes a single mapping. 
It acts as the bridge between the logical design of the mapping and the physical environment where the data resides and is processed.The creation process typically involves: 1. Defining Connections: In the Workflow Manager, connection objects must be created for all physical data sources and targets that the sessions will interact with. These connections store the necessary details (e.g., database type, server name, username, password, file paths). For example, to connect to cloud applications, one would use the Workflow Manager to create a PowerExchange for Cloud Applications connection, selecting "Informatica Cloud" as the connection type and providing the relevant credentials and configuration details. 2. Creating Session Tasks: For each mapping that needs to be executed, a session task is created in the Task Developer or directly within the Workflow Designer. When creating a session, the developer selects the mapping it will execute. 3. Configuring Session Properties: Each session has a multitude of properties that control its runtime behavior. These are configured on various tabs within the session properties dialog (e.g., Mapping tab, Properties tab, Config Object tab). This includes assigning the previously defined connection objects to the sources and targets within the mapping, setting commit intervals, memory allocation, error handling, and logging options. The session object's ability to abstract physical connection details from the mapping design allows a single mapping to be used in multiple sessions, each potentially connecting to different environments (e.g., development, testing, production databases) by simply assigning different connection objects at the session level. 4. Building Workflows: In the Workflow Designer, session tasks, along with other task types (Command, Email, Timer, etc.), are added to a workflow. Links are drawn between tasks to define the order of execution and dependencies. Conditional links can be used to control flow based on the outcome of previous tasks (e.g., run Task B only if Task A succeeds). ● Session Properties and Configuration Best PracticesProper configuration of session properties is paramount for achieving optimal performance, ensuring data integrity, and facilitating effective operational management. Default settings are often insufficient for enterprise-level ETL loads.Key session properties and their best practices include: ○ Commit Interval (Properties Tab): This property defines the number of target rows processed before the Integration Service issues a commit to the target database. For large volume loads, increasing the commit interval from the default (e.g., 10,000 rows ) can significantly improve performance by reducing the frequency of database commit operations. However, this needs to be balanced with database transaction log capacity and recovery considerations. ○ DTM Buffer Size (Config Object Tab): The Data Transformation Manager (DTM) process, which executes the session, uses a buffer pool to hold data blocks as they move between transformations. DTM Buffer Size (e.g., 12MB, 24MB, or Auto) and Default buffer block size (e.g., 64KB, 128KB, or Auto) are critical memory settings. Insufficient buffer memory can lead to excessive disk I/O as the DTM spills data to disk, severely degrading performance. These values should be tuned based on data volume, row width, transformation complexity, and available server memory. 
For sessions with partitioning, the DTM Buffer Size may need to be increased proportionally to the number of partitions. ○ Transformation Cache Sizes (Mapping Tab > Partitions/Transformations): Stateful transformations like Aggregator, Joiner, Rank, and Lookup use their own memory caches. The size of these caches (e.g., Lookup Cache Size, Aggregator Data Cache Size, Aggregator Index Cache Size) should be configured appropriately. If set too low, these transformations will spill to disk, impacting performance. If set to Auto, the Integration Service attempts to allocate memory, but manual tuning is often required for optimal results. ○ Target Load Type (Mapping Tab > Targets): For relational targets, this can be set to Normal or Bulk. Bulk loading bypasses database logging for certain database types and operations, which can dramatically speed up data insertion, especially for large initial loads. However, recoverability might be affected, and it may lock the target table. ○ Log Management (Properties Tab): ■ Save Session Log by: Can be set to Session runs (retains a specified number of log files) or Session timestamp (creates a new log file for each run, appending a timestamp). Using Session timestamp is generally recommended for production environments to maintain a complete history. ■ Save Session Log for These Runs: If Session runs is selected, this numeric value specifies how many previous log files to keep. ■ The Integration Service variable $PMSessionLogCount can also be used to control the number of session logs retained globally for the service. ○ Error Handling (Config Object Tab): ■ Stop on errors: Defines the number of non-fatal errors the Integration Service allows before stopping the session. ■ Override tracing: Allows setting the tracing level for transformations within the session, which can override the tracing levels set in the mapping. For production, this should generally be Normal or Terse to minimize logging overhead. ○ Parameter Filename (Properties Tab): Specifies the path and name of the parameter file to be used by the session at runtime. ○ Treat Source Rows As (Properties Tab): This property (values: Insert, Update, Delete, Data-driven) instructs the Integration Service on how to flag rows for the target. When set to Data-driven, the session relies on an Update Strategy transformation within the mapping to determine the operation for each row. This is fundamental for implementing various data loading strategies, such as Type 1 or Type 2 Slowly Changing Dimensions.

The following table outlines common PowerCenter session properties crucial for optimization:

| Property Name | Typical Location (Tab) | Description/Impact | Recommended Best Practice/When to Adjust |
| --- | --- | --- | --- |
| Commit Interval | Properties | Number of target rows processed before a database commit. | Increase for large bulk loads to reduce commit overhead (e.g., 50,000-100,000+). Balance with recovery time and database log space. |
| DTM Buffer Size | Config Object | Total memory allocated to the DTM buffer pool for data blocks. | Increase for complex mappings or large data volumes. Start with Auto or a reasonable value (e.g., 24MB-512MB+) and tune based on session stats and available RAM. |
| Default buffer block size | Config Object | Size of individual blocks within the DTM buffer pool. | Increase for mappings with wide rows. Auto or 64KB-256KB are common. Tune with DTM Buffer Size. |
| Lookup Cache Size (Lookup) | Mapping > Transformations | Memory allocated for caching lookup data. | Set explicitly based on lookup table size. If Auto is insufficient (causing disk spill), calculate required size (num_rows * row_size) and set. Use persistent cache for static lookups. |
| Aggregator/Joiner Data/Index Cache (Aggregator/Joiner) | Mapping > Transformations | Memory for data and index caches for these transformations. | Set explicitly based on group key cardinality and data volume. Insufficient cache leads to disk I/O. |
| Target Load Type | Mapping > Targets | Normal (uses standard DML, logged) or Bulk (uses database bulk utility, faster, less logging). | Use Bulk for initial/large data loads if target DB supports it and recovery implications are acceptable. |
| Tracing Level | Config Object (Override tracing) / Mapping > Transformations | Level of detail written to session log (None, Terse, Normal, Verbose Init, Verbose Data). | Use Normal or Terse for production. Verbose Data for debugging only, as it severely impacts performance. |
| Stop on errors | Config Object | Number of non-fatal errors allowed before session stops. | Set to 0 or 1 for critical production jobs to stop on first error. Higher values for less critical jobs or during testing. |
| Parameter Filename | Properties | Path to the parameter file for the session. | Essential for environment-specific configurations and manageability. Ensure path is accessible by Integration Service. |

● Understanding Commit Intervals, Logging, and Recovery: These operational aspects are vital for the reliability and manageability of ETL processes. ○ Commit Intervals: As discussed, the commit interval affects both performance and recoverability. A larger interval generally means faster loads due to fewer database commits but implies that more data might need to be reprocessed or rolled back if the session fails mid-interval and full recovery is not configured. The chosen interval must also consider the capacity of the target database's transaction log or rollback segments, as very large uncommitted transactions can exhaust these resources. ○ Logging: PowerCenter generates detailed logs that are indispensable for troubleshooting and monitoring. ■ Session Logs: Created for each session run, they contain execution statistics (rows read/written, throughput), error messages, warnings, and thread activity. The level of detail is controlled by the tracing level settings. ■ Workflow Logs: Created for each workflow run, these logs provide information on the overall workflow progress, initialization of processes, status of individual tasks within the workflow, and summary information. While verbose logging is invaluable during development and debugging, it incurs performance overhead and should be reduced to Normal or Terse levels in production environments. ○ Recovery: PowerCenter offers session recovery capabilities, allowing a failed session to be restarted from the last successfully committed checkpoint rather than from the beginning. This is particularly useful for long-running sessions processing large data volumes. When a session is configured for recovery, the Integration Service maintains state information in recovery tables (created on a target database system or as flat files). If the session fails, it can be restarted in recovery mode, and the Integration Service will use this state information to resume processing. For recovery to be effective, the mapping logic should generally be deterministic.
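The sketch below illustrates, under simplifying assumptions, how a commit interval and a recovery checkpoint interact: rows are committed in batches, the last committed offset is recorded alongside each commit, and a restart resumes after that offset. It uses SQLite with invented table and job names and is not PowerCenter's actual recovery-table mechanism.

```python
import sqlite3

COMMIT_INTERVAL = 2   # deliberately tiny for the example; production values are far larger

def load_with_checkpoints(rows, conn):
    """Commit every COMMIT_INTERVAL rows and record a restart point with each commit."""
    conn.execute("CREATE TABLE IF NOT EXISTS target (id INTEGER PRIMARY KEY, val TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS recovery (job TEXT PRIMARY KEY, last_committed INTEGER)")
    row = conn.execute("SELECT last_committed FROM recovery WHERE job = 'demo'").fetchone()
    start = row[0] if row else 0                  # resume after the last committed row
    batch = []
    for offset, (rid, val) in enumerate(rows[start:], start=start + 1):
        batch.append((rid, val))
        if len(batch) == COMMIT_INTERVAL:
            conn.executemany("INSERT INTO target VALUES (?, ?)", batch)
            conn.execute("INSERT OR REPLACE INTO recovery VALUES ('demo', ?)", (offset,))
            conn.commit()                         # the commit point doubles as a checkpoint
            batch.clear()
    if batch:                                     # final partial batch
        conn.executemany("INSERT INTO target VALUES (?, ?)", batch)
        conn.execute("INSERT OR REPLACE INTO recovery VALUES ('demo', ?)", (start + len(rows[start:]),))
        conn.commit()

conn = sqlite3.connect(":memory:")
data = [(i, f"row-{i}") for i in range(1, 6)]
load_with_checkpoints(data, conn)
print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 5
```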
While a powerful feature for resilience, it adds some processing overhead and might not be suitable or necessary for all sessions. Section 5: Performance Tuning and Optimization in PowerCenter Achieving optimal performance in Informatica PowerCenter is an iterative process that involves identifying bottlenecks and applying targeted optimizations at various levels of the ETL architecture. ● Identifying Bottlenecks (Source, Target, Mapping, Session, System)A bottleneck is a component or process that limits the overall throughput of an ETL job. Performance issues can arise from various areas : ○ Source Bottlenecks: Occur when the Integration Service spends excessive time reading data from source systems. Causes include slow database queries (due to unoptimized SQL, missing indexes, or database load), slow network connectivity to the source, or limitations of the source system itself (e.g., an overloaded OLTP database). ○ Target Bottlenecks: Occur when the Integration Service is slow in writing data to target systems. Causes include heavy loading operations, database contention, insufficient indexing on target tables (especially if updates or lookups are performed on the target), slow network to the target, or database configurations not optimized for writes. ○ Mapping Bottlenecks: Stem from inefficient transformation logic within the PowerCenter mapping. This could be due to complex calculations, improper use of transformations (e.g., uncached lookups on large tables, inefficient join conditions), or processing large data volumes through multiple transformations unnecessarily. ○ Session Bottlenecks: Relate to the configuration of the PowerCenter session. Causes include insufficient DTM buffer memory, small cache sizes for memory-intensive transformations (Aggregator, Joiner, Lookup, Sorter), or an inappropriately small commit interval for target loading. ○ System Bottlenecks: Involve limitations in the underlying hardware or operating system resources of the Informatica server or database servers. This includes insufficient CPU power, inadequate memory (leading to swapping), slow disk I/O, or network bandwidth limitations. The session log is a primary tool for bottleneck identification. It provides thread statistics, including the run time, idle time, and busy time for reader threads (source processing), writer threads (target processing), and transformation threads. A high busy percentage for a specific thread type points towards that area as a potential bottleneck. For example, if the reader thread shows high busy time, the bottleneck is likely at the source or in reading from the source.Systematic testing can help isolate bottlenecks : ○ To test source performance, create a simple mapping that reads from the source and writes to a flat file target. ○ To test target performance, create a simple mapping that reads from a flat file source and writes to the target. ○ To identify mapping-level bottlenecks, progressively remove or simplify transformations in the mapping and observe performance changes. For instance, temporarily replacing a complex transformation with a pass-through Expression transformation can help quantify its impact. Performance tuning is typically an iterative cycle: identify the most significant bottleneck, apply an optimization, re-test, and then look for the next bottleneck, as resolving one can often unmask another. ● Optimizing TransformationsEfficient transformation logic is key to good mapping performance. 
○ Filter Early and Often: As emphasized previously, use Filter transformations or Source Qualifier filters to remove unnecessary rows as early as possible in the data flow. ○ Aggregator and Joiner: ■ Provide sorted input data (on group-by keys for Aggregator, on join keys for Joiner) to enable more efficient algorithms and reduce cache requirements. This often involves using a Sorter transformation upstream. ■ Filter data before it reaches these transformations. ■ For Aggregators, limit the number of connected input/output or output ports to reduce the amount of data stored in the data cache. ■ For Joiners, if joining tables from the same database, perform the join in the Source Qualifier if possible. When using a Joiner, carefully select the master source (fewer rows for unsorted, fewer duplicate keys for sorted). ○ Lookup: ■ Utilize caching (static, dynamic, or persistent) whenever possible, especially for large lookup tables, to avoid repeated database queries. ■ Ensure database indexes exist on the columns used in lookup conditions. ○ Router vs. Multiple Filters: Use a single Router transformation instead of multiple Filter transformations when splitting data based on several mutually exclusive conditions, as the Router reads the input data only once. ○ Minimize Update Strategies: While necessary for flagging rows for insert, update, or delete, Update Strategy transformations add processing overhead. If possible, consolidate logic or explore alternatives if performance is critical. ○ Data Type Considerations: Numeric comparisons and operations are generally faster than string comparisons and manipulations. If feasible, use integer flags or codes. ● Session-Level Performance Enhancements (DTM Buffer, Caches)Many critical performance settings are configured at the session level: ○ DTM Buffer Memory: Allocate sufficient memory for the DTM buffer pool (configured via DTM Buffer Size and Default buffer block size properties in the session's Config Object tab). Inadequate buffer memory forces the Integration Service to page data to disk, drastically slowing down execution. Tuning these requires considering row sizes, data volume, and transformation complexity. ○ Transformation Caches: Explicitly configure adequate memory for index and data caches for transformations like Aggregator, Joiner, Lookup, and Sorter within the session properties (Mapping tab, under the specific transformation instance). Default or Auto settings may not be optimal for large datasets. ○ Commit Interval: For target databases, increase the commit interval for large data loads to reduce the overhead of frequent commit operations. ○ Bulk Loading: When loading data into relational targets that support it (e.g., Oracle, SQL Server, Teradata), using the Bulk load type can provide substantial performance improvements over Normal load, especially for initial or large-volume data insertions. Bulk loading often bypasses some database logging and uses more efficient loading paths. ○ Logging Level: In production environments, set the session tracing level (Override tracing option in Config Object tab) to Normal or Terse. Verbose Init and especially Verbose Data generate extensive logs and significantly degrade performance; they should only be used for debugging specific issues in development or testing environments. ○ Partitioning: For very large datasets, PowerCenter's partitioning capabilities can distribute data processing across multiple threads or even multiple nodes (if grid is configured). 
If a session uses partitioning, the DTM Buffer Size generally needs to be increased proportionally to the number of partitions to provide adequate memory for each partition. ○ Pushdown Optimization: If the source and/or target databases are Massively Parallel Processing (MPP) systems (e.g., Teradata, Netezza, Greenplum) or powerful SMP databases like Oracle Exadata, consider using Pushdown Optimization. This feature allows the Integration Service to translate parts of the transformation logic into SQL and push it down to the source or target database for execution. This leverages the database's processing power and can significantly reduce data movement and improve performance. However, not all transformations or functions can be pushed down, and it requires careful testing to ensure the generated SQL is efficient. Section 6: Error Handling and Debugging Robust error handling and effective debugging techniques are essential for developing reliable ETL processes and for quickly resolving issues when they arise. ● Common PowerCenter Errors and SolutionsDevelopers and administrators may encounter various errors during the ETL lifecycle : ○ Session Failures: These are common and can result from a multitude of issues: ■ Invalid Transformations or Mappings: Syntax errors in expressions, unconnected ports, or logically flawed transformation configurations. ■ Incorrect Connection Information: Wrong database credentials, incorrect server names, or inaccessible file paths. ■ Database Connection Failures: Network issues, database server down, listener problems, or insufficient database privileges. ■ Missing Source/Target Objects: Source files not found at the specified location, or target tables not existing in the database. ■ Data Type Mismatches: Incompatible data types between linked ports in a mapping or between PowerCenter data types and database column types. ■ Schema Mismatches: Differences between the source/target definitions in PowerCenter and the actual structure in the database (e.g., missing columns, different column names). ○ Data Truncation Errors: Occur when the length of the data being inserted into a target column exceeds the defined length of that column. This is common when integrating data from sources with inconsistent data formats or when field lengths are underestimated during design. ○ Lookup Transformation Failures: Can happen if a lookup value is not found in the lookup source and no default value or error handling for non-matches is configured. This can lead to NULL values being propagated or even session failure if the lookup is critical. ○ Performance Bottlenecks: While not strictly errors, severe performance issues can render an ETL job unusable. These manifest as excessively long run times. ○ Connectivity Issues: Problems related to network configurations, firewalls blocking ports between the PowerCenter server and database servers, or expired authentication tokens. Many of these errors can be proactively identified and mitigated through thorough ETL testing, which involves validating data movement, data counts in source and target, transformation logic against requirements, and the preservation of table relationships and keys. A significant number of errors arise from discrepancies between the ETL design assumptions and the actual state of the source/target systems or the data itself. This underscores the importance of comprehensive data profiling and requirements analysis before and during development. 
For instance, data profiling can reveal potential data truncation issues by comparing source field lengths with target definitions, or identify the completeness of lookup data to anticipate potential lookup failures. ● Using Session Logs and Reject FilesPowerCenter provides essential tools for diagnosing errors and understanding data quality issues: ○ Session Logs: These are detailed records of a session's execution. They contain : ■ Load statistics (number of rows read from each source, rows applied to each target, rows rejected). ■ Error messages and warnings encountered during the session. ■ Thread activity and performance counters (useful for bottleneck analysis). ■ Information about initialization and completion of various stages. The level of detail in the session log is controlled by the tracing level setting. For debugging, increasing verbosity (e.g., to Verbose Data for a specific problematic transformation) can provide row-level insight, but this should be used judiciously due to its performance impact. In production, Normal or Terse tracing is recommended. Session logs can be filtered by error codes or searched for specific messages to quickly identify issues. ○ Reject Files (Bad Files): When the Integration Service encounters rows that cannot be written to a target due to errors (e.g., database constraint violations, data type mismatches causing conversion errors, data truncation, rows explicitly flagged for reject by an Update Strategy transformation), it can write these rejected rows to a reject file. Each target instance in a session can have an associated reject file. Analyzing reject files is crucial for understanding data quality problems in source systems or flaws in transformation logic. Instead of merely discarding these rows, a common advanced error handling strategy is to configure the session to capture these rejected records and load them into a dedicated error table in a database, along with metadata about the error (e.g., error message, timestamp, workflow name, session name). This allows data stewards or support teams to analyze the problematic data, potentially correct it in the source systems, and reprocess it. ● Debugging Mappings with the PowerCenter DebuggerFor complex mapping logic where session logs may not provide sufficient detail to pinpoint an issue, the PowerCenter Designer includes an interactive Debugger tool. The Debugger allows developers to: ○ Execute a Session Interactively: Run a session in debug mode, processing data row by row or in batches. ○ Set Breakpoints: Define points in the mapping (e.g., at a specific transformation instance or based on a data condition) where the execution will halt. This allows inspection of data values at that specific point in the flow. ○ Inspect Data Values: When execution is paused at a breakpoint or when stepping through data, developers can view the data values in all ports of the transformations. ○ Monitor Transformation Logic: Step through the data flow transformation by transformation, or even row by row, to observe how data is being modified. ○ Evaluate Expressions and Variables: Check the results of expressions and the values of mapping variables. ○ Modify Variable Values: In some cases, variable values can be modified during a debug session to test different scenarios. ○ Validate with Sample Data: Use the debugger with a small, representative sample of source data to validate mapping logic before running it against large datasets. 
The Debugger is an invaluable tool for troubleshooting intricate transformation logic, data-dependent errors, or unexpected output from mappings. It provides a much more granular view of the data's journey through the mapping than session logs alone, significantly speeding up the debugging process for complex scenarios. Part 2: Visualizing Data and Gaining Insights with Tableau Once data has been integrated and prepared, often through tools like Informatica PowerCenter, the next step is to transform it into visual insights. Tableau is a leading platform in the Business Intelligence (BI) and data visualization space, renowned for its ease of use and powerful analytical capabilities. Section 7: Introduction to Tableau and Data Visualization Principles ● What is Tableau? Its Role in Business IntelligenceTableau is a powerful and rapidly evolving data visualization tool extensively used within the Business Intelligence industry. Its core strength lies in enabling swift data analysis and the creation of rich, interactive visualizations, typically presented in the form of dashboards and worksheets. A key aspect of Tableau's design philosophy is accessibility; it empowers users, including those who may not have a deep technical background, to connect to data, explore it, and create customized dashboards to answer business questions. This democratization of data analysis has been a significant factor in its widespread adoption, as it allows business users to become more self-sufficient in their data exploration and reporting needs, potentially leading to faster insights and more data-driven decision-making. ● The Tableau Workspace: Data Pane, Dimensions, Measures, Shelves, Cards, Views, SheetsUnderstanding the Tableau workspace is fundamental to using the tool effectively. When a data source is connected, the workspace presents several key areas : ○ Data Pane: Located on the left side, the Data pane lists all available fields from the connected data source(s). Tableau automatically categorizes these fields into: ■ Dimensions: These are typically qualitative, categorical fields that provide context to the data (e.g., 'Region', 'Product Name', 'Order Date'). They are used to slice, dice, and segment the data. When dragged into a view, dimensions usually create headers or labels. ■ Measures: These are typically quantitative, numeric fields that can be aggregated (e.g., 'Sales', 'Profit', 'Quantity'). When dragged into a view, measures are usually aggregated (e.g., SUM, AVG, MIN, MAX) and form axes. The distinction between dimensions and measures is crucial as it guides how Tableau interprets data and suggests appropriate visualizations. ○ Shelves: These are designated areas at the top and side of the workspace where fields (represented as "pills") are dragged from the Data pane to build a visualization (a "view"). Key shelves include: ■ Columns Shelf: Fields placed here typically define the columns of a table or the X-axis of a chart. ■ Rows Shelf: Fields placed here typically define the rows of a table or the Y-axis of a chart. Tableau uses color-coding for pills: blue pills generally represent discrete fields (often dimensions, creating distinct labels), while green pills represent continuous fields (often measures, creating continuous axes). This visual cue helps users understand how Tableau is treating each field. 
○ Cards: Several cards provide control over different aspects of the visualization: ■ Marks Card: This is a central control panel for defining the visual properties of the data points (marks) in the view. Users can drag fields to various properties on the Marks card, such as: ■ Color: To encode data using different colors. ■ Size: To encode data using different sizes of marks. ■ Label: To display data values as text labels on the marks. ■ Detail: To break down the marks to a finer level of granularity without necessarily applying a distinct visual encoding like color or size. ■ Tooltip: To customize the information that appears when a user hovers over a mark. ■ Shape: To use different shapes for marks (relevant for scatter plots, etc.). The Marks card also allows changing the overall mark type (e.g., bar, line, circle, square, area). This direct manipulation of visual encodings is a cornerstone of Tableau's exploratory power. ■ Filters Card (or Shelf): Fields dragged here are used to filter the data displayed in the view. ■ Pages Card (or Shelf): Used to break a view into a sequence of pages, allowing users to step through members of a dimension. ■ Legends: Automatically generated when fields are placed on Color, Size, or Shape, legends help interpret the visual encodings. ○ View (or Canvas): The main area where the visualization is built and displayed. ○ Sheets, Dashboards, Stories: ■ Sheet (or Worksheet): A single visualization (a chart, map, or table) is created on a sheet. ■ Dashboard: A collection of one or more sheets, often combined with interactive elements, to present a consolidated view of data. ■ Story: A sequence of sheets or dashboards arranged to narrate insights or guide an audience through an analysis. ○ Measure Names and Measure Values: These are special fields automatically generated by Tableau. Measure Values contains all the measures from the Data pane, and Measure Names contains their names. These are used when multiple measures need to be displayed in a single pane or axis, often in text tables or when creating charts with multiple measure lines or bars. ● Key Data Visualization Best Practices (Inspired by Stephen Few, Alberto Cairo)Creating effective data visualizations goes beyond simply plotting data; it involves applying principles that ensure clarity, accuracy, and efficient communication of insights. The works of experts like Stephen Few and Alberto Cairo provide valuable guidance. ○ Stephen Few's Principles: In "Information Dashboard Design," Few emphasizes the importance of displaying data for "at-a-glance monitoring." This means dashboards should be designed to convey critical information quickly and clearly, avoiding common design pitfalls that lead to inefficient or cluttered displays. The focus is on functionality and enabling users to understand key trends and make informed decisions rapidly. This involves maximizing the data-ink ratio (the proportion of ink used to display data versus non-data elements) and avoiding "chart junk" – unnecessary visual embellishments that don't add informational value. ○ Alberto Cairo's "The Functional Art": Cairo advocates that data visualization should be considered "functional art"—it must serve a clear purpose (functional) while also being aesthetically appealing to engage the audience. 
His approach involves understanding how our brains perceive and remember information and using design elements like color and typography effectively to enhance both comprehension and aesthetic quality, without sacrificing accuracy or best practices. This implies leveraging pre-attentive attributes (visual properties like color, size, shape, position that the brain processes very quickly) to strategically guide the viewer's attention to important data points. ○ General Best Practices for Clarity and Usability: ■ Simplicity and Purpose: Keep visualizations simple and focused on conveying a specific message or answering a particular question. Prioritize clarity over decoration. ■ Choose the Right Chart Type: Select a visualization type that is appropriate for the data being presented and the insight to be communicated (e.g., bar charts for comparisons, line charts for trends, scatter plots for relationships, maps for geographic data). ■ Avoid Clutter: Limit the number of visuals on a single dashboard and the amount of information within each visual. Use white space effectively to improve readability and reduce cognitive load. ■ Strategic Use of Color and Fonts: Limit the palette of colors and the number of fonts used. Colors should be chosen purposefully (e.g., to distinguish categories, highlight key data, or indicate positive/negative values) and with consideration for color vision deficiencies. Consistent font usage enhances professionalism and readability. ■ Test for Usability: Test dashboards on different devices and screen sizes to ensure accessibility and a good user experience for all viewers. The overarching theme from these experts and practices is that effective data visualization is a communication discipline. The goal is to present data in a way that is not only visually appealing but, more importantly, easily understood, accurately interpreted, and leads to actionable insights. Section 8: Connecting to and Preparing Data in Tableau Before visualizations can be created, data must be connected to and often prepared for analysis. Tableau provides a range of options for data connection and preparation. ● Connecting to Various Data Sources (Files, Servers, Databases)Tableau Desktop offers a versatile "Connect" pane on its Start page, which serves as the gateway to accessing data from numerous sources. These sources can be broadly categorized as: ○ Files: Connection to local files such as Microsoft Excel spreadsheets (.xls,.xlsx), text files (.csv,.txt), PDF files, JSON files, spatial files (e.g., Shapefiles, KML), and statistical files (e.g., SAS, SPSS, R). ○ Servers/Databases: Direct connections to a wide array of relational databases (e.g., Microsoft SQL Server, Oracle, MySQL, PostgreSQL, Amazon Redshift, Teradata), cloud data warehouses (e.g., Snowflake, Google BigQuery), NoSQL databases (via ODBC/JDBC or specific connectors like MongoDB BI Connector), and online services or applications (e.g., Google Analytics, Salesforce, SAP HANA). Tableau also allows connection to data sources published on Tableau Server or Tableau Cloud. ○ Previously Used Data Sources: Quick access to data sources that have been connected to before. For some database connections, installing the appropriate database drivers on the machine running Tableau Desktop is a prerequisite. Tableau's extensive list of native connectors simplifies the process of data access, often reducing the need for intermediate data staging solely for Tableau consumption. 
This direct connectivity can accelerate the time-to-insight.Once a connection is established, Tableau typically performs the following actions : 1. Navigates to the Data Source page (or a new worksheet if connecting to a simple file). The Data Source page allows users to see a preview of the data (e.g., the first 1,000 rows), select specific tables, and perform initial data preparation tasks. 2. Populates the Data pane (in the worksheet view) with fields (columns) from the selected data source. 3. Automatically assigns a data type (e.g., string, number, date, boolean, geographic) and a role (Dimension or Measure) to each field. While this automation is convenient, it is crucial for users to verify these assignments. For example, a numeric identifier like 'Employee ID' might be misclassified as a Measure (intended for aggregation) instead of a Dimension. Such misclassifications can lead to incorrect aggregations or nonsensical visualizations if not corrected by the user via the Data pane or Data Source page. ● Data Preparation Techniques on the Data Source Page (and within Tableau Desktop)Tableau Desktop provides several built-in tools for cleaning, shaping, and combining data, primarily on the Data Source page, but some operations can also be performed via calculated fields or directly in the view. ○ Joining Data: Joins are used to combine data from two or more tables that share common fields (join keys). Tableau's Data Source page offers a visual interface for creating joins. Users can drag tables onto the canvas and define join clauses, specifying the type of join (Inner, Left, Right, Full Outer) and the fields to join on. Tableau also supports cross-database joins, allowing tables from different data sources (e.g., an Excel file and a SQL Server table) to be joined. While visually intuitive, inefficient join configurations (e.g., joining very large tables on unindexed columns or using complex calculated join conditions) can severely degrade performance, especially with live connections. The "Assume Referential Integrity" option can sometimes optimize join performance by telling Tableau it doesn't need to perform certain pre-join checks, but this should only be used if the underlying data integrity is guaranteed. For frequent or complex cross-database joins, creating a federated view or a materialized table in a database layer might be more performant than relying solely on Tableau's cross-database join capability. ○ Blending Data: Data blending is a Tableau-specific technique used to combine data from different published data sources on a worksheet-by-worksheet basis. It involves defining linking fields (common dimensions) between a primary data source and one or more secondary data sources. Data from the secondary source(s) is always aggregated to the level of the linking fields in the primary source before being combined. This is fundamentally different from joins, which combine data at the row level before aggregation. Blending is useful when data cannot be joined at the database level (e.g., sources are from entirely separate systems) or when data needs to be combined at different levels of granularity. However, users must be aware of its limitations: all measures from secondary sources are aggregated, and asterisks (*) can appear in the view if the linking fields are not unique in the secondary source for the given level of detail in the primary source. 
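To make the join-versus-blend distinction concrete, the following sketch mimics the two behaviours in Python with pandas. It is only an analogy built on assumed sample data (hypothetical Region, Sales, and Quota fields), not a description of how Tableau executes blends internally.

```python
# Illustrative pandas analogy (not Tableau's engine): a row-level join vs.
# a blend-style combine, where the secondary source is aggregated to the
# linking field before it is attached to the primary source.
import pandas as pd

orders = pd.DataFrame({            # primary source
    "Region": ["East", "East", "West"],
    "Sales":  [100, 150, 200],
})
targets = pd.DataFrame({           # secondary source (one row per rep)
    "Region": ["East", "East", "West"],
    "Quota":  [80, 90, 150],
})

# Join: rows are combined first; Quota repeats for every matching order row.
joined = orders.merge(targets, on="Region", how="left")

# Blend-style: aggregate the secondary source to the linking field (Region)
# first, then attach the aggregated value to the primary source.
blended = orders.merge(
    targets.groupby("Region", as_index=False)["Quota"].sum(),
    on="Region", how="left",
)

print(joined)   # Quota duplicated at row level, a double-counting risk
print(blended)  # one aggregated Quota value per Region
```

In the joined result the quota repeats for every matching order row, which is exactly the double-counting risk a blend avoids by aggregating the secondary source to the linking field first.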
For performance, it is advisable to blend on high-level (less granular) dimensions and keep secondary data sources relatively small. ○ Unioning Data: Unioning is used to append rows from multiple tables or files that share a similar column structure. This is common when data is split, for example, into monthly or regional files (e.g., Sales_January.csv, Sales_February.csv). Tableau allows manual unioning by dragging tables together on the Data Source page or using a wildcard union for files (e.g., specifying Sales_*.csv to union all matching files in a directory). Tableau attempts to match columns by name, but users can manually merge mismatched fields. Generally, unioning is supported within the same data source connection; unioning data directly between a live connection and an extract from a different source within Tableau Desktop is not straightforward. ○ Pivoting Data: Pivoting transforms data structure. Columns to Rows pivot is used to convert data from a wide (crosstab) format to a tall (columnar) format. This is useful when measures are spread across multiple columns (e.g., columns for Q1_Sales, Q2_Sales, Q3_Sales, Q4_Sales). Pivoting these would create two new columns, one for 'Quarter' (containing Q1, Q2, Q3, Q4) and one for 'Sales' (containing the corresponding sales values). Tableau generally works best with tall data for analysis and visualization. Rows to Columns pivot is less common in initial prep but can be done in Tableau Prep. Tableau Desktop has limitations on pivoting fields that are the result of calculations or splits; Tableau Prep offers more flexibility here. ○ Splitting Fields: Tableau allows splitting a single column into multiple columns based on a delimiter (e.g., splitting "FirstName,LastName" into 'FirstName' and 'LastName' columns using ',' as the delimiter) or using Tableau's automatic split capability. This can be done from the Data Source page by selecting the column and choosing the split option, or from the Data pane in a worksheet. For more complex splitting logic based on patterns or specific positions, calculated fields using string functions (SPLIT, LEFT, RIGHT, MID, FIND) provide greater control. ○ Cleaning Data (Data Interpreter, Manual Adjustments): Spreadsheet data, in particular, can often be messy, containing headers, footers, merged cells, or multiple tables within a single sheet. Tableau's Data Interpreter attempts to automatically clean such data from sources like Excel, CSV, PDF, and Google Sheets by detecting sub-tables and removing extraneous formatting. While helpful, users should always review the results of the Data Interpreter (Tableau provides feedback on the changes made) and be prepared to perform manual adjustments. Other manual cleaning steps include correcting data types assigned by Tableau, renaming columns for clarity, creating aliases for field members, and handling null values. ○ Filtering Data from Data Sources: Data source filters are applied at the very beginning when Tableau connects to the data, restricting the dataset that is brought into Tableau for analysis. These filters are applied before any worksheet-level filters. Using data source filters is a critical performance optimization technique, especially for very large datasets connected live. By filtering at the source, Tableau queries and processes less data from the database, which can dramatically reduce query times and the volume of data transferred. This is also beneficial for extracts, as it reduces the extract size and refresh duration. 
For example, if a workbook only analyzes data for the current fiscal year, applying a data source filter for this year ensures that all queries and extract operations are limited to this relevant subset. ● Live vs. Extract Connections: When to Use Each and Refreshing ExtractsOne of the most fundamental decisions when connecting to data in Tableau is whether to use a Live Connection or an Extract Connection. ○ Live Connection: With a live connection, Tableau sends queries directly to the source database or file in real-time (or near real-time) as users interact with visualizations. ■ Pros: Data is always up-to-date, reflecting the latest changes in the source system. It's suitable for scenarios requiring immediate data freshness and for leveraging the power of existing fast, analytics-optimized databases. ■ Cons: Performance is heavily dependent on the speed and load of the source database and network latency. Complex visualizations or dashboards with many users can put a significant query load on the source system. Not all Tableau functionalities might be supported by every live database connection. ■ Use Cases: Dashboards requiring up-to-the-second data (e.g., operational monitoring, financial trading), connecting to high-performance analytical databases. ○ Extract Connection (Tableau Hyper Extract): An extract is a snapshot of the data (or a subset of it) that is compressed and stored in Tableau's proprietary high-performance in-memory data engine, Hyper. ■ Pros: Generally offers significantly better performance for complex visualizations and large datasets because queries are processed by the optimized Hyper engine. Reduces the load on the source database. Enables offline data access in Tableau Desktop (users can work with the data without being connected to the source). Can unlock certain Tableau functionalities not available with all live connections. Extracts also facilitate portability, as they can be packaged within a Tableau Packaged Workbook (.twbx) for sharing with users who don't have access to the live data source. ■ Cons: Data is not real-time; it's only as fresh as the last extract refresh. Extracts need to be refreshed to incorporate new data, which can take time for very large datasets. ■ Use Cases: Improving dashboard performance, working with slow databases or heavily loaded transactional systems, reducing query load on source systems, enabling offline analysis, sharing workbooks with self-contained data. ○ Deciding Between Live and Extract: The choice often involves a trade-off between data freshness and performance. If a source database is slow or already under heavy load, an extract is usually preferred. If real-time data is paramount and the source database can handle the query load, a live connection might be suitable. A common strategy for sales dashboards, for example, is to use an extract refreshed nightly or incrementally during the day, providing good performance while isolating the operational sales database from excessive analytical queries. ○ Refreshing Extracts: Extracts can be refreshed to update them with the latest data from the original source. ■ Full Refresh: Replaces all data in the extract with the current data from the source. ■ Incremental Refresh: Appends only new rows to the extract that have been added to the source since the last refresh. This requires a column in the source data that indicates new rows (e.g., a timestamp or an incrementing ID). Incremental refreshes are generally much faster than full refreshes for large, growing datasets. 
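The incremental logic can be pictured as a simple high-water-mark check. The Python sketch below is purely conceptual, with a hypothetical updated_at column standing in for the timestamp or incrementing ID that an incremental refresh requires; it does not use any Tableau API.

```python
# Conceptual sketch of incremental-refresh logic (not Tableau's internal
# implementation). New rows are identified by a monotonically increasing
# column, here a hypothetical "updated_at" timestamp, and only rows past
# the last high-water mark are appended to the existing extract snapshot.
from datetime import datetime

extract_rows = [
    {"order_id": 1, "updated_at": datetime(2024, 1, 10)},
    {"order_id": 2, "updated_at": datetime(2024, 1, 12)},
]
source_rows = extract_rows + [
    {"order_id": 3, "updated_at": datetime(2024, 1, 15)},  # new since last refresh
]

# High-water mark = latest value already present in the extract.
high_water_mark = max(row["updated_at"] for row in extract_rows)

# Incremental refresh appends only rows newer than the mark;
# a full refresh would simply replace extract_rows with source_rows.
new_rows = [row for row in source_rows if row["updated_at"] > high_water_mark]
extract_rows.extend(new_rows)

print(f"Appended {len(new_rows)} row(s); extract now has {len(extract_rows)} rows.")
```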
Extract refreshes can be performed manually in Tableau Desktop or scheduled to run automatically on Tableau Server or Tableau Cloud.
The following comparison summarizes Live and Extract (Hyper) connections feature by feature:
○ Data Freshness: A live connection is real-time or near real-time, reflecting the latest source data; an extract is a snapshot that is only as fresh as its last refresh. Prefer Live when up-to-the-second data is critical (e.g., operational monitoring, stock trading), and Extract when near real-time is acceptable and performance or offline access is more important (e.g., daily or hourly reports).
○ Performance: Live performance depends on source database speed, network, and query complexity, and can be slow for complex vizzes or slow databases; extracts are generally faster thanks to the optimized Hyper engine, especially for complex vizzes and large datasets. Prefer Live if the source database is highly optimized for analytics and can handle the query load, and Extract to boost performance for slow source systems, complex dashboards, or large datasets.
○ Database Load: A live connection puts direct query load on the source database for every interaction; with an extract, the source is loaded only during extract refresh, and queries run against the Hyper extract. Prefer Live if the source database has ample capacity, and Extract to reduce query load on operational or transactional source systems, or when source database resources are constrained.
○ Offline Access: A live connection requires an active connection to the data source; an extract enables offline work in Tableau Desktop because the data is stored locally. Live is not suitable for offline access; use an Extract when users need to work with data without network connectivity.
○ Portability (.twbx): With a live connection, data is not part of the .twbx unless an extract is also created; an extract can be packaged within a .twbx file for easy sharing. Prefer Live if all users have access to the live data source, and Extract for sharing workbooks with self-contained data with users who lack direct data source access.
○ Data Volume Handling: A live connection can query very large databases, but performance depends on the database; the Hyper engine is optimized for large datasets (billions of rows). Prefer Live for querying massive, well-indexed databases where extracting all the data is impractical, and Extract for improved query performance on large datasets that can feasibly be extracted.
○ Setup Complexity: A live connection generally has a simpler initial setup (provide connection details); an extract requires an initial creation step, which can be time-consuming for large data, and a refresh schedule needs to be set up. Prefer Live for quick connections to readily available, performant sources, and Extract when the benefits of performance and offline access outweigh the initial setup and refresh management.
○ Tableau Functionality: Some Tableau functions might not be supported by all live connections; extracts offer broader support for Tableau functions due to Hyper engine capabilities. Prefer Live if the required functions are supported by the specific database, and Extract when specific Tableau functions are needed that are better supported by or optimized for Hyper.
Section 9: Creating Visualizations in Tableau
Tableau's strength lies in its ability to rapidly create a wide variety of visualizations. This is achieved by dragging dimensions and measures onto shelves and utilizing the Marks card to control visual encoding.
● Building Common Chart Types: Tableau's "Show Me" feature can automatically suggest and create chart types based on the fields selected in the Data pane.
While helpful for beginners, manually constructing charts provides greater control and understanding. ○ Bar Charts: Ideal for comparing data across discrete categories. ■ Steps: Drag a dimension (e.g., 'Category') to the Columns or Rows shelf. Drag a measure (e.g., 'SUM(Sales)') to the opposing shelf. Tableau will typically default to a bar chart. The orientation (vertical or horizontal) depends on which shelf holds the dimension versus the measure. ○ Line Charts: Primarily used for showing trends over time or a continuous progression. ■ Steps: Drag a date dimension (e.g., 'Order Date') to the Columns shelf. Right-click the date pill and select a continuous date part (e.g., MONTH(Order Date) from the continuous section, often green). Drag a measure (e.g., 'SUM(Sales)') to the Rows shelf. To show multiple lines (e.g., for different regions), drag another dimension (e.g., 'Region') to the Color property on the Marks card. ○ Pie Charts: Used to show proportions of a whole. Best used with a small number of categories (2-5 slices recommended). ■ Steps: Change the Mark type on the Marks card to 'Pie'. Drag a dimension (e.g., 'Customer Segment') to the Color property. Drag a measure (e.g., 'SUM(Sales)') to the Angle property. To display percentages, drag the measure to Label, then right-click the label pill, select "Quick Table Calculation," then "Percent of Total". ○ Area Charts: Similar to line charts but emphasize volume or magnitude of change over time. Can be stacked or unstacked. ■ Steps: Build like a line chart (date dimension on Columns, measure on Rows). Change the Mark type to 'Area'. To unstack areas (if a dimension is on Color), go to Analysis > Stack Marks > Off. ○ Scatter Plots: Show the relationship between two numerical measures. Each mark represents a pair of values. ■ Steps: Drag one measure (e.g., 'SUM(Sales)') to the Columns shelf and another measure (e.g., 'SUM(Profit)') to the Rows shelf. This will initially create a single mark. To see individual data points, drag one or more dimensions (e.g., 'Product Name', 'Customer ID') to the Detail property on the Marks card. Dimensions can also be added to Color, Shape, or Size for further encoding. Trend lines can be added via the Analytics pane. ○ Histograms: Display the distribution of a continuous measure by dividing the data into bins (intervals) and showing the count of values falling into each bin. ■ Steps: In the Data pane, right-click the continuous measure (e.g., 'Sales') you want to analyze, select Create > Bins. Define the bin size in the dialog. Drag the newly created bin dimension (e.g., 'Sales (bin)') to the Columns shelf. Drag the original measure (or another field like COUNTD of Order ID) to the Rows shelf and change its aggregation to COUNT or COUNTD. ○ Maps (Symbol Maps, Filled Maps): Visualize geographically encoded data. ■ Steps: If your data contains geographic fields (e.g., 'Country', 'State', 'City', 'Zip Code') that Tableau recognizes (indicated by a globe icon), double-click the geographic field. Tableau will automatically generate Latitude and Longitude fields and place them on Rows and Columns, creating a map. Drag measures to Color or Size on the Marks card to encode data on the map (e.g., color states by 'SUM(Profit)', size circles on cities by 'SUM(Sales)'). The Mark type can be changed to 'Filled Map' for choropleth maps. ○ Text Tables (Crosstabs): Display data in a tabular format. ■ Steps: Drag one or more dimensions to the Rows shelf and one or more dimensions to the Columns shelf. 
Drag one or more measures to the Text property on the Marks card. If multiple measures are needed, use the Measure Values field on Text and Measure Names on Rows or Columns.
○ Dual Axis (Combination) Charts: Display two measures with different scales on the same chart, often using different mark types for each measure (e.g., bars for sales, a line for profit margin). ■ Steps: Place a dimension (e.g., continuous MONTH(Order Date)) on Columns. Drag the first measure (e.g., 'SUM(Sales)') to Rows. Drag the second measure (e.g., 'AVG(Profit Ratio)') to the Rows shelf, to the right of the first measure. Right-click the pill for the second measure on the Rows shelf and select "Dual Axis." The two measures will now share the same chart area but have separate Y-axes (one on the left, one on the right). To align the scales if appropriate, right-click one of the Y-axes and select "Synchronize Axis" (this is only possible if data types are compatible or can be made so). Each measure will have its own Marks card (e.g., Marks (SUM(Sales)), Marks (AVG(Profit Ratio))), allowing you to set different mark types (e.g., Bar for Sales, Line for Profit Ratio) and other visual properties independently.
The choice of chart type is critical for effective communication. An inappropriate chart can obscure insights or mislead the audience (e.g., using a pie chart with too many categories makes comparisons difficult). Therefore, understanding the primary purpose of each chart type is essential for selecting visualizations that accurately and effectively convey the intended message. The following summary covers common Tableau chart types, their typical use cases, the dimensions and measures typically used, and the most important Marks card properties:
○ Bar Chart: Compares values across discrete categories using rectangular bars. Common uses: comparing sales by product category; ranking items. Fields: dimension on Rows/Columns, measure on Columns/Rows. Marks card: Color (for stacked or grouped bars).
○ Line Chart: Shows trends or changes in a measure over a continuous dimension, typically time. Common uses: tracking sales over months; stock price changes. Fields: continuous date/time dimension on Columns, measure on Rows. Marks card: Color (for multiple lines/categories), Path.
○ Pie Chart: Represents parts of a whole as slices of a circle; best for few categories. Common uses: market share by region; budget allocation by department. Fields: dimension on Color, measure on Angle. Marks card: Label (for values/percentages).
○ Area Chart: Similar to a line chart, but the area below the line is filled, emphasizing volume or magnitude. Common uses: showing cumulative sales over time; comparing contributions of categories. Fields: continuous date/time dimension on Columns, measure on Rows. Marks card: Color (for stacked areas), Mark Type: Area.
○ Scatter Plot: Displays the relationship between two numerical measures using individual marks. Common uses: correlation between advertising spend and sales; profit vs. sales for products. Fields: measure on Columns, measure on Rows, dimension(s) on Detail/Color/Shape/Size. Marks card: Trend Lines (Analytics pane).
○ Histogram: Shows the frequency distribution of a single continuous measure. Common uses: distribution of exam scores; frequency of order sizes. Fields: bin dimension (created from a measure) on Columns, COUNT/COUNTD of a measure or ID on Rows. Marks card: Mark Type: Bar.
○ Map: Visualizes data geographically. Common uses: sales by state; customer density by zip code. Fields: geographic dimension(s) (auto-creates Latitude/Longitude), measure on Color/Size. Marks card: Mark Type: Map, Circle, or Filled Map.
○ Text Table: Displays data in a grid format (crosstab). Common uses: detailed numerical reporting; precise value lookup. Fields: dimension(s) on Rows/Columns, measure(s) on Text (often using Measure Values/Measure Names). Marks card: none in particular.
○ Dual Axis Chart: Combines two measures with potentially different scales on a single chart, using two y-axes. Common uses: comparing sales (bars) with profit ratio (line) over time. Fields: dimension on Columns, two measures on Rows (the second one set to Dual Axis). Marks card: separate Marks cards for each measure to customize mark type, color, etc.
● Using the Marks Card (Color, Size, Label, Detail, Tooltip): The Marks card is the engine for visually encoding data in Tableau. By dragging fields from the Data pane onto the various properties (also called "shelves") on the Marks card, users control how data points (marks) in the visualization appear.
○ Color: Assigns colors to marks based on the values of a dimension (discrete colors for categories) or a measure (sequential or diverging color gradients for numerical ranges).
○ Size: Varies the size of marks based on the values of a measure.
○ Label: Displays text labels directly on the marks, typically showing the values of a dimension or measure.
○ Detail: Adds a dimension to the view to increase the level of granularity (i.e., create more marks) without necessarily applying a distinct visual encoding like color or size. This is useful when you want to see individual data points that might otherwise be aggregated.
○ Tooltip: Customizes the information that appears in a pop-up box when a user hovers the mouse over a mark. Tooltips are excellent for providing additional context or data points on demand without cluttering the main visualization.
○ Shape: (For mark types like 'Shape' or when applicable) Assigns different shapes to marks based on the values of a dimension.
○ Path: (For line or polygon mark types) Defines the order in which marks are connected to form lines or polygons.
Effective use of the Marks card allows for the encoding of multiple data dimensions into a single visualization, leading to richer and more nuanced insights. For instance, a scatter plot showing Sales versus Profit can have marks colored by Region and shaped by Customer Segment. However, it is important to avoid overloading a single visualization with too many visual encodings, as this can lead to clutter and make the chart difficult to interpret, a common pitfall for beginners.
Section 10: Enhancing Analysis with Calculated Fields, Filters, and Parameters
Tableau's analytical capabilities extend far beyond basic chart creation through the use of calculated fields, filters, and parameters, which allow for customized computations, focused data views, and interactive explorations.
● Creating and Using Calculated Fields: Calculated fields enable users to create new data fields (either new dimensions or new measures) from existing data within Tableau by defining formulas. These are essential for deriving new metrics, implementing custom business logic, transforming data, or performing advanced analytical computations. Tableau supports several types of calculations:
1. Basic Row-Level Calculations: These are performed for each row in the underlying data source. Example: [Price] * [Quantity] to calculate Line_Item_Total.
2. Aggregate Calculations: These calculations involve an aggregate function (e.g., SUM, AVG, MIN, MAX, COUNTD). Example: SUM([Profit]) / SUM([Sales]) to calculate Profit Ratio.
The result of an aggregate calculation depends on the dimensions present in the view.
3. Level of Detail (LOD) Expressions: These are powerful calculations that allow users to compute aggregations at different levels of detail than what is currently defined by the dimensions in the view. There are three types:
■ FIXED: Computes an aggregate value for the specified dimensions, regardless of other dimensions in the view. Example: an expression such as {FIXED [Customer Name] : SUM([Sales])} calculates total sales for each customer.
■ INCLUDE: Computes an aggregate value including the specified dimensions in addition to any dimensions in the view.
■ EXCLUDE: Computes an aggregate value excluding the specified dimensions from the view's level of detail.
4. Table Calculations: These calculations are performed on the aggregated data that is currently visible in the view (the "table" of data underlying the visualization). They are useful for computations like running totals, percent of total, moving averages, difference from previous, or ranking within the displayed data.
To create a calculated field:
1. In the Data pane, click the drop-down arrow and select "Create Calculated Field."
2. In the calculation editor, provide a name for the new field.
3. Enter the formula using available fields, functions, and operators. Tableau provides a function list and auto-completion to assist.
4. Click "OK." The new field will appear in the Data pane (often with an '=' sign next to its icon, e.g., =# for a calculated measure).
● Applying Filters to Worksheets and Dashboards: Filters in Tableau are used to narrow down the data displayed in a visualization, allowing users to focus on specific subsets of interest.
○ How to Apply: Drag any dimension or measure from the Data pane to the Filters shelf.
○ Filter Dialog: Upon dropping a field onto the Filters shelf, a dialog box appears. The options in this dialog vary depending on the data type of the field:
■ Dimensions (Discrete): Typically shows a list of members to include or exclude (e.g., select specific regions or product categories). Can also use wildcard matching, conditions, or top/bottom N filters.
■ Measures (Continuous): Allows filtering based on a range of values (e.g., Sales between $1000 and $5000), or at least/at most values.
■ Dates (Discrete or Continuous): Offers options like relative dates (e.g., last 3 months), date ranges, or specific date parts.
○ Filter Scope: Filters can be applied at different scopes:
■ Current Worksheet: The default, applies only to the active sheet.
■ Multiple Specific Worksheets: Allows selecting other worksheets in the workbook to which the filter should also apply.
■ All Using This Data Source: Applies the filter to all worksheets that use the same primary data source.
■ All Using Related Data Sources: Applies the filter to all worksheets using data sources that have defined relationships with the filter's data source.
This cross-worksheet filtering capability is crucial for creating interactive dashboards where a single filter control can update multiple relevant views simultaneously.
○ Context Filters: A special type of dimension filter. When a dimension filter is added to context, Tableau creates a temporary table containing only the data that passes the context filter. All other filters (except data source filters) are then applied to the data in this temporary table. Context filters can improve performance in some scenarios, particularly if they significantly reduce the dataset size for subsequent complex filters or LOD calculations.
They appear as grey pills on the Filters shelf.
● Using Parameters for Interactivity and Dynamic Views: Parameters are user-defined values that act as dynamic placeholders in Tableau. Unlike filters that directly subset data, parameters allow users to input a value that can then be incorporated into calculations, filters, reference lines, sets, or actions, making views highly interactive and flexible. They are invaluable for "what-if" analysis. Creating and using parameters typically involves three steps:
1. Create the Parameter:
■ In the Data pane, click the drop-down arrow and select "Create Parameter."
■ In the dialog, give the parameter a name.
■ Specify its Data type (e.g., Integer, Float, String, Boolean, Date, Date & Time).
■ Set a Current value (default).
■ Define Allowable values:
■ All: Allows any user input (simple text field).
■ List: Provides a predefined list of values for the user to select from. Values can be manually entered or populated from a field.
■ Range: Allows selection within a specified minimum, maximum, and step size (for numeric or date types).
The parameter will appear in the Parameters section at the bottom of the Data pane.
2. Incorporate the Parameter into the View: A parameter itself does nothing until it is referenced by an element in the visualization. Common uses include:
■ In Calculated Fields: Replace a constant in a formula with the parameter, e.g., SUM([Sales]) * [Growth Parameter] instead of a hard-coded multiplier. When the user changes the parameter value, the calculation updates.
■ In Filters: Use a parameter to define a Top N filter (e.g., show Top [N Parameter] products) or in a conditional filter expression.
■ In Reference Lines/Bands: Set the value of a reference line, band, or box to a parameter, allowing users to dynamically adjust its position.
■ In Sets: Define set membership based on a parameter.
3. Show the Parameter Control: To allow users to interact with the parameter, right-click the parameter in the Data pane and select "Show Parameter Control" (or "Show Parameter" in newer versions). This adds a card to the worksheet or dashboard (similar to a filter card) where users can input or select values. The appearance of the control (e.g., slider, radio buttons, type-in field) can be customized based on the parameter's data type and allowable values.
Parameter Actions further enhance interactivity by allowing users to change a parameter's value by directly interacting with marks on a visualization (e.g., clicking a bar changes the parameter value, which in turn updates other parts of the view). Parameters can also be made dynamic, refreshing their list of values or current value from a field or calculation when the workbook opens. This is useful for keeping parameter choices up-to-date with the underlying data.
Section 11: Building Interactive Dashboards
Dashboards in Tableau are collections of worksheets, objects, and interactive elements designed to provide a consolidated and often interactive view of data.
● Combining Views, Sheets, and Objects (Layout Containers, Text, Images): A dashboard is created by clicking the "New Dashboard" icon at the bottom of the workbook.
○ Adding Sheets: Worksheets (individual visualizations) are dragged from the "Sheets" list in the Dashboard pane onto the dashboard canvas.
○ Layout Containers: To organize sheets and other objects effectively, Tableau provides Horizontal and Vertical layout containers. These allow grouping of related items and control how the dashboard resizes and objects are arranged.
Without containers, objects can become disorganized, especially if the dashboard size is not fixed. Containers ensure a structured layout. ○ Dashboard Objects: Beyond sheets, various objects can be added to enhance a dashboard : ■ Text: For titles, explanations, annotations, and other contextual information. ■ Image: To add logos, icons, or other visual elements. Images can be linked to URLs. ■ Web Page: To embed live web pages within the dashboard (though some sites may restrict embedding). ■ Blank: To add spacing and control layout. ■ Navigation Buttons: To create buttons that allow users to navigate to other dashboards, sheets, or stories. ■ Download Buttons: To enable users to export the dashboard view as PDF, PowerPoint, or PNG, or selected data as a crosstab (after publishing). ■ Extensions: To add custom functionality through third-party or custom-built extensions. ■ Pulse Metrics: (Tableau Cloud) To embed existing metric cards. Effective dashboard design involves more than just placing charts; it's about creating a cohesive analytical narrative where objects are logically arranged and provide clear context. ● Adding Interactivity: Dashboard Actions (Filter, Highlight, URL)Dashboard actions transform a static collection of charts into an interactive analytical application, guiding users through data exploration or enabling drill-down capabilities. ○ "Use as Filter": A quick way to enable basic interactivity. Selecting this option for a sheet (often via its context menu on the dashboard) allows marks selected in that sheet to filter other sheets on the dashboard that use the same or related data sources. ○ Configurable Dashboard Actions: For more control, actions are created via the Dashboard > Actions menu. Common types include: ■ Filter Actions: Define how selections in a source sheet(s) filter data in target sheet(s). Can be triggered by Hover, Select (click), or Menu. Example: Clicking a state on a map filters a bar chart of city sales and a line chart of sales over time to show data only for that selected state. ■ Highlight Actions: Selections in a source sheet highlight related marks in target sheet(s), drawing attention without filtering out other data. ■ URL Actions: Open a web page when a user interacts with a mark. The URL can be dynamic, incorporating values from the selected data (e.g., linking to a product search page using the selected product name). ■ Go to Sheet Actions: Navigate the user to another sheet, dashboard, or story. ■ Parameter Actions: Allow user interaction with marks on a viz to change the value of a parameter, which can then dynamically update other elements tied to that parameter. ■ Set Actions: Allow user interaction to change the values in a set, enabling sophisticated conditional logic and comparative analysis. ● Dashboard Design Best PracticesCreating effective dashboards involves both art and science, focusing on clarity, performance, and user experience. ○ Purpose and Audience: Design dashboards with a clear purpose and target audience in mind. What key questions should the dashboard answer? What actions should it enable? ○ Limit Views: Avoid overcrowding. A common recommendation is to limit the number of visualizations (views) on a single dashboard to around 3-5 to prevent cognitive overload and improve performance. If more detail is needed, consider breaking the analysis into multiple, linked dashboards. ○ Layout and Sizing: ■ Use layout containers for organization and responsive resizing (if not using fixed size). 
■ Fixed Dashboard Sizing: For better performance (due to caching) and consistent layout across user screens, using a fixed dashboard size (e.g., 1200x800 pixels) is often recommended over "Automatic" or "Range" sizing. ○ Navigation: If using multiple dashboards, provide clear navigation using navigation buttons or actions. If internal navigation is well-implemented, consider publishing without tabs to Tableau Server/Cloud, as this can improve initial load performance by preventing elements from other non-visible dashboards from loading. ○ Clarity and Context: ■ Use informative and descriptive titles for the dashboard and individual charts. ■ Add annotations or text objects to explain key insights, outliers, or how to use the dashboard. ■ Maintain consistency in fonts, colors, and formatting for a professional and cohesive look. ■ Establish a clear visual hierarchy to guide the user's eye to the most important information first. ○ Interactivity: Use actions and filters thoughtfully to enhance exploration without overwhelming the user. Consider adding "Apply" buttons to filters if updates are slow, giving users control over when queries are executed. The most effective dashboards are often those that are focused, tell a clear story, and guide the user towards specific insights rather than presenting a sea of unfiltered data. Section 12: Tableau Performance Optimization and Common Issues Ensuring that Tableau workbooks and dashboards perform efficiently is crucial for user adoption and effective data analysis. Performance can be influenced by data source connections, calculation complexity, and visualization design. ● Optimizing Workbook Performance (Data Sources, Calculations, Rendering)Optimizing Tableau performance requires a holistic approach, addressing potential bottlenecks at various stages: 1. Data Source Optimization: ■ Live vs. Extract: As discussed previously, choose the appropriate connection type. Use extracts (Hyper) for large or slow data sources to leverage Tableau's optimized engine. ■ Filter Data Early: Apply filters at the data source level or as extract filters to reduce the volume of data Tableau processes from the outset. ■ Optimize Joins and Relationships: Ensure joins are efficient (e.g., on indexed fields, correct join types). For relationships, define cardinality and referential integrity settings appropriately to help Tableau generate optimal queries. ■ Aggregate Data: If detailed row-level data isn't necessary for the visualization, consider pre-aggregating data in the database or using Tableau's aggregation capabilities to reduce the number of records processed. 2. Calculation Optimization: ■ Efficiency of Logic: Use efficient calculation logic. For instance, CASE statements are often faster than nested IF or ELSEIF statements for complex conditional logic. ■ Data Types: Numeric calculations are generally faster than string manipulations. Avoid unnecessary data type conversions. Use TODAY() for date-level calculations if time-level precision (from NOW()) isn't required. ■ Minimize Complex Calculations: While LOD expressions are powerful, evaluate if simpler alternatives (like table calculations or basic aggregates) can achieve the same result with better performance. ■ Aggregate Measures: Ensure measures are aggregated in views (Analysis > Aggregate Measures should be checked) unless disaggregated data is explicitly needed, as disaggregation can lead to rendering many rows. 
■ COUNTD Sparingly: Distinct counting (COUNTD) can be one of the slowest aggregation types across many data sources; use it judiciously. 3. Rendering and Visualization Design Optimization: ■ Limit the Number of Marks: Each mark (bar, point, symbol) on a view requires Tableau to perform rendering work. Dashboards with excessive marks (e.g., large text tables, overly dense scatter plots) will be slow. Aggregate data, use density maps for crowded point data, or filter out irrelevant details to reduce mark counts. ■ Fixed Dashboard Size: Use fixed dashboard dimensions rather than "Automatic" or "Range" sizing. This allows Tableau to cache layouts more effectively and improves rendering consistency and speed. ■ Optimize Images: Use appropriately sized and compressed images (e.g., PNGs for transparency, JPGs for photos). Keep image file sizes small (e.g., under 50kb). ■ Efficient Filters: ■ Reduce the number of filters on a dashboard, especially quick filters with high cardinality (many unique values). ■ Avoid overuse of "Show Only Relevant Values" for quick filters, as this can trigger additional queries to update filter options. ■ Filtering on the results of aggregations or complex calculations can be less performant than filtering on raw dimension values. ■ Dashboard Layout: Keep the initial dashboard view simple. Break complex analyses into multiple, focused dashboards rather than one overloaded dashboard. Remove unused worksheets, data sources, and device layouts from the workbook. ■ Client-Side vs. Server-Side Rendering: By default, Tableau attempts to render visualizations in the user's browser (client-side). For very complex visualizations or when users have less powerful machines, forcing server-side rendering (where the server generates images of the vizzes) can sometimes improve perceived performance. This can be influenced using URL parameters like ?:render=true (client-side) or ?:render=false (server-side). ● Understanding Query Performance (Live vs. Extract impact)While Tableau provides the interface for building visualizations, it's typically the underlying database that executes the queries when using a live connection. However, the way Tableau constructs these SQL queries, based on the fields dragged to shelves and the filters applied, significantly impacts performance. ○ Users have reported instances where Tableau-generated queries can be inefficient or may not fully leverage database optimizations (e.g., by unnecessarily casting data types in WHERE clauses, which can suppress index usage, or by ignoring some existing filters when populating quick filter lists, leading to full table scans). ○ When encountering slow query performance with live connections, using Tableau's Performance Recording feature (Help > Settings and Performance > Start Performance Recording) is crucial. This tool captures information about various events, including query execution times, and can help identify which queries are slow and how they are constructed. ○ Extracts (using the Hyper engine) generally result in faster query performance because Hyper is specifically optimized for analytical workloads and the data is stored in a Tableau-friendly format. If a live query is slow, testing the same visualization with an extract can help determine if the bottleneck lies primarily with the database's ability to execute the Tableau-generated query or with other factors. 
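One way to run that comparison outside Tableau Desktop is to query the extract file directly and time the result. The sketch below assumes the tableauhyperapi Python package is installed, uses a hypothetical file name, and assumes the "Extract"."Extract" schema and table naming that Tableau-generated .hyper files commonly use; verify those names for your own extract.

```python
# Rough timing of a query against a local .hyper extract via the
# tableauhyperapi package. The file path is hypothetical, and the
# "Extract"."Extract" schema/table is an assumption to verify.
import time
from tableauhyperapi import Connection, HyperProcess, TableName, Telemetry

HYPER_FILE = "Sales_Extract.hyper"          # hypothetical extract file
table = TableName("Extract", "Extract")     # typical naming in Tableau extracts

with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint, database=HYPER_FILE) as connection:
        start = time.perf_counter()
        row_count = connection.execute_scalar_query(f"SELECT COUNT(*) FROM {table}")
        elapsed = time.perf_counter() - start
        print(f"{row_count} rows counted in {elapsed:.3f}s via the Hyper engine")
```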
○ Diagnosing query issues often involves starting with a simple view and gradually adding complexity (fields, filters) to see when performance degrades. Sometimes, using Custom SQL in Tableau to provide a pre-optimized query, or pre-aggregating data in the database, may be necessary if Tableau's default query generation is problematic for a specific scenario. ● Common Tableau Mistakes and How to Avoid Them (Data Cleaning, Visualization Simplicity, Usability)Beginners, and sometimes even experienced users, can fall into common traps that hinder the effectiveness and performance of their Tableau workbooks. ○ Insufficient Data Preparation: ■ Mistake: Importing data that is unclean, poorly structured, or has inconsistent formatting (e.g., inconsistent date formats, mixed data types in a column, missing values not properly handled, duplicate records). ■ Avoidance: Spend time understanding and cleaning data before or during the initial import into Tableau. Utilize Tableau's Data Interpreter for messy spreadsheets, manually correct data types, rename fields for clarity, split or pivot data as needed, and develop a strategy for handling nulls or missing values. ○ Overly Complex or Cluttered Visualizations: ■ Mistake: Trying to display too much information in a single chart or dashboard; using too many colors, fonts, or visual embellishments that distract from the data; choosing inappropriate chart types for the data or the message. ■ Avoidance: Prioritize clarity and simplicity. Each visualization should have a clear purpose. Limit the number of measures and dimensions in a single chart. Use color and other visual encodings strategically and sparingly. Employ white space effectively to improve readability. Adhere to data visualization best practices (e.g., those inspired by Few or Cairo). ○ Poor Usability and Interactivity Design: ■ Mistake: Creating dashboards that are difficult to navigate, filters that are confusing to use, or interactions that are not intuitive. Not testing dashboards on different screen sizes or devices. ■ Avoidance: Design with the end-user in mind. Provide clear titles, labels, and instructions. Use filters and parameters to allow users to explore data but ensure they are easy to understand and use. Test dashboard usability with target users and on various devices to ensure a good experience. ○ Ignoring Performance Implications: ■ Mistake: Building complex views with live connections to slow databases without considering extracts; using numerous high-cardinality filters set to "relevant values only"; creating calculations that are unnecessarily complex. ■ Avoidance: Be mindful of performance throughout the design process. Choose live vs. extract connections appropriately. Optimize calculations. Limit marks and filters. Test performance regularly, especially for dashboards intended for wide audiences. By being aware of these common pitfalls and proactively applying best practices in data preparation, visualization design, and performance optimization, users can create Tableau workbooks that are not only insightful but also efficient and user-friendly. Conclusion Informatica PowerCenter and Tableau are formidable tools in their respective domains of data integration and data visualization. PowerCenter provides a robust, enterprise-grade platform for complex ETL processes, enabling organizations to consolidate, cleanse, and transform data from disparate sources. 
By being aware of these common pitfalls and proactively applying best practices in data preparation, visualization design, and performance optimization, users can create Tableau workbooks that are not only insightful but also efficient and user-friendly.
Conclusion
Informatica PowerCenter and Tableau are formidable tools in their respective domains of data integration and data visualization. PowerCenter provides a robust, enterprise-grade platform for complex ETL processes, enabling organizations to consolidate, cleanse, and transform data from disparate sources. Its component-based architecture, rich set of transformations, and detailed workflow management capabilities make it suitable for building and maintaining sophisticated data pipelines. Effective use of PowerCenter hinges on a solid understanding of its architecture, careful mapping and session design, and diligent performance tuning.
Tableau, on the other hand, excels at making data accessible and understandable through intuitive and interactive visualizations. Its user-friendly interface allows individuals across various technical skill levels to explore data, discover patterns, and share insights through compelling dashboards. Mastering Tableau involves not only learning its technical features (connecting to data, building various chart types, and using calculated fields, filters, and parameters) but also embracing sound data visualization principles to ensure clarity and impact.
Together, these tools often form part of a comprehensive data analytics stack, where PowerCenter prepares and delivers reliable data and Tableau then enables users to explore and communicate the insights derived from that data. Success with both platforms requires attention to detail, adherence to best practices, and a continuous focus on optimizing for performance and usability to truly unlock the value inherent in an organization's data assets.