BI Lab File
BUSINESS INTELLIGENCE
4. Introduction to ETL
BI has a direct impact on an organization's strategic, tactical and operational business decisions. BI
supports fact-based decision making using historical data rather than assumptions and gut
feeling.
BI tools perform data analysis and create reports, summaries, dashboards, maps, graphs, and
charts to provide users with detailed intelligence about the nature of the business.
Example 1: In an Online Transaction Processing (OLTP) system, individual transaction records
are fed into the product database. Correspondingly, in the BI system, a query that could be
executed would be: how many new clients were added due to the change in the radio budget?
In an OLTP system dealing with customer demographic databases, individual customer records
are fed in. Correspondingly, in the OLAP system, a query that could be executed would be: can
customer profile changes support a higher product price?
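The OLTP-versus-OLAP contrast above can be sketched with a tiny in-memory database. The table, column and channel names below are invented purely for illustration:

```python
import sqlite3

# Hypothetical "sales" table standing in for the OLTP product database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (client TEXT, channel TEXT, amount REAL)")

# OLTP workload: many small writes, one row per transaction.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("alice", "radio", 120.0), ("bob", "radio", 80.0), ("carol", "web", 200.0)],
)

# OLAP/BI workload: an aggregate question over the accumulated history,
# e.g. "how many distinct clients came in through the radio campaign?"
row = conn.execute(
    "SELECT COUNT(DISTINCT client) FROM sales WHERE channel = 'radio'"
).fetchone()
print(row[0])  # 2
```

The OLTP side optimizes for fast individual inserts; the BI side asks summary questions across the whole history.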
Business Intelligence Tools
Business intelligence (BI) tools are types of application software which collect and process
large amounts of unstructured data from internal and external systems, including books,
journals, documents, health records, images, files, email, video and other business sources.
While not as flexible as business analytics tools, BI tools provide a way of amassing data to
find information primarily through queries. These tools also help prepare data for analysis so
that you can create reports, dashboards and data visualisations. The results give both
employees and managers the power to accelerate and improve decision making, increase
operational efficiency, pinpoint new revenue potentials, identify market trends, report
genuine KPIs and identify new business opportunities.
1) SAP Business Intelligence
SAP Business Intelligence offers several advanced analytics solutions, including real-time BI,
predictive analytics, machine learning, and planning & analysis. The Business Intelligence
platform in particular offers reporting & analysis, data visualisation & analytics applications,
office integration and mobile analytics. SAP is robust software intended for all roles (IT,
end users and management) and offers tons of functionalities in one platform.
2) MicroStrategy
MicroStrategy is a business intelligence tool that offers powerful (and high-speed)
dashboarding and data analytics which help monitor trends, recognise new opportunities,
improve productivity and more. Users can connect to one or more sources, whether the
incoming data is from a spreadsheet, cloud-based or enterprise data software. It can be
accessed from your desktop or via mobile.
3) Microsoft Power BI
Microsoft Power BI is a web-based business analytics tool suite which excels in data
visualisation. It allows users to identify trends in real-time and has brand new connectors that
allow you to up your game in campaigns. Because it’s web-based, Microsoft Power BI can be
accessed from pretty much anywhere. This software also allows users to integrate their apps
and deliver reports and real-time dashboards.
Practical 2 :- Explain Data Warehousing
Data warehousing is the process of constructing and using a data warehouse. A data warehouse
is constructed by integrating data from multiple heterogeneous sources that support analytical
reporting, structured and/or ad hoc queries, and decision making. Data warehousing involves
data cleaning, data integration, and data consolidation.
There are decision support technologies that help utilize the data available in a data warehouse.
These technologies help executives to use the warehouse quickly and effectively. They can
gather data, analyze it, and take decisions based on the information present in the warehouse.
Heterogeneous data sources can be integrated into a warehouse using either of the following approaches −
Query-driven Approach
Update-driven Approach
1) Query-Driven Approach
This is the traditional approach to integrate heterogeneous databases. This approach was used to
build wrappers and integrators on top of multiple heterogeneous databases. These integrators
are also known as mediators.
When a query is issued on the client side, a metadata dictionary translates the query into an
appropriate form for the individual heterogeneous sites involved.
Now these queries are mapped and sent to the local query processor.
The results from heterogeneous sites are integrated into a global answer set.
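As a minimal sketch of the mediator idea, the wrapper functions below are hypothetical stand-ins for two heterogeneous sites; the mediator translates one global question into each site's local form and integrates the answers:

```python
site_a = {"delhi": 10, "mumbai": 7}         # e.g. a relational database
site_b = [("pune", 4), ("delhi", 3)]        # e.g. a flat-file system

def wrap_site_a(city):
    # Translate the global query into site A's keyed lookup form.
    return site_a.get(city, 0)

def wrap_site_b(city):
    # Translate the same query into site B's record-scan form.
    return sum(n for c, n in site_b if c == city)

def mediator(city):
    # Integrate the local answers into a global answer set.
    return wrap_site_a(city) + wrap_site_b(city)

print(mediator("delhi"))  # 13
```

Every query pays this translate-and-integrate cost at run time, which is why the approach becomes expensive for frequent or aggregating queries.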
Disadvantages
This approach is inefficient and very expensive for frequent queries, especially queries that require aggregations.
2) Update-Driven Approach
This is an alternative to the traditional approach. Today's data warehouse systems follow
update-driven approach rather than the traditional approach discussed earlier. In update-driven
approach, the information from multiple heterogeneous sources are integrated in advance and
are stored in a warehouse. This information is available for direct querying and analysis.
Advantages
This approach provides high performance, since the data are integrated, summarized and
restructured in advance, so queries do not need to interface with the original sources.
The following are the functions of data warehouse tools and utilities −
Data Transformation − Involves converting the data from legacy format to warehouse
format.
Data integration involves combining data from several disparate sources, which are stored using
various technologies and provide a unified view of the data. Data integration becomes
increasingly important in cases of merging systems of two companies or consolidating
applications within one company to provide a unified view of the company's data assets. The
later initiative is often called a data warehouse.
Probably the most well known implementation of data integration is building an enterprise's data
warehouse. A data warehouse enables a business to perform analyses based on the data it holds,
which would not be possible on the data available only in the source systems. The reason is that
the source systems may not contain corresponding data, and even where data fields are
identically named, they may refer to different entities.
Common data integration use cases include:
Data warehousing
Data migration
Enterprise application/information integration
Master data management
Challenges of Data Integration
At first glance, the biggest challenge is the technical implementation of integrating data from
disparate often incompatible sources. However, a much bigger challenge lies in the entirety
of data integration. It has to include the following phases:
1. Design
The data integration initiative within a company must be an initiative of business, not
IT. There should be a champion who understands the data assets of the enterprise and
will be able to lead the discussion about the long-term data integration initiative in
order to make it consistent, successful and beneficial.
Analysis of the requirements (BRS), i.e. why is the data integration being done, what
are the objectives and deliverables. From what systems will the data be sourced? Is all
the data available to fulfill the requirements? What are the business rules? What is the
support model and SLA?
Analysis of the source systems, i.e. what are the options of extracting the data from
the systems (update notification, incremental extracts, full extracts), what is the
required/available frequency of the extracts? What is the quality of the data? Are the
required data fields populated properly and consistently? Is the documentation
available? What are the data volumes being processed? Who is the system owner?
What is the support model for the new system? What are the SLA requirements?
And last but not least, who will be the owner of the system and what is the funding of
the maintenance and upgrade expenses?
The results of the above steps need to be documented in the form of an SRS document,
confirmed and signed off by all parties that will participate in the data integration project.
2. Implementation
Based on the BRS and SRS, a feasibility study should be performed to select the tools to
implement the data integration system. Small companies and enterprises which are starting with
data warehousing are faced with making a decision about the set of tools they will need to
implement the solution. Larger enterprises, or those which have already started other data
integration projects, are in an easier position, as they already have experience and can
extend the existing system and exploit the existing knowledge to implement the system more
effectively. There are cases, however, when using a new, better suited platform or technology
makes a system more effective compared to staying with existing company standards. For
example, finding a more suitable tool which provides better scaling for future growth/expansion,
a solution that lowers the implementation/support cost, lowering the license costs, migrating the
system to a new/modern platform, etc.
3. Testing
Along with the implementation, the proper testing is a must to ensure that the unified data are
correct, complete and up-to-date.
Both technical IT and business teams need to participate in the testing to ensure that the results
are as expected/required. Therefore, the testing should incorporate at least a Performance Stress
Test (PST), Technical Acceptance Testing (TAT) and User Acceptance Testing (UAT).
ETL stands for Extract, Transform and Load. An ETL tool extracts the data from different
RDBMS source systems, transforms the data by applying calculations, concatenations, etc., and
then loads the data into the Data Warehouse system. The data is loaded into the DW system in
the form of dimension and fact tables.
Extraction
A staging area is required during ETL load. There are various reasons why staging area is required.
The source systems are only available for specific period of time to extract data. This period of time is
less than the total data-load time. Therefore, staging area allows you to extract the data from the
source system and keeps it in the staging area before the time slot ends.
Staging area is required when you want to get the data from multiple data sources together or if you
want to join two or more systems together. For example, you will not be able to perform a SQL query
joining two tables from two physically different databases.
Data extraction time slots for different systems vary as per the time zone and operational hours.
Data extracted from source systems can be used in multiple data warehouse systems, Operational
Data Stores, etc.
ETL allows you to perform complex transformations and requires extra area to store the data.
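The point about joining physically separate sources can be sketched as follows; the two lists below are hypothetical stand-ins for extracts from two different databases landed in one staging area:

```python
# Extracts from two physically separate source systems (illustrative data).
crm_rows = [(1, "alice"), (2, "bob")]               # source system 1: customers
billing_rows = [(1, 250.0), (2, 90.0), (3, 40.0)]   # source system 2: invoices

# Land both extracts in a single staging store.
staging = {"customers": list(crm_rows), "invoices": list(billing_rows)}

# A join across the two sources, impossible as one SQL query against the
# original databases, is straightforward once both sit in the staging area.
joined = [
    (name, amount)
    for cid, name in staging["customers"]
    for iid, amount in staging["invoices"]
    if cid == iid
]
print(joined)  # [('alice', 250.0), ('bob', 90.0)]
```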
Transform
In data transformation, you apply a set of functions on the extracted data to load it into the target
system. Data which does not require any transformation is known as direct move or pass-through
data.
You can apply different transformations on data extracted from the source system. For example,
you can perform customized calculations. If you want the sum of sales revenue and this is not in
the database, you can apply a SUM formula during transformation and load the result.
For example, if you have the first name and the last name in a table in different columns, you
can use concatenate before loading.
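The two transformations described above can be sketched as follows; the field names are illustrative, not from any real source system:

```python
# Rows as extracted from the source system (made-up data).
extracted = [
    {"first_name": "Asha", "last_name": "Patel", "sales": 120.0},
    {"first_name": "Ravi", "last_name": "Nair", "sales": 80.0},
]

# Customized calculation: the warehouse wants total sales revenue,
# which the source database does not store directly.
total_sales = sum(row["sales"] for row in extracted)

# Concatenation: merge the two name columns into one before loading.
transformed = [
    {"full_name": r["first_name"] + " " + r["last_name"], "sales": r["sales"]}
    for r in extracted
]
print(total_sales)                   # 200.0
print(transformed[0]["full_name"])   # Asha Patel
```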
Load
During the Load phase, data is loaded into the end-target system, which can be a flat file or a
Data Warehouse system.
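As a sketch, the load step might write transformed rows to either kind of target; the table and column names are illustrative:

```python
import csv
import io
import sqlite3

# Transformed rows ready for loading (made-up data).
transformed = [("Asha Patel", 120.0), ("Ravi Nair", 80.0)]

# Target 1: a flat file (here an in-memory CSV buffer).
buf = io.StringIO()
csv.writer(buf).writerows(transformed)

# Target 2: a Data Warehouse table.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales_fact (full_name TEXT, revenue REAL)")
dw.executemany("INSERT INTO sales_fact VALUES (?, ?)", transformed)
dw.commit()
print(dw.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0])  # 2
```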
Practical 5:- Introduction to Dimension Modeling
For instance, in the relational model, normalization and ER models reduce redundancy
in data. On the contrary, a dimensional model arranges data in such a way that it is
easier to retrieve information and generate reports.
Hence, dimensional models are used in data warehouse systems and are not a good fit
for relational systems.
Elements of Dimensional Data Model
Fact
Facts are the measurements/metrics from your business process. For a Sales
business process, a measurement would be the quarterly sales number.
Dimension
Dimensions provide the context surrounding a business process event. In simple terms,
they give the who, what and where of a fact.
Attributes
Attributes are the various characteristics of a dimension. For example, in a location
dimension, the attributes can be
State
Country
Zipcode etc.
Attributes are used to search, filter, or classify facts. Dimension tables contain
attributes.
Fact Table
A fact table is the primary table in a dimensional model. A fact table contains
1. Measurements/facts
2. Foreign keys to dimension tables
Dimension Table
A dimension table contains the dimensions of a fact and is joined to the fact table via a
foreign key.
The accuracy in creating your Dimensional modeling determines the success of your
data warehouse implementation. Here are the steps to create a Dimension Model.
Step 1) Identify the business process
Identify the actual business process a data warehouse should cover. This could be Marketing,
Sales, HR, etc. as per the data analysis needs of the organization. The selection of the Business
process also depends on the quality of data available for that process. It is the most important
step of the Data Modelling process, and a failure here would have cascading and irreparable
defects.
To describe the business process, you can use plain text or use basic Business Process Modelling
Notation (BPMN) or Unified Modelling Language (UML).
Step 2) Identify the grain
The Grain describes the level of detail for the business problem/solution. It is the process of
identifying the lowest level of information for any table in your data warehouse. If a table
contains sales data for every day, then it should be daily granularity. If a table contains total sales
data for each month, then it has monthly granularity.
1. Do we need to store all the available products or just a few types of products? This
decision is based on the business processes selected for Datawarehouse
2. Do we store the product sale information on a monthly, weekly, daily or hourly basis?
This decision depends on the nature of reports requested by executives
3. How do the above two choices affect the database size?
Example of Grain:
The CEO at an MNC wants to find the sales for specific products in different locations on a daily
basis.
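The effect of the grain choice can be sketched with made-up figures: the same raw sales can be stored at daily grain or rolled up to monthly grain, and the row count changes accordingly:

```python
from collections import defaultdict

# Raw sales events (illustrative dates and amounts).
raw = [("2024-01-01", 10), ("2024-01-02", 15), ("2024-02-01", 20)]

# Daily grain: one row per day.
daily = dict(raw)

# Monthly grain: one row per month, amounts summed up.
monthly = defaultdict(int)
for day, amount in raw:
    monthly[day[:7]] += amount  # "YYYY-MM" is the month key

print(len(daily), dict(monthly))  # 3 {'2024-01': 25, '2024-02': 20}
```

The finer the grain, the larger the fact table, which is exactly the trade-off raised in question 3 above.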
Step 3) Identify the dimensions
Dimensions are nouns like date, store, inventory, etc. These dimensions are where all the data
should be stored. For example, the date dimension may contain data like a year, month and
weekday.
Example of Dimensions:
The CEO at an MNC wants to find the sales for specific products in different locations on a daily
basis.
Step 4) Identify the facts
This step is co-associated with the business users of the system because this is where they get
access to data stored in the data warehouse. Most of the fact table rows are numerical values like
price or cost per unit, etc.
Example of Facts:
The CEO at an MNC wants to find the sales for specific products in different locations on a daily
basis.
Step 5) Build the schema
In this step, you implement the Dimension Model. A schema is nothing but the database structure
(arrangement of tables). There are two popular schemas
1. Star Schema
The star schema architecture is easy to design. It is called a star schema because the diagram
resembles a star, with points radiating from a center. The center of the star consists of the fact
table, and the points of the star are dimension tables.
The fact table in a star schema is in third normal form, whereas the dimension tables are de-
normalized.
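A minimal star schema can be sketched in SQLite: one fact table whose foreign keys point at de-normalized dimension tables. All table and column names here are illustrative:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_date  (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE fact_sales (date_id INTEGER, store_id INTEGER, revenue REAL);
""")
db.execute("INSERT INTO dim_date VALUES (1, '2024-01-05', '2024-01')")
db.execute("INSERT INTO dim_store VALUES (1, 'Pune', 'MH')")
db.execute("INSERT INTO fact_sales VALUES (1, 1, 500.0)")

# A typical star query: join the fact table to a dimension and aggregate.
total = db.execute("""
    SELECT s.city, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_store s ON s.store_id = f.store_id
    GROUP BY s.city
""").fetchone()
print(total)  # ('Pune', 500.0)
```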
2. Snowflake Schema
The snowflake schema is an extension of the star schema. In a snowflake schema, each
dimension is normalized and connected to additional dimension tables.
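The normalization can be sketched by splitting the store dimension of a star schema into two linked tables; names are again illustrative:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_city  (city_id INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city_id INTEGER);
CREATE TABLE fact_sales (store_id INTEGER, revenue REAL);
""")
db.execute("INSERT INTO dim_city VALUES (1, 'Pune', 'MH')")
db.execute("INSERT INTO dim_store VALUES (1, 1)")
db.execute("INSERT INTO fact_sales VALUES (1, 700.0)")

# Queries now need one extra join hop through the normalized dimension.
row = db.execute("""
    SELECT c.city, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_store s ON s.store_id = f.store_id
    JOIN dim_city  c ON c.city_id  = s.city_id
    GROUP BY c.city
""").fetchone()
print(row)  # ('Pune', 700.0)
```

The extra joins save storage and avoid redundancy in the dimensions, at the cost of slightly more complex queries than the star schema.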
Practical 6:- Basics of Enterprise Reporting
Business reports are tangible documents that provide information organized into tabular, graphic
or narrative form. Reporting is a feature of business intelligence tools that presents data in a
compressed, organized way, making complex information easy to digest and understand. These
can be anything as simple as an organized table of numbers to a complex, interactive
visualization like this one from Visual Capitalist.
Visualization of data is crucial to our interpretation and understanding of it. And that’s not just
because we aesthetically like pretty graphs and flashy charts — we process visual information up
to 60,000 times faster than text.
While it might take a trained data analyst to notice trends in a huge table of numbers, even your
newest intern can recognize patterns in a scattergraph of those same numbers. These
visualizations make it easier for users to draw actionable insights from their proprietary data.
What do users do when they have their data organized into easily-decipherable reports? They can
analyze the data to identify patterns and trends. These patterns can lead to actionable insights —
basically, patterns that have concrete meaning for the user’s organization.
For example, let’s imagine a hypothetical sales organization. They generate an explanatory report
that shows their sales data for the year across the United States in the form of a chart organized
by region. The sales manager can compare this visualization to last year’s numbers, either
manually or through business intelligence software.
The manager notices that the data shows their sales have fallen by 13% in the Northeast since the
previous year. They can use that information to make decisions — for example, if a new
competitor is likely the cause for the downturn, they can focus their energies on ways to improve
their product, change their marketing tactics, focus on customer retention or other business
practices to regain their position. They will make these decisions based on data and with firm
numbers (potential ROI, revenue lost in the region, cost of the corrective action versus cost of
doing nothing, for example) to guide their choices.
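The year-over-year comparison the manager performs can be sketched as a percentage-change calculation; the regional figures below are made up so that the Northeast shows the 13% fall described above:

```python
# Hypothetical regional sales figures for two consecutive years.
last_year = {"Northeast": 200_000, "Midwest": 150_000}
this_year = {"Northeast": 174_000, "Midwest": 165_000}

# Percentage change per region, rounded to one decimal place.
change = {
    region: round(
        (this_year[region] - last_year[region]) / last_year[region] * 100, 1
    )
    for region in last_year
}
print(change)  # {'Northeast': -13.0, 'Midwest': 10.0}
```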
For example, if a sales organization discovers it’s making good sales in the Midwest, it can direct
more resources towards that demographic of buyers to further improve sales metrics. Enterprise
reporting aims to empower business professionals to make more informed decisions based on
historical data, present analysis and future predictions.