Unit 14 Assignment 2 Frontsheet

This document is an assignment front sheet for a Business Intelligence course. It includes sections for student and assessor information as well as a grading grid. The assignment requires students to define business intelligence, provide real-world examples, and design a BI solution to support data-driven

ASSIGNMENT 2 FRONT SHEET

Qualification BTEC Level 5 HND Diploma in Computing

Unit number and title Unit 14: Business Intelligence

Submission date Date Received 1st submission

Re-submission Date Date Received 2nd submission

Student Name Ngo Thi Khanh Chi Student ID BH00182

Class IT0503 Assessor name Dinh Van Dong

Student declaration

I certify that the assignment submission is entirely my own work and I fully understand the consequences of plagiarism. I understand that
making a false declaration is a form of malpractice.

Student’s signature Chi

Grading grid

P3 P4 M3 D3

❒ Summative Feedback: ❒ Resubmission Feedback:

Grade: Assessor Signature: Date:

IV Signature:

Table of Contents
A. INTRODUCTION
B. DETERMINE, WITH EXAMPLE, WHAT BUSINESS INTELLIGENCE IS AND THE TOOLS AND TECHNIQUES ASSOCIATED WITH IT (P3)
1. Business Intelligence (BI)
1.1. Benefits of BI
1.2. Some tools
2. Examples of Business Intelligence System used in Practice
2.1. Example 1 - BMW
2.2. Example 2 – Zara
2.3. Example 3 - Nike
2.4. Example 4 - Starbucks
2.5. Example 5 - FedEx
3. Tools and techniques associated with Business Intelligence
3.1. Tools associated with Business Intelligence
a. Tableau
b. Advantages of Tableau
c. Disadvantages of Tableau
d. icCube
e. Google Chart
f. Advantages of Google Chart
g. Disadvantages of Google Chart
h. Predictive analytics tool
IBM SPSS Modeler
RapidMiner
Microsoft Azure Machine Learning Studio
Python (with libraries like scikit-learn and TensorFlow)
R Language
i. Multi – Cloud strategy
3.2. Techniques associated with Business Intelligence
a. OLAP (Online Analytical Processing)
b. Data Mining
c. Sisense
d. Microsoft BI platform
e. SAP Business Intelligence
f. Self-Service reporting
g. Data visualization
C. DESIGN A BUSINESS INTELLIGENCE TOOL, APPLICATION OR INTERFACE THAT CAN PERFORM A SPECIFIC TASK TO SUPPORT PROBLEM-SOLVING OR DECISION-MAKING AT AN ADVANCED LEVEL
1. Data processing
1.1. Definition
1.2. Type of Data processing
Batch Processing
Real-time Processing
Online Transaction Processing (OLTP)
Online Analytical Processing (OLAP)
Data Mining
1.3. Advantages of Data processing
1.4. Disadvantages of Data processing
2. Show and explain data
3. Python
3.1. Features and benefits of Python
Features
Benefits
3.2. Use BI
4. Tableau
5. Google form
D. CONCLUSION
E. REFERENCES

List of Figures
Figure 1: Business Intelligence
Figure 2: Benefits of BI
Figure 3: Tableau
Figure 4: Power BI
Figure 5: QlikView
Figure 6: Domo
Figure 7: Looker
Figure 8: BMW
Figure 9: Zara
Figure 10: Nike
Figure 11: Starbucks
Figure 12: FedEx
Figure 13: Tableau
Figure 14: icCube
Figure 15: Google Chart
Figure 16: IBM SPSS Modeler
Figure 17: RapidMiner
Figure 18: Microsoft Azure Machine Learning Studio
Figure 19: Python
Figure 20: R Language
Figure 21: Multi - Cloud strategy
Figure 22: OLAP
Figure 23: RapidMiner
Figure 24: Knime
Figure 25: Weka
Figure 26: Apache Mahout
Figure 27: Orange
Figure 28: Sisense
Figure 29: Microsoft BI platform
Figure 30: SAP BI
Figure 31: Self - Serve reporting
Figure 32: Data visualization
Figure 33: Importing Required Libraries
Figure 34: Conditional Execution and User Menu
Figure 35: User Input and Data Cleaning Operations
Figure 36: Option 1: Choose File to Clean data
Figure 37: Option 2: Clean Duplicate Rows
Figure 38: Option 3: Exit
Figure 39: Option 4: Clear Empty Data
Figure 40: Option 5: Drop Column

A. INTRODUCTION
As mentioned in the first report, our company has been working in an FPT store for two years. FPT Shop is an
online sales platform. For a new, fledgling company, competition in the market is intense. Therefore,
the Board of Directors decided to apply Business Intelligence to improve the company's business processes
by making better decisions.

In the first report, management tasked us with researching the business and decision support processes in the
company and identifying the data types involved (unstructured, semi-structured or structured). We completed
these tasks, researched the software currently used in business or decision support processes, and
evaluated its use (including benefits and limitations). In addition, we explored and provided
information on the types of decision support at different levels (operational, tactical and strategic)
within the company. Next, we researched which business intelligence features could help with that kind of
support and which information systems or BI technologies could be used in this case. We compared and
contrasted them and concluded which one to use.

In continuation of the first report, this second report demonstrates to the Board of Directors how
business intelligence can be applied to the company's current business processes. The report includes the
following sections: an explanation of the general concept of BI; an introduction to some BI tools and
techniques and their general applications; a dataset extracted from the company's business process, with an
explanation of the dataset; a description of how we pre-process the data for later analysis, explaining each
step and its purpose; and dashboards that present our analysis of the pre-processed data, with a clear
explanation of the purpose of each dashboard and chart. Suggestions are made after the analysis. In addition,
during the demonstration, we will also collect feedback and comments from users to review how well our
dashboard design meets user or business requirements and what customization is needed for future use.

B. DETERMINE, WITH EXAMPLE, WHAT BUSINESS INTELLIGENCE IS AND THE
TOOLS AND TECHNIQUES ASSOCIATED WITH IT (P3)
1. Business Intelligence (BI)
Business intelligence (BI) is a technology-driven process for analyzing data and delivering actionable
information that helps executives, managers and workers make informed business decisions. As part
of the BI process, organizations collect data from internal IT systems and external sources, prepare it
for analysis, run queries against the data and create data visualizations, BI dashboards and reports to
make the analytics results available to business users for operational decision-making and strategic
planning.

The ultimate goal of BI initiatives is to drive better business decisions that enable organizations to
increase revenue, improve operational efficiency and gain competitive advantages over business
rivals. To achieve that goal, BI incorporates a combination of analytics, data management and
reporting tools, plus various methodologies for managing and analyzing data.

Figure 1: Business Intelligence

1.1. Benefits of BI

Figure 2: Benefits of BI

The Benefits of Business Intelligence:


- Fast and accurate reporting. Business reporting is time-consuming and labor-intensive, so BI tools that
speed it up have a direct impact on the business. Employees can monitor KPIs using templates or custom
reports, which can draw on a variety of data sources, including financial and operational data.
- Significant business insights. Combining business data with its analysis allows organizations to make
better business decisions. These decisions are supported by improved business processes, which yield
valuable business insights and information that are fundamental to the company.
- Competitive analysis. Business Intelligence comprises the tools, software and systems that a business
uses to analyze its data, and these are essential to an organization's strategic planning process. BI also
supports competitive intelligence, which means analyzing a company's industry and its competition in
order to make strategic business decisions that help differentiate the company from other market
players.
- Better BI data quality. Data analysis is the foundation of BI, so the organization not only collects data
but also analyzes it. In the BI context, data quality is a measure of the accuracy and reliability of an
organization's data, as well as the completeness and usability of the analyzed data. One of the benefits
of Business Intelligence is therefore that it improves the quality of the data.

- Higher Margins. Thanks to BI, businesses can lower their production costs and identify opportunities
in demand that offer them higher margins, thus improving their ROI significantly.
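
The data quality benefit above refers to completeness and reliability, and both can be given a simple numeric
form. The following is a minimal Python sketch, not taken from the report: the `orders` records, the field
names and the two functions are hypothetical and serve only to illustrate how such measurements could be
computed.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def completeness(rows: List[Row], column: str) -> float:
    """Return the share of rows whose value in `column` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def duplicate_rate(rows: List[Row]) -> float:
    """Return the share of rows that exactly repeat an earlier row."""
    if not rows:
        return 0.0
    seen = set()
    duplicates = 0
    for row in rows:
        # A row is identified by its full set of (column, value) pairs.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(rows)


# Hypothetical sample data: one empty customer, one missing amount, one repeated row.
orders: List[Row] = [
    {"order_id": "1", "customer": "An", "amount": "120"},
    {"order_id": "2", "customer": "", "amount": "80"},
    {"order_id": "3", "customer": "Binh", "amount": None},
    {"order_id": "1", "customer": "An", "amount": "120"},
]

if __name__ == "__main__":
    print("customer completeness:", completeness(orders, "customer"))  # 0.75
    print("duplicate rate:", duplicate_rate(orders))                   # 0.25
```

Tracking figures like these over time is one way an organization could check whether its data quality is
actually improving.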
1.2. Some tools
Tableau: Tableau is a popular BI platform that allows users to create interactive and shareable dashboards
and reports. It can connect to various data sources, perform data blending, and offer data visualization
tools for better insights. Tableau is used in many sectors, including finance, healthcare, retail, and
marketing, to visualize and analyze data effectively.

Figure 3: Tableau
Power BI: Power BI is a Microsoft product that offers robust data visualization and analytics capabilities.
It integrates seamlessly with other Microsoft tools like Excel, SharePoint, and SQL Server. Power BI
allows users to create interactive dashboards and reports, share insights, and collaborate with team
members in real time.

Figure 4: Power BI

QlikView: QlikView is a self-service BI platform that enables users to explore and analyze data from
various sources quickly. It utilizes in-memory data processing to provide rapid data retrieval and supports
associative data models, allowing users to drill down and explore data relationships easily. QlikView is
commonly used in industries such as manufacturing, logistics, and supply chain management.

Figure 5: QlikView

Domo: Domo is a cloud-based BI platform that offers a range of data visualization, collaboration, and data
integration features. It allows users to connect to multiple data sources, create real-time dashboards, and
share insights across the organization. Domo is used in various industries, including e-commerce,
technology, and finance.

Figure 6: Domo

Looker: Looker is a data platform that provides a modern approach to BI and data analytics. It offers
data exploration and modeling capabilities, allowing users to create and share interactive reports and
dashboards. Looker is commonly used by companies for marketing analytics, sales performance
tracking, and product analytics.

Figure 7: Looker

2. Examples of Business Intelligence System used in Practice
2.1. Example 1 - BMW

Figure 8: BMW

Business Intelligence (BI) plays a vital role in BMW's business, helping them make smart decisions,
optimize production operations, improve customer service, and better understand markets and
consumers.

BMW uses BI to track the performance of different makes and models of cars. This data helps them
evaluate their models and make the adjustments needed to improve product quality and features.
Using data from different sources and BI analysis techniques, BMW is able to
predict market demand for specific product lines. This helps them adjust production and
inventory levels efficiently.

2.2. Example 2 – Zara

Figure 9: Zara

Business Intelligence (BI) plays a vital role in the business of the Zara brand, helping them make
strategic decisions and develop marketing campaigns, improve inventory management, and gain
insight into customer preferences.

Zara uses BI to analyze data from various sources, including social media, search trends, and sales
data. Based on this information, they can predict market trends, learn about popular clothing and
styles, and make product design decisions. In addition, Zara uses BI to analyze data about customers'
shopping behavior and preferences. This helps them better understand their customers, thereby
providing more relevant products and services, and creating a better shopping experience for them.

2.3. Example 3 - Nike

Figure 10: Nike

Nike, one of the leading sportswear and footwear brands, uses BI to track sales and customer
feedback. Using data from various sources and BI analysis techniques, Nike is able to predict market
demand for specific product lines. This helps them to efficiently regulate production and distribution.

BI data helps Nike better understand market trends and make distribution and marketing decisions,
optimize manufacturing operations, deliver tailored products and services to customers, and maintain
a strong competitive position in the sports and fashion industries.

2.4. Example 4 - Starbucks

Figure 11: Starbucks

Starbucks, an international coffee chain, uses BI to analyze data from stores and customers. This data
includes information about sales, shopping habits and feedback from customers. In addition,
Starbucks uses BI to track the effectiveness of its loyalty program and evaluate the impact of
promotions and offers. BI data helps them optimize their program to attract and retain loyal
customers.

Through BI, Starbucks can optimize workflow, predict consumption trends, and develop effective
marketing strategies.

2.5. Example 5 - FedEx

Figure 12: FedEx

FedEx is one of the largest shipping companies in the world. FedEx uses BI to track the route and
status of packages. BI data helps them make smart decisions about shipping routes, ensure on-time
delivery, and minimize shipping time and costs. In addition, FedEx also uses BI to track and analyze
costs related to shipping operations, from fuel costs to staff costs. This data helps them optimize costs
and enhance profits.

Using Business Intelligence in its business operations, FedEx is able to optimize shipping operations,
improve service quality, and maintain its leading position in the shipping and freight forwarding
industry.

3. Tools and techniques associated with Business Intelligence
3.1. Tools associated with Business Intelligence
a. Tableau

Figure 13: Tableau

Tableau is a powerful data visualization and business intelligence tool developed by Tableau
Software, now a part of Salesforce. It allows users to connect to various data sources, create
interactive dashboards, and generate visualizations to gain insights and communicate data effectively.
Tableau is widely regarded as one of the fastest-growing and most powerful tools for data analysis and
visualization in the Business Intelligence industry. Simply put, Tableau transforms raw tabular data
into easy-to-follow, digestible formats such as images and charts.

Here are some key features of Tableau:


Data Connection: Tableau can connect to a wide range of data sources, including databases,
spreadsheets, cloud services, and more. It supports both structured and unstructured data.

Data Preparation: Tableau provides data preparation capabilities, allowing you to clean, transform,
and shape your data for analysis. This includes data filtering, aggregation, pivoting, and merging.

Visualization: One of Tableau's primary strengths is its ability to create interactive and visually
appealing visualizations. It offers a drag-and-drop interface, making it easy to build charts, graphs,
maps, and other visual elements.

Dashboards and Stories: Tableau enables you to combine multiple visualizations into interactive
dashboards. Dashboards provide a consolidated view of key metrics and allow users to explore data
dynamically. Stories allow you to create a narrative by sequencing visualizations, guiding the
audience through a data-driven story.

Calculations and Expressions: Tableau allows you to create calculated fields and custom expressions
to perform complex calculations, aggregations, and transformations on your data. These calculations
can be used to create new variables, apply business logic, or perform statistical calculations.

Parameters: Parameters in Tableau are dynamic values that allow users to interact with the
visualizations. They can be used to change filters, dimensions, and measures dynamically, providing a
more interactive experience.

Integration and Extensibility: Tableau integrates with various data sources, including popular
databases, cloud platforms, and big data systems. It also offers APIs and SDKs, allowing developers
to extend Tableau's functionality or embed visualizations within other applications.
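
The data preparation, calculated field and parameter features described above can be illustrated outside
Tableau with a minimal Python sketch. This is not Tableau's API: the `sales` records, the field names and
the function names are all hypothetical and are used only to show the underlying ideas of a derived field, a
user-controlled filter value, and a pivot-style aggregation.

```python
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, object]

# Hypothetical sales records; values are illustrative only.
sales: List[Record] = [
    {"region": "North", "quarter": "Q1", "revenue": 100.0, "cost": 60.0},
    {"region": "North", "quarter": "Q2", "revenue": 150.0, "cost": 90.0},
    {"region": "South", "quarter": "Q1", "revenue": 80.0, "cost": 50.0},
    {"region": "South", "quarter": "Q2", "revenue": 120.0, "cost": 90.0},
    {"region": "South", "quarter": "Q2", "revenue": 30.0, "cost": 10.0},
]


def add_profit_margin(rows: List[Record]) -> List[Record]:
    """Calculated field: margin = (revenue - cost) / revenue, added to a copy of each row."""
    return [
        {**row, "margin": (row["revenue"] - row["cost"]) / row["revenue"]}
        for row in rows
    ]


def filter_min_revenue(rows: List[Record], min_revenue: float) -> List[Record]:
    """Parameter-style filter: keep rows at or above a threshold chosen by the user."""
    return [row for row in rows if row["revenue"] >= min_revenue]


def pivot_revenue(rows: List[Record]) -> Dict[str, Dict[str, float]]:
    """Aggregate revenue with regions as rows and quarters as columns."""
    table: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row["region"]][row["quarter"]] += row["revenue"]
    return {region: dict(quarters) for region, quarters in table.items()}


if __name__ == "__main__":
    # Changing the threshold plays the role of moving a parameter control.
    for threshold in (0.0, 100.0):
        print(threshold, pivot_revenue(filter_min_revenue(sales, threshold)))
    print(add_profit_margin(sales)[0])
```

In a BI tool these three steps are configured through the interface rather than written by hand, but the
result is the same kind of filtered, aggregated table that a chart is then drawn from.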

Tableau has gained popularity for its user-friendly interface, powerful data visualization capabilities,
and the ability to handle large and diverse datasets. It is widely used in various industries, including
finance, marketing, healthcare, and education, to explore data, identify patterns, and communicate
insights effectively.

b. Advantages of Tableau
Here are some advantages of Tableau:
- High performance: Users rate Tableau's overall performance as strong and secure. It can handle
millions of rows of data with ease, and many different types of visualization can be created at once.
- Mobile-friendly: A capable mobile app is available for iOS and Android, which adds mobility for
Tableau users and keeps statistics at their fingertips. The app supports practically everything that
the desktop and online versions offer.
- Extensive customer resources: The Tableau community is engaged and enthusiastic. It offers
comprehensive online resources, guides, training, and online forums.
- Easy to upgrade: Tableau customers are happy to use the latest release of the software because
upgrades are easy to carry out.
- Low cost: Tableau is a relatively low-cost solution compared to other big data counterparts such as
Qlik and Business Objects.
- Quality customer service: Tableau has a user and developer community where queries are answered
quickly.
- Ease of use: Tableau has a simple user interface, and its drag-and-drop interface is very easy to
learn.

c. Disadvantages of Tableau
Some disadvantages of Tableau:
- Poor Versioning: The main disadvantage of Tableau is that only recent versions support revision
history; for older versions, rolling back a package is not possible.

- No Automatic Refreshing of Reports: There is no option to refresh reports automatically through
scheduling, so some manual effort is required to update the data in the back end.
- Need Manual Effort: Tableau's parameters are static, and only a single value can be selected using a
parameter. You need to update it manually whenever the data changes.
- Not a Comprehensive Solution: Although Tableau is easy to use for BI applications, it does not
provide a platform for developing analytic applications that can be widely shared. It is also poorly
suited to businesses with broad, large-scale deployments.
- No Version Control: Once dashboards and reports are published on the server, you cannot go back
to previous versions of the data in Tableau. It is not possible to go back and recover old data.
d. icCube

Figure 14: icCube


icCube is a powerful business intelligence (BI) and analytics platform that enables organizations to
analyze, visualize, and report on their data. It is known for its multidimensional cube capabilities,
which make it suitable for OLAP (Online Analytical Processing) and data warehouse scenarios.
icCube is designed to provide a flexible and scalable solution for handling large volumes of data and
delivering actionable insights to users. Key features of icCube include:

- Multidimensional Cubes: icCube allows you to create OLAP cubes that offer fast and efficient data
analysis, providing a multidimensional view of your data. This is particularly useful for complex data
analysis and aggregations.
- Data Integration: icCube can connect to various data sources, including relational databases,
spreadsheets, and cloud-based data sources, to gather and consolidate data for analysis.
- Reporting and Dashboards: The platform offers intuitive drag-and-drop interfaces for building
interactive dashboards and reports. Users can create custom visualizations and interactive charts to
explore data and gain valuable insights.
- Advanced Analytics: icCube supports advanced analytical functions like predictive modeling, data
mining, and statistical analysis. This allows data scientists and analysts to perform in-depth data
exploration and discover hidden patterns.
- Data Security: icCube provides robust data security features, including role-based access controls and
data encryption, ensuring that sensitive data is protected and only accessible to authorized users.
- Scalability: The platform is designed to handle large datasets and can scale to accommodate the
growing needs of an organization as its data requirements increase.
- White Labeling: icCube allows organizations to customize the BI platform with their branding,
making it seamless to integrate into existing applications or present as a standalone product.

e. Google Chart

Figure 15: Google Chart


Google Charts is a web-based data visualization library developed by Google that allows developers
to create interactive and customizable charts and graphs to represent data on web pages. It provides a
variety of chart types, such as line charts, bar charts, pie charts, area charts, and more, to help users
visualize data and communicate insights effectively.

Key features of Google Charts include:


- Rich Chart Types: Google Charts supports a wide range of chart types, allowing developers to choose
the most appropriate chart for their specific data visualization needs. This includes basic charts like
line and bar charts, as well as more complex charts like bubble charts, scatter plots, and geographic
maps.
- Customization: Users can customize various aspects of the charts, including colors, labels, legends,
axis scales, and tooltips, to match the visual style and branding of their website or application.
- Interactive: Google Charts provide interactive features, allowing users to interact with the charts by
hovering over data points to view details or clicking on elements to drill down into the data.
- Dynamic Data: The charts can be updated dynamically, allowing real-time data to be reflected in the
visualizations without requiring page reloads. This is particularly useful for applications that display
live data or receive frequent updates.
- Scalable: Google Charts is designed to handle large datasets and scale well across different screen
sizes and devices, ensuring a consistent and smooth user experience.
- Easy Integration: Implementation is relatively straightforward, as Google Charts use a simple
JavaScript API that can be embedded directly into HTML pages. Developers need to include the
necessary script tags and provide data in the desired format to create the charts.
- Free to Use: Google Charts is free to use, making it accessible to developers without any licensing
costs. Note that the library itself is not distributed as open-source software.

f. Advantages of Google Chart


Advantages of Google Charts:
- Ease of Use: Google Charts offer a straightforward JavaScript API, making it easy for developers to
integrate and create interactive charts with minimal effort.
- Wide Range of Chart Types: It provides a diverse set of chart types, allowing users to pick the most
suitable visualization for their data, whether it's simple line charts or more complex geo-maps.
- Customization: The library offers a good level of customization, allowing developers to tailor the
appearance of charts to match their application's branding and design.
- Interactive Features: Google Charts allow users to interact with the charts, providing a more engaging
and dynamic user experience.
- Scalability: The library can handle large datasets and works well on various devices and screen sizes,
providing a responsive user experience.
- Community and Support: Being developed by Google, Google Charts have an active community and
good documentation, making it easier to find solutions to common issues.
- Free to Use: Google Charts is free to use, making it accessible to a wide range of developers
without any licensing costs, although the library itself is not distributed as open-source software.
g. Disadvantages of Google Chart
Disadvantages of Google Charts:
- Limited Customization Control: While Google Charts offer some customization options, it might not
meet the specific design requirements of every application. Advanced customization may require
working around the limitations of the library.
- Dependency on Internet Connection: As Google Charts is a web-based library, it requires an internet
connection to access the necessary scripts. This can be a concern in certain environments with
restricted internet access.
- Updates and Changes: With any library, updates and changes can occur, which might affect existing
implementations or require developers to adapt their code to the latest version.
- Limited Offline Support: As Google Charts are typically loaded from external sources, offline
availability might be limited unless developers use caching or other techniques to store the required
scripts locally.
- Data Security: The library's scripts are loaded from Google's servers, and some chart types (for
example, geographic charts that rely on geocoding) may send data to Google, so some organizations may
have concerns about data privacy and security.

In conclusion, Google Charts is a powerful and accessible tool for creating interactive data
visualizations on the web. It is well-suited for developers looking for a quick and easy way to
implement various chart types. However, it's essential to weigh the pros and cons against the specific
requirements and constraints of your project to determine if Google Charts is the right choice for your
needs.

h. Predictive analytics tool


In this part of the report, I list some well-known predictive analytics tools:

IBM SPSS Modeler

Figure 16: IBM SPSS Modeler


IBM SPSS Modeler is a comprehensive predictive analytics platform that allows users to build and
deploy predictive models using a visual interface. It supports data preparation, data mining, and
machine learning techniques.

Pros:
- User-friendly visual interface for building predictive models.
- Comprehensive toolset for data preparation, data mining, and machine learning.
- Good integration with other IBM SPSS products and the IBM ecosystem.

Cons:
- Can be expensive for small organizations or individual users.
- Steeper learning curve for advanced features and customization.

RapidMiner

Figure 17: RapidMiner


RapidMiner is an open-source data science platform that offers advanced analytics, including
predictive modeling, machine learning, and text mining. It provides a visual workflow interface for
building and deploying predictive models.

Pros:
- Open-source, so it's accessible to users with limited budgets.
- Extensive library of pre-built operators and extensions.
- Scalable for enterprise-level applications.

Cons:
- Limited support for large datasets compared to commercial tools.
- Advanced functionalities may require knowledge of coding and scripting.

Microsoft Azure Machine Learning Studio

Figure 18: Microsoft Azure Machine Learning Studio


Part of the Microsoft Azure cloud ecosystem, Azure Machine Learning Studio enables users to build,
deploy, and manage machine learning models using drag-and-drop modules.

Pros:
- Cloud-based platform with seamless integration into the Microsoft Azure ecosystem.
- Drag-and-drop interface for building models without extensive coding knowledge.
- Good for scaling models using cloud resources.

Cons:
- Limited to the Azure environment, which may not suit all organizations.
- May require some level of familiarity with Azure services.

Python (with libraries like scikit-learn and TensorFlow)

Figure 19: Python


Python is a popular programming language for data science, and it has numerous libraries like scikit-
learn (for machine learning) and TensorFlow (for deep learning) that facilitate predictive analytics
tasks.

Pros:
- Open-source and widely used in the data science community.
- Extensive libraries and frameworks for machine learning and deep learning.
- Highly flexible and customizable.

Cons:
- Requires coding skills, making it less approachable for non-technical users.
- Some algorithms might require optimization for large-scale datasets.
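
As a rough illustration of the kind of predictive task that these libraries automate, the sketch below fits a straight line to a few made-up data points and predicts the next value. It deliberately uses only the Python standard library rather than scikit-learn or TensorFlow, and the function names and sample figures are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical monthly sales figures (month number, units sold).
months = [1, 2, 3, 4, 5]
units = [10, 12, 14, 16, 18]

model = fit_line(months, units)
forecast = predict(model, 6)  # estimate for month 6
```

A dedicated library adds what this sketch lacks, such as model validation, many more algorithms, and support for multiple input features.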

R Language

Figure 20: R Language


R is another widely used programming language for statistical computing and graphics. It has a vast
ecosystem of packages for predictive modeling and data analysis.

Pros:
- Rich collection of statistical and machine learning packages.
- Active community with regular updates and improvements.
- Great for statistical analysis and data visualization.

Cons:
- Steeper learning curve for users with no prior programming experience.
- Memory management can be a challenge for large datasets.

i. Multi-cloud strategy

Figure 21: Multi-cloud strategy


A multi-cloud strategy refers to an approach where an organization uses more than one cloud service
provider to meet its cloud computing needs. Instead of relying on a single cloud provider, a multi-
cloud strategy distributes workloads and resources across multiple cloud platforms.

Benefits of a Multi-Cloud Strategy:


- Reduced Vendor Lock-in: By using multiple cloud providers, organizations can avoid vendor lock-in,
where they become heavily dependent on a single provider's services and technologies. This
flexibility allows them to switch between providers if necessary, which can lead to better negotiation
leverage and more competitive pricing.
- Improved Reliability and Redundancy: Multi-cloud setups can enhance reliability and redundancy. If
one cloud provider experiences an outage or downtime, workloads and services can quickly switch to
another provider, minimizing disruptions and ensuring continuous operations.
- Optimized Performance and Latency: Different cloud providers may have data centers in various
geographic regions. Organizations can strategically place their resources closer to end-users or
specific markets, reducing latency and improving performance for users in different locations.
- Best-of-Breed Services: Different cloud providers excel in various services and technologies. With a
multi-cloud approach, organizations can choose the best-of-breed services from different providers,
tailoring their solutions to specific requirements and needs.
- Data Sovereignty and Compliance: Some organizations may have data sovereignty and compliance
requirements that dictate where their data must be stored. A multi-cloud strategy allows them to select
specific providers based on their geographical presence and compliance standards.
- Cost Optimization: Multi-cloud strategies can offer cost advantages by allowing organizations to take
advantage of cost-effective pricing models from different providers and avoid potential price hikes
from a single provider.
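
The reliability benefit listed above can be sketched as a simple failover loop: try each provider in order and fall back to the next when one is unavailable. The provider names and the upload functions here are hypothetical stand-ins, not real cloud SDK calls.

```python
class ProviderUnavailable(Exception):
    """Raised by a (hypothetical) provider client during an outage."""


def store_with_failover(providers, payload):
    """Try each (name, upload) pair in order; return the name that succeeded."""
    errors = []
    for name, upload in providers:
        try:
            upload(payload)
            return name
        except ProviderUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in upload functions that simulate one outage and one healthy provider.
def upload_to_cloud_a(payload):
    raise ProviderUnavailable("simulated outage")


stored = []


def upload_to_cloud_b(payload):
    stored.append(payload)


used = store_with_failover(
    [("cloud-a", upload_to_cloud_a), ("cloud-b", upload_to_cloud_b)],
    b"monthly-report",
)
```

In this sketch the first provider fails, so the payload ends up with the second one. A real deployment would also need to keep data synchronized between providers, which is part of the complexity discussed in the challenges.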

Considerations and Challenges:


- Complexity and Management: Managing multiple cloud providers can add complexity to an
organization's IT infrastructure. IT teams need to be well-versed in different cloud platforms and have
appropriate monitoring and management tools to handle diverse environments.
- Interoperability and Integration: Ensuring seamless integration and interoperability between various
cloud platforms and services can be challenging. Standardizing interfaces and data formats can help
streamline operations.
- Security and Compliance: Security becomes more complex with a multi-cloud approach, as each
provider may have different security protocols and measures. Ensuring consistent security practices
and compliance across multiple clouds is essential.
- Data Transfer and Latency: Transferring data between different cloud providers can incur costs and
introduce potential latency. Organizations need to consider data transfer charges and optimize data
placement for optimal performance.
- Cost Management: While a multi-cloud approach can offer cost benefits, it can also lead to cost
management challenges. Careful monitoring and optimization of resources are necessary to prevent
unexpected expenses.
- Vendor Management and Relationships: Working with multiple cloud providers means managing
relationships with each of them. Effective communication and vendor management become critical to
resolving issues and ensuring service levels are met.

In conclusion, a multi-cloud strategy can provide organizations with increased flexibility, redundancy,
and the ability to leverage various cloud providers' strengths. However, it also introduces additional
complexities and challenges that need to be carefully managed to reap the full benefits of the
approach. Before adopting a multi-cloud strategy, organizations should conduct a thorough
assessment of their requirements, evaluate the capabilities of different cloud providers, and develop a
well-defined governance and management plan to successfully implement and maintain the multi-
cloud environment.

3.2. Techniques associated with Business Intelligence
a. OLAP (Online Analytical Processing)
Online analytical processing (OLAP) is a software technology you can use to analyze business data
from different points of view. Organizations collect and store data from multiple data sources, such as
websites, applications, smart meters, and internal systems. OLAP combines and groups this data into
categories to provide actionable insights for strategic planning.

Figure 22: OLAP

Why is OLAP important?

Online analytical processing (OLAP) helps organizations process and benefit from a growing amount
of digital information. Some benefits of OLAP include the following.

Faster decision making: Businesses use OLAP to make quick and accurate decisions to remain
competitive in a fast-paced economy. Performing analytical queries on multiple relational databases is
time consuming because the computer system searches through multiple data tables. On the other
hand, OLAP systems precalculate and integrate data so business analysts can generate reports faster
when needed.

Non-technical user support: OLAP systems make complex data analysis easier for non-technical
business users. Business users can create complex analytical calculations and generate reports instead
of learning how to operate databases.

Integrated data view: OLAP provides a unified platform for marketing, finance, production, and other
business units. Managers and decision makers can see the bigger picture and effectively solve
problems. They can perform what-if analysis, which shows the impact of decisions taken by one
department on other areas of the business.
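
The precalculation idea can be illustrated with a toy cube: every combination of dimensions is aggregated once, so later queries become dictionary lookups instead of table scans. This is a minimal sketch with made-up sales rows; real OLAP engines use far more sophisticated storage and indexing.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, product, quarter, sales amount).
ROWS = [
    ("North", "Laptop", "Q1", 100),
    ("North", "Phone", "Q1", 50),
    ("South", "Laptop", "Q1", 70),
    ("South", "Laptop", "Q2", 30),
]
DIMENSIONS = ("region", "product", "quarter")


def build_cube(rows):
    """Precompute total sales for every subset of the dimensions."""
    cube = defaultdict(int)
    for *dims, amount in rows:
        for size in range(len(DIMENSIONS) + 1):
            for idx in combinations(range(len(DIMENSIONS)), size):
                # The key records which dimensions are fixed and to what values.
                key = tuple((DIMENSIONS[i], dims[i]) for i in idx)
                cube[key] += amount
    return cube


cube = build_cube(ROWS)

grand_total = cube[()]                                     # no dimension fixed
south_total = cube[(("region", "South"),)]                 # roll-up by region
north_q1 = cube[(("region", "North"), ("quarter", "Q1"))]  # slice on two dimensions
```

The trade-off is that the cube grows with the number of dimension combinations, which is why the aggregates are built ahead of time rather than at query time.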

b. Data Mining
Definition
Data mining, along with data science, is one of the most widely used technology fields today. It
gathers and organizes large amounts of data in order to produce accurate analysis.

Data mining relies on advanced computational techniques and is not limited to data extraction; it
also covers data transformation, cleaning, integration, and pattern analysis. Some current
applications of data mining:
- Analysis of market and stock data.
- Fraud detection.
- Business analysis for risk management.
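
As one illustration of pattern analysis, fraud detection is often introduced through outlier detection. The sketch below flags transactions whose amount lies far from the mean; the amounts and the threshold of 2 standard deviations are assumptions chosen for the example, and production systems use much richer models.

```python
from statistics import mean, pstdev


def flag_outliers(amounts, threshold=2.0):
    """Return the amounts whose z-score exceeds the threshold."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]


# Hypothetical card transactions; one is unusually large.
transactions = [20, 25, 22, 19, 24, 21, 23, 500]
suspicious = flag_outliers(transactions)
```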

Data Mining Tools

RapidMiner

Figure 23: RapidMiner


The first tool to mention is RapidMiner, a popular data mining tool today. It is written in Java but
requires no coding to operate. It provides various data mining functions such as data preprocessing,
data representation, filtering, and clustering.

Knime

Figure 24: Knime


KNIME is a powerful platform that integrates various machine learning and data mining components in
one place. It assists users with data processing and analysis as well as data extraction,
transformation, and loading.

Weka

Figure 25: Weka


Weka is open-source data mining software developed at the University of Waikato. Similar to
RapidMiner, it requires no coding and uses a simple GUI. Users can call machine learning algorithms
directly or invoke them from Java code. Weka is equipped with a variety of functions such as
visualization, preprocessing, classification, and clustering.

Apache Mahout

Figure 26: Apache Mahout

Apache Mahout came out of the Hadoop big data ecosystem, with the aim of addressing the growing need
for data mining and analytics on Hadoop. It is equipped with various machine learning functions such
as classification, regression, and clustering.

Orange

Figure 27: Orange


Orange is programmed in Python and has an intuitive, easy-to-use interface. It is known for
integrating simple, intelligent machine learning and data mining tools.

In summary, this section has clarified the concept of data mining, its applications, and its tools.
It is an important and helpful area of data analysis and processing that is worth learning and
applying.

c. Sisense

Figure 28: Sisense


Sisense is a business intelligence and analytics platform that enables you to build analytics apps
that deliver highly interactive user experiences. Its business intelligence and dashboard reporting
software allows you to access and combine data in a few clicks. You can connect to structured and
unstructured data sources, join tables from multiple sources with minimal scripting and coding, and
create interactive web dashboards and reports. For example, Azure Data Explorer can be set up as a
data source for Sisense in order to visualize data from a cluster.

d. Microsoft BI platform

Microsoft Power Platform comprises four key products: Power Apps, Power Automate, Power BI, and
Power Virtual Agents.

Figure 29: Microsoft BI platform

Power Apps provides a rapid low code development environment for building custom apps for
business needs. It has services, connectors, and a scalable data service and app platform (Microsoft
Dataverse) to allow simple integration and interaction with existing data. Power Apps enables the
creation of web and mobile applications that run on all devices.

Power Automate allows users to create automated workflows between apps and services. It helps to
automate repetitive business processes such as communication, data collection, and decision approval.

Power BI (Business Intelligence) is a business analytics service that provides insights for data
analysis. It shares those insights through data visualizations, reports, and dashboards that enable
businesses to make quick, informed decisions. Power BI scales within an organization, with
integrated governance and security allowing businesses to focus on using data rather than managing
it.

Power Virtual Agents allows anyone to create powerful chatbots using a guided, code-free graphical
interface without the need for data scientists or developers.

e. SAP Business Intelligence
SAP Business Intelligence solutions can be divided into two main groups. The first is data warehouse
systems, whose primary task is to integrate, store, and transform data from various sources (SAP
Datasphere, SAP BW/4HANA, or SAP HANA).

Figure 30: SAP BI

SAP BI courses demonstrate the keys to increasing the flexibility and gradation of important business
processes, including the development and administration of SAP BusinessObjects (BO). They teach
learners to simplify the search capabilities of various business tools. A number of related courses
are available in SAP BI, as explained below:

SAP Business Objects BI Solution: It includes ‘overview and integration’ sessions that provide an
introduction to BI solutions with respect to business objects and warehouse data handling. This short
session is accompanied by ‘Delta and Early Product Training’ modules that include comprehensive e-
learning classes for the various versions of the software, along with other BI tools.

Administration - BI Solutions: This course is equipped with keys for the administrative and security
departments. It covers the processes of designing and deploying security tools and SAP migration
objects alike.

Business Intelligence - Dashboard: This course consists of short sessions dealing with the
dashboard of Business Intelligence Objects.

Business Intelligence - SAP Business Objects Explorer: It encompasses the operations and tools of
different data warehousing tools that reduce the concerns of IT backlog; thereby providing easy data
accessibility for one and all.

Business Intelligence - Crystal Reports: By pursuing this course, aspirants can design reports with
fundamental or advanced BI knowledge and can optimize data reports and report processing strategies
effectively.

Business Intelligence - Web Intelligence: All BI processes - from reports to user interface designs -
can be deployed with the help of this course.

Business Intelligence - Analysis and Design Studio: Various versions of Microsoft Office and the
analysis of the whole process are referred to in this course.

Predictive Analytics: This helps in gaining an analytical insight of BI tools.

f. Self-Service reporting

Figure 31: Self-service reporting

Self-service reporting is a type of business intelligence that allows everyday users to access and
analyze data without relying on IT or other technical resources. It empowers non-technical users to
answer their own data questions, find insights, create data visualizations, and turn that all into
customized reports that meet their specific needs.

Unlike traditional reporting, which can take days to deliver insights, self-service reporting
dramatically reduces the time and energy required to produce a report. By having up-to-date
information at their fingertips, companies can make more informed decisions, avoid unnecessary risk,
and be better equipped to tackle various challenges. Additionally, organizations can reduce
operational costs by eliminating the need for manual data entry and analysis, while freeing data
teams from acting as report factories so that they can tackle more strategic, impactful work like
building data pipelines and implementing a modern data stack.

g. Data visualization

Figure 32: Data visualization

Data visualization is the representation of information and data using charts, graphs, maps, and other
visual tools. These visualizations allow us to easily understand any patterns, trends, or outliers in a
data set. Data visualization also presents data to the general public or specific audiences without
technical knowledge in an accessible manner. For example, the health agency in a government might
provide a map of vaccinated regions. The purpose of data visualization is to help drive informed
decision-making and to add colorful meaning to an otherwise bland database.

Benefits of data visualization:


Data visualization can be used in many contexts in nearly every field, like public policy, finance,
marketing, retail, education, sports, history, and more. Its benefits include the following:
- Storytelling: People are drawn to colors and patterns in clothing, arts and culture, architecture, and
more. Data is no different—colors and patterns allow us to visualize the story within the data.
- Accessibility: Information is shared in an accessible, easy-to-understand manner for a variety of
audiences.
- Visualize relationships: It’s easier to spot the relationships and patterns within a data set when the
information is presented in a graph or chart.
- Exploration: More accessible data means more opportunities to explore, collaborate, and inform
actionable decisions.

Here are some common types of data visualizations:


- Table: A table is data displayed in rows and columns, which can be easily created in a Word
document or Excel spreadsheet.
- Chart or graph: Information is presented in graphical form with data displayed along an x and y
axis, usually with bars, points, or lines, to represent data in comparison. An infographic is a
special type of chart that combines visuals and words to illustrate the data.
- Gantt chart: A Gantt chart is a bar chart that portrays a timeline and tasks specifically used in project
management.
- Pie chart: A pie chart divides data into percentages featured in “slices” of a pie, all adding up to
100%.
- Geospatial visualization: Data is depicted in map form with shapes and colors that illustrate the
relationship between specific locations, such as a choropleth or heat map.
- Dashboard: Data and visualizations are displayed, usually for business purposes, to help analysts
understand and present data.

Using data visualization tools, different types of charts and graphs can be created to illustrate
important data. These are a few examples of data visualization in the real world:
- Data science: Data scientists and researchers have access to libraries using programming languages or
tools such as Python or R, which they use to understand and identify patterns in data sets. Tools help
these data professionals work more efficiently by coding research with colors, plots, lines, and shapes.
- Marketing: Tracking data such as web traffic and social media analytics can help marketers analyze
how customers find their products and whether they are early adopters or more of a laggard buyer.
Charts and graphs can synthesize data for marketers and stakeholders to better understand these
trends.
- Finance: Investors and advisors focused on buying and selling stocks, bonds, dividends, and other
commodities will analyze the movement of prices over time to determine which are worth purchasing
for short- or long-term periods. Line graphs help financial analysts visualize this data, toggling
between months, years, and even decades.
- Health policy: Policymakers can use choropleth maps, which shade geographical areas (nations,
states, continents) in different colors. They can, for example, use these maps to show the mortality
rates of cancer or Ebola in different parts of the world.

C. DESIGN A BUSINESS INTELLIGENCE TOOL, APPLICATION OR INTERFACE THAT
CAN PERFORM A SPECIFIC TASK TO SUPPORT PROBLEM-SOLVING OR DECISION-
MAKING AT AN ADVANCED LEVEL
1. Data processing
1.1. Definition
Data processing is the manipulation and transformation of raw data into meaningful and useful
information. It involves various techniques and steps to clean, organize, analyze, and present data in a
format that allows for easy interpretation and decision-making. Data processing is a crucial part of the
data lifecycle and is essential for extracting valuable insights from large and complex datasets.

The process typically includes the following steps:

Step 1 - Data Collection: The first step is to gather data from various sources, which can include
databases, sensors, files, APIs, web scraping, surveys, or other data generation methods.

Step 2 - Data Cleaning: Once the data is collected, it often requires cleaning to handle missing values,
remove duplicates, correct errors, and ensure data integrity. Data cleaning is essential to ensure the
accuracy and reliability of the analysis.

Step 3 - Data Integration: In some cases, data may come from different sources and in various
formats. Data integration involves combining data from multiple sources into a unified and coherent
dataset.

Step 4 - Data Transformation: Data might need to be transformed into a suitable format for analysis.
This can involve normalization, scaling, or converting data types to make it consistent and
comparable.

Step 5 - Data Aggregation: Aggregation involves summarizing data into a more compact form, often
for easier analysis. It might involve calculating averages, sums, counts, or other statistical measures.

Step 6 - Data Analysis: Once the data is processed and prepared, various analytical techniques can be
applied to gain insights and patterns from the data. This step often involves using statistical methods,
machine learning algorithms, or other data mining techniques.

Step 7 - Data Visualization: Data visualization is the graphical representation of data to make it easier
to understand and interpret patterns and trends. Visualizations like charts, graphs, and maps are
commonly used to present data.

Step 8 - Data Interpretation: The final step is to interpret the processed data, draw conclusions, and
make informed decisions based on the insights gained from the analysis.
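
The first five steps can be sketched end to end on a small in-memory dataset. The record layout and the two sources below are hypothetical; the point is only to show how cleaning, integration, transformation, and aggregation chain together.

```python
from collections import defaultdict

# Step 1 - collection: two hypothetical sources with slightly different formats.
store_sales = [
    {"id": 1, "region": "north", "amount": "100.0"},
    {"id": 2, "region": "south", "amount": None},     # missing value
    {"id": 1, "region": "north", "amount": "100.0"},  # duplicate
]
web_sales = [{"id": 3, "region": "South", "amount_cents": 2550}]


def clean(records):
    """Step 2: drop rows with missing amounts and duplicate ids."""
    seen, result = set(), []
    for rec in records:
        if rec["amount"] is None or rec["id"] in seen:
            continue
        seen.add(rec["id"])
        result.append(rec)
    return result


def integrate(store, web):
    """Step 3: bring both sources into one common layout."""
    unified = [dict(r) for r in store]
    for r in web:
        unified.append(
            {"id": r["id"], "region": r["region"], "amount": r["amount_cents"] / 100}
        )
    return unified


def transform(records):
    """Step 4: convert types and normalize region names."""
    return [
        {"id": r["id"], "region": r["region"].title(), "amount": float(r["amount"])}
        for r in records
    ]


def aggregate(records):
    """Step 5: total amount per region."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(totals)


totals = aggregate(transform(integrate(clean(store_sales), web_sales)))
```

The resulting per-region totals are the kind of compact output that the analysis, visualization, and interpretation steps would then work with.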

Data processing is an iterative process, and different techniques and tools are used depending on the
nature of the data and the goals of the analysis. The goal is to turn raw data into actionable
knowledge, which can be used to optimize business processes, identify opportunities, solve problems,
or improve decision-making. Efficient and accurate data processing is essential for ensuring the
success of data-driven initiatives and obtaining meaningful and valuable insights from data.

1.2. Types of Data processing


Data processing can be categorized into several types, depending on the nature of the operations
performed on the data and the specific goals of the processing. Here are some common types of data
processing:

Batch Processing
Batch processing involves processing data in predefined batches or groups. Data is collected over a
period, stored, and then processed as a batch at a scheduled time. This type of processing is common
for handling large volumes of data efficiently and is often used in tasks like end-of-day financial
calculations or generating monthly reports.

Pros:
- Efficient processing of large volumes of data at scheduled intervals.
- Simplicity in implementation and automation.
- Suitable for tasks that don't require real-time responses.

Cons:
- Delayed processing, which may not be suitable for time-sensitive applications.
- Inefficient for handling real-time data streams or continuous data updates.

Real-time Processing
Real-time processing, also known as online or stream processing, involves handling data as it arrives
or in real-time. The data is processed immediately as it is generated, allowing for instant analysis and
quick responses. Real-time processing is essential for applications that require immediate actions
based on incoming data, such as fraud detection, real-time monitoring, and IoT (Internet of Things)
applications.

Pros:
- Immediate processing and analysis of incoming data.
- Quick responses and actions based on real-time insights.
- Well-suited for time-critical applications and real-time monitoring.

Cons:
- Higher complexity in implementation compared to batch processing.
- Resource-intensive as it requires processing data as it arrives.
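
The contrast between the two styles can be sketched with the same events handled both ways: the batch function sees the whole collection at once, while the streaming generator updates its result after every event. The event values are made up for illustration.

```python
def batch_total(events):
    """Batch: process the stored collection in one scheduled run."""
    return sum(events)


def streaming_totals(event_source):
    """Real-time: yield an updated running total as each event arrives."""
    total = 0
    for amount in event_source:
        total += amount
        yield total


events = [5, 10, 20]

end_of_day = batch_total(events)                  # one answer, after the fact
live_view = list(streaming_totals(iter(events)))  # an answer after every event
```

Both approaches reach the same final total; the difference is that the streaming version has a usable intermediate result at every point in time.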

Online Transaction Processing (OLTP)


OLTP is a type of data processing used for managing transactional data in databases. It involves
processing individual transactions, such as adding or updating records in a database, in real-time.
OLTP systems are typically used in applications like e-commerce, banking, and inventory
management.

Pros:
- Supports concurrent transactions from multiple users.
- Ensures data consistency and integrity in real-time.
- Widely used in applications requiring immediate transaction processing.

Cons:
- May not be optimized for complex analytical queries and reporting.
- Performance can be impacted during high transaction volumes.
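
A minimal sketch of an OLTP-style transaction, using Python's built-in `sqlite3` module. The `accounts` table, the balances, and the `transfer` function are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: float) -> None:
    """Move `amount` between two accounts as one atomic transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


transfer(conn, 1, 2, 30.0)
try:
    transfer(conn, 1, 2, 500.0)  # rejected, so the partial update is rolled back
except ValueError:
    pass

balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(balances)
```

The second transfer leaves no trace in the table, which is the consistency guarantee the list above refers to.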

Online Analytical Processing (OLAP)


OLAP is a type of data processing used for complex data analysis and decision support. OLAP
systems allow users to query and analyze data from multiple dimensions, enabling them to gain
insights and perform ad-hoc analysis easily. OLAP is commonly used in business intelligence and
reporting applications.

Pros:
- Allows users to perform complex ad-hoc analysis and gain insights.
- Efficient for multidimensional querying and reporting.
- Gives decision-makers fast, interactive access to business intelligence.

Cons:
- May not handle high transaction volumes as efficiently as OLTP systems.
- Requires a well-designed data warehouse and OLAP cubes.
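
Multidimensional analysis can be illustrated without a full OLAP server. The sketch below, which is only an approximation of what an OLAP cube does, averages salary along two dimensions (department and country) and then rolls up the country dimension. The five rows reuse values from the employee dataset shown later in this report.

```python
from collections import defaultdict
from statistics import mean

# (department, country, annual salary) for five employees from the report's dataset.
rows = [
    ("IT", "United States", 141604),
    ("IT", "United States", 84913),
    ("IT", "Brazil", 146140),
    ("Finance", "United States", 163099),
    ("Finance", "Brazil", 55499),
]

# Each (department, country) pair is one cell of a small two-dimensional cube.
cube = defaultdict(list)
for department, country, salary in rows:
    cube[(department, country)].append(salary)
avg_salary = {cell: mean(values) for cell, values in cube.items()}

# "Rolling up" drops the country dimension and aggregates by department only.
by_department = defaultdict(list)
for department, _country, salary in rows:
    by_department[department].append(salary)
rollup = {department: mean(values) for department, values in by_department.items()}

print(avg_salary)
print(rollup)
```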

Data Mining
Data mining is the process of discovering patterns, trends, or insights from large datasets using
various statistical and machine learning techniques. It aims to extract valuable information that was
previously unknown or not explicitly expressed in the data.

Pros:
- Helps uncover hidden patterns and insights from large datasets.
- Identifies valuable trends and associations in the data.
- Enables organizations to make data-driven decisions.

Cons:
- Can be computationally intensive for large datasets.
- Requires expertise in statistics and machine learning.
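
One of the simplest data mining techniques is counting which items occur together, the first step of association analysis. The sketch below is a toy version; the shopping baskets are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]

# Count every pair of items that appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support is the fraction of baskets that contain the pair.
support = {pair: count / len(baskets) for pair, count in pair_counts.items()}
top_pair, top_support = max(support.items(), key=lambda item: item[1])
print(top_pair, top_support)
```

The pair with the highest support is an association that was not stated anywhere in the raw data, which is the kind of hidden pattern described above.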

Each type of data processing serves specific purposes and is essential in various domains, including
business, finance, healthcare, research, and more. Organizations may utilize multiple types of data
processing techniques to handle their data efficiently and gain meaningful insights from it.

1.3. Advantages of Data processing
Data processing plays a crucial role in converting raw data into meaningful and actionable
information. It offers numerous advantages that are essential for decision-making, problem-solving,
and overall business success. Here are some of the key advantages of data processing:
- Data Organization: Data processing organizes raw data into a structured format, making it easier to
store, manage, and retrieve when needed. Well-organized data enables efficient data access and
analysis, reducing the time spent searching for relevant information.
- Data Accuracy and Quality: Through data cleaning and validation, data processing helps improve the
accuracy and quality of data. Removing errors, duplicates, and inconsistencies ensures that the data
used for analysis and decision-making is reliable and trustworthy.
- Efficient Analysis: Processed data allows for more efficient data analysis. With organized and cleaned
data, analysts can quickly identify trends, patterns, and insights, leading to faster and more informed
decision-making.
- Real-time Insights: Real-time data processing allows for immediate analysis of incoming data. This is
particularly valuable in applications where real-time responses are crucial, such as fraud detection,
monitoring systems, and online customer support.
- Cost Savings: Improved data accuracy and quality can lead to cost savings by reducing errors,
minimizing the need for rework, and avoiding costly mistakes based on faulty data.
- Automation: Data processing can be automated, reducing manual efforts and increasing productivity.
Automated data processing pipelines can handle repetitive tasks, freeing up human resources for more
strategic activities.
- Better Decision-Making: By transforming data into valuable insights, data processing supports better
decision-making across all levels of an organization. Informed decisions lead to improved strategies,
enhanced efficiency, and a competitive advantage.

To sum up, data processing is an indispensable step in the data lifecycle. Its advantages extend
beyond just data organization; it empowers organizations to gain valuable insights, make data-driven
decisions, and achieve business objectives effectively and efficiently. With the growing importance of
data in today's world, data processing continues to play a pivotal role in driving innovation and
success across industries.

1.4. Disadvantages of Data processing
While data processing offers numerous advantages, there are also some potential disadvantages and
challenges associated with it. Here are some of the main disadvantages of data processing:
- Data Privacy and Security Concerns: Data processing involves handling sensitive and valuable
information, making data privacy and security a major concern. Mishandling or unauthorized access
to data can lead to data breaches, privacy violations, and reputational damage for organizations.
- Data Bias and Quality Issues: Data processing heavily relies on the quality of input data. If the data
used for processing is biased, incomplete, or inaccurate, it can lead to biased insights and flawed
decision-making. Ensuring data quality and addressing biases can be challenging.
- Data Loss or Corruption: During data processing, data can be lost or corrupted if it is not
handled properly. Data backups and redundancy measures are essential to mitigate these risks.
- Data Processing Errors: Human errors or software bugs during data processing can lead to incorrect
outcomes. Proper validation and testing procedures are necessary to identify and rectify such errors.
- Data Processing Bottlenecks: In complex data processing pipelines, bottlenecks can occur when
certain processing steps take longer than others. Identifying and optimizing these bottlenecks is
essential to improve overall processing efficiency.
- Lack of Expertise: Effective data processing requires skilled data professionals who understand data
manipulation, analysis, and interpretation. A shortage of skilled data analysts or data scientists can
hinder the data processing capabilities of an organization.

Despite these disadvantages, data processing remains a critical component of modern business
operations. Organizations need to be aware of these challenges and take appropriate measures to
address them, ensuring that data processing is carried out responsibly, securely, and in a manner that
maximizes the benefits while minimizing risks.

2. Show and explain data
| ID | Full Name | Department | Business Unit | Gender | Age | Hire Date | Annual Salary | Bonus | Country |
|---|---|---|---|---|---|---|---|---|---|
| E02387 | Emily Davis | IT | Research & Development | Female | 55 | 4/8/2016 | $141,604 | 15% | United States |
| E02572 | Luna Sanders | Finance | Speciality Products | Female | 50 | 10/26/2006 | $163,099 | 20% | United States |
| E02832 | Penelope Jordan | IT | Manufacturing | Female | 26 | 9/27/2019 | $84,913 | 7% | United States |
| E00163 | Bella Powell | Finance | Research & Development | Female | 65 | 3/4/2002 | $175,837 | 20% | United States |
| E00884 | Camila Silva | Marketing | Speciality Products | Female | 64 | 12/1/2003 | $154,828 | 13% | United States |
| E04116 | David Barnes | IT | Corporate | Male | 64 | 11/3/2013 | $186,503 | 24% | United States |
| E03680 | Elias Alvarado | IT | Manufacturing | Male | 56 | 1/9/2012 | $146,140 | 10% | Brazil |
| E04732 | Eva Rivera | Sales | Manufacturing | Female | 36 | 4/2/2021 | $151,703 | 21% | United States |
| E03484 | Logan Rivera | IT | Research & Development | Male | 59 | 5/24/2002 | $172,787 | 28% | Brazil |
| E00671 | Leonardo Dixon | Sales | Speciality Products | Male | 37 | 9/5/2019 | $49,998 | 0% | United States |
| E02206 | Jose Henderson | Human Resources | Speciality Products | Male | 41 | 4/17/2015 | $152,239 | 23% | United States |
| E04545 | Abigail Mejia | Engineering | Corporate | Female | 56 | 2/5/2005 | $98,581 | 0% | Brazil |
| E00154 | Wyatt Chin | Engineering | Speciality Products | Male | 43 | 6/7/2004 | $246,231 | 31% | United States |
| E02594 | Ezekiel Kumar | IT | Research & Development | Male | 28 | 6/25/2017 | $54,775 | 0% | United States |
| E00402 | Dominic Guzman | Finance | Manufacturing | Male | 65 | 5/16/2004 | $55,499 | 0% | Brazil |
| E01994 | Angel Powell | Sales | Research & Development | Male | 61 | 7/11/2008 | $66,521 | 0% | United States |
| E03247 | Caroline Jenkins | Finance | Research & Development | Female | 27 | 5/6/2018 | $49,011 | 0% | United States |
| E02074 | Nora Brown | IT | Manufacturing | Female | 32 | 2/11/2014 | $99,575 | 0% | United States |
| E01628 | Jackson Perry | Marketing | Research & Development | Male | 27 | 10/20/2019 | $256,420 | 30% | United States |
| E04285 | Riley Padilla | IT | Manufacturing | Female | 35 | 5/15/2013 | $78,940 | 0% | United States |
| E01417 | Leah Pena | IT | Corporate | Female | 57 | 1/3/1994 | $82,872 | 0% | Brazil |
| E03749 | Kennedy Foster | Marketing | Speciality Products | Female | 53 | 11/23/2013 | $113,135 | 5% | United States |
| E03574 | John Moore | IT | Speciality Products | Male | 52 | 11/8/2005 | $199,808 | 32% | United States |
| E00586 | Sadie Washington | Marketing | Research & Development | Female | 29 | 5/24/2019 | $122,350 | 12% | United States |
| E03538 | Gabriel Holmes | IT | Research & Development | Male | 40 | 11/4/2010 | $92,952 | 0% | United States |
| E02185 | Wyatt Rojas | IT | Corporate | Male | 32 | 3/20/2013 | $79,921 | 5% | United States |
| E03830 | Eva Coleman | IT | Research & Development | Female | 37 | 9/20/2009 | $167,199 | 20% | United States |
| E03720 | Dominic Clark | Engineering | Research & Development | Male | 52 | 10/17/2012 | $71,476 | 0% | United States |
| E03025 | Lucy Alexander | Engineering | Manufacturing | Female | 45 | 10/29/2014 | $189,420 | 20% | United States |
| E04917 | Everleigh Washington | Human Resources | Research & Development | Female | 64 | 10/20/2001 | $64,057 | 0% | United States |
| E00415 | Leilani Butler | Marketing | Manufacturing | Female | 27 | 9/21/2021 | $68,728 | 0% | United States |
| E04207 | John Contreras | Marketing | Manufacturing | Male | 35 | 5/15/2011 | $66,889 | 0% | United States |
| E02139 | Rylee Yu | Accounting | Research & Development | Female | 36 | 9/29/2015 | $178,700 | 29% | United States |
| E01797 | Piper Lewis | Engineering | Research & Development | Female | 33 | 12/22/2018 | $83,990 | 0% | United States |
| E01848 | Zoey Jackson | Human Resources | Manufacturing | Female | 46 | 8/21/2008 | $59,067 | 0% | United States |
| E00699 | Ava Ayala | IT | Corporate | Female | 55 | 8/16/2006 | $159,044 | 10% | Brazil |
| E03349 | Anna Mehta | IT | Speciality Products | Female | 32 | 1/5/2020 | $78,844 | 0% | United States |
| E02966 | William Foster | Engineering | Manufacturing | Male | 58 | 5/23/2002 | $76,354 | 0% | United States |
| E01499 | Jade Rojas | Finance | Speciality Products | Female | 37 | 1/28/2019 | $165,927 | 20% | United States |
| E00105 | Isla Espinoza | Accounting | Speciality Products | Female | 38 | 11/16/2021 | $109,812 | 9% | Brazil |
| E00665 | David Chu | Engineering | Corporate | Male | 55 | 9/3/1998 | $86,299 | 0% | United States |
| E00791 | Thomas Padilla | Marketing | Research & Development | Male | 57 | 7/26/2003 | $206,624 | 40% | Brazil |
| E01540 | Miles Salazar | IT | Manufacturing | Male | 36 | 12/23/2010 | $53,215 | 0% | Brazil |
| E00254 | Samuel Morales | Finance | Corporate | Male | 34 | 6/27/2015 | $57,008 | 0% | United States |
| E02166 | John Soto | Finance | Manufacturing | Male | 60 | 9/23/2015 | $141,899 | 15% | United States |
| E00935 | Joseph Martin | Marketing | Corporate | Male | 41 | 9/13/2016 | $64,847 | 0% | United States |
| E01525 | Jose Ross | Engineering | Research & Development | Male | 53 | 4/8/1992 | $116,878 | 11% | United States |
| E00386 | Parker James | Engineering | Speciality Products | Male | 45 | 2/5/2005 | $70,505 | 0% | United States |
| E00416 | Everleigh Fernandez | Engineering | Research & Development | Female | 30 | 5/22/2016 | $189,702 | 28% | Brazil |
| E03383 | Lincoln Hall | Accounting | Speciality Products | Male | 26 | 7/28/2020 | $180,664 | 27% | United States |
| E03440 | Genesis Navarro | IT | Corporate | Female | 41 | 4/28/2009 | $69,803 | 0% | Brazil |
| E00431 | Eliza Hernandez | IT | Corporate | Female | 48 | 7/4/2019 | $76,588 | 0% | Brazil |
| E01258 | Gabriel Brooks | IT | Manufacturing | Male | 29 | 12/10/2018 | $84,596 | 0% | United States |
| E00972 | Amelia Salazar | Finance | Corporate | Female | 26 | 4/23/2019 | $59,817 | 0% | Brazil |
| E04562 | Xavier Zheng | Sales | Manufacturing | Male | 31 | 7/22/2017 | $55,854 | 0% | United States |
| E02802 | Matthew Chau | Human Resources | Research & Development | Male | 53 | 11/16/2002 | $95,998 | 0% | United States |
| E01427 | Mia Cheng | Sales | Manufacturing | Female | 34 | 4/22/2015 | $154,941 | 13% | United States |
| E04931 | Zoe Romero | IT | Manufacturing | Female | 32 | 10/5/2021 | $88,072 | 0% | Brazil |
| E03890 | Nevaeh Jones | Sales | Manufacturing | Female | 31 | 8/20/2020 | $219,693 | 30% | United States |

Explain dataset

This dataset contains information about employees working in an online sales company. Each row represents
a single employee, and the columns provide various details about the employee, including their ID, Full
Name, Department, Business Unit, Gender, Age, Hire Date, Annual Salary, Bonus, and Country. Let's go
through the columns to understand what each one represents:
- ID: Employee ID, a unique identifier for each employee.
- Full Name: The full name of the employee.
- Department: The department in which the employee works (e.g., IT, Finance, Marketing, Sales,
Engineering, Human Resources, Accounting).
- Business Unit: The business unit or division of the company to which the employee is assigned (e.g.,
Research & Development, Manufacturing, Corporate, Speciality Products).
- Gender: The gender of the employee (e.g., Male or Female).
- Age: The age of the employee.
- Hire Date: The date when the employee was hired by the company.
- Annual Salary: The employee's annual salary in dollars.
- Bonus: The bonus amount given to the employee as a percentage of their annual salary.
- Country: The country where the employee is based or where the company operates (e.g., United
States or Brazil).

Each row in this dataset provides information about a different employee, and the table above lists 59
employees. This kind of data can be used for various analyses, such as salary and bonus trends, gender
diversity, and department-specific insights.
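
Before any analysis, the text columns have to be turned into numbers and dates. The sketch below shows one way to do this for a single row (Emily Davis: $141,604, 15%, hired 4/8/2016). The helper names are assumptions made for the example, and the date format is taken to be month/day/year, which matches values such as 10/26/2006 in the table.

```python
from datetime import date, datetime


def parse_money(text: str) -> int:
    """Turn a string such as '$141,604' into the integer 141604."""
    return int(text.replace("$", "").replace(",", ""))


def parse_percent(text: str) -> float:
    """Turn a string such as '15%' into the fraction 0.15."""
    return float(text.rstrip("%")) / 100


def parse_hire_date(text: str) -> date:
    """Parse month/day/year dates such as '4/8/2016'."""
    return datetime.strptime(text, "%m/%d/%Y").date()


salary = parse_money("$141,604")
# The Bonus column is a percentage of the annual salary.
bonus_value = round(salary * parse_percent("15%"), 2)
hired = parse_hire_date("4/8/2016")
print(salary, bonus_value, hired.isoformat())
```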

3. Python
3.1. Features and benefits of Python
Python is a versatile and widely-used programming language that offers a range of features and benefits,
making it popular among developers for various applications. Here are some of the key features and benefits
of Python:

Features
- Simple and Easy to Learn: Python's syntax is clear, readable, and straightforward, making it easy for
beginners to learn and write code quickly.
- Expressive Language: Python allows developers to express concepts in fewer lines of code compared
to many other programming languages. This feature enhances code readability and maintainability.
- Interpreted Language: Python is an interpreted language, which means that code runs without a separate
compile step, enabling faster development and testing cycles.
- Large Standard Library: Python comes with a comprehensive standard library that includes various
modules and functions for tasks like file I/O, networking, web services, regular expressions, and
more, saving developers time and effort.
- Dynamic Typing: Python uses dynamic typing, which means you don't need to declare the data type
of a variable explicitly. The interpreter determines the data type at runtime.
- Cross-platform Compatibility: Python code can run on multiple platforms, such as Windows, macOS,
Linux, etc., without any modifications, as long as the required dependencies are available.
- Object-Oriented Programming (OOP): Python supports OOP principles, allowing developers to create
and use classes and objects, making code organization and maintenance easier.
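
Two of these features, dynamic typing and object-oriented programming, can be seen in a few lines. The `Employee` class below is a hypothetical example and not part of the report's code; the figures reuse one row of the dataset.

```python
from dataclasses import dataclass

# Dynamic typing: the same name can refer to values of different types.
value = 42            # an int
value = "forty-two"   # now a str, with no type declaration needed


# Object-oriented programming: data and behaviour grouped in a class.
@dataclass
class Employee:
    name: str
    annual_salary: float
    bonus_rate: float  # bonus as a fraction of salary, e.g. 0.15 for 15%

    def total_compensation(self) -> float:
        """Return salary plus bonus."""
        return self.annual_salary * (1 + self.bonus_rate)


emp = Employee("Emily Davis", 141604, 0.15)
print(type(value).__name__, round(emp.total_compensation(), 2))
```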

Benefits
- Versatility: Python is used across various domains, including web development, data analysis,
machine learning, artificial intelligence, scripting, automation, scientific computing, and more.
- Readability: Python's clear and readable syntax reduces the chances of syntax errors and makes it
easier for developers to collaborate on projects.
- Productivity and Rapid Development: Python's simplicity and extensive standard library enable
developers to build applications more quickly and efficiently.
- Scalability: Python is suitable for both small-scale and large-scale projects, and it can be integrated
with other programming languages easily.
- Community Support: Python has a large and active community that provides support, shares
knowledge, and continuously improves the language and its ecosystem.
- Popular Frameworks and Libraries: Python has popular frameworks such as Django for web development and
Flask for lightweight web applications, and libraries such as Pandas for data manipulation, NumPy and SciPy
for scientific computing, and TensorFlow and PyTorch for machine learning and AI, among others.
- Job Market and Career Opportunities: Python is widely used in the industry, leading to abundant job
opportunities for Python developers and data scientists.

3.2. Use BI

This Python code is a simple command-line utility that performs various data cleaning operations on a CSV
file using the pandas library. I will explain each part of the code:

Step 1 - Importing Required Libraries:

Figure 33: Importing Required Libraries

- The code starts by importing the “chardet” library, which is used to automatically detect the encoding of
the CSV file.
- It also imports the “pandas” library as “pd”, which is used for data manipulation and analysis.
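
Since the figure is not reproduced here, the following is a minimal sketch of what the imports and their later use might look like. The `detect_encoding` helper, the UTF-8 fallback when `chardet` is not installed, and the temporary demo file are assumptions added for the example.

```python
import os
import tempfile

import pandas as pd  # third-party: data manipulation and analysis

try:
    import chardet  # third-party: guesses the text encoding of raw bytes
except ImportError:  # assumption: fall back to UTF-8 when chardet is missing
    chardet = None


def detect_encoding(path: str, default: str = "utf-8") -> str:
    """Return chardet's best guess for the file's encoding, or `default`."""
    if chardet is None:
        return default
    with open(path, "rb") as handle:  # chardet works on raw bytes
        guess = chardet.detect(handle.read())
    return guess["encoding"] or default


# Demonstration on a throwaway CSV file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("EEID,Full Name\nE02387,Emily Davis\n")

encoding = detect_encoding(tmp.name)
df = pd.read_csv(tmp.name, encoding=encoding)
os.remove(tmp.name)
print(encoding, df.shape)
```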

Step 2 - Conditional Execution and User Menu:

Figure 34: Conditional Execution and User Menu

- This part checks if the script is executed directly and not imported as a module.
- It then presents a simple menu with five options for the user to choose from.
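
A sketch of this pattern is shown below. The menu labels are taken from the step titles in this section; the `MENU` constant and the `show_menu` function are assumed names.

```python
MENU = (
    "1. Choose file to clean data",
    "2. Clean duplicate rows",
    "3. Exit",
    "4. Clear empty data",
    "5. Drop column",
)


def show_menu() -> None:
    """Print the five menu options, one per line."""
    for line in MENU:
        print(line)


# This condition is true only when the file is run directly
# (for example `python script.py`), not when it is imported as a module.
if __name__ == "__main__":
    show_menu()
```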

Step 3 - User Input and Data Cleaning Operations:

Figure 35: User Input and Data Cleaning Operations

- The script first assigns the function “pd.isnull” to the variable “df”. This line serves no purpose in
the context of this code, because each cleaning option loads a new DataFrame into “df” before using it.
- It also initializes an empty string “b”, which will be used to store the file path entered by the user.
- The script enters an infinite “while” loop, where it continuously asks the user to input a choice (integer)
from the menu.
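
The loop could be sketched as follows. The variable name `b` follows the report; the `run_menu` function, the prompts, the handling of non-numeric input, and the file name `data.csv` are assumptions. Only options 1 and 3 are handled, and the input function is passed in as a parameter so that the example can run without a keyboard.

```python
from typing import Callable


def run_menu(read: Callable[[str], str] = input) -> str:
    """Loop until the user picks option 3; return the last file path entered."""
    b = ""  # file path entered by the user
    while True:
        try:
            choice = int(read("Enter your choice: "))
        except ValueError:  # added safeguard, not described in the report
            print("Please enter a number from 1 to 5.")
            continue
        if choice == 1:
            b = read("Enter the file path: ")
        elif choice == 3:
            break
        # Options 2, 4 and 5 (the cleaning operations) are omitted here.
    return b


# Scripted answers stand in for keyboard input so the sketch runs unattended.
answers = iter(["1", "data.csv", "x", "3"])
chosen = run_menu(lambda prompt: next(answers))
print(chosen)
```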

Step 4 - Option 1: Choose File to Clean Data:

Figure 36: Option 1: Choose File to Clean data

If the user chooses option 1, the script will ask for the file path and store it in the variable “b”.

Step 5 - Option 2: Clean Duplicate Rows:

Figure 37: Option 2: Clean Duplicate Rows

- If the user chooses option 2, the script will open the file specified by the user (`b`) in binary read mode and
use `chardet` to detect the encoding of the file.

- After detecting the encoding, it loads the CSV file into a DataFrame (`df`) using the detected encoding.

- It then drops duplicate rows based on the 'EEID' column and saves the cleaned DataFrame to a new CSV
file with a name prefixed by "no_duplicates_".
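
The de-duplication step can be sketched with `DataFrame.drop_duplicates`. The in-memory CSV with one deliberately repeated row and the file name `employees.csv` are assumptions for the example; encoding detection is left out to keep it short.

```python
import io

import pandas as pd

# Sample rows with one deliberate duplicate of employee E02387.
csv_text = (
    "EEID,Full Name,Department\n"
    "E02387,Emily Davis,IT\n"
    "E02572,Luna Sanders,Finance\n"
    "E02387,Emily Davis,IT\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Keep the first row for each employee ID and drop later repeats.
deduplicated = df.drop_duplicates(subset="EEID")

# The output name is the original name with a prefix, as described above.
source_name = "employees.csv"  # hypothetical input file name
output_name = "no_duplicates_" + source_name

# Write to an in-memory buffer here; the script would write to `output_name`.
buffer = io.StringIO()
deduplicated.to_csv(buffer, index=False)
print(output_name, len(deduplicated))
```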

Step 6 - Option 3: Exit:

Figure 38: Option 3: Exit

If the user chooses option 3, the script will break out of the infinite loop, effectively ending the program.

Step 7 - Option 4: Clear Empty Data:

Figure 39: Option 4: Clear Empty Data

- If the user chooses option 4, it follows a similar process as in option 2 to detect the encoding and load
the CSV file into the DataFrame “df”.
- It then drops rows containing any missing values (NaN) and saves the cleaned DataFrame to a new CSV
file with a name prefixed by "no_empty".
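
Dropping incomplete rows corresponds to `DataFrame.dropna`. In the sketch below, two values have been blanked on purpose to show the effect; the gaps are not present in the report's dataset.

```python
import io

import pandas as pd

# Two rows have a deliberately missing value (empty field) for illustration.
csv_text = (
    "EEID,Full Name,Country\n"
    "E02387,Emily Davis,United States\n"
    "E03680,Elias Alvarado,\n"
    ",Eva Rivera,United States\n"
)
df = pd.read_csv(io.StringIO(csv_text))  # empty fields are read as NaN

# Remove every row that contains at least one missing value.
complete_rows = df.dropna()
print(len(df), len(complete_rows))
```

Note that `dropna()` with its default settings removes a row even if only one column is empty, so it can discard a lot of data when gaps are common.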

Step 8 - Option 5: Drop Column:

Figure 40: Option 5: Drop Column

- If the user chooses option 5, it follows the same encoding detection and DataFrame loading process.
- It then drops the last two columns from the DataFrame `df` and saves the resulting DataFrame to a new
CSV file with a name prefixed by "drop_2last_column_".
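
Dropping the last two columns by position might look like the sketch below. Whether the original script uses `iloc` or `drop` is not visible from the text, so both forms are shown; the sample row is an assumption.

```python
import io

import pandas as pd

csv_text = (
    "EEID,Full Name,Department,Bonus,Country\n"
    "E02387,Emily Davis,IT,15%,United States\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Keep every column except the last two, selected by position.
trimmed = df.iloc[:, :-2]

# An equivalent form that names the columns to remove.
trimmed_alt = df.drop(columns=df.columns[-2:])
print(list(trimmed.columns))
```

Because the columns are chosen by position, the result depends on the column order of the input file.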

4. Tableau

5. Google form

In the first survey question, answered by 15 participants, 40% of respondents said that they are very
satisfied with the charts and reports produced by BI tools, because the graphs make the data intuitive,
easy to understand, and easy to evaluate and comment on. 10% of respondents felt neutral and 10% were
dissatisfied with the charts and reports, because they found the charts ordinary and similar to ones
they often encounter in daily life.

According to the survey, 70% of respondents could find the answer they needed from the graphs: 60% found
it quickly and 10% very quickly. Respondents had no further comments to add, and the majority said that
they already use BI tools and will continue to use them.

D. CONCLUSION

In this second report, we demonstrated to management how business intelligence can be applied to the
company's current business processes. The report explained the general concept of BI, introduced some
BI tools and techniques and their general applications, and presented and explained a dataset extracted
from the company's business processes. It showed how we preprocessed the data for later analysis,
explaining each step and its purpose, and presented dashboards that display our analysis of the
preprocessed data, together with the purpose of each dashboard and chart and the suggestions that follow
from the analysis. Additionally, during the demonstration, we collected feedback and comments from users
to review how well our dashboard design meets user and business requirements and what customizations
are needed for future use.

E. REFERENCES

Stedman, C. and Burns, E. (2023) What is Business Intelligence (BI)?: Definition from TechTarget,
Business Analytics. Available at: https://www.techtarget.com/searchbusinessanalytics/definition/business-
intelligence (Accessed: 26 July 2023).

Top 15 business intelligence tools (BI Tools), Mopinion. Available at: https://mopinion.com/business-
intelligence-bi-tools-overview/ (Accessed: 26 July 2023).

Adair, B. (2023) Business intelligence systems: Types of BI tools in 2023, SelectHub. Available at:
https://www.selecthub.com/business-intelligence/key-types-business-intelligence-tools/ (Accessed: 26
July 2023).

Trần, P. (2021) Business intelligence (BI) LÀ GÌ? Vai trò Của Bi Trong Doanh Nghiệp, TopDev.
Available at: https://topdev.vn/blog/business-intelligence-la-gi/ (Accessed: 23 July 2023).

5 business intelligence tools you need to know (no date) Coursera. Available at:
https://www.coursera.org/articles/bi-tools (Accessed: 26 July 2023).

Business Intelligence and Analytics Software (no date) Tableau. Available at: https://www.tableau.com/
(Accessed: 26 July 2023).

Biscobing, J. (2020) What is OLAP (online analytical processing)?: Definition from TechTarget, Data
Management. Available at: https://www.techtarget.com/searchdatamanagement/definition/OLAP
(Accessed: 26 July 2023).

Turn your data into immediate impact (no date) Data Visualisation | Microsoft Power BI. Available at:
https://powerbi.microsoft.com/en-au/ (Accessed: 26 July 2023).

sericks007 (no date) Sử dụng power bi - power platform, Power Platform | Microsoft Learn. Available at:
https://learn.microsoft.com/vi-vn/power-platform/admin/use-power-bi (Accessed: 26 July 2023).
