Lab Manual BI
Acropolis Institute of
Technology and
Research, Indore
Department of CSE
Submitted To: Dr. Mayur Rathi
(Artificial Intelligence & Machine
Learning)
Submitted By:
Harsh Khichi
Enrollment No. : 0827AL201022
Class/Year/Sem : AL_F-1/4th / 8th
The objective of this laboratory work is to provide the student with a knowledge base in
Business Intelligence and its applications, to learn how to extract knowledge from data and
information, and to learn how to draw conclusions, make predictions, and take future-oriented actions.
ACROPOLIS INSTITUTE OF TECHNOLOGY & RESEARCH,
INDORE
CERTIFICATE
This is to certify that the experimental work entered in this journal as per
the B. TECH. II year syllabus prescribed by the RGPV was done by Mr.
In this lab, students will be able to learn and develop applications using
Business Intelligence concepts. Students can expand their skill set by
deriving practical solutions using predictive analytics. Moreover, this lab
provides an understanding of the importance of various algorithms in
Data Science. A business intelligence environment offers decision makers
information and knowledge derived from data processing, through the
application of mathematical models and algorithms. The latest platforms
and compilers are provided to the students to run their programs.
GENERAL INSTRUCTIONS FOR LABORATORY CLASSES
➢ DO’S
✓ While entering into the LAB students should wear their ID cards.
✓ Students should sign in the LOGIN REGISTER before entering into the
laboratory.
✓ Students should come with observation and record note book to the laboratory.
✓ After completing the laboratory exercise, make sure to shut down the system
properly.
➢ DON'TS
Module1: Effective and timely decisions – Data, information and knowledge – Role of
mathematical models – Business intelligence architectures: Cycle of a business
intelligence analysis – Enabling factors in business intelligence projects – Development
of a business intelligence system – Ethics and business intelligence.
Module2: The business intelligence user types, Standard reports, Interactive Analysis
and Ad Hoc Querying, Parameterized Reports and Self-Service Reporting, dimensional
analysis, Alerts/Notifications, Visualization: Charts, Graphs, Widgets, Scorecards and
Dashboards, Geographic Visualization, Integrated Analytics, Considerations:
Optimizing the Presentation for the Right Message.
Module3: Efficiency measures – The CCR model: Definition of target objectives- Peer
groups – Identification of good operating practices; cross efficiency analysis – virtual
inputs and outputs – Other models. Pattern matching – cluster analysis, outlier analysis.
PREREQUISITE:-
Experience with Python is suggested. Prior knowledge of Data Science, Machine
Learning algorithms, and foundations of Mathematics is helpful.
➢ Course Objectives
The student should be made to:
➢ Course Outcomes
1. Explain the basic concepts of business intelligence & make effective and timely
decision.
2. Deal with data and information and convert it into knowledge to be stored in
business intelligence systems and used by knowledge workers.
3. Measure the efficiency using different models.
4. Apply the learned concepts in different real time applications.
5. Understand the future of business intelligence through emerging technologies.
Index

S.No  Name of the Experiment                              Date of Exp.  Page No.  Date of Submission  Grade & Sign of the Faculty
1     Import the legacy data from different sources such
      as (Excel, Sql Server, Oracle etc.) and load in
      the target system.
1 Title
Import the legacy data from different sources such as (Excel, Sql Server, Oracle etc.) and
load in the target system.
2 Neatly Drawn and labeled experimental setup
NA
3 Theoretical solution of the instant problem
3.1 Algorithm
1. Identify legacy data sources.
2. Establish connections to each data source (Excel, SQL Server, Oracle).
3. Retrieve data from each source.
4. Transform data if required (e.g., format conversion, cleansing).
5. Load transformed data into the target system.
3.2 Program
import pandas as pd
import pyodbc
import cx_Oracle

# (The middle of the original listing is missing; the fragment below is a
# minimal reconstruction of the extraction step described in the algorithm
# above. Connection details, queries and file paths are placeholders.)
def extract_oracle(user, password, dsn, query):
    with cx_Oracle.connect(user, password, dsn) as conn:
        data = pd.read_sql(query, conn)
    return pd.DataFrame(data)

# ... extract from Excel (pd.read_excel) and SQL Server (pyodbc + pd.read_sql)
# in the same way, concatenate the frames into all_data, then load:
load_data_into_target(all_data)
4 Tabulation Sheet
INPUT OUTPUT
Excel, SQL Server, Oracle data Data loaded into the target system
5 Results
Legacy data from Excel, SQL Server, and Oracle was successfully imported and loaded into the target
system.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Business Intelligence (AL801)    Title:
EVALUATION RECORD    Type / Lab Session:
Name: Harsh Khichi    Enrollment No.: 0827AL201022
Performing on: First submission / Second submission    Extra / Regular
Additional remarks:
Tutor:
1 Title
Perform the Extraction, Transformation and Loading (ETL) process to construct the database
in SQL Server / Power BI.
2 Neatly Drawn and labeled experimental setup
NA
3 Theoretical solution of the instant problem
3.1 Algorithm
1) Extraction:
• Retrieve data from various sources (e.g., files, databases).
• Use appropriate tools (e.g., SSIS for SQL Server, Power Query for Power BI).
2) Transformation:
• Cleanse and validate data.
• Perform data transformations (e.g., filtering, joining, aggregating).
• Apply business rules and logic.
3) Loading:
• Insert transformed data into the target database.
• Ensure data integrity and consistency.
• Handle errors and logging.
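The three phases above are tool-agnostic; before carrying them out in SSIS or Power Query, the same extract-transform-load flow can be sketched in a few lines of pandas. The tiny `raw` table and the CSV target below are invented for illustration:

```python
import pandas as pd

# Extraction: read raw rows from a source (an in-memory stand-in for a file or database)
raw = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "amount": ["10.5", "20.0", "20.0", None],
    "country": [" india", "USA", "USA", "uk "],
})

# Transformation: cleanse and validate (duplicates, missing values, types, business rules)
clean = raw.drop_duplicates().dropna(subset=["amount"])
clean["amount"] = clean["amount"].astype(float)
clean["country"] = clean["country"].str.strip().str.upper()

# Loading: insert the transformed data into the target (a CSV standing in for a table)
clean.to_csv("target_orders.csv", index=False)
print(clean)
```

The duplicate order line and the row with a missing amount are dropped during transformation, so only validated rows reach the target.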
3.2 Program
ETL Process in SQL Server
Step 1 − Open either BIDS\SSDT based on the version from the Microsoft SQL Server programs
group. The following screen appears.
Step 2 − The above screen shows SSDT has opened. Go to file at the top left corner in the above
image and click New. Select project and the following screen opens.
Step 3 − Select Integration Services under Business Intelligence on the top left corner in the above
screen to get the following screen.
Step 4 − In the above screen, select either Integration Services Project or Integration Services Import
Project Wizard, based on your requirement, to develop/create the package. There are two modes −
Native Mode (SQL Server Mode) and SharePoint Mode − and two models − Tabular Model (for team
and personal analysis) and Multidimensional Model (for corporate analysis).
- Power BI will load the transformed data into its internal data model, which you can then use to create
visualizations and reports.
4 Tabulation Sheet
INPUT OUTPUT
NA NA
5 Results
• Document the successful execution of the ETL process.
• Include any issues encountered and their resolutions.
• Provide insights gained from the constructed database in SQL Server/Power BI.
1 Title
Data Visualization from ETL Process
2 Neatly Drawn and labeled experimental setup
NA
3 Theoretical solution of the instant problem
3.1 Algorithm
1. Create Charts
• Drag UnitsInStock onto the canvas to create a Table visualization.
• Set ProductName as the axis and sort the table by UnitsInStock
• Drag OrderDate onto the canvas, then drag LineTotal to create a Line Chart.
• Drag ShipCountry onto the canvas to create a map.
• Set LineTotal as the values for the map.
2. Interact with Visual
• Click on a data point (e.g., the light blue circle) to filter the other visuals for that point's data.
3.2 Program
Step 1: Create charts showing Units in Stock by Product and Total Sales by Year
• Drag UnitsInStock from the Field pane (the Fields pane is along the right of the screen)
onto a blank space on the canvas. A Table visualization is created. Next, drag ProductName
to the Axis box, found in the bottom half of the Visualizations pane. Then select
Sort By > UnitsInStock using the ellipsis menu in the top right corner of the visualization.
• Drag OrderDate to the canvas beneath the first chart, then drag LineTotal (again, from the
Fields pane) onto the visual, then select Line Chart. The following visualization is created.
• Next, drag ShipCountry to a space on the canvas in the top right. Because you selected a
geographic field, a map was created automatically. Now drag LineTotal to the Values field;
the circles on the map for each country are now relative in size to the LineTotal for orders
shipped to that country.
Step 2: Interact with your report visuals to analyze further
• Click on the light blue circle centered in Canada. Note how the other visuals are filtered to
show Stock (ShipCountry) and Total Orders (LineTotal) just for Canada.
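The aggregation and cross-filtering behind these visuals can be mimicked in pandas to see what Power BI is computing; the miniature `orders` table below is invented (a stand-in for the Northwind-style data used in the walkthrough):

```python
import pandas as pd

# Hypothetical order lines
orders = pd.DataFrame({
    "ShipCountry": ["Canada", "Canada", "USA", "Germany"],
    "LineTotal": [100.0, 50.0, 200.0, 75.0],
})

# Sum LineTotal per country -- this is the value that sizes the circles on the map
by_country = orders.groupby("ShipCountry")["LineTotal"].sum()
print(by_country)

# Cross-filtering (clicking the Canada circle) corresponds to a row filter on the same data
canada = orders[orders["ShipCountry"] == "Canada"]
print("Canada total:", canada["LineTotal"].sum())
```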
4 Tabulation Sheet
INPUT OUTPUT
NA NA
5 Results
Present the results of your ETL process. This could include visualizations generated from the transformed
data. Describe the insights gained from the visualization and how it helps in understanding the data better.
1 Title
Apply the What-If Analysis for data visualization. Design and generate necessary reports
based on the data warehouse data.
2 Neatly Drawn and labeled experimental setup
NA
3 Theoretical solution of the instant problem
Assume you own a book store and have 100 books in storage. You sell a certain percentage for the
higher price of $50 and the rest for the lower price of $20. If you sell 60% at the higher price, cell D10
calculates a total profit of 60 * $50 + 40 * $20 = $3800. Create Different Scenarios: But what if you
sell 70% at the higher price? And what if you sell 80%? Or 90%, or even 100%? Each different
percentage is a different scenario. You can use the Scenario Manager to create these scenarios. Note: You
can simply type a different percentage into cell C4 to see the corresponding result in cell D10.
However, what-if analysis enables you to easily compare the results of different scenarios.
3.1 Algorithm
• Open Excel and load the data table.
• Go to Data tab and click What-If Analysis.
• Select Scenario Manager.
• Add scenarios by naming and assigning values.
• Verify scenarios and Apply scenarios to data.
• Analyze and save updated data.
• Close Excel.
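Each scenario simply re-evaluates the worksheet's profit formula (cell D10) with a different percentage in cell C4. As a cross-check, the same arithmetic in Python, using the figures from the worked example (100 books, $50 and $20 price points):

```python
def total_profit(pct_highest, books=100, high_price=50, low_price=20):
    """Profit when pct_highest percent of the books sell at the higher price."""
    sold_high = books * pct_highest / 100
    sold_low = books - sold_high
    return sold_high * high_price + sold_low * low_price

# One line per scenario, matching the Scenario Manager summary
for pct in (60, 70, 80, 90, 100):
    print(pct, total_profit(pct))
```

For 60% this reproduces the $3800 from the example; each further scenario adds $300, since every extra percentage point moves one book from the $20 price to the $50 price.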
3.2 Program
Step 1 - On the Data tab, in the Forecast group, click What-If Analysis.
Step 4 - Type a name (60% highest), select cell C4 (% sold for the highest price) for the Changing
cells and click on OK.
Step 6 - Next, add 4 other scenarios (70%, 80%, 90% and 100%). Finally, your Scenario Manager
should be consistent with the picture below:
4 Tabulation Sheet
INPUT OUTPUT
60 3800
70 4100
80 4400
5 Results
Discuss the findings and insights gained from the analysis and Interpret the results and their
implications for decision-making or further analysis.
1 Title
Implementation of Classification Algorithm in R Programming
2 Neatly Drawn and labeled experimental setup
NA
3 Theoretical solution of the instant problem
3.1 Algorithm
• Start:
• Define the rainfall data points for each month starting from January 2012.
• Create a time series object using the defined rainfall data points.
• Set the start date of the time series to January 2012 and frequency to monthly (12
months).
• Print the time series data to display the rainfall values for each month.
• Open a file to save the plot chart as an image (e.g., PNG format).
• Plot a graph of the time series, with months on the x-axis and rainfall values on the y-axis.
• Save the plotted graph as an image file.
• Close the file.
• End.
3.2 Program
Consider the annual rainfall details at a place starting from January 2012. We create an R time series
object for a period of 12 months and plot it.
# Get the data points in form of a R vector.
rainfall <-
c(799,1174.8,865.1,1334.6,635.4,918.5,685.5,998.6,784.2,985,882.8,1071)
# Convert it to a time series object.
rainfall.timeseries <-
ts(rainfall,start = c(2012,1),frequency = 12)
# Print the timeseries data.
print(rainfall.timeseries)
# Give the chart file a name.
png(file = "rainfall.png")
# Plot a graph of the time series.
plot(rainfall.timeseries)
# Save the file.
dev.off()
4 Tabulation Sheet
INPUT: 799, 1174.8, 865.1, 1334.6, 635.4, 918.5, 685.5, 998.6, 784.2, 985, 882.8, 1071

OUTPUT:
        Jan    Feb     Mar    Apr     May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2012    799.0  1174.8  865.1  1334.6  635.4  918.5  685.5  998.6  784.2  985.0  882.8  1071.0
5 Results
The provided algorithm allows us to visualize the rainfall data starting from January 2012. We
obtain a time series plot depicting the variation in rainfall over the 12-month period.
1 Title
Practical Implementation of Decision Tree using R Tool
2 Neatly Drawn and labeled experimental setup
NA
3 Theoretical solution of the instant problem
3.1 Algorithm
Using ctree() Function from the party Package
1. Load the Required Packages:
• Load the "party" package, which contains the ctree() function.
2. Prepare Input Data:
• Load or create the dataset containing variables like "nativeSpeaker", "age",
"shoeSize", and "score".
• Subset the dataset if needed.
3. Create the Decision Tree:
• Use the ctree() function:
• Specify the formula: nativeSpeaker ~ age + shoeSize + score.
• Provide the input data using the 'data' parameter.
4. Plot the Tree:
• Generate a graphical representation of the decision tree using the plot() function.
• Optionally, save the tree visualization as an image file.
3.2 Program
Input Data
We will use the R built-in data set named readingSkills to create a decision tree. It contains the
variables "age", "shoeSize" and "score" for each person, along with whether the person is a native
speaker or not; the tree predicts nativeSpeaker from the other three variables.
We will use the ctree() function to create the decision tree and see its graph.
# Load the party package. It will automatically load other
# dependent packages.
library(party)
# Create the input data frame.
input.dat <- readingSkills[c(1:105),]
# Give the chart file a name.
png(file = "decision_tree.png")
# Create the tree.
output.tree <- ctree( nativeSpeaker ~ age + shoeSize + score, data = input.dat)
# Plot the tree.
plot(output.tree)
# Save the file.
dev.off()
Output:-
null device 1
Loading required package: methods
Loading required package: grid
Loading required package: mvtnorm
Loading required package: modeltools
Loading required package: stats4
Loading required package: strucchange
Loading required package: zoo
Attaching package: ‘zoo’
The following objects are masked from ‘package:base’: as.Date, as.Date.numeric
Loading required package: sandwich
4 Tabulation Sheet
INPUT OUTPUT
NA NA
5 Results
Upon executing the algorithm to create and visualize the decision tree using the ctree()
function from the party package in R, we obtained the following outcome:
1 Title
K-Means Clustering Using R
2 Neatly Drawn and labeled experimental setup
NA
3 Theoretical solution of the instant problem
3.1 Algorithm
1. Initialize centroids:
- Randomly select k data points from the dataset as initial centroids.
2. Iterate:
a. Assign each data point to its nearest centroid.
b. Recompute each centroid as the mean of the points assigned to it.
c. Check convergence:
- If centroids do not change significantly or a maximum number of iterations is reached,
exit.
3. Output:
- Return the final cluster centroids and cluster assignments.
3.2 Program
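The page carrying the original program listing is missing from this copy. As a stand-in, here is a minimal k-means run, shown in Python with scikit-learn rather than the R tool named in the title; the toy 2-D points are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D dataset: two well-separated blobs of three points each
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])

# Fit k-means with k = 2 clusters
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print("Centroids:\n", kmeans.cluster_centers_)
print("Assignments:", kmeans.labels_)
```

The first three points receive one label and the last three the other, and the two centroids land near (1, 1) and (8, 8), matching the algorithm's description of centroids and cluster assignments.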
4 Tabulation Sheet
INPUT OUTPUT
NA NA
5 Results
After running the k-means clustering algorithm on the provided dataset with the chosen number of
clusters, you will obtain the following results:
• Cluster centroids: These are the final centroids of each cluster, representing the center points
around which data points in each cluster are grouped.
• Cluster assignments: Each data point is assigned to one of the clusters based on its proximity to the
centroids. The cluster assignments indicate which cluster each data point belongs to.
These results help in understanding how the data points are grouped into clusters and the central
tendencies of each cluster represented by their centroids.
1 Title
Prediction Using Linear Regression
2 Neatly Drawn and labeled experimental setup
NA
3 Theoretical solution of the instant problem
3.1 Algorithm
1. Import necessary libraries: NumPy, Pandas, Matplotlib, and Scikit-learn.
2. Load the dataset into a Pandas DataFrame.
3. Preprocess the data: handle missing values and encode categorical variables, if any.
4. Split the data into training and testing sets.
5. Create a linear regression model object.
6. Train the model using the training dataset.
7. Make predictions on the testing dataset.
8. Evaluate the model's performance using appropriate metrics such as Mean Squared
Error (MSE) or R-squared.
9. Plot the actual vs. predicted values to visualize the model's performance.
3.2 Program
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Example dataset
data = {'X': [1, 2, 3, 4, 5],
'Y': [2, 4, 5, 4, 5]}
df = pd.DataFrame(data)
# Splitting the dataset into independent (X) and dependent (Y) variables
X = df[['X']]
Y = df['Y']
# Splitting into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
# Creating and training the linear regression model
model = LinearRegression()
model.fit(X_train, Y_train)
# Making predictions
Y_pred = model.predict(X_test)
# Evaluating the model
mse = mean_squared_error(Y_test, Y_pred)
print("Mean Squared Error:", mse)
# Plotting actual vs. predicted values
plt.scatter(X, Y, label="Actual")
plt.plot(X, model.predict(X), color="red", label="Regression line")
plt.legend()
plt.show()
4 Tabulation Sheet
INPUT OUTPUT
X Y
1 2
2 4
3 5
4 4
5 5
5 Results
• Mean Squared Error: [calculated value]
• The graph shows the relationship between the independent variable (X) and the dependent variable (Y),
along with the regression line indicating the model's predictions.
1 Title
Data Analysis using Time Series Analysis
2 Neatly Drawn and labeled experimental setup
NA
3 Theoretical solution of the instant problem
3.1 Algorithm
Steps:
1. Start
2. If the 'data' parameter is not provided, return an error message indicating missing data.
3. If the 'start' parameter is not provided, assume the start time as 1.
4. If the 'end' parameter is not provided, assume the end time as the length of the data.
5. If the 'frequency' parameter is not provided, assume the default frequency as 1.
6. Create the time series object using the ts() function with the provided parameters.
7. Store the created time series object in the variable 'timeseries.object.name'.
8. Return the 'timeseries.object.name'.
9. End
3.2 Program
Consider the annual rainfall details at a place starting from January 2012. We create an R time series
object for a period of 12 months and plot it.
# Get the data points in form of a R vector.
rainfall <-
c(799,1174.8,865.1,1334.6,635.4,918.5,685.5,998.6,784.2,985,882.8,1071)
# Convert it to a time series object.
rainfall.timeseries <-
ts(rainfall,start = c(2012,1),frequency = 12)
# Print the timeseries data.
print(rainfall.timeseries)
# Give the chart file a name.
png(file = "rainfall.png")
# Plot a graph of the time series.
plot(rainfall.timeseries)
# Save the file.
dev.off()
4 Tabulation Sheet
INPUT: 799, 1174.8, 865.1, 1334.6, 635.4, 918.5, 685.5, 998.6, 784.2, 985, 882.8, 1071

OUTPUT:
        Jan    Feb     Mar    Apr     May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2012    799.0  1174.8  865.1  1334.6  635.4  918.5  685.5  998.6  784.2  985.0  882.8  1071.0
5 Results
The provided algorithm allows us to visualize the rainfall data starting from January 2012. We
obtain a time series plot depicting the variation in rainfall over the 12-month period.
1 Title
Data Modelling and Analytics with Pivot Table in Excel.
2 Neatly Drawn and labeled experimental setup
NA
3 Theoretical solution of the instant problem
3.1 Algorithm
• Identify the dataset: Gather the data you want to analyze using a pivot table.
• Open Excel: Launch Microsoft Excel on your computer.
• Insert Pivot Table: Go to the "Insert" tab and click on "PivotTable".
• Select Data Range: Choose the range of data you want to analyze.
• Design Pivot Table: Drag and drop fields into the Rows, Columns, and Values areas
to design your pivot table.
• Customize Pivot Table: Apply filters, sort data, and format as needed.
• Analyze Data: Use the pivot table to summarize and analyze your data effectively.
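The Rows/Columns/Values design in the steps above has a direct analogue in pandas (`pivot_table`), which can be handy for checking what Excel computes; the small `sales` table here is invented for illustration:

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    "Region": ["East", "East", "West", "West"],
    "Product": ["A", "B", "A", "B"],
    "Units": [10, 5, 7, 3],
})

# Rows = Region, Columns = Product, Values = sum of Units
pivot = pd.pivot_table(sales, index="Region", columns="Product",
                       values="Units", aggfunc="sum")
print(pivot)
```

Dragging a field to Rows corresponds to `index`, to Columns corresponds to `columns`, and to Values (with an aggregation) corresponds to `values`/`aggfunc`.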
3.2 Program
A Data Model is created automatically when you import two or more tables simultaneously from a
database. The existing database relationships between those tables are used to create the Data Model
in Excel.
Step 1 − Open a new blank Workbook in Excel.
Step 2 − Click on the DATA tab.
Step 3 − In the Get External Data group, click on the option From Access. The Select Data Source
dialog box opens.
Step 4 − Select Events.accdb, Events Access Database file.
Step 5 − The Select Table window, displaying all the tables found in the database, appears.
Step 6 − Tables in a database are similar to the tables in Excel. Check the ‘Enable selection of
multiple tables’ box, and select all the tables. Then click OK.
Step 7 − The Import Data window appears. Select the PivotTable Report option. This option imports
the tables into Excel and prepares a PivotTable for analyzing the imported tables. Notice that the
checkbox at the bottom of the window - ‘Add this data to the Data Model’ is selected and disabled.
Step 8 − The data is imported, and a PivotTable is created using the imported tables.
Step 9 − Click the dropdown list button to the right of the Column labels.
Step 10 − Select Value Filters and then select Greater Than…
Step 11 − Click OK.
The PivotTable now displays only those regions which have more than 80 total medals.
4 Tabulation Sheet
INPUT OUTPUT
NA NA
5 Results
• Present the analyzed data with key insights and findings.
• Use charts or graphs to visualize the data if necessary.