
Revision

The document explains how to construct and interpret histograms, focusing on the distribution of quantitative data and key statistical concepts such as normal distribution, skewness, and kurtosis. It provides detailed instructions for creating histograms and analyzing data using Excel, including calculating Z-scores and performing Chi-square tests. Additionally, it discusses the relationship between GDP growth and FDI, highlighting potential correlations and outliers based on scatter plot analysis.

Uploaded by

Garima
Copyright
© © All Rights Reserved

A histogram is a visual representation of the distribution of quantitative data.

● To construct a histogram:
- first "bin" (or "bucket") the range of values, that is, divide the entire
range of values into a series of intervals;
- then count how many values fall into each interval.
● A histogram is the most commonly used graph for showing frequency
distributions.

TYPICAL HISTOGRAM SHAPES AND WHAT THEY MEAN


●​ Normal Distribution
Normal distribution, also known as the Gaussian distribution, is a probability
distribution that is symmetric about the mean, showing that data near the mean are
more frequent in occurrence than data far from the mean. The normal distribution
appears as a "bell curve" when graphed.

Key Takeaways
●​ The normal distribution is the proper term for a probability bell curve.

●​ Normal distributions are symmetrical, but not all symmetrical distributions are
normal.

● Other distributions can look similar to the normal distribution, so
statistical tests (not visual inspection alone) are needed to check whether
data are consistent with a normal distribution.

● Skewness
Skewness is a measure of the asymmetry of a distribution. It measures how far
the distribution of a random variable deviates from a symmetric distribution,
such as the normal distribution. A normal distribution has no skewness, as it
is symmetrical on both sides. A curve is regarded as skewed if its bulk is
shifted towards the right or the left, leaving a longer tail on the other side.

1. Positive Skewness
If the bulk of the distribution is shifted to the left, with its tail on the
right side, it is a positively skewed distribution. It is also called a
right-skewed distribution.

2. Negative Skewness
If the bulk of the distribution is shifted to the right, with its tail on the
left side, it is a negatively skewed distribution. It is also called a
left-skewed distribution.

● Kurtosis
Kurtosis is a statistical measure that describes the peak and tails of a
distribution. When normally distributed data are plotted on a graph, they
generally take the form of a bell, called the bell curve. The plotted data
that are farthest from the mean form the tails on each side of the curve.
Kurtosis indicates how much data resides in the tails.
Types of kurtosis
1. Leptokurtic: a curve with a higher peak than the normal curve. The items
are heavily concentrated near the center.
2. Platykurtic: a curve with a lower (flatter) peak than the normal curve.
The items are less concentrated near the center.
3. Mesokurtic: a curve with the same peak as the normal curve. For the normal
curve the data are distributed evenly around the center value (mean), and the
mean, median, and mode are equal.
● Double-Peaked or Bimodal
The bimodal distribution looks like the back of a two-humped camel. It arises
when the outcomes of two processes with different distributions are combined
in one set of data.

Steps

1. Count the total number of data points (n)
2. Find the maximum value of the data
3. Find the minimum value of the data
4. Range = Max. value – Min. value
5. Number of class intervals = 1 + 3.322 × log10(n) (Sturges' formula)
6. Interval width = Range / number of class intervals
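The steps above can be sketched in R as follows. This is only an illustration: the sample values are hypothetical, and rounding the Sturges result up to the next whole number is an assumption, since the notes do not say how to round.

```r
# Hypothetical sample of 20 values; replace with the real data column.
values <- c(12, 15, 17, 21, 22, 25, 28, 30, 31, 35,
            38, 40, 44, 47, 52, 60, 63, 66, 71, 75)

n <- length(values)                      # step 1: number of data points
data_range <- max(values) - min(values)  # steps 2-4: range = max - min
k <- ceiling(1 + 3.322 * log10(n))       # step 5: Sturges' formula, rounded up (assumption)
width <- data_range / k                  # step 6: interval width

# Class boundaries from the minimum upwards, then count values per class.
breaks <- min(values) + width * (0:k)
breaks[k + 1] <- max(values)             # guard against floating-point rounding
freq <- table(cut(values, breaks = breaks, include.lowest = TRUE))
print(freq)
```

The resulting table plays the same role as the output of Excel's FREQUENCY function for the same class boundaries.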

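Excel

Class intervals (bins)

1. First bin = minimum value
2. Each following bin = previous bin + $interval width (the interval width
cell is fixed as an absolute reference)

Frequency

1. =FREQUENCY(data_array, bins_array)
2. Select the whole output range and confirm with Ctrl+Shift+Enter (array
formula)

Plot histogram

Select the frequency column – Insert – 2-D Column chart

Select the columns – right click – Format Data Series – Gap Width (0)

Colour – Border – Solid line – Black

Arrange the data – right click – Select Data – Edit – set the horizontal axis
labels to the class intervals

Add – Data Labels

Add – Axis Titles

Normal curve value for each data point:
=NORM.DIST(B2,$Mean,$STDEV,FALSE)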

Chi Square Normality test


1. You have to compute the Minimum, Maximum, Standard Deviation,
Mean, Bin Interval, Observed Frequency (O), Normal Distribution:
P(x<=b), Cumulative Probability/Normal Distribution: P(a<x<=b),
Expected Frequency (E) and Chi Square of the input data (given
dataset)
i). Min. =MIN(A2:A215)
ii). Max. =MAX(A2:A215)
iii). Standard Deviation =STDEV.S(A2:A215)
iv). Mean =AVERAGE(A2:A215)
v). Bin: in the first box, put the Minimum value, i.e. type =B2
In the second box of ‘Bin’ type =B2+5, press Enter and drag down until the bin
value is just above the ‘Max.’ value, i.e., 97.9356 (as per this Excel dataset)

vi). Observed Frequency (O): FREQUENCY(Data_array, Bins_array)
=FREQUENCY(A2:A215,F2:F22)
Select the output cells next to the bins and press Ctrl+Shift+Enter so that the
function is entered as an array formula and fills down to the last ‘Bin’ value
Then, sum the Observed Frequency

vii). Normal Distribution: P(x<=b) = NORM.DIST(x, Mean, Standard Deviation, TRUE)
Note: type =NORM.DIST( and then, for x, click the first value of ‘Bin’; for
mean, click the mean value and press F4; for Standard Deviation, click the
Standard Deviation value and press F4; for cumulative, choose TRUE.
F4 fixes the reference to that cell (absolute reference), so it does not shift
when the formula is dragged down
=NORM.DIST(F2,$E$2,$D$2,TRUE)
Press Enter and drag down till the last ‘Bin’ value

viii). Cumulative Probability/Normal Distribution: P(a<x<=b)
First Box
=H2 (the first value of Normal Distribution)
Press Enter
Second Box
=H3-H2
Press Enter and drag down till the last ‘Bin’ value
ix). Expected Frequency (E): multiply the value of Cumulative
Probability/Normal Distribution: P(a<x<=b) by the total Sum of
Observed Frequency (O), which is 214 in this case.
=I2*$G$23 (the total Sum of Observed Frequency is fixed with F4)
Press Enter and drag down till the last ‘Bin’ value
Then, sum the Expected Frequency (E)

x). Chi Square

χ² = (Observed Frequency – Expected Frequency)² / Expected Frequency

=(G2-J2)^2/J2
Press Enter and drag down till the last ‘Bin’ value
Then, sum the Chi Square (χ²) values

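2. Compute the Degrees of freedom, the Critical Chi Square (χ²) value and the
p-value using the following Excel formulas

i). Degrees of freedom: the number of Bin intervals less 1
=COUNT(F2:F22)-1

ii). Critical Chi Square (χ²): CHIINV(Probability, Deg_freedom)
=CHIINV(0.05,K24)

iii). p-value: CHIDIST(x, Deg_freedom)
=CHIDIST(K23,K24)

If the summed Chi Square is greater than the critical value (equivalently, if
the p-value is below 0.05), the hypothesis that the data are normally
distributed is rejected.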
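The Chi Square normality procedure above can be mirrored in R. The sketch below is illustrative only: the data are simulated (hypothetical), the bin width of 5 follows the notes, and qchisq/pchisq are used as the R counterparts of CHIINV/CHIDIST (both Excel functions work with the right tail).

```r
set.seed(1)
x <- rnorm(214, mean = 70, sd = 10)        # hypothetical data, 214 values as in the notes

# Upper bin edges: start at the minimum, step by 5 until just past the maximum.
bins <- seq(min(x), max(x) + 5, by = 5)

# Observed frequency per bin; intervals are (a, b], like Excel's FREQUENCY.
observed <- as.vector(table(cut(x, breaks = c(-Inf, bins))))

# P(x <= b) for each bin edge, then P(a < x <= b) by differencing.
cum_p <- pnorm(bins, mean = mean(x), sd = sd(x))
interval_p <- c(cum_p[1], diff(cum_p))
expected <- interval_p * length(x)         # expected frequency per bin

chi_sq <- sum((observed - expected)^2 / expected)
df <- length(bins) - 1                     # number of bins less 1, as in the notes
critical <- qchisq(0.95, df)               # same role as CHIINV(0.05, df)
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)  # same role as CHIDIST(chi_sq, df)

cat("Chi square:", chi_sq, " critical:", critical, " p-value:", p_value, "\n")
```

Comparing chi_sq with critical, or p_value with 0.05, gives the same decision either way.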

Line chart
Q. The time series of Market capitalization (% of GDP) of India is given for
the years 2000 through 2022 in the table.
Using visualization tools in Excel, prepare a line chart to (a) compute the
Trend values, (b) depict the Trend values as a secondary line in the line
chart, and (c) interpret the visual presentation in brief.

Instruction
1. Compute Trend Values
• Select C2:C24 and type =TREND(B2:B24,A2:A24), where column B holds the
market capitalization values (known y's) and column A the years (known x's).
• Keep the cursor at the end of the formula in the formula bar and press
Ctrl+Shift+Enter to get all the Trend values.

2. Insert the Line Chart
• Select A1 to C24
• Navigate to the Insert tab, then click on Recommended Charts >
All Charts
• Click on Line
• Select the line chart with two lines
3. Customize the Line Chart
Add a chart title: Market Capitalization (% GDP) of India Across the
Years 2000-2022
Add Axis Titles: X-axis as Years, Y-axis as Market Capitalization
4. Interpret the visual presentation (in brief)
Interpretation sample: The chart shows India's market capitalization as
a percentage of GDP from 2000 to 2022, highlighting volatility with a
peak in 2007 and fluctuations during economic crises. The trend line
indicates steady long-term growth. Post-2020, actual market
capitalization stays above the trend, reflecting stronger market
performance relative to GDP.
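For comparison, a linear trend like the one Excel's TREND returns can be sketched in R with a least-squares fit. The market capitalization values below are simulated placeholders (hypothetical), not the figures from the exercise.

```r
# Hypothetical stand-in for the market capitalization (% of GDP) column.
year <- 2000:2022
set.seed(42)
market_cap <- 40 + 2 * (year - 2000) + rnorm(length(year), sd = 8)

fit <- lm(market_cap ~ year)     # ordinary least-squares straight line
trend <- unname(fitted(fit))     # fitted values, one per year

result <- data.frame(year, market_cap, trend)
print(head(result))

# To draw the chart: plot(year, market_cap, type = "l"); lines(year, trend, lty = 2)
```

The trend column corresponds to the secondary line in the Excel chart; actual values above it indicate years that outperform the long-run linear trend.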
1. To calculate the Z-scores for the dataset in Excel, you can follow these
steps:
1. Calculate the Mean (Average):
o Use the AVERAGE function to calculate the mean of the dataset.
=AVERAGE(Data_Range)
2. Calculate the Standard Deviation:
o Use the STDEV.S function to calculate the sample standard deviation of
the dataset (use STDEV.P instead if the data cover the whole population).
=STDEV.S(Data_Range)
3. Calculate the Z-Score for Each Data Point:
=(x-Mean)/Standard Deviation
(Note: fix the Mean and Standard Deviation cells with F4)
• Drag this formula down the whole cell range to calculate the Z-scores for
the entire dataset
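The same Z-score calculation can be sketched in R. The values are hypothetical, and sd() is the sample standard deviation, matching STDEV.S.

```r
x <- c(4.2, 5.1, 6.3, 7.0, 8.4, 9.9)   # hypothetical data

m <- mean(x)           # same role as AVERAGE
s <- sd(x)             # sample standard deviation, same role as STDEV.S
z <- (x - m) / s       # Z-score for each data point

print(round(z, 2))
```

By construction the Z-scores have mean 0 and standard deviation 1, which is a quick check that the formula was applied correctly.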
2. Steps to Apply Conditional Formatting for Outliers:
1. Select the Z-Score Range:
o Go to the Home tab in the Excel ribbon.
o Click on Conditional Formatting > New Rule > Use a formula to
determine which cells to format
o In “Edit the Rule Description”, under “Format values where this formula
is true”, type:
=OR(L1>3, L1<-3)
(note: L1 is the first value of Z-score)
2. Set the Formatting Style:
o Click the Format button to choose how you want to highlight the
outliers (e.g., fill color) and click OK
3. Apply the Rule:
o Click OK to apply the rule.
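The rule =OR(z>3, z<-3) amounts to flagging every value whose Z-score is more than 3 standard deviations from the mean. A minimal R sketch with hypothetical data:

```r
# Hypothetical data: fifteen values near 50 and one extreme value.
x <- c(48, 50, 52, 49, 51, 50, 47, 53, 50, 49, 51, 50, 52, 48, 50, 120)

z <- (x - mean(x)) / sd(x)
is_outlier <- z > 3 | z < -3     # same condition as =OR(z>3, z<-3)

print(x[is_outlier])
```

With very small samples no Z-score can exceed 3, so this rule only becomes useful once the dataset has a reasonable number of observations.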
Histogram and Normal Distribution Instruction
I). Histogram
1. Find the Minimum value of the given dataset
Minimum = MIN(Data Range)
2. Find the Maximum value of the given dataset
Maximum = MAX(Data Range)
3. Create a Bin Interval of the dataset
Bin: in the first box, put the Minimum value, i.e. type =C2
In the second box of ‘Bin’ type =C2+1, press Enter and drag down to the
‘Max.’ value, or just above the Maximum value (as per this Excel dataset)
4. Create a Frequency Distribution
=FREQUENCY(Data_array, Bins_array)
Select F2:F22 and type =FREQUENCY(B2:B215,E2:E22)
With the cursor at the end of the function in the formula bar, press
Ctrl+Shift+Enter (not just Enter) to enter it as an array formula.
5. Create the Histogram Chart
Select the Bins column and the Frequency column
Go to Insert > Charts > Column Chart > Clustered Column

Format the chart:
• Add axis titles and a chart title
• Name the vertical axis Frequency and the horizontal axis Bin Interval
• Chart title: GDP growth (2000 – 2023)
II). Normal Distribution

1. Find the Mean value
=AVERAGE(data_range)
2. Find the Standard Deviation value
=STDEV.S(data_range)
3. Normal Distribution: NORM.DIST(x, Mean, Standard Deviation, FALSE)
Note: type =NORM.DIST( and then, for x, click the first value of the data
range; for mean, click the mean value and press F4; for Standard Deviation,
click the Standard Deviation value and press F4; for cumulative, choose FALSE,
so that the function returns the height of the bell curve at x rather than the
cumulative probability P(x<=b).
F4 fixes the reference to that cell (absolute reference)
=NORM.DIST(B2,mean,STDEV,FALSE)
Press Enter and drag down to the last data value
4. Create the Normal Distribution Chart
Select the GDP column and the Normal Distribution column
Go to Insert > Charts > X Y (Scatter) > Scatter > Click OK

Format the chart:
• Add the chart title GDP Growth (2000 – 2023)
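The difference between the two modes of NORM.DIST can be seen in R, where dnorm gives the curve height (cumulative = FALSE) and pnorm gives the cumulative probability (cumulative = TRUE). The growth figures here are hypothetical.

```r
growth <- c(-1.2, 0.8, 1.9, 2.5, 3.1, 3.8, 4.4, 5.6, 6.3, 7.9)  # hypothetical GDP growth values

m <- mean(growth)
s <- sd(growth)

density <- dnorm(growth, mean = m, sd = s)      # like NORM.DIST(x, mean, sd, FALSE)
cumulative <- pnorm(growth, mean = m, sd = s)   # like NORM.DIST(x, mean, sd, TRUE)

print(data.frame(growth, density = round(density, 4), cumulative = round(cumulative, 4)))
```

Plotting density against growth gives the bell shape used in the scatter chart, whereas the cumulative column rises steadily from 0 towards 1.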
Interpretation of relationship between GDP per capita growth and FDI
To interpret the relationship between GDP growth per capita and FDI to GDP
from the scatter plot created from the provided Excel data, we need to analyze
two variables:
1. FDI_INFLOW (2000-23): This represents the Foreign Direct
Investment (FDI) as a percentage of GDP.
2. AVG_GDP (2000-23): This represents the average GDP growth per
capita over the same period.

Steps to Interpret the Scatter Plot:

1. Plot the Data:
o On the x-axis, plot FDI_INFLOW (2000-23).
o On the y-axis, plot AVG_GDP (2000-23).
o Each point on the scatter plot represents a country.

Possible Interpretations:
1. Positive Correlation:
• Pattern: Data points trend upward from left to right.
• Interpretation: Countries with higher FDI inflows (as a percentage of
GDP) tend to have higher GDP growth per capita.
• Example: If countries like Ireland (high FDI and high GDP growth) and
Singapore (high FDI and high GDP growth) are clustered in the
top-right corner, this suggests a positive relationship.
2. Negative Correlation:
• Pattern: Data points trend downward from left to right.
• Interpretation: Countries with higher FDI inflows tend to have lower
GDP growth per capita.
• Example: If countries like Kuwait (low GDP growth despite moderate
FDI) or United Arab Emirates (negative GDP growth) are in the
bottom-right corner, this could indicate a negative relationship.
3. No Correlation:
• Pattern: Data points are scattered randomly with no clear trend.
• Interpretation: There is no apparent relationship between FDI inflows
and GDP growth per capita.
• Example: If countries like Japan (low FDI, moderate GDP growth) and
Brazil (moderate FDI, moderate GDP growth) are spread randomly,
this suggests no correlation.
4. Outliers:
• Pattern: Data points that fall far outside the general trend.
• Interpretation: Outliers may indicate unique cases or errors in data.
• Example: Hong Kong SAR, China (extremely high FDI but moderate
GDP growth) or United Arab Emirates (negative GDP growth despite
moderate FDI) could be outliers.

Key Observations from the Data:

1. High FDI, High GDP Growth:
o Countries like Ireland (FDI: 17.27%, GDP growth: 4.58%) and
Singapore (FDI: 23.72%, GDP growth: 3.85%) combine high FDI with
high GDP growth, which is consistent with a positive relationship
between FDI and GDP growth.
2. Moderate FDI, Moderate GDP Growth:
o Countries like Poland (FDI: 3.77%, GDP growth: 4.08%) and
Hungary (FDI: 13.51%, GDP growth: 3.03%) sit in the middle of the
plot, broadly in line with the positive trend.
3. Low FDI, Low GDP Growth:
o Countries like Japan (FDI: 0.41%, GDP growth: 1.25%) and
Greece (FDI: 1.39%, GDP growth: 0.80%) show lower FDI and
lower GDP growth.
4. Outliers:
o Hong Kong SAR, China (FDI: 31.05%, GDP growth: 2.64%) has
extremely high FDI but moderate GDP growth, which may
indicate diminishing returns or other factors at play.
o United Arab Emirates (FDI: 3.32%, GDP growth: -0.04%) has
negative GDP growth despite moderate FDI, which could be due
to external economic factors.

Conclusion:
• The scatter plot likely shows a weak to moderate positive correlation
between FDI inflows and GDP growth per capita. Countries with
higher FDI tend to have higher GDP growth, but there are exceptions
(outliers) where other factors may influence GDP growth.
• Further analysis (e.g., regression analysis) could quantify the strength of
this relationship and identify other influencing factors.
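As a rough illustration of the further analysis mentioned above, the eight country figures quoted in these notes can be put through a correlation and a simple regression in R. This uses only the countries named here, not the full Excel dataset, so the result is indicative at best.

```r
# Figures quoted in the notes above (FDI as % of GDP, average GDP growth per capita in %).
country <- c("Ireland", "Singapore", "Poland", "Hungary",
             "Japan", "Greece", "Hong Kong SAR", "United Arab Emirates")
fdi <- c(17.27, 23.72, 3.77, 13.51, 0.41, 1.39, 31.05, 3.32)
gdp <- c(4.58, 3.85, 4.08, 3.03, 1.25, 0.80, 2.64, -0.04)

r <- cor(fdi, gdp)                 # Pearson correlation coefficient
fit <- lm(gdp ~ fdi)               # simple linear regression of growth on FDI
r2 <- summary(fit)$r.squared       # share of variance explained

cat("Correlation:", round(r, 3), " R-squared:", round(r2, 3), "\n")
print(coef(fit))
```

For a simple regression with one predictor, R-squared equals the squared correlation, so either number can be used to describe the strength of the linear relationship.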

BOXPLOT

Step 1: Select the Data

• Highlight the entire data range, including the header.

Step 2: Insert the BoxPlot

• Go to the Insert tab in the ribbon.
• In the Charts group, look for the Insert Statistic Chart option (this icon
looks like a box with whiskers).
• Click the Insert Statistic Chart dropdown and choose Box and Whisker.

Step 3: Format the BoxPlot

• Once the BoxPlot is inserted, you can format it by:
• Clicking on the chart to open the Chart Tools menu.
• Using the Chart Elements button (the "+" sign) to toggle data labels,
gridlines and the legend.
1. Calculate Quartiles (Q1, Q3):
• Use the QUARTILE.INC function to calculate the first quartile (Q1)
and the third quartile (Q3).
o Q1 (25th percentile):
=QUARTILE.INC(Data_Range,1)
o Q3 (75th percentile):
=QUARTILE.INC(Data_Range,3)

2. Calculate the Interquartile Range (IQR):

• Subtract Q1 from Q3 to get the IQR:
IQR = Q3 - Q1
3. Determine the Lower and Upper Bounds for Outliers:
• Lower Bound = Q1 - 1.5 * IQR
• Upper Bound = Q3 + 1.5 * IQR
• Use Excel formulas to calculate these:
o Lower Bound:
= Q1 - 1.5 * IQR
o Upper Bound:
= Q3 + 1.5 * IQR

Use Conditional Formatting to Highlight Outliers

• Select the data range
• Go to Home > Conditional Formatting > New Rule.
• Choose "Use a formula to determine which cells to format".
• Enter the formula (with the bounds given as fixed cell references):
=OR(A1<Lower Bound, A1>Upper Bound)
• Set the formatting (e.g., fill color) to highlight outliers.
• Click OK.
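The IQR rule above can be checked in R. The data are hypothetical; R's default quantile method (type 7) uses the same interpolation as QUARTILE.INC.

```r
x <- c(12, 14, 15, 15, 16, 17, 18, 19, 20, 45)   # hypothetical data

q1 <- unname(quantile(x, 0.25))   # first quartile, like QUARTILE.INC(range, 1)
q3 <- unname(quantile(x, 0.75))   # third quartile, like QUARTILE.INC(range, 3)
iqr <- q3 - q1

lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

outliers <- x[x < lower | x > upper]   # same condition as the conditional-formatting rule
print(c(Q1 = q1, Q3 = q3, IQR = iqr, lower = lower, upper = upper))
print(outliers)
```

Values flagged this way are the same points a Box and Whisker chart draws beyond the whiskers.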

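data <- c(10, 20, 30, 40, 50)

range(data)        # minimum and maximum
var(data)          # sample variance
sd(data)           # sample standard deviation
IQR(data)          # interquartile range
data(iris)         # load the built-in iris dataset
head(iris)         # first six rows
tail(iris)         # last six rows
dim(iris)          # number of rows and columns
attributes(iris)   # names, class and row names
typeof(iris)       # storage type of the data frame

x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)

cov(x, y)                         # covariance (Pearson by default)
cov(x, y, method = "pearson")
cov(x, y, method = "kendall")
cov(x, y, method = "spearman")
cor(x, y)                         # correlation (Pearson by default)
cor(x, y, method = "pearson")
cor(x, y, method = "kendall")
cor(x, y, method = "spearman")

r_squared <- cor(x, y)^2          # coefficient of determination for x and y
r_squared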
install.packages(c("tm", "tidytext", "dplyr", "ggplot2", "textdata", "stringr", "wordcloud"))
library(tm)
library(tidytext)
library(dplyr)
library(wordcloud)      # provides wordcloud()
library(RColorBrewer)   # provides brewer.pal()
# creating a wordcloud
text_data <- readLines("C:/Users/Student/Downloads/feedback.txt")
corpus <- Corpus(VectorSource(text_data))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
tdm <- TermDocumentMatrix(corpus)
matrix <- as.matrix(tdm)
word_freqs <- sort(rowSums(matrix), decreasing = TRUE)
word_freqs_df <- data.frame(word = names(word_freqs), freq = word_freqs)
wordcloud(words = word_freqs_df$word,
          freq = word_freqs_df$freq,
          min.freq = 2,
          max.words = 100,
          random.order = FALSE,
          colors = brewer.pal(8, "Dark2"))

# Text mining, categorization, sentiment analysis

# Sentiment analysis (uses tidytext, dplyr and ggplot2)
install.packages("tidyverse")
library(tidytext)
library(dplyr)
library(ggplot2)
text_data <- data.frame(
  id = c(1, 2, 3),
  text = c("I love R programming!",
           "The weather today is terrible.",
           "I'm feeling okay about my progress."),
  stringsAsFactors = FALSE
)
# One row per word
text_data <- text_data %>%
  unnest_tokens(word, text)
# Keep only words that appear in the Bing sentiment lexicon
sentiments <- get_sentiments("bing")
text_with_sentiments <- text_data %>%
  inner_join(sentiments, by = "word")
sentiment_counts <- text_with_sentiments %>%
  count(sentiment)
ggplot(sentiment_counts, aes(x = sentiment, y = n, fill = sentiment)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  labs(title = "Sentiment Analysis", x = "Sentiment", y = "Count")

# Text mining

text_data <- readLines("C:/Users/nimmy/Downloads/feedback.txt")   # one document per line

# Build the corpus from the character vector so that each line is one document
corpus <- Corpus(VectorSource(text_data))
corpus <- tm_map(corpus, content_transformer(tolower))    # Convert to lowercase
corpus <- tm_map(corpus, removePunctuation)               # Remove punctuation
corpus <- tm_map(corpus, removeNumbers)                   # Remove numbers
corpus <- tm_map(corpus, removeWords, stopwords("en"))    # Remove stop words
corpus <- tm_map(corpus, stripWhitespace)                 # Remove extra whitespace

text_data <- data.frame(text = sapply(corpus, as.character),
                        stringsAsFactors = FALSE)
tokens <- text_data %>% unnest_tokens(word, text)         # Requires tidytext
word_counts <- tokens %>%
  count(word, sort = TRUE)
wordcloud(words = word_counts$word,
          freq = word_counts$n,
          min.freq = 2,
          max.words = 100,
          random.order = FALSE,
          colors = brewer.pal(8, "Dark2"))

Power Query in Excel

1. Access Power Query


Power Query is available in Excel under the Data tab.
Steps:
• Open Microsoft Excel
• Go to the Data tab

• Look for the Get & Transform Data group

• Click on Get Data to access Power Query

2. Connect to a Data Source


You can import data from various sources like Excel files, databases, web
pages, and more.
Steps:
• Click Get Data
• Choose a data source (From Excel Workbook, From Text/CSV, From
Database, From Web, etc.)
• Browse and select your data file
• Click Import

3. Load Data into Power Query Editor


Once connected, the data appears in the Power Query Editor.
Steps:
• The Navigator window appears (for some sources, e.g., Excel or
databases)
• Select the specific Table or Sheet you want to import

• Click Transform Data to open it in Power Query Editor

4. Transform Data
Modify the data as needed using Power Query's transformation tools.
Common transformations:
• Remove Columns/Rows: Right-click a column/row > Remove
• Filter Data: Click the filter icon on column headers
• Change Data Type: Click the column > Choose a data type from the
dropdown
• Split Columns: Select a column > Go to Transform > Click Split
Column
• Merge Columns: Select multiple columns > Click Merge Columns
• Replace Values: Right-click a column > Replace Values
• Pivot/Unpivot Data: Use the Transform tab
(Every action is recorded as a step, which you can modify later.)

5. Combine Data
You can merge multiple tables or append data from different sources.
Steps:
• Merge Queries:
o Go to Home > Click Merge Queries
o Select the tables and matching columns
o Choose Join Type (e.g., Left Join, Inner Join)
• Append Queries:
o Go to Home > Click Append Queries
o Select multiple tables to stack data

6. Load Data to Excel


Once the data is cleaned and transformed, you can load it back into Excel.
Steps:
• Click Close & Load (default loads data to a new sheet)
• To load into an existing sheet, choose Close & Load To…

• Select:

o Table (recommended for analysis)

o Pivot Table Report (for summarized reports)

o Connection Only (for further transformations later)

• Click OK

7. Refresh Data
If the original data changes, Power Query allows you to refresh without
redoing transformations.
Steps:
• Go to the Data tab
• Click Refresh All

• Or, right-click on the loaded table and select Refresh


Power Query exercise
Question: You are provided with the Excel datasets of HR data and Pay details of
the employees working in your company (HR data_excel & Pay Details_excel).
Using the Power Query Tool, you are required to:
a. Merge both data sources and prepare consolidated data by suitably
applying data transformations.
b. Compute the net salary payable to all employees
c. Compute the average net salary of all categories of employees
d. Identify the employees with the highest and the lowest amount of
contribution to their Provident Fund (PF)
Instructions:
(a) To merge both data sources and transform and prepare data for Payroll
computation, the analyst may perform the following steps:
• Open a new Excel sheet and click on the Data tab.
• Under the ‘Get & Transform Data’ group, click on Get Data > From File
> From Excel Workbook (HR file) to launch the Import Data dialog box.
• Select the first data source, namely the workbook file containing the HR
data, and click on Import. In the Navigator pane, tick the ‘Select multiple
items’ option, then select the sheet containing the HR data. Click on
Load.
• Repeat the previous step to import data from the second data source, namely
the workbook file containing the Pay details of employees.
• Click on Get Data > Launch Power Query Editor to access the imported
data.
• Click on Home > Merge Queries > Merge Queries as New to launch the
Merge pane.
• Select HR Data as the first table and Pay Details as the second table. Since
Emp_ID is the key column in both the tables, select this column in both
sections. At this point, the bottom of the pane should confirm a match of all
rows. Click OK.
• Navigate to the Query Settings pane to rename this query as Payroll, then
press Enter.
• Using the Current View slider, scroll to the Pay Details column and click on
the button in the column header to expand the query. Click on Expand, then
OK to expand the merged query.
• Remove unnecessary columns. This may be achieved by right-clicking
a column header and selecting Remove Column. For example,
the analyst may consider removing the Date of Birth column, the Pay
Details.Emp_ID column and any other column deemed unnecessary
for the purpose of payroll computation.
• Rename columns suitably; this may be achieved by double-clicking column
headers.
• Click on Home > Close & Load to view the Output sheet in the Excel
workbook.

b. To compute the Net Salary payable to all employees, the following steps
may be performed:
• Launch the Power Query Editor by clicking on the Query tab, then on Edit.
• Navigate to the Add Column tab, then click on Custom Column.
• Enter the New Column Name as ‘Gross Salary’. By selecting the
appropriate column from the Available columns pane and clicking on
Insert, enter the following formula in the Custom column formula
section: = [Basic Salary]+([DA]*[Basic Salary])+([HRA]*[Basic
Salary]). Alternatively, a column may be inserted in the formula
by double-clicking on the column name appearing in the Available columns
pane.
• Click OK to generate the new column 'Gross Salary'.
• Click on Add Column > Custom Column. Enter the New Column Name
as 'Deductions', and enter the following formula in the Custom column
formula section: =[Basic Salary]*[TDS Rate]+[PF Contribution].
• Click OK to generate the new column 'Deductions'.
• Click on Add Column > Custom Column. Enter the New Column Name as
'Net Salary' and enter the following formula in the Custom column
formula section: =[Gross Salary]-[Deductions].
• Click OK to generate the new column 'Net Salary'.
• Since the computation does not require the Advance Salary column,
right-click on the column header, then click on Remove Column.
• Click on Home > Close & Load to update the Output sheet in the
Excel workbook.
• Select the Basic Salary, Gross Salary, Deductions, and Net Salary
columns, then navigate to the Home tab and specify the number format
as Currency.

c. To compute the average net salary of all categories of employees, the analyst
may perform the following steps:
1. Right-click on the ‘Payroll’ query from the Query Pane, then select
Duplicate. Rename this new query as Average Salary.
2. Navigate to the Transform tab. Click on Group By to launch the Group
By pane.
3. Select the grouping basis as Advanced.
4. Select Designation from the drop-down section.
5. In the first aggregation section, specify the New column name as Average
Salary. From the Operation dropdown, select Average. From the Column
dropdown, select Net Salary.
6. Click on Add aggregation to add a second aggregation section. Type ‘All’
as the New column name. From the Operation dropdown, select All
Rows.
7. Click OK to generate the query containing the average net salary of all
designations.
8. Navigate to the Home tab, then click on Close & Load to generate the
Average Salary Output sheet in the Excel workbook.
9. Select the Average Salary column, navigate to the Home tab in the Excel
workbook and change the number format to Currency to produce the final
output.
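The merge, net salary and average salary steps can be sketched in R to show what the Power Query steps compute. Everything in the two data frames is hypothetical, including the column names with underscores, and PF_Contribution is assumed here to be an amount in rupees because the Deductions formula adds it directly.

```r
# Hypothetical stand-ins for the HR data and Pay Details sheets.
hr <- data.frame(
  Emp_ID = c(101, 102, 103, 104),
  Designation = c("Manager", "Analyst", "Analyst", "Manager")
)
pay <- data.frame(
  Emp_ID = c(101, 102, 103, 104),
  Basic_Salary = c(80000, 50000, 54000, 90000),
  DA = c(0.10, 0.10, 0.10, 0.10),              # rate applied to basic salary
  HRA = c(0.20, 0.20, 0.20, 0.20),             # rate applied to basic salary
  TDS_Rate = c(0.10, 0.05, 0.05, 0.10),
  PF_Contribution = c(9600, 6000, 6480, 10800) # assumed to be an amount
)

# (a) Merge on the key column; all rows match, so the join type does not matter here.
payroll <- merge(hr, pay, by = "Emp_ID")

# (b) Same formulas as the custom columns in the notes.
payroll$Gross_Salary <- with(payroll, Basic_Salary + DA * Basic_Salary + HRA * Basic_Salary)
payroll$Deductions <- with(payroll, Basic_Salary * TDS_Rate + PF_Contribution)
payroll$Net_Salary <- payroll$Gross_Salary - payroll$Deductions

# (c) Average net salary per designation, like Group By with the Average operation.
avg_salary <- aggregate(Net_Salary ~ Designation, data = payroll, FUN = mean)
print(avg_salary)
```

Running the same calculation outside Excel is a convenient cross-check on the Power Query output before formatting the columns as Currency.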

d. To identify the employees with the highest and the lowest amounts of PF
contribution, the following steps may be performed:
1. Launch the Power Query Editor.
2. Duplicate the Payroll query by right-clicking on the query name
and clicking on Duplicate.
3. Rename this new query as "PF contribution details."
4. Click on Add Column > Custom Column. Name the column "PF
contribution amount." Enter the formula: [Basic Salary] * [PF
Contribution]. Click OK to generate the new column.
5. Select Column and Transform: Select the PF contribution amount column, go
to the Transform tab, and navigate to the Number Column group. Click on
Statistics > Maximum to find the maximum value in the column (e.g.,
22050).
6. Return to Previous Step: Click on the previous step under Applied Steps
to return to the PF contribution details query.
7. Filter by Maximum Value: Navigate to the PF contribution amount
column. Click on the header dropdown and select Number Filters > Equals
to open the Filter Rows dialog box.
8. Set Filter Condition: Choose Basic as the filter mode. Set the first
condition to "equals" and enter the maximum value (22050) in the
corresponding value section.
9. Apply Filter: Click OK. The query will now display the name and other
details of the employee with the maximum PF contribution.
10. Repeat steps 5 to 9 with Statistics > Minimum to identify the employee
with the lowest PF contribution.

Power BI Exercise:
Q. You are provided with the consolidated sales data, details of products
and sales teams of X Ltd: datasets are provided in the excel files
Using Power BI, you are required to:​
(a) Import and model the data in Power BI.​
(b) Create a visual presenting the monthly sales data for the years 2023
and 2024 for all products.​
(c) Create a visual presenting the average sales (in rupees) of all sales
teams across different product categories. Identify the cities and the team
(by team lead name) that have the highest and the lowest average sales.​
(d) Create a Map displaying the sales data across various cities and
countries.
Solution:​
(a) To import and model the given data in Power BI, the following steps
may be performed:
• Launch the Power BI Desktop application.
• Click on Get Data > Excel Workbook > Connect. Locate the input
Excel file, then click on OK. In the Navigator pane, check the box
next to the sheet containing the data to be imported, then click on
Load.
• Navigate to the Home tab of Power BI, then repeat the procedure
outlined in the previous step to import data from the other Excel
workbooks.
• Go to the Navigation pane, click on Transform Data > Transform Data, click
Product_Data, then go to the Home tab and click on the “Use First Row as
Headers” option to promote the first row as a header. Repeat this
procedure for Sales_Team.
• Click on Close & Apply to update the changes.
• In the View Navigation pane, click on Model View to view the data
tables and fields imported from the various data sources.
• Go to the Navigation pane and click on Manage Relationships > New
Relationship. For the from table, click Sales_Data; for the to table, click
Sales_Team; click Teams_code in both tables; set the cross-filter direction
to Single. Click on Save.

(b) To create a visual depicting the monthly sales for 2023 and 2024, the
following steps may be performed:
• Click on Report View in the View Navigation pane.
• In the Visualization sidebar, locate and click on Clustered column
chart to add its visual to the report canvas.
• In the Data sidebar, expand the Sales_Data table fields. Click on the
Quantity Sold and Date of Sale fields to add the data to the visual.
• Click on the chart visual, then navigate to the Chart Visualization
section below the Visualization sidebar. Remove Day and Quarter
from the Date of Sale field under the X-axis.
• Go to the Home tab > Click New Visual
• In the Visualization sidebar, click on Slicer to include it in the
dashboard. From the Data sidebar, select the Product_Name field to
dynamically link and view the monthly sales of each product as per
the selection in the slicer.
• Name the Output page as Report 1

(c) To create a visual presenting the average sales (in rupees) of all sales
teams across all cities:
• Click on the Table View in the View Navigation pane.

• Click on the Sales_Data to view its table. In the Table Tools tab, click
on New Column to add a new column. In the formula bar,
enter the DAX formula: = Sales_Data[Quantity Sold]*Sales_Data[Price
per unit] to calculate the Total Sales (in terms of rupees) for each sale.
Rename this column as Total Sales.

• Click on Report View in the View Navigation pane. In the Visualization


sidebar, locate and click on Stacked bar chart to add its visual to the
report canvas.
• In the Data sidebar, expand the tables and select the Total Sales field
from the Sales_Data table, and Team_Leadand City fields from the
Sales_Teams table.
• Click on the chart visual, then navigate to the Chart Visualization
section below the Visualization sidebar. Click on the dropdown under
the X-axis and change it to Average to transform the data to Average
Sales.

Using a combination of visual inspection and slicer filters, we can
determine that the highest average sales are observed in the city of Sydney,
led by Pulkit Mishra, whereas the lowest average sales are observed in the
city of Osaka, led by Naina Singh.
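The calculation behind this visual can be sketched in a few lines: Total Sales is computed per sale as quantity times unit price, then averaged per team lead and city. All rows and names below are hypothetical placeholders and do not reproduce the exercise data or its results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical lookup table keyed by Teams_code.
sales_teams = {
    "T1": {"Team_Lead": "Lead A", "City": "City A"},
    "T2": {"Team_Lead": "Lead B", "City": "City B"},
}
# Hypothetical sales rows.
sales_data = [
    {"Teams_code": "T1", "Quantity Sold": 2, "Price per unit": 100.0},
    {"Teams_code": "T1", "Quantity Sold": 4, "Price per unit": 100.0},
    {"Teams_code": "T2", "Quantity Sold": 1, "Price per unit": 50.0},
]

sales_by_team = defaultdict(list)
for row in sales_data:
    # Same arithmetic as the Total Sales calculated column.
    total_sales = row["Quantity Sold"] * row["Price per unit"]
    team = sales_teams[row["Teams_code"]]
    sales_by_team[(team["Team_Lead"], team["City"])].append(total_sales)

# Aggregation switched from Sum to Average, as in the chart.
average_sales = {key: mean(values) for key, values in sales_by_team.items()}
highest = max(average_sales, key=average_sales.get)
lowest = min(average_sales, key=average_sales.get)
```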

(d) To create a Map displaying the sales data across various cities and
countries, the following steps may be performed:
• Insert a new page to launch a blank report canvas.
• In the Visualization sidebar, click on Map.
• From the Data sidebar, click on the City and Country fields under the
Sales_Team table.

• In case the visual is disabled, click on File > Options and Settings >
Global > Security. Tick the checkbox against “Use Map and Filled
Map visuals”. Click OK. Navigate to the Home tab, then click on
Refresh.

Power Pivot exercise instruction


Using Power Pivot, you are required to:
(a) Create a Power Pivot table using the Product_ID field as the key
column.
(b) Compute the Total Sales and Total Profit for all the sales data.
(c) Compute the Average Sales and Average Profit of Appliances in the
East and West regions.
(d) Prepare a Pivot Chart to display the Total Profit of the four regions.
(e) Prepare a Pivot Chart with Month and Region slicers to display the
Sales (in terms of quantities sold) of all products.
(f) Create a KPI visual to track the profitability of all products.
Solution:
(a) To create a PowerPivot table, the analyst may perform the following
steps:
• Open a new Excel workbook. Add the Power Pivot Add-in (if not
already added) by following the steps outlined in the previous
section.
• Navigate to the Power Pivot tab and click on Manage to launch the
Power Pivot window.
• Click on Home tab > From Other Sources > Excel file > Next. Click
on Browse to select the Excel file from which the data is to be
imported. Check the box labeled “Use first row as column headers”.
Click on Next. From the Preview section, check the box next to
Source Table to include the relevant tables in the Excel file, then
click on Finish. Repeat this procedure for all Excel files from which
data is to be imported. Suitably rename the sheets imported into the
Power Pivot window, namely SalesData, ProductMapping, and
ProfitMargin.
• In the Home tab of the Power Pivot window, click on Diagram View to
view each table (including headers) in the data model.
• Click on the Product_ID field of the SalesData table, drag and
navigate to the Product_ID field of the ProductMapping table to
create a link between the two tables. Repeat this procedure to link
the Product_ID field of the SalesData table with that of the
ProfitMargin table.
(b) To compute the Total Sales and Total Profit for all the sales data, the
analyst may perform the following steps:
1. Calculate Total Sales:
o Click on Data View.
o Navigate to the SalesData sheet.
o Click on the Add Column header, then navigate to the function bar.
o Select the appropriate column headers in the SalesData sheet.
o Enter the function: =SalesData[Qty Sold]*SalesData[Price per unit].
o Press Enter.
o Rename this Calculated Column 1 as “Total Sales”.
2. Calculate Total Profit:
o Click on the Add Column header, then navigate to the function bar.
o Select the appropriate column headers from the SalesData and
ProfitMargin sheets.
o Enter the function: =SalesData[Total Sales]*ProfitMargin[Sum of
Profit Margin (% of Sales)].
o Press Enter.
o Rename this Calculated Column as “Total Profit”.
o Format the Total Sales and Total Profit columns by checking the
Apply Currency Format option under the Formatting group of the
Home tab of the Power Pivot window.
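The two calculated columns amount to row-by-row arithmetic plus a lookup into the ProfitMargin table through Product_ID. The sketch below assumes the margin is stored as a fraction of sales (0.20 for 20%); the rows, product IDs, and margin values are hypothetical. In DAX, a calculated column usually reaches a column of a related table through the RELATED function, which the dictionary lookup stands in for here.

```python
# Hypothetical margin table: Product_ID -> profit margin as a fraction of sales.
profit_margin = {"P1": 0.20, "P2": 0.10}

# Hypothetical sales rows; column names follow the text.
sales_data = [
    {"Product_ID": "P1", "Qty Sold": 3, "Price per unit": 1000.0},
    {"Product_ID": "P2", "Qty Sold": 2, "Price per unit": 500.0},
]

for row in sales_data:
    # Calculated column "Total Sales": quantity times unit price.
    row["Total Sales"] = row["Qty Sold"] * row["Price per unit"]
    # Calculated column "Total Profit": sales times the product's margin,
    # looked up through the Product_ID link between the two tables.
    row["Total Profit"] = row["Total Sales"] * profit_margin[row["Product_ID"]]

total_sales = sum(row["Total Sales"] for row in sales_data)
total_profit = sum(row["Total Profit"] for row in sales_data)
```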
(c) To compute the Average Sales and Average Profit of Appliances in the
East and West regions, the following steps may be performed:
1. Navigate to the Power Pivot tab and click on Manage to open the
Power Pivot window.
2. On the Home tab, click on PivotTable to open the Create PivotTable
dialog box. Select New Worksheet and click OK. Rename this
PivotTable output sheet as SalesDataAnalysis.
3. In the PivotTable Fields section, expand the SalesData source by
clicking the arrow next to it. Select the Product_Name, Region, Total
Sales, and Total Profit fields. Also, expand the ProductMapping
source and select the Product_Category field.
4. Drag and place the Product_Category field under the Filters area.
5. In the Values area of the PivotTable Fields, click on Sum of Total
Sales > Value Field Settings. Select Average and click OK. Repeat
this for the Sum of Total Profit value field to generate the Average
Profit values.
6. In the PivotTable, click on the Product_Category dropdown. Click the
+ sign next to All to expand the items. Tick the checkbox next to
Multiple items. Deselect Electronics and Furniture, then click OK.
7. From the Row Labels, filter the values for East and West to present
the Average Sales and Average Profits for Appliances in these two
regions.
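The filtering and averaging done by this PivotTable can be sketched as below: keep only Appliances rows (the Product_Category filter), keep only the East and West regions (the Row Labels filter), and average Total Sales and Total Profit per region. The rows, product IDs, and the helper average_by_region are hypothetical, and the sketch groups by Region only for brevity.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping table: Product_ID -> Product_Category.
product_mapping = {"P1": "Appliances", "P2": "Electronics"}

# Hypothetical sales rows with the two calculated columns already present.
sales_data = [
    {"Product_ID": "P1", "Region": "East", "Total Sales": 100.0, "Total Profit": 10.0},
    {"Product_ID": "P1", "Region": "East", "Total Sales": 300.0, "Total Profit": 30.0},
    {"Product_ID": "P1", "Region": "West", "Total Sales": 200.0, "Total Profit": 40.0},
    {"Product_ID": "P1", "Region": "North", "Total Sales": 900.0, "Total Profit": 90.0},
    {"Product_ID": "P2", "Region": "East", "Total Sales": 500.0, "Total Profit": 50.0},
]


def average_by_region(rows, category, regions):
    """Average Total Sales and Total Profit per region for one product category."""
    grouped = defaultdict(lambda: {"sales": [], "profit": []})
    for row in rows:
        if product_mapping[row["Product_ID"]] != category:
            continue  # removed by the Product_Category filter
        if row["Region"] not in regions:
            continue  # removed by the Row Labels filter
        grouped[row["Region"]]["sales"].append(row["Total Sales"])
        grouped[row["Region"]]["profit"].append(row["Total Profit"])
    return {
        region: {
            "Average Sales": mean(values["sales"]),
            "Average Profit": mean(values["profit"]),
        }
        for region, values in grouped.items()
    }


appliance_averages = average_by_region(sales_data, "Appliances", {"East", "West"})
```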
(d) To prepare a PivotChart displaying the Total Profit of all four regions,
the analyst may perform the following steps:
1. Navigate to the Home tab of the Power Pivot window, then click on
PivotTable > PivotChart to launch the Create PivotChart dialog box.
2. Select either New Worksheet or Existing Worksheet as the output
table destination, then click on OK.
3. Expand the data sources by clicking on the arrows to the left of the
data source table names in the PivotChart Fields.
4. Drag and place the Total Profit field of the SalesData table under
the Values area, and place the Region field under the Axis
(Categories) area.
5. Format the PivotChart using the Chart Elements and Chart Style
options.
(e) To prepare a PivotChart with slicers to display the Sales (in terms of
quantities sold) of all products, the analyst may perform the following steps:
1. Navigate to the Home tab of the Power Pivot window, then click on
PivotTable > PivotChart to launch the Create PivotChart dialog box.
Select either New Worksheet or Existing Worksheet as the output
table destination, then click on OK.
2. Expand the data sources by clicking on the arrows to the left of the
data source table names in the PivotChart Fields.
3. Drag and place the Qty Sold field of the SalesData table under the
Values area, and place the Product_Name field under the Axis
(Categories) area.
4. Format the Pivot Chart using the Chart Elements and Chart Style
options.
5. Click on the PivotChart, then navigate to the PivotChart Analyze tab.
Click on Insert Slicer. Select Month and Region, then click on OK to
insert the slicers for these two fields.
(f) To create a KPI (Key Performance Indicator) visual to track the
profitability of all products, the analyst may perform the following
steps:
1. Navigate to the Power Pivot tab and click on Manage to open the
Power Pivot window.
2. On the Home tab, click on PivotTable to open the Create PivotTable
dialog box.
3. Select New Worksheet and click OK. Rename this PivotTable output
sheet as “KPI Tracker”.
4. Since KPIs in Power Pivot only work with measures, navigate to the
Power Pivot tab and click on Measures > New Measure. In the
Measure dialog box, input the measure name as TotalProfit and the
following formula in the formula section:
=SUMX(SalesData,SalesData[Total Profit])
Click on Check Formula to validate the formula. Select the Category as
Currency, then click on OK.
5. Drag and place the Product_Name field under Rows and the newly
created TotalProfit measure under Values. Click on the dropdown of
the TotalProfit field and click on Value Field Settings. Click on the
“Show Values as” tab, and select % of Grand Total. Click on OK to
display the TotalProfit values as percentages of the overall Total Profit.
6. Navigate to the Power Pivot tab and click on KPIs > New KPI.
Select the TotalProfit measure as the KPI base field. Select Absolute
value as the target value and enter the Target Profit value (say,
₹85,00,000). Drag the sliders to define the KPI thresholds. In this
case, the lower bound is set at ₹17,00,000, whereas the upper
bound is set at ₹39,00,000. Select an Icon Style, then click on OK.
7. In the PivotTable fields pane, click on the newly added TotalProfit
KPI measure and select the Goal and Status fields to display the
output.
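The logic of the KPI can be sketched as a simple banding of each product's TotalProfit against the two thresholds, together with the share of the grand total shown in step 5. The target and threshold values come from the steps above; the product names and profit figures are hypothetical. The status codes -1, 0 and 1, and the treatment of values exactly on a boundary, are assumptions made for this sketch rather than a statement of how Power Pivot evaluates them.

```python
TARGET = 8_500_000       # ₹85,00,000 absolute target from the steps above
LOWER_BOUND = 1_700_000  # ₹17,00,000
UPPER_BOUND = 3_900_000  # ₹39,00,000


def kpi_status(total_profit, low=LOWER_BOUND, high=UPPER_BOUND):
    """Return -1 below the lower bound, 0 between the bounds, 1 otherwise."""
    if total_profit < low:
        return -1
    if total_profit < high:
        return 0
    return 1


# Hypothetical TotalProfit values per product (in rupees).
profits = {"Product A": 1_200_000, "Product B": 2_500_000, "Product C": 4_000_000}

status = {name: kpi_status(value) for name, value in profits.items()}

# "% of Grand Total": each product's share of the overall TotalProfit.
grand_total = sum(profits.values())
share_of_total = {name: value / grand_total for name, value in profits.items()}

# Progress towards the goal, expressed as a fraction of the target.
goal_ratio = {name: value / TARGET for name, value in profits.items()}
```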
