Revision
Key Takeaways
● The normal distribution is the proper term for a probability bell curve.
● Normal distributions are symmetrical, but not all symmetrical distributions are
normal.
● Other distributions can look similar to the normal distribution, so shape
alone is not enough. A statistical test is needed to check whether data are
consistent with a normal distribution.
● Skewness
Skewness is a measure of the asymmetry of a distribution. It measures
how far the distribution of a random variable deviates from a
symmetric distribution, such as the normal distribution. A normal
distribution has no skewness, as it is symmetrical on both sides.
Hence, a curve is regarded as skewed if its bulk is shifted towards
the right or the left.
1. Positive Skewness
If the bulk of the distribution lies to the left and its tail extends to the right, it is a
positively skewed distribution. It is also called a right-skewed distribution.
2. Negative Skewness
If the bulk of the distribution lies to the right and its tail extends to the left, it is a
negatively skewed distribution. It is also called a left-skewed distribution.
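The sign convention above can be checked numerically. The following is a minimal sketch, assuming the moment-based (population) definition of skewness, i.e. the average cubed z-score; the sample lists are made up for illustration.

```python
from statistics import mean, pstdev

def skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical data sets, chosen only to show the sign of the result.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 15]   # bulk on the left, tail on the right
left_tailed = [-v for v in right_tailed]   # mirror image: tail on the left
symmetric = [1, 2, 3, 4, 5]

print("right-tailed:", skewness(right_tailed))  # positive value
print("left-tailed: ", skewness(left_tailed))   # negative value
print("symmetric:   ", skewness(symmetric))     # zero
```

Other software may apply a sample-size correction, so the magnitude can differ slightly from this sketch while the sign stays the same.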
● Kurtosis
Kurtosis is a statistical measure used to describe a characteristic of a dataset. When
normally distributed data is plotted on a graph, it generally takes the form of a bell.
This is called the bell curve. The plotted data that are farthest from the mean of the
data usually form the tails on each side of the curve. Kurtosis indicates how much
data resides in the tails.
Types of kurtosis
1. Leptokurtic: A curve having a higher peak than the normal curve. The items are
heavily concentrated near the center.
2. Platykurtic: A curve having a lower (flatter) peak than the normal curve. There
is less concentration of items near the center.
3. Mesokurtic: A curve having the same peak as the normal curve. The items are
distributed evenly around the center value (mean). In such a case, the mean,
median, and mode are equal.
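The three types can be told apart with a number. The sketch below assumes the common "excess kurtosis" convention (fourth standardized moment minus 3, so that a normal curve scores about 0): a positive value suggests a leptokurtic shape and a negative value a platykurtic one. The data lists are hypothetical.

```python
from statistics import mean, pstdev

def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (a normal curve gives about 0)."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 4 for v in values) / len(values) - 3.0

# Hypothetical data: most items at the center with a few far out in the tails.
peaked = [0] * 8 + [-5, 5]
# Hypothetical data: items spread evenly, no pronounced peak.
flat = list(range(1, 11))

print("peaked:", excess_kurtosis(peaked))  # greater than 0 (leptokurtic)
print("flat:  ", excess_kurtosis(flat))    # less than 0 (platykurtic)
```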
Double-Peaked or Bimodal
The bimodal distribution looks like the back of a two-humped camel. The outcomes
of two processes with different distributions are combined in one set of data.
Excel
1. Min.value
2. = min.value +$interval width
Frequency
Plot histogram
Select table – right click – Format Data Series – Gap Width (0)
Arrange data – right click – Select Data – Edit – put the data on the class intervals
=NORM.DIST(B2,$Mean,$StdDev,FALSE)
Expected Frequency
=(G2-J2)^2/J2
Press Enter and drag down to the last 'Bin value'
Then, sum the Chi-Square (χ²) values
=CHIINV(0.05,K24)
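The chi-square steps above can be sketched outside Excel as follows. This is only an illustration: the notes do not show how the expected frequency is derived, so the sketch assumes it is the normal density at each bin value multiplied by the total count and the interval width. The bins, counts, mean and standard deviation are all hypothetical.

```python
from statistics import NormalDist

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected, as in =(G2-J2)^2/J2."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical binned data (bin values and observed frequencies).
bin_values = [45, 55, 65, 75, 85]
observed = [4, 12, 18, 11, 5]
interval_width = 10
total = sum(observed)

# Assumed mean and standard deviation of the underlying data.
dist = NormalDist(mu=65.0, sigma=11.0)

# NormalDist.pdf plays the role of NORM.DIST(x, mean, sd, FALSE).
# Assumption: expected frequency = density * total count * interval width.
expected = [dist.pdf(x) * total * interval_width for x in bin_values]

chi_sq = chi_square_statistic(observed, expected)
print("chi-square statistic:", round(chi_sq, 4))
```

The statistic is then compared with the critical value that =CHIINV(0.05, degrees_of_freedom) returns in Excel; a statistic below the critical value means the normal fit is not rejected at the 5% level.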
Chart line
Q. The time series of Market capitalization (% of GDP) of India is given for
the years 2000 through 2022 in the table.
Using visualization tools in Excel, prepare a line chart and (a) compute the
Trend values; (b) depict the Trend values as a secondary line in the Line
Chart; and (c) interpret the visual presentation in brief.
Instruction
1. Compute Trend Values
• Select C2:C24 and enter =TREND(B2:B24, A2:A24), assuming the years
are in A2:A24 and the values are in B2:B24.
• Keep the cursor at the end of the formula bar, hold Ctrl+Shift and
press Enter to fill all the Trend values as an array formula.
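Excel's TREND function fits a straight line by least squares and returns the fitted values. A minimal sketch of the same calculation is shown below; the years and values are hypothetical and are not taken from the exercise table.

```python
def linear_trend(xs, ys):
    """Fitted values of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x for x in xs]

# Hypothetical series standing in for the year and value columns.
years = [2000, 2001, 2002, 2003, 2004]
values = [30.0, 25.0, 28.0, 40.0, 52.0]

trend = linear_trend(years, values)
for year, value, fitted in zip(years, values, trend):
    print(year, value, round(fitted, 2))
```

Plotting the fitted column next to the original values gives the secondary trend line asked for in part (b).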
1. Calculate the Mean:
=AVERAGE(Data_Range)
2. Calculate the Standard Deviation:
o Use the STDEV.S function to calculate the sample standard deviation
of the dataset (use STDEV.P if the data cover the whole population).
=STDEV.S(Data_Range)
3. Calculate the Z-Score for Each Data Point:
= (x - Mean) / Standard Deviation
(Note: use absolute references for the Mean and Standard Deviation cells
so that they stay fixed.)
• Drag this formula across the whole cell range to calculate the Z-scores
for the entire dataset.
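The Z-score steps can be sketched as follows, assuming the sample standard deviation (the STDEV.S equivalent) and the outlier rule used in the conditional-formatting formula of these notes (Z greater than 3 or less than -3). The data list is hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """(x - mean) / standard deviation for every value."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation, like STDEV.S
    return [(v - m) / s for v in values]

# Hypothetical dataset with one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 10, 11, 13, 12, 60]

z = z_scores(data)

# Same condition as =OR(z>3, z<-3) in the conditional-formatting rule.
outliers = [v for v, score in zip(data, z) if score > 3 or score < -3]
print("outliers:", outliers)
```

With very small samples no value can reach a Z-score of 3, so the rule is only meaningful for reasonably long columns.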
2. Steps to Apply Conditional Formatting for Outliers:
1. Select the Z-Score Range:
o Go to the Home tab in the Excel ribbon.
o In the 'Format values where this formula is true:' box, type
=OR(L1>3, L1<-3)
(Note: L1 is the first value of the Z-score column.)
2. Set the Formatting Style:
o Click the Format button to choose how you want to highlight the
In the Excel formula bar, keep the cursor at the end of the
function, then press the key combination Ctrl+Shift+Enter to enter the
function as an array function.
5. Create the Histogram Chart
Select the Bins column and the Frequency column
Go to Insert > Charts > Column Chart > Clustered Column
Interval
• Chart title as GDP growth (2000 – 2023)
Possible Interpretations:
1. Positive Correlation:
• Pattern: Data points trend upward from left to right.
• Interpretation: Countries with higher FDI inflows (as a percentage of
Singapore (high FDI and high GDP growth) are clustered in the
top-right corner, this suggests a positive relationship.
2. Negative Correlation:
• Pattern: Data points trend downward from left to right.
• Interpretation: Countries with higher FDI inflows tend to have lower
• Example: Hong Kong SAR, China (extremely high FDI but moderate
Greece (FDI: 1.39%, GDP growth: 0.80%) show lower FDI and
lower GDP growth.
4. Outliers:
o Hong Kong SAR, China (FDI: 31.05%, GDP growth: 2.64%) has
Conclusion:
• The scatter plot likely shows a weak to moderate positive correlation
between FDI inflows and GDP growth per capita. Countries with
higher FDI tend to have higher GDP growth, but there are exceptions
(outliers) where other factors may influence GDP growth.
• Further analysis (e.g., regression analysis) could quantify the strength of
BOXPLOT
• In the Charts group, look for the Insert Statistic Chart option (this icon looks like a box with
whiskers).
• Click the Insert Statistic Chart dropdown and choose Box and Whisker.
• Use the Chart Elements button (the "+" sign) to toggle data labels, gridlines and the legend.
Calculate Quartiles (Q1, Q3):
• Use the QUARTILE.INC function to calculate the first quartile (Q1)
and the third quartile (Q3).
o Q1 (25th percentile):
▪ =QUARTILE.INC(Data_Range,1)
o Q3 (75th percentile):
▪ =QUARTILE.INC(Data_Range,3)
o Lower Bound:
= Q1 - 1.5 * IQR
o Upper Bound:
= Q3 + 1.5 * IQR
(where IQR = Q3 - Q1)
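The quartile-based outlier check can be sketched as below. It uses the usual rule of Q1 - 1.5 × IQR for the lower bound and Q3 + 1.5 × IQR for the upper bound, with IQR = Q3 - Q1. Python's statistics.quantiles with method="inclusive" is used here as the counterpart of QUARTILE.INC; the data list is hypothetical.

```python
from statistics import quantiles

def iqr_bounds(values):
    """Return (lower, upper) outlier bounds from inclusive quartiles."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical dataset with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]

lower, upper = iqr_bounds(data)
outliers = [v for v in data if v < lower or v > upper]

print("lower bound:", lower)
print("upper bound:", upper)
print("outliers:", outliers)
```

Values beyond these bounds are the points a box-and-whisker chart draws separately from the whiskers.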
cov(x, y)
cov(x, y, method = "pearson")
cov(x, y, method = "kendall")
cov(x, y, method = "spearman")
cor(x, y)
cor(x, y, method = "pearson")
cor(x, y, method = "kendall")
cor(x, y, method = "spearman")
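To show what the Pearson and Spearman options compute, here is a small sketch written from the textbook definitions (sample covariance with an n - 1 denominator, and Spearman as the Pearson correlation of the ranks). It is an illustration, not a copy of R's implementation; the Kendall method is left out, and the x and y values are hypothetical.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: y rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print("covariance:", covariance(x, y))
print("pearson:   ", pearson(x, y))
print("spearman:  ", spearman(x, y))
```

Because y always increases with x, the rank-based Spearman value is 1, while the Pearson value is slightly lower since the relationship is curved.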
#text Mining
• Click Import
4. Transform Data
Modify the data as needed using Power Query's transformation tools.
Common transformations:
• Remove Columns/Rows: Right-click a column/row > Remove
• Change Data Type: Click the column > Choose a data type from the
dropdown
• Split Columns: Select a column > Go to Transform > Click Split
Column
• Merge Columns: Select multiple columns > Click Merge Columns
• Replace Values: Right-click a column > Replace Values
• Pivot/Unpivot Data: Use the Transform tab
(Every action is recorded as a step, which you can modify later.)
5. Combine Data
You can merge multiple tables or append data from different sources.
Steps:
• Merge Queries:
o Go to Home > Click Merge Queries
o Select the tables and matching columns
• Append Queries:
• Select:
• Click OK
7. Refresh Data
If the original data changes, Power Query allows you to refresh without
redoing transformations.
Steps:
• Go to the Data tab
• Click Refresh All
> From Excel Workbook (HR file) to launch the Import Data dialog box.
• Select the first data source, namely the workbook file containing the HR
data and click on Import. In the Navigator pane, tick the ‘Select multiple
items’ option, then select the sheet containing the HR data. Click on
Load.
• Repeat the previous step to import data from the second data source, namely
data.
• Click on Home > Merge Queries > Merge Queries as New to launch the
Merge pane.
• Select HR Data as the first table and Pay Details as the second table. Since
Emp_ID is the key column in both the tables, select this column in both
sections. At this point, the bottom of the pane should confirm a match of all
rows. Click OK.
• Navigate to the Query Settings Pane to rename this query as Payroll, then
press Enter.
• Using the Current View slider, scroll to the Pay Details Column and click on
the button in the column header to expand the query. Click on Expand, then
OK to expand the merged query.
• Remove unnecessary columns. This may be achieved by executing a right
click on a column header and selecting Remove Column. For example,
the analyst may consider removing the Date of Birth column, the Pay
Details.Emp_ID column and any other extra column deemed unnecessary
for the purpose of payroll computation.
• Rename columns suitably; this may be achieved by double clicking column
headers.
• Click on Home > Close & Load to view the Output sheet in the Excel
workbook.
b. To compute the Net Salary payable to all employees, the following steps
may be performed:
• Launch the Power Query Editor by clicking on the Query tab, then on Edit.
• Navigate to the Add Column tab, then click on Custom Column. Enter the
New Column Name as 'Net Salary' and enter the following formula in the
Custom column formula section: =[Gross Salary]-[Deductions].
• Click OK to generate the new column 'Net Salary'.
• Since the computation does not require the Advance Salary column,
Excel workbook.
• Select the Basic Salary, Gross Salary, Deductions, and Net Salary
columns, then navigate to the Home tab and specify the number format
as Currency.
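The merge on Emp_ID and the Net Salary column can be pictured with plain Python dictionaries. This is a sketch of the logic only, not Power Query code; the employee rows are hypothetical. It skips HR rows without a pay record, which gives the same result as the merge in the steps above when every row matches.

```python
# Hypothetical rows standing in for the HR Data and Pay Details sheets.
hr_data = [
    {"Emp_ID": 101, "Name": "Asha", "Designation": "Manager"},
    {"Emp_ID": 102, "Name": "Ravi", "Designation": "Analyst"},
]
pay_details = [
    {"Emp_ID": 101, "Gross Salary": 90000, "Deductions": 12000},
    {"Emp_ID": 102, "Gross Salary": 55000, "Deductions": 6000},
]

# Index the second table by the key column, as the merge does with Emp_ID.
pay_by_id = {row["Emp_ID"]: row for row in pay_details}

payroll = []
for employee in hr_data:
    pay = pay_by_id.get(employee["Emp_ID"])
    if pay is None:
        continue  # no matching pay record for this employee
    record = {**employee, **pay}
    # Same formula as the custom column: [Gross Salary] - [Deductions].
    record["Net Salary"] = record["Gross Salary"] - record["Deductions"]
    payroll.append(record)

for record in payroll:
    print(record["Emp_ID"], record["Name"], record["Net Salary"])
```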
c. To compute the average net salary of all categories of employees, the analyst
may perform the following steps:
1. Right-click on the ‘Payroll’ query from the Query Pane, then select
Duplicate. Rename this new query as Average Salary.
2. Navigate to the Transform tab. Click on Group By to launch the Group
By pane.
3. Select the grouping basis as Advanced.
4. Select Designation from the drop-down section.
5. In the first aggregation section, specify the New column name as Average
Salary. From the Operation dropdown, select Average. From the Column
dropdown, select Net Salary.
6. Click on Add aggregation to add a second aggregation section. Type ‘All’
as the New column name. From the Operation dropdown, select All
Rows.
7. Click OK to generate the query containing the average net salary of all
designations.
8. Navigate to the Home tab, then click on Close & Load to generate the
Average Salary Output sheet in the Excel workbook.
9. Select the Average Salary column, navigate to the Home tab in the Excel
workbook and change the number format to Currency to produce the final
output.
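The Group By step with the Average operation amounts to collecting the Net Salary values for each Designation and averaging them. A minimal sketch with hypothetical payroll rows:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical payroll rows (only the columns needed for grouping).
payroll = [
    {"Designation": "Manager", "Net Salary": 78000},
    {"Designation": "Manager", "Net Salary": 82000},
    {"Designation": "Analyst", "Net Salary": 49000},
    {"Designation": "Analyst", "Net Salary": 51000},
]

# Group the Net Salary values by Designation.
salaries_by_designation = defaultdict(list)
for row in payroll:
    salaries_by_designation[row["Designation"]].append(row["Net Salary"])

# Average operation on each group.
average_salary = {
    designation: mean(salaries)
    for designation, salaries in salaries_by_designation.items()
}

for designation, average in average_salary.items():
    print(designation, average)
```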
d. To identify the employees with the highest and the lowest amounts of PF
contribution, the following steps may be performed:
1. Launch the Power Query Editor.
2. Duplicate the Payroll query by executing a right-click on the query name
and clicking on Duplicate.
3. Rename this new query as "PF contribution details."
4. Click on Add column > Custom Column. Rename the column as "PF
contribution amount." Enter the formula: [Basic Salary] * [PF
Contribution]. Click OK to generate the new column.
5. Select Column and Transform: Select a specific column, go to the
Transform tab, and navigate to the Number Column group. Click on
Statistics > Maximum to find the maximum value in the column (e.g.,
22050).
6. Return to Previous Step: Click on the previous step under Applied Steps
to return to the PF Contribution Analysis Query.
7. Filter by Maximum Value: Navigate to the PF Contribution Amount
column. Click on the header dropdown, select Number Filters > Equals, and
then click Insert to open the Filter Rows dialog box.
8. Set Filter Condition: Choose Basic as the filter condition. Set the first
condition to "equals" and enter the maximum value (22050) in the
corresponding value section.
9. Apply Filter: Click OK. The query will now display the name and other
details of the employee with the maximum PF contribution.
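The same idea (compute the PF contribution amount, find the extreme value, then filter rows equal to it) can be sketched as follows. The rows are hypothetical, and the PF Contribution column is assumed to hold a rate such as 0.12. The lowest contribution is found the same way with the minimum.

```python
# Hypothetical payroll rows; PF Contribution is assumed to be a rate.
payroll = [
    {"Name": "Asha", "Basic Salary": 60000, "PF Contribution": 0.12},
    {"Name": "Ravi", "Basic Salary": 40000, "PF Contribution": 0.12},
    {"Name": "Meena", "Basic Salary": 25000, "PF Contribution": 0.12},
]

# Custom column: [Basic Salary] * [PF Contribution].
for row in payroll:
    row["PF contribution amount"] = row["Basic Salary"] * row["PF Contribution"]

amounts = [row["PF contribution amount"] for row in payroll]
maximum = max(amounts)  # Statistics > Maximum
minimum = min(amounts)  # Statistics > Minimum

# Filter rows whose amount equals the maximum or the minimum.
highest = [row["Name"] for row in payroll if row["PF contribution amount"] == maximum]
lowest = [row["Name"] for row in payroll if row["PF contribution amount"] == minimum]

print("highest PF contribution:", highest)
print("lowest PF contribution: ", lowest)
```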
Power BI Exercise:
Q. You are provided with the consolidated sales data, details of products
and sales teams of X Ltd. The datasets are provided in the Excel files.
Using Power BI, you are required to:
(a) Import and model the data in Power BI.
(b) Create a visual presenting the monthly sales data for the years 2023
and 2024 for all products.
(c) Create a visual presenting the average sales (in rupees) of all sales
teams across different product categories. Identify the cities and the team
(by team lead name) that have the highest and the lowest average sales.
(d) Create a Map displaying the sales data across various cities and
countries.
Solution:
(a) To import and model the given data in Power BI, the following steps
may be performed:
• Launch the Power BI Desktop application.
• Click on Get Data > Excel Workbook > Connect. Locate the input
Excel file, then click on OK. In the Navigator pane, check the box
next to the sheet containing the data to be imported, then click on
Load.
• Navigate to the Home tab of the Power BI, then repeat the procedure
outlined in the previous step to import data from other Excel
workbooks.
(b) To create a visual depicting the monthly sales for 2023 and 2024, the
following steps may be performed:
• Click on Report View in the View Navigation pane.
• In the Data sidebar, expand the Sales_Data table fields. Click on the
Quantity Sold and Date of Sale fields to add the data to the visual.
• Click on the chart visual, then navigate to the Chart Visualization
section below the Visualization sidebar. Remove Day and Quarter
from the Date of Sale field under the X-axis.
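Removing Day and Quarter from the date hierarchy leaves the visual summing Quantity Sold by year and month. The aggregation behind such a chart can be sketched as below; the sales records are hypothetical, and this is an illustration of the grouping, not Power BI code.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (Date of Sale, Quantity Sold) records.
sales = [
    (date(2023, 1, 5), 10),
    (date(2023, 1, 20), 5),
    (date(2023, 2, 3), 7),
    (date(2024, 1, 9), 12),
]

# Sum the quantity for each (year, month) pair.
monthly_quantity = defaultdict(int)
for sale_date, quantity in sales:
    monthly_quantity[(sale_date.year, sale_date.month)] += quantity

for (year, month), quantity in sorted(monthly_quantity.items()):
    print(year, month, quantity)
```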
(c) To create a visual presenting the average sales (in rupees) of all sales
teams across all cities:
• Click on the Table View in the View Navigation pane.
• Click on Sales_Data to view its table. In the Table Tools tab, click
on New Column to add a new column. In the formula bar, enter the DAX
formula: = Sales_Data[Quantity Sold]*Sales_Data[Price per unit]
to calculate the Total Sales (in terms of rupees) for each sale.
Rename this column as Total Sales.
(d) To create a Map displaying the sales data across various cities and
countries, the following steps may be performed:
• Insert a new page to launch a blank report canvas.
• In the Visualization sidebar, click on Map.
• From the Data sidebar, click on the City and Country fields under the
Sales_Team table.
• In case the visual is disabled, click on File > Options and Settings >
Global > Security. Tick the checkbox against “Use Map and Filled
Map visuals”. Click OK. Navigate to the Home tab, then click on
Refresh.
bar.
o Select the appropriate column headers in the SalesData sheet.
unit].
o Press Enter.
bar.
o Select the appropriate column headers from the SalesData and
ProfitMargin sheets.
o Enter the function: =SalesData[Total Sales]*ProfitMargin[Sum of