Prac 2

The document outlines practical exercises in Business Analytics using Python libraries like NumPy, Pandas, and Matplotlib. It includes tasks such as creating and manipulating arrays, handling missing data in DataFrames, and visualizing data through various plots. Key operations include calculating statistics, filtering data, and comparing distributions with visualizations.

Uploaded by

asharathod1999

22SE02ML063 Business Analytics

Practical – 2
Create a NumPy array of shape (5, 5) with values ranging from 1 to 25. Perform the following operations:
• Flatten the array into a 1D array.
• Calculate the mean, median, and standard deviation of the array.
• Reshape the array back into a 5x5 matrix and replace all values greater than 10 with 0.
import numpy as np

array_2d = np.arange(1, 26).reshape(5, 5)

array_flattened = array_2d.flatten()

mean_value = np.mean(array_flattened)
median_value = np.median(array_flattened)
std_deviation = np.std(array_flattened)

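# reshape() returns a view of array_flattened, so copy it before modifying;
# otherwise the zeros would also show up in the flattened array printed below.
array_reshaped = array_flattened.reshape(5, 5).copy()
array_reshaped[array_reshaped > 10] = 0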

print("Original 2D Array:")
print(array_2d)
print("\nFlattened Array:")
print(array_flattened)
print("\nMean:", mean_value)
print("Median:", median_value)
print("Standard Deviation:", std_deviation)
print("\nModified 2D Array:")
print(array_reshaped)
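One detail about the statistics above: np.std computes the population standard deviation by default (ddof=0). The following is a minimal sketch, reusing the same values 1 to 25, of how that differs from the sample standard deviation (ddof=1). The variable names are illustrative and not part of the original practical.

```python
import numpy as np

values = np.arange(1, 26)  # same values as the practical: 1..25

# Population standard deviation (NumPy's default, ddof=0): divide by n
pop_std = np.std(values)

# Sample standard deviation (ddof=1): divide by n - 1
sample_std = np.std(values, ddof=1)

print("Mean:", np.mean(values))                # 13.0
print("Median:", np.median(values))            # 13.0
print("Population std:", round(pop_std, 4))    # 7.2111
print("Sample std:", round(sample_std, 4))     # 7.3598
```

Which of the two is appropriate depends on whether the 25 values are treated as the whole population or as a sample from a larger one.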

• Create two NumPy arrays: a 3x3 matrix of random integers between 1 and 10 and a 3x1 column vector of random integers between 1 and 5.
• Perform the following:
  o Multiply the matrix by the column vector.
  o Transpose the resulting matrix.
  o Find the determinant of the original 3x3 matrix.
import numpy as np

matrix = np.random.randint(1, 11, size=(3, 3))

column_vector = np.random.randint(1, 6, size=(3, 1))

result_matrix = np.dot(matrix, column_vector)

transposed_matrix = result_matrix.T

determinant = np.linalg.det(matrix)

print("Original 3x3 Matrix:")
print(matrix)
print("\n3x1 Column Vector:")
print(column_vector)
print("\nResulting Matrix After Multiplication:")
print(result_matrix)
print("\nTransposed Matrix:")
print(transposed_matrix)
print("\nDeterminant of the Original Matrix:", determinant)
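Because the arrays above are random, the printed values change on every run. The sketch below shows one way to make the run repeatable, assuming NumPy's Generator API (np.random.default_rng) with an arbitrary seed. It also uses the @ operator for the product and rounds the determinant, since np.linalg.det works in floating point. The seed and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, chosen only for repeatability

matrix = rng.integers(1, 11, size=(3, 3))        # integers 1..10 (high is exclusive)
column_vector = rng.integers(1, 6, size=(3, 1))  # integers 1..5

product = matrix @ column_vector   # (3, 3) @ (3, 1) -> (3, 1)
transposed = product.T             # (3, 1) -> (1, 3)

# det() returns a float, so round before treating it as an integer
determinant = np.linalg.det(matrix)
determinant_int = int(round(determinant))

print("Product shape:", product.shape)
print("Transposed shape:", transposed.shape)
print("Determinant:", determinant_int)
```

A determinant of 0 would mean the matrix is singular (not invertible). With random integer entries this is uncommon but possible.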

Create a Pandas DataFrame with columns Name, Age, Height, and City with the following data:
• Perform the following tasks:
  o Display the first 3 rows of the DataFrame.
  o Add a new column Weight with random values.
  o Filter the rows where Age is greater than 25 and display only the Name and Height columns.
import numpy as np
import pandas as pd

data = {
"Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
"Age": [23, 30, 35, 22, 28],
"Height": [5.5, 6.0, 5.8, 5.9, 5.7],
"City": ["New York", "Los Angeles", "Chicago", "Houston", "Phoenix"]
}
df = pd.DataFrame(data)

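print("\nFirst 3 Rows of DataFrame:")
print(df.head(3))

# Add a Weight column of random integers from 50 to 100 (the upper bound 101 is exclusive)
df["Weight"] = np.random.randint(50, 101, size=len(df))

print("\nDataFrame with Weight Column:")
print(df)

# Keep rows with Age > 25, then select only the Name and Height columns
filtered_df = df[df["Age"] > 25][["Name", "Height"]]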


print("\nFiltered Rows (Age > 25):")
print(filtered_df)
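The filter above selects rows first and columns second. As a minimal sketch, the same result can be obtained in a single step with .loc, or with query(). The DataFrame here repeats only the columns needed for the filter, and the names older and older_q are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [23, 30, 35, 22, 28],
    "Height": [5.5, 6.0, 5.8, 5.9, 5.7],
})

# Row condition and column selection in a single .loc call
older = df.loc[df["Age"] > 25, ["Name", "Height"]]

# Equivalent selection using query()
older_q = df.query("Age > 25")[["Name", "Height"]]

print(older)
```

Both forms keep Bob, Charlie, and Eve. The .loc form is usually preferred when the filtered result will be modified later, because it avoids chained indexing.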

Create a DataFrame containing Name, Age, and Salary columns with some missing (NaN) values.
• Fill the missing Age values with the mean value of the column.
• Drop any rows where Salary is missing.
import numpy as np
import pandas as pd

data_with_nan = {
"Name": ["Frank", "Grace", "Hank", "Ivy", "Jack"],
"Age": [25, np.nan, 29, np.nan, 32],
"Salary": [50000, 60000, np.nan, 75000, 80000]
}
df_nan = pd.DataFrame(data_with_nan)

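# Assign the result back: calling fillna(inplace=True) on a selected column is
# deprecated in recent pandas and may not update df_nan.
df_nan["Age"] = df_nan["Age"].fillna(df_nan["Age"].mean())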

df_nan.dropna(subset=["Salary"], inplace=True)

print("\nDataFrame with Missing Values Handled:")
print(df_nan)
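To see what the two cleaning steps actually change, it helps to count the missing values before and after. The sketch below uses the same data and a non-inplace style. The names missing_before, age_mean, and cleaned are illustrative and not part of the original practical.

```python
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({
    "Name": ["Frank", "Grace", "Hank", "Ivy", "Jack"],
    "Age": [25, np.nan, 29, np.nan, 32],
    "Salary": [50000, 60000, np.nan, 75000, 80000],
})

# Count missing values per column before cleaning
missing_before = df_nan.isna().sum()
print(missing_before)

# mean() skips NaN values, so this is (25 + 29 + 32) / 3
age_mean = df_nan["Age"].mean()

# Fill Age with the mean, then drop rows whose Salary is missing
cleaned = (
    df_nan
    .assign(Age=df_nan["Age"].fillna(age_mean))
    .dropna(subset=["Salary"])
)

print(cleaned)
print("Missing after:", int(cleaned.isna().sum().sum()))
```

Hank's row is dropped because its Salary is missing, leaving four rows. Grace and Ivy receive the mean age of about 28.67.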

Create a line plot that represents the relationship between two lists x = [1, 2, 3, 4, 5] and y = [2, 4, 6, 8, 10].
• Label the x-axis as "X values" and the y-axis as "Y values".
• Add a title "Simple Line Plot".
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y, marker='o')
plt.xlabel("X values")
plt.ylabel("Y values")
plt.title("Simple Line Plot")
plt.grid(True)
plt.show()
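The plotted points fall on a straight line. As a quick numerical check of that relationship, the sketch below fits a first-degree polynomial with np.polyfit. This check is an addition for illustration and is not required by the task.

```python
import numpy as np

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Fit y = slope * x + intercept to the points (degree-1 polynomial)
slope, intercept = np.polyfit(x, y, 1)

print("Slope:", round(float(slope), 6))
print("Intercept:", round(float(intercept), 6))
```

The slope comes out at 2 and the intercept at zero up to floating-point rounding, which matches y = 2x.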

Create a bar plot comparing the sales of different products in a store.


import matplotlib.pyplot as plt

products = ["Product A", "Product B", "Product C", "Product D"]
sales = [250, 400, 300, 450]

plt.bar(products, sales, color=['blue', 'green', 'red', 'purple'])
plt.xlabel("Products")
plt.ylabel("Sales")
plt.title("Product Sales Comparison")
plt.show()
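A bar plot is easier to read when the comparison is also available as numbers. The sketch below, using the same hypothetical sales figures, ranks the products and computes each product's share of total sales with a pandas Series. The names sales_by_product and share are illustrative.

```python
import pandas as pd

products = ["Product A", "Product B", "Product C", "Product D"]
sales = [250, 400, 300, 450]

# Rank products from highest to lowest sales
sales_by_product = pd.Series(sales, index=products).sort_values(ascending=False)

# Each product's share of total sales, in percent
share = (sales_by_product / sales_by_product.sum() * 100).round(1)

print(sales_by_product)
print("Best seller:", sales_by_product.idxmax())  # Product D
print(share)
```

Plotting the sorted Series (for example with sales_by_product.plot.bar()) would order the bars from highest to lowest.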


Plot histograms for both total_bill and tip. Compare their distributions.
• Create overlapping histograms for total_bill for lunch and dinner times. What differences do you notice?
• Adjust the number of bins in the histogram to 50. How does it affect the visualization?
import matplotlib.pyplot as plt
import seaborn as sns

# load_dataset fetches the example "tips" data (downloaded on first use)
data = sns.load_dataset('tips')

# Both histograms share one set of axes; alpha keeps the overlap visible
plt.hist(data['total_bill'], bins=50, alpha=0.7, label='Total Bill', color='blue')
plt.hist(data['tip'], bins=50, alpha=0.7, label='Tip', color='green')
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histograms of Total Bill and Tip")
plt.legend()
plt.show()

# Split the rows by meal time
lunch_data = data[data['time'] == 'Lunch']
dinner_data = data[data['time'] == 'Dinner']

plt.hist(lunch_data['total_bill'], bins=50, alpha=0.7, label='Lunch', color='orange')
plt.hist(dinner_data['total_bill'], bins=50, alpha=0.7, label='Dinner', color='purple')
plt.xlabel("Total Bill")
plt.ylabel("Frequency")
plt.title("Overlapping Histograms of Total Bill (Lunch vs Dinner)")
plt.legend()
plt.show()

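Observation: Setting bins to 50 narrows each bin, so the histograms show finer detail in the distribution. Each bin also holds fewer observations, which makes the shape noisier and can leave gaps in the sparse upper tail.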
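The effect of the bin count can also be checked numerically with np.histogram, which returns the counts and bin edges without drawing anything. The sketch below uses a hypothetical right-skewed sample generated with a seeded random generator, not the tips data, so it runs without downloading the dataset. The distribution parameters and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for repeatability

# Hypothetical right-skewed sample standing in for a bill-like variable
sample = rng.gamma(shape=2.0, scale=10.0, size=244)

summary = {}
for bins in (10, 50):
    counts, edges = np.histogram(sample, bins=bins)
    width = edges[1] - edges[0]          # all bins have equal width
    empty = int((counts == 0).sum())     # bins containing no observations
    summary[bins] = (width, counts)
    print(f"bins={bins}: width={width:.2f}, "
          f"max count={counts.max()}, empty bins={empty}")
```

With 50 bins each bin is one fifth as wide as with 10 bins, so the tallest bar is lower and empty bins become more likely in the tail.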

Create a boxplot comparing tip amounts for smokers and non-smokers. What trends can you identify?
• Add a swarmplot over the boxplot (use sns.swarmplot) for total_bill by day. Does it add any additional insights?
• Group the boxplot by sex and time (e.g., use hue='sex' and x='time') to see if there are any differences in spending habits.
import matplotlib.pyplot as plt
import seaborn as sns

data = sns.load_dataset('tips')

plt.figure(figsize=(8, 6))
sns.boxplot(x='smoker', y='tip', data=data)
plt.title("Boxplot of Tip Amounts for Smokers and Non-Smokers")
plt.xlabel("Smoker")
plt.ylabel("Tip Amount")
plt.show()

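Observation: The boxplot compares the two groups through the median line, the height of the box (the interquartile range), and the points drawn beyond the whiskers. Together these show whether smokers tend to tip more or less than non-smokers and how variable their tips are.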
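The elements of a boxplot can be reproduced by hand, which makes the plot easier to interpret. The sketch below computes the quartiles, the interquartile range, and the 1.5 × IQR fences (the default whisker rule) for a small hypothetical set of tip amounts. The numbers are made up for illustration and are not taken from the tips dataset.

```python
import numpy as np

# Hypothetical tip amounts, used only to illustrate the calculation
sample_tips = np.array([1, 2, 3, 4, 5, 6, 7, 8, 20], dtype=float)

q1, median, q3 = np.percentile(sample_tips, [25, 50, 75])
iqr = q3 - q1  # height of the box

# Points beyond 1.5 * IQR from the box edges are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sample_tips[(sample_tips < lower_fence) | (sample_tips > upper_fence)]

print("Q1, median, Q3:", q1, median, q3)    # 3.0 5.0 7.0
print("IQR:", iqr)                           # 4.0
print("Fences:", lower_fence, upper_fence)   # -3.0 13.0
print("Outliers:", outliers)                 # [20.]
```

In the plot, the value 20 would appear as a separate point above the upper whisker.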

plt.figure(figsize=(10, 6))
sns.boxplot(x='day', y='total_bill', data=data, palette='Set2')
sns.swarmplot(x='day', y='total_bill', data=data, color='black', alpha=0.7)
plt.title("Boxplot with Swarmplot Overlay of Total Bill by Day")
plt.xlabel("Day")
plt.ylabel("Total Bill")
plt.show()

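Observation: The swarmplot overlays every individual bill on the boxes, showing how many observations fall on each day and where they cluster, which the box summary alone hides. Outliers also appear as individual points.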

plt.figure(figsize=(10, 6))
sns.boxplot(x='time', y='total_bill', hue='sex', data=data, palette='coolwarm')
plt.title("Boxplot of Total Bill Grouped by Sex and Time")
plt.xlabel("Time")
plt.ylabel("Total Bill")
plt.legend(title="Sex")
plt.show()

Observation: Grouping by time with hue='sex' places the male and female boxes side by side for lunch and dinner, so differences in median total bill and in spread between the sexes can be read off for each meal.
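The medians shown by the grouped boxplot can be tabulated directly with groupby. The sketch below uses a few hypothetical rows shaped like the tips dataset (columns sex, time, total_bill) so that it runs without downloading the data. The values are made up and say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical rows shaped like the tips dataset (not the real data)
sample = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "time": ["Lunch", "Lunch", "Dinner", "Dinner", "Dinner", "Lunch"],
    "total_bill": [12.0, 10.0, 25.0, 20.0, 31.0, 14.0],
})

# Median total bill for each (time, sex) pair: the center line of each box
medians = (
    sample.groupby(["time", "sex"])["total_bill"]
    .median()
    .unstack("sex")
)

print(medians)
```

Applying the same groupby to the real data gives one number per box. There the sex and time columns are categorical, so passing observed=True to groupby may be needed to avoid a warning in recent pandas versions.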