DA Lab Manual r22
Week 1.a
1 Data Preprocessing
a. Handling missing values
import pandas as pd
import numpy as np

# Load the student dataset
dataset_path = "Poojithafile.csv"
df = pd.read_csv(dataset_path)
print(df.head())
Output:
SNO HTNO Student Name Age Address Attendence Marks
0 1 22C11A0427 Likitha 18.0 kodad 75.0 90.0
1 2 22C11A0428 Nandhini 19.0 khammam 80.0 84.0
2 3 22C11A0429 Latha 18.0 tirupathi 75.0 76.0
3 4 22C11A0430 Poojitha 21.0 suryapet 97.0 91.0
4 5 22C11A0431 Madhuri NaN kodad 94.0 75.0
print(df.info())
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 SNO 12 non-null int64
1 HTNO 12 non-null object
2 Student Name 12 non-null object
3 Age 11 non-null float64
4 Address 10 non-null object
5 Attendence 10 non-null float64
6 Marks 11 non-null float64
dtypes: float64(3), int64(1), object(3)
memory usage: 804.0+ bytes
None
print(df.describe())
Output:
SNO Age Attendence Marks
count 12.000000 11.00000 10.000000 11.000000
mean 6.500000 20.00000 81.500000 77.090909
std 3.605551 2.04939 12.030055 21.658507
min 1.000000 18.00000 60.000000 21.000000
25% 3.750000 18.50000 75.000000 75.500000
50% 6.500000 19.00000 78.000000 87.000000
75% 9.250000 21.00000 91.750000 90.000000
max 12.000000 24.00000 98.000000 91.000000
print(df.shape)
Output:
(12, 7)
print(df.isnull().sum())
Output:
SNO 0
HTNO 0
Student Name 0
Age 1
Address 2
Attendence 2
Marks 1
dtype: int64
print(df.nunique())
Output:
SNO 12
HTNO 12
Student Name 12
Age 6
Address 7
Attendence 8
Marks 8
dtype: int64
print(df['Student Name'])
Output:
0 Likitha
1 Nandhini
2 Latha
3 Poojitha
4 Madhuri
5 Manjula
6 Sushmitha
7 Rishi
8 Dimple
9 Anmol
10 Namratha
11 Pavan kalyan
Name: Student Name, dtype: object
print(df.groupby('Age')['Attendence'].mean())
Output:
Age
18.0 78.333333
19.0 77.500000
20.0 98.000000
21.0 97.000000
23.0 76.000000
24.0 60.000000
Name: Attendence, dtype: float64
print(df.isnull().sum())
Output:
SNO 0
HTNO 0
Student Name 0
Age 1
Address 2
Attendence 2
Marks 1
dtype: int64
age_mean=df.Age.mean()
print("Mean of age column:",age_mean)
Output:
Mean of age column: 20.0
df['Age'] = df['Age'].fillna(age_mean)
print(df.head())
Output:
SNO HTNO Student Name Age Address Attendence Marks
0 1 22C11A0427 Likitha 18.0 kodad 75.0 90.0
1 2 22C11A0428 Nandhini 19.0 khammam 80.0 84.0
2 3 22C11A0429 Latha 18.0 tirupathi 75.0 76.0
3 4 22C11A0430 Poojitha 21.0 suryapet 97.0 91.0
4 5 22C11A0431 Madhuri 20.0 kodad 94.0 75.0
print(df.isnull().sum())
Output:
SNO 0
HTNO 0
Student Name 0
Age 0
Address 2
Attendence 2
Marks 1
dtype: int64
print(df.to_string())
Output:
SNO HTNO Student Name Age Address Attendence Marks
0 1 22C11A0427 Likitha 18.0 kodad 75.0 90.0
1 2 22C11A0428 Nandhini 19.0 khammam 80.0 84.0
2 3 22C11A0429 Latha 18.0 tirupathi 75.0 76.0
3 4 22C11A0430 Poojitha 21.0 suryapet 97.0 91.0
4 5 22C11A0431 Madhuri 20.0 kodad 94.0 75.0
5 6 22C11A0432 Manjula 24.0 mulugu 60.0 NaN
6 7 22C11A0433 Sushmitha 18.0 NaN 85.0 90.0
7 8 22C11A0434 Rishi 19.0 karimnagar NaN 90.0
8 9 22C11A0435 Dimple 19.0 karimnagar 75.0 87.0
9 10 22C11A0436 Anmol 23.0 NaN 76.0 54.0
10 11 22C11A0437 Namratha 21.0 karimnagar NaN 21.0
11 12 22C11A0438 Pavan kalyan 20.0 pitapuram 98.0 90.0
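Age is now complete, but Address, Attendence, and Marks still contain NaNs. One way to fill them is sketched below; the choice of statistic per column (mode, mean, median) is an assumption for illustration, not part of the original run.
# Fill the remaining missing values (statistics chosen for illustration)
df['Address'] = df['Address'].fillna(df['Address'].mode()[0])        # most frequent address
df['Attendence'] = df['Attendence'].fillna(df['Attendence'].mean())  # mean attendance
df['Marks'] = df['Marks'].fillna(df['Marks'].median())               # median marks
print(df.isnull().sum())  # every column should now show 0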
Week 1.b
b. Noise detection and removal
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Generate 30 marks centred around 70 (seed 0), then inject two noisy values
np.random.seed(0)
marks = 70 + 10 * np.random.randn(30)
marks[5] = 120   # noise: impossibly high mark
marks[15] = -10  # noise: impossibly low mark
# Create a DataFrame
df = pd.DataFrame(marks, columns=['Marks'])
print("Original Marks Dataset with Noise:\n", df)
# Outlier bounds via the IQR rule (assumed; the bound computation was not shown)
Q1, Q3 = df['Marks'].quantile(0.25), df['Marks'].quantile(0.75)
IQR = Q3 - Q1
lower_bound, upper_bound = Q1 - 1.5 * IQR, Q3 + 1.5 * IQR
# Remove outliers
df_cleaned = df[(df['Marks'] >= lower_bound) & (df['Marks'] <= upper_bound)]
print("\nCleaned Marks Dataset:\n", df_cleaned)
Output:
Original Marks Dataset with Noise:
Marks
0 87.640523
1 74.001572
2 79.787380
3 92.408932
4 88.675580
5 120.000000
6 79.500884
7 68.486428
8 68.967811
9 74.105985
10 71.440436
11 84.542735
12 77.610377
13 71.216750
14 74.438632
15 -10.000000
16 84.940791
17 67.948417
18 73.130677
19 61.459043
20 44.470102
21 76.536186
22 78.644362
23 62.578350
24 92.697546
25 55.456343
26 70.457585
27 68.128161
28 85.327792
29 84.693588
Week 2
2 Implement data processing to identify and eliminate data redundancy
import pandas as pd

# Step 1: Create a simple student dataset with some redundancy (duplicates)
data = {
    'StudentID': [101, 102, 103, 104, 105, 102, 106, 107, 105],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Bob', 'Frank', 'Grace', 'Eva'],
    'Age': [20, 21, 22, 23, 24, 21, 25, 26, 24],
    'Grade': ['A', 'B', 'C', 'B', 'A', 'B', 'A', 'A', 'A']
}

# Create a DataFrame
df = pd.DataFrame(data)
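Before removing anything, the redundancy can be identified explicitly; a short sketch follows (the step numbering here is assumed to fill the gap before Step 4).
# Step 2: Identify fully duplicated rows
print("Duplicate rows:\n", df[df.duplicated()])

# Step 3: Count occurrences of each StudentID to spot redundant records
counts = df['StudentID'].value_counts()
print("\nRedundant StudentIDs:\n", counts[counts > 1])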
# Step 4: Remove rows that duplicate a particular column (StudentID)
df_no_duplicates = df.drop_duplicates(subset='StudentID')
print(df_no_duplicates)
Output:
   StudentID     Name  Age Grade
0        101    Alice   20     A
1        102      Bob   21     B
2        103  Charlie   22     C
3        104    David   23     B
4        105      Eva   24     A
6        106    Frank   25     A
7        107    Grace   26     A
Week 3
3 Implement an imputation model
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer

# Example data with missing values (placeholder; the original dataset is not shown)
data = {'Age': [18, 19, np.nan, 21, 20], 'Marks': [90, np.nan, 76, 91, 75]}

# Create a DataFrame
df = pd.DataFrame(data)
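The imputation itself is sketched below with SimpleImputer's mean strategy; the strategy and column names are assumptions for illustration.
# Impute missing values with the column mean
imputer = SimpleImputer(strategy='mean')
df[['Age', 'Marks']] = imputer.fit_transform(df[['Age', 'Marks']])
print(df)  # fills Age -> 19.5 and Marks -> 83.0 for this placeholder data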
Week 4
4 Implement Linear Regression
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
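A minimal sketch of the remaining steps, assuming a small Study_Hours-to-Marks dataset; the data values and step numbering are illustrative.
# Step 2: Create an example dataset (values are illustrative)
df = pd.DataFrame({'Study_Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Marks': [35, 42, 50, 55, 62, 68, 74, 80, 85, 92]})

# Step 3: Split into feature (X) and target (y)
X = df[['Study_Hours']]
y = df['Marks']

# Step 4: Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 5: Fit the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Step 6: Predict and evaluate
y_pred = model.predict(X_test)
print(f"Mean Squared Error: {mean_squared_error(y_test, y_pred)}")
print(f"R-squared: {r2_score(y_test, y_pred)}")

# Step 7: Plot the data and the fitted line
plt.scatter(X, y, label='Actual')
plt.plot(X, model.predict(X), color='red', label='Fitted line')
plt.xlabel('Study Hours')
plt.ylabel('Marks')
plt.legend()
plt.show()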
Week 5
5 Implement logistic regression
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Steps 1-2: Create an example dataset (values are illustrative placeholders)
df = pd.DataFrame({'Study_Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Passed': [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]})

# Step 3: Split the data into features (X) and target (y)
X = df[['Study_Hours']]  # Feature (independent variable)
y = df['Passed']  # Target (dependent variable)
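A minimal sketch of the training and evaluation steps, continuing from the illustrative dataset above.
# Step 4: Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# Step 5: Train the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Step 6: Predict and evaluate
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
print(f"Confusion Matrix:\n{confusion_matrix(y_test, y_pred)}")
print(f"Classification Report:\n{classification_report(y_test, y_pred)}")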
Week 6
6 Implement decision tree induction for classification
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Steps 1-2: Create an example dataset (values are illustrative placeholders)
df = pd.DataFrame({'Study_Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Marks': [35, 42, 50, 55, 62, 68, 74, 80, 85, 92],
                   'Passed': [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]})

# Step 3: Split the data into features (X) and target (y)
X = df[['Study_Hours', 'Marks']]  # Features (independent variables)
y = df['Passed']  # Target (dependent variable)
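The intermediate steps are sketched here so the evaluation prints below run; parameters and the tree visualization are illustrative.
# Step 4: Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# Step 5: Train the decision tree classifier
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Step 6: Predict and compute the evaluation metrics
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

# Visualize the induced tree
plt.figure(figsize=(8, 6))
plot_tree(model, feature_names=['Study_Hours', 'Marks'], class_names=['Fail', 'Pass'], filled=True)
plt.show()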
# Output the evaluation results
print(f'Accuracy: {accuracy}')
print(f'Confusion Matrix:\n{conf_matrix}')
print(f'Classification Report:\n{class_report}')
Week 7
7 Implement random forest classifier
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Steps 1-2: Create an example dataset (values are illustrative placeholders)
df = pd.DataFrame({'Study_Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Marks': [35, 42, 50, 55, 62, 68, 74, 80, 85, 92],
                   'Passed': [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]})

# Step 3: Split the data into features (X) and target (y)
X = df[['Study_Hours', 'Marks']]  # Features (independent variables)
y = df['Passed']  # Target (dependent variable)
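A minimal sketch of the training and evaluation steps, continuing from the illustrative dataset above.
# Step 4: Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# Step 5: Train the random forest (an ensemble of 100 trees)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Step 6: Predict and evaluate
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
print(f"Confusion Matrix:\n{confusion_matrix(y_test, y_pred)}")
print(f"Classification Report:\n{classification_report(y_test, y_pred)}")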
Week 8
8 Object segmentation using hierarchical clustering methods.
import numpy as np
import matplotlib.pyplot as plt
from skimage import io, color
from skimage.transform import resize
from scipy.cluster.hierarchy import linkage, fcluster
from skimage.segmentation import mark_boundaries
# Step 1: Load an image
image = io.imread('FDP on Cybersecurity.jpg')  # example image file
image_rgb = image / 255.0  # normalize pixel values to [0, 1]

# Downsample the image to reduce its size (e.g., resize to 1/4 of the original)
downsampled_image = resize(image_rgb, (image_rgb.shape[0] // 4, image_rgb.shape[1] // 4), mode='reflect')

# Show the original and downsampled image
plt.figure(figsize=(8, 6))
plt.subplot(1, 2, 1)
plt.imshow(image_rgb)
plt.title('Original Image')
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(downsampled_image)
plt.title('Downsampled Image')
plt.axis('off')
plt.show()

# Step 2: Pre-process the image (reshape for clustering)
# Flatten the downsampled image
pixels = downsampled_image.reshape(-1, 3)  # shape: (number_of_pixels, 3)

# Step 3: Use only a subset of pixels for clustering
# Select a random subset of pixels (e.g., 10,000)
subset_size = 10000
np.random.seed(42)  # for reproducibility
subset_indices = np.random.choice(pixels.shape[0], subset_size, replace=False)
subset_pixels = pixels[subset_indices]

# Step 4: Perform hierarchical clustering on the subset of pixels
Z = linkage(subset_pixels, method='ward')  # 'ward' minimizes variance within clusters

# Step 5: Assign clusters (a maxclust threshold segments the image)
num_clusters = 5  # you can change the number of clusters
clusters = fcluster(Z, num_clusters, criterion='maxclust')

# Step 6: Map the clustering result back to the full downsampled image.
# Since only a subset of pixels was clustered, create a label array covering
# every pixel and fill in the labels at the subset's indices.
cluster_labels_full = np.zeros(pixels.shape[0], dtype=int)

# Assign the clusters to the labels corresponding to the subset indices
cluster_labels_full[subset_indices] = clusters

# Reshape the cluster labels to the shape of the downsampled image
segmented_image = cluster_labels_full.reshape(downsampled_image.shape[0], downsampled_image.shape[1])

# Step 7: Visualize the segmented image
plt.figure(figsize=(8, 6))
plt.imshow(segmented_image, cmap='jet')  # 'jet' color map for better visualization
plt.title('Hierarchical Segmentation (Clustering)')
plt.axis('off')
plt.show()

# Step 8: Mark boundaries on the downsampled image (mode='thick' draws thicker boundaries)
boundaries = mark_boundaries(downsampled_image, segmented_image, color=(1, 0, 0), mode='thick')

# Show the image with boundaries marked
plt.figure(figsize=(8, 6))
plt.imshow(boundaries)
plt.title('Boundaries of Segments')
plt.axis('off')
plt.show()
OUTPUT: (figures: the original and downsampled images, the segmented image, and the segment boundaries)
Week 9
9 Perform visualization techniques (bar, column, line, scatter, 3D cubes)
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from mpl_toolkits.mplot3d import Axes3D

# Example data (illustrative placeholders)
categories = ['A', 'B', 'C', 'D']
values = [10, 24, 17, 30]
x = np.linspace(0, 10, 100)
y = np.sin(x)
z = np.cos(x)

# Bar chart
plt.figure(figsize=(8, 6))
plt.bar(categories, values, color='skyblue')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Bar Chart')
plt.show()
plt.figure(figsize=(8, 6))
plt.barh(categories, values, color='lightcoral')
plt.xlabel('Values')
plt.ylabel('Categories')
plt.title('Column Chart (Horizontal Bar)')
plt.show()
plt.figure(figsize=(8, 6))
plt.plot(x, y, label='sin(x)', color='blue')
plt.plot(x, z, label='cos(x)', color='green')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Chart')
plt.legend()
plt.show()
plt.figure(figsize=(8, 6))
plt.scatter(x, y, color='red', label='sin(x)')
plt.scatter(x, z, color='purple', label='cos(x)')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot')
plt.legend()
plt.show()
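For the "3D cubes" part of the experiment, a minimal sketch using the imported Axes3D toolkit; the data and styling are illustrative.
# 3D bar chart ('3D cubes')
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
xpos, ypos = np.meshgrid(np.arange(4), np.arange(4))
xpos, ypos = xpos.ravel(), ypos.ravel()
zpos = np.zeros_like(xpos)
heights = np.random.randint(1, 10, size=16)  # cube heights (random for illustration)
ax.bar3d(xpos, ypos, zpos, dx=0.8, dy=0.8, dz=heights, color='teal')
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Height')
ax.set_title('3D Cubes (bar3d)')
plt.show()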
Week 10
10 Perform descriptive analytics on healthcare data.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Example healthcare data (placeholder; the original dataset is not shown)
data = {'Age': [25, 34, 45, 52, 61, 29, 48, 55],
        'BloodPressure': [120, 128, 135, 140, 150, 118, 138, 145],
        'Cholesterol': [180, 195, 210, 225, 240, 175, 215, 230]}

# Create a DataFrame
df = pd.DataFrame(data)
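A minimal sketch of the descriptive step on this placeholder data: summary statistics and a quick distribution plot (the column choices are illustrative).
# Summary statistics for each measure
print(df.describe())
print("\nCorrelations:\n", df.corr())

# Distribution of patient ages
sns.histplot(df['Age'], bins=5, kde=True)
plt.title('Age Distribution')
plt.xlabel('Age')
plt.show()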
Week 11
11 Perform predictive analytics on product sales data.
# Step 1: Import Necessary Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
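The dataset-creation step is sketched below; the feature names follow the preprocessing step and the coefficients in the output (Price, Advertising, Season), while the Sales target and all values are illustrative assumptions.
# Step 2: Create an example product-sales dataset (values are illustrative)
data = {
    'Price': [200, 220, 250, 210, 230, 240, 260, 205],
    'Advertising': [500, 600, 550, 520, 580, 620, 640, 510],
    'Season': ['Spring', 'Summer', 'Fall', 'Winter', 'Spring', 'Summer', 'Fall', 'Winter'],
    'Sales': [120, 150, 130, 100, 125, 155, 135, 105]
}
df = pd.DataFrame(data)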
# Step 3: Preprocess the Data
# Convert 'Season' to numeric values (Spring = 0, Summer = 1, Fall = 2, Winter = 3)
df['Season'] = df['Season'].map({'Spring': 0, 'Summer': 1, 'Fall': 2, 'Winter': 3})
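A minimal sketch of the remaining steps, mirroring Week 12's flow; note that the coefficient values printed under OUTPUT below come from the manual's original dataset, not from the illustrative data above.
# Steps 4-7: Split, train, evaluate, and inspect the model
X = df[['Price', 'Advertising', 'Season']]
y = df['Sales']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Mean Squared Error: {mean_squared_error(y_test, y_pred)}")
print(f"R-squared: {r2_score(y_test, y_pred)}")
print("\nModel Coefficients:")
print(pd.DataFrame(model.coef_, X.columns, columns=['Coefficient']))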
OUTPUT:
Model Coefficients:
Coefficient
Price 0.000014
Advertising 0.002721
Season -8.382353
Week 12
12 Apply predictive analytics for weather forecasting.
# Step 1: Import Necessary Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Step 2: Create a Mock Weather Dataset
data = {
    'Temperature': [30, 32, 33, 31, 29, 28, 25, 27, 30, 31, 33, 35, 36, 37, 34],
    'Humidity': [80, 75, 77, 70, 85, 88, 90, 85, 80, 78, 76, 74, 73, 72, 71],
    'Wind Speed': [10, 12, 15, 11, 13, 14, 9, 10, 12, 11, 10, 9, 8, 7, 6],
    'Pressure': [1010, 1012, 1011, 1010, 1011, 1013, 1012, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017],
    'Month': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3]
}

# Convert the dictionary into a pandas DataFrame
df = pd.DataFrame(data)
# Step 3: Explore the Dataset
print(df.head())
print("\nSummary Statistics:")
print(df.describe())
# Step 4: Handle Missing Values (not needed in this mock dataset)
# df.fillna(df.median(), inplace=True)
# Step 5: Select Features and Target Variable
X = df[['Humidity', 'Wind Speed', 'Pressure', 'Month']]  # Features
y = df['Temperature']  # Target variable (Temperature)
# Step 6: Split the Data into Training and Testing Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 7: Train the Linear Regression Model
model = LinearRegression()
model.fit(X_train, y_train)
# Step 8: Make Predictions
y_pred = model.predict(X_test)
# Step 9: Evaluate the Model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"\nMean Squared Error: {mse}")
print(f"R-squared: {r2}")
# Step 10: Visualize the Results
# Plot Actual vs Predicted Temperature
plt.scatter(y_test, y_pred)
plt.xlabel('Actual Temperature')
plt.ylabel('Predicted Temperature')
plt.title('Actual vs Predicted Temperature')
plt.show()
# Step 11: Model Interpretation
# Display model coefficients
coefficients = pd.DataFrame(model.coef_, X.columns, columns=['Coefficient'])
print("\nModel Coefficients:")
print(coefficients)
OUTPUT:
Temperature Humidity Wind Speed Pressure Month
0 30 80 10 1010 1
1 32 75 12 1012 2
2 33 77 15 1011 3
3 31 70 11 1010 4
4 29 85 13 1011 5
Summary Statistics:
Temperature Humidity Wind Speed Pressure Month
count 15.000000 15.000000 15.000000 15.000000 15.000000
mean 31.400000 78.266667 10.466667 1012.466667 5.600000
std 3.376389 6.284524 2.503331 2.199567 3.718679
min 25.000000 70.000000 6.000000 1010.000000 1.000000
25% 29.500000 73.500000 9.000000 1011.000000 2.500000
50% 31.000000 77.000000 10.000000 1012.000000 5.000000
75% 33.500000 82.500000 12.000000 1013.500000 8.500000
max 37.000000 90.000000 15.000000 1017.000000 12.000000
Model Coefficients:
Coefficient
Humidity -0.388341
Wind Speed 0.377432
Pressure 0.837989
Month -0.013751