Below is the full Python code for the Insurance Claims Dataset Assignment, along with explanations for
each step. This code addresses all the tasks mentioned in the assignment.
Python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
1. Purpose: Import necessary libraries for data manipulation, visualization, statistical analysis, and
machine learning.
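Python
data = pd.read_csv("insurance.csv")  # file name is an assumption; point this at your copy of the dataset
print(data.head())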
1. Purpose: Load the dataset into a pandas DataFrame and display the first few rows to understand
its structure.
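a. Handle Missing Values
Python
print(data.isnull().sum())  # count missing values per column
data = data.dropna()  # drop rows that contain any missing value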
b. Handle Outliers
Python
sns.boxplot(x=data['bmi'])
plt.title("BMI Outliers")
plt.show()
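The box plot only reveals outliers; it does not handle them. One common option is the 1.5 × IQR rule, sketched below. This is not necessarily the approach the assignment used: the sample `bmi` values are made up, and the choice to cap outliers with `clip` rather than drop the rows is an assumption.

```python
import pandas as pd

# Hypothetical sample standing in for the real `bmi` column.
data = pd.DataFrame({"bmi": [22.5, 27.1, 30.4, 24.8, 33.0, 28.9, 26.3, 61.0]})

# Quartiles and interquartile range of the column.
q1 = data["bmi"].quantile(0.25)
q3 = data["bmi"].quantile(0.75)
iqr = q3 - q1

# Fences of the 1.5 * IQR rule.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Flag the outliers, then cap them at the fences instead of dropping rows.
outliers = data[(data["bmi"] < lower) | (data["bmi"] > upper)]
data["bmi_capped"] = data["bmi"].clip(lower=lower, upper=upper)

print(outliers)
print(data["bmi_capped"].max())
```

Capping keeps every row in the dataset, which matters when the sample is small; dropping the flagged rows is the simpler alternative when outliers are likely data-entry errors.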
a. Statistical Analysis
Python
print(data.describe())
b. Visualizations
Python
plt.scatter(data['age'], data['charges'])
plt.title("Age vs Charges")
plt.xlabel("Age")
plt.ylabel("Charges")
plt.show()
# Box plot: charges by smoker status
sns.boxplot(x='smoker', y='charges', data=data)
plt.title("Charges by Smoker Status")
plt.show()
Python
male_count = (data['sex'] == 0).sum()  # assumes 0 encodes male, as in the original
total_count = len(data)
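The two counts are presumably meant to be combined into a proportion. The original does not show that step, so the following is only a sketch. It uses a made-up `sex` column and assumes, as the snippet above does, that `0` encodes male.

```python
import pandas as pd

# Hypothetical sample; 0 is assumed to encode male, 1 female.
data = pd.DataFrame({"sex": [0, 1, 0, 0, 1, 1, 0, 1, 0, 1]})

male_count = (data["sex"] == 0).sum()
total_count = len(data)

# Empirical probability that a randomly chosen policyholder is male.
p_male = male_count / total_count
print(f"P(male) = {p_male:.2f}")

# The mean of the boolean mask gives the same proportion in one step.
p_male_direct = (data["sex"] == 0).mean()
print(p_male_direct)
```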
Python
# Mean charges for northwest (1) versus all other regions (0)
region_charges = data.groupby('region_northwest')['charges'].mean()
print(region_charges)
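Grouping by a single dummy column only contrasts northwest with everything else. To compare all regions at once, the dummy columns can be collapsed back into one label first. A minimal sketch on made-up rows, assuming the regions are one-hot encoded in columns prefixed `region_` and that a row of all zeros is the dropped baseline category (labelled `region_baseline` here, a hypothetical name):

```python
import pandas as pd

# Hypothetical rows; the fourth row has no dummy set (baseline region).
data = pd.DataFrame({
    "region_northwest": [1, 0, 0, 0, 1, 0],
    "region_southeast": [0, 1, 0, 0, 0, 1],
    "region_southwest": [0, 0, 1, 0, 0, 0],
    "charges": [1000.0, 3000.0, 2000.0, 1500.0, 2000.0, 5000.0],
})

region_cols = [c for c in data.columns if c.startswith("region_")]

# idxmax picks the dummy column holding the 1 in each row.
data["region"] = data[region_cols].idxmax(axis=1)
# Rows with no dummy set belong to the dropped baseline category.
data.loc[data[region_cols].sum(axis=1) == 0, "region"] = "region_baseline"

# Mean charges for every region in a single table.
region_charges = data.groupby("region")["charges"].mean()
print(region_charges)
```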
b. ANOVA Test
Python
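from scipy.stats import f_oneway  # assumed: the opening of this call is missing from the source
f_stat, p_value = f_oneway(data[data['region_northwest'] == 1]['charges'],  # first group assumed
                           data[data['region_southeast'] == 1]['charges'],
                           data[data['region_southwest'] == 1]['charges'])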
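A one-way ANOVA compares mean `charges` across groups; a small p-value suggests that at least one group mean differs. The sketch below runs `scipy.stats.f_oneway` on synthetic data, since the full call is not visible in the original. The group sizes, the group means, the 0.05 threshold and the column layout (one dummy column per region) are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Synthetic stand-in: three one-hot region columns and a charges column.
regions = ["region_northwest", "region_southeast", "region_southwest"]
means = {"region_northwest": 12000, "region_southeast": 15000, "region_southwest": 12500}
frames = []
for region in regions:
    frame = pd.DataFrame({"charges": rng.normal(means[region], 2000, size=50)})
    for col in regions:
        frame[col] = int(col == region)
    frames.append(frame)
data = pd.concat(frames, ignore_index=True)

# One sample of charges per region, selected through the dummy columns.
groups = [data.loc[data[col] == 1, "charges"] for col in regions]
f_stat, p_value = f_oneway(*groups)

print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Mean charges differ between at least two regions.")
else:
    print("No significant difference in mean charges between regions.")
```

ANOVA only says that some difference exists; identifying which regions differ would need a follow-up comparison.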
Python
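from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
X = data.drop(columns='charges')  # assumed: every other column is a feature
y = data['charges']
# Train-test split (test_size and random_state are assumed values)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)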
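Predictions are usually followed by an evaluation step. The original does not show one, so this is a sketch on synthetic data: it fits the same kind of `LinearRegression` model and reports mean squared error and R² via `sklearn.metrics`. The feature names, the coefficients used to generate `y`, `test_size` and `random_state` are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 200

# Synthetic features loosely shaped like the insurance data.
X = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "bmi": rng.normal(30, 5, size=n),
    "smoker": rng.integers(0, 2, size=n),
})
# Charges built as a linear signal plus noise (coefficients are made up).
y = 250 * X["age"] + 300 * X["bmi"] + 20000 * X["smoker"] + rng.normal(0, 2000, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Lower MSE and R^2 closer to 1 indicate a better fit on unseen data.
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.2f}")
print(f"R^2: {r2:.3f}")
```

Scoring on the held-out test split, rather than on the training rows, is what makes these numbers an estimate of how the model would behave on new policyholders.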
Step 9: Submission