Major project - Colab
Saving cardio_train.txt to cardio_train.txt
import pandas as pd
df = pd.read_csv('cardio_train.txt', sep=';')  # the file is semicolon-separated
# Dataset info (df.info() prints the summary itself and returns None, so no print() wrapper)
df.info()
# Summary statistics
print(df.describe())
from sklearn.model_selection import train_test_split
# Split into training and test datasets (80% train, 20% test); X holds the feature columns, y the target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
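The classification report shows a near-even test split (6988 class-0 rows vs 7012 class-1 rows, 14,000 in total, i.e. 20% of roughly 70,000 rows). If the target were less balanced, passing `stratify=y` to `train_test_split` would keep the class ratio identical in both parts. A minimal sketch on synthetic data; `X_demo`/`y_demo` are hypothetical stand-ins, not the cardio data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical synthetic stand-in for the feature matrix and binary target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 11))           # 11 feature columns, as in the importance table
y_demo = (rng.random(1000) < 0.5).astype(int)  # roughly balanced 0/1 target

# stratify=y_demo keeps the class-1 share (almost) identical in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
print(X_tr.shape, X_te.shape)    # (800, 11) (200, 11)
print(y_tr.mean(), y_te.mean())  # class-1 share in each part
```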
Accuracy: 0.7240714285714286
Confusion Matrix:
[[5360 1628]
 [2235 4777]]
Classification Report:
              precision    recall  f1-score   support

           0       0.71      0.77      0.74      6988
           1       0.75      0.68      0.71      7012
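Every number in the report follows from the confusion matrix, where rows are the actual class and columns the predicted class (scikit-learn's convention). The plain-Python sketch below recomputes them as a sanity check; `metrics_from_confusion` is a hypothetical helper, not part of the notebook:

```python
# Confusion matrix from the output above: rows = actual class, columns = predicted class
cm = [[5360, 1628],
      [2235, 4777]]

def metrics_from_confusion(cm):
    """Return accuracy and per-class (precision, recall, f1) for a 2x2 confusion matrix."""
    total = sum(sum(row) for row in cm)
    accuracy = (cm[0][0] + cm[1][1]) / total  # correct predictions lie on the diagonal
    per_class = {}
    for k in (0, 1):
        tp = cm[k][k]
        predicted_k = cm[0][k] + cm[1][k]  # column total: everything predicted as class k
        actual_k = sum(cm[k])              # row total: the support of class k
        precision = tp / predicted_k
        recall = tp / actual_k
        f1 = 2 * precision * recall / (precision + recall)
        per_class[k] = (precision, recall, f1)
    return accuracy, per_class

acc, per_class = metrics_from_confusion(cm)
print(round(acc, 4))  # 0.7241
for k, (p, r, f) in per_class.items():
    print(k, round(p, 2), round(r, 2), round(f, 2))
```

Rounded to two decimals, this reproduces the 0.71/0.77/0.74 and 0.75/0.68/0.71 rows of the report.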
/usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_logistic.py:465: Co
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regressio
n_iter_i = _check_optimize_result(
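The cell that trained this model is not shown, but the warning comes from scikit-learn's `LogisticRegression` (`_logistic.py`) with the `lbfgs` solver, and it names its own remedy: scale the data and/or raise `max_iter`. A minimal sketch, assuming a `StandardScaler` + `LogisticRegression` pipeline suits this data; the synthetic `X_demo`/`y_demo` and the value `max_iter=1000` are illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data; in the notebook this would be X_train / y_train
X_demo, y_demo = make_classification(n_samples=2000, n_features=11, random_state=0)
X_demo[:, 0] *= 10_000  # one feature on a much larger scale than the others

# Standardize every feature, then fit logistic regression with a larger iteration budget
log_reg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
log_reg.fit(X_demo, y_demo)

n_iter = log_reg.named_steps["logisticregression"].n_iter_[0]
print("lbfgs iterations used:", n_iter)
```

Keeping the scaler inside the pipeline means it is fitted on the training data only and the same transformation is reapplied to the test data at prediction time.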
        Feature  Importance
3        weight    0.230041
2        height    0.211850
4         ap_hi    0.186320
0           age    0.164083
5         ap_lo    0.096743
6   cholesterol    0.040225
1        gender    0.019445
7          gluc    0.018327
10       active    0.015191
8         smoke    0.009663
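The left-hand numbers are the original column positions, which pandas keeps after sorting. A sketch of how such a table can be built from a fitted forest; `importance_table` and the synthetic data are hypothetical, and `feature_importances_` is scikit-learn's impurity-based importance, which sums to 1 across all features:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def importance_table(model, feature_names, top=10):
    """Return a DataFrame of impurity-based importances, largest first."""
    table = pd.DataFrame({
        "Feature": feature_names,
        "Importance": model.feature_importances_,
    })
    # sort_values keeps the original row index, i.e. the column position
    return table.sort_values("Importance", ascending=False).head(top)

# Hypothetical demo on synthetic data with generic column names
X_demo, y_demo = make_classification(n_samples=500, n_features=11, random_state=0)
names = [f"f{i}" for i in range(11)]
rf_demo = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_demo, y_demo)
print(importance_table(rf_demo, names))
```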
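import joblib
# `loaded_model` is the model restored with joblib.load() in a cell not shown here
# Example prediction on the first five test rows
sample_prediction = loaded_model.predict(X_test[:5])
print("Sample predictions:", sample_prediction)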
Sample predictions: [1 1 1 0 0]
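The name `loaded_model` implies the model was saved and reloaded with `joblib`, but that step is not shown. A minimal round-trip sketch; the file name `rf_model.joblib`, the temporary directory, and the small synthetic forest are all hypothetical:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X_demo, y_demo = make_classification(n_samples=300, n_features=11, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_demo, y_demo)

# "rf_model.joblib" is a hypothetical file name; the notebook's actual path is not shown
path = os.path.join(tempfile.mkdtemp(), "rf_model.joblib")
joblib.dump(model, path)          # serialize the fitted model to disk
loaded_model = joblib.load(path)  # restore it, e.g. in a later session

# The restored model should reproduce the original predictions exactly
same = np.array_equal(model.predict(X_demo[:5]), loaded_model.predict(X_demo[:5]))
print("Predictions match:", same)
```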
param_grid = {
'n_estimators': [100, 200],
'max_depth': [10, 20],
'min_samples_split': [2, 5]
}
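`grid_search` and `best_rf` are used further down, but the cell that constructs them is elided. One plausible construction, assuming a `RandomForestClassifier` tuned with 3-fold cross-validation on accuracy; `cv=3`, `scoring='accuracy'`, `random_state=42` and the synthetic data are assumptions, not taken from the notebook:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [10, 20],
    'min_samples_split': [2, 5]
}

# Hypothetical synthetic data standing in for X_train / y_train
X_demo, y_demo = make_classification(n_samples=300, n_features=11, random_state=0)

# Every combination in param_grid is scored with 3-fold cross-validation
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring='accuracy',
)
grid_search.fit(X_demo, y_demo)
best_rf = grid_search.best_estimator_  # refit on all the data with the best combination
print(grid_search.best_params_, round(grid_search.best_score_, 3))
```

This grid has 2 x 2 x 2 = 8 combinations, so with `cv=3` the search trains 24 forests plus one final refit; on the full training set that cost grows accordingly.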
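import matplotlib.pyplot as plt
import seaborn as sns

# Impurity-based importances from the fitted random forest
feature_importance = rf_model.feature_importances_
features = X.columns
sorted_idx = feature_importance.argsort()[::-1]  # most important feature first (top bar)
plt.figure(figsize=(10, 6))
sns.barplot(x=feature_importance[sorted_idx], y=features[sorted_idx])
plt.title("Feature Importance - Random Forest")
plt.show()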
predictions = rf_model.predict(X_test)
output = pd.DataFrame({'Actual': y_test, 'Predicted': predictions})
output.to_csv('predictions.csv', index=False)
print("Predictions saved.")
Predictions saved.
Ellipsis
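grid_search.fit(X_train, y_train)
best_rf = grid_search.best_estimator_  # best parameter combination, refit on the full training set
# Predict probabilities of the positive class (1)
y_proba = best_rf.predict_proba(X_test)[:, 1]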
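Working with probabilities rather than hard labels lets the decision threshold be moved. In the earlier classification report, class 1 had the weakest recall (0.68); a cut-off below 0.5 would raise it at the cost of precision. A small sketch with hand-made probabilities (hypothetical values, not notebook output); `labels_at_threshold` is a hypothetical helper and `roc_auc_score` is scikit-learn's threshold-free ranking metric:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def labels_at_threshold(proba, threshold=0.5):
    """Convert class-1 probabilities into 0/1 labels at a chosen cut-off."""
    return (np.asarray(proba) >= threshold).astype(int)

# Small hand-made example (hypothetical values, not notebook output)
y_true = np.array([0, 0, 1, 1])
y_proba_demo = np.array([0.1, 0.4, 0.35, 0.8])

print(labels_at_threshold(y_proba_demo, 0.5))  # [0 0 0 1]
print(labels_at_threshold(y_proba_demo, 0.3))  # [0 1 1 1]
print(roc_auc_score(y_true, y_proba_demo))     # 0.75
```

Lowering the threshold from 0.5 to 0.3 recovers the missed positive (0.35) but also flags the negative at 0.4, which is the recall/precision trade-off in miniature.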