
Experiment 8

Develop a program to demonstrate the working of the decision tree algorithm. Use the Breast Cancer dataset to build the decision tree, and apply this knowledge to classify a new sample.

Introduction to Decision Trees


What is a Decision Tree?
A Decision Tree is a supervised machine learning algorithm used for classification and regression tasks. It models decisions using a tree-like structure where:

Nodes represent decision points based on feature values.
Edges represent possible outcomes (branches).
Leaves represent the final decision or classification.

Decision trees work by recursively splitting the data into subsets based on the most informative feature, greedily maximizing information gain (or purity) at each split.

Working of the Decision Tree Algorithm


1. Selecting the Best Feature for Splitting
At each step, the algorithm selects the feature that best separates the data. Common methods for choosing the best feature include the following (a short sketch of the first two follows this list):

Gini Impurity
Gini = 1 − Σ pᵢ²

Measures how often a randomly chosen element would be incorrectly classified; pᵢ is the proportion of class i in the node.

Entropy (Information Gain)
Entropy = −Σ p(X) log₂ p(X)

Measures the uncertainty in a dataset; the algorithm selects the split that maximizes information gain, i.e., the reduction in entropy.

Chi-Square Test
Evaluates the statistical significance of a feature split.
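To make the two impurity measures concrete, here is a minimal NumPy sketch (illustrative only, not part of the assigned experiment code; the label vector is a made-up example):

import numpy as np

def gini(labels):
    # Gini = 1 - sum(p_i^2) over the class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1 - np.sum(p ** 2)

def entropy(labels):
    # Entropy = -sum(p * log2(p)) over the class proportions p
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

labels = np.array([1, 1, 1, 0, 0])   # e.g., 3 malignant, 2 benign
print(gini(labels))                  # 1 - (0.6^2 + 0.4^2) = 0.48
print(entropy(labels))               # ≈ 0.971 bits

A node containing only one class gives Gini = 0 and Entropy = 0; a 50/50 split gives the maximum values (0.5 and 1 bit, respectively).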

2. Splitting the Data

The dataset is divided into subsets based on the selected feature. The process continues recursively until a stopping condition is met, for example:

A node is pure (all samples belong to one class).
The tree reaches a predefined maximum depth.

3. Making Predictions
For a new sample, traverse the tree from the root down to a leaf node; the leaf node contains the predicted class label. A toy sketch of this traversal follows.
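The traversal can be pictured with a toy tree represented as nested dictionaries. The feature names and thresholds here are hypothetical, chosen only to illustrate the root-to-leaf walk:

# A toy fitted tree (hypothetical features and thresholds)
tree = {
    "feature": "radius_mean", "threshold": 15.0,
    "left":  {"feature": "texture_mean", "threshold": 20.0,
              "left":  {"label": "Benign"},
              "right": {"label": "Malignant"}},
    "right": {"label": "Malignant"},
}

def classify(node, sample):
    # Walk from the root, branching on each node's test, until a leaf is reached
    while "label" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

print(classify(tree, {"radius_mean": 12.0, "texture_mean": 18.0}))  # Benign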

Advantages of Decision Trees


✔ Easy to interpret – Mimics human decision-making.
✔ Handles both numerical & categorical data.
✔ Requires little data preprocessing – No need for feature scaling.
✔ Handles missing values in some implementations.

Challenges of Decision Trees


❌ Overfitting – Deep trees may memorize noise instead of patterns.
❌ Bias towards dominant features – Features with more categories can lead to
biased splits.
❌ Instability – Small data variations can lead to different trees.

Optimizing Decision Trees

1. Pruning

Pre-Pruning: Stop the tree early using conditions (e.g., minimum samples per split).
Post-Pruning: Remove unnecessary branches after the tree is built.

2. Setting Tree Depth

Limiting the maximum depth prevents overfitting.

3. Using Ensemble Methods

Random Forest: Combines multiple trees for better generalization.
Gradient Boosting: Sequentially improves predictions.

A scikit-learn sketch of these options appears below.
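The following sketch shows all three optimizations in scikit-learn, using the library's built-in copy of the Breast Cancer dataset. The parameter values are illustrative, not tuned:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    # Pre-pruning: stop growth early with depth/sample limits
    "pre-pruned tree": DecisionTreeClassifier(max_depth=4, min_samples_split=10),
    # Post-pruning: cost-complexity pruning controlled by ccp_alpha
    "post-pruned tree": DecisionTreeClassifier(ccp_alpha=0.01),
    # Ensembles: many trees generalize better than a single tree
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "gradient boosting": GradientBoostingClassifier(n_estimators=100, learning_rate=0.1),
}

for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    print(name, round(clf.score(X_te, y_te), 3))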
Applications of Decision Trees

Medical Diagnosis – Classifying diseases based on symptoms.
Fraud Detection – Identifying fraudulent transactions.
Customer Segmentation – Categorizing users based on behavior.
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree, export_graphviz
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

from IPython.display import Image
import pydotplus

import warnings
warnings.filterwarnings('ignore')

# Load the dataset (filename is a placeholder; the original path was elided)
data = pd.read_csv('breast_cancer.csv')

data.head()              # preview the first five rows
data.shape               # (rows, columns)
data.info()              # column types and non-null counts
data.diagnosis.unique()  # target classes: 'M' (malignant), 'B' (benign)
data.isnull().sum()      # count missing values per column
df = data.drop(['id'], axis=1)
df['diagnosis'] = df['diagnosis'].map({'M':1, 'B':0}) # Malignant:1, Benign:0

#Model Building
X = df.drop('diagnosis', axis=1) # Drop the 'diagnosis' column (target)
y = df['diagnosis']
# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=42)

# Fit the decision tree model
model = DecisionTreeClassifier(criterion='entropy')  # criterion: 'gini' or 'entropy'
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred) * 100
classification_rep = classification_report(y_test, y_pred)

# Print the results
print("Accuracy:", accuracy)
print("Classification Report:\n", classification_rep)
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

# A new sample: 30 feature values in the same column order as X
new = [[12.5, 19.2, 80.0, 500.0, 0.085, 0.1, 0.05, 0.02, 0.17, 0.06,
        0.4, 1.0, 2.5, 40.0, 0.006, 0.02, 0.03, 0.01, 0.02, 0.003,
        16.0, 25.0, 105.0, 900.0, 0.13, 0.25, 0.28, 0.12, 0.29, 0.08]]
y_pred = model.predict(new)

# Output the prediction (0 = Benign, 1 = Malignant)
if y_pred[0] == 0:
    print("Prediction: Benign")
else:
    print("Prediction: Malignant")

# Visualize the Decision Tree (optional)
plt.figure(figsize=(12, 8))
plot_tree(model, filled=True, feature_names=X.columns, class_names=['Benign', 'Malignant'])
plt.show()

# Export the tree to DOT format
dot_data = export_graphviz(model, out_file=None,
                           feature_names=X_train.columns,
                           rounded=True, proportion=False,
                           precision=2, filled=True)

# Convert DOT data to a graph
graph = pydotplus.graph_from_dot_data(dot_data)

# Display the graph
Image(graph.create_png())
