
Experiment 8 ML Vtu

The document provides an introduction to Decision Trees, a supervised machine learning algorithm used for classification and regression tasks, detailing its structure, advantages, and applications. It includes a sample program demonstrating the implementation of a Decision Tree using a dataset, as well as a lab experiment using the Breast Cancer dataset to classify samples. Key concepts such as entropy and information gain are also discussed to explain how decision trees make splits in data.

Uploaded by

vikasvikki158
Copyright
© All Rights Reserved

Experiment 8

Introduction to Decision Trees

What is a Decision Tree?


A Decision Tree is a supervised machine learning algorithm used for classification and
regression tasks. It models decisions using a tree-like structure where:
• Nodes represent decision points that test a feature value.
• Edges (branches) represent the possible outcomes of each test.
• Leaves represent the final decision or predicted class.
Decision trees are built by recursively splitting the data into subsets. At each step the
algorithm greedily chooses the feature (and, for numerical features, the threshold) whose
split gives the largest information gain, that is, the largest reduction in impurity.
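
As a rough illustration of this structure, the sketch below hand-writes a small tree as nested Python dictionaries and walks it to classify one sample. The tree is an assumption chosen to agree with the six Play Tennis rows listed under Sample Data Set; it is not the output of any training step, and the names TREE and classify are hypothetical.

```python
# Hand-written tree (an assumption for illustration, not a trained model).
# Internal node: {"feature": name, "branches": {value: subtree}}
# Leaf: a plain string holding the class label.
TREE = {
    "feature": "Outlook",
    "branches": {
        "Sunny": "No",
        "Overcast": "Yes",
        "Rain": {
            "feature": "Wind",
            "branches": {"Weak": "Yes", "Strong": "No"},
        },
    },
}


def classify(node, sample):
    """Follow the edge matching the sample's feature value until a leaf is reached."""
    while isinstance(node, dict):
        value = sample[node["feature"]]
        node = node["branches"][value]
    return node


sample = {"Outlook": "Rain", "Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong"}
print(classify(TREE, sample))  # No
```

Each dictionary plays the role of a node, each key under "branches" is an edge, and each string is a leaf.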

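Entropy and Information Gain


Entropy(S) = -∑ p(x) log2 p(x)
Entropy measures the uncertainty (impurity) of the class labels in a dataset S, where p(x) is
the proportion of samples in S that belong to class x. It is 0 when every sample has the same
class and largest when the classes are evenly mixed.

Information Gain
Gain(S, A) = Entropy(S) - ∑ (|Sv| / |S|) Entropy(Sv)
Information gain is the reduction in entropy obtained by splitting S on feature A into the
subsets Sv, one for each value v of A. The tree chooses the split with the highest gain.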
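
To make the two quantities concrete, the following minimal sketch computes entropy and information gain in plain Python for the six Play Tennis rows listed under Sample Data Set. The helper names entropy and information_gain are hypothetical, and the base-2 logarithm is an assumption (it gives entropy in bits).

```python
from collections import Counter
from math import log2

COLUMNS = ["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]
ROWS = [dict(zip(COLUMNS, values)) for values in [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
]]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, feature, target="PlayTennis"):
    """Entropy of the target minus the weighted entropy after splitting on feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


for feature in COLUMNS[:-1]:
    print(f"{feature}: {information_gain(ROWS, feature):.4f}")
```

On these six rows the class entropy is 1 bit (three Yes, three No), and Outlook gives the largest gain (about 0.54), so a tree built with this criterion would split on Outlook first.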
Advantages of Decision Trees
• Easy to interpret – Mimics human decision-making.
• Handles both numerical & categorical data.
• Requires little data preprocessing – No need for feature scaling.
• Can tolerate missing values in some implementations (for example, C4.5, or CART with surrogate splits).

Applications of Decision Trees


• Medical Diagnosis – Classifying diseases based on symptoms.
• Fraud Detection – Identifying fraudulent transactions.
• Customer Segmentation – Categorizing users based on behaviour

Sample Program
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt

# Load the CSV file (adjust the path to wherever ex.csv is stored)
csv_path = "C:\\Users\\deepa\\OneDrive\\Desktop\\ex.csv"
df = pd.read_csv(csv_path)

print("Dataset Loaded:\n")
print(df)

# Encode every categorical column as integers.
# A fresh LabelEncoder is fitted per column; values are numbered in sorted order.
for column in df.columns:
    df[column] = LabelEncoder().fit_transform(df[column])

# Split into features and target
X = df.drop('PlayTennis', axis=1)
y = df['PlayTennis']

# Train Decision Tree Classifier
clf = DecisionTreeClassifier()
clf.fit(X, y)

# Visualize the Decision Tree
# After encoding, 'No' is 0 and 'Yes' is 1, which matches the class_names order.
plt.figure(figsize=(10, 6))
plot_tree(clf, filled=True, feature_names=list(X.columns), class_names=['No', 'Yes'])
plt.title("Decision Tree - Play Tennis")
plt.show()
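
The class_names=['No', 'Yes'] argument depends on how the labels were encoded. LabelEncoder numbers the sorted unique values of a column, so 'No' becomes 0 and 'Yes' becomes 1. The pure-Python sketch below imitates that mapping for illustration; encode_column is a hypothetical helper, not part of scikit-learn.

```python
def encode_column(values):
    """Map each distinct value to its index in sorted order,
    imitating what LabelEncoder.fit_transform does for string labels."""
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    return [index[v] for v in values], classes


play_tennis = ["No", "No", "Yes", "Yes", "Yes", "No"]
codes, classes = encode_column(play_tennis)
print(classes)  # ['No', 'Yes']
print(codes)    # [0, 0, 1, 1, 1, 0]

outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain"]
print(encode_column(outlook))  # ([2, 2, 0, 1, 1, 1], ['Overcast', 'Rain', 'Sunny'])
```

The integers only follow alphabetical order, so for a feature such as Outlook they carry no real ranking; the tree simply splits on thresholds between the codes.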

Sample Data Set


Outlook    Temperature  Humidity  Wind    PlayTennis
Sunny      Hot          High      Weak    No
Sunny      Hot          High      Strong  No
Overcast   Hot          High      Weak    Yes
Rain       Mild         High      Weak    Yes
Rain       Cool         Normal    Weak    Yes
Rain       Cool         Normal    Strong  No

Lab experiment

Develop a program to demonstrate the working of the decision tree algorithm. Use the
Breast Cancer dataset to build the decision tree, and apply the trained tree to
classify a new sample.

import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree

# Load dataset and split into training (80%) and test (20%) sets
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Train decision tree model
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Evaluate the model on the held-out test set
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Accuracy: {accuracy * 100:.2f}%")

# Predict the class of one sample (the first test sample stands in for a new sample).
# In this dataset, target 0 is malignant and target 1 is benign.
prediction = clf.predict([X_test[0]])[0]
prediction_class = "Benign" if prediction == 1 else "Malignant"
print(f"Predicted Class: {prediction_class}")

# Plot the decision tree
plt.figure(figsize=(12, 8))
tree.plot_tree(clf, filled=True, feature_names=list(data.feature_names),
               class_names=list(data.target_names))
plt.title("Decision Tree - Breast Cancer Dataset")
plt.show()
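
The accuracy printed by the lab program is the fraction of test samples whose predicted label equals the true label. The sketch below shows that calculation in plain Python, along with the mapping from a numeric prediction to a class name. The label lists are made up for illustration and are not results from the dataset; accuracy and TARGET_NAMES are hypothetical names.

```python
# Made-up labels for illustration only; they are not results from the dataset.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]

# Same order as load_breast_cancer().target_names: index 0 is malignant, 1 is benign.
TARGET_NAMES = ["malignant", "benign"]


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


print(f"Accuracy: {accuracy(y_true, y_pred) * 100:.2f}%")  # Accuracy: 75.00%
print(f"Predicted Class: {TARGET_NAMES[y_pred[0]]}")       # Predicted Class: benign
```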
