Experiment 8
Decision trees work by recursively splitting the data into subsets on the feature
that yields the greatest information gain at each step.
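To make "greatest information gain" concrete, here is a small sketch (not from the original) that scores a binary split by entropy reduction; entropy and information_gain are hypothetical helper names:

import numpy as np

def entropy(labels):
    # Shannon entropy of the class distribution at a node
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    # Gain = parent entropy minus the size-weighted entropy of the children
    n = len(parent)
    return (entropy(parent)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

# A perfect split of a mixed node yields the maximum gain of 1 bit
information_gain([0, 0, 1, 1], [0, 0], [1, 1])   # 1.0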
Gini Impurity
Gini = 1 − Σᵢ pᵢ², where pᵢ is the proportion of samples of class i at the node.
Measures how mixed the classes at a node are (0 for a pure node); the split that
most reduces the weighted Gini of the child nodes is preferred.
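A matching sketch for Gini (again a hypothetical helper, not part of the original listing):

import numpy as np

def gini_impurity(labels):
    # Gini = 1 - sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

gini_impurity([0, 0, 1, 1])   # 0.5 -> maximally mixed two-class node
gini_impurity([1, 1, 1, 1])   # 0.0 -> pure node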
Chi-Square Test
Evaluates the statistical significance of a split by comparing the class counts in
the child nodes against what chance would predict; higher chi-square values
indicate a more informative split.
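One way to run such a test is scipy's chi2_contingency; the 2x2 table below (child nodes vs. class counts) is made up for illustration:

from scipy.stats import chi2_contingency

# Rows: left/right child of a candidate split; columns: class 0 / class 1 counts
table = [[30, 10],
         [5, 25]]
chi2, p_value, dof, expected = chi2_contingency(table)
# A small p_value suggests the split separates the classes significantly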
3. Making Predictions
For a new sample, traverse the tree from the root to a leaf
node. The leaf node contains the predicted class label.
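A minimal sketch of that traversal, assuming a tree stored as nested dicts with hypothetical keys ('feature', 'threshold', 'left', 'right', 'label'):

def predict_one(node, x):
    # Descend from the root until a leaf (a node carrying a 'label') is reached
    while 'label' not in node:
        branch = 'left' if x[node['feature']] <= node['threshold'] else 'right'
        node = node[branch]
    return node['label']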
Pre-Pruning: stop growing the tree early using stopping conditions (e.g., a minimum
number of samples per split or a maximum depth).
Post-Pruning: grow the full tree first, then remove branches that contribute little
to predictive accuracy.
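In scikit-learn terms, pre-pruning corresponds to constructor limits and post-pruning to cost-complexity pruning via ccp_alpha; the parameter values below are illustrative, not from the source:

from sklearn.tree import DecisionTreeClassifier

# Pre-pruning: constrain the tree while it grows
pre_pruned = DecisionTreeClassifier(max_depth=4, min_samples_split=10)

# Post-pruning: grow fully, then prune weak branches by cost-complexity
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01)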
2. Setting Tree Depth
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
data = pd.read_csv(r'')  # CSV path elided in the original (the breast cancer dataset used below)
data.head()               # preview the first five rows
data.shape                # (rows, columns)
data.info()               # column dtypes and non-null counts
data.diagnosis.unique()   # 'M' (malignant) and 'B' (benign)
data.isnull().sum()       # missing values per column
df = data.drop(['id'], axis=1)
df['diagnosis'] = df['diagnosis'].map({'M':1, 'B':0}) # Malignant:1, Benign:0
# Model Building
X = df.drop('diagnosis', axis=1) # Drop the 'diagnosis' column (target)
y = df['diagnosis']
# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
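The original listing calls model.predict without ever defining model, so a fitting step is sketched here; max_depth=4 is an illustrative pre-pruning choice, not from the source:

model = DecisionTreeClassifier(max_depth=4, random_state=42)
model.fit(X_train, y_train)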
# A single hypothetical patient record with the same 30 features, in X's column order
new = [[12.5, 19.2, 80.0, 500.0, 0.085, 0.1, 0.05, 0.02, 0.17, 0.06,
        0.4, 1.0, 2.5, 40.0, 0.006, 0.02, 0.03, 0.01, 0.02, 0.003,
        16.0, 25.0, 105.0, 900.0, 0.13, 0.25, 0.28, 0.12, 0.29, 0.08]]
y_pred = model.predict(new)   # 1 = Malignant, 0 = Benign
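The 80/20 split above is never scored in the original; a minimal sketch of evaluating the held-out set with sklearn's accuracy_score:

from sklearn.metrics import accuracy_score

y_test_pred = model.predict(X_test)
print('Test accuracy:', accuracy_score(y_test, y_test_pred))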