Viyan Report
By
VIYANPRABU L
(Reg. No: 22UGIT065)
MARCH 2025
DECLARATION
CERTIFICATE
COMPANY CERTIFICATE
ACKNOWLEDGEMENT
I would like to express my sincere gratitude to everyone who contributed to the successful
completion of this project, “Breast Cancer Prediction Using Machine Learning Algorithms.”
invaluable guidance, encouragement, and insightful feedback helped shape this research.
Their expertise and support were instrumental in overcoming challenges throughout the
project.
I would like to acknowledge my professors, mentors, and colleagues for their constructive
A special note of appreciation goes to my family and friends for their unwavering support
Lastly, I extend my gratitude to all researchers and scholars whose work in machine learning
VIYAN PRABU L
ABSTRACT
INDEX
1 INTRODUCTION
1.1 Overview
1.2 Aim and Objectives
1.3 Features of the System
1.4 Project Description
2 SYSTEM ANALYSIS
2.1 Existing System
2.2 Proposed System
2.3 System Requirements and Specification
2.3.1 Hardware Specifications
2.3.2 Software Specifications
2.3.3 About the Software
3 SYSTEM DESIGN
3.1.1 Input Design
4 SYSTEM TESTING
4.1 Testing Methodology
4.1.1 Unit Testing
4.1.2 Black Box Testing
4.1.3 White Box Testing
4.1.4 Integration Testing
5 SYSTEM IMPLEMENTATION
5.1 Implementation Procedures
5.2 System Maintenance
5.3 Source Code
6 CONCLUSION AND FUTURE ENHANCEMENT
6.1 Conclusion
6.2 Future Enhancements
7 BIBLIOGRAPHY
CHAPTER 1
Introduction
Breast cancer is one of the most common cancers among women and remains a
significant global health challenge. According to the World Health Organization
(WHO), breast cancer is responsible for a considerable percentage of
cancer-related deaths worldwide. Despite advances in medical imaging,
diagnostic methodologies, and treatment options, early detection remains a
crucial factor in improving survival rates and reducing mortality. Traditional
diagnostic methods such as mammography, biopsy, and ultrasound imaging play
a significant role in detecting breast cancer, but they often suffer from
limitations such as false positives, false negatives, and high costs. To address
these challenges, machine learning (ML) has emerged as a promising approach
to enhance the accuracy and efficiency of breast cancer diagnosis.
Objective
This study aims to explore the role of machine learning algorithms in predicting
breast cancer by leveraging various datasets, including medical imaging,
histopathological slides, and clinical patient records. The key objectives of this
research are as follows:
2. To Identify the Most Relevant Features for Breast Cancer Prediction
CHAPTER 2
SYSTEM ANALYSIS
EXISTING SYSTEM
Before the advent of machine learning, breast cancer diagnosis primarily depended on:
These traditional methods are effective but often suffer from subjectivity, misdiagnosis, and
delayed detection, leading to poor patient outcomes.
Despite their advantages, existing ML-based breast cancer prediction systems face several
challenges:
○ Breast cancer datasets often have more benign cases than malignant ones,
leading to biased predictions.
○ Imbalanced data can cause models to favor the majority class.
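One common way to counter this imbalance is to reweight the classes during training and to keep the class ratio fixed when splitting the data. The sketch below is illustrative only and is not the implementation used in this project: it assumes scikit-learn's bundled copy of the Wisconsin Diagnostic dataset and a logistic regression model with class_weight="balanced".

```python
from collections import Counter

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# scikit-learn's bundled Wisconsin Diagnostic data: 0 = malignant, 1 = benign
X, y = load_breast_cancer(return_X_y=True)
class_counts = Counter(y.tolist())

# stratify=y keeps the benign/malignant ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# class_weight="balanced" weights each class inversely to its frequency,
# so errors on the smaller (malignant) class cost more during training
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

# Recall on the minority (malignant) class, which is label 0 in this dataset
malignant_recall = recall_score(y_test, model.predict(X_test), pos_label=0)
print("Class counts:", dict(class_counts))
print("Malignant recall:", round(malignant_recall, 3))
```

Recall on the malignant class is reported instead of overall accuracy because accuracy alone can hide poor performance on the minority class.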
PROPOSED SYSTEM
● Improve the accuracy of breast cancer detection using optimized ML and DL models.
● Reduce false positives and false negatives to ensure reliable diagnosis.
● Enhance model interpretability for better decision-making in clinical settings.
● Address class imbalance and data scarcity issues using advanced preprocessing
techniques.
● Develop a user-friendly and cost-effective solution that can be implemented in
real-world healthcare facilities.
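False positives and false negatives can be counted directly from a confusion matrix. The following is a minimal sketch under stated assumptions: it uses scikit-learn's bundled Wisconsin Diagnostic data and a random forest chosen only for illustration, and it recodes the labels so that 1 means malignant.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
# The bundled data uses 0 = malignant, 1 = benign; flip it so that
# 1 = malignant is the "positive" finding
y = 1 - y

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# With labels=[0, 1], ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(
    y_test, model.predict(X_test), labels=[0, 1]
).ravel()

false_positive_rate = fp / (fp + tn)  # benign cases flagged as malignant
false_negative_rate = fn / (fn + tp)  # malignant cases that were missed
print(f"FP rate: {false_positive_rate:.3f}, FN rate: {false_negative_rate:.3f}")
```

In a screening context the false negative rate is usually the more critical figure, since it counts malignant cases that the model would miss.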
SYSTEM SPECIFICATION
The Breast Cancer Prediction System should meet specific functional and non-functional
requirements to ensure smooth operation, high accuracy, and security compliance.
Functional Requirements
These are the key functions that the system must perform to enable breast cancer prediction:
Hardware Requirements
To train and deploy machine learning models efficiently, the system needs sufficient
computing power, storage, and processing speed.
Software Requirements
The software environment plays a crucial role in developing, training, and deploying machine
learning models.
A. Operating System
B. Programming Languages
● Jupyter Notebook / Google Colab – For model training and initial development
● PyCharm / VS Code – For full-scale software development
Deployment Platforms:
CHAPTER 3
SYSTEM DESIGN
INPUT DESIGN
Input design plays a crucial role in developing a breast cancer prediction system using
machine learning algorithms. The quality and structure of input data significantly impact the
model's accuracy and effectiveness. This section outlines the input design, including data
sources, feature selection, preprocessing steps, and input formats.
2. Data Sources
The primary dataset for breast cancer prediction can be obtained from:
● Public datasets: Wisconsin Breast Cancer Dataset (WBCD), SEER, UCI Machine
Learning Repository
A. Clinical Features
E. Patient History
Before being fed into the machine learning models, the data undergoes the following
preprocessing steps:
● Feature Engineering: Generating new features from existing ones for improved
accuracy
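As an illustration of the feature engineering step, the sketch below derives two new columns from existing measurements. The column names follow a common CSV export of the Wisconsin Diagnostic dataset and the rows are illustrative values; both are assumptions, and the derived features are examples only, not the ones used in this project.

```python
import pandas as pd

# Illustrative rows only; column names are assumed, not the report's schema
df = pd.DataFrame({
    "radius_mean": [17.99, 13.54],
    "radius_worst": [25.38, 15.11],
    "perimeter_mean": [122.8, 87.46],
    "area_mean": [1001.0, 566.3],
})


def add_engineered_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with two derived columns added."""
    out = frame.copy()
    # Ratio of area to perimeter: a simple shape descriptor
    out["area_to_perimeter"] = out["area_mean"] / out["perimeter_mean"]
    # Gap between the worst and the mean radius measurement
    out["radius_spread"] = out["radius_worst"] - out["radius_mean"]
    return out


engineered = add_engineered_features(df)
print(engineered)
```

Working on a copy leaves the original table untouched, so the same raw data can be reused to try other derived features.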
The processed data is stored in a structured format for model training. Examples
include:
● Deep Learning Models (CNN for image processing, RNN for genetic sequences)
Feasibility Study
2. Technical Feasibility
2.1 Machine Learning Algorithms for Breast Cancer Prediction
The following ML algorithms can be used:
● Logistic Regression – Suitable for binary classification (benign vs. malignant).
● Support Vector Machine (SVM) – Effective in high-dimensional spaces.
● Random Forest – Reduces overfitting and improves accuracy.
● K-Nearest Neighbors (KNN) – Works well with small datasets.
● Neural Networks (Deep Learning, CNNs) – Used for image-based detection from
mammograms and histopathology slides.
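The classical algorithms listed above can be compared on the same data with cross-validation. The sketch below is a hedged example, not this project's evaluation: it assumes scikit-learn's bundled Wisconsin Diagnostic data, default hyperparameters, and 5-fold cross-validation, and it leaves out the neural network.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Distance- and margin-based models are scaled first; the forest is not
candidates = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Mean accuracy over 5 stratified folds for each candidate
scores = {
    name: cross_val_score(estimator, X, y, cv=5, scoring="accuracy").mean()
    for name, estimator in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Placing the scaler inside each pipeline ensures it is fitted only on the training folds, which avoids leaking information from the held-out fold.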
2.2 Data Requirements
● Structured Data: Patient demographics, tumor characteristics, genetic markers.
● Unstructured Data: Mammogram images, biopsy slides, genomic sequences.
● Datasets Available: Wisconsin Breast Cancer Dataset (WBCD), SEER, UCI Machine
Learning Repository, hospital records.
2.3 System Requirements
● Hardware: High-performance GPUs for deep learning models.
● Software: Python, TensorFlow, Scikit-learn, OpenCV for image processing.
● Storage: Large-scale databases for medical records and images.
3. Operational Feasibility
● Ease of Use: The system should have a user-friendly interface for medical
professionals.
● Integration: The model must integrate with existing hospital management systems
(HMS).
● Regulatory Compliance: Must comply with HIPAA, GDPR, and other health data
protection regulations.
● Training Requirements: Medical staff may need basic ML training to interpret results
effectively.
4. Economic Feasibility
4.1 Cost Analysis
● Development Costs: Data collection, algorithm training, software development.
● Implementation Costs: Hardware acquisition, cloud storage, deployment.
● Maintenance Costs: Regular model updates, cybersecurity, data privacy compliance.
4.2 Cost-Benefit Analysis
● Benefits:
○ Early detection can reduce treatment costs.
○ Automated systems reduce dependency on expert radiologists and
pathologists.
○ Faster diagnosis improves patient outcomes.
● Challenges:
○ Initial investment can be high.
○ Data privacy concerns may increase compliance costs.
CHAPTER 4
SYSTEM TESTING
System testing ensures that the breast cancer prediction system functions correctly, meets
performance requirements, and provides accurate predictions. This phase involves multiple
testing techniques to validate data processing, model accuracy, user interactions, and overall
system performance.
● To validate the integration of the ML model with user interfaces and databases.
Types of Testing
1 Functional Testing
● Input Validation Testing: Check if the system handles missing, invalid, or noisy data
properly.
● Feature Selection Testing: Ensure the selected input features influence model
predictions effectively.
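Input validation of this kind can be sketched as a small function that is exercised with valid and invalid inputs. The function names and the rule that measurements must be non-negative are assumptions made for illustration; they are not taken from the project's source code.

```python
import math


def validate_features(raw_values, expected_count):
    """Convert raw inputs to floats, rejecting missing, non-numeric,
    non-finite, or negative values (hypothetical validation rules)."""
    if len(raw_values) != expected_count:
        raise ValueError(f"expected {expected_count} values, got {len(raw_values)}")
    cleaned = []
    for position, raw in enumerate(raw_values):
        if raw is None or (isinstance(raw, str) and not raw.strip()):
            raise ValueError(f"value {position} is missing")
        try:
            number = float(raw)
        except (TypeError, ValueError):
            raise ValueError(f"value {position} is not numeric: {raw!r}") from None
        if not math.isfinite(number):
            raise ValueError(f"value {position} is not finite")
        if number < 0:
            raise ValueError(f"value {position} is negative")
        cleaned.append(number)
    return cleaned


def is_valid(raw_values, expected_count):
    """Return True when the inputs pass validation, False otherwise."""
    try:
        validate_features(raw_values, expected_count)
        return True
    except ValueError:
        return False


print(is_valid(["1.5", 2, 3.0], 3))    # well-formed input
print(is_valid(["1.5", "abc", 3], 3))  # text in a numeric field
print(is_valid([1, None, 3], 3))       # missing value
```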
2 Performance Testing
● Model Inference Speed: Measure the time taken to process an input and generate
predictions.
● Load Testing: Assess system behavior when processing a large volume of data.
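Inference speed can be estimated by timing repeated calls to the model. The sketch below assumes a logistic regression model trained on scikit-learn's bundled Wisconsin Diagnostic data; the helper name time_predictions is hypothetical, and the figures it prints depend on the machine it runs on.

```python
import time

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)


def time_predictions(estimator, samples, repeats=20):
    """Return the mean wall-clock seconds per predict() call over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        estimator.predict(samples)
    return (time.perf_counter() - start) / repeats


single_latency = time_predictions(model, X[:1])  # one patient record
batch_latency = time_predictions(model, X)       # all records as a simple load test
print(f"single record: {single_latency * 1000:.3f} ms")
print(f"batch of {len(X)}: {batch_latency * 1000:.3f} ms")
```

This measures only the model call itself. A full load test of the deployed system would also include network and database time, for which a tool such as JMeter is better suited.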
3 Integration Testing
● Database Connectivity: Verify if patient records and medical data are correctly
retrieved and stored.
● Model Deployment Testing: Ensure smooth integration of the ML model with the
front-end and back-end.
● API Testing: Validate communication between the model, web application, and
hospital management system.
4 Security Testing
Security testing ensures that patient data is secure and that the system is protected from
cyber threats.
● Data Encryption Testing: Verify that patient data is securely stored and transmitted.
● Access Control Testing: Ensure that only authorized users can access sensitive data.
5 Usability Testing
● Ease of Use: Check if healthcare professionals can navigate and use the system
effectively.
● User Feedback Collection: Gather insights from doctors, radiologists, and patients.
6 Regression Testing
7 Validation Testing
● Tools Used: Python, TensorFlow, Scikit-learn, Selenium (for UI testing), JMeter (for
performance testing).
Expected Outcomes
● A secure and user-friendly system that integrates well with hospital workflows.
The testing methodology for the Breast Cancer Prediction System using machine learning
algorithms follows a structured approach to ensure the system’s accuracy, reliability, security,
and usability. Various testing techniques, including unit testing, black-box testing, white-box
testing, and integration testing, are used to evaluate different aspects of the system.
Objective:
Scope:
Tools Used:
Objective:
Scope:
○ Test the model with extreme values (e.g., very large/small tumor size).
○ Provide incorrect input formats (e.g., text in numeric fields) and check error
messages.
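The two checks above can be sketched as a black-box test that calls only the model's prediction function. This is an illustrative example under stated assumptions: the model is a random forest trained on scikit-learn's bundled Wisconsin Diagnostic data, and the helper run_black_box_checks is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
valid_labels = set(np.unique(y).tolist())


def run_black_box_checks(predict, n_features):
    """Probe `predict` with extreme and malformed inputs, without
    looking at how the model works internally."""
    results = {}
    # Extreme values: the output must still be one of the known labels
    for name, value in (("very_small", 1e-9), ("very_large", 1e9)):
        sample = np.full((1, n_features), value)
        results[name] = int(predict(sample)[0]) in valid_labels
    # Wrong format: text in numeric fields should be rejected with an error
    try:
        predict(np.array([["abc"] * n_features]))
        results["rejects_text"] = False
    except (ValueError, TypeError):
        results["rejects_text"] = True
    return results


report = run_black_box_checks(model.predict, X.shape[1])
print(report)
```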
Tools Used:
Objective:
● To test the internal logic and flow of the system, ensuring that all components work as
intended.
Scope:
3. Data Flow Testing: Ensuring data transitions correctly between modules.
Tools Used:
CHAPTER 5
System Implementation
The implementation of the Breast Cancer Prediction System using Machine Learning
Algorithms involves deploying the machine learning model, integrating it with the user
interface, ensuring database connectivity, and providing system maintenance. This section
outlines the procedures for implementation, system maintenance strategies, and source code
development.
● Gather breast cancer datasets from reliable sources (e.g., Wisconsin Breast Cancer
Dataset, hospital records).
● Convert categorical data into numerical format using one-hot encoding or label
encoding.
● Deploy on a cloud server (AWS, Google Cloud, Azure) or local hospital servers.
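The encoding step mentioned above can be sketched with pandas as follows. The records are illustrative: the "diagnosis" column with M/B codes follows the Wisconsin data, while "menopause_status" is a hypothetical categorical column added only to demonstrate one-hot encoding.

```python
import pandas as pd

# Illustrative records; "menopause_status" is a hypothetical column
df = pd.DataFrame({
    "diagnosis": ["M", "B", "B", "M"],
    "menopause_status": ["pre", "post", "post", "pre"],
    "radius_mean": [17.99, 13.54, 12.45, 20.57],
})

# Label encoding for the binary target: malignant -> 1, benign -> 0
df["diagnosis"] = df["diagnosis"].map({"M": 1, "B": 0})

# One-hot encoding for a nominal feature: one indicator column per category
encoded = pd.get_dummies(df, columns=["menopause_status"], dtype=int)
print(encoded)
```

Label encoding suits the two-class target, whereas one-hot encoding avoids implying an order among the categories of a nominal feature.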
To ensure the Breast Cancer Prediction System remains accurate and efficient over time,
regular maintenance is necessary:
● Detect and mitigate model drift (where prediction accuracy decreases over time).
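One simple way to detect drift is to compare accuracy on recently confirmed cases with the accuracy measured at deployment. The sketch below is a minimal example; the function name detect_drift and the 0.05 tolerance are assumptions, not part of the implemented system.

```python
def detect_drift(baseline_accuracy, recent_outcomes, tolerance=0.05):
    """Flag drift when accuracy on recent labelled cases falls more than
    `tolerance` below the accuracy measured at deployment time.

    recent_outcomes: iterable of (predicted_label, true_label) pairs.
    Returns a (drift_detected, recent_accuracy) tuple.
    """
    outcomes = list(recent_outcomes)
    if not outcomes:
        raise ValueError("no recent outcomes to evaluate")
    correct = sum(1 for predicted, actual in outcomes if predicted == actual)
    recent_accuracy = correct / len(outcomes)
    return recent_accuracy < baseline_accuracy - tolerance, recent_accuracy


# Three of four recent predictions were correct, against a 0.95 baseline
drifted, recent_accuracy = detect_drift(0.95, [(1, 1), (0, 0), (1, 0), (0, 0)])
print(drifted, recent_accuracy)
```

When drift is flagged, the usual response is to retrain the model on more recent data and re-run the validation tests before redeploying.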
2. Software Updates
3. Database Management
5.3 Source Code for Breast Cancer Prediction Using Machine Learning
Algorithms
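import pickle
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
# Load dataset
data = pd.read_csv("breast_cancer_data.csv")
# Preprocess dataset
X = data.drop(columns=["diagnosis"])  # Features
y = data["diagnosis"]  # Target labels
# Split dataset (the 80/20 split and the seed are assumptions)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model (assumption: the listing does not show which classifier was used)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# Evaluate accuracy
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
# Save the trained model ("model.pkl" is an assumed file name)
with open("model.pkl", "wb") as file:
    pickle.dump(model, file)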
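# Flask application that serves the trained model
import pickle
import numpy as np
from flask import Flask, jsonify, render_template, request
app = Flask(__name__)
# Load the trained model ("model.pkl" is an assumed file name)
with open("model.pkl", "rb") as file:
    model = pickle.load(file)

@app.route("/")
def home():
    # "index.html" is an assumed template name
    return render_template("index.html")

@app.route("/predict", methods=["POST"])
def predict():
    try:
        # Assumed request format: JSON body {"features": [v1, v2, ...]}
        features = np.array(request.get_json()["features"], dtype=float).reshape(1, -1)
        prediction = model.predict(features)
        return jsonify({"prediction": str(prediction[0])})
    except Exception as e:
        return jsonify({"error": str(e)}), 400

if __name__ == "__main__":
    app.run(debug=True)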
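<!DOCTYPE html>
<html lang="en">
<head>
<title>Breast Cancer Prediction</title>
<script>
// Assumed input format: comma-separated feature values in the "features" field
function predictCancer() {
  const values = document.getElementById("features").value.split(",").map(Number);
  fetch("/predict", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ features: values })
  })
    .then(response => response.json())
    .then(data => {
      document.getElementById("result").innerText = data.prediction ?? data.error;
    });
}
</script>
</head>
<body>
<input id="features" type="text" placeholder="Comma-separated feature values">
<button onclick="predictCancer()">Predict</button>
<h3 id="result"></h3>
</body>
</html>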
OUTPUT
CHAPTER 6
6.1 Conclusion
The Breast Cancer Prediction System using Machine Learning Algorithms has been
successfully designed, implemented, and tested. The system leverages machine learning
models to analyze patient data and predict whether a tumor is benign or malignant, aiding in
early detection and timely medical intervention.
This system offers a cost-effective, AI-powered decision support tool that enhances
diagnostic accuracy, reduces workload for healthcare professionals, and improves patient
outcomes.
Although the system is functional, there are several ways to enhance its performance and
usability:
2. Cloud-Based Deployment
● Deploy the system on AWS, Google Cloud, or Azure to provide real-time access to
healthcare providers worldwide.
● Implement auto-scaling and load balancing for handling large volumes of patient
data.
● Develop an Android/iOS app for doctors and patients to access predictions conveniently.
● Implement voice-enabled AI assistants for patient interaction.
4. Enhanced Data Processing
● Incorporate real-time data streams from hospital databases, IoT health devices, and
wearable technology.
● Apply feature engineering techniques to improve model interpretability.
5. Explainable AI (XAI) for Trust and Transparency
● Extend the system to predict other diseases like lung cancer, skin cancer, or
cardiovascular diseases using multi-modal AI models.
CHAPTER 7
BIBLIOGRAPHY
1. Dua, D., & Graff, C. (2019). UCI Machine Learning Repository: Breast Cancer
Wisconsin Dataset. University of California, Irvine. Available at:
https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
2. Wang, J., Yang, X., Cai, H., Tan, W., Jin, C., & Li, L. (2016). Discrimination of
breast cancer with microcalcifications on mammography by deep learning. Scientific
Reports, 6, 27327. DOI:10.1038/srep27327
3. Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M., &
Thrun, S. (2017). Dermatologist-level classification of skin cancer with deep neural
networks. Nature, 542(7639), 115-118. DOI:10.1038/nature21056
4. Zhou, X., Li, C., & Rahaman, M. M. (2021). AI-based Medical Image Analysis for
Breast Cancer Screening and Diagnosis. IEEE Transactions on Medical Imaging.
DOI:10.1109/TMI.2021.3073995
5. Cheng, J., Ni, D., Chou, Y., Qin, J., Tiu, C., Chang, R., & Shen, D. (2016).
Computer-aided diagnosis with deep learning architecture: Applications to breast
lesions in US images and pulmonary nodules in CT scans. Scientific Reports, 6,
24454. DOI:10.1038/srep24454
2. Books
7. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
8. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical
Learning: Data Mining, Inference, and Prediction. Springer.
9. National Cancer Institute. (2022). Breast Cancer Statistics. Available at:
https://www.cancer.gov/types/breast
10. World Health Organization (WHO). (2022). Breast Cancer Fact Sheet. Available
at: https://www.who.int/news-room/fact-sheets/detail/breast-cancer
12. TensorFlow Developers. (2023). Deep Learning for Medical Imaging. Available at:
https://www.tensorflow.org/tutorials/images/medical_imaging