Jatin Synopsis

The report details a project on 'Cancer Prediction' using machine learning algorithms to enhance early detection of cancer, which is critical for improving survival rates. It discusses existing systems, their limitations, and proposes a new system utilizing algorithms like Naive Bayes, KNN, and Decision Trees for better accuracy in predicting cancer risks. The project is submitted for the Bachelor of Technology degree in Information Technology at Guru Jambheshwar University, under the guidance of Dr. Jaswinder Singh.

A

Major Project-I Report


on

“Cancer Prediction”

Submitted in partial fulfillment of the requirement for the award of the degree of

Bachelor of Technology
in
Information Technology

Submitted to:
Dr. Jaswinder Singh
Deptt. of CSE
GJUS&T, Hisar

Submitted by:
Jatin
210010140050
B.Tech (IT) – 7th Sem

Department of Computer Science & Engineering


Guru Jambheshwar University of Science & Technology, Hisar
‘A+’ Grade NAAC Accredited
2021-2025
CANDIDATE’S DECLARATION

I hereby declare that the project work entitled “Cancer Prediction” is an authentic work carried
out by me under the guidance of Dr. Jaswinder Singh, Department of Computer Science &
Engineering, in partial fulfillment of the requirement for the award of the degree of Bachelor of
Technology in Information Technology, and that it has not been submitted anywhere else for any
other degree.

Date: 17/01/2025    Signature

Jatin

210010140050

CERTIFICATE

This is to certify that Jatin (210010140050), a student of B.Tech (IT), Department of
Computer Science & Engineering, Guru Jambheshwar University of Science & Technology,
Hisar, has completed the project entitled “Cancer Prediction”.

Dr. Jaswinder Singh


Deptt. of CSE
GJUS&T, Hisar

Contents
“Cancer prediction”

1. Introduction
2. Existing System
3. Problems in existing system
4. Proposed System
5. Advantages of the proposed system
6. Software requirement specification document
7. Design of proposed system
8. Conclusions
9. References/Bibliography

Introduction
Cancers such as lung, prostate, and colorectal cancer account for up to 45% of cancer deaths, so
it is very important to detect or predict the disease before it reaches a serious stage. If cancer
is predicted in its early stages, lives can be saved. Statistical methods are generally used to
classify cancer risk, i.e. high risk or low risk, but they sometimes struggle to handle the
complex interactions in high-dimensional data. Machine learning techniques can be used to
overcome these drawbacks, which are caused by the high dimensionality of the data. In this
project, I use machine learning algorithms such as Naive Bayes, decision trees, and KNN to
predict the chances of getting cancer.

Machine learning is increasingly being employed in cancer detection and diagnosis. As more
technologies are used and tested in the medical field, cancer prediction may become considerably
easier in the future, possibly without the need to visit a hospital.
Machine learning is a branch of artificial intelligence that employs a variety of statistical,
probabilistic, and optimization techniques that allow computers to “learn” from past examples
and to detect hard-to-discern patterns in large, noisy, or complex data sets. This capability is
particularly well-suited to medical applications, especially those that depend on complex
proteomic and genomic measurements. As a result, machine learning is frequently used in cancer
diagnosis and detection. More recently, machine learning has been applied to cancer prognosis
and prediction. This latter approach is particularly interesting as it is part of a growing trend
toward personalized, predictive medicine. In assembling this review we conducted a broad
survey of the different types of machine learning methods being used, the types of data being
integrated, and the performance of these methods in cancer prediction and prognosis. A number
of trends are noted, including a growing dependence on protein biomarkers and microarray data,
a strong bias towards applications in prostate and breast cancer, and a heavy reliance on “older”
technologies such as artificial neural networks (ANNs) instead of more recently developed or
more easily interpretable machine learning methods.

Existing System

Cancer has been widely studied, and various methods and algorithms have been used to categorize
tumors into benign and malignant groups. In the artificial neural network (ANN) approach, a
method called the backpropagation network is used to solve complex problems related to
identification, pattern recognition, prediction, and so forth. The objective of the existing study
is to investigate the accuracy and performance of ANN backpropagation in predicting breast
cancer. The study proceeds in several stages: formulating the problem; collecting and processing
the Wisconsin breast cancer dataset from the Kaggle site; designing and creating an ANN system
to classify cancer as malignant or benign; and examining the system to assess its prediction
accuracy and draw conclusions. The results of the numerical simulation indicate that the system,
created in MATLAB R2016a, obtained an accuracy of 94.929% with an error of 5.071% using a
combination of training parameters of epoch 1000, learning rate 0.01, goal 0.001, and hidden
layer 5.

Problems in existing system
As per the data provided by WHO (https://www.who.int/health-topics/cancer#tab=tab_1), an
estimated 9.6 million people died worldwide due to cancer in 2018. In addition, about 300,000
(3 lakh) new cancer cases are diagnosed each year among children aged 0-19 years. Cancer is among
the deadliest diseases that can affect a human. However, the positive side is that if cancer is
detected at an early stage, about 50% of cancers can be prevented and cured. Otherwise, it may
lead to a very critical situation and may even cause death. This makes it even more necessary
to have a system or technology that can help doctors detect cancer at an early stage, when it
can be treated effectively.

To address this problem using advanced technological solutions and artificial intelligence, we
have come up with a Cancer Prediction System using the Naïve Bayes machine learning
algorithm. This system takes a statistical approach, employing probabilistic and optimization
techniques to derive a result from past datasets. This evaluation technique aims to help
doctors and pathologists detect cancer at an early stage, when it can be prevented and cured,
thereby saving many lives.
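
As a rough illustration of the probabilistic approach described above, the following is a minimal sketch of a categorical Naive Bayes classifier with Laplace smoothing, written with the Python standard library only. The toy records, the two features (smoking, chest pain), and the risk labels are invented for illustration and are not taken from the project's dataset.

```python
from collections import Counter, defaultdict


def train_naive_bayes(records, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # distinct values seen for each feature (needed for Laplace smoothing)
    feature_values = defaultdict(set)
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict_naive_bayes(model, record):
    """Return (best_label, posterior), where posterior maps label -> probability."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(label)
        for i, value in enumerate(record):
            # Laplace-smoothed likelihood P(value | label)
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score *= numerator / denominator
        scores[label] = score
    norm = sum(scores.values())
    posterior = {label: s / norm for label, s in scores.items()}
    return max(posterior, key=posterior.get), posterior


# Hypothetical toy data: (smoking, chest_pain) -> risk level
records = [("yes", "yes"), ("yes", "no"), ("no", "no"),
           ("no", "no"), ("yes", "yes"), ("no", "yes")]
labels = ["high", "high", "low", "low", "high", "low"]

model = train_naive_bayes(records, labels)
label, posterior = predict_naive_bayes(model, ("yes", "yes"))
print(label, posterior)
```

The "naive" part is the assumption that features are independent given the class, which is why the per-feature likelihoods are simply multiplied together.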

Proposed System
Prediction:-
“Prediction” refers to the output of an algorithm after it has been trained on a historical dataset
and applied to new data when forecasting the likelihood of a particular outcome, such as whether
or not a customer will churn in 30 days.

Classification:
Classification is the process of finding a good model that describes the data classes or
concepts, and the purpose of classification is to predict the class of objects whose class label is
unknown. In simple terms, we can think of Classification as categorizing the incoming new
data based on our current or past assumptions that we have made and the data that we already
have with us.

Prediction vs Classification:-

Sr. No. | Prediction | Classification
1. | Prediction is about predicting a missing/unknown element (continuous value) of a dataset. | Classification is about determining a (categorical) class (or label) for an element in a dataset.
2. | E.g. predicting the correct treatment for a particular disease for an individual person. | E.g. grouping patients based on their medical records.
3. | The model used to predict the unknown value is called a predictor. | The model used to classify the unknown value is called a classifier.
4. | The predictor is constructed from a training set, and its accuracy refers to how well it can estimate the value of new data. | A classifier is also constructed from a training set, composed of database records and their corresponding class names.

Cancer Prediction in Early Stages:-
Cancers such as lung, prostate, and colorectal cancer account for up to 45% of cancer deaths, so
it is very important to detect or predict the disease before it reaches a serious stage. If cancer
is predicted in its early stages, lives can be saved. Statistical methods are generally used to
classify cancer risk, i.e. high risk or low risk, but they sometimes struggle to handle the
complex interactions in high-dimensional data. Machine learning techniques can be used to
overcome these drawbacks, which are caused by the high dimensionality of the data. In this
project, I use machine learning algorithms to predict the chances of getting cancer.

Objective:-
Risk prediction tools, typically based on logistic regression, aim to provide an overall risk of
the patient having cancer based on patient metadata such as age, sex, and smoking history, and on
nodule characteristics such as nodule size, morphology, and growth if a previous CT is available.

Using the prediction of cancer outcome as a model, we test the hypothesis that by
analyzing routinely collected digital data contained in an electronic administrative record (EAR)
with machine-learning techniques, we can enhance conventional methods of predicting
clinical outcomes.

Algorithms to be used are:-

Logistic Regression
o Logistic regression is one of the most popular machine learning algorithms and belongs to
the supervised learning family. It is used for predicting a categorical dependent variable
from a given set of independent variables.
o Because logistic regression predicts a categorical dependent variable, the outcome must be a
categorical or discrete value, such as Yes or No, 0 or 1, or True or False. Instead of giving
the exact value 0 or 1, however, it gives a probability that lies between 0 and 1.
o In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic
function whose output is bounded between 0 and 1.
o The curve of the logistic function indicates the likelihood of something, such as
whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
o Logistic regression is a significant machine learning algorithm because it can provide
probabilities and classify new data using both continuous and discrete datasets.
o Logistic regression can be used to classify observations using different types of data
and can easily identify the variables that are most effective for the classification.

[Figure: the logistic function]
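
To make the "S"-shaped function concrete, here is a minimal sketch of the logistic (sigmoid) function and of how a logistic regression model turns a weighted sum of features into a probability and then a class. The weights and bias are hand-picked, hypothetical values and are not fitted to any data.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, features):
    """Probability of the positive class under a logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical, hand-picked parameters (NOT learned from the project's data)
weights = [0.8, 1.5]  # e.g. for a scaled age and a smoking level
bias = -2.0

p = predict_probability(weights, bias, [1.0, 1.0])
predicted_class = 1 if p >= 0.5 else 0  # threshold the probability at 0.5
print(round(p, 3), predicted_class)
```

The 0.5 threshold is a common default; in a screening setting a lower threshold may be preferred so that fewer true cases are missed.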

K-Nearest Neighbor (KNN) Algorithm
o K-Nearest Neighbour is one of the simplest machine learning algorithms based on the
supervised learning technique.
o The K-NN algorithm assumes similarity between the new case and the available cases
and puts the new case into the category that is most similar to it.
o The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can easily be classified into a
well-suited category using the K-NN algorithm.
o The K-NN algorithm can be used for regression as well as classification, but it is mostly
used for classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any assumption about the
underlying data.
o It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead, it stores the dataset and performs its computation on the dataset at
the time of classification.
o In the training phase, the KNN algorithm just stores the dataset; when it gets new data,
it classifies that data into the category that is most similar to the new data.
o Example: Suppose we have an image of a creature that looks similar to both a cat and a dog,
and we want to know whether it is a cat or a dog. For this identification we can use the KNN
algorithm, as it works on a similarity measure. Our KNN model will compare the features of the
new image with those of the cat and dog images and, based on the most similar features, put
it in either the cat or the dog category.

Why do we need a K-NN Algorithm?

Suppose there are two categories, Category A and Category B, and we have a new data point
x1. In which of these categories does the data point lie? To solve this type of problem, we need
a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a
particular data point.

[Figure: a new data point x1 to be assigned to Category A or Category B]
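
The Category A / Category B question can be sketched in a few lines: compute the distance from the new point to every stored point, take the k nearest, and let them vote. This is a minimal from-scratch illustration using Euclidean distance; the coordinates are made up, and a library implementation would normally be used in practice.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Lazy learner: there is no training step, all work happens at query time.
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    top_k = [label for _, label in neighbours[:k]]
    return Counter(top_k).most_common(1)[0][0]


# Hypothetical 2-D points for Category A and Category B
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
                (6.0, 6.5), (7.0, 6.0), (6.5, 7.0)]
train_labels = ["A", "A", "A", "B", "B", "B"]

x1 = (2.0, 2.0)  # the new data point
print(knn_predict(train_points, train_labels, x1))
print(knn_predict(train_points, train_labels, (6.0, 6.0)))
```

Because the vote depends on raw distances, features on larger scales dominate unless the inputs are normalized first. An odd k avoids ties when there are two classes.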

Decision Tree Classification Algorithm
o Decision Tree is a supervised learning technique that can be used for both classification
and regression problems, but it is mostly preferred for solving classification problems. It
is a tree-structured classifier, where internal nodes represent the features of a dataset,
branches represent the decision rules, and each leaf node represents the outcome.
o In a decision tree, there are two types of nodes: the Decision Node and the Leaf
Node. Decision nodes are used to make a decision and have multiple branches, whereas
leaf nodes are the outputs of those decisions and do not contain any further branches.
o The decisions or tests are performed on the basis of the features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a problem/decision
based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which
expands into further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits
the tree into subtrees.

[Figure: general structure of a decision tree]
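
The question a decision node "asks" is chosen by scoring candidate splits. For classification, CART typically scores them with Gini impurity, and the sketch below shows that one step for a single numeric feature. It is a simplified illustration, not a full tree builder: it tries only observed values as thresholds, and the smoking levels and risk labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure node, 0.5 at most for two classes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Find the split `value <= t` that minimizes the weighted Gini impurity."""
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must send records to both branches
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical smoking levels and risk labels
smoking = [1, 2, 2, 3, 6, 7, 7, 8]
risk = ["low", "low", "low", "low", "high", "high", "high", "high"]

threshold, impurity = best_threshold(smoking, risk)
print(threshold, impurity)
```

A full tree repeats this search over every feature at every node and recurses on the two branches until the leaves are pure or a depth limit is reached.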

Support Vector Machine Algorithm

Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms and
is used for classification as well as regression problems. However, it is primarily used
for classification problems in machine learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can segregate
n-dimensional space into classes, so that we can easily put a new data point in the correct
category in the future. This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed Support Vector Machine.

[Figure: two categories separated by a decision boundary or hyperplane]
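
To make the idea of a separating hyperplane concrete, the sketch below classifies points by the sign of w·x + b and measures each point's distance to the boundary. It does not train an SVM: the hyperplane (w, b) and the points are hypothetical values chosen by hand, whereas a real SVM would learn w and b so that the margin to the closest points is as large as possible.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Assign class +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def distance_to_hyperplane(w, b, x):
    """Geometric distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm


# Hypothetical hyperplane x1 + x2 - 5 = 0 and four sample points
w, b = [1.0, 1.0], -5.0
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]

labels = [classify(w, b, p) for p in points]
distances = [distance_to_hyperplane(w, b, p) for p in points]

# The points closest to the boundary play the role of support vectors.
closest = min(distances)
print(labels, closest)
```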

Data Set used:-

Since it is hard to collect data manually, we will use an existing data set. The complete data
set is available at the following link:
https://drive.google.com/file/d/1XCu6_3CV9DmElQ_j6WFD4r6uFFrK0YFb/view?usp=drivesdk

Attributes required for prediction in data set:

The following attributes are necessary for the prediction of cancer:-

Patient Id, Age, Gender, Air Pollution, Alcohol Use, Dust Allergy, Occupational Hazards,
Genetic Risk, Chronic Lung Disease, Balanced Diet, Obesity, Smoking, Passive Smoker, Chest
Pain, Coughing of Blood, Fatigue, Weight Loss, Shortness of Breath, Wheezing, Swallowing
Difficulty, Clubbing of Finger Nails, Frequent Cold, Dry Cough, Snoring Level
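
A minimal sketch of how such records might be read into feature vectors is shown below, using only the standard `csv` module. The CSV excerpt is invented: the column names follow a few of the attributes listed above, while the target column name "Level", its values, and all numbers are assumptions for illustration. Patient Id is dropped because it is an identifier, not a predictor.

```python
import csv
import io

# Hypothetical CSV excerpt; "Level" and all values are assumed, not from the source.
sample_csv = """Patient Id,Age,Gender,Air Pollution,Smoking,Level
P1,33,1,2,3,Low
P2,52,2,6,7,High
P3,44,1,4,2,Medium
"""


def load_records(text, id_column="Patient Id", target_column="Level"):
    """Split CSV text into numeric feature rows and a list of target labels."""
    features, targets = [], []
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        targets.append(row[target_column])
        # Keep every column except the identifier and the target.
        features.append(
            [int(v) for k, v in row.items() if k not in (id_column, target_column)]
        )
    return features, targets


X, y = load_records(sample_csv)
print(X)
print(y)
```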

Advantages of the proposed system:-
1. The user can easily get a cancer prediction with a single click.
2. Based on the predicted results, the system will display relevant doctor details for further
communication.
3. It aims to provide higher accuracy and to take less time to produce output.

Software requirement specification document:-

• Laptop or PC
• Language: Python
• Windows 7 or higher
• Jupyter Notebook
• Anaconda

Design of proposed system
The records have already been separated into a train set and a test set, and each record has
already been labeled. First, we take the train set.

We will train our model with the help of histograms. The features extracted from each record are
stored in a histogram, and this process is repeated for every record in the train set. Next, we
build our classifier models. The classifiers we take into account are Logistic Regression,
Support Vector Machine (SVM), KNN, and Decision Tree. With the help of the histogram, we train
each model. The most important thing in this process is to tune the parameters so that we get
the most accurate results.

Once the training is complete, we take the test set. For each record in the test set, we extract
the features using feature extraction techniques and then compare their values with the values
present in the histogram formed from the train set. The output is then predicted for each test
record. To calculate accuracy, we compare the predicted value with the labeled value. The
metrics that we will use are the confusion matrix, accuracy score, F1 score, etc.
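
The comparison of predicted and labeled values can be sketched as follows: count the four confusion-matrix cells for one class treated as positive, then derive accuracy and F1 from them. The label lists are hypothetical; in the project these would be the test-set labels and the model's predictions.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn) treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy_score(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (0.0 when undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labeled values and predictions for eight test records
y_true = ["high", "high", "low", "low", "high", "low", "low", "high"]
y_pred = ["high", "low", "low", "low", "high", "high", "low", "high"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="high")
acc = accuracy_score(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
print((tp, fp, fn, tn), acc, f1)
```

For a medical screening task, the false-negative count (missed high-risk cases) usually deserves more attention than overall accuracy.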

Cancer prediction will be carried out using the following main steps:

Step 1: Data loading and preparation:

The dataset used in this project is taken from

Step 2: Data Normalization:

We need to normalize the inputs, using Python packages such as NumPy, pandas, and matplotlib,
before passing them to the data mining models. The goal of normalization is to change the values
of numeric columns in the dataset to a common scale, without distorting differences in the
range of values.
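
One common way to bring numeric columns onto a common scale is min-max normalization, which maps each column to the range [0, 1]. The source does not say which normalization method the project uses, so the sketch below is an assumption; it is written in plain Python, and the sample rows are made up.

```python
def min_max_normalize(column):
    """Rescale one numeric column to [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def normalize_rows(rows):
    """Apply min-max normalization to every column of a list of rows."""
    columns = list(zip(*rows))                      # rows -> columns
    scaled = [min_max_normalize(col) for col in columns]
    return [list(r) for r in zip(*scaled)]          # columns -> rows


# Hypothetical rows of (age, smoking level)
rows = [[20, 1], [40, 5], [60, 9]]
scaled_rows = normalize_rows(rows)
print(scaled_rows)
```

The minimum and maximum should be computed on the train set only and then reused for the test set, so that no information from the test data leaks into training.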

Step 3: Predict cancer using a machine learning algorithm.

This is the model-building stage. The model for cancer prediction is built using
machine learning algorithms, i.e. Logistic Regression, KNN, Decision Tree, and SVM.
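
Model building relies on the division of labeled records into a train set and a test set. A minimal, reproducible shuffle-and-split helper might look like the sketch below. The function name, the 25% test fraction, and the seed are illustrative choices, not details from the source.

```python
import random


def split_train_test(records, labels, test_fraction=0.25, seed=0):
    """Shuffle indices with a fixed seed and split into train and test parts."""
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split repeatable
    n_test = max(1, int(round(len(records) * test_fraction)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [records[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


# Hypothetical records and labels
records = [[i, i * 2] for i in range(8)]
labels = ["low" if i < 4 else "high" for i in range(8)]

(X_train, y_train), (X_test, y_test) = split_train_test(records, labels)
print(len(X_train), len(X_test))
```

Each of the four classifiers would then be fitted on the train part and compared on the same held-out test part.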

System Description
The system comprises 2 major entities with their modules as follows:

a. Admin

· Login: The admin needs to authenticate using a login ID and password in order to access the system.
· Add/View Training Data: A relevant training set is to be filled in by the admin for the algorithm to analyse and predict results.
· View User Details: All the registered users are displayed to the admin.
· View Feedback: System-related feedback is received from the registered users.

b. User

· Register: In order to access the system, the user needs to register with basic details like name, email, contact no., age, sex, etc.
· Predict Cancer: By providing details like age, gender, blood clots in the urine, urination visits in a day, chest pain, coughing up blood, pain/itching in the mouth, and memory problems.
· The system will accordingly display a doctor to consult.
· Give Feedback: The user will provide feedback regarding the system.

Conclusions
In conclusion, the cancer prediction machine learning project represents a
significant step forward in our ongoing battle against cancer. With continued
advancements in technology and collaborative efforts, we can harness the power of
machine learning to transform cancer detection and treatment, ultimately saving
more lives and improving patient outcomes in the future.

We used different machine learning algorithms: KNN, decision tree, SVM, and logistic
regression. We then trained and tested the KNN model and saved it as a pickle file, which is
connected to the user interface with the help of the Flask library. Finally, we were able to
build a website that predicts the chances of cancer in the future.
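
The pickle step can be sketched as a simple save-and-restore round trip. Since the source does not give the model file name or structure, a plain dictionary stands in for the trained KNN model here, and the Flask side is not shown.

```python
import pickle

# Stand-in for a trained KNN model: for a lazy learner the "model" is essentially
# the stored training data plus the chosen k (all values are hypothetical).
model = {
    "k": 1,
    "train_points": [(1.0, 1.0), (6.0, 6.5)],
    "train_labels": ["low", "high"],
}

# Training side: serialize the model to bytes (pickle.dump would write to a file).
blob = pickle.dumps(model)

# Web-application side: restore the model before serving predictions.
restored = pickle.loads(blob)
print(restored == model)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.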

References/Bibliography

Books:-
Machine Learning for Beginners
Python Machine Learning by Sebastian Raschka

Websites:-
www.w3schools.com
www.google.com
www.tutorialpoint.com
www.geeksforgeeks.com
www.kaggle.com
www.getbootstrap.com
