
Project Report

Breast cancer is the most common health issue among women, with early detection being crucial for reducing mortality rates. This paper discusses the use of machine learning techniques, specifically comparing Logistic Regression, Random Forest, and Decision Trees, to classify breast cancer outcomes using mammogram data. The research aims to improve prediction accuracy and reduce error rates through various datasets and machine learning algorithms.

Uploaded by

smitashreemahapatra


INTRODUCTION

Breast cancer has become one of the most frequently occurring health problems
among women, especially women in middle age. Early detection of breast cancer
improves the chance of a cure and can reduce the death rate [1]. At present,
mammograms are used to detect breast cancer and are regarded as the most
effective screening technique. In this paper, breast cancer is detected using
machine learning techniques.

Breast cancer is the second major cause of death among women (after lung
cancer). In the US, 246,660 new cases of invasive breast cancer in women were
expected to be diagnosed during 2016, and 40,450 deaths were estimated. Breast
cancer is a type of cancer that starts in the breast. Cancer starts when cells
begin to grow out of control. Breast cancer cells usually form a tumour that can
often be seen on an x-ray or felt as a lump. Breast cancer can spread when the
cancer cells get into the blood or lymph system and are carried to other parts
of the body. The causes of breast cancer include changes and mutations in DNA.
There are many different types of breast cancer; common ones include ductal
carcinoma in situ (DCIS) and invasive carcinoma. Others, like phyllodes tumours
and angiosarcoma, are less common. The side effects of breast cancer include
fatigue, headaches, pain and numbness (peripheral neuropathy), bone loss and
osteoporosis.

There are many algorithms for the classification and prediction of breast cancer
outcomes. The present paper compares the performance of three classifiers:
Logistic Regression, Random Forest and Decision Tree, which are among the most
influential data mining algorithms. Breast cancer can be detected early during a
screening examination through mammography or with a portable cancer diagnostic
tool. Cancerous breast tissues change with the progression of the disease, which
can be directly linked to cancer staging. The stage of breast cancer (I–IV)
describes how far a patient's cancer has proliferated. Indicators such as tumour
size, lymph node metastasis and distant metastasis are used to determine the
stage. To prevent the cancer from spreading, patients have to undergo breast
cancer surgery, chemotherapy, radiotherapy and endocrine therapy. The goal of
this research is to classify patients as malignant or benign and to determine
how to parametrize our classification techniques so as to achieve high accuracy.
We are looking into many datasets and at how further machine learning algorithms
can be used to characterize breast cancer. We want to reduce the error rate and
maximise accuracy. A 10-fold cross-validation test, a machine learning
evaluation technique, is run in Jupyter to evaluate the data and analyse it in
terms of effectiveness and efficiency.

Machine learning is an application of artificial intelligence that gives systems
the ability to learn and improve automatically from experience without being
explicitly programmed. The basic premise of machine learning is to build
algorithms that receive input data and use statistical analysis to predict an
output, updating the outputs as new data becomes available. The process of
learning begins with observations or data, such as examples, direct experience,
or instruction, in order to look for patterns in the data and make better
decisions in the future based on the examples that we provide. The primary aim
is to allow computers to learn automatically, without human intervention or
assistance, and to adjust their actions accordingly.
1.1 MOTIVATION

Breast cancer is among the most common diseases affecting women worldwide. In
the U.S., 246,660 new cases of invasive breast cancer in women were expected to
be diagnosed during 2016, and 40,450 deaths were estimated. Developments in
breast cancer research and prediction motivated this work. The Breast Cancer
Wisconsin dataset from the UCI Machine Learning Repository was attractive
because it samples a large number of patients with multivariate attributes.

1.2 RELATED WORK

The causes of breast cancer include changes and mutations in DNA. Cancer starts
when cells begin to grow out of control. Breast cancer cells usually form a
tumour that can often be seen on an x-ray or felt as a lump. There are many
different types of breast cancer; common ones include ductal carcinoma in situ
(DCIS) and invasive carcinoma. Others, like phyllodes tumours and angiosarcoma,
are less common. We have used the classification methods Random Forest, Decision
Tree and Logistic Regression.

Prediction and prognosis of cancer development focus on three major domains:
risk assessment or prediction of cancer susceptibility, prediction of cancer
relapse, and prediction of cancer survival rate. The first domain comprises
prediction of the probability of developing a certain cancer prior to diagnosis.
The second concerns prediction of cancer recurrence in terms of diagnostics and
treatment. The third is aimed at predicting several parameters that characterize
cancer development and treatment after diagnosis of the disease: survival time,
life expectancy, progression, drug sensitivity, etc. The survivability rate and
cancer relapse depend very much on the medical treatment and the quality of the
diagnosis.

Data pre-processing is a data mining technique used to convert data into a
usable format. Real-world datasets come in many different formats and are rarely
available in the form we require, so they must be converted into an
understandable format. Data pre-processing is a proven method of resolving such
issues. For pre-processing we have used the standardization method.

The following is the summary of the existing works on the given domain:

Columns: Sl. No., Title of the paper, Date of publication, Dataset used,
Problem domain, Software used, Techniques used, Accuracy

Sl. No. 1
Title: Skin lesion classification from dermoscopic images using deep learning techniques
Published: 2017
Dataset: ISIC Archive dataset
Problem domain: Skin lesion as malignant or benign
Software: Python
Techniques: KNN, ANN and SVMs
Accuracy: 81.33%

Sl. No. 2
Title: Breast cancer detection and prediction using machine learning
Published: 19th June, 2020 (ResearchGate.net)
Dataset: UCI-ML Repository breast cancer dataset
Problem domain: Breast cancer detection
Software: Python, Java (8), Xcyt
Techniques: Random Forest
Accuracy: 96.50% (Random Forest)

Sl. No. 3
Title: Lung cancer detection and classification using machine learning algorithm
Published: 2021
Dataset: TCIA dataset and LIDC
Problem domain: Lung cancer detection and SVM classifier for classification
Software: not given
Techniques: U-Net, Random Forest, Convolutional Network
Accuracy: 99%

Sl. No. 4
Title: A novel approach to perform analysis and prediction on breast cancer dataset using R
Published: 2018 (ResearchGate)
Dataset: not given
Problem domain: Analyse the breast cancer data and do efficiency prediction
Software: R
Techniques: Decision tree, Random forest, SVMs
Accuracy: not given

Sl. No. 5
Title: A deep learning model based on concatenation approach for the diagnosis of brain tumor
Published: 5th Mar, 2020 (IEEE Access)
Dataset: Brain tumor dataset
Problem domain: A method of multi-level feature extraction and concatenation for early diagnosis of brain tumor
Software: Python
Techniques: Inception-V3 and DenseNet201
Accuracy: 99.34% (Inception-V3), 99.51% (DenseNet201)

Sl. No. 6
Title: Potential breast cancer drug prediction using machine learning model
Published: 27th Apr, 2020 (IEEE Xplore)
Dataset: KEGG (Kyoto Encyclopedia of Genes and Genomes) and ChEMBL
Problem domain: Use of a machine learning model as a method for classifying a drug as a potent breast cancer drug
Software: Python
Techniques: SVMs and Random forest
Accuracy: 97.90% (SVMs), 99.20% (Random forest)

Sl. No. 7
Title: Automated breast mass classification system using deep learning in digital mammogram
Published: 5th Apr, 2021 (IEEE Xplore)
Dataset: MIAS (Mammographic Image Analysis Society) and DDSM (Digital Database for Screening Mammography)
Problem domain: Combination of various techniques to classify the breast mass into benign, malignant and normal
Software: Python
Techniques: Random forest and DL
Accuracy: 97.50% (MIAS), 96% (DDSM)

Sl. No. 8
Title: Deep learning to improve breast cancer detection on screening mammography
Published: 29th Aug, 2019
Dataset: CBIS-DDSM and INbreast
Problem domain: Improve cancer detection with deep learning
Software: Python
Techniques: DL using the CNN model
Accuracy: 97%

Sl. No. 9
Title: Attention-enriched deep learning model for breast tumor segmentation in ultrasound images
Published: 19th June, 2020 (Elsevier)
Dataset: Breast ultrasound images
Problem domain: Integrating visual saliency into breast cancer tumor segmentation
Software: Python
Techniques: New deep learning model using attention block and U-Net
Accuracy: 90.50%

Sl. No. 10
Title: COVID-19 detection through transfer learning using multimodal imaging data
Published: 14th July, 2020 (IEEE Access)
Dataset: COVID-19 x-ray and CT scan
Problem domain: COVID-19 detection through transfer learning
Software: not given
Techniques: VGG19 and DenseNet
Accuracy: 98.40%

Sl. No. 11
Title: Feature selection from colon cancer dataset for cancer classification using Artificial Neural Network (ANN)
Published: 2018
Dataset: Colon cancer dataset
Problem domain: Cancer classification using ANN and SVMs
Software: MATLAB
Techniques: ANN and SVMs
Accuracy: 98.40%

Sl. No. 12
Title: An automated detection of breast cancer diagnosis and prognosis based on machine learning using ensemble of classifier
Published: 14th Apr, 2022 (IEEE Xplore)
Dataset: DDSM/323
Problem domain: Benign vs malignant
Software: CAD (computer aided diagnostic)
Techniques: ANN, SVM and KNN
Accuracy: 98.83%

Sl. No. 13
Title: A sustainable IoTH based computationally intelligent healthcare monitoring system for lung cancer risk detection
Published: 9th June, 2021 (Elsevier)
Dataset: Lung cancer dataset
Problem domain: Overcome the rise of lung cancer diseases
Software: Python
Techniques: Greedy Best First Search (GBFS)
Accuracy: 98.80% (Random Forest)

Sl. No. 14
Title: Deep-chest, multiclassification deep learning model for diagnosing COVID-19 pneumonia and lung cancer chest diseases
Published: 2022 (Elsevier)
Dataset: COVID-CT, chest X-ray dataset and CT (computed tomography) images
Problem domain: Diagnosing diseases with a deep learning model
Software: measure, deep learning (AI)
Techniques: VGG19+(CNN), ResNet152V2
Accuracy: 98.05%, 95.31%

Sl. No. 15
Title: On the automatic detection and classification of skin cancer using deep transfer learning
Published: 30th May, 2022 (Sensors)
Dataset: HAM10000, 10015 dermoscopic images
Problem domain: Raw deep transfer learning in classifying skin lesion
Software: MATLAB R2021a
Techniques: Deep transfer learning of a CNN
Accuracy: 82.90%

Sl. No. 16
Title: Prediction factors for survival of breast cancer patient
Published: 19(1), 1-17, 2019
Dataset: Hospital based dataset, n=8066, with diagnosis information between 1993 and 2016
Problem domain: Build models for detecting and visualising significant prognostic indicators of survival rate
Software: Python
Techniques: Decision tree and Random forest
Accuracy: 79.8% (Decision tree), 82.7% (Random forest)

Sl. No. 17
Title: Deep learning model on concentration approach for the diagnosis of brain tumour
Published: 5 March 2020 (IEEE Access)
Dataset: Brain dataset comprised of 3064 T1-weighted contrast image of 233
Problem domain: Features concentration using pre-trained model as compared to the current research method for brain tumour classification
Software: Python
Techniques: Inception-v3 and DenseNet201
Accuracy: 99.34% (Inception-v3) and 99.51% (DenseNet201)

Sl. No. 18
Title: Lung cancer prediction from text datasets using machine learning
Published: 2020 (IEEE Access)
Dataset: 32 instances, 57 characteristics and one class attribute in its entirety
Problem domain: Compared with existing SVM and SMOTE method
Software: Python
Techniques: Support vector machines (SVM)
Accuracy: 98.8% (SVM)

Sl. No. 19
Title: Detection of skin cancer based on skin lesion images using deep learning
Published: 30 May 2022 (Sensors)
Dataset: 3533 skin lesions (benign, malignant and melanocytic tumour)
Problem domain: Deep learning method CNN was used to detect malignant and benign using ISIC2018 dataset
Software: Python
Techniques: CNN, ResNet50, Inception V3
Accuracy: 83.2% (CNN), 83.7% (ResNet50), 85.8% (Inception V3)

Sl. No. 20
Title: Comparison of nomogram with machine learning techniques for prediction of overall survival in patients with tongue cancer
Published: 31 May 2013 (SpringerLink)
Dataset: 7596 tongue cancer patients
Problem domain: Algorithms were evaluated in terms of ROC curve and accuracy value, and the result was compared with nomogram to predict survival of patients
Software: Python
Techniques: Decision tree and nomogram
Accuracy: 88.7% (Decision tree) and 60.4% (nomogram)

Sl. No. 21
Title: Analysis of breast cancer detection using different machine learning
Published: 27 April 2020 (IEEE Xplore)
Dataset: WBC dataset (699 instances and 11 attributes) and Breast cancer dataset (286 instances and 10 attributes)
Problem domain: Proposing a suitable method that can manage the imbalanced dataset and the missing values to enhance the classifier's performance
Software: WEKA 3.8.3
Techniques: J48, NB, SMO
Accuracy: 98.2% (J48), 99.56% (SMO) and 99.24% (NB)

Sl. No. 22
Title: Lung cancer prediction using machine learning and advanced imaging techniques
Published: 2021 (IEEE Access)
Dataset: 3593 LUNGRADS
Problem domain: Reduce the variability in assessing and reporting the lung cancer risk between interpreting physicians
Software: CAD software
Techniques: SVM
Accuracy: 98.56%

Sl. No. 23
Title: Breast cancer prognosis using a machine learning approach
Published: 2019
Dataset: N=318 (training set)
Problem domain: Used to predict outcomes in individual cancer patients
Software: Python
Techniques: Kernel-based learning
Accuracy: 96.30%

Sl. No. 24
Title: Breast cancer prediction using machine learning
Published: 2020 (www.researchgate.net)
Dataset: Electronic health record
Problem domain: 5 year survivability prediction
Software: WEKA
Techniques: Logistic regression
Accuracy: 92.30%

Sl. No. 25
Title: Breast cancer prediction using machine learning approach
Published: 2020 (www.researchgate.net)
Dataset: Wisconsin breast cancer
Problem domain: Patient features sorted out from data materials are statistically tested
Software: WEKA
Techniques: KNN and SVM
Accuracy: 96.85% (KNN) and 96.85% (SVM)

Sl. No. 26
Title: Breast cancer prediction using machine learning approach
Published: 2020 (www.researchgate.net)
Dataset: UC Irvine machine learning repository
Problem domain: Decision tree is the best predictor on holdout sample
Software: WEKA
Techniques: Naïve Bayes, J48 decision tree and bagging algorithm
Accuracy: 96.50%

Sl. No. 27
Title: Breast cancer prediction using machine learning approach
Published: 2020 (www.researchgate.net)
Dataset: Cancer society
Problem domain: Requires less input parameter, performing well in the low noise dataset
Software: WEKA
Techniques: ADABOOST
Accuracy: 97.50%

Sl. No. 28
Title: Breast cancer prediction using machine learning approach
Published: 2020 (www.researchgate.net)
Dataset: Cancer society
Problem domain: Getting higher accuracy
Software: MATLAB
Techniques: Logistic and Neural network
Accuracy: 96.30%

Sl. No. 29
Title: Breast cancer prediction using machine learning approach
Published: 2020 (www.researchgate.net)
Dataset: BCI Dataset
Problem domain: Reduce the variability in assessing and reporting the lung cancer risk between interpreting physicians
Software: CAD system
Techniques: Logistic regression and back propagation
Accuracy: 94.20%

Sl. No. 30
Title: Breast cancer prediction using machine learning approach
Published: 2020 (www.researchgate.net)
Dataset: Gene expression dataset collection
Problem domain: Build models for detecting and visualising significant prognostic indicators of survival rate
Software: WEKA
Techniques: C4.5 Bagging and ADABOOST Decision trees
Accuracy: 93.29% (C4.5 Bagging), 92.62% (ADABOOST)

2. PROPOSED METHODOLOGY

[Figure: flowchart with the boxes Data Preprocessing, Data Preparation, Feature
Selection, Feature Projection, Feature Scaling, Model Selection and Prediction]

Fig. (1) Phases of Machine Learning. The process consists of seven phases, which
are elaborated below:

Phase 1 - Pre-Processing Data

The first phase is to collect the data of interest so that it can be
pre-processed and used with classification and regression methods. Data
pre-processing is a data mining technique that involves transforming raw data
into an understandable format. Real-world data is often incomplete,
inconsistent, and likely to contain many errors. Data pre-processing is a proven
method of resolving such issues; it prepares raw data for further processing. We
have used the standardization method to pre-process the UCI dataset. This step
is very important because the quality and quantity of the data gathered directly
determine how good the predictive model can be. In this case we collect breast
cancer samples that are benign or malignant. This will be our training data.

Phase 2 - Data Preparation

In data preparation, we load our data into a suitable place and prepare it for
use in machine learning training. We first put all our data together and then
randomize the ordering.
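
As a minimal sketch of the "put the data together, then randomize the ordering" step, the snippet below shuffles feature rows and labels with one shared permutation so that each row stays paired with its label. The toy rows, the labels and the seed are illustrative assumptions, not values taken from the dataset.

```python
import random

# Hypothetical toy data: each row is one sample, labels[i] belongs to rows[i].
rows = [[17.9, 10.4], [20.6, 17.8], [13.5, 14.4], [11.4, 20.4]]
labels = ["M", "M", "B", "B"]


def shuffle_together(rows, labels, seed=0):
    """Return rows and labels reordered by the same random permutation."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)  # a fixed seed keeps the run repeatable
    return [rows[i] for i in order], [labels[i] for i in order]


shuffled_rows, shuffled_labels = shuffle_together(rows, labels)
```

Shuffling a list of indices, rather than the two lists separately, is what keeps features and labels aligned.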

Phase 3 - Feature Selection

In machine learning and statistics, feature selection, also known as variable
selection or attribute selection, is the process of selecting a subset of
relevant features for use in model construction.

Data file and feature selection: we used the Breast Cancer Wisconsin
(Diagnostic) Data Set from the Kaggle repository, and out of 31 parameters we
selected about 8-9. Our target parameter is the breast cancer diagnosis:
malignant or benign. We used a wrapper method for feature selection. The
important features found by the study are: Concave points worst, Area worst,
Area se, Texture worst, Texture mean, Smoothness worst, Smoothness mean, Radius
mean, Symmetry mean.
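
A wrapper method scores candidate feature subsets by how well a model performs on them. The report does not say which wrapper variant was used, so the sketch below shows one common variant, greedy forward selection, purely as an assumption. The `score` function and its weights are hypothetical stand-ins; in practice the score would be something like the cross-validated accuracy of a classifier trained on the subset.

```python
def forward_select(n_features, score, max_features):
    """Greedy wrapper selection: keep adding the feature that most improves score(subset)."""
    selected = []
    best = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        top_score, top_feature = max((score(selected + [f]), f) for f in candidates)
        if top_score <= best:
            break  # no remaining feature improves the score
        selected.append(top_feature)
        best = top_score
    return selected, best


# Hypothetical scorer: features 0, 2 and 5 are useful, and every added
# feature costs a small penalty, so useless features are never worth adding.
USEFUL = {0: 0.30, 2: 0.25, 5: 0.10}


def score(subset):
    return sum(USEFUL.get(f, 0.0) for f in subset) - 0.02 * len(subset)


selected, best_score = forward_select(n_features=8, score=score, max_features=8)
```

The search stops as soon as no candidate improves the score, which is how a wrapper method can end with 8-9 features out of about 30.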

Attribute Information:

1) ID number 2) Diagnosis (M = malignant, B = benign) 3–32)

Phase 4 - Feature Projection

Feature projection is the transformation of data from a high-dimensional space
to a lower-dimensional space (with fewer attributes). Both linear and nonlinear
reduction techniques can be used, depending on the type of relationships among
the features in the dataset.

Phase 5 - Feature Scaling

Most of the time, a dataset contains features that vary widely in magnitude,
units and range. Many machine learning algorithms use the Euclidean distance
between two data points in their computations, so features with large magnitudes
would dominate. We therefore need to bring all features to the same level of
magnitude, which can be achieved by scaling.
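
To illustrate why scaling matters for distance-based computations, here is a small sketch of z-score standardization (subtract the mean, divide by the population standard deviation). The three samples and their feature values are hypothetical and were chosen only to make the effect visible.

```python
import math
import statistics


def standardize(column):
    """Rescale one feature to zero mean and unit variance (z-score)."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population standard deviation
    return [(value - mean) / std for value in column]


# Hypothetical samples: (area, smoothness). Area is in the hundreds and
# smoothness is around 0.1, so raw distances are dominated by area.
samples = [(1000.0, 0.10), (990.0, 0.20), (600.0, 0.11)]

area = standardize([s[0] for s in samples])
smoothness = standardize([s[1] for s in samples])
scaled = list(zip(area, smoothness))

# Before scaling, the large smoothness gap between samples 0 and 1 is invisible.
raw_distance = math.dist(samples[0], samples[1])
# After scaling, both features contribute on a comparable footing.
scaled_distance = math.dist(scaled[0], scaled[1])
```

On the raw values, samples 0 and 1 look close because only the area difference registers; after standardization the smoothness difference carries comparable weight.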

Phase 6 - Model Selection

Supervised learning: the machine is trained on data in which the inputs and
outputs are well labelled. The model learns from the training data and can then
process future data to predict an outcome. Supervised techniques are grouped
into regression and classification. A regression problem is one where the result
is a real or continuous value, such as “salary” or “weight”. A classification
problem is one where the result is a category, such as filtering emails into
“spam” or “not spam”.

Unsupervised learning: the machine is given information that is neither
classified nor labelled, and the algorithm analyses it without any guidance.

In our dataset the outcome (dependent) variable Y has only two values, M
(malignant) or B (benign), so a classification algorithm of supervised learning
is applied to it. We have chosen three different classification algorithms in
machine learning. We can use a small linear model, which is simple.

2.1 METHODS USED:
1) LOGISTIC REGRESSION
Logistic regression was introduced by the statistician D. R. Cox in 1958 and so
predates the field of machine learning. It is a supervised machine learning
technique employed in classification tasks (for predictions based on training
data). Logistic regression uses an equation like that of linear regression, but
its outcome is a categorical variable, whereas the outcome of other regression
models is a continuous value. Binary outcomes can be predicted from the
independent variables.

The general workflow is:

1) Get a dataset
2) Train a classifier
3) Make a prediction using that classifier
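
The prediction step of logistic regression can be sketched as follows: a linear combination of the features is passed through the sigmoid function, and the resulting probability is compared with a threshold. The weights, bias and feature values below are hypothetical; a trained model would learn its coefficients from the data.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict(weights, bias, features, threshold=0.5):
    """Return 1 (e.g. malignant) if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Hypothetical coefficients for two standardized features.
weights, bias = [1.5, -0.5], 0.2
label = predict(weights, bias, [1.0, 0.4])  # linear part: 0.2 + 1.5 - 0.2 = 1.5
```

The linear part is the same equation as in linear regression; the sigmoid and the threshold are what turn it into a binary outcome.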

2) RANDOM FOREST:
Random forest, like its name implies, consists of many individual
decision trees that operate as an ensemble. Each individual tree in the random
forest spits out a class prediction and the class with the most votes becomes our
model’s prediction.
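
The voting step described above can be sketched in a few lines. The list of votes is hypothetical and stands in for the class predictions of five individual trees on one sample.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical votes from five trees for one sample (1 = malignant, 0 = benign).
votes = [1, 0, 1, 1, 0]
forest_prediction = majority_vote(votes)
```

This shows hard voting as described in the text. Note that scikit-learn's `RandomForestClassifier` combines trees by averaging their predicted class probabilities rather than counting hard votes, so its output can differ from this sketch in borderline cases.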

3) DECISION TREE:
A decision tree is a supervised learning technique that can be used for both
classification and regression problems, but it is mostly preferred for
classification. It is a tree-structured classifier in which internal nodes
represent the features of a dataset, branches represent the decision rules and
each leaf node represents an outcome.

A decision tree has two types of node: the decision node and the leaf node.
Decision nodes are used to make a decision and have multiple branches, whereas
leaf nodes are the outputs of those decisions and do not contain any further
branches.

The decisions or tests are performed on the basis of the features of the given
dataset.
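
A decision node picks the feature test that separates the classes best. One common way to measure this, and the one selected by `criterion="entropy"` in scikit-learn, is the reduction in entropy (information gain). The sketch below computes it for a hypothetical node; the label lists are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting parent into left and right."""
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
    return entropy(parent) - weighted


# Hypothetical node with two benign (0) and two malignant (1) samples.
parent = [0, 0, 1, 1]
perfect_split = information_gain(parent, [0, 0], [1, 1])  # classes fully separated
useless_split = information_gain(parent, [0, 1], [0, 1])  # classes still mixed
```

A split that fully separates the classes removes all uncertainty, while a split that leaves both children as mixed as the parent gains nothing.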

3. PROGRAMMING USED:

THE CODE:

# importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# reading data from the file
df = pd.read_csv("data.csv")
df.info()

# count the null values in each column
df.isna().sum()

# size of the dataset (rows, columns)
df.shape

# remove every column that contains null values
df = df.dropna(axis=1)

# shape of the dataset after removing the null column
df.shape

# describe the dataset
df.describe()

# count of malignant (M) and benign (B) samples
df['diagnosis'].value_counts()
sns.countplot(x='diagnosis', data=df)

# label encoding (convert the values M and B into 1 and 0)
from sklearn.preprocessing import LabelEncoder
labelencoder_Y = LabelEncoder()
df['diagnosis'] = labelencoder_Y.fit_transform(df['diagnosis'].values)
df.head()

sns.pairplot(df.iloc[:, 1:5], hue="diagnosis")

# get the correlation
df.iloc[:, 1:32].corr()

# visualize the correlation
plt.figure(figsize=(10, 10))
sns.heatmap(df.iloc[:, 1:10].corr(), annot=True, fmt=".0%")

# split the dataset into independent (X) and dependent (Y) variables
# note: 2:31 selects columns 2..30; if the 30 feature columns sit in
# positions 2..31, the last one is left out (2:32 would include it)
X = df.iloc[:, 2:31].values
Y = df.iloc[:, 1].values

# splitting the data into training and test datasets
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20, random_state=0)

# feature scaling: fit the scaler on the training data only,
# then apply the same transformation to the test data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# models / algorithms
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

def models(X_train, Y_train):
    # logistic regression
    log = LogisticRegression(random_state=0)
    log.fit(X_train, Y_train)

    # decision tree
    tree = DecisionTreeClassifier(random_state=0, criterion="entropy")
    tree.fit(X_train, Y_train)

    # random forest
    forest = RandomForestClassifier(random_state=0, criterion="entropy", n_estimators=10)
    forest.fit(X_train, Y_train)

    # accuracy of each model on the training data
    print('[0]logistic regression training accuracy:', log.score(X_train, Y_train))
    print('[1]Decision tree training accuracy:', tree.score(X_train, Y_train))
    print('[2]Random forest training accuracy:', forest.score(X_train, Y_train))

    return log, tree, forest

model = models(X_train, Y_train)

# testing the models / results
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

for i in range(len(model)):
    print("Model", i)
    print(classification_report(Y_test, model[i].predict(X_test)))
    print('Accuracy : ', accuracy_score(Y_test, model[i].predict(X_test)))

# prediction of the random forest
pred = model[2].predict(X_test)
print('Predicted values:')
print(pred)
print('Actual values:')
print(Y_test)

# save the trained random forest model to disk
from joblib import dump
dump(model[2], "Cancer_prediction.joblib")

RESULT AND DISCUSSION OF PROPOSED METHODOLOGY

The work was implemented on an i5 processor running at 2.30 GHz with 8 GB of
RAM, and all experiments on the classifiers described in this paper were
conducted using libraries from a machine learning environment. In the
experimental studies we partitioned the data 80-20% for training and testing.
The scikit-learn library, which we used from a Jupyter notebook, provides
machine learning algorithms for data pre-processing, classification, regression
and clustering. Machine learning techniques implemented in this environment are
applied to a variety of real-world problems. The results of the data analysis
are reported. To apply our classifiers and evaluate them, we apply the 10-fold
cross-validation test, a technique for evaluating predictive models that splits
the original set into ten folds, each of which serves once as the test set while
the remaining folds are used to train the model. After applying the
pre-processing and preparation methods, we analyse the data visually and examine
the distribution of values in terms of effectiveness and efficiency.
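
The fold construction behind k-fold cross-validation can be sketched without any library. The function below is a hypothetical helper, not the code used in the experiments, and the sample count of 569 is an assumption (the usual size of the Wisconsin Diagnostic dataset). It splits the sample indices into ten contiguous folds and yields one train/test pair per fold; shuffling the data beforehand is left out for brevity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread the remainder so that fold sizes differ by at most one sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Assumed dataset size; each sample lands in exactly one test fold.
folds = list(k_fold_indices(n_samples=569, k=10))
fold_sizes = [len(test) for _, test in folds]
```

A model would be trained on each `train` part and scored on the matching `test` part, and the ten scores averaged to give the cross-validated estimate.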

We evaluate the effectiveness of all classifiers in terms of time to build the
model, correctly classified instances, incorrectly classified instances and
accuracy.

Table No. 1

Algorithms            Accuracy              Recall   F1 Score
Logistic Regression   0.9649122807017544    0.96     0.96
Decision Tree         0.9385964912280702    0.94     0.94
Random Forest         0.9736842105263158    0.97     0.97
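
For reference, the metrics in the table can be derived from the four confusion-matrix counts, as in the sketch below. The report does not give the confusion matrices, so the counts used here are hypothetical; they were chosen so that the accuracy comes out at about 0.965.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical counts for a 114-sample test set (not taken from the report).
accuracy, precision, recall, f1 = classification_metrics(tp=45, tn=65, fp=2, fn=2)
```

In a medical setting recall on the malignant class deserves particular attention, since a false negative means a missed cancer.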

Fig 2 :- Comparison graphs between features where represents Malignant and blue represents
Benign.

4.1 CONCLUSION

Breast cancer is one of the diseases that cause the highest number of deaths
every year. At present, only a few accurate prognostic and predictive factors
are used clinically for managing patients with breast cancer. Here, by making
use of algorithms with the Level Set approach, high accuracy can be achieved in
detecting affected cell shapes, with exact marking of the detected contours. The
proposed system helps to enhance the performance of mammogram retrieval by
selecting optimal features.
Having created the predictive models, we can now analyse the results obtained in
evaluating our algorithms. Random forest achieved the highest accuracy, 97.37%,
followed by logistic regression with 96.49% and decision tree with 93.86%.

4.2 FUTURE WORK

The analysis of the results indicates that integrating multidimensional data
with different classification, feature selection and dimensionality reduction
techniques can provide promising tools for inference in this domain. Further
research in this field should be carried out to improve the performance of the
classification techniques so that they can predict on more variables. We intend
to determine how to parametrize our classification techniques so as to achieve
high accuracy. We are looking into many datasets and at how further machine
learning algorithms can be used to characterize breast cancer. We want to reduce
the error rate and maximise accuracy.

REFERENCES

[1] Wang, D. Zhang and Y. H. Huang “Breast Cancer Prediction Using Machine Learning” (2018), Vol. 66,
NO. 7.
[2] B. Akbugday, "Classification of Breast Cancer Data Using Machine Learning Algorithms," 2019 Medical
Technologies Congress (TIPTEKNO), Izmir, Turkey, 2019, pp. 1-4.

[3] Keles, M. Kaya, "Breast Cancer Prediction and Detection Using Data Mining Classification Algorithms:
A Comparative Study." Tehnicki Vjesnik - Technical Gazette, vol. 26, no. 1, 2019, p. 149+.
[4] V. Chaurasia and S. Pal, “Data Mining Techniques: To Predict and Resolve Breast Cancer Survivability”,
IJCSMC, Vol. 3, Issue. 1, January 2014, pg.10 – 22.
[5] Delen, D.; Walker, G.; Kadam, A. Predicting breast cancer survivability: A comparison of three data
mining methods. Artif. Intell. Med. 2005, 34, 113–127.
[6] R. K. Kavitha1, D. D. Rangasamy, “Breast Cancer Survivability Using Adaptive Voting Ensemble
Machine Learning Algorithm Adaboost and CART Algorithm” Volume 3, Special Issue 1, February 2014
[7] P. Sinthia, R. Devi, S. Gayathri and R. Sivasankari, “Breast Cancer detection using PCPCET and
ADEWNN”, CIEEE’ 17, p.63-65
[8] Vikas Chaurasia and S.Pal, “Using Machine Learning Algorithms for Breast Cancer Risk Prediction and
Diagnosis” (FAMS 2016) 83 ( 2016 ) 1064 – 1069
[9] N. Khuriwal, N. Mishra. “A Review on Breast Cancer Diagnosis in Mammography Images Using Deep
Learning Techniques”, (2018), Vol. 1, No. 1.
[10] Y. Khourdifi and M. Bahaj, "Feature Selection with Fast Correlation-Based Filter for Breast Cancer
Prediction and Classification Using Machine Learning Algorithms," 2018 International Symposium on
Advanced Electrical and Communication Technologies (ISAECT), Rabat, Morocco, 2018, pp. 1-6.
[11] R. M. Mohana, R. Delshi Howsalya Devi, Anita Bai, “Lung Cancer Detection using Nearest Neighbour
Classifier”, International Journal of Recent Technology and Engineering (IJRTE), Volume-8, Issue-2S11,
September 2019
[12] Ch. Shravya, K. Pravalika, Shaik Subhani, “Prediction of Breast Cancer Using Supervised Machine
Learning Techniques”, International Journal of Innovative Technology and Exploring Engineering (IJITEE),
Volume-8 Issue-6, April 2019.
[13] Haifeng Wang and Sang Won Yoon, “Breast Cancer Prediction Using Data Mining Method”, Proceedings
of the 2015 Industrial and Systems Engineering Research Conference,
[14] Abdelghani Bellaachia, Erhan Guven, “Predicting Breast Cancer Survivability Using Data Mining
Techniques”

[15] Juhyeon Kim, Hyunjung Shin, Breast cancer survivability prediction using labeled,
unlabeled, and pseudo-labeled patient data, Journal of the American Medical Informatics
Association, Volume 20, Issue 4, July 2013, Pages 613–618.
[16] N. Khuriwal and N. Mishra, "Breast cancer diagnosis using adaptive voting ensemble
machine learning algorithm," 2018 IEEMA Engineer Infinite Conference (eTechNxT),
New Delhi, 2018, pp. 1-5.
[17] M. Amrane, S. Oukid, I. Gagaoua and T. Ensarİ, "Breast cancer classification using
machine learning," 2018 Electric Electronics, Computer Science, Biomedical Engineerings'
Meeting (EBBT), Istanbul, 2018, pp. 1-4.
[18] M. R. Al-Hadidi, A. Alarabeyyat and M. Alhanahnah, "Breast Cancer Detection
Using K-Nearest Neighbor Machine Learning Algorithm," 2016 9th International
Conference on Developments in eSystems Engineering (DeSE), Liverpool, 2016, pp. 35-
39.
[19] Kibeom Jang, Minsoon Kim, Candace A Gilbert, Fiona Simpkins, Tan A Ince, Joyce
M Slingerland “WEGFA activates an epigenetic pathway regulating ovarian cancer
initiating cells” Embo Molecular Medicines Volume 9 Issue 3 (2017)

[20] Joseph A. Cruz and David S. Wishart “Applications of Machine Learning in
cancer prediction and prognosis” Cancer Informatics 2(3):59-77 · February 2007
[21] SA Medjahed, TA Saadi, A Benyettou “Breast cancer diagnosis by using k-nearest
neighbor with different distances and classification rules” International Journal of
Computer Applications 62 (1),

26
27



