Paper 06
Research Article
Detection of Breast Cancer Using Histopathological Image
Classification Dataset with Deep Learning Techniques
V. K. Reshma,1 Nancy Arya,2 Sayed Sayeed Ahmad,3 Ihab Wattar,4 Sreenivas Mekala,5 Shubham Joshi,6 and Daniel Krah7
1 Department of Artificial Intelligence and Machine Learning, Hindustan College of Engineering and Technology, Coimbatore, India
2 Department of Computer Science and Engineering, Shree Guru Gobind Singh Tricentenary University, Gurugram, India
3 College of Engineering and Computing, Al Ghurair University, Dubai, UAE
4 Department of Electrical Engineering and Computer Science, Cleveland State University, USA
5 Department of Information Technology, Sreenidhi Institute of Science and Technology, Hyderabad, Telangana, India
6 Department of Computer Engineering, SVKM'S NMIMS MPSTME Shirpur, Maharashtra 425405, India
7 Tamale Technical University, Ghana
Received 4 December 2021; Revised 2 January 2022; Accepted 7 February 2022; Published 2 March 2022
Copyright © 2022 V. K. Reshma et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Cancer is one of the leading causes of mortality. It arises when cells in the body grow abnormally, as in the case of breast cancer, and it has become a major threat to health and wellbeing for people all around the world. Breast cancer is one of the major causes of death among women all over the globe, and it is particularly prevalent in the United States. Breast cancer can be diagnosed using a variety of imaging modalities, including mammography, computerized tomography (CT), magnetic resonance imaging (MRI), ultrasound, and biopsy, among others. A histopathology study (biopsy) is often performed to analyze the image, which assists in the diagnosis of breast cancer. The goal of this study is to develop improved strategies for the various CAD phases that will play a critical role in minimizing the variability between and among observers. An automatic segmentation approach based on the Fourier transform was developed, followed by self-driven post-processing operations, to improve the performance of the CAD system. Compared with existing techniques, the proposed segmentation technique has several advantages: it incorporates spatial information, it requires no initial parameters to be set beforehand, it is independent of magnification, it automatically determines the inputs for the morphological operations that enhance the segmented images so that pathologists can analyze them with greater clarity, and it is fast. Extensive tests were conducted to determine the most effective feature extraction techniques and to investigate how textural, morphological, and graph characteristics affect classification accuracy. In addition, a classification strategy for breast cancer detection has been developed that is based on weighted feature selection and uses an improved Genetic Algorithm in conjunction with a Convolutional Neural Network classifier. The practical application of the proposed segmentation and classification algorithms in the CAD framework may reduce the number of incorrect diagnoses and increase classification accuracy. It may therefore serve as a second-opinion tool for pathologists and aid in the early detection of disease.
[Figure 1: Phases of the CAD system: image acquisition, pre-processing, segmentation, classification.]
tumours or cancers, which are masses or lumps of cells that have grown uncontrollably and outgrown their original environment. Lung cancer, liver cancer, colorectal cancer, stomach cancer, and breast cancer are the most frequent kinds of cancer [1].
Histopathology is the microscopic inspection and detailed evaluation of a biopsy sample, performed by an expert pathologist, to study cancer growth in the organs [2]. A histological tissue slide is prepared before the pathologist's microscopic inspection of the tissue sample. Typical histopathological specimens consist of a large number of cells and structures that are haphazardly surrounded and dispersed by a variety of different kinds of tissues. The manual interpretation of histological images, as well as their visual observation, takes time and requires years of experience and expertise. Computer-assisted image analysis is a promising approach to increasing the analytical and predictive capacity of histopathology images. It also contributes to the efficiency of histopathologists by offering a dependable second opinion for consistent analysis, which increases their productivity. This may help to shorten the time it takes to reach a diagnosis. As a result, the mortality rate may be reduced, and the burden on pathologists may be reduced. The essential phases of the CAD system are covered briefly below and are shown in Figure 1.
1.1. Pre-Processing. During the pre-processing step, the original data obtained by sensors is transformed into a structure from which the most relevant aspects connected to the domain are recognized for subsequent analysis [3]. The primary goal of this stage is to remove background noise from the input image to improve its overall quality. The results may differ depending on a variety of factors, including inconsistent conditions during tissue preparation, image acquisition, and staining, among others. These variations in image quality can have a major influence on the algorithms used for image segmentation, feature extraction, and classification in the later stages of the process. To effectively determine the region of interest (ROI) in histopathological images, appropriate pre-processing techniques are needed. For example, digital cameras or scanners (sensors) may collect histopathological images at various magnification levels, and these images can then be further processed using pre-processing methods such as color conversion, finalization, reconstruction, and so on to provide the final images.
1.2. Segmentation. It is common practice in the medical domain to segment images to better determine the ROI [4]. Segmentation divides an image into non-overlapping homogeneous parts. It distinguishes the objects of interest from the background by using methods such as clustering, edge-based and region-based methods, thresholding, region growing, and other similar approaches. The segmentation method is chosen based on the features that will be extracted.
1.3. Feature Extraction. The characteristics obtained by feature extraction methods are distinguishing features that are not affected by irrelevant changes to the input [5]. Following the image segmentation stage, features are extracted either at the tissue level or at the cellular level to quantify differences. The most commonly extracted characteristics are intensity, fractal, textural, and morphological features. To extract such properties at the cellular level, the specific positions of the cells must be known ahead of time. On the other hand, fractal, topological, and textural properties may be extracted and used to quantify changes at the tissue level. The characteristics extracted from breast cancer tissues using the feature extraction method may then be used to classify the tissues.
1.4. Classification. Based on the existing training dataset, classification is used to determine to which of a set of categories a new instance belongs [6]. It is necessary to utilize multiple classifiers to divide tissues into different groups
BioMed Research International 3
depending on the kind of breast cancer or the grade. Regions and cells in an image are classified into one of these classes, which include benign and malignant tissues and cells. Histopathological images can be classified using a variety of approaches, including K-Nearest Neighbor (KNN), fuzzy systems, neural networks, logistic regression, and others.
1.5. Contributions and Problem Definition. Female breast cancer has the highest fatality rate when compared to other cancer forms, and this is especially true for young women. In 2012, 8.2 million people died globally as a result of cancer, a figure that rose to 8.8 million in 2015 [7]. Breast cancer, in particular, was responsible for 5.7 lakh fatalities globally in 2015. Between 2005 and 2015, the number of cancer cases climbed by 33% worldwide.
There are 0.3 million deaths due to cancer each year in India. The number of new cases of breast cancer reported in India in 2016 was around 1.5 lakh [8]. The growth in the number of cancer patients in India over the previous decade has led researchers to predict a further increase in the number of cancer patients by 2025. Breast cancer is now ranked as the fifth most lethal cancer among all forms of malignancies, although it is the leading cause of mortality among women under the age of 50. Early identification and increased public awareness may drastically lower the death rate in a given community. The probability of a full recovery is increased when the disease is detected early, which gives a more favourable prognosis. Consequently, precise approaches are needed to increase early detection and reduce the fatality rate from breast cancer. Despite recent advances in our knowledge of the molecular biology of breast cancer, as well as the innovations that have resulted, histopathological analysis continues to be the most widely utilized modality for the diagnosis of breast cancer [9, 10]. The present pathological diagnosis is based on the subjective judgement of the pathologist. It is a time-consuming process that requires a high level of specialization and experience from the pathologist, and it is also affected by factors such as the pathologist's fatigue and workload pressure. Recently, the use of computerized image analysis and machine learning algorithms has made it possible to perform digital histopathology on human tissue samples. In the previous decade, digital pathology has evolved from the practice of using microscopes equipped with cameras to the digital scanning of complete tissue samples, which is now the standard procedure. In recent years, factors like a significant increase in available processing power, lower-cost storage devices [11], and significant advancements in image analysis techniques have helped to bring computer-aided diagnosis (CAD) systems into the everyday routine of pathology labs. These technologies have been developed for disease detection, diagnosis, and prognosis [10], to complement the judgement of the human expert, that is, the pathologist. The automated analysis of biomedical data provided by such a CAD system may assist the pathologist in making an early or in-time diagnosis. There is a constant need for CAD systems/frameworks that reduce the burden on pathologists by isolating and filtering out the evidently benign areas, to aid in the early identification of breast cancer and to decrease the death rate associated with the disease [12].
Researchers must assess the model, approach, or framework that they have established in order to make informed decisions. Segmentation and classifier performance may be evaluated using a variety of metrics, and these can be used to build a validated framework. To evaluate the system, it is important to use two datasets, one for training and one for testing. To prevent the memorization issue, the system must be evaluated on a separate dataset, known as the test dataset, which is different from the dataset used for training. True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) counts may be used to assess the efficacy of a segmentation and classification approach. TP refers to the number of people who were predicted to have the disease and who do have it. TN refers to the number of people who were predicted to be free of the disease and who are free of it. FP refers to the number of people who were predicted to have the disease but who do not have it. FN refers to the number of people who were predicted to be healthy but who have the disease. The standard error of the mean (SEM) is the standard deviation of the whole data divided by the square root of n, where n is the total number of images.
The paper is organized as follows: Section 2 consists of a literature survey and problem definition from existing works, Section 3 describes the methodology of the proposed work, Section 4 presents the experimental analysis, and Section 5 comprises the conclusion and future work.
2. Literature Survey
Wavelet features and gray-level statistical features are used in conjunction with multilayer feed-forward neural networks for the automated detection and classification of clustered microcalcifications [13]. The identification of microcalcifications was carried out in two stages in this study. The first stage segments suspected microcalcification pixels from the original image; the second stage recognises and classifies individual microcalcifications using wavelet features, gray-level statistical features, and neural networks to further refine the classification. The authors obtained a true identification rate of 90 per cent, and the output was validated against the Nijmegen database.
On the other hand, [14] has presented a technique for the automated identification and classification of microcalcifications. Individual microcalcification objects were classified based on texture features, shape features, and scalar area features, all of which were used in conjunction with one another. The learning vector quantization procedure is used to build the feature vector template, and then
the Fisher discriminant criterion is used to choose the features from the feature vector template. Then a multilayer feed-forward backpropagation neural network classifier is used to classify the microcalcification objects as benign, malignant, or false objects. The validity of this study is checked using the DDSM database and diagnostic digital mammogram carotenoid that were taken into consideration for review. This method shows that the efficiency of the microcalcification detection system may reach up to 90%.
2.1. Classification Using Mammogram Images. Sheeraz Un Nazir et al. (2014) [15] suggested a technique for segmenting medical images using multifractal analysis, which they call "fractal segmentation". Monica Jenifer et al. (2014) [16] explore tumor segmentation and classification in detail. The approaches used in that study, namely image enhancement, segmentation, feature extraction, and classification, are used to resolve the two challenges in question. Following pre-processing, a modified watershed segmentation technique is used to segment the image, and an SVM classifier is used to classify the resulting segments. The work is tested using the MIAS database and images collected from the Apollo facility, and it achieves an accuracy of 98 per cent. Narayan et al. (2011) [17] discuss different current approaches for image pre-processing, including several algorithms, as well as the pros and cons of the various pre-processing approaches.
Sampa et al. (2011) [18] developed a technique to detect breast masses in mammography images. Pre-processing procedures are used to highlight the internal anatomy of the breast by removing background noise and objects from the image. The shape parameters of the breast region are then extracted and used as inputs to an SVM classifier, which categorises the breast area as mass or non-mass. According to the authors, this system has 80 per cent sensitivity, a false-positive rate of 0.84 per image, and a false-negative rate of 0.2 per image, as well as an area under the ROC curve of 0.87.
Meenakshi Sundaram et al. (2014) [19] employed a data mining approach to categorize mammograms as normal, benign, or malignant. They extracted six intensity histogram features: the mean, the variance, the skewness, the kurtosis, the entropy, and the energy. They presented a fuzzy association rule mining approach for classification. A total of 300 MIAS mammograms were used for testing, and the system achieved an average accuracy of 95 per cent as measured with precision and recall metrics. Zhang et al. (2009) [20] provided a review of current advancements in the deployment and development of computer-aided detection systems for the diagnosis of breast cancer. Ranga Yan and colleagues (2009) [21] provide and evaluate an overview of the currently available CAD systems. A strategy for breast tumor classification has been described by Sugandha et al. (2009) [22]. After the image has been pre-processed, a texture- and shape-based feature extraction method is used to extract texture and shape features. Following feature extraction, they used a genetic algorithm to find the optimal set of features that would result in the highest classification accuracy.
Kamal et al. [23] have described a process for LS-SVM classifier-based breast cancer detection, and the performance of this classifier has been tested in terms of k-fold cross-validation, sensitivity, specificity, and the confusion matrix. Different approaches, such as the wavelet-based approach presented by Biking Li and Zheng Dong (2006) [24], the fuzzy logic-based strategy, and others, have been proposed for the detection of lesions in mammograms [25, 26]. It has been found that the wavelet-based technique performs much better in the analysis of mammography images. A decision support system has been designed using multi-objective genetic and neural network algorithms to classify tumors and detect the stages of cancer [27]. The system also identifies the degrees of cancer. The first-order statistical features, spatial gray-level dependence features, surrounding region dependence features, gray-level run length features, and gray-level difference features have all been taken into consideration. The creation of a CAD system enabled the classification of mammograms into three types (fatty, glandular, and dense tissue) using an SVM classifier.
The statistical features are extracted from the mammography image, and the system is evaluated using the Mini-MIAS database, which is available online. For the evaluation, DDSM database images are used. Noise is removed from the images using morphological approaches, whitewalls, and the procedures known as opening by reconstruction and closing by reconstruction. The Otsu technique is then used to segment the denoised images. The mean, variance, standard deviation, and entropy are the features that were extracted. A total of 100 images were obtained from the DDSM database, and the CAD system generated results with 100 per cent accuracy and recall for both benign and malignant tumors. They employed 584 mammography images from the DDSM and achieved an area under the receiver operating characteristic curve (AUC) of 0.97.
Nabha et al. (2013) [28] investigate a CAD system in which they extract features such as Hu moments, central moments, and Haralick moments. A combined kernel-based SVM classifier is used for classification in this application.
Many methodologies have been surveyed to provide an overview of breast cancer detection and classification methods that are relevant to this study. The current relevant work has shown that the application of new methodologies is essential to identify and classify breast cancer tissue more efficiently and precisely [29]. Even though numerous algorithms are available at every stage of the design process, new multi-resolution and multidirectional transformations have been proposed for mammogram image decomposition,
3. Methodology
This section describes an automated segmentation approach, followed by self-driven post-processing steps that are based on the segmentation results. The suggested approach may be broken down into three basic phases: i) Pre-processing: smoothing the image; ii) Automated segmentation: thresholding to properly identify the area of interest; and iii) Classification: predicting the stage of cancer [31]. The flowchart for the proposed work is shown in Figure 2.
3.1. Pre-Processing. A large number of variations may arise during image capture and slide processing. Because the image is captured in a compressed format, the brightness of the background in the resulting histopathology image is not always consistent with the foreground. Recognition of the nuclei of malignant cells in histological images is essential to accurately segment diseased cells in histopathological images [32]. To get greater overall visual separation between the cell nuclei (target region) and the surrounding area (intercellular matter), some preparatory processing is required to smooth the pixels and boost contrast.
Conversion of the histopathological image to a grayscale image: The acquired histopathological breast image is in RGB (color) format. Cell nuclei and other components can be difficult to detect when images are shown in the RGB format. Pathologists must diagnose cancer by focusing on the identification of cell nuclei, which is a difficult task [33]. As a result, the RGB histopathological image is transformed into a grayscale image, as seen in Figure 3.
Figure 3: Pre-processing image.
Smoothing the pixels with the median filter: Median filtering is a nonlinear approach to removing noise that may be applied to any image. It is extensively used because it is very successful at reducing noise while maintaining the edges of objects. It does this by traversing the image pixel by pixel, replacing each value with the median value of the pixels in its immediate surroundings [34]. Windows are patterns of neighbouring pixels that travel across the entire image at a rate of 2 pixels per pixel. The median is determined by first sorting all the pixel values from the window into numerical order, and then replacing the pixel value being evaluated with the pixel value in the middle (the median). For example, let Ip be the gray value of the image at pixel location (x, y). For each pixel in the input image Ip, the filter generates an output pixel that contains the median value in the 3 × 3 neighborhood surrounding the corresponding input pixel, as defined in equation (1).

$$H_{jk} = \operatorname{median}\{\, xy_{j},\; xy_{k} + 1 \,\} \qquad (1)$$

The working of the 2D median filter using a 3 × 3 sampling window is shown in Figure 4.
For simplicity, consider Z to be a matrix of sorted 3 × 3 window pixels with 59 intensity values. The median of Z is now equal to one. As a result, the value of the corresponding center pixel in the output image shown in Figure 4 will be replaced by the value of 1. The output pixel matrix thus contains, for each pixel, the median value of the 3 × 3 neighborhood surrounding the corresponding pixel in the input image. The benefit of applying a median filter is that it produces a more robust average that is not considerably affected by the presence of an unrepresentative pixel in the surrounding area. As a result, it is extensively used to minimize the amount of noise in images.
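The two pre-processing steps described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `rgb_to_gray` and `median_filter_3x3` are hypothetical, the luminance weights used for the grayscale conversion are an assumption (the text does not state which conversion is used), and image borders are handled by replicating the edge pixels, which is also an assumption.

```python
from statistics import median


def rgb_to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples to gray values.

    The 0.299/0.587/0.114 luminance weights are an assumption; the text
    does not state which RGB-to-gray conversion is used.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def median_filter_3x3(image):
    """Replace each pixel with the median of its 3 x 3 neighbourhood."""
    rows, cols = len(image), len(image[0])

    def pixel(r, c):
        # Clamp coordinates so border pixels are replicated (assumption).
        r = min(max(r, 0), rows - 1)
        c = min(max(c, 0), cols - 1)
        return image[r][c]

    return [
        [median(pixel(r + dr, c + dc)
                for dr in (-1, 0, 1) for dc in (-1, 0, 1))
         for c in range(cols)]
        for r in range(rows)
    ]


# A flat gray patch with a single bright outlier pixel.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
smoothed = median_filter_3x3(noisy)
```

In this toy patch every 3 × 3 window contains at most one outlier among nine values, so the median suppresses it completely, which illustrates why the filter is robust to unrepresentative pixels.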
[Figure panels: (a) MALIGNANT, (b) BENIGN.]
Where M1,N represents the size of the variable and x, y represents a continuous variable. The discrete wavelet transform (DWT) can be implemented using the Fast Wavelet Transform (FWT). In the FWT algorithm, iterated filter banks are used to construct multistage structures for calculating the DWT coefficients at two or more successive scales. The FWT method is also known as Mallat's algorithm or Mallat-tree decomposition. In most cases, images are represented as 2D matrices, and their analysis is carried out using the 2D wavelet transform. A 2D separable transform is nothing more than two 1D transforms applied in sequence: the 1D transform is first applied along the rows, and the 1D column transform is then applied to the output of the row transform.
This wavelet transform is computed on the image by applying a filter bank to it. The low pass filter is represented by f(a), and the high pass filter is represented by H(a). A low pass image LP (approximation image) and three detail images HL, LH, and HH are formed as a consequence of processing the image rows and columns independently and downsampling each direction by a factor of two.
and extracting visual qualities or information from them. With feature extraction, the goal is to improve the overall performance of the classification and prediction task. This depends on the number of levels and directions that are employed in the decomposition of the images, as well as the number of sub-bands created. The images in this work are decomposed using multiple levels and diverse orientations. The decomposition levels range from 2 to 5, and the number of directions may be anywhere between 2 and 64. The output of the decomposition process is a set of shearlet coefficients. Specific features are derived from these coefficients. The shearlet coefficients are used in this technique as well, and the same four first-order statistical features, namely mean, variance, skewness, and kurtosis, are derived from them. All of these features are combined to produce feature vectors, which are then used as one of the inputs for the classification process.
Mean: The mean is the average grey level of all pixels in the image. It is determined with the help of Equation (11) as
[Figure: flowchart with the blocks genetic code generation, initial population, fitness calculation, selection, random selection, crossover, mutation, population, and training, using Dataset1, Dataset2, and Dataset3, with classification by model1, model2, and model3.]
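As a hedged illustration of the four first-order statistics used as features in this section (mean, variance, skewness, and kurtosis), the sketch below computes them for a small 2D array of coefficients. The function name `first_order_statistics` and the sample values are hypothetical. The kurtosis subtracts 3, matching the form of Equation (13); the population (divide-by-n) variance and the handling of a constant array are assumptions.

```python
from math import sqrt


def first_order_statistics(coefficients):
    """Return (mean, variance, skewness, kurtosis) of a 2D array of values.

    Kurtosis is the excess kurtosis (fourth standardized moment minus 3).
    """
    values = [v for row in coefficients for v in row]
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # population variance
    sigma = sqrt(variance)
    if sigma == 0:
        # Constant array: higher moments are undefined; 0.0 is returned
        # here as a convention (assumption).
        return mean, variance, 0.0, 0.0
    skewness = sum(((v - mean) / sigma) ** 3 for v in values) / n
    kurtosis = sum(((v - mean) / sigma) ** 4 for v in values) / n - 3
    return mean, variance, skewness, kurtosis


# Hypothetical 2 x 2 block of sub-band coefficients.
sub_band = [[1.0, 2.0], [3.0, 4.0]]
mean, variance, skewness, kurtosis = first_order_statistics(sub_band)
```

Concatenating these four values over all sub-bands would give one feature vector per image, in the spirit of the description above.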
the image are shorter and thinner, and the peak is more concave. It is determined with the help of Equation (13) as

$$l_D = \frac{1}{SD} \sum_{j=1}^{S} \sum_{k=1}^{D} \left[ \frac{u_D(j,k) - \mu_t}{\sigma_t} \right]^{4} - 3 \qquad (13)$$

σt represents the standard deviation of the image matrix.
Feature extraction is carried out on the image that is acquired using the hierarchical template matching approach. This step is important for obtaining the properties of the ROI. Feature extraction is one of the important steps in the diagnosis of malignant tumors. Here, an extraction method called SURF (Speeded Up Robust Features) is employed. It is a scale- and rotation-invariant detector and descriptor. Employing SURF yields distinct interest points. Feature detection is carried out on the ROI as well as on the discovered interest points. This helps to lower the false positive rate. The feature detector is based on a Hessian matrix. Each suspicious interest point should have a unique description, called a descriptor, that does not depend on the scale and rotation of the feature. The SURF descriptor is based on Haar wavelet responses. Feature detection and description are carried out on the integral image. This feature extraction approach yields a collection of descriptors.
The Speeded Up Robust Features (SURF) method is employed for many vision tasks as well as for object identification [6]. SURF falls under the category of feature descriptors since it extracts key-points from various parts of a given image, and as a result, it is useful in determining similarity between two different images. The method begins by identifying features/key-points that are likely to be seen in a variety of images of a similarly shaped object. If at all feasible, such features should be scale- and rotation-invariant. SURF is based on multi-scale space theory, and its feature detector is based on the Hessian matrix. The Hessian matrix is often used because it has excellent performance and precision. For a given point x = (x, y) in the image, the Hessian matrix H(x) in x at scale.
3.4. Classification. Typically, a mammography image carries a great deal of information in a variety of formats. The use of such varied information increases the dimension and computation of feature vectors, which in turn lowers the classification accuracy of the system. To prevent this, only consistent features are considered for classification, with duplicate and unnecessary information being eliminated. Feature selection is the process of picking a limited number of features from the numerous features in an initial feature set. The subset is generated by deleting the items from the initial feature list that are deemed unnecessary or redundant. This chosen feature subset is provided as one of the inputs to the classifier, and the breast mammography image is then classified as either normal or malignant. In this study, statistical texture features are collected from each decomposed ROI and used to create a texture map. Statistical texture features are the most important and are frequently used in medical image analysis applications. GA is applied to determine the best feature subset from the statistical texture features. GA is used in this study to pick the optimum
characteristics with the aid of the tournament selection technique, with a tournament size of two participants. Population size, population type, and the number of generations are assigned as inputs, with the population type set to bit string. The operations of uniform mutation and arithmetic crossover are then carried out, with a mutation probability of 0.10 and a crossover probability of 0.8. Figure 6 represents the flowchart of the proposed work.
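The GA settings described above (bit-string individuals, tournament selection of size two, mutation probability 0.10, crossover probability 0.8) can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation. All function names are hypothetical. `toy_fitness` is a made-up stand-in for classifier accuracy. The number of features, population size, and number of generations are placeholder values, since the text does not give them. Arithmetic crossover is approximated by a per-gene blend rounded back to bits, which for binary masks behaves like uniform crossover. Keeping the best mask in every generation (elitism) is an added assumption.

```python
import random


def tournament_select(population, fitnesses, rng, size=2):
    """Return the fittest of `size` randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), size)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]


def crossover(parent_a, parent_b, rng, probability=0.8):
    """Blend two bit-string parents gene by gene, rounding back to bits.

    Stand-in for arithmetic crossover on bit strings (assumption).
    """
    if rng.random() >= probability:
        return parent_a[:]
    child = []
    for a, b in zip(parent_a, parent_b):
        alpha = rng.random()  # per-gene blending weight
        child.append(round(alpha * a + (1 - alpha) * b))
    return child


def mutate(individual, rng, probability=0.10):
    """Flip each bit independently with the given probability."""
    return [1 - bit if rng.random() < probability else bit
            for bit in individual]


def toy_fitness(mask):
    """Hypothetical fitness: reward the first four features and apply a
    small penalty for every selected feature."""
    return sum(mask[:4]) - 0.1 * sum(mask)


def select_features(n_features=12, population_size=20, generations=30, seed=0):
    """Evolve a 0/1 feature mask; sizes here are placeholder values."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(population_size)]
    best = max(population, key=toy_fitness)
    for _ in range(generations):
        fitnesses = [toy_fitness(ind) for ind in population]
        offspring = [best[:]]  # elitism (assumption): keep the best mask
        while len(offspring) < population_size:
            a = tournament_select(population, fitnesses, rng)
            b = tournament_select(population, fitnesses, rng)
            offspring.append(mutate(crossover(a, b, rng), rng))
        population = offspring
        best = max(population, key=toy_fitness)
    return best


best_mask = select_features()
```

In a real pipeline, `toy_fitness` would be replaced by a measure such as the validation accuracy of the classifier trained on the features selected by the mask.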
Figure 8: Accuracy of Proposed Work (training and validation accuracy; vertical axis from 0.79 to 0.85, horizontal axis from 1.0 to 5.0).
Information gain, gain ratio, and Gini index are examples of feature selection measures that can lead to overfitting. The genetic algorithm, on the other hand, is a nature-inspired algorithm that provides stochastic optimization. GAs use probabilistic transition rules rather than deterministic transition rules. Because the genetic algorithm is a stochastic optimization approach, the genes of individuals are generally chosen at random when the algorithm is run. Genetic algorithms operate on a population of individuals and are designed to provide progressively better approximate results. They employ mechanisms including selection, crossover, and mutation
to arrive at the best possible outcomes. When there are a large number of features, genetic algorithms are generally more effective than standard feature selection methods in determining subsets of variables. In performance comparisons with other feature selection techniques, the genetic algorithm outperforms the competition. It is capable of handling data sets with many features, and genetic algorithms are readily parallelized, which further accelerates the feature selection process.
ImageNet is used as a benchmark for visual recognition tasks. Example CNNs include AlexNet, ZFNet, VGGNet, GoogLeNet, and MsResNet, to name a few. Improved performance may be achieved by the use of Rectified Linear Units (ReLUs) and their derivatives. McCulloch (a neuroscientist) and Pitts created a very simple computational model of the neuron. When all the inputs are properly categorized, the Perceptron Learning Algorithm reaches convergence. Perceptron is capable of dealing with outliers. In the Artificial Neural Network, a minimal number of layers are employed. Deep feed-forward artificial neural networks are used to analyze visual images, and a CNN may contain hundreds of hidden layers. Deep learning is becoming more popular in three key areas of application: detection, prediction, and generation. Due to the data dimension constraint of the Artificial Neural Network, high-level feature extraction has been performed using a 2D CNN on the LIDC dataset. The first layer of the CNN is a convolutional layer with filter size 20 and stride size 1, followed by a max-pooling layer with size 2 x 2 and stride size 1, followed by a convolutional layer with filter size 20 and stride size 1. The data is extracted by the use of a filter, also called a receptive field or kernel. Pooling functions are used to minimize the spatial size of the representation in
Technique for extracting features   Output classification     Accuracy   Precision   Recall   F-measure   G-mean
SURF                                K-nearest Neighbourhood   66.12      63.12       71.44    63.75       61.91
SURF                                Naïve Bayes               78.48      74.44       73.12    73.36       76.68
SURF                                Discrete transform        82.11      83.52       81.13    82.72       82.11
SURF                                Support vector machine    86.12      85.83       86.78    81.37       82.75
SURF                                Proposed                  89.13      86.23       81.47    85.38       85.17
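The five measures reported in the table above can be computed from the counts of a binary confusion matrix. The following Python sketch assumes the standard definitions. In particular, it takes the G-mean to be the geometric mean of sensitivity (recall) and specificity, which is an assumption because the definition is not restated here. The counts in the example are made up for illustration and are not results from this study.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Compute five common measures from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)             # assumed definition
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }

# Hypothetical counts: 45 true positives, 5 false positives,
# 15 false negatives, 35 true negatives.
metrics = classification_metrics(tp=45, fp=5, fn=15, tn=35)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}")
```

Values averaged over several runs or folds, as in the table, need not satisfy these identities exactly, because an average of per-run F-measures is not the F-measure of the averaged precision and recall.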
a progressive manner. Similarly to the first and second layers, the third layer is a convolutional layer with filter size 32 and a stride size of 1. For the most part, the output size of a CNN layer may be calculated using the formula
$$\text{layer size} = (n - f + 1) \times (n - f + 1) \qquad (14)$$
The image size is represented by n × n. The filter has a size of f × f pixels. Except for the fifth layer, which contains a filter with a size of 32, the first six layers are arranged alternately as convolution layer 1, max-pooling layer 1, convolution layer 2, max-pooling layer 2, convolution layer 3, and max-pooling layer 3, except for the fifth layer, which contains a filter with a size of 16. The seventh layer follows the convolution layer 3 and max-pooling layer 3 pattern: it is another convolutional layer, with filter size 4 x 4 x 32. The activation layer makes use of the ReLU (Rectified Linear Unit), and the eighth layer is another convolutional layer with a filter size of 4 x 4 x 32, which is the largest available size. In the last layer, a softmax operator is utilized. The parameters of the filters in the convolutional layers must therefore be compatible with the size of the max-pooling operators so that the relevant computations can be performed. After forward propagation through the 9 layers, each input picture of size 50 x 50 x 1 leads to an output of size 1 x 1 x 2.
A CNN is made of convolutional layers, each of which is characterized by the following: an input image I, a bank of filters K with dimensions k1 × k2, height h, width w, and biases b. The result of this convolution is
$$(I * K)_{x,y} = \sum_{i=1}^{h} \sum_{j=1}^{w} K_{ij} \, I_{x+i-1,\,y+j-1} + b \qquad (15)$$
The parameters filter size, stride, and zero-padding are important in the behaviour of a CNN. The size of the output feature map depends on these parameters. The formula to find the dimensionality of feature vectors in a CNN is
$$\frac{W - F + 2P}{S} + 1 \qquad (16)$$
where W is the width or height of the image, F denotes the filter size, P denotes padding, and S denotes stride. In this case, the input image size is 48 x 48 pixels, the filter size is 40 pixels, the padding value is zero, and the stride is one, which gives an output size of (48 − 40 + 0)/1 + 1 = 9.
Table 2: Algorithmic flow of the proposed work.
(1) Input image
(2) Generate the scale space
(3) Use non-maximal suppression to initially determine the feature points and then accurately locate the feature points
(4) Use the improved FT algorithm to find all salient regions in the image
(5) Calculate the proportional weights of feature points outside the significant region
(6) Extract the SURF descriptor of the selected key point
4. Experimental Analysis
The experimental assessment was carried out using images from the large-scale Breast Cancer Histopathological Database (BreaKHis), which contains histological images of breast cancer. The BreaKHis dataset has a total of 7909 pictures. Tumours are classified into two superclasses: benign and malignant. There are four subclasses of tumours within each superclass. Adenosis is a benign tumour that may develop into a tumour of any kind. Figure 7 shows examples of pictures from the dataset at a 400x magnification factor. The subclasses represent different forms of breast tumours, each of which is known to have a different prognosis and therapy.
Matlab 2019b was used to implement all of the experiments. The best balance between runtime and recognition rate was obtained by employing 500 key points for the feature extraction approach. Features were extracted from these 500 key points, and a single feature vector representing the average over all key points was formed. The feature vector was then passed to each classifier for classification, and the performance of feature extraction combined with classification was compared. Figure 8 shows the training and validation accuracy of the proposed work. The experimental findings were assessed in terms of five distinct
[Figure 9: bar chart with a vertical axis from 0 to 100; categories: Accuracy rate, Precision rate]
performance metrics: accuracy, precision, recall, F-measure, and G-mean. Each performance measure is discussed in detail in this part, together with the results of the feature extraction methods and the variations of classification approaches. The SURF feature extraction approach was used in conjunction with each classifier. The average findings are shown in Table 1. The SURF feature extraction approach is well known for its low dimensionality and for producing a large number of interest points in both texture and geometrical structures that are contrasted. The proposed approach worked best when used in conjunction with SURF. Because the suggested approach integrates several machine learning approaches into a single model, variance and bias are reduced, and classification accuracy is increased.
Table 1: The following table compares the average results (in per cent) of classification approaches when used in conjunction with the SURF feature extraction methodology.
SURF is on par with or even better than previously suggested algorithms, while also being considerably quicker to compute. This is accomplished by computing an integral picture from the original picture. For each pixel in an integral picture, the value of that pixel is equal to the sum of all the grey values of all the points in the rectangular area that extends from the origin to this point. The algorithmic flow of the proposed work is represented in Table 2.
Table 3 compares the proposed work with the feature extraction technique. However, the performance of the SURF and GLCM feature extraction approaches combined with KNN classification was the worst because KNN is sensitive to the chosen value of K. Improved performance requires finding the best value of K, which may prove to be a time-consuming procedure.
Figure 9 represents the performance metrics of the proposed work. As a result, the SURF approach was shown to be more efficient than other feature extraction strategies in terms of obtaining useful features. The proposed classification technique successfully classified the SURF features and achieved the highest accuracy of 84 per cent and the highest precision value of 83.12 per cent.
5. Conclusion
Based on weighted feature selection, an improved Genetic Algorithm with Convolutional Neural Network was developed and tested, and it was shown to be effective in resolving difficulties that occurred at each iteration of the learning process during the selection of samples. The relief approach was used to pick an aggregation of the best textural, graph, and morphological characteristics. The improved proposed classifier used these combined characteristics as input, and it performed well. The present strategies for classifying breast cancer using histopathological pictures pick suitable
samples only based on the parameters of the SVM classifier, which is a statistical learning algorithm. As a consequence, they do not properly take into consideration the low-density area of feature space or the inadequate initial training set when picking prospective samples, resulting in a high likelihood of selecting incorrect potential samples. As a result, the categorization system’s performance deteriorated. The enhanced proposed method has been used to increase the accuracy of breast cancer classification by dealing with the condition of low-density regions and including the cluster assumption feature of patterns to choose correct prospective samples. An experimental comparison with current classification approaches on four classification performance criteria was performed to evaluate the efficacy of the proposed classification methodology. On a conventional benchmark dataset, the suggested classification approach produced better results than the existing technique.
Data Availability
The data that support the findings of this study are available on request from the corresponding author.
Conflicts of Interest
The author(s) declare(s) that they have no conflicts of interest.
Acknowledgments
No funding is available for this research work.
References
[1] A. M. Barbosa and F. Martel, “Targeting glucose transporters for breast cancer therapy: the effect of natural and synthetic compounds,” Cancers, vol. 12, no. 1, p. 154, 2020.
[2] H. Ghayumizadeh, O. Pakdelazar, J. Haddadnia, R. G. Rezai, and Z. M. Mohammad, Diagnosing Breast Cancer with the Aid of Fuzzy Logic Based on Data Mining of a Genetic Algorithm in Infrared Images, 2012.
[3] M. K. Ahirwar, P. K. Shukla, and R. Singhai, “CBO-IE: A Data Mining Approach for Healthcare IoT Dataset Using Chaotic Biogeography-Based Optimization and Information Entropy,” Scientific Programming, vol. 2021, Article ID 8715668, 2021.
[4] S. Misra, S. Sharma, A. Agarwal et al., “Cell cycle-dependent regulation of the bi-directional overlapping promoter of human BRCA2/ZAR2 genes in breast cancer cells,” Molecular Cancer, vol. 9, no. 1, pp. 1–19, 2010.
[5] S. Stalin, V. Roy, P. K. Shukla et al., “A Machine Learning-Based Big EEG Data Artifact Detection and Wavelet-Based Removal: An Empirical Approach,” Mathematical Problems in Engineering, vol. 2021, Article ID 2942808, 2021.
[6] E. L. Mead, A. Z. Doorenbos, S. H. Javid et al., “Shared decision-making for cancer care among racial and ethnic minorities: a systematic review,” American Journal of Public Health, vol. 103, no. 12, pp. e15–e29, 2013.
[7] WHO, World Health Organization Cancer Fact Sheet, 2018.
[8] V. Roy, S. Shukla, P. K. Shukla, and P. Rawat, “Gaussian Elimination-Based Novel Canonical Correlation Analysis Method for EEG Motion Artifact Removal,” Journal of Healthcare Engineering, vol. 2017, Article ID 9674712, 2017.
[9] J. R. Harris, M. E. Lippman, U. Veronesi, and W. Willett, “Breast cancer,” New England Journal of Medicine, vol. 327, no. 5, pp. 319–328, 1992.
[10] L. Hoffman-Goetz, D. Apter, W. Demark-Wahnefried, M. I. Goran, A. McTiernan, and M. E. Reichman, “Possible mechanisms mediating an association between physical activity and breast cancer,” Cancer: Interdisciplinary International Journal of the American Cancer Society, vol. 83, no. S3, pp. 621–628, 1998.
[11] A. S. Rajawat, P. Bedi, S. B. Goyal et al., “Securing 5G-IoT Device Connectivity and Coverage Using Boltzmann Machine Keys Generation,” Mathematical Problems in Engineering, vol. 2021, Article ID 2330049, 2021.
[12] R. Krishnamoorthi, S. Joshi, H. Z. Almarzouki et al., “A Novel Diabetes Healthcare Disease Prediction Framework Using Machine Learning Techniques,” Journal of Healthcare Engineering, vol. 2022, Article ID 1684017, 2022.
[13] C. S. Healey, A. M. Dunning, M. D. Teare et al., “A common variant in BRCA2 is associated with both breast cancer risk and prenatal viability,” Nature Genetics, vol. 26, no. 3, pp. 362–364, 2000.
[14] A. M. Wrobel and E. Ł. Gregoraszczuk, “Action of methyl-, propyl- and butylparaben on GPR30 gene and protein expression, cAMP levels and activation of ERK1/2 and PI3K/Akt signaling pathways in MCF-7 breast cancer cells and MCF-10A non-transformed breast epithelial cells,” Toxicology Letters, vol. 238, no. 2, pp. 110–116, 2015.
[15] S. U. Nazir, R. Kumar, A. Singh et al., “Breast cancer invasion and progression by MMP-9 through Ets-1 transcription factor,” Gene, vol. 711, article 143952, 2019.
[16] B. M. Jenefer and V. Cyrilraj, “An efficient image processing methods for mammogram breast cancer detection,” Journal of Theoretical & Applied Information Technology, vol. 69, no. 1, 2014.
[17] H. K. Narayan, B. Finkelman, B. French et al., “Detailed echocardiographic phenotyping in breast cancer patients: associations with ejection fraction decline, recovery, and heart failure symptoms over 3 years of follow-up,” Circulation, vol. 135, no. 15, pp. 1397–1412, 2017.
[18] S. Misra, S. Jeon, R. Managuli et al., “Ensemble Transfer Learning of Elastography and B-mode Breast Ultrasound Images,” 2021, arXiv preprint arXiv:2102.08567.
[19] S. K. Lim, H. Tabatabaeian, S. Y. Lu et al., “Hippo/MST blocks breast cancer by downregulating WBP2 oncogene expression via miRNA processor Dicer,” Cell Death & Disease, vol. 11, no. 8, pp. 1–15, 2020.
[20] Y. F. Zhang, Y. Yu, W. Z. Song et al., “miR-410-3p suppresses breast cancer progression by targeting snail,” Oncology Reports, vol. 36, no. 1, pp. 480–486, 2016.
[21] K. Yu, L. Tan, L. Lin, X. Cheng, Z. Yi, and T. Sato, “Deep-learning-empowered breast cancer auxiliary diagnosis for 5GB remote E-health,” IEEE Wireless Communications, vol. 28, no. 3, pp. 54–61, 2021.
[22] H. Ahmed and A. Haseeb, “LMS based adaptive algorithm for breast cancer detection using mammogram images,” American Scientific Research Journal for Engineering, Technology, and Sciences (ASRJETS), vol. 43, no. 1, pp. 169–177, 2018.