LAKSHMI Documentation

The document is a technical seminar report by Volla Lakshmi on 'Automated Android Malware Detection using ML and Deep Learning Algorithms for Cybersecurity,' submitted for a Master's degree in Computer Science and Engineering. It outlines the project's focus on enhancing Android malware detection through machine learning and deep learning techniques, detailing the methodologies, system analysis, and advantages of the proposed approach. The report emphasizes the importance of robust detection mechanisms to combat the increasing sophistication of Android malware threats.


AUTOMATED ANDROID MALWARE DETECTION USING ML AND DEEP
LEARNING ALGORITHMS FOR CYBERSECURITY

A TECHNICAL SEMINAR REPORT

Submitted by

VOLLA LAKSHMI

(22RH1D5807)
Under the Esteemed Guidance of

Dr.C.V.P.R. PRASAD
Professor
In partial fulfillment of the Academic Requirements for the Degree of

MASTER OF TECHNOLOGY

Computer Science and Engineering

MALLA REDDY ENGINEERING COLLEGE FOR WOMEN


(Autonomous Institution - UGC, Govt. of India)
Accredited by NBA & NAAC with ‘A’ Grade, Permanently Affiliated to JNTUH, Hyderabad, Approved
by AICTE - ISO 9001:2015 Certified, NIRF Indian Ranking 2020, Accepted by MHRD Govt. of India
AAA+ Rated by Careers360 Magazine, Top Hundred Rank Band by Outlook, 3rd Rank CSR
Maisammaguda, Dullapally (Post), Secunderabad, TELANGANA
January, 2024

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE

This is to certify that the Technical Seminar Report entitled “AUTOMATED ANDROID
MALWARE DETECTION USING ML AND DEEP LEARNING ALGORITHMS FOR
CYBERSECURITY” is carried out by VOLLA LAKSHMI (22RH1D5807) in partial
fulfillment for the award of the degree of MASTER OF TECHNOLOGY in Computer Science
and Engineering from Malla Reddy Engineering College for Women, Secunderabad during
the academic year 2022-2024.

Supervisor’s Signature Head of the Department, CSE


Dr.C.V.P.R.PRASAD Dr.C.V.P.R.PRASAD
Professor Professor

External Examiner

DECLARATION

I hereby declare that the Technical Seminar Report entitled “AUTOMATED ANDROID
MALWARE DETECTION USING ML AND DEEP LEARNING ALGORITHMS FOR
CYBERSECURITY”, submitted to Malla Reddy Engineering College for Women,
Secunderabad for the award of the Degree of Master of Technology in Computer Science
and Engineering, is a result of original research work done by me.

It is declared that the Technical Seminar Report has not been previously submitted to
any University or Institute for the award of a Degree.

Being submitted by

VOLLA LAKSHMI
(22RH1D5807)
ACKNOWLEDGEMENT

I feel honored and privileged to offer my warm salutations to our college,
Malla Reddy Engineering College for Women, and to the Department of Computer Science and
Engineering, which gave us the opportunity to gain expertise in engineering and profound
technical knowledge.
I would like to deeply thank our Honorable Minister of Telangana State, Sri. Ch.
Malla Reddy Garu, Founder Chairman of MRGI, the largest cluster of institutions in the state
of Telangana, for providing us with all the resources in the college to make our project
a success.
I wish to convey my gratitude to our Principal, Dr. Y. Madhavee Latha, for providing
us with the environment and means to enrich our skills, for motivating us in our endeavour,
and for helping us to realize our full potential.
I express my sincere gratitude to Dr. C.V.P.R. Prasad, Head of the Department of
Computer Science and Engineering, for inspiring us to take up a project on this subject and
successfully guiding us towards its completion.
I would also like to thank, with profound gratitude, our Technical Seminar coordinator,
Mr. G. Bhanu Prasad, for his kind encouragement and overall guidance.
I would like to thank our internal guide Dr. --------------, and all the Faculty
members for their valuable guidance and encouragement towards the completion of our
project work.

With Regards and Gratitude


VOLLA LAKSHMI
(22RH1D5807)
ABSTRACT

Our project focuses on addressing the critical issue of Android malware detection
through the utilization of machine learning (ML) and deep learning models. We begin by
performing extensive data preprocessing on a dataset containing phone logs, aiming to extract
meaningful information. Subsequently, we categorize the data into five distinct classes: benign,
SMS malware, riskware, banking malware, and adware. Through the application of various
machine learning algorithms such as K-Nearest Neighbors (KNN), Logistic Regression, Random
Forest, and Recurrent Neural Networks (RNN), our system effectively identifies and classifies
the malware within Android phones. This project not only aids in pinpointing malicious software
but also provides insights into the prevalence of different malware types within the dataset. Our
approach serves as a valuable tool for enhancing Android security and safeguarding users against
potential threats.
CONTENTS

Abstract
Contents
1. INTRODUCTION
2. SYSTEM ANALYSIS
3. SYSTEM STUDY
4. SYSTEM REQUIREMENTS AND SPECIFICATIONS
5. SYSTEM DESIGN
6. IMPLEMENTATION
7. TESTING
8. OUTPUT SCREENS
9. CONCLUSION
10. FUTURE SCOPE
11. REFERENCES
AUTOMATED ANDROID MALWARE DETECTION USING ML AND DL ALGORITHMS FOR CYBERSECURITY

1. INTRODUCTION
In the rapidly evolving landscape of mobile technology, Android devices have
become integral to our daily lives, serving as gateways to communication,
information, and services. However, the ubiquity of Android smartphones has
attracted the attention of malicious actors who exploit vulnerabilities for nefarious
purposes, leading to the rise of Android malware. The ever-growing sophistication
of these threats necessitates advanced detection mechanisms, and this project aims
to address this critical issue through the application of machine learning (ML) and
deep learning models.

Android malware encompasses a diverse range of malicious software, including
SMS malware, riskware, banking malware, and adware, each posing unique
challenges for detection. Traditional signature-based methods and heuristics have
limitations in effectively identifying these varied threats, prompting the need for
more adaptive and intelligent solutions.

The absence of a robust and scalable system for Android malware detection
hampers user security, leading to potential data breaches, financial losses, and
privacy violations. This project seeks to bridge this gap by leveraging the power of
machine learning and deep learning models to enhance the accuracy and efficiency
of Android malware detection.

The primary objective of this project is to develop a robust Android malware
detection system employing both machine learning (ML) and deep learning models.
Initially, we aim to compile and preprocess extensive phone log datasets, extracting
pertinent information to create a well-structured foundation. The subsequent
categorization of data into five distinct classes (benign, SMS malware, riskware,
banking malware, and adware) will lay the groundwork for effective model
training. Through the implementation of various ML algorithms, such as K-Nearest
Neighbors, Logistic Regression, and Random Forest, we seek to achieve accurate
classification of Android malware. Additionally, the integration of Recurrent Neural
Networks (RNN) will enable us to explore the strengths of deep learning in
identifying intricate patterns within the data.

The project will rigorously evaluate the performance of these models using defined
metrics and validation techniques to ensure robustness and generalization. Insights
into the prevalence and distribution of different malware types within the dataset
will provide valuable understanding. Emphasizing scalability and integration, we
plan to design the system to accommodate growing datasets and seamlessly
integrate with existing Android security frameworks. Ultimately, our project aspires
to contribute to the ongoing efforts in fortifying mobile security and offering users
and professionals an adaptive and efficient tool against evolving Android malware
threats.

2. SYSTEM ANALYSIS

2.1 EXISTING SYSTEM


The existing system employed an ensemble of machine learning (ML) models:
Least Square Support Vector Machine (LS-SVM), Kernel Extreme Learning Machine
(KELM), and Regularized Random Vector Functional Link Neural Network
(RRVFLN).

➢ Least Square Support Vector Machine (LS-SVM):
LS-SVM is a variant of the traditional SVM algorithm. It replaces the inequality
constraints of the standard SVM with equality constraints and a squared-error
loss, so that training reduces to solving a system of linear equations.
➢ Kernel Extreme Learning Machine (KELM):
KELM is an extension of the Extreme Learning Machine (ELM) algorithm that
employs kernel methods to map data into a higher-dimensional space for nonlinear
classification.
➢ Regularized Random Vector Functional Link Neural Network (RRVFLN):
RRVFLN is a neural network architecture that incorporates regularization
techniques to improve generalization.

These models were combined into an ensemble, demonstrating a holistic approach
to malware detection. Furthermore, bias was introduced to the ensemble, possibly
to address class imbalance issues or to fine-tune model predictions. The ensemble
then used a voting scheme to decide collectively whether malware is present.
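The voting step described above can be illustrated with a minimal sketch. It assumes scikit-learn's VotingClassifier with hard (majority) voting. Because scikit-learn does not ship LS-SVM, KELM or RRVFLN, three readily available classifiers stand in for them, and the data is synthetic rather than the dataset used in this report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (hypothetical; not the report's dataset).
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42)

# Stand-in base models: placeholders for LS-SVM, KELM and RRVFLN.
base_models = [
    ("svm", SVC(kernel="rbf", random_state=42)),
    ("ridge", RidgeClassifier()),
    ("logreg", LogisticRegression(max_iter=1000)),
]

# Hard voting: each model casts one vote and the majority label wins.
ensemble = VotingClassifier(estimators=base_models, voting="hard")
ensemble.fit(X_train, y_train)

ensemble_accuracy = ensemble.score(X_test, y_test)
print(f"Ensemble accuracy: {ensemble_accuracy:.4f}")
```

VotingClassifier also accepts a `weights` argument that gives some models a larger say in the vote. Whether the existing system biased its ensemble in this way is not stated, so that remains an assumption.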
DISADVANTAGES
The existing system faced a notable limitation: it could not achieve a satisfactory
level of accuracy in malware detection. The effectiveness of an ensemble depends
on how diverse its base models are; if the three models are very similar in
nature, the ensemble may offer little advantage over any single model. This
shortcoming highlights the complexity of the Android malware detection problem
and underscores the need for innovative approaches to enhance the precision and
robustness of such systems.

2.2 PROPOSED METHODOLOGY

In our proposed system, we evaluated several machine learning (ML) algorithms,
including K-Nearest Neighbors (KNN), Logistic Regression, Random Forest, Naive
Bayes, and AdaBoost. Our objective was to identify the most accurate approach
for Android malware detection.

Random Forest emerged as the standout performer, achieving an accuracy of 94%.
Its strength in capturing intricate patterns within the dataset comes from its
ensemble nature, which aggregates the predictions of many decision trees. This
robust performance highlighted Random Forest as a powerful tool for combating
Android malware.
In pursuit of even greater accuracy, we explored deep learning models,
specifically Recurrent Neural Networks (RNNs). RNNs also achieved accuracy
among the best of the models tested, showing their ability to capture the
temporal dependencies inherent in Android malware logs. This success
underscored the versatility of deep learning approaches for such complex tasks.
Furthermore, the inclusion of AdaBoost played an important role in improving
overall accuracy. AdaBoost combines an ensemble of weak classifiers and
adaptively reweights the data points: in each iteration it increases the weight
of previously misclassified instances, steering the ensemble toward the
harder-to-classify data points. This adaptive approach bolstered the system's
capability to handle intricate Android malware patterns.
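The reweighting idea behind AdaBoost can be shown with one boosting round worked out in NumPy. This is a sketch of the classic two-class update on hypothetical toy labels. scikit-learn's AdaBoostClassifier uses a multi-class generalisation (SAMME), so the numbers below illustrate the principle and are not the exact computation behind the reported results.

```python
import numpy as np

# Hypothetical toy labels in {-1, +1} and one weak classifier's predictions.
y_true = np.array([1, 1, -1, -1, 1, -1, 1, -1])
y_weak = np.array([1, 1, -1, -1, -1, -1, 1, 1])  # samples 4 and 7 are wrong

# Start with uniform sample weights.
weights = np.full(len(y_true), 1 / len(y_true))

# Weighted error of the weak classifier and its vote strength (alpha).
misclassified = y_weak != y_true
error = weights[misclassified].sum()
alpha = 0.5 * np.log((1 - error) / error)

# Raise the weights of misclassified samples, lower the rest, renormalise.
new_weights = weights * np.exp(-alpha * y_true * y_weak)
new_weights /= new_weights.sum()

print(f"error={error:.2f}, alpha={alpha:.4f}")
print("updated weights:", np.round(new_weights, 4))
```

After the update the two misclassified samples together carry half of the total weight, so the next weak classifier is pushed to get them right.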
ADVANTAGES
Overall, our proposed system, with its adept utilization of Random Forest, RNNs,
and the strategic incorporation of AdaBoost, not only achieved remarkable
accuracy but also demonstrated the synergy between different machine learning
approaches. This comprehensive approach holds the promise of significantly
enhancing Android security, offering robust protection against a wide spectrum of
malicious software threats and underscoring the importance of ensemble learning
in tackling real-world cybersecurity challenges.

3. SYSTEM STUDY

➢ ECONOMICAL FEASIBILITY
Our project demonstrates strong economic feasibility, as it offers cost-effective
Android malware detection. Leveraging existing hardware and open-source software
reduces development expenses, while potential savings from preventing
malware-related damages make it a financially viable investment.
A return on investment (ROI) analysis that accounts for ongoing costs and
regulatory compliance should be conducted to determine the project's potential
competitive advantage and payback period, with sensitivity analysis used to assess
the impact of changing assumptions. The decision to proceed should be based on a
positive ROI, alignment with strategic goals, and adaptability to evolving
cybersecurity threats.

➢ TECHNICAL FEASIBILITY
The project exhibits high technical feasibility, given the availability of
well-established machine learning and deep learning frameworks. Accessible datasets
and ample online resources facilitate model development. Furthermore,
advancements in computational power and cloud services enhance the scalability
and implementation of the system.
Assessing technical feasibility involves evaluating data quality, algorithm
suitability, computational resources, feature engineering, scalability, integration,
regulatory compliance, real-time capabilities, maintenance, testing, and deployment.
A successful implementation hinges on addressing these technical challenges and
ensuring effective model training, deployment, and monitoring within the evolving
cybersecurity landscape.

➢ SOCIAL FEASIBILITY
Our project addresses social feasibility by contributing to enhanced Android
security. Protecting user privacy and personal data from malware threats aligns
with societal expectations for safer digital experiences. Public awareness
campaigns can further promote responsible smartphone usage.
Assessing social feasibility involves ensuring user acceptance, addressing ethical
and privacy concerns, promoting transparency and accountability, and managing
public perception. Compliance with regulations, societal impact, collaboration,
accessibility, and continuous improvement are critical factors in building trust and
fostering the acceptance of this technology in the cybersecurity landscape.

4. SYSTEM REQUIREMENTS AND SPECIFICATIONS

4.1 HARDWARE REQUIREMENTS

H/W System Configuration:

➢ Processor - Pentium IV
➢ RAM - 4 GB (min)
➢ Hard Disk - 20 GB
➢ Keyboard - Standard Windows Keyboard
➢ Mouse - Two or Three Button Mouse
➢ Monitor - SVGA

4.2 SOFTWARE REQUIREMENTS

Our project leverages a robust software stack to achieve its objectives, with Python as
the primary programming language.

Python:
Python is a versatile, high-level programming language renowned for its simplicity and
readability. Its extensive library ecosystem and active community support make it an
ideal choice for our project.

The key Python libraries used in the project include:

Scikit-Learn (sklearn):
Scikit-Learn is a powerful machine learning library in Python that simplifies the
implementation of various machine learning algorithms.
Pandas:
Pandas is a data manipulation library that facilitates data preprocessing and analysis.
We rely on Pandas for efficient data handling, including cleaning, transformation, and
feature engineering.

Seaborn:
Seaborn is a Python data visualization library built on top of Matplotlib. It provides a
high-level interface for creating aesthetically pleasing statistical graphics.

Matplotlib:
Matplotlib is a versatile 2D plotting library in Python. We employ Matplotlib to create
various visualizations, including bar charts, line plots, and heatmaps, to convey project
insights and results effectively.

Operating System: Windows

5. SYSTEM DESIGN

5.1 SYSTEM ARCHITECTURE
The architecture for an automated Android malware detection system using ML and
deep learning algorithms comprises the following stages: data collection from
diverse sources; data preprocessing for feature extraction and labeling; selection
and training of ML and DL models; real-time monitoring and alert generation; and
user-friendly interfaces. It also covers scalability, security measures, compliance
with privacy regulations, maintenance, and ongoing improvement through feedback
loops, creating a robust solution that adapts to evolving threats while effectively
safeguarding Android devices.
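The preprocessing and model-training stages of this architecture can be sketched as a single scikit-learn Pipeline. This is a minimal illustration and not the deployed system: the data is synthetic, and the step names, the number of selected features (20) and the forest size (100 trees) are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic five-class data as a placeholder for labelled phone logs.
X, y = make_classification(n_samples=1000, n_features=40, n_informative=12,
                           n_classes=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42)

# Chain feature selection, scaling and classification into one object so
# that every step is fitted on the training data only.
detector = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=20)),
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
])
detector.fit(X_train, y_train)

pipeline_accuracy = detector.score(X_test, y_test)
print(f"Pipeline accuracy: {pipeline_accuracy:.4f}")
```

Bundling the steps this way means new samples pass through the same selection and scaling as the training data when `detector.predict` is called, which helps when the model is later integrated into a monitoring component.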

5.2 UML DIAGRAMS

Class Diagram:
A class diagram is a static diagram. It represents the static view of an application. A class
diagram is used not only for visualizing, describing, and documenting different aspects
of a system but also for constructing executable code of the software application.

[Class diagram: a SYSTEM class with the operations Take Input APIs, Data Preprocessing,
Feature Extraction, Applying the Algorithms, Metrics Evaluation, and Shows the Results,
and a USER class.]

Sequence Diagram:
A sequence diagram is a type of interaction diagram in Unified Modeling Language
(UML) used to visualize the interactions and the order of messages exchanged
between objects or components in a system. It shows how objects or components
collaborate over time to achieve a particular functionality.

[Sequence diagram: messages between SYSTEM and USER, in order: Input APIs,
Data Preprocessing, Feature Extraction, Applying Algorithms, Metrics Evaluation,
Shows the Results.]

Use Case Diagram:
A use case diagram is a graphical depiction of a user's possible interactions with a
system. A use case diagram shows various use cases and the different types of users the
system has, and will often be accompanied by other types of diagrams as well. The
use cases are represented by either circles or ellipses.

5.3 DATA FLOW DIAGRAM

A data flow diagram (DFD) maps out the flow of information for any process or system. It
uses defined symbols like rectangles, circles and arrows, plus short text labels, to show data
inputs, outputs, storage points and the routes between each destination. Data flowcharts
can range from simple, even hand-drawn process overviews, to in-depth, multi-level
DFDs that dig progressively deeper into how the data is handled. They can be used to
analyze an existing system or model a new one. Like all the best diagrams and charts,
a DFD can often visually “say” things that would be hard to explain in words, and they
work for both technical and nontechnical audiences, from developer to CEO.

6. IMPLEMENTATION
6.1 SYSTEM MODULES
Implementing the Android malware detection project involves several key steps, from data
preprocessing to model development and deployment.

➢ Data Collection and Preprocessing:
- Collect a diverse and representative dataset of phone logs.
- Preprocess the data to handle missing values, normalize features, and extract relevant
information for model training.

➢ Feature Extraction:
- Identify and extract features from the phone logs that indicate malware behavior.
- Transform categorical features into a suitable format for machine learning models.

➢ Dataset Labeling:
- Categorize the dataset into the predefined classes: benign, SMS malware, riskware,
banking malware, and adware.

➢ Model Selection:
- Choose machine learning algorithms such as K-Nearest Neighbors, Logistic Regression,
and Random Forest for initial classification.
- Implement a Recurrent Neural Network (RNN) for deep learning-based classification.

➢ Model Training:
- Split the dataset into training and validation sets.
- Train the selected models and validate their performance on the validation set.
- Tune hyperparameters to optimize model performance.

➢ Evaluation Metrics:
- Define and calculate evaluation metrics such as accuracy, precision, recall, and F1 score
to assess the effectiveness of each model.

➢ Deep Learning Model Training:
- Train the Recurrent Neural Network (RNN) using the same training and validation sets
used for the machine learning models.
- Fine-tune hyperparameters for the RNN to achieve optimal performance.

➢ Model Comparison and Analysis:
- Compare the performance of the machine learning models and the RNN.
- Analyze results to understand the strengths and weaknesses of each model in detecting
different types of malware.

By systematically following these steps, the implementation of the Android malware
detection project can yield a reliable and effective solution for enhancing Android security
through machine learning and deep learning models.
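The evaluation metrics named in the steps above can be worked through by hand on a small confusion matrix. The sketch below uses a hypothetical three-class matrix (not results from this project) and computes accuracy together with the class-weighted precision, recall and F1 score, which is the same averaging that scikit-learn applies with average='weighted'.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = actual, columns = predicted.
conf = np.array([
    [50,  3,  2],
    [ 4, 40,  6],
    [ 1,  5, 39],
])

true_positives = np.diag(conf)
support = conf.sum(axis=1)     # number of actual samples per class
predicted = conf.sum(axis=0)   # number of predicted samples per class

accuracy = true_positives.sum() / conf.sum()
precision = true_positives / predicted
recall = true_positives / support
f1 = 2 * precision * recall / (precision + recall)

# Weighted averages use each class's share of the actual samples.
class_share = support / support.sum()
weighted_precision = (precision * class_share).sum()
weighted_recall = (recall * class_share).sum()
weighted_f1 = (f1 * class_share).sum()

print(f"Accuracy:           {accuracy:.4f}")
print(f"Weighted precision: {weighted_precision:.4f}")
print(f"Weighted recall:    {weighted_recall:.4f}")
print(f"Weighted F1 score:  {weighted_f1:.4f}")
```

With this averaging the weighted recall always equals the accuracy, so precision and F1 are the figures that add information when classes are imbalanced.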

6.2 SOURCE CODE

import numpy as np
import pandas as pd

from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')
# Labels
# Adware: 1,253-------1
# Banking: 2,100------2
# SMS malware: 3,904--3
# Riskware: 2,546-----4
# Benign: 1,795-------5
df.head()
df.shape
df.isnull().sum()
label_counts = df['Class'].value_counts()

labels = label_counts.index.tolist()
counts = label_counts.tolist()

plt.bar(labels, counts)
plt.xlabel('Malware')
plt.ylabel('Counts')
plt.title('Malware Distribution in Dataset')
plt.show()
df.columns
df["Class"].unique()
# Data Preprocessing
X = df.drop(columns=['Class'])  # Features
y = df['Class']  # Target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42)

# ANOVA-based feature selection
num_features_to_select = 120
selector = SelectKBest(score_func=f_classif, k=num_features_to_select)
X_train_selected = selector.fit_transform(X_train, y_train)
X_test_selected = selector.transform(X_test)

# Get the indices of selected features
selected_feature_indices = selector.get_support(indices=True)

# Convert selected features to a DataFrame
X_train = X_train.iloc[:, selected_feature_indices]
X_test = X_test.iloc[:, selected_feature_indices]
X_train.head()
X_test.shape
# Standardize Data
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Naive Bayes
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

# Predict labels on the test set
y_pred = nb_model.predict(X_test)

# Calculate Naive Bayes classifier metrics
nb_accuracy = accuracy_score(y_test, y_pred)
nb_precision = precision_score(y_test, y_pred, average='weighted')
nb_recall = recall_score(y_test, y_pred, average='weighted')
nb_f1 = f1_score(y_test, y_pred, average='weighted')

print(f"Naive Bayes Classifier Accuracy: {nb_accuracy:.4f}")
print(f"Naive Bayes Classifier Precision: {nb_precision:.4f}")
print(f"Naive Bayes Classifier Recall: {nb_recall:.4f}")
print(f"Naive Bayes Classifier F1-Score: {nb_f1:.4f}")

# Confusion matrix: rows are actual classes, columns are predicted classes
conf = confusion_matrix(y_test, y_pred)
sns.heatmap(conf, cmap='YlGnBu', fmt='d', annot=True,
            xticklabels=['Adware', 'Banking', 'SMS malware', 'Riskware', 'Benign'],
            yticklabels=['Adware', 'Banking', 'SMS malware', 'Riskware', 'Benign'])
plt.show()
# Random Forest
rfModel = RandomForestClassifier(n_estimators=300, random_state=42)
rfModel.fit(X_train, y_train)

# Predict labels on the test set
y_pred = rfModel.predict(X_test)

# Calculate Random Forest classifier metrics
rf_accuracy = accuracy_score(y_test, y_pred)
rf_precision = precision_score(y_test, y_pred, average='weighted')
rf_recall = recall_score(y_test, y_pred, average='weighted')
rf_f1 = f1_score(y_test, y_pred, average='weighted')

print(f"Random Forest Classifier Accuracy: {rf_accuracy:.4f}")
print(f"Random Forest Classifier Precision: {rf_precision:.4f}")
print(f"Random Forest Classifier Recall: {rf_recall:.4f}")
print(f"Random Forest Classifier F1-Score: {rf_f1:.4f}")

# Confusion matrix: rows are actual classes, columns are predicted classes
conf = confusion_matrix(y_test, y_pred)
sns.heatmap(conf, cmap='YlGnBu', fmt='d', annot=True,
            xticklabels=['Adware', 'Banking', 'SMS malware', 'Riskware', 'Benign'],
            yticklabels=['Adware', 'Banking', 'SMS malware', 'Riskware', 'Benign'])
plt.show()
# Logistic Regression
logmodel = LogisticRegression(max_iter=1000000, random_state=42)
logmodel.fit(X_train, y_train)

# Predict labels on the test set
y_pred = logmodel.predict(X_test)

# Calculate logistic regression metrics
lr_accuracy = accuracy_score(y_test, y_pred)
lr_precision = precision_score(y_test, y_pred, average='weighted')
lr_recall = recall_score(y_test, y_pred, average='weighted')
lr_f1 = f1_score(y_test, y_pred, average='weighted')

print(f"Logistic Regression Accuracy: {lr_accuracy:.4f}")
print(f"Logistic Regression Precision: {lr_precision:.4f}")
print(f"Logistic Regression Recall: {lr_recall:.4f}")
print(f"Logistic Regression F1-Score: {lr_f1:.4f}")

# Confusion matrix: rows are actual classes, columns are predicted classes
conf = confusion_matrix(y_test, y_pred)
sns.heatmap(conf, cmap='YlGnBu', fmt='d', annot=True,
            xticklabels=['Adware', 'Banking', 'SMS malware', 'Riskware', 'Benign'],
            yticklabels=['Adware', 'Banking', 'SMS malware', 'Riskware', 'Benign'])
plt.show()
# SVM
SVmodel = SVC(kernel='linear', C=1.0, random_state=42)
SVmodel.fit(X_train, y_train)

# Predict labels on the test set
y_pred = SVmodel.predict(X_test)

# Calculate SVM classifier metrics
svm_accuracy = accuracy_score(y_test, y_pred)
svm_precision = precision_score(y_test, y_pred, average='weighted')
svm_recall = recall_score(y_test, y_pred, average='weighted')
svm_f1 = f1_score(y_test, y_pred, average='weighted')

print(f"SVM Classifier Accuracy: {svm_accuracy:.4f}")
print(f"SVM Classifier Precision: {svm_precision:.4f}")
print(f"SVM Classifier Recall: {svm_recall:.4f}")
print(f"SVM Classifier F1-Score: {svm_f1:.4f}")

# Confusion matrix: rows are actual classes, columns are predicted classes
conf = confusion_matrix(y_test, y_pred)
sns.heatmap(conf, cmap='YlGnBu', fmt='d', annot=True,
            xticklabels=['Adware', 'Banking', 'SMS malware', 'Riskware', 'Benign'],
            yticklabels=['Adware', 'Banking', 'SMS malware', 'Riskware', 'Benign'])
plt.show()
# AdaBoost
from sklearn.ensemble import AdaBoostClassifier
abc = AdaBoostClassifier(n_estimators=100, random_state=0)
abc.fit(X_train, y_train)

# Predict labels on the test set
y_pred = abc.predict(X_test)

# Calculate AdaBoost classifier metrics
abc_accuracy = accuracy_score(y_test, y_pred)
abc_precision = precision_score(y_test, y_pred, average='weighted')
abc_recall = recall_score(y_test, y_pred, average='weighted')
abc_f1 = f1_score(y_test, y_pred, average='weighted')

print(f"ABC Classifier Accuracy: {abc_accuracy:.4f}")
print(f"ABC Classifier Precision: {abc_precision:.4f}")
print(f"ABC Classifier Recall: {abc_recall:.4f}")
print(f"ABC Classifier F1-Score: {abc_f1:.4f}")

# Confusion matrix: rows are actual classes, columns are predicted classes
conf = confusion_matrix(y_test, y_pred)
sns.heatmap(conf, cmap='YlGnBu', fmt='d', annot=True,
            xticklabels=['Adware', 'Banking', 'SMS malware', 'Riskware', 'Benign'],
            yticklabels=['Adware', 'Banking', 'SMS malware', 'Riskware', 'Benign'])
plt.show()
# KNN
# A range of k values to try
k_values = [2, 3, 5, 7, 9, 11]

best_k = None
best_accuracy = 0

for k in k_values:
    knn_model = KNeighborsClassifier(n_neighbors=k)
    # Use cross-validation to evaluate the model
    scores = cross_val_score(knn_model, X_train, y_train, cv=5, scoring='accuracy')
    mean_accuracy = scores.mean()

    # Keep the k with the highest mean cross-validated accuracy
    if mean_accuracy > best_accuracy:
        best_accuracy = mean_accuracy
        best_k = k

# Train the KNN model with the best k
best_knn_model = KNeighborsClassifier(n_neighbors=best_k)
best_knn_model.fit(X_train, y_train)

# Predict labels on the test set
y_pred = best_knn_model.predict(X_test)

# Calculate KNN classifier metrics
knn_accuracy = accuracy_score(y_test, y_pred)
knn_precision = precision_score(y_test, y_pred, average='weighted')
knn_recall = recall_score(y_test, y_pred, average='weighted')
knn_f1 = f1_score(y_test, y_pred, average='weighted')

print(f"Best k: {best_k}")
print(f"K-Nearest Neighbors Classifier Accuracy: {knn_accuracy:.4f}")
print(f"K-Nearest Neighbors Classifier Precision: {knn_precision:.4f}")
print(f"K-Nearest Neighbors Classifier Recall: {knn_recall:.4f}")
print(f"K-Nearest Neighbors Classifier F1-Score: {knn_f1:.4f}")

# Confusion matrix: rows are actual classes, columns are predicted classes
conf = confusion_matrix(y_test, y_pred)
sns.heatmap(conf, cmap='YlGnBu', fmt='d', annot=True,
            xticklabels=['Adware', 'Banking', 'SMS malware', 'Riskware', 'Benign'],
            yticklabels=['Adware', 'Banking', 'SMS malware', 'Riskware', 'Benign'])
plt.show()
# Combined Result Visualization and Comparison
# Classifiers and their colors
classifiers = ['Logistic Regression', 'SVM', 'Random Forest', 'KNN', 'Naive Bayes']
colors = ['blue', 'orange', 'green', 'red', 'purple']

# Metric variables
accuracy_scores = [lr_accuracy, svm_accuracy, rf_accuracy, knn_accuracy, nb_accuracy]
precision_scores = [lr_precision, svm_precision, rf_precision, knn_precision, nb_precision]
recall_scores = [lr_recall, svm_recall, rf_recall, knn_recall, nb_recall]
f1_scores = [lr_f1, svm_f1, rf_f1, knn_f1, nb_f1]

metrics = ['Accuracy', 'Precision', 'Recall', 'F1 Score']

# Define the data for each metric
data = [accuracy_scores, precision_scores, recall_scores, f1_scores]

fig, ax = plt.subplots(figsize=(10, 6))
x = np.arange(len(metrics))

# Set the width of each bar
width = 0.10

# Plot one bar per classifier within each metric group
for i, classifier in enumerate(classifiers):
    ax.bar(x + i * width, [d[i] for d in data], width, label=classifier, color=colors[i])

# Labels, title, and legend
ax.set_xlabel('Metrics')
ax.set_ylabel('Scores')
ax.set_title('Performance Metrics Comparison for Different Classifiers')
# Center each tick under its group of five bars (offsets 0 to 4 * width)
ax.set_xticks(x + 2 * width)
ax.set_xticklabels(metrics)
ax.legend()

# Show the plot
plt.show()
# Classifiers and metrics
classifiers = ['Logistic Regression', 'SVM', 'Random Forest', 'KNN', 'Naive Bayes']
metrics = ['Accuracy', 'Precision', 'Recall', 'F1-Score']

# Metric scores
accuracy_scores = [lr_accuracy, svm_accuracy, rf_accuracy, knn_accuracy, nb_accuracy]
precision_scores = [lr_precision, svm_precision, rf_precision, knn_precision, nb_precision]
recall_scores = [lr_recall, svm_recall, rf_recall, knn_recall, nb_recall]
f1_scores = [lr_f1, svm_f1, rf_f1, knn_f1, nb_f1]

# Dictionary mapping each metric name to its list of scores
metric_scores = {
    'Accuracy': accuracy_scores,
    'Precision': precision_scores,
    'Recall': recall_scores,
    'F1-Score': f1_scores
}

# DataFrame from the dictionary: one row per classifier, one column per metric
df = pd.DataFrame(metric_scores, index=classifiers)

# Heatmap of the metric table
plt.figure(figsize=(10, 6))
sns.set(font_scale=1.2)
sns.heatmap(df, annot=True, cmap='YlGnBu', fmt='.4f')
plt.title("Model Evaluation Metrics Heatmap")
plt.xticks(rotation=45)
plt.tight_layout()

plt.show()
df  # display the table as notebook cell output
# RNN
# As written, the model is a stack of fully connected (Dense) layers;
# it contains no recurrent layer.
from keras.models import Sequential
import keras
import keras.backend as kb
import tensorflow as tf

model = keras.Sequential([
    keras.layers.Dense(32, activation=tf.nn.relu, input_shape=[120]),  # 120 input features
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(5)  # one raw output score per class
])
optimizer = tf.keras.optimizers.RMSprop(0.0099)  # learning rate
model.compile(loss='mean_squared_error', optimizer=optimizer)
history = model.fit(X_train, y_train, epochs=500)
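The final Dense(5) layer in the listing above emits five raw scores (logits) per sample, one per class. As a minimal sketch, assuming the class order used for the heatmap tick labels and made-up logit values, the following plain-Python example shows how such scores can be turned into class probabilities with softmax, into a predicted label with argmax, and into a cross-entropy loss for one sample. The function names are illustrative and are not part of the project code.

```python
import math

# Class order assumed to match the heatmap tick labels in this report.
CLASS_NAMES = ["Adware", "Banking", "SMS malware", "Riskware", "Benign"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(logits):
    """Return the index of the highest-scoring class (argmax)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def cross_entropy(logits, true_index):
    """Negative log-probability that the model assigns to the true class."""
    return -math.log(softmax(logits)[true_index])


# Hypothetical network output for one sample (values are made up).
example_logits = [0.2, -1.0, 0.5, 3.1, 0.0]
probs = softmax(example_logits)
predicted = predict_class(example_logits)
print(CLASS_NAMES[predicted], round(probs[predicted], 3))
print("loss if the true class is Riskware:", round(cross_entropy(example_logits, 3), 3))
```

In Keras, the built-in loss that corresponds to this calculation for integer labels is SparseCategoricalCrossentropy(from_logits=True). Whether it applies here depends on how y_train is encoded, which the listing does not show.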

7. TESTING

The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, subassemblies, assemblies, and/or a finished product. It is the
process of exercising software with the intent of ensuring that the software system meets its
requirements and user expectations and does not fail in an unacceptable manner. There are
various types of tests, and each test type addresses a specific testing requirement.

TYPES OF TESTS

• Unit testing
Unit testing involves the design of test cases that validate that the internal program logic is
functioning properly and that program inputs produce valid outputs. All decision branches
and internal code flow should be validated. It is the testing of individual software units of
the application. Unit tests ensure that each unique path of a business process performs
accurately to the documented specifications and contains clearly defined inputs and
expected results.
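As an illustration of a unit test for this project, the sketch below isolates the k-selection logic of the KNN listing into a small helper, select_best_k, and checks each decision branch with plain assertions. The helper and the fake scoring functions are hypothetical; they stand in for the cross-validation call so that the logic can be tested without a dataset.

```python
def select_best_k(k_values, score_fn):
    """Return (best_k, best_score) for the k with the highest score.

    score_fn is any callable that maps k to a mean accuracy. In the real
    pipeline it would wrap cross-validation (an assumption of this sketch).
    """
    best_k, best_score = None, float("-inf")
    for k in k_values:
        score = score_fn(k)
        if score > best_score:  # strict '>' keeps the first k on ties
            best_k, best_score = k, score
    return best_k, best_score


def test_picks_highest_score():
    fake_scores = {2: 0.80, 3: 0.85, 5: 0.91, 7: 0.88}
    assert select_best_k(fake_scores, fake_scores.get) == (5, 0.91)


def test_tie_keeps_first_k():
    assert select_best_k([3, 5], lambda k: 0.9) == (3, 0.9)


def test_empty_input_returns_none():
    assert select_best_k([], lambda k: 1.0) == (None, float("-inf"))


# Run every test; an AssertionError would signal a failing unit.
for test in (test_picks_highest_score, test_tie_keeps_first_k,
             test_empty_input_returns_none):
    test()
print("all unit tests passed")
```

Passing the scoring function in as a parameter is a design choice made for this sketch: it keeps the unit under test free of any dependency on the classifier or the data.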

• Integration testing
Integration tests are designed to test integrated software components to determine if they
actually run as one program. Testing is event driven and is more concerned with the basic
outcome of screens or fields. Integration tests demonstrate that although the components
were individually satisfactory, as shown by successful unit testing, the combination of
components is also correct and consistent.

• Functional test
Functional tests provide systematic demonstrations that the functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.
Functional testing is centered on the following items: Valid Input, Invalid Input, Functions,
Output, and System Procedures.
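The valid-input and invalid-input cases listed above can be sketched for this project as a check on the feature vector handed to a classifier. The example assumes 120 features per sample, matching input_shape=[120] in the model listing. The function validate_feature_vector is a hypothetical helper, not part of the project code.

```python
EXPECTED_FEATURES = 120  # matches input_shape=[120] in the model listing


def validate_feature_vector(vector, expected_len=EXPECTED_FEATURES):
    """Check one sample before it is passed to a classifier.

    Returns the vector as a list of floats for valid input and raises
    ValueError for invalid input.
    """
    if len(vector) != expected_len:
        raise ValueError(f"expected {expected_len} features, got {len(vector)}")
    cleaned = []
    for value in vector:
        if not isinstance(value, (int, float)):
            raise ValueError(f"non-numeric feature value: {value!r}")
        cleaned.append(float(value))
    return cleaned


def run_functional_checks():
    """Exercise one valid and two invalid inputs; return a results dict."""
    results = {"valid": len(validate_feature_vector([0] * 120)) == 120}
    invalid_cases = [("too_short", [0] * 119), ("non_numeric", ["x"] * 120)]
    for name, bad_input in invalid_cases:
        try:
            validate_feature_vector(bad_input)
            results[name] = False  # invalid input was wrongly accepted
        except ValueError:
            results[name] = True  # invalid input was rejected as expected
    return results


print(run_functional_checks())
```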

• System Test
System testing ensures that the entire integrated software system meets requirements. It tests
a configuration to ensure known and predictable results. An example of system testing is the
configuration-oriented system integration test. System testing is based on process
descriptions and flows, emphasizing pre-driven process links and integration points.

• White Box Testing
White Box Testing is testing in which the software tester has knowledge of the inner
workings, structure, and language of the software, or at least its purpose. It is used to test
areas that cannot be reached from a black box level.

• Black Box Testing
Black Box Testing is testing the software without any knowledge of the inner workings,
structure, or language of the module being tested. Black box tests, like most other kinds of
tests, must be written from a definitive source document, such as a specification or
requirements document. It is testing in which the software under test is treated as a black
box: you cannot "see" into it. The test provides inputs and responds to outputs without
considering how the software works.

• Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional
requirements.

Test Results: All the test cases mentioned above passed successfully. No defects encountered.

8. OUTPUT SCREENS

The figure below shows how the samples in the dataset are distributed across the malware
classes, by count.

Naïve Bayes:

The figure above shows the metric values of the Naïve Bayes classifier, evaluated over the
classes Adware, Banking, SMS malware, Riskware, and Benign.
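The precision and recall values reported for each classifier are computed in the listings with average='weighted', that is, each class's score is weighted by the number of true samples of that class. As a minimal sketch of that calculation, the plain-Python example below uses a tiny made-up set of labels with only three of the classes. The helper names are illustrative, and the numbers are not results from this project.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred, labels):
    """Return {label: (precision, recall)} computed from two label lists."""
    stats = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        actual = sum(1 for t in y_true if t == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        stats[label] = (precision, recall)
    return stats


def weighted_average(stats, y_true, index):
    """Average one metric (0 = precision, 1 = recall), weighted by class support."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(stats[label][index] * support[label] / total for label in stats)


# Made-up labels for six samples.
labels = ["Adware", "Banking", "Benign"]
y_true = ["Adware", "Adware", "Banking", "Benign", "Benign", "Benign"]
y_pred = ["Adware", "Banking", "Banking", "Benign", "Benign", "Adware"]

stats = per_class_precision_recall(y_true, y_pred, labels)
weighted_precision = weighted_average(stats, y_true, 0)
weighted_recall = weighted_average(stats, y_true, 1)
print(stats)
print(round(weighted_precision, 4), round(weighted_recall, 4))
```

With this weighting, the weighted recall always equals the overall accuracy, which is why those two values coincide for every classifier.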

Random Forest:

The figure above shows the metric values of the Random Forest classifier, evaluated over the
classes Adware, Banking, SMS malware, Riskware, and Benign. This classifier has the highest
accuracy of all the models used in this project.

Logistic Regression:

The figure above shows the metric values of the Logistic Regression classifier, evaluated over
the classes Adware, Banking, SMS malware, Riskware, and Benign.

SVM:

The figure above shows the metric values of the Support Vector Machine classifier, evaluated
over the classes Adware, Banking, SMS malware, Riskware, and Benign.

AdaBoost (ABC) Classifier:

The figure above shows the metric values of the ABC classifier, evaluated over the classes
Adware, Banking, SMS malware, Riskware, and Benign. ABC here refers to the AdaBoost
classifier (AdaBoostClassifier in the code), a boosting ensemble that combines many weak
learners.

K-Nearest Neighbour:

The figure above shows the metric values of the k-nearest neighbour classifier, evaluated over
the classes Adware, Banking, SMS malware, Riskware, and Benign.

Overall Performance Metrics:

The figure above shows the performance metrics of all the classifiers together, so that the
models can be compared directly. A grouped bar graph is used as the visualization for easier
understanding.

9. CONCLUSION

In conclusion, our proposed project advances Android malware detection by combining
machine learning and deep learning models. In our experiments, Random Forest reached an
accuracy of 94% and the Recurrent Neural Network (RNN) model consistently reached 90% in
classifying and identifying malicious software on Android devices.

The advantages of our system extend beyond these accuracy figures. It offers a defense
mechanism for Android users that safeguards their personal data, privacy, and overall digital
experience. By accurately detecting malware, our system mitigates the risks associated with
malicious software, such as data theft, financial fraud, and compromised device performance.

Furthermore, the accuracy of our RNN model can potentially improve with repeated runs,
which leaves room for ongoing optimization. This adaptability matters because malware
threats keep evolving, and it helps the system remain resilient against new and emerging
challenges.

In a world increasingly reliant on mobile technology, our project not only enhances Android
device security but also contributes to a safer digital environment for individuals and
organizations alike. The combination of machine learning and deep learning models positions
our system as a valuable asset in the fight against Android malware, offering peace of mind
and protection to users in an interconnected world.

10. FUTURE SCOPE

These hurdles arise at various stages of our work and may be gradually rectified in the work
to be undertaken in the future. Features declared mostly on the device are more durable than
features specific to the applications, and can therefore usually be used to automate malware
detection. The range of Android parameters to process is rather large, and malware is
difficult to detect properly if the features are not extracted properly. The number of apps is
still increasing fast. Malware apps can potentially be identified in combination with methods
based on AI or machine learning, such as deep learning, which make the detection more
sophisticated and make it easier to identify malware and regulate the app prediction rate.

Application behaviors in the malware ecosystem encourage non-emerging threats.


Our study doesn't incorporate the rider analysis or behaviors of repackaged malware. The
study simply uses the reverse-engineered APK files, passes the given context to Androguard,
and extracts features as binary vectors. This is, however, a major issue and a key challenge
with the advancement of Android malware. Performing differential or effective analysis on
reverse-engineered applications, and determining the effects of these applications and their
results, will be our advanced project.
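The binary feature vectors mentioned above can be sketched as follows. The example assumes that the list of features of one app, here Android permissions, has already been extracted from the APK (for instance with a tool such as Androguard, whose API is not shown here). The four-entry vocabulary and the sample app are hypothetical; a real vocabulary would be built from the whole dataset.

```python
# Hypothetical feature vocabulary; each position in the vector is one feature.
FEATURE_VOCAB = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.RECEIVE_BOOT_COMPLETED",
]


def to_binary_vector(app_features, vocab=FEATURE_VOCAB):
    """Encode an app as 1/0 flags: 1 if the app has the feature, else 0."""
    present = set(app_features)
    return [1 if name in present else 0 for name in vocab]


# Made-up app: two features are in the vocabulary, one is not.
sample_app = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.CAMERA",  # outside the vocabulary, so it is dropped
]
vector = to_binary_vector(sample_app)
print(vector)
```

A feature that is missing from the vocabulary leaves no trace in the vector, which is why a fixed vocabulary cannot represent features that only appear in later apps.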

Over time, applications introduce new features with enhanced malware abilities, which is
why we would have to upgrade the system whenever the model's false positive rate (FPR)
increases after execution. The simplest explanation for why the model degrades on evolved
features is that our datasets are binary matrices extracted from features that are currently
implemented in these applications, not features that will be present in evolved apps in the
coming years. With new features, we would have to reverse-engineer the apps and extract
those features to form an updated dataset, and then train the classifiers on it again.
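The retraining trigger described above can be sketched as a simple check on the false positive rate. The example treats every non-benign prediction for a benign app as a false positive. The baseline value, the tolerance of 0.02, and the labels are made up for illustration, and the helper names are hypothetical.

```python
def false_positive_rate(y_true, y_pred, benign_label="Benign"):
    """FPR = benign apps flagged as malware / all benign apps."""
    benign_total = sum(1 for t in y_true if t == benign_label)
    false_positives = sum(
        1 for t, p in zip(y_true, y_pred)
        if t == benign_label and p != benign_label
    )
    return false_positives / benign_total if benign_total else 0.0


def needs_retraining(baseline_fpr, current_fpr, tolerance=0.02):
    """Flag the model when the FPR drifts above the baseline plus a tolerance."""
    return current_fpr > baseline_fpr + tolerance


# Made-up labels for a batch of newly collected apps.
y_true = ["Benign", "Benign", "Benign", "Benign", "Adware", "Adware"]
y_pred = ["Benign", "Benign", "Benign", "Adware", "Adware", "Benign"]

current_fpr = false_positive_rate(y_true, y_pred)
print(current_fpr, needs_retraining(baseline_fpr=0.05, current_fpr=current_fpr))
```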

11. REFERENCES

[1] H. Rathore, A. Nandanwar, S. K. Sahay and M. Sewak, "Adversarial superiority in
Android malware detection: Lessons from reinforcement learning based evasion
attacks and defenses", Forensic Sci. Int. Digit. Invest., vol. 44, Mar. 2023.

[2] H. Wang, W. Zhang and H. He, "You are what the permissions told me! Android
malware detection based on hybrid tactics", J. Inf. Secur. Appl., vol. 66, May 2022.

[3] A. Albakri, F. Alhayan, N. Alturki, S. Ahamed and S. Shamsudheen,
"Metaheuristics with deep learning model for cybersecurity and Android malware
detection and classification", Appl. Sci., vol. 13, no. 4, pp. 2172, Feb. 2023.

[4] M. Ibrahim, B. Issa and M. B. Jasser, "A method for automatic Android malware
detection based on static analysis and deep learning", IEEE Access, vol. 10, pp.
117334-117352, 2022.

[5] L. Hammood, İ. A. Doğru and K. Kılıç, "Machine learning-based adaptive genetic
algorithm for Android malware detection in auto-driving vehicles", Appl. Sci., vol. 13,
no. 9, pp. 5403, Apr. 2023.

[6] P. Bhat and K. Dutta, "A multi-tiered feature selection model for Android malware
detection based on feature discrimination and information gain", J. King Saud Univ.-
Comput. Inf. Sci., vol. 34, no. 10, pp. 9464-9477, Nov. 2022.

[7] D. Wang, T. Chen, Z. Zhang and N. Zhang, "A survey of Android malware
detection based on deep learning", Proc. Int. Conf. Mach. Learn. Cyber Secur.,
pp. 228-242, 2023.

[8] Y. Zhao, L. Li, H. Wang, H. Cai, T. F. Bissyandé, J. Klein, et al., "On the impact of
sample duplication in machine-learning-based Android malware detection", ACM
Trans. Softw. Eng. Methodol., vol. 30, no. 3, pp. 1-38, Jul. 2021.

[9] E. C. Bayazit, O. K. Sahingoz and B. Dogan, "Deep learning based malware
detection for Android systems: A comparative analysis", Tehnički vjesnik, vol. 30, no. 3,
pp. 787-796, 2023.

[10] H.-J. Zhu, W. Gu, L.-M. Wang, Z.-C. Xu and V. S. Sheng, "Android malware
detection based on multi-head squeeze-and-excitation residual network", Expert Syst.
Appl., vol. 212, Feb. 2023.
