Detection of Spyware by Mining Executable Files

The document discusses a method for detecting spyware by mining executable files. It extracts features from binary files, performs feature reduction, and applies machine learning algorithms such as Naive Bayes and SVM to the reduced feature sets to classify files as spyware or legitimate software.

Uploaded by swamishailu
Objectives
The main objective of our project is to establish a spyware detection method based
on data mining techniques. These techniques are commonly used for information
retrieval and text classification; the only change in our application is that they are
applied to computer programs rather than text documents.
In this project, binary features are extracted from executable files. A feature
reduction method is then used to obtain a subset of data which is further used as a
training set for automatically generating classifiers. In this method, the generated
classifiers are used to classify new, previously unseen binaries as either legitimate
software or spyware. We will determine a value of n (the n-gram size) that yields high
performance, and select a machine learning algorithm that produces high accuracy.

Project idea
The goal of the project is to detect spyware by using data mining and machine
learning. We use the Waikato Environment for Knowledge Analysis (WEKA) to perform
the experiments. WEKA is a suite of machine learning algorithms and analysis tools,
which is used in practice for solving data mining problems. First, we extract features
from the binary files and we then apply a feature reduction method in order to reduce data
set complexity. Finally, we convert the reduced feature set into the Attribute Relation File
Format (ARFF). ARFF files are ASCII text files that include a set of data instances, each
described by a set of features. Figure 2.1 shows the steps involved in our proposed
method.

Figure 2.1: Proposed System

We organized our work into the following stages:

1. Data Collection
2. Byte Sequence Generation
3. N-gram Generation
4. Feature Extraction
5. Feature Reduction
6. ARFF Generation
7. Model Training

Step 1: Data Collection


Our data set consists of two classes of binary files:


(1) Benign files
(2) Spyware files

Step 2: Byte Sequence Generation


This process converts every binary file in each class into a byte sequence.
We use xxd, a UNIX utility that produces a hexadecimal dump of a file, for the conversion.
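As an illustration of this step, the Java sketch below reads a file and emits its bytes as one continuous lowercase hexadecimal string, roughly what `xxd -p` prints once its line breaks are removed. The class `HexDump` and its methods are hypothetical; the project itself uses the xxd utility, not this code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper: converts a binary file into a continuous
 * lowercase hex string (similar to `xxd -p` without line breaks).
 */
final class HexDump {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Encodes each byte as two lowercase hex digits. */
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(HEX[(b >> 4) & 0x0f]); // high nibble
            sb.append(HEX[b & 0x0f]);        // low nibble
        }
        return sb.toString();
    }

    /** Reads the whole file and returns its hex byte sequence. */
    static String fileToHex(Path file) throws IOException {
        return toHex(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java HexDump path/to/program.exe
        System.out.println(fileToHex(Paths.get(args[0])));
    }
}
```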

Step 3: N-gram Generation


This process splits the byte sequences into n-grams of a chosen size n (namely 4, 5
and 6). An n-gram is a sequence of n consecutive elements. The process also ensures that
each line contains exactly one n-gram, so the length of every line equals n.
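A minimal Java sketch of n-gram generation follows. It assumes that n counts hexadecimal characters (matching the statement that each line is n characters long) and adds a hypothetical `step` parameter, because the text does not say whether n-grams overlap: `step = 1` gives a sliding window, while `step = n` gives non-overlapping pieces.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical n-gram generator over a hex byte sequence. */
final class NGrams {
    /**
     * Returns every n-character substring starting at positions 0, step, 2*step, ...
     * A trailing fragment shorter than n is dropped so that every n-gram has length n.
     */
    static List<String> generate(String hex, int n, int step) {
        if (n <= 0 || step <= 0) {
            throw new IllegalArgumentException("n and step must be positive");
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= hex.length(); i += step) {
            grams.add(hex.substring(i, i + n));
        }
        return grams;
    }
}
```

Writing each element of the returned list on its own line, for example with `grams.forEach(System.out::println)`, gives the one-n-gram-per-line layout described above.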

Step 4: Feature Extraction


We extract the features by using two different approaches: Common Feature Based
Extraction (CFBE) and Frequency Based Feature Extraction (FBFE). Both methods are
used to obtain Reduced Feature Sets (RFSs) which are then used to generate the Attribute
Relation File Format (ARFF) files.
1. Frequency Based Feature Extraction (FBFE):
In FBFE, the frequency of each n-gram in each class is calculated.
2. Common Feature Based Extraction (CFBE):
In CFBE, the common n-grams are extracted from each class.
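Frequency counting for FBFE can be sketched in Java as below. The class `Fbfe` is hypothetical, and it assumes that the frequency of an n-gram means its total number of occurrences across all files of a class; the text does not say whether repeated occurrences within one file are counted.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical FBFE step: per-class n-gram frequencies. */
final class Fbfe {
    /**
     * @param filesInClass one list of n-grams per file, all from the same class
     * @return map from n-gram to its total number of occurrences in the class
     */
    static Map<String, Integer> classFrequencies(List<List<String>> filesInClass) {
        Map<String, Integer> freq = new HashMap<>();
        for (List<String> fileNgrams : filesInClass) {
            for (String gram : fileNgrams) {
                freq.merge(gram, 1, Integer::sum); // increment the count for this n-gram
            }
        }
        return freq;
    }
}
```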

Step 5: Feature Reduction


In FBFE, all n-grams within a specified frequency range (50-500) are extracted,
and those with lower frequencies (1-49) are discarded. In CFBE, only one representation
of each feature is considered within a class. To obtain the Reduced Feature Sets (RFSs) for
CFBE and FBFE, the unique n-grams of both classes are merged.
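The reduction rules can be sketched in Java as follows; all names are hypothetical. `byFrequency` takes a map from n-gram to its frequency within one class and keeps the n-grams whose frequency lies in an inclusive range such as 50 to 500, so it also drops counts above the upper bound, which the stated range implies but the text does not spell out. `distinct` keeps one representation of each n-gram per class for CFBE, and `merge` unites the two classes into one sorted Reduced Feature Set.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical feature reduction helpers for FBFE and CFBE. */
final class FeatureReduction {
    /** FBFE: keep n-grams whose class frequency lies in [min, max], e.g. [50, 500]. */
    static Set<String> byFrequency(Map<String, Integer> freq, int min, int max) {
        Set<String> kept = new HashSet<>();
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            int f = e.getValue();
            if (f >= min && f <= max) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    /** CFBE: keep only one representation of each n-gram within a class. */
    static Set<String> distinct(List<List<String>> filesInClass) {
        Set<String> unique = new HashSet<>();
        for (List<String> fileNgrams : filesInClass) {
            unique.addAll(fileNgrams);
        }
        return unique;
    }

    /** Merges the unique n-grams of both classes into one Reduced Feature Set. */
    static Set<String> merge(Collection<String> benign, Collection<String> spyware) {
        Set<String> rfs = new TreeSet<>(benign); // sorted, giving a stable attribute order
        rfs.addAll(spyware);
        return rfs;
    }
}
```

Calling `byFrequency(freq, 50, 500)` for each class and passing both results to `merge` gives the FBFE feature set; passing the two `distinct` sets to `merge` gives the CFBE one.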

Step 6: ARFF Generation (Data Set Generation)


This process generates two ARFF databases: a frequency based feature database
and a common feature based database. All attributes in the databases are treated as Boolean
attributes. The ARFF process searches for every n-gram in all byte sequences of a class and
assigns the attribute a value of 1 or 0, depending on whether the n-gram is present or
not.
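An ARFF writer for these Boolean attributes might look like the Java sketch below. It follows the standard ARFF layout (`@relation`, `@attribute`, `@data`), encodes each Boolean attribute as the nominal set `{0,1}`, and writes one row per file. Two details are assumptions: attribute names get a hypothetical `ng_` prefix because ARFF attribute names should begin with a letter while hex n-grams may begin with a digit, and the class labels are taken to be `benign` and `spyware`.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical ARFF generator: one instance per file, one 0/1 attribute per n-gram. */
final class ArffWriter {
    static String build(String relation, List<String> features,
                        List<Set<String>> fileNgrams, List<String> labels) {
        if (fileNgrams.size() != labels.size()) {
            throw new IllegalArgumentException("one label per file is required");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String f : features) {
            sb.append("@attribute ng_").append(f).append(" {0,1}\n");
        }
        sb.append("@attribute class {benign,spyware}\n\n@data\n");
        for (int i = 0; i < fileNgrams.size(); i++) {
            Set<String> present = fileNgrams.get(i);
            for (String f : features) {
                // 1 if the n-gram occurs in this file's byte sequence, otherwise 0
                sb.append(present.contains(f) ? '1' : '0').append(',');
            }
            sb.append(labels.get(i)).append('\n');
        }
        return sb.toString();
    }
}
```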

Step 7: Model Training


The ARFF file is used as input to WEKA for applying machine learning
algorithms. The algorithms used in the experiment are: ZeroR, Naive Bayes, SVM
(Support Vector Machines), J48, Random Forest and JRip.
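ZeroR, the first algorithm listed, ignores every attribute and always predicts the most frequent class in the training data, so it serves as a baseline that the other classifiers must beat. The pure-Java sketch below imitates that behavior outside WEKA for illustration only; `ZeroRBaseline` is a hypothetical name, and ties are broken here by whichever class first reaches the highest count, which may differ from WEKA.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical majority-class baseline in the spirit of WEKA's ZeroR. */
final class ZeroRBaseline {
    /** Returns the most frequent training label (null if there are none). */
    static String majorityClass(List<String> trainLabels) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String label : trainLabels) {
            int c = counts.merge(label, 1, Integer::sum);
            if (c > bestCount) { // strictly greater: earlier leader keeps ties
                best = label;
                bestCount = c;
            }
        }
        return best;
    }

    /** Fraction of test labels equal to the constant prediction. */
    static double accuracy(String prediction, List<String> testLabels) {
        if (testLabels.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (String label : testLabels) {
            if (label.equals(prediction)) {
                correct++;
            }
        }
        return (double) correct / testLabels.size();
    }
}
```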

Hardware Requirements

Pentium processor, 1.6 GHz or faster

RAM, 128 MB or more

HDD, 40 GB or more.

Software Requirements

Platform: Linux OS

Language: Java

Editor: gedit

WEKA (Machine Learning Tool)
