Detection of Spyware by Mining Executable Files

The document describes a method for detecting spyware by mining executable files. Features are extracted from binary files, a feature reduction step is applied, and machine learning algorithms such as Naive Bayes and SVM are trained on the reduced feature sets to classify files as spyware or legitimate software.

Uploaded by

swamishailu

Objectives
The main objective of our project is to establish a method for spyware detection
using data mining techniques. These techniques are normally used for information
retrieval and classification. The only change in applying them here is that
computer programs are used as input rather than text documents.
In this project, binary features are extracted from executable files. A feature
reduction method is then used to obtain a subset of the data, which serves as a
training set for automatically generating classifiers. The generated
classifiers are used to classify new, previously unseen binaries as either legitimate
software or spyware. We will choose a value of n (the n-gram size) that yields high
performance, and a machine learning algorithm that produces high accuracy.

Project idea
The goal of the project is to detect spyware by using data mining and machine
learning. We use the Waikato Environment for Knowledge Analysis (WEKA) to perform
the experiments. WEKA is a suite of machine learning algorithms and analysis tools
that is used in practice for solving data mining problems. First, we extract features
from the binary files. We then apply a feature reduction method to reduce the
complexity of the data set. Finally, we convert the reduced feature set into the
Attribute-Relation File Format (ARFF). ARFF files are ASCII text files that contain a
set of data instances, each described by a set of features. Figure 2.1 shows the steps
involved in our proposed method.

Figure 2.1: Proposed System

We organized our work into the following stages:

1. Data Collection
2. Byte Sequence Generation
3. N-gram Generation
4. Feature Extraction
5. Feature Reduction
6. ARFF Generation
7. Model Training

Step 1: Data Collection

Our data set consists of two classes of binary files:

(1) Benign files
(2) Spyware files.

Step 2: Byte Sequence Generation

This step converts every binary file in each class into a byte sequence, that is, a
hexadecimal dump of its contents. We use xxd, a UNIX utility that produces hex dumps
of files, for the conversion.
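
The conversion can be illustrated with a minimal Python sketch. It is not the project's actual tooling (the project uses xxd, and the text does not say which xxd options are used); it only shows the idea of turning raw bytes into a plain hexadecimal string. The function names are hypothetical.

```python
from pathlib import Path


def to_hex_sequence(data: bytes) -> str:
    """Return the bytes as one continuous lowercase hex string (two characters per byte)."""
    return data.hex()


def file_to_hex_sequence(path: str) -> str:
    """Read a binary file and return its contents as a hex string (hypothetical helper)."""
    return to_hex_sequence(Path(path).read_bytes())


# Illustrative input only: the first four bytes of a made-up executable header.
sample = b"MZ\x90\x00"
print(to_hex_sequence(sample))  # 4d5a9000
```

Each byte becomes exactly two hex characters, so the byte boundaries needed for the next step remain easy to recover.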

Step 3: N-gram Generation

This step splits the byte sequences into n-grams of the desired size n (namely 4, 5
and 6). An n-gram is a sequence of n elements. The output is written so that each
line contains exactly one n-gram, and the length of every line equals n.
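
A small Python sketch of n-gram generation follows. Two points are assumptions, because the text does not specify them: n is counted in bytes, and the `step` parameter (hypothetical) selects between overlapping n-grams (`step=1`) and non-overlapping chunks (`step=n`).

```python
def byte_ngrams(data: bytes, n: int, step: int = 1) -> list[str]:
    """Return hex-encoded n-grams of `data`.

    step=1 slides the window one byte at a time; step=n yields non-overlapping
    chunks. Which of the two the project uses is an assumption left to the caller.
    """
    if n <= 0 or step <= 0:
        raise ValueError("n and step must be positive")
    return [data[i:i + n].hex() for i in range(0, len(data) - n + 1, step)]


# One n-gram per output line, as described in the step above.
sample = bytes.fromhex("4d5a90000300")
for gram in byte_ngrams(sample, n=4):
    print(gram)
```

A trailing fragment shorter than n is dropped, so every emitted n-gram has the same length.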

Step 4: Feature Extraction

We extract the features using two different approaches: Common Feature Based
Extraction (CFBE) and Frequency Based Feature Extraction (FBFE). Both methods are
used to obtain Reduced Feature Sets (RFSs), which are then used to generate the
Attribute-Relation File Format (ARFF) files.
1. Frequency Based Feature Extraction (FBFE):
In FBFE, the frequency of each n-gram in each class is calculated.
2. Common Feature Based Extraction (CFBE):
In CFBE, the common n-grams are extracted from each class.
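
The two approaches can be sketched in Python as follows. The FBFE part is a plain per-class count. The CFBE part rests on an assumed reading of "common": an n-gram is taken to be common if it appears in at least `min_files` files of the class. That threshold and both function names are hypothetical.

```python
from collections import Counter


def ngram_frequencies(files: list[list[str]]) -> Counter:
    """FBFE-style count: total occurrences of each n-gram across all files of one class."""
    counts = Counter()
    for grams in files:
        counts.update(grams)
    return counts


def common_ngrams(files: list[list[str]], min_files: int = 2) -> set[str]:
    """CFBE-style selection (assumed reading): n-grams found in at least `min_files` files."""
    file_counts = Counter()
    for grams in files:
        file_counts.update(set(grams))  # count each n-gram at most once per file
    return {gram for gram, seen in file_counts.items() if seen >= min_files}


# Toy data: each inner list holds the n-grams of one file in the class.
spyware_files = [["aa", "bb", "aa"], ["aa", "cc"]]
print(ngram_frequencies(spyware_files))
print(common_ngrams(spyware_files))
```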

Step 5: Feature Reduction

In FBFE, all n-grams whose frequency lies within a specified range (50-500) are kept
and the rest (those with a frequency of 1-49) are discarded. In CFBE, each feature is
represented only once per class. To obtain the Reduced Feature Sets (RFSs) for CFBE
and FBFE, the unique n-grams of both classes are merged.
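
A minimal Python sketch of the reduction, assuming the per-class n-gram frequencies are available as a mapping from n-gram to count. The text does not say what happens to n-grams that occur more than 500 times; this sketch drops them as well, which is an assumption. Function names are hypothetical.

```python
def reduce_by_frequency(counts: dict[str, int], low: int = 50, high: int = 500) -> set[str]:
    """Keep only n-grams whose frequency lies in the inclusive range [low, high]."""
    return {gram for gram, count in counts.items() if low <= count <= high}


def merge_feature_sets(benign: set[str], spyware: set[str]) -> list[str]:
    """Merge the unique n-grams of both classes into one ordered feature list."""
    return sorted(benign | spyware)


# Toy frequencies chosen to sit on both edges of the range.
benign_counts = {"aa": 49, "bb": 50, "cc": 500, "dd": 501}
spyware_counts = {"cc": 120, "ee": 75}
features = merge_feature_sets(
    reduce_by_frequency(benign_counts),
    reduce_by_frequency(spyware_counts),
)
print(features)  # ['bb', 'cc', 'ee']
```

Sorting the merged set gives a stable attribute order, which is convenient when the same feature list is reused for every instance.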

Step 6: ARFF Generation (Data Set Generation)

This process generates two ARFF databases: a frequency based feature database
and a common feature based database. All attributes in the databases are treated as
Boolean attributes. The ARFF generation process searches for every n-gram in all byte
sequences of a class and assigns the corresponding attribute a value of 1 if the
n-gram is present and 0 if it is not.
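
The sketch below shows, in Python, what such a Boolean ARFF file could look like. The relation name, the `ng_` attribute prefix, the class labels `benign` and `spyware`, and the representation of each file as a set of n-grams are all assumptions made for illustration; only the `@relation`, `@attribute` and `@data` layout comes from the ARFF format itself.

```python
def build_arff(relation: str, features: list[str], instances: list[tuple[set[str], str]]) -> str:
    """Render Boolean present/not-present features as ARFF text.

    `instances` holds one (set_of_ngrams, class_label) pair per file.
    """
    lines = [f"@relation {relation}", ""]
    for feat in features:
        # Nominal attribute with the two values 0 and 1.
        lines.append(f"@attribute ng_{feat} {{0,1}}")
    lines.append("@attribute class {benign,spyware}")
    lines.append("")
    lines.append("@data")
    for grams, label in instances:
        row = ["1" if feat in grams else "0" for feat in features]
        lines.append(",".join(row + [label]))
    return "\n".join(lines) + "\n"


features = ["4d5a9000", "90000300"]
instances = [
    ({"4d5a9000"}, "benign"),
    ({"4d5a9000", "90000300"}, "spyware"),
]
print(build_arff("spyware_ngrams", features, instances))
```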

Step 7: Model Training

The ARFF file is used as input to WEKA for applying machine learning
algorithms. The algorithms used in the experiment are: ZeroR, Naive Bayes, SVM
(Support Vector Machines), J48, Random Forest and JRip.
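
Of the listed algorithms, ZeroR is the simplest: it ignores all attributes and always predicts the most frequent class, so it is useful as a baseline against which the other classifiers can be compared. The Python sketch below only illustrates that idea; it is not WEKA code, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def zero_r(train_labels: list[str]) -> str:
    """Return the majority class of the training labels (the ZeroR prediction)."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    return majority


def accuracy(predicted: list[str], actual: list[str]) -> float:
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


train = ["spyware", "benign", "benign"]
test = ["benign", "spyware"]
label = zero_r(train)
print(label, accuracy([label] * len(test), test))  # benign 0.5
```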

Hardware Requirements

Pentium processor, 1.6 GHz or faster

RAM, 128 MB or more

HDD, 40 GB or more.

Software Requirements

Platform: Linux OS

Language: Java

Editor: gedit

WEKA (Machine Learning Tool)
