VPD Final Report Updated
Project Report on
By
RUDRAYYA S 4MN20CS039
SRINIDHI S 4MN20CS047
VARSHINI K Y 4MN20CS054
SAHANA D P 4MN21CS403
2023-2024
THANDAVAPURA MYSORE-571302
CERTIFICATE
This is to certify that the project work titled “DIGITAL DATA CONCEALMENT
USING ADVANCED STEGANOGRAPHY” has been successfully carried out by
RUDRAYYA S [4MN20CS039], SRINIDHI S [4MN20CS047], VARSHINI K Y
[4MN20CS054], SAHANA D P [4MN21CS403] bonafide students of Maharaja Institute
of Technology Thandavapura in partial fulfilment of requirements of Degree of
Bachelor of Engineering in Computer Science & Engineering of Visvesvaraya
Technological University, Belgaum during the academic year 2023-24. The project report
has been approved as it satisfies the academic requirements with respect to the project
work prescribed for Bachelor of Engineering Degree.
External Viva
Name of the Examiners Signature with date
1.
2.
ABSTRACT
Steganography, an ancient art form, has evolved into a modern technique for covert
communication. It involves concealing a message within an innocuous carrier, such as an image, audio,
or video file, to evade detection. Unlike cryptography, which encrypts a message, steganography hides
the existence of the message itself. Utilizing imperceptible alterations in the carrier, steganography
embeds bits of information, imperceivable to the human eye or ear, yet retrievable by intended recipients
using specialized tools. This clandestine method finds applications in various fields, including
cybersecurity, digital watermarking, and espionage. With the exponential growth of digital media and
communication channels, steganography poses both a threat and a defense mechanism in the realm of
information security. Its continuous development challenges researchers and practitioners to create
robust detection techniques while also advancing the sophistication of concealment methods, shaping the
ongoing cat-and-mouse game of covert communication.
LIST OF CONTENTS
Sl No.   Index   Page No.
1 INTRODUCTION 1-5
1.1 Introduction 1
1.2 Overview with Problem Identification 3
1.3 Objective 3
1.4 Scope 4
1.5 Existing system 4
1.6 Proposed system 4
1.7 Applications 5
2 LITERATURE SURVEY 6-7
3 SYSTEM REQUIREMENT SPECIFICATION 8-13
3.1 Functional requirement 8
3.2 Non-Functional requirement 11
3.3 Hardware requirement 12
3.4 Software requirement 13
4 SYSTEM ANALYSIS AND DESIGN 14-18
4.1 System Analysis 14
4.2 System Architecture 14
4.3 High level design 15
4.3.1 Data flow Diagram 16
4.3.2 Use Case Diagram 17
4.4 Low level Design 18
4.4.1 Flow Chart 18
5 IMPLEMENTATION 20-27
5.1 Data collection and preprocessing 20
5.2 User interface components 20
5.3 Backend implementation 20
5.4 Image steganography 21
5.5 Video steganography 22
5.6 Audio steganography 24
5.7 Text steganography 26
6 TESTING 26-30
6.1 Design of test case 28
6.2 Types of Testing 28
6.2.1 Unit Testing 28
6.2.2 Integration Testing 28
6.2.3 Functional Testing 29
6.2.4 System Testing 30
6.2.5 White Box Testing 30
6.2.6 Black Box Testing 30
7 RESULTS AND SNAPSHOTS 31-37
7.1 Result Analysis 32
7.2 Snapshots 35
CONCLUSION AND FUTURE ENHANCEMENT 36
BIBLIOGRAPHY 37
LIST OF FIGURES
Figure No.   Figure Title   Page No.
4.2 System Architecture 13
4.3.1 Sequence Diagram 14
4.3.2 Use Case Diagram 15
4.4.1 Flow Chart 16
6.1 Black Box and White Box Testing 28
7.1 Home page 31
7.2 Upload image 31
7.3 32
7.4 Red Light Signaled and movement detection begins 32
7.5 When movement detected either eliminated or declared as winner 33
Digital Data Concealment using Advance Steganography 2023-2024
CHAPTER 1
INTRODUCTION
1.1 Introduction
1.3 Objective
The objective is to develop advanced steganalysis techniques capable of accurately
detecting hidden data within digital files. This involves leveraging machine learning, deep
neural networks, and signal processing methods to identify subtle deviations indicative of
steganographic manipulation, enhancing cybersecurity and safeguarding against covert
communication threats.
1.4 Scope
The scope encompasses researching and developing cutting-edge steganalysis
techniques to detect hidden data within various digital media formats. This includes
images, audio, video, and text files. The research involves leveraging machine learning
algorithms, deep neural networks, and signal processing methods to analyze complex
multimedia data and identify patterns indicative of steganographic manipulation. The goal
is to enhance cybersecurity measures and protect against covert communication threats in
the digital landscape.
1.7 Applications
1. Cybersecurity: The proposed system finds application in enhancing cybersecurity
measures by detecting covert communication channels used for malicious activities such
as data exfiltration, espionage, and cyberattacks.
2. Copyright Protection: By embedding hidden data, the ownership of the original file,
whether image, audio, or video, can be protected.
3. Intelligence Agencies: Intelligence agencies can employ the system to monitor digital
communications for covert messaging among potential threats, enhancing national
security efforts.
4. Corporate Security: Companies can use the system to protect sensitive information
and intellectual property by detecting attempts to hide data within digital files, thereby
mitigating risks of corporate espionage and data breaches.
5. Digital Forensics: Forensic experts can employ the system to analyze digital
evidence and uncover hidden information within multimedia files, assisting in legal
proceedings and criminal investigations.
6. Social Media Monitoring: The system can be integrated into social media platforms
to detect covert communication channels used for spreading misinformation,
radicalization, or illicit activities.
CHAPTER 2
LITERATURE SURVEY
Authors and year: Ritu Sindhu and Pragathi Singh (2020)
Title: Information hiding using steganography
Technique: Edge bit embedding
Outcome: The application developed encrypts and decrypts the data that needs to be
hidden. The data is hidden in a cover image, providing concealment that cannot be seen
by the eye, and the same method is reversed to recover the secret message from the image.

Authors and year: Ayodeji Akinwumi and Oluwatosin Ogbeide (2021)
Title: Implementing image steganography technique for secure data hiding in
development of android application
Technique: Random bit embedding technique
Outcome: The primary focus is steganographic communication for secure data transfer,
ultimately providing data security.
CHAPTER 3
SYSTEM REQUIREMENT SPECIFICATIONS
1. Multer
2. FFmpeg
3. GDAL
4. Laravel
Multer:
Multer is a middleware for handling multipart/form-data in Node.js applications,
particularly useful for handling file uploads. It is designed to work seamlessly with
frameworks such as Express to simplify the process of uploading files from clients to
servers. Developed by the team behind Express.js, Multer provides a flexible and easy-
to-use solution for handling file uploads in Node.js applications.
One of the key features of Multer is its ability to handle various types of file uploads,
including single files, multiple files, and even complex forms with both text fields and
file inputs. It allows developers to specify the destination directory where uploaded files
should be stored and provides options for renaming files, limiting file size, and filtering
file types to enhance security and control over the upload process.
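The upload checks described above (file-type filtering, a size limit, and renaming) can be sketched independently of any framework. The following Python sketch is not Multer's API; the function name validate_upload, the allowed extensions, and the 5 MB limit are all assumptions chosen for illustration.

```python
import os
import uuid

# Assumed policy values for illustration only; the report does not state them.
ALLOWED_EXTENSIONS = {".png", ".bmp", ".wav", ".txt"}
MAX_BYTES = 5 * 1024 * 1024


def validate_upload(filename: str, size: int) -> str:
    """Check an uploaded file and return a safe, randomized storage name.

    Raises ValueError if the extension is not allowed or the file is too large.
    """
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"file type {extension!r} is not allowed")
    if size > MAX_BYTES:
        raise ValueError("file exceeds the size limit")
    # Rename the file so the client-supplied name never reaches the disk.
    return f"{uuid.uuid4().hex}{extension}"
```

Renaming to a random identifier avoids collisions between uploads and keeps client-controlled names out of the storage path.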
FFmpeg:
FFmpeg stands as a cornerstone in the realm of multimedia processing, offering a
comprehensive suite of tools for encoding, decoding, transcoding, and manipulating audio
and video files. Developed by Fabrice Bellard and maintained by a dedicated global
community, FFmpeg boasts an impressive array of features and capabilities. Its versatility
shines through its support for an extensive range of audio and video formats, including
popular standards like MP3, AAC, WAV, FLAC, MP4, and AVI, among others. This
cross-platform framework operates seamlessly on Linux, macOS, Windows, and BSD,
providing accessibility to a diverse user base. With its command-line interface, users can
execute commands to perform diverse multimedia operations effortlessly, from basic file
conversions to complex filtering and streaming tasks. Thanks to its modular architecture
and rich set of libraries, FFmpeg is highly flexible and extensible, empowering developers
to build custom multimedia applications or integrate its functionality into existing projects
with ease. From decoding and encoding audio/video streams to extracting tracks, applying
effects, and streaming content over networks, FFmpeg serves as a Swiss Army knife for
multimedia processing. Supported by an active community and extensive documentation,
FFmpeg remains an indispensable tool for multimedia enthusiasts, developers, and
professionals seeking powerful and reliable solutions for their audio and video processing
needs.
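As an illustration of the command-line usage mentioned above, the following Python sketch builds an FFmpeg command that extracts the audio track of a video into uncompressed 16-bit PCM WAV. The helper names and file paths are hypothetical, and the report does not say which FFmpeg options the project actually uses; the options shown (-y, -i, -vn, -acodec) are standard FFmpeg flags.

```python
import shutil
import subprocess


def build_wav_extract_command(video_path: str, wav_path: str) -> list[str]:
    """Return an ffmpeg argument list that writes the audio track as 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", video_path,        # input file
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # uncompressed 16-bit little-endian PCM
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> None:
    """Run the command; requires the ffmpeg executable to be installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg executable not found on PATH")
    subprocess.run(build_wav_extract_command(video_path, wav_path), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with user-supplied file names.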
GDAL:
GDAL (Geospatial Data Abstraction Library) stands as a cornerstone in the
realm of geospatial data processing, offering a comprehensive suite of tools and functions
for reading, writing, and manipulating raster and vector geospatial data formats.
Developed collaboratively by a dedicated community of developers, GDAL boasts
support for an extensive range of formats, including GeoTIFF, ESRI Shapefile,
GeoJSON, and KML, among others. Its cross-platform nature ensures compatibility
across Linux, macOS, Windows, and various Unix-like systems, facilitating accessibility
across diverse environments. At its core, GDAL provides a unified interface for accessing
and processing geospatial data, enabling operations such as data conversion, reprojection,
resampling, and mosaicking. With a suite of command-line tools and bindings for popular
programming languages like Python and Java, GDAL empowers users to perform
common geospatial tasks efficiently, whether through direct command-line interaction or
Laravel Passport:
applications, offering a robust and flexible solution for securing APIs and protecting
sensitive data.
Apache HTTP Server:
The Apache HTTP Server Project is an effort to develop and maintain an open-source
HTTP server for modern operating systems including UNIX and Windows. The goal of
this project is to provide a secure, efficient and extensible server that provides HTTP
services in sync with the current HTTP standards.
HTML:
HTML is the skeleton of a webpage. It is used to structure a webpage's content, and it
also tells the web browser how to display it. As Front-End Web Developer Pat DePuydt
explains, the front end is the part of the website a user or customer interacts with. The
front end includes styles: the buttons, layouts, inputs, text, images, and more. Behind it,
the back end covers database architecture, frameworks, scaling solutions, and more.
MySQL Server:
The data in the system has to be stored and retrieved from database. Designing the
database is part of system design. Data elements and data structures to be stored have
been identified at analysis stage. They are structured and put together to design the data
storage and retrieval system.
A database is a collection of interrelated data stored with minimum redundancy to
serve many users quickly and efficiently. The general objective is to make database
access easy, quick, inexpensive and flexible for the user. Relationships are established
between the data items and unnecessary data items are removed. Normalization is done
to get an internal consistency of data and to have minimum redundancy and maximum
stability. This ensures minimizing data storage required, minimizing chances of data
inconsistencies and optimizing for updates. The MySQL database has been chosen
for developing the relevant databases.
Easy to Operate
The system should be easy to operate and should be such that it can be developed
within a short period and fit in the limited budget of the user.
Performance Requirements
Since the software is online, much of the system's performance depends on the
online traffic and the speed of the Internet connection. We aim to improve performance
by setting cookies for the relevant functions.
Usability Requirements
The Navigation for the various operations is arranged in an orderly fashion based
on the requirements. The interface also must provide a soothing look to the eye of the
user.
Portability Requirements
The system should be portable and should be able to adapt to environment
changes, such as a change of database, within a very short period.
Hard Disk: 40 GB
• Coding Language: Server-side scripting languages like PHP, Node.js, or Python for
CHAPTER 4
The system design process builds up the general architectural design of the
system. Software design involves representing the software system's functions in a form
that can be transformed into one or more programs. The requirements specified by the
end user must be set out systematically. Design is a creative process; a good design is
the key to an effective system. System "design" is defined as "the process of applying
various techniques and principles for the purpose of defining a process or a system in
sufficient detail to permit its physical realization". Various design features are followed
to develop the system. The design specification describes the features of the system, the
components or elements of the system, and their appearance to end users.
Data flow: the path that the data takes between the external entities, processes and data
stores. It portrays the interface between the other components and is shown with arrows,
typically labeled with a short data name, like "Billing details".
Use cases represent the specific tasks or functionality the system can perform.
Relationships between the actors and use cases can be of different types, including
association, extend, and include. Association relationships show that an actor is
associated with a particular use case.
4.4.1 Flowchart
A flowchart is a diagram that represents a set of instructions. Flowcharts
normally use standard symbols to represent the different types of instructions. These
symbols are used to construct the flowchart and show the step-by-step solution to the
problem.
2. Selecting the process: The user has two options: embed (encrypt) the data or extract
(decrypt) it from the file.
3. Input Stage:
- Digital media files (images, audio, video, text) for steganalysis.
- Parameters for machine learning algorithms and signal processing techniques.
4. Preprocessing Stage:
- Data preprocessing steps such as normalization, feature extraction, and dimensionality
reduction.
5. Steganalysis Stage:
- Application of steganalysis techniques, including statistical analysis, machine learning
algorithms, and signal processing methods.
- Detection of hidden data within digital files.
6. Postprocessing Stage:
- Analysis of steganalysis results.
- Filtering and refinement of detected covert communication channels.
7. Output Stage:
- Presentation of steganalysis findings.
- Reporting of detected covert communication activities.
- Integration with security ecosystems for further action.
Each stage may have multiple subprocesses, and the flow diagram would illustrate
the sequential flow of data and processes within the system.
CHAPTER-5
IMPLEMENTATION
5.1 Data Collection and Preprocessing
In the proposed steganalysis system, data collection and processing are essential
stages aimed at gathering a diverse dataset of digital media files and extracting relevant
features for subsequent analysis. During the data collection phase, various types of digital
media, including images, audio recordings, video clips, and text documents, are acquired
from different sources and platforms. Metadata extraction is performed to capture relevant
information such as file format, size, and timestamps. Additionally, the dataset undergoes
annotation and labeling to differentiate between files containing hidden data
(steganographic content) and those without (clean content). Subsequently, in the data
processing stage, preprocessing steps standardize the format and resolution of digital files,
while feature extraction techniques capture important characteristics such as pixel values
for images, frequency coefficients for audio, and textual features for documents.
Dimensionality reduction methods may be applied to reduce the feature space
dimensionality, followed by optional data augmentation techniques to increase dataset
diversity. Finally, data is embedded in or extracted from the files according to the user's
choice. Through meticulous data collection and processing, the steganalysis system
ensures the availability of high-quality data for robust analysis and model development.
modals, to facilitate user interaction with the platform's features. Incorporate visual
elements, icons, and animations to enhance the user experience and provide feedback on
user actions
JPEG Images: JPEG (Joint Photographic Experts Group) images are one of the most
common types of images used for steganography due to their widespread use on the
internet and in digital photography. Because JPEG compression is lossy, hidden data is
usually embedded in the quantized DCT coefficients rather than in raw pixel values,
which keeps the effect on image quality small.
PNG Images: PNG (Portable Network Graphics) images are another popular choice for
steganography. While PNG images use lossless compression, which preserves image
quality, they can still be used for hiding data by manipulating the pixel values.
BMP Images: BMP (Bitmap) images are uncompressed images that store color data for
each pixel in the image. While less common on the internet due to their larger file sizes,
BMP images can be used for steganography by directly manipulating the pixel values.
GIF Images: GIF (Graphics Interchange Format) images support animation and are
commonly used for simple graphics and animations on the web. They are less common
for steganography because of their limited color palette (GIF compression itself is
lossless, but each frame is reduced to at most 256 colors), yet GIF images can still be
used for hiding data.
TIFF Images: TIFF (Tagged Image File Format) images are often used in professional
photography and publishing due to their support for lossless compression and high-
quality images. TIFF images can be used for steganography, although they are less
common for this purpose compared to JPEG and PNG images.
Read the cover image: Load the cover image (the image in which you want to hide
data) using PHP's imagecreatefromjpeg function.
Convert the secret data to binary: Convert the secret message or data that you want to
hide into binary format. Each character is represented by its ASCII value, which is then
converted to 8-bit binary representation.
Embed the binary data into the image: Iterate over each pixel of the cover image. For
each pixel, replace the least significant bit (LSB) of each color component (R, G, B) with
the corresponding bit of the secret message. This is done to minimize the visual impact on
the cover image.
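Save the modified image: Save the image with the embedded data. PHP's imagejpeg
function applies lossy compression, which generally alters the pixel LSBs, so a lossless
writer such as imagepng is needed for the embedded bits to survive.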
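The embedding steps above can be sketched as follows. This is a minimal Python illustration rather than the project's PHP code: it works on an in-memory list of (R, G, B) tuples, leaves image loading and saving out, and appends a null byte as the end-of-message marker. The function names message_to_bits and embed_lsb are hypothetical.

```python
def message_to_bits(message: str) -> list[int]:
    """Return the 8-bit ASCII bits of message, followed by a null terminator."""
    data = message.encode("ascii") + b"\x00"
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def embed_lsb(pixels: list[tuple[int, int, int]], message: str) -> list[tuple[int, int, int]]:
    """Replace the LSB of successive R, G, B values with the message bits."""
    bits = message_to_bits(message)
    if len(bits) > 3 * len(pixels):
        raise ValueError("cover image is too small for this message")
    # Flatten the pixels so each color component carries one bit.
    flat = [channel for pixel in pixels for channel in pixel]
    for index, bit in enumerate(bits):
        flat[index] = (flat[index] & ~1) | bit  # clear the LSB, then set it
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
```

Each color component changes by at most 1, which is why the alteration is hard to see. With three components per pixel, a cover of N pixels holds at most 3N bits, including the 8 bits of the terminator.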
Read the stego image: Load the stego image (the image containing hidden data) using
the PHP loader that matches its file format, for example imagecreatefromjpeg for JPEG
or imagecreatefrompng for PNG.
Extract the LSBs from each pixel: Iterate over each pixel of the stego image. Extract the
least significant bit (LSB) from each color component (R, G, B) of every pixel.
Convert the extracted binary data to text: Combine the LSBs to form binary bytes,
then convert each byte to its corresponding ASCII character.
Detect the end of the message: Continue decoding until a null character (ASCII code 0)
is encountered, indicating the end of the hidden message.
Output the decoded message: Present the decoded message, which was hidden within
the stego image.
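The extraction steps above can be sketched in the same way. This Python illustration assumes the stego image is already available as a list of (R, G, B) tuples and that the message ends with a null character, as described; the function name extract_lsb is hypothetical.

```python
def extract_lsb(pixels: list[tuple[int, int, int]]) -> str:
    """Rebuild the hidden ASCII message from the LSBs of the R, G, B values."""
    bits = [channel & 1 for pixel in pixels for channel in pixel]
    characters = []
    # Group the bits into bytes, most significant bit first.
    for start in range(0, len(bits) - 7, 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        if value == 0:  # null character marks the end of the message
            break
        characters.append(chr(value))
    return "".join(characters)
```

If no null character is found, the function returns everything it decoded, so a caller working with untrusted images may want to impose its own length limit.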
AVI (Audio Video Interleave): AVI is a widely supported video format developed
by Microsoft. It supports both audio and video data and is commonly used for storing
multimedia content on Windows systems. AVI files are relatively straightforward and can
be manipulated for steganographic purposes.
MP4 (MPEG-4 Part 14): MP4 is a highly popular video format used for sharing
video content over the internet and on various devices. It supports advanced compression
techniques, making it suitable for steganography. MP4 files can contain audio, video, and
Department of CS&E, MITT Page 25
WAV (Waveform Audio File Format): WAV is a standard audio file format
developed by Microsoft and IBM. It is uncompressed and supports high-quality audio,
making it suitable for steganographic applications. WAV files are widely supported
across different platforms and software.
MP3 (MPEG-1 Audio Layer III): MP3 is a popular audio compression format
that reduces file size while maintaining perceptual audio quality. Due to its widespread
use and support, MP3 files are commonly used for steganography. However, embedding
data in MP3 files requires careful consideration of the compression algorithm's effects
on the hidden data.
options include:
FFmpeg: Use FFmpeg with PHP to manipulate audio files programmatically, including
reading, modifying, and saving audio files.
GDAL Library: Although primarily for geospatial data, GDAL can be used for basic
audio sample manipulation.
2) Convert Secret Message to Binary: Convert the secret message or data that you
want to hide into binary format.
3) Select Embedding Method: Choose a steganographic technique suitable for text,
such as whitespace manipulation, word substitution, or Unicode character modification.
4) Embed Binary Data into Text: Apply the chosen embedding method to hide
binary data within the cover text document. This could involve modifying whitespace
characters, substituting specific words or characters, or altering the formatting of the
text.
5) Save the Modified Text: Save the text document with the embedded data.
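The steps above can be sketched for one of the listed methods, whitespace manipulation. In this Python illustration each cover line carries one bit as trailing whitespace: a space for 0 and a tab for 1. The encoding and the function names are assumptions made for the example; the report does not specify which scheme the project uses.

```python
def embed_in_text(cover: str, payload: bytes) -> str:
    """Hide payload bits as trailing whitespace, one bit per cover line."""
    lines = [line.rstrip(" \t") for line in cover.split("\n")]
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(lines):
        raise ValueError("cover text has too few lines for this payload")
    marked = [line + ("\t" if bit else " ") for line, bit in zip(lines, bits)]
    # Lines beyond the payload keep no trailing whitespace, which ends the message.
    return "\n".join(marked + lines[len(bits):])


def extract_from_text(stego: str) -> bytes:
    """Read trailing whitespace back into bytes until an unmarked line appears."""
    bits = []
    for line in stego.split("\n"):
        if line.endswith("\t"):
            bits.append(1)
        elif line.endswith(" "):
            bits.append(0)
        else:
            break
    result = bytearray()
    for start in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        result.append(value)
    return bytes(result)
```

The capacity is low (one bit per line), and any tool that trims trailing whitespace destroys the message, so this method suits short payloads only.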
CHAPTER 6
TESTING
Unit testing involves the design of test cases that validate that the internal program
logic is functioning properly and that program inputs produce valid outputs. All decision
branches and internal code flow should be validated. It is the testing of individual
software units of the application. It is done after the completion of an individual unit
before integration. This is structural testing that relies on knowledge of the unit's
construction and is invasive. Unit tests perform basic tests at the component level and test a specific
business process, application, and/or system configuration. Unit tests ensure that each
unique path of a business process performs accurately to the documented specifications
and contains clearly defined inputs and expected results. The benefits of unit testing
include improved code quality, reduced software defects, faster debugging and
troubleshooting, and greater confidence in the reliability of the software. By isolating and
testing individual components of a system, developers can identify and fix defects more
quickly and efficiently, reducing the overall time and cost of software development.
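As an illustration of a unit test in this sense, the following Python sketch tests one small component in isolation: a bit-conversion helper of the kind a steganography module needs. The helper functions and test names are hypothetical and are not taken from the project's code.

```python
import unittest


def to_bits(data: bytes) -> list[int]:
    """Return the bits of data, most significant bit of each byte first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def from_bits(bits: list[int]) -> bytes:
    """Inverse of to_bits; the number of bits must be a multiple of 8."""
    if len(bits) % 8:
        raise ValueError("bit count must be a multiple of 8")
    result = bytearray()
    for start in range(0, len(bits), 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        result.append(value)
    return bytes(result)


class BitConversionTest(unittest.TestCase):
    def test_known_value(self):
        # ASCII "A" is 0x41, i.e. 01000001.
        self.assertEqual(to_bits(b"A"), [0, 1, 0, 0, 0, 0, 0, 1])

    def test_round_trip(self):
        self.assertEqual(from_bits(to_bits(b"secret")), b"secret")

    def test_rejects_partial_byte(self):
        with self.assertRaises(ValueError):
            from_bits([1, 0, 1])
```

Saved as a module, the tests can be run with python -m unittest. Each test exercises one decision branch: a known value, the round trip, and the error path.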
they run as one program. Testing is event-driven and is more concerned with the basic
outcome of screens or fields. Integration tests demonstrate that although the components
were individually satisfactory, as shown by successful unit testing, the combination of
components is correct and consistent. Integration testing is specifically aimed at exposing
the problems that arise from the combination of components. The benefits of integration
testing include improved software quality, reduced software defects, faster debugging and
troubleshooting, and greater confidence in the reliability of the software. By testing the
interactions between different software components, developers can ensure that the
software works as intended, and that all modules and services function correctly and
efficiently.
This testing can be complex and time-consuming, and requires careful planning and
coordination to ensure that all components are tested thoroughly and effectively. It should
be complemented by other testing methodologies such as unit testing, system testing, and
acceptance testing to provide a comprehensive and effective testing strategy.
System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable results. An
example of system testing is the configuration-oriented system integration test. System
testing is based on process descriptions and flows, emphasizing pre-driven process links
and integration points. The benefits of system testing include improved software quality,
reduced software defects, faster debugging and troubleshooting, and greater confidence in
the reliability of the software. By testing the entire system as a whole, developers can
identify and fix defects and issues that may have been missed during integration testing or
other testing phases. However, system testing can be complex and time-consuming, and
requires careful planning and execution to ensure that all aspects of the software are
tested thoroughly and effectively. It should be complemented by other testing
methodologies such as unit testing, integration testing, and acceptance testing to provide a
comprehensive and effective testing strategy.
White Box Testing is testing in which the software tester knows the inner
workings, structure, and language of the software, or at least its purpose. It
is used to test areas that cannot be reached from a black-box level. The benefits of white
box testing include improved code quality, reduced software defects, faster debugging
and troubleshooting, and greater confidence in the reliability of the software. By testing
the internal structure and implementation of the software, developers can identify and fix
defects and issues at an early stage, before they can propagate to other parts of the system
and become more difficult and expensive to fix. Therefore, it should be complemented by
other testing methodologies such as black box testing and grey box testing to provide a
comprehensive and effective testing strategy.
Black Box Testing is testing the software without any knowledge of the inner
workings, structure, or language of the module being tested. Black box tests, like most
other kinds of tests, must be written from a definitive source document, such as a
specification or requirements document. It is testing in which the software under test is
treated as a black box: you cannot “see” into it. The test provides inputs and responds to
outputs without considering how the
software works. The benefits of black box testing include improved software quality,
reduced software defects, faster debugging and troubleshooting, and greater confidence in
the reliability of the software. By testing the software from the perspective of an end-user,
testers can identify and fix defects and issues that may impact the end-user experience.
However, black box testing may not cover all aspects of the software, such as internal
logic and control structures, and may not identify defects that are related to the software's
internal workings. Therefore, it should be complemented by other testing methodologies
such as white box testing and grey box testing to provide a comprehensive and effective
testing strategy.
CHAPTER 7
7.1 Analysis
Define the objectives: Firstly, establish the goals of the sign language recognition
project, specifying the types of sign language to recognize, the required accuracy rate,
and the performance metrics used for model evaluation.
Collect data: Gather a comprehensive dataset of sign language gestures along with
their corresponding labels. Divide the dataset into training, validation, and testing
subsets.
Train the model: Utilize deep learning techniques to train the model on the training
dataset. Experiment with various architectures and hyperparameters to optimize model
performance according to the predefined objectives.
Test the model: Assess the trained model's performance using the testing dataset,
measuring key metrics such as accuracy, precision, recall, and F1 score. Compare these
metrics against the established objectives.
Analyze the results: Examine the model's performance through techniques like
confusion matrix analysis, identifying instances of correct and incorrect predictions.
Investigate misclassifications to uncover patterns or challenges in recognizing specific
sign gestures.
Interpret the results: Interpret the findings within the context of the project's
objectives and evaluation metrics. Draw conclusions regarding the model's efficacy in
sign language recognition.
Make recommendations: Based on the conclusions drawn, propose suggestions for
enhancing the model or dataset. This may include expanding the dataset, incorporating
more diverse sign language variations, or exploring alternative deep learning
architectures.
Communicate the results: Present the project outcomes to stakeholders and
interested parties in a clear and concise manner, leveraging data visualizations and other
tools to support conclusions and recommendations effectively.
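The metrics named in the "Test the model" step can be computed from confusion-matrix counts. The following Python sketch covers the binary case only and uses the standard definitions; the function name classification_metrics and the example counts in the usage note are illustrative and are not results from the project.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, with 8 true positives, 2 false positives, 2 false negatives, and 8 true negatives, all four metrics evaluate to 0.8.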
7.2 Snapshots
BIBLIOGRAPHY
[1] Ray, S. K., Ray, S. K., & Dey, S. (2020). Steganography for Digital Media Security:
Emerging Research and Opportunities. IGI Global.
[2] J. Li, J. Liu, G. Su, M. Zhang, and Y. Yang (2021). “A generative steganography
method based on WGAN-GP,” in Proc. 2nd Int. Conf. Artif. Intell. Secur., Hohhot,
China, Communication
[3] Wahab O., Khalaf A., Hussein A. and Hamed F. (2021). Hiding Data Using Efficient
Combination of RSA Cryptography, and Compression Steganography Techniques, in
IEEE Access, vol. 9, pp. 31805-31815, 2021, doi: 10.1109/ACCESS.2021.3060317.
[5] David Tidmarsh (2023). Guide to Steganography: Meaning, Types, Tools, &
Techniques. https://www.eccouncil.org/cybersecurity-exchange.
[6] X. Zhang, K. Chen, J. Ding, Y. Yang, W. Zhang and N. Yu (2024). "Provably Secure
Public-Key Steganography Based on Elliptic Curve Cryptography," in IEEE
Transactions on Information Forensics and Security, vol. 19, pp. 3148-3163, 2024,
doi: 10.1109/TIFS.2024.3361219.