SteganoGAN: High Capacity Image Steganography with GANs
Figure 1. A randomly selected cover image (left) and the corresponding steganographic images generated by STEGANOGAN at approximately 1, 2, 3, and 4 bits per pixel.
than competing deep learning-based approaches with similar peak signal to noise ratios.

– We propose a new metric for evaluating the capacity of deep learning-based steganography algorithms, which enables comparisons against traditional approaches.

– We evaluate our approach by measuring its ability to evade traditional steganalysis tools which are designed to detect whether an image is steganographic or not. Even when we encode > 4 bits per pixel into the image, most traditional steganalysis tools still only achieve a detection auROC of < 0.6.

– We also evaluate our approach by measuring its ability to evade deep learning-based steganalysis tools. We train a state-of-the-art model for automatic steganalysis proposed by (Ye et al., 2017) on samples generated by our model. If we require our model to produce steganographic images such that the detection rate is at most 0.8 auROC, we find that our model can still hide up to 2 bits per pixel.

– We are releasing a fully-maintained open-source library called STEGANOGAN¹, including datasets and pre-trained models, which will be used to evaluate deep learning-based steganography techniques.

¹https://github.com/DAI-Lab/SteganoGAN

2. Motivation

There are several reasons to use steganography instead of (or in addition to) cryptography when communicating a secret message between two actors. First, the information contained in a cryptogram is accessible to anyone who has the private key, which poses a challenge in countries where private key disclosure is required by law. Furthermore, the very existence of a cryptogram reveals the presence of a message, which can invite attackers. These problems with plain cryptography exist in security, intelligence services, and a variety of other disciplines (Conway, 2003).

For many of these fields, steganography offers a promising alternative. For example, in medicine, steganography can be used to hide private patient information in images such as X-rays or MRIs (Srinivasan et al., 2004) as well as biometric data (Douglas et al., 2018). In the media sphere, steganography can be used to embed copyright data (Maheswari & Hemanth, 2015) and allow content access control systems to store and distribute digital works over the Internet (Kawaguchi et al., 2007). In each of these situations, it is important to embed as much information as possible, and for that information to be both undetectable and lossless to ensure the data can be recovered by the recipient. Most work in the area of steganography, including the methods described in this paper, targets these two goals. We propose a new class of models for image steganography that achieves both these goals.

3.1. Notation

We have C and S as the cover image and the steganographic image respectively, both of which are RGB color images and have the same resolution W × H; let M ∈ {0, 1}^{D×W×H} be the binary message that is to be hidden in C. Note that D
is the upper bound on the relative payload; the actual relative payload is the number of bits that can be reliably decoded, which is given by (1 − 2p)D, where p ∈ [0, 1] is the error rate. The actual relative payload is discussed in more detail in Section 4.

The cover image C is sampled from the probability distribution of all natural images P_C. The steganographic image S is then generated by a learned encoder E(C, M). The secret message M̂ is then extracted by a learned decoder D(S). The optimization task, given a fixed message distribution, is to train the encoder E and the decoder D to minimize (1) the decoding error rate p and (2) the distance between natural and steganographic image distributions dis(P_C, P_S). Therefore, to optimize the encoder and the decoder, we also need to train a critic network C(·) to estimate dis(P_C, P_S).

Let X ∈ R^{D×W×H} and Y ∈ R^{D′×W×H} be two tensors of the same width and height but potentially different depths D and D′; then, let Cat : (X, Y) → Φ ∈ R^{(D+D′)×W×H} be the concatenation of the two tensors along the depth axis. Let Conv_{D→D′} : X ∈ R^{D×W×H} → Φ ∈ R^{D′×W×H} be a convolutional block that maps an input tensor X into a feature map Φ of the same width and height but potentially different depth. This convolutional block consists of a convolutional layer with kernel size 3, stride 1 and padding ‘same’, followed by a leaky ReLU activation function and batch normalization. The activation function and batch normalization operations are omitted if the convolutional block is the last block in the network.

Let Mean : X ∈ R^{D×W×H} → R^D represent the adaptive mean spatial pooling operation, which computes the average of the W × H values in each feature map of tensor X.

3.2. Architecture

In this paper, we present STEGANOGAN, a generative adversarial network for hiding an arbitrary bit vector in a cover image. Our proposed architecture, shown in Figure 2, consists of three modules: (1) an Encoder that takes a cover image and a data tensor, or message, and produces a steganographic image (Section 3.2.1); (2) a Decoder that takes the steganographic image and attempts to recover the data tensor (Section 3.2.2); and (3) a Critic that evaluates the quality of the cover and steganographic images (Section 3.2.3).

3.2.1. Encoder

The encoder network takes a cover image C and a message M ∈ {0, 1}^{D×W×H}. Hence M is a binary data tensor of shape D × W × H, where D is the number of bits that we will attempt to hide in each pixel of the cover image. We explore three variants of the encoder architecture with different connectivity patterns. All the variants start by applying the following two operations:

1. Processing the cover image C with a convolutional block to obtain the tensor a:

   a = Conv_{3→32}(C)    (1)

2. Concatenating the message M to a and then processing the result with a convolutional block to obtain the tensor b:

   b = Conv_{32+D→32}(Cat(a, M))    (2)

Basic: We sequentially apply two convolutional blocks to tensor b and generate the steganographic image as shown in Figure 2b. Formally:

   E_b(C, M) = Conv_{32→3}(Conv_{32→32}(b))    (3)

This approach is similar to that of (Baluja, 2017), as the steganographic image is simply the output of the last convolutional block.

Residual: The use of residual connections has been shown to improve model stability and convergence (He et al., 2016), so we hypothesize that their use will improve the quality of the steganographic image. To this end, we modify the basic encoder by adding the cover image C to its output so that the encoder learns to produce a residual image, as shown in Figure 2c. Formally,

   E_r(C, M) = C + E_b(C, M)    (4)

Dense: In the dense variant, we introduce additional connections between the convolutional blocks so that the feature maps generated by the earlier blocks are concatenated to the feature maps generated by later blocks, as shown in Figure 2d. This connectivity pattern is inspired by the DenseNet architecture (Huang et al., 2017), which has been shown to encourage feature reuse and mitigate the vanishing gradient problem. Therefore, we hypothesize that the use of dense connections will improve the embedding rate. It can be formally expressed as follows:

   c = Conv_{64+D→32}(Cat(a, b, M))
   d = Conv_{96+D→3}(Cat(a, b, c, M))    (5)
   E_d(C, M) = C + d

Finally, the output of each variant is a steganographic image S = E_{b,r,d}(C, M) that has the same resolution and depth as the cover image C.
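The dense encoder described above can be sketched in PyTorch. This is a minimal illustration of Equations (1)–(5) and the convolutional block defined in Section 3.1, not the released implementation; the class and variable names are our own.

```python
import torch
import torch.nn as nn


def conv_block(c_in: int, c_out: int, last: bool = False) -> nn.Sequential:
    """3x3 convolution, stride 1, 'same' padding, followed by LeakyReLU
    and batch normalization; both are omitted for the last block."""
    layers = [nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1)]
    if not last:
        layers += [nn.LeakyReLU(), nn.BatchNorm2d(c_out)]
    return nn.Sequential(*layers)


class DenseEncoder(nn.Module):
    """Dense encoder variant: E_d(C, M) = C + d, per Equation (5)."""

    def __init__(self, data_depth: int):
        super().__init__()
        self.conv1 = conv_block(3, 32)                         # a = Conv_{3->32}(C)
        self.conv2 = conv_block(32 + data_depth, 32)           # b = Conv_{32+D->32}(Cat(a, M))
        self.conv3 = conv_block(64 + data_depth, 32)           # c = Conv_{64+D->32}(Cat(a, b, M))
        self.conv4 = conv_block(96 + data_depth, 3, last=True) # d = Conv_{96+D->3}(Cat(a, b, c, M))

    def forward(self, cover: torch.Tensor, message: torch.Tensor) -> torch.Tensor:
        a = self.conv1(cover)
        b = self.conv2(torch.cat([a, message], dim=1))
        c = self.conv3(torch.cat([a, b, message], dim=1))
        d = self.conv4(torch.cat([a, b, c, message], dim=1))
        return cover + d  # residual connection to the cover image
```

The basic and residual variants follow the same pattern: the basic encoder returns the output of the last block directly, and the residual encoder adds the cover image to that output without the extra concatenations.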
Figure 2. (a) The model architecture with the Encoder, Decoder, and Critic. The blank rectangle representing the Encoder can be any of the following: (b) Basic encoder, (c) Residual encoder, and (d) Dense encoder. The trapezoids represent convolutional blocks, two or more arrows merging represent concatenation operations, and the curly bracket represents a batching operation.
Figure 3. Randomly selected pairs of cover (left) and steganographic (right) images from the COCO dataset, which embed random binary data at the maximum payload of 4.4 bits per pixel.
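Payload figures like the 4.4 bits per pixel quoted above follow the relation (1 − 2p)D from Section 3.1, where D is the data depth and p the bit error rate (the exact Reed-Solomon accounting in Section 4 is slightly more involved). A minimal sketch:

```python
def rs_bpp(data_depth: int, error_rate: float) -> float:
    """Reliable relative payload (bits per pixel) for a model that
    attempts data_depth bits per pixel with bit error rate error_rate."""
    return (1 - 2 * error_rate) * data_depth


# A perfectly decoded depth-6 model would reach 6 bpp; an error rate
# around 0.13 brings it down to roughly 4.4 bpp.
```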
variances, σ_X² and σ_Y², and covariance σ_XY of the images, as shown below:

   SSIM = [(2µ_X µ_Y + k_1 R)(2σ_XY + k_2 R)] / [(µ_X² + µ_Y² + k_1 R)(σ_X² + σ_Y² + k_2 R)]    (16)

The default configuration for SSIM uses k_1 = 0.01 and k_2 = 0.03 and returns values in the range [−1.0, 1.0], where 1.0 indicates the images are identical.

5. Results and Analysis

We use the Div2K (Agustsson & Timofte, 2017) and COCO (Lin et al., 2014) datasets to train and evaluate our model. We experiment with each of the three model variants discussed in Section 3 and train them with 6 different data depths D ∈ {1, 2, ..., 6}. The data depth D represents the “target” bits per pixel, so the randomly generated data tensor has shape D × W × H.

We use the default train/test split proposed by the creators of the Div2K and COCO datasets in our experiments, and we report the average RS-BPP, PSNR, and SSIM on the test set in Table 1. Our models are trained on GeForce GTX 1080 GPUs. The wall clock time per epoch is approximately 10 minutes for Div2K and 2 hours for COCO.

After training our model, we compute the expected accuracy on a held-out test set and adjust it using the Reed-Solomon coding scheme discussed in Section 4 to produce our bits-per-pixel metric, shown in Table 1 under RS-BPP. We publicly released the pre-trained models for all the experiments shown in this table on AWS S3³.

The results from our experiments are shown in Table 1; each of the metrics is computed on a held-out test set of images that is not shown to the model during training. Note that there is an unavoidable tradeoff between the relative payload and the image quality measures; assuming we are already on the Pareto frontier, an increased relative payload would inevitably result in a decreased similarity.

We immediately observe that all variants of our model perform better on the COCO dataset than the Div2K dataset. This can be attributed to differences in the type of content photographed in the two datasets. Images from the Div2K dataset tend to contain open scenery, while images from the COCO dataset tend to be more cluttered and contain multiple objects, providing more surfaces and textures for our model to successfully embed data.

In addition, we note that our dense variant shows the best performance on both relative payload and image quality, followed closely by the residual variant, which shows comparable image quality but a lower relative payload. The basic variant offers the worst performance across all metrics, achieving relative payloads and image quality scores that are 15-25% lower than those of the dense variant.

³http://steganogan.s3.amazonaws.com/
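The SSIM expression in Equation (16) can be sketched directly in NumPy. This is a single-window illustration that takes means, variances, and covariance over the whole image; production implementations (e.g. scikit-image's `structural_similarity`) compute the statistics over local windows and use squared stabilizing constants, so treat this as a sketch of the formula rather than a drop-in metric.

```python
import numpy as np


def ssim_global(x: np.ndarray, y: np.ndarray, R: float = 255.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """Single-window SSIM following Equation (16): statistics are
    computed over the entire image rather than local windows."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + k1 * R) * (2 * cov_xy + k2 * R)
    den = (mu_x ** 2 + mu_y ** 2 + k1 * R) * (var_x + var_y + k2 * R)
    return num / den


# Identical images score exactly 1.0; dissimilar images score lower.
```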
Table 1. The relative payload and image quality metrics for each dataset and model variant. The Dense model variant offers the best
performance across all metrics in almost all experiments.
Figure 5. The receiver operating characteristic (ROC) curve produced by the StegExpose library for a set of 1000 steganographic images generated using the Dense architecture with a data depth of 6. The StegExpose library includes multiple steganalysis tools including SamplePairs (Dumitrescu et al., 2003), RSAnalysis (Fridrich et al., 2001), ChiSquaredAttack (Westfeld & Pfitzmann, 2000), and PrimarySets (Dumitrescu et al., 2002). The tool achieves an auROC of 0.59.

Figure 6. This plot shows the performance of the steganography detector on a held-out test set. The x-axis indicates the number of different STEGANOGAN instances that were used, while the y-axis indicates the area under the ROC curve. (Legend: D = 1, RS-BPP = 1.0; D = 2, RS-BPP = 2.0; D = 3, RS-BPP = 2.9; D = 4, RS-BPP = 3.6; D = 5, RS-BPP = 4.2; D = 6, RS-BPP = 4.4.)
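The detection numbers in Figures 5 and 6 are areas under ROC curves. As an illustration of how such a number is computed from detector outputs (the actual StegExpose and (Ye et al., 2017) pipelines are not shown here), the auROC equals the Mann-Whitney probability that a randomly chosen steganographic image scores higher than a randomly chosen cover image; the score values below are made-up stand-ins.

```python
def auroc(cover_scores, stego_scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a random steganographic image scores higher
    than a random cover image (ties count as half)."""
    wins = 0.0
    for s in stego_scores:
        for c in cover_scores:
            if s > c:
                wins += 1.0
            elif s == c:
                wins += 0.5
    return wins / (len(stego_scores) * len(cover_scores))


# Hypothetical detector scores: higher means "more likely steganographic".
cover = [0.10, 0.35, 0.42, 0.28]
stego = [0.40, 0.55, 0.31, 0.62]
score = auroc(cover, stego)  # 0.5 is chance level, 1.0 is perfect detection
```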
7.1. Traditional Approaches

A standard algorithm for image steganography is “Highly Undetectable steGO” (HUGO), a cost function-based algorithm which uses handcrafted features to measure the distortion introduced by modifying the pixel value at a particular location in the image. Given a set of N bits to be embedded, HUGO uses the distortion function to identify the top N pixels that can be modified while minimizing the total distortion across the image (Pevný et al., 2010).

Another approach is the JSteg algorithm, which is designed specifically for JPEG images. JPEG compression works by transforming the image into the frequency domain using the discrete cosine transform and removing high-frequency components, resulting in a smaller image file size. JSteg uses the same transformation into the frequency domain, but modifies the least significant bits of the frequency coefficients (Li et al., 2011).

7.2. Deep Learning for Steganography

Deep learning for image steganography has recently been explored in several studies, all showing promising results. These existing proposals range from training neural networks to integrate with and improve upon traditional steganography techniques (Tang et al., 2017) to complete end-to-end convolutional neural networks which use adversarial training to generate convincing steganographic images (Hayes & Danezis, 2017; Zhu et al., 2018).

Hiding images vs. arbitrary data: The first set of deep learning approaches to steganography were (Baluja, 2017; Wu et al., 2018). Both (Baluja, 2017) and (Wu et al., 2018) focus solely on taking a secret image and embedding it into a cover image. Because this task is fundamentally different from that of embedding arbitrary data, it is difficult to compare these results to those achieved by traditional steganography algorithms in terms of the relative payload. Natural images such as those used in (Baluja, 2017) and (Wu et al., 2018) exhibit strong spatial correlations, and convolutional neural networks trained to hide images in images would take advantage of this property. Therefore, a model that is trained in such a manner cannot be applied to arbitrary data.

Adversarial training: The next set of approaches for image steganography are (Hayes & Danezis, 2017; Zhu et al., 2018), which make use of adversarial training techniques. The key differences between these approaches and our approach are the loss functions used to train the model, the architecture of the model, and how data is presented to the network.

The method proposed by (Hayes & Danezis, 2017) can only operate on images of a fixed size. Their approach involves flattening the image into a vector, concatenating the data vector to the image vector, and applying feedforward, reshaping, and convolutional layers. They use the mean squared error for the encoder, the cross entropy loss for the discriminator, and the mean squared error for the decoder. They report that image quality suffers greatly when attempting to increase the number of bits beyond 0.4 bits per pixel.

The method proposed by (Zhu et al., 2018) uses the same loss functions as (Hayes & Danezis, 2017) but makes changes to the model architecture. Specifically, they “replicate the message spatially, and concatenate this message volume to the encoder’s intermediary representation.” For example, in order to hide k bits in an N × N image, they would create a tensor of shape (k, N, N) where the data vector is replicated at each spatial location.

This design allows (Zhu et al., 2018) to handle arbitrarily sized images but cannot effectively scale to higher relative payloads. For example, to achieve a relative payload of 1 bit per pixel in a typical image of size 360 × 480, they would need to manipulate a data tensor of size (172800, 360, 480). Therefore, due to the excessive memory requirements, this model architecture cannot effectively scale to handle large relative payloads.

8. Conclusion

In this paper, we introduced a flexible new approach to image steganography which supports different-sized cover images and arbitrary binary data. Furthermore, we proposed a new metric for evaluating the performance of deep learning-based steganographic systems so that they can be directly compared against traditional steganography algorithms. We experiment with three variants of the STEGANOGAN architecture and demonstrate that our model achieves higher relative payloads than existing approaches while still evading detection.

Acknowledgements

The authors would like to thank Plamen Valentinov Kolev and Carles Sala for their help with software support and developer operations and for the helpful discussions and feedback. Finally, the authors would like to thank Accenture for their generous support and funding which made this research possible.

References

Agustsson, E. and Timofte, R. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017.
Almohammad, A. and Ghinea, G. Stego image quality and the reliability of PSNR. In 2010 2nd International Conference on Image Processing Theory, Tools and Applications, pp. 215–220, July 2010. doi: 10.1109/IPTA.2010.5586786.

Baluja, S. Hiding images in plain sight: Deep steganography. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 30, pp. 2069–2079. Curran Associates, Inc., 2017.

Boehm, B. StegExpose - A tool for detecting LSB steganography. CoRR, abs/1410.6656, 2014.

Conway, M. Code wars: Steganography, signals intelligence, and terrorism. Knowledge, Technology & Policy, 16(2):45–62, Jun 2003. doi: 10.1007/s12130-003-1026-4.

Douglas, M., Bailey, K., Leeney, M., and Curran, K. An overview of steganography techniques applied to the protection of biometric data. Multimedia Tools and Applications, 77(13):17333–17373, Jul 2018. doi: 10.1007/s11042-017-5308-3.

Dumitrescu, S., Wu, X., and Memon, N. On steganalysis of random LSB embedding in continuous-tone images. In Int. Conf. on Image Processing (ICIP), volume 3, pp. 641–644, 2002. doi: 10.1109/ICIP.2002.1039052.

Dumitrescu, S., Wu, X., and Wang, Z. Detection of LSB steganography via sample pair analysis. In Information Hiding, pp. 355–372, 2003.

Fridrich, J., Goljan, M., and Du, R. Reliable detection of LSB steganography in color and grayscale images. In Proc. of the 2001 Workshop on Multimedia and Security: New Challenges, MM&Sec ’01, pp. 27–30. ACM, 2001. doi: 10.1145/1232454.1232466.

Hayes, J. and Danezis, G. Generating steganographic images via adversarial training. In NIPS, 2017.

He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016.

Holub, V. and Fridrich, J. Designing steganographic distortion using directional filters. Dec 2012. doi: 10.1109/WIFS.2012.6412655.

Holub, V., Fridrich, J., and Denemark, T. Universal distortion function for steganography in an arbitrary domain. EURASIP Journal on Information Security, 2014(1):1, Jan 2014. doi: 10.1186/1687-417X-2014-1.

Huang, G., Liu, Z., van der Maaten, L., and Weinberger, K. Q. Densely connected convolutional networks. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269, 2017.

Johnson, N. and Katzenbeisser, S. A survey of steganographic techniques. 1999.

Kawaguchi, E., Maeta, M., Noda, H., and Nozaki, K. A model of digital contents access control system using steganographic information hiding scheme. In Proc. of the 18th Conf. on Information Modelling and Knowledge Bases, pp. 50–61, 2007.

Li, B., He, J., Huang, J., and Shi, Y. A survey on image steganography and steganalysis. Journal of Information Hiding and Multimedia Signal Processing, 2011.

Li, B., Wang, M., Huang, J., and Li, X. A new cost function for spatial image steganography. In 2014 IEEE Int. Conf. on Image Processing (ICIP), pp. 4206–4210, Oct 2014. doi: 10.1109/ICIP.2014.7025854.

Lin, T., Maire, M., Belongie, S. J., Bourdev, L. D., Girshick, R. B., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. Microsoft COCO: Common objects in context. CoRR, abs/1405.0312, 2014.

Maheswari, S. U. and Hemanth, D. J. Frequency domain QR code based image steganography using Fresnelet transform. AEU - International Journal of Electronics and Communications, 69(2):539–544, 2015. doi: 10.1016/j.aeue.2014.11.004.

Pevný, T., Filler, T., and Bas, P. Using high-dimensional image models to perform highly undetectable steganography. In Information Hiding, 2010.

Reed, I. S. and Solomon, G. Polynomial codes over certain finite fields. Journal of the Society for Industrial and Applied Mathematics, 8(2):300–304, 1960.

Srinivasan, Y., Nutter, B., Mitra, S., Phillips, B., and Ferris, D. Secure transmission of medical records using high capacity steganography. In Proc. of the 17th IEEE Symposium on Computer-Based Medical Systems, pp. 122–127, June 2004. doi: 10.1109/CBMS.2004.1311702.

Tang, W., Tan, S., Li, B., and Huang, J. Automatic steganographic distortion learning using a generative adversarial network. IEEE Signal Processing Letters, 24(10):1547–1551, Oct 2017. doi: 10.1109/LSP.2017.2745572.

Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. on Image Processing, 13(4):600–612, April 2004. doi: 10.1109/TIP.2003.819861.

Ye, J., Ni, J., and Yi, Y. Deep learning hierarchical representations for image steganalysis. IEEE Trans. on Information Forensics and Security, 12(11):2545–2557, Nov 2017. doi: 10.1109/TIFS.2017.2710946.