Information hiding in digital images: Watermarking and steganography
INFORMATION HIDING IN DIGITAL IMAGES: WATERMARKING AND STEGANOGRAPHY

by Po-Chyi Su

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

May 2003

Copyright 2003 Po-Chyi Su

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

UMI Number: 3103970

UMI Microform 3103970. Copyright 2003 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code. ProQuest Information and Learning Company, 300 North Zeeb Road, P.O. Box 1346, Ann Arbor, MI 48106-1346

UNIVERSITY OF SOUTHERN CALIFORNIA, THE GRADUATE SCHOOL, UNIVERSITY PARK, LOS ANGELES, CALIFORNIA 90089-1695

This dissertation, written by PO-CHYI SU under the direction of h... dissertation committee, and approved by all its members, has been presented to and accepted by the Director of Graduate and Professional Programs, in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY.

Date: May 16, 2003

Dedication

Dedicated with love to my family.

Acknowledgements

I would like to take this opportunity to express my deepest gratitude to my dissertation advisor, Prof. C.-C. Jay Kuo, for bringing me into the exciting field of multimedia signal processing and for his consistent instruction and encouragement. I have learned a lot from his abundant knowledge, solid attitude and endless energy in doing research. The guidance under Prof.
Kuo in these years helps me solve challenging problems and, most importantly, will benefit my entire career. I am grateful to Prof. Antonio Ortega and Prof. Ming-Deh Huang for taking their valuable time and effort to serve on both my qualifying examination and dissertation committees. Their rich research experience significantly improved the quality of the thesis. Also, I would like to thank Prof. Zhen Zhang and Prof. Jong Won Kim for serving on my qualifying examination committee and for their constructive comments and suggestions.

I take pleasure in thanking all my colleagues in Prof. Kuo’s research group. Working with them broadened my scope of research and made my stay at the University of Southern California interesting. Special thanks are extended to Dr. Houng-Jyh Wang. The several papers on digital watermarking that I worked on with him have been very helpful to my doctoral study. His expertise in multimedia compression and processing deserves a lot of credit. And I would like to thank my colleague, Chung-Ping Wu. The fruitful discussions with him on multimedia security issues clarified many important aspects of my research.

Finally, nothing is enough to show my appreciation to my family in Taiwan. I especially thank my parents and my brother for everything they have given me. It is their continuous support that has made it possible for me to achieve one of my most important milestones. Thank you all very much.

Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
1 Introduction
  1.1 Significance of the Research
  1.2 Contributions of the Research
    1.2.1 Information Hiding in JPEG-2000 Compressed Images
    1.2.2 Towards Affine-Invariant Digital Image Watermarking
  1.3 Organization of the Dissertation
2 Overview of Information Hiding in Digital Images
  2.1 General Concept of Information Hiding
    2.1.1 Information Hiding, Watermarking and Steganography
    2.1.2 Generic Model of Information Hiding in Digital Images
    2.1.3 Applications of Information Hiding
    2.1.4 Requirements of Information Hiding
  2.2 Approaches of Information Hiding in Digital Images
    2.2.1 Information Hiding with the Least Significant Bit Modification
    2.2.2 Information Hiding by Changing the Statistics
    2.2.3 Information Hiding in the Frequency Domain
    2.2.4 Spread Spectrum Watermarking
    2.2.5 Other Information Hiding Algorithms
3 Information Hiding in JPEG-2000 Compressed Images
  3.1 Brief Review of JPEG-2000
    3.1.1 Forward/Inverse Image Transform
    3.1.2 Quantization/Dequantization
    3.1.3 Tier-1 Coding
    3.1.4 Tier-2 Coding
    3.1.5 Rate Control
    3.1.6 Region of Interest Coding
  3.2 Digital Watermarking in JPEG-2000 Compressed Images
    3.2.1 Framework for Watermarking
    3.2.2 Watermark Embedding
    3.2.3 Watermark Detection
    3.2.4 Progressive Watermark Detection
    3.2.5 Region of Interest Watermark
    3.2.6 Threshold Decision and Analysis
    3.2.7 Experimental Results
  3.3 Steganography in JPEG-2000 Compressed Images
    3.3.1 Challenges of Steganography in JPEG-2000
    3.3.2 Progressive Embedding of a Hidden Image and Its Drawbacks
    3.3.3 Information Embedding with Lazy Mode Coding
    3.3.4 Selection of Refinement Passes for Embedding
    3.3.5 Issues on Backward Embedding
    3.3.6 Steganalysis of the Proposed Information-Hiding Scheme
    3.3.7 Experimental Results
  3.4 Conclusion
4 Towards Affine-Invariant Digital Image Watermarking
  4.1 Discrete Fourier Transform of Images
  4.2 Previous Work on the Digital Watermark Resilient to Geometrical Attacks
    4.2.1 Watermarking in RST Invariant Domain
    4.2.2 Embedding Template for Registration
    4.2.3 Self-Reference Scheme by Autocorrelation
  4.3 Structural Grid Signals for Synchronized Watermark Detection
    4.3.1 Construction of the Grid Pattern
    4.3.2 Grid Signal Embedding
    4.3.3 Grid Signal Detection
  4.4 Spatial-Frequency Composite Watermarking Scheme
    4.4.1 Frequency-Domain Watermark Embedding
    4.4.2 Spatial-Domain Watermark Embedding
    4.4.3 Spatial-Domain Watermark Detection
    4.4.4 Frequency-Domain Watermark Detection
    4.4.5 Experimental Results
  4.5 Perceptual Block-based DCT Watermarking Resisting Affine Attacks
    4.5.1 Watson’s Perceptual Model
    4.5.2 Structure of the Proposed Scheme
    4.5.3 Signal Embedding
    4.5.4 Signal Detection
    4.5.5 Experimental Results
  4.6 Comments on Grid Signal Embedding/Detection
    4.6.1 Other Applications of Grid Embedding
    4.6.2 Robustness against Stirmark Random Geometrical Distortion
    4.6.3 The Limit of Grid Embedding for Digital Watermarking
  4.7 Conclusion
5 Future Work and Conclusion
  5.1 Future Work
    5.1.1 System Refinement of the Proposed Algorithms
    5.1.2 Development of Video Watermarking
    5.1.3 Universal Watermark Detector
  5.2 Conclusion
Reference List

List of Tables

  3.1 The number of false positive detections measured from 100000 tested watermark sequences vs. the estimated number of false positive detections based on the Gaussian assumption.
  3.2 Performance comparison using the lazy and the normal modes (PSNR in dB).
  4.1 The correlation responses of different cropping attacks.
  4.2 Correlation responses of rotation, scaling and rotation/scaling attacks.

List of Figures

  2.1 The information hiding system.
  2.2 (a) The model of a communication channel and (b) the precise watermarking model.
  3.1 The block diagram of JPEG-2000.
  3.2 (a) The image to be compressed, (b) the contour of ROI, (c) the spatial-domain ROI mask, and (d) the wavelet-domain ROI mask.
  3.3 Extra correlation gain from coefficient selection.
  3.4 Original images vs. compressed/watermarked images: (a) original “Bike,” (b) watermarked “Bike” (PSNR: 32.49 dB), (c) original “Woman” and (d) watermarked “Woman” (PSNR: 33.09 dB). Both images are of size 2048 x 2560 and are compressed at 0.5 bpp.
  3.5 Watermark detection results for (a) “Bike” and (b) “Woman” with 1000 watermark sequences tested.
  3.6 Progressive watermark detection: (a) watermark detection by using the progressive mode and (b) watermark detection of the image at a bit rate of 0.01 bpp.
  3.7 ROI watermarking: (a) the fully reconstructed image from the ROI coding bit-stream, (b) the decoded image with a bit rate of 0.4 bpp, (c) the spatial difference of these two images (the lighter the pixel is, the larger the difference; the two black rectangles are the assigned ROI) and (d) the watermark detection result.
  3.8 Postprocessing: (a) part of the “Bike” image before postprocessing, (b) part of the “Bike” image after postprocessing, in which the ringing artifacts are greatly reduced and (c) the watermark detection result.
  3.9 Compression attacks: (a) the watermarked “Woman” image compressed by JPEG with quality factor equal to 1 (PSNR: 22.61 dB), (b) the watermark detection result of the JPEG-attacked image, (c) the watermarked “Woman” image compressed with SPIHT at a bit rate of 0.005 bpp (PSNR: 22.50 dB) and (d) the watermark detection result of the SPIHT-attacked image.
  3.10 Color reduction attacks: (a) the attacked image with 4 colors and (c) the image undergoing halftoning; (b) and (d) are the detection responses of (a) and (c), respectively.
  3.11 Images in extensive watermark testing: (a) the original “Lena,” (b) the compressed/watermarked “Lena” (bit-rate: 0.35 bpp, PSNR: 34.44 dB), (c) the original “Baboon” and (d) the compressed/watermarked “Baboon” (bit-rate: 2 bpp, PSNR: 34.11 dB).
  3.12 Detection results of extensive watermark testing on the watermarked images, (a) “Lena” and (b) “Baboon.”
  3.13 Additional MSE estimation of (a) Lena, (b) Boat, (c) Peppers and (d) Baboon.
  3.14 Capacity vs. additional MSE of the composite image.
  3.15 PSNR of the composite image vs. PSNR of the hidden image.
  4.1 The diagram of a watermarking scheme built on the RST invariant domain.
  4.2 (a) The Lena image and (b) the transformed Lena image with the log-polar mapping.
  4.3 Multiple embedding of the same registration pattern, where black circles are the initial pattern, circles with lines are the horizontally shifted pattern, circles with dots are the vertically shifted pattern and gray circles are the horizontally and vertically shifted pattern.
  4.4 Autocorrelation of the watermarked image.
  4.5 (a) The original Lena image, (b) the geometrically attacked Lena image, (c) the Lena image imposed with a grid and (d) the geometrically attacked Lena image with a grid imposed.
  4.6 The autocorrelation function of the estimated grid structure in (a) the watermarked image, (b) the rotated watermarked image and (c) the scaled watermarked image.
  4.7 The block diagram of the spatial-frequency composite watermarking scheme.
  4.8 (a) The original image of size 352 x 288 and (b) the padded image of size 512 x 512 for watermark embedding.
  4.9 (a) The Lena image, (b) magnitudes of the DFT coefficients of the Lena image and (c) the region chosen for watermarking.
  4.10 The embedding of a stronger watermark in an image with varying characteristics at different locations: (a) the original “Statue of Liberty” and (b) the watermarked “Statue of Liberty.”
  4.11 Spatial-domain watermark embedding.
  4.12 DFT magnitudes of the composite watermarked image.
  4.13 The Fourier spectrum of an image filtered by (a) H_1 and (b) H_2.
  4.14 (a) The Lena image with black boundaries, (b) the autocorrelation function of the H_1 filtered image and (c) the autocorrelation function of the H_2 filtered image.
  4.15 (a) The original Lena image, (b) the image rotated by 15 degrees and cropped, (c) the recovered image by filling the background with zeros, (d) DFT magnitudes of the zero-padded recovered image, (e) the recovered image by filling the background with the mean value and (f) DFT magnitudes of the mean-filled recovered image.
  4.16 (a) The original “Lena” and (b) the watermarked “Lena.”
  4.17 Robustness test of JPEG compression.
  4.18 (a) Shearing the image by 5% in both horizontal and vertical directions, (b) the change of the aspect ratio by scaling the height by 0.8, (c) removing 17 rows and 5 columns, and (d) linear transformation with a_00 = 1.007, a_01 = -0.01, a_10 = -0.01 and a_11 = 1.012, where a_00, a_01, a_10 and a_11 are the four parameters of the affine matrix as defined in (4.21).
  4.19 Extensive tests for (a) the PSNR value and (b) the false positive rate analysis.
  4.20 The watermarking scheme with grid embedding and detection.
  4.21 Signal embedding of the proposed block-based scheme.
  4.22 The watermark is embedded in the shaded positions while the grid signal exists only in the other positions.
  4.23 Embedding of the watermark sequence. “X” represents the pivot. The watermark sequence is embedded starting from the pivot in an inside-out manner.
  4.24 The cross-correlation function of the folded grid and the embedded pattern in (a) the watermarked image without cropping and (b) the watermarked image with cropping.
  4.25 (a) The original Lena image and (b) the watermarked Lena image with a PSNR value of 37 dB.
  4.26 The robustness test of cropping.
  4.27 Robustness test of scaling and rotation.
  4.28 Aerial images taken from the same scene with different directions and heights.
  4.29 (a) Lena with the grid and (b) Lena with the grid after Stirmark random geometrical distortion.

Abstract

Information hiding in multimedia data has drawn a lot of attention in recent years. A message is hidden in digital images, video and audio in an imperceptible manner so as not to interfere with the normal usage of these host media files, while its existence can be helpful for some interesting applications. The hidden information related to content protection is usually termed a digital watermark. The recent proliferation of information-hiding research mostly comes from the development of digital watermarking techniques to meet the pressing need for content protection. The other interesting application is to transmit a large volume of data covertly in a multimedia file via information hiding. The objective in this application is to conceal the very existence of the hidden information, using the innocuous multimedia data as camouflage. Data hiding under this scenario is often termed steganography, which literally means “covered writing.”

In this research, we focus on information hiding in digital images. The dissertation consists of two major parts. In the first part, we investigate information hiding in the state-of-the-art still image codec, JPEG-2000. A joint compression/watermarking scheme is proposed for the purpose of copyright protection. Next, we consider steganography in JPEG-2000.
Algorithms are developed to ensure that a high volume of data can be hidden reliably in JPEG-2000 compressed images. In the second part, we aim at solving a very challenging problem in digital image watermarking, i.e., synchronized detection under geometrical modifications. In order to derive a robust watermarking scheme resilient to affine transformations, we propose to embed a structural grid signal into digital images for synchronized watermark detection. A spatial-frequency composite watermarking scheme is designed so that the watermark can survive attacks such as rotation, scaling and cropping. A perceptual watermarking scheme using block-based DCT methodology is also developed. It turns out that grid embedding/detection is of great help in developing generalized block-based watermarking as well. Finally, the dissertation ends with some concluding remarks and some future research topics as an extension of our current research.

Chapter 1

Introduction

1.1 Significance of the Research

The rapid growth of computational facilities and the wide availability of network access lead to efficient digital data processing, delivery and storage. Superior compression technology further helps to compact high-quality digital content into a smaller data stream to facilitate its transmission and manipulation. In this multimedia era, people can not only enjoy all kinds of recreation, such as watching digital video, listening to digital audio and reading electronic publications, but also process the digitized multimedia data easily thanks to advanced utilities. Creation, authoring and dissemination of multimedia content can now be carried out by ordinary users instead of being limited to a few professionals. These advantages definitely benefit our daily life but raise certain concerns.
First of all, unlike traditional analog copying, in which the quality of the duplicated content is degraded, a large number of perfect digital copies can be reproduced in a short period of time. Content providers could be reluctant to distribute their work in digital format, since unlimited copying of digital data by users without paying any royalty will cause them a considerable amount of financial loss. The shortage of media content will eventually hinder the progress of multimedia technologies. Protection of content owners’ intellectual property rights has to be enforced to ensure that the owners receive what their hard work and creativity deserve. Secondly, as great amounts of multimedia data are stored in digital format, some digital data, such as digital photos or surveillance video, could be used as legal documents. However, a digital document can be tampered with or forged easily, and the vulnerability of digital data may invalidate their legitimate usage as evidence in court. Thus, the authenticity of digital data has to be taken seriously. That is, in such forensic cases, we have to make sure that the digital document is authentic and that its information content is not modified in delivery to its destination. Therefore, there is a pressing need to develop effective ways to deter users from illegally reproducing or misusing digitized data.

Cryptography is a classical method to prevent digital data from unauthorized access. Digital data are transformed by the encryption process so that their meaning becomes obscure to a person who intercepts the data but does not have a key for decrypting them. For example, the content owner can encrypt either a compressed video bit-stream or each of the video frames with a secret key.
Users without the proper decryption key cannot correctly decompress the bit-stream (or the expanded video is not viewable) since video frames are scrambled in the encryption process. However, such protection may not be enough for multimedia data, since the content is no longer protected after the intended receiver decrypts the video. The intended receiver may not follow the legitimate usage of these multimedia data that he or she agreed to and may distribute them widely through high-speed networks or other convenient channels to make illegal profit. In other words, cryptography only helps protect the digital content during its transmission. To compensate for this deficiency of cryptography for multimedia data, researchers have considered and developed information-hiding techniques that embed a signal in digital media to convey the information of interest. The embedded signal is known as a digital watermark. Like traditional watermarks in paper, a digital watermark in a document does not interfere with or severely degrade the quality of the content. Although a digital watermark is generally invisible or inaudible to human beings, it can be detected or extracted through computation. Instead of being inserted in the header or the tail of the data, where it could be removed easily, a digital watermark is embedded in the content directly. The embedded watermark may remain in the media even after normal data processing or format conversion. It is this tight combination of the embedded signal and the carrying media content that makes digital watermarking a promising technique to protect multimedia content. For example, the hidden watermark in a digital document can act as a proof of authorship and/or identify the source of the data.
The presence of the owner’s unique watermark in an investigated document from an unauthorized possessor can prove the misuse of the document. The digital watermark may also help verify the originality of content, i.e., identify whether a suspected document is an original copy. Once the content is manipulated, the embedded watermark will be destroyed, so that the authenticator can examine the existence of the watermark to show the integrity of the content. The other interesting application is point-to-point secret communication between trusting parties to convey or exchange secure information via information hiding. The secret messages are hidden within other seemingly innocuous host media data to achieve covert communication. This application is also termed “steganography,” which has long drawn tremendous interest in human history and may be applied in military-related scenarios.

Based on the above discussion, we can see that research on information-hiding techniques is significant in many applications. The designer should take the different characteristics of the hidden information into account to achieve specific objectives.

1.2 Contributions of the Research

In this research, we concentrate on two important issues of information hiding in digital images. The first issue is related to information hiding in JPEG-2000 [33], [84], [12], [1]. Along this direction, the following contributions have been made.

• We developed a robust watermarking scheme under the framework of this upcoming still image compression standard.
• We studied steganographic applications, i.e., covert communication, in JPEG-2000 by hiding a large volume of information in the JPEG-2000 compressed bit-stream.
The second issue is related to robustness of a digital watermark against geometrical attacks, which is one of the most challenging problems in digital image watermarking. Along this direction, this research has the following contributions.

• We proposed to embed structural grid signals into digital images to tackle the synchronization problem.
• A spatial-frequency composite watermarking scheme is thus derived to enable the embedded digital watermark to survive generalized geometrical modifications.
• It was shown that the algorithms of grid embedding and detection also work well in block-based DCT watermarking approaches, which are widely proposed and used in this field.
• We evaluated the performance of each system by testing its resilience in various situations.

More details of the contributions are given below.

1.2.1 Information Hiding in JPEG-2000 Compressed Images

JPEG-2000 is a new still image coding standard. It is intended as the successor of the existing JPEG standard in many important areas. In addition to its superior coding performance in both low and high bit-rate compression applications, JPEG-2000 has numerous other interesting features, including progressive recovery of an image by fidelity or resolution, Region of Interest (ROI) coding, whereby different parts of an image can be coded with different fidelity, random access to particular regions of an image without the need of decoding the entire coded stream, and good resilience to bit-errors. Besides, JPEG-2000 allows efficient lossy and lossless compression within a single unified coding framework. The state-of-the-art image codec is also designed to avoid excessive computational and memory complexity. It is believed that JPEG-2000 will be used widely and its rich features and functionalities will benefit many emerging applications.
As many images will be compressed by JPEG-2000 in the near future, it is worthwhile to investigate efficient information-hiding techniques in JPEG-2000 compressed images. Therefore, we carry out pioneering research on this subject.

1.2.1.1 Robust Digital Watermarking in JPEG-2000

First of all, we develop an integrated approach to JPEG-2000 and digital watermarking [79, 80, 72]. By integrating the watermarking technique with JPEG-2000, the proposed watermark embedding and retrieval system is more efficient than the existing watermarking schemes. The watermark is embedded in the discrete host image coefficients by examining bit-planes. A binary (or bi-polar) watermark sequence is used, and the resulting watermarked image coefficients also take only discrete values. Therefore, both watermark embedding and detection occur directly in the compressed domain. The integrated scheme eliminates both the need to compress the host image after watermark embedding and the need to decompress the watermarked image before watermark detection.

In addition to efficiency, the proposed scheme has many interesting features. The embedded watermark is robust against various signal processing attacks, including compression and filtering, while the resulting watermarked image maintains good perceptual quality. The embedded watermark can be detected progressively, so that an operation enabled via watermark detection, such as “never copy,” can be enforced earlier without waiting for the whole image to be downloaded. Furthermore, ROI watermarking can be easily coupled with ROI coding in the proposed scheme. In this case, when we receive the ROI without the insignificant background, the watermark can still be detected without ambiguity.
The embedded watermark can be detected without knowledge of the original image, so it is a “blind” watermarking scheme. Experimental results show that the proposed integrated JPEG-2000 watermarking performs very well and supports all of the above claims.

1.2.1.2 High-Volume Information Hiding in JPEG-2000

Next, we consider steganography in digital images compressed by JPEG-2000. The objective is to develop a data hiding scheme under the framework of JPEG-2000 so that a large amount of information can be secretly transmitted to the intended recipient in a reliable fashion. Therefore, instead of focusing on robustness as done in digital watermarking for the copyright protection purpose, we take capacity and reliability into more serious consideration in high-volume information hiding.

However, the coding structure of JPEG-2000 limits this upcoming standard from being a reliable host for information hiding. We analyze the problem by examining the JPEG-2000 coding flow and determine appropriate positions for effective data hiding. Practical schemes are then proposed to embed a large amount of data into the JPEG-2000 compressed bit-stream in a reliable manner [76]. Several design issues are examined to help achieve a better balance among complexity, capacity and visual quality. Experimental results are given to demonstrate the performance of the proposed algorithms.

1.2.2 Towards Affine-Invariant Digital Image Watermarking

Most existing watermarking schemes are based on the additive spread-spectrum method because of its robustness against noise and distortions. However, most spread-spectrum watermarking methods fail to detect the watermark when the watermarked image undergoes geometrical modifications such as cropping, rotation, scaling, or even a change of the aspect ratio.
These operations are accessible to most casual users and can be applied at a low computational cost. Since images after geometrical change usually preserve the perceptual quality of the original image, a practical watermarking scheme must be robust against geometrical attacks.

The main reason for the failure of spread-spectrum watermarking under geometrical modification is the loss of synchronization between the watermark detector and the embedded watermark. Spread-spectrum watermarking methods adopt the matched filter (also called the correlation detector) to detect the watermark. It determines the existence of a certain watermark by calculating its similarity or correlation with the extracted signal. Since the possibly existing watermark is hidden in very strong noise, i.e., the image content, it is very difficult for the watermark detector to predict the correct position of the hidden watermark when the image is cropped, scaled or rotated. Exhaustive searching is computationally infeasible, especially for two-dimensional digital images. Although the searching process could be simplified if the original image is available during watermark detection and some known pattern recognition or registration processes could be applied, the performance is still not satisfactory because very precise synchronization is usually required for watermark detection. Furthermore, cropping or scaling can cause information loss of the watermark so that the embedded information may not be correctly determined.

We tackle the synchronization problem by structural grid embedding. The idea is motivated by the algorithm proposed by Kutter [40]. With different underlying watermark signals being used, we develop two watermarking schemes to resist affine modifications.
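The loss of synchronization described above can be illustrated with a minimal sketch of additive spread-spectrum embedding and correlation detection. This is not the scheme developed in this dissertation. The function names, the embedding strength `alpha`, the decision threshold and the Gaussian stand-in for the host image are all assumptions made for illustration, and a cyclic one-pixel shift stands in for a geometrical modification.

```python
import numpy as np

def make_watermark(key, shape):
    """Key-dependent bipolar (+1/-1) pseudo-random pattern (illustrative)."""
    rng = np.random.default_rng(key)
    return rng.choice([-1.0, 1.0], size=shape)

def embed(host, key, alpha=2.0):
    """Additive spread-spectrum embedding: marked = host + alpha * w."""
    return host + alpha * make_watermark(key, host.shape)

def correlate(image, key):
    """Correlation (matched-filter) response; blind, no original image needed."""
    w = make_watermark(key, image.shape)
    return float(np.mean((image - image.mean()) * w))

rng = np.random.default_rng(0)
host = rng.normal(128.0, 20.0, size=(128, 128))  # stand-in for image content
marked = embed(host, key=1234)

threshold = 1.0  # hypothetical decision threshold (half of alpha)
in_sync = correlate(marked, key=1234)                      # detector aligned
shifted = correlate(np.roll(marked, 1, axis=1), key=1234)  # one-pixel misalignment
unmarked = correlate(host, key=1234)                       # no watermark present

print(f"aligned: {in_sync:.2f}  shifted: {shifted:.2f}  unmarked: {unmarked:.2f}")
```

In this sketch the aligned response stays close to `alpha`, while a shift of a single pixel drops the response to the same level as an unmarked image, because the pseudo-random pattern is uncorrelated with shifted copies of itself.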
1.2.2.1 Spatial-Frequency Composite Watermarking

We first propose a spatial-frequency composite watermarking scheme [73, 74] to resist cropping attacks and generalized affine transformations. Two signals are embedded into a digital image, one in the frequency domain, or more specifically the Fourier domain, and the other in the spatial domain of the image. The signal embedded in the Fourier domain contains the desired hidden information, i.e., the digital watermark that will be carried with the host image. The signal embedded in the spatial domain, i.e., the grid signal, is used to achieve self-registration of the investigated image with the original image. The self-registration process can convert an image to its original orientation and scale without explicitly resorting to the original host image or the watermarked image. After the registration process as a result of spatial-domain grid signal detection, the hidden information can then be successfully determined from the registered image by detecting the frequency-domain watermark. Both detection processes, for the spatial-domain grid and for the frequency-domain watermark, are blind (i.e., they do not require the original image). Experimental results will show that the embedded watermark survives a combination of manipulations applied to the watermarked images.

1.2.2.2 Perceptual Block-based DCT Watermarking

Block-based watermarking is a general type of scheme in which the image is divided into blocks for watermark embedding and detection. A clear benefit is that local visual masking effects can be incorporated into the watermarking system. With divided blocks and their local statistics, a good balance between robustness and invisibility of the watermark can be achieved.
In blocks with more activity, a stronger watermark can be embedded without being noticed, and the watermark can resist more serious attacks as well.

It should be noted that the block-based Discrete Cosine Transform (DCT) is utilized in many existing watermarking schemes. By decomposing the image into several frequency bands using DCT, we can embed the watermark in the significant frequency coefficients to attain a robust watermark. The watermarked image is then formed by applying the inverse DCT to each image block. Besides, block-based DCT is essential in image/video compression, e.g., JPEG and MPEG-1/2/4. The sensitivity of the human visual system to DCT basis functions has thus been extensively studied, resulting in a recommended quantization table for JPEG [55, 91]. It is advantageous to use the visual model within this framework in block-based watermarking systems to reduce the impact of quality degradation and, at the same time, to make the watermark better survive JPEG compression attacks. Real-time applications could also be achieved by applying watermarking in the same domain as image compression. Moreover, spatial division can be used to increase the amount of the embedded watermark, i.e., groups of blocks are embedded with different information so that multiple bits can be carried by a single image. Security of the system can also be increased by scrambling the order of the blocks so that the watermark cannot be detected correctly without the correct descrambling process.

Existing research on block-based watermarking focuses on robustness against filtering/compression attacks and on perceptual issues. However, as mentioned earlier, geometrical attacks are very difficult to resist, especially for block-based methods.
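The block-based embedding path described above (forward DCT per block, modify a coefficient, inverse DCT per block) can be sketched as follows. This is only an illustrative toy, not the perceptual algorithm adopted later in this work: the coefficient position `POS`, the strength `alpha`, the threshold and the Gaussian stand-in host are assumptions, no visual model is applied, and a key-seeded sign per block plays the role of the secret needed at the detector.

```python
import numpy as np

B = 8          # block size, as in JPEG
POS = (2, 1)   # hypothetical coefficient position that carries the mark

def dct_matrix(n=B):
    """Orthonormal DCT-II matrix C, so that coeffs = C @ block @ C.T."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

C = dct_matrix()

def block_dct(img):
    """Split an image into B x B blocks and transform each block."""
    h, w = img.shape
    blocks = img.reshape(h // B, B, w // B, B).swapaxes(1, 2)
    return C @ blocks @ C.T            # shape: (rows, cols, B, B)

def block_idct(coeffs):
    """Inverse transform each block and reassemble the image."""
    blocks = C.T @ coeffs @ C
    r, c = blocks.shape[:2]
    return blocks.swapaxes(1, 2).reshape(r * B, c * B)

def block_signs(key, grid_shape):
    """One key-dependent +1/-1 value per block (illustrative)."""
    return np.random.default_rng(key).choice([-1.0, 1.0], size=grid_shape)

def embed(img, key, alpha=8.0):
    coeffs = block_dct(img)
    coeffs[..., POS[0], POS[1]] += alpha * block_signs(key, coeffs.shape[:2])
    return block_idct(coeffs)

def detect(img, key):
    coeffs = block_dct(img)
    w = block_signs(key, coeffs.shape[:2])
    return float(np.mean(coeffs[..., POS[0], POS[1]] * w))

rng = np.random.default_rng(0)
host = rng.normal(128.0, 20.0, size=(256, 256))   # stand-in for an image
marked = embed(host, key=7)

threshold = 4.0                       # hypothetical threshold (half of alpha)
right_key = detect(marked, key=7)
wrong_key = detect(marked, key=8)
no_mark = detect(host, key=7)
mse = float(np.mean((marked - host) ** 2))
print(f"right key: {right_key:.2f}  wrong key: {wrong_key:.2f}  unmarked: {no_mark:.2f}")
```

Because the transform is orthonormal, changing one coefficient per block by `alpha` gives a mean squared error of `alpha**2 / 64` per pixel, which makes the distortion budget easy to reason about in this sketch. Pixel values are kept as floating point here; rounding and clipping to 8 bits are omitted.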
The geometrical attack will seriously limit the usage and applications of block-based watermarking techniques. In block-based DCT watermarking schemes, the detector first has to determine the correct position at which to calculate the DCT. Several trials of DCT would be needed even when only simple cropping attacks are applied to the investigated image. Moreover, if the image undergoes rotation and scaling, watermark detection becomes a very difficult task if no extra information or mechanisms are supplied to achieve synchronization.

To compensate for this deficiency of block-based watermarking schemes, we extend our research on composite watermarking to robust and perceptual watermarking using block-based DCT [75]. Again, two kinds of signals are embedded into the image: the block-based watermark for carrying the hidden information and the grid signal for self-registration. Both embedding processes take human visual effects into consideration, and the interference between the two signals is minimized. The complete watermark embedding/detection structures and experimental results will demonstrate the advantages of the proposed algorithm.

1.3 Organization of the Dissertation

The content of the dissertation is organized as follows. In Chapter 2, we describe the general framework of watermarking systems, followed by a discussion of some well-known early watermarking methods. In Chapter 3, we investigate information hiding in JPEG-2000. There are two parts in this chapter, i.e., digital watermarking and covert communication in JPEG-2000. A brief review of JPEG-2000 compression will be presented first. Next, we will propose the detailed algorithm for integrating watermarking procedures with JPEG-2000 compression. The resulting system is theoretically analyzed and empirically tested.
High-volume information hiding in JPEG-2000 compressed images is then examined. Effective algorithms are proposed to achieve reliable covert communication. In Chapter 4, we tackle the synchronization problem to achieve digital watermarking schemes resisting generalized geometrical modifications. We illustrate some existing methods to cope with geometrical attacks and discuss advantages and limitations of these algorithms. Then, we propose our own solution, which makes use of structural grid signals to achieve synchronized watermark detection. We propose a novel spatial-frequency composite watermarking scheme, which shows an impressive overall performance and is resilient to generalized affine transformation. Then we apply the similar idea to block-based watermarking schemes. After discussing the properties of block-based DCT watermarking methods, we adopt a perceptual block-based DCT watermarking algorithm as the baseline scheme and enable it to survive affine transform attacks using structural grid signal embedding. Finally, possible extensions of the current research and concluding remarks of the dissertation are given in Chapter 5.

Chapter 2

Overview of Information Hiding in Digital Images

2.1 General Concept of Information Hiding

2.1.1 Information Hiding, Watermarking and Steganography

At the beginning, we would like to clarify the definitions of information hiding, watermarking and steganography. Although these three terms share many similarities and could even be interchangeable in some literature, certain fundamental differences lead us to define them as follows.
Information hiding is a general practice encompassing a broad range of applications in which messages are embedded into other media content for varying purposes. Watermarking and steganography are two types of information hiding. Steganography, which is derived from the Greek words meaning covered writing, hides the secret message in an innocuous host content to achieve covert communication. In order to act as a successful camouflage to conceal the very existence of the secret message, the host media content is usually chosen to have nothing to do with the hidden information. Similar to steganography, watermarking is also a procedure of imperceptibly embedding information, i.e., a digital watermark, into the content. However, a digital watermark usually represents the ownership of the content, the identity of the legitimate content user or other information used to help protect the host content. In other words, there exists a strong relationship between the embedded digital watermark and the host content. Besides, in order to achieve the intended functions, the existence of a digital watermark is usually known to the users, in contrast to the fact that the hidden information in steganography is kept secret from the public. Therefore, the dependency between the host media content and the hidden information differentiates digital watermarking from steganography.

2.1.2 Generic Model of Information Hiding in Digital Images

Fig. 2.1 illustrates the generic model of information hiding systems for digital images. In the embedding process, the input data include the original image, the hidden information and a key for security, and the output is the composite image, or the watermarked image in applications of digital watermarking. The original image can be in compressed or uncompressed format.
Many schemes work in the uncompressed format to tie the hidden information more closely to the image content, while operating in the compressed format may increase the efficiency of implementation. The hidden information can be text, a sequence of numbers, a binary logo or even a gray-level image. The key, which can be public or private depending on who has the access privilege to the hidden information, is used to increase the security of the system. The key can also be viewed as an identification number of the legitimate user. In the detection process, the input data are the image of interest and the key. The original image may or may not be required in the detection process. The output is either the hidden information itself or some kind of confidence measurement of its existence. In the latter case, the targeted hidden information may be used by the detector for comparison.

Figure 2.1: The information hiding system. (Embedding process: original image and hidden information in, composite image out. Detection process: investigated image, and optionally the original image or hidden information, in; hidden information or confidence measurement out.)

In digital watermarking, we can further adopt a communication viewpoint to describe the model in a more precise way. Fig. 2.2(a) shows a classical communication model. In communication, the message to be transmitted is first encoded. The encoding process transforms the data stream for error correction and/or frequency spreading purposes. The encoded message is then used to modulate a carrier signal and transmitted through the channel, where it encounters additive noise. The receiver demodulates the noisy signal to a coded message. Finally, this coded message is decoded to produce the received message. One way to express a watermarking model is to apply the communication model directly.
The media to be watermarked is viewed as the additive noise. Watermarking then becomes a weak signal detection problem because the watermark extractor/decoder needs to retrieve the watermark from a very noisy channel.

A more delicate way to delineate a watermarking model is shown in Fig. 2.2(b). The modulation step is replaced by the step of embedding the encoded message into some media content, and the demodulation step is replaced by the step of extracting the watermark from the received signal. The noise in the transmission channel results from various media processing procedures such as compression, filtering or malicious attacks aimed at destroying the message. Note that there exists a second receiver in the watermarking model, i.e., the human perceptual system, which should receive a message that is perceived to be the same as the carrier media content. This fidelity constraint is analogous to the signal power constraint in a communication channel.

Figure 2.2: (a) The model of a communication channel (message, encoder, modulator, transmission noise, demodulator, decoder, message) and (b) the precise watermarking model (message, encoder, watermark embedder with the media to be watermarked, distortion, watermark extractor, decoder, message; the perceived media reaches the human perceptual system through the display).

2.1.3 Applications of Information Hiding

Several interesting applications may be achieved by using the techniques of information hiding. We briefly mention some of the applications that have been considered.

1. Copyright Protection
The pressing need to protect intellectual property rights is the main driving force of the research in information hiding.
A digital watermark representing the copyright information can be embedded into the media data so that the owners can claim their ownership of the data in court by extracting the watermark unambiguously if someone infringes on their copyright.

2. Fingerprinting
The application of fingerprinting works in a slightly different way to protect the media content. To avoid unauthorized duplication and distribution of the multimedia content, an author can embed a distinct label as a fingerprint into each copy of the data. If an unauthorized copy is found later, the origin of the copy can be traced by retrieving the fingerprint. Besides, if the fingerprint represents certain personal information of the customer, such as the credit card number, the original buyers will think twice before they distribute the received data to others because they will run the risk of spreading their own personal information around.

3. Authentication
Multimedia data in digital format are easy to modify and edit, but problems arise when the possibly tampered data are to be used for legal purposes or the authenticity of the data is important to the users. In such situations, the data must be credible, i.e., the information content in the signal is not modified in transit to its destination. Information hiding provides a tamper-proofing tool for digital content. Once the content is manipulated, the embedded signal or watermark will be affected or even destroyed, so that its status or existence can be used to verify the integrity of the content. Although authentication of media content can be achieved through conventional cryptographic techniques, the advantage of using information hiding is that the authenticator is inseparably bound to the content, which simplifies data handling.

4. Usage Control
In some applications in which the multimedia content needs special hardware for copying or viewing, a digital watermark can be used to control the usage, such as the permission to view, listen or record. If the media player or recorder detects illegal copies based on an unmatched watermark, it will refuse to play or record the digital data.

5. Covert communication
The nature of media data, such as images, audio and video, provides a good host for hiding high-volume information in steganographic applications. By using the innocuous host media data as a cover, we may fool possible eavesdroppers and communicate with the trusted party secretly. From a communication viewpoint, the host media can be seen as a secret channel. Obviously, covert communication may be of military use. Besides, some governments limit the usage of encryption services, and information hiding may be a way to bypass the restriction.

6. Broadcast monitoring
Broadcast monitoring is one of the potential applications that can be achieved by information hiding techniques. Advertisers, who pay television or radio stations for commercial advertisements, would like to ensure that those advertisements are broadcast as promised. The insertion and detection of the hidden signal may help them verify this without involving much human effort for monitoring. Broadcast monitoring via information hiding can be achieved by either watermarking or steganography. We may embed the signal in some particular segments of the broadcast channels, which are available but not used for content transmission. The drawback is that special equipment may be required for handling the additional signal.
Embedding a digital watermark in the media content as the controlled signal has the advantage of being fully compatible with the installed broadcast equipment. The primary disadvantage is its comparatively complicated embedding/detection process.

7. Annotation
The bits embedded into the media data may comprise an annotation, giving further information about the media content. For example, a photographic image could be labeled to describe the time and place the photograph was taken, a procedure that could be done automatically by the processor in a camera. In multimedia databases, a digital watermark may represent a serial number or an index for efficient management. Besides, a digital watermark can be a flag to indicate types and properties of the content, which may possibly be used to track pornography on the network. In medical applications, embedding the date and the patient’s name in medical images could be a useful safety measure.

2.1.4 Requirements of Information Hiding

Although each application mentioned above may have specific or different requirements, we list in the following some general requirements to which a designer of an information-hiding scheme should pay attention.

1. Unobtrusiveness
The hidden information or watermark must be embedded in a sophisticated way to avoid degrading the perceptual quality of the host media. The users should not sense the existence of the embedded information by viewing or listening to the composite or watermarked media. To achieve this, many information hiding techniques make use of certain human perceptual models in the embedding process. It should be noted that this requirement has to be fulfilled in all of the applications of information hiding.

2. Robustness
In order to achieve copyright protection, a robust watermarking scheme is required so that the embedded watermark can always remain in the multimedia data and survive all possible signal processing procedures that degrade the data only to the extent that the commercial value of the media is still maintained. These procedures are referred to as “watermark attacks.” They may be applied to the media content for the purposes of editing, storage or merely circumventing watermark detection. The attacks include but are not limited to compression, filtering, noise adding, geometrical modification and even anti-watermarking software. In data authentication, we need a fragile watermarking scheme instead, so that malicious attacks will destroy the embedded watermark. We can thus check the existence of the watermark to verify the integrity of the investigated media data. In particular, the watermark should be fragile to distortion used to change the image content, such as replacing a portion of the image with other objects. However, the fragile watermark should not be sensitive to compression or format conversion, which is used for storage or transmission purposes. Therefore, the challenge here is to provide a watermark that can distinguish between information-altering and simple signal-altering transformations.

3. Detection scenario
Detection of the hidden information can be done with or without the presence of the original media depending on the application. In digital watermarking, a watermark that requires the original media as a reference for detection is classified as a private watermark since the original media information is usually not available to the public. Therefore, the private watermark can only be detected by the content owner or a justified third party.
The detection of a public watermark does not resort to the original media, so it is possible for all users to verify the existence of the watermark. Watermark detection without resorting to the original media is also referred to as “blind” or “oblivious” watermark detection. Detection involving the original media is usually more robust, but its usage is restricted since it is not practical to search for the original data in a database whenever we would like to detect the watermark in suspected data. Therefore, “blind” watermark detection has become a requirement for most watermarking schemes. In steganography, blind detection is always required.

4. Capacity
By capacity, we refer to the ability to detect individual hidden messages with a low probability of error as the number of differently marked versions of the media increases. The amount of the embedded information must accommodate the application being considered. For example, the watermark may represent one bit of information to determine whether the digital content can be reproduced, or it may contain n bits of information to indicate the identification number of the intended recipient. In steganographic applications, the requirement of capacity or payload needs to be even higher. In general, information-hiding techniques should provide a way to insert as many distinguishable hidden messages as possible.

5. Low false positive rate
While the hidden information should be detected correctly and unambiguously, the rate of false detection of the hidden message should be as low as possible. In digital watermarking, false detection includes false negative detection and false positive detection. False negative detection means that the watermark exists in the media but the watermark detector fails to detect it.
False positive detection means that the watermark is falsely detected in unwatermarked media. Let us consider the case that an important ball game is broadcast and a lot of people would like to record it. However, a “never copy” watermark is falsely detected by the watermark detector in some consumers’ recording facilities and the recorders refuse to record the game. We can imagine how seriously the legitimate users will be affected by the false positive detection. Thus, the false positive detection rate must be made as low as possible.

6. Efficient implementation
Efficient implementation of an information-hiding scheme is another important issue. Considering the prevalence of commercial facilities such as digital cameras, scanners and video players/recorders, we may require the hidden information to be embedded and detected during recording or playback. Complex design of an information-hiding scheme will limit its usage in real-time applications and hardware implementations.

7. Security
An information-hiding system should be secure enough to prevent attacks from both casual users and knowledgeable hackers. Cryptography still plays an important role in maintaining the security level of information-hiding schemes.

Therefore, recent research directions on information hiding include the following: how to effectively embed an adequate amount of information in the digital media without introducing perceptual distortion, and how to efficiently, correctly and unambiguously extract the hidden information from the investigated media without resorting to the original data, even though the investigated media may have been modified by signal processing procedures. Information hiding is thus becoming a challenging field, since we have to consider several design trade-offs such as effectiveness vs. efficiency, capacity vs.
correctness and quality degradation vs. robustness, etc. Besides, a good understanding of media representation, signal detection and signal processing is necessary for a designer to construct a well-rounded information-hiding system.

2.2 Approaches of Information Hiding in Digital Images

Over the past few years, a large number of information-hiding algorithms have been proposed. Since surveying so many publications is not easy and many approaches share common ideas, we briefly describe the general methodologies of information-hiding techniques and illustrate them with exemplary schemes.

2.2.1 Information Hiding with the Least Significant Bit Modification

The most straightforward approach to hiding information in digital images is to modify the least significant bit (LSB) of image pixels. In most images, the LSB plane does not contain visually significant information. We can thus replace the LSB plane with the hidden information without affecting the perceptual quality of the image. Besides, for an 8-bit gray-level image, the payload of an information-hiding scheme based on LSB modification can be as high as 1/8 of the image data size. Therefore, this approach may be suitable for covert communication applications that transmit high-volume secret data. However, as the LSB plane is insignificant to the image, it can be removed easily by such procedures as filtering or compression, so its usage in robust watermarking is limited.

Most of the early research on information hiding is based on LSB modification [86, 87, 53, 93, 31]. Among them, we have to mention the work by Tirkel et al. [86, 87]. They first defined the term “digital watermark” to draw the analogy between the digital information
embedding and the paper watermark. Besides, they also recognized the importance of digital watermarking for such applications as copyright enforcement, counterfeit protection and controlled access to image data. Two methods were proposed in their papers. In the first method, after the 8-bit image pixel is compressed to a 7-bit representation through histogram manipulation, the LSB plane is replaced by the hidden information. The watermark can be decoded easily by checking the LSB plane. In the second method, an M-sequence [50] is circularly shifted and added to each row of the image in the LSB plane. Watermark detection is carried out by checking the cross-correlation between rows. Therefore, their correlation-detection approach is also considered one of the pioneering efforts to apply the concept of spread spectrum in information hiding. We will describe spread spectrum watermarking in Section 2.2.4.

Other variants of LSB-based information hiding exist. Because of its fragile nature, some watermarking schemes adopted this methodology to achieve data authentication [93, 98, 95]. If malicious attacks are applied to the image, some portions of the fragile watermark in the LSB plane may be destroyed; the scheme may thus signal that the investigated image has been modified and may locate the tampered regions as well. In some cases, the embedded watermark may be a binary logo to unambiguously declare the ownership information. However, embedding such information in the LSB plane could raise certain security concerns. Voyatzis and Pitas [88] proposed toral automorphisms to scramble the logo before it is embedded into the LSB plane. A 2-D toral automorphism is a spatial transformation with periodic orbits, which means that a point in the 2-D data will be changed in position after each transform or iteration but will be transformed back to the initial position after, for example, K iterations.
Therefore, the embedder transforms the logo to a chaotic form after T iterations while the detector can apply K - T transforms on the scrambled logo to recover the original logo. The values K and T are kept confidential and known only to the embedder and the detector to ensure a certain degree of security.

Information hiding in halftone or dithered images may also be viewed as an LSB-based scheme since each image sample is represented by only one bit. The information is embedded by flipping the image samples. Early work can be traced back to a paper by Tanaka et al. [83] in 1990. Advanced work by Knox et al. [35] made use of different halftone patterns to embed a visible watermark. Recent work [24] embedded the hidden information in halftone images by taking the local statistics into consideration.

2.2.2 Information Hiding by Changing the Statistics

As LSB-based methods are vulnerable to some trivial attacks, researchers have turned to redundant information hiding, i.e., embedding the same information multiple times and detecting it via statistical methods. Caronni proposed a system for tracking unauthorized image distribution [10]. The marking process is called “tagging.” A tag is a square random-noise pattern of size N x N. The image is first divided into N x N blocks and the local variance of each block is calculated. Only locations with smaller variance will be used for tagging. A tag is then imposed onto the selected image block. Each selected block hides 1 bit and is only tagged if the bit value is one. To recover an embedded bit, the difference between the original and the tagged image is computed. Then the mean of a supposedly tagged block is compared to the neighboring mean to determine the bit value.
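The logo scrambling and LSB replacement described in Section 2.2.1 can be illustrated with a minimal sketch. It is not the implementation of Voyatzis and Pitas [88]: the particular automorphism (x, y) -> (x + y, x + 2y) mod n, the 8 x 8 random host, the triangular logo and the function names are all assumptions chosen for illustration.

```python
import random


def cat_map(img):
    """Apply one toral automorphism (x, y) -> (x + y, x + 2y) mod n to a square grid."""
    n = len(img)
    out = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            out[(x + 2 * y) % n][(x + y) % n] = img[y][x]
    return out


def iterate(img, times):
    """Apply the automorphism the given number of times."""
    for _ in range(times):
        img = cat_map(img)
    return img


def period(n):
    """Smallest K >= 1 such that K iterations restore an n x n grid (n >= 2)."""
    # Track powers of the matrix [[1, 1], [1, 2]] modulo n until the identity appears.
    a, b, c, d = 1 % n, 1 % n, 1 % n, 2 % n
    k = 1
    while (a, b, c, d) != (1, 0, 0, 1):
        a, b, c, d = (a + b) % n, (a + 2 * b) % n, (c + d) % n, (c + 2 * d) % n
        k += 1
    return k


def embed_lsb(pixels, bits):
    """Replace the LSB plane of the host pixels with the given bit plane."""
    return [[(p & ~1) | b for p, b in zip(prow, brow)]
            for prow, brow in zip(pixels, bits)]


def extract_lsb(pixels):
    """Read the LSB plane back."""
    return [[p & 1 for p in row] for row in pixels]


n = 8
rng = random.Random(0)
host = [[rng.randrange(256) for _ in range(n)] for _ in range(n)]  # stand-in 8-bit image
logo = [[1 if x >= y else 0 for x in range(n)] for y in range(n)]  # hypothetical binary logo

K = period(n)   # number of iterations after which the logo returns to itself
T = 2           # secret iteration count shared by embedder and detector (0 < T < K)

marked = embed_lsb(host, iterate(logo, T))          # embedder: scramble, then hide
recovered = iterate(extract_lsb(marked), K - T)     # detector: read LSBs, then unscramble
```

In this sketch each host pixel changes by at most one gray level, and a detector that does not know T and K only sees the scrambled bit plane.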
Bender et al. proposed a scheme called “Patchwork” [6] to increase the robustness of their LSB-based spatial-domain watermark. The scheme randomly selects N pairs of pixels, $(a_i, b_i)$, to hide 1 bit by increasing the $a_i$'s by one and decreasing the $b_i$'s by one. The expected value of the sum of the differences between the $a_i$'s and $b_i$'s of the N pairs will be 2N if the image is marked (and zero for an unmarked image). Pitas proposed a similar idea to cast the signal in digital images [60]. To embed more bits, Langelaar et al. [44] extended the Patchwork scheme by splitting the image into blocks and embedding one bit in each of the blocks.

2.2.3 Information Hiding in the Frequency Domain

The methods that we illustrated above embed or detect the information or watermark in the spatial domain, i.e., the luminance intensity, of the image. We may classify these approaches as spatial-domain watermarking schemes, which modify the values of image pixels directly for information hiding. In contrast to spatial-domain watermarking, there exists frequency-domain or transform-domain watermarking, which modifies the frequency coefficients for information hiding after a proper transform such as the discrete wavelet transform (DWT), the discrete cosine transform (DCT) or the discrete Fourier transform (DFT) is applied.

Although the major difference between spatial- and frequency-domain watermarking schemes is the convenience of implementation, the two approaches can provide different functions to cope with various applications. Generally speaking, frequency-domain watermarking schemes tend to achieve a better balance between robustness and fidelity than spatial-domain schemes. First of all, the information embedded in the frequency domain will be spread over a larger region in the spatial domain.
Therefore, the embedded watermark may survive some image processing procedures well. Besides, image transforms compact energy into a few transform coefficients. These coefficients have a larger magnitude and are perceptually important to the image representation. A large distortion of significant coefficients will result in serious quality degradation. Therefore, embedding a watermark by slightly changing those significant coefficients will result in a robust watermarking scheme because significant coefficients usually remain stable even after image manipulation. The watermark embedded in significant coefficients can thus be detected more reliably. Moreover, some perceptual models are developed in the transform domain. By taking the human visual system model into consideration, we can make the embedded watermark invisible to human eyes. Therefore, frequency-domain watermarking approaches are more popular in the watermarking literature.

Block-based DCT watermarking is a popular approach. An early block-based DCT watermarking method was proposed by Koch et al. [36, 37]. The image is first divided into blocks of size 8x8, for which the DCT is computed. From these blocks, two coefficients are selected pseudo-randomly as a pair in the mid-frequency band. The selected coefficients are quantized by using the default JPEG quantization table with a relatively low JPEG quality factor to accommodate the lossy JPEG compression attack. These coefficients are then modified such that the difference between them is either positive or negative, depending on the bit value. Bors et al. [8] proposed a similar method. Certain blocks are selected and their middle-range DCT coefficients are modified so that they fulfill a linear or circular constraint according to the information to be embedded. Langelaar et al. [46, 45] proposed to embed the watermark by selectively discarding the high-frequency DCT coefficients.
A set of DCT blocks is chosen and divided into two subsets of equal size. The high-frequency energy of one subset is reduced by removing high-frequency coefficients. The information can be extracted by comparing the energy in the two subsets. The method is thus called differential energy watermarking. Busch et al. [9] proposed to embed the watermark into DCT blocks that are selected according to the block activity. The block activity is directly measured through DCT coefficients so that efficient watermarking can be achieved.

Many other schemes are based on global image transforms, such as the global DCT, wavelet and Fourier transforms. Most of them adopt the concept of spread spectrum watermarking to achieve robustness against such attacks as compression and filtering. We will discuss them more thoroughly later.

2.2.4 Spread Spectrum Watermarking

Given the strong relationship between watermarking and communication described in Section 2.1.2, many watermarking schemes are based on the additive spread-spectrum method, which is inspired by the spread-spectrum modulation technique in digital communication systems. This technique provides more security and resistance to channel noise for digital communication. Similarly, a spread-spectrum watermarking scheme can resist more serious content distortion. The hidden information is represented by a pseudo-random signal with a low amplitude, which is added to the host data and then detected by using a correlation receiver or a matched filter. The watermarked image can keep good perceptual quality since the value of each pseudo-random number is small.
Besides, the pseudo-random signal is usually generated by a key, which also suits information-hiding applications. We discuss the spread spectrum watermarking approaches according to the domain in which watermark embedding/detection is operated. In other words, we divide the schemes into spatial-domain watermarking and frequency-domain watermarking.

2.2.4.1 Spread Spectrum Watermarking in the Frequency-domain

A large portion of spread-spectrum watermarking schemes operate in the frequency or transform domain for better performance. An early spread-spectrum transform-domain scheme was proposed by Cox et al. [14]. They first suggested that the watermark signal must be placed in the perceptually significant components of the content to survive common signal processing procedures. A watermark sequence of length N is added to the largest N coefficients (except for the DC coefficient) after the global DCT is applied to the image. The pseudo-random sequence is scaled by a weighting factor α and the magnitude of the host coefficient, and the scaled watermark symbol is added to the selected host coefficient to form the watermarked coefficient. The watermark is then retrieved by subtracting the original coefficients from the coefficients of the suspected image. A correlation detector is used to calculate the similarity between the original watermark sequence and the extracted one. If the watermark embedded in the image matches the tested watermark, a large correlation response will be generated and the presence of the watermark can thus be determined.

Piva et al. [61] proposed another global DCT-based method that required no original image for watermark detection. The global DCT coefficients are reordered by using the zig-zag scan. Some fixed low- to middle-frequency coefficients (e.g.
16000th to 25000th coefficients for an image of size 512 by 512) were selected for watermark embedding and retrieval to reach a balance between image fidelity and robustness. The absolute value of the coefficient was used to scale the watermark energy. A detection threshold value, which depends only on the information of the tested image, was set to determine the existence of the watermark. The embedded watermark can survive many serious distortions.

The wavelet transform is also commonly used in digital watermarking. Xia et al. [97] considered a multi-resolution watermarking method in the wavelet domain. The idea was a direct extension of Cox’s DCT-domain algorithm [14, 13] to the wavelet domain, and the original image was needed in watermark retrieval. After the wavelet transform is applied, the watermark is detected first in higher frequency subbands. If the watermark is not detected, coefficients in lower frequency subbands are taken into account for watermark detection. Wang et al. [89, 90, 77] investigated a blind wavelet-based watermarking scheme, which did not require the original image for watermark detection. The inserted watermark signal was adaptively scaled by different weighting values for each subband to maintain the high quality of the watermarked image. The largest wavelet coefficient in each subband was chosen as a standard value. Other significant coefficients were truncated to the standard value multiplied by a weighting factor. Then, the watermark sequence was added to the truncated coefficients to form watermarked coefficients. In watermark detection, these standard values can be accurately determined because they are perceptually significant to the image and usually remain stable after image manipulation. Therefore, the watermark can be retrieved by subtracting the scaled standard values from the coefficients of a suspected image. Finally, the similarity between the extracted and tested watermarks is calculated.
One advantage of this scheme is that the collusion attack, by which the attackers average several copies with different watermarks, will fail to generate an unwatermarked image. Several other schemes were developed based on the wavelet transform, such as [32, 38, 99, 82, 4]. The difference between these schemes usually lies in the way the watermark is weighted to decrease visual artifacts.

Note that image compression and frequency-domain watermarking share some common characteristics. In image compression, we encode significant frequency coefficients first because these coefficients convey more fundamental visual information about the image. In watermarking, we choose significant coefficients for watermark casting to enhance robustness since these coefficients often remain stable after an attack. Based on this similarity, Su et al. [79] proposed to integrate the image watermarking method with the state-of-the-art image compression standard, JPEG-2000. Efficiency can be achieved since the most expensive computation, the image transform, has already been performed as one part of the compression and decompression algorithms. The integrated watermarking scheme has various interesting properties, including progressive watermark detection and region-of-interest (ROI) watermarking. This integrated scheme will be presented in Section 3.2.

Swanson et al. [81] proposed a perceptual watermarking scheme, which explicitly makes use of the human visual model to embed and detect the watermark in the block-DCT domain. For each DCT block, a frequency mask is computed and scaled by the DCT of a pseudo-random sequence. This watermark is added to the corresponding DCT block. Spatial masking is then applied to ensure the invisibility of the watermark.
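The non-blind spread-spectrum procedure attributed to Cox et al. [14] earlier in this section (scale the watermark by α and the coefficient magnitude, subtract the original coefficients at the detector, then correlate) can be sketched as follows. This is only an illustration under stated assumptions: the global DCT itself is omitted and replaced by a simulated list of coefficients whose first entry stands for the DC term, the Gaussian watermark, α = 0.1, the noise "attack" and all function names are hypothetical, and the similarity measure is a normalized correlation.

```python
import math
import random


def embed(coeffs, watermark, idx, alpha):
    """Add the scaled watermark to the selected coefficients: v' = v + alpha * |v| * w."""
    out = list(coeffs)
    for i, w in zip(idx, watermark):
        out[i] = coeffs[i] + alpha * abs(coeffs[i]) * w
    return out


def extract(suspect, original, idx, alpha):
    """Recover a watermark estimate by subtracting the original coefficients."""
    return [(suspect[i] - original[i]) / (alpha * abs(original[i])) for i in idx]


def similarity(extracted, candidate):
    """Normalized correlation between the extracted and a candidate watermark."""
    norm = math.sqrt(sum(e * e for e in extracted))
    return sum(e * c for e, c in zip(extracted, candidate)) / norm


rng = random.Random(1)
N, alpha = 1000, 0.1

# Simulated transform coefficients; index 0 plays the role of the DC coefficient.
coeffs = [rng.gauss(0.0, 30.0) for _ in range(4096)]

# Select the N largest-magnitude coefficients, excluding the DC term.
idx = sorted(range(1, len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:N]

watermark = [rng.gauss(0.0, 1.0) for _ in range(N)]   # embedded sequence
other = [rng.gauss(0.0, 1.0) for _ in range(N)]       # a watermark that was never embedded

marked = embed(coeffs, watermark, idx, alpha)
attacked = [v + rng.gauss(0.0, 1.0) for v in marked]  # mild additive-noise distortion

estimate = extract(attacked, coeffs, idx, alpha)
sim_true = similarity(estimate, watermark)
sim_false = similarity(estimate, other)
```

With these settings the response to the embedded sequence is on the order of the square root of N, while the response to an unrelated sequence stays near zero, which is what allows a threshold test to decide whether the watermark is present.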
Another image-adaptive, block-based DCT watermarking scheme was proposed by Podilchuk et al. [62, 64]. They incorporated Watson’s model [91] to determine the Just Noticeable Difference (JND) of each DCT coefficient. In Watson’s model, the JND value determines the maximum amount of quantization noise that can be tolerated at every frequency location without affecting the visual quality. Therefore, the JND can also be used to indicate the maximum watermark energy that can be embedded while guaranteeing invisibility. The watermark is modulated onto those block DCT coefficients whose values are larger than the JND. Watermark detection is based on correlation with the difference between the original and the investigated images.

Many spread-spectrum watermarking schemes utilized the Fourier transform, such as [54, 57, 56, 73]. The main reason for this choice is to enable the embedded watermark to survive geometrical attacks, which cause severe problems for watermark detection. In order to discuss this issue in more detail, we will explain the properties of the Fourier transform that are important in digital watermarking and review those schemes in Section 4.2.

2.2.4.2 Spread Spectrum Watermarking in the Spatial-domain

The spread-spectrum method can also be applied in the spatial domain. The LSB scheme [87], which we mentioned before, is one of the early algorithms to utilize the correlation detector for more robust watermark detection. Wolfgang et al. [93] developed a watermarking scheme called the Variable-Watermark Two-Dimensional algorithm (VW2D). The one-dimensional M-sequence is extended to a two-dimensional watermark block. The watermark block is then added to the image repeatedly and detected by the matched filter.
Detection of the above two watermarks does not require the original image. However, they are not robust enough since the watermark is placed in the LSB plane.

Instead of working on gray-level images, Kutter et al. [42] exploited the unequal sensitivity of human vision to different colors and proposed to embed the watermark in the blue image component. They applied amplitude modulation by adding the watermark bits, weighted by the luminance values in the spatial domain, to the blue image component. In watermark detection, preprocessing through filtering is used to help estimate the original value for more accurate detection.

A robust block-based spatial-domain watermarking scheme was developed at the Philips Research Laboratories [34], in which the video is treated as a sequence of still images and a two-dimensional spatial watermark pattern is embedded into every frame. The pattern is embedded in the frame block by block. Each pixel is embedded with the watermark symbol scaled by a global weighting factor and a local weighting factor. In watermark detection, a filter is first applied to remove the correlation between neighboring pixels. Then, the watermark is detected by circular convolution of the filtered block with the tested pattern. More than one pattern will be embedded into a block and the distance between the peaks is used to carry the watermark payload. In this scheme, watermark detection is invariant to spatial shift.

Many more schemes, such as [15], [17] and [26], extended the spatial-domain still image watermarking approaches to video frames for video watermarking. Besides, a theoretical performance analysis of the amplitude modulation scheme in the spatial domain was carried out by Hernandes et al. [28, 27].

It should be noted that spatial-domain watermarking still has its advantage.
For frequency-domain watermarking schemes, geometrical attacks such as cropping and rotation of an image will cause a serious synchronization problem in watermark detection. A spatial-domain watermarking scheme may have a better chance of resisting geometrical attacks because attaining the spatial information from image pixels is much easier and because the spatial-domain watermark is embedded and detected directly in image pixels without the need for image transforms.

2.2.5 Other Information Hiding Algorithms

Quantization is another important approach to information hiding. Chen and Wornell [11] generalized the watermarking approach using quantization and termed it Quantized Index Modulation (QIM). QIM is based on a set of N-dimensional quantizers. The quantizers satisfy a distortion constraint and are designed such that the reconstruction values of one quantizer are far away from the reconstruction points of every other quantizer. The message to be transmitted is used as an index for selecting a quantizer, which embeds the information by quantizing the image data. In the decoding process, a distance metric is evaluated for all quantizers and the index of the quantizer with the smallest distance identifies the embedded information. Many schemes use quantization for embedding high-volume data, such as [70], or for authentication purposes, such as [39].

Salient point or feature extraction may be useful in information hiding. An interesting approach is to embed the information by modifying the geometrical features of the image [66]. A dense line pattern is pseudo-randomly generated. A set of salient points in the image is extracted. The detected points are then warped such that a significantly large number of points are within the vicinity of lines.
In the detection process, the detector verifies whether a large number of points are within the vicinity of lines. A few other schemes also make use of salient point or feature extraction to achieve certain objectives, such as robustness against geometrical attacks [5, 41]. The feasibility of these methods will be determined by the reliability and precision of the feature extraction process.

Some information-hiding methods are tailored to palette-based formats such as GIF or PNG. Machado [51] proposed a scheme called “EZ Stego” to hide information in GIF compressed images. The color palette of an image is first sorted by luminance. In the reordered palette, neighboring palette entries are typically close to each other, so the scheme can embed the message in the LSB of the indices pointing to the palette colors without degrading the image quality much. Fridrich et al. proposed a similar idea for data hiding in GIF [20]. Besides, permuting the image palette in a special order can also be used for information hiding [43]. The advantage of this method is that the appearance of the image will not be affected. However, the special palette generated by the scheme may not only raise suspicion but also be vulnerable when the image is re-saved by other image editors.

One special type of information-hiding scheme embeds the data without introducing any distortion to the host image [21, 22]. The basic idea of this invertible embedding is to make use of certain redundancy in the host media data. The embedder compresses the redundant part, such as the LSB of the image pixels, into a smaller size to leave some room for data hiding. The detector extracts the hidden data and expands the compressed redundant part to reconstruct the original data. The same idea can also be extended to several image formats [23].
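The quantizer-selection idea behind QIM in Section 2.2.5 can be illustrated in its simplest one-dimensional form. The sketch below is an assumption-laden example rather than the construction of Chen and Wornell [11]: it uses two uniform scalar quantizers whose reconstruction points are offset by half a step, one bit per sample, and hypothetical function names and parameter values.

```python
import math
import random


def quantize(x, bit, delta):
    """Quantize x with the quantizer selected by bit; the two lattices are offset by delta/2."""
    offset = bit * delta / 2.0
    return delta * math.floor((x - offset) / delta + 0.5) + offset


def qim_embed(samples, bits, delta):
    """Embed one bit per sample by quantizing with the indexed quantizer."""
    return [quantize(x, b, delta) for x, b in zip(samples, bits)]


def qim_decode(samples, delta):
    """Minimum-distance decoding: choose the quantizer whose output is closest to each sample."""
    decoded = []
    for x in samples:
        d0 = abs(x - quantize(x, 0, delta))
        d1 = abs(x - quantize(x, 1, delta))
        decoded.append(0 if d0 <= d1 else 1)
    return decoded


rng = random.Random(2)
delta = 8.0
host = [rng.uniform(0.0, 255.0) for _ in range(64)]   # stand-in host samples
bits = [rng.randrange(2) for _ in range(64)]          # message bits

marked = qim_embed(host, bits, delta)
# Perturbation strictly smaller than delta/4, so each sample stays closest to its own lattice.
noisy = [y + rng.uniform(-0.2 * delta, 0.2 * delta) for y in marked]
decoded = qim_decode(noisy, delta)
```

The step size delta controls the trade-off in this sketch: the embedding distortion per sample is at most delta/2, and decoding remains correct as long as the perturbation stays below delta/4.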
Chapter 3

Information Hiding in JPEG-2000 Compressed Images

In this chapter, we deal with the issue of information hiding, including digital watermarking and covert communication, in the JPEG-2000 still image compression standard. First of all, we provide a brief review of the basic architecture of JPEG-2000 in Section 3.1, which is intended to offer sufficient information to help understand the concerns in designing an information-hiding scheme for this image standard. Next, we give the detailed implementation and analysis of the proposed information-hiding scheme integrated with the JPEG-2000 compression standard. The watermark embedding and detection procedures are presented in Section 3.2. We also discuss the choice and analysis of the threshold value used for watermark detection, illustrated with experimental results. Then, we turn our attention to steganographic applications of JPEG-2000. We point out some challenges of information hiding in wavelet-based codecs and present effective schemes under the framework of JPEG-2000 in Section 3.3 to achieve covert communication. Experimental results will be shown to demonstrate the feasibility of the proposed methods. Finally, concluding remarks are given in Section 3.4.

3.1 Brief Review of JPEG-2000

The basic coding engine of JPEG-2000 is based on the Embedded Block Coding with Optimized Truncation (EBCOT) scheme proposed by Dr. Taubman. Roughly speaking, JPEG-2000 can be viewed as a block-based and bit-plane coder. By a block-based coder, we mean that the basic coding unit is a block instead of the whole image as used in coding schemes such as SPIHT [67] and EZW [68].
A bi-orthogonal wavelet transform is first applied to the image, and each subband of wavelet coefficients is divided into blocks of samples. Each block is then encoded independently to generate a separate bit-stream without resorting to any information from other blocks. The bit-stream can be truncated to a variety of discrete lengths with respect to different distortion measures. Once the entire image has been compressed, a post-processing operation passes over all the compressed blocks and determines the extent to which each block’s bit-stream should be truncated to achieve the target bit-rate. The final bit-stream is formed by concatenating the truncated bit-streams of all blocks. JPEG-2000 is also a bit-plane coder, i.e., the most significant bits of all samples in the code block are handled first, then the next most significant bits and so on until all bit-planes are processed. With its block-based and bit-plane coding paradigm, JPEG-2000 can achieve several important features under the same framework.

Now, let us take a closer look at the structure of the JPEG-2000 coding standard. The block diagram of JPEG-2000 is shown in Fig. 3.1. The JPEG-2000 coding flowchart consists of a few major blocks. The major blocks on the encoder side include the forward image transform, quantization, tier-1 encoding and tier-2 encoding, while those on the decoder side are the tier-2 decoding, tier-1 decoding, dequantization and the inverse image transform. Basically, JPEG-2000 is a pipeline structure with a so-called “push and pull” model. In the encoding process, a push model is employed. Image samples are pushed into the forward transform stage, which then pushes the transformed samples to the quantization stage as soon as those data become available.
Likewise, the quantization stage pushes quantized symbol indices to the encoding stage, which pushes compressed bits to form the code stream. In the decoding process, the dual of the encoder’s push paradigm, i.e., a pull model, is employed. Image samples are pulled out of the inverse transform stage, which pulls transformed samples from the dequantizer. The dequantizer in turn pulls data out of the decoder, which pulls or reads the bit-stream from the compressed file. This pipeline structure also attempts to minimize the internal memory size so as to facilitate hardware implementation. We will examine and explain the functions of each major block in the following sections.

[Figure 3.1: The block diagram of JPEG-2000. Encoder path: source image, forward image transform, quantization, tier-1 encoder, tier-2 encoder, compressed bit-stream. Decoder path: compressed bit-stream, tier-2 decoder, tier-1 decoder, dequantization, inverse image transform, expanded image.]

3.1.1 Forward/Inverse Image Transform

The image transform stage consists of the inter-component transform and the intra-component transform. The image is first divided into tiles, and certain preprocessing is applied to the tile samples so that they have a nominal dynamic range approximately centered about zero. The inter-component transform is then applied to the tile-component data to reduce the correlation between components, leading to improved coding efficiency. Two inter-component transforms are defined: the irreversible color transform (ICT) for lossy compression and the reversible color transform (RCT), usually for lossless compression. Both transforms map the image data from the RGB domain to the YCrCb color space.

Following the inter-component transform in the encoder is the intra-component transform, i.e., the wavelet transform.
Through the wavelet transform, tile components are decomposed into different resolution levels, which contain a number of subbands. Both reversible integer-to-integer and irreversible real-to-real wavelet transforms are available in JPEG-2000. The irreversible transform is implemented by means of the Daubechies 9/7 filter while the reversible transform is implemented by means of the 5/3 filter. Two filtering modes, i.e., the convolution-based and the lifting-based modes, are supported.

The inverse intra-component and inter-component transforms in the decoder essentially undo the effect of the forward transforms in the encoder. Unless the transforms are reversible, the inversion may only be approximate due to the effect of finite-precision arithmetic.

3.1.2 Quantization/Dequantization

In the encoder, after the inter-component and intra-component transforms, the resulting coefficients are quantized. Quantization helps to achieve better compression by representing transform coefficients with the minimum precision required for the desired level of image quality. Transform coefficients are quantized using scalar quantization with a deadzone. A different quantizer is employed for each subband, with the quantizer step size as its only parameter. The quantization process can be written as

$$V(x,y) = \left\lfloor \frac{|U(x,y)|}{\Delta} \right\rfloor \operatorname{sgn}(U(x,y)), \qquad (3.1)$$

where $\Delta$ is the quantizer step size, $U(x,y)$ is the input subband signal, and $V(x,y)$ is the output quantizer index for the subband. The baseline codec has two modes of operation, i.e., integer mode and real mode. Lossless compression always operates in the integer mode, in which the quantizer step sizes are fixed at one to effectively bypass the quantization. In the real mode, the quantizer step sizes are chosen in conjunction with rate control.
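As a small numerical illustration of the deadzone quantizer in Eq. (3.1), the following sketch quantizes a few sample values and reconstructs them by adding half a step to the magnitude of each nonzero index, the midpoint rule used by the dequantization formula of this section. The function names and the sample values are assumptions made for the example; step-size signaling and rate control are not modeled.

```python
import math


def sgn(x):
    """Sign function returning -1, 0 or 1."""
    return (x > 0) - (x < 0)


def quantize(u, delta):
    """Deadzone scalar quantizer: V = floor(|U| / delta) * sgn(U)."""
    return int(math.floor(abs(u) / delta)) * sgn(u)


def dequantize(v, delta):
    """Midpoint reconstruction: U' = (V + 0.5 * sgn(V)) * delta, with V = 0 mapped to 0."""
    return (v + 0.5 * sgn(v)) * delta


delta = 0.5
coefficients = [1.3, -1.3, 0.4, -0.4, 0.0, 7.99]

indices = [quantize(u, delta) for u in coefficients]
reconstructed = [dequantize(v, delta) for v in indices]
errors = [abs(u - r) for u, r in zip(coefficients, reconstructed)]
```

Values with magnitude below the step size fall into the deadzone and map to index zero, so the zero bin is twice as wide as the others; for all other values the reconstruction error in this sketch is at most half a step.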
The step sizes used by the encoder are conveyed to the decoder via the code stream. The step sizes signaled are not absolute but relative quantities. That is, the quantizer step size for each subband is specified relative to the nominal dynamic range of the subband signal. In the decoder, the dequantization stage approximately reverses the effect of quantization to obtain an approximation of the transform coefficients. The dequantization process can be written as

$$U(x,y) = \left\{ V(x,y) + 0.5 \times \mathrm{sgn}(V(x,y)) \right\} \times \Delta, \tag{3.2}$$

where $V(x,y)$ is the input quantizer index for the subband, and $U(x,y)$ is the reconstructed subband signal. It should be noted that quantization of transform coefficients is one of the two major sources of information loss in the coding path of JPEG-2000.

3.1.3 Tier-1 Coding

At the encoder side, the quantization stage is followed by the tier-1 encoding. The quantizer indices for each subband are partitioned into code blocks. Code blocks are rectangular in shape with the same dimension, except at image boundaries where some blocks may have smaller dimensions. The nominal size of blocks is a free parameter with certain constraints. A typical choice for the nominal code block size is 64 x 64. After a subband has been partitioned into code blocks, each of the code blocks is independently coded. The coding is performed using the bit-plane coder, and each bit-plane is processed by several coding passes. Therefore, the output of the tier-1 encoding process is a collection of coding passes for the code blocks.

At the decoder side, the bit-plane coding passes for the code blocks are input to the tier-1 decoder, these passes are decoded, and the resulting data are assembled into subbands.
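The quantizer of (3.1) and the reconstruction rule of (3.2) can be illustrated with a minimal Python sketch. It assumes the usual deadzone form, i.e., the index is the floor of |U|/Δ carrying the sign of U, and a reconstruction bias of 0.5 as in (3.2). The function names `sgn`, `quantize` and `dequantize` are hypothetical and are not taken from any JPEG-2000 implementation.

```python
import math


def sgn(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def quantize(u, step):
    """Deadzone scalar quantizer (assumed form of (3.1)): floor(|u| / step) with the sign of u."""
    return sgn(u) * math.floor(abs(u) / step)


def dequantize(v, step, bias=0.5):
    """Reconstruction as in (3.2): a bias of 0.5 selects the midpoint of the quantization interval."""
    return (v + bias * sgn(v)) * step


if __name__ == "__main__":
    step = 2.0  # illustrative step size, not a value from the text
    for u in (7.9, -7.9, 1.5):
        v = quantize(u, step)
        print(u, v, dequantize(v, step))
```

Inputs whose magnitude is smaller than the step size map to index 0 and are reconstructed as 0, which is the deadzone behaviour; all other inputs are reconstructed at the midpoint of their interval, so the round trip is only approximate.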
In lossy compression, the reconstructed quantizer indices may only be approximations to the quantizer indices at the encoder, since the code stream may include only a subset of the coding passes generated by the tier-1 encoding process. In the lossless case, the reconstructed quantizer indices must be the same as the original ones at the encoder side, and all coding passes must be included.

In each bit-plane, there are three coding passes: 1) the significance pass, 2) the refinement pass and 3) the cleanup pass. All three coding passes scan the samples of a code block in the same fixed order. The code block is partitioned into stripes with a nominal height of four samples. If the code block height is not a multiple of four, the height of the bottom stripe will be less than this nominal value.

The bit-plane encoding process generates a sequence of symbols for each coding pass. All of the symbols are either entropy coded or raw coded. For entropy coding, a context-based adaptive binary arithmetic coder, or more specifically, the MQ coder, is employed. For raw coding, the binary symbols are emitted as raw bits with simple bit stuffing. Both the entropy and raw coding processes ensure that certain bit patterns never occur in the output, allowing such patterns to be used for error resilience. Next, we examine the three coding passes in more detail.

1. Significance Pass

The first coding pass for each bit-plane is the significance pass. This pass is used to convey significance and the associated sign information for samples that have not yet been found to be significant and are predicted to become significant during the processing of the current bit-plane. A sample is predicted to become significant if any 8-connected neighbor has already been found to be significant.
The symbols generated during the significance pass may or may not be arithmetically coded. If arithmetic coding is employed, the binary symbol conveying significance information is coded using one of nine contexts. The particular context used is selected based on the significance of the sample's 8-connected neighbors and the orientation of the subband with which the sample is associated. If arithmetic coding is used, the sign of a sample is coded as the difference between the actual and predicted sign values. Otherwise, the sign information is coded directly. Sign prediction is performed using the significance and the sign information of the 4-connected neighbors.

2. Magnitude Refinement Pass

The second coding pass for each bit-plane is the magnitude refinement pass. This pass signals subsequent bits after the most significant bit for each sample. If a sample was found to be significant in a previous bit-plane, the next most significant bit of that sample is conveyed using a single binary symbol. Like the significance pass, the symbols of the magnitude refinement pass may or may not be arithmetically coded. If arithmetic coding is adopted, each refinement symbol is coded using one of three contexts. The particular context is selected based on whether the second MSB position is being refined and on the significance of the 8-connected neighbors.

3. Cleanup Pass

The third coding pass for each bit-plane is the cleanup pass. This pass is used to convey significance and the associated sign information for those samples that have not yet been found to be significant and are predicted to remain insignificant during the processing of the current bit-plane.
The key difference between the cleanup and significance passes is that the cleanup pass conveys information about samples that are predicted to remain insignificant, rather than those that are predicted to become significant. The other important difference is that the samples in cleanup passes are not always processed individually but are sometimes processed in groups. As we mentioned earlier, a code block is partitioned into stripes with a nominal height of four samples. Within a stripe, samples are scanned column by column from top to bottom; each such column is referred to as a vertical scan. If the vertical scan contains four samples and all of the samples are predicted to remain insignificant, the so-called "aggregation mode" is entered. When this occurs, the four samples of the vertical scan are examined. If all four samples are insignificant, an all-insignificant aggregation symbol is coded, and the processing of the vertical scan is complete. Otherwise, a some-significant aggregation symbol is coded, and two binary symbols are then used to code the number of leading insignificant samples in the vertical scan. The symbols generated during the cleanup pass are always arithmetically coded. When the aggregation mode is not employed, the significance and sign coding functions as in the case of the significance pass.

To sum up, cleanup passes always employ arithmetic coding. In the case of significance and refinement passes, two possibilities exist, depending on whether the so-called "lazy mode" is enabled. If the lazy mode is enabled, only the significance and refinement passes for the four most significant bit-planes use arithmetic coding, while the remaining such passes are raw coded. Otherwise, all significance and refinement passes are arithmetically coded.
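The two rules just described, i.e., which passes are arithmetically coded under the lazy mode and which symbols the aggregation mode emits for one vertical scan, can be sketched in Python as follows. This is only an illustration of the description above: the function names, the bit-plane indexing convention and the 0/1 symbol values are assumptions, and the actual context modelling and MQ coding are omitted.

```python
def pass_uses_arithmetic_coding(pass_type, planes_below_msb, lazy_mode):
    """Decide whether a coding pass is arithmetically coded.

    pass_type: "significance", "refinement" or "cleanup".
    planes_below_msb: 0 for the most significant bit-plane, 1 for the next one, and so on
    (an indexing convention assumed for this sketch).
    """
    if pass_type == "cleanup":
        return True                  # cleanup passes always use arithmetic coding
    if not lazy_mode:
        return True                  # without lazy mode, every pass is arithmetic coded
    return planes_below_msb < 4      # lazy mode: only the four most significant bit-planes


def aggregation_symbols(significant):
    """Symbols for one four-sample vertical scan coded in aggregation mode.

    significant: four booleans, True where the sample becomes significant.
    Returns [0] for the all-insignificant case (symbol value assumed), otherwise
    [1, b1, b0], where the two bits b1 b0 give the number of leading insignificant samples.
    """
    if len(significant) != 4:
        raise ValueError("aggregation mode applies to vertical scans of four samples")
    if not any(significant):
        return [0]
    leading = list(significant).index(True)   # 0..3 leading insignificant samples
    return [1, (leading >> 1) & 1, leading & 1]


if __name__ == "__main__":
    print(aggregation_symbols([False, False, True, False]))
    print(pass_uses_arithmetic_coding("significance", 5, lazy_mode=True))
```

In a real coder the samples following the first significant one in the vertical scan would still be coded individually; the sketch stops at the aggregation symbols themselves.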
The lazy mode significantly reduces the computational complexity of bit-plane coding by decreasing the number of symbols that must be arithmetically coded. The cost of lazy mode coding is reduced coding efficiency at low bit-rate compression. Consecutive coding passes that use the same encoding scheme (i.e., arithmetic or raw coding) constitute a "segment." All of the coding passes in a segment can collectively form a single codeword, or each coding pass can form a separate codeword, as determined by the termination mode. Two termination modes are supported: per-pass termination and per-segment termination. In the first case, all coding passes are terminated. In the second case, only the last coding pass of a segment is terminated. Terminating all coding passes facilitates improved error resilience at the expense of decreased coding efficiency.

3.1.4 Tier-2 Coding

The tier-1 encoding is followed by the tier-2 encoding. The input to the tier-2 encoding process is the set of bit-plane coding passes generated during the tier-1 encoding. The coding passes are packaged into data units called packets, in a process referred to as packetization. The resulting packets are then output to the final code stream. Each packet comprises two parts: a header and a body. The header indicates which coding passes are included in the packet, while the body contains the actual coding pass data. The packetization process imposes a particular organization on coding pass data in the output code stream. This organization facilitates many of the desired features, including rate scalability and progressive recovery by fidelity or resolution.

Rate scalability is achieved through quality layers. Each coding pass is either assigned to one of the layers or discarded.
The coding passes containing the most important data are included earlier in the lower layers, while the coding passes associated with finer details are included later in the higher layers. During decoding, the reconstructed image quality improves incrementally with each successive layer processed. Since some coding passes may be discarded in the case of lossy compression, the tier-2 coding is the second primary source of information loss in the coding path.

In the tier-2 coding, one packet is generated for each component, resolution level, layer, and precinct. A precinct is essentially a group of code blocks within a subband. The precinct partitioning for a particular subband is derived from a partitioning of its parent LL band (i.e., the LL band at the higher resolution level). Each resolution level has a nominal precinct size. Each of the resulting precinct regions is then mapped into its child subbands at the next lower resolution level. Precinct boundaries always align with code block boundaries. Since coding pass data from different precincts are coded in separate packets, using smaller precincts reduces the amount of data contained in each packet, which leads to improved error resilience, while the coding efficiency is degraded due to the increased overhead of more packets.

It should be noted that a packet can be empty. Empty packets are sometimes necessary since a packet must be generated for every component-resolution-layer-precinct combination, even if the resulting packet conveys no new information.

In the decoder, the tier-2 decoding process extracts the coding passes from the code stream in a process referred to as depacketization and then associates each coding pass with its corresponding code block.

The major task of the tier-2 coding is to achieve rate control by selecting subsets of coding passes to include in the code stream.
Given a target bit-rate, each of the independent code block bit-streams is truncated in an optimal way so as to minimize distortion subject to the bit-rate constraint. The idea is referred to as Post-Compression Rate-Distortion (PCRD) optimization, since the rate control is applied after all the subband samples have been compressed in the tier-1 coding. The advantage of PCRD optimization is its reduced complexity. To generate an embedded bit-stream without PCRD, we may have to process the image several times, and a large buffer is required to store the whole image, which may not be available in certain applications. With PCRD, the image needs to be compressed only once to generate an out-of-order bit-stream, which is generally much smaller than the original image and hence can be buffered more easily. The rate control is then applied to this compressed bit-stream to generate the embedded bit-stream.

3.1.5 Rate Control

We briefly describe the rate control mechanism of JPEG-2000 as follows. For each code block, $B_i$, the embedded bit-stream is truncated to the rate $R_i^{n_i}$. The contribution from $B_i$ to the distortion improvement in the reconstructed image is denoted by $D_i^{n_i}$, for each truncation point, $n_i$. If an additive distortion metric that approximates Mean Squared Error (MSE) is assumed, the overall image distortion, $D$, can be calculated by

$$D = \sum_i D_i^{n_i} = \sum_i \left[ w_{b_i}^2 \sum_{k \in B_i} \left( \hat{s}_i^{n_i}[k] - s_i[k] \right)^2 \right], \tag{3.3}$$

where $s_i[k]$ denotes the two-dimensional sequence of subband samples in the code block $B_i$, $\hat{s}_i^{n_i}[k]$ denotes the quantized representation of these samples associated with truncation point $n_i$, and $w_{b_i}$ denotes the L2-norm of the wavelet basis function for the subband, $b_i$, to which the code block belongs.
This approximation is valid provided the wavelet transform's basis functions are orthogonal and the quantization errors in each of the samples are uncorrelated. The truncation points, $n_i$, are selected to minimize distortion subject to a constraint, $R_{\max}$, on the available bit-rate, i.e.,

$$R_{\max} \ge R = \sum_i R_i^{n_i}. \tag{3.4}$$

Any set of truncation points, $n_i^{\lambda}$, which minimizes

$$D(\lambda) + \lambda R(\lambda) = \sum_i \left( D_i^{n_i^{\lambda}} + \lambda R_i^{n_i^{\lambda}} \right) \tag{3.5}$$

for some $\lambda$ is optimal in the sense that the distortion cannot be reduced without increasing the overall rate, and vice versa. Thus, if a value of $\lambda$ can be found such that the truncation points which minimize (3.5) yield $R(\lambda) = R_{\max}$, then this set of truncation points must also be an optimal solution to the R-D optimization problem. Since we only have a discrete set of truncation points, it may not be possible to find a value of $\lambda$ for which $R(\lambda)$ is exactly equal to $R_{\max}$. However, since the code blocks are relatively small and there are many truncation points, it is sufficient to find the smallest value of $\lambda$ such that $R(\lambda) \le R_{\max}$.

The optimal truncation points, $n_i^{\lambda}$, for any given $\lambda$ can be determined efficiently based on a small amount of summary information collected during the generation of each code block's embedded bit-stream. A simple algorithm to find the truncation point, $n_i^{\lambda}$, which minimizes $D_i^{n_i^{\lambda}} + \lambda R_i^{n_i^{\lambda}}$, is as follows:

1) Initialize $n_i^{\lambda} = 0$.
2) For $j = 1, 2, 3, \ldots$
   - Set $\Delta R_i^j = R_i^j - R_i^{n_i^{\lambda}}$ and $\Delta D_i^j = D_i^{n_i^{\lambda}} - D_i^j$.
   - If $\Delta D_i^j / \Delta R_i^j > \lambda$, then update $n_i^{\lambda} = j$.

Since this algorithm has to be executed for many different values of $\lambda$, a set of feasible truncation points, $N_i$, should be determined first. Let $j_1 < j_2 < j_3 < \ldots$ be an enumeration of these feasible truncation points and let the corresponding distortion-rate "slopes" be given by $S_i^{j_k} = \Delta D_i^{j_k} / \Delta R_i^{j_k}$, where $\Delta R_i^{j_k} = R_i^{j_k} - R_i^{j_{k-1}}$ and $\Delta D_i^{j_k} = D_i^{j_{k-1}} - D_i^{j_k}$. These
These 49 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. slopes should be strictly decreasing. If Sfk+1 > Sfh, then these truncation point, j/j, will not be selected by the above algorithm , regardless of the value of A. W hen restricted to a set of truncation points whose slopes are strictly decreasing, the above algorithm reduces to the selection nf = max jjfc 6 Ni\Sjk > a | so th at each such point m ust be a valid candidate for some value of A . The set of feasible truncation points, Ni, can be determ ined using a conventional convex hull analysis im m ediately after the bit-stream for Bi has been generated. The rates, R\k and slopes S B , for each G iVj, are kept in a compact form along w ith the embedded bit-stream until all code blocks have been compressed. The search for the optim al A and nf can thus be proceeded in a straightforw ard manner. 3.1.6 R egion o f In terest C oding Region of interest (ROI) coding is one interesting feature supported by JPEG-2000. ROI coding makes it possible to encode regions in which users are more interested w ith b etter quality than the rest of the image. For the extrem e case, the specified ROI can be encoded losslessly while the rem aining parts of the image are encoded w ith low bit-rates. W hen ROI is small compared w ith the whole image, the transm ission tim e and the storage space can be greatly saved. In JPEG2000, two types of ROI functionality are defined. The first one is “ROI during encoding,” in which ROI is specified when the image is compressed. The other one is “ROI during decoding” th at supports interactive browsing. In JPEG-2000 VM, the “ROI during encoding” mode is implemented. The im plem entation is based on the m axim um shift method. The principle is to scale or shift coefficients so th at the bits associated w ith the ROI are placed in higher bit-planes th an the bits associated w ith the background. 
During the embedded coding process, the most significant ROI bit-planes will be placed in the bit-stream before any background bit-planes of the image. Thus, the ROI will be decoded or refined before the rest of the image. If the bit-stream is truncated before being fully decoded, the ROI will be of higher fidelity. The implementation of ROI in JPEG-2000 proceeds as follows:

1. The wavelet transform of the image is calculated.
2. After the ROI is chosen in the image domain, an ROI mask is derived as shown in Fig. 3.2 to indicate the set of coefficients that are required for the ROI.
3. The wavelet coefficients are quantized and the quantized coefficients are stored in a sign-magnitude representation.
4. Coefficients outside the ROI are downscaled by a specified scaling value.
5. The resulting coefficients are encoded.

The decoder reverses these steps to reconstruct the image. It should be noted that the scaling value assigned to the ROI and the coordinates of the ROI are added to the bit-stream. The decoder also performs the ROI mask generation but scales up the background coefficients.

Figure 3.2: (a) The image to be compressed, (b) the contour of ROI, (c) the spatial-domain ROI mask, and (d) the wavelet-domain ROI mask.

3.2 Digital Watermarking in JPEG-2000 Compressed Images

3.2.1 Framework for Watermarking

Before presenting the implementation of the proposed watermarking scheme, we briefly describe the basic framework of the watermarking method to be adopted. Similar to most robust watermarking schemes given in Section 2.2, the additive spread-spectrum watermarking method is chosen due to its good robustness, unobtrusiveness and security in watermarking applications.
After a proper transform (the wavelet transform in our scheme) is applied to the image, the watermark is added onto the selected frequency coefficient by

$$I'(x,y) = I(x,y) + \alpha(x,y) \times W(x,y), \tag{3.6}$$

where $I'(x,y)$ is the watermarked coefficient and $I(x,y)$ is the original coefficient at the coordinate $(x,y)$. $I(x,y)$ is chosen based on its magnitude, i.e., coefficients with a large magnitude are selected for watermark embedding. $W(x,y)$ is the corresponding watermark symbol, which can be a real number or take only the values 1 and -1. The weighting factor $\alpha(x,y)$ is a positive number used to adjust the amount of added watermark energy. The value of $\alpha$ is usually adjusted according to the magnitude of the frequency coefficients or the characteristics of the different subbands, so that a balance between robustness and fidelity of the resulting watermarked image can be achieved. The inverse transform is then applied to form the watermarked image.

In watermark detection, a correlation detector is used to determine whether the watermark exists in the tested image. It is based on the fact that if the coefficients and the watermark sequence are independent, the inner product of the watermark and the coefficient sequences will be close to 0. If the target watermark sequence has been added to the coefficient sequence, we will get a peak response from the inner product. We show this basic idea as follows. Let $I^*(x,y)$ be the wavelet coefficient of the suspected image. We make use of the correlation detector to determine whether the wavelet coefficients are embedded with a specific watermark sequence $W^*(x,y)$.
The correlation response $\rho$ of the watermark detector can be expressed as

$$\rho = \sum_{(x,y)} \left( I^*(x,y) \times W^*(x,y) \right). \tag{3.7}$$

By assuming that $I^*(x,y)$ is formed by casting the watermark symbol $W(x,y)$ onto the original coefficient $I(x,y)$ without any further modification, we can express $\rho$ as

$$\rho = \sum_{(x,y)} \left( \left( I(x,y) + \alpha(x,y) \times W(x,y) \right) \times W^*(x,y) \right) \tag{3.8}$$

$$\phantom{\rho} = \sum_{(x,y)} \left( I(x,y) \times W^*(x,y) \right) + \sum_{(x,y)} \left( \alpha(x,y) \times W(x,y) \times W^*(x,y) \right). \tag{3.9}$$

After calculating the expected value of both sides, we get

$$\mathcal{E}[\rho] = \mathcal{E}\left[ \sum_{(x,y)} I(x,y) \times W^*(x,y) \right] + \mathcal{E}\left[ \sum_{(x,y)} \alpha(x,y) \times W(x,y) \times W^*(x,y) \right], \tag{3.10}$$

where $\mathcal{E}[\cdot]$ is the expected value. The first term on the right-hand side of (3.10) is zero if the tested watermark sequence $W^*(x,y)$ and the coefficients $I(x,y)$ are independent. Similarly, the second term is also zero if $W(x,y)$ and $W^*(x,y)$ are independent or $W(x,y)$ does not exist. If the image is embedded with $W^*(x,y)$, i.e., $W(x,y) = W^*(x,y)$, then the expected value of the correlation response $\mathcal{E}[\rho]$ will be close to

$$\mathcal{E}[\rho] = \mathcal{E}\left[ \sum_{(x,y)} \alpha(x,y) \times W^{*2}(x,y) \right], \tag{3.11}$$

which is much larger than zero. Therefore, we can simply examine the peak response and compare it with a threshold value to determine the existence of the watermark.

In the proposed system, the watermark is embedded after the coefficients are quantized so that the watermark can be easily embedded into the bit-stream. Watermark detection is done before the dequantization stage. We choose the non-reversible kernel for watermark embedding and detection in the following discussion, since lossy compression offers a broader application scope than lossless compression. The same idea can, however, be applied to the reversible kernel without modification.
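The behaviour predicted by (3.7) to (3.11) can be checked numerically with a small Python sketch. It makes several assumptions that are not part of the scheme itself: the host coefficients are drawn from a zero-mean Gaussian with standard deviation 10, the weighting factor α is a constant 2, and the watermark is generated from an integer seed with Python's `random` module. All function names are hypothetical.

```python
import random


def make_watermark(seed, n):
    """Pseudo-random sequence of +1/-1 symbols derived from an integer seed."""
    rng = random.Random(seed)
    return [1 if rng.random() < 0.5 else -1 for _ in range(n)]


def embed(coeffs, watermark, alpha):
    """Additive embedding as in (3.6), with a constant weighting factor alpha."""
    return [c + alpha * w for c, w in zip(coeffs, watermark)]


def correlate(coeffs, watermark):
    """Correlation response as in (3.7): inner product of coefficients and watermark."""
    return sum(c * w for c, w in zip(coeffs, watermark))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 4096
    host = [rng.gauss(0.0, 10.0) for _ in range(n)]        # stand-in for wavelet coefficients
    marked = embed(host, make_watermark(1234, n), alpha=2.0)
    print("matching watermark:", correlate(marked, make_watermark(1234, n)))
    print("other watermark:   ", correlate(marked, make_watermark(9999, n)))
    print("expected peak (3.11):", 2.0 * n)
```

With the matching seed the response concentrates around α times the number of coefficients, as (3.11) suggests, whereas an unrelated seed gives a response that fluctuates around zero.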
3.2.2 Watermark Embedding

To make the embedded watermark robust against attacks, significant coefficients are chosen for watermark casting. Significant coefficients are those with a larger magnitude. Because coefficients in each subband have been normalized to have unit gain in the JPEG-2000 implementation, the significant coefficients in each subband tend to have their highest non-zero bit in the same bit-plane. Therefore, we can apply the same watermark embedding rule in each subband. JPEG-2000 divides each subband into blocks, which are the basic coding units, so we also use the coding block as the basic unit for watermarking.

In the JPEG-2000 implementation, the original $I$-bit image samples are level shifted to a nominal range of $-2^{I-1}$ to $2^{I-1}$ and then shifted up by $P - I - G$ bits to fit within the $P$-bit implementation precision. $G$ is the number of guard bits, which are included to avoid overflow so that the frequency coefficients can be represented with fixed-point precision. The wavelet transform kernels are then normalized so that the low-pass analysis filters always have a unit DC gain and the high-pass analysis filters always have a unit Nyquist gain. This means that the nominal range of subband coefficients will be $-2^{P-G-1}$ to $2^{P-G-1}$. In the representation of a coefficient, the Most Significant Bit (MSB) indicates the sign and the remaining $P - 1$ bits represent the absolute magnitude of the coefficient. This simplifies the case in which only the magnitude is required, because we can avoid the calculation of the absolute value.

Therefore, we select significant coefficients by examining whether the highest non-zero bit (not including the sign bit) is higher than a certain bit-plane with index $q$. That is,
coefficient $I_{b_s}(x,y)$ in the block $b_s$ of subband $s$ with the coordinate $(x,y)$ will be chosen for watermark embedding if

$$\| I_{b_s}(x,y) \| > 2^q, \tag{3.12}$$

where we define that the Least Significant Bits (LSB) of the coefficients form the bit-plane with index "0." This strategy for selecting coefficients matches the bit-plane coding well, since coefficients to be coded earlier will be cast with the watermark first.

The watermark in our scheme is a random number sequence taking the two values 1 and -1. First, a seed, which can be viewed as a user ID number, is used to generate the watermark sequence with a length equal to the number of coefficients in a coding block. The sequence forms a watermark map with a dimension equal to that of a block. For a block of size $\gamma \times \gamma$, the watermark map is $W_{b_s}(x,y)$, where $x, y \in [0, \gamma)$, and the coefficient $I_{b_s}(x,y)$ that satisfies (3.12) is modified to $I'_{b_s}(x,y)$ by

$$I'_{b_s}(x,y) = I_{b_s}(x,y) + \left( W_{b_s}(x,y) \times 2^{\delta_{b_s}} \right), \tag{3.13}$$

where $W_{b_s}(x,y) = \pm 1$ is the watermark element in the position $(x,y)$ on the watermark map associated with block $b_s$, and $\delta_{b_s}$ is the number of bit-shifts, which depends on the implementation precision $P$ and the watermark energy. In general, the shift is chosen to be

$$\delta_{b_s} = P - I - G + \alpha_{b_s}. \tag{3.14}$$

As mentioned earlier, $P$, $G$ and $I$ are the implementation precision, the number of guard bits, and the image sample precision, respectively, and $\alpha_{b_s}$ is the watermark scaling factor. We may increase the embedded watermark energy by shifting the watermark a few bits to the left. It should be noted that $\alpha_{b_s}$ can vary in different subbands or blocks, so that we can adjust it according to different subband or block characteristics. Generally, only bit-planes around $P - I - G + \alpha_{b_s}$ will be affected by watermark embedding, so the bit-plane-based watermark embedding method does not affect the coding efficiency much.
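The embedding rule of (3.12) to (3.14) can be sketched in Python as follows. The sketch operates on signed integer coefficients stored as nested lists, generates the watermark map from a seed with Python's `random` module, and uses hypothetical function names and illustrative parameter values; it does not reproduce the random generator or the data layout of the actual implementation.

```python
import random


def watermark_map(seed, size):
    """size x size map of +1/-1 symbols generated from a seed (e.g., a user ID number)."""
    rng = random.Random(seed)
    return [[1 if rng.random() < 0.5 else -1 for _ in range(size)] for _ in range(size)]


def bit_shift(precision, sample_bits, guard_bits, scale):
    """Shift of (3.14): delta = P - I - G + alpha."""
    return precision - sample_bits - guard_bits + scale


def embed_block(block, wmap, q, delta):
    """Apply (3.13) to every coefficient that satisfies the selection rule (3.12)."""
    threshold = 1 << q                     # 2**q
    step = 1 << delta                      # 2**delta
    return [
        [c + w * step if abs(c) > threshold else c for c, w in zip(row, wrow)]
        for row, wrow in zip(block, wmap)
    ]


if __name__ == "__main__":
    block = [[300, -5], [-700, 40]]        # toy 2 x 2 block of quantized coefficients
    wmap = [[1, -1], [-1, 1]]              # fixed map so that the result is easy to follow
    print(embed_block(block, wmap, q=7, delta=3))
```

Only the two coefficients whose magnitude exceeds 2^7 are modified in the toy block; the small coefficients pass through unchanged, so the coding of the lower bit-planes is not disturbed for them.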
Besides, our experiments show that this setting of $\delta_{b_s}$ achieves a good balance between image quality and robustness of the watermark. Special care must be taken not to cast the watermark in blocks of the DC band, since doing so may lead to serious fidelity degradation in the watermarked image.

3.2.3 Watermark Detection

As in the embedding procedure, we only pick coefficients that satisfy (3.12) for watermark detection. We use the same bit-plane as a reference so that only the coefficients that are possibly embedded with the watermark are taken into consideration. If we embedded and detected the watermark in all of the wavelet coefficients, a similar objective might be achieved, but the watermarking process would be less efficient. (The computational load increases considerably, especially when we have to test many watermark sequences to see which one is embedded in the image.)

However, we have to take the significance of different subbands into account during watermark detection. A value called "extra LSB" and denoted by $\beta_{b_s}$ is determined along with the JPEG-2000 normalization process and can be interpreted as the number of insignificant bit-planes in the coding block $b_s$. $\beta_{b_s}$ is smaller in subbands that need better precision and larger in high-frequency subbands. Thus, the calculation of the correlation response is done as

$$\rho = \frac{\displaystyle \sum_{s} \sum_{b_s} \sum_{\substack{(x,y) \\ \| I^*_{b_s}(x,y) \| > 2^q}} \left( I^*_{b_s}(x,y) \times 2^{-\beta_{b_s}} \right) \times \left( W_{b_s}(x,y) \times 2^{\delta_{b_s} - \beta_{b_s}} \right)}{\displaystyle \sum_{s} \sum_{b_s} \sum_{\substack{(x,y) \\ \| I^*_{b_s}(x,y) \| > 2^q}} 2^{(2\delta_{b_s} - 2\beta_{b_s})}}, \tag{3.15}$$

where $\delta_{b_s}$ is defined in (3.14) and $I^*_{b_s}(x,y)$ is the coefficient of the investigated image. Note that there is a difference between (3.7) and (3.15). In (3.15), we normalize the correlation response by the sum of squares of the selected watermark symbols.
Since the watermark symbol takes the value 1 or -1, $W_{b_s}^2(x,y)$ is equal to 1 and is omitted in the denominator. There are two reasons for normalizing the correlation response. First, the value of the response will not be affected by the number of selected coefficients. Consequently, the same correlation response can be used in different scenarios, e.g., normal watermark detection and progressive watermark detection, which will be discussed in Section 3.2.4. Second, this normalization process helps to explain the high value of the correlation response of the watermarked image in our scheme. That is, the value of (3.7) in a watermarked image will be even larger than (3.11). This phenomenon will be discussed in Section 3.2.6.

3.2.4 Progressive Watermark Detection

Progressive watermark detection is one of the most attractive features for watermarking in JPEG-2000 compressed images. When a large image is being decompressed, it is not efficient to detect the watermark only after the whole image is formed. This is especially true for Internet applications. A fully-embedded compression scheme lets the user truncate the image at any time to get his or her "best" image. Thus, it is desirable that the watermark can also be detected progressively. JPEG-2000 is a bit-plane coder that supports the fully-embedded feature. Significant coefficients, which have been embedded with the watermark in our scheme, will be encoded and decoded first, so that progressive watermark detection can be achieved easily. However, we should set a threshold value $\eta$ to indicate the minimum number of coefficients needed for watermark detection.
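A normalized detector in the spirit of (3.15) can be sketched in Python as follows. The sketch assumes that the response is the sum, over the selected coefficients, of the coefficient shifted down by the "extra LSB" value times the watermark symbol scaled by two to the power of the shift minus the "extra LSB" value, divided by the sum of the squared scaling factors. This reading of the normalization, the tuple layout of `blocks` and the function name are assumptions made for illustration. The function also returns the number of selected coefficients, which a progressive detector could compare against the minimum count.

```python
def normalized_response(blocks, q):
    """Normalized correlation response over the blocks decoded so far.

    blocks: iterable of (coeffs, wmap, delta, beta) tuples, where coeffs and wmap
    are flat lists of equal length and delta, beta are the per-block shift and
    "extra LSB" values (layout assumed for this sketch).
    Returns (rho, count), where count is the number of coefficients selected.
    """
    numerator = 0.0
    denominator = 0.0
    count = 0
    threshold = 1 << q                          # selection rule: |c| > 2**q
    for coeffs, wmap, delta, beta in blocks:
        w_scale = 2.0 ** (delta - beta)         # weight of the watermark symbol
        c_scale = 2.0 ** (-beta)                # drop the insignificant bit-planes
        for c, w in zip(coeffs, wmap):
            if abs(c) > threshold:
                numerator += (c * c_scale) * (w * w_scale)
                denominator += w_scale ** 2     # W**2 == 1, so only the weight remains
                count += 1
    rho = numerator / denominator if denominator else 0.0
    return rho, count


if __name__ == "__main__":
    # One toy block: two coefficients exceed 2**7 and carry the watermark symbols.
    block = ([308, -5, -708, 40], [1, -1, -1, 1], 3, 0)
    print(normalized_response([block], q=7))
```

Because the response is a ratio, it can be evaluated on any prefix of the decoded blocks, which is what makes the same statistic usable for both normal and progressive detection.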
If the correlation response is higher than the current threshold, which is used to decide the existence of the watermark, but the number of selected coefficients is less than $\eta$, the detection process should continue in other blocks to avoid a possible false alarm. The derivation of the threshold in the proposed watermarking scheme will be discussed in Section 3.2.6.

3.2.5 Region of Interest Watermark

Most watermarking techniques embed the watermark in the entire image without taking the image content into account. For many applications, a certain portion of an image is more important than others. In particular, for object-oriented images, regions that cover the main objects are of major concern to the image owner. For example, in a picture with a person appearing at the center, the image viewer usually cares more about the person than the background of the picture. The portion that attracts more attention from an image viewer is the Region of Interest, i.e., ROI. It is desirable to embed a more robust watermark in the ROI to give it better protection [78].

The ROI is usually selected by image owners or users in the spatial domain. After selecting the ROI in an image, it is straightforward to embed the watermark in the spatial domain, i.e., modifying values of image pixels directly, so that we can freely decide which portions of the image should be embedded with which kind of watermark. It is worthwhile to point out that there may be different ROIs in the same image depending on the application. However, in order to achieve a robust watermark, the frequency domain is preferred for watermark embedding. Thanks to the combined spatial-frequency characteristic of the wavelet transform, we can make use of the spatial self-similarity between wavelet subbands to determine the coefficients belonging to the ROI for watermark embedding.
In addition to copyright enforcement, ROI watermarking can also be used for data labeling to assist content retrieval in image databases. In image archiving, high-performance data representations and structures are essential to image database management. With traditional database indexing methods, it is not easy to index the location, size and relationships of the objects in an image. In newly proposed database indexing techniques, the objects are extracted by low-level feature extraction or segmentation. Nevertheless, perfect image segmentation is very difficult to achieve, and image understanding methods may need to be included to improve the performance of object extraction. Related image understanding techniques are not yet mature enough for object retrieval. ROI watermarking could bridge the gap between the traditional and the newly proposed indexing methods.

As illustrated in Section 3.1.6, to achieve ROI coding, the encoder keeps the coefficients that belong to ROI unchanged while downscaling the coefficients that do not belong to ROI by a few bits. The encoding process is done as usual, while the coordinates of ROI and the scaling values are put in the bit-stream header for transmission. When the SNR progressive mode is used, ROI is sent before the background. The decoder can detect ROI by examining the magnitude of the received coefficients since all ROI coefficients are larger than the coefficients outside ROI. The decoder may have to upshift the received coefficients when necessary. Under this scenario, we do not have to change the proposed algorithm because only coefficients in ROI with a larger magnitude will be embedded with the watermark. All coefficients outside ROI are downshifted so that they do not satisfy the criterion in (3.12) for watermark embedding and retrieval.
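The effect of background downscaling on coefficient selection can be illustrated with a short sketch. It assumes, following the description of the parameter q in Section 3.2.7, that the criterion (3.12) selects coefficients whose magnitude is at least $2^q$. The function names, the example values and the shift of 6 bits are hypothetical and serve only as an illustration.

```python
import numpy as np


def downscale_background(coeffs, roi_mask, shift):
    """Keep ROI coefficients unchanged and right-shift background magnitudes."""
    mag = np.abs(coeffs)
    scaled = np.where(roi_mask, mag, mag >> shift)
    return np.sign(coeffs) * scaled


def selected(coeffs, q):
    """Assumed form of the selection criterion: magnitude >= 2**q."""
    return np.abs(coeffs) >= (1 << q)


if __name__ == "__main__":
    q = 24
    coeffs = np.array([3 << 24, -(5 << 24), 1 << 20, -(1 << 26)], dtype=np.int64)
    roi_mask = np.array([True, False, True, False])

    print("before ROI scaling:", selected(coeffs, q))
    scaled = downscale_background(coeffs, roi_mask, shift=6)
    print("after ROI scaling: ", selected(scaled, q))
```

In this example the two large background coefficients pass the criterion before scaling and fail it afterwards, so only significant coefficients inside ROI remain candidates for embedding and detection. A small ROI coefficient is still skipped, exactly as in the non-ROI case.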
To conclude, our scheme supports ROI watermarking automatically.

3.2.6 Threshold Decision and Analysis

It is essential to determine a threshold value so that the existence of a watermark sequence can be detected by comparing the correlation response with the selected threshold. This section has two main parts. First, we determine the threshold value so as to decrease the probability of false positive detection. False positive detection occurs when a watermark is detected in an image that contains no watermark, or when the wrong watermark ID is detected. Since the usage of the image will be limited once a certain watermark is found, false positive detection causes considerable inconvenience to legitimate users. Therefore, the threshold value should be decided carefully. Second, we examine the peak correlation response of the watermarked image and show that the existence of the watermark generates a large correlation response.

In watermark detection, we compute the sum of products of the shifted coefficients with the corresponding watermark symbols in the watermark map, and then divide it by a weighting factor, i.e., the weighted norm of the watermark sequence as indicated in (3.15). Here, we assume that the correlation response $\rho$ follows the Gaussian distribution due to the Central Limit Theorem. The variable $\rho$ in (3.15) has a mean equal to zero if the watermark does not exist. The variance of $\rho$ can be estimated by

[Equation (3.16): not recoverable from the transcript]

where all quantities are the same as those in (3.15). By definition, the Gaussian Integral Function [71] (or simply the Q function) can be written as

$$Q(z) = \int_{z}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx. \tag{3.17}$$

If a random variable $Y(u)$ follows the Gaussian distribution with mean $m$ and variance $\sigma^2$, the probability that $Y(u) > a$ can be expressed as

$$\Pr\{Y(u) > a\} = Q\!\left(\frac{a - m}{\sigma}\right). \tag{3.18}$$

We should set the threshold value according to the Q function so that the false alarm rate is lower than a given probability. As a result, the threshold value is a function of the variance of $\rho$. Here, we simply define the threshold as

$$T = \tau \times \sigma, \tag{3.19}$$

where $\tau$ is a scaling parameter. On one hand, we can lower the false alarm rate by raising $\tau$. On the other hand, we can reduce $\tau$ so that the embedded watermark is detected more easily even under very serious attacks, at the expense of a higher false alarm rate. For example, if the desired false alarm rate is around $10^{-12}$, we should choose the threshold value as $T = 7\sigma$ since, with zero mean,

$$\Pr\{Y(u) > T\} = Q\!\left(\frac{T}{\sigma}\right) = Q(7) = 1.28 \times 10^{-12}. \tag{3.20}$$

In progressive watermark detection, the probability of false alarm can be larger because the number of selected coefficients may not be large enough. We therefore choose a larger $\tau$ to obtain a higher threshold and keep the probability of false alarm as low as possible. It is also possible to adjust $\tau$ to adapt to different detection situations.

After setting the threshold, we analyze the peak value of the correlation response when a certain watermark exists in an image. To simplify the analysis, it is assumed that the extra LSB $\beta_{bs}$ and the bit-shift number $\delta_{bs}$ are both equal to $\delta$ in all blocks. If the watermark exists, (3.15) can be modified as

$$\rho = \frac{\sum \{(I_d + W_d \times 2^{\delta}) \times 2^{-\delta} \times W_d\}}{\sum (W_d \times W_d)}, \tag{3.21}$$

where $W_d$ and $I_d$ are the watermark symbol and the original coefficient selected by the watermark detector.
To be more precise, we calculate the expected value on both sides as

$$\mathcal{E}[\rho] = \mathcal{E}\!\left[\frac{\sum (I_d \times W_d)}{\sum (W_d \times W_d)}\right] \times 2^{-\delta} + 1. \tag{3.22}$$

As discussed in Section 3.2.1, one may think that the first term on the right-hand side of (3.22) is zero so that the expected value of the maximal correlation peak is unity. However, the peak can be substantially larger than 1 if a certain watermark exists in the image, as argued below.

Let $I_e$ denote the original coefficient selected by the watermark embedder, i.e., a coefficient satisfying (3.12), and let $W_e$ be its respective watermark symbol, i.e., the watermark symbol to be cast on the selected coefficient. First, we calculate the expected cumulative sum of $I_e \times W_e$:

$$\mathcal{E}[R_e] = \mathcal{E}\!\left[\sum (I_e \times W_e)\right]. \tag{3.23}$$

Note that $\mathcal{E}[R_e] = 0$ because $I_e$ and $W_e$ are independent. Let us divide $\mathcal{E}[R_e]$ into two parts, i.e.,

$$\mathcal{E}[R_e] = \mathcal{E}[R_{e1}] + \mathcal{E}[R_{e2}] = \mathcal{E}\!\left[\sum (I_{e1} \times W_{e1})\right] + \mathcal{E}\!\left[\sum (I_{e2} \times W_{e2})\right]. \tag{3.24}$$

$I_{e2}$ is a coefficient satisfying both of the following conditions:

$$2^{q} \le |I_{e2}| < 2^{q} + 2^{\delta}, \quad q > \delta, \tag{3.25}$$

and

$$I_{e2} \times W_{e2} < 0, \tag{3.26}$$

where $W_{e2}$ is the respective watermark symbol of $I_{e2}$. Obviously, $\mathcal{E}[R_{e2}]$ is a negative number, so $\mathcal{E}[R_{e1}]$ has to be positive to make $\mathcal{E}[R_e]$ equal to 0. However, the coefficients $I_{e2}$ in $R_{e2}$ will not be picked by the detection process in watermark retrieval: after embedding, their highest non-zero bit is $q - 1$, which is lower than $q$. Therefore, the expected value of the correlation response calculated in the detection process is equal to

$$\mathcal{E}[\rho] = \{\mathcal{E}[R_e] - \mathcal{E}[R_{e2}]\} \times 2^{-\delta} + 1 = \mathcal{E}[R_{e1}] \times 2^{-\delta} + 1 > 1. \tag{3.27}$$

[Figure 3.3: Extra correlation gain from coefficient selection. The figure plots the magnitudes of the original and the watermarked coefficients A to J against the threshold for embedding and detection, and tabulates for each coefficient the sign of the coefficient, the sign of the watermark, whether it is selected for embedding, the change of magnitude, and whether it is selected for detection. The selection columns read as follows.]

Coeff.   Selected for embedding?   Selected for detection?
A        Yes                       No
B        Yes                       Yes
C        Yes                       Yes
D        No                        No
E        Yes                       No
F        Yes                       Yes
G        No                        No
H        Yes                       No
I        No                        No
J        Yes                       Yes

The number of coefficients $I_{e2}$ is quite large, so the first term on the right-hand side of (3.27) can be large enough to generate a peak value $\rho$ that may even reach 2 when the watermark exists in the image. Therefore, the first term on the right-hand side of (3.22) is positive and does contribute to the peak correlation response when a watermark is embedded. An example is shown in Fig. 3.3. During watermark embedding, the coefficients D, G and I remain unchanged because their magnitude is lower than the embedding threshold. They will not be chosen for watermark detection either, because the threshold for detection is the same as the one for embedding. The coefficients A, E and H are chosen for watermark embedding but will not be chosen for watermark detection, since their magnitudes fall below the detection threshold after embedding. These coefficients are $I_{e2}$, and ignoring them in watermark detection increases the correlation response because the sign of these coefficients differs from that of their watermark symbols.

One main concern of correlation-based watermark detection is efficiency. In applications where one is required to determine which watermark ID number is embedded, we may have to check all possible ID numbers. Assuming that the total number of users is $2^{32}$, it is not practical to try every ID number from 0 to $2^{32} - 1$ to determine exactly which watermark ID is embedded.
Thanks to the block coding of JPEG-2000, we can simplify the detection structure. We first divide all coding blocks into $n$ subsets $S_i$, $i = 1, \ldots, n$. We can embed $k$ bits in each subset $S_i$. Therefore, the total number of bits that can be embedded in the image is $k \times n$. In watermark detection, we only need to check $2^k$ different numbers in each subset to decide the $k$-bit value, and $k \times n$ bits can be decoded correctly after the $n$ subsets are processed. If we also consider the sign of the correlation response, we need to check only $2^{k-1}$ watermark candidates in each block. For larger images, we can use more subsets to allow a larger watermark capacity. However, because spread-spectrum watermarking is adopted in the system, a spreading gain must be maintained to reliably embed and retrieve the watermark. Therefore, there is a tradeoff between robustness and capacity.

3.2.7 Experimental Results

In this section, we show some experimental results to demonstrate the robustness of the proposed watermarking scheme. The embedded watermark has the ID number 500, which is the seed used to generate the random watermark sequence. The watermark is embedded when the image is compressed. It can be detected when the JPEG-2000 bit-stream is expanded. We test 1000 watermark ID numbers to see whether the correct one is detected without ambiguity. A threshold value calculated with (3.19) is used to determine whether a certain watermark exists in the image.

First of all, we examine the parameters to be used in the experiments. In a JPEG-2000 implementation, the coefficient precision P can be either 32 or 16. We choose 32 since it is commonly used in software and hardware designs today. The number of guard bits G is chosen as 2, which has been shown to be a reasonable number to avoid the overflow problem.
Therefore, the nominal range of subband coefficients is $(-2^{29}, 2^{29})$. The image sample precision I is equal to 8 for gray-level images. In the experiments, we let $q$ in (3.12) be equal to 24, so that a coefficient whose highest non-zero bit is at position 24 or above is viewed as a significant coefficient for watermark embedding or retrieval. The watermark sequence $W_{bs}$, which takes the value 1 or −1, is left-shifted by $22 + \alpha_{bs}$ bits and then added to the selected coefficients to form the watermarked coefficients as indicated in (3.14). The value of the “extra LSB,” which indicates the number of insignificant bit-planes in a coding block, is determined by JPEG-2000. We do not change it. In watermark detection, our algorithm tends to generate a very high correlation response if the watermark exists. This allows us to set a higher threshold value and keep the probability of false alarm very low. The threshold scaling parameter $\tau$ in (3.19) is set to 7. If progressive watermark detection is used, we choose $\tau$ to be 8.5. As mentioned in Section 3.2.4, we should define the minimum number $\eta$ of selected coefficients required to claim the existence of the watermark. In the experiment, $\eta$ is chosen to be 500.

Two JPEG-2000 test images, “Bike” and “Woman,” of size 2048 by 2560, shown in Fig. 3.4(a) and (c), respectively, are used to demonstrate the invisibility of the embedded watermark. Because of the large size of the images, we encode them at a low bit rate of 0.5 bpp so that they can be stored and transmitted efficiently. The SNR progressive mode is enabled to demonstrate the fully-embedded feature and progressive watermark detection.
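The stopping rule of progressive detection described in Section 3.2.4, with the minimum coefficient count $\eta = 500$ used here, can be sketched as follows. This is a simplified illustration rather than the detector of the proposed scheme: the exact weighting of (3.15) is not reproduced, the response is taken as a plain normalized correlation, and the threshold is passed in as a precomputed number. The function name and the block representation are hypothetical.

```python
import numpy as np


def progressive_detect(blocks, threshold, eta=500):
    """Accumulate a normalized correlation block by block and stop early.

    blocks: iterable of (coeffs, marks) pairs in decoding order, where
            coeffs are the (already bit-shifted) selected coefficients and
            marks are the corresponding +1/-1 watermark symbols.
    Returns (detected, number_of_coefficients_used).
    Assumption: the response is sum(coeffs * marks) / sum(marks * marks).
    """
    num = 0.0
    den = 0.0
    count = 0
    for coeffs, marks in blocks:
        num += float(np.dot(coeffs, marks))
        den += float(np.dot(marks, marks))
        count += len(coeffs)
        # Claim detection only when enough coefficients have been seen.
        if den > 0 and count >= eta and num / den > threshold:
            return True, count
    return False, count


if __name__ == "__main__":
    marks = np.array([1, -1] * 50)
    marked = [(marks.astype(float), marks)] * 10   # response equals 1
    clear = [(np.zeros(100), marks)] * 10          # response equals 0
    print(progressive_detect(marked, threshold=0.5))
    print(progressive_detect(clear, threshold=0.5))
```

The check on the coefficient count is what prevents an early false alarm: a response above the threshold computed from only a few coefficients is not accepted, and detection continues with the next block.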
When the watermark function is disabled, the peak signal-to-noise ratio (PSNR) values between the original and the compressed images of “Bike” and “Woman” are 33.54 dB and 33.70 dB, respectively. The PSNR values between the original and the watermarked/compressed images are 32.49 dB and 33.09 dB, respectively. The quality degradation resulting from watermark insertion is therefore small (1.05 dB and 0.61 dB). The watermarked images are shown in Fig. 3.4(b) and (d).

Figure 3.4: Original images vs. compressed/watermarked images: (a) original “Bike,” (b) watermarked “Bike” (PSNR: 32.49 dB), (c) original “Woman” and (d) watermarked “Woman” (PSNR: 33.09 dB). Both images are of size 2048 × 2560 and are compressed at 0.5 bpp.

Next, we demonstrate the correlation response in the watermark detection process. Detection results are shown in Fig. 3.5. We can see clearly that there is a peak at watermark ID number 500 in both cases. The peak value of the correlation response is much larger than the threshold value T, which is shown as the dashed line in the figures. The responses of the other watermarks are much lower than T. The target watermark can thus be determined unambiguously.

It usually takes a while to decode such large images completely from the entire bit stream. Furthermore, watermark detection may involve a large number of coefficients, so watermark detection is also time-consuming. We take the “Woman” image as an example. There are over 400K coefficients selected for watermark detection. In fact, the number of selected coefficients necessary for watermark detection can be reduced. We use progressive watermark detection, taking advantage of the fully-embedded feature of JPEG-2000, to speed up the detection process. There are two ways to demonstrate this property.
The first one is to adopt progressive watermark detection. During the decoding process, whenever the correlation response is larger than the threshold T and the number of selected coefficients is larger than $\eta$, we stop watermark detection and claim the existence of the watermark. We test the “Woman” image in progressive watermark detection. The result is shown in Fig. 3.6(a). The number of selected coefficients is 32,611. The value is still large because we set a conservative threshold to avoid false alarms. In the second approach, we do not use progressive detection but specify the decoding rate of the image. Fig. 3.6(b) shows the detection result when the image is decoded at a bit rate of 0.01 bpp. The PSNR of the decoded image is 22.98 dB. In this case, the number of selected coefficients is 6,511. The existence of the watermark with ID number 500 is detected in both cases, yet the speed of watermark detection is greatly improved.

Figure 3.5: Watermark detection results for (a) “Bike” and (b) “Woman” with 1000 watermark sequences tested (horizontal axis: watermark ID).

Figure 3.6: Progressive watermark detection: (a) watermark detection by using the progressive mode and (b) watermark detection of the image at a bit rate of 0.01 bpp (horizontal axis: watermark ID).

The next example is ROI watermarking. ROI coding is especially useful for large images. We use another JPEG-2000 test image, “Aerial2” (2048 × 2048), shown in Fig. 3.7(a), as an example because ROI is important in aerial photography. We assign two rectangular regions that cover two structures in “Aerial2” as the desired ROI. We then enable the SNR progressive mode to encode the image, and decode it at a lower bit rate. We can see from Fig. 3.7(b) that the two regions are well reconstructed while other parts of the image remain blurred. The ROI coding can be verified by the spatial difference between the transmitted image and the roughly decoded image, shown in Fig. 3.7(c), where the lighter the pixel is, the larger the difference between the two images. The detection result shown in Fig. 3.7(d) indicates that the proposed watermarking scheme matches the ROI feature of JPEG-2000, and the embedded watermark can be detected without any difficulty. Besides, with ROI watermarking, it is also possible to embed different watermark ID numbers into different objects in the same image. In this case, the watermark can be viewed as a form of data labeling, which may benefit content-based retrieval in the management of multimedia databases, as mentioned before.

Figure 3.7: ROI watermarking: (a) the fully reconstructed image from the ROI coding bit stream, (b) the decoded image with a bit rate of 0.4 bpp, (c) the spatial difference of these two images (the lighter the pixel is, the larger the difference; the two black rectangles are the assigned ROI) and (d) the watermark detection result.

Although wavelet-based coding schemes have advantages over the block DCT-based coding method in terms of rate-distortion performance, reconstructed images still suffer from various coding artifacts such as ringing, graininess and blotchiness. In JPEG-2000 VM, a postprocessing technique is used to reduce these artifacts so that the overall visual quality of decoded images can be improved substantially. Therefore, we apply the JPEG-2000 postprocessing technique [33, 69] to the decoded image and then test the performance of watermark detection. We decode the “Bike” image at 0.125 bpp and then feed it to the postprocessing stage with three iterations. The effect of the postprocessing can be understood by comparing the two images, before and after postprocessing. Fig. 3.8(a) and (b) show only part of the “Bike” image. The ringing artifact around the bike handlebar in Fig. 3.8(a) is greatly reduced in Fig. 3.8(b), so the visual quality is improved. The watermark detection result in Fig. 3.8(c) indicates that our watermarking scheme can be coupled very well with JPEG-2000, including the postprocessing procedure.

Figure 3.8: Postprocessing: (a) part of the “Bike” image before postprocessing, (b) part of the “Bike” image after postprocessing, in which the ringing artifacts are greatly reduced and (c) the watermark detection result.

We then apply a series of attacks to show the robustness of the proposed watermarking scheme. First, we consider compression attacks, i.e., compressing the image with other schemes at low bit rates.
The well-known DCT-based codec, JPEG, and the wavelet-based codec, SPIHT, are the two compression attacks under test. The attacked image is encoded into a JPEG-2000 bit-stream for watermark detection. Since the perceptual loss caused by compression varies from image to image, some images can be compressed with a higher compression ratio while preserving good image quality. To verify that the watermark is robust against JPEG and SPIHT attacks, we compress the image at extremely low bit rates. That is, the “Woman” image is compressed by JPEG with quality factor equal to 1 and by SPIHT at a bit rate of 0.005 bpp. The resulting images and detection results are shown in Fig. 3.9. In the JPEG-attacked image shown in Fig. 3.9(a), a very serious blocking artifact appears since JPEG encodes an image block by block. The SPIHT-attacked image, shown in Fig. 3.9(c), is heavily blurred since only a few coefficients are used to reconstruct the whole image. The PSNR values of the JPEG- and SPIHT-attacked images are 22.61 dB and 22.50 dB, respectively. Although the two images are compressed to an unacceptable degree, the embedded watermark still survives well, as shown in Fig. 3.9(b) and (d).

GIF is a popular file format for graphics. Unlike JPEG, the maximum number of colors that can be used for a picture is 256. For some images, no annoying visual degradation is generated when they are converted to the GIF format. Thus, color reduction is another important type of attack that a watermark must resist. We tested two kinds of attacks. The first one is to reduce the number of colors from 256 to 4 for a gray-level image. The second one is image halftoning, which is used quite often in fax, newspapers and other publications. The attacked images with the detection results are shown in Fig. 3.10.
The distinctive peak of the correlation response indicates the survival of the watermark even though the attacked image is visually different from the original one.

Figure 3.9: Compression attacks: (a) the watermarked “Woman” image compressed by JPEG with quality factor equal to 1 (PSNR: 22.61 dB), (b) the watermark detection result of the JPEG-attacked image, (c) the watermarked “Woman” image compressed with SPIHT at a bit rate of 0.005 bpp (PSNR: 22.50 dB) and (d) the watermark detection result of the SPIHT-attacked image.

Figure 3.10: Color reduction attacks: (a) the attacked image with 4 colors and (c) the image undergoing halftoning; (b) and (d) are the detection responses of (a) and (c), respectively.

With the advances of graphical tools, users may edit an image with some artwork. Thus, it is interesting to test the robustness of the proposed watermarking scheme by using a popular editing tool, e.g., Paint Shop Pro. In order to demonstrate the attack effect, we choose the two classic images, “Lena” and “Baboon,” shown in Fig. 3.11(a) and (c), as the test images because of their diverse image characteristics and structures. Both images are of size 512 × 512. We encode “Lena” at 0.35 bpp and “Baboon” at 2 bpp so that the PSNR values of the resulting compressed/watermarked images (compared with the original image) are around 34 dB. For the “Lena” image, if the watermark function is disabled, the PSNR value is 35.64 dB, while that of the watermarked/compressed image is 34.44 dB.
For the “Baboon” image, the PSNR values of the compressed and the compressed/watermarked images are 34.98 dB and 34.11 dB, respectively. Thus, the quality degradation resulting from watermark insertion is limited. The two watermarked images are shown in Fig. 3.11(b) and (d).

Figure 3.11: Images in extensive watermark testing: (a) the original “Lena,” (b) the compressed/watermarked “Lena” (bit rate: 0.35 bpp, PSNR: 34.44 dB), (c) the original “Baboon” and (d) the compressed/watermarked “Baboon” (bit rate: 2 bpp, PSNR: 34.11 dB).

Detection results are shown in Fig. 3.12. The tests include 0) no attack, 1) sharpening, 2) edge enhancement, 3) low-pass filtering, 4) high-pass filtering, 5) dilating, 6) eroding, 7) histogram equalization, 8) mosaic, 9) 25% uniform noise adding and 10) 25% random noise adding. We show the maximum response (resulting from the embedded watermark), the second largest response (related to one of the other 999 watermarks) and the detection threshold. We can see that different attacks do have varying effects on the embedded watermark, while the watermark remains robust under these attacks since it is embedded in the significant coefficients, which usually remain stable after most image processing operations.

Figure 3.12: Detection results of extensive watermark testing on the watermarked images, (a) “Lena” and (b) “Baboon.” Each panel plots the maximum response, the second largest response and the threshold for each attack.

Finally, we would like to measure the false positive rate of the proposed system.
As mentioned in Section 3.2.6, there are two cases of false positive detection: (1) the watermark is detected in an un-watermarked image and (2) the wrong watermark ID is detected. In the first case, the best way to measure the probability of false positive detection is to run the proposed watermark detector on numerous clear (un-watermarked) images. This is difficult to achieve due to the shortage of content sources. However, we believe that the Gaussian assumption should hold in this case, so the false positive rate should be covered by our analysis. The major concern about the accuracy of the Gaussian assumption comes from the second case, i.e., a wrong watermark is detected in an image watermarked with a different ID. We verify this case with experimental data. To do so, we generated 100000 watermark sequences, constructed the watermarked image by embedding one of the watermark sequences, and used the 100000 watermarks (one correct watermark and 99999 incorrect watermarks) for detection. We counted the number of tested watermarks with a correlation response higher than the threshold value set according to the allowable false positive rate. We tested false positive rates from $5 \times 10^{-3}$ to $10^{-5}$. We measured more points between $10^{-4}$ and $10^{-5}$ because measurements in this range may be more accurate and more important.

Table 3.1: The number of false positive detections measured from 100000 tested watermark sequences vs. the estimated number of false positive detections based on the Gaussian assumption.

Error rate            Number of wrong watermarks detected    Expected number of false detections
$5 \times 10^{-3}$    525                                    500
$10^{-3}$             99                                     100
$5 \times 10^{-4}$    55                                     50
$10^{-4}$             12                                     10
$8 \times 10^{-5}$    10                                     8
$5 \times 10^{-5}$    7                                      5
$3 \times 10^{-5}$    3                                      3
$10^{-5}$             1                                      1
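The expected counts in Table 3.1 and the threshold scaling behind each error rate can be reproduced with a few lines, under the Gaussian assumption of Section 3.2.6. This is a minimal sketch using only the Python standard library. The function names are illustrative, and the expected count is approximated as the number of tested watermarks times the target error rate.

```python
import math
from statistics import NormalDist


def q_function(z):
    """Gaussian tail probability Q(z) of (3.17), via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tau_for_rate(p):
    """Scaling parameter tau such that Q(tau) = p, i.e., T = tau * sigma in (3.19)."""
    return NormalDist().inv_cdf(1.0 - p)


def expected_false_detections(num_tested, p):
    """Expected number of wrong watermarks whose response exceeds the threshold."""
    return num_tested * p


if __name__ == "__main__":
    print("Q(7) =", q_function(7.0))
    for p in (5e-3, 1e-3, 5e-4, 1e-4, 8e-5, 5e-5, 3e-5, 1e-5):
        tau = tau_for_rate(p)
        count = expected_false_detections(100000, p)
        print(f"rate={p:.0e}  tau={tau:.3f}  expected={count:.0f}")
```

For the value $\tau = 7$ used in the normal detection mode, `q_function(7.0)` gives the false alarm probability quoted in (3.20). The error rates in the table correspond to much lower thresholds, which is what makes them measurable with 100000 tested sequences.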
The results are shown in Table 3.1. We see that the number of false detections matches the number predicted from the Gaussian assumption well. Table 3.1 thus supports the suitability of our analysis. Following this trend, we expect the false positive analysis to hold when a higher threshold value (with a lower false positive rate) is set.

3.3 Steganography in JPEG-2000 Compressed Images

In this section, we consider a scenario of information hiding in JPEG-2000 that differs from the watermarking scheme presented in Section 3.2. We focus on developing a steganographic scheme under the framework of JPEG-2000 so that a high volume of information can be secretly transmitted to the intended recipient in a more reliable fashion. The steganographic application may be of military use. It can also be useful when the information is sensitive in some way, so that the sender and the receiver would like to exchange it through a public channel without being noticed by a third party. In our case, a digital image serves as a host, or camouflage, to cover the existence of the hidden information. Instead of focusing on robustness, as most digital image watermarking research does, we give more weight to capacity, reliability and security. The goal is to ensure that a suitable amount of hidden information is transmitted without errors.

Two approaches to steganography in digital images are commonly used. The first approach is to embed the information into the image data without taking any file format into consideration, as many digital watermarking schemes do. The second approach is to work explicitly on a specific image format. We believe that the second approach may be more appropriate for covert communication.
First of all, digital images are usually compressed to facilitate their storage or transmission. For natural images, lossy compression is more commonly employed to form a compact representation of the image. By embedding the secret information into the image data, we run the risk of losing it in the subsequent image compression, which contradicts the requirement of reliability. Besides, embedding the hidden information in the compressed image may also help avoid the problem of attacks, since transcoding or signal processing rarely happens during transmission once the images have been stored in a certain format. However, embedding information in the compressed bit-stream is much more difficult than in the image domain, since the space for hosting the hidden information has been significantly squeezed by modern compression methods.

Many previous information-hiding schemes operating in the compressed domain were based on JPEG. The choice is partly because most of the still images circulated nowadays are compressed with JPEG. The other reason is that, as a block DCT codec, JPEG is a good candidate for information embedding due to its fixed block structure. As JPEG-2000 is viewed as a promising image standard for the near future, we develop novel and feasible approaches to effectively achieve steganography in JPEG-2000 compressed images.

3.3.1 Challenges of Steganography in JPEG-2000

Before designing a steganographic scheme under the framework of JPEG-2000, we have to first determine an appropriate position in the JPEG-2000 coding flow for effective information embedding. From the structure given in Fig. 3.1, there are three positions to be considered. We examine their suitability for information embedding below.
(1) Image Transform

After the intra-component image transform, the image data are transformed to wavelet coefficients. If we modify the data at this stage, the scheme will be equivalent to many existing wavelet-based watermarking algorithms, which may treat other wavelet-based codecs as attacks. For digital watermarking, the payload is usually low, and multiple embedding with majority detection or the spread-spectrum concept can be applied. The embedded information can thus have sufficient robustness against the lossy compression of another codec. However, multiple embedding is not suitable for the steganographic application, as the required payload is usually high and we have to make efficient use of the already limited bandwidth.

(2) Quantization

Quantization is an important step in image compression, which reduces certain visual redundancy for efficient coding. As mentioned in Section 3.1, quantization is the primary source of information loss. To avoid losing the hidden data due to coarser quantization, we could embed them in the quantization indices. This solution works for JPEG (many JPEG-based data hiding schemes operate on the quantization indices), but it is not good for steganography in JPEG-2000. It should be noted that wavelet-based coders usually truncate the compressed bit-stream to meet the targeted bit-rate. In JPEG-2000, the PCRD optimization strategy is adopted, so that the truncation mechanism is activated after the whole image has been compressed. If we embed the information at this stage, we cannot predict exactly which quantization index, or which bit-plane of an index, will be included in the final code stream. The embedded information will not be perfectly recovered unless the lossless compression mode is chosen.

(3) Coding

If the information is embedded in the output of tier-2 coding, i.e., the JPEG-2000 packets, it can be guaranteed that all the embedded information will be received without error and in the correct order, because we avoid the two major sources of information loss, i.e., quantization and bit-stream truncation. However, we will have difficulty in modifying the packets for information embedding, since the bit-streams may have been compactly compressed by the arithmetic coder. Careless modification could result in failure to expand the compressed image.

3.3.2 Progressive Embedding of a Hidden Image and Its Drawbacks

There does exist a solution that partially achieves high-volume information hiding in wavelet-based codecs. From the previous discussion, we know that the hidden information could be lost after the subsequent truncation of the compressed bit-stream if it is embedded in the quantization indices. However, intuition suggests that certain indices have a better chance of survival since they are of more significance. To be more specific, wavelet coefficients in lower frequency subbands are usually more important than those in higher frequency subbands. An extreme example is resolution progressive transmission, in which lower frequency subbands are sent prior to higher ones. In this case, lower frequency bands should be preserved well at high bit-rates. On the other hand, although some portions of the embedded information may be lost, the recipient can still receive enough information if the significant portions are transmitted successfully. Therefore, if the hidden information is a digital image of a smaller size, we may transmit it by embedding it in the quantization indices. This general idea should work in most wavelet codecs. We briefly describe the idea and the factors that should be considered.

First of all, we decompose the hidden image with the wavelet transform.
The number of wavelet decomposition levels and the image size should be related to the host image. For example, if the host image is 512 × 512 and decomposed with 3 levels, we may set the rule that the hidden image is one fourth the size, i.e., 256 × 256, with the same number of decomposition levels. We may need to pad the sides of the hidden image when its size is smaller than required. This strategy makes sure that no image-specific information has to be known by the recipient. Besides, it should be noted that the coefficients in each level are represented by a fixed number of bits, which is also known by both sides. Then we embed the wavelet subbands of the hidden image, according to their importance, into their counterparts in the host image. Basically, lower (higher) frequency subbands of the hidden image will be embedded in the lower (higher) frequency subbands of the host image. Embedding is based on modification of the lower bit-planes of the host quantization index, i.e., replacing its least significant bits with the bits of the hidden information. The number of bit-planes used for information embedding will affect both the capacity and the quality of the resulting image. Embedding follows certain predefined agreements between the information embedder and the recipient. A rule of thumb is that the sign of a coefficient, which is comparatively more important than the magnitude, should be embedded more carefully. In addition, bit-planes of lower frequency subbands of the hidden image should also be embedded more carefully than those of the higher frequency subbands. Permutation can be applied to the hidden information before embedding to increase uncertainty. The recipient should be able to permute it back correctly to reconstruct the hidden image. We should not embed any data in the lowest frequency band, i.e., the DC band, of the host image, to avoid generating unpleasant artifacts.

Many existing wavelet-based watermarking schemes emphasize the function of progressive embedding/detection. However, there are some drawbacks in the idea of progressive hidden image transmission. First of all, we know from the previous discussion that many parameters have to be known in advance by both the sender and the receiver so that the hidden image can be correctly extracted and perceived. If a lot of information has to be shared beforehand between the sender and the receiver through a certain side channel, the steganographic scheme becomes impractical. Besides, if the hidden information is an image, the eavesdropper may have more chances to detect its existence, given that the characteristics of an image are quite different from those of the host signal. Encryption might not be applicable here, since we cannot guarantee that the hidden information is transmitted without errors due to the truncation of JPEG-2000 or other wavelet codecs, and an encrypted bit-stream is usually not robust to any error. The most we can do to increase the security level is to permute the positions of the wavelet coefficients of the hidden image as mentioned above. It is not clear how far this permutation procedure will prevent the eavesdropper from detecting the hidden image. Finally, the requirement of embedding an image as the hidden information significantly limits the usage of this algorithm. To achieve practical covert communication, the embedded information should be general binary data in any form, such as texts, encrypted bit-streams, or raw/compressed images. It is apparent that the method presented above may not meet our requirements.
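The least-significant-bit replacement used in the progressive embedding idea above can be sketched as follows. This is only a minimal illustration under our own assumptions: the function names `embed_lsbs` and `extract_lsbs` are hypothetical, the quantization index is modeled as a signed integer whose sign is left untouched, and the number of replaced bit-planes is assumed to be agreed upon by both sides in advance.

```python
def embed_lsbs(index, bits):
    """Replace the len(bits) lowest magnitude bits of a signed quantization
    index with hidden bits (given most significant first)."""
    n = len(bits)
    sign = -1 if index < 0 else 1
    magnitude = abs(index)
    payload = 0
    for b in bits:
        payload = (payload << 1) | (b & 1)
    # Clear the n lowest bit-planes, then write the hidden bits into them.
    magnitude = ((magnitude >> n) << n) | payload
    return sign * magnitude


def extract_lsbs(index, n):
    """Read back the n lowest magnitude bits, most significant first."""
    magnitude = abs(index)
    return [(magnitude >> k) & 1 for k in range(n - 1, -1, -1)]


if __name__ == "__main__":
    marked = embed_lsbs(44, [1, 0])      # 101100 -> 101110
    print(marked, extract_lsbs(marked, 2))
```

Using more bit-planes per index raises the capacity but also the distortion of the host image, which is the trade-off noted in the text.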
In the following sections, we develop a practical steganographic scheme to convey high-volume general binary data in a more secret and reliable manner.

3.3.3 Information Embedding with Lazy Mode Coding

As analyzed in Section 3.3.1, in order to transmit general binary data secretly and without error, we should embed the hidden information in the JPEG-2000 packets. However, we have to avoid modifying the part of the bit-stream that is entropy coded, so that it can still be decoded correctly. In JPEG-2000, a so-called “lazy mode” coding option is introduced, in which the arithmetic coding procedure is completely bypassed for most of the significance and magnitude refinement coding passes. Thanks to this lazy mode coding option, we propose a steganographic scheme that solves all the above-mentioned problems to achieve reliable covert communication in JPEG-2000.

The choice of the lazy mode for the proposed scheme can be further justified as follows. It has been observed that, at high bit-rates, the symbols produced by the significance and refinement passes have distributions close to uniform, so that there is no substantial benefit from arithmetic coding. Bypassing the MQ coder can thus reduce the complexity and improve the execution speed without degrading the coding performance. Since images used to hide a large volume of data are usually compressed at high bit-rates, it is appropriate to embed information in JPEG-2000 with the lazy mode enabled. Besides, our embedded information can also be uniformly distributed, given that encryption is usually applied. Therefore, these passes turn out to be good candidates for modification with less chance of being noticed. The proposed scheme is to embed the data in selected magnitude refinement passes.
With the lazy mode, except for the four most significant bit-planes, the remaining such passes are raw coded, so we can simply modify the raw data without causing problems. However, we should ensure that certain bit patterns never occur in the output, since such patterns may be used for error resilience.

The choice of the magnitude refinement pass for information embedding is explained as follows. Among the three types of passes, the significance pass carries significance information and the necessary sign information. Any modification of the significance passes could cause either sign flipping or decoding errors. The cleanup pass is sometimes run-length coded, so its modification is also prohibited. The magnitude refinement pass carries the subsequent bits after the most significant bit of each sample. The significant bits can act as visual masking to make the modification of these subsequent bits less obvious. By considering these factors, we conclude that only the magnitude refinement pass is suitable for information embedding.

3.3.4 Selection of Refinement Passes for Embedding

In order to avoid degrading the composite image severely, we may use only a subset of the raw coded magnitude refinement passes for information embedding. We describe three scenarios for selecting suitable magnitude refinement passes as follows.

(1) Fixed number of the lowest bit-planes

The most straightforward method is to examine the bit-planes where these raw coded magnitude refinement passes are located. Given that the total number of meaningful bit-planes in a subband is K, which is signaled explicitly in the code stream, the modification for information embedding is restricted to those bit-planes lower than K - G. The smaller the value assigned to G, the more bit-planes can be modified, so that more hidden information can be carried.
Basically, this idea is similar to the common LSB-based information-hiding methodologies, in which only the lowest few bit-planes are modified to avoid introducing visible artifacts. The subtle difference is that, in this steganographic scheme, not all the data in those bit-planes are affected by the embedding process, but only the data included in the magnitude refinement passes. The advantage of this embedding scenario is its simple implementation, since both information embedding and extraction can be done efficiently in the tier-2 coding. Besides, the information embedder and the extractor only need to know the parameter G to achieve successful secret communication. The amount of information that has to be transmitted through other subsidiary channels is thus significantly reduced.

(2) Bit-planes below the MSB

A more sophisticated way is to take the MSB of the quantization index into account. In this embedding scenario, a digit of a quantization index is allowed to be modified for information embedding if it is located at a bit-plane that is below the MSB by at least P bit-planes. This idea is somewhat analogous to those watermarking schemes that use the magnitude of a host coefficient to scale the embedded watermark [14, 13]. If a host signal is large in magnitude, we may modify it with a greater scale so that more information or a stronger watermark can be embedded owing to the masking effect. Therefore, the capacity of the steganographic scheme may be improved without further affecting the visual quality. In addition, we can intentionally skip some bit-planes of certain coefficients during embedding, based on a more advanced visual masking model or increased security concerns.
However, the complexity of the implementation increases accordingly, since the information embedding/extraction processes become coefficient-wise instead of simply viewing whole bit-planes as a group. In this way, the tier-1 coding has to be involved in the information embedding/extraction processes. Besides, we have to deal more carefully with problems such as the varying length of the coding passes and the special pattern for error resilience.

(3) Backward embedding

A better way of selecting suitable magnitude refinement passes for information embedding is to consider the importance of these passes to the overall quality of the compressed image. As explained in Section 3.1.4, the tier-2 coding achieves rate scalability through multiple quality layers. Each coding pass is either assigned to one of the layers or discarded according to its rate-distortion slope, which is calculated in the tier-1 coding and passed to the tier-2 coding for organizing the code stream. The coding passes with larger rate-distortion slopes are included earlier, in the lower layers, while the coding passes with smaller rate-distortion slopes are included later, in the higher layers.

Our goal is to hide as much information as possible with minimal impact on the image quality. Obviously, the embedding process should work in the opposite order of the tier-2 coding, selecting less important coding passes first for modification. Therefore, we propose the idea of backward embedding to take account of the importance of the passes to the overall image quality. After the tier-2 coding determines the passes that will be included in the code stream, an extra procedure embeds the information backward, starting from the last included refinement pass.
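The backward embedding order can be illustrated with a short sketch. It rests on several assumptions of ours: the `CodingPass` record and the functions `embed_backward` and `extract_backward` are hypothetical names, the passes are listed in the order in which the tier-2 coder includes them, whole bytes of a raw coded refinement pass are overwritten, and the bit-stuffing, error resilience pattern and termination pattern issues are deliberately ignored. It is not the implementation used in the experiments.

```python
from dataclasses import dataclass


@dataclass
class CodingPass:
    kind: str        # "significance", "refinement" or "cleanup"
    raw: bool        # True if the pass bypasses the arithmetic coder
    data: bytearray  # pass contents as placed in the code stream


def embed_backward(passes, payload, max_passes):
    """Overwrite raw coded refinement passes, last included pass first.
    Pass lengths are never changed. Returns the number of bytes embedded."""
    written = 0
    used = 0
    for p in reversed(passes):
        if written == len(payload) or used == max_passes:
            break
        if p.kind != "refinement" or not p.raw:
            continue  # only raw coded refinement passes may be modified
        chunk = payload[written:written + len(p.data)]
        p.data[:len(chunk)] = chunk
        written += len(chunk)
        used += 1
    return written


def extract_backward(passes, n_bytes):
    """Read n_bytes back in the same backward order."""
    out = bytearray()
    for p in reversed(passes):
        if len(out) >= n_bytes:
            break
        if p.kind == "refinement" and p.raw:
            out += p.data[:n_bytes - len(out)]
    return bytes(out)
```

Here `max_passes` plays the role of the stopping point of embedding: the more passes are modified, the higher the payload and the larger the additional distortion.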
On one hand, modifying these insignificant passes is similar to discarding them, which does not affect the image as severely as modifying the passes included earlier in the lower layers. On the other hand, the length of these passes could be larger, so that we can actually hide more information. The embedding procedure can be carried out as long as the degradation of the image quality stays within an acceptable level. A termination pattern may be necessary to signal the end of embedding, so that the decoder can learn when to stop extracting the hidden information.

3.3.5 Issues on Backward Embedding

Considering both complexity and performance, backward embedding is adopted in the proposed steganographic scheme. With multiple layers being employed in JPEG-2000, backward embedding can easily select those passes that are less important to the compressed image for information embedding, without considering the location of the bit-plane and the associated coefficient. The embedding process can thus be done very efficiently, since only the tier-2 coding is involved, and the coding structure of JPEG-2000 can be kept almost the same except that an extra procedure is needed to embed the data in a backward fashion. It is noteworthy that a major benefit of the proposed JPEG-2000 steganographic scheme over the existing JPEG schemes is its controllable rate-distortion trade-off. Here, the rate means the capacity of the hidden information, while the distortion refers to the additional degradation resulting from the information embedding process. In existing JPEG embedding schemes, the effect of information hiding on image quality is usually unpredictable, since it is difficult to achieve good rate control in the JPEG standard.
In contrast, our scheme may exploit the characteristics of wavelet codecs to achieve a better balance between the payload of the hidden information and the resulting image quality.

As the embedding process starts from the last included magnitude refinement pass, the ending point of embedding decides the distortion of the composite image. A simple scenario for controlling capacity and distortion goes as follows. If the image is to be compressed with a bit-rate of B bpp, we can guarantee that the composite image will have at least the quality of the image compressed with C bpp, where C < B, by embedding in the raw coded magnitude refinement passes, working backward, until we reach one that is included in both the bit-stream with B bpp and the bit-stream with C bpp. The idea is easy to implement, but the estimate of the quality is very conservative, since we do not modify all three coding passes in a bit-plane but only the magnitude refinement passes, which may occupy only a small portion. We can see this from the extreme case in which C is equal to 0 while the resulting composite image still has a reasonably good quality. Therefore, to demonstrate the advantages of the proposed JPEG-2000 steganographic scheme over other existing schemes based on the JPEG standard, a more accurate quality measurement is necessary.

If MSE is used as the quality measure, the most accurate way is to calculate the additional distortion in the spatial (or image) domain. However, the complexity of this approach is too high, since we have to expand the compressed bit-stream after each embedding of a pass. A more practical way is to evaluate the additional distortion in the wavelet domain along with the generation of the bit-stream.
In JPEG-2000, the overall distortion, in terms of the MSE between the compressed image and the original image, is estimated by (3.3), and the rate-distortion slope can then be derived as described in Section 3.1.4. The distortion estimation is based on two assumptions, i.e., orthogonal wavelet basis functions and uncorrelated quantization errors. Although neither of the assumptions holds perfectly, the estimation is acceptable in the case of compression. Following the same route, we can calculate the additional distortion introduced by information embedding.

During information embedding, distortion happens when a bit flips from 0 to 1 or from 1 to 0, i.e., when the original bit and the embedded bit are different. The additional distortion can then be calculated from the difference between the original quantization index and the index after possible bit flipping, scaled by the quantization step size and the L2 norm of the wavelet basis function, and summed over all affected indices. In the implementation, we need not keep the original quantized coefficient and the resulting coefficient to calculate their difference, but can evaluate it on the fly as each bit-plane is processed. For the same index, if a bit in a certain bit-plane changes from 1 to 0, we record it as a negative change, while if a bit changes from 0 to 1, we view it as a positive change. The difference between the index values before and after information embedding can be calculated by subtracting the sum of the negative changes from the sum of the positive changes. We give a quick example. Suppose the original value is 44 (101100) and the value after embedding is 50 (110010). A positive change and a negative change each happen twice. The sum of the positive changes is $1 \times 2^4 + 1 \times 2^1 = 18$ and the sum of the negative changes is $1 \times 2^3 + 1 \times 2^2 = 12$, so the difference is 6.
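The on-the-fly bookkeeping of positive and negative changes can be written down in a few lines. The following sketch is only illustrative: the function name `index_difference` and the convention that bit-plane 0 is the least significant one are our assumptions.

```python
def index_difference(original, embedded, num_planes):
    """Accumulate positive (0 -> 1) and negative (1 -> 0) bit changes
    plane by plane and return (positive, negative, difference)."""
    positive = 0
    negative = 0
    for p in range(num_planes):
        before = (original >> p) & 1
        after = (embedded >> p) & 1
        if before == 0 and after == 1:
            positive += 1 << p   # bit set in plane p
        elif before == 1 and after == 0:
            negative += 1 << p   # bit cleared in plane p
    return positive, negative, positive - negative


if __name__ == "__main__":
    # The example from the text: 44 (101100) becomes 50 (110010).
    print(index_difference(0b101100, 0b110010, 6))  # (18, 12, 6)
```

Since each bit-plane contributes independently, the two running sums can be updated as the passes are processed, without storing both index values.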
This way of calculation is straightforward, and its benefit is that it fits the bit-plane coding structure.

However, the exact determination of distortion comes with a few drawbacks, which increase the complexity of the implementation. First of all, the information embedding is carried out in the tier-2 coding. At this stage, what the tier-2 encoder sees is only a bit-stream. It knows nothing more than the position of the bit-plane or the code-block to which the pass belongs. When flipping a bit in a pass, we may not know exactly which coefficient will be affected. This problem may be solved by passing more information from the tier-1 coding to the tier-2 coding. Since only the raw coded magnitude refinement passes will be embedded with the hidden information, the tier-1 encoder may have to send extra information to the tier-2 encoder, indicating the correspondence between each bit in the pass and its associated coefficient. For a 64 × 64 block, the extra information may be 64 × 64 bits long for each pass, with 1 indicating that the pass includes a bit for that coefficient and 0 indicating that it does not. The encoder is then able to scan in the same order to identify which coefficient will be affected by a certain bit flipping and evaluate the distortion. Nevertheless, another problem exists. We have to keep the distortion value for each coefficient in each code-block, which increases the memory consumption significantly and contradicts the requirement of efficient memory usage in the tier-2 coding. Next, we provide two rough evaluation methods, which simply operate at the bit-plane level, and we will then validate their practicality by comparing them with the exact distortion approach.

The first method is to calculate the distortion by summing up the MSE of bit flipping in each bit-plane.
If a bit flipping happens, we add it to the overall distortion without taking account of the coefficient. The additional distortion $\Delta D$ with one bit flipping is expressed as

$$\Delta D = \omega_b^2 \times (\Delta_b)^2 \times (2^p)^2, \qquad (3.28)$$

where $\omega_b$ is the L2 norm of the wavelet basis function, $\Delta_b$ is the quantization step size associated with band $b$, and $p$ is the bit-plane position. It is apparent that the distortion calculated in this way is only a rough estimate, since the exact change of the distortion for each coefficient is not recorded. We look at two simple but extreme cases. In the first case, when the original value is 100 and the value after embedding is 011, the actual squared error is 1 while we overestimate it as 21. In the second case, when the original value is 111 and the value after embedding is 000, the squared error is 49 while we underestimate it as 21. However, in most common cases, we found that the overestimates and underestimates compensate each other, and this method provides a fairly good estimate of the additional distortion introduced by information embedding.

The second estimation method is to utilize the distortion of each pass calculated during the tier-1 encoding. The distortion improvement of each pass is estimated by the difference between the distortion measured by (3.3) before and after including the pass. The distortion is recorded and then evaluated together with the rate increase of this pass for rate control, as mentioned in Section 3.1.5. We can view this step as a measurement of the importance of the pass, since the inclusion of a pass with a large distortion improvement makes a great impact on the compressed image.
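The behavior of the first estimation method, which charges every flipped bit independently according to (3.28), can be checked numerically against the exact squared error. In this sketch the function names are hypothetical, and the parameters `norm` and `step` stand for the L2 norm of the basis function and the quantization step size; their default value of 1 is our assumption, chosen so that the numbers agree with the two extreme cases discussed above.

```python
def estimated_distortion(original, embedded, num_planes, norm=1.0, step=1.0):
    """Bit-plane-level estimate: add norm^2 * step^2 * (2^p)^2 for every
    bit-plane p in which the two index values differ."""
    total = 0.0
    for p in range(num_planes):
        if ((original ^ embedded) >> p) & 1:
            total += (norm ** 2) * (step ** 2) * (2 ** p) ** 2
    return total


def actual_distortion(original, embedded, norm=1.0, step=1.0):
    """Squared error of the index change with the same scaling."""
    return (norm ** 2) * (step ** 2) * (embedded - original) ** 2


if __name__ == "__main__":
    for before, after in [(0b100, 0b011), (0b111, 0b000)]:
        print(estimated_distortion(before, after, 3),
              actual_distortion(before, after))
```

The estimate depends only on which bit-planes are touched, so it needs no per-coefficient memory; the price is the over- or underestimation visible in these two cases.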
We believe that, in information embedding, if the host signal and the hidden signal are both uniformly distributed, the distortion introduced by embedding in or modifying the content of a pass would be very similar to that of discarding the whole pass. Therefore, we may also use this distortion measurement of a pass as an estimate of the distortion resulting from information embedding. A clear benefit of this method is that the overhead of embedding is made as small as possible, since the embedding process shares the same procedure of distortion measurement as the coding process. It should be noted that the mid-point reconstruction rule is often employed in the distortion measurement in the tier-1 coding, as shown in (3.2). Information embedding, however, results in bit flipping without mid-point reconstruction as done in the dequantization step. Therefore, we multiply the distortion estimated in the tier-1 coding by 2 as a measurement of the distortion introduced by information embedding in this pass.

3.3.6 Steganalysis of the Proposed Information-Hiding Scheme

As mentioned before, robustness is not the main issue in steganographic applications, since we do not expect that the attacker will modify the image content by either transcoding or other signal processing procedures, especially in the case that the secret information is hidden in a compressed file for storage or circulation. Capacity, reliability and security are the three major concerns. In our information-hiding scheme with the JPEG-2000 standard, we can guarantee that the secret transmission is carried out without errors by embedding the information in raw coded magnitude refinement passes. We can achieve high-volume covert communication by choosing an appropriate number of passes for information embedding.
The remaining issue to be discussed is the security of the proposed scheme. In this section, we play the role of an eavesdropper to clarify some security issues to which we should pay attention when designing a steganographic scheme.

The first step the eavesdropper may take is to analyze the bit-stream structure. One drawback of the proposed scheme is that the information-hiding procedure operates in a special mode of JPEG-2000, i.e., the lazy mode. One may object that the attacker could suspect the existence of hidden information in a JPEG-2000 bit-stream simply because it is compressed with the lazy mode. Eventually, this problem depends on how popular the lazy mode will be. From our viewpoint, the lazy mode coding option is very useful in high bit-rate image compression. The complexity can be significantly reduced by employing the lazy mode coding because the computationally expensive MQ coding is bypassed, while the coding efficiency will not be affected much, especially in high bit-rate coding, which is a very likely case for information embedding. ROI coding may be the only scenario in which the lazy mode is not appropriate. Therefore, we do not see many reasons for not adopting the lazy mode coding in a broad range of imagery applications.

Next, the eavesdropper may analyze the data in the magnitude refinement passes to see if any unusual distribution appears. As mentioned before, the reason why the MQ coder does not improve coding efficiency in the magnitude refinement passes of the lower few bit-planes is that the distribution of these data is close to uniform. Therefore, if we can encrypt or scramble the data in some way so that the hidden information also has a uniform distribution, the chance that the eavesdropper can tell the difference will be small.
However, we have to make sure th at some special patterns designed for increased error resilience in JPGE-2000 should not appear in the modified bit-stream . Aside from the purpose of correct expanding the compressed bit-stream , this cautious strategy can prevent th at the appearance of the m ark at the wrong position reveals the existence of the hidden information. The eavesdropper may further expands the compressed bit-stream to see if any ab normal situation happens. In the proposed steganographic scheme, we do not change the length of the m agnitude refinement passes but modify the binary content. In general cases, the modified m agnitude refinement passes generate the same num ber of symbols w ith the original passes. However, some special situations may happen when more symbols or less symbols are generated th an expected. This comes from the fact th a t extra bits are added by the encoder to avoid generating error resilience patterns as described before. The inac curate num ber of symbols will not affect the norm al operation of the coding but may give a loophole for the eavesdropper to sense the existence of the hidden information. Besides, the bit-stuffing at the end of the pass to comply w ith the byte boundary may appear differently if em bedding is done in a careless way. We may not em bed inform ation into the bits th a t are used for bit-stuffing if a unified bit-stuffing byte is adopted in most of the JPEG-2000 coders. In other words, we should only modify the bits th at come from the m agnitude refinement passes and avoid the bits used in the simplified im plem entation or any type of m arkers to ensure better security. A more advanced eavesdropper may examine the behavior of wavelet coefficients to see if possible hidden inform ation exists. This is actually an interesting topic to investigate if information embedding has different effects on wavelet coefficients from the quantization 96 Reproduced with permission of the copyright owner. 
Further reproduction prohibited without permission. process. We believe th at modifying the JPEG -2000 packets may result in some intriguing phenomenon on the inverse wavelet transform . We may study this subject by avoid the quantization step, i.e., by operating the inform ation embedding in the lossless compression mode. We leave this part as future research. 3.3.7 E xperim en tal R esu lts The im plem entation of our steganographic scheme was based on JA SPE R [1]. JA SPE R is a free reference code of JPEG-2000 offering the baseline coding w ith an excellent performance and thus serves as a good framework. In the experim ent, we used the four well-known gray-level images, “Lena,” “B oat,” “Peppers” and “Baboon” as the host images, all w ith the same size of 512 x 512, to carry the generalized binary information. We assume th at the image will be compressed into the bit-stream w ith a high bit-rate, i.e. 2 bpp. It should be noted th a t the length of the bit-stream is not changed by the inform ation embedding process. In other words, the embedding process will only affect the quality of the image by modifying the bit-stream content, i.e. certain m agnitude refinement passes, as described in Section 3.3.3. First of all, we would like justify the claim th at the lazy m ode coding does not give an inferior perform ance compared w ith the norm al mode in the case of high bit-rate image coding. We compressed “Lena,” “Baboon,” “B oat” and “Peppers” images from 0.5 bpp to 2 bpp w ith the normal mode and the lazy mode and then compared the PSN R values of the expanded images in each case. The results are shown in Table 3.2. We see th a t the difference of PSN R values is very lim ited. It should be noted th at the larger difference, such as Peppers in 2 bpp, comes from the fact th at the two compressed bit-stream s are 97 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 
different in their lengths although we have tried to make them as close to the target bit-rate as possible. This result suggests that the lazy mode is applicable in many cases, so the compression/decompression speed can be tremendously improved without much quality degradation.

Table 3.2: Performance comparison using the lazy and the normal modes (PSNR in dB)

Bit-rate (bpp)      0.50   0.75   1.00   1.25   1.50   1.75   2.00
Lena (normal)      37.06  38.92  40.31  41.55  42.75  43.94  45.12
Lena (lazy)        37.01  38.87  40.26  41.50  42.69  43.87  45.04
Boat (normal)      33.15  35.11  36.61  37.98  39.30  40.61  41.91
Boat (lazy)        33.14  35.06  36.55  37.91  39.21  40.52  41.81
Peppers (normal)   35.60  37.00  38.23  39.44  40.66  41.89  43.11
Peppers (lazy)     35.55  36.94  38.17  39.36  40.55  41.76  42.96
Baboon (normal)    25.47  27.41  29.06  30.58  32.02  33.41  34.73
Baboon (lazy)      25.47  27.41  29.06  30.59  32.03  33.41  34.73

One of the main advantages of this steganographic scheme over other JPEG-based schemes is its controllable distortion during information embedding. As mentioned before, we can estimate the additional distortion, measured as MSE in the wavelet domain, quite precisely. However, in order to simplify the structure without increasing memory consumption, we presented two methods to roughly estimate the distortion introduced by information embedding: method one estimates the overall MSE by adding up the errors in each bit-plane, and method two utilizes the existing distortion value calculated in the tier-1 coding. We would like to verify their applicability by experiments. Figure 3.13 shows the additional MSE measured in “Lena,” “Boat,” “Peppers” and “Baboon.” The horizontal axis represents the number of magnitude refinement passes that are chosen for information embedding. As more passes are modified to embed the secret data, the MSE increases accordingly. The dashed lines are the actual MSE while the
“+” and “x” mark the estimated MSE values calculated by method one and method two, respectively. We can see that the two methods perform well at the beginning of the estimation but the errors accumulate as more passes are processed. Interestingly, method one seems to underestimate the distortion while method two tends to overestimate it. Some calibration steps may be needed to achieve a more accurate measurement.

Figure 3.13: Additional MSE estimation of (a) Lena, (b) Boat, (c) Peppers and (d) Baboon. (Each panel plots Estimation 1, Estimation 2 and the actual MSE against the pass number.)

In steganographic applications, we are interested in the relationship between the capacity and the quality of the resulting composite image. We embedded the same binary data into the four images and examined the MSE increase of the image due to the embedding process as a function of the payload of the hidden information. We can see from Fig. 3.14 that the capacity varies among the four images even though they are compressed with the same ratio. This should not be too surprising since compression affects images in different ways. In our scheme, the capacity is ultimately determined by the number of magnitude refinement passes that are raw coded, so the distribution of the data across the three passes directly affects the payload. Fig. 3.14 also shows that, with more information being embedded, the MSE value grows as expected. However, the two do not relate to each other linearly.
We take “Lena” as an example. Embedding the first 3000 bytes of the binary data raises the MSE by only about 1 on average, but embedding the next 1000 bytes quickly increases the MSE to 2. The last 100 bytes even cause the MSE to change by more than 9. Therefore, the embedding process should evaluate this curve to decide how much information is appropriate to embed. It should be noted that the MSE increase may be roughly estimated in conjunction with the embedding process so that we can stop embedding at the point where the minimal acceptable quality is reached.

Figure 3.14: Capacity vs. additional MSE of the composite image. (Curves for Lena, Boat, Peppers and Baboon; horizontal axis: additional MSE increase.)

Since the payload is quite large in our experiment, we may consider using a smaller image as the binary data to be embedded. We compressed the image “F-16” (128 x 128) into a JPEG-2000 bit-stream and embedded it into the four host images. The relationship between the PSNR of the composite image and that of the hidden image as embedding proceeds is shown in Fig. 3.15. The result demonstrates the benefits of backward embedding, in which the less important refinement passes are used to carry the more important information of the hidden image owing to the layered structure of JPEG-2000. Progressive transmission of the hidden image can thus be achieved. At the beginning of the embedding, the composite image degrades little while the PSNR of the hidden image rises quickly. In the later part of the embedding, a large sacrifice of the composite image only improves the finer detail of the hidden image, so the increase of the PSNR value is limited.
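The stopping rule mentioned above (halt embedding once the estimated distortion reaches the acceptable limit) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `passes_within_budget`, the per-pass cost list and all numbers are hypothetical and are not measurements or code from the experiments.

```python
def passes_within_budget(pass_costs, mse_budget):
    """Count how many passes can be used before the estimated MSE exceeds the budget.

    pass_costs: list of (payload_bytes, estimated_mse_increase) tuples, one per
                magnitude refinement pass, in the order the embedder would use
                them (hypothetical input format).
    mse_budget: largest additional MSE the sender is willing to accept.
    """
    total_bytes = 0
    total_mse = 0.0
    used = 0
    for payload_bytes, mse_increase in pass_costs:
        # Stop before the pass that would push the estimate over the budget.
        if total_mse + mse_increase > mse_budget:
            break
        total_mse += mse_increase
        total_bytes += payload_bytes
        used += 1
    return used, total_bytes, total_mse


# Illustrative numbers only: cheap passes first, one very costly pass last.
costs = [(1500, 0.4), (1500, 0.6), (1000, 1.0), (100, 9.0)]
used, payload, mse = passes_within_budget(costs, mse_budget=2.5)
print(used, payload, mse)
```

Because the distortion does not grow linearly with the payload, a rule of this kind tends to cut off the last few passes, which add little capacity at a large cost in quality.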
Under this scenario, the embedding process should proceed until both the composite and the hidden images have an acceptable quality.

Figure 3.15: PSNR of the composite image vs. PSNR of the hidden image. (Curves for Lena, Boat, Peppers and Baboon; horizontal axis: PSNR of the hidden image in dB.)

3.4 Conclusion

An integrated approach to image compression and watermarking was first presented. JPEG-2000 provides various features for different applications, and the proposed watermarking method can be coupled with JPEG-2000 to provide a way to assert copyright information for JPEG-2000 compressed images. The watermark sequence is embedded into significant wavelet coefficients so that it is robust against general signal processing attacks. The detection process does not require the original image. Progressive watermark detection is also supported so that watermark retrieval can be done faster. ROI watermarking is achieved easily under the same framework. This integrated scheme can be viewed as a baseline watermarking scheme, and several variants can be derived from it.

We then discussed the issue of steganography in JPEG-2000. Feasible information hiding schemes were proposed to reliably hide a high volume of data in JPEG-2000 compressed images for covert communication. The “lazy mode” coding option of JPEG-2000 was employed, and its usage and functions were explained and justified. Several design issues were examined to develop a well-rounded steganographic scheme in this state-of-the-art image coding standard. Experimental results were shown to demonstrate the practicability and the decent performance of the proposed algorithms.
Chapter 4

Towards Affine-Invariant Digital Image Watermarking

In this chapter, we aim to solve the challenging synchronization problem of watermark detection in digital images. The goal is to enable the embedded watermark to survive attacks such as cropping, rotation, scaling, shearing, change of the aspect ratio or other generalized geometrical attacks, i.e. affine attacks. The basic idea of our proposed methodology is to make use of structural grid signals for synchronized watermark detection without resorting to the original image. We will show that structural grid signal embedding can be applied to a broad range of watermarking schemes to achieve resilience to geometrical distortions.

We will first examine some important characteristics of the Discrete Fourier Transform (DFT) in Section 4.1, since the DFT plays a vital role in most of the schemes that are resistant to geometrical attacks. Next, we review some existing algorithms dealing with geometrical transformations in Section 4.2. Then, we describe our proposed solutions. We will present the idea of embedding and detecting structural grid signals to tackle the synchronization problem in Section 4.3. With different types of underlying watermarks being used, two affine-invariant watermarking schemes are illustrated, i.e. a spatial-frequency composite watermarking scheme in Section 4.4 and a block-based DCT watermarking scheme in Section 4.5. Experimental results of both schemes will be shown to demonstrate their good performance. Some comments on grid embedding/detection are provided in Section 4.6.1. Concluding remarks are given in Section 4.7.

4.1 Discrete Fourier Transform of Images

The DFT is a powerful tool in a wide variety of signal processing applications, including filtering and spectral analysis. Efficient algorithms have been developed for its numerical computation.
In image watermarking, the DFT is also frequently used since some of its characteristics can be exploited to make the embedded watermark resist geometrical modifications. Let us review some properties of the DFT on 2-dimensional image data.

Let a given image be a function $f(x, y)$ defined on an integer-valued Cartesian grid $0 \le x < N_1$, $0 \le y < N_2$. Its DFT $F(m, n)$ is defined as

$$F(m, n) = \sum_{x=0}^{N_1-1} \sum_{y=0}^{N_2-1} f(x, y)\, e^{-j 2\pi m x / N_1 - j 2\pi n y / N_2}, \tag{4.1}$$

and the inverse transform is defined as

$$f(x, y) = \frac{1}{N_1 N_2} \sum_{m=0}^{N_1-1} \sum_{n=0}^{N_2-1} F(m, n)\, e^{j 2\pi m x / N_1 + j 2\pi n y / N_2}. \tag{4.2}$$

In some cases, it is convenient to represent the DFT coefficient by its phase $\theta$ and magnitude $A$, i.e.,

$$\theta(m, n) = \arctan\left(\mathrm{Im}[F(m, n)] / \mathrm{Re}[F(m, n)]\right), \qquad A(m, n) = \sqrt{\mathrm{Re}[F(m, n)]^2 + \mathrm{Im}[F(m, n)]^2}, \tag{4.3}$$

where $\mathrm{Im}[\cdot]$ and $\mathrm{Re}[\cdot]$ are the imaginary and real parts of the Fourier coefficient. It should be noted that the magnitude of the Fourier transform is even symmetric and the phase of the Fourier transform is odd symmetric when the transform is applied to real-valued data. We then examine the effects of translation, rotation and scaling of an image on its DFT coefficients.

1. Translation

Suppose that we shift the image $f(x, y)$ to the point $(x^*, y^*)$, so that it becomes $f(x - x^*, y - y^*)$. The DFT of $f(x - x^*, y - y^*)$, $F^t(m, n)$, can be expressed as

$$F^t(m, n) = \sum_{x=0}^{N_1-1} \sum_{y=0}^{N_2-1} f(x - x^*, y - y^*)\, e^{-j 2\pi m x / N_1 - j 2\pi n y / N_2}. \tag{4.4}$$

We then rewrite it as

$$F^t(m, n) = e^{-2\pi j \left(\frac{m x^*}{N_1} + \frac{n y^*}{N_2}\right)} \sum_{x=0}^{N_1-1} \sum_{y=0}^{N_2-1} f(x - x^*, y - y^*)\, e^{-j 2\pi m (x - x^*) / N_1 - j 2\pi n (y - y^*) / N_2}. \tag{4.5}$$

If the translation is applied circularly in both the horizontal and vertical directions, or a periodic repetition of the image in all directions is assumed, the DFT of the shifted image then becomes

$$F^t(m, n) = F(m, n)\, e^{-2\pi j \left(\frac{m x^*}{N_1} + \frac{n y^*}{N_2}\right)}. \tag{4.6}$$

Thus, the DFT of the shifted image differs from that of the original image only by the factor $e^{-2\pi j (m x^*/N_1 + n y^*/N_2)}$, which is a shift in phase. In other words, a circular shift of the image only affects the DFT phase and leaves the DFT magnitude intact. It should be noted that the cropping operation in image editing is not equivalent to a circular shift. We can view cropping as a combination of circular translation plus noise that comes from the deletion of some data samples.

2. Rotation

We rewrite here the DFT with $N_1 = N_2 = N$,

$$F(m, n) = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j \frac{2\pi}{N} (m x + n y)}. \tag{4.7}$$

We can then introduce polar coordinates on $(x, y)$ and $(m, n)$ as follows: $x = r\cos\theta$, $y = r\sin\theta$, $m = \omega\cos\phi$ and $n = \omega\sin\phi$. Since $m x + n y = r\omega(\cos\theta\cos\phi + \sin\theta\sin\phi) = r\omega\cos(\theta - \phi)$, we can express the DFT as

$$F(\omega, \phi) = \sum_{r} \sum_{\theta} f(r, \theta)\, e^{-j \frac{2\pi}{N} r\omega\cos(\theta - \phi)}. \tag{4.8}$$

Now, we rotate $f(r, \theta)$ by $\theta^*$. It is apparent that

$$\sum_{r} \sum_{\theta} f(r, \theta + \theta^*)\, e^{-j \frac{2\pi}{N} r\omega\cos(\theta - \phi)} = F(\omega, \phi + \theta^*). \tag{4.9}$$

Therefore, rotating the 2-dimensional data through an angle $\theta^*$ causes the Fourier representation to be rotated through the same angle, i.e., $f(x\cos\theta^* - y\sin\theta^*,\; x\sin\theta^* + y\cos\theta^*) \leftrightarrow F(m\cos\theta^* - n\sin\theta^*,\; m\sin\theta^* + n\cos\theta^*)$.

3. Scaling

If we scale the 2-dimensional data by $\alpha$ in one axis and $\beta$ in the other axis, we obtain

$$f(\alpha x, \beta y) \leftrightarrow \frac{1}{|\alpha\beta|}\, F\!\left(\frac{m}{\alpha}, \frac{n}{\beta}\right). \tag{4.10}$$

We can see from the above that scaling an axis in the spatial domain causes a reciprocal scaling in the frequency domain.

With these properties, the DFT has various benefits in watermarking to resist generalized geometrical attacks, as described in the following sections. The other important application of the DFT in watermarking is its efficient calculation of the circular convolution. First of all, let $m$ be an integer. The integer $n = \mathrm{mod}(m, N)$ is
the integer for which $0 \le n < N$ and $n - m$ is a multiple of $N$. Let us consider two one-dimensional data sequences, $d_1$ and $d_2$, of finite length $N$. The circular convolution $\mathrm{Conv}(d_1, d_2)$ can be expressed as

$$\mathrm{Conv}(d_1, d_2)(m) = \sum_{n=0}^{N-1} d_1(n) \times d_2(\mathrm{mod}(m - n, N)). \tag{4.11}$$

The autocorrelation function can be viewed as a special case of the circular convolution in which the data sequence $d$ is circularly convolved with the reversed copy of itself. That is, the autocorrelation function $Au(m)$ can be expressed as

$$Au(m) = \sum_{n=0}^{N-1} d(n) \times d(\mathrm{mod}(n + m, N)). \tag{4.12}$$

It should be noted that the autocorrelation function $Au(m)$ has a symmetric structure, i.e., $Au(m) = Au(-m)$, because of modular arithmetic. Calculating the circular convolution in a straightforward manner is computationally expensive when the number of data samples grows. However, the DFT can be used to calculate it efficiently [65]. The convolution of two sequences $d_1$ and $d_2$ is actually equivalent to

$$\mathrm{Conv}(d_1, d_2) = \mathcal{F}^{-1}\{\mathcal{F}(d_1) \odot \mathcal{F}(d_2)\}, \tag{4.13}$$

where $\odot$ denotes point-to-point multiplication, $\mathcal{F}(\cdot)$ the Fourier transform and $\mathcal{F}^{-1}(\cdot)$ the inverse Fourier transform.

4.2 Previous Work on the Digital Watermark Resilient to Geometrical Attacks

There are basically three main approaches to make the watermark resilient to geometrical attacks. The first approach is to embed and detect the watermark in a rotation, scaling and translation (RST) invariant domain. The other two methods attempt to determine and then invert the possible geometrical modification. One method embeds a matching template for watermark recovery while the other method, called the self-reference scheme, uses the autocorrelation function to detect the geometrical reference.

4.2.1 Watermarking in RST Invariant Domain

O’Ruanaidh et al.
[54] proposed a watermarking method based on the Fourier-Mellin transform. They suggested that the watermark should be embedded so as to be invariant to common image transformations including rotation, scaling and translation (RST), so the watermarking process is performed in an RST invariant domain. The diagram of the RST invariant watermarking scheme is shown in Fig. 4.1. In the first step, the Fourier transform of the image is computed. Since shifting in the spatial domain results in a phase shift in the frequency domain, the image representation will be translation-invariant if we only keep the amplitude of the Fourier coefficients. In the second step, the log-polar mapping (LPM) is applied so that the amplitude of the Fourier transform is changed from the Cartesian coordinates to the log-polar coordinates, i.e., the coordinates with the logarithmic radius and the angle axes.

Figure 4.1: The diagram of a watermarking scheme built on the RST invariant domain. (Blocks: DFT, LPM, DFT, RST invariant domain, IDFT, ILPM, IDFT, with separate amplitude and phase paths.)

Consider a point $(x, y) \in \mathbb{R}^2$, where

$$x = e^{\mu}\cos\theta, \qquad y = e^{\mu}\sin\theta, \tag{4.14}$$

and where $\mu$ is the logarithmic radius and $\theta$ is the angle in the range $(-\pi, \pi]$. Then, LPM maps a point from $(x, y)$ to $(\mu, \theta)$. We see that every point $(x, y)$ corresponds to a unique point $(\mu, \theta)$. Thus, the scaling effect can be converted to the translational effect, i.e.,

$$(\rho x, \rho y) \leftrightarrow (\mu + \ln\rho,\; \theta). \tag{4.15}$$

Similarly, the rotational effect can be converted to the translational effect, i.e.,

$$(x\cos\delta - y\sin\delta,\; x\sin\delta + y\cos\delta) \leftrightarrow (\mu,\; \theta + \delta). \tag{4.16}$$

To summarize, LPM changes both scaling and rotation into horizontal and vertical shifts in the new coordinates.
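The two conversions above are easy to check numerically. The following minimal Python sketch maps a point to log-polar coordinates and confirms that uniform scaling shifts only the logarithmic radius while rotation shifts only the angle. The function name `to_log_polar` and the sample values are illustrative assumptions and are not part of the scheme in [54].

```python
import math

def to_log_polar(x, y):
    """Map a Cartesian point to (log-radius, angle); the origin is excluded."""
    return math.log(math.hypot(x, y)), math.atan2(y, x)

x, y = 3.0, 4.0
mu, theta = to_log_polar(x, y)

# Scaling both axes by rho adds ln(rho) to the log-radius and keeps the angle.
rho = 2.0
mu_s, theta_s = to_log_polar(rho * x, rho * y)

# Rotating the point by delta adds delta to the angle and keeps the log-radius.
delta = 0.3
xr = x * math.cos(delta) - y * math.sin(delta)
yr = x * math.sin(delta) + y * math.cos(delta)
mu_r, theta_r = to_log_polar(xr, yr)

print("log-radius shift:", mu_s - mu, "expected:", math.log(rho))
print("angle shift:", theta_r - theta, "expected:", delta)
```

Note that the angle axis wraps around at the ends of its range, so the shift caused by rotation is cyclic, whereas the shift caused by scaling is not.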
By computing the DFT of the log-polar map and keeping only the DFT amplitude, we are led to a rotation and scaling invariant representation. By combining the above two steps, we achieve invariance to rotation, scaling and translation. Taking the Fourier transform of a log-polar map is actually equivalent to computing the Fourier-Mellin transform. The watermark is then embedded into the amplitude of the Fourier-Mellin transform by using spread spectrum modulation. The watermarked image is constructed by applying the inverse DFT and the inverse log-polar mapping (ILPM) twice. The phases in both Fourier transforms are not modified but simply combined with the watermarked amplitudes for the inverse Fourier transforms. A similar method that replaces the log-polar mapping with the log-log mapping (LLM) can be used to resist the change of the aspect ratio, where the horizontal and vertical scaling ratios may be different.

Although Fourier-Mellin based watermarking is a very elegant approach, it has several limitations. First of all, the watermark will not survive the combination of rotation and a change of the aspect ratio since neither LPM nor LLM generates an invariant domain for watermark detection in that case. Second, since all procedures must be done in the discrete domain, both LPM and ILPM cause a loss of image quality from sampling. In other words, information is lost in converting an image to the log-polar coordinates and then transferring it back to the Cartesian coordinates, even without any watermarking procedure. Consider the Lena image shown in Fig. 4.2(a) and its log-polar mapping in the spatial domain shown in Fig. 4.2(b). We see that more sampling points in the region far from
This will increase the internal memory usage for interm ediate image buffering. Besides, some form of interpolation is always needed when we change the coordinates. The interpolation scheme will definitely affect the perform ance of waterm ark detection. The more delicate the interpolation m ethod is, the b etter precision we will achieve. Nevertheless, it demands a higher com putational load. Furtherm ore, interpolation only performs well if neighboring samples are of a similar value. The suitability of interpolation in the transform dom ain is questionable. Lin et al. [49] proposed another w aterm arking scheme. W ithout pursuing the “strong invariance,” they embedded the w aterm ark in the log-polar m apping of the m agnitude of the Fourier transform . The detection process involves a comparison of the w aterm ark w ith all cyclic rotations of the extracted waterm ark. Instead of directly applying spread- spectrum waterm arking, which views the host image as noises, they introduced a mixing function such th at the output of the m ixing function is perceptually sim ilar to the image signal and has a high correlation w ith the waterm ark. The mixing function adopted in this algorithm is a weighted sum of the image and waterm ark signals. We believe th a t the use of the mixing function is the key to its success of resisting the common rotation and scaling modifications. The mixing function should reduce the inaccuracy problem resulting from the coordinate change. A similar idea based on the Radon transform was proposed by Wu et al. [96]. 112 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. (a) (b) Figure 4.2: (a) The Lena image and (b) the transform ed Lena image w ith the log-polar mapping. 4.2.2 E m bedding T em plate for R egistration The second approach is to embed an auxiliary tem plate in the frequency or the spatial domain of the image. 
The embedded information is thus composed of two parts: the watermark signal that conveys the necessary payload and a template, which usually contains no information but is used to restore the image to its original shape. Once the template is successfully detected and the image is recovered by the reverse transformation, the watermark can be retrieved.

Fleet et al. [18] proposed to embed sinusoidal signals in the spatial domain of the image. The sinusoidal signals provide a coordinate frame for registration. These sinusoids can be detected by examining the frequency domain because the frequencies of the selected sinusoids rarely appear in natural images. Pereira et al. [57] proposed another method to resist geometrical attacks by embedding a template in the frequency domain. Both the template and the watermark signal are cast in the Fourier transform coefficients. The template consists of additional peaks that the watermark embedder casts at certain locations of the Fourier spectrum. The locations of the peaks are predefined or selected by a secret key. The strength of the template is determined adaptively by using the local statistics of the Fourier spectrum. These embedded local peaks reflect the geometrical modifications applied to the image because of the rotation and scaling properties shown in Section 4.1. The watermark is embedded by a differential coding scheme. Two points that are 90 degrees apart are modified such that the difference is equal to the desired message bit. In watermark detection, all local maxima in the magnitude of the Fourier transform are extracted by using a small window. The geometrical transformation is then estimated by a point matching algorithm between the extracted peaks and the reference template points.
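To illustrate the last step only, the sketch below fits a 2 x 2 linear map to point pairs by ordinary least squares. It is not the matching algorithm of [57]: it assumes that the correspondence between reference template points and detected peaks is already known, which is exactly the hard part the actual method has to solve. The function name `fit_linear_map` and the sample points are hypothetical.

```python
import math

def fit_linear_map(ref_pts, obs_pts):
    """Least-squares 2x2 matrix A such that obs ~= A @ ref for matched point pairs."""
    sxx = sxy = syy = 0.0              # entries of sum(ref * ref^T)
    c00 = c01 = c10 = c11 = 0.0        # entries of sum(obs * ref^T)
    for (u, v), (x, y) in zip(ref_pts, obs_pts):
        sxx += u * u; sxy += u * v; syy += v * v
        c00 += x * u; c01 += x * v
        c10 += y * u; c11 += y * v
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("reference points are collinear; the map is not determined")
    # A = C * S^{-1}, with S^{-1} = (1/det) * [[syy, -sxy], [-sxy, sxx]]
    return [[(c00 * syy - c01 * sxy) / det, (c01 * sxx - c00 * sxy) / det],
            [(c10 * syy - c11 * sxy) / det, (c11 * sxx - c10 * sxy) / det]]

# Hypothetical template points and the same points after rotation by 30 degrees
# combined with a uniform scaling of 1.2.
angle, scale = math.radians(30.0), 1.2
true_a = [[scale * math.cos(angle), -scale * math.sin(angle)],
          [scale * math.sin(angle),  scale * math.cos(angle)]]
ref = [(10.0, 0.0), (0.0, 10.0), (7.0, 7.0), (-5.0, 8.0)]
obs = [(true_a[0][0] * u + true_a[0][1] * v,
        true_a[1][0] * u + true_a[1][1] * v) for u, v in ref]

estimate = fit_linear_map(ref, obs)
print(estimate)
```

The fitted matrix describes how the peaks moved in the frequency plane. Relating it back to the modification applied in the spatial domain relies on the rotation and reciprocal-scaling properties reviewed in Section 4.1.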
Since the extracted peaks may not be the ones inserted by the watermark embedder, a method to prune the search space is proposed to reduce the computational cost. Some iterations may be needed to obtain an accurate affine matrix, which determines the possible geometrical attacks applied to the image. The matching algorithm may also be simplified with the help of LPM and LLM [56]. After the affine transform is found and the inverse affine transformation is applied, the watermark can be detected in the Fourier domain.

4.2.3 Self-Reference Scheme by Autocorrelation

The third solution is to embed a reference pattern several times into the image such that a generalized geometrical transformation is reflected in the autocorrelation of the investigated image. Kutter [40] proposed a method to embed the same reference pattern at shifted locations. Four patterns consisting of pseudo-random numbers are embedded in the image. The four patterns are not totally different but linearly shifted copies of each other. The initial pattern is a two-dimensional random number array. The second pattern is then formed by horizontally shifting the first pattern by $\delta_x$ columns. Similarly, the third pattern is formed by vertically shifting the first pattern by $\delta_y$. Finally, the fourth pattern is formed by shifting the first pattern by $\delta_x$ horizontally and $\delta_y$ vertically.

Figure 4.3: Multiple embedding of the same registration pattern, where black circles are the initial pattern, circles with lines are the horizontally shifted pattern, circles with dots are the vertically shifted pattern and gray circles are the horizontally and vertically shifted pattern.
The four patterns are embedded as shown in Fig. 4.3. The first pattern is embedded at locations with odd rows and odd columns. The second pattern is embedded at locations with odd rows and even columns. The third pattern is embedded at locations with even rows and odd columns. The fourth pattern is embedded at locations with even rows and even columns. Thus, the same random number is embedded at four different locations, i.e., $(x, y)$, $(x + 2 \times \delta_x + 1,\; y)$, $(x,\; y + 2 \times \delta_y + 1)$ and $(x + 2 \times \delta_x + 1,\; y + 2 \times \delta_y + 1)$.

In the watermark recovery process, the embedded pattern is first estimated by applying a prediction filter. Then, the two-dimensional autocorrelation is computed. As shown in Fig. 4.4, nine peaks can be detected in the autocorrelation function. The center peak represents the energy of the filtered image while the other eight peaks, which are symmetric around the center owing to the symmetric structure of autocorrelation, are generated by the four embedded patterns. The locations of these peaks can be used to compute the geometrical transformation for its inversion. Experimental results show that the algorithm can resist generalized geometrical transformations including the change of the aspect ratio, rotation and shearing. However, the interleaved embedding seems vulnerable to block-based coding schemes such as JPEG compression since the watermark is viewed as high frequency noise. If the image is slightly scaled (or rotated) and then compressed with JPEG, the autocorrelation procedure may fail to detect peaks for registration. Besides, the watermark that carries the payload in Kutter’s scheme is embedded in the spatial domain. It is not easy to perform synchronization of the spatial

Figure 4.4: Autocorrelation of the watermarked image.
domain watermark when the image is cropped since the autocorrelation cannot detect the translation in this case.

4.3 Structural Grid Signals for Synchronized Watermark Detection

Unlike filtering or lossy compression, geometrical attacks do not remove the embedded watermark but introduce a synchronization problem for the watermark detector. To resist geometrical attacks, the detector has to find a way to locate the right position where the watermark is embedded so that subsequent watermark detection can be successful. It is not an easy task since the hidden information is a weak signal that is interfered with by a strong noise, i.e. the host image. The requirement of blind detection makes this task even more challenging.

As discussed in Section 4.2, we believe that a better solution is to embed an extra signal explicitly for synchronized watermark detection. A clear advantage of using this extra signal is that it avoids ambiguous detection and thus decreases the complexity of watermark detection. If a certain geometrical attack is detected unambiguously by extracting this extra signal, the detector can easily restore the attacked image to its original shape for watermark detection without the need for repeated guessing and detection. Besides, spread-spectrum based watermark detection usually requires very precise synchronization. The auxiliary signal for synchronization will increase the accuracy of geometrical recovery so that the hidden watermark can be detected much more easily.

We have to first determine where this auxiliary signal for synchronization should be embedded/detected and what properties it should have. Since geometrical attacks are applied to the pixel domain directly, a straightforward but better idea is to embed and detect this signal in the spatial domain, i.e.
image pixels. Resorting to image transforms is usually not viable since image transforms require a collection of image data while the structure of these data may have already been modified by geometrical attacks.

By comparing the two Lena images before and after certain geometrical attacks as shown in Figure 4.5 (a) and (b), we can tell that the image has undergone certain geometrical transformations. However, given the fact that the watermark detector does not have the original image for reference, it is difficult for the detector to be aware of this geometrical change. The degradation caused by the geometrical attacks is limited, especially when we only view the attacked image. Let us impose a grid on the Lena image as shown in Figure 4.5 (c) and apply the same geometrical modifications to it. From the skewed grid shown in Figure 4.5 (d), we have some clues about the procedures that have been applied to this image. This is basically what motivates the idea of grid signal embedding/detection to resist geometrical attacks.

Although we cannot impose a visible grid on the image as shown in Figure 4.5 (c), we can achieve this in a rather implicit manner. We construct the grid by embedding structural signals into the image. The structural signals are formed by horizontally and vertically repeating the same pseudo-random pattern, or the grid pattern. The energy of the grid signals is small compared to the host image so as not to introduce any unpleasant artifacts. However, the embedding process of grid signals should take into account the host image features and human visual effects to increase the robustness against watermark attacks. Therefore, we may also view the grid signal as a spatial-domain watermark.
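The tiling idea described above can be sketched as follows. This is a simplified illustration under stated assumptions: the pattern is a plain random +1/-1 array, the host image is a flat placeholder, and a single constant `strength` is used, whereas the text calls for an embedding strength that adapts to image features and the human visual system. The names `tile_pattern` and `embed_grid` are hypothetical.

```python
import random

def tile_pattern(pattern, height, width):
    """Repeat an M x M pattern over the largest area whose sides are multiples of M."""
    m = len(pattern)
    rows, cols = (height // m) * m, (width // m) * m
    return [[pattern[y % m][x % m] for x in range(cols)] for y in range(rows)]

def embed_grid(image, pattern, strength):
    """Add the tiled pattern to the image; pixels outside the tiled area are untouched."""
    grid = tile_pattern(pattern, len(image), len(image[0]))
    marked = [list(row) for row in image]
    for y, grid_row in enumerate(grid):
        for x, g in enumerate(grid_row):
            marked[y][x] += strength * g
    return marked

rng = random.Random(1)
M = 4                                        # small block size, for display only
pattern = [[rng.choice((-1.0, 1.0)) for _ in range(M)] for _ in range(M)]
image = [[128.0] * 10 for _ in range(10)]    # flat placeholder "host image"
marked = embed_grid(image, pattern, strength=2.0)
print(marked[0][:8])                         # the first M values repeat once
```

Because the same block repeats in both directions, the embedded signal is periodic with period M, which is the property the detector later exploits through autocorrelation.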
Figure 4.5: (a) The original Lena image, (b) the geometrically attacked Lena image, (c) the Lena image imposed with a grid and (d) the geometrically attacked Lena image with a grid imposed.

4.3.1 Construction of the Grid Pattern

The pseudo-random pattern, or the grid pattern, used to tile the grid structure should have a good autocorrelation property, i.e., the two-dimensional autocorrelation function of the pattern should produce a single delta pulse. To achieve this, we first look at the autocorrelation function of a one-dimensional signal of finite length $N$. The autocorrelation function $Au(m)$ of signal $d(n)$ can be expressed as

$$Au(m) = \sum_{n=0}^{N-1} d(n) \times d(\mathrm{mod}(n + m, N)). \tag{4.17}$$

As mentioned in Section 4.1, we can calculate (4.17) efficiently by resorting to the DFT:

$$\mathbf{Au} = \mathcal{F}^{-1}\{\mathcal{F}(\mathbf{d}) \odot \mathcal{F}^{*}(\mathbf{d})\}, \tag{4.18}$$

where $\odot$ denotes point-to-point multiplication, $\mathcal{F}(\cdot)$ the Fourier transform, $\mathcal{F}^{-1}(\cdot)$ the inverse Fourier transform and $*$ the complex conjugate. The autocorrelation function and the signal are expressed in vector form, $\mathbf{Au}$ and $\mathbf{d}$, respectively. Now we require that a single delta pulse be shown in the autocorrelation function, i.e.,

$$\mathbf{Au} = [N, 0, 0, \ldots, 0], \tag{4.19}$$

which corresponds to

$$\mathcal{F}(\mathbf{d}) \odot \mathcal{F}^{*}(\mathbf{d}) = [1, 1, 1, \ldots, 1]. \tag{4.20}$$

In other words, the magnitudes of the Fourier transform of signal $\mathbf{d}$ are all equal to 1, i.e. the spectrum is flat (up to the normalization of the transform). Therefore, to generate an $M \times M$ pseudo-random pattern with a good autocorrelation property, we can generate a random pattern and calculate its two-dimensional DFT. After we force the magnitudes of the DFT coefficients to be equal to $M$, so that the pattern has unit variance, and leave the phases unchanged, the grid pattern is formed by taking the inverse DFT.

4.3.2 Grid Signal Embedding

As mentioned earlier, the operations on grid signals are performed in the spatial (or pixel) domain. Since image features vary in different parts of the image, the embedding
Since image features vary in different parts of the image, the embedding of grid signals should take image characteristics into account so that the embedded grid signals are robust against attacks and fulfill the visual constraint as well. In other words, we place the same importance on the grid signals for synchronization as on the watermark carrying the necessary hidden information. There are two reasons for this design. First, the payload of digital watermarking for copyright protection is usually not high, so there exists some room for extra signal embedding. Second, geometrical modifications are very commonly applied by image users. It deserves special effort to embed the auxiliary signal for synchronization so that a practical watermarking scheme can be attained.

It should be noted that the same size $M$ will be used in all of the watermarked images, so the value must be set such that all suitably large images can be protected by the watermark. Since the size of the image may not be equal to a multiple of $M$, boundaries of the image may simply be ignored for grid signal embedding. A rule of thumb is to set $M$ as a multiple of 8, which will help achieve robustness against JPEG compression since the block size of the DCT adopted in JPEG compression is equal to 8.

4.3.3 Grid Signal Detection

We now consider how to detect the affine attack using the embedded structural grid signal. The affine attack can be defined by a $2 \times 2$ matrix plus translation. The whole image will be transformed by the same matrix with a certain shift. In other words, for an image pixel
located at $(x, y)$, its new position, $(x', y')$, after the affine transformation can be determined using the six parameters in the following model:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} x_s \\ y_s \end{bmatrix}. \qquad (4.21)$$

Detection of the grid signal is based on autocorrelation, e.g. Kutter's scheme [40]. In our system, the structural grid signals are responsible for the determination of the four parameters in the affine matrix, i.e. $a_{00}$, $a_{01}$, $a_{10}$ and $a_{11}$. Special care has to be taken with the two translation parameters, $x_s$ and $y_s$, so that the synchronization problem can be fully solved.

Under the assumption that the grid signal embedded in a pixel can be appropriately estimated or extracted, we can get the autocorrelation function as shown in Fig. 4.6(a). We represent the autocorrelation function by $A_u(x, y)$, where the central peak in Fig. 4.6(a) is denoted by $A_u(0, 0)$. If the watermarked image is not geometrically modified, we can find peaks at locations $(i \times M, j \times M)$, where $i$ and $j$ are integers. The distance between peaks corresponds to the block size $M$ of the grid structure. In Figs. 4.6(b) and (c), we show the autocorrelation function after the image is rotated and scaled. It is clear that the detected peaks reflect the geometrical transformation applied to the image.

In the following two sections, we will make use of the grid embedding/detection technique to develop complete affine-invariant watermarking schemes.

Figure 4.6: The autocorrelation function of the estimated grid structure in (a) the watermarked image, (b) the rotated watermarked image and (c) the scaled watermarked image.

4.4 Spatial-Frequency Composite Watermarking Scheme

In this section, we propose the spatial-frequency composite scheme to resist geometrical transformations.
A frequency-domain watermark, or more specifically, a Fourier-domain watermark, is first embedded into the host image to convey the necessary information. A spatial-domain watermark, i.e., the grid signal, is then embedded in the frequency-domain watermarked image to achieve image registration. Magnitudes of the Fourier transform are selected for frequency-domain watermark embedding due to their insensitivity to translation of the image. The detection is done in the reverse order. The spatially embedded structure is recognized by looking at the autocorrelation function of the image, which contains the corresponding peaks as mentioned before. These peaks are analyzed to identify any affine distortions. The inverse affine transform is then used to adjust the attacked image back to its original orientation and scale. The frequency-domain watermark can then be detected easily to unveil the information that the watermarked image carries.

We will present the system in the order of the embedding and detection procedures, i.e., frequency-domain watermark embedding, spatial-domain watermark embedding, spatial-domain watermark detection and frequency-domain watermark detection. The block diagram of the composite watermarking scheme is shown in Fig. 4.7.

Figure 4.7: The block diagram of the spatial-frequency composite watermarking scheme.

4.4.1 Frequency-Domain Watermark Embedding

The frequency-domain watermark is used to carry the necessary payload.
First of all, an $N \times N$ DFT is applied to the original image, where $N$ is chosen to be an integer power of 2, i.e., $N = 2^k$. One advantage of this choice is to make the DFT calculation more efficient. Besides, by setting this agreement between the watermark embedder and the watermark detector, the requirement of the presence of the original image can be avoided in the detection phase. However, the width $w$ and the height $h$ of the image may not be an integer power of two. We can form a temporary image of size $2^k \times 2^k$, where $2^{k-1} < \max(w, h) \le 2^k$, by padding the image with zeros. An example is shown in Fig. 4.8. If the size of the original image is $352 \times 288$, a temporary image of size $512 \times 512$ is generated by zero-padding the part surrounding the original image. After the watermark is embedded, we simply crop the added portion to keep the original size of the image. We do not expect that the image will be modified to a very large degree, since a substantial amount of visual degradation would be introduced and the commercial value of the image would be significantly reduced by doing so.

Figure 4.8: (a) The original image of size $352 \times 288$ and (b) the padded image of size $512 \times 512$ for watermark embedding.

The watermark is embedded in DFT coefficients of the middle frequency range. Low frequency coefficients are avoided since most of the energy of the image resides in the low frequency range. Even a slight modification in low frequency coefficients will result in serious perceptual distortion. The high frequency range is also ignored for watermarking since lossy image coders tend to remove high frequency coefficients, which are usually not significant to the human visual system.
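The padding rule $2^{k-1} < \max(w, h) \le 2^k$ described above can be sketched as follows. The function names are hypothetical, and centering the image inside the zero array is an assumption (the text only says that the surrounding part is zero-padded).

```python
import numpy as np

def padded_size(width, height):
    """Smallest N = 2**k with 2**(k-1) < max(width, height) <= 2**k."""
    longest = max(width, height)
    return 1 << (longest - 1).bit_length()

def pad_to_power_of_two(image):
    """Place the image at the center of an N x N array of zeros."""
    h, w = image.shape
    n = padded_size(w, h)
    top, left = (n - h) // 2, (n - w) // 2   # centering is an assumption
    padded = np.zeros((n, n), dtype=image.dtype)
    padded[top:top + h, left:left + w] = image
    return padded, (top, left)

def crop_back(padded, offset, shape):
    """Remove the added border after the watermark has been embedded."""
    top, left = offset
    h, w = shape
    return padded[top:top + h, left:left + w]

image = np.ones((288, 352))                  # a 352 x 288 image (rows = height)
padded, offset = pad_to_power_of_two(image)
print(padded.shape)
```

Because the embedder and the detector both derive $N$ from the image size in the same way, no side information about $N$ has to be transmitted.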
Thus, after applying the DFT and shifting the spectrum to make the DC coefficient appear at the center, we embed the watermark in coefficients that are located within a circular strip with radius between $r_l$ and $r_h$.

The other issue worth our attention is the cross artifact in the image's energy spectrum. Let us take a look at the "Lena" image and its DFT magnitude as shown in Figs. 4.9 (a) and (b), respectively. We see that there exist strong energy coefficients along the horizontal and vertical axes. No matter how the image is cropped or scaled, the cross artifact will still appear near the axes, since the cross artifact is related not to the frequency content of the underlying image data but to the discontinuity of pixel values on the image boundaries. The energy of discontinuous boundaries will be more dominant if we pad the image with zeros when the image size is not equal to an integer power of two. To make the embedded watermark reliably detectable by the watermark detector, we should avoid embedding the watermark in the region that is close to the horizontal and vertical axes. We only embed the watermark in the angular ranges $(\delta^\circ, 90^\circ - \delta^\circ)$, $(90^\circ + \delta^\circ, 180^\circ - \delta^\circ)$, $(-\delta^\circ, -90^\circ + \delta^\circ)$ and $(-90^\circ - \delta^\circ, -180^\circ + \delta^\circ)$ to avoid the region with the cross artifact. The region for watermark embedding (and detection) in the Fourier spectrum is shown in Fig. 4.9(c). The four sectors are chosen and the remaining black portion is omitted.

It should be noted that the magnitudes of DFT coefficients are even symmetric, i.e., they are symmetric around the center since the image data take real values. When embedding the watermark in the Fourier domain, we should keep this symmetric property. Thus, the watermark will first be embedded in coefficients in the upper two sectors.
The resulting watermarked coefficients will be mapped to coefficients in the lower two sectors.

Figure 4.9: (a) The Lena image, (b) magnitudes of the DFT coefficients of the Lena image and (c) the region chosen for watermarking.

After determining the region for watermarking, we start to embed the watermark in each of the coefficients in the region. We only modify the magnitudes of the Fourier coefficients and leave the phases unchanged. The magnitude of the watermarked coefficient, $|C'(x, y)|$, is formed by

$$|C'(x, y)| = |C(x, y)| \times (1 + \alpha_f \times w_f(x, y)), \qquad (4.22)$$

where $C(x, y)$ is the original coefficient and $w_f(x, y)$ is the watermark symbol, which can be of real or binary value. Parameter $\alpha_f$ is used to scale the total watermark energy. We also use the magnitude of the coefficient to weight the embedded watermark, to avoid adding too much watermark energy to a coefficient with a smaller value.

In order to make the frequency-domain watermark less sensitive to errors in rotation and scaling in the recovery process, the same watermark symbol may be embedded in a larger region instead of a single point. We embed the watermark by examining the position of the coefficient, that is, the angle and the radius. Consequently, the watermark is embedded by using

$$|C'(r, \theta)| = |C(r, \theta)| \times (1 + \alpha_f \times w_f(r, \theta)), \qquad (4.23)$$

where we calculate radius $r$ and angle $\theta$ and round them to the nearest integers. The corresponding watermark symbol is then embedded in the magnitude. Finally, the watermarked coefficient is formed by combining the watermarked magnitude with the original phase.

The other point we would like to emphasize is that the watermark value is horizontally (or vertically) symmetric, i.e. $w_f(x, y) = w_f(x, N - y)$. The major concern here is to
make the watermark scheme resist the mirror attack, i.e., flipping the image horizontally around the vertical axis. In many cases, such as landscape images, mirroring an image is apparently acceptable to most image users. Many existing watermarking schemes do not take this issue into consideration, so the complexity of the watermark detector is increased by the need to detect the watermark in the flipped image once more. In fact, flipping an image horizontally causes the magnitude of the Fourier spectrum to flip as well. Therefore, embedding the same watermark value in the two positions horizontally symmetric to each other will allow the watermarking scheme to be resilient to mirror manipulation.

One drawback of embedding the watermark in the Fourier domain is that local statistics of the host image are ignored [49]. Although the PSNR value of the resulting DFT-watermarked image with respect to the original image is quite large (usually larger than 45 dB), the embedded watermark could cause visible artifacts in images with varying characteristics. Let us consider the image "Statue of Liberty" as shown in Fig. 4.10(a). We enhance the energy of the watermark to exaggerate the artifact introduced by the frequency-domain watermark embedding for the purpose of explanation. From the watermarked image shown in Fig. 4.10(b), we see that the watermark is hidden well in the statue, but it becomes quite visible in the sky. Therefore, a postprocessing procedure in the spatial domain is required to make the embedded watermark adapt to the local characteristics of the image.

The spatial postprocessing procedure should take a human visual model into account to decide the maximum amount by which a pixel can be changed while the change remains invisible to the human eye.
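The magnitude-only embedding of (4.22) and (4.23), together with the symmetry requirements discussed above, can be sketched as follows. This is a sketch under stated assumptions rather than the implementation used here: the lookup table `symbols` (indexed by rounded radius and rounded angle), the function name and the square input are hypothetical, the angle is folded into $[0^\circ, 90^\circ]$ so that one table entry serves all four mirror positions, and the spatial postprocessing step is omitted.

```python
import numpy as np

def embed_fourier_watermark(image, symbols, alpha_f, r_lo, r_hi, delta_deg):
    """Scale DFT magnitudes by (1 + alpha_f * w_f) inside the embedding region.

    `symbols[r, a]` is a hypothetical table of watermark symbols indexed by
    rounded radius r and rounded folded angle a (degrees, 0..90).
    """
    n = image.shape[0]                        # assumes a square N x N image
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    rows, cols = np.indices((n, n))
    u = np.abs(cols - n // 2)                 # |horizontal frequency|
    v = np.abs(rows - n // 2)                 # |vertical frequency|
    radius = np.rint(np.hypot(u, v)).astype(int)
    # Using |u| and |v| gives the same symbol at (u, v), (-u, v), (u, -v)
    # and (-u, -v): mirror symmetry and point symmetry at the same time.
    angle = np.rint(np.degrees(np.arctan2(v, u))).astype(int)
    region = ((radius >= r_lo) & (radius <= r_hi) &
              (angle > delta_deg) & (angle < 90 - delta_deg))
    gain = np.ones((n, n))
    gain[region] = 1.0 + alpha_f * symbols[radius[region], angle[region]]
    # A real, positive, point-symmetric gain changes magnitudes only, so the
    # phases are untouched and the inverse DFT stays real.
    marked = np.fft.ifft2(np.fft.ifftshift(spectrum * gain))
    return np.real(marked), region

rng = np.random.default_rng(0)
host = rng.random((64, 64))                               # stand-in host image
table = rng.choice([-1.0, 1.0], size=(33, 91))            # bipolar symbols
marked, region = embed_fourier_watermark(host, table, 0.2, 8, 24, 9)
```

The gain stays positive as long as $\alpha_f |w_f| < 1$, which is what keeps the original phase intact.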
Methods such as the clamped high-pass filter [34] or Girod's spatial masking model [25] may be suitable for preventing the embedded watermark from being perceived. The evaluation of the two methods has been reported in [52]. For both simplicity and generality, we adopt the clamped high-pass filter as the tolerable error level, which defines the maximum offset by which a pixel can be changed. The effect is that the energy of the embedded watermark can be constrained in visually sensitive regions.

Figure 4.10: The embedding of a stronger watermark in an image with varying characteristics at different locations: (a) the original "Statue of Liberty" and (b) the watermarked "Statue of Liberty".

4.4.2 Spatial-Domain Watermark Embedding

The pseudo-random pattern is repeatedly embedded in the image in a tiled grid, as shown in Fig. 4.11, to form the spatial-domain watermark, i.e., the grid signal. Let the pixel of the image output by the DFT watermarking process described in Section 4.4.1 be denoted by $I(x, y)$, where $(x, y)$ is the position, $0 \le x < W$, $0 \le y < H$, and $W$, $H$ are the width and height of the image, respectively. The watermarked pixel $I'(x, y)$ is formed by

$$I'(x, y) = I(x, y) + \alpha_s \times A(x, y) \times w_s(\mathrm{mod}(x, M), \mathrm{mod}(y, M)), \qquad (4.24)$$

where $\alpha_s$ is the global weighting factor and $w_s$ is the corresponding spatial-domain watermark symbol in the pseudo-random pattern.

Figure 4.11: Spatial-domain watermark embedding.

The local weighting factor, $A(x, y)$, is again determined by the clamped high-pass filter.
That is, $A(x, y)$ is obtained by filtering the DFT-watermarked image with a Laplacian high-pass filter

$$F_c = \frac{1}{8} \times \begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}, \qquad (4.25)$$

taking the absolute value and limiting the offset to within a predefined maximum value, $A_{\max}$. Filtering with the clamped high-pass filter can be viewed as an activity measurement, so that stronger watermark energy will be embedded in areas with more activity while less watermark energy is added in smooth regions to guarantee the invisibility constraint of the watermark.

4.4.3 Spatial-Domain Watermark Detection

As mentioned before, the affine attack can be defined by an affine matrix plus translation. The structural grid signals will be responsible for determining the four parameters in the affine matrix, while certain strategies have to be taken to deal with the translation. Due to the translation-invariant property of the Fourier magnitude, the two shift parameters will not affect the watermark payload, as we choose to embed the watermark in Fourier magnitudes.

The spatial-domain watermark is detected based on autocorrelation. With the help of the FFT, we can calculate the autocorrelation function very efficiently. Before calculating the autocorrelation function of the investigated image to extract the embedded peaks for geometrical correction, we prefilter the image. The necessity of the prefiltering process can be explained as follows.

First, the correlation detector is known to be optimal in an additive white Gaussian channel, when the receiver has full knowledge of the modulation function used to transmit the messages. However, the channel noise in watermarking, i.e., the image content, does not have this property, since the mean value is clearly not close to zero and the distribution of pixel values varies in different images.
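The embedding rule (4.24) with the clamped high-pass weighting of (4.25) can be sketched as below. The function names, the border handling (edge replication), the cropping of the tiled pattern to the image size and the bipolar test pattern are assumptions made for illustration; `a_max` stands for the predefined maximum $A_{\max}$.

```python
import numpy as np

# Laplacian high-pass kernel F_c of (4.25).
F_C = np.array([[-1, -1, -1],
                [-1,  8, -1],
                [-1, -1, -1]], dtype=float) / 8.0

def filter3x3(image, kernel):
    """Correlate the image with a 3 x 3 kernel, replicating border pixels."""
    h, w = image.shape
    padded = np.pad(image.astype(float), 1, mode="edge")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def embed_grid(image, pattern, alpha_s, a_max):
    """I'(x, y) = I(x, y) + alpha_s * A(x, y) * w_s(x mod M, y mod M)."""
    h, w = image.shape
    m = pattern.shape[0]
    # Tile the M x M pattern over the image and crop to the image size.
    tiled = np.tile(pattern, (-(-h // m), -(-w // m)))[:h, :w]
    # Clamped high-pass filter: local activity, limited to a_max.
    activity = np.minimum(np.abs(filter3x3(image, F_C)), a_max)
    return image + alpha_s * activity * tiled

rng = np.random.default_rng(0)
pattern = rng.choice([-1.0, 1.0], size=(8, 8))    # stand-in grid pattern
image = rng.integers(0, 256, size=(20, 24)).astype(float)
marked = embed_grid(image, pattern, alpha_s=0.1, a_max=20.0)
```

In a perfectly flat region the filter output is zero, so no grid energy is added there, which is the behaviour the activity measurement is meant to provide.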
Besides, the strong correlation between image pixels will also undermine the performance of the correlation detector. Therefore, the prefiltering process can decorrelate the image pixels on one hand and change the noise distribution without affecting watermark detection on the other hand. The filtered image will have a mean close to zero and a decreased variance. The probability distribution of the filtered image can possibly be modeled as generalized Gaussian noise, so that the performance of watermark detection is boosted [16].

Second, we can view the prefiltering as a watermark prediction filter. The Fourier spectrum of the spatial-domain watermarked image is shown in Fig. 4.12. Small watermark peaks appear in the figure across the low to high frequency bands. The small peaks are separated from each other by a fixed distance as a result of the repeated embedding of the spatial-domain watermark. If some of these watermark peaks can be successfully extracted or estimated, the spatial-domain watermark can be detected.

Figure 4.12: DFT magnitudes of the composite watermarked image.

To further understand the usage of estimation filters, we examine two types of filters applicable in spatial-domain watermark detection, chosen because they make the prefiltering process computationally inexpensive. The two examined filters are

$$H_1 = \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix} \qquad (4.26)$$

and

$$H_2 = \begin{bmatrix} 1 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 1 \end{bmatrix}. \qquad (4.27)$$

The Fourier spectra of the two images filtered by $H_1$ and $H_2$ are shown in Fig. 4.13 (a) and (b), respectively. We see that the image filtered by $H_1$ has more energy in the lower to middle frequency components, while the image filtered by $H_2$ has stronger energy in the high frequency coefficients.
In terms of watermark prediction, it would be better to choose $H_2$ because the watermark can be successfully extracted by filtering out the image signal. However, lossy image compression tends to remove the insignificant high frequency coefficients. Most of the watermark energy existing in high frequency coefficients will be removed, and the filter may only preserve noise introduced by JPEG compression. Therefore, $H_2$ may not work well when the watermarked image is geometrically modified and compressed with lossy image codecs. If we would like to accommodate image compression, $H_1$ seems better for detecting the watermark because it can predict the watermark in the middle frequency coefficients, which will also be preserved in lossy compression.

Figure 4.13: The Fourier spectrum of an image filtered by (a) $H_1$ and (b) $H_2$.

However, there exists another problem that may limit the usage of $H_1$. Let us consider the image shown in Fig. 4.14(a). Black boundaries are added to the sides of the image. Images with boundaries are quite common, such as a frame of a video in letter-box format. If we apply the cross-shaped filter $H_1$ to the image and calculate its autocorrelation, there will be three horizontal and three vertical ridges as shown in Fig. 4.14(b). These ridges will dominate and add difficulty in extracting autocorrelation peaks, especially when the image is compressed. These ridges are generated owing to the structure of the filter. Note that the sums of the individual rows and columns of $H_1$ are nonzero ($-1$, $2$, $-1$). After prefiltering the image, there will be a line of nonzero responses along the black boundary. These responses will generate ridges when we calculate the two-dimensional autocorrelation function.
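The role of the row and column sums can be checked numerically. The sketch below filters a synthetic image with a black band on the left, a stand-in for a letter-box boundary that is not taken from the experiments, with $H_1$ of (4.26) and $H_2$ of (4.27). For an image that is constant along one direction, the two-dimensional filter reduces to the one-dimensional filter formed by the kernel's column sums, so a kernel whose column sums are all zero gives no response at the boundary. The helper name and the border handling are assumptions.

```python
import numpy as np

H1 = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float)
H2 = np.array([[1, -2, 1], [-2, 4, -2], [1, -2, 1]], dtype=float)

def filter3x3(image, kernel):
    """Correlate the image with a 3 x 3 kernel, replicating border pixels."""
    h, w = image.shape
    padded = np.pad(image.astype(float), 1, mode="edge")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

# Flat image with a black band on the left: columns 0..3 are 0, the rest 100.
boundary = np.zeros((6, 8))
boundary[:, 4:] = 100.0

print("H1 column sums:", H1.sum(axis=0))     # -1, 2, -1
print("H2 column sums:", H2.sum(axis=0))     #  0, 0,  0
print("H1 response, one row:", filter3x3(boundary, H1)[0])
print("H2 response, one row:", filter3x3(boundary, H2)[0])
```

With $H_1$ every row of the output carries the same nonzero pair of values next to the boundary, i.e. a line that runs the full height of the image; with $H_2$ the output is zero everywhere.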
Although the problem could be solved by detecting ridges and selecting other peaks, we actually prevent this situation by changing the filter structure. For example, the ridges disappear, as shown in Fig. 4.14(c), when we apply $H_2$, because the sum of each row and column of this Laplacian operator is zero. Without those ridges, detection of the peaks becomes easier and more efficient.

Figure 4.14: (a) The Lena image with black boundaries, (b) the autocorrelation function of the $H_1$ filtered image and (c) the autocorrelation function of the $H_2$ filtered image.

Based on the above consideration, a possibly better filter, with the sum of each of its rows and columns equal to zero, is as follows:

$$H_3 = \begin{bmatrix} 0 & 1 & -2 & 1 & 0 \\ 1 & 2 & -6 & 2 & 1 \\ -2 & -6 & 16 & -6 & -2 \\ 1 & 2 & -6 & 2 & 1 \\ 0 & 1 & -2 & 1 & 0 \end{bmatrix}. \qquad (4.28)$$

The $H_3$ filter tends to keep the balance between compression and the accuracy of watermark prediction, so the watermark can be detected more easily after the image is geometrically modified and then compressed with mild JPEG compression.

A more sophisticated way to detect the spatial-domain watermark is to make use of Wiener filtering. The first step is to predict the original signal and the second step is to subtract the predicted original signal from the investigated signal to retrieve the spatial-domain watermark. In our implementation, an adaptive Wiener filtering algorithm [48] is adopted. The space-variant filter $h(n_1, n_2)$, with $-M \le n_1, n_2 \le M$, is determined by

$$h(n_1, n_2) = \begin{cases} \dfrac{\sigma_o^2 + \dfrac{\sigma_w^2}{(2M+1)^2}}{\sigma_o^2 + \sigma_w^2}, & n_1 = n_2 = 0, \\[2ex] \dfrac{\dfrac{\sigma_w^2}{(2M+1)^2}}{\sigma_o^2 + \sigma_w^2}, & -M \le n_1, n_2 \le M, \text{ except } n_1 = n_2 = 0, \\[2ex] 0, & \text{otherwise}, \end{cases} \qquad (4.29)$$

where $\sigma_o^2$, $\sigma_w^2$ are the variances of the original image and the watermark, respectively. Instead of assuming a fixed $\sigma_o^2$ and $\sigma_w^2$ for the entire image, they are estimated locally. The variance $\sigma_w^2(x, y)$ of the watermark is estimated by calculating the local variance of
the investigated image filtered by $F_c$ as shown in (4.25) and multiplying it by $\alpha_s^2$. The variance $\sigma_o^2$ of the original signal is estimated by

$$\sigma_o^2(x, y) = \begin{cases} \sigma_I^2(x, y) - \sigma_w^2(x, y), & \sigma_I^2(x, y) > \sigma_w^2(x, y), \\ 0, & \text{otherwise}, \end{cases} \qquad (4.30)$$

where $\sigma_I^2(x, y)$ is the local variance of the investigated image.

After prefiltering the image or predicting the spatial-domain watermark and calculating its autocorrelation function, we calculate the second derivative to make the peak detection process easier. That is, we filter the autocorrelation function by the filter shown in (4.27). Then we will be able to detect peaks if the spatial-domain watermark exists. A simple solution for peak detection is to make use of the eight peaks around the center. To detect those peaks, first of all, we ignore the region around the center, since the appearance of peaks in this region may be caused by the image structure itself instead of the embedded watermark. Next, we find the local maximum that is closest to the center of the image. This peak is assumed to be one of the eight peaks closest to the center. The determination of the local maximum or peak is done by examining whether $A_u(x, y)/A_u(0, 0)$ exceeds some threshold. This ratio is actually the correlation coefficient of the extracted watermark with its shifted version. To detect other peaks, we erase the values around the peaks already detected, as well as the values of the opposite peaks across the center of the autocorrelation function, because of its symmetric structure, i.e., $A_u(x, y) = A_u(-x, -y)$. The next local maximum closest to the center is again chosen as a peak, which is also assumed to be one of the eight peaks closest to the center. It should be noted that utilization of other
peaks will definitely help determine a more accurate affine matrix because of the periodic structure of the peaks. However, the position of the detected peak $(x, y)$ will always take an integer value. To increase the accuracy of detected peaks, we use the values of the autocorrelation function in a square block centered at $(x, y)$ of size $(2\delta + 1) \times (2\delta + 1)$ to determine a more precise position of the peak. The final position of the peak can be calculated by

$$x^* = x + \frac{1}{K} \sum_{i=x-\delta}^{x+\delta} \sum_{j=y-\delta}^{y+\delta} \left\{ \frac{A(i, j)}{A(x, y)} \times (i - x) \right\}, \qquad y^* = y + \frac{1}{K} \sum_{i=x-\delta}^{x+\delta} \sum_{j=y-\delta}^{y+\delta} \left\{ \frac{A(i, j)}{A(x, y)} \times (j - y) \right\}, \qquad (4.31)$$

where $A(i, j) = \mathrm{Max}(A_u(i, j), 0)$ and $K$ is the number of nonzero $A(i, j)$ excluding $A(x, y)$. From empirical data, the precision can be increased by this weighting process.

Given the locations of the detected peaks $(x^*, y^*)$, the correction of the geometrical attack becomes a point-matching problem. The matching process can be done in a simple way because of the periodic structure of the peaks. By determining the relationship between the positions of a point before and after the transformation, the affine matrix defined in (4.21) can be found. If the image is purely scaled and/or rotated, two parameters are sufficient to describe the affine matrix, since it can be determined by the rotation angle and the scaling ratio. In this case, we only need one peak to determine the affine matrix. If the image is modified by shearing, a change of the aspect ratio or other linear transformations, we need two peaks to calculate the correct affine matrix.
Let us denote the positions of the two points before transformation by $(x_1, y_1)$, $(x_2, y_2)$, and the positions of the two detected points after transformation by $(x_1^*, y_1^*)$, $(x_2^*, y_2^*)$. The inverse affine matrix can be calculated by using

$$\begin{bmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{bmatrix}^{-1} = \begin{bmatrix} x_1 & x_2 \\ y_1 & y_2 \end{bmatrix} \begin{bmatrix} x_1^* & x_2^* \\ y_1^* & y_2^* \end{bmatrix}^{-1}. \qquad (4.32)$$

4.4.4 Frequency-Domain Watermark Detection

Before detecting the frequency-domain watermark, we first use the inverse affine transform to recover the image back to the original scale and orientation. For the pixel of the investigated image at location $(x, y)$, its recovered position $(x^*, y^*)$ can be calculated via

$$\begin{bmatrix} x^* \\ y^* \end{bmatrix} = \begin{bmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{bmatrix}^{-1} \begin{bmatrix} x \\ y \end{bmatrix}. \qquad (4.33)$$

Since $x^*$ and $y^*$ may not take integer values, some interpolation method must be used. Note that our scheme does not require very high precision, so linear interpolation is adopted. Although the recovered image does not have good quality, the watermark can be successfully detected since we embed the watermark in the middle-frequency coefficients of the image. The finer parts of the image are represented by the high frequency components, which are not required in watermark detection.

However, we should pay attention to the case of rotation attacks. Let us consider the image shown in Fig. 4.15(b), which is generated by rotating the original "Lena" image (Fig. 4.15(a)) 15 degrees to the right. The sizes of the two images are different since it is assumed that all image users will remove the additional background, i.e., the portion that does not originally appear in the image (usually filled with zero values in most image editors), since slanting boundaries do not look pleasant to image users. In watermark detection, we apply the autocorrelation process and correctly convert the attacked image to its original orientation and scale as shown in Fig. 4.15(c).
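The two-peak computation of the inverse affine matrix, and its use to map an investigated pixel position back to the original geometry, can be sketched as follows. The function names are hypothetical, the example distortion (a 15 degree rotation combined with a scaling of 1.2) and the grid size of 64 are illustrative choices, and the peaks before transformation are taken to be $(M, 0)$ and $(0, M)$.

```python
import numpy as np

def inverse_affine_from_peaks(before, after):
    """Return the 2 x 2 inverse affine matrix from two matched peak pairs.

    `before` and `after` each hold two (x, y) positions; the points are
    stacked as matrix columns, i.e. [[x1, x2], [y1, y2]].
    """
    p = np.array(before, dtype=float).T
    p_star = np.array(after, dtype=float).T
    return p @ np.linalg.inv(p_star)

def recover_position(inverse_matrix, x, y):
    """Map a pixel of the investigated image back to the original geometry."""
    return inverse_matrix @ np.array([x, y], dtype=float)

# Illustrative attack: rotation by 15 degrees combined with scaling by 1.2.
angle = np.radians(15.0)
attack = 1.2 * np.array([[np.cos(angle), -np.sin(angle)],
                         [np.sin(angle),  np.cos(angle)]])

M = 64                                        # grid block size (example)
before = [(M, 0), (0, M)]                     # peaks without any attack
after = [tuple(attack @ np.array(p, dtype=float)) for p in before]

inverse = inverse_affine_from_peaks(before, after)
print(np.round(inverse @ attack, 6))          # close to the identity matrix
```

The two detected peaks must not be collinear with the center, otherwise the matrix of detected positions cannot be inverted.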
From the Fourier spectrum of the corrected image as shown in Fig. 4.15(d), the cross artifact is no longer confined to the neighborhood of the horizontal and vertical axes. These additional artifacts come from the slanting boundaries that appear when we recover the image and pad the background with zeros. One method to get rid of these cross artifacts is to crop only the central part of the recovered image, as most image users will do. However, the effect of cropping an image on its Fourier coefficients is equivalent to shifting the image and adding noise to the Fourier coefficients. Cropping more image content will definitely affect the detection of the frequency-domain watermark. Therefore, instead of cropping the image, we choose to reduce the cross artifacts by filling the additional background of the image with its mean value (Fig. 4.15(e)). In normal cases, this procedure will decrease the discontinuity between the image boundaries and other parts of the image. By comparing Fig. 4.15(d) and (f), we see that the cross artifact is significantly reduced and the watermark detection will thus be improved.

Figure 4.15: (a) The original Lena image, (b) the image rotated by 15 degrees and cropped, (c) the recovered image by filling the background with zeros, (d) DFT magnitudes of the zero-padded recovered image, (e) the recovered image by filling the background with the mean value and (f) DFT magnitudes of the mean-filled recovered image.

After recovering the image to its original shape, we paste it on an $N \times N$ block, where $N$ is the smallest power of two that can cover the recovered image. In normal cases, the size should be the same as the one used in embedding. If the computational load is not a serious issue in watermark detection, a $2N \times 2N$ DFT can also be tested if the watermark is not found in the $N \times N$ DFT. The other method is to calculate
the DFT in a larger dimension, e.g. $4N \times 4N$. If the watermark is not detected, we then detect the watermark from the downsampled coefficients.

In frequency-domain watermark detection, we encounter the same problem as in spatial-domain watermark detection, since the magnitudes of the Fourier transform only take positive values, so the distribution is definitely non-Gaussian. We follow the same strategy as adopted in spatial-domain watermark detection, i.e., filtering the Fourier magnitudes with a prediction filter.

The correlation response is used to determine the existence of a watermark. We need only the two upper sectors of the DFT magnitudes as shown in Fig. 4.9 to detect the watermark. Let $C_1(r, \theta)$ and $C_2(r, \theta)$ be the filtered magnitudes of the Fourier coefficients in the upper-left sector and the upper-right sector, respectively, and let $W_f(r, \theta)$ be the tested watermark symbol at the corresponding position. The correlation response is calculated by

$$\rho = \frac{\displaystyle\sum_{r=r_l}^{r_h} \sum_{\theta=\delta}^{90-\delta} \{ C_1(r, \theta) \times W_f(r, \theta) \} + \sum_{r=r_l}^{r_h} \sum_{\theta=90+\delta}^{180-\delta} \{ C_2(r, \theta) \times W_f(r, \theta) \}}{\sqrt{\displaystyle\sum_{r=r_l}^{r_h} \sum_{\theta=\delta}^{90-\delta} C_1^2(r, \theta) + \sum_{r=r_l}^{r_h} \sum_{\theta=90+\delta}^{180-\delta} C_2^2(r, \theta)}}. \qquad (4.34)$$

The weighting factor in the denominator makes the correlation response follow a normal distribution, based on the Central Limit Theorem. Consequently, the detection threshold can be set according to the Q function to maintain the false positive rate below a given probability, as discussed in the threshold analysis of JPEG-2000 watermarking in Section 3.2.6.

Since the frequency-domain watermark carries information about the image, let us examine its capacity. The energy in an image is seldom evenly distributed over the angular frequencies of the Fourier spectrum. Images frequently have a large amount of energy in one group of directions, while having lower energy in the orthogonal group of directions.
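The thresholding step can be illustrated with a short sketch. It assumes that the correlation response is normalized by the square root of the summed squared magnitudes, so that it has unit variance for bipolar symbols; the function names and the target false positive rate of $10^{-8}$ are illustrative assumptions, and the Q function is evaluated through the complementary error function of the standard library.

```python
import math
import numpy as np

def correlation_response(c1, w1, c2, w2):
    """Normalized correlation over the two upper sectors (unit variance)."""
    numerator = np.sum(c1 * w1) + np.sum(c2 * w2)
    denominator = math.sqrt(np.sum(c1 ** 2) + np.sum(c2 ** 2))
    return numerator / denominator

def q_function(x):
    """Tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def threshold_for(false_positive_rate, lo=0.0, hi=40.0):
    """Smallest threshold T with Q(T) <= false_positive_rate (bisection)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if q_function(mid) > false_positive_rate:
            lo = mid
        else:
            hi = mid
    return hi

print(threshold_for(1e-8))
```

A response above the returned threshold is then declared a detection; lowering the tolerated false positive rate raises the threshold and therefore the watermark energy needed for reliable detection.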
Therefore, one possible method is to divide the Fourier spectrum into S sections according to different angles. The value of S is limited by the spreading gain, since the spread-spectrum watermarking method is utilized. We calculate the summation of the correlation responses in one sector and in the other sector ninety degrees apart from it. The sign of the summed correlation response can be used to represent one bit of information. In this case, the correlation response in (4.34) should be modified to be the sum of the absolute values of the correlation responses of the individual bits. If we would like to increase the security of the system, we can consider the other method, which makes use of a secret key to decide random coefficients for calculating the correlation response of each bit. This method is applicable if the image has been geometrically corrected and the synchronization problem has been solved by using the spatial-domain watermark.

4.4.5 Experimental Results

The Lena image of size 512 × 512 is used in this section for experiments. To begin with, we describe the parameters used in the composite system. In frequency-domain watermark embedding, the watermark is embedded in coefficients with radius in the range [32, 128], to achieve a balance between robustness and visual quality, and with angle in the ranges (9°, 81°), (99°, 171°), (-9°, -81°) and (-99°, -171°), to avoid the region with the cross artifact. The watermark weighting factor α_f in (4.23) is set to 0.6. The bipolar watermark is used, i.e., the watermark symbols take two values, -1 and 1. In spatial-domain watermarking, the size M of the block is set to 64 so that an image of size at least 128 × 128 can be protected by the composite scheme. The global watermark weighting parameter, α_s, is set to 0.1.
The maximum offset of the watermarked image from the original image is 9. In frequency-domain watermark detection, our goal is to achieve a false positive rate of 10^-8 or lower, where the correlation response is assumed to be a random variable with a normal distribution. Therefore, the threshold value is set to 5.61. The PSNR value of the composite watermarked image with respect to the original image is 38.3 dB. The resulting watermarked image is shown in Fig. 4.16(b).

Figure 4.16: (a) The original "Lena" and (b) the watermarked "Lena."

StirMark [59] provides a generic set of tools for the robustness test of image watermarking algorithms. We use this benchmark tool to test the resilience of the proposed composite watermarking system to various attacks. The attacks performed by StirMark can be classified into two categories. The first category is the noise-type attack, which includes filtering, color quantization and JPEG compression. The second category is the geometrical attack, which includes general image editing operations such as cropping, rotation and scaling, as well as the random geometrical distortion, which applies varying types of geometrical attacks to different regions of an image. Up to now, no satisfactory solution has been found to deal with the random geometrical distortion. We will focus on the noise-type and the generalized geometrical attacks in our experiments. Since most users will process the image and then save it in a compressed format, most experiments are done with JPEG compression.

1. JPEG Compression
First, we test the JPEG compression attacks on the watermarked image. Since the compression attack does not change the geometry of the image, we detect the watermark from the frequency domain directly. The JPEG quality factor of the tests
varies from 90 to 10. The experimental results are shown in Fig. 4.17. All correlation responses are well above the threshold value 5.61, which is indicated by a broken line. Therefore, our frequency-domain watermark survives JPEG compression quite well.

Figure 4.17: Robustness test of JPEG compression (correlation response versus JPEG quality factor).

2. Filtering
Next, we consider the filtering attack. StirMark performs six types of filtering attacks: 2×2, 3×3 and 4×4 median filters, the frequency mode Laplacian removal attack [3], 3×3 Gaussian filtering and 3×3 sharpening. The correlation responses in the above six cases are 18.66, 18.42, 17.22, 10.57, 19.14 and 23.38, respectively. We see that the embedded watermark is robust against these filtering processes, as expected.

3. Cropping
In most cases, image users are interested in the central part of the image. StirMark crops the images by removing 1, 2, 5, 10, 15, 20, 25, 50 and 75% of the borders. The correlation responses are shown in Table 4.1. As shown in the table, the watermark is successfully detected if the size of the remaining image is no smaller than a quarter of the original image. The watermark is not detected when 75% of the rows and columns are cropped. However, the commercial value of such a small image has been lost.

Width Cropped (%)   Width   Region Left (%)   Correlation Response
        0            512        100.00              20.29
        1            506         97.67              18.04
        2            501         95.75              14.55
        5            486         90.10              16.19
       10            460         80.72              14.89
       15            435         72.18              11.98
       20            409         63.81              10.99
       25            384         56.25              12.35
       50            256         25.00               8.08
       75            128          6.25               0.26

Table 4.1: The correlation responses of different cropping attacks.

4. Rotation and Scaling
The correlation responses of the rotation, scaling and rotation/scaling attacks are shown in Table 4.2. In rotation, the image is rotated by ±2°, ±1°, ±0.75°, ±0.5° and ±0.25°.
Some larger angles such as 5°, 10° and 15° are also tested. The correlation response remains high when a rotation of a smaller angle is applied. The correlation response drops a bit for the rotation of a larger angle since some portion of the image is cropped to preserve the central part of the image. For scaling attacks, the ratios 0.5, 0.75, 0.9, 1.1, 1.5 and 2.0 are tested. When the image is enlarged, the correlation response is close to the optimal value since we do not lose much image information when we recover the image by down-sizing. However, the watermark is not detected when the image is scaled to one half of its original size. The reason comes from the fact that the spatial-domain watermark is vulnerable to JPEG compression when the image is down-sampled. If JPEG compression is not applied, the watermark can still be detected without difficulty. The third attack is a combination of rotation, cropping and scaling. That is, the image is rotated, and the center is cropped and scaled to the original size. All the correlation responses are higher than the threshold value 5.61, which shows the capability of the proposed system to survive generalized geometrical modifications.

      Rotation            Rotation/Scaling          Scaling
  Angle    Response       Angle    Response     Ratio   Response
  -2.00°    13.15         -2.00°    15.07        0.5     -0.13
  -1.00°    17.53         -1.00°    18.11        0.75     9.72
  -0.75°    14.32         -0.75°    13.58        0.9     13.20
  -0.50°    14.71         -0.50°    16.80        1.1     18.95
  -0.25°    15.72         -0.25°    12.09        1.5     19.58
   0.25°    16.51          0.25°    12.38        2.0     19.20
   0.50°    13.91          0.50°    17.60         -        -
   0.75°    15.72          0.75°    14.08         -        -
   1.00°    14.99          1.00°    17.75         -        -
   2.00°    15.11          2.00°    14.45         -        -
   5.00°    10.61          5.00°    14.23         -        -
  10.00°    13.54         10.00°    14.24         -        -
  15.00°     9.05         15.00°    11.83         -        -

Table 4.2: Correlation responses of rotation, scaling and rotation/scaling attacks.

5.
Other Generalized Geometrical Transformations
Other generalized geometrical transformations such as shearing, change of the aspect ratio, column/row removal and linear transforms are also tested in the experiments. We show some typical examples of these geometrical attacks in Fig. 4.18. The correlation responses of Fig. 4.18(a), (b), (c) and (d) are 12.60, 18.32, 12.97 and 11.34, respectively. The watermark is unambiguously detected.

Figure 4.18: (a) Shearing the image by 5% in both horizontal and vertical directions, (b) the change of the aspect ratio by scaling the height by 0.8, (c) removing 17 rows and 5 columns, and (d) linear transformation with a00 = 1.007, a01 = -0.01, a10 = -0.01 and a11 = 1.012, where a00, a01, a10 and a11 are the four parameters of the affine matrix as defined in (4.21).

Finally, we apply extensive tests to obtain the PSNR value of watermarked images and perform the false positive analysis on unwatermarked images. Fig. 4.19(a) shows the histogram of the PSNR values of 4000 watermarked images, which are provided by Corel Corporation. The average PSNR is 38.9 dB, very close to the PSNR of the watermarked Lena image (38.3 dB). We see that most of the watermarked images have a high PSNR value in the range between 37 dB and 39 dB. The minimum and maximum PSNR values are 33.5 dB and 45.8 dB, respectively, which correspond to a noisy image (which allows more watermark energy to be embedded) and a smooth image (which allows less watermark energy in order to maintain perceptual quality). Fig. 4.19(b) shows the false positive rate analysis. We detect the watermark in 10,000 natural images that are not embedded with watermarks. The X-axis is the threshold for watermark detection.
If the absolute value of the correlation response calculated from (4.34) is larger than the threshold, the watermark is falsely detected. The Y-axis is the false positive rate. The '+' marks in Fig. 4.19(b) come from the experimental data, which are compared with the broken-line curve derived from the Q function. The dotted straight line indicates the false positive rate (10^-8) corresponding to the threshold 5.73 used in the experiment. We can see that the false positive rate is under our control, and it is expected that the very low false positive rate (10^-8) can be achieved in the proposed composite watermarking scheme.

Figure 4.19: Extensive tests for (a) the PSNR value (histogram over the peak signal-to-noise ratio in dB) and (b) the false positive rate analysis.

4.5 Perceptual Block-based DCT Watermarking Resisting Affine Attacks

As we mentioned earlier, the methodology of block-based watermarking is widely used because of its benefits of increased perceptual performance, larger payload and possible compatibility with the current image compression standard, JPEG. However, the vulnerability of the block-based scheme to affine attacks severely limits its scope of applications. In this section, we attempt to solve the synchronization problem of general block-based watermarking schemes by using grid embedding/detection. We illustrate the usage of grid signals with a perceptual, affine-invariant, block-based DCT watermarking scheme.

Before describing the proposed approach, we would like to address the issue of why we present one more scheme in another domain. We first discuss transform-based watermarking briefly. The Fourier transform, the block-based discrete cosine transform and the wavelet transform are the three most popular transforms used in transform-based or frequency-domain watermarking.
We compare the advantages and disadvantages of using these transforms as follows.

The Fourier transform outperforms the other two in terms of functionality. It is much easier to attain a digital watermark that survives geometrical attacks by using the Fourier transform because of its invariance properties. Our proposed composite scheme is a good example, as we chose the Fourier domain for watermark embedding/detection due to its shift invariance. However, the drawback of watermarking algorithms based on the Fourier transform is the lack of a perceptual measurement in a pure frequency domain. In our spatial-frequency composite watermarking scheme, the scaled magnitude of the Fourier coefficient is used as the watermark weighting factor, and spatial masking is applied afterwards to ensure that the embedded signals fulfill the visibility constraint. Without this postprocessing step, artifacts caused by the Fourier-domain watermark may appear, especially in flat regions. The requirement of postprocessing in the spatial domain not only increases the complexity of the system but may also affect the embedded Fourier-domain watermark to a certain degree. Therefore, the major concern in Fourier-based watermarking is the visibility of embedded watermarks.

Since the wavelet transform and the block-based DCT have good energy compaction capability, many image codecs, including JPEG and JPEG-2000, resort to one of the two transforms for efficient coding. Many watermarking schemes also make use of them to increase robustness against filtering and compression attacks. Certain efficiency can thus be attained by embedding the digital watermark in the same domain as compression, as we have done in the JPEG-2000 watermarking scheme.
Moreover, many perceptual models have been developed based on the block-based DCT or the wavelet transform for efficient quantization [2, 58, 91, 92]. Watermarking schemes utilizing the two transforms can achieve better watermark adaptation by using these existing visual models as well. More sophisticated visual models can help maintain a good balance between the requirements of robustness and invisibility. Nevertheless, geometrical correction is always a major problem for these transforms. For block-based DCT watermarking, the ability to recover the block position accurately is the key to correct watermark detection. This is, however, a difficult task. The watermarking scheme based on the wavelet transform has a similar drawback, as the transform is neither shift nor rotation invariant. Most previous research either ignored this problem or assumed that the original image is available, which severely limits the scope of applications.

Our goal is to develop a perceptual watermarking scheme resilient to geometrical attacks. To develop perceptual watermarking, the Fourier-based approach may not be appropriate, and wavelet-based and block-based watermarking are better choices. To make the watermark resist geometrical attacks, we exclude the wavelet transform, since cropping makes the wavelet coefficients totally different from their original values, and attacks such as rotation or scaling make the synchronization problem even worse. Therefore, we propose a scheme based on the block-based DCT to demonstrate the idea of grid embedding to compensate for the loss of synchronization in general block-based watermarking schemes, as well as to show its decent performance along with possible compatibility with JPEG.
We will first discuss Watson's visual model [91] in Section 4.5.1, which determines the Just Noticeable Difference (JND) of DCT coefficients and will be used to scale the embedded watermark energy to achieve perceptual watermarking. In Section 4.5.2, we demonstrate the processes of signal embedding and detection. We apply Watson's perceptual model in both the watermark and the grid signal embedding so that the modification of each DCT coefficient is limited by the JND to fulfill the invisibility constraint. Experimental results in Section 4.5.5 will be used to show the robustness of the proposed method.

4.5.1 Watson's Perceptual Model

Watson [91] proposed a perceptual model in the block DCT domain to achieve an image-adaptive quantization table. The goal is to quantize the image as coarsely as possible while maintaining the quality of the image. It is worthwhile to note that quantization and watermarking share some similarity. Quantization modifies the data to obtain a more compact representation, while watermarking changes the data to carry hidden information. Both modifications require that the introduced distortion not be perceived by the human eye. Therefore, it is reasonable to make use of the model in watermarking to achieve imperceptible watermark embedding and to obtain a robust scheme by maximizing the embedded watermark energy.

In Watson's perceptual model, the estimation of the visual threshold begins with an image-independent perceptual threshold proposed by Ahumada and Peterson [2, 58]. The visual threshold T_{i,j} associated with each DCT coefficient with frequency indices (i, j) is measured under various conditions and can accommodate varying display luminance, resolution, viewing distance, etc.
To be more specific, the log of T_{i,j} can be approximated by a parabola in log spatial frequency,

$$ \log(T_{i,j}) = \log(T_{\min}) + K \left( \log\sqrt{f_{i,0}^2 + f_{0,j}^2} - \log(f_{\min}) \right)^2, \qquad (4.35) $$

where T_min, K and f_min are functions of the total luminance. T_min is the luminance threshold at f_min, the frequency where the threshold is smallest, and K determines the steepness of the parabola. f_{i,0} and f_{0,j} are the vertical and horizontal spatial frequencies, respectively. For an N × N block, they can be expressed as

$$ f_{i,0} = \frac{i}{2 N w_i}, \qquad f_{0,j} = \frac{j}{2 N w_j}, \qquad (4.36) $$

where w_i and w_j are the vertical and horizontal widths of a pixel in degrees of visual angle.

Two masking effects are taken into account: the luminance masking and the contrast masking effects. Luminance masking refers to the dependency of the visual threshold on the mean luminance of the local image region. The luminance-adjusted threshold M^l_{i,j,k} is calculated by

$$ M^{l}_{i,j,k} = T_{i,j} \left( \frac{c_{0,0,k}}{c_{0,0}} \right)^{a_T}, \qquad (4.37) $$

where a_T controls the degree of luminance masking and a typical value of 0.65 is used. c_{0,0} is the average of the DC coefficients of the image or a nominal value of 1024, corresponding to gray-level 8-bit images, and c_{0,0,k} is the DC term of the DCT of block k.

Contrast masking indicates that the visibility of a visual pattern is reduced, i.e., its threshold is raised, in the presence of other patterns, particularly those of similar spatial frequency and orientation. The luminance-adjusted threshold is then adjusted for the component contrast via

$$ M^{c}_{i,j,k} = |c_{i,j,k}|^{q_{i,j}} \left( M^{l}_{i,j,k} \right)^{1 - q_{i,j}}, \qquad (4.38) $$

where c_{i,j,k} is the DCT coefficient and q_{i,j} is the exponent that controls the degree of contrast masking. Since the contrast-adjusted threshold aims to increase the visual threshold by considering the additional effect, the larger value between the contrast-adjusted threshold and the luminance-adjusted threshold is chosen as the final masked threshold,

$$ M_{i,j,k} = \mathrm{Max}\left[ M^{l}_{i,j,k},\; M^{c}_{i,j,k} \right]. \qquad (4.39) $$
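The two masking steps and the final threshold in (4.39) can be sketched in Python as follows. This is a minimal sketch that assumes the power-law forms of luminance and contrast masking from Watson's model [91]; the function names and the contrast exponent of 0.7 are illustrative assumptions rather than values taken from this work, and the base threshold t_ij is simply passed in instead of being computed from (4.35).

```python
def luminance_threshold(t_ij, dc_block, dc_mean=1024.0, a_t=0.65):
    """Luminance-adjusted threshold: brighter blocks tolerate larger changes."""
    return t_ij * (dc_block / dc_mean) ** a_t


def contrast_threshold(coeff, lum_thr, q_ij=0.7):
    """Contrast-adjusted threshold (q_ij = 0.7 is an assumed typical exponent)."""
    return abs(coeff) ** q_ij * lum_thr ** (1.0 - q_ij)


def jnd(t_ij, coeff, dc_block, dc_mean=1024.0, a_t=0.65, q_ij=0.7):
    """Final masked threshold: the larger of the two adjusted thresholds."""
    lum = luminance_threshold(t_ij, dc_block, dc_mean, a_t)
    con = contrast_threshold(coeff, lum, q_ij)
    return max(lum, con)


# Hypothetical values: base threshold 2.0 in a block of nominal brightness.
weak = jnd(2.0, coeff=0.5, dc_block=1024.0)     # weak coefficient, no contrast masking
strong = jnd(2.0, coeff=50.0, dc_block=1024.0)  # strong coefficient raises the threshold
print(weak, strong)
```

In this sketch a weak coefficient keeps the luminance-adjusted threshold, while a strong coefficient raises the threshold, so more signal energy can be hidden in busy regions.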
The final masked threshold is also called the Just Noticeable Difference (JND). Additive noise is assumed to be invisible if it is less than the JND. In this research, we view Watson's model as the visibility constraint, and any embedded signal should not violate this constraint, in order to ensure perceptual quality.

4.5.2 Structure of the Proposed Scheme

Our system block diagram is shown in Fig. 4.20. We embed a grid structure into the watermarked image to determine the applied geometrical modifications without explicitly resorting to the original image. Detection of the grid structure includes the autocorrelation and cross-correlation processes, which recover the image to its original scale, orientation and position. After the registration process that results from grid detection, the hidden information can be successfully determined from the registered image by watermark detection.

Figure 4.20: The watermarking scheme with grid embedding and detection (blocks: block-based watermark and grid structure embedding; attacks by geometrical transformation, filtering and compression; grid structure extraction; recovering scale and orientation; translation and pivot determination; block-based watermark detection).

4.5.3 Signal Embedding

The signal embedding processes are shown in Fig. 4.21. In the proposed block-based scheme, two signals are embedded in the image. One is the watermark signal and the other is the grid signal, corresponding to the frequency-domain watermark and the spatial-domain watermark, respectively, in the composite watermarking scheme presented in Section 4.4. We do not call the grid signal the spatial-domain watermark here, since the grid signal is shaped in the block DCT domain, as explained later.
Figure 4.21: Signal embedding of the proposed block-based scheme (blocks: 8×8 DCT of the original image, watermark embedding with the watermark sequence and the JND, 8×8 DCT of the grid structure, grid signal shaping, and 8×8 IDCT producing the watermarked image).

In the composite watermarking scheme, the spatial-domain watermark is embedded in the image that has already been embedded with the frequency-domain watermark. The detection of the frequency-domain watermark is more or less interfered with by the existence of the other embedded signal. The situation is not so severe for Fourier-domain watermarks, since the global Fourier transform is considered to be robust against additive noise. However, in the block-based watermarking scenario, we should be more cautious in doing so. Some block-based DCT watermarking schemes rely on more accurate detection, and grid signal embedding in the spatial domain without special attention may not be acceptable. To make
It should be noted that some of the low-frequency coefficients are also left for grid signal embedding so that the grid signal can have reasonable resistance against the compression attack. We can view the process of classification as dividing the communication channel into the "watermarking channel" and the "synchronization channel." The interference between the two channels is minimized by doing so.

Figure 4.22: The watermark is embedded in the shaded positions while the grid signal exists only in the other positions.

Again, the watermark signal is a pseudo-random sequence, which should be kept confidential so that the security of the watermarking system can be further ensured. For a selected coefficient c_{i,j,k}, the corresponding watermarked coefficient c'_{i,j,k} is formed by

$$ c'_{i,j,k} = c_{i,j,k} + M_{i,j,k} \times w_{i,j,k}, \qquad (4.40) $$

where M_{i,j,k} is the JND determined in (4.39) to control the watermark energy and w_{i,j,k} is the corresponding watermark symbol taking the values 1 and -1, so that the amount of modification is equal to the JND. The embedding method is very similar to the one proposed by Podilchuk et al. [62].

Special care must be taken to deal with the cropping attack. As mentioned before, cropping can cause a serious problem for block-based watermark detection. If the watermark sequence is embedded from the beginning of the image (upper-left corner) to the end of the image (lower-right corner), removing a small portion of the image on the four sides will result in synchronization loss, and detection of the embedded watermark will fail. A simple solution would be to repeatedly embed a shorter watermark sequence into several parts of the image. However, this may limit the robustness and capacity of the watermark. Besides, some search algorithms are still needed to cope with the trivial cropping attack.
A low-complexity method is thus required to make the watermark resist the simple cropping attack. It should be noted that cropping is usually done on the sides of an image. In a photograph, the central portion usually represents the Region of Interest (ROI), which is of more importance. It is less likely that the ROI will be removed, since the value of the photograph would be reduced by doing so. Therefore, the watermark should be embedded in the region where cropping is unlikely to be applied. We therefore choose a point near the center of the image as the "pivot." The watermark sequence is embedded starting from the pivot in an inside-out manner, as shown in Fig. 4.23. If the position of the pivot can be accurately determined by both the watermark embedder and the watermark detector, multiple-bit watermarking can be achieved easily, and robustness and security can also be increased.

Figure 4.23: Embedding of the watermark sequence. "X" represents the pivot. The watermark sequence is embedded starting from the pivot in an inside-out manner.

Instead of using a simple Laplacian filter as an activity measurement, we utilize the JND value decided by Watson's visual model as our absolute visibility criterion for both watermark embedding and grid signal embedding in this block-based DCT watermarking scheme. In other words, the weighting of the grid signal is also done in the block DCT domain. A practical solution is to multiply the grid signal by a value S, which results in a strong grid signal. The exaggeratedly scaled grid signal is then shaped in the DCT domain to ensure its invisibility using Watson's model. We calculate the 8 × 8 DCT of the grid signal.
If the coefficient c_{i,j,k} is allowed to be modified via grid embedding, i.e., it is located outside the shaded positions shown in Fig. 4.22, the coefficient after grid embedding, c'_{i,j,k}, will be

$$ c'_{i,j,k} = c_{i,j,k} + \mathrm{sign}(g_{i,j,k}) \times \mathrm{Min}\left( |g_{i,j,k}|,\; M_{i,j,k} \right), \qquad (4.41) $$

where g_{i,j,k} is the DCT coefficient of the grid signal. Thus, we can guarantee that the embedding of the grid signal will not introduce visible artifacts, since the modification is within the JND.

To help determine the pivot point, which is used to indicate the start of the watermark sequence, we intentionally choose a different pseudo-random pattern and embed it into the region covering the pivot, as shown in Fig. 4.21. This pattern can be totally different from the other patterns, or simply the negation of the other patterns that construct the grid structure. Due to the different characteristic of this pattern, the detector can identify this region and find the pivot for subsequent detection of the watermark sequence.

4.5.4 Signal Detection

The signal detection processes include the detection of the grid signal and of the watermark signal. The autocorrelation and cross-correlation procedures are the two important tools for grid detection. In short, autocorrelation helps determine the affine matrix for recovering the image to its original orientation and scale, while cross-correlation is used to measure the translational offset in a grid. The grid covering the pivot is determined by calculating the correlation once more. The whole self-registration process is then completed.

Now, we are ready to present the details of the detection processes. We make use of the Wiener filtering method described in Section 4.4.3 to extract the grid signal because of its decent performance.
The autocorrelation process is applied to the extracted grid signal in the same way as in the composite scheme to determine the four parameters of the affine matrix, i.e., a00, a01, a10 and a11 in Equation (4.21), for the inverse affine transformation if necessary.

After the image is transformed back to its original orientation and scale by applying the inverse affine matrix, we have to determine the spatial translation. We do not measure x_s and y_s in Equation (4.21) explicitly, since cropping can be done by any amount and the size of the original image is not supposed to be known by the watermark detector. But the horizontal and vertical shifts within a grid can be determined. That is, we can express the horizontal and vertical shifts as β_x × M + x_b and β_y × M + y_b, respectively. Although the exact values of β_x and β_y are not known, we can at least verify the correct coordinates of the grid structure. To determine x_b and y_b, we fold up the extracted signal, which has been geometrically adjusted by the inverse affine matrix, into a folded grid of size M × M. That is, signal values are summed up if they are at the same position in the grid. We then calculate the cross-correlation of the folded grid and the embedded pattern, which is assumed to be known to the detector. Again, the cross-correlation can be calculated efficiently by using the DFT. Fig. 4.24 shows the cross-correlation of the folded grid and the grid pattern. Fig. 4.24(a) is the case of the watermarked image without cropping, while Fig. 4.24(b) is the case of the cropped image. The horizontal and vertical shifts of the correlation peak reveal the values of x_b and y_b.

Figure 4.24: The cross-correlation function of the folded grid and the embedded pattern in (a) the watermarked image without cropping and (b) the watermarked image with cropping.
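The folding and cross-correlation steps can be illustrated with a small Python sketch. The grid size, the random bipolar pattern, the noise level and the crop offsets below are hypothetical, and the circular cross-correlation is evaluated by a direct sum rather than by the DFT in order to keep the example self-contained. The sign convention that maps the peak position to (x_b, y_b) is likewise an assumption of this sketch.

```python
import random


def fold(signal, m):
    """Sum all samples that share the same position modulo m (the folded grid)."""
    grid = [[0.0] * m for _ in range(m)]
    for y, row in enumerate(signal):
        for x, value in enumerate(row):
            grid[y % m][x % m] += value
    return grid


def grid_offset(folded, pattern):
    """Return ((x_b, y_b), peak) maximizing the circular cross-correlation."""
    m = len(pattern)
    best_shift, best_value = (0, 0), float("-inf")
    for dy in range(m):
        for dx in range(m):
            s = sum(folded[y][x] * pattern[(y + dy) % m][(x + dx) % m]
                    for y in range(m) for x in range(m))
            if s > best_value:
                best_shift, best_value = (dx, dy), s
    return best_shift, best_value


# Toy data: an 8 x 8 pattern tiled over a noisy 48 x 48 signal, then cropped.
rng = random.Random(1)
M = 8
pattern = [[rng.choice((-1.0, 1.0)) for _ in range(M)] for _ in range(M)]
tiled = [[pattern[y % M][x % M] + rng.gauss(0.0, 1.0) for x in range(48)]
         for y in range(48)]

x0, y0 = 3, 5  # hypothetical number of columns and rows removed by cropping
cropped = [row[x0:x0 + 32] for row in tiled[y0:y0 + 32]]

(x_b, y_b), peak = grid_offset(fold(cropped, M), pattern)
print("estimated offset in the grid:", x_b, y_b)
```

Folding accumulates every repetition of the pattern at the same grid position, so the pattern energy grows with the number of grids while the noise adds incoherently. This is why the correlation peak stands out even when each individual grid is heavily disturbed.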
With the help of the affine matrix and translational offset parameters, the grid structure of the investigated image can be matched with that in the unattacked watermarked image. The final step of the self-registration process is to determine the pivot. Identification of the pivot is also achieved by the correlation method. We calculate the correlation between each $M \times M$ grid and the pseudo-random pattern. As described earlier, the grid that contains the pivot is different from the others, so the correlation value can clearly indicate which grid contains the pivot. For example, if the unique grid is embedded with a negative grid pattern, we will obtain a negative correlation response in this grid while correlation responses in other grids are much larger than zero. Synchronization is then fully recovered, and we can apply the block-based watermark detection to uncover the hidden information.

To determine the existence of a watermark, the normalized correlation response is used as the confidence measurement, which is calculated via

$$P = \sum_{k} \sum_{(i,j) \in S} c^{*}_{i,j,k} \times w_{i,j,k} \tag{4.42}$$

where $S$ is the set of selected coefficients for watermarking in a DCT block, $c^{*}_{i,j,k}$ is the DCT coefficient of the investigated image and $w_{i,j,k}$ is the test watermark sequence. Watermark decoding can also be achieved using the matched filter, as coefficients or blocks are divided into sets, in each of which a watermark bit is embedded. The sign of the correlation response in each set represents the decoded bit.

A more sophisticated detection structure takes the distribution of DCT coefficients into account. We assume that the DCT coefficients (excluding the DC term) can be reasonably modeled by a generalized Gaussian distribution. The probability density function can be expressed as

$$f(c) = A e^{-|\beta c|^{\gamma}} \tag{4.43}$$

where

$$\beta = \frac{1}{\sigma} \sqrt{\frac{\Gamma(3/\gamma)}{\Gamma(1/\gamma)}} \tag{4.44}$$

$$A = \frac{\beta \gamma}{2 \Gamma(1/\gamma)} \tag{4.45}$$

and $\gamma$ is the shape parameter and $\sigma$ is the standard deviation. If the distribution of DCT coefficients is modeled correctly, watermark detection may be improved by using the maximum likelihood detection. It should be noted that the matched filter is a special case of the maximum likelihood detection.

The selection of the shape parameter $\gamma$ could have a certain impact on the performance of the watermark decoder. There are two methods to determine the value of $\gamma$. The first method is to estimate the shape parameter from a set of data based on the first- and second-order moments. Let $\rho$ be

$$\rho = \frac{\sigma^{2}}{E^{2}[|c - \mu|]} \tag{4.46}$$

where $E[\cdot]$ is the expectation and $\mu$ is the mean of the set of DCT coefficients. The shape parameter $\gamma$ for the distribution of DCT coefficients can be found by solving

$$\frac{\Gamma(1/\gamma)\,\Gamma(3/\gamma)}{\Gamma^{2}(2/\gamma)} = \rho \tag{4.47}$$

Equation (4.47) can be solved by a lookup table that is generated by letting $\gamma$ vary in small steps over the range of values that could possibly be expected for this parameter. However, we found that this method could underestimate the shape parameter. The reason is that the maximum likelihood detection is based on the assumption of the generalized Gaussian distribution. The assumption may fail when the image is complex and many outliers with large values appear. The assumption could be valid if we ignored those large coefficients for watermark detection. However, in our watermark detection, we take contrast masking into consideration as shown in Equation (4.38). It is clear that the masking is directly related to the value of the coefficient. That is, we embed more watermark energy in stronger coefficients, which act as the interference to the watermark. The watermark hidden in those coefficients could be more reliable than in others.
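The lookup-table solution of Equation (4.47) mentioned above can be sketched as follows. This is a minimal illustration, assuming the standard moment-matching ratio for the generalized Gaussian distribution; the table range (0.3 to 3.0), the step of 0.01 and the function names are hypothetical choices rather than values taken from the text.

```python
import math
import numpy as np

def ggd_ratio(gamma):
    """Gamma-function ratio on the left-hand side of Equation (4.47)."""
    return (math.gamma(1.0 / gamma) * math.gamma(3.0 / gamma)
            / math.gamma(2.0 / gamma) ** 2)

# Lookup table: let the shape parameter vary in small steps (assumed range and step).
GAMMAS = np.arange(0.3, 3.001, 0.01)
RATIOS = np.array([ggd_ratio(g) for g in GAMMAS])

def estimate_shape(coeffs):
    """Estimate the shape parameter from first- and second-order sample moments."""
    c = np.asarray(coeffs, dtype=float).ravel()
    mu = c.mean()
    rho = c.var() / np.mean(np.abs(c - mu)) ** 2   # sample version of Equation (4.46)
    return float(GAMMAS[np.argmin(np.abs(RATIOS - rho))])

# Sanity check on synthetic data: Gaussian samples should give a value near 2,
# Laplacian samples a value near 1.
rng = np.random.default_rng(0)
gamma_gauss = estimate_shape(rng.normal(size=200_000))
gamma_laplace = estimate_shape(rng.laplace(size=200_000))
```

The ratio equals π/2 for a Gaussian (shape 2) and 2 for a Laplacian (shape 1), which gives a quick way to check any table built this way.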
Since coefficients of a larger value cannot be modeled well, the matched filter or the correlation detector will be the best method to detect the existence of the watermark. In this case, $\gamma$ should be set to 2 and the maximum likelihood detector degenerates to the matched filter. In this scenario, the parameter could be very difficult to predict to achieve the optimum result. Here comes the second solution. We select a small portion of coefficients as “probing coefficients” and embed them with a known watermark. These probing coefficients are used to detect the very existence of the watermark by Equation (4.42), and the response is compared with the threshold determined by the Q function to control the false alarm rate. At the same time, the probing coefficients help select the best shape parameter for the investigated image for best watermark decoding afterwards.

4.5.5 Experimental Results

In the experiment, we assume that one-bit information is embedded in the image. Multiple-bit watermarking can be achieved easily by block allocation when perfect synchronization is reached. Again, the size of the grid pattern $M$ is set to 64. The threshold value is set to 5.61, which corresponds to a false positive rate of $10^{-8}$. The Lena image of size 512 × 512 is used for the test. The viewing distance for the invisibility test is equal to 3.56 multiplied by the picture size, which results in 32 pixels per degree of visual angle. The PSNR value of the watermarked image with respect to the original image is 37 dB. The original and watermarked Lena images are shown in Fig. 4.25. First of all, we would like to test the robustness of the watermarked image under cropping attacks. We randomly cropped a region of size 400 × 400 from the watermarked image and then compressed it by JPEG with a quality factor varying from 90 to 10.
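The relation between the threshold 5.61 and the false positive rate of $10^{-8}$ quoted above can be checked with a short sketch. It assumes that, in the absence of a watermark, the normalized correlation response behaves as a zero-mean, unit-variance Gaussian variable; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def q_function(x):
    """Upper-tail probability Q(x) of a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def threshold_for(false_positive_rate):
    """Threshold T such that Q(T) equals the requested false positive rate."""
    # By symmetry of the normal distribution, Q^{-1}(p) = -Phi^{-1}(p).
    return -NormalDist().inv_cdf(false_positive_rate)

T = threshold_for(1e-8)      # close to the 5.61 used in the experiments
p_fa = q_function(5.61)      # close to 1e-8
```

Under the same assumption, a different target false positive rate can be mapped to its threshold by calling `threshold_for` with that rate.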
Figure 4.25: (a) The original Lena image and (b) the watermarked Lena image with a PSNR value of 37 dB.

The cropping operation results in non-zero values of translation, i.e., $x_b \neq 0$ and $y_b \neq 0$. As shown in Fig. 4.26(a), all the correlation responses are above the threshold value 5.61, which is represented by a dotted line. The result demonstrates the resilience of the watermarking scheme against the combination of cropping and JPEG compression. Next, we cropped the central part of the images by removing 2, 5, 10, 15, 20, 30, 40, 50, 60 and 75% of the borders and compressed the resulting images by JPEG with a high quality factor (90) to see the effect of the pure cropping attack on the watermarked images. Results are shown in Fig. 4.26(b). Cropping causes information loss and the correlation response declines as the size of the remaining image becomes smaller. However, we see that correlation responses remain high when the remaining image is suitably large and the value of the image is preserved. Here, it is assumed that the region that contains the pivot is not cropped since it lies in the ROI of an image.

Figure 4.26: The robustness test of cropping. Horizontal axes: (a) JPEG quality factor; (b) remaining image after cropping (%).

One major concern is the precision issue. This is especially true when the image is rotated and scaled, since watermark detection usually requires very good precision, i.e., perfect synchronization. Thus, we performed extensive tests on rotation and scaling to show the applicability of the proposed watermarking system. In scaling attacks, we scaled the image from 512 × 512 to 400 × 400 and changed the width and height of the image by 2 each time. The correlation responses are shown in Fig. 4.27(a).
It is clear that the watermark survives well under scaling attacks and the lack of perfect precision does not cause a catastrophic result, but makes the correlation response fluctuate a bit. The information loss in down-sampling gives the correlation response a tendency to decline. In rotation attacks, we rotated the image in steps of 0.5° from -12° to 12°. The result is shown in Fig. 4.27(b). By comparing these two figures, it appears that the system is more sensitive to rotation than to scaling. Rotation does not introduce much information loss, but the synchronization loss resulting from the rotation process is more difficult to recover. Although the result shows that the watermark can still be detected, we believe that the performance can be improved by taking more situations into account to increase the precision of peak determination. This will, however, increase the complexity of the watermark detector as well.

Figure 4.27: Robustness test of scaling and rotation. Horizontal axes: (a) side of the scaled image; (b) rotation angle (in degrees).

4.6 Comments on Grid Signal Embedding/Detection

4.6.1 Other Applications of Grid Embedding

The embedded grid signal can help recover the geometrically modified image back to its original scale and orientation for watermark detection. This geometrical correction functionality in digital images can also be of help to other applications. One possible usage of the grid signal is to assist image registration in database management. Given a picture of interest, we may need to search for relevant images in the database so that we can dig out images with similar characteristics from the database, or we can classify this picture and add it to the group of images of the same type for efficient archiving.
In the remote sensing application, we may need to compare an airborne image with archived images to help us extract and understand useful information. Radar images can be used jointly with a map database for map updating, improved analysis and determining sensor platform positions. Since the volume of images in an image database is usually very large, it is inappropriate to involve manual effort in the processes of comparison and matching. Those processes should be done in an automatic way through computing. The automated matching of an image to a database is achieved in an iterative process, where we start with an appropriate sensor position, search for corresponding points and compute refinements to the sensor position/attitude [47]. Correlation, edge-based matching, region-based matching and feature extraction, etc. are common tools for image registration.

However, images are usually taken at different times or in varying situations such as different distances, directions, viewing angles, foci and light conditions, etc. The phenomenon is more manifest in airborne images. For example, the two aerial pictures shown in Fig. 4.28(a) and (b) are taken from the same scene with a different direction and height. The trivial differences in scaling and rotation increase the computing burden of image registration since the comparison may have to be done under different scaling, rotation and processing. It is clear that if the two images can be aligned to the same scale and orientation, the complexity of the matching process can be significantly reduced.

Figure 4.28: Aerial images taken from the same scene with different directions and heights.

A possible solution is delineated as follows.
Before the image is taken, the direction and the height of the sensor are measured. Next, the grid signal is generated, resized and scaled according to the height and direction measured earlier. The size of the grid should be set corresponding to a predetermined height of the sensor. The grid should be pointed in a specific direction (north or south), and a rectangular grid may be more suitable in this scenario. After the picture is taken, the grid signal is weighted by taking the human perceptual model into account and embedded into the airborne picture. It should be noted that the energy of the grid signal might not need to be as strong as required in the watermarking application since we do not expect much malicious attack applied to the image. However, a reasonable degree of robustness is still needed to cope with possible transcoding and/or processing. For an airborne image that will be matched with images in the database, the grid signal is first extracted and the autocorrelation function is calculated. After scaling and rotating the image by comparing the peaks of the autocorrelation function of the extracted grid signal with the known constellation, we can then apply the existing methods to extract features and match them with those of the image stored in the database. To sum up, with the objects aligned in the same direction and scale with the help of grid embedding/detection, the comparison and matching process can be done in a much easier manner.

4.6.2 Robustness against Stirmark Random Geometrical Distortion

The random geometrical distortion offered in Stirmark causes the most severe damage to digital image watermarking schemes. To further understand how Stirmark's random geometrical distortion affects the image, we apply the grid to the Lena image and compare it with its randomly distorted version as shown in Figs.
4.29(a) and (b). We can see that each block is slightly stretched, sheared, shifted, bent and rotated by a small and different random amount. As described earlier, since different operations are applied to different regions of the image, the geometrical manipulation cannot be described by a single affine matrix and the autocorrelation method fails to determine the grid structure for the inverse transformation.

Figure 4.29: (a) Lena with the grid and (b) Lena with the grid after Stirmark random geometrical distortion.

However, with repetitive grid pattern embedding, we have a better chance to restore the image back to its original shape. The idea is to slightly distort the grid pattern in various ways and match it with the extracted signal so that the most probable local geometrical modification can be identified. The inverse operation can then be applied for reconstructing the local region. It is apparent that the method significantly increases the complexity of watermark detection, and the false positive detection rate could be raised as well. Besides, the block size should be chosen carefully. If the block size is too large, we may not be able to reconstruct the image well and detection of the hidden watermark may still fail. However, if the block size is too small, the ability of the grid signal to resist generalized affine modifications may be decreased. The satisfactory reconstruction of the image with small grids is a problem to be solved too.

4.6.3 The Limit of Grid Embedding for Digital Watermarking

Although grid signals can help detect the geometrical transformation so that the distorted image can be recovered, periodic embedding of the grid pattern has a major drawback.
For a 512 × 512 image with a 64 × 64 grid signal repeatedly embedded, there exist peaks appearing every 8 rows and columns in the Fourier spectrum of the image. This phenomenon hints that the grid signal is vulnerable to so-called template attacks [29], in which the peaks shown in the Fourier domain are removed and geometrical modifications are applied to the image afterwards. Besides, these peaks raise certain security concerns since they may reveal the existence of grid signals, and the attacker may thus have enough knowledge about the repetitive grid pattern to invert the process and erase the embedded grid. After the grid signal is removed, the ability to combat affine attacks is compromised. In the spatial-frequency composite watermarking scheme, a successful attack has to include geometrical attacks such as scaling or rotation to introduce the synchronization problem. In the perceptual block-based scheme, the detector may not be able to locate the starting point of the embedded watermark, i.e., the pivot, since the pattern covering the pivot is removed. The subsequent watermark detection task will become more difficult.

In fact, this drawback is basically a problem of trade-off. The explicit peaks reveal the existence of the grid signal. However, if we try to hide these peaks to enhance the security of the watermarking scheme, we are making the hidden watermark more vulnerable to geometrical attacks as well. Without the original image at hand, the watermark detector and attackers may have a similar amount of knowledge about the hidden information when the image has been geometrically modified. One possible solution is to segment the image by using feature points. These feature points should be detected reliably even after geometrical or filtering attacks.
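The spectral peaks noted above, appearing every 8 rows and columns for a 64 × 64 pattern tiled over a 512 × 512 image, can be checked numerically with the following sketch. It works on the bare grid signal without a host image, and the per-tile random sign modulation is only a hypothetical stand-in for the feature-point-based grouping discussed in the text; the helper name `off_lattice_fraction` is likewise an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 64, 512
reps = N // M                                   # 8 tiles per side
pattern = rng.choice([-1.0, 1.0], size=(M, M))

def off_lattice_fraction(signal, step):
    """Fraction of spectral energy at frequencies that are not multiples of `step`."""
    power = np.abs(np.fft.fft2(signal)) ** 2
    on_lattice = np.zeros(signal.shape, dtype=bool)
    on_lattice[::step, ::step] = True
    return float(power[~on_lattice].sum() / power.sum())

# Periodic embedding: all energy sits on the lattice of every 8th row and column.
periodic = np.tile(pattern, (reps, reps))
periodic_off = off_lattice_fraction(periodic, N // M)

# Sign-modulated embedding: each tile is multiplied by +1 or -1 (random here).
signs = rng.choice([-1.0, 1.0], size=(reps, reps))
modulated = np.kron(signs, pattern)
modulated_off = off_lattice_fraction(modulated, N // M)
```

In this toy setting `periodic_off` is zero up to rounding error, whereas most of the energy of the sign-modulated signal moves off the lattice, which illustrates why breaking the periodicity hides the peaks.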
As the image is segmented into several parts, we separate these parts into two groups. For grids located within one group, the grid pattern is positively modulated. For grids in the other group, the grid pattern is negatively modulated. By doing so, we may break the periodic nature of the grid signals so that the Fourier spectrum will not show those additive peaks. The watermark detector will extract the embedded signal and segment the image in the same way as the watermark embedder does. Then, the detector negates the extracted grid signal in the group in which the grid pattern is negatively modulated in the embedder. Then we should be able to determine the possible affine distortion by the autocorrelation process described before. It is apparent that a reliable image segmentation algorithm is the key to this methodology. Besides, the watermark embedder and detector must have the same way to separate the segmented parts into positively modulated and negatively modulated groups efficiently. Otherwise, a further synchronization problem may arise due to this segmentation process.

4.7 Conclusion

Geometrical attacks cause a great deal of trouble to digital image watermarking. We tackled the synchronization problem resulting from geometrical modifications via structural grid signal embedding and detection to attain affine-invariant watermarking schemes. We first developed a spatial-frequency composite watermarking method, in which two watermarks are embedded into two domains separately in a sophisticated way to achieve the balance between imperceptibility and robustness. The embedded spatial-domain watermark, i.e., the grid signal, is used to accurately recover an attacked image to its original orientation and scale. The frequency-domain watermark is responsible for carrying the watermark payload.
Since the frequency-domain watermark is translation-resilient, the embedded information can be correctly determined from the possibly cropped or shifted version of the watermarked image. Next, we extended the idea of grid embedding/detection to block-based watermarking, which is commonly adopted in many existing algorithms but extremely vulnerable to geometrical attacks. With the help of the grid signal and using the autocorrelation and cross-correlation, we can determine the affine matrix and translation to recover the modified image by inverse transformation. The block-based DCT watermark detector can then detect the watermark from the geometrically corrected image. Besides, it should be noted that grid embedding/detection should be helpful to many block-based watermarking schemes, which do not necessarily work in the DCT domain. Some comments on grid signal embedding/detection were given at the end to show its advantages and limitations.

Before ending this chapter, we would like to compare the two proposed affine-invariant schemes. Although both schemes achieve robustness against generalized geometrical attacks by adopting the similar methodology of structural grid signal embedding/detection, there exists a subtle difference because of the different domains in which the watermarking processes operate. The composite scheme employs a global Fourier transform for watermarking, which attains shift-invariance and better robustness at the cost of inferior visual performance. The block-based scheme makes use of a formal visual model to fulfill the constraint of invisibility. The interference between the watermark and the grid signal is smaller in the block-based scheme as well.
However, compared with the composite scheme, the block-based scheme is less resilient to geometrical attacks since a more precise synchronization is required for separating the image into blocks. Besides, successful watermark detection relies on an accurate determination of the pivot point, which increases the difficulty of the self-registration process.

Chapter 5 Future Work and Conclusion

5.1 Future Work

Before concluding the dissertation, we would like to describe some possible future extensions of the current research.

5.1.1 System Refinement of the Proposed Algorithms

Some components in the proposed algorithms may need further improvements. We examine them in the following, and consider possible solutions and strategies to be adopted in the future.

1. Affine-invariant digital image watermarking

One concern of the affine-invariant digital watermarking schemes is the robustness against the attack obtained by combining geometrical transformations and lossy compression of a higher ratio. For the composite watermarking scheme, the hidden watermark cannot be detected if the image is down-sampled to less than one fourth of its original size and compressed with JPEG at a mild ratio. One possible solution is to improve the frequency-domain watermark using the log-polar mapping. It is expected that the resulting watermark can be detected if the image is purely rotated and scaled. This part can be implemented separately since the frequency-domain watermark should not severely conflict with the spatial-domain watermarking scheme. The drawbacks are that the computational load is increased and that the information loss during the mapping process has to be handled.
Besides, some numerical values in the design of the proposed schemes have not yet been determined exactly. Although the experiments have demonstrated their decent performance, we believe that the result can be further improved if certain parameters can be assigned more accurately with more attacks and images being tested.

2. Steganography in JPEG-2000

For the steganographic scheme developed under the framework of the JPEG-2000 standard, a more rigorous steganalysis may be necessary to ensure the security of the proposed system. However, the purpose of steganalysis is not to uncover the content of the secret message but to determine the very existence of the hidden information. Therefore, unlike cryptanalysis, which can be done in a mathematical way, steganalysis may only be carried out by careful examinations and repeated tests. It should be noted that the security of the steganographic scheme still relies heavily on techniques of cryptography, as the hidden information is usually encrypted before being embedded.

5.1.2 Development of Video Watermarking

Video watermarking techniques have a broader scope of applications than image watermarking. The main reason is that entertainment products such as TV news items, show programs and movies are of high value. Their wide distribution to the consumer market makes the content vulnerable to intellectual property right violation. Besides, video stored in the digital format may replace the videotape in the near future, since digital video presents an enormous improvement in video quality. Consequently, the video watermarking technique can be integrated with the encryption technology as a copy control tool to deter illegal reproduction. Although many video watermarking schemes have been proposed, there is still some room for improvement.
At first thought, there is not much difference between video and image watermarking since we can view video as an image sequence, so that watermark embedding and detection can be performed on a frame-by-frame basis. However, some specific issues of video watermarking have to be taken care of.

1. Complexity

One of the major concerns of video watermarking is its complexity. It is usually required that the watermark can be detected in real time. If the watermark is not detected from the investigated video, the detector has to keep detecting the watermark without affecting the performance of video playback or recording. Once an unmatched watermark is detected, the video player or recorder will stop its operation since the investigated video may be an illegal copy. Thus, the watermark detector should have a moderate complexity so that the detection can be done efficiently. Besides, the amount of watermark payload is directly related to the complexity of implementation. Ideally, we can increase the watermark payload by testing more orthogonal watermark patterns in a video clip of interest. However, the time of the matching process will increase accordingly. The adequate amount of watermark detection has to be justified by empirical results.

2. Temporal Artifact

In terms of visibility, one of the most distinct differences between image and video watermarking schemes is the temporal artifact. When only one frame is viewed, the watermark may not look obtrusive to the human eye. However, its visual appearance in moving pictures can be quite annoying. For example, it has been found that if the watermark is embedded into the wavelet coefficients of each frame, the ringing artifact may become more visible. Therefore, special attention has to be paid to the temporal artifact in video watermarking.

3. False Positive Detection

The volume of video data is quite large. This large volume of data adds more flexibility to watermarking. For example, the watermark can be embedded frame by frame or only in some frames selected by a secret key, etc. However, the probability of false positive detection will also increase if the design and analysis of watermark detection are not done carefully. As discussed earlier, false positive detection will cause a lot of inconvenience to legitimate users, so the false positive analysis should be more precise in video watermarking.

4. Video Watermark Attack

Some video watermarking attacks such as frame dropping/swapping or rate changing must be considered. These attacks can be viewed as geometrical modifications in the temporal domain. Besides, robustness against higher-ratio compression is also an important issue. Increased robustness of the system against scaling and the change of the aspect ratio is still necessary since these processes are quite common in video editing or playing. However, the mechanism designed to resist these attacks may increase the computational complexity as well.

5.1.3 Universal Watermark Detector

Many watermarking schemes have been proposed and each of them may have various and unique characteristics. An interesting issue is to develop a universal watermark detector, which is used to determine the existence of any hidden signal in an image of interest without much information about the hidden data. One of the applications of this research is to approximately estimate the number of watermarked images that are circulated in the network. This measurement may help to evaluate the potential of watermarking research and the demand for copyright protection of digital images more accurately. The other application is for the national security purpose.
A universal watermark detector may help to screen out suspected images, in which certain groups of people may embed secret information for covert communication. As discussed before, different watermarking schemes may adopt varying methodologies, such as the spread spectrum or quantization, or choose different transforms. All of these make developing a universal watermark detector a very challenging task. However, if we focus on detecting the existence of a robust watermark that can resist generalized geometrical attacks or a large volume of secret data transmitted by steganographic schemes, such a universal watermark detector may exist. For example, as many schemes make use of periodical signals or embed artificial peaks in the Fourier domain to resist generalized geometrical attacks, we may examine these signals in the Fourier domain to possibly detect the existence of a hidden watermark used for copyright protection. Besides, embedding a robust watermark or a large amount of data may unusually modify the statistics of the host signal. Therefore, there may exist certain analytical or statistical ways to differentiate clean images from images embedded with certain hidden information.

5.2 Conclusion

In this dissertation, we explored the field of information hiding in digital images, an innovative idea of invisibly embedding additional information into digital images for several interesting applications. We focused on two important issues, i.e., digital watermarking/covert communication in the upcoming image standard, JPEG-2000, and the robustness of a digital watermark against geometrical attacks. We provided practical solutions to these challenging problems and discussed the feasibility of the proposed schemes via theoretical analysis and experimental support. In the pioneering research of JPEG-2000 information hiding, a robust watermarking scheme was developed for the copyright protection purpose. The watermarking and the compression procedures are combined in a sophisticated way to achieve efficiency and features such as progressive watermark detection and ROI watermarking. A steganographic scheme was then designed to reliably and secretly transmit high-volume information in JPEG-2000 compressed images for covert communication. In the development of digital watermarking schemes towards affine-invariance, we proposed to embed structural grid signals to solve the synchronization problem in blind watermark detection. By using the methodology of grid signal embedding/detection, we presented two affine-invariant schemes and their superior performances were demonstrated.

Building a comprehensive multimedia infrastructure relies on a well-designed security framework. In our opinion, research on information hiding should be of tremendous help in constructing a better multimedia system. The research of digital watermarking may provide a final defense line for protecting intellectual property rights. The study of steganography should urge researchers to pay extra attention to security-related issues. We hope that the advance of digital technologies brings convenience to our modern life and that our research can help move in this direction.

Reference List

[1] M. D. Adams, “The JPEG-2000 still image compression standard,” Tech. Rep., ISO/IEC JTC 1/SC 29/WG 1, Sep. 2001.
[2] A. J. Ahumada and H. A. Peterson, “Luminance-model-based DCT quantization for color image compression,” in Proc. SPIE, San Jose, CA, 1992, vol. 1666, pp. 365-374.
[3] R. Barnett and D. E.
Pearson, “Frequency mode L.R. attack operator for digitally watermarked images,” Electronics Letters, vol. 34, no. 19, pp. 1837-1839, Sep. 1998.
[4] M. Barni, F. Bartolini, V. Cappellini, A. Lippi, and A. Piva, “A DWT-based technique for spatio-frequency masking of digital signatures,” in Proc. SPIE International Conference Security and Watermarking of Multimedia Contents, San Jose, CA, Jan. 1999, vol. 3657, pp. 31-39.
[5] P. Bas, J.-M. Chassery, and B. Macq, “Geometrically invariant watermarking using feature points,” IEEE Trans. on Image Processing, vol. 11, no. 9, pp. 1014-1028, Sep. 2002.
[6] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, “Techniques for data hiding,” IBM Systems Journal, vol. 35, no. 3, pp. 313-336, 1996.
[7] D. Benham, N. Memon, B.-L. Yeo, and M. Yeung, “Fast watermarking of DCT-based compressed images,” in Proc. International Conference Image Science, Systems and Technology, Las Vegas, NV, June 1997.
[8] A. Bors and I. Pitas, “Image watermarking using DCT domain constraints,” in Proc. IEEE International Conference on Image Processing (ICIP), Lausanne, Switzerland, Sep. 1996.
[9] C. Busch, W. Funk, and S. Wolthusen, “Digital watermarking: from concepts to real-time video applications,” IEEE Computer Graphics and Applications, Image Security, vol. 19, no. 2, pp. 25-35, Jan. 1999.
[10] G. Caronni, “Assuring ownership rights for digital images,” in Proc. Reliable IT Systems, VIS, Vieweg, Germany, 1995, pp. 251-263.
[11] B. Chen and G. W. Wornell, “Quantization index modulation methods for digital watermarking and information embedding of multimedia,” Journal of VLSI Signal Processing, vol. 27, no. 1/2, pp. 7-33, Feb. 2001.
[12] C. Christopoulos, A. Skodras, and T. Ebrahimi, “The JPEG 2000 still image coding system: An overview,” IEEE Trans. on Consumer Electronics, vol. 46, no. 4, pp. 1103-1127, Nov. 2000.
[13] I. J. Cox, F. T. Leighton, and T. Shamoon, “Secure spread spectrum waterm arking for images, audio and video,” in Proc. IE E E International Conference on Image Processing, Lausanne, Switzerland, July 1996, pp. 243-246. [14] I. J. Cox, F. T. Leighton, and T. Shamoon, “Secure spread spectrum waterm arking for m ultim edia,” IE E E Trans, on Image Processing, vol. 6 , no. 12, 1997. [15] V. D arm staedter, J.-F. Delaigle, D. Nicholson, and B. Macq, “A block based water m arking technique for M PEG-2 signals: O ptim ization and validation on real digital TV distribution links,” in Proc. European Conference M ultimedia Applications, Ser vices and Techniques, Berlin, Germany, May 1998, pp. 190-206. [16] G. Depovere, T. Kalker, and J-P Linnartz, “Improved w aterm ark detection reliability using filtering before correlation,” in Proc. IE E E International Conference on Image Processing (ICIP), Chicago IL, Oct. 1998, pp. 430-434. [17] J. D ittm ann, M. Stabenau, and R. Steinmetz, “Robust M PEG video waterm arking technologies,” in Proc. A C M Multimedia, Bristol, U.K., Sep. 1998, pp. 71-80. [18] D. J. Fleet and D. J. Heger, “Em bedding invisible inform ation in color images,” in Proc. IEEE International Conference on Image Processing (IC IP), Santa Barbara, CA, Oct. 1997, pp. 532-535. [19] J. Fridrich, “A new steganographic m ethod for palette-based images,” in Proc. PIC S 52nd Annual Conference, Savannah, GA, April 1999. [20] J. Fridrich and R. Du, “Secure steganographic m ethods for palette images,” in Proc. The 3rd Inform ation Hiding Workshop, New York, 2000. [21] J. Fridrich, M. Goljan, and R. Du, “Distortion-free d ata em bedding,” in fth Infor mation Hiding Workshop, Berlin, Germany, 1998, vol. 2137, pp. 27-41. [22] J. Fridrich, M. Goljan, and R. Du, “Invertible authentication,” in Proc. SP IE Photon ics West, Security and W atermarking of Multimedia Contents, San Jose, California, Jan. 2001, vol. 3971, pp. 197-208. [23] J. Fridrich, M. Goljan, and R. 
Du, “Lossless d ata embedding for all image form ats,” in Proc. SP IE Photonics West, Electronic Imaqinq 2002, San Jose, California, Jan. 2002, vol. 4675. [24] M. S. Fu and O. C. Au, “D ata hiding waterm arking for halftone images,” IE E E Trans, on Image Processing, vol. 11, no. 4, pp. 477-484, April 2002. [25] B. Girod, “The inform ation theoretical significance of spatial and tem poral masking in video signals,” in Proc. SP IE International Conference on H um an Vision, Visual Processing and Digital Display, Los Angeles, CA, Jan. 1989. 183 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [26] F. H artung and B. Girod, “Digital waterm arking of raw and compressed video,” in Proc. SP IE Digital Compression Technologies and System s fo r Video Communication, Los Angeles, CA, Oct. 1996, vol. 2952, pp. 205-213. [27] J. R. Hernandez, F. Perez-Gonzalez, and J. M. Rodriguez, “Perform ance analysis of a 2-D m ultipulse am plitude m odulation scheme for d ata hiding and waterm arking still images,” IE E E J. Select. Areas Commun., vol. 16, pp. 510-524, May 1998. [28] J. R. Hernandez, F. Perez-Gonzalez, J. M. Rodriguez, and G. Nieto, “The impact of channel coding on the performance of spatial w aterm arking for coyright protection,” in Proc. IE E E International Conference Acoustics, Speech and Signal Processing 1998 (IC ASSP 98), Seattle, WA, May 1998, vol. 5, pp. 2973-2976. [29] A. Herrigel, S. Voloshynovskiy, and Y. Rytsar, “The w aterm ark tem plate attack,” in Proc. SP IE Photonics West, Security and W atermarking of M ultimedia Contents, San Jose, CA, Jan. 2001, pp. 4314-4346. [30] M. Holliman, N. Memon, B.-L. Yeo, and M. Yeung, “Adaptive public w aterm ark ing of DCT-based compressed image,” in Proc. SP IE Photonics West, Security and W atermarking of Multimedia Contents, San Jose, CA, Jan. 1998. [31] C.-T. Hsu and J.-L. Wu, “Hidden signatures in images,” in Proc. 
IE E E International Conference on Image Processing (IC IP), Lausanne, Switzerland, Sep. 1996, pp. 223- 226. [32] H. Inoue, A. Miyazaki, A. Yamamoto, and T. K atsura, “A digital waterm ark based on the wavelet transform and its robustness on image compression,” in Proc. IEEE International Conference Image Processing (ICIP), Chicago, IL, Oct. 1998, vol. 1, pp. 391-395. [33] JPE G 2000 Docum ent, “JPE G 2000 verification model 5.0 (technical description),” Tech. Rep., ISO /IE C JT C 1/SC 29/W G 1, July 1999. [34] T. Kalker, G. Depovere, J. Haitsm a, and M. Maes, “A video w aterm arking system for broadcast m onitoring,” in Proc. SP IE Photonics West, Security and Watermarking of Multimedia Contents, San Jose, CA, Jan. 1999, vol. 3657. [35] K. T. Know and S. Wang, “Digital waterm arks using stochastic screens - a halftoning w aterm ark,” in Proc. SP IE International Conference Storage and Retrieval for Image and Video Databases, San Jose, CA, Feb. 1997, pp. 310-316. [36] E. Koch, J. Rindfrey, and J. Zhao, “Copyright protection for m ultim edia data,” in Proc. International Conference Digital Media and Electronic Publishing, Leeds, U.K., Dec. 1994. [37] E. Koch and J. Zhao, “Towards robust and hidden image copyright labeling,” in Proc. IE E E Workshop Non-Linear Signal and Image Processing, Neos M armaros, Thessaloniki, Greece, June 1995, pp. 452-455. 184 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [38] D. K undur and D. Hatzinakos, “A robust digital image waterm arking m ethod using wavelet-based fusion,” in Proc. IE E E International Conference Image Processing (ICIP), Santa Barbara, CA, Oct. 1997, vol. 1, pp. 544-547. [39] D. K undur and D. Hatzinakos, “Digital waterm arking for telltale tam per proofing and authentication,” Proc. IE E E , vol. 87, pp. 1167-1180, July 1999. [40] M. K utter, “W aterm arking resisting to translation, rotation, and scaling,” in Proc. 
of SP IE M ultimedia System s and Applications, Boston, MA, Nov. 1998, vol. 3528, pp. 423-431. [41] M. K utter, S. K. Bhattacharjee, and T. Ebrahim i, “Towards second generation water m arking schemes,” in Proceedings 6th International Conference on Image Processing (IC IP ’99), Kobe, Japan, Oct. 1999, vol. 1, pp. 320-323. [42] M. K utter, F. Jordan, and F. Bossen, “Digital signature of color images using am plitude m odulation,” in Proc. of SP IE Storage and Retrieval for Image and Video Databases, San Jose, CA, Feb. 1997, vol. 3022, pp. 518-526. [43] M. Kwan, “Gifshuffle,” http://w w w .darkside.com .au/gifshuffie. [44] G. Langelaar, J. C. A. ven der Lubbe, and R. Lagendijk, “Robust labeling m ethod for copy protection of images,” in Proc. SPIE, Electronic Imaging, San Jose, CA, Feb. 1997. [45] G. C. Langelaar and R. L. Lagendijk, “O ptim al differential energy waterm arking of DCT encoded images and video,” IE E E Trans, on Image Processing, vol. 10, no. 1, p p . 148-158, Jan. 2001. [46] G. C. Langelaar, R. L. Lagendijk, and J. Biemond, “W aterm arking by DCT coef ficient removal: Statistical approach to optim al param eter settings,” in Proc. SP IE Photonics West, Security and W atermarking of M ultimedia Contents, San Jose, CA, Jan. 1999, vol. 3657. [47] F. W. Leberl, Radargrammetric Image Processing, A rtech House, Inc., 1990. [48] J. S. Lim, Two-Dimensional Signal and Image Processing, Prentice Hall, 1990. [49] C.-Y. Lin, M. Wu, M. L. Miller, J. A. Bloom, I. J. Cox, and Y. M. Lui, “Rotation, scale, and translation resilient public waterm arking for images,” in Proc. SP IE Secu rity and W atermarking of Multimedia Contents, San Jose, CA, Jan. 2000, vol. 3971, pp. 90-98. [50] S. Lin and D. J. Costello, Error Control Coding: Fundamentals and Applications, Prentice Hall, 1983. [51] R. Machado, “Ez stego,” http ://w w w .stego.com. [52] W. Macy and M. Holliman, “Quality evaluation of waterm arked video,” in Proc. 
SP IE Photonics West, Security and W atermarking of M ultimedia Contents, San Jose, CA, 2000, vol. 4675. 185 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [53] K. M atsui and K. Tanaka, “Video-steganography: How to secretly embed a signature in a picture,” J. Interactive M ultimedia Association Intellectual Property Project, vol. 1, no. 1, pp. 187-205, May 1994. [54] J O ’Ruanaidh and T. Pun, “R otation, scale and translation invariant digital image w aterm arking,” in Proc. IE E E International Conference Image Processing, Santa Barbara, CA, Oct. 1997, pp. 536-539. [55] W. B. Pennebaker and J. L. Mitchell, JPEG : Still Image Data Compression Standard, New York: Van N ostrand, 1993. [56] S. Pereira, J. J. O ’Ruanaidh, F. Deguillaume, G. Csurka, and T. Pun, “Tem plate based recovery of fourier-based w aterm arks using log-polar and log-log m aps,” in Proc. IE E E Multimedia System s 99, International Conference on M ultimedia Com puting and Systems, Florence, Italy, June 1999, pp. 870-874. [57] S. Pereira and T. Pun, “Fast robust tem plate m atching for affine resistant image wa term arking,” in International Workshop on Inform ation Hiding, Dresden, Germany, 1999. [58] H. A. Peterson, A. J. Ahum ada, and A. B. W atson, “An improved detection model for DCT coefficient quantization,” in Proc. SPIE, H um an Vision, Visual Processing, and Digital Display, Bellingham, WA, 1993, vol. 1913, pp. 191-201. [59] F. A. P. Petitcolas, R. J. Anderson, and M. G. Kuhn, “Attacks on copyright m arking system s,” in Information Hiding, Second International Workshop, Portland, Oregon, April 1998, pp. 219-239. [60] I. Pitas, “A m ethod for signature casting on digital images,” in Proc. IE E E Interna tional Conference on Image Processing (ICIP), Lausanne, Switzerland, Sep. 1996. [61] A. Piva, M. Barni, F. Bartolini, and V. 
Cappellini, “D CT-based w aterm ark recovering w ithout resorting to the uncorrupted original image,” in Proc. IE E E International Conference on Image Processing, Santa B arbara, CA, July 1997, vol. 1, pp. 520-523. [62] C. Podilchuk and W. Zeng, “Perceptual waterm arking of still images,” in Proc. of Workshop Multimedia Signal Processing, Princeton, N J, June 1997. [63] C. I. Podilchuk and W. Zeng, “W aterm arking of the JP E G bit-stream ,” in Proc. International Conference Image Science, Systems and Technology, Las Vegas, NV, June 1997. [64] C. I. Podilchuk and W enjun Zeng, “Im age-adaptive w aterm arking using visual m od els,” IE E E Journal on Selected Areas in Communications, vol. 16, no. 4, May 1998. [65] R. A. Roberts and C. T. Mullis, Digital Signal Processing, Addison Wesley, 1987. [6 6 ] P. M. J. Rongen, M. J. J. J. B. Maes, and C. W. A. M. van Overveld, “Digital image waterm arking by salient point m odification,” in Proc. SP IE Photonics West, Security and Watermarking of Multimedia Contents, San Jose, CA, Jan. 1999, vol. 3657. 186 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [67] A. Said and W. A. Pearlm an, “A new, fast, and efficient image codec based on set partitioning in hierarchical trees,” IE E E Trans, on Circuits and Systems for Video Technology, vol. 6 , no. 3, pp. 243-250, June 1996. [6 8] J. Shapiro, “Em bedded image coding using zerotrees of wavelet coefficients,” IE E E Trans, on Signal Processing, vol. 41, no. 12, pp. 3445-3462, Dec. 1993. [69] M.-Y. Shen and C.-C. J. Kuo, “A rtifact reduction in low bit rate wavelet coding with robust nonlinear filtering,” in Proc. IE E E Second Workshop on Multimedia Signal Processing. Redondo Beach, CA, Dec. 1998. [70] K. Solanki, N. Jacobsen, S. Chandrasekaran, U. Madhow, and B. S. 
M anjunath, “High-volume d a ta hiding in images: Introducing perceptual criteria into quantiza tion based em bedding,” in IE E E International Conference on Acoustics, Speech, and Signal Processing, Orlando, FI, May 2002. [71] H. Stark and J. W. Woods, Probability, Random Processes and Estim ation Theory for Engineers, Prentice Hall, 1994. [72] P.-C. Su and C.-C. J. Kuo, “A n efficient im plem entation of digital image wa term ark,” in International Symposium on Multimedia Inform ation Processing (IS- M IP99), Taipei, Taiwan, Dec. 1999. [73] P.-C. Su and C.-C. J. Kuo, “An image waterm arking scheme to resist generalized geometrical transform ations,” in Proc. SP IE Photonics East, Boston, MA, Nov. 2000. [74] P.-C. Su and C.-C. J. Kuo, “Spatial-frequency composite w aterm arking for digital image copyright protection,” in Proc. SP IE Photonics West, Security and W ater marking of M ultimedia Contents, San Jose, CA, Jan. 2000. [75] P.-C. Su and C.-C. J. Kuo, “Synchronized detection of the block-based w aterm ark w ith invisible grid embedding,” in Proc. SP IE Photonics West, San Jose, CA, Jan. 2001 . [76] P.-C. Su and C.-C. J. Kuo, “Inform ation embedding in JPEG -2000 compressed im ages,” in IEEE International Symposium on Circuits and System s (ISC AS), Bangkok, Thailand, May 2003. [77] P.-C. Su, H.-J. Wang, and C.-C. J. Kuo, “Blind digital w aterm arking for cartoon and map images,” in Proc. SP IE Photonics West, Security and W atermarking of Multimedia Contents, San Jose, CA, Jan. 1999, vol. 3657. [78] P.-C. Su, H.-J. Wang, and C.-C. J. Kuo, “Digital image w aterm arking in regions of interest,” in Proc. P IC S 52nd A nnual Conference, Savannah, GA, April 1999. [79] P.-C. Su, H.-J. Wang, and C.-C. J. Kuo, “Digital w aterm arking on EB C O T com pressed images,” in Proc. S P IE ’ s 4fth Annual Meeting, Denver, CO, July 1999. [80] P.-C. Su, H.-J. Wang, and C.-C. J. 
Kuo, “An integrated approach to image water m arking and JPEG-2000 compression,” Journal of V L SI Signal Processing, vol. 27, no. 1/2, pp. 35-53, Feb. 2001. 187 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [81] M. D. Swanson, B. Zhu, and A, H. Tewfik, “Transparent robust image waterm ark ing,” in Proc. IE E E International Conference on Image Processing (ICIP), Lausanne, Switzerland, Sep. 1996. [82] M. D. Swanson, B. Zhu, and A. H. Tewfik, “M ultiresolution scene-based video water m arking using perceptual models,” IE E E Journal Selected Areas in Communications, vol. 16, pp. 540-550, May 1998. [83] K. Tanaka, Y. Nakam ura, and K. M atsui, “Em bedding secret inform ation into a dithered multi-level image,” in Proc. IE E E M ilitary Communications Conference, Monterey, CA, 1990, pp. 216-220. [84] D. Taubm an, “High performance scalable image compression w ith EB C O T,” IEEE Trans, on Image Processing, vol. 9, no. 7, pp. 1158-1170, July 2000. [85] D. Taubm an and A. Zakhor, “M ultirate 3-D subband coding of video,” IEEE Trans, on Image Processing, vol. 3, no. 5, pp. 572-588, Sep. 1994. [86] A. Z. Tirkel, G. A. Rankin, R. G. van Schyndel, W. J. Ho, N. Mee, and C. F. Osborne, “Electronic w aterm ark,” in Digital Image Computing, Technology and Applications (DICTA ’93), Sidney, Australia, 1993, pp. 666-673. [87] R. G. van Schyndel, A. Z. Tirkel, and C. F. Osborne, “A digital w aterm ark,” in Proc. IE E E International Conference on Image Processing (IC IP), Austin, TX, 1994, vol. 1, pp. 86-90. [88] G. Voyatzis and I. Pitas, “Applications of toral autom orphism s in image w aterm ark ing,” in Proc. IE E E International Conference on Image Processing (ICIP), Lausanne, Switzerland, Sep. 1996. [89] H.-J. Wang, P.-C. Su, and C.-C. J. Kuo, “Wavelet based blind waterm ark retrieval technique,” in Proc. SP IE Photonics East - Symposium on Voice, Video, and Data Communications, Boston, MA, Nov. 1998, vol. 3528. 
[90] H.-J. Wang, P.-C. Su, and C.-C. J. Kuo, “W avelet-based digital image w aterm arking,” Journal of Optics Express, vol. 3, no. 12, pp. 491-496, Dec. 1998. [91] A. B. W atson, “D C T quantization m atrices visually optim ized for individual images,” in Proc. SPIE, Human Vision, Visual Processing, and Digital Display, Bellingham, WA, 1993, vol. 1913, pp. 202-216. [92] A. B. W atson, G. Y. Yang, A. Solomon, and J. Villasenor, “Visibility of wavelet quantization noise,” IE E E Trans, on Image Processing, vol. 6, no. 8, Aug. 1997. [93] R. B. Wolfgang and E. J. Delp, “A w aterm ark for digital images,” in Proc. IE E E International Conference on Image Processing (ICIP), Lausanne, Switzerland, July 1996, pp. 219-222. [94] R. B. Wolfgang and E. J. Delp, “Fragile waterm arking using the VW 2D w aterm ark,” in Proc. SP IE International Conference Security and W atermarking of Multimedia Contents, San Jose, CA, Jan. 1999, vol. 3657, pp. 204-213. 188 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [95] P. Wong, “A public key waterm ark for image verification and authentication,” in Proc. IE E E International Conference on Image Processing (ICIP), Chicago, IL, Oct. 1998, vol. 1, pp. 455-459. [96] M. Wu, M. L. Miller, J. A. Bloom, and I. J. Cox, “A rotation, scale, and transla tion resilient public w aterm ark,” in Proc. IE E E International Conference Acoustics, Speech, and Signal Processing, Phoenix, AZ, M arch 1999. [97] X. G. Xia, C. G. Boncelet, and G. R. Arce, “A m ultiresolution w aterm ark for dig ital images,” in Proc. IE E E International Conference on Image Processing, Santa Barbara, CA, July 1997, vol. 1. [98] M. Yeung and F. M intzer, “An invisible waterm arking technique for image verifica tion,” in Proc. IE E E International Conference on Image Processing (ICIP), Santa Barbara, CA, July 1997, pp. 680-683. [99] W. Zhu, Z. Xiong, and Y.-Q. 
Zhang, “M ultiresolution w aterm arking for images and video: A unified approach,” in Proc. IE E E International Conference Image Processing (ICIP), Chicago, IL, Oct. 1998, vol. 1, pp. 465-468. 189 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
Asset Metadata
Creator
Su, Po-Chyi (author)
Core Title
Information hiding in digital images: Watermarking and steganography
School
Graduate School
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
Computer Science; engineering, electronics and electrical; OAI-PMH Harvest
Language
English
Contributor
Digitized by ProQuest (provenance)
Advisor
Kuo, C.-C. Jay (committee chair), Huang, Ming-Deh (committee member), Ortega, Antonio (committee member)
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c16-280181
Unique identifier
UC11340116
Identifier
3103970.pdf (filename), usctheses-c16-280181 (legacy record id)
Legacy Identifier
3103970.pdf
Dmrecord
280181
Document Type
Dissertation
Rights
Su, Po-Chyi
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA