BLOCK-BASED IMAGE STEGANALYSIS: ALGORITHM AND PERFORMANCE EVALUATION

by

SeongHo Cho

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

August 2012

Copyright 2012 SeongHo Cho

Dedication

This dissertation is dedicated to my family. My wife, Hayeun Lee, always supported me with love and trust. My father, Hanhyung Cho, inspired me with visions and gave me the passion to see the Ph.D. process through. My mother, Aeran Park, always encouraged me with trust, whatever the circumstances.

Acknowledgments

I would like to thank my advisor, Dr. C.-C. Jay Kuo, for his guidance throughout my Ph.D. process. Without his patience, advice, and encouragement, this process might not have been possible. He gave me direction when I could not see further, and he gave me courage when I was frustrated. I also want to express my gratitude to my research group members, Dr. Byung-Ho Cha and Martin Gawecki. Dr. Cha was my mentor, who taught me about my research field and advised me on how to gain insight for my research. Martin always helped me greatly with research discussion and paper editing.

Table of Contents

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract

Chapter 1: Introduction
  1.1 Significance of the Research
  1.2 Review of Previous Work
  1.3 Contributions of the Research
  1.4 Organization of the Dissertation

Chapter 2: Research Background
  2.1 Steganography and Steganalysis
  2.2 Previous Work in Image Steganography
  2.3 Previous Work in Image Steganalysis
  2.4 Performance of Blind Steganalysis
    2.4.1 Binary Classifier
    2.4.2 Multi-Classifier

Chapter 3: Block-Based Image Steganalysis: Algorithm and Performance Evaluation
  3.1 Introduction
  3.2 Block-Based Image Steganalysis
    3.2.1 System Overview
    3.2.2 Training Process
    3.2.3 Testing Process
    3.2.4 Comparison with Traditional Steganalysis
  3.3 Experimental Results
    3.3.1 Binary Classifier with Block-Based Image Steganalysis
    3.3.2 Multi-Classifier with Block-Based Image Steganalysis
  3.4 Conclusion

Chapter 4: Performance Study on Block-Based Image Steganalysis
  4.1 Introduction
  4.2 Analysis of Block Size, Block Number, and Block Overlapping Effects
    4.2.1 The Relationship between Block Size and Block Number
    4.2.2 Analysis of Block Size Effect
    4.2.3 The Relationship between Block Decision Accuracy and Image Decision Accuracy
    4.2.4 Analysis of Block Number Effect
    4.2.5 Analysis of Block Overlapping Effect
  4.3 Performance Evaluation
    4.3.1 Experimental Set-up
    4.3.2 Performance Study of Block-Based Image Steganalysis
  4.4 Conclusion

Chapter 5: Decision Fusion for Block-Based Image Steganalysis
  5.1 Introduction
  5.2 Decision Fusion Theory
    5.2.1 Decision Fusion Levels
    5.2.2 Decision Fusion Topologies
  5.3 Decision Fusion for Block-Based Image Steganalysis
    5.3.1 Weighted Majority Voting
    5.3.2 Bayesian Decision Fusion
    5.3.3 Dempster-Shafer Theory of Evidence
  5.4 Experimental Results
    5.4.1 Experimental Set-up
    5.4.2 Weighted Majority Voting
    5.4.3 Performance Comparison
  5.5 Conclusion

Chapter 6: Content-Dependent Feature Selection for Block-Based Image Steganalysis
  6.1 Introduction
  6.2 Content-Dependent Feature Selection
    6.2.1 Different Block Types for Block-Based Image Steganalysis
    6.2.2 Feature Selection
    6.2.3 Measuring Feature Discriminatory Power
    6.2.4 Feature Ranking
  6.3 Experimental Results
    6.3.1 Experimental Setting
    6.3.2 Content-Independent and Content-Dependent Feature Selection
    6.3.3 Discussion
  6.4 Conclusion

Chapter 7: Conclusion and Future Work
  7.1 Summary of the Research
  7.2 Future Research Directions

Bibliography

List of Tables

2.1 Confusion matrix of a binary classifier.
2.2 Confusion matrix of a multi-classifier.
3.1 Performance comparison of Pevny's method and the proposed block-based image steganalysis.
3.2 Decision reliability with voting difference.
3.3 Confusion matrices of Pevny's method and the proposed method with BPC 0.3.
3.4 Confusion matrices of Pevny's method and the proposed method with BPC 0.2.
4.1 The standard deviations of blockiness (B1, B2) features with different block sizes (B×B).
4.2 The block number (N) in a 512×512 image with different block sizes (B×B) and step sizes (S).
4.3 The average block decision accuracy (p) with different block sizes (B×B).
4.4 The average image decision accuracy (P) for non-overlapping block decomposition with fixed image size 384×512.
4.5 The average image decision accuracy (P) with different block sizes (B×B) and different step sizes (S).
4.6 Detection accuracy with different numbers of block classes.
4.7 The performance improvement of block-based image steganalysis with different classifiers for MBS (8 block classes).
4.8 The performance improvement of block-based image steganalysis with different classifiers for MBS (16 block classes).
4.9 The performance improvement of block-based image steganalysis with different classifiers for MBS (8 block classes).
4.10 The performance improvement of block-based image steganalysis with different classifiers for MBS (16 block classes).
5.1 Performance comparison of block-based image steganalysis with different threshold values.
5.2 Performance comparison of block-based image steganalysis with different adaptive weighting methods.
5.3 Performance comparison of block-based image steganalysis with different fusion methods.
6.1 Index table of 274 merged feature components.
6.2 Feature ranking with top 5 features for 4 block types.

List of Figures

1.1 The digital revolution with the Internet: widespread use of smartphones and tablet PCs.
1.2 Practical application of steganography technology proposed by Fujitsu Labs.
2.1 (a) Watermarking and (b) Steganography.
2.2 Comparison of feature extraction from one image of size 96×96 and from image blocks of size 32×32.
2.3 Calibration process for an image block of size 32×32.
2.4 The receiver operating characteristic (ROC) curve.
2.5 Two types of errors in statistics.
3.1 The block-based image steganalysis system.
3.2 Random sampling: (a) sampled blocks in cover image and (b) corresponding sampled blocks in stego image.
3.3 Codewords computed as the average of feature vectors for each class in 2-dimensional space.
3.4 Sample images from the Uncompressed Colour Image Database (UCID) and the INRIA Holidays dataset.
3.5 The performance comparison of the frame-based approach (Pevny's method) and the block-based approach (the proposed block-based image steganalysis) for different steganographic algorithms: (a) Model-Based Steganography and (b) Perturbed Quantization.
4.1 The relationship between block number N and block size B^2 for an image of size 512×512, which follows a curve of the form N = 512^2 / B^2.
4.2 The image decision accuracy (P) as a function of the block number (N), parameterized by the block decision accuracy p = 51%, 55%, 60%.
4.3 Illustration of the overlap size (O) and the step size (S) for the overlapping block case.
4.4 The block number (N) in a 512×512 image for different block sizes (B×B) and step sizes (S).
4.5 The image decision accuracy, P, as a function of the block number, N.
5.1 Overview of 5 different fusion levels [41].
5.2 Adaptive weighting method.
6.1 The 4 codewords representing 4 block types in the 2-D feature space derived from the principal component analysis.
6.2 Sample image from the UCID image database.
6.3 4 different block types based on steganalysis features (Scheme B).
6.4 Illustration of the distribution of sample feature component values obtained from cover images.
6.5 Performance comparison of content-dependent and content-independent feature selection methods.

Abstract

Traditional image steganalysis techniques are conducted with respect to the entire image. In this work, we aim to differentiate a stego image from its cover image based on the steganalysis results of decomposed image blocks. We also target the design of a multi-classifier which classifies stego images according to their steganographic algorithms. As a natural image often consists of heterogeneous regions, its decomposition will lead to smaller image blocks, each of which is more homogeneous. We classify these image blocks into multiple classes and find a classifier for each class to decide whether a block is from a cover or stego image. Consequently, the steganalysis of the whole image can be conducted by fusing the steganalysis results of all image blocks through a decision fusion process. Experimental results will be given to show the advantage of the proposed block-based image steganalysis for both the binary classifier and the multi-classifier.

In addition, a performance study of block-based image steganalysis in terms of block sizes and block numbers will be given in this work. First, we analyze the dependence of the steganalysis performance on each of these two factors, and show that a larger block size and a larger block number will lead to better steganalysis performance. Our study is verified by experimental results. For a given test image, there exists a trade-off between the block size and the block number. To exploit both effectively, we propose to use overlapping blocks to improve the steganalysis performance further.
Moreover, additional performance improvement of block-based image steganalysis with different numbers of block classes and different classifiers will be shown with experimental results.

Decision fusion for block-based image steganalysis will be discussed. As multiple block decisions are obtained from each image, decision fusion will play a crucial role in combining all the block decisions together to make a final decision for a given image. Among decision fusion techniques at different levels with different topologies, decision-level fusion with a parallel topology will be used for block-based image steganalysis. In addition, the importance of each block decision result will be considered in decision fusion to improve the steganalysis performance. Experimental results with different decision-level fusion techniques for block-based image steganalysis will be presented.

Content-dependent feature selection for block-based image steganalysis will be proposed to reduce computational complexity with a significantly smaller number of features. Depending on the characteristics of the block type, features with high discriminatory power will be selected for each block type. Several approaches to measure feature discriminatory power will be introduced. Finally, experimental results, which show the performance improvement obtained with content-dependent feature selection, will be presented.

Chapter 1: Introduction

1.1 Significance of the Research

Digital technology together with the Internet has made the production, delivery (communication), and consumption of digital content in the form of multimedia easier than ever before. Recently, the widespread use of smartphones including iPhones and BlackBerrys, tablet PCs like iPads, and netbooks has made this trend even more evident. Many people can access multimedia data including audio, image, and video according to their personal tastes whenever they want and wherever they are. With the rapid increase of multimedia on the Internet, the multimedia security problem becomes increasingly important. The copyright protection of digital multimedia data has also become significantly important for content providers in both the music and film industries. Meanwhile, the communication of secret messages using digital multimedia data has become increasingly popular. This can be confidential information for intelligence agencies such as the Federal Bureau of Investigation (FBI) or the Central Intelligence Agency (CIA). It is also well known that the al-Qaeda terrorists in the 9/11 attacks used multimedia data to communicate secret messages between al-Qaeda members [4].

Figure 1.1: The digital revolution with the Internet: widespread use of smartphones and tablet PCs.

Steganography embeds secret messages into multimedia data so that no one except the intended recipients can detect the presence of the secret messages. In other words, steganography can be considered the art and science of hiding communication [46]. In contrast, steganalysis tries to detect the presence of secret messages hidden in multimedia data and eventually extract the secret messages. Theoretical background from many different fields is required for steganography and steganalysis: signal processing, statistics, pattern recognition, machine learning, information theory, probability theory, detection theory, communication, mathematics, and cryptography. Similar to other multimedia security techniques like cryptography and cryptanalysis, steganography and steganalysis have been developed side by side.
After new breakthroughs were made in steganography, new steganalysis methods were developed to break these systems, and the two technologies have advanced in step.

Steganography technology is expected to play an important role in protecting the digital copyrights of multimedia, especially music and film. Regarding the film industry, the Motion Picture Association of America (MPAA) estimated in 2006 that US movie studios lose about 6.1 billion dollars annually in global wholesale revenue due to piracy of digital movie content. Although the Advanced Access Content System (AACS), based on watermarking technology, was introduced as a digital rights management (DRM) system for the next-generation DVD format Blu-ray disks, many hackers were still able to access this digital movie content. As steganography technology conceals even the presence of a secret message such as copyright information, it has the potential to be used in a more secure manner than existing watermarking technology.

Steganography uses multimedia data including audio, image, and video to embed secret messages. Among these different multimedia formats, images have been the most popular media for data hiding due to their appropriate file sizes, simple embedding procedures, and numerous applications. There are various applications for image steganography throughout different industries, including embedding copyright information in professional images, personal information in photographs of smart IDs (identity cards), and patient information in medical images [4]. In addition, image steganography can be used for enhancing the robustness of image search engines, safe communication of confidential data, and video-audio synchronization as well.

One real-world application of steganography was proposed by a Japanese company, Fujitsu Laboratories, in 2008 [4]. They used steganography technology to embed invisible codes into images in printed materials. They took advantage of the fact that digital cameras are sensitive to the yellow hue in an image while human eyes are not. This technology changes the yellow hue so that a special pattern is created, and this pattern is detected only by digital cameras. Any digital camera in a cellular phone can decode this pattern after installing one small Java application program. This technology can be used instead of the widely used barcodes. Another advantage of this technology is that secret messages can be embedded as part of the normal printing process, which makes it more cost-effective than RFID tags.

The steganography technology used for this application has high potential because it links printed materials with the Internet. In other words, it enables the enhancement of printed materials with supplementary information from the Internet. In fact, this is an attractive characteristic for advertisements because customers can always find additional information on the Internet about products advertised in printed materials including newspapers, magazines, flyers, and pamphlets. For example, if someone wants to advertise a restaurant, steganography technology can be used to embed the website link of the restaurant into an image of the restaurant in magazines. This example is shown in Fig. 1.2. Then, customers can take a picture of the image and retrieve additional information about the restaurant from the Internet.
It is also possible to provide maps and directions to the restaurant, complete menus, online coupons, pictures of famous dishes and of the restaurant's interior, video introductions from chefs or managers, and online ratings and reviews from other customers.

Figure 1.2: Practical application of steganography technology proposed by Fujitsu Labs.

With the advance of image steganography, many steganalysis methods have been developed to deal with new breakthroughs in image steganography. In the early stage, it was assumed that prior information about the steganographic algorithm used to embed a secret message into images is available. This is called the targeted steganalysis scenario. However, more attention has been paid to a more practical problem, in which no information is known about the steganographic algorithms for given test images. This is called the blind steganalysis scenario. Pioneering work in blind steganalysis began in the early 2000s with Memon [1] and Farid [40]. As there is no information available about the steganographic embedding in given test images, blind steganalysis is considered a much more challenging problem than targeted steganalysis.

In order to solve the blind steganalysis problem, cover images and corresponding stego images created using different steganographic algorithms can be used in the training process to design an appropriate classifier for unknown test images. However, the blind steganalysis problem becomes even more challenging when the steganographic algorithms are not known for the training process either [16]. In addition, the classifier obtained from the training process depends not only on the steganographic algorithms but also on the embedding rate, embedding order, image size, and image characteristics, which makes this problem even more complicated.

1.2 Review of Previous Work

Previous research in this field has focused on extracting features from images for the purpose of steganalysis [15], [53], [43]. The coefficients in the DCT domain were used to extract features. DCT-based features were proposed in [15] for steganalysis, which exploit the fact that the inter-block dependency between neighboring blocks is often affected by steganographic algorithms. Shi et al. [53] considered the differences between absolute values of neighboring DCT coefficients as a Markov process and proposed 324 features accordingly. This feature set was motivated by the observation that the intra-block correlation among DCT coefficients within the same block can be affected by steganographic embedding. Pevny and Fridrich [43] proposed a set of 274 merged features by combining the Markov features and DCT features together. This merged feature set provides one of the state-of-the-art feature sets for steganalysis.

The number of features has been increased by researchers in recent years to achieve better steganalysis performance. Chen et al. proposed an updated version of the Markov features (486 features) in 2008 [5]. The original Markov features proposed in [53] mainly utilize the intrablock correlation among JPEG coefficients. However, the updated Markov feature set considers both intrablock and interblock correlations among JPEG coefficients to improve the steganalysis performance. As the calibration process used for the merged DCT and Markov features is not useful for recent steganographic embedding methods such as YASS, Kodovsky [31] proposed a new merged feature set using the Cartesian calibration. This 548-dimensional feature set is shown to have better steganalysis performance than the merged features.
In addition, Pevny recently proposed the subtractive pixel adjacency model (SPAM) feature set [42], which uses higher-order Markov models. This feature set (686 features) is based on modelling differences between neighboring pixels in the spatial domain. More recently, Kodovsky [32] merged the SPAM feature set and the Cartesian calibration feature set to create a cross-domain feature set, which examines features from the spatial domain and the DCT domain at the same time. This feature set, which consists of 1234 features, was shown to have better performance for the YASS steganographic algorithm.

Many steganographic embedding algorithms are block-based; namely, they embed the secret message into each 8×8 DCT block separately. Yang et al. [58] performed an information-theoretic steganalysis on the block-structured stego image. They provided an approximation of the relative entropy between the probability distributions of the cover and the stego images. The relative entropy increases linearly with N/K − 1, where N and K represent the total number of samples (pixels) and the block size, respectively. A larger relative entropy means a higher detection probability of the stego image. Although Yang et al. [58] studied block-structured stego images, their work is still a frame-based approach from our viewpoint since only one set of features is extracted from an image.

Note that there is another difference between Yang's work and our proposed "block-based image steganalysis". Our method decomposes an image into smaller homogeneous image units and treats each image unit as a basic unit for steganalysis. Thus, there are several 8×8 DCT blocks within each smaller image unit. This is because we need to extract features which measure inter-block dependencies between neighboring 8×8 DCT blocks within each image unit. In contrast, there is only one image unit, which consists of numerous 8×8 DCT blocks, in Yang's work.

The more practical and realistic scenario for blind steganalysis is when stego images are created with different steganographic algorithms. In addition to deciding whether a given test image is a cover or stego image, blind steganalysis also needs to determine which steganographic algorithm was used to embed a secret message in the case of stego images. This is called a "multi-classifier", which is a more difficult task in terms of obtaining correct classification results as opposed to a binary classifier. In a multi-classifier scenario, blind steganalysis first attempts to decide whether the given test image is a cover or stego image. In addition, if the given test image is determined to be a stego image, this classifier has to decide which steganographic algorithm was used to embed the secret message.

Pevny et al. [43] used the 274 merged features for a multi-classifier scenario, where blind steganalysis classifies test images into 7 different classes: the cover image and stego images created from 6 different steganographic algorithms. A multi-classifier was constructed using several binary classifiers together. In order to classify the test images into 7 different classes, they used the "max-wins" strategy, which employs binary SVM classifiers for every pair of classes. Their experimental results showed that their approach was effective in classifying test images into 7 different classes.

1.3 Contributions of the Research

Block-based image steganalysis is studied in depth in this research. The main contributions of this work are summarized as follows.
1. Many previous works followed the frame-based framework, which tries to extract features from whole images for steganalysis purposes. In this work, block-based image steganalysis is proposed, which is a different framework for dealing with the steganalysis problem. This research direction looks promising, but no effort was made in this direction in the past. To the best of our knowledge, this is the first work to use a block-based approach in order to exploit the homogeneous characteristics of image blocks for steganalysis. For this purpose, different classifiers are adopted for blocks of different types in test images.

2. The block-based approach provides much better detection accuracy than previous works in steganalysis for both the binary classifier and the multi-classifier. Many frame-based approaches tried to find better feature sets for blind steganalysis to improve detection accuracy. However, for the block-based approach, performance improvement is possible without finding a better feature set or increasing the number of features.

3. Block-based image steganalysis gives us decision reliability information even when only one unknown image is given, which was not possible with previous works. In fact, this is an important problem for practical applications of steganalysis: we need to know how accurate (reliable) our decision is even when only one test image is given. This is possible with the block-based approach, which exploits the rich information in a given test image by decomposing it into smaller homogeneous blocks to make the decision as accurate as possible.

4. Block-based image steganalysis provides a universal framework which can be used to improve the performance of any type of feature set. Instead of extracting features from entire images, the block-based approach decomposes images into smaller homogeneous blocks, and features are extracted for each block. Regardless of the particular feature set, this methodology can be applied to all frame-based approaches.

5. The block-based approach takes advantage of the rich information in images by extracting multiple feature sets from smaller homogeneous blocks. While the traditional frame-based approach extracts only a single feature set from each image, the block-based approach enables us to extract multiple feature sets per image by extracting one feature set for each block. As we have the advantage of having more block decisions from a single image, we can make the cover or stego decision for a given image more accurate.

6. For block-based image steganalysis, multiple classifiers can be adopted for blocks of different types in a test image. This is different from the frame-based approach, where only one classifier is obtained after the training process. The block-based approach designs multiple classifiers tailored to each block type. The content-adaptive classifier for each block type provides more accurate steganalysis performance because each classifier can focus more on the feature change due to steganographic embedding rather than the feature variation between different block types.

7. For block-based image steganalysis, we also propose to give different weights depending on block classes and image types, which improves the decision accuracy of the majority voting rule. As a result, the performance of the block-based approach can be further enhanced.

8. The block-based approach exploits the similarity between blocks from images in the training set and blocks from the given test image.
Although there are a large number of images in the training set, it might be difficult to find an image that is similar to the given test image. However, if we divide the given test image into smaller homogeneous blocks, then these smaller blocks are more likely to be similar to blocks from images in the training set. Thus, the block-based approach can design a more accurate classifier by observing the feature change of similar blocks from images in the training set after steganographic embedding.

9. The block-based approach can be useful for the partial embedding scenario, when only small parts of images are embedded with secret messages. The frame-based approach is likely to fail in this sophisticated scenario because features are extracted from the entire image without considering the characteristics of smaller blocks. However, the block-based approach will be able to detect even a slight change of feature values after steganographic embedding within each block because it examines the features of each block separately. Thus, the block-based approach can differentiate stego images from cover images even when only small parts of images are embedded with secret messages.

10. We are able to provide a performance analysis of block-based image steganalysis as a function of the block size and the block number. This enables us to better understand the performance improvement of the block-based approach. First, we analyze the dependence of the steganalysis performance on each individual factor, and show that a larger block size and a larger block number will lead to better steganalysis performance. Our analysis is also verified by experimental results.

11. We propose to use overlapping blocks for block-based image steganalysis, which improves the steganalysis performance further. For a given test image, there exists a trade-off between the block size and the block number. To exploit both effectively, overlapping blocks can be used to overcome the trade-off and provide not only a large block size but also a large block number at the same time.

12. We also analyze the influence of different factors on the overall performance of block-based image steganalysis: the number of block classes, the classifier types, and the weighting process for block decisions. The performance improvement of block-based image steganalysis with these different factors is also shown with experimental results. This gives us a general idea about how to maximize the performance of block-based image steganalysis.

1.4 Organization of the Dissertation

The rest of this dissertation is organized as follows. The background of this research, including the model of steganography, perfect secrecy of steganography, previous works in image steganography and image steganalysis, and the performance of blind steganalysis, is described in Chapter 2. Then, block-based image steganalysis is proposed and the performance improvement for the binary classifier and the multi-classifier is shown in Chapter 3. The performance study on block-based image steganalysis in terms of the block size, the block number, and block overlapping is given in Chapter 4. In addition, the effect of the number of block classes and the classifier is presented together with experimental results. Decision fusion for block-based image steganalysis is given in Chapter 5, which is used to weight the importance of each block decision and combine all the block decisions together to make a final decision for a given image.
Content-dependent feature selection is used to improve the performance of block-based image steganalysis with a significantly smaller number of features in Chapter 6. Finally, concluding remarks and suggestions for future work are given in Chapter 7.

Chapter 2: Research Background

2.1 Steganography and Steganalysis

There are several research areas related to multimedia security problems within the field of "information hiding" (or data hiding): watermarking, steganography, and cryptography. Both watermarking and steganography try to embed secret messages into multimedia data. However, there are differences between watermarking and steganography. Watermarking is considered insecure when the embedded watermarks are removed or replaced. In contrast, steganography is considered insecure only when the existence of the secret messages is detected. Another difference is that watermarking tries to deliver the original multimedia data while the embedded watermarks usually protect the multimedia data [57]. On the other hand, steganography tries to convey secret messages, and the multimedia data itself is usually not important at all. In cryptography, a secret key is used to scramble the multimedia data so that it cannot be understood correctly without the secret key. Cryptography is considered insecure if attackers find the secret key and access the multimedia data directly [57].

The model of steganography can be explained using the analogy of Simmons' "Prisoners' Problem" [54], [3]. In this scenario, two prisoners, Alice and Bob, are kept in a prison. They try to communicate with each other in order to discuss an escape plan. However, their letters (messages) are only allowed to be delivered by agents of a warden named Eve. Thus, Eve will monitor the messages exchanged between Alice and Bob. Inevitably, Alice and Bob need to find a way to convey secret messages about the escape plan through these innocuous messages in order to avoid suspicion from the warden Eve.

Figure 2.1: (a) Watermarking and (b) Steganography.

There are several different strategies the warden Eve can take to prevent secret communication between Alice and Bob [16]. First, Eve can simply monitor the exchanged messages and try to find out whether there is any secret message about the escape plan. This is called the passive warden scenario. Second, Eve can try to make communication between Alice and Bob impossible even though she allows them to exchange letters. Eve can distort the content of the original letters so that secret messages are not correctly understood by the recipient. This is called the active warden scenario. Third, Eve can try to figure out how Alice and Bob are hiding secret messages inside innocuous letters. Then, she can intentionally modify the secret messages in the letters using the same method to distract Alice and Bob. This is called the malicious warden scenario.

Among the 3 different warden (adversary) scenarios, the passive warden scenario is generally assumed for the steganography model [3]. Here, the wardens (or adversaries) are passive: they only monitor exchanged letters to check whether there is a secret message or not. In other words, the warden does not act to make secret communication impossible or distort secret messages to distract the sender and the recipient.

The security level of a steganography system can be measured by the relative entropy (KL divergence) [3], [10], [16], which is a concept from information theory.
The relative entropy measures a non-symmetric difference between two probability distributions, and can be defined as

$$D_{KL}(P_C \| P_S) = \sum_{x} P_C(x) \log \frac{P_C(x)}{P_S(x)}, \tag{2.1}$$

where $P_C$ and $P_S$ represent the distributions of the cover object and the stego object, respectively, and $x$ ranges over the possible values in these distributions. A smaller relative entropy value implies that the steganography system is more secure. Thus, we can consider a steganographic system "perfectly secure" when $D_{KL}(P_C \| P_S) = 0$. Also, a steganographic system is $\epsilon$-secure if $D_{KL}(P_C \| P_S) < \epsilon$.

Shannon also discussed the concept of "perfect secrecy" for cryptosystems in 1949 [52]. In cryptography, the ciphertext is the result of encryption performed on the plaintext [24]. Plaintext and ciphertext can be thought of as the cover object and the stego object in steganography. Perfect secrecy can be defined by the following condition [16], [3]:

$$\Pr(m \mid c) = \Pr(m), \tag{2.2}$$

where $\Pr(m)$ is the a priori probability of plaintext message $m$ and $\Pr(m \mid c)$ is the a posteriori probability of plaintext $m$ after observing the ciphertext $c$.

Under the perfect secrecy condition, the a posteriori probability is always equal to the a priori probability for all possible plaintexts and ciphertexts [52]. This means that observing the ciphertext does not add any information at all for estimating the probability of the plaintext more accurately. As the ciphertext gives no clue about the plaintext, knowledge of the ciphertext is useless. In other words, this also implies that the two distributions of plaintext and ciphertext are exactly the same. This is why this definition is parallel to the definition using the relative entropy above.

2.2 Previous Work in Image Steganography

Many steganographic algorithms have been introduced to embed secret messages in images over the last decade: JP Hide & Seek, JSteg, F5 [59], OutGuess [45], model-based steganography [50], Steghide [20], Perturbed Quantization [18], MMx [28], and YASS [55]. In general, steganographic algorithms take one of two strategies for embedding secret messages [19]: statistics preservation and distortion minimization. First, statistics-preserving steganography tries to preserve all the related statistics by keeping the histograms of stego images as similar as possible to those of cover images. Second, minimal-distortion steganography focuses on keeping the embedding distortion as small as possible. Here, we briefly explain 4 different steganographic algorithms (F5 [59], OutGuess [45], model-based steganography [50], and Perturbed Quantization [18]), which will be used for the experiments in later chapters.

The F5 algorithm [59] is one of the classical steganographic algorithms, proposed by Westfeld in 2001. F5 embeds message bits into randomly chosen DCT coefficients. The absolute values of DCT coefficients are always decreased by one in a process called matrix encoding [59], [16]. This process minimizes the number of changes to non-zero AC DCT coefficients in cover images needed to embed the secret message. In addition, crucial characteristics of the histograms of DCT coefficients are still preserved after the embedding process.

Provos presented an improved method for steganographic embedding called OutGuess in 2001 [45]. The OutGuess algorithm introduces a new method to preserve the statistical properties of the cover image using probabilistic embedding and error-correcting codes. First, probabilistic embedding minimizes modifications to the cover image by using a pseudo-random number generator. Second, error-correcting codes are used to increase the flexibility in selecting bits without increasing the number of necessary changes.
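To make the embedding-efficiency idea behind F5's matrix encoding more concrete, the sketch below shows the binary Hamming-code syndrome coding that (1, 2^k − 1, k) matrix encoding is built on, here for k = 3: three message bits are carried by seven carrier bits while changing at most one of them. This is only an illustration of the coding step in isolation under our own assumptions; it flips generic carrier bits, whereas F5 itself decrements the magnitudes of non-zero AC DCT coefficients, and the function names are ours, not from the dissertation.

```python
import numpy as np

# (1, 7, 3) Hamming-code matrix encoding: embed K = 3 message bits into
# N = 7 carrier bits while modifying at most one carrier bit.
K, N = 3, 7
# Columns of H are the binary representations of 1..7 (least significant bit in row 0).
H = np.array([[(j >> i) & 1 for j in range(1, N + 1)] for i in range(K)])

def embed(carrier_bits, message_bits):
    """Modify at most one carrier bit so that the syndrome equals the message."""
    x = np.asarray(carrier_bits) % 2
    m = np.asarray(message_bits) % 2
    s = (H @ x + m) % 2                     # mismatch between current syndrome and message
    pos = int(s @ (2 ** np.arange(K)))      # index of the matching column (0 = no change)
    if pos:
        x[pos - 1] ^= 1                     # flip the single carrier bit
    return x

def extract(carrier_bits):
    """The receiver reads the message as the syndrome of the carrier bits."""
    return (H @ (np.asarray(carrier_bits) % 2)) % 2

# Example: 3 message bits ride on 7 carrier bits with at most one change.
stego = embed([1, 0, 1, 1, 0, 0, 1], [0, 1, 1])
assert list(extract(stego)) == [0, 1, 1]
```

The larger k is chosen, the more message bits are carried per changed coefficient, which is the embedding-efficiency gain the F5 paper exploits.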
Second, error-correcting codes are used to increase the exibility in selecting bits without increasing the number of necessary changes. 14 Sallee proposed model-based steganography (MBS) [50] in 2003, which tries to nd a statistical model ts to histograms of DCT coecients. This algorithm is one of rep- resentative examples which use statistics-preserving strategy. After nding a statistical model, it tries to embed secret messages optimally following this statistical model. An important advantage of this method is that it preserves not only the global histogram of all DCT coecients but also the histograms of individual DCT modes [19]. The dis- advantage of this method is that assumed statistical model might not be able to model histograms of DCT coecients exactly. Fridrich et al. proposed Perturbed Quantization (PQ) in 2004 [17], [18], [16], which tries to minimize the embedding distortion using the wet paper code. In Perturbed Quan- tization, secret messages are embedded into cover images to create stego images during information-reducing process such as JPEG compression. During this process, the sender tries to nd possible locations for embedding, where it is expected to be \uncertain" after processing. In other words, the original values before processing in these candidate loca- tions should be dicult to estimate. For example, it is almost impossible to estimate the original values of DCT coecients accurately after JPEG compression. The wet paper code provides the sender selection rules to choose places for embedding based on side information from cover images before going through information-reducing process. For JPEG compression, this side information can be uncompressed cover image. This side information is only available to intended recipients, but not to adversaries, which eventually provides improved steganographic security. 2.3 Previous Work in Image Steganalysis Many previous works focus on extracting features from images using coecients in Wavelet [40] or DCT domain [15], [53], and [43]. Moreover, Markov process is also used [53] or two feature sets are combined [43] in order to extract eective features for 15 steganalysis. Previous works also have reliable steganalysis result with features extracted from images. Lyu et al. extracted Wavelet-based features from test images for blind steganalysis after applying Wavelet decomposition in 2002 [40]. The distributions and error statistics of Wavelet coecients are used to extract 72 Wavelet features. This work is considered to be one of pioneer works in blind steganalysis. Fridrich extracted DCT-based features from test images for steganalysis [15]. In this work, 23 DCT-based features are extracted which consists of the rst order and the second order feature set. The rst order feature set considers the global histogram of all 64 DCT coecients and individual histograms for low frequency DCT modes. As natural images have higher-order correlations over distances larger than 8 pixels, the second order features captures inter-block dependency between neighboring DCT blocks. In fact, DCT coecients of the same DCT mode from neighboring blocks are not independent. DCT-based features take advantage of the fact that inter-block dependencies between neighboring 8 8 blocks are likely to increase after embedding by most steganographic algorithms. Shi et al. proposed 324 Markov process based features [53] which considers the dif- ferences between absolute values of neighboring DCT coecients as a Markov process. 
The energy distribution of JPEG coefficients in each block is non-increasing along the zig-zag scanning order. Thus, this feature set takes advantage of the fact that DCT coefficients within the same block also have intra-block correlation along the horizontal, vertical, and diagonal directions. As steganographic embedding algorithms usually change the intra-block correlation between DCT coefficients, this feature set is also known to be powerful for blind steganalysis.

Pevny et al. proposed 274 merged Markov and DCT features in 2007 [43], which combine the Markov-process-based features and the DCT-based features together. In addition, this feature set also incorporates a calibration process, which takes advantage of the fact that the original image remains perceptually similar even after cropping 4 pixels in both directions. In fact, the DCT coefficients also have approximately the same statistical properties as the original image after the calibration process. Calibrated features are known to be more sensitive to embedding changes, which is a more desirable characteristic for steganalysis. The merged Markov and DCT feature set is considered one of the state-of-the-art feature sets for steganalysis.

The merged DCT and Markov feature set is used to extract features from image blocks for block-based image steganalysis. Note that these features are traditionally extracted from each whole image in the frame-based approach, but from image blocks in block-based steganalysis. For example, one merged feature set is extracted from a 96×96 image, which consists of 144 DCT blocks of size 8×8 (marked with red color in Fig. 2.2). In contrast, for block-based image steganalysis a merged feature set is extracted from each 32×32 image block, which consists of 16 DCT blocks (marked with blue color). For the case when the block size is 32×32, a total of 9 merged feature sets are extracted from the 96×96 image.

Figure 2.2: Comparison of feature extraction from one image of size 96×96 and from image blocks of size 32×32.

This feature set also incorporates a calibration process [43], [15] which takes advantage of the fact that the original image remains perceptually similar even after cropping 4 pixels in both the horizontal and vertical directions. After cropping 4 pixels in both directions, the 8×8 DCT blocks are different from the DCT blocks of the previous JPEG compression. However, the DCT coefficients still have approximately the same statistical properties as those of the original image after the calibration process. Calibrated features are known to be more sensitive to embedding changes, which is a more desirable characteristic for steganalysis.

The calibration process can be described as follows [15]. Given an original JPEG image $J_1$, a function $F$ is used to extract the feature vector $F(J_1)$, which contains the merged features. From the JPEG image $J_1$, four pixels are removed from the upper and left boundaries and the image is recompressed using the same quantization table to create a cropped JPEG image $J_2$. The same function $F$ is applied to the JPEG image $J_2$ to extract the feature vector $F(J_2)$. Finally, a merged feature set $f$ after the calibration process is obtained as an $L_1$ norm of the difference:

$$f = \| F(J_1) - F(J_2) \|_{L_1}. \tag{2.3}$$

The calibration process was originally used to extract merged features from a given image. For block-based image steganalysis, this process can also be used to extract the same features from each smaller image block in the given image. For example, the calibration process can be applied to a 32×32 block as shown in Fig. 2.3.
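As an illustration of this calibration step, the sketch below builds the cropped-and-recompressed reference image $J_2$ and differences the two feature vectors. The function `extract_merged_features` is a placeholder for the 274-dimensional merged Markov/DCT extractor (not implemented here), and the recompression uses a fixed JPEG quality as a stand-in for reusing the original quantization table; both are assumptions of this sketch, not code from the dissertation.

```python
import numpy as np
from PIL import Image

def extract_merged_features(jpeg_path):
    """Placeholder for the merged Markov/DCT feature extractor F(.) of [43]."""
    raise NotImplementedError

def calibrated_features(jpeg_path, tmp_path="calibrated_tmp.jpg", quality=75):
    # F(J1): features of the original JPEG image.
    f1 = np.asarray(extract_merged_features(jpeg_path), dtype=float)

    # Build J2: crop 4 pixels from the upper and left boundaries and recompress.
    # (The actual scheme recompresses with the original quantization table; a
    # fixed quality factor is used here only to keep the sketch short.)
    img = Image.open(jpeg_path)
    w, h = img.size
    img.crop((4, 4, w, h)).save(tmp_path, "JPEG", quality=quality)
    f2 = np.asarray(extract_merged_features(tmp_path), dtype=float)

    # Per-component absolute differences; their sum is the L1 norm of Eq. (2.3).
    diff = np.abs(f1 - f2)
    return diff, diff.sum()
```

For block-based steganalysis the same routine would simply be applied to each cropped 32×32 block instead of the whole image, as in Fig. 2.3.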
Figure 2.3: Calibration process for an image block of size 32×32.

2.4 Performance of Blind Steganalysis

2.4.1 Binary Classifier

There are several methods to measure the performance of blind steganalysis: the receiver operating characteristic (ROC), the universal benchmark method based on information-theoretic principles, and the average detection accuracy.

The receiver operating characteristic (ROC) comes from signal detection theory and represents the performance of a binary classifier. As shown in Fig. 2.4, the ROC curve is a plot of the true positive rate as a function of the false positive rate. The true positive rate represents the ratio of stego images correctly classified as stego images. On the other hand, the false positive rate represents the ratio of cover images incorrectly classified as stego images. If the ROC curve is closer to the upper left corner, representing perfect classification, then the binary classifier is considered to have better performance. In contrast, if the ROC curve is closer to the lower right corner, then the binary classifier is considered to have worse performance. The diagonal line represents random guessing, where the probability of making correct decisions and the probability of making incorrect decisions are exactly the same.

It is well known that there exists a trade-off between the true positive rate and the false positive rate. Thus, if we want to increase the true positive rate, then the false positive rate is also likely to increase at the same time. In other words, if we want to detect more stego images correctly, then this inevitably also increases the probability of classifying cover images incorrectly as stego images. Thus, we need to compute true positive rates at a given false positive rate in order to compare the performance of different steganalysis methods in a fair manner.

Figure 2.4: The receiver operating characteristic (ROC) curve.

Ker proposed a universal benchmark method to measure the performance of blind steganalysis [26]. It is based on information-theoretic principles using empirically estimated KL divergence and asymptotic behavior. The proposed universal benchmark method eventually produces a single number for each steganalysis method, which represents the performance of blind steganalysis. Thus, this result can be used to rank the performance of different steganalysis methods. This method is also claimed to provide an application-independent measurement of long-term detection capacity. In other words, this method gives the general performance of blind steganalysis without specifying a certain false positive rate. In contrast, if the performance of a binary classifier is measured by true positive rates at a certain false positive rate, then this cannot be considered application independent, because such a measure gives no information about the performance of the binary classifier for applications with false positive rates other than the specified values.

The main goal of blind steganalysis for the warden Eve in the passive warden scenario is to decide whether the data sent from user Alice to user Bob contains a secret message or not. In other words, the goal of blind steganalysis is to make an accurate decision whether an unknown test image is a cover or stego image. Thus, the performance of blind steganalysis is measured by the average detection accuracy given by

$$A_{detect} = 1 - P_{error}, \tag{2.4}$$

where $P_{error}$ is the average error probability.
Before explaining the procedure for calculating the error probability, we introduce the two types of errors made in the decision process, known from statistics: false positives and false negatives. Blind steganalysis tries to minimize these two errors in order to obtain higher detection accuracy. False positives (false alarms) happen when a secret message is detected in a given cover image; statistically, this is the error of rejecting a null hypothesis when it is actually true. In contrast, false negatives (misses) occur when a secret message is not detected in a given stego image; statistically, this is the error of failing to reject a null hypothesis when it is in fact not true.

Table 2.1: Confusion matrix of a binary classifier.

  Decision \ Actual | Cover Image    | Stego Image
  Cover Image       | True Negative  | False Negative
  Stego Image       | False Positive | True Positive

Figure 2.5: Two types of errors in statistics.

Considering these two types of errors, the average error probability $P_{error}$ [19] can be represented as

$$P_{error} = \frac{1}{2} (P_{FP} + P_{FN}), \tag{2.5}$$

where $P_{FP}$ is the probability of false positives and $P_{FN}$ is the probability of false negatives. After calculating the average error probability $P_{error}$, the average detection accuracy $A_{detect}$ can be computed as

$$A_{detect} = \frac{1}{2} (A_{cover} + A_{stego}) = 1 - P_{error} = 1 - \frac{1}{2} (P_{FP} + P_{FN}), \tag{2.6}$$

where $A_{cover}$ represents the detection accuracy for cover images and $A_{stego}$ represents the detection accuracy for stego images. The average detection accuracy $A_{detect}$ can be used to evaluate the performance of blind steganalysis, which is considered to have better performance if the average detection accuracy is higher.

2.4.2 Multi-Classifier

When $L$ different steganographic algorithms are used to create stego images instead of a single steganographic algorithm, we need to decide which steganographic algorithm was used to embed the secret message into a given stego image. In other words, the main goal of a multi-classifier is to decide the image type of a given test image from a set of image types $I$, which consists of $L + 1$ image types:

$$I = \{ I_1 = \text{cover}, \; I_2 = \text{stego}_1, \; \ldots, \; I_{L+1} = \text{stego}_L \}, \tag{2.7}$$

where $I_1$ denotes the cover image and $I_2, \ldots, I_{L+1}$ denote stego images created with the $L$ different steganographic algorithms, respectively. The performance of blind steganalysis for a multi-classifier can be evaluated with a confusion matrix as shown in Table 2.2.

Table 2.2: Confusion matrix of a multi-classifier.

  Decision \ Actual | Cover     | Stego 1   | ... | Stego L
  Cover             | Correct   | Incorrect | ... | Incorrect
  Stego 1           | Incorrect | Correct   | ... | Incorrect
  ...               | ...       | ...       | ... | ...
  Stego L           | Incorrect | Incorrect | ... | Correct

In addition, we can find the overall detection accuracy from the confusion matrix by averaging the detection accuracy over all test images, including cover images and stego images created with the $L$ different steganographic algorithms. The overall detection accuracy of a multi-classifier can be represented as

$$A_{detect} = \frac{1}{L+1} (A_{I_1} + A_{I_2} + \cdots + A_{I_{L+1}}), \tag{2.8}$$

where $A_{I_l}$ represents the detection accuracy of the multi-classifier when the actual images are of image type $I_l$, $l = 1, 2, \ldots, L + 1$.

Chapter 3: Block-Based Image Steganalysis: Algorithm and Performance Evaluation

3.1 Introduction

The goal of image steganography is to embed secret messages in an image so that no one except the intended recipients can detect the presence of the secret messages.
There are many applications for image steganography, such as embedding copyright information in professional images, personal information in photographs of smart IDs (identity cards), and patient information in medical images [4]. With image steganalysis, one tries to detect the presence of secret messages hidden in images. Blind steganalysis attempts to differentiate stego images from cover images without knowledge of the steganographic embedding algorithm [11]. Using features extracted from cover and stego images in a training set, the classifier learns the characteristics of cover and stego images in a multi-dimensional feature space. With the classifier obtained from the training process, blind steganalysis decides whether an unknown image is a cover or stego image.

Previous research in this field has focused on extracting features from images for the purpose of steganalysis [15], [53], [43]. The coefficients in the DCT domain were used to extract features. DCT-based features were proposed in [15] for steganalysis, which exploit the fact that the inter-block dependency between neighboring blocks is often affected by steganographic algorithms. Shi et al. [53] considered the differences between absolute values of neighboring DCT coefficients as a Markov process and proposed 324 features accordingly. This feature set was motivated by the observation that the intra-block correlation among DCT coefficients within the same block can be affected by steganographic embedding. Pevny and Fridrich [43] proposed a set of 274 merged features by combining the Markov-process-based features and DCT-based features together. This merged feature set provides one of the state-of-the-art feature sets for steganalysis.

The characteristics of cover images are known to have an influence on steganalysis [16]. For example, high-frequency images such as texture images usually have a more spread-out statistical distribution of DCT coefficients, which makes steganalysis of high-frequency images more difficult. In contrast, low-pass filter operations such as blurring and denoising will make stego images more detectable by steganalysis. The image size is another factor which influences the performance of steganalysis. The feature values extracted from larger images are expected to yield better steganalysis performance because they tend to be more reliable due to the larger number of statistical samples. In addition, the quality factor of JPEG images is also important because the statistical distributions of DCT coefficients change significantly with different quality factors.

However, little attention has been paid to the characteristics of the cover images, in which a secret message is embedded, for designing a content-adaptive classifier for steganalysis. Intuitively speaking, the effect of steganographic embedding on cover images with similar characteristics is likely to have stronger correlation than that on cover images with different characteristics [47]. On the other hand, natural images consist of heterogeneous regions, which makes it a challenging task to classify them. The above observation leads to the idea of decomposing an image into smaller homogeneous blocks. Then, we can treat each block as a basic image unit for steganalysis. Here, the merged Markov and DCT features in [43] are extracted from image blocks for block-based steganalysis. Based on features from blocks selected by random sampling, tree-structured
For each class, a specic classier can be trained using block features, which represent the characteristics of that block type. For a given unknown test image, instead of making a single decision for the entire image, we repeat the block decomposition process and choose a classier to make a cover or stego decision for each block depending on the block class. Finally, the majority voting rule is used to fuse the decision results from all blocks so that we can decide whether the unknown image is a cover or stego image. To the best of our knowledge, this is the rst work that exploits the homogeneous characteristics of image blocks for steganalysis. In addition, the block-based image steganalysis gives us decision reliability information even when only one unknown image is given, which was not possible with previous works. Besides binary classier scenario, blind steganalysis needs to consider more prac- tical and realistic steganalysis scenarios where stego images are created with dierent steganographic algorithms. In addition to deciding whether a given test image is a cover or stego image, blind steganalysis also needs to determine which steganographic algo- rithm was used to embed a secret message in the case of stego images. This is called a \multi-classier", which is a more dicult task in terms of obtaining correct classication results as opposed to a binary classier. Another contribution made in this chapter is to provide the performance improvement for blind steganalysis with a multi-classier using block-based image steganalysis. In addition, dierent weights are given to block decisions depending on block classes and image types, which improves the decision accuracy of the majority voting rule. As a result, the performance of the block-based approach can be further enhanced for multi- classier scenario. The rest of this chapter is organized as follows. The proposed block-based image steganalysis system consists of the training process and the testing process is presented 26 in Sec. 3.2. Experimental results for binary classier and multi-classier are discussed in Sec. 3.3. Finally, concluding remarks are given in Sec. 3.4. 3.2 Block-Based Image Steganalysis 3.2.1 System Overview The block-diagram of a block-based image steganalysis system is shown in Fig. 3.1. It consists of the training process and the testing process, which will be detailed in the following two subsections, respectively. The training process The system decomposes an image into smaller homogeneous blocks and treats each block as a basic unit for steganalysis. A set of features is extracted from each individual image block and, then, a tree-structured hierarchical clustering technique is used to classify blocks into multiple classes based on extracted features. For each class of blocks, a specic classier can be trained using extracted features which represent the characteristics of that block type. Note that if the number of training blocks is too large, a statistical sampling method can be used to reduce the number of training blocks. The testing process The system performs the same block decomposition and feature extraction tasks on the test image. Then, it classies each image block into one specic block class, and uses its associated classier to make a decision whether the underlying block is a cover/stego block. Finally, a decision fusion step that integrates the decisions of multiple blocks into a single decision for the test image is conducted. 
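To make this flow concrete, a minimal Python sketch of the testing process is given below. It is an illustration only: the function and attribute names (extract_features, predict) and the data structures for the codewords and the per-class classifiers are placeholders rather than the original implementation, and the 274-dimensional merged feature extractor is assumed to be available.

```python
import numpy as np

def classify_block(feature, codewords):
    """Assign a block to the class whose codeword is nearest in feature space."""
    # codewords: (C, 274) array of per-class mean feature vectors
    distances = np.sum((codewords - feature) ** 2, axis=1)
    return int(np.argmin(distances))

def test_image(blocks, extract_features, codewords, classifiers):
    """Block-based steganalysis of one test image (binary cover/stego case).

    blocks           : list of B x B pixel blocks from the test image
    extract_features : callable returning a 274-dim merged feature vector (assumed)
    codewords        : (C, 274) codewords obtained in the training process
    classifiers      : list of C trained per-class classifiers with a predict() method
    """
    votes = {"cover": 0, "stego": 0}
    for block in blocks:
        f = extract_features(block)
        c = classify_block(f, codewords)       # content-adaptive class selection
        label = classifiers[c].predict(f)      # "cover" or "stego" block decision
        votes[label] += 1
    # Majority voting over all block decisions
    decision = "cover" if votes["cover"] > votes["stego"] else "stego"
    return decision, votes
```

The vote counts returned together with the label give the voting difference that is used later as a reliability indicator for a single test image.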
For the block-based image steganalysis, the merged feature set proposed in [43] was extracted from image blocks, the random sampling was adopted as the statistical sam- pling method in the training process, and the majority voting rule was used to fuse 27 decision results from all image blocks. Block-based image steganalysis can be used both for a binary classier [6] and a multi-classier [7]. Figure 3.1: The block-based image steganalysis system. There are two main advantages with the block-based image steganalysis. First, it can oer better steganalysis performance without increasing the number of features. Actually, it provides a methodology to complement traditional frame-based steganalysis research that has focused on the search for more eective features. Second, the block- based steganalysis scheme can provide a more robust detection result for a single test image since the block decomposition step will generate more samples, and each of them can be tested independently. In contrast, the performance of a frame-based steganalysis scheme is highly dependent on the correlation between the test image and the set of training images. If the test image happens to have characteristics that are dierent from those of the training images, the classier obtained from the training process may not work well for the test images. It is worthwhile to emphasize one main dierence of traditional frame-based ste- ganalysis and new block-based steganalysis. That is, only one classier is obtained after the training process in frame-based steganalysis. In contrast, multiple classiers 28 can be adopted for blocks of dierent types for a test image. Intuitively speaking, the content-adaptive classier should provide more accurate steganalysis performance since each classier can focus more on the feature change due to steganographic embedding rather than the feature variation between dierent block types. 3.2.2 Training Process Block Decomposition, Feature Extraction and Random Sampling For the multi-classier, we embed the secret message into a cover image to get its cor- responding stego image for each of L steganographic algorithms. Note that the binary classier is the special case when only one steganographic algorithm (L = 1) is used to create stego images. This process is applied to all cover images to result in image sets consisting of a cover image and the corresponding stego images. Then, we decompose all images of size MN in the training set into smaller homogeneous blocks of size BB (B = 8b;b = 2; 3;:::;min(M;N)=8). The merged DCT and Markov features [43] are extracted from each block of images in the image set. If the number of decomposed image blocks is too large, we may use a random sampling method to select a subset of the image blocks to reduce the classication complexity. For example, for an image of sizeMN, we have aboutAMN=B 2 blocks of sizeBB. If A is too large, we can select a subset of size K randomly. This process is denoted as \random sampling" in Fig. 3.1. In order to get an equal number of sampled blocks from L + 1 image types, K=(L + 1) sample blocks are randomly selected from the cover images along with K=(L + 1) sample blocks that correspond to the same location from the stego images created from L dierent steganographic algorithms. Generally speaking, random sampling is better than sampling in a spatial order, since it allows us to collect blocks with more diversity so that more representative sample blocks can be used in the block classication process. 
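As an illustration of the block decomposition and random sampling steps just described, the following sketch draws co-located blocks from a cover image and its L stego versions. The helper names and the array conventions (2-D grayscale numpy arrays, dimensions divisible by B) are assumptions made only for this example.

```python
import numpy as np

def decompose(image, B):
    """Split an M x N image into non-overlapping B x B blocks."""
    M, N = image.shape
    return [image[i:i + B, j:j + B]
            for i in range(0, M - B + 1, B)
            for j in range(0, N - B + 1, B)]

def sample_training_blocks(cover, stegos, B, samples_per_type, rng=None):
    """Randomly sample co-located blocks from a cover image and its L stego versions.

    cover            : 2-D numpy array (grayscale cover image)
    stegos           : list of L stego images embedded from the same cover
    samples_per_type : number of blocks to draw per image type from this image set
    """
    rng = np.random.default_rng() if rng is None else rng
    cover_blocks = decompose(cover, B)
    stego_blocks = [decompose(s, B) for s in stegos]
    # The same randomly chosen positions are used for every image type,
    # so cover and stego samples stay spatially aligned.
    idx = rng.choice(len(cover_blocks), size=samples_per_type, replace=False)
    sampled = {"cover": [cover_blocks[i] for i in idx]}
    for l, blocks in enumerate(stego_blocks, start=1):
        sampled[f"stego_{l}"] = [blocks[i] for i in idx]
    return sampled
```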
Figure 3.2: Random sampling: (a) sampled blocks in a cover image and (b) the corresponding sampled blocks in a stego image.

Sampled Block Classification

Based on the merged features from the B×B blocks, we would like to classify the K sampled blocks into C different classes, where each class consists of homogeneous blocks. Block classification has been considered in various image processing contexts. The tree-structured vector quantization (TSVQ) technique has been used to classify image blocks using a binary tree structure based on block similarity. We borrow this idea and apply it to our current application. The main difference is that, in the vector quantization context, block similarity is measured by the Euclidean distance between the pixel values of two image blocks. Here, we consider a different criterion as described below.

Following the spirit of TSVQ, we divide the whole set of sampled blocks into two subsets, and repeat the same process within each subset until all blocks within a subset are homogeneous enough. At each classification step, the K-means clustering algorithm is used to partition blocks in the same class, denoted by S, into two sub-classes, denoted by S_1 and S_2, by minimizing the within-cluster sum of energies E(S_1, S_2). Mathematically, this can be written as

E(S_1, S_2) = \sum_{X_i \in S_1} \| X_i - \mu_1 \|^2 + \sum_{X_i \in S_2} \| X_i - \mu_2 \|^2,    (3.1)

where X_1, X_2, \ldots, X_n are the 274-dimensional feature vectors of the n blocks and \mu_i is the mean of the feature vectors in S_i; namely,

\mu_1 = \frac{1}{|S_1|} \sum_{X_i \in S_1} X_i, \quad \mu_2 = \frac{1}{|S_2|} \sum_{X_i \in S_2} X_i.    (3.2)

After classifying the K blocks into C classes, the averaged feature vector in each class is computed, which is called the codeword for that class. This is shown in Fig. 3.3, where feature vectors are marked with X in the 2-dimensional space and codewords are marked with red circles. In vector quantization, the Voronoi region of a specific class is the region in which feature vectors are closer to the codeword of that class than to any other codeword. In other words, all feature vectors in the same Voronoi region are classified into the same class with a specific codeword. The codewords will be used to classify blocks of a test image using the minimum distortion energy criterion in the feature space.

It is worthwhile to point out another difference between our classification scheme and TSVQ. In TSVQ, each intermediate node of the tree, representing a subset of codewords, is split into two sub-classes repeatedly to create a symmetric tree. However, our classification scheme does not demand a symmetric tree. If all blocks within a node are homogeneous enough, we can stop further division. Our stopping criterion is based on the value of E(S_1, S_2); that is, we always split the node whose minimized E(S_1, S_2) value is the largest. The process is repeated until we have C leaves (or classes).

Figure 3.3: Codewords computed as the average of the feature vectors of each class in a 2-dimensional space.

General Block Classification and Classifier Design

After obtaining the C codewords that represent the C classes derived from the K sampled blocks, all B×B blocks in the image sets of the training set are classified into one of the C classes. The classification is based on a distortion measure E_i(f_{I_1}, \ldots, f_{I_{L+1}}), which is defined to be the sum of the L+1 energies measured from the codeword of the ith class:

E_i(f_{I_1}, \ldots, f_{I_{L+1}}) = E_i(f_{I_1}) + \cdots + E_i(f_{I_{L+1}}),    (3.3)

where f_{I_1}, \ldots, f_{I_{L+1}} are the feature vectors of a block from images of type I_1, \ldots, I_{L+1} in Eq. (2.7).
The energy E_i(f_{I_l}) between the 274 merged features of a block from image type I_l (l = 1, 2, \ldots, L+1), f_{I_l}, and the codeword \mu_i of the ith class is given by

E_i(f_{I_l}) = \sum_{k=1}^{274} | f_{I_l,k} - \mu_{i,k} |^2,    (3.4)

where f_{I_l,k} is the kth component of f_{I_l} and \mu_{i,k} is the kth component of \mu_i. After computing E_i(f_{I_1}, \ldots, f_{I_{L+1}}) for i = 1, \ldots, C, the block set from a cover image and the corresponding stego images in the training set is classified into class C_j if E_j(f_{I_1}, \ldots, f_{I_{L+1}}) has the smallest value among all E_i(f_{I_1}, \ldots, f_{I_{L+1}}), 1 ≤ i ≤ C. Using the features of the blocks from the cover and stego images of each class, a specific classifier, such as a linear Bayes classifier, can be obtained for each of the C classes.

The change of feature values after steganographic embedding has more correlation within blocks of the same class than between blocks of different classes [47]. In fact, the effect of steganographic embedding can be obscured by the variation of feature values between different classes. If the decomposed blocks of a given test image can be classified into specific classes, the classifier can more accurately differentiate blocks of stego images from blocks of cover images. In other words, the classifier can focus only on the changes of feature values caused by steganographic embedding instead of being distracted by the variation of feature values between different classes.

3.2.3 Testing Process

Block Classification and Classifier Selection

For a given test image, we perform exactly the same image decomposition and feature extraction as described in the training process. Each block of the test image is classified into a class using the minimum distortion energy. Depending on the class of each block, the corresponding classifier obtained from the training process is applied. We call these content-dependent classifiers since they are adaptively chosen according to the block class. Content-dependent classifiers are useful because the change of feature values after steganographic embedding has higher correlation among blocks of the same class than among blocks of different classes.

Computation of Weights

It is known that blind steganalysis can make more accurate decisions on low-frequency images, such as smooth images, than on high-frequency images, such as texture images [16]. Thus, it is beneficial for the decision fusion process to take into account that decisions from certain blocks are more accurate than those from other blocks.

Before applying the majority voting rule, two types of weights can make block decisions more accurate: 1) weights that depend on different block classes, and 2) weights that depend on different image types. These two types of weights are combined to give the weight of each block decision. Whenever a block is decided to be from a cover image or from a stego image created with one of the L different steganographic algorithms, we can give a weight to this result depending on the class of the block. The classifier for each block class can be used to compute the correct decision rate for all C classes. The block decision accuracy is defined to be the ratio of correct block decisions when blocks are decided to be from a specific type of image. The correct decision rate for the kth class when blocks are decided to be from image type I_l (l = 1, 2, \ldots, L+1) can be represented as

W_{b=I_l}(k) = P(\text{actual} = I_l \mid \text{decide} = I_l),    (3.5)

where P(\text{actual} = I_l \mid \text{decide} = I_l) is the probability that blocks are actually from image type I_l when they are decided to be from image type I_l.
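As an illustration, the class-dependent weight of Eq. (3.5) can be estimated on the training set as a per-class conditional frequency. The sketch below assumes a hypothetical record format (block class, decided type, actual type) for the block decisions collected from the trained per-class classifiers; it is not the original bookkeeping code.

```python
from collections import defaultdict

def block_decision_weights(records, num_classes, image_types):
    """Estimate W_{b=I_l}(k) = P(actual = I_l | decide = I_l) for every block class k.

    records     : iterable of (block_class, decided_type, actual_type) tuples,
                  gathered by running the per-class classifiers on training blocks
    image_types : list such as ["cover", "stego_1", ..., "stego_L"]
    Returns a dict keyed by (block class k, decided image type I_l).
    """
    decided = defaultdict(int)   # (k, I_l) -> number of blocks decided as I_l
    correct = defaultdict(int)   # (k, I_l) -> number of those that are actually I_l
    for k, dec, actual in records:
        decided[(k, dec)] += 1
        if dec == actual:
            correct[(k, dec)] += 1
    weights = {}
    for k in range(num_classes):
        for t in image_types:
            n = decided[(k, t)]
            # If a class never produces a decision of type t, fall back to 0.
            weights[(k, t)] = correct[(k, t)] / n if n > 0 else 0.0
    return weights
```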
This measure is different from the detection accuracy in that the detection accuracy is the ratio of correct image decisions when the actual image is of a specific image type. In addition, we can take advantage of detection accuracy results obtained on images in the training set and give weights to block decisions in the decision fusion process. After obtaining the C classifiers with images in the training set, we can use the same images, instead of those in the testing set, to get detection accuracy results. If the detection accuracy of a specific type of image is relatively low, it is beneficial to give a higher weight to block decisions of this image type. This is because the classifier can make more accurate decisions if higher weights are given to correct block decisions of a specific image type. For example, if the detection accuracy of cover images is relatively low, then it is beneficial to give more weight to block decisions when blocks are decided to be from cover images. In contrast, if many images are incorrectly classified as a specific type of image with a high probability, we should give a lower weight to block decisions of this image type. For example, if stego images are frequently misclassified as cover images, it is better to give a lower weight to cover block decisions to improve the performance of the steganalysis. This is because fewer blocks will then be classified as cover blocks, which reduces the probability of stego images being incorrectly classified as cover images. The weights for block decisions of image type I_l, l = 1, 2, \ldots, L+1, can be represented as

W_{i=I_l} = [1 - A_{I_l}] + [1 - P_e(\text{decide} = I_l \mid \text{actual} \neq I_l)],    (3.6)

where A_{I_l} is the detection accuracy of image type I_l in Eq. (2.8) and P_e(\text{decide} = I_l \mid \text{actual} \neq I_l) is the error probability of deciding on image type I_l when this is not true.

Finally, we compute the weights for blocks by integrating the weights of the block classes and the weights of the image types. The weights for blocks with different block classes and image types can be represented as

\tilde{W}_{b=I_l}(k) = W_{i=I_l} \cdot W_{b=I_l}(k),    (3.7)

where \tilde{W}_{b=I_l}(k), l = 1, 2, \ldots, L+1, represents the weight for blocks decided to be from image type I_l in the kth class. These weights are used to determine the importance of block decisions in the majority voting rule. In other words, these weights represent the reliability of block decisions. Note that these weights are expected to be more beneficial for the multi-classifier than for the binary classifier.

Note that the weights are computed from training images only, without using testing images. After obtaining the classifiers for the different block type classes in the training process, images in the training set are used to measure the performance of these classifiers in terms of block decision accuracy.

Decision Fusion by Majority Voting

Each M×N test image consists of MN/B² blocks of size B×B. Based on the classifier for each class, we make a decision on whether each block is a block from a cover image or from a stego image created with one of the L steganographic algorithms. Each decision is weighted by \tilde{W}_{b=I_l}(k), which determines the importance of the decision. Thus, the total number of weighted decisions made is equal to MN/B². After making the MN/B² weighted decisions for a test image, a majority voting rule is adopted to make the final decision on whether the given test image is a cover image or a stego image created from one of the L steganographic algorithms.
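A minimal sketch of this weighted fusion step is given below; the decision list and the weight table are hypothetical structures holding, for each block, its class, its decided image type, and the combined weight of Eq. (3.7). Setting all weights to 1 recovers plain, unweighted majority voting.

```python
def fuse_block_decisions(block_decisions, combined_weights, image_types):
    """Weighted majority voting over the MN/B^2 block decisions of one test image.

    block_decisions  : list of (block_class, decided_type) pairs
    combined_weights : dict mapping (block_class, decided_type) to the
                       combined weight of Eq. (3.7)
    image_types      : ["cover", "stego_1", ..., "stego_L"]
    """
    score = {t: 0.0 for t in image_types}
    for k, decided in block_decisions:
        score[decided] += combined_weights.get((k, decided), 1.0)
    # The image type with the largest accumulated weight wins.
    return max(score, key=score.get), score
```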
The nal decision can be made by selecting a cover image or a stego image with a specic steganographic algorithm which has the largest sum of weights. If we do not consider weights for block decisions, the nal decision can be made by selecting a cover image or a stego image with a specic steganographic algorithm which has the largest number of block decisions. For the binary classier, it is declared a cover (or a stego) image if the number of cover blocks is larger (or smaller) than that of stego blocks. 3.2.4 Comparison with Traditional Steganalysis We would like to emphasize the main dierence of the proposed steganalysis and the traditional steganalysis. Although steganographic embedding will have a dierent impact on features of dierent classes, only a single classier is obtained after the training process of all images in the training set in traditional steganalysis. In the proposed steganalysis, we can adopt dierent classiers for blocks of dierent types in test images, and the content-adaptive classier can determine with higher accuracy whether each block is 36 a block from a cover image or a stego image created with one of L steganographic algorithms. This is because the classier can focus more on changes of feature values due to steganographic embedding rather than the variation of feature vectors between dierent classes. 3.3 Experimental Results The performance improvement in block-based image steganalysis for the binary classier and the multi-classier will be illustrated in this section. For the benchmarking purpose, we will compare this result with the performance of blind steganalysis using merged DCT and Markov features [43], which is the state-of-the-art work in the eld. 3.3.1 Binary Classier with Block-Based Image Steganalysis Experimental Setting In the experiment, we considered training and testing images of dimension MN = 384 512 and decomposed each image into 48 blocks of size BB = 64 64. After extracting 274 merged features from each block,K = 20; 000 sample blocks were selected from cover and stego images in the training set by random sampling. These sample blocks were classied into C = 16 classes. For the classier design, 16 dierent linear Bayes classiers were obtained for 16 classes with regulation parameter R =S = 0:001. Note that weights in block decisions as a function of block classes and image types are not considered in the binary classier. Image Database The uncompressed colour image database (UCID) [51] was used as the cover images in the training set. The INRIA Holidays dataset [23] was used for the cover images in the testing set. The UCID image database consists of 1338 images, and the Holidays image database has 1491 images, which have diverse subjects such as natural scenes and 37 articial objects. Although the original images were color images of dierent sizes, all images have been changed into 384 512 gray-level images and saved as JPEG les with a quality factor 85 with the JPEG compression. Figure 3.4: Sample images from the Uncompressed Colour Image Database (UCID) and the INRIA Holidays dataset. Steganographic Embedding After obtaining cover images from the image databases, the model-based steganography (MBS) method [50] and the perturbed quantization (PQ) method [18] were used to embed a secret message into the cover images to create the corresponding stego images. While the MBS method uses the original JPEG images obtained with a quality factor of 85 for cover images, the PQ method demands double-compressed JPEG images. 
In our experiment, the original JPEG images were compressed once again with a quality factor of 70 for the PQ method. As dierent images may have dierent embedding capacity, the embedding strength for each image is measured in units of BPC (bits per non-zero DCT AC coecients). This is because nonzero DCT AC coecients are used as candidate embedding locations in many steganographic embedding algorithms. In other words, it is not secure to embed a secret message into DC coecients and AC coecients with the 38 zero value. This allows us to embed a similar amount of a secret message considering the embedding capacity regardless of dierent image characteristics. In our experiments, ve embedding rates (0.05, 0.1, 0.2, 0.3, and 0.4 BPC) were tested for both the MBS and the PQ methods. Performance of Block-Based Image Steganalysis The detection accuracy of the proposed block-based image steganalysis is reported in this subsection. For the benchmarking purposes, detection accuracy of the merged features in [43] using the linear Bayes classier is also given. This frame-based approach is referred to as Pevny's method. The performance of these two methods is shown in Table 3.1 and Fig. 3.5. As the PQ embedding method is known to be a more robust method than the MBS embedding method, we see that the detection accuracy of both steganalysis methods is signicantly higher with MBS. In addition, we observe that the detection accuracy improves with higher embedding rates since it becomes easier to dierentiate stego images from cover images when a larger amount of hidden messages are embedded. Next, we compare the proposed block-based image steganalysis and the Pevny's method. The proposed steganalysis has better detection accuracy than Pevny's method regardless of steganographic algorithms and embedding rates. The maximum performance improvement of the proposed method over Pevny's method is close to 15% for the MBS method with an embedding rate of 0:20 BPC. Decision Reliability Another important issue to point out is that the block-based image steganalysis oers the reliability information of a decision. Here, decision reliability is dened to be the ratio of the number of correct decisions and the total number of decisions. Intuitively speaking, the voting dierence between the number of cover blocks and stego blocks serves as such an indicator. That is, if the voting dierence is larger, the decision should be more reliable. We show the relationship between decision reliability and the voting dierence 39 Table 3.1: Performance comparison of Pevny's method and the proposed block-based image steganalysis. Steganography BPC Pevny's Proposed MBS 0.05 55.94 65.79 MBS 0.10 62.58 75.42 MBS 0.20 74.75 89.57 MBS 0.30 83.37 95.00 MBS 0.40 89.34 98.09 PQ 0.05 55.37 58.22 PQ 0.10 55.70 60.36 PQ 0.20 56.04 63.65 PQ 0.30 57.08 66.50 PQ 0.40 58.12 69.42 in Table 3.2, which is obtained using the MBS method with an embedding rate of 0:20 BPC. It is clear that detection reliability improves with larger voting dierence. The decision reliability increases from 77:98% with 0 5 voting dierence to 99:76% with 21 48 voting dierence. Note that the traditional frame-based steganalysis cannot provide the measure of detection reliability for a single test image. 
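As an illustration, reliability-versus-voting-difference statistics of this kind can be tabulated with a short script such as the one below. The inputs (per-image cover/stego block vote counts and ground-truth labels) are hypothetical, and the bin edges follow Table 3.2 for 48 blocks per image.

```python
def reliability_by_voting_difference(cover_votes, stego_votes, truths, bins):
    """Group binary image decisions by |#cover blocks - #stego blocks| and
    report the fraction of correct decisions in each voting-difference bin.

    cover_votes, stego_votes : per-image block vote counts
    truths                   : ground-truth labels, "cover" or "stego"
    bins                     : list of (low, high) voting-difference ranges
    """
    stats = {b: [0, 0] for b in bins}            # bin -> [correct, total]
    for c, s, truth in zip(cover_votes, stego_votes, truths):
        decision = "cover" if c > s else "stego"
        diff = abs(c - s)
        for low, high in bins:
            if low <= diff <= high:
                stats[(low, high)][1] += 1
                stats[(low, high)][0] += int(decision == truth)
                break
    return {b: (corr / tot if tot else 0.0) for b, (corr, tot) in stats.items()}

# Example with the bins of Table 3.2:
# reliability_by_voting_difference(cv, sv, y, [(0, 5), (6, 10), (11, 15), (16, 20), (21, 48)])
```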
Table 3.2: Decision reliability with voting difference.

Voting Difference   Correct Decisions   Incorrect Decisions   Decision Reliability (%)
0 - 5               432                 122                   77.98
6 - 10              717                 73                    90.76
11 - 15             541                 15                    97.30
16 - 20             656                 6                     99.09
21 - 48             419                 1                     99.76

3.3.2 Multi-Classifier with Block-Based Image Steganalysis

Experimental Setting

In the experiment, we used training and testing images of dimension M×N = 384×512 from the uncompressed colour image database (UCID) and the INRIA Holidays dataset, similar to the case of the binary classifier. Then, each image was decomposed into 48 blocks of size B×B = 64×64. After extracting 274 merged features from each block, K = 20,000 sample blocks were selected by random sampling from the cover images and the corresponding stego images created with L = 3 different steganographic algorithms in the training set. These sample blocks were classified into C = 8 classes. For the classifier design, 8 different linear Bayes classifiers were obtained for the 8 classes with regularization parameters R = S = 0.001.

Steganographic Embedding

After obtaining the cover images from the image databases, L = 3 different steganographic algorithms were used to embed a secret message into the cover images to create the corresponding stego images: OutGuess (OG) [45], F5 [59], and model-based steganography (MBS) [50]. As different images may have different embedding capacities, the embedding strength for each image is measured in units of BPC (bits per non-zero DCT AC coefficient). In our experiments, two different embedding rates (namely, 0.2 and 0.3 BPC) were tested for OutGuess, F5, and MBS.

Performance of Block-Based Image Steganalysis

The detection accuracy of the proposed block-based image steganalysis for a multi-classifier is reported in this subsection. For benchmarking purposes, the detection accuracy of a multi-classifier using the merged features [43] (with a linear Bayes classifier) is also given. This frame-based approach is referred to as Pevny's method. The max-wins strategy was not used for Pevny's method in order to make a fair comparison with the proposed method. The multi-classifier for both methods has four possible classification results: cover image, and stego image created from OutGuess (OG), F5, or MBS.

Table 3.3: Confusion matrices of Pevny's method and the proposed method with BPC = 0.3 (rows: decision; columns: actual image type; entries in %).

Pevny's Method, Detection Accuracy = 76.79%
Decision \ Actual   Cover   OutGuess   F5      MBS
Cover               81.76   6.84       5.90    20.46
OutGuess            1.48    76.73      0.94    3.82
F5                  3.55    2.15       78.67   5.70
MBS                 13.21   14.29      14.29   70.02

Proposed Method, Detection Accuracy = 86.23%
Decision \ Actual   Cover   OutGuess   F5      MBS
Cover               77.33   0.00       8.58    3.02
OutGuess            1.01    94.90      1.01    4.43
F5                  16.30   0.74       86.32   6.17
MBS                 5.37    4.36       4.09    86.38

Table 3.4 shows the confusion matrices of Pevny's method and the proposed method in the case of BPC = 0.2. While the detection accuracy of cover images and of stego images from F5 decreased by around 10% with the proposed method, that of stego images created from OutGuess and MBS improved by more than 20%. When test images were embedded with OutGuess, the detection accuracy improved the most, by 25.9%. The overall detection accuracy improved from 63.63% to 70.93% with the proposed method, which is a 7.3% improvement.

Table 3.3 shows the confusion matrices of Pevny's method and the proposed method in the case of BPC = 0.3. We observe that the overall detection accuracy is higher for the case of BPC = 0.3, since it becomes easier to classify test images into the four different classes when a longer secret message is embedded.
The detection accuracy of all the stego images improved by 8-18% with the proposed method. Similar to the case of BPC = 0.2, the detection accuracy improved the most, by 18.2%, when test images were embedded using OutGuess. The overall detection accuracy improved from 76.79% to 86.23% with the proposed method, which is a 9.4% improvement.

Table 3.4: Confusion matrices of Pevny's method and the proposed method with BPC = 0.2 (rows: decision; columns: actual image type; entries in %).

Pevny's Method, Detection Accuracy = 63.63%
Decision \ Actual   Cover   OutGuess   F5      MBS
Cover               72.30   16.16      10.13   28.71
OutGuess            2.41    57.14      2.55    5.77
F5                  5.70    3.76       67.61   8.05
MBS                 19.58   22.94      19.72   57.48

Proposed Method, Detection Accuracy = 70.93%
Decision \ Actual   Cover   OutGuess   F5      MBS
Cover               63.25   0.27       21.93   4.63
OutGuess            2.35    83.03      3.49    8.05
F5                  20.39   2.01       58.82   8.72
MBS                 14.02   14.69      15.76   78.60

As shown in the experimental results, the proposed method offers a significant improvement over Pevny's method for both BPC = 0.2 and BPC = 0.3. This is possible because the block-based approach takes advantage of the characteristics of images that are decomposed into smaller homogeneous blocks. This allows the block-based approach to achieve better training with the cover and stego images in the training set. Depending on the block classes, different classifiers are designed to exploit the characteristics of blocks in different classes. In addition, weights derived from different block classes and image types are used in the majority voting rule, which improves the overall detection accuracy of the proposed method.

3.4 Conclusion

A block-based image steganalysis was proposed in this chapter, where different classifiers were adopted for different block classes based on their characteristics. It was shown by experimental results that the proposed method offers a significant improvement in detection accuracy for both the binary classifier and the multi-classifier as compared with previous work using a frame-based approach. In addition, block-based image steganalysis can provide decision reliability information even when only one test image is given. Finally, a weighting process for block decisions based on block classes and image types was proposed to improve the performance of block-based image steganalysis.

Figure 3.5: The performance comparison of the frame-based approach (Pevny's method) and the block-based approach (the proposed block-based image steganalysis) for different steganographic algorithms: (a) model-based steganography and (b) perturbed quantization.

Chapter 4

Performance Study on Block-Based Image Steganalysis

4.1 Introduction

Most previous work on image steganalysis focused on extracting features from images for steganalysis and then used a binary classifier to differentiate stego images from cover images. The research objective was to find a better feature set to improve the steganalysis performance. Fridrich [15] proposed the use of DCT features for steganalysis by exploiting the fact that the inter-block dependency between neighboring blocks is often affected by steganographic algorithms. Shi et al. [53] proposed to use Markov features, where the differences between absolute values of neighboring DCT coefficients are modeled as a Markov process. Pevný and Fridrich [43] proposed a set of 274 merged features by combining the DCT features and the Markov features together. The number of features was increased further by researchers in recent years to achieve better steganalysis performance. Some examples are given below. Chen et al.
[5] proposed a set of updated Markov features (486 features) by considering both intra-block and inter-block correlations among DCT coecients of JPEG images. Kodovsk y et al. [31] examined a set of updated merged features (548 features) using the concept of Carte- sian calibration. Pevn y et al. [42] used higher order Markov models to model dierences between neighboring pixels in the spatial domain and developed a subtractive pixel adja- cency model feature set (686 features). More recently, Kodovsk y et al. [32] introduced the cross-domain feature set (1234 features), which considers features from the spatial domain and the DCT domain at the same time. 46 A dierent path to image steganalysis was explained in Chapter 3, which is known as block-based image steganalysis. Since an input image typically consists of heterogeneous regions, the block-based image steganalysis decomposes an image into smaller homoge- neous blocks and treats each block as a basic unit for steganalysis. By exploiting the property that the eect of steganographic embedding on similar image blocks is likely to have stronger correlation [47], the characteristics of homogeneous blocks can be used to design dierent content-adaptive classiers for dierent block types. While the frame- based approach extracts a set of features from an image, the block-based approach takes advantage of rich information of images by extracting a set of features from each individ- ual block of an image. Finally, the steganalysis of the whole image can be conducted by fusing steganalysis results of all blocks through a voting process. The block-based image steganalysis was shown to have better steganalysis performance than the frame-based approach in both the binary classier and the multi-classier. In this chapter, we aim to study the performance of block-based image steganalysis as a function of the block size and the block number. First, we analyze the dependence of steganalysis performance on each individual factor, and show that a larger block size and a larger block number will lead to better steganalysis performance. Our analysis will be veried by experimental results. For a given test image, there exists a trade-o between the block size and the block number. To exploit both eectively, we propose to use overlapping blocks to improve steganalysis performance furthermore. The rest of this chapter is organized as follows. First, the relationship between block size and block number is examined in Sec. 4.2.1. The performance of block-based image steganalysis as a function of the block size is studied in Sec. 4.2.2. The relationship between block decision accuracy and image decision accuracy is analyzed in Sec. 4.2.3. Then, the performance of block-based image steganalysis as a function of the block size is studied in Sec. 4.2.4. The idea of overlapping blocks is introduced in Sec. 4.2.5. Experimental results are presented in Sec. 4.3. Furthermore, additional performance improvement of block-based image steganalysis with dierent block class numbers and 47 dierent classiers will be presented in Sec. 4.3. Finally, concluding remarks and sug- gestions for future work are given in Sec. 4.4. 4.2 Analysis of Block Size, Block Number, and Block Overlapping Eects There exists a relationship between the block size and the block number for a given image. If the block size is smaller, there are more blocks. We may ask \what is the best block decomposition strategy?". In this chapter, we would like to study this problem in depth. 
We will first examine the non-overlapping block case [8], where blocks do not overlap with neighboring blocks after image decomposition. Then, in the last subsection, we will consider the overlapping block case, which provides a larger block number for block-based image steganalysis.

4.2.1 The Relationship between Block Size and Block Number

Without loss of generality, we assume an image of size I×I (pixels), a block of size B×B (pixels), and I = MB with M being a positive integer. Then, we have

B^2 M^2 = I^2, \quad M^2 = N,    (4.1)

where N is the number of blocks (or, simply, the block number) in this image. This relationship between the block number, N, and the block size, B², is shown in Fig. 4.1 for an image of size 512×512.

Figure 4.1: The relationship between block number N and block size B² for an image of size 512×512, which follows a curve of the form N = 512² / B².

4.2.2 Analysis of Block Size Effect

We study the block size effect for a fixed block number in this subsection. Intuitively speaking, a larger block size should give better steganalysis performance. To understand the block size effect, we analyze the distribution of feature vectors. If the feature vectors of the cover and stego image blocks are more concentrated, it is easier to design a classifier with higher discriminative power, which leads to better steganalysis performance.

Among the 274 merged features in [43], we observe that the blockiness features have the largest standard deviations. Thus, we will focus on them in our analysis. There are two blockiness features B_\alpha with \alpha = 1, 2, which measure the inter-block dependency of the JPEG image over all DCT modes between neighboring 8×8 DCT blocks. They are defined as [43]

B_\alpha = \frac{ C_W(\alpha) + C_H(\alpha) }{ W \lfloor (H-1)/8 \rfloor + H \lfloor (W-1)/8 \rfloor },    (4.2)

where H and W are the height and the width of the input image in pixels, and

C_W(\alpha) = \sum_{i=1}^{\lfloor (H-1)/8 \rfloor} \sum_{j=1}^{W} | c_{8i,j} - c_{8i+1,j} |^\alpha,    (4.3)

C_H(\alpha) = \sum_{j=1}^{\lfloor (W-1)/8 \rfloor} \sum_{i=1}^{H} | c_{i,8j} - c_{i,8j+1} |^\alpha,    (4.4)

where c_{i,j} is the gray value of the (i, j)-th pixel in the JPEG image. These features are traditionally extracted from each image frame, but they are computed from image blocks in the proposed scheme.

Consider an image block that consists of n neighboring DCT block pairs in both the horizontal and vertical directions. Let F_i be the feature value extracted from the ith neighboring DCT block pair. Then, the feature value extracted from the image block, \bar{F}, can be written as

\bar{F} = \frac{1}{n} \sum_{i=1}^{n} F_i.

It is a sample mean of feature values from neighboring DCT block pairs. For the blockiness features B_\alpha, C_W(\alpha) and C_H(\alpha) represent feature values from neighboring block pairs in the vertical and horizontal directions, respectively. Furthermore, by assuming that F_i is an independently and identically distributed (i.i.d.) random variable with mean m and variance \sigma^2, we can obtain the mean and the standard deviation of \bar{F} as

E[\bar{F}] = m \quad \text{and} \quad Std[\bar{F}] = \frac{\sigma}{\sqrt{n}}.    (4.5)

In words, the mean of \bar{F} is the same as the mean of F_i, while its standard deviation is reduced by a factor of 1/\sqrt{n}. If the block size becomes larger (i.e., a larger value of n), the number of DCT blocks in the image stays the same but the number of DCT blocks in each image block increases. Then, the standard deviations of the feature values become smaller, and it is easier to design a classifier that differentiates stego images from cover images. Note that the feature values also go through a calibration process [43] to improve their sensitivity to steganographic embedding.
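For reference, the blockiness features of Eqs. (4.2)-(4.4) can be computed directly from the decoded pixel values. The numpy sketch below is meant only to illustrate the definition (it assumes a 2-D grayscale array with height and width larger than 8), not the full 274-feature extractor of [43].

```python
import numpy as np

def blockiness(pixels, alpha):
    """Blockiness feature B_alpha of Eq. (4.2) for a grayscale pixel array.

    Assumes pixels.shape = (H, W) with H, W > 8 so that at least one
    8x8 block boundary exists in each direction.
    """
    x = pixels.astype(np.float64)
    H, W = x.shape
    # Differences across horizontal 8x8 block boundaries (rows 8, 16, ... in 1-based indexing).
    rows = np.arange(8, H, 8)
    c_w = np.sum(np.abs(x[rows - 1, :] - x[rows, :]) ** alpha)
    # Differences across vertical 8x8 block boundaries (columns 8, 16, ...).
    cols = np.arange(8, W, 8)
    c_h = np.sum(np.abs(x[:, cols - 1] - x[:, cols]) ** alpha)
    denom = W * ((H - 1) // 8) + H * ((W - 1) // 8)
    return (c_w + c_h) / denom
```

For instance, blockiness(block, 1) and blockiness(block, 2) correspond to the two B_\alpha values whose standard deviations are reported in Table 4.1 below.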
Since the statistical properties of the DCT coefficients remain about the same after the calibration process, the analytical result in Eq. (4.5) still holds after calibration.

We conducted experiments to verify the relationship between the standard deviations of the blockiness features and the block size as derived above. The results are shown in Table 4.1, where the block size is chosen to be 64×64, 128×128, and 256×256. The blockiness features are extracted from horizontally and vertically neighboring image block pairs in 200 JPEG images. As shown in Table 4.1, the standard deviations of the blockiness features decrease with an increased block size, which is approximated well by the relationship in Eq. (4.5). Clearly, larger block sizes result in higher discriminative power of the extracted features.

Table 4.1: The standard deviations of the blockiness (B_1, B_2) features with different block sizes (B×B).

Block Size   Std. of B_1   Std. of B_2
64x64        1.98          187.87
128x128      1.09          95.55
256x256      0.67          56.98

4.2.3 The Relationship between Block Decision Accuracy and Image Decision Accuracy

In this subsection, we would like to find the relationship between block decision accuracy and image decision accuracy. We compute the image decision accuracy from block decisions using the majority voting rule for two different cases: the non-identical block decision accuracy case and the identical block decision accuracy case.

Non-Identical Block Decision Accuracy Case

In general, the block decision accuracy of each block type class is not identical because each class consists of blocks with different characteristics. Thus, we need to assign a different block decision accuracy to each block type class, which is referred to as the "non-identical case". Before finding the general formula for the image decision accuracy (P), we consider a simple example in which a given image is decomposed into N = 5 smaller blocks. Here, we assume that there are 2 block type classes, C_1 and C_2. For each block class, the block decision accuracy is denoted by p_1 and p_2, respectively. The number of blocks classified into the C_1 and C_2 classes is denoted by N_1 and N_2, and among the N_1 and N_2 blocks in the C_1 and C_2 classes, the number of correct decisions is denoted by n_1 and n_2.

In order to get a correct majority voting result for a given image, we need at least 3 correct decisions out of the 5 block decisions. The image decision accuracy (P) can be computed as

P = P(X \geq 3) = 1 - P(X \leq 2) = 1 - [ P(X=0) + P(X=1) + P(X=2) ],    (4.6)

where the random variable X represents the number of correct block decisions. In this equation, we need to compute 3 terms: P(X=0), P(X=1), and P(X=2).

First, we consider the probability P(X=0), when there is no correct block decision among the 5 block decisions. The probability P(X=0) can be represented as

P(X=0) = (1-p_1)^{N_1} (1-p_2)^{N_2}.    (4.7)

When there is only one correct decision among the 5 block decisions, this correct decision can be from either class C_1 or class C_2. Then, the probability P(X=1) can be represented as

P(X=1) = \binom{N_1}{1} p_1 (1-p_1)^{N_1 - 1} (1-p_2)^{N_2} + (1-p_1)^{N_1} \binom{N_2}{1} p_2 (1-p_2)^{N_2 - 1}.    (4.8)

When there are 2 correct decisions, both decisions can be from the same class, C_1 or C_2; otherwise, one decision is from class C_1 and the other from class C_2. Then, the probability P(X = 2) can be represented as follows.
P(X=2) = \binom{N_1}{2} p_1^2 (1-p_1)^{N_1 - 2} (1-p_2)^{N_2} + (1-p_1)^{N_1} \binom{N_2}{2} p_2^2 (1-p_2)^{N_2 - 2} + \binom{N_1}{1} p_1 (1-p_1)^{N_1 - 1} \binom{N_2}{1} p_2 (1-p_2)^{N_2 - 1}.    (4.9)

In fact, the above equations can be written in the general form P(X = n) as

P(X = n) = \sum_{n_1 + n_2 = n} \prod_{k=1}^{C=2} \binom{N_k}{n_k} p_k^{n_k} (1-p_k)^{N_k - n_k},    (4.10)

where \sum_{n_1 + n_2 = n} represents the summation over all possible combinations of n_1 and n_2 satisfying the condition n_1 + n_2 = n. If we add all 3 terms together, we can write P(X \leq 2) as

P(X \leq 2) = P(X=0) + P(X=1) + P(X=2) = \sum_{n=0}^{2} P(X = n) = \sum_{n=0}^{2} \left[ \sum_{n_1 + n_2 = n} \prod_{k=1}^{C=2} \binom{N_k}{n_k} p_k^{n_k} (1-p_k)^{N_k - n_k} \right].    (4.11)

Finally, we can compute the image decision accuracy P when a given image is decomposed into N = 5 blocks as

P = P(X \geq 3) = 1 - P(X \leq 2) = 1 - \sum_{n=0}^{2} P(X = n) = 1 - \sum_{n=0}^{2} \left[ \sum_{n_1 + n_2 = n} \prod_{k=1}^{C=2} \binom{N_k}{n_k} p_k^{n_k} (1-p_k)^{N_k - n_k} \right].    (4.12)

The above derivation can be extended to the case where there are N blocks in a given image. We assume that there are C different block type classes. The number of blocks classified into each block class can be represented as N_1, N_2, \ldots, N_C, which satisfy

N = N_1 + N_2 + \cdots + N_C.    (4.13)

The probability of making a correct image decision from the N block decisions, P(X \geq (N+1)/2), can be generalized as

P(X \geq (N+1)/2) = 1 - P(X \leq (N-1)/2) = 1 - \sum_{n=0}^{(N-1)/2} P(X = n) = 1 - \sum_{n=0}^{(N-1)/2} \left[ \sum_{n_1 + \cdots + n_C = n} \prod_{k=1}^{C} \binom{N_k}{n_k} p_k^{n_k} (1-p_k)^{N_k - n_k} \right],    (4.14)

where \sum_{n_1 + \cdots + n_C = n} represents the summation over all possible combinations of n_1, n_2, \ldots, n_C satisfying the condition n_1 + n_2 + \cdots + n_C = n for each n value in P(X = n). This result shows the relationship between the image decision accuracy and the block decision accuracy when each block type class has a different block decision accuracy value (the non-identical block decision accuracy case).

Identical Block Decision Accuracy Case

We also find the relationship between block decision accuracy and image decision accuracy when the block decision accuracy is identical across the different block type classes. Suppose that a test image is decomposed into N blocks, and the stego/cover decision is made for each individual block based on the extracted features. Then, the majority voting rule is adopted in the testing process to fuse these N block decisions. If N is an odd number, we need at least (N+1)/2 correct decisions in order to obtain a correct majority voting result. Thus, the probability of making a correct decision for the test image, P, can be expressed as

P = P(X \geq (N+1)/2) = 1 - P(X \leq (N-1)/2),    (4.15)

where the random variable X denotes the number of correct block decisions. If the random variable of making a correct decision for each block is i.i.d., the cumulative distribution function of obtaining less than or equal to k correct decisions from the N block decisions can be expressed as

P(X \leq k) = F(k; N, p) = \sum_{i=0}^{k} \binom{N}{i} p^i (1-p)^{N-i},    (4.16)

where p is the probability of a correct decision for each block. Clearly, the probability of a correct decision, P, for the test image is closely related to the probability of a correct block decision, p, as well as the number of block decisions, N.

4.2.4 Analysis of Block Number Effect

In this subsection, we study the block number effect for a fixed block size.
Intuitively speaking, the performance of the block-based steganalysis should be better if more blocks are involved in the decision process. This will be demonstrated below.

Consider a test image that consists of N blocks, and the cover/stego decision is made for each individual block based on the extracted features, and the majority voting rule is adopted in the testing process to fuse these N block decisions. If N is an odd number, we need at least (N+1)/2 correct decisions in order to obtain a correct majority voting result. Then, the probability of making a correct decision for the test image can be expressed as

P = P(X \geq (N+1)/2) = 1 - P(X \leq (N-1)/2),    (4.17)

where X is a random variable denoting the number of correct block decisions. If the random variable of making a correct decision for each block is i.i.d., the cumulative distribution function of obtaining less than or equal to k correct decisions from N block decisions can be expressed as

P(X \leq k) = F(k; N, p) = \sum_{i=0}^{k} \binom{N}{i} p^i (1-p)^{N-i},    (4.18)

where p is the probability of a correct decision for each block. Clearly, the probability of correct decision, P, for the test image is closely related to the probability of correct block decision, p, as well as the number of block decisions, N. This relationship between P and N, parameterized by a fixed value of p, will be examined below.

By using the Hoeffding inequality

F(k; N, p) \leq \exp\left( -2 \frac{(Np - k)^2}{N} \right),    (4.19)

we can determine the upper bound of the cumulative distribution function in Eq. (4.18) as

P(X \leq (N-1)/2) = F((N-1)/2; N, p) \leq \exp\left( -2 \frac{(Np - (N-1)/2)^2}{N} \right).    (4.20)

For the majority voting rule to work properly, p should be greater than 0.5 (50%), or

p = 0.5 + \varepsilon \quad (0 < \varepsilon < 0.5).    (4.21)

The limit of the exponential term in Eq. (4.20) can be computed as

\lim_{N \to \infty} \exp\left( -2 \frac{(Np - (N-1)/2)^2}{N} \right) = \lim_{N \to \infty} \exp\left( -2 \left( \varepsilon^2 N + \frac{1}{4N} + \varepsilon \right) \right) = 0.    (4.22)

The above equation, together with Eq. (4.20), leads to

\lim_{N \to \infty} P(X \geq (N+1)/2) = 1 - \lim_{N \to \infty} P(X \leq (N-1)/2) = 1.    (4.23)

This means that the probability of making a correct decision from N block decisions, P, using the majority voting converges to 1 (100% detection accuracy) as the block number, N, goes to infinity. We plot the image decision accuracy, P, as a function of the block number, N, parameterized by the p value using the majority voting rule in Fig. 4.2, where p = 51%, 55%, 60%. As shown in the figure, we get a higher decision accuracy for a test image if we have a larger block number. In practice, the block decision is not an independent event, and the block decision accuracy, p, is not identical since it depends on the block class (e.g., smooth, edged, and textured regions). Although being over-simplified, the above analysis does provide a general trend.

Figure 4.2: The image decision accuracy (P) as a function of the block number (N), parameterized by the block decision accuracy p = 51%, 55%, 60%.

4.2.5 Analysis of Block Overlapping Effect

Although it is beneficial to have a large block size and a large block number for the block-based image steganalysis, there exists a trade-off between the block size and the block number for image decomposition with non-overlapping blocks. The use of overlapping blocks provides an alternative to increase the block number for a fixed image size.

Figure 4.3: Illustration of the overlap size (O) and the step size (S) for the overlapping block case.

For overlapping blocks, the step size is used to measure the degree of overlap between two neighboring overlapping blocks in both the horizontal and vertical directions. An example is illustrated in Fig. 4.3, where the image size is 512×512 and the block size is 256×256.
The overlap size, O, is the overlapped distance between two neighboring overlapping blocks, while the step size, S, is the displacement between two neighboring blocks. Clearly, O + S = B. For block size B×B and step size S, we can compute the block number as

N = [ (W - B)/S + 1 ] \cdot [ (H - B)/S + 1 ],    (4.24)

where H and W are the height and the width of the image, respectively. The block number in a 512×512 image with different block sizes and step sizes is given in Table 4.2. For a block of size B×B, the block number is computed for 3 different step sizes: non-overlapping blocks (S = B), overlapping blocks with a step size of one half of the block size (S = B/2), and overlapping blocks with a step size of one quarter of the block size (S = B/4).

Table 4.2: The block number (N) in a 512x512 image with different block sizes (B×B) and step sizes (S).

Block Size   S = B   S = B/2   S = B/4
256x256      4       9         25
128x128      16      49        169
64x64        64      225       841
32x32        256     961       3,721

The advantage of using overlapping blocks in block-based steganalysis is shown in Fig. 4.4. By reducing the step size from B to one half and one quarter of B, we obtain more block samples. As we have larger block numbers with smaller step sizes, the curve in Fig. 4.4 moves toward the upper right. Intuitively, for a given block size, if there are more block samples, the classifier can provide a better decision. For example, for a block size of 64×64, the total number of blocks is 64 with non-overlapping blocks (S = B). With overlapping blocks, the total number of blocks increases to 225 and 841 for step sizes equal to 32 (S = B/2) and 16 (S = B/4), respectively.

4.3 Performance Evaluation

The performance of block-based image steganalysis for a binary classifier (either stego or cover image) will be studied in this section. We will compare the proposed block-based approach with the frame-based approach. We will provide experimental results obtained by varying parameters of block-based image steganalysis so as to understand the effects of block sizes, block numbers, and block overlapping.

Figure 4.4: The block number (N) in a 512×512 image for different block sizes (B×B) and step sizes (S).

4.3.1 Experimental Set-up

In the experiment, we consider training and testing images of dimension M×N = 384×512 and decompose each image into blocks of size B×B. After extracting 274 merged features from each block, K = 20,000 sample blocks are selected from cover and stego images in the training set by random sampling. These sample blocks are classified into C classes, and a classifier is obtained for each class.

The uncompressed colour image database (UCID) [51] was used as the cover images in the training set. The INRIA Holidays dataset [23] was used as the cover images in the test set. The UCID image database consists of 1338 images, and the Holidays image database has 1491 images, which cover diverse subjects such as natural scenes and artificial objects. Although the original images were color images of different sizes, all images were converted into 384×512 gray-level images and saved as JPEG files with a quality factor of 85.

After obtaining cover images from the image databases, the model-based steganography (MBS) method [50] and the perturbed quantization (PQ) method [18] were used to embed a secret message into the cover images to create the corresponding stego images.
While the MBS method uses the original JPEG images obtained with a quality factor of 85 for cover images, the PQ method demands double-compressed JPEG images. In our experiment, the original JPEG images were compressed once again with a quality factor of 70 for the PQ method. As dierent images may have dierent embedding capacity, the embedding strength for each image is measured in units of BPC (bits per non-zero DCT AC coecients). Unless explicitly stated, the default BPC value was set to 0.20 for both MBS and PQ methods. 4.3.2 Performance Study of Block-Based Image Steganalysis For the performance study of block-based image steganalysis, 200 images from the uncompressed colour image database (UCID) [51] and the INRIA Holidays dataset [23] were used as cover images in the training set and the testing set, respectively. The MBS method [50] was used to create stego images with an embedding rate of 0.20 BPC. In the experiment, blocks were classied into C = 8 classes and 8 linear Bayes classiers were obtained with regularization parameters R =S = 0:001. Eect of Block Sizes First, we study the eect of block sizes. We would like to check whether the merged features from blocks of a larger size have better discriminative power to dierentiate cover and stego images. For each block size, we counted the number of correct and incorrect block decisions from all blocks obtained from 200 test images to compute the average block decision accuracy (p). The discriminative power of merged features for 4 block sizes (32 32; 64 64; 128 128; 256 256) is shown in Table 4.3. Note that overlapping block decomposition is used for block size 256 256. As shown in this table, the discriminative power of merged features from a larger block is better than that of merged features from a smaller block. The average block decision accuracy increases from 56:62% to 62:54% when the block size increases from 32 32 to 256 256. 61 Table 4.3: The average block decision accuracy (p) with dierent block sizes (BB). Block Size Block No. of Block Decisions Decision Number Correct Incorrect Accuracy 32x32 192 43,486 33,314 56.62 64x64 48 11,254 7,946 58.61 128x128 12 2,962 1,838 61.71 256x256 6 1,501 899 62.54 Eect of Block Numbers Next, we study the eect of block numbers. In the experiment, a block size of B B = 32 32 was used for images of size 384 512. Then, each image consists of 192 blocks. Among these 192 blocks, a dierent number of blocks was randomly selected for majority voting in the testing process. The average image decision accuracy, P , is plotted as a function of the block number,N in Fig. 4.5. We see from this gure that the average image decision accuracy, P , improves as more blocks are selected from the test image. The detection accuracy increases from 75:75% to 85:50% when the block number increases from 10 to 192. This experimental result clearly demonstrates the advantage of having a larger block number in block-based image steganlayis. Figure 4.5: The image decision accuracy, P , as a function of the block number, N. 62 Eect of Block Overlapping There exists a trade-o between the block size and the block number in non-overlapping block decomposition. If the block size becomes smaller, the block decision accuracy gets lower. On the other hand, if the block decision accuracy becomes higher with a larger block size, only a small number of blocks are available for the majority voting process. For this experiment, 200 images were used for the training set and the testing set, respectively. 
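Under the i.i.d. assumption of Eq. (4.18), a measured block decision accuracy p can be translated into an expected image decision accuracy with a few lines of code. This is only an illustrative calculation; as noted in Sec. 4.2.4, real block decisions are correlated, so the value it returns is an optimistic estimate.

```python
from math import comb

def image_decision_accuracy(p, N):
    """P(correct majority vote) for N i.i.d. block decisions of accuracy p (N odd)."""
    # P(X <= (N-1)/2) is the binomial CDF F((N-1)/2; N, p) of Eq. (4.18).
    tail = sum(comb(N, i) * p**i * (1 - p)**(N - i) for i in range((N - 1) // 2 + 1))
    return 1.0 - tail

# Example: the measured 64x64 block accuracy p = 0.5861 with N = 47 blocks
# (an odd number close to the 48 blocks per 384x512 image) gives roughly 0.88.
print(image_decision_accuracy(0.5861, 47))
```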
The average image decision accuracy P (detection accuracy) with dierent block sizes (BB) for the non-overlapping block decomposition case is shown in Table 4.4. Table 4.4: The average image decision accuracy (P ) for non-overlapping block decom- position with xed image size 384 512. Block Block Detection Size Number Accuracy 32x32 196 81.16 64x64 48 82.16 128x128 12 69.50 256x256 2 59.25 Among 4 dierent block sizes, the block-based image steganalysis with block size 64 64 has the best detection accuracy of 82:16%. If the block size is larger than 64 64, the detection accuracy decreases due to a smaller block number. The detection accuracy also decreases when the block size is less than 3232 due to lower block decision accuracy. The advantage of using overlapping blocks is shown in Table 4.5. In this experiment, 400 images were used for the training set and the testing set, respectively. If the step size is the same as the block size (S =B), it is the same as the non-overlapping block case. With the use of overlapping blocks, the average image decision accuracy (the average detection accuracy) increases from 71:82% to 80:96% for block size of 128 128, and from 79:22% to 82:66% for block size of 64 64. Overall, we can achieve the detection 63 accuracy slightly over 80% using block-based image steganalysis. Furthermore, we see that a larger block number contributes more to detection accuracy than a larger block size. For example, the detection accuracy increases from 80:96% to 82:66% as the block size decreases from 128 128 to 64 64 when overlapping blocks with step size 16 were used. Table 4.5: The average image decision accuracy (P ) with dierent block sizes (BB) and dierent step sizes (S). Block Step Overlap Block Detection Size Size Size Number Accuracy 128x128 128 0 12 71.82 128x128 32 96 117 79.92 128x128 16 112 425 80.96 64x64 64 0 48 79.22 64x64 32 32 165 80.55 64x64 16 48 609 82.66 Eect of Block Class Number The performance of block-based image steganalysis depends on the number of block classes, C. The more block classes we have, more codewords can be used to make the average distance between the codeword and block samples smaller. Thus, detection accu- racy is expected to improve with a higher block class number. The detection accuracy with dierent numbers of block classes is shown in Table 4.6. We see that detection accu- racy increases with the block class number. As the block class number increases from 2 to 64, detection accuracy increases from 71.40% to 88.56%. However, the performance improvement saturates as the block class number reaches 32 and beyond. 64 Table 4.6: Detection accuracy with dierent number of block classes. Number Detection Accuracy of Classes Cover Image Stego Image Total 2 64.52 78.27 71.40 4 73.98 75.45 74.71 8 81.29 82.29 81.79 16 86.32 85.11 85.71 32 86.92 88.26 87.59 64 90.48 86.65 88.56 Eect of Classiers A linear Bayes classier was used in all experiments in Sec. 4.3.2. In this subsection, we will compare the performance of block-based image steganalysis with dierent classiers (including the linear Bayes classier, the Fisher linear discriminant classier and the logistic classier). The detection accuracy results are shown for 8, 16 block classes in Table 4.7 and Table 4.8, respectively. In the experiment, the MBS method [50] was used to create stego images with an embedding rate of 0.20 BPC. We decompose each image from the training set and the testing set into blocks of size BB = 64 64. 
Sample blocks are classified into C = 8 or 16 classes and a classifier is obtained for each class. The majority voting scheme was adopted to fuse block decision results to make the final decision. We see that both the logistic classifier and the Fisher linear discriminant classifier outperform the linear Bayes classifier by a significant margin. When the number of block classes is 8, the detection accuracy improves from 81.79% to 93.96% and 93.86%, and, when the number of classes is 16, the detection accuracy improves from 86.55% to 94.97% and 95.10%, for the logistic classifier and the Fisher linear discriminant classifier, respectively.

Table 4.7: The performance improvement of block-based image steganalysis with different classifiers for MBS (8 block classes).

Classifier Type                         Cover Image (%)   Stego Image (%)   Total (%)
Linear Bayes Classifier                 81.29             82.29             81.79
Logistic Classifier                     97.38             90.54             93.96
Fisher Linear Discriminant Classifier   97.59             90.14             93.86

Table 4.8: The performance improvement of block-based image steganalysis with different classifiers for MBS (16 block classes).

Classifier Type                         Cover Image (%)   Stego Image (%)   Total (%)
Linear Bayes Classifier                 85.24             87.86             86.55
Logistic Classifier                     96.31             93.63             94.97
Fisher Linear Discriminant Classifier   96.85             93.36             95.10

We also observe performance improvement for the PQ method with different classifiers. The performance comparison of block-based image steganalysis for the PQ method with different classifiers is given for 8 and 16 block classes in Table 4.9 and Table 4.10, respectively. The embedding rate was set to 0.2 BPC for this experiment. As the PQ method is known to be more secure than the MBS method, the detection accuracy is lower regardless of the classifier type and the class number. When the block class number is 8, detection accuracy improves from 57.55% to 64.45% and 64.52%, and, when the number of classes is 16, detection accuracy improves from 58.22% to 64.08% and 64.82%, for the logistic classifier and the Fisher linear discriminant classifier, respectively. The performance improvement is around 6% for both cases, which is smaller than that of the MBS method.

Table 4.9: The performance improvement of block-based image steganalysis with different classifiers for PQ (8 block classes).

Classifier Type                         Cover Image (%)   Stego Image (%)   Total (%)
Linear Bayes Classifier                 56.07             59.02             57.55
Logistic Classifier                     70.49             58.42             64.45
Fisher Linear Discriminant Classifier   64.92             64.12             64.52

Table 4.10: The performance improvement of block-based image steganalysis with different classifiers for PQ (16 block classes).

Classifier Type                         Cover Image (%)   Stego Image (%)   Total (%)
Linear Bayes Classifier                 56.00             60.43             58.22
Logistic Classifier                     68.88             59.29             64.08
Fisher Linear Discriminant Classifier   65.12             64.52             64.82

4.4 Conclusion

The performance of block-based image steganalysis as a function of the block size and the block number was studied in this chapter. Although a larger block size and a larger block number give better steganalysis performance, there exists a trade-off between the block size and the block number for a given image in non-overlapping block decomposition. Thus, we can improve the performance of block-based image steganalysis by finding a balance between the block size and the block number. In addition, the use of overlapping blocks was proposed to increase the block number in the overlapping block decomposition case.
Experimental results were given to demonstrate that the block- based image steganalysis approach can achieve better detection accuracy with overlap- ping block decomposition. In addition, it is shown by experimental results that we can select larger block class number and dierent classier such as logistic classier and Fisher linear discriminant classier to improve the performance of block-based image steganalysis. The performance study of block-based image steganalysis shows us the eect of dierent factors, which enables us to maximize the performance of block-based image steganalysis. 67 Chapter 5 Decision Fusion for Block-Based Image Steganalysis 5.1 Introduction Many works related to decision fusion started in the areas of machine learning and pattern recognition [21], [37]. These works provide theoretical backgrounds for decision fusion, which combines individual decisions together to get a nal decision result. Decision fusion theory was also previously used for the image steganalysis problem. Kharrazi et al. proposed the use of information fusion methods to combine the outputs of multiple steganalysis techniques [27] for image steganalysis. They fused decisions from a set of steganalysis techniques trained independently using only one embedding technique. The outputs from dierent steganalyzers, each trained for one of the embedding techniques, are fused using the mean rule. Decision fusion theory is not needed for the frame-based approach, because a single decision result (cover or stego) is provided for each test image. However, decision fusion is important for block-based image steganalysis, because we have numerous block decisions from each image. As these block decision results are not always consistent, it is important to combine these results together to make a nal decision for a given test image. We originally used the majority voting rule, which is a simple method for decision fusion, to combine block decision results together to get a nal decision. In this chapter, we want to understand not only how to decide the importance of each block decision, but also what would be the best decision fusion strategy to combine block decisions together for block-based image steganalysis. 68 The rest of this chapter is organized as follows. Decision fusion theory including decision fusion levels and decision fusion topologies is explained in Sec. 5.2. Dierent decision level fusion techniques are discussed in Sec. 5.3. Experimental results using dierent decision level fusion techniques are presented in Sec. 5.4. Finally, concluding remarks and future work are given in Sec. 5.5. 5.2 Decision Fusion Theory 5.2.1 Decision Fusion Levels Decision fusion can be applied at dierent fusion levels. Ross et al. considered 5 dierent fusion levels for biometrics applications: sensor, feature, match, rank, and decision level fusion [49]. Kraetzer et al. also considered 5 fusion levels for audio steganalysis [33]. Figure 5.1: Overview of 5 dierent fusion levels [41]. Sensor level fusion is applicable only when there are multiple sources of raw data. For example, this is the case when multiple pictures are taken by a camera from the same scene or object to provide us multiple instances. As we are dealing with single JPEG test image at a time from the image database, sensor level fusion is not applicable for our steganalysis task. Feature level fusion is used when we can extract multiple feature sets from the same source. 
For example, we can extract many dierent feature sets (Wavelet features, DCT features, and Markov features) from the same test image. We need to consider all the possible features together to get the best possible performance. We can consider the merged DCT and Markov feature set [43] as one example of feature level fusion in that it combines DCT features and Markov features. In addition, we can measure the 69 Bhattacharyya distance to select important features from each feature set and merge them together to nd the optimal feature set. Match level fusion involves the usage of dierent classiers such as support vector machine and linear classier [33]. Match level fusion is also called score level fusion, measurement level fusion, and condence level fusion. Match level fusion uses continuous value scores or feature vectors from dierent classiers and uses this information together to make a nal decision for a given pattern. Although match level fusion might be computationally expensive for block-based image steganalysis, it is possible to apply it to improve the performance of block-based image steganalysis. Match level fusion can be achieved by getting classication results using dierent classiers and fusing decision results from these classiers. Then, we can consider giving more weights to decisions from classiers which give us more accurate decision results. It is also worthwhile to consider weights for dierent classiers dependent on several factors including the block type and the distance of feature value from the decision boundary. Rank level fusion combines ranking results of classes from dierent classiers to make a nal decision for a given pattern [2], [21]. This fusion can be useful because ranking results from dierent classiers are not always consistent. Thus, rank level fusion tries to fuse ranking results from dierent classiers together to nd the optimal ranking result. Ranking results from dierent classiers can be combined using the highest rank method, the Borda count method, and the logistic regression method. For block-based image steganalysis, this method is expected to be useful for the multi-classier scenario, when stego images are embedded with dierent steganographic algorithms. For each block decision, rankings are given for each image type (class) and combined later on to give the nal decision for a given test image. Decision level fusion tries to combine all the decisions together to make a nal deci- sion. Decision level fusion is also called abstract level fusion in that it uses discrete value decisions instead of continuous value scores or features. Scores from classiers are thresholded to provide discrete decision values for decision level fusion. For block-based 70 image steganalysis, the problem of making a nal decision for a given test image from block decisions can be considered as decision level fusion problem. The decision combi- nation includes Boolean functions such as the AND or OR operation, weighted majority voting, and Bayesian decision fusion. In this chapter, we will focus on using decision level fusion for block-based image steganalysis. 5.2.2 Decision Fusion Topologies For block-based image steganalysis, we need to consider a systematic method of combin- ing decision results from dierent classiers. This is because we select a content-adaptive classier for each block depending on its block class. This procedure is called classier combination, because it tries to use multiple classiers together to obtain the best pos- sible result. 
The categorization of combination methods can be made by considering the combination topologies or structures employed. These topologies can be broadly classied as multiple, conditional, hierarchical, or hybrid [37], [44]. For the conditional topology, two classiers (primary, secondary) are used. The primary classier tries to make an accurate decision for a given classication task. The secondary classier is only used when the primary classier is unable to provide decision results with high accuracy. The conditional topology is known to provide computational eciency. For example, a linear classier can be used as the primary classier and a support vector machine can be used for the secondary classier. As the support vector machine is known to be computationally intensive compared to the linear classier, it is better to use the support vector machine only when the linear classier cannot make accurate decisions. For the hierarchical (serial) topology, multiple classiers are used in a sequential order to get smaller number of possible decisions at each step from a set of possible decisions in the previous step. For example, each classier can select a smaller set with half of the original size until it can narrow the decision down to a single class. This would be an 71 easier task for each classier than the task when a single class needs to be selected from many classes. For the hybrid topology, dierent classiers are used depending on the features extracted from the given test data. For example, we can design dierent classiers based on features extracted from blocks of dierent types for block-based image steganalysis. This approach is content-adaptive in that it considers dierent block types based on features and selects corresponding classier for a specic block type. This approach will be useful especially when there exists feature variation among dierent block types. For the multiple (parallel) topology, several classiers are used to provide decision results, which are then combined to give the nal decision result for a given pattern. The decision fusion process in block-based image steganalysis can be considered to have a multiple topology. This is because we use multiple classiers to get decision results for all the blocks from a given image and then combine these results together to make a nal decision for a given test image. The simplest combination method is majority voting, which is also used in our framework. In addition, dierent weights in the form of posterior probabilities can be obtained for a specic class and specic classier from the training data. As block-based image steganalysis provides a block decision accuracy value for each block decision, simple operators such as product rule, sum rule, min rule, max rule, and median rule can be used to combine block decisions [30], [29]. The fuzzy integral can also be used for decision fusion with the multiple topology [25], [9], [37]. For a fuzzy integral, information sources rst need to be dened, which can be features extracted from images in the image steganalysis problem. A fuzzy measure is computed for each block class using images in the training set. For block-based image steganalysis, block decision accuracy for each block class can be considered to be a fuzzy measure. After the training process, an evidence function for each test image and each block class is computed to classify a given test image into a cover image or a stego image. 
The evidence functions are then integrated with respect to their corresponding class fuzzy measures, resulting in one confidence value for each class. Finally, these confidence values are used to make a final classification decision in favor of the image type with the highest confidence.

5.3 Decision Fusion for Block-Based Image Steganalysis

In block-based image steganalysis, majority voting was used to fuse block decision results to make a final decision for a given test image. In this section, we want to see whether additional performance improvement is achievable with different decision fusion techniques. Decision level fusion tries to combine all the decisions from different classifiers together to make a final decision [33], [49]. For block-based image steganalysis, the problem of making a final decision for a given test image from block decision results can be considered as decision level fusion with the multiple (parallel) topology. Methods proposed for decision level fusion include "AND" and "OR" rules [12], weighted majority voting [38], Bayesian decision fusion [60], the Dempster-Shafer theory of evidence [60], and behavior knowledge space [22]. In this section, we focus on theoretical explanations of different decision level fusion techniques, including weighted majority voting, Bayesian decision fusion, and the Dempster-Shafer theory of evidence.

5.3.1 Weighted Majority Voting

Before applying the majority voting rule, the reliability of each block decision can be represented by weights determined from the performance of the classifier for different block classes. Whenever a block is decided to be from a cover image or a stego image by the binary classifier, we can give a weight to this result depending on its block class. For the binary classifier, there are only two image types: cover image (l = 1) and stego image (l = 2). For the multi-classifier, we embed the secret message into cover images to get their corresponding stego images using different steganographic algorithms. The classifier for each block class can be used to compute the correct decision rate for all C classes. Block decision accuracy is defined as the ratio of correct block decisions when blocks are decided to be from a specific type of image. Block decision accuracy can be represented by the probability that blocks actually come from image type I_l when blocks are decided to be from the image type I_l, P(actual = I_l | decide = I_l). Block decision accuracy is different from the detection accuracy (the image decision accuracy) in that the detection accuracy is the ratio of correct image decisions when the actual image is of a specific type. The weights for blocks from the image type I_l (l = 1, 2, ..., L) in the kth class, W_{b=I_l}(k), can be represented with the block decision accuracy as follows:

W_{b=I_l}(k) = P(\mathrm{actual} = I_l \mid \mathrm{decide} = I_l).   (5.1)

These weights are used to determine the reliability of block decisions in the majority voting rule. After computing the weights for block decisions, we need to fuse these decisions to make the final decision for a given test image. Each M×N test image consists of MN/B^2 blocks of size B×B. Based on the classifier for each block class, we can make a decision whether each block is a block from a cover image or from a stego image created with one of the L-1 steganographic algorithms. The decision for each block is weighted by W_{b=I_l}(k), which determines the importance of that decision. Thus, the total number of weighted decisions is equal to MN/B^2.
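As a concrete illustration of this rule, the short Python sketch below (the function and its argument layout are illustrative rather than the exact code used in our experiments) accumulates the weights W_{b=I_l}(k) of Eq. (5.1) per image type and returns the type with the largest accumulated weight; plain majority voting corresponds to setting every weight to 1.

def weighted_majority_vote(block_decisions, block_classes, weights):
    # block_decisions: decided image type l for each block (1 = cover, 2..L = stego algorithms)
    # block_classes:   block class index k for each block
    # weights:         dictionary mapping (l, k) to W_{b=I_l}(k), estimated on the training set
    totals = {}
    for l, k in zip(block_decisions, block_classes):
        totals[l] = totals.get(l, 0.0) + weights[(l, k)]
    return max(totals, key=totals.get)   # image type with the largest sum of weights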
After making MN/B^2 weighted decisions for a test image, the majority voting rule is adopted to make a final decision on whether a given test image is a cover image or a stego image created from one of the L-1 steganographic algorithms. The final decision is the cover image or the stego image with a specific steganographic algorithm which has the largest sum of weights.

Instead of using the weights for blocks, W_{b=I_l}(k), directly, we can consider several methods to make weighted majority voting more accurate. First, we can decide to use weights only when they are larger than a predefined threshold, which is called the thresholding method. Second, we can consider giving weights adaptively depending on the magnitude of the weights, which is called the adaptive weighting method.

Thresholding Method

As block decisions are not always reliable, the thresholding method can be used to select more reliable blocks for weighted majority voting. Weights for blocks can be represented with a block decision accuracy value, which is between 0 and 1. It is beneficial to give more weight to block decisions which have higher block decision accuracy values. The original weighting process gives weights to block decisions proportional to their block decision accuracy values. However, if the block decision accuracy is smaller than 0.5, there are more incorrect block decisions than correct block decisions. Thus, a threshold value T, which decides whether to give weights to block decisions, needs to be at least 0.5. The updated weights for blocks from the image type I_l (l = 1, 2, ..., L) in the kth class, W'_{b=I_l}(k), can be represented as follows using the thresholding method:

W'_{b=I_l}(k) = \begin{cases} W_{b=I_l}(k) & \text{if } W_{b=I_l}(k) \geq T \\ 0 & \text{if } W_{b=I_l}(k) < T \end{cases}   (5.2)

Adaptive Weighting Method

The adaptive weighting method can be used to give more weight depending on the magnitudes of weights obtained from block decision accuracy values. For the adaptive weighting method, we first need to decide whether to give weights directly to block decision accuracy values. As there are more incorrect block decisions when the block decision accuracy is smaller than 0.5, it is better to apply the adaptive weighting method after subtracting 0.5 from the block decision accuracy value. Then, we multiply this value by 2 so that the computed value stays within the range of 0 and 1. The normalized weights for blocks from the image type I_l (l = 1, 2, ..., L) in the kth class, \bar{W}_{b=I_l}(k), can be defined as follows:

\bar{W}_{b=I_l}(k) = \begin{cases} 2\,(W_{b=I_l}(k) - 0.5) & \text{if } W_{b=I_l}(k) \geq 0.5 \\ 0 & \text{if } W_{b=I_l}(k) < 0.5 \end{cases}   (5.3)

There are 3 different variants of the adaptive weighting method. First, we can give weights proportional to the normalized weights, which are called linear weights. Second, we can give weights based on the square root of the normalized weights, which are called square root weights. Third, we can give weights based on the square of the normalized weights, which are called square weights. The updated weights for blocks from the image type I_l (l = 1, 2, ..., L) in the kth class, W''_{b=I_l}(k), can be represented as follows:

W''_{b=I_l}(k) = \begin{cases} \bar{W}_{b=I_l}(k) & \text{linear weight} \\ \sqrt{\bar{W}_{b=I_l}(k)} & \text{square root weight} \\ (\bar{W}_{b=I_l}(k))^2 & \text{square weight} \end{cases}   (5.4)

The updated weights from the 3 different adaptive weighting methods, W''_{b=I_l}(k), are shown in Fig. 5.2. As block decision accuracy increases starting from 0.5, the square root weights give more variation when block decision accuracy values are close to 0.5.
In contrast, square weights give similar weights when block decision accuracy values are close to 0.5, but give more variation when block decision accuracy values are close to 1.

Figure 5.2: Adaptive weighting method.

5.3.2 Bayesian Decision Fusion

Bayesian decision fusion transforms the discrete decision labels from the individual classifiers into continuous probability values [49]. For block-based image steganalysis, Bayesian decision fusion can be used to combine block decision results within a given test image to compute the probability of making a final decision as a specific class. For a binary classifier, each block decision gives us the result whether each block is a cover block or a stego block.

The first step is to compute the confusion matrix. For block-based image steganalysis, we can use images in the training set to compute the confusion matrix for each block class. Then, we can select the confusion matrix for each block decision based on its block class.

Let us assume that there are N block decision results for a given test image. We want to classify this test image into one of L classes (image types). For example, there are N = 48 block decisions for each image when the image size is 384×512 and the block size is 64×64. For a binary classifier, we only have 2 classes: w_1 (cover image) and w_2 (stego image). For a multi-classifier, we have L classes, which include the cover image w_1 and stego images created with L-1 steganographic algorithms, w_2, ..., w_L.

The confusion matrix CM is an L×L matrix, which can be computed for each block class. Then, CM_i represents the confusion matrix selected based on the block class of the ith block. The (p, q)th element of the matrix CM_i represents the number of blocks from the training set classified into the qth class w_q by the pth block decision when it is actually the ith class w_i. When p = q, this is the case when blocks are correctly classified. Let c_i be the decision result for the ith block from a given test image. Then, the (j, i)th element of the matrix CM_i, CM_i(j, i), can be used as an estimate of the conditional probability P(c_i | w_j).

We can also estimate the prior probability of class w_j. Let the total number of blocks from images in the training set be N and the number of blocks classified into class w_j be N_j. Then, N_j / N can be used as an estimate of the prior probability of class w_j, P(w_j). After obtaining N block decision results c = [c_1, ..., c_N] from a given test image, we need to decide which class w_j this test image belongs to. This can be done by computing the posterior probability P(w_j | c) for all the classes w_1, ..., w_L and choosing the class w_j which has the largest P(w_j | c) value. The posterior probability P(w_k | c) can be considered as the probability of making a decision w_k for a test image given block decisions c = [c_1, ..., c_N].

In order to compute P(c | w_j), we can assume conditional independence between different block decisions. Under this assumption, the conditional joint probability density P(c | w_j) can be expressed as the product of the marginal conditional densities as follows:

P(c \mid w_j) = P(c_1, \ldots, c_N \mid w_j) = \prod_{i=1}^{N} P(c_i \mid w_j).   (5.5)

In fact, block decisions are not actually independent. However, the approximation with strong conditional independence is still known to be fairly accurate [13]. Thus, we can compute the conditional joint probability density to classify a given test image to a specific class w_j. The Bayes rule can be used to compute the posterior probability of class w_j, P(w_j | c).
P(w_j \mid c) = \frac{P(c \mid w_j)\, P(w_j)}{P(c)}   (5.6)

As P(c) is independent of the class w_j, we can ignore this term when we want to compare P(w_j | c) for different classes. Therefore, the discriminant function g_j for class w_j can be defined as follows:

g_j = P(c \mid w_j)\, P(w_j)   (5.7)

The Bayes decision fusion technique classifies a given test image to the specific class w_j which has the largest value of the discriminant function g_j.

5.3.3 Dempster-Shafer Theory of Evidence

The Dempster-Shafer theory of evidence was proposed for decision fusion; it is a methodology to compute and accumulate belief functions according to Dempster's rule [48], [35], [36], [49]. The degree of belief for an event is proposed instead of the probability of the event because it does not completely remove the possibility of the event even when the degree of belief is 0. The concepts of "decision templates" and "decision profiles" were introduced to compute the accumulated degree of belief for each class. In block-based image steganalysis, the decision template corresponds to block decision accuracy for different block classes and the decision profile corresponds to block decisions for a given test image.

The decision template DT^j for each class w_j can be defined to be decision results obtained using the training set. For each class w_j, we can observe block decision results for test images which belong to class w_j. The ith row of the decision template DT^j is represented as DT^j_i, which is the ith block decision result. The decision template DT^j_i (where DT^j is an N×L matrix, i = 1, ..., N and j = 1, ..., L) can be computed from the confusion matrix CM^j by dividing CM^j_i by N_j, where N_j represents the total number of images for class w_j in the training set:

DT^j_i = \frac{CM^j_i}{N_j}   (5.8)

The (p, q)th element of the matrix CM^j represents the number of blocks from the training set classified into the qth class by the pth block decision when they actually belong to the jth class. In our framework, we can use block decision accuracy as DT^j_i when it is computed using blocks of the same block class as that of the ith block from the training set.

The decision template is different from confusion matrices in that it represents the probability of classifying blocks into different classes. In contrast, confusion matrices represent the actual number of blocks classified into different classes.

For a given test image, we can represent N block decision results for an L-class problem with a concept called the "decision profile (DP)". We need to make a decision for a given image to be either a cover image or a stego image in the binary classifier scenario (L = 2). When multiple steganographic algorithms are used to create stego images, it is a multi-classifier scenario with L > 2. The decision profile can be represented with an N×L matrix as

DP = \begin{bmatrix} S_1 \\ \vdots \\ S_i \\ \vdots \\ S_N \end{bmatrix}
   = \begin{bmatrix} s_{1,1} & \cdots & s_{1,j} & \cdots & s_{1,L} \\ \vdots & & \vdots & & \vdots \\ s_{i,1} & \cdots & s_{i,j} & \cdots & s_{i,L} \\ \vdots & & \vdots & & \vdots \\ s_{N,1} & \cdots & s_{N,j} & \cdots & s_{N,L} \end{bmatrix},   (5.9)

where S_i represents the degree of support for each of the L classes from the ith block decision result. The degree of support represents the level of belief for each of the L classes computed from each block decision result. The degree of support s_{i,j} is defined for the jth class w_j and the ith block decision as follows:

s_{i,j} = \begin{cases} 1, & \text{if the output of the } i\text{th block decision is class } w_j \\ 0, & \text{otherwise} \end{cases}   (5.10)

After extracting features from each block, the degree of support is computed to represent the probability of a given block being assigned to a specific class.
The decision value is 1 for a specific class when the ith block decision is made to be that specific class. Then, all the remaining decision values for other classes are 0. This is the reason why only one value in each S_i is 1 and all other values are 0.

After computing decision profiles and decision templates, the similarity \mu_{i,j} measures how similar the decision profile of the ith block for the jth class is to the decision template. The intuition behind similarity is that it tries to compute the probability of making a decision for a specific class based on statistics (decision templates) obtained from the training set. The similarity is larger when there is a small difference between the decision profile and the decision template. Based on past decision results (decision templates) from the training set, current decision results (the decision profile) can be analyzed by computing the similarity. The similarity \mu_{i,j} between the decision profile DP_i and the decision template DT^j_i for the ith block decision and the jth class is defined as

\mu_{i,j} = \frac{\left(1 + \| DT^j_i - DP_i \|^2 \right)^{-1}}{\sum_{k=1}^{L} \left(1 + \| DT^k_i - DP_i \|^2 \right)^{-1}},   (5.11)

where DP_i represents the ith row of DP, DT^j_i represents the ith row of DT^j belonging to class w_j, and ||·|| denotes the matrix norm. For the ith block decision (i = 1, ..., N) and the jth class w_j (j = 1, ..., L), the degree of belief is defined as

b_{i,j} = \frac{\mu_{i,j} \left[ \prod_{k=1, k \neq j}^{L} (1 - \mu_{i,k}) \right]}{1 - \mu_{i,j} \left[ \prod_{k=1, k \neq j}^{L} (1 - \mu_{i,k}) \right]}.   (5.12)

Both the degree of belief and the similarity have larger values when the decision profile is similar to the decision template. However, the degree of belief is different from the similarity in that it not only considers the possibility of selecting a specific class, but also considers the possibility of not selecting the remaining classes at the same time. This is done by multiplying all the similarity terms, either as the value itself, \mu_{i,j}, or subtracted from 1, 1 - \mu_{i,k} (k ≠ j).

When compared to the similarity, the degree of belief for a specific class is larger when the remaining similarity values are about the same. In contrast, the degree of belief for a specific class is smaller when some of the remaining similarity values are much larger than the others. For example, the degree of belief for a specific class is larger when the remaining similarity values are 0.2, 0.2, 0.2 than when the remaining similarity values are 0.4, 0.1, 0.1.

Finally, the accumulated degree of belief for each class w_j (j = 1, ..., L) based on N block decisions can be computed using Dempster's rule as follows:

g_j = \prod_{i=1}^{N} b_{i,j}   (5.13)

Based on the accumulated degree of belief for each class w_j, we can classify a test image into the class w_j which has the largest g_j value.

5.4 Experimental Results

5.4.1 Experimental Set-up

In the experiment, we consider training and testing images of dimension M×N = 384×512. The uncompressed colour image database (UCID) [51] was used as the cover images in the training set. The INRIA Holidays dataset [23] was used as the cover images in the testing set. For the steganographic embedding method, the MBS method [50] was used to create stego images with an embedding rate of 0.2 BPC. Cover and stego images are decomposed into blocks of size B×B = 64×64. After extracting 274 merged features from each block, K = 20,000 sample blocks are selected from all the cover and stego images in the training set by random sampling. These sample blocks are classified into C = 8 classes and a classifier is obtained for each class, which is used for the testing process.
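Before turning to the results, the computations of Eqs. (5.11)-(5.13) can be summarized in a short sketch (assuming numpy; the data layout is illustrative, not the exact implementation used in our experiments): each test image supplies a decision profile DP, each class supplies a decision template DT^j estimated from the training set, and the class with the largest accumulated degree of belief is selected.

import numpy as np

def dempster_shafer_fuse(DP, DTs):
    # DP:  (N, L) decision profile of a test image, one-hot rows as in Eq. (5.10)
    # DTs: list of L decision templates, each an (N, L) array DT^j as in Eq. (5.8)
    N, L = DP.shape
    g = np.ones(L)                                   # accumulated belief, Eq. (5.13)
    for i in range(N):
        # similarity mu_{i,j} between the i-th profile row and each template row, Eq. (5.11)
        d = np.array([1.0 / (1.0 + np.linalg.norm(DTs[j][i] - DP[i]) ** 2) for j in range(L)])
        mu = d / d.sum()
        for j in range(L):
            prod_others = np.prod(np.delete(1.0 - mu, j))
            b = mu[j] * prod_others / (1.0 - mu[j] * prod_others)   # degree of belief, Eq. (5.12)
            g[j] *= b
    return int(np.argmax(g))                         # class with the largest accumulated belief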
5.4.2 Weighted Majority Voting

Thresholding Method

In this subsection, the thresholding method is used for weighted majority voting to improve the performance of block-based image steganalysis. In order to find a threshold value used to select block decisions for weighted majority voting, we computed block decision accuracy values for different block classes of cover and stego images from the training process. We observed that block decision accuracy values for many block classes are distributed around 0.5. Also, we found that the maximum value is 0.6347 and the minimum value is 0.4922. Thus, we decided to use 4 different threshold values (T = 0.52, 0.53, 0.55, 0.58) for our experiments. We do not consider block decisions for weighted majority voting if their block decision accuracy values are smaller than the threshold value T. The detection accuracy result with the 4 different thresholds is given in Table 5.1.

Table 5.1: Performance comparison of block-based image steganalysis with different threshold values.

Threshold   Cover Image (%)   Stego Image (%)   Total (%)
None        78.54             84.91             81.72
0.52        79.88             84.24             82.06
0.53        79.07             84.91             81.99
0.55        79.14             84.98             82.06
0.58        75.25             86.65             80.95

As shown in Table 5.1, we obtain a slight performance improvement when we use the thresholding method. However, detection accuracy decreases if we remove too many block decisions with the large threshold value 0.58. This means that it is important to have not only block decisions with higher block decision accuracy values, but also a sufficient number of block decisions for weighted majority voting.

Adaptive Weighting Method

In this subsection, the adaptive weighting method is used for weighted majority voting to improve the performance of block-based image steganalysis. The detection accuracy result with the 3 different adaptive weighting methods (linear, square root, square) is given in Table 5.2. We obtained a slight performance improvement after using the adaptive weighting method. As the 3 different adaptive weighting methods have similar detection accuracy results, computing the normalized weights seems to be the more important factor for weighted majority voting.

Table 5.2: Performance comparison of block-based image steganalysis with different adaptive weighting methods.

Weight Method   Cover Image (%)   Stego Image (%)   Total (%)
None            78.54             84.91             81.72
Linear          78.67             85.45             82.06
Square Root     78.87             85.04             81.96
Square          79.21             84.84             82.03

5.4.3 Performance Comparison

The detection accuracy of block-based image steganalysis using 3 different decision fusion techniques (weighted majority voting, Bayesian decision fusion, and the Dempster-Shafer theory of evidence) is given in Table 5.3. Note that 8 block classes are used for this experiment.

Table 5.3: Performance comparison of block-based image steganalysis with different fusion methods.

Decision Fusion Technique            Cover Image (%)   Stego Image (%)   Total (%)
Weighted Majority Voting             78.54             84.91             81.72
Bayesian Decision Fusion             78.74             85.38             82.06
Dempster-Shafer Theory of Evidence   79.28             85.51             82.39

Although we obtain slight performance improvements with Bayesian decision fusion and the Dempster-Shafer theory of evidence, the performance of block-based image steganalysis is similar irrespective of the decision fusion technique. In fact, the overall detection accuracy increased from 81.72% to 82.06% and 82.39%, which are performance improvements of less than 1%. This result shows that block-based image steganalysis has consistent performance even if different decision fusion techniques are used.
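For completeness, the Bayesian decision fusion entry in Table 5.3 corresponds to the rule of Eqs. (5.5)-(5.7); a minimal sketch is given below (assuming numpy, with the rows of each per-class confusion matrix normalized to conditional probabilities, and with log-probabilities used to avoid numerical underflow when many per-block terms are multiplied; these are implementation choices of ours rather than part of the original formulation).

import numpy as np

def bayesian_decision_fuse(block_decisions, block_classes, cond_prob, priors):
    # block_decisions: decision c_i in {0, ..., L-1} for each of the N blocks
    # block_classes:   block class index k for each block, used to pick its confusion matrix
    # cond_prob:       array of shape (C, L, L); cond_prob[k, j, c] estimates P(c_i = c | w_j)
    #                  for blocks of class k, computed from the training set
    # priors:          length-L array of prior probabilities P(w_j)
    log_g = np.log(np.asarray(priors, dtype=float))
    for c_i, k in zip(block_decisions, block_classes):
        log_g += np.log(cond_prob[k, :, c_i] + 1e-12)   # conditional independence, Eq. (5.5)
    return int(np.argmax(log_g))                        # maximize g_j = P(c | w_j) P(w_j), Eq. (5.7)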
5.5 Conclusion In this chapter, decision level fusion with the multiple topology is used to improve the performance of block-based image steganalysis. For decision level fusion, we applied 3 dierent approaches: weighted majority voting, Bayesian decision fusion, the Dempster- Shafer theory of evidence. Instead of using majority voting for decision fusion, we are able to improve the performance of block-based image steganalysis using these 3 methods. In addition, we proposed updates for weighted majority voting, which are the thresholding method and the adaptive weighting method, to get additional performance improvement. Although we mainly focused on decision level fusion in this chapter, it would be interesting to apply feature level fusion and rank level fusion for additional performance improvements even with more complicated multi-classier scenario. For decision level fusion, we made a single decision for given block and got multiple decisions using all the blocks in a given test image. However, we can consider the case when multiple decisions are made with the same given data using dierent classiers (linear classier, support vector machine). Then, we can use dierent types of classiers for the same block to get additional information. 85 Chapter 6 Content-Dependent Feature Selection for Block-Based Image Steganalysis 6.1 Introduction Most previous work on image steganalysis, referred to as the frame-based approach, focused on nding a better feature set with a large number of features from the entire image frame to improve detection performance [15], [43], [53]. On the other hand, since an input image typically consists of heterogeneous regions, block-based image steganalysis was proposed to decompose an image into smaller homogeneous blocks and treat each block as a basic unit for steganalysis [6]. While the frame-based approach extracts features from an image, the block-based approach extracts features from each individual block within an image. The use of a large number of features demands a high computational complexity in the training process, where many cover images and stego images will be analyzed for blind steganalysis. For block-based image steganalysis, this problem is more severe since features are extracted separately from each block instead of the whole image. Furthermore, a large dimensional feature space tends to result in a sparse distribution and, consequently, the steganalysis result could be less accurate. Feature selection and/or reduction provides a way to alleviate the above-mentioned problems. This technique has 86 been extensively studied in various disciplines such as machine learning with a large number of features extracted from massive data sets [39], [56]. For the frame-based image steganalysis, it is dicult to reduce the dimension of the feature space eectively. Although one feature might not be discrimant in one region, it may be useful in other regions. In contrast, it is feasible to select features adaptively in block-based image steganalysis. That is, we can select fewer yet discrimant features depending on the characteristics of the block type, which is called content-dependent fea- ture selection. We will discuss the use of content-dependent feature selection to improve the performance of block-based image steganalysis with a signicantly smaller number of features. The rest of this chapter is organized as follows. The procedure for content-dependent feature selection in block-based image steganalysis is discussed in Sec. 6.2. 
Experimental result which shows the performance improvement of content-dependent feature selection method over content-independent feature selection method is presented in Sec. 6.3. Finally, concluding remarks and the future work are given in Sec. 6.4. 6.2 Content-Dependent Feature Selection 6.2.1 Dierent Block Types for Block-Based Image Steganalysis For content-dependent feature selection, we rst need to think about ways to classify blocks into dierent types for block-based image steganalysis. After block classication, a specic classier will be designed for each block type using highly discriminant features. We may consider two dierent methods for block classication as detailed below. 1. Scheme A: classication based on gray levels One intuitive way to classify block types is to use gray levels of the block. If we deal with blocks of size 8 8, each block has 64 gray level values. Then, vector quantization based on gray scale values can be used to classify blocks into dierent block types. However, gray scale values from blocks do not re ect the dierence 87 between cover images and stego images. In fact, cover images and stego images are visually identical in most cases. This is because gray scale values are not sensitive to subtle changes made after steganographic embedding. 2. Scheme B: classication based on derived steganalysis features Another way to do block classication is to use derived steganalysis features. Since our goal is to maximize the performance of classiers trained by features of dierent block types, it is desirable to classify blocks into multiple types based on the same features used in steganalysis. These classiers are sensitive to the change of these features as a result of steganographic embedding. After classifying blocks into dierent types, the averaged feature vector in each block type is computed, which is called the codeword of that block type. When the merged features are used for block classication, each codeword has 274 feature components. We apply the k-means clustering technique to partition blocks into 4 groups, where each group corresponds to one block type. Then, we apply the principal component analysis to reduce the feature dimension to two. The centroids of all 4 clusters, called the codewords, in the 2-D feature space are shown in Fig. 6.1. We compared the performance of block classication schemes A and B, and observed that the detection accuracy of Scheme B is higher than that of Scheme A by 10% or more. Thus, we adopt Scheme B in our experiments. Sample image from UCID image database and its block classication result using scheme B are shown in Fig. 6.2 and Fig. 6.3, respectively. As we used steganalysis features to classify blocks, blocks from the same block type do not have similar visual characteristic. 6.2.2 Feature Selection In order to select important features from a large feature set, we can measure each fea- ture's discriminatory power. There are three measures to evaluate feature discriminatory power [39], [56]; namely, distance, information, correlation measures. 88 Figure 6.1: The 4 codewords representing 4 block types in the 2-D feature space derived from the principal component analysis. Distance Measures In order to select desired features from a large feature set, we measure each feature's discriminatory power using a distance measure. The distance measure computes the distance between distributions of the feature vector in two dierent classes [56]. It is also known as the separability, divergence, or discrimination measure. 
Our objective is to find features that maximize the distance (and thus the separability) between classes. A feature that yields a larger distance is said to have a larger discriminatory capability since it is easier to differentiate one class from the other when there is a large disparity between the distributions of these two classes. Examples of distance measures include the Minkowski, Euclidean, Chebychev, Mahalanobis, Bhattacharyya, Divergence, and Patrick-Fischer measures. The Bhattacharyya distance is a typical distance measure used to evaluate feature discriminatory power. It can be computed as

B(p_1, p_2) = -\log \int_{\Omega} \sqrt{p_1(x)\, p_2(x)}\, dx,   (6.1)

where x is a feature (or a feature vector), \Omega is the feature space, and p_1(x) and p_2(x) are the feature probability density functions under class 1 and class 2, respectively.

Figure 6.2: Sample image from the UCID image database.

Information Measures

Information measures determine the information gain from a feature, the mutual information between feature distributions in two classes, or the entropy based on class separability. For the mutual information, a feature's discriminant power is stronger if the mutual information between its distributions in two classes is smaller. Less mutual information means that there exists smaller similarity between distributions and, thus, larger separability. Mutual information can be defined as

I(X; Y) = \int_{Y} \int_{X} p(x, y) \log \frac{p(x, y)}{p_1(x)\, p_2(y)}\, dx\, dy,   (6.2)

where p(x, y) is the joint probability density function of random variables X and Y, and p_1(x) and p_2(y) are the marginal feature probability density functions of X and Y, respectively. In the current context, random variables X and Y denote values of the same feature in the two classes to be separated.

Figure 6.3: 4 different block types based on steganalysis features (Scheme B): (a) block type 1, (b) block type 2, (c) block type 3, (d) block type 4.

Correlation Measures

Correlation measures, also known as dependence measures or similarity measures, quantify the ability to predict the distribution of one class from the distribution of another class. The correlation coefficient is one correlation measure that finds the correlation between distributions from two different classes. It is defined as

\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y},   (6.3)

where cov(X, Y) is the covariance of random variables X and Y while \sigma_X and \sigma_Y are their standard deviations, respectively. If the correlation between distributions from two classes is large, it is more difficult to separate them.

Among correlation measures, the Fisher ratio [14], [34] offers a good measure to rank features for classification tasks. It is defined as

F_k = \frac{(\mu_{1k} - \mu_{2k})^2}{\sigma_{1k}^2 + \sigma_{2k}^2},   (6.4)

where \mu_{1k} and \mu_{2k} are the cluster means for the kth vector component of classes 1 and 2, respectively, and \sigma_{1k}^2 and \sigma_{2k}^2 are the corresponding cluster variances. The Fisher ratio is larger with larger inter-class separation and smaller intra-class spread. However, it is difficult to differentiate two distributions with the Fisher ratio when the two distributions have the same overall mean and variance values.

6.2.3 Measuring Feature Discriminatory Power

To compute a feature's discriminatory power, we first determine the probability distributions of feature values extracted from cover images and stego images for each feature component and, then, calculate the Bhattacharyya distance as defined in Eq. (6.1) based on these two probability distributions.
Since there are only a limited number of samples for the probability distributions, we approximate the integral given in Eq. (6.1) with a discrete summation. The detailed numerical procedure is described below. First, we find the maximum and minimum values of each feature component over both cover and stego images. Then, we divide the feature values into a predefined number of bins to estimate the probability distribution functions (PDFs). A larger number of bins tends to result in more accurate PDFs at the cost of higher computational complexity. We observe experimentally that 100 bins can lead to a close approximation of the Bhattacharyya distance. To give an example, a sample feature value distribution with 100 bins extracted from cover images is given in Fig. 6.4. Finally, we can compute the Bhattacharyya distance as defined in Eq. (6.1) by summing \sqrt{p_1(x)\, p_2(x)} over all bins.

Figure 6.4: Illustration of the distribution of sample feature component values obtained from cover images.

6.2.4 Feature Ranking

We use the Bhattacharyya distance measure to rank the discriminant power of features for block-based image steganalysis. The feature set under consideration is the merged feature set proposed in [43], which has a dimension of 274. The merged features consist of 193 DCT features [15] and 81 Markov features [53]. The Markov features are computed from 4 transition probability matrices along 4 directions: horizontal, vertical, diagonal, and minor diagonal. The list of the merged features with their index numbers is summarized in Table 6.1.

Table 6.1: Index table of 274 merged feature components.

Feature Index   Features
1-11            Global Histogram
12-66           AC Histogram
67-165          Dual Histogram
166             Variation
167-168         Blockiness
169-193         Co-occurrence Matrix
194-274         Markov Features

The feature ranking result is shown in Table 6.2, where we list the top 5 features that have the largest discriminant power for each of the 4 block types. As shown in Table 6.2, the most discriminant features for each of the 4 block types are not the same. In other words, for a given block type, we are able to find the most suitable feature subset for conducting the steganalysis. Instead of using the full feature set for all block types, the proposed content-dependent feature selection method selects features of higher discriminant power for each block type.

Table 6.2: Feature ranking with the top 5 features for the 4 block types.

Block Type   Rank 1   Rank 2   Rank 3   Rank 4   Rank 5
1            250      251      236      227      179
2            227      251      173      229      267
3            173      250      168      4        268
4            227      232      171      218      4

6.3 Experimental Results

6.3.1 Experimental Setting

First, we collected 1300 images from the uncompressed color image database (UCID) [51] and another 1300 images from the INRIA Holidays dataset [23]. These 2600 images are used as cover images. Then, we use the model-based steganography (MBS) scheme [50] to embed a secret message into the cover images to create their associated stego images with an embedding rate of 0.2 BPC (bits per non-zero DCT AC coefficient). The dimension of the cover and stego images is the same, namely H×W = 384×512. Each image is decomposed into blocks of size B×B = 64×64. After extracting 274 merged features [43] from each block, K = 20,000 sample blocks were randomly selected from cover and stego images in the training set. These sample blocks were classified into C = 4 classes. For the classifier design, the linear Bayes classifier was adopted for each class with regularization parameters R = S = 0.001.
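The feature ranking used in this experiment relies on the discrete approximation of Eq. (6.1) described in Sec. 6.2.3; a small sketch of that computation is given below (assuming numpy; the function name, the 100-bin default, and the small constant added before the logarithm are ours).

import numpy as np

def bhattacharyya_distance(cover_vals, stego_vals, bins=100):
    # Histogram both sets of feature values over a common range, normalize to PDFs,
    # and return B = -log(sum sqrt(p1 * p2)), the discrete form of Eq. (6.1).
    lo = min(cover_vals.min(), stego_vals.min())
    hi = max(cover_vals.max(), stego_vals.max())
    edges = np.linspace(lo, hi, bins + 1)
    p1, _ = np.histogram(cover_vals, bins=edges)
    p2, _ = np.histogram(stego_vals, bins=edges)
    p1 = p1 / p1.sum()
    p2 = p2 / p2.sum()
    return -np.log(np.sum(np.sqrt(p1 * p2)) + 1e-12)

# Per block type, features are ranked by decreasing distance, e.g.:
# distances = [bhattacharyya_distance(cover_feats[:, k], stego_feats[:, k]) for k in range(274)]
# ranking = np.argsort(distances)[::-1]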
To apply the 5-fold cross-validation, we divided both the UCID image dataset and the INRIA Holidays dataset into 5 groups of equal size (i.e., 260 images per group). Then, 4 groups were selected for the training set and the remaining 1 group was used as the test set. As a result, 2080 image pairs (cover image and corresponding stego image) were used in the training set and 520 image pairs were used in the test set for one image dataset (4160 images in the training set and 1040 images in the test set). Furthermore, we repeated this process ve times, where we select one of the 5 groups as the test set and the other 4 groups as the training set at each time. The nal detection accuracy was computed as the averaged detection results obtained from these 5 trials. 6.3.2 Content-Independent and Content-Dependent Feature Selection In this section, we provide experimental results to demonstrate the advantage of the content-dependent feature selection method. For performance benchmarking, we con- sider the case when the same subset of features is selected based on the Bhattacharyya distance regardless of the block types. This scenario is called the \content-independent feature selection method". It will be evident to see the advantage when we compare the performance of content-dependent and content-independent feature selection methods. The content-dependent feature selection method is expected to perform better since it selects features which have larger discriminant power for a given block type. The detection accuracy of content-dependent and content-independent feature selec- tion schemes with dierent numbers of features (8, 16, 32, 64, 128, 256, 274) are plotted as two curves in Fig. 6.5. We can clearly see that the content-dependent feature selection method always outperforms the content-independent feature selection method. The per- formance gap is signicant when the feature number is lower than 200, and the dierence in detect accuracy is more than 10%. To give a concrete example, when 8 features are selected, the detection accuracy of content-dependent feature selection is 65:73% while 95 Figure 6.5: Performance Comparison of Content-Dependent and Content-Independent Feature Selection Methods. that of content-independent feature selection is 52:23%. As the feature number goes beyond 200, the performance gap becomes narrower. These two methods coincide with each other when the entire feature set (274 features in total) is fully used. 6.3.3 Discussion We used 4 block types in this experiment. However, one may use a larger number of block types and the performance improvement of the content-dependent feature selection scheme over the content-independent feature selection scheme is expected to be higher. This is because content-dependent feature selection provides selected features tailored to each block type. Note also that the detection accuracy does not degrade when the number of features becomes larger. This can be explained by the fact that the decision is a binary one. For the binary classier, we only need to partition the feature space into two subspaces, and a larger number of features can help a binary classier as observed previously in [42], [32]. On the other hand, for the case of a multi-classier, the detection accuracy may not always increase with the number of features. 96 6.4 Conclusion A content-dependent feature selection scheme was proposed for block-based image ste- ganalysis in this chapter. 
The proposed scheme outperforms the content-independent feature selection scheme by a siginicant margin with a wide range of feature numbers. There are several possible extensions along this direction. First, we used the distance measure for feature ranking and selection here. It is however possible to consider other measures including information measures and correlation measures in the optimal feature selection. Second, besides content-dependent feature selection, we can apply the principle component analysis (PCA) to reduce the feature dimension furthermore. It would be interesting to nd out how much performance improvement can be reached via PCA. Third, for a binary classier, the experimental results show that it is benecial to have a larger number of features. However, a larger number of features may not necessarily improve the steganalysis performance for a multi-classier. We will examine content- dependent feature selection for the multi-classier as well. 97 Chapter 7 Conclusion and Future Work 7.1 Summary of the Research A block-based image steganalysis was proposed in Chapter 3, where dierent classiers were adopted for dierent block classes based on their characteristics. As natural images consist of heterogeneous regions, we decompose cover images and stego images with dierent steganographic algorithms in the training set into smaller homogeneous blocks. Then, we treat each block as a basic image unit and extract Markov and DCT features from image blocks for block-based steganalysis. Based on features from blocks selected by the random sampling, the tree-structured vector quantization (TSVQ) technique is adopted to classify blocks into multiple classes. For each class, a classier can be trained using block features, which represent the characteristics of that block type. For a given unknown test image, instead of making a single decision for the entire image, we repeat the block decomposition process and choose a classier to make a decision for each block depending on the block class. Depending on block classes and image types, weights can be given to block decisions. Finally, the majority voting rule is used to fuse the weighted decision results from all the blocks so that we can decide whether the unknown image is a cover image or a stego image created with a specic algorithm. It was shown by experimental results that the proposed method oers a signicant improvement in detection accuracy as compared with previous work using a frame-based approach in both the binary classier and the multi-classier. For the decision fusion process, dierent weights for block decisions based on block classes and image types are proposed to improve the performance of block-based image steganalysis. In addition, 98 the block-based image steganalysis also provides the decision reliability information for each given image based on voting dierence in the majority voting rule, which was not possible with frame-based approaches. The performance of block-based image steganalysis as a function of the block size and the block number was studied in Chapter 4. The analysis is given to support that a larger block size and a larger block number lead to better steganalysis performance. As the block size becomes larger, the standard deviations of feature values become smaller to have more discriminative power. As the block number goes to innity, the probability of making a correct decision using the majority voting converges to 1. 
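This convergence claim can be stated precisely under the idealized assumption (revisited in Sec. 7.2) that the N block decisions are independent and each is correct with probability p > 1/2. For an odd number N of block decisions, the probability that the majority vote is correct is

P_N = \sum_{k = \lfloor N/2 \rfloor + 1}^{N} \binom{N}{k} p^k (1 - p)^{N-k},

which tends to 1 as N \to \infty by the law of large numbers (the Condorcet jury argument).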
Since there exists a trade-o between the block size and the block number for a given image in non- overlapping block decomposition, we can improve the performance of block-based image steganalysis by nding a balance between the block size and the block number. In addition, the use of overlapping blocks was proposed to increase the block number for overlapping block decomposition case. Experimental results were given to demon- strate that the block-based image steganalysis approach can achieve better detection accuracy with overlapping block decomposition. Thus, besides the search of more and better features, the block-based methodology provides a complementary way to increase the image steganalysis performance. Finally, experimental results were given to show that the detection accuracy of for block-based image steganalysis increases with a larger number of block classes and dierent classiers such as Fisher linear discriminant classi- er and logistic classier. Decision fusion for block-based image steganalysis is discussed in Chapter 5. As we have multiple block decisions from each test image, we need to nd a systematical method to combine all the block decision results together to make a nal decision of given image. After explaining dierent decision fusion levels and topologies, we used decision level fusion with multiple (parallel) topology. For decision level fusion, 3 dierent approaches are used: weighted majority voting, Bayesian decision fusion, and Dempster- Shafer theory of evidence. In addition, we also proposed how to give importance for each 99 block decision for weighted majority voting by using thresholding method and adaptive weighting method. Finally, the performance comparison of dierent decision level fusion techniques are presented. Content-dependent feature selection for block-based image steganalysis is proposed in Chapter 6. As high dimensional feature set is used for block-based image steganalysis, it is necessary to select features with high discriminatory power to reduce the compu- tational complexity without sacricing the detection accuracy performance. Although it is dicult to reduce the size of feature set with traditional frame-based approach, we can select discrimant features depending on the characteristics of the block type for block-based image steganalysis. For content-dependent feature selection, we rst dene block types based on steganalysis features and discuss several approaches to measure feature discriminatory power. Finally, we presented experimental result which shows performance improvement using content-dependent feature selection. 7.2 Future Research Directions To make our work complete, we would like to extend the block-based image steganalysis along the following directions. Consideration of dependency between neighboring block decisions Although the block number increases due to the use of overlapping blocks, we expect that the eectiveness of block overlapping will diminish as the step size becomes smaller and smaller since block decisions of adjacent blocks will correlate more and more. It will be interesting to analyze the performance of block-based image steganalysis more accurately by taking the dependency of block decisions into account. Advanced procedure for weights computation The process of computing weights based on block classes in block-based image steganalysis has a lot to be improved. The original weighting process gives weights 100 for block decisions proportional to block decision accuracy. 
In order to improve the weighting process, we need to consider two important factors. First, we need to give more weight to block decisions with high block decision accuracy values. Second, we need to penalize block decisions whose block decision accuracy is lower than 0.5. We should not give any weight at all to block decisions if they are more likely to make errors.

Block adaptive feature set
It is worthwhile to consider a block adaptive feature set depending on block classes determined by the characteristics of image blocks. This can be considered an additional way to take advantage of the homogeneous blocks obtained from a heterogeneous image. Different feature sets can be selected based on block classes in order to analyze image blocks for steganalysis purposes. Thus, it is possible to use multiple feature sets together to obtain better steganalysis performance for a given test image.

Adaptive block decomposition based on block characteristics
In block-based image steganalysis, images are decomposed into smaller homogeneous blocks of the same size. However, with a fixed block size, not all blocks are homogeneous; it depends on their characteristics. For example, larger blocks from smooth regions of a given image can still be homogeneous, while smaller blocks are required for regions around edges to remain homogeneous. Thus, it would be beneficial to consider adaptive block decomposition, which changes the block size adaptively based on block characteristics (a possible realization is sketched at the end of this section).

Multi-resolution approach
Different block sizes can be used for block decomposition in block-based image steganalysis. We can consider a multi-resolution approach by combining the steganalysis results obtained with different block sizes. This is possible if we can give different weights to the steganalysis results from different block sizes. This multi-resolution approach is another methodology to exploit the rich information of natural images in depth.

Using stego images with full embedding rate for the training process
Many steganographic embedding algorithms are block-based, embedding the secret message separately into each 8×8 DCT block. This implies that, depending on the embedding rate, not all DCT blocks carry part of the message. However, block-based image steganalysis assumes that all the homogeneous blocks from stego images in the training set are embedded with the secret message. If the embedding rate for stego images is low, then some blocks will not be embedded with the message. Thus, it would be beneficial to use stego images with a full embedding rate for the training process in order to design a more accurate classifier for block-based image steganalysis.
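For the adaptive block decomposition direction mentioned above, the sketch below shows one possible realization under simple assumptions: a quadtree-style split that keeps a block whole when its pixel variance is below a homogeneity threshold and otherwise splits it into four sub-blocks, down to a minimum block size. The variance criterion, the threshold value, and the block sizes are illustrative choices only, not part of the proposed method.

```python
import numpy as np

def adaptive_blocks(image: np.ndarray, top: int = 0, left: int = 0,
                    size: int = 64, min_size: int = 8,
                    var_threshold: float = 100.0):
    """Recursively split a square region into (more) homogeneous blocks.

    Yields (top, left, size) tuples. A region is kept whole when its pixel
    variance is below var_threshold or it has reached min_size; otherwise
    it is split into four quadrants (quadtree decomposition).
    """
    region = image[top:top + size, left:left + size]
    if size <= min_size or region.var() < var_threshold:
        yield (top, left, size)
        return
    half = size // 2
    for dt, dl in ((0, 0), (0, half), (half, 0), (half, half)):
        yield from adaptive_blocks(image, top + dt, left + dl,
                                   half, min_size, var_threshold)

# Example on a synthetic image: a smooth half stays in large blocks,
# a noisy half is split down to the minimum block size.
img = np.zeros((64, 64))
img[:, 32:] = np.random.default_rng(0).normal(128, 40, size=(64, 32))
blocks = list(adaptive_blocks(img, size=64))
print(len(blocks), "blocks; sizes:", sorted({b[2] for b in blocks}))
```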