
COMPRESSED-DOMAIN TEMPORAL ADAPTATION-RESILIENT WATERMARKING FOR H.264 VIDEO AUTHENTICATION

Sharmeen Shahabuddin1, Razib Iqbal1, Shervin Shirmohammadi1,2, Jiying Zhao2

1 Distributed Collaborative Virtual Environments Research Laboratory
2 Multimedia Communications Research Laboratory
School of Information Technology and Engineering
University of Ottawa, Ottawa, Canada
{ sshahabuddin | riqbal | shervin }@discover.uottawa.ca

ABSTRACT

In this paper we present a DCT-domain watermarking approach for the H.264/AVC video coding standard that is resilient to compressed-domain temporal adaptation. A cryptographic hash function is used to generate a semi-fragile watermark that provides content-based authentication. The embedded watermark withstands frame dropping due to temporal adaptation, yet it can detect malicious attacks such as content modification and transcoding. Simulation results demonstrate that the scheme is computationally efficient and suitable for practical use.

Index Terms – H.264, watermarking, adaptation, protection, video security

1. INTRODUCTION

The diversity of devices through which multimedia content is accessed and interacted with has grown significantly. Handheld devices such as the iPod, PSP, and iPhone can now easily download and play rich media files from the Internet and other sources. Multimedia researchers are now focusing on accommodating these small devices for live streaming and other day-to-day media-based services such as video surveillance. Given the sensitivity of some of this content, encryption and/or authentication is applied to establish its trustworthiness. Video adaptation is a recently introduced practice of directly manipulating an encoded bitstream to meet resource constraints without re-encoding the video from scratch.
For example, temporal adaptation matches the original video to a device's processing capability or to varying network bandwidth by intermittently dropping frames from the original bitstream. A mechanism is therefore needed to ensure the authenticity of the video even after temporal adaptation. Digital watermarking is commonly used to authenticate media content such as video, audio, and images by embedding customized data directly into the content. As for the video itself, H.264 is the latest coding and compression standard and is expected to dominate the field due to its advanced compression technique, improved perceptual quality, network friendliness, and versatility [1]. A few watermarking techniques have recently been proposed for authenticating H.264 video, but to the best of our knowledge, none of the existing works can withstand temporal adaptation. Considering recent advances in video adaptation that allow compressed-domain video processing (i.e., no cascaded decoding and re-encoding) [2], in this paper we propose a watermarking mechanism for authenticating H.264 video that can withstand compressed-domain temporal adaptation.

The rest of the paper is organized as follows: in Section 2, we briefly discuss watermarking and the H.264 video coding standard. We review some existing watermarking approaches for H.264 video authentication in Section 3. Our proposed watermarking method is discussed in Section 4. Section 5 presents simulation results that demonstrate the performance of our scheme. In Section 6, we analyze our watermarking scheme in a deployment scenario. Finally, concluding remarks are given in Section 7.

2. VIDEO WATERMARKING AND H.264

Video watermarking methods can be classified into spatial-domain, transform-domain, and compressed-domain approaches. In spatial-domain watermarking, the watermark is embedded directly in the pixel domain.
Spatial-domain techniques are conceptually straightforward and have low time complexity in the watermark embedding and detection steps. On the downside, they fail to meet adequate robustness and imperceptibility requirements. In transform-domain watermarking, the host signal is first transformed into a different (frequency) domain, and the watermark is then embedded in selected coefficients. The Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT) are two widely used transforms. Embedding the watermark in the transform domain is advantageous in terms of imperceptibility and security. In compressed-domain approaches, the watermark bits are inserted either directly into the encoded bitstream or into a partially decoded bitstream. The benefit is that these approaches are computationally less expensive and require less processing power.

In H.264, there are three types of frames: intra-coded frames (I-frames), inter-coded frames (P-frames), and bidirectionally predicted frames (B-frames). Each coded frame consists of one or more slices, each containing a number of macroblocks. Each macroblock contains a luminance (luma) component and chrominance (chroma) components. I-macroblocks are predicted using intra-prediction from previously decoded samples in the same slice. P- and B-macroblocks are predicted using inter-prediction from previously decoded samples in reference frames. A residual macroblock is formed by subtracting the predicted macroblock from the actual macroblock. This residual is transformed from the spatial to the frequency domain using a 4×4 integer transform, which is an approximation of the DCT. The transform outputs a set of DC and AC coefficients, which are then quantized and coded. For H.264 video coding details, please see [1].

3. LITERATURE REVIEW

In this section we introduce some prominent watermarking schemes for H.264 video authentication. In [3], Qiu et al.
propose a hybrid watermarking scheme that, during encoding, embeds a robust watermark in the DCT coefficients of I-frames and a fragile watermark in the motion vectors of P-frames for authentication. Pröfrock et al. [4] propose a transcoder that analyzes the original H.264 bitstream, computes a watermark, embeds the watermark for hard authentication, and generates a new H.264 bitstream. In [5], Zhang and Ho propose a scheme that uses the tree-structured motion compensation, motion estimation, and Lagrangian optimization of the H.264 standard; the authentication information is represented by a binary watermark sequence and embedded into the video frames. None of these schemes is resilient to compressed-domain temporal adaptation as performed in [2], where frames are intermittently dropped directly from the compressed bitstream using corresponding metadata generated during encoding. The scheme proposed here can be applied to authenticate such adapted video streams.

4. OVERVIEW OF OUR SCHEME

In our approach, we insert a watermark into each frame of the video stream during encoding. Detection is straightforward and can be performed during decoding in an H.264 player or by an external watermark authenticator. In this section, we briefly introduce the embedding and detection procedures. An overview of the entire scheme is illustrated in Figure 1.

Figure 1: Watermark embedding and extraction schemes. (a) Watermark embedding; (b) Watermark detection.

4.1. Embedding Procedure

In our scheme, every frame contains an independent watermark to authenticate itself, except for I-frames, whose watermark depends on the previous I-frame (with the obvious exception of the very first I-frame). We propose that the removal of I-frames be considered a malicious operation because of its significant impact on video quality and the dependence of P- and B-frames on the previous I-frames.
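
As a minimal sketch of this frame-dependency rule (not the authors' implementation), the Python code below reduces each frame to a frame type and an opaque byte string, a hypothetical stand-in for the motion vectors and modified DCT coefficients hashed in Step 2. It uses SHA-1 from Python's `hashlib`, as in the paper's experiments, omits the signature, and chains I-frames by simple concatenation; the function names `frame_digests` and `authenticate` are illustrative.

```python
import hashlib


def _i_frame_digest(prev_i: bytes, data: bytes) -> bytes:
    # An I-frame is hashed together with the preceding I-frame (simple
    # concatenation in this sketch; prev_i is empty for the very first I-frame).
    # A real encoder would use an unambiguous encoding of the two parts.
    return hashlib.sha1(prev_i + data).digest()


def frame_digests(frames):
    """Encoder side: one SHA-1 digest per (frame_type, data) pair."""
    digests, prev_i = [], b""
    for ftype, data in frames:
        if ftype == "I":
            digests.append(_i_frame_digest(prev_i, data))
            prev_i = data
        else:  # P- and B-frames are hashed on their own
            digests.append(hashlib.sha1(data).digest())
    return digests


def authenticate(received):
    """Receiver side: check each (frame_type, data, embedded_digest) triple.

    I-frames are chained to the last I-frame actually received, so a
    dropped I-frame makes the next I-frame fail its check.
    """
    results, prev_i = [], b""
    for ftype, data, embedded in received:
        if ftype == "I":
            results.append(_i_frame_digest(prev_i, data) == embedded)
            prev_i = data
        else:
            results.append(hashlib.sha1(data).digest() == embedded)
    return results


# Usage: a short GOP with hypothetical frame payloads.
frames = [("I", b"i0"), ("P", b"p1"), ("B", b"b2"), ("P", b"p3"),
          ("I", b"i4"), ("P", b"p5"), ("I", b"i6")]
stream = [(t, d, h) for (t, d), h in zip(frames, frame_digests(frames))]
print(authenticate(stream[:1] + stream[2:]))  # P-frame dropped: all True
print(authenticate(stream[:4] + stream[5:]))  # I-frame dropped: last entry False
```

Dropping a P- or B-frame leaves every remaining check intact, whereas dropping an I-frame causes the next I-frame to fail. When the media server permits dropping I-frames, the I-frame branch would instead hash the frame alone, exactly like P- and B-frames.
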
If a P- or B-frame is dropped from the encoded video, the watermark does not break, so the client can properly authenticate the remaining frames with their respective watermarks. At the same time, if an I-frame is dropped, the client can detect it from the watermark of the following I-frame. However, this step is optional: if the media server permits dropping I-frames, the watermarks for I-frames are computed in the same way as those of P- and B-frames.

The watermark is embedded only in the quantized AC residuals of the 4×4 luma blocks of each frame, since the human visual system is more sensitive to luminance than to chrominance information [6]. A single high-frequency quantized AC coefficient is selected for embedding, since AC coefficients at high indices are usually zero. The steps to generate and embed the watermark data into the video stream are as follows:

Step 1: To generate the watermark data, the AC coefficient at the selected high-index location of every 4×4 DCT coefficient matrix is set to 0.

Step 2: For P- and B-frames, the hash value of the entire frame (i.e., the motion vectors and the DCT coefficients modified in Step 1) is computed. For I-frames, the hash value of the combined contents of the frame itself and the preceding I-frame is computed, so that the removal of an I-frame can be detected. This step enables clients to detect any attack that modifies frame content, since such a modification changes the cryptographic hash value of the frame and thus invalidates the embedded hash value.

Step 3: The computed hash value is signed with the encoder's private key. This step prevents an attacker from inserting new frames into the stream, even with prior knowledge of the watermarking scheme, because an inserted frame will lack the encoder's cryptographic signature.

Step 4: The computed hash value and the signature are embedded into the selected location of the 4×4 DCT coefficient matrices.

4.2. Detection Procedure

To detect the watermark, the compressed video stream is first partially decoded to obtain the quantized DCT coefficients. The watermark message for each frame is then extracted from the DCT coefficients. To verify authenticity, the decoder or a separate watermark authenticator performs Steps 1 and 2 of the embedding procedure to compute the hash value of the frame. The authenticity of the video stream is confirmed if the hash value computed for the frame equals the hash value extracted from the watermark and the signature extracted from the watermark verifies correctly with the encoder's public key.

5. PERFORMANCE EVALUATION

To test our watermarking scheme, we extended the x264 encoder [7]. We used the SHA-1 hash function [8] and the RSA signature scheme [9] from the OpenSSL library [10]. For our experiments, we used two well-known test sequences, 'Foreman' and 'Hall'. The resolution of each video is 176×144 pixels (QCIF), and the IDR interval is set to 20 frames.

5.1. Performance Metrics

To evaluate the performance of the watermarking scheme, we look at two criteria: video quality after embedding the watermark, and the processing-time overhead that watermark embedding adds during encoding. A good watermarking scheme must ensure that the video is not distorted by the watermark embedding process. To show the effect of our scheme on video quality, we use two measures: PSNR (Peak Signal to Noise Ratio) and the SSIM (Structural Similarity) index [11].

5.2. Results

In Figure 2(a) and Figure 2(b), we compare the PSNR and the SSIM indices, respectively, of the watermarked video streams with those of the corresponding non-watermarked streams. Figure 2(a) shows that the PSNR differences are very small, less than 1.0 dB for both clips. In addition, the SSIM indices of the test sequences vary only slightly, as Figure 2(b) shows. This indicates that the quality degradation caused by our watermarking scheme is negligible.

The quantization parameter used for encoding has a large effect on the perceptual quality of the video. To investigate the rate-distortion behavior caused by embedding the watermark, we encode the clips with fixed quantization parameters QP = [26, 28, 30, 32, 34, 36, 38, 40, 42], which are typical QPs for low bit-rate applications. The results of this investigation are presented in Figure 2(c).

Figure 2: Effect of watermark on the video quality. (a) Peak Signal to Noise Ratio: PSNR (dB) vs. bitrate (kb/s); (b) Structural Similarity: SSIM index vs. bitrate (kb/s); (c) PSNR vs QP: PSNR (dB) vs. quantization parameter. Each panel compares Foreman, Foreman (watermarked), Hall, and Hall (watermarked).

Figure 3 shows the encoding time overhead due to our watermarking scheme. This overhead depends heavily on the RSA signing step, in particular on the key size. With a 1024-bit private key, the processing time increases by roughly 15 to 20%. The overhead drops drastically as the key size is reduced; for example, with a 256-bit private key, the overhead is only 2 to 3%. Although a cryptographic hash and a digital signature are computed for every frame of the video, this increase in time is fairly small and suitable for a real-time system.
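
The per-frame cost discussed above comes from hashing and signing every frame. The Python sketch below traces embedding Steps 1 to 4 (Section 4.1) and the detection procedure (Section 4.2) for a single frame. It is not the authors' x264/OpenSSL implementation, and several details are assumptions not given in the paper: a frame is a list of quantized 4×4 luma coefficient blocks plus a byte string of motion vectors, the embedding position is (3, 3), each block carries one payload bit written as the coefficient value 0 or 1, and Python's built-in `pow` performs textbook RSA with a toy key (p = 61, q = 53) in place of OpenSSL RSA. The toy key is insecure and only illustrates the data flow.

```python
import hashlib

# Toy textbook-RSA key from p = 61, q = 53: n = 3233, e = 17, d = 2753.
# ASSUMPTION / ILLUSTRATION ONLY: an insecure stand-in for OpenSSL RSA signing.
N, E, D = 3233, 17, 2753

POS = (3, 3)                 # assumed high-frequency embedding position
HASH_BITS = 160              # SHA-1 digest length
SIG_BITS = N.bit_length()    # 12 bits for the toy modulus


def frame_hash(blocks, motion_vectors=b""):
    """Steps 1-2: SHA-1 over motion vectors and blocks, with POS forced to 0."""
    h = hashlib.sha1(motion_vectors)
    for blk in blocks:
        vals = [0 if (r, c) == POS else blk[r][c] for r in range(4) for c in range(4)]
        h.update(",".join(map(str, vals)).encode() + b";")
    return int.from_bytes(h.digest(), "big")


def to_bits(value, width):
    return [(value >> i) & 1 for i in reversed(range(width))]  # MSB first


def from_bits(bits):
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def embed(blocks, motion_vectors=b""):
    """Steps 1-4: hash, sign, then write one payload bit per 4x4 block."""
    digest = frame_hash(blocks, motion_vectors)
    signature = pow(digest % N, D, N)  # Step 3 (toy RSA on the reduced hash)
    payload = to_bits(digest, HASH_BITS) + to_bits(signature, SIG_BITS)
    if len(payload) > len(blocks):
        raise ValueError("frame has too few 4x4 blocks for the payload")
    out = [[row[:] for row in blk] for blk in blocks]
    r, c = POS
    for i, blk in enumerate(out):
        # Payload bit where available; 0 elsewhere, as set in Step 1.
        blk[r][c] = payload[i] if i < len(payload) else 0
    return out


def verify(blocks, motion_vectors=b""):
    """Section 4.2: extract payload, redo Steps 1-2, check hash and signature."""
    r, c = POS
    bits = [blk[r][c] for blk in blocks[:HASH_BITS + SIG_BITS]]
    if len(bits) < HASH_BITS + SIG_BITS or any(b not in (0, 1) for b in bits):
        return False
    embedded_hash = from_bits(bits[:HASH_BITS])
    signature = from_bits(bits[HASH_BITS:])
    if frame_hash(blocks, motion_vectors) != embedded_hash:
        return False
    return pow(signature, E, N) == embedded_hash % N
```

A QCIF frame has 176×144/16 = 1584 luma 4×4 blocks, far more than the 160 + 12 = 172 payload bits needed in this sketch. After `wm = embed(blocks, mv)`, `verify(wm, mv)` returns True, while changing any coefficient outside the embedding position, any payload bit, or the motion vectors makes it return False.
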

Figure 3: Encoding time overhead for embedding watermark. (a) Processing Time (Foreman); (b) Processing Time (Hall). Time (in seconds) vs. number of frames, without watermark and watermarked with 1024-, 512-, and 256-bit PK.

6. ANALYSIS

In the proposed approach, the watermarked video is H.264 compliant and the bitrate change due to watermarking is negligible. Authentication does not require the original H.264 video; rather, a separate authenticator can verify the validity of the received video data. The authenticator can be independent of the decoder, in which case no delay is added to decoding. Lastly, watermark embedding can be performed on live video streams.

In a Universal Media Access (UMA) environment, a Media Streaming Server (MSS) can be connected to several Proxy Servers (PS) and clients. To serve directly connected clients, an MSS sends the watermarked video content after the requested adaptation. A PS serving only a particular class of clients (e.g., small handheld devices) can request a specific adapted video (e.g., QCIF, 15 fps) from the MSS, which then sends the specified adapted video content with the watermark. Alternatively, the MSS sends the original watermarked H.264 video, and the PS adapts it, with the watermark still present, and serves it to its clients. The client can then authenticate the original source of the video (the MSS) from the authentication signature embedded within the watermark.

7. CONCLUSION

In this paper, we presented a semi-fragile authentication approach based on a cryptographic hash function. The watermark payload is self-contained within each frame, which makes the scheme resilient to temporal adaptation of H.264 video. The experimental results show that our method is suitable for real-time systems. Moreover, it does not introduce noticeable artifacts in the video. Our research team is now focusing on spatial-adaptation-resilient watermarking for localized authentication.

8. REFERENCES

[1] C. Gomila and P. Yin, "New features and applications of the H.264 video coding standard," in Proc. Intl. Conf. on Info. Tech.: Research and Education, pp. 6–10, 2003.
[2] R. Iqbal, S. Shirmohammadi, and A. El Saddik, "A Framework for MPEG-21 DIA Based Adaptation and Perceptual Encryption of H.264 Video," in Proc. SPIE/ACM MMCN, 2007.
[3] G. Qiu, P. Marziliano, A. T. S. Ho, D. He, and Q. Sun, "A hybrid watermarking scheme for H.264/AVC video," in Proc. 17th Intl. Conf. on Pattern Recognition, vol. 4, Cambridge, UK, pp. 865–868, 2004.
[4] D. Pröfrock, H. Richter, M. Schlauweg, and E. Müller, "H.264/AVC Video Authentication Using Skipped Macroblocks for an Erasable Watermark," in Proc. SPIE Visual Communications and Image Processing, vol. 5960, pp. 1480–1489, 2005.
[5] J. Zhang and A. T. S. Ho, "Efficient Video Authentication for H.264/AVC," in Proc. 1st Intl. Conf. on Innovative Computing, Information and Control, vol. 3, pp. 46–49, 2006.
[6] I. E. G. Richardson, H.264 and MPEG-4 Video Compression. Chichester, West Sussex: Wiley, 2003.
[7] http://www.videolan.org/developers/x264.html
[8] Secure Hash Standard (SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512), 1 August 2002, amended 25 February 2004.
[9] R. L. Rivest, A. Shamir, and L. Adleman, "A method for obtaining digital signatures and public-key cryptosystems," Communications of the ACM, 1978.
[10] http://www.openssl.org
[11] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004.