A Robust Audio Watermarking Algorithm Based on SVD-DWT

Abstract—In this paper, we propose a novel blind audio watermarking algorithm that combines Singular Value Decomposition (SVD) with the Discrete Wavelet Transform (DWT). We first partition the rearranged audio signal into blocks, then generate a vector by selecting the largest singular value of each block after performing SVD on the blocks. Finally, we embed the watermark into the approximate components obtained from the DWT decomposition of this vector by means of a quantization process. Experimental results show that our algorithm is robust against common audio signal processing operations. Compared with earlier SVD-based schemes, the proposed scheme has satisfactory imperceptibility and an improved payload.


I. INTRODUCTION
With the rapid development of network technology and digital audio techniques, illegal users can easily obtain audio resources and distribute them. Therefore, the problem of protecting the copyright of audio has attracted a great deal of attention [1].
This problem can be solved efficiently by embedding copyright information in the host audio, an approach known as audio watermarking. As stipulated by the International Federation of the Phonographic Industry (IFPI) [2], an audio watermarking algorithm is considered effective if it meets the following four basic requirements: 1) Robustness: unless the audio suffers serious damage, the embedded information can be accurately extracted from the watermarked audio even after the host audio undergoes common audio signal processing operations; 2) Imperceptibility: the difference between the original audio and the watermarked audio can hardly be perceived by the human ear. In addition, the Signal to Noise Ratio (SNR), which is used to assess audio quality, should be more than 20 dB; 3) Payload: the payload of an audio watermarking scheme is usually measured in bits per second (bps), and the payload of an effective watermarking algorithm should be higher than 20 bps without affecting the imperceptibility of the audio; 4) Security: the security of a scheme should not rely on the secrecy of its algorithm. People without authorization must not be able to extract the embedded information from the watermarked audio.
In previous works [3], many audio watermarking techniques have been proposed, and several good schemes have contributed to significant progress. Most audio watermarking algorithms can be classified into two categories according to the domain in which the watermark bits are embedded: time-domain algorithms and transform-domain algorithms. On the one hand, time-domain algorithms usually embed the watermark information by directly modifying the audio signal, such as the Least Significant Bit (LSB) algorithm [4] and the Echo algorithm [5]. Although these algorithms may perform very well in terms of imperceptibility, the embedded regions are easily affected by many common signal processing and geometric operations. In other words, they cannot meet the requirement of robustness. On the other hand, transform-domain algorithms [6] usually use the Discrete Cosine Transform (DCT), the Discrete Wavelet Transform (DWT) or the Discrete Fourier Transform (DFT) to transform the audio signal and locate appropriate embedding positions.
Similarly, pure transform-domain algorithms still show unsatisfactory robustness. To cope with this problem, Singular Value Decomposition (SVD) has been applied to robust watermarking as an effective transformation technique [7]. Recently, several audio watermarking techniques that combine DWT or the Short-Time Fourier Transform (STFT) with SVD have been proposed in [8]-[10]. For instance, Bhat et al. [11] proposed an adaptive audio watermarking algorithm in which the watermark information is embedded in the DWT domain based on SVD. The scheme is found to be imperceptible and robust to many attacks. However, its capacity is not very high. Bai Ying [12] presented an audio watermarking algorithm based on SVD-DCT. This method differs from schemes that apply SVD to the transformed coefficients. The watermark information is embedded by modifying the DCT coefficients of blocks composed of the singular values (SVs) obtained by performing SVD on the audio blocks. This method also performs well in terms of imperceptibility and robustness. However, its capacity suffers from the same drawback as that of [11].
In this paper, unlike traditional SVD-based schemes, which embed watermark bits by modifying the SVs directly, the proposed scheme first obtains the largest SV of each audio sub-block after applying SVD. Second, it concatenates the selected SVs to obtain a vector. Then the DWT is performed on the vector. Finally, we embed the watermark bits by modifying the resulting approximate component. To improve the security of the scheme, a pseudo-random location sequence generated from a secret key is used to select the positions of the approximate component that are modified.
The paper is structured as follows: Section II introduces the background on SVD and DWT used in our scheme; Section III describes the embedding algorithm in detail. Watermark extraction is addressed in Section IV. The experimental results are presented in Section V. The performance comparison is discussed in Section VI. Finally, we conclude the paper in Section VII.

A. SVD
The SVD of a matrix is a factorization of the matrix into three matrices. The SVD of an $m \times n$ matrix $M$ can be described as

$$M = U S V^{T},$$

where $U$ and $V$ are orthogonal matrices and $S$ is a diagonal matrix whose non-negative diagonal entries are the singular values (SVs) of $M$, arranged in descending order. The properties of the SVD mean that a slight modification of some components of the matrix $S$ rarely degrades the perceptual features of the cover matrix. This property can be used to satisfy the robustness requirement of a watermarking scheme.
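The factorization and the stability of the largest singular value can be checked numerically. The following is a minimal sketch using NumPy; the random 8 x 8 block, the seed and the perturbation size are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def largest_singular_value(block):
    """Return the largest singular value of a 2-D block."""
    # With compute_uv=False NumPy returns only the singular values,
    # sorted in descending order, so index 0 is the largest.
    return np.linalg.svd(block, compute_uv=False)[0]

rng = np.random.default_rng(0)               # fixed seed, illustrative data only
block = rng.standard_normal((8, 8))          # stand-in for one 8 x 8 audio block
U, s, Vt = np.linalg.svd(block)              # block = U @ diag(s) @ Vt
reconstructed = U @ np.diag(s) @ Vt

# A small perturbation of the block moves the largest singular value by
# no more than the spectral norm of the perturbation.
perturbation = 1e-3 * rng.standard_normal((8, 8))
shift = abs(largest_singular_value(block + perturbation)
            - largest_singular_value(block))
bound = np.linalg.norm(perturbation, 2)
```

Here `shift` stays below `bound`, which is the sense in which the singular values are stable under small changes of the matrix.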

B. DWT
DWT is a useful tool for processing digital signals and has been widely used in computer science and engineering [14]. The DWT of a signal $x(k)$ is computed by passing it through a pair of filters $g$ and $h$, each followed by downsampling by two, where $g$ and $h$ denote the low-pass filter and the high-pass filter, respectively. An original signal is thus resolved into two parts: the approximate part and the detail part. Further, the approximate part can itself be decomposed into two parts: a low-frequency part and a high-frequency part. A 3-level DWT decomposition of a signal is shown in Fig. 1. In this paper, we choose the level-3 approximate part $w_3$ for watermark embedding because [15] showed that it has excellent spatio-frequency localization properties.
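A 3-level decomposition can be sketched with the Haar wavelet, which is the basis reported in the experiments. The helper names below (`haar_dwt`, `haar_wavedec` and so on) are hypothetical and written in plain NumPy for illustration; they are not the authors' implementation.

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT of an even-length signal: (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    # Low-pass branch (sum) and high-pass branch (difference), each
    # already downsampled by two.
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def haar_wavedec(x, levels=3):
    """Multi-level decomposition, returned as [a_L, d_L, ..., d_1]."""
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)   # only the approximation is split again
        details.append(detail)
    return [approx] + details[::-1]

def haar_waverec(coeffs):
    """Rebuild the signal from [a_L, d_L, ..., d_1]."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_idwt(approx, detail)
    return approx

signal = np.arange(16, dtype=float)
coeffs = haar_wavedec(signal, levels=3)     # coeffs[0] is the level-3 approximation
```

Each level halves the length of the approximation, so a 16-sample input yields a level-3 approximation of 2 coefficients.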

A. Synchronization Code
Embedding a synchronization code in the host audio can effectively resist cropping and compression attacks. This strategy has been widely used in watermarking techniques [16]. To reduce the computation and time required when searching for the synchronization code, we embed the synchronization code in the time domain. In our scheme, the 16-bit Barker code $C_0 = [1111100110101110]$, which has low autocorrelation properties, is used as the synchronization code. We divide the host audio into two parts, $A_1^0$ and $A_2^0$: one is for synchronization code embedding and the other is for watermark embedding. The synchronization code is embedded as follows:
Step 1. Segment the audio signal $A_1^0$, supposing that each audio segment has $n$ samples. The synchronization code is embedded into $Lsyn$ audio segments, where $SP(k)$ denotes the $k$th audio segment.
Step 2. Calculate the mean value $\overline{SP}(k)$ of each audio segment according to formula (9). Each synchronization code bit is then embedded by quantizing this mean value, where $\mathrm{floor}(\cdot)$ denotes rounding toward minus infinity, $\mathrm{mod}(\cdot)$ denotes the modulus after division, and $\Delta_1$ is the quantization step.
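Since the embedding formulas are not reproduced in the text, the following is only a sketch of one common way to quantize a segment mean: the mean is moved to the centre of the nearest quantization cell whose index parity equals the bit. The parity rule and the function names are assumptions; the segment length of 5 samples and the step of 0.035 are the values reported in the experiments.

```python
import numpy as np

SYNC_CODE = np.array([int(c) for c in "1111100110101110"])

def embed_bit_in_mean(segment, bit, step):
    """Shift a segment so that floor(mean / step) mod 2 equals the bit."""
    mean = segment.mean()
    q = int(np.floor(mean / step))
    if q % 2 != bit:
        # Pick the neighbouring cell (parity flips) that is closer to the mean.
        q += 1 if mean / step - q >= 0.5 else -1
    target = (q + 0.5) * step              # cell centre gives the largest margin
    return segment + (target - mean)       # same offset added to every sample

def extract_bit_from_mean(segment, step):
    """Read the bit back as the parity of the quantization cell index."""
    return int(np.floor(segment.mean() / step)) % 2

def embed_sync_code(audio, n=5, step=0.035):
    """Embed SYNC_CODE into the first len(SYNC_CODE) segments of n samples."""
    out = np.array(audio, dtype=float)
    for k, bit in enumerate(SYNC_CODE):
        out[k * n:(k + 1) * n] = embed_bit_in_mean(out[k * n:(k + 1) * n], bit, step)
    return out

rng = np.random.default_rng(0)
host = rng.uniform(-0.5, 0.5, 16 * 5)      # illustrative host samples
marked = embed_sync_code(host)
recovered = [extract_bit_from_mean(marked[k * 5:(k + 1) * 5], 0.035) for k in range(16)]
```

With this rule each sample changes by at most one quantization step, and the recovered bits match the code as long as later distortion moves a segment mean by less than half a step. Clipping to the valid sample range is not handled in this sketch.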

B. Embedding Method
The overall watermark embedding procedure is shown in Fig. 2. The details of the watermark bit embedding procedure are described below.
Step 1. Partitioning. In the watermark embedding process, we first rearrange the audio signal $A_2^0$ into a matrix $B(m, m)$, and then divide the matrix into non-overlapping $8 \times 8$ blocks $b$.
Step 2. SVD transformation. We apply SVD to each block $b_i$.
Step 3. DWT decomposition. We first generate the vector $L$ from the largest singular value of each block, and then perform the DWT on $L$ to obtain the approximate component $L_3$.
Step 4. Watermark embedding. The embedding positions in $L_3$ are selected by the pseudo-random location sequence generated from the secret key, and $\Delta_2$ is the quantization step. Supposing that the watermark bit $w$ is to be embedded in position $p_1$, the embedding process is described by (10).
Step 5. Inverse DWT and SVD. We first perform the inverse DWT on the modified vector $L$, then reconstruct $S'$ and calculate the watermarked blocks by the inverse SVD. Finally, we reduce the dimensions of the matrix, converting it back into a one-dimensional signal, so that the watermark is embedded in the audio.
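The five steps can be sketched end to end as follows. This is an illustration under stated assumptions rather than the authors' implementation: the quantization rule standing in for (10) is a parity-based rule, the key-driven position sequence is a seeded NumPy permutation, the Haar DWT is applied over 3 levels, and all function names are hypothetical.

```python
import numpy as np

BLOCK, LEVELS = 8, 3   # 8 x 8 blocks; 3 DWT levels (assumed)

def haar_dwt(x):
    """One-level Haar DWT: returns (approximation, detail)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def to_blocks(audio):
    """Step 1: rearrange the audio into an m x m matrix of 8 x 8 blocks."""
    m = int(np.sqrt(len(audio))) // BLOCK * BLOCK
    k = m // BLOCK
    matrix = np.asarray(audio[:m * m], dtype=float).reshape(k, BLOCK, k, BLOCK)
    return matrix.swapaxes(1, 2).reshape(-1, BLOCK, BLOCK), m

def analyse(blocks):
    """Steps 2-3: batched SVD, then DWT of the vector of largest SVs."""
    U, S, Vt = np.linalg.svd(blocks)
    usable = len(S) // 2**LEVELS * 2**LEVELS     # length divisible by 2^LEVELS
    approx, details = S[:usable, 0].copy(), []
    for _ in range(LEVELS):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return U, S, Vt, approx, details

def positions_from_key(key, length, count):
    """Hypothetical key-driven position sequence (seeded permutation)."""
    return np.random.default_rng(key).permutation(length)[:count]

def embed(audio, bits, key, step=0.1):
    blocks, m = to_blocks(audio)
    U, S, Vt, approx, details = analyse(blocks)
    for p, bit in zip(positions_from_key(key, len(approx), len(bits)), bits):
        # Step 4 (assumed rule): move the coefficient to the centre of the
        # nearest quantization cell whose index parity equals the bit.
        q = int(np.floor(approx[p] / step))
        if q % 2 != bit:
            q += 1 if approx[p] / step - q >= 0.5 else -1
        approx[p] = (q + 0.5) * step
    for detail in reversed(details):             # Step 5: inverse DWT ...
        approx = haar_idwt(approx, detail)
    S[:len(approx), 0] = approx
    marked = (U * S[:, None, :]) @ Vt            # ... and inverse SVD per block
    k = m // BLOCK
    out = np.array(audio, dtype=float)
    out[:m * m] = marked.reshape(k, k, BLOCK, BLOCK).swapaxes(1, 2).reshape(-1)
    return out

def extract(audio, n_bits, key, step=0.1):
    """Blind extraction: recompute the approximation and read cell parities."""
    blocks, _ = to_blocks(audio)
    approx = analyse(blocks)[3]
    positions = positions_from_key(key, len(approx), n_bits)
    return [int(np.floor(approx[p] / step)) % 2 for p in positions]
```

The sketch relies on the modified value remaining the largest singular value of its block, which holds when the largest singular value clearly dominates the second one, as it does for blocks with strong low-frequency content.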

A. Search for the Synchronization Code
With the start index of the audio set to $l$, the synchronization code search proceeds as follows:
Step 1. Extract the synchronization code from sample $l$ to sample $l + Lsyn \times n$, where $Lsyn$ is the length of the synchronization code and $n$ is the number of samples in each segment. Each synchronization code bit $V(k)$, where $1 \le k \le Lsyn$, is computed from the corresponding audio segment.
Step 2. Calculate the distance $T$ between $C_0$ and the extracted synchronization code $V$:

$$T = \sum_{k=1}^{Lsyn} C_0(k) \oplus V(k),$$

where $\oplus$ denotes the XOR operation.
Step 3. Check the value of $T$. If $T$ is less than the given threshold, $V$ is taken to be the synchronization code and watermark extraction begins. Otherwise, set $l = l + 1$ and repeat Step 1 and Step 2.
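The sliding search can be sketched as below. The bit-reading rule (parity of the quantized segment mean) is an assumption, since the extraction formulas are not reproduced in the text; the segment length of 5, the step of 0.035 and the threshold of 3 are the values reported in the experiments, and the function names are hypothetical.

```python
import numpy as np

SYNC_CODE = np.array([int(c) for c in "1111100110101110"])

def extract_code(audio, start, n, step):
    """Read len(SYNC_CODE) bits from consecutive n-sample segments."""
    segments = audio[start:start + len(SYNC_CODE) * n].reshape(len(SYNC_CODE), n)
    # Assumed rule: the bit is the parity of the quantized segment mean.
    return np.floor(segments.mean(axis=1) / step).astype(int) % 2

def find_sync(audio, n=5, step=0.035, threshold=3):
    """Return the first start index whose distance T is below the threshold."""
    last_start = len(audio) - len(SYNC_CODE) * n
    for start in range(last_start + 1):
        candidate = extract_code(audio, start, n, step)
        distance = int(np.sum(SYNC_CODE ^ candidate))   # Hamming distance via XOR
        if distance < threshold:
            return start
    return None

# Illustrative signal: a flat background reading as bit 0, with the code
# written into the segments that start at sample 40.
step, n = 0.035, 5
audio = np.full(200, 0.5 * step)
for k, bit in enumerate(SYNC_CODE):
    if bit:
        audio[40 + k * n:40 + (k + 1) * n] = 1.5 * step
position = find_sync(audio, n=n, step=step)
```

Because each bit is read from a segment mean, the search can lock on a start index a couple of samples before the true one, where most samples of every segment still carry the right bit.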

B. Watermark Extraction
The watermark is extracted without the original audio signal, as follows:
Step 1. Starting from the position determined by the synchronization code search, retrieve the SVD-DWT blocks and obtain the vector $L_3'$. The embedding positions are then obtained based on the secret key, where $Ls$ is the length of the vector $L_3'$. Finally, the watermark is extracted according to (15).
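Formula (15) is not reproduced in the text. A quantization-based scheme of this kind typically reads each bit from the index of the quantization cell that the coefficient falls into; the sketch below assumes a parity rule and uses the reported step of 0.1. The function name is hypothetical.

```python
import numpy as np

def qim_extract(coefficients, step):
    """Read one bit per coefficient as the parity of its quantization cell."""
    cells = np.floor(np.asarray(coefficients, dtype=float) / step).astype(int)
    return (cells % 2).tolist()

step = 0.1
# Coefficients placed at cell centres, as a parity-based embedder would leave them.
centres = (np.array([4, 7, 10, 13]) + 0.5) * step
bits_clean = qim_extract(centres, step)
# A disturbance smaller than half a step keeps every coefficient in its cell.
bits_disturbed = qim_extract(centres + np.array([0.04, -0.04, 0.03, -0.02]), step)
```

The half-step margin is what makes the extraction blind and tolerant: no original audio is needed, and the bits survive any attack that shifts a coefficient by less than `step / 2`.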

V. EXPERIMENTAL RESULTS
In this section, experimental results are presented to illustrate the performance of our scheme. We choose five common music types to build the test database. The detailed description is listed in Table I.

A. Imperceptibility
In this paper, we use the following two common performance indicators to objectively evaluate the imperceptibility of the proposed algorithm.
SNR (Signal to Noise Ratio) is a statistical difference measure used to assess the perceptual similarity between the watermarked audio and the original audio. SNR is calculated by

$$SNR = 10 \log_{10} \frac{\sum_{i} s(i)^2}{\sum_{i} [s(i) - s'(i)]^2}.$$

SegSNR (Segmental SNR) is defined as the mean of the SNRs of evenly segmented frames. It is widely used as an estimator of signal quality and is described as

$$SegSNR = \frac{1}{K} \sum_{k=0}^{K-1} 10 \log_{10} \frac{\sum_{i=kr+1}^{(k+1)r} s(i)^2}{\sum_{i=kr+1}^{(k+1)r} [s(i) - s'(i)]^2},$$

where $s'(i)$ and $s(i)$ denote the watermarked and original signals, respectively, $K$ is the number of frames in the watermarked audio signal, and $r$ is the number of samples in each frame. The SNR and SegSNR values of the tested audio signals, shown in Table III, are well above 20 dB, which satisfies the IFPI requirement. The original and watermarked audio signals in the time domain are presented in Fig. 3; the difference between the two is not visible.
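Both indicators are straightforward to compute. The sketch below uses the common definitions of SNR and segmental SNR; the function names and the test signal are illustrative assumptions, and frames with zero error would need special handling that is omitted here.

```python
import numpy as np

def snr_db(original, watermarked):
    """Global SNR in dB between the original and the watermarked signal."""
    error = original - watermarked
    return 10.0 * np.log10(np.sum(original**2) / np.sum(error**2))

def seg_snr_db(original, watermarked, frame_len):
    """Mean of the per-frame SNRs over K full frames of frame_len samples."""
    k = len(original) // frame_len
    s = original[:k * frame_len].reshape(k, frame_len)
    e = s - watermarked[:k * frame_len].reshape(k, frame_len)
    return float(np.mean(10.0 * np.log10(np.sum(s**2, axis=1) / np.sum(e**2, axis=1))))

original = np.ones(1000)
watermarked = 1.01 * original        # every sample is off by 1 %
global_snr = snr_db(original, watermarked)
segmental_snr = seg_snr_db(original, watermarked, frame_len=100)
```

A uniform 1 % error corresponds to an SNR of 40 dB, and because the error is the same in every frame the segmental value is identical in this example.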

B. Robustness Test
According to the IFPI, an effective audio watermarking algorithm should be robust to many common attacks. To illustrate the robustness of the proposed watermarking algorithm, several attacks are performed using MATLAB 2010 and CoolEdit 2.0. The attacks are described as follows: 1) Noise attack: white Gaussian noise is mixed with the watermarked signal until the SNR of the processed audio is 20 dB; 2) Resampling: the watermarked audio with a sampling rate of 44.1 kHz is down-sampled to 22.05 kHz and then up-sampled back to 44.1 kHz; in the other case, the watermarked audio with a sampling rate of 44.1 kHz is up-sampled to 88.2 kHz and then down-sampled back to 44.1 kHz; 3) Requantization: the 16-bit watermarked signal is quantized to 8 bits/sample and the processed audio signal is then restored to 16 bits/sample; 4) Low-pass filtering: a low-pass filter with a cut-off frequency of 22.05 kHz is applied to the watermarked audio; 5) Cropping: 100 samples of the watermarked audio are removed at three random positions; 6) MP3 compression: MPEG layer III compression and decompression is applied at bit rates of 196 kbps, 96 kbps and 64 kbps, respectively.
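Two of these attacks are easy to reproduce outside MATLAB or CoolEdit. The sketch below simulates the noise attack at a target SNR and the 8-bit requantization; it assumes floating-point samples in the range [-1, 1), and the function names are hypothetical.

```python
import numpy as np

def add_white_noise(signal, target_snr_db, rng):
    """Add white Gaussian noise scaled so that the resulting SNR hits the target."""
    noise = rng.standard_normal(len(signal))
    scale = np.sqrt(np.sum(signal**2) / (np.sum(noise**2) * 10.0**(target_snr_db / 10.0)))
    return signal + scale * noise

def requantize(signal, bits=8):
    """Round samples to a coarser grid and return them in the original scale."""
    levels = 2**(bits - 1)                       # 128 levels per polarity for 8 bits
    return np.clip(np.round(signal * levels), -levels, levels - 1) / levels

rng = np.random.default_rng(0)
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(4410) / 44100)   # illustrative signal
noisy = add_white_noise(tone, 20.0, rng)
coarse = requantize(tone, bits=8)
```

For samples inside the representable range, requantization to 8 bits changes each sample by at most 1/256, which is a useful figure to compare against the quantization steps used for embedding.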
We estimate the similarity between the original watermark image and the extracted image using the following two common measures. NC (Normalized Cross-correlation) can be calculated from (18). BER (Bit Error Rate) can be obtained from

$$BER = \frac{1}{M} \sum_{i} \sum_{j} I'(i, j) \oplus I(i, j),$$

where $I'$ and $I$ are the extracted watermark and the original watermark, respectively, $M$ is the number of embedded image bits, $i$ and $j$ are the indexes of the image, and $\oplus$ denotes the XOR operation. From Table II and Fig. 5, we can observe that all extracted watermarks have high NC values and low BER values under common audio signal attacks. These results indicate that our algorithm satisfies the robustness requirement. For compression attacks, even when the bit rate is as low as 64 kbps, at which the recovered audio loses much information, the recovered watermark image is still identifiable.
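Both measures can be computed directly from the two binary images. The sketch below uses the common definition of normalized cross-correlation, which may differ in detail from formula (18), and a plain bit error rate; the checkerboard watermark and the function names are illustrative assumptions.

```python
import numpy as np

def normalized_correlation(original, extracted):
    """Normalized cross-correlation between two watermark images."""
    o = np.asarray(original, dtype=float)
    e = np.asarray(extracted, dtype=float)
    return float(np.sum(o * e) / np.sqrt(np.sum(o**2) * np.sum(e**2)))

def bit_error_rate(original, extracted):
    """Fraction of watermark bits that differ (XOR count divided by M)."""
    o = np.asarray(original, dtype=int)
    e = np.asarray(extracted, dtype=int)
    return float(np.sum(o ^ e) / o.size)

# Illustrative 32 x 32 binary watermark (checkerboard) and a damaged copy.
watermark = np.indices((32, 32)).sum(axis=0) % 2
damaged = watermark.copy()
damaged[:2, :] ^= 1                   # flip the 64 bits of the first two rows

nc_damaged = normalized_correlation(watermark, damaged)
ber_damaged = bit_error_rate(watermark, damaged)
```

Flipping 64 of the 1024 bits gives a BER of 0.0625, while identical images give an NC of 1 and a BER of 0.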

A. Data Payload
We define the data payload of a watermarking algorithm as the amount of information that is embedded in a host audio signal. By convention, we measure the data payload in bps and describe it as $F_S / N_L$, where $F_S$ is the sampling rate of the host audio and $N_L$ is the number of samples used to embed one bit of information. The data payload of the scheme is then 86 bit/s.
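The reported figure can be reproduced under one plausible reading of the scheme, which is an assumption and is not stated in the text: each level-3 approximation coefficient carries one bit and summarizes 2^3 = 8 largest singular values, each taken from an 8 x 8 block of 64 samples, so that one bit occupies 512 samples.

```python
def payload_bps(sampling_rate, samples_per_bit):
    """Data payload in bits per second: sampling rate over samples per bit."""
    return sampling_rate / samples_per_bit

# Assumed breakdown: 8 x 8 samples per block, 2**3 blocks per level-3 coefficient.
samples_per_bit = 8 * 8 * 2**3
payload = payload_bps(44100, samples_per_bit)
```

With a 44.1 kHz sampling rate this gives 44100 / 512, about 86.1 bit/s, which agrees with the stated payload. Samples reserved for the synchronization code are ignored in this estimate.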

B. Security
In the proposed watermarking scheme, the selection of the embedding positions is based on a secret key, and the quantization parameters, which greatly influence the quality of the extracted watermark, are unknown to illegal users. Therefore, the algorithm meets the requirement of security.

VI. PERFORMANCE COMPARISON AND DISCUSSION
In this section, we compare the proposed algorithm with related audio watermarking algorithms. We selected two recent related works that also embed the watermark based on SVD. The detailed comparisons of imperceptibility and payload are shown in Table IV. The robustness of the scheme has been shown in Table II. All algorithms listed in Table IV perform well in terms of imperceptibility. Their SNRs are well above 20 dB. As shown in Table III, our algorithm shows better imperceptibility than the algorithm in [11]. The data payload of the proposed scheme is much higher than that of the other algorithms listed in Table IV. These data indicate that our scheme can embed much more information in the same audio file. Moreover, the DCT is computationally intensive and needs more execution time. In conclusion, our scheme achieves a better balance among the requirements of robustness, payload and imperceptibility.

VII. CONCLUSIONS
A novel blind audio watermarking scheme that combines features of DWT and SVD was proposed in this paper. It performs the DWT on the largest singular values obtained from the SVD of the host audio, rather than on the cover audio directly, which fully exploits the favorable characteristics of SVD. Experimental results indicate that our algorithm has excellent imperceptibility and is robust to common audio attacks including additive noise, low-pass filtering, resampling, requantization, cropping and MP3 compression. The comparison of the proposed scheme with the other SVD-based algorithms in [10], [11] indicates that the proposed scheme has a higher payload and satisfactory imperceptibility. The simulation results verify that our scheme fulfils the IFPI performance requirements and is suitable for copyright protection applications.

The original watermark is shown in Fig. 4. In the experiment, we use 5 audio signal samples to embed one bit of the synchronization code. The Haar wavelet basis is used. The quantization parameter $\Delta_1$ is 0.035 and the quantization parameter $\Delta_2$ is 0.1. The threshold defined in Section IV-A is set to 3. All parameter values are selected to achieve a good trade-off among the requirements of imperceptibility, robustness and capacity.
Fig. 3. Audio signal: a) original Pop audio and b) the watermarked host audio signal.
Manuscript received March 29, 2013; accepted October 24, 2013. This work was sponsored by the National Natural Science Foundation of China (61173106), the Specialized Research Fund for the Doctoral Program of Higher Education, China (20100161120021) and the Young Teacher's Growth Program of Hunan University.
Each of the audio signals listed in Table I is a mono wave file with a sampling rate of 44.1 kHz. A $32 \times 32$ binary image is used as the watermark.

TABLE I.
TEST DATABASE 16 BITS.

TABLE III.
SNR AND SEGSNR BETWEEN ORIGINAL AUDIO AND WATERMARKED AUDIO.