Dr. Dobb's | Audio Watermarking

Audio Watermarking

Digital watermarking is a security technique that provides copy protection, authentication, and more for audio and other forms of information. The approach Michael presents here is based on a statistical algorithm working in the Fourier domain.

November 01, 2001
URL:http://www.drdobbs.com/security/audio-watermarking/web-development/audio-watermarking/184404839

The problems associated with copyrights and the protection of intellectual property rights (IPR) generally arise from the transition from analog to digital data representation. Easy bit-by-bit reproductions of originals and the simple distribution of information across networks cause additional problems. However, one way IPR can be protected is by integrating copyright information directly into the data. This technique is referred to as "digital watermarking."

In general, there are both secret watermarks and public watermarks. Secret watermarks can be used as authentication and content-integrity mechanisms. This implies that the watermark is a secured link readable only by authorized persons with knowledge of the secret key. Public watermarks, on the other hand, act as information carriers, and the watermark is readable to everybody. These public watermarks are detectable or removable by third parties.

Audio watermarking has a number of applications, including:

Copyright protection. Copyright owners can be authenticated by a secret key for reading a secret watermark.
Monitoring copying. Embedding secret watermarks to trace illegal copying.
Fingerprinting. In point-to-point distribution environments, information about authenticated customers can be embedded as secret watermarks prior to secure delivery of the data.
Detection of content manipulation. Determining whether tampering (content manipulation) of the authorized state has occurred.
Information carrier. A public watermark embedded in the datastream can act as a link to external databases storing information about copyright and license conditions.

Depending on the kind of watermark and its intended use, watermarking techniques have various properties. For instance, in terms of signal-processing properties, the watermark should not be perceivable by observers, but should be robust against intentional or anticipated manipulations (compression, filtering, resampling, requantization, cropping, scaling, and the like).

As for security properties, the watermarking procedure should rely on a key to ensure security — not the algorithm's secrecy — and the watermark should be statistically undetectable. Furthermore, the algorithm should have a mathematical formulation and be published. The coding procedure should be symmetric or asymmetric (in the same sense as public-key cryptographic algorithms), depending on the application. Also, the procedure should be able to withstand collusion attacks that use multiple watermarked copies.

Finally, the watermark algorithm should support real-time processing; be adjustable to different degrees of robustness, quality, and amount of data; tunable to different media; and should support simultaneous embedding of multiple watermarks.

The Watermarking Method

To implement watermarks, the large amount of data of typical CD-quality audio signals makes it possible for you to use statistical methods that rely on large sets. The method I'll present here is a statistical algorithm working in the Fourier domain. It embeds 1 bit of the watermark in every timeslice of about 116 milliseconds and doesn't need the original audio stream or additional data to read the watermark. The algorithms are based on the Patchwork approach (see "Techniques for Data Hiding," by W. Bender, D. Gruhl, N. Morimoto, and A. Lu, IBM Systems Journal, 1996). Similar methods work in the time domain (see "Robust Audio Watermarking in the Time Domain," by P. Bassia and I. Pitas, Proceedings of Eusipco-98, Ninth European Signal Processing Conference, September 1998). The audio watermarking method presented here has been adapted to the frequency domain and does not require the original audio track to detect the watermark.

For example, assume that a dataset contains 2N values (in our method, frequency coefficients of the Fourier domain). To embed one bit of the watermark into the dataset:

Map the secret key and the watermark to the seed of a random-number generator. Start the generator to pseudorandomly select two intermixed subsets A={a_i}_i₌₁_,...,M and B={b_i}_i₌₁_,...,M of equal size MN from the original set.
Formulate the test hypothesis (H₀) and alternative hypothesis (H₁). The appropriate test statistic z will be a function of the sets A and B with the probability distribution function PDF (z) in the unmarked case and _m(z) in the marked case:
- H₀: The watermark is not embedded (z follows PDF (z)).
- H₁: The watermark is embedded (z follows PDF _m(z)).
The equations in Figures 1(a) and (b) describe the two kinds of errors incorporated in hypothesis testing. Hypothesis testing is used in the detection procedure to decide whether the watermark bit is embedded. The threshold T is used in the detection step.

Figure 1: Embedding the watermark.
Alter the selected elements a_iA and b_iB, I=1,..., M according to the embedding functions in Figure 1(c). The alterations of the Fourier coefficients have to be performed in a way that achieves inaudibility. Therefore, the changes are derived from a psychoacoustic model and are different for the individual Fourier coefficients. The alterations of the average value + can be described by an effective embedding factor defined by k:=(+)/(+), ='–, and ='–.

Psychoacoustic Models

Psychoacoustic models used in current audio-compression coders apply frequency and temporal masking effects to ensure inaudibility by shaping the quantization noise according to the masking threshold. A watermarking procedure, in turn, should use already-existing models for shaping the watermark noise. The various psychoacoustic models differ in complexity and in implementation of masking effects. I used the psychoacoustic Model 1 Layer I of ISO-MPEG with fs=44.1 KHz sampling rate. To iteratively allocate the necessary bits, the MPEG standard calculates the signal-to-mask ratios (SMR) of all the subbands. This is not necessary in our case, since only the masking threshold for each block of samples is of interest. Therefore, the necessary steps in calculation of the masking threshold for each block are:

Calculate the power spectrum.
Identify the tonal (sinusoid-like) and nontonal (noise-like) components.
Decimate the maskers to eliminate all irrelevant maskers.
Compute the individual masking thresholds.
Compute the global masking threshold.
Determine the minimum masking threshold in each subband.

Listing One is MATLAB code that implements the calculation of the masking threshold according to the MPEG 1 Layer I model just described.

Listing One

function [LTMin, Delta] = PsychoAcousticModel(Input, NumberOfBands)
% Main function - sampling rate fs = 44100; bitrate = 128;
%   Author: 
%          Fabien A.P. Petitcolas ([email protected])
%          Computer Laboratory
%          University of Cambridge
%   Corrections and improvements:
%          Teddy Furon ([email protected]), 
%          Laboratoire TSI - Telecom Paris
%          UIIS Lab - Thomson multimedia R&D France 
%          Michael Arnold ([email protected])
%          Fraunhofer Institute for Computer Graphics (IGD)     
%   References: 
%    [1] Information technology -- Coding of moving pictures and associated 
%      audio for digital storage media at up to 1,5 Mbits/s -- Part3: audio. 
%      British standard. BSI, London. October 1993. Implementation of 
%      ISO/IEC 11172-3:1993. BSI, London. First edition 1993-08-01. 
%   Legal notice: 
%    This computer program is based on ISO/IEC 11172-3:1993, Information 
%    technology -- Coding of moving pictures and associated audio for digital 
%    storage media at up to about 1,5 Mbit/s -- Part 3: Audio, with the 
%    permission of ISO. Copies of this standards can be purchased from the 
%    British Standards Institution, 389 Chiswick High Road, GB-London W4 4AL,  
%    Telephone:+ 44 181 996 90 00, Telefax:+ 44 181 996 74 00 or from ISO, 
%    postal box 56, CH-1211 Geneva 20, Telephone +41 22 749 0111, Telefax 
%    +4122 734 1079. Copyright remains with ISO. 
%---------------------------------------------------------------------------- 
%
% [LTmin, Delta] = PsychoAcousticModel(Input, NumberOfBands) computes the
% minimum masking threshold LTmin from Input vector. NumberOfBands specifies
% the required frequency resolution.
%
% -- INPUT --
% Input: Row vector of Blocksize (= FFT_SIZE = 512) samples with float values
% scaled within the range [-1, 1].
%   
% NumberOfBands: Integer value. For Blocksize samples this value is
% of the elements [16 | 32 | 64 | 128 | 256].
%  
% -- OUTPUT --
% LTmin: Column vector with FFT_SIZE/2 elements containing the minium loudness
% threshold values in dB.
%  
% Delta: Delta = 96dB - max(X). Delta is a scalar containing the difference to
% 96 dB for the input.  
% ------------
  
% Define global constants 
% (loaded from Common_Const.mat and Tables_fs_44100.mat in calling function) 
% FFT_SIZE = 512: Length of analysis window (Input vector). 
% MIN_POWER = -200: Used for initialisation to avoid taking log(0).
%
% INDEX = 1, BARK = 2, ATH = 3: Column indexes for TH, Tonal_list and
% Non_tonal_list.
% SPL = 2: Column indexes for the Tonal_list and Non_tonal_list for Sound
% Pressure Level. 

% TH is a 106x3 matrix. 
% TH(:, INDEX): Frequency indexes at the top end of each critical band
% (corresponding to absolute frequency values of table D.1b (pp. 117), fs =
% 44.1 kHz).
% TH(:, BARK): Top end of each critical band rate.
% TH(:, ATH): Absolute ThresHold in quiet (includes offset of -12dB for bit
% rates >= 96 kbits/s from table D.1b (pp. 117) for fs = 44.1 kHz)

% NOT_EXAMINED = 0, TONAL = 1, NON_TONAL = 2, IRRELEVANT = 3: Flags
% describing the component type.

% Map is a row vector with 256 elements. It maps the 106 non-linear frequency
% coefficients onto the 256 frequency indexes.

% CB is a column vector with 25 elements.
% CB: It contains the indexes for the top end of each critical band (24 bands)
% in terms of the 106 indexes (column two of D.2b (pp. 123) for 44.1 kHz).
  
% LTq: Column vector with 106 elements, approximating absolute threshold. 
% -------------------------------------------------------------------------
  
global FFT_SIZE MIN_POWER NOT_EXAMINED IRRELEVANT TONAL NON_TONAL 
global TH INDEX BARK ATH SPL Map CB LTq 

% Psychoacoustic analysis 

% Compute the FFT for power spectrum estimation [1, pp. 110]. 
[X, Delta] = FFT_Analysis(Input); 

% Find the tonal (sine like) and non-tonal (noise like) components of the
% signal [1, pp. 111--113]
[Flags Tonal_list Non_tonal_list] = Find_tonal_components(X); 

% Decimate the maskers: eliminate all irrelevant maskers [1, pp. 114] 
[Flags Tonal_list Non_tonal_list] = 
                   Decimation(Tonal_list, ... Non_tonal_list, Flags);
% Compute the individual masking thresholds [1, pp. 113--114]  
[LTt, LTn] = Individual_masking_thresholds(X', Tonal_list, Non_tonal_list);  

% Compute the global masking threshold [1, pp. 114] 
LTg = Global_masking_threshold(LTt, LTn);

if NumberOfBands < FFT_SIZE/2,
  % Determine the minimum masking threshold in each subband of NumberOfBands
  % [1, pp. 114]. 
  LTMin = LTmin(LTg, NumberOfBands);
else 
  % Map threshold LTg from non-linear to linear frequency indexes.
  LTMin = LTg(Map);
end

% Transpose row vectors for output
LTMin = LTMin';
Delta = Delta';

Implementing the Watermark

The masking threshold defines the frequency response of the loudness threshold minimum LTMin-Filter, which shapes the watermark. The filtered watermark is scaled to shift the watermark noise and added to the delayed original signal to produce the watermarked track. Listing Two is MATLAB code that implements the watermarking algorithm just described. In this implementation, an ASCII string is converted in its bit representation. The individual bits are embedded by means of two different bit patterns in each frame of the audio sample. The bit patterns are derived from the secret key used in the embedding process.

Listing Two

function [result, message] = Write(watermark, password)
% [result, message] = Write(watermark, password)
% Embeds the string specified by watermark into an audio file. result = 1 if
% watermarking process is successful otherwise = 0. message contains a
% string reporting possible errors. The watermarking process is determined 
% by a secret key string in password.
% See also READ.
% Copyright (c) 2001 by Fraunhofer-IGD
% $Revision: 1.0 $  $Date: 2001/07/12 $
% M.Arnold, 07/01
% -- INPUT --
% watermark: Information string to be embedded.  
% password: Secret key used during embedding and detection process for
% generation of random sequences.
% -- OUTPUT --
% result: Flag scalar indicating success or failure of the embedding procedure.
% message: Message string reporting the result of the embedding process.
%-------------

% Load tables for global variables only one time
global FFT_SIZE MIN_POWER NOT_EXAMINED IRRELEVANT TONAL NON_TONAL 
global TH INDEX BARK ATH SPL Map CB LTq HAMMING_513 EXP_RAD_513 
% Constants used by the psycho acoustic model
load('Common_Const.mat');
load('Tables_fs_44100.mat');

% --- Set parameter ---
parameter.WavFile              = 'UseFileDialog';
% Name of the marked file
parameter.WavFileNew           = 'marked_1.wav';
parameter.Blocksize            = FFT_SIZE;
parameter.BlocksPerFrame       = 10;
parameter.SamplesPerFrame      = parameter.Blocksize*parameter.BlocksPerFrame; 
parameter.NumberOfFilterCoeffs = FFT_SIZE/2; 
parameter.FilterGroupDelay     = parameter.NumberOfFilterCoeffs/2;
% Specify number of frequency bands used in the psycho acoustic model
parameter.PAMResolution        = FFT_SIZE/2;
% Specify quality of watermarked audio track
parameter.LTMinWANoiseConstant = 0;
% Specify frequencies which will be altered
parameter.frequencyIdxVector   = [20:120];
parameter.watermark            = watermark;
parameter.demo_size            = 5;

% Check if WavFile is specified. Otherwise popup file dailog. 
if isempty(parameter.WavFile) | strcmp(parameter.WavFile, 'UseFileDialog'),
  % Prepend the path to the home directory.
  if isunix
    fileName = '~/docs/research/own-paper/DrDobbs/';
  else
    fileName = 'U:\docs\research\own-paper\DrDobbs\';
  end
  % Get Filename and Path for Input-AudioFile  
  [fname, pname] = uigetfile( strcat(fileName, '*.wav') );
  if fname == 0 & pname == 0,
    message = ['Writing process canceled!'];
    result = 0;
    return
  end
  parameter.WavFile = strcat(pname, fname);
end
% --- Load WAVE file and calculate algorithm parameter ---
[properties Fs bits] = wavread(parameter.WavFile, 'size');
samples = properties(1); numChannels = properties(2)  
if  numChannels > 1,  
  message = ['Only MONO files!'];
  result = 0;
end 
% Calculate total number of frames for the file (fix rounds towards zero).
total_frames = fix(samples/parameter.SamplesPerFrame);
% --- Convert from ASCII to bit representation
stringLen = length(watermark);
if stringLen ~= parameter.demo_size,
  message = ['Length of watermark doesn''t match with configured size!'];
  result = 0;
end 
wmBits = reshape(dec2bin(double(watermark), 8)', 1, stringLen*8);
% Exit if to few frames
if total_frames < length(wmBits),
  message = ['Audio file to short to embed whole watermark!'];
  result = 0;
  return
end
% Write the WAVE header for output file
fmt = wavwrite_Header(samples, numChannels, Fs, bits, parameter.WavFileNew);
% --- Generate pattern matrix for different bits
for i = 0:1
  key = strcat(password, num2str(i));
  pattern(i + 1).Matrix = GeneratePatternMatrix(key, parameter);
end
% --- loop over bit frames --
for frameIdx = 0:total_frames - 1
  % Take bit frame from WAVE file
  Offset      = 1;
  StartSample = Offset + frameIdx*parameter.SamplesPerFrame;
  EndSample   = StartSample + parameter.SamplesPerFrame - 1;
  WavFrame    = wavread(parameter.WavFile, [StartSample EndSample]);  
  % Calculate index of Bit in wmBit, which has to be embedded.
  idx           = rem(frameIdx, length(wmBits)) + 1;
  pattIdx       = str2num(wmBits(idx)) + 1;
  patternMatrix = pattern(pattIdx).Matrix;
  % Embed watermark into frame. 
  [MarkedFrame, shiftSamples] = markFrame(WavFrame(:)', ...
                    patternMatrix, ...
                    parameter);
  % Write bit frame to output file. Don't write delay for first frame.
  if frameIdx == 0,
    [fidNew, err, samples_append] = ...
        append_wavedat(parameter.WavFileNew, fmt, ...
           real(MarkedFrame(parameter.FilterGroupDelay+1:end,:)) );
  else
    [fidNew, err, samples_append] = ...
        append_wavedat(parameter.WavFileNew, fmt, ...
                       real(MarkedFrame) );
  end
end
% Read remaining samples
WavFrameNew = wavread(parameter.WavFile, [(EndSample + 1) samples])';
% Write delayed shiftSamples + remaining samples to file.
WavFrameNew = [shiftSamples WavFrameNew];
[fidNew, err, samples_append] = ...
    append_wavedat(parameter.WavFileNew, fmt, WavFrameNew);
% Close output file
if fclose(fidNew) == -1
  message = ['Error closing WAVE file!'];
  result = 0;
  return
end
% Exit and report success
message = ['Watermark successfully embedded!'];
result = 1;
return

To detect the watermark that's embedded in the dataset:

Map the secret key and watermark to the seed of the random-number generator to generate the subsets C and D. This step is exactly the same as in the embedding process. C=A and D=B if the right key is used.
Decide for the probability of correct rejection 1-P_I according to the application and calculate the threshold T from the equation in Figure 1(a).
Calculate the sample mean E(z)=E(f(C,D)) and choose between the two mutually exclusive propositions:
- H₀: E(z)T watermark bit is not embedded.
- H₁: E(z)>T watermark bit is embedded.

The algorithm is based on hypothesis testing, which in turn depends on the appropriate test statistic.

Detecting the Watermark

During the detection process, the interpretation of the embedded bit is based on the comparison of the two different expectation values E(z) for the two different patterns. The different test statistics are implemented in CalcStatistic.m (available in the sourcecode that accompanies this article, download from opening page).

Test Statistic I

To determine the quality of the watermarking process, you can test the outcome. One metric is the difference between the population means of A and B rather than their values per se. The standardized test statistic is given by the equation in Figure 2(a).

Figure 2: Test statistic I.

Both sample means can be assumed normally distributed under the Central Limit Theorem, because of the large amount (M>>30) of Fourier coefficients. Therefore, '–' is normally distributed and you can make the estimations _'_',_'_'and . The independence of the random variables and , from the same set AUB of Fourier coefficients, leads to the approximations – 0 and –0.

Therefore, you can formulate the mutually exclusive propositions:

H₀: (z)=N(0,1)
H₁: _m(z)=N(_m,1), _m(k/ 1 +k²)x1/

with N(,s2) the normal distribution, mean , and the variation coefficent defined in Figure 2(d). _mcan be derived with the approximation in Figure 2(b). With these equations, the threshold can be calculated as T=z₁-P_I. According to the symmetry of the normal distributions (z) and _m(z), k can be calculated from _m–T=z₁–P_IIand the approximation for _m, see above. This is a reasonable assumption because and are both estimators calculated from the mixed sets A and B; see Figure 2(c).

Therefore, you can define the probabilities of correct detection (1-P_II), rejection (1-P_I), measure the variation coefficient in Figure 2(d) of the mean of set AUB, and calculate the necessary embedding factor k from the equation in Figure 2(c). But a factor k, ensuring the probabilities of correct detection or rejection according to Figure 2(c), may result in audible distortions. To satisfy both requirements, you can define the probabilities of correct detection (1–P_II), rejection (1–P_II), use the effective embedding factor k according to the psychoacoustic model, and calculate the number of Fourier coefficients from Figure 2(c).

Test Statistic II

According to the equation in Figure 2(a), you can extend the test statistic by normalizing the random variable z from equation >Figure 2(a) to the mean ('+')/2 of the whole set AUB; see Figure 3. Therefore, you expect z to be a random variable with mean 0 in the unmarked and an approximated mean of _m2k in the marked case; again, see Figure 3.

Figure 3: Test statistic II.

One disadvantage of this kind of metric is the complicated PDF is not suitable for the calculation of the threshold and embedding parameter. I measured _m for different k and verified the linear relation of equation in Figure 3.

Properties of the Algorithm

The general approach in quality evaluation is to compare the original signal with the watermarked signal for different k during subjective listener tests. To ensure the quality, the power of the watermarked noise is shifted just below the masking threshold. This results in an average embedding factor that is equivalent to k=0.15 derived without using the psychoacoustic model. Shifting the watermark noise to higher power levels decreases the quality and increases the average embedding factor. The embedding factor k=0.10 was used for robustness tests.

Security

The watermark and the marking procedure should also possess certain security properties. The space of distinguishable watermarks should be large enough, which is surely the case, because the number of different watermarks for M=256 results in approximately 4.7x10152 watermarks. The probability of false detection of a watermark can be determined by performing the following steps:

Determine the number of bits B out of the M bits of the embedded watermark, which gives a false detection.
Calculate the probability that i=B...M bits match with the embedded bitpattern from Figure 4.

Figure 4: Security.

The threshold T, indicating detection of the watermark, was chosen to be half the size of the maximum value z for the embedded watermark. The thresholds are plotted as horizontal lines; see Figure 5.

Figure 5: Security evaluation about the faked watermarks.

I used different audio tracks and marked all the tracks with the same watermark and an embedding factor of k=0.15. Afterwards, I constructed 30 different bit patterns with a certain percentage identical to the original pattern, then measured the values for the random variables. For both cases, I received a critical similarity of about 67 percent with the original watermark. According to the equation in Figure 4, the probability of detecting a watermark that isn't embedded is 10-35 for Statistic I.

Robustness

To test the robustness of our watermarking algorithm, I randomly choose 400 different combinations of watermarks and keys, and embedded this bit pattern 10 times into one audio track. All files were 16-bits signed stereo sampled at 44.1 KHz (CD quality).

Test conditions. All audio signals were watermarked with a embedding factor of k=0.10, then subsequently manipulated; see Figure 6.

Figure 6: Test scenario to measure detection values z manipulating the audio file.

To measure the robustness the total error probability, P_error=P_I+P_II due to the overlap of the PDFs in the unmarked and marked case were calculated from equations in Figures 1(a) and 1(b).
MPEG1 Layer III audio compression. I tested the robustness against MPEG 1 audio Layer III compression by using a compression rate of 128 Kbps for the watermarked signal.
Filtering. To test the robustness against filtering, I applied a bandpass filter to the watermarked signal by amplifying the signal by –9 dB in the low- and high-frequency domain. The cut-off frequencies have been 441 Hz for the low-pass filter, and 4410 Hz for the high-pass one.
Resampling. I sampled the original audio signals with a sampling rate of 44.1 KHz. The sampling rate of the watermarked audio data was reduced to 22.05 KHz and resampled to the original rate of 44.1 KHz. This causes audible distortions, especially in audio tracks carrying high frequencies.
Requantization. I also sampled audio tracks at 8 bits, since these are often used in games and multimedia applications. More specifically, I tested the process of requantization of a 16-bit watermarked audio signal to 8 bits and back to 16 bits. This increases the incoherent background noise of the audio track due to the rounding errors during the processing.
Robustness against cropping. The embedding of a watermark bit in every timeslice of about 116 milliseconds enables the detection of the watermark, even in the case of cropping or cutting, provided I have a contiguous part of at least 232 milliseconds from the watermarked audio stream.

Table 1 shows the test results for the different manipulations. The error probabilities of the several manipulation of the watermarked signal can be easily reduced by using more frequency coefficients. This is equivalent to the embedding of the watermark in a longer timeslice.

Table 1: Total error probabilities for different manipulations.

Possible Algorithm Enhancements

The watermarking algorithm presented here leaves room for further research in a variety of ways. Other test statistics and embedding functions can be investigated. The embedding of multiple noninterfering watermarks into the same audio track is demanded by certain applications. Perhaps one of the greatest challenges is the robustness against the so-called "jitter" attack (see "Attacks on Copyright Marking Systems," by F.A.P. Petitcolas, R.J. Anderson, and M.G. Kuhn, Lecture Notes in Computer Science, Vol. 1525, Springer-Verlag, 1998).

Acknowledgment

This work was partially funded by WEDELMUSIC (http://www.dsi.unifi.it/wedelmusic/), a Research and Development project in the IST program of the European Commission.

Michael is a member of the Department for Security Technology for Graphics and Communication Systems Fraunhofer-Institute for Computer Graphics. He can be contacted at [email protected].