Web Development

Audio Watermarking

By Michael Arnold, November 01, 2001

Digital watermarking is a security technique that provides copy protection, authentication, and more for audio and other forms of information. The approach Michael presents here is based on a statistical algorithm working in the Fourier domain.

The problems associated with copyrights and the protection of intellectual property rights (IPR) generally arise from the transition from analog to digital data representation. Easy bit-by-bit reproductions of originals and the simple distribution of information across networks cause additional problems. However, one way IPR can be protected is by integrating copyright information directly into the data. This technique is referred to as "digital watermarking."

In general, there are both secret watermarks and public watermarks. Secret watermarks can be used as authentication and content-integrity mechanisms. This implies that the watermark is a secured link readable only by authorized persons with knowledge of the secret key. Public watermarks, on the other hand, act as information carriers, and the watermark is readable to everybody. These public watermarks are detectable or removable by third parties.

Audio watermarking has a number of applications, including:

Copyright protection. Copyright owners can be authenticated by a secret key for reading a secret watermark.
Monitoring copying. Embedding secret watermarks to trace illegal copying.
Fingerprinting. In point-to-point distribution environments, information about authenticated customers can be embedded as secret watermarks prior to secure delivery of the data.
Detection of content manipulation. Determining whether the content has been tampered with, that is, altered from its authorized state.
Information carrier. A public watermark embedded in the datastream can act as a link to external databases storing information about copyright and license conditions.

Depending on the kind of watermark and its intended use, watermarking techniques have various properties. For instance, in terms of signal-processing properties, the watermark should not be perceivable by observers, but should be robust against intentional or anticipated manipulations (compression, filtering, resampling, requantization, cropping, scaling, and the like).

As for security properties, the watermarking procedure should rely on a key to ensure security — not the algorithm's secrecy — and the watermark should be statistically undetectable. Furthermore, the algorithm should have a mathematical formulation and be published. The coding procedure should be symmetric or asymmetric (in the same sense as public-key cryptographic algorithms), depending on the application. Also, the procedure should be able to withstand collusion attacks that use multiple watermarked copies.

Finally, the watermark algorithm should support real-time processing; be adjustable to different degrees of robustness, quality, and amount of data; be tunable to different media; and support simultaneous embedding of multiple watermarks.

The Watermarking Method

The large amount of data in typical CD-quality audio signals makes it possible to implement watermarks with statistical methods that rely on large sets. The method I'll present here is a statistical algorithm working in the Fourier domain. It embeds 1 bit of the watermark in every timeslice of about 116 milliseconds and doesn't need the original audio stream or additional data to read the watermark. The algorithm is based on the Patchwork approach (see "Techniques for Data Hiding," by W. Bender, D. Gruhl, N. Morimoto, and A. Lu, IBM Systems Journal, 1996). Similar methods work in the time domain (see "Robust Audio Watermarking in the Time Domain," by P. Bassia and I. Pitas, Proceedings of Eusipco-98, Ninth European Signal Processing Conference, September 1998). The method presented here adapts the approach to the frequency domain.
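To get a feel for the payload this implies, the following back-of-the-envelope sketch (in Python, purely for illustration) counts how many 116-millisecond timeslices fit into a track. The helper name payload_bits is hypothetical, and the figures ignore any synchronization or error-correction overhead a real system might need.

```python
# Rough payload estimate for a scheme that embeds 1 bit per ~116 ms timeslice.
SLICE_SECONDS = 0.116   # timeslice length quoted in the text
SAMPLE_RATE = 44_100    # CD-quality sampling rate in Hz


def payload_bits(duration_seconds: float) -> int:
    """Number of whole timeslices (and therefore watermark bits) in a track."""
    return int(duration_seconds // SLICE_SECONDS)


# Approximate number of audio samples covered by one timeslice.
samples_per_slice = round(SLICE_SECONDS * SAMPLE_RATE)

# A three-minute track offers room for roughly this many bits.
bits_in_three_minutes = payload_bits(180.0)
```

At one bit per timeslice, a three-minute track carries on the order of 1500 bits before any redundancy is added.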

For example, assume that a dataset contains 2N values (in our method, frequency coefficients of the Fourier domain). To embed one bit of the watermark into the dataset:

Map the secret key and the watermark to the seed of a random-number generator. Start the generator to pseudorandomly select two intermixed subsets $A = \{a_i\}_{i=1,\dots,M}$ and $B = \{b_i\}_{i=1,\dots,M}$ of equal size $M \le N$ from the original set.
Formulate the test hypothesis ($H_0$) and alternative hypothesis ($H_1$). The appropriate test statistic $z$ is a function of the sets $A$ and $B$, with the probability distribution function $\mathrm{PDF}(z)$ in the unmarked case and $\mathrm{PDF}_m(z)$ in the marked case:
- $H_0$: The watermark is not embedded ($z$ follows $\mathrm{PDF}(z)$).
- $H_1$: The watermark is embedded ($z$ follows $\mathrm{PDF}_m(z)$).
The equations in Figures 1(a) and (b) describe the two kinds of errors that can occur in hypothesis testing. Hypothesis testing is used in the detection procedure to decide whether the watermark bit is embedded. The threshold $T$ is used in the detection step.

Figure 1: Embedding the watermark.
Alter the selected elements $a_i \in A$ and $b_i \in B$, $i = 1, \dots, M$, according to the embedding functions in Figure 1(c). The alterations of the Fourier coefficients have to be performed in a way that achieves inaudibility. Therefore, the changes are derived from a psychoacoustic model and are different for the individual Fourier coefficients. The alterations of the average value $\bar{a} + \bar{b}$ can be described by an effective embedding factor defined by $k := (\Delta\bar{a} + \Delta\bar{b})/(\bar{a} + \bar{b})$, with $\Delta\bar{a} = \bar{a}' - \bar{a}$ and $\Delta\bar{b} = \bar{b}' - \bar{b}$.
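The steps above can be sketched in a few lines of Python (used here only to keep the example self-contained; Listing One is MATLAB). This is a simplified Patchwork-style illustration, not the method of Figure 1: it assumes a single constant factor k in place of the per-coefficient changes a psychoacoustic model would supply, it assumes a normalized difference of the subset means as the test statistic, and the way the key and bit are combined into a seed is likewise an assumption. All function names are hypothetical.

```python
import random


def select_subsets(num_values, m, key, bit):
    """Pseudorandomly pick two disjoint index sets A and B of size m each."""
    # Assumed mapping: key and watermark bit are joined into one seed string.
    rng = random.Random(f"{key}:{bit}")
    picked = rng.sample(range(num_values), 2 * m)
    return picked[:m], picked[m:]


def embed(values, key, bit, m, k=0.15):
    """Raise the elements of A and lower the elements of B by the factor k."""
    idx_a, idx_b = select_subsets(len(values), m, key, bit)
    marked = list(values)
    for i in idx_a:
        marked[i] *= 1 + k
    for i in idx_b:
        marked[i] *= 1 - k
    return marked


def statistic(values, key, bit, m):
    """Assumed test statistic: normalized difference of the two subset means."""
    idx_a, idx_b = select_subsets(len(values), m, key, bit)
    mean_a = sum(values[i] for i in idx_a) / m
    mean_b = sum(values[i] for i in idx_b) / m
    return (mean_a - mean_b) / (mean_a + mean_b)


def detect(values, key, bit, m, threshold):
    """Accept H1 (bit embedded) when the statistic exceeds the threshold T."""
    return statistic(values, key, bit, m) > threshold


# Stand-in for 2N = 2000 positive Fourier magnitudes.
source = random.Random(0)
spectrum = [source.uniform(1.0, 2.0) for _ in range(2000)]
marked = embed(spectrum, key="secret", bit=1, m=500)
```

With the right key the statistic of the marked data sits near k, while unmarked data or a wrong key gives a value near zero. The threshold T trades one kind of error against the other: raising it reduces false detections on unmarked data but increases the chance of missing an embedded bit.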

Psychoacoustic Models

Psychoacoustic models used in current audio-compression coders apply frequency and temporal masking effects to ensure inaudibility by shaping the quantization noise according to the masking threshold. A watermarking procedure, in turn, should use already-existing models for shaping the watermark noise. The various psychoacoustic models differ in complexity and in implementation of masking effects. I used Psychoacoustic Model 1 for Layer I of ISO-MPEG at a sampling rate of fs = 44.1 kHz. To iteratively allocate the necessary bits, the MPEG standard calculates the signal-to-mask ratios (SMR) of all the subbands. This is not necessary in our case, since only the masking threshold for each block of samples is of interest. Therefore, the necessary steps in calculation of the masking threshold for each block are:

Calculate the power spectrum.
Identify the tonal (sinusoid-like) and nontonal (noise-like) components.
Decimate the maskers to eliminate all irrelevant maskers.
Compute the individual masking thresholds.
Compute the global masking threshold.
Determine the minimum masking threshold in each subband.
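As an illustration of the first step only, here is a minimal Python sketch of a power-spectrum estimate for one block. It is not the FFT_Analysis routine called in Listing One: the Hann window, the division by the block length, and the naive DFT are assumptions made to keep the example dependency-free. The 96-dB normalization follows the description of Delta in the listing's header comments.

```python
import cmath
import math

FFT_SIZE = 512       # block length used throughout the listing
MIN_POWER = -200.0   # floor in dB, avoids taking log10(0)


def power_spectrum_db(block):
    """Return (spectrum in dB normalized to a 96 dB peak, delta)."""
    n = len(block)
    # Hann window (an assumed choice) to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
    windowed = [x * w for x, w in zip(block, window)]
    spectrum = []
    for k in range(n // 2):
        # Naive DFT of bin k; a real implementation would use an FFT.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(windowed))
        magnitude = abs(coeff) / n
        level = 20 * math.log10(magnitude) if magnitude > 0 else MIN_POWER
        spectrum.append(max(level, MIN_POWER))
    # Shift the spectrum so that its maximum sits at 96 dB.
    delta = 96.0 - max(spectrum)
    return [level + delta for level in spectrum], delta


# A pure tone that falls exactly on bin 32 of a 512-sample block.
tone = [math.sin(2 * math.pi * 32 * t / FFT_SIZE) for t in range(FFT_SIZE)]
levels, delta = power_spectrum_db(tone)
```

For the test tone, the strongest component lands in bin 32 and is shifted to 96 dB; the remaining steps then operate on this normalized spectrum.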

Listing One is MATLAB code that implements the calculation of the masking threshold according to the MPEG 1 Layer I model just described.

Listing One

function [LTMin, Delta] = PsychoAcousticModel(Input, NumberOfBands)
% Main function - sampling rate fs = 44100; bitrate = 128;
%   Author: 
%          Fabien A.P. Petitcolas ([email protected])
%          Computer Laboratory
%          University of Cambridge
%   Corrections and improvements:
%          Teddy Furon ([email protected]), 
%          Laboratoire TSI - Telecom Paris
%          UIIS Lab - Thomson multimedia R&D France 
%          Michael Arnold ([email protected])
%          Fraunhofer Institute for Computer Graphics (IGD)     
%   References: 
%    [1] Information technology -- Coding of moving pictures and associated 
%      audio for digital storage media at up to 1,5 Mbits/s -- Part3: audio. 
%      British standard. BSI, London. October 1993. Implementation of 
%      ISO/IEC 11172-3:1993. BSI, London. First edition 1993-08-01. 
%   Legal notice: 
%    This computer program is based on ISO/IEC 11172-3:1993, Information 
%    technology -- Coding of moving pictures and associated audio for digital 
%    storage media at up to about 1,5 Mbit/s -- Part 3: Audio, with the 
%    permission of ISO. Copies of this standards can be purchased from the 
%    British Standards Institution, 389 Chiswick High Road, GB-London W4 4AL,  
%    Telephone:+ 44 181 996 90 00, Telefax:+ 44 181 996 74 00 or from ISO, 
%    postal box 56, CH-1211 Geneva 20, Telephone +41 22 749 0111, Telefax 
%    +4122 734 1079. Copyright remains with ISO. 
%---------------------------------------------------------------------------- 
%
% [LTMin, Delta] = PsychoAcousticModel(Input, NumberOfBands) computes the
% minimum masking threshold LTMin from Input vector. NumberOfBands specifies
% the required frequency resolution.
%
% -- INPUT --
% Input: Row vector of Blocksize (= FFT_SIZE = 512) samples with float values
% scaled within the range [-1, 1].
%   
% NumberOfBands: Integer value. For Blocksize samples this value must be
% one of [16 | 32 | 64 | 128 | 256].
%  
% -- OUTPUT --
% LTMin: Column vector with FFT_SIZE/2 elements containing the minimum loudness
% threshold values in dB.
%  
% Delta: Delta = 96dB - max(X). Delta is a scalar containing the difference to
% 96 dB for the input.  
% ------------
  
% Define global constants 
% (loaded from Common_Const.mat and Tables_fs_44100.mat in calling function) 
% FFT_SIZE = 512: Length of analysis window (Input vector). 
% MIN_POWER = -200: Used for initialisation to avoid taking log(0).
%
% INDEX = 1, BARK = 2, ATH = 3: Column indexes for TH, Tonal_list and
% Non_tonal_list.
% SPL = 2: Column indexes for the Tonal_list and Non_tonal_list for Sound
% Pressure Level. 

% TH is a 106x3 matrix. 
% TH(:, INDEX): Frequency indexes at the top end of each critical band
% (corresponding to absolute frequency values of table D.1b (pp. 117), fs =
% 44.1 kHz).
% TH(:, BARK): Top end of each critical band rate.
% TH(:, ATH): Absolute ThresHold in quiet (includes offset of -12dB for bit
% rates >= 96 kbits/s from table D.1b (pp. 117) for fs = 44.1 kHz)

% NOT_EXAMINED = 0, TONAL = 1, NON_TONAL = 2, IRRELEVANT = 3: Flags
% describing the component type.

% Map is a row vector with 256 elements. It maps the 106 non-linear frequency
% coefficients onto the 256 frequency indexes.

% CB is a column vector with 25 elements.
% CB: It contains the indexes for the top end of each critical band (24 bands)
% in terms of the 106 indexes (column two of D.2b (pp. 123) for 44.1 kHz).
  
% LTq: Column vector with 106 elements, approximating absolute threshold. 
% -------------------------------------------------------------------------
  
global FFT_SIZE MIN_POWER NOT_EXAMINED IRRELEVANT TONAL NON_TONAL 
global TH INDEX BARK ATH SPL Map CB LTq 

% Psychoacoustic analysis 

% Compute the FFT for power spectrum estimation [1, pp. 110]. 
[X, Delta] = FFT_Analysis(Input); 

% Find the tonal (sine like) and non-tonal (noise like) components of the
% signal [1, pp. 111--113]
[Flags, Tonal_list, Non_tonal_list] = Find_tonal_components(X); 

% Decimate the maskers: eliminate all irrelevant maskers [1, pp. 114] 
[Flags, Tonal_list, Non_tonal_list] = ...
                   Decimation(Tonal_list, Non_tonal_list, Flags);
% Compute the individual masking thresholds [1, pp. 113--114]  
[LTt, LTn] = Individual_masking_thresholds(X', Tonal_list, Non_tonal_list);  

% Compute the global masking threshold [1, pp. 114] 
LTg = Global_masking_threshold(LTt, LTn);

if NumberOfBands < FFT_SIZE/2,
  % Determine the minimum masking threshold in each subband of NumberOfBands
  % [1, pp. 114]. 
  LTMin = LTmin(LTg, NumberOfBands);
else 
  % Map threshold LTg from non-linear to linear frequency indexes.
  LTMin = LTg(Map);
end

% Transpose row vectors for output
LTMin = LTMin';
Delta = Delta';
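The final branch of Listing One either reduces the global threshold to one minimum per subband or maps it onto the linear frequency indexes. The Python sketch below illustrates both ideas on toy data. The bodies of the LTmin helper and the Map table are not shown in the listing, so the details here are assumptions: in particular, repeating each subband minimum across its bins is inferred from the header comment that the output has FFT_SIZE/2 elements. Both function names are hypothetical.

```python
def minimum_per_subband(threshold_db, number_of_bands):
    """Replace every bin by the minimum threshold of the subband it lies in."""
    bins = len(threshold_db)
    if bins % number_of_bands != 0:
        raise ValueError("number_of_bands must divide the number of bins")
    width = bins // number_of_bands
    result = []
    for start in range(0, bins, width):
        lowest = min(threshold_db[start:start + width])
        # Assumed behavior: keep the output at full length by repetition.
        result.extend([lowest] * width)
    return result


def expand_with_map(values, index_map):
    """Mimic MATLAB's LTg(Map) indexing (0-based here, 1-based in MATLAB)."""
    return [values[i] for i in index_map]


# Toy data: 8 bins reduced to 4 subbands of 2 bins each.
coarse = minimum_per_subband([5, 3, 8, 1, 7, 7, 2, 9], 4)

# Toy data: 3 non-linear threshold values spread over 5 linear bins.
linear = expand_with_map([10, 20, 30], [0, 0, 1, 2, 2])
```

Taking the minimum within each subband is the conservative choice: any noise shaped to stay below it is below the threshold at every frequency in that subband.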
