Error-Resilient Coding for Audio Communication - Part 1: Waveform and CELP Speech Codecs


3.3 LOSS CONCEALMENT FOR CELP SPEECH CODECS
In the previous section we looked at error concealment for PCM coded speech. In PCM coded speech, each speech frame is encoded independently (in fact, each sample is encoded independently). For this reason, the loss of one packet does not impair the decoding of subsequent frames. However, since no redundancy is removed from the signal, toll quality speech using G.711 requires 64 Kbps. Many other codecs will remove more redundancy from the signal, and therefore require a lower rate.
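The rate quoted above can be checked with a little arithmetic. The sketch below assumes the standard G.711 figures of 8000 samples per second and 8 bits per sample, which are not stated in the text. The function names are illustrative only. For comparison, it also computes the payload of a 10-ms frame at 8 Kbps, the rate of the G.729 codec discussed later in this section.

```python
def bit_rate_bps(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of a codec that encodes every sample independently."""
    return sample_rate_hz * bits_per_sample


def payload_bytes(rate_bps: int, frame_ms: int) -> float:
    """Number of payload bytes needed to carry one frame of the given duration."""
    return rate_bps * frame_ms / 1000 / 8


# Assumed G.711 figures: 8-kHz sampling, 8 bits per sample.
g711_rate = bit_rate_bps(8000, 8)
print(g711_rate)                     # 64000 bits per second
print(payload_bytes(g711_rate, 10))  # 80.0 bytes per 10-ms frame
print(payload_bytes(8000, 10))       # 10.0 bytes per 10-ms frame at 8 Kbps
```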

More recent codecs are quite aggressive in removing redundancy. For example, several flavors of CELP coding have been used in speech codecs standardized by the ITU, including G.728 [3], G.729 [4], and G.722.2 [6]. Other organizations have also standardized CELP codecs. The European Telecommunications Standards Institute (ETSI) standardized several GSM (Global System for Mobile Communications) codecs [10] and the 3GPP (Third Generation Partnership Project) AMR (Adaptive Multi-Rate) codec [11]. The US Department of Defense (DoD) standardized one of the first LPC codecs, the DoD FS-1016 [12], and more recently a 2.4-Kbps mixed excitation linear prediction (MELP) codec, the MIL-STD-3005 [14].

While a full understanding of a CELP codec is outside the scope of this chapter, we will need a basic understanding in order to deal with the concealment techniques used in association with these codecs. We will now present a quick summary of important elements of a CELP codec.

Figure 3.2 shows a block diagram of a typical CELP decoder. The first important element in these codecs is the use of a Linear Prediction (LP) filter, indicated as "LPC Synthesis Filter" in the figure. The second element is the use of a codebook as the input to the filter (hence the name "code excited linear prediction," or CELP). We are mostly concerned with the decoding operation, so that we can see what happens when a frame is lost. In Figure 3.2, the wide arrows indicate the places where data or parameters are received.

FIGURE 3.2: Block diagram of a basic CELP codec.

We see that the decoder receives information about the LP filter (possibly including a long-term predictor, based on pitch) and about which part of the codebook to use as excitation. Specific CELP codecs vary in how the codebook is populated, whether the codebook is adaptive, and how the filter coefficients are encoded and transmitted. Other differences, less relevant to our problem, relate to how the codebook search is performed, how filter coefficients are interpolated, and so on. More details about CELP codecs can be found in several sources, for example, [13].
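The decoding path described above can be sketched in a few lines. This is a toy model, not the decoder of any particular standard: it assumes a single fixed codebook, one gain per frame, and an all-pole synthesis filter of the form s[n] = e[n] + sum over k of a_k * s[n-k]. The long-term (pitch) predictor and the adaptive codebook are left out. All names are illustrative.

```python
def decode_frame(codebook, index, gain, lpc, history):
    """Decode one frame with a toy CELP decoder.

    codebook: list of excitation vectors (lists of floats)
    index, gain: received codebook index and excitation gain
    lpc: predictor coefficients a_1..a_p of the synthesis filter
    history: the last p output samples, most recent last (decoder state)
    Returns (samples, new_history).
    """
    p = len(lpc)
    past = list(history)
    out = []
    for c in codebook[index]:
        e = gain * c  # scaled codebook excitation
        # All-pole synthesis: add the prediction from past output samples.
        s = e + sum(a * past[-k] for k, a in enumerate(lpc, start=1))
        out.append(s)
        past.append(s)
    return out, past[len(past) - p:]


codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
samples, state = decode_frame(codebook, index=0, gain=2.0, lpc=[0.5], history=[0.0])
print(samples)  # [2.0, 1.0, 0.5, 0.25]
print(state)    # [0.25]
```

Note that the filter history returned by one call is the input to the next. This carried-over state is what makes a lost frame affect the frames that follow it.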

To understand the key elements of loss concealment for CELP codecs, we will now take a look at the loss concealment technique used in G.729. This ITU codec is a typical CELP codec and operates at 8 Kbps. It uses 10-ms frames and two codebooks: a fixed algebraic codebook and an adaptive codebook (based on the recent past excitation signal). The LPC filter is transmitted by first converting from LPC coefficients to Line Spectral Pairs (LSP), which are then differentially encoded by a vector quantization scheme. When a frame is lost, the decoder will take four specific actions to conceal the loss:

  • Repeat the synthesis filter parameters. Since the differential information from the lost frame is not available, the parameters of the last received frame are reused.
  • Attenuate the adaptive and fixed codebook gains. The fixed codebook gain is reduced by 2% at each 5-ms subframe. The adaptive codebook gain is attenuated by 10% at each subframe and is also limited to 0.9. Note that reducing these gains will decrease the output energy, helping to hide artifacts produced by the concealment.
  • Generate the replacement excitation. Since no excitation is received for the lost frame, a replacement excitation needs to be generated. The way the excitation is generated depends on the periodicity classification of the previous frame. If the previous frame was classified as periodic, the excitation is generated by the adaptive codebook only, and the pitch delay is set to that of the previous frame. If more than one frame is lost, each additional lost frame increments the pitch delay by one. However, if the previous frame was classified as aperiodic, the excitation is taken only from the fixed codebook. The codebook entry to be used as excitation is chosen by a pseudorandom algorithm.
  • Attenuate the memory of the gain predictor. Since the gains are transmitted on a recursive basis, by using a predictor, the exact state of the predictor is lost when a frame is missing. As a result, even if the next frame is received without errors, the gains will not be decoded correctly. To help alleviate this problem, the memory of the gain predictor is updated with an attenuated version of the codebook energy.
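The first three actions can be sketched as follows. This is a simplified illustration based only on the figures given in the list above, not the normative G.729 procedure. It assumes two 5-ms subframes per 10-ms frame, applies the attenuation before each concealed subframe, and returns a plan for the excitation rather than actual samples. The fourth action (gain predictor memory) is not modeled, since the text gives no values for it. The class, function, and parameter names, including `codebook_size`, are hypothetical.

```python
import random
from dataclasses import dataclass

SUBFRAMES_PER_FRAME = 2  # assumed: 10-ms frame split into 5-ms subframes


@dataclass
class ConcealState:
    filter_params: list   # synthesis filter parameters of the last good frame
    adaptive_gain: float
    fixed_gain: float
    pitch_delay: int
    periodic: bool        # periodicity class of the last good frame
    lost_frames: int = 0  # consecutive lost frames so far


def conceal_frame(st, rng, codebook_size):
    """Plan the replacement excitation for one lost frame (updates st in place)."""
    st.lost_frames += 1
    # Action 1: st.filter_params is left untouched, i.e. the filter is repeated.
    if st.periodic and st.lost_frames > 1:
        st.pitch_delay += 1  # each additional lost frame lengthens the delay
    plans = []
    for _ in range(SUBFRAMES_PER_FRAME):
        # Action 2: attenuate both gains at every subframe.
        st.fixed_gain *= 0.98
        st.adaptive_gain = min(0.9 * st.adaptive_gain, 0.9)
        # Action 3: choose the excitation source from the periodicity class.
        if st.periodic:
            plans.append({"source": "adaptive",
                          "pitch_delay": st.pitch_delay,
                          "gain": st.adaptive_gain})
        else:
            plans.append({"source": "fixed",
                          "index": rng.randrange(codebook_size),
                          "gain": st.fixed_gain})
    return plans


state = ConcealState([0.1, 0.2], adaptive_gain=1.2, fixed_gain=1.0,
                     pitch_delay=40, periodic=True)
rng = random.Random(0)
for plan in conceal_frame(state, rng, codebook_size=16):
    print(plan)
```

Calling `conceal_frame` again on the same state models a second consecutive loss: the gains keep decaying and, for a periodic frame, the pitch delay grows by one.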

Note that the first three actions are related to generating the signal segment corresponding to the lost frame. The fourth item is related to reducing the artifacts produced in future frames, due to the mismatch in the internal state of the decoder. Rosenberg [15] analyzed the behavior of G.729 under losses and concluded that the artifacts produced by the internal state mismatch are actually more significant (subjectively) than the artifacts introduced by synthesizing the lost frame per se. This parallels the findings for video detailed in the previous chapter. He also concluded that the artifacts due to the mismatch last for approximately 70 to 100 ms.
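To see why a state mismatch outlives the lost frame itself, consider a toy recursive decoder. This is not G.729's actual predictor, only a first-order illustration with an assumed coefficient `rho`: each decoded value is `rho` times the previous decoded value plus a transmitted residual. When one residual is lost and replaced by zero, every later value is wrong, even though all later residuals arrive intact.

```python
def decode(residuals, rho, lost=frozenset()):
    """Toy first-order predictive decoder: v[n] = rho * v[n-1] + d[n]."""
    value, out = 0.0, []
    for n, d in enumerate(residuals):
        if n in lost:
            d = 0.0  # concealment: the residual for this frame never arrived
        value = rho * value + d
        out.append(value)
    return out


residuals = [1.0, 0.5, -0.25, 0.75, 0.1, -0.3, 0.2, 0.0]
clean = decode(residuals, rho=0.5)
damaged = decode(residuals, rho=0.5, lost={2})

# The error starts at the lost frame and shrinks by a factor rho per frame.
errors = [abs(c - d) for c, d in zip(clean, damaged)]
print([round(e, 4) for e in errors])
```

In this toy model the error decays geometrically, so a predictor with a longer memory (a `rho` closer to 1) takes more frames to recover after a single loss.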

Error concealment algorithms for CELP codecs are generally very codec specific. The error concealment used in G.729 is relatively simple, but it is a good example of how error concealment for CELP codecs works. Because of the importance of mitigating the effects of the internal state mismatch, more elaborate concealment techniques are closely tied to the particular codec they apply to. Furthermore, many modern CELP codecs are designed with error concealment in mind and provide an associated algorithm that usually performs well.

An example of a more elaborate concealment technique is the one used in the Wideband Adaptive Multirate codec (AMR-WB). This codec is standardized as the 3GPP recommendation TS 26.190 and as ITU G.722.2 [6]. The error concealment algorithm is described in ITU G.722.2 Annex I and in 3GPP TS 26.191. It follows the same basic principles as the technique described earlier, but it improves performance at higher loss rates by using different procedures for each of six states. The states are essentially a measure of how reliable the current state of the codec is. The reader is directed to the specification for more details of the concealment algorithm [6].

Coming up in Part 2: Loss concealment for lapped transform codecs.

References:
[1] ITU-T Recommendation G.711, Pulse code modulation (PCM) of voice frequencies, November 1988.
[2] ITU-T Recommendation G.711, Appendix I, A high quality low-complexity algorithm for packet loss concealment with G.711, September 1999.
[3] ITU-T Recommendation G.728, Coding of speech at 16 kbit/s using low-delay code excited linear prediction, September 1992.
[4] ITU-T Recommendation G.729, Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP), March 1996.
[5] ITU-T Recommendation G.722.1, Low-complexity coding at 24 and 32 kbit/s for hands-free operation in systems with low frame loss, May 2005.
[6] ITU-T Recommendation G.722.2, Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB), July 2003.
[7] R. V. Cox, D. Malah, and D. Kapilow, "Improving upon toll quality speech for VOIP," Signals, Systems and Computers, 2004. Conference Record of the Thirty-Eighth Asilomar Conference, vol. 1, pp. 405-409, November 2004.
[8] E. Gunduzhan and K. Momtahan, "Linear prediction based packet loss concealment algorithm for PCM coded speech," IEEE Transactions on Speech and Audio Processing, vol. 9, num. 8, pp. 778-785, November 2001.
[9] M. Elsabrouty, M. Bouchard, and T. Aboulnasr, "Receiver-based packet loss concealment for pulse code modulation (PCM G.711) coder," Signal Processing, vol. 84, pp. 663-667, 2004.
[10] K. Jarvinen et al., "GSM enhanced full rate speech codec," Proc. of ICASSP, vol. 2, pp. 771-774, April 1997.
[11] 3GPP Recommendation TS 26.071, AMR speech Codec; General description, ver 6.0.0, December 2004.
[12] T. E. Tremain, "The Government Standard Linear Predictive Coding Algorithm: LPC-10," Speech Technology Magazine, pp. 40-49, April 1982.
[13] X. Huang, A. Acero, and H. Hon, Spoken language processing: A guide to theory, algorithms and system development, Prentice Hall, 2001.
[14] "MELP vocoder algorithm: The new 2400 bps federal standard speech coder," Atlanta Signal Processors, Inc., available at http://www.aspi.com/tech/specs/pdfs/melp.pdf.
[15] J. Rosenberg, "Distributed Algorithms and Protocols for Scalable Internet Telephony," Ph.D. thesis, Columbia University, 2001.

Printed with permission from Academic Press, a division of Elsevier. Copyright 2007. "Multimedia Over IP and Wireless Networks" edited by Mihaela van der Schaar and Philip Chou. For more information about this title, please visit Academic Press.
