The first modification needed in the G.711 decoder to support error concealment is the introduction of a 30-sample (3.75 ms) delay. This delay is used to smooth the transition between the end of the original (received) segment and the start of the synthesized segment. The second modification is a circular buffer that holds the last 390 samples (48.75 ms) of decoded signal. The signal in this buffer is used to select a segment for replacing the lost frame(s).
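As an illustration of these two modifications, the following minimal sketch keeps a 390-sample history and releases each frame 30 samples late. The class name `DelayedHistory`, the zero-filled initial state, and the use of `collections.deque` are assumptions made for this example; they are not taken from the Recommendation.

```python
from collections import deque

HISTORY_LEN = 390  # samples kept for concealment (48.75 ms at 8 kHz)
DELAY = 30         # output delay in samples (3.75 ms at 8 kHz)


class DelayedHistory:
    """Hypothetical helper: history buffer plus a fixed output delay."""

    def __init__(self):
        # Start from silence so the first frames have a defined past.
        self.buf = deque([0] * HISTORY_LEN, maxlen=HISTORY_LEN)

    def push(self, frame):
        """Store a decoded frame and return the frame-sized block that
        ends DELAY samples before the newest sample."""
        if len(frame) > HISTORY_LEN - DELAY:
            raise ValueError("frame too long for the history buffer")
        self.buf.extend(frame)  # oldest samples fall off the front
        samples = list(self.buf)
        return samples[-(DELAY + len(frame)):-DELAY]
```

The newest 30 samples stay inside the buffer after each call, so they can still be modified by an overlap-add if the next frame turns out to be lost.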
When a loss is detected, the concealment algorithm starts by estimating the pitch period of the speech. This is done by finding the peak of the normalized cross-correlation between the most recent 20 ms of signal and the signal stored in the buffer. The search covers lags of 40 to 120 samples, corresponding to pitch frequencies from 200 Hz down to approximately 66.7 Hz at the 8 kHz sampling rate.
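The pitch search can be sketched as a brute-force scan over all candidate lags. This is a simplified illustration, not the exact search procedure of the Recommendation; the function name `estimate_pitch` and its parameters are hypothetical, and an 8 kHz sampling rate is assumed so that 20 ms corresponds to 160 samples.

```python
import math


def estimate_pitch(history, min_lag=40, max_lag=120, window=160):
    """Return the lag (in samples) with the highest normalized
    cross-correlation against the most recent `window` samples."""
    if len(history) < window + max_lag:
        raise ValueError("history too short for the requested search")
    ref_start = len(history) - window
    ref = history[ref_start:]
    ref_energy = sum(r * r for r in ref)
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Candidate: the same-length window, shifted `lag` samples back.
        cand = history[ref_start - lag:len(history) - lag]
        cand_energy = sum(c * c for c in cand)
        denom = math.sqrt(ref_energy * cand_energy)
        if denom == 0.0:
            continue  # silent window: correlation is undefined
        score = sum(r * c for r, c in zip(ref, cand)) / denom
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Example: a 100 Hz sine sampled at 8 kHz has a period of 80 samples.
tone = [math.sin(2 * math.pi * n / 80) for n in range(390)]
pitch = estimate_pitch(tone)
```

With a 390-sample buffer, the largest lag (120) plus the 160-sample window needs 280 samples of past signal, so the whole search fits inside the buffer.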
After the pitch period has been estimated, a segment corresponding to 1.25 periods is taken from the buffer and used to conceal the missing segment. More specifically, the selected segment is overlap-added with the existing signal, with the overlap spanning 0.25 of the pitch period. Note that this overlap starts in the last few samples of the good frame, which is why the 30-sample delay had to be inserted in the signal. The process is repeated until enough samples have been produced to fill the gap. The transition between the synthesized signal and the first good frame is also smoothed by an overlap-add with the first several samples of the received frame.
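The following sketch shows one possible arrangement of the pitch replication with a quarter-period overlap-add. It assumes a linear cross-fade window and floating-point samples, and the function name `conceal` and its return convention are hypothetical; consult the Recommendation for the exact windows and buffer handling.

```python
import math


def conceal(history, pitch, n_missing):
    """Return (smoothed_tail, synth): a replacement for the last quarter
    period of `history`, and `n_missing` synthesized samples after it."""
    overlap = pitch // 4
    if len(history) < pitch + overlap:
        raise ValueError("history shorter than 1.25 pitch periods")
    # The 1.25-period segment: a quarter-period lead-in plus one period.
    lead_in = history[-(pitch + overlap):-pitch]
    period = list(history[-pitch:])
    # Cross-fade the end of the period into the lead-in, so that the
    # period's end joins its own start smoothly when it is repeated.
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # fade-in weight of the lead-in
        k = pitch - overlap + i
        period[k] = (1.0 - w) * period[k] + w * lead_in[i]
    smoothed_tail = period[-overlap:]
    # Fill the gap by repeating the (smoothed) pitch period.
    synth = [period[i % pitch] for i in range(n_missing)]
    return smoothed_tail, synth


# Example: a perfectly periodic signal is continued without any change.
tone = [math.sin(2 * math.pi * n / 80) for n in range(390)]
tail, synth = conceal(tone, pitch=80, n_missing=80)
```

The smoothed tail overwrites samples that have not yet been played out, which is only possible because of the output delay: for the longest pitch period of 120 samples the overlap is 30 samples.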
Several situations receive special treatment. For example, if two or more consecutive frames are missing, the method replicates a segment several pitch periods long instead of repeating the same pitch period several times. Also, after the first 10 ms, the signal is progressively attenuated, so that after 60 ms the synthesized signal is zero. This can be seen in Figure 3.1(c), where the amplitude of the synthesized signal starts to decrease slightly after 160 samples, even though the synthesized signal is still based on the same (preceding) data segment. Note also that, since the pitch period of the missing segment is not identical to that of the synthesized segment, the transition to the next good frame may present a very atypical pitch period, which can be observed in Figure 3.1(c) around sample 1000.
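The gradual muting can be expressed as a gain applied to each synthesized sample. The sketch below assumes a linear ramp between the 10 ms and 60 ms points; the text above fixes only those two end points, so the ramp shape, the function name `attenuation_gain`, and its parameters are assumptions for illustration.

```python
def attenuation_gain(n, sample_rate=8000, hold_ms=10.0, mute_ms=60.0):
    """Gain for the n-th synthesized sample (n = 0 at the start of the
    loss): 1.0 during the hold time, then a linear ramp down to 0.0."""
    hold = sample_rate * hold_ms / 1000.0  # 80 samples at 8 kHz
    mute = sample_rate * mute_ms / 1000.0  # 480 samples at 8 kHz
    if n < hold:
        return 1.0
    if n >= mute:
        return 0.0
    return (mute - n) / (mute - hold)


# Example: scale a block of synthesized samples that starts at offset 0.
synth = [1.0] * 480
faded = [s * attenuation_gain(i) for i, s in enumerate(synth)]
```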
The reader is directed to the ITU Recommendation [2] for more details of the algorithm. Results of the subjective tests performed with the algorithm, as well as some considerations about bandwidth expansion, can be found in [7]. Alternatively, the reader may refer to Chapter 16, which gives details of a related timescale modification procedure. For our purposes, it suffices to understand that the algorithm works by replicating pitch periods.
Other important elements are the gradual muting when the loss is too long and the overlap-add used to smooth transitions. These elements are present in most other concealment algorithms. Given the nature of the algorithm, it is easy to see why it works well for single losses in the middle of voiced phonemes. As expected, the level of artifacts is higher for unvoiced phonemes and transitions.
More elaborate concealment techniques will address each of these issues more carefully, further reducing the level of artifacts, at the cost of complexity. One possibility is to use an LPC filter and do the concealment in the "residual domain" [8,9]. Note that this is unrelated to the concealment of CELP codecs (which we will investigate in the next section). Here we simply use LPC to improve the extrapolation of the signal; the coefficients are actually computed at the decoder. In CELP codecs, we have to handle the problem of lost LPC coefficients.