Multilevel Delta Modulation with Switched First-Order Prediction for Wideband Speech Coding

In this paper, a delta modulation speech coding scheme based on the ITU-T G.711 standard and a switched first-order predictor is presented. A forward adaptive scheme is used, in which the adaptation to the signal variance is performed on a frame-by-frame basis. Frames are classified as weakly or highly correlated based on the correlation coefficient calculated for each frame, which provides the basis for choosing the appropriate predictor coefficient. The obtained results indicate that the proposed model significantly outperforms the scalar companding system based on the G.711 standard. The experimental results were verified using the theoretical model over a wide dynamic range of the input variance. DOI: http://dx.doi.org/10.5755/j01.eie.24.1.20156


I. INTRODUCTION
Speech coding is the process of obtaining a compact representation of speech signals for efficient transmission and/or storage in digital media [1]-[3]. Speech coders are essential parts of Public Switched Telephone Networks (PSTN), Voice over Internet Protocol (VoIP), mobile communications, videoconferencing, etc. To this end, it is fundamental to develop speech coding algorithms that provide high intelligibility and quality of speech at the consumer side.
The high bit rate ITU-T G.711 codec [4] and its extensions G.711.0 and G.711.1 have been accepted as a standard in many modern speech coding applications. G.711 is a companding system employing a piecewise linear approximation to the μ-law or the A-law logarithmic compression function. Its main qualities are a low-complexity encoding algorithm and a small delay for high-quality reproduced speech [4]. The wideband extension of G.711, known as G.711.1, is proposed in [5], [6] and has been standardized for wideband audio and speech signal processing. The authors in [7] proposed two-stage quantization with an embedded G.711 coder in the first processing stage, followed by segmental uniform quantization that reduces the quantization error introduced in the first stage. In this way, higher signal quality is achieved in comparison with the G.711 quantizer.
As a nonstationary process, speech is usually transmitted in frames (a certain number of samples), since the speech properties remain mostly unchanged within one frame. To obtain the desired performance, the quantizer requires some kind of adaptation at the frame level, e.g. using the variance or the probability density function (pdf). Different types of adaptation have been used in speech coding algorithms based on Pulse Code Modulation (PCM) [8], [9], Differential Pulse Code Modulation (DPCM) [10] or Delta Modulation (ΔM) [11].
The latter approaches, DPCM and ΔM as its special case, belong to the class of predictive coding algorithms [1]-[3], [12], [13]. Delta modulation has become an attractive method for signal processing due to its simple architecture [14]. In particular, it includes a one-bit quantizer along with a first-order predictor. Various modifications of ΔM have been proposed over the years to improve the performance of the basic structure, including adaptive delta modulation [11] and sigma-delta modulation.
A high-quality DPCM speech coding scheme employing a scalar companding quantizer and a switched first-order predictor is proposed in [10]. In this paper we keep the switched first-order predictor, but propose a modified ΔM configuration with an embedded high-rate G.711 quantizer, and denote it as the multilevel delta modulation system. The adaptation to the signal statistics is performed frame-wise using the short-term estimate of the variance. The forward adaptive scheme is used, as it offers better performance than backward adaptation [15] and is less sensitive to transmission errors [1]; however, it requires sending side information to the decoder. The switched predictor chooses between two coefficients, one for weakly and one for highly correlated frames, based on the correlation coefficient calculated for the particular frame.
We test the performance of the proposed algorithm in a real environment using a speech signal, and we use the Signal to Noise Ratio (SNR) as the measure of performance. The efficiency of the proposed algorithm is compared to the scalar companding system based on the G.711 standard [4].
The rest of the paper is organized as follows: Section II describes the companding system with the embedded G.711 quantizer. Section III gives an overview of the proposed multilevel delta modulation speech coding scheme. Section IV summarizes and discusses the experimental results, and finally Section V concludes the paper.

A. Non-adaptive Scalar Compandor
In companding quantization, the input is first transformed by a nonlinear compressor function, then quantized by a uniform quantizer, and finally restored in the expander using the inverse nonlinear function. The compressor and expander together form a compandor. In the G.711 standard [4], a piecewise linear approximation to the logarithmic μ-law compression function is used, which is given by
$$c(x) = \frac{x_{\max}}{\ln(1+\mu)} \ln\left(1 + \frac{\mu |x|}{x_{\max}}\right) \operatorname{sgn}(x), \quad |x| \le x_{\max},$$
where μ is the compression factor, $x_{\max}$ is the upper support region threshold, and N denotes the number of quantization levels of the quantizer.
The borders between the segments are denoted by $x_i$, $i = 0, 1, \ldots, L$, while the cell borders and the representative levels in the i-th segment ($i = 0, 1, \ldots, L-1$) are denoted by $x_{ij}$ and $y_{ij}$, respectively, where $j = 1, \ldots, m$. If we assume that the information source is memoryless Laplacian with zero mean, having the PDF
$$p(x) = \frac{1}{\sqrt{2}\,\sigma} \exp\left(-\frac{\sqrt{2}\,|x|}{\sigma}\right),$$
where σ is the standard deviation (unit variance corresponds to σ = 1), then the mean-squared distortion, which measures the irreversible error incurred during quantization, consists of the granular distortion $D_g$ and the overload distortion $D_o$ [1]:
$$D = D_g + D_o, \qquad D_o = 2 \int_{x_{\max}}^{\infty} \left(x - y_{L-1,m}\right)^2 p(x)\, dx,$$
where $y_{L-1,m}$ is the representative level of the last cell in the last segment, which can be determined from (6).
Along with the distortion, we use the Signal to Quantization Noise Ratio (SQNR) for performance estimation [1]-[3]:
$$\mathrm{SQNR} = 10 \log_{10} \frac{\sigma^2}{D}.$$
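The compressor and the segment layout can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names are hypothetical, the segment borders are derived solely from the rule that each positive segment is twice as large as the previous one, and placing the representative levels at the cell midpoints is an assumption rather than the paper's own expression.

```python
import math

def mu_law_compress(x, mu=255.0, x_max=1.0):
    """mu-law compressor c(x) for |x| <= x_max."""
    sign = 1.0 if x >= 0 else -1.0
    return sign * x_max * math.log1p(mu * abs(x) / x_max) / math.log1p(mu)

def mu_law_expand(y, mu=255.0, x_max=1.0):
    """Inverse of mu_law_compress (the expander)."""
    sign = 1.0 if y >= 0 else -1.0
    return sign * (x_max / mu) * math.expm1(abs(y) * math.log1p(mu) / x_max)

def segment_borders(L=8, x_max=1.0):
    """Positive segment borders x_0..x_L when every segment doubles in width."""
    return [x_max * (2**i - 1) / (2**L - 1) for i in range(L + 1)]

def cell_levels(i, L=8, m=16, x_max=1.0):
    """Representative levels of the m uniform cells in segment i.

    Midpoint levels are an assumption made for this sketch.
    """
    borders = segment_borders(L, x_max)
    step = (borders[i + 1] - borders[i]) / m   # uniform cell width in segment i
    return [borders[i] + (j - 0.5) * step for j in range(1, m + 1)]
```

With L = 8 segments and m = 16 cells per segment, the positive half holds 128 levels, which matches N = 256 for the full symmetric quantizer.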

B. Adaptive Scalar Compandor
The forward adaptive coding scheme operating on a frame-by-frame basis is illustrated in Fig. 1, where the adaptation to the short-term estimate of the variance is done for each frame of the input signal. The building blocks of such a scheme are a buffer, a variance estimator, a log-uniform quantizer QLU with L levels for the quantization of the frame variance, and an adaptive scalar compandor (Q), whose codebook is updated frame-wise. The bit rate of the forward adaptive quantizer is given by
$$R = \log_2 N + \frac{R_{LU}}{M},$$
where M is the frame length in samples and $R_{LU} = \log_2 L$ bits is the side information per frame.
In order to provide the appropriate theoretical analysis over a wide dynamic range of input variances, we denote the distortion at a particular variance $\sigma^2$ by $D(\sigma)$ and define the corresponding SQNR as
$$\mathrm{SQNR}(\sigma) = 10 \log_{10} \frac{\sigma^2}{D(\sigma)}.$$
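The frame-wise adaptation of Fig. 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the dB limits of the log-uniform quantizer, the zero-mean variance estimate, the midpoint reconstruction in the log domain, and the scaling of a unit-variance codebook by the quantized standard deviation are all assumptions.

```python
import math

def quantize_variance_db(frame, L=32, sigma_min_db=-20.0, sigma_max_db=20.0):
    """Return (index J, quantized standard deviation) for one frame.

    The range limits are hypothetical placeholders for 20*log10(sigma_min)
    and 20*log10(sigma_max).
    """
    M = len(frame)
    variance = sum(s * s for s in frame) / M             # short-term estimate, zero mean assumed
    level_db = 10.0 * math.log10(max(variance, 1e-12))   # same as 20*log10(sigma)
    step = (sigma_max_db - sigma_min_db) / L             # uniform step in the dB domain
    J = int((level_db - sigma_min_db) // step)
    J = min(max(J, 0), L - 1)                            # clip to the L available cells
    sigma_hat_db = sigma_min_db + (J + 0.5) * step       # cell midpoint (assumption)
    return J, 10.0 ** (sigma_hat_db / 20.0)

def adapt_codebook(levels_unit_variance, sigma_hat):
    """Scale a unit-variance codebook by the quantized standard deviation (assumption)."""
    return [sigma_hat * y for y in levels_unit_variance]
```

The index J is what would be sent as side information, costing log2 L bits per frame, while the decoder can rebuild the same scaled codebook from J alone.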
III. DELTA MODULATION WITH ADAPTIVE G.711 QUANTIZER AND SWITCHED FIRST-ORDER PREDICTOR
A simple delta modulation scheme with a switched first-order predictor is depicted in Fig. 2, where the adaptive scalar compandor is implemented using the forward adaptive scheme described in the previous section.
The switched predictor in the feedback loop has two coefficients at its disposal, a1 and a2, and it chooses one of them according to the correlation coefficient ρ estimated for each frame. Specifically, if ρ < 0.8 the input frame is classified as weakly correlated and the switched predictor uses the coefficient a1; otherwise the frame is considered highly correlated and the coefficient a2 is employed.
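The switching rule can be sketched as below. The estimator shown (lag-one normalized autocorrelation of the frame) is an assumption, since the exact expression for ρ is not reproduced here, and the function names and the 0/1 coding of the index K are hypothetical.

```python
def correlation_coefficient(frame):
    """Lag-one normalized autocorrelation of one frame (assumed estimator)."""
    num = sum(frame[n] * frame[n - 1] for n in range(1, len(frame)))
    den = sum(s * s for s in frame)
    return num / den if den > 0 else 0.0

def select_predictor(frame, a1=0.3, a2=0.97, threshold=0.8):
    """Return (K, coefficient): K = 0 for weakly, K = 1 for highly correlated frames."""
    rho = correlation_coefficient(frame)
    return (0, a1) if rho < threshold else (1, a2)
```

Only the one-bit index K needs to reach the decoder, because both coefficients are fixed in advance.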
The introduced coding scheme works in a similar manner to the scheme described in Section II. The prediction error signal is obtained such that the first sample in each frame, x[1], is predicted using the last sample of the previous reconstructed frame, x̂[M], except for the first frame where x[1] = 0, since there is no previous frame in that case. Hence, frames should overlap by one sample.
Note that the adaptation to the variance $\sigma_e^2 = \sigma_x^2 (1 - \rho^2)$ is performed for each frame, where $\sigma_x^2$ is the variance of the input signal and $\sigma_e^2$ is the variance of the prediction error.
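A closed-loop first-order predictive coder of this kind can be sketched in a few lines of Python. The sketch uses a plain uniform quantizer as a stand-in for the adaptive G.711 quantizer, so it illustrates only the prediction loop and the hand-over of the last reconstructed sample between frames; all names and the step size are hypothetical.

```python
def uniform_quantize(e, step=0.01):
    """Stand-in quantizer; the scheme in the text uses an adaptive G.711 quantizer."""
    return step * round(e / step)

def encode_frame(frame, a, prev_recon, quantize=uniform_quantize):
    """Closed-loop first-order predictive coding of one frame.

    prev_recon is the last reconstructed sample of the previous frame
    (0.0 for the very first frame). Returns the quantized prediction
    errors and the locally reconstructed samples.
    """
    errors_q, recon = [], []
    last = prev_recon
    for x in frame:
        pred = a * last                 # first-order prediction
        e_q = quantize(x - pred)        # quantized prediction error
        last = pred + e_q               # local decoder output
        errors_q.append(e_q)
        recon.append(last)
    return errors_q, recon

def decode_frame(errors_q, a, prev_recon):
    """Decoder: repeats the encoder's local reconstruction from the received errors."""
    recon, last = [], prev_recon
    for e_q in errors_q:
        last = a * last + e_q
        recon.append(last)
    return recon
```

Because the encoder predicts from its own reconstructed samples rather than from the original input, the decoder tracks it exactly and the quantization error does not accumulate from sample to sample.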
The encoder (Fig. 2(a)) sends one more signal to the decoder (index K) than the encoder in Fig. 1, since one bit of information about the selected predictor coefficient has to be transmitted for each frame, giving the bit rate
$$R = \log_2 N + \frac{\log_2 L + 1}{M}.$$
The decoder (Fig. 2(b)) decodes the signal samples of each frame based on the indices I, J and K.
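As a worked example of the side-information overhead, assume the bit rate is read as log2 N code-word bits per sample plus the per-frame side information (log2 L bits for the variance index J and one bit for the predictor index K) spread over the M samples of a frame. With N = 256, an L = 32 level log-uniform quantizer and M = 80, which are values used in Section IV, this reading gives:

```latex
R = \log_2 N + \frac{\log_2 L + 1}{M}
  = 8 + \frac{5 + 1}{80}
  = 8.075 \ \text{bit/sample}
```

Under this assumption the side information adds less than 1 % to the 8 bit/sample of the G.711 quantizer.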
In predictive coding, the overall Signal to Noise Ratio has two components, the SQNR of the quantizer (see (15)) and the prediction gain G, defined as [1]
$$G = 10 \log_{10} \frac{\sigma_x^2}{\sigma_e^2}.$$
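Combining the prediction gain with the relation between the prediction error variance and the input variance stated earlier in this section, the gain can be written in terms of ρ alone. The short derivation below assumes that the predictor coefficient matches the correlation coefficient of the frame:

```latex
G = 10 \log_{10} \frac{\sigma_x^2}{\sigma_e^2}
  = 10 \log_{10} \frac{\sigma_x^2}{\sigma_x^2 \left(1 - \rho^2\right)}
  = -10 \log_{10} \left(1 - \rho^2\right) \ \text{[dB]}
```

For instance, ρ = 0.9 gives a gain of about 7.2 dB, and the gain grows quickly as ρ approaches one.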

IV. EXPERIMENTAL RESULTS AND DISCUSSION
At the beginning of this section, we present the theoretical results for two variants of the scalar companding system based on G.711, non-adaptive and adaptive, described in Section II, which are used as baselines. We assume a Laplacian source signal at the input over a wide dynamic range of 50 dB, and we adopt $\sigma_{ref}^2 = 2 \times 10^{-3}$.
The robustness of the considered non-adaptive scalar compandor is shown in Fig. 3, where the SQNR is plotted as a function of the input signal variance using (10). Let us further analyse the theoretical multilevel adaptive ΔM model with the switched first-order predictor, which is equivalent to the coder proposed in Section III. It is known that the percentage of silence in speech is normally around 25 % [17]; hence we adopt the weight w = 0.25, which defines the share of weakly correlated frames. Adjacent samples of a speech signal are highly correlated, with a correlation coefficient close to one [1]; hence for voiced frames of speech we adopt ρ2 = a2 = 0.97. On the other hand, for weakly correlated frames we use ρ1 = a1 = 0.3. The equivalent gain of the switched predictor can be calculated as
$$G = w G_1 + (1 - w) G_2,$$
where G1 and G2 refer to the gains of the weakly and highly correlated frames, respectively. The theoretical results (overall SNR) for this case are presented in Fig. 5 and show an evident improvement of approximately 10 dB over the two non-predictive models.
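The size of the expected gain can be checked numerically with the values adopted above (w = 0.25, ρ1 = 0.3, ρ2 = 0.97). The sketch below assumes that each gain follows from σe² = σx²(1 − ρ²) and that the two gains are averaged in the dB domain with weights w and 1 − w; both points are assumptions of this illustration, and the function names are hypothetical.

```python
import math

def prediction_gain_db(rho):
    """Gain of a first-order predictor, assuming sigma_e^2 = sigma_x^2 * (1 - rho^2)."""
    return -10.0 * math.log10(1.0 - rho * rho)

def equivalent_gain_db(w, rho_weak, rho_high):
    """Weighted average of the two gains in dB (dB-domain averaging is an assumption)."""
    return w * prediction_gain_db(rho_weak) + (1.0 - w) * prediction_gain_db(rho_high)

g1 = prediction_gain_db(0.3)
g2 = prediction_gain_db(0.97)
g = equivalent_gain_db(0.25, 0.3, 0.97)
print(f"G1 = {g1:.2f} dB, G2 = {g2:.2f} dB, G = {g:.2f} dB")
```

Under these assumptions the weakly correlated frames contribute well under 1 dB, the highly correlated frames a little over 12 dB, and the weighted result is a little over 9 dB, which is in line with the improvement of approximately 10 dB reported for Fig. 5.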
Furthermore, we performed experiments on a speech signal that consists of 66 500 speech samples, sampled at 16 kHz. The speech is divided into F frames, each composed of M samples. We use an L = 32 level log-uniform quantizer for the frame variance quantization. As an objective measure of performance we use the Signal to Noise Ratio (SNR).
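The SNR for the j-th frame is determined as
$$\mathrm{SNR}_j = 10 \log_{10} \frac{\sum_{i=1}^{M} x_{ij}^2}{\sum_{i=1}^{M} \left(x_{ij} - \hat{x}_{ij}\right)^2},$$
where $x_{ij}$ and $\hat{x}_{ij}$ are the input and the output speech samples, respectively. The average SNR is given by
$$\overline{\mathrm{SNR}} = \frac{1}{F} \sum_{j=1}^{F} \mathrm{SNR}_j.$$
Let us assume that P out of F frames are classified as weakly correlated; then, using (20), we have
$$\mathrm{SNR}_{wc} = \frac{1}{P} \sum_{j \in wc} \mathrm{SNR}_j, \qquad \mathrm{SNR}_{hc} = \frac{1}{F - P} \sum_{j \in hc} \mathrm{SNR}_j,$$
where the indices "wc" and "hc" denote weakly and highly correlated frames, respectively.
The SNR of the whole system is given by
$$\mathrm{SNR} = w\, \mathrm{SNR}_{wc} + (1 - w)\, \mathrm{SNR}_{hc},$$
where $w = P/F$ is the experimentally determined probability of occurrence of a weakly correlated frame.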
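The SNR bookkeeping described above can be sketched as follows. The function names are hypothetical, and the sketch assumes every frame has non-zero signal energy and non-zero reconstruction error so that the logarithm is defined.

```python
import math

def frame_snr_db(x, x_hat):
    """SNR of one frame in dB from input samples x and output samples x_hat."""
    signal = sum(s * s for s in x)
    noise = sum((s - r) ** 2 for s, r in zip(x, x_hat))
    return 10.0 * math.log10(signal / noise)

def system_snr_db(frame_snrs, is_weak):
    """Combine per-frame SNRs through the weakly/highly correlated class averages."""
    weak = [s for s, flag in zip(frame_snrs, is_weak) if flag]
    high = [s for s, flag in zip(frame_snrs, is_weak) if not flag]
    F, P = len(frame_snrs), len(weak)
    snr_wc = sum(weak) / P if P else 0.0            # average over weakly correlated frames
    snr_hc = sum(high) / (F - P) if F > P else 0.0  # average over highly correlated frames
    w = P / F                                       # share of weakly correlated frames
    return w * snr_wc + (1.0 - w) * snr_hc
```

Since w = P/F, the weighted combination of the two class averages coincides with the plain average of the per-frame SNRs; the split is useful mainly for reporting the two classes separately.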
In Fig. 6 we present the correlation coefficient estimated using (16) and the SNR obtained using (21) over all frames of size M = 80 for the proposed multilevel ΔM (μ = 255, N = 256, L = 8, m = 16, RG.711 = log2 N = 8 bit/sample) [4]. As is evident, in the areas of active speech the correlation coefficient ρ is close to 1, indicating the high predictability of the signal. Moreover, note the higher SNR in the active speech areas (up to 60 dB) and the lower SNR in inactive speech frames (below 40 dB).
The switched predictor coefficients are determined in accordance with the estimated correlation coefficient of the available input speech. Thus, for a1 we adopted the average correlation coefficient calculated over all weakly correlated frames. Likewise, a2 is taken to be the average correlation coefficient calculated over all highly correlated frames. For the tested speech signal, assuming M = 80 samples per frame, we get a1 = 0.23 and a2 = 0.95.
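This way of choosing the two coefficients can be sketched as a class-wise average over the per-frame correlation coefficients. The function name is hypothetical, and the 0.8 threshold is the classification rule from Section III.

```python
def switched_coefficients(rhos, threshold=0.8):
    """Return (a1, a2): mean correlation of weakly and of highly correlated frames."""
    weak = [r for r in rhos if r < threshold]
    high = [r for r in rhos if r >= threshold]
    a1 = sum(weak) / len(weak) if weak else 0.0   # fallback value is an assumption
    a2 = sum(high) / len(high) if high else 0.0   # fallback value is an assumption
    return a1, a2
```

For example, per-frame coefficients of 0.2, 0.3, 0.9 and 1.0 would give a1 = 0.25 and a2 = 0.95.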
Table I summarizes the average values of SNR obtained according to (24) for the proposed multilevel ΔM, for various frame lengths (i.e. M = 80, 160, 240 and 320 samples) and different numbers of quantization levels (32, 64, 128 or 256). One can observe that the highest SNR values in all considered scenarios are obtained for M = 80, which is expected, as the quantizer codebook is adjusted more often. As a baseline, we also provide in Table I the results for the non-predictive case, i.e. the forward adaptive scalar compandor (Fig. 1), denoted as SNRPCM. The proposed multilevel ΔM is superior to the baseline, with a gain of about 10 dB in SNR. Note that the obtained experimental results are in agreement with the theoretical ones shown in Figs. 3-5, indicating that there is a valid reason to apply the proposed solution in high-quality quantization of speech signals.
The complexity of the proposed algorithm remains unchanged compared to the baseline and is equal to O(N²).

V. CONCLUSIONS
In this paper, a speech coding algorithm based on multilevel delta modulation and correlation-based switched first-order prediction is considered. The obtained results indicate that the proposed algorithm offers significantly better signal quality than the G.711 algorithm, with a gain of about 10 dB in SNR. Moreover, since the proposed solution has low complexity, it can be successfully employed for high-quality speech coding.

Manuscript received 12 May, 2017; accepted 28 September, 2017. This work was supported in part by the Ministry of Education and Science of the Republic of Serbia, grant no. TR 32051 and TR 32035, within the Technological Development Program.

In the μ-law compression function of Section II-A, μ is the compression factor and xmax is the upper support region threshold. The support region [−xmax, xmax] of the quantizer is divided into 2L segments (L positive and L negative), where each segment is composed of m uniform cells. Each consecutive segment in the positive part of the quantizer characteristic is twice as large as the previous one. As the quantizer characteristic is symmetric, the same holds true for the negative segments. The segment widths, denoted by Δi, follow this doubling rule, and xmax is given as in [16].

Fig. 1. Forward adaptive coding scheme.
The scheme works in the following way. The buffer stores one frame, i.e. M samples of the input signal, and the variance of the input speech is determined in the variance estimator. For the variance quantization we use a log-uniform quantizer with L levels, and the dynamic range of the input signal is defined as [20 log10(σmin), 20 log10(σmax)]. The quantizer codebook is updated according to the quantized variance; hence, the adaptive thresholds and levels are obtained from tf and yf, the thresholds and levels of the nonadaptive quantizer, respectively. Observe in Fig. 1 two digital signals, I and J: signal I carries the log2 N bit code-words that represent the signal within the frame, and signal J carries the quantized variance used for adaptation, which consists of log2 L bits per frame (additional or side information).

Fig. 2. Delta modulation system: (a) coder; (b) decoder.
The prediction error signal e[n] = x[n] − x̂[n] is fed to the input of the quantizer described in Section II, where x[n] is the original sample value and x̂[n] is the predicted sample value provided by the local decoder in the feedback loop of the system.

Fig. 3. Theoretical model: non-adaptive scalar compandor (Section II-A) for various N and μ = 255 in a wide dynamic range.

Figure 4 plots the theoretical SNR, using (15), of the forward adaptive scalar compandor over the assumed variance range, for an L = 32 level log-uniform quantizer used for variance quantization and different numbers of quantization levels N of the adaptive quantizer. As is evident, the SNR is nearly constant across the entire range.

Fig. 5. Theoretical model: multilevel adaptive ΔM with the switched first-order predictor for various N, μ = 255 and L = 32 for QLU in a wide dynamic range.

TABLE I. THE PERFORMANCE OF THE PROPOSED MULTILEVEL ΔM, FOR VARIOUS FRAME LENGTHS AND NUMBER OF QUANTIZATION LEVELS.