Speech Signal Coding Using Forward Adaptive Quantization and Simple Transform Coding

The paper proposes a novel speech signal coding scheme that combines simple transform coding with forward adaptive quantization. The scheme adapts to the input signal variance, which provides efficient bandwidth usage, while the transform coding produces sub-sequences with more predictable signal characteristics, so that more suitable signal processing can be performed. The transform coding precedes the adaptive quantization and provides additional compression. The objective quality measure used for system performance estimation is SQNR (signal-to-quantization-noise ratio), a standard measure for lossy coding. The influence of transform coding is discussed by comparing the obtained results with those achieved by applying only the same adaptive quantization. Furthermore, a comparison with the performance of a PCM (pulse-code modulation) coding system confirms that the proposed coding scheme has considerable potential for further implementation, since it ensures an SQNR gain of up to 4.0983 [dB] for various values of system parameters. DOI: http://dx.doi.org/10.5755/j01.eie.22.3.15318

Index Terms: Speech signal coding; transform coding; forward adaptive quantization; pulse-code modulation.

I. INTRODUCTION
Speech is a non-stationary process whose average power varies significantly in the time domain, resulting in a wide dynamic range. However, the occurrence of a wide set of frequencies is rare. Consequently, signal compression methods can be applied successfully, and the input signal can be modeled as a Laplacian source. Signal compression makes storing and transmission of the digital signal easier, since it requires less memory and narrower bandwidth for transmission while keeping the customers' experience satisfactory [1], [2]. Although natural speech has traditionally been the most researched, with the growth of information technologies many papers are now dedicated to synthetic speech signal processing, due to its importance in education (distance learning, foreign languages, blind individuals) and automatic recognition [3], [4].
This paper proposes a novel speech signal coding scheme based on transform coding with application of forward adaptive quantization. The main idea is to exploit simple transformations for signal decomposition, providing narrower bandwidth usage for signal transmission over the channel and lower bit-rates. The proposed transformations decompose the input signal into two independent sequences with more predictable statistical characteristics. This has motivated us to adapt the coding scheme to each sequence individually and thus perform more efficient signal quantization. Special attention is therefore paid to the discussion of the support range [5], [6].
There are two main kinds of adaptive quantizers: forward adaptive and backward adaptive quantizers. The coding scheme proposed in this research implements forward adaptive quantization, due to its lower sensitivity to transmission errors, a common problem in voice over IP applications. Moreover, it is well known from the literature that forward adaptive quantization provides an SQNR gain of 1 [dB] in comparison to backward adaptation [7]. To sum up, forward adaptation was introduced to further adapt the range of the varying subsequences of the input signal [8].
The paper is organized as follows. Section II provides an overview of transform coding and presents the proposed quantizer design. The numerical results and performance analysis are presented in Section III. Finally, conclusions and future research plans are presented in Section IV.

II. TRANSFORM CODING AND QUANTIZERS DESIGN
Transform coding is one of the most popular methods for data compression, and an indispensable part of systems that achieve a high compression rate. Nowadays, many transformations such as wavelet, DCT (Discrete Cosine Transform) or Hadamard are widely used in different systems, not only for speech signal processing but also for image and video processing. However, as the transformation complexity increases, the required processing time also increases, producing higher delay. On the other hand, one of the most important characteristics of systems that transmit speech over a channel is very low processing delay, aimed at real-time transmission. Consequently, in this paper we propose a novel speech signal coding scheme that implements simple transform coding. The applied transformations divide the input signal into two branches with different support ranges. Next, forward adaptive quantization is applied in each branch in order to achieve additional signal compression. The speech signal is sampled at 8 [kHz], which is the standard sampling frequency for digital telephone communication and VoIP. The proposed coding scheme is shown in Fig. 1.
The simple transformations applied in this coding scheme are defined by the expressions given in [8], where $x_n$ and $x_{n+1}$ are samples of the input signal, whereas $y_1$ and $y_2$ represent the transformed signals. The signals obtained after the transformations have variances $\sigma_1^2$ and $\sigma_2^2$, which depend on the input signal variance $\sigma_x^2$ and the correlation coefficient $\rho$. The transformed sequences are independent and they are further coded separately by using quasi-logarithmic quantizers with forward adaptation (quantizers Q1 and Q2).
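As an illustration of the two-branch decomposition, the following Python sketch assumes an orthonormal sum/difference transform of consecutive sample pairs. This particular form, the function names and the synthetic first-order autoregressive test signal are assumptions made for the example and may differ from the expressions in [8]. For this assumed transform, the branch variances equal $\sigma_x^2(1+\rho)$ and $\sigma_x^2(1-\rho)$, which shows how the correlation of the input moves most of the power into one branch.

```python
import numpy as np


def pair_transform(x):
    """Split x into sum and difference branches of consecutive sample pairs.

    Assumed form (illustrative, not taken from the paper):
        y1 = (x[n] + x[n+1]) / sqrt(2),  y2 = (x[n] - x[n+1]) / sqrt(2)
    """
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        x = x[:-1]  # drop a trailing unpaired sample
    even, odd = x[0::2], x[1::2]
    y1 = (even + odd) / np.sqrt(2.0)
    y2 = (even - odd) / np.sqrt(2.0)
    return y1, y2


def inverse_pair_transform(y1, y2):
    """Rebuild the interleaved signal from the two branches."""
    x = np.empty(2 * y1.size)
    x[0::2] = (y1 + y2) / np.sqrt(2.0)
    x[1::2] = (y1 - y2) / np.sqrt(2.0)
    return x


def ar1_signal(n, rho, seed=0):
    """Unit-variance first-order autoregressive test signal (hypothetical input)."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n) * np.sqrt(1.0 - rho ** 2)
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for i in range(1, n):
        x[i] = rho * x[i - 1] + e[i]
    return x


if __name__ == "__main__":
    x = ar1_signal(100_000, rho=0.8)
    y1, y2 = pair_transform(x)
    # The two ratios should be close to 1 + rho and 1 - rho, respectively.
    print(np.var(y1) / np.var(x), np.var(y2) / np.var(x))
```

Because the assumed transform is orthonormal, the total power is preserved and the inverse is exact, so any loss in such a scheme comes only from the quantizers that follow.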
Quantizers Q1 and Q2 are designed using the μ-logarithmic compression law, whose compression function is defined as [7]
$c(x) = \frac{x_{\max}}{\ln(1+\mu)} \ln\left(1 + \mu \frac{|x|}{x_{\max}}\right) \operatorname{sgn}(x)$,
where $|x| \le x_{\max}$, $x_{\max}$ is the support range of the quantizer and $\mu$ is the compression factor. According to the μ-logarithmic compression function, the decision thresholds $x_i$ and representation levels $y_i$ are obtained as in [7], [10], where $i = 1, 2, \ldots, N$. In order to achieve higher reconstructed signal quality, quantization adaptive to the signal variance is performed. The scheme of adaptive quantization is shown in Fig. 2 [10].
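A quasi-logarithmic quantizer based on the μ-law compression described above can be sketched in Python as follows. The sketch assumes the common companding construction (compress, quantize uniformly, expand), so that decision thresholds are uniformly spaced in the compressed domain and representation levels are the expanded cell midpoints; the exact thresholds and levels in [7], [10] may be defined differently. The function names are hypothetical.

```python
import numpy as np


def mu_compress(x, x_max, mu):
    """mu-law compressor c(x), intended for |x| <= x_max."""
    x = np.asarray(x, dtype=float)
    return x_max * np.log1p(mu * np.abs(x) / x_max) / np.log1p(mu) * np.sign(x)


def mu_expand(c, x_max, mu):
    """Inverse of mu_compress."""
    c = np.asarray(c, dtype=float)
    return (x_max / mu) * np.expm1(np.abs(c) * np.log1p(mu) / x_max) * np.sign(c)


def quasi_log_quantize(x, x_max, mu, n_levels):
    """Quantize x with n_levels levels on the support range [-x_max, x_max].

    Assumption: uniform cells in the compressed domain, with the expanded
    cell midpoints used as representation levels.
    """
    x = np.clip(np.asarray(x, dtype=float), -x_max, x_max)  # overload clipping
    c = mu_compress(x, x_max, mu)
    step = 2.0 * x_max / n_levels
    idx = np.clip(np.floor((c + x_max) / step), 0, n_levels - 1)
    c_hat = -x_max + (idx + 0.5) * step  # midpoint of the selected cell
    return mu_expand(c_hat, x_max, mu)
```

For example, `quasi_log_quantize(x, x_max=4.0, mu=16.9227, n_levels=64)` maps every sample of `x` to one of 64 levels. The values 4.0 and 64 are illustrative only, whereas µ = 16.9227 is the compression factor used in the experiments of this paper.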
After reception, the transformed signal is led to the buffer at the entrance of the adaptive quantizer. The quantization levels of the log-uniform quantizer are indexed by $k = 1, \ldots, N_g$, where $\Delta_{lu}$ represents the quant width and $N_g$ is the number of representation levels [3], [11]. The log-uniform quantizer is designed for low and middle bit-rates (the number of quantization levels $N_g$ is 2, 4, 8 and 16). The variance estimator sends the calculated standard deviation of the frame to the log-uniform quantizer. The quantized standard deviation is then used to determine the support range values ($x_{\max 1}$ and $x_{\max 2}$) for quantizers Q1 and Q2, respectively. In this way, quantizers Q1 and Q2 are adapted to the input signal for each frame. These values are calculated from the quantized standard deviation $\hat{\sigma}$, the correlation coefficient $\rho$ of the input signal and $x_{ml}$, the support range of Q1 and Q2 for a Laplacian source with unit variance [9], [12], [13], which in turn depends on $N$, the average number of quantization levels of quantizers Q1 and Q2, and on the compression factor $\mu$.
SQNR for the proposed coding scheme is calculated by averaging the SQNR values of all frames, where the SQNR per frame is obtained from the signal distortion $D_j$ of the $j$-th frame. The distortion is calculated from the quantized values, denoted with ^, that are transmitted to the decoder.
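The frame-wise adaptation and the SQNR averaging can be illustrated with the following single-branch Python sketch. Several details are assumptions rather than the paper's design: the log-uniform quantizer is taken to be uniform on a decibel scale over a hypothetical 40 [dB] range centred on the long-term standard deviation, the support range is taken as the quantized standard deviation times a hypothetical unit-variance support range `x_ml = 6.0`, the branch-dependent scaling with the correlation coefficient is omitted, and the per-frame SQNR is taken as $10\log_{10}$ of the frame power over the frame distortion.

```python
import numpy as np


def log_uniform_quantize_sigma(sigma, sigma_min, sigma_max, n_g):
    """Quantize a standard deviation uniformly on a dB scale with n_g levels.

    Returns the quantized value and the cell index (0 .. n_g - 1).
    """
    lo, hi = 20 * np.log10(sigma_min), 20 * np.log10(sigma_max)
    delta = (hi - lo) / n_g  # quant width in dB
    s_db = np.clip(20 * np.log10(max(sigma, 1e-12)), lo, hi)
    k = min(int((s_db - lo) // delta), n_g - 1)
    return 10 ** ((lo + (k + 0.5) * delta) / 20), k


def mu_law_quantize(x, x_max, mu, n_levels):
    """Companding quantizer: mu-law compress, uniform midpoints, expand."""
    x = np.clip(x, -x_max, x_max)
    c = x_max * np.log1p(mu * np.abs(x) / x_max) / np.log1p(mu) * np.sign(x)
    step = 2 * x_max / n_levels
    idx = np.clip(np.floor((c + x_max) / step), 0, n_levels - 1)
    c_hat = -x_max + (idx + 0.5) * step
    return (x_max / mu) * np.expm1(np.abs(c_hat) * np.log1p(mu) / x_max) * np.sign(c_hat)


def forward_adaptive_sqnr(x, frame_len, n_g, n_levels, mu=16.9227,
                          x_ml=6.0, sigma_range_db=40.0):
    """Average per-frame SQNR in dB; x_ml and sigma_range_db are hypothetical."""
    x = np.asarray(x, dtype=float)
    sigma_ref = np.std(x)
    sigma_min = sigma_ref * 10 ** (-sigma_range_db / 40)  # half the range below
    sigma_max = sigma_ref * 10 ** (sigma_range_db / 40)   # half the range above
    sqnr = []
    for start in range(0, x.size - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        power = np.mean(frame ** 2)
        # Side information: quantized standard deviation of the buffered frame.
        sigma_hat, _ = log_uniform_quantize_sigma(np.sqrt(power), sigma_min,
                                                  sigma_max, n_g)
        frame_hat = mu_law_quantize(frame, sigma_hat * x_ml, mu, n_levels)
        d_j = np.mean((frame - frame_hat) ** 2)  # frame distortion
        if power > 0 and d_j > 0:
            sqnr.append(10 * np.log10(power / d_j))
    return float(np.mean(sqnr))
```

Because the standard deviation is estimated from the buffered frame before quantization, the decoder needs only the cell index `k` per frame to rebuild the same support range, which is the side information that forward adaptation transmits.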

III. NUMERICAL RESULTS
In this section, experimental results of applying the proposed coding scheme are presented. The benchmark speech signal was recorded in the Laboratory of Acoustics, Faculty of Electronic Engineering, University of Nis. The signal is natural speech, sampled at 8 [kHz], and its variance is equal to 0.9086. The results obtained after processing it with the proposed scheme are compared to the corresponding results obtained by using a coding scheme without the transform coding that precedes forward adaptive quantization, as well as with the performance of a PCM coding system. The experiments are performed for different frame sizes M (40, 80, 160, 240), low and middle bit-rates of the log-uniform quantizer Q0, Ng ∈ {2, 4, 8, 16}, different average bit-rates of quantizers Q1 and Q2, and for compression factor µ = 16.9227, taken from [4]. Table I shows SQNR values obtained by using the proposed system for various values of the average bit-rate. In Table I and Table II, the total average bit-rate is calculated as Rav = R + Rv, where Rv = Ng/(2M) represents the number of bits/sample required to transmit the signal variance for the observed frame.
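The side-information overhead for the parameter grid used in the experiments can be tabulated with a few lines of Python. The sketch uses the expression Rv = Ng/(2M) exactly as written in the text; the function names and the choice R = 6 [bits/sample] for the printed table are illustrative.

```python
def variance_side_rate(n_g, frame_len):
    """Side-information rate R_v = N_g / (2M) in bits/sample, as written in the text."""
    return n_g / (2 * frame_len)


def total_average_bitrate(r, n_g, frame_len):
    """Total average bit-rate R_av = R + R_v in bits/sample."""
    return r + variance_side_rate(n_g, frame_len)


if __name__ == "__main__":
    # One row per frame size M, one column per N_g in (2, 4, 8, 16).
    for m in (40, 80, 160, 240):
        row = [round(total_average_bitrate(6, n_g, m), 4) for n_g in (2, 4, 8, 16)]
        print(m, row)
```

The overhead shrinks as the frame size M grows and rises with Ng, so the configurations with large frames pay the least for transmitting the variance.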
By observing the results shown in Table I, it can be concluded that the proposed scheme, for all presented combinations of frame size and number of representation levels of the log-uniform quantizer Q0, ensures higher SQNR than the PCM system; the gain reaches 4.0983 [dB]. Furthermore, it is evident that the proposed system ensures a different gain depending on the total average bit-rate. Table II shows the comparison of SQNR values obtained by using the proposed scheme without transform coding and the performance of the PCM system. The comparison is provided for various values of system parameters. By observing Table II, it can be seen that for R = 6 [bits/sample], the obtained results show a gain for 4, 8 and 16 representation levels (Ng), whereas for Ng = 2 the PCM system provides better performance.
However, it should be noted that the gain achieved without transform coding is much lower than with it, and is less than 0.3 [dB]. Furthermore, it can be concluded that for R = 7 [bits/sample], the PCM system ensures better performance for all observed cases. Taking the aforementioned discussion into account, we can conclude that the proposed transform coding ensures much better performance, and that the relative gain is higher than 4 [dB]. Next, if we compare the best obtained performance (R = 7, M = 240, Ng = 16) with the log-PCM based on G.711 [14], the achieved gain is about 10 [dB].

IV. CONCLUSIONS
In this paper, a new two-branch speech signal coding scheme has been proposed, which implements simple transform coding and forward adaptive quantization. The implemented transformations divide the signal into two independent streams with more predictable characteristics, which are exploited for more accurate signal reconstruction. Forward adaptation is applied in both branches, providing additional compression of the previously transformed input signal. The proposed system was tested on a benchmark speech signal recorded in the Laboratory of Acoustics, Faculty of Electronic Engineering, University of Nis. The obtained results have shown that the proposed simple transform coding is very suitable for speech signal processing and that it ensures a gain of more than 4 [dB] compared to the same forward adaptive scheme without transform coding. Furthermore, it has been shown that the proposed scheme performs better than the PCM system and that, for all discussed system configurations, it achieves a gain between 1.85 [dB] and 4.0983 [dB].
In the future, we intend to research not only natural speech but also synthetic speech, due to its rising importance in pattern recognition systems. Designing different quantizers will be one of the most challenging tasks, due to the different dynamics of the synthetic speech signal. We expect that the design may include some simpler coding solutions, and that dual-mode quantization, instead of forward adaptation, may provide the same quality level as the scheme proposed in this paper for natural speech.