FPGA Based Walsh and Inverse Walsh Transforms for Signal Processing

Abstract: In this paper, we design and implement a set of Walsh and inverse Walsh transforms for signal processing. The transforms are designed to produce correct results for any combination of input data by providing sufficient word lengths at every step of the design. Addition, subtraction and dyadic convolution have been chosen to demonstrate the performance of the designs. Detailed word-length designs that minimize the circuits are presented. When implemented on FPGAs, the proposed Walsh transform structure is found to be superior to many reported results in terms of area and speed.


I. INTRODUCTION
Digital signal processing (DSP) is a well-established area of research. Techniques for the analysis, synthesis and processing of two or more digital signals have been established. However, certain DSP tasks are difficult to perform in the time domain, and hence the information has to be transformed to other domains [1].
The frequency domain is the most widely used domain for tasks that cannot be done, or are difficult to do, in the time domain. The Fourier transform is the most widely used technique for transforming information from the time domain to the frequency domain. Discrete Fourier Transform (DFT) techniques for analyzing periodic digital signals exist; however, the DFT is quite complex, which causes many problems in hardware realization, and its use is justified only for very complex systems.
Orthogonal functions such as the Walsh transform may also be used to analyze signals in the frequency domain. The Walsh transform may be obtained using the Walsh functions [1], [2]. It is also well known that the Walsh functions may be evaluated using Rademacher functions [1]-[3]. Many scientists preferred this technique since the Rademacher functions may be conveniently realized using a counter [4], [5]. However, it is not always necessary to use this technique, and the Walsh transform (WT) may be performed using only adders and subtractors [6]. This idea interested many scientists and engineers in the hardware realization of the Walsh transform. This method is known as the Fast Hadamard Transform (FHT) since it is derived from Hadamard matrices [7]-[10].
Manuscript received October 6, 2011; accepted May 17, 2012. The authors gratefully acknowledge the financial support from the National Plan for Science, Technology and Innovation (NPST), Saudi Arabia, under project no. 09-ELE854-02.
In order to further simplify the FHT, two types of structures, Distributed Arithmetic (DA) and Systolic Architecture (SA), have been proposed by some workers [7]-[10]. Amira et al. proposed an improved structure and claimed better performance on the basis of an elaborate comparative study [11]. They also reported the results of a power analysis. Later, Meher and Patra [12] introduced a very simple technique based on a combination of the unified algorithm [6] and Rademacher functions. This technique uses simple 4-point FHTs arranged in such a way that higher-point FHTs (8, 16, 32) may be obtained easily. The technique was also implemented on FPGAs, and superior results were claimed.
We found that the original Walsh functions, defined in terms of products of Rademacher functions, can be used to transform information into the frequency domain faster than the FHT, and thus speed up the DSP process. Therefore, the original Walsh transform technique based on Walsh functions and Rademacher functions is proposed, and the results of hardware realization on FPGAs are presented. Further, we also designed and implemented the Inverse Walsh Transform (IWT) for conversion from the frequency domain to the time domain.
FPGA-based hardware realization has been used to process two digital signals using the Walsh transform and the inverse Walsh transform techniques. The results of the hardware realization, such as the occupied area and delays, have been compared with the results reported by other workers.

II. WALSH TRANSFORMS AND SIGNAL PROCESSING
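We prefer the definition of the Walsh transform based on the derivation of Walsh functions from Rademacher functions, which is found to be more appropriate for hardware implementation. The Rademacher functions are defined as follows [1]:
$$R_k(x) = \mathrm{Sgn}\left(\sin\left(2^k \pi x\right)\right), \tag{1}$$
where $x \in [0, 1)$ and the signum function Sgn(y) is defined by
$$\mathrm{Sgn}(y) = \begin{cases} +1, & y \ge 0, \\ -1, & y < 0. \end{cases} \tag{2}$$
The Walsh functions are defined in terms of products of Rademacher functions as [1]-[3]:
$$W_n(x) = \prod_{k=1}^{m} \left[R_k(x)\right]^{n_k}, \tag{3}$$
where $m = \log_2 N$ and $n_k \in \{0, 1\}$ are the binary digits of the index n,
$$n = \sum_{k=1}^{m} n_k 2^{k-1}. \tag{4}$$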
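As an illustration of how Walsh functions follow from products of Rademacher functions, the following minimal Python sketch samples them on N points. The function names, the convention Sgn(0) = +1, the sampling at interval midpoints (to avoid the zero crossings of the sine), and the assignment of binary digits of n to Rademacher functions are assumptions made for this example, not details taken from the paper.

```python
import math

def sgn(y):
    """Signum with Sgn(0) taken as +1 (a convention assumed here)."""
    return 1 if y >= 0 else -1

def rademacher(k, x):
    """k-th Rademacher function at x in [0, 1): Sgn(sin(2^k * pi * x))."""
    return sgn(math.sin((2 ** k) * math.pi * x))

def walsh(n, x, m):
    """Walsh function as the product of the Rademacher functions
    selected by the m binary digits of n (digit k-1 selects R_k)."""
    value = 1
    for k in range(m):
        if (n >> k) & 1:
            value *= rademacher(k + 1, x)
    return value

def walsh_matrix(N):
    """Rows are Walsh functions sampled at the midpoints of N equal intervals."""
    m = N.bit_length() - 1            # m = log2(N) for power-of-two N
    xs = [(t + 0.5) / N for t in range(N)]
    return [[walsh(n, x, m) for x in xs] for n in range(N)]

W4 = walsh_matrix(4)
for row in W4:
    print(row)
```

Every entry is +1 or -1 and the rows are mutually orthogonal, which is the property the transform relies on. Other digit-to-function assignments give the same set of functions in a different order.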
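A signal x(t) of length N may be represented as a Walsh series given by [1]
$$x(t) = \sum_{n=0}^{N-1} A_n W_n(t). \tag{5}$$
The Walsh coefficients $A_n$ are evaluated as
$$A_n = \frac{1}{N} \sum_{t=0}^{N-1} x(t) W_n(t). \tag{6}$$
If the signals h(t), p(t) and q(t) are defined as the sum, difference and product of two signals, given by
$$h(t) = x(t) + g(t), \tag{7}$$
$$p(t) = x(t) - g(t), \tag{8}$$
$$q(t) = x(t)\, g(t), \tag{9}$$
where the function x(t) has the Walsh series expansion in (5) and g(t) has the Walsh series expansion
$$g(t) = \sum_{n=0}^{N-1} B_n W_n(t), \tag{10}$$
then the Walsh expansions of h(t), p(t) and q(t) are given by
$$h(t) = \sum_{n=0}^{N-1} C_n W_n(t), \tag{11}$$
$$p(t) = \sum_{n=0}^{N-1} D_n W_n(t), \tag{12}$$
$$q(t) = \sum_{n=0}^{N-1} E_n W_n(t), \tag{13}$$
where the expansion coefficients $C_n$, $D_n$ and $E_n$ are computed as [3], [13], [14]:
$$C_n = A_n + B_n, \tag{14}$$
$$D_n = A_n - B_n, \tag{15}$$
$$E_n = \sum_{k=0}^{N-1} A_k B_{n \oplus k}, \tag{16}$$
where $\oplus$ refers to dyadic addition (XOR) and the last expression is the well-known dyadic convolution.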
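The relations between time-domain operations and Walsh-domain coefficients can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the sample values are arbitrary, and a Hadamard-ordered Walsh function computed from the bit parity of `n & t` is assumed. It keeps the 1/N factor in the forward transform and uses exact fractions.

```python
from fractions import Fraction

def walsh_value(n, t):
    """Hadamard-ordered Walsh function: -1 if n AND t has odd bit parity, else +1."""
    return -1 if bin(n & t).count("1") % 2 else 1

def walsh_transform(x):
    """Walsh coefficients including the 1/N factor."""
    N = len(x)
    return [Fraction(sum(x[t] * walsh_value(n, t) for t in range(N)), N)
            for n in range(N)]

def inverse_walsh_transform(A):
    """Rebuild the signal from its Walsh series."""
    N = len(A)
    return [sum(A[n] * walsh_value(n, t) for n in range(N)) for t in range(N)]

def dyadic_convolution(A, B):
    """E_n = sum over k of A_k * B_(n XOR k)."""
    N = len(A)
    return [sum(A[k] * B[n ^ k] for k in range(N)) for n in range(N)]

x = [3, -2, 5, 1]          # arbitrary example signals
g = [1, 4, -3, 2]
A, B = walsh_transform(x), walsh_transform(g)

h = inverse_walsh_transform([a + b for a, b in zip(A, B)])   # x + g
p = inverse_walsh_transform([a - b for a, b in zip(A, B)])   # x - g
q = inverse_walsh_transform(dyadic_convolution(A, B))        # x * g

print(h, p, q)
```

Adding or subtracting coefficients reproduces the sample-wise sum or difference, and the dyadic convolution of the coefficients reproduces the sample-wise product.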

III. DESIGN OF WALSH AND INVERSE WALSH TRANSFORMS
A set of circuits that performs signal processing of two digital signals is shown in Fig. 1. The design consists of two Walsh transform blocks, a DSP block, and one inverse Walsh transform block. The input digital signals x(t) and g(t) are passed into the system serially. Similarly, the output signal is also produced serially. Two identical Walsh transform blocks are used in order to speed up the transform process.

IV. WALSH TRANSFORM
Previous Walsh transform methods arrange both inputs and outputs either in parallel or in series. We prefer a combination of these arrangements. Table I compares the proposed method with the previous methods in terms of input and output arrangement. Passing the input data serially requires fewer pins when the circuit is implemented. Meanwhile, for signal processing purposes, the outputs of the WT are arranged in parallel.
The basic blocks of the proposed WT circuit, shown in Fig. 2, consist of:
- Negative circuit (WI bits): one;
- Walsh circuit (N-1 order): one;
- 2-to-1 multiplexers (WI bits): N-1;
- Accumulators (WO bits): N;
- Data buffers (WI bits): N;
- Output buffers (WO bits): N.
The N input data (samples) X are passed into the circuit serially under the control of the Enter signal. The Walsh circuit selects the appropriate data, X or -X, and passes it through the multiplexers. The outputs of the multiplexers are accumulated in the accumulators to form the output transformed coefficients (A's). Fig. 3 shows the circuit realization of the proposed Walsh transform for N=4 and an input word length WI=4 (used to represent the input data X). The output transformed coefficients (A's) are represented in 6 bits (output word length WO=6) [13].
In determining the coefficients A, and to avoid floating-point numbers, the factor 1/N in (6) is ignored for the time being. This factor is applied at the end of the process, in the inverse Walsh transform circuit.
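The serial-in, parallel-out data flow described above can be modelled behaviourally in a few lines. This is a sketch under assumptions, not the authors' circuit: the function names are hypothetical, Hadamard ordering is assumed for the Walsh select bits, and the 1/N factor is omitted as in the text.

```python
def walsh_select_bit(n, t):
    """Binary Walsh value: 0 stands for +1 and 1 stands for -1
    (Hadamard ordering assumed)."""
    return bin(n & t).count("1") & 1

def serial_walsh_transform(samples):
    """Accumulate +X or -X into N accumulators, one input sample per step."""
    N = len(samples)
    acc = [0] * N                       # N accumulators, one per coefficient
    for t, x in enumerate(samples):     # one sample per Enter pulse
        neg_x = -x                      # negative circuit
        for n in range(N):
            # 2-to-1 multiplexer chooses X or -X, then the accumulator adds it
            acc[n] += neg_x if walsh_select_bit(n, t) else x
    return acc

coeffs = serial_walsh_transform([7, -8, 3, -1])
print(coeffs)
```

For 4-bit signed samples and N=4, every accumulated value stays within the 6-bit signed range of -32 to 31, in line with WO=6 in the text.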
Inverse Walsh Transforms
The input data of the inverse Walsh transform are the new coefficients that result from processing the Walsh transform coefficients. Thus, the IWT is designed to receive its inputs in parallel and produce its outputs serially.
The basic blocks of the proposed inverse Walsh transform circuit are shown in Fig. 4. The design consists of:
- Negative circuits (WIC bits): N-1;
- Walsh circuit (N-1 order): one;
- 2-to-1 multiplexers (WIC bits): N-1;
- Adders (WOC bits): N-1;
- Data buffers (WIC bits): N-1;
- Output buffer (WOO bits): one.
Fig. 5 shows the circuit realization of the proposed inverse Walsh transform for N=4 and an input word length WIC=6 (used to represent the input coefficients C). It is assumed here that no DSP process has been performed before the IWT circuit, that is, the system is in signal generation mode. The output data (H) are represented in 4 bits (output word length WOO=4).
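A behavioural model of the parallel-in, serial-out inverse transform may help to make the block list concrete. The sketch below is an illustration under assumptions: the function name and the `k_shift` parameter are hypothetical, Hadamard ordering is assumed, and the deferred scaling factor is applied by discarding low-order bits with an arithmetic right shift.

```python
def inverse_walsh_serial(coeffs, k_shift):
    """Produce one output sample per step from N parallel coefficients.

    k_shift is log2(K), the number of low-order bits dropped to apply
    the scaling factor that was skipped in the forward transform.
    """
    N = len(coeffs)
    out = []
    for t in range(N):
        total = coeffs[0]               # W_0 is always +1, so C_0 needs no multiplexer
        for n in range(1, N):
            bit = bin(n & t).count("1") & 1          # Walsh select bit
            total += -coeffs[n] if bit else coeffs[n]  # mux picks C or -C, adder sums
        out.append(total >> k_shift)    # keep only the upper bits
    return out

# Signal generation mode: unscaled coefficients of the signal [7, -8, 3, -1], N = 4
samples = inverse_walsh_serial([1, 19, -3, 11], k_shift=2)
print(samples)
```

In this example the adder chain produces four times each original sample, and dropping two bits recovers the 4-bit values.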

V. DESIGN OF WORD LENGTHS
The proposed Walsh and inverse Walsh transforms are designed as part of a complete system for signal processing. Therefore, in order to achieve efficient hardware utilization, it is necessary to design the word lengths carefully for every step of the process.
The word length needed to represent the output of the Walsh transform may be chosen directly from (6). Consequently, a sufficient word length to represent the coefficients A (WO) is
$$WO = WI + \log_2 N. \tag{17}$$
This number of bits guarantees that the coefficients A can accommodate any combination of input data.
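The sufficiency of the output word length can be checked by brute force for small cases. The sketch below assumes the rule WO = WI + log2(N), which agrees with the WI=4, N=4, WO=6 example in the text; the function names are hypothetical and Hadamard ordering is assumed. It enumerates every input combination for WI=3 and N=4 to keep the search small.

```python
from itertools import product

def output_word_length(wi, n):
    """Assumed rule WO = WI + log2(N) for a power-of-two transform length N."""
    return wi + (n.bit_length() - 1)

def fits_signed(value, bits):
    """True if value is representable in two's complement with the given width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1

def all_coefficients_fit(wi, n, bits):
    """Exhaustively test every combination of n signed wi-bit samples."""
    lo, hi = -(1 << (wi - 1)), (1 << (wi - 1)) - 1
    for x in product(range(lo, hi + 1), repeat=n):
        for k in range(n):
            # Unscaled coefficient: add or subtract each sample per the Walsh sign
            a = sum(-v if bin(k & t).count("1") & 1 else v
                    for t, v in enumerate(x))
            if not fits_signed(a, bits):
                return False
    return True

wo = output_word_length(3, 4)
print(wo, all_coefficients_fit(3, 4, wo), all_coefficients_fit(3, 4, wo - 1))
```

The second check shows that one bit fewer is not enough for this case, so the assumed rule is tight as well as sufficient here.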
The processed coefficients C, D and E described above have to be represented with different numbers of bits (word lengths), depending on the type of DSP operation. Special care is needed to determine the right number of bits to represent these coefficients.
For addition and subtraction, the word lengths WIC of the processed coefficients C and D are equal. They are evaluated as
$$WIC = WI + \log_2 N + 1. \tag{18}$$
For multiplication, the word length WIC of the processed coefficients E can be estimated directly from (16), which gives expression (19). However, in order to reduce the required area, the WIC according to (19) was analyzed more carefully using MATLAB. It was found that, in practice, the processed coefficients E do not require as many bits as (19) suggests; the number of bits given by (20) is sufficient.
For the inverse Walsh transform, it is assumed that the input coefficients C are represented by WIC bits for transform length N. The maximum number of bits required, WOC, may be obtained from (21). However, again in order to obtain more efficient resource utilization, the maximum number of bits according to (21) was analyzed more carefully by providing all possible input values of the corresponding Walsh transform in MATLAB. It was found that, in practice, the IWT does not require the number of bits suggested by (21); the number given by (22) is sufficient.
It should be noted that the input coefficients of the IWT follow certain patterns and do not take all possible values, unlike the input signals x(t) and g(t).
The output signal h(t) of the inverse Walsh transform has to be represented in WOO bits as given in (23):
$$WOO = WOC - \log_2 K. \tag{23}$$
The subtraction of $\log_2 K$ is due to ignoring the factor 1/N when performing the Walsh transform, where K depends on the type of digital signal processing performed before the IWT. For generation, addition and subtraction, K=N, and for multiplication, K=N^2.
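The two values of K can be confirmed with integer arithmetic by running the whole chain without the 1/N factor. In the sketch below, the function names and sample values are hypothetical and Hadamard ordering is assumed; the final scaling is applied as an arithmetic right shift by log2(K) bits.

```python
def w(n, t):
    """Hadamard-ordered Walsh value (+1 or -1)."""
    return -1 if bin(n & t).count("1") % 2 else 1

def wt_unscaled(x):
    """Forward transform with the 1/N factor left out."""
    N = len(x)
    return [sum(x[t] * w(n, t) for t in range(N)) for n in range(N)]

def iwt_unscaled(c):
    """Inverse transform before any scaling."""
    N = len(c)
    return [sum(c[n] * w(n, t) for n in range(N)) for t in range(N)]

def dyadic(a, b):
    """Dyadic convolution of two coefficient lists."""
    N = len(a)
    return [sum(a[k] * b[n ^ k] for k in range(N)) for n in range(N)]

x = [3, -2, 5, 1]
g = [1, 4, -3, 2]
A, B = wt_unscaled(x), wt_unscaled(g)

raw_add = iwt_unscaled([a + b for a, b in zip(A, B)])   # equals N * (x + g)
raw_mul = iwt_unscaled(dyadic(A, B))                    # equals N^2 * (x * g)

h = [v >> 2 for v in raw_add]    # K = N = 4,    drop log2(4)  = 2 bits
q = [v >> 4 for v in raw_mul]    # K = N^2 = 16, drop log2(16) = 4 bits
print(h, q)
```

Because each raw result is an exact multiple of K, the discarded bits are always zero and no rounding error is introduced.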
Table II summarizes all word lengths required for signal generation, addition, subtraction and multiplication processes of transform lengths N and input word lengths WI.

VI. WALSH CIRCUIT
One of the most useful properties of the Rademacher functions and Walsh functions is that they take only two values +1 and -1.Hence they are ideal for implementation on digital systems.If the ±1 amplitudes of the Walsh functions are converted to a binary logic {0,1} representation with the conversions +1 → "0" and −1 → "1", then multiplication of Rademacher functions is reduced to Exclusive-OR (XOR) or modulo-2 addition operation.
The Walsh circuit realizations for transform lengths N=4 and N=8 are shown in Fig. 6. The circuit is designed to produce the 2nd, 3rd and 4th orders of Walsh functions based on Hadamard ordering. The 1st-order function, in Hadamard ordering or any other Walsh ordering, is always +1 (logic "0"); therefore, it is not necessary to generate it.
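The reduction of multiplication to XOR under the mapping +1 to "0" and -1 to "1" can be illustrated in software. The sketch below is not the circuit of Fig. 6: the function names are hypothetical, and it assumes that the Rademacher functions are taken from the bits of a sample counter t and that the orders follow Hadamard ordering.

```python
def to_bit(v):
    """Map the amplitude +1 to logic 0 and -1 to logic 1."""
    return 0 if v == 1 else 1

def walsh_bits(t, m):
    """Binary Walsh values for orders 2..N at sample index t (N = 2^m).

    Each order n XORs together the counter bits of t selected by the
    set bits of n; order 1 is constant 0 and is not generated.
    """
    bits = []
    for n in range(1, 1 << m):
        b = 0
        for k in range(m):
            if (n >> k) & 1:
                b ^= (t >> k) & 1
        bits.append(b)
    return bits

# Multiplication of +/-1 amplitudes corresponds to XOR of their logic values
for a in (1, -1):
    for b in (1, -1):
        assert to_bit(a * b) == to_bit(a) ^ to_bit(b)

table = [walsh_bits(t, 2) for t in range(4)]
print(table)
```

For N=4 the three generated outputs are the two counter bits and their XOR, which is why a counter and a few XOR gates are enough in this model.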

VII. ARRANGEMENT OF NET CONNECTIONS
In the Walsh transform circuit of Fig. 3, the accumulator AC1 is used to accumulate the input data that enter the data buffer F1. The accumulators are designed to accommodate data of up to WO=6 bits. Since the outputs of the data buffers are only 4 bits wide, the net F1(3), the most significant bit, is connected to three inputs of the accumulator AC1, as shown in the circuit in Fig. 7. For a Walsh transform with a longer transform length such as N=8, this net has to be connected to four accumulator inputs. These connections are based on the additional bits required, as derived from (17). In the inverse Walsh transform circuit of Fig. 5, only 4 of the 6 nets of adder 3 are connected to the buffer B0; the nets A3(0) and A3(1) are ignored (not used). This net manipulation is due to the subtraction factor K=N=4. Fig. 8 shows the net connections between adder 3 and buffer B0.
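Both net arrangements have simple bit-level equivalents: replicating the top bit of a 4-bit value into the upper accumulator inputs is two's complement sign extension, and leaving the two lowest adder nets unconnected is a division by 4. The helper names below are hypothetical and the sketch only mirrors the WI=4, WO=6, N=4 example.

```python
def sign_extend(pattern, from_bits, to_bits):
    """Copy the most significant bit of a from_bits-wide pattern
    into all upper bits of a to_bits-wide pattern."""
    mask = (1 << from_bits) - 1
    pattern &= mask
    if (pattern >> (from_bits - 1)) & 1:
        pattern |= ((1 << to_bits) - 1) & ~mask
    return pattern

def drop_low_bits(pattern, drop):
    """Ignore the lowest 'drop' nets, keeping the upper bits."""
    return pattern >> drop

def to_signed(pattern, bits):
    """Interpret a bit pattern as a two's complement number."""
    return pattern - (1 << bits) if (pattern >> (bits - 1)) & 1 else pattern

# Walsh transform side: the 4-bit value -8 (1000) widened to 6 bits
extended = sign_extend(0b1000, 4, 6)
print(format(extended, "06b"), to_signed(extended, 6))

# Inverse transform side: the 6-bit adder output 28 (011100) with two nets ignored
kept = drop_low_bits(0b011100, 2)
print(format(kept, "04b"), to_signed(kept, 4))
```

In the first case the top bit appears on three accumulator inputs (bits 3, 4 and 5); in the second, the four retained nets carry the original sample value.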

VIII. IMPLEMENTATION RESULTS
The implementations are targeted at Virtex chips from Xilinx. Fig. 9 shows the Walsh transform of the input signal x(t). The coefficients C and D are represented in WIC=7 bits based on (18). Meanwhile, the coefficients E are represented in WIC=11 bits based on (20). The implementation of signal generation is not shown here because it may be obtained simply by combining the Walsh transform and the inverse Walsh transform (see Table II for signal generation).
In order to make a comparison with the results reported in other works, we implemented the Walsh transform for N=4, 8, 16 and WI=8 on Xilinx Virtex-E, Virtex-IIP and Virtex-4. The results of the comparison are presented in Table IV.

Fig. 2. Design of Walsh transform for transform length of N.

Fig. 4. Design of inverse Walsh transform for transform length of N.

Fig. 10 shows the result of the inverse Walsh transform of the coefficients C, where the C's are the results of adding the coefficients A and B according to (14). These simulations were done using ISE Simulator under Windows 7 on a computer with an Intel Core Duo processor (T2050).

TABLE I. COMPARISON OF INPUT AND OUTPUT ARRANGEMENTS OF WALSH TRANSFORM CIRCUITS.

TABLE II. WORD LENGTHS DESIGN FOR TRANSFORM LENGTHS N AND INPUT WORD LENGTHS WI.

Table III shows the numerical results of all signals and coefficients. These values are represented in signed number format.

TABLE III. LIST OF ALL SIGNALS AND COEFFICIENTS.

TABLE IV. COMPARISON OF THE PROPOSED WALSH TRANSFORM WITH PREVIOUS METHODS FOR VARIOUS XILINX VIRTEX SERIES (WI=8).