Acceleration of Digital Stochastic Measurement Simulation Based on Concurrent Programming

A/D conversion methods improved through stochastic signal superposition, along with oversampling techniques, present a significant research direction in the area of signal processing and measurement. Since the accuracy of those methods rises with the length of the measurement interval, i.e. the integration time, they are appropriate for calculation and measurement of orthogonal transformations. Simulation and validation of the above-mentioned digital stochastic methods require significant computing resources. Long measurement intervals, during which numerous arithmetic operations are performed on oversampled input signals, present the most demanding computing requirements. In this paper, a novel digital stochastic measurement simulation approach is presented and validated. The simulation approach is based on the Concurrent Programming technique. General orthogonal transformations are analysed through the stochastic measurement technique. As a reference test case, the Discrete Fourier Transform is calculated over several periodic input signals converted by the stochastic A/D converter. The time required to complete a simulation test case is analysed as the main performance metric. The final results show that the Concurrent Programming technique improves simulation speed without other consequences for measurement performance. DOI: http://dx.doi.org/10.5755/j01.eie.24.6.22284

ELEKTRONIKA IR ELEKTROTECHNIKA, ISSN 1392-1215, VOL. 24, NO. 6, 2018
Discrete Fourier Transform (DFT)
In (6), k is the particular sample of the Hartley Transform, $y_n$ is the particular signal sample, n is the counter variable and N is the overall number of samples that represent the signal. If the sampling frequency is at least one order of magnitude higher than the frequency of the measured signal, we are talking about the oversampling method. Throughout the design of an acquisition system, it is possible to match the A/D converter resolution, sampling frequency and data processing speed. In the case of orthogonal transforms, the average value of the integral sum of the product of the signal and the corresponding memorized base function has to be found for a given coefficient. The basic advantage of oversampling measurement is that measurement precision increases with the square root of the number of samples.
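The square-root law quoted above can be written out as a short worked relation. This is only a sketch under an assumption that the source does not state explicitly: the errors of the individual samples are independent and share a common standard deviation $\sigma$. The symbol $\sigma_{\bar{y}}$ (standard deviation of the averaged result) is introduced here for illustration.

```latex
% Standard deviation of the mean of N independent samples
\sigma_{\bar{y}}(N) = \frac{\sigma}{\sqrt{N}}
% Example: taking 100 times more samples improves precision 10 times
\frac{\sigma_{\bar{y}}(N)}{\sigma_{\bar{y}}(100N)} = \sqrt{100} = 10
```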
The rest of the paper is organized as follows. Related work is presented in Section II. Section III describes stochastic processor of the orthogonal transformation. Section IV exposes theory of operation. Section V presents simulation results. Section VI is discussion, followed by the conclusion in Section VII.

II. RELATED WORK
A/D conversion methods enhanced with stochastic signal superposition are well known in the domain of signal processing and measurement.
In general, those methods are derivatives of oversampling techniques such as Delta-Sigma A/D conversion [7], with the main difference that the stochastic approach excludes the feedback loop, which is inevitable in Delta-Sigma A/D conversion. Feedback increases conversion speed and efficiently suppresses quantization noise. However, if the measured input signal is noisy, i.e. if the noise floor is too high in the band of interest, the efficiency of the feedback loop drops [8].
It is not easy to determine when significant scientific interest became focused on A/D conversion systems based on superimposed uniform noise, but the work of Schuchman [9] can be highlighted as essential from the system-level perspective, as the author analyses the effects on a generic sinusoidal input signal. Wagdy [10] gives the exact functional relation between the Probability Density Function (PDF) and the variance of the quantization error when a sinusoidal signal is applied to a uniform quantizer. In that relation, $f_e(e)$ is the PDF of the quantization noise $e$, $J_0$ is the Bessel function of order zero and A is the amplitude of the sinusoidal input signal.
The resulting normalized squared quantization error matches most of the theoretical approximations in which a uniform distribution of the quantization error is assumed.
If the presented concept is applied to the implementation of a True Root Mean Square (True RMS) measurement system [11], the result offers high performance in terms of method robustness and noise immunity. Further research has generalized this concept to high-resolution sampling A/D converters [12]. Practical implementations of prototype instruments, e.g. harmonic analysers, are published in [13] and [14].
Recent results in the domain of power measurement [15] show high accuracy reached with only a two-bit A/D converter structure improved with an offset error suppression technique. A similar technique has also been extended to a frequency measurement system [16], where noise immunity is a valuable advantage.
An especially interesting research direction targets the biomedical area, where low signal strength within a noisy environment presents a demanding measurement task. Typical problems successfully solved with the presented technique are in the domain of electrophysiological monitoring, e.g. electroencephalography (EEG) [17], [18].

III. STOCHASTIC PROCESSOR OF ORTHOGONAL TRANSFORMS
The stochastic additive A/D converter with two noise generators [6] (abbreviated SAADC 2G) is an oversampling measurement method. With this method, we digitally measure the average value of the integral of the product of two analog input signals. Let us denote these signals by $y_1(t)$ and $y_2(t)$. The block scheme of the SAADC 2G converter is given in Fig. 1.
As seen in Fig. 1, it is necessary to perform analog addition of the noise sources $h_1(t)$ and $h_2(t)$ to the measured signals $y_1(t)$ and $y_2(t)$, respectively. The constraint for operation of SAADC 2G is that the noise signals are mutually uncorrelated and that the distribution of their amplitudes is uniform within the range $\pm\Delta_i/2$, $i = 1, 2$, where $\Delta_i$ are the steps of A/D1 and A/D2 from Fig. 1, respectively. If these conditions are fulfilled, the numerical accumulator from Fig. 1 measures

$$\frac{1}{T}\int_{0}^{T} y_1(t)\, y_2(t)\, dt. \qquad (10)$$

The upper bound for its absolute measurement error squared is given by (11). Block B in Fig. 1 can be replaced with a memory block. This memory block holds dithered samples of basis functions. Such an instrument is called a stochastic processor of orthogonal transformations (SPOT) [6] and is shown schematically in Fig. 2. It is shown in [6] that the optimal resolution of the stored samples is exactly 2 bits higher than the resolution of the input A/D converter. In that case, the waveform shape of the measured signal does not influence the measurement accuracy.
If this constraint is satisfied, the measurement uncertainty squared is given by (12), where R is the input voltage range of the A/D converter, N is the overall number of samples in the measurement result and $\Delta_1$ is the step of the A/D converter. Equations (10)-(12) are explained in detail in [6], and a deeper treatment of the metrological part of the problem is out of the scope of this paper.
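The SAADC 2G principle described above (add uniform dither to each input, quantize coarsely, multiply, accumulate, average) can be illustrated with a minimal Python sketch. It is not the authors' C# simulation: the function names, the quantizer steps, the test signals and the use of Python's `random` module as the two noise sources are all assumptions made for illustration.

```python
import math
import random


def dithered_quantize(value, step, rng):
    """Add uniform dither in [-step/2, step/2) and round to the nearest level."""
    dither = (rng.random() - 0.5) * step
    return step * math.floor((value + dither) / step + 0.5)


def saadc_2g_mean_product(y1, y2, step1, step2, seed=0):
    """Average the products of two independently dithered, quantized signals."""
    rng1 = random.Random(seed)       # stands in for noise source h1 (assumption)
    rng2 = random.Random(seed + 1)   # separate stream standing in for h2 (assumption)
    accumulator = 0.0
    for a, b in zip(y1, y2):
        accumulator += (dithered_quantize(a, step1, rng1)
                        * dithered_quantize(b, step2, rng2))
    return accumulator / len(y1)


N = 20_000  # one mains period at the sampling rate quoted in the paper
y1 = [math.sin(2 * math.pi * n / N) for n in range(N)]
y2 = [math.sin(2 * math.pi * n / N) for n in range(N)]

# step2 is four times finer than step1, mimicking stored samples with
# 2 bits more resolution than the input A/D converter (illustrative values).
estimate = saadc_2g_mean_product(y1, y2, step1=0.25, step2=0.0625)
exact = sum(a * b for a, b in zip(y1, y2)) / N
print(f"estimate = {estimate:.4f}, exact = {exact:.4f}")
```

Even with a quantizer step of a quarter of the signal amplitude, the averaged product stays close to the exact mean product, because the dithered quantization error is zero-mean and averages out over many samples.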
If we take a resolution of 8 bits for the stored samples, the optimal resolution of the A/D converter is 6 bits. The input voltage range of the A/D converter is R = 2.5 V. The sampling frequency is 1 MHz (20,000 samples per period for 50 Hz mains). The overall duration of the measurement is 2 s (100 periods of the mains). With the given measurement parameters, the upper bound for the absolute measurement error is

IV. THEORY OF OPERATION
In the simulation, each signal consists of 50 harmonic components. Samples are created from 50 sine and 50 cosine signals with corresponding frequencies. Noise samples with a uniform amplitude distribution and an infinite pattern length (infinite period) also have to be calculated. For that purpose, a lagged Fibonacci generator [19] was used. The lagged Fibonacci generator takes another 20 arithmetic operations per sample. According to the given simulation parameters, there are 20,000 samples per period of the mains (50 Hz), and 2 s of measurement covers 100 periods of the mains. Finally, for a reliable statistical result, 50 measurements per coefficient must be taken into account. Overall, one sine or cosine coefficient component takes $N_C = 2 \times 10^{9}$ arithmetic operations (15), and all signal spectra take $N_S$ arithmetic operations (16), i.e. $2 \times 10^{11}$ arithmetic operations. In (16), $N_S$ is the overall number of arithmetic operations that must be calculated to obtain all signal spectra, and $N_C$ is the number of arithmetic operations for one sine or cosine DFT coefficient component.
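A lagged Fibonacci generator producing uniform dither can be sketched as follows. The paper cites [19] but does not list the generator parameters, so the lags (24, 55), the modulus $2^{32}$, the seeding through Python's `random` module and the class name are assumptions chosen for illustration (the lag pair is a classic additive choice).

```python
from collections import deque
import random


class LaggedFibonacci:
    """Additive lagged Fibonacci generator: x[n] = (x[n-24] + x[n-55]) mod 2**32."""

    def __init__(self, seed=1, short_lag=24, long_lag=55, modulus=2 ** 32):
        rng = random.Random(seed)
        state = [rng.randrange(modulus) for _ in range(long_lag)]
        state[0] |= 1  # make sure at least one seed value is odd
        self.state = deque(state, maxlen=long_lag)  # holds x[n-55] .. x[n-1]
        self.short_lag = short_lag
        self.modulus = modulus

    def next_int(self):
        # state[0] is x[n-55]; state[-short_lag] is x[n-24]
        value = (self.state[0] + self.state[-self.short_lag]) % self.modulus
        self.state.append(value)  # the oldest value is dropped automatically
        return value

    def next_dither(self, step):
        """Return a dither sample uniformly spread over [-step/2, step/2)."""
        return (self.next_int() / self.modulus - 0.5) * step


generator = LaggedFibonacci(seed=1)
dither = [generator.next_dither(step=1.0) for _ in range(20_000)]
print(min(dither), max(dither), sum(dither) / len(dither))
```

Each output costs one addition, one modulo reduction and a scaling, so a generator of this kind is cheap enough to be called once per sample for every noise source.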
It is desirable to divide such a large number of arithmetic operations among several logical processors inside one physical processor. Using the .NET Framework and the C# programming language, there are three mechanisms for parallel processing of SPOT operations. In all of these cases, multithreading can be achieved by putting the complete code for one sine or cosine component, including the 50 measurement loops, into a single calculation method. This method has to receive information about the base function (sine or cosine) as well as the harmonic order. The simplest way to pass this data to the method is to send a single integer parameter. The calculation method then analyses this integer parameter: if it is even, the sine component is calculated, and if it is odd, the cosine component is calculated. The harmonic order can be calculated from the very same parameter value according to (17) and (18). If parameter P is an even number, the harmonic order H is calculated as

$$H = \frac{P}{2} + 1, \qquad (17)$$

and if parameter P is an odd number, the harmonic order H is

$$H = \frac{P + 1}{2}, \qquad (18)$$

where P is the value of the parameter transferred to the calculation method, while H is the harmonic order. For example, if the number zero is transferred, the method will simulate measurement of the first sine harmonic. If the number one is transferred, the method will simulate the first cosine harmonic.
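The decoding of the single integer parameter can be sketched in a few lines of Python. The function name and the returned labels are hypothetical; the sketch assumes parameters 0 to 99, which is what the two examples in the text (zero for the first sine harmonic, one for the first cosine harmonic) imply for 50 harmonics.

```python
def decode_parameter(p):
    """Map the integer parameter P to (base function, harmonic order H)."""
    if p < 0:
        raise ValueError("parameter must be non-negative")
    if p % 2 == 0:
        return "sine", p // 2 + 1      # even P: H = P/2 + 1
    return "cosine", (p + 1) // 2      # odd P:  H = (P + 1)/2


for p in (0, 1, 2, 3, 98, 99):
    print(p, decode_parameter(p))
```

Packing both pieces of information into one integer keeps the signature of the calculation method as simple as possible, which is convenient when the same method is handed to a thread, a parallel loop or a thread pool.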
There are generally several ways to spread the execution of calculation code across the logical processors of one or more processor cores. Each of these techniques has advantages and drawbacks that are discussed in detail in subsections A, B and C.

A. Usage of Threads
Threads [21] are independent code sequences that are executed at the same time as other code sequences on different logical processors of one or more physical processors. In this way, parallel execution of the same code with different parameters (the choice of sine or cosine function and the order of the harmonic) achieves real parallel processing. However, threads are specific objects, each of which has its own program counter and stack. The size of a thread in the .NET Framework is 4 MB on a 32-bit operating system (OS), or 8 MB on a 64-bit OS. In multithreading, several threads are part of the same virtual address space and they share code, files and data in both virtual and physical mapping. So, each thread has its own program counter and stack, and shares files, data and code, as shown in Fig. 3.

The general problem with threads is that they are "time hungry" objects and the operating system (OS) spends significant time on their creation. The second problem is that a thread, once started, cannot be started again (unless Thread Pool is used), but it still occupies memory.
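The one-thread-per-component structure can be sketched in Python as a stand-in for the C# code, which the paper does not list. Everything named here is hypothetical: `measure_component` is a reduced placeholder workload (a plain DFT sum without dither), and the sample count is far smaller than in the paper. In standard CPython the global interpreter lock prevents threads from executing Python code in parallel, so the sketch shows the structure of the approach rather than its speed.

```python
import math
import threading

SAMPLES = 200      # reduced sample count, illustration only
COMPONENTS = 100   # 50 sine + 50 cosine components


def measure_component(p, results):
    """Hypothetical stand-in for one coefficient simulation."""
    harmonic = p // 2 + 1 if p % 2 == 0 else (p + 1) // 2
    basis = math.sin if p % 2 == 0 else math.cos
    acc = 0.0
    for n in range(SAMPLES):
        angle = 2.0 * math.pi * n / SAMPLES
        acc += math.sin(angle) * basis(harmonic * angle)  # input: unit first sine harmonic
    results[p] = 2.0 * acc / SAMPLES


results = {}
threads = [threading.Thread(target=measure_component, args=(p, results))
           for p in range(COMPONENTS)]
for t in threads:
    t.start()   # one new thread object is created and started per component
for t in threads:
    t.join()

# A finished thread object cannot be started again; a new object is needed.
try:
    threads[0].start()
    restart_allowed = True
except RuntimeError:
    restart_allowed = False

print(round(results[0], 6), restart_allowed)
```

The sketch makes both drawbacks visible: one hundred thread objects are created for one hundred components, and none of them can be reused once it has finished.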

B. Usage of ForEach Method of Parallel Class
Inside the System.Threading.Tasks namespace of the .NET Framework, there is a class "Parallel" that allows parallel execution of a repeating loop. This is a static class (a class that cannot be instantiated, but whose methods are accessible directly via the class name) that has a "ForEach" method. The purpose of this method is to start another method several times in parallel, with different parameters passed to it. If the CPU contains several logical processors, execution of the code is much faster. In contrast to sequential loop execution, the order of execution cannot be guaranteed when the "ForEach" method is used. For the SPOT simulation presented in this paper, the order of the coefficient measurement simulations has no influence on the overall result, so the "ForEach" method is applicable for this purpose. In the basic "ForEach" call, the programmer does not define the level of parallelism. Instead, the run-time environment executes as many steps of the "ForEach" method in parallel as is possible at the given moment, as shown in Fig. 4. The call of the "ForEach" method is made in such a way that, in a single call, a reference to the array with all parameters that have to be processed is sent together with a delegate to the method that will be executed in parallel. In this particular simulation, the array of parameters is an array of integers with values from 1 to 100.

C. Usage of Thread Pooling
The main advantage of the Thread Pool concept [22] is that a multithreading environment is created once and, instead of creating new threads over and over again (which is quite time consuming), the once-formed pool of threads processes new requests over and over again. This is shown schematically in Fig. 5. In Fig. 5 we can see N requests and M threads in the pool, where M < N. Creating new threads burdens the CPU, and if, as in the given simulation, 100 requests have to be fulfilled while each request takes $2 \times 10^{9}$ arithmetic operations (15), the time savings can be significant. The only limitations of Thread Pool usage are that all threads in the pool must have the same priority and that neither the order of thread execution nor the number of threads in the pool can be controlled, since the run-time environment creates and controls the threads. However, these limitations have no influence on the SPOT measurement simulation, and the benefit is a simulation time two orders of magnitude shorter than with the other approaches.
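The pool idea (M reusable worker threads serving N > M requests) can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`, used here as an analogy to the .NET Thread Pool and not as the authors' implementation. The workload `measure_component`, the test signal and the worker count are hypothetical. Unlike the .NET pool described above, this Python executor takes an explicit upper limit on the number of workers.

```python
import math
from concurrent.futures import ThreadPoolExecutor, as_completed

SAMPLES = 256  # reduced sample count, illustration only


def measure_component(p):
    """Hypothetical stand-in for one coefficient simulation (plain DFT sum)."""
    harmonic = p // 2 + 1 if p % 2 == 0 else (p + 1) // 2
    basis = math.sin if p % 2 == 0 else math.cos
    acc = 0.0
    for n in range(SAMPLES):
        angle = 2.0 * math.pi * n / SAMPLES
        signal = math.sin(angle) + 0.5 * math.cos(3.0 * angle)  # test input
        acc += signal * basis(harmonic * angle)
    return 2.0 * acc / SAMPLES


def run_with_pool(parameters, workers=8):
    """Serve all requests with a fixed set of reusable worker threads."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(measure_component, p): p for p in parameters}
        for future in as_completed(futures):   # completion order is not guaranteed
            results[futures[future]] = future.result()
    return results


results = run_with_pool(range(100))
print(round(results[0], 6), round(results[5], 6))
```

Because every result is stored under its own parameter value, the order in which the pool finishes the requests does not affect the final spectrum, which mirrors the argument made in the text.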

V. SIMULATION RESULTS
Simulation verification has been performed over five different test cases. Each test case is a digital stochastic measurement simulation of the DFT applied to a signal that includes significant higher harmonic components. Let us denote the given test case signals by s1, s2, s3, s4 and s5, respectively.
Waveforms of the given signals are presented in Fig. 6 to Fig. 10, while the precise mathematical formulas are given in (19)-(23), respectively.
Without any loss of generality and in order to avoid overly dense graphics, only the initial three (first, second and third) and final three (forty-eighth, forty-ninth and fiftieth) harmonic errors are presented for each simulated test case. Fig. 11 to Fig. 15 correspond to the simulated harmonic measurement error levels for signals s1 to s5, respectively.
In Table I to Table V, the results of the simulations are presented together with the simulation time. In the tables, parameter $\sigma_A$ is the averaged standard deviation, obtained as the arithmetic mean of the square roots of 100 variances. Each of these variance square roots is calculated for the corresponding sine or cosine coefficient, which for all 50 DFT coefficients makes one hundred results. Among these 100 results, the value $\sigma_M$ is the maximum. The fourth column presents the simulation time.
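The two summary statistics described above (the mean and the maximum of the per-coefficient standard deviations) can be computed as in the following Python sketch. The function name and the tiny data set are hypothetical, and the choice of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not say which estimator was used.

```python
import statistics


def summarize_spread(measurements_per_coefficient):
    """Return (mean, maximum) of the standard deviations of each coefficient."""
    deviations = [statistics.stdev(m) for m in measurements_per_coefficient]
    return statistics.mean(deviations), max(deviations)


# Three coefficients with two repeated measurements each; the paper uses
# 100 coefficient components with 50 repeated measurements each.
example = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]
averaged_deviation, maximum_deviation = summarize_spread(example)
print(averaged_deviation, maximum_deviation)
```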

VI. DISCUSSION
According to the results given in Tables I to V, the foremost simulation approach is the use of Thread Pool, which accelerates the simulation process significantly and generally saves OS resources.
Using Thread Pool in simulations that take 200 billion arithmetic operations reduces simulation time by two orders of magnitude, compared with simulation on the same CPU without Thread Pool. On a CPU with four cores and eight logical processors with a 3.3 GHz clock, and a PC platform with 16 GB of RAM, the time taken to finish the simulation was 2 hours and 15 minutes using threads, and only 6 minutes using Thread Pool. The advantage at the system level comes from the optimal parallel distribution of computing tasks over the available logical processors. Thread usage generally offers execution parallelism, while Thread Pool brings the optimal solution within a multithreading environment.
For the simulation of SPOT operations on the five proposed test case signals, the simulation results are completely inside the theoretically predicted boundaries, but the simulation time on the same CPU differs significantly depending on the thread technique used.
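The operation counts and run times quoted in this paper can be cross-checked with a few lines of arithmetic. The factorization of the per-component count into 20 operations per sample, 20,000 samples per period, 100 periods and 50 repeated measurements is an assumption; it is chosen because it reproduces the totals stated in the text. For the platform described above, the two quoted run times correspond to a ratio of 22.5.

```python
OPS_PER_SAMPLE = 20          # assumed per-sample cost (generator figure from the text)
SAMPLES_PER_PERIOD = 20_000  # 1 MHz sampling of 50 Hz mains
PERIODS = 100                # 2 s of measurement
REPEATS = 50                 # measurements per coefficient
COMPONENTS = 100             # 50 sine + 50 cosine components

ops_per_component = OPS_PER_SAMPLE * SAMPLES_PER_PERIOD * PERIODS * REPEATS
total_ops = COMPONENTS * ops_per_component

threads_seconds = (2 * 60 + 15) * 60   # 2 hours 15 minutes
pool_seconds = 6 * 60                  # 6 minutes
speedup = threads_seconds / pool_seconds
pool_throughput = total_ops / pool_seconds   # operations per second

print(ops_per_component, total_ops, speedup, round(pool_throughput))
```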

VII. CONCLUSIONS
Simulation of electronic measurement systems is generally a demanding computing task. It can be solved with state-of-the-art simulation tools that offer libraries of predefined models and objects. Such tools shorten the time needed for system modelling, but they keep designers away from the physical implementation and, as a drawback, in many cases extend the execution time.
The opposite approach, analysed in this paper, assumes the creation of a local library of models and blocks used for the system set-up, which can be precisely tuned to accomplish the required measurement simulations in a minimal time frame. Such an approach requires significant time for the initial generation of the necessary models and libraries and for the adjustment of the multithreading environment. The advantage of the presented solution is the minimization of CPU resources, in the first place the execution time.
The Digital Stochastic Measurement technique generally trades oversampling and a large number of arithmetic operations for final accuracy. In applications such as orthogonal transformations, numerical complexity rises and makes simulations long-lasting tasks. The presented results show that the concurrent programming technique offers an optimal computing method that successfully meets complex numerical requirements and shortens execution time.
For the implementation of demanding simulation tasks, such as those on the stochastic implementation of orthogonal transformations, Thread Pool should be considered regular practice.
From a practical engineering point of view, a quite promising research direction is to run simulations on a scalable virtual machine in the cloud, where the CPU can be defined with many more logical processors. This proposal presents the main path for future study.
Applying IEEE Standard 1547-2003 [20], a complete analysis of the signal in a distribution network requires 50 harmonics to be measured, where every harmonic has its cosine and sine component. Further, it means 11

Fig. 3. Block scheme of memory usage by threads in multithreading.

TABLE II. SIMULATION RESULTS FOR SIGNAL S2.

TABLE III. SIMULATION RESULTS FOR SIGNAL S3.

TABLE IV. SIMULATION RESULTS FOR SIGNAL S4.

TABLE V. SIMULATION RESULTS FOR SIGNAL S5.