Acceleration Techniques for Analysis of Microstrip Structures

This paper discusses the acceleration techniques for analysis of microstrip structures. Accurate calculation of parameters of such structures with numerical techniques requires the solution of dense matrix equations involving thousands of unknowns. Solution of this large problem takes long time. In this paper we present three techniques for such computations acceleration: parallel algorithm implemented in computer cluster, sparse bound-matrix technique, and graphic processing unit in conjunction with CUDA technology. The execution time and speed-up of proposed techniques are evaluated through comparing of different numbers of processors and unknowns. The results indicate that all presented techniques can significantly reduce computation time. DOI: http://dx.doi.org/10.5755/j01.eee.20.5.7109


I. INTRODUCTION
Microstrip devices are widely used in modern microwave systems [1]- [5].Microstrip transmission line, coupled lines, as well as multiconductor lines (Fig. 1) are used as basic elements in the design of such devices as filters [1], couplers [2], antennas [3], delay lines [4], [5], etc.Despite the fact that the microstrip lines have been known and used for more than 50 years, it is necessary to pay much attention to their analysis when new microstrip devices are designed.Most accurately microstrip structures may be analysed by numerical techniques such as: finite difference method (FDM) [6], finite elements method (FEM) [7], method of moments (MoM) [8], finite difference time domain (FDTD) method [9], as well as hybrid methods [10] and simulators [11].
The main drawback of numerical methods is their significant demand of computer resources, one of the main of which is the computation time, which in some cases can reach tens of hours [12].Achievements of computer technology allow different ways to speed up calculation of electromagnetic problems.For example Cui et al. in [13] and Jobava et al. in [14] applying MoM to PC clusters for calculation correspondently of scattering by large 3D objects and currents distribution.Ergul and Gurel in [15] also use computer cluster to solve scattering problems.Angeli et al. in [16] demonstrated the implementation of FDM on 64 processors cluster.Yu et al. in [17] and Geterud et al. in Manuscript received September 30, 2013; accepted November 18, 2013.[12] realized FDTD method on computer clusters.There are also examples of the use of graphic processing unit (GPU) instead of a CPU to solve electromagnetic problems: Potratz et al. in [18] use FEM in conjunction with GPU to calculate scattering parameters of waveguide structures, and Livesey et al. in [19] apply GPU and CUDA technology to accelerate FDTD calculations.Motuk et al. in [20] presented implementing of FDM on a multiprocessor architecture on a FPGA device.Overview of open publications [12]- [20] reveals that computer hardware devices and other computations accelerating techniques are not used to analyse microstrip structures and we will try to do it in this paper.
The paper is organized as follows.In Section II, the parallel calculation of general parameters of microstrip structures using a computer cluster is given.Sparse bandmatrices technique to accelerate calculation of microstrip structures parameters briefly described in Section III.The general principle of organizing calculations using GPU and CUDA technology and its application to the analysis of microstrip structures is presented in Section IV. Conclusions are discussed in Section V.In our previous work [6], we proposed a parallel algorithm for the analysis of coupled microstrip structures, i.e. calculation of dependency of electrical parameters of these structures on their design parameters.
Microstrip structures main electrical parameters -the effective permittivity r eff i c, and characteristic impedanceZ0 i c, for cand -normal waves can be found from corresponding capacities per-unit-length: where c0 is speed of light in vacuum; Ci c, is the i-th microstrip capacity per-unit-length correspondently for c-or is the capacity of the same microstrip when the substrate dielectric constant is changed to r = 1.
According to ( 1) and ( 2) electrical parameters of coupled or the multiconductor microstrip lines for cand -normal waves in the line are calculated two times: first time with dielectric substrate and second, when substrate substituted with air (r = 1).
From that follows that the analysis of coupled microstrip structures can be arranged in a parallel fashion, combining 5 computers in the cluster (Fig. 2).Cyclic calculations in the case of analysis of microstrip structures are necessary to perform when the influence of design parameters of these structures on their electrical parameters is investigated.In this case the master-node (Fig. 2) sends range of possible variations of design parameters and variation steps to slave-nodes.Slave-nodes, operating in a given cycle, calculate capacitances per-unitlength: Slave-node "c-substrate" -Ci c capacitance; Slavenode "-substrate" -Ci ; Slave-node "c-air" - (a)   c i C , and Slave-node "-air" - (a)   π i C capacitance.After slave-nodes finish their calculations, they send the results to the masternode.The masternode sorts the received data and calculates the effective permittivity and impedance according to (1) and ( 2).
It should be noted, that any numerical method and quasi-TEM approach could be used as an analysis method in the proposed parallel algorithm.We have implemented the algorithm in [6] using FDM and solving the problem by iterative technique.Exploring the performance of the proposed parallel analysis algorithm we calculated in [6] the electrical parameters of a multiconductor microstrip line.The analysis area of 100 × 500 unknowns was used.It was found that the parallel algorithm execution time on the 5 computers cluster 3.4 times less than execution time on a single computer.This means that the increase in computing performance exceeds 250 percent.

III. BOUND SPARSE MATRIXES
Solution of partial derivative equations in finite difference method leads to large systems of algebraic equations and computation of these equations is done in two main techniques:  Iterative;  Coupled matrices.
Calculation time using iterative technique depends on the desired accuracy and problem area and can be very long.For calculation speed up a coupled matrices technique can be successfully used.According to this technique finite difference solution is found by composing and solving linear equation system.
According to finite difference method problem area is divided into square nodes mesh.Value of each node depends on mean of other nodes closest to it where φ -electric field potential; i, j -indexes indicating the position of potential in 2D.So remote nodes have not influence on the calculated potential φ(i,j).Therefore analysis of all nodes in the problem area can be found in resolving the equation where [A] is coefficient matrix with many zero elements; [X] is a vector consisting of unknown node values; [B] is a vector consisting of known node values.Unknown nodes vector [X] can be calculated using for example this equation where [A] -1 is the inverse matrix of coefficients.By solving (5), unknown potential vector can be obtained and recomposed to potential distribution matrix -the problem area.Potential distribution can be farther analysed to find the device electrical parameters e.g.: electric charge density, capacity per-unit-length and so on.
Since calculated potential depends only on the neighbouring potentials -coefficient matrix [A] consist mostly of zero elements, which takes significant amount of memory -each "double precision" type value occupies 8 bytes.It is possible to reduce the memory space occupied by the coefficient matrix and to speed up the calculations using sparse matrices.In sparse matrices only non-zero elements are stored in memory.Unknown nodes vector [X] can also be found through various elimination methods (Gaussian, Gaussian and Jordan et al.).
In order to evaluate the speedup of the FDM calculations the coupled microstrip lines will be analysed.Their constructive parameters are as follows: substrate dielectric constant r = 6.0, normalized microstrips width W1/h = W2/h = 0.5, normalized space between microstrips S/h = 0.5.
Electrical parameters and potential distribution calculation speeds by different techniques are represented in Fig 3 .Two electrical parameters where calculated in the process: characteristic impedance Z0 and effective permittivity εeff (Table I).The investigation area chosen square and one side varied from 52 to 122.

IV. GPGPU & CUDA TECHNOLOGY
Analysing of the microstrip devices can be done using a general-purpose computing on graphics processing units (GPGPU).These processors can have up to 512 and even more processor cores (so-called general-purpose streaming multiprocessors) it means they have 100 times more cores than usual general-purpose CPU has.Their advantage is also that it is not additional specialized computing device.All the GPGPU are embedded on all desktop computers and laptops manufactured from a couple of years ago.Also they are extremely fast and efficient to perform operations with real numbers and with a high degree of data parallelism.In this way, computing performance increases many times comparing with a general-purpose CPU.It is becoming increasing prevalent to develop and investigate techniques to allow using of these computing capabilities.
There are two competing GPGPU programming platforms.A patented CUDA technology developed by NVIDIA Company [21], which integrating technology only in company produced GPGPU's.However, with the using of NVIDIA manufactured GPGPU video card the developing programs in CUDA is free.It should be also noted that CUDA technology appeared a little earlier than the second -OpenCL technology [22], so, at this time, it is more developed and designed scientific and engineering solutions specifically for CUDA technology.On the other hand, it is becoming now more pervasive technology -OpenCL.
OpenCL programming technology was created a bit later, who not only supports GPGPU's, but also the general propose CPU and special accelerators, they are used in mobile phones and embedded systems.This technology is completely free, so it can be integrated into any microprocessor or accelerator by any company and any scientist who wishes to build applications.The main problem of this technology, where is no many created or modified mathematical functions library for OpenCL technology yet.Therefore, in order to perform vector and matrix operations, it is needed to self-create the desired function, or settle for a lower calculation speed compared with CUDA technology.
By solving electromagnetic problems the iterative calculations are applied mostly because iterations could reduce the space occupied by variables in main memory.But iterative calculations limits the accuracy of the results, because of a given accuracy level for the iterative calculation.On the other side, the direct linear solvers allow to instantly get the correct result.Downside for direct linear solvers is that usage amount of main memory is significantly greater, what was recently simply impossible.Also iterative calculations are more complicated to split into smaller tasks in order to distribute them to parallel computing systems, than solving a system of linear equations using the direct methods.Undoubtedly solving linear equations also apply iteration calculations, but in this case the system of linear equations with special methods is decomposed into blocks those facilitate the distinction between linear calculations of parallel systems.
In order to evaluate the speedup of the FDM calculations the coupled microstrip lines will be analysed.Their design parameters are the same as described in Section III.
To solve linear equations system two libraries CULA and ViennaCL [23] will be used.Gaussian elimination technique will be used for execution time comparison.These libraries designed to solve linear equations system using dense matrices and sparse matrices, but for sparse matrices created coefficient matrix must be converted into matrix storage format.CULA library is optimized and works only with CUDA technology, ViennaCL can operate with OpenCL technology also.
Curves in Fig. 4 show implemented algorithms execution time with different number of unknowns to find (problem area).It is seen that, comparing execution time of CULA library curve and authors implemented Gaussian elimination technique curve, when 2500 unknowns were found, differ 120 times, and when 14400 unknowns were calculatedthese curves differ, more than 1000 times.Comparing curves corresponded to ViennaCL library and Gaussian elimination technique it is evidence that execution time in both case practically not differs for low number of unknowns -6400 and at 14400 number of unknowns differ only 1.24 times.Such negligible difference between calculations using ViennaCL library and Gaussian elimination technique can be explained by the fact that the larger set of features and hardware support in ViennaCL library case typically come at the cost of lower performance comparing with CUDA based implementations.This is also partly due to the fact that CUDA is tailored to the architecture of NVIDIA products, while OpenCL represents in some sense a reasonable compromise between different many-core architectures.Also one of the reasons is the different focus of ViennaCL -solvers for sparse instead of dense linear algebra.
Calculated electrical parameters of coupled microstrip lines analysed by GPGPU & CUDA technology are presented in Table II.

V. CONCLUSIONS
Accurate calculation of parameters of microstrip structures with numerical techniques requires the solution of dense matrix equations involving thousands of unknowns.Solution of this large problem takes long time.We present three techniques for such computations acceleration: parallel algorithm implemented in computer cluster, sparse boundmatrix technique, and graphic processing unit (GPU) in conjunction with CUDA technology.The execution time and speed-up of proposed techniques are evaluated through comparing of different numbers of processors and unknowns.The results indicate that all presented techniques can significantly reduce computation time: reduction of the parallel algorithm execution time is inversely proportional to the number of computers in the cluster, sparse bound-matrix is capable of hundreds of times to reduce the computation time compared with the iterative technique, GPUs reduce computation time in thousands of times compared with conventional mathematical techniques.

Fig. 2 .
Fig. 2. Organization of a computer cluster for the analysis of coupled microstrip structures.

Fig. 3 .
Fig. 3. Execution times of the implemented algorithms, where A algorithm using coupled sparse matrices technique, B -iterative algorithm and C -algorithm using coupled dense matrices technique.

Figure 3
Figure 3 execution times of different implemented algorithms with different number of unknowns.Comparison was done with implemented code in Matlab and as A curve show sparse matrix implementation vastly reduces calculation time.

TABLE II .
ELECTRICAL PARAMETERS OF COUPLED MICROSTRIP LINES* CALCULATED BY GPGPU & CUDA TECHNOLOGY.Note: Design parameters are the same as in TableI.