Energy Consumption Estimation for Embedded Applications

Abstract-Energy consumption is one of the essential properties of embedded applications, especially for devices whose autonomy depends on battery life. The lack of an accurate and suitable methodology for estimating the energy consumption of embedded applications on ultra-low power heterogeneous multicore DSP platforms inspired the solution presented in this paper. The solution has been developed as a plugin for the Eclipse-based MIDE (Multicore Integrated Development Environment), in order to facilitate production of energy-efficient firmware. Energy consumption is evaluated using instruction-level power analysis, a virtual platform, debug information, and diverse input loads. The primary goal was to obtain a precise model of energy consumption that establishes a direct link between program code and the amount of energy required for its execution while processing different input loads. The estimation has been validated against empirical data measured on a real DSP platform. The results show an estimation accuracy of 99.97 % and 99.96 % on the two benchmarks used.

Index Terms-Embedded software; energy consumption; performance evaluation; software metrics.

I. INTRODUCTION
Energy consumption has always been one of the most important aspects of engineering, since it strongly influences the design process. Accurate estimation and evaluation of energy consumption can therefore facilitate the development and production of energy-efficient solutions. This paper focuses on power analysis and on calculating the energy cost of software running on embedded DSP platforms. The ongoing expansion of embedded devices makes research in this area necessary.
Basically, there are two different approaches to energy consumption estimation that can be applied to embedded devices [1]: physical measurements on real hardware, and simulation-based modelling. It was established in [2], [3] that the first approach, which includes measurements of the current drawn by the processor, gives the most accurate information about the power cost. This, together with the lack of a hardware simulation model (such as SPICE or similar) for the target DSP platform, encouraged us to choose the first strategy and to conduct the measurements ourselves.
There are five distinct levels of power management that can be applied to computer systems [4]: application level, compiler level, operating system level, architecture level, and circuit level. The case study presented in this paper deals with power management mostly at the application level [5]. The main goal was to provide an efficient and accurate tool for power analysis that guides a firmware developer through the application development process in order to enhance software energy efficiency. The solution presented in this paper significantly extends the simple model presented in [6]. This paper proposes a new instruction-level, cycle-accurate energy consumption estimation model that was applied and tested on a multicore, ultra-low power, heterogeneous DSP platform. Besides instruction energy costs, the model also takes into account energy costs related to DSP platform peripherals. The estimation model is universal and applicable to any DSP platform; only target-specific measurements need to be performed, using the methodology described in this paper.
The entire solution has been developed as a plugin for the Eclipse RCP [7] based Multicore IDE presented in [6]; therefore, it is easily transferable to any other RCP-based IDE, which contributes to the universality of the solution.
The remaining sections are organized as follows. Section II reviews related work. Section III presents the mathematical model of energy consumption estimation. Section IV provides insight into the proposed measurement methodology. Section V contains a detailed description of the experiments conducted as part of the validation and verification process, together with their results. Section VI concludes the paper.

II. RELATED WORK
As diverse as embedded platforms and their application domains are, there is a wide variety of solutions dealing with energy consumption estimation, but an absence of universal solutions that can be applied to the entire set. The model proposed in this paper aims to contribute in that direction.
The models proposed in [1], [8] require hardware simulation models that are, in most cases, unavailable (as for the target platform presented in this paper) and less accurate than physically measured data.
The methodologies presented in [2], [9], [10] take into account only the energy consumed by the processor while executing instructions, as well as inter-instruction effects, but omit any discussion of other energy-dependent modules, such as peripherals.
In [3], the focus was on a power estimator that could be used for making architectural choices in the design process, so the intent was to achieve power management at the architectural level, unlike the solution described in this paper, which provides a framework for power analysis at the application level.
Experimental results obtained in [11]-[13] have shown that the current consumption across the entire instruction set, for the selected target platforms, is quite uniform, so the energy consumption models were adjusted according to that observation. However, instruction power profiling performed on the DSP platform presented in this paper has shown that current consumption varies from 120 µA for the NOP instruction up to 387.5 µA for the LOAD X[addr], X0 instruction, which led us to the conclusion that the models presented in [11]-[13] are not applicable to this platform.
The estimation model presented in [14] excluded detailed inter-instruction cost estimation and instead used the Hamming distance and weight of the instructions, since the results showed that the inter-instruction energy cost measured on that target platform is around 5 %. We have used a different approach because, according to our measurements, the inter-instruction effect has a much higher influence on consumption for this target platform, in some cases over 40 % of the instruction's base energy cost [2].
None of the abovementioned models includes energy consumption estimation for a multicore system, whereas the estimation model presented here does. The multicore energy consumption estimation model provided in [15] calculates consumption based on the cores' frequency and utilization, unlike the solution presented here, which evaluates the average power of each core based on the instructions executed in the current cycle and in the preceding one. It is also worth noting that this estimation model considers a heterogeneous multicore platform, unlike the model presented in [15].
Since this paper is an enhanced and extended version of the solution presented in [6], many significant improvements have been made to the existing platform. In the previous version of the energy consumption estimation, only core activity and average DSP consumption per core were used as parameters in the power analysis, whereas the enhanced version presented in this paper calculates power at the instruction level, achieving finer granularity and significantly higher accuracy.

III. ENERGY CONSUMPTION ESTIMATION MODEL
The embedded platform for which this model has been derived was developed to enhance performance in a hearing aid. Nevertheless, the model is universal and applicable to any embedded platform. The DSP platform used for this research contains five heterogeneous cores: two DSPs used mostly for numerical acceleration and three general-purpose DSP cores. One general-purpose DSP is a microcontroller that synchronizes and controls the whole system. All cores were developed with emphasis on ultra-low power design. Bearing in mind that autonomy is one of the most important characteristics of hearing aids, and of all other battery-dependent embedded devices, any energy savings will enhance product competitiveness. Therefore, besides the energy savings achieved in hardware design, the savings that can be accomplished in software should also be considered.
Besides DSPs, the target platform also contains several other peripherals, such as: analog, system, input/output, local processor unit, utility, and wireless. In order to obtain an accurate energy consumption model, besides the instruction-level energy profiling, it is necessary to empirically measure the average energy consumption of all listed peripherals and to incorporate those values into the estimation model, unlike the estimation models presented in [2], [9], [10], [14].
The main question that inspired the research was whether it is possible to profile software energy efficiency for various input signals, and to answer how much time will pass until the hearing aid battery is discharged when processing a specific input signal. Figure 1 depicts software energy consumption profiling at various input loads. The abscissa represents discrete time with a resolution of one cycle, together with the instructions executed within each cycle. The execution flow, as well as core and peripheral activity, has been obtained using the virtual platform, the profiler tool [16], and debug information.
The ordinate shows average power consumption, which is calculated from empirical data. With this kind of representation, critical points (power consumption peaks) are easy to spot, as are the parts of the source code that consume the greatest amount of energy.
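For example, the average power consumption measured during FFT processing was Ps = 3.4 mW, at a DSP platform voltage Up = 1.25 V and a battery capacity Kb = 310 mAh. The battery lifetime Tb can then be evaluated with the following equations:
$$I_s = \frac{P_s}{U_p} = \frac{3.4\ \mathrm{mW}}{1.25\ \mathrm{V}} = 2.72\ \mathrm{mA}, \tag{1}$$
$$T_b = \frac{K_b}{I_s} = \frac{K_b\,U_p}{P_s} = \frac{310\ \mathrm{mAh}}{2.72\ \mathrm{mA}} \approx 113.97\ \mathrm{h}. \tag{2}$$
This implies that the battery will run out of charge after 113.97 hours of continuous FFT processing.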
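The battery lifetime arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name and argument names are illustrative and do not come from the MIDE plugin, and the calculation assumes an ideal battery whose full rated capacity is available at a constant average current.

```python
def battery_life_hours(avg_power_mw: float, voltage_v: float, capacity_mah: float) -> float:
    """Estimate battery lifetime in hours (illustrative helper, ideal battery assumed)."""
    avg_current_ma = avg_power_mw / voltage_v  # average current drawn: I_s = P_s / U_p
    return capacity_mah / avg_current_ma       # lifetime: T_b = K_b / I_s


# Values quoted in the text for continuous FFT processing.
fft_life = battery_life_hours(avg_power_mw=3.4, voltage_v=1.25, capacity_mah=310.0)
print(f"{fft_life:.2f} h")  # prints "113.97 h"
```

Because the lifetime is inversely proportional to the average power, any reduction of Ps achieved in software translates directly into a proportionally longer battery life.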
Further on, a mathematical model, which has been developed in order to facilitate this kind of analysis, is presented.
It is obvious from (2) that the only parameter that needs further analysis is the average power consumption Ps. This parameter can be calculated as the arithmetic mean of the average power consumptions of the individual cycles, Pc:
$$P_s = \frac{1}{n}\sum_{k=1}^{n} P_c(k). \tag{3}$$
In (3), n represents the number of executed cycles, and Pc(k) denotes the average power consumption of the k-th cycle. The average power consumption of a cycle remains to be defined:
$$P_c = P_{dc} + \sum_{l=1}^{p} P_f(l) + \sum_{m=1}^{N}\bigl(P_b(m) + P_o(m)\bigr), \tag{4}$$
where Pdc is the power consumption of the static component; it does not depend on the contents of the program memory, but represents the overall power consumption of the DSP platform while all cores and peripherals are in the idle state. Pf(l) denotes the average power consumption of the l-th peripheral. Pb(m) stands for the base power cost [2], [17] of the instruction executed on the m-th core. Po(m) denotes the power consumption overhead of the instruction executed on the m-th core, which is generated by the inter-instruction effect [2], [14], [18] caused by a circuit state overhead [19]. The inter-instruction effect appears only when two different instructions are executed in succession; therefore, Po(m) enters the calculation only when the instruction from the previous cycle differs from the current one. Stalls and cache misses, which were included in the energy consumption model presented in [19], are omitted from this model, since the DSP platform presented here has neither a cache nor stalls by hardware design. Parameter p in the first sum denotes the number of active peripherals, and N in the second sum represents the number of active cores. It should be noted that the power consumption of the cores is taken as the sum of the power consumptions of the individual cores; the effect presented in [20], power consumption overhead due to resources shared between cores, was not included in this model, since the hardware design does not support such a mechanism. By combining (3) and (4), the following derivation is performed:
$$P_s = \frac{1}{n}\sum_{k=1}^{n}\Bigl[P_{dc} + \sum_{l=1}^{p_k} P_f(l) + \sum_{m=1}^{N_k}\bigl(P_b(m) + P_o(m)\bigr)\Bigr], \tag{5}$$
$$P_s = P_{dc} + \frac{1}{n}\sum_{k=1}^{n}\Bigl[\sum_{l=1}^{p_k} P_f(l) + \sum_{m=1}^{N_k}\bigl(P_b(m) + P_o(m)\bigr)\Bigr], \tag{6}$$
where pk and Nk denote the number of active peripherals and the number of active cores, respectively, at the k-th cycle.
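The structure of the model in (6) can be illustrated with a short Python sketch that averages per-cycle power over an execution trace. The `Cycle` record, the function name and the lookup tables are hypothetical and are not taken from the MIDE plugin. The static power (895 µW) is the value measured in Section IV, and the NOP base cost is derived from the 120 µA quoted in Section II under the assumption that this figure is a base current; all other numbers are invented purely for illustration. The sketch also assumes that a core which was idle in the previous cycle incurs no overhead.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cycle:
    """Activity in one clock cycle (hypothetical trace record)."""
    active_peripherals: List[str] = field(default_factory=list)
    # Maps each active core to the instruction it executes in this cycle.
    core_instructions: Dict[str, str] = field(default_factory=dict)


def average_power(trace: List[Cycle], p_dc: float, p_f: Dict[str, float],
                  p_b: Dict[str, float], p_o: Dict[str, float]) -> float:
    """Average power P_s over a trace, following the structure of (6)."""
    if not trace:
        raise ValueError("trace must contain at least one cycle")
    total = 0.0
    previous: Dict[str, str] = {}  # instruction each core ran in the previous cycle
    for cycle in trace:
        p_cycle = p_dc  # static component is always present
        p_cycle += sum(p_f[name] for name in cycle.active_peripherals)
        for core, instr in cycle.core_instructions.items():
            p_cycle += p_b[instr]
            # Overhead applies only when the previous instruction on this core differs.
            if core in previous and previous[core] != instr:
                p_cycle += p_o[instr]
        previous = dict(cycle.core_instructions)
        total += p_cycle
    return total / len(trace)


# All powers in microwatts. Only P_DC and the NOP base cost are tied to figures
# in the text; the peripheral, SUB and overhead values are made up.
P_DC = 895.0
P_F = {"io": 20.0}
P_B = {"NOP": 150.0, "SUB": 160.0}
P_O = {"NOP": 0.0, "SUB": 40.0}

trace = [
    Cycle(core_instructions={"core0": "NOP"}),
    Cycle(active_peripherals=["io"], core_instructions={"core0": "SUB"}),
    Cycle(core_instructions={"core0": "SUB"}),
]
p_s = average_power(trace, P_DC, P_F, P_B, P_O)
print(f"{p_s:.1f} uW")  # prints "1071.7 uW"
```

In this toy trace the overhead is charged only in the second cycle, where SUB follows NOP; the third cycle repeats SUB and therefore contributes only its base cost.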

IV. PROPOSED MEASUREMENT METHODOLOGY
From (6), we identified four different aspects of power consumption that should be measured: the DSP platform static component (Pdc), the power consumption of peripherals (Pf), the instruction's base power cost (Pb), and the instruction's overhead power cost (Po). All measurements were performed using a Fluke True-RMS Industrial Logging Multimeter.

A. DSP Platform Static Component
The static component has been obtained by measuring the instantaneous current drawn by the DSP platform while the DSP cores and peripherals are in the idle state:
$$P_{dc} = I_m \cdot U_p = 716\ \mu\mathrm{A} \cdot 1.25\ \mathrm{V} = 0.895\ \mathrm{mW}, \tag{7}$$
where Im denotes the measured current, and Up is the DSP platform voltage.

B. Peripheral's Power Cost
The average power consumption of a peripheral is measured in a similar manner to the static component; the crucial differences are the configuration of the DSP platform and the test image, which must be created for each individual peripheral. The current drawn by the peripheral (If) can then be measured as the difference between the overall current consumption (Im) and the sum of the static component (Idc) and the current drawn by the core (Icore) before the peripheral has received the clock:
$$I_f = I_m - (I_{dc} + I_{core}), \tag{8}$$
$$P_f = I_f \cdot U_p, \tag{9}$$
where Pf denotes the peripheral's average power consumption.
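The two measurements described so far reduce to simple arithmetic on multimeter readings, sketched below in Python. The helper names are illustrative. The static current (716 µA) and the voltage (1.25 V) come from the text, whereas the overall and core currents used in the peripheral example are hypothetical placeholders, not measured values.

```python
U_P = 1.25  # DSP platform voltage in volts, as stated in the text


def power_uw(current_ua: float, voltage_v: float = U_P) -> float:
    """Convert a measured current in microamperes to power in microwatts."""
    return current_ua * voltage_v


def peripheral_current_ua(i_m: float, i_dc: float, i_core: float) -> float:
    """Current attributed to a peripheral: overall current minus static and core currents."""
    return i_m - (i_dc + i_core)


# Static component from the measured idle current.
p_dc = power_uw(716.0)
print(p_dc)  # prints 895.0 (microwatts, i.e. 0.895 mW)

# Hypothetical readings for one peripheral test image (placeholders only).
i_f_example = peripheral_current_ua(i_m=900.0, i_dc=716.0, i_core=150.0)
p_f_example = power_uw(i_f_example)
print(i_f_example, p_f_example)  # prints 34.0 42.5
```

Keeping currents in microamperes and the voltage in volts gives powers directly in microwatts, which matches the units used in the validation section.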

C. Base Power Cost
Instruction base costs have been determined using the well-established methodologies described in [2], [21]. The methodology is intuitive. The base current drawn by an instruction can be measured when the target instruction is executed continuously, which can be achieved by executing the target instruction in an infinite loop. In order to minimize the influence of the jump instruction that closes the loop, it is recommended to place a number of instances of the target instruction in the loop.
Figure 2 represents the source code that has been used for measurement of the base cost of the instruction SUB x1, B0, B0. This method differs from the one presented in [17], where the loop contains only one instance of the target instruction and a number of instances of the reference instruction (NOP). Equation (10) represents the evaluation of the base cost; the difference between the measured current and the static component represents the base current:
$$I_b = I_m - I_{dc}, \qquad P_b = I_b \cdot U_p. \tag{10}$$
The values entering (10) were obtained by running the source code from Fig. 2. Since the processor used in this research is a three-stage pipelined DSP, the base cost obtained in (10) covers all three stages of execution: fetch, decode, and execute. Table I contains the base costs of eleven different instructions, measured using the described methodology. The quantum of energy (Eb) consumed during an instruction execution can be calculated using the following equation:
$$E_b = P_b \cdot N \cdot T_c, \tag{11}$$
where N denotes the number of cycles required for instruction execution, and Tc represents the cycle time (f = 10 MHz). So, it can be noticed that the instruction SUB x1, B0, B0 consumes a minimum of 0.159 nWs every time it is executed.

D. Overhead Power Cost
Overhead cost occurs as a consequence of executing two different adjacent instructions. It is caused by switching activity in the circuit, which is also known as the circuit state effect [1], [2], [18], [19], [21]. There are several approaches to modelling the inter-instruction energy effect, and some of them are explained in detail in [1]. The methodology for measuring instruction overhead costs proposed in [2], [18], [19], [21] requires measurements between all pairs of individual instructions. Since the target DSP considered here has a set of approximately one hundred instructions, a very large number of tests and measurements (12) would need to be conducted to obtain complete information about inter-instruction effects. In (12), n denotes the size of the instruction set, and r is the number of selections. This observation motivated the search for an appropriate approximation that reduces the number of tests required for measuring overhead costs. The main idea was to find one instruction that serves as a reference point for the measurements. The candidate that stood out immediately was the NOP instruction. The following paradigm has been adopted: when the NOP instruction is executed together with the target instruction, the whole inter-instruction cost can be assigned to the target instruction. The derivation (13)-(15), which starts from (6), explains this approximation. In it, Io(inst) denotes the current consumed by the circuit state effect that can be assigned to the target instruction, Is is the overall current measured during the experiment, Idc is the current of the static component, Ib(nop) represents the base cost of the NOP instruction, and Ib(inst) denotes the base cost of the target instruction.
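As an illustration of how such an approximation can follow from (6), the LaTeX sketch below assumes a single active core, idle peripherals, and a measurement loop in which NOP and the target instruction alternate in equal numbers. The loop layout is an assumption, so the factor of one half (and hence the factor of two in the last line) is specific to this sketch and may differ from the authors' own derivation.

```latex
% One active core, peripherals idle: (6) reduces to
P_s = P_{dc} + \frac{1}{n}\sum_{k=1}^{n}\bigl(P_b(k) + P_o(k)\bigr)

% Dividing by U_p, for a loop that alternates NOP and the target instruction
% (assumed layout: half of the cycles execute each instruction):
I_s = I_{dc} + \tfrac{1}{2}\bigl(I_b(\mathrm{nop}) + I_o(\mathrm{nop})
      + I_b(\mathrm{inst}) + I_o(\mathrm{inst})\bigr)

% Adopted paradigm: the whole inter-instruction cost is assigned to the
% target instruction, i.e. I_o(nop) is taken as zero:
I_o(\mathrm{inst}) \approx 2\,(I_s - I_{dc}) - I_b(\mathrm{nop}) - I_b(\mathrm{inst})
```

Under these assumptions a single measurement of Is per target instruction, together with the previously measured base costs, is enough to obtain its overhead current.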
Figure 3 represents the source code used to measure the overhead cost of the instruction SUB x1, B0, B0. The derivation presented in (13) and (14) is based on the facts that only one core was running the source code from Figure 3 and that the peripherals were in the idle state during the experiment. The approximation based on the adopted paradigm is introduced in (15). The described approximation reduces the required number of measurements from O(N^2) to O(N), where N denotes the size of the instruction set, which implies great savings in time and resources.
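The size of this saving can be made concrete with a small Python sketch. It assumes that the exhaustive approach measures every unordered pair of distinct instructions, and that the NOP-referenced approach needs one measurement per instruction other than NOP; both counting rules and the function names are assumptions made for illustration.

```python
from math import comb


def pairwise_measurements(n_instructions: int) -> int:
    """Measurements needed when every unordered pair of instructions is tested."""
    return comb(n_instructions, 2)


def nop_reference_measurements(n_instructions: int) -> int:
    """Measurements needed when each instruction is paired only with NOP."""
    return n_instructions - 1  # NOP itself needs no overhead measurement


n = 100  # approximate instruction set size quoted in the text
print(pairwise_measurements(n))       # prints 4950
print(nop_reference_measurements(n))  # prints 99
```

For an instruction set of about one hundred instructions, the NOP reference therefore cuts the measurement campaign by roughly a factor of fifty under these assumptions.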

V. EXPERIMENTAL RESULTS AND VALIDATION
Validation of the proposed model has been performed using two benchmarks, as in [14]. In Benchmark 1, the current drawn by the processor was measured while executing separate blocks of instructions from Table I, where each block contains a thousand instances of a single instruction type, in order to cancel the inter-instruction effect. The peripherals were in the idle state. Given that, the fact that only one core was running during the experiment, and the value calculated in (7), (6) reduces to
$$P_s = P_{dc} + \frac{1}{n}\sum_{k=1}^{n} P_b(k) = 1127.8\ \mu\mathrm{W}. \tag{20}$$
During the experiment, the measured current was Im = 902 µA, which implies
$$P_m = I_m \cdot U_p = 902\ \mu\mathrm{A} \cdot 1.25\ \mathrm{V} = 1127.5\ \mu\mathrm{W}. \tag{21}$$
From (20) and (21), the accuracy ensues:
$$A = \Bigl(1 - \frac{|P_s - P_m|}{P_m}\Bigr) \cdot 100\,\% = 99.97\,\%. \tag{22}$$
Benchmark 2 has been performed on an interleaved set of instructions from Table III.
In this experiment, as in the first one, the peripherals were in the idle state and only one core was active, but since the instructions were interleaved, the inter-instruction influence could not be ignored, so (6) reduces to
$$P_s = P_{dc} + \frac{1}{n}\sum_{k=1}^{n}\bigl(P_b(k) + P_o(k)\bigr) = 1212.1\ \mu\mathrm{W}. \tag{23}$$
The current measured during the experiment was Im = 970 µA, which implies
$$P_m = I_m \cdot U_p = 970\ \mu\mathrm{A} \cdot 1.25\ \mathrm{V} = 1212.5\ \mu\mathrm{W}. \tag{24}$$
From the formula used in (22) and the results obtained in (23) and (24), the accuracy for the interleaved set of instructions is 99.96 %.
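The validation arithmetic for both benchmarks can be reproduced with the Python sketch below. The accuracy is assumed here to be one minus the relative error with respect to the measured power; this definition and the function names are assumptions made for illustration. The estimated powers (1127.8 µW and 1212.1 µW) are the values reported in the conclusions, and the measured currents are those given above.

```python
U_P = 1.25  # DSP platform voltage in volts


def measured_power_uw(current_ua: float, voltage_v: float = U_P) -> float:
    """Power in microwatts from a measured current in microamperes."""
    return current_ua * voltage_v


def accuracy_percent(estimated_uw: float, measured_uw: float) -> float:
    """Accuracy as 100 % minus the relative error (assumed definition)."""
    return (1.0 - abs(estimated_uw - measured_uw) / measured_uw) * 100.0


# Benchmark 1: separate blocks of identical instructions.
p_m1 = measured_power_uw(902.0)          # 1127.5 uW
acc1 = accuracy_percent(1127.8, p_m1)

# Benchmark 2: interleaved instructions.
p_m2 = measured_power_uw(970.0)          # 1212.5 uW
acc2 = accuracy_percent(1212.1, p_m2)

print(f"Benchmark 1: measured {p_m1} uW, accuracy {acc1:.3f} %")
print(f"Benchmark 2: measured {p_m2} uW, accuracy {acc2:.3f} %")
```

Both results land within a few hundredths of a percent of the reported figures; differences in the last digit can arise because the published estimates are rounded to one decimal place.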
Bearing in mind the accuracy that has been achieved, it seems that selection of the NOP instruction was a good choice at this point of the research.If estimation accuracy decreases when measurements extend to the entire instruction set, then some alternatives should be considered as reference points.These results confirm the proposed estimation model and measurement methodology, but it should be noticed that new benchmarks should be developed that will aim to challenge the proposed model against diverse applications.The results certainly seem promising and they establish a solid ground for further research.

VI. CONCLUSIONS
Estimation of energy consumption may have a significant impact on the final version of a software solution, as it provides an accessible tool for power analysis of the source code. This observation will be discussed further in future research, when the measurements are extended to the entire instruction set. At this point, we are not sure about the overall impact on firmware development, since the research is in its early stages, but assessing it is certainly one of our goals. The connection between energy consumption and the source code has been established at the instruction level, so each cycle is tagged with the amount of energy that it consumes and with the instructions executed within it. This has been achieved using the proposed mathematical model (4), the measurement methodology, and the profiler tool [16]. A graphical representation of the energy consumed by the source code (Fig. 1) may provide a straightforward input for reorganising the source code at the critical points (power consumption peaks) in order to achieve energy savings. Also, instruction power profiles provide insight into energy costs, which may influence instruction selection during application development, since the proposed measurement methodology and estimation model provide a high level of accuracy (over 99 %).
In Benchmark 1, measurements were conducted while executing separate blocks of instructions, and the measured power consumption during the experiment was 1127.5 µW. The power consumption estimated using Table I and the mathematical model (20) amounts to 1127.8 µW. That implies that the estimation accuracy in Benchmark 1 was 99.97 % (22). In Benchmark 2, measurements were performed during execution of an interleaved set of instructions, and this setup was measured to consume 1212.5 µW. Using the estimation model (23) and Table III, the estimated power consumption was calculated to be 1212.1 µW. These numbers lead us to the conclusion that the estimation accuracy achieved in Benchmark 2 was 99.96 %. It should be emphasized that the achieved accuracy was calculated for 10 % of the entire instruction set. Future work could focus on extending the measurements to the entire instruction set. Afterwards, the estimation should be challenged against various applications in order to obtain more reliable data about its accuracy. Also, the application-level optimization presented here could be extended to compiler-level power management. The obtained empirical data could be used as an input during the compiler's instruction selection and scheduling, similarly to what was proposed in [21], [22].
The proposed model, as well as the measurement methodology, are rather general and it seems that they could be applied to other embedded processors.

Fig. 1. Trend of energy consumption at various input loads.

Fig. 2. The source code for measurement of the base cost of instruction SUB x1, B0, B0.

Fig. 4. Trend of current with and without overhead cost.

TABLE II. OVERHEAD COSTS OF INSTRUCTIONS.

Figure 4 depicts the trend of current consumption with and without overhead cost. The trend is obtained from Table I (lower line), base cost, and Table II (upper line), overhead cost.

TABLE III. BASE COSTS + OVERHEADS OF INSTRUCTIONS.