Bi-Level Video Codec for Machine Vision Embedded Applications

Wireless Visual Sensor Networks (WVSN) are feasible today due to advances in many fields of electronics such as Complementary Metal Oxide Semiconductor (CMOS) cameras, low power electronics, distributed computing and radio transceivers. The energy budget in WVSN is limited due to the small form factor of the Visual Sensor Nodes (VSNs) and the wireless nature of the application. The images captured by a VSN contain a large amount of data, which leads to high communication energy consumption. Hence there is a need for efficient algorithms which are computationally less complex and provide a high compression ratio. Change coding and Region of Interest (ROI) coding are two options for data reduction at the VSN. However, for a higher number of objects in the images, the compression efficiency of both change coding and ROI coding becomes worse than that of image coding. This paper explores the compression efficiency of the Bi-Level Video Codec (BVC) for several representative machine vision applications. We propose to implement image coding, change coding and ROI coding at the VSN and to select the smallest bit stream among the three. Results show that the compression performance of the BVC for such applications is always at least as good as that of change coding and ROI coding. DOI: http://dx.doi.org/10.5755/j01.eee.19.8.5401


I. INTRODUCTION
The Wireless Visual Sensor Network (WVSN) is composed of many Visual Sensor Nodes (VSNs). Each VSN consists of an image sensor for capturing images of the field of interest, a microcontroller or Field Programmable Gate Array (FPGA) for image processing, a radio transceiver for transmitting the results and an energy resource for providing power to all the other components. The VSN is capable of performing complex image processing algorithms using limited energy resources due to technological advances in many fields of electronics such as Complementary Metal Oxide Semiconductor (CMOS) image sensors, wireless transceivers, low power computing platforms/embedded systems and distributed computing.
Many authors in the literature, e.g. [1]-[3], have focused on different implementation strategies for performing vision processing tasks either locally at the VSN or at the server. Some authors considered capturing the image, compressing it and sending it to the server for further processing, e.g. [1]. In this case the transmission energy consumption is higher because the compressed raw images are transmitted directly to the server.
Manuscript received January 27, 2013; accepted May 23, 2013. This research is funded by Higher Education Commission (HEC) Pakistan and Mid Sweden University, Sweden.
On the other hand, other researchers have proposed performing all the vision processing tasks at the VSN and transmitting only the final object dimensions to the server. In this case, the transmission energy consumption is low but the processing energy consumption is high because the VSN performs operations for a longer duration.
An example of performing all image processing tasks locally at the VSN is SensEye [2], a multi-tier network of heterogeneous VSNs that aims at low power, low latency detection and wakeup. Its authors implemented a surveillance application using SensEye and performed advanced image processing operations such as object detection, recognition and tracking.
Another such example is presented in [3], where the authors implemented a distributed vision processing system for human pose interpretation on a wireless smart camera network. They extracted critical joints of the subject in the scene in real time by performing local processing at the smart camera. The results achieved by the various smart cameras are then transmitted to a server through the wireless channel for the reconstruction of the human pose.
Local processing and wireless communication both consume a large part of the total energy budget of the VSN. Communicating the results from the VSN without local processing reduces the processing time, but the consequence is higher transmission energy consumption due to communicating the large volume of raw data. In contrast, executing all the image processing tasks locally at the VSN and transmitting only the final results reduces the transmission energy consumption, but its drawback is higher processing energy consumption because of the increased processing time at the VSN.
The studies on Intelligence Partitioning (IP) between the VSN and the server in [4] and [5] have concluded that selecting a suitable IP strategy has a positive impact on the total energy consumption of the VSN. It is concluded in [6] that compressing the binary image after pre-processing and segmentation is a good strategy for achieving a general architecture for the measurement applications of WVSN.
Figure 1 shows the generalized architecture from [6]. Based on this generalized architecture, we explored the compression efficiency of six bi-level image coding methods and presented the results in [7]. The conclusion drawn in [7] was that JBIG2 [8], Group 4 [9] and Gzip_pack [10] offer better compression performance. Bi-level image compression standards are helpful in reducing the amount of information in the segmented images. However, the compression efficiency of these standards is limited by the entropy bound of the compression algorithm used. So, there is a need to explore other efficient compression techniques which can provide further data reduction.
In some machine vision applications, adjacent frames of the video are nearly identical. In other words, the differences between neighbouring frames of the video in these applications are very few. We explored the possibility of further data reduction in [11] based on change coding, where we concluded that change coding is better than image coding (image coding is investigated in [7]) for up to 95% changes in terms of the number of objects in the neighbouring frames. We explored the compression efficiency of Region of Interest (ROI) coding for objects of various geometrical shapes in the black images and presented the results in [12]. The results of ROI coding in [12] showed some improvement in compression performance compared to the results of change coding in [11].
One problem with both ROI coding and change coding is the degradation of their compression performance when there are too many objects in the frames. In [13], we proposed a generalized method which provides better compression efficiency than both ROI coding and change coding, but we analysed its performance for statistical images only. In the current work, our aim is to explore and provide a detailed explanation of the compression efficiency of the Bi-Level Video Codec (BVC) for three categories of real life machine vision embedded applications.
We also aim to show that BVC is applicable to all kinds of applications, i.e. that it provides better performance for applications with either too few or too many objects in the frames. The compression performance of BVC never becomes worse than that of image coding for frames with too many objects, which was the case for ROI coding in [12] and change coding in [11].
The remainder of the paper is organized as follows. Section II provides the related work, while Section III presents an overview of the BVC. The evaluated applications are presented in Section IV. The results are presented in Section V. Finally, Section VI concludes the paper.

II. RELATED WORK
Many methods have been proposed for the compression of scanned textual and medical images [14]-[16]. In machine vision applications, the images are quite different from scanned textual and medical images. The compression efficiency of the well-known compression methods is investigated in [7] for machine vision applications. It is concluded in [7] that JBIG2, CCITT Group 4 and Gzip_pack provide better compression efficiency than other bi-level image compression methods.
We investigated the possibility of further data reduction based on change coding and ROI coding in [11] and [12] respectively. We concluded in [11] that change coding provides better compression efficiency than image coding. We concluded in [12] that the compression efficiency of ROI coding is better than that of change coding. However, both change coding and ROI coding provide better compression efficiency only for applications with few changes in adjacent frames. A generalized method which is effective for all applications is needed. We think that BVC is such a method and will provide better compression efficiency for all applications, i.e. applications with too few or too many objects in the change frame.

III. BI-LEVEL VIDEO COMPRESSION
A number of lossless bi-level image compression algorithms exist, among them the Ziv-Lempel algorithms [15], efficient partitioning into rectangular regions [16], arithmetic coding [17], CCITT Group 4 [9] and JBIG2 [8]. Any of these compression schemes can be used for the analysis of BVC. In this paper we have selected the well-known CCITT Group 4 because of its relatively high compression efficiency and low computational complexity, based on the analysis in [7].
The architecture of the BVC is shown in Fig. 2. Every new frame is saved in memory (overwriting the previous one) in the form of alternating black and white runs. We apply run length encoding to the stored frame for efficiency. The change frame is determined by performing the exclusive-OR operation on the segmented pixels of the current frame from the camera and the pixels of the previous frame from memory. The change frame is given as input to the ROI block, which finds the ROIs in it.
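The storage and change-detection steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names, the convention that every row starts with a (possibly empty) black run, and the use of one bounding box around all changed pixels in place of the per-object ROI block are assumptions made only for illustration.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # rows of 0 (black) / 1 (white) pixels


def rle_encode_row(row: List[int]) -> List[int]:
    """Encode one row as alternating run lengths, starting with a black (0) run.

    Assumed convention: a row that begins with a white pixel gets a leading
    black run of length 0.
    """
    runs: List[int] = []
    current, length = 0, 0
    for pixel in row:
        if pixel == current:
            length += 1
        else:
            runs.append(length)
            current, length = pixel, 1
    runs.append(length)
    return runs


def rle_decode_row(runs: List[int]) -> List[int]:
    """Rebuild the pixel row from alternating black/white run lengths."""
    row: List[int] = []
    value = 0
    for length in runs:
        row.extend([value] * length)
        value ^= 1
    return row


def change_frame(current: Frame, previous: Frame) -> Frame:
    """Pixel-wise exclusive-OR of the current and the previous frame."""
    return [[a ^ b for a, b in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(current, previous)]


def bounding_box(frame: Frame) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) enclosing all set pixels, or None.

    Simplification: a single box stands in for the per-object ROIs.
    """
    rows = [r for r, row in enumerate(frame) if any(row)]
    if not rows:
        return None
    cols = [c for row in frame for c, pixel in enumerate(row) if pixel]
    return rows[0], min(cols), rows[-1], max(cols)


previous = [[0, 0, 0], [0, 1, 0]]
current = [[0, 1, 0], [0, 1, 1]]
stored = [rle_encode_row(row) for row in previous]   # frame kept in memory
restored = [rle_decode_row(runs) for runs in stored]
changes = change_frame(current, restored)
print(changes, bounding_box(changes))
```

Because the exclusive-OR is its own inverse, a receiver holding the previous frame can recover the current frame by applying the same operation to the received change frame.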
Group 4 is used to compress the original frame, the change frame and the detected ROIs. The smallest of the three compressed bit streams is selected as the output of the BVC.
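The selection step can be sketched as follows. This is a minimal illustration under stated assumptions: the three candidates are taken to be already-compressed byte strings (the Group 4 encoder itself is not shown), and the function name `select_smallest`, the mode labels, and the tie-breaking rule are hypothetical choices, not part of the original design.

```python
from typing import Dict, Tuple


def select_smallest(candidates: Dict[str, bytes]) -> Tuple[str, bytes]:
    """Return (mode, bit stream) for the shortest candidate.

    Ties go to the mode listed first, since dicts keep insertion order
    and min() returns the first minimal element.
    """
    mode = min(candidates, key=lambda name: len(candidates[name]))
    return mode, candidates[mode]


# Placeholder byte strings standing in for the three compressed outputs.
candidates = {
    "image": bytes(40),
    "change": bytes(12),
    "roi": bytes(25),
}
mode, stream = select_smallest(candidates)
print(mode, len(stream))
```

Since the output is the minimum of the three candidate sizes, it can never be larger than any single one of them. In practice the receiver would also need to know which mode was chosen, for example through a small mode tag; that detail is an assumption and is not covered here.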

IV. EVALUATED APPLICATIONS
We have selected three categories of applications for evaluating the performance of our proposed method. These categories differ in the number of transitions from black to white pixels, and vice versa, in the frames. The first category represents applications with too many objects in the frames, such as Meter Reading applications. The second category represents applications with too few objects in the frames, e.g. LED light detection (for tracking/localization of robots/objects) and magnetic particle detection in hydraulic systems. The third category is the one where there are too many transitions from black to white pixels and white to black pixels in the change frames due to illumination noise or movements of the objects, such as in Human Detection applications.

A. Applications with too many objects
The authors in [1] designed and implemented a VSN for a Meter Reading application, where they considered transmitting the compressed raw images. In their implementation, they evaluated various image compression standards. In contrast, we propose to detect the ROIs, i.e. the digits in the images, and then transmit those ROIs along with the run length codes to the server. Compared to the results in [1], we achieved much better compression performance, in addition to much reduced processing complexity. Interested readers are referred to [1] and [18] for details of the Meter Reading application.

B. Applications with too few objects
LED light detection has been used for object localization. In both magnetic particle detection and LED light detection, the frames usually contain very few objects. We applied BVC to these applications to analyse its performance for applications with too few objects in the frames. Interested readers are referred to [18] and [19] for details of node localization using LED light detection.

C. Applications with too many transitions in the pixels
We analysed an application which has too many transitions from black to white and white to black pixels in the images. The selected application is Human Detection, where illumination noise and the movement of the object cause too many transitions in the change frames.

V. EXPERIMENTAL RESULTS
This section presents the results of applying our proposed method to the three categories of real life machine vision applications. The Libtiff library in [17] was compiled into an executable for performing the Group 4 compression scheme. The change frames and the ROIs in the change frames were determined using Matlab. Group 4 was used to compress the original frames, the change frames and the detected ROIs. The results are shown in tables, and the notable points are discussed below.
The compressed file sizes for image coding, change coding, ROI coding and BVC are shown in Table I for Human Detection. We analysed two data sets (measurements), one in which the human is in a vertical position and the other in which the human is lying on the ground. It can be observed in Table I that the compression efficiency of image coding is better than that of change coding and ROI coding. The reason is that there are too many transitions in the change frames of this application. So, in such applications, the output of image coding is selected as the output of the BVC.
Table II shows the results of BVC for the Meter Reading application. In Meter Reading, there are no movements involved, so the change frames have very few changes, due to which the compression efficiency of change coding is better than that of image coding. Also, in Meter Reading there are many objects (digits) in the frames (around 12-15), due to which the compressed file size of ROI coding is larger than that of change coding, as can be observed in Table II. So, ROI coding is not the best option for such applications, and the output of change coding is selected as the output of the BVC. In contrast, in applications with very few objects in the change frames, such as LED Light Detection and Magnetic Particle Detection, the output of ROI coding is selected as the output of the BVC.

VI. CONCLUSIONS
The processing complexity and the compression efficiency of the compression methods affect the total energy consumption of the Visual Sensor Node (VSN). Optimization of both computation and communication energy consumption of the VSN is required for the energy constrained outdoor embedded applications of Wireless Visual Sensor Networks (WVSN). The compression efficiency of both change coding and Region of Interest (ROI) coding is better than that of image coding for applications involving too few objects in the change frames, but it becomes worse than that of image coding if the change frame contains too many objects. In this paper, we analysed the compression efficiency of the Bi-Level Video Codec (BVC) for various real life machine vision applications. In BVC, we proposed to implement all three information reduction techniques, i.e. image coding, change coding and ROI coding, at the VSN and to select the smallest of the three bit streams as the final output. We observed that the compressed file size of the BVC is always smaller than or equal to that of change coding, ROI coding and image coding. Thus we conclude that the compression efficiency of BVC is better than or at least equal to that of change coding, ROI coding and image coding, and that BVC can be used for all applications, i.e. applications with too many or too few objects in the frames.

TABLE I.
EFFICIENCY OF BVC FOR HUMAN DETECTION.

TABLE II.
EFFICIENCY OF BVC FOR METER READING.

TABLE III.
EFFICIENCY OF BVC FOR LED LIGHT DETECTION.

The compressed file sizes for image coding, change coding, ROI coding and BVC for LED Light Detection and Magnetic Particle Detection are shown in Table III and Table IV respectively. It can be observed in Table III and Table IV that the compression efficiency of ROI coding is better than that of both image coding and change coding.