Powerful vision processors have existed for some time now, as exemplified by supercomputers and the longstanding academic research on computer vision. What's recently changed are the "low-cost" and "energy-efficient" aspects of vision processing, along with evolutionary (and sometimes revolutionary) accuracy and other improvements in the algorithms running on the processors.
Well-known image classification competitions like the yearly ImageNet Large Scale Visual Recognition Challenge (ILSVRC) focus only on accuracy; the speed, cost and power consumption of the solutions tested aren't factored into the results. In light of the growing importance of cost-effective, energy-efficient real-time vision solutions, professors Yung-Hsiang Lu of Purdue University and Alex Berg of the University of North Carolina have created the Low Power Image Recognition Competition (LPIRC), the first iteration of which took place this past June during the Design Automation Conference (DAC) in San Francisco, California.
Professor Lu presented LPIRC objectives, results and plans at the September 2015 Embedded Vision Alliance Member Meeting. Below you'll find his slides, along with a video of his talk:
LPIRC leverages the same ImageNet image database used in ILSVRC. Unlike ILSVRC, however, LPIRC entries must complete their analysis of the specified image set within 10 minutes. In addition, power consumption is logged during the analysis period. LPIRC scoring takes into account processing time and power consumption, as well as recognition accuracy. Given these more challenging criteria, it’s not surprising that first-round LPIRC entries achieved much lower accuracy compared to state-of-the-art recognition solutions. Professor Lu estimated that the winning 2015 LPIRC entry delivered 8% of the detection and classification accuracy of today's state-of-the-art solutions. Stronger comparative results are anticipated in future years as participants further optimize their algorithms and transition from off-the-shelf laptops, tablets and development boards to tailored hardware platforms.
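The exact scoring formula isn't spelled out here, but as a rough, hypothetical illustration of how accuracy, processing time and power consumption might be folded into a single figure of merit, consider an accuracy-per-energy ratio along the following lines. The function name and the ratio itself are assumptions for illustration, not the official LPIRC formula:

```python
# Hypothetical illustration only: one way to combine accuracy, processing time
# and power into a single score. The actual LPIRC scoring formula may differ.

def accuracy_per_energy(mean_average_precision, average_power_watts, elapsed_seconds):
    """Return recognition accuracy divided by the energy consumed.

    mean_average_precision: detection/classification accuracy, in [0, 1]
    average_power_watts:    mean power draw logged during the run
    elapsed_seconds:        wall-clock time used (capped at 10 minutes = 600 s)
    """
    energy_joules = average_power_watts * elapsed_seconds
    return mean_average_precision / energy_joules

# Example: 0.30 mAP at 8 W for the full 600 s -> 0.30 / 4800 J = 6.25e-05
print(accuracy_per_energy(0.30, 8.0, 600.0))
```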
Yu Wang, Associate Professor at Tsinghua University in Beijing, China, led the winning team (co-sponsored by Huawei) at the 2015 LPIRC. The team developed its solution using an NVIDIA Jetson TK1 development board, based on the NVIDIA Tegra K1 application processor. Read on for our recent interview with Professor Wang about his team’s winning LPIRC entry.
What type of algorithm did you use?
We used Fast R-CNN, a state-of-the-art algorithm. "R" stands for region and "CNN" is short for convolutional neural network. The R-CNN algorithm first extracts a set of region proposals, areas of the image where an object may exist. A CNN then classifies each proposal as either background or an object in one of the known categories. We used the fast variant of R-CNN to balance accuracy and energy.
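As a rough conceptual sketch of the two-step flow described above (the functions below are placeholder stubs for illustration, not the team's actual code or any real detection library):

```python
# Conceptual sketch of the two-step R-CNN flow: propose regions, then classify
# each region with a CNN. All functions are placeholder stubs, not a real API.

def propose_regions(image):
    """Step 1: suggest boxes (x, y, width, height) where an object may exist."""
    return [(10, 10, 50, 50), (30, 40, 80, 60)]      # stubbed proposals

def classify_region(image, box, categories):
    """Step 2: a CNN labels a proposal as background or a known category.
    Fast R-CNN computes the convolutional features once per image and shares
    them across all proposals, which is what makes it fast."""
    return ("person", 0.87) if box == (10, 10, 50, 50) else ("background", 0.50)

def detect(image, categories):
    detections = []
    for box in propose_regions(image):
        label, confidence = classify_region(image, box, categories)
        if label != "background":
            detections.append((box, label, confidence))
    return detections

print(detect("example.jpg", ["person", "dog", "car"]))
```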
How did you optimize your software for the unique challenges of the LPIRC?
The region proposal algorithm runs on the CPU, while the CNN and classifier algorithms use the embedded GPU. In this way, a two-stage pipeline takes full advantage of the computing resources. Specifically, the first stage of the pipeline downloads images from the competition "referee," opens each image file, and extracts region proposals using the ARM CPU cores. The second stage runs the CNN on the GPU and then uploads the results back to the referee using the CPU cores. The two stages are executed by two processes connected via a shared queue, whose length was carefully configured to balance the speed of the two stages.
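A minimal sketch of such a two-stage, bounded-queue pipeline, using Python's multiprocessing module, might look like the following. The helper routines and the queue length of 8 are placeholder assumptions standing in for the team's actual CPU and GPU code:

```python
# Minimal sketch of a two-stage pipeline coupled by a bounded queue. The
# stubbed helpers stand in for the team's actual routines; none of this is
# their real implementation.

from multiprocessing import Process, Queue

IMAGE_COUNT = 100                     # pretend test set size
QUEUE_LENGTH = 8                      # example value; tuned to balance stages

def fetch_next_image(index):
    """Download image `index` from the referee (stubbed)."""
    return f"image_{index}.jpg" if index < IMAGE_COUNT else None

def extract_proposals(image):
    """Region proposals computed on the ARM CPU cores (stubbed)."""
    return [(0, 0, 100, 100)]

def run_cnn(image, proposals):
    """CNN classification of the proposals on the GPU (stubbed)."""
    return [(box, "object", 0.9) for box in proposals]

def upload_results(image, detections):
    """Send results back to the referee (stubbed)."""
    pass

def stage_one(work_queue):
    """Stage 1 (CPU): download images and compute region proposals."""
    index = 0
    while True:
        image = fetch_next_image(index)
        if image is None:
            work_queue.put(None)      # tell stage two there is no more work
            break
        work_queue.put((image, extract_proposals(image)))
        index += 1

def stage_two(work_queue):
    """Stage 2 (GPU + CPU): run the CNN, then upload the results."""
    while True:
        item = work_queue.get()
        if item is None:
            break
        image, proposals = item
        upload_results(image, run_cnn(image, proposals))

if __name__ == "__main__":
    # The bounded queue connects the two processes; its length is chosen so
    # that neither stage starves or stalls the other.
    queue = Queue(maxsize=QUEUE_LENGTH)
    producer = Process(target=stage_one, args=(queue,))
    consumer = Process(target=stage_two, args=(queue,))
    producer.start(); consumer.start()
    producer.join(); consumer.join()
```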
Did you obtain assistance from the processor chip/board/system suppliers, the operating system/other software suppliers, and/or others as you developed your winning LPIRC entry?
First, we want to express our appreciation to the author of Fast R-CNN, who gave us insight into the whole software framework. Second, we appreciate the open-source CNN model, which is a good starting point to fine-tune our algorithm. Third, we appreciate the author of Caffe, who gave us an easy-to-use platform to set up the algorithm framework.
We developed our final parallelized version with the assistance of one of our lab mates, who offered valuable suggestions on an efficient parallel implementation. Because we implemented our solutions with high-level abstractions of the available hardware resources, we were seldom bothered by platform-specific issues.
What key learnings did you obtain from this project, and how do you plan to apply these learnings to further optimizing next year's and future years' LPIRC entries?
This year, our solution was developed under the Fast R-CNN framework, which made the region proposal and CNN modules somewhat decoupled. We trained and evaluated the two modules separately. From our experience, we learned that region proposal methods are currently the performance bottleneck for the entire pipeline, especially with Fast R-CNN.
Therefore, for next year's LPIRC we intend to build an "end-to-end" framework. By "end-to-end," we mean a method that can be both trained and evaluated as a single module. A recently proposed method named Faster R-CNN already realizes this goal. I think it is a good reference for a better detection system for next year's LPIRC.
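As a rough sketch of that distinction (all functions below are placeholder stubs, not an actual Faster R-CNN implementation), an end-to-end detector computes its proposals from the same shared convolutional features used by the classifier, so the whole model can be trained and evaluated as one module:

```python
# Illustrative contrast only: in an end-to-end detector such as Faster R-CNN,
# proposals come from a learned region proposal network that reuses the shared
# backbone features, removing the separate proposal step. All stubs below are
# placeholders, not a real implementation.

def shared_conv_features(image):
    """Backbone CNN features, computed once and reused by both steps (stubbed)."""
    return "feature_map"

def region_proposal_network(features):
    """Learned proposals from the shared features, replacing the separate,
    CPU-bound proposal algorithm that bottlenecked the Fast R-CNN pipeline."""
    return [(10, 10, 50, 50)]                        # stubbed proposals

def detection_head(features, proposals):
    """Classify each proposal and refine its bounding box (stubbed)."""
    return [((12, 11, 48, 52), "person", 0.91)]

def end_to_end_detect(image):
    features = shared_conv_features(image)           # computed once
    proposals = region_proposal_network(features)    # no external proposal step
    return detection_head(features, proposals)

print(end_to_end_detect("example.jpg"))
```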
Another thing our team hopes to develop is an FPGA-based version of the hardware, since FPGAs have been demonstrated to be more energy efficient than CPUs and GPUs. Perhaps we can demonstrate an FPGA design in next year's competition.
What else would you like to share with our readers?
Currently, the research on deep learning is very active. A large number of available tools (e.g., Caffe, Theano, Torch) greatly assist various research and development activities, as do the many impressive demos and published papers. However, a huge gap still exists between impressive deep learning papers and impressive deep learning products, not only because customer demand is currently unclear but also because real-life systems have to meet the requirements of high energy efficiency and real-time speed.
Everyone can carry a mobile phone, but not a GPU cluster, right? And if we choose to send the entirety of the workload to a cloud server, what happens if the network goes down? Our group therefore focuses more on embedded deep learning system designs and we are trying to define the respective roles that the cloud server and the embedded client will have in the entire workflow. How they will cooperate is a very interesting topic.
Finally, if there are more vision processor vendors who could provide platforms for this competition, that would be great!
Next year's LPIRC is scheduled to take place in June 2016, once again in conjunction with DAC. Planned enhancements include analyzing images captured live by video cameras, rather than this year's comparatively pristine digital image files; this change will compel participants' algorithms to contend with the real-life effects of various optical and environmental distortions. For more information on the 2016 LPIRC, please contact Professor Lu via email.