The VPU Report: Q4 2016 Issue

Introduction

Welcome to the Q4 2016 and inaugural VPU Report. The purpose of this report is to progressively summarise the available hardware solutions for vision processing in a way that identifies their fundamental characteristics and capabilities without overloading the reader with technical detail. Engineers may want more, but marketers and executives will find enough to keep them abreast of the field, along with, we hope, insightful and thought-provoking commentary to help them develop a rounded view of the industry.

The methodology behind the report is to review publicly available documentation, interview company representatives where they are willing to comment, and synthesise these into a coherent summary, combining the facts from disparate sources to produce new information for the reader.

Each chapter of the report summarises the vision technology offerings of one company, with a brief company description and a closing commentary that includes sections on market focus and penetration as well as the technologies themselves.

This issue summarises technologies from four companies: Movidius, Intel, CEVA and Inuitive. Before its acquisition by Intel, Movidius was perhaps the highest-profile independent fabless chip supplier with its own internally developed IP for vision processing. Intel has technologies that cover a lot of ground, from desktop and server GPU-based vision acceleration to dedicated computational photography and image processing hardware to depth-sensing cameras. CEVA is one of the primary suppliers of vision IP, and Inuitive combines CEVA IP with their own home-grown depth extraction hardware and, in their upcoming chip, introduces an early use of dedicated neural network IP from Synopsys.

Taken together, these four companies provide a broad overview of activity at key points in the supply chain: block-level IP, dedicated edge devices and vision-oriented GPU compute. All four have been in the business long enough to have experienced the shift in focus from computational photography to vision processing, and all are now dealing with the upheaval caused by the rapid ascendancy of neural networks for all aspects of classification and their broad adoption as the front end of AI systems. This shift is directly reflected in the products reviewed, and how these companies are responding gives a good indication of the likely future path of the vision market.

More from the author, Peter McGuinness:

This report covers Vision Processing Units (VPUs), but what makes a processor a *vision* processor? In many regards, "you'll know one when you see one" covers the situation, but there are some things that can be specified.

Included in the definition are IP blocks such as a GPU (think Qualcomm or Nvidia), a wide-SIMD DSP (like CEVA or Tensilica), or a programmable camera pipeline (Apical or Intel). The term also includes full SoCs that combine one or more of the characteristics of these IP blocks with other functions, such as those from Movidius and Inuitive, which both add hardware acceleration of specific functions to their DSP-based blocks. Lastly, novel architectures such as Wave Computing's, which target a wider range of functions but are also well suited to accelerating vision tasks, are included.

The first thing that unites them all is that, while they are programmable, they are not intended to be the main CPU in a system: they are primarily intended as accelerators, and design decisions have been taken to optimise them for that role. Thus, even though mainstream architectures like x86, ARM and MIPS are fully capable of executing vision programs (and in some cases do so very effectively), they are not included in the definition because they are optimised to be the main CPU, not an accelerator.

Second, they are all massively parallel and depend for their performance on the fact that visual data comes in 2D arrays (at least) and that vision functions in general exhibit massive data parallelism. Their hardware architectures have been optimised to take advantage of this, and they devote significant silicon area to it, for example in the form of extreme multi-threading hardware or specific data handling and memory access optimisation hardware.
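
To make the data-parallelism point concrete, here is a minimal sketch in plain C++ (illustrative only, not vendor code): a hypothetical brightness-scaling kernel in which every output pixel depends only on the corresponding input pixel, so all width × height operations are independent and could, in principle, execute simultaneously.

```cpp
// Hypothetical per-pixel kernel: brightness scaling over an 8-bit
// greyscale image. Every output pixel depends only on the matching
// input pixel, so all width * height operations are independent --
// the data parallelism a VPU spends silicon to exploit.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

void scale_brightness(const uint8_t* src, uint8_t* dst,
                      size_t width, size_t height, float gain) {
    for (size_t y = 0; y < height; ++y) {      // every row is independent
        for (size_t x = 0; x < width; ++x) {   // every pixel is independent
            float v = src[y * width + x] * gain;
            dst[y * width + x] = v > 255.0f ? 255 : static_cast<uint8_t>(v);
        }
    }
}

int main() {
    const size_t w = 640, h = 480;
    std::vector<uint8_t> in(w * h, 100), out(w * h);
    scale_brightness(in.data(), out.data(), w, h, 1.5f);
    std::printf("out[0] = %d\n", (int)out[0]);  // prints 150
    return 0;
}
```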

Third, the programmable elements have a vision-specific hardware architecture. The most obvious examples are the inclusion of high-efficiency integer hardware and the co-issue of smaller data types, so that targeting 16-bit integers, for instance, doubles throughput over 32-bit. There are other optimisations as well, such as specific methods for handling non-linear functions.
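
The throughput arithmetic can be illustrated on a commodity vector unit. The sketch below uses x86 SSE2 intrinsics purely as a stand-in for a VPU's vector datapath (an assumption for illustration; no vendor ISA is implied): a single 128-bit operation covers eight 16-bit lanes but only four 32-bit lanes, so halving the element width doubles the elements processed per instruction.

```cpp
// Why narrower data types raise throughput: one 128-bit SSE2 operation
// performs eight 16-bit additions but only four 32-bit additions.
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>
#include <cstdio>

int main() {
    alignas(16) int16_t a16[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    alignas(16) int16_t b16[8] = {10, 10, 10, 10, 10, 10, 10, 10};
    alignas(16) int32_t a32[4] = {1, 2, 3, 4};
    alignas(16) int32_t b32[4] = {10, 10, 10, 10};

    // One instruction, eight 16-bit additions...
    __m128i r16 = _mm_add_epi16(_mm_load_si128((const __m128i*)a16),
                                _mm_load_si128((const __m128i*)b16));
    // ...while the 32-bit form performs only four.
    __m128i r32 = _mm_add_epi32(_mm_load_si128((const __m128i*)a32),
                                _mm_load_si128((const __m128i*)b32));

    alignas(16) int16_t out16[8];
    alignas(16) int32_t out32[4];
    _mm_store_si128((__m128i*)out16, r16);
    _mm_store_si128((__m128i*)out32, r32);
    std::printf("16-bit lanes per op: 8 (out16[0] = %d)\n", (int)out16[0]);
    std::printf("32-bit lanes per op: 4 (out32[0] = %d)\n", (int)out32[0]);
    return 0;
}
```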

Last, the unit must be usable as a vision processor, so it must have an SDK that specifically targets that use. It can be based on OpenCV or OpenVX, or can take input directly from TensorFlow, or whatever the vendor sees as most appropriate, but the tool chain must be in place to enable vision processing. A key feature of such an SDK is that it can expose complex intrinsic functions that access hardware accelerators.
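
As a sketch of what such a tool chain looks like from the developer's side, the following uses OpenCV as a representative SDK (chosen here only for illustration; any of the options above would serve): a two-stage blur-and-edge-detect pipeline. On a VPU, a vendor's OpenCV or OpenVX backend would typically dispatch calls like these to hardware-accelerated kernels rather than CPU code.

```cpp
// A simple two-stage vision pipeline expressed through a standard SDK.
// The application code stays the same whether the backend runs on a
// CPU or dispatches to a VPU's accelerator blocks.
#include <opencv2/opencv.hpp>
#include <cstdio>

int main(int argc, char** argv) {
    if (argc < 2) { std::printf("usage: edges <image>\n"); return 1; }

    cv::Mat img = cv::imread(argv[1], cv::IMREAD_GRAYSCALE);
    if (img.empty()) return 1;

    cv::Mat blurred, edges;
    cv::GaussianBlur(img, blurred, cv::Size(5, 5), 1.5);  // noise-reduction stage
    cv::Canny(blurred, edges, 50, 150);                   // edge-detection stage
    cv::imwrite("edges.png", edges);
    return 0;
}
```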

That definition covers a lot of hardware architectures, which is a reminder that we are very much in the same sort of situation graphics was in during the pre-OpenGL, pre-DirectX days, when APIs were proprietary and hardware architectures proliferated. That situation may settle down over the next few years as a smaller set of APIs becomes dominant, but for now, that's where we are.

For the table of contents and illustration list of this premiere quarterly report, please see the summary PDF on the Visualize the World website. To order a copy of the full report, please contact Jon Peddie Research. It can be purchased as a single issue ($2K) or as an annual subscription ($6.5K).

Here you’ll find a wealth of practical technical insights and expert advice to help you bring AI and visual intelligence into your products without flying blind.

Contact

Address

Berkeley Design Technology, Inc.
PO Box #4446
Walnut Creek, CA 94596

Phone: +1 (925) 954-1411