OpenVX, a maturing API from the Khronos Group, enables embedded vision application software developers to efficiently harness the various processing resources available in SoCs and systems.
Vision technology is now enabling a wide range of products that are more intelligent and responsive than before, and thus more valuable to users. Such image perception, understanding, and decision-making processes have historically been achievable only using large, heavy, expensive, and power-draining computers and cameras. As a result, computer vision has long been restricted to academic research and relatively low-volume production designs.
However, thanks to the emergence of increasingly capable and cost-effective processors, image sensors, memories and other semiconductor devices, along with robust algorithms, it's now practical to incorporate computer vision into a wide range of systems. The Embedded Vision Alliance uses the term “embedded vision” to refer to this growing use of practical computer vision technology in embedded systems, mobile devices, PCs, and the cloud.
Key to the widespread adoption of embedded vision is the ease of developing software that runs efficiently on a diversity of hardware platforms, with high performance, low power consumption and cost-effective system resource needs. In the past, this set of objectives has been a tall order, since the combination of high performance and low power consumption has historically required significant code optimization for a particular device architecture, thereby hampering portability to other architectures. Fortunately, this situation is beginning to change with the emergence of OpenVX, an open standard created and maintained by the not-for-profit Khronos Group industry consortium.
Overview
OpenVX was developed for the cross-platform acceleration of computer vision applications, prompted by the need for high performance and low power with diverse vision processors. Processing options now in use (solely or in combination) by vision developers include single- and multi-core CPUs, GPUs, DSPs, FPGAs, ISPs (image signal processors), and dedicated vision processors and IP cores. According to the Khronos Group, OpenVX is specifically targeted at low-power, real-time applications, particularly those running on mobile and embedded platforms.
For example, a low-power host processor can set up and manage the vision-processing pipeline, with some or all of the functions running on dedicated hardware and/or one or more specialized coprocessors. More generally, the OpenVX specification provides a way for developers to tap into the performance of processor-specific optimized code, while still preserving code portability via the API's standardized interface and higher-level abstraction, in contrast to traditional vision software development approaches.
OpenVX graphs are at the foundation of the API’s efficiency advantages. In creating a vision-processing algorithmic pipeline, application developers construct a graph made up of operations, called nodes, which can be coded in any language and implemented using lower-level APIs. Each node, typically created by a vision processor vendor as part of its supplied OpenVX framework, is an instance of a computer vision function (i.e. a kernel) with associated data references, a return value, and performance information. And both application developers and processor vendors can optionally create user-defined nodes based on custom extensions to the OpenVX specification. Note, however, that unlike with vendor-defined custom nodes, which the vendor's OpenVX implementation inherently knows about and can therefore work with in an optimal way, developer-defined custom nodes operate outside of a vendor's OpenVX framework and therefore incur additional usage restrictions.
OpenVX's flexibility supports implementations optimized for performance, power consumption, and/or other parameters. For example, a vendor's framework implementation can "fuse" multiple nodes in order to eliminate intermediate memory transfers. Processing can also be "tiled" in order to retain data in local memory or cache. Workload parallelism enables the framework to simultaneously execute multiple nodes on multiple processing targets. And after the host processor initially sets up an application graph, it can then execute near-autonomously, according to the Khronos Group. The final OpenVX v1.0 specification was released in October 2014, with a subsequent maintenance update released in June 2015 and further evolution ongoing.
Standards Comparisons, Contrasts and Compatibilities
Naming similarities can lead to confusion between OpenVX and two other software standards (the first de facto, the second formal) commonly used in vision processing, OpenCV and OpenCL. OpenCV, the open-source computer vision library, is a collection of more than 2,500 software components released under a BSD license and available for free use for computer vision applications and research. OpenCV is written in optimized C and C++, has C++, C, Python and Java interfaces, and supports Windows, Linux, Mac OS, iOS and Android.
Several key distinctions between OpenVX and OpenCV bear mentioning. First, OpenVX is fundamentally targeted at efficient optimization of code on embedded and mobile processors. While OpenCV is also optimized for multiple processors, OpenVX is designed to provide many straightforward optimization opportunities. OpenVX's graph API, along with its opaque data structures, makes it uncomplicated to relocate the execution of kernels from a CPU to an accelerator with dedicated memory, such as a GPU or DSP. The concept of virtual images also allows for further optimizations, such as filter stacking, that minimize data transfers across memory buses.
OpenVX has also been designed with a fundamental focus on functional portability. Any code based on OpenVX will produce functionally identical (albeit not necessarily identical-performance) results when executed on different processors. This achievement is enabled by a tightly defined specification together with conformance tests that enforce functional portability. OpenCV also has an extensive body of algorithmic tests, but they focus on making sure that the library works correctly on each of the platforms, rather than confirming that functions return the same results on multiple platforms.
Finally, OpenCV contains an extensive set of functions that span the field of computer vision, so optimizing OpenCV-sourced algorithms for a new hardware platform can consume significant time and effort, depending on what percentage of the 500+ algorithms in the library require optimization for a particular project. Since OpenVX defines a much smaller set of functions, creating an efficient OpenVX implementation is more feasible (Table 1). And the OpenVX specification is written with the intent of enabling OpenCV users to easily migrate to OpenVX at any point in time.
| Vision Function | Unsigned 8-bit | Unsigned 16-bit | Signed 16-bit | Unsigned 32-bit | Signed 32-bit | Floating Point 32-bit | 4CC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AbsDiff | Input, Output | | | | | | |
| Accumulate | Input | | Output | | | | |
| AccumulateSquared | Input | | Output | | | | |
| AccumulateWeighted | Input, Output | | | | | | |
| Add | Input, Output | | Input, Output | | | | |
| And | Input, Output | | | | | | |
| Box3x3 | Input, Output | | | | | | |
| CannyEdgeDetector | Input, Output | | | | | | |
| ChannelCombine | Input | | | | | | Output |
| ChannelExtract | Output | | | | | | Input |
| ColorConvert | | | | | | | Input, Output |
| ConvertDepth | Input, Output | | Input, Output | | | | |
| Convolve | Input, Output | | Output | | | | |
| Dilate3x3 | Input, Output | | | | | | |
| EqualizeHistogram | Input, Output | | | | | | |
| Erode3x3 | Input, Output | | | | | | |
| FastCorners | Input, Output | | | | | | |
| Gaussian3x3 | Input, Output | | | | | | |
| HarrisCorners | Input, Output | | | | | | |
| HalfScaleGaussian | Input, Output | | | | | | |
| Histogram | Input | | | | Output | | |
| IntegralImage | Input | | | Output | | | |
| TableLookup | Input, Output | | | | | | |
| Magnitude | | | Input, Output | | | | |
| MeanStdDev | Input | | | | | Output | |
| Median3x3 | Input, Output | | | | | | |
| MinMaxLoc | Input, Output | | Input, Output | | Output | | |
| Multiply | Input, Output | | Input, Output | | | | |
| Not | Input, Output | | | | | | |
| OpticalFlowLK | Input | | | Output | | | |
| Or | Input, Output | | | | | | |
| Phase | Output | | Input | | | | |
| GaussianPyramid | Input, Output | | | | | | |
| Remap | Input, Output | | | | | | |
| ScaleImage | Input, Output | | | | | | |
| Sobel3x3 | Input | | Output | | | | |
| Subtract | Input, Output | | Input, Output | | | | |
| Threshold | Input, Output | | | | | | |
| WarpAffine | Input, Output | | | | | | |
| WarpPerspective | Input, Output | | | | | | |
| Xor | Input, Output | | | | | | |

Table 1. OpenVX v1.0 Base Vision Functions and Associated Input/Output Types
Then there's OpenCL, also developed and maintained by the Khronos Group (as well as used by OpenCV to harness the power of GPUs and other accelerators). This language-based framework was created as a tool for efficient execution of data-parallel software on heterogeneous hardware architectures. Similarities exist between OpenCL and OpenVX: both are open standards targeted at code acceleration, and both enforce functional portability. And OpenVX is designed so that it is easy to implement in coexistence with OpenCL; an OpenVX node can be implemented using OpenCL, for example. But important differences also exist.
OpenCL specifies a C-based language for programming kernels, with a focus on general-purpose math functions. OpenVX, in contrast, targets a specific application domain: vision. Many elements of OpenVX make it extremely efficient on embedded hardware, and it possesses a set of computer vision functions essential for almost any computer vision use case. Also, like OpenCV, OpenCL implements a declarative (i.e., explicit) memory model. This characteristic prohibits certain types of optimizations that are possible with OpenVX, which, as mentioned earlier, specifies an opaque memory model for most of its data objects.
Finally, floating-point requirements (or lack thereof) are an important consideration for embedded platforms. Some of the processors used to accelerate computer vision functions do not support floating-point calculations. Since floating-point data types are an integral part of OpenCL, implementing it on such platforms is challenging. OpenVX conversely does not require floating-point support.
A Developer's Perspective
OpenVX's "much smaller set of functions" compared with OpenCV is something of a mixed blessing. While the narrower focus enables processor suppliers to more rapidly develop optimized OpenVX implementations, the small set of OpenVX functions currently required for conformance isn't comprehensive from the standpoint of what's necessary to develop sophisticated vision applications. To fill any function gaps, as previously mentioned, developers can create custom "user nodes" or rely on custom extensions provided by the vendor of their preferred target vision processor.
An extension-evolution path akin to that already seen with other Khronos standards is also anticipated with OpenVX. When enough customers with similar needs emerge, processor vendors will support OpenVX extensions to meet those needs. And when multiple vendors adopt similar extensions, some of those will likely be incorporated in a future version of the OpenVX specification. This evolution will not occur overnight, among other reasons because OpenCV is well known by developers. But OpenVX advocates believe that the standard's performance, power consumption and portability benefits make extensive function support (initially via extensions, later built in to the specification) a matter of "when", not "if".
OpenVX extension development by means of an API "wrapper" around an OpenCV function is a possibility, but in doing so the extension developer will likely negate a key OpenVX benefit. OpenCV is fundamentally based on processing a frame's worth of data at a time. Conversely, since optimized data flow is at the core of OpenVX, it favors handling data in structures such as tiles and pixels. More generally, it focuses on moving bits rather than frames, an efficiency distinction that would be discarded if an OpenCV function were simply inserted into the OpenVX framework without modification.
Graph and Extension Examples
Figure 1 shows a typical OpenVX graph for implementing a pseudo-Canny edge detection algorithm, common in many computer vision applications.
Figure 1. A conventional Canny edge detector OpenVX graph (source: "Addressing System-Level Optimization with OpenVX Graphs")
In a typical computer vision system, the camera's image sensor captures color information by means of a Bayer pattern filter array. Figure 2 shows a typical Bayer pattern. While the OpenVX standard includes multiple standard nodes, Bayer de-mosaicing is not one of them. However, a developer can define a custom OpenVX node that will ingest raw Bayer data and output a grayscale (monochrome) frame.
Figure 2. A conventional Bayer color filter array (CFA) pattern
Such an OpenVX node first needs to implement color interpolation. Specifically, at each pixel location, the node will need to interpolate the two missing color components. The most straightforward interpolation method is the bilinear approach, which can suffer from color artifacts in regions with edges or high-frequency details. More sophisticated de-mosaicing methods include higher-order interpolants such as bicubic or Lanczos resampling. For this particular application, however, a bilinear interpolant will likely suffice, since the output of the custom node is an elementary monochrome signal.
After calculating the R, G and B color components at each pixel location, it's straightforward to then compute the monochrome luminance:
Y = 0.299*R + 0.587*G + 0.114*B
Vision systems additionally often employ a wide-angle or "fish-eye" lens to increase the field of view, with the consequence that lens correction must be applied to counteract the lens's optical distortions. In this example, the grayscale frame is routed to the standard OpenVX vxRemapNode to accomplish the de-warping task. A factory calibration procedure generates a remapping table that is applied to each pixel in the monochrome frame. In an optimized GPU-based OpenVX driver implementation, vxRemapNode will leverage the GPU texture unit to perform hardware-accelerated bilinear sampling.
Both vxRemapNode and the previously discussed custom OpenVX node for Bayer de-mosaicing and monochrome conversion are inserted at the front end of the graph, thereby allowing the OpenVX driver to optimize the layout for the architecture at hand (Figure 3). Bayer de-mosaicing and pixel warping processes are both highly data-parallel, so the associated computations are good candidates for tiling to achieve peak memory efficiency. And both processes can be accelerated in a fairly straightforward manner on GPU architectures, for example.
Figure 3. Modified OpenVX graph with new front-end
The code for this OpenVX graph example follows:
vx_context c = vxCreateContext();
vx_graph g = vxCreateGraph(c);

/* raw Bayer input frame; w and h are the sensor frame dimensions */
vx_image bayer = vxCreateImage(c, w, h, FOURCC_BYR2);

/* virtual images: intermediates the framework is free to never materialize */
vx_image y = vxCreateImage(g, 0, 0, FOURCC_VIRT);
vx_image yUndistort = vxCreateImage(g, 0, 0, FOURCC_VIRT);
vx_image ySmooth = vxCreateImage(g, 0, 0, FOURCC_VIRT);
// [snip]

vx_node nodes[] = {
    vxCustomBayerToLumaNode(g, bayer, y),      /* custom de-mosaic + luma node */
    vxRemapNode(g, y, yUndistort),             /* lens-distortion correction */
    vxGaussian3x3Node(g, yUndistort, ySmooth), /* noise reduction */
    // [snip]
};

vxVerifyGraph(g);  /* validate the graph and let the driver optimize it */
vxProcessGraph(g); /* execute the pipeline */
Developer Assistance
Vision technology is enabling a wide range of products that are more intelligent and responsive than before, and thus more valuable to users. Vision processing can add valuable capabilities to existing products. And it can provide significant new markets for hardware, software and semiconductor suppliers. The Embedded Vision Alliance, a worldwide organization of technology developers and providers, is working to empower product creators to transform this potential into reality. BDTI, Cadence, Itseez and Vivante, the co-authors of this article, are members of the Embedded Vision Alliance. Cadence and Vivante are processor suppliers who support OpenVX, while Itseez chairs the OpenVX Working Group at Khronos and BDTI was one of the creators of the OpenVX specification and considers OpenVX for use with its engineering services clients.
First and foremost, the Alliance's mission is to provide product creators with practical education, information, and insights to help them incorporate vision capabilities into new and existing products. To execute this mission, the Alliance maintains a website providing tutorial articles, videos, code downloads and a discussion forum staffed by technology experts. Registered website users can also receive the Alliance’s twice-monthly email newsletter, Embedded Vision Insights, among other benefits.
The Embedded Vision Alliance also offers a free online training facility for vision-based product creators: the Embedded Vision Academy. This area of the Alliance website provides in-depth technical training and other resources to help product creators integrate visual intelligence into next-generation software and systems. Course material in the Embedded Vision Academy spans a wide range of vision-related subjects, from basic vision algorithms to image pre-processing, image sensor interfaces, and software development techniques and tools such as OpenVX, OpenCL and OpenCV. Access is free to all through a simple registration process.
The Alliance also holds Embedded Vision Summit conferences in Silicon Valley. Embedded Vision Summits are technical educational forums for product creators interested in incorporating visual intelligence into electronic systems and software. They provide how-to presentations, inspiring keynote talks, demonstrations, and opportunities to interact with technical experts from Alliance member companies. These events are intended to inspire attendees' imaginations about the potential applications for practical computer vision technology through exciting presentations and demonstrations, to offer practical know-how for attendees to help them incorporate vision capabilities into their hardware and software products, and to provide opportunities for attendees to meet and talk with leading vision technology companies and learn about their offerings.
The next Embedded Vision Summit will take place on May 2-4, 2016 in Santa Clara, California, and will include accompanying half- and full-day workshops. Please reserve a spot on your calendar and plan to attend. Online registration and additional Embedded Vision Summit information are available on the Alliance website.
By Brian Dipert
Editor-in-Chief, Embedded Vision Alliance
Amit Shoham
Distinguished Engineer, BDTI
Pulin Desai
Product Marketing Director, Cadence
Victor Eruhimov
CEO, Itseez
Chairperson, OpenVX Working Group, Khronos
Shang-Hung Lin
Vice President, Vision and Image Product Development, Vivante