This blog post was originally published on Intel’s website. It is reprinted here with the permission of Intel.
My first blog in this three-part series explored the opportunities created by growing workload diversity and the inevitable growth of heterogeneous computing in the exascale era. In this blog, I address the barriers that developers encounter when programming for multiple architectures and their diverse software stacks, and how oneAPI is making the coding process easier.
At SC’19, Intel launched oneAPI, a unified and scalable programming model for harnessing the power of diverse computing architectures in the era of HPC/AI convergence. More than 30 major companies and research organizations support the oneAPI initiative, and that number is growing. It will define programming for an increasingly AI-infused, multi-architecture world. oneAPI delivers a unified and open programming experience to developers on the architecture of their choice, without compromising performance. It also minimizes the complexity of maintaining separate code bases, multiple programming languages, and different tools and workflows. Our goal is to reduce the barriers that developers encounter. We also want to preserve existing software investments by supporting existing languages, while giving developers the flexibility to create versatile applications.
Rick Stevens, associate laboratory director, Computing, Environment, and Life Sciences, Argonne National Laboratory, is leading the effort to deploy an exaflop-capable supercomputer, named Aurora, in 2021. As he puts it, “The future of advanced computing requires heterogeneous hardware to maximize the computing power needed for exascale-class workloads. The oneAPI industry initiative Intel is spearheading will ensure that programming across diverse compute architectures is greatly simplified.”
“We’ve been working closely with Intel on defining oneAPI and using oneAPI for our own internal development and testing,” explained Hal Finkel, Lead for Compiler Technology and Programming Languages at Argonne National Laboratory’s Leadership Computing Facility. “oneAPI provides extended capabilities, such as supporting unified memory and reductions, above what is available in the current SYCL 1.2.1 spec, and these capabilities are essential for us. Our development of a Kokkos backend for DPC++/oneAPI, for example, relies on these additional features. We’re looking forward to updates to the SYCL specification which we trust will contain important new features from DPC++ that address specific needs identified during these development activities.”
oneAPI: Industry Initiative and Intel Beta Product
oneAPI includes both an industry initiative based on open specifications and an Intel beta product. The oneAPI specification includes a direct programming language and domain-specific libraries. It also includes the migration, analysis, and debug tools needed to help redefine programmability in the XPU era, as well as powerful APIs, a low-level hardware interface, and more, to assist in coding for diverse user workloads:
- Cross-architecture language: oneAPI’s Data Parallel C++ (DPC++) is a standards-based, cross-architecture language that evolves ISO C++ for productivity. It incorporates Khronos SYCL to support data parallelism and heterogeneous programming. DPC++ also supports OpenMP and Python extensions for HPC developers. In addition, programmers can use Codeplay’s open source DPC++ compiler for Nvidia GPUs today.
- Domain-specific libraries: Intel has a history of building leading performance-optimized libraries for CPUs. With oneAPI, we’re broadening our domain-specific libraries, now in beta, with cross-architecture support and other new capabilities. Two examples: the oneAPI Math Kernel Library (oneMKL) offers developers math routines to optimize applications for Intel CPUs and GPUs, and the oneAPI Deep Neural Network Library (oneDNN) provides performance-optimized building blocks that help deep learning frameworks run faster.
- Level Zero (low-level hardware abstraction): This low-latency scheduling and management layer gives tool developers and other hardware vendors a consistent cross-architecture runtime layer. Intel has also established open source projects for most of the oneAPI elements, making it easy for others in the industry to leverage oneAPI in their projects.
Tools and Compilers Ease Porting Process
Additionally, Intel created its reference implementation of oneAPI as a set of toolkits, which provide common tools that work for DPC++ (and other languages).
- Code Migration tool: To ease the porting of developers’ CUDA code, we offer the Intel® DPC++ Compatibility Tool as part of the Intel® oneAPI Base Toolkit. The tool helps simplify the migration of existing CUDA code to DPC++. Usage so far indicates that the tool can automatically migrate about 80-90% of existing CUDA code. Once developers port the remainder by hand and migration is complete, the code can run on oneAPI-supported hardware.
- Compilers: The Intel® oneAPI DPC++ Compiler helps optimize code for CPU, GPU, and FPGA architectures, making it easier for developers to target different hardware across the Intel portfolio.
- Advanced analysis and debug tools: Intel also provides Intel® Advisor for modeling vectorization, threading, and compute offload and for roofline analysis; Intel® VTune™ Profiler for system, performance, and memory analysis; and an Intel-optimized GDB debugger, HPC cluster tools, and more to help developers optimize applications.
You can read more about the full breadth of the oneAPI beta toolset on Intel’s website.
These oneAPI components illustrate Intel’s commitment to a “software first” strategy for heterogeneous computing. Through all these complementary elements of oneAPI, developers have additional resources to help them innovate, while shortening the time needed to code applications optimized for multiple architectures.
Intel’s oneAPI beta toolkits are available for download. Alternatively, developers who wish to test their applications and workloads can visit the Intel® DevCloud for oneAPI to experiment for free with several Intel architectures, including Intel® Xeon® Scalable processors, Intel® Xeon® processors with Intel® Processor Graphics (GPUs), and FPGAs. After a one-minute sign-up process, Intel DevCloud for oneAPI lets developers target all of these XPU architectures with a one-line code change. DevCloud requires no installation, setup, configuration, extra hardware, or downloads.
We appreciate all the feedback we’ve received from developers since oneAPI’s announcement last November. We look forward to more input this year and will continue refining oneAPI elements to meet the needs of our developer ecosystem.
In the coming weeks, I’ll post the third blog in this series, discussing details of our recently announced Xe architecture-based GPUs. I will also touch on the broad range of Intel technologies coming together to enable the first U.S. exascale system, at Argonne National Laboratory.
Notices and Disclaimers
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.
No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software or service activation.
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Patricia (Trish) A. Damkroger
Vice President and General Manager, High Performance Computing Organization, Data Platforms Group, Intel