Andrew Richards, CEO and Founder of Codeplay Software, presents the “Programming Techniques for Implementing Inference Software Efficiently” tutorial at the May 2018 Embedded Vision Summit.
When writing software to deploy deep neural network inferencing, developers are faced with an overwhelming range of options, from a custom-coded implementation of a single model to using a deep learning framework like TensorFlow or Caffe. If you custom code your own implementation, how do you balance the competing needs of performance, portability and capability? If you use an off-the-shelf framework, how do you get good performance? Andrew and his company have been building and standardizing developer tools for GPUs and AI accelerators for over 15 years.
This talk will explore the approaches available for implementing deep neural networks in software, from the low-level details of how to map software to the highly parallel processors needed for AI all the way up to major AI frameworks. This will start with the LLVM compiler chain used to compile for most GPUs, through the OpenCL, HSA and SYCL programming standards (including how they compare with CUDA), all the way up to TensorFlow and Caffe and how they affect the key metrics like performance.