This blog post was originally published at Intel's website. It is reprinted here with the permission of Intel.
Are you a data scientist who wants to optimize the performance of your machine learning (ML) inference workloads? Perhaps you’ve heard of the Intel® Optimization for TensorFlow* and the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), but have not yet seen a working application in your domain that takes full advantage of Intel’s optimizations. The Model Zoo for Intel Architecture is an open-source collection of optimized machine learning inference applications that demonstrates how to get the best performance on Intel platforms. The project contains more than 20 pre-trained models, benchmarking scripts, best practice documents, and step-by-step tutorials for running deep learning (DL) models optimized for Intel® Xeon® Scalable processors.
With the Model Zoo, you can easily:
- Learn which AI topologies and applications Intel has optimized to run on its hardware
- Benchmark the performance of optimized models on Intel hardware
- Get started efficiently running optimized models in the cloud or on bare metal
What’s in Version 1.3
The latest release of the Model Zoo features optimized models for the TensorFlow* framework and benchmarking scripts for both 32-bit floating point (FP32) and 8-bit integer (Int8) precision. Most commercial DL applications today use FP32 precision for training and inference, though 8-bit multipliers can be used for inference with minimal to no loss in accuracy. The Int8 models were created using post-training quantization techniques for reduced model size and lower latency.
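The specific quantization tools used to produce the Model Zoo's Int8 models are covered in the documentation section below. Purely as an illustration of the idea behind post-training quantization (a minimal NumPy-only sketch, not Intel's tooling; the function names are made up for the example), an FP32 tensor can be mapped onto the Int8 range with a scale and zero point:

```python
import numpy as np

def quantize_int8(x_fp32):
    """Affine post-training quantization of an FP32 tensor to Int8 (illustrative only)."""
    x_min, x_max = float(x_fp32.min()), float(x_fp32.max())
    scale = (x_max - x_min) / 255.0 if x_max > x_min else 1.0
    zero_point = int(round(-128 - x_min / scale))        # maps x_min to -128 and x_max to 127
    q = np.clip(np.round(x_fp32 / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an FP32 approximation; the rounding error is why accuracy is re-checked."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(3, 3).astype(np.float32)
q, scale, zp = quantize_int8(weights)
print(np.abs(weights - dequantize(q, scale, zp)).max())  # small reconstruction error
```

Production post-training quantization also calibrates activation ranges on sample data and folds the scales into the graph, which is what the ResNet50 quantization tutorial mentioned later walks through.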
Use Case | FP32 TensorFlow Models |
---|---|
Adversarial Networks | DCGAN |
Content Creation | DRAW |
Face Detection and Alignment | FaceNet, MTCC |
Image Recognition | Inception ResNet V2, Inception V3, Inception V4, MobileNet V1, ResNet 101, ResNet 50, SqueezeNet |
Image Segmentation | Mask R-CNN, UNet |
Language Translation | GNMT, Transformer-LT |
Object Detection | Faster R-CNN, R-FCN, SSD-MobileNet, SSD-ResNet34 |
Recommendation Systems | NCF, Wide & Deep |
Text-to-Speech | WaveNet |
Use Case | Int8 TensorFlow Models |
---|---|
Image Recognition | Inception ResNet V2, Inception V3, Inception V4, ResNet 101, ResNet 50 |
Object Detection | Faster R-CNN, R-FCN, SSD-MobileNet |
Recommendation Systems | Wide & Deep |
You can run benchmarks by cloning the repository and following the step-by-step instructions for your topology of choice. Each model’s benchmark README contains detailed information for downloading a pre-trained model from a public cloud storage location, acquiring a test dataset, and launching the model’s benchmarking script. The benchmarking scripts are designed to run by default in a containerized environment using Intel-optimized TensorFlow Docker* images, with all the necessary dependencies taken care of automatically. An alpha feature allows you to run without Docker, but in this mode you must manually set up and install all the required dependencies. Either way, the script automatically applies the optimal TensorFlow runtime settings for your Intel hardware and produces an output log describing the model’s performance metrics and the settings used. There are options for testing real-time inference (latency with batch size 1) and maximum-throughput inference (large batch size), and some scripts also offer the option of measuring accuracy.
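The exact runtime settings the script applies are described in the General Best Practices guide mentioned below. As a rough sketch of the kinds of knobs involved (the environment variables and thread counts here are illustrative examples for a TensorFlow 1.x MKL-DNN build, not the values launch_benchmark.py would necessarily choose for your machine), a manually tuned inference script typically looks like this:

```python
import os
import tensorflow as tf  # assumes an Intel-optimized TensorFlow 1.x build

# OpenMP/KMP settings commonly recommended for MKL-DNN builds of TensorFlow.
# The numbers are placeholders; match them to the physical cores on your socket.
os.environ["OMP_NUM_THREADS"] = "28"
os.environ["KMP_BLOCKTIME"] = "1"
os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"

config = tf.ConfigProto()
config.intra_op_parallelism_threads = 28   # parallelism within a single op (e.g. one matmul)
config.inter_op_parallelism_threads = 1    # parallelism across independent ops

with tf.Session(config=config) as sess:
    # ... load the frozen graph (the --in-graph .pb file) and run inference here ...
    pass
```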
Example: FP32 Inception V3 Benchmarks
To show benchmarking in action, the following steps are reproduced from the FP32 Inception V3 benchmark README and adjusted for brevity:
- Clone the intelai/models repository.

$ git clone https://github.com/IntelAI/models.git

- Download the pre-trained model.

$ wget https://storage.googleapis.com/intel-optimized-tensorflow/models/inceptionv3_fp32_pretrained_model.pb

- If you would like to run Inception V3 FP32 inference and test for accuracy, you will need the ImageNet dataset. Benchmarking for latency and throughput does not require the ImageNet dataset and will use synthetic/dummy data if none is provided. Instructions for downloading the dataset and converting it to the TF Records format can be found in the TensorFlow documentation.

- Navigate to the benchmarks directory in your local clone of the intelai/models repo. The launch_benchmark.py script in that directory starts a benchmarking run in an optimized TensorFlow Docker container, and it has arguments for specifying the model, framework, mode, precision, and Docker image. Run it first for latency and then for throughput, as shown below; a quick check of how the reported numbers relate follows after the example logs.
For latency (using --batch-size 1):

python launch_benchmark.py \
--model-name inceptionv3 \
--precision fp32 \
--mode inference \
--framework tensorflow \
--batch-size 1 \
--socket-id 0 \
--docker-image intelaipg/intel-optimized-tensorflow:latest-devel-mkl \
--in-graph /home/<user>/inceptionv3_fp32_pretrained_model.pb

Example log tail:
Inference with dummy data.
Iteration 1: 1.075 sec
Iteration 2: 0.023 sec
Iteration 3: 0.016 sec
…
Iteration 38: 0.014 sec
Iteration 39: 0.014 sec
Iteration 40: 0.014 sec
Average time: 0.014 sec
Batch size = 1
Latency: 14.442 ms
Throughput: 69.243 images/sec
Ran inference with batch size 1
Log location outside container: {--output-dir value}/benchmark_inceptionv3_inference_fp32_20190104_025220.log

For throughput (using --batch-size 128):
python launch_benchmark.py \
--model-name inceptionv3 \
--precision fp32 \
--mode inference \
--framework tensorflow \
--batch-size 128 \
--socket-id 0 \
--docker-image intelaipg/intel-optimized-tensorflow:latest-devel-mkl \
--in-graph /home/<user>/inceptionv3_fp32_pretrained_model.pb

Example log tail:
Inference with dummy data.
Iteration 1: 2.024 sec
Iteration 2: 0.765 sec
Iteration 3: 0.781 sec
…
Iteration 38: 0.756 sec
Iteration 39: 0.760 sec
Iteration 40: 0.757 sec
Average time: 0.760 sec
Batch size = 128
Throughput: 168.431 images/sec
Ran inference with batch size 128
Log location outside container: {--output-dir value}/benchmark_inceptionv3_inference_fp32_20190104_024842.log
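As a quick sanity check on the two log tails above: the reported throughput is simply the batch size divided by the average iteration time, so the latency and throughput figures are consistent with each other.

```python
# Relationship between the reported metrics: throughput = batch_size / average_time.
batch_1_avg_sec = 0.014442        # batch-size-1 run: latency reported as 14.442 ms
print(1 / batch_1_avg_sec)        # ~69.2 images/sec, matching "Throughput: 69.243"

batch_128_avg_sec = 0.760         # batch-size-128 run: "Average time: 0.760 sec"
print(128 / batch_128_avg_sec)    # ~168.4 images/sec, close to "Throughput: 168.431"
```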
Documentation
In addition to benchmarking scripts and instructions for each model, the repository contains a documentation section with best practice guides for achieving maximum performance with Intel Optimization for TensorFlow and TensorFlow Serving. These are the best resources for in-depth knowledge about installing and tuning the frameworks for optimal performance on Intel hardware. Included in the documentation are hands-on tutorials for a selection of models in the Model Zoo and a tutorial on how to quantize the FP32 ResNet50 model to Int8 precision for improved performance while retaining high accuracy. Here is a sample of the documents found in v1.3 (see the documentation README for a full list):
Intel Optimization for TensorFlow
- General Best Practices
- Tutorial: Image Recognition with ResNet50, InceptionV3, and ResNet101
- Tutorial: Recommendation Systems with Wide & Deep
- Tutorial: Optimization and Int8 Quantization with ResNet50
Intel Optimization for TensorFlow Serving
- General Best Practices
- Tutorial: Image Recognition with ResNet50 and InceptionV3
- Tutorial: Object Detection with R-FCN
What’s Next?
Future releases of the Model Zoo will add more Int8 precision models and more hands-on tutorials covering additional models for TensorFlow, TensorFlow Serving, and the Int8 quantization process. We are also working on expanding the Model Zoo to include additional frameworks, as well as benchmarking scripts that cover training in addition to inference and accuracy.
Visit the project on GitHub for more information and instructions on getting started.
Melanie Hart Buehler
Cloud Software Engineer, Artificial Intelligence Products Group, Intel