This blog post is an abridged version of the Gyrfalcon white paper “AI-Powered Camera Sensors”.
Computing at the Edge: Smart Cameras, Robotic Vehicles and End-Point Devices
Visual data has grown enormously in volume, and Artificial Intelligence (AI) is transforming overwhelming amounts of video into timely and actionable intelligence at an unprecedented rate. AI-powered cameras at the edge enable smartphone, automotive, computing, industrial, and IoT devices to redefine the way they process, restore, enhance, analyze, search, and share video and images. On-device AI-camera sensor co-processor chips, with their built-in processing power and memory, allow machine- and human-vision applications to operate faster, more energy-efficiently, more cost-effectively, and more securely, without sending any data to remote servers.
Over the past few years, quality mobile cameras have proliferated in devices ranging from smartphones and surveillance devices to robotic vehicles, including autonomous cars. These have all benefited from the integration of AI and image signal processing (ISP) engines. Machine Learning (ML) is used not only to enhance the quality of the video and images captured by cameras, but also to understand video content much as a human does, by detecting, recognizing, and classifying objects, events, and even actions in a frame.
Demand for edge AI chipsets for on-device machine-vision and human-viewing applications is mostly driven by smartphones, robotic vehicles, automotive, consumer electronics, mobile platforms, and similar edge-server markets. Smartphones and automotive are the dominant drivers because they show the fastest growth and the largest shipment volume and revenue in edge vision computing. The mobile phone market segment alone is forecast to account for over 50% of the 2025 global edge AI chipset market, according to OMDIA | TRACTICA.
An AI-powered camera sensor is a new technology that manufacturers such as Sony, Google, Apple, Samsung, Huawei, Honor, Xiaomi, Vivo, Oppo, and others are integrating into each new smartphone they launch. Building AI-equipped cameras involves applying technologies ranging from traditional image signal processing (ISP) techniques to modern computer vision and deep learning networks. ISPs typically perform image enhancement and also convert the one-color-component-per-pixel output of a raw image sensor into the RGB or YUV images that are more commonly used elsewhere in the system.
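The raw-to-RGB and RGB-to-YUV conversion described above can be illustrated with a minimal sketch. It assumes an RGGB Bayer layout and BT.601 coefficients, neither of which is specified in the white paper, and it collapses each 2x2 block into a single pixel rather than interpolating to full resolution as a production ISP would. The function names are illustrative only.

```python
def demosaic_rggb_half(raw):
    """Collapse each 2x2 RGGB block of a Bayer mosaic into one RGB pixel.

    raw: list of rows of sensor values; height and width must be even.
    Returns an image of half the width and height, as rows of (R, G, B).
    """
    height, width = len(raw), len(raw[0])
    if height % 2 or width % 2:
        raise ValueError("expected even dimensions for a 2x2 Bayer tiling")
    rgb = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            r = raw[y][x]                              # top-left: red
            g = (raw[y][x + 1] + raw[y + 1][x]) / 2    # average of two greens
            b = raw[y + 1][x + 1]                      # bottom-right: blue
            row.append((r, g, b))
        rgb.append(row)
    return rgb


def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV (BT.601 coefficients, an assumption)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


# A tiny 2x4 mosaic: two RGGB blocks side by side.
raw = [
    [200,  90, 180, 100],
    [110,  40, 120,  60],
]
rgb = demosaic_rggb_half(raw)
yuv = [[rgb_to_yuv(*px) for px in row] for row in rgb]
```

Each sensor pixel records only one color, so the sketch recovers the other two from its neighbors in the same block; the luma (Y) channel then carries brightness while U and V carry color differences.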
An ISP combined with an AI-based computer vision processor can deliver more robust image and computer vision processing capabilities than a standalone ISP. Traditionally, ISPs are tuned to process images intended for human viewing. However, to handle both machine-vision and human-vision applications, a functional shift is required so that traditional and deep learning-based computer vision algorithms can both be executed efficiently and effectively.
Today, many AI-based camera applications rely on sending images and videos to the cloud for analysis, which makes data processing slow and insecure. Additionally, manufacturers have to install specialized DSP or GPU processors on devices to handle the extra computational demand. A more streamlined solution for vision edge computing is to use dedicated, low-power, high-performance AI processor chips capable of running deep-learning algorithms for image quality enhancement and analysis on the device. One such solution is Gyrfalcon Technology’s AI co-processor chips.
Human-like Senses
The ultimate purpose of an AI-based camera is to mimic the human eyes and brain and to make sense of what the camera sees through artificial intelligence. AI-equipped camera modules offer distinct advantages over standard cameras by capturing enhanced images and also performing image analysis, content-aware processing, and event/pattern recognition, all in one compact system. AI-powered cameras can turn your smartphone snapshots into DSLR-quality photos.
The need for AI on edge devices has been recognized, and the race to design integrated, edge-optimized chipsets has begun. AI processing on the edge device, particularly AI vision computing, circumvents privacy concerns while avoiding the speed, bandwidth, latency, power consumption, and cost concerns of cloud computing. As shipments of AI-equipped devices increase rapidly, along with the demand for higher compute, so does the need for AI acceleration chips at the edge.
Mobile cameras equipped with AI capabilities can now capture spectacular images that rival those of high-end DSLR cameras. However, due to the compact form factor of edge and mobile devices, smart cameras are unable to carry large image sensors or lenses. This challenge compels manufacturers to push computational image processing technology to the next level, boosting image quality through the joint design of image capture, image reconstruction, and image analysis techniques. The arrival of AI and deep learning has provided an alternative image processing strategy for both image quality enhancement and machine-vision applications such as object detection and recognition, content analysis and search, and computational image processing.
Deep Learning
Deep learning (DL) is a branch of machine learning that aims to learn hierarchical representations of data. DL has shown clear advantages over other machine learning algorithms in many artificial intelligence domains, such as computer vision, speech recognition, and natural language processing. Generally, the strong capability of DL to handle large amounts of unstructured data is attributed to three contributors: (1) the development of efficient computing hardware, (2) the availability of massive amounts of data, and (3) the advancement of sophisticated algorithms.
Because of low resolution, inaccurate equipment, or severe weather and environmental conditions, captured images are subject to low quality, mosaicing, and noise artifacts that degrade the quality of information. On-device super-resolution (SR), demosaicing, denoising, and high dynamic range (HDR) procedures are often added to CMOS sensors to enhance image quality by deploying sophisticated neural network algorithms on an integrated high-performance, cost-effective, and energy-efficient AI co-processor chip.
An intelligent image sensor in an AI camera can process, enhance, reconstruct, and analyze captured images and videos by incorporating not only a traditional ISP engine but also emerging deep learning-based machine vision networks into the sensor itself, according to the Edge AI and Vision Alliance.
A high-performance neural network accelerator chip is a compelling candidate to combine with image signal processing functions that were historically handled by a standalone ISP. The output of the CMOS sensor can be pre-processed by an ISP to rectify lens distortion, apply pixel and color corrections, and reduce noise before being routed to a deep learning vision processor for further analysis.
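The ordering described above, ISP-style pre-processing followed by a hand-off to a vision stage, can be sketched as follows. This is a simplified illustration under stated assumptions, not Gyrfalcon’s pipeline: the black level of 16, the 3x3 median filter, and every function name are hypothetical, lens distortion correction is omitted, and `mean_brightness` merely stands in for a deep learning vision processor.

```python
from statistics import median


def correct_pixels(image, black_level=16, gain=1.0, max_value=255):
    """Subtract the sensor black level, apply a gain, and clamp to range."""
    return [
        [min(max_value, max(0, (p - black_level) * gain)) for p in row]
        for row in image
    ]


def median_denoise(image):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(median(window))
        out.append(row)
    return out


def run_pipeline(raw, analyze):
    """Run ISP-style pre-processing first, then hand off to the vision stage."""
    cleaned = median_denoise(correct_pixels(raw))
    return cleaned, analyze(cleaned)


def mean_brightness(image):
    """Hypothetical stand-in for a deep learning vision processor."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


raw = [
    [66, 66, 66],
    [66, 255, 66],   # one "hot" pixel in a flat grey patch
    [66, 66, 66],
]
cleaned, score = run_pipeline(raw, mean_brightness)
```

In this toy input the median filter removes the single hot pixel, so the analysis stage sees a uniform patch rather than a spurious bright spot.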
These emerging intelligent sensors not only capture light, but they also capture the details, meaning, scene understanding, and information from the light in front of them.
Edge Co-processing
An AI-powered camera using a dedicated co-processor chip, such as Gyrfalcon’s, with innovative deep learning algorithms can deliver a vision-based solution with high performance, power efficiency, cost-effectiveness, and scalability for intelligent CMOS sensors, particularly in the fast-growing and dominant markets of smartphones and automotive. A sophisticated ISP pipeline can be replaced with a single end-to-end deep learning model trained without any prior knowledge about the sensor and optics used in a particular device.
An AI image co-processor chip with a deep-learning CNN architecture and multi-scale, multi-mode super-resolution (SR) capabilities can support various upscaling factors, image sizes, and quantization levels, while operating in different image enhancement modes depending on the target application and performance requirements. These capabilities can include multi-scale Super-Resolution/Zoom (SR Zoom), multi-type High Dynamic Range (HDR), AI-based or pre-processing-based denoising algorithms, or a combination of these functions.
An AI-powered camera module with an integrated image co-processor chip can generate 4K ultra-high-definition (UHD) video at high frame rates with a higher peak signal-to-noise ratio (PSNR), superior visual quality, and lower cost compared with conventional leading CNN-based SR processors.
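PSNR, the quality metric mentioned above, is computed from the mean squared error between a reference image and a processed one: PSNR = 10 log10(MAX² / MSE), where MAX is the largest possible pixel value. The following is a minimal sketch assuming 8-bit grayscale images stored as nested lists; the sample pixel values are made up for illustration and are not results from the white paper.

```python
import math


def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    ref = [p for row in reference for p in row]
    out = [p for row in test for p in row]
    if len(ref) != len(out):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf   # identical images
    return 10 * math.log10(max_value ** 2 / mse)


reference = [[100, 110], [120, 130]]
upscaled = [[101, 109], [121, 129]]   # every pixel off by one level
print(round(psnr(reference, upscaled), 2))   # prints 48.13
```

A higher PSNR means the processed image is numerically closer to the reference, although it does not always track perceived visual quality.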
The emerging trend in smart CMOS image sensor technology is to merge ISP functionality and a deep learning network processor into a unified end-to-end AI co-processor. An AI image co-processor can be integrated into a camera module and use raw data directly from the sensor output to produce DSLR-quality images as well as highly accurate computer vision results.
Having a dedicated AI image co-processor on the device offers numerous benefits, including enhanced vision quality, higher performance, improved privacy, reduced bandwidth and latency, lower CPU computational load, efficient energy use, and lower bill-of-materials (BOM) cost for running critical vision applications in real time, always on, anywhere, independent of an Internet connection.
Manouchehr Rafie, Ph.D.
Vice President of Advanced Technologies, Gyrfalcon Technology
About the Author
Dr. Rafie is the Vice President of Advanced Technologies at Gyrfalcon Technology Inc. (GTI), where he is driving the company’s advanced technologies in the convergence of deep learning, AI Edge computing, and visual data analysis. He also serves as co-chair of the emerging Video Coding for Machines (VCM) standards effort at MPEG. Prior to joining GTI, Dr. Rafie held executive and senior technical roles in various startups and large companies, including VP of Access Products at Exalt Wireless, Group Director and fellow-track positions at Cadence Design Services, and adjunct professor at UC Berkeley. He has over 90 publications and has served as chairman, lecturer, and editor in a number of technical conferences and professional associations worldwide.