This blog post was originally published at Macnica’s website. It is reprinted here with the permission of Macnica.
The introduction of artificial intelligence (AI) has ushered in exciting new applications for surveillance cameras and other devices with embedded vision technology. Generative AI (gen AI) tools such as ChatGPT and Midjourney are augmenting computer vision-based results. At the same time, gen AI-based vision language models (VLMs) and system-on-chip (SoC) hardware are making edge-based embedded vision deployment a reality.
The industry is witnessing practical applications such as security cameras that answer natural-language questions like "Did Amazon deliver a package today?" and smart construction-site camera systems that improve security and safety while performing real-time assessments of assets, such as confirming that construction employees are wearing hard hats onsite.
For these applications to become mainstream, camera and device manufacturers must combine gen AI with advanced hardware features to overcome the limitations of traditional embedded vision. With the right tools and insights, they can enable advanced capabilities on edge devices with lower costs, reduced power consumption, and less reliance on cloud services.
Companies that manufacture products with embedded vision are often challenged by traditional computer vision techniques’ performance and cost boundaries that limit their ability to plan product roadmaps beyond the status quo. These challenges encompass various aspects, from raw computing bandwidth to cloud deployment constraints, creating interrelated issues that cause manufacturers to pause their plans.
But by combining gen AI with hardware built for edge-based embedded vision, manufacturers can move past these boundaries and plan roadmaps beyond the status quo.
Here’s a look at key challenges manufacturers face when planning products with next-generation embedded vision and some tips to help eliminate the barriers to embedded vision with edge AI. For a deeper dive into the topic, download and read our recent white paper: Edge and Generative AI for Embedded Vision: Breaking Through Performance and Cost Barriers.
Requirements for Edge-based, AI-embedded Vision
Edge-based embedded vision refers to computer or machine vision algorithms running on edge devices rather than on remote servers or cloud infrastructure. It enables visual data to be analyzed closer to where it is captured, on devices such as cameras, industrial sensors, drones, and cars.
Getting embedded vision right isn't easy. It takes knowledge and special skills to ensure advanced imaging and vision solutions connect to other components using one of several competing interface standards. These requirements include:
- Hardware design skills – to perform complex computations locally without exceeding the power and thermal requirements of the overall system. This typically requires some combination of CPU, signal processor, memory, and video interfaces to perform operations like on-chip video encoding and application-specific tasks.
- Vision algorithms – to extract meaningful information from visual data efficiently, enabling common tasks such as object detection, image classification, facial recognition, and anomaly detection.
- AI image processing – to perform advanced computer vision tasks, such as real-time image stabilization, low-light processing, and temporal filtering within the constraints of modern computing and storage capabilities.
- Integrated and optimized models – to work in the ecosystem of the embedded system. This includes quantization, pruning, and distillation to improve model performance and efficiency.
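To make the last bullet concrete, below is a minimal sketch of post-training quantization, one of the model-optimization techniques named above. It is a generic illustration, not any specific vendor's toolchain: it compresses a float32 weight tensor to int8 with a symmetric per-tensor scale, trading a small accuracy loss for a 4x reduction in storage and cheaper integer arithmetic on edge hardware.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric post-training quantization of a weight tensor to int8."""
    scale = np.abs(weights).max() / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor for accuracy checks."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, s = quantize_int8(w)
err = float(np.abs(dequantize(q, s) - w).max())  # bounded by half the scale step
```

Production flows (for example, per-channel scales, calibration datasets, or quantization-aware training) build on this same idea while recovering more accuracy.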
When embedded vision applications efficiently capture, process, and act on visual data, companies can make better decisions based on real-time data and drive more value. A few milliseconds can be the difference between an industrial robot avoiding a moving obstacle and hitting it.
Primary Challenges
With these skills, tools, and knowledge, manufacturers can develop embedded vision solutions at the edge and overcome the four challenges below.
1. Real-time Image Processing Demands
Real-time image processing requirements expand with every new product, driven by the capabilities of high-resolution sensors and the market demand for advanced features. A single 4K camera stream operating at 30 frames per second (4Kp30) generates approximately 746 MB/s of raw data – before the computational overhead of computer vision algorithms and AI image processing.
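The 746 MB/s figure follows directly from the stream parameters, assuming uncompressed 8-bit RGB (3 bytes per pixel) at full 4K UHD resolution:

```python
# Raw data rate of a 4Kp30 stream, assuming uncompressed 8-bit RGB.
width, height = 3840, 2160   # 4K UHD resolution
bytes_per_pixel = 3          # 8 bits per channel x 3 channels (RGB)
fps = 30

rate_bytes = width * height * bytes_per_pixel * fps  # bytes per second
rate_mb = rate_bytes / 1e6                           # ~746.5 MB/s
```

Raw sensor output formats (e.g., 10- or 12-bit Bayer) change the per-pixel byte count, but the order of magnitude stays the same, which is why on-chip encoding and efficient pipelines matter so much.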
AI models often contain millions of parameters, and their computations require substantial resources and software optimizations. This challenge is compounded further when dealing with multiple video streams, which are common in applications like industrial automation and security systems.
Helpful Tip: Image processing pipelines must be designed to handle challenging environmental conditions, varying lighting situations, and complex scene dynamics. The choice of sensor plays a critical role in computational efficiency. For example, selecting a Sony CMOS sensor with built-in HDR and noise reduction capabilities reduces resource requirements in an already demanding processing pipeline.
2. Edge Device Implementation Hurdles
The vision hardware ecosystem has traditionally been a limiting factor in deploying edge-based embedded vision solutions. Implementing AI-intensive workloads can push power consumption and thermal management beyond the limits of many edge devices. These constraints are particularly problematic in applications like outdoor surveillance systems and drones, where systems must operate reliably on battery power under varying scene conditions.
Form factor constraints further complicate implementation. Many applications require compact cameras and processing components that can be deployed in space-constrained environments. This impacts the physical size of the vision system hardware and the design options for heat management and power delivery.
Helpful Tip: Overcoming these challenges requires a holistic approach that balances physical constraints with powerful processing capabilities. Energy-efficient hardware (e.g., ASICs, MPUs, SoCs, or FPGAs with power gating) paired with robust thermal management addresses battery and size limitations, while model optimization techniques (pruning, quantization, dynamic voltage and frequency scaling) further reduce power draw. By adopting this end-to-end strategy, teams can deploy AI-intensive workloads on compact, energy-conscious devices without sacrificing performance or accuracy.
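As a minimal sketch of one technique mentioned in the tip above, the following illustrates unstructured magnitude pruning: zeroing out the smallest-magnitude weights so sparse kernels can skip them. This is a generic example under the stated sparsity assumption, not a drop-in replacement for a framework's pruning API.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.5):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(weights.size * sparsity)
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k)[k]
    mask = np.abs(weights) >= threshold  # keep only weights at/above threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128)).astype(np.float32)
pruned, mask = magnitude_prune(w, sparsity=0.5)
```

In practice, pruning is followed by fine-tuning to recover accuracy, and structured variants (removing whole channels or filters) map better onto many edge accelerators.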
3. Considering Cloud Deployments as the Default
Despite advances in edge device designs, many engineers and product leaders believe that AI-intensive workloads can only run on cloud infrastructures.
Helpful Tip: While cloud computing offers virtually unlimited processing power, it's worth considering edge-based embedded vision devices to overcome latency issues, limited cloud network bandwidth and throughput, the operating costs of cloud-based workloads, and reliance on specific cloud providers that can lead to vendor lock-in.
4. Underestimating the Roles of Strategic and Technical Partnerships
The history of computer vision shows that the successful implementation of advanced systems requires deep expertise across multiple domains. AI-based embedded vision is no different. The complexity of these systems makes it easy to underestimate the level of cooperation and intellectual property knowledge required for successful deployment.
General-purpose processors struggle to meet performance requirements while staying within power, thermal, and space constraints. Custom hardware solutions from a single vendor often come with prohibitive costs or development timelines that make them impractical for product roadmaps.
Helpful Tip: Realistic and achievable product development requires a multidisciplinary approach, with skills in hardware design, AI algorithms, image processing, VLM integration, and application-specific requirements. Few organizations possess all of this expertise in-house, which is why choosing the right multi-vendor partners for the application is critical.
Macnica teams with companies such as Sony Semiconductor, iENSO, Infineon, and Ambarella to help organizations achieve embedded vision on edge devices by converging two technologies: gen AI-based VLMs and purpose-built System-on-Chip (SoC) hardware.
This multidisciplinary collaboration ensures manufacturers get a range of video encoding, computer vision, and AI image processing options to improve image quality and reliability, reduce bandwidth requirements and power consumption, and enhance data privacy and security, among other benefits.
To learn more and dive deeper into embedded vision on edge devices, check out our white paper: Edge and Generative AI for Embedded Vision: Breaking Through Performance and Cost Barriers.
Sebastien Dignard
President, Macnica Americas