Edge AI and Vision Industry Terminology

The edge AI and vision industry uses many terms, sometimes inconsistently, to describe its technologies and products.

The list that follows is the Edge AI and Vision Alliance’s attempt to bring consistency to this terminology, by providing a short definition for each word or expression, along with a link to additional information. If you notice any missing terms or have suggestions to improve an existing term’s name and/or definition, please send an email to the Edge AI and Vision Alliance with your input.

2-D Sensor: An image sensor that discerns the horizontal and vertical location of objects in front of it, but not their distance from it. For more information, see “Selecting and Designing with an Image Sensor: The Tradeoffs You’ll Need to Master“.

3-D Sensor: An image sensor that discerns not only objects’ horizontal and vertical locations, but also their distance (i.e. depth) from it, by means of techniques such as stereo sensor arrays, structured light, or time-of-flight. For more information, see “Image sensors evolve to meet emerging embedded vision needs“.

4-D Sensor: See Plenoptic Camera

Active-Pixel Sensor: An APS, also commonly known as a CMOS sensor, is an image sensor consisting of an array of pixels, each containing a photo detector and an active amplifier. An APS is typically fabricated on a conventional semiconductor process, unlike the CCD. For more information, see “Selecting and Designing with an Image Sensor: The Tradeoffs You’ll Need to Master“.

Adaptive Cruise Control: An ADAS system that dynamically varies an automobile’s speed in order to maintain an appropriate distance from vehicles ahead of it. For more information, see “An Introduction to the Market for Embedded Vision in Automotive Driver Assistance Applications“.

ADAS: Advanced Driver Assistance Systems, an “umbrella” term for the various technologies that assist a driver in navigating a vehicle. Examples include:

  • In-vehicle navigation with up-to-date traffic information
  • Adaptive cruise control
  • Lane departure warning
  • Lane change assistance
  • Collision avoidance
  • Intelligent speed adaptation/advice
  • Night vision
  • Adaptive headlight control
  • Pedestrian protection
  • Automatic parking (or parking assistance)
  • Traffic sign recognition
  • Blind spot detection
  • Driver drowsiness detection
  • Inter-vehicular communications, and
  • Hill descent control

For more information, see “An Introduction to the Market for Embedded Vision in Automotive Driver Assistance Applications“.

Algorithm: A method, expressed as a list of instructions, for calculating a function. Beginning with an initial state and initial input, the instructions describe a computation that, when executed, will proceed through a finite number of defined states, eventually producing an output and terminating at a final state. For more information, see Wikipedia’s entry.

Application Processor: A highly integrated system-on-chip, typically comprising a high-performance CPU core and a constellation of specialized co-processors, which may include a DSP, a GPU, a video processing unit (VPU), an image acquisition processor, etc. The specialized co-processors found in application processors are usually not user-programmable, which limits their utility for vision applications. For more information, see “Implementing Vision Capabilities in Embedded Systems“.

Augmented Reality: A live view of a physical, real-world environment whose elements are augmented by computer-generated sensory input such as sound, video, graphics or GPS data. The technology functions by enhancing one’s current perception of reality. In contrast, virtual reality replaces the real world with a simulated one. For more information, see Wikipedia’s entry.

Analytics: The discovery, analysis and reporting of meaningful patterns in data. With respect to embedded vision, the data input consists of still images and/or video frames. For more information, see Wikipedia’s entry.

API: Application Programming Interface, a specification intended for use as an interface to allow software components to communicate with each other. An API is typically source code-based, unlike an ABI (Application Binary Interface) which, as its name implies, is a binary interface. For more information, see Wikipedia’s entry.

Background Subtraction: A computer vision technique that extracts the foreground objects of a scene by comparing each new frame against a model of the scene’s static background, in order to improve the subsequent analysis of those objects. For more information, see these Google search results.
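
By way of illustration (not taken from the material referenced above), the following minimal Python/OpenCV sketch applies the library’s MOG2 background subtractor to a video stream; the file name and parameter values are placeholders.

    import cv2

    # Hypothetical input file; any video of a mostly static scene will do.
    cap = cv2.VideoCapture("traffic.mp4")
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Pixels that differ from the learned background model become foreground (255).
        fg_mask = subtractor.apply(frame)
        cv2.imshow("foreground", fg_mask)
        if cv2.waitKey(30) & 0xFF == 27:  # Esc to quit
            break

    cap.release()
    cv2.destroyAllWindows()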

Barrel Distortion: An optical system distortion effect that causes objects to become “spherised” or “inflated”, i.e. resulting in the bulging outward of normally straight lines at image margins. Such distortion is typically caused by wide-angle lenses, such as the fisheye lenses commonly found in automotive backup cameras. Embedded vision techniques can be used to reduce or eliminate barrel distortion effects. For more information, see “Lens Distortion Correction“.

Bayer Pattern: A common color filter pattern used to extract chroma information from a nominally monochrome photo detector array, via filters placed in front of the image sensor. The Bayer Pattern contains twice as many green filters as either red or blue filters, mimicking the physiology of the human eye, which is most sensitive to green-frequency light. Interpolation generates an approximation of the remainder of each photo detector’s full color spectrum. For more information, see  “Selecting and Designing with an Image Sensor: The Tradeoffs You’ll Need to Master“.
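
As a hedged illustration of the interpolation (“demosaicing”) step, the sketch below converts a raw Bayer-patterned frame to a full-color image with OpenCV; the input array here is synthetic, and the exact COLOR_Bayer* conversion code depends on the sensor’s actual filter arrangement.

    import cv2
    import numpy as np

    # Synthetic 8-bit raw frame standing in for data read from a Bayer-patterned sensor.
    raw = np.random.randint(0, 256, (480, 640), dtype=np.uint8)

    # Interpolate the two missing color samples at each photosite.
    # The correct conversion code (BG/GB/RG/GR) depends on the sensor's mosaic layout.
    bgr = cv2.cvtColor(raw, cv2.COLOR_BayerBG2BGR)
    print(bgr.shape)  # (480, 640, 3): full color per pixel after demosaicing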

Biometrics: The identification of humans by their characteristics or traits. Embedded vision-based biometric schemes include facial recognition, fingerprint matching, and retina scanning. For more information, see Wikipedia’s entry.

Camera: A device used to record and store still images, video, or both. Cameras typically contain several main subsystems: an optics assembly, an image sensor, and a high-speed data transfer bus to the remainder of the system. Image processing can occur in the camera, the system, or both. Cameras can also include supplemental illumination sources. For more information, see Wikipedia’s entry.

CCD: A charge-coupled device, used to store and subsequently transfer charge elsewhere for digital-value conversion and other analysis purposes. CCD-based image sensors employ specialized analog semiconductor processes and were the first technology to achieve widespread usage. They remain popular in comparatively cost-insensitive applications where high-quality image data is required, such as professional, medical, and scientific settings. For more information, see “Selecting and Designing with an Image Sensor: The Tradeoffs You’ll Need to Master“.

CImg: An open-source C++ toolkit for image processing, useful in embedded vision implementations. For more information, see the CImg Library website.

CMOS Sensor: See Active-Pixel Sensor

Collision Avoidance: An ADAS system that employs embedded vision, radar and/or other technologies to react to an object ahead of a vehicle. Passive collision avoidance systems alert the driver via sound, light, vibration of the steering wheel, etc. Active collision avoidance systems override the driver’s manual control of the steering wheel, accelerator and/or brakes in order to prevent a collision. For more information, see “An Introduction to the Market for Embedded Vision in Automotive Driver Assistance Applications“.

Computer Vision: The use of digital processing and intelligent algorithms to interpret meaning from images or video. Computer vision has mainly been a field of academic research over the past several decades. For more information, see the Wikipedia entry.

Contour: The outline of an object contained within a 2-D image; a number of algorithms exist for delineating (i.e. extracting) such contours. For more information, see the Wikipedia entry.

Core Image: The pixel-accurate non-destructive image processing technology in Mac OS X (10.4 and later) and iOS (5 and later). Implemented as part of the QuartzCore framework, Core Image provides a plugin-based architecture for applying filters and effects within the Quartz graphics rendering layer. For more information, see the Wikipedia entry.

CPU: Central Processing Unit, the hardware within a computer system which carries out program instructions by performing basic arithmetical, logical, and input/output operations of the system. Two common CPU functional units are the arithmetic logic unit (ALU), which performs arithmetic and logical operations, and the control unit (CU), which extracts instructions from memory and decodes and executes them. For more information, see “Implementing Vision Capabilities in Embedded Systems“.

CUDA: Compute Unified Device Architecture, a parallel computing “engine” developed by NVIDIA, found in graphics processing units (GPUs), and accessible to software developers through variants of industry standard programming languages. Programmers use “C for CUDA” (C with NVIDIA extensions and certain restrictions), compiled through a PathScale or Open64 C compiler, to code algorithms for execution on the GPU. AMD’s competitive approach is known as Stream. For more information, see NVIDIA’s product page.

Development Tools: Programs and/or applications that software developers use to create, debug, maintain, or otherwise support other programs and applications. Integrated Development Environments (IDEs) combine the features of many tools into one package. For more information, see the Wikipedia entry.

DirectCompute: An application programming interface (API) that supports general-purpose computing on graphics processing units on Microsoft Windows Vista and Windows 7. DirectCompute is part of the Microsoft DirectX collection of APIs and was initially released with the DirectX 11 API but runs on both DirectX 10 and DirectX 11 graphics processing units. For more information, see the Wikipedia entry.

DSP: Digital Signal Processor, a specialized microprocessor with an architecture optimized for the fast operational needs of digital signal processing. Digital signal processing algorithms typically require a large number of mathematical operations to be performed quickly and repeatedly on a set of data. Many DSP applications have constraints on latency; that is, for the system to work, the DSP operation must be completed within some fixed time, and deferred (or batch) processing is not viable. For more information, see “Implementing Vision Capabilities in Embedded Systems“.

Edge Detection: A fundamental tool in image processing, machine vision and computer vision, particularly in the areas of feature detection and feature extraction, which aim at identifying points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. For more information, see “Introduction To Computer Vision Using OpenCV“.
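
A minimal sketch of edge detection using OpenCV’s Canny detector; the file name and threshold values are illustrative choices, not prescribed by the article referenced above.

    import cv2

    image = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)       # hypothetical input image
    blurred = cv2.GaussianBlur(image, (5, 5), 0)                 # suppress noise first
    edges = cv2.Canny(blurred, threshold1=100, threshold2=200)   # hysteresis thresholds
    cv2.imwrite("edges.png", edges)  # white pixels mark sharp brightness discontinuities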

Embedded Vision: The merging of two technologies: embedded systems and computer vision. An embedded system is any microprocessor-based system that isn’t a general-purpose computer.  Computer vision is the use of digital processing and intelligent algorithms to interpret meaning from images or video. Computer vision has mainly been a field of academic research over the past several decades. Today, however, due to the emergence of very powerful, low-cost, and energy-efficient processors, it has become possible to incorporate vision capabilities into a wide range of embedded systems. For more information, see “Challenges to Embedding Computer Vision“.

Emotion Discernment: Using embedded vision image processing to discern the emotional state of a person in front of a camera, by means of facial expression, skin color and pattern, eye movement, etc. One rudimentary example of the concept is the ‘smile’ feature of some cameras, which automatically takes a picture when the subject smiles. For more information, see “Emotion Recognition from Arbitrary View Facial Images” (PDF).

Epipolar Geometry: The geometry of stereo vision. When two cameras view a 3D scene from two distinct positions, a number of geometric relations exist between the 3-D points and their projections onto the 2-D images that lead to constraints between the image points. Epipolar geometry describes these relations between the two resulting views. For more information, see the Wikipedia entry.
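
One common formulation of these constraints is the fundamental matrix F, which satisfies x2ᵀ F x1 = 0 for corresponding image points x1 and x2. The sketch below, assuming matched keypoints are already available (the points here are synthetic placeholders), estimates F with OpenCV and maps points in one image to epipolar lines in the other.

    import cv2
    import numpy as np

    # Placeholder matched keypoints; in practice these would come from a feature
    # detector/matcher run on two views of the same scene.
    rng = np.random.default_rng(0)
    pts1 = (rng.random((30, 2)) * [640, 480]).astype(np.float32)
    pts2 = pts1 + np.float32([8.0, 0.0]) + rng.normal(0, 0.5, (30, 2)).astype(np.float32)

    # Estimate the fundamental matrix F robustly; x2^T * F * x1 = 0 for true matches.
    F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)

    if F is not None:
        # Each point in image 1 maps to an epipolar line (a, b, c) in image 2;
        # its correspondence must lie on (or near) that line.
        lines_in_2 = cv2.computeCorrespondEpilines(pts1.reshape(-1, 1, 2), 1, F)
        print(lines_in_2.reshape(-1, 3)[:3])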

Face Detection: Using embedded vision algorithms to determine that one or multiple human (usually) faces are present in a scene, and then taking appropriate action. A camera that incorporates face detection features might, for example, adjust focus and exposure settings for optimum image capture of people found in a scene. For more information, see “Design Guidelines for Embedded Real-Time Face Detection Applications“.
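
A hedged sketch of face detection using the Haar-cascade classifier that ships with OpenCV; the input and output file names are placeholders.

    import cv2

    # Haar-cascade face detector bundled with OpenCV (path resolved via cv2.data).
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    detector = cv2.CascadeClassifier(cascade_path)

    image = cv2.imread("group_photo.jpg")              # hypothetical input
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    for (x, y, w, h) in faces:                         # one rectangle per detected face
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("faces.png", image)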

Face Recognition: An extrapolation of face detection, which attempts to recognize the person or people in an image. In the most advanced case, biometric face recognition algorithms might attempt to explicitly identify an individual by comparing a captured image against a database of already identified faces. On a more elementary level, face recognition can find use in ascertaining a person’s age, gender, ethnic orientation, etc. For more information, see the Wikipedia entry.

FPGA: Field Programmable Gate Array, an integrated circuit designed for configuration by a customer after manufacturing. The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC). FPGAs contain programmable logic components called “logic blocks”, and a hierarchy of reconfigurable interconnects that allow the blocks to be “wired together”. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory. For more information, see “Implementing Vision Capabilities in Embedded Systems“.

Framework: A universal, reusable software platform used to develop applications, products and solutions. Frameworks include support programs, compilers, code libraries, an application programming interface (API) and tool sets that bring together all the different components needed to enable development of a project or solution. For more information, see the Wikipedia entry.

Function: Also known as a subroutine, a segment of source code within a larger computer program that performs a specific task and is relatively independent of the remaining code. For more information, see the Wikipedia entry.

Fusion: AMD’s brand for the combination of a CPU and GPU on a single integrated piece of silicon, with the GPU intended to implement general-purpose operations beyond just graphics processing. For more information, see AMD’s product page.

Gaze Tracking: Also known as eye tracking, the process of measuring the eye position and therefore the point of gaze (i.e. where the subject is looking). Embedded vision-based gaze tracking systems employ non-contact cameras in conjunction with infrared light reflected from the eye. Gaze tracking can be used as a computer user interface scheme, for example, with cursor location and movement that tracks eye movement, and it can also be used to assess driver alertness in ADAS applications. For more information, see the Wikipedia entry.

Gesture Interface: The control of a computer or other electronic system by means of gestures incorporating the position and movement of fingers, hands, arms and other parts of the human body. Successive images are captured and interpreted via embedded vision cameras. Conventional 2-D image sensors enable elementary gesture interfaces; more advanced 3-D sensors that discern not only horizontal and vertical movement but also per-image depth (distance) allow for more complex gestures, at the tradeoffs of increased cost and computational requirements. For more information, see “Vision-Based Gesture Recognition: An Ideal Human Interface for Industrial Control Applications“.

GPGPU: General-Purpose Computing on Graphics Processing Units, the design technique of using a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU). For more information, see the Wikipedia entry.

GPU: Graphics Processing Unit, a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the building of images in a frame buffer intended for output to a display. GPUs are very efficient at manipulating computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for algorithms where processing of large blocks of data is done in parallel. For more information, see “Implementing Vision Capabilities in Embedded Systems“.

HDR: High dynamic range imaging, a set of methods used to allow a greater dynamic range between the lightest and darkest areas of an image. This wide dynamic range allows HDR images to represent more accurately the range of intensity levels found in real scenes. For more information, see “HDR Sensors for Embedded Vision“.
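
As a rough illustration, the sketch below merges a bracketed exposure sequence into an HDR radiance map and tone-maps it for display, using OpenCV; the file names and exposure times are placeholders.

    import cv2
    import numpy as np

    # Several exposures of the same static scene (placeholder file names).
    files = ["scene_short.jpg", "scene_mid.jpg", "scene_long.jpg"]
    images = [cv2.imread(f) for f in files]
    exposure_times = np.array([1 / 500.0, 1 / 60.0, 1 / 8.0], dtype=np.float32)

    # Merge the bracketed exposures into one floating-point HDR radiance map.
    merge = cv2.createMergeDebevec()
    hdr = merge.process(images, times=exposure_times)

    # Tone-map the HDR image back to 8 bits for display on a conventional screen.
    ldr = cv2.createTonemap(gamma=2.2).process(hdr)
    cv2.imwrite("hdr_result.png", np.clip(ldr * 255, 0, 255).astype("uint8"))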

Image Processor: A specialized digital signal processor used as a component of a digital camera. The image processing engine can perform a range of tasks, including Bayer-to-full RGB per-pixel transformation, de-mosaic techniques, noise reduction, and image sharpening. For more information, see “How Does Camera Performance Affect Analytics?”

Image Search: The process of searching through a database of existing images to find a match between objects contained within one/some of them (such as a face) and content in a newly captured image. For more information, see the Wikipedia entry.

Image Sensor: A semiconductor device that converts an optical image into an electronic signal, commonly used in digital cameras, camera modules and other imaging devices. The most common image sensors are charge-coupled device (CCD) and complementary metal–oxide–semiconductor (CMOS) active pixel sensors. For more information, see “Selecting and Designing with an Image Sensor: The Tradeoffs You’ll Need to Master“.

Image Warping: The process of digitally manipulating an image such that any shapes portrayed in the image are notably altered. In embedded vision applications, warping may be used either for correcting image distortion or to further distort an image as a means of assisting subsequent processing. For more information, see “Lens Distortion Correction“.
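
A minimal sketch of one form of warping, a perspective (homography) warp that maps a quadrilateral region to a rectangle using OpenCV; the file name and corner coordinates are placeholders.

    import cv2
    import numpy as np

    image = cv2.imread("document.jpg")  # hypothetical input

    # Four corner points in the source image and where they should map to.
    src = np.float32([[120, 80], [520, 95], [540, 460], [100, 440]])
    dst = np.float32([[0, 0], [400, 0], [400, 300], [0, 300]])

    # Compute a 3x3 homography and warp so the quadrilateral becomes a rectangle.
    H = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(image, H, (400, 300))
    cv2.imwrite("warped.png", warped)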

IMGLIB: Image Processing Library, Texas Instruments’ DSP-optimized still image processing function library for C programmers. For more information, see Texas Instruments’ product page.

Industrial Vision: See Computer Vision

Infrared Sensor: An image sensor that responds to light in the infrared (and near-infrared, in some cases) frequency spectrum. The use of infrared light transmitters to assist in determining object distance from a camera can be useful in embedded vision applications because infrared light is not visible to the human eye. However, ambient infrared light in outdoor settings, for example, can interfere with the function of infrared-based embedded vision systems. For more information, see “Image sensors evolve to meet emerging embedded vision needs“.

Intelligent Video: A term commonly used in surveillance systems, it comprises any solution where the system automatically performs an analysis of the captured video. For more information, see “An Introduction to the Market for Embedded Vision in Security and Business Intelligence Applications“.

IPP: Intel’s Integrated Performance Primitives, a library of software functions for multimedia, data processing, and communications applications. Intel IPP offers thousands of optimized functions covering frequently used fundamental algorithms. For more information, see Intel’s product page.

Kinect: A motion-sensing input peripheral developed by Microsoft for the Xbox 360 video game console and Windows PCs. Kinect enables users to control and interact with the system without the need to touch a game controller, through a natural user interface using gestures and spoken commands. For more information, see the Wikipedia entry.

Lane Transition Alert: An ADAS system that employs embedded vision and/or other technologies to react to a vehicle in the process of transitioning from one roadway lane to another, or off the roadway to either side. Passive lane transition alert systems alert the driver via sound, light, vibration of the steering wheel, etc. Active systems override the driver’s manual control of the steering wheel in order to return the vehicle to the previously occupied roadway lane. For more information, see “An Introduction to the Market for Embedded Vision in Automotive Driver Assistance Applications“.

Lens Distortion Correction: Employs embedded vision algorithms to compensate for the image distortions caused by sub-optimal optics systems or those with inherent deformations, such as the barrel distortion of fisheye lenses. For more information, see “Lens Distortion Correction“.
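
A hedged sketch of the correction step using OpenCV; the intrinsic camera matrix and distortion coefficients would normally come from a prior calibration step (e.g., cv2.calibrateCamera), and the values and file names below are placeholders.

    import cv2
    import numpy as np

    image = cv2.imread("fisheye_frame.jpg")  # hypothetical distorted input

    # Placeholder intrinsics and distortion coefficients from an assumed calibration.
    camera_matrix = np.array([[800.0,   0.0, 320.0],
                              [  0.0, 800.0, 240.0],
                              [  0.0,   0.0,   1.0]])
    dist_coeffs = np.array([-0.30, 0.10, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

    corrected = cv2.undistort(image, camera_matrix, dist_coeffs)
    cv2.imwrite("undistorted.png", corrected)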

Library: A collection of resources used by programs, often to develop software. Libraries may include configuration data, documentation, help data, message templates, pre-written code and subroutines, classes, values and type specifications. Libraries contain code and data that provide services to independent programs. These resources encourage the sharing and changing of code and data in a modular fashion, and ease the distribution of the code and data. For more information, see the Wikipedia entry.

Machine Vision: See Computer Vision

Middleware: Computer software that provides services to software applications beyond those available from the operating system. Middleware, which can be described as “software glue,” makes it easier for software developers to perform communication and input/output, so they can focus on the specific purpose of their application. For more information, see the Wikipedia entry.

Microprocessor: See CPU

Motion Capture: Also known as motion analysis, motion tracking, and mocap, the process of recording movement of one or more objects or persons. It is used in military, entertainment, sports, and medical applications, and for validation of computer vision and robotics. In filmmaking and video games, it refers to recording the movements (but not the visual appearance) of human actors via image samples taken many times per second, and using that information to animate digital character models in 2D or 3D computer animation. When it includes face and fingers or captures subtle expressions, it is often referred to as performance capture. For more information, see the Wikipedia entry.

NI Vision for LabVIEW: National Instruments’ configuration software and programming libraries that assist in building imaging applications. It comprises NI Vision Builder for Automated Inspection and the NI Vision Development Module, the latter a comprehensive library with hundreds of scientific imaging and machine vision functions that you can program using NI LabVIEW software and several text-based languages. For more information, see National Instruments’ product page.

NPP: The NVIDIA Performance Primitives library, a collection of GPU-accelerated image, video, and signal processing functions. NPP comprises over 1,900 image processing primitives and approximately 600 signal processing primitives. For more information, see NVIDIA’s product page.

Object Tracking: The process of locating a moving object (or multiple objects) over time using a camera. The objective is to associate target objects in consecutive video frames. However, this association can be especially difficult when the objects are moving fast relative to the frame rate. Another situation that increases the complexity of the problem is when the tracked object changes orientation over time. For more information, see the Wikipedia entry.

OCR: Optical Character Recognition, the conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text. OCR is widely used as a form of data entry from some sort of original paper data source, whether documents, sales receipts, mail, or any number of printed records. It is crucial to the computerization of printed texts so that they can be electronically searched, stored more compactly, displayed on-line, and used in machine processes such as machine translation, text-to-speech and text mining. For more information, see the Wikipedia entry.

OpenCL: Open Computing Language, a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), and other processors. OpenCL includes a language (based on C99) for writing kernels (functions that execute on OpenCL devices), plus application programming interfaces (APIs) that are used to define and then control the platforms. OpenCL provides parallel computing using task-based and data-based parallelism. For more information, see the Khronos product page.

OpenCV: A library of programming functions mainly aimed at real-time image processing, originally developed by Intel, and now supported by Willow Garage and Itseez. It is free for use under the open source BSD license. The library is cross-platform. For more information, see “Introduction To Computer Vision Using OpenCV“.
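
A minimal, hedged example of calling the library from its Python bindings; the image path and parameter values are placeholders.

    import cv2

    image = cv2.imread("input.jpg")                    # load an image from disk
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)     # convert to grayscale
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)        # smooth to reduce noise
    corners = cv2.goodFeaturesToTrack(blurred, maxCorners=100,
                                      qualityLevel=0.01, minDistance=10)
    print(f"detected {0 if corners is None else len(corners)} corner features")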

OpenGL: Open Graphics Library, a standard specification defining a cross-language, multi-platform API for writing applications that produce 2D and 3D computer graphics. The interface consists of over 250 different function calls, which can be used to draw complex three-dimensional scenes from simple primitives. OpenGL functions can also be used to implement some GPGPU operations. For more information, see the Khronos product page.

OpenNI: Open Natural Interaction, an industry-led, non-profit organization focused on certifying and improving interoperability of natural user interfaces and organic user interfaces for natural interaction devices, applications that use those devices, and middleware that facilitates access to and use of such devices. For more information, see the organization website.

OpenVL: A modular, extensible, high-performance library for handling volumetric datasets. It provides a standard, uniform, and easy-to-use API for accessing volumetric data, allows that data to be laid out in different ways to optimize memory usage and speed, and supports reading and writing volumetric data in various file formats via plugins. It also provides a framework for implementing algorithms as plugins (shared libraries that can be dynamically loaded as needed), which can be easily incorporated into user applications. OpenVL is developed openly and is freely available on the web. For more information, see the OpenVL library website.

Operating System: A set of software that manages computer hardware resources and provides common services for computer programs. For hardware functions such as input and output and memory allocation, the operating system acts as an intermediary between programs and the computer hardware, although the application code is usually executed directly by the hardware and will frequently make a system call to an operating system function, or be interrupted by it. For more information, see the Wikipedia entry.

Optical Flow: The pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer (an eye or a camera) and the scene. Optical flow techniques such as motion detection, object segmentation, time-to-collision and focus of expansion calculations, motion compensated encoding, and stereo disparity measurement utilize the motion of the objects’ surfaces and edges. For more information, see “Demonstration of Optical Flow algorithm on an FPGA“.
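
A hedged sketch of dense optical flow between two consecutive frames, using OpenCV’s Farnebäck method; the video file name and parameter values are placeholders.

    import cv2
    import numpy as np

    cap = cv2.VideoCapture("road.mp4")     # hypothetical video source
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    ok, frame = cap.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Dense optical flow: a (dx, dy) motion vector is estimated for every pixel.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    print("mean apparent motion (pixels/frame):", float(np.mean(magnitude)))
    cap.release()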

Photogrammetry: The practice of determining the geometric properties of objects from photographic images. Algorithms for photogrammetry typically express the problem as that of minimizing the sum of the squares of a set of errors. This minimization is known as bundle adjustment and is often performed using the Levenberg–Marquardt algorithm. For more information, see the Wikipedia entry.

Pincushion Distortion: The opposite of barrel distortion; image magnification increases with the distance from the optical axis. The visible effect is that lines that do not go through the centre of the image are bowed inwards, towards the centre of the image, like a pincushion. Embedded vision techniques can be used to reduce or eliminate pincushion distortion effects. For more information, see “Lens Distortion Correction“.

Plenoptic Camera: Also known as a light-field camera, it uses an array of microlenses, at the focal plane of the main lens and slightly ahead of the image sensor, to capture light field information about a scene. The displacement of image parts that are not in focus can be analyzed and depth information can be extracted. Such a camera system can therefore be used to refocus an image on a computer after the picture has been taken. For more information, see “Image sensors evolve to meet emerging embedded vision needs“.

Point Cloud: A set of vertices in a three-dimensional system, usually defined by X, Y, and Z coordinates, and typically intended to be representative of the external surface of an object. Point clouds are often created by 3D scanners. These devices measure in an automatic way a large number of points on the surface of an object, and often output a point cloud as a data file. The point cloud represents the set of points that the device has measured. For more information, see the Wikipedia entry.
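
As an illustrative sketch, the snippet below back-projects a depth map into a point cloud using an assumed pinhole camera model; the depth values and intrinsic parameters are synthetic placeholders.

    import numpy as np

    # Hypothetical depth map (meters) from a 3-D sensor, plus assumed pinhole intrinsics.
    depth = np.random.uniform(0.5, 4.0, size=(480, 640)).astype(np.float32)
    fx, fy, cx, cy = 525.0, 525.0, 320.0, 240.0   # placeholder camera intrinsics

    # Back-project every pixel (u, v, depth) to a 3-D point (X, Y, Z).
    v, u = np.indices(depth.shape)
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    Z = depth
    points = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)  # N x 3 point cloud
    print(points.shape)  # (307200, 3)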

Processor: See CPU

Reference Design: A technical blueprint of a system that is intended for others to copy. It contains the essential elements of the system; recipients may enhance or modify the design as required. Reference designs enable customers to shorten their time to market, thereby supporting the development of next generation products using latest technologies. The reference design is proof of the platform concept and is usually targeted for specific applications. Hardware and software technology vendors create reference designs in order to increase the likelihood that their products will be used by OEMs, thereby resulting in a competitive advantage for the supplier. For more information, see the Wikipedia entry.

Resolution: The amount of detail that an image holds. Higher resolution means more image detail. Resolution quantifies how close lines can be to each other and still be visibly resolved. Resolution units can be tied to physical sizes (e.g. lines per mm, lines per inch), to the overall size of a picture (lines per picture height), to angular subtense, or to the number of pixels in an image sensor. Line pairs are sometimes used instead of lines; a line pair comprises a dark line and an adjacent light line. For more information, see “Selecting and Designing with an Image Sensor: The Tradeoffs You’ll Need to Master“.

SoC: System-On-A-Chip, an integrated circuit (IC) that integrates most if not all components of an electronic system into a single chip. It may contain digital, analog, mixed-signal, and often radio-frequency functions—all on a single chip substrate. For more information, see “Implementing Vision Capabilities in Embedded Systems“.

Stereo Vision: The use of multiple cameras, each viewing a scene from a slightly different perspective, to discern the perceived depth of various objects in the scene. Stereo vision is employed by the human vision system via the two eyes. The varying perspective of each camera is also known as binocular disparity. For more information, see “Image sensors evolve to meet emerging embedded vision needs“.
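
A hedged sketch of recovering depth from a rectified stereo pair with OpenCV block matching; the image file names, focal length, and baseline are placeholders, and the relationship used is depth = focal length × baseline / disparity.

    import cv2
    import numpy as np

    # Rectified left and right views of the same scene (placeholder file names).
    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # Block matching produces a disparity map: per-pixel horizontal shift between views.
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point output

    # With assumed focal length f (pixels) and baseline B (meters): depth = f * B / disparity.
    f, B = 700.0, 0.12
    valid = disparity > 0
    depth = np.zeros_like(disparity)
    depth[valid] = f * B / disparity[valid]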

Stream: A set of hardware and software technologies originally developed by ATI Technologies and now managed by acquiring company AMD (and re-named App). Stream enables AMD graphics processors (GPUs), working with the system’s central processor (CPU), to accelerate applications beyond just graphics (i.e. GPGPU). The Stream Software Development Kit (SDK) allows development of applications in a high-level language called Brook+. Brook+ is built on top of ATI Compute Abstraction Layer (CAL), providing low-level hardware control and programmability. NVIDIA’s competitive approach is known as CUDA. For more information, see AMD’s product page.

Structured Light: A method of determining the depths of various objects in a scene, by projecting a predetermined pattern of light onto the scene for the purpose of analysis. 3-D sensors based on the structured light method use a projector to create the light pattern and a camera to sense the result. In the case of the Microsoft Kinect, the projector employs infrared light. Kinect uses an astigmatic lens with different focal lengths in the X and Y direction. An infrared laser behind the lens projects an image consisting of a large number of dots that transform into ellipses, whose particular shape and orientation in each case depends on how far the object is from the lens. For more information, see “Image sensors evolve to meet emerging embedded vision needs“.

Surveillance System: A camera-based system that implements scene monitoring and security functions. Historically, surveillance systems’ camera outputs were viewed by humans via television monitors. Embedded vision-based surveillance systems are now replacing the often-unreliable human surveillance factor, via automated ‘tripwire’, facial detection and other techniques. For more information, see “An Introduction to the Market for Embedded Vision in Security and Business Intelligence Applications“.

Time-of-Flight: A method of determining the depths of various objects in a scene. A time-of-flight camera contains an image sensor, a lens and an active illumination source. The camera derives distance from the time it takes for projected light to travel from the transmission source to the object and back to the image sensor. The illumination source is typically either a pulsed laser or a modulated beam, depending on the image sensor type employed in the design. For more information, see “Image sensors evolve to meet emerging embedded vision needs“.
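
The underlying arithmetic is simple: the measured round-trip time is converted to distance using the speed of light, halved because the light covers the camera-to-object path twice. A tiny sketch (the 20 ns example value is illustrative only):

    # Round-trip timing to distance for a time-of-flight measurement.
    C = 299_792_458.0  # speed of light, m/s

    def tof_distance(round_trip_seconds):
        # The light travels out to the object and back, hence the division by 2.
        return C * round_trip_seconds / 2.0

    print(tof_distance(20e-9))  # a 20 ns round trip corresponds to roughly 3 meters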

Video Analytics: Also known as video content analysis, it is the capability of automatically analyzing video to detect and determine temporal events not based on a single image. The algorithms can be implemented on general-purpose computers or specialized embedded vision systems. Functions that can be implemented include motion detection against a fixed background scene. More advanced functions include tracking, identification, behavior analysis, and other forms of situation awareness. For more information, see the Wikipedia entry.

VLIB: Video Processing Library, Texas Instruments’ DSP-optimized video processing function library for C programmers. For more information, see Texas Instruments’ product page.

VXL: A collection of open-source C++ libraries for vision applications. The “X” is a placeholder for one of many letters, e.g. VGL is a geometry library, VNL a numerics library, VIL an image processing library, etc. These libraries can also be used for general scientific computing applications. For more information, see the VXL library website.
