This article was originally published by Embedded Vision Alliance consultant Dave Tokic. It is reprinted here with Tokic's permission.
How one non-profit is helping the poorest by enabling cutting-edge image search and self-driving cars
Worldwide, 1.2 billion people (22 percent) live in extreme poverty on less than $1.25 a day, and about half the world’s population, some 2.7 billion people, live on a poverty income of $2.50 a day, according to the World Bank.
Grim statistics. But progress is being made through policies and programs at the global and national levels and through the efforts of non-profits and socially minded entrepreneurs.
I recently had the opportunity to sit down with Wendy Gonzalez, the senior vice president and managing director at non-profit Samasource, whom I was fortunate enough to meet at the Embedded Vision Summit this past May. Below is a video of their demo from the event:
We focused on how their impact sourcing model addresses the poverty challenge by helping companies make their cutting-edge machine learning (a branch of artificial intelligence) systems smarter at understanding images and massive datasets.
Samasource is a pioneer in impact sourcing of technology services: the practice of hiring people from the bottom of the income pyramid and directly raising them out of poverty by providing digital work for companies like Google, Walmart, eBay, and many startups. According to Gonzalez, they provide jobs and digital skills training to people below the poverty line in Kenya, Uganda, Haiti, and India. Since the company started in 2008, Samasource has hired nearly 8,000 people, who have earned living wages and transitioned themselves and their dependents out of poverty, transforming over 30,000 lives.
So where does machine learning and computer vision come in?
For those who aren’t aware of these technologies, machine learning is a form of artificial intelligence that uses algorithms and large amounts of data to find hidden insights in that data without being explicitly programmed where to look. It’s the technology behind Amazon and Netflix online recommendations, the Google self-driving car, speech recognition, financial “robo-advisors”, and fraud detection. Computer vision focuses on improving and interpreting still images and video frames to recognize and understand what’s in the scene. It’s used in self-driving cars and collision avoidance systems, augmented and virtual reality (AR/VR), intelligent surveillance, and inspection machines, and it lets smartphone cameras identify all the smiling faces and powers apps like Snapchat that overlay puppy features on your face.
It turns out that computer vision is a really good application of machine learning, especially deep neural networks (DNNs). The results approach or even surpass human performance in recognizing a scene or what’s in an image. The challenge is that to get these results, the DNNs ideally need to be trained on millions of correctly tagged images or video frames. A great place to learn more is the Embedded Vision Alliance website.
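To make that concrete, here is a minimal sketch of what “training on tagged images” looks like in code. It uses PyTorch with a toy network and randomly generated stand-in data; none of this is Samasource’s or any customer’s actual pipeline:

```python
# A toy supervised-training loop: the network adjusts its weights to
# reproduce the human-supplied tags. PyTorch assumed; the tiny model and
# random stand-in data are purely illustrative.
import torch
import torch.nn as nn

# Stand-in for a batch of annotated images: 64 RGB images at 32x32 pixels,
# each paired with a human-assigned class label (10 possible classes).
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 10, (64,))

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)  # how far predictions are from tags
    loss.backward()                        # propagate the error
    optimizer.step()                       # nudge weights toward the labels
```

Every label in that batch is a human judgment the network learns to reproduce, which is why both the volume and the accuracy of the annotations matter so much.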
Getting images to train these systems for a specific application is fairly easy, but making sure those images are precisely segmented and tagged (this area outlined is a car, this is a person, this is the sky, this is a road, this is an English Bulldog vs. an American Bulldog…) is extremely intensive manual work.
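For a sense of what “precisely segmented and tagged” looks like as data, here is a hypothetical annotation record for one image. The field names are my own illustration, loosely inspired by common formats such as COCO, not Samasource’s actual schema:

```python
# Hypothetical annotation record for one image. Field names are my own
# illustration (loosely COCO-like), not Samasource's actual schema.
from dataclasses import dataclass, asdict
import json

@dataclass
class BoxAnnotation:
    label: str    # e.g. "car", "person", "english_bulldog"
    x: int        # top-left corner of the box, in pixels
    y: int
    width: int
    height: int

annotations = [
    BoxAnnotation("car", x=104, y=220, width=180, height=95),
    BoxAnnotation("person", x=310, y=198, width=42, height=120),
]
print(json.dumps([asdict(a) for a in annotations], indent=2))
```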
This is where Samasource comes in. About two-thirds of their work is in managed services for image capture and annotation. They need a large pool of skilled, detail-oriented workers to identify the appropriate images or video frames and manually tag the scene, specific regions (a forest), or objects (a wolf) with the specific keywords the customer requires for their application.
Gonzalez remarked that their top objective is creating technology jobs for the impoverished while delivering the highest-quality work in a cost-effective manner. This guided Samasource to focus on East Africa, where there is a good baseline in English and the majority of the population has completed secondary (high) school. They have a dedicated training organization that works within the local communities to identify qualified men, women, and youth in need and assess them for key skills. For those with strong visual acuity, there is general computer and business skills training, followed by a three-week dedicated machine learning and image annotation training track with example projects and ongoing assessments that lead to graduation.
As workers move onto projects, there is another set of qualification tests plus project-specific training. Once they pass those, they move into production, and their work is reviewed daily by the QA team to correct mistakes and resolve interpretation or requirements issues (do you want to tag a traffic light facing away from the camera?). These reviews are done both locally and in the US, allowing real-time feedback with many checkpoints throughout the project lifecycle and reinforcing Samasource’s tightly managed approach to quality.
A Mountain of Data to Annotate
Gonzalez commented that they typically start with a pilot project that outlines the minimum number of images needed to establish whether the algorithms are on the right track, usually 5,000 to 10,000 annotated images. Through the product lifecycle, this volume typically increases significantly; in a number of cases, it has grown to the hundreds of thousands or even millions of images. That is a massive amount of work to provide the data needed to train these deep learning vision systems… one that requires a LOT of trained human resources. The data also has to be tagged at very high levels of quality: if the tagging has errors, the algorithm will “learn” the wrong things and not achieve the accuracy the customer is aiming for, a big problem given the massive investment companies are putting into developing these smart systems.
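One standard way to quantify tagging quality during review, shown here purely as an illustration rather than as Samasource’s actual method, is to compare a worker’s bounding box against a reviewer’s box for the same object using intersection over union (IoU) and flag low-agreement annotations for rework:

```python
# Intersection over union (IoU) between two boxes given as
# (x, y, width, height). A common review check, shown as an illustration:
# flag the annotation for rework if agreement falls below a threshold.
def iou(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    overlap_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    overlap_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = overlap_w * overlap_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

worker_box = (100, 220, 180, 95)
reviewer_box = (104, 222, 176, 96)
score = iou(worker_box, reviewer_box)
print(f"agreement: {score:.3f}")  # e.g. send back for rework if < 0.90
```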
A key enabler for managing the workflow is their image annotation software. The platform loads the video frames or photos for a project and securely distributes the images to the workers, typically in their centers in East Africa and India. Annotations are based on very specific business rules from the client, which can be very fine-grained and nuanced, e.g., how an object should be bounded, or to annotate only objects larger than 5 by 1 pixels. In some cases, customers are even asking to label every pixel in an entire image. In-process reviews are managed by the local team and QA leads who walk the floor at the secure data centers where agents work full-time. Additional staff assist with the reviews, especially those with very complex rules, to ensure quality and accuracy levels are achieved. The team has a process for measuring itself and quickly addressing issues, doing rework if necessary.
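A rule like “annotate only objects larger than 5 by 1 pixels” can also be enforced mechanically before human review. Here is a sketch of such a filter; the record fields and the strictness of the comparison are assumptions for illustration:

```python
# Sketch of mechanically enforcing a client business rule: keep only
# annotations whose box exceeds a minimum size (echoing the 5-by-1-pixel
# rule above). Record fields and strict comparison are illustrative
# assumptions, not Samasource's actual rules engine.
MIN_WIDTH, MIN_HEIGHT = 5, 1

def passes_size_rule(box):
    return box["width"] > MIN_WIDTH and box["height"] > MIN_HEIGHT

annotations = [
    {"label": "car", "width": 180, "height": 95},
    {"label": "distant_sign", "width": 4, "height": 3},  # too small: dropped
]
kept = [a for a in annotations if passes_size_rule(a)]
print([a["label"] for a in kept])  # -> ['car']
```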
Identifying Cars, Faces, and… Elephant Butts
I asked about the types of programs Samasource has worked on. In addition to many cases of outlining automobiles, roads, and people in various poses and positions, they worked with Microsoft on Windows Hello, which provides facial recognition that lets you unlock your computer or device by looking into the camera. A unique application was Paul Allen’s Great Elephant Census project. Apparently the rear end of an elephant can be used to identify individual animals, and the same imagery also allowed unauthorized vehicles in the scene to be localized and tracked to detect signs of poaching. They are now working with a number of other groups on similar conservation and sustainability projects, which would be incredibly difficult to achieve without computer vision automatically analyzing the huge volume of images.
Gonzalez closed by sharing a number of things that she’s learned in her year and a half at Samasource, and one theme was how things were both very easy and very hard at the same time. The purpose of what they’re doing is crystal clear and impactful, fostering incredibly passionate teamwork, efficiency, and personal reward. At the same time, trying to identify and establish worker populations in hard-to-reach locations with limited resources has been challenging.
She mentioned having just come back from northern Uganda, where she was impressed by the interest level, attitude, and perseverance of the workers she met, and by the team’s level of performance. It has required significant training to achieve, but Samasource also benefits from a very low turnover rate in what would normally be considered a rather monotonous job, retaining expertise at a dramatically higher level than traditional business process outsourcing companies.
Lastly, she commented that their image annotation business has grown 100 percent over the prior year, and there is very high demand for image capture and for building databases covering various scenarios in a cost-effective manner. For Samasource, the industry’s growing movement to integrate advanced vision technologies into its systems is thankfully expanding the number of people they can help rise out of poverty.
About the Author
I strongly believe that a huge wave of innovation and growth will come from the convergence of machine learning and machine vision technology deployed to the enterprise and consumers through connected IoT devices and strong ecosystems. I help companies succeed by creating winning products, go-to-market programs, and the targeted ecosystems necessary to accelerate revenue growth. Let me know how I can help you.