Researchers at the University of California at San Diego have a plan to meld the brains of Internet users into a vast human grid that would make use of the seconds wasted on solving CAPTCHAs.
Likely familiar to any frequent Web user, CAPTCHAs are those difficult-to-see images composed of squiggly letters and lines designed to confound blog spam bots and the like. Blogs and online forums typically use codes hidden in CAPTCHAs to prove that a poster is a human, rather than an automated program; ideally, a human user can see and enter a CAPTCHA’s hidden code, while an automated program cannot.
While finding a hidden CAPTCHA code may take only a couple of seconds, when multiplied by the millions of other Internet users also responding to CAPTCHAs, those seconds can add up to hundreds of wasted hours.
The Soylent Grid project wants to apply those wasted seconds to identifying images for assistive technology applications.
The project, named in reference to the 1973 Charlton Heston film Soylent Green and its famous phrase, “Soylent Green is people!”, is already well on its way toward developing ways to make use of time that would normally be spent on CAPTCHAs.
Soylent Grid’s first application to harness tiny bits of Internet users’ attention is GroZi Shopping Assistant, a program that helps visually impaired people with the difficult task of locating objects in stores.
A joint mission between the California Institute of Telecommunications and Information Technology (CalIT2) and UCSD’s Computer Science and Engineering departments, GroZi would use the Soylent Grid project to funnel images taken by visually impaired people to Web users, who can then identify the objects in those images.
GroZi relies on a wearable system with a camera and tactile/haptic feedback, a blind-accessible interface and computer vision-based object recognition software.
GroZi’s human need
But without Soylent Grid’s human factor, GroZi faces difficult technical hurdles. Recognizing content in digital images has long been a hard nut for computer science to crack. The human brain, on the other hand, is superb at recognizing content in images, knowing immediately which object in a family photo is Uncle Sean and which is the family dog.
For the GroZi prototype, it took developer Michele Merler, now at Columbia University, weeks to input the 120 products found in a single 45-minute video. To make the system truly useful, however, GroZi would need to be able to decipher the staggering array of items available in modern stores within seconds.
Enter Soylent Grid. Instead of building an image database item by item, the project could take advantage of time spent identifying CAPTCHAs. In such a scenario, the system could test a user trying to post on a blog by asking them to decipher a GroZi photo instead of a traditional CAPTCHA.
The idea is to do this in real time, so that a visually impaired person at a grocery store could use GroZi to tell the corn niblets from the creamed corn.
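The article does not say how an answer to a photo nobody has labeled yet could prove that a user is human. One common approach, assumed here purely for illustration, is to pair a control image whose label is already known with the unlabeled photo and to trust a new label only after several users agree on it. A minimal Python sketch under those assumptions follows; the class name, the method names and the MIN_VOTES threshold are all hypothetical and not taken from the Soylent Grid project.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: agreeing answers needed before a label is trusted.
MIN_VOTES = 3


def normalize(text):
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(text.lower().split())


class LabelingCaptcha:
    """Illustrative only: a control image gates access, an unknown image collects votes."""

    def __init__(self, known_labels):
        # known_labels maps a control image id to its trusted label.
        self.known = {k: normalize(v) for k, v in known_labels.items()}
        # Maps an unlabeled image id to a count of each answer received.
        self.votes = defaultdict(Counter)

    def submit(self, control_id, control_answer, unknown_id, unknown_answer):
        """Return True if the user passes; record their vote on the unknown image."""
        if normalize(control_answer) != self.known[control_id]:
            return False  # failed the control image, so the other answer is ignored
        self.votes[unknown_id][normalize(unknown_answer)] += 1
        return True

    def consensus(self, unknown_id):
        """Return the agreed label, or None if no answer has enough votes yet."""
        counts = self.votes[unknown_id]
        if not counts:
            return None
        label, count = counts.most_common(1)[0]
        return label if count >= MIN_VOTES else None
```

In this sketch a blog would call submit() once per posting attempt, and GroZi would poll consensus() for the photo it is waiting on. A real deployment would also need to handle spelling variants and deliberately wrong answers, which this example ignores.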
“The currently used types of CAPTCHAs are a complete waste once they go stale,” said Serge Belongie, the UCSD professor who heads the project. “They’re totally artificial, and when hackers crack them, their approach is invariably ‘hacky’ and neither reveals any insight into human object recognition nor does it do any good for society as a whole.”
While efforts in other industries are being made to improve image recognition — search engines, for example, are interested in image-recognition technology to improve their search results — a human-powered system like the GroZi-Soylent Grid effort could vastly improve the lives of the blind and vision-impaired.
Researchers outlined the benefits (PDF file) of Soylent Grid earlier this week in a paper presented at the Interactive Computer Vision 2007 conference in Rio de Janeiro.
Soylent Grid, GroZi, and CAPTCHAs
For a Soylent Grid/GroZi combination to make an impact, however, the service would need to partner with one or more online entities that make heavy use of CAPTCHAs, such as blogging platforms or social media sites.
For example, the Soylent Grid team estimates that Digg users could identify an image approximately every 17 seconds. That’s far from fast enough for someone hurrying through their shopping. Belongie estimates that five seconds would be an acceptable turnaround time, so GroZi would need 25 times the CAPTCHA-producing power of Digg.
Image recognition as part of a live video feed is an even more remote possibility.
“The idea of doing real-time object recognition on a live video stream is at the fantasy end of the Soylent Grid spectrum,” Belongie said in an e-mail interview. “In reality, we expect it will be more likely to have an increased role for the computational processing component, so that the available human cycles are employed more opportunistically.”
Ideally, every product identified would go into a standalone database, eventually enabling quicker lookups for GroZi that wouldn’t require the input of Web users. This ultimately could allow the Soylent Grid project to be harnessed for other endeavors.
“If the initial GroZi box had some amount of computational power … it could be pulled off the grid and run locally on the GroZi box in the user’s hand, or run remotely on a private-GroZi-only computational system of much smaller scale,” said Stephan Steinbach, another Soylent Grid project member.
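The standalone database described above amounts to a cache placed in front of the human grid: look locally first, and ask Web users only when a product has not been seen before. The Python sketch below illustrates that idea under loud assumptions. ProductLookup, ask_humans and the string "fingerprint" used as a key are hypothetical names, and a real system would need vision-based matching rather than an exact key match.

```python
class ProductLookup:
    """Illustrative cache-first lookup; not the project's actual design."""

    def __init__(self, ask_humans):
        # ask_humans is a hypothetical callable that forwards an image
        # fingerprint to the human grid and returns the agreed label.
        self.ask_humans = ask_humans
        self.db = {}            # local product database: fingerprint -> label
        self.human_calls = 0    # how often the grid had to be consulted

    def identify(self, fingerprint):
        """Return a label, consulting Web users only on a database miss."""
        if fingerprint in self.db:
            return self.db[fingerprint]
        self.human_calls += 1
        label = self.ask_humans(fingerprint)
        self.db[fingerprint] = label  # later lookups need no human input
        return label
```

As the database fills up, human_calls grows more slowly than the number of lookups, which is the effect the researchers describe: the grid's attention is freed for other work.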
Soylent Grid is an example of crowdsourcing, the notion of bringing together masses of users to accomplish what no individual or company could.
HumanGrid, launched in December 2005, is another example of applying crowdsourcing to labor-intensive tasks. The HumanGrid marketplace, in private beta, aims to introduce businesses and researchers to individuals willing to perform micro-tasks for micro-payments, such as data enhancement, text classification, transcription and picture classification.
Amazon’s Mechanical Turk service is another example; it’s a similar automated marketplace where businesses can offer to pay humans to do tasks like tagging objects found in images or selecting the best photos of a product from a set of images.
The e-tailing giant developed the technology to help sort out the 20 million photos of storefronts to be used in its A9 Yellow Pages local search product.
Belongie said that Soylent Grid has a better chance of succeeding because its strategy of distributing the work via third-party sites creates an ecosystem.
“For all three main parties involved — the researchers, the Web sites, and the users — there’s something in it for them,” he said. Researchers, whether academic or commercial, “have data they need labeled, for which one assumes they’d be willing to pay … for example, someone wanting to spot pizza storefronts or real estate posters in Google Street views footage.”
“Web site owners want a fresh source of CAPTCHAs, since the ones they use routinely go stale, meaning they get cracked by hackers in the Ukraine,” he added. “And the users simply want to get to whatever content lies behind the CAPTCHA.”