IBM is open sourcing its Unstructured Information Management Architecture (UIMA) in a bid to foster wider participation and create a new open standard.
UIMA is an IBM-developed framework that has been in development and use since at least 2004. It is currently part of a number of IBM products, including WebSphere OmniFind Edition.
Marc Andrews, IBM director of strategy and business development for content discovery, said UIMA is a framework for plugging content sources together and providing a common structure for sharing information: text goes in and analysis results come out. However, it doesn’t include facilities for feeding content into it or for acting on the analysis results afterwards.
“If you look at companies today, if they want to be able to leverage text analytics they have to manually tie together the analytics technology with their business applications,” Andrews told internetnews.com. “The goal of UIMA is to make it so you can plug and play those things.”
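To make the plug-and-play idea concrete, here is a minimal sketch of components that share one result structure. It is an illustration only and not UIMA's actual API: the names `Annotation`, `AnalysisResult`, `regex_analyzer` and `run_pipeline` are hypothetical, and simple regular expressions stand in for real text analytics.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # label assigned by an analyzer, e.g. "email"
    begin: int  # start offset into the document text
    end: int    # end offset (exclusive)


@dataclass
class AnalysisResult:
    """Hypothetical common structure that every analyzer reads and extends."""
    text: str
    annotations: list = field(default_factory=list)

    def covered(self, ann):
        """Return the span of text an annotation refers to."""
        return self.text[ann.begin:ann.end]


def regex_analyzer(kind, pattern):
    """Build an analyzer that marks every regex match with the given kind."""
    compiled = re.compile(pattern)

    def analyze(result):
        for match in compiled.finditer(result.text):
            result.annotations.append(Annotation(kind, match.start(), match.end()))

    return analyze


def run_pipeline(text, analyzers):
    """Send text in, run each plugged-in analyzer in order, pass results out."""
    result = AnalysisResult(text)
    for analyze in analyzers:
        analyze(result)
    return result


# Analyzers can be added, removed or reordered without changing the others.
pipeline = [
    regex_analyzer("email", r"[\w.]+@[\w.]+\.\w+"),
    regex_analyzer("year", r"\b(?:19|20)\d{2}\b"),
]
result = run_pipeline("Contact jane@example.com about the 2005 report.", pipeline)
for ann in result.annotations:
    print(ann.kind, result.covered(ann))
```

Because each analyzer only reads and appends to the shared result, swapping one vendor's component for another would not require changes to the rest of the pipeline. As Andrews noted, feeding content in and acting on the results remain outside the framework, which is why the sketch starts from a plain string and simply prints what it finds.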
One goal behind the move is to drive standardization around the UIMA framework. The process of moving toward broader involvement and standardization has been underway for at least a year. The federal Defense Advanced Research Projects Agency (DARPA) is among the users of UIMA and in January 2005 sponsored a working group that consisted of a number of academic, medical and industrial institutions.
“They basically helped to evolve the architecture and take it to the next level,” Andrews said.
In order to truly make UIMA a standard, the working group concluded that the framework can’t just be an IBM project.
“Our goals around this is to really foster the standardization so that both us and other vendors and organizations can plug and play text analytics technologies to really deliver more advanced business applications and intelligence solutions,” Andrews added. “To do that we felt that we really should make this available to the open source community.”
The first part of open sourcing UIMA is that it is now available for download on the popular SourceForge.net open source software repository. According to Andrews, SourceForge.net is the first open source home for UIMA, but it may not be the only one.
“We’re not set on which model we will use,” Andrews said. “There are a lot of open source development models out there like Apache, Eclipse or SourceForge that can be fostered to develop a community and we haven’t yet determined which is the best environment for this framework.”
Though the code is available via SourceForge.net, UIMA is not yet a fully open collaborative project. For now, users can only download the code. According to Andrews, they will have to wait “a few months” before the project starts accepting contributions.