http://incubator.apache.org/uima/downloads/releaseDocs/2.2.1-incubating/docs/html/index.html
Unstructured information is the largest, most current, and fastest-growing source of information available to businesses and governments. Enterprises hold large amounts of it across different media, such as text, voice, and video. With an unstructured information management (UIM) application, you can analyze large volumes of unstructured information to discover, organize, and deliver relevant knowledge to decision makers.
The results of these analyses must be put into structured form so that data-mining techniques and search technologies, such as search engines, database engines, On-Line Analytical Processing (OLAP) tools, and data-mining engines, can be used to efficiently find the concepts you need, when you need them.
The analysis technologies that produce these results are developed independently by highly specialized scientists and engineers who use different techniques, interfaces, and platforms.
The bridge from the unstructured world to the structured world is built through the composition and deployment of these analysis capabilities. The Unstructured Information Management Architecture (UIMA) is an architecture and software framework that helps you build that bridge. It supports creating, discovering, composing, and deploying a broad range of analysis capabilities and linking them to structured information services.
UIMA specifies component interfaces, data representations, design patterns, and development roles for creating, describing, discovering, composing, and deploying analysis capabilities.
The UIMA framework provides a run-time environment in which developers can plug in their UIMA component implementations and with which they can build and deploy UIM applications. The framework is not specific to any IDE or platform.
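The plug-in idea described above can be illustrated with a minimal Java sketch. It does not use the UIMA SDK or its API. The names `AnalysisPipelineSketch`, `Analyzer`, `Annotation`, `regexAnalyzer`, and `run` are hypothetical and exist only for this illustration. The sketch assumes that each analysis component reads the same unstructured text and adds typed, offset-based results to a shared structure, which downstream structured services could then index or query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a hypothetical stand-in for a component-based analysis
// pipeline. None of these types are part of the UIMA SDK.
public class AnalysisPipelineSketch {

    /** A structured result: a typed span over the original text. */
    static final class Annotation {
        final String type;
        final int begin;
        final int end;
        final String coveredText;

        Annotation(String type, int begin, int end, String coveredText) {
            this.type = type;
            this.begin = begin;
            this.end = end;
            this.coveredText = coveredText;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + "] \"" + coveredText + "\"";
        }
    }

    /** Hypothetical component interface that every analysis module implements. */
    interface Analyzer {
        void process(String text, List<Annotation> results);
    }

    /** Builds an analyzer that marks every match of a regular expression. */
    static Analyzer regexAnalyzer(String type, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return (text, results) -> {
            Matcher m = pattern.matcher(text);
            while (m.find()) {
                results.add(new Annotation(type, m.start(), m.end(), m.group()));
            }
        };
    }

    /** Runs each plugged-in analyzer, in order, over the same text. */
    static List<Annotation> run(String text, List<Analyzer> analyzers) {
        List<Annotation> results = new ArrayList<>();
        for (Analyzer analyzer : analyzers) {
            analyzer.process(text, results);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Analyzer> analyzers = Arrays.asList(
                regexAnalyzer("Date", "\\b\\d{4}-\\d{2}-\\d{2}\\b"),
                regexAnalyzer("Phone", "\\b\\d{3}-\\d{4}\\b"));
        for (Annotation a : run("Call 555-1234 on 2008-05-01.", analyzers)) {
            System.out.println(a);
        }
    }
}
```

Because every analyzer in this sketch shares one small interface and one result type, independently written modules can be composed without knowing about each other. That is the property the framework description relies on, shown here in a much simplified form.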
The UIMA SDK (http://incubator.apache.org/uima/) is a Java implementation of the UIMA framework. You can load your own UIMA-compliant text-analysis modules and run them inside InfoSphere Warehouse.