Nuxeo's open source Enterprise Content Management (ECM) platform gained two new modules this week. Both are designed to use semantic technologies to automate linked data services, and both are available from Nuxeo Marketplace, the company's repository of packages for Nuxeo EP-based products, which amounts to an “app store” of sorts for the ECM industry.
Nuxeo claims that the two new modules both use semantic technology to extend the body of information available on recognizable data entities in a content repository. They are available as optional extensions to Nuxeo Enterprise Platform (EP), or any of the products and frameworks based on Nuxeo EP, such as Nuxeo DM.
The “Automated Document Categorization” package allows any Nuxeo EP-based content application to automatically complete the metadata of a newly created document based on the textual content of the electronic file. The metadata, such as language, subject, and geographic coverage, includes elements from the Dublin Core metadata standard and from the Nuxeo application.
When a new document is added, its text is extracted and tokenized, and each token is counted so that advanced statistical analysis can suggest the most likely categories for the metadata fields.
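The tokenize, count, and suggest steps described above can be illustrated with a minimal Python sketch. It is not Nuxeo's implementation: the category term lists, the function names, and the simple share-of-matching-tokens score are all assumptions made for illustration, standing in for whatever statistical model the package actually uses.

```python
import re
from collections import Counter

# Hypothetical per-category term profiles (an assumption for this sketch);
# a real system would learn these from already-categorized documents.
CATEGORY_TERMS = {
    "finance": {"bank", "market", "shares", "profit"},
    "sports": {"match", "team", "goal", "season"},
    "technology": {"software", "platform", "data", "server"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def suggest_categories(text, top_n=2):
    """Rank categories by the share of tokens that match each profile."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return []
    scores = {
        category: sum(counts[term] for term in terms) / total
        for category, terms in CATEGORY_TERMS.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Only categories with at least one matching token are suggested.
    return [(category, score) for category, score in ranked[:top_n] if score > 0]


if __name__ == "__main__":
    sample = "The software platform stores data on a server. The team shipped it."
    for category, score in suggest_categories(sample):
        print(f"{category}: {score:.2f}")
```

In this toy version a suggestion is only as good as the hand-written term lists; the point is the shape of the pipeline, in which extracted text becomes token counts and the counts become ranked category suggestions for a metadata field.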
The “Semantic Linking” package provides a call to the open source Apache incubator project now known as “Apache Stanbol.” Nuxeo has been an active contributor to this open source, OSGi-based, RESTful semantic engine project, which was established under the Interactive Knowledge Stack (IKS) project and formerly known as “FISE.” This semantic service analyzes document text to find notable people, places, or organizations using DBpedia, an online reference knowledge base created from information extracted from Wikipedia.
The semantic engine identifies notable entities within the file text. An entity hub then enables access to related information, such as lists of other repository documents that reference the same entity, and descriptions and images from DBpedia. Entities can also be created manually, or linked manually from a document. News agencies, educational institutions, research firms, or any organization needing quick, accurate identification of known personalities, organizations, or places across large volumes of text will benefit from this packaged module.
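The entity hub idea, where documents that mention the same entity can be found through it, can be pictured as a two-way index. The Python sketch below is only an illustration under stated assumptions: the `EntityHub` class, its method names, and the document identifiers are hypothetical and do not reflect the Nuxeo or Stanbol APIs, and the entity URIs simply follow the DBpedia resource naming pattern.

```python
from collections import defaultdict


class EntityHub:
    """Toy two-way index between documents and the entities they mention."""

    def __init__(self):
        self._docs_by_entity = defaultdict(set)
        self._entities_by_doc = defaultdict(set)

    def link(self, doc_id, entity_uri):
        """Record that a document references an entity (automatic or manual)."""
        self._docs_by_entity[entity_uri].add(doc_id)
        self._entities_by_doc[doc_id].add(entity_uri)

    def documents_for(self, entity_uri):
        """List the documents that reference the given entity."""
        return sorted(self._docs_by_entity.get(entity_uri, set()))

    def related_documents(self, doc_id):
        """List other documents sharing at least one entity with doc_id."""
        related = set()
        for entity_uri in self._entities_by_doc.get(doc_id, set()):
            related |= self._docs_by_entity[entity_uri]
        related.discard(doc_id)
        return sorted(related)


if __name__ == "__main__":
    # Illustrative entity URIs and made-up document ids.
    paris = "http://dbpedia.org/resource/Paris"
    unesco = "http://dbpedia.org/resource/UNESCO"

    hub = EntityHub()
    hub.link("doc-1", paris)
    hub.link("doc-2", paris)
    hub.link("doc-2", unesco)
    hub.link("doc-3", unesco)

    print(hub.documents_for(paris))
    print(hub.related_documents("doc-2"))
```

Manual linking, as mentioned above, would use the same `link` call as automatic detection in this sketch; the index does not care where a link came from.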
“Nuxeo has been contributing heavily to this exciting initiative and is pleased to see it gain momentum with acceptance as ‘Stanbol’ under the Apache incubator program,” notes Eric Barroca, Nuxeo CEO. “This project is important in the evolution of content analytics, because it is available as open source, ensuring semantic linking capabilities can be embedded and used across a broad range of content-enabled applications either online or on-premise inside enterprises.”
Nuxeo Marketplace is a directory offering a range of packaged plug-ins, templates and applications created by Nuxeo developers, Nuxeo Galaxy partners, and customers.


