Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.



Nuxeo Modules for Semantic Linking and Auto-Categorization

Nuxeo's open source Enterprise Content Management (ECM) platform has been ratcheted up a notch this week with the availability of two new modules. Both are designed to use semantic technologies to automate linked data services, and both are available from the company's Nuxeo Marketplace, an EP-based repository that amounts to an ECM industry “app store” of sorts.

Nuxeo says both modules use semantic technology to extend the information available about recognizable data entities in a content repository. They are available as optional extensions to Nuxeo Enterprise Platform (EP), or to any of the products and frameworks based on Nuxeo EP, such as Nuxeo DM.

The “Automated Document Categorization” package allows any Nuxeo EP-based content application to automatically fill in the metadata of a newly created document based on the textual content of its electronic file. The metadata fields, such as language, subject, and geographic coverage, include elements from the Dublin Core metadata standard as well as fields defined by the Nuxeo application.

When a new document is added, its text is extracted and tokenized, and each token is counted. Statistical analysis of these counts then suggests the most likely categories for each metadata field.
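The article does not say which statistical model the module uses. As a rough illustration only, the extract, tokenize, count, and score pipeline could be sketched as a multinomial naive Bayes classifier trained separately for each metadata field (for example, one instance for the Dublin Core `subject` element and another for `coverage`). The class `NaiveBayesSuggester`, its methods, and the training sentences are hypothetical and are not part of any Nuxeo API.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesSuggester:
    """Hypothetical per-field category suggester: multinomial naive Bayes
    with add-one (Laplace) smoothing over token counts."""

    def __init__(self) -> None:
        self.token_counts: dict[str, Counter] = {}  # category -> token counts
        self.doc_counts: Counter = Counter()        # category -> training docs
        self.vocabulary: set[str] = set()

    def train(self, text: str, category: str) -> None:
        """Add one already-categorized document to the model."""
        tokens = tokenize(text)
        self.token_counts.setdefault(category, Counter()).update(tokens)
        self.doc_counts[category] += 1
        self.vocabulary.update(tokens)

    def suggest(self, text: str, top_n: int = 3) -> list[tuple[str, float]]:
        """Return up to top_n (category, log score) pairs, best first."""
        counts = Counter(tokenize(text))
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores: dict[str, float] = {}
        for category, cat_counts in self.token_counts.items():
            # Log prior: share of training documents in this category.
            score = math.log(self.doc_counts[category] / total_docs)
            cat_total = sum(cat_counts.values())
            for token, n in counts.items():
                if token not in self.vocabulary:
                    continue  # ignore tokens never seen in training
                # Smoothed log likelihood, weighted by the token's count.
                p = (cat_counts[token] + 1) / (cat_total + vocab_size)
                score += n * math.log(p)
            scores[category] = score
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:top_n]


if __name__ == "__main__":
    subject = NaiveBayesSuggester()
    subject.train("the striker scored a goal in the final match", "sports")
    subject.train("parliament passed the budget after a long vote", "politics")
    subject.train("the team won the match with a late goal", "sports")
    print(subject.suggest("a late goal decided the match", top_n=1)[0][0])  # sports
```

Naive Bayes is used here only because it maps directly onto the token counts the article describes; add-one smoothing keeps a token that never appeared with a category from driving that category's score to zero.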

The “Semantic Linking” package calls Apache Stanbol, an open source project in the Apache incubator. Nuxeo has been an active contributor to this OSGi-based RESTful semantic engine, which was established under the Interactive Knowledge Stack (IKS) project and was formerly known as “FISE.” The service analyzes document text to find notable people, places, or organizations, using DBpedia, an online knowledge base built from information extracted from Wikipedia, as its reference.
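The article does not show how the module invokes Stanbol. As an illustration only, a client of Stanbol's RESTful enhancement service might POST raw document text and ask for the results as JSON. The URL `http://localhost:8080/enhancer` and the `application/json` response type are assumptions based on later Stanbol releases; incubator-era builds such as the one described here may expose a different endpoint path, so check your own instance. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Any

# Assumption: a local Stanbol launcher on port 8080 with an /enhancer endpoint.
STANBOL_ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancement_request(
    text: str, url: str = STANBOL_ENHANCER_URL
) -> urllib.request.Request:
    """Build a POST request that sends plain text to the enhancement service."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=UTF-8",
            # Ask for the enhancement results serialized as JSON.
            "Accept": "application/json",
        },
        method="POST",
    )


def enhance(text: str, url: str = STANBOL_ENHANCER_URL, timeout: float = 30.0) -> Any:
    """Send the text and return the parsed JSON response.

    Requires a running server; raises urllib.error.URLError otherwise.
    """
    request = build_enhancement_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the HTTP details testable without a live server; `enhance("Paris is the capital of France.")` would return whatever entity annotations the configured engines produce.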

The semantic engine identifies notable entities within the file text. An entity hub then provides access to related information, such as lists of other repository documents that reference the same entity, along with descriptions and images from DBpedia. Entities can also be created manually or linked to a document by hand. According to Nuxeo, news agencies, educational institutions, research firms, and any other organization that needs quick, accurate identification of known people, organizations, or places across large volumes of text will benefit from this packaged module.
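To make the entity hub idea concrete, the lookup of “other repository documents that reference the same entity” can be sketched as an inverted index from entity URI to document IDs. This is a hypothetical in-memory model, not Nuxeo's or Stanbol's implementation; the `EntityHub` and `Entity` names are illustrative, and only the DBpedia resource URI format is real.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An entity known to the hub, identified e.g. by its DBpedia resource URI."""
    uri: str
    label: str
    description: str = ""


@dataclass
class EntityHub:
    """Hypothetical hub linking entities to the documents that mention them."""
    entities: dict[str, Entity] = field(default_factory=dict)
    docs_by_entity: dict[str, set[str]] = field(
        default_factory=lambda: defaultdict(set)
    )

    def register(self, entity: Entity) -> None:
        # Covers both engine-detected and manually created entities.
        self.entities.setdefault(entity.uri, entity)

    def link(self, doc_id: str, uri: str) -> None:
        # Used for automatic links and for links a user adds by hand.
        if uri not in self.entities:
            raise KeyError(f"unknown entity: {uri}")
        self.docs_by_entity[uri].add(doc_id)

    def related_documents(self, doc_id: str, uri: str) -> list[str]:
        """Other documents that reference the same entity as doc_id."""
        return sorted(self.docs_by_entity.get(uri, set()) - {doc_id})
```

Requiring `register` before `link` mirrors the article's distinction between creating an entity and linking it to a document; a real hub would also persist the index and cache the DBpedia descriptions and images.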

“Nuxeo has been contributing heavily to this exciting initiative and is pleased to see it gain momentum with acceptance as ‘Stanbol’ under the Apache incubator program,” notes Eric Barroca, Nuxeo CEO. “This project is important in the evolution of content analytics, because it is available as open source, ensuring semantic linking capabilities can be embedded and used across a broad range of content-enabled applications either online or on-premise inside enterprises.”

Nuxeo Marketplace is a directory offering a range of packaged plug-ins, templates and applications created by Nuxeo developers, Nuxeo Galaxy partners, and customers.
