Silicon Valley-based Diffbot Corporation is courting web-centric developers with its visual content and layout recognition (VCLR) technology. Diffbot provides an API that gives software a "visual understanding" of web content.
The company says it has "categorized the web" into approximately 20 different page types, which can then be visually analyzed using layout and contextual cues. Its developers say that this technology has been built to "perceive context" in the same way that humans do; i.e., understanding common page layouts (like headlines, bylines, and articles), contextual keywords, and content changes buried deep within pages.
The resulting proposition for developers is to build applications that call the Diffbot API to "follow" websites and reflect their changes in the application. In essence, it is a tool for aggregating web content and reworking it for personalization inside a developer's own application, which may present it in a new structure or pair it with a call to action.
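The idea of "following" a page can be illustrated with a minimal Python sketch. It does not call Diffbot or reproduce its method; it simply assumes an application already holds two text snapshots of the same page and uses the standard library's `difflib` to report which lines are new. The function name `added_lines` and the sample snapshots are hypothetical.

```python
import difflib
from typing import List


def added_lines(old_text: str, new_text: str) -> List[str]:
    """Return the lines that appear in new_text but not in old_text, in order."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff marks added lines with "+ ", removed lines with "- ",
    # unchanged lines with "  ", and intraline hints with "? ".
    return [line[2:] for line in diff if line.startswith("+ ")]


if __name__ == "__main__":
    # Hypothetical snapshots of a class web page, taken at two different times.
    before = "Assignments\nProblem set 1"
    after = "Assignments\nProblem set 1\nProblem set 2"
    for line in added_lines(before, after):
        print("New content:", line)
```

An application built along these lines would fetch a fresh snapshot on a schedule and notify the user whenever the returned list is not empty.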
Diffbot cofounder Mike Tung said that he came up with the idea while at college when he realized that he wanted to be instantly notified when new assignments were posted on class websites.
"Diffbot is an incredibly sophisticated tool for developers to rapidly build compelling applications around web content," said Sky Dayton, founder of EarthLink and Boingo, and investor in Diffbot. "The more developers use Diffbot, the more it learns about and adds structure to data on the web. This technology is becoming the basis for a new kind of web experience enhanced by machine interpretation of content."
Diffbot applies natural language processing, cross-referencing content against Wikipedia, to determine relevance by context and deliver keyword tags. For example, Diffbot can determine that an article about Barack Obama is related to "politics" even though the word doesn't appear in the article, or that an article about a new computer is about Apple the technology company and not apple the fruit.
Diffbot's technology consists of two types of APIs: one for headline and article content, including pictures, and one to follow changes or updates made to any web page. Diffbot allows developers to build applications that can extract and analyze information displayed on an article page; understand keywords and phrases in the context of the larger article; and generate tags to allow developers to categorize, sort, and personalize content. It can also generate an RSS feed, enabling an application to follow anything on the Internet.
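As a rough illustration of how an application might consume an article-extraction API of this kind, the Python sketch below builds a request URL and summarizes a JSON response. Everything specific in it is an assumption made for illustration: the endpoint `https://api.example.com/article` is a placeholder rather than a Diffbot URL, and the `token` and `url` parameters and the `title`, `text`, and `tags` response fields are hypothetical. The vendor's own documentation is the authority on the real interface.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoint for illustration only; this is NOT a real Diffbot URL.
ARTICLE_ENDPOINT = "https://api.example.com/article"


def build_article_request(page_url: str, token: str) -> str:
    """Build a GET URL asking the hypothetical service to extract one page."""
    # "token" and "url" are assumed parameter names.
    query = urlencode({"token": token, "url": page_url})
    return f"{ARTICLE_ENDPOINT}?{query}"


def summarize_article(response_body: str) -> dict:
    """Pick out headline, text length, and sorted tags from an assumed JSON payload."""
    data = json.loads(response_body)
    return {
        "headline": data.get("title", ""),
        "text_length": len(data.get("text", "")),
        "tags": sorted(data.get("tags", [])),
    }


if __name__ == "__main__":
    print(build_article_request("http://example.com/story", "DEMO_TOKEN"))
    # Hand-written sample payload in the assumed shape, not real API output.
    sample = '{"title": "Sample headline", "text": "Body text.", "tags": ["politics"]}'
    print(summarize_article(sample))
```

Keeping URL construction separate from response handling lets the parsing logic be exercised with canned payloads, without any network access.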