New dtSearch Release Expands Proprietary Document Filters


dtSearch Corp, a developer of text retrieval software, has released version 7.72 of its core product line, which now includes expanded proprietary document filters.

For customers who need only data parsing, conversion, and extraction, the dtSearch Engine also offers the document filters for separate OEM licensing. The Engine provides native 64-bit and 32-bit APIs for C++ on Windows and Linux, Java, and .NET (through current versions).

The document filters support a range of data formats, including static web content such as HTML, XML/XSL, and PDF, with integrated image and text support. Through the dtSearch Spider, they also cover dynamic web content such as PHP, ASP.NET, and SharePoint, again with integrated image and text support.

The dtSearch Engine APIs cover SQL-type databases, including the full text of BLOB data, and all products support Access, XBASE, XML, CSV, and similar formats. Microsoft Office documents are also supported, as are emails and email attachments in MS Exchange, Outlook (PST/MSG), Thunderbird (MBOX/EML), and other popular email formats, including nested email attachments.

For all supported formats, the document filters support data parsing and optional extraction, as well as conversion to HTML for browser display with highlighted hits.

dtSearch enterprise and developer products can index over a terabyte of data in a single index, spanning multiple directories, emails and attachments, online data, and other databases. The products can create and search any number of indexes.

Indexed search time is typically less than a second, even across terabytes of data. The product line also supports highly concurrent, multithreaded searching.

On the developer side, the dtSearch Engine for Win & .NET and the dtSearch Engine for Linux SDKs provide dtSearch's instant searching and document filters for a wide range of Internet, intranet, and other commercial applications. The document filters are available either bundled with searching or under a separate license. The SDKs include native 64-bit and 32-bit C++, Java, and .NET (through current versions) APIs.
