Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

APIs for Image-Capture Applications

Online image-scanning applications continue to grow in importance. More organizations need to digitize documents for recordkeeping, safekeeping, and archiving, among other reasons, and as the practice grows, so does the demand for more flexible image capture. A useful image-capture application should provide simple editing and upload features. It should also work with as many complementary devices as possible: scanners, digital cameras, capture cards, webcams, and so on. Developers building image-capture applications can choose among several APIs, for example TWAIN, Windows Image Acquisition (WIA), and DirectShow. All three of these APIs sit between applications and digital devices.

Overview of APIs

TWAIN, created by the nonprofit TWAIN Working Group, supports devices such as scanners and digital cameras and is available on operating systems including Microsoft Windows, Mac OS X, and Linux. TWAIN is designed primarily for C/C++ development.

Figure 1: How TWAIN works between image acquisition applications and devices.

WIA is a Microsoft driver model and API for Microsoft Windows. It has been around since Windows Me and also works with later versions of Windows. In Windows Me, WIA enabled graphics software to communicate with imaging hardware such as scanners, digital cameras, and digital video equipment. Since that release in 2000, Microsoft has steadily added features, including OLE integration. Starting with Windows Vista, however, WIA has been targeted more narrowly at scanners. Like TWAIN, WIA is designed primarily for C/C++ development.

Figure 2: How Microsoft WIA works.

DirectShow is a multimedia framework and API produced by Microsoft. Software developers can use it to perform various operations on media files and media streams. DirectShow replaces Video for Windows (VFW), including its Video Compression Manager (VCM). Most webcams, including FireWire cameras, support the DirectShow interfaces. Note that although USB Video Class (UVC) cameras have the largest market share, FireWire cameras still occupy an important place in certain segments: in security or industrial applications, for example, users tend to prefer FireWire cameras to USB cameras.

DirectShow supports many file types, including Advanced Systems Format (ASF), Windows Media Audio (WMA), Windows Media Video (WMV), AIFF, AU, Audio-Video Interleaved (AVI), MIDI, SND, and WAV. DirectShow is designed primarily for C++ development.

Comparing APIs

A popular misconception about TWAIN is that it is too old for modern scanning. Although it was first developed in 1992, TWAIN is actually a sophisticated API. Not only is it portable across many operating systems, but it also lets device vendors create a customized user interface for each driver. In contrast, WIA uses a common user interface for all devices. TWAIN has three transfer modes (native, memory, and file), while WIA has only two (memory and file). Additionally, WIA provides a TWAIN compatibility layer that allows TWAIN-aware applications to communicate with WIA devices.
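To make the memory transfer mode more concrete: in TWAIN's buffered-memory mode, the source delivers the image as a series of strips (blocks of rows) rather than as one block, and the application reassembles them. The C++ sketch below models only that bookkeeping, assuming uncompressed data with a fixed row stride. `MemoryStrip` and `assembleStrips` are hypothetical names; real TWAIN code receives comparable fields (rows, bytes per row, row offset) through the `TW_IMAGEMEMXFER` structure defined in the TWAIN headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one strip from a buffered-memory transfer.
// A real TWAIN source reports similar fields in TW_IMAGEMEMXFER.
struct MemoryStrip {
    std::size_t yOffset;             // index of the first image row in this strip
    std::size_t rows;                // number of rows in this strip
    std::size_t bytesPerRow;         // row stride used by the source
    std::vector<std::uint8_t> data;  // rows * bytesPerRow bytes of pixel data
};

// Copies each strip into its position in a full-frame buffer.
// Assumes uncompressed strips that all share the final image's stride.
std::vector<std::uint8_t> assembleStrips(const std::vector<MemoryStrip>& strips,
                                         std::size_t imageRows,
                                         std::size_t bytesPerRow) {
    std::vector<std::uint8_t> image(imageRows * bytesPerRow, 0);
    for (const MemoryStrip& strip : strips) {
        assert(strip.bytesPerRow == bytesPerRow);
        assert(strip.yOffset + strip.rows <= imageRows);
        assert(strip.data.size() == strip.rows * strip.bytesPerRow);
        auto dest = image.begin() +
                    static_cast<std::ptrdiff_t>(strip.yOffset * bytesPerRow);
        std::copy(strip.data.begin(), strip.data.end(), dest);
    }
    return image;
}
```

Because each strip carries its own row offset, the sketch tolerates strips that arrive out of order. A production implementation would also need to handle compressed strips, which this sketch does not.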

Thanks to its flexibility and features, TWAIN is typically an ideal choice for scanner applications. For webcams, however, WIA or DirectShow is more appealing because they support a wider range of devices. The newer the device, the more likely its vendor is to support WIA or DirectShow rather than TWAIN.
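The rule of thumb above can be expressed as an ordered preference list with a fallback. The following C++ sketch is a hypothetical illustration, not part of any of the three APIs: `DeviceKind`, `CaptureApi`, `preferenceOrder`, and `pickApi` are invented names, and the preference order simply encodes the guidance in this section. A real application would discover which APIs a device supports by enumerating its installed drivers.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical device and API categories for illustration only.
enum class DeviceKind { Scanner, Webcam };
enum class CaptureApi { Twain, Wia, DirectShow };

// Preference order following the rule of thumb in the text (an assumption,
// not a standard): TWAIN first for scanners, DirectShow first for webcams,
// with WIA as the fallback in both cases.
std::vector<CaptureApi> preferenceOrder(DeviceKind kind) {
    if (kind == DeviceKind::Scanner) {
        return {CaptureApi::Twain, CaptureApi::Wia};
    }
    return {CaptureApi::DirectShow, CaptureApi::Wia};
}

// Sets `chosen` to the first preferred API that the device's driver supports.
// Returns false if none of the preferred APIs is available.
bool pickApi(DeviceKind kind, const std::set<CaptureApi>& driverSupports,
             CaptureApi& chosen) {
    for (CaptureApi api : preferenceOrder(kind)) {
        if (driverSupports.count(api) != 0) {
            chosen = api;
            return true;
        }
    }
    return false;
}
```

For example, a webcam whose vendor ships only a WIA driver falls back to WIA, while a scanner that exposes neither TWAIN nor WIA yields no candidate.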

DirectShow is widely regarded as one of Microsoft's most complex libraries. Anyone who tries to work directly with its core interfaces will find them difficult to learn, because doing so requires a solid grasp of intricate technologies such as COM. For this reason, many developers who choose DirectShow turn to third-party SDKs, which can save a lot of work and accelerate development and integration.
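The reference-counting contract at the heart of COM is a large part of what makes raw DirectShow code hard to get right: every interface pointer you obtain must eventually be released exactly once. The portable C++ sketch below mimics that contract so it can run outside Windows. `RefCounted`, `ComHolder`, and `FakeFilter` are hypothetical stand-ins, not Windows SDK types; real DirectShow code calls `IUnknown::AddRef` and `IUnknown::Release`, and typically relies on a smart pointer such as ATL's `CComPtr` to automate the releases, much as `ComHolder` does here.

```cpp
#include <cassert>
#include <utility>

// Portable imitation of the IUnknown reference-counting contract.
// Not a Windows SDK type; real COM objects expose AddRef/Release via IUnknown.
class RefCounted {
public:
    unsigned long AddRef() { return ++refs_; }

    unsigned long Release() {
        assert(refs_ > 0 && "Release called more times than AddRef");
        unsigned long remaining = --refs_;
        if (remaining == 0) {
            delete this;  // the object frees itself when the last reference goes
        }
        return remaining;
    }

    static int liveObjects;  // demo instrumentation: objects not yet destroyed

protected:
    RefCounted() { ++liveObjects; }
    virtual ~RefCounted() { --liveObjects; }

private:
    unsigned long refs_ = 1;  // a newly created object starts with one reference
};

int RefCounted::liveObjects = 0;

// RAII holder in the spirit of ATL's CComPtr: it releases its reference
// automatically when it goes out of scope.
template <typename T>
class ComHolder {
public:
    // Adopts a reference the caller already owns (no extra AddRef).
    // Note: CComPtr's raw-pointer constructor calls AddRef instead.
    explicit ComHolder(T* p = nullptr) : p_(p) {}

    // Copying shares the object and takes an additional reference.
    ComHolder(const ComHolder& other) : p_(other.p_) {
        if (p_) p_->AddRef();
    }

    // Copy-and-swap: the previously held pointer is released when `other` dies.
    ComHolder& operator=(ComHolder other) {
        std::swap(p_, other.p_);
        return *this;
    }

    ~ComHolder() {
        if (p_) p_->Release();
    }

    T* operator->() const { return p_; }
    T* get() const { return p_; }

private:
    T* p_;
};

// Hypothetical object standing in for a DirectShow filter.
class FakeFilter : public RefCounted {};
```

Forgetting a single `Release` leaks the object, and calling it once too often destroys an object that is still in use. This bookkeeping is exactly what smart pointers and DirectShow SDKs take off the developer's hands.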

A simple online search turns up several third-party development tools, including both open-source and commercial solutions.

Getting Started in Development Without an SDK

SDKs can significantly cut development time, but some folks will still choose to start from scratch. The benefit of going this route is greater flexibility in deciding which capabilities to include or exclude. When developing from scratch, note the following:

  • TWAIN: The TWAIN architecture consists of four layers: application, protocol, acquisition, and device. Developers get their applications to communicate with scanners and/or other devices through the protocol layer. You can find more information about all the available capabilities at the TWAIN website.
  • DirectShow: If you are planning to build a DirectShow application from scratch, you'll need at least basic knowledge of C++/COM programming.
  • WIA: WIA uses the Windows Driver Model (WDM) architecture. Application developers can use the WIA API to communicate with WIA-compliant devices whose drivers are already installed on Windows.
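On the TWAIN side, much of what the protocol layer enforces is a session state machine: the TWAIN specification defines seven states, from Pre-Session (state 1) to Transferring (state 7), and each operation is valid only in certain states. The C++ sketch below tracks those states with deliberately simplified transition rules (one step up or down at a time, plus the return from Transferring to Source Enabled once the last pending image has been transferred). `TwainSession` and `isLegalTransition` are hypothetical helpers, not TWAIN API calls; the sketch never talks to a Source Manager and does not model every transition the specification allows.

```cpp
#include <cassert>

// The seven session states named in the TWAIN specification.
enum class TwainState {
    PreSession = 1,
    SourceManagerLoaded,
    SourceManagerOpen,
    SourceOpen,
    SourceEnabled,
    TransferReady,
    Transferring
};

// Simplified rule: move one state up or down at a time, or jump from
// Transferring back to SourceEnabled after the final pending transfer.
bool isLegalTransition(TwainState from, TwainState to) {
    int a = static_cast<int>(from);
    int b = static_cast<int>(to);
    if (b == a + 1 || b == a - 1) {
        return true;
    }
    return from == TwainState::Transferring && to == TwainState::SourceEnabled;
}

// Tracks the current state and refuses out-of-order operations.
class TwainSession {
public:
    TwainState state() const { return state_; }

    bool moveTo(TwainState next) {
        if (!isLegalTransition(state_, next)) {
            return false;  // e.g. enabling a source that was never opened
        }
        state_ = next;
        return true;
    }

private:
    TwainState state_ = TwainState::PreSession;
};
```

Wrapping an application's TWAIN calls in a tracker like this during development is one way to catch out-of-order calls early, such as trying to open a data source before the Source Manager is open.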

In addition, take advantage of forums such as the TWAIN forum, Stack Overflow, and MSDN for tips on getting started and avoiding pitfalls. Whether you use an SDK or develop from scratch, plenty of openly available knowledge can help you create your own image-capture applications. And as more organizations turn to digitization for document management, the need for these applications will continue to grow.

Catherine Sea is the customer service manager for Dynamsoft. She has also been a consultant helping programmers develop document-processing applications.
