Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Web Development

Building a Smart Online Video Application


Dr. Dobb's Journal December 1997: Building a Smart Online Video Application

Robin is a program manager at a major systems integration company where he designs multimedia software. He can be contacted at [email protected].


Many of today's applications involve Internet protocols, MPEG-1 video, JPEG images, ActiveX components, Netscape plug-ins, and other hard-to-master programming paradigms. Each of these technologies is complex, making it a challenge to build real-time systems that integrate all (or even some) of them.

In this article, I'll describe a new type of application I call "Smart VCR" (SVCR), which integrates these disparate technologies. SVCR is a video-recording technology that enables real-time searches of broadcast television, thereby combining the ubiquity of existing TV channels with the convenient interactive interface of a web browser. An SVCR watches TV for you, and can send a notification to you by e-mail or pop up a window when it finds a topic that is interesting to you.

SVCR captures the closed-captioned text that is encoded in most U.S. television broadcasts and converts that text into program transcripts formatted in HTML. As it scans the closed-captioned text, SVCR searches for clues as to when video segments start and end. In the process, it stores the closed-captioned text into a real-time database and snaps JPEG thumbnail images of the streaming video to become icons and filmstrips. Finally, it captures MPEG-1 video clips at a nominal data rate of about ten MB per minute.

The result is a web page (like Figure 1) that lists TV news stories in chronological order, with links to the full transcripts, filmstrips of the video images, and hyperlinks to the actual video clips. A Netscape plug-in or CGI program provides a browser interface to the transcript database, enabling users to keyword search the transcripts, retrieving associated images and video.

SVCR Overview

While an SVCR could be implemented in many ways, my design has four major hardware components:

  • A Pentium-based PC running Windows 95.
  • An MPEG-1 encoder card.
  • A JPEG encoder device.
  • A closed-captioned decoder box.

Although its name suggests otherwise, an SVCR isn't necessarily a VCR. Over the past several years, the video post production industry has been moving away from video tape to nonlinear editing systems such as the Avid (http://www.avid.com/) and Media 100 (http://www.media100.com/). One advantage of nonlinear systems is their ability to randomly access video clips. It's like the difference between finding a song on a cassette tape and finding one on a CD -- you never need to wait for a nonlinear system to rewind or fast forward. In building an SVCR, my approach was to go digital, recording video straight to disk using an MPEG encoder board.

In the interest of real-time performance, all encoding and decoding operations are hardware assisted. Using commercial off-the-shelf hardware, I wrote C++ code to interface with vendor-supplied Windows COM and OCX objects that manipulate the hardware encoders. A separate Windows thread listens to the closed-captioned text as it is delivered on the RS-232 port.

SVCR runs in real time either stand-alone in Windows 95 or on a distributed networked heterogeneous environment. It is both multiprocess and multithreaded. Because Windows 95 supports a variety of commercial graphics hardware, the code to handle digitizing video content was written for that operating system, using MFC with components built in both Symantec C++ and Microsoft Visual C++. To distribute the load of actually serving the video, parts of SVCR were ported to DEC Alpha and Sun Sparc systems, and compiled with GNU g++. The supported client platforms for viewing captured video include Windows 95/NT, Macintosh, and the UNIX systems of DEC, Sun, HP, and SGI.

Additionally, a database system was written to optimize for minimum latency on writes. The goal was to maintain real-time capture performance on a Pentium 133. For improved performance on user queries, the data may be optionally saved across an Ethernet network using SAMBA to a UNIX box (such as a DEC AlphaStation Server running OSF) rather than to the local disk. (SAMBA is a suite of programs that lets clients access a server's filespace and printers via SMB protocol; for more information, see http://samba .canberra.edu.au/pub/samba/.)

Enter the Culling Agent

Because digital video can consume massive amounts of disk space in a short amount of time (MPEG-1 consumes about two GB every three hours), a "culling agent" periodically removes video that isn't of interest; see Figure 2. Without a culling agent, you must create a large central repository for housing row upon row of RAID drives. Even if you are willing to store it that way, delivering full-quality MPEG-1 video requires more network bandwidth than is available on the Internet today. Collecting the video content through the existing broadcast channels and saving it locally sidesteps this bandwidth bottleneck. (The video repository approach is being researched through the DARPA-sponsored Digital Library Initiative; see http://www.aero.hq.nasa.gov/hpcc/cdrom/content/reports/annrpt96/iita/JRI.htm.)

Why not limit the recording of video segments at the time of capture, instead of culling later? Because this would return us to the same problem a human has when deciding whether or not to record a program. It doesn't usually become apparent that it is an interesting segment until it is already half over. Even if such aggressive culling was possible, it wouldn't be desirable because it leaves no window for users to change their minds about what's worth saving.

A Serial Port Thread in MFC

Each frame of video that is closed captioned contains two text characters; even if they are just nulls (padding). The Telescriber closed-captioned decoder box is an off-the-shelf device primarily intended for the hearing impaired (see http://www .viewcomtech.com/). The Telescriber reads the closed captioning hidden in the retrace interval of line 21 of the video signal and outputs it into a standard serial port. The first step in developing SVCR was to integrate this device by writing my own program to listen to it.

The closed-captioned text transmitted in a TV signal isn't ASCII, but its own code that is set down by FCC rule 91-119, mod 92-157. (It can be found in the Code of Federal Regulations Title 47, Part 15, Section 119.) Stripping off the top bit of closed-captioned text converts it to a rough approximation of ASCII. It's only an approximation because the closed-captioned text contains formatting codes in addition to data. It also has a somewhat different symbol set; for example, a musical note to represent singing.

To monitor the serial port in Windows, you start a separate thread of execution devoted to this task. (Code to control the serial port in Windows is widely available. See, for instance, Programming Windows 95 Unleashed, edited by Randall A. Tamura, Sams, 1995.) The fundamental approach is to use a CWinThread in conjunction with OVERLAPPED event objects.

In Windows 95, the serial port is opened using CreateFile() and treated as an asynchronous file. Windows NT supports asynchronous files (which read or write in the background), but Windows 95 does not. Windows 95 only supports them if the "file" is actually a port. The serial port thread fills a buffer with data while the main thread empties it. The serial thread notifies the main thread that data is waiting by using view->PostMessage().

Using COM, OLE, and OCX

SVCR talks to two other hardware devices -- a Snappy JPEG image-capture device (http://www.play.com/), which hangs off the parallel port, and an MPEGator MPEG-1 encoder card (http://www.darim .com/) installed in a PCI slot. The Snappy includes an API controlled through an OCX. The MPEGator is controlled through a COM interface. Although an OCX is a type of COM, programming to these two interfaces is very different.

COM is the underlying mechanism that allows Windows applications to interface with objects that are external to a program. To a C++ programmer, this means that you get a pointer to an object in an external program, with all the power that implies. ActiveX and OCX controls are standard interfaces that use COM. Similar in concept to abstract base classes, these interfaces are the methods that a particular type of COM class must provide.

ActiveX was formerly known as OLE, and you will still find references to OLE in documentation. The OCX interface evolved from the VBX control. It's a COM interface intended for use by Visual Basic, but available to other languages, too, since it is just another kind of COM. ActiveX is a more lightweight interface than OCX (and is, consequently, more popular for writing ActiveX controls that will be downloaded via the Internet).

Look Ma, No Libs!

Not long ago, to communicate with a vendor's proprietary hardware, you made some calls into its programming API and linked your code with their C library, which implemented low-level control of their device. More recently, an enhancement came in the form of Windows Dynamic Link Libraries (DLLs). You don't have to link statically anymore because you can load the (vendor's) DLL at run time. The Windows COM interface, of which ActiveX and OCX are examples, goes a step further. The COM mechanism lets you retrieve the vendor's library as though it was a C++ object. The COM object is actually an executable or a DLL, but that detail is hidden.

You can retrieve a COM object either by its CLSID number (its unique Windows ID number generated at compile time) or by name (using the Windows registry). The compiler's class wizard generates MFC code to make this operation invisible to the programmer.

Since ActiveX is smaller than OCX, it would be preferable in a real-time program to use ActiveX. Snappy offers a COM interface, but that only handles raw bitmaps. Capturing a JPEG image is part of their OCX interface. It was a further disappointment that Symantec C++ 7.2 (the Windows C++ compiler I first used) doesn't support controlling an OCX. This was confusing because Symantec C++ does support creating an OCX.

As a workaround, I created the code to control the Snappy OCX using VC++ and wrappered it inside a new ActiveX control. I controlled that from my Symantec C++-compiled application. After Symantec C++ 7.5 came out, which included support for the latest version of MFC, I was able to migrate the VC++ code back into the main application. I now build everything with both VC++ and Symantec C++ so I can use either vendor's tools.

Working with an OCX or ActiveX control is almost trivial with C++ compiler support. Using the class wizard, the compiler generates a class with the code to handle the COM communication. Using an object of this class, you can drive the external control as you would if it was an object compiled into your own code.

The MPEGator is a popular MPEG-1 video-capture card. A high-quality MPEG-1 datastream consumes ten MB per minute, a low-resolution stream about two MB per minute. The MPEGator also supports AVI, but that consumes more space for the same quality. Listing One is a simple test program that saves a video clip.

Although Listing One is a short test program, it is complete. The implementation of the Start() and Stop() methods provides the basic functionality to control recording MPEG clips. Listing Two is a header file that contains a few simple utility routines, while the implementation (mpeg.h) is presented in Listing Three.

Database Design

I initially considered using an off-the-shelf database such as Excite (http://www.excite.com/), which is designed for indexing web pages. However, none of the available database engines were suitable for a real-time system. Without a real-time database, captured segments would not become immediately available. They would have to wait for the database to index them. The benefit of a custom real-time database is that access to stories is only limited by the time lag of system buffers in saving the data. Stories become available for search and retrieval almost immediately, even while still in the process of being captured.

Commercial databases are designed for generic uses, not for optimal speed on writes. Rather than use complicated B-trees or object databases, the closed-captioned text database is based on simple flat files -- it appends text and stores the length of each story. Conceptually, this is a simple design, but it is more difficult to edit data later when culling. Since the culling process has no real-time constraints, this limitation poses no significant problem. The alternative design, of myriad tiny files to contain each story, would have degraded keyword search performance.

Consequently, I built a custom search engine to scan through the closed-captioned text database looking for the specified word or phrase. As they are captured, the stories are numbered (1, 2, 3, and so on). The search function returns the number of the matching story so the story itself can be easily retrieved. The search engine was designed as a CGI script -- a program that returns data through a web server.

Creating a C++ CGI Program

There is considerable mystique about CGI and a common misconception that C++ is ill-suited to it. However, as Listing Four illustrates, it is straightforward to write a CGI program in C++.

For a CGI program, you simply output what you want the web browser to see using cout. Of course, it's a little difficult because your output is formatted as HTML code. You must also remember to output the proper "Content-Type" at the beginning, followed by two newlines. (Failing to do so may cause unpredictable errors from different web servers.) Under Windows, don't forget to compile your CGI as a console application.

As Listing Five shows, it's slightly more difficult (compared to Listing Four) to interact with an HTML form to get input from the user. Listing Six, the HTML for the form, instructs the web server that the program form_cgi should be executed in response to a user pressing Enter, and that the output of that program is what should be returned to the user's browser. You must, of course, have the web browser properly configured and the form_cgi executable in the web server's cgi-bin directory.

Although it may not seem like much, this (complete) example shows everything you need to know to work with a database search engine through a web interface. Instead of just displaying the query string back to the user as in this example, you would pass the query string through using a call into your database engine, then output the returned data to cout.

For a single-field input like this, you can use GET data. The field data is provided to you by the web server in the QUERY_STRING environment variable. More complicated forms require POST data, which is read off cin.

Porting to a Netscape Plug-In

Setting up a web server can be tricky, especially if you've never done it before. Each server has its own quirks, and there are hundreds of different servers to choose from on the various platforms. Don't forget that you must have TCP/IP networking set up properly first. That, of course, requires that your networking hardware is correctly installed. Shipping SVCR with only the CGI version of the search engine would be asking for a support nightmare. So, a Netscape plug-in that would stand alone was created -- no web server required.

A plug-in lets you extend Netscape, usually for the purpose of writing an inline viewer of a new file format. The Netscape plug-in API supports Windows, Macintosh, and UNIX platforms. Under Windows, plug-ins are DLLs. The power of a plug-in can be awesome. They can be called by a Java method. Once inside the C++ code of your plug-in, you are no longer under the normal restrictions of the Java applet security model. You are running in native code in Windows. Further, you can call back from C++ into Java, manipulating its GUI or data. What we're talking about is a Java/C++ mixed-language environment. You get the performance and power of C++ along with the portable GUI of Java -- the best of both worlds.

Designing Netscape plug-ins feels like being in a lost world. There is a passing similarity to MFC, since plug-ins are a kind of framework. However, that's where the similarity to any normal C++ code ends. The Netscape plug-in developer's kit (available at http://www.netscape.com/) includes boilerplate code for a typical plug-in and the javah compiler. The javah compiler writes C/C++ wrappers for the methods in Java libraries. It isn't a compiler really, but a code generator. It works for both standard Java classes and for classes you create.

Although the implementation is completely different, using javah is conceptually similar to working with COM. It gives you a way to get a pointer to a Java object and manipulate it as though it was a C++ object.

Figure 3 is a web page with both a plug-in and a Java applet. The plug-in is embedded in the area where it says "Video Server Search." The string is actually being displayed by the plug-in DLL using a Windows call. You can display whatever you want in the plug-in's window using the normal Windows API calls.

The "Search:" string and its data entry field are a Java applet, not an HTML form. When users enter a string into the field, the applet calls into the plug-in, which actually does the search and writes an HTML page (a file on the local file system). Finally, the applet loads that page into the browser using ShowDocument(). The HTML code in Listing Seven is virtually the same for any HTML code that uses a Java-aware plug-in. There are two differences from simple HTML that allow Java to talk to the plug-in: The plug-in is actually embedded into the page by EMBED. You can't just new the Java class that has the native methods implemented in a plug-in. It has to be embedded in the page. The second significant point is the magic MAYSCRIPT tag. The method for calling a plug-in object out of a page is to ask JavaScript for a handle to the plug-in. Even though there isn't any JavaScript code in the page, the applet needs the MAYSCRIPT tag to be allowed to call the JavaScript interpreter and use JSObject.

Instead of using new, you use code similar to Listing Eight, which fetches the JavaScript window, the document in the window, and the plug-in in the document. The name of the plug-in is the same as what it was called in the EMBED clause. Once the plug-in is retrieved it works like any other Java object. You call its native Search method, which hands off all the real work to your DLL. All of this code is just the normal plumbing necessary to use Java and C++ together in Netscape.

Conclusion

Anyone who wants to be able to produce a quick analysis of breaking news is a potential user of an SVCR. In addition to military and financial analysts, television broadcasters or film libraries wanting to search their own content or repurpose it for the Web are possible users, as are news clipping services.

As the prices of multimedia hardware continue to tumble, I wouldn't be surprised if every new PC has the necessary hardware built in to support an SVCRwithin a few years. This could simplify development. We had a surprise recently when the Telescriber ceased production, necessitating migrating to a different closed-caption decoder.

We are starting to demonstrate prototypes at trade shows (such as AFCEA and DVExpo). Early adopters of our SVCR will include a top military command center and a major Wall Street brokerage house. Future enhancements will take us beyond U.S. television news monitoring to indexing other types of video information, both with and without closed captioning. To do that, we must add support for speech recognition and image understanding.


Listing One

 // test_mpeg.cpp: MPEGator test program #include "windows.h"
 #include "../mpegator/mpeg.h"
 #include "test_mpeg.h"
 int PASCAL WinMain(HINSTANCE iCur, HINSTANCE iPrev, LPSTR lpCmdLine, int
 nCmdShow )
 {       const char* filename="test.mpg";
         if(!MsgBox("Start Capture",filename,MB_OKCANCEL))
         {       return 0;
         }
         MPEG mpeg;
         if(!mpeg)
         {       MsgErrorBox(mpeg.ErrorMsg());
                 return 1;
         }
         if(!mpeg.Open(filename))
         {       MsgErrorBox(mpeg.ErrorMsg());
                 return 1;
         }
         MsgBox("Recording",
                 "Press button to stop",
                 MB_OK|MB_ICONEXCLAMATION);
         mpeg.Stop();
         MsgBox("Finished","Done",MB_OK);
         return 0;
 }

Back to Article

Listing Two

 // test_mpeg.h: message boxes

</p>
 #ifndef TEST_MPEG_H
 #define TEST_MPEG_H


</p>
 inline
 int MsgBox(const char* title,const char* string,UINT
 style=MB_OK|MB_ICONQUESTION)
 {       return MessageBox(NULL,string,title,style)!=IDCANCEL;
 }
 inline
 int MsgErrorBox(const char* string)
 {       return MessageBox(NULL,
         string,
         "Error",
         MB_OK|MB_ICONERROR)!=IDCANCEL;
 }
 #endif

Back to Article

Listing Three

 // mpeg.h: encapsulated MPEGator control

</p>
 #ifndef MPEG_H
 #define MPEG_H
 #include <objbase.h>
 #include <initguid.h>
 #include "inc/mtrif.h"
 #include "inc/mtruid.h"


</p>
 class Ole
 {public:
         void Load()
         {       CoInitialize(NULL);
         }
         void Unload()
         {       CoUninitialize();
 }       };
 class MPEG
 {       IMtrCapture* mpeg;
         const char* errorMsg;
         enum {len=80};
         char buffer[80];
         Ole ole;
         void GetErrorMsg()
         {       if(mpeg)
                 {       mpeg->GetLastError(buffer,len-1);
         }       }
         int GetOLE()
     {   HRESULT status = CoCreateInstance(
                 CLSID_MtrMe,
                     NULL,
                 CLSCTX_SERVER,
                 IID_IMtrCapture,
                     (void**)&mpeg);
         return !FAILED(status);
     }
 public:
         operator!() const
         {       return !mpeg;
         }
         MPEG()
         {       mpeg=0;
                 errorMsg="CoCreateInstance failed";
                 ole.Load();
                 if(!GetOLE())
                 {       mpeg=0;
                         return;
                 }
                 if(mpeg->Open()!=S_OK)
                 {       GetErrorMsg();
                         mpeg->Release();
                         mpeg=0;
         }       }
         ~MPEG()
         {       Unload();
         }
         void Unload()
         {       if(!mpeg)
                 {       return;
                 }
                 Stop();
                 mpeg->Close();
                 mpeg->Release();
                 mpeg=0;
                 ole.Unload();
         }
         const char* ErrorMsg() const
         {       return errorMsg;
         }
         int Open(const char* filename)
         {       if(!mpeg)
                 {       return 0;
                 }
                 if(mpeg->SetFileName((char*)filename)!=S_OK)
                 {       errorMsg="SetFileName failed";
                         return 0;
                 }
                 if(mpeg->OpenStream()!=S_OK)
                 {       errorMsg="OpenStream failed";
                         return 0;
                 }
                 return 1;
         }
         void Start()
         {       if(!mpeg)
                 {       return;
                 }
                 mpeg->Start();
         }
         void Stop()
         {       if(!mpeg)
                 {       return;
                 }
                 mpeg->Stop();
                 mpeg->CloseStream();
         }
 };
 #endif

Back to Article

Listing Four

 // hello_cgi.cpp #include <iostream.h>


</p>
 int main()
 {   cout<<"Content-Type: text/html\n\n"
           "<HTML><HEAD><TITLE>Hello cgi</TITLE></HEAD>\n"
           "<BODY><H2>Hello World!</H2>"
           "</BODY>\n"
           "</HTML>"<<endl;
     return 0;
 }

Back to Article

Listing Five

 // form_cgi.cpp #include <iostream.h>
 #include <stdlib.h>
 int main()
 {   const char* data=getenv("QUERY_STRING");
     if(!data)
     {   data="NULL";
     }
     cout<<"Content-Type: text/html\n\n"
           "<HTML><HEAD><TITLE>CGI TEST</TITLE></HEAD>\n"
           "<BODY><H2>CGI TEST</H2>"
           "QUERY_STRING=\""<<data<<"\""
           "</BODY>\n"
           "</HTML>"<<endl;
     return 0;
 }

Back to Article

Listing Six

 <HTML> <HEAD>
 <TITLE>Test Form</TITLE>
 </HEAD><BODY>
 <H3>Input text and press enter</H3>
 <FORM ACTION="http://tower.jumpsite.com/cgi-bin/form_cgi" METHOD=GET>
 Query string: <INPUT NAME=ISINDEX>
 </FORM>
 <P>
 </BODY>
 </HTML>

Back to Article

Listing Seven

 <HTML> <HEAD>
 <TITLE>Video Search</TITLE>
 </HEAD>
 <BODY>
 <EMBED
         type=application/x-SVCRearch-plugin
         name=plugin
         width=400
         height=100>
 <P>
 <APPLET CODE="Applet1.class"
         MAYSCRIPT
         WIDTH=430
         HEIGHT=270>
 <PARAM name="dataDir" value="C:/video">
 <PARAM name="hostURL" value="file://C:/windows/desktop/javacode.html">
 <PARAM name="filename" value="C:/windows/desktop/javacode.html">
 </APPLET>
 </BODY>
 </HTML>

Back to Article

Listing Eight

       ...       System.out.println("Fetching plug-in...");
       JSObject win=JSObject.getWindow(this);
       JSObject doc = (JSObject) win.getMember("document");
       Plugin_tv_cgi plug=(Plugin_tv_cgi)doc.getMember("plugin");
       System.out.println("Searching...");
       int errorCode=plug.Search(dataDir,hostURL,query,filename);
       ...

Back to Article

DDJ


Copyright © 1997, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.