
A Win32 Network Crawler

By Jawed Karim, May 01, 2000

MP3 Voyeur is a freely available Win32 program that automates the task of finding MP3 files on the shared folders of local area networks. It works like a network crawler, querying each computer on the network and traversing each computer's hierarchy of shared folders to find MP3 files.


Finding MP3 files is just one use

Jawed is a student in computer science at the University of Illinois at Urbana-Champaign. He can be contacted at voyeur@jawed.com or http://www.jawed.com/.

MP3 Voyeur is a freely available Win32 program that automates the task of finding MP3 files on the shared folders of local area networks (LANs). As such, it works like a network crawler, querying each computer on the network and traversing each computer's hierarchy of shared folders to find MP3 files. How good is it? Well, at the University of Illinois at Urbana-Champaign (where I go to school), MP3 Voyeur found more than 25,000 MP3 files; at Carnegie Mellon University, more than 31,000; at Texas A&M University, more than 97,000; and at Case Western Reserve University, more than 150,000.

Of course, searching Windows-based computers for files is as simple as right-clicking on your desktop's My Computer icon and selecting the Find option. However, anyone who has attempted to perform a similar search on the Network Neighborhood folder knows that all the Network Neighborhood's menu has to offer is Find Computer. Windows does not provide a built-in Find capability for Network Neighborhood. Unfortunately, this means that searching every computer within Network Neighborhood for files translates into manually clicking on every computer, its folders, and subfolders -- a daunting and tedious task, considering that Network Neighborhoods with hundreds or thousands of computers aren't uncommon on LANs of large organizations (like universities).

In this article, I'll present a utility that addresses this problem, as well as examine Win32 network programming in general. In the process, I'll provide both source code and executables for MP3 Voyeur (available electronically; see "Resource Center," page 5). For more information on my MP3 Voyeur project, see http://www.jawed.com/voyeur/.

It should be noted that MP3 Voyeur is not like Napster, a program that has sparked several lawsuits and has been banned from many university networks. Napster works by turning every computer that runs Napster into an MP3 server. MP3 Voyeur is completely different -- it just searches the LAN, providing access to files that you would be able to access anyway. In doing so, it doesn't violate any security rules and has the same limitations as users who manually click through the folders. Finally, Voyeur is not inherently connected to the MP3 file type -- the program can search for anything. I use the ".MP3" extension as an example.

The Missing Feature

The lack of a Network Neighborhood Find feature might appear to be an oversight on the part of Microsoft developers. However, a more likely reason for the exclusion is to prevent network abuse. First, the question of legitimacy presents itself when users try to search thousands of computers for files. Second, such an operation would require a fair amount of bandwidth and tie up resources on the computers being searched. In any case, anyone whose curiosity demands such a feature is left without one. The problem first came to my attention when I wanted a list of all shared MP3 files on my university's Network Neighborhood. Although a handful of computers on the network were known to hold collections of MP3 files, finding other such archives still meant manually exploring thousands of computers and their subdirectories.

Consequently, I developed MP3 Voyeur, which recursively searches all shared directories on Network Neighborhood and displays files contained within them. Voyeur's purpose is not to penetrate security barriers, but to automate a task that could otherwise only be accomplished manually. Therefore, no rules are broken because only folders that have been explicitly specified as shared by their owners are visible to Voyeur. This search program has been tested extensively in different network environments and gives the best results on networks with a large number of hosts and shared directories. Unlike many MP3 search engines that search the Web and FTP sites for MP3 files, Voyeur is a LAN search tool and the number of potential hits is entirely dependent on the location from which Voyeur is run. It also differs in that only immediately accessible files are displayed -- if a file is listed, it is guaranteed to be available. In search results of web-based search engines, on the other hand, broken and unreachable links usually outnumber the rest.

Windows Networking

At the core of Voyeur are the Windows Networking (WNet) functions WNetOpenEnum(), WNetCloseEnum(), and WNetEnumResource(), which let you access network resources without making allowances for a particular underlying physical network implementation. Compiling code containing WNet functions under Visual C++ requires linking with mpr.lib. Similar to Win32's FindFirstFile() and FindNextFile() for traversing files in directories, WNetEnumResource() is based on linear enumeration of network resources. Windows organizes a network as a hierarchy, where the root is the topmost container in the network. Before starting the enumeration, you must call WNetOpenEnum() to obtain an enumeration handle to the network root. Table 1 lists its parameters and their meanings.

The desired handle is returned through the enumeration handle pointer lphEnum, which need not hold any particular initial value. Returned handles are specific to the NETRESOURCE structure passed as lpNetResource, which must be initialized appropriately before calling WNetOpenEnum(). However, because no such resource is available at the start, passing NULL yields a handle to the network root. dwScope and dwType should be set to RESOURCE_GLOBALNET and RESOURCETYPE_ANY, respectively. As long as WNetOpenEnum() returns NO_ERROR, lphEnum points to a valid handle to the network root. In the code included with this article, the function GetNetworkHandle() wraps WNetOpenEnum() with error handling and default values for the other parameters.

Given an enumeration handle obtained by the method described, WNetEnumResource() returns an array of NETRESOURCE structures describing the resources at that handle's level of the network hierarchy. Table 2 lists the WNetEnumResource() parameter types. Each returned structure can in turn be passed to WNetOpenEnum() to obtain a handle to the next level of the hierarchy, and so on. The process is implemented recursively and repeats until an enumeration handle of NULL or an empty NETRESOURCE structure is returned; see Listing Two. The call in Listing One starts the recursive search.

If there are more items to return than fit into the array of NETRESOURCE structures (lpBuffer), WNetEnumResource() fills the existing array space, keeps the remaining items available for further retrieval, and returns NO_ERROR. This function should be called repeatedly until ERROR_NO_MORE_ITEMS is returned. In the source code, GetNetworkInfo() encapsulates this functionality.
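The open/enumerate/close cycle described in this section can be sketched directly against the WNet API. This is a minimal illustration, not the article's GetNetworkHandle()/GetNetworkInfo() code: the function name EnumerateNetwork(), the 16-KB starting buffer, and the choice to print each remote name are assumptions made for the example. It uses the ANSI variants of the API so that plain char strings can be printed, and the preprocessor guard only keeps the file compilable on non-Windows systems.

```cpp
#ifdef _WIN32               // the WNet API exists only on Windows
#include <windows.h>
#include <winnetwk.h>       // WNet* functions; link with mpr.lib
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical helper: walks the hierarchy below pContainer (NULL means the
// network root) and prints the remote name of every resource it finds.
void EnumerateNetwork (NETRESOURCEA *pContainer, int depth = 0)
{
    HANDLE hEnum = NULL;
    if (WNetOpenEnumA (RESOURCE_GLOBALNET, RESOURCETYPE_ANY, 0,
                       pContainer, &hEnum) != NO_ERROR)
        return;                             // not enumerable or access denied

    std::vector<char> buffer (16 * 1024);   // assumed starting size
    for (;;)
    {
        DWORD count = (DWORD)-1;            // request as many entries as fit
        DWORD size  = (DWORD)buffer.size ();
        DWORD rc = WNetEnumResourceA (hEnum, &count, buffer.data (), &size);
        if (rc == ERROR_MORE_DATA && size > buffer.size ())
        {
            buffer.resize (size);           // too small for even one entry
            continue;
        }
        if (rc != NO_ERROR)
            break;                          // ERROR_NO_MORE_ITEMS or a failure

        NETRESOURCEA *res = reinterpret_cast<NETRESOURCEA *>(buffer.data ());
        for (DWORD i = 0; i < count; i++)
        {
            printf ("%*s%s\n", depth * 2, "",
                    res[i].lpRemoteName ? res[i].lpRemoteName : "(unnamed)");
            // Only containers (domains, servers) can be opened one level down.
            if (res[i].dwUsage & RESOURCEUSAGE_CONTAINER)
                EnumerateNetwork (&res[i], depth + 1);
        }
    }
    WNetCloseEnum (hEnum);
}
#endif
```

Calling EnumerateNetwork (NULL) would start at the network root. Each recursion level allocates its own buffer, so the parent's NETRESOURCE entries stay valid while a child level is being enumerated.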

Directory Traversal

In the function RecurseNetworkLevels(), successive NETRESOURCE entries are examined to determine whether they represent actual directories on remote computers. If so, the directory's name is obtained from the lpRemoteName member of the NETRESOURCE structure. Its format is "\\HOSTNAME\DIRECTORY\". Other structure members reveal such details as the scope of the resource, its type, and any comments associated with it. Once a remote directory's full pathname is known, it can be treated as a local path with conventional file-system functions. To have FindFirstFile() and FindNextFile() list every file in the directory, first append "\*.*" to the pathname. Like the network search, subdirectories are handled recursively. When going deeper into subdirectories, the directory names "." and ".." must be filtered out: entering "." would revisit the current directory and entering ".." would climb back to the parent, and either results in infinite recursion.
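The traversal just described might look like the following sketch. The helper names IsDotEntry(), HasExtension(), and SearchDirectory() are hypothetical and are not taken from the Voyeur source. The example assumes the directory is passed without a trailing backslash and uses the ANSI API variants so that std::string values can be handed over unchanged. The two helpers are portable; only the recursion itself needs Windows.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// True for the "." and ".." pseudo-entries found in directory listings;
// descending into either one would recurse forever.
bool IsDotEntry (const std::string &name)
{
    return name == "." || name == "..";
}

// Case-insensitive test for a trailing extension such as ".mp3".
bool HasExtension (const std::string &name, const std::string &ext)
{
    if (name.size () < ext.size ())
        return false;
    size_t offset = name.size () - ext.size ();
    for (size_t i = 0; i < ext.size (); i++)
        if (std::tolower ((unsigned char)name[offset + i]) !=
            std::tolower ((unsigned char)ext[i]))
            return false;
    return true;
}

#ifdef _WIN32
// Recursively prints every file below dir (for example "\\\\HOST\\SHARE")
// whose name ends in ext.
void SearchDirectory (const std::string &dir, const std::string &ext)
{
    WIN32_FIND_DATAA fd;
    HANDLE hFind = FindFirstFileA ((dir + "\\*.*").c_str (), &fd);
    if (hFind == INVALID_HANDLE_VALUE)
        return;                             // empty share or access denied
    do
    {
        std::string name = fd.cFileName;
        std::string path = dir + "\\" + name;
        if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
        {
            if (!IsDotEntry (name))
                SearchDirectory (path, ext);    // descend into subdirectory
        }
        else if (HasExtension (name, ext))
            printf ("%s\n", path.c_str ());
    } while (FindNextFileA (hFind, &fd));
    FindClose (hFind);
}
#endif
```

A call such as SearchDirectory ("\\\\HOSTNAME\\DIRECTORY", ".mp3") would list the MP3 files on one share. Matching the extension without regard to case matters because Windows filenames are case-insensitive.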

MFC's Single Document Interface

The Voyeur search code (available electronically) runs in a background thread and incrementally forwards its search results to the user interface, where users can interact with them. Voyeur's interface is based on a Single Document Interface (SDI) MFC application, the initial code skeleton of which was created by the Visual C++ AppWizard. By default, the name of the current document is displayed in the title bar of the application. However, since the concept of a document doesn't really apply here, that feature should be disabled. Immediately before the frame window is created, the framework calls the PreCreateWindow() member of the CMainFrame class (which is derived from CFrameWnd) with a reference to a structure containing creation information for that window. This gives you an opportunity to step in and change the way the window will be created. Taking the FWS_ADDTOTITLE window style bit out of the style field of the CREATESTRUCT structure removes the document's title from the frame's title bar. In this case (and anytime an MFC base-class function is overridden), it is important to still call the base-class version of the function.

Another important class component of the application is the view, which determines how the information contained in the current document is presented. By deriving the CVoyeurView class from MFC's CListView class, the application inherits the look-and-feel of an Explorer window in Details viewing mode. That is, the files found are shown as rows in a list view. A quick way of modifying its style is to once again override the view's PreCreateWindow() function. Adding the style bits LVS_REPORT, LVS_SINGLESEL, and LVS_SHOWSELALWAYS means that the list will always appear in Details mode, that only one selection at a time is allowed, and that this selection remains visible even when the list is out of focus. The view's overridden OnCreate() function is where the list view is initialized and the column titles are specified. Setting an image list for the list view here requires having created an image list of type CImageList beforehand, into which the desired icon has already been inserted. This is the icon visible next to each entry in the list.
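The two PreCreateWindow() overrides described in this section can be sketched as follows. The class declarations are reduced to the bare minimum for illustration; AppWizard-generated classes also carry DECLARE_DYNCREATE, message maps, and other members that are omitted here. The guard is an assumption of the example and simply reflects that MFC is available only with Visual C++.

```cpp
#ifdef _MSC_VER             // MFC ships only with Visual C++
#include <afxwin.h>         // CFrameWnd, CREATESTRUCT
#include <afxcview.h>       // CListView

class CMainFrame : public CFrameWnd
{
public:
    virtual BOOL PreCreateWindow (CREATESTRUCT &cs);
};

class CVoyeurView : public CListView
{
public:
    virtual BOOL PreCreateWindow (CREATESTRUCT &cs);
};

// Frame: keep the document name out of the title bar.
BOOL CMainFrame::PreCreateWindow (CREATESTRUCT &cs)
{
    cs.style &= ~FWS_ADDTOTITLE;
    return CFrameWnd::PreCreateWindow (cs);    // always call the base class
}

// View: Details mode, single selection, selection shown without focus.
BOOL CVoyeurView::PreCreateWindow (CREATESTRUCT &cs)
{
    cs.style |= LVS_REPORT | LVS_SINGLESEL | LVS_SHOWSELALWAYS;
    return CListView::PreCreateWindow (cs);    // always call the base class
}
#endif
```

Both overrides change cs.style before delegating to the base class, so the base-class logic still runs with the adjusted style bits.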

MFC and Object Pointers

The easiest way to insert filenames into the view from the running network search thread is to pass them directly to an instance of the CVoyeurView class. Hence, the network-searching thread needs a reference or a pointer to the main view object. It seems reasonable to pass the this pointer as the thread argument when calling AfxBeginThread() from CVoyeurView to start the search thread; as the search results come in, you could use the pointer to call a member function of the view that updates the interface. However, MFC window objects should never be passed between threads as pointers, only as handles. The view's window handle is stored in m_hWnd, a member of its CWnd base class. Pass this window handle to the thread instead; the thread can then reconstruct a pointer to the object with the static CWnd::FromHandle() function; see Listing Three.

Another easily overlooked problem is that the user could start a second search thread while the first is still in progress, and the two would interfere. Before beginning another thread, verify that the previous one has finished. To do so, call GetExitCodeThread() on the thread's handle, which is stored in the m_hThread member of the CWinThread object whose pointer AfxBeginThread() returns. An exit code of STILL_ACTIVE indicates that the thread is still running.
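A small sketch of that check, written against the plain Win32 API. The helper name IsThreadRunning() is hypothetical, and treating an invalid handle as "not running" is a choice made for this example. The guard only keeps the file compilable on non-Windows systems.

```cpp
#ifdef _WIN32               // Win32-only
#include <windows.h>
#include <cassert>

// Returns true while the thread behind hThread has not yet exited.
// Returns false for a finished thread or an unusable handle.
bool IsThreadRunning (HANDLE hThread)
{
    DWORD exitCode = 0;
    if (hThread == NULL || !GetExitCodeThread (hThread, &exitCode))
        return false;
    return exitCode == STILL_ACTIVE;
}
#endif
```

Two caveats apply. A thread function that itself returns the value STILL_ACTIVE would be misreported as running, so the search thread should return some other exit code. Also, by default a CWinThread object deletes itself when its thread ends (m_bAutoDelete is TRUE), which closes m_hThread; code that keeps the pointer around for this check would probably need to set m_bAutoDelete to FALSE and delete the object itself.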

How do you go about opening a file in the resulting search list with its associated application? Use the Win32 ShellExecute() function. Given the filename and the window handle of the parent window, the function determines the associated application from the file's extension and opens the file in it. Conveniently, if the pathname of a folder is passed instead of a filename, Windows opens an Explorer window on that folder. Saving and loading search lists is also made simple by MFC's concept of serialization. MFC's SDI framework encapsulates the whole process: it creates and handles the Open File or Save File dialogs that prompt users for the desired filenames, and then calls the document's Serialize() function. Serialize() has a single parameter of type CArchive that represents the physical file to be used for either writing or reading data. After querying the archive object to find out whether a file storage or retrieval task was requested, one can use operations such as ReadString() and WriteString(), without having to worry about low-level file I/O operations.
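As a rough illustration of the ShellExecute() call, the sketch below wraps it in a helper that reports success or failure. The name OpenWithAssociatedApp() is hypothetical, the "open" verb and SW_SHOWNORMAL are choices made for this example, and the ANSI variant of the function is used so that a plain char string can be passed.

```cpp
#ifdef _WIN32               // Win32-only
#include <windows.h>
#include <shellapi.h>       // ShellExecute; link with shell32.lib
#include <cassert>

// Opens path with the application registered for its extension.
// If path names a folder, Explorer opens a window on that folder.
bool OpenWithAssociatedApp (HWND hwndParent, const char *path)
{
    HINSTANCE rc = ShellExecuteA (hwndParent, "open", path,
                                  NULL, NULL, SW_SHOWNORMAL);
    // ShellExecute signals success with a value greater than 32.
    return reinterpret_cast<INT_PTR>(rc) > 32;
}
#endif
```

In the view, the path of the selected list entry and the view's m_hWnd would be the natural arguments.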

Acknowledgment

Thanks to Matthew Wright for contributing ideas and for co-developing the Voyeur prototype.

DDJ

Listing One

HANDLE hRoot = GetNetworkHandle (NULL);
RecurseNetworkLevels (hRoot);
WNetCloseEnum (hRoot);


Listing Two

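void RecurseNetworkLevels (HANDLE handle)
{
    NETRESOURCE netres[NUM_RES];
    // GetNetworkInfo() returns the number of entries actually filled in.
    int numEntries = GetNetworkInfo (handle, netres, NUM_RES);
    for (int i = 0; i < numEntries; i++)
    {
        PrintResource (netres[i]);
        HANDLE tmp = GetNetworkHandle (&netres[i]);
        if (tmp == NULL)
            continue;               // nothing below this resource
        RecurseNetworkLevels (tmp);
        WNetCloseEnum (tmp);        // release the handle for this level
    }
}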


Listing Three

// passing Voyeur's view's window handle to the thread.
// (called from CVoyeurView)
::AfxBeginThread (thread_func, (LPVOID)m_hWnd);

// reconstructing the Voyeur's view pointer from its window handle.
// AfxBeginThread() expects a function of the form UINT f(LPVOID).
UINT thread_func (LPVOID arg)
{
    HWND hwnd = (HWND)arg;
    CWnd *pWnd = CWnd::FromHandle (hwnd);
    CVoyeurView *pvView = (CVoyeurView *)pWnd;
    // proceed to use pvView ...
    return 0;   // thread exit code
}

