Dr. Dobb's is part of the Informa Tech Division of Informa PLC




Diagnosing Proxy Server Problems


Mar01: Diagnosing Proxy Server Problems

TCPMapper peeks inside network connections

Russ is a programmer in New York, New York. He can be contacted at [email protected].


Proxy servers play an essential role on the Internet, even though they go almost entirely unnoticed by most Internet users. Invisible as they may be, proxy servers make it possible for companies to access the Web without compromising their own security, and for online businesses to offer products and services. But to many web developers, proxy servers are a mystery, and unsuspecting developers can find proxy servers to be a formidable obstacle when designing new web applications.

In this article, I will present a background on proxy servers, describing their use and operation, and some complications they introduce. I'll then introduce a Java utility called "TCPMapper" that lets you peek inside the network connections between your browser and a proxy server so that you can explore problems you might face with web applications.

Proxy Backgrounder

A proxy server is a program that acts as a relay for network connections between browsers or other web applications and the Internet. Proxies usually work in conjunction with a firewall to allow controlled access to the Internet from inside a private corporation, a small business, or a home network. The proxy server accepts network connections on well-known ports and executes commands defined by the HTTP (or other) protocol specification. Typically, a browser program will connect to a proxy server and ask the proxy to load a resource from the Internet. As the proxy loads the data for the requested resource, it passes that data back to the browser, standing in for the requesting program and living up to its name as a proxy.

If a proxy server is to be used by a web application or browser, the client program must be configured to talk to the proxy server when it needs to make a connection to the Internet. Here's how it works. Say you want to load http://www.yahoo.com/ from your browser. If you had a direct connection to the Internet, you would configure your browser to bypass any proxy servers and connect straight to the address www.yahoo.com. Now say your site uses a proxy server called "my-proxy." You first need to configure your browser to use the my-proxy server on a well-known port, usually port 80. Then, when you try to load http://www.yahoo.com/, the browser does not contact www.yahoo.com at all; it connects to the my-proxy server instead. Once connected to the proxy, the browser asks it to load the page at http://www.yahoo.com/ and return the data contained in it. The proxy may access the site directly, or with the help of other proxy servers further along the chain. In any case, the proxy attempts to load the requested page on behalf of your browser.
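The difference between a direct request and a proxied one is visible in the first line the client sends. The following is a minimal sketch of that difference, assuming a simple HTTP/1.0-style GET; the class and method names are hypothetical and are not part of TCPMapper, and the query string is ignored for brevity.

```java
import java.net.URI;

// Hypothetical helper (not part of TCPMapper): builds the opening lines of a
// GET request as a client might send them directly or through a proxy.
final class RequestLineSketch {

    // Direct connection: the client connects to the origin host itself
    // and names only the path in the request line.
    static String directRequest(URI url) {
        String path = url.getRawPath().isEmpty() ? "/" : url.getRawPath();
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + url.getHost() + "\r\n\r\n";
    }

    // Proxied connection: the client connects to the proxy and names the
    // full URL, so the proxy knows which server to contact on its behalf.
    static String proxiedRequest(URI url) {
        return "GET " + url + " HTTP/1.0\r\n"
             + "Host: " + url.getHost() + "\r\n\r\n";
    }

    public static void main(String[] args) {
        URI url = URI.create("http://www.yahoo.com/");
        System.out.print(directRequest(url));
        System.out.print(proxiedRequest(url));
    }
}
```

In a trace, a request line that carries the full URL is a quick sign that the client believes it is talking to a proxy.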

Proxies and Programmers

While the principle of proxying is fairly simple, the myriad details of its practical use complicate matters, sometimes making life difficult for programmers. Whether you are building a browser-based application with client-side script, or a custom app that needs access to the Internet, you almost certainly have users accessing your site from behind a firewall and proxy server. When things go wrong, it may be necessary to diagnose a problem, not with your web server or client script, but with a proxy server.

Proxy servers try to provide a wide assortment of features without interfering with the intended outcome (they are supposed to be invisible to the application, remember). But sometimes features such as caching, site filtering, clustering, and proxy authentication can have unintended consequences for programmers. Like it or not, proxies have become an integral part of our Internet experience, and when things go wrong, proxies matter to everyone involved in the production of a web application.

Developers are usually insulated from the network level of the applications they build. Most often, the browser handles the network connections and the proper execution of HTTP and HTTPS requests. For this reason, many developers are in the dark about what is actually going on in the execution of a request. To understand proxy servers, you must also understand something about the data exchanged in the conversations taking place between browsers, proxies, and servers on the network.

Proxy servers can interfere with an otherwise well-behaved web application in a number of ways. The most common problem is caching. Proxies like to save a copy of previously visited URLs in order to minimize the number of times pages are loaded from their source. This can be confusing if your application depends on dynamically created content, or uses form content to maintain state information. Other problems can arise if a proxy is configured to block access to certain URLs. Sometimes a pattern-matching expression intended to block access to multiple sites can accidentally block access to legitimate sites as well. Proxies aren't always passive in their job of relaying data from a server to a client program, either. In fact, some proxies can modify the HTTP headers being sent back and forth. This can potentially interfere with the exchange of cookies and other header information. Finally, proxies can be configured to force users to supply a user name and password each time proxy services are requested. This feature can provide an audit trail showing the usage history for each person who accesses the proxy. But it can also prevent certain browsers and applications from working properly, as you will see in the case study that follows.
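For the caching problem, one common countermeasure is for the server to mark dynamic responses as uncacheable. The sketch below is an illustration only and is not taken from the article's source code: the class name is hypothetical, and the three headers are the standard HTTP ones commonly sent together so that both HTTP/1.1 and older HTTP/1.0 caches decline to reuse the response. Whether a particular proxy honors them is something a trace can confirm.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: response headers a server can add to dynamic pages
// to ask proxies and browsers not to serve them from cache.
final class NoCacheHeadersSketch {

    static Map<String, String> noCacheHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Cache-Control", "no-cache"); // HTTP/1.1 caches
        headers.put("Pragma", "no-cache");        // older HTTP/1.0 caches
        headers.put("Expires", "0");              // invalid date, treated as already expired
        return headers;
    }

    // Renders the headers in wire format, one "Name: value" pair per line.
    static String render(Map<String, String> headers) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(noCacheHeaders()));
    }
}
```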

When trying to delve into a problem that might be caused by a proxy server, there are a few different techniques that can help. The most obvious is to collect the data sent to and from the application that is having trouble and analyze it. This can be done by running a low-level trace on the network to collect all of the traffic going in and out of the application in question. Using a tool that can separate out the data from each socket connection on the network, you can single out the connection that is having problems and look at the data exchanged between client and proxy. However, the tools for doing this kind of trace turn out to be either very expensive, or free but tedious and difficult to work with. A better alternative to doing network sniffer traces is to put a mapper program between the client and the proxy. The mapper accepts connections from the client program and forwards them to the proxy. If the mapper program also logs all of the data that it copies between the client and the proxy, then you can just open the logs for a postmortem search through the data sent back and forth.

One such tool that I have found essential in my work with proxy servers is TCPMapper, a Java program written specifically for this task. TCPMapper maps socket connections between a browser or other Internet client program and a proxy server (or any server, really). I wrote TCPMapper (available electronically; see "Resource Center," page 5) specifically for debugging and troubleshooting proxy and other network programming problems. To illustrate its use, I'll analyze a nasty incompatibility between Netscape 4.7 and Microsoft Proxy Server 2.0.

Eavesdropping on Proxy Servers With TCPMapper

Figure 1 is an example of the HTTP header data captured by TCPMapper as it traveled between a browser program and a web proxy. Shown in this example is a successful attempt to load the home page at the web site http://www.yahoo.com/.

First we see the request made by the browser program to the web proxy, in this case using Internet Explorer 5.0 (see Figure 1). The request contains a command statement in the first line where you see "GET..." and several lines of header text that can supply both the web proxy and ultimately the web server with information about the browser and the type of connection it requires.

The response that comes back from the proxy server (see Figure 2) includes a line of text indicating the response code followed by a series of headers giving the browser information about the content being returned. Finally, after the headers, the actual HTML of the requested document would appear (though it is omitted in this excerpt).

Diagnosing Problems Caused by Proxy Servers

The ability to peer into the conversation between web browser and web proxy lets you dig into a complicated proxying problem and attempt to analyze it. With that in mind, let's consider a problem that comes up occasionally and try to diagnose it using the TCPMapper.

Say you have just installed Netscape Navigator 4.7 where, until now, only Microsoft products were used. In the past, you have had no problem running Internet Explorer 5.0, but today you cannot make Netscape load any web pages from the Internet. You decide to insert the TCPMapper program in between the browser and the Microsoft Proxy Server to see what is going wrong when you attempt to load the URL http://www.yahoo.com/.

Looking at Figure 3, you notice little difference between this request, coming from the Netscape browser, and the request in the previous transcript coming from the Microsoft browser. Bear in mind that, just as in the other case, this request is going to a proxy server that will attempt to contact the web server and return the results to the browser. But there is a catch. Figure 3 is using a proxy server that does authentication. That means the proxy demands a valid user name and password from the browser before it will load any web pages.

In the response text in Figure 4, you see a different response code than before. This time a 407 error is being reported with a brief description of the error. Browsers usually recognize this error code as the proxy's way of requesting a user name and password. The second line gives a hint at how the browser should reformulate the request giving user credentials for the authentication check. It says "Proxy-Authenticate: NTLM." In a more widely used form of authentication, the proxy would return "Proxy-Authenticate: Basic."

Normally, the browser would see the Proxy-Authenticate directive and prompt users for a user name and password for the given proxy. Next, the browser would send a new request with a special header that encoded the user information for the proxy to authenticate.
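For the Basic scheme, that special header is Proxy-Authorization, carrying the user name and password joined by a colon and Base64-encoded. Here is a minimal sketch of the encoding; the class and method names are hypothetical, and java.util.Base64 is a modern Java API that postdates this article. Note that Base64 is an encoding, not encryption, so the credentials are readable to anyone who can see the trace.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the header a client sends in answer to
// "Proxy-Authenticate: Basic".
final class BasicProxyAuthSketch {

    static String proxyAuthorization(String user, String password) {
        // Basic credentials are "user:password", Base64-encoded.
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.ISO_8859_1));
        return "Proxy-Authorization: Basic " + encoded;
    }

    public static void main(String[] args) {
        // Sample credentials from the HTTP authentication specification.
        System.out.println(proxyAuthorization("Aladdin", "open sesame"));
    }
}
```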

But there is a problem. In this example, the Netscape browser doesn't recognize the NTLM authentication request and gets confused. If you were looking at the browser trying to make this request, you would wait while the progress bar bobbed back and forth indefinitely. By studying the TCPMapper trace, you get a clue as to what is going wrong.

Analysis and a Solution

In the full TCPMapper trace, the identical request and 407 response are repeated over and over again. This tells you that the Netscape browser is not recognizing the authentication request sent back to it from the proxy. It turns out that the NTLM method of authenticating is a proprietary Microsoft technology not implemented in the Netscape browser.

Reconfiguring the proxy server to offer Basic authentication as well as NTLM authentication fixes the problem and lets both Microsoft and Netscape browsers work with this authenticating proxy server. With this slight modification, the proxy responds to an initial request with an extra header line not seen in the previous trace, one that offers Basic authentication as an option to the browser program (see Figure 5). Now the Netscape browser sees a familiar form of authentication being offered to it, and will prompt the user for a valid user name and password. Assuming the user credentials are acceptable to the proxy server, the request will be granted by the proxy and the problem will be solved.
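When reading a logged 407 response, the useful question is which authentication schemes the proxy is offering. The sketch below pulls the scheme names out of the Proxy-Authenticate headers in a response. The class name is hypothetical, and the sample response in main() is made up for illustration (including the realm value); it is not the text of Figure 4 or Figure 5.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the schemes offered in Proxy-Authenticate headers.
final class AuthSchemeSketch {

    static List<String> offeredSchemes(String responseHead) {
        List<String> schemes = new ArrayList<>();
        for (String line : responseHead.split("\r?\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) continue;                  // status line or blank line
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (name.equalsIgnoreCase("Proxy-Authenticate") && !value.isEmpty()) {
                schemes.add(value.split("\\s+")[0]);  // first token is the scheme
            }
        }
        return schemes;
    }

    public static void main(String[] args) {
        // Illustrative response head, not a captured trace.
        String head = "HTTP/1.1 407 Proxy Authentication Required\r\n"
                    + "Proxy-Authenticate: NTLM\r\n"
                    + "Proxy-Authenticate: Basic realm=\"my-proxy\"\r\n\r\n";
        System.out.println(offeredSchemes(head));
    }
}
```

A response that lists only NTLM corresponds to the failing case described earlier; one that also lists Basic gives a browser without NTLM support something it can answer.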

Understanding the TCPMapper Program

The job of the TCPMapper is to accept network connections on a well-known port and, much like a proxy, relay that connection to a server listening for connections somewhere else on the network. TCPMapper can be used with web proxy servers or any other TCP/IP-based server program. Running the TCPMapper requires a Java virtual machine and the compiled Java classes. Say you wanted to use the TCPMapper to intercept requests to a proxy called "my-proxy" on port "80." If you were to run TCPMapper on a machine called "my-pc" and have it listen on port 8080, you would run it as in Figure 6. By configuring your browser program to use my-pc port 8080 as its proxy server (instead of the usual my-proxy port 80), you effectively sandwich the TCPMapper between your browser and web proxy where it can trace all communication sent back and forth between the two.
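The same redirection works for a custom Java client as well as a browser. As a sketch, assuming the my-pc and 8080 values from the example above, a client written against the modern java.net API (which postdates this article) could point itself at the mapper as follows; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical client-side setup: treat the mapper on my-pc:8080 as the proxy.
final class MapperProxySketch {

    static Proxy mapperProxy() {
        // createUnresolved defers the host name lookup until a connection is made.
        return new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved("my-pc", 8080));
    }

    // Prepares (but does not yet open) a connection routed through the mapper.
    static URLConnection openThroughMapper(String url) throws IOException {
        return URI.create(url).toURL().openConnection(mapperProxy());
    }

    public static void main(String[] args) {
        System.out.println(mapperProxy());
    }
}
```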

Once TCPMapper is collecting data, you will see data files being created in its working directory. For each network connection relayed by the TCPMapper there is one log file containing the data sent to the server by the client, and another log file containing the data sent from the server to the client. Each pair of files is given a sequence number corresponding to the order in which connections were handled. For text-based protocols (such as HTTP), these log files reveal network conversations in plain text. When using the TCPMapper with binary protocols, interpreting the log files is a more difficult undertaking.

Source Code Highlights

The main TCPMapper class extends a base class called ConcurrentServer; see Figure 7(a). This avoids having to handle both server-side and client-side socket connections in one class. Separating the server code also makes it possible to reuse code when building other tools. The base class is a general-purpose, multithreaded network server capable of receiving multiple concurrent network connections and delegating the processing on these connections to separate threads, each running a method called handleSession() in the TCPMapper class; see Figure 7(b).
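The general shape of such a base class can be sketched in a few lines. This is not the ConcurrentServer source that accompanies the article; it is a minimal thread-per-connection illustration in which only the names ConcurrentServer and handleSession() come from the text, and everything else (the constructor, port(), shutdown()) is assumed.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

final class ConcurrentServerSketch {

    // Accepts connections and hands each one to its own thread.
    abstract static class ConcurrentServer implements Runnable {
        private final ServerSocket listener;

        ConcurrentServer(int port) throws IOException {
            listener = new ServerSocket(port); // port 0 picks any free port
        }

        int port() { return listener.getLocalPort(); }

        // Subclasses (a mapper, for example) put their per-connection work here.
        protected abstract void handleSession(Socket client) throws IOException;

        @Override
        public void run() {
            while (!listener.isClosed()) {
                try {
                    Socket client = listener.accept();
                    new Thread(() -> {
                        try (Socket s = client) {
                            handleSession(s);
                        } catch (IOException e) {
                            // Session ended abnormally; nothing more to do here.
                        }
                    }).start();
                } catch (IOException e) {
                    break; // listener closed or failed; stop accepting
                }
            }
        }

        void shutdown() throws IOException { listener.close(); }
    }

    public static void main(String[] args) throws IOException {
        ConcurrentServer echoOnce = new ConcurrentServer(0) {
            @Override
            protected void handleSession(Socket c) throws IOException {
                c.getOutputStream().write(c.getInputStream().read());
            }
        };
        new Thread(echoOnce).start();
        System.out.println("listening on port " + echoOnce.port());
        echoOnce.shutdown();
    }
}
```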

After a connection has been accepted by TCPMapper, the next step is to relay that connection to the actual proxy server by setting up another socket connection, as in Figure 7(c). From Figure 7(c), proxyHost would be my-proxy and proxyPort would have the value 80.

Routing Packets Between Browser and Proxy

With TCPMapper handling incoming and outgoing socket connections, you need a simple way to shuttle the data back and forth in a manner that hides TCPMapper from the other two programs trying to communicate with each other. To handle this chore, you delegate the work to two more threaded classes whose job it will be to read incoming data from one socket and immediately copy that data out to another socket. Using two such instances, TCPMapper can concurrently copy data back and forth bidirectionally.
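A copier of this kind is small. The following sketch is an assumption about what such a class might look like, not the ByteCopier source itself: the class name, the constructor shapes, and the LineListener methods follow Listings One and Two, while the body is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class ByteCopierSketch {

    // Same shape as the interface in Listing Two.
    interface LineListener {
        void dataAlert(int data);
        void dataAlert(String data);
    }

    // Copies one direction of a connection, a byte at a time, on its own thread.
    static final class ByteCopier extends Thread {
        private final InputStream in;
        private final OutputStream out;
        private final LineListener listener; // may be null when logging is off

        ByteCopier(InputStream in, OutputStream out) { this(in, out, null); }

        ByteCopier(InputStream in, OutputStream out, LineListener listener) {
            this.in = in;
            this.out = out;
            this.listener = listener;
        }

        @Override
        public void run() {
            try {
                int b;
                while ((b = in.read()) != -1) {   // -1 means the sender closed
                    out.write(b);
                    out.flush();                  // pass data on immediately
                    if (listener != null) listener.dataAlert(b);
                }
            } catch (IOException e) {
                // Broken connection: let the thread end so the session can shut down.
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        InputStream in = new ByteArrayInputStream(
                "GET / HTTP/1.0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        ByteCopier copier = new ByteCopier(in, System.out);
        copier.start();
        copier.join();
    }
}
```

Two such threads, one per direction, give the bidirectional relay described above. Copying a byte at a time is simple but slow.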

In Listing One, the program first creates data input and output streams from the socket connected to the client program. It then opens the socket connection to the proxy server (shown already) and creates data input and output streams connected to that program. Next, TCPMapper creates instances of the ByteCopier class, initializing them with two data streams connecting one to the client and the other to the server program. Depending on the value of the logging flag, the copier objects may also be initialized using a reference to an object that will log all of the data passed through TCPMapper to output files. When all of this setup is completed, the program starts the copier objects (remember that these run as separate threads) and simply waits until one of the copiers detects a broken connection indicating that the active mapping (still under the control of the handleSession() thread) should be shut down.

Logging the Network Trace Captured by TCPMapper

TCPMapper includes logging capabilities. If logging is enabled, TCPMapper saves all of the data sent to and from the proxy server into files, one pair of output files for each connection. To facilitate different kinds of logging, the DataLogger class created and passed into the ByteCopier implements a special interface called LineListener (see Listing Two). When data arrives in the ByteCopier class, it will invoke a dataAlert() method on its corresponding LineListener. How the presentation of this data is handled is up to the class implementing LineListener or, in this case, the DataLogger class.
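To show how a listener plugs in, here is a minimal sketch of a LineListener implementation that appends whatever it receives to a java.io.Writer. It is a stand-in for illustration, not the DataLogger class from the source code: the WriterLogger name and its constructor are hypothetical, and the real logger's file naming is not reproduced here.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class DataLoggerSketch {

    // Same shape as the interface in Listing Two.
    interface LineListener {
        void dataAlert(int data);
        void dataAlert(String data);
    }

    // Hypothetical listener that records relayed data to any Writer.
    static final class WriterLogger implements LineListener {
        private final Writer out;

        WriterLogger(Writer out) { this.out = out; }

        @Override
        public void dataAlert(int data) {
            write(String.valueOf((char) data)); // treat the value as one character
        }

        @Override
        public void dataAlert(String data) {
            write(data);
        }

        private void write(String text) {
            try {
                out.write(text);
                out.flush(); // keep the log current even if the session dies
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        StringWriter log = new StringWriter();
        LineListener logger = new WriterLogger(log);
        logger.dataAlert("GET / HTTP/1.0");
        System.out.println(log);
    }
}
```

Passing a FileWriter instead of a StringWriter would send the same data to a file.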

Extending TCPMapper

Some clear limitations of TCPMapper appear in practical use. First, with or without logging, TCPMapper introduces a performance bottleneck when it is placed in front of a proxy server. Although the program can handle large numbers of concurrent connections from client programs, the data-copying process is inefficient and slow.

For strictly HTTP traffic, one simple enhancement would be to replace the current ByteCopier class with another called LineCopier (also included with the source code), which buffers input and output.
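The benefit of buffering can be illustrated without guessing at how LineCopier works internally. The sketch below uses a different, deliberately simple technique, copying in fixed-size blocks rather than line by line, to show why reading many bytes per call is cheaper than reading one at a time. The class name and the 8192-byte buffer size are arbitrary choices for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical block-buffered copy loop (not the LineCopier class).
final class BufferedCopySketch {

    // Copies until end of stream and returns the number of bytes moved.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) { // read whatever is available, up to 8 KB
            out.write(buffer, 0, n);
            out.flush();                      // forward each block promptly
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long moved = copy(new ByteArrayInputStream(data), sink);
        System.out.println(moved + " bytes copied");
    }
}
```

Because it never interprets the bytes, a block copier also works unchanged for binary protocols.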

Next, by writing a new LineListener implementation, it would be possible to put a prettier face on TCPMapper, perhaps displaying the traced data in a window instead of just writing the raw data out to a file. You might even go so far as to interpret the meaning of the HTTP statements and response codes to produce an annotated and color-coded transcript of an HTTP session.
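As a starting point for an annotated transcript, a listener could tag each response status line with a short explanation. This is a minimal sketch under stated assumptions: the class name, the annotation wording, and the small set of recognized codes are all choices made for this example rather than anything in the TCPMapper source.

```java
// Hypothetical annotator for HTTP status lines found in a trace.
final class StatusAnnotatorSketch {

    // Returns the line with a note appended, or unchanged if it is not a status line.
    static String annotate(String line) {
        String[] parts = line.trim().split("\\s+", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            return line;
        }
        int code;
        try {
            code = Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            return line;
        }
        String note;
        switch (code) {
            case 200: note = "request succeeded"; break;
            case 304: note = "not modified, cached copy may be used"; break;
            case 403: note = "access forbidden, possibly blocked by a filter"; break;
            case 407: note = "proxy is asking for credentials"; break;
            default:
                note = code >= 500 ? "server or gateway error"
                     : code >= 400 ? "client error"
                     : code >= 300 ? "redirection"
                     : "see the HTTP specification";
        }
        return line + "   <-- " + note;
    }

    public static void main(String[] args) {
        System.out.println(annotate("HTTP/1.1 407 Proxy Authentication Required"));
        System.out.println(annotate("GET http://www.yahoo.com/ HTTP/1.0"));
    }
}
```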

Further Study

The protocol specification for HTTP describes the use of proxies in detail. It is a tedious document to read, but as a reference it can be an invaluable help. Several editions are available online, but I suggest you get the nicely formatted online version from http://www.w3.org/Protocols/rfc2616/rfc2616.html.

If you are interested in further reading on the subject of proxies, a short list of definitive sources must also include the book Web Proxy Servers, by Ari Luotonen (Prentice Hall Computer Books, 2000). While written for the network administrator to a large extent, this book covers the nuts and bolts of proxying and troubleshooting, as well as rules of thumb for capacity planning.

Free proxy software or trial versions (see Table 1) of commercial products are easy to find on the Web and can give you a solid test platform for working through proxy-related development problems. For developing inside of a corporate network, you can chain a development proxy to a production proxy and still access external web pages while testing a new application.

DDJ

Listing One

// Wrap the streams of the socket connected to the client program.
clientIn = new DataInputStream(socketFromClient.getInputStream());
clientOut = new DataOutputStream(socketFromClient.getOutputStream());

// Open the onward connection to the proxy and wrap its streams the same way.
socketToServer = new Socket(proxyHost, proxyPort);
serverIn = new DataInputStream(socketToServer.getInputStream());
serverOut = new DataOutputStream(socketToServer.getOutputStream());

ByteCopier copier1 = null;
ByteCopier copier2 = null;
DataLogger logger1 = null;
DataLogger logger2 = null;

// One copier per direction; with logging on, each also gets its own log.
if (logging)
{
    logger1 = new DataLogger(uniqueId, "To");
    logger2 = new DataLogger(uniqueId, "From");
    copier1 = new ByteCopier(clientIn, serverOut, logger1);
    copier2 = new ByteCopier(serverIn, clientOut, logger2);
}
else
{
    copier1 = new ByteCopier(clientIn, serverOut);
    copier2 = new ByteCopier(serverIn, clientOut);
}

// Run both copiers as threads and wait until either one stops,
// which signals a broken connection.
copier1.start();
copier2.start();
while (copier1.isAlive() && copier2.isAlive()) waitHere(100);


Listing Two

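// Callback interface through which a copier reports the data it relays.
public interface LineListener
{
    // Called with a single unit of data read from the stream.
    public void dataAlert(int data);
    // Called with data delivered as text.
    public void dataAlert(String data);
}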
