iSCSI Target Emulation

By Patrick L. Garvan, Shawn McFarland, Manoj Mehta, Mike Ramsay, and Chris Robinson, January 01, 2004

iSCSI is a networking standard for sending SCSI commands over IP networks. Our authors present an iSCSI emulator and show how it can be used to test iSCSI systems.

The authors are members of Microsoft's Windows Base Drivers and Driver Services Test group.

Internet SCSI (iSCSI) is a recent networking standard for sending SCSI commands over IP networks (http://www.ietf.org/internet-drafts/draft-ietf-ips-iscsi-20.txt). One of the principal benefits of iSCSI is that SCSI storage devices can be accessed remotely using existing IP network infrastructure. For example, a company in one city might back up all of its data on SCSI disks in another city by accessing the disks using iSCSI over a standard private IP connection.

As with SCSI, iSCSI uses the concepts of "initiator" and "target" when describing the flow of commands and data. In iSCSI, the initiator is responsible for packaging SCSI commands generated by the OS and sending these over an IP network. An iSCSI target then receives these packaged commands and sends them to a SCSI device. The same process works in reverse when a SCSI device sends back its response. The types of SCSI devices controlled by an iSCSI target may be a disk, CD-ROM, tape drive, printer, scanner, or any other type of SCSI device.

To test Microsoft's iSCSI Initiator, our group—Windows Base Drivers and Driver Services Test—needed an iSCSI target to test against. The available iSCSI targets were not suitable for some of our test cases, so we designed several iSCSI target emulators specifically for the purpose of testing the initiator software. In this article, we describe the design of one of our iSCSI target emulators and provide examples of how we used our emulated target to test the Microsoft iSCSI Initiator prior to shipping. The full source code for this iSCSI target emulator is available at no charge from DDJ (see "Resource Center," page 5). The emulator provides a useful environment for experimenting with both the iSCSI and SCSI protocols. To use the emulated target, you also need an iSCSI initiator. The Microsoft iSCSI Initiator is available at no charge from http://www.microsoft.com/downloads/ and is supported on Windows 2000 SP3, Windows Server 2003, and Windows XP SP1.

The Microsoft iSCSI Initiator consists of a software initiator and software for managing iSCSI connections and iSCSI hardware via Windows Management Instrumentation (WMI). The management software is designed to work with both the Microsoft software iSCSI initiator and with hardware iSCSI initiators. (Hardware iSCSI initiators are essentially iSCSI storage adapters containing their own TCP/IP stack for improved performance.)

The Microsoft software initiator includes kernel drivers, so our testing relied a great deal on kernel-based test tools such as Windows Driver Verifier (DV). DV monitors kernel-mode drivers for illegal function calls and system corruption, and has been included in-box with Windows since Windows 2000. If you'd like to try DV, just run "verifier" from a Windows command prompt to open the Driver Verifier Manager (for more information, see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/ddtools/hh/ddtools/dv_7g8j.asp).

On our test machines, we made it a rule to have DV permanently verifying the two kernel-mode initiator drivers iscsiprt.sys and msiscsi.sys. We also occasionally used DV's optional low-resource simulation feature that randomly fails memory requests made by the drivers it is verifying. This option is very good at ensuring that there are no memory leaks associated with rarely traversed error paths. DV can detect such leaks whenever the drivers are unloaded.

For testing the user-mode software in the initiator—such as the msiscsi service—we used the AppVerifier tool available in the Windows Application Compatibility Toolkit. AppVerifier enables pageheap checking and monitors for the most common application problems. It is also available at no charge from http://www.microsoft.com/downloads/.

Two often-overlooked built-in test tools are PerfMon and the Windows Task Manager. PerfMon can be opened from the command line as "perfmon," and Task Manager using the CTRL+SHIFT+ESC keyboard shortcut. These tools include many system performance indicators such as current CPU usage and memory usage. During development of the Microsoft iSCSI initiator, we were able to check whether there were memory leaks by running highly repetitive tests and simply watching the behavior of the nonpaged pool usage. If the usage steadily increased over time, then this usually indicated a leak somewhere. The simplest of these repetitive tests consisted of logging into and logging out from an iSCSI target in an infinite loop. This effectively stressed both the management software and the kernel-mode drivers.

Designing the Test Target

The emulated target comprises routines to manage communications sockets and threads, process iSCSI Protocol Data Units (PDUs), process SCSI commands, and emulate disk IO.

The accept thread uses the Winsock library to handle new connection requests from the initiator. Each time the target receives a new connection request from the initiator, it creates two new threads—a recv thread and a send thread. The current design supports up to 512 simultaneous connections.

Each recv thread receives data from the initiator and parses them into iSCSI PDUs. Listing One shows the parse_received_data() function that performs the parsing. It calls AddPDUtoList() to store these incoming PDUs for the connection in a doubly linked list. Each time it stores new PDUs on this list, it calls signal_connection(), which uses a named event to signal the send thread to process these PDUs for this connection. It is possible for a PDU to arrive in fragments, so the recv thread saves any excess data in case it is part of a subsequent PDU.
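The framing arithmetic that the recv thread depends on can be illustrated in isolation. The sketch below assumes the 48-byte basic header used in Listing One and a 3-byte big-endian DataSegmentLength at byte offsets 5 through 7 of that header (the offset is taken from the iSCSI draft, not from this article). The function names are hypothetical, and additional header segments and digests are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ISCSI_HEADER_LENGTH 48u /* basic header size, as in Listing One */

/* Hypothetical helper: decode the 3-byte big-endian DataSegmentLength
   stored at byte offsets 5..7 of the basic header. */
uint32_t pdu_data_segment_length(const uint8_t *header)
{
    assert(header != NULL);
    return ((uint32_t)header[5] << 16) |
           ((uint32_t)header[6] << 8)  |
            (uint32_t)header[7];
}

/* Hypothetical helper: bytes one PDU occupies in the receive buffer,
   that is, header + data segment + padding to a 4-byte boundary. */
uint32_t pdu_total_length(const uint8_t *header)
{
    uint32_t data_len = pdu_data_segment_length(header);
    uint32_t padding  = (4u - (data_len % 4u)) % 4u;
    return ISCSI_HEADER_LENGTH + data_len + padding;
}

/* Returns 1 if the buffer holds at least one complete PDU,
   or 0 if the recv thread must wait for more bytes. */
int pdu_is_complete(const uint8_t *buf, uint32_t bytes_available)
{
    assert(buf != NULL);
    if (bytes_available < ISCSI_HEADER_LENGTH) {
        return 0; /* not even a full header yet */
    }
    return bytes_available >= pdu_total_length(buf);
}
```

For instance, a PDU whose data segment is 5 bytes long occupies 48 + 5 + 3 = 56 bytes, so a buffer holding only the first 48 bytes must be kept until the rest arrives.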

When the send thread for a connection is signaled, it immediately unlinks the list of one or more iSCSI PDUs. It then analyzes each PDU on the list, performs the appropriate iSCSI processing, and sends one or more response PDUs across the network to the initiator. Multithreaded access to the PDU list for each connection is controlled by per-connection mutexes.
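Unlinking the whole list at once keeps the time spent holding the per-connection mutex short: the send thread takes the chain, releases the lock, and only then does the slow iSCSI processing. The following is a minimal sketch of that idea, assuming a hypothetical node type (the emulator's real PDU structure is not shown in this article). Locking is deliberately left out so that the sketch stays portable; comments mark where the mutex would be held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list element; "id" stands in for the PDU contents. */
typedef struct pdu_node {
    struct pdu_node *prev;
    struct pdu_node *next;
    int              id;
} pdu_node;

typedef struct {
    pdu_node *head;
    pdu_node *tail;
} pdu_list;

/* Append one PDU. The recv thread would call this while holding
   the connection mutex. */
void pdu_list_append(pdu_list *list, pdu_node *node)
{
    assert(list != NULL && node != NULL);
    node->next = NULL;
    node->prev = list->tail;
    if (list->tail != NULL) {
        list->tail->next = node;
    } else {
        list->head = node;
    }
    list->tail = node;
}

/* Detach every queued PDU in constant time and leave the list empty.
   The send thread would hold the connection mutex only for this call,
   then walk the returned chain without the lock. */
pdu_node *pdu_list_detach_all(pdu_list *list)
{
    pdu_node *chain;
    assert(list != NULL);
    chain = list->head;
    list->head = NULL;
    list->tail = NULL;
    return chain;
}
```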

To reduce the likelihood of orphaned threads and orphaned connections, when the recv thread detects that the initiator has closed its end of the connection, it signals the send thread to indicate that the send thread should exit. The recv thread then closes the target's end of the connection and exits.

Each send thread also functions as an iSCSI processor. The principal function of an iSCSI processor is to process the iSCSI PDUs according to the iSCSI Standard (http://www.ietf.org/internet-drafts/draft-ietf-ips-iscsi-20.txt). The emulated target does not implement the full iSCSI Standard, but implements enough to allow useful testing of the initiator.

For example, our test target does not check the sequencing of incoming PDUs, and instead assumes that PDUs arrive in the correct order. As far as we could determine from our testing, the default behavior of the Microsoft iSCSI Initiator results in PDUs being sent to targets in order.

The iSCSI processor first determines the type of each received PDU and also which type of response PDU must be sent. If the received PDU contains a SCSI command, then the SCSI command usually needs to be processed before an iSCSI response can be sent. The iSCSI processor then constructs one or more iSCSI response PDUs, which may also contain SCSI sense data or data read from the emulated disk. Listing Two (available electronically; see "Resource Center," page 5) shows the process_LOGIN_COMMAND() function that analyzes an iSCSI login command PDU and constructs an appropriate login response PDU.

The SCSI processing code processes the SCSI commands received from the iSCSI processor. For the purposes of this simple iSCSI target emulator, the SCSI processor only handles the following subset of SCSI disk commands: REPORT LUNS, INQUIRY, READ CAPACITY, READ(10), WRITE(10), TEST UNIT READY, REQUEST SENSE, MODE SENSE(6), and VERIFY. The VERIFY command is supported so that Windows can successfully complete a full format of the disk. These commands are described in The Small Computer System Interface 2 (ANSI X3.131-1994) and SCSI-3 Primary Commands (ANSI X3.301-1997).

The C functions implementing the SCSI READ and WRITE commands call the appropriate emulated disk-medium read and write functions to perform the actual raw reads and writes to and from the emulated disk medium.
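Before the medium functions can be called, the starting block and block count have to be extracted from the 10-byte command descriptor block (CDB). The sketch below shows one way this might look. The field offsets follow the standard READ(10)/WRITE(10) layout (logical block address in bytes 2 through 5, transfer length in bytes 7 and 8, both big-endian); the type and function names are hypothetical and are not taken from the emulator source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCSI_OP_READ10  0x28u
#define SCSI_OP_WRITE10 0x2Au

/* Hypothetical decoded form of a READ(10) or WRITE(10) request. */
typedef struct {
    uint32_t lba;         /* first logical block to transfer */
    uint16_t block_count; /* number of blocks to transfer    */
} rw10_request;

/* Decode a 10-byte CDB. Returns 0 on success, or -1 if the opcode
   is neither READ(10) nor WRITE(10). */
int decode_rw10(const uint8_t *cdb, rw10_request *out)
{
    assert(cdb != NULL && out != NULL);
    if (cdb[0] != SCSI_OP_READ10 && cdb[0] != SCSI_OP_WRITE10) {
        return -1;
    }
    out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
               ((uint32_t)cdb[4] << 8)  |  (uint32_t)cdb[5];
    out->block_count = (uint16_t)(((unsigned)cdb[7] << 8) | cdb[8]);
    return 0;
}

/* Returns 1 if the request lies entirely inside a medium of the
   given capacity, or 0 otherwise. Written to avoid overflow. */
int rw10_in_range(const rw10_request *req, uint32_t capacity_in_blocks)
{
    assert(req != NULL);
    return req->lba < capacity_in_blocks &&
           req->block_count <= capacity_in_blocks - req->lba;
}
```

A range check of this kind matters for a 20-MB medium of 40,960 blocks, because the initiator is free to send any 32-bit block address.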

If the SCSI processor receives a command that it does not support, then it returns a SCSI CHECK CONDITION status together with any available sense data. Listing Three (available electronically) shows the process_SCSI_READ10() function that processes SCSI READ(10) commands. If it receives a READ(10) command for LUN 0, it will call the emulated-medium READ function ReadFromMedium(). Commercial emulated iSCSI target disks would be expected to support all the mandatory SCSI-2 and SCSI-3 disk commands. Additionally, our emulator supports only a single SCSI LUN, LUN 0. It returns CHECK CONDITION in response to any SCSI commands destined for a nonzero LUN.
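The two rejection rules described here (unsupported opcode, nonzero LUN) could be expressed as a small pre-check that runs before any command-specific code. This is only a sketch under stated assumptions: the function names are hypothetical, the sense data uses the standard 18-byte fixed format with sense key ILLEGAL REQUEST, and VERIFY is assumed to mean the 10-byte form (opcode 0x2F). The actual emulator may build its sense data differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SCSI_STATUS_GOOD            0x00u
#define SCSI_STATUS_CHECK_CONDITION 0x02u
#define SENSE_KEY_ILLEGAL_REQUEST   0x05u
#define ASC_INVALID_OPCODE          0x20u /* INVALID COMMAND OPERATION CODE */
#define ASC_LUN_NOT_SUPPORTED       0x25u /* LOGICAL UNIT NOT SUPPORTED     */
#define SENSE_LENGTH                18

/* Returns 1 for the command subset listed in the article. */
int opcode_is_supported(uint8_t opcode)
{
    switch (opcode) {
    case 0x00: /* TEST UNIT READY */
    case 0x03: /* REQUEST SENSE   */
    case 0x12: /* INQUIRY         */
    case 0x1A: /* MODE SENSE(6)   */
    case 0x25: /* READ CAPACITY   */
    case 0x28: /* READ(10)        */
    case 0x2A: /* WRITE(10)       */
    case 0x2F: /* VERIFY(10), assumed */
    case 0xA0: /* REPORT LUNS     */
        return 1;
    default:
        return 0;
    }
}

/* Fill an 18-byte fixed-format sense buffer for ILLEGAL REQUEST. */
void build_illegal_request_sense(uint8_t *sense, uint8_t asc)
{
    assert(sense != NULL);
    memset(sense, 0, SENSE_LENGTH);
    sense[0]  = 0x70;                      /* fixed format, current error */
    sense[2]  = SENSE_KEY_ILLEGAL_REQUEST; /* sense key                   */
    sense[7]  = SENSE_LENGTH - 8;          /* additional sense length     */
    sense[12] = asc;                       /* additional sense code       */
}

/* Returns the SCSI status to report before any further processing. */
uint8_t precheck_command(uint8_t opcode, uint64_t lun, uint8_t *sense)
{
    if (lun != 0) {
        build_illegal_request_sense(sense, ASC_LUN_NOT_SUPPORTED);
        return SCSI_STATUS_CHECK_CONDITION;
    }
    if (!opcode_is_supported(opcode)) {
        build_illegal_request_sense(sense, ASC_INVALID_OPCODE);
        return SCSI_STATUS_CHECK_CONDITION;
    }
    return SCSI_STATUS_GOOD;
}
```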

The disk-medium emulator processes block reads and writes to and from the disk medium. It writes the requested WRITE data blocks to a disk-medium backing file, and reads requested READ data blocks from the disk-medium backing file. Listing Four (available electronically) shows the ReadFromMedium() function that emulates READs from the medium.
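A backing file maps naturally onto block addressing: block N starts at byte offset N times the block size. The sketch below illustrates that mapping with the standard C library rather than the Win32 file API, purely to keep it portable. The function names are hypothetical and this is not the ReadFromMedium() code from Listing Four.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE_IN_BYTES 512u

/* Write "count" blocks starting at block "lba". Returns 0 on success.
   The cast to long is adequate for a 20-MB medium; a larger medium
   would need a 64-bit seek function. */
int write_blocks_to_file(FILE *medium, uint32_t lba, uint32_t count,
                         const uint8_t *src)
{
    size_t want = (size_t)count * BLOCK_SIZE_IN_BYTES;
    assert(medium != NULL && src != NULL);
    if (fseek(medium, (long)lba * (long)BLOCK_SIZE_IN_BYTES, SEEK_SET) != 0) {
        return -1;
    }
    if (fwrite(src, 1, want, medium) != want) {
        return -1;
    }
    return (fflush(medium) == 0) ? 0 : -1;
}

/* Read "count" blocks starting at block "lba". Returns 0 on success,
   or -1 if the seek fails or the file is too short. */
int read_blocks_from_file(FILE *medium, uint32_t lba, uint32_t count,
                          uint8_t *dst)
{
    size_t want = (size_t)count * BLOCK_SIZE_IN_BYTES;
    assert(medium != NULL && dst != NULL);
    if (fseek(medium, (long)lba * (long)BLOCK_SIZE_IN_BYTES, SEEK_SET) != 0) {
        return -1;
    }
    return (fread(dst, 1, want, medium) == want) ? 0 : -1;
}
```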

The disk-emulator code is not optimized for speed, because the slowest part of our target emulator is the Winsock-based network communication. However, since the emulated disk capacity is only 20 MB, a faster disk medium could be emulated by using a static array in memory. An interesting experiment might be to modify the emulated disk-medium code to use the following C definition, which represents the emulated 20-MB disk medium as a static 2D array:

#define DISK_CAPACITY_IN_BLOCKS (40960)
#define DISK_BLOCK_SIZE_IN_BYTES (512)

UCHAR uchDiskMedia[DISK_CAPACITY_IN_BLOCKS][DISK_BLOCK_SIZE_IN_BYTES];

Clearly, a SCSI hard disk is not required for emulating the test target. In fact, if we were to use the aforementioned static array representation, no physical disk medium would be required at all.
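As a rough illustration of that experiment, the medium read and write routines reduce to bounds-checked memcpy() calls over the array. The function names below are hypothetical, and the UCHAR and ULONG typedefs are repeated only so that the sketch builds without windows.h; the return convention (0 for success, nonzero for error) mirrors Listing One.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char UCHAR; /* normally supplied by windows.h */
typedef unsigned long ULONG; /* normally supplied by windows.h */

#define DISK_CAPACITY_IN_BLOCKS  (40960)
#define DISK_BLOCK_SIZE_IN_BYTES (512)

/* 40960 blocks x 512 bytes = 20 MB, zero-filled at program start. */
static UCHAR uchDiskMedia[DISK_CAPACITY_IN_BLOCKS][DISK_BLOCK_SIZE_IN_BYTES];

/* Returns 1 if the block range lies entirely inside the medium. */
static int RangeIsValid(ULONG ulLBA, ULONG ulBlocks)
{
    return ulLBA < DISK_CAPACITY_IN_BLOCKS &&
           ulBlocks <= DISK_CAPACITY_IN_BLOCKS - ulLBA;
}

/* Copy ulBlocks blocks starting at ulLBA into lpDest. Returns 0 on success. */
int ReadFromMemoryMedium(ULONG ulLBA, ULONG ulBlocks, UCHAR *lpDest)
{
    assert(lpDest != NULL);
    if (!RangeIsValid(ulLBA, ulBlocks)) {
        return 1;
    }
    memcpy(lpDest,
           (UCHAR *)uchDiskMedia + (size_t)ulLBA * DISK_BLOCK_SIZE_IN_BYTES,
           (size_t)ulBlocks * DISK_BLOCK_SIZE_IN_BYTES);
    return 0;
}

/* Copy ulBlocks blocks from lpSrc onto the medium. Returns 0 on success. */
int WriteToMemoryMedium(ULONG ulLBA, ULONG ulBlocks, const UCHAR *lpSrc)
{
    assert(lpSrc != NULL);
    if (!RangeIsValid(ulLBA, ulBlocks)) {
        return 1;
    }
    memcpy((UCHAR *)uchDiskMedia + (size_t)ulLBA * DISK_BLOCK_SIZE_IN_BYTES,
           lpSrc,
           (size_t)ulBlocks * DISK_BLOCK_SIZE_IN_BYTES);
    return 0;
}
```

The trade-off is that the contents of an in-memory medium are lost when the emulator exits, so the ability to reuse an existing disk image would be lost as well.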

Using the Emulated Target

The emulated target is a console application that can be run from a Windows command prompt. If it is run with no command-line parameters, it creates a new 20-MB emulated disk-medium backing file in the Windows temporary directory. To use an existing emulated disk-medium backing file, the program accepts the full path to the file as an optional command-line parameter. This means that you can copy files to the emulated medium, and later reuse this emulated disk.

The emulated target application can run either on the same machine as the initiator or on a separate machine. The only requirement is that the initiator machine be actively connected to a network via a network interface card (NIC) or a hardware iSCSI initiator, and that the initiator machine can access the target machine across the network.

When the emulated target console application starts, it is ready to accept connections from an initiator. Additionally, it is ready to accept user commands at the console. These commands include various management commands, as well as commands to control the error-injection switches discussed in the next section.

Once you have logged into the target, you can then partition, format, and use the disk as you would a physical disk until you choose to log out from the iSCSI target. You can also log back into the target later and continue using the existing data on the emulated disk.

Using the Emulated Target as a Test Tool

The emulated target provides many test switches for error injection. The main program thread accepts typed console commands from users as a means of manually toggling the test switches on and off. For automated testing, the emulated target also manages a dedicated thread that accepts test-switch commands received via TCP/IP from test applications.

For example, the communications code supports the ability for users to arbitrarily close any live connection. The expected behavior of the initiator is to then attempt to reconnect to the target every 5 seconds until the connection is reestablished. If the initiator did not attempt to reconnect, it would indicate a bug in the initiator.

Additionally, the communications code can randomly split response PDUs into two network packets. The byte position at which PDUs are split is also chosen randomly. This tests whether the initiator correctly reassembles PDU fragments, and also whether it waits until all fragments have arrived before it attempts to process the PDU. The expected behavior of the initiator is that it remains unaffected by multipacket PDUs. An example of a bug would be for the initiator to prematurely time out a response PDU after receiving only the first fragment.

The communications code can also randomly delay sending iSCSI response PDUs to the initiator. The expected behavior of the initiator is to either wait until the PDU is received, or time out if the target takes too long. Listing Five (available electronically) shows the send_PDU() function that implements random delays and random PDU fragmentation.
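The fragmentation idea can be sketched independently of Winsock by passing the actual send operation in as a callback. Everything named below is hypothetical and is not the send_PDU() code from Listing Five. The capture_send() function stands in for a socket send so that the splitting logic can be exercised without a network. Note that issuing two separate sends on a TCP socket does not by itself guarantee two packets on the wire; a real target would probably also need a delay between the sends or the TCP_NODELAY socket option.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Callback that transmits len bytes; returns 0 on success. */
typedef int (*send_fn)(void *ctx, const unsigned char *data, size_t len);

/* Pick a random split position in the range 1..pdu_len-1, so that
   both fragments contain at least one byte. */
size_t choose_split_point(size_t pdu_len)
{
    assert(pdu_len >= 2);
    return 1 + (size_t)rand() % (pdu_len - 1);
}

/* Send one PDU as two fragments split at a random byte position. */
int send_pdu_fragmented(send_fn send_bytes, void *ctx,
                        const unsigned char *pdu, size_t pdu_len)
{
    size_t split;
    assert(send_bytes != NULL && pdu != NULL);
    if (pdu_len < 2) {
        return send_bytes(ctx, pdu, pdu_len); /* too short to split */
    }
    split = choose_split_point(pdu_len);
    if (send_bytes(ctx, pdu, split) != 0) {
        return -1;
    }
    /* A random delay could be inserted here before the second fragment. */
    return send_bytes(ctx, pdu + split, pdu_len - split);
}

/* Stand-in for a socket: records the bytes and the number of sends. */
typedef struct {
    unsigned char bytes[256];
    size_t        length;
    int           calls;
} capture_sink;

int capture_send(void *ctx, const unsigned char *data, size_t len)
{
    capture_sink *sink = (capture_sink *)ctx;
    if (sink->length + len > sizeof sink->bytes) {
        return -1;
    }
    memcpy(sink->bytes + sink->length, data, len);
    sink->length += len;
    sink->calls++;
    return 0;
}
```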

The iSCSI processor code supports sending an iSCSI asynchronous logout PDU to the initiator via any live connection at any time. The expected behavior of the initiator is for it to immediately close the connection and then immediately log in again. If the initiator did not close the connection, did not subsequently attempt to log in, or removed the installed disk device, then a bug in the initiator may be indicated.

Asynchronous iSCSI logouts are rare events, so the asynchronous-logout code paths in the initiator require focused testing outside of typical stress testing. Clearly, using the emulated-target tool for sending asynchronous logouts increases our confidence that our testing has not missed bugs in rarely used code paths.

The SCSI device emulation code can randomly inject SCSI READ and WRITE errors. The initiator should remain unaffected by these errors. Listing Three (available electronically) demonstrates how the process_SCSI_READ10() function randomly injects SCSI READ errors.

The emulated disk-medium code can randomly inject disk-medium read/write corruption. Of course, by choosing to inject write corruption, it takes only a few corrupted file writes to render unusable any filesystem stored on the emulated disk. The emulated disk would then need to be reformatted while read/write corruption injection is disabled.

The random disk-medium corruption is injected by setting 10 random bytes in random read or write data blocks to the character "X." This makes it easy to spot the corruption in text files. If read corruption is switched on after a file has already been cached by the filesystem, then no corruption of the file will occur until the next time the file is read from the emulated disk medium. Listing Four demonstrates how we implemented random READ corruption in the ReadFromMedium() function. Figure 1 is a typical test case using the error-injection capabilities of the emulated target.
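The corruption step itself is small. The sketch below overwrites 10 randomly chosen bytes of a data block with the character "X", as described above. The function names and the percentage-based trigger are assumptions made for illustration and are not taken from Listing Four.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CORRUPT_BYTE_COUNT 10

/* Overwrite CORRUPT_BYTE_COUNT randomly chosen bytes of the block
   with 'X'. Positions may repeat, so fewer than 10 distinct bytes
   can end up changed. */
void corrupt_block(unsigned char *block, size_t block_len)
{
    int i;
    assert(block != NULL && block_len > 0);
    for (i = 0; i < CORRUPT_BYTE_COUNT; i++) {
        block[(size_t)rand() % block_len] = 'X';
    }
}

/* Hypothetical trigger: returns 1 for roughly "percent" out of every
   100 calls, so that only some transfers are corrupted. */
int should_inject(int percent)
{
    return (rand() % 100) < percent;
}
```

A read path might then call corrupt_block() on a block just before returning it whenever the read-corruption switch is on and should_inject() returns 1.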

Conclusion

The iSCSI transport protocol makes it possible for you to access storage over standard, Ethernet-based TCP/IP networks. While not a production-level product, the tool we present here provides a useful user-mode environment for experimenting with both the iSCSI and SCSI protocols.

DDJ

Listing One

int parse_received_data(
    IN PST_TARGET_CONNECTION lpConnection,
    IN char *charBuf,
    IN ULONG ulDataLength,
    IN OUT ULONG *lpulBufferOffset
)
/*++
Description:
    This routine parses the ulDataLength bytes of data in charBuf 
    into individual iSCSI PDUs. It copies any left over data to the 
    start of charBuf and sets *lpulBufferOffset to the number of 
    left over bytes.
    
Arguments:
    lpConnection points to the connection struct describing the 
    connection with the target.

    charBuf points to the buffer containing the data received 
    from the initiator.

    ulDataLength contains the total number of unprocessed bytes in charBuf.

    *lpulBufferOffset returns with the number of unprocessed bytes in charBuf.

Return Values:
    A return value of 0 indicates success; otherwise there was an error.

--*/
{
    int nStatus = 0;
    int nRet    = 0;

    ULONG ulDataSegmentLength = 0;
    int   cbPadding           = 0;
    ULONG cbPDUlength         = 0;
    ULONG ul                  = 0;

    BOOL bAddedPDU = FALSE;

    char *charTempBuf = charBuf;

    PISCSI_GENERIC_HEADER q = NULL;

    // The lpConnection argument cannot be NULL.
    TargetAssert(NULL != lpConnection);

    // Hold the connection mutex.
    nRet = lock_connection(lpConnection);
    if (0 != nRet) {
        fprintf(stderr, "parse_received_data(): "
                        "Error calling lock_connection()\n");
        _flushall();
        return 1;
    }
    // Loop until we have dealt with as much of the received data as we can.
    while (1) {
        // Check whether we have less than the minimum 48 byte header.
        if (ulDataLength < 48) {
            *lpulBufferOffset = ulDataLength;
           break;
        }
        // We have at least the header.
        q = (PISCSI_GENERIC_HEADER)charTempBuf;
        cbPDUlength = 48;
        // Get the DataSegmentLength;
        UCHAR3toULONG(q->DataSegmentLength, ulDataSegmentLength);
        cbPDUlength += ulDataSegmentLength;

        // Calculate if there are any padding bytes.
        if (0 != ((ulDataSegmentLength + 48) % 4)) {
            cbPadding = 4 - (ulDataSegmentLength + 48) % 4;
        } else {
            cbPadding = 0;
        }
        cbPDUlength += cbPadding;
        // Check whether we have not yet received the entire DataSegment.
        if (ulDataLength < cbPDUlength) {
            *lpulBufferOffset = ulDataLength;
            break;
        }
        // Create a new PDU element and add it to the list.
        nRet = AddPDUtoList(lpConnection, charTempBuf, cbPDUlength);
        if (0 != nRet) {
            nStatus = 1; // There was an error adding a new PDU 
                         //    element to the list.
            goto label_parse_received_data_finished;
        }
        bAddedPDU = TRUE;
        // Check whether we've reached the end of the buffer.
        if (ulDataLength == cbPDUlength) {
            *lpulBufferOffset = 0; // Reset the buffer to empty.
            break;
        }
        // Update our local pointer into the charBuf buffer.
        charTempBuf = charTempBuf + (cbPDUlength);
        // Update the number of bytes left.
        ulDataLength -= cbPDUlength;

    } // while (1)
    // If necessary, copy the left over data to the start of charBuf.
    if ((charTempBuf != charBuf) && (*lpulBufferOffset > 0)) {
        memcpy(charBuf, charTempBuf, *lpulBufferOffset);
    }
    // If we added a new PDU to the list, then signal the send thread.
    if (TRUE == bAddedPDU) {
        nRet = signal_connection(lpConnection);
        if (0 != nRet) {
            fprintf(stderr, "parse_received_data(): "
                            "Error calling signal_connection()\n");
            nStatus = 1;
        }
    }
label_parse_received_data_finished:
    // Release the connection mutex.
    nRet = release_connection(lpConnection);
    if (0 != nRet) {
        fprintf(stderr, "parse_received_data(): "
                        "Error calling release_connection()\n");
        nStatus = 1;
    }
    // Return the status.
    return nStatus;
} // parse_received_data()
