An alternative protocol to TCP
Ian currently works as a development consultant for Marimba and his interests are in computer security and networking. He can be contacted at ian_barile yahoo.com.
The Stream Control Transmission Protocol (SCTP) is a new transport layer protocol proposed in RFC 2960 (http://www.faqs.org/rfcs/rfc2960.html). Like TCP and UDP, SCTP uses the network layer protocol for intersystem communication. SCTP is a connection- oriented protocol, similar to TCP that provides reliable communication between two endpoints. Unlike the byte stream TCP uses for data transport, SCTP sends data in message chunks similar to UDP packets.
SCTP was originally developed for telephony signaling networks. Integrated Services Digital Networks (ISDN) is one SS7 protocol implemented on SCTP. SCTP has many features that make it a beneficial protocol for use outside of the telecom industry.
SCTP's unique features such as native virtual streams and multihomed support make SCTP an attractive alternative to TCP. Over the next several years, SCTP will start to play a role in Internet applications. SCTP stacks are available for almost every operating system. To find an implementation for a particular OS, visit http://www.sctp.org/implementations.html. In this article, I use the LKSCTP implementation (LKSCTP is integrated into the 2.6 Linux kernel; http://www.kernel.org/).
Types of Socket Communication
Several types of socket communication are available for use in developing network applications. Developers must choose the transport layer protocol and technique that transmits data efficiently for applications. Transport layer protocols transmit data between two endpoints on IPv4 networks using the following methods:
- Unicast. One endpoint to one endpoint.
- Broadcast. One endpoint to all endpoints on a subnet.
- Multicast. One endpoint to many.
TCP and SCTP are connection-oriented protocols that only support unicast data transmission. UDP is a connectionless protocol that supports unicast, broadcast, and multicast data transmission. Connectionless protocols send data to endpoints regardless of the endpoint ability to receive and process the data. Connection-oriented protocols ensure reliable data transmission, flow control, and error control.
Unicast Data Transmission
When developing applications that require unicast data transmission, you must choose the correct transport layer protocol. SCTP, TCP, and UDP have unique features that can benefit your application. SCTP is a connection-oriented protocol similar to TCP. SCTP provides error and flow control for data packets such as TCP. SCTP also gives you the ability to tag data through the use of virtual streams and protocol payload identifiers. SCTP sends data in messages similar to UDP packetsunlike the TCP byte stream.
TCP uses a full-duplex byte stream to transmit data between endpoints. TCP ensures reliable, sequenced data transmission between two endpoints by using flow- and error-control algorithms.
Like IP, UDP transmits data using unreliable, unordered datagram packets. Any packet reordering and validation occurs in the application layer. UDP is the fastest transport layer protocol due to the lack of error or flow control.
SCTP's virtual streams and the protocol payload identifier let you reduce application complexity and simplify application layer protocols by making it easy to identify data types. Virtual streams benefit you by letting data types be tagged by the stream identifier, reducing the need to parse data types out of the TCP byte stream. When implementing the application protocols HTTP and SMTP over TCP, different data types are identified by text headers (MIME headers). Parsing a TCP byte stream for text headers is a complex, costly process, compared to using the SCTP stream identifier (a WORD value) in the SCTP [DATA] packet header.
To set up the number of inbound and outbound virtual streams for an SCTP association, two setsockopt() calls must be made with the option name SCTP_INITMSG and SUBSCRIBE_EVENT (see Example 1). SCTP-specific setsockopt() calls must pass SOL_SCTP for the level parameter. When an SCTP endpoint receives a message from an associated endpoint via the sctp_recvmsg() API, the sctp_sndrcvinfo structure specifies the virtual stream via the sinfo_stream member (Example 2).
Other application layer protocols such as FTP require multiple connections to communicate between a client and server. FTP servers and clients could be reimplemented with the "data connection" and the "control connection" being sent over one SCTP association. The multiple "connections" would be on separate virtual streams or have unique protocol payload identifiers.
SCTP native multihomed support enables an endpoint to send data transparently over multiple networks by binding an endpoint to multiple IP addresses. SCTP determines the IP address it will transmit data on by analyzing the routes between the addresses associated with an endpoint. When multiple addresses are available to an association, SCTP uses a default address. If the default route fails, the SCTP endpoint switches to another address in the transport address list. This enables clients to connect to servers over completely different networks reducing dropped connections due to connectivity issues and latency on a network; see Figure 1.
When using SCTP to support multiple interfaces on a single SCTP socket, the sctp_bindx() call is used to bind the socket to more than one interface. sctp_bindx() (Example 3) differs from the regular socket bind in that it allows addresses to be bound and removed from the socket dynamically. There is an optional call, sctp_connectx(), which enables an endpoint to connect to multiple addresses on a remote endpoint. At this writing, the LKSCTP implementation hasn't used the sctp_connectx() call.
Security is an ever-increasing concern when developing applications. When developing a network application, you must understand the security issues associated with the transport layer protocol used. SCTP has been designed to mitigate certain types of security threats.
A common transport layer attack, Denial of Service (DoS), tries to reduce the number of new inbound connections a system can create by tying up resources through a [SYN] flood. SCTP mitigates DoS attacks by using a cookie to maintain state information during the initialization handshake. The SCTP cookie contains data, like the Transmission Control Block, used in system resources once an endpoint enters the established state. An established SCTP socket protects against random and malicious packets through validation tags. If a packet is received with an invalid validation tag, the packet is discarded. SCTP relies on other technologies to provide protection against data corruption (IP Authentication Header RFC 2402) and protecting confidentiality (IP Encapsulating Security Payload RFC 2406).
If your application requires a connection-oriented protocol to communicate between endpoints, you can choose between SCTP and TCP. When developing the applications, knowledge of how the connections are initialized, terminated, and data is transferred leads to better application design and faster debugging.
Establishing an association between two endpoints requires the use of network APIs in the client and server to negotiate the association and allocate system resources. The socket APIs used for establishing TCP associations are used in establishing an SCTP association. SCTP also provides an implicit association to be created between endpoints. The socket APIs have been extended to to support SCTP-specific functionality. Table 1 lists the network APIs.
When client and server applications establish an association between endpoints, the endpoints enter the protocols initialization sequences. The SCTP initialization sequence has a four-way handshake where TCP has a three-way handshake. The TCP initialization sequence in Figure 2 illustrates the [SYN]/[ACK] packet exchange required to establish communication between the endpoints. The endpoints use the sequence numbers established in the [SYN] packets to allow TCP stacks deliver sequenced data to the application.
SCTP's initialization sequence uses a four-way handshake. The extra step in SCTP's handshake is used to pass a cookie that maintains state and validate the endpoints; see Figure 3. The initialization steps for two SCTP endpoints A and Z is as follows:
- Association [INIT] is sent. The Init contains endpoint A maximum inbound virtual streams, number of outbound virtual requested, and the address list of the host.
- Endpoint Z sends an [INIT ACK]. The [INIT ACK] contains endpoint Z's [COOKIE] packet, Cookie_Z, number of virtual outbound streams, maximum number of inbound streams, and the address list.
- Endpoint A sends the [COOKIE ECHO] packet.
- Endpoint Z receives the [COOKIE ECHO] and sends the [COOKIE ACK]. Endpoint Z is now in an established state waiting to receive data.
- Endpoint A receives the [COOKIE ACK] and the association enters an established state. User data is now allowed to be sent.
During the initialization sequence, the number of inbound and outbound virtual streams, along with the address lists for the endpoints are exchanged. Each endpoint specifies its maximum number of inbound streams (MIS) and requests the number of outbound streams (OS). If the number of outbound streams is greater than the MIS, the OS must be adjusted to match the MIS stream count or the association must be aborted.
If an SCTP endpoint needs multihomed support, an address list must be present in the initialization sequence. There are three possibilities for how addresses are listed in the [INIT] and [INIT-ACK] datagrams:
- No IP addresses list and no hostname. SCTP will use the IP address specified in the datagram header and no multihomed support is provided.
- IP address list. SCTP will use the IP addresses for its transport address list and provide multihomed support.
- A hostname. SCTP resolves the hostname to IP addresses; these addresses are used in the transport address list.
TCP and SCTP Bulk Data Flow
Once an association has been established, data can be sent between the endpoints. Both TCP and SCTP provide reliable data transmission through similar error- and flow-control algorithms. SCTP can send data using the same APIs as TCP and UDP as well as a new SCTP-specific APIs:
- send() and recv() (TCP and SCTP only).
- sendto() and recvfrom().
- sendmsg() and recvmsg().
- sctp_sendmsg() and sctp_recvmsg() (SCTP only).
The following client-server source files (available electronically; see "Resource Center," page 5) illustrate how to use the APIs with the SCTP protocol to communicate between two applications:
- server_recv.cpp and client_send.cpp illustrate how SCTP can be used like a TCP to communicate between a client and a server.
- tcpserver_recv.cpp and tcpclient_send.cpp are a simple TCP client and server to illustrate that SCTP and TCP can overlap when compared to server_recv.cpp and client_send.cpp.
- server_recvmsg.cpp and client_sendmsg.cpp illustrate how SCTP can send data using scatter/gather I/O APIs. sctp_sendmsg() API is implemented using the recvmsg() and sendmsg() APIs.
- server_sctprecvmsg.cpp and client_sctpsendmsg.cpp demonstrate how SCTP APIs allow communication between a client and server using multiple virtual streams.
To understand the differences between how SCTP and TCP manage data transmission, you need to look at the protocol headers and control messages.
TCP designates data being sent to an application by using the [PSH] flag in the TCP header. To ensure that data sent from an endpoint has been received, TCP uses an [ACK] that specifies which [PSH] packets have been received. [ACK]s are sent in response to [PSH] datagrams in two different circumstances:
- When data has been received.
- The ACK delay has been reached.
TCP provides flow control and error control through the use of a receiving window. The receiving window informs the sender of the size of the buffer that is available to receive data. If the sender hasn't received an [ACK] after filling the receiving window:
- The sender must wait for an [ACK] to send new data.
- Wait on the retransmission timeout to retransmit unacknowledged data.
- A TCP endpoint will also retransmit data if an [ACK] for a specific data packet is continuously sent. This situation occurs if a packet is dropped or damaged. If packets are lost or damaged, an exponential back off is applied to the retransmission timer. The retransmission timeouts are not readjusted until a packet that isn't being requested for retransmissions [ACK] is received.
SCTP's model for bulk data flow uses many of the techniques that TCP has for sending data and flow control. SCTP handles dropped and delayed packets differently than TCP's ability to manage virtual streams. If data is dropped then the [SACK] sent will contain the transition sequence number (TSN) for the last packet received. The [SACK] packet contains Gap ACK blocks specifying which packets need to be retransmitted. SCTP also uses Gap ACK blocks to report that packets where dropped because buffers filled up and the data can't be forwarded to the application. When packet loss occurs for one virtual stream, SCTP continues to deliver data on other virtual streams while delaying the affected virtual stream. SCTP supports the ability to deliver data on a virtualized stream in an ordered or unordered fashion to an application.
SCTP supports message bundling and fragmentation. SCTP fragments a message if it is larger than the smallest maximum transfer unit (MTU). SCTP doesn't support refragmenting data chunks. If SCTP messages are smaller than the smallest MTU value, SCTP will bundle packets into a single data chunk to improve performance. When SCTP bundles packets, all control chunks are packed at the beginning of the SCTP packet.
Once an association has been established, data can be sent between the endpoints until the association is terminated or severed. The two network APIs that allow for the termination of an open association are shutdown(socket,how) and close(socket). The close() terminates the SCTP association. shutdown() terminates an association differently for SCTP than TCP because SCTP doesn't have to have closed semantics. See draft-ietf-tsvwg-sctpsocket-08.txt for the semantics of shutdown().
A connection-oriented protocol uses the termination sequence to notify endpoints that data can no longer be received and that system resources should be released. The TCP termination sequence is a four-way handshake. Each endpoint sends a [FIN] and receives an [ACK] from its corresponding endpoint (see Figure 4). TCP allows for one endpoint to terminate its association and the other to still transmit data, resulting in a half open socket.
SCTP's association-termination sequence is a three-way handshake. SCTP doesn't require a fourth step because it doesn't support half-open socket communications. When an endpoint is entering the shutdown pending state, it will send all remaining data chunks and the [SHUTDOWN] chunk. When an endpoint receives a [SHUTDOWN], it sends the remaining data chunks and ACKS before the [SHUTDOWN ACK], refusing any new data and enters the shutdown-received state. The endpoint originating the [SHUTDOWN] datagram upon receipt of the [SHUTDOWN ACK] will send a [SHUTDOWN COMPLETE] finishing the shutdown sequence (see Figure 5).
SCTP is an exciting new transport layer protocol that offers an alternative to TCP. SCTP's virtual streams and multihomed support can reduce application complexity and increase reliability. The similarities between TCP and SCTP APIs will ease porting and creating new applications using SCTP.