Getting inside the SSH protocol
Glen is an R&D manager at Hummingbird and can be contacted at glen@montreal.hcl.com.
Imagine finding a file on your network containing IP addresses, log-in accounts, and passwords from hundreds of outgoing telnet connections to machines all over the world, and then realizing how many administrators you need to contact. It's not just hypothetical: this happened to me about five years ago. An intruder had penetrated one computer, installed (and hidden) a program to sniff accounts and passwords from the Ethernet, and subsequently harvested at least one file containing authentication information. How could this have been prevented?
It turns out that the venerable telnet protocol, widely used to establish UNIX shell sessions on remote systems, sends and receives all of its data, including authentication information, as plaintext (unencrypted ASCII data). While this was not a defect in the original telnet protocol, telnet is risky to use these days, when high-school kids have the tools to crack UNIX systems. What was needed was something that could provide the same functionality, yet encrypt all network traffic so that data could not be captured by lurking intruders. One such solution is the secure shell.
In 1995, Tatu Ylönen at the Helsinki University of Technology (Finland) released SSH (short for "secure shell"), which was so successful that Ylönen founded SSH Communications Security to commercialize it. In 1996, SSH2, a second version of the original SSH1 protocol, was released. As well, the Internet Engineering Task Force (IETF) started the SECSH Working Group to begin standardizing the protocol. The first Internet draft for SSH2 was released in 1997. Currently, some of the specifications are in their 17th revision, with an RFC not yet released. This has not prevented commercial and open-source versions from being successful and interoperable.
In this article, I describe the SSH protocol and the architecture of a Windows client implementation that I and others developed at Hummingbird (where I work).
The IETF is the organization that defines the standards and protocols used on the Internet (http://www.ietf.org/). Various working groups have been formed to deal with topics such as HTML, networking, and communications. The SSH protocol is currently being developed by the SECSH Working Group (http://www.ietf.org/secsh/). SSH is based on four core Internet drafts (soon to be RFCs): the SSH Architecture, SSH Transport Protocol, SSH Authentication Protocol, and SSH Connection Protocol. There are 14 Internet drafts so far for the SECSH Working Group, all in various stages of revision.
Version 2 of SSH uses TCP/IP as its transport layer and runs a packet-oriented protocol on top of it. The transport protocol describes how encrypted sessions are set up, including key exchange and key derivation. It also describes the basic SSH packet format. Figure 2 illustrates this, showing the length of each field. The packet length pl is (1+n1+n2), where n1 is the length of the payload (the actual data being transported) and n2 is the length of the random padding required to make the packet a multiple of the cipher block size.
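The padding arithmetic can be sketched in a few lines. This is an illustration only, not the Hummingbird code: the names PacketLayout and layoutPacket are hypothetical, and the 4-byte minimum padding and 8-byte minimum block size are assumptions taken from the transport specification rather than from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sizes of one SSH binary packet, excluding the trailing MAC.
struct PacketLayout {
    std::uint32_t packetLength;   // pl = 1 + n1 + n2 (does not count its own 4 bytes)
    std::uint8_t  paddingLength;  // n2
    std::size_t   bytesOnWire;    // 4-byte length field + pl
};

// payloadLen is n1; blockSize is the cipher block size in bytes.
PacketLayout layoutPacket(std::size_t payloadLen, std::size_t blockSize) {
    assert(blockSize != 0 && blockSize <= 64);  // keeps n2 within one byte
    if (blockSize < 8) blockSize = 8;           // assumed floor of 8 bytes

    const std::size_t fixed = 4 + 1;            // length field + padding-length byte
    std::size_t padding = blockSize - (fixed + payloadLen) % blockSize;
    if (padding < 4) padding += blockSize;      // assumed minimum of 4 padding bytes

    PacketLayout out;
    out.paddingLength = static_cast<std::uint8_t>(padding);
    out.packetLength  = static_cast<std::uint32_t>(1 + payloadLen + padding);
    out.bytesOnWire   = 4 + out.packetLength;
    return out;
}
```

With an empty payload and an 8-byte block, this yields 11 bytes of padding and 16 bytes on the wire, which matches the minimum packet size.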
To be sure of correct transmission, a Message Authentication Code (MAC) is appended to each packet. This is a cryptographic checksum or hash of the packet; by calculating this checksum upon receipt of the packet and comparing it to the received value, the application can decide if the data has been correctly transmitted. The length (m) of this field depends on the cryptographic hashing algorithm used; a size of 20 bytes is not uncommon.
Before being encrypted, the payload data are compressed. Besides reducing payload sizes (thus helping to make up for the added protocol overhead), compression has the benefit of obscuring the data prior to encryption. It generates a more or less uniform distribution of byte values, thus helping to defend against known-plaintext cryptographic attacks. (If you always have the same text in the same position, for example, then this can help attackers crack your encryption.)
The minimum packet size, excluding the MAC field, is 16 bytes. Thus, when a packet is in the process of being received, this amount of data can be presumed to be incoming, even though the length field is encrypted. If there is less data than this, the packet is padded (the padding field) to make up the required amount. Since the length field is encrypted, a packet needs to be received in two stages. The first stage is the first guaranteed 16 bytes. Once the initial chunk has been decrypted, the length can be used and the remainder of the packet, if any, including the MAC field, can be read.
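A minimal sketch of the two-stage receive follows. It assumes the first 16 bytes have already been read and decrypted; the helper names are hypothetical, and a production implementation would reject a bad length field rather than assert.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads a 32-bit value stored in network (big-endian) byte order.
std::uint32_t readUint32(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

// Stage one always reads this many bytes, the guaranteed minimum packet size.
constexpr std::size_t kFirstChunk = 16;

// Stage two: given the decrypted first chunk, return how many more bytes
// must be read to complete the packet, including the MAC of macLen bytes.
std::size_t bytesStillToRead(const std::uint8_t* firstChunk, std::size_t macLen) {
    const std::size_t total = 4 + static_cast<std::size_t>(readUint32(firstChunk));
    assert(total >= kFirstChunk);  // sketch only: real code would fail the connection
    return (total - kFirstChunk) + macLen;
}
```

For a minimum-size packet with a 20-byte MAC, only the MAC remains to be read after the first chunk.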
Encryption, compression, and the MAC calculation are initially disabled. (All of these are negotiated between client and server.) Thus, the first several packets are sent unencrypted. You might think that this would let attackers defeat the protocol, but this isn't so. SSH negotiates the cryptographic algorithms to be used in subsequent symmetric encryption, then uses public-key cryptography to securely derive a shared secret. Unless you are one of the two parties participating in the communication, you cannot calculate this secret by observing the traffic. This shared secret (K) is then used to generate a set of symmetric cipher keys used in the subsequent communications.
If speed were irrelevant, then public-key cryptography could be used for everything. Unfortunately, public-key cryptography is slow, whereas symmetric (non-public-key) cryptography is fast, which is why symmetric ciphers are used for the bulk of the traffic. The defining feature of a symmetric cipher is that the same key is used for encryption and decryption, so the client and the server both need to possess the key. Sharing this key securely is the downside of symmetric cryptography, and it is solved by using a key-exchange algorithm.
In SSH, key exchange is accomplished via the Diffie-Hellman Key Exchange Protocol, which shares a secret between client and server without ever transmitting this secret in the clear across the network. So the expensive part (public-key cryptography) is used briefly to effect a secure exchange of a symmetric key (the shared secret), after which fast symmetric cryptography is used for encryption/decryption in the remainder of the connection.
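The idea behind Diffie-Hellman can be shown with toy numbers. The sketch below is for illustration only: real SSH key exchange uses primes of a thousand bits or more and a big-number library, and the names modPow, DhSide, makeSide, and sharedSecret are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// (base^exp) mod m by square-and-multiply.
// Only valid while m < 2^32, so that intermediate products fit in 64 bits.
std::uint64_t modPow(std::uint64_t base, std::uint64_t exp, std::uint64_t m) {
    assert(m != 0);
    std::uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

struct DhSide {
    std::uint64_t secret;     // private exponent, never transmitted
    std::uint64_t publicVal;  // g^secret mod p, sent in the clear
};

// Each party picks a secret and derives the value it will send.
DhSide makeSide(std::uint64_t g, std::uint64_t p, std::uint64_t secret) {
    return DhSide{secret, modPow(g, secret, p)};
}

// Each party combines its own secret with the peer's public value.
std::uint64_t sharedSecret(const DhSide& me, std::uint64_t peerPublic, std::uint64_t p) {
    return modPow(peerPublic, me.secret, p);
}
```

With p = 23 and g = 5, a client secret of 6 and a server secret of 15 produce public values 8 and 19, and both sides arrive at the shared value 2. An eavesdropper sees only 23, 5, 8, and 19.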
Server authentication is an important consideration. Without it, clients have no way of knowing if the connection has been made to the correct host. The server (sshd) sends a public key to the client (which is used in negotiating the key exchange). The client must decide whether to accept this key; if accepted (and cached on the client), then subsequent communications cannot be hoaxed through strategies such as man-in-the-middle attacks.
This first connection is potentially a weak point, unless the server public key can be independently validated. In the absence of a public-key infrastructure, the Internet draft recommends users be given the opportunity to accept or reject the key. While not optimally secure, it provides a higher level of security than plain telnet, yet at a relatively low risk. Using Kerberos key exchange, even this risk can be eliminated since the Kerberos protocol provides for mutual authentication between client and server. (For more information on Kerberos, see "Kerberos Versus the Leighton-Micali Protocol" by Aviel D. Rubin, DDJ, November 2000.)
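The accept-and-cache decision can be sketched as a small lookup table. This is a simplified illustration with hypothetical names (HostKeyCache, HostKeyVerdict); a real client would persist the cache and would typically compare key fingerprints rather than raw strings.

```cpp
#include <map>
#include <string>

enum class HostKeyVerdict {
    Trusted,   // key matches the cached one
    Unknown,   // first connection: ask the user to accept or reject
    Mismatch   // key differs from the cached one: possible man-in-the-middle
};

class HostKeyCache {
public:
    HostKeyVerdict check(const std::string& host, const std::string& key) const {
        const auto it = keys_.find(host);
        if (it == keys_.end()) return HostKeyVerdict::Unknown;
        return it->second == key ? HostKeyVerdict::Trusted
                                 : HostKeyVerdict::Mismatch;
    }

    // Called only after the user (or an out-of-band check) approves the key.
    void accept(const std::string& host, const std::string& key) {
        keys_[host] = key;
    }

private:
    std::map<std::string, std::string> keys_;
};
```

Only the Unknown case depends on the user's judgment; once a key is cached, a different key for the same host is reported as a mismatch instead of being silently accepted.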
Figure 3 illustrates the packet exchanges required for the initial negotiation. Each of those packets has a specific format and SSH record type. All of these packets are unencrypted and uncompressed until the NEWKEYS packet. All subsequent packets sent after the NEWKEYS message must be encrypted and compressed as negotiated.
From the point of view of attackers, these packets are readable but not helpful since no authentication information is passed. Once encryption is turned on, the authentication service can be requested (Figure 4), but potential attackers cannot decipher subsequent exchanges.
Once an encrypted tunnel has been established, the connection can be authenticated. Even standard username/password authentication ("password") is now secure, since the strings are no longer sent between client and server in easily readable ASCII text, but as part of an encrypted packet. However, there are other authentication methods that are more powerful and flexible.
For example, "public key" authentication requires that a public/private key pair exist for users, with the server possessing the public key and the client having both public and private keys. In this case, the client signs the session identifier with the private key, and the server verifies the signature with the public key. Another authentication method is "keyboard-interactive," which is designed to interface the authentication exchange with cryptographic devices (such as a SecurID card) that users hold. In this way, a challenge is emitted by the server and users input the response (copied from the cryptographic device).
A proposed authentication method uses Kerberos tickets to provide mutual authentication with servers. The advantage of this is that servers and users are authenticated via the Kerberos protocol, thereby avoiding a potential man-in-the-middle attack. Kerberos is accessed via the Generic Security Services Application Program Interface (GSSAPI) and potentially future security providers might be able to provide similar services.
The SSH-user-auth service is initiated by the client at the end of the initial packet exchange via a user-authentication request message that requests a specific authentication method, and provides authentication information. The server can then reply with SUCCESS, FAILURE, or FAILURE With Partial Success. Method-specific messages are also possible. If NONE is used, then the server replies with all of the methods that it is willing to negotiate. Consequently, the client can discover supported server authentication methods.
FAILURE With Partial Success means that the authentication information was correct, but that the server requires further authentication information from another authentication method. For example, the security policy might require keyboard-interactive authentication, along with knowledge of the username and static password. In this case, the server would require both authentications to be successful before declaring the result of the authentication protocol a success.
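The server-side bookkeeping for such a policy can be sketched as follows. The class AuthPolicy and its members are hypothetical, and the credential check itself is reduced to a Boolean supplied by the caller.

```cpp
#include <set>
#include <string>
#include <utility>

enum class AuthReply { Success, Failure, PartialSuccess };

// Tracks which authentication methods must still succeed for one session.
class AuthPolicy {
public:
    explicit AuthPolicy(std::set<std::string> required)
        : remaining_(std::move(required)) {}

    // credentialsOk is the outcome of the method-specific check.
    AuthReply attempt(const std::string& method, bool credentialsOk) {
        if (!credentialsOk || remaining_.count(method) == 0)
            return AuthReply::Failure;
        remaining_.erase(method);
        return remaining_.empty() ? AuthReply::Success
                                  : AuthReply::PartialSuccess;
    }

    // The list a server would send back with a FAILURE reply.
    const std::set<std::string>& methodsThatCanContinue() const {
        return remaining_;
    }

private:
    std::set<std::string> remaining_;
};
```

A policy requiring both "password" and "keyboard-interactive" answers the first correct method with PartialSuccess and the second with Success.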
The connection protocol describes how data is passed across the secure connection that has been negotiated by the application of the previous protocol layers; this secure connection is called a "tunnel." Within tunnels, an arbitrary number of separate logical connections can be made, called "channels" (Figure 1). If you could program an application to invoke SSH routines, channels would come to replace sockets as the transport mechanism, although the underlying transport protocol remains as TCP/IP.
There are several main packets that are used in this stage of the protocol. CHANNEL_OPEN is used to start a channel, and CHANNEL_CLOSE to stop a channel. CHANNEL_REQUEST lets the sender request various operations on a channel. Sending/receiving data uses the CHANNEL_DATA packet.
To give both client and server control over the rate of data transmission, flow control is used. SSH flow control is implemented by maintaining a maximum window of unacknowledged data; once this is exhausted, no more data can be transmitted on that channel until the window size has been adjusted upwards by the opposite party. Both directions of data transmission have separate windows, which are adjusted by CHANNEL_WINDOW_ADJUST packets.
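The sender's side of this window can be sketched as a simple counter. The class name SendWindow is hypothetical, and clamping at the largest 32-bit value is an assumption made here to avoid overflow.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>

// Bytes this side may still send on one channel before the peer must
// enlarge the window again.
class SendWindow {
public:
    explicit SendWindow(std::uint32_t initial) : remaining_(initial) {}

    // Returns how many of the wanted bytes may be sent now and consumes them.
    std::size_t reserve(std::size_t wanted) {
        const std::size_t granted =
            std::min<std::size_t>(wanted, remaining_);
        remaining_ -= static_cast<std::uint32_t>(granted);
        return granted;
    }

    // Called when the peer adjusts the window upwards.
    void adjust(std::uint32_t bytesToAdd) {
        const std::uint32_t room =
            std::numeric_limits<std::uint32_t>::max() - remaining_;
        remaining_ += std::min(bytesToAdd, room);  // clamp instead of wrapping
    }

    std::uint32_t remaining() const { return remaining_; }

private:
    std::uint32_t remaining_;
};
```

A writer that receives fewer bytes than it asked for from reserve() holds the rest of its data until adjust() has been called.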
Figure 5 shows the formats of several different packets. Their data elements are straightforward: Boolean values are single bytes, numerical values are 32-bit numbers in network byte order, and string data types are BLOBs of data prefaced with four length bytes in network byte order. (Network byte order is big-endian; Intel PCs are little-endian.)
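These three encodings are small enough to sketch directly. The helper names putBoolean, putUint32, and putString are hypothetical; the shifts make the code independent of the host's own byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Boolean: a single byte, 0 for false and 1 for true.
void putBoolean(Bytes& out, bool v) {
    out.push_back(v ? 1 : 0);
}

// 32-bit number: four bytes, most significant byte first.
void putUint32(Bytes& out, std::uint32_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>(v >> 16));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

// String: four length bytes, then the raw bytes with no terminator.
void putString(Bytes& out, const std::string& s) {
    assert(s.size() <= 0xFFFFFFFFu);  // length must fit in 32 bits
    putUint32(out, static_cast<std::uint32_t>(s.size()));
    for (char c : s) out.push_back(static_cast<std::uint8_t>(c));
}
```

Encoding the string "ssh" this way produces seven bytes: 00 00 00 03 followed by the three characters.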
SSH is intended to carry terminal emulation traffic (after all, it is a shell, albeit remote, in its most basic aspect). But any application that uses sockets has the potential to use channels within a tunnel. There are several flavors of indirect channel connections, or port forwardings: local (outgoing), remote (incoming), and X11 (a variation on remote port forwarding). As well, agent forwarding lets remote SSH servers authenticate on the client machine by talking via a channel connection to agent processes.
Port forwardings mix TCP/IP socket I/O with channel I/O. In Figure 7, the SSH client has a preconfigured local port that it listens to for incoming socket connections. When it receives one, it opens a new channel in the corresponding active tunnel. The connection could come from anywhere, unless remote access to this port forwarding has been disabled. The client opens the channel by specifying DIRECT-TCPIP in the CHANNEL_OPEN request, as well as the target server address and port. On the remote side, when it receives the CHANNEL_OPEN request, the SSH server connects to this IP address and port. Both SSH client and server now act as proxies for the connection: data coming in via the socket connection at one end is forwarded through the tunnel via this channel, then written out to the socket connection at the other end. Data arriving and being sent on the socket connections is, of course, in clear text; encryption only occurs for data sent through the tunnel.
Remote and X11 port forwardings are similar. However, in this case, the SSH client enables the port forwarding with a CHANNEL_REQUEST to the remote SSH server. The CHANNEL_OPEN is initiated by the remote server when it receives a socket connection on the local port that it is listening to. X11 forwardings differ in that the parties at the end points of the socket connections are X-application and X-server.
Subsystems can also be run across SSH. The prime example of this is sftp, which is intended to be a replacement for the problematic classical ftp protocol. Ftp uses multiple socket connections to accomplish file transfers between client and server: one control socket to issue commands and receive results, and one data socket for every file to be transferred. Sftp uses SSH as a transport layer, with its packets being entirely encapsulated within the SSH data packets. Sftp also uses only one channel for both commands and data transfers.
From the UI point of view, sftp seems similar to ftp because clients often support ftp-style commands. However, internally it is quite different. The commands in the protocol are not ASCII strings, but binary. As well, it cannot operate without SSH because it has no authentication or encryption of its own. Figure 5 shows how sftp packets are encapsulated within CHANNEL_DATA packets.
At Hummingbird, we decided to provide our own implementation of SSH to complement our connectivity software. We already had a version of SSH available, but the architecture was clumsy (each connection could only handle one channel per tunnel and required a separate process for each tunnel). I felt we could do better by starting from scratch. Additionally, our earlier version was essentially an old port of OpenSSH to Windows, and this posed maintenance and eventual interoperability problems.
Our newer SSH implementation consists of a number of COM components for the underlying engine. As well, there are other components for key management and sftp, in addition to a GUI to glue everything together. The SSH engine design consists of the following COM objects (actually, there are several others, but they don't add much to this description):
- ITunnel, abstracts an SSH tunnel.
- IChannel, abstracts a channel within a tunnel.
- ITunnelManager, allows enumeration of existing tunnels.
- IHUMSSHManager, allows tunnels to be created out-of-process and within the COM server.
By making these classes COM components, we could make it easy to invoke the SSH engine and centralize all of the tunnels and channels system-wide within a single process, which would be multithreaded for performance reasons. This SSH engine process was set up as a COM server that could have a lifetime beyond that of applications that invoked it. It is loaded on demand whenever a channel or tunnel is created. This is the repository for global data; in particular, a list of active ITunnels that new applications could enumerate and connect. The IHUMSSHManager component creates instances of COM objects in this COM server process and returns a COM pointer to the client.
The created classes are out-of-process with respect to the client, and all data transfer occurs via COM. This caused some marshaling problems in COM because we wanted to return an arbitrarily sized buffer: the buffer needed to be allocated on the server side, and copied into a COM-allocated buffer on the client side. We needed some COM wizardry to get this to work properly: The IDL required for the IChannel Read() method is an example of it. Here, the value in the size_is() parameter is set on the server side when the size of the returned buffer is known:
HRESULT Read(
    [out, size_is( ,*readLen )] BYTE** data,
    [out] long* readLen, [out] long* status);
Without declaring Read() like this, data would not actually be returned to the client. Both the data buffer and the readLen value are output variables and are returned to the client, as is the status variable. Compare this to the IDL for the Write() method. The size_is() parameter is set on the client side, as is the value in the writeLen variable, before the call because the client knows how much data is being written:
HRESULT Write(
    [in, size_is(writeLen)] BYTE* data,
    [in] long writeLen, [out] long* status);
ITunnelManager provides a way to locate active tunnels and to obtain a COM pointer to them. The other way to obtain a tunnel COM pointer is to CoCreate an instance of IHUMSSHManager and then to use the CreateTunnel method to create a tunnel within the COM server. If you simply performed a CoCreateInstance() of ITunnel in your application, this would work, but the instance would be in the wrong place, in-process to the application, but out-of-process to the COM server.
There are four main methods in the IChannel interface: two are required to start and stop the channel, and two, Read() and Write(), are used to receive and send data. The semantics are similar to socket semantics (though not identical).
We also created the concept of a tunnel profile. This lets us maintain information about a tunnel's configuration (server address, port, authentication method, and so on) between invocations, and provides an object to select and edit.
To start a connection, an application could create an ITunnelManager and enumerate the existing tunnels. If a tunnel existed for the profile, then the client application could reuse it by connecting to it and creating a new channel within the tunnel. On the other hand, even if the tunnel already existed for a profile, the client application could decide to create a new tunnel and start its channel there.
An alternative way of starting connections requires no knowledge of tunnels. Instead, the application specifies a tunnel profile, and then instantiates and starts an IChannel. The IChannel methods determine that there is no attached tunnel, and then use the profile information to create a tunnel on behalf of the application, start it, and then start the channel within this tunnel.
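The two ways of starting a connection can be sketched without COM. Everything below is a hypothetical stand-in for the real interfaces (TunnelRegistry loosely plays the role of the tunnel manager, Channel of IChannel), and no networking is performed.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct TunnelProfile {
    std::string name;   // key under which the configuration is stored
    std::string host;
    int port;
};

struct Tunnel {
    explicit Tunnel(TunnelProfile p) : profile(std::move(p)) {}
    TunnelProfile profile;
    int openChannels = 0;
};

// Process-wide list of active tunnels that applications can search.
class TunnelRegistry {
public:
    std::shared_ptr<Tunnel> create(const TunnelProfile& p) {
        auto t = std::make_shared<Tunnel>(p);
        tunnels_.push_back(t);
        return t;
    }
    std::shared_ptr<Tunnel> findByProfile(const std::string& name) const {
        for (const auto& t : tunnels_)
            if (t->profile.name == name) return t;
        return nullptr;
    }
    std::size_t count() const { return tunnels_.size(); }

private:
    std::vector<std::shared_ptr<Tunnel>> tunnels_;
};

class Channel {
public:
    explicit Channel(TunnelProfile profile) : profile_(std::move(profile)) {}

    // Explicit route: the application picks an existing tunnel to reuse.
    void attach(std::shared_ptr<Tunnel> t) { tunnel_ = std::move(t); }

    // Implicit route: with no tunnel attached, build one from the profile.
    void start(TunnelRegistry& registry) {
        if (!tunnel_) tunnel_ = registry.create(profile_);
        ++tunnel_->openChannels;
    }

private:
    TunnelProfile profile_;
    std::shared_ptr<Tunnel> tunnel_;
};
```

An application that never calls attach() gets socket-like behavior: it names a profile, starts the channel, and the tunnel appears as a side effect.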
The net effect of this simplification is that, for the application, channels start to have semantics similar to those of sockets (but by no means identical, of course). As a consequence, getting an application to use direct COM SSH connections becomes easy. TestTunnel.cpp (available electronically; see "Resource Center," page 3) is a testbed application that connects in this way.
The internal architecture of the SSH engine is that of a multithreaded apartment (MTA) COM server (Figure 8). This lets a process make multiple calls into the COM server. Why would this be useful? Consider a terminal emulator: for greater responsiveness, there are three threads running, namely the user interface and separate read and write threads. The write thread makes calls to the SSH engine upon demand. However, the read thread would always have a read call pending against the COM server; if there is no data, it is blocked there. If the architecture were not MTA, then the write call would be blocked in the client until the read call returned. Clearly this is unacceptable.
To handle this correctly within the SSH engine, each active tunnel runs under its own thread, which is called the IORequest thread; the name indicates its function, which is to handle socket I/O as well as client requests. COM method calls communicate with this thread via a message queue called the IORequest queue. For example, the Write() method request for IChannel would place its request on the queue, signal the IORequest thread that work was waiting, and then wait until notified of completion.
In a typical call through the server (see Figure 6), the client calls Read(). The arguments are then marshaled on the client side in the Read() proxy and sent over to the COM server onto a worker thread running the Read() stub. Within the Read() stub, the arguments are unmarshaled and presented to the CChannel Read() method. After doing this, the client blocks and waits until the Read() routine returns.
By virtue of having opened the channel within a particular tunnel, the CChannel class is connected to the tunnel's IORequest queue. Once the Read() method determines that there is no data in the incoming buffer (data would have been placed there asynchronously with respect to the client call, had it arrived within the tunnel), the Read() issues a request to the tunnel to read data. The request is formatted and inserted in a threadsafe manner onto the IORequest queue. It also signals a tunnel event to tell the IORequest thread that work has arrived. The Read() method then waits on an event variable contained within the request.
The IORequest thread has been waiting on the tunnel event variable if nothing has been happening in the tunnel. When it wakes up, it scans the IORequest queue and extracts the request from the Read() method. If there is no data available (likely, since if there were data, the Read() method would have simply extracted it from the CChannel buffer without inserting a request into the request queue), the IORequest thread stores the pending request into the CChannel state data, then goes back to sleep if no other events are waiting to be serviced. When data arrives, the IORequest thread decodes the channel number, identifies the IChannel instance, and stores the data into the channel input buffer there. The Read() method event that had been waiting for data to arrive is now signaled. It wakes up, extracts the data from the buffer, and returns. The COM stub marshals the data and returns it via COM to the waiting client application.
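The handshake between a calling thread and the tunnel thread can be sketched with standard C++ synchronization primitives in place of Win32 events. The names IoRequest and IoRequestQueue are hypothetical, and the sketch omits the channel lookup and socket I/O that the real thread performs.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>

// One pending client call. The caller sleeps in wait() until the tunnel
// thread calls complete().
struct IoRequest {
    int channelId = 0;
    bool done = false;
    std::mutex m;
    std::condition_variable cv;

    void complete() {
        {
            std::lock_guard<std::mutex> lock(m);
            done = true;
        }
        cv.notify_all();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return done; });
    }
};

class IoRequestQueue {
public:
    // Caller side: enqueue the request and wake the tunnel thread.
    void post(std::shared_ptr<IoRequest> r) {
        assert(r);
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push_back(std::move(r));
        }
        workArrived_.notify_one();
    }

    // Tunnel thread side: sleep until a request is available, then take it.
    std::shared_ptr<IoRequest> take() {
        std::unique_lock<std::mutex> lock(m_);
        workArrived_.wait(lock, [this] { return !queue_.empty(); });
        std::shared_ptr<IoRequest> r = std::move(queue_.front());
        queue_.pop_front();
        return r;
    }

    bool empty() const {
        std::lock_guard<std::mutex> lock(m_);
        return queue_.empty();
    }

private:
    mutable std::mutex m_;
    std::condition_variable workArrived_;
    std::deque<std::shared_ptr<IoRequest>> queue_;
};
```

In use, a Read()-style method would call post() and then wait() on its request, while the tunnel thread loops on take() and calls complete() once data for that channel has been stored. Neither side polls.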
A variant of the above happens for every client call, for every channel, and for every tunnel within the SSH engine process.
The SSH engine is highly multithreaded (see Figure 8). Every port forwarding runs on its own thread, to better multiplex the multiple incoming and outgoing sockets. All local (outgoing) port forwardings have a separate listener thread to handle incoming socket connections: When one arrives, another thread is started to handle this specific connection. Thus, the IORequest thread doesn't act as a bottleneck for any of these functions; all of the threads are event driven. There is no polling in any of the threads.
Agent forwarding is also handled as a separate thread per tunnel, and this thread exists only while the agent forwarding channel is open. When an event occurs that causes the creation of one of these threads (a socket connection), a new channel is created and attached to the tunnel. Incoming agent connections, remote port forwardings, and X11 port forwardings have their channels created by the remote SSH server. In the SSH engine, a clientless channel is created to handle the state information. The real client is at the other end of a local socket handled by one of these threads.
Each IORequest thread is scheduled by the system. Subtasks such as port forwardings or agent requests run on separate threads. They are loosely coupled to the IORequest thread, since all channel I/O goes through the IORequest queue just like an external client. The CChannel Read() and Write() methods are used to transfer data between the forwarding thread and the tunnel thread. None of these threads perform blocking reads or do any sort of polling. Events are used to synchronize their actions, and to initiate processing, be it for request processing or for handling of socket I/O.
The CTunnel class holds the state information about an active tunnel within its member variables. A particular CChannel class that corresponds to an open channel within a tunnel holds its state information within its member variables, including separate input and output CBuffer classes.
When the IORequest thread receives data for a channel, it calls methods in the input CBuffer class of that channel to insert it. The core methods of the CBuffer class that insert and remove data are serialized with critical sections. Therefore, channel Read() and Write() methods can operate correctly from multiple threads on these data buffers in an asynchronous manner, at the same time that the IORequest thread is manipulating the buffers.
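A buffer serialized in this way can be sketched with std::mutex standing in for a Win32 critical section. The class name ByteBuffer and its methods are hypothetical and are not the CBuffer interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// FIFO byte buffer whose insert and remove operations hold a lock, so a
// tunnel thread and a client-call thread can use it at the same time.
class ByteBuffer {
public:
    void insert(const std::uint8_t* data, std::size_t len) {
        assert(data != nullptr || len == 0);
        std::lock_guard<std::mutex> lock(m_);
        bytes_.insert(bytes_.end(), data, data + len);
    }

    // Removes and returns up to maxLen bytes from the front of the buffer.
    std::vector<std::uint8_t> remove(std::size_t maxLen) {
        std::lock_guard<std::mutex> lock(m_);
        const std::size_t n = std::min(maxLen, bytes_.size());
        const auto last = bytes_.begin() + static_cast<std::ptrdiff_t>(n);
        std::vector<std::uint8_t> out(bytes_.begin(), last);
        bytes_.erase(bytes_.begin(), last);
        return out;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(m_);
        return bytes_.size();
    }

private:
    mutable std::mutex m_;
    std::deque<std::uint8_t> bytes_;
};
```

Because each method takes the lock for its whole body, a reader never observes a partially inserted block.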
One of the most recent additions that I've made to the SSH engine is support for Kerberos authentication, based on the IETF draft 6 document (draft-ietf-secsh-gsskeyex-06.txt). Just as I finished, the IETF SECSH Working Group released Draft 7, in which the specification that I had implemented had been almost completely changed. Since none of the drafts have been accepted as RFCs yet, there will still be a number of changes, though hopefully not as dramatic as this one.
We took some care to design the COM interfaces such that they could be invoked from a scripting language. The SSH engine can be easily incorporated into an application, so that instead of using a socket, the application uses a channel within a secure and authenticated tunnel. Certainly, the development of the programming API for SSH is something that will continue.
SSH is a protocol under active revision and standardization. This doesn't mean that it isn't usable now; far from it. In fact, it is a mature and flexible security protocol, with a number of implementations (both commercial and open source) that interoperate with each other. It's also a fascinating protocol to program.