Hot-Rodding Windows and Linux App Performance with CUDA-Based Plugins


Google protobufs in a Click-Together Framework

For many applications, preprocessing the data can be as complicated and time-consuming as the computation that generates the desired results. "Click-together tools" is a common design pattern that enables flexible and efficient data workflows by creating a pipeline of applications to process information. Each element in the pipeline reads data from an input (usually stdin), performs some filtering or transformation operation, and writes the result to the output (usually stdout). Information flows through these pipelines as packets composed of streams of bytes. Every element in a click-together pipeline can read a packet of information and write it to an output. Based on the packet type, an element can decide to operate on the data or simply pass it along to the rest of the pipeline.

A click-together framework naturally exploits the parallelism of multicore processors because each element in the pipeline is a separate application. The operating system scheduler ensures that any applications that have the data necessary to perform work will run — generally on separate cores. Buffering between applications allows very large data sets to be processed on the fly. Similarly, the parallelism of multiple machines can be exploited by piping data across machines with ssh, socat, or equivalent socket-based applications or libraries.

Scalable, high-performance workflows can be constructed by tying together multiple machines across a network or in the cloud. Component and plugin reuse reduces errors as most workflows can be constructed from existing "known working" applications. The flexibility of dynamic runtime loading allows CUDA to be used as a scripting language in creating and testing new workflows. Highly complex multistream workflows can be easily constructed using a load-balancing split operation based on simple socket-based programming techniques such as select() or poll() to determine when a stream is ready for more data. One possible workflow showing the use of multiple GPUs within a system and across a network is represented in the following figure.



Figure 1: Example workflow.

Keep in mind that each stream of information can be written out to disk as an archive of the work performed, to checkpoint results, or to use as the input deck for an application. Experience gained from decades of using this click-together framework has shown that each packet of information needs to be preceded by a header similar to the one shown below:

Version number | Size (in bytes) of packet | Packet type ID | Size (in bytes) of packet

Figure 2: Example header.

It is important that the header include a version number as it allows libraries to transparently select the correct formatting and serialization methods. For example, data streams that I saved to disk in the early 1980s at Los Alamos National Laboratory are still usable today. For robustness, it is necessary to duplicate the size of the packet to detect "silent" transmission errors. Otherwise, bit errors in the packet size can cause bizarre failures, as an application may suddenly attempt to allocate 2^32 or 2^64 bytes of memory (depending on the number of bits used to store the size). I have seen such errors when preprocessing data sets on clusters of machines when the processing takes many weeks to complete. Machine failures can cause successful transmission of bogus information across a TCP network. The redundancy in the size information provides a high likelihood that bit rot in persistent streams will be found. Some disk subsystems, especially inexpensive ones, are susceptible to bit rot caused by multibit data errors. So long as the size is known, the packet information can be correctly loaded into memory, where other more extensive checks or error recovery can occur. Even if that particular packet is corrupt, the remaining packets in the stream can be correctly loaded, so all is not lost.

Following is a simple definition of a header that can be adapted to a variety of languages.

#include <stdint.h>

struct simpleHeader {
  uint64_t version, size1, packetID, size2;
};

Each application, regardless of language, must be able to read and understand this header. The application programmer can decide what to do with each packet of information. At the very least, the programmer can just pass on the packet of information without affecting the packet contents, or discard the packet and remove it from the data stream. Regardless, all header data is transferred between applications in binary format using network standard byte order so that arbitrary machine architectures can be used. This type of streaming protocol has run successfully on systems for decades.

The following pseudocode describes how to read and write one or more packets of information. For clarity, this pseudocode does not check that every I/O operation was successful; actual production code needs to be very strict about checking every operation.

while ( read the binary header information == SUCCESS )
{
  •	Compare the two header sizes (a mismatch flags an unrecoverable error)
  •	Allocate size bytes (after converting size from network standard byte order)
  •	Binary read of size bytes into the allocated memory

  // perform the write
  •	Binary write the header in network standard byte order
  •	Binary write the packet information
}

Most users will utilize a common data interchange format for the packet data. By appropriately specifying the type of the packet in the header, proprietary and special high-performance formats can also be mixed into any data stream. Whenever possible, binary data should be used for performance reasons. This tutorial uses Google Protocol Buffers (protobufs) because they are a well-supported, free binary data interchange format that is fast, well-tested, and robust. Google uses protobufs for most of its internal RPC protocols and file formats. Code for a variety of destination languages can be generated from a common protobuf description. Generators exist for many common languages, including C, C++, Python, Java, Ruby, PHP, Matlab, Visual Basic, and others. I have used protobufs on workstations and supercomputers.

The following protobuf specification demonstrates how to describe messages containing vectors of various types. For simplicity, only floating-point and double precision vectors are defined.

package tutorial;

enum Packet {
  UNKNOWN=0;
  PB_VEC_FLOAT=1;
  PB_VEC_DOUBLE=2;
  PB_VEC_INT=3;
}
message FloatVector {
  repeated float values = 1 [packed = true];
  optional string name = 2;
}
message DoubleVector {
  repeated double values = 1 [packed = true];
  optional string name = 2;
}

The .proto file is compiled to a destination source language with the protoc source language generator. The following is the command to generate a C++ source package using tutorial.proto. The protoc compiler will also generate Java and Python source packages. Consult the protobuf website for links to source generators for other languages.

protoc --cpp_out=. tutorial.proto

Linux users can install protobufs with a package manager such as apt-get under Ubuntu. Windows users will need to download and install protobufs from the code.google.com site. Google provides Visual Studio solutions to help with building the code generator and libraries.

The following file, packetheader.h, contains the methods to read and write the header and packet information in a stream containing multiple protobuf messages. For general use, note that the message type is defined via the enum in the .proto file. Your own message packets can be utilized by adding to these definitions.

For brevity, many essential checks have been left out of packetheader.h. C++ purists will note that cin and cout are changed to support binary information in the method setPacket_binaryIO(). This was done for convenience, as it allows the use of pipes to easily "click together" applications. While not part of the C++ standard, most compilers support binary I/O on std::cin and std::cout. C++ programmers who object to this practice can either change the scripts to manually specify the FIFOs and network connections so they can use binary I/O according to the C++ standard, or they can use the C language. Windows programmers will note that packetheader.h uses the Microsoft-provided _setmode() function to perform binary I/O.

#ifndef PACKET_HEADER_H
#define PACKET_HEADER_H

#ifdef _WIN32
#include <stdio.h>
#include <fcntl.h>
#include <io.h>
#include <stdint.h>
#include <Winsock2.h>
#else
#include <arpa/inet.h>
#include <stdint.h>
#endif

#include <iostream>

// a simple version identifier
static const uint32_t version=1;

// change cin and cout so C++ can use binary
inline bool setPacket_binaryIO()
{
#ifdef _WIN32
  if(_setmode( _fileno(stdin), _O_BINARY) == -1)
    return false;
  if(_setmode( _fileno(stdout), _O_BINARY) == -1) 
    return false;
#endif
  return true;
}

inline bool writePacketHdr (uint32_t size, uint32_t type, std::ostream *out)
{
  // every header field travels in network standard byte order
  uint32_t ver = htonl(version);
  size = htonl(size);
  type = htonl(type);
  out->write((const char *)&ver, sizeof(uint32_t));
  out->write((const char *)&size, sizeof(uint32_t));
  out->write((const char *)&type, sizeof(uint32_t));
  out->write((const char *)&size, sizeof(uint32_t));
  return out->good();
}

template <typename T>
bool writeProtobuf(T &pb, uint32_t type, std::ostream *out)
{
  writePacketHdr(pb.ByteSize(), type, out);
  return pb.SerializeToOstream(out);
}

inline bool readPacketHdr (uint32_t *size, uint32_t *type, std::istream *in) 
{
  uint32_t size2, myversion;

  in->read((char *)&myversion, sizeof(uint32_t)); myversion = ntohl(myversion);
  if(!in->good()) return(false);
  in->read((char *)size, sizeof(uint32_t)); *size = ntohl(*size);
  if(!in->good()) return(false);
  in->read((char *)type, sizeof(uint32_t)); *type = ntohl(*type);
  if(!in->good()) return(false);
  in->read((char *)&size2, sizeof(uint32_t)); size2 = ntohl(size2);
  if(!in->good()) return(false);

  if(*size != size2) return(false);
  return(true);
}

template <typename T>
bool readProtobuf(T *pb, uint32_t size, std::istream *in)
{
  char *blob = new char[size];
  in->read(blob,size);
  bool ret = in->good() && pb->ParseFromArray(blob,size);
  delete [] blob;
  return ret;
}
#endif

The program testWrite.cc demonstrates how to create and write both double and float vector messages. The default vector length is 100 elements. Larger messages can be created by specifying a size on the command-line.

// Rob Farber
#include <iostream>
using namespace std;
#include "tutorial.pb.h"
#include "packetheader.h"


int main(int argc, char *argv[])
{
  GOOGLE_PROTOBUF_VERIFY_VERSION;

  int vec_len = 100;
  // allow user to change the size of the data if they wish
  if(argc > 1) vec_len = atoi(argv[1]);
 
  // change cin and cout to binary mode
  // NOTE: this is not part of the C++ standard
  if(!setPacket_binaryIO()) return -1;

  tutorial::FloatVector vec;
  for(int i=0; i < vec_len; i++) vec.add_values(i);

  tutorial::DoubleVector vec_d;
  for(int i=0; i < 2*vec_len; i++) vec_d.add_values(i);
 
  vec.set_name("A");
  writeProtobuf<tutorial::FloatVector>(vec, tutorial::PB_VEC_FLOAT,
				  &std::cout);
  vec_d.set_name("B");
  writeProtobuf<tutorial::DoubleVector>(vec_d, tutorial::PB_VEC_DOUBLE,
					&std::cout);
 return(0);
}

The program testRead.cc demonstrates how to read the header and messages via a stream. The string associated with the optional name in the protobuf message is printed when provided.

// Rob Farber
#include <iostream>
#include "packetheader.h"
#include "tutorial.pb.h"
using namespace std;

int main(int argc, char *argv[])
{
  GOOGLE_PROTOBUF_VERIFY_VERSION;

  // Change cin and cout to binary mode
  // NOTE: this is not part of the C++ standard
  if(!setPacket_binaryIO()) return -1;

  uint32_t size, type;
  while(readPacketHdr(&size, &type, &std::cin)) {
    switch(type) {
      case tutorial::PB_VEC_FLOAT: {
        tutorial::FloatVector vec;
        if(!readProtobuf<tutorial::FloatVector>(&vec, size, &std::cin))
          break;
        if(vec.has_name()) cerr << "vec_float " << vec.name() << endl;
        cerr << vec.values_size() << " elements" << endl;
      } break;
      case tutorial::PB_VEC_DOUBLE: {
        tutorial::DoubleVector vec;
        if(!readProtobuf<tutorial::DoubleVector>(&vec, size, &std::cin))
          break;
        if(vec.has_name()) cerr << "vec_double " << vec.name() << endl;
        cerr << vec.values_size() << " elements" << endl;
      } break;
      default:
        cerr << "Unknown packet type" << endl;
    }
  }
  return(0);
}

These applications can be built and tested under Linux with the following commands:

nvcc -I . testWrite.cc tutorial.pb.cc -l protobuf -o testWrite
nvcc -I . testRead.cc tutorial.pb.cc -l protobuf -o testRead

echo "----------- simple test -----------------"
./testWrite | ./testRead

On Windows systems:

PB_BASE=protobuf-2.4.1/vsprojects
PB_INC=$PB_BASE/include
PB_LIB=$PB_BASE/Release
echo "------------ Building -------------------"
nvcc -Xcompiler "/MD /EHsc" -I $PB_INC -L $PB_LIB testWrite.cc tutorial.pb.cc -llibprotobuf Ws2_32.lib -o testWrite.exe
nvcc -Xcompiler "/MD /EHsc" -I $PB_INC -L $PB_LIB testRead.cc tutorial.pb.cc -llibprotobuf Ws2_32.lib -o testRead.exe
echo "----------- simple test -----------------"
./testWrite.exe | ./testRead.exe

The following output was generated under Linux. Windows users will see the same output.

$:~/DDJ023/protobuf_examples$ sh BUILD.linux 
----------- simple test -----------------
vec_float A
100 elements
vec_double B
200 elements

