Dr. Dobb's | File Formats & Automotive Data Acquisition

File Formats & Automotive Data Acquisition

When it comes to automotive data acquisition, the files generated while a vehicle is being tested are as different and varied as the vehicles themselves. Lee presents a tool that enables viewing, filtering, or analysis of this disparate data.

November 01, 1998
URL:http://www.drdobbs.com/database/file-formats-automotive-data-acquisitio/184410717

Lee is a project engineer for Michigan Scientific Corp. Lee can be contacted at [email protected].

In the field of automotive data acquisition (ADA), the files generated while a vehicle is being tested are as varied as the types of vehicles themselves. A single test, for instance, may use one or more file formats for output. The format depends on the type of ADA system, the type of sensors installed (temperature, acceleration, strain, and so on), or even budget constraints. It may also depend on the personal preference of the engineers performing the testing, in which case the engineers in one vehicle group may be using a different format than those in another group in the same company. As you can imagine, this situation can cause problems when data files need to be shared or compared between groups.

Regardless of the sensors or ADA system used to acquire the data, the end result is normally a file contained on a laptop or desktop computer. Unfortunately, this file often is not user friendly, since it typically is in a custom format that prohibits viewing the data using standard means, such as text and spreadsheet programs. Although some ADA system software provides a means to convert the file into standard ASCII format, even this workaround is less than ideal: The file grows when it is converted from the custom binary format to ASCII, and the much larger ASCII file demands more of the computer if it is to be processed in a timely manner. In short, what is needed is a way to read the ADA system file in its true form that allows viewing, filtering, or analysis of the data.

One recent project I was involved in required that I write an application for reading, analyzing, and writing particular types of ADA files. In reviewing the project requirements and considering other current and future projects, I came to the following conclusions:

Reusing code from one project to the next would be ideal.
Each project had three main sections: reading, analyzing, and writing data files.
The code would be available for future ADA system changes.

Based on these conclusions, I focused on reading data files, since coding for that part of the job is constant from project to project.

For starters, I used the Hexworks binary/hex editor from BreakPoint Software (http://www.bpsoft.com/) to peek inside the file formats I needed to read. In doing so, I discovered that each file was composed of two main sections divided by a smaller section; see Figure 1. The first section of the file is the header. It contains ASCII characters that make up a small portion of the total file size. These characters make up numerous Field Pairs consisting of a Field Name and a Field Value that describe the various parameters of the file. The Field Names are specific for each type of file format and cover areas such as FORMAT, FILE_TYPE, and CHANNELS. The Field Values change based on the vehicle test and describe the corresponding Field Name, such as BINARY for FORMAT, TIME_HISTORY for FILE_TYPE, and 35 for CHANNELS as in Figure 1. These various Field Pairs repeat until the many aspects of the data file have been described.
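As a rough illustration of how Field Pairs might be collected once the header text has been isolated, here is a minimal sketch. It assumes one pair per line written as NAME=VALUE; that layout, the function name ParseFieldPairs, and the use of std::map are all assumptions made for the example, since real ADA formats each define their own header layout.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parse header text into Field Name -> Field Value pairs.
// ASSUMPTION: one "NAME=VALUE" pair per line. This layout is hypothetical;
// each real ADA file format defines its own header syntax.
std::map<std::string, std::string> ParseFieldPairs(const std::string &header)
{
    std::map<std::string, std::string> fields;
    std::istringstream lines(header);
    std::string line;
    while (std::getline(lines, line))
    {
        // tolerate Windows line endings
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos || eq == 0)
            continue;   // not a field pair; skip the line
        fields[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return fields;
}
```

With a header such as "FORMAT=BINARY" followed by "CHANNELS=35", the returned map would hold BINARY under FORMAT and the text 35 under CHANNELS; converting values to numbers is left to the caller.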

The next section in the file is a separator that serves only as a dividing line between the header section and the data section. This separator is unique for each ADA file format and may be a single character (normally non-ASCII) or a keyword, such as DATA, which is used in a different file format.
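Locating either kind of separator in a loaded buffer could look something like the following sketch. The function names are hypothetical, and the first function assumes that "non-ASCII" means any byte above 0x7F, which may not hold for every format. Both searches are bounded by the buffer size because a raw file buffer is not null-terminated.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Return the offset of the first byte above 0x7F, or size if there is none.
// ASSUMPTION: the single-character separator lies outside 7-bit ASCII.
std::size_t FindNonAsciiSeparator(const unsigned char *buffer, std::size_t size)
{
    const unsigned char *hit = std::find_if(buffer, buffer + size,
        [](unsigned char c) { return c > 0x7F; });
    return static_cast<std::size_t>(hit - buffer);
}

// Return the offset of a keyword separator such as "DATA", or size if the
// keyword does not occur in the buffer.
std::size_t FindKeywordSeparator(const unsigned char *buffer, std::size_t size,
                                 const char *keyword)
{
    const std::size_t length = std::strlen(keyword);
    const unsigned char *hit = std::search(buffer, buffer + size,
        keyword, keyword + length,
        [](unsigned char a, char b) { return a == static_cast<unsigned char>(b); });
    return static_cast<std::size_t>(hit - buffer);
}
```

Returning the buffer size for "not found" keeps both functions usable with plain offsets; a caller compares the result against the size before treating it as the start of the separator.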

The final section in the file is the actual data of the vehicle test stored in binary format. Binary storage reduces the overall file size: Storing a floating-point value such as 1234.5678 in ASCII requires a single byte for each digit and the decimal point, for a total of nine bytes, whereas a 32-bit binary float occupies four. If you also add ASCII spaces between the numbers (to separate the individual values so that 12.34 and 56.78 don't become unreadable as 12.3456.78), the byte count for each value gets even larger.
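The size difference can be checked with a small sketch like the one below. It assumes the values are 32-bit floats and that the ASCII form prints four decimal places with one separator byte per value; both choices, and the function names, are made up for the illustration rather than taken from any particular ADA format.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Bytes needed to store the values as ASCII text, one separator per value.
// ASSUMPTION: four decimal places per value, an arbitrary choice here.
std::size_t AsciiByteCount(const std::vector<float> &values)
{
    std::size_t total = 0;
    char text[64];
    for (float value : values)
    {
        const int written = std::snprintf(text, sizeof text, "%.4f", value);
        if (written > 0)
            total += static_cast<std::size_t>(written) + 1;  // digits + separator
    }
    return total;
}

// Bytes needed to store the same values in raw binary form.
std::size_t BinaryByteCount(const std::vector<float> &values)
{
    return values.size() * sizeof(float);
}
```

For the three values used in the text (1234.5678, 12.34, and 56.78), the ASCII count comes to 26 bytes against 12 bytes of binary on a platform with 4-byte floats, and the gap widens with every additional digit of precision.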

Using this information, I created the C++ DataFileReader_MSC class in Listing One (at the end of this article) for loading and manipulating ADA data files. The first step was to outline the specific member data items and methods that would be needed. This is where I began to use the Genitor Object Construction Suite (http://www.genitor.com/). Because of its visual class layout, drag-and-drop construction style, and maintenance of both the header and source files (which eliminated the painful process of changing both files when the class needed a modification), I was able to quickly put together the class outline in Table 1. This made it possible for me to quickly move on to coding the DataFileReader_MSC class (DFR for short). Because I could declare methods, data items, and comments in Genitor -- and allow it to create the corresponding source and header files -- I was able to focus more on my class as an object, rather than on several separate entities that had to be in sync.

The DFR class contains all of the methods and data items common to each of the ADA file formats, and serves as a base class to derive specific reader classes. (Table 2 lists the coding conventions I used to create the DFR class.) Its main purpose is to load the entire data file into a memory buffer and provide pointers to the header and data portions. During initialization, these pointer values are set to NULL and the file size is set to 0. When the LoadDataFile_DFR method is called (upon construction of the class or independently), the data file is loaded into memory with the header and data pointers both being set to the beginning of the file. I've also included minor error checking that returns the cause of most errors.

Although I compiled the DFR class for use under Windows 95/NT, it should be easy to port to other operating systems or CPUs. The complete DFR class (available electronically; see "Resource Center," page 3) includes: DataFileReader.dx, an export file for the Genitor Object Construction Suite; DataFileReader.hlp, a Windows help file created with Genitor; DataFileReader.cpp, the C++ source code for the DataFileReader_MSC class; DataFileReader.h, the header code for DataFileReader_MSC; dfrsample.cpp, the C++ source code for a sample program that uses DataFileReader_MSC; dfrsample.def, the Inprise definition file for the dfrsample project; dfrsample.exe, a sample executable program; DFRSample.ide, the Inprise project file; and testme.dat, a sample data file.

It is up to the derived class or application to set the header and data pointers to valid positions in the file for the specific file format. The SetHeaderPointer_DFR and SetDataPointer_DFR methods set these pointers, and the GetHeaderPointer_DFR and GetDataPointer_DFR methods retrieve them.
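To show how a derived class might position the data pointer, here is a minimal sketch. Because the DFR header file is not reproduced in this article, the sketch uses a stand-in base class, ReaderBase_Sketch, that copies only the accessor names from Listing One; the Attach_Sketch method, the KeywordReader_Sketch class, and its LocateData_KRS method are hypothetical and exist only so the example compiles on its own.

```cpp
#include <cstddef>
#include <cstring>

// Stand-in for the DFR base class (hypothetical; only the accessor names
// match Listing One). Attach_Sketch borrows a buffer instead of loading a file.
class ReaderBase_Sketch
{
public:
    ReaderBase_Sketch()
        : pvoid_HeaderStart(NULL), pvoid_DataStart(NULL), ul_FileSize(0) {}
    void Attach_Sketch(void *pvoidBuffer, unsigned long ulSize)
    {
        pvoid_HeaderStart = pvoidBuffer;
        pvoid_DataStart = pvoidBuffer;
        ul_FileSize = ulSize;
    }
    void *GetHeaderPointer_DFR() { return pvoid_HeaderStart; }
    void *GetDataPointer_DFR() { return pvoid_DataStart; }
    unsigned long GetFileSize_DFR() { return ul_FileSize; }
    void SetDataPointer_DFR(void *pvoidStart) { pvoid_DataStart = pvoidStart; }
private:
    void *pvoid_HeaderStart;
    void *pvoid_DataStart;
    unsigned long ul_FileSize;
};

// Hypothetical derived reader for a format whose separator is a keyword.
class KeywordReader_Sketch : public ReaderBase_Sketch
{
public:
    // Move the data pointer to the first byte after the keyword.
    // Returns false, leaving the pointers unchanged, if the keyword is absent.
    bool LocateData_KRS(const char *pcharKeyword)
    {
        char *pcharStart = (char *)GetHeaderPointer_DFR();
        const unsigned long ulSize = GetFileSize_DFR();
        const std::size_t length = std::strlen(pcharKeyword);
        if (!pcharStart || length == 0 || length > ulSize)
            return false;
        // bounded search: the file buffer is not null-terminated
        for (unsigned long ulIndex = 0; ulIndex + length <= ulSize; ulIndex++)
        {
            if (std::memcmp(pcharStart + ulIndex, pcharKeyword, length) == 0)
            {
                SetDataPointer_DFR(pcharStart + ulIndex + length);
                return true;
            }
        }
        return false;
    }
};
```

Keeping the search in the derived class leaves the base class free of any knowledge about a particular format, which is the division of labor the DFR design relies on.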

The LoadDataFile_DFR method can be called repeatedly, once for each data file, because it begins by checking the buffer. If the buffer already contains a file, it is released and then reallocated for the new file. The destructor simply releases the buffer memory if it was used.

Listing Two is a simple console mode application that uses the DFR class to read a data file (all text in this case). It finds the token separator, sets the data pointer, then sends the data portion to the console. The file dfrsample.exe (available electronically) can then be used to read the sample data file testme.dat; see Listing Three. The command-line format is <prompt>dfrsample testme.dat. After the sample program finishes, you should see the data portion of the file on your screen.

Conclusion

After completing DFR and creating the rest of the classes, I used Genitor to create the online help (Microsoft Windows Help files) and printed documentation (Microsoft Word files) from comments, descriptions, and the like.

After I completed the initial project that uses the DFR, I began to get ideas for future versions of the class. One direction I have considered exploring is the removal of some of the methods and data items. It is possible to shift the header and data pointers into the derived class rather than include them at this base level. That would essentially convert this class to a simple file reader, but more error checking could be added without overcoding the class.

I also encountered a decrease in speed when the DFR class was used to load extremely large data files (over 50 MB on a normal 32-MB system). Since sufficient RAM was unavailable, the system had to create virtual memory to load the file. The creation of virtual memory more than doubled the processing time because the entire file was essentially being copied. A dual method of accessing the data files could be used to speed up processing for files of different sizes. For example, if the file size is less than a preset amount, then the entire file is loaded. If not, a stream to the file is opened and the class reads in the header and data as they are requested.

There are numerous other changes and updates that could also be made, but DFR provides a good starting point for reading ADA data files correctly.
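The dual access method described in the conclusion could be sketched along the following lines. This is not part of the DFR class; the names LoadStrategy, ChooseLoadStrategy, and ReadRange are hypothetical, and the threshold is a tuning value that would have to be chosen for the target machine.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

enum LoadStrategy { LOAD_WHOLE_FILE, READ_ON_DEMAND };

// Pick a strategy from the file size.
// ASSUMPTION: the threshold is a preset amount supplied by the caller.
LoadStrategy ChooseLoadStrategy(unsigned long fileSize, unsigned long threshold)
{
    return fileSize < threshold ? LOAD_WHOLE_FILE : READ_ON_DEMAND;
}

// Read count bytes starting at offset without loading the rest of the file.
// Returns fewer bytes (possibly none) if the file is shorter than requested
// or cannot be opened.
std::vector<unsigned char> ReadRange(const std::string &path,
                                     unsigned long offset, std::size_t count)
{
    std::vector<unsigned char> bytes(count);
    std::ifstream file(path.c_str(), std::ios::binary);
    if (!file || !file.seekg(static_cast<std::streamoff>(offset)))
    {
        bytes.clear();
        return bytes;
    }
    file.read(reinterpret_cast<char *>(bytes.data()),
              static_cast<std::streamsize>(count));
    // keep only the bytes that were actually read
    bytes.resize(static_cast<std::size_t>(file.gcount()));
    return bytes;
}
```

A reader built this way would call ChooseLoadStrategy once after determining the file size, then either load the whole file as DFR does now or serve each header or data request through ReadRange.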

DDJ

Listing One

#include <stdio.h>      // FILE, fopen, fseek, ftell, fread, fclose
#include "DataFileReader_MSC.h"
#define _ThisClass      DataFileReader_MSC
#define _NumBaseClass   0



DataFileReader_MSC::DataFileReader_MSC()
{
    // initialize class
    InitializeDataFileReaderClass_DFR();
}
DataFileReader_MSC::DataFileReader_MSC(
    char * pcharFileName)
{
    // initialize class
    InitializeDataFileReaderClass_DFR();
    // load file during construction
    LoadDataFile_DFR(pcharFileName);
}
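DataFileReader_MSC::~DataFileReader_MSC()
{
    // free memory
    if (pvoid_FileBuffer)
    {
      // clear buffer; it was allocated as unsigned char[], and
      // deleting through a void pointer is undefined
      delete[] (unsigned char *)pvoid_FileBuffer;
      pvoid_FileBuffer = NULL;
    }
}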
void * DataFileReader_MSC::GetDataPointer_DFR()
{
    // return pointer to start of data
    return(pvoid_DataStart);
}
unsigned long DataFileReader_MSC::GetFileSize_DFR()
{
    // return file size
    return(ul_FileSize);
}
void * DataFileReader_MSC::GetHeaderPointer_DFR()
{
    // return pointer to start of header
    return(pvoid_HeaderStart);
}
void DataFileReader_MSC::InitializeDataFileReaderClass_DFR()
{
    // set defaults
    pvoid_FileBuffer = NULL;
    pvoid_HeaderStart = NULL;
    pvoid_DataStart = NULL;
    ul_FileSize = 0;
}
int DataFileReader_MSC::LoadDataFile_DFR(
    char * pcharFileName)
{
    // declare local variables
    FILE *hFile;
    long lFileSize;
    int iReturn;
    // check for previously loaded file
    if (pvoid_FileBuffer)
    {
      // clear previous buffer (allocated as unsigned char[])
      delete[] (unsigned char *)pvoid_FileBuffer;
    }
    // reset state so no pointer refers to the released buffer
    pvoid_FileBuffer = NULL;
    pvoid_HeaderStart = NULL;
    pvoid_DataStart = NULL;
    ul_FileSize = 0;
    // open file
    hFile = fopen(pcharFileName, "rb");
    if (hFile)          // valid stream returned from fopen function
    {
      // get file size; ftell returns -1L on failure
      lFileSize = -1L;
      if (fseek(hFile, 0L, SEEK_END) == 0)
      {
        lFileSize = ftell(hFile);
      }
      if (lFileSize > 0)                  // valid file size
      {
        ul_FileSize = (unsigned long)lFileSize;
        // get memory for file
        pvoid_FileBuffer = new unsigned char [ul_FileSize];
        // load file into buffer
        rewind(hFile);
        if (fread(pvoid_FileBuffer, sizeof(unsigned char), ul_FileSize, hFile)
            == ul_FileSize)               // entire file was read
        {
          // set internal pointers
          pvoid_HeaderStart = pvoid_FileBuffer;
          pvoid_DataStart = pvoid_FileBuffer;
          // file loaded ok
          iReturn = FILE_OK_DFR;
        }
        else
        {
          // ERROR -- load failure; release the partly filled buffer
          delete[] (unsigned char *)pvoid_FileBuffer;
          pvoid_FileBuffer = NULL;
          ul_FileSize = 0;
          iReturn = ERROR_FILE_LOAD_DFR;
        }
      }
      else
      {
        // ERROR -- seek failure or empty file
        iReturn = ERROR_FILE_SEEK_DFR;
      }
      // close file; its contents are now in the buffer
      fclose(hFile);
    }
    else
    {
      // ERROR -- file not opened
      iReturn = ERROR_FILE_OPEN_DFR;
    }
    // return results
    return(iReturn);
}
void DataFileReader_MSC::SetDataPointer_DFR(
    void *pvoidStart)
{
    // set internal pointer to start of data
    pvoid_DataStart = pvoidStart;
}
void DataFileReader_MSC::SetHeaderPointer_DFR(
    void *pvoidStart)
{
    // set internal pointer to start of header
    pvoid_HeaderStart = pvoidStart;
}
#undef _ThisClass
#undef _NumBaseClass

Listing Two

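#include <iostream>
#include <string.h>
#include "datafilereader_msc.h"
// declare class
DataFileReader_MSC DataFile;

int main(int argc, char *argv[])
{
  // declare local variables
  char * pcharHeader;
  char * pcharData = NULL;
  unsigned long ulSize;
  unsigned long ulIndex;
  if (argc == 2)
  {
    // a zero return value (FILE_OK_DFR) indicates success
    if (!DataFile.LoadDataFile_DFR(argv[1]))
    {
      // set local header pointer and file size
      pcharHeader = (char*)DataFile.GetHeaderPointer_DFR();
      ulSize = DataFile.GetFileSize_DFR();
      // find data section; the buffer is not null-terminated, so the
      // search is bounded by the file size instead of using strstr
      for (ulIndex = 0; ulIndex + 4 <= ulSize; ulIndex++)
      {
        if (memcmp(pcharHeader + ulIndex, "DATA", 4) == 0)
        {
          pcharData = pcharHeader + ulIndex;
          break;
        }
      }
      if (pcharData)
      {
        // update DataFileReader data pointer
        DataFile.SetDataPointer_DFR((void*)pcharData);
        // print data section to console
        std::cout.write(pcharData, ulSize - ulIndex);
      }
      else
      {
        // separator not found
        std::cout << "DATA separator not found";
      }
    }
    else
    {
      // error loading file
      std::cout << "Dooh!!";
    }
  }
  return (0);
}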

Listing Three

HEADER:


This section normally contains creation information
(who/what/when/where/why) along with various data 
acquisition settings such as sample rates and sensor 
configurations for each channel.



DATA:



This section contains the binary data which was recorded.
It is normally written in a float or integer format 
which means the derived class must know how to 
interpret it correctly.

File Formats & Automotive Data Acquisition

By Lee R. Copp

Dr. Dobb's Journal November 1998

Figure 1: Sample data file. This data file has been edited to a text format. You can still see the plain ASCII header and the binary data. The token separator is the first nonstandard ASCII character.

Table 1: Initial class outline.

Table 2: Coding conventions. Every method and constant in a class ends with an underscore followed by an abbreviation of the class name (_DFR, for example). This comes in handy when working on the LoadDataFile_* method of three or four derived classes at the same time.