Steganography for DOS Programmers

January 01, 1997
URL:http://www.drdobbs.com/security/steganography-for-dos-programmers/184410110

Dr. Dobb's Journal January 1997: Steganography for DOS Programmers

Steganography is a branch of cryptography that deals with concealing messages. It has unique possibilities, particularly when you have terabytes of information -- much of it mundane or infrequently accessed -- that offer hiding places for important information. The techniques I present here will enable you to effectively hide critical information within a large volume of data.

Text Versus Binary Files

The difference between DOS text and binary files is at the heart of this approach to steganography. A DOS text file is generally composed of text, control characters, and perhaps graphics characters. It is a "last-record-check" type of file, meaning that the last record is a flag, or sentinel, that marks the end of the file. A program reading such a file in text mode keeps reading until it hits the end-of-file (EOF) flag, then stops. For an ordinary DOS text file, this EOF marker is control-Z (ASCII 26).

A binary file, on the other hand, can contain any byte value at any position, since there is no EOF flag in binary mode. Instead, end-of-file is calculated from the directory: the system reads the file's directory entry to see how many bytes the file holds (as recorded at the last write) and stops after reading that many. Binary mode is necessary for .COM and .EXE files, because machine code may contain an ASCII 26 (or any other byte value) anywhere.

A file on disk can be opened for reading or writing in either binary or text mode, regardless of whether its internal format is text or binary. You can read a text file in binary mode, or a binary file in text mode.
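To make the two views concrete, here is a minimal sketch that measures an in-memory copy of a file both ways. The helper names text_view_length() and bytes_beyond_eof() are hypothetical, and the code models the control-Z rule directly rather than relying on a compiler's text mode, since outside DOS-derived systems many C libraries do not treat control-Z specially.

```c
#include <stddef.h>

#define CTRL_Z 26  /* DOS text-file EOF marker */

/* Hypothetical helper: number of bytes a reader that honors the
   control-Z EOF flag would see before stopping. */
size_t text_view_length(const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (buf[i] == CTRL_Z)
            break;              /* text view ends at the first control-Z */
    }
    return i;
}

/* Hypothetical helper: bytes a binary reader sees (the full directory
   length) that a text reader never reaches, not counting the marker. */
size_t bytes_beyond_eof(const unsigned char *buf, size_t len)
{
    size_t visible = text_view_length(buf, len);
    return visible < len ? len - visible - 1 : 0;
}
```

For the 12 bytes "Hello", control-Z, "secret", the text view is 5 bytes long while 6 bytes sit beyond the EOF flag.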

Enter DOS

Since DOS usually stores files without an EOF marker, the operating system must calculate EOF from the directory. However, many DOS utilities also check for control-Z. The DOS TYPE, MORE, and FIND commands open files for reading in text mode. COPY, XCOPY, and MOVE, on the other hand, open a file in binary mode (they may be copying or moving a .COM or .EXE file). The COPY command has a /A switch for copying ASCII (text) files but defaults to binary mode, except when a file is created from the console with COPY CON filename. This copies from the keyboard (stdin) to the file until control-Z is typed; stdin and stdout are treated as text streams.

What if you were to open a text file in binary mode and append another file to the end of it, explicitly writing the EOF flag between the two? The second text is then hidden beyond the EOF flag, and the only way to get at it is to read the file again in binary mode. Remember, TYPE reads in text mode. The data beyond EOF is therefore hidden from anyone casually looking around your hard drive. If you use this technique to add a small amount of data, say 1 or 2 KB, to a large file (say 100 KB), the extra length will likely go unnoticed in a directory listing, and TYPEing the file to the screen shows only the original text. For even better security, you can encrypt the hidden part so that it looks like random garbage at the end of the text file. Most people would consider it just a system or disk error, even if they found it.
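The resulting byte layout can be sketched in memory. hide_beyond_eof() below is a hypothetical helper, not one of the article's listings; it simply places the host text, one control-Z, and the data to hide one after another, which is the layout LAPPEND writes to disk.

```c
#include <stddef.h>
#include <string.h>

#define CTRL_Z 26  /* DOS text-file EOF marker */

/* Hypothetical helper: build host + control-Z + hidden data in out,
   which must hold hlen + 1 + dlen bytes. Returns the total length. */
size_t hide_beyond_eof(const unsigned char *host, size_t hlen,
                       const unsigned char *data, size_t dlen,
                       unsigned char *out)
{
    memcpy(out, host, hlen);             /* the visible text            */
    out[hlen] = CTRL_Z;                  /* explicit EOF flag           */
    memcpy(out + hlen + 1, data, dlen);  /* hidden part beyond the flag */
    return hlen + 1 + dlen;
}
```

A text-mode reader stops at out[hlen]; only a binary reader reaches the bytes after it.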

It is best to choose a text file that is used only for reading. If another program opens the host file in text mode for updating, the hidden data will likely be lost. Use reference files as the host files, and consider using ATTRIB to make them read-only.

The Source Code

Because XCOPY and COPY do not support the kind of appending I've just described, you must write special programs to implement this technique under DOS. Listing One is LLIST.C, a program that lists a file to the screen in binary mode. (LLIST stands for "long-list," implying a listing that continues beyond EOF.) LLIST lists a text file, then the hidden data after EOF.
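If you would rather see where the EOF flag falls than have it sent to the console (the LLIST source notes that writing control-Z to stdout terminates the program on some systems), the output step can print a visible marker instead. This is a sketch of a hypothetical variant, list_showing_eof(), not part of the original LLIST.

```c
#include <stdio.h>

/* Hypothetical LLIST variant: copy every byte of in to out, but print
   control-Z as the two characters "^Z" so the EOF flag is visible.
   Returns the number of input bytes processed. */
long list_showing_eof(FILE *in, FILE *out)
{
    int byte;
    long count = 0;
    while ((byte = getc(in)) != EOF) {
        count++;
        if (byte == 26)
            fputs("^Z", out);   /* visible marker instead of the raw flag */
        else
            putc(byte, out);
    }
    return count;
}
```

A main() would open argv[1] with "rb", as LLIST does, and pass stdout as out.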

Listing Two, LAPPEND.C (short for "long-append"), tacks a second file onto the first with a control-Z between them to keep the second file hidden. Be aware, however, that the second file is not automatically deleted from the disk. You must delete it manually, or a copy of the hidden data will be left in the open. (Or you can enable the optional remove() call supplied in a comment line in LAPPEND.)

Listing Three, LSPLIT.C, copies the hidden data to the temporary file TEMP.1. It can be modified to place the data in a file supplied by the user as the second command-line argument: change the fopen statement for thedata to use params[2] as the filename, and change if ( n == 2 ) to if ( n == 3 ). Remove the #define statement and change the error-message printfs to show the new command format, lsplit <source> <destination>.
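The core of that modified program can be sketched as a hypothetical function, split_hidden(), which a main() accepting lsplit <source> <destination> could call after opening params[1] with "rb" and params[2] with "wb". It is a sketch of the same logic, not the article's listing.

```c
#include <stdio.h>

/* Hypothetical helper: copy every byte after the FIRST control-Z in src
   to dst. Returns the number of bytes copied, or -1 if src contains no
   control-Z (nothing hidden). */
long split_hidden(FILE *src, FILE *dst)
{
    int ch;
    long copied = 0;
    while ((ch = getc(src)) != EOF && ch != 26) {
        /* skip the visible text */
    }
    if (ch == EOF)
        return -1;               /* no EOF flag, so no hidden data */
    while ((ch = getc(src)) != EOF) {
        putc(ch, dst);           /* everything beyond the flag */
        copied++;
    }
    return copied;
}
```

Returning -1 when no control-Z is found lets the caller report "no hidden data" instead of leaving an empty destination file unexplained.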

Hidden data can be removed from the host file by redirecting the output of TYPE, which stops at the first control-Z. For example, type host > dummy writes only the visible text to dummy. To put that cleaned text back in place of the host, use copy dummy host, then delete dummy. COPY with the /A switch gives the same result, except that the EOF flag is retained. Of course, do this only after recovering the hidden data with LSPLIT.
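The effect of type host > dummy can be approximated in C as below. copy_visible() is a hypothetical helper that assumes, like TYPE, that the first control-Z ends the text; it copies the bytes before that marker and drops the marker and everything after it.

```c
#include <stdio.h>

/* Hypothetical helper: copy in to out up to, but not including, the
   first control-Z. Returns the number of bytes written. */
long copy_visible(FILE *in, FILE *out)
{
    int ch;
    long written = 0;
    while ((ch = getc(in)) != EOF && ch != 26) {
        putc(ch, out);
        written++;
    }
    return written;
}
```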

Listing Four, LWRITE.C, writes data typed at the keyboard beyond the EOF of the target file, and requires a bit of extra explanation. DOS uses two characters, a carriage return (ASCII 13, CR) and a line feed (ASCII 10, LF), to separate the lines of a text file on disk. When you type at the keyboard, however, you press only one key, Enter. To make things consistent and to simplify the processing of text files, the C library translates each line separator on a text-mode stream into a single newline character (ASCII 10, LF), which is what getc() returns. Whether you read text from the keyboard or a disk file, you therefore see one newline character at the end of each line. This translation is performed only on streams opened in text mode. If you read from a text stream and write to a binary one, as LWRITE does, you must convert each newline back to a CR-LF pair yourself.
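The conversion LWRITE performs one character at a time can be sketched on a buffer. lf_to_crlf() is a hypothetical helper; it uses the ASCII values 10 and 13 from the text rather than '\n' and '\r' so the byte values are explicit.

```c
#include <stddef.h>

#define LF 10  /* line feed: what text-mode getc() returns at end of line */
#define CR 13  /* carriage return: dropped by text mode, needed on disk */

/* Hypothetical helper: expand each LF in src into CR-LF in dst, which
   must hold up to 2 * len bytes. Returns the number of bytes written. */
size_t lf_to_crlf(const char *src, size_t len, char *dst)
{
    size_t i, j = 0;
    for (i = 0; i < len; i++) {
        if (src[i] == LF)
            dst[j++] = CR;   /* restore the CR that text mode removed */
        dst[j++] = src[i];   /* always copy the character itself */
    }
    return j;
}
```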

The purpose of Listing Five, FINDH.C, is to find hidden data in case you forget which file or files are the hosts. FINDH looks for a control-Z that occurs before the physical end of the file and, if it finds one, prints the filename to the screen. The wildcard characters "*" and "?" can be used, so all the files in a directory can be scanned for possible hidden data, although each directory must be searched individually. If you use "*.*", any .COM or .EXE file that happens to contain a control-Z will show up as a false positive.
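FINDH's test can be isolated and checked on a buffer. hidden_data_position() is a hypothetical helper that mirrors the listing's condition count < filelength-1, so a control-Z among the last two bytes of the file is not reported, and it returns the same 1-based byte count that FINDH prints.

```c
#include <stddef.h>

/* Hypothetical helper: 1-based position of the first control-Z that
   passes FINDH's test (count < filelength - 1), or 0 if none does. */
long hidden_data_position(const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i + 2 < len; i++) {   /* same bound as count < len - 1 */
        if (buf[i] == 26)
            return (long)(i + 1);     /* FINDH's 1-based count */
    }
    return 0;
}
```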

To read the directory, FINDH uses the findfirst() and findnext() functions. These are not part of standard C, so you may have to substitute equivalents when using another compiler. I am using PowerC 1.1.6 from MIX Software. Most compilers should provide some equivalent of these two functions; Microsoft QuickC, for instance, has _dos_findfirst() and _dos_findnext().

LAPPEND and LSPLIT offer another interesting possibility. You are not limited to placing one hidden file beyond EOF; hidden files can be stacked one behind another. LAPPEND keeps adding new hidden files onto the end of the host file, and LSPLIT splits off everything behind the FIRST EOF into TEMP.1. You can thus have a hidden part tacked onto the end of another hidden part, which is itself hidden, and so on. To recover the next layer, rename TEMP.1 (LSPLIT overwrites it on every run) and run LSPLIT on the renamed file.
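Locating a given layer in such a stack can be sketched as follows. layer_offset() is a hypothetical helper; it assumes none of the stacked files contains a control-Z of its own, which does not hold for encrypted files.

```c
#include <stddef.h>

/* Hypothetical helper: offset where hidden layer k begins (k = 1 is the
   data after the first control-Z, k = 2 after the second, and so on),
   or -1 if the buffer holds fewer than k control-Z bytes. */
long layer_offset(const unsigned char *buf, size_t len, int k)
{
    size_t i;
    int seen = 0;
    for (i = 0; i < len; i++) {
        if (buf[i] == 26 && ++seen == k)
            return (long)(i + 1);   /* first byte after the k-th EOF flag */
    }
    return -1;
}
```

Running LSPLIT on the host and then on each renamed TEMP.1 walks through the same offsets, one layer per run.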

You cannot stack hidden files if they are individually encrypted with pseudorandom numbers. If you encrypt by using pseudorandom numbers to generate a random cipher byte for each byte of cleartext, sooner or later an encrypted byte will equal 26 (control-Z). If just one such cipher file is hidden beyond EOF, LSPLIT will still recover it intact, since an EOF precedes all of the cipher. But if LSPLIT is used to separate several appended cipher files, it will eventually cut one of them at a cipher byte that happens to be control-Z. If your encryption scheme disallows the value 26, hidden cipher files can be stacked directly beyond EOF. Otherwise, I advise LAPPENDing all the files to be hidden while they are still cleartext, encrypting them as one file (EOFs and all), and then LAPPENDing that single cipher file to the host file. After you use LSPLIT and decrypt, all the files will be cleartext and still LAPPENDed together, so each later file remains hidden beyond an EOF even though it has been decrypted. If the data that really must stay confidential is in one of these later files, it may remain undiscovered even after decryption. You can leave the later files hidden, or split them off with LSPLIT.
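The reasoning can be checked with a toy cipher. toy_xor() below is a hypothetical, deliberately weak stand-in for "encrypting with pseudorandom numbers": it XORs each byte with output from a linear congruential generator using the constants of the C standard's sample rand(). It is not secure and should not be used for real data. count_ctrl_z() counts cipher bytes equal to 26, each of which is a point where LSPLIT would cut a stack of separately encrypted files.

```c
#include <stddef.h>

/* TOY cipher, illustration only: XOR each byte with bits from a linear
   congruential generator seeded by key. Applying it twice with the same
   key restores the input, because XOR is its own inverse. */
void toy_xor(unsigned char *buf, size_t len, unsigned long key)
{
    size_t i;
    unsigned long state = key;
    for (i = 0; i < len; i++) {
        state = (state * 1103515245UL + 12345UL) & 0x7FFFFFFFUL;
        buf[i] ^= (unsigned char)(state >> 16);
    }
}

/* Count bytes equal to control-Z; in ciphertext, each one would be
   mistaken for an EOF flag by LSPLIT. */
size_t count_ctrl_z(const unsigned char *buf, size_t len)
{
    size_t i, n = 0;
    for (i = 0; i < len; i++)
        if (buf[i] == 26)
            n++;
    return n;
}
```

Encrypting the whole LAPPENDed bundle once and decrypting it once restores every embedded control-Z, which is why encrypting the bundle as a unit is safe, while a cipher byte of 26 inside any one file of a stack is not.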

You could also superencipher messages: use LAPPEND to join a cipher file to a text, encrypt both, then LAPPEND the result to a host file. The first cipher file is now superenciphered, possibly with a different key the second time. This could be used to give an intermediary encrypted instructions for using the first cipher file, without the intermediary necessarily knowing the key to the first cipher. Be careful, however, that your plans and protocols don't become too complicated.

Conclusion

Other steganography techniques hide data by altering one bit per pixel in a high-resolution graphic, or one bit per sample in digitized music or speech. You can hide data that's on your own system, sent over phone lines, or mailed on a disk. These methods can be used to distribute keys for faster encrypted communication and are useful for couriers.
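The bit-per-pixel idea can be sketched on raw bytes. lsb_embed() and lsb_extract() are hypothetical helpers that assume an uncompressed buffer of 8-bit pixel or sample values with any file header already skipped; real graphics and sound formats need format-specific handling, and lossy compression would destroy the hidden bits.

```c
#include <stddef.h>

/* Hypothetical helper: hide msg in the least significant bit of each
   cover byte, most significant message bit first. cover must hold at
   least 8 * msglen bytes. Returns 0 on success, -1 if it is too small. */
int lsb_embed(unsigned char *cover, size_t coverlen,
              const unsigned char *msg, size_t msglen)
{
    size_t i;
    int b;
    if (coverlen / 8 < msglen)
        return -1;
    for (i = 0; i < msglen; i++)
        for (b = 0; b < 8; b++) {
            unsigned char bit = (unsigned char)((msg[i] >> (7 - b)) & 1);
            /* clear the low bit, then set it to the message bit */
            cover[i * 8 + b] = (unsigned char)((cover[i * 8 + b] & 0xFE) | bit);
        }
    return 0;
}

/* Hypothetical helper: recover msglen bytes from the low bits of cover. */
void lsb_extract(const unsigned char *cover, size_t msglen, unsigned char *msg)
{
    size_t i;
    int b;
    for (i = 0; i < msglen; i++) {
        msg[i] = 0;
        for (b = 0; b < 8; b++)
            msg[i] = (unsigned char)((msg[i] << 1) | (cover[i * 8 + b] & 1));
    }
}
```

Each cover byte changes by at most 1, which is why the alteration is hard to see or hear.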

Remember to encrypt the hidden data for the best security, and to erase the LAPPENDed files, or program LAPPEND to do so automatically by including the remove() statement supplied with LAPPEND. Don't forget that an erased file on disk still contains data until something new is written over that portion of disk, so creating an overwrite-erase utility is advisable. Or, just work on a Ramdisk, and keep the cipher on the hard drive.
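An overwrite-erase utility can be as simple as the sketch below. overwrite_erase() is a hypothetical helper; it assumes that writing through the open file reaches the same disk sectors before the file is deleted, which is generally true of DOS FAT disks but is not guaranteed on journaling file systems or flash storage.

```c
#include <stdio.h>

/* Hypothetical helper: overwrite every byte of the file at path with
   zeros, flush, then delete it. Returns 0 on success, -1 on error. */
int overwrite_erase(const char *path)
{
    FILE *f = fopen(path, "r+b");
    long size, i;
    if (f == NULL)
        return -1;
    if (fseek(f, 0L, SEEK_END) != 0 || (size = ftell(f)) < 0) {
        fclose(f);
        return -1;
    }
    rewind(f);
    for (i = 0; i < size; i++)
        putc(0, f);                   /* overwrite the old contents */
    if (fflush(f) != 0 || ferror(f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return remove(path) == 0 ? 0 : -1;
}
```

Calling overwrite_erase(params[2]) in place of the commented-out remove() in LAPPEND would wipe the source of the hidden data rather than merely unlinking it.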

DDJ

Listing One

/* llist.c  */
#include <stdio.h>

int main(int argc, char *argv[])
                  /* ....this program lists a file in binary mode; */
                  /*     ALL characters are shown.                 */
{
  int byte;
  long count=0L;
  FILE *infl;
  if ( argc == 2 )
    {
          infl = fopen(argv[1],"rb");
          if ( infl != NULL )
            {
                while( ( byte = getc(infl) ) != EOF )
                     {
                           count++;
                  /*       if ( byte != 26 ) remove these comment marks to
                                          avoid display of cntrl-Z. Writing
                                          cntrl-Z to stdout terminates the
                                          program on some systems.        */
                           putc(byte,stdout);
                     }
                fclose(infl);   /* infl is a FILE *, so fclose(), not close() */
            }
          else
            {
               printf("\n\n...cannot open input file...\n\n");
            }
    }
  else
    {
        printf("\n\n....must supply one file name...\n");
    }
  return 0;
}

Listing Two

/* lappend.c  */
#include <stdio.h>

int main(int n, char *params[])
{
  FILE *thefile,*thedata;
  int ch=0,ok=0;
  if ( n == 3 )
    {
       thefile = fopen( params[1], "ab" );
       thedata = fopen( params[2], "rb" );
       if ( thefile != NULL && thedata != NULL )
         {
           fseek(thefile, 0L, SEEK_END); /* "ab" already writes at the end */
           putc(26,thefile);             /* control-Z: the new EOF flag    */
           while ( (ch = getc(thedata)) != EOF )
             {
                    putc(ch,thefile);
             }
           ok = 1;
         }
       else
         {
           printf("\ncannot open a file..\n");
         }
       /* close only the streams that opened; fclose(NULL) is undefined */
       if ( thedata != NULL ) fclose(thedata);
       if ( thefile != NULL ) fclose(thefile);
    /* if ( ok ) remove(params[2]);   option to erase the source of the
                                      hidden data after a successful append */
    }
  else
    {
      printf("\nproper format is:\nlappend <file appended to> ");
      printf("<file to append>\n");
    }
  return 0;
}

Listing Three

/*  lsplit.c  */
#include <stdio.h>
#define TEMPFILE "temp.1"

int main(int n, char *params[])
{
  FILE *thefile,*thedata;
  int ch=0,hiddendata=0;
  if ( n == 2 )
    {
       thefile = fopen( params[1], "rb" );
       thedata = fopen( TEMPFILE, "wb" );
       if ( thefile != NULL && thedata != NULL )
         {
           /* skip the visible text through the first control-Z */
           while ( (ch = getc(thefile)) != EOF )
             {
               if ( ch == 26 ) { hiddendata++; break; }
             }
           /* copy everything after it: the hidden data */
           if ( hiddendata )
             {
               while ( (ch = getc(thefile)) != EOF )
                 {
                    putc(ch,thedata);
                 }
             }
         }
       else
         {
           printf("\ncannot open a file..\n");
         }
       /* close only the streams that opened; fclose(NULL) is undefined */
       if ( thedata != NULL ) fclose(thedata);
       if ( thefile != NULL ) fclose(thefile);
    }
  else
    {
      printf("\nproper format is:\nlsplit <datafile>\n");
      printf("hidden data (if any) will be in temp.1\n");
    }
  return 0;
}

Listing Four

/* lwrite.c  */
#include <stdio.h>

int main(int n, char *params[])
{
  FILE *thefile;
  int ch=0;
  if ( n == 2 )
    {
       puts("type control-Z to finish");
       thefile = fopen( params[1], "ab" );
       if ( thefile != NULL )
         {
           fseek(thefile, 0L, SEEK_END);
           putc(26,thefile); /*<-add new EOF...*/
           while ( (ch = getc(stdin)) != EOF )
             {
               if ( ch == 10 )
                 putc(13,thefile); /* text-mode stdin dropped the CR; add it */
               putc(ch,thefile);   /* always write the character itself     */
             }
           fclose(thefile);        /* close only if the open succeeded      */
         }
       else
         {
           printf("\ncannot open file..\n");
         }
    }
  else
    {
      printf("\nproper format is:\nlwrite <filename>\n");
    }
  return 0;
}

Listing Five

/* findh.c  */
#include <stdio.h>
#include "direct.h"  /* compiler-specific (PowerC): findfirst(), findnext(), */
#include "dos.h"     /* struct ffblk, and the FA_ attribute constants       */

void read_file(char *filename)   /* filename is a single string */
{
  FILE *readingthis;
  int inbyte=0,stoploop=0;
  long count=0L,filelength=0L;
  readingthis = fopen(filename,"rb");
  if ( readingthis == NULL )
    {
      printf("\ncannot open %s\n",filename);
      return;
    }
  fseek(readingthis,0L,SEEK_END);  /*<-find end of file             */
  filelength = ftell(readingthis); /*<-find file length             */
  fseek(readingthis,0L,SEEK_SET);  /*<-return to beginning; rewind! */

  while( (inbyte = getc(readingthis) ) != EOF && !stoploop )
    {
      count++;
      if (inbyte == 26 && ( count < filelength-1 ) )
        {
          stoploop++;
          printf("\n%s has possible hidden data ", filename);
          printf(" %ld bytes from the beginning...",count);
        }
    }
  fclose(readingthis);
}

int main(int n, char *params[])
{
  struct ffblk fldata;
  int c1;
  if ( n >= 2 )
    {
      for(c1=1;c1<n;c1++)
        {
          if (findfirst(params[c1],&fldata,(FA_NORMAL | FA_RDONLY)) != -1)
            {
              read_file(fldata.ff_name);
              while( findnext(&fldata) != -1)
                {
                  read_file(fldata.ff_name);
                }
            }
          else
            {
              printf("\n..no matching file(s) for %s\n",params[c1]);
            }
        }/*endfor*/
    }
  else
    {
      printf("\nproper format is: findh filename(s) \n");
      printf(" wildcards `*' and `?' may be used.\n");
    }
  return 0;
}


Copyright © 1997, Dr. Dobb's Journal
