Distributed Computing: Windows and Linux

Mohamed and Klaus describe a distributed system in which a single Windows machine controls a Linux cluster.


October 10, 2007
URL:http://www.drdobbs.com/parallel/distributed-computing-windows-and-linux/202401082

Mohamed is a Ph.D. student at Munich University of Technology. Klaus is the head of the Institute for Data Processing at Munich University of Technology. They can be contacted at [email protected] and [email protected], respectively.


In this article, we discuss a distributed system in which a single Windows machine controls a Linux cluster. We implemented this system as part of an investigation into the security of block ciphers. In particular, we were analyzing the suitability of block ciphers to act as random number generators. To this end, we used the NIST statistical suite for testing the randomness of specific data sets (csrc.nist.gov/rng). For each data set, we performed 188 different statistical tests, measuring the percentage of the tests each cipher passes. The two main parameters we varied were the keys and the plaintexts. To generate and analyze the data, we built "System1" under Windows on a single machine. This system was easily able to process megabytes of generated data for our analysis.
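To give a concrete (if simplified) picture of what such a generator produces, the sketch below encrypts an incrementing counter plaintext under a fixed key and writes the cipher output to a file, which the NIST suite can then judge for randomness. The function names, the 16-byte block size, and the XOR stub cipher are our illustrative assumptions, not the article's actual Generator code.

#include <stdio.h>
#include <stdint.h>

#define BLOCK_SIZE 16   /* 128-bit blocks; an assumption, not from the article */

/* Stub standing in for the block cipher under test; replace with a
   real cipher call (e.g., AES). The XOR here is NOT a real cipher. */
static void encrypt_block(const uint8_t key[BLOCK_SIZE],
                          const uint8_t in[BLOCK_SIZE],
                          uint8_t out[BLOCK_SIZE])
{
    int i;
    for (i = 0; i < BLOCK_SIZE; i++)
        out[i] = in[i] ^ key[i];
}

/* Write nblocks cipher outputs to path: the cipher encrypts an
   incrementing counter plaintext under one fixed key, giving a
   candidate random stream for the statistical tests to judge. */
int generate_data_set(const char *path,
                      const uint8_t key[BLOCK_SIZE], long nblocks)
{
    uint8_t ctr[BLOCK_SIZE] = {0}, out[BLOCK_SIZE];
    long n;
    int i;
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    for (n = 0; n < nblocks; n++) {
        encrypt_block(key, ctr, out);
        fwrite(out, 1, BLOCK_SIZE, f);
        for (i = 0; i < BLOCK_SIZE && ++ctr[i] == 0; i++)
            ;   /* increment the counter plaintext */
    }
    fclose(f);
    return 0;
}

int main(void)
{
    uint8_t key[BLOCK_SIZE] = {0x01};   /* placeholder key */
    return generate_data_set("c:\\test\\set1.bin", key, 1000000L);
}

Varying the key or the plaintext pattern between data sets corresponds to the two parameters we were changing.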

Before long, however, the number and size of the data sets increased dramatically (as we decided to study the randomness of block cipher modes of operation), and System1 could no longer analyze data in a timely and reasonable manner. Consequently, we then built "System2," a Linux-based cluster that works in parallel on several machines to analyze hundreds of gigabytes of data; see Figure 1.


Figure 1: Overview of the Windows/Linux distributed system.

System1 and System2

The Windows-based System1 consists of the following programs:

For its part, System2 works as follows:

Our biggest challenge here was how to get System2 programs to communicate with each other, and how to automate the system as much as possible. To accomplish this, we had to modify the current programs and write new ones. In a nutshell, we decided to divide System2 into three parts:

Once we generated the data using the Generator, our next problem was how to transfer it to the Linux cluster. We tried several techniques, including generating all the data sets and then transferring them via the Samba client. The problem here was that we had to wait until all the data was generated (which could take a couple of days), then start the Samba client from a Linux emulator under Windows (Cygwin). Clearly, this didn't suit our needs.

In another attempt, during generation of the data sets, we started WinSCP with the command "Keep the remote directory up to date" and raised WinSCP's priority above the Generator's, so that each data set was transferred as soon as it was created. The problem with this approach was that we ran out of space on the Windows machine, because all the generated data sets remained stored there.

Finally, we modified the Generator by adding the function MoveData() (Listing One), which calls a WinSCP script (Listing Two) to copy the data sets from the Windows machine to the Linux cluster. It then deletes these data sets from the Windows machine (to free up space for more generated data sets). MoveData() is then called after each group of data sets is generated.


// Transfer the current batch of data sets to the Linux cluster,
// then delete them locally to free space for the next batch.
void MoveData()
{
    // Run WinSCP in console mode with the upload script (Listing Two).
    system("F:\\WinSCP3\\WinSCP3 /console /script=c:\\move.txt");
    // Quietly delete the transferred data sets from the Windows machine.
    system("del c:\\test\\* /q");
}

Listing One


# Automatically answer all prompts negatively not to
# stall the script on error
option batch on
# Disable overwrite confirmations that
# conflict with the previous
option confirm off
# Connect a stored connection called mido
# (where the password was saved)
open mido
# Change remote directory
cd /users/disk2/et/midono1/test
# Force binary mode transfer (it is very
# important for us to transfer the data in
# binary format)
option transfer binary
# Upload the files to the current working directory
put c:\test\*
# Disconnect
close
# Exit WinSCP
exit

Listing Two

Controlling the Cluster

The NIST statistical suite is designed for interactive input. We modified it to accept all the parameters from the command line (some parameters were also hard-coded). We then added these parameters:

All the intermediate files (a couple of dozen for each file) are suffixed with the file index and the experiment index. After each file is processed, it is deleted along with all its intermediate files to free space on the Linux cluster. Only the final result file is kept.
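The article does not reproduce the modified suite's argument handling, but Listing Five later invokes it as ./nist 1 1 2481 1 128. The fragment below is a hedged sketch of such a command-line front end: the parameter names and their order are our assumptions, chosen only to show how the file and experiment indices can flow into the intermediate filenames described above.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    /* Hypothetical parameter order, modeled on Listing Five's
       "./nist 1 1 2481 1 128"; the real modified suite may differ. */
    if (argc != 6) {
        fprintf(stderr,
                "usage: %s fileIndex expIndex nSequences generatorId seqLenBits\n",
                argv[0]);
        return 1;
    }
    int file_index = atoi(argv[1]);
    int exp_index  = atoi(argv[2]);
    int n_seq      = atoi(argv[3]);
    int gen_id     = atoi(argv[4]);
    int seq_len    = atoi(argv[5]);

    /* Intermediate files carry both indices so that results from
       different files and experiments never collide and can be
       deleted per job once processing finishes. */
    char tmpname[256];
    snprintf(tmpname, sizeof tmpname, "freq_%d_%d.tmp", file_index, exp_index);
    printf("intermediate file: %s (seqs=%d gen=%d len=%d)\n",
           tmpname, n_seq, gen_id, seq_len);
    return 0;
}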

At this point, the question was how to start the NIST statistical suite on each processor of the Linux cluster. We had several options:

We wrote the Cluster program (Listing Three) to invoke the Scheduler program (Listing Four) on each node of the Linux cluster. The Scheduler executes script files prepared by the script generator program. This is done using the fork() function to create a child process for each connection. The SSH connections to the Linux nodes are set up for automatic login, and the Cluster program runs under the Cygwin emulator on the Windows machine.


#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* First and last node indices; adjust to the size of the cluster. */
#define START 1
#define END   17

int main()
{
    int i, pid;
    char cmd[100];
    for (i = START; i < END; i++) {
        pid = fork();
        if (pid == 0) {
            /* Child process: start the Scheduler on node i over SSH
               (automatic login is configured) and exit when it returns. */
            sprintf(cmd, "ssh username@node%d.cluster.com ./script/scheduler", i);
            system(cmd);
            exit(0);
        }
        if (pid == -1)
            printf("error on node %d\n", i);
    }
    return 0;
}

Listing Three


#include <stdio.h>
#include <stdlib.h>

/* Crude busy-wait delay between status-file checks. */
int CPU_sleep()
{
    int i, j, k, sum = 0;
    for (i = 0; i < 100; i++)
        for (j = 0; j < 100; j++)
            for (k = 0; k < 10; k++)
                sum = sum + j * j * k;
    return sum;
}

int main()
{
    char *file = "/home/script/max.txt";
    char *filebase = "/home/script/";
    char filename[200];
    char cmd[200];
    FILE *W;
    FILE *F = fopen(file, "r");
    int N = 0;
    int max;

    fscanf(F, "%d", &max);        /* total number of scripts */
    fclose(F);
    while (N < max)
    {
        N++;
        sprintf(filename, "%s%d.do", filebase, N);
        CPU_sleep();
        W = fopen(filename, "r");
        if (W == NULL)
        {
            /* No status file yet: claim the job by creating one. */
            FILE *ch = fopen(filename, "w");
            fclose(ch);                    /* we reserve the file */
            sprintf(cmd, "/home/script/%d.sh", N);
            system(cmd);                   /* run the corresponding script */
            ch = fopen(filename, "w");
            fprintf(ch, "DONE");           /* now the job is marked done */
            fclose(ch);
        }
        else
        {
            fclose(W);                     /* job already claimed: skip it */
        }
        /* Reread the maximum so newly added experiments are picked up. */
        F = fopen(file, "r");
        fscanf(F, "%d", &max);
        fclose(F);
        CPU_sleep();
    }
    return 0;
}

Listing Four

The script generator program generates the scripts to be executed by the Linux Scheduler. Each script processes one job, and the number of scripts is stored in a max.txt file. The script files are generated in the form #.sh, where # is the number of the script. These files are then transferred to the directory "script" on the Linux cluster and made executable with the command chmod +x *.sh, executed from a PuTTY terminal.
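The script generator itself is not listed in the article; the sketch below shows a minimal version under our assumptions: one NIST invocation per job, scripts named 1.sh onward in the style of Listing Five, and the job count written to max.txt for the Scheduler. The placeholder job count and parameter values are illustrative only.

#include <stdio.h>

int main()
{
    const int njobs = 100;               /* placeholder job count */
    char name[64];
    FILE *f;
    for (int n = 1; n <= njobs; n++) {
        /* One script per job, named n.sh, matching Listing Five. */
        sprintf(name, "%d.sh", n);
        f = fopen(name, "w");
        if (f == NULL)
            return 1;
        fprintf(f, "cd\ncd test\n");
        fprintf(f, "nice -n 19 ./nist %d 1 2481 1 128 > /dev/null\n", n);
        fclose(f);
    }
    /* The Scheduler reads this count to know how many jobs exist. */
    f = fopen("max.txt", "w");
    if (f == NULL)
        return 1;
    fprintf(f, "%d\n", njobs);
    fclose(f);
    return 0;
}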

The Scheduler program first opens the max.txt file to get the maximum number of scripts. It then enters a loop in which it sequentially checks all the status files of the form "#.do." When it finds a status file that does not exist, it assigns that job to itself by creating the file, then executes the corresponding script. After the script finishes, it rewrites the status file with the string "DONE" in it, then searches for the next missing status file until the maximum is reached. It then stops.

The maximum is reread on every loop iteration because we sometimes increased it as we introduced new experiments (with new files).

Only one job is assigned per script. If a node stops working for any reason, only one file goes unprocessed, and it can be processed later.

The generated scripts look like the one in Listing Five. After the Cluster program finishes, we check the status files. If any has size 0, the corresponding file was not processed due to a node failure. We then execute its script (manually, or by deleting the status file(s) and calling the Cluster program again). If the Windows machine is restarted for some reason, we restart the Cluster program; in the worst case, two scripts execute at the same time on a node. Because the Linux cluster is not dedicated to our processes, we run at the lowest execution priority using the nice -n 19 command, which lets others use the Linux nodes.


# Go to the home directory, then to the test directory
cd
cd test
# Run one NIST experiment at the lowest priority, discarding console output
nice -n 19 ./nist 1 1 2481 1 128 > /dev/null

Listing Five

Once the status files are four bytes in size (the word "DONE" has been written into them), we transfer the result files back to the Windows machine using WinSCP, then run the unmodified Extractor program.
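The check for failed jobs can be automated along the same lines. The sketch below, written under the file-layout assumptions described above, scans every #.do status file and reports those with size 0, that is, jobs a node claimed but never marked DONE.

#include <stdio.h>
#include <sys/stat.h>

int main()
{
    const char *filebase = "/home/script/";
    char name[256];
    struct stat st;
    int max, n;
    FILE *f = fopen("/home/script/max.txt", "r");
    if (f == NULL || fscanf(f, "%d", &max) != 1)
        return 1;
    fclose(f);
    for (n = 1; n <= max; n++) {
        /* A zero-size status file means the job was claimed by a
           node that failed before writing DONE; rerun its script. */
        sprintf(name, "%s%d.do", filebase, n);
        if (stat(name, &st) == 0 && st.st_size == 0)
            printf("job %d claimed but not finished; rerun %s%d.sh\n",
                   n, filebase, n);
    }
    return 0;
}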
