# Algorithm Alley

OCT93: ALGORITHM ALLEY

Survey: "May I ask you a few questions?"

Voice mail: (eagerly) "beep!"

Survey: "Do you believe in God?"

Voice mail: (tentatively) "beep?"

Survey: "If you died today, are you sure you will go to heaven?"

Voice mail: (concerned) "beep!?"

This goes on for 20 minutes! Apparently, my voice mail's "leave a message" beep was music to the polling system's ears, which seemed to accept any sound as a valid response. In turn, my machine's start-recording switch was reactivated by each new question. I doubt the exchange qualifies as a successful Turing Test, but if you tried my number and couldn't get through, at least now you know why. Some families have teenagers who monopolize the line, but in our house, we can now tell our friends, "Sorry to have missed your call, but our phone was on the phone."

### Selection Sampling

Speaking of polls, one of a pollster's primary software tools is the selection-sampling algorithm--a technique for selecting N items at random from a set of any size M, where N is less than or equal to M. The method is useful not only in polling, but any time you need to reduce a large collection of records to a more manageable subset. Selecting names and phone numbers for a poll is an obvious use, but you could also use the technique in other ways: to sample disk sectors for performing diagnostic spot checks on a large disk drive, or in statistical work requiring samples to be taken at random from massive data sets.

The algorithm seems almost too simple to work, but it is guaranteed to produce exactly the number of samples requested from an input data set, regardless of size. Example 1, Algorithm #12, lists the selection-sampling algorithm in pseudocode. M represents the size of the input source, and is usually set to the total number of records in an input file. N is the desired number of records in the sample. Three integer variables keep track of the number of requested, examined, and selected records. A real number r holds a value selected at random so that 0 <= r < 1.0. After initializing the integers, a While loop repeats until the number of selected samples equals the number requested. On each pass through the loop, the number of examined records is incremented and r is set to a random real number. To decide whether to use the next record from the input, an If statement tests the result of the expression (M-examined)*r>=(requestedselected). In English, the expression tests whether the number of records left to be examined times a random value is greater than or equal to the number of records remaining to be added to the subset. If the expression is true, the algorithm skips the next input record; otherwise, it increments selected and writes the next record to the output.

Algorithm #12 works because, as the number of sampled records grows, the chances that the If statement's control expression will be true increases proportionally. (The expression is always true, for example, when requested equals selected.) At first, it may seem that selection sampling would tend to sample too many records from one end of the input, but using different random numbers on each pass through the While loop ensures an even distribution in the sample. Believing this fact requires faith in probability--not to mention a well-tested random-number generator, but the algorithm works flawlessly, every time.

SAMPLE.PAS (Listing One, page 169) implements Algorithm #12 in Pascal. The demonstration program reads a text file containing one record per line. For test purposes, I used Grady Ward's Moby Words database of 21,400 common first names, but any list of names or other text records will do. Set constant M to the number of input records in your data source, and change INFNAME and OUTFNAME to your input and output file names. Run the program and enter the number of records you want to sample. Even with a huge input source, the selection-sampling algorithm requires only a single pass through the file, making the method very fast--in a few seconds, I can create a subset of any size from my input source of 21,400 records. I often use the program to create sample input files for testing other algorithms.

### Permutations in Pairs

As Jim goes on to say, "given four teams, the algorithm produces the output sequence 1-2, 1-3, 1-4, 2-3, 2-4, 3-4--in other words, a permutation of all possible team pairings. But one fact complicates the solution: No team may play more than one game in a day. To fulfill that requirement, consider that the number of unique pairings of N teams equals 1+2+3 ...+(N-1), or (N*(N-1))/2. Also, if N is even, it takes N-1 days for all games to be played. If N is odd, it takes N days. (With three teams, for example, at one match per day, the pairings are 1-2, 1-3, and 2-3.)

"Experimenting with these ideas led to an insight. Take the pairings of four and five teams, shown in Table 1. An asterisk marks a day off for that team. In the five-team table, replacing the asterisks with the digit 6 produces pairings for six teams. Discovering that principal led to the final program, which generates pairings as though there were an even number of teams, replacing the last team with an asterisk for an odd number of teams.

"Table 2 shows the same pairings as integer values rearranged for each day, substituting the next higher number for asterisks.

"These are merely the permutations of the numbers from 1 to N! The Swap column shows the positions to exchange for generating the next permutation. In the six-team case, start with the first-day pairings of 1-2, 3-4, and 5-6. Then swap the positions of teams 2 and 3, and also swap the positions of teams 4 and 5, to generate the pairings for the next day: 1-3, 2-5, and 4-6. Swap the positions of teams 3 and 4, and of teams 5 and 6, to generate day three, and so on. PAIRINGS.PAS (Listing Two, page 169) implements this algorithm using Pascal to generate team pairings for sports events. Each team is guaranteed to play each other team exactly once, and no team plays more than one game per day. The code uses a touch of magic in the manipulation of the swap array at the end of the day loop, but is otherwise straightforward.

Send me your thoughts, notes, and algorithms in care of DDJ, or post a message to my Compuserve ID, 73627,3241. Meanwhile, the next time you receive an unsolicited phone call from a high-pressure sales department, try this. Don't say anything. Just push a button on your touchtone phone in response to every question. Drives 'em nuts, and they hang up right away. I suppose this proves that I'm not too old to learn new tricks--even from a voice-mail machine with a mind of its own.

#### Example 1: Pseudocode for Algorithm #12 (selection sampling)

```const
M = 1000;  { Input records }
N = 128;   { Subset (N <= M) }
var
requested,
examined,
selected: Integer;
r: Real;
begin
requested <- N;
examined <- 0;
selected <- 0;
while (selected < requested) do
begin
examined <- examined + 1;
r <- Random;
if (M - examined) * r
>= (requested - selected)
then skip next input record
else begin
selected <- selected + 1;
use next input record
end
end
end.

```

#### Example 2:

```for x <- 1 to NTeams - 1 do
for y <- x + 1 to NTeams do
write(x, '-', y, ',');

```

#### Table 1: Four and five sports-team pairings.

```
4 teams                          5 teams
Day 1   Day 2   Day 3       Day 1   Day 2   Day 3   Day 4   Day 5
1-2     1-3     1-4         1-2     1-3     1-4     1-5     1-*
3-4     2-4     2-3         3-4     2-5     2-*     3-*     4-5
5-*     4-*     3-5     2-4     2-3```

#### Table 2: Team pairings from Table 1 rearranged.

```
4 teams             Swap          6 teams         Swap
Day 1     12   34         2&3         12   34   56      2&3, 4&5
Day 2     13   24         3&4         13   25   46      3&4, 5&6
Day 3     14   23                     14   26   35      2&3, 4&5
Day 4                                 15   36   24      3&4, 5&6
Day 5                                 16   45   23
```

```
_ALGORITHM ALLEY_
by Tom Swan

[LISTING ONE]

(* ----------------------------------------------------------- *(
**  sample.pas -- Algorithm #12: Selection Sampling            **
** ------------------------------------------------------------**
**  Creates a file SAMPLE.DAT with a specified number of names **
**  extracted from Grady Ward's Moby Words. The first line of  **
**  the output file indicates the number of selections.        **
**  Assumes the number of names in the source is known.        **
**  Reference: Knuth, Vol 2, p122                              **
** ------------------------------------------------------------**
)* ----------------------------------------------------------- *)

program Sample;
const
M = 21420;  { Number of records in source }
INFNAME = 'g:\moby\words\21400nam';  { Source file }
OUTFNAME = 'sample.dat'; { Destination file }
var
infile, outfile: Text;  { File variables }
word: String;           { Holds each record from source }
requested,              { Requested number of samples }
examined,               { Total records examined }
selected: Integer;      { Total records selected }
r: Real;                { Random number 0 <= r < 1.0 }
begin
Randomize;
Writeln('Write selected names to ', OUTFNAME);
Write('How many names? ');
if (requested <= 0) or (requested > M) then
begin
Writeln('Number must be >= 0 and <= ', M);
Exit
end;
examined := 0;
selected := 0;
Assign(infile, INFNAME);
Reset(infile);
Assign(outfile, OUTFNAME);
Rewrite(outfile);
Writeln(outfile, requested);  { Save 'requested' in file }
while (selected < requested) (* and (not Eof(infile)) *) do
begin
examined := examined + 1;
r := Random;
if (M - examined) * r >= requested - selected
then
Readln(infile)             { Skip next record }
else
begin                        { Select next record }
selected := selected + 1;  { Count selections so far }
Writeln(outfile, word);    { Write record to destination }
Writeln(word)              { Echo selection to display }
end
end;
Close(infile);
Close(outfile)
end.

```
[LISTING TWO]
```<a name="02e5_000e">

(* ----------------------------------------------------------- *(
**  pairings.pas -- Select sports-event team pairings          **
** ------------------------------------------------------------**
**   This program generates team pairings for sports events.   **
**   Each team is guaranteed to play each other team exactly   **
**   once. No team will play more than one game per day.       **
**   An asterisk ('*') means a day off for that team.          **
**   For example, 5 teams produces this output:                **
**     Day 1 - 12 34 5*                                        **
**     Day 2 - 13 25 4*                                        **
**     Day 3 - 14 2* 35                                        **
**     Day 4 - 15 3* 24                                        **
**     Day 5 - 1* 45 23                                        **
** ------------------------------------------------------------**
)* ----------------------------------------------------------- *)

program pairings;
const
TEAMCOUNT = 5;
var
TeamNames: Array [1 .. TEAMCOUNT + 1] of Char;
SwapArray: Array [1 .. TEAMCOUNT + 1] of Integer;
x, Temp, Day: Integer;
TempChar: Char;
const
NTeams: Integer = TEAMCOUNT;
begin
{ Set up team names. Normally read from a file. }
for x := 1 to NTeams do
TeamNames[x] := Chr(x + Ord('0'));
if Odd(NTeams) then
begin
NTeams := NTeams + 1;
TeamNames[NTeams] := '*'
end;
{ Set up the array that controls swapping. }
for x := 1 to NTeams do
SwapArray[x] := x;
for Day := 1 to NTeams - 1 do
begin
Write('Day ', Day, ' -');
{ Write the team pairings for this day }
x := 1;
while x < NTeams do
begin
Write(' ', TeamNames[x], TeamNames[x + 1]);
x := x + 2;
end;
WriteLn;
{ Perform swaps to prepare array for next day's pairings. }
if Odd(Day)
then x := 2
else x := 3;
while x < NTeams do
begin
TempChar := TeamNames[SwapArray[x]];
TeamNames[SwapArray[x]] := TeamNames[SwapArray[x + 1]];
TeamNames[SwapArray[x + 1]] := TempChar;
Temp := SwapArray[x];
SwapArray[x] := SwapArray[x + 1];
SwapArray[x + 1] := Temp;
x := x + 2
end
end
end.

```

### More Insights

 To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.