Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Database

Partitions


NOV94: Partitions

Dividing data into meaningful groups

Joe is a database consultant and contributing editor to DBMS magazine. He can be contacted on CompuServe at 71062,1056.


Formally, a partition of a set of objects is a collection of subsets such that: 1. The union of all the subsets in the collection is equal to the original set; 2. the intersection of all the subsets in the collection is empty; and 3. no subset in the collection is empty. This means that every element of the original belongs to one and only one subset in the partition. Informally, a partition is a way of dividing up the loot, cutting up the cake, or grouping data into classes for a report.

If you are familiar with SQL, the GROUP BY clause is a partitioning table. The GROUP BY clause results in the minimum number of groups where, for each grouping column of each group of more than one row, all values of that grouping column are either the NULL value, or equal to each other. For example, if I used GROUP BY states on a geographic database, I would expect to get (at most) 51 rows in the grouped table (one grouped sub-table per state and one for all NULL values).

In computer science, there is a whole class of NP-complete problems called the "Knapsack" and "Bin-Packing" problems which depend on partitioning. Informally, NP-complete problems are ones which get so big, so fast that it is not practical to work them out. They often involve a factorial calculation hidden in them.

The Knapsack problem takes its name from the way it is usually presented. Imagine that you are packing a knapsack for a hike. The items which you can stuff into the knapsack have a weight (or cost) and a value to you. A compass is extremely light, but very valuable. A bag of wet sand is extremely heavy and totally worthless on a hike. A canteen of water is somewhere between these two extremes in both weight and value.

The problem is to get the most value in the knapsack under a certain weight limit. This means that you want to partition the items into all possible subsets, and then rate them for weight and value. Of those whose weight is no greater than the limit of your back, you wish to find the ones which have the greatest value. The packing may or may not be unique, if it exists at all.

The Bin-Packing problem is usually explained with a warehouse full of identical empty bins and a pile of different-sized boxes which must be put into the bins. The goal is to put all the boxes into the smallest number of bins. The size of the bins and boxes are usually given as integer units to make the problem easier. The partition we want to find has the smallest number of subsets, each with a total size no greater than the bin size.

Partitions have been an area of study for mathematicians for over 300 years, ever since Leibniz asked Bernoulli if he had investigated P(n), the number of partitions of a set of (n) objects.

The mathematician Eric Temple Bell came up with the "Bell numbers," a formula for finding the number of partitions of a set. There are two formulas to compute the Bell number; one uses Sterling numbers, and the other uses Binomial coefficients. The Sterling number, S(m,n), is defined as the number of ways of picking a subset of n elements from a set of m elements. Binomial coefficients, C(n,r), represent the number of combinations from a set of n elements taken r at a time.

The function for C(n,r) can be coded iteratively and will run very fast, but the procedure which calls it will be recursive. The function for S(m,n) can be coded in a simple, recursive fashion, but the procedure which calls it will be iterative. You simply cannot escape recursion in partitioning problems. I've given the Sterling version here because it is short and uses simpler arithmetic.

If you wish to try coding the Binomial coefficient version, the function for C(n,r) is also given. I have not compared the run times. Figure 1(a) shows the recursive formula.

The Bell number simply tells you how many partitions to expect, but does not show you what the partitions will look like. For example, Figure 1(b) lists all 15 possible partitions of the set {1, 2, 3, 4}. The vector beside each partitioning is the signature of the collection. The first partition has all four elements in one set; the second partition has the first three elements in one set and the fourth element in another; and so forth until we have four partitions of one element each. This signature can be used for generating partitions. The procedure Part(n) is an algorithm for finding the signatures. The procedure OutSignature can be replaced by one that will print the actual subsets.

Inspect the signatures in the example. There are really only five different signature patterns in this partition, namely:

  • One set of four elements.
  • One set of three elements and one set of one element.
  • Two sets of two elements each.
  • One set of two elements and two sets of one element.
  • Four sets of one element each.
The count of partition multisets is called p(n), and there is no known formula to calculate it; you just have to work it. As you have likely already figured out, p(n) is the number of ways that you can write the integer (n) as the sum of smaller integers. An approximation formula does exist for p(n), but it is a little high for smaller values of (n). The formula is shown in Figure 2.

The two tricks to generating the signature patterns are relatively simple. The first is that no signature pattern can be longer than (n) elements; the second is to realize that if you know the sum of all but one of the elements, you can compute it by subtracting it from the total of the others. The total of a signature pattern is always equal to (n).

This means that you can start with a single element of (n) and expand it in an orderly fashion, building each series of partitions of a particular length. See GenPattern1 in Listing One .

This algorithm can be improved upon by a change in notation for a multiset. Let the x @ y sign mean "x sets of y members each" so that you can condense the notation as in Figure 3. This algorithm is given in Genpattern2 in Listing One. The program could be made more readable by introducing a record data type with size and count fields to use as an array, but I have not done so.

Lotteries usually involve picking a subset of numbers from some range. As an exercise, you're invited to take your state lottery rules and use these algorithms to make yourself a millionaire. Who says computer science isn't practical?

References

Djokic, B., et al. "A Fast Iterative Algorithm for Generating Set Partitions." The Computer Journal, 1989.

Er, M.C. "A Fast Algorithm for Generating Set Partitions." The Computer Journal, 1988.

Nijenhaus, A. and H.S. Wilf. Combinatorial Algorithms. San Diego, CA: Academic Press, 1978.

Reingold, E.M., J. Nievergelt and N. Deo. Combinatorial Algorithms: Theory and Practice. Englewood Cliffs, NJ: Prentice-Hall, 1977.

Semba, I. "An Efficient Algorithm for Generating All Partitions of the Set {1,_n}." Journal of Information Processing, 1984.

Figure 1: (a) Computing the Bell number via Binomial coefficients; (b) 15 possible partitions of the set {1, 2, 3, 4}.

(a)   Bell(n+1) = Sum(C(n, k) * Bell(n)) from k :=0 to n;

(b)   {1, 2, 3, 4}}        = (1, 1, 1, 1)
      {{1, 2, 3}, {4}}     = (1, 1, 1, 2)
      {{1, 2, 4}, {3}}     = (1, 1, 1, 2)
      {{1, 2}, {3, 4}}     = (1, 1, 2, 2)
      {{1, 2}, {3}, {4}}   = (1, 1, 2, 3)
      {{1, 3, 4}, {2}}     = (1, 1, 1, 2)
      {{1, 3}, {2, 4}}     = (1, 1, 2, 2)
      {{1, 3}, {2}, {4}}   = (1, 1, 2, 3)
      {{1, 4}, {2, 3}}     = (1, 1, 2, 2)
      {{1, 4}, {2}, {3}}   = (1, 1, 2, 3)
      {{1}, {2, 3, 4}}     = (1, 2, 2, 2)
      {{1}, {2, 3}, {4}}   = (1, 2, 2, 3)
      {{1}, {2, 4}, {3}}   = (1, 2, 2, 3)
      {{1}, {2}, {3, 4}}   = (1, 2, 3, 3)
      {{1}, {2}, {3}, {4}} = (1, 2, 3, 4)

Figure 2: Function to approximate the partition count.

FUNCTION PartitionCount(n : INTEGER): INTEGER;
{ approximate number of partition patterns for a set of (n)
 elements. Formula tends to guess high. }
CONST pi = 3.141592653;
BEGIN { math can be optimized because of constants }
PartitionCount :=Round ((1.0 / (4.0 * n * Sqrt(3.0))) * Exp (pi * Sqrt((2.0 * n)/3.0)))
END;

Figure 3: Condensing notation for multisets.

{{1, 2, 3, 4}}        becomes (1 @ 4)
{{1, 2, 3}, {4}}      becomes
                        (1 @ 3, 1 @ 1)
{{1}, {2}, {3}, {4}}  becomes (4 @ 1)

Listing One


PROGRAM TestBellNumbers;
VAR n : INTEGER;

FUNCTION Bell(m : INTEGER) : INTEGER;
{ total number of partitions for a set of n elements }
VAR i, sum : INTEGER;


FUNCTION Sterling (m, n : INTEGER) : INTEGER;
{ partition of m things in sets of size n }
BEGIN
IF ((m = n) OR (n = 1)) { subset equal to original }
THEN Sterling := 1
ELSE IF (m < n) { subset greater than original }
     THEN Sterling := 0
     ELSE IF (n = 0)  { empty set }
          THEN Sterling := 0
          ELSE Sterling := Sterling((m-1), (n-1))
                      + (n * Sterling((m-1), n));
END;

BEGIN 
sum := 0;
FOR  i := 0 TO m
DO sum := sum + Sterling (m, i);
Bell := sum;
END;

BEGIN
Write('Give an n: ');
ReadLn (n);
n := Bell(n);
WriteLn('Bell Number is ', n);
ReadLn;
END.

PROGRAM TestCombinations;
{ version using combination operator }
VAR n, k : INTEGER;

FUNCTION Comb (n, k : INTEGER) : INTEGER;
{ Binomial coefficient 
  or number of k element subsets in set of n }
Var i, top, bottom : INTEGER;
BEGIN
top := 1;
FOR i := n DOWNTO (n - k + 1) DO top := top * i;
bottom := 1;
FOR i := k DOWNTO 2 DO bottom := bottom * i;
Comb := top DIV  bottom;
END;

BEGIN
Write('Give an (n): ');
ReadLn (n);
Write('Give an (k): ');
ReadLn (k);
n := Comb(n, k);
WriteLn('Comb Number is ', n);
ReadLn;
END.
PROGRAM GenPattern1;
CONST big = 100;
VAR
  p : ARRAY [0..big] OF INTEGER;

  i, j, n, PatternSize : INTEGER;

PROCEDURE OutPattern;
{ Display the current partition pattern }
VAR i : INTEGER;
BEGIN
Write('(', p[1]);
FOR i := 2 TO PatternSize
DO Write(', ', p[i]);
WriteLn(')');
END;

FUNCTION Sum(a, b :INTEGER) : INTEGER;

{ compute total of subarray p[a:b] }
VAR total, i : INTEGER;
BEGIN
total := 0;
FOR i := a TO b
DO total := total + p[i];
Sum := total;
END;

BEGIN
PatternSize := 1;
p[0] := -1;  { sentinel value }
WriteLn('Give me n: ');
ReadLn(n);  
p[1] := n;  { load starting value into array }
WHILE (PatternSize <= n)
DO BEGIN
   OutPattattern;
   i := PatternSize - 1;
   WHILE ((p[PatternSize] - p[i]) < 2)
   DO i := i - 1;
   IF (i <> 0)
   THEN FOR j := (PatternSize - 1) DOWNTO i 
        DO p[i] := p[i] + 1
   ELSE BEGIN
        FOR j := 1 TO PatternSize 
        DO p[j] := 1;
        PatternSize := PatternSize + 1;
        END;   
   p[PatternSize] := n - Sum(1, (PatternSize - 1))
   END;
END.
PROGRAM GenPatterns2;
{ generate partitions in dictionary order }
CONST BIG = 100;
VAR
  m, p :ARRAY [-1..BIG] OF INTEGER;
  i, sum, n, left : INTEGER;

PROCEDURE OutList;
{ this procedure uses the new notation,
  but can be easily modified to print out full multisets }

VAR i : INTEGER;
BEGIN
Write('(', m[1] ,' @ ', p[1]);
FOR i := 2 TO left
DO Write(', ', m[i] ,' @ ', p[i]);
WriteLn(')');
END;

BEGIN
left := 1;
p[-1] := 0;
m[-1] := 0;
WriteLn('Give me n: ');
ReadLn(n);
p[0] := n + 1;
m[0] := 0;
p[1] := 1;
m[1] := n;
WHILE (left <> 0)
DO BEGIN
   OutList;
   sum := m[left] * p[left];
   IF (m[left] = 1)
   THEN BEGIN
        left := left - 1;
        sum := sum + (m[left] * p[left]);
        END;
   IF (p[left - 1] = p[left] + 1)
   THEN BEGIN
        left := left - 1;
        m[left] := m[left] + 1;
        END
   ELSE BEGIN
        p[left] := p[left] + 1;
        m[left] := 1;
        END;
   IF (sum > p[left])
   THEN BEGIN
        p[left + 1] := 1;
        m[left + 1] := sum - p[left];
        left := left + 1;
        END;
   END;
END.
PROGRAM Partitions;
CONST big = 100;
VAR
   p, q: ARRAY [0..big] OF INTEGER;
   n : INTEGER;

PROCEDURE OutSignature;
{ display the signature of a partitioning }
VAR i : INTEGER;
BEGIN
Write('(', q[1]);

FOR i := 2 TO  n
DO Write (', ', q[i]);
WriteLn(')');
END; { OutSignature}

PROCEDURE AllSubsets(n : INTEGER);
LABEL 10;
VAR
   i, last, m, ClassCount : INTEGER;
BEGIN
ClassCount := 1;
FOR i := 1 TO n DO q[i] := 1;
p[1] := n;
OutSignature; { display single set }
REPEAT
    m := n;
    WHILE (TRUE)
    DO BEGIN
       last := q[m];
       IF (p[last] <> 1)THEN GOTO 10;
       q[m] := 1;
       m := m -1
       END;
10: ClassCount := ClassCount + m - n;
    p[1] := p[1] + n - m;
    IF (last = ClassCount)
    THEN BEGIN
         ClassCount := ClassCount + 1;
         p[ClassCount] := 0;
         END;
    q[m]:= last + 1;
    p[last] := p[last] - 1;
    p[last + 1] := p[last + 1] + 1;
    OutSignature;
UNTIL (ClassCount = n);
END; { AllSubsets }



BEGIN
WriteLn ('Give me an n: ');
ReadLn(n);
AllSubsets(n);
ReadLn
END.

PROCEDURE OutSubsets;
{ Build actual subsets from signature array }
VAR i, j, ThisSet : INTEGER;
BEGIN
ThisSet := 1; { current set id # }
BEGIN
   FOR i := 1 TO n    { scan array for elements }
   DO IF (p[i] = ThisSet)
      THEN BEGIN                 { format display }
           Write('{ ', i);       { first element }
           FOR j := (i+1) TO n   { scan rest of set }

           DO IF (p[j] = ThisSet)
              THEN Write (', ', j);
           Write(' } ');           { close up brackets }
           ThisSet := ThisSet + 1; { set up for next set }
           END;
   END;
WriteLn;
END;

Copyright © 1994, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.