Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Database

Month-Text Ordering


December, 2005: Month-Text Ordering

David runs FileJockey Software and can be reached at http://www.FileJockeySoftware.com/ or [email protected].


To have a fighting chance of remembering relatives' birthdays and anniversaries, I put together a database table with the necessary information. But in designing the table I ran into a problem: Which field-types should I select? The database I use, Corel Paradox, provides three date-related field-types—Date, Time, and Timestamp. None of these seem appropriate since I do not want to sort rows based on the years when people were born or married. Instead, I included Month Name, Day, Year, and Month Number columns. After entering the data, I sorted on the Month Number, Day, Last Name, and First Name columns.

This worked but has two drawbacks—I had to enter month numbers that correspond to month names, and the Month Number column was displayed on the printout. I later found that I could define a date display screen format that suppresses year numbers. (This format does not apply to printing, exporting, or publishing in HTML.) I then combined the Month Name, Day, and Month Number columns into a Date column. The resulting display looks cleaner but makes adding to the table more difficult. The reason is that the Date column includes—but doesn't show—year information. To add birthdays or anniversaries in future years, I will need to display the dates in a format that shows year numbers and enter new rows with the year set to 2005.

I present an alternative in this article. Databases should support a month field-type that will sort in month order. In searching the Web, I found that databases have varying support for date-related field-types. Of those that I looked at, MySQL has the most such field-types. It allows Date, Datetime, Time, Timestamp, and Year fields.

Month-text ordering is also useful for filesystem sorting. For example, you could have a 2005 Marketing Plan folder that contains files such as January.doc, February.doc, and so on. These files might not be created in order and they may be modified out of order. As a result, sorting by the last-modified date/time would not necessarily arrange them properly. Other examples of month-related files include progress reports, sales reports, and newsletters. Some of these may be edited months after they are created. Folders with month names could be useful for storing photos and magazine articles.

MonthOrder Class

The first design goal for this class was to allow for short and long month names. In addition, I decided to include support for the common abbreviation "Sept." Because the month names are not expected to change, they can be sorted alphabetically, stored in an array, and found using a binary search. However, because hash tables are available in Java and MFC, I decided to use one to store the month names.

How many entries should be placed in this hash table? In a shareware file manager that I haven't completed, I use 12 entries that are keyed on the first three letters of the month names. To check if a word corresponds to a short or long month name, this program copies the first three letters, converts them to uppercase, and tests whether this key is in the table. If there is a match, it then tests if this word matches the beginning part of the corresponding month name and that the length of this word is three for any month, four for September, or the full length of the month name.

For the MonthOrder class, I decided to use 24 table entries. The additional entries are the short names, except for "MAY," plus "SEPT." As a result, the testing logic is simplified. A potential month name is converted to uppercase, an existing trailing dot is removed, and the key is checked in the hash table. (The previous routine does not handle trailing dots since they are removed by a tokenizer.) See Listing One.

The compareTo function in Listing Two starts by requesting month numbers for a pair of strings. If both strings are month names and their month numbers differ, these numbers are subtracted. If they match, their lengths are subtracted. This is unnecessary but makes the order look nicer. If only one string is a month name, it is placed first. By grouping month names above other strings, there are no inconsistencies. For example, sorting "January," "February," and "Hat" would lead to an unpredictable arrangement without grouping. If neither string is a month name, they are ordered by using a routine that is based on my article "Alphanumeric Ordering" (DDJ, October 2000).

The Java implementation of this routine differs from the C implementation in that substrings consisting solely of digits must be created for Integer.parseInt to work. The C function atoi can handle strings that start with digits and are followed by letters; see Listing Three.

Sample Program

MonthOrderApplet (available electronically, see "Resource Center," page 4) demonstrates month-text ordering. To use it, load MonthOrder.html into a browser. The class files must be in the same folder as this file. Then, you could either add strings individually or you could add name sets. For the latter, check either or both of the checkboxes and press the Add Name Set(s) button. After checking the Long Month Names box and pressing this button, you will see what is in Figure 1. Then press the Sort button. The month names will be arranged as in Figure 2. You can add short month names and resort the list. Because the compareTo function in Listing Two orders strings based on length when they represent the same month, short names appear before full names; see Figure 3.

Pressing the Remove Item(s) button removes selected items. The corresponding routine does not use an array of indexes. Instead, it looks at each item and deletes it if it is selected. Afterwards, either the count variable or the current index is updated; see Listing Four.

File and Folder Names

Several month names have meanings in other contexts. For example, "march" and "may" are common words. In addition, "April," "May," and "June" are women's names. This creates an interpretation problem. One resolution is to surround text with braces to signal that it should be interpreted as a potential month. Then, when a filesystem comparison function sees a string such as "{May}," it knows to look in a hash table to find the index of the text inside the braces.

The other problem with mixing month names and other strings was already addressed. Month names are grouped above other strings so that a comparison function does not return inconsistent results to a sorting algorithm.

Month Field-Type

A month field-type could contain short and long month names and compare them using the routines in Listings One and Two. An alternate approach is to cache the month indexes along with case and length information.

One possible encoding in a 32-bit integer starts with the left-most byte being zero. The remaining bytes would contain a case code, a length, and a month index. Months encoded in this manner would be compared after applying a 0xFF mask to extract the month index. The case code would range from zero to two to represent mixed, upper, and lowercase, respectively. If the first letter of an input month name is in lowercase, assume that this name is in lowercase. Otherwise, if the second letter is in uppercase, assume that this name is in uppercase. The remaining circumstance is to assume that this name is in mixed case, as in "January."

The lengths would range from three to nine, and the month indexes would range from one to twelve. A display routine would subtract one from an index, use this value to select a month name from a string array, extract the required number of characters, and adjust the case, if necessary.

If 16-bit integers are available, then each part of a month-field code could be put into a nybble (4 bits). Then the mask would be 0xF.

Conclusion

A month field-type would be helpful in database tables when day or year information is not useful for sorting or display. In a filesystem context, month-text ordering supplements alphanumeric ordering in that it also arranges file and folder names in a way that respects the patterns that people see in strings.

DDJ



Listing One

import java.util.Hashtable;

public class MonthOrder implements SortTest {
    Hashtable map = new Hashtable();

    public MonthOrder() {
        MonthInfo[] monthList = new MonthInfo[24];
        monthList[0]  = new MonthInfo ("JANUARY",   1);
        monthList[1]  = new MonthInfo ("FEBRUARY",  2);
        monthList[2]  = new MonthInfo ("MARCH",     3);
        monthList[3]  = new MonthInfo ("APRIL",     4);
        monthList[4]  = new MonthInfo ("MAY",       5);
        monthList[5]  = new MonthInfo ("JUNE",      6);
        monthList[6]  = new MonthInfo ("JULY",      7);
        monthList[7]  = new MonthInfo ("AUGUST",    8);
        monthList[8]  = new MonthInfo ("SEPTEMBER", 9);
        monthList[9]  = new MonthInfo ("OCTOBER",  10);
        monthList[10] = new MonthInfo ("NOVEMBER", 11);
        monthList[11] = new MonthInfo ("DECEMBER", 12);
        monthList[12] = new MonthInfo ("JAN",  1);
        monthList[13] = new MonthInfo ("FEB",  2);
        monthList[14] = new MonthInfo ("MAR",  3);
        monthList[15] = new MonthInfo ("APR",  4);
        monthList[16] = new MonthInfo ("JUN",  6);  // MAY is above
        monthList[17] = new MonthInfo ("JUL",  7);
        monthList[18] = new MonthInfo ("AUG",  8);
        monthList[19] = new MonthInfo ("SEP",  9);
        monthList[20] = new MonthInfo ("SEPT", 9);  // alternative
        monthList[21] = new MonthInfo ("OCT", 10);
        monthList[22] = new MonthInfo ("NOV", 11);
        monthList[23] = new MonthInfo ("DEC", 12);

        for (int i = 0;  i < monthList.length;  i++) {
            map.put (monthList[i].monthName, monthList[i]);
        }
    }   // MonthOrder

    private int findMonth (String monthName) {
        if (monthName == null) return -1;
        // Prepares test string

        String key = monthName.toUpperCase();
        int dot = key.indexOf ('.');    // trims off trailing dot
        if (dot != -1) key = key.substring (0, dot);

        MonthInfo mi = (MonthInfo) map.get (key);
        if (mi == null)
            return -1;
        else
            return mi.index;
    }   // findMonth

    ...
}
Back to article


Listing Two
public int compareTo (String name1, String name2) {
    int m1 = findMonth (name1);
    int m2 = findMonth (name2);
    if (m1 == -1 && m2 == -1)       
        return simpleANCompareTo (name1, name2);
    else if (m1 == -1)      // month names appear before other items
        return +1;
    else if (m2 == -1)
        return -1;
    else if (m1 == m2)      // not needed, but makes order look nicer
        return name1.length() - name2.length();
    else
        return m1 - m2;
}   // compareTo
Back to article


Listing Three
private int simpleANCompareTo (String name1, String name2) {
    int size1 = name1.length(), size2 = name2.length();
    int n1 = 0, n2 = 0, e1, e2;
    int val1, val2, test = 0;
    while (n1 < size1 && n2 < size2) {
        if (Character.isDigit (name1.charAt (n1))
        &&  Character.isDigit (name2.charAt (n2))) {
            // Finds ends of numbers so that parseInt will work
            for (e1 = n1 + 1;  e1 < size1
            && Character.isDigit (name1.charAt (e1));  )
                e1++;
            for (e2 = n2 + 1;  e2 < size2
            && Character.isDigit (name2.charAt (e2));  )
                e2++;
            try {
                val1 = Integer.parseInt (name1.substring (n1, e1));
            }
            catch (NumberFormatException ex) {
                val1 = -1;
            }
            try {
                val2 = Integer.parseInt (name2.substring (n2, e2));
            }
            catch (NumberFormatException ex) {
                val2 = -1;
            }
            if ((test = val1 - val2) != 0)
                return test;
            n1 = e1;
            n2 = e2;
        }
        else {  // caseless match
            test = Character.toLowerCase (name1.charAt (n1)) -
                   Character.toLowerCase (name2.charAt (n2));
            if (test != 0)
                return test;
            else {
                n1++;  n2++;

            }
        }
    }
    // One or more strings has ended
    if (n1 >= size1 && n2 >= size2)
        return 0;
    else if (n1 >= size1)
        return -1;
    else
        return +1;
}   // simpleANCompareTo
Back to article


Listing Four
void removeButton_ActionPerformed (java.awt.event.ActionEvent event)
{
    try {
        int count = itemList.getItemCount();

        for (int i = 0;  i < count;  ) {
            if (itemList.isIndexSelected (i)) {
                itemList.delItem (i);
                count--;
            }
            else
                i++;
        }
    } catch (Exception e) {
    }
}   // removeButton_ActionPerformed
Back to article


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.