Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Database

Date Compression and Year 2000 Challenges


Date Compression and Year 2000 Challenges

By

Bob and Greg are senior software engineers at Coastal Research and Technology Inc. Bob is the author of a Year 2000 compliance criteria checklist which is widely disseminated throughout the U.S. intelligence community. Both Bob and Greg can be contacted at [email protected].


Many systems with Year 2000 (Y2K) problems store dates in "MMDDYY" form. The first two places in this six-place field represent the month, the next two the day, and the final two the year. There are many different ways to encode MMDDYY information. Typically, each place is an integer type, or some other format (like characters), in the mechanics of the machine in question that represent the month, day, or year.

Six places can store a lot of information, but storage formats such as MMDDYY don't take advantage of this potential. For instance, the MM field only counts from 01 to 12, and the DD field only counts from 01 to 31. Neither field uses any of the remaining digits of storage potential. The MM field could (but does not) store the digits 13 through 99 (and 00). The DD field could (but does not) store the digits 32 through 99 (and 00). Potential storage space is lost because of the way information is recorded.

Ordinarily, not using the potential storage space available in six places doesn't matter. However, the field used to store the year is nearing the end of its storage capacity. The YY field will overflow when it goes from 1999 to 2000, because it counts from 00 to 99 (1900 to 1999 a.d., for instance). This overflow, in addition to other peculiar computer events that occur around January 1, 2000, (the use of "99" in the date field as a "save forever" indicator, the occurrence of a 400-year leap-year in 2000, the roll-over of the GPS time field from August 21 to August 22, 1999, to mention just a few) is at the heart of the the much-heralded Year 2000 problem.

Y2K is not just a software problem. It also impacts hardware, firmware, and networks. Y2K affects mainframe, client/server, desktop, and embedded systems. Neither is it just a Cobol problem. Applications written in Fortran, PL/1, C, C++, Perl, Ada, and the like are at risk for Y2K problems. The solutions we present here are from the viewpoint of C on a client/server system, but every comment is generally applicable. Ultimately, fixing Y2K problems is principally about fixing storage overflow.

Overflow eventually occurs in any storage format that records finite information. However, the MMDDYY date format overflows much earlier than necessary because of inefficient use of storage space in the MM and DD fields.

Y2K Solutions

A variety of solutions are available for fixing the overflow version of the Y2K problem. The two best known are expanding the MMDDYY date representation to MMDDYYYY (that is, using four digits for the year rather than two) or using a "logic" solution such as windowing (writing source code to interpret years into the correct century over a 100-year time span, for instance). Most Y2K repair discussions focus on these methods. (Another solution, applicable in limited circumstances, is to use the 28-year cycle of the Gregorian calendar's association between the day-of-the-week and the date in the years 1901 to 2099 inclusive. For example, July 4, 1998, is a Saturday as is July 4, 2026. We won't address that approach in this article.)

Date Expansion

Given enough time and resources, expanding date representations from MMDDYY to MMDDYYYY is usually the preferred method between expansion and windowing. Expansion offers several advantages:

  • It lasts forever (well, from January 1, 0001, a.d. to December 31, 9999, a.d. anyway).
  • For most people, four-digit year format is the natural representation.

Until recently, Y2K experts recommended four-digit expansion as the solution of choice. These recommendations have begun to change because only a short time remains to implement a Y2K fix. This brief remaining time span highlights the drawbacks to four-digit expansion:

  • It must be implemented consistently everywhere dates are used or declared (source code, scripts, JCL, databases, computer-to-computer interfaces, user interfaces, and so on).
  • It uses more storage space than two-digit year-date representations.

Changing from MMDDYY to MMDDYYYY is quite simple in abstract. In practice, you must find and appropriately alter all relevant declarations, output and input statements, calculations, and function calls. Furthermore, any variable that uses a changed variable (and any variable that uses the variable using the changed variable, and so on) must be checked and perhaps changed.

Windowing

One frequently suggested alternative to date expansion is a logic solution such as windowing. In windowing, dates of the form MMDDYY translate on-the-fly into unambiguous (MMDDYYYY, for instance) dates. Windowing is a simple concept. Human beings easily implement it mentally when we automatically translate a date like 07/04/98 to 07/04/1998, or 02/29/00 to 02/29/2000.

Windows come in two varieties -- fixed and sliding. Fixed windows have a hard-coded comparison algorithm that prefixes a "19" to all two-digit dates in a given range (all two-digit year dates, where 50 < YY <= 99, for example) and a "20" to all remaining dates (00 <= YY <= 50). Sliding windows parameterize the comparison so that the range of dates moves as the current date changes; see Figure 1 and Listing One

Regardless of the windowing mechanism chosen, windows are quicker and require fewer coding changes to implement because they only apply where usage demands. Consider, for instance, the calculation AGE = CURRENT_DATE - BIRTH_DATE. For MMDDYY, this results in a negative (and hence erroneous) number if CURRENT_DATE and BIRTH_DATE fall on different sides of 01/01/2000. Windowing corrects this error. Windowing expands the date variable references to four digits at the point of calculation -- AGE = WINDOW(CURRENT_DATE)-WINDOW(BIRTH_DATE). The declarations of AGE, CURRENT_DATE, BIRTH_DATE need not change, nor does any other usage necessarily need to change just because this usage needed a window. Reduced implementation effort is far and away the principal driver for windowing, although windows have the added advantage of avoiding the extra data storage space needed by date expansion.

Although windowing may be the solution of the hour, it is not without disadvantages. Windowing is a quick fix -- it works for the moment. But don't count on it forever. The three major concerns with windows are:

  • Their usable time span is no more than a century.
  • They create additional testing challenges.
  • They raise the risk that a future maintenance programmer may accidentally make an incorrect source-code change.

Sliding windows somewhat mitigate the first difficulty. They move forward as time advances, extending their life span (as opposed to fixed windows where the useful time span eventually expires). Still, windowing solutions are not appropriate in many places -- for instance, the AGE calculation, if CURRENT_DATE and BIRTH_DATE are more than a century apart. Unfortunately, sliding windows suffer even more acutely from the second and third difficulties than fixed windows, because they are functionally more complex.

The two common solutions, expansion and windowing, neglect the fundamental cause of the Y2K problem -- storage overflow. Computers can store much more information in six places than is allowed by the MMDDYY format. Storing more information in the same space is the essence of the Y2K solution known as "compression." Nothing is really being "compressed." Compression simply uses a more concise date representation method. Compression is one of the more technically sophisticated Y2K solutions. Unfortunately, many programmers are not aware of it.

Compression

Compression shares many of the advantages and disadvantages of expansion and windowing. Among compression's advantages are:

  • It can last for a very long time (for example, from January 1, 1600, a.d. to 768,151,959,528 a.d.).
  • It does not require additional data storage space for expanded dates.

Compression also has comparative advantages with respect to windowing and expansion. For instance, compression requires no more source code changes than does expansion, yet it can store much more information. Further, except for output to human-readable forms, compression requires no function calls -- a significant advantage over windowing. This does not mean that compression is suitable for every situation. Compression works well when:

  • An application is built for the long term.
  • The project to reach Y2K compliance is not a death march.
  • Storage space concerns had some weight in design decisions.

The compression formats given later provide a wide range of alternatives for date storage. We've included C source code for converting between MMDDYYYY and each compressed format and back again. These calculations rely on knowing leap years; see Listing Two. These are useful when conversion to a user-readable format is required, or some calculation or interface with MMDDYYYY format is needed.

The compressed formats described next are in order of increasing date-storage capacity.

The CYYDDD Format

Using 1900 as a starting date (you can choose any date), one compression method is to count the number of years elapsed since 1900, and the number of days since the beginning of the year. The number of years is recorded in a three-digit field, where the first digit (C) represents the number of centuries elapsed since 1900 and the second and third digits (YY) are the year in the century. Places four through six (DDD) are the day count in the year beginning with January 1. For example, July 4, 1999, is 099185. July 4, 2055, would be 155185. This storage format can count 10 centuries, 99 years, and 365 (366 for a leap year) days; it can store a total of 1100 years. Converting MMDDYY to CYYDDD (see Examples 1 and 2) permits dates from January 1, 1900, to December 31, 2999.

This format is useful when some user-readability is important but storage space requirements do not permit date expansion, and date range requirements make the single-century-at-a-time limits of windows unfeasible. Expansion to four-digit years is usually preferable in this case, if practical.

The DDDDDD Format

Another approach is to count the number of days since a certain date rather than the number of years, months, and days. For example, storing date information as DDDDDD -- counting days from January 1, 1900 -- means the same six places that can only store dates from January 1, 1900, to December 31, 1999, in MMDDYY can store 1 million days in DDDDDD. This is equivalent to more than 2700 years. Converting MMDDYY to DDDDDD (see Examples 3 and 4) lets you track dates from January 1, 1900, to past 4600 a.d.

This format is useful when storage space does not permit expansion and dating requirements do not allow windows. If possible, expansion to four-digit years is still preferable.

The MMDD 16-bit Year Format

One easily implemented and natural way of storing more date information is to leave the month (MM) and day (DD) fields alone and replace the year field (YY) with a bit register. The two-place year field is equivalent to a 16-bit long register, assuming that the computer in question puts eight bits in each character byte. Essentially, the format is the functional equivalent of MMDDYYYYY. (Note the extra Y.) Consequently, there is no need for a complicated algorithm -- the year recorded is a YY expanded by windowing, just stored in binary rather than characters or digits. Since 16 bits stores 216-1 = 65,535 years, dates from January 1, 0001, a.d. to December 31, 65,535, a.d. may be stored without ambiguity.

This format is generally more useful than four-digit date expansion, unless user-readability of machine-stored dates is a vital requirement. This format has a far wider range and uses less space than four-digit expansion, and converts to and from user-readable date information easily. The format also requires fewer function calls than windowing, hence is more efficient at run time. It also avoids windowing's date-span difficulties while using no more storage space than a windowed date.

The 48-bit Date Format

A format that allows storage of very long time spans is to count the number of days since a given date and store the information as a 48-bit long register. If the storage format of the computer being used equates eight bits to one byte, this 48-bit long register exactly replaces the MMDDYY date format with respect to storage space. This format is similar to the DDDDDD format. (Use the source code given with that format for conversions.) However, using bits rather than separate integers is a more efficient storage method. It is so effective that counting from January 1, 1600, a.d. allows date storage to 768,151,959,528 a.d. This is a sufficient time span for most applications.

The all-bits representation is useful when storing long date ranges. If desired, less than the full 48-bit representation may be used (say, 40- or 32-long bit register) to store date information with reduced storage space. For instance, using four eight-bit registers saves two bytes of space and is sufficient to store a count of 4,294,967,295 days, or about 11.75 million years. This format makes direct user interpretation of stored dates very difficult. Unless reduced storage space for dates is needed or long span date storage is desirable, the MMDD 16-bit year format is preferable.

Use the algorithms in Examples 3 and 4 (MMDDYYYY to DDDDDD and DDDDDD to MMDDYYYY ) for conversions.

Conclusion

The Y2K literature suggests many other date storage formats. These formats are often variations of those suggested in this article (MMDDHH, counting years in hexadecimal rather than decimal digits or HDDCYY, counting months in hexadecimal and years as a century offset (C) plus years, and so on). The comparative advantages and disadvantages of these formats are similar to the earlier formats.

Date compression as a Y2K solution is not appropriate everywhere. For instance, windowing is the necessary choice of desperation for many, given the short time span in which we must implement a solution. Expansion is a straightforward choice, easily understood by anyone. However, when a project is not on a death march, when time spans longer than a century must be stored, or when using extra storage space is inconvenient, compression is an ideal Y2K solution.

DDJ

Listing One

/* routine:  sliding_window*  Author:  Robert Moore
*  Takes a reference four digit year (year), a window width above the 
*  reference year (upperwinwith), and a one or two-digit year being queried 
*  (qyear), and returns a four-digit representation (ryear) of the query 
*  year based on the window and the reference year if the query year is one 
*  or two digits. Query years greater than two digits are just returned. 
*  Setting reference year to a constant would make this a fixed window.
*/
int sliding_window(int year, int upperwinwidth, int qyear)
{
  int yeardate, centurydate, ryear, temp;
  ryear = -1;  /* status that will be returned on error */
  if (qyear < 100)  /* the query year is a two digit year -- 
                       convert to four */
    {               /* bounds check */
      if ((year >= 0) && (0 < upperwinwidth) && (0 <= qyear)) 
    {  /* everything a proper bound */
      yeardate = year % 100;  /* convert year to a year */ 
      centurydate = year - yeardate;  /* and century */
      temp = yeardate + upperwinwidth;  /* calculate upper window limit */
      if (temp < 100)
        {
          if (temp <= qyear )
        ryear = centurydate - 100 + qyear; /* year wraps to previous */
          else                                 /* century:  case 1b */
        ryear = centurydate + qyear; /* same century:  case 1a or 1c*/
        }
      else
        {

          if (qyear < (temp - 100))
       ryear = centurydate + 100 + qyear; /* year in next century: case 2c */
          else
       ryear = centurydate + qyear; /* same century:  case 2a or 2b */
        }    
    }
    }
  else
    ryear = qyear; /* three or more digit year--don't convert */
  return ryear;
}

Back to Article

Listing Two

/*---------------------------------------------------------------/*  Authors:  D. Greg Foley, Robert L. Moore
/*  Description: This function determines if a four digit year is a leap
/*   year. A year is a leap year if it is divisible by 4 but not by 100, 
/*   except years that are divisible by 400.  
/*  Parameter Description:
/*  int Year - a four-digit year
/*  returns 1 if the year is a Leap Year
/*          0 if the year is NOT a Leap Year
/*  Notes: For clarity of the algorithm no error checking is included.
/*--------------------------------------------------------------*/
int LEAPYEAR(int Year)
{

    if ((Year % 4 == 0 && Year % 100 != 0) || Year % 400 == 0)
      return 1;   /* This is a LEAP year */
    else
      return 0;   /* This is NOT a LEAP year */
}

Back to Article


Copyright © 1998, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.