Date Compression and Year 2000 Challenges

Ultimately, fixing Y2K problems is about fixing storage overflow, and there are a variety of solutions that address the problem. Bob and Greg examine some of these solutions, focusing on date compression.



May 01, 1998
URL:http://www.drdobbs.com/database/date-compression-and-year-2000-challenge/184410550


By Robert L. Moore and D. Gregory Foley

Bob and Greg are senior software engineers at Coastal Research and Technology Inc. Bob is the author of a Year 2000 compliance criteria checklist which is widely disseminated throughout the U.S. intelligence community. Both Bob and Greg can be contacted at [email protected].


Many systems with Year 2000 (Y2K) problems store dates in "MMDDYY" form. The first two places in this six-place field represent the month, the next two the day, and the final two the year. There are many different ways to encode MMDDYY information. Typically, each place is stored as an integer, a character, or some other machine-level type that represents part of the month, day, or year.

Six places can store a lot of information, but storage formats such as MMDDYY don't take advantage of this potential. For instance, the MM field only counts from 01 to 12, and the DD field only counts from 01 to 31. Neither field uses any of the remaining two-digit values. The MM field could (but does not) store the values 13 through 99 (and 00). The DD field could (but does not) store the values 32 through 99 (and 00). Potential storage space is lost because of the way information is recorded.

Ordinarily, not using the potential storage space available in six places doesn't matter. However, the field used to store the year is nearing the end of its storage capacity. The YY field will overflow when it goes from 1999 to 2000, because it counts from 00 to 99 (1900 to 1999 a.d., for instance). This overflow, in addition to other peculiar computer events that occur around January 1, 2000 (the use of "99" in the date field as a "save forever" indicator, the occurrence of a 400-year leap year in 2000, the rollover of the GPS time field from August 21 to August 22, 1999, to mention just a few), is at the heart of the much-heralded Year 2000 problem.

Y2K is not just a software problem. It also impacts hardware, firmware, and networks. Y2K affects mainframe, client/server, desktop, and embedded systems. Neither is it just a COBOL problem. Applications written in Fortran, PL/I, C, C++, Perl, Ada, and the like are at risk for Y2K problems. The solutions we present here are from the viewpoint of C on a client/server system, but the comments apply generally. Ultimately, fixing Y2K problems is principally about fixing storage overflow.

Overflow eventually occurs in any storage format that records finite information. However, the MMDDYY date format overflows much earlier than necessary because of inefficient use of storage space in the MM and DD fields.

Y2K Solutions

A variety of solutions are available for fixing the overflow version of the Y2K problem. The two best known are expanding the MMDDYY date representation to MMDDYYYY (that is, using four digits for the year rather than two) or using a "logic" solution such as windowing (writing source code to interpret years into the correct century over a 100-year time span, for instance). Most Y2K repair discussions focus on these methods. (Another solution, applicable in limited circumstances, is to use the 28-year cycle of the Gregorian calendar's association between the day-of-the-week and the date in the years 1901 to 2099 inclusive. For example, July 4, 1998, is a Saturday as is July 4, 2026. We won't address that approach in this article.)

Date Expansion

Given enough time and resources, expanding date representations from MMDDYY to MMDDYYYY is usually the preferred method between expansion and windowing. Expansion offers several advantages:

Until recently, Y2K experts recommended four-digit expansion as the solution of choice. These recommendations have begun to change because only a short time remains to implement a Y2K fix. This brief remaining time span highlights the drawbacks to four-digit expansion:

Changing from MMDDYY to MMDDYYYY is quite simple in the abstract. In practice, you must find and appropriately alter all relevant declarations, output and input statements, calculations, and function calls. Furthermore, any variable that uses a changed variable (and any variable that uses the variable using the changed variable, and so on) must be checked and perhaps changed.

Windowing

One frequently suggested alternative to date expansion is a logic solution such as windowing. In windowing, dates of the form MMDDYY translate on-the-fly into unambiguous (MMDDYYYY, for instance) dates. Windowing is a simple concept. Human beings easily implement it mentally when we automatically translate a date like 07/04/98 to 07/04/1998, or 02/29/00 to 02/29/2000.

Windows come in two varieties -- fixed and sliding. Fixed windows have a hard-coded comparison algorithm that prefixes a "19" to all two-digit years in a given range (all two-digit years where 50 < YY <= 99, for example) and a "20" to all remaining years (00 <= YY <= 50). Sliding windows parameterize the comparison so that the range of dates moves as the current date changes; see Figure 1 and Listing One.
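A fixed window can be sketched in a few lines of C. This is only an illustration, not code from the article: the function name fixed_window and the constant PIVOT_YY are hypothetical, and the pivot of 50 simply mirrors the example range given above.

```c
/* Hypothetical pivot for this sketch: two-digit years above PIVOT_YY are
   read as 19YY, all others as 20YY (mirrors the 50 < YY <= 99 example). */
#define PIVOT_YY 50

/* Expand a two-digit year with a fixed window.
   Returns the four-digit year, or -1 if yy is not in 0..99. */
int fixed_window(int yy)
{
    if (yy < 0 || yy > 99)
        return -1;
    return (yy > PIVOT_YY) ? 1900 + yy : 2000 + yy;
}
```

Because the pivot is a compile-time constant, the window never moves; that is the property that distinguishes it from the sliding window in Listing One.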

Regardless of the windowing mechanism chosen, windows are quicker and require fewer coding changes to implement because they only apply where usage demands. Consider, for instance, the calculation AGE = CURRENT_DATE - BIRTH_DATE. For MMDDYY, this results in a negative (and hence erroneous) number if CURRENT_DATE and BIRTH_DATE fall on different sides of 01/01/2000. Windowing corrects this error. Windowing expands the date variable references to four digits at the point of calculation -- AGE = WINDOW(CURRENT_DATE)-WINDOW(BIRTH_DATE). The declarations of AGE, CURRENT_DATE, BIRTH_DATE need not change, nor does any other usage necessarily need to change just because this usage needed a window. Reduced implementation effort is far and away the principal driver for windowing, although windows have the added advantage of avoiding the extra data storage space needed by date expansion.
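The AGE calculation can be illustrated with a small sketch that works on two-digit years only. The helper names (window_year, age_unwindowed, age_windowed) and the pivot parameter are assumptions made for this example; any windowing routine could stand in for window_year.

```c
/* Hypothetical helper: expand a two-digit year using a caller-supplied
   pivot. Years above the pivot are taken as 19YY, all others as 20YY. */
int window_year(int yy, int pivot)
{
    return (yy > pivot) ? 1900 + yy : 2000 + yy;
}

/* Age in years computed directly on two-digit years.
   The result goes negative when the two years straddle 2000. */
int age_unwindowed(int current_yy, int birth_yy)
{
    return current_yy - birth_yy;
}

/* Age in years with both operands windowed at the point of calculation.
   The stored two-digit values themselves are left unchanged. */
int age_windowed(int current_yy, int birth_yy, int pivot)
{
    return window_year(current_yy, pivot) - window_year(birth_yy, pivot);
}
```

For a birth year stored as 65 and a current year stored as 01, age_unwindowed(1, 65) yields -64, while age_windowed(1, 65, 50) yields 36. Only the calculation changes; the declarations and stored data do not.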

Although windowing may be the solution of the hour, it is not without disadvantages. Windowing is a quick fix -- it works for the moment. But don't count on it forever. The three major concerns with windows are:

Sliding windows somewhat mitigate the first difficulty. They move forward as time advances, extending their life span (as opposed to fixed windows where the useful time span eventually expires). Still, windowing solutions are not appropriate in many places -- for instance, the AGE calculation, if CURRENT_DATE and BIRTH_DATE are more than a century apart. Unfortunately, sliding windows suffer even more acutely from the second and third difficulties than fixed windows, because they are functionally more complex.

The two common solutions, expansion and windowing, neglect the fundamental cause of the Y2K problem -- storage overflow. Computers can store much more information in six places than is allowed by the MMDDYY format. Storing more information in the same space is the essence of the Y2K solution known as "compression." Nothing is really being "compressed." Compression simply uses a more concise date representation method. Compression is one of the more technically sophisticated Y2K solutions. Unfortunately, many programmers are not aware of it.

Compression

Compression shares many of the advantages and disadvantages of expansion and windowing. Among compression's advantages are:

Compression also has comparative advantages with respect to windowing and expansion. For instance, compression requires no more source code changes than does expansion, yet it can store much more information. Further, except for output to human-readable forms, compression requires no function calls -- a significant advantage over windowing. This does not mean that compression is suitable for every situation. Compression works well when:

The compression formats given later provide a wide range of alternatives for date storage. We've included C source code for converting between MMDDYYYY and each compressed format and back again. These calculations rely on knowing leap years; see Listing Two. These are useful when conversion to a user-readable format is required, or some calculation or interface with MMDDYYYY format is needed.

The compressed formats described next are in order of increasing date-storage capacity.

The CYYDDD Format

Using 1900 as a starting date (you can choose any date), one compression method is to count the number of years elapsed since 1900, and the number of days since the beginning of the year. The number of years is recorded in a three-digit field, where the first digit (C) represents the number of centuries elapsed since 1900 and the second and third digits (YY) are the year in the century. Places four through six (DDD) are the day count in the year beginning with January 1. For example, July 4, 1999, is 099185. July 4, 2055, would be 155185. This storage format can count 10 centuries (C from 0 to 9) of 100 years each (YY from 00 to 99), with 365 (366 for a leap year) days per year; it can store a total of 1000 years. Converting MMDDYY to CYYDDD (see Examples 1 and 2) permits dates from January 1, 1900, to December 31, 2899.
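The following is a minimal sketch of the CYYDDD encoding, assuming a single decimal digit for C (so years 1900 through 2899) and a value held as a decimal number rather than as six characters. The function names are hypothetical, and day_of_year is only a stand-in for the article's electronically available MMDD-to-DDD routine, which is not reproduced here.

```c
/* Leap-year test, same rule as Listing Two. */
static int is_leap(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Hypothetical stand-in for an MMDD-to-DDD helper:
   day of year (1..366) for a month and day. No input validation. */
static int day_of_year(int month, int day, int year)
{
    static const int days_before[12] =
        { 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 };
    int ddd = days_before[month - 1] + day;
    if (month > 2 && is_leap(year))
        ddd++;               /* account for February 29 */
    return ddd;
}

/* Encode a date as CYYDDD, counting centuries from 1900.
   CYY is simply (year - 1900), because C = (year / 100) - 19 and
   YY = year % 100. Returns -1 if the year is outside 1900..2899. */
long to_cyyddd(int month, int day, int year)
{
    if (year < 1900 || year > 2899)
        return -1;
    return (long)(year - 1900) * 1000 + day_of_year(month, day, year);
}

/* Recover the four-digit year from a CYYDDD value. */
int cyyddd_year(long cyyddd)
{
    return 1900 + (int)(cyyddd / 1000);
}
```

Printing the result with a "%06ld" format restores the leading zero, so July 4, 1999, appears as 099185.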

This format is useful when some user-readability is important but storage space requirements do not permit date expansion, and date range requirements make the single-century-at-a-time limits of windows unfeasible. Expansion to four-digit years is usually preferable in this case, if practical.

The DDDDDD Format

Another approach is to count the number of days since a certain date rather than the number of years, months, and days. For example, storing date information as DDDDDD -- counting days from January 1, 1900 -- means the same six places that can only store dates from January 1, 1900, to December 31, 1999, in MMDDYY can store 1 million days in DDDDDD. This is equivalent to more than 2700 years. Converting MMDDYY to DDDDDD (see Examples 3 and 4) lets you track dates from January 1, 1900, to past 4600 a.d.
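A day-count conversion might look like the sketch below. It assumes that January 1, 1900, is day 0 (the article does not say whether counting starts at 0 or 1), and every function name is hypothetical. The reverse routine returns the intermediate DDDYYYY form as a single decimal number, ddd * 10000 + yyyy.

```c
static int is_leap(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Day of year (1..366); hypothetical stand-in for an MMDD-to-DDD helper. */
static int day_of_year(int month, int day, int year)
{
    static const int days_before[12] =
        { 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 };
    int ddd = days_before[month - 1] + day;
    if (month > 2 && is_leap(year))
        ddd++;
    return ddd;
}

/* Number of Gregorian leap years in the range 1..y. */
static long leaps_through(long y)
{
    return y / 4 - y / 100 + y / 400;
}

/* Days since January 1, 1900 (assumed to be day 0 in this sketch).
   Returns -1 if the date is before 1900 or does not fit in six digits. */
long to_dddddd(int month, int day, int year)
{
    long n;
    if (year < 1900)
        return -1;
    n = (long)(year - 1900) * 365
        + (leaps_through(year - 1L) - leaps_through(1899L))
        + day_of_year(month, day, year) - 1;
    return (n <= 999999L) ? n : -1;
}

/* Convert a day count back to DDDYYYY, returned as ddd * 10000 + yyyy.
   Walks forward one year at a time; simple rather than fast.
   Assumes n is a non-negative count produced by to_dddddd. */
long dddddd_to_dddyyyy(long n)
{
    int year = 1900;
    for (;;) {
        int len = is_leap(year) ? 366 : 365;
        if (n < len)
            break;
        n -= len;
        year++;
    }
    return (n + 1) * 10000L + year;
}
```

Under these assumptions, July 4, 1999, is stored as 036343 and January 1, 2000, as 036524.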

This format is useful when storage space does not permit expansion and dating requirements do not allow windows. If possible, expansion to four-digit years is still preferable.

The MMDD 16-bit Year Format

One easily implemented and natural way of storing more date information is to leave the month (MM) and day (DD) fields alone and replace the year field (YY) with a bit register. The two-place year field is equivalent to a 16-bit register, assuming that the computer in question puts eight bits in each character byte. Essentially, the format is the functional equivalent of MMDDYYYYY. (Note the extra Y.) Consequently, there is no need for a complicated algorithm -- the year recorded is a YY expanded by windowing, just stored in binary rather than characters or digits. Since 16 bits stores 2^16 - 1 = 65,535 years, dates from January 1, 0001, a.d. to December 31, 65,535, a.d. may be stored without ambiguity.
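As a rough sketch of this layout, the code below keeps MM and DD as ASCII digits in the first four bytes of a six-byte field and writes the year as a 16-bit binary value in the last two. The function names and the big-endian byte order are assumptions of this example; the article does not prescribe either.

```c
/* Pack a date into a six-byte field: "MMDD" as ASCII digits, followed by
   the year as a 16-bit unsigned value, high byte first (byte order is an
   assumption of this sketch). Returns 0 on success, -1 on a range error. */
int pack_mmdd_y16(unsigned char field[6], int month, int day, long year)
{
    if (month < 1 || month > 12 || day < 1 || day > 31 ||
        year < 1 || year > 65535L)
        return -1;
    field[0] = (unsigned char)('0' + month / 10);
    field[1] = (unsigned char)('0' + month % 10);
    field[2] = (unsigned char)('0' + day / 10);
    field[3] = (unsigned char)('0' + day % 10);
    field[4] = (unsigned char)((year >> 8) & 0xFF);   /* high byte */
    field[5] = (unsigned char)(year & 0xFF);          /* low byte  */
    return 0;
}

/* Read the year back out of the last two bytes of the field. */
long unpack_year16(const unsigned char field[6])
{
    return ((long)field[4] << 8) | (long)field[5];
}
```

The month and day remain directly readable in a dump of the record, while the year needs no windowing logic when it is read back.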

This format is generally more useful than four-digit date expansion, unless user-readability of machine-stored dates is a vital requirement. This format has a far wider range and uses less space than four-digit expansion, and converts to and from user-readable date information easily. The format also requires fewer function calls than windowing, hence is more efficient at run time. It also avoids windowing's date-span difficulties while using no more storage space than a windowed date.

The 48-bit Date Format

A format that allows storage of very long time spans is to count the number of days since a given date and store the information as a 48-bit long register. If the storage format of the computer being used equates eight bits to one byte, this 48-bit long register exactly replaces the MMDDYY date format with respect to storage space. This format is similar to the DDDDDD format. (Use the source code given with that format for conversions.) However, using bits rather than separate integers is a more efficient storage method. It is so effective that counting from January 1, 1600, a.d. allows date storage to 768,151,959,528 a.d. This is a sufficient time span for most applications.

The all-bits representation is useful when storing long date ranges. If desired, less than the full 48-bit representation may be used (say, a 40- or 32-bit register) to store date information with reduced storage space. For instance, using four eight-bit registers saves two bytes of space and is sufficient to store a count of 4,294,967,295 days, or about 11.75 million years. This format makes direct user interpretation of stored dates very difficult. Unless reduced storage space for dates is needed or long span date storage is desirable, the MMDD 16-bit year format is preferable.
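Storing a day count in the six bytes formerly occupied by MMDDYY could be sketched as follows. The function names, the big-endian byte order, and the use of the C99 type uint64_t as the in-memory day count are all assumptions of this example.

```c
#include <stdint.h>

/* Store a day count in a six-byte (48-bit) field, most significant byte
   first (byte order is an assumption of this sketch).
   Returns 0 on success, -1 if the count needs more than 48 bits. */
int pack_days48(unsigned char field[6], uint64_t days)
{
    int i;
    if (days >> 48)
        return -1;
    for (i = 5; i >= 0; i--) {
        field[i] = (unsigned char)(days & 0xFF);
        days >>= 8;
    }
    return 0;
}

/* Rebuild the day count from the six-byte field. */
uint64_t unpack_days48(const unsigned char field[6])
{
    uint64_t days = 0;
    int i;
    for (i = 0; i < 6; i++)
        days = (days << 8) | field[i];
    return days;
}
```

A shorter register works the same way with a smaller array and a correspondingly tighter range check.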

Use the algorithms in Examples 3 and 4 (MMDDYYYY to DDDDDD and DDDDDD to MMDDYYYY) for conversions.

Conclusion

The Y2K literature suggests many other date storage formats. These formats are often variations of those suggested in this article (MMDDHH, which counts years in hexadecimal rather than decimal digits, or HDDCYY, which counts months in hexadecimal and years as a century offset (C) plus years, and so on). The comparative advantages and disadvantages of these formats are similar to those of the formats described earlier.

Date compression as a Y2K solution is not appropriate everywhere. For instance, windowing is the necessary choice of desperation for many, given the short time span in which we must implement a solution. Expansion is a straightforward choice, easily understood by anyone. However, when a project is not on a death march, when time spans longer than a century must be stored, or when using extra storage space is inconvenient, compression is an ideal Y2K solution.

DDJ

Listing One

/* routine:  sliding_window
*  Author:  Robert Moore
*  Takes a reference four digit year (year), a window width above the
*  reference year (upperwinwidth), and a one or two-digit year being queried
*  (qyear), and returns a four-digit representation (ryear) of the query
*  year based on the window and the reference year if the query year is one
*  or two digits. Query years greater than two digits are just returned.
*  Setting reference year to a constant would make this a fixed window.
*/
int sliding_window(int year, int upperwinwidth, int qyear)
{
  int yeardate, centurydate, ryear, temp;
  ryear = -1;  /* status that will be returned on error */
  if (qyear < 100)  /* the query year is a two digit year --
                       convert to four */
    {               /* bounds check */
      if ((year >= 0) && (0 < upperwinwidth) && (0 <= qyear))
        {  /* everything a proper bound */
          yeardate = year % 100;  /* convert year to a year */
          centurydate = year - yeardate;  /* and century */
          temp = yeardate + upperwinwidth;  /* calculate upper window limit */
          if (temp < 100)
            {
              if (temp <= qyear)
                ryear = centurydate - 100 + qyear; /* year wraps to previous
                                                      century:  case 1b */
              else
                ryear = centurydate + qyear; /* same century:  case 1a or 1c */
            }
          else
            {
              if (qyear < (temp - 100))
                ryear = centurydate + 100 + qyear; /* year in next century:
                                                      case 2c */
              else
                ryear = centurydate + qyear; /* same century:  case 2a or 2b */
            }
        }
    }
  else
    ryear = qyear; /* three or more digit year--don't convert */
  return ryear;
}


Listing Two

/*---------------------------------------------------------------
 *  Authors:  D. Greg Foley, Robert L. Moore
 *  Description: This function determines if a four digit year is a leap
 *   year. A year is a leap year if it is divisible by 4 but not by 100,
 *   or if it is divisible by 400.
 *  Parameter Description:
 *  int Year - a four-digit year
 *  returns 1 if the year is a Leap Year
 *          0 if the year is NOT a Leap Year
 *  Notes: For clarity of the algorithm no error checking is included.
 *--------------------------------------------------------------*/
int LEAPYEAR(int Year)
{
    if ((Year % 4 == 0 && Year % 100 != 0) || Year % 400 == 0)
      return 1;   /* This is a LEAP year */
    else
      return 0;   /* This is NOT a LEAP year */
}



Copyright © 1998, Dr. Dobb's Journal


MMDDYYYY to CYYDDD
Represent YYYY as (Y1)(Y2)(Y3)(Y4). Then C = 
(Y1*10+Y2)-19. The "YY" in the CYYDDD are Y3 and Y4. 
Use MMDDtoDDD.c (Listing Three; available 
electronically; see "Resource Center," page 3) to 
complete the conversion.

Example 1: MMDDYYYY to CYYDDD.



By Robert L. Moore and D. Gregory Foley

Dr. Dobb's Journal May 1998

CYYDDD to MMDDYYYY
Assuming the CYYDDD count begins with 1900 A.D., 
YYYY = (19+C)*100+YY. Use DDDtoMMDD.c (Listing Five; 
available electronically) to complete the conversion.

Example 2: CYYDDD to MMDDYYYY.



MMDDYYYY to DDDDDD
Use MMDDToDDD.c (Listing Three; available 
electronically) to convert the MMDD part of MMDDYYYY 
to DDD--the day count within a year. Then use 
DDDYYYYToDDDDDD.c (Listing Six; available 
electronically) to convert DDD and the YYYY part of 
MMDDYYYY to DDDDDD.

Example 3: MMDDYYYY to DDDDDD.



DDDDDD to MMDDYYYY
Use DDToDDD4Y.c (Listing Four; available 
electronically) to convert from DDDDDD to DDDYYYY. 
This is a four digit year and a count of days 
starting at January 1 within that year. Use 
DDDToMMDD.c (Listing Five; available electronically) 
on the DDD part of DDDYYYY to convert DDD to MMDD. 
The combined result is MMDDYYYY.

Example 4: DDDDDD to MMDDYYYY.



Figure 1: Sliding windows.

