HDF: The Hierarchical Data Format


The Future of HDF

Dr. Dobb's Journal May 1998

By Mike Folk

Mike is the HDF Project Manager at NCSA. He can be reached at [email protected].


Scientists typically use one computer to produce results and another to analyze and visualize the data further. They also frequently share data with colleagues. Working across a mix of computers, and moving large amounts of data among them, was an early data management problem for many scientists at the University of Illinois's National Center for Supercomputing Applications (NCSA).

In response, NCSA developed the Hierarchical Data Format (HDF) in 1988. NCSA HDF is a portable, self-describing data format for moving and sharing scientific data in networked, heterogeneous computing environments. HDF can store several different kinds of data objects, including multidimensional arrays, raster images, color palettes, and tables. It allows individual scientists to mix and group different kinds of data in one file, according to their needs. NCSA provides a library of APIs for reading and writing HDF as well as workstation tools for visualizing data stored in HDF files.

Although HDF has evolved to meet new requirements, support new kinds of scientific data and applications, and operate effectively in new computing environments, several emerging requirements seriously test its original design. They include:

  • The need to store very large objects (the current HDF limit is two gigabytes).
  • The need to store large numbers of objects (the current limit is 20,000 objects).
  • More general and flexible data models.
  • Performance improvements.
  • Compatibility with object-oriented databases and distributed-object technologies.

To address these new needs, the NCSA HDF project is working on a prototype for the next generation of HDF, codenamed "HDF 5." Current plans call for three fundamental changes in HDF 5:

Unified data model. The proposed data model will support only one basic kind of object: a multidimensional array of atomic elements. Each object will have two required attributes: its dimensionality (the number and sizes of its dimensions) and its data type (the type of the array's elements). More element data types will be supported than in the current HDF, including record structures. Objects may also carry optional user-defined attributes of the form "parameter = value." Users will be able to specify optional physical storage schemes for the data, such as compressed storage and possibly an indexed structure. For backward compatibility, the new object is designed so that every current HDF object can be defined as a subtype of this basic object type.

New file structure. The new file structure will support files and objects of any size, and any number of objects per file. The internal structure for describing objects is simpler than the current one and should provide faster, easier access to objects.

New I/O library. In planning the next-generation HDF library, NCSA developers hope to exploit similarities between HDF and other popular scientific data formats by building a system that understands a variety of data models and formats. The planned library has three layers. APIs at the top level let programs view data according to different data models. These APIs communicate with a middle (arbitration) layer, which interprets their requests in terms of a common model. The bottom, or service, layer consists of file-format drivers, each of which reads from or writes to one file format. Each driver has a well-documented interface for transferring objects and lists of objects to the arbitration layer above it. Candidate drivers for the first implementation include HDF, BigHDF, netCDF, and FITS.

DDJ


Copyright © 1998, Dr. Dobb's Journal

