The CVS Data Format


May99: The CVS Data Format

Cesar is a researcher for the Landscape Archaeology Research Unit at the University of Santiago de Compostela in Spain. He can be contacted at [email protected].


As a computer specialist working with archaeologists, I've found many areas of activity that suffer from lack of appropriate tools and methods. One of the most notorious areas involves the use of cartographic information to locate and set in context archaeological sites and other geographical places. Paper maps are often the only means of dealing with geographical locations, apart from lists of coordinates, which seldom solve any problem. Of course, commercial Geographical Information System (GIS) packages exist, but none combine power with ease of use. They tend to have too many features and are less than intuitive for those lacking computer training -- not to mention they are usually expensive or a pain in the neck to use.

Consequently, I and other members of the Landscape Archaeology Research Unit at the University of Santiago de Compostela decided to invest in research on simple cartographic representations for geographic location and reference. As a result, we designed and implemented a new data format and a small set of accompanying tools.

The biggest problem when dealing with cartographic information is the huge amount of data needed to acceptably manage and display a medium-sized area. Our archaeological work is strongly based on the zoom principle, which says that any study must be done at several scales centered around the same area to be precise and in context. Also, our area of work covers the whole of Galicia, roughly 30,000 square kilometers. In addition, we specialize in archaeological impact assessment, which often involves linear works such as motorways or pipelines, with very long and narrow work areas instead of the classical circular ones.

We envisioned a system capable of displaying a layered contour map, relieving users of intrusive tasks such as changing sheets after hitting a sheet border or changing scales. Also, a major problem in some GIS tools is the huge number of files they create. Since we believe that users shouldn't have to worry about thousands of files and the relationships among them, we decided that the system should integrate all the information about a specific wide area in a single file, including different levels of detail. We did not attempt to perform automatic geographic generalization, but instead to store already-computed data about different levels of detail in one file. The system would then select the most appropriate data set from context information, such as the working scale, output destination, and user preferences.

Furthermore, the huge amount of information required to deal with cartography results in the need to index the data inside the files so retrieval is fast enough. A 2D indexing scheme was needed, because cartographic data is almost always retrieved following an inside-rectangle test. Our own experiments showed that raw lists of coordinates with no indexing did well in small areas (up to 50 km²), but performed badly above this limit.

The CVS Data Format

The result of our work is the CVS data format (the acronym translates into English as "Segmented Vectorial Cartography"), which stores homogeneous cartographic data for a specific geographic area in a single file, optionally including different levels of detail and offering a two-dimensional indexing scheme.

A CVS file holds what in classical terms could be called a layer, or information relating to a single thematic coverage. The layer concept has been extensively used in the GIS world and is beyond the scope of this discussion. To obtain a complete map, several layers are usually necessary, so several CVS files are needed.

The information inside a CVS file is partitioned into levels, each corresponding to a level of detail at which the geographic information of the area can be represented. In fact, all the levels in a CVS file represent the same area, but at different levels of detail. Thus, each level is best suited to display within a specific range of working scales. Levels are the means by which the zoom principle can be successfully applied.

Also, the information inside a CVS file is partitioned into sectors, each corresponding to a rectangle within the area to be represented. Division into sectors is made separately for each level, so low-detail levels can be partitioned into few sectors (even just one), and high-detail levels can be divided into up to 65,536 sectors in the current implementation. Sectors are the way to achieve two-dimensional indexing.

Every sector in a CVS file contains curves, as the CVS format is initially oriented to deal with contour maps. Each curve is stored as a sequence of points. A curve spanning two or more sectors is accordingly split into as many curve segments as needed, all of them with the same curve identifier. We plan to extend the CVS specification with capabilities to store kinds of information other than curves.

Format Description

As Figure 1 illustrates, each CVS file contains a file header, data header, one or more levels, and one or more sectors for each level. The file header contains a magic number identifying the file as a CVS file, information about format revision (currently Version 2), and room for future extensions such as the content type. Currently, no content type is specified as only one is implemented. The data header holds the minimum and maximum values for x-, y-, and z-coordinates in the whole file, the number of levels in the file, and some data for each level. In turn, each level contains a level header and data for each sector inside it. The level header carries the sector count for this level, and some information for each sector. Each sector holds a sector header and the cartographic information itself in the form of curves. Curves are not indexed, and consist of a curve identifier and a sequence of coordinate triplets.

The file pointers from the LevelInfo and SectorInfo data elements in Figure 1 point to LevelData and SectorData, respectively, and constitute the foundation of the indexing mechanism. The ScaleFrom and ScaleTo fields for each LevelInfo are stored in meters per pixel (MPP), a good way to express scale on digital media. The higher these values, the lower the level of detail. On a 17-inch monitor with a resolution of 1024×768 pixels, 100 MPP corresponds to a conventional scale of roughly 1:320,000. Also, 64-bit floating-point numbers are used to store coordinates, allowing the CVS data format to deal with Universal Transverse Mercator (UTM) coordinates, our system of choice as the whole of Galicia is contained in a single UTM zone.
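The relationship between MPP and a conventional scale can be checked with a little arithmetic. The following Python sketch is only an illustration: the function name and the pixel pitch are assumptions (a visible image about 320 mm wide on a 17-inch monitor, shown at 1024 pixels across), not values taken from the CVS tools.

```python
def scale_denominator(mpp: float, pixel_pitch_mm: float) -> float:
    """Return N in the conventional scale 1:N for a display that shows
    `mpp` meters of terrain per pixel, given the physical pixel size."""
    pixel_pitch_m = pixel_pitch_mm / 1000.0
    return mpp / pixel_pitch_m


# Assumed pixel pitch (hypothetical): a visible image about 320 mm wide,
# spread over 1024 pixels, gives 320 / 1024 = 0.3125 mm per pixel.
ASSUMED_PITCH_MM = 320.0 / 1024.0

# 100 MPP on that display works out to a scale of about 1:320,000.
print(round(scale_denominator(100.0, ASSUMED_PITCH_MM)))
```

A different monitor size or resolution changes the pixel pitch and therefore the conventional scale, which is why MPP is the more convenient unit for data meant to be shown on screen.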

Finally, the CVS data format performs a little trick to improve data retrieval performance. Each sector stores all the vertices of the curves it includes, plus two more optional vertices for each curve, one before the first vertex inside the sector (in case the curve starts outside the sector), and the other after the last vertex (in case the curve ends outside the sector). This offers the whole path of a curve segment for each sector. See Figure 2 for details on these off-by-one vertices.
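The following Python sketch shows one way such per-sector segments could be derived from a complete curve. It is not the DAT2CVS implementation: the function names are hypothetical, the inclusive rectangle test is an assumption, and the Boolean attached to each vertex merely mirrors what the bInside argument of GetVertex in Listing One presumably reports. Curves that cross a sector without leaving any vertex inside it are ignored here, since the text does not say how the format treats them.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]        # (x, y, z)
Rect = Tuple[float, float, float, float]  # (from_x, from_y, to_x, to_y)


def inside(p: Point, rect: Rect) -> bool:
    """True if the point's x and y fall within the rectangle (borders included)."""
    return rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3]


def segments_for_sector(curve: List[Point], rect: Rect) -> List[List[Tuple[Point, bool]]]:
    """Split a curve into the segments stored for one sector.

    Each segment is a run of consecutive vertices inside the sector, plus
    the vertex just before the run and the vertex just after it when they
    exist. Every vertex is paired with a flag telling whether it is inside.
    """
    segments = []
    i, n = 0, len(curve)
    while i < n:
        if not inside(curve[i], rect):
            i += 1
            continue
        start = i
        while i < n and inside(curve[i], rect):
            i += 1
        segment = []
        if start > 0:                      # curve enters from outside
            segment.append((curve[start - 1], False))
        segment.extend((p, True) for p in curve[start:i])
        if i < n:                          # curve leaves the sector
            segment.append((curve[i], False))
        segments.append(segment)
    return segments


# A straight contour at z = 10 crossing a sector that spans x = 5..10.
curve = [(0.0, 5.0, 10.0), (4.0, 5.0, 10.0), (6.0, 5.0, 10.0),
         (9.0, 5.0, 10.0), (12.0, 5.0, 10.0)]
for vertex, is_inside in segments_for_sector(curve, (5.0, 0.0, 10.0, 10.0))[0]:
    print(vertex, is_inside)
```

With the extra vertices in place, a viewer can draw the stretch of the curve that crosses the sector border without having to read the neighboring sector.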

How It Works

Assume that a CVS file is stored on disk, and that some piece of software wants to read it to display a map. After checking the magic number to reduce the risk of file type conflicts, and verifying that the CVS revision of the file is compatible with the revision the software supports, the software checks that the map it intends to display intersects the area specified by the data header fields FromX, FromY, FromZ, ToX, ToY, and ToZ. If it does not, the CVS file contains no useful data.

If this test is successful, the software scans through every LevelInfo element to find the one with an appropriate range of scales, looking at the ScaleFrom and ScaleTo fields. Once found, the software can follow the LevelInfo's Pointer into a LevelData, which contains a header with the sector count and some information for each sector. Scanning through the SectorInfo elements, the software builds a list of the sectors to be retrieved to draw the map, by computing whether or not each sector area, given by the FromX, FromY, ToX, and ToY fields, intersects the wanted map area.

Once this list is built, the software must iterate over it, navigating to the cartographic information by following each SectorInfo's Pointer field into the corresponding SectorData. From this element, the software reads the curve count and starts iterating over every curve. Curves are not indexed or delimited, so retrieving all the curves in a sector is, in the current form of CVS, a strictly sequential process. Each curve starts with a CurveHeader element that holds a curve identifier and a vertex count, followed by a sequence of vertices, each one consisting of x-, y-, and z-coordinates.
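The level and sector selection just described can be sketched in a few lines of Python. This is an in-memory illustration, not the access library: the class and function names are hypothetical, the sector list is attached directly to each level instead of being reached through a file pointer, and the sketch assumes that ScaleFrom is the lower bound and ScaleTo the upper bound of the MPP range.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (from_x, from_y, to_x, to_y)


@dataclass
class SectorInfo:
    from_x: float
    from_y: float
    to_x: float
    to_y: float
    pointer: int  # offset of the matching SectorData (not used in this sketch)


@dataclass
class LevelInfo:
    scale_from: float  # meters per pixel, assumed lower bound
    scale_to: float    # meters per pixel, assumed upper bound
    sectors: List[SectorInfo] = field(default_factory=list)


def pick_level(levels: List[LevelInfo], mpp: float) -> Optional[LevelInfo]:
    """Return the first level whose scale range contains the working scale."""
    for level in levels:
        if level.scale_from <= mpp <= level.scale_to:
            return level
    return None


def intersects(sector: SectorInfo, view: Rect) -> bool:
    """Rectangle intersection test between a sector and the wanted map area."""
    vx0, vy0, vx1, vy1 = view
    return not (sector.to_x < vx0 or sector.from_x > vx1
                or sector.to_y < vy0 or sector.from_y > vy1)


def sectors_to_read(level: LevelInfo, view: Rect) -> List[SectorInfo]:
    """List the sectors of a level that must be read to draw the view."""
    return [s for s in level.sectors if intersects(s, view)]
```

Only the sectors returned by sectors_to_read need to be visited; their curves are then read sequentially, as in Listing One.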

Tools

We've developed a number of tools to work with CVS files.

  • Format converter, which creates CVS files from Drawing eXchange Format (DXF) files. DXF files are a common way to interchange vectorial drawings, and AutoCAD (our main digital input tool) can output them easily. We already had a DXF parser and converter developed in-house, so the finite state machine that extracts information from DXF files already existed. We therefore decided to let our DXF parser convert DXF files into an intermediate format called "DAT," and to build a DAT-to-CVS converter.

    The DAT2CVS converter works by first specifying a DAT input file and a CVS output file (which will be created), and then by mapping one or more layers in the DAT file -- retained from the original DXF file -- to each of the desired levels in the output CVS file. A DAT layer can be input to none, one, or several levels, and each level can merge data from one or more DAT layers.

    After specifying how many levels are desired, and setting the mapping options between layers and levels, a quad tree depth must be chosen for each level. The whole area spanned by the DAT file is then recursively divided into quarters up to the selected depth. The current implementation of the DAT2CVS converter allows the user to specify a lower limit of 0 (resulting in just one sector, or no division) and an upper limit of 8 (resulting in 65,536 sectors), although the CVS format is not limited in this way.

    Some options can also be changed, such as the directory for temporary files and the Ratio Source to Destination (RSD), which is used to decrease the amount of disk space used during the conversion. (Reducing the RSD can lower the disk space required during conversion, but the more it is reduced, the greater the chances of unrecoverable errors. In the case of such errors, the DAT2CVS converter sends a message, recommends adjusting the RSD, and quits. Conversion must then be started again.) Huge amounts of disk space are usually needed during conversion. As a rule of thumb, an RSD of 70 percent usually works without a glitch when converting conventional maps with an even coverage of contours and four or more sectors. In case of error, raise the RSD, up to the safest value of 100 percent.

    The format converter (available electronically; see "Resource Center," page 5) is written in Visual Basic and includes a complete setup package (including DLLs and other components).

  • Access library, which encapsulates the particularities of the CVS data format. In its current form, it is an ActiveX DLL that exports five classes: Layer, Levels, Level, Sectors, and Sector. It has been used with Visual Basic 5 programs with great success. The code for the CVS access library presented in Listing One opens a CVS file, gets its first level, and dumps all the curves and vertices of all sectors in that level. To use this code, you need the CVS access library and Microsoft Visual Basic 5.

  • Viewing and dumping tools, which (as their names suggest) view and dump the contents of CVS files. CVSView dumps the contents of any CVS file to the desired depth. Figure 3 shows a CVSView dump.

    CVSEdit is more sophisticated, as it shows the internal structure of a CVS file in the form of a tree, and allows editing some data fields such as scale ranges for each level or coordinate values. CVSEdit uses the CVS access library described above. Figure 4 shows this tool.

  • Visualizing tools, which test the performance and usability of cartographic user interfaces based on the CVS data format. They use the CVS access library to read and display several layers of information totaling 20.5 MB. The tools allow panning and zooming, automatically computing which level is best to use and which sectors to display. Figure 5 shows CVSTest.
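The quad tree depth chosen in the format converter determines how many sectors a level receives. The Python sketch below assumes a uniform division, in which every quarter is divided again until the selected depth is reached, so that a depth d yields a grid of 2^d by 2^d equal rectangles. The function names and the order in which the rectangles are produced are hypothetical; the article does not specify how DAT2CVS orders sectors in the file.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (from_x, from_y, to_x, to_y)


def sector_count(depth: int) -> int:
    """Each extra level of depth splits every sector into four."""
    return 4 ** depth


def sector_rects(from_x: float, from_y: float,
                 to_x: float, to_y: float, depth: int) -> List[Rect]:
    """Rectangles produced by dividing the area into quarters `depth` times.

    Assumes a uniform division, so the result is a regular grid with
    2**depth cells per side, listed row by row (an arbitrary order).
    """
    per_side = 2 ** depth
    width = (to_x - from_x) / per_side
    height = (to_y - from_y) / per_side
    rects = []
    for row in range(per_side):
        for col in range(per_side):
            rects.append((from_x + col * width,
                          from_y + row * height,
                          from_x + (col + 1) * width,
                          from_y + (row + 1) * height))
    return rects


print(sector_count(0), sector_count(8))  # limits allowed by DAT2CVS
print(len(sector_rects(0.0, 0.0, 100.0, 100.0, 2)))
```

Depth 0 leaves the area undivided and depth 8 gives the 65,536 sectors mentioned above.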

Current Use and Future Enhancements

The CVS data format is currently being used to provide cartographic facilities to our main information system, used by 25 simultaneous users several hours a day. The CVS files being used cover the whole of Galicia, and integrate the full 1:100,000 cartography of this area. CVS files live on our applications server, and each client reads them through a local copy of the CVS access library. The first improvement we have made to the described set of tools is to port the CVS access library from Visual Basic 5 to Visual C++ 5, achieving some performance improvements. (We have not performed measured tests, but our experience indicates that the slight improvements are mainly due to the disk-access mechanisms used by C++ libraries in comparison to those of Visual Basic. Thanks to my colleague Roberto Gomez, who ported the CVS access library into Visual C++.) Currently, we are planning to redesign it as a server-side component so only the selected sectors travel through the network, and the sequential portion of the work (iterating over all curves in each sector) benefits from being executed on the server. We have experimented with DCOM and found it suitable for a design like this.

We are also planning to convert the CVSTest tool into an ActiveX control, including a canvas and enough functionality to draw and manage multilayer maps, so any application written in any ActiveX-hosting language could use it. Also, the DAT2CVS converter must be improved in both performance and disk space requirements. Finally, extending the CVS data format to host content kinds other than curves should be straightforward, and we plan to do so. We consider digital elevation models and archaeological site distributions as candidate content kinds.

We routinely overlay other data that we use (such as archaeological sites) on top of CVS layers, pulling it from Microsoft SQL Server 6.5, Microsoft Access 96, and CA Jasmine databases, depending on the system. Our main internal working system, the SIA+ Archaeological Information System (an integrated information system for the management of archaeological sites and finds, assessments, projects, people, documents, and images; see http://wwwgtarpa.usc.es/), pulls data from a 45-MB Access database to show geographic locations, zones, and sites atop the CVS layers.

The CVS data format is an inexpensive and easy-to-use solution for applications that need to display and operate on contour maps. We know that many improvements are still necessary to make the CVS data format a professional solution. Any help or collaboration will be welcome.

DDJ

Listing One

'open a CVS file.
Dim ly As New Layer
ly.OpenFile "C:\Temp\Test.cvs"

'get the first level.
Dim lv As Level
Set lv = ly.Levels(1)

'iterate all sectors.
Dim lSectorIdx As Long
Dim sc As Sector
For lSectorIdx = 1 To lv.Sectors.Count
    'get sector.
    Set sc = lv.Sectors(lSectorIdx)
    'output data.
    Debug.Print "Sector " & CStr(lSectorIdx) & ":"
    
    'begin retrieving curve data for this sector.
    Dim lCurveCount As Long
    sc.BeginGetData lCurveCount
    
    'iterate all curves in this sector.
    Dim lCurveIdx As Long, lId As Long
    For lCurveIdx = 1 To lCurveCount
        'get curve info.
        Dim lVertexCount As Long
        sc.GetCurveInfo lId, lVertexCount
        'output data.
        Debug.Print "  Curve " & CStr(lId) & " with " _
                            & CStr(lVertexCount) & " vertices:"
        'iterate vertices for this curve.
        Dim lVertexIdx As Long
        For lVertexIdx = 1 To lVertexCount
            'get vertex data.
            Dim dX As Double, dY As Double, dZ As Double
            Dim bInside As Boolean
            sc.GetVertex dX, dY, dZ, bInside
            'output data.
            Debug.Print "    Vertex (" & CStr(dX) & ", " & CStr(dY) & ", " _
                                        & CStr(dZ) & ") " & CStr(bInside)
        Next lVertexIdx
    Next lCurveIdx
    
    'end retrieving curve data.
    sc.EndGetData
Next lSectorIdx

'close CVS file.
ly.CloseFile


Copyright © 1999, Dr. Dobb's Journal
