The CVS Data Format

The CVS data format stores cartographic data for a specific geographic area in a single file. Cesar examines the format, then presents a tool for converting CVS files into DXF format.


May 01, 1999
URL:http://www.drdobbs.com/database/the-cvs-data-format/184410936

May99: The CVS Data Format

Cesar is a researcher for the Landscape Archaeology Research Unit at the University of Santiago de Compostela in Spain. He can be contacted at [email protected].


As a computer specialist working with archaeologists, I've found many areas of activity that suffer from a lack of appropriate tools and methods. One of the most notorious involves using cartographic information to locate archaeological sites and other geographical places and to set them in context. Paper maps are often the only means of dealing with geographical locations, apart from lists of coordinates, which seldom solve any problem. Of course, commercial Geographical Information System (GIS) packages exist, but none combines power with ease of use. They tend to have too many features and are less than intuitive for those lacking computer training, not to mention that they are usually expensive or a pain in the neck to use.

Consequently, other members of the Landscape Archaeology Research Unit at the University of Santiago de Compostela and I decided to invest in research on simple cartographic representations for geographic location and reference. As a result, we designed and implemented a new data format and a small set of accompanying tools.

The biggest problem when dealing with cartographic information is the huge amount of data needed to manage and display a medium-sized area acceptably. Our archaeological work relies heavily on the zoom principle, which holds that any study must be carried out at several scales centered on the same area to be both precise and in context. Moreover, our area of work covers all of Galicia, over 30,000 square kilometers. We also specialize in archaeological impact assessment, which often means working along linear works such as motorways or pipelines, with very long and narrow work areas instead of the classical circular ones.

We envisioned a system capable of displaying a layered contour map while relieving users of intrusive tasks such as changing sheets after hitting a sheet border or changing scales. Another major problem with some GIS tools is the huge number of files they create. We believe users shouldn't have to worry about thousands of files and the relationships among them, so we decided that the system should integrate all the information about a specific wide area, including different levels of detail, in a single file. Rather than attempting automatic geographic generalization, we chose to store precomputed data for the different levels of detail in one file. The system would then select the most appropriate data set from context information such as the working scale, output destination, and user preferences.

Furthermore, because cartography involves so much information, the data inside the files must be indexed for retrieval to be fast enough. A 2D indexing scheme was needed, because cartographic data is almost always retrieved with an inside-rectangle test. Our own experiments showed that raw, unindexed lists of coordinates performed well for small areas (up to 50 km²) but badly above that limit.

The CVS Data Format

The result of our work is the CVS data format (in English, "Segmented Vectorial Cartography"). It stores homogeneous cartographic data for a specific geographic area in a single file, optionally including different levels of detail, and offers a two-dimensional indexing scheme.

A CVS file holds what in classical terms could be called a layer, that is, information about a single thematic coverage. The layer concept has been used extensively in the GIS world and is beyond the scope of this discussion. A complete map usually requires several layers, and therefore several CVS files.

The information inside a CVS file is partitioned into levels. Each level corresponds to a level of detail at which the geographic information of the area can be represented. All the levels in a CVS file cover the same area, but at different levels of detail, so each level is best suited to a specific range of working scales. Levels are the means by which the zoom principle can be applied.

The information inside a CVS file is also partitioned into sectors, each corresponding to a rectangle within the represented area. Sectors are defined separately for each level. Low-detail levels can therefore be partitioned into a few sectors (or even just one), while high-detail levels can be divided into up to 65,536 sectors in the current implementation. Sectors provide the two-dimensional indexing.
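The text does not say how sector rectangles are laid out. It says only that each sector covers a rectangle and that a level may hold up to 65,536 sectors. One hypothetical way to build them is to split a level's bounding box into a regular nx × ny grid and enforce that limit. The following sketch does this in C++, the language the access library was later ported to. The names `Rect`, `makeSectorGrid`, and `kMaxSectorsPerLevel` are ours, not part of the CVS access library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Axis-aligned rectangle in map coordinates (for example, UTM meters).
struct Rect {
    double fromX, fromY, toX, toY;
};

// Per-level sector limit quoted in the text (65,536 = 2^16).
constexpr std::uint64_t kMaxSectorsPerLevel = 65536;

// Hypothetical helper: split a level's bounding box into an nx-by-ny grid of
// equally sized sectors, stored row by row starting at (fromX, fromY).
std::vector<Rect> makeSectorGrid(const Rect& bounds, std::uint32_t nx, std::uint32_t ny) {
    assert(nx > 0 && ny > 0);
    assert(static_cast<std::uint64_t>(nx) * ny <= kMaxSectorsPerLevel);
    const double w = (bounds.toX - bounds.fromX) / nx;
    const double h = (bounds.toY - bounds.fromY) / ny;
    std::vector<Rect> sectors;
    sectors.reserve(static_cast<std::size_t>(nx) * ny);
    for (std::uint32_t j = 0; j < ny; ++j) {
        for (std::uint32_t i = 0; i < nx; ++i) {
            // Snap the last column/row to the exact bounds to avoid rounding gaps.
            const double toX = (i + 1 == nx) ? bounds.toX : bounds.fromX + (i + 1) * w;
            const double toY = (j + 1 == ny) ? bounds.toY : bounds.fromY + (j + 1) * h;
            sectors.push_back({bounds.fromX + i * w, bounds.fromY + j * h, toX, toY});
        }
    }
    return sectors;
}
```

For example, `makeSectorGrid({0, 0, 1000, 500}, 4, 2)` yields eight sectors of 250 × 250 units each. A regular grid keeps the sketch short. A real tool might instead size sectors according to how much data falls in each part of the area.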

Every sector in a CVS file contains curves, because the CVS format was initially designed for contour maps. Each curve is stored as a sequence of points. A curve spanning two or more sectors is split into as many curve segments as needed, all with the same curve identifier. We plan to extend the CVS specification to store kinds of information other than curves.

Format Description

As Figure 1 illustrates, each CVS file contains a file header, a data header, one or more levels, and one or more sectors for each level. The file header contains three things: a magic number identifying the file as a CVS file, the format revision (currently Version 2), and room for future extensions such as a content type. No content type is specified yet because only one is implemented. The data header holds the minimum and maximum x-, y-, and z-coordinates over the whole file, the number of levels in the file, and some data for each level. In turn, each level contains a level header and data for each of its sectors. The level header carries the level's sector count and some information for each sector. Each sector holds a sector header and the cartographic information itself, in the form of curves. Curves are not indexed; each consists of a curve identifier and a sequence of coordinate triplets.
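Because curves are not indexed, reading a sector means walking its curves in order. The following C++ sketch decodes one SectorData from a byte buffer. It relies on several assumptions that the text does not state:

- the sector header is just a 32-bit curve count;
- the curve identifier and vertex count are 32-bit integers and coordinates are 64-bit IEEE doubles (matching the 4- and 8-byte fields in Figure 1's caption);
- all values are stored little-endian.

The names `Reader`, `Curve`, and `readSectorData` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

struct Vertex { double x, y, z; };
struct Curve {
    std::int32_t id;               // shared by every segment of the same curve
    std::vector<Vertex> vertices;  // coordinate triplets, in path order
};

// Sequential reader. ASSUMPTION: CVS values are stored little-endian.
class Reader {
public:
    explicit Reader(const std::vector<unsigned char>& buf) : buf_(buf) {}

    std::int32_t i32() { return static_cast<std::int32_t>(static_cast<std::uint32_t>(le(4))); }

    double f64() {                 // ASSUMPTION: IEEE 754 doubles
        const std::uint64_t bits = le(8);
        double d;
        std::memcpy(&d, &bits, sizeof d);
        return d;
    }

private:
    // Assemble n bytes, least significant first, then advance the cursor.
    std::uint64_t le(std::size_t n) {
        if (buf_.size() - pos_ < n) throw std::out_of_range("truncated SectorData");
        std::uint64_t v = 0;
        for (std::size_t k = n; k > 0; --k) v = (v << 8) | buf_[pos_ + k - 1];
        pos_ += n;
        return v;
    }
    const std::vector<unsigned char>& buf_;
    std::size_t pos_ = 0;
};

// Decode one SectorData: a curve count, then for each curve a CurveHeader
// (identifier, vertex count) followed by that many x, y, z triplets.
// No index exists, so curve k can only be reached after reading curves 0..k-1.
std::vector<Curve> readSectorData(const std::vector<unsigned char>& buf) {
    Reader r(buf);
    const std::int32_t curveCount = r.i32();
    std::vector<Curve> curves;
    for (std::int32_t c = 0; c < curveCount; ++c) {
        Curve curve;
        curve.id = r.i32();
        const std::int32_t vertexCount = r.i32();
        for (std::int32_t k = 0; k < vertexCount; ++k) {
            Vertex v;
            v.x = r.f64();
            v.y = r.f64();
            v.z = r.f64();
            curve.vertices.push_back(v);
        }
        curves.push_back(curve);
    }
    return curves;
}
```

Decoding byte by byte, rather than casting the buffer to a struct, keeps the sketch independent of the host's byte order and of compiler padding.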

The file pointers in the LevelInfo and SectorInfo data elements of Figure 1 point to LevelData and SectorData, respectively, and form the foundation of the indexing mechanism. The ScaleFrom and ScaleTo fields of each LevelInfo are stored in meters per pixel (MPP), a convenient way to express scale on digital media. The higher these values, the lower the level of detail. On a 17-inch monitor at 1024×768 pixels, 100 MPP corresponds to roughly a 1:320,000 conventional scale. Coordinates are stored as 64-bit floating-point numbers. This lets the CVS data format handle Universal Transverse Mercator (UTM) coordinates, our system of choice because all of Galicia lies within a single UTM zone.
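Converting between MPP and a conventional scale is simple arithmetic. Divide the ground distance one pixel covers by the physical width of that pixel on screen. The following C++ sketch assumes a pixel pitch of 0.3125 mm, which is a visible image about 320 mm wide spread over 1,024 pixels. That is the pitch at which 100 MPP comes out at exactly 1:320,000; real monitors vary. The function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: denominator N of the conventional scale 1:N obtained when
// each screen pixel covers `mpp` meters on the ground. `pixelPitchMm` is the
// physical width of one pixel on the monitor, in millimeters.
double scaleDenominator(double mpp, double pixelPitchMm) {
    assert(pixelPitchMm > 0.0);
    const double pixelMeters = pixelPitchMm / 1000.0;
    return mpp / pixelMeters;  // ground meters per screen meter
}

// Inverse helper: MPP needed to show a map at 1:denominator.
double metersPerPixel(double denominator, double pixelPitchMm) {
    assert(pixelPitchMm > 0.0);
    return denominator * (pixelPitchMm / 1000.0);
}
```

With that assumed pitch, `scaleDenominator(100.0, 0.3125)` gives 320,000, up to floating-point rounding. Likewise, `metersPerPixel(100000.0, 0.3125)` gives 31.25 MPP for a 1:100,000 view.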

Finally, the CVS data format uses a small trick to improve data retrieval performance. Each sector stores all the vertices of the curves it includes, plus up to two additional vertices for each curve segment. The first is the vertex just before the first vertex inside the sector (if the curve starts outside the sector). The second is the vertex just after the last vertex inside it (if the curve ends outside the sector). As a result, each sector holds the complete path of every curve segment it contains. See Figure 2 for details on these off-by-one vertices.
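The following C++ sketch shows one way to compute what a sector stores for a single curve. For every maximal run of consecutive vertices inside the sector, it keeps the run plus the vertices immediately before and after it, when they exist. The `inside` flag plays the role we assume the bInside value returned by GetVertex plays in Listing One. The types and function names are hypothetical. Treating the sector rectangle as closed, so that a vertex on a shared border belongs to both sectors, is also our assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pt { double x, y, z; };
struct Rect { double fromX, fromY, toX, toY; };

// One stored vertex; `inside` is false for the off-by-one vertices.
struct SegVertex { Pt p; bool inside; };
using Segment = std::vector<SegVertex>;

// Closed-rectangle containment test on x and y (z is ignored).
bool contains(const Rect& r, const Pt& p) {
    return p.x >= r.fromX && p.x <= r.toX && p.y >= r.fromY && p.y <= r.toY;
}

// Split one curve into the segments stored for sector r: each maximal run of
// consecutive inside vertices, plus the vertex just before and just after it.
std::vector<Segment> segmentsForSector(const std::vector<Pt>& curve, const Rect& r) {
    std::vector<Segment> out;
    std::size_t i = 0;
    while (i < curve.size()) {
        if (!contains(r, curve[i])) { ++i; continue; }
        Segment seg;
        if (i > 0) seg.push_back({curve[i - 1], false});          // leading off-by-one vertex
        while (i < curve.size() && contains(r, curve[i])) {
            seg.push_back({curve[i], true});
            ++i;
        }
        if (i < curve.size()) seg.push_back({curve[i], false});   // trailing off-by-one vertex
        out.push_back(seg);
    }
    return out;
}
```

This vertex-based sketch misses a curve edge that crosses a sector without placing any vertex inside it. A production converter would also need to clip edges against the sector boundary.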

How It Works

Assume that a CVS file is stored on disk, and that some piece of software wants to read it to display a map. The software first checks the magic number, to reduce the risk of file type conflicts, and verifies that the file's CVS revision is compatible with its own. It then checks that the map it intends to display intersects the area specified by the data header fields FromX, FromY, FromZ, ToX, ToY, and ToZ. If it does not, the CVS file contains no useful data.

If this test succeeds, the software scans the LevelInfo elements, looking at their ScaleFrom and ScaleTo fields, to find the one with an appropriate range of scales. It then follows that LevelInfo's Pointer to the corresponding LevelData, which contains a header with the sector count and some information for each sector. Scanning the SectorInfo elements, the software builds a list of the sectors to retrieve by testing whether each sector's area, given by its FromX, FromY, ToX, and ToY fields, intersects the wanted map area.

The software then iterates over this list, using each SectorInfo's Pointer field to navigate to the corresponding SectorData. From this element it reads the curve count and iterates over every curve. Curves are neither indexed nor delimited, so in the current form of CVS, retrieving all the curves in a sector is a strictly sequential process. Each curve starts with a CurveHeader element that holds a curve identifier and a vertex count. A sequence of vertices follows, each consisting of x-, y-, and z-coordinates.
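The lookup steps just described can be sketched in C++ as follows, assuming the headers have already been loaded into memory. The field names follow the text (ScaleFrom, ScaleTo, Pointer, FromX, and so on), and the magic and revision values come from Figure 1. The struct layouts and function names are hypothetical. Treating "compatible revision" as "equal revision" is an assumption. So is reading each level's scale range as inclusive and accepting its bounds in either order, since the text does not say which bound is larger.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Values from Figure 1.
constexpr std::uint32_t kCvsMagic = 0xABCDEF88u;
constexpr std::uint32_t kCvsRevision = 2;

struct Rect { double fromX, fromY, toX, toY; };
struct SectorInfo { Rect area; std::uint32_t pointer; };  // Pointer -> SectorData
struct Level {                                             // LevelInfo plus its LevelData
    double scaleFrom, scaleTo;                             // meters per pixel (MPP)
    std::vector<SectorInfo> sectors;
};

// Step 1: accept only files with the right magic number and a known revision.
bool isSupported(std::uint32_t magic, std::uint32_t revision) {
    return magic == kCvsMagic && revision == kCvsRevision;
}

// Inclusive rectangle-overlap test, usable for the file bounds and for sectors.
bool intersects(const Rect& a, const Rect& b) {
    return a.fromX <= b.toX && b.fromX <= a.toX &&
           a.fromY <= b.toY && b.fromY <= a.toY;
}

// Step 2: index of the first level whose scale range contains the working MPP.
std::optional<std::size_t> pickLevel(const std::vector<Level>& levels, double mpp) {
    for (std::size_t i = 0; i < levels.size(); ++i) {
        const double lo = std::min(levels[i].scaleFrom, levels[i].scaleTo);
        const double hi = std::max(levels[i].scaleFrom, levels[i].scaleTo);
        if (mpp >= lo && mpp <= hi) return i;
    }
    return std::nullopt;
}

// Step 3: file offsets of the sectors that must be read to draw `view`.
std::vector<std::uint32_t> sectorsToRead(const Level& level, const Rect& view) {
    std::vector<std::uint32_t> pointers;
    for (const SectorInfo& s : level.sectors)
        if (intersects(s.area, view)) pointers.push_back(s.pointer);
    return pointers;
}
```

The same `intersects` test covers the file-level check against the data header's FromX, FromY, ToX, and ToY. Each pointer returned by `sectorsToRead` would then be followed to a SectorData and read sequentially.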

Tools

We've developed a number of tools to work with CVS files.

Current Use and Future Enhancements

The CVS data format currently provides cartographic facilities to our main information system, which is used by 25 simultaneous users several hours a day. The CVS files in use cover all of Galicia and integrate its full 1:100,000 cartography. CVS files live on our applications server, and each client reads them through a local copy of the CVS access library. The first improvement we have made to the described set of tools is porting the CVS access library from Visual Basic 5 to Visual C++ 5, which brought some performance gains. (We have not measured them formally, but our experience suggests that these modest gains come mainly from the disk-access mechanisms of the C++ libraries compared with those of Visual Basic. Thanks to my colleague Roberto Gomez, who ported the CVS access library to Visual C++.) We are now planning to redesign it as a server-side component. That way, only the selected sectors travel through the network, and the sequential portion of the work (iterating over all curves in each sector) runs on the server. We have experimented with DCOM and found it suitable for a design like this.

We are also planning to convert the CVSTest tool into an ActiveX control, including a canvas and enough functionality to draw and manage multilayer maps, so that any application written in an ActiveX-hosting language could use it. The DAT2DXF converter also needs improvement in both performance and disk-space requirements. Finally, extending the CVS data format to host content kinds other than curves should be straightforward, and we plan to do it. We consider digital elevation models and archaeological site distributions as candidate content kinds.

We routinely overlay other data that we use (such as archaeological sites) on top of CVS layers, pulling it from Microsoft SQL Server 6.5, Microsoft Access 96, and CA Jasmine databases, depending on the system. Our main internal working system, the SIA+ Archaeological Information System (an integrated information system for the management of archaeological sites and finds, assessments, projects, people, documents, and images; see http://wwwgtarpa.usc.es/), pulls data from a 45-MB Access database to show geographic locations, zones, and sites atop the CVS layers.

The CVS data format is an inexpensive and easy-to-use solution for applications that need to display and operate on contour maps. We know that many improvements are still needed to make the CVS data format a professional solution, and any help or collaboration is welcome.

DDJ

Listing One

'Dump every sector, curve, and vertex of the first level of a CVS file
'to the Immediate window. Assumes a project reference to the CVS access
'library, which provides the Layer, Level, and Sector classes.
'The procedure name is illustrative.
Public Sub DumpCVSFile()
    'open a CVS file.
    Dim ly As New Layer
    ly.OpenFile "C:\Temp\Test.cvs"

    'get the first level.
    Dim lv As Level
    Set lv = ly.Levels(1)

    'iterate all sectors.
    Dim lSectorIdx As Long
    Dim sc As Sector
    For lSectorIdx = 1 To lv.Sectors.Count
        'get sector.
        Set sc = lv.Sectors(lSectorIdx)
        'output data.
        Debug.Print "Sector " & CStr(lSectorIdx) & ":"

        'begin retrieving curve data for this sector.
        Dim lCurveCount As Long
        sc.BeginGetData lCurveCount

        'iterate all curves in this sector (curves are read sequentially).
        Dim lCurveIdx As Long, lId As Long
        For lCurveIdx = 1 To lCurveCount
            'get curve info.
            Dim lVertexCount As Long
            sc.GetCurveInfo lId, lVertexCount
            'output data.
            Debug.Print "  Curve " & CStr(lId) & " with " & _
                CStr(lVertexCount) & " vertices:"
            'iterate vertices for this curve.
            Dim lVertexIdx As Long
            For lVertexIdx = 1 To lVertexCount
                'get vertex data; bInside is presumably False for the
                'off-by-one vertices lying outside the sector (see Figure 2).
                Dim dX As Double, dY As Double, dZ As Double
                Dim bInside As Boolean
                sc.GetVertex dX, dY, dZ, bInside
                'output data.
                Debug.Print "    Vertex (" & CStr(dX) & ", " & CStr(dY) & ", " & _
                    CStr(dZ) & ") " & CStr(bInside)
            Next lVertexIdx
        Next lCurveIdx

        'end retrieving curve data.
        sc.EndGetData
    Next lSectorIdx

    'close CVS file.
    ly.CloseFile
End Sub



Copyright © 1999, Dr. Dobb's Journal

Figure 1: The CVS data format. Magic equals 0xABCDEF88, and CVSRevision equals 2. Definitions on the left side are expanded on the right side. Data elements with labels that end with a colon are defined later. Numbers preceded by a plus sign above the data elements show the byte offset of each data element from the start of the definition. Numbers below data elements show the length in bytes of the data element (4 means a 32-bit integer and 8 means a 64-bit floating-point number).



Figure 2: Curve segment inside a sector. In (a), the curve is represented by a thick dark line, and curve vertices are marked as small circles. The sector boundaries are drawn as rectangles. Vertices inside the sector are marked dark, while vertices outside the sector are hollow. The curve path for the sector is drawn as a thin gray line beneath the curve line. Notice that two vertices outside the sector are needed to fully specify the curve path; these two vertices are called "off-by-one vertices." With off-by-one vertices, the curve segment can be drawn as in (b). Without them, the curve could only be drawn as in (c).



Figure 3: CVSView. The file name and size are displayed at the top. The CVS revision number follows. Data header information is then displayed, including the byte offset of each field in the file (in both hexadecimal and decimal notations) and the field value itself. Notice the first level pointer at offset 0x84, pointing at 0x9C, and corresponding level data at the bottom starting at that offset.



Figure 4: CVSEdit. The hierarchy of levels, sectors, curves, and vertices can be seen. Properties with a blue icon are user editable.



Figure 5: CVSTest. Contours and rivers can be seen on the map, corresponding to two different CVS files. Sectors for the current level of the contours file are displayed in blue. An information window (floating on the map) shows data about levels and sectors being used.

