Channels ▼

Walter Bright

Dr. Dobb's Bloggers

Arrays: Core or Library Type?

May 28, 2008

As programming languages provide ever more powerful ways to specify user defined types, where do arrays fit in? Are arrays a fundamental type that must be in the core language, or are they a type that can be built up from other core types, and so be a user defined library type?

 

In languages that have pointers, there are several ways in which an array of type T could be represented, such as (in D programming language notation):

  1. length and pointer to the array data:
    struct Array { size_t length; T* begin; }
  2. pointer pair representing the start and end of the array:
    struct Array { T* begin; T* pastEnd; }
  3. length prefixed array data:
    struct Array { size_t length; ... data ... }
  4. slice out of memory block:
    struct Array { void* datablock; T* begin; size_t length; }

Add in some operator overloading for [ ] and other boilerplate, and there's a nice, servicable user defined array type.

Or is it?

Consider it from the perspective of a tool that is analyzing the types. That tool could be the compiler, the optimizer, a debugger, the reflection/introspection facilities, the garbage collector, the security auditor, etc. Some of these tools will collect information at compile time (static analysis) while others at runtime (dynamic analysis). The tool will see the pointer(s), and that's it. It will have no idea that it is pointing to an array, how many elements are in that array, or even if the pointer is pointing off the end of an array and is not pointing at any valid data at all. It hits an impermeable wall in its attempt to discover things about the program both at compile time and at run time.

There are a couple of ways one might break through that barrier. The first is for the tool to recognize the struct Array as something special that the tool has special knowledge of. While this works, it flies in the face of the whole point of a user defined array type. The user cannot change how the Arrays work, they are essentially fixed and become core types anyway.

The second way is to establish conventions for a user defined aggregate type like begin(), end(), and an iterator type. Then the tool looks for those names and uses them to walk the array type. The problem is that the code for begin(), end(), etc., can be arbitrarily complex, and so the tool must now be able to execute code. This requires a massive increase in the implementation complexity of the tool, as compared with the pretty simple case of just having a built in array type. Worse, the tool will have to use this strategy for every aggregate because it cannot tell the difference between an array and any other collection type. Dealing with this will also unacceptably slow down things like type aware garbage collectors that need to quickly walk a runtime data structure.

In addition, consider the problems an introspection facility must face:

  • A generic swap needs to be able to 'crack' open the type and look at its composition. It can't do anything with a pointer.
  • A deep copy can be implemented generically with core arrays. But with user defined arrays, the generic facility will see the pointer, but will have no idea how that relates to the array. The memory held by the array object cannot be transitively accessed.
  • A move can be implemented generically with core arrays. But how would
    it be done with a pointer?
  • Many generic algorithms rely on being able to tell if two memory blocks are distinct or are overlapping, or whether they have interior references. (For example, a generic move, copy or swap would need to detect interior references so they can be fixed up.) Relying on begin(), end() and an iterator does not give a clue as to whether the memory is continuous or not.
  • Templates parameterized on values (as opposed to types) have been shown by C++ to be very useful and popular. Such templates could be easily specialized on a built-in array value (with strings as an obvious  candidate). But without built-in arrays, the proposition becomes much more difficult as the template engine must impose its own convention for passing pointers and sizes around.

Another problem with user defined arrays is that their implementations require pointer arithmetic. This means that the array implementations must necessarily be outside any memory safe subset of the language. The industry and academia both agree that the Trusted Computing Base (i.e. code that is not demonstrably safe and must be considered safe "by faith") should be minimal in size so as to reduce bug incidence. See for example Daniel Bernstein's notes about the exceptionally safe qmail program (http://cr.yp.to/qmail/qmailsec-20071101.pdf) or Nickolai Zeldovich's work (http://www.scs.stanford.edu/~nickolai/app/zeldovich-research.pdf).

Other collection types, such as linked lists, binary trees, etc., do not have this issue because they can be built up out of safe references and arrays. Arrays cannot be, and so they should properly be put into the core language.

This article was inspired and contributed to by Andrei Alexandrescu.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video