
Walter Bright

Dr. Dobb's Bloggers

Core vs Library

April 12, 2008

When thinking about the design of a programming language, an inevitable question arises: which types should be part of the core language, and which should be put in a library? One would think that, ideally, the language would be so powerful that every type could be specified in a library. But I'm not so sure about that.

Let's list the advantages of a type being a core language feature:

  • Customized syntactic forms
  • The type's primitive operations may not be expressible in the language itself
  • Gains important optimizations based on the compiler 'understanding' the type
  • Consistent, reliable behavior
  • Better error messages

And the advantages of a type being a library feature:

  • It can be developed independently of the compiler
  • It simplifies the development of the compiler
  • The user can customize it
  • It can be added on later without having to modify the compiler
  • The user can easily inspect its implementation
  • It can drive improvements to the language to better support user-defined types
  • There can be vastly more library types than could ever be possible in the core language

For fun, let's imagine that the integer type were implemented as a library feature. What are the consequences? First off, we won't have any integer literals, so we'll need a function to create them. To create an integer literal for the number 567:

    Int("567")

(Of course, in a way, such syntax only transfers the problem of core vs. library from integers to strings.)

The language could support user-definable tokens, but no successful production-quality language has done so. Next, there may not be any arithmetic operators. If the language does not allow users to define infix operators, you'll be reduced to writing the expression x+5*y as something like:

    Add(x, Mul(Int("5"), y))

It's starting to look rather unpleasant. The Add() and Mul() functions would also have to be written in a foreign language, such as assembler or C, to get any reasonable performance.
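To make the thought experiment concrete, here is a minimal C++ sketch of what such a library integer might look like. The names `Int`, `Add()` and `Mul()` follow the hypothetical names used above; the `raw()` accessor and the parsing details are assumptions added for illustration. Note that C++ gives us no way to build it except on top of a core integer (`std::int64_t`), which rather proves the point.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical library integer type. It still has to wrap a core integer,
// because the language offers nothing more primitive to build on.
class Int {
public:
    // Int("567"): the type has no literal syntax, so parse a decimal string.
    // This sketch accepts only non-negative literals and ignores overflow.
    explicit Int(const std::string& digits) {
        if (digits.empty()) throw std::invalid_argument("empty literal");
        for (char c : digits) {
            if (c < '0' || c > '9') throw std::invalid_argument("not a digit");
            value_ = value_ * 10 + (c - '0');
        }
    }

    std::int64_t raw() const { return value_; }  // hypothetical accessor

private:
    explicit Int(std::int64_t v) : value_(v) {}
    std::int64_t value_ = 0;

    // Function-call syntax stands in for the missing infix operators.
    friend Int Add(Int a, Int b);
    friend Int Mul(Int a, Int b);
};

Int Add(Int a, Int b) { return Int(a.value_ + b.value_); }
Int Mul(Int a, Int b) { return Int(a.value_ * b.value_); }
```

With `x` holding 3 and `y` holding 4, `Add(x, Mul(Int("5"), y)).raw()` yields 23. Every operation is now an opaque function call, which is exactly why the compiler can no longer reason about it.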

What more do we give up with integers as library types? We toss out the window pretty much all the optimizations the compiler can do on integers. After all, the compiler knows a lot about integers and arithmetic (at least the programmer writing the compiler did). A typical compiler will optimize things like:

    5+1 => 6
    x*2 => x<<1
    (x+2)+4 => x+6
    (x+2)+foo() => (foo()+x)+2

(This last one is computable in fewer registers.)
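To show what this pattern matching involves, here is a toy bottom-up simplifier over a hypothetical miniature expression tree (`Expr`, `cst`, `var`, `bin`, `simplify` and `show` are all illustrative names, not any real compiler's API). It implements the first three rewrites listed above; a production optimizer has many more rules and must also respect the language's overflow and evaluation-order semantics.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical miniature IR: just enough to express the rewrites above.
struct Expr;
using P = std::shared_ptr<Expr>;

struct Expr {
    enum Kind { Const, Var, Add, Mul, Shl } kind;
    long value = 0;      // used when kind == Const
    std::string name;    // used when kind == Var
    P lhs, rhs;          // used by the binary operators
};

P cst(long v) { auto e = std::make_shared<Expr>(); e->kind = Expr::Const; e->value = v; return e; }
P var(const std::string& n) { auto e = std::make_shared<Expr>(); e->kind = Expr::Var; e->name = n; return e; }
P bin(Expr::Kind k, P a, P b) { auto e = std::make_shared<Expr>(); e->kind = k; e->lhs = a; e->rhs = b; return e; }

bool isConst(const P& e) { return e->kind == Expr::Const; }
bool isConst(const P& e, long v) { return isConst(e) && e->value == v; }

// Simplify children first, then try each rewrite on the current node.
P simplify(P e) {
    if (e->kind == Expr::Const || e->kind == Expr::Var) return e;
    P a = simplify(e->lhs), b = simplify(e->rhs);
    if (e->kind == Expr::Add) {
        if (isConst(a) && isConst(b))                        // 5+1 => 6
            return cst(a->value + b->value);
        if (a->kind == Expr::Add && isConst(a->rhs) && isConst(b))
            return bin(Expr::Add, a->lhs,                    // (x+2)+4 => x+6
                       cst(a->rhs->value + b->value));
    }
    if (e->kind == Expr::Mul) {
        if (isConst(a) && isConst(b)) return cst(a->value * b->value);
        if (isConst(b, 2)) return bin(Expr::Shl, a, cst(1)); // x*2 => x<<1
    }
    return bin(e->kind, a, b);
}

// Render a tree as a fully parenthesized string.
std::string show(const P& e) {
    switch (e->kind) {
        case Expr::Const: return std::to_string(e->value);
        case Expr::Var:   return e->name;
        case Expr::Add:   return "(" + show(e->lhs) + "+" + show(e->rhs) + ")";
        case Expr::Mul:   return "(" + show(e->lhs) + "*" + show(e->rhs) + ")";
        case Expr::Shl:   return "(" + show(e->lhs) + "<<" + show(e->rhs) + ")";
    }
    return "?";
}
```

The simplifier can only apply these rules because it knows what `Add` and `Mul` mean. Hide them behind library function calls and every one of these rewrites disappears.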

There are many such patterns. Then there are loop induction variable optimizations, where integer indices are replaced with pointers. In the code generator, a lot of effort goes into mapping integers efficiently onto machine operations and registers. For example:
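The induction variable transformation can be illustrated by writing both forms out by hand. This is only a source-level sketch (the names `sumIndexed` and `sumPointer` are hypothetical); an optimizer performs the equivalent rewrite on its internal representation, and only because it knows that the index is an integer.

```cpp
#include <cassert>
#include <cstddef>

// As written: the integer index i is scaled and added to a on each access.
long sumIndexed(const int* a, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}

// Roughly what the optimizer produces: the index is replaced by a pointer
// that is bumped each iteration, and the loop test compares pointers.
long sumPointer(const int* a, std::size_t n) {
    long total = 0;
    for (const int* p = a, *end = a + n; p != end; ++p)
        total += *p;
    return total;
}
```

Both functions compute the same sum; the second avoids recomputing the element address from the index on every iteration.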

    a = x/10
    b = x%10

can be done with one divide instruction rather than two.
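The C and C++ standard libraries expose this pairing directly through `std::div`, which returns the quotient and remainder together in a `std::div_t`. The helper `splitLastDigit` and the struct `Digits` below are hypothetical names for illustration. Whether the call compiles down to a single divide instruction depends on the compiler and target; a compiler that understands integers may also merge separate `x/10` and `x%10` expressions on its own.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical result holder for a = x/10 and b = x%10.
struct Digits {
    int a;  // quotient
    int b;  // remainder
};

// std::div computes both results in one call; since C++11 the quotient
// truncates toward zero, matching the built-in / and % operators.
Digits splitLastDigit(int x) {
    std::div_t r = std::div(x, 10);
    return {r.quot, r.rem};
}
```

For example, `splitLastDigit(567)` gives `a == 56` and `b == 7`.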

It's very hard to see how this could be pushed into a library type. Even if it could be done, it would make the library as complicated as the compiler, or more so, making it hard to see the win. I don't know of any usable language that doesn't make integers a core language type, with the notable exception of bash.

On the other hand, there's the complex data type, which consists of two floating point values: a real part and an imaginary part. Complex made its debut with FORTRAN, was added to C99, and is even in the D programming language. But advancing compiler technology has whittled away its core advantages one by one, and it's getting increasingly hard to justify it as a core type.

For example, compilers have traditionally treated a struct as a monolithic block of data. For a user-defined complex type, this means it doesn't get put in registers. Newer compilers look inside the struct to see whether it can be placed in registers, register pairs, or the CPU's floating point registers. Not only does a user-defined complex type then work efficiently, but other user-defined structs of paired values benefit equally.
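Here is a minimal sketch of such a user-defined complex type in C++: a plain struct of two doubles with ordinary operator overloads and no compiler magic. The name `Complex` is illustrative. C++ itself took the library route with `std::complex` in `<complex>`; the hand-rolled struct is used here only to keep the two-field layout visible.

```cpp
#include <cassert>

// A plain struct of paired values; a compiler that looks inside small
// structs is free to keep re and im in separate floating point registers.
struct Complex {
    double re;
    double im;
};

constexpr Complex operator+(Complex a, Complex b) {
    return {a.re + b.re, a.im + b.im};
}

// (a + bi)(c + di) = (ac - bd) + (ad + bc)i
constexpr Complex operator*(Complex a, Complex b) {
    return {a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}
```

For instance, `Complex{1, 2} * Complex{3, 4}` evaluates to `{-5, 10}`. Nothing about this type is special to the compiler, so any optimization that helps it helps every other small struct too.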

In a future column I'll examine strings, arrays, and associative arrays as core types and see how they stack up.

Thanks to Andrei Alexandrescu and David Held for their valuable contributions to this article.
