# What is a Concatenative Language

Recently there has been some discussion on the concatenative discussion group about the term "concatenative language" and what it actually means. In this post I provide my definition and attempt to deconstruct it.

The canonical example of a concatenative language is the Joy programming language by Manfred von Thun. I consider my language Cat to be concatenative as well. Factor by Slava Pestov is another language that describes itself as concatenative. For more examples of concatenative languages see the concatenative wiki and Wikipedia.

One problem that we have in the concatenative community currently has is lack of a rigorous definition. This makes it hard to determine whether a given language is concatenative or not, and makes any formal study difficult. The following is my attempt to provide a rigorous definition for a concatenative programming language:

"A concatenative programming language is language in which terms correspond to functions and in which the juxtaposition of terms denotes the composition of functions.".

For those readers not scared away yet, allow me to elaborate. A term is a valid and complete syntactic phrase that can be generated from the concrete (syntactic) grammar of a language. For example in Scheme "(f a 5)" is a term, as is "a" or "5" or "f". However, "(f" is not a term. Juxtaposition is just a fancy way of saying "two terms side by side". A concatenative language differs from the functional programming language paradigm where terms correspond to values (including functions) and the fundamental operation is function application. In the Lambda calculus for example -- which is the basis for functional programming -- all terms correspond to values, function application, or function abstraction (i.e. lambda expressions). However, in a concatenative language all terms correspond to functions on a tuple (e.g. a single stack, a pair of stacks, a pair of stacks plus a dictionary, or even a deque if you are really masochistic), or the composition of functions (which yields a new function, so is really a function).

So for example a literal term such as "42" in Joy or Cat is in fact a function that maps a stack to a new stack that is a copy of the original with the value 42 added to the top. For most practical purposes a programmer may think of the term "42" as being equivalent to the value "42". This in fact aids computations, but it can confuse the theory a bit. To have a correct and formal understanding of programming languages we have to understand the term/value distinction. IN very down-to-earth terms: the operation that pushes a value on the stack is different from the value that is on the stack.

At this point, some people may say, hey what about quotations? In Joy and Cat a quotation is a function that yields a new stack with a function on the top. This is consistent with what I have been saying.

While in theory a concatenative language does not require a stack, in practice most concatenative languages are stack-based, and at the same time most stack-based languages can be modeled formally as a concatenative language.

An interesting side note: it is not strictly neccessary to evaluate a concatenative language like Joy or Cat using a stack, it is simply convenient. One could for example devise an evaluator (e.g. an interpreter) that used term rewriting.

In practical terms a concatenative language is not really different from a functional language, except that there is less nesting of functions. Rather than writing

(f0 (f1 (f2 ... (fn x) ...))

We could write:

x fn ... f2 f1 f0

One of the interests of concatenative languages is that it is a formal computation model that closely models the actual processor of a lot of computers. It also corresponds nicely to both imperative and functional reasoning about code.

One interesting property of some concatenative languages that has me particularly interested is that we can replace any referentially transparent sub-sequence of terms with a new function that is defined as that sub-sequence. For example "a b c d a b c" can be replaced with "f d f" where "f" is defined as "a b c". This make automated code refactoring and analysis much easier. Hence the moniker chosen by Slava for his language Factor. This is also why I am actively studying the usage of concatenative languages for code size optimization (let me know if this interests you, and I'll tell you more).

I think that it may benefit the community to distinguish between pure point-free concatenative languages (e.g. those with no environment and control structures such as Joy and Cat) and those with an explicit environment (e.g. Postscript and Forth), as is done in the functional community.

Hopefully this blog entry didn't get too esoteric for my readers. Maybe I'll write about some neat C++ hacks next time. :-)