A Universal Cross Assembler

The genasm Function

The soloasm.c program calls a function named genasm twice. The first call receives an argument of 1 and is a "dry run" so that labels can be assigned correct values. The second call, unsurprisingly, gets an argument of 2 and causes the assembler to fill the memory array with the correct values.
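To make the two-pass idea concrete, here is a minimal sketch of why a forward reference needs two calls. Everything in it (demo_genasm, demo_emit, DEMO_JMP, the toy encoding) is a hypothetical stand-in invented for illustration, not the real soloasm code. It assumes, as the soloASM macro shown later suggests, that words are stored on both passes and the second pass simply overwrites the first.

```c
#include <stdint.h>

static uint32_t demo_mem[16];  /* stands in for the generated-code array */
static unsigned demo_done;     /* value of the label "done" (0 until defined) */

/* Toy encodings invented for this sketch, not One-Der opcodes. */
#define DEMO_JMP(target) (0x01000000u | ((uint32_t)(target) & 0xFFFFu))
#define DEMO_NOP 0u

/* Store one word and advance the current address. */
static void demo_emit(unsigned *add, uint32_t word)
{
    demo_mem[*add] = word;
    (*add)++;
}

/* Plays the role of genasm for the program:
 *         jmp done
 *         nop
 *         nop
 *   done: nop
 */
unsigned demo_genasm(int pass)
{
    unsigned add = 0;
    (void)pass;                        /* the pass number is unused in this toy */
    demo_emit(&add, DEMO_JMP(demo_done)); /* pass 1: demo_done is still 0 here */
    demo_emit(&add, DEMO_NOP);
    demo_emit(&add, DEMO_NOP);
    demo_done = add;                   /* the label gets its value here */
    demo_emit(&add, DEMO_NOP);
    return add;
}

/* Mirrors the driver: a dry run to fix labels, then the real run. */
unsigned demo_assemble(void)
{
    demo_genasm(1);
    return demo_genasm(2);
}
```

After the first call, the jump at address 0 still points at 0 because the label was not known yet. The second call sees the label value left behind by the first and writes the correct target.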

Who writes genasm? You do! Well, actually, the target processor macros do. For example, for the One-Der processor, the ORG macro looks like this:

#define ORG(n) unsigned int genasm(int _solo_pass) { \
   unsigned _solo_add=n;\
   _solo_info.psize=32; \
   _solo_info.begin=n; \
   _solo_info.memsize=0xFFFF; \

This creates the genasm function and also sets some crucial information in global variables (which all start with _solo). You can probably guess what each variable does, but just in case you can't, see Table 1 for the definitions. Note that you can't use ORG more than once, so the file provides REORG in case you want to change the code generation location after the start. Of course, you also need an END directive, which closes off the genasm function:

#define END _solo_info.end=_solo_add-1; return _solo_add; }

Name                 Description

_solo_add            Current code generation address
_solo_info.psize     Size of a "word"
_solo_info.begin     Lowest address generated
_solo_info.memsize   Total size of memory space
_solo_info.ary       Array that holds generated code
_solo_info.end       Last address generated

Table 1: Global Variables
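The trick of having one macro open a function and another close it can be sketched in isolation. The names below (demo_info, DEMO_ORG, DEMO_WORD, DEMO_END) are hypothetical, the field types are assumptions based only on Table 1, and the body of DEMO_END is my guess at what a closing directive needs to do.

```c
/* Assumed shape of the bookkeeping structure; field meanings follow Table 1. */
struct demo_info {
    unsigned psize;    /* size of a "word" (presumably in bits) */
    unsigned begin;    /* lowest address generated */
    unsigned end;      /* last address generated */
    unsigned memsize;  /* total size of memory space */
    unsigned ary[64];  /* array that holds generated code */
};
static struct demo_info demo_info;

/* Opens the generator function and sets up the bookkeeping. */
#define DEMO_ORG(n) unsigned demo_genasm(int demo_pass) { \
    unsigned demo_add = (n); \
    (void)demo_pass; \
    demo_info.psize = 32; \
    demo_info.begin = (n); \
    demo_info.memsize = 64;

/* Emits one word at the current address. */
#define DEMO_WORD(w) demo_info.ary[demo_add++] = (w);

/* Records the last address used and closes the function. */
#define DEMO_END demo_info.end = demo_add - 1; return demo_add; }

/* The "assembly program": after preprocessing this is one ordinary C function. */
DEMO_ORG(8)
DEMO_WORD(0x11)
DEMO_WORD(0x22)
DEMO_END
```

Calling demo_genasm stores the two words at addresses 8 and 9 and leaves demo_info.begin and demo_info.end bracketing the generated range.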

The rest of the file is straightforward, especially for One-Der, which has only one instruction. The basic instruction is created with the soloASM macro:

#define soloASM(mask,cond,src,dst) _solo_info.ary[_solo_add++]=((mask)<<29)|((cond)<<26)| \
   ((src)&0x1000)<<13|((dst)&0x1000)<<12 | ((src) & 0xFFF)<<12 | ((dst) & 0xFFF)

Of course, that would be awkward to write, so more macros hide that one:

#define MOV(src,dst) soloASM(0,0,src,dst)
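The shifts in soloASM are easier to follow as a function. Reading the macro, each operand is 13 bits wide: bit 12 of the source lands in bit 25 of the word, bit 12 of the destination in bit 24, and the low 12 bits of each fill bits 23-12 and 11-0. The sketch below restates that packing and adds the inverse for checking. The function names are hypothetical, and the decoders are my own addition rather than part of the assembler.

```c
#include <stdint.h>

/* Same bit layout as the soloASM macro:
 *   31-29 mask | 28-26 cond | 25 src[12] | 24 dst[12] | 23-12 src[11:0] | 11-0 dst[11:0]
 */
uint32_t demo_encode(uint32_t mask, uint32_t cond, uint32_t src, uint32_t dst)
{
    return (mask << 29)
         | (cond << 26)
         | ((src & 0x1000u) << 13)
         | ((dst & 0x1000u) << 12)
         | ((src & 0x0FFFu) << 12)
         | (dst & 0x0FFFu);
}

/* Recover the 13-bit source operand from an encoded word. */
uint32_t demo_src(uint32_t word)
{
    return ((word >> 13) & 0x1000u) | ((word >> 12) & 0x0FFFu);
}

/* Recover the 13-bit destination operand from an encoded word. */
uint32_t demo_dst(uint32_t word)
{
    return ((word >> 12) & 0x1000u) | (word & 0x0FFFu);
}
```

For example, demo_encode(0, 0, 0x1001, 0x1002) yields 0x03001002, and the two decoders return 0x1001 and 0x1002 from that word.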

Other definitions help make your code easier to read:

#define FCON 0
#define FCON_0 F(FCON,0)
#define FSTK 4
#define FSTK_PUSH F(FSTK,3)
#define PUSH(r) MOV(r,FSTK_PUSH)

The effect is you might write:

push	FCON_0    ; push a zero on the stack

The awk script converts this to:

PUSH(FCON_0);

Then the C preprocessor resolves it to:

MOV(FCON_0,FSTK_PUSH);

Which itself resolves to:

soloASM(0,0,F(FCON,0),F(FSTK,3));
That macro puts the right opcode into the array. After genasm returns the second time, the code in soloasm.c emits the array in the format you asked for (Intel hex, raw bytes, etc.). The macros can be arbitrarily complex. For example, the file makes a special provision: if you set the _SOLO_XSYM define (the shell script passes defines to the compiler), a symbol listing appears on the stderr stream.
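As an illustration of the output stage, here is a rough sketch of formatting one Intel hex data record. It is not the code from soloasm.c. The function name is hypothetical, and it works on plain bytes, so it says nothing about how a 32-bit One-Der word is split into bytes. The record layout itself (colon, length, address, type, data, two's-complement checksum) is the standard Intel hex format.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Format one Intel hex data record (type 00) into out.
 * Returns the number of characters written, or -1 if out is too small. */
int demo_hex_record(char *out, size_t outsz, uint16_t addr,
                    const uint8_t *data, uint8_t len)
{
    /* ':' + LL + AAAA + TT + 2 chars per data byte + CC + terminating NUL */
    size_t need = 12u + 2u * (size_t)len;
    unsigned sum, i;
    int n;

    if (outsz < need)
        return -1;

    /* The checksum covers length, both address bytes, type, and data. */
    sum = (unsigned)len + (unsigned)(addr >> 8) + (unsigned)(addr & 0xFFu);
    n = sprintf(out, ":%02X%04X00", (unsigned)len, (unsigned)addr);
    for (i = 0; i < len; i++) {
        n += sprintf(out + n, "%02X", (unsigned)data[i]);
        sum += data[i];
    }
    /* Two's complement of the low byte of the sum. */
    n += sprintf(out + n, "%02X", (0u - sum) & 0xFFu);
    return n;
}
```

Feeding it the three bytes 02 33 7A at address 0x0030 produces the record :0300300002337A1E. A complete file would also need an end-of-file record (:00000001FF) after the last data record.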

That's basically it. Your assembly code transforms into macros and those macros form a C function that fills in an array that is dumped out by a boilerplate C program. You can use C expressions just about anywhere you like. For example:

	ldi	0xA<<10

You can also pass lines directly to the C compiler by prefixing them with the # sign. This gives you a powerful (though somewhat cumbersome) macro capability:

  ##define CT 5
  # { int i; for (i=3;i<3+CT;i++) {
     LDRIQ i,R(i)
  # } }

The above snippet will load registers 3 to 7 with the values 3 to 7. You can even define new opcodes this way:

##define MOVE MOV
##define CLEAR(r) MOV(FZERO,r)
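The loop example works because, by the time the C compiler sees it, each instruction is just a statement that stores a word and bumps the address. Wrapping that statement in a for loop repeats it at assembly time, with no loop in the generated code. The sketch below shows the idea with a placeholder encoding. DEMO_LDRIQ, demo_code, and demo_pc are hypothetical names, and the bit layout is invented, not the real LDRIQ.

```c
#include <stdint.h>

#define DEMO_CT 5

static uint32_t demo_code[16];  /* stands in for the generated-code array */
static unsigned demo_pc;        /* stands in for the current address */

/* Placeholder encoding: value in the high half, register number in the low half. */
#define DEMO_LDRIQ(val, reg) \
    (demo_code[demo_pc++] = ((uint32_t)(val) << 16) | (uint32_t)(reg))

/* Equivalent of the passed-through C loop wrapped around one instruction. */
unsigned demo_unroll(void)
{
    demo_pc = 0;
    { int i; for (i = 3; i < 3 + DEMO_CT; i++) {
        DEMO_LDRIQ(i, i);
    } }
    return demo_pc;
}
```

One source line becomes five emitted words, one each for registers 3 through 7.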

The downside is you can't use C reserved words for things like labels. This is a small price to pay for the ease of use. Of course, you can always use uppercase, make the script force everything to uppercase, or adopt a naming convention (like starting all labels with "_") if you really don't want to reserve the C keywords.

If you study the file, you'll have a pretty good idea how to create a different target for nearly any common microcontroller or processor. For example, the online listings include targets for the RCA1802 (the first microprocessor I owned) and the Microchip PIC16F84. Note these CPUs are all very different. One-Der is a 32-bit machine, the 1802 is an 8-bit oldie but goodie, and the PIC uses a 14-bit instruction size (pesky Harvard architecture).

Sure, there are some limitations. Obviously, if you stopped forcing opcodes to uppercase, you could have trouble with names that conflict with C's built-in keywords (labels still have that problem, but it is awkward to force arbitrary input to uppercase). If you wanted to compile really large programs (One-Der could address 4GB of memory, although I've never built one with anything over 1MB), you'd need to either accept a big memory footprint on the host or adopt some sparse array technique (maybe make the array pointer a list of function pointers that manipulate the sparse array). But for the jobs I ask the assembler to perform, none of this matters, and I've found it a useful tool. I hope you do too.
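For what it's worth, one simple sparse technique is a table of lazily allocated pages. This is a different approach from the function-pointer idea floated above, and it is only a sketch: the names are hypothetical, the 32-bit address width and 64K-word page size are assumptions, and nothing like this is in the current assembler.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_PAGE_BITS  16u
#define DEMO_PAGE_WORDS (1u << DEMO_PAGE_BITS)          /* 64K words per page */
#define DEMO_NUM_PAGES  (1u << (32u - DEMO_PAGE_BITS))  /* covers 32-bit addresses */

/* One pointer per page; NULL means nothing was ever written there. */
static uint32_t *demo_pages[DEMO_NUM_PAGES];

/* Store a word, allocating its page on first use. Returns 0, or -1 if out of memory. */
int demo_write(uint32_t addr, uint32_t word)
{
    uint32_t page = addr >> DEMO_PAGE_BITS;

    if (demo_pages[page] == NULL) {
        demo_pages[page] = calloc(DEMO_PAGE_WORDS, sizeof(uint32_t));
        if (demo_pages[page] == NULL)
            return -1;
    }
    demo_pages[page][addr & (DEMO_PAGE_WORDS - 1u)] = word;
    return 0;
}

/* Read a word; addresses in untouched pages read back as 0. */
uint32_t demo_read(uint32_t addr)
{
    const uint32_t *p = demo_pages[addr >> DEMO_PAGE_BITS];

    return p ? p[addr & (DEMO_PAGE_WORDS - 1u)] : 0u;
}
```

With this layout the host pays for the page table up front plus one page for each 64K-word region the program actually touches, so a small program placed at a high address stays cheap.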

Of course, once you have an assembler, the next step is a high-level language. That is why I wrote The Commando Forth Compiler, a lazy Forth cross compiler that is built on top of the assembler and makes programming the One-Der CPU a breeze.
