The genasm Function
The soloasm.c program calls a function named genasm twice. The first pass receives an argument of 1 and is a "dry run" so that labels can be assigned correct values. The second pass, unsurprisingly, gets an argument of 2 and causes the assembler to actually fill the memory array with the correct values.
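To make the two-pass idea concrete, here is a minimal sketch with a hand-written genasm rather than a macro-generated one. Only the name genasm and its pass argument come from the text; the mem array, the label_done variable, the assemble driver, and the made-up 0x1000 "jump" opcode are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 16             /* hypothetical tiny memory for the demo */

uint32_t mem[MEM_WORDS];         /* stand-in for the generated-code array */
unsigned label_done;             /* a label that is referenced before it is defined */

/* Hand-written stand-in for the macro-generated genasm().
   Pass 1 only advances the address so labels get their values;
   pass 2 actually stores words. */
unsigned int genasm(int pass)
{
    unsigned add = 0;            /* current code generation address */

    /* "jump done": the operand is a label defined further down.
       0x1000 is a made-up opcode, not a real One-Der encoding. */
    assert(add < MEM_WORDS);
    if (pass == 2) mem[add] = 0x1000u | label_done;
    add++;

    /* one filler word */
    if (pass == 2) mem[add] = 0;
    add++;

    /* "done:" label definition: record the current address */
    label_done = add;
    if (pass == 2) mem[add] = 0xFFFFu;
    add++;

    return add;                  /* number of words generated */
}

/* Driver: run the same function twice, dry run first. */
unsigned int assemble(void)
{
    genasm(1);                   /* label_done becomes 2, nothing is stored */
    return genasm(2);            /* the forward reference now resolves */
}
```

After assemble() returns, mem[0] holds 0x1002: the forward reference picked up the label value that only became known at the end of the first pass.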
Who writes genasm? You do! Or rather, the target processor macros do. So, for example, for the One-Der processor, the ORG macro looks like this:
#define ORG(n) unsigned int genasm(int _solo_pass) { \
unsigned _solo_add=n;\
_solo_info.psize=32; \
_solo_info.begin=n; \
_solo_info.memsize=0xFFFF; \
_solo_info.ary=malloc(_solo_info.memsize)
This creates the genasm function and also sets some crucial info in some global variables (that all start with _solo). You can probably guess what each variable does, but just in case you can't, see Table 1 for the definitions. Note that you can't use ORG more than once, so soloasm.inc provides REORG in case you want to change the code generation location after the start. Of course, you also have to have an END directive, which closes off the genasm function:
#define END _solo_info.end=_solo_info.end
Table 1

Name                 Description
_solo_add            Current code generation address
_solo_info.psize     Size of a "word"
_solo_info.begin     Lowest address generated
_solo_info.memsize   Total size of memory space
_solo_info.ary       Array that holds generated code
_solo_info.end       Last address generated
The rest of the file is straightforward, especially for One-Der, which has only one instruction. The basic instruction is created with the soloASM macro:
#define soloASM(mask,cond,src,dst) _solo_info.ary[_solo_add++]=((mask)<<29)|((cond)<<26)| ((src)&0x1000)<<13|((dst)&0x1000)<<12 | ((src) & 0xFFF)<<12 | ((dst) & 0xFFF)
Of course, that would be awkward to write, so more macros hide that one:
#define MOV(src,dst) soloASM(0,0,src,dst)
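The bit layout is easier to see when the same packing is written as a function. The sketch below mirrors the shifts and masks in soloASM; the function name solo_encode is hypothetical, and the 3-bit widths of mask and cond are inferred from the shift amounts rather than stated in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Function form of the soloASM bit packing.
   Field positions are read directly off the macro's shifts. */
uint32_t solo_encode(uint32_t mask, uint32_t cond, uint32_t src, uint32_t dst)
{
    /* Assumption: mask and cond are 3-bit fields (bits 29-31 and 26-28). */
    assert(mask < 8 && cond < 8);

    return (mask << 29)
         | (cond << 26)
         | ((src & 0x1000u) << 13)   /* src bit 12      -> word bit 25     */
         | ((dst & 0x1000u) << 12)   /* dst bit 12      -> word bit 24     */
         | ((src & 0x0FFFu) << 12)   /* src bits 0-11   -> word bits 12-23 */
         |  (dst & 0x0FFFu);         /* dst bits 0-11   -> word bits 0-11  */
}
```

For example, solo_encode(0, 0, 0, 0x304) returns 0x304, which is what a MOV with a zero source and a destination of 0x304 stores in the array.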
Other definitions help make your code easier to read:
#define FCON 0
#define FCON_0 F(FCON,0)
#define FSTK 4
#define FSTK_PUSH F(FSTK,3)
#define PUSH(r) MOV(r,FSTK_PUSH)
The effect is you might write:
push FCON0 ; push a zero on the stack
The awk script converts this to:
PUSH(FCON0);
Then the C preprocessor resolves it to:
MOV(FCON0,FSTK_PUSH);
Which itself resolves to:
soloASM(0,0,0,0x304);
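The whole expansion chain can be checked with a small stand-alone sketch. The F macro is not shown in the text, so the definition below is an assumption chosen only because it reproduces the 0x304 above; ary and cur are hypothetical stand-ins for _solo_info.ary and _solo_add, and the FCON_0 spelling follows the #define listing.

```c
#include <assert.h>
#include <stdint.h>

uint32_t ary[4];        /* stand-in for _solo_info.ary */
unsigned cur;           /* stand-in for _solo_add */

/* Assumed layout for F(): sub-function from bit 8 up, unit in the low byte.
   This matches the 0x304 shown for F(FSTK,3) but is otherwise a guess. */
#define F(unit, sub)  (((sub) << 8) | (unit))

/* Same packing as the article's soloASM, with casts so the shifts
   are done on unsigned 32-bit values. */
#define soloASM(mask, cond, src, dst) \
    (ary[cur++] = ((uint32_t)(mask) << 29) | ((uint32_t)(cond) << 26) | \
                  (((uint32_t)(src) & 0x1000u) << 13) | \
                  (((uint32_t)(dst) & 0x1000u) << 12) | \
                  (((uint32_t)(src) & 0xFFFu) << 12) | \
                  ((uint32_t)(dst) & 0xFFFu))

#define MOV(src, dst)  soloASM(0, 0, src, dst)
#define FCON       0
#define FCON_0     F(FCON, 0)
#define FSTK       4
#define FSTK_PUSH  F(FSTK, 3)
#define PUSH(r)    MOV(r, FSTK_PUSH)

/* Assemble the single line "push FCON_0" and return the generated word. */
uint32_t demo_push(void)
{
    cur = 0;
    PUSH(FCON_0);       /* PUSH -> MOV -> soloASM(0,0,0,0x304) */
    assert(cur == 1);   /* exactly one word was emitted */
    return ary[0];
}
```

Under these assumptions demo_push() returns 0x304, the word that soloASM stores for this instruction.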
That macro puts the right opcode into the array. After genasm returns the second time, the code in soloasm.c emits the array in the format you asked for (Intel hex, raw bytes, etc.). The macros can be arbitrarily complex. For example, the soloasm.inc file makes a special provision: if you set the _SOLO_XSYM define (the shell script passes defines), a symbol listing will appear on the stderr stream.
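As an illustration of the output step, here is a sketch of formatting one Intel hex data record. It is not the code from soloasm.c; the function name hex_record and its interface are hypothetical. The record layout itself is the standard one: a colon, byte count, 16-bit address, record type, data bytes, and a checksum equal to the two's complement of the sum of all preceding bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format one Intel hex data record (type 00) into out and return
   the checksum byte. out must hold at least 2*len + 12 characters
   (including the terminating NUL). */
unsigned hex_record(char *out, unsigned addr,
                    const unsigned char *data, size_t len)
{
    unsigned sum, check;
    size_t i;
    int n;

    assert(len <= 255 && addr <= 0xFFFFu);

    /* The record type is 00, so it adds nothing to the sum. */
    sum = (unsigned)len + (addr >> 8) + (addr & 0xFFu);
    n = sprintf(out, ":%02X%04X00", (unsigned)len, addr);

    for (i = 0; i < len; i++) {
        n += sprintf(out + n, "%02X", (unsigned)data[i]);
        sum += data[i];
    }

    check = (0x100u - (sum & 0xFFu)) & 0xFFu;   /* two's complement, low byte */
    sprintf(out + n, "%02X", check);
    return check;
}
```

With the three data bytes 02 33 7A at address 0x0030, this produces the record :0300300002337A1E. A complete file would follow the data records with the end-of-file record :00000001FF.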
That's basically it. Your assembly code transforms into macros and those macros form a C function that fills in an array that is dumped out by a boilerplate C program. You can use C expressions just about anywhere you like. For example:
ldi 0xA<<10
You can also pass lines directly to the C compiler by prefixing them with the # sign. This gives you a powerful (though somewhat cumbersome) macro capability:
##define CT 5
# { int i; for (i=3;i<3+CT;i++) {
LDRIQ i,R(i)
# } }
The above snippet will load registers 3 to 7 with the values 3 to 7. You can even define new opcodes this way:
##define MOVE MOV
##define CLEAR(r) MOV(FZERO,r)
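To see why the loop example works, it helps to look at what the C compiler receives once the # lines have passed straight through and the opcode line has become a macro call. The sketch below is only an approximation: the real LDRIQ and R macros live in soloasm.inc and emit One-Der opcodes, whereas the stand-ins here (and the loaded array and genasm_fragment name) are hypothetical and simply record which value went to which register.

```c
#include <assert.h>

#define CT 5

/* Hypothetical stand-ins for the target macros: LDRIQ just records
   the immediate value it was asked to load into each register. */
int loaded[16];                          /* loaded[reg] = immediate value */
#define R(n)             (n)
#define LDRIQ(val, reg)  (assert((reg) >= 0 && (reg) < 16), \
                          loaded[(reg)] = (val))

/* Roughly what the snippet becomes inside genasm(): the loop is plain C,
   and each iteration runs the opcode macro once. */
void genasm_fragment(void)
{
    { int i; for (i = 3; i < 3 + CT; i++) {
        LDRIQ(i, R(i));
    } }
}
```

The loop runs at assembly time, not on the target: five separate load instructions are generated, one per iteration.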
The downside is you can't use C reserved words for things like labels. This is a small price to pay for the ease of use. Of course, you can always use uppercase, make the script force everything to uppercase, or adopt a naming convention (like starting all labels with "_") if you really don't want to reserve the C keywords.
If you study the soloasm.inc file, you'll have a pretty good idea how to create a different target for nearly any common microcontroller or processor. For example, the online listings include targets for the RCA1802 (the first microprocessor I owned) and the Microchip PIC16F84. Note these CPUs are all very different. One-Der is a 32-bit machine, the 1802 is an 8-bit oldie but goodie, and the PIC uses a 14-bit instruction size (pesky Harvard architecture).
Sure, there are some limitations. Obviously, if you stopped forcing opcodes to uppercase, you could run into conflicts with C's built-in keywords (and labels still have that problem, but it is awkward to force arbitrary input to uppercase). If you wanted to assemble really large programs (One-Der could address 4GB of memory, although I've never built one with anything over 1MB), you'd need to either accept a big memory footprint on the host or adopt some sparse array technique (maybe make the array pointer a list of function pointers that manipulate the sparse array). But for the jobs I ask the assembler to perform, none of this matters, and I've found it a useful tool. I hope you do too.
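One way the sparse array idea might look is sketched below. Note that this uses a simple page table rather than the list of function pointers suggested above, and every name in it (sparse_put, sparse_get, the page size) is hypothetical. Pages are allocated only when first written, so a program that touches a few scattered regions of a 32-bit address space costs only a few pages on the host.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BITS  16                          /* 65536 words per page (arbitrary choice) */
#define PAGE_WORDS (1u << PAGE_BITS)
#define NUM_PAGES  (1u << (32 - PAGE_BITS))    /* enough pages for 32-bit addresses */

uint32_t *pages[NUM_PAGES];                    /* NULL until a page is first written */

/* Store one word, allocating its page on first touch.
   Returns 0 on success, -1 if the host is out of memory. */
int sparse_put(uint32_t addr, uint32_t word)
{
    uint32_t p = addr >> PAGE_BITS;

    assert(p < NUM_PAGES);
    if (pages[p] == NULL) {
        pages[p] = calloc(PAGE_WORDS, sizeof **pages);
        if (pages[p] == NULL)
            return -1;
    }
    pages[p][addr & (PAGE_WORDS - 1)] = word;
    return 0;
}

/* Read one word; addresses that were never written read as zero. */
uint32_t sparse_get(uint32_t addr)
{
    const uint32_t *pg = pages[addr >> PAGE_BITS];
    return pg ? pg[addr & (PAGE_WORDS - 1)] : 0;
}
```

An instruction macro would then call sparse_put instead of indexing the array directly, and the output code would walk only the pages that are not NULL.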
Of course, once you have an assembler, the next step is a high-level language, which is why I wrote The Commando Forth Compiler, a lazy Forth cross compiler that is built on top of the assembler and makes programming the One-Der CPU a breeze.


