Rick Hohensee August 2001 -> Jan 2002


Yes, asmacs/shasm/osimpa/osimplay, my assembler-in-Bash, is a joke. It
cracks people up, me especially. It does, however, have huge non-joke
potential, and I want to point out some things people have missed about
it. Don't stop laughing, but laugh with your eyes open. Randy Hyde's High
Level Assembler (HLA) serves as a good point of comparison. Some of the
ideas in osimpa aren't yet at the point where I can express them well.
Some are. All have a good bit of fermentation behind them, and all have
more developed, but more typical, analogues in HLA.

HLA is also, like the recent osimpa extension to shasm, what I would call
a compembler. Hyde uses the term "mid-level-language" for the same thing.
HLA is somewhere between MASM and C, as is osimpa. HLA and osimpa are both
x86-specific. Beyond that we diverge rapidly, at roughly 180 degrees.

HLA is huge, tested, and full of features. It is for Windows, needs MASM
or TASM, and is itself, I think, two compiled C programs. HLA reflects a
couple of full Hyde-years, apparently.

All of osimpa is about two Hohensee-months. The design ideas evident in it
are not a recent arrival, however. Osimpa is Forth influenced, where HLA
seems to be more Pascal/C/C++ influenced, with some nods to Lisp, at least
in the docs.

osimpa is 100% Bash, and will work with minor changes with pdksh. It
requires a recent sh for the "parameter expansions" that do BASIC-like
string chomping and other handy things, and it uses named arrays. You can
probably get a suitable unix shell running on e.g. Cygwin on Windows.
Virtunix has zsh, which is itself probably suitable, but that may be in a
DOS context, which won't work, because osimpa uses 32-bit Bash arithmetic.
Little conveniences like a glossary lister are omitted from osimpa because
the simplest way to do that is external to the shell, with grep. osimpa
itself requires only the shell to make Linux static runnables. It is the
one-link toolchain, like a Forth typically is.
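Here is what those parameter expansions and named arrays look like in
practice. This is a minimal sketch in Bash; the "size@reg+disp" operand
spelling and the split_operand name are hypothetical, invented for
illustration, and are not osimpa's actual operand syntax.

```shell
#!/usr/bin/env bash
# Hypothetical operand spelling for this sketch only: SIZE@REG[+DISP],
# e.g. "dual@SI+8". Not osimpa's real syntax.
split_operand() {
  local op=$1 rest
  size=${op%%@*}            # everything before the first "@"   (like BASIC LEFT$)
  rest=${op#*@}             # everything after the first "@"
  reg=${rest%%+*}           # register name, before any "+"
  disp=0                    # default displacement
  if [[ $rest == *+* ]]; then
    disp=${rest#*+}         # everything after the first "+"   (like BASIC MID$)
  fi
}

# A named (indexed) array: 386 register numbers 0..7 in encoding order.
register=(A C D B SP BP SI DI)

split_operand "dual@SI+8"
echo "$size $reg $disp"     # prints: dual SI 8
echo "${register[3]}"       # prints: B
echo "${#size}"             # string length, like BASIC LEN: prints 4
```

No external programs are involved; the shell does all the string work.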

An assembler written in a shell is phenomenally slow, like 100 times
slower than GNU gas. The benefits of a scripted compembler are
significant, however. Let's call osimpa functions in osimpa itself
"wirds". Most osimpa wirds have online help. Here's an example:

..................................................................
	
	:; cLIeNUX /dev/tty10  03:58:45   /H3nix/shasm
	:;. osimpa
	ELFstatic=yes
	libo
	:; cLIeNUX /dev/tty10  03:58:48   /H3nix/shasm
	:;
	:;divide h
	
	                                        Intel DIV
	
	38 clocks, no early-out.
..................................................................

So osimpa "divide" is a shell function with help on the h argument, or
"switch". Now, note that the divide help isn't very informative. So change
it. You can change it in the time it takes you to make the edit. You
don't like me changing the names of instructions to my own verbose
non-Intel forms? Change that. Not a big deal. Note also that we ran divide
as a regular command. All osimpa wirds are regular shell functions. This
means your shell is in fact the compembler. The flexibility this affords
is not to be discounted. For example, having already sourced osimpa with
". osimpa", we can ask Bash to show us the divide wird itself:
.................................................................

:;type divide
divide is a function
divide ()
{
    if test "$1" = "h"; then
        echo -e "\n\t\t\t\t\tIntel DIV\n
38 clocks, no early-out.
                        ";
    else
        herelist;
        parse $*;
        ab 0xf7;
        ao 36${register[$source*2+0]};
        opnote divide $*;
    fi
}
..................................................................

Bash's listing of the function uses a nice 4-character indent and adds
some semicolons.
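The "ao 36..." line in that listing is less mysterious in octal. DIV r/m32
is opcode F7 with 6 in the reg field of the ModR/M byte, and a register
operand has mod 3, so the ModR/M byte is binary 11 110 rrr, i.e. octal 36
followed by the register number. A minimal sketch, assuming (my reading of
the listing, not confirmed against osimpa's source) that ab assembles a
byte given in hex and ao one given in octal; regnum, dump and the image
array are hypothetical helpers for this example.

```shell
#!/usr/bin/env bash
# Sketch only: "ab", "ao" and "regnum" are hypothetical stand-ins for the
# emitters that divide calls; osimpa's real ones may differ.
image=()                                   # assembled bytes, as integers 0..255
ab() { image+=( $(( $1 & 0xFF )) ); }      # assemble a byte given as 0x.. or decimal
ao() { ab $(( 8#$1 )); }                   # assemble a byte given in octal digits
dump() { local s; s=$(printf '%02x ' "${image[@]}"); echo "${s% }"; }

# 386 register number, as used in the 3-bit rm field of ModR/M.
regnum() {
  case $1 in
    A) echo 0 ;; C) echo 1 ;; D) echo 2 ;; B) echo 3 ;;
    SP) echo 4 ;; BP) echo 5 ;; SI) echo 6 ;; DI) echo 7 ;;
    *) echo "unknown register: $1" >&2; return 1 ;;
  esac
}

# DIV r/m32 is opcode F7 with 6 in the ModR/M reg field. For a register
# operand mod=3, so ModR/M is binary 11 110 rrr, which is octal 36r.
divide() {
  if test "$1" = "h"; then
    echo "Intel DIV: unsigned divide of EDX:EAX by a 32-bit register"
  else
    local r
    r=$(regnum "$1") || return 1
    ab 0xf7
    ao "36$r"
  fi
}

divide C
dump                                       # prints: f7 f1   (div ecx)
```

Octal suits x86 encoding generally, since ModR/M and SIB bytes split into
2-3-3 bit fields.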

So a script has advantages, particularly for prototyping. Having now done
a bit of coding _in_ osimpa, I find its level of interactivity reminiscent
of Forth, i.e. excellent. For many things, fast enough means human-speed,
which is easily attained in the shell.

osimpa is about the machine you are programming, not the "language" you
are using to do so. This means that an osimpa in cpp/gas is just as much
an osimpa as the Bash one. It means that the syntax of osimpa is mostly
the syntax of the language it is implemented in. The emphasis is on
semantics of opcodes. This is the Forth influence showing through. Syntax
does very little net work, particularly vis-a-vis assembly.

Toward a portable assembler

HLA doesn't talk about portability much. One of the reasons for the
shasm/asmacs renaming of all the Intel opcode names is to investigate the
possibilities of a portable assembler. Assembler directives like .ascii
are portable. Opcodes are another matter, but there's some play there
also. It does appear that an instruction subset portable across most
desktop CPUs is possible, and might be featureful enough to be practical.
C is considered a portable language, but portability is subjective. C
needs machine-specific sources to be useful; much of that gets shunted off
to libc.

The 386 is a subset of other comparable CPUs as far as registers go, so
shasm's A, B, C, D, SP, BP, SI and DI probably port easily, perhaps as A,
B, C, D, E, SP, I, O. 386 addressing modes are a superset of RISC, but if
the fanciest you use is "displacement + register" you probably don't break
much. A 386 instruction takes at most one explicit memory operand. HLA
allows "mov memloc to memloc", as does the Plan 9 386 assembler. This is
compembly: more than one instruction assembled for one assembler command.
It is a case of compembly for the 386 being mere assembly for a CISC like
the 68k, whose MOVE can go memory to memory. It's not a big jump, nor is
it hard to implement. The 386 addressing modes are otherwise quite
powerful. On a RISC one can compemble up to the 386's "displacement +
register + register at shift", or live without that mode on the 386 if you
want that aspect of your code to port verbatim to other machines.
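Memory-to-memory compembly on the 386 is a two-instruction affair. A
minimal sketch, assuming absolute addresses and that EAX may be clobbered
as the scratch register; copy_cell, aq and the image array are
hypothetical names for this example. A1 and A3 are the 386's
MOV EAX,moffs32 and MOV moffs32,EAX encodings.

```shell
#!/usr/bin/env bash
image=()                                   # assembled bytes, as integers 0..255
ab() { image+=( $(( $1 & 0xFF )) ); }      # assemble one byte
aq() {                                     # assemble a 32-bit "quad", little-endian
  local v=$(( $1 & 0xFFFFFFFF ))
  ab $v; ab $(( v >> 8 )); ab $(( v >> 16 )); ab $(( v >> 24 ))
}
dump() { local s; s=$(printf '%02x ' "${image[@]}"); echo "${s% }"; }

# Hypothetical compembly wird: copy_cell FROM TO (absolute addresses).
# The 386 allows only one explicit memory operand per MOV, so one command
# becomes two instructions through EAX, which is clobbered.
copy_cell() {
  ab 0xa1; aq "$1"                         # mov eax, [FROM]
  ab 0xa3; aq "$2"                         # mov [TO], eax
}

copy_cell 0x08049000 0x08049004
dump                                       # prints: a1 00 90 04 08 a3 04 90 04 08
```

On a 68k the same command would assemble to a single MOVE.L, which is the
sense in which compembly here is mere assembly there.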

The 386 does not, however, have the general pre- and post-indexed
(auto-increment/decrement) addressing that other CPUs (derived from the
PDP, I suspect) have. The 386 has "string ops" and a direction flag
instead. A library will provide the bulk of what you want that behavior
for, so a liba or libo will help reduce inter-machine code breakage.
"fill", "move" and a couple of similar routines will abstract most of that
out of the core of the thing, as in C. 386 I/O port ops can be made
portable too, but I don't need that yet, and it involves some work.
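A "fill" or "move" wird hides the string-op details behind a portable
argument list. A minimal sketch of the 386 side, assuming the wirds take
absolute addresses and counts as arguments; the names, argument order and
the image/aq helpers are hypothetical. The register roles (EDI destination,
ESI source, ECX count, AL fill value) are fixed by the 386 string ops.

```shell
#!/usr/bin/env bash
image=()                                   # assembled bytes, as integers 0..255
ab() { image+=( $(( $1 & 0xFF )) ); }      # assemble one byte
aq() {                                     # assemble a 32-bit "quad", little-endian
  local v=$(( $1 & 0xFFFFFFFF ))
  ab $v; ab $(( v >> 8 )); ab $(( v >> 16 )); ab $(( v >> 24 ))
}
dump() { local s; s=$(printf '%02x ' "${image[@]}"); echo "${s% }"; }

# Hypothetical 386 bodies for portable "fill" and "move" wirds.
fill() {     # fill DEST COUNT VALUE : store VALUE into COUNT bytes at DEST
  ab 0xbf; aq "$1"                         # mov edi, DEST
  ab 0xb9; aq "$2"                         # mov ecx, COUNT
  ab 0xb0; ab "$3"                         # mov al, VALUE
  ab 0xfc                                  # cld: string ops step upward
  ab 0xf3; ab 0xaa                         # rep stosb
}
move() {     # move SRC DEST COUNT : copy COUNT bytes from SRC to DEST
  ab 0xbe; aq "$1"                         # mov esi, SRC
  ab 0xbf; aq "$2"                         # mov edi, DEST
  ab 0xb9; aq "$3"                         # mov ecx, COUNT
  ab 0xfc                                  # cld
  ab 0xf3; ab 0xa4                         # rep movsb
}

fill 0x08049100 16 0x20
dump   # prints: bf 00 91 04 08 b9 10 00 00 00 b0 20 fc f3 aa
```

Another CPU would emit a short load/store loop behind the same names. Note
that this move copies upward, so, like C's memcpy, it is not safe for
overlapping ranges where DEST is above SRC.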

There are other things that lean osimpa toward portability, such as the
"byte" operand specifier, the use of byte/dual/quad integer size names
rather than names based on 8086 16-bit "words", the to/from/with argument
format, and so on. Leaning away, def procedures use the 386's RET imm16 to
drop the stack frame. That's also compembleable on other machines that
don't have it in one instruction.
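Two of those points in code: byte/dual/quad emitters named by size, and a
def epilogue using RET imm16 (opcode C2 followed by a 16-bit count), which
pops the return address and then discards that many more bytes of stack.
A minimal sketch; ad, aq, ret_drop and the cell count are hypothetical,
and it assumes the whole 16-cell frame sits above the return address.

```shell
#!/usr/bin/env bash
image=()                                   # assembled bytes, as integers 0..255
ab() { image+=( $(( $1 & 0xFF )) ); }      # byte: 8 bits
ad() { ab "$1"; ab $(( $1 >> 8 )); }       # dual: 16 bits, little-endian
aq() { ad "$1"; ad $(( $1 >> 16 )); }      # quad: 32 bits, little-endian
dump() { local s; s=$(printf '%02x ' "${image[@]}"); echo "${s% }"; }

CELL=4                                     # bytes per cell on the 386

# Hypothetical def epilogue: ret_drop CELLS emits RET imm16, returning and
# dropping CELLS cells of frame in one instruction.
ret_drop() { ab 0xc2; ad $(( $1 * CELL )); }

# Assumption for illustration: the full 16-cell frame lies above the
# return address, so RET discards 64 bytes.
ret_drop 16
dump      # prints: c2 40 00
```

On a machine without RET imm16 the same wird would compemble to a few
instructions: pop the return address into a register, add the frame size
to the stack pointer, and jump through the register.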

procedures

HLA provides C-style function calls. The arguments to a function are
pushed onto the stack last-arg-pushed-first. The other prevalent format is
the opposite, first-argument-pushed-first. Actually, I don't know which
order HLA uses, but it's one or the other. osimpa is more similar to BCPL,
a predecessor of C. The BCPL compiler allocates the space for a
procedure's locals, its stack frame, in the caller, but things aren't
always pushed into the frame. Sometimes a procedure needs a value but
doesn't need a copy of it, i.e. sometimes it's read-only. The caller of an
osimpa "def" procedure allocates it a 16-cell frame, and nothing is pushed
by the caller. The callee, now "self", can access the locals of the
caller. Something like a C compiler normally makes this impossible, and
for good reason, but a simple bit of discretion by the programmer buys
more flexibility and, in some cases, better performance. As long as you
don't write to the parent's locals, things stay in order, and you get
copy-on-write parameter passing implicitly, by hand. Also, since all
procedures have the same size frame, you have multi-level scoping at a
simple constant offset from self. osimpa therefore provides macros for
referring to locals near the current procedure's frame in a
self/parent/child layout. What osimpa gives up is about 45 dedicated names
for locals, and the ability to have more than 15 cells of local data for
a procedure. I like these trade-offs a lot. My osimpa-like version of
Ackermann's function at
http://www.bagley.org/~doug/shootout/bench/ackermann/alt/ackermann.gas
beats Gcc (but not ocaml), and is coded more or less as one would code a
first cut of the given algorithm. I beat Gcc without removing the two
tail-recursions that Gcc does remove.
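Because every frame is the same size, a reference to self's, the parent's
or a child's locals is one constant offset. A minimal sketch of that
arithmetic, assuming a hypothetical layout in which EBP points at the base
of self's frame, frames are 16 four-byte cells, and the stack grows down so
the parent's frame is one frame above self. The self/parent/child names
echo the text, but these function bodies are my own, not osimpa's macros.

```shell
#!/usr/bin/env bash
# Hypothetical layout: fixed-size frames, EBP at the base of self's frame,
# parent one frame above (higher addresses), child one frame below.
FRAME_CELLS=16
CELL=4

frame_offset() {            # frame_offset LEVEL CELL -> byte offset from EBP
  # LEVEL: 1 = parent, 0 = self, -1 = child (2 would be the grandparent)
  echo $(( $1 * FRAME_CELLS * CELL + $2 * CELL ))
}

local_ref() {               # local_ref LEVEL CELL -> Intel-syntax operand text
  local off
  off=$(frame_offset "$1" "$2")
  if (( off < 0 )); then echo "[ebp$off]"; else echo "[ebp+$off]"; fi
}

self()   { local_ref 0  "$1"; }
parent() { local_ref 1  "$1"; }
child()  { local_ref -1 "$1"; }

self 2      # prints: [ebp+8]
parent 3    # prints: [ebp+76]
child 0     # prints: [ebp-64]
```

The offsets never depend on which procedure is running, which is what
makes the scoping "a simple constant offset from self".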

Nov 2001

It now looks like recursion is one of the only places where full-blown
reentrancy of procedures is really useful. That is, the cost of
instantiating copies of subroutine data is only worth paying when routines
call themselves. That means that I'm writing string-op routines like
"bytescan", "copyrange" and so on using in-lining rather than subroutines,
and normal low-level register allocation by hand. This should benefit
performance. In-lining is a size hit, but, written correctly, most useful
routines are just a handful of instructions.

strings

HLA has its own string format. Strings in osimpa are similar, but are one
case of a nested set of data structures called strands, which are
collectively a variety of array-like structures. osimpa strings have a
count-cell prefix, which means they aren't strings per se, but rather
arbitrary-size ranges of arbitrary data, within the limits of the memory
address-space of the machine. A "range" in osimpa is a counted range of
memory, and the "text" wird makes a range and fills it with ASCII, just as
you write it. For expanses of text with no inserted variables it's more
convenient than C's printf().
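A counted range is just a count cell followed by the bytes. A minimal
sketch of a text-like wird, assuming a 32-bit little-endian count cell,
ASCII input, and Bash's "$*" rejoining words with single spaces; this text
function and the aq/image helpers are hypothetical stand-ins, not osimpa's
code.

```shell
#!/usr/bin/env bash
image=()                                   # assembled bytes, as integers 0..255
ab() { image+=( $(( $1 & 0xFF )) ); }      # assemble one byte
aq() {                                     # assemble a 32-bit count cell, little-endian
  local v=$(( $1 & 0xFFFFFFFF ))
  ab $v; ab $(( v >> 8 )); ab $(( v >> 16 )); ab $(( v >> 24 ))
}
dump() { local s; s=$(printf '%02x ' "${image[@]}"); echo "${s% }"; }

# Hypothetical "text": count cell first, then the characters as written.
# Assumes ASCII, so the character count equals the byte count.
text() {
  local s="$*" i
  aq ${#s}                                 # count cell: length in bytes
  for (( i = 0; i < ${#s}; i++ )); do
    ab "$(printf '%d' "'${s:i:1}")"        # character code of each byte
  done
}

text Hi there
dump   # prints: 08 00 00 00 48 69 20 74 68 65 72 65
```

Since the count comes first, the range can hold any bytes, NULs included,
which is why it is a range rather than a string per se.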

flow controls

HLA provides various non-GOTO constructs, such as WHILE loops and so on.
The only intra-procedure flow control osimpa provides is jump tables, or
execution arrays. These are called xrays in osimpa. I suspect they have
performance benefits versus C switch/case and similar, but I have no
examples yet. Even if a C compiler can figure out that a switch/case is
bounded and can be made into a table, the incredible sophistication that
requires is avoided in osimpa. Like Forth historically, the internals
remain relatively simple. 
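On the 386 an xray dispatch can be a single indirect jump through a table
of addresses. A minimal sketch, assuming the index is already in EAX and
known to be in range (there is no bounds check here), and that the table
and target addresses, made up for the example, are placed elsewhere;
xray_jump and xray_table are hypothetical names.

```shell
#!/usr/bin/env bash
image=()                                   # assembled bytes, as integers 0..255
ab() { image+=( $(( $1 & 0xFF )) ); }      # assemble one byte
aq() {                                     # assemble a 32-bit address, little-endian
  local v=$(( $1 & 0xFFFFFFFF ))
  ab $v; ab $(( v >> 8 )); ab $(( v >> 16 )); ab $(( v >> 24 ))
}
dump() { local s; s=$(printf '%02x ' "${image[@]}"); echo "${s% }"; }

# jmp [TABLE + eax*4] is FF /4 with a SIB byte:
#   ModR/M = 00 100 100 = 0x24  (mod 0, /4 = near indirect JMP, rm 4 = SIB follows)
#   SIB    = 10 000 101 = 0x85  (scale 4, index EAX, base 5 with mod 0 = disp32)
xray_jump()  { ab 0xff; ab 0x24; ab 0x85; aq "$1"; }          # xray_jump TABLE
xray_table() { local a; for a in "$@"; do aq "$a"; done; }    # one quad per entry

xray_jump 0x08049200
xray_table 0x08048000 0x08048010
dump   # prints: ff 24 85 00 92 04 08 00 80 04 08 10 80 04 08
```

The dispatch costs one instruction regardless of the number of cases,
which is the performance argument for xrays over a chain of compares.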

The key to maintainable code is not the avoidance of GOTO, it is keeping
procedures small and/or understandable. Assembly-style conditional
branches have a big advantage over the high-level equivalents: they are
unambiguous. The code may or may not be clear, but what an individual
branch does is unmistakable. Keep in mind that e.g.

per_cell_that_isnt_a_comma

is a valid branch target label name.

ease of use

The accepted idea about high level languages is that they are about a
decimal order of magnitude easier to write code in than assembly.
Considering how "easy" C is, or Forth, or anything else that gets within a
mile of hand-coded assembly, they must be making that comparison using
some very bad assemblers. The Intel names for the 386 opcodes, for
example, are for the most part horrid. LOOP is fundamentally misleading:
386 LOOP decrements ECX and branches if the result is nonzero, and it has
no problem doing forward branches, which aren't loops. LOOP is now
"when C-1" in osimpa. Simple things like
that make things much more productive, in my experience. Writing H3sm
using the asmacs macros that subsequently became shasm was not much harder
than C, if at all. With a few other improvements, and perhaps a degree of
portability, a reassessment of the relative merits of high and mid-level 
languages may hold some substantive surprises.
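For concreteness, a sketch of "when C-1" as an emitter for Intel LOOP
(opcode E2 with a signed 8-bit displacement), showing that forward and
backward targets encode the same way. It assumes targets are given as byte
offsets into the image; the when function and its argument form are
hypothetical, not osimpa's code.

```shell
#!/usr/bin/env bash
image=()                                   # assembled bytes, as integers 0..255
ab() { image+=( $(( $1 & 0xFF )) ); }      # assemble one byte (negatives wrap)
dump() { local s; s=$(printf '%02x ' "${image[@]}"); echo "${s% }"; }

# Hypothetical "when C-1 TARGET": Intel LOOP. It decrements ECX and branches
# to TARGET if ECX is then nonzero. rel8 is measured from the end of this
# 2-byte instruction.
when() {
  [ "$1" = "C-1" ] || { echo "when: only C-1 is sketched here" >&2; return 1; }
  local rel=$(( $2 - (${#image[@]} + 2) ))
  if (( rel < -128 || rel > 127 )); then
    echo "when: target out of rel8 range" >&2; return 1
  fi
  ab 0xe2; ab "$rel"
}

when C-1 10       # forward branch from offset 0 to offset 10: rel8 = +8
when C-1 0        # backward branch from offset 2 to offset 0: rel8 = -4
dump              # prints: e2 08 e2 fc
```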
