[mu TECH] byte #1767 or piping /usr/bin/hexd

From: Alfie Costa (agcosta@gis.net)
Date: Fri Mar 31 2000 - 20:50:48 CEST


For any code detectives and programmer gumshoes out there, a strange and
unsolved mystery concerning pipes and redirects...

Wandering around the mu file tree, I discovered '/usr/bin/hexd'. 'hexd' reads
in binary files and outputs hex dumps; it also converts its own hex dumps back
to binary files. It's used by the 'muhex' editor. I meant to get back to
improving that rustic 'file' script, and 'hexd' seemed perfect, as it would be
a nice tool to avoid some problems that ash's 'case' statement has with binary
data.

Now 'hexd' didn't have the two features I wanted -- to output only hex data, or
only text data. MuLinux comes with 'hexd.c' though, so I tweaked that and
added new command-line switches to conveniently include these features. Also
renamed some variables, re-indented the brackets, and details like that.

My new code seemed to work, and was almost ready to be uploaded to this list.
First I did some tests though, just to make sure.

------

A reminder on the 'hexd' syntax may help. 'hexd' reads from standard input and
writes to standard output. So, to produce a hex dump of '/bin/sed', one would
type:

hexd -c < /bin/sed

...and the first line of output looks like this:

000000: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 02 [.ELF.............]

Once such a hex dump is written to a file (say 'sed.hexdump'), this
next command will translate it back to binary:

hexd -d < sed.hexdump > sed.hexcode

...and so 'sed.hexcode' should be identical to '/bin/sed'. Well, that's the
'hexd' syntax.
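
Where 'hexd' isn't at hand, the same round trip can be imitated with standard
tools. The sketch below is only a stand-in, not the 'hexd.c' code: the names
'hex_encode' and 'hex_decode' are made up here, it assumes a POSIX 'od' and
'printf', and its dump is hex-only, without hexd's offset and text columns.

```shell
# hex_encode: stdin (binary) -> stdout (hex bytes separated by whitespace).
hex_encode() {
    od -An -v -t x1
}

# hex_decode: stdin (hex bytes) -> stdout (binary).
# Each hex byte is converted to an octal escape that printf then emits.
# Slow on large files, but needs no extra tools.
hex_decode() {
    for h in $(cat); do
        printf "\\$(printf '%03o' "0x$h")"
    done
}
```

With these defined, 'hex_encode < /bin/sed > sed.hexdump' followed by
'hex_decode < sed.hexdump > sed.hexcode' mirrors the two 'hexd' commands above.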

------

An obvious test was to feed the new hexd's output into its input, like this:

myhexd -c < /bin/sed | myhexd -d > sed.hexcode

...and then see if the files were the same. I wasn't sure how to do that in mu,
so I used 'ls'. That isn't really much of a test, but it does at least tell if
files aren't the same length, which is all that matters to appreciate the
following problem. So...

ls -l /bin/sed sed.hexcode

...displays...

-rwxr-xr-x 1 root 88 21512 Jun 6 1999 /bin/sed*
-rw-rw-r-- 1 root root 21401 Mar 30 18:52 sed.hexcode

No, these two files aren't the same length. Then I tried the same test using
the original mu 'hexd', and surprisingly, that did the same thing. It seemed
odd this hadn't already been discovered, more about which below...

Comparing hex dumps of both files showed that 'sed.hexcode' began to deviate at
byte #1767. One byte was missing at that address.
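
A byte-by-byte comparison is a stronger test than 'ls', and it reports the
first differing offset directly. A sketch, assuming 'cmp' and 'awk' are
installed (whether mu ships them is not checked here); the function name
'first_diff' is made up for this example. Note that 'cmp' counts bytes from 1,
which may or may not match the numbering used above.

```shell
# first_diff A B: print the offset (counted from 1) of the first byte that
# differs between A and B. Prints nothing if no differing byte is found
# before the end of the shorter file.
first_diff() {
    cmp -l "$1" "$2" 2>/dev/null | awk 'NR == 1 { print $1 }'
}
```

For a plain same-or-different answer, 'cmp -s /bin/sed sed.hexcode' exits with
status 0 only when the two files are identical; 'first_diff /bin/sed
sed.hexcode' then shows where they part ways.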

Testing other files, it seems that if the data file being tested is shorter
than 1767 bytes, the second file will be the same length as the first. Whether
the file is text or binary doesn't affect this; the two files always start to
differ at byte #1767, or 0x6E7, or octal 03347.
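
The three spellings of the offset, and the size of the shortfall in the 'ls'
listing of the piped test, can be double-checked with shell arithmetic. This is
only a sanity check, and it assumes a 'printf' that accepts C-style hex and
octal constants.

```shell
# Decimal, hex and octal forms of the offset should all print as 1767.
printf '%d %d %d\n' 1767 0x6E7 03347

# Bytes lost in the piped round trip: 21512 - 21401.
echo $((21512 - 21401))
```

The first command prints 1767 three times and the second prints 111, so the
piped copy is 111 bytes short in total, not just one.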

So, I closely inspected the source code, as the 'decode()' routine in 'hexd.c'
did look tricky. Tried many experiments, but got no improvements. And why
always byte #1767?

Later on I tried the same experiment one piece at a time, like this...

hexd -c < /bin/sed > sed.hexdump
hexd -d < sed.hexdump > sed.hexcode
ls -l /bin/sed sed.hexcode

...and so...

-rwxr-xr-x 1 root 88 21512 Jun 6 1999 /bin/sed*
-rw-rw-r-- 1 root root 21512 Mar 30 19:09 sed.hexcode

Now both files are the same, or at least they're the same length. Assuming
they both really are the same, this explains why the problem hadn't been
discovered before, and why 'muhex' hasn't ruined any files.

I thought it might be a problem with mu, and so tried it with Debian. Debian
was no different.

Summing up: apparently, it makes a difference if something is piped to 'hexd'
with "|", or written to a file first. Why does this happen?
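
To reproduce the two cases side by side, the pipe and the intermediate file can
be wrapped in a pair of helper functions. A sketch only: 'roundtrip_pipe' and
'roundtrip_file' are names invented here, 'mktemp' and 'cmp' are assumed to be
available, and the encode and decode commands are passed as strings and left
unquoted on purpose so that the shell splits them into words.

```shell
# roundtrip_pipe ENCODE DECODE FILE
# Pipe FILE through ENCODE and DECODE; succeed if the result equals FILE.
roundtrip_pipe() {
    rt_out=$(mktemp) || return 2
    $1 < "$3" | $2 > "$rt_out"
    if cmp -s "$3" "$rt_out"; then rt_rc=0; else rt_rc=1; fi
    rm -f "$rt_out"
    return $rt_rc
}

# roundtrip_file ENCODE DECODE FILE
# Same, but the dump is written to a temporary file before decoding.
roundtrip_file() {
    rt_dump=$(mktemp) || return 2
    rt_out=$(mktemp) || { rm -f "$rt_dump"; return 2; }
    $1 < "$3" > "$rt_dump"
    $2 < "$rt_dump" > "$rt_out"
    if cmp -s "$3" "$rt_out"; then rt_rc=0; else rt_rc=1; fi
    rm -f "$rt_dump" "$rt_out"
    return $rt_rc
}
```

For example, "roundtrip_pipe 'hexd -c' 'hexd -d' /bin/sed || echo pipe differs"
and the same call with 'roundtrip_file' would show whether only the piped
variant loses data.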
