Re: Pascal vs C style string ?

Theo.Norvell@comlab.oxford.ac.uk (Theo Norvell)
Thu, 30 Jun 1994 14:28:21 GMT

          From comp.compilers


Newsgroups: comp.compilers
From: Theo.Norvell@comlab.oxford.ac.uk (Theo Norvell)
Keywords: C, Pascal, design
Organization: Oxford University Computing Laboratory, UK
References: 94-06-214 94-06-220
Date: Thu, 30 Jun 1994 14:28:21 GMT

<nandu@cs.clemson.edu> wrote:
> One hack around that could be to encode the zero byte as
> zero-zero bytes. The decoding routine identifies consecutive
> zero-zero bytes as the encoding of a single legal zero byte and
> understands that a singly occurring zero byte is actually the end of
> string marker. I believe this hack is used in transmitting packets


Stefan Monnier <monnier@di.epfl.ch> writes:
>How convenient. That reminds me of the ugly hacks due to ms-dos's
>encoding of end-of-line as 'CR-LF'. The encoding makes sense, ...


Here I have to disagree. The suggestion of encoding a single NUL byte as
two consecutive NUL bytes does not make sense. One generally does not
have control over the byte that follows the end marker of one's string.
Should that byte happen to be a NUL, the end of the string will be
misinterpreted as an encoded NUL byte. In other words, this encoding is a
time bomb. Changing the convention for marking the end of a string to
something else --say a NUL followed by something other than a NUL-- would
make this merely an ugly hack.
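
A minimal sketch of the failure, in Python for brevity. The function
names (encode, encode_alt, decode) and the choice of 0x01 as the
"something other than a NUL" are assumptions made for illustration; they
do not come from any real library or from the articles quoted above.

```python
def encode(payload: bytes) -> bytes:
    """Proposed scheme: double each NUL, end with a single NUL."""
    return payload.replace(b"\x00", b"\x00\x00") + b"\x00"


def encode_alt(payload: bytes) -> bytes:
    """Alternative: double each NUL, end with NUL followed by a non-NUL.

    The 0x01 byte is an arbitrary choice for this sketch.
    """
    return payload.replace(b"\x00", b"\x00\x00") + b"\x00\x01"


def decode(buf: bytes) -> bytes:
    """Read one string from the start of buf.

    NUL NUL is a literal NUL; a NUL followed by anything else (or by the
    end of the buffer) ends the string.
    """
    out = bytearray()
    i = 0
    while i < len(buf):
        if buf[i] != 0:
            out.append(buf[i])
            i += 1
        elif i + 1 < len(buf) and buf[i + 1] == 0:
            out.append(0)
            i += 2
        else:
            break
    return bytes(out)


# Whatever happens to sit in memory right after the string.
neighbour = b"\x00junk\x00"

# Harmless when the following byte is not a NUL ...
assert decode(encode(b"abc") + b"xyz") == b"abc"
# ... but a following NUL turns the terminator into an "encoded NUL".
assert decode(encode(b"abc") + neighbour) == b"abc\x00junk"
# With a NUL + non-NUL terminator the neighbour no longer matters.
assert decode(encode_alt(b"abc") + neighbour) == b"abc"
```

The decoder is the same in both cases; only the end marker differs. The
second scheme still costs two bytes per embedded NUL and a two-byte
terminator, which is why it remains an ugly hack.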


Lest one think that silly encodings are unlikely to make it into
production, consider CDC's NOS operating system. In one of the common
file formats, the NUL byte (6-bit bytes) was used to encode a colon,
except that two NUL bytes in a row in positions 10k+8 and 10k+9 (for some
k) encoded an end of line. (Line lengths were thus forced to be 8 mod
10.) Reportedly, a prominent computer scientist, on being told of this,
suggested that a good talk at a software reliability conference would
consist of the speaker describing this file format and then simply
allowing the audience to ponder its possible consequences for the
remainder of his or her time period.
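
One such consequence can be sketched in Python. This follows only the
rule as stated above, not the real NOS format: codes are plain integers
0..63, positions are counted from the start of the file, the name
split_lines is hypothetical, and no character set beyond "zero means
colon" is assumed.

```python
COLON = ":"


def split_lines(codes):
    """Split a sequence of 6-bit codes (ints 0..63) into lines.

    A zero code is a colon, unless it and the next code are both zero
    and sit at positions 10k+8 and 10k+9, in which case the pair marks
    an end of line.
    """
    lines, current = [], []
    i = 0
    while i < len(codes):
        is_eol = (
            codes[i] == 0
            and i % 10 == 8
            and i + 1 < len(codes)
            and codes[i + 1] == 0
        )
        if is_eol:
            lines.append(current)
            current = []
            i += 2
        else:
            current.append(COLON if codes[i] == 0 else codes[i])
            i += 1
    if current:
        lines.append(current)
    return lines


# Two colons elsewhere in a line survive as colons.
assert split_lines([1, 0, 0, 1]) == [[1, COLON, COLON, 1]]

# Eight characters, then "::", then three more characters, all meant as
# one line: the colons land on positions 8 and 9 and the line is split.
one_line = [1] * 8 + [0, 0] + [2] * 3
assert split_lines(one_line) == [[1] * 8, [2, 2, 2]]
```

Under this reading, whether a pair of colons is text or a line break
depends solely on where in the file it happens to fall.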


Theo Norvell