Re: Pascal vs C style string ?

nandu@cs.clemson.edu
Wed, 29 Jun 1994 12:53:52 GMT

          From comp.compilers

Newsgroups: comp.compilers
From: nandu@cs.clemson.edu
Keywords: C, Pascal, design
Organization: Compilers Central
References: 94-06-175 94-06-240
Date: Wed, 29 Jun 1994 12:53:52 GMT

!> One hack around that could be to encode the zero byte as
!> zero-zero bytes. The decoding routine identifies consecutive
!
!this is a pretty bad hack. not only does it require that you
!possibly look beyond your allocated space. it also prevents such
!practice as allocating memory and filling it with 0 and then
!putting a null-term string in it. now you have "abc\0\0\0\0"
!so where is the end? or what is the length?
!.....larry


Not really. First of all, a string constant cannot be written the way you
have it above. As mentioned earlier, both the _encoding_ and the _decoding_
mechanisms must ensure that every \0 that is part of the string occurs as a
pair and that the end of the string is a solitary \0. Defining a string
constant as "abc\0\0\0\0" (instead of "abc\0\0\0\0\0") violates that rule:
the four \0 bytes read as two legal \0 characters with no terminator after
them, which is why the end of the string and its length cannot be
determined from your example. From this point on, it is essential that the
existence of an _implicit_ end-of-string marker is not taken for granted.


      ENCODING:


The encoders include the programmer (who defines string constants) and
string operations like copy and concatenate that create new strings.
Since an implicit end-of-string marker is absent, all strings _must_ be
explicitly terminated by a solitary \0. The only modification needed in
the copy and concatenate operations is that each legal \0 character be
written as the pair \0\0. There are two disadvantages: the additional
space needed to duplicate \0 characters, and the need to look one
character beyond the end of the string to recognize the end-of-string
marker. But once we have decided to hack, it is just routine to add
another hack: allocate one extra character at the end of every string. It
can contain anything other than a \0, since it is only used to confirm
that the solitary \0 is indeed the end-of-string marker.
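
As a rough illustration of the encoding rule above, here is a minimal
sketch in C. The names zz_encode, zz_encoded_bound and ZZ_GUARD are made
up for this example, and the guard value 0xFF is an arbitrary choice; any
non-zero byte would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guard value: any byte other than 0 would serve. */
#define ZZ_GUARD 0xFF

/* Worst case for n raw bytes: every byte is a \0 and gets doubled,
   plus the solitary terminator and the guard byte. */
size_t zz_encoded_bound(size_t n)
{
    return 2 * n + 2;
}

/* Encode n raw bytes (which may contain \0) from src into dst.
   dst must have room for zz_encoded_bound(n) bytes.
   Returns the number of bytes written, terminator and guard included. */
size_t zz_encode(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t out = 0;

    assert(dst != NULL);
    assert(src != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        dst[out++] = src[i];
        if (src[i] == 0)
            dst[out++] = 0;   /* a legal \0 is stored as the pair \0\0 */
    }
    dst[out++] = 0;           /* solitary \0: the end-of-string marker */
    dst[out++] = ZZ_GUARD;    /* non-zero byte that keeps the marker solitary */
    return out;
}
```

With this sketch, the three raw bytes 'a', \0, 'b' come out as the six
bytes 'a', \0, \0, 'b', \0, guard.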


      DECODING:


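The decoders include the string operations that read strings (again the
copy and concatenate operations, and the print operation). They must
maintain the same invariant as above: a \0 followed by another \0 is a
single legal \0 character, and a \0 followed by anything else ends the
string.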
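
A decoder for this scheme might look like the sketch below. It assumes
the encoded string ends with a solitary \0 followed by one non-zero guard
byte, as described under ENCODING; the name zz_decode is hypothetical.
Passing NULL as the destination only measures the decoded length, which
is how a strlen-like operation could be built.

```c
#include <assert.h>
#include <stddef.h>

/* Decode a zero-doubled string starting at enc.
   If dst is not NULL, the raw bytes are stored there; the caller must
   provide enough room. Returns the number of raw bytes decoded. */
size_t zz_decode(const unsigned char *enc, unsigned char *dst)
{
    size_t in = 0, out = 0;

    assert(enc != NULL);

    for (;;) {
        unsigned char c = enc[in++];
        if (c == 0) {
            /* Look one byte ahead; for the terminator this reads the guard. */
            if (enc[in] != 0)
                break;        /* solitary \0: end of string */
            in++;             /* pair \0\0: consume the second half */
        }
        if (dst != NULL)
            dst[out] = c;     /* ordinary byte, or one legal \0 */
        out++;
    }
    return out;
}
```

The one-byte lookahead after the terminator always lands on the guard
byte, so the decoder never reads outside the space allocated for the
string.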


Choosing between this scheme and maintaining an explicit string length
depends upon the typical length of the strings and the frequency of
occurrence of legal \0 characters. If the length of every string in the
application can be represented in a byte, then maintaining a string length
costs just one byte of overhead per string, no matter what the string
contains. On the other hand, in real applications, text editors for
example, strings are typically longer than what can be encoded in a byte.
So each string has two or more bytes of overhead, even if it does not
contain any \0 characters.


Encoding \0 _always_ incurs one byte of overhead per string, for the extra
non-\0 character at the end. Further overhead arises only where a \0 has
to be duplicated. Clearly, encoding is a win (over the scheme that
maintains lengths) in applications that handle long strings and where
_legal_ \0 characters are rare.
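
To make the comparison concrete, the sketch below counts total storage
under each representation. The accounting is an assumption of this
example: the encoded form is charged for both the solitary terminator and
the extra trailing character, and the width of the length field is left
to the caller. The names zz_storage and lp_storage are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes stored by the zero-doubling scheme: the payload, one extra byte
   per legal \0, the solitary terminator, and the trailing guard byte. */
size_t zz_storage(const unsigned char *raw, size_t n)
{
    size_t zeros = 0;

    assert(raw != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        if (raw[i] == 0)
            zeros++;
    return n + zeros + 2;
}

/* Bytes stored by a length-prefixed string whose count field is
   len_bytes wide (an assumed parameter, e.g. 1, 2 or 4). */
size_t lp_storage(size_t n, size_t len_bytes)
{
    return n + len_bytes;
}
```

Under this accounting, a 300-byte string with no \0 bytes takes 302 bytes
encoded, the same as with a two-byte length field and two fewer than with
a four-byte one. Each legal \0 adds one byte to the encoded form only.
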
--
Nandakumar Sankaran, G34 Jordan, Comp. Sci. Dept., Clemson Univ. SC 29634
311-8 Old Greenville Hwy. Clemson SC 29631-1651 (803)653-7749
http://www.cs.clemson.edu/~nandu/nandu.html nandu@cs.clemson.edu
--

