Re: Internal Representation of Strings

"Armel" <armelasselin@hotmail.com>
Mon, 2 Mar 2009 22:58:27 +0100

          From comp.compilers


From: "Armel" <armelasselin@hotmail.com>
Newsgroups: comp.compilers
Date: Mon, 2 Mar 2009 22:58:27 +0100
Organization: les newsgroups par Orange
References: 09-02-051 09-02-068 09-02-078 09-02-120 09-02-125 09-02-134 09-03-001
Keywords: storage, GC
Posted-Date: 03 Mar 2009 12:43:12 EST

Tony wrote:
> "Armel" <armelasselin@hotmail.com> wrote in message
>>> Simply a length and the character data immediately following, probably.
>>> Reallocation in memory is going to have to be done for dynamic strings
>>> of course/maybe depending on the application. I think that will be where
>>> I'll start and with a 32-bit length.
>>
>> this is a common implementation, which is rather cool when strings are
>> immutable or in Copy On Write (you need to add a reference count then
>> along the length).
>
> Are you suggesting that it may not be an appropriate implementation for
> mutable strings and a library not using the COW technique?


This is also a good representation for mutable strings, though it
appears that mutable strings are often not a good idea at all. It is
more useful to have two concepts: a StringBuilder, which is used to
_update_ strings, and a String, which _produces_ a new String on each
operation. The benefit is that your users are less tempted to write
loops of str = str + <something>; instead they do strbuilder =
strbuilder + <something> and then str = strbuilder.String( ), where
StringBuilder is optimized for 'many +' (or many other things such as
inserts, removes, replaces, in fact the generic 'splice' command...).
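
To make the builder idea concrete, here is a minimal sketch in C. The
names (StrBuilder, sb_append, sb_to_string, sb_free) are invented for
this example and do not come from any particular library; it only
shows the 'many +' case, with doubling growth assumed as the
optimization, and it ignores size overflow.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; all names here are illustrative. */
typedef struct {
    char  *data;  /* heap buffer, NUL-terminated once allocated */
    size_t len;   /* bytes in use, excluding the terminator */
    size_t cap;   /* bytes allocated */
} StrBuilder;

/* Append n bytes; returns 0 on success, -1 if allocation fails. */
int sb_append(StrBuilder *sb, const char *s, size_t n)
{
    assert(sb != NULL && (s != NULL || n == 0));
    if (sb->len + n + 1 > sb->cap) {
        size_t cap = sb->cap ? sb->cap : 16;
        while (cap < sb->len + n + 1)
            cap *= 2;            /* doubling keeps appends amortized O(1) */
        char *p = realloc(sb->data, cap);
        if (p == NULL)
            return -1;           /* old buffer is still valid */
        sb->data = p;
        sb->cap = cap;
    }
    if (n)
        memcpy(sb->data + sb->len, s, n);
    sb->len += n;
    sb->data[sb->len] = '\0';
    return 0;
}

/* Produce the final string: a right-sized copy the caller must free(). */
char *sb_to_string(const StrBuilder *sb)
{
    char *out = malloc(sb->len + 1);
    if (out == NULL)
        return NULL;
    if (sb->len)
        memcpy(out, sb->data, sb->len);
    out[sb->len] = '\0';
    return out;
}

void sb_free(StrBuilder *sb)
{
    free(sb->data);
    sb->data = NULL;
    sb->len = sb->cap = 0;
}
```

A zero-initialized StrBuilder (StrBuilder sb = {0};) is a valid empty
builder. A loop of sb_append calls reallocates only a logarithmic
number of times, whereas str = str + <something> on an immutable
String allocates and copies on every iteration.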


>> this is indeed _very_ useful to keep a few zero bytes at the end,
>> whenever calling a C-like API, it avoids memory
>> allocation/de-allocation and copy at API call time... malloc/free
>> are extremely time consuming (with respect to simply putting a zero
>> at the end of string), and this need for temporary-zero-ended copy
>> will clearly not be in the "I think of this as a big task" when
>> calling very simple APIs.


> My goal is to get away from all the APIs that use null-terminated strings,
> so I will be replacing all of that. Not needing that null terminator would
> be an indication of success of a string implementation that wished to
> depart from that paradigm.


IMHO, don't get too far away from them; they often do things well and
portably that you might do less well and less portably.
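
As a rough illustration of the "length plus spare zero byte" layout
discussed above, here is a small C sketch. The type and function names
(LStr, lstr_new, lstr_cstr) are hypothetical, and the 32-bit length
simply follows the choice mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: 32-bit length, then the bytes, then one spare NUL. */
typedef struct {
    uint32_t len;     /* number of bytes, not counting the spare NUL */
    char     data[];  /* len bytes followed by '\0' (C99 flexible array) */
} LStr;

/* Allocate header and characters in one block; caller releases with free(). */
LStr *lstr_new(const char *bytes, uint32_t len)
{
    assert(bytes != NULL || len == 0);
    LStr *s = malloc(sizeof *s + (size_t)len + 1);
    if (s == NULL)
        return NULL;
    s->len = len;
    if (len)
        memcpy(s->data, bytes, len);
    s->data[len] = '\0';  /* the spare zero byte */
    return s;
}

/* No allocation and no copy: the stored bytes already end in NUL. */
const char *lstr_cstr(const LStr *s)
{
    return s->data;
}
```

The pointer returned by lstr_cstr can be handed directly to a C-like
API such as fopen or puts. The caveat is that a string containing
embedded zero bytes will look truncated to such an API, even though
len still records its full size.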


Null-terminated strings are as good as other representations in many
cases, though they clearly do not support the "mid of" representation
(i.e. each string being an excerpt of a larger string). Though the
excerpt representation may be larger in some cases (as it needs ptr +
start + length), it proved to be very efficient in many projects that
I developed (it drastically reduces the number of allocations... and
failures).
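
A minimal C sketch of the excerpt representation, assuming the ptr +
start + length layout described above. The names (StrSlice, slice_of,
slice_eq) are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical excerpt type: ptr + start + length. */
typedef struct {
    const char *base;   /* the larger string; not owned by the slice */
    size_t      start;  /* offset of the excerpt within base */
    size_t      len;    /* number of bytes in the excerpt */
} StrSlice;

/* Build an excerpt without allocating or copying anything. */
StrSlice slice_of(const char *base, size_t base_len, size_t start, size_t len)
{
    assert(base != NULL);
    assert(start <= base_len && len <= base_len - start);
    StrSlice s = { base, start, len };
    return s;
}

/* Compare an excerpt with a NUL-terminated literal; returns 1 if equal. */
int slice_eq(StrSlice a, const char *lit)
{
    size_t n = strlen(lit);
    return a.len == n && memcmp(a.base + a.start, lit, n) == 0;
}
```

For example, splitting "key=value" into slice_of(src, 9, 0, 3) and
slice_of(src, 9, 4, 5) yields both parts with no allocation at all.
The price is that the larger string must outlive every excerpt taken
from it, and an excerpt is not zero-terminated, so passing one to a
C-like API still requires a temporary copy.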


Armel


