Re: Internal Representation of Strings

anton@mips.complang.tuwien.ac.at (Anton Ertl)
Sat, 14 Feb 2009 17:51:29 GMT

          From comp.compilers

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Date: Sat, 14 Feb 2009 17:51:29 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 09-02-051
Keywords: storage
Posted-Date: 14 Feb 2009 16:52:10 EST

"Tony" <tony@my.net> writes:
>What are some good ways/concepts of internal string representation?
>Are/should string literals, fixed-length strings and dynamic-length strings
>be handled differently? My first tendency is to avoid like the plague
>NUL-terminated strings (aka, C strings) and to opt for some kind of array
>with a length at the beginning followed by the characters that could be
>encapsulated at the library level with appropriate functions.


In Forth several different string representations have been used, but
among them the representation as start+length (both in the descriptor)
turned out to be the most flexible, and therefore won out. The
advantages are:


+ all characters can be represented (unlike 0-terminated strings).


+ the length is only limited by the size of the length field (which is
a full machine word in Forth, i.e., you can use the whole address
space).


+ it's easy to create a substring of any kind without needing to copy
characters.


+ it's easier to parallelize operations on strings in this
representation than in some others (particularly 0-terminated
strings).


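The disadvantage is that you have to pass two values around. Also, if
you use garbage collection, you may not want to make use of the third
advantage: a substring that shares storage with its parent keeps the
whole parent string alive, and the collector has to cope with pointers
into the middle of an object.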
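
To make the start+length representation concrete for readers who do
not use Forth, here is a minimal sketch in C. It is only an
illustration under my own assumptions: the type name str and the
functions str_from, str_sub and str_eq are hypothetical, not taken
from any Forth system or C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A string value: a descriptor holding the start address and the length.
   All names here (str, str_from, str_sub, str_eq) are illustrative only. */
typedef struct {
    const char *start;  /* first character; no terminator is required */
    size_t      len;    /* number of characters, one full machine word */
} str;

/* Describe len bytes starting at p. Embedded 0 bytes are ordinary characters. */
str str_from(const char *p, size_t len)
{
    str s = { p, len };
    return s;
}

/* Substring of n characters starting at offset pos.
   Only the descriptor changes; no character is copied. */
str str_sub(str s, size_t pos, size_t n)
{
    assert(pos <= s.len && n <= s.len - pos);  /* stay inside the parent */
    return str_from(s.start + pos, n);
}

/* Equality compares lengths first, then contents; nothing scans for a 0. */
int str_eq(str a, str b)
{
    return a.len == b.len &&
           (a.len == 0 || memcmp(a.start, b.start, a.len) == 0);
}
```

With these definitions, str_sub(str_from(buf, 5), 2, 3) yields a
descriptor that points at buf+2 with length 3, even if buf[2] is a 0
byte. The cost is visible too: every function takes or returns two
machine words per string.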


> But just a length seems like not enough information: the capacity
>(array length) also would be nice to have around.


You need to differentiate between string values (which are what I
described above) and string buffers (where you store the character
data of strings). String buffers need the capacity information; string
values don't.
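
The distinction can be sketched in C as follows. Again this is only an
illustration: strbuf, strbuf_append, strbuf_view and strbuf_free are
hypothetical names, and the doubling growth policy with an initial
capacity of 16 is my assumption, not something the representation
requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* String value: start + length, no capacity (illustrative name). */
typedef struct {
    const char *start;
    size_t      len;
} str;

/* String buffer: owns storage, so it also tracks the capacity. */
typedef struct {
    char  *data;  /* allocated storage, or NULL when empty */
    size_t len;   /* characters currently stored */
    size_t cap;   /* characters the storage can hold */
} strbuf;

/* Append n bytes from p, growing the storage if needed.
   Returns 0 on success, -1 on overflow or allocation failure. */
int strbuf_append(strbuf *b, const char *p, size_t n)
{
    if (n > SIZE_MAX - b->len)
        return -1;
    if (n > b->cap - b->len) {
        size_t need = b->len + n;
        size_t cap = b->cap ? b->cap : 16;   /* assumed initial capacity */
        while (cap < need) {
            if (cap > SIZE_MAX / 2) { cap = need; break; }
            cap *= 2;                        /* assumed growth policy */
        }
        char *d = realloc(b->data, cap);
        if (d == NULL)
            return -1;
        b->data = d;
        b->cap = cap;
    }
    if (n != 0)
        memcpy(b->data + b->len, p, n);
    b->len += n;
    return 0;
}

/* The current contents as a string value. The value carries no capacity,
   and it becomes invalid if a later append moves the storage. */
str strbuf_view(const strbuf *b)
{
    str s = { b->data, b->len };
    return s;
}

/* Release the storage and reset the buffer to empty. */
void strbuf_free(strbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

Only the code that appends needs cap; everything that merely reads the
string gets the two-word value from strbuf_view.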


- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/


