From: Marco van de Voort <marcov@stack.nl>
Newsgroups: comp.compilers
Date: Sat, 14 Feb 2009 17:57:28 +0000 (UTC)
Organization: Stack Usenet News Service
References: 09-02-051
Keywords: storage
Posted-Date: 14 Feb 2009 16:51:28 EST

On 2009-02-14, Tony <tony@my.net> wrote:
> What are some good ways/concepts of internal string representation?
> Are/should string literals, fixed-length strings and dynamic-length strings
> handled differently? My first tendency is to avoid like the plague
> NUL-terminated strings (aka, C strings) and to opt for some kind of array
> with a length at the beginning followed by the characters that could be
> encapsulated at the library level with appropriate functions. But just a
> length seems like not enough information: the capacity (array length) also
> would be nice to have around. All thoughts, old and novel, welcome.

Have a look at Delphi's string types, most notably the AnsiString type.
- String is a first-class type.
- The value is a pointer to the first char of a char array.
- The length and ref count are stored before the first char (at negative
offsets from the pointer).
- The capacity is not stored in the string; it is tracked by the memory
manager.
- Although it has a length, it is also double #0 terminated, so for read
purposes it can be passed to C code.
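
To make the layout above concrete, here is a rough sketch in C (not Delphi's actual RTL code). The names StrHeader, Str, str_header, str_new, str_length and str_free are made up for illustration, and the 32-bit field widths and the field order are assumptions; the real header may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored immediately before the character data.
   Field order follows the description in the post; widths are assumed. */
typedef struct {
    int32_t length;    /* number of chars, excluding the terminators */
    int32_t refcount;  /* number of references to this string */
} StrHeader;

/* The string value itself is just a pointer to the first char. */
typedef char *Str;

/* Reach the header through a negative offset from the char pointer. */
StrHeader *str_header(Str s) {
    return (StrHeader *)(s - sizeof(StrHeader));
}

/* Allocate header + n chars + two NUL terminators in one block. */
Str str_new(const char *src, size_t n) {
    char *block = malloc(sizeof(StrHeader) + n + 2);
    if (block == NULL) return NULL;

    StrHeader *h = (StrHeader *)block;
    h->length = (int32_t)n;
    h->refcount = 1;

    Str s = block + sizeof(StrHeader);
    memcpy(s, src, n);
    s[n] = '\0';
    s[n + 1] = '\0';
    return s;
}

/* O(1) length lookup: no scan for the terminator is needed. */
int32_t str_length(Str s) {
    return str_header(s)->length;
}

void str_free(Str s) {
    free(str_header(s));
}
```

Because the value points at the chars rather than at the header, it can be handed to strlen() or puts() unchanged for reading. The stored length stays authoritative, so a string with an embedded #0 still reports its full length through str_length() even though C routines stop at the first #0.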

Literals are encoded with the same layout
([length] [ref count] [length bytes of char data] #0#0) but have refcount -1.
This makes copy-on-write schemes possible.
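
As a sketch of how a ref count plus the -1 marker for literals could drive copy-on-write, again in C and with made-up names (hdr, str_alloc, str_share, str_release, str_make_unique, str_set_char, hello_literal). The field widths are assumptions, the code is not thread-safe, and it is not how Delphi's RTL actually implements it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int32_t length;
    int32_t refcount;  /* -1 marks a literal: never counted, never freed */
} StrHeader;

typedef char *Str;     /* points at the first char, header sits before it */

StrHeader *hdr(Str s) { return (StrHeader *)(s - sizeof(StrHeader)); }

/* A literal laid out statically with the same shape as a heap string.
   char data[8] holds "hello" followed by zero padding, so it ends in #0#0. */
struct { StrHeader h; char data[8]; } hello_literal = { { 5, -1 }, "hello" };

Str str_alloc(const char *src, int32_t n) {
    char *block = malloc(sizeof(StrHeader) + (size_t)n + 2);
    if (block == NULL) return NULL;
    StrHeader *h = (StrHeader *)block;
    h->length = n;
    h->refcount = 1;
    Str s = block + sizeof(StrHeader);
    memcpy(s, src, (size_t)n);
    s[n] = '\0';
    s[n + 1] = '\0';
    return s;
}

/* Assignment: share the data, bump the count unless it is a literal. */
Str str_share(Str s) {
    if (hdr(s)->refcount >= 0) hdr(s)->refcount++;
    return s;
}

/* Drop one reference; literals are left alone. */
void str_release(Str s) {
    StrHeader *h = hdr(s);
    if (h->refcount < 0) return;
    if (--h->refcount == 0) free(h);
}

/* Before a write: copy unless we are the sole owner of a heap string. */
int str_make_unique(Str *ps) {
    StrHeader *h = hdr(*ps);
    if (h->refcount == 1) return 1;        /* sole owner: write in place */
    Str copy = str_alloc(*ps, h->length);  /* shared or literal: copy first */
    if (copy == NULL) return 0;
    str_release(*ps);
    *ps = copy;
    return 1;
}

/* Write one char; returns 0 on a bad index or allocation failure. */
int str_set_char(Str *ps, int32_t i, char c) {
    if (i < 0 || i >= hdr(*ps)->length) return 0;
    if (!str_make_unique(ps)) return 0;
    (*ps)[i] = c;
    return 1;
}
```

With this, assigning a string costs one increment, and the copy only happens on the first write to a shared string or a literal. A write through a variable that holds hello_literal.data leaves the literal untouched and gives that variable its own heap copy with refcount 1.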

D2009 (Delphi 2009), AFAIK, extends this with:
- a codepage (which can also be UTF-8 or UTF-16)
- a granularity value (now 1 or 2) that gives the element size of the
encoding in bytes.
However, I'm not that deep into the Unicode extensions.