Re: Internal Representation of Strings

"Bartc" <bartc@freeuk.com>
Sun, 22 Feb 2009 18:39:41 GMT

          From comp.compilers


Newsgroups: comp.compilers
Organization: Compilers Central
References: 09-02-051 09-02-068 09-02-078 09-02-084 09-02-090 09-02-105
Keywords: storage
Posted-Date: 22 Feb 2009 18:36:07 EST

"Tony" <tony@my.net> wrote in message news:09-02-105@comp.compilers...
> "Bartc" <bartc@freeuk.com> wrote in message
>> "Bartc" <bartc@freeuk.com> wrote in message
>>
>>> I'm thinking of the following representation for short strings 2 to 256
>>> characters, designed for use as array and record elements.
...
>>> [This sounds awfully complicated for an in-memory design. Why not just
>>> use a four byte length and code more compactly if needed on I/O. -John]
>>
>> I use a 4-byte length in other places, but where a string is short, it
>> does seem attractive to use all available bytes and not waste one for
>> a length or terminator (and if I wanted up to 8 useable characters,
>> that would mean using an odd 9-byte field).


>> [In a world where laptops have a gigabyte of RAM, what's the point in
>> trying to save a few bytes with structures in memory? -John]
>
> What if you make every item in a parse tree contain a string? Those strings
> are likely to be very small, a lot of one-character strings. It just seems
> like low overhead strings always have a place. (No, I haven't built a
> compiler, yet).


> [Let's say you have a gigantic parse tree with 10,000 nodes. That means
> you'd have 40K of length words. Who cares? -John]


With my 8-character example, adding a ninth byte for a length or
terminator means 12.5% extra memory. If alignment forces that extra
byte up to 4 bytes, or a 4-byte length is used, that means 50% more memory.
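The percentages can be checked with a quick sketch. It uses Python's ctypes module, which lays out structures using the platform's C rules, to measure three hypothetical layouts for an 8-character field (the class and function names are illustrative only). On common platforms the sizes come out as 8, 9 and 12 bytes.

```python
import ctypes

class Packed8(ctypes.Structure):
    # Hypothetical layout: all 8 bytes hold characters, no length or terminator.
    _fields_ = [("chars", ctypes.c_char * 8)]

class WithByteLen(ctypes.Structure):
    # Hypothetical layout: 8 characters plus a 1-byte length (or terminator).
    _fields_ = [("chars", ctypes.c_char * 8), ("length", ctypes.c_uint8)]

class WithWordLen(ctypes.Structure):
    # Hypothetical layout: 8 characters plus a 4-byte length.
    _fields_ = [("chars", ctypes.c_char * 8), ("length", ctypes.c_uint32)]

def overhead_percent(layout, baseline=Packed8):
    """Extra memory of `layout` relative to `baseline`, as a percentage."""
    base = ctypes.sizeof(baseline)
    return 100.0 * (ctypes.sizeof(layout) - base) / base

for layout in (Packed8, WithByteLen, WithWordLen):
    print(f"{layout.__name__}: {ctypes.sizeof(layout)} bytes, "
          f"+{overhead_percent(layout):.1f}%")
```

A 1-byte length that gets padded out to 4 bytes for alignment costs the same 50% as the 12-byte layout.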


If the extra byte (or 4 bytes) pushes a record from, say, 32 bytes to
64 bytes, in a setting where power-of-two (2^N) sizes are important,
then that's 100% more memory.
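The power-of-two case can be sketched as follows, assuming a hypothetical allocation policy that rounds every record up to the next 2^N bytes (round_up_pow2 and allocated are illustrative names, not from any real allocator).

```python
def round_up_pow2(n: int) -> int:
    """Smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("size must be positive")
    return 1 << (n - 1).bit_length()

def allocated(record_size: int) -> int:
    # Hypothetical policy: each record occupies a 2^N-byte slot.
    return round_up_pow2(record_size)

# A 32-byte record, then the same record plus 1 or 4 extra bytes.
for size in (32, 33, 36):
    print(size, "->", allocated(size))
```

One extra byte is enough to double the slot: 33 and 36 both round up to 64.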


An extra 12.5% to 100% of memory can be quite substantial. If you have
hundreds of millions of these objects, even your 1GB laptop can run
into trouble.
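To put the scale argument in numbers, here is a back-of-envelope sketch with hypothetical object counts: 10,000 objects (the parse-tree figure from the moderator's note) and 100 million objects. It compares a packed 8-byte string with 8 characters plus a 4-byte length.

```python
GIB = 1024 ** 3  # bytes per GiB

def total_bytes(count: int, per_object: int) -> int:
    """Memory used by `count` objects of `per_object` bytes each."""
    return count * per_object

# Hypothetical object counts; per-object sizes from the 8-char example.
for count in (10_000, 100_000_000):
    packed = total_bytes(count, 8)   # 8 chars, no length
    padded = total_bytes(count, 12)  # 8 chars + 4-byte length
    print(f"{count:,} objects: {packed / GIB:.2f} GiB packed, "
          f"{padded / GIB:.2f} GiB with length, "
          f"extra {padded - packed:,} bytes")
```

At 10,000 objects the length words cost 40,000 bytes, matching the 40K estimate. At 100 million objects the same choice moves the total from about 0.75 GiB to about 1.12 GiB, past the 1GB mark.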


In my language, using such a packed string would be the programmer's
choice, since dealing with it is a bit more fiddly.
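The original encoding is elided above, so the following is a swapped-in, simpler scheme rather than Bartc's design: a hypothetical fixed 8-byte field padded with NUL bytes, where the length is the position of the first NUL, or the full width when there is none. It shows where the fiddliness comes from: every length query is a scan, and the string itself cannot contain NUL bytes.

```python
WIDTH = 8  # hypothetical fixed field width, in bytes

def pack(s: bytes, width: int = WIDTH) -> bytes:
    """Store s in exactly `width` bytes, NUL-padding shorter strings."""
    if len(s) > width:
        raise ValueError("string too long for field")
    if b"\0" in s:
        raise ValueError("NUL bytes cannot be stored in this scheme")
    return s + b"\0" * (width - len(s))

def packed_len(field: bytes) -> int:
    """Length is the index of the first NUL, or the full width if none."""
    n = field.find(b"\0")
    return len(field) if n < 0 else n

def unpack(field: bytes) -> bytes:
    """Recover the original string from a packed field."""
    return field[:packed_len(field)]

field = pack(b"abc")
print(field, packed_len(field), unpack(field))
```

A full 8-character string uses every byte and has no terminator at all, which is exactly the saving the packed form is after.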


(However, my language can also use up to 16 bytes to store a 4-byte
int, when it is part of a variant type. That's why it might be
necessary to economise elsewhere...)
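A tagged variant of the kind described might look like the sketch below, again measured with ctypes. The layout (a 4-byte tag followed by an 8-byte payload union) is an assumption for illustration. On typical 64-bit platforms alignment pads it to 16 bytes even when only a 4-byte int is stored.

```python
import ctypes

class Payload(ctypes.Union):
    # Hypothetical payload: large enough for a 64-bit int, a double or a pointer.
    _fields_ = [("i32", ctypes.c_int32),
                ("i64", ctypes.c_int64),
                ("f64", ctypes.c_double),
                ("ptr", ctypes.c_void_p)]

class Variant(ctypes.Structure):
    # Hypothetical tag + payload; padding after the tag aligns the union.
    _fields_ = [("tag", ctypes.c_uint32),
                ("value", Payload)]

v = Variant(tag=1)
v.value.i32 = 42  # only 4 of the payload bytes are actually used
print(ctypes.sizeof(Variant), "bytes to hold a",
      ctypes.sizeof(ctypes.c_int32), "byte int")
```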
--
Bartc

