Re: Internal Representation of Strings

"cr88192" <cr88192@hotmail.com>
Thu, 19 Feb 2009 07:52:21 +1000

          From comp.compilers

From: "cr88192" <cr88192@hotmail.com>
Newsgroups: comp.compilers
Date: Thu, 19 Feb 2009 07:52:21 +1000
Organization: albasani.net
References: 09-02-051 09-02-086
Keywords: storage, functional
Posted-Date: 18 Feb 2009 17:13:52 EST

"William Clodius" <wclodius@lost-alamos.pet> wrote in message
> Tony <tony@my.net> wrote:
>
>> What are some good ways/concepts of internal string representation?
>> Are/should string literals, fixed-length strings and dynamic-length
>> strings handled differently? <snip>
>
> The "best" string representation depends on the nature of your language
> and its applications. The forms you describe are largely array based,
> and are typical of imperative languages. They have the advantage of a
> compact form, reducing memory usage, and ease of access to the
> components of the string. Such languages also rely on arrays and provide
> the infrastructure for manipulating arrays. Functional languages will
> typically use a list of characters. This has greater flexibility in
> string construction and modification, and uses the primary data
> structure of such languages, so that most of the infrastructure of the
> language is readily available.
>


odd, I am loosely familiar with several functional languages and have not
seen what you describe.
at least the ones I have seen have usually treated strings as
mostly-opaque builtin types which are manipulated via function calls (and
the implementations I am familiar with usually have some kind of "string
object" wrapping an array of characters).


in terms of memory use, a single cons cell per character would be expensive,
even assuming that characters are encoded directly inside references, which
is typical (except for some VMs built on the JVM, which use... objects...).
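

a rough back-of-the-envelope sketch of that cost. everything here is an
assumption for illustration: 32-bit references by default, a cons cell made
of exactly two references, and a hypothetical "string object" with an 8-byte
header followed by packed 1-byte characters.

```python
def cons_list_bytes(n_chars, ref_bytes=4):
    """Bytes for a list with one cons cell per character.

    Assumes each cell is two references (car + cdr) and that the
    character itself is encoded inside the car reference, so it
    costs no extra storage.
    """
    return n_chars * 2 * ref_bytes


def array_string_bytes(n_chars, char_bytes=1, header_bytes=8):
    """Bytes for a hypothetical string object.

    Assumes a fixed header (say, type tag + length) followed by a
    packed array of characters. The header size is illustrative.
    """
    return header_bytes + n_chars * char_bytes


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, cons_list_bytes(n), array_string_bytes(n))
```

under these assumptions a 16-character string takes 128 bytes as a cons list
versus 24 bytes as an array, and the gap doubles with 64-bit references.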


many of us shy away from the overhead of even using UTF-16, and in some
cases it may even be desirable to use more compact representations than ASCII
or UTF-8 (although, granted, LZ and Huffman are typically far too expensive
performance-wise to be justified for strings).


a reason here is that, along with conses, strings are another one of those
types which can really use up a lot of the memory...


it just came up as an interesting thought that an implementation "could"
embed very short strings directly into references (for example, 3 chars, or
maybe 4 using a modulo-encoding on x86, or 6-8 chars on a 64-bit arch...).
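

a minimal sketch of the 64-bit case, with Python integers standing in for
machine words. the layout is an assumption, not a description of any
particular VM: the low 3 bits hold a hypothetical tag value, the next 3 bits
hold the byte length, and the upper 56 bits hold up to 7 bytes of UTF-8.

```python
TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1
TAG_SHORT_STR = 0b101   # hypothetical tag meaning "inline short string"
MAX_INLINE = 7          # 7 payload bytes fit above the low tag/length byte


def pack_short(s):
    """Return a 64-bit word holding s inline, or None if s is too long."""
    data = s.encode("utf-8")
    if len(data) > MAX_INLINE:
        return None  # caller falls back to a heap-allocated string object
    word = TAG_SHORT_STR | (len(data) << TAG_BITS)
    word |= int.from_bytes(data, "little") << 8
    return word


def is_short(word):
    """True if the word carries the inline-string tag."""
    return (word & TAG_MASK) == TAG_SHORT_STR


def unpack_short(word):
    """Recover the string from a word produced by pack_short."""
    length = (word >> TAG_BITS) & 0x7
    payload = word >> 8
    return payload.to_bytes(MAX_INLINE, "little")[:length].decode("utf-8")
```

storing the length explicitly (rather than relying on a terminating zero)
keeps embedded NUL bytes intact. on a 32-bit word the same scheme would leave
room for only 3 bytes.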


of course, it is unclear how much this would improve over the much more
conventional approach of merging duplicate strings. it could have payoff for
implementing compilers though, where there are large numbers of relatively
short strings (most variable names and some function names being encodable
directly in references...).
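

for comparison, the conventional approach of merging duplicate strings can be
sketched as a small intern table. the class and method names are made up for
illustration, and a real implementation would also have to decide when
entries may be dropped.

```python
class InternTable:
    """Keeps one canonical object per distinct string value."""

    def __init__(self):
        self._table = {}

    def intern(self, s):
        # setdefault stores s on first sight and returns the stored
        # object on every later call with an equal string.
        return self._table.setdefault(s, s)

    def __len__(self):
        return len(self._table)


if __name__ == "__main__":
    table = InternTable()
    a = table.intern("".join(["fo", "o"]))
    b = table.intern("".join(["f", "oo"]))
    print(a is b, len(table))
```

once strings are interned, equality checks on identifiers reduce to a
reference comparison, which is part of why this is popular in compilers.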

