Re: Pascal vs C style string ? (Hans Boehm)
Tue, 28 Jun 1994 20:35:54 GMT

          From comp.compilers

Related articles
[7 earlier articles]
Re: Pascal vs C style string ? (1994-06-28)
Re: Pascal vs C style string ? (Stefan Monnier) (1994-06-28)
Re: Pascal vs C style string ? (Erkki Ruohtula) (1994-06-28)
Re: Pascal vs C style string ? (1994-06-28)
Re: Pascal vs C style string ? (1994-06-28)
Re: Pascal vs C style string ? (1994-06-28)
Re: Pascal vs C style string ? (1994-06-28)
Re: Pascal vs C style string ? (1994-06-29)
Re: Pascal vs C style string ? (1994-06-29)
Re: Pascal vs C style string ? (1994-06-30)
Re: Pascal vs C style string ? guerin@IRO.UMontreal.CA (1994-06-30)
Re: Pascal vs C style string ? synaptx!thymus! (Dave Gillespie) (1994-06-30)
Re: Pascal vs C style string ? (1994-07-01)
[1 later articles]
| List of all articles for this month |

Newsgroups: comp.compilers
From: (Hans Boehm)
Keywords: C, Pascal, design
Organization: Xerox Palo Alto Research Center
References: 94-06-175 94-06-226
Date: Tue, 28 Jun 1994 20:35:54 GMT (Hans Boehm) writes:
>You might also want to look at the "cord" string package that's included
>with the garbage collector in It has
>the following characteristics: (Joseph H Allen) writes:
>This is a fascinating subject and one in which I've spent some time
>researching. I took a look at your cord package. It shares many of the
>ideas I implemented in two libraries of my own: the buffer management
>routines in my editor JOE and a software virtual memory / file-buffering

>Here are some improvements you might make to your cord package:

[3 good suggestions omitted. Thanks. The current string searching function
was a stop-gap that's clearly in need of replacement.]

>4) Use skip-lists for the data structure instead of a binary tree: I think
>it could work- you can search by accumulating forward-element sizes just as
>well as you can by accumulating left sub-tree sizes. The advantage of
>skiplists is that they're self-balancing.

This brings up some fundamental issues, which are worth clarifying. The
basic cord package provides only applicative operations. If I concatenate
a and b to produce the string c, a and b are still intact after the
concatenation. Thus if I get passed a string as a parameter, I can safely
form new strings by concatenating other strings to the parameter. I
consider this an important property for a general purpose string library,
especially for one that's intended for use in a multi-threaded
environment. I don't know how to do this efficiently with skip lists.
(It can be done in time logarithmic in the length of the string by
explicitly keeping track of version numbers and using the
Driscoll-Sarnak-Sleator-Tarjan persistent data structures algorithm, I
think. But the constant isn't very competitive.)

For some of the applications I have dealt with, very cheap concatenation
is essential. (If you're building a source-to-source translator, it's
often convenient to build up the output by string concatenation. You no
longer have to do things strictly left-to-right. Repeated character
insertion in a text editor essentially amounts to concatenation.) My
impression is that the current data structure usually supports appreciably
faster concatenation than the alternatives that preserve reasonable access
times. We average something < 25 SS2 usecs per character when naively
building a 100K string by concatenating single characters to its end.
That isn't very many allocations. (Splay trees are more attractive at
first glance, but have serious practical problems in this context,
especially in a multi-threaded environment.)

Hans-J. Boehm

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.