Re: thread static

mac@yukon.asd.sgi.com (Michael McNamara)
Fri, 18 Aug 1995 19:32:36 GMT

          From comp.compilers

Newsgroups: comp.compilers
From: mac@yukon.asd.sgi.com (Michael McNamara)
Keywords: parallel
Organization: Verilog Consulting Services, Inc.
References: 95-08-078
Date: Fri, 18 Aug 1995 19:32:36 GMT

chris@tkna.com (Christopher Helck) writes:


: Is there a technical reason why most languages don't support
:threads? I would like to be able to declare in 'C' a variable as
:"thread static", there would be a copy of the variable for each
:thread. A thread can not modify or read another thread's variable.
: ...
:
: Most multi-threaded extensions to 'C' and languages that directly
: support threads (JAVA is the only one I know) concentrate on locking
: resources. This is good. But why not go one step further and allow
: thread specific variables?
:
: Thanks.
: [I'd guess that it's mostly historical -- languages like C weren't designed
: with multiprocessors in mind. It'd also take some linker hackery, but that
: shouldn't be too hard. -John]


The Ardent C and FORTRAN compilers had an extension: a
"threadlocal" qualifier you could use for all storage types (pointers
and data). And yes, not only is it elegant, it delivers better
performance.


threadlocal int foo;
threadlocal char *psmyname = "joe";


This, along with the linker hackery John alludes to, made it a
breeze to parallelize programs and libraries.


Basically, the linker gathered up all identifiers labeled
threadlocal and allocated them in one virtual memory region. All
threads shared the ordinary data region, but each thread got its own
threadlocal data region. Every thread's threadlocal region was mapped
at the same virtual address, so code like the kind you describe
suffered no "Identity Crisis Penalty"*


The programmer was freed from the mechanics of parallelism;
she need only recognize "Aha, each thread needs its own copy of
foo!!", add the threadlocal qualifier, and she is done.
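For readers today, C11 later standardized essentially this one-word change as the `_Thread_local` storage-class specifier (with `thread_local` as a macro in `<threads.h>`). The sketch below is that standard spelling, not the Ardent syntax, and the helpers `worker` and `run_threads` are hypothetical names for illustration. Note that on common implementations each thread's copy lives at a different address reached through a thread pointer, rather than at one address mapped per thread as described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* C11 spelling of the two declarations above: every thread gets its
   own copy, each starting from the same initial value. */
_Thread_local int foo;                      /* zero-initialized per thread */
_Thread_local const char *psmyname = "joe"; /* each thread starts at "joe" */

/* Thread body: modifies only the calling thread's copies. */
static void *worker(void *arg)
{
    int id = *(int *)arg;
    foo = id * 10;          /* no other thread sees this store */
    psmyname = "worker";    /* other threads still see "joe" */
    *(int *)arg = foo;      /* report back what this thread's foo holds */
    return NULL;
}

/* Hypothetical driver: run up to 8 threads; afterwards results[i] == i * 10
   and the calling thread's foo and psmyname are untouched. */
int run_threads(int n, int results[])
{
    pthread_t tids[8];
    if (n > 8)
        n = 8;
    for (int i = 0; i < n; i++) {
        results[i] = i;
        int rc = pthread_create(&tids[i], NULL, worker, &results[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < n; i++)
        pthread_join(tids[i], NULL);
    return n;
}
```

Each worker writes only its own `foo`, so no locking is needed, which is exactly the point of the qualifier.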


Compare the threadlocal qualifier with the error-prone tedium a
programmer must go through to get a thread-local data item with the C
and FORTRAN compilers currently available, and the software
engineering benefit of this extension is clear.
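To make the "tedium" concrete, here is a rough sketch of one per-thread `foo` built with POSIX thread-specific data (`pthread_key_create`, `pthread_getspecific`, `pthread_setspecific`), which is one common way to do it without compiler support. The helper names `my_foo`, `bump`, and `bump_in_new_thread` are hypothetical; the point is that every access goes through a function call and a key lookup instead of a plain variable reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* A per-thread "foo" without a threadlocal qualifier. */
static pthread_key_t foo_key;
static pthread_once_t foo_once = PTHREAD_ONCE_INIT;

static void make_foo_key(void)
{
    /* free() runs on each thread's non-NULL value when that thread exits */
    int rc = pthread_key_create(&foo_key, free);
    assert(rc == 0);
    (void)rc;
}

/* Return the calling thread's foo, allocating it on first use. */
int *my_foo(void)
{
    pthread_once(&foo_once, make_foo_key);
    int *p = pthread_getspecific(foo_key);
    if (p == NULL) {
        p = calloc(1, sizeof *p);   /* starts at 0, like a static int */
        if (p == NULL)
            abort();
        pthread_setspecific(foo_key, p);
    }
    return p;
}

/* Thread body: add *arg to this thread's foo and report the result. */
static void *bump(void *arg)
{
    int *p = my_foo();
    *p += *(int *)arg;
    *(int *)arg = *p;
    return NULL;
}

/* Hypothetical demo: a fresh thread's foo starts at 0, so this returns
   delta regardless of the caller's own foo (or -1 if the thread fails). */
int bump_in_new_thread(int delta)
{
    pthread_t t;
    if (pthread_create(&t, NULL, bump, &delta) != 0)
        return -1;
    pthread_join(t, NULL);
    return delta;
}
```

Every use site has to say `*my_foo()` instead of `foo`, and forgetting that anywhere silently reintroduces shared state.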


Moreover, because the threadlocal qualifier was completely
harmless to a uniprocessor program, it was simple to provide one C
library that was MP safe but imposed no overhead on the uniprocessor
programmer:


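struct passwd *
getpwent() {
    /* one cursor per thread; a uniprocessor program simply has one */
    threadlocal static struct passwd *ppwcurr = 0;

    if (ppwcurr) {
        /* details omitted... */
        return (++ppwcurr);
    } else {
        init_pwent();   /* presumably points ppwcurr at the first entry */
        return (ppwcurr);
    }
}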
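The same idiom can be sketched in standard C11 with `_Thread_local`. To keep it self-contained, this version iterates over a hypothetical in-memory table instead of the real password file, and `next_entry`, `rewind_entries`, and `count_in_new_thread` are illustrative names, not the C library's `getpwent`/`setpwent`. Each thread walks the table with its own cursor, so one thread's iteration never disturbs another's.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the password file: a fixed in-memory table. */
struct entry { const char *name; int uid; };
static const struct entry table[] = {
    { "root", 0 }, { "daemon", 1 }, { "mac", 100 },
};
#define NENTRIES (sizeof table / sizeof table[0])

/* Per-thread cursor, the analogue of "threadlocal static ... ppwcurr". */
static _Thread_local size_t cursor = 0;

/* Return the calling thread's next entry, or NULL at the end. */
const struct entry *next_entry(void)
{
    if (cursor >= NENTRIES)
        return NULL;
    return &table[cursor++];
}

/* Restart the calling thread's iteration. */
void rewind_entries(void)
{
    cursor = 0;
}

/* Thread body: count every entry this thread's cursor can reach. */
static void *count_entries(void *arg)
{
    size_t n = 0;
    while (next_entry() != NULL)
        n++;
    *(size_t *)arg = n;
    return NULL;
}

/* A fresh thread's cursor starts at 0, wherever the caller's cursor is. */
size_t count_in_new_thread(void)
{
    pthread_t t;
    size_t n = 0;
    if (pthread_create(&t, NULL, count_entries, &n) != 0)
        return 0;
    pthread_join(t, NULL);
    return n;
}
```

A single-threaded caller uses `next_entry()` exactly as it would with a plain `static` cursor; the qualifier only matters once a second thread appears.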


*Most important to me was avoiding the Identity Crisis I allude
to above. Since I am a performance freak, I rail at the need to call
the OS to determine my thread number and then index into an array to
determine which foo is my threadlocal foo. (Some OSes do not even
number their threads contiguously: there are holes in the number
space, forcing one to allocate extra data or munge the thread
number... grrr.)


I recognize that one can dedicate a register to hold one's
thread number and thus avoid the OS call; but then consider the cost
of removing a register from the register allocator's pool.


Moreover, one still incurs the cost of the array index to get
one's own foo, and potentially false sharing of cache lines if one
packs the array of thread-local data in data-major order rather than
thread-major order.


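Bad (data-major: neighboring threads' entries share cache lines):
    int afoo[NUM_THREADS];
    char *acp[NUM_THREADS];
Better (thread-major: each thread's data is kept together and padded):
    struct {
        int foo;
        char *cp;
        char pad[128]; /* pad out to at least a cache line */
    } atd[NUM_THREADS];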
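A C11 sketch of the "Better" layout follows, under two stated assumptions: a 64-byte cache line (`CACHE_LINE` is a hypothetical constant to adjust per target), and `alignas` in place of the explicit 128-byte pad above, which rounds each element up to a whole number of lines. It also shows the index-passing workaround for the Identity Crisis: each thread is handed a dense index at creation instead of asking the OS for a possibly sparse thread number. The names `work`, `run_padded`, and `struct arg` are illustrative only.

```c
#include <pthread.h>
#include <stdalign.h>

#define NUM_THREADS 4
#define CACHE_LINE  64   /* assumption: 64-byte cache lines */

/* Thread-major layout: each element starts on its own cache line, so one
   thread's writes do not invalidate a line another thread is using. */
struct thread_data {
    alignas(CACHE_LINE) int foo;
    char *cp;
};
_Static_assert(sizeof(struct thread_data) % CACHE_LINE == 0,
               "each element occupies whole cache lines");

static struct thread_data atd[NUM_THREADS];

/* Dense index 0..NUM_THREADS-1 assigned by the creator, not the OS. */
struct arg { int index; int iterations; };

static void *work(void *p)
{
    struct arg *a = p;
    struct thread_data *mine = &atd[a->index];  /* the array-index cost */
    for (int i = 0; i < a->iterations; i++)
        mine->foo++;                            /* touches only our line */
    return NULL;
}

/* Run NUM_THREADS workers and return the sum of their counters. */
long run_padded(int iterations)
{
    pthread_t tids[NUM_THREADS];
    struct arg args[NUM_THREADS];
    int started = 0;
    long total = 0;

    for (; started < NUM_THREADS; started++) {
        atd[started].foo = 0;
        args[started] = (struct arg){ started, iterations };
        if (pthread_create(&tids[started], NULL, work, &args[started]) != 0)
            break;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += atd[i].foo;
    }
    return total;
}
```

With a threadlocal (or `_Thread_local`) foo, neither the index bookkeeping nor the padding would be needed at all.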




I also recognize that supporting a per-thread paging scheme is
contrary to some UNIXes' thread models. This I feel is a bug in their
thread model. Other OSes can support a per-thread paging scheme, but
at some cost on every TLB miss. Again, I feel that this can be fixed;
however, I would like to understand the performance cost of the fix.


-mac
--

