Related articles |
---|
Dealing with load/store instructions on static tainted flow analysis gabrielquadros@hotmail.com (Gabriel Quadros) (2011-06-06) |
Re: Dealing with load/store instructions on static tainted flow analys gah@ugcs.caltech.edu (glen herrmannsfeldt) (2011-06-07) |
Re: Dealing with load/store instructions on static tainted flow analys kym@kymhorsell.com (2011-06-08) |
Re: Dealing with load/store instructions on static tainted flow analys gneuner2@comcast.net (George Neuner) (2011-06-09) |
Re: Dealing with load/store instructions on static tainted flow analys martin@gkc.org.uk (Martin Ward) (2011-06-12) |
From: | George Neuner <gneuner2@comcast.net> |
Newsgroups: | comp.compilers |
Date: | Thu, 09 Jun 2011 18:51:16 -0400 |
Organization: | A noiseless patient Spider |
References: | 11-06-010 |
Keywords: | analysis |
Posted-Date: | 11 Jun 2011 13:53:55 EDT |
On Mon, 6 Jun 2011 21:00:41 -0700 (PDT), Gabriel Quadros
<gabrielquadros@hotmail.com> wrote:
> I am trying to implement a pass to detect information leak in
>programs. The problem is a variation of static tainted-flow analysis:
>I have some source functions, sink functions and sanitizers. I want to
>know if it is possible for data to flow from source to sink without
>going across a sanitizer.
> :
>a = SOURCE
>b = malloc(100)
>...
>b[i] = a
>...
>SINK = c[j]
>...
>
>So, the problem is that it is hard to know that c != b and i != j.
>Once information flows into memory, the safest thing to do is to flag
>the whole memory as a SOURCE. Of course, that is very conservative. I
>was wondering if you guys could recommend me some strategies and
>techniques to be more precise. In particular, if you could point me
>some paper that does it, that would be great.
Hi Gabriel,
A general solution, I believe, is intractable.
Using SSA or some kind of value numbering, it theoretically would be
possible - though not easy - to keep track of all generated non-local
addresses and compare the addresses accessed via disjoint program
paths to see if there is overlap.
Unfortunately, there are (at least) 4 problems with that. The first
is that the addresses the compiler works with are symbolic: a name
plus offset ... and they may be more involved to store and compare
than a simple numerical address.
The second, related, problem is that global and heap addresses often
will be of a form like "$DATA+0xFA3892", having no reference to any
recognizable program data ... you'll need to track or to figure out
the boundaries of your globals and heap allocations in terms of the
compiler's symbolic addresses in order to give meaningful error
messages.
The third problem is you'll need to process the entire program before
looking for leaks. The storage needed to retain all the address lists
may be extremely large and if the language permits separate
compilation, you'll need to ensure that all of the program has been
processed.
And the fourth problem is identifying the disjoint program paths. This
can be done using any of the number of flow and dependency graphs that
the compiler generates ... but the compiler is more interested in
following individual serial execution paths and generally isn't much
interested in the extent of multiple path disjointness.
WRT literature, I'm not familiar with any dealing precisely with the
kind of *internal* "hidden channel" security issues you are addressing
... there is quite a bit about leakage outside of a program.
The best I can suggest is that you investigate array optimizations, in
particular looking at the extensive work that has been done toward
identifying array index aliasing. Your particular problem, though
more expansive, is directly analogous to the array aliasing problem
(treating all of process memory as an array).
George
Return to the
comp.compilers page.
Search the
comp.compilers archives again.