Re: Extending REG-EXP to 2-Dimension.

bakul@netcom.com (Bakul Shah)
Thu, 8 Dec 1994 21:51:37 GMT

          From comp.compilers

Related articles
Extending REG-EXP to 2-Dimension. mosh@ramanujan.cs.albany.edu (1994-11-21)
Re: Extending REG-EXP to 2-Dimension. steve@cegelecproj.co.uk (1994-11-30)
Re: Extending REG-EXP to 2-Dimension. ruiter@ruls41.fsw.leidenuniv.nl (1994-12-01)
Re: Extending REG-EXP to 2-Dimension. rockwell@nova.umd.edu (1994-12-03)
Re: Extending REG-EXP to 2-Dimension. lloyd@kauri.vuw.ac.nz (1994-12-05)
Re: Extending REG-EXP to 2-Dimension. pwk@eb.ele.tue.nl (1994-12-05)
Re: Extending REG-EXP to 2-Dimension. steve@cegelecproj.co.uk (1994-12-05)
Re: Extending REG-EXP to 2-Dimension. bakul@netcom.com (1994-12-08)
| List of all articles for this month |
Newsgroups: comp.compilers
From: bakul@netcom.com (Bakul Shah)
Keywords: lex
Organization: NETCOM On-line Communication Services (408 261-4700 guest)
References: 94-11-137 94-12-033
Date: Thu, 8 Dec 1994 21:51:37 GMT

>Jan-Peter de Ruiter:
>: [ Request for regexp systems that work on "boxes" of text ]
>: As far as I know it has not been solved in any way. The problem is
>: that you need to extend the notion of linearity (characters
>: following other characters) in 2 dimensions.


Even if we forget about regular expressions and whether this
problem is polynomial time, the first problem we face is how to
*formally* specify an arbitrary 2D pattern of text.


Even something as simple as a fixed 2D pattern (i.e. described
by a matrix, where non-zero elements are chars we are looking
for) can be a bit hairy. We know how to look for a fixed 2D
pattern (match the first row of pattern against a line, move down
to the appropriate column and exactly match second row and so
forth). But how do we specify it?


How about a box like


XX...XX
X X
. .
. .
X X
XX...XX




Where ... denotes an arbitrary number of Xes but the *shape* must
be a rectangle? Even here we know what to do (i.e. we can write
a program that handles exactly this problem), but how to specify.


Next, how do we specify a square box? A box that is half as long
as it is wide but the length of the wide side is defined by the
number of Xes? And remember that such a pattern may be
*anywhere* in the quarter plane defined by a text file.


One idea that comes to mind is to use modified back references.
Recall that in regexp "(a)(bb)\1\1\2\1", back reference \1 matches
the first submatch, that is "a" and \2 matches "bb". The whole
RE matches "abbaabba". We can define a new kind of back ref.
"\l1" (that is, backslash, ell, one) to denote the length of the
first submatch and so on.


Give this, we can specify an arbitrarily wide and long box of
Xes as


^(.*) (XX+) .*\n
(.{\l1} X.{\l2-2}X .*\n)*
.{\l1} X{\l2} .*\n


(expression broken up in lines and by spaces to make it somewhat
easier to read. Spaces are not significant and "." matches
anything but a newline, {} contains the repetition count)


This says, in the first line we have any number of chars followed
by atleast two Xes, followed by any number of chars. This line
can be followed by any number of lines that have the same number
chars as the first sub match, followed by an X, followed by chars
two less than the number of consecutive Xes in the first line,
followed by another X, followed by anything and finally the last
line has Xes in the same position as the first line.


I don't know if this highly irregular expression can be mapped to
any kind of finite automation.


Bakul Shah <bakul@netcom.com>
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.