Scanner Design Advice

Mike <mbush@nospam.home.com>
22 Dec 2001 23:03:24 -0500

          From comp.compilers


From: Mike <mbush@nospam.home.com>
Newsgroups: comp.compilers
Date: 22 Dec 2001 23:03:24 -0500
Organization: Excite@Home - The Leader in Broadband http://home.com/faster
Keywords: parse, question
Posted-Date: 22 Dec 2001 23:03:24 EST

Hello everyone, I'd like to ask your opinion on a scanner design I'm
contemplating.


I'm attempting to write a scanner that will scan for URLs in
HTML tags, and then break each relative or absolute URL down into its
optional hostname, optional port, optional path, and optional search path.


Currently, using a finite state machine with a state table, I am able
to scan and return the HTML tags that contain URLs. Now here is where I
am contemplating the next step. One option is to re-scan the
matching lexemes to strip the URL from the tag and break it down into
its various components. The other is to use another state table just for
the URL, which would be entered once an HREF=, CITE=, or similar
attribute has been matched, and would then return each piece of the URL.
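
The second option described above (a separate state table entered once the
URL attribute has been matched) might look roughly like the following
Python sketch. It is only an illustration under stated assumptions, not the
scanner discussed in this post: the state names, the character classes, the
table layout, and the function split_url are all hypothetical; absolute URLs
are assumed to begin with "http://"; anything else is treated as a relative
path; and fragments ("#...") and port validation are left out.

```python
# States double as row indices into TABLE and as indices into the output.
HOST, PORT, PATH, SEARCH = range(4)
# Character classes are the column indices of TABLE.
COLON, SLASH, QMARK, OTHER = range(4)

CLASS_OF = {":": COLON, "/": SLASH, "?": QMARK}

# TABLE[state][char_class] = (next_state, keep_char).
# keep_char=True appends the character to the piece owned by next_state;
# False drops it as a delimiter. None marks an invalid transition.
TABLE = [
    # COLON           SLASH           QMARK            OTHER
    [(PORT, False),  (PATH, True),   (SEARCH, False), (HOST, True)],    # HOST
    [None,           (PATH, True),   (SEARCH, False), (PORT, True)],    # PORT
    [(PATH, True),   (PATH, True),   (SEARCH, False), (PATH, True)],    # PATH
    [(SEARCH, True), (SEARCH, True), (SEARCH, True),  (SEARCH, True)],  # SEARCH
]

SCHEME = "http://"  # assumption: the only absolute form this sketch handles


def split_url(url):
    """Return (host, port, path, search); missing pieces are empty strings."""
    if url.lower().startswith(SCHEME):
        state, start = HOST, len(SCHEME)   # absolute URL: host comes first
    else:
        state, start = PATH, 0             # relative URL: everything is path
    pieces = [[], [], [], []]
    for ch in url[start:]:
        entry = TABLE[state][CLASS_OF.get(ch, OTHER)]
        if entry is None:
            raise ValueError(f"unexpected {ch!r} in URL {url!r}")
        state, keep = entry
        if keep:
            pieces[state].append(ch)
    return tuple("".join(p) for p in pieces)
```

With this table, split_url("http://www.example.com:8080/a/b.html?x=1")
returns ('www.example.com', '8080', '/a/b.html', 'x=1'), and a relative URL
such as "../img/logo.gif" comes back with only the path filled in. Each URL
character is visited once and written straight into its piece, so the tag
lexeme is not re-scanned; whether that is actually faster than re-scanning
would have to be measured.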


I'm trying to make this process as fast as possible and am curious which
way you would recommend, or, if a more efficient way is possible, where
I can find out about it.




Thanks.


Mike.

