|Scanner Design Advice firstname.lastname@example.org (Mike) (2001-12-22)|
|Date:||22 Dec 2001 23:03:24 -0500|
|Organization:||Excite@Home - The Leader in Broadband http://home.com/faster|
|Posted-Date:||22 Dec 2001 23:03:24 EST|
Hello everyone, I'd like to ask your opinion on a scanner design.
I'm attempting to write a scanner that will scan for URLs in
HTML tags, and then break each relative or absolute URL down into its
optional hostname, optional port, optional path, and optional search path.
Currently, using a finite state machine with a state table, I am able
to scan and return the HTML tags that contain URLs. Now here is where I
am contemplating the next step. One option is to re-scan the
matching lexemes to strip the URL from the tag and break it down into
its various components. The other is to use a second state table just
for the URL, which would be invoked once an HREF=, CITE=, etc. has been
matched, and which would return each piece of the URL.
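
For illustration only, the second option (a separate machine that takes
over once the attribute name has been matched) might look roughly like
the sketch below. It is not the scanner described above: the state
names, the split_url function, and the rule that "://" marks an
absolute URL are all assumptions made for the example, and it uses
explicit branches rather than a real state table to stay short.

```python
# Hypothetical sketch: state names and split_url are illustrative only.
HOST, PORT, PATH, SEARCH = range(4)

def split_url(url):
    """Split a URL into (host, port, path, search) in one left-to-right pass.

    Components that do not appear in the URL are returned as None.
    """
    parts = {HOST: [], PORT: [], PATH: [], SEARCH: []}

    # Assumption: "://" marks an absolute URL; anything else is relative
    # and therefore starts directly in the PATH state.
    sep = url.find("://")
    if sep >= 0:
        state, start = HOST, sep + 3
    else:
        state, start = PATH, 0

    for ch in url[start:]:
        if state == HOST and ch == ":":
            state = PORT                 # host ends, port begins
        elif state in (HOST, PORT) and ch == "/":
            state = PATH                 # first "/" begins the path
            parts[PATH].append(ch)
        elif state != SEARCH and ch == "?":
            state = SEARCH               # first "?" begins the search part
        else:
            parts[state].append(ch)      # ordinary character: accumulate

    return tuple("".join(parts[s]) or None for s in (HOST, PORT, PATH, SEARCH))

print(split_url("http://example.com:8080/a/b.html?x=1"))
print(split_url("docs/index.html?q=a"))
```

Each character is looked at exactly once, so the URL is not scanned a
second time. A relative URL whose search part itself contains "://"
would be misread by this sketch; a real scanner would need a stricter
test for the scheme.
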
I'm trying to make this process as fast as possible and am curious which
way you would recommend, or, if a more efficient way is possible, where
I can find out about it.