Related articles:
Need simple parser/interpreter - reinvent the wheel? pld@fc.hp.com (2000-05-31)
Re: Need simple parser/interpreter - reinvent the wheel? William.H.Duquette@jpl.nasa.gov (2000-06-01)
Re: Need simple parser/interpreter - reinvent the wheel? andrew@andrewcooke.free-online.co.uk (Andrew Cooke) (2000-06-03)
Re: Need simple parser/interpreter - reinvent the wheel? peter@abbnm.com (2000-06-03)
From: pld@fc.hp.com (Paul Dineen)
Newsgroups: comp.compilers,comp.lang.misc
Date: 31 May 2000 23:10:11 -0400
Organization: H-P CSL, Fort Collins, Colorado, USA
Keywords: parse, question, comment
Before I reinvent the wheel, I'd like to see if something like this
already exists.
Our project has a need to do some simple parsing of text files. We
want to be able to pull specific fields out of various files.
Specifically, we want to pull information about devices out of the
output of HP's Mesa diagnostics commands. I've created a general-purpose
mini language for expressing this, and expect to create the
parser and interpreter to implement that -- unless there's already a
tool that does this. For today we're just parsing device information
out of Mesa output. However, we'd like to have a general-purpose tool
that could be used over the longer term to pull other information out
of other text files. We can't determine up front all possible uses we
might want to apply this tool to.
Here's an example that illustrates the idea.
The output from Mesa for a SCSI disk is:
=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=
-- Information Tool Log for SCSI Disk on path 10/0.6.0 --
Log creation time: Thu May 4 12:41:14 2000
Hardware path: 10/0.6.0
Product Id: ST34371W Vendor: SEAGATE
Device Type: SCSI Disk Firmware Rev: HPM2
Device Qualifier: SEAGATEST34371W Logical Unit: 0
Serial Number: JDM992760E12LG
Capacity (M Byte): 4095.86
Block Size: 512
Max Block Address: 8388313
Error Logs
Read Errors: 0 Buffer Overruns: N/A
Read Reverse Errors: N/A Buffer Underruns: N/A
Write Errors: 0 Non-Medium Errors: 0
Verify Errors: 0
=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=
We need to pull out the values for "path", "Product Id",
"Firmware Rev" and "Serial Number". In the language I've made
up, this would be accomplished with:
SCSI_disk {
    open values[1] = "-- Information Tool Log for SCSI Disk on path (.*) --";
    values[2] = "^Product Id:\\s*(\\S*)\\s*";
    values[3] = "Firmware Rev:\\s*(\\S*)\\s*";
    close values[4] = "^Serial Number:\\s*(\\S*)\\s*";
    return 1-4;
}
The "open" means that the context of the SCSI Disk section starts when
we see the line that matches the regular expression on that line.
Then we search for the values specified by the other regexps until we
reach the serial number line, where the context closes.
The output of the above would be something like:
SCSI_disk, 10/0.6.0, ST34371W, HPM2, JDM992760E12LG
(This hasn't been designed to a production-quality standard yet, so I'm
glossing over issues such as commas in the parsed data. I'm just trying
to express the concept for now.)
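One conventional way to handle commas in the parsed data is to emit each record through a CSV writer, which quotes a field only when it needs it. The following is a minimal Python sketch of that idea; the helper name `format_record` is hypothetical, and the output uses a bare comma as the separator rather than the comma-plus-space shown above.

```python
import csv
import io

def format_record(section, values):
    """Render one record as a single CSV line (hypothetical helper).

    Fields containing commas or quotes are quoted by the csv module;
    all other fields are written unchanged.
    """
    buf = io.StringIO()
    csv.writer(buf).writerow([section, *values])
    # The writer appends its own line terminator; strip it so the
    # caller decides how lines are joined.
    return buf.getvalue().rstrip("\r\n")

line = format_record("SCSI_disk", ["10/0.6.0", "ST34371W", "HPM2", "JDM992760E12LG"])
print(line)
```

A consumer can recover the fields with `csv.reader`, so a value such as `a,b` survives the round trip as one field.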
So, note the line orientation. We don't expect to need to be able to
fully parse the various outputs -- that's overkill, and the format may
change underneath us in ways that would break the parser but wouldn't
be important to us. We just need to find and pull out specific
values, ignoring the rest.
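To make the line-oriented idea concrete, here is a minimal Python sketch of an interpreter for a single rule. It is only an illustration under assumptions: the dictionary layout and the `extract` function are hypothetical, the regexps are written to match the sample text as it appears above, and every collected value is returned rather than honoring a `return` range.

```python
import re

# Hypothetical in-memory form of the SCSI_disk rule: (role, regex) pairs,
# where role is "open", "close", or "" for an ordinary field.
SCSI_DISK = {
    "name": "SCSI_disk",
    "fields": [
        ("open", r"-- Information Tool Log for SCSI Disk on path (.*) --"),
        ("", r"^Product Id:\s*(\S*)"),
        ("", r"Firmware Rev:\s*(\S*)"),
        ("close", r"^Serial Number:\s*(\S*)"),
    ],
}

def extract(rule, lines):
    """Yield [name, value1, ...] for each open..close context in lines."""
    fields = [(role, re.compile(rx)) for role, rx in rule["fields"]]
    values = None  # None means we are outside any context
    for line in lines:
        for index, (role, rx) in enumerate(fields):
            m = rx.search(line)
            if m is None:
                continue
            if role == "open":
                values = [""] * len(fields)  # start a fresh context
            if values is None:
                continue  # ignore matches outside a context
            values[index] = m.group(1)
            if role == "close":
                yield [rule["name"], *values]
                values = None
                break  # context closed; move on to the next line

SAMPLE = """\
-- Information Tool Log for SCSI Disk on path 10/0.6.0 --
Log creation time: Thu May 4 12:41:14 2000
Hardware path: 10/0.6.0
Product Id: ST34371W Vendor: SEAGATE
Device Type: SCSI Disk Firmware Rev: HPM2
Device Qualifier: SEAGATEST34371W Logical Unit: 0
Serial Number: JDM992760E12LG
Capacity (M Byte): 4095.86
"""

for record in extract(SCSI_DISK, SAMPLE.splitlines()):
    print(", ".join(record))
```

Lines that match none of the regexps are simply skipped, so new or reordered lines in the tool's output do not break the extraction as long as the wanted lines keep their labels.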
Since this is a made-up language, I don't expect any processor to be
out there that does exactly this. However, I am wondering if there
exists a general-purpose tool that is like this. I'm not looking for
anything as general purpose as sed, awk or grep. I want to create
something like the above that is somewhat more domain specific so that
some of the work is done by the language, and not entirely by the
programmers who need to do the parsing of these files.
The above example is a simple one. A more interesting one (for
AutoRAIDs) illustrates some of the greater complexity that we'd like
this language to support. An AutoRAID contains some number of disks
and some number of controllers, for each of which we need to get the
Product ID, Serial Number and Firmware revision. Note below that the
"disk_array" function "includes" the first 2 functions.
##### Disk array
array_disk {
    open values[1] = "Information for disk in slot (.*):";
    values[2] = "Product ID.*= (.*);";
    values[3] = "Serial number.*= (.*)";
    close values[4] = "Firmware revision.*= (.*)";
    return 1-4;
}
array_controller {
    open values[1] = "Information for controller (.*):";
    values[2] = "Product ID.*= (.*);";
    values[3] = "Serial number.*= (.*)";
    # Also close on an empty line, because I have seen one example in
    # which the only line after the "Information for controller" line was:
    # Controller Y is installed, but not responding to this system
    close values[4] = "Product revision.*= (.*)|^$";
    return 1-4;
}
disk_array {
    open values[1] = "Information Tool Log for Disk Array on path (\\S*) --";
    values[2] = "Product ID.*= (.*)";
    values[3] = "Array serial number.*= (.*)";
    include array_disk, array_controller;
    close "^=-\\+-=-\\+-|-- Exit the Support Tool Manager";
    return 1-3;
}
Example output for an AutoRAID:
disk_array, 8/0.3, C5447A, 000000057FE3
array_disk, A1, ST19171N, LAE94268, HP06
array_disk, B1, ST19171N, LAH19802, HP06
array_disk, A2, ST19171N, LAE93675, HP06
array_disk, B2, ST19171N, LAH02953, HP06
array_disk, A3, ST19171N, LAH19648, HP06
array_disk, B3, ST19171N, LAE90348, HP06
array_disk, A4, ST19171N, LAH18432, HP06
array_disk, B4, ST19171N, LAH19149, HP06
array_disk, A5, ST19171N, LAE95689, HP06
array_disk, B5, ST19171N, LAH18494, HP06
array_disk, A6, ST19171N, LAE94639, HP06
array_disk, B6, ST19171N, LAH11738, HP06
array_controller, X, C5447A, R785AD9364731111, HP24
array_controller, Y, C5447A, R855AD3293884984, HP24
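A rough Python sketch of how the "include" nesting might be interpreted follows. Everything in it is an assumption made for illustration: the `Rule` and `Context` classes and the `run` function are hypothetical, and since the AutoRAID input is not shown here, the sample input is invented in a `label = value` layout that the regexps suggest. While an included section is open, its lines are consumed by that section and never reach the parent. The close pattern escapes `+`, which Python's `re` would otherwise treat as a quantifier.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Rule:  # hypothetical in-memory form of one rule
    name: str
    open_rx: str
    field_rxs: list
    close_rx: str
    includes: list = field(default_factory=list)

class Context:
    """Values collected for one open..close section of the input."""
    def __init__(self, rule, first_value):
        self.rule = rule
        self.values = [first_value] + [""] * len(rule.field_rxs)

    def feed(self, line):
        """Store any field found on line; return True if the line closes the section."""
        for i, rx in enumerate(self.rule.field_rxs, start=1):
            m = re.search(rx, line)
            if m and not self.values[i]:  # first match wins
                self.values[i] = m.group(1)
        m = re.search(self.rule.close_rx, line)
        if m is None:
            return False
        if m.re.groups:  # the close regex also captures a value
            self.values.append(m.group(1) or "")
        return True

    def record(self):
        return [self.rule.name, *self.values]

def run(rule, lines):
    """Return the parent record followed by the records of included sections."""
    out, parent, child, children = [], None, None, []
    for line in lines:
        if parent is None:
            m = re.search(rule.open_rx, line)
            if m:
                parent, children = Context(rule, m.group(1)), []
            continue
        if child is not None:  # an included section is open: it owns the line
            if child.feed(line):
                children.append(child.record())
                child = None
            continue
        for sub in rule.includes:
            m = re.search(sub.open_rx, line)
            if m:
                child = Context(sub, m.group(1))
                break
        else:  # no included section opened on this line
            if parent.feed(line):
                out += [parent.record(), *children]
                parent = None
    if parent is not None:  # input ended before the close pattern was seen
        out += [parent.record(), *children]
    return out

ARRAY_DISK = Rule("array_disk", r"Information for disk in slot (.*):",
                  [r"Product ID.*= (.*)", r"Serial number.*= (.*)"],
                  r"Firmware revision.*= (.*)")
ARRAY_CONTROLLER = Rule("array_controller", r"Information for controller (.*):",
                        [r"Product ID.*= (.*)", r"Serial number.*= (.*)"],
                        r"Product revision.*= (.*)|^$")
DISK_ARRAY = Rule("disk_array", r"Information Tool Log for Disk Array on path (\S*) --",
                  [r"Product ID.*= (.*)", r"Array serial number.*= (.*)"],
                  r"^=-\+-=-\+-|-- Exit the Support Tool Manager",
                  [ARRAY_DISK, ARRAY_CONTROLLER])

# Invented input layout; only the values are taken from the example output above.
SAMPLE = """\
-- Information Tool Log for Disk Array on path 8/0.3 --
Product ID = C5447A
Array serial number = 000000057FE3
Information for disk in slot A1:
Product ID = ST19171N
Serial number = LAE94268
Firmware revision = HP06
Information for controller X:
Product ID = C5447A
Serial number = R785AD9364731111
Product revision = HP24
=-+-=-+-=-+-
"""

for record in run(DISK_ARRAY, SAMPLE.splitlines()):
    print(", ".join(record))
```

Buffering the included records until the parent closes is one way to print the `disk_array` line first, as in the example output; a streaming variant would have to emit the parent as soon as its last field is seen.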
TIA.
Paul Dineen
paul_dineen@hp.com
[Looks to me like this would be a fine "little language" that you could
compile into perl or awk. -John]