29 Mar 2007 00:59:10 -0400


From: | Martin Ward <martin@gkc.org.uk> |

Newsgroups: | comp.compilers |

Date: | 29 Mar 2007 00:59:10 -0400 |

Organization: | Compilers Central |

References: | 07-03-095 |

Keywords: | parse |

Posted-Date: | 29 Mar 2007 00:59:10 EDT |

On Tuesday 27 Mar 2007 14:27, msully4321@gmail.com wrote:

*> Here is my grammar (I allow an arbitrary number of Ms)*

*>*

*> numeral -> thousands*

*> thousands -> thous_part hundreds | thous_part | hundreds*

*> thous_part -> thous_part M | M*

*> hundreds -> hun_part tens | hun_part | tens*

*> hun_part -> hun_rep | CD | D | D hun_rep | CM*

*> hun_rep -> C | CC | CCC*

*> tens -> tens_part ones | tens_part | ones*

*> tens_part -> tens_rep | XL | L | L tens_rep | XC*

*> tens_rep -> X | XX | XXX*

*> ones -> ones_rep | IV | V | V ones_rep | IX*

*> ones_rep -> I | II | III*

*>*

*> Comments?*

This grammar doesn't accept IIII for 4 (as found on many clocks with
Roman numeral faces, for example), nor does it accept the "shorthand"
forms: IC for 99, IIC for 98, MVM for 1995 and so on.
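
One way to check this is to collapse the quoted grammar into a single regular expression and try the forms mentioned above. This is only a sketch: the names `STRICT` and `accepts` are illustrative and do not appear in the quoted post, and the input is assumed to be upper-case ASCII.

```python
import re

# Transcription of the quoted grammar into one regular expression.
# As in the grammar, an arbitrary number of Ms is allowed.
STRICT = re.compile(
    r"M*"                 # thous_part
    r"(CM|CD|D?C{0,3})"   # hun_part  (optional)
    r"(XC|XL|L?X{0,3})"   # tens_part (optional)
    r"(IX|IV|V?I{0,3})"   # ones      (optional)
)

def accepts(text: str) -> bool:
    # The grammar has no empty production, so reject "" explicitly.
    return bool(text) and STRICT.fullmatch(text) is not None

for s in ["MCMXCV", "IIII", "IC", "IIC", "MVM"]:
    print(s, accepts(s))
```

Of these five inputs, only MCMXCV is accepted.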

The rule is that any smaller number placed before a larger
number is subtracted from the larger number.
I know of no examples where the "smaller number" is anything other
than a single numeral or one of the doubled numerals II, XX or CC.
However, constructions such as IIIII for "five", IIX for "eight"
or VV for "ten" have been discovered in manuscripts.
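
The subtraction rule above can be sketched as a right-to-left scan: a numeral is subtracted if some larger numeral has already been seen to its right, and added otherwise. This is one possible reading of the rule, not a definitive one; the names `VALUES` and `lenient_value` are hypothetical, and the function evaluates any string of Roman letters rather than checking that it is well formed.

```python
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def lenient_value(text: str) -> int:
    """Evaluate a numeral, subtracting anything smaller than a later numeral."""
    total = 0
    largest = 0  # largest numeral value seen so far, scanning from the right
    for ch in reversed(text):
        v = VALUES[ch]  # raises KeyError on a non-Roman character
        if v < largest:
            total -= v      # smaller number before a larger one: subtract
        else:
            total += v
            largest = v
    return total

for s in ["IIII", "IC", "IIC", "MVM", "IIIII", "IIX", "VV", "MCMXCV"]:
    print(s, lenient_value(s))
```

Tracking the largest value seen, rather than only the immediately following numeral, is what lets IIC come out as 98: both I's are subtracted from the C.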

A bar placed over a number multiplies it by one thousand,
and a double bar multiplies it by one million.
This could be implemented in your system by using parentheses
to denote the bar: thus (I) would represent 1,000.
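
A minimal sketch of the parenthesis notation, assuming that nested parentheses stand for the double bar, so ((I)) is 1,000,000. The names `plain_value` and `barred_value` are hypothetical. Letters outside parentheses are evaluated with a simple lenient right-to-left scan, and subtraction across a parenthesis boundary is not handled.

```python
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def plain_value(letters: str) -> int:
    # Lenient evaluation: subtract a numeral if a larger one follows it.
    total, largest = 0, 0
    for ch in reversed(letters):
        v = VALUES[ch]
        if v < largest:
            total -= v
        else:
            total += v
            largest = v
    return total

def barred_value(text: str) -> int:
    # Each parenthesised group is worth 1000 times the value of its contents.
    total, i = 0, 0
    while i < len(text):
        if text[i] == "(":
            depth, j = 1, i + 1
            while j < len(text) and depth:
                depth += {"(": 1, ")": -1}.get(text[j], 0)
                j += 1
            if depth:
                raise ValueError("unbalanced '('")
            total += 1000 * barred_value(text[i + 1 : j - 1])
            i = j
        elif text[i] == ")":
            raise ValueError("unbalanced ')'")
        else:
            j = i
            while j < len(text) and text[j] not in "()":
                j += 1
            total += plain_value(text[i:j])
            i = j
    return total

for s in ["(I)", "((I))", "(IV)CCC"]:
    print(s, barred_value(s))
```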

(In the Middle Ages, 500, usually D, was sometimes written as
I followed by an apostrophus, resembling a backwards C, while 1,000
was written as CI followed by an apostrophus.)

The more general question raised by this discussion (and more relevant
to comp.compilers) is how "forgiving" a parser should be when the
language being parsed has no formal definition, or has several
conflicting formal definitions.
Do you accept anything that can possibly be interpreted,
or do you place "arbitrary" restrictions in order to simplify
the grammar, at the expense of rejecting existing files?

--

Martin

martin@gkc.org.uk http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4

G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/
