From: "Nicola Musatti" <nicola.musatti@gmail.com>
Newsgroups: comp.compilers
Date: 10 Aug 2006 15:47:29 -0400
Organization: Compilers Central
References: 06-08-042
Keywords: C, lex
Posted-Date: 10 Aug 2006 15:47:29 EDT
chtaylo3@gmail.com wrote:
> Hello all,
>
> First, I must admit that I know squat about compilers. I've asked
> around about how to determine if a given line of source contains
> nothing but comments and I've been referred to this newsgroup several
> times.
>
> I'm trying to determine if a given line is a C/C++ style comment.
[...]
> [You need about 2/3 of a C++ lexer. You more or less need to
> scan for /* and then the matching */, except that you also need
> to look for quoted strings since "/*" is a string, not a comment.
> It's not that hard, with a lexer generator like flex you should
> be able to do it in a few hours. But people must have done this
> a hundred times before so I would first poke around on the net and
> see if there is code you can just steal. -John]
Well, it's simpler than that, because if a block comment start
sequence is embedded in a string, then the corresponding line contains
something other than comments. The following Python program shows how
to do it with one regular expression (barring bugs :-)
Cheers,
Nicola Musatti
#
import re

# The following regular expression matches either a line containing only
# optional white space and a line comment, or a sequence of block comments
# possibly intermixed with white space.
comment = re.compile(r'(^[ \t]*//.*?$)|(^([ \t]*/\*.*?\*/)+[ \t]*$)',
                     re.MULTILINE | re.DOTALL)

# A simple C/C++ program
text = '''
// whole line comment
int main() // not whole line
{
    /* valid multi line
       block comment */
    return 0; /* non valid
                 multiline comment */
}
'''

# The example is scanned and each line containing only comments is printed.
found = True
start = 0
while found:
    found = comment.search(text, start)
    if found:
        start = found.end()
        print(found.group())
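
One case the pattern above appears to get wrong is a line such as
"/* a */ int x; /* b */": because ".*?" is allowed to backtrack past the
first "*/", the whole line can be reported as comment-only. The sketch
below is one possible tightening, not part of the original post. The names
BLOCK, COMMENT_ONLY and comment_only_spans are made up for illustration,
and the block-comment sub-pattern is the usual "never step over a closing
*/" idiom, so DOTALL is no longer needed.

```python
import re

# Hypothetical stricter pattern (an assumption, not the poster's code):
# the body of a block comment can never run past its own closing "*/".
BLOCK = r'/\*[^*]*\*+(?:[^/*][^*]*\*+)*/'

COMMENT_ONLY = re.compile(
    r'^[ \t]*//[^\n]*$'                      # a line comment alone on its line
    r'|^(?:[ \t]*' + BLOCK + r')+[ \t]*$',   # one or more block comments
    re.MULTILINE)


def comment_only_spans(text):
    """Return every run of text that consists of comments only."""
    return [m.group() for m in COMMENT_ONLY.finditer(text)]


sample = (
    "/* a */ int x; /* b */\n"   # code between two comments: rejected
    "/* c */ /* d */\n"          # two block comments: accepted
    "// e\n"                     # line comment: accepted
    "int y; // f\n"              # code before the comment: rejected
)
print(comment_only_spans(sample))
```

Like the original, this still ignores a block comment followed by a line
comment on the same line, and it knows nothing about string literals beyond
the observation made above.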