Re: How to determine if a given line is a C/C++ comment

"Nicola Musatti" <nicola.musatti@gmail.com>
10 Aug 2006 15:47:29 -0400

          From comp.compilers

From: "Nicola Musatti" <nicola.musatti@gmail.com>
Newsgroups: comp.compilers
Date: 10 Aug 2006 15:47:29 -0400
Organization: Compilers Central
References: 06-08-042
Keywords: C, lex

chtaylo3@gmail.com wrote:
> Hello all,
>
> First, I must admit that I know squat about compilers. I've asked
> around about how to determine if a given line of source contains
> nothing but comments and I've been referred to this newsgroup several
> times.
>
> I'm trying to determine if a given line is a C/C++ style comment.
[...]
> [You need about 2/3 of a C++ lexer. You more or less need to
> scan for /* and then the matching */, except that you also need
> to look for quoted strings since "/*" is a string, not a comment.
> It's not that hard, with a lexer generator like flex you should
> be able to do it in a few hours. But people must have done this
> a hundred times before so I would first poke around on the net and
> see if there is code you can just steal. -John]
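
The scanning approach described in the moderator's note can be sketched
in a few lines of Python. This is only an illustration under stated
assumptions, not a full lexer: the function name comment_only_lines is
hypothetical, and the sketch ignores backslash-newline continuations,
trigraphs, raw string literals and the preprocessor.

```python
def comment_only_lines(source):
    """Return 1-based numbers of lines holding only comments and white space."""
    result = []
    in_block = False  # True while inside a /* ... */ opened on an earlier line
    for number, line in enumerate(source.splitlines(), 1):
        i, n = 0, len(line)
        saw_comment = in_block  # a continued block comment counts as comment text
        saw_code = False
        while i < n:
            if in_block:
                end = line.find('*/', i)
                if end < 0:
                    i = n              # comment runs past the end of this line
                else:
                    in_block = False
                    i = end + 2
            elif line.startswith('//', i):
                saw_comment = True
                i = n                  # a line comment swallows the rest of the line
            elif line.startswith('/*', i):
                saw_comment = True
                in_block = True
                i += 2
            elif line[i] in '"\'':
                # string or character literal: skip to the closing quote so
                # that "/*" inside it is not taken for a comment start
                quote = line[i]
                saw_code = True
                i += 1
                while i < n and line[i] != quote:
                    i += 2 if line[i] == '\\' else 1
                i += 1
            else:
                if line[i] not in ' \t':
                    saw_code = True
                i += 1
        if saw_comment and not saw_code:
            result.append(number)
    return result
```

Unlike a whole-comment match, this sketch classifies each line on its
own, so the closing line of a block comment that started after code on
an earlier line is still reported as comment-only.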


Well, it's simpler than that, because if a block comment start
sequence is embedded in a string, then the corresponding line contains
something other than comments. The following Python program shows how
to do it with one regular expression (barring bugs :-).


Cheers,
Nicola Musatti


#
import re


# The following regular expression matches either a line containing only
# optional white space and a line comment or a sequence of block comments
# possibly intermixed with white space


comment = re.compile(r'(^[ \t]*//.*?$)|(^([ \t]*/\*.*?\*/)+[ \t]*$)',
                     re.MULTILINE | re.DOTALL)


# a simple C/C++ program


text = '''
// whole line comment


int main() // not whole line
{
/* valid multi line
      block comment */
      return 0; /* non valid
      multiline comment */
}
'''


# the example is scanned and each line containing only comments is
# printed


# finditer yields each non-overlapping match in order, which replaces the
# manual search loop; print() with parentheses runs on Python 2 and 3
for found in comment.finditer(text):
    print(found.group())
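
One case where the expression above over-matches is a line such as
/* a */ x = 1; /* b */: because .*? may backtrack past the first */,
the whole line is accepted as a single comment. A possible stricter
variant is sketched below. It is not from the original post; the names
original and strict are hypothetical, and the variant still does not
accept a block comment followed by a // comment on the same line.

```python
import re

# The regular expression from the post, unchanged, for comparison.
original = re.compile(r'(^[ \t]*//.*?$)|(^([ \t]*/\*.*?\*/)+[ \t]*$)',
                      re.MULTILINE | re.DOTALL)

# Hypothetical stricter variant: the body of a block comment may not
# contain "*/", and a line comment may not extend past a newline.
strict = re.compile(r'^[ \t]*//[^\n]*$'
                    r'|^(?:[ \t]*/\*(?:(?!\*/).)*\*/)+[ \t]*$',
                    re.MULTILINE | re.DOTALL)

line = '/* a */ x = 1; /* b */'
print(bool(original.search(line)))  # True: the first comment is stretched over the code
print(bool(strict.search(line)))    # False: the code between the comments is noticed
```

The negative lookahead (?!\*/) is checked before every character of the
comment body, so the body can never run through a closing */.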

