Re: Spell checking identifiers

Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid>
Thu, 25 Jun 2020 22:33:37 +0800

          From comp.compilers

From: Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid>
Newsgroups: comp.compilers
Date: Thu, 25 Jun 2020 22:33:37 +0800
Organization: Easynews - www.easynews.com
References: 20-06-010 20-06-011 20-06-012
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="42502"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, errors
Posted-Date: 25 Jun 2020 11:54:47 EDT
In-Reply-To: 20-06-012
Content-Language: en-GB

On 24/06/2020 7:51 am, gah4@u.washington.edu wrote:
> On Tuesday, June 23, 2020 at 12:59:35 PM UTC-7, Johann 'Myrkraverk' Oskarsson wrote:
>
> (snip)
>
>> This clang blog specifically mentions Levenshtein,
>
>> http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#spell_checker
>
>> and it looks like what people do is to go through the entire symbol
>> table and compute it against the individual erroneous identifier.
>
>> I thought that'd be a bit on the expensive side,
>
> With either constant weighting or character dependent weighting
> it is easy to do with dynamic programming. The time is then O(m n)
> where m and n are the two lengths.


Are you talking about doing this one by one through the entire symbol
table?
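
For concreteness, here is a minimal sketch of the one-by-one approach, assuming unit costs for insertion, deletion and substitution. The names levenshtein and suggest, and the default cutoff of 2, are made up for illustration; none of this is taken from clang or rustc.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs, O(len(a) * len(b)) time."""
    # prev is the row of distances for the previous prefix of a.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free on a match
            ))
        prev = cur
    return prev[-1]


def suggest(typo, symbols, max_distance=2):
    """Scan every symbol; return the closest one within max_distance, or None."""
    best, best_d = None, max_distance + 1
    for name in symbols:
        d = levenshtein(typo, name)
        if d < best_d:
            best, best_d = name, d
    return best
```

Each lookup costs one O(m n) computation per symbol in the table, which is where the worry about expense comes from.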


> It seems most obvious to do only variables that are in the appropriate
> scope to be misspelled, but I suspect catching variables used out
> of scope is also worth doing. Well, in the latter case, you could
> hope that they at least spell them the same.


Depending on context, one would also want to do this for type names (as
per the blog above). Depending on the language* and culture**, there
can be thousands of type names in scope.
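
With thousands of candidates, one cheap trick is that the edit distance can never be smaller than the difference in lengths, so many names can be rejected without running the dynamic program at all. Below is a sketch of that idea, assuming unit costs and a small fixed cutoff k; within_distance is a hypothetical name, and whether clang or rustc prune in exactly this way is not something this sketch claims.

```python
def within_distance(a: str, b: str, k: int) -> bool:
    """True if the unit-cost edit distance between a and b is at most k."""
    # Each insertion or deletion changes the length by one, so the
    # distance is at least the difference in lengths.
    if abs(len(a) - len(b)) > k:
        return False
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        # Row minima never decrease from one row to the next, so once
        # every entry exceeds k the final distance must exceed k too.
        if min(cur) > k:
            return False
        prev = cur
    return prev[-1] <= k
```

For a one-character typo such as i, a cutoff of 1 rejects every name longer than two characters before any table is filled in.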


> I think you should turn it off for one character names, though,
> even though I suspect those are more likely. Too many false
> positives!


rustc evidently does this for one-character names, at least in the
case of i and j. I don't know whether it's useful to compare, say, a and k.


* C++ and Java come to mind.


** Programming culture; some have names, such as Agile and eXtreme
Programming, while others don't.


--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk

