From: | Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> |
Newsgroups: | comp.compilers |
Date: | Wed, 24 Jun 2020 03:56:56 +0800 |
Organization: | Easynews - www.easynews.com |
References: | 20-06-010 |
Injection-Info: | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="42091"; mail-complaints-to="abuse@iecc.com" |
Keywords: | lex, errors |
Posted-Date: | 23 Jun 2020 15:59:33 EDT |
In-Reply-To: | 20-06-010 |
Content-Language: | en-GB |
> [There's a vast amount of work on edit distance. My guess is they
> use something like Levenshtein, but rather than use a constant
> distance of 1 between different letters, the distance varies depending
> on how different the letters look. -John]
This clang blog post specifically mentions Levenshtein,
http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#spell_checker
and it looks like what people do is go through the entire symbol
table and compute the distance from each entry to the erroneous identifier.

I thought that'd be a bit on the expensive side, because C++ files
can run to 100k+ lines (or millions?) after preprocessing, so one
translation unit really could contain up to a million identifiers in practice.
[I don't know if that actually happens, but I don't think it's safe
to assume it doesn't.]
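
To make the cost concrete, here is a rough sketch of the brute-force approach: a textbook unit-cost Levenshtein distance computed against every name in the symbol table. This is only an illustration, not clang's actual code; `levenshtein`, `suggest`, and the `max_distance` cutoff are names and parameters I made up. The length check is one cheap way to skip most of a large table, since the difference in length is a lower bound on the distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance with unit costs."""
    if len(a) < len(b):
        a, b = b, a  # keep the row (indexed by b) as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest(unknown: str, symbol_table, max_distance: int = 2):
    """Return the closest known name within max_distance, or None.

    Hypothetical helper: scans the whole table, as described above.
    """
    best = None
    best_distance = max_distance + 1
    for name in symbol_table:
        # The length difference is a lower bound on the edit distance,
        # so names that are too long or too short can be skipped cheaply.
        if abs(len(name) - len(unknown)) >= best_distance:
            continue
        d = levenshtein(unknown, name)
        if d < best_distance:
            best, best_distance = name, d
    return best


if __name__ == "__main__":
    table = ["printf", "fprintf", "sprintf", "puts"]
    print(suggest("pritnf", table))
```

Each comparison costs time proportional to the product of the two name lengths, so the total work grows linearly with the table size, and only for identifiers that failed lookup in the first place.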

In the 10 years since, people may have moved away from standard
Levenshtein, as you mention.
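
For what it's worth, the variation John guesses at is a small change to the usual recurrence: the substitution cost becomes a function of the two letters instead of a constant 1. The sketch below is my own illustration of that idea, not anything taken from a real compiler; the `LOOKALIKE` table and the 0.5 cost are invented values that would need tuning.

```python
# Hypothetical table of visually similar character pairs (an assumption,
# not taken from any compiler); order within a pair does not matter.
LOOKALIKE = {frozenset(p) for p in [("0", "O"), ("1", "l"), ("l", "I"), ("5", "S")]}


def substitution_cost(x: str, y: str) -> float:
    """Cheaper substitution when the two characters look alike."""
    if x == y:
        return 0.0
    if frozenset((x, y)) in LOOKALIKE:
        return 0.5  # assumed cost for a look-alike pair
    return 1.0


def weighted_distance(a: str, b: str) -> float:
    """Levenshtein recurrence with a per-pair substitution cost."""
    previous = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        current = [float(i)]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1.0,       # deletion
                               current[j - 1] + 1.0,    # insertion
                               previous[j - 1] + substitution_cost(ca, cb)))
        previous = current
    return previous[-1]
```

With this table, `va1ue` is closer to `value` than `vaxue` is, which is the kind of ranking the variable-distance idea is after.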

But then, maybe compilation speed for erroneous input isn't really
important. rustc is slow on a short input file whether or not it
contains errors [which could just be the startup cost].
--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk