    Because we use the `langcodes` module, we can handle slight
    variations in language codes. For example, looking for 'pt-BR',
    'pt_br', or even 'PT_BR' will get you the 'pt' (Portuguese) list.
    Looking up the alternate code 'por' will also get the same list.
    """
    if match_cutoff is not None:
        warnings.warn(
            "The `match_cutoff` parameter is deprecated",
            DeprecationWarning
        )
    available = available_languages(wordlist)
    # TODO: decrease the maximum distance. This distance is so high just
    # because it allows a test where 'yue' matches 'zh', and maybe the
    # distance between those is high because they shouldn't match.
    best, _distance = langcodes.closest_match(
        lang, list(available), max_distance=70
    )
    if best == 'und':
        raise LookupError("No wordlist %r available for language %r"
                          % (wordlist, lang))
    if best != lang:
        logger.warning(
            "You asked for word frequencies in language %r. Using the "
            "nearest match, which is %r (%s)."
            % (lang, best, langcodes.get(best).language_name('en'))
        )
    return read_cBpack(available[best])
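
The lookup flow above (normalize the requested code, pick the closest available wordlist, warn when the match is not exact, raise `LookupError` when nothing matches) can be sketched with the standard library alone. This is a simplified stand-in, not how `langcodes.closest_match` works: the real function computes a distance between language tags, whereas this sketch only normalizes case and separators, consults a small alias table, and falls back to the primary language subtag. The names `ALIASES`, `normalize_tag`, and `pick_wordlist` are hypothetical and exist only for illustration.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical alias table: a tiny stand-in for the alternate (three-letter)
# codes that the real langcodes module knows about.
ALIASES = {"por": "pt", "eng": "en", "deu": "de"}


def normalize_tag(tag):
    """Lowercase the tag, use '-' as the separator, and resolve a known alias."""
    parts = tag.replace("_", "-").lower().split("-")
    parts[0] = ALIASES.get(parts[0], parts[0])
    return "-".join(parts)


def pick_wordlist(lang, available):
    """Return the key of `available` that matches `lang`.

    Assumption: "closest" here means the normalized tag itself or,
    failing that, its primary language subtag. No distance metric is used.
    """
    tag = normalize_tag(lang)
    for candidate in (tag, tag.split("-")[0]):
        if candidate in available:
            if candidate != lang:
                # Mirror the original behaviour: tell the caller about the fallback.
                logger.warning(
                    "You asked for language %r. Using the nearest match, %r.",
                    lang, candidate,
                )
            return candidate
    raise LookupError("No wordlist available for language %r" % lang)
```

With a mapping such as `{"pt": "pt.msgpack.gz"}` (the file name is made up), calling `pick_wordlist` with `'pt-BR'`, `'pt_br'`, `'PT_BR'`, or `'por'` resolves to the `'pt'` entry, while an unknown code raises `LookupError`.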