How to use the sister.tokenizers.JapaneseTokenizer class in sister

To help you get started, we've selected a few sister examples based on popular ways the library is used in public projects.

From tofunlp/sister, tests/test_tokenizers.py (view on GitHub):
def setUp(self):
    # Build a fresh JapaneseTokenizer for each test case.
    self.tokenizer = tokenizers.JapaneseTokenizer()
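
Outside a test harness, the tokenizer can be used directly. The sketch below assumes the object exposes a tokenize(sentence) method returning a list of string tokens and that the Japanese tokenization backend sister depends on is installed; neither detail is shown in the excerpt above, so treat both as assumptions.

from sister import tokenizers

# Assumption: tokenize() returns the sentence split into surface-form tokens.
tokenizer = tokenizers.JapaneseTokenizer()
tokens = tokenizer.tokenize("猫が好きです。")
print(tokens)  # expected: a list of strings, e.g. ['猫', 'が', '好き', 'です', '。']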
From tofunlp/sister, sister/core.py (view on GitHub):
def __init__(
        self,
        lang: str = 'en',
        tokenizer: Tokenizer = None,
        word_embedder: WordEmbedding = None) -> None:
    # When no tokenizer is supplied, pick a default for the requested language:
    # English and French use SimpleTokenizer, Japanese uses JapaneseTokenizer.
    tokenizer = tokenizer or {"en": SimpleTokenizer(),
                              "fr": SimpleTokenizer(),
                              "ja": JapaneseTokenizer()}[lang]
    # Default to fastText word vectors for the same language.
    word_embedder = word_embedder or FasttextEmbedding(lang)
    super().__init__(tokenizer, word_embedder)
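
In practice you rarely instantiate JapaneseTokenizer yourself: the constructor above selects it automatically when lang='ja'. Below is a minimal sketch of that path, assuming this __init__ belongs to sister's MeanEmbedding sentence embedder and that the embedder is callable on a raw sentence; both points come from the project's documented usage rather than the excerpt itself.

import sister

# lang="ja" routes through the dictionary above, so JapaneseTokenizer and
# FasttextEmbedding("ja") are chosen without being named explicitly.
embedder = sister.MeanEmbedding(lang="ja")

# Assumed call signature: the embedder maps one sentence to a fixed-size vector.
vector = embedder("猫が好きです。")
print(vector.shape)  # e.g. (300,) for fastText word vectors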