How to use the konoha.word_tokenizers.JanomeTokenizer class in konoha

To help you get started, we've selected a few konoha examples, based on popular ways it is used in public projects.

Source: himkt/tiny_tokenizer on GitHub, file konoha/word_tokenizer.py
                raise ValueError("`model_path` must be specified for sentencepiece.")

            self.tokenizer = word_tokenizers.SentencepieceTokenizer(
                model_path=self.model_path,
            )

        if self._tokenizer == "mecab":
            self.tokenizer = word_tokenizers.MeCabTokenizer(
                user_dictionary_path=self.user_dictionary_path,
                system_dictionary_path=self.system_dictionary_path,
                with_postag=self.with_postag,
                dictionary_format=self.dictionary_format,
            )

        if self._tokenizer == "janome":
            self.tokenizer = word_tokenizers.JanomeTokenizer(
                user_dictionary_path=self.user_dictionary_path,
                with_postag=self.with_postag,
            )

        if self._tokenizer == "sudachi":
            if self.mode is None:
                raise ValueError("`mode` must be specified for sudachi.")

            self.tokenizer = word_tokenizers.SudachiTokenizer(
                mode=self.mode,
                with_postag=self.with_postag,
            )