How to use the sudachipy.tokenizer.Tokenizer class in SudachiPy

To help you get started, we’ve selected a few SudachiPy examples that reflect common ways it is used in public projects.


From megagonlabs/ginza on GitHub (ja_ginza/displacy.py):
nlp = load_model(model_path)
if disable_pipes:
    print("disabling pipes: {}".format(disable_pipes), file=sys.stderr)
    nlp.disable_pipes(disable_pipes)
    print("using : {}".format(nlp.pipe_names), file=sys.stderr)
else:
    # re-create the corrector so that local changes to it are reflected
    if recreate_corrector and 'JapaneseCorrector' in nlp.pipe_names:
        nlp.remove_pipe('JapaneseCorrector')
        corrector = JapaneseCorrector(nlp)
        nlp.add_pipe(corrector, last=True)

# OriginalTokenizer is presumably sudachipy.tokenizer.Tokenizer imported under an alias
if mode == 'A':
    nlp.tokenizer.mode = OriginalTokenizer.SplitMode.A
elif mode == 'B':
    nlp.tokenizer.mode = OriginalTokenizer.SplitMode.B
elif mode == 'C':
    nlp.tokenizer.mode = OriginalTokenizer.SplitMode.C
else:
    raise Exception('mode should be A, B or C')
print("mode is {}".format(mode), file=sys.stderr)
if not use_sentence_separator:
    print("disabling sentence separator", file=sys.stderr)
    nlp.tokenizer.use_sentence_separator = False

if browser_command:
    browser = webbrowser.get(browser_command)
else:
    browser = None

if corpus_type:
    if corpus_type == 'bccwj_ud':
        ...  # the excerpt is cut off here
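The `if`/`elif` chain above maps a one-letter mode string onto a `SplitMode` member. A minimal sketch of the same dispatch, assuming a local stand-in `SplitMode` enum with members `A`, `B` and `C` (in the excerpt this role is played by `OriginalTokenizer.SplitMode`), can use the enum's name lookup instead of one branch per mode:

```python
from enum import Enum


class SplitMode(Enum):
    """Hypothetical stand-in for SudachiPy's Tokenizer.SplitMode."""
    A = "A"  # shortest units
    B = "B"  # middle-length units
    C = "C"  # longest units (named entities kept whole)


def parse_split_mode(mode: str) -> SplitMode:
    """Map 'A', 'B' or 'C' to a SplitMode member and reject anything else."""
    try:
        # Enum lookup by member name replaces the if/elif chain
        return SplitMode[mode]
    except KeyError:
        raise ValueError("mode should be A, B or C") from None


print(parse_split_mode("B"))  # prints SplitMode.B
```

Raising `ValueError` instead of a bare `Exception` lets callers catch invalid mode strings specifically, and adding a new mode only requires a new enum member.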
From WorksApplications/SudachiPy on GitHub (sudachipy/dictionary.py):
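# Method of sudachipy.dictionary.Dictionary: builds a new Tokenizer from the
# dictionary's grammar, lexicon and plugins, using the given split mode
def create(self, mode=None):
    return Tokenizer(
        self.grammar, self.lexicon, self.input_text_plugins,
        self.oov_provider_plugins, self.path_rewrite_plugins, mode=mode)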
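Here `create()` acts as a factory: the dictionary holds its grammar, lexicon and plugins, and hands those same objects to every `Tokenizer` it builds, passing `mode` through. A minimal sketch of that pattern, using hypothetical stand-in classes (`ToyDictionary`, `ToyTokenizer`, with the plugin lists omitted and a default mode of `"C"` assumed), might look like this:

```python
class ToyTokenizer:
    """Hypothetical stand-in for sudachipy.tokenizer.Tokenizer."""

    def __init__(self, grammar, lexicon, mode=None):
        self.grammar = grammar
        self.lexicon = lexicon
        # Fall back to an assumed default split mode when none is given
        self.mode = mode if mode is not None else "C"


class ToyDictionary:
    """Hypothetical stand-in for sudachipy.dictionary.Dictionary."""

    def __init__(self):
        # Resources are built once per dictionary...
        self.grammar = {"connection_costs": {}}
        self.lexicon = {"東京": 1, "都": 2}

    def create(self, mode=None):
        # ...and shared by every tokenizer the factory creates
        return ToyTokenizer(self.grammar, self.lexicon, mode=mode)


d = ToyDictionary()
t1 = d.create()
t2 = d.create(mode="A")
print(t1.mode, t2.mode)          # prints C A
print(t1.lexicon is t2.lexicon)  # prints True
```

Sharing the loaded resources means creating several tokenizers (for example, one per split mode) does not reload the dictionary each time.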