How to use the konoha.sentence_tokenizer.SentenceTokenizer.PERIOD function in konoha

To help you get started, weโ€™ve selected a few konoha examples, based on popular ways it is used in public projects.

Secure your code as it's written. Use Snyk Code to scan source code in minutes - no build needed - and fix issues immediately.

github himkt / tiny_tokenizer / konoha / sentence_tokenizer.py View on Github external
document = re.sub(pattern, self.conv_period, document)

        result = []
        for line in document.split("\n"):
            line = line.rstrip()
            line = line.replace("\n", "")
            line = line.replace("\r", "")
            line = line.replace("ใ€‚", "ใ€‚\n")
            sentences = line.split("\n")

            for sentence in sentences:
                if not sentence:
                    continue

                period_special = SentenceTokenizer.PERIOD_SPECIAL
                period = SentenceTokenizer.PERIOD
                sentence = sentence.replace(period_special, period)
                result.append(sentence)

        return result