How to use the sister.tokenizers.Tokenizer class in sister

To help you get started, we’ve selected a few sister examples that reflect popular ways the library is used in public projects.


tofunlp / sister / tests / test_core.py
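def test_call_not_implemented(self):
    # Dummy overrides embed() only, so __call__ is inherited from the base class.
    class Dummy(SentenceEmbedding):
        def embed(self, s): ...
    sentence = 'I am a dog.'
    # The test expects the inherited __call__ to raise NotImplementedError.
    with self.assertRaises(NotImplementedError):
        Dummy(Tokenizer(), WordEmbedding()).__call__(sentence)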
tofunlp / sister / tests / test_core.py
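def test_embed_not_implemented(self):
    # Dummy overrides __call__ only, so embed() is inherited from the base class.
    class Dummy(SentenceEmbedding):
        def __call__(self, s): ...
    sentence = 'I am a dog.'
    # The test expects the inherited embed() to raise NotImplementedError.
    with self.assertRaises(NotImplementedError):
        Dummy(Tokenizer(), WordEmbedding()).embed(sentence)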
tofunlp / sister / sister / tokenizers.py
from typing import List


class Tokenizer(object):

    def tokenize(self, sentence: str) -> List[str]:
        raise NotImplementedError


class SimpleTokenizer(Tokenizer):

    def __init__(self):
        self.replace_tokens = str.maketrans({
            '.': ' .',
            '?': ' ?',
            '!': ' !',
            '(': ' ( ',
            ')': ' ) ',
        })

    def tokenize(self, sentence: str) -> List[str]:
        return sentence.translate(self.replace_tokens).split()
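
The listing above defines the interface but does not show it being called, so here is a minimal usage sketch. To keep it runnable without installing sister, the two classes are reproduced from the listing rather than imported. `LowercaseTokenizer` is a hypothetical subclass added purely for illustration and is not part of sister.

```python
from typing import List


class Tokenizer(object):
    # Base interface, reproduced from the listing above.
    def tokenize(self, sentence: str) -> List[str]:
        raise NotImplementedError


class SimpleTokenizer(Tokenizer):
    # Reproduced from the listing above: pads punctuation with spaces,
    # then splits on whitespace.
    def __init__(self):
        self.replace_tokens = str.maketrans({
            '.': ' .',
            '?': ' ?',
            '!': ' !',
            '(': ' ( ',
            ')': ' ) ',
        })

    def tokenize(self, sentence: str) -> List[str]:
        return sentence.translate(self.replace_tokens).split()


class LowercaseTokenizer(SimpleTokenizer):
    # Hypothetical subclass (an assumption, not in sister): lowercases
    # every token produced by SimpleTokenizer.
    def tokenize(self, sentence: str) -> List[str]:
        return [token.lower() for token in super().tokenize(sentence)]


simple_tokens = SimpleTokenizer().tokenize('I am a dog.')
lower_tokens = LowercaseTokenizer().tokenize('Is it (really) a Dog?')
print(simple_tokens)  # ['I', 'am', 'a', 'dog', '.']
print(lower_tokens)   # ['is', 'it', '(', 'really', ')', 'a', 'dog', '?']

# Calling the base class directly raises, which is why the tests above
# pass a bare Tokenizer() only as a placeholder.
try:
    Tokenizer().tokenize('I am a dog.')
except NotImplementedError:
    base_raises = True
else:
    base_raises = False
print(base_raises)  # True
```

Subclassing and overriding `tokenize` is the only extension point the listing exposes, so a custom tokenizer would likely follow the same pattern as the hypothetical subclass here.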