How to use the pycantonese.corpus.CantoneseCHATReader class in pycantonese

To help you get started, we’ve selected a few pycantonese examples, based on popular ways it is used in public projects.


Source: jacksonllee/pycantonese, pycantonese/corpus.py (GitHub)
def __init__(self, *filenames, **kwargs):
    # Use the module-level ENCODING constant when no encoding is passed.
    encoding = kwargs.get("encoding", ENCODING)
    super(CantoneseCHATReader, self).__init__(
        *filenames, encoding=encoding
    )
Source: jacksonllee/pycantonese, pycantonese/corpus.py (GitHub)
def hkcancor():
    """
    Create the corpus object for the Hong Kong Cantonese Corpus.
    """
    data_path = os.path.join(
        os.path.dirname(__file__), "data", "hkcancor", "*.cha"
    )
    return CantoneseCHATReader(data_path, encoding="utf8")
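
The data path built in hkcancor() is a glob pattern ("*.cha"), not a single file name. The following is a minimal sketch, using only the standard library, of how such a pattern can resolve to concrete files. The helper names build_pattern and list_chat_files, the temporary directory, and the file names are hypothetical and exist only for illustration; this does not show how pycantonese expands the pattern internally.

```python
import glob
import os
import tempfile


def build_pattern(base_dir):
    """Join a base directory with the sub-path layout seen in hkcancor()."""
    return os.path.join(base_dir, "data", "hkcancor", "*.cha")


def list_chat_files(base_dir):
    """Return the sorted paths matched by the pattern (hypothetical helper)."""
    return sorted(glob.glob(build_pattern(base_dir)))


if __name__ == "__main__":
    # Build a throwaway directory tree so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        corpus_dir = os.path.join(tmp, "data", "hkcancor")
        os.makedirs(corpus_dir)
        for name in ("a.cha", "b.cha", "notes.txt"):
            with open(os.path.join(corpus_dir, name), "w", encoding="utf8") as f:
                f.write("@Begin\n@End\n")
        # Only the .cha files match the pattern; notes.txt is ignored.
        print([os.path.basename(p) for p in list_chat_files(tmp)])
```
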
Source: jacksonllee/pycantonese, pycantonese/corpus.py (GitHub)
def read_chat(*filenames, **kwargs):
    """
    Create a corpus object based on *filenames*.

    :param filenames: one or multiple filenames (absolute-path or relative to
        the current directory; with or without glob matching patterns)

    :param kwargs: Keyword arguments. Currently, only ``encoding`` is
        recognized, which defaults to 'utf8'.
    """
    encoding = kwargs.get("encoding", ENCODING)
    return CantoneseCHATReader(*filenames, encoding=encoding)
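
The excerpts above share one pattern: an optional encoding keyword is read from **kwargs with a default and then forwarded to the reader. Below is a self-contained sketch of that pattern that runs without pycantonese installed. BaseReader and DemoCHATReader are hypothetical stand-ins, not the library's classes, and the ENCODING value is an assumption taken from the 'utf8' default named in the docstring.

```python
import glob
import os
import tempfile

ENCODING = "utf8"  # assumption: mirrors the default named in the docstring


class BaseReader:
    """Hypothetical stand-in for the parent reader class."""

    def __init__(self, *filenames, encoding=ENCODING):
        self.encoding = encoding
        # Expand each argument as a glob pattern; plain paths match themselves.
        self.filenames = []
        for pattern in filenames:
            self.filenames.extend(sorted(glob.glob(pattern)))


class DemoCHATReader(BaseReader):
    """Hypothetical subclass that forwards only the encoding keyword."""

    def __init__(self, *filenames, **kwargs):
        encoding = kwargs.get("encoding", ENCODING)
        super().__init__(*filenames, encoding=encoding)


def read_chat(*filenames, **kwargs):
    """Create a reader from one or more file names or glob patterns."""
    encoding = kwargs.get("encoding", ENCODING)
    return DemoCHATReader(*filenames, encoding=encoding)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("one.cha", "two.cha"):
            with open(os.path.join(tmp, name), "w", encoding="utf8") as f:
                f.write("@Begin\n@End\n")
        corpus = read_chat(os.path.join(tmp, "*.cha"))
        print(len(corpus.filenames), corpus.encoding)
```

One consequence of reading the keyword with kwargs.get is that any other keyword argument is silently ignored rather than rejected, which matches the docstring's note that only encoding is recognized.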