How to use the fastparquet.writer.find_max_part function in fastparquet

To help you get started, we’ve selected a few fastparquet examples, based on popular ways it is used in public projects.

Source: dask/dask on GitHub, dask/dataframe/io/parquet.py
):
            raise ValueError(
                "Appended columns not the same.\n"
                "Previous: {} | New: {}".format(pf.columns, list(df.columns))
            )
        elif (pd.Series(pf.dtypes).loc[pf.columns] != df[pf.columns].dtypes).any():
            raise ValueError(
                "Appended dtypes differ.\n{}".format(
                    set(pf.dtypes.items()) ^ set(df.dtypes.items())
                )
            )
        else:
            df = df[pf.columns + partition_on]

        fmd = pf.fmd
        i_offset = fastparquet.writer.find_max_part(fmd.row_groups)

        if not ignore_divisions:
            minmax = fastparquet.api.sorted_partitioned_columns(pf)
            old_end = minmax[index_cols[0]]["max"][-1]
            if divisions[0] < old_end:
                raise ValueError(
                    "Appended divisions overlapping with the previous ones.\n"
                    "Previous: {} | New: {}".format(old_end, divisions[0])
                )
    else:
        fmd = fastparquet.writer.make_metadata(
            df._meta,
            object_encoding=object_encoding,
            index_cols=index_cols,
            ignore_columns=partition_on,
            **kwargs
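
The excerpt above is cut off, and it does not show what happens to i_offset after find_max_part returns. The following is a minimal, self-contained sketch of what the helper appears to do: scan the file paths referenced by the existing row groups for names of the form part.N.parquet and return the next free index, or 0 if none match. It is not fastparquet's implementation. The names next_part_index and fake_row_group are hypothetical, the row groups are stand-in objects rather than real Parquet metadata, and using the offset to number newly appended files is an assumption about how the caller continues.

```python
import re
from types import SimpleNamespace

# Assumed naming scheme for data files: "<anything>part.<N>.parquet".
_PART_RE = re.compile(r".*part\.(?P<i>\d+)\.parquet$")


def next_part_index(row_groups):
    """Hypothetical stand-in for fastparquet.writer.find_max_part.

    Returns one more than the highest N found in any "part.N.parquet"
    path referenced by the row groups, or 0 if no path matches.
    """
    nums = []
    for rg in row_groups:
        for col in rg.columns:
            # file_path may be None when data lives in the metadata file itself.
            match = _PART_RE.match(col.file_path or "")
            if match:
                nums.append(int(match.group("i")))
    return max(nums) + 1 if nums else 0


def fake_row_group(path):
    """Build a stand-in row group with a single column chunk (illustration only)."""
    return SimpleNamespace(columns=[SimpleNamespace(file_path=path)])


# Pretend the existing dataset already holds part.0 and a partitioned part.3.
existing = [
    fake_row_group("part.0.parquet"),
    fake_row_group("year=2020/part.3.parquet"),
]

i_offset = next_part_index(existing)

# Assumption: appended partitions are numbered starting at the offset,
# so they cannot overwrite files that are already in the dataset.
new_files = ["part.%i.parquet" % (i + i_offset) for i in range(2)]
print(i_offset, new_files)
```

Starting from the returned offset instead of 0 is what keeps an append from clobbering existing part files. For an empty dataset the sketch returns 0, so numbering starts from the beginning.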