How to use the missingno.utils.nullity_filter function in missingno

To help you get started, we’ve selected a few missingno examples, based on popular ways it is used in public projects.
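Every excerpt below calls `nullity_filter(df, filter=filter, n=n, p=p)` before plotting. The docstrings say `filter` is "top", "bottom", or None, that `n` limits the number of columns, and that `p` relates to percentage fill. They do not spell out how columns are ranked. The following is a minimal, dependency-free sketch of one plausible reading. It assumes "top" keeps the most complete columns and "bottom" the least complete ones, and it treats `p` as a floor for "top" and a ceiling for "bottom", even though the docstrings describe `p` as a cap. Columns are modeled as a dict of lists, with `None` marking a missing value. `filter_columns_by_nullity` is a hypothetical name, not part of missingno.

```python
def filter_columns_by_nullity(columns, filter=None, n=0, p=0):
    """Hypothetical stand-in for missingno.utils.nullity_filter (assumed semantics).

    columns: dict mapping column name -> list of values, with None meaning missing.
    filter:  "top" keeps well-filled columns, "bottom" keeps sparse ones, None keeps all.
    n:       if nonzero, keep at most n columns ranked by fill fraction.
    p:       if nonzero, a fill-fraction threshold (floor for "top", ceiling for "bottom").
    """
    if filter not in (None, "top", "bottom"):
        raise ValueError('filter must be "top", "bottom", or None')
    if filter is None:
        return dict(columns)

    # Fraction of non-missing values in each column.
    fill = {name: sum(v is not None for v in values) / len(values)
            for name, values in columns.items()}
    names = list(columns)

    if p:
        if filter == "top":
            names = [c for c in names if fill[c] >= p]
        else:
            names = [c for c in names if fill[c] <= p]
    if n:
        # Rank by fill fraction: most filled first for "top", least filled first for "bottom".
        ranked = sorted(names, key=lambda c: fill[c], reverse=(filter == "top"))
        keep = set(ranked[:n])
        names = [c for c in names if c in keep]  # preserve the original column order

    return {c: columns[c] for c in names}


cols = {
    "a": [1, None, 3, None],  # 50% filled
    "b": [1, 2, 3, None],     # 75% filled
    "c": [1, 2, 3, 4],        # 100% filled
}
print(list(filter_columns_by_nullity(cols, filter="top", n=2)))       # ['b', 'c']
print(list(filter_columns_by_nullity(cols, filter="bottom", p=0.75)))  # ['a', 'b']
```

Keeping the surviving columns in their original order, rather than in ranked order, means that filtering does not reorder the plot. Sorting is then left to a separate step, as in the excerpts, which call `nullity_sort` afterwards.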


From ResidentMario/missingno on GitHub, missingno/missingno.py:
more information.
    :param p: The cap on the percentage fill of the columns in the filtered DataFrame. See `nullity_filter()` for
    more information.
    :param sort: The column sort order to apply. Can be "ascending", "descending", or None.
    :param figsize: The size of the figure to display. This is a `matplotlib` parameter which defaults to (20, 12).
    :param fontsize: The figure's font size.
    :param labels: Whether or not to label each matrix entry with its correlation (default is True).
    :param cmap: What `matplotlib` colormap to use. Defaults to `RdBu`.
    :param vmin: The normalized colormap threshold. Defaults to -1, i.e. the bottom of the color scale.
    :param vmax: The normalized colormap threshold. Defaults to 1, i.e. the top of the color scale.
    :param inline: Whether or not the figure is inline. If it's not then instead of getting plotted, this method will
    return its figure.
    :return: If `inline` is False, the underlying `matplotlib.figure` object. Else, nothing.
    """
    # Apply filters and sorts, set up the figure.
    df = nullity_filter(df, filter=filter, n=n, p=p)
    df = nullity_sort(df, sort=sort, axis='rows')

    if ax is None:
        plt.figure(figsize=figsize)
        ax0 = plt.gca()
    else:
        ax0 = ax

    # Remove completely filled or completely empty variables.
    df = df.iloc[:,[i for i, n in enumerate(np.var(df.isnull(), axis='rows')) if n > 0]]

    # Create and mask the correlation matrix. Construct the base heatmap.
    corr_mat = df.isnull().corr()
    mask = np.zeros_like(corr_mat)
    mask[np.triu_indices_from(mask)] = True
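The heatmap excerpt drops columns whose nullity has zero variance. It then correlates the remaining columns' missingness with `df.isnull().corr()`, which defaults to Pearson correlation, and masks the upper triangle with `np.triu_indices_from`. Below is a rough, dependency-free sketch of that preprocessing, not missingno's code. It assumes columns are stored as a dict of lists with `None` for missing values. The hypothetical `nullity_correlation` returns the correlation of each pair of 0/1 missingness indicators that the heatmap would leave visible.

```python
from statistics import pvariance


def nullity_correlation(columns):
    """Pairwise Pearson correlation of per-column missingness indicators.

    columns: dict mapping column name -> list of values, with None meaning missing.
    Returns {(row_name, col_name): r} for the strict lower triangle only.
    """
    # 1 where a value is missing, 0 where it is present (the df.isnull() matrix).
    indicators = {name: [1 if v is None else 0 for v in values]
                  for name, values in columns.items()}
    # Drop completely filled or completely empty columns: their variance is zero,
    # so their correlation with anything is undefined.
    indicators = {name: ind for name, ind in indicators.items() if pvariance(ind) > 0}

    def pearson(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        sa = sum((x - ma) ** 2 for x in a) ** 0.5
        sb = sum((y - mb) ** 2 for y in b) ** 0.5
        return cov / (sa * sb)

    names = list(indicators)
    # np.triu_indices_from masks the upper triangle *including* the diagonal,
    # so only pairs (i, j) with i > j remain visible.
    return {(names[i], names[j]): pearson(indicators[names[i]], indicators[names[j]])
            for i in range(len(names)) for j in range(i)}


cols = {
    "a": [1, None, 3, None],
    "b": [2, None, 4, None],  # missing exactly where "a" is
    "c": [None, 5, None, 6],  # missing exactly where "a" is present
    "d": [1, 2, 3, 4],        # never missing, so it is dropped
}
print(nullity_correlation(cols))
# {('b', 'a'): 1.0, ('c', 'a'): -1.0, ('c', 'b'): -1.0}
```

A value near 1 means two columns tend to be missing together. A value near -1 means one tends to be missing when the other is present.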
From ResidentMario/missingno on GitHub, missingno/missingno.py:
)
    try:
        import geoplot as gplt
    except ImportError:
        raise ImportError("Install geoplot <= 0.2.4 (the package) for geoplot function support")

    if gplt.__version__ >= "0.3.0":
        raise ImportError(
            "The missingno geoplot function requires geoplot package version 0.2.4 or lower. "
            "To use the geoplot function, downgrade to an older version of the geoplot package."
        )

    import geopandas as gpd
    from shapely.geometry import Point

    df = nullity_filter(df, filter=filter, n=n, p=p)

    nullity = df.notnull().sum(axis='columns') / df.shape[1]
    if x and y:
        gdf = gpd.GeoDataFrame(nullity, columns=['nullity'],
                               geometry=df.apply(lambda srs: Point(srs[x], srs[y]), axis='columns'))
    else:
        raise ValueError("The 'x' and 'y' parameters must be specified.")

    if by:
        if df[by].isnull().any():
            warnings.warn('The "{0}" column included null values. The offending records were dropped'.format(by))
            df = df.dropna(subset=[by])
            gdf = gdf.loc[df.index]

        vc = df[by].value_counts()
        if (vc < 3).any():
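The geoplot excerpt guards its version with `gplt.__version__ >= "0.3.0"`, which compares strings character by character. That gives the intended answer for versions such as 0.2.4 or 0.5.1, but it would misorder a version such as "0.10.0". The `packaging` library's `Version` class is the usual fix. The stdlib-only sketch below takes a different approach: it parses the leading numeric release segment into a tuple of ints. `version_tuple` is a hypothetical helper, and it ignores pre-release and dev suffixes rather than ordering them as PEP 440 would.

```python
import re


def version_tuple(version):
    """Turn '0.2.4' into (0, 2, 4) so versions compare numerically.

    Only the leading dotted-number segment is used; suffixes such as 'rc1' or
    '.dev0' are ignored (an assumption of this sketch, not PEP 440 ordering).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


# String comparison goes character by character, so '1' < '3' decides this:
print("0.10.0" >= "0.3.0")                                # False (wrong)
print(version_tuple("0.10.0") >= version_tuple("0.3.0"))  # True
print(version_tuple("0.2.4") >= version_tuple("0.3.0"))   # False
```

With this helper, the guard would read `version_tuple(gplt.__version__) >= (0, 3, 0)`.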
From ResidentMario/missingno on GitHub, missingno/missingno.py:
return its figure.
    :return: If `inline` is False, the underlying `matplotlib.figure` object. Else, nothing.
    """
    if not figsize:
        if len(df.columns) <= 50 or orientation == 'top' or orientation == 'bottom':
            figsize = (25, 10)
        else:
            figsize = (25, (25 + len(df.columns) - 50) * 0.5)

    if ax is None:
        plt.figure(figsize=figsize)
        ax0 = plt.gca()
    else:
        ax0 = ax

    df = nullity_filter(df, filter=filter, n=n, p=p)

    # Link the hierarchical output matrix, figure out orientation, construct base dendrogram.
    x = np.transpose(df.isnull().astype(int).values)
    z = hierarchy.linkage(x, method)

    if not orientation:
        if len(df.columns) > 50:
            orientation = 'left'
        else:
            orientation = 'bottom'

    hierarchy.dendrogram(
        z,
        orientation=orientation,
        labels=df.columns.tolist(),
        distance_sort='descending',
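The dendrogram excerpt transposes the nullity matrix so that each column becomes one 0/1 vector, then passes it to `hierarchy.linkage(x, method)`. With SciPy's default `metric='euclidean'`, linkage begins from the pairwise Euclidean distances between those vectors. Columns with identical missingness therefore sit at distance 0 and are merged first. The dependency-free sketch below computes only those distances and leaves the clustering itself to SciPy. `nullity_distances` is a hypothetical helper.

```python
from itertools import combinations
from math import dist


def nullity_distances(columns):
    """Euclidean distance between every pair of per-column missingness vectors.

    columns: dict mapping column name -> list of values, with None meaning missing.
    """
    # One 0/1 vector per column: the rows of np.transpose(df.isnull().astype(int).values).
    vectors = {name: [1 if v is None else 0 for v in values]
               for name, values in columns.items()}
    return {(a, b): dist(vectors[a], vectors[b]) for a, b in combinations(vectors, 2)}


cols = {
    "a": [1, None, 3, None],
    "b": [2, None, 4, None],  # same missingness pattern as "a"
    "c": [None, 5, None, 6],  # opposite pattern
}
distances = nullity_distances(cols)
print(distances)
# {('a', 'b'): 0.0, ('a', 'c'): 2.0, ('b', 'c'): 2.0}
print(min(distances, key=distances.get))  # ('a', 'b'): the closest pair, merged first
```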
From ResidentMario/missingno on GitHub, missingno/missingno.py:
:param df: The `DataFrame` being mapped.
    :param filter: The filter to apply to the heatmap. Should be one of "top", "bottom", or None (default).
    :param n: The max number of columns to include in the filtered DataFrame.
    :param p: The max percentage fill of the columns in the filtered DataFrame.
    :param sort: The row sort order to apply. Can be "ascending", "descending", or None.
    :param figsize: The size of the figure to display.
    :param fontsize: The figure's font size. Defaults to 16.
    :param labels: Whether or not to display the column names. Defaults to the underlying data labels when there are
    50 columns or less, and no labels when there are more than 50 columns.
    :param sparkline: Whether or not to display the sparkline. Defaults to True.
    :param width_ratios: The ratio of the width of the matrix to the width of the sparkline. Defaults to `(15, 1)`.
    Does nothing if `sparkline=False`.
    :param color: The color of the filled columns. Default is `(0.25, 0.25, 0.25)`.
    :return: If `inline` is False, the underlying `matplotlib.figure` object. Else, nothing.
    """
    df = nullity_filter(df, filter=filter, n=n, p=p)
    df = nullity_sort(df, sort=sort, axis='columns')

    height = df.shape[0]
    width = df.shape[1]

    # z is the color-mask array; g is a height x width x 3 RGB array. Apply the z color-mask to set the RGB of each pixel.
    z = df.notnull().values
    g = np.zeros((height, width, 3))

    g[z < 0.5] = [1, 1, 1]
    g[z > 0.5] = color

    # Set up the matplotlib grid layout: a single subplot without a sparkline, a left-right split with one.
    if ax is None:
        plt.figure(figsize=figsize)
        if sparkline:
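The matrix excerpt paints each cell from `z = df.notnull().values`. Filled cells get `color` and missing cells get white. Because `z` is boolean, `z < 0.5` selects the missing cells and `z > 0.5` selects the filled ones. Below is a dependency-free sketch of the same rule applied to a list of rows, with `None` for missing values. `nullity_rgb` is a hypothetical name.

```python
def nullity_rgb(rows, color=(0.25, 0.25, 0.25)):
    """Return a height x width grid of RGB triples for a list of rows.

    Missing cells (None) become white, (1.0, 1.0, 1.0); filled cells get `color`,
    mirroring g[z < 0.5] = [1, 1, 1] and g[z > 0.5] = color in the excerpt.
    """
    white = (1.0, 1.0, 1.0)
    return [[white if value is None else tuple(color) for value in row] for row in rows]


grid = nullity_rgb([[1, None], [None, 2]])
print(grid[0])  # [(0.25, 0.25, 0.25), (1.0, 1.0, 1.0)]
print(grid[1])  # [(1.0, 1.0, 1.0), (0.25, 0.25, 0.25)]
```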
From ResidentMario/missingno on GitHub, missingno/missingno.py:
A bar chart visualization of the nullity of the given DataFrame.

    :param df: The input DataFrame.
    :param log: Whether or not to display a logarithmic plot. Defaults to False (linear).
    :param filter: The filter to apply to the heatmap. Should be one of "top", "bottom", or None (default).
    :param n: The cap on the number of columns to include in the filtered DataFrame.
    :param p: The cap on the percentage fill of the columns in the filtered DataFrame.
    :param sort: The column sort order to apply. Can be "ascending", "descending", or None.
    :param figsize: The size of the figure to display.
    :param fontsize: The figure's font size. Defaults to 16.
    :param labels: Whether or not to display the column names. May need to be turned off for particularly large
    displays. Defaults to True.
    :param color: The color of the filled columns. Defaults to the RGB triple `(0.25, 0.25, 0.25)`.
    :return: If `inline` is False, the underlying `matplotlib.figure` object. Else, nothing.
    """
    df = nullity_filter(df, filter=filter, n=n, p=p)
    df = nullity_sort(df, sort=sort, axis='rows')
    nullity_counts = len(df) - df.isnull().sum()

    if ax is None:
        ax1 = plt.gca()
    else:
        ax1 = ax
        figsize = None  # for behavioral consistency with other plot types, re-use the given size

    (nullity_counts / len(df)).plot.bar(
        figsize=figsize, fontsize=fontsize, log=log, color=color, ax=ax1
    )

    axes = [ax1]

    # Start appending elements, starting with a modified bottom x axis.
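The bar excerpt plots `(len(df) - df.isnull().sum()) / len(df)`, which is the fraction of non-missing values in each column. The sketch below computes the same fractions on a dict of lists and offers an optional ordering that mirrors the docstring's "ascending", "descending", and None choices. `column_fill_fractions` is hypothetical and does not reproduce `nullity_sort` itself.

```python
def column_fill_fractions(columns, sort=None):
    """Fraction of non-missing values per column, as (name, fraction) pairs.

    columns: dict mapping column name -> list of values, with None meaning missing.
    sort:    "ascending", "descending", or None (keep the original column order).
    """
    if sort not in (None, "ascending", "descending"):
        raise ValueError('sort must be "ascending", "descending", or None')
    fractions = [(name, sum(v is not None for v in values) / len(values))
                 for name, values in columns.items()]
    if sort is not None:
        fractions.sort(key=lambda pair: pair[1], reverse=(sort == "descending"))
    return fractions


cols = {"a": [1, None, 3, None], "b": [1, 2, 3, None], "c": [1, 2, 3, 4]}
print(column_fill_fractions(cols, sort="descending"))
# [('c', 1.0), ('b', 0.75), ('a', 0.5)]
```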

missingno

Missing data visualization module for Python.

MIT