How to use the hdbscan.all_points_membership_vectors function in hdbscan

To help you get started, we’ve selected a few hdbscan examples, based on popular ways it is used in public projects.


Source: nestauk/nesta, nesta/core/tasks/projects/ai_diversity/doc2cluster.py (GitHub)
            # Fetch all document embeddings
            papers = s.query(ArticleVector.article_id, ArticleVector.vector)

            # Unzip the query results into paper IDs and embedding vectors
            self.ids, self.embeddings = zip(*papers)

            # Fit HDBSCAN
            clusterer = hdbscan.HDBSCAN(
                min_cluster_size=self.min_cluster_size,
                min_samples=self.min_samples,
                prediction_data=True,  # needed by the soft-clustering calls below
            ).fit(self.embeddings)

            # Assign soft clusters to embeddings
            self.soft_clusters = hdbscan.all_points_membership_vectors(clusterer)

            # Store clusterer in S3
            store_on_s3(clusterer, self.s3_bucket, self.clusterer_name)
        else:
            logging.info("Loading fitted HDBSCAN from S3.")
            # Load clusterer from S3
            clusterer = load_from_s3(self.s3_bucket, self.clusterer_name)

            # Predict soft labels
            self.soft_clusters = hdbscan.prediction.membership_vector(
                clusterer, np.array(self.embeddings)
            )

        # Group arXiv paper IDs with clusters
        id_clusters_mapping = self._create_mappings(
            self.ids, self.soft_clusters, "clusters"

hdbscan

Clustering based on density with variable density clusters

BSD-3-Clause