How to use the torchelastic.metrics.initialize_metrics function in torchelastic

To help you get started, we’ve selected a few torchelastic examples based on popular ways the function is used in public projects.


Example from pytorch/elastic on GitHub, file torchelastic/p2p/coordinator_p2p.py:
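import torch.distributed as dist
from torchelastic import metrics
from torchelastic.rendezvous import RendezvousHandler

# Excerpt from CoordinatorP2P.__init__; the rest of the constructor is omitted.
self.c10d_backend = c10d_backend
self.init_method = init_method
# Resolve the init_method URL to the rendezvous handler registered for its scheme.
self.rendezvous = dist.rendezvous(init_method)
assert isinstance(
    self.rendezvous, RendezvousHandler
), "CoordinatorP2P requires a torchelastic.rendezvous.RendezvousHandler"

# Training-state bookkeeping; rank and world_size hold placeholder values
# until a rendezvous assigns real ones.
self.max_num_trainers = max_num_trainers
self.process_group_timeout = process_group_timeout
self.rank = -1
self.world_size = 0
self.is_worker_straggler = False
self.stop_training = False
self.coordinator_process_group = None
self.monitor_progress_step = 0
# Initialize torchelastic's metrics subsystem once, while the coordinator is constructed.
metrics.initialize_metrics()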