CheckpointHook¶
- class mmengine.hooks.CheckpointHook(interval=-1, by_epoch=True, save_optimizer=True, save_param_scheduler=True, out_dir=None, max_keep_ckpts=-1, save_last=True, save_best=None, rule=None, greater_keys=None, less_keys=None, file_client_args=None, filename_tmpl=None, backend_args=None, published_keys=None, save_begin=0, **kwargs)[source]¶
Save checkpoints periodically.
- Parameters:
interval (int) – The saving period. If by_epoch=True, interval indicates epochs, otherwise it indicates iterations. Defaults to -1, which means “never”.
by_epoch (bool) – Saving checkpoints by epoch or by iteration. Defaults to True.
save_optimizer (bool) – Whether to save optimizer state_dict in the checkpoint. It is usually used for resuming experiments. Defaults to True.
save_param_scheduler (bool) – Whether to save param_scheduler state_dict in the checkpoint. It is usually used for resuming experiments. Defaults to True.
out_dir (str, Path, optional) – The root directory to save checkpoints. If not specified, runner.work_dir will be used by default. If specified, out_dir will be the concatenation of out_dir and the last level directory of runner.work_dir. For example, if the input out_dir is ./tmp and runner.work_dir is ./work_dir/cur_exp, then the checkpoint will be saved in ./tmp/cur_exp. Defaults to None.
max_keep_ckpts (int) – The maximum number of checkpoints to keep. In some cases we want only the latest few checkpoints and would like to delete old ones to save disk space. Defaults to -1, which means unlimited.
save_last (bool) – Whether to force the last checkpoint to be saved regardless of interval. Defaults to True.
save_best (str, List[str], optional) – If a metric is specified, it would measure the best checkpoint during evaluation. If a list of metrics is passed, it would measure a group of best checkpoints corresponding to the passed metrics. The information about best checkpoint(s) would be saved in runner.message_hub to keep the best score value and best checkpoint path, which will also be loaded when resuming a checkpoint. Options are the evaluation metrics on the test dataset, e.g., bbox_mAP, segm_mAP for bbox detection and instance segmentation, AR@100 for proposal recall. If save_best is auto, the first key of the returned OrderedDict result will be used. Defaults to None.
rule (str, List[str], optional) – Comparison rule for best score. If set to None, it will infer a reasonable rule. Keys such as ‘acc’, ‘top’, etc. will be inferred by the ‘greater’ rule. Keys containing ‘loss’ will be inferred by the ‘less’ rule. If save_best is a list of metrics and rule is a str, all metrics in save_best will share the comparison rule. If save_best and rule are both lists, their lengths must be the same, and metrics in save_best will use the corresponding comparison rule in rule. Options are ‘greater’, ‘less’, None, and a list containing ‘greater’ and ‘less’. Defaults to None.
greater_keys (List[str], optional) – Metric keys that will be inferred by the ‘greater’ comparison rule. If None, _default_greater_keys will be used. Defaults to None.
less_keys (List[str], optional) – Metric keys that will be inferred by the ‘less’ comparison rule. If None, _default_less_keys will be used. Defaults to None.
file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to None. It will be deprecated in the future; please use backend_args instead.
filename_tmpl (str, optional) – String template to indicate the checkpoint name. If specified, it must contain one and only one “{}”, which will be replaced with epoch + 1 if by_epoch=True else iteration + 1. Defaults to None, which means “epoch_{}.pth” or “iter_{}.pth” accordingly.
backend_args (dict, optional) – Arguments to instantiate the prefix of the URI's corresponding backend. Defaults to None. New in version 0.2.0.
published_keys (str, List[str], optional) – If save_last is True or save_best is not None, it will automatically publish the model with the keys in the list after training. Defaults to None. New in version 0.7.1.
save_begin (int) – Control the epoch number or iteration number at which checkpoint saving begins. Defaults to 0, which means saving from the beginning. New in version 0.8.3.
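The out_dir concatenation described above can be sketched with plain path operations. This is a simplified illustration of the documented behavior, not mmengine's actual implementation; resolve_ckpt_dir is a hypothetical helper name.

```python
import os.path as osp

def resolve_ckpt_dir(out_dir, work_dir):
    """Sketch of how ``out_dir`` combines with ``runner.work_dir``.

    If ``out_dir`` is None, checkpoints go straight into ``work_dir``;
    otherwise they go into ``out_dir`` joined with the last level
    directory of ``work_dir`` (illustrative only).
    """
    if out_dir is None:
        return work_dir
    basename = osp.basename(osp.normpath(work_dir))
    return osp.join(out_dir, basename)

print(resolve_ckpt_dir('./tmp', './work_dir/cur_exp'))  # → './tmp/cur_exp' on POSIX
```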
Examples
>>> # Save best based on single metric
>>> CheckpointHook(interval=2, by_epoch=True, save_best='acc',
>>>                rule='less')
>>> # Save best based on multi metrics with the same comparison rule
>>> CheckpointHook(interval=2, by_epoch=True,
>>>                save_best=['acc', 'mIoU'], rule='greater')
>>> # Save best based on multi metrics with different comparison rule
>>> CheckpointHook(interval=2, by_epoch=True,
>>>                save_best=['FID', 'IS'], rule=['less', 'greater'])
>>> # Save best based on single metric and publish model after training
>>> CheckpointHook(interval=2, by_epoch=True, save_best='acc',
>>>                rule='less', published_keys=['meta', 'state_dict'])
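The rule inference for save_best can be pictured as a keyword match. This is a hedged sketch: the GREATER_KEYS and LESS_KEYS lists here are invented for illustration and are much shorter than mmengine's real _default_greater_keys / _default_less_keys.

```python
# Illustrative defaults only; mmengine's real key lists contain more entries.
GREATER_KEYS = ['acc', 'top', 'mAP', 'AR', 'precision', 'recall']
LESS_KEYS = ['loss']

def infer_rule(metric):
    """Guess the comparison rule for a metric name, as the docs describe:
    keys such as 'acc'/'top' use 'greater'; keys containing 'loss' use 'less'.
    """
    name = metric.lower()
    if any(key.lower() in name for key in GREATER_KEYS):
        return 'greater'
    if any(key.lower() in name for key in LESS_KEYS):
        return 'less'
    raise ValueError(f'Cannot infer a comparison rule for {metric!r}')

print(infer_rule('bbox_mAP'))  # greater
print(infer_rule('val_loss'))  # less
```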
- after_train(runner)[source]¶
Publish the checkpoint after training.
- Parameters:
runner (Runner) – The runner of the training process.
- Return type:
None
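Publishing after training can be pictured as filtering the saved checkpoint dict down to the keys listed in published_keys. This is an illustrative sketch only; details of the real hook, such as the published file's naming, are omitted, and publish_checkpoint is a hypothetical helper.

```python
def publish_checkpoint(checkpoint, published_keys):
    """Keep only the requested keys from a checkpoint dict, mirroring the
    idea behind ``published_keys`` (illustrative only)."""
    missing = [k for k in published_keys if k not in checkpoint]
    if missing:
        raise KeyError(f'Keys {missing} not found in checkpoint')
    return {k: checkpoint[k] for k in published_keys}

# Toy checkpoint contents for demonstration.
ckpt = {'meta': {'epoch': 12}, 'state_dict': {'w': 0.5}, 'optimizer': {'lr': 0.01}}
published = publish_checkpoint(ckpt, ['meta', 'state_dict'])
print(sorted(published))  # ['meta', 'state_dict']
```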
- after_train_epoch(runner)[source]¶
Save the checkpoint and synchronize buffers after each epoch.
- Parameters:
runner (Runner) – The runner of the training process.
- Return type:
None
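When max_keep_ckpts is positive, older checkpoints are deleted as new ones are saved so that only the latest few remain. A minimal sketch of that bookkeeping (an assumed helper, not mmengine's code, which deletes actual files):

```python
from collections import deque

def prune_old_ckpts(saved, new_ckpt, max_keep_ckpts):
    """Record a newly saved checkpoint name and return the names that
    should be deleted so at most ``max_keep_ckpts`` remain (illustrative)."""
    saved.append(new_ckpt)
    removed = []
    if max_keep_ckpts > 0:
        while len(saved) > max_keep_ckpts:
            removed.append(saved.popleft())
    return removed

history = deque()
for epoch in range(5):
    prune_old_ckpts(history, f'epoch_{epoch + 1}.pth', max_keep_ckpts=3)
print(list(history))  # ['epoch_3.pth', 'epoch_4.pth', 'epoch_5.pth']
```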
- after_train_iter(runner, batch_idx, data_batch=None, outputs=None)[source]¶
Save the checkpoint and synchronize buffers after each iteration.
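How interval and save_begin interact when saving by iteration can be sketched as a step check. This is one plausible reading of the documented parameters, not mmengine's exact code; should_save is a hypothetical helper.

```python
def should_save(step, interval, save_begin=0):
    """One plausible reading of interval/save_begin: no saving before
    ``save_begin``; afterwards, save every ``interval`` completed steps.
    Illustrative only, not mmengine's implementation.
    """
    if interval <= 0:        # interval=-1 means "never"
        return False
    completed = step + 1     # the hook runs after the step, hence +1
    if completed < save_begin:
        return False
    return (completed - save_begin) % interval == 0

print(should_save(1, interval=2))  # True: second iteration completed
print(should_save(0, interval=2))  # False
```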