enerzyme.tasks.optimizer.get_optimizer
- enerzyme.tasks.optimizer.get_optimizer(name: Literal['Adam', 'AdamW', 'CoRe', 'Muon'], model: Module, hyper_params: Dict[str, Any]) -> Optimizer
Get a ready-to-use optimizer for a model, given the optimizer name and its hyperparameters.
Args:
- name: str
The name of the optimizer. The following optimizers are currently supported:
- Adam:
PyTorch implementation of Adam.
- AdamW:
PyTorch implementation of AdamW.
- CoRe:
CoRe optimizer [1]. It has been shown to be effective for lifelong learning of NNPs [2].
- Muon:
Muon optimizer [3] for hidden weights, with an auxiliary AdamW optimizer for the remaining parameters. It has been shown to speed up training convergence and improve the final accuracy of NNPs [4].
- model: torch.nn.Module
The model to optimize.
Currently, the Muon optimizer supports only the following internal models: PhysNet, SpookyNet, LEFTNet, MACE, and SchNet.
- hyper_params: dict
The hyperparameters for the optimizer, depending on the optimizer name.
- Adam:
- lr: float, default 1e-3
Learning rate.
- betas: tuple, default (0.9, 0.999)
Coefficients used for computing running averages of gradient and its square.
- eps: float, default 1e-6
Term added to the denominator to improve numerical stability.
- weight_decay: float, default 0.0
Weight decay (L2 penalty).
- amsgrad: bool, default True
Whether to use the AMSGrad variant of Adam.
- AdamW:
- lr: float, default 1e-3
Learning rate.
- betas: tuple, default (0.9, 0.999)
Coefficients used for computing running averages of gradient and its square.
- eps: float, default 1e-6
Term added to the denominator to improve numerical stability.
- weight_decay: float, default 0.0
Weight decay coefficient (decoupled from the gradient-based update, unlike an L2 penalty).
- amsgrad: bool, default True
Whether to use the AMSGrad variant of AdamW.
- CoRe:
The default hyperparameters are taken from its application to NNP training [2].
- learning_rate: float, default 1e-3
Learning rate.
- step_sizes: tuple, default (1e-6, 1.0)
Minimum and maximum step sizes allowed for the optimizer.
- etas: tuple, default (0.5, 1.2)
\(\eta^-\) and \(\eta^+\) in the paper [1].
- betas: tuple, default (0.45, 0.725, 500, 0.999)
\(\beta_1^{\mathrm{a}}\), \(\beta_1^{\mathrm{b}}\), \(\beta_1^{\mathrm{c}}\), \(\beta_2\) in the paper [1].
- eps: float, default 1e-8
Term added to the denominator to improve numerical stability.
- weight_decay: float, default 0.1
Weight decay (L2 penalty).
- score_history: int, default 500
\(t_{\mathrm{hist}}\) in the paper [1].
- frozen: float, default 0.1
Fraction of parameters used to compute \(n_{\mathrm{frozen}}\) in the paper [1].
- Muon:
The usage and hyperparameters follow KellerJordan/Muon.
- muon_learning_rate: float, default 1e-2
Learning rate of the Muon optimizer. If it is not provided but learning_rate is, learning_rate is used instead.
- muon_weight_decay: float, default 0.01
Weight decay of the Muon optimizer. If it is not provided but weight_decay is, weight_decay is used instead.
- momentum: float, default 0.95
Momentum of the Muon optimizer.
- aux_learning_rate: float, default 3e-4
Learning rate of the auxiliary AdamW optimizer. If it is not provided but learning_rate is, learning_rate is used instead.
- aux_weight_decay: float, default 0.0
Weight decay of the auxiliary AdamW optimizer. If it is not provided but weight_decay is, weight_decay is used instead.
- betas: tuple, default (0.9, 0.95)
Coefficients of the auxiliary AdamW optimizer used for computing running averages of gradient and its square.
- eps: float, default 1e-10
Term added to the denominator to improve numerical stability of the auxiliary AdamW optimizer.
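The fallback rules described for the Muon hyperparameters can be sketched in plain Python. This is only one reading of the descriptions above, not the actual enerzyme implementation: the helper name `resolve_muon_hyper_params` and the two lookup tables are hypothetical, and the defaults are copied from the list above.

```python
from typing import Any, Dict

# Defaults as listed in the Muon hyperparameter list above.
MUON_DEFAULTS: Dict[str, Any] = {
    "muon_learning_rate": 1e-2,
    "muon_weight_decay": 0.01,
    "momentum": 0.95,
    "aux_learning_rate": 3e-4,
    "aux_weight_decay": 0.0,
    "betas": (0.9, 0.95),
    "eps": 1e-10,
}

# Specific key -> generic key used as a fallback when the specific key is absent.
FALLBACKS: Dict[str, str] = {
    "muon_learning_rate": "learning_rate",
    "aux_learning_rate": "learning_rate",
    "muon_weight_decay": "weight_decay",
    "aux_weight_decay": "weight_decay",
}


def resolve_muon_hyper_params(hyper_params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: specific key first, then generic key, then default."""
    resolved: Dict[str, Any] = {}
    for key, default in MUON_DEFAULTS.items():
        generic = FALLBACKS.get(key)
        if key in hyper_params:
            resolved[key] = hyper_params[key]
        elif generic is not None and generic in hyper_params:
            resolved[key] = hyper_params[generic]
        else:
            resolved[key] = default
    return resolved


resolved = resolve_muon_hyper_params(
    {"learning_rate": 1e-3, "muon_learning_rate": 5e-3}
)
```

Under this reading, `muon_learning_rate` keeps its explicit value, `aux_learning_rate` inherits the generic `learning_rate`, and both weight decays stay at their documented defaults because neither `weight_decay` nor a specific key was given.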
Returns:
- optimizer: torch.optim.Optimizer
The optimizer for the model.
Raises:
- KeyError:
If the optimizer string is not supported.
- TypeError:
If the model is not supported by the optimizer.
- ImportError:
If the optimizer is not part of PyTorch and its dependency is not installed.
Tip
To install the dependencies:
- CoRe:
pip install core-optimizer
- Muon:
pip install muon-optimizer
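A minimal usage sketch, assuming enerzyme is installed and importable. The call signature and the three exception types come from the documentation above; the wrapper `build_optimizer`, its `factory` argument (a seam for trying the sketch without enerzyme), and the translated error messages are hypothetical. The Adam dictionary simply repeats the documented defaults.

```python
from typing import Any, Callable, Dict, Optional

# Adam defaults copied from the hyperparameter list above.
# Note that Adam/AdamW use the key "lr", while CoRe uses "learning_rate".
ADAM_HYPER_PARAMS: Dict[str, Any] = {
    "lr": 1e-3,
    "betas": (0.9, 0.999),
    "eps": 1e-6,
    "weight_decay": 0.0,
    "amsgrad": True,
}


def build_optimizer(
    name: str,
    model: Any,
    hyper_params: Dict[str, Any],
    factory: Optional[Callable[..., Any]] = None,
) -> Any:
    """Hypothetical wrapper that turns the documented errors into messages."""
    if factory is None:
        # Imported lazily so the sketch can be read without enerzyme installed.
        from enerzyme.tasks.optimizer import get_optimizer as factory
    try:
        return factory(name, model, hyper_params)
    except KeyError as err:
        # Documented: raised when the optimizer name is not supported.
        raise ValueError(f"optimizer {name!r} is not supported") from err
    except TypeError as err:
        # Documented: raised when the model is not supported by the optimizer.
        raise ValueError(
            f"model {type(model).__name__} is not supported by {name!r}"
        ) from err
    except ImportError as err:
        # Documented: raised when an optional dependency is missing.
        raise RuntimeError(
            f"{name!r} needs an extra dependency; see the install tip"
        ) from err
```

With a `torch.nn.Module` instance `model`, the call would then be `build_optimizer("Adam", model, ADAM_HYPER_PARAMS)`. A `TypeError` can also have unrelated causes, so the translated message is a hint rather than a diagnosis.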