enerzyme.tasks.optimizer.get_optimizer

enerzyme.tasks.optimizer.get_optimizer(name: Literal['Adam', 'AdamW', 'CoRe', 'Muon'], model: Module, hyper_params: Dict[str, Any]) → Optimizer

Get a ready-to-use optimizer for a model, given the optimizer name and its hyperparameters.

Args:

name: str

The name of the optimizer. The following optimizers are currently supported:

Adam: PyTorch implementation of Adam.
AdamW: PyTorch implementation of AdamW.
CoRe: CoRe optimizer [1]. It has been shown to be effective for lifelong learning of neural network potentials (NNPs) [2].
Muon: Muon optimizer [3] for the hidden weights, with an auxiliary AdamW optimizer for the remaining parameters. It has been shown to be effective for fast training convergence and final accuracy of NNPs [4].

model: torch.nn.Module

The model to optimize.

The Muon optimizer currently supports only the following internal models: PhysNet, SpookyNet, AlphaNet, MACE, and SchNet.

hyper_params: dict

The hyperparameters for the optimizer, depending on the optimizer name.

Adam:

lr: float, default 1e-3: Learning rate.
betas: tuple, default (0.9, 0.999): Coefficients used for computing running averages of gradient and its square.
eps: float, default 1e-6: Term added to the denominator to improve numerical stability.
weight_decay: float, default 0.0: Weight decay (L2 penalty).
amsgrad: bool, default True: Whether to use the AMSGrad variant of Adam.

AdamW:

lr: float, default 1e-3: Learning rate.
betas: tuple, default (0.9, 0.999): Coefficients used for computing running averages of gradient and its square.
eps: float, default 1e-6: Term added to the denominator to improve numerical stability.
weight_decay: float, default 0.0: Weight decay coefficient (decoupled from the gradient update, unlike the L2 penalty in Adam).
amsgrad: bool, default True: Whether to use the AMSGrad variant of AdamW.

CoRe:

The default hyperparameters are taken from its application to NNP training [2].

learning_rate: float, default 1e-3: Learning rate.
step_sizes: tuple, default (1e-6, 1.0): Minimum and maximum step sizes for the optimizer.
etas: tuple, default (0.5, 1.2): \(\eta^-\) and \(\eta^+\) in the paper [1].
betas: tuple, default (0.45, 0.725, 500, 0.999): \(\beta_1^{\mathrm{a}}\), \(\beta_1^{\mathrm{b}}\), \(\beta_1^{\mathrm{c}}\), \(\beta_2\) in the paper [1].
eps: float, default 1e-8: Term added to the denominator to improve numerical stability.
weight_decay: float, default 0.1: Weight decay (L2 penalty).
score_history: int, default 500: \(t_{\mathrm{hist}}\) in the paper [1].
frozen: float, default 0.1: Fraction of parameters used to compute \(n_{\mathrm{frozen}}\) in the paper [1].

Muon:

The usage and hyperparameters follow KellerJordan/Muon.

muon_learning_rate: float, default 1e-2: Learning rate of the Muon optimizer. If omitted while learning_rate is provided, learning_rate is used instead.
muon_weight_decay: float, default 0.01: Weight decay of the Muon optimizer. If omitted while weight_decay is provided, weight_decay is used instead.
momentum: float, default 0.95: Momentum of the Muon optimizer.
aux_learning_rate: float, default 3e-4: Learning rate of the auxiliary AdamW optimizer. If omitted while learning_rate is provided, learning_rate is used instead.
aux_weight_decay: float, default 0.0: Weight decay of the auxiliary AdamW optimizer. If omitted while weight_decay is provided, weight_decay is used instead.
betas: tuple, default (0.9, 0.95): Coefficients used by the auxiliary AdamW optimizer for computing running averages of the gradient and its square.
eps: float, default 1e-10: Term added to the denominator of the auxiliary AdamW optimizer to improve numerical stability.
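
The fallback rules for the Muon hyperparameters can be sketched in plain Python. This is only an illustration of the behaviour described above, not Enerzyme's implementation: the helper name resolve_muon_hyper_params is hypothetical, and "not provided" is assumed to mean that the key is absent from hyper_params.

```python
from typing import Any, Dict


def resolve_muon_hyper_params(hyper_params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper that mirrors the documented fallback rules."""

    def pick(specific: str, generic: str, default: float) -> float:
        # 1. Prefer the optimizer-specific key (e.g. "muon_learning_rate").
        if specific in hyper_params:
            return hyper_params[specific]
        # 2. Otherwise fall back to the generic key (e.g. "learning_rate").
        if generic in hyper_params:
            return hyper_params[generic]
        # 3. Otherwise use the documented default.
        return default

    return {
        "muon_learning_rate": pick("muon_learning_rate", "learning_rate", 1e-2),
        "muon_weight_decay": pick("muon_weight_decay", "weight_decay", 0.01),
        "momentum": hyper_params.get("momentum", 0.95),
        "aux_learning_rate": pick("aux_learning_rate", "learning_rate", 3e-4),
        "aux_weight_decay": pick("aux_weight_decay", "weight_decay", 0.0),
        "betas": hyper_params.get("betas", (0.9, 0.95)),
        "eps": hyper_params.get("eps", 1e-10),
    }


# A shared learning_rate feeds both optimizers; aux_weight_decay is set explicitly.
resolved = resolve_muon_hyper_params({"learning_rate": 5e-3, "aux_weight_decay": 0.1})
```

Under these assumptions, both muon_learning_rate and aux_learning_rate resolve to the shared learning_rate, while muon_weight_decay keeps its documented default because neither muon_weight_decay nor weight_decay was given.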

Returns:

optimizer: torch.optim.Optimizer: The optimizer for the model.

Raises:

KeyError: If the optimizer name is not supported.
TypeError: If the model is not supported by the optimizer.
ImportError: If the optimizer is not part of PyTorch and its dependency is not installed.
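
A minimal usage sketch, assuming Enerzyme and PyTorch are installed. The toy torch.nn.Linear model and the helper name build_optimizer are illustrative and not part of Enerzyme; the sketch also assumes that Adam accepts any torch.nn.Module. The hyperparameter keys and values are the documented Adam defaults, and the exception handling mirrors the list above.

```python
from typing import Any, Dict

# Adam hyperparameters, spelled out with the documented defaults.
ADAM_HYPER_PARAMS: Dict[str, Any] = {
    "lr": 1e-3,
    "betas": (0.9, 0.999),
    "eps": 1e-6,
    "weight_decay": 0.0,
    "amsgrad": True,
}


def build_optimizer(name: str, hyper_params: Dict[str, Any]):
    """Hypothetical helper: build an optimizer for a toy model."""
    # Imported lazily so this module loads even without the heavy dependencies.
    import torch
    from enerzyme.tasks.optimizer import get_optimizer

    model = torch.nn.Linear(4, 1)  # stand-in model, for illustration only
    try:
        return get_optimizer(name, model, hyper_params)
    except (KeyError, TypeError, ImportError) as err:
        # KeyError: unsupported optimizer name.
        # TypeError: model not supported by the optimizer (e.g. Muon).
        # ImportError: optional dependency (CoRe or Muon) not installed.
        raise RuntimeError(f"could not build optimizer {name!r}") from err
```

Calling build_optimizer("Adam", ADAM_HYPER_PARAMS) should then return a torch.optim.Optimizer that can be used in a standard training loop.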

Tip

To install the dependencies:

CoRe: pip install core-optimizer
Muon: pip install muon-optimizer