enerzyme.tasks.optimizer.get_optimizer

enerzyme.tasks.optimizer.get_optimizer(name: Literal['Adam', 'AdamW', 'CoRe', 'Muon'], model: Module, hyper_params: Dict[str, Any]) → Optimizer

Get a ready-to-use optimizer for a model, given the optimizer name and its hyperparameters.

Args:

name: str

The name of the optimizer. The following optimizers are currently supported:

Adam:

PyTorch implementation of Adam.

AdamW:

PyTorch implementation of AdamW.

CoRe:

CoRe optimizer [1]. It has been shown to be effective for lifelong learning of NNPs [2].

Muon:

Muon optimizer [3] for the hidden weights, with an auxiliary AdamW optimizer for the remaining parameters. It has been shown to improve both training convergence speed and final accuracy of NNPs [4].

model: torch.nn.Module

The model to optimize.

The Muon optimizer currently supports only the following internal models: PhysNet, SpookyNet, LEFTNet, MACE, and SchNet.

hyper_params: dict

The hyperparameters for the optimizer, depending on the optimizer name.

Adam:

lr: float, default 1e-3

Learning rate.

betas: tuple, default (0.9, 0.999)

Coefficients used for computing running averages of gradient and its square.

eps: float, default 1e-6

Term added to the denominator to improve numerical stability.

weight_decay: float, default 0.0

Weight decay (L2 penalty).

amsgrad: bool, default True

Whether to use the AMSGrad variant of Adam.

AdamW:

lr: float, default 1e-3

Learning rate.

betas: tuple, default (0.9, 0.999)

Coefficients used for computing running averages of gradient and its square.

eps: float, default 1e-6

Term added to the denominator to improve numerical stability.

weight_decay: float, default 0.0

Weight decay coefficient. AdamW applies it as decoupled weight decay rather than as an L2 penalty added to the gradient.

amsgrad: bool, default True

Whether to use the AMSGrad variant of AdamW.

CoRe:

The default hyperparameters are taken from its application to NNP training [2].

learning_rate: float, default 1e-3

Learning rate.

step_sizes: tuple, default (1e-6, 1.0)

Minimum and maximum step sizes allowed for the optimizer.

etas: tuple, default (0.5, 1.2)

\(\eta^-\) and \(\eta^+\) in the paper [1].

betas: tuple, default (0.45, 0.725, 500, 0.999)

\(\beta_1^{\mathrm{a}}\), \(\beta_1^{\mathrm{b}}\), \(\beta_1^{\mathrm{c}}\), \(\beta_2\) in the paper [1].

eps: float, default 1e-8

Term added to the denominator to improve numerical stability.

weight_decay: float, default 0.1

Weight decay (L2 penalty).

score_history: int, default 500

\(t_{\mathrm{hist}}\) in the paper [1].

frozen: float, default 0.1

Fraction of the parameters used to compute \(n_{\mathrm{frozen}}\) in the paper [1].

Muon:

The usage and hyperparameters follow KellerJordan/Muon.

muon_learning_rate: float, default 1e-2

Learning rate of the Muon optimizer. If it is omitted and learning_rate is provided, learning_rate is used instead.

muon_weight_decay: float, default 0.01

Weight decay of the Muon optimizer. If it is omitted and weight_decay is provided, weight_decay is used instead.

momentum: float, default 0.95

Momentum of the Muon optimizer.

aux_learning_rate: float, default 3e-4

Learning rate of the auxiliary AdamW optimizer. If it is omitted and learning_rate is provided, learning_rate is used instead.

aux_weight_decay: float, default 0.0

Weight decay of the auxiliary AdamW optimizer. If it is omitted and weight_decay is provided, weight_decay is used instead.

betas: tuple, default (0.9, 0.95)

Coefficients of the auxiliary AdamW optimizer used for computing running averages of gradient and its square.

eps: float, default 1e-10

Term added to the denominator to improve numerical stability of the auxiliary AdamW optimizer.
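The fallback rules described for the Muon hyperparameters can be sketched in plain Python. This is only an illustration of the documented behaviour (specific key first, then the shared learning_rate or weight_decay, then the default); the helper name resolve_muon_hyper_params and the FALLBACKS table are hypothetical and are not part of Enerzyme.

```python
from typing import Any, Dict

# Defaults copied from the Muon hyperparameter list above.
MUON_DEFAULTS: Dict[str, Any] = {
    "muon_learning_rate": 1e-2,
    "muon_weight_decay": 0.01,
    "momentum": 0.95,
    "aux_learning_rate": 3e-4,
    "aux_weight_decay": 0.0,
    "betas": (0.9, 0.95),
    "eps": 1e-10,
}

# Hypothetical table: specific key -> shared key it falls back to.
FALLBACKS: Dict[str, str] = {
    "muon_learning_rate": "learning_rate",
    "aux_learning_rate": "learning_rate",
    "muon_weight_decay": "weight_decay",
    "aux_weight_decay": "weight_decay",
}


def resolve_muon_hyper_params(hyper_params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: specific key, then shared key, then default."""
    resolved: Dict[str, Any] = {}
    for key, default in MUON_DEFAULTS.items():
        shared = FALLBACKS.get(key)
        if key in hyper_params:
            resolved[key] = hyper_params[key]
        elif shared is not None and shared in hyper_params:
            resolved[key] = hyper_params[shared]
        else:
            resolved[key] = default
    return resolved


# Only the Muon learning rate is set explicitly; the auxiliary AdamW
# learning rate falls back to the shared learning_rate.
resolved = resolve_muon_hyper_params(
    {"learning_rate": 5e-4, "muon_learning_rate": 2e-2}
)
```

Under these assumptions, a single learning_rate or weight_decay entry configures both optimizers at once, and the muon_ and aux_ keys are needed only when the two should differ.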

Returns:

optimizer: torch.optim.Optimizer

The optimizer for the model.

Raises:

KeyError:

If the optimizer name is not supported.

TypeError:

If the model is not supported by the optimizer.

ImportError:

If the optimizer is not part of PyTorch and its dependency is not installed.
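A caller may want to turn the three documented exceptions into readable messages. The sketch below shows one possible pattern; try_build and fake_get_optimizer are hypothetical names, and fake_get_optimizer is only a stand-in so the example runs without Enerzyme installed. In real code the factory argument would be get_optimizer itself.

```python
from typing import Any, Callable, Dict, Optional, Tuple

SUPPORTED = ("Adam", "AdamW", "CoRe", "Muon")


def try_build(
    factory: Callable[[str, Any, Dict[str, Any]], Any],
    name: str,
    model: Any,
    hyper_params: Dict[str, Any],
) -> Tuple[Optional[Any], Optional[str]]:
    """Call the factory and map the documented exceptions to a message."""
    try:
        return factory(name, model, hyper_params), None
    except KeyError:
        return None, f"unsupported optimizer name: {name!r}"
    except TypeError:
        return None, f"model {type(model).__name__} is not supported by {name}"
    except ImportError as exc:
        return None, f"missing dependency for {name}: {exc}"


def fake_get_optimizer(name: str, model: Any, hyper_params: Dict[str, Any]) -> Any:
    """Stand-in for get_optimizer; mimics only the KeyError case."""
    if name not in SUPPORTED:
        raise KeyError(name)
    return object()


optimizer, error = try_build(fake_get_optimizer, "SGD", None, {})
ok_optimizer, ok_error = try_build(fake_get_optimizer, "Adam", None, {})
```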

Tip

To install the dependencies:

CoRe:

pip install core-optimizer

Muon:

pip install muon-optimizer
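Before requesting CoRe or Muon, it can be useful to check whether the optional dependencies are importable. The sketch below uses only the standard library. The import names core_optimizer and muon are assumptions about what the packages above install and should be verified against the installed distributions; available_modules is a hypothetical helper.

```python
import importlib.util
from typing import Dict, Iterable


def available_modules(module_names: Iterable[str]) -> Dict[str, bool]:
    """Report which top-level modules can be found without importing them."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


# Assumed import names for the optional optimizer packages.
status = available_modules(["core_optimizer", "muon"])
for module_name, found in status.items():
    print(f"{module_name}: {'installed' if found else 'missing'}")
```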