scimba_torch.optimizers.optimizers_data

A module to handle several optimizers.

Examples: Optimizer usage

import copy
import math

import torch

from scimba_torch.optimizers.optimizers_data import OptimizerData
from scimba_torch.optimizers.scimba_optimizers import ScimbaMomentum

opt_1 = {
    "name": "adam",
    "optimizer_args": {"lr": 1e-3, "betas": (0.9, 0.999)},
}

opt_2 = {"class": ScimbaMomentum, "switch_at_epoch": 500}

opt_3 = {
    "name": "lbfgs",
    "switch_at_epoch_ratio": 0.7,
    "switch_at_plateau": [500, 20],
    "switch_at_plateau_ratio": 3.0,
}

optimizers = OptimizerData(opt_1, opt_2, opt_3)

print("optimizers: ", optimizers)

class SimpleNN(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(10, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc1(x)

net = SimpleNN()

opt_1 = {"name": "adam", "optimizer_args": {"lr": 1e-3, "betas": (0.9, 0.999)}}
opt_2 = {
    "name": "adam",
    "optimizer_args": {"lrTEST": 1e-3, "betasTEST": (0.9, 0.999)},
}  # invalid argument names: Adam expects "lr" and "betas"

try:
    optimizers2 = OptimizerData(opt_1, opt_2)
    optimizers2.activate_first_optimizer(list(net.parameters()))
except TypeError as error:
    print(error)

input_tensor = torch.randn(10000, 10)  # 10000 samples, each of size 10
target_tensor = torch.sum(input_tensor, dim=1)[
    :, None
]  # Target tensor with batch size 10000

loss = [torch.tensor(float("+inf"))]
loss_func = torch.nn.MSELoss()

opt = optimizers
opt.activate_first_optimizer(list(net.parameters()))

def closure():
    opt.zero_grad()
    output_tensor = net(input_tensor)
    loss[0] = loss_func(output_tensor, target_tensor)
    loss[0].backward(retain_graph=True)
    return loss[0].item()

init_loss = closure()

grads = opt.get_opt_gradients()
print("get_opt_gradients: ", grads)

loss_history = [init_loss]
best_loss = init_loss
best_net = copy.deepcopy(net.state_dict())

epochs = 1000
for epoch in range(epochs):
    opt.step(closure)

    if math.isinf(loss[0].item()) or math.isnan(loss[0].item()):
        loss[0] = torch.tensor(best_loss)
        net.load_state_dict(best_net)

    if loss[0].item() < best_loss:
        best_loss = loss[0].item()
        best_net = copy.deepcopy(net.state_dict())
        opt.update_best_optimizer()

    loss_history.append(loss[0].item())

    if epoch % 100 == 0:
        print("epoch: ", epoch, "loss: ", loss[0].item())

    if opt.test_activate_next_optimizer(
        loss_history, loss[0].item(), init_loss, epoch, epochs
    ):
        print("activate next opt! epoch = ", epoch)
        opt.activate_next_optimizer(list(net.parameters()))

    opt.test_and_activate_next_optimizer(
        list(net.parameters()),
        loss_history,
        loss[0].item(),
        init_loss,
        epoch,
        epochs,
    )

net.load_state_dict(best_net)
closure()
print("loss after training: ", loss[0].item())
print("net( torch.ones( 10 ) ) : ", net(torch.ones(10)))

grads = opt.get_opt_gradients()
print("get_opt_gradients: ", grads)

Classes

OptimizerData(*args, **kwargs)

A class to manage multiple optimizers and their activation criteria.

class OptimizerData(*args, **kwargs)[source]

Bases: object

A class to manage multiple optimizers and their activation criteria.

Parameters:
  • *args (dict[str, Any]) –

    Variable length argument list of optimizer configurations.

    The input dictionary must have one of the following forms:

    { "class": value (a subclass of AbstractScimbaOptimizer), keys: values }

    { "name": value (either "adam" or "lbfgs"), keys: values }

    where the key: value pairs can be:

    "optimizer_args": a dictionary of arguments for the optimizer,

    "scheduler": a subclass of torch.optim.lr_scheduler.LRScheduler,

    "scheduler_args": a dictionary of arguments for the scheduler,

    "switch_at_epoch": a bool or an int, default False; if True, the default value 5000 is used,

    "switch_at_epoch_ratio": a bool or a float, default 0.7; if True, the default value is used,

    "switch_at_plateau": a bool or a tuple of two ints, default False; if True, the default (50, 10) is used,

    "switch_at_plateau_ratio": a float r, default 500.0; the plateau tests are triggered only when current_loss < init_loss / r

  • **kwargs – Arbitrary keyword arguments.

Examples

>>> from scimba_torch.optimizers.scimba_optimizers import ScimbaMomentum
>>> opt_1 = {
...     "name": "adam",
...     "optimizer_args": {"lr": 1e-3, "betas": (0.9, 0.999)},
... }
>>> opt_2 = {"class": ScimbaMomentum, "switch_at_epoch": 500}
>>> opt_3 = {
...     "name": "lbfgs",
...     "switch_at_epoch_ratio": 0.7,
...     "switch_at_plateau": [500, 20],
...     "switch_at_plateau_ratio": 3.0,
... }
>>> optimizers = OptimizerData(opt_1, opt_2, opt_3)
activated_optimizer: list[AbstractScimbaOptimizer]

A list containing the current optimizer; empty if none.

optimizers: list[dict[str, Any]]

List of optimizer configuration dictionaries.

next_optimizer: int

Index of the next optimizer to be activated.

step(closure)[source]

Performs an optimization step using the currently activated optimizer.

Parameters:

closure (Callable[[], float]) – A closure that reevaluates the model and returns the loss.

Return type:

None

set_lr(lr)[source]

Set learning rate of activated optimizer.

Parameters:

lr (float) – The new learning rate.

Return type:

None
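The method presumably forwards the new learning rate to every parameter group of the underlying torch optimizer. A minimal sketch of that idea, using a hypothetical stand-in object rather than a real torch.optim.Optimizer:

```python
class MockTorchOptimizer:
    """Hypothetical stand-in for a torch.optim.Optimizer."""

    def __init__(self, lr):
        # A torch optimizer may hold several parameter groups,
        # each with its own "lr" entry.
        self.param_groups = [{"lr": lr}, {"lr": lr}]


def set_lr(optimizer, lr):
    # Apply the new learning rate to every parameter group.
    for group in optimizer.param_groups:
        group["lr"] = lr


opt = MockTorchOptimizer(lr=1e-3)
set_lr(opt, 1e-4)
```

This mirrors the standard torch pattern of iterating over `optimizer.param_groups` to adjust the learning rate in place.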

zero_grad()[source]

Zeros the gradients of the currently activated optimizer.

Return type:

None

test_activate_next_optimizer(loss_history, loss_value, init_loss, epoch, epochs)[source]

Tests whether the next optimizer should be activated based on the configured criteria.

Parameters:
  • loss_history (list[float]) – History of loss values.

  • loss_value (float) – Current loss value.

  • init_loss (float) – Initial loss value.

  • epoch (int) – Current epoch.

  • epochs (int) – Total number of epochs.

Return type:

bool

Returns:

True if the next optimizer should be activated, False otherwise.
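The exact switching logic is internal to OptimizerData; the function below is a hypothetical sketch of plausible semantics for the three criteria described in the class parameters (fixed epoch, epoch ratio, and plateau detection gated by the loss ratio):

```python
def should_switch(loss_history, loss_value, init_loss, epoch, epochs,
                  switch_at_epoch=None, switch_at_epoch_ratio=None,
                  switch_at_plateau=None, switch_at_plateau_ratio=500.0):
    # Criterion 1: switch once a fixed epoch is reached.
    if switch_at_epoch is not None and epoch >= switch_at_epoch:
        return True
    # Criterion 2: switch once a fraction of the total epochs has elapsed.
    if switch_at_epoch_ratio is not None and epoch >= switch_at_epoch_ratio * epochs:
        return True
    # Criterion 3: switch on a plateau, but only once the loss has already
    # dropped below init_loss / switch_at_plateau_ratio.
    if switch_at_plateau is not None and loss_value < init_loss / switch_at_plateau_ratio:
        window, patience = switch_at_plateau
        if len(loss_history) >= window + patience:
            recent_best = min(loss_history[-patience:])
            earlier_best = min(loss_history[-(window + patience):-patience])
            if recent_best >= earlier_best:  # no improvement over the window
                return True
    return False
```

This is only an illustration of how such criteria compose; the real implementation may differ in thresholds and bookkeeping.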

activate_next_optimizer(parameters, verbose=False)[source]

Activates the next optimizer in the list.

Parameters:
  • parameters (Union[Iterable[Tensor], Iterable[dict[str, Any]], Iterable[tuple[str, Tensor]]]) – Parameters to be optimized.

  • verbose (bool) – Whether to print an activation message.

Return type:

None

activate_first_optimizer(parameters, verbose=False)[source]

Activates the first optimizer in the list.

Parameters:
  • parameters (Union[Iterable[Tensor], Iterable[dict[str, Any]], Iterable[tuple[str, Tensor]]]) – Parameters to be optimized.

  • verbose (bool) – Whether to print an activation message.

Return type:

None

test_and_activate_next_optimizer(parameters, loss_history, loss_value, init_loss, epoch, epochs)[source]

Tests whether the next optimizer should be activated and, if so, activates it.

Parameters:
  • parameters (Union[Iterable[Tensor], Iterable[dict[str, Any]], Iterable[tuple[str, Tensor]]]) – Parameters to be optimized.

  • loss_history (list[float]) – History of loss values.

  • loss_value (float) – Current loss value.

  • init_loss (float) – Initial loss value.

  • epoch (int) – Current epoch.

  • epochs (int) – Total number of epochs.

Return type:

None

get_opt_gradients()[source]

Gets the gradients of the currently activated optimizer.

Return type:

Tensor

Returns:

Flattened tensor of gradients.
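The returned tensor concatenates the per-parameter gradients into one flat vector (the torch equivalent would be `torch.cat([g.flatten() for g in grads])`). The same operation on plain Python lists, purely as an illustration of the flattening step:

```python
def flatten_gradients(per_param_grads):
    # Each entry mimics one parameter's gradient, already as a flat
    # list of floats; the result is a single flat vector.
    flat = []
    for grad in per_param_grads:
        flat.extend(grad)
    return flat


grads = flatten_gradients([[0.1, -0.2], [0.3], [0.0, 0.5, -0.1]])
```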

update_best_optimizer()[source]

Updates the best state of the currently activated optimizer.

Return type:

None

dict_for_save()[source]

Returns a dictionary containing the best state of the current optimizer.

Return type:

dict

Returns:

dictionary containing the best state of the optimizer.

load_from_dict(parameters, checkpoint)[source]

Loads the optimizer and scheduler states from a checkpoint.

Parameters:
  • parameters (Union[Iterable[Tensor], Iterable[dict[str, Any]], Iterable[tuple[str, Tensor]]]) – Parameters to be optimized.

  • checkpoint (dict) – dictionary containing the optimizer and scheduler states.

Raises:

ValueError – when there is no active optimizer to load the checkpoint into.

Return type:

None
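Paired with dict_for_save, this method enables a save/restore round trip for the optimizer state. A hypothetical sketch of the guard the documented ValueError implies, modeling the `activated_optimizer` list as plain dicts (the checkpoint key name is an assumption, not the library's actual schema):

```python
def load_from_dict(activated_optimizer, checkpoint):
    # Restoring a state requires an optimizer to load it into;
    # an empty activation list is an error.
    if not activated_optimizer:
        raise ValueError("no active optimizer to load the checkpoint into")
    # Merge the saved state into the active optimizer (hypothetical key).
    activated_optimizer[0].update(checkpoint["optimizer_state"])


opt_state = [{}]  # one "activated" optimizer, modeled as a dict
load_from_dict(opt_state, {"optimizer_state": {"step": 42}})
```

In practice one would call activate_first_optimizer (or activate_next_optimizer) before load_from_dict, so that there is an active optimizer to receive the checkpoint.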