scimba_torch.neural_nets.coordinates_based_nets.mlp

Multi-Layer Perceptron (MLP) architectures.

Functions

factorized_glorot_normal([mean, stddev])

Initializes parameters using a factorized version of the Glorot normal initialization.

Classes

FactorizedLinear(input_dim, output_dim[, ...])

A linear transformation with factorized parameterization of the weights.

GenericMLP(in_size, out_size, **kwargs)

A general Multi-Layer Perceptron (MLP) architecture.

GenericMMLP(in_size, out_size, **kwargs)

A general Multiplicative Multi-Layer Perceptron (MMLP) architecture.

factorized_glorot_normal(mean=1.0, stddev=0.1)[source]

Initializes parameters using a factorized version of the Glorot normal initialization.

Parameters:
  • mean (float) – Mean of the log-normal distribution used to scale the singular values.

  • stddev (float) – Standard deviation of the log-normal distribution.

Return type:

Callable

Returns:

A function that takes a shape tuple and returns the factorized parameters s and v.

Example

>>> init_fn = factorized_glorot_normal()
>>> s, v = init_fn((64, 128))
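
The returned initializer follows a random-weight-factorization scheme: draw a Glorot normal weight matrix, then split it into log-normal scale factors s and a normalized matrix v whose product recovers the draw. The snippet below is a minimal sketch of that idea, not the library's implementation; the helper name, the shape-tuple orientation, and the per-column scaling are assumptions.

import torch

def factorized_glorot_normal_sketch(mean=1.0, stddev=0.1):
    # Hypothetical helper (not the library's code): returns an
    # init_fn(shape) -> (s, v) such that s * v reproduces a Glorot normal draw.
    def init_fn(shape):
        fan_in, fan_out = shape
        # Standard Glorot (Xavier) normal standard deviation.
        glorot_std = (2.0 / (fan_in + fan_out)) ** 0.5
        w = torch.randn(*shape) * glorot_std
        # Log-normal scale factors, one per output column (an assumption here).
        s = torch.exp(mean + stddev * torch.randn(fan_out))
        # Normalized factor: divide column-wise so that s * v equals the draw.
        v = w / s
        return s, v
    return init_fn

init_fn = factorized_glorot_normal_sketch()
s, v = init_fn((64, 128))  # mirrors the documented usage above
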
class FactorizedLinear(input_dim, output_dim, has_bias=True)[source]

Bases: Module

A linear transformation with factorized parameterization of the weights.

The weight matrix is expressed as the product of two factors:
  • s: A column-wise scaling factor.
  • v: A normalized weight matrix.

Parameters:
  • input_dim (int) – Size of each input sample.

  • output_dim (int) – Size of each output sample.

  • has_bias (bool) – Whether to include a bias term (default: True).

s

Column-wise scaling factors.

v

Normalized weight matrix.

bias

Bias vector added to the output.

forward(x)[source]

Forward pass of the FactorizedLinear layer.

Parameters:

x (Tensor) – Input tensor of shape (batch_size, input_dim).

Return type:

Tensor

Returns:

Output tensor of shape (batch_size, output_dim).
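
A minimal usage sketch, assuming the layer is called like a standard torch.nn.Linear module (shapes follow the forward() signature above):

>>> import torch
>>> layer = FactorizedLinear(128, 64)
>>> x = torch.randn(32, 128)
>>> layer(x).shape
torch.Size([32, 64])
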

class GenericMLP(in_size, out_size, **kwargs)[source]

Bases: ScimbaModule

A general Multi-Layer Perceptron (MLP) architecture.

Parameters:
  • in_size (int) – Dimension of the input

  • out_size (int) – Dimension of the output

  • **kwargs

    Additional keyword arguments:

    • activation_type (str, default="tanh"): The activation function type to use in hidden layers.

    • activation_output (str, default="id"): The activation function type to use in the output layer.

    • layer_sizes (list[int], default=[20]*6): A list of integers representing the number of neurons in each hidden layer.

    • weights_norm_bool (bool, default=False): If True, applies weight normalization to the layers.

    • random_fact_weights_bool (bool, default=False): If True, applies factorized weights to the layers.

Example

>>> model = GenericMLP(10, 1, activation_type='relu', layer_sizes=[64, 128, 64])
hidden_layers

A list of hidden linear layers.

output_layer

The final output linear layer.

forward(inputs, with_last_layer=True)[source]

Apply the network to the inputs.

Parameters:
  • inputs (Tensor) – Input tensor

  • with_last_layer (bool) – Whether to apply the final output layer (default: True)

Return type:

Tensor

Returns:

The output tensor of the network; if with_last_layer is False, the final output layer is not applied.
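
A usage sketch: passing with_last_layer=False skips the output layer and returns the last hidden activations (the feature width of 64, matching the final entry of layer_sizes, is an assumption about the implementation):

>>> import torch
>>> model = GenericMLP(10, 1, layer_sizes=[64, 128, 64])
>>> x = torch.randn(32, 10)
>>> y = model(x)                                # full output, shape (32, 1)
>>> features = model(x, with_last_layer=False)  # hidden features, shape (32, 64) by assumption
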

expand_hidden_layers(new_layer_sizes, set_to_zero=True)[source]

Expands the hidden layers of the MLP to new sizes.

The list of new sizes must have the same length as the number of hidden layers in the MLP. The weights of the old layers are copied into the expanded layers, while the newly added weights are initialized to zero (or to small random values, depending on set_to_zero). The output layer is also expanded to match the new sizes.

Parameters:
  • new_layer_sizes (list[int]) – List of integers representing the new sizes of the hidden layers.

  • set_to_zero (bool) – If True, initializes the weights of the new layers to zero; otherwise, sets them to small random values.
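
A usage sketch (the widened sizes are illustrative and keep the same number of hidden layers, as required):

>>> model = GenericMLP(2, 1, layer_sizes=[20, 20, 20])
>>> model.expand_hidden_layers([40, 40, 40])                     # new weights initialized to zero
>>> model.expand_hidden_layers([60, 60, 60], set_to_zero=False)  # new weights set to small random values
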

class GenericMMLP(in_size, out_size, **kwargs)[source]

Bases: ScimbaModule

A general Multiplicative Multi-Layer Perceptron (MMLP) architecture.

As proposed by Yanfei Xiang.

Parameters:
  • in_size (int) – Dimension of the input

  • out_size (int) – Dimension of the output

  • **kwargs

    Additional keyword arguments:

    • activation_type (str, default="tanh"): The activation function type to use in hidden layers.

    • activation_output (str, default="id"): The activation function type to use in the output layer.

    • layer_sizes (list[int], default=[10, 20, 20, 20, 5]): A list of integers representing the number of neurons in each hidden layer.

    • weights_norm_bool (bool, default=False): If True, applies weight normalization to the layers.

    • random_fact_weights_bool (bool, default=False): If True, applies factorized weights to the layers.

Example

>>> model = GenericMMLP(
...     10, 5, activation_type='relu', layer_sizes=[64, 128, 64]
... )
hidden_layers

A list of hidden linear layers.

hidden_layers_mult

A list of multiplicative linear layers.

output_layer

The final output linear layer.

forward(inputs, with_last_layer=True)[source]

Apply the network to the inputs.

Parameters:
  • inputs (Tensor) – Input tensor

  • with_last_layer (bool) – Whether to apply the final output layer (default: True)

Return type:

Tensor

Returns:

The output tensor of the network; if with_last_layer is False, the final output layer is not applied.
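
A minimal usage sketch mirroring the constructor example above (batch size and shapes are illustrative):

>>> import torch
>>> model = GenericMMLP(10, 5, layer_sizes=[64, 128, 64])
>>> x = torch.randn(32, 10)
>>> y = model(x)  # output tensor of shape (32, 5)
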