Einsum Networks (Einet)¶
Einsum Networks (Einets) are a scalable class of probabilistic circuits that use Einstein summation notation (einsum) to implement efficient sum-product operations in parallel.
Reference¶
Einets were introduced in: Peharz, R., et al. (2020). “Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits.” ICML 2020.
Overview¶
Einet provides a scalable architecture for Sum-Product Networks using EinsumLayer or LinsumLayer for efficient batched computations. These layers combine product and sum operations into single efficient einsum operations.
Key Characteristics:¶
Efficient batched computations: Leverage PyTorch’s optimized einsum implementation.
Scalable deep architecture: Supports deep stacks of einsum/linsum layers.
Fast inference and sampling: Optimized for high-throughput probabilistic modeling.
Implementation¶
The Einet implementation in SPFlow provides a high-level spflow.zoo.einet.Einet module.
- class spflow.zoo.einet.Einet(leaf_modules, num_classes=1, num_sums=10, num_leaves=10, depth=1, num_repetitions=5, layer_type='linsum', structure='top-down')[source]¶
Bases: Module, Classifier
Einsum Network (Einet) for scalable deep probabilistic modeling.
Einet uses efficient einsum-based layers (EinsumLayer or LinsumLayer) to combine product and sum operations, enabling faster training and inference compared to traditional RAT-SPNs.
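The core idea, fusing a product of child distributions with a weighted sum into one einsum call, can be sketched in NumPy. Shapes and names here are illustrative only (not the SPFlow internals): `left`/`right` are log-probabilities of paired children, and each output is a weighted mixture over the cross-product of their channels, computed stably in log-space.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, I, J, O = 4, 3, 2, 2, 5  # batch, features, left/right channels, out sums

left = np.log(rng.random((N, D, I)))   # log-probs of left children
right = np.log(rng.random((N, D, J)))  # log-probs of right children

# normalized mixture weights over the (I, J) cross-product of child channels
w = rng.random((D, O, I, J))
w /= w.sum(axis=(-2, -1), keepdims=True)

# log-einsum-exp: shift by per-feature maxima for numerical stability
lm = left.max(axis=-1, keepdims=True)
rm = right.max(axis=-1, keepdims=True)
prod = np.exp(left - lm)[:, :, :, None] * np.exp(right - rm)[:, :, None, :]  # (N, D, I, J)
out = np.log(np.einsum("ndij,doij->ndo", prod, w)) + lm + rm  # (N, D, O)
```

Product and sum thus collapse into a single contraction, which is what makes the layer fast on batched GPU workloads.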
- leaf_modules¶
Leaf distribution modules.
- Type:
list[LeafModule]
- Reference:
Peharz, R., et al. (2020). “Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits.” ICML 2020.
- __init__(leaf_modules, num_classes=1, num_sums=10, num_leaves=10, depth=1, num_repetitions=5, layer_type='linsum', structure='top-down')[source]¶
Initialize Einet with specified architecture parameters.
- Parameters:
leaf_modules (list[LeafModule]) – Leaf distribution modules forming the base layer.
num_classes (int) – Number of root sum nodes (classes). Defaults to 1.
num_sums (int) – Number of sum nodes per intermediate layer. Defaults to 10.
num_leaves (int) – Number of leaf distribution components. Defaults to 10.
depth (int) – Number of einsum layers. Defaults to 1.
num_repetitions (int) – Number of parallel circuit repetitions. Defaults to 5.
layer_type (Literal['einsum', 'linsum']) – Type of intermediate layer (“einsum” or “linsum”). Defaults to “linsum”.
structure (Literal['top-down', 'bottom-up']) – Structure building mode (“top-down” or “bottom-up”). Defaults to “top-down”.
- Raises:
InvalidParameterError – If architectural parameters are invalid.
- log_posterior(data, cache=None)[source]¶
Compute log-posterior probabilities for multi-class models.
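For intuition, a multi-class posterior can be recovered from per-class root log-likelihoods by normalizing in log-space. This is a sketch assuming a uniform class prior, not the SPFlow implementation:

```python
import numpy as np

# per-class root log-likelihoods, shape (batch, num_classes) -- illustrative values
log_lik = np.log(np.array([[0.2, 0.5, 0.3]]))

# log p(c | x) = log p(x | c) - logsumexp_c log p(x | c) under a uniform prior,
# computed with the max-shift trick for numerical stability
m = log_lik.max(axis=1, keepdims=True)
log_post = log_lik - (m + np.log(np.exp(log_lik - m).sum(axis=1, keepdims=True)))
```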
- sample(num_samples=None, data=None, is_mpe=False, cache=None)[source]¶
Generate samples from the Einet.
- Returns:
Sampled tensor.
- Raises:
NotImplementedError – If structure is “bottom-up” (not yet supported).
Layers¶
- class spflow.modules.einsum.EinsumLayer(inputs, out_channels, num_repetitions=None, weights=None, split_mode=None)[source]¶
Bases: Module
EinsumLayer combining product and sum operations efficiently.
Implements sum(product(x)) using einsum for circuits with arbitrary tree structure. Takes pairs of adjacent features as left/right children, computes their cross-product over channels, and sums with learned weights.
The LogEinsumExp trick is used for numerical stability in log-space.
- logits¶
Unnormalized log-weights for gradient optimization.
- Type:
Parameter
- unraveled_channel_indices¶
Mapping from flat to (i,j) channel pairs.
- Type:
Tensor
- __init__(inputs, out_channels, num_repetitions=None, weights=None, split_mode=None)[source]¶
Initialize EinsumLayer.
- Parameters:
inputs (Module | list[Module]) – Either a single module (features will be split into pairs) or a list of exactly two modules (left and right children).
out_channels (int) – Number of output sum nodes per feature.
num_repetitions (int | None) – Number of repetitions. If None, inferred from inputs.
weights (Tensor | None) – Optional initial weights tensor. If provided, must have shape (out_features, out_channels, num_repetitions, left_channels, right_channels).
split_mode (SplitMode | None) – Optional split configuration for single-input mode. Use SplitMode.consecutive() or SplitMode.interleaved(). Defaults to SplitMode.consecutive(num_splits=2) if not specified.
- Raises:
ValueError – If inputs invalid, out_channels < 1, or weight shape mismatch.
- marginalize(marg_rvs, prune=True, cache=None)[source]¶
Marginalize out specified random variables.
- Parameters:
marg_rvs (list[int]) – Random variable indices to marginalize.
prune (bool) – Whether to prune unnecessary modules.
cache (Cache | None) – Cache for memoization.
- Return type:
EinsumLayer | Module | None
- Returns:
Marginalized module or None if fully marginalized.
- class spflow.modules.einsum.LinsumLayer(inputs, out_channels, num_repetitions=None, weights=None, split_mode=None)[source]¶
Bases: Module
LinsumLayer combining product and sum operations with linear channel combination.
Unlike EinsumLayer which computes cross-product over channels (I × J combinations), LinsumLayer computes a linear combination: pairs left/right features, adds them (product in log-space), then sums over input channels with learned weights.
This results in fewer parameters: weight_shape = (D_out, O, R, C) vs EinsumLayer’s (D_out, O, R, I, J).
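The saving can be made concrete with the weight shapes above (sizes here are illustrative, not the class defaults):

```python
# parameter counts for one layer, from the weight shapes above
D_out, O, R = 8, 10, 5  # output features, out channels (sum nodes), repetitions
I = J = C = 10          # child channel counts (I, J for Einsum; C for Linsum)

einsum_params = D_out * O * R * I * J  # cross-product weights (D_out, O, R, I, J)
linsum_params = D_out * O * R * C      # linear weights (D_out, O, R, C)
```

With equal channel counts the Linsum weight tensor is smaller by a factor of the channel count, here 10x (4,000 vs. 40,000 parameters).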
- logits¶
Unnormalized log-weights for gradient optimization.
- Type:
Parameter
- __init__(inputs, out_channels, num_repetitions=None, weights=None, split_mode=None)[source]¶
Initialize LinsumLayer.
- Parameters:
inputs (Module | list[Module]) – Either a single module (features will be split into pairs) or a list of exactly two modules (left and right children). Unlike EinsumLayer, both inputs must have the same number of channels.
out_channels (int) – Number of output sum nodes per feature.
num_repetitions (int | None) – Number of repetitions. If None, inferred from inputs.
weights (Tensor | None) – Optional initial weights tensor. If provided, must have shape (out_features, out_channels, num_repetitions, in_channels).
split_mode (SplitMode | None) – Optional split configuration for single-input mode. Use SplitMode.consecutive() or SplitMode.interleaved(). Defaults to SplitMode.consecutive(num_splits=2) if not specified.
- Raises:
ValueError – If inputs invalid, out_channels < 1, or weight shape mismatch.
- log_likelihood(data, cache=None)[source]¶
Compute log-likelihood using linear sum over channels.
Unlike EinsumLayer which computes cross-product (I × J), this computes a linear combination: add left+right (product), then logsumexp over channels.
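That linear combination can be sketched in NumPy (shapes and names are illustrative, not the SPFlow implementation): paired children are added elementwise (a product in log-space), then a stable logsumexp mixes the shared input channels with normalized weights.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, C, O = 4, 3, 2, 5  # batch, output features, shared channels, out sums

left = np.log(rng.random((N, D, C)))   # paired left children, log-space
right = np.log(rng.random((N, D, C)))  # paired right children, log-space

w = rng.random((D, O, C))
logw = np.log(w / w.sum(axis=-1, keepdims=True))  # normalized log-weights

prod = left + right                   # product of each pair = sum in log-space, (N, D, C)
z = logw[None] + prod[:, :, None, :]  # weighted terms, (N, D, O, C)
m = z.max(axis=-1, keepdims=True)     # max-shift for a stable logsumexp
out = (m + np.log(np.exp(z - m).sum(axis=-1, keepdims=True))).squeeze(-1)  # (N, D, O)
```

Note the single channel axis C: there is no (I, J) cross-product, which is exactly where the parameter saving over EinsumLayer comes from.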
- marginalize(marg_rvs, prune=True, cache=None)[source]¶
Marginalize out specified random variables.
- Parameters:
marg_rvs (list[int]) – Random variable indices to marginalize.
prune (bool) – Whether to prune unnecessary modules.
cache (Cache | None) – Cache for memoization.
- Return type:
LinsumLayer | Module | None
- Returns:
Marginalized module or None if fully marginalized.