Convolutional Modules

Convolutional layers for modeling spatial structure in image data with probabilistic circuits.

ConvPc

High-level convolutional probabilistic circuit architecture that stacks alternating ProdConv and SumConv layers on top of a leaf distribution. Similar to RAT-SPN but designed specifically for image data with spatial structure.

class spflow.modules.conv.ConvPc(leaf, input_height, input_width, channels, depth, kernel_size=2, num_repetitions=1, use_sum_conv=False)

Bases: Module

Convolutional Probabilistic Circuit.

Builds a multi-layer circuit with alternating ProdConv and SumConv layers on top of a leaf distribution. The architecture progressively reduces spatial dimensions while learning mixture weights at each level.

The layer ordering is: Leaf -> ProdConv -> SumConv -> ProdConv -> SumConv -> … -> Root Sum

Layers are constructed top-down (from root to leaves), then reversed for proper bottom-up evaluation order.
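The ordering and the top-down construction described above can be illustrated with a small sketch. The helper name conv_pc_layer_order and the string labels are hypothetical and not part of spflow; with use_sum_conv=False the "SumConv" entries would be regular Sum layers instead.

```python
def conv_pc_layer_order(depth: int) -> list[str]:
    """Return layer labels in bottom-up evaluation order (hypothetical helper)."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    # Build top-down, starting at the root.
    top_down = ["RootSum"]
    for _ in range(depth):
        top_down.extend(["SumConv", "ProdConv"])
    top_down.append("Leaf")
    # Reverse so that evaluation runs from the leaves up to the root.
    return list(reversed(top_down))


order = conv_pc_layer_order(depth=2)
print(" -> ".join(order))
# Leaf -> ProdConv -> SumConv -> ProdConv -> SumConv -> RootSum
```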

leaf

Leaf distribution module.

Type:

Module

root

Final sum layer producing scalar output per sample.

Type:

Sum

__init__(leaf, input_height, input_width, channels, depth, kernel_size=2, num_repetitions=1, use_sum_conv=False)

Create a ConvPc for image modeling.

Parameters:
  • leaf (Module) – Leaf distribution module (e.g., Normal over pixels).

  • input_height (int) – Height of input image.

  • input_width (int) – Width of input image.

  • channels (int) – Number of channels per sum layer.

  • depth (int) – Number of (ProdConv, SumConv) layer pairs.

  • kernel_size (int) – Side length of the square pooling kernel (default 2, i.e. a 2×2 patch).

  • num_repetitions (int) – Number of independent repetitions.

  • use_sum_conv (bool) – If True, use SumConv layers with kernel-based spatial weights. If False (default), use regular Sum layers that treat features independently without spatial awareness.

Raises:

ValueError – If depth < 1.
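How input_height, input_width, depth and kernel_size interact can be sketched by tracking the spatial size after each product layer. This is a rough illustration, not the library's code: the helper spatial_sizes is hypothetical, and it assumes the height and width are divisible by kernel_size at every level. How the library handles other sizes (for example through padding) is not covered here.

```python
def spatial_sizes(input_height, input_width, depth, kernel_size=2):
    """List the (height, width) seen at each level, from leaf to top (hypothetical helper)."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    h, w = input_height, input_width
    sizes = [(h, w)]
    for level in range(depth):
        # Assumption: each product layer shrinks both dimensions by kernel_size.
        if h % kernel_size or w % kernel_size:
            raise ValueError(f"size {(h, w)} at level {level} is not divisible by {kernel_size}")
        h, w = h // kernel_size, w // kernel_size
        sizes.append((h, w))
    return sizes


print(spatial_sizes(8, 8, depth=3))
# [(8, 8), (4, 4), (2, 2), (1, 1)]
```

Under this assumption, an 8×8 image with kernel_size=2 is reduced to a single spatial position after three layer pairs.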

expectation_maximization(data, bias_correction=True, cache=None)

Perform EM update throughout the circuit.

Parameters:
  • data (Tensor) – Input data tensor.

  • bias_correction (bool) – Whether to apply bias correction.

  • cache (Cache | None) – Optional cache with log-likelihoods.

Return type:

None

log_likelihood(data, cache=None)

Compute log likelihood through all layers.

Parameters:
  • data (Tensor) – Input data of shape (batch_size, num_pixels).

  • cache (Cache | None) – Cache for intermediate computations.

Returns:

Log-likelihood of shape (batch, 1, 1, reps).

Return type:

Tensor

marginalize(marg_rvs, prune=True, cache=None)

Marginalize out specified random variables.

Parameters:
  • marg_rvs (list[int]) – List of random variable indices to marginalize.

  • prune (bool) – Whether to prune unnecessary nodes.

  • cache (Cache | None) – Optional cache for storing intermediate results.

Returns:

Marginalized module or None if fully marginalized.

Return type:

ConvPc | Module | None

sample(num_samples=None, data=None, is_mpe=False, cache=None, sampling_ctx=None)

Generate samples by sampling top-down through layers.

Delegates sampling to the root module (RepetitionMixingLayer when num_repetitions > 1, or Sum when num_repetitions == 1), which then recursively propagates sampling to the leaf.

Parameters:
  • num_samples (int | None) – Number of samples to generate.

  • data (Tensor | None) – Data tensor with NaN values to fill with samples.

  • is_mpe (bool) – Whether to return the most probable explanation (MPE) instead of drawing random samples.

  • cache (Cache | None) – Optional cache dictionary.

  • sampling_ctx (SamplingContext | None) – Optional sampling context.

Returns:

Sampled values of shape (num_samples, num_pixels).

Return type:

Tensor

property feature_to_scope: ndarray

Single output feature with full scope.

SumConv

Convolutional sum layer that applies learned weighted sums over input channels within spatial patches. Enables mixture modeling with spatial structure.

class spflow.modules.conv.SumConv(inputs, out_channels, kernel_size, num_repetitions=1)

Bases: Module

Convolutional sum layer for probabilistic circuits.

Applies a weighted sum over input channels within spatial patches. Weights are learned and normalized to sum to one over the input channels at each kernel position, so every output remains a valid mixture. Useful for modeling spatial structure in image data.

The layer expects input with spatial structure and shares weights across patches: the same kernel position in every spatial patch uses the same weights.

inputs

Input module providing log-likelihoods.

Type:

Module

kernel_size

Size of the spatial kernel (kernel_size x kernel_size).

Type:

int

in_channels

Number of input channels.

Type:

int

out_channels

Number of output channels (mixture components).

Type:

int

logits

Unnormalized log-weights for gradient optimization.

Type:

Parameter

__init__(inputs, out_channels, kernel_size, num_repetitions=1)

Create a SumConv module for spatial mixture modeling.

Parameters:
  • inputs (Module) – Input module providing log-likelihoods with spatial structure.

  • out_channels (int) – Number of output mixture components.

  • kernel_size (int) – Size of the spatial kernel (kernel_size x kernel_size).

  • num_repetitions (int) – Number of independent repetitions.

Raises:

ValueError – If out_channels < 1 or kernel_size < 1.

expectation_maximization(data, bias_correction=True, cache=None)

Perform expectation-maximization step to update weights.

Follows the standard EM update pattern for sum nodes:

  1. Get cached log-likelihoods for the input and for this module.

  2. Compute expectations using: log_weights + log_grads + input_lls - module_lls.

  3. Normalize to get the new log_weights.

Parameters:
  • data (Tensor) – Input data tensor.

  • bias_correction (bool) – Whether to apply bias correction (unused currently).

  • cache (Cache | None) – Cache dictionary with log-likelihoods from forward pass.

Raises:

MissingCacheError – If required log-likelihoods are not found in cache.

Return type:

None
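The update pattern for sum nodes can be sketched for a single sum node with NumPy. This is a simplified illustration under stated assumptions, not the SumConv implementation: there is one output, no kernel positions or repetitions, module_lls is recomputed instead of read from a cache, and the function name em_update_log_weights is hypothetical. log_grads stands for the log of the gradient of the root log-likelihood with respect to this node's output, which is zero when the node is itself the root.

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def em_update_log_weights(log_weights, input_lls, log_grads):
    """One EM step for a single sum node (hypothetical helper).

    log_weights: (in_c,) normalized log-weights
    input_lls:   (batch, in_c) log-likelihoods of the inputs
    log_grads:   (batch,) log-gradients of the root w.r.t. this node
    """
    # Step 1: log-likelihood of this node (the library reads it from the cache).
    module_lls = logsumexp(input_lls + log_weights, axis=1)
    # Step 2: per-sample expectations in log space, then accumulate over the batch.
    log_expect = log_weights + log_grads[:, None] + input_lls - module_lls[:, None]
    log_expect = logsumexp(log_expect, axis=0)
    # Step 3: normalize over input channels to obtain the new log-weights.
    return log_expect - logsumexp(log_expect, axis=0)


log_weights = np.log(np.array([0.5, 0.5]))
input_lls = np.log(np.tile([0.9, 0.1], (4, 1)))  # input 0 explains the data better
log_grads = np.zeros(4)                          # the node is the root here
new_log_weights = em_update_log_weights(log_weights, input_lls, log_grads)
print(np.exp(new_log_weights))  # approximately [0.9, 0.1]
```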

log_likelihood(data, cache=None)

Compute log likelihood using convolutional weighted sum.

Applies weighted sum over input channels within spatial patches. Each kernel position gets its own set of mixture weights. Uses logsumexp for numerical stability.

Parameters:
  • data (Tensor) – Input data of shape (batch_size, num_features).

  • cache (Cache | None) – Cache for intermediate computations.

Returns:

Log-likelihood of shape (batch, features, out_channels, reps).

Return type:

Tensor
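A minimal NumPy sketch of the convolutional weighted sum may help make the weight sharing concrete. It is written under assumptions that the text above does not confirm: a pixel at (row, col) uses the weights at kernel position (row % k, col % k), the spatial grid is kept as separate height and width axes instead of a flat feature axis, and the repetition axis is dropped. The function names are hypothetical.

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def sum_conv_log_likelihood(input_lls, log_weights):
    """Mix input channels per pixel with kernel-position weights (hypothetical helper).

    input_lls:   (batch, H, W, in_c)
    log_weights: (out_c, in_c, k, k), normalized over in_c
    returns:     (batch, H, W, out_c)
    """
    _, height, width, _ = input_lls.shape
    k = log_weights.shape[2]
    # Assumed mapping from pixel coordinates to kernel positions.
    rows = np.arange(height) % k
    cols = np.arange(width) % k
    # Gather the weights for every pixel: (out_c, in_c, H, W) -> (H, W, out_c, in_c).
    per_pixel = log_weights[:, :, rows[:, None], cols[None, :]]
    per_pixel = np.transpose(per_pixel, (2, 3, 0, 1))
    # Weighted sum in log space: logsumexp over the input-channel axis.
    return logsumexp(input_lls[:, :, :, None, :] + per_pixel, axis=-1)


rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5, 2, 2))                     # out_c=3, in_c=5, k=2
log_weights = logits - logsumexp(logits, axis=1)[:, None]  # normalize over in_c
input_lls = np.log(rng.uniform(0.1, 1.0, size=(2, 4, 4, 5)))
out = sum_conv_log_likelihood(input_lls, log_weights)
print(out.shape)  # (2, 4, 4, 3)
```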

marginalize(marg_rvs, prune=True, cache=None)

Marginalize out specified random variables.

Parameters:
  • marg_rvs (list[int]) – List of random variable indices to marginalize.

  • prune (bool) – Whether to prune unnecessary nodes.

  • cache (Cache | None) – Optional cache for storing intermediate results.

Returns:

Marginalized module or None if fully marginalized.

Return type:

SumConv | Module | None

sample(num_samples=None, data=None, is_mpe=False, cache=None, sampling_ctx=None)

Generate samples from sum conv module.

Each spatial position samples from its per-position kernel weights.

Parameters:
  • num_samples (int | None) – Number of samples to generate.

  • data (Tensor | None) – Data tensor with NaN values to fill with samples.

  • is_mpe (bool) – Whether to return the most probable explanation (MPE) instead of drawing random samples.

  • cache (Cache | None) – Optional cache dictionary.

  • sampling_ctx (SamplingContext | None) – Optional sampling context.

Returns:

Sampled values.

Return type:

Tensor
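The per-position choice of an input channel can be sketched as follows. This is an illustration under assumptions, not the library's sampling code: it handles one output channel, ignores evidence and the sampling context, maps a pixel at (row, col) to kernel position (row % k, col % k), and uses the hypothetical helper name select_input_channels. With is_mpe=True the most probable channel is taken instead of a random draw.

```python
import numpy as np


def select_input_channels(weights, height, width, is_mpe=False, rng=None):
    """Pick one input channel per pixel (hypothetical helper).

    weights: (in_c, k, k) for one output channel, summing to one over in_c
    returns: (height, width) integer array of channel indices
    """
    rng = np.random.default_rng() if rng is None else rng
    in_c, k, _ = weights.shape
    rows = np.arange(height) % k
    cols = np.arange(width) % k
    per_pixel = weights[:, rows[:, None], cols[None, :]]  # (in_c, H, W)
    if is_mpe:
        # MPE: take the channel with the largest weight at each pixel.
        return per_pixel.argmax(axis=0)
    # Inverse-CDF sampling from the categorical distribution at each pixel.
    cdf = np.cumsum(per_pixel, axis=0)
    u = rng.uniform(size=(height, width))
    return (u[None] > cdf).sum(axis=0).clip(max=in_c - 1)


weights = np.empty((2, 2, 2))
weights[0] = [[0.9, 0.2], [0.3, 0.6]]
weights[1] = 1.0 - weights[0]
mpe_idx = select_input_channels(weights, 4, 4, is_mpe=True)
rand_idx = select_input_channels(weights, 4, 4, rng=np.random.default_rng(0))
print(mpe_idx)
```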

property feature_to_scope: ndarray

Per-pixel scopes are preserved from input.

property log_weights: Tensor

Returns the log weights normalized to sum to one over input channels.

Returns:

Log weights of shape (out_c, in_c, k, k, reps).

Return type:

Tensor

property weights: Tensor

Returns the weights normalized to sum to one over input channels.

Returns:

Weights of shape (out_c, in_c, k, k, reps).

Return type:

Tensor
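The relation between logits, log_weights and weights can be sketched with NumPy. This assumes the normalization is a log-softmax over the input-channel axis (axis 1 of the documented shape); the exact operation used by the library is not stated here, and the log_softmax helper below is written out for illustration.

```python
import numpy as np


def log_softmax(logits, axis):
    """Numerically stable log-softmax along one axis."""
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))


# Shape (out_c, in_c, k, k, reps), as documented for log_weights and weights.
logits = np.random.default_rng(0).normal(size=(3, 5, 2, 2, 1))
log_weights = log_softmax(logits, axis=1)  # normalize over input channels
weights = np.exp(log_weights)
print(weights.sum(axis=1).shape)  # (3, 2, 2, 1)
```

Every entry of weights.sum(axis=1) is one, so each output channel, kernel position and repetition has its own mixture over the input channels.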

ProdConv

Convolutional product layer that computes products over spatial patches, reducing each spatial dimension by a factor of the kernel size. Aggregates scopes within patches while maintaining proper probabilistic semantics.

class spflow.modules.conv.ProdConv(inputs, kernel_size_h, kernel_size_w, padding_h=0, padding_w=0)

Bases: Module

Convolutional product layer for probabilistic circuits.

Computes products over spatial patches, reducing each spatial dimension by a factor of the corresponding kernel size. Because the inputs are log-likelihoods, each product is computed as a sum within the patch. The layer has no learnable parameters.

Scopes are aggregated per patch: a 2×2 patch containing Scope(0), Scope(1), Scope(2), Scope(3) produces Scope([0,1,2,3]).

inputs

Input module providing log-likelihoods.

Type:

Module

kernel_size_h

Kernel height.

Type:

int

kernel_size_w

Kernel width.

Type:

int

padding_h

Padding in height dimension.

Type:

int

padding_w

Padding in width dimension.

Type:

int

__init__(inputs, kernel_size_h, kernel_size_w, padding_h=0, padding_w=0)

Create a ProdConv module for spatial product operations.

Parameters:
  • inputs (Module) – Input module providing log-likelihoods with spatial structure.

  • kernel_size_h (int) – Height of the pooling kernel.

  • kernel_size_w (int) – Width of the pooling kernel.

  • padding_h (int) – Padding in height dimension (added on both sides).

  • padding_w (int) – Padding in width dimension (added on both sides).

Raises:

ValueError – If kernel sizes are < 1.

expectation_maximization(data, bias_correction=True, cache=None)

EM step (delegates to input, no learnable parameters).

Parameters:
  • data (Tensor) – Input data tensor for EM step.

  • bias_correction (bool) – Whether to apply bias correction.

  • cache (Cache | None) – Optional cache for storing intermediate results.

Return type:

None

log_likelihood(data, cache=None)

Compute log likelihood by summing within patches.

Uses a depthwise convolution with an all-ones kernel to sum log-probabilities within patches efficiently.

Parameters:
  • data (Tensor) – Input data of shape (batch_size, num_features).

  • cache (Cache | None) – Cache for intermediate computations.

Returns:

Log-likelihood of shape (batch, out_features, channels, reps).

Return type:

Tensor
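The patch-wise sum can be sketched with NumPy. Instead of a depthwise convolution, this sketch uses a reshape, which gives the same result for non-overlapping, unpadded patches. It assumes the stride equals the kernel size, that height and width are divisible by the kernel, and that the spatial grid is kept as separate axes; the function name prod_conv_log_likelihood is hypothetical.

```python
import numpy as np


def prod_conv_log_likelihood(lls, kernel_size_h, kernel_size_w):
    """Sum log-likelihoods within non-overlapping patches (hypothetical helper).

    lls:     (batch, H, W, channels)
    returns: (batch, H // kernel_size_h, W // kernel_size_w, channels)
    """
    batch, height, width, channels = lls.shape
    if height % kernel_size_h or width % kernel_size_w:
        raise ValueError("this sketch needs H and W divisible by the kernel size")
    # Split each spatial axis into (number of patches, position inside the patch).
    patches = lls.reshape(
        batch,
        height // kernel_size_h, kernel_size_h,
        width // kernel_size_w, kernel_size_w,
        channels,
    )
    # A product of probabilities is a sum of log-probabilities.
    return patches.sum(axis=(2, 4))


lls = np.arange(16, dtype=float).reshape(1, 4, 4, 1)
out = prod_conv_log_likelihood(lls, 2, 2)
print(out[0, :, :, 0])
# [[10. 18.]
#  [42. 50.]]
```

The top-left output sums the entries at grid positions (0,0), (0,1), (1,0) and (1,1), which hold 0, 1, 4 and 5.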

marginalize(marg_rvs, prune=True, cache=None)

Marginalize out specified random variables.

Parameters:
  • marg_rvs (list[int]) – List of random variable indices to marginalize.

  • prune (bool) – Whether to prune unnecessary nodes.

  • cache (Cache | None) – Optional cache for storing intermediate results.

Returns:

Marginalized module or None if fully marginalized.

Return type:

ProdConv | Module | None

sample(num_samples=None, data=None, is_mpe=False, cache=None, sampling_ctx=None)

Generate samples by delegating to input.

ProdConv has no learnable parameters, so sampling simply expands the sampling context to match input features and delegates.

Parameters:
  • num_samples (int | None) – Number of samples to generate.

  • data (Tensor | None) – Data tensor with NaN values to fill with samples.

  • is_mpe (bool) – Whether to return the most probable explanation (MPE) instead of drawing random samples.

  • cache (Cache | None) – Optional cache dictionary.

  • sampling_ctx (SamplingContext | None) – Optional sampling context.

Returns:

Sampled values.

Return type:

Tensor

property feature_to_scope: ndarray

Aggregated scopes per output feature.

Each output feature’s scope is the join of all input scopes within its patch.

Returns:

2D array of Scope objects (features, repetitions).

Return type:

np.ndarray
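Scope aggregation per patch can be sketched in plain Python. The sketch uses lists of integers in place of the library's Scope objects, ignores the repetition axis, and assumes that pixels are indexed in row-major order and that the image size is divisible by the kernel size. The helper name aggregate_scopes is hypothetical.

```python
def aggregate_scopes(height, width, kernel_size_h, kernel_size_w):
    """Return one list of pixel indices per output feature (hypothetical helper)."""
    scopes = []
    for top in range(0, height, kernel_size_h):
        for left in range(0, width, kernel_size_w):
            # Join the single-pixel scopes inside this patch (row-major indices).
            patch = [
                row * width + col
                for row in range(top, top + kernel_size_h)
                for col in range(left, left + kernel_size_w)
            ]
            scopes.append(patch)
    return scopes


print(aggregate_scopes(2, 2, 2, 2))     # [[0, 1, 2, 3]]
print(aggregate_scopes(4, 4, 2, 2)[0])  # [0, 1, 4, 5]
```

For a 2×2 image the single patch covers all four pixels, matching the Scope([0,1,2,3]) example in the class description. On a 4×4 image under the row-major assumption, the top-left patch covers pixels 0, 1, 4 and 5.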