Frequently Asked Questions¶
This page answers common questions about SPFlow. For deeper explanations, see Concepts. For end-to-end tutorials, see the User Guide.
General Questions¶
What is SPFlow?¶
SPFlow is a Python library for building and learning Probabilistic Circuits (PCs), including Sum-Product Networks (SPNs). These are deep generative and discriminative models that enable tractable (polynomial-time) probabilistic inference while maintaining expressive power.
SPFlow is built on PyTorch, providing GPU acceleration and seamless integration with modern deep learning workflows.
What version of Python is required?¶
SPFlow requires Python 3.10+ and PyTorch 2.0+.
How do I install SPFlow?¶
See the Getting Started guide for installation instructions. Quick summary:
pip install spflow
Architecture & Concepts¶
What are the main module types in SPFlow?¶
SPFlow provides several core module types:
Leaves: Probability distributions at the terminals (Normal, Categorical, Bernoulli, etc.)
Products: Combine independent distributions (Product, OuterProduct, ElementwiseProduct)
Sums: Weighted mixtures of distributions (Sum, ElementwiseSum)
Specialized architectures: RAT-SPN, ConvPc for images
See the API Reference for complete documentation.
What is a Scope?¶
A Scope defines which input variables (features) a module operates on. Scopes are what make sums/products well-defined and enforce decomposability/compatibility constraints.
What are repetitions?¶
Repetitions are independent parameterizations of the same circuit structure; they add model capacity without changing the topology.
They usually show up as R in model.to_str() and are tracked in module shapes.
What is the difference between Sum and ElementwiseSum?¶
Sum: Computes weighted mixtures over all input channels. Output has out_channels channels, each being a weighted combination of all input channels.
ElementwiseSum: Sums corresponding channels across multiple input modules element-wise. Requires all inputs to have the same scope and channel count.
What is the difference between Product, OuterProduct, and ElementwiseProduct?¶
Product: Combines inputs by computing products across all features. The inputs must have disjoint scopes.
OuterProduct: Computes the outer product of split inputs. Takes input split into groups and produces all combinations.
ElementwiseProduct: Multiplies corresponding elements across multiple input modules. Requires inputs with compatible shapes.
Model Building¶
How do I create a simple SPN?¶
Here’s a minimal example:
import torch
from spflow.modules.sums import Sum
from spflow.modules.products import Product
from spflow.modules.leaves import Normal
from spflow.meta import Scope
# Create leaves for 2 features
scope = Scope([0, 1])
leaves = Normal(scope=scope, out_channels=4)
# Stack product and sum layers
product = Product(inputs=leaves)
model = Sum(inputs=product, out_channels=1)
# Use the model
data = torch.randn(32, 2)
log_ll = model.log_likelihood(data)
See the Getting Started guide for more examples.
What leaf distributions are available?¶
SPFlow includes many univariate distributions:
Continuous: Normal, LogNormal, Exponential, Gamma, Uniform
Discrete: Categorical, Bernoulli, Binomial, Poisson, Geometric, NegativeBinomial, Hypergeometric
See Leaf Modules for complete documentation.
How do I use RAT-SPN?¶
RAT-SPN (Randomized And Tensorized SPN) automatically builds a deep circuit from hyperparameters:
from spflow.modules.rat import RatSPN
from spflow.modules.leaves import Normal
from spflow.meta import Scope
# Create leaves
scope = Scope(list(range(64)))
leaves = Normal(scope=scope, out_channels=4, num_repetitions=2)
# Build RAT-SPN
model = RatSPN(
leaf_modules=[leaves],
n_root_nodes=1,
n_region_nodes=8,
num_repetitions=2,
depth=3
)
See RAT-SPN Architecture for details.
Does SPFlow have image-specific modules?¶
Yes! Use the ConvPc module for image data with spatial structure:
import torch
from spflow.modules.conv import ConvPc
from spflow.modules.leaves import Binomial
from spflow.meta import Scope
# Create leaf layer for 28x28 grayscale images (e.g., MNIST)
height, width = 28, 28
scope = Scope(list(range(height * width)))
leaf = Binomial(scope=scope, total_count=torch.tensor(255), out_channels=8, num_repetitions=2)
# Build convolutional PC
model = ConvPc(
leaf=leaf,
input_height=height,
input_width=width,
depth=3,
channels=16,
kernel_size=2,
num_repetitions=2,
)
For adapting existing models to image data, use ImageWrapper:
import torch
from spflow.modules.wrapper import ImageWrapper
# Wrap any SPFlow model for image data
wrapped = ImageWrapper(model, num_channel=1, height=28, width=28)
# Now works with 4D tensors: (batch, channels, height, width)
image_data = torch.randn(32, 1, 28, 28)
log_ll = wrapped.log_likelihood(image_data)
See Convolutional Modules and Wrapper Modules for complete documentation.
Training & Learning¶
How do I train a model?¶
SPFlow provides two main training approaches:
Gradient Descent:
from spflow.learn import train_gradient_descent
train_gradient_descent(
model,
train_data,
epochs=100,
lr=0.01
)
Expectation-Maximization:
from spflow.learn import expectation_maximization
expectation_maximization(model, train_data, epochs=50)
What is the difference between gradient descent and EM?¶
Both methods use gradients in SPFlow’s implementation:
Gradient Descent: Standard PyTorch optimization. Suitable for most cases, especially when combined with other neural network components.
Expectation-Maximization (EM): A specialized algorithm that alternates between computing expected sufficient statistics and updating parameters. This is usually more stable and converges faster than gradient descent.
Choose based on your use case; gradient descent is generally more flexible.
How do I use structure learning?¶
Use learn_spn to automatically learn circuit structure from data:
import torch
from spflow.learn import learn_spn
from spflow.modules.leaves import Normal
from spflow.meta import Scope
data = torch.randn(1000, 10)  # example dataset with 10 features
scope = Scope(list(range(10)))
leaves = Normal(scope=scope, out_channels=4)
model = learn_spn(
data,
leaf_modules=leaves,
out_channels=1,
min_instances_slice=100
)
See Learning and Training for details and the User Guide for end-to-end examples.
Inference & Sampling¶
How do I compute log-likelihood?¶
Call the log_likelihood method on your model:
log_likelihood = model.log_likelihood(data)
# Returns tensor of shape [batch_size, ...]
How do I sample from a model?¶
Use the sample method:
# Generate 100 unconditional samples
samples = model.sample(num_samples=100)
For conditional sampling with evidence, use sample_with_evidence:
# Sample some features given others
evidence = torch.full((10, num_features), float('nan'))
evidence[:, 0] = 0.5 # Condition on feature 0
samples = model.sample_with_evidence(evidence=evidence)
What is MPE (Most Probable Explanation) sampling?¶
MPE returns the most probable assignment of the variables under the model, useful for generating clear outputs and validating training. This is also known as MAP (Maximum A Posteriori) sampling.
Note
MAP sampling is different from MMAP (Marginal MAP) sampling, which marginalizes over some variables while maximizing over others.
# Get the most probable sample
mpe_sample = model.sample(num_samples=1, is_mpe=True)
MPE can be combined with evidence for conditional MPE:
evidence = torch.full((1, num_features), float('nan'))
evidence[0, :10] = observed_values[:10] # Condition on first 10 features
conditional_mpe = model.mpe(data=evidence)
How do I handle missing data?¶
Use torch.nan in your data/evidence tensor to indicate missing values:
# Create data with missing values
data = torch.randn(100, 5)
data[0, 2] = float('nan') # Feature 2 is missing for sample 0
data[1, 0:2] = float('nan') # Features 0-1 missing for sample 1
# Log-likelihood handles missing data automatically
log_ll = model.log_likelihood(data)
SPFlow will marginalize over missing features when computing likelihoods.
See Missing Data and Evidence for details and conditional sampling patterns.
Visualization & Debugging¶
How do I visualize a circuit?¶
Use the visualize function:
from spflow.utils.visualization import visualize
visualize(
model,
output_path="/tmp/my_circuit",
format="pdf",
show_scope=True,
show_shape=True
)
Requires Graphviz to be installed on your system.
What output formats are supported?¶
The visualization function supports multiple formats via Graphviz:
PDF: format="pdf" (recommended for papers)
SVG: format="svg" (scalable, good for web)
PNG: format="png" (raster image)
How do I print the model structure?¶
Use the to_str() method for a text representation:
print(model.to_str())
# Example output:
# Sum [D=1, C=1] [weights: (1, 4, 1)] → scope: 0-1
# └─ Product [D=1, C=4] → scope: 0-1
# └─ Normal [D=2, C=4] → scope: 0-1
What’s the difference between SPFlow v1.x and the legacy version?¶
SPFlow v1.0 is a complete rewrite using PyTorch as the primary backend. Key differences:
Modern PyTorch architecture for GPU acceleration
Significantly improved performance
Enhanced modular design with composable layers
The pre-v1.0.0 version is still available:
On PyPI: pip install spflow==0.0.46
In the legacy branch of the GitHub repository
Models from the legacy version are not compatible with v1.x and need to be rebuilt.
Migration from Legacy¶
How do I migrate from SPFlow 0.x to 1.x?¶
SPFlow 1.0 is a complete rewrite. Key changes:
PyTorch-based: All modules are nn.Module subclasses
Layered composition: Build circuits by stacking modules
New API: Method names and signatures have changed
GPU support: Native CUDA acceleration
There is no automatic migration path. You will need to:
Reinstall: pip install --upgrade spflow (uninstall legacy first if needed)
Rebuild your models using the new API
Retrain your models
Are old models compatible with SPFlow v1.x?¶
No. Models saved with SPFlow 0.x cannot be loaded in SPFlow 1.x due to the complete architectural rewrite.
You must rebuild and retrain your models using the new API. See the User Guide for comprehensive examples.