Concepts¶
This page is a stable, linkable reference for SPFlow concepts (separate from the API reference and notebooks).
See also SOCS for details on signed circuits and compatibility.
Shapes and Dimensions¶
SPFlow modules use a consistent internal shape convention: (features, channels, repetitions).
You will often see this displayed as D (features), C (channels), and R (repetitions) in
model.to_str() output.
Terminology¶
- Features (D): number of random variables represented by the module (usually `len(scope)`).
- Channels (C): parallel distributions/mixture channels computed in one forward pass.
- Repetitions (R): independent parameterizations of the same structure.
Where shapes appear¶
- Input data: `data` is shaped `(batch, num_features)`.
- Log-likelihood outputs: `log_likelihood` returns `(batch, out_features, out_channels, repetitions)`.
- Module metadata: `spflow.modules.module_shape.ModuleShape` stores `(features, channels, repetitions)`.
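As a minimal illustration of the convention, here is a pure-Python sketch (the `ModuleShape` class below is a hypothetical stand-in, not SPFlow's actual implementation):

```python
from typing import NamedTuple

# Hypothetical stand-in for spflow.modules.module_shape.ModuleShape,
# illustrating the (features, channels, repetitions) convention.
class ModuleShape(NamedTuple):
    features: int      # D: number of random variables in the module's scope
    channels: int      # C: parallel distributions computed in one forward pass
    repetitions: int   # R: independent parameterizations of the same structure

shape = ModuleShape(features=5, channels=8, repetitions=2)

# A log-likelihood output for a batch of 32 samples would then be
# shaped (batch, out_features, out_channels, repetitions):
batch = 32
ll_shape = (batch, shape.features, shape.channels, shape.repetitions)
print(ll_shape)  # (32, 5, 8, 2)
```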
Practical tips¶
- Use `model.to_str()` to sanity-check shapes and scopes end-to-end.
- If `data.shape[1] != len(model.scope)`, check your leaf scopes first.
Scopes and Decomposability¶
A scope identifies which input variables (features) a module operates on. Scopes enforce the structural constraints that make inference tractable.
What is a scope?¶
Use spflow.meta.Scope to describe feature indices:
```python
from spflow.meta import Scope

scope = Scope([0, 1, 2])  # this module operates on features 0, 1, and 2
```
Rules of thumb¶
- Sum nodes combine inputs with the same scope.
- Product nodes combine inputs with disjoint scopes (decomposability / independence assumption).
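These two rules can be sketched as a small validity check. This is a pure-Python illustration, not SPFlow's internal validation; `check_sum_inputs` and `check_product_inputs` are hypothetical helpers:

```python
def check_sum_inputs(scopes):
    """Sum nodes: every child must cover exactly the same variables."""
    first = set(scopes[0])
    if any(set(s) != first for s in scopes[1:]):
        raise ValueError("Scope mismatch in Sum: children must share one scope")

def check_product_inputs(scopes):
    """Product nodes: children must cover pairwise-disjoint variables."""
    seen = set()
    for s in scopes:
        overlap = seen & set(s)
        if overlap:
            raise ValueError(f"Overlapping scopes in Product: {sorted(overlap)}")
        seen |= set(s)

check_sum_inputs([[0, 1], [1, 0]])        # OK: both children cover {0, 1}
check_product_inputs([[0, 1], [2], [3]])  # OK: pairwise-disjoint scopes
```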
Common failure modes¶
Scope mismatch in a Sum: you mixed modules that do not cover the same variables.
Overlapping scopes in a Product: you combined two modules that both model the same feature(s).
Missing Data and Evidence¶
SPFlow uses NaN-based evidence: missing values are represented with torch.nan.
This makes it easy to mix observed and unobserved variables in the same tensor.
Log-likelihood with missing data¶
When computing likelihoods, NaN entries are treated as “unknown” variables to marginalize out:
```python
import torch

data = torch.randn(32, 5)
data[0, 2] = float("nan")  # feature 2 missing for sample 0

log_ll = model.log_likelihood(data)
```
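To see why NaN-as-marginalization works, here is a toy pure-Python version for a fully factorized model of independent standard-normal features (nothing below is SPFlow code):

```python
import math

def normal_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def log_likelihood(sample):
    # NaN entries are "unknown": their marginal integrates to 1,
    # so they contribute log(1) = 0 to the joint log-likelihood.
    return sum(0.0 if math.isnan(x) else normal_logpdf(x) for x in sample)

full = log_likelihood([0.2, -1.0, 0.5])
partial = log_likelihood([0.2, -1.0, float("nan")])  # feature 2 marginalized out

# Marginalizing a feature is the same as dropping it entirely:
assert partial == log_likelihood([0.2, -1.0])
```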
Conditional sampling with evidence¶
For conditional sampling, you can provide an evidence tensor where NaNs indicate values to sample:
```python
import torch

num_features = len(model.scope)
evidence = torch.full((10, num_features), float("nan"))
evidence[:, 0] = 0.5  # condition on feature 0; NaN entries will be sampled

samples = model.sample_with_evidence(evidence=evidence)
```
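The NaN-as-mask semantics can be illustrated with a toy sampler using only the standard library; `fill_evidence` is a hypothetical helper, not an SPFlow API, and the stand-in model is just a standard normal:

```python
import math
import random

def fill_evidence(row, sampler=random.gauss):
    # Observed entries (non-NaN) are kept as-is; NaN entries are replaced
    # by draws from the model -- here, a stand-in standard normal.
    return [sampler(0.0, 1.0) if math.isnan(x) else x for x in row]

evidence = [0.5, float("nan"), float("nan")]
sample = fill_evidence(evidence)

assert sample[0] == 0.5                        # conditioned value is preserved
assert all(not math.isnan(x) for x in sample)  # every feature now has a value
```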
Differentiable Sampling¶
The main public sampling APIs are sample, sample_with_evidence, and mpe.
All three APIs accept return_leaf_params=True, in which case they return
(samples, leaf_param_records).
SPFlow also contains differentiable routing/sampling paths for selected modules and leaves, but this is an
advanced interface and not uniformly supported across all components.
For APC models specifically, both the inference APIs (encode/decode/sampling/likelihood) and
the model objective APIs (AutoencodingPC.loss_components / loss) are available.
Latent statistics are available from APC encoders via the exact parameters of the selected latent leaves.
Exact latent KL/statistic extraction is supported for Normal, Bernoulli, Binomial, and Categorical leaves
against fixed canonical priors; unsupported latent families raise explicit errors (there is no fallback path).
APC trainer helper functions in spflow.zoo.apc.train are available for lightweight training loops.
Caching and Dispatch¶
Probabilistic circuits are DAGs, and many operations reuse subcomputations. SPFlow provides a lightweight caching mechanism to avoid redundant work during inference, learning, and sampling.
Cache basics¶
- Use `spflow.utils.cache.Cache` to memoize intermediate results across a single traversal.
- Many modules use the `spflow.utils.cache.cached()` decorator for operations like `log_likelihood`.
- `spflow.utils.cache.Cache` also provides `Cache.extras` for storing custom, user-defined information that should be available throughout a recursive traversal.
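The idea behind the cache can be sketched in a few lines. This is an illustrative pure-Python memoizer, not the actual `spflow.utils.cache.Cache` implementation:

```python
class Cache:
    """Toy per-traversal cache: memoized results plus a free-form extras dict."""
    def __init__(self):
        self._results = {}
        self.extras = {}  # user-defined data shared across one recursive traversal

    def get_or_compute(self, key, compute):
        if key not in self._results:
            self._results[key] = compute()
        return self._results[key]

calls = []
cache = Cache()

def expensive():
    calls.append(1)  # track how often the real computation runs
    return 42

# Within one traversal, a node's result is computed only once:
assert cache.get_or_compute("node-7", expensive) == 42
assert cache.get_or_compute("node-7", expensive) == 42
assert len(calls) == 1  # the second lookup hit the cache
```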
When you should care¶
- Repeatedly calling `log_likelihood` on the same model inside a loop can be faster if you reuse a cache.
- Debugging unexpected values is easier if you can control whether cached results are reused.
Related references¶
- `spflow.utils.cache.cached()`