Concepts¶
This page is a stable, linkable reference for SPFlow concepts (separate from the API reference and notebooks).
Shapes and Dimensions¶
SPFlow modules use a consistent internal shape convention: (features, channels, repetitions).
You will often see this displayed as D (features), C (channels), and R (repetitions) in
model.to_str() output.
Terminology¶
- Features (D): Number of random variables represented by the module (usually len(scope)).
- Channels (C): Parallel distributions/mixture channels computed in one forward pass.
- Repetitions (R): Independent parameterizations of the same structure.
Where shapes appear¶
- Input data: data is shaped (batch, num_features).
- Log-likelihood outputs: log_likelihood returns (batch, out_features, out_channels).
- Module metadata: spflow.modules.module_shape.ModuleShape stores (features, channels, repetitions).
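As a quick orientation, these shapes line up as in the sketch below; it assumes model is an already-built SPFlow module over five features (the construction itself is omitted):

import torch

batch, num_features = 32, 5
data = torch.randn(batch, num_features)  # input: (batch, num_features)
log_ll = model.log_likelihood(data)      # output: (batch, out_features, out_channels)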
Practical tips¶
- Use model.to_str() to sanity-check shapes and scopes end-to-end.
- If data.shape[1] != len(model.scope), check your leaf scopes first.
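Both tips combine into a small pre-flight check (a sketch; model and data are assumed to exist already):

print(model.to_str())  # inspect D/C/R shapes and scopes end-to-end
if data.shape[1] != len(model.scope):
    raise ValueError(f"expected {len(model.scope)} features, got {data.shape[1]}")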
Scopes and Decomposability¶
A scope identifies which input variables (features) a module operates on. Scopes enforce the structural constraints that make inference tractable.
What is a scope?¶
Use spflow.meta.Scope to describe feature indices:
from spflow.meta import Scope

scope = Scope([0, 1, 2])  # this module covers features 0, 1, and 2
Rules of thumb¶
- Sum nodes combine inputs with the same scope.
- Product nodes combine inputs with disjoint scopes (decomposability / independence assumption).
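A minimal sketch of both rules, checking the raw index sets directly rather than relying on any particular Scope helper method (helper names vary between versions):

from spflow.meta import Scope

mixture_scopes = [Scope([0, 1, 2]), Scope([0, 1, 2])]  # same scope: valid Sum children
factor_scopes = [Scope([0, 1]), Scope([2, 3])]         # disjoint scopes: valid Product children
assert set([0, 1]).isdisjoint([2, 3])                  # decomposability holds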
Common failure modes¶
- Scope mismatch in a Sum: you mixed modules that do not cover the same variables.
- Overlapping scopes in a Product: you combined two modules that both model the same feature(s).
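Continuing the snippet above, the pair below would violate decomposability in a Product, because both scopes claim feature 1:

bad_factors = [Scope([0, 1]), Scope([1, 2])]
assert not set([0, 1]).isdisjoint([1, 2])  # overlap on feature 1 -> invalid Product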
Missing Data and Evidence¶
SPFlow uses NaN-based evidence: missing values are represented with torch.nan.
This makes it easy to mix observed and unobserved variables in the same tensor.
Log-likelihood with missing data¶
When computing likelihoods, NaN entries are treated as “unknown” variables to marginalize out:
import torch

data = torch.randn(32, 5)
data[0, 2] = torch.nan  # feature 2 missing for sample 0; it will be marginalized out
log_ll = model.log_likelihood(data)
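Because missing entries are marginalized rather than propagated, the NaNs should not leak into the output; a quick sanity check (a sketch, not a documented guarantee):

assert not torch.isnan(log_ll).any()  # marginalization keeps the output NaN-free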
Conditional sampling with evidence¶
For conditional sampling, you can provide an evidence tensor where NaNs indicate values to sample:
num_features = len(model.scope)                       # evidence must cover the full scope
evidence = torch.full((10, num_features), torch.nan)  # start fully unobserved
evidence[:, 0] = 0.5                                  # condition on feature 0
samples = model.sample_with_evidence(evidence=evidence)
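To verify the sampler's behavior, a hedged check like the following can help; whether conditioned entries are returned unchanged is an assumption to confirm against the API reference:

assert (samples[:, 0] == 0.5).all()    # assumed: evidence passes through unchanged
assert not torch.isnan(samples).any()  # assumed: every NaN position was filled in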
Caching and Dispatch¶
Probabilistic circuits are DAGs, and many operations reuse subcomputations. SPFlow provides a lightweight caching mechanism to avoid redundant work during inference, learning, and sampling.
Cache basics¶
- Use spflow.utils.cache.Cache to memoize intermediate results across a single traversal.
- Many modules use the spflow.utils.cache.cached() decorator for operations like log_likelihood.
- spflow.utils.cache.Cache also provides Cache.extras for storing custom, user-defined information that should be available throughout a recursive traversal.
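A minimal sketch of passing a cache explicitly; whether log_likelihood accepts a cache argument, and under what name, depends on your SPFlow version (the keyword below is an assumption):

from spflow.utils.cache import Cache

cache = Cache()
cache.extras["run_id"] = 42                       # assumed dict-like user storage
log_ll = model.log_likelihood(data, cache=cache)  # keyword name is an assumption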
When you should care¶
- Repeatedly calling log_likelihood on the same model inside a loop can be faster if you reuse a cache (see the sketch after this list).
- Debugging unexpected values is easier if you can control whether cached results are reused.
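For the loop case, the idea looks roughly like this, reusing the Cache import from above (again, the explicit cache parameter is an assumption to verify):

cache = Cache()
for batch in data_loader:  # data_loader is a hypothetical iterable of batches
    log_ll = model.log_likelihood(batch, cache=cache)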
Related references¶
- spflow.utils.cache.cached()