========
Concepts
========

This page is a stable, linkable reference for SPFlow concepts (separate from the API reference and notebooks).

.. _concepts-shapes-and-dimensions:

Shapes and Dimensions
=====================

SPFlow modules use a consistent internal shape convention: **(features, channels, repetitions)**. You will often see this displayed as ``D`` (features), ``C`` (channels), and ``R`` (repetitions) in ``model.to_str()`` output.

Terminology
-----------

- **Features (D)**: Number of random variables represented by the module (usually ``len(scope)``).
- **Channels (C)**: Parallel distributions/mixture channels computed in one forward pass.
- **Repetitions (R)**: Independent parameterizations of the same structure.

Where shapes appear
-------------------

- **Input data**: ``data`` is shaped ``(batch, num_features)``.
- **Log-likelihood outputs**: ``log_likelihood`` returns ``(batch, out_features, out_channels)``.
- **Module metadata**: :class:`spflow.modules.module_shape.ModuleShape` stores ``(features, channels, repetitions)``.

Practical tips
--------------

- Use ``model.to_str()`` to sanity-check shapes and scopes end-to-end.
- If ``data.shape[1] != len(model.scope)``, check your leaf scopes first.

Related references
------------------

- :ref:`concepts-scopes-and-decomposability`
- :doc:`API: ModuleShape `
- :doc:`API: Scope `

.. _concepts-scopes-and-decomposability:

Scopes and Decomposability
==========================

A **scope** identifies which input variables (features) a module operates on. Scopes enforce the structural constraints that make inference tractable.

What is a scope?
----------------

Use :class:`spflow.meta.Scope` to describe feature indices::

    from spflow.meta import Scope

    # A scope over the first three input features.
    scope = Scope([0, 1, 2])

Rules of thumb
--------------

- **Sum nodes** combine inputs with the **same** scope.
- **Product nodes** combine inputs with **disjoint** scopes (the decomposability / independence assumption).

Common failure modes
--------------------

- **Scope mismatch in a Sum**: you mixed modules that do not cover the same variables.
- **Overlapping scopes in a Product**: you combined two modules that both model the same feature(s).

Related references
------------------

- :doc:`API: Scope `
- :ref:`concepts-shapes-and-dimensions`

.. _concepts-missing-data-and-evidence:

Missing Data and Evidence
=========================

SPFlow uses **NaN-based evidence**: missing values are represented with ``torch.nan``. This makes it easy to mix observed and unobserved variables in the same tensor.

Log-likelihood with missing data
--------------------------------

When computing likelihoods, NaN entries are treated as unobserved variables and marginalized out::

    import torch

    # ``model`` is any SPFlow module defined elsewhere.
    data = torch.randn(32, 5)
    data[0, 2] = float("nan")  # feature 2 missing for sample 0

    log_ll = model.log_likelihood(data)

Conditional sampling with evidence
----------------------------------

For conditional sampling, you can provide an evidence tensor in which NaN entries mark the values to be sampled and all other entries are treated as observed::

    # ``num_features`` should match the model's scope size.
    evidence = torch.full((10, num_features), float("nan"))
    evidence[:, 0] = 0.5  # condition on feature 0

    samples = model.sample_with_evidence(evidence=evidence)

Related references
------------------

- :doc:`FAQ `
- :doc:`API Reference `

.. _concepts-caching-and-dispatch:

Caching and Dispatch
====================

Probabilistic circuits are DAGs, and many operations reuse subcomputations. SPFlow provides a lightweight caching mechanism to avoid redundant work during inference, learning, and sampling.
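To see why this matters, consider a node that is reachable through two parents in the DAG: without memoization, its log-likelihood would be computed once per parent. The toy example below illustrates only the memoization pattern; the names ``ToyLeaf`` and ``ToyInner`` are made up for illustration, and SPFlow's actual :class:`spflow.utils.cache.Cache` is richer than the plain dict used here::

    import torch


    class ToyLeaf:
        """Stand-in leaf that counts how often it is actually evaluated."""

        def __init__(self):
            self.calls = 0

        def log_likelihood(self, data, cache):
            if id(self) not in cache:  # compute only on the first visit
                self.calls += 1
                cache[id(self)] = data.sum(dim=1, keepdim=True)  # dummy computation
            return cache[id(self)]


    class ToyInner:
        """Stand-in inner node whose children may be shared across the DAG."""

        def __init__(self, children):
            self.children = children

        def log_likelihood(self, data, cache):
            return sum(c.log_likelihood(data, cache) for c in self.children)


    shared = ToyLeaf()
    root = ToyInner([ToyInner([shared]), ToyInner([shared])])  # leaf reached via two parents

    root.log_likelihood(torch.randn(4, 3), cache={})
    assert shared.calls == 1  # evaluated once, then reused from the cache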
Cache basics
------------

- Use :class:`spflow.utils.cache.Cache` to memoize intermediate results across a single traversal.
- Many modules use the :func:`spflow.utils.cache.cached` decorator for operations like ``log_likelihood``.
- :class:`spflow.utils.cache.Cache` also provides ``Cache.extras`` for storing custom, user-defined information that should be available throughout a recursive traversal.

When you should care
--------------------

- Repeatedly calling ``log_likelihood`` on the same model inside a loop can be faster if you reuse a cache (see the sketch at the end of this page).
- Debugging unexpected values is easier if you can control whether cached results are reused.

Related references
------------------

- :class:`spflow.utils.cache.Cache`
- :func:`spflow.utils.cache.cached`
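Closing sketch: the loop pattern mentioned under "When you should care". How a cache is threaded through ``log_likelihood`` is an assumption here; the ``cache=`` keyword and dict-style ``extras`` access below are hypothetical, so consult the API reference for the exact interface::

    import torch
    from spflow.utils.cache import Cache

    # ``model`` is an SPFlow module defined elsewhere.  The ``cache=``
    # keyword and dict-style ``extras`` access are ASSUMPTIONS made for
    # illustration; check the API reference for the real signatures.
    data = torch.randn(32, 5)

    cache = Cache()
    cache.extras["run"] = "debug-pass"  # custom info available during traversal

    for _ in range(10):
        # Repeated evaluation of the same inputs can reuse cached results.
        log_ll = model.log_likelihood(data, cache=cache)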