Measures

Information Theory

spflow.measures.information_theory.entropy(model, scope, *, method='mc', num_samples=10000, seed=None, channel_agg='logmeanexp', repetition_agg='logmeanexp')

Estimate the entropy H(X) of a subset of variables.

The returned value is in nats (natural logarithm base), consistent with SPFlow log-likelihood conventions.

Parameters:
  • model (Module) – SPFlow probabilistic circuit.

  • scope (Scope | int | Iterable[int]) – Variables X to compute entropy for.

  • method (str) – Estimation method: “mc” (Monte Carlo sampling) or “exact” (exhaustive enumeration, feasible only for tiny discrete domains).

  • num_samples (int) – Number of samples for Monte Carlo estimation.

  • seed (int | None) – Optional seed for best-effort deterministic sampling.

  • channel_agg (str) – How to aggregate multiple channels (“logmeanexp”, “logsumexp”, “first”).

  • repetition_agg (str) – How to aggregate multiple repetitions (“logmeanexp”, “logsumexp”, “first”).

Return type:

Tensor

Returns:

Scalar tensor containing H(X) in nats.
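To make the two method options concrete, the sketch below computes H(X) both ways for a hypothetical three-valued variable. A plain dictionary stands in for the circuit's marginal over X, and the names TOY_P, exact_entropy, and mc_entropy are illustrative assumptions, not part of SPFlow. Exact enumeration sums −p(x) log p(x) over every value; the Monte Carlo estimate averages −log p(x) over samples drawn from the model.

```python
import math
import random

# Hypothetical toy stand-in for a circuit's marginal over X (not SPFlow):
# each value of X maps to its probability.
TOY_P = {0: 0.5, 1: 0.25, 2: 0.25}


def exact_entropy(p):
    """H(X) = -sum_x p(x) log p(x) in nats, by enumerating every value."""
    return -sum(px * math.log(px) for px in p.values() if px > 0)


def mc_entropy(p, num_samples=10000, seed=None):
    """H(X) ~= -(1/N) sum_s log p(x_s), with x_s drawn from p, in nats."""
    rng = random.Random(seed)  # seeded generator for reproducible draws
    values = list(p.keys())
    weights = list(p.values())
    samples = rng.choices(values, weights=weights, k=num_samples)
    return -sum(math.log(p[x]) for x in samples) / num_samples


print(exact_entropy(TOY_P), mc_entropy(TOY_P, num_samples=10000, seed=0))
```

For this distribution the exact value is 1.5 ln 2 ≈ 1.0397 nats; the Monte Carlo estimate approaches it at the usual O(1/√N) rate as num_samples grows, which is why enumeration is only worth it when the domain is tiny.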

spflow.measures.information_theory.mutual_information(model, x_scope, y_scope, *, method='mc', num_samples=10000, seed=None, channel_agg='logmeanexp', repetition_agg='logmeanexp')

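Estimate the mutual information I(X;Y), in nats, between the variables in x_scope (X) and y_scope (Y). The keyword arguments have the same names and defaults as in entropy().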

Return type:

Tensor
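A standard identity ties mutual information to the entropy above: I(X;Y) = H(X) + H(Y) − H(X,Y). Whether SPFlow evaluates it through this identity or through a direct estimate of E[log p(x,y) − log p(x) − log p(y)] is not stated here. The sketch below only checks the identity exactly on a hypothetical 2×2 joint table; JOINT, entropy, marginal, and mutual_information are illustrative names, not SPFlow API.

```python
import math

# Hypothetical joint table p(x, y) over two binary variables (toy data).
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def entropy(probs):
    """Shannon entropy in nats of an iterable of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def marginal(joint, axis):
    """Sum the 2-D table down to the variable at position `axis`."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out


def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in nats."""
    h_x = entropy(marginal(joint, 0).values())
    h_y = entropy(marginal(joint, 1).values())
    h_xy = entropy(joint.values())
    return h_x + h_y - h_xy


print(mutual_information(JOINT))
```

For JOINT the result is about 0.1927 nats; for a product table p(x)p(y) it is 0, as expected for independent variables.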

spflow.measures.information_theory.conditional_mutual_information(model, x_scope, y_scope, z_scope, *, method='mc', num_samples=10000, seed=None, channel_agg='logmeanexp', repetition_agg='logmeanexp')

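Estimate the conditional mutual information I(X;Y|Z), in nats, between the variables in x_scope (X) and y_scope (Y) given those in z_scope (Z). The keyword arguments have the same names and defaults as in entropy().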

Return type:

Tensor
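Conditional mutual information can likewise be written with entropies alone: I(X;Y|Z) = H(X,Z) + H(Y,Z) − H(X,Y,Z) − H(Z). The sketch below, a minimal illustration rather than SPFlow's implementation, evaluates it exactly on hypothetical three-variable tables; toy_joint and the other helpers are assumptions made for the example.

```python
import math
from itertools import product


def entropy(probs):
    """Shannon entropy in nats of an iterable of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def marginal(joint, axes):
    """Sum the joint table down to the variables at positions `axes`."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[a] for a in axes)
        out[sub] = out.get(sub, 0.0) + p
    return out


def conditional_mutual_information(joint):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z); keys are (x, y, z)."""
    h_xz = entropy(marginal(joint, (0, 2)).values())
    h_yz = entropy(marginal(joint, (1, 2)).values())
    h_z = entropy(marginal(joint, (2,)).values())
    h_xyz = entropy(joint.values())
    return h_xz + h_yz - h_xyz - h_z


def toy_joint():
    """Hypothetical table: X and Y are independent noisy copies of a fair Z."""
    joint = {}
    for x, y, z in product((0, 1), repeat=3):
        px = 0.9 if x == z else 0.1
        py = 0.9 if y == z else 0.1
        joint[(x, y, z)] = 0.5 * px * py
    return joint


print(conditional_mutual_information(toy_joint()))
```

toy_joint() yields 0 up to rounding: X and Y are dependent marginally, but once Z is known they carry no further information about each other.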

Weight of Evidence

spflow.measures.weight_of_evidence.conditional_probability(model, *, y_index, y_value, evidence, channel_agg='logmeanexp', repetition_agg='logmeanexp')

Compute p(y=y_value | evidence) for a discrete target variable.

This follows the legacy SPFlow definition:

p(y|x) = p(x,y) / p(x)
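Because a circuit returns log-likelihoods, the ratio is naturally evaluated as exp(log p(x,y) − log p(x)), with p(x) obtained by marking Y as missing. The sketch below assumes, as the evidence parameter describes, that NaN entries are marginalized out; the toy table JOINT and the helpers log_prob and conditional_probability are hypothetical stand-ins for the model and handle a single evidence row rather than a batch.

```python
import math
from itertools import product

NAN = float("nan")

# Hypothetical toy joint p(x0, x1, y): y agrees with x0 80% of the time,
# x1 is pure noise. It stands in for an SPFlow circuit.
JOINT = {
    (x0, x1, y): 0.25 * (0.8 if y == x0 else 0.2)
    for x0, x1, y in product((0, 1), repeat=3)
}


def log_prob(evidence):
    """log p(evidence), summing out every position that holds NaN."""
    total = sum(
        p for key, p in JOINT.items()
        if all(math.isnan(e) or key[i] == e for i, e in enumerate(evidence))
    )
    return math.log(total)


def conditional_probability(y_index, y_value, evidence):
    """p(y = y_value | evidence) = exp(log p(x, y) - log p(x))."""
    x_only = list(evidence)
    x_only[y_index] = NAN             # p(x): Y marginalized out
    joint = list(x_only)
    joint[y_index] = float(y_value)   # p(x, y): Y fixed to y_value
    return math.exp(log_prob(joint) - log_prob(x_only))


print(conditional_probability(2, 1, [1.0, NAN, NAN]))
```

Here the call gives 0.8: with x0 = 1 observed and x1 missing, Y agrees with x0 80% of the time.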

Parameters:
  • model (Module) – SPFlow probabilistic circuit.

  • y_index (int) – Column index of the target variable Y in the data.

  • y_value (int | float) – Concrete value for Y.

  • evidence (Tensor) – Evidence tensor of shape (batch, D) with NaNs for missing values.

  • channel_agg (str) – How to aggregate multiple channels (“logmeanexp”, “logsumexp”, “first”).

  • repetition_agg (str) – How to aggregate multiple repetitions (“logmeanexp”, “logsumexp”, “first”).

Return type:

Tensor

Returns:

Tensor of shape (batch,) with conditional probabilities in [0, 1].
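The channel_agg and repetition_agg options reduce several log-likelihood values to one. Assuming the mode names carry their usual meaning, the sketch below shows what each computes for a list of per-channel log-likelihoods: logsumexp adds the likelihoods, logmeanexp averages them (an equal-weight mixture), and first keeps only the first. The helper names are hypothetical and this is not SPFlow's code.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def aggregate(log_values, mode="logmeanexp"):
    """Combine per-channel (or per-repetition) log-likelihoods into one."""
    if mode == "logsumexp":
        return logsumexp(log_values)
    if mode == "logmeanexp":
        # Mean of the likelihoods, i.e. a uniform mixture over channels.
        return logsumexp(log_values) - math.log(len(log_values))
    if mode == "first":
        return log_values[0]
    raise ValueError(f"unknown aggregation mode: {mode!r}")


vals = [math.log(0.2), math.log(0.4)]
print({m: math.exp(aggregate(vals, m)) for m in ("logsumexp", "logmeanexp", "first")})
```

With two channels whose likelihoods are 0.2 and 0.4, logsumexp returns log 0.6, logmeanexp returns log 0.3, and first returns log 0.2.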

spflow.measures.weight_of_evidence.weight_of_evidence(model, *, y_index, y_value, evidence_full, evidence_reduced, n, k=None, eps=1e-06, channel_agg='logmeanexp', repetition_agg='logmeanexp')

Compute the weight of evidence (WoE) between two evidence settings (in nats).

This compares evidence_full against evidence_reduced using a log-odds difference:

WoE = logit(L(p(y|e_full))) - logit(L(p(y|e_reduced)))

where L(.) is a Laplace correction:

L(p) = (p*n + 1) / (n + k)
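The two formulas can be checked by hand for a single pair of conditional probabilities. The sketch below applies the Laplace correction and then a clamped logit; where exactly SPFlow applies the eps clamp is an assumption here (inside logit), and the helper names are illustrative only.

```python
import math


def laplace(p, n, k):
    """Laplace correction L(p) = (p*n + 1) / (n + k)."""
    return (p * n + 1) / (n + k)


def logit(p, eps=1e-6):
    """Log-odds in nats, with p clamped to [eps, 1 - eps] first."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def weight_of_evidence(p_full, p_reduced, n, k, eps=1e-6):
    """WoE = logit(L(p_full)) - logit(L(p_reduced)), in nats."""
    return logit(laplace(p_full, n, k), eps) - logit(laplace(p_reduced, n, k), eps)


print(weight_of_evidence(0.9, 0.5, n=98, k=2))
```

With n = 98 and k = 2, raising p(y|e) from 0.5 to 0.9 gives L(0.9) = 0.892 and L(0.5) = 0.5, so WoE = ln(0.892/0.108) ≈ 2.11 nats. The correction also keeps the logit finite when p is exactly 0 or 1, since L(p) then lies in [1/(n+k), (n+1)/(n+k)].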

Parameters:
  • model (Module) – SPFlow probabilistic circuit.

  • y_index (int) – Column index of Y.

  • y_value (int | float) – Concrete value for Y.

  • evidence_full (Tensor) – Evidence tensor (batch, D).

  • evidence_reduced (Tensor) – Evidence tensor (batch, D).

  • n (int) – Number of training instances used for Laplace correction.

  • k (int | None) – Cardinality of Y (if None, inferred for Bernoulli/Categorical).

  • eps (float) – Clamping margin that keeps probabilities away from 0 and 1 before the logit is taken.

  • channel_agg (str) – How to aggregate multiple channels (“logmeanexp”, “logsumexp”, “first”).

  • repetition_agg (str) – How to aggregate multiple repetitions (“logmeanexp”, “logsumexp”, “first”).

Return type:

Tensor

Returns:

Tensor of shape (batch,) with WoE values in nats.

spflow.measures.weight_of_evidence.weight_of_evidence_leave_one_out(model, *, y_index, y_value, x_instance, n, k=None, eps=1e-06, channel_agg='logmeanexp', repetition_agg='logmeanexp')

Compute per-feature leave-one-out WoE attributions (legacy-style, in nats).

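For each non-NaN entry X_i of x_instance (excluding the column at y_index), this computes:

WoE_i = logit(L(p(y|x))) - logit(L(p(y|x_{-i})))

where x_{-i} denotes x_instance with X_i left out and L(.) is the Laplace correction used by weight_of_evidence.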
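A minimal sketch of the leave-one-out loop for a single row, assuming that "leaving X_i out" means setting it to NaN so the model marginalizes it, consistent with the NaN convention for evidence. The toy table JOINT and every helper name are hypothetical stand-ins for the circuit, not SPFlow API.

```python
import math
from itertools import product

NAN = float("nan")

# Hypothetical toy joint p(x0, x1, y): y agrees with x0 80% of the time,
# x1 is pure noise. It stands in for an SPFlow circuit.
JOINT = {
    (x0, x1, y): 0.25 * (0.8 if y == x0 else 0.2)
    for x0, x1, y in product((0, 1), repeat=3)
}


def prob(evidence):
    """p(evidence), summing out every NaN position."""
    return sum(
        p for key, p in JOINT.items()
        if all(math.isnan(e) or key[i] == e for i, e in enumerate(evidence))
    )


def cond_prob(y_index, y_value, evidence):
    """p(y = y_value | evidence) = p(x, y) / p(x)."""
    x_only = list(evidence)
    x_only[y_index] = NAN
    joint = list(x_only)
    joint[y_index] = y_value
    return prob(joint) / prob(x_only)


def logit_laplace(p, n, k, eps=1e-6):
    """logit(L(p)) with L(p) = (p*n + 1) / (n + k), clamped by eps."""
    q = (p * n + 1) / (n + k)
    q = min(max(q, eps), 1 - eps)
    return math.log(q / (1 - q))


def woe_leave_one_out(y_index, y_value, x_instance, n, k, eps=1e-6):
    """WoE_i = logit(L(p(y|x))) - logit(L(p(y|x_{-i}))) per observed i."""
    full = logit_laplace(cond_prob(y_index, y_value, x_instance), n, k, eps)
    scores = [NAN] * len(x_instance)
    for i, value in enumerate(x_instance):
        if i == y_index or math.isnan(value):
            continue                    # Y column and missing features stay NaN
        reduced_x = list(x_instance)
        reduced_x[i] = NAN              # leave feature i out (marginalize it)
        reduced = logit_laplace(cond_prob(y_index, y_value, reduced_x), n, k, eps)
        scores[i] = full - reduced
    return scores


print(woe_leave_one_out(2, 1, [1.0, 0.0, NAN], n=98, k=2))
```

For x_instance = [1, 0, NaN] with y_index = 2, y_value = 1, n = 98, and k = 2, feature 0 scores ln(0.794/0.206) ≈ 1.35 nats, feature 1 (pure noise) scores 0, and the Y column stays NaN. A batched version would repeat this loop for every row.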

Parameters:
  • model (Module) – SPFlow probabilistic circuit.

  • y_index (int) – Column index of Y.

  • y_value (int | float) – Concrete value for Y.

  • x_instance (Tensor) – Evidence tensor of shape (batch, D). NaNs indicate missing values.

  • n (int) – Number of training instances used for Laplace correction.

  • k (int | None) – Cardinality of Y (if None, inferred for Bernoulli/Categorical).

  • eps (float) – Clamping margin that keeps probabilities away from 0 and 1 before the logit is taken.

  • channel_agg (str) – How to aggregate multiple channels (“logmeanexp”, “logsumexp”, “first”).

  • repetition_agg (str) – How to aggregate multiple repetitions (“logmeanexp”, “logsumexp”, “first”).

Return type:

Tensor

Returns:

Tensor of shape (batch, D) with a WoE score for each observed feature; entries in the y_index column and at features that are NaN in x_instance are NaN.