Measures

Information Theory

spflow.measures.information_theory.entropy(model, scope, *, method='mc', num_samples=10000, seed=None, channel_agg='logmeanexp', repetition_agg='logmeanexp')

Estimate the entropy H(X) for a subset of variables.

The returned value is in nats (natural logarithm), consistent with SPFlow log-likelihood conventions.

Parameters:

model (Module) – SPFlow probabilistic circuit.
scope (Scope | int | Iterable[int]) – Variables X to compute entropy for.
method (str) – “mc” (Monte Carlo) or “exact” (enumeration for tiny discrete domains).
num_samples (int) – Number of samples for Monte Carlo estimation.
seed (int | None) – Optional seed for best-effort deterministic sampling.
channel_agg (str) – How to aggregate multiple channels (“logmeanexp”, “logsumexp”, “first”).
repetition_agg (str) – How to aggregate multiple repetitions (“logmeanexp”, “logsumexp”, “first”).

Return type:

Tensor

Returns:

Scalar tensor containing H(X) in nats.
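
The two method options can be illustrated without SPFlow. The sketch below is an assumption about what "exact" and "mc" mean in general, not SPFlow's implementation: the exact variant enumerates every state of a small discrete distribution, while the Monte Carlo variant averages -log p(x) over samples drawn from that distribution. The names exact_entropy, mc_entropy, and the toy distribution p are hypothetical.

```python
import math
import random


def exact_entropy(probs):
    """H(X) = -sum_x p(x) log p(x), in nats, by enumerating all states."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mc_entropy(probs, num_samples=10_000, seed=None):
    """Monte Carlo estimate: average of -log p(x) over samples x ~ p."""
    rng = random.Random(seed)  # a fixed seed makes the estimate repeatable
    states = range(len(probs))
    samples = rng.choices(states, weights=probs, k=num_samples)
    return -sum(math.log(probs[s]) for s in samples) / num_samples


p = [0.5, 0.25, 0.25]  # toy distribution over three states (hypothetical)
h_exact = exact_entropy(p)
h_mc = mc_entropy(p, num_samples=10_000, seed=0)
```

With enough samples the two values agree closely. Enumeration is only practical when the number of joint states is small, which matches the "tiny discrete domains" caveat on the method parameter.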

spflow.measures.information_theory.mutual_information(model, x_scope, y_scope, *, method='mc', num_samples=10000, seed=None, channel_agg='logmeanexp', repetition_agg='logmeanexp')

Estimate the mutual information I(X;Y), in nats, between the variables in x_scope and those in y_scope.

Return type:

Tensor
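
For reference, mutual information can be written in terms of entropies: I(X;Y) = H(X) + H(Y) - H(X,Y). The sketch below evaluates this identity exactly on toy 2x2 joint tables. Whether SPFlow computes the quantity this way is an assumption; entropy_of, mutual_information_exact, and both tables are hypothetical.

```python
import math


def entropy_of(probs):
    """Entropy in nats of a list of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mutual_information_exact(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), with joint[i][j] = p(X=i, Y=j)."""
    px = [sum(row) for row in joint]          # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]    # marginal of Y (column sums)
    pxy = [p for row in joint for p in row]   # flattened joint
    return entropy_of(px) + entropy_of(py) - entropy_of(pxy)


dependent = [[0.4, 0.1], [0.1, 0.4]]          # X and Y tend to agree
independent = [[0.25, 0.25], [0.25, 0.25]]    # product of two fair coins
mi_dep = mutual_information_exact(dependent)
mi_ind = mutual_information_exact(independent)
```

The independent table yields zero up to floating-point rounding, and the dependent table yields a strictly positive value.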

spflow.measures.information_theory.conditional_mutual_information(model, x_scope, y_scope, z_scope, *, method='mc', num_samples=10000, seed=None, channel_agg='logmeanexp', repetition_agg='logmeanexp')

Estimate the conditional mutual information I(X;Y|Z), in nats, between the variables in x_scope and those in y_scope, given the variables in z_scope.

Return type:

Tensor
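
Conditional mutual information also has an entropy form: I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z). The sketch below checks it on a toy joint in which X and Y are independent given Z but dependent once Z is marginalized out. This illustrates the quantity only; it is not taken from SPFlow, and all names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def marginal_entropy(joint, keep):
    """Entropy (nats) of the marginal over the tuple positions in `keep`."""
    marg = defaultdict(float)
    for state, p in joint.items():
        marg[tuple(state[i] for i in keep)] += p
    return -sum(p * math.log(p) for p in marg.values() if p > 0.0)


def conditional_mutual_information_exact(joint, x=0, y=1, z=2):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (
        marginal_entropy(joint, (x, z))
        + marginal_entropy(joint, (y, z))
        - marginal_entropy(joint, (x, y, z))
        - marginal_entropy(joint, (z,))
    )


# Z is a fair coin; X and Y are independent Bernoulli draws whose bias depends on Z.
bias = {0: 0.9, 1: 0.1}
joint = {}
for xv, yv, zv in product((0, 1), repeat=3):
    px = bias[zv] if xv == 1 else 1.0 - bias[zv]
    py = bias[zv] if yv == 1 else 1.0 - bias[zv]
    joint[(xv, yv, zv)] = 0.5 * px * py

cmi = conditional_mutual_information_exact(joint)
# Unconditional I(X;Y) for comparison: H(X) + H(Y) - H(X,Y).
mi = (
    marginal_entropy(joint, (0,))
    + marginal_entropy(joint, (1,))
    - marginal_entropy(joint, (0, 1))
)
```

Here cmi is zero up to rounding while mi is positive, which shows why conditioning on Z can change the answer.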

Weight of Evidence

spflow.measures.weight_of_evidence.conditional_probability(model, *, y_index, y_value, evidence, channel_agg='logmeanexp', repetition_agg='logmeanexp')

Compute p(y=y_value | evidence) for a discrete target variable.

This follows the legacy SPFlow definition:

    p(y|x) = p(x,y) / p(x)

Parameters:

model (Module) – SPFlow probabilistic circuit.
y_index (int) – Column index of the target variable Y in the data.
y_value (int | float) – Concrete value for Y.
evidence (Tensor) – Evidence tensor of shape (batch, D) with NaNs for missing values.
channel_agg (str) – How to aggregate multiple channels (“logmeanexp”, “logsumexp”, “first”).
repetition_agg (str) – How to aggregate multiple repetitions (“logmeanexp”, “logsumexp”, “first”).

Return type:

Tensor

Returns:

Tensor of shape (batch,) with conditional probabilities in [0, 1].
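
The definition p(y|x) = p(x,y) / p(x) with NaN-marked missing values can be sketched on a toy joint table. This is a plain-Python illustration, not SPFlow code: JOINT, marginal, and conditional_probability_toy are hypothetical, and overwriting the Y column of each evidence row is an assumption about how the target is handled.

```python
import math

# Toy joint over two binary columns; column 1 plays the role of Y.
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def marginal(row):
    """Probability of the observed entries of row; NaN entries are summed out."""
    total = 0.0
    for state, p in JOINT.items():
        if all(math.isnan(v) or v == s for v, s in zip(row, state)):
            total += p
    return total


def conditional_probability_toy(y_index, y_value, evidence):
    """Return p(y = y_value | evidence row) for every row of evidence."""
    out = []
    for row in evidence:
        with_y = list(row)
        with_y[y_index] = y_value        # numerator p(x, y)
        without_y = list(row)
        without_y[y_index] = math.nan    # denominator p(x), Y marginalized
        out.append(marginal(with_y) / marginal(without_y))
    return out


evidence = [
    [1.0, math.nan],        # X0 observed as 1
    [math.nan, math.nan],   # nothing observed: falls back to the prior p(y)
]
probs = conditional_probability_toy(y_index=1, y_value=1, evidence=evidence)
```

One value is produced per evidence row, mirroring the (batch,) output shape described above.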

spflow.measures.weight_of_evidence.weight_of_evidence(model, *, y_index, y_value, evidence_full, evidence_reduced, n, k=None, eps=1e-06, channel_agg='logmeanexp', repetition_agg='logmeanexp')

Compute the weight of evidence (WoE) between two evidence settings (in nats).

This compares evidence_full against evidence_reduced using a log-odds difference:

    WoE = logit(L(p(y|e_full))) - logit(L(p(y|e_reduced)))

where L(.) is a Laplace correction:

    L(p) = (p*n + 1) / (n + k)

Parameters:

model (Module) – SPFlow probabilistic circuit.
y_index (int) – Column index of Y.
y_value (int | float) – Concrete value for Y.
evidence_full (Tensor) – Full evidence e_full, a tensor of shape (batch, D).
evidence_reduced (Tensor) – Reduced evidence e_reduced, a tensor of shape (batch, D).
n (int) – Number of training instances used for Laplace correction.
k (int | None) – Cardinality of Y (if None, inferred for Bernoulli/Categorical).
eps (float) – Clamp used to keep probabilities away from 0/1 before logit.
channel_agg (str) – How to aggregate multiple channels (“logmeanexp”, “logsumexp”, “first”).
repetition_agg (str) – How to aggregate multiple repetitions (“logmeanexp”, “logsumexp”, “first”).

Return type:

Tensor

Returns:

Tensor of shape (batch,) with WoE values in nats.
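
The WoE formula above can be worked through in a few lines of plain Python. This is a minimal sketch under stated assumptions: the two conditional probabilities are taken as given, the clamp is applied after the Laplace correction and before the logit, and the helper names laplace, logit, and woe are hypothetical rather than part of SPFlow.

```python
import math


def laplace(p, n, k):
    """Laplace correction L(p) = (p*n + 1) / (n + k)."""
    return (p * n + 1.0) / (n + k)


def logit(p, eps=1e-6):
    """Log-odds in nats, with p clamped to [eps, 1 - eps] first."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p) - math.log1p(-p)


def woe(p_full, p_reduced, n, k, eps=1e-6):
    """WoE = logit(L(p(y|e_full))) - logit(L(p(y|e_reduced)))."""
    return logit(laplace(p_full, n, k), eps) - logit(laplace(p_reduced, n, k), eps)


# Hypothetical inputs: the full evidence raises p(y) from 0.5 to 0.8.
score = woe(p_full=0.8, p_reduced=0.5, n=1000, k=2)
```

A positive score means the extra evidence in e_full supports y; swapping the two arguments flips the sign, and equal probabilities give zero.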

spflow.measures.weight_of_evidence.weight_of_evidence_leave_one_out(model, *, y_index, y_value, x_instance, n, k=None, eps=1e-06, channel_agg='logmeanexp', repetition_agg='logmeanexp')

Compute per-feature leave-one-out WoE attributions (legacy-style, in nats).

For each non-NaN entry X_i in x_instance (excluding y_index), this computes:

    WoE_i = logit(L(p(y|x))) - logit(L(p(y|x\i)))

where x\i denotes x with the entry X_i left out.

Parameters:

model (Module) – SPFlow probabilistic circuit.
y_index (int) – Column index of Y.
y_value (int | float) – Concrete value for Y.
x_instance (Tensor) – Evidence tensor of shape (batch, D). NaNs indicate missing values.
n (int) – Number of training instances used for Laplace correction.
k (int | None) – Cardinality of Y (if None, inferred for Bernoulli/Categorical).
eps (float) – Clamp used to keep probabilities away from 0/1 before logit.
channel_agg (str) – How to aggregate multiple channels (“logmeanexp”, “logsumexp”, “first”).
repetition_agg (str) – How to aggregate multiple repetitions (“logmeanexp”, “logsumexp”, “first”).

Return type:

Tensor

Returns:

Tensor of shape (batch, D) with WoE scores per feature and NaNs elsewhere.
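
The leave-one-out procedure can be sketched for a single row on a toy joint table. This is an illustration only: it assumes that leaving a feature out means marking it as NaN and marginalizing it, and every name here (JOINT, marginal, p_y_given, smoothed_logit, woe_leave_one_out) is hypothetical. In the toy joint over columns (X0, X1, Y), X0 is informative about Y and X1 is not.

```python
import math

# Toy joint over columns (X0, X1, Y): Y copies X0 with probability 0.8,
# X1 is an independent fair coin.
JOINT = {}
for x0 in (0, 1):
    for x1 in (0, 1):
        for y in (0, 1):
            p_y_given_x0 = 0.8 if y == x0 else 0.2
            JOINT[(x0, x1, y)] = 0.5 * 0.5 * p_y_given_x0


def marginal(row):
    """Probability of the observed entries; NaN entries are summed out."""
    return sum(
        p for state, p in JOINT.items()
        if all(math.isnan(v) or v == s for v, s in zip(row, state))
    )


def p_y_given(row, y_index, y_value):
    """p(y = y_value | observed entries of row)."""
    with_y = list(row)
    with_y[y_index] = y_value
    without_y = list(row)
    without_y[y_index] = math.nan
    return marginal(with_y) / marginal(without_y)


def smoothed_logit(p, n, k, eps=1e-6):
    q = (p * n + 1.0) / (n + k)           # Laplace correction L(p)
    q = min(max(q, eps), 1.0 - eps)       # clamp before the logit
    return math.log(q) - math.log1p(-q)


def woe_leave_one_out(x, y_index, y_value, n, k):
    """Per-feature WoE_i for one row; NaN for Y and for missing features."""
    full = smoothed_logit(p_y_given(x, y_index, y_value), n, k)
    scores = [math.nan] * len(x)
    for i, v in enumerate(x):
        if i == y_index or math.isnan(v):
            continue
        reduced = list(x)
        reduced[i] = math.nan             # x with feature i left out
        scores[i] = full - smoothed_logit(p_y_given(reduced, y_index, y_value), n, k)
    return scores


scores = woe_leave_one_out([1.0, 0.0, math.nan], y_index=2, y_value=1, n=1000, k=2)
```

The informative feature X0 receives a positive score, the uninformative X1 receives a score of about zero, and the Y column stays NaN, matching the "NaNs elsewhere" convention in the return description.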