Learning and Training¶
Structure and parameter learning algorithms for probabilistic circuits.
Structure Learning¶
Automatic structure learning using the LearnSPN algorithm; feature partitioning is based on Randomized Dependence Coefficients (RDC) by default.
- spflow.learn.learn_spn.learn_spn(data, leaf_modules, out_channels=1, min_features_slice=2, min_instances_slice=100, scope=None, clustering_method='kmeans', partitioning_method='rdc', clustering_args=None, partitioning_args=None, full_data=None)[source]¶
LearnSPN structure and parameter learner.
LearnSPN algorithm as described in (Gens & Domingos, 2013): “Learning the Structure of Sum-Product Networks”.
- Parameters:
  - data (Tensor) – Two-dimensional tensor containing the input data. Each row corresponds to a sample.
  - leaf_modules (list[LeafModule] | LeafModule) – List of leaf modules or a single leaf module to use for learning.
  - out_channels (int) – Number of output channels. Defaults to 1.
  - min_features_slice (int) – Minimum number of features required to partition. Defaults to 2.
  - min_instances_slice (int) – Minimum number of instances required to cluster. Defaults to 100.
  - scope – Scope for the SPN. If None, inferred from leaf_modules.
  - clustering_method (str | Callable) – String or callable specifying the clustering method. If 'kmeans', k-means clustering is used. If a callable, it should accept data and return cluster assignments.
  - partitioning_method (str | Callable) – String or callable specifying the partitioning method. If 'rdc', randomized dependence coefficients are used. If a callable, it should accept data and return partition assignments.
  - clustering_args (dict[str, Any] | None) – Optional dictionary of keyword arguments for the clustering method.
  - partitioning_args (dict[str, Any] | None) – Optional dictionary of keyword arguments for the partitioning method.
  - full_data (Tensor | None) – Optional full dataset for parameter estimation.
- Return type:
Module
- Returns:
A Module representing the learned SPN.
- Raises:
ValueError – If arguments are invalid or scopes are not disjoint.
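The recursive scheme behind learn_spn can be sketched independently of the library: cluster similar rows to create sum nodes, split (approximately) independent groups of columns to create product nodes, and fall back to leaves when the slice becomes too small. The sketch below is not the spflow implementation; the clustering and independence tests are deliberately simple stand-ins (a median split instead of k-means, an absolute-correlation threshold instead of RDC), chosen only to make the control flow visible.

```python
# Minimal sketch of the LearnSPN recursion (Gens & Domingos, 2013).
# NOT the spflow implementation: clustering and the independence test are
# simplified stand-ins for k-means and the RDC-based test.
import numpy as np

MIN_INSTANCES = 100   # mirrors min_instances_slice
MIN_FEATURES = 2      # mirrors min_features_slice

def cluster_rows(data):
    """Split samples into two groups by the median of the first feature."""
    mask = data[:, 0] <= np.median(data[:, 0])
    return [data[mask], data[~mask]]

def partition_columns(data, threshold=0.3):
    """Group features whose absolute pairwise correlation exceeds threshold
    (a crude stand-in for the RDC-based independence test)."""
    corr = np.abs(np.corrcoef(data, rowvar=False))
    groups, unassigned = [], set(range(data.shape[1]))
    while unassigned:
        seed = unassigned.pop()
        group = {seed}
        for j in list(unassigned):
            if corr[seed, j] > threshold:
                group.add(j)
                unassigned.discard(j)
        groups.append(sorted(group))
    return groups

def learn_spn_sketch(data, scope):
    if len(scope) == 1:                      # single feature -> leaf
        return ("leaf", scope[0], data[:, 0].mean(), data[:, 0].std())
    if data.shape[0] < MIN_INSTANCES or len(scope) < MIN_FEATURES:
        # slice too small: factorize fully into independent leaves
        return ("product", [learn_spn_sketch(data[:, [i]], [s])
                            for i, s in enumerate(scope)])
    groups = partition_columns(data)
    if len(groups) > 1:                      # independent groups -> product node
        return ("product", [learn_spn_sketch(data[:, g], [scope[i] for i in g])
                            for g in groups])
    slices = cluster_rows(data)              # similar rows -> sum node
    weights = [len(s) / len(data) for s in slices]
    return ("sum", weights, [learn_spn_sketch(s, scope) for s in slices])
```

On data where features 0 and 1 are strongly correlated while the rest are independent, the root becomes a product node that separates the correlated pair from the remaining features.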
Parameter Learning: EM¶
Expectation-Maximization algorithm for parameter optimization.
- spflow.learn.expectation_maximization.expectation_maximization(module, data, max_steps=-1, verbose=False)[source]¶
Performs expectation-maximization optimization on a given module.
- Parameters:
  - module (Module) – Module to perform EM optimization on.
  - data (Tensor) – Two-dimensional tensor containing the input data. Each row corresponds to a sample.
  - max_steps (int) – Maximum number of iterations. Defaults to -1, in which case optimization runs until convergence.
  - verbose (bool) – Whether to print the log-likelihood for each iteration step. Defaults to False.
- Return type:
Tensor
- Returns:
One-dimensional tensor containing the average log-likelihood for each iteration step.
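The loop structure this function runs, and the per-iteration average log-likelihood it returns, can be illustrated on a small example outside spflow. The sketch below performs EM for a hypothetical two-component 1-D Gaussian mixture rather than an spflow module: the E-step computes posterior responsibilities, the M-step applies closed-form weighted maximum-likelihood updates, and iteration stops when the average log-likelihood converges.

```python
# Illustration of the EM loop: E-step (responsibilities), M-step
# (closed-form updates), repeated until the average log-likelihood
# converges. A stand-in 1-D Gaussian mixture, not an spflow module.
import numpy as np

def em_gmm(x, max_steps=100, tol=1e-6):
    mu = np.array([x.min(), x.max()])        # crude initialization
    sigma = np.array([1.0, 1.0])
    pi = np.array([0.5, 0.5])                # mixture weights
    avg_lls = []
    for _ in range(max_steps):
        # E-step: responsibilities r[n, k] proportional to pi_k * N(x_n | mu_k, sigma_k)
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
               / (sigma * np.sqrt(2 * np.pi))
        avg_lls.append(np.log(dens.sum(axis=1)).mean())
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood parameter updates
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(x)
        if len(avg_lls) > 1 and abs(avg_lls[-1] - avg_lls[-2]) < tol:
            break
    return np.array(avg_lls), mu, sigma, pi
```

As with expectation_maximization, the returned sequence of average log-likelihoods is non-decreasing, which is the standard EM monotonicity guarantee.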
Parameter Learning: Gradient Descent¶
Gradient descent-based parameter learning using PyTorch optimizers.
- spflow.learn.gradient_descent.train_gradient_descent(model, dataloader, epochs=-1, verbose=False, is_classification=False, optimizer=None, scheduler=None, lr=0.001, loss_fn=None, validation_dataloader=None, callback_batch=None, callback_epoch=None, nll_weight=1.0)[source]¶
Train model using gradient descent.
- Parameters:
  - model (Module) – Model to train; must inherit from Module.
  - dataloader (DataLoader) – Training data loader yielding batches.
  - epochs (int) – Number of training epochs. Must be positive.
  - verbose (bool) – Whether to log training progress per epoch.
  - is_classification (bool) – Whether this is a classification task.
  - optimizer (Optimizer | None) – Optimizer instance. Defaults to Adam if None.
  - scheduler (LRScheduler | None) – Learning rate scheduler. Defaults to MultiStepLR if None.
  - lr (float) – Learning rate for the default Adam optimizer.
  - loss_fn (Callable[[Module, Tensor], Tensor] | None) – Custom loss function. Defaults based on task type if None.
  - validation_dataloader (DataLoader | None) – Validation data loader for periodic evaluation.
  - callback_batch (Callable[[Tensor, int], None] | None) – Function called after each batch with (loss, step).
  - callback_epoch (Callable[[list[Tensor], int], None] | None) – Function called after each epoch with (losses, epoch).
  - nll_weight (float) – Weight for the density-estimation (NLL) term when is_classification=True. Controls the balance between discriminative and generative loss. Defaults to 1.0.
- Raises:
ValueError – If epochs is not a positive integer.
InvalidTypeError – If is_classification is True and model is not a Classifier instance.
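For density estimation, the default objective minimized here is the mean negative log-likelihood of each batch. The plain-PyTorch sketch below shows that training loop with an Adam optimizer and a DataLoader; the model is a hypothetical single Gaussian leaf with a learnable mean and log-scale, standing in for an spflow Module.

```python
# Sketch of the default generative objective (mean negative log-likelihood)
# optimized with Adam, as in gradient-descent training without a custom
# loss_fn. GaussianLeaf is a stand-in model, not an spflow class.
import math
import torch

class GaussianLeaf(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.mean = torch.nn.Parameter(torch.zeros(1))
        self.log_std = torch.nn.Parameter(torch.zeros(1))

    def log_likelihood(self, x):
        std = self.log_std.exp()
        return (-0.5 * ((x - self.mean) / std) ** 2
                - self.log_std - 0.5 * math.log(2 * math.pi))

torch.manual_seed(0)
data = torch.randn(512, 1) * 2.0 + 5.0      # samples from N(5, 2^2)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(data), batch_size=64, shuffle=True)

model = GaussianLeaf()
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
for epoch in range(200):
    for (batch,) in loader:
        optimizer.zero_grad()
        loss = -model.log_likelihood(batch).mean()  # NLL = generative loss
        loss.backward()
        optimizer.step()
```

A classification model would instead combine a discriminative loss with this NLL term, weighted by nll_weight.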