.. _library_gaussian_mixture_clusterer:

``gaussian_mixture_clusterer``
==============================

Gaussian mixture model clusterer. It uses deterministic expectation-maximization with diagonal covariance matrices. Supports continuous attributes only.

The library implements the ``clusterer_protocol`` defined in the
``clustering_protocols`` library. It provides predicates for learning a
clusterer from a dataset, assigning new instances to clusters, returning
Gaussian-mixture posterior component probabilities for new instances,
and exporting the learned clusterer as a list of predicate clauses or to
a file.

Datasets are represented as objects implementing the
``clustering_dataset_protocol`` protocol from the
``clustering_protocols`` library.


For the API documentation, open the `../../apis/library_index.html#gaussian_mixture_clusterer <../../apis/library_index.html#gaussian_mixture_clusterer>`__ link in a web browser.


To load this library, load the ``loader.lgt`` file:

::

   | ?- logtalk_load(gaussian_mixture_clusterer(loader)).

To test this library's predicates, load the ``tester.lgt`` file:

::

   | ?- logtalk_load(gaussian_mixture_clusterer(tester)).

To run the performance benchmark suite, load the
``tester_performance.lgt`` file:

::

   | ?- logtalk_load(gaussian_mixture_clusterer(tester_performance)).


The library supports ``first_k`` and deterministic ``spread`` initialization
for component means. The ``spread`` strategy uses a canonical first seed and
canonical tie-breaking so that equivalent row permutations produce the same
initialization.

In addition to the shared ``cluster/3`` predicate from the
``clustering_protocols`` library, this library provides a
Gaussian-mixture-specific predicate:

- ``cluster_probabilities(Clusterer, Instance, Probabilities)``: Returns
  posterior component probabilities for ``Instance`` as
  ``Cluster-Probability`` pairs in component-id order.

The following options can be passed to the ``learn/3`` predicate:


- ``k(K)``: Number of mixture components. Default is ``2``.
- ``initialization(Initialization)``: Mean initialization strategy.
  Options: ``spread`` (default) or ``first_k``.
- ``feature_scaling(FeatureScaling)``: Whether to standardize continuous
  attributes before clustering. Options: ``on`` (default) or ``off``.
- ``maximum_iterations(MaximumIterations)``: Maximum number of EM
  iterations. Default is ``100``.
- ``tolerance(Tolerance)``: Per-example average log-likelihood
  convergence tolerance. Default is ``0.0001``.
- ``covariance_regularization(Regularization)``: Positive diagonal
  covariance regularization constant. Default is ``0.001``.
- ``dead_component_policy(Policy)``: Handling for components whose total
  responsibility collapses below the dead-component threshold. Options:
  ``zero_weight`` (default) keeps the previous component with zero
  weight; ``reseed`` relocates the component to the least-confident
  training row and gives it one-example prior weight.

The learned clusterer is represented as a compound term with the functor
chosen by the user when exporting the clusterer and arity 5. For example:


::

   gaussian_mixture_clusterer(Encoders, Components, Weights, Options, Diagnostics)


Where:

- ``Encoders``: List of continuous attribute encoders storing attribute
  name, mean, and scale.
- ``Components``: List of ``component(Mean, Variances)`` terms in
  component-id order.
- ``Weights``: List of mixture weights in component-id order.
- ``Options``: Effective training options used to learn the clusterer.
- ``Diagnostics``: Training diagnostics including convergence status,
  iteration count, average log-likelihood, final delta, and options.
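
To make the roles of the encoders, components, and weights more concrete,
the following Python sketch shows how posterior component probabilities
can be derived from them under the standard diagonal-covariance Gaussian
mixture model. It is an illustration only and not the library's
implementation. The function names, the dictionary-based instance, the
tuple-based encoders and components, and the example values are all
hypothetical. The sketch also assumes that component means and variances
are stored in the standardized attribute space and that component ids
start at 1.

```python
import math

def standardize(instance, encoders):
    """Scale raw attribute values using (name, mean, scale) encoders."""
    return [(instance[name] - mean) / scale for name, mean, scale in encoders]

def log_joint(x, components, weights):
    """Return log(weight_k * N(x | mean_k, diag(variances_k))) per component."""
    scores = []
    for (mean, variances), weight in zip(components, weights):
        if weight <= 0.0:
            # A zero-weight component contributes no probability mass.
            scores.append(-math.inf)
            continue
        log_density = 0.0
        # With a diagonal covariance, the density factorizes per attribute.
        for value, mu, var in zip(x, mean, variances):
            log_density -= 0.5 * (math.log(2.0 * math.pi * var)
                                  + (value - mu) ** 2 / var)
        scores.append(math.log(weight) + log_density)
    return scores

def cluster_probabilities(instance, encoders, components, weights):
    """Return (component_id, posterior) pairs in component-id order.

    Assumes at least one component has a positive weight.
    """
    scores = log_joint(standardize(instance, encoders), components, weights)
    top = max(scores)
    # Log-sum-exp: subtract the largest score before exponentiating.
    unnormalized = [math.exp(score - top) for score in scores]
    total = sum(unnormalized)
    # Component ids starting at 1 are an assumption of this sketch.
    return [(k, u / total) for k, u in enumerate(unnormalized, start=1)]

# Hypothetical model: two attributes, two equally weighted components.
encoders = [("x", 0.0, 1.0), ("y", 0.0, 1.0)]
components = [([0.0, 0.0], [1.0, 1.0]), ([4.0, 4.0], [1.0, 1.0])]
weights = [0.5, 0.5]

for component_id, probability in cluster_probabilities(
        {"x": 0.5, "y": 0.2}, encoders, components, weights):
    print(component_id, probability)
```

Working with log-densities and normalizing through log-sum-exp keeps the
computation stable when an instance lies far from every component mean,
a case where the plain densities would underflow to zero.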