Did you know ... Search Documentation:
Pack logtalk -- logtalk-3.100.1/docs/handbook/_sources/libraries/kmedoids_clusterer.rst.txt

.. _library_kmedoids_clusterer:

kmedoids_clusterer

k-Medoids clusterer. It uses an iterative medoid-update algorithm with deterministic initialization and deterministic cluster assignments. Supports continuous attributes only.

The library implements the clusterer_protocol defined in the clustering_protocols library. It provides predicates for learning a clusterer from a dataset, assigning new instances to clusters, and exporting the learned clusterer as a list of predicate clauses or to a file.

Datasets are represented as objects implementing the clustering_dataset_protocol protocol from the clustering_protocols library.

API documentation

Open the `../../apis/library_index.html#kmedoids_clusterer <../../apis/library_index.html#kmedoids_clusterer>`__ link in a web browser.

Loading

To load this library, load the loader.lgt file:

::

| ?- logtalk_load(kmedoids_clusterer(loader)).

Testing

To test this library predicates, load the tester.lgt file:

::

| ?- logtalk_load(kmedoids_clusterer(tester)).

To run the performance benchmark suite, load the tester_performance.lgt file:

::

| ?- logtalk_load(kmedoids_clusterer(tester_performance)).

Features

  • Continuous Datasets: Accepts datasets containing only continuous attributes.
  • Distance Metrics: Supports Euclidean and Manhattan distances.
  • Deterministic Initialization: Supports first_k and deterministic spread initialization that repeatedly chooses the farthest example from the medoids selected so far.
  • Optional Feature Scaling: Continuous attributes can be standardized using z-score scaling.
  • Rich Training Diagnostics: Learned clusterers report training example count, convergence status, iteration count, and final medoid shift.
  • Portable Export: Learned clusterers can be exported as clauses or files and reused later.
  • Stable Empty-Cluster Handling: Empty clusters keep their previous medoids instead of failing.

Options

The following options can be passed to the learn/3 predicate:

  • k(K): Number of clusters to learn. Default is 2.
  • maximum_iterations(Iterations): Maximum number of medoid-update iterations. Default is 100.
  • tolerance(Tolerance): Maximum medoid shift threshold for convergence. Default is 1.0e-6.
  • initialization(Initialization): Medoid initialization strategy. Options: spread (default) or first_k.
  • distance_metric(Metric): Distance metric to use. Options: euclidean (default) or manhattan.
  • feature_scaling(FeatureScaling): Whether to standardize continuous attributes before clustering. Options: on (default) or off.

Diagnostics

The diagnostics/2 predicate returns a list containing:

  • model(kmedoids_clusterer)
  • medoid_count(Count)
  • training_example_count(Count)
  • convergence(Reason)
  • iterations(Count)
  • final_shift(Shift)
  • options(Options)

Clusterer representation

The learned clusterer is represented as a compound term with the functor chosen by the user when exporting the clusterer and arity 4. For example:

::

kmedoids_clusterer(Encoders, Medoids, Options, Diagnostics)

Where:

  • Encoders: List of continuous attribute encoders storing attribute name, mean, and scale.
  • Medoids: List of medoid vectors in cluster-id order.
  • Options: Effective training options used to learn the clusterer.
  • Diagnostics: Training diagnostics metadata returned by the diagnostics/2 predicate.

References

  1. Kaufman & Rousseeuw (1987) - "Clustering by Means of Medoids". In Statistical Data Analysis Based on the L1-Norm and Related Methods.
  2. Kaufman & Rousseeuw (1990) - "Finding Groups in Data: An Introduction to Cluster Analysis". Wiley.