.. _library_clustering_protocols:
clustering_protocols
This library provides protocols used in the implementation of machine
learning clustering algorithms. Datasets are represented as objects
implementing the clustering_dataset_protocol protocol. Clusterers
are represented as objects implementing the clusterer_protocol
protocol.
This library also provides test datasets and a small smoke-test suite. The shared test suite also includes cross-library comparison tests for clusterers that implement the same protocols, allowing common datasets and validation failures to be checked in one place.
Concrete clustering algorithms are intentionally out of scope for this
package. The goal is to provide a portable foundation for future
libraries such as kmeans_clusterer.
Open the `../../apis/library_index.html#clustering_protocols <../../apis/library_index.html#clustering_protocols>`__ link in a web browser.
To load all entities in this library, load the loader.lgt file:
::

   | ?- logtalk_load(clustering_protocols(loader)).
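
Once the loader has run, the two protocols described earlier can be inspected
with Logtalk's standard reflection support. The queries below are a minimal
sketch: they rely only on the protocol_property/2 built-in predicate and on
the protocol names given in this document. The variable name Predicates is
arbitrary, and the answers depend on the installed library version, so no
output is shown.

::

   % List the public predicates declared by the dataset protocol.
   | ?- protocol_property(clustering_dataset_protocol, public(Predicates)).

   % Same query for the clusterer protocol.
   | ?- protocol_property(clusterer_protocol, public(Predicates)).

Querying the protocols this way avoids guessing predicate names when writing
a new dataset or clusterer object.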
To run the library smoke tests and shared comparison tests, load the
tester.lgt file:
::

   | ?- logtalk_load(clustering_protocols(tester)).
Several sample datasets are included in the test_datasets directory:

- all_noise.lgt: A synthetic 2D continuous dataset with 4 examples
  and 2 continuous attributes (x, y). The examples are well
  separated, making the dataset useful for all-noise tests where a
  density-based clusterer should reject every point under a small
  epsilon radius.
- bridge_noise.lgt: A synthetic 2D continuous dataset with 10
  examples and 2 continuous attributes (x, y). The examples form
  two dense blobs connected by two sparse bridge points, making the
  dataset suitable for testing density-based algorithms that should keep
  the blobs separate while treating the bridge as noise.
- dead_component_blobs.lgt: A synthetic 2D continuous dataset with
  6 examples and 2 continuous attributes (x, y). The examples
  form two tiny ordered blobs sized so an over-specified Gaussian
  mixture with first_k initialization can drive one component fully
  dead, making the dataset useful for dead-component policy regression
  tests.
- duplicate_points.lgt: A synthetic 2D continuous dataset with 6
  examples and 2 continuous attributes (x, y). The examples
  include repeated coordinates forming a dense local cluster plus one
  isolated outlier, making the dataset useful for duplicate-point and
  density-threshold tests.
- imbalanced_three_modes.lgt: A synthetic 2D continuous dataset
  with 9 examples and 2 continuous attributes (x, y). The
  examples form one dense blob plus two much sparser distant modes,
  making the dataset useful for imbalanced-cluster and Gaussian mixture
  stress tests.
- iris_unlabeled.lgt: A compact Iris-derived dataset with 9
  examples and 4 continuous attributes (sepal_length,
  sepal_width, petal_length, petal_width). It is derived
  from the classic Iris dataset but intentionally omits species labels
  so it can be used with unsupervised algorithms.
- large_two_blobs.lgt: A synthetic 2D continuous dataset with 100
  examples and 2 continuous attributes (x, y). The examples form
  two dense 5x10 grids that are well separated, making the dataset
  useful for performance benchmarks that need a larger deterministic
  density-based clustering workload than the small smoke-test datasets.
- mixed_profiles.lgt: A mixed-feature dataset with 6 examples, 2
  continuous attributes (age, income), and 2 discrete attributes
  (channel, region). This dataset is intended for clustering
  algorithms that support both numeric and categorical features.
- scaling_bands.lgt: A synthetic 2D continuous dataset with 6
  examples and 2 continuous attributes (x, y). The examples form
  two horizontal bands with large variation along x, making the
  dataset useful for tests that compare clustering behavior with feature
  scaling turned on versus off.
- shopping_profiles.lgt: A categorical dataset with 6 examples and
  4 discrete attributes (channel, region, loyalty,
  device). The examples form two clear shopping-profile segments
  suitable for categorical clustering smoke tests.
- single_blob.lgt: A synthetic 2D continuous dataset with 6
  examples and 2 continuous attributes (x, y). The examples form
  a single compact blob suitable for one-cluster smoke tests.
- two_blobs.lgt: A synthetic 2D continuous dataset with 8 examples
  and 2 continuous attributes (x, y). The examples form two
  compact, well-separated blobs suitable for deterministic clustering
  smoke tests.
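
To check which dataset objects are actually loaded in a session, the
conforms_to_protocol/2 built-in predicate can enumerate, on backtracking,
every entity that conforms to the dataset protocol. This is a sketch under
two assumptions that this document does not confirm: that the sample
datasets have already been loaded (for example, as a side effect of loading
the tester.lgt file), and that each dataset file defines one object. The
object names are not assumed to match the file names.

::

   % Assumption: the sample datasets are already loaded.
   % Each solution binds Dataset to one conforming entity.
   | ?- conforms_to_protocol(Dataset, clustering_dataset_protocol).

If the query fails, no conforming dataset is loaded yet in the current
session.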