.. _library_anomaly_detection_protocols:
anomaly_detection_protocols
This library provides protocols used in the implementation of machine
learning anomaly-detection algorithms. Datasets are represented as
objects implementing the anomaly_dataset_protocol protocol. Anomaly
detectors are represented as objects importing the
anomaly_detector_common category, which implements the
anomaly_detector_protocol protocol. The category provides shared
learn/2, predict/3-4, diagnostics/2, diagnostic/2,
anomaly_detector_options/2, file export, baseline training-selection
helpers, and dataset helper predicates. It keeps threshold-based
prediction and export behavior separate from the algorithm-specific
learning, scoring, clause export, pretty-printing, and diagnostics
metadata code.
The shared category also provides reusable protected predicates for baseline-only training workflows. Libraries can use the baseline_class_values/1 and baseline_selection_policy/1 options via a single helper instead of reimplementing class-label validation and baseline-example filtering or rejection logic locally.
Learned detector terms can be validated explicitly using the shared check_anomaly_detector/1 and valid_anomaly_detector/1 predicates. This validation API is never called implicitly by scoring, prediction, printing, or export predicates.
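
The following queries sketch how explicit validation might look. The names my_detector (an object importing the anomaly_detector_common category) and my_dataset are hypothetical. The learn/2 argument order (dataset first, then the learned detector term) is an assumption. So is the reading that valid_anomaly_detector/1 simply fails for an invalid term while check_anomaly_detector/1 throws an error, which follows the usual Logtalk check/valid naming convention. Both predicates are also assumed to be public.

::

   % hypothetical object and dataset names; learn/2 argument order is assumed
   | ?- my_detector::learn(my_dataset, Detector),
        my_detector::valid_anomaly_detector(Detector).

   % assumed behavior: an invalid term raises an error that can be caught
   | ?- catch(
            my_detector::check_anomaly_detector(not_a_detector),
            Error,
            true
        ).

Consult the API documentation for the actual predicate scopes, argument modes, and error terms.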
This library also provides a reusable shared category, anomaly benchmark datasets, and a small family-level smoke-test suite.
The shared exporter in the anomaly_detector_common category writes a
header before the exported clauses in the following format:
::

   % exported anomaly detector predicate: Functor/Arity
   % training dataset: Dataset
   % options: Options
   % Functor(Detector)
   Functor(Detector)

The exported clauses serialize the learned detector term as a single
predicate argument so that loading the file gives a detector term that
can be passed directly to the predict/3-4 and score_all/3
predicates.
When exporting a serialized detector term, a noun such as detector/1 or model/1 is recommended for the exported predicate.
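
As a hedged sketch, a file exported with the detector/1 functor could be consumed as follows. The file name detector.pl and the object name my_detector are hypothetical. The file is assumed to contain plain Prolog clauses that can be loaded with the widely available consult/1 predicate. Because validation is never performed implicitly, the loaded term is validated explicitly before being used for prediction or scoring.

::

   % detector.pl and my_detector are hypothetical names
   | ?- consult('detector.pl'),
        detector(Detector),
        my_detector::valid_anomaly_detector(Detector).

If the query succeeds, Detector is bound to the serialized term and can then be passed to the predict/3-4 and score_all/3 predicates.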
Open the `../../apis/library_index.html#anomaly-detection-protocols <../../apis/library_index.html#anomaly-detection-protocols>`__ link in a web browser.
To load all entities in this library, load the loader.lgt file:

::

   | ?- logtalk_load(anomaly_detection_protocols(loader)).

To run the library smoke tests, shared category tests, and dataset
checks, load the tester.lgt file:

::

   | ?- logtalk_load(anomaly_detection_protocols(tester)).

Several sample datasets are included in the test_datasets directory:

- gaussian_anomalies.lgt: A synthetic 2D anomaly detection dataset
  with 48 examples and 2 continuous attributes (x, y). Normal points are
  sampled from a standard normal distribution centered at the origin.
  Anomalous points are placed far from the cluster center. Inspired by
  the canonical test case used in the Extended Isolation Forest paper by
  Hariri et al. (2019).

- malformed_anomalies.lgt: A negative fixture with invalid class
  labels for testing family-level dataset validation.

- mixed_anomalies.lgt: A small mixed-feature anomaly dataset with
  16 examples, 2 continuous attributes (age, income), and 2 categorical
  attributes (student, credit_rating). Includes missing values and
  uncommon feature combinations to exercise anomaly-detection code on
  heterogeneous data.

- mixed_distance_behaviors.lgt: A compact mixed-feature anomaly
  fixture with 8 examples, 2 continuous attributes (size, weight), and 2
  categorical attributes (color, shape). Intended for smoke-testing
  continuous plus categorical distance behavior and basic mixed-data
  handling.

- sensor_anomalies.lgt: A synthetic industrial sensor anomaly
  dataset with 40 examples and 3 continuous attributes (temperature,
  pressure, vibration). Contains missing values (14 examples with
  missing values, represented using anonymous variables). Normal
  readings cluster around typical operating ranges. Anomalous readings
  show extreme values indicating equipment malfunction.

- shuttle_anomalies.lgt: A subset of the Statlog Shuttle dataset
  with 50 examples and 9 continuous attributes representing sensor
  readings from the NASA Space Shuttle. Class 1 (Rad Flow) is the
  majority class (normal), while all other classes are treated as
  anomalies. Originally from Catlett, J. (1991). Available from the UCI
  Machine Learning Repository:
  https://archive.ics.uci.edu/dataset/148/statlog+shuttle

- water_potability.lgt: A water potability dataset with 48 examples
  and 9 continuous attributes (pH, hardness, solids, chloramines,
  sulfate, conductivity, organic carbon, trihalomethanes, turbidity).
  Normal instances represent potable water samples within acceptable
  ranges. Anomalous instances represent water samples with hazardous
  contamination levels. Based on the publicly available Water Quality
  dataset (Kadiwal, A., 2020, Kaggle).