.. _library_iqr_anomaly_detector:

iqr_anomaly_detector
====================
Statistical interquartile-range anomaly detector for continuous
datasets. The detector is based on Tukey interquartile fences: for
each known continuous attribute value, it learns ``Q1`` and ``Q3``,
computes the exceedance beyond ``[Q1, Q3]`` in interquartile-range
units normalized by the learned ``fence_multiplier/1``, and then
aggregates the per-attribute normalized deviations according to
``score_mode/1``. As a result, any value at or beyond
``[Q1 - K*IQR, Q3 + K*IQR]`` reaches the default anomaly boundary when
using ``fence_multiplier(K)``.
The library implements the anomaly_detector_protocol defined in the
anomaly_detection_protocols library. It learns a detector from a
continuous dataset, computes anomaly scores for new instances, predicts
normal or anomaly, and exports learned detectors as clauses or
files.
Datasets are represented as objects implementing the
anomaly_dataset_protocol protocol from the
anomaly_detection_protocols library. See the
anomaly_detection_protocols/test_datasets directory for examples.
Open the `../../apis/library_index.html#iqr_anomaly_detector <../../apis/library_index.html#iqr_anomaly_detector>`__ link in a web browser.
To load this library, load the ``loader.lgt`` file:

::

   | ?- logtalk_load(iqr_anomaly_detector(loader)).
To test this library's predicates, load the ``tester.lgt`` file:

::

   | ?- logtalk_load(iqr_anomaly_detector(tester)).
For each known continuous attribute value ``x``, the
library computes the positive exceedance of ``x`` beyond the interval
``[Q1, Q3]`` in interquartile-range units, where ``Q1`` and ``Q3`` are
the learned sample quartiles. It uses the ``statistics`` library
``sample`` object ``quartiles/4`` predicate to compute per-attribute
quartiles.

Baseline selection is controlled by the
``baseline_class_values(ClassValues)`` and
``baseline_selection_policy(Policy)`` options. The default baseline
class values are ``[normal]``. The default ``reject`` policy throws an
error if non-baseline examples are present, while ``filter`` removes
them before fitting.

With ``score_mode(root_mean_square)``, the
raw score is aggregated over attributes with positive normalized
deviation so that neutral inlier attributes do not dilute fence
anomalies. The learned detector stores a precomputed attribute schema
so that scoring reuses the same attribute ordering without rebuilding
it on every call.

The library supports dense anomaly detection using
``score_mode(root_mean_square)`` and sparse anomaly detection using
``score_mode(any_feature_extreme)``. The default root-mean-square mode
reuses the ``numberlist`` library Euclidean norm predicate as part of
the computation.

The default fence multiplier of ``1.5`` corresponds to the classical
Tukey inner fence cutoff, and the learned multiplier is applied
directly in the score path.

Raw deviation scores are mapped to the interval ``[0.0, 1.0)`` using
``Score = Raw / (1 + Raw)``. The default ``anomaly_threshold(0.5)``
corresponds to the learned Tukey fence cutoff after score scaling,
while remaining overridable in ``learn/3`` and ``predict/4``.

Scoring throws a
``domain_error(non_empty_known_values, AttributeNames)`` exception
when every declared feature is missing in the query. For a dataset
with no features, ``learn/2-3`` throws a
``domain_error(non_empty_features, Dataset)`` exception.

The following options are supported by the public API:
- ``anomaly_threshold(Threshold)``: threshold for ``predict/3-4``
  (default: ``0.5``)
- ``baseline_class_values(ClassValues)``: learn-time class labels that
  are admissible for baseline fitting (default: ``[normal]``)
- ``baseline_selection_policy(Policy)``: learn-time handling of
  examples whose class is not listed in ``baseline_class_values/1``;
  supported values are ``filter`` and ``reject`` (default: ``reject``)
- ``fence_multiplier(Multiplier)``: learn-time Tukey fence multiplier
  stored in the learned detector (default: ``1.5``)
- ``score_mode(Mode)``: learn-time score aggregation mode for
  ``learn/3``; supported values are ``root_mean_square`` and
  ``any_feature_extreme`` (default: ``root_mean_square``). If passed
  to ``predict/4``, it is ignored and the value stored in the learned
  detector is used.

The learned detector is represented by default as:
::

   iqr_detector(TrainingDataset, AttributeSchema, Encoders, Diagnostics)
Where:

- ``TrainingDataset``: training dataset object identifier
- ``AttributeSchema``: precomputed attribute ordering used for
  validation and scoring
- ``Encoders``: list of
  ``iqr_anomaly_detector(Attribute, Q1, Q3, Scale)`` records
- ``Diagnostics``: learned metadata terms including ``model/1``,
  ``training_dataset/1``, ``attribute_names/1``, ``feature_count/1``,
  ``example_count/1``, and ``options/1``
When exported using export_to_clauses/4 or export_to_file/4, this detector term is serialized directly as the single argument of the generated predicate clause so that the exported model can be loaded and reused as-is.
Scoring has three stages. First, the detector computes one per-attribute
IQR deviation score for each known attribute value using its exceedance
beyond the interval [Q1, Q3] in interquartile-range units. Second,
each per-attribute score is normalized by the learned
fence_multiplier/1, so that it reaches 1.0 exactly at the chosen
Tukey fence cutoff. Third, those normalized per-attribute scores are
aggregated into a single raw deviation score according to the learned
score_mode/1 option before being mapped to the interval
[0.0, 1.0) using Score = Raw / (1 + Raw).
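As a rough illustration, the three stages can be sketched in Python;
the quartile values, helper names, and the root-mean-square choice
below are illustrative assumptions, not the library's implementation:

.. code-block:: python

   import math

   def normalized_deviation(x, q1, q3, k=1.5):
       # Stage 1: positive exceedance beyond [Q1, Q3] in IQR units
       iqr = q3 - q1
       exceedance = max(q1 - x, x - q3, 0.0) / iqr
       # Stage 2: normalize so the value is 1.0 exactly at the Tukey fence
       return exceedance / k

   def anomaly_score(values, quartiles, k=1.5):
       # Stage 3: aggregate (root-mean-square mode over positive
       # deviations only) and squash into [0.0, 1.0)
       deviations = [normalized_deviation(x, q1, q3, k)
                     for x, (q1, q3) in zip(values, quartiles)]
       positive = [d for d in deviations if d > 0.0]
       raw = math.sqrt(sum(d * d for d in positive) / len(positive)) if positive else 0.0
       return raw / (1 + raw)

With ``Q1 = 0``, ``Q3 = 4``, and ``K = 1.5``, a value of ``10`` sits
exactly on the upper fence ``Q3 + K*IQR`` and scores ``0.5``, while an
inlier value scores ``0.0``.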
The score_mode/1 option does not change the per-attribute quartile
formula. It only changes the aggregation step. With
score_mode(root_mean_square), the raw score is the root mean square
of the positive normalized per-attribute deviations, computed over the
attributes with positive deviation so that inlier padding does not
dilute fence-reaching anomalies. With
score_mode(any_feature_extreme), the raw score is the maximum
normalized per-attribute deviation.
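The difference between the two modes can be seen on a small
hypothetical set of normalized per-attribute deviations (pure-Python
sketch, not library code):

.. code-block:: python

   import math

   def aggregate(deviations, mode):
       # Only attributes with positive normalized deviation take part
       positive = [d for d in deviations if d > 0.0]
       if not positive:
           return 0.0
       if mode == "root_mean_square":
           return math.sqrt(sum(d * d for d in positive) / len(positive))
       if mode == "any_feature_extreme":
           return max(positive)
       raise ValueError("unknown score mode: %s" % mode)

   deviations = [0.0, 0.3, 1.2]  # inlier, mild outlier, past the fence

Here ``root_mean_square`` yields about ``0.875`` (averaging over the
two positive deviations only, so the inlier does not dilute the
result), while ``any_feature_extreme`` yields ``1.2``, driven entirely
by the worst attribute.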
The fence_multiplier/1 option defines the classical Tukey anomaly
cutoff and directly normalizes the per-attribute deviation scores. With
any learned fence_multiplier(K), a per-attribute score of 1.0
means the query has reached the chosen Tukey fence on that attribute, so
the default normalized threshold 0.5 corresponds to that cutoff in
both supported aggregation modes.
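A quick sanity check, using a hypothetical helper built from the
formulas above, confirms that a value exactly at the upper fence maps
to the ``0.5`` threshold regardless of the multiplier:

.. code-block:: python

   def score_at_upper_fence(q1, q3, k):
       iqr = q3 - q1
       x = q3 + k * iqr                   # value exactly at the upper Tukey fence
       normalized = ((x - q3) / iqr) / k  # per-attribute deviation is always 1.0
       return normalized / (1 + normalized)

   for k in (1.0, 1.5, 3.0):
       assert score_at_upper_fence(0.0, 4.0, k) == 0.5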
The baseline_class_values/1 option declares which dataset class
labels are admissible for fitting the baseline quartiles and
interquartile ranges. The baseline_selection_policy/1 option then
controls what happens when other labels are present in the training
data. The default reject policy raises a
domain_error(baseline_only_training_data, Dataset) exception when
any non-baseline example is found. The filter policy removes
non-baseline examples before fitting.
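The two policies can be sketched as follows; the example
representation (feature values paired with a class label) and the
function name are illustrative assumptions, not the library's API:

.. code-block:: python

   def baseline_examples(examples, baseline_classes=("normal",), policy="reject"):
       # examples: list of (feature_values, class_label) pairs
       if policy == "reject":
           # any non-baseline example aborts fitting
           if any(label not in baseline_classes for _, label in examples):
               raise ValueError("baseline_only_training_data")
           return [values for values, _ in examples]
       if policy == "filter":
           # silently drop non-baseline examples before fitting
           return [values for values, label in examples
                   if label in baseline_classes]
       raise ValueError("unknown policy: %s" % policy)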
Attributes with zero observed interquartile range are assigned a
fallback scale of 1.0. This keeps the detector well-defined for
singleton datasets or constant columns while still yielding zero score
for matching values and positive scores for deviating values.
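For a constant column the interquartile range collapses to zero; the
fallback can be sketched as below (illustrative names, reusing the
deviation formula described above):

.. code-block:: python

   def attribute_scale(q1, q3):
       iqr = q3 - q1
       return iqr if iqr > 0.0 else 1.0  # fallback scale for zero observed IQR

   def deviation(x, q1, q3, k=1.5):
       # exceedance in scale units, normalized by the fence multiplier
       exceedance = max(q1 - x, x - q3, 0.0) / attribute_scale(q1, q3)
       return exceedance / k

A value matching the constant column scores ``0.0``, while any
deviating value scores positively, measured in absolute units against
the fallback scale of ``1.0``.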
The root-mean-square aggregation keeps the default threshold stable while avoiding dilution from missing or neutral inlier attributes.
Use score_mode(any_feature_extreme) when a single extreme feature
should be sufficient to flag an anomaly in high-dimensional data.