
.. _library_random_forest_regression:

random_forest_regression
========================

Random Forest regressor supporting continuous and mixed-feature datasets. The library implements the regressor_protocol defined in the regression_protocols library. It learns an ensemble of regression trees, each trained on a bootstrap sample with a random feature subset considered at every split, and predicts with the arithmetic mean of the individual tree predictions.
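The general scheme described above (bootstrap samples, a random feature subset at each split, mean-of-trees prediction) can be sketched language-neutrally. The following Python is an illustration of the standard random-forest idea, not the library's actual implementation; every name in it is hypothetical:

```python
import random
import statistics

def best_split(xs, ys, feature_indices):
    # Search only the sampled features for the variance-minimizing split.
    best = None
    for f in feature_indices:
        for threshold in sorted({x[f] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[f] <= threshold]
            right = [y for x, y in zip(xs, ys) if x[f] > threshold]
            if not left or not right:
                continue
            cost = sum((y - statistics.mean(left)) ** 2 for y in left) + \
                   sum((y - statistics.mean(right)) ** 2 for y in right)
            if best is None or cost < best[0]:
                best = (cost, f, threshold)
    return best

def grow_tree(xs, ys, max_depth, features_per_split, rng):
    if max_depth == 0 or len(set(ys)) == 1:
        return ('leaf', statistics.mean(ys))
    # A fresh random feature subset is drawn at every split.
    sampled = rng.sample(range(len(xs[0])), features_per_split)
    split = best_split(xs, ys, sampled)
    if split is None:
        return ('leaf', statistics.mean(ys))
    _, f, t = split
    left = [(x, y) for x, y in zip(xs, ys) if x[f] <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x[f] > t]
    return ('node', f, t,
            grow_tree(*zip(*left), max_depth - 1, features_per_split, rng),
            grow_tree(*zip(*right), max_depth - 1, features_per_split, rng))

def tree_predict(tree, x):
    while tree[0] == 'node':
        _, f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree[1]

def learn_forest(xs, ys, number_of_trees=10, max_depth=10, seed=1357911):
    rng = random.Random(seed)           # seeded, hence reproducible
    m = max(1, int(len(xs[0]) ** 0.5))  # sqrt default for features per split
    forest = []
    for _ in range(number_of_trees):
        # Bootstrap: draw n examples with replacement.
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        forest.append(grow_tree(bx, by, max_depth, m, rng))
    return forest

def forest_predict(forest, x):
    # Arithmetic mean of the individual tree predictions.
    return statistics.mean(tree_predict(tree, x) for tree in forest)
```

Because the pseudo-random generator is seeded, repeating the call with the same data, options, and seed reproduces the same forest, mirroring the reproducibility guarantee the library documents for its random_seed/1 option.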

API documentation
-----------------

Open the `../../apis/library_index.html#random_forest_regression <../../apis/library_index.html#random_forest_regression>`__ link in a web browser.

Loading
-------

To load this library, load the loader.lgt file:

::

   | ?- logtalk_load(random_forest_regression(loader)).

Testing
-------

To test this library's predicates, load the tester.lgt file:

::

   | ?- logtalk_load(random_forest_regression(tester)).

To run the performance benchmark suite, load the tester_performance.lgt file:

::

   | ?- logtalk_load(random_forest_regression(tester_performance)).

Features
--------

  • Bootstrap Ensembles: Trains multiple regression trees on bootstrap samples.
  • Random Feature Subsets: Samples a random subset of the available dataset attributes at each split of every tree.
  • Portable Seeded Sampling: Uses fast_random(xoshiro128pp) so bootstrap and split-level feature sampling are portable and reproducible.
  • Tree Averaging: Predicts numeric targets using the arithmetic mean of the tree predictions.
  • Tree Configuration: Exposes the underlying regression-tree split-feature, depth, minimum-leaf, variance-reduction, and scaling options.
  • Categorical Feature Encoding: Encodes categorical attributes with reference-level dummy coding derived from the declared dataset attribute values, plus a missing-value indicator; the resulting encoded features are treated as ordinary numeric split features by the tree learners.
  • Diagnostics Metadata: Learned regressors record model name, target, training example count, attribute count, tree count, and effective options, accessible using the shared regression diagnostics predicates.
  • Model Export: Learned regressors can be exported as predicate clauses or written to a file.
  • Reference Benchmarks: Includes a dedicated performance suite reporting training time, RMSE, and MAE for representative regression datasets.
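As an illustration of the categorical encoding listed above, here is a small Python sketch of reference-level dummy coding with a missing-value indicator. The function name, the choice of the first declared value as the reference level, and the use of None to mark a missing value are assumptions made for this sketch, not the library's conventions:

```python
def dummy_code(declared_values, value):
    # Reference-level dummy coding: the first declared value (the assumed
    # reference level) gets no column of its own; each remaining declared
    # value gets a 0/1 indicator column, plus one missing-value indicator.
    reference, *others = declared_values
    columns = [1.0 if value == level else 0.0 for level in others]
    missing = 1.0 if value is None else 0.0
    return columns + [missing]
```

For a declared attribute with values red, green, and blue, the value green encodes as [1.0, 0.0, 0.0], the reference value red as all zeros, and a missing value sets only the trailing indicator. The resulting numeric columns can then be split on by the trees like any continuous feature.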

Regressor representation
------------------------

The learned regressor is represented by default as:

  • rf_regressor(Trees, Diagnostics)

The exported predicate clauses therefore use the shape:

  • Functor(Trees, Diagnostics)

Diagnostics syntax
------------------

The diagnostics/2 predicate returns a list of metadata terms with the form:

::

   [ model(random_forest_regression),
     target(Target),
     training_example_count(TrainingExampleCount),
     options(Options),
     attribute_count(AttributeCount),
     tree_count(TreeCount)
   ]

Where:

  • model(random_forest_regression) identifies the learning algorithm that produced the regressor.
  • target(Target) stores the target attribute name declared by the training dataset.
  • training_example_count(TrainingExampleCount) stores the number of examples used during training.
  • options(Options) stores the effective learning options after merging the user options with the library defaults.
  • attribute_count(AttributeCount) stores the number of dataset attributes available to the ensemble before split-level subsampling.
  • tree_count(TreeCount) stores the number of trained regression trees in the ensemble.

    Use the regression_protocols diagnostic/2 and regressor_options/2 helper predicates when you only need a single metadata term or the effective options.

Options
-------

The learn/3 predicate accepts the following options:

  • number_of_trees/1: Number of regression trees to train in the ensemble. Increasing this value usually improves stability at the cost of additional training and prediction time. The default is 10.
  • maximum_features_per_split/1: Number of dataset attributes randomly sampled at each split when searching for the best partition. Accepted values are a positive integer or all. When omitted, the library uses the square root of the total number of available attributes, with a minimum of one attribute. Passing all disables split-level attribute subsampling.
  • maximum_depth/1: Maximum depth allowed for each regression-tree base learner. The default is 10.
  • minimum_samples_leaf/1: Minimum number of training examples required in each leaf of a base learner tree. The default is 1.
  • minimum_variance_reduction/1: Minimum split gain required by each base learner tree before accepting a partition. The default is 0.0.
  • feature_scaling/1: Controls z-score standardization of continuous attributes inside each regression-tree base learner. Accepted values are true and false. The default is false.
  • random_seed/1: Positive integer seed used by the portable fast_random(xoshiro128pp) pseudo-random generator when drawing bootstrap samples and split-level random feature subsets. Using the same seed with the same dataset and options reproduces the same learned regressor. The default is 1357911.
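The default for maximum_features_per_split described above (square root of the number of available attributes, with a minimum of one) can be sketched in Python. Whether the library floors or rounds the square root is not stated, so this hypothetical helper assumes flooring:

```python
import math

def default_features_per_split(attribute_count):
    # Floor of the square root of the attribute count, but never fewer
    # than one attribute; flooring is an assumption of this sketch.
    return max(1, math.isqrt(attribute_count))
```

For example, a dataset with 9 attributes would sample 3 attributes per split, and a dataset with a single attribute would always use that one attribute, which is equivalent to passing all.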