
.. _library_random_forest_regression:

random_forest_regression
========================

Random Forest regressor supporting continuous and mixed-feature datasets. The library implements the regressor_protocol defined in the regression_protocols library. It learns an ensemble of regression trees, each trained on a bootstrap sample with a random feature subset considered at every split, and predicts with the arithmetic mean of the individual tree predictions.
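The general scheme described above (bootstrap samples, a random feature subset at each split, mean-of-trees prediction) can be sketched language-neutrally. The following Python is an illustration of the standard random-forest idea, not the library's actual implementation; every name in it is hypothetical:

```python
import random
import statistics

def best_split(xs, ys, feature_indices):
    # Search only the sampled features for the variance-minimizing split.
    best = None
    for f in feature_indices:
        for threshold in sorted({x[f] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[f] <= threshold]
            right = [y for x, y in zip(xs, ys) if x[f] > threshold]
            if not left or not right:
                continue
            cost = sum((y - statistics.mean(left)) ** 2 for y in left) + \
                   sum((y - statistics.mean(right)) ** 2 for y in right)
            if best is None or cost < best[0]:
                best = (cost, f, threshold)
    return best

def grow_tree(xs, ys, max_depth, features_per_split, rng):
    if max_depth == 0 or len(set(ys)) == 1:
        return ('leaf', statistics.mean(ys))
    # A fresh random feature subset is drawn at every split.
    sampled = rng.sample(range(len(xs[0])), features_per_split)
    split = best_split(xs, ys, sampled)
    if split is None:
        return ('leaf', statistics.mean(ys))
    _, f, t = split
    left = [(x, y) for x, y in zip(xs, ys) if x[f] <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x[f] > t]
    return ('node', f, t,
            grow_tree(*zip(*left), max_depth - 1, features_per_split, rng),
            grow_tree(*zip(*right), max_depth - 1, features_per_split, rng))

def tree_predict(tree, x):
    while tree[0] == 'node':
        _, f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree[1]

def learn_forest(xs, ys, number_of_trees=10, max_depth=10, seed=1357911):
    rng = random.Random(seed)           # seeded, hence reproducible
    m = max(1, int(len(xs[0]) ** 0.5))  # sqrt default for features per split
    forest = []
    for _ in range(number_of_trees):
        # Bootstrap: draw n examples with replacement.
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        forest.append(grow_tree(bx, by, max_depth, m, rng))
    return forest

def forest_predict(forest, x):
    # Arithmetic mean of the individual tree predictions.
    return statistics.mean(tree_predict(tree, x) for tree in forest)
```

Because the pseudo-random generator is seeded, repeating the call with the same data, options, and seed reproduces the same forest, mirroring the reproducibility guarantee the library documents for its random_seed/1 option.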

API documentation
-----------------

Open the `../../apis/library_index.html#random_forest_regression <../../apis/library_index.html#random_forest_regression>`__ link in a web browser.

Loading
-------

To load this library, load the loader.lgt file:

::

   | ?- logtalk_load(random_forest_regression(loader)).

Testing
-------

To test this library's predicates, load the tester.lgt file:

::

   | ?- logtalk_load(random_forest_regression(tester)).

To run the performance benchmark suite, load the tester_performance.lgt file:

::

   | ?- logtalk_load(random_forest_regression(tester_performance)).

Features
--------

  • Bootstrap Ensembles: Trains multiple regression trees on bootstrap samples.
  • Random Feature Subsets: Samples a random subset of the available dataset attributes at each split of every tree.
  • Portable Seeded Sampling: Uses fast_random(xoshiro128pp) so bootstrap and split-level feature sampling are portable and reproducible.
  • Tree Averaging: Predicts numeric targets using the arithmetic mean of the tree predictions.
  • Tree Configuration: Exposes the underlying regression-tree split-feature, depth, minimum-leaf, variance-reduction, and scaling options.
  • Categorical Feature Encoding: Encodes categorical attributes with reference-level dummy coding derived from the declared dataset attribute values, plus a missing-value indicator; the resulting encoded features are treated as ordinary numeric split features by the tree learners.
  • Diagnostics Metadata: Learned regressors record model name, target, training example count, attribute count, tree count, and effective options, accessible using the shared regression diagnostics predicates.
  • Model Export: Learned regressors can be exported as predicate clauses or written to a file.
  • Reference Benchmarks: Includes a dedicated performance suite reporting training time, RMSE, and MAE for representative regression datasets.
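As an illustration of the categorical encoding listed above, here is a small Python sketch of reference-level dummy coding with a missing-value indicator. The function name, the choice of the first declared value as the reference level, and the use of None to mark a missing value are assumptions made for this sketch, not the library's conventions:

```python
def dummy_code(declared_values, value):
    # Reference-level dummy coding: the first declared value (the assumed
    # reference level) gets no column of its own; each remaining declared
    # value gets a 0/1 indicator column, plus one missing-value indicator.
    reference, *others = declared_values
    columns = [1.0 if value == level else 0.0 for level in others]
    missing = 1.0 if value is None else 0.0
    return columns + [missing]
```

For a declared attribute with values red, green, and blue, the value green encodes as [1.0, 0.0, 0.0], the reference value red as all zeros, and a missing value sets only the trailing indicator. The resulting numeric columns can then be split on by the trees like any continuous feature.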

Regressor representation
------------------------

The learned regressor is represented by default as:

  • rf_regressor(Trees, Diagnostics)

The exported predicate clauses therefore use the shape:

  • Functor(Trees, Diagnostics)

Diagnostics syntax
------------------

The diagnostics/2 predicate returns a list of metadata terms with the form:

::

   [ model(random_forest_regression),
     target(Target),
     training_example_count(TrainingExampleCount),
     options(Options),
     attribute_count(AttributeCount),
     tree_count(TreeCount)
   ]

Where:

  • model(random_forest_regression) identifies the learning algorithm that produced the regressor.
  • target(Target) stores the target attribute name declared by the training dataset.
  • training_example_count(TrainingExampleCount) stores the number of examples used during training.
  • options(Options) stores the effective learning options after merging the user options with the library defaults.
  • attribute_count(AttributeCount) stores the number of dataset attributes available to the ensemble before split-level subsampling.
  • tree_count(TreeCount) stores the number of trained regression trees in the ensemble.

    Use the regression_protocols diagnostic/2 and regressor_options/2 helper predicates when you only need a single metadata term or the effective options.

Options
-------

The learn/3 predicate accepts the following options:

  • number_of_trees/1: Number of regression trees to train in the ensemble. Increasing this value usually improves stability at the cost of additional training and prediction time. The default is 10.
  • maximum_features_per_split/1: Number of dataset attributes randomly sampled at each split when searching for the best partition. Accepted values are a positive integer or all. When omitted, the library uses the square root of the total number of available attributes, with a minimum of one attribute. Passing all disables split-level attribute subsampling.
  • maximum_depth/1: Maximum depth allowed for each regression-tree base learner. The default is 10.
  • minimum_samples_leaf/1: Minimum number of training examples required in each leaf of a base learner tree. The default is 1.
  • minimum_variance_reduction/1: Minimum split gain required by each base learner tree before accepting a partition. The default is 0.0.
  • feature_scaling/1: Controls z-score standardization of continuous attributes inside each regression-tree base learner. Accepted values are true and false. The default is false.
  • random_seed/1: Positive integer seed used by the portable fast_random(xoshiro128pp) pseudo-random generator when drawing bootstrap samples and split-level random feature subsets. Using the same seed with the same dataset and options reproduces the same learned regressor. The default is 1357911.
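The default for maximum_features_per_split described above (square root of the number of available attributes, with a minimum of one) can be sketched in Python. Whether the library floors or rounds the square root is not stated, so this hypothetical helper assumes flooring:

```python
import math

def default_features_per_split(attribute_count):
    # Floor of the square root of the attribute count, but never fewer
    # than one attribute; flooring is an assumption of this sketch.
    return max(1, math.isqrt(attribute_count))
```

For example, a dataset with 9 attributes would sample 3 attributes per split, and a dataset with a single attribute would always use that one attribute, which is equivalent to passing all.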