|Did you know ...||Search Documentation:|
|Memory management considerations|
Storing RDF triples in main memory provides much better performance than using external databases. Unfortunately, although memory is fairly cheap these days, main memory is severely limited when compared to disks. Memory usage breaks down to the following categories. Rough estimates of the memory usage is given for 64-bit systems. 32-bit system use slightly more than half these amounts.
Bucket arrays are resized if necessary. Old triples remain at their original location. This implies that a query may need to scan multiple buckets. The garbage collector may relocate old indexed triples. It does so by copying the old triple. The old triple is later reclaimed by GC. Reindexed triples will be reused, but many reindexed triples may result in a significant memory fragmentation.
The hash parameters can be controlled with rdf_set/1. Applications that are tight on memory and for which the query characteristics are more or less known can optimize performance and memory by fixing the hash-tables. By fixing the hash-tables we can tailor them to the frequent query patterns, we avoid the need for to check multiple hash buckets (see above) and we avoid memory fragmentation due to optimizing triples for resized hashes.
set_hash_parameters :- rdf_set(hash(s, size, 1048576)), rdf_set(hash(p, size, 1024)), rdf_set(hash(sp, size, 2097152)), rdf_set(hash(o, size, 1048576)), rdf_set(hash(po, size, 2097152)), rdf_set(hash(spo, size, 2097152)), rdf_set(hash(g, size, 1024)), rdf_set(hash(sg, size, 1048576)), rdf_set(hash(pg, size, 2048)).
pg. Parameter is one of:
The garbage collector
The RDF store has a garbage collector that runs in a separate thread named =__rdf_GC=. The garbage collector removes the following objects:
rdfs:subPropertyOfrelations that are related to old queries.
In addition, the garbage collector reindexes triples associated to
the hash-tables before the table was resized. The most recent resize
operation leads to the largest number of triples that require
reindexing, while the oldest resize operation causes the largest
slowdown. The parameter
optimize_threshold controlled by rdf_set/1
can be used to determine the number of most recent resize operations for
which triples will not be reindexed. The default is 2.
Normally, the garbage collector does it job in the background at a low priority. The predicate rdf_gc/0 can be used to reclaim all garbage and optimize all indexes.Warming up the database
The RDF store performs many operations lazily or in background threads. For maximum performance, perform the following steps:
warm_indexes :- ignore(rdf(s, _, _)), ignore(rdf(_, p, _)), ignore(rdf(_, _, o)), ignore(rdf(s, p, _)), ignore(rdf(_, p, o)), ignore(rdf(s, p, o)), ignore(rdf(_, _, _, g)), ignore(rdf(s, _, _, g)), ignore(rdf(_, p, _, g)).
__rdf_GCperforms garbage collection as long as it is considered‘useful'.
Using rdf_gc/0 should only be needed to ensure a fully clean database for analysis purposes such as leak detection.
The duplicates marks are used to reduce the administrative load of avoiding duplicate answers. Normally, the duplicates are marked using a background thread that is started on the first query that produces a substantial amount of duplicates.
The predicate rdf_monitor/2
allows registrations of call-backs with the RDF store. These call-backs
are typically used to keep other databases in sync with the RDF store.
library(library(semweb/rdf_persistency)) monitors the RDF
store for maintaining a persistent copy in a set of files and
library(library(semweb/rdf_litindex)) uses added and
deleted literal values to maintain a fulltext index of literals.
literal(Arg)of the triple's object. This event is introduced in version 2.5.0 of this library.
end(Nesting). Nesting expresses the nesting level of transactions, starting at‘0' for a toplevel transaction. Id is the second argument of rdf_transaction/2. The following transaction Ids are pre-defined by the library:
Mask is a list of events this monitor is interested in.
Default (empty list) is to report all events. Otherwise each element is
of the form +Event or -Event to include or exclude monitoring for
certain events. The event-names are the functor names of the events
described above. The special name
all refers to all events
assert(load) to assert events originating from rdf_load_db/1.
As loading triples using rdf_load_db/1
is very fast, monitoring this at the triple level may seriously harm
This predicate is intended to maintain derived data, such as a journal, information for undo, additional indexing in literals, etc. There is no way to remove registered monitors. If this is required one should register a monitor that maintains a dynamic list of subscribers like the XPCE broadcast library. A second subscription of the same hook predicate only re-assignes the mask.
The monitor hooks are called in the order of registration and in the
same thread that issued the database manipulation. To process all
changes in one thread they should be send to a thread message queue. For
all updating events, the monitor is called while the calling thread has
a write lock on the RDF store. This implies that these events are
processed strickly synchronous, even if modifications originate from
multiple threads. In particular, the
... updates ... end sequence is never interleaved with
other events. Same for