Pack sindice -- prolog/sindice.pl

This module provides the ability to formulate queries to the Sindice semantic web search engine, and to analyse the results obtained. It is based on an original module by Yves Raimond, but mostly rewritten by Samer Abdallah.

Sindice queries have several components:

  1. A keyword-based query, which may use the + and - operators to mark terms that are required or must be excluded. It may also use the Boolean operators AND, OR and NOT. Note that NOT is a binary operator in Sindice queries: it has the semantics of set difference, not set complement.
  2. A triple-based query, which is built from RDF triples using Boolean operators. In these queries, a '*' denotes an unconstrained URI or literal.
  3. One or more filters, which specify simple tests to be applied to the returned objects.
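
To make the keyword syntax in item 1 concrete, the sketch below assembles a keyword query atom from lists of required, excluded and optional terms. keyword_query/4 is a hypothetical helper and not part of this pack. It assumes that + and - are written as prefixes directly in front of a term. Whether such an atom can be passed unchanged inside a keyword/1 request is not stated in this documentation, so treat that as an assumption too.

```prolog
:- use_module(library(apply)).   % maplist/3
:- use_module(library(lists)).   % append/2

%% keyword_query(+Required:list(atom), +Excluded:list(atom),
%%               +Optional:list(atom), -Query:atom) is det.
%  Hypothetical helper (not part of the pack): prefixes required terms
%  with '+', excluded terms with '-', leaves optional terms bare, and
%  joins everything with single spaces.
keyword_query(Required, Excluded, Optional, Query) :-
    maplist(prefixed('+'), Required, Req),
    maplist(prefixed('-'), Excluded, Exc),
    append([Req, Exc, Optional], Terms),
    atomic_list_concat(Terms, ' ', Query).

% prefixed(+Prefix, +Term, -Prefixed): glue Prefix onto the front of Term.
prefixed(Prefix, Term, Prefixed) :-
    atom_concat(Prefix, Term, Prefixed).
```

For example, `keyword_query([jazz], [smooth], [piano], Q)` binds Q to `'+jazz -smooth piano'`.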

Other parameters determine what and how much information is returned:

  1. The page parameter determines which page of a multipage query is returned.
  2. The sortbydate parameter affects the order of results (the default is to sort by relevance).
  3. The field parameter determines what information is returned about each object.

Results

Results are retrieved as a named RDF graph. To interpret this, it is necessary to understand the Sindice ontology. The results consist of a set of resources of the class sindice:Result. Each item has the following properties:

  • dc:title :: literal
  • dc:created
  • sindice:cache
  • sindice:link :: url
  • sindice:rank :: xsd:integer
  • sindice:explicit_content_length
  • sindice:explicit_content_size
  • si_field:format
  • si_field:class
  • si_field:ontology
  • si_field:property

As well as information about each item, the results also contain data about the search itself, which is represented as a resource of class sindice:Query, and data about the returned page, represented as a resource of class sindice:Page. The sindice:Query has the following properties:

  • dc:title
  • dc:creator
  • dc:date
  • sindice:searchTerms :: literal
  • sindice:totalResults :: literal(integer)
  • sindice:itemsPerPage :: literal(integer)
  • sindice:first :: sindice:Page
  • sindice:last :: sindice:Page
  • result :: sindiceResult [nondet]

The sindice:Page has the following properties:

  • dc:title :: literal
  • sindice:next :: sindice:Page
  • sindice:previous :: sindice:Page
  • sindice:startIndex :: literal

Running queries

The core predicate for running a Sindice query is si_with_graph/4, which formulates a query from a term of type si_request and a list of options, and then temporarily loads a named graph containing the results into the RDF store. The last argument to si_with_graph/4 is a goal, which is called with the results graph in context. The graph is available only to this goal, and is unloaded when si_with_graph/4 finishes. You may use any RDF-related predicates to interrogate the graph.
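
As a rough illustration of this pattern, the sketch below runs a keyword query and counts the triples in the temporary graph while it is still loaded. report_result_triples/1 is a hypothetical helper. The sketch assumes the pack is installed so that library(sindice) can be loaded, and that the Sindice service is reachable. All work on the graph happens inside the goal, because the graph is gone once si_with_graph/4 returns.

```prolog
:- use_module(library(sindice)).          % this pack; assumed installed
:- use_module(library(semweb/rdf_db)).    % rdf/4
:- use_module(library(aggregate)).        % aggregate_all/3

%% report_result_triples(+Keyword:atom) is det.
%  Hypothetical helper: runs a keyword query and prints how many triples
%  the temporary results graph contains.
report_result_triples(Keyword) :-
    si_with_graph(keyword(Keyword), [], Graph,
                  ( % Graph is bound to the graph name before this goal runs
                    aggregate_all(count, rdf(_, _, _, Graph), Count),
                    format("~w: ~d triples in graph ~w~n",
                           [Keyword, Count, Graph])
                  )).
```

The goal prints its findings rather than returning them, so the sketch does not depend on how si_with_graph/4 treats bindings made by the goal.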

Built on top of this is a higher-level abstraction, si_with_result/5, which hides the details of large, multi-page result sets. It calls a supplied goal once (disjunctively) for each result, automatically issuing further Sindice requests to step through the pages. You may interrogate the properties of each result only within the supplied goal. For convenience, si_facet/2 extracts a number of properties from the RDF graph, converting RDF literals to Prolog values where appropriate.
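
A minimal sketch of how these two predicates might be combined is shown below. print_results/2 is a hypothetical helper. It assumes the pack is installed and the service is reachable, and it uses limit/2 from library(solution_sequences) to stop after a fixed number of results instead of walking the whole result set. The facets are read inside the goal, since results can be interrogated only there.

```prolog
:- use_module(library(sindice)).              % this pack; assumed installed
:- use_module(library(solution_sequences)).   % limit/2

%% print_results(+Keyword:atom, +Max:nonneg) is det.
%  Hypothetical helper: prints progress, title and link for at most Max
%  results of a keyword query.
print_results(Keyword, Max) :-
    forall(limit(Max,
                 si_with_result(keyword(Keyword), [], Current/Total, R,
                                ( si_facet(R, title(Title)),
                                  si_facet(R, link(Link)),
                                  format("~w/~w ~w <~w>~n",
                                         [Current, Total, Title, Link])
                                ))),
           true).
```

Because si_facet/2 is nondeterministic, a result that lacks a title or link makes the goal fail for that result, and the sketch simply moves on to the next one.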

Building queries

The three main parts of a Sindice query are represented by a term of type si_request, which has several forms. Currently, these are

si_request ---> keyword(atom)
              ; keywords(list(atom))
              ; uri(resource).

A resource can be an atomic URI or a Prefix:Suffix term as understood by rdf_global_id/2. Eventually, Sindice's full query syntax, including ntriple queries and Boolean operators, will be implemented.
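
The request forms above can be written out as in the following sketch, which turns each one into a query URL with sindice_url/3 and an empty option list. example_request/1 and show_request_urls/0 are hypothetical names, the search terms are arbitrary, and the foaf prefix is registered only for illustration. The sketch assumes the pack is installed.

```prolog
:- use_module(library(sindice)).          % this pack; assumed installed
:- use_module(library(semweb/rdf_db)).    % rdf_register_prefix/3

% Illustrative prefix; keep(true) leaves any existing binding untouched.
:- rdf_register_prefix(foaf, 'http://xmlns.com/foaf/0.1/', [keep(true)]).

%% example_request(-Req:si_request) is multi.
%  One example of each request form, with the uri/1 form shown both as
%  an atomic URI and as a Prefix:Suffix term.
example_request(keyword(jazz)).
example_request(keywords([jazz, piano])).
example_request(uri('http://xmlns.com/foaf/0.1/Person')).
example_request(uri(foaf:'Person')).

%% show_request_urls is det.
%  Prints each example request followed by the URL built for it.
show_request_urls :-
    forall(example_request(Req),
           ( sindice_url(Req, [], URL),
             format("~q~n  ~w~n", [Req, URL])
           )).
```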

See also: http://sindice.com/ http://sindice.com/developers/queryLanguage#QueryLanguage

Samer Abdallah, UCL, University of London; Yves Raimond, C4DM, Queen Mary, University of London

 sindice_url(+Req:si_request, +Opts:options, -URL:atom) is det
Formulates a Sindice query URL from a request and options. Recognised options:
sort_by_date(B:boolean)
If true, then results are sorted by date rather than relevance
fields(F:list(atom))
Specify which fields are returned for each result.
count(P:nonneg)
Number of results per page. (Incompatible with from option.)
page(P:nonneg)
Request a given page number. (Incompatible with from option.)
from(Offset:nonneg, Count:nonneg)
Starts from result number Offset+1, with Count results per page. Incompatible with count and page options.
The resulting URL can be loaded with rdf_load/2.
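
The arithmetic behind the from option can be sketched as follows. page_window/3 is a hypothetical helper, not part of the pack. It uses its own 1-based page numbering, which is an assumption of this sketch and says nothing about how the page option numbers pages.

```prolog
:- use_module(library(error)).   % must_be/2

%% page_window(+Page:positive_integer, +PerPage:positive_integer,
%%             -Opt) is det.
%  Hypothetical helper: Opt is the from/2 option selecting the Page-th
%  block (counting from 1) of PerPage results. Since from(Offset, Count)
%  starts at result number Offset+1, block Page starts at offset
%  (Page-1)*PerPage.
page_window(Page, PerPage, from(Offset, PerPage)) :-
    must_be(positive_integer, Page),
    must_be(positive_integer, PerPage),
    Offset is (Page - 1) * PerPage.
```

For example, `page_window(3, 10, Opt)` binds Opt to `from(20, 10)`, which covers results 21 to 30. With the pack loaded, the option could then be passed on as in `sindice_url(keyword(jazz), [Opt], URL)`.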
 si_with_graph(+Req:si_request, +Opts:options, -Graph:atom, +Goal:callable) is det
Formulates a Sindice query and temporarily loads the resulting RDF graph. Graph must be a variable; it is unified with the name of the loaded graph and then Goal is called. The graph is not available outside Goal.
 si_with_result(+Req:si_request, +Opts:options, -Prog:progress, -R:resource, +Goal:callable) is nondet
For each result produced by the query, R is unified with the URI of the sindice:Result and Goal is called. Multi-page result sets are traversed automatically and on demand. The graph containing the query results is not available outside Goal and is unloaded when si_with_result/5 is finished. Prog is a term of the form Current/Total, where Total is the total number of results and Current is the index of the result currently bound to R.
 si_facet(-R:resource, -F:si_facet) is nondet
True when search result R has facet F. Current facets are:
si_facet ---> link(url)
            ; cache(url)
            ; rank(nonneg)
            ; title(atom)
            ; class(resource)
            ; predicate(resource)
            ; formats(list(atom))
            ; explicit_content_size(nonneg)
            ; explicit_content_length(nonneg)
            .