Janus calling spaCy
The spaCy package provides natural language processing. This section illustrates the Janus library using spaCy. spaCy and a small English language model can be installed using

> pip install spacy
> python -m spacy download en_core_web_sm
After spaCy is installed, we can define english/1 to represent a Python object for the English language model using the code below. Note that by tabling this predicate as shared, the model is loaded only once and is accessible from multiple Prolog threads.
:- table english/1 as shared.
english(NLP) :-
    py_call(spacy:load(en_core_web_sm), NLP).
Calling english(X) results in X =
<py_English>(0x7f703c24f430), a blob that 
references a Python object. English is the name of the Python 
class to which the object belongs and 0x7f703c24f430 is the 
address of the object. The returned object implements the Python
callable protocol, i.e., it behaves as a function with 
additional properties and methods. Calling the model with a string 
results in a parsed document. We can use this from Prolog using the 
built-in __call__ method:
?- english(NLP),
   py_call(NLP:'__call__'("This is a sentence."), Doc).
NLP = <py_English>(0x7f703851b8e0),
Doc = [<py_Token>(0x7f70375be9d0), <py_Token>(0x7f70375be930),
       <py_Token>(0x7f70387f8860), <py_Token>(0x7f70376dde40),
       <py_Token>(0x7f70376de200)
      ].
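To see why the model can be called like a function and why the result comes back as a Prolog list, it helps to look at the Python side. The sketch below is not spaCy itself: the Model and Doc classes are hypothetical stand-ins that implement the same two protocols, the callable protocol (__call__) and the sequence protocol (__len__ and __getitem__). Janus converts sequence objects to Prolog lists by default, which is what happened above.

```python
class Doc:
    """Sequence-like container, analogous to spaCy's Doc."""
    def __init__(self, tokens):
        self._tokens = tokens

    def __len__(self):
        return len(self._tokens)

    def __getitem__(self, i):
        return self._tokens[i]


class Model:
    """Callable object, analogous to the loaded English model."""
    def __call__(self, text):
        # Real spaCy parses the text; this sketch just splits it.
        return Doc(text.split())


nlp = Model()
doc = nlp("This is a sentence.")  # the callable protocol: nlp.__call__(...)
print(list(doc))                  # the sequence protocol makes Doc iterable
# → ['This', 'is', 'a', 'sentence.']
```

Prolog's py_call/2 sees only the resulting Doc object; because it is a sequence, the default conversion turns it into a list, exactly as shown above.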
This is not what we want. Because the spaCy Doc class 
implements the sequence protocol, it is translated into a Prolog 
list of spaCy Token instances. The Doc class 
implements many more methods that we may wish to use. An example is
noun_chunks, which provides a Python generator 
that enumerates the noun chunks found in the input. Each chunk is an 
instance of Span, a sequence of Token 
instances that have the property text. The program below 
extracts the noun chunks of the input as a non-deterministic Prolog 
predicate. Note that we use py_object(true) to get the 
parsed document as a Python object. Next, we use py_iter/2 
to access the members of the Python iterator returned by Doc.noun_chunks 
as Python object references and finally we extract the text of each noun 
chunk as an atom. The SWI-Prolog (atom) garbage collector will take care 
of the Doc and Span Python objects. Immediate 
release of these objects can be enforced using py_free/1. Note that 
Janus implementations are not required to implement Python object 
reference garbage collection.
:- use_module(library(janus)).
:- table english/1 as shared.
english(NLP) :-
    py_call(spacy:load(en_core_web_sm), NLP).
noun(Sentence, Noun) :-
    english(NLP),
    py_call(NLP:'__call__'(Sentence), Doc, [py_object(true)]),
    py_iter(Doc:noun_chunks, Span, [py_object]),
    py_call(Span:text, Noun).
After which we can call
?- noun("This is a sentence.", Noun).
Noun = 'This' ;
Noun = 'a sentence'.
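py_iter/3 succeeds once per value yielded by the Python iterator, so backtracking over noun/2 enumerates the chunks one by one. The sketch below shows the generator shape behind this on the Python side; it is an illustrative stand-in, not spaCy's implementation, and the tagged input data is hypothetical (real spaCy yields Span objects computed from a full parse).

```python
def noun_chunks(tagged):
    """Yield maximal runs of adjacent words flagged as noun-phrase parts.

    `tagged` is a list of (word, is_noun_part) pairs.  A generator like
    this is what py_iter/3 consumes: each `yield` becomes one Prolog
    solution on backtracking.
    """
    chunk = []
    for word, is_noun_part in tagged:
        if is_noun_part:
            chunk.append(word)
        elif chunk:
            yield " ".join(chunk)
            chunk = []
    if chunk:                      # flush a chunk ending at the input's end
        yield " ".join(chunk)


tagged = [("This", True), ("is", False),
          ("a", True), ("sentence", True), (".", False)]
print(list(noun_chunks(tagged)))
# → ['This', 'a sentence']
```

The two yielded values correspond to the two solutions of noun/2 above; py_iter/3 pulls them lazily, so a large document's chunks are never materialised as one list.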
The subsequent section 4 documents the Prolog library library(janus).