Janus calling spaCy
The spaCy package provides natural language processing. This section illustrates the Janus library using spaCy. spaCy and a small English language model can be installed using

> pip install spacy
> python -m spacy download en_core_web_sm
After spaCy is installed, we can define english/1 to represent a Python object for the English language model using the code below. Note that by tabling this predicate as shared, the model is loaded only once and is accessible from multiple Prolog threads.
:- table english/1 as shared.
english(NLP) :-
    py_call(spacy:load(en_core_web_sm), NLP).
Calling english(X) results in X =
<py_English>(0x7f703c24f430), a blob that 
references a Python object. English is the name of the Python 
class to which the object belongs and 0x7f703c24f430 is the 
address of the object. The returned object implements the Python
callable protocol, i.e., it behaves as a function with 
additional properties and methods. Calling the model with a string 
results in a parsed document. We can use this from Prolog using the 
built-in __call__ method:
?- english(NLP),
   py_call(NLP:'__call__'("This is a sentence."), Doc).
NLP = <py_English>(0x7f703851b8e0),
Doc = [<py_Token>(0x7f70375be9d0), <py_Token>(0x7f70375be930),
       <py_Token>(0x7f70387f8860), <py_Token>(0x7f70376dde40),
       <py_Token>(0x7f70376de200)
      ].
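To see why the model can be called like a function and why the result comes back as a Prolog list, it helps to look at the Python side. The sketch below is not spaCy itself: the Model and Doc classes are hypothetical stand-ins that implement the same two protocols, the callable protocol (__call__) and the sequence protocol (__len__ and __getitem__). Janus converts sequence objects to Prolog lists by default, which is what happened above.

```python
class Doc:
    """Sequence-like container, analogous to spaCy's Doc."""
    def __init__(self, tokens):
        self._tokens = tokens

    def __len__(self):
        return len(self._tokens)

    def __getitem__(self, i):
        return self._tokens[i]


class Model:
    """Callable object, analogous to the loaded English model."""
    def __call__(self, text):
        # Real spaCy parses the text; this sketch just splits it.
        return Doc(text.split())


nlp = Model()
doc = nlp("This is a sentence.")  # the callable protocol: nlp.__call__(...)
print(list(doc))                  # the sequence protocol makes Doc iterable
# → ['This', 'is', 'a', 'sentence.']
```

Prolog's py_call/2 sees only the resulting Doc object; because it is a sequence, the default conversion turns it into a list, exactly as shown above.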
This is not what we want. Because the spaCy Doc class 
implements the sequence protocol, it is translated into a Prolog 
list of spaCy Token instances. The Doc class 
implements many more methods that we may wish to use. An example is
noun_chunks, which provides a Python generator 
that enumerates the noun chunks found in the input. Each chunk is an 
instance of Span, a sequence of Token 
instances that have the property text. The program below 
extracts the noun chunks of the input as a non-deterministic Prolog 
predicate. Note that we use py_object(true) to get the 
parsed document as a Python object. Next, we use py_iter/2 
to access the members of the Python iterator returned by Doc.noun_chunks 
as Python object references and finally we extract the text of each noun 
chunk as an atom. The SWI-Prolog (atom) garbage collector will take care 
of the Doc and Span Python objects. Immediate 
release of these objects can be enforced using py_free/1. Note that 
Janus implementations are not required to implement Python object 
reference garbage collection.
:- use_module(library(janus)).
:- table english/1 as shared.
english(NLP) :-
    py_call(spacy:load(en_core_web_sm), NLP).
noun(Sentence, Noun) :-
    english(NLP),
    py_call(NLP:'__call__'(Sentence), Doc, [py_object(true)]),
    py_iter(Doc:noun_chunks, Span, [py_object]),
    py_call(Span:text, Noun).
After which we can call
?- noun("This is a sentence.", Noun).
Noun = 'This' ;
Noun = 'a sentence'.
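py_iter/3 succeeds once per value yielded by the Python iterator, so backtracking over noun/2 enumerates the chunks one by one. The sketch below shows the generator shape behind this on the Python side; it is an illustrative stand-in, not spaCy's implementation, and the tagged input data is hypothetical (real spaCy yields Span objects computed from a full parse).

```python
def noun_chunks(tagged):
    """Yield maximal runs of adjacent words flagged as noun-phrase parts.

    `tagged` is a list of (word, is_noun_part) pairs.  A generator like
    this is what py_iter/3 consumes: each `yield` becomes one Prolog
    solution on backtracking.
    """
    chunk = []
    for word, is_noun_part in tagged:
        if is_noun_part:
            chunk.append(word)
        elif chunk:
            yield " ".join(chunk)
            chunk = []
    if chunk:                      # flush a chunk ending at the input's end
        yield " ".join(chunk)


tagged = [("This", True), ("is", False),
          ("a", True), ("sentence", True), (".", False)]
print(list(noun_chunks(tagged)))
# → ['This', 'a sentence']
```

The two yielded values correspond to the two solutions of noun/2 above; py_iter/3 pulls them lazily, so a large document's chunks are never materialised as one list.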
The subsequent section 4 documents the Prolog library library(janus).