Janus calling spaCy |
The spaCy package provides natural language processing. This section illustrates the Janus library using spaCy. Typically, spaCy and the English language model can be installed using
> pip install spacy
> python -m spacy download en
After spaCy is installed, we can define english/1 to represent the Python object for the English language model using the code below. Note that by tabling this predicate as shared, the model is loaded only once and is accessible from multiple Prolog threads.
:- table english/1 as shared.

english(NLP) :-
    py_call(spacy:load(en_core_web_sm), NLP).
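The load-once behaviour of the shared table has a familiar Python analogue: caching the loader so that repeated calls return the same object. The sketch below is a hypothetical stand-in (it does not use Janus or spaCy) that only illustrates this caching idea.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def english():
    # Hypothetical stand-in for spacy.load('en_core_web_sm'); the body
    # runs only on the first call, later calls reuse the cached result.
    return object()

model = english()
```

Every subsequent english() call returns the identical cached object, so the expensive model load happens once per process, just as the shared table caches the single answer of english/1.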
Calling english(X) results in X = <py_English>(0x7f703c24f430), a blob that references a Python object. English is the name of the Python class to which the object belongs and 0x7f703c24f430 is the address of the object. The returned object implements the Python callable protocol, i.e., it behaves as a function with additional properties and methods. Calling the model with a string results in a parsed document. We can use this from Prolog using the built-in __call__ method:
?- english(NLP),
   py_call(NLP:'__call__'("This is a sentence."), Doc).
NLP = <py_English>(0x7f703851b8e0),
Doc = [ <py_Token>(0x7f70375be9d0),
        <py_Token>(0x7f70375be930),
        <py_Token>(0x7f70387f8860),
        <py_Token>(0x7f70376dde40),
        <py_Token>(0x7f70376de200)
      ].
This is not what we want. Because the spaCy Doc class implements the sequence protocol, it is translated into a Prolog list of spaCy Token instances. The Doc class implements many more methods that we may wish to use. An example is noun_chunks, which provides a Python generator that enumerates the noun chunks found in the input. Each chunk is an instance of Span, a sequence of Token instances with a text property. The program below extracts the noun chunks of the input as a non-deterministic Prolog predicate. Note that we use py_object(true) to get the parsed document as a Python object. Next, we use py_iter/3 to access the members of the Python iterator returned by Doc.noun_chunks as Python object references, and finally we extract the text of each noun chunk as an atom. The SWI-Prolog (atom) garbage collector will take care of the Doc and Span Python objects. Immediate release of these objects can be enforced using py_free/1.[2]

[2] Janus implementations are not required to implement Python object reference garbage collection.
:- use_module(library(janus)).
:- table english/1.

english(NLP) :-
    py_call(spacy:load(en_core_web_sm), NLP).

noun(Sentence, Noun) :-
    english(NLP),
    py_call(NLP:'__call__'(Sentence), Doc, [py_object(true)]),
    py_iter(Doc:noun_chunks, Span, [py_object]),
    py_call(Span:text, Noun).
After which we can call
?- noun("This is a sentence.", Noun).
Noun = 'This' ;
Noun = 'a sentence'.
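The three Python protocols this example relies on, the callable protocol behind '__call__', the sequence protocol that turned Doc into a Prolog list, and the generator behind noun_chunks, can be sketched with hypothetical stand-ins. The classes below are not the real spaCy classes, and the chunks are canned rather than parsed; the sketch only shows the shape of the interface that Janus talks to.

```python
class Span:
    """Hypothetical stand-in for spacy.tokens.Span."""
    def __init__(self, text):
        self.text = text

class Doc:
    """Hypothetical stand-in for spacy.tokens.Doc."""
    def __init__(self, tokens, chunks):
        self._tokens = tokens
        self._chunks = chunks

    # Sequence protocol (__len__/__getitem__): Janus converts such
    # objects to a Prolog list unless py_object(true) is requested.
    def __len__(self):
        return len(self._tokens)

    def __getitem__(self, i):
        return self._tokens[i]

    # Generator-valued property, like spaCy's Doc.noun_chunks;
    # py_iter/3 enumerates its elements on backtracking.
    @property
    def noun_chunks(self):
        for chunk in self._chunks:
            yield Span(chunk)

class Model:
    """Hypothetical stand-in for the spaCy language model."""
    # Callable protocol: defining __call__ lets an instance be used
    # as a function, which py_call reaches via the '__call__' name.
    def __call__(self, text):
        tokens = text.split()                       # naive tokenisation
        return Doc(tokens, ["This", "a sentence"])  # canned noun chunks

nlp = Model()
doc = nlp("This is a sentence.")
chunks = [span.text for span in doc.noun_chunks]
```

Here list(doc) yields the tokens, mirroring the list translation seen earlier, while chunks collects the text of each Span, mirroring the answers enumerated by noun/2.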
The subsequent section 4 documents the Prolog library library(janus).