| Did you know ... | Search Documentation: |
| Pack pro2sql -- prolog/pro2sql.pro |
A lightweight Prolog to SQL SELECT translator.
This module is a minimal compiler of prolog "query predicates" to sql SELECT statements. I'm inspired by Draxler's work [1][], but I couldn't access his original paper, and it was too much work trying to use Mungall's implementation without much documentation. It seemed to me that it would be less work (and more fun) to implement a minimalist translator:
set_flag(regexp,Template).The aim is that a "query predicate" (a prolog predicate intended to retrieve results from fact predicates) works the same in prolog as in SQL -- except that SQL returns a table, where prolog returns variable result alternatives.
The use case is where you want to extract from a database through SQL and then further query the results in Prolog.
In SWI-Prolog, package_install(pro2sql).
Otherwise, clone the git repository (https://github.com/RdR1024/pro2sql), and copy the files from `...pro2sql/src/prolog/`
Let's assume that you have an SQL table called person with fields id,name,age. In Prolog, assert the following table definition predicate:
:- assert(table_def(person,[id,name,age],[prefix(mydataset)])).
Let's also assume that you have also have a predicate that would list all people with age >= 21.
adult(Name,Age):- person(_,_,Age), Age >= 21.
This query predicate is stock-standard Prolog. And if you had fact predicates for person/3 then the query predicate would simply give you all the names and ages of people over 21. However, let's say that the data is not in Prolog, but in the SQL database. You can translate that predicate into an SQL query like this:
:- pro2sql(adult(Name,Age),SQL). Name = person.name Age = person.age SQL = 'SELECT person.name, person.age FROM mydataset.person WHERE person.age >= 21'
And you can run this query on Google's BigQuery like this (assuming you have an account and that the person table is available in the mydataset dataset):
:- bq(adult(Name,Age),[name,age],Status,[]).
By default, the results become available in Prolog as adult_result(Name,Age) (same predicate name as the query, but with _result suffix). The results also exist in a downloaded `.csv` table, with the column headings given in the second argument of bq/4.
:- adult_result(Name,Age). Name = john Age = 23 ; ...
For more examples, see the code comments and also test_queries.pro in the test folder of the repository. The bq/4 predicate currently uses BigQuery commandline utility (bq). I intend to write an BiqQuery API interface for Prolog at some time in the future. In the meantime, I have also left some notes on BigQuery bq in the `.../doc` folder.
This code is currently alpha.
All requirements for now are implemented, and it seems to be working with a small set of tests. However, I want to test it more comprehensively before calling it beta.
'mytable.person' = jones) should be translated as a string (e.g. "jones"). This method is not perfect -- there could be atoms that contain full-stops (periods) that should be translated as a string. Therefore, the recommendation is to store string values as strings in Prolog.company_age(group(Company), avg(Age)). These aggregation functions will translate to SQL and work fine.
But if you use the predicate directly in Prolog, the aggregation function won't do anything. Rather, to get the same results as SQL, but in Prolog, you would have to run the query predicate through an aggregate predicate (like the aggregate/3 from library(aggregate).
For example, assume that you have a query predicate tonnage(sum(Weight)):- package(_,_,Weight). In Prolog, you could get
the same results as SQL, by applying aggregate/3 -- that is, aggregate(sum(Weight),tonnage(sum(Weight)),Total).
I use a lot of difference lists. My preferred notation is List^Tail, where List = [x1,x2,x3,...|Tail]. In other words, the `^Tail` suffix simply provides another copy of the tail variable. For example I use Select^ST as the difference list for the SELECT clauses. Typically, I use xT for the initial tail variabe (e.g. ST,FT,WT for the SELECT,FROM and WHERE tails respectively) and NxT for the "new" (after processing) tails.
The initial (empty) value when calling a predicate with difference lists is something like S^S. With difference lists, adding a new value happens like this: ST = [NewValue|NST]. Here, the new value is made into a list with the new tail and then that list is unified with the old tail. Alternative, if we have a list of new values, append(NewValues,NST,ST) would do the same.
[1] "Draxler C (1992) Prolog to SQL Compiler, version 1.0. Technical report, CIS Centre for Information and Speech, University of Munich"
BSD 3-Clause License
Copyright (c) 2023, Richard de Rozario
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
table(tablename).
Example. Note: we're using assert here from the prolog prompt, but table_def/3 can
be asserted from a prolog file like any other predicate. Also, load_csv/2 will
automatically create a table_dev/3.
:- assert(table_def(person,[id,name,age],[prefix(tiny),table(people)])).
Example
:- assert(table_def(person,[id,name,age],[prefix(tiny),table(people)])).
:- assert( (adults(Name,Age):- person(_,Name,Age), Age >=21) ).
:- pro2sql(adults(Name,Age), SQL).
Name = 'people.name',
Age = 'people.age',
SQL = 'SELECT people.name,people.age FROM tiny.people WHERE people.age >= 21'
:- assert( (teens(Name,Age):- person(_,Name,Age), between(16,21,Age), \+ member(Name,[john,jane])) ).
:- pro2sql(teens(Name,Age),SQL).
Name = 'people.name',
Age = 'people.age',
SQL = 'SELECT people.name,people.age FROM tiny.people WHERE people.age BETWEEN 16 AND 21 AND NOT ( people.name IN ("john","jane") )' .
Example
Example
:- concat_to_atom([hello,"John"],' ',Atom). Atom = 'hello "John"'
Options:
functor(F) - F is an atom that will be used as the predicate name for the rows read into the Prolog databasetable(T) - 'T' is the table name (as used for SQL). This option is passed on to table_def/3.prefix(P) - 'P' is a atom that will be used as a prefix for the table (as used in the SQL FROM clause). This option is passed on to table_def/3.headings(true)- true (default) means the csv has a headings (i.e. column names) row
Example (assumes existence of a file people.csv)
:- load_csv('people.csv',3,[functor(person),prefix(tiny),table(people)]).
:- table_def(person,Fields,Options).
Fields = [id,name,age]
Options = [prefix(tiny),table(people)] .
:- person(ID,Name,Age).
ID = 1
Name = john
Age = 13 ;...
@arg File The csv file to load @arg Fields List of atom field names. If Fields is an Integer, it specifies the number of csv columns and the field names may be read from first row of csv. @arg Options Options that control the fact predicates loaded into Prolog
Func/N from Opts functor(Func). If no functor is given in Opts
the `<pred>_result convention is used, where <pred>` is the functor
of the QueryPred.
Notes: at the moment this is coded with the assumption that
BigQuery's CLI utility is installed (i.e. bq). At some point
we'll replace that with direct calls to the BQ REST API.
Options:
table(T) - 'T' is the table name (as used for SQL). This option is passed on to table_def/3.prefix(P) - 'P' is a atom that will be used as a prefix for the table (as used in the SQL FROM clause).
This option is passed on to table_def/3.headings(true)- true (default) means the csv has a headings (i.e. column names) row. This option is passed to load_csv/3
Example
:- [test_queries]. :- bq(adults(_,_),[name,age],Status,[]). Status = 0 . :- adults_result(Name,Age). Name = jane, Age = 22 ; ...
@arg QueryPred The prolog query predicate @arg Status 0=successful, 1=error (error text in the results file Func.csv) @arg Fields Integer indicating number of result fields (columns), or a list of column headers @arg Opts Options
The following predicates are exported, but not or incorrectly documented.