pro2sql

A lightweight Prolog to SQL SELECT translator.

Introduction

This module is a minimal compiler of prolog "query predicates" to sql SELECT statements. I'm inspired by Draxler's work [1][], but I couldn't access his original paper, and it was too much work trying to use Mungall's implementation without much documentation. It seemed to me that it would be less work (and more fun) to implement a minimalist translator:

Automatically translates constraints like Age > 16, etc.
Autmatically translates member/2 to `... IN List`
Automatically translates SWI-Prolog between/3
Does "joins" through shared variable names between predicates
No LIKE, but instead REGEXP_CONTAINS (used in Google SQL). Configurable for other SQLs with set_flag(regexp,Template).
DISTINCT and ALL are implemented as one-place ops in Prolog (for head arguments).
Only implements translation to SELECT... (no UPDATE, etc.)
No subqueries. Generate subqueries as result tables first and then use as joins.
Includes a simple utility predicate that will translate a prolog query predicate, send it to Google BigQuery, and load the results as fact predicates.
The aim is that a "query predicate" (a prolog predicate intended to retrieve results from fact predicates) works the same in prolog as in SQL -- except that SQL returns a table, where prolog returns variable result alternatives.

The use case is where you want to extract from a database through SQL and then further query the results in Prolog.

Installation

In SWI-Prolog, package_install(pro2sql).

Otherwise, clone the git repository (https://github.com/RdR1024/pro2sql), and copy the files from `...pro2sql/src/prolog/`

Usage

Let's assume that you have an SQL table called person with fields id,name,age. In Prolog, assert the following table definition predicate:

:- assert(table_def(person,[id,name,age],[prefix(mydataset)])).

Let's also assume that you have also have a predicate that would list all people with age >= 21.

adult(Name,Age):- person(_,_,Age), Age >= 21.

This query predicate is stock-standard Prolog. And if you had fact predicates for person/3 then the query predicate would simply give you all the names and ages of people over 21. However, let's say that the data is not in Prolog, but in the SQL database. You can translate that predicate into an SQL query like this:

:- pro2sql(adult(Name,Age),SQL).
Name = person.name
Age = person.age
SQL = 'SELECT person.name, person.age FROM mydataset.person WHERE person.age >= 21'

And you can run this query on Google's BigQuery like this (assuming you have an account and that the person table is available in the mydataset dataset):

:- bq(adult(Name,Age),[name,age],Status,[]).

By default, the results become available in Prolog as adult_result(Name,Age) (same predicate name as the query, but with _result suffix). The results also exist in a downloaded `.csv` table, with the column headings given in the second argument of bq/4.

:- adult_result(Name,Age).
Name = john
Age = 23 ; ...

For more examples, see the code comments and also test_queries.pro in the test folder of the repository. The bq/4 predicate currently uses BigQuery commandline utility (bq). I intend to write an BiqQuery API interface for Prolog at some time in the future. In the meantime, I have also left some notes on BigQuery bq in the `.../doc` folder.

Status

This code is currently alpha.

All requirements for now are implemented, and it seems to be working with a small set of tests. However, I want to test it more comprehensively before calling it beta.

Notes

field names are always translated with the table name as prefix. We also use this to determine if an atom in a constraint (e.g. 'mytable.person' = jones) should be translated as a string (e.g. "jones"). This method is not perfect -- there could be atoms that contain full-stops (periods) that should be translated as a string. Therefore, the recommendation is to store string values as strings in Prolog.
The arguments in the head of the prolog query predicate may contain aggregation functions, for example company_age(group(Company), avg(Age)). These aggregation functions will translate to SQL and work fine. But if you use the predicate directly in Prolog, the aggregation function won't do anything. Rather, to get the same results as SQL, but in Prolog, you would have to run the query predicate through an aggregate predicate (like the aggregate/3 from library(aggregate).
For example, assume that you have a query predicate tonnage(sum(Weight)):- package(_,_,Weight). In Prolog, you could get the same results as SQL, by applying aggregate/3 -- that is, aggregate(sum(Weight),tonnage(sum(Weight)),Total).

Code Notes

I use a lot of difference lists. My preferred notation is List^Tail, where List = [x1,x2,x3,...|Tail]. In other words, the `^Tail` suffix simply provides another copy of the tail variable. For example I use Select^ST as the difference list for the SELECT clauses. Typically, I use xT for the initial tail variabe (e.g. ST,FT,WT for the SELECT,FROM and WHERE tails respectively) and NxT for the "new" (after processing) tails.

The initial (empty) value when calling a predicate with difference lists is something like S^S. With difference lists, adding a new value happens like this: ST = [NewValue|NST]. Here, the new value is made into a list with the new tail and then that list is unified with the old tail. Alternative, if we have a list of new values, append(NewValues,NST,ST) would do the same.

References

[1] "Draxler C (1992) Prolog to SQL Compiler, version 1.0. Technical report, CIS Centre for Information and Speech, University of Munich"

License

BSD 3-Clause License

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

table_def(+TableName, +FieldList, +Options) is multi[multifile]

Defines the table name and field names of an SQL table. TableName should match the functor (predicate name) of the prolog predicate that represents the table. If the actual table or view has a different name, this can be given as the option table(tablename).

Example. Note: we're using assert here from the prolog prompt, but table_def/3 can be asserted from a prolog file like any other predicate. Also, load_csv/2 will automatically create a table_dev/3.

:- assert(table_def(person,[id,name,age],[prefix(tiny),table(people)])).

Arguments:

`TableName`	- Atom that matches the predicate name, representing an SQL table, or a fact in Prolog.
`FieldList`	- List that contains atomic field names of the SQL table.
`Options`	- List of options. Currently either `prefix-PrefixForTables` and/or `table-TableName`

pro2sql(+Head, -SQL) is semidet

Translate a prolog predicate to an sql SELECT statement.

Example

:- assert(table_def(person,[id,name,age],[prefix(tiny),table(people)])).
:- assert( (adults(Name,Age):- person(_,Name,Age), Age >=21) ).
:- pro2sql(adults(Name,Age), SQL).
Name = 'people.name',
Age = 'people.age',
SQL = 'SELECT people.name,people.age FROM tiny.people WHERE people.age >= 21'

:- assert( (teens(Name,Age):- person(_,Name,Age), between(16,21,Age), \+ member(Name,[john,jane])) ).
:- pro2sql(teens(Name,Age),SQL).
Name = 'people.name',
Age = 'people.age',
SQL = 'SELECT people.name,people.age FROM tiny.people WHERE people.age BETWEEN 16 AND 21 AND NOT ( people.name IN ("john","jane") )' .

Arguments:

`Head`	- Predicate head clause of the predicate to translate into `SQL`
`SQL`	- Resulting `SQL` text

nondot_str(+Atom, -Convert) is semidet

Convert an atom that does not contain a full-stop into a string

Example

Arguments:

`Atom`	- the atom value
`Convert`	- the resulting atom or string

concat_to_atom(+List, -Atom) is semidet

concat_to_atom(+List, +Sep, -Atom) is semidet

Like atomic_list_concat/3 but strings retain quotes and compound terms are converted to atoms

Example

:- concat_to_atom([hello,"John"],' ',Atom).
Atom = 'hello "John"'

Arguments:

`List`	- `List` of atomic values to convert
`Sep`	- `Atom` used as separator
`Atom`	- The resulting atom with concatenated text

load_csv(+File, +Fields, +Options) is semidet

Load the contents of a headerless csv file into the prolog database as clauses of the functor specified in Options.

Options:

functor(F) - F is an atom that will be used as the predicate name for the rows read into the Prolog database
table(T) - 'T' is the table name (as used for SQL). This option is passed on to table_def/3.
prefix(P) - 'P' is a atom that will be used as a prefix for the table (as used in the SQL FROM clause). This option is passed on to table_def/3.
nodef - option to omit creation of a table_def/3 table definition (i.e. not an SQL table import)
headings(true)- true (default) means the csv has a headings (i.e. column names) row Example (assumes existence of a file people.csv)
```
:- load_csv('people.csv',3,[functor(person),prefix(tiny),table(people)]).
:- table_def(person,Fields,Options).
Fields = [id,name,age]
Options = [prefix(tiny),table(people)] .

:- person(ID,Name,Age).
ID = 1
Name = john
Age = 13 ;...
```
@arg File The csv file to load @arg Fields List of atom field names. If Fields is an Integer, it specifies the number of csv columns and the field names may be read from first row of csv. @arg Options Options that control the fact predicates loaded into Prolog

bq(+QueryPred, +Fields, -Status, +Opts) is semidet

Query the Google BigQuery database by translating the Prolog predicate QueryPred into SQL and sending it to BigQuery. Results are automatically loaded into Prolog as fact predicates Func/N from Opts functor(Func). If no functor is given in Opts the `<pred>_result convention is used, where <pred>` is the functor of the QueryPred.

Notes: at the moment this is coded with the assumption that BigQuery's CLI utility is installed (i.e. bq). At some point we'll replace that with direct calls to the BQ REST API.

Options:

table(T) - 'T' is the table name (as used for SQL). This option is passed on to table_def/3.
prefix(P) - 'P' is a atom that will be used as a prefix for the table (as used in the SQL FROM clause). This option is passed on to table_def/3.
nodef - option to omit creation of a table_def/3 table definition (i.e. not an SQL table import)
headings(true)- true (default) means the csv has a headings (i.e. column names) row. This option is passed to load_csv/3 Example
```
:- [test_queries].
:- bq(adults(_,_),[name,age],Status,[]).
Status = 0 .
:- adults_result(Name,Age).
Name = jane,
Age = 22 ; ...
```
@arg QueryPred The prolog query predicate @arg Status 0=successful, 1=error (error text in the results file Func.csv) @arg Fields Integer indicating number of result fields (columns), or a list of column headers @arg Opts Options

Undocumented predicates

The following predicates are exported, but not or incorrectly documented.

pro2sql(Arg1, Arg2, Arg3)
concat_to_atom(Arg1, Arg2, Arg3)
member_dif(Arg1, Arg2)
load_csv(Arg1, Arg2, Arg3, Arg4)