Did you know ... Search Documentation:
http_open.pl -- HTTP client library
PublicShow source

This library defines http_open/3, which opens an URL as a Prolog stream. The functionality of the library can be extended by loading two additional modules that act as plugins:

Loading this library causes http_open/3 to handle HTTPS connections. Relevant options for SSL certificate handling are handed to ssl_context/3. This plugin is loaded automatically if the scheme https is requested using a default SSL context. See the plugin for additional information regarding security.
Loading this library supports the gzip transfer encoding. This plugin is lazily loaded if a connection is opened that claims this transfer encoding.
Loading this library adds tracking cookies to http_open/3. Returned cookies are collected in the Prolog database and supplied for subsequent requests.
This library adds support for chunked encoding and makes the http_open/3 advertise itself as HTTP/1.1 instead of HTTP/1.0.

Here is a simple example to fetch a web-page:

?- http_open('http://www.google.com/search?q=prolog', In, []),
   copy_stream_data(In, user_output),
<!doctype html><head><title>prolog - Google Search</title><script>

The example below fetches the modification time of a web-page. Note that Modified is '' (the empty atom) if the web-server does not provide a time-stamp for the resource. See also parse_time/2.

modified(URL, Stamp) :-
       http_open(URL, In,
                 [ method(head),
                   header(last_modified, Modified)
       Modified \== '',
       parse_time(Modified, Stamp).

Then next example uses Google search. It exploits library(uri) to manage URIs, library(sgml) to load an HTML document and library(xpath) to navigate the parsed HTML. Note that you may need to adjust the XPath queries if the data returned by Google changes (this example indeed no longer works and currently fails at the first xpath/3 call)

:- use_module(library(http/http_open)).
:- use_module(library(xpath)).
:- use_module(library(sgml)).
:- use_module(library(uri)).

google(For, Title, HREF) :-
        uri_encoded(query_value, For, Encoded),
        atom_concat('http://www.google.com/search?q=', Encoded, URL),
        http_open(URL, In, []),
            load_html(In, DOM, []),
        xpath(DOM, //h3(@class=r), Result),
        xpath(Result, //a(@href=HREF0, text), Title),
        uri_components(HREF0, Components),
        uri_data(search, Components, Query),
        uri_query_components(Query, Parts),
        memberchk(q=HREF, Parts).

An example query is below:

?- google(prolog, Title, HREF).
Title = 'SWI-Prolog',
HREF = 'http://www.swi-prolog.org/' ;
Title = 'Prolog - Wikipedia',
HREF = 'https://nl.wikipedia.org/wiki/Prolog' ;
Title = 'Prolog - Wikipedia, the free encyclopedia',
HREF = 'https://en.wikipedia.org/wiki/Prolog' ;
Title = 'Pro-Log is logistiek dienstverlener m.b.t. vervoer over water.',
HREF = 'http://www.pro-log.nl/' ;
Title = 'Learn Prolog Now!',
HREF = 'http://www.learnprolognow.org/' ;
Title = 'Free Online Version - Learn Prolog
See also
- load_html/3 and xpath/3 can be used to parse and navigate HTML documents.
- http_get/3 and http_post/4 provide an alternative interface that convert the reply depending on the Content-Type header.
Source http_open(+URL, -Stream, +Options) is det
Open the data at the HTTP server as a Prolog stream. URL is either an atom specifying a URL or a list representing a broken-down URL as specified below. After this predicate succeeds the data can be read from Stream. After completion this stream must be closed using the built-in Prolog predicate close/1. Options provides additional options:
If false (default true), do not try to automatically authenticate the client if a 401 (Unauthorized) status code is received.
Send authorization. See also http_set_authorization/2. Supported schemes:
basic(+User, +Password)
HTTP Basic authentication.
HTTP Bearer authentication.
digest(+User, +Password)
HTTP Digest authentication. This option is only provided if the plugin library(http/http_digest) is also loaded.
Connect to the given Unix domain socket. In this scenario the host name and port or ignored. If the server replies with a redirect message and the host differs from the original host as normal TCP connection is used to handle the redirect. This option is inspired by curl(1)'s option `--unix-socket`.
Specify the Connection header. Default is close. The alternative is Keep-alive. This maintains a pool of available connections as determined by keep_connection/1. The library(http/websockets) uses Keep-alive, Upgrade. Keep-alive connections can be closed explicitly using http_close_keep_alive/1. Keep-alive connections may significantly improve repetitive requests on the same server, especially if the IP route is long, HTTPS is used or the connection uses a proxy.
Unify FinalURL with the final destination. This differs from the original URL if the returned head of the original indicates an HTTP redirect (codes 301, 302 or 303). Without a redirect, FinalURL is the same as URL if URL is an atom, or a URL constructed from the parts.
header(Name, -AtomValue)
If provided, AtomValue is unified with the value of the indicated field in the reply header. Name is matched case-insensitive and the underscore (_) matches the hyphen (-). Multiple of these options may be provided to extract multiple header fields. If the header is not available AtomValue is unified to the empty atom ('').
If provided, List is unified with a list of Name(Value) pairs corresponding to fields in the reply header. Name and Value follow the same conventions used by the header(Name,Value) option. See also raw_headers(-List) which provides the entire HTTP reply header in unparsed representation.
One of get (default), head, delete, post, put or patch. The head message can be used in combination with the header(Name, Value) option to access information on the resource without actually fetching the resource itself. The returned stream must be closed immediately.

If post(Data) is provided, the default is post.

Size is unified with the integer value of Content-Length in the reply header.
Version is a pair Major-Minor, where Major and Minor are integers representing the HTTP version in the reply header.
Ask for partial content. Range is a term Unit(From,To), where From is an integer and To is either an integer or the atom end. HTTP 1.1 only supports Unit = bytes. E.g., to ask for bytes 1000-1999, use the option range(bytes(1000,1999))
Do not install a decoding filter for Encoding. For example, using raw_encoding('applocation/gzip') the system will not decompress the stream if it is compressed using gzip.
Source map_method(+MethodID, -Method)[multifile]
Support additional METHOD keywords. Default are the official HTTP methods as defined by the various RFCs.
Source http:disable_encoding_filter(+ContentType) is semidet[multifile]
Do not use the Content-encoding as Transfer-encoding encoding for specific values of ContentType. This predicate is multifile and can thus be extended by the user.
Source http_set_authorization(+URL, +Authorization) is det
Set user/password to supply with URLs that have URL as prefix. If Authorization is the atom -, possibly defined authorization is cleared. For example:
?- http_set_authorization('http://www.example.com/private/',
                          basic('John', 'Secret'))
To be done
- Move to a separate module, so http_get/3, etc. can use this too.
Source iostream:open_hook(+Spec, +Mode, -Stream, -Close, +Options0, -Options) is semidet[multifile]
Hook implementation that makes open_any/5 support http and https URLs for Mode == read.
Source http_close_keep_alive(+Address) is det
Close all keep-alive connections matching Address. Address is of the form Host:Port. In particular, http_close_keep_alive(_) closes all currently known keep-alive connections.