http_open.pl -- HTTP client library

This library defines http_open/3, which opens a URL as a Prolog stream. The functionality of the library can be extended by loading additional modules that act as plugins:

library(http/http_ssl_plugin)
Loading this library causes http_open/3 to handle HTTPS connections. Relevant options for SSL certificate handling are handed to ssl_context/3. This plugin is loaded automatically if the scheme https is requested using a default SSL context. See the plugin for additional information regarding security.
library(zlib)
Loading this library supports the gzip transfer encoding. This plugin is lazily loaded if a connection is opened that claims this transfer encoding.
library(http/http_cookie)
Loading this library adds cookie tracking to http_open/3. Cookies returned by the server are collected in the Prolog database and supplied with subsequent requests.
library(http/http_stream)
This library adds support for chunked encoding. It is lazily loaded if the server sends a Transfer-encoding: chunked header.

Here is a simple example to fetch a web-page:

?- http_open('http://www.google.com/search?q=prolog', In, []),
   copy_stream_data(In, user_output),
   close(In).
<!doctype html><head><title>prolog - Google Search</title><script>
...

The example below fetches the modification time of a web-page. Note that Modified is '' (the empty atom) if the web-server does not provide a time-stamp for the resource. See also parse_time/2.

modified(URL, Stamp) :-
       http_open(URL, In,
                 [ method(head),
                   header(last_modified, Modified)
                 ]),
       close(In),
       Modified \== '',
       parse_time(Modified, Stamp).

The next example uses Google search. It uses library(uri) to manage URIs, library(sgml) to load an HTML document and library(xpath) to navigate the parsed HTML. Note that you may need to adjust the XPath queries if the data returned by Google changes (this example indeed no longer works and currently fails at the first xpath/3 call).

:- use_module(library(http/http_open)).
:- use_module(library(xpath)).
:- use_module(library(sgml)).
:- use_module(library(uri)).

google(For, Title, HREF) :-
        uri_encoded(query_value, For, Encoded),
        atom_concat('http://www.google.com/search?q=', Encoded, URL),
        http_open(URL, In, []),
        call_cleanup(
            load_html(In, DOM, []),
            close(In)),
        xpath(DOM, //h3(@class=r), Result),
        xpath(Result, //a(@href=HREF0, text), Title),
        uri_components(HREF0, Components),
        uri_data(search, Components, Query),
        uri_query_components(Query, Parts),
        memberchk(q=HREF, Parts).

An example query is below:

?- google(prolog, Title, HREF).
Title = 'SWI-Prolog',
HREF = 'http://www.swi-prolog.org/' ;
Title = 'Prolog - Wikipedia',
HREF = 'https://nl.wikipedia.org/wiki/Prolog' ;
Title = 'Prolog - Wikipedia, the free encyclopedia',
HREF = 'https://en.wikipedia.org/wiki/Prolog' ;
Title = 'Pro-Log is logistiek dienstverlener m.b.t. vervoer over water.',
HREF = 'http://www.pro-log.nl/' ;
Title = 'Learn Prolog Now!',
HREF = 'http://www.learnprolognow.org/' ;
Title = 'Free Online Version - Learn Prolog
...
See also
- load_html/3 and xpath/3 can be used to parse and navigate HTML documents.
- http_get/3 and http_post/4 provide an alternative interface that converts the reply depending on the Content-Type header.
user_agent(-Agent) is det [private]
Default value for the User-Agent header. It can be overruled using the option user_agent(Agent) of http_open/3.
http_open(+URL, -Stream, +Options) is det
Open the data at the HTTP server as a Prolog stream. URL is either an atom specifying a URL or a list representing a broken-down URL as specified below. After this predicate succeeds the data can be read from Stream. After completion this stream must be closed using the built-in Prolog predicate close/1. Options provides additional options:
authenticate(+Boolean)
If false (default true), do not try to automatically authenticate the client if a 401 (Unauthorized) status code is received.
authorization(+Term)
Send authorization. See also http_set_authorization/2. Supported schemes:
basic(+User, +Password)
HTTP Basic authentication.
bearer(+Token)
HTTP Bearer authentication.
digest(+User, +Password)
HTTP Digest authentication. This option is only provided if the plugin library(http/http_digest) is also loaded.
unix_socket(+Path)
Connect to the given Unix domain socket. In this scenario the host name and port are ignored. If the server replies with a redirect message and the host differs from the original host, a normal TCP connection is used to handle the redirect. This option is inspired by curl(1)'s option `--unix-socket`.
connection(+Connection)
Specify the Connection header. Default is close. The alternative is Keep-alive, which maintains a pool of available connections as determined by keep_connection/1. The library(http/websockets) uses Keep-alive, Upgrade. Keep-alive connections can be closed explicitly using http_close_keep_alive/1. Keep-alive connections can significantly speed up repeated requests to the same server, especially if the IP route is long, HTTPS is used or the connection uses a proxy.
final_url(-FinalURL)
Unify FinalURL with the final destination. This differs from the original URL if the reply header of the original request indicates an HTTP redirect (codes 301, 302 or 303). Without a redirect, FinalURL is the same as URL if URL is an atom, or a URL constructed from the parts.
header(Name, -AtomValue)
If provided, AtomValue is unified with the value of the indicated field in the reply header. Name is matched case-insensitively and the underscore (_) matches the hyphen (-). This option may be given multiple times to extract several header fields. If the header is not available, AtomValue is unified with the empty atom ('').
headers(-List)
If provided, List is unified with a list of Name(Value) pairs corresponding to fields in the reply header. Name and Value follow the same conventions used by the header(Name,Value) option. A pseudo header status_code(Code) is added to provide the HTTP status as an integer. See also raw_headers(-List) which provides the entire HTTP reply header in unparsed representation.
method(+Method)
One of get (default), head, delete, post, put or patch. The head message can be used in combination with the header(Name, Value) option to access information on the resource without actually fetching the resource itself. The returned stream must be closed immediately.

If post(Data) is provided, the default is post.

size(-Size)
Size is unified with the integer value of Content-Length in the reply header.
version(-Version)
Version is a pair Major-Minor, where Major and Minor are integers representing the HTTP version in the reply header.
range(+Range)
Ask for partial content. Range is a term Unit(From,To), where From is an integer and To is either an integer or the atom end. HTTP 1.1 only supports Unit = bytes. For example, to ask for bytes 1000-1999, use the option range(bytes(1000,1999)).
raw_encoding(+Encoding)
Do not install a decoding filter for Encoding. For example, using raw_encoding('application/gzip') the system will not decompress the stream if it is compressed using gzip.
raw_headers(-Lines)
Unify Lines with a list of strings that represents the complete reply header returned by the server. See also headers(-List).
redirect(+Boolean)
If false (default true), do not automatically redirect if a 3XX code is received. Must be combined with status_code(Code) and one of the header options to read the redirect reply. In particular, without status_code(Code) a redirect is mapped to an exception.
status_code(-Code)
If this option is present and Code unifies with the HTTP status code, do not translate errors (4xx, 5xx) into an exception. Instead, http_open/3 behaves as if 2xx (success) is returned, allowing the application to read the error document from the returned stream.
output(-Out)
Unify the output stream with Out and do not close it. This can be used to upgrade a connection.
timeout(+Timeout)
If provided, set a timeout on the stream using set_stream/2. With this option, if no new data arrives within Timeout seconds, the stream raises an exception. Default is to wait forever (infinite).
post(+Data)
Issue a POST request on the HTTP server. Data is handed to http_post_data/3.
proxy(+Host:Port)
Use an HTTP proxy to connect to the outside world. See also socket:proxy_for_url/3. This option overrules the proxy specification defined by socket:proxy_for_url/3.
proxy(+Host, +Port)
Synonym for proxy(+Host:Port). Deprecated.
proxy_authorization(+Authorization)
Send authorization to the proxy. Otherwise the same as the authorization option.
bypass_proxy(+Boolean)
If true, bypass proxy hooks. Default is false.
request_header(Name=Value)
Additional name-value pairs are added to the HTTP request header in the order of appearance. No interpretation is done.
max_redirect(+Max)
Sets the maximum length of a redirection chain. This is needed for some IRIs that redirect indefinitely to other IRIs without looping (e.g., redirecting to IRIs with a random element in them). Max must be either a non-negative integer or the atom infinite. The default value is 10.
user_agent(+Agent)
Defines the value of the User-Agent field of the HTTP header. Default is SWI-Prolog.
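As described for redirect(+Boolean), the options redirect(false), status_code(Code) and header(Name, Value) can be combined to inspect a redirect instead of following it. The following is a minimal sketch; the helper name redirect_target/3 is hypothetical, and it assumes that only the status code and the Location header of the first reply are of interest.

```prolog
:- use_module(library(http/http_open)).

%!  redirect_target(+URL, -Code, -Location) is det.
%
%   Hypothetical helper: send one request without following redirects
%   and report the status code and the Location reply header.
%   Location is '' if the server sent no Location header.
redirect_target(URL, Code, Location) :-
    http_open(URL, In,
              [ redirect(false),            % report a 3xx reply instead of following it
                status_code(Code),          % needed, else a redirect maps to an exception
                header(location, Location)  % '' if the header is absent
              ]),
    close(In).                              % the reply body is not needed
```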

The hook http:open_options/2 can be used to provide default options based on the broken-down URL. The option status_code(-Code) is particularly useful to query REST interfaces that commonly return status codes other than 200 that need to be processed by the client code.
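For such REST-style clients, status_code(-Code) lets the caller branch on the status code and still read the reply body. A minimal sketch, with the hypothetical helper http_get_status/3 reading the whole body into a string:

```prolog
:- use_module(library(http/http_open)).

%!  http_get_status(+URL, -Code, -Body) is det.
%
%   Hypothetical helper: GET URL and return the HTTP status code and
%   the reply body as a string.  Because status_code(Code) is passed,
%   4xx and 5xx replies do not raise an exception; the error document
%   is returned in Body instead.
http_get_status(URL, Code, Body) :-
    setup_call_cleanup(
        http_open(URL, In, [status_code(Code)]),
        read_string(In, _Length, Body),     % read up to end of stream
        close(In)).
```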

Arguments:
URL- is either an atom or string (url) or a list of parts.

When provided, this list may contain the fields scheme, user, password, host, port, path and either query_string (whose argument is an atom) or search (whose argument is a list of Name(Value) or Name=Value compound terms). Only host is mandatory. The example below opens the URL http://www.example.com/my/path?q=Hello%20world&lang=en. Note that values must not be URL-encoded because the library inserts the required encoding.

http_open([ host('www.example.com'),
            path('/my/path'),
            search([ q='Hello world',
                     lang=en
                   ])
          ], In, [])
throws
- error(existence_error(url, Id),Context) is raised if the HTTP result code is not in the range 200..299. Context has the shape context(Message, status(Code, TextCode)), where Code is the numeric HTTP code and TextCode is the textual description thereof provided by the server. Message may provide additional details or may be unbound.
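Without status_code(-Code), a non-2xx reply surfaces as the exception above. The sketch below, using the hypothetical helpers fetch_or_status/2 and fetch_body/2, turns it into a result term. It relies on the context(Message, status(Code, TextCode)) shape documented here and matches on that context rather than on existence_error alone, on the assumption that some codes may be mapped to other Prolog errors (see map_error_code/2).

```prolog
:- use_module(library(http/http_open)).

%!  fetch_or_status(+URL, -Result) is det.
%
%   Hypothetical helper: Result is ok(Body) for a 2xx reply and
%   http_error(Code) if http_open/3 raises an error whose context
%   carries status(Code, TextCode).  Other exceptions propagate.
fetch_or_status(URL, Result) :-
    catch(( fetch_body(URL, Body),
            Result = ok(Body)
          ),
          error(_Formal, context(_, status(Code, _TextCode))),
          Result = http_error(Code)).

%   Read the whole reply body into a string and always close the stream.
fetch_body(URL, Body) :-
    setup_call_cleanup(
        http_open(URL, In, []),
        read_string(In, _Length, Body),
        close(In)).
```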
See also
- ssl_context/3 for SSL related options if library(http/http_ssl_plugin) is loaded.
hooked_options(+Parts, -Options) is nondet [private]
Calls http:open_options/2 and if necessary upgrades old SSL cacerts_file(File) option to a cacerts(List) option to ensure proper merging of options.
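A minimal sketch of a clause for the http:open_options/2 hook consulted here. The host api.example.com and the chosen default options are purely illustrative, and how these defaults are merged with the options passed to http_open/3 is left to the library.

```prolog
:- use_module(library(http/http_open)).

:- multifile http:open_options/2.

%   Hypothetical policy: requests to api.example.com get a 10 second
%   timeout and an Accept header asking for JSON.  Parts is the
%   broken-down URL, so the host is available as host(Host).
http:open_options(Parts, Options) :-
    memberchk(host('api.example.com'), Parts),
    Options = [ timeout(10),
                request_header('Accept'='application/json')
              ].
```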
autoload_https(+Parts) is det [private]
If the requested scheme is https or wss, load the HTTPS plugin.
send_rec_header(+StreamPair, -Stream, +Host, +RequestURI, +Parts, +Options) is det [private]
Send header to Out and process reply. If there is an error or failure, close In and Out and return the error or failure.
http_version(-Version:atom) is det [private]
HTTP version we publish. We can only use 1.1 if we support chunked encoding.
map_method(+MethodID, -Method) [multifile]
Support additional METHOD keywords. Defaults are the official HTTP methods as defined by the various RFCs.
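Additional methods can be added by defining clauses for this multifile predicate. A sketch under two assumptions: the predicate lives in module http_open, and Method is the atom sent on the request line. The WebDAV method PROPFIND serves only as an illustration.

```prolog
:- use_module(library(http/http_open)).

%   Assumption: map_method/2 is defined in module http_open and maps
%   the value of the method(+Method) option to the method name sent
%   on the request line.
:- multifile http_open:map_method/2.

http_open:map_method(propfind, 'PROPFIND').   % WebDAV, RFC 4918
```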
x_headers(+Options, +URI, +Out) is det [private]
Emit extra headers from request_header(Name=Value) options in Options.
To be done
- Use user/password fields
auth_header(+AuthOption, +Options, +HeaderName, +Out) [private]
do_open(+HTTPVersion, +HTTPStatusCode, +HTTPStatusComment, +Header, +Options, +Parts, +Host, +In, -FinalIn) is det [private]
Handle the HTTP status once available. If 200-299, we are ok. If a redirect, redo the open, returning a new stream. Else issue an error.
Errors
- existence_error(url, URL)
redirect_limit_exceeded(+Options:list(compound), -Max:nonneg) is semidet [private]
True if we have exceeded the maximum redirection length (default 10).
redirect_loop(+Parts, +Options) is semidet [private]
True if we are in a redirection loop. Note that some sites redirect once to the same place using cookies or similar, so we allow for two tries. In fact, we should probably test whether authorization or cookie headers have changed.
redirect_options(+Parts, +RedirectedParts, +Options0, -Options) is det [private]
A redirect from a POST should do a GET on the returned URI. This means we must remove the method(post) and post(Data) options from the original option-list.

If we are connecting over a Unix domain socket, we drop the unix_socket option if the redirect host does not match the initial host.

map_error_code(+HTTPCode, -PrologError) is semidet [private]
Map HTTP error codes to Prolog errors.
To be done
- Many more maps. Unfortunately many have no sensible Prolog counterpart.
open_socket(+Address, -StreamPair, +Options) is det [private]
Create and connect a client socket to Address. Options:
timeout(+Timeout)
Sets timeout on the stream, after connecting the socket.
To be done
- Make timeout also work on tcp_connect/4.
- This is the same as do_connect/4 in http_client.pl
parse_headers(+Lines, -Headers:list(compound)) is det [private]
Parse the header lines for the headers(-List) option. Invalid header lines are skipped, printing a warning using print_message/2.
return_final_url(+Options) is semidet [private]
If Options contains final_url(URL), unify URL with the final URL after redirections.
transfer_encoding_filter(+Lines, +In0, -In, +Options) is det [private]
Install filters depending on the transfer encoding. If In0 is a stream-pair, we close the output side. If transfer-encoding is not specified, the content-encoding is interpreted as a synonym for transfer-encoding, because many servers incorrectly depend on this. Exceptions to this are content-types for which disable_encoding_filter/1 holds.
http:disable_encoding_filter(+ContentType) is semidet [multifile]
Do not use the Content-encoding as Transfer-encoding for specific values of ContentType. This predicate is multifile and can thus be extended by the user.
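A sketch of extending this hook. The content type 'application/x-my-archive' is hypothetical, and it is an assumption that ContentType is the media type without parameters such as charset.

```prolog
:- use_module(library(http/http_open)).

:- multifile http:disable_encoding_filter/1.

%   Hypothetical content type: for such replies the Content-encoding
%   header is not treated as a Transfer-encoding, so the stream returned
%   by http_open/3 delivers the encoded (e.g., gzipped) bytes as sent.
http:disable_encoding_filter('application/x-my-archive').
```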
transfer_encoding(+Lines, -Encoding) is semidet [private]
True if Encoding is the value of the Transfer-encoding header.
content_encoding(+Lines, -Encoding) is semidet [private]
True if Encoding is the value of the Content-encoding header.
read_header(+In:stream, +Parts, -Version, -Code:int, -Comment:atom, -Lines:list) is det [private]
Read the HTTP reply header. If the reply is completely empty, an existence error is thrown. If the reply header is otherwise invalid, a 500 HTTP error is simulated, having the comment Invalid reply header.
Arguments:
Parts- A list of compound terms that describe the parsed request URI.
Version- HTTP reply version as Major-Minor pair
Code- Numeric HTTP reply-code
Comment- Comment of reply-code as atom
Lines- Remaining header lines as code-lists.
Errors
- existence_error(http_reply, Uri)
content_length(+Header, -Length:int) is semidet [private]
Find the Content-Length in an HTTP reply-header.
integer(-Int)// [private]
Read 1 or more digits and return as integer.
rest(-Atom:atom)// [private]
Get rest of input as an atom.
reply_header(+Lines, +Options) is det [private]
Return the entire reply header as a list of strings to the option raw_headers(-Headers).
http_set_authorization(+URL, +Authorization) is det
Set user/password to supply with URLs that have URL as prefix. If Authorization is the atom -, any previously defined authorization is cleared. For example:
?- http_set_authorization('http://www.example.com/private/',
                          basic('John', 'Secret')).
To be done
- Move to a separate module, so http_get/3, etc. can use this too.
authorization(+URL, -Authorization) is semidet [private]
True if Authorization must be supplied for URL.
To be done
- Cleanup cache if it gets too big.
parse_url_ex(+URL, -Parts) [private]
Parts: Scheme, Host, Port, User:Password, RequestURI (no fragment).
parts_scheme(+Parts, -Scheme) is det [private]
parts_uri(+Parts, -URI) is det [private]
parts_request_uri(+Parts, -RequestURI) is det [private]
parts_search(+Parts, -Search) is det [private]
parts_authority(+Parts, -Authority) is semidet [private]
iostream:open_hook(+Spec, +Mode, -Stream, -Close, +Options0, -Options) is semidet [multifile]
Hook implementation that makes open_any/5 support http and https URLs for Mode == read.
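Because of this hook, the generic interface of library(iostream) can open HTTP URLs like files. A minimal sketch, using open_any/5 and close_any/1 from library(iostream); url_text/2 is a hypothetical helper.

```prolog
:- use_module(library(http/http_open)).  % provides the iostream:open_hook/6 clause
:- use_module(library(iostream)).

%!  url_text(+URL, -Text) is det.
%
%   Hypothetical helper: read an http or https resource into a
%   string through the generic open_any/5 interface.
url_text(URL, Text) :-
    open_any(URL, read, Stream, Close, []),
    call_cleanup(read_string(Stream, _Length, Text),
                 close_any(Close)).      % close as instructed by the hook
```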
consider_keep_alive(+HeaderLines, +Parts, +Host, +Stream0, -Stream, +Options) is det [private]
If we have agreed on a Keep-alive connection, return a range stream rather than the original stream. We also use the content length and a range stream if we are dealing with an HTTPS connection, because not all servers seem to complete the TLS closing handshake. If the server does not complete it, we receive a TLS handshake error on end-of-file, causing the read to fail.
keep_alive(+StreamPair, +Host, +In, -Left) is det
Callback when closing the range stream used to process the content of the reply. This callback makes the stream available for future keep-alive connections or closes the stream. The stream is closed if
  • There are too many bytes left unprocessed in the range stream.
  • There are too many pooled connections.
read_incomplete(+In, +Left) is semidet [private]
If we have not read all input from a Keep-alive connection, read the remainder if it is short. Otherwise, fail and close the stream.
add_to_pool(+Address, +StreamPair) is semidet [private]
Add a connection to the keep-alive pool after completing the interaction. Fails if there are already too many connections in the pool.
get_from_pool(+Address, -StreamPair) is nondet [private]
Get an existing Keep-Alive connection to Address as StreamPair. The caller relies on non-determinism of this predicate to try another connection if the returned one is already closed by the peer. We cannot rely on the non-determinism of retract/1 as that respects the logical update view. Therefore, we must use retract/1 and commit as retract/1 guarantees that the first retracted clause is not already retracted.
keep_connection(+Address) is semidet [private]
Succeeds if we want to keep the connection open. We currently keep a maximum of http:max_keep_alive_connections connections waiting and a maximum of http:max_keep_alive_host_connections waiting for the same address. Connections older than http:max_keep_alive_idle seconds are closed.
http_close_keep_alive(+Address) is det
Close all keep-alive connections matching Address. Address is of the form Host:Port. In particular, http_close_keep_alive(_) closes all currently known keep-alive connections.
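A sketch of reusing connections: the hypothetical helper fetch_all_keep_alive/2 requests several URLs with connection('Keep-alive'), so consecutive requests to the same Host:Port may be served from the pool, and finally empties the pool with http_close_keep_alive(_).

```prolog
:- use_module(library(http/http_open)).

%!  fetch_all_keep_alive(+URLs, -Bodies) is det.
%
%   Hypothetical helper: fetch each URL with connection('Keep-alive')
%   and close all pooled connections afterwards, also if an error
%   occurs while fetching.
fetch_all_keep_alive(URLs, Bodies) :-
    call_cleanup(maplist(fetch_keep_alive, URLs, Bodies),
                 http_close_keep_alive(_)).

%   Closing In hands the connection back to the pool (see keep_alive/4)
%   rather than closing the socket.
fetch_keep_alive(URL, Body) :-
    setup_call_cleanup(
        http_open(URL, In, [connection('Keep-alive')]),
        read_string(In, _Length, Body),
        close(In)).
```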
keep_alive_error(+Error, +StreamPair) [private]
Deal with an error from reusing a keep-alive connection. If the error is due to an I/O error or end-of-file, fail to backtrack over get_from_pool/2. Otherwise it is a real error and we thus re-raise it. In all cases we close StreamPair rather than returning it to the pool as we may have done a partial read and thus be out of sync wrt. the HTTP protocol.