This library deals with the analysis and construction of a URL,
Uniform Resource Locator. A URL is the basis for communicating locations
of resources (data) on the web. A URL consists of a protocol identifier
(e.g. HTTP, FTP) and a protocol-specific syntax further defining the
location. URLs are standardized in RFC-1738.
The implementation in this library covers only a small portion of the
defined protocols. Though the initial implementation followed RFC-1738
strictly, the current implementation is more relaxed to deal with
frequent violations of the standard encountered in practical use.
- author
- - Jan Wielemaker
- - Lukas Faulstich
- deprecated
- - New code should use library(uri), provided by the clib package.
global_url(+URL, +Base, -Global) is det- Translate a possibly relative URL into an absolute one.
- Errors
- - syntax_error(illegal_url) if URL is not legal.
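As a minimal sketch of how global_url/3 might be called, the code below resolves a list of references against one base URL. The helper names resolve/3 and resolve_all/3, and the example.org URLs in the sample query, are illustrative assumptions and not part of the library.

```prolog
:- use_module(library(url)).
:- use_module(library(apply)).

% resolve(+Base, +Relative, -Global)
% Hypothetical helper: global_url/3 with the base as first argument,
% so that it can be used with maplist/3.
resolve(Base, Relative, Global) :-
    global_url(Relative, Base, Global).

% resolve_all(+Base, +Relatives, -Globals)
% Hypothetical helper: resolve every reference in Relatives against Base.
resolve_all(Base, Relatives, Globals) :-
    maplist(resolve(Base), Relatives, Globals).

% Sample query (URLs are made up, output intentionally not shown):
% ?- resolve_all('http://www.example.org/docs/index.html',
%                ['intro.html', '/top.html', 'http://other.example.org/'],
%                Globals).
```

A reference that global_url/3 cannot parse raises the syntax_error(illegal_url) error described above, so a caller that processes untrusted input may want to wrap the call in catch/3.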
is_absolute_url(+URL)- True if URL is an absolute URL. That is, a URL that starts with
a protocol identifier.
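A small sketch of using is_absolute_url/1 as a test follows. The helper classify_url/2 and the sample inputs are hypothetical and only serve as illustration.

```prolog
:- use_module(library(url)).

% classify_url(+URL, -Kind)
% Hypothetical helper: Kind is `absolute` if URL starts with a
% protocol identifier and `relative` otherwise.
classify_url(URL, Kind) :-
    (   is_absolute_url(URL)
    ->  Kind = absolute
    ;   Kind = relative
    ).

% Sample queries (inputs are made up):
% ?- classify_url('http://www.example.org/', Kind).
% ?- classify_url('images/logo.png', Kind).
```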
http_location(?Parts, ?Location)- Construct or analyze an HTTP location. This is similar to
parse_url/2, but only deals with the location part of an HTTP
URL. That is, the path, search and fragment specifiers. In the
HTTP protocol, the first line of a message is
<Action> <Location> HTTP/<version>
- Arguments:
- - Location: Atom or list of character codes.
csearch(+Attributes)//[private]
cvalue(+Value)// is det[private]- Construct a string from Value. Value is either atomic or a
code-list.
cfragment(+Attributes)//[private]
parse_url(?URL, ?Attributes) is det- Construct or analyse a URL. URL is an atom holding a URL or a
variable. Attributes is a list of components. Each component is
of the format Name(Value). Defined components are:
- protocol(Protocol)
- The protocol used. This is, after the optional url:, an
identifier separated from the remainder of the URL using :.
parse_url/2 assumes the http protocol if no protocol is
specified and the URL can be parsed as a valid HTTP URL. In
addition to the RFC-1738 specified protocols, the file
protocol is supported as well.
- host(Host)
- Host-name or IP-address on which the resource is located.
Supported by all network-based protocols.
- port(Port)
- Integer port-number to access on the Host. This only
appears if the port is explicitly specified in the URL.
Implicit default ports (e.g., 80 for HTTP) do not appear
in the part-list.
- path(Path)
- (File-) path addressed by the URL. This is supported for the
ftp, http and file protocols. If no path appears, the
library generates the path /.
- search(ListOfNameValue)
- Search-specification of HTTP URL. This is the part after the
?, normally used to transfer data from HTML forms that
use the HTTP GET method. In the URL it consists of a
www-form-encoded list of Name=Value pairs. This is mapped to
a list of Prolog Name=Value terms with decoded names and
values.
- fragment(Fragment)
- Fragment specification of HTTP URL. This is the part after
the # character.
The example below illustrates all of this for an HTTP URL.
?- parse_url('http://www.xyz.org/hello?msg=Hello+World%21#x',
P).
P = [ protocol(http),
host('www.xyz.org'),
fragment(x),
search([ msg = 'Hello World!'
]),
path('/hello')
]
By instantiating the parts-list this predicate can be used to
create a URL.
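The construction mode can be sketched as follows: a parts-list is turned into a URL, which is then analysed again to check that the components survive the round trip. The predicates build_url/1 and round_trip/1 and the example.org values are assumptions made for this illustration.

```prolog
:- use_module(library(url)).

% build_url(-URL)
% Build a URL from a parts-list. Host, path and search values are
% made-up illustrations.
build_url(URL) :-
    parse_url(URL,
              [ protocol(http),
                host('www.example.org'),
                path('/hello'),
                search([msg = 'Hello World!'])
              ]).

% round_trip(-Parts)
% Build the URL and analyse it again to inspect the resulting parts.
round_trip(Parts) :-
    build_url(URL),
    parse_url(URL, Parts).
```

Because the order of the components in the analysed list is not something to rely on, it is safer to pick components out with memberchk/2 than to unify against a complete list.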
parse_url(+URL, +BaseURL, -Attributes) is det- Similar to parse_url/2 for relative URLs. If URL is relative,
it is resolved using the absolute URL BaseURL.
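A brief sketch of parse_url/3 with a relative URL follows. The predicate relative_parts/1 and both URLs are hypothetical examples; the exact resulting path is not shown here.

```prolog
:- use_module(library(url)).

% relative_parts(-Parts)
% Analyse a relative URL against an absolute base URL. Both URLs
% are made-up illustrations.
relative_parts(Parts) :-
    parse_url('other.html#top',
              'http://www.example.org/docs/index.html',
              Parts).

% Sample query:
% ?- relative_parts(Parts), memberchk(host(Host), Parts).
```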
globalise_path(+LocalPath, +RelativeTo, -FullPath) is det[private]- The first clause deals with the standard URL /... global paths.
The second with file://drive:path on MS-Windows. This is a bit
of a kludge, but unfortunately common practice, especially on
Windows, does not always follow the standard.
absolute_url//[private]- True if the input describes an absolute URL. This means it
starts with a URL schema. We demand a schema of length > 1 to
avoid confusion with Windows drive letters.
- uri(-Parts)//[private]
schema(-Atom)//[private]- Schema is case-insensitive and the canonical version is
lowercase.
Schema ::= ALPHA *(ALPHA|DIGIT|"+"|"-"|".")
hier_part(+Schema, -Parts, ?Tail)//[private]
query(-Parts, ?Tail)// is det[private]- Extract &Name=Value, ...
search_sep// is semidet[private]- Matches a search-parameter separator. Traditionally, this is the
&-char, but these days there are also new-style ;-char separators.
- See also
- - http://perldoc.perl.org/CGI.html
- To be done
- - This should be configurable
fragment(-Fragment, ?Tail)//[private]- Extract the fragment (after the #)
- fragment_char(-Char)[private]
- Find a fragment character.
pchar(-Code)//[private]- unreserved|pct_encoded|sub_delim|":"|"@"
Performs UTF-8 decoding of percent encoded strings.
lwalpha(-C)//[private]- Demand alpha, return as lowercase
sub_delim(?Code)[private]- Sub-delimiters
unreserved(+C)[private]- Characters that can be represented without percent escaping
(RFC 3986, section 2.3).
www_form_encode(+Value, -XWWWFormEncoded) is det
- www_form_encode(-Value, +XWWWFormEncoded) is det
- En/decode to/from application/x-www-form-urlencoded. Encoding
percent-encodes all characters except the RFC 3986 unreserved
characters: ASCII alnum (see code_type/2) and one of "-._~".
Newline is mapped to %0D%0A. When decoding, newlines appear as a
single newline (10) character.
Note that a space is encoded as %20 instead of +.
Decoding decodes both to a space.
- deprecated
- - Use uri_encoded/3 for new code.
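The two modes of www_form_encode/2 can be sketched as below. The helper encode_then_decode/3 is a hypothetical name used only for this illustration.

```prolog
:- use_module(library(url)).

% encode_then_decode(+Value, -Encoded, -Decoded)
% Hypothetical helper: encode Value, then decode the result again
% by calling www_form_encode/2 in its second mode.
encode_then_decode(Value, Encoded, Decoded) :-
    www_form_encode(Value, Encoded),
    www_form_encode(Decoded, Encoded).

% Sample queries:
% ?- encode_then_decode('Hello World!', Encoded, Decoded).
% ?- www_form_encode(Value, 'Hello+World%21').
```

The second sample query uses + for the space, which the decoder accepts even though the encoder itself emits %20.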
www_encode(+Codes, +ExtraUnescaped)//[private]
www_decode(-Codes)//[private]
set_url_encoding(?Old, +New) is semidet- Query and set the encoding for URLs. The default is
utf8. The only other defined value is iso_latin_1.
- To be done
- - Having a global flag is highly inconvenient, but a
work-around for old sites using ISO Latin 1 encoding.
url_iri(+Encoded, -Decoded) is det
- url_iri(-Encoded, +Decoded) is det
- Convert between a URL, encoded in US-ASCII, and an IRI. An IRI
is a fully expanded Unicode string. Unicode strings are first
encoded into UTF-8, after which %-encoding takes place.
parse_url_search(?Spec, ?Fields:list(Name=Value)) is det- Construct or analyze an HTTP search specification. This deals
with form data using the MIME-type
application/x-www-form-urlencoded as used in HTTP GET
requests.
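As an illustration of the analysis mode, the sketch below looks up one field in a search specification. The helper query_value/3 and the sample specification are assumptions, not part of the library.

```prolog
:- use_module(library(url)).

% query_value(+Spec, +Name, -Value)
% Hypothetical helper: parse the search specification Spec and
% return the value of the first field called Name.
query_value(Spec, Name, Value) :-
    parse_url_search(Spec, Fields),
    memberchk(Name=Value, Fields).

% Sample query (the specification is made up):
% ?- query_value('msg=Hello+World%21&lang=en', lang, Value).
```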
file_name_to_url(+File, -URL) is det
- file_name_to_url(-File, +URL) is semidet
- Translate between a filename and a file:// URL.
- To be done
- - Current implementation does not deal with paths that
need special encoding.
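A minimal sketch of the URL-to-file direction follows. The helper url_file/2 and the sample URL are hypothetical, and the sketch assumes a path that needs no special encoding, in line with the limitation noted above.

```prolog
:- use_module(library(url)).

% url_file(+URL, -File)
% Hypothetical helper: extract the file name from a file:// URL.
% Fails if URL is not a file:// URL.
url_file(URL, File) :-
    file_name_to_url(File, URL).

% Sample query (the URL is made up):
% ?- url_file('file:///tmp/data.txt', File).
```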