Wide-character versions
Support for exchange of wide-character strings is still under
consideration. The functions dealing with 8-bit character strings return
failure when operating on a wide-character atom or Prolog string object.
The functions below can extract and unify both 8-bit and wide atoms and
string objects. Wide-character strings are represented as C arrays of
objects of the type pl_wchar_t, which is guaranteed to be
the same as wchar_t on platforms supporting this type. For
example, on MS-Windows this is a string of 16-bit UTF-16 code units,
while with the GNU C library (glibc) it is a string of 32-bit UCS-4
characters.
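
The platform difference described above can be illustrated with a small, self-contained sketch. It is not part of the SWI-Prolog API: the names wide_units_for() and encode_utf16() are hypothetical helpers, and uint16_t stands in for a 16-bit wchar_t so that the behaviour is the same on every platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of wide units needed for one code point,
 * given the size of wchar_t in bytes (2 for UTF-16, 4 for UCS-4). */
size_t wide_units_for(uint32_t cp, size_t wchar_size)
{
    assert(cp <= 0x10FFFF);               /* valid Unicode range */
    if (wchar_size == 2 && cp > 0xFFFF)
        return 2;                         /* needs a surrogate pair */
    return 1;
}

/* Hypothetical helper: encode cp as UTF-16 code units.
 * Returns the number of units written to out (1 or 2). */
size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF);
    assert(cp < 0xD800 || cp > 0xDFFF);   /* surrogates are not characters */
    if (cp <= 0xFFFF) {
        out[0] = (uint16_t)cp;            /* BMP: one unit, same value */
        return 1;
    }
    cp -= 0x10000;                        /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

Under these assumptions, a string of n code points occupies exactly n elements with a 32-bit wchar_t, but possibly more than n elements with a 16-bit wchar_t, so a length counted in array elements is not necessarily a length in characters.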
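
If the length argument is (size_t)-1,
it is computed from s using wcslen(). See PL_new_atom()
for error handling.

The type argument is one of PL_ATOM, PL_STRING,
PL_CODE_LIST or PL_CHAR_LIST.

The difference-list variant supports the types PL_CODE_LIST and
PL_CHAR_LIST.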
It serves two purposes. It allows for returning very long lists from
data read from a stream without the need for a resizing buffer in C.
Also, the use of difference lists is often practical for further
processing in Prolog. Examples can be found in packages/clib/readutil.c
from the source distribution.
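
The (size_t)-1 length convention mentioned above can be sketched in plain C as follows. The helper resolve_wlen() is hypothetical and only mirrors the described behaviour; it is not a function of the SWI-Prolog interface, and wchar_t is used here in place of pl_wchar_t.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, not part of the SWI-Prolog API: resolve the
 * "(size_t)-1 means NUL-terminated" length convention. */
size_t resolve_wlen(size_t len, const wchar_t *s)
{
    assert(s != NULL);
    /* An explicit length is used as given; the sentinel value asks
     * for the length up to the terminating NUL, as wcslen() reports. */
    return len == (size_t)-1 ? wcslen(s) : len;
}
```

A caller that already knows the length passes it directly, for example resolve_wlen(3, L"hello"), while resolve_wlen((size_t)-1, L"hello") scans for the terminator. Passing an explicit length is presumably also what allows text that contains embedded NUL characters.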

The result is locale-independent and identical on every supported
platform. Width data is sourced at table-build time from
EastAsianWidth.txt (UAX #11) and the general-category
property, and stored as two bits in each uflags_map entry
alongside the syntax classification used by the parser; runtime lookups
are a single byte fetch. The Unicode version that drives the table is
reported by the unicode_syntax_version flag (see section 2.15.1.9).
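
To make the "two bits in each entry" idea concrete, here is a minimal sketch of packing a width next to classification flags in a single byte. The bit positions, macro names and function names are all assumptions made for illustration; the actual uflags_map encoding in SWI-Prolog may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, NOT SWI-Prolog's actual uflags_map encoding:
 * bits 0-5 hold syntax classification flags, bits 6-7 hold the width. */
#define WIDTH_SHIFT 6
#define WIDTH_MASK  0x3u
#define SYNTAX_MASK 0x3Fu

enum { W_ZERO = 0, W_ONE = 1, W_TWO = 2 };  /* display columns */

/* Build one table entry from syntax flags and a column width. */
uint8_t pack_entry(uint8_t syntax_flags, unsigned width)
{
    assert(syntax_flags <= SYNTAX_MASK && width <= WIDTH_MASK);
    return (uint8_t)(syntax_flags | (width << WIDTH_SHIFT));
}

/* Extract the width: one byte read plus a shift and a mask. */
unsigned entry_width(uint8_t entry)
{
    return (entry >> WIDTH_SHIFT) & WIDTH_MASK;
}

/* Extract the syntax flags, untouched by the width bits. */
unsigned entry_syntax(uint8_t entry)
{
    return entry & SYNTAX_MASK;
}
```

Sharing one byte between both kinds of data means the parser and the width lookup can read the same table, which fits the single-byte-fetch cost stated above.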
The argument is a full 32-bit code point. Avoid casting to
wchar_t before the call: wchar_t is 16-bit on
Windows, and the cast silently truncates code points outside the BMP
to an unrelated 16-bit value.
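
The effect of such a cast can be shown without any platform dependency. In this sketch uint16_t stands in for a 16-bit wchar_t, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simulate passing a code point through a 16-bit wchar_t.
 * uint16_t is used so the effect is visible on any platform. */
uint32_t through_16bit_wchar(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    uint16_t narrowed = (uint16_t)cp;   /* keeps only the low 16 bits */
    return narrowed;
}

/* Hypothetical guard: does the code point fit in one 16-bit unit? */
int fits_in_bmp(uint32_t cp)
{
    return cp <= 0xFFFF;
}
```

For example, U+1F600 passed through through_16bit_wchar() comes back as U+F600, a private-use code point with unrelated properties, whereas a BMP character such as U+4E2D is unchanged. Keeping the value in an int or uint32_t until the call avoids the problem.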
src/Unicode/ and consulted by the
Prolog reader. Provided so foreign extensions and embedded toolkits
(notably xpce) classify code points exactly as SWI-Prolog does,
locale-independently. id_start and id_continue follow
Prolog's identifier syntax (close to UAX #31 XID_Start /
XID_Continue, adjusted for Prolog). layout matches the
reader's whitespace set (white_space/1 in
src/Unicode/derived_core_properties.pl).