Branch: development
SWI-Prolog Changelog from version 10.1.6 to 10.1.7
[May 10 2026]
- SUBMODULES: Bump pldoc for help/1 UTF-8 fix Co-Authored-By: Claude
Opus 4.7 (1M context) <noreply@anthropic.com>
- SUBMODULES: Bump pointers for documentation rework. 34 submodules
carry the .doc -> .plx LaTeX-source rename; ltx2htm additionally
carries the UTF-8 awareness (PL_unify_chars REP_UTF8 on input,
PL_get_chars REP_UTF8 + PL_UTF8_STRING on output, charset meta tag
in HTML). Companion to the master-side switch to lualatex and the
Unicode-Prolog-source examples in overview.plx.
- RENAME: SWI manual sources .doc -> .plx. The '.doc' extension predates
Microsoft Word's appropriation of it and confuses file managers,
editors (no LaTeX syntax highlighting), and new contributors. Rename
to '.plx' (Prolog LaTeX): a single extension with no collision inside
SWI-Prolog; .gitattributes maps it to TeX for editor support and
diff hunks.
- DOC: Use literal Unicode in the Unicode Prolog source section Drop
the {...} urldef wrappers around literal characters described in
the section (², ³, ¹, ⁰..⁹, ₀..₉, Dž, «hello, world»,
«»). Add a code block at the end of the section with concrete,
runnable ?- queries covering: superscript variables, Devanagari Nd
via atom_number/2, Unicode symbols as solo atoms, bracket pairs Ps/Pe,
quote pairs Pi/Pf.
- DOC: overview.doc: use literal Unicode characters Smoke test for the
new UTF-8 pipeline: drop the {...} urldef wrapping around ≤, €,
and ·. Both lualatex and ltx2htm now read these bytes directly. The
rendered output (PDF and HTML) is unchanged.
- DOC: Move lualatex font setup from main.doc into pl.sty. ltx2htm
parses main.doc as TeX and emits warnings for unknown commands like
\ifdefined and \directlua. .sty files are not parsed as TeX (they
are only read for urldef extraction), so the conditional becomes
invisible to ltx2htm. Also override fontspec's \strong via
\providecommand + \renewcommand so pl.sty's bold-strong definition
wins under lualatex.
- DOC: Build PDF manual with lualatex instead of pdflatex Switches the
PDF documentation toolchain to lualatex. The engine swap is the first
step of a wider effort to allow literal Unicode in .doc and Markdown
sources (currently routed through urldef in pl.sty).
- WASM: Included library(unicode)
[May 8 2026]
- DOC: PIP §4.11 — describe the four-mode unicode_atoms policy
as shipped The §4.11 "Pluggable Unicode normalisation" body
still described the old design — a Boolean read_term/2 option
normalize(Bool) backed by a Boolean Prolog flag unicode_normalize
— neither of which exist in the current implementation. What
actually shipped is the four-mode unicode_atoms family (accept /
nfc / error / reject), surfaced through:
- ENHANCED: code_type/2 prolog_layout + prolog_end_of_line; rebind
end_of_line The Pattern_White_Space set used by the Prolog reader for
layout had no user-facing accessor in code_type/2 / char_type/2. The
existing end_of_line was extended in the line-termination commit
to the seven Pattern_White_Space line-terminator-like code points;
that broadened the ISO Prolog meaning silently.
[May 7 2026]
- DOC: stray-character policy; quoted material accepts any Unicode scalar
After Stage 4's cat-direct dispatch refactor, the parser's default
arm of the c >= 0x80 switch raises syntax_error(illegal_character)
for every U_CAT_OTHER code point. That is broader than what
man/overview.doc still claimed ("Control and unassigned (C*) characters
produce a syntax error ... outside quoted atoms/strings and outside
comments"), and narrower than what the section claimed about quoted
material (it was silent on what is allowed there).
- DOC: lock atom_number/2 family Unicode policy with tests + manual
prose. Empirical audit of str_number() in pl-read.c against the policy
documented in plan A's M2 doc fix confirms the kernel already enforces
both rules:
- DOC: PIP §4.4.1 — line termination on Pattern_White_Space
line-enders Add a subsection to White space recording the seven
line-terminator-like Pattern_White_Space code points (LF, VT,
FF, CR, NEL, LS, PS) and the three places they apply: %-comment
termination, source-position line counter, and backslash-newline
continuation in quoted strings. Calls out that the remaining four
Pattern_White_Space members (TAB, SPACE, LRM, RLM) are layout but not
line-enders. Cross-references code_type(C, end_of_line) /
char_type(C, end_of_line) for user-code access to the same set.
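As a minimal C sketch of the set described above (the name is_line_ender is illustrative, not a kernel API), the seven line-enders reduce to a switch over code points:

```c
#include <assert.h>   /* for the example checks below */
#include <stdbool.h>
#include <stdint.h>

/* Illustrative helper, not the kernel's API: true for the seven
   line-terminator-like Pattern_White_Space code points. */
bool is_line_ender(uint32_t c)
{
  switch ( c )
  { case 0x000A:   /* LF  LINE FEED */
    case 0x000B:   /* VT  LINE TABULATION */
    case 0x000C:   /* FF  FORM FEED */
    case 0x000D:   /* CR  CARRIAGE RETURN */
    case 0x0085:   /* NEL NEXT LINE */
    case 0x2028:   /* LS  LINE SEPARATOR */
    case 0x2029:   /* PS  PARAGRAPH SEPARATOR */
      return true;
    default:       /* includes TAB, SPACE, LRM, RLM: layout only */
      return false;
  }
}
```

For example, assert(is_line_ender(0x2028)) holds, while TAB, SPACE, LRM (U+200E) and RLM (U+200F) return false.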
- DOC: document the seven line-terminator-like Pattern_White_Space
chars Two manual updates to match the implementation:
- ENHANCED: terminate %-comments on every Pattern_White_Space line-ender
The %-comment scanner, the line-counter, the block-comment LF
preservation, the \\<newline> continuation in raw_read_quoted, the
ensure_space whitespace-collapsing macro, and the escape_char \\<EOL>*
skip-blanks site all checked
c == '\n' (LF only). Three of the seven
Pattern_White_Space line-terminator-like code points were silently
ignored: NEL (U+0085), LS (U+2028), PS (U+2029). ASCII CR / VT /
FF were also missed by the comment scanner.
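A sketch of the comment-scanner change, assuming the source has already been decoded into an array of code points (skip_percent_comment and its signature are hypothetical): the scan stops at any of the seven line-enders instead of only at LF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The seven Pattern_White_Space line-enders (illustrative helper). */
static bool is_line_ender(uint32_t c)
{
  return (c >= 0x000A && c <= 0x000D) ||   /* LF, VT, FF, CR */
         c == 0x0085 ||                    /* NEL */
         c == 0x2028 || c == 0x2029;       /* LS, PS */
}

/* Hypothetical helper: text[i] must be '%'. Returns the index of the
   line-ender that terminates the comment, or len if the comment runs to
   the end of the input. A scanner testing only c == '\n' would keep
   going past NEL, LS, PS, CR, VT and FF. */
size_t skip_percent_comment(const uint32_t *text, size_t len, size_t i)
{
  assert(i < len && text[i] == '%');
  while ( i < len && !is_line_ender(text[i]) )
    i++;
  return i;
}
```

For a comment followed by U+2028 at index 3, the function returns 3; the LF-only test would have swallowed the rest of the line.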
- DOC: PIP unicode.md — paired delimiters and PL_wcwidth vs POSIX
Sync the PIP draft with the design landed on this branch.
- ENHANCED: dispatch tokeniser directly on u_category; drop uflagsW The
uflags_map byte already carried the u_category enum, but every read of
it went through cat_to_flags[] back to the legacy U_* bitmask, then a
Pl*W macro tested specific bits. The cat_to_flags indirection forced
PlSoloW to match four distinct categories (U_CAT_SOLO, U_CAT_BRACKET,
U_CAT_QUOTE, U_CAT_ID_CONTINUE_SOLO), which is why case_solo had a
post-hoc pl_pair_lookup demux to re-separate brackets from quotes
from solo atoms.
- TYPE: pl_pair_lookup is_open is bool; clean up generator function
layout Two cleanups:
- ENHANCED: extend code_type/2 paren(Close) and add a matching
quote(Close). paren(Close) used to know only the three ASCII bracket
pairs (), {} and []; quote(Close) did not exist at all. Both are now
backed by the same pl_pair_table that drives the parser, so they cover
the full Unicode Ps/Pe and Pi/Pf sets out of the box.
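The entry does not show pl_pair_table's layout. As a hedged sketch only, a sorted table of opener/closer pairs tagged as bracket (Ps/Pe) or quote (Pi/Pf) and searched by binary search is enough to answer both paren(Close)- and quote(Close)-style queries. The type names, pair_lookup, and the handful of entries are illustrative assumptions; the real table is generated from Unicode data and covers the full sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, not the real pl_pair_table. */
typedef enum { PAIR_BRACKET, PAIR_QUOTE } pair_kind;  /* Ps/Pe vs Pi/Pf */

typedef struct
{ uint32_t  open;
  uint32_t  close;
  pair_kind kind;
} pair_entry;

/* A few sample entries, sorted by 'open' for binary search. */
static const pair_entry pairs[] =
{ { 0x0028, 0x0029, PAIR_BRACKET },  /* PARENTHESIS */
  { 0x005B, 0x005D, PAIR_BRACKET },  /* SQUARE BRACKET */
  { 0x007B, 0x007D, PAIR_BRACKET },  /* CURLY BRACKET */
  { 0x00AB, 0x00BB, PAIR_QUOTE },    /* DOUBLE ANGLE QUOTATION MARK */
  { 0x2018, 0x2019, PAIR_QUOTE },    /* SINGLE QUOTATION MARK */
  { 0x201C, 0x201D, PAIR_QUOTE },    /* DOUBLE QUOTATION MARK */
  { 0x27E8, 0x27E9, PAIR_BRACKET },  /* MATHEMATICAL ANGLE BRACKET */
  { 0x300C, 0x300D, PAIR_BRACKET },  /* CORNER BRACKET */
};

/* Look up an opening delimiter; on success store its closer and kind. */
bool pair_lookup(uint32_t open, uint32_t *close, pair_kind *kind)
{
  size_t lo = 0, hi = sizeof pairs / sizeof pairs[0];

  while ( lo < hi )
  { size_t mid = lo + (hi - lo) / 2;

    if ( pairs[mid].open < open )
      lo = mid + 1;
    else if ( pairs[mid].open > open )
      hi = mid;
    else
    { *close = pairs[mid].close;
      *kind  = pairs[mid].kind;
      return true;
    }
  }
  return false;
}
```

Keeping brackets and quotes in one table with a kind tag lets the reader decide between building a compound and reading literal text from a single lookup.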
- ENHANCED: read Unicode quote pairs (Pi/Pf) as literal strings Quote
pairs were previously parsed like brackets — content was interpreted
as a Prolog term and wrapped in '<open><close>'/1. That is the wrong
shape for quotation marks, where the contained text is naturally
literal (cf. ASCII '...', "..." and `...`).
- ENHANCED: drop mk_wcwidth.c; PL_wcwidth replaces it everywhere After
Stage 5 PL_wcwidth() reads its values directly from the per-code-point
width slot in uflags_map, leaving mk_wcwidth.c as a thin wrapper that
just forwarded to PL_wcwidth. With no remaining internal or submodule
callers (the libedit and xpce typedefs of uchar_t are local to those
packages), the wrapper is pure dead weight.
- DOC: paired delimiters, wcwidth source, prolog_syntax_map.pl header
(Stage 7) Three doc updates that close the loop on the Unicode-syntax
refactor:
- ENHANCED: parse Unicode bracket pairs as '<open><close>'/1 compounds
Lift the existing {Term} ⇒ '{}'(Term) reader path to a generic
paired-delimiter routine and wire every Unicode bracket pair through
it.
- ENHANCED: pack uflags_map into 4-bit category enum + 2-bit wcwidth
slot The per-code-point byte in uflags_map was a bag of 8 disjoint U_*
flag bits, almost full (7 of 8 used). Repack the same byte:
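The new bit layout is not spelled out here. A minimal sketch, assuming the category enum sits in the low four bits and the width in the next two (stored as width + 1 so that -1..2 fit in 0..3); the macro names, function names and bit positions are all hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of one uflags_map byte:
     bits 0-3  category enum (up to 16 values)
     bits 4-5  display width, stored as width+1 (-1..2 -> 0..3)
     bits 6-7  spare */
#define CAT_MASK    0x0Fu
#define WIDTH_SHIFT 4
#define WIDTH_MASK  0x03u

uint8_t pack_uflags(unsigned category, int width)
{
  assert(category <= CAT_MASK);
  assert(width >= -1 && width <= 2);
  return (uint8_t)(category | ((unsigned)(width + 1) << WIDTH_SHIFT));
}

unsigned uflags_category(uint8_t b)
{
  return b & CAT_MASK;
}

int uflags_width(uint8_t b)
{
  return (int)((b >> WIDTH_SHIFT) & WIDTH_MASK) - 1;
}
```

With an enum instead of independent flag bits, one byte per code point holds the category and the width and still leaves two bits spare.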
- DOC: NBSP is no longer whitespace; document Pattern_White_Space-only
layout The previous text claimed SWI-Prolog kept treating U+00A0 as
whitespace for backward compatibility with ISO Latin-1. That was true
under the legacy 256-entry _PL_char_types[] table; with the parser
cutoff moved to < 0x80 the Unicode flag table is the single source
of truth, and NBSP (which is not in Pattern_White_Space) raises a
stray-character syntax error outside quoted material.
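For reference, Pattern_White_Space contains exactly eleven code points, and U+00A0 is not one of them. A small C predicate (is_pattern_white_space is an illustrative name, not the kernel's) makes the membership explicit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative predicate for the eleven Pattern_White_Space code
   points (the property is immutable per UAX #31). NBSP U+00A0 is
   deliberately absent. */
bool is_pattern_white_space(uint32_t c)
{
  return (c >= 0x0009 && c <= 0x000D) ||  /* TAB, LF, VT, FF, CR */
         c == 0x0020 ||                   /* SPACE */
         c == 0x0085 ||                   /* NEL */
         c == 0x200E || c == 0x200F ||    /* LRM, RLM */
         c == 0x2028 || c == 0x2029;      /* LS, PS */
}
```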
- ENHANCED: classify code points >= 0x80 via the Unicode flag table The
legacy ISO Latin-1 _PL_char_types[] used to extend to 256 entries,
giving an SWI-specific opinion about each byte 0x80..0xff. Those
entries duplicated and occasionally disagreed with the Unicode flag
table (uflags_map[0]) that already covers the same code points
correctly per UAX #31. Two parallel mechanisms in the 0x80..0xff
range invited drift.
- DOC: align manual with implemented Unicode source syntax Two small
documentation updates that bring man/ and src/Unicode/ in sync with
the parser's actual behaviour:
[May 5 2026]
- UNICODE: Updated unicode_block/3 to Unicode 17.0
- FIXED: Windows init_locale also sets LC_CTYPE to UTF-8 Forcing
LD->encoding to ENC_UTF8 unconditionally on Windows in the prior commit
left the C runtime's LC_CTYPE locale at its default ("C" locale)
— and so mbrtowc / wcrtomb were ASCII-only. Any UTF-8 byte above
0x7F flowing through a path that uses mbrtowc to canonicalise text
(PL_canonicalise_text in pl-text.c) returned EILSEQ, surfacing as a
"Syntax error: illegal_multibyte_sequence" the moment a user typed
an emoji or other non-ASCII char in the libedit prompt.
[May 4 2026]
- ENHANCED: default encoding to UTF-8 on Windows The system locale
on Windows usually reports a legacy codepage (Windows-1252 or
similar), making the default Prolog flag
encoding ANSI/Latin-1.
This caused UTF-8 source files to be read as their byte-wise Latin-1
interpretation. UTF-8 is the de-facto encoding for source files and
the Windows C runtime's locale-based wide-character functions are
weaker than the Unicode tables Prolog uses internally, so always use
UTF-8 as the default on Windows.
[May 3 2026]
- TYPE: PL_is_id_start, PL_is_id_continue, PL_is_uppercase,
PL_is_decimal, PL_is_layout return bool Pure predicates; bool conveys
the contract better than int.
- ADDED: PL_is_id_start, PL_is_id_continue, PL_is_uppercase,
PL_is_decimal, PL_is_layout. Thin shims around the existing
uflagsW() table (src/pl-umap.c, generated from
src/Unicode/derived_core_properties.pl) so foreign extensions and
embedded toolkits — notably xpce — classify code points exactly
as SWI-Prolog does, without needing the locale-dependent POSIX iswX()
or inventing their own tables. Documented in man/foreign.doc.
- ENHANCED: Use mk_wcwidth() in the kernel for locale-independent
width. Switch every kernel caller (pl-read, pl-write, pl-fli, pl-fmt,
pl-stream, pl-ctype) and PL_wcwidth() to the bundled mk_wcwidth()
instead of the system wcwidth(3), which under C/POSIX returns -1 for
non-ASCII on glibc. Always link mk_wcwidth.c (drop the HAVE_WCWIDTH
guard) and add the modern emoji wide ranges (Kana Supplement,
Mahjong/Domino, Misc Symbols & Pictographs, Symbols & Pictographs
Ext-A) so column counts match what xpce already returns.
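The bundled-table approach can be sketched as a binary search over sorted code-point intervals, the technique Markus Kuhn's wcwidth.c uses. This is not the kernel's code: sketch_wcwidth is a hypothetical name, and the tables hold only a few illustrative ranges (whole blocks, including those named in the entry above), whereas a real table follows the Unicode East Asian Width data at finer granularity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t first, last; } interval;

/* Binary search in a sorted, non-overlapping interval table. */
static int in_table(uint32_t c, const interval *t, size_t n)
{
  size_t lo = 0, hi = n;

  while ( lo < hi )
  { size_t mid = lo + (hi - lo) / 2;

    if ( c < t[mid].first )
      hi = mid;
    else if ( c > t[mid].last )
      lo = mid + 1;
    else
      return 1;
  }
  return 0;
}

/* Illustrative subset: combining marks have zero width. */
static const interval zero_width[] =
{ { 0x0300, 0x036F } };          /* Combining Diacritical Marks */

/* Illustrative subset of wide ranges, sorted by first code point. */
static const interval wide[] =
{ { 0x1100,  0x115F  },          /* Hangul Jamo leading consonants */
  { 0x4E00,  0x9FFF  },          /* CJK Unified Ideographs */
  { 0xAC00,  0xD7A3  },          /* Hangul Syllables */
  { 0x1B000, 0x1B0FF },          /* Kana Supplement */
  { 0x1F000, 0x1F09F },          /* Mahjong and Domino Tiles */
  { 0x1F300, 0x1F5FF },          /* Misc Symbols and Pictographs */
  { 0x1FA70, 0x1FAFF },          /* Symbols and Pictographs Ext-A */
};

#define N_ELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Locale-independent width: 0 for NUL and combining marks, -1 for
   other C0/C1 controls, 2 for wide ranges, 1 otherwise. */
int sketch_wcwidth(uint32_t c)
{
  if ( c == 0 )
    return 0;
  if ( c < 0x20 || (c >= 0x7F && c < 0xA0) )
    return -1;
  if ( in_table(c, zero_width, N_ELEMS(zero_width)) )
    return 0;
  if ( in_table(c, wide, N_ELEMS(wide)) )
    return 2;
  return 1;
}
```

For example, sketch_wcwidth(0x1F30F) returns 2 and sketch_wcwidth(0x0301), a combining acute accent, returns 0, whatever the process locale.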
- FIXED: unload_file/1: clear isfile so use_module/1 reloads
unloadFile() left sf->isfile set, so a subsequent use_module/1 saw
the file as already loaded and skipped its directives — notably :-
use_foreign_library/1. Reset isfile, move garbage_collect_clauses
to Prolog (unload_file/1) and call '$clear_source_admin'/1 there.
- DOC: atom_normalize_hook Prolog flag Add a concise entry for the new
atom_normalize_hook flag and note in the unicode_atoms entry that
mode
error falls back to the wcwidth-based check when the hook is
not registered, which can over-reject scripts (e.g. Thai) that use
combining marks in NFC.
- TEST: syntax_unicode_atoms: use atom_normalize_hook and term_string.
Replace current_module(unicode) hook-state probes with the new
atom_normalize_hook Prolog flag (reliable across reruns). Switch the
suite to term_string/3 with explicit string literals throughout,
add a thai_hello_world/1 helper to demonstrate the wcwidth-fallback
false-positive on NFC Thai text and verify that the precise utf8proc
check accepts it.
- ADDED: foreign_library_property/2 to query foreign library properties
[May 2 2026]
- DOC: PL_wcwidth() in foreign.doc Co-Authored-By: Claude Opus 4.7
(1M context) <noreply@anthropic.com>
- ADDED: PL_wcwidth() — locale-portable display-column width for a
code point Add PL_wcwidth(int) to libswipl so foreign extensions and
embedded GUI toolkits can ask for the column width of a Unicode code
point with the same answer as the rest of SWI-Prolog, regardless of
the process LC_CTYPE or platform. On Unix/macOS it forwards to system
wcwidth(3); on Windows it goes through Markus Kuhn's mk_wcwidth()
table.
- ENHANCED: Fold error and auto-load into atom_to_unicode_atoms_ex
Rename atom_to_unicode_atoms to atom_to_unicode_atoms_ex (the _ex
suffix marks helpers that raise Prolog exceptions on failure) and
absorb the boilerplate every caller had to write:
[May 1 2026]
- ENHANCED: Cache hook-load attempt to avoid repeated retries.
ensure_unicode_normalize_hook(false) was retrying
'$install_unicode_normalize_hook'/0 on every call once the load had
failed. Track the attempt in GD->atoms.normalize_hook_load_attempted
so subsequent calls fall through immediately. PL_cleanup re-zeroes
GD on next initialisation, so the cache resets across an embedded
Prolog teardown/restart cycle (a static would not).
- ENHANCED: Auto-load library(unicode) for unicode_atoms(nfc) at
every entry point. The Prolog flag's active setter has auto-loaded
library(unicode) for modes nfc/error since the original landing,
but read_term/2,3 and read_clause/2,3 with unicode_atoms(nfc) were
left to error with existence_error(hook, unicode_normalize) when
the library was not already loaded. set_stream/2 and open/4 had the
auto-load logic duplicated inline.
- ENHANCED: Loosen unicode_atoms(error) and centralise the value docs.
Per PIP review:
- ENHANCED: Multi-valued unicode_atoms policy and Trojan-source
bidi reject Replace the boolean unicode_normalize flag and the
normalize(Bool) read_term option with a single multi-valued
unicode_atoms policy that follows the same three-tier hierarchy
as encoding (Prolog flag -> stream property -> per-call option),
and unconditionally reject Unicode bidi-override / isolate code
points (U+202A..U+202E and U+2066..U+2069) in source tokens, quoted
strings and comments as a defence against the Trojan-source attack
(CVE-2021-42574).
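The rejected ranges are small enough to test directly. A minimal sketch (is_bidi_control and find_bidi_control are hypothetical helper names) that reports the first offending code point in a token, which is where a reader would raise its syntax error:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bidi embedding/override controls (U+202A..U+202E) and isolate
   controls (U+2066..U+2069), the ranges named in the entry above. */
bool is_bidi_control(uint32_t c)
{
  return (c >= 0x202A && c <= 0x202E) ||
         (c >= 0x2066 && c <= 0x2069);
}

/* Hypothetical scan: index of the first bidi control in a token's
   code points, or -1 if there is none. */
ptrdiff_t find_bidi_control(const uint32_t *s, size_t len)
{
  for (size_t i = 0; i < len; i++)
  { if ( is_bidi_control(s[i]) )
      return (ptrdiff_t)i;
  }
  return -1;
}
```

Note that LRM and RLM (U+200E, U+200F) fall outside these ranges; they are Pattern_White_Space layout rather than overrides.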
[Apr 30 2026]
- ENHANCED: Pluggable Unicode normalisation in reader and writer Add
a kernel-level callback for Unicode normalisation, and use it to
give read_term/2,3 and read_clause/2,3 a normalize(Bool) option
that NFC-normalises the text of unquoted atoms before interning.
Quoted atoms and string literals are byte-faithful and not touched.
Also force-quote atoms holding combining marks under writeq's
quoted(true), which makes denormalised text visible and is independent
of the normalisation library (it uses wcwidth only).
- DOC: Route non-ASCII Unicode in manual prose through urldefs Add a
block of \urldef{\Sname}\satom{<char>} entries to man/pl.sty for the
super-/subscript digits and a handful of math and typographic symbols
used in §sec:unicodesyntax (≤, «, €, ·, Dž). With these in
place, man/doc2tex.pl auto-translates the literal {<char>} forms in
overview.doc to {\Sname} in the generated .tex, so the .tex stays
ASCII-only and no longer triggers ltx2htm's non-ASCII-byte warnings.
- DOC: Update Unicode syntax section Rewrite man/overview.doc
§sec:unicodesyntax to reflect the new rules: XID_Start/XID_Continue
identifiers, super- and subscript digits as identifier-continue
extension, Pattern_White_Space layout set, all-S?/P? as solo
(with a note that this is a deliberate break from the previous
glueing behaviour), Lu-only uppercase (so Lt letters now start
atoms), and a hedge that NBSP is still treated as whitespace by the
legacy ISO Latin-1 table. Add a flag entry for the new read-only
unicode_syntax_version and fix a "curently" typo in max_char_code.
- TEST: Add Unicode syntax tests and a demo source file
tests/core/test_syntax_unicode.pl exercises the new XID-based
identifier rules, super- and subscript continuation, Pattern_White_Space
layout (LRM), all-S?/P? as solo (≤, ≤≤, «, €), Lt
starting an atom (Dž), mixed-script number rejection, same-script
Devanagari digits, and the new unicode_syntax_version flag.
Auto-discovered by tests/test.pl; runs as part of swipl:core.
- ENHANCED: Wire subscript digits and add unicode_syntax_version
flag Wire subscript digits (₀..₉) as XID_Continue extension
in prolog_syntax_map.pl, alongside the existing superscripts, and
regenerate pl-umap.c. Read the Unicode version from the header
of DerivedCoreProperties.txt and emit a tiny pl-umap-version.h
header with UNICODE_SYNTAX_VERSION. Register a read-only Prolog
flag unicode_syntax_version in pl-prologflag.c, distinct from
unicode:unicode_version/1 (which reports the linked utf8proc's view
of Unicode and may differ from the kernel classifier's).
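Because the superscript digits are split across two blocks, a classifier has to list them explicitly. A sketch of the twenty code points treated as the identifier-continue extension (is_super_sub_digit is an illustrative name; the kernel derives its table from prolog_syntax_map.pl):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative helper: superscript and subscript digits 0..9.
   Superscript one, two and three live in Latin-1; the others are in
   the Superscripts and Subscripts block (U+2072/U+2073 are unassigned). */
bool is_super_sub_digit(uint32_t c)
{
  return c == 0x00B9 || c == 0x00B2 || c == 0x00B3 ||  /* sup 1, 2, 3 */
         c == 0x2070 ||                               /* sup 0 */
         (c >= 0x2074 && c <= 0x2079) ||              /* sup 4..9 */
         (c >= 0x2080 && c <= 0x2089);                /* sub 0..9 */
}
```

Treating these as identifier-continue characters is what lets X² and X₁ each read as a single identifier.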
- DOC: Flag max_code_point has been 0x10ffff on all platforms for a while.
[Apr 29 2026]
- MODIFIED: Unicode interpretation; updated to Unicode 17 (was 14).
[Apr 23 2026]
- ADDED: el_get/2 and runtime bracketed_paste control. el_get(Stream,
editor(?Editor)) reads the current libedit editor (emacs / vi).
el_set/2 and el_get/2 grow a bracketed_paste(?Bool) property, and
editline.pl routes it through a new enable_bracketed_paste/1 helper
that skips (and unbinds) in vi mode.
Package PDT
[May 10 2026]
- RENAME: PDT.doc -> PDT.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package RDF
[May 10 2026]
- RENAME: RDF.doc -> RDF.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package archive
[May 10 2026]
- RENAME: archive.doc -> archive.plx (LaTeX manual source)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package bdb
[May 10 2026]
- RENAME: bdb.doc -> bdb.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package clib
[May 10 2026]
- RENAME: clib.doc -> clib.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package cpp
[May 10 2026]
- RENAME: pl2cpp.doc -> pl2cpp.plx (LaTeX manual source) Update pkg_doc
SOURCES reference. Co-Authored-By: Claude Opus 4.7 (1M context)
<noreply@anthropic.com>
[May 4 2026]
- ENHANCED: PlTerm::unify_blob() takes std::unique_ptr<T> for any T
derived from PlBlob. The unique_ptr<PlBlob> overload became a function
template unify_blob(std::unique_ptr<T>*) with a static_assert that T
derives from PlBlob. Callers can now pass std::unique_ptr<MyBlob>*
directly without an upcast through std::unique_ptr<PlBlob>; the
obsolete make_unique() workaround in the docs is removed.
Package cql
[May 10 2026]
- RENAME: cql.doc -> cql.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package http
[May 10 2026]
- RENAME: http.doc -> http.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package jpl
[May 10 2026]
- RENAME: jpl.doc -> jpl.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package json
[May 10 2026]
- RENAME: json.doc -> json.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package libedit
[May 10 2026]
- RENAME: libedit.doc -> libedit.plx (LaTeX manual source)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[May 6 2026]
- ADDED: bind ^Z to send EOF on Windows Matches the platform convention
for end-of-file (Unix uses ^D). Routed through libedit's built-in
em-delete-or-list / vi-list-or-eof, which already return CC_EOF on
an empty line.
[May 5 2026]
- FIXED: el_cursor() takes wchar_t units, electric() hands code points.
The Prolog electric() handler computes Move = Index - Len, where both
Index (matching_open's position) and Len (string_length of Before) are
CODE-POINT counts. The C wrapper passed Move straight to el_cursor(),
which adds it to el_line.cursor as wchar_t units.
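The mismatch is easiest to see on a buffer holding UTF-16, as wchar_t does on Windows: a code point outside the BMP occupies two units. A hedged sketch of converting a code-point delta into a unit delta before calling a unit-based API such as el_cursor(); cp_delta_to_units is a hypothetical helper, not necessarily how the fix was written:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical helper: convert a signed move of cp_delta code points,
   starting at unit offset 'cursor' in a UTF-16 buffer of 'len' units,
   into a signed move measured in UTF-16 units. Stops at buffer edges. */
ptrdiff_t cp_delta_to_units(const uint16_t *buf, size_t len,
                            size_t cursor, ptrdiff_t cp_delta)
{
  size_t pos = cursor;

  for ( ; cp_delta < 0 && pos > 0; cp_delta++ )     /* move left */
  { pos--;
    if ( pos > 0 && is_low_surrogate(buf[pos]) &&
         is_high_surrogate(buf[pos-1]) )
      pos--;                                        /* whole pair */
  }
  for ( ; cp_delta > 0 && pos < len; cp_delta-- )   /* move right */
  { if ( pos+1 < len && is_high_surrogate(buf[pos]) &&
         is_low_surrogate(buf[pos+1]) )
      pos += 2;
    else
      pos += 1;
  }
  return (ptrdiff_t)pos - (ptrdiff_t)cursor;
}
```

For the buffer a, U+1F600, b with the cursor at the end, moving back two code points is a move of three units.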
- ENHANCED: install PL_wcwidth as libedit's wcwidth implementation
libedit no longer ships its own mk_wcwidth table;
el_set(EL_WCWIDTH, fn) injects one. Wire PL_wcwidth here so swipl and
swipl-win share the
kernel's src/mk_wcwidth.c with pl-write, pl-fmt, xpce — one table,
one source of truth, no more drift between layers.
- FIXED: pl_line and el_history_encoded use REP_EL not REP_MB
PL_unify_chars(PL_STRING|REP_MB, ...) decodes the bytes via the
C runtime's mbrtowc. On Windows with the legacy ANSI codepage as
LC_CTYPE that fails for any byte above 0x7F — and libedit hands us
UTF-8 bytes there (ct_encode_char unconditionally calls utf8_put_char
on Windows).
[Apr 23 2026]
- ADDED: bracketed_paste(?Boolean) property to el_set/2 and el_get/2
Bracketed paste mode is now tracked per el_context and can be toggled
at runtime. el_set/2 immediately emits the enable (ESC[?2004h) or
disable (ESC[?2004l) sequence when the value changes; el_get/2 reads
the current state.
- ADDED: el_get/2 to query editline properties. Initially supports
editor(?Editor), unifying with emacs or vi via EL_EDITOR.
Unknown properties raise a domain_error(editline_property, _).
Package ltx2htm
[May 10 2026]
- RENAME: ltx2htm.doc -> ltx2htm.plx (LaTeX manual source)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- ENHANCED: ltx2htm: TOK_VERB / TOK_VERBATIM use UTF-8 round-trip The
verbatim/verb code paths still passed body strings through PL_CHARS /
PL_STRING (Latin-1), which double-encoded UTF-8 bytes when combined
with the new REP_UTF8 output side: a literal \verb= 'X²' = in source
rendered as 'XÂ²' in HTML.
- ENHANCED: Make ltx2htm UTF-8 aware The tokenizer used to read .tex
files as raw bytes and warn for every byte >= 128, and the HTML writer
used PL_atom_chars which is Latin-1 only. As a result, .tex authors
had to route every non-ASCII glyph through urldef in pl.sty.
[Apr 30 2026]
- ENHANCED: Render non-ASCII urldefs as numeric HTML entities Adds
a block of \urldef{\Sname}\satom{<char>} entries to pl.sty for the
super-/subscript digits and a handful of math and typographic symbols
(≤, ≥, «, », €, ·, Dž), registers each \Sname in pldoc.cmd
as a known no-arg command, and adds explicit cmd/2 handlers in
sty_pldoc.pl that emit numeric HTML entities for these names.
Package mqi
[May 10 2026]
- RENAME: mqi.doc -> mqi.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package nlp
[May 10 2026]
- RENAME: nlp.doc -> nlp.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package odbc
[May 10 2026]
- RENAME: odbc.doc -> odbc.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package paxos
[May 10 2026]
- RENAME: paxos.doc -> paxos.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package pcre
[May 10 2026]
- RENAME: pcre.doc -> pcre.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package pengines
[May 10 2026]
- RENAME: pengines.doc -> pengines.plx (LaTeX manual source)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package pldoc
[May 10 2026]
- FIXED: help/1: tell SGML parser the manual is UTF-8 load_man_object/4
opens the HTML file in binary mode (so byte offsets stay meaningful
for seek/3) and feeds the bytes to sgml_parse/2 directly. Without an
explicit encoding option the parser defaults to ISO-8859-1, so UTF-8
multi-byte sequences in the manual were read as Latin-1 code points;
help(unicodesyntax) then displayed X² as XÂ² and so on.
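The bytes behind that symptom can be reproduced in a few lines of C. This sketch (latin1_misdecode is a hypothetical name) treats every byte as an ISO-8859-1 code point, as a parser defaulting to Latin-1 does, and re-encodes the result as UTF-8 for display:

```c
#include <assert.h>
#include <stddef.h>

/* Simulate the bug: decode each byte as Latin-1 (byte value == code
   point) and re-encode as UTF-8. Bytes >= 0x80 become two-byte
   sequences. 'out' needs room for 2*len + 1 bytes. Returns the number
   of bytes written, excluding the terminating NUL. */
size_t latin1_misdecode(const unsigned char *in, size_t len, char *out)
{
  size_t o = 0;

  for (size_t i = 0; i < len; i++)
  { unsigned char b = in[i];

    if ( b < 0x80 )
    { out[o++] = (char)b;                     /* ASCII is unchanged */
    } else                                    /* U+0080..U+00FF */
    { out[o++] = (char)(0xC0 | (b >> 6));
      out[o++] = (char)(0x80 | (b & 0x3F));
    }
  }
  out[o] = '\0';
  return o;
}
```

Fed the UTF-8 bytes of X² (58 C2 B2), it produces 58 C3 82 C2 B2, which a UTF-8 display shows as XÂ².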
- RENAME: pldoc.doc -> pldoc.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Apr 30 2026]
- DOC: Add urldefs for non-ASCII Unicode characters used in manual
prose Adds a block of \urldef{\Sname}\satom{<char>} entries for the
super-/subscript digits and a handful of mathematical and typographic
symbols (≤, ≥, «, », €, ·, Dž). This lets doc2tex.pl in
swipl-devel translate `{<char>}` in .doc to {\Sname} in .tex,
keeping the .tex ASCII-only and avoiding the non-ASCII warnings from
ltx2htm's tex.c reader.
Package plunit
[May 10 2026]
- RENAME: plunit.doc -> plunit.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package protobufs
[May 10 2026]
- RENAME: protobufs.doc -> protobufs.plx (LaTeX manual source)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package redis
[May 10 2026]
- RENAME: redis.doc -> redis.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package semweb
[May 10 2026]
- RENAME: semweb.doc and rdflib.doc -> .plx (LaTeX manual source)
Update pkg_doc SOURCES reference. Co-Authored-By: Claude Opus 4.7
(1M context) <noreply@anthropic.com>
Package sgml
[May 10 2026]
- RENAME: sgml.doc -> sgml.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package ssl
[May 10 2026]
- RENAME: ssl.doc and crypto.doc -> .plx (LaTeX manual source) Update
pkg_doc SOURCES reference. Co-Authored-By: Claude Opus 4.7 (1M context)
<noreply@anthropic.com>
Package stomp
[May 10 2026]
- RENAME: stomp.doc -> stomp.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package sweep
[May 10 2026]
- RENAME: sweep.doc -> sweep.plx (LaTeX manual source) Update pkg_doc
SOURCES reference. Co-Authored-By: Claude Opus 4.7 (1M context)
<noreply@anthropic.com>
Package swipy
[May 10 2026]
- RENAME: swipy.doc -> swipy.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package table
[May 10 2026]
- RENAME: table.doc -> table.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package tipc
[May 10 2026]
- RENAME: tipc.doc -> tipc.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package utf8proc
[May 10 2026]
- RENAME: utf8proc.doc -> utf8proc.plx (LaTeX manual source)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- PORT: Fix detection and avoid dependency on ssize_t
[May 4 2026]
- FIXED: Safe unification for unicode enum values.
[Apr 30 2026]
- ASAN: Add obsolete boundclasses The change is needed to establish the
1:1 correspondence to the boundclasses in the original utf8proc.h.
Prevents an ASAN error on my system.
[May 1 2026]
- TEST: Migrate normalize-hook tests to the unicode_atoms vocabulary
Rename the suite from utf8proc_normalize_hook to utf8proc_unicode_atoms
and reshape its tests to use the multi-valued unicode_atoms policy
that replaces the boolean unicode_normalize flag and normalize(true)
read_term option. Cover all four modes (accept / nfc / error /
reject), stream_property/2 read-back, the flag-driven default, the
writeq combining-mark force-quoting (independent of this library),
and Trojan-source bidi-override rejection in unquoted atoms, quoted
atoms, and via \u escape bypass.
- DOC: category(Cat) using Upper-lower
[Apr 30 2026]
- ENHANCED: Register utf8proc as the kernel's Unicode NFC hook
Loading library(unicode) now installs an NFC normaliser into
the SWI-Prolog kernel via PL_atom_normalize_hook(), enabling the
normalize(true) option of read_term/2,3 and read_clause/2,3 and the
default value of the unicode_normalize Prolog flag. The callback
uses utf8proc_map with STABLE|COMPOSE in place: NFC is always shorter
than or equal to the input, so the caller's buffer suffices and there
is no malloc/free dance. Also extend test_utf8proc.pl with a new
utf8proc_normalize_hook suite covering the round-trip, the per-call
option vs. flag-default override, the quoted-atom isolation rule,
and the writer's combining-mark force-quoting.
- DOC: Cross-reference unicode_syntax_version Prolog flag Note in
unicode_version/1 that the Unicode version reported by the library
may differ from the version of the SWI-Prolog source syntax classifier
exposed via the new read-only Prolog flag unicode_syntax_version.
[Apr 23 2026]
- SANDBOX: Declare the Unicode API as safe.
Package xpce
[May 10 2026]
- RENAME: xpce LaTeX manual sources .doc -> .plx Renames the
63 LaTeX .doc files under man/{course,interface,userguide}
and prolog/lib/{draw,trace/doc} to .plx, leaving the XPCE
man-card binary database under man/reference/ untouched. Updates
man/userguide/Makefile suffix rule. Co-Authored-By: Claude Opus 4.7
(1M context) <noreply@anthropic.com>
[May 8 2026]
- FIXED: Thread monitor icons Reported by Mike Elston.
[May 6 2026]
- MODIFIED: auto-copy behaviour Added the class variable auto_copy
to class
terminal_image as well. It now appears on text_item,
editor and terminal_image. It now defaults to @off for Windows
and MacOS and for now to @on for other platforms. To switch, use
this in the preferences file (or @off to disable on Linux, etc.).
- ENHANCED: SDL backend Windows font specs cover Thai and Yi Add
Leelawadee UI (Thai/Lao, ships since Windows 8) and Microsoft Yi
Baiti (Yi syllables, ships since Vista) to the mono, sans and serif
fallback chains.
- FIXED: PceEmacs class menu to edit a class raised a type error.
- FIXED: XPCE native finder icons.
- ENHANCED: SDL backend Windows font specs include CJK and symbol
fallbacks Pango uses the comma-separated family list as an explicit
fallback chain; it does not synthesise script-based fallbacks the way
Windows DirectWrite does for Notepad. The previous mono/sans/serif
specs covered only Latin, so Korean (Hangul), Japanese, Chinese and
many symbol/emoji glyphs rendered as tofu on Windows.
- ENHANCED: display: expose SDL on-screen-keyboard control Add three
methods on class display so the Prolog side can both control and
observe SDL's on-screen-keyboard policy without recompiling for
DEBUG output:
- ENHANCED: font <-domain unions Pango fontset coverage. Drop the
vestigial X11-era `which` argument (the row/column selector of
2D charset fonts) and replace it with family=[bool], mirroring
the new font ->member argument. The default takes the union of
PangoCoverage across the requested font and its fallback chain, so
the returned envelope is consistent with `->member`: a code point
outside the domain reliably fails `->member`, while emoji and other
fallback-covered ranges are now inside it. The full-unicode scan is
cached on the WsFont, so the cost is paid once per font.
- ENHANCED: font ->member walks Pango fallback chain. font ->member:
char consulted only the primary PangoFont, so it returned @off for
code points (e.g. U+1F600) that the system can in fact render via
Pango fallback. Walk a PangoFontset for the font's description by
default, and add an optional family=[bool] argument so callers can
force the old "primary font only" check.
[May 5 2026]
- DEBUG: optional terminal trace under -DXPCE_TERM_TRACE Build with
-DXPCE_TERM_TRACE to compile in caret / byte-stream diagnostic logging.
At runtime set XPCE_TERM_TRACE=<file> (or "1" for xpce-terminal.log
in CWD) to capture activity. Inert and zero-cost without the macro:
tlog collapses to ((void)0).
- FIXED: str_ring_alloc no longer marks the result s_readonly. s_readonly
was overloaded: most call sites (str_set_static, staticCtoString,
the "(nil)" sentinel) used it to mean "permanent, shareable storage"
and rely on initialiseCharArray's fast-path to share the pointer.
str_ring_alloc, in contrast, hands out a temporary slot that the
next 16 ring allocations recycle — sharing that pointer is fatal.
- TEST: SMP code-point fragment positions and round-trip
test_fragment_smp.pl exercises text_buffer indexing and fragment
start/length values on supplementary-plane code points. 10 tests
covering buffer size in code points, get(_,character,Pos,_) at and
after an SMP slot, fragment start/length pointing past or spanning
an SMP code point (emoji and CJK Ext B), shift_fragments accounting
on insert before/across an SMP, a 10-cluster stress, and save/load
round-trip through the textbuffer file format.
- CLEANUP: drop now-dead surrogate-aware paths. In u16_range_length,
the SIZEOF_WCHAR_T == 2 branch returned len unchanged because Windows
used to store UTF-16 in the buffer, so a
range of N code units held N UTF-16 units. charW is now uint32_t on
Windows too; each SMP code point in the range needs to count as 2
UTF-16 units regardless of platform.
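The count this implies can be sketched as follows; utf16_units is a hypothetical helper over 32-bit code points, not the actual u16_range_length:

```c
#include <stddef.h>
#include <stdint.h>

/* Number of UTF-16 code units needed for n code points held as
 * 32-bit values: code points above U+FFFF need a surrogate pair. */
static size_t
utf16_units(const uint32_t *cps, size_t n)
{ size_t units = 0;

  for (size_t i = 0; i < n; i++)
    units += (cps[i] > 0xFFFF) ? 2 : 1;
  return units;
}
```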
[May 4 2026]
- FIXED: replace wint_t with uchar_t (or int) for code-point holders
MinGW's wint_t is unsigned short (16 bits), so any code-point value
flowing through a wint_t variable on Windows truncates SMP code points
to their low 16 bits. This was the actual cause of emoji rendering
as PUA glyphs (U+1F30F shown as U+F30F): not the storage type, but
the many `wint_t c = str_fetch(...)` and similar declarations between
the buffer and the rendering call.
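A small illustration of the truncation, using uint16_t as a stand-in for MinGW's 16-bit wint_t and uint32_t as a stand-in for uchar_t; the helper names are hypothetical:

```c
#include <stdint.h>

/* Stand-in for a 16-bit wint_t as found on MinGW. */
typedef uint16_t wint16;

/* Pass a code point through a 16-bit holder: only the low 16 bits survive. */
static uint32_t
through_wint16(uint32_t cp)
{ wint16 c = (wint16)cp;
  return c;
}

/* Pass it through a 32-bit holder instead: the value is preserved. */
static uint32_t
through_uint32(uint32_t cp)
{ uint32_t c = cp;
  return c;
}
```

With these, U+1F30F comes back as 0xF30F from the 16-bit path, the Private Use Area code point the entry describes.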
- FIXED: c_width takes uchar_t (full code point) not wint_t MinGW's
wint_t is unsigned short (16 bits), so the (wint_t) casts at the
textimage do_fill_line call sites truncated the supplementary-plane
code points stored in tc->value.c (now charW = uint32_t on Windows).
The result: emoji rendered as their PUA-area low 16 bits (e.g. U+1F30F
shown as U+F30F) and produced .notdef glyphs from Pango.
- FIXED: F_UTF8_ENCLENW iterates code points, not wchar_t units The
previous loop encoded each wchar_t as a separate UTF-8 sequence.
On 16-bit wchar_t platforms (Windows) a supplementary-plane code
point arrives as a UTF-16 surrogate pair, so each half emitted its own
3-byte UTF-8 sequence (and the surrogate's encoding form is invalid
UTF-8 anyway). Use get_wchar() to combine pairs first; on 32-bit
wchar_t platforms get_wchar is a per-element copy.
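A sketch of the corrected length computation over a UTF-16 buffer; utf8_len_utf16 is a hypothetical name, and as a simplification a lone surrogate is measured like any other BMP unit (3 bytes):

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode one code point as UTF-8. */
static size_t
utf8_len_cp(uint32_t cp)
{ if ( cp < 0x80 )    return 1;
  if ( cp < 0x800 )   return 2;
  if ( cp < 0x10000 ) return 3;
  return 4;
}

/* UTF-8 length of n UTF-16 units, combining each valid surrogate
 * pair into one code point before measuring it. */
static size_t
utf8_len_utf16(const uint16_t *s, size_t n)
{ size_t bytes = 0;

  for (size_t i = 0; i < n; i++)
  { uint32_t cp = s[i];

    if ( cp >= 0xD800 && cp <= 0xDBFF && i+1 < n &&
         s[i+1] >= 0xDC00 && s[i+1] <= 0xDFFF )
    { cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i+1] - 0xDC00);
      i++;                              /* consumed the trail as well */
    }
    bytes += utf8_len_cp(cp);
  }
  return bytes;
}
```

For U+1F600 (D83D DE00) this yields 4 bytes, where encoding each half separately would have produced 6.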
- ENHANCED: regex CHR width matches charW unconditionally Previously
CHRBITS / CHR_MAX were 16 / 0xffff under WINDOWS and 32 /
0x3fffffff elsewhere, because chr typedef'd to charW (=wchar_t)
which is 16-bit on MinGW. charW is now uint32_t on platforms where
wchar_t is too narrow (Windows), so the regex chr can hold the full
Unicode code-point range there too. The colormap is a sparse trie
(NBYTS = 4 levels), so the larger CHR_MAX does not bloat memory.
- FIXED: ENC_WCHAR stream read paths emit wchar_t, not charW Sread_object
(iostream.c) and pceRead's wide branch (asfile.c) wrote the internal
charW buffer into the stream's wchar_t-typed output buf and reported
the byte count as wchar_t-units. When charW is wider than wchar_t
(Windows post-flip) that wrote 4-byte values into a 2-byte-per-unit
buffer. Convert at the boundary: assign wchar_t per charA-source
char; surrogate-encode via charW_to_wchar for charW sources. When the
typedef equality holds (Linux always; Windows when charW falls back
to wchar_t) the conversion folds to a per-element copy.
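The surrogate encoding that a charW to wchar_t conversion has to perform when wchar_t is 16 bits looks roughly like this; cp_to_utf16 is a hypothetical helper, not the charW.h API:

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16 into out[0..1]; returns the number
 * of 16-bit units written (1 for BMP, 2 for a surrogate pair). */
static size_t
cp_to_utf16(uint32_t cp, uint16_t out[2])
{ if ( cp < 0x10000 )
  { out[0] = (uint16_t)cp;
    return 1;
  }
  cp -= 0x10000;
  out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* lead (high) surrogate */
  out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* trail (low) surrogate */
  return 2;
}
```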
- FIXED: route wchar_t<->charW boundary conversions through charW.h charW
is now uint32_t when wchar_t is too narrow (Windows). Adjust the
few sites where wchar_t (caller) meets charW (storage) so neither
side silently truncates supplementary-plane code points:
- ENHANCED: charW falls back to uint32_t when wchar_t is too narrow
On Windows wchar_t is 16-bit UTF-16; storing a supplementary-plane
code point as a surrogate pair across two slots is the source of the
fragment-indexing bug fixed in subsequent commits. Pick uint32_t for
charW when WCHAR_MAX <= 0xFFFF, and otherwise keep charW as wchar_t
— on Linux/macOS the typedef stays exactly as it was, so the pair
of boundary conversions added alongside fold to compile-time identity
and the binary is unchanged.
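The selection rule can be expressed with the preprocessor; this is a sketch of the idea only, and the actual charW.h may be written differently:

```c
#include <stdint.h>
#include <wchar.h>

/* Keep wchar_t where it already holds any code point (32-bit on
 * Linux/macOS); fall back to uint32_t where it is 16-bit (Windows). */
#if WCHAR_MAX <= 0xFFFF
typedef uint32_t charW;
#else
typedef wchar_t charW;
#endif

_Static_assert(sizeof(charW) >= 4, "charW must hold any Unicode code point");
```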
- ADDED: charW.h boundary helpers between wchar_t and internal charW
Adds wchar_to_charW / charW_to_wchar inlines (and length variants).
Used at the few external boundaries where xpce sees wchar_t buffers
from outside (host embedding API, ENC_WCHAR streams, Win32 file APIs).
Internal text-handling will keep its charW typing.
- FIXED: c_width returns reasonable width for UTF-16 surrogate halves On
Windows (wchar_t==16) supplementary-plane code points reach c_width as
surrogate halves. Encoding a lone surrogate as UTF-8 and asking Pango
for its advance returned the .notdef-glyph width — typically much
wider than the actual emoji or wide glyph the paired UTF-8 renders.
The accumulated x in do_fill_line then over-advanced past where the
glyph was painted, leaving a multi-cell whitespace gap to the right
of each emoji. Account the cluster's width once on the lead (two
average char widths, matching uchar_display_width's column count of
2 for SMP wide glyphs) and let the trail contribute zero.
- ENHANCED: uchar_display_width recognises UTF-16 surrogate halves
On Windows (wchar_t==16) supplementary-plane code points appear
as surrogate pairs in xpce's wide-char buffers. Treat the lead as
double-width (the SMP ranges xpce typically encounters — emoji,
CJK Extensions B-G — are all wide) and the trail as zero-width so
each pair contributes exactly two columns to vcol, the per-cluster
painter groups the pair as one wide cluster, and grapheme_cluster_end /
_start advance by exactly one emoji per arrow press.
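A sketch of the lead/trail rule over 16-bit units, assuming every SMP cluster is wide; the helper names are hypothetical and non-surrogate units are simplified to width 1:

```c
#include <stddef.h>
#include <stdint.h>

/* Display width of one UTF-16 unit under the lead=2 / trail=0 rule. */
static int
unit_display_width(uint16_t u)
{ if ( u >= 0xD800 && u <= 0xDBFF )
    return 2;                   /* lead: stands for the whole wide cluster */
  if ( u >= 0xDC00 && u <= 0xDFFF )
    return 0;                   /* trail: already counted with its lead */
  return 1;                     /* simplification for all other units */
}

/* Total columns for a line of n UTF-16 units. */
static int
line_columns(const uint16_t *s, size_t n)
{ int cols = 0;

  for (size_t i = 0; i < n; i++)
    cols += unit_display_width(s[i]);
  return cols;
}
```

Under this rule each surrogate pair contributes exactly two columns, so "A", an emoji and "B" occupy four columns in total.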
- FIXED: textbuffer file load/save handles UTF-16 surrogate pairs On
Windows (wchar_t==16) a supplementary-plane code point is stored as a
UTF-16 surrogate pair in the wide-char buffer. insert_file_textbuffer
truncated SMP code points by direct assignment; save_textbuffer fed
each surrogate half to Sputcode separately, producing invalid UTF-8.
Use put_wchar / get_wchar so loaded files contain proper pairs
and saved files emit the combined code point. No-op on Linux
(wchar_t==32).
[May 3 2026]
- ENHANCED: route THasSyntaxEx fallbacks through the host Add
five callback slots to pce_callback_functions — is_letter,
is_word_char, is_layout, is_digit, is_endsline — named after xpce's
own syntax-table categories. They return bool, since they are pure
predicates. Host wrappers in itf/interface.c; swipl/interface.c maps
each to one or more PL_is_* shims (and hard-codes is_endsline's seven
line-terminator codepoints, which need no Prolog API of their own).
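A sketch of a pure bool predicate sitting in such a callback slot. It assumes the seven code points are the Unicode newline set (LF, VT, FF, CR, NEL, LS, PS); the struct and function names are hypothetical, not the pce_callback_functions layout:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed set of seven line terminators (Unicode newline guidelines). */
static bool
example_is_endsline(uint32_t c)
{ switch ( c )
  { case 0x000A:                /* LF */
    case 0x000B:                /* VT */
    case 0x000C:                /* FF */
    case 0x000D:                /* CR */
    case 0x0085:                /* NEL */
    case 0x2028:                /* LINE SEPARATOR */
    case 0x2029:                /* PARAGRAPH SEPARATOR */
      return true;
    default:
      return false;
  }
}

/* Hypothetical callback table with one predicate slot. */
typedef struct
{ bool (*is_endsline)(uint32_t c);
} example_callbacks;

static const example_callbacks host_callbacks =
{ .is_endsline = example_is_endsline
};
```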
- FIXED: terminal paint_chunks splits per cluster for non-ASCII text When
the system fixed-width font has no glyphs for a script (e.g. Thai),
Pango falls back to a proportional font and shapes a multi-char run at
its natural advance. A line painted as one Pango call rendered fine,
but with a selection — three chunks (before-sel / sel / after-sel)
— each chunk reshaped independently and the after-sel half drifted
right because it started at a column-grid x rather than where the
unselected line had placed those glyphs.
[May 2 2026]
- ENHANCED: route uchar_display_width()'s wcwidth fallback through the
host Add a wcwidth slot to pce_callback_functions (replacing pad17),
expose hostWcWidth() in itf/interface.c, and let h/charwidth.h's static
inline delegate to it instead of calling system wcwidth(3) directly.
The swipl-side callback registers PL_wcwidth(), so xpce now uses
the same implementation as pl-read.c / pl-write.c regardless of the
process LC_CTYPE.
- FIXED: convert sel_*_char from cell index to visual column when
painting In rlc_redraw the selection bounds (sel_start_char /
sel_end_char) are cell indices — they index tl->text[] in
rlc_set_selection's snap helpers and in rlc_read_from_window, and
they're already documented that way at the call sites of rlc_snap_start
/ rlc_snap_end. rlc_paint_text in contrast takes visual-column
bounds (see its own header comment). On a line without combining
marks the two are interchangeable, so this never showed up before;
on NFD content (e.g. Thai with U+0E31, U+0E35) the cell index drifts
ahead of the visual column by one per combining mark, and the painted
selection extends past where the user dragged.
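Converting a cell index to a visual column amounts to summing the display widths of the cells before it; a minimal sketch with a hypothetical cell_to_vcol over a precomputed width array:

```c
#include <stddef.h>

/* Visual column of cell `cell`, given per-cell widths
 * (0 = combining mark, 2 = wide glyph, 1 = anything else). */
static int
cell_to_vcol(const int *widths, size_t cell)
{ int vcol = 0;

  for (size_t i = 0; i < cell; i++)
    vcol += widths[i];
  return vcol;
}
```

For สวัสดี the widths are 1,1,0,1,1,0, so cell 4 (ด) sits at visual column 3: one column to the left of its cell index, because of the single combining mark before it.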
[Apr 28 2026]
- FIXED: forward/backward char and delete move by grapheme cluster not
code point For NFD text a visible character may span multiple code
points (a base followed by one or more combining marks). The four
basic editing operations previously stepped one code point at a time,
leaving the caret stranded inside a cluster or deleting only part of
a grapheme.
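Stepping by cluster can be sketched as skipping the combining marks that follow (or precede) a base character. This ignores the full Unicode segmentation rules (UAX #29), and is_combining covers only a few ranges as an assumption; all names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rough combining-mark test: Latin diacritics and some Thai marks only. */
static bool
is_combining(uint32_t c)
{ return (c >= 0x0300 && c <= 0x036F) ||   /* Combining Diacritical Marks */
         c == 0x0E31 ||                     /* THAI CHARACTER MAI HAN-AKAT */
         (c >= 0x0E34 && c <= 0x0E3A) ||   /* Thai above/below vowels */
         (c >= 0x0E47 && c <= 0x0E4E);     /* Thai tone and other marks */
}

/* Index just past the cluster that starts at i (forward-char). */
static size_t
cluster_end(const uint32_t *s, size_t len, size_t i)
{ if ( i < len )
    i++;                                    /* the base character */
  while ( i < len && is_combining(s[i]) )
    i++;                                    /* its combining marks */
  return i;
}

/* Start of the cluster that ends at i (backward-char). */
static size_t
cluster_start(const uint32_t *s, size_t i)
{ while ( i > 0 && is_combining(s[i-1]) )
    i--;                                    /* marks before the caret */
  if ( i > 0 )
    i--;                                    /* then their base */
  return i;
}
```

A forward delete would then remove the range from i up to cluster_end(s, len, i) rather than a single code point.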
- TEST: unicode_heavy.pl — Prolog test file for NFD and wide-char
rendering Add tests/unicode_heavy.pl with syntactically valid Prolog
that stresses editor and terminal rendering of:
- ADDED: nfd_style instance variable for visual NFD cluster highlighting
Add an nfd_style instance variable (Style*) to both Editor and
TerminalImage, mirroring the existing selection_style pattern.
When set to a style with a background colour, every NFD grapheme
cluster — a base character followed by one or more zero-width
combining marks — is drawn with that background. The default is @nil
(disabled), so existing editors are unaffected.
[Apr 27 2026]
- FIXED: paint_line renders Unicode highlights without visual drift
Four related fixes across txt/textimage.c and sdl/sdldraw.c:
- FIXED: Snap selection endpoints to grapheme cluster boundaries
selectionExtendEditor computes the selection from/to from the mouse
click position. If that position falls inside a combining-mark cluster
(e.g. a Thai vowel sign such as ◌ั following its base consonant),
the raw character index points mid-cluster: the base character's
x position is shared by the following zero-advance combiners, so
selecting or deselecting just the combiner produces a mis-sized
highlight.
- FIXED: Selection drag corrupts text at grapheme cluster boundaries
When a mouse selection boundary fell inside a grapheme cluster
(a base character followed by one or more zero-advance combining
marks, such as the Thai above-base vowel signs ั ี ่ ้ in
สวัสดีชาวโลก), the paint_line() run-grouping
loop split the cluster across two separate s_printW() calls.
Pango then rendered base and combining mark as independent glyphs,
causing visual corruption visible whenever the TXT_HIGHLIGHTED
attribute changed between the base and its mark.
- TEST: Add unicode_heavy.pl — public domain Unicode stress test
Valid Prolog file using NFD combining marks (acute, diaeresis, tilde,
cedilla, circumflex) as literal UTF-8 bytes alongside East-Asian
double-width characters (CJK ideographs, hiragana, katakana, hangul).
- TEST: Add test_editor_unicode.pl for Unicode visual-column tracking
Five PLUnit tests exercise the new Editor <-visual_column getter:
- FIXED: editor column functions account for CJK wide and combining
chars getColumnEditor() and getColumnLocationEditor() previously
counted every non-tab character as one visual column. This broke
caret positioning after CJK double-width characters (each counts as 2)
and after combining marks (each counts as 0).
- ENHANCED: Add vcol visual-column cache to TextChar; populate in
do_fill_line() struct text_char gains a `short vcol` field that
records the 0-based visual column of each character on a displayed
line. do_fill_line() fills it alongside the existing pixel-x field,
advancing current_vcol by uchar_display_width() for each character
(0 for combining marks, 2 for CJK wide chars, 1 for everything else).
- REFACTOR: Extract uchar_display_width() into shared h/charwidth.h
Move the Unicode display-width logic (combining=0, CJK wide=2, other=1)
from a static function in txt/terminal.c into a new static-inline
header h/charwidth.h. This makes the same function available to
txt/textimage.c and txt/editor.c without code duplication.
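For reference, the three-way classification reads roughly as follows; display_width_sketch is hypothetical and its ranges are a small illustrative subset, not the table in h/charwidth.h:

```c
#include <stdint.h>

/* 0 for combining marks, 2 for some East Asian wide blocks, 1 otherwise. */
static int
display_width_sketch(uint32_t c)
{ if ( c >= 0x0300 && c <= 0x036F )       /* combining diacritics */
    return 0;
  if ( (c >= 0x1100 && c <= 0x115F) ||     /* Hangul Jamo leading consonants */
       (c >= 0x3040 && c <= 0x30FF) ||     /* Hiragana, Katakana */
       (c >= 0x4E00 && c <= 0x9FFF) ||     /* CJK Unified Ideographs */
       (c >= 0xAC00 && c <= 0xD7A3) ||     /* Hangul Syllables */
       (c >= 0x20000 && c <= 0x2FFFD) )    /* CJK Extension B and later */
    return 2;
  return 1;
}
```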
[Apr 26 2026]
- FIXED: Set both primary and secondary selection in display->copy
Package yaml
[May 10 2026]
- RENAME: yaml.doc -> yaml.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package zlib
[May 10 2026]
- RENAME: zlib.doc -> zlib.plx (LaTeX manual source) Co-Authored-By:
Claude Opus 4.7 (1M context) <noreply@anthropic.com>