SWI-Prolog Changelog from version 10.1.6 to 10.1.7

[May 10 2026]

  • SUBMODULES: Bump pldoc for help/1 UTF-8 fix Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  • SUBMODULES: Bump pointers for documentation rework 34 submodules carry the .doc -> .plx LaTeX-source rename; ltx2htm additionally carries the UTF-8 awareness (PL_unify_chars REP_UTF8 on input, PL_get_chars REP_UTF8 + PL_UTF8_STRING on output, charset meta tag in HTML). Companion to the master-side switch to lualatex and the Unicode-Prolog-source examples in overview.plx.
  • RENAME: SWI manual sources .doc -> .plx The '.doc' extension predates Microsoft Word's appropriation of it and confuses file managers, editors (no LaTeX syntax highlighting), and new contributors. Rename to '.plx' (Prolog LaTeX) — single extension, no collision inside SWI-Prolog, .gitattributes maps it to TeX for editor support and diff hunks.
  • DOC: Use literal Unicode in the Unicode Prolog source section Drop the {...} urldef wrappers around literal characters described in the section (², ³, ¹, ⁰..⁹, ₀..₉, Dž, «hello, world», «»). Add a code block at the end of the section with concrete, runnable ?- queries covering: superscript variables, Devanagari Nd via atom_number/2, Unicode symbols as solo atoms, bracket pairs Ps/Pe, quote pairs Pi/Pf.
  • DOC: overview.doc: use literal Unicode characters Smoke test for the new UTF-8 pipeline: drop the {...} urldef wrapping around ≤, €, and ·. Both lualatex and ltx2htm now read these bytes directly. The rendered output (PDF and HTML) is unchanged.
  • DOC: Move lualatex font setup from main.doc into pl.sty ltx2htm parses main.doc as TeX and emits warnings for unknown commands like \ifdefined and \directlua. .sty files are not parsed as TeX (they are only read for urldef extraction), so the conditional becomes invisible. Also override fontspec's \strong via providecommand + renewcommand so pl.sty's bold-strong definition wins under lualatex.
  • DOC: Build PDF manual with lualatex instead of pdflatex Switches the PDF documentation toolchain to lualatex. The engine swap is the first step of a wider effort to allow literal Unicode in .doc and Markdown sources (currently routed through urldef in pl.sty).
  • WASM: Included library(unicode)

[May 8 2026]

  • DOC: PIP §4.11 — describe the four-mode unicode_atoms policy as shipped The §4.11 "Pluggable Unicode normalisation" body still described the old design — a Boolean read_term/2 option normalize(Bool) backed by a Boolean Prolog flag unicode_normalize — neither of which exist in the current implementation. What actually shipped is the four-mode unicode_atoms family (accept / nfc / error / reject), surfaced through:
  • ENHANCED: code_type/2 prolog_layout + prolog_end_of_line; rebind end_of_line The Pattern_White_Space set used by the Prolog reader for layout had no user-facing accessor in code_type/2 / char_type/2. The existing end_of_line was extended in the line-termination commit to the seven Pattern_White_Space line-terminator-like code points; that broadened the ISO Prolog meaning silently.

[May 7 2026]

  • DOC: stray-character policy; quoted material accepts any Unicode scalar After Stage 4's cat-direct dispatch refactor, the parser's default arm of the c >= 0x80 switch raises syntax_error(illegal_character) for every U_CAT_OTHER code point. That is broader than what man/overview.doc still claimed ("Control and unassigned (C*) characters produce a syntax error ... outside quoted atoms/strings and outside comments"), and narrower than what the section claimed about quoted material (it was silent on what is allowed there).
  • DOC: lock atom_number/2 family Unicode policy with tests + manual prose Empirical audit of str_number() in pl-read.c against the policy documented in plan A's M2 doc fix confirms the kernel already enforces both rules:
  • DOC: PIP §4.4.1 — line termination on Pattern_White_Space line-enders Add a subsection to White space recording the seven line-terminator-like Pattern_White_Space code points (LF, VT, FF, CR, NEL, LS, PS) and the three places they apply: %-comment termination, source-position line counter, and backslash-newline continuation in quoted strings. Calls out that the remaining four Pattern_White_Space members (TAB, SPACE, LRM, RLM) are layout but not line-enders. Cross-references code_type(C, end_of_line) / char_type(C, end_of_line) for user-code access to the same set.
  • DOC: document the seven line-terminator-like Pattern_White_Space chars Two manual updates to match the implementation:
  • ENHANCED: terminate %-comments on every Pattern_White_Space line-ender The %-comment scanner, the line-counter, the block-comment LF preservation, the \\<newline> continuation in raw_read_quoted, the ensure_space whitespace-collapsing macro, and the escape_char \\<EOL>* skip-blanks site all checked c == '\n' (LF only). Three of the seven Pattern_White_Space line-terminator-like code points were silently ignored: NEL (U+0085), LS (U+2028), PS (U+2029). ASCII CR / VT / FF were also missed by the comment scanner.
  • DOC: PIP unicode.md — paired delimiters and PL_wcwidth vs POSIX Sync the PIP draft with the design landed on this branch.
  • ENHANCED: dispatch tokeniser directly on u_category; drop uflagsW The uflags_map byte already carried the u_category enum, but every read of it went through cat_to_flags[] back to the legacy U_* bitmask, then a Pl*W macro tested specific bits. The cat_to_flags indirection forced PlSoloW to match four distinct categories (U_CAT_SOLO, U_CAT_BRACKET, U_CAT_QUOTE, U_CAT_ID_CONTINUE_SOLO), which is why case_solo had a post-hoc pl_pair_lookup demux to re-separate brackets from quotes from solo atoms.
  • TYPE: pl_pair_lookup is_open is bool; clean up generator function layout Two cleanups:
  • ENHANCED: extend code_type/2 paren(Close) and add a matching quote(Close) paren(Close) used to know only the three ASCII bracket pairs (), {} and []; quote(Close) didn't exist at all. Both now back onto the same pl_pair_table that drives the parser, so they cover the full Unicode Ps/Pe and Pi/Pf sets out of the box.
  • ENHANCED: read Unicode quote pairs (Pi/Pf) as literal strings Quote pairs were previously parsed like brackets — content was interpreted as a Prolog term and wrapped in '<open><close>'/1. That is the wrong shape for quotation marks, where the contained text is naturally literal (cf. ASCII '...', "...", ...).
  • ENHANCED: drop mk_wcwidth.c; PL_wcwidth replaces it everywhere After Stage 5 PL_wcwidth() reads its values directly from the per-code-point width slot in uflags_map, leaving mk_wcwidth.c as a thin wrapper that just forwarded to PL_wcwidth. With no remaining internal nor submodule callers (the libedit and xpce typedefs of uchar_t are local to those packages), the wrapper is pure dead weight.
  • DOC: paired delimiters, wcwidth source, prolog_syntax_map.pl header (Stage 7) Three doc updates that close the loop on the Unicode-syntax refactor:
  • ENHANCED: parse Unicode bracket pairs as '<open><close>'/1 compounds Lift the existing {Term} ⇒ '{}'(Term) reader path to a generic paired-delimiter routine and wire every Unicode bracket pair through it.
  • ENHANCED: pack uflags_map into 4-bit category enum + 2-bit wcwidth slot The per-code-point byte in uflags_map was a bag of 8 disjoint U_* flag bits, almost full (7 of 8 used). Repack the same byte:
  • DOC: NBSP is no longer whitespace; document Pattern_White_Space-only layout The previous text claimed SWI-Prolog kept treating U+00A0 as whitespace for backward compatibility with ISO Latin-1. That was true under the legacy 256-entry _PL_char_types[] table; with the parser cutoff moved to < 0x80 the Unicode flag table is the single source of truth, and NBSP (which is not in Pattern_White_Space) raises a stray-character syntax error outside quoted material.
  • ENHANCED: classify code points >= 0x80 via the Unicode flag table The legacy ISO Latin-1 _PL_char_types[] used to extend to 256 entries, giving an SWI-specific opinion about each byte 0x80..0xff. Those entries duplicated and occasionally disagreed with the Unicode flag table (uflags_map[0]) that already covers the same code points correctly per UAX #31. Two parallel mechanisms in the 0x80..0xff range invited drift.
  • DOC: align manual with implemented Unicode source syntax Two small documentation updates that bring man/ and src/Unicode/ in sync with the parser's actual behaviour:
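
The line-ender set named in the entries above (LF, VT, FF, CR, NEL, LS, PS) and the layout-only members (TAB, SPACE, LRM, RLM) can be sketched as a standalone C classifier. This is an illustration under stated assumptions, not the code in pl-read.c: the function names are hypothetical and the sketch uses a plain switch rather than the generated uflags_map table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True for the seven line-terminator-like Pattern_White_Space code points. */
bool is_line_ender(uint32_t c)
{ switch (c)
  { case 0x000A:  /* LF  */
    case 0x000B:  /* VT  */
    case 0x000C:  /* FF  */
    case 0x000D:  /* CR  */
    case 0x0085:  /* NEL */
    case 0x2028:  /* LS  */
    case 0x2029:  /* PS  */
      return true;
    default:
      return false;
  }
}

/* Pattern_White_Space: the line enders plus TAB, SPACE, LRM and RLM.
   NBSP (U+00A0) is deliberately not a member. */
bool is_pattern_white_space(uint32_t c)
{ return is_line_ender(c) ||
         c == 0x0009 || c == 0x0020 || c == 0x200E || c == 0x200F;
}

/* Hypothetical helper: number of code points in a %-comment that starts
   at text[0], not counting the line ender that terminates it. */
size_t percent_comment_length(const uint32_t *text, size_t len)
{ size_t i = 0;

  assert(text != NULL);
  while (i < len && !is_line_ender(text[i]))
    i++;
  return i;
}
```

With this sketch a comment followed by U+2028 ends at the LS character, whereas a scanner that tests only c == '\n' would run on into the next line.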

[May 5 2026]

  • UNICODE: Updated unicode_block/3 to Unicode 17.0
  • FIXED: Windows init_locale also sets LC_CTYPE to UTF-8 Forcing LD->encoding to ENC_UTF8 unconditionally on Windows in the prior commit left the C runtime's LC_CTYPE locale at its default ("C" locale) — and so mbrtowc / wcrtomb were ASCII-only. Any UTF-8 byte above 0x7F flowing through a path that uses mbrtowc to canonicalise text (PL_canonicalise_text in pl-text.c) returned EILSEQ, surfacing as a "Syntax error: illegal_multibyte_sequence" the moment a user typed an emoji or other non-ASCII char in the libedit prompt.

[May 4 2026]

  • ENHANCED: default encoding to UTF-8 on Windows The system locale on Windows usually reports a legacy codepage (Windows-1252 or similar), making the default Prolog flag encoding ANSI/Latin-1. This caused UTF-8 source files to be read as their byte-wise Latin-1 interpretation. UTF-8 is the de-facto encoding for source files and the Windows C runtime's locale-based wide-character functions are weaker than the Unicode tables Prolog uses internally, so always use UTF-8 as the default on Windows.
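
The effect of reading UTF-8 bytes under their byte-wise Latin-1 interpretation can be reproduced in a few lines of C. The sketch below is not SWI-Prolog code and the function name is hypothetical; it treats every input byte as one Latin-1 code point and writes the result back out as UTF-8, which is what a Latin-1 reader followed by a UTF-8 writer effectively does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Re-encode `utf8` as UTF-8 while (wrongly) treating each input byte as
   a Latin-1 code point.  `out` must hold 2*strlen(utf8)+1 bytes.
   Returns the number of bytes written, excluding the terminating NUL. */
size_t misread_as_latin1(const char *utf8, char *out)
{ size_t len, o = 0;

  assert(utf8 != NULL && out != NULL);
  len = strlen(utf8);
  for (size_t i = 0; i < len; i++)
  { unsigned char b = (unsigned char)utf8[i];

    if (b < 0x80)                       /* ASCII passes through unchanged */
    { out[o++] = (char)b;
    } else                              /* U+0080..U+00FF needs two bytes */
    { out[o++] = (char)(0xC0 | (b >> 6));
      out[o++] = (char)(0x80 | (b & 0x3F));
    }
  }
  out[o] = '\0';
  return o;
}
```

Feeding it the three bytes of "X²" (0x58 0xC2 0xB2) yields five bytes that display as "XÂ²": each byte of the two-byte sequence for ² has become a character of its own.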

[May 3 2026]

  • TYPE: PL_is_id_start, PL_is_id_continue, PL_is_uppercase, PL_is_decimal, PL_is_layout return bool Pure predicates; bool conveys the contract better than int.
  • ADDED: PL_is_id_start, PL_is_id_continue, PL_is_uppercase, PL_is_decimal, PL_is_layout Thin shims around the existing uflagsW() table (src/pl-umap.c, generated from src/Unicode/derived_core_properties.pl) so foreign extensions and embedded toolkits — notably xpce — classify code points exactly as SWI-Prolog does, without needing the locale-dependent POSIX iswX() or inventing their own tables. Documented in man/foreign.doc.
  • ENHANCED: Use mk_wcwidth() in the kernel for locale-independent width Switch every kernel caller (pl-read, pl-write, pl-fli, pl-fmt, pl-stream, pl-ctype) and PL_wcwidth() to the bundled mk_wcwidth() instead of the system wcwidth(3), which under C/POSIX returns -1 for non-ASCII on glibc. Always link mk_wcwidth.c (drop the HAVE_WCWIDTH guard) and add the modern emoji wide ranges (Kana Supplement, Mahjong/Domino, Misc Symbols & Pictographs, Symbols & Pictographs Ext-A) so column counts match what xpce already returns.
  • FIXED: unload_file/1: clear isfile so use_module/1 reloads unloadFile() left sf->isfile set, so a subsequent use_module/1 saw the file as already loaded and skipped its directives — notably :- use_foreign_library/1. Reset isfile, move garbage_collect_clauses to Prolog (unload_file/1) and call '$clear_source_admin'/1 there.
  • DOC: atom_normalize_hook Prolog flag Add a concise entry for the new atom_normalize_hook flag and note in the unicode_atoms entry that mode error falls back to the wcwidth-based check when the hook is not registered, which can over-reject scripts (e.g. Thai) that use combining marks in NFC.
  • TEST: syntax_unicode_atoms: use atom_normalize_hook and term_string Replace current_module(unicode) hook-state probes with the new atom_normalize_hook Prolog flag (reliable across rerun). Switch the suite to term_string/3 with explicit string literals throughout, add a thai_hello_world/1 helper to demonstrate the wcwidth-fallback false-positive on NFC Thai text and verify that the precise utf8proc check accepts it.
  • ADDED: foreign_library_property/2 to query foreign library properties

[May 2 2026]

  • DOC: PL_wcwidth() in foreign.doc Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  • ADDED: PL_wcwidth() — locale-portable display-column width for a code point Add PL_wcwidth(int) to libswipl so foreign extensions and embedded GUI toolkits can ask for the column width of a Unicode code point with the same answer as the rest of SWI-Prolog, regardless of the process LC_CTYPE or platform. On Unix/macOS it forwards to system wcwidth(3); on Windows it goes through Markus Kuhn's mk_wcwidth() table.
  • ENHANCED: Fold error and auto-load into atom_to_unicode_atoms_ex Rename atom_to_unicode_atoms to atom_to_unicode_atoms_ex (the _ex suffix marks helpers that raise Prolog exceptions on failure) and absorb the boilerplate every caller had to write:

[May 1 2026]

  • ENHANCED: Cache hook-load attempt to avoid repeated retries ensure_unicode_normalize_hook(false) was retrying '$install_unicode_normalize_hook'/0 on every call once the load had failed. Track the attempt in GD->atoms.normalize_hook_load_attempted so subsequent calls fall through immediately. PL_cleanup re-zeroes GD on next initialisation, so the cache resets across an embedded Prolog teardown/restart cycle (a static would not).
  • ENHANCED: Auto-load library(unicode) for unicode_atoms(nfc) at every entry point The Prolog flag's active setter has auto-loaded library(unicode) for modes nfc/error since the original landing, but read_term/2,3 and read_clause/2,3 with unicode_atoms(nfc) were left to error with existence_error(hook, unicode_normalize) when the library was not already loaded. set_stream/2 and open/4 had the auto-load logic duplicated inline.
  • ENHANCED: Loosen unicode_atoms(error) and centralise the value docs Per PIP review:
  • ENHANCED: Multi-valued unicode_atoms policy and Trojan-source bidi reject Replace the boolean unicode_normalize flag and the normalize(Bool) read_term option with a single multi-valued unicode_atoms policy that follows the same three-tier hierarchy as encoding (Prolog flag -> stream property -> per-call option), and unconditionally reject Unicode bidi-override / isolate code points (U+202A..U+202E and U+2066..U+2069) in source tokens, quoted strings and comments as a defence against the Trojan-source attack (CVE-2021-42574).
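
The two code point ranges rejected as a Trojan-source defence (U+202A..U+202E and U+2066..U+2069) amount to a simple range test. The following C sketch only illustrates that check; the function names are hypothetical and it does not reproduce how the reader reports the error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True for the bidi embedding/override (U+202A..U+202E) and
   bidi isolate (U+2066..U+2069) code points. */
bool is_bidi_control(uint32_t c)
{ return (c >= 0x202A && c <= 0x202E) ||
         (c >= 0x2066 && c <= 0x2069);
}

/* Hypothetical helper: index of the first bidi control in `text`,
   or `len` if the text contains none. */
size_t find_bidi_control(const uint32_t *text, size_t len)
{ assert(text != NULL);
  for (size_t i = 0; i < len; i++)
  { if (is_bidi_control(text[i]))
      return i;
  }
  return len;
}
```

Note that LRM (U+200E) and RLM (U+200F) fall outside both ranges; they remain ordinary layout characters.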

[Apr 30 2026]

  • ENHANCED: Pluggable Unicode normalisation in reader and writer Add a kernel-level callback for Unicode normalisation, and use it to give read_term/2,3 and read_clause/2,3 a normalize(Bool) option that NFC-normalises the text of unquoted atoms before interning. Quoted atoms and string literals are byte-faithful and not touched. Also force-quote atoms holding combining marks under writeq's quoted(true), which makes denormalised text visible and is independent of the normalisation library (it uses wcwidth only).
  • DOC: Route non-ASCII Unicode in manual prose through urldefs Add a block of \urldef{\Sname}\satom{<char>} entries to man/pl.sty for the super-/subscript digits and a handful of math and typographic symbols used in §sec:unicodesyntax (≤, «, €, ·, Dž). With these in place, man/doc2tex.pl auto-translates the literal {<char>} forms in overview.doc to {\Sname} in the generated .tex, so the .tex stays ASCII-only and no longer triggers ltx2htm's non-ASCII-byte warnings.
  • DOC: Update Unicode syntax section Rewrite man/overview.doc §sec:unicodesyntax to reflect the new rules: XID_Start/XID_Continue identifiers, super- and subscript digits as identifier-continue extension, Pattern_White_Space layout set, all-S?/P? as solo (with a note that this is a deliberate break from the previous glueing behaviour), Lu-only uppercase (so Lt letters now start atoms), and a hedge that NBSP is still treated as whitespace by the legacy ISO Latin-1 table. Add a flag entry for the new read-only unicode_syntax_version and fix a "curently" typo in max_char_code.
  • TEST: Add Unicode syntax tests and a demo source file tests/core/test_syntax_unicode.pl exercises the new XID-based identifier rules, super- and subscript continuation, Pattern_White_Space layout (LRM), all-S?/P? as solo (≤, ≤≤, «, €), Lt starting an atom (Dž), mixed-script number rejection, same-script Devanagari digits, and the new unicode_syntax_version flag. Auto-discovered by tests/test.pl; runs as part of swipl:core.
  • ENHANCED: Wire subscript digits and add unicode_syntax_version flag Wire subscript digits (₀..₉) as XID_Continue extension in prolog_syntax_map.pl, alongside the existing superscripts, and regenerate pl-umap.c. Read the Unicode version from the header of DerivedCoreProperties.txt and emit a tiny pl-umap-version.h header with UNICODE_SYNTAX_VERSION. Register a read-only Prolog flag unicode_syntax_version in pl-prologflag.c, distinct from unicode:unicode_version/1 (which reports the linked utf8proc's view of Unicode and may differ from the kernel classifier's).
  • DOC: Flag max_code_point has been 0x10ffff on all platforms for a while.

[Apr 29 2026]

  • MODIFIED: Unicode interpretation and updated to Unicode 17 (was 14).

[Apr 23 2026]

  • ADDED: el_get/2 and runtime bracketed_paste control el_get(Stream, editor(?Editor)) reads the current libedit editor (emacs / vi). el_set/2 and el_get/2 grow a bracketed_paste(?Bool) property, and editline.pl routes it through a new enable_bracketed_paste/1 helper that skips (and unbinds) in vi mode.

Package PDT

[May 10 2026]

  • RENAME: PDT.doc -> PDT.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package RDF

[May 10 2026]

  • RENAME: RDF.doc -> RDF.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package archive

[May 10 2026]

  • RENAME: archive.doc -> archive.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package bdb

[May 10 2026]

  • RENAME: bdb.doc -> bdb.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package clib

[May 10 2026]

  • RENAME: clib.doc -> clib.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package cpp

[May 10 2026]

  • RENAME: pl2cpp.doc -> pl2cpp.plx (LaTeX manual source) Update pkg_doc SOURCES reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

[May 4 2026]

  • ENHANCED: PlTerm::unify_blob() takes std::unique_ptr<T> for any T derived from PlBlob The unique_ptr<PlBlob> overload became a function template unify_blob(std::unique_ptr<T>*) with a static_assert that T derives from PlBlob. Callers can now pass std::unique_ptr<MyBlob>* directly without an upcast through std::unique_ptr<PlBlob>; the obsolete make_unique() workaround in the docs is removed.

Package cql

[May 10 2026]

  • RENAME: cql.doc -> cql.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package http

[May 10 2026]

  • RENAME: http.doc -> http.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package jpl

[May 10 2026]

  • RENAME: jpl.doc -> jpl.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package json

[May 10 2026]

  • RENAME: json.doc -> json.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package libedit

[May 10 2026]

  • RENAME: libedit.doc -> libedit.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

[May 6 2026]

  • ADDED: bind ^Z to send EOF on Windows Matches the platform convention for end-of-file (Unix uses ^D). Routed through libedit's built-in em-delete-or-list / vi-list-or-eof, which already return CC_EOF on an empty line.

[May 5 2026]

  • FIXED: el_cursor() takes wchar_t units, electric() hands code points The Prolog electric() handler computes Move = Index - Len, where both Index (matching_open's position) and Len (string_length of Before) are CODE-POINT counts. The C wrapper passed Move straight to el_cursor(), which adds it to el_line.cursor as wchar_t units.
  • ENHANCED: install PL_wcwidth as libedit's wcwidth implementation libedit no longer ships its own mk_wcwidth table; el_set(EL_WCWIDTH, fn) injects one. Wire PL_wcwidth here so swipl and swipl-win share the kernel's src/mk_wcwidth.c with pl-write, pl-fmt, xpce — one table, one source of truth, no more drift between layers.
  • FIXED: pl_line and el_history_encoded use REP_EL not REP_MB PL_unify_chars(PL_STRING|REP_MB, ...) decodes the bytes via the C runtime's mbrtowc. On Windows with the legacy ANSI codepage as LC_CTYPE that fails for any byte above 0x7F — and libedit hands us UTF-8 bytes there (ct_encode_char unconditionally calls utf8_put_char on Windows).

[Apr 23 2026]

  • ADDED: bracketed_paste(?Boolean) property to el_set/2 and el_get/2 Bracketed paste mode is now tracked per el_context and can be toggled at runtime. el_set/2 immediately emits the enable (ESC[?2004h) or disable (ESC[?2004l) sequence when the value changes; el_get/2 reads the current state.
  • ADDED: el_get/2 to query editline properties Initially supports editor(?Editor), unifying with emacs or vi via EL_EDITOR. Unknown properties raise a domain_error(editline_property, _).
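
The enable and disable sequences quoted above, and the rule that a sequence is emitted only when the value changes, can be sketched in C as follows. This is a minimal illustration, not the libedit package code: the function names and the caller-supplied state variable are assumptions standing in for the per-el_context state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Terminal control sequence that switches bracketed paste mode on or off. */
const char *bracketed_paste_sequence(bool enable)
{ return enable ? "\033[?2004h" : "\033[?2004l";
}

/* Hypothetical helper: emit the sequence on `out` only if `enable`
   differs from the tracked `*state`.  Returns true if a sequence was
   written. */
bool set_bracketed_paste(FILE *out, bool *state, bool enable)
{ assert(out != NULL && state != NULL);

  if (*state == enable)
    return false;                       /* no change: stay silent */

  fputs(bracketed_paste_sequence(enable), out);
  fflush(out);
  *state = enable;
  return true;
}
```

Calling set_bracketed_paste() twice with the same value writes to the terminal only once; reading the tracked state back corresponds to what el_get/2 reports.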

Package ltx2htm

[May 10 2026]

  • RENAME: ltx2htm.doc -> ltx2htm.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  • ENHANCED: ltx2htm: TOK_VERB / TOK_VERBATIM use UTF-8 round-trip The verbatim/verb code paths still passed body strings through PL_CHARS / PL_STRING (Latin-1), which double-encoded UTF-8 bytes when combined with the new REP_UTF8 output side: a literal \verb= 'X²' = in source rendered as 'XÂ²' in HTML.
  • ENHANCED: Make ltx2htm UTF-8 aware The tokenizer used to read .tex files as raw bytes and warn for every byte >= 128, and the HTML writer used PL_atom_chars which is Latin-1 only. As a result, .tex authors had to route every non-ASCII glyph through urldef in pl.sty.

[Apr 30 2026]

  • ENHANCED: Render non-ASCII urldefs as numeric HTML entities Adds a block of \urldef{\Sname}\satom{<char>} entries to pl.sty for the super-/subscript digits and a handful of math and typographic symbols (≤, ≥, «, », €, ·, Dž), registers each \Sname in pldoc.cmd as a known no-arg command, and adds explicit cmd/2 handlers in sty_pldoc.pl that emit numeric HTML entities for these names.
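
A numeric HTML entity is just the code point written as a number between "&#" and ";". The C sketch below shows the decimal form; whether sty_pldoc.pl emits decimal or hexadecimal references is not stated in the entry, so the decimal choice and the function name are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write `cp` as a decimal numeric character reference (e.g. "&#8804;")
   into `buf`.  Returns the number of characters written, or -1 if the
   buffer is too small. */
int html_numeric_entity(uint32_t cp, char *buf, size_t size)
{ int n;

  assert(buf != NULL && cp <= 0x10FFFF);
  n = snprintf(buf, size, "&#%lu;", (unsigned long)cp);
  return (n >= 0 && (size_t)n < size) ? n : -1;
}
```

For ≤ (U+2264) this produces "&#8804;", which stays pure ASCII in the generated HTML.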

Package mqi

[May 10 2026]

  • RENAME: mqi.doc -> mqi.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package nlp

[May 10 2026]

  • RENAME: nlp.doc -> nlp.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package odbc

[May 10 2026]

  • RENAME: odbc.doc -> odbc.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package paxos

[May 10 2026]

  • RENAME: paxos.doc -> paxos.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package pcre

[May 10 2026]

  • RENAME: pcre.doc -> pcre.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package pengines

[May 10 2026]

  • RENAME: pengines.doc -> pengines.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package pldoc

[May 10 2026]

  • FIXED: help/1: tell SGML parser the manual is UTF-8 load_man_object/4 opens the HTML file in binary mode (so byte offsets stay meaningful for seek/3) and feeds the bytes to sgml_parse/2 directly. Without an explicit encoding option the parser defaults to ISO-8859-1, so UTF-8 multi-byte sequences in the manual were read as Latin-1 codepoints — help(unicodesyntax) then displayed X² as XÂ² and so on.
  • RENAME: pldoc.doc -> pldoc.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

[Apr 30 2026]

  • DOC: Add urldefs for non-ASCII Unicode characters used in manual prose Adds a block of \urldef{\Sname}\satom{<char>} entries for the super-/subscript digits and a handful of mathematical and typographic symbols (≤, ≥, «, », €, ·, Dž). This lets doc2tex.pl in swipl-devel translate `{<char>}` in .doc to {\Sname} in .tex, keeping the .tex ASCII-only and avoiding the non-ASCII warnings from ltx2htm's tex.c reader.

Package plunit

[May 10 2026]

  • RENAME: plunit.doc -> plunit.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package protobufs

[May 10 2026]

  • RENAME: protobufs.doc -> protobufs.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package redis

[May 10 2026]

  • RENAME: redis.doc -> redis.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package semweb

[May 10 2026]

  • RENAME: semweb.doc and rdflib.doc -> .plx (LaTeX manual source) Update pkg_doc SOURCES reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package sgml

[May 10 2026]

  • RENAME: sgml.doc -> sgml.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package ssl

[May 10 2026]

  • RENAME: ssl.doc and crypto.doc -> .plx (LaTeX manual source) Update pkg_doc SOURCES reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package stomp

[May 10 2026]

  • RENAME: stomp.doc -> stomp.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package sweep

[May 10 2026]

  • RENAME: sweep.doc -> sweep.plx (LaTeX manual source) Update pkg_doc SOURCES reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package swipy

[May 10 2026]

  • RENAME: swipy.doc -> swipy.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package table

[May 10 2026]

  • RENAME: table.doc -> table.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package tipc

[May 10 2026]

  • RENAME: tipc.doc -> tipc.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package utf8proc

[May 10 2026]

  • RENAME: utf8proc.doc -> utf8proc.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  • PORT: Fix detection and avoid dependency on ssize_t

[May 4 2026]

  • FIXED: Safe unification for unicode enum values.

[Apr 30 2026]

  • ASAN: Add obsolete boundclasses The change is needed to establish the 1:1 correspondence to the boundclasses in the original utf8proc.h. Prevents an ASAN error on my system.

[May 1 2026]

  • TEST: Migrate normalize-hook tests to the unicode_atoms vocabulary Rename the suite from utf8proc_normalize_hook to utf8proc_unicode_atoms and reshape its tests to use the multi-valued unicode_atoms policy that replaces the boolean unicode_normalize flag and normalize(true) read_term option. Cover all four modes (accept / nfc / error / reject), stream_property/2 read-back, the flag-driven default, the writeq combining-mark force-quoting (independent of this library), and Trojan-source bidi-override rejection in unquoted atoms, quoted atoms, and via \u escape bypass.
  • DOC: category(Cat) using Upper-lower

[Apr 30 2026]

  • ENHANCED: Register utf8proc as the kernel's Unicode NFC hook Loading library(unicode) now installs an NFC normaliser into the SWI-Prolog kernel via PL_atom_normalize_hook(), enabling the normalize(true) option of read_term/2,3 and read_clause/2,3 and the default value of the unicode_normalize Prolog flag. The callback uses utf8proc_map with STABLE|COMPOSE in place: NFC is always shorter than or equal to the input, so the caller's buffer suffices and there is no malloc/free dance. Also extend test_utf8proc.pl with a new utf8proc_normalize_hook suite covering the round-trip, the per-call option vs. flag-default override, the quoted-atom isolation rule, and the writer's combining-mark force-quoting.
  • DOC: Cross-reference unicode_syntax_version Prolog flag Note in unicode_version/1 that the Unicode version reported by the library may differ from the version of the SWI-Prolog source syntax classifier exposed via the new read-only Prolog flag unicode_syntax_version.

[Apr 23 2026]

  • SANDBOX: Declare the Unicode API as safe.

Package xpce

[May 10 2026]

  • RENAME: xpce LaTeX manual sources .doc -> .plx Renames the 63 LaTeX .doc files under man/{course,interface,userguide} and prolog/lib/{draw,trace/doc} to .plx, leaving the XPCE man-card binary database under man/reference/ untouched. Updates man/userguide/Makefile suffix rule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

[May 8 2026]

  • FIXED: Thread monitor icons Reported by Mike Elston.

[May 6 2026]

  • MODIFIED: auto-copy behaviour Added the class variable auto_copy to class terminal_image as well. It now appears on text_item, editor and terminal_image. It now defaults to @off for Windows and MacOS and for now to @on for other platforms. To switch, use this in the preferences file (or @off to disable on Linux, etc.).
  • ENHANCED: SDL backend Windows font specs cover Thai and Yi Add Leelawadee UI (Thai/Lao, ships since Windows 8) and Microsoft Yi Baiti (Yi syllables, ships since Vista) to the mono, sans and serif fallback chains.
  • FIXED: PceEmacs class menu to edit a class raised a type error.
  • FIXED: XPCE native finder icons.
  • ENHANCED: SDL backend Windows font specs include CJK and symbol fallbacks Pango uses the comma-separated family list as an explicit fallback chain; it does not synthesise script-based fallbacks the way Windows DirectWrite does for Notepad. The previous mono/sans/serif specs covered only Latin, so Korean (Hangul), Japanese, Chinese and many symbol/emoji glyphs rendered as tofu on Windows.
  • ENHANCED: display: expose SDL on-screen-keyboard control Add three methods on class display so the Prolog side can both control and observe SDL's on-screen-keyboard policy without recompiling for DEBUG output:
  • ENHANCED: font <-domain unions Pango fontset coverage Drop the vestigial X11-era which argument (the row/column selector of 2D charset fonts) and replace it with family=[bool], mirroring the new font ->member argument. The default takes the union of PangoCoverage across the requested font and its fallback chain, so the returned envelope is consistent with `->member`: a code point outside the domain reliably fails `->member`, while emoji and other fallback-covered ranges are now inside it. The full-unicode scan is cached on the WsFont, so the cost is paid once per font.
  • ENHANCED: font ->member walks Pango fallback chain font ->member: char consulted only the primary PangoFont, so it returned @off for code points (e.g. U+1F600) that the system can in fact render via Pango fallback. Walk a PangoFontset for the font's description by default, and add an optional family=[bool] argument so callers can force the old "primary font only" check.

[May 5 2026]

  • DEBUG: optional terminal trace under -DXPCE_TERM_TRACE Build with -DXPCE_TERM_TRACE to compile in caret / byte-stream diagnostic logging. At runtime set XPCE_TERM_TRACE=<file> (or "1" for xpce-terminal.log in CWD) to capture activity. Inert and zero-cost without the macro: tlog collapses to ((void)0).
  • FIXED: str_ring_alloc no longer marks the result s_readonly s_readonly was an overload: most call sites (str_set_static, staticCtoString, the "(nil)" sentinel) used it to mean "permanent, shareable storage" and rely on initialiseCharArray's fast-path to share the pointer. str_ring_alloc, in contrast, hands out a temporary slot that the next 16 ring allocations recycle — sharing that pointer is fatal.
  • TEST: SMP code-point fragment positions and round-trip test_fragment_smp.pl exercises text_buffer indexing and fragment start/length values on supplementary-plane code points. 10 tests covering buffer size in code points, get(_,character,Pos,_) at and after an SMP slot, fragment start/length pointing past or spanning an SMP code point (emoji and CJK Ext B), shift_fragments accounting on insert before/across an SMP, a 10-cluster stress, and save/load round-trip through the textbuffer file format.
  • CLEANUP: drop now-dead surrogate-aware paths - u16_range_length: the SIZEOF_WCHAR_T == 2 branch returned len unchanged because Windows used to store UTF-16 in the buffer, so a range of N code units held N UTF-16 units. charW is now uint32_t on Windows too; each SMP code point in the range needs to count as 2 UTF-16 units regardless of platform.
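The counting rule described for u16_range_length can be sketched as follows. This is a minimal illustration, not the xpce source: the type name charW_sketch and the function u16_units_in_range are hypothetical, and the buffer is assumed to hold one full code point per element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit charW: one code point per element. */
typedef uint32_t charW_sketch;

/* Number of UTF-16 code units needed to represent buf[from .. from+len).
 * Code points above U+FFFF need a surrogate pair, so they count twice. */
static size_t u16_units_in_range(const charW_sketch *buf, size_t from, size_t len)
{
    size_t units = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        units += (buf[from + i] > 0xFFFF) ? 2 : 1;

    return units;
}
```

With a buffer holding 'a', U+1F600, 'b', U+20000, the whole range needs six UTF-16 units although it contains four code points.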

[May 4 2026]

  • FIXED: replace wint_t with uchar_t (or int) for code-point holders MinGW's wint_t is unsigned short (16 bits), so any code-point value flowing through a wint_t variable on Windows truncates SMP code points to their low 16 bits. This was the actual cause of emoji rendering as PUA glyphs (U+1F30F shown as U+F30F): not the storage type, but the many `wint_t c = str_fetch(...)` and similar declarations between the buffer and the rendering call.
  • FIXED: c_width takes uchar_t (full code point) not wint_t MinGW's wint_t is unsigned short (16 bits), so the (wint_t) casts at the textimage do_fill_line call sites truncated the supplementary-plane code points stored in tc->value.c (now charW = uint32_t on Windows). The result: emoji rendered as their PUA-area low 16 bits (e.g. U+1F30F shown as U+F30F) and produced .notdef glyphs from Pango.
  • FIXED: F_UTF8_ENCLENW iterates code points, not wchar_t units The previous loop encoded each wchar_t as a separate UTF-8 sequence. On 16-bit wchar_t platforms (Windows) a supplementary-plane code point arrives as a UTF-16 surrogate pair, so each half emitted its own 3-byte UTF-8 sequence (and the surrogate's encoding form is invalid UTF-8 anyway). Use get_wchar() to combine pairs first; on 32-bit wchar_t platforms get_wchar is a per-element copy.
  • ENHANCED: regex CHR width matches charW unconditionally Previously CHRBITS / CHR_MAX were 16 / 0xffff under WINDOWS and 32 / 0x3fffffff elsewhere, because chr typedef'd to charW (=wchar_t) which is 16-bit on MinGW. charW is now uint32_t on platforms where wchar_t is too narrow (Windows), so the regex chr can hold the full Unicode code-point range there too. The colormap is a sparse trie (NBYTS = 4 levels), so the larger CHR_MAX does not bloat memory.
  • FIXED: ENC_WCHAR stream read paths emit wchar_t, not charW Sread_object (iostream.c) and pceRead's wide branch (asfile.c) wrote the internal charW buffer into the stream's wchar_t-typed output buf and reported the byte count as wchar_t-units. When charW is wider than wchar_t (Windows post-flip) that wrote 4-byte values into a 2-byte-per-unit buffer. Convert at the boundary: assign wchar_t per charA-source char; surrogate-encode via charW_to_wchar for charW sources. When the typedef equality holds (Linux always; Windows when charW falls back to wchar_t) the conversion folds to a per-element copy.
  • FIXED: route wchar_t<->charW boundary conversions through charW.h charW is now uint32_t when wchar_t is too narrow (Windows). Adjust the few sites where wchar_t (caller) meets charW (storage) so neither side silently truncates supplementary-plane code points:
  • ENHANCED: charW falls back to uint32_t when wchar_t is too narrow On Windows wchar_t is 16-bit UTF-16; storing a supplementary-plane code point as a surrogate pair across two slots is the source of the fragment-indexing bug fixed in subsequent commits. Pick uint32_t for charW when WCHAR_MAX <= 0xFFFF, and otherwise keep charW as wchar_t — on Linux/macOS the typedef stays exactly as it was, so the pair of boundary conversions added alongside fold to compile-time identity and the binary is unchanged.
  • ADDED: charW.h boundary helpers between wchar_t and internal charW Adds wchar_to_charW / charW_to_wchar inlines (and length variants). Used at the few external boundaries where xpce sees wchar_t buffers from outside (host embedding API, ENC_WCHAR streams, Win32 file APIs). Internal text-handling will keep its charW typing.
  • FIXED: c_width returns reasonable width for UTF-16 surrogate halves On Windows (wchar_t==16) supplementary-plane code points reach c_width as surrogate halves. Encoding a lone surrogate as UTF-8 and asking Pango for its advance returned the .notdef-glyph width — typically much wider than the actual emoji or wide glyph the paired UTF-8 renders. The accumulated x in do_fill_line then over-advanced past where the glyph was painted, leaving a multi-cell whitespace gap to the right of each emoji. Account the cluster's width once on the lead (two average char widths, matching uchar_display_width's column count of 2 for SMP wide glyphs) and let the trail contribute zero.
  • ENHANCED: uchar_display_width recognises UTF-16 surrogate halves On Windows (wchar_t==16) supplementary-plane code points appear as surrogate pairs in xpce's wide-char buffers. Treat the lead as double-width (the SMP ranges xpce typically encounters — emoji, CJK Extensions B-G — are all wide) and the trail as zero-width so each pair contributes exactly two columns to vcol, the per-cluster painter groups the pair as one wide cluster, and grapheme_cluster_end / _start advance by exactly one emoji per arrow press.
  • FIXED: textbuffer file load/save handles UTF-16 surrogate pairs On Windows (wchar_t==16) a supplementary-plane code point is stored as a UTF-16 surrogate pair in the wide-char buffer. insert_file_textbuffer truncated SMP code points by direct assignment; save_textbuffer fed each surrogate half to Sputcode separately, producing invalid UTF-8. Use put_wchar / get_wchar so loaded files contain proper pairs and saved files emit the combined code point. No-op on Linux (wchar_t==32).
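The entries above share one theme: a code point must travel in a holder wide enough for it, and UTF-16 pairs must be combined before encoding. The sketch below illustrates the typedef choice, the 16-bit truncation, and a per-code-point UTF-8 length count. All names (charW_sketch, through_16bit_holder, combine_pair, utf8_enclen_u16) are hypothetical and are not the xpce helpers; unpaired surrogates are simply counted as three bytes here, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Illustrative typedef choice: use 32 bits when wchar_t is too narrow. */
#if WCHAR_MAX <= 0xFFFF
typedef uint32_t charW_sketch;
#else
typedef wchar_t charW_sketch;
#endif

/* What a 16-bit holder (such as MinGW's wint_t) does to a code point. */
static uint16_t through_16bit_holder(uint32_t cp)
{
    return (uint16_t)cp;            /* keeps only the low 16 bits */
}

static int is_lead(uint32_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
static int is_trail(uint32_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

/* Combine a UTF-16 surrogate pair into one code point. */
static uint32_t combine_pair(uint16_t lead, uint16_t trail)
{
    assert(is_lead(lead) && is_trail(trail));
    return 0x10000u + (((uint32_t)lead - 0xD800u) << 10)
                    + ((uint32_t)trail - 0xDC00u);
}

/* UTF-8 byte count for a single code point. */
static size_t utf8_len(uint32_t cp)
{
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

/* UTF-8 byte length of a UTF-16 buffer, counted per code point
 * rather than per 16-bit unit. */
static size_t utf8_enclen_u16(const uint16_t *s, size_t len)
{
    size_t bytes = 0;

    for (size_t i = 0; i < len; i++) {
        uint32_t cp = s[i];

        if (is_lead(cp) && i + 1 < len && is_trail(s[i + 1])) {
            cp = combine_pair(s[i], s[i + 1]);
            i++;                    /* the trail half is consumed too */
        }
        bytes += utf8_len(cp);
    }
    return bytes;
}
```

Passing U+1F30F through the 16-bit holder yields U+F30F, the symptom described above. Counting per unit would give 1 + 3 + 3 bytes for 'a' followed by the pair D83C DF0F, whereas counting per code point gives 1 + 4.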

[May 3 2026]

  • ENHANCED: route THasSyntaxEx fallbacks through the host Add five callback slots to pce_callback_functions (is_letter, is_word_char, is_layout, is_digit, is_endsline), named after xpce's own syntax-table categories. They return bool, since they are pure predicates. Host wrappers live in itf/interface.c; swipl/interface.c maps each to one or more PL_is_* shims (and hard-codes is_endsline's seven line-terminator code points, which need no Prolog API of their own).
  • FIXED: terminal paint_chunks splits per cluster for non-ASCII text When the system fixed-width font has no glyphs for a script (e.g. Thai), Pango falls back to a proportional font and shapes a multi-char run at its natural advance. A line painted as one Pango call rendered fine, but with a selection — three chunks (before-sel / sel / after-sel) — each chunk reshaped independently and the after-sel half drifted right because it started at a column-grid x rather than where the unselected line had placed those glyphs.
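The callback-slot pattern from the THasSyntaxEx entry might look roughly like this. It is a sketch only: the struct syntax_callbacks_sketch, the wrapper host_is_digit_sketch, and the ASCII fallback are assumptions made for illustration and do not reproduce pce_callback_functions or the wrappers in itf/interface.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table of predicate slots a host may fill in. */
typedef struct {
    bool (*is_letter)(int chr);
    bool (*is_word_char)(int chr);
    bool (*is_layout)(int chr);
    bool (*is_digit)(int chr);
    bool (*is_endsline)(int chr);
} syntax_callbacks_sketch;

/* Zero-initialised: no host has registered any slot yet. */
static syntax_callbacks_sketch host_sketch;

/* Wrapper: ask the host if it registered a predicate, otherwise fall
 * back to an ASCII-only answer (the fallback is an assumption here). */
static bool host_is_digit_sketch(int chr)
{
    if (host_sketch.is_digit != NULL)
        return host_sketch.is_digit(chr);
    return chr >= '0' && chr <= '9';
}

/* Example host predicate: ASCII digits plus Devanagari U+0966..U+096F. */
static bool demo_is_digit(int chr)
{
    return (chr >= '0' && chr <= '9') || (chr >= 0x0966 && chr <= 0x096F);
}
```

Before a host registers demo_is_digit, the wrapper rejects U+0967; after `host_sketch.is_digit = demo_is_digit;` it accepts it, without the caller changing.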

[May 2 2026]

  • ENHANCED: route uchar_display_width()'s wcwidth fallback through the host Add a wcwidth slot to pce_callback_functions (replacing pad17), expose hostWcWidth() in itf/interface.c, and let h/charwidth.h's static inline delegate to it instead of calling system wcwidth(3) directly. The swipl-side callback registers PL_wcwidth(), so xpce now uses the same implementation as pl-read.c / pl-write.c regardless of the process LC_CTYPE.
  • FIXED: convert sel_*_char from cell index to visual column when painting In rlc_redraw the selection bounds (sel_start_char / sel_end_char) are cell indices — they index tl->text[] in rlc_set_selection's snap helpers and in rlc_read_from_window, and they're already documented that way at the call sites of rlc_snap_start / rlc_snap_end. rlc_paint_text in contrast takes visual-column bounds (see its own header comment). On a line without combining marks the two are interchangeable, so this never showed up before; on NFD content (e.g. Thai with U+0E31, U+0E35) the cell index drifts ahead of the visual column by one per combining mark, and the painted selection extends past where the user dragged.
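The drift between cell index and visual column can be illustrated with a small conversion routine. This is a simplified sketch, not rlc_redraw: cell_to_vcol_sketch and is_zero_width_sketch are hypothetical names, only a few combining ranges are recognised, and double-width characters are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified zero-width test: Combining Diacritical Marks and the Thai
 * above/below marks. A real implementation would use a full table. */
static int is_zero_width_sketch(uint32_t c)
{
    return (c >= 0x0300 && c <= 0x036F) ||
           c == 0x0E31 ||
           (c >= 0x0E34 && c <= 0x0E3A) ||
           (c >= 0x0E47 && c <= 0x0E4E);
}

/* Visual column at which the cell with index `cell` starts: every
 * zero-width cell before it occupies a cell but no column. */
static size_t cell_to_vcol_sketch(const uint32_t *text, size_t len, size_t cell)
{
    size_t vcol = 0;

    assert(cell <= len);
    for (size_t i = 0; i < cell; i++)
        if (!is_zero_width_sketch(text[i]))
            vcol++;

    return vcol;
}
```

For the cells U+0E01, U+0E31, U+0E1A, U+0E35, U+0E01, cell index 4 maps to visual column 2: the index runs ahead by one per combining mark, which matches the drift described in the entry above.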

[Apr 28 2026]

  • FIXED: forward/backward char and delete move by grapheme cluster not code point For NFD text a visible character may span multiple code points (a base followed by one or more combining marks). The four basic editing operations previously stepped one code point at a time, leaving the caret stranded inside a cluster or deleting only part of a grapheme.
  • TEST: unicode_heavy.pl — Prolog test file for NFD and wide-char rendering Add tests/unicode_heavy.pl with syntactically valid Prolog that stresses editor and terminal rendering of:
  • ADDED: nfd_style instance variable for visual NFD cluster highlighting Add an nfd_style instance variable (Style*) to both Editor and TerminalImage, mirroring the existing selection_style pattern. When set to a style with a background colour, every NFD grapheme cluster — a base character followed by one or more zero-width combining marks — is drawn with that background. The default is @nil (disabled), so existing editors are unaffected.
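Stepping by grapheme cluster, in the sense used above (a base followed by combining marks), can be sketched as below. The functions cluster_end_sketch and cluster_start_sketch are hypothetical and are not xpce's grapheme_cluster_end / _start; the combining test covers only U+0300..U+036F and a few Thai marks, and the full Unicode segmentation rules are not attempted.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified combining-mark test (illustrative subset only). */
static int is_combining_sketch(uint32_t c)
{
    return (c >= 0x0300 && c <= 0x036F) ||
           c == 0x0E31 ||
           (c >= 0x0E34 && c <= 0x0E3A) ||
           (c >= 0x0E47 && c <= 0x0E4E);
}

/* Index just past the cluster that starts at pos (forward-char). */
static size_t cluster_end_sketch(const uint32_t *s, size_t len, size_t pos)
{
    if (pos >= len)
        return len;
    pos++;                                   /* step over the base */
    while (pos < len && is_combining_sketch(s[pos]))
        pos++;                               /* and its trailing marks */
    return pos;
}

/* Index of the base of the cluster that ends before pos (backward-char). */
static size_t cluster_start_sketch(const uint32_t *s, size_t pos)
{
    if (pos == 0)
        return 0;
    pos--;
    while (pos > 0 && is_combining_sketch(s[pos]))
        pos--;                               /* skip marks back to the base */
    return pos;
}
```

For 'a', 'e', U+0301, U+0308, 'b', moving forward from index 1 lands on index 4 and moving backward from index 4 lands on index 1, so the caret never stops between the base and its marks.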

[Apr 27 2026]

  • FIXED: paint_line renders Unicode highlights without visual drift Four related fixes across txt/textimage.c and sdl/sdldraw.c:
  • FIXED: Snap selection endpoints to grapheme cluster boundaries selectionExtendEditor computes the selection from/to from the mouse click position. If that position falls inside a combining-mark cluster (e.g. a Thai vowel sign such as ◌ั following its base consonant), the raw character index points mid-cluster: the base character's x position is shared by the following zero-advance combiners, so selecting or deselecting just the combiner produces a mis-sized highlight.
  • FIXED: Selection drag corrupts text at grapheme cluster boundaries When a mouse selection boundary fell inside a grapheme cluster (a base character followed by one or more zero-advance combining marks, such as the Thai above-base vowel signs ั ี ่ ้ in สวัสดีชาวโลก), the paint_line() run-grouping loop split the cluster across two separate s_printW() calls. Pango then rendered base and combining mark as independent glyphs, causing visual corruption visible whenever the TXT_HIGHLIGHTED attribute changed between the base and its mark.
  • TEST: Add unicode_heavy.pl — public domain Unicode stress test Valid Prolog file using NFD combining marks (acute, diaeresis, tilde, cedilla, circumflex) as literal UTF-8 bytes alongside East-Asian double-width characters (CJK ideographs, hiragana, katakana, hangul).
  • TEST: Add test_editor_unicode.pl for Unicode visual-column tracking Five PLUnit tests exercise the new Editor <-visual_column getter:
  • FIXED: editor column functions account for CJK wide and combining chars getColumnEditor() and getColumnLocationEditor() previously counted every non-tab character as one visual column. This broke caret positioning after CJK double-width characters (each counts as 2) and after combining marks (each counts as 0).
  • ENHANCED: Add vcol visual-column cache to TextChar; populate in do_fill_line() struct text_char gains a `short vcol` field that records the 0-based visual column of each character on a displayed line. do_fill_line() fills it alongside the existing pixel-x field, advancing current_vcol by uchar_display_width() for each character (0 for combining marks, 2 for CJK wide chars, 1 for everything else).
  • REFACTOR: Extract uchar_display_width() into shared h/charwidth.h Move the Unicode display-width logic (combining=0, CJK wide=2, other=1) from a static function in txt/terminal.c into a new static-inline header h/charwidth.h. This makes the same function available to txt/textimage.c and txt/editor.c without code duplication.
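The width rule (combining=0, CJK wide=2, other=1) and the per-character visual-column cache can be sketched together. This is not the content of h/charwidth.h: display_width_sketch uses a deliberately small, illustrative set of ranges, and fill_vcols_sketch is a hypothetical stand-in for the bookkeeping done in do_fill_line().

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified display width: 0 for some combining marks, 2 for some
 * East-Asian wide ranges, 1 otherwise. Ranges are an illustrative
 * subset, not a complete Unicode width table. */
static int display_width_sketch(uint32_t c)
{
    if ((c >= 0x0300 && c <= 0x036F) ||          /* combining diacritics */
        c == 0x0E31 ||
        (c >= 0x0E34 && c <= 0x0E3A) ||          /* Thai marks */
        (c >= 0x0E47 && c <= 0x0E4E))
        return 0;

    if ((c >= 0x1100 && c <= 0x115F) ||          /* Hangul Jamo leads */
        (c >= 0x3041 && c <= 0x3096) ||          /* hiragana letters */
        (c >= 0x30A1 && c <= 0x30FA) ||          /* katakana letters */
        (c >= 0x4E00 && c <= 0x9FFF) ||          /* CJK unified ideographs */
        (c >= 0xAC00 && c <= 0xD7A3) ||          /* Hangul syllables */
        (c >= 0x20000 && c <= 0x2FFFD))          /* CJK extensions, plane 2 */
        return 2;

    return 1;
}

/* Record the 0-based visual column of each character of a line and
 * return the total width of the line in columns. */
static int fill_vcols_sketch(const uint32_t *line, size_t len, short *vcol)
{
    int current = 0;

    for (size_t i = 0; i < len; i++) {
        vcol[i] = (short)current;
        current += display_width_sketch(line[i]);
    }
    return current;
}
```

For 'e', U+0301, U+4E2D, 'x' this records columns 0, 1, 1, 3 and a total width of 4. The combining mark advances nothing, and the ideograph advances by two.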

[Apr 26 2026]

  • FIXED: Set both primary and secondary selection in display->copy

Package yaml

[May 10 2026]

  • RENAME: yaml.doc -> yaml.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Package zlib

[May 10 2026]

  • RENAME: zlib.doc -> zlib.plx (LaTeX manual source) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>