Prolog defines two forms of quoted text. Traditionally, single-quoted
text is mapped to atoms, while double-quoted text is mapped to a list of
character codes (integers) or characters (atoms of length 1).
Representing text using atoms is often considered inadequate for several
reasons:
- It hides the conceptual difference between text and program symbols.
Whereas the content of text often matters because it is used in I/O,
program symbols are merely identifiers that match the same symbol
elsewhere. Program symbols can often be consistently replaced, for
example to obfuscate or compact a program.
- Atoms are globally unique identifiers stored in a shared table.
Representing volatile strings as atoms comes at a significant price
because creating atoms requires cooperation between threads.
Reclaiming temporary atoms using atom garbage collection is a
costly process that requires significant synchronisation.
- Many Prolog systems (not SWI-Prolog) put severe restrictions on the
length of atoms or the maximum number of atoms.
Representing text as lists, be it of character codes or characters,
also comes at a price:
- At runtime, it is not possible to distinguish a string from a list of
integers or atoms. Sometimes this information can be derived from
(implicit) typing. In other cases the list must be embedded in a
compound term to distinguish the two types. For example,
s("hello world") could be used to indicate that we are dealing with a
string. Lacking runtime information, debuggers and the toplevel can only
use heuristics to decide whether to print a list of integers as such or
as a string (see portray_text/1).
While experienced Prolog programmers have learned to cope with this,
we still consider this an unfortunate situation.
- Lists are expensive structures, taking 2 cells per character (3 for
SWI-Prolog in its current form). This increases memory consumption on
the stacks, and pushing lists onto the stack and processing them during
garbage collection is unnecessarily expensive.
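The ambiguity of double-quoted text as a list can be illustrated with a minimal SWI-Prolog sketch. It assumes the traditional interpretation, selected here by setting the double_quotes flag to codes at the top of the file. Under that assumption "hi" and [104,105] are the same term, so only a wrapper such as the s/1 term mentioned above lets a program tell text apart from an ordinary list of integers. The predicate names text_is_code_list/0, tagged_text/1 and describe/2 are hypothetical and chosen for illustration only.

```prolog
% Assumption: use the traditional reading of double quotes for this file,
% so "..." is read as a list of character codes.
:- set_prolog_flag(double_quotes, codes).

% Succeeds: once read, "hi" is exactly the list [104,105].
text_is_code_list :-
    "hi" == [104, 105].

% Hypothetical tag: s/1 wraps a code list to record that it is text.
tagged_text(s(Codes)) :-
    is_list(Codes).

% Hypothetical classifier: only the tagged form can be recognised as text;
% an untagged "hi" is indistinguishable from any other list of integers.
describe(s(_), text) :- !.
describe(X, list) :-
    is_list(X).
```

Within this file, where the flag is in effect, describe("hi", T) binds T to list, while describe(s("hi"), T) binds T to text.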