must_be/2

LogicalCaptain said (2021-01-19T14:36:54):

Reviewed list of `must_be∕2` keywords

These expressions absolutely need to become compositional. AND, OR, NOT of boolean algebra would be of some use. Maybe one also add must_be(successful,Goal).

Also needs a special cases for "not empty atom" and "not empty string" etc. or a "not empty" general expression.

Confusingly, in many (most) cases an "instantiation error" is thrown if an unbound variable is passed, instead of a proper error that there is a "must_be problem". This is documented behaviour. That can be improved, it's like throwing a NullPointerException from inside a contract checking method.

As is the custom, ISO Prolog has absolutely no space for a "must_be/2" error (nothing good can be said about such a decision), so must_be/2 throws ISO "type errors" throughout which are unfortunately often just inappropriate (e.g. for "between(L,H)" tests)

--
any                     : any term, including an unbound variable (this reduces to true)
--
atom,symbol             : passes atom/1
atomic,constant         : passes atomic/1
--
oneof(L)                : ground term that is member of L; if there is an unbound variable in L, everything passes!
--
compound                : passes compound/1, the complement of atomic/1 (includes dicts, but don't use that fact as that dict implementation through compounds may change!)
pair                    : a key-value pair or rather a compound term -/2
--
boolean                 : one of the two atoms 'true' or 'false'
--
integer                 : passes integer/1
positive_integer        : integer > 0
negative_integer        : integer < 0
nonneg                  : nonneg *integer*; this should really be called "nonneg_integer"
...corresponding tests >=0, >0 etc for *general* numbers are missing
--
float                   : passes float/1 (in SWI-Prolog, an IEEE 64-bit float)
--
rational                : a non-integer rational or an integer
--
number                  : passes number/1
--
between(FloatL,FloatU)  : if FloatL is float, all other values may be float or integer (FIXME?); the limits are both INCLUSIVE; limits may be equal but NOT reversed
between(IntL,IntU)          : if IntL is integer, all other values must be integer; the limits are both INCLUSIVE; limits may be equal but NOT reversed
                          FIXME: there should be specific between_int/2 and between_float/2 if one goes that way.
--
acyclic                 : passes acyclic_term/1, i.e. is an acyclic term; includes unbound var (the term is a tree if one disregards "internal sharing")
cyclic                  : passes cyclic_term/1, i.e. the term contains cycles (the term is a "rational tree"). Does not accept an unbound variable.
--
char                    : atom of length 1
chars                   : list of 1-character atoms; includes the empty list
code                    : unicode code point (any integer between 0 and 0x10FFFF)
codes                   : list of integers >= 0; includes the empty list
string                  : passes string/1, an SWI-Prolog string
text                    : atom or string or chars or codes (but not numbers even though some predicates "textify" those)
--
var                     : passes var/1; must be an unbound variable
nonvar                  : passes nonvar/1; anything except an unbound variable
--
list,proper_list        : passes is_list/1 and is a proper/closed list; empty list is allowed
list(Type)              : proper list with elements of type Type (must_be/2(Type,_) is called on elements); empty list is allowed;
                          on error the index is not indicated (why~~~). A type like "list(list(integer))" is ok!
list_or_partial_list    : A partial list (one ending in a variable: [x|_]). This includes an unbound variable.
--
callable                : passes callable/1. Relatively usesless, as "callable" is ill-defined. Basically (atom(X);compound(X))
--
dict                    : passes is_dict/1, an SWI-prolog dict (which is just a compound term of a particular form)
--
encoding                : valid name for a character encoding; see current_encoding/1, e.g. utf8 (but not "utf8" or 'utf-8'; also fails for 'iso_8859_1')
--
stream                  : passes is_stream/1, is a stream name or valid stream handle
--
type                    : Meta: Term is a valid type specification for must_be/2. This is done by looking up whether a clause `has_type(Type,_) :- ....` exists.
                          Example: must_be(type,list(integer)). However, "must_be(type,list(grofx))": OK, but "must_be(type,grofx)": NOT OK.

Maybe missing:

must_be(predicate_indicator(Name/Arity),Term)

Definitely missing:

must_be(nonempty_list,Term)

A replacement: check_that/2 and friends

Can be found at this page: homepage for checks.pl

check_that/2 allows you to write things like:

check_that(X,[break(var),hard(integer),soft(pos0int))

which means "succeed immediately if X is var, otherwise throw an exception if X fails at being an integer, otherwise fail if X fails at being a positive-or-0-integer"

Text fixes

We read:

list_or_partial_list: A list or an open list (ending in a variable); see is_list_or_partial_list/1

That predicate does not exist (although the doc generator directs you to error.pl; the predicate is likely not exported)

It only throws "type error"

The note under "throws" says that "type_error(Type, Term)" is thrown. This means that for in-type, but out-of-domain errors, you get an inappropriate type error instead of a domain error. (That's really fine-tuning though):

?- must_be(nonneg, -3).
ERROR: Type error: `nonneg' expected, found `-3' (an integer)

?- must_be(positive_integer, 0).
ERROR: Type error: `positive_integer' expected, found `0' (an integer)

But note that the implementation for length/2 for example correctly throws domain_error for this:

?- length(L,-1).
ERROR: Domain error: `not_less_than_zero' expected, found `-1'

booleans

must_be(boolean,X) accepts the atoms true and false as booleans (both corresponding to predicates true/0 and false/0), but not undefined (which corresponds to undefined/0)

?- must_be(boolean,undefined).
ERROR: Type error: `boolean' expected, found `undefined' (an atom)

Maybe that's a "troolean?"

list_or_partial_list

This predicate considers unbound variables as partial lists. That's ok.

?- must_be(list_or_partial_list,_).
true.

?- must_be(list_or_partial_list,[]).
true.

?- must_be(list_or_partial_list,[x,y|_]).
true.

?- must_be(list_or_partial_list,[x,y]).
true.

When to use must_be?

This predicate should be used exclusively to perform contract checks and assertion checks, i.e. to inspect the current computational state of the running program.

Because it only does two things, Succeed or Throw:

Throw ISO standard instantiation_error if Term is currently a fresh variable (freshvar), aka. "unbound".
Otherwise throw ISO standard type_error if Term is NOT of Type
Otherwise succeed

So it's a tripwire to be used in Design by Contract.

One would use it to check the parameters passed to "API predicates" but generally not among "friendly/helper predicates" (except during development). One may want to rely on other predicates throwing exceptions "late" due to bad arguments instead of check for erroneous arguments "early". But as usual, it depends. Including on your level of paranoia.

Jan Wielemaker writes:

_must_be/2 is not optimized in any way. It is no more and no less than a cheap and explicit way to add type checks intended primarily for public APIs of libraries. Some people tend to stick it in everywhere. That may seriously slow down the code. I almost never use it in cases where built-in predicates do a reasonable job anyway._

Eric Taucher writes:

If the must_be/2 is used during development to keep me from making dumb mistakes like passing an empty variable then I leave them in until I know the predicate is working for the production code then strip them out by hand.
If the must_be/2 is used for code that is checking for resources like a file or database connection that must exist for the code to work then I leave them in, even for the production code.
_However if during development they are acting as a guard and I want the code to halt as soon as the problem is found but then work as guard with a silent fail once it is in production, I use must_be/2 during development and then change them to is_of_type/2 once the code is production quality._
There are more scenarios but you should start to get the idea that the goal is to get code that works correctly in production, what happens in development is not set in stone.
The reverse process also has uses: If production code has a problem or I did not write the code and want to understand it, I bring the code back in for development, change is_of_type/2 to must_be/2, add additional must_be/2 and other such changes.

Disabling must_be calls for production use

Note that one can "disable assertions" in other programming languages, and if they have been used properly, disabling them should not change program semantics.

You can do something similar here, using goal_expansion/2:

Before loading the application:

user:goal_expansion(must_be(_,_), true).

What's missing

The predicate seems to be missing a third argument: any context information which might be of interest to the whoever is handling the exception. A must_be/3 might be nice.

The ISO Standard exception, (which is not particularly flexible) allows an Extra term in the exception term error(type_error(_,_),Extra) or error(instantiation_error,Extra). Could be used here.

There is even use for an argument that indicates whether we want:

An exception on test failure
A failure on test failure

More desirable features:

One might want to have a disjunction and a conjunction inside the must_be/2 Type field
The criterium of "nonempty list", including "nonempty list of some type"
The criterium of "list of 'type or type'"
The criterium of "a yall lambda expression"
The criterium of "a compound of name N and arity A"
The criterium of "matches a certain dict structure". For example: must_be(io_monad{ stream:_, goal:_ },T): "T must be a dict with tag io_monad and at least the keys stream and goal (but maybe more).", although you can test for "dict-ness" using is_dict.

Various type-testing approaches

Finally, comparing various type-testing approaches with a bit of test code

Default approach which fails silently if the answer is "don't know"
"Sufficiently instantiated" approach which throws if the answer is "don't know"
must_be/2 approach which throws unless the answer is "yes, the type matches" (can be used on predicate entry to check whether a contract regarding input arguments is being upheld)
can_be/2 approach which throws only if the answer is "it's never going to be that type" (can be used on predicate entry to check whether a contract regarding output arguments is being upheld: is it possible to bind a term of a certain type to that argument later on?)

Various problems with the whole approach

?- must_be(list(pairs),[]).
true.

The above is an illusion (it should throw), the type expression is not currently checked:

?- must_be(list(pairs),[X-N]).
ERROR: type `pairs' does not exist

Correction:

?- must_be(list(pair),[X-N]).
true.

But:

?- must_be(list(pair),[N]).
ERROR: Arguments are not sufficiently instantiated

How do I express "list of pairs where an unbound variable is also allowed as it could become pair", which is akin "can be list of pairs" but w/o accepting a unbound variable at the list position?

I cannot resort to ";" between must_be/2 calls because must_be throws instead of failing. So one has to do this:

foo(Pairs) :-
   must_be(list,Pairs),                                      % check it's a list
   (maplist([X]>>(var(X);X=_-_),Pairs) -> R=true ; R=false), % reify individual checks
   (\+R -> must_be(list(pair),Pairs) ; true).                % pretend it's "not a list of pairs" if the above yields false

Nearly works, but if the bad list entry comes after an unbound variable list entry, this yields the inappropriate message:

?- foo([X,x-y,hh]).
ERROR: Arguments are not sufficiently instantiated

Much simpler, I want to accept an X that can be an unbound variable or otherwise a number.

I would like to write

foo(N) :-
   must_be((var;number),N),
   bar(N).

Running this yields an error:

?- foo(12).
ERROR: type `var;number' does not exist

One has to write

foo(N) :-
   (var(N) -> true ; must_be(number,N)).

assertion/1 is much more high-level (but maybe slower as it has to call a goal?)

foo(N) :-
   assertion(var(N);number(N)).

But in principle assertion/1 it is not meant to be used for predicate entry checks (OTOH, a rose by any other name etc. etc.)

But note that this works:

foo(N) :-
   must_be(list(list(number)),N).

?- foo([]).
true.

?- foo([[1,2],[4,5]]).
true.

?- foo([[1,2],[4,z]]).
ERROR: Type error: `number' expected, found `z' (an atom)

Yes!

When assertion is just better

This predicate checks that a dict follows certain requirements on the content.

You cannot cleanly code this with must_be/2, it would just decay into a tremendous mess.

assertion_is_vertex(V) :-
  assertion(ground(V)),
  assertion(is_dict(V)),
  assertion((get_dict(name,V,Name),atom(Name),Name\=='')),
  assertion((get_dict(prior,V,Prior),atom(Prior))),
  assertion((get_dict(local_cost,V,LocalCost),number(LocalCost),LocalCost>=0)),
   % note that LocalCost is dropped after the previous call, so we have to retrieve it again below:
  assertion((get_dict(overall_cost,V,OverallCost),get_dict(local_cost,V,LocalCost),number(OverallCost),OverallCost>=LocalCost)),
  assertion((get_dict(state,V,State),atom(State),memberchk(State,[visit_next,probed,visited]))),
  assertion((get_dict(hops,V,Hops),number(Hops),Hops>=0)).

Compare with

Hamcrest matchers, originally from the Java Universe and used primarily in unit tests:

Hamcrest Tutorial

These have the entry point MatcherAssert.assertThat() and take a Matcher Expression to verify. However, but just return false instead of throwing - throwing a java.lang.AssertionError or some other Throwable is left to the caller.

The Hamcrest Matchers for Erlang seem most comparable.

For example:

string_is_only_digits_test() ->
    assert_that("12345", is(only_digits())).

Ouch time

In https://groups.google.com/g/comp.lang.prolog/c/JToeE7Read8 Ulrich Neumerkel writes:

~~~~

Unfortunately, must_be/2 in SWI is itself not declaratively sound, as it provides some highly problematic domains:

?- X = s(X), must_be(cyclic,X).
X = s(X).

?- must_be(cyclic,X).
ERROR: Domain error: `cyclic_term' expected, found `_G11118'

So a generalization of a success case leads to a domain error. That's impossible.

Instead, use the more pedestrian itypes:

http://stackoverflow.com/a/30600104/772868

acyclic	Acyclic term (tree); see acyclic_term/1
any	any term
`between(FloatL,FloatU)`	Number [FloatL..FloatU]
`between(IntL,IntU)`	Integer [IntL..IntU]
boolean	One of `true` or `false`
callable	Atom or compound term
char	Atom of length 1
chars	Proper list of 1-character atoms
code	Representation Unicode code point (0..0x10ffff)
codes	Proper list of Unicode character codes
compound	compound term
`compound(Term)`	Compound with same name/arity as term; checks arguments
constant	Same as `atomic`
cyclic	Cyclic term (rational tree); see cyclic_term/1
dict	A dictionary term; see is_dict/1
encoding	Valid name for a character encoding; see current_encoding/1
list	A (non-open) list; see is_list/1
`list(Type)`	Proper list with elements of `Type`
list_or_partial_list	A list or an open list (ending in a variable); see is_list_or_partial_list/1
negative_integer	Integer `<` 0
nonneg	Integer `>=` 0
`oneof(L)`	Ground term that is member of L
pair	Key-Value pair. Same as `compound(any-any)`
positive_integer	Integer `>` 0
proper_list	Same as list
stream	A stream name or valid stream handle; see is_stream/1
symbol	Same as `atom`
text	One of `atom`, `string`, `chars` or `codes`
type	`Term` is a valid type specification

See also