Perfection is attained
not when there is nothing left to add
but when there is nothing left to take away
(Antoine de Saint-Exupéry)
(c) Software Lab. Alexander Burger
This document describes the concepts, data types, and kernel functions of the PicoLisp system.
This is not a Lisp tutorial. For an introduction to Lisp, a traditional Lisp book like "Lisp" by Winston/Horn (Addison-Wesley 1981) is recommended. Note, however, that there are significant differences between PicoLisp and Maclisp (and even greater differences from Common Lisp).
Please take a look at the PicoLisp Tutorial for an explanation of some aspects of PicoLisp, and scan through the list of Frequently Asked Questions (FAQ).
PicoLisp is the result of a language design study, trying to answer the question "What is a minimal but useful architecture for a virtual machine?". Because opinions differ about what is meant by "minimal" and "useful", there are many answers to that question, and people might consider other solutions more "minimal" or more "useful". But from a practical point of view, PicoLisp has proven to be a valuable answer to that question.
First of all, PicoLisp is a virtual machine architecture, and then a programming language. It was designed in a "bottom up" way, and "bottom up" is also the most natural way to understand and to use it: Form Follows Function.
PicoLisp has been used in several commercial and research programming projects since 1988. Its internal structures are simple enough that an experienced programmer can always fully understand what's going on under the hood, and its language features, efficiency and extensibility make it suitable for almost any practical programming task.
In a nutshell, emphasis was put on four design objectives. The PicoLisp system should be
An important point in the PicoLisp philosophy is the knowledge about the architecture and data structures of the internal machinery. The high-level constructs of the programming language directly map to that machinery, making the whole system both understandable and predictable.
This is similar to assembly language programming, where the programmer has complete control over the machine.
The PicoLisp virtual machine is both simpler and more powerful than most current (hardware) processors. At the lowest level, it is constructed from a single data structure called "cell":
         +-----+-----+
         | CAR | CDR |
         +-----+-----+
A cell is a pair of machine words, which traditionally are called CAR and CDR in the Lisp terminology. These words can represent either a numeric value (scalar) or the address of another cell (pointer). All higher level data structures are built out of cells.
The type information of higher level data is contained in the pointers to these data. Assuming the implementation on a byte-addressed physical machine and a pointer size of typically 8 bytes, each cell has a size of 16 bytes. Therefore, the pointer to a cell must point to a 16-byte boundary (a number which is a multiple of 16), and its bit-representation will look like:
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx0000
(the 'x' means "don't care"). For the individual data types, the
pointer is adjusted to point to other parts of a cell, in effect setting some of
the lower four bits to non-zero values. These bits are then used by the
interpreter to determine the data type.
In any case, bit(0) - the least significant of these bits - is reserved as a mark bit for garbage collection.
Initially, all cells in the memory are unused (free), and linked together to form a "free list". To create higher level data types at runtime, cells are taken from that free list, and returned by the garbage collector when they are no longer needed. All memory management is done via that free list; there are no additional buffers, string spaces or special memory areas, with two exceptions:
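On the virtual machine level, PicoLisp supports three base data types (numbers, symbols and pairs), three scope variations of symbols (internal, transient and external), and the special symbol NIL.
They are all built from the single cell data structure, and no runtime data can consist of any types other than these three.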
The following diagram shows the complete data type hierarchy, consisting of the three base types and the symbol variations:
                       cell
                        |
            +-----------+-----------+
            |           |           |
         Number       Symbol       Pair
                        |
                        |
   +--------+-----------+-----------+
   |        |           |           |
  NIL   Internal    Transient    External
A number can represent a signed integral value of arbitrary size. Internally, numeric values of up to 60 bits are stored in "short" numbers:
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxS010
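The short-number layout can be modelled with a few bit operations. The following Python sketch is not PicoLisp source: the names encode_short and decode_short are hypothetical, and it assumes that the magnitude occupies the 60 bits above the sign bit S, as the bit pattern suggests.

```python
TAG_SHORT = 0b0010               # bit(1) set marks a short number
SIGN_BIT = 0b1000                # bit(3) holds the sign S
MAX_MAGNITUDE = (1 << 60) - 1    # largest magnitude that fits (assumed)


def encode_short(n):
    """Pack a Python int into a modelled 64-bit short-number word."""
    magnitude = abs(n)
    if magnitude > MAX_MAGNITUDE:
        raise OverflowError("value needs a big number")
    word = (magnitude << 4) | TAG_SHORT
    if n < 0:
        word |= SIGN_BIT
    return word


def decode_short(word):
    """Recover the Python int from a modelled short-number word."""
    magnitude = word >> 4
    return -magnitude if word & SIGN_BIT else magnitude
```

In this model, encode_short(7) gives 0x72 and encode_short(-7) gives 0x7A; the two words differ only in bit(3).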
Numbers larger than that are "big" numbers, stored in heap cells. The CARs of one or more cells hold the number's "digits" (64 bits each), with the least significant digit first, while the CDRs point to the remaining digits.
         Bignum
         |
         V
      +-----+-----+
      | DIG |  |  |
      +-----+--+--+
               |
               V
            +-----+-----+
            | DIG |  |  |
            +-----+--+--+
                     |
                     V
                  +-----+-----+
                  | DIG | CNT |
                  +-----+-----+
The pointer to a big number points into the middle of the CAR, with an offset of 4 from the cell's start address, and the sign bit in bit(3):
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxS100
Thus, a number is recognized by the interpreter when either bit(1) is non-zero (a short number) or bit(2) is non-zero (a big number).
A symbol is more complex than a number. Each symbol has a value, and optionally a name and an arbitrary number of properties. The CDR of a symbol cell is also called VAL, and the CAR points to the symbol's tail. As a minimum, a symbol consists of a single cell, and has no name or properties:
            Symbol
            |
            V
      +-----+-----+
      |  /  | VAL |
      +-----+-----+
That is, the symbol's tail is empty (points to NIL, as indicated
by the '/' character).
The pointer to a symbol points to the CDR of the cell, with an offset of 8 bytes from the cell's start address. Therefore, the bit pattern of a symbol will be:
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1000
Thus, a symbol is recognized by the interpreter when bit(3) is non-zero.
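Taken together, the tag bits described so far allow the base type to be read off a pointer word. The following Python sketch is only a model with a hypothetical function name (classify). It tests the number bits first, because bit(3) is the sign bit in a number and marks a symbol only otherwise, and it ignores bit(0), the mark bit.

```python
def classify(ptr):
    """Return the base type implied by the low tag bits of a pointer word."""
    if ptr & 0b0010:        # bit(1): short number
        return "short number"
    if ptr & 0b0100:        # bit(2): big number
        return "big number"
    if ptr & 0b1000:        # bit(3), and not a number: symbol
        return "symbol"
    return "pair"           # no tag bits set: plain cell pointer
```

For example, a word ending in 1100 is classified as a (negative) big number, not as a symbol.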
A property is a key-value pair, represented by a cons pair in the symbol's
tail. This is called a "property list". The property list may be terminated by a
number (short or big) representing the symbol's name. In the following example,
a symbol with the name "abcdefghijklmno" has three properties: A
KEY/VAL pair, a cell with only a KEY, and another KEY/VAL pair.
            Symbol
            |
            V
      +-----+-----+                                +----------+---------+
      |  |  | VAL |                                |'hgfedcba'|'onmlkji'|
      +--+--+-----+                                +----------+---------+
         | tail                                       ^
         |                                            |
         V                                            | name
         +-----+-----+     +-----+-----+     +-----+--+--+
         |  |  |  ---+---> | KEY |  ---+---> |  |  |  |  |
         +--+--+-----+     +-----+-----+     +--+--+-----+
            |                                   |
            V                                   V
            +-----+-----+                       +-----+-----+
            | VAL | KEY |                       | VAL | KEY |
            +-----+-----+                       +-----+-----+
Each property in a symbol's tail is either a symbol (like the single KEY
above, then it represents the boolean value T), or a cons pair with
the property key in its CDR and the property value in its CAR. In both cases,
the key should be a symbol, because searches in the property list are performed
using pointer comparisons.
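The lookup rule can be sketched in Python as follows. This is a model, not the interpreter's code: the class Sym, the function get_prop and the use of a Python list for the tail are all assumptions made for illustration. The `is` operator plays the role of the pointer comparison.

```python
class Sym:
    """Stand-in for a symbol; identity, not the name, is what matters."""

    def __init__(self, name):
        self.name = name


T = Sym("T")
NIL = None      # NIL is modelled as None


def get_prop(tail, key):
    """Search a modelled property list by pointer (identity) comparison."""
    for entry in tail:
        if entry is key:            # bare key: boolean property T
            return T
        if isinstance(entry, tuple):
            value, k = entry        # value in the CAR, key in the CDR
            if k is key:
                return value
    return NIL
```

Two distinct Sym objects with the same name do not match each other here, which is why the key should be a symbol that exists only once.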
The name of a symbol is stored as a number at the end of the tail. It contains the characters of the name in UTF-8 encoding, using between one and seven bytes in a short number, or eight bytes in a bignum cell. The first byte of the first character, for example, is stored in the lowest 8 bits of the number.
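The byte order of a name can be checked with a small Python model. The function names are hypothetical, and the sketch only computes the numeric digits; it does not model the tag bits or the distinction between a short number and a bignum cell. The helper show_digit assumes an ASCII name.

```python
def name_digits(name):
    """Split a name's UTF-8 bytes into 64-bit digits, first byte lowest."""
    data = name.encode("utf-8")
    digits = []
    for i in range(0, len(data), 8):
        digits.append(int.from_bytes(data[i:i + 8], "little"))
    return digits


def show_digit(digit):
    """Render a digit as in the diagram: highest byte first (ASCII only)."""
    length = (digit.bit_length() + 7) // 8
    return digit.to_bytes(length, "big").decode("ascii")
```

Applied to "abcdefghijklmno", the two digits render as "hgfedcba" and "onmlkji", matching the diagram of the symbol's tail.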
All symbols have the above structure, but depending on scope and
accessibility there are actually four types of symbols: NIL, internal, transient and external symbols.
NIL is a special symbol which exists exactly once in the whole
system. It is used
For that, NIL has a special structure:
      NIL:  /
            |
            V
      +-----+-----+-----+-----+
      |'LIN'|  /  |  /  |  /  |
      +-----+--+--+-----+-----+
The reason for that structure is NIL's dual nature both as a
symbol and as a list. As a symbol, it should give NIL for its VAL, and be without
properties. As a list, NIL should give NIL both for
its CAR and for its CDR.
These requirements are fulfilled by the above structure.
Internal Symbols are all those "normal" symbols, as they are used for function definitions and variable names. They are "interned" into an index structure, so that it is possible to find an internal symbol by searching for its name.
There cannot be two different internal symbols with the same name.
Initially, a new internal symbol's VAL is NIL.
Transient symbols are only interned into an index structure for a certain time (e.g. while reading the current source file), and are released after that. That means a transient symbol can no longer be accessed by its name afterwards, and there may be several transient symbols in the system having the same name.
Transient symbols are used as text strings, as identifiers with a limited access scope (like, for example, static identifiers in the C language family), and as anonymous, dynamically created objects without a name.
Initially, a new transient symbol's VAL is that symbol itself.
A transient symbol without a name can be created with the box or new functions.
External symbols reside in a database file (or a similar resource, see
*Ext), and are loaded into memory -
and written back to the file - dynamically as needed, and transparently to the
programmer. They are kept in memory ("cached") as long as they are accessible
("referred to") from other parts of the program, or when they were modified but
not yet written to the database file (by commit).
The interpreter recognizes external symbols internally by an additional tag bit in the tail structure.
There cannot be two different external symbols with the same name. External symbols are maintained in index structures while they are loaded into memory, and have their external location (disk file and block offset) directly coded into their names.
Initially, a new external symbol's VAL is NIL, unless otherwise
specified at creation time.
A list is a sequence of one or more cells (cons pairs), holding numbers, symbols, or cons pairs.
      |
      V
      +-----+-----+
      | any |  |  |
      +-----+--+--+
               |
               V
               +-----+-----+
               | any |  |  |
               +-----+--+--+
                        |
                        V
                        ...
Lists are used in PicoLisp to emulate composite data structures like arrays, trees, stacks or queues.
In contrast to lists, numbers and symbols are collectively called "Atoms".
Typically, the CDR of each cell in a list points to the following cell,
except for the last cell which points to NIL. If, however, the CDR of
the last cell points to an atom, that cell is called a "dotted pair" (because of
its I/O syntax with a dot '.' between the two values).
The PicoLisp interpreter has complete knowledge of all data in the system, due to the type information associated with every pointer. Therefore, an efficient garbage collector mechanism can easily be implemented. PicoLisp employs a simple but fast mark-and-sweep garbage collector.
As the collection process is very fast (in the order of milliseconds per megabyte), it was not necessary to develop more complicated, time-consuming and error-prone garbage collection algorithms (e.g. incremental collection). A compacting garbage collector is also not necessary, because the single cell data type cannot cause heap fragmentation.
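The interplay of the free list and mark-and-sweep collection can be sketched in Python. This is a toy model under stated assumptions, not PicoLisp's collector: the class Heap, the ("cell", index) references and the explicit mark flag are all illustrative choices. Because every cell has the same size, sweeping only has to relink unmarked cells; nothing is moved.

```python
class Heap:
    """Toy heap of uniform cells with a free list and mark-and-sweep."""

    def __init__(self, size):
        # Each cell is [car, cdr, mark]; free cells are chained through cdr.
        self.cells = [[None, i + 1, False] for i in range(size)]
        if size:
            self.cells[-1][1] = None
        self.free = 0 if size else None

    def cons(self, car, cdr):
        """Take one cell from the free list and fill it."""
        if self.free is None:
            raise MemoryError("free list exhausted")
        idx = self.free
        self.free = self.cells[idx][1]
        self.cells[idx] = [car, cdr, False]
        return ("cell", idx)

    def _mark(self, ref):
        # Iteratively mark every cell reachable from ref.
        stack = [ref]
        while stack:
            item = stack.pop()
            if isinstance(item, tuple) and item[0] == "cell":
                cell = self.cells[item[1]]
                if not cell[2]:
                    cell[2] = True
                    stack.extend((cell[0], cell[1]))

    def collect(self, roots):
        """Mark from the roots, then rebuild the free list from the rest.

        Returns the number of cells on the free list afterwards.
        """
        for root in roots:
            self._mark(root)
        self.free = None
        free_count = 0
        for idx, cell in enumerate(self.cells):
            if cell[2]:
                cell[2] = False                     # clear mark for next run
            else:
                self.cells[idx] = [None, self.free, False]
                self.free = idx
                free_count += 1
        return free_count
```

A heap of four cells holding a two-cell list and one unreferenced cell ends up with two cells on the free list after collect is called with the list as the only root.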
Lisp was chosen as the programming language, because of its clear and simple structure.
In some previous versions, a Forth-like syntax was also implemented on top of a similar virtual machine (Lifo). Though that language was more flexible and expressive, the traditional Lisp syntax proved easier to handle, and the virtual machine can be kept considerably simpler. PicoLisp inherits the major advantages of classical Lisp systems like
In the following, some concepts and peculiarities of the PicoLisp language and environment are described.
PicoLisp supports two installation strategies: Local and Global.
Normally, if you didn't build PicoLisp yourself but installed it with your operating system's package manager, you will have a global installation. This allows system-wide access to the executable and library/documentation files.
To get a local installation, you can directly download the PicoLisp tarball, and follow the instructions in the INSTALL file.
A local installation will not interfere in any way with the world outside its directory. There is no need to touch any system locations, and you don't have to be root to install it. Many different versions - or local modifications - of PicoLisp can co-exist on a single machine.
Note that you are still free to have local installations along with a global installation, and invoke them explicitly as desired.
Most examples in the following apply to a global installation.
When PicoLisp is invoked from the command line, an arbitrary number of arguments may follow the command name.
By default, each argument is the name of a file to be executed by the
interpreter. If, however, the argument's first character is a hyphen
'-', then the rest of that argument is taken as a Lisp function
call (without the surrounding parentheses), and a hyphen by itself as an
argument stops evaluation of the rest of the command line (it may be processed
later using the argv and opt functions). This whole mechanism corresponds
to calling (load T).
A special case is if the last argument is a single '+'. This
will switch on debug mode (the *Dbg
global variable) and discard the '+'.
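The argument handling described above can be summarized in a short Python model. The function plan_command_line and its return format are hypothetical; the sketch only classifies arguments and does not load or evaluate anything.

```python
def plan_command_line(args):
    """Return (debug, actions, rest) for a modelled PicoLisp command line."""
    args = list(args)
    debug = bool(args) and args[-1] == "+"
    if debug:
        args.pop()                      # the '+' itself is discarded
    actions, rest = [], []
    for i, arg in enumerate(args):
        if arg == "-":                  # lone hyphen: stop processing here
            rest = args[i + 1:]
            break
        if arg.startswith("-"):         # function call without parentheses
            actions.append(("eval", "(" + arg[1:] + ")"))
        else:                           # anything else is a file to execute
            actions.append(("load", arg))
    return debug, actions, rest
```

In this model, the arguments "myProject.l", "-main" and "+" yield debug mode, one file to load and the call (main).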
As a convention, PicoLisp source files have the extension ".l".
Note that the PicoLisp executable itself does not expect or accept any
command line flags or options (except the '+', see above). They are
reserved for application programs.
The simplest and shortest invocation of PicoLisp does nothing, and exits
immediately by calling bye:
$ picolisp -bye
$
In interactive mode, the PicoLisp interpreter (see load) will also exit when Ctrl-D
is entered:
$ picolisp
: $                     # Typed Ctrl-D
To start up the standard PicoLisp environment, several files should be loaded. The most commonly used things are in "lib.l" and in a bunch of other files, which are in turn loaded by "ext.l". Thus, a typical call would be:
$ picolisp lib.l ext.l
The recommended way, however, is to call the "pil" shell script, which
includes "lib.l" and "ext.l". Given that your current project is loaded by some
file "myProject.l" and your startup function is main, your
invocation would look like:
$ pil myProject.l -main
For interactive development it is recommended to enable debugging mode, to get the vi-style line editor, single-stepping, tracing and other debugging utilities.
$ pil myProject.l -main +
This is - in a local installation - equivalent to
$ ./pil myProject.l -main +
In any case, the directory part of the first file name supplied (normally,
the path to "lib.l" as called by 'pil') is remembered internally as the
PicoLisp Home Directory. This path is later automatically substituted for
any leading "@" character in file name arguments to I/O functions
(see path).
Instead of the default vi-style line editor, an emacs-style editor can be
used. It can be switched on permanently by calling the function
(em) (i.e. without arguments), or by passing -em on
the command line:
$ pil -em +
:
A single call is enough, because the style will be remembered in a file "~/.pil/editor", and used in all subsequent PicoLisp sessions.
To switch back to 'vi' style, call (vi), use the
-vi command line option, or simply remove "~/.pil/editor".
In Lisp, each internal data structure has a well-defined external representation in human-readable format. All kinds of data can be written to a file, and restored later to their original form by reading that file.
In normal operation, the PicoLisp interpreter continuously executes an infinite "read-eval-print loop". It reads one expression at a time, evaluates it, and prints the result to the console. Any input into the system, like data structures and function definitions, is done in a consistent way no matter whether it is entered at the console or read from a file.
Comments can be embedded in the input stream with the hash #
character. Everything up to the end of that line will be ignored by the reader.
: (* 1 2 3)  # This is a comment
-> 6
A comment spanning several lines may be enclosed between #{ and
}#.
Here is the I/O syntax for the individual PicoLisp data types (numbers, symbols and lists) and for read-macros:
A number consists of an arbitrary number of digits ('0' through
'9'), optionally preceded by a sign character ('+' or
'-'). Legal number input is:
: 7
-> 7
: -12345678901245678901234567890
-> -12345678901245678901234567890
Fixpoint numbers can be input by embedding a decimal point '.',
and setting the global variable *Scl
appropriately:
: *Scl
-> 0
: 123.45
-> 123
: 456.78
-> 457
: (setq *Scl 3)
-> 3
: 123.45
-> 123450
: 456.78
-> 456780
Thus, fixpoint input simply scales the number to an integer value
corresponding to the number of digits in *Scl.
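The scaling can be reproduced with a small Python model. The function read_fixpoint is hypothetical, and the rounding mode (half up) is an assumption chosen because it matches the transcript above; the behavior of the actual reader in other cases is not claimed here.

```python
from decimal import Decimal, ROUND_HALF_UP


def read_fixpoint(text, scl):
    """Scale decimal input to an integer with scl fractional digits."""
    scaled = Decimal(text).scaleb(scl)      # shift the decimal point
    return int(scaled.quantize(Decimal(1), rounding=ROUND_HALF_UP))
```

With a scale of 0 the inputs 123.45 and 456.78 give 123 and 457; with a scale of 3 they give 123450 and 456780.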
Formatted output of scaled fixpoint values can be done with the format and round functions:
: (format 1234567890 2)
-> "12345678.90"
: (format 1234567890 2 "." ",")
-> "12,345,678.90"
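The output direction can be modelled in the same way. The following Python function, format_fixpoint, is a hypothetical stand-in that covers only the two cases shown above (a decimal separator and an optional thousands separator); it is not a description of everything the format function does.

```python
def format_fixpoint(n, scl, dec=".", thousands=""):
    """Render a scaled integer as text with scl fractional digits."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n)).rjust(scl + 1, "0")    # keep one integer digit
    whole = digits[:len(digits) - scl]
    frac = digits[len(digits) - scl:]
    if thousands:
        groups = []
        while len(whole) > 3:                   # split off groups of three
            groups.insert(0, whole[-3:])
            whole = whole[:-3]
        groups.insert(0, whole)
        whole = thousands.join(groups)
    return sign + whole + (dec + frac if scl else "")
```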
The reader is able to recognize the individual symbol types from their syntactic form. A symbol name should - of course - not look like a legal number (see above).
In general, symbol names are case-sensitive. car is not the same
as CAR.
Besides its standard form, NIL is also recognized as
(), [] or "".
: NIL
-> NIL
: ()
-> NIL
: ""
-> NIL
Output will always appear as NIL.
Internal symbol names can consist of any printable (non-whitespace) character, except for the following meta characters:
   "  '  (  )  ,  [  ]  `  ~ { }
It is possible, though, to include these special characters into symbol names
by escaping them with a backslash '\'.
The dot '.' has a dual nature. It is a meta character when
standing alone, denoting a dotted pair, but can otherwise
be used in symbol names.
As a rule, anything not recognized by the reader as another data type will be returned as an internal symbol.
A transient symbol is anything surrounded by double quotes '"'.
With that, it looks - and can be used - like a string constant in other
languages. However, it is a real symbol, and may be assigned a value or a
function definition, and properties.
Initially, a transient symbol's value is that symbol itself, so that it does not need to be quoted for evaluation:
: "This is a string"
-> "This is a string"
However, care must be taken when assigning a value to a transient symbol. This may cause unexpected behavior:
: (setq "This is a string" 12345)
-> 12345
: "This is a string"
-> 12345
The name of a transient symbol can contain any character except the
null-byte. Control characters can be written with a preceding hat
'^' character. A hat or a double quote character can be escaped
with a backslash '\', and a backslash itself has to be escaped with
another backslash.
: "We^Ird\\Str\"ing"
-> "We^Ird\\Str\"ing"
: (chop @)
-> ("W" "e" "^I" "r" "d" "\\" "S" "t" "r" "\"" "i" "n" "g")
The combination of a backslash followed by 'n', 'r' or 't' is replaced with newline ("^J"), return ("^M") or TAB ("^I"), respectively.
: "abc\tdef\r"
-> "abc^Idef^M"
A decimal number between two backslashes can be used to specify any unicode character directly.
: "äöü\8364\xyz"
-> "äöü€xyz"
A backslash in a transient symbol name at the end of a line discards the newline, and continues the name in the next line. In that case, all leading spaces and tabs in that line are discarded, to allow proper source code indentation.
: "abc\
   def"
-> "abcdef"
: "x \
   y \
   z"
-> "x y z"
The index for transient symbols is cleared automatically before and after
loading a source file, or it can be
reset explicitly with the ====
function. With that mechanism, it is possible to create symbols with a local
access scope, not accessible from other parts of the program.
A special case of transient symbols are anonymous symbols. These are
symbols without name (see box, box? or new). They print as a dollar sign
($) followed by a decimal digit string (actually their machine
address).
External symbol names are surrounded by braces ('{' and
'}'). The characters of the symbol's name itself identify the
physical location of the external object: the database file and the starting
block within that file. Two encodings are described. In one, both numbers are
written in base-64 notation (characters '0' through '9',
':', ';', 'A' through 'Z'
and 'a' through 'z'), separated by a hyphen. In the other, the
file is written in an alphabetic notation (where '@' is zero,
'A' is 1 and 'O' is 15, from "alpha" to "omega"),
immediately followed (without a hyphen) by the starting block in octal
('0' through '7').
In both cases, the database file (and possibly the hyphen) is omitted for the first (default) file.
Lists are surrounded by parentheses ('(' and ')').
(A) is a list consisting of a single cell, with the symbol
A in its CAR, and NIL in its CDR.
(A B C) is a list consisting of three cells, with the symbols
A, B and C respectively in their CAR, and
NIL in the last cell's CDR.
(A . B) is a "dotted pair", a list
consisting of a single cell, with the symbol A in its CAR, and
B in its CDR.
PicoLisp has built-in support for reading and printing simple circular lists. If the dot in a dotted-pair notation is immediately followed by a closing parenthesis, it indicates that the CDR of the last cell points back to the beginning of that list.
: (let L '(a b c) (conc L L))
-> (a b c .)
: (cdr '(a b c .))
-> (b c a .)
: (cddddr '(a b c .))
-> (b c a .)
A similar result can be achieved with the function circ. Such lists must be used with care,
because many functions won't terminate or will crash when given such a list.
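Why such lists need care becomes clear from a model of the cell structure. In the following Python sketch (class and function names are hypothetical), the CDR of the last cell points back to the first cell, so any traversal needs an explicit bound.

```python
class Cell:
    """A cons pair with a CAR and a CDR."""

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr


def make_circular(items):
    """Build cells whose last CDR points back to the first cell."""
    head = last = Cell(items[0])
    for item in items[1:]:
        last.cdr = Cell(item)
        last = last.cdr
    last.cdr = head
    return head


def take(cell, count):
    """Collect count CARs; a bound is required because the list never ends."""
    out = []
    for _ in range(count):
        out.append(cell.car)
        cell = cell.cdr
    return out
```

Taking one CDR or four CDRs of the three-element ring leads to the same cell, which corresponds to the two identical results in the transcript above.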
Read-macros in PicoLisp are special forms that are recognized by the reader,
and modify its behavior. Note that they take effect immediately while reading an expression, and are not seen by the
eval in the main loop.
The most prominent read-macro in Lisp is the single quote character
"'", which expands to a call of the quote function. Note that the single quote
character is also printed instead of the full function name.
: '(a b c)
-> (a b c)
: '(quote . a)
-> 'a
: (cons 'quote 'a)   # (quote . a)
-> 'a
: (list 'quote 'a)   # (quote a)
-> '(a)
A comma (,) will cause the reader to collect the following data
item into an idx tree in the global
variable *Uni, and to return a
previously inserted equal item if present. This makes it possible to create a
unique list of references to data which do not normally follow the rules of
pointer equality. If the value of *Uni is T, the
comma read macro mechanism is disabled.
A single backquote character "`" will cause the reader to
evaluate the following expression, and return the result.
: '(a `(+ 1 2 3) z)
-> (a 6 z)
A tilde character ~ inside a list will cause the reader to
evaluate the following expression, and (destructively) splice the result into
the list.
: '(a b c ~(list 'd 'e 'f) g h i)
-> (a b c d e f g h i)
When a tilde character is used to separate two symbol names (without surrounding whitespace), the first is taken as a namespace to look up the second (64-bit version only).
: 'libA~foo  # Look up 'foo' in namespace 'libA'
-> "foo"     # "foo" is not interned in the current namespace
Reading libA~foo is equivalent to switching the current
namespace to libA (with symbols), reading the symbol
foo, and then switching back to the original namespace.
Brackets ('[' and ']') can be used as super
parentheses. A closing bracket will match the innermost opening bracket, or all
currently open parentheses.
: '(a (b (c (d]
-> (a (b (c (d))))
: '(a (b [c (d]))
-> (a (b (c (d))))
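The matching rule for super parentheses can be expressed as a small rewriting step. The Python function expand_brackets below is a hypothetical sketch: it works on raw characters, ignores strings, comments and escapes, and treating a ')' that meets an open '[' as an error is an assumption of the model.

```python
def expand_brackets(text):
    """Rewrite super parentheses as ordinary parentheses."""
    out = []
    stack = []                      # currently open '(' and '['
    for ch in text:
        if ch in "([":
            stack.append(ch)
            out.append("(")
        elif ch == ")":
            if not stack or stack.pop() != "(":
                raise ValueError("unbalanced ')'")
            out.append(")")
        elif ch == "]":
            # Close everything up to and including the innermost '[',
            # or all open parentheses when no '[' is open.
            while stack:
                out.append(")")
                if stack.pop() == "[":
                    break
        else:
            out.append(ch)
    return "".join(out)
```

Both inputs from the transcript above, "(a (b (c (d]" and "(a (b [c (d]))", are rewritten to "(a (b (c (d))))".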
Finally, reading the sequence '{}' will result in a new
anonymous symbol with value NIL, equivalent to a call to box without arguments.
: '({} {} {})
-> ($134599965 $134599967 $134599969)
: (mapcar val @)
-> (NIL NIL NIL)
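PicoLisp tries to evaluate any expression encountered in the read-eval-print loop. Basically, it does so by applying the following three rules: a number evaluates to itself, a symbol evaluates to its value (VAL), and a list is evaluated as a function call.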
: 1234
-> 1234        # Number evaluates to itself
: *Pid
-> 22972       # Symbol evaluates to its VAL
: (+ 1 2 3)
-> 6           # List is evaluated as a function call
For the third rule, however, things get a bit more involved. First - as a special case - if the CAR of the list is a number, the whole list is returned as it is:
: (1 2 3 4 5 6)
-> (1 2 3 4 5 6)
This is not really a function call but just a convenience to avoid having to quote simple data lists.
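The rules so far fit into a few lines of Python. This is a deliberately simplified model, not the interpreter: numbers are Python ints, symbols are strings, lists are Python lists, and the names evaluate, product and ENV are hypothetical. As a simplification, all arguments are evaluated before the call, and the function is obtained by evaluating the CAR.

```python
def product(*numbers):
    """Multiply all arguments, as '*' does for numbers."""
    result = 1
    for n in numbers:
        result *= n
    return result


# Symbol values of the model; 22972 is the sample value shown for *Pid.
ENV = {"+": lambda *a: sum(a), "*": product, "*Pid": 22972}


def evaluate(expr, env):
    """Apply the evaluation rules to a modelled expression."""
    if isinstance(expr, int):
        return expr                         # a number evaluates to itself
    if isinstance(expr, str):
        return env[expr]                    # a symbol evaluates to its VAL
    if not expr:
        return expr                         # the empty list stays as it is
    if isinstance(expr[0], int):
        return expr                         # numeric CAR: returned unchanged
    fn = evaluate(expr[0], env)             # function from the CAR
    args = [evaluate(arg, env) for arg in expr[1:]]
    return fn(*args)
```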
Otherwise, if the CAR is a symbol or a list, PicoLisp tries to obtain an executable function from that, by either using the symbol's value, or by evaluating the list.
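What is an executable function? Or, said in another way, what can be applied to a list of arguments, to result in a function call? A legal function in PicoLisp is either a number, taken as a pointer to executable code, or a lambda expression, that is, a list defined in Lisp itself. Not all functions evaluate all of their arguments: some do not evaluate their arguments at all (e.g.
quote) or evaluate only some of their arguments (e.g.
setq).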
A few examples should help to understand the practical consequences of these
rules. In the most common case, the CAR will be a symbol defined as a function,
like the * in:
: (* 1 2 3)    # Call the function '*'
-> 6
Inspecting the VAL of * gives
: *            # Get the VAL of the symbol '*'
-> 67318096
The VAL of * is a number. In fact, it is the numeric
representation of a C-function pointer, i.e. a pointer to executable code. This
is the case for all built-in functions of PicoLisp.
Other functions in turn are written as Lisp expressions:
: (de foo (X Y)            # Define the function 'foo'
   (* (+ X Y) (+ X Y)) )
-> foo
: (foo 2 3)                # Call the function 'foo'
-> 25
: foo                      # Get the VAL of the symbol 'foo'
-> ((X Y) (* (+ X Y) (+ X Y)))
The VAL of foo is a list. It is the list that was assigned to
foo with the de function. It would be perfectly legal
to use setq instead of de:
: (setq foo '((X Y) (* (+ X Y) (+ X Y))))
-> ((X Y) (* (+ X Y) (+ X Y)))
: (foo 2 3)
-> 25
If the VAL of foo were another symbol, that symbol's VAL would
be used instead to search for an executable function.
As we said above, if the CAR of the evaluated expression is not a symbol but a list, that list is evaluated to obtain an executable function.
: ((intern (pack "c" "a" "r")) (1 2 3))
-> 1
Here, the intern function returns the symbol car
whose VAL is used then. It is also legal, though quite dangerous, to use the code-pointer directly:
: *
-> 67318096
: ((* 2 33659048) 1 2 3)
-> 6
: ((quote . 67318096) 1 2 3)
-> 6
: ((quote . 1234) (1 2 3))
Segmentation fault
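When an executable function is defined in Lisp itself, we call it a lambda expression. A lambda expression always has a list of executable expressions as its CDR. The CAR, however, must be either a list of symbols or a single symbol, and it controls the evaluation of the arguments to the executable function according to the following rules:
When the CAR is a list of symbols, one argument is evaluated for each symbol, and the symbols are bound to the results. This is the common case of a fixed number of arguments.
When the CAR is the symbol @, all arguments are evaluated, and the results can be accessed sequentially in the body with the args,
next, arg and rest functions. This allows functions to be defined
with a variable number of evaluated arguments.
When the CAR is any other single symbol, that symbol is bound to the whole unevaluated argument list.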
In all cases, the return value is the result of the last expression in the body.
: (de foo (X Y Z)                   # CAR is a list of symbols
   (list X Y Z) )                   # Return a list of all arguments
-> foo
: (foo (+ 1 2) (+ 3 4) (+ 5 6))
-> (3 7 11)                         # all arguments are evaluated
: (de foo @                         # CAR is the symbol '@'
   (list (next) (next) (next)) )    # Return the first three arguments
-> foo
: (foo (+ 1 2) (+ 3 4) (+ 5 6))
-> (3 7 11)                         # all arguments are evaluated
: (de foo X                         # CAR is a single symbol
   X )                              # Return the argument
-> foo
: (foo (+ 1 2) (+ 3 4) (+ 5 6))
-> ((+ 1 2) (+ 3 4) (+ 5 6))        # the whole unevaluated list is returned
Note that these forms can also be combined. For example, to evaluate only the
first two arguments, bind the results to X and Y, and
bind all other arguments (unevaluated) to Z:
: (de foo (X Y . Z)                 # CAR is a list with a dotted-pair tail
   (list X Y Z) )                   # Return a list of all arguments
-> foo
: (foo (+ 1 2) (+ 3 4) (+ 5 6))
-> (3 7 ((+ 5 6)))                  # Only the first two arguments are evaluated
Or, a single argument followed by a variable number of arguments:
: (de foo (X . @)                   # CAR is a dotted-pair with '@'
   (println X)                      # print the first evaluated argument
   (while (args)                    # while there are more arguments
      (println (next)) ) )          # print the next one
-> foo
: (foo (+ 1 2) (+ 3 4) (+ 5 6))
3                                   # X
7                                   # next argument
11                                  # and the last argument
-> 11
In general, if more than the expected number of arguments is supplied to a
function, these extra arguments will be ignored. Missing arguments default to
NIL.
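The binding behavior shown in these examples can be condensed into one Python function. It is a sketch under assumptions: bind_arguments and add are hypothetical names, the parameter list is split by hand into fixed symbols and an optional rest part, NIL is modelled as None, and the evaluated arguments of '@' are simply stored under the key "@" instead of being accessed through next or args.

```python
def bind_arguments(fixed, rest, args, evaluate):
    """Return the bindings a lambda expression's CAR would establish.

    fixed: symbols that each receive one evaluated argument
    rest:  None, "@" (remaining arguments evaluated), or a symbol name
           (remaining arguments bound unevaluated)
    """
    args = list(args)
    bindings = {}
    for i, name in enumerate(fixed):
        # Missing arguments default to NIL, modelled here as None.
        bindings[name] = evaluate(args[i]) if i < len(args) else None
    remaining = args[len(fixed):]
    if rest == "@":
        bindings["@"] = [evaluate(a) for a in remaining]
    elif rest is not None:
        bindings[rest] = remaining
    # With rest None, surplus arguments are ignored.
    return bindings


def add(expr):
    """Evaluate a modelled ("+", a, b) tuple, or return a plain number."""
    return sum(expr[1:]) if isinstance(expr, tuple) else expr
```

Calling bind_arguments(["X", "Y"], "Z", ...) with the three modelled sums from the examples binds X to 3, Y to 7 and Z to the list holding the unevaluated third argument, which corresponds to the (X Y . Z) case.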
Analogous to built-in functions (which are written in assembly (64-bit version) or C (32-bit version) in the interpreter kernel), PicoLisp functions may also be defined in shared object files (called "DLLs" on some systems). The coding style, register usage, argument passing etc. follow the same rules as for normal built-in functions.
Note that this has nothing to do with external (e.g. third-party) library functions called with native.
When the interpreter encounters a symbol supposed to be called as a function,
without a function definition, but with a name of the form
"lib:sym", then - instead of throwing an "undefined"-error - it
tries to locate a shared object file with the name lib and a
function sym, and stores a pointer to this code in the symbol's
value. From that point, this symbol lib:sym keeps that function
definition, and is indistinguishable from built-in functions. Future calls to
this function do not require another library search.
A consequence of this lookup mechanism, however, is the fact that such
symbols cannot be used directly in a function-passing context (i.e.
"apply" them) like
(apply + (1 2 3))
(mapcar inc (1 2 3))
These calls work because + and inc already have a
(function) value at this point. Applying a shared library function like
(apply ext:Base64 (1 2 3))
works only if ext:Base64 was either called before (and
thus automatically received a function definition), or was fetched explicitly
with (getd 'ext:Base64).
Therefore, it is recommended to always apply such functions by passing the symbol itself and not just the value:
(apply 'ext:Base64 (1 2 3))
Coroutines are independent execution contexts. They may have multiple entry and exit points, and preserve their environment between invocations.
They are available only in the 64-bit version.
A coroutine is identified by a tag. This tag can be passed to other functions, and (re)invoked as needed. In this regard coroutines are similar to "continuations" in other languages.
When the tag goes out of scope while it is not actively running, the coroutine will be garbage collected. In cases where this is desired, using a transient symbol for the tag is recommended.
A coroutine is created by calling co.
Its prg body will be executed, and unless yield is called at some point, the coroutine
will "fall off" at the end and disappear.
When yield is called, control is
either transferred back to the caller, or to some other - explicitly specified,
and already running - coroutine.
A coroutine is stopped and disposed when co is called with
that tag but without a prg body, or when a throw into another (co)routine
environment is executed.
Reentrant coroutines are not supported: A coroutine cannot resume itself directly or indirectly.
Before using many coroutines, make sure you have sufficient stack space, e.g. by calling
$ ulimit -s unlimited
Without that, the stack limit in Linux is typically 8 MiB.
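As a minimal sketch of the life cycle described above (64-bit version only), the following generator yields successive integers. The function name nextNum and the tag num are made up for this illustration.

```picolisp
# Hypothetical generator: each call creates or resumes the coroutine 'num'
(de nextNum ()
   (co 'num
      (let N 0
         (loop
            (yield (inc 'N)) ) ) ) )

(nextNum)   # -> 1  (coroutine is created and runs until the first 'yield')
(nextNum)   # -> 2  (coroutine is resumed after the 'yield')
(co 'num)   # 'co' with the tag but without a prg body: stop and dispose
```

The local variable N survives between the calls because the coroutine preserves its environment; after the final (co 'num), a new call to nextNum starts counting from 1 again.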
During the evaluation of an expression, the PicoLisp interpreter can be
interrupted at any time by hitting Ctrl-C. It will then enter the
breakpoint routine, as if ! were called.
Hitting ENTER at that point will continue evaluation, while (quit) will abort evaluation and return the
interpreter to the top level. See also debug, e, ^ and *Dbg.
Other interrupts may be handled by alarm, sigio, *Hup and *Sig[12].
When a runtime error occurs, execution is stopped and an error handler is entered.
The error handler resets the I/O channels to the console, and displays the
location (if possible) and the reason of the error, followed by an error
message. That message is also stored in the global *Msg, and the location of the error in ^. If the VAL of the global *Err is non-NIL it is executed as
a prg body. If the standard input is from a terminal, a
read-eval-print loop (with a question mark "?" as prompt) is
entered (the loop is exited when an empty line is input). Then all pending
finally expressions are executed,
all variable bindings restored, and all files closed. If the standard input is
not from a terminal, the interpreter terminates. Otherwise it is reset to its
top-level state.
: (de foo (A B) (badFoo A B))       # 'foo' calls an undefined symbol
-> foo
: (foo 3 4)                         # Call 'foo'
!? (badFoo A B)                     # Error handler entered
badFoo -- Undefined
? A                                 # Inspect 'A'
-> 3
? B                                 # Inspect 'B'
-> 4
?                                   # Empty line: Exit
:
Errors can be caught with catch,
if a list of substrings of possible error messages is supplied for the first
argument. In such a case, the matching substring (or the whole error message if
the substring is NIL) is returned.
An arbitrary error can be thrown explicitly with quit.
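A short sketch of both mechanisms, reusing the undefined symbol badFoo from the session above; the error text passed to quit is arbitrary:

```picolisp
# Calling the undefined 'badFoo' raises the error message "Undefined"
(catch '("Undefined")
   (badFoo 3 4) )               # -> "Undefined"

# 'quit' throws an arbitrary error; NIL in the list matches any message
(catch '(NIL)
   (quit "Sorry, my error") )   # -> "Sorry, my error"
```

In both cases the error handler is not entered, because the error message matches an entry in the list given to catch.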
In certain situations, the result of the last evaluation is stored in the VAL
of the symbol @. This can be very convenient, because it often
makes the assignment to temporary variables unnecessary.
This happens in two - only superficially similar - situations:
load: In read-eval-print loops, the last three results are available in
@@@, @@ and @, in that
order (i.e. the latest result is in @).
: (+ 1 2 3)
-> 6
: (/ 128 4)
-> 32
: (- @ @@)        # Subtract the last two results
-> 26
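Flow functions: Flow and logic functions store the result of their controlling
expression - respectively non-NIL results of their conditional expression - in
@.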
: (while (read) (println 'got: @))
abc            # User input
got: abc       # print result
123            # User input
got: 123       # print result
NIL
-> 123
: (setq L (1 2 3 4 5 1 2 3 4 5))
-> (1 2 3 4 5 1 2 3 4 5)
: (and (member 3 L) (member 3 (cdr @)) (set @ 999))
-> 999
: L
-> (1 2 3 4 5 1 2 999 4 5)
Functions with controlling expressions are case, casq, prog1, prog2, and the bodies of *Run tasks.
Functions with conditional expressions are and, cond, do, for, if, if2, ifn, loop, nand, nond, nor, not, or, state, unless, until, when and while.
@ is generally local to functions and methods: its value is
automatically saved upon function entry and restored at exit.
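As a small additional illustration with one of the conditional functions listed above, when stores the non-NIL result of its condition in @, so the body can use it without a temporary variable:

```picolisp
(when (assoc 'b '((a . 1) (b . 2)))   # 'assoc' returns (b . 2), stored in '@'
   (cdr @) )                          # -> 2
```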
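In PicoLisp, it is legal to compare data items of arbitrary type. Any two items are either
identical, equal, or related by a well-defined ordering: numbers are less than
symbols, which are less than lists. As special cases,
NIL is always less than anything else, and T is always
greater than anything else.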
To demonstrate this, sort a list of
mixed data types:
: (sort '("abc" T (d e f) NIL 123 DEF))
-> (NIL 123 DEF "abc" (d e f) T)
See also max, min, rank, <,
=, > etc.
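A few further comparisons across types, as a sketch of the same ordering that the sort example shows:

```picolisp
(< NIL 7 'abc '(a b) T)      # -> T  (NIL < number < symbol < list < T)
(max 2 'a 'z 9)              # -> z  (symbols are greater than numbers)
(min 2 'a 'z 9)              # -> 2
(= (1 2 3) (list 1 2 3))     # -> T  (same structure)
(== (1 2 3) (list 1 2 3))    # -> NIL  (not the same memory object)
```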
PicoLisp comes with built-in object oriented extensions. There seems to be a common agreement upon three criteria for object orientation: encapsulation, inheritance and polymorphism.
PicoLisp implements both objects and classes with symbols. Object-local data are stored in the symbol's property list, while the code (methods) and links to the superclasses are stored in the symbol's VAL (encapsulation).
In fact, there is no formal difference between objects and classes (except that objects usually are anonymous symbols containing mostly local data, while classes are named internal symbols with an emphasis on method definitions). At any time, a class may be assigned its own local data (class variables), and any object can receive individual method definitions in addition to (or overriding) those inherited from its (super)classes.
PicoLisp supports multiple inheritance. The VAL of each object is a (possibly
empty) association list of message symbols and method bodies, concatenated with
a list of classes. When a message is sent to an object, it is searched in the
object's own method list, and then (with a left-to-right depth-first search) in
the tree of its classes and superclasses. The first method found is executed and
the search stops. The search may be explicitly continued with the extra and super functions.
Thus, which method is actually executed when a message is sent to an object depends on the classes that the object is currently linked to (polymorphism). As the method search is fully dynamic (late binding), an object's type (i.e. its classes and method definitions) can be changed even at runtime!
While a method body is being executed, the global variable This is set to the current object, allowing
the use of the short-cut property functions =:, :
and ::.
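The following sketch shows these mechanisms together. The classes +Animal and +Dog and the message speak> are hypothetical names chosen for the illustration.

```picolisp
(class +Animal)                    # Hypothetical base class
(dm T (Name)                       # Constructor: store object-local data
   (=: name Name) )
(dm speak> ()
   (pack (: name) " makes a sound") )

(class +Dog +Animal)               # '+Dog' inherits from '+Animal'
(dm speak> ()                      # Override, then continue the method search
   (pack (super) " and barks") )

(speak> (new '(+Animal) "Tom"))    # -> "Tom makes a sound"
(speak> (new '(+Dog) "Rex"))       # -> "Rex makes a sound and barks"
```

The same message speak> executes different methods depending on the classes of the receiving object, and (: name) reads the property from the object held in This.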
On the lowest level, a PicoLisp database is just a collection of external symbols. They reside in a database file, and are
dynamically swapped in and out of memory. Only one database can be open at a
time (pool).
In addition, further external symbols can be specified to originate from
arbitrary sources via the *Ext
mechanism.
Whenever an external symbol's value or property list is accessed, it will be
automatically fetched into memory, and can then be used like any other symbol.
Modifications will be written to disk only when commit is called. Alternatively, all
modifications since the last call to commit can be discarded by
calling rollback.
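A minimal single-process sketch of this behavior, assuming a writable working directory; the file name demo.db and the property note are made up for the example:

```picolisp
(pool "demo.db")             # Open (or create) the database file
(put *DB 'note "first")      # Modify the root object in memory
(commit)                     # Write the modification to disk
(put *DB 'note "second")     # Another in-memory modification ...
(rollback)                   # ... is discarded
(get *DB 'note)              # -> "first"
```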
In the typical case there will be multiple processes operating on the same
database. These processes should be all children of the same parent process,
which takes care of synchronizing read/write operations and heap contents. Then
a database transaction is normally initiated by calling (dbSync), and closed by calling (commit 'upd). Short transactions, involving
only a single DB operation, are available in functions like new! and methods like put!> (by convention with an
exclamation mark), which implicitly call (dbSync) and (commit
'upd) themselves.
A transaction proceeds through five phases:
1. dbSync waits to get a lock on the root object *DB. Other processes continue reading and
writing meanwhile.
2. dbSync calls sync to synchronize with changes from other
processes. We hold the shared lock, but other processes may continue reading.
3. We make modifications to the internal state of external symbols with
put>, set>, lose> etc. We -
and also other processes - can still read the DB.
4. We call (commit 'upd).
commit obtains an exclusive lock (no more read operations by other
processes), writes an optional transaction log, and then all modified symbols.
As upd is passed to commit, other
processes synchronize with these changes.
5. Finally, all locks are released by commit.
The symbols in a database can be used to store arbitrary information structures. In typical use, some symbols represent nodes of search trees, by holding keys, values, and links to subtrees in their VAL's. Such a search tree in the database is called an index.
For the most part, other symbols in the database are objects derived from the
+Entity class.
Entities depend on objects of the +relation class hierarchy.
Relation-objects manage the property values of entities; they define the
application database model and are responsible for the integrity of mutual
object references and index trees.
Relations are stored as properties in the entity classes; their methods are
invoked as daemons whenever property values in an entity are changed. When
defining an +Entity class, relations are defined - in addition to
the method definitions of a normal class - with the rel function. Predefined relation classes
include +Any, +Bag, +Bool, +Number, +Date, +Time, +Symbol, +String, +Link,
+Joint, +Blob and +Hook, along with prefix classes such as +Key, +Ref, +Idx,
+List and +Need.
A declarative language, Pilog, is built on top of PicoLisp. It has the semantics of Prolog, but uses the syntax of Lisp.
For an explanation of Prolog's declarative programming style, an introduction like "Programming in Prolog" by Clocksin/Mellish (Springer-Verlag 1981) is recommended.
Facts and rules can be declared with the be function. For example, a Prolog fact
'likes(john,mary).' is written in Pilog as:
(be likes (John Mary))
and a rule 'likes(john,X) :- likes(X,wine), likes(X,food).' is
in Pilog:
(be likes (John @X) (likes @X wine) (likes @X food))
As in Prolog, the difference between facts and rules is that the latter ones have conditions, and usually contain variables.
A variable in Pilog is any symbol starting with an at-mark character
("@"), i.e. a pat?
symbol. The symbol @ itself can be used as an anonymous variable:
It will match during unification, but will not be bound to the matched values.
The cut operator of Prolog (usually written as an exclamation mark
(!)) is the symbol T in Pilog.
An interactive query can be done with the ? function:
(? (likes John @X))
This will print all solutions, waiting for user input after each line. If a
non-empty line (not just an ENTER key, but for example a dot (.)
followed by ENTER) is typed, it will terminate.
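Pilog can be called from Lisp and vice versa:
The interface from Lisp is via the functions
goal (prepare a query from Lisp data) and
prove (return an association list of
successful bindings), and the application level functions pilog and solve.
When the CAR of a Pilog clause is the symbol
^, then the CDDR
is executed as a Lisp prg body and the result unified with the
CADR.
Within such a Lisp expression in a Pilog clause, the current bindings of Pilog
variables can be accessed with the
-> function.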
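As a small sketch of calling Pilog from Lisp, the following declares two facts and one rule, and then collects the solutions with solve. The predicates parent and grandparent and the names Tom, Bob and Ann are hypothetical.

```picolisp
(be parent (Tom Bob))        # Hypothetical facts
(be parent (Bob Ann))
(be grandparent (@X @Z)      # Rule with two conditions
   (parent @X @Y)
   (parent @Y @Z) )

# Evaluate '@Who' for each solution and return the list of results
(solve '((grandparent Tom @Who)) @Who)   # -> (Ann)
```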
It was necessary to introduce - and adhere to - a set of conventions for PicoLisp symbol names. Because all (internal) symbols have a global scope (there are no packages or name spaces), and each symbol can only have either a value or function definition, it would otherwise be very easy to introduce name conflicts. Besides this, source code readability is increased when the scope of a symbol is indicated by its name.
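These conventions are not hard-coded into the language, but should be firmly in the head of the programmer. Here are the most commonly used ones:
Global variables start with an asterisk "*"
Functions and other global symbols start with a lower case letter
Locally bound symbols start with an upper case letter
Local functions start with an underscore "_"
Classes start with a plus-sign "+", where the first letter is in lower case for abstract classes and in upper case for normal classes
Methods end with a right arrow ">"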
For example, a local variable could easily overshadow a function definition:
: (de max-speed (car)
   (.. (get car 'speeds) ..) )
-> max-speed
Inside the body of max-speed (and all other functions called
during that execution) the kernel function car is redefined to some
other value, and will surely crash if something like (car Lst) is
executed. Instead, it is safe to write:
: (de max-speed (Car)            # 'Car' with upper case first letter
   (.. (get Car 'speeds) ..) )
-> max-speed
Note that there are also some strict naming rules (as opposed to the voluntary conventions) that are required by the corresponding kernel functionalities, like:
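Pattern wildcards start with an at-mark "@" (see match and fill)
Symbols referring to a shared library contain a colon, as in "lib:sym"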
With that, one of the above conventions (local functions start with an underscore) is not really necessary, because true local scope can be enforced with transient symbols.
PicoLisp does not try very hard to be compatible with traditional Lisp systems. If you are used to some other Lisp dialects, you may notice the following differences:
Case Sensitivity: PicoLisp distinguishes between upper case and lower case
characters in symbol names. Thus, CAR and car are different symbols,
which was not the case in traditional Lisp systems.
QUOTE: In traditional Lisp, the QUOTE function returns its
first unevaluated argument. In PicoLisp, on the other hand,
quote returns all (unevaluated) argument(s).
LAMBDA: The LAMBDA function, in some way at the heart of traditional
Lisp, is completely missing (and quote is used instead).
PROG: The PROG function of traditional Lisp, with its GOTO and ENTER
functionality, is also missing. PicoLisp's prog function is just a
simple sequencer (as PROGN in some Lisps).
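A few expressions illustrating the quote and prog behavior described above:

```picolisp
(quote a b c)                      # -> (a b c)  (all arguments, unevaluated)
(mapcar '((X) (* X X)) (1 2 3))    # -> (1 4 9)  (a quoted list serves as the function)
(prog (+ 1 2) (* 3 4))             # -> 12  ('prog' just evaluates its body in sequence)
```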
The names of the symbols T and NIL violate the naming conventions. They are global symbols, and should
therefore start with an asterisk "*". It is too easy to bind them
to some other value by mistake:
(de foo (R S T)
   ...
However, lint will issue a warning
in such a case.
This section provides a reference manual for the kernel functions, and some extensions. See the thematically grouped list of indexes below.
Though PicoLisp is a dynamically typed language (resolved at runtime, as opposed to statically (compile-time) typed languages), many functions can only accept and/or return a certain set of data types. For each function, the expected argument types and return values are described with the following abbreviations:
The primary data types:
num - Number
sym - Symbol
lst - List
Other (derived) data types
any - Anything: Any data type
flg - Flag: Boolean value (NIL or non-NIL)
cnt - A count or a small number
dat - Date: Days, starting first of March of the year 0 A.D.
tim - Time: Seconds since midnight
obj - Object/Class: A symbol with methods and/or classes
var - Variable: Either a symbol or a cons pair
exe - Executable: An executable expression (eval)
prg - Prog-Body: A list of executable expressions (run)
fun - Function: Either a number (code-pointer), a symbol (message) or a list (lambda)
msg - Message: A symbol sent to an object (to invoke a method)
cls - Class: A symbol defined as an object's class
typ - Type: A list of cls symbols
pat - Pattern: A symbol whose name starts with an at-mark "@"
pid - Process ID: A number, the ID of a Unix process
tree - Database index tree specification
hook - Database hook object
Arguments evaluated by the function (depending on the context) are quoted
(prefixed with the single quote character "'").
   new
   sym
   str
   char
   name
   sp?
   pat?
   fun?
   all
   symbols
   local
   import
   intern
   extern
   ====
   qsym
   loc
   box?
   str?
   ext?
   touch
   zap
   length
   size
   format
   chop
   pack
   glue
   pad
   align
   center
   text
   wrap
   pre?
   sub?
   low?
   upp?
   lowc
   uppc
   fold
   val
   getd
   set
   setq
   def
   de
   dm
   recur
   undef
   redef
   daemon
   patch
   swap
   xchg
   on
   off
   onOff
   zero
   one
   default
   expr
   subr
   let
   let?
   use
   accu
   push
   push1
   push1q
   pop
   cut
   del
   queue
   fifo
   idx
   lup
   cache
   locale
   dirname
   put
   get
   prop
   ;
   =:
   :
   ::
   putl
   getl
   wipe
   meta
   atom
   pair
   circ?
   lst?
   num?
   sym?
   flg?
   sp?
   pat?
   fun?
   box?
   str?
   ext?
   bool
   not
   ==
   n==
   =
   <>
   =0
   =1
   =T
   n0
   nT
   <
   <=
   >
   >=
   match
   +
   -
   *
   /
   %
   */
   **
   inc
   dec
   >>
   lt0
   le0
   ge0
   gt0
   abs
   bit?
   &
   |
   x|
   sqrt
   seed
   hash
   rand
   max
   min
   length
   size
   accu
   format
   pad
   money
   round
   bin
   oct
   hex
   hax
   fmt64
   car
   cdr
   caar
   cadr
   cdar
   cddr
   caaar
   caadr
   cadar
   caddr
   cdaar
   cdadr
   cddar
   cdddr
   caaaar
   caaadr
   caadar
   caaddr
   cadaar
   cadadr
   caddar
   cadddr
   cdaaar
   cdaadr
   cdadar
   cdaddr
   cddaar
   cddadr
   cdddar
   cddddr
   nth
   con
   cons
   conc
   circ
   rot
   list
   need
   range
   full
   make
   made
   chain
   link
   yoke
   copy
   mix
   append
   delete
   delq
   replace
   insert
   remove
   place
   strip
   split
   reverse
   flip
   trim
   clip
   head
   tail
   stem
   fin
   last
   member
   memq
   mmeq
   sect
   diff
   index
   offset
   prior
   assoc
   rassoc
   asoq
   rank
   sort
   uniq
   group
   length
   size
   bytes
   val
   set
   xchg
   push
   push1
   push1q
   pop
   cut
   queue
   fifo
   idx
   balance
   get
   fill
   apply
   load
   args
   next
   arg
   rest
   pass
   quote
   as
   lit
   eval
   run
   macro
   curry
   def
   de
   dm
   recur
   recurse
   undef
   box
   new
   type
   isa
   method
   meth
   send
   try
   super
   extra
   with
   bind
   job
   let
   let?
   use
   and
   or
   nand
   nor
   xor
   bool
   not
   nil
   t
   prog
   prog1
   prog2
   if
   if2
   ifn
   when
   unless
   cond
   nond
   case
   casq
   state
   while
   until
   loop
   do
   at
   for
   catch
   throw
   finally
   co
   yield
   !
   e
   $
   call
   tick
   ipid
   opid
   kill
   quit
   task
   fork
   pipe
   later
   timeout
   abort
   bye
   apply
   pass
   maps
   map
   mapc
   maplist
   mapcar
   mapcon
   mapcan
   filter
   extract
   seek
   find
   pick
   fully
   cnt
   sum
   maxi
   mini
   fish
   by
   path
   in
   out
   err
   ctl
   ipid
   opid
   pipe
   any
   sym
   str
   load
   hear
   tell
   key
   poll
   peek
   char
   skip
   eol
   eof
   from
   till
   line
   format
   scl
   read
   print
   println
   printsp
   prin
   prinl
   msg
   space
   beep
   tab
   flush
   rewind
   rd
   pr
   wr
   wait
   sync
   echo
   info
   file
   dir
   lines
   open
   close
   port
   listen
   accept
   host
   connect
   udp
   script
   once
   rc
   acquire
   release
   tmp
   pretty
   pp
   show
   view
   here
   prEval
   mail
   *Class
   class
   dm
   rel
   var
   var:
   new
   type
   isa
   method
   meth
   send
   try
   object
   extend
   super
   extra
   with
   This
   can
   dep
   pool
   journal
   id
   seq
   lieu
   lock
   commit
   rollback
   mark
   free
   dbck
   dbs
   dbs+
   db:
   tree
   db
   aux
   collect
   genKey
   genStrKey
   useKey
   +relation
   +Any
   +Bag
   +Bool
   +Number
   +Date
   +Time
   +Symbol
   +String
   +Link
   +Joint
   +Blob
   +Hook
   +Hook2
   +index
   +Key
   +Ref
   +Ref2
   +Idx
   +Sn
   +Fold
   +IdxFold
   +Aux
   +UB
   +Dep
   +List
   +Need
   +Mis
   +Alt
   +Swap
   +Entity
   blob
   dbSync
   new!
   set!
   put!
   inc!
   blob!
   upd
   rel
   request
   obj
   fmt64
   root
   fetch
   store
   count
   leaf
   minKey
   maxKey
   init
   step
   scan
   iter
   ubIter
   prune
   zapTree
   chkTree
   db/3
   db/4
   db/5
   val/3
   lst/3
   map/3
   isa/2
   same/3
   bool/3
   range/3
   head/3
   fold/3
   part/3
   tolr/3
   select/3
   remote/2
   prove
   ->
   unify
   be
   clause
   repeat
   asserta
   assertz
   retract
   rules
   goal
   fail
   pilog
   solve
   query
   ?
   repeat/0
   fail/0
   true/0
   not/1
   call/1
   or/2
   nil/1
   equal/2
   different/2
   append/3
   member/2
   delete/3
   permute/2
   uniq/2
   asserta/1
   assertz/1
   retract/1
   clause/2
   show/1
   for/2
   for/3
   for/4
   db/3
   db/4
   db/5
   val/3
   lst/3
   map/3
   isa/2
   same/3
   bool/3
   range/3
   head/3
   fold/3
   part/3
   tolr/3
   select/3
   remote/2
   pretty
   pp
   show
   loc
   *Dbg
   doc
   more
   depth
   what
   who
   can
   dep
   debug
   d
   unbug
   u
   vi
   em
   ld
   trace
   untrace
   traceAll
   proc
   hd
   bench
   bt
   edit
   lint
   lintAll
   select
   update
   cmd
   argv
   opt
   version
   gc
   raw
   alarm
   sigio
   kids
   protect
   heap
   stack
   adr
   env
   trail
   up
   sys
   date
   time
   usec
   stamp
   dat$
   $dat
   datSym
   datStr
   strDat
   expDat
   day
   week
   ultimo
   tim$
   $tim
   telStr
   expTel
   locale
   allowed
   allow
   pwd
   cd
   chdir
   ctty
   info
   dir
   dirname
   errno
   native
   struct
   lisp
   exec
   call
   tick
   kill
   quit
   task
   fork
   forked
   pipe
   timeout
   mail
   assert
   test
   bye
   NIL
   pico
   *CPU
   *OS
   *DB
   T
   *Solo
   *PPid
   *Pid
   @
   @@
   @@@
   This
   *Prompt
   *Dbg
   *Zap
   *Scl
   *Class
   *Dbs
   *Run
   *Hup
   *Sig1
   *Sig2
   ^
   *Err
   *Msg
   *Uni
   *Led
   *Tsm
   *Adr
   *Allow
   *Fork
   *Bye
The PicoLisp system can be downloaded from the PicoLisp Download page.