parcom is a consise Parser Combinator library in the style of Haskell’s parsec
and Rust’s nom.
(in-package :parcom)
(parse (*> (string "Tempus") #'space (string "fugit")) "Tempus fugit.")fugit
parcom operates strictly on strings, not streamed byte data, but is otherwise
“zero copy” in that extracted substrings of the original input are not
reallocated.
Also provided are the following subsystems, which can be used independently to read their associated formats:
parcom/jsonparcom/tomlparcom/xmlparcom/datetimeparcom/email
parcom and its children have no dependencies.
- Compatibility
- Performance
- API
- Writing your own Parsers
| Compiler | Status |
|---|---|
| SBCL | ✅ |
| ECL | ✅ |
| Clasp | ✅ |
| ABCL | ✅ |
| CCL | ✅ |
| Clisp | ✅ |
| Allegro | ✅ |
| LispWorks | ❓ |
parcom operates over (simple-array character (*)) input, accessing its elements
with schar. The parcom/json component has been measured to parse JSON at ~50mb/s
on a 2016 ThinkPad, which should be sufficient for ordinary usage. Its
performance is competitive across compilers, while remaining expressive and
light in implementation.
See also this article about some of the optimization techniques used within parcom.
In parsing a 25mb JSON file from an in-memory string.
SBCL:
| Method | Time (ms) | Memory (bytes) |
|---|---|---|
| jsown | 200 | 135m |
| jzon | 370 | 132m |
| parcom/json | 500 | 227m |
| shasht | 650 | 410m |
| yason | 1800 | 400m |
Allegro:
| Method | Time (s) | Memory (cons) | Memory (bytes) |
|---|---|---|---|
| jsown | 0.7 | 1.3m | 91m |
| parcom/json | 1.2 | 3.9m | 242m |
| jzon | 1.5 | 0.9m | 258m |
| shasht | 1.5 | 4.5m | 452m |
| yason | 8.2 | 6.4m | 457m |
ECL:
| Method | Time (s) | Memory (bytes) |
|---|---|---|
| shasht | 2.6 | 1.8b |
| parcom/json | 2.8 | 401m |
| jzon | 3.2 | 1.6b |
| yason | 5.4 | 1.9b |
| jsown | 5.8 | 165m |
ABCL:
| Method | Time (s) | Memory (cons cells) |
|---|---|---|
| parcom/json | 6.0 | 2m |
| jsown | 7.4 | 25m |
| jzon | 11.3 | 864k |
| shasht | 83.3 | 110m |
| yason | 328.7 | 366m |
CMUCL:
| Method | Time (s) | Memory (bytes) |
|---|---|---|
| jsown | 0.6 | 71m |
| parcom/json | 1.6 | 175m |
| shasht | 4.5 | 348m |
| yason | 9.1 | 346m |
In parsing a Java .pom file 1000 times.
SBCL:
| Method | Times (ms) | Memory (bytes) |
|---|---|---|
| parcom/xml | 341 | 161m |
| cxml | 795 | 497m |
| plump | 1450 | 656m |
Allegro:
| Method | Time (s) | Memory (cons) | Memory (bytes) |
|---|---|---|---|
| parcom/xml | 1.2 | 2m | 353m |
| plump | 4.9 | 10m | 790m |
| cxml | 5.0 | 13m | 684m |
ABCL:
| Method | Time (s) | Memory (cons cells) |
|---|---|---|
| parcom/xml | 12.8 | 35m |
| plump | 66.7 | 85m |
| cxml | 139.6 | 150m |
cxml is quite old and one of its dependencies did not compile under ECL.
The examples below use (in-package :parcom) for brevity, but it’s assumed that
you’ll use a local nickname like p or pc in your actual code. Further, most
examples run the parsers with parse, but occasionally funcall is used instead to
demonstrate what the remaining input offset would be after the parse succeeded.
You will generally be using parse in your own code.
All parsers have the function signature offset -> (value, offset), where offset
is the current parsing offset. The new value and offset must be yielded via
values as multiple return values, as this is the most memory-efficient.
(in-package :parcom)
(funcall (string "Hello") (in "Hello there"))"Hello", 5
Of course, a parser might fail, in which case a failure message and the offset are returned:
(in-package :parcom)
(funcall (string "Hello") (in "Bye!")):FAIL, 0
In general though, we call parse to fully run some combined parsers and yield
the final output:
(in-package :parcom)
(apply #'+ (parse (sep (char #\.) #'unsigned) "123.456.789!"))1368
parse otherwise ignores any final, unconsumed input. It will also raise a
Condition if the parsing failed.
A “parser” is a function that consumes some specific input and yields a single result.
Parse a given character.
(in-package :parcom)
(parse (char #\a) "apple")#\a
Parse the given string. The parsed string is a slice into the original input.
(in-package :parcom)
(parse (string "Hello") "Hello there!")Hello
Parse any character.
(in-package :parcom)
(parse #'any "Hello there!")#\H
Parse any character except the one you don’t want.
(in-package :parcom)
(parse (any-but #\!) "Hello there!")#\H
(in-package :parcom)
(funcall (any-but #\H) (in "Hello there!")):FAIL, 0
Any character that passes the predicate.
(in-package :parcom)
(parse (any-if #'digit?) "8a")#\8
Parse a hex character of any case.
(in-package :parcom)
(parse (many #'hex) "abcd0efgh")(#\a #\b #\c #\d #\0 #\e #\f)
Yield the given char if it’s the next one, but don’t advance the offset. Like
peek, but character-based and thus more performant.
(in-package :parcom)
(funcall (sneak #\a) (in "aaabcd"))#\a, 0
Recognize the end of the input.
(in-package :parcom)
(parse #'eof "")T
(in-package :parcom)
(parse (*> (string "Mālum") #'eof) "Mālum")T
(in-package :parcom)
(funcall (*> (string "Mālum") #'eof) (in "Mālum rubrum")):FAIL, 5
Parse a positive integer into a fixnum.
(in-package :parcom)
(parse #'unsigned "44")44
Parse a positive or negative integer into a fixnum.
(in-package :parcom)
(parse #'integer "-44")-44
Parse a positive or negative floating point number into a double-float.
(in-package :parcom)
(parse #'float "123.0456")123.0456d0
Matches a single newline character.
(in-package :parcom)
(let ((s (concatenate 'simple-string '(#\newline #\a #\b #\c)))) ; "\nabc"
(parse #'newline s))#\Newline
Parse 0 or more ASCII whitespace and tab characters.
(in-package :parcom)
(length (parse #'space " Salvē!"))3
Parse 1 or more ASCII whitespace and tab characters.
(in-package :parcom)
(length (parse #'space1 " Salvē!"))3
(in-package :parcom)
(funcall #'space1 (in "Salvē!")):FAIL, 0
Parse 0 or more ASCII whitespace, tabs, newlines, and carriage returns.
(in-package :parcom)
(length (parse #'multispace (concatenate 'simple-string '(#\tab #\newline #\tab))))3
Parse 1 or more ASCII whitespace, tabs, newlines, and carriage returns.
(in-package :parcom)
(length (parse #'multispace1 (concatenate 'simple-string '(#\tab #\newline #\tab))))3
(in-package :parcom)
(funcall #'multispace1 (in "Ārcus")):FAIL, 0
These always yield a substring borrowed directly from the original input.
Take n characters from the input.
(in-package :parcom)
(parse (take 3) "Arbor")Arb
It’s okay for n to be too large:
(in-package :parcom)
(parse (take 100) "Arbor")Arbor
Take characters while some predicate holds.
(in-package :parcom)
(parse (take-while (lambda (c) (equal #\a c))) "aaabbb")aaa
take-while1 is like take-while, but must yield at least one character.
(in-package :parcom)
(funcall (take-while1 (lambda (c) (equal #\a c))) (in "bbb")):FAIL, 0
A faster version of take-while and skip when you know you’re character-based and
don’t need the parsed output.
(in-package :parcom)
(funcall (consume (lambda (c) (equal #\a c))) (in "aaabbb"))T, 3
consume1 requires at least one application of the predicate to succeed.
Like take-while, but check two characters at a time. Very useful for parsing
escaped characters where the backslash and char need to be analyzed at the same
time.
The given function must yield two values via values:
- The keyword
:oneand a character to keep if you only wish to advance the parser by one place. For instance, when the character wasn’t escaped. - The keyword
:twoand a character to keep if you want to advance the parser by two.
Note that in both cases, the character yielded by your lambda need not be one of
the inputs given to it. For example, if it detected a backslash and an #\n, it
could yield the single Lisp #\newline character.
(in-package :parcom)
(parse (sliding-take (lambda (a b)
(cond ((and (char= a #\\)
(char= b #\n))
(values :two #\newline))
(t (values :one a)))))
"Hello \\n there!")Hello there!
Consume the rest of the input. Always succeeds.
(in-package :parcom)
(parse (<*> (string "Salvē") (*> #'space #'rest)) "Salvē domine!")("Salvē" "domine!")
Consume no input and just yield a given value.
(in-package :parcom)
(parse (pure :pāx) "Bellum"):PĀX
Useful for chaining with other compound parsers to inject values into the results.
(in-package :parcom)
(parse (<*> (<*> (pure :pāx) (string "PĀX"))
#'multispace
(<*> (pure :bellum) (string "BELLUM")))
"PĀX BELLUM")((:PĀX "PĀX") " " (:BELLUM "BELLUM"))
“Combinators” combine child parsers together to form compound results. They allow us to express intent like “parse this then that” and “parse this, then maybe that, but only if…” etc.
Run multiple parsers one after another, but yield the value of the rightmost
one. right is an alias.
(in-package :parcom)
(parse (*> (char #\!) #'unsigned) "!123?")123
Run multiple parsers one after another, but yield the value of the leftmost
one. left is an alias.
(in-package :parcom)
(parse (<* (char #\!) #'unsigned) "!123?")#\!
Apply a function to the result of any number of parsers. The main way to parse multiple values into a single struct/class.
(in-package :parcom)
(parse (ap (lambda (a b c) (list a (+ b c)))
#'unsigned
(*> (char #\.) #'unsigned)
(*> (char #\.) #'unsigned))
"1.2.3")(1 5)
This is much more memory-efficient variant of the previously Haskell-inspired
combination of fmap and <*>.
Run some parser, but substitute its inner value with something else if parsing
was successful. instead is an alias.
(in-package :parcom)
(parse (<$ :roma (string "Roma")) "Roma!"):ROMA
Accept the results of the first parser from a group to succeed. Can combine as many parsers as you want.
(in-package :parcom)
(parse (alt (string "dog") (string "cat")) "cat")cat
Yield nil if the parser failed, but don’t fail the whole process nor consume any
input.
(in-package :parcom)
(parse (opt (string "Ex")) "Exercitus")Ex
(in-package :parcom)
(parse (opt (string "Ex")) "Facēre")NIL
A main parser flanked by two other ones. Only the value of the main parser is kept. Good for parsing backets, parentheses, etc.
(in-package :parcom)
(parse (between (char #\!) (string "Salvē") (char #\!)) "!Salvē!")Salvē
many parses 0 or more occurrences of a parser. many1 demands that at least one
parse succeeds or a Condition will be raised.
(in-package :parcom)
(parse (many (alt (string "ovēs") (string "avis"))) "ovēsovēsavis!")("ovēs" "ovēs" "avis")
sep parses 0 or more instances of a parser separated by some sep parser. sep1
demands that at least one parse succeeds or a Condition will be raised.
(in-package :parcom)
(parse (sep (char #\!) (string "pilum")) "pilum!pilum!pilum.")("pilum" "pilum" "pilum")
Critically, if a separator is detected, the parent parser must also then succeed
or the entire combination fails. For example, this will not parse due to the !
on the end:
(in-package :parcom)
(funcall (sep (char #\!) (string "pilum")) (in "pilum!pilum!pilum!")):FAIL, 18
For more lenient behaviour regarding the separator, see sep-end.
The same as sep, but the separator may appear at the end of the final “parent”.
Likewise, sep-end1 demands that at least one parse of the parent succeeds.
(in-package :parcom)
(parse (sep-end (char #\!) (string "pilum")) "pilum!pilum!pilum!scūtum")("pilum" "pilum" "pilum")
A combination of sep and consume for when you just wish to advance the parser
and don’t actually require the output. This avoids a lot of wasteful,
intermediate list allocation.
(in-package :parcom)
(parse (consume-sep (char #\!) (string "pilum")) "pilum!pilum!pilum.")17
consume-sep1 requires at least one application of the main parser to succeed.
Parse some parser 0 or more times, but throw away all the results.
(in-package :parcom)
(parse (*> (skip (char #\!)) #'unsigned) "!!!123")123
Yield the value of a parser, but don’t consume the input.
(in-package :parcom)
(funcall (peek (string "he")) (in "hello"))he
Apply a parser a given number of times and collect the results as a list.
(in-package :parcom)
(funcall (count 3 (char #\a)) (in "aaaaaa"))(#\a #\a #\a), 3
Take characters until another parser succeeds. Does not advance the offset by the subparser.
(in-package :parcom)
(funcall (take-until (char #\')) (in "abcd'"))"abcd", 4
If the subparser is just looking for a single char like the above, use
take-while or consume instead. take-until is intended for more complex halting
conditions that can’t easily be detected by a char-by-char predicate function.
If the given parser was successful, return the consumed input as a string instead.
(in-package :parcom)
(funcall (recognize (<*> (string "hi") #'unsigned)) (in "hi123there"))"hi123", 5
If an initial parser succeeds, apply some f to the result of the second parser.
If the first parser doesn’t succeed, the second is attempted as usual but f
isn’t applied.
(in-package :parcom)
(parse (maybe #'1+ (char #\a) #'integer) "a123")124
Pass if the given parser fails, and don’t advance the offset. Fail if it succeeds.
(in-package :parcom)
(funcall (not (char #\a)) (in "bark"))T, 0
Is a given string empty?
(in-package :parcom)
(empty? "")T
Is a given character a number from 0 to 9?
(in-package :parcom)
(digit? #\7)T
Apply a pure function to the result of a successful parse.
(in-package :parcom)
(fmap #'1+ (funcall #'unsigned (in "1")))2, 1
However, in general you should consider ap for applying functions to the
combined results of multiple parsers.
Yield a function that ignores its input and returns some original seed.
(in-package :parcom)
(funcall (const 1) 5)1
By depending on the optional parcom/json system, you can parse JSON strings or
include parcom-compatible JSON parsers into your own custom parsing code.
(in-package :parcom/json) is used below for brevity, but it’s assumed that in
your own code you will use a nickname, perhaps pj or json.
If you don’t care about the individual parsers per se and just want to simply
parse some JSON, use pj:parse.
Conversions:
| JSON | Lisp |
|---|---|
true | T |
false | NIL |
| Array | Vector |
| Object | Hash Table |
| Number | double-float |
| String | String |
null | :NULL |
Attempt to parse any JSON value. Analogous to parse from the main library.
(in-package :parcom/json)
(parse "{\"x\": 1, \"y\": 2, \"z\": [1, {\"a\":true}]}")#<HASH-TABLE :TEST EQUAL :COUNT 3 {1004C0B293}>
(in-package :parcom/json)
(parse "[1.9,true,3e+7,\"hi\",[4],null]")#(1.9d0 T 3.0d7 "hi" #(4.0d0) :NULL)
Non-ascii and unicode characters are supported:
(in-package :parcom/json)
(parse "\"hēllお🐂\\u03B1\"")hēllお🐂α
Parse any kind of JSON (the actual parser).
(in-package :parcom/json)
(json (parcom:in "{\"x\": 1, \"y\": 2, \"z\": [1, {\"a\":true}]} "))#<HASH-TABLE :TEST EQUAL :COUNT 3 {1004CA4C63}>, 38
There are other subparsers exposed, but they are left out here for brevity. Please consult the source code if you need them.
The parcom/toml system provides types and parsers for TOML files.
(in-package :parcom/toml) is used below for brevity, but it’s assumed that in
your own code you will use a nickname, perhaps pt or toml.
If you don’t care about the individual parsers per se and just want to simply
parse some TOML, use pt:parse.
This parser is TOML 1.0.0 compliant, with one exception: inf and nan float
values are not accepted.
This system has no dependencies other than parcom/datetime.
Parse a full TOML document directly from a string.
(in-package :parcom/toml)
(parse "# My Config
[project]
name = \"parcom\"")#<HASH-TABLE :TEST EQUAL :COUNT 1 {10089AB2F3}>
Parse TOML via the actual parser.
(in-package :parcom/toml)
(toml (parcom:in "# My Config
[project]
name = \"parcom\""))#<HASH-TABLE :TEST EQUAL :COUNT 1 {10066C8433}>, 38
There are other subparsers exposed, but they are left out here for brevity. Please consult the source code if you need them.
The parcom/xml system provides types and parsers for XML files.
(in-package :parcom/xml) is used below for brevity, but it’s assumed that in
your own code you will use a nickname, perhaps px or xml.
If you don’t care about the individual parsers per se and just want to simply
parse some XML, use px:parse.
This parser is gradually compliant to XML 1.1. If you discover XML files for which parsing fails, please open an issue.
parse produces a top-level document which may contain metadata.
(in-package :parcom/xml)
(parse "<hello>yes</hello>")#S(DOCUMENT :METADATA NIL :ELEMENT #S(ELEMENT :NAME "hello" :METADATA NIL :CONTENT "yes"))
The actual combinator is xml.
(in-package :parcom/xml)
(xml (parcom:in "<hello>yes</hello>"))#S(DOCUMENT :METADATA NIL :ELEMENT #S(ELEMENT :NAME "hello" :METADATA NIL :CONTENT "yes")) 18
There are other subparsers exposed, but they are left out here for brevity. Please consult the source code if you need them.
A generic function to easily extract the meaningful content of anything element-like.
(in-package :parcom/xml)
(content (parse "<hello>yes</hello>"))yes
The parcom/datetime system provides types and parsers for RFC3339 timestamps.
(in-package :parcom/datetime) is used below for brevity, but it’s assumed that
in your own code you will use a nickname, perhaps pd.
As with the other parcom libraries, this has no external dependencies, which is
an advantage over the otherwise excellent local-time library, which depends on
heavy uiop.
parse is lenient, and will parse any kind of date or time you give it.
(in-package :parcom/datetime)
(parse "1975-04-05")#S(LOCAL-DATE :YEAR 1975 :MONTH 4 :DAY 5)
(in-package :parcom/datetime)
(parse "1975-04-05T04:05:06+03:00")#S(OFFSET-DATE-TIME :DATE #S(LOCAL-DATE :YEAR 1975 :MONTH 4 :DAY 5) :TIME #S(LOCAL-TIME :HOUR 4 :MINUTE 5 :SECOND 6 :MILLIS 0) :OFFSET #S(OFFSET :HOUR 3 :MINUTE 0))
It’s up to you to handle the concrete type that you’re returned. See the date
and time generic functions below.
Right now!
(in-package :parcom/datetime)
(now)#S(OFFSET-DATE-TIME :DATE #S(LOCAL-DATE :YEAR 2025 :MONTH 5 :DAY 5) :TIME #S(LOCAL-TIME :HOUR 10 :MINUTE 0 :SECOND 28 :MILLIS 0) :OFFSET #S(OFFSET :HOUR 9 :MINUTE 0))
It’s a cloudy May morning.
Regardless of what parsed, you can usually pull a local-date out of it.
(in-package :parcom/datetime)
(date (parse "1975-04-05T04:05:06+03:00"))#S(LOCAL-DATE :YEAR 1975 :MONTH 4 :DAY 5)
Regardless of what parsed, you can usually pull a local-time out of it.
(in-package :parcom/datetime)
(time (parse "1975-04-05T04:05:06+03:00"))#S(LOCAL-TIME :HOUR 4 :MINUTE 5 :SECOND 6 :MILLIS 0)
To convert your object back into something human-readable. Note that this is
different from cl:format!
(in-package :parcom/datetime)
(format nil (date (parse "1975-04-05T04:05:06+03:00")))1975-04-05
The parcom/email system provides types and parsers for RFC5322 email addresses.
The implementation is RFC-compliant and particularly memory-efficient for
well-behaved addresses.
(in-package :parcom/email) is used below for brevity, but it’s assumed that
in your own code you will use a nickname, perhaps pe.
Simply parse an email address.
(in-package :parcom/email)
(parse "[email protected]")#S(ADDRESS :NAME "alice" :DOMAIN "bob.com")
Rerender a parsed email address.
(in-package :parcom/email)
(pretty (parse "[email protected]"))[email protected]
It will even clean up obsolete syntax for you.
(in-package :parcom/email)
(pretty (parse "alice(comment)@bob(comment) .com"))[email protected]
Simply validate that a given address is legal.
(in-package :parcom/email)
(valid-email-address? "[email protected]")T
(in-package :parcom/email)
(valid-email-address? "[email protected]")NIL
The top-level parsers that can be fitted in anywhere else you use parcom.
(in-package :parcom/email)
(parcom:parse (*> (parcom:string "address: ")
#'addr-spec)
"address: [email protected]")#S(ADDRESS :NAME "alice" :DOMAIN "bob.com")
msg-id is a simpler variant also found in the spec that denies certain more
complicated syntax.
The whole point of Parser Combinators is that it becomes simple to write your
own parsing functions. Recall that a “fully realized” parser has the signature
offset -> (value, offset). In the simplest case, a parser of yours could look
like:
(in-package :parcom)
(defun excited-apple (offset)
(funcall (<* (string "Mālum") (char #\!)) offset))
(funcall #'excited-apple (in "Mālum! Ō!"))"Mālum", 6
Wherein you utilize the combinators provided by this library to build up composite parsers that are useful to you.
You can also parameterize your parsers, similar to parsers like take or
combinators like count:
(in-package :parcom)
(defun excited-apple (offset)
(funcall (<* (string "Mālum") (char #\!)) offset))
(defun excited-apples (n)
"Parse a certain number of excited apples."
(lambda (offset)
(funcall (count n #'excited-apple) offset)))
(funcall (excited-apples 3) (in "Mālum!Mālum!Mālum!Mālum!"))("Mālum" "Mālum" "Mālum"), 18
So, if your parser is parameterized by some initial argument, it has to return a
lambda that accepts an offset.
You can use ok? and failure? within more complex hand-written parsers to
explicitly test for sub-parser failure, and then react accordingly. Yielding
:fail signals that parsing has failed overall.
(in-package :parcom)
(defun three-sad-pears (offset)
(multiple-value-bind (res next) (funcall (many (string "Pirum trīste")) offset)
(if (or (failure? res)
(< (length res) 3)
(> (length res) 3))
(fail next)
(values res next))))
(three-sad-pears (in "Pirum trīste")):FAIL, 12
Parser function signatures can get quite complex if you write them yourself, but
parcom supplies some shorthands:
(maybe foo)for parser which may fail.(always foo) for parser which always succeed.->for function arguments and return types.fnfor shortening thedeclaimstatement.
All together, this looks like:
(fn any (maybe character))
(defun any (offset)
...)
(fn any-but (-> character (maybe character)))
(defun any-but (c)
...)
(fn any-if (-> (-> character boolean) (maybe character)))
(defun any-if (pred)
...)Especially for higher-order function arguments, this form is significantly shorter.