Regular expressions in Pico Lisp

This is another step towards explaining how a registration form in Pico Lisp can work, I promise you we will get there but there is a lot of ground to cover still.

For an introduction to Pico Lisp you might want to check out the Pico Lisp for beginners series.

Pico Lisp doesn’t have its own regular expression engine but regular expressions are an important part so that leaves us with two options. Link to a C library or create something of our own in Lisp. I’ve heard that the former is really easy and I might try it out if I need to do something more complex someday but for validating form fields we can make do with something simpler and learn more Lisp in the process.

The resultant library is expanding on the match function. If you download the source and put it in /picolisp/lib you can test it out with for example the following:

(load "lib/rgx.l")
(match> '+Rgx "yaddayay@gmail.com" '((word > 0) "{at}" (dmn > 0) "." (ltr > 1 < 5)))

It would return True (T). So instead of proper regular expressions we have a small DSL here. The meaning of the quoted list we use to match the string is the following:

1.) The first slew of characters before the @ sign needs to have a length of at least one and they need to be word characters. I don’t know which non alphanumeric characters are allowed in an email, in this case we allow all of them anyway 🙂
2.) {at} is a replacement for @, we’ll get to the reason for that later.
3.) Next we need at least one character that is allowed in domain names followed by a dot.
4.) Finally there has to be at least 1 and at most 4 letters that are making up the domain type.

Failure to comply with the above will result in a NIL.

Let’s walk the code, and first out is of course the match> method:

(class +Rgx)

(dm match> (Str Ptrn)
  (let Lstr (replace (chop Str) '@ "{at}")
     (=: ptrns (list))
     (when (match (getPtrn> This Ptrn) Lstr)
        (loop 
           (T (= 64 (: pNum)) T)
           (NIL (tstRslt> This (val (intern (pack "@" (char (: pNum)))))))))))

We begin by assigning a chopped (list with single characters in each member) version of the string we want to test to Lstr. We also get rid of @ and replace it with {at}. We create an empty list in the ptrns member variable. The next line is interesting, here we basically piggy back on the global match function, if the overall pattern doesn’t match then match will return NIL and we are already finished. @gmail.com will for instance return NIL here, but $%#$@#$%.234 will return T.

GetPtrn> will return a proper list that can be used with match, in this case ‘(@A{at}@B.@C), in fact if the match is successful each part matched by these @X variables will be bound to them and available in the current space, @A will contain “yaddayay”, @B “gmail” and @C “com”. That’s why we have to replace @ in the address with {at}. Think ${x} in PHP and $x in Ruby.

(dm getPtrn> (Ptrn)
  (=: pNum 64)
  (make
     (while Ptrn 
        (let Cur (pop 'Ptrn)
           (ifn (lst? Cur)
              (link Cur)
              (link (makeLink> This Cur)))))))

(dm makeLink> (Cur) 
  (push (: ptrns) Cur)
  (intern (pack "@" (char (inc (:: pNum))))))

So 64 is the zero point because that is where the capitals start in the ASCII table, as you realize this system can’t handle matches with more than 26 sub expressions and there are no checks for this, it will simply not work properly in that case. However, can you think of a situation where we would need more than 26 subs? Not me. Anyway we are basically increasing pNum by reference to generate @A, @B, @C etc.

Back to match>. As you can see we have a loop going for as long as pNum is bigger than 64 or tstRslt returns T, if tstRslt returns NIL we return NIL and if pNum reaches 64 we return T. As you realize tstRslt is responsible for testing each substring that we now have in @A, @B… Note that makeLink> is pushing the patterns in the DSL list on top of the member variable ptrns, they will be used by tstRslt> later.

(dm tstRslt> (Res)
  (let Clst (pop (: ptrns))
     (and 
        (if (> 2 (length (cdr Clst))) T (len> This Res (cdr Clst)))
        (eval (list (intern (pack (sym (car Clst)) ">")) 'This (lit Res))) 
        (t (dec (:: pNum))))))

We begin with getting the first pattern to test with, in this case (word > 0) and put it in Clst, next we have 3 expression that all have to evaluate to T if we are to return T:
1.) If the length of the cdr (in this case (> 0)) of the expression to test with is lower than 2 we simply return T and move on because that means there is no length argument to test with. However, if the argument is there we pass it on to len> which in turn will return the result of that evaluation.
2.) Next we evaluate the main function in question (in this case (word)), as you can see we make it word> in order to follow Pico Lisp conventions of naming all methods with a right bracket at the end. Anyway we pass the current substring, the first one will be “yaddayay” (in @A) in this case.
3.) The final line is a little bit special, here we use t to first decrease pNum (remember that we stop all this when we get down to 64 again) and then simply return T.

(dm len> (Res Ct)
  (and 
     (eval (list (car Ct) (length Res) (cadr Ct))) 
     (if (= 2 (length Ct)) T (eval (list (caddr Ct) (length Res) (cadddr Ct))))))

Here we have two expressions that both have to evaluate to T in order to get T as a return value:
1.) We create a list and evaluate it with the help of the substring Res(ult) and the control list (Ct) which in this case is (> 0), for the first substring the list will look something like (> 8 0) (8 is the length of “yaddayay”) which will return T.
2.) If the list’s length is not 2 it means that the list is longer than 2 since we must test for a length less than 2 before calling this function. In that case it means we have a test looking like for instance (> 2 < 5) which is the last test we apply to "com". So simply repeat with (< 5) instead of (> 2).

(dm word> (Res)
    (tst> This Res 'nWord>))

(dm nWord> (cn)
    (or (> 33 cn) (= 127 cn)))

(dm tst> (Res F)
  (unless (lst? Res) 
     (setq Res (chop Res)))
  (not (find '((C) (F This (char C))) Res)))

Each type testing function will call tst> with the name of a character testing function, in this case nWord>. I find it easier to examine the “opposite” set than the actual set when determining if something is in the set or not. In this case and with the first substring we get “yaddayay” and ‘nword as arguments to tst>.

We chop the string up and pass each char to the literal function which will expand F automagically. In this case we get ‘((C) (nWord> This (char C))) for the first substring, the result of the search with find will be negated and returned, and that will be the final result of tstRslt> if it is negative.

This little library also has a function for testing single strings without match:

(dm tst-len> (Str Fun Ct)
  (and
     (len> This Str Ct)
     (eval (list (intern (pack (sym Fun) ">")) 'This (lit Str)))))

It will simply use methods we have already walked through, usage example:

(tst-len> '+Rgx "hel#looo" 'alnum '(> 6))

That’s all for this time, I hope this piece have illustrated how easy it is to create a little DSL with Lisp.

Tags: Lisp, Pico Lisp, regular expressions

This entry was posted on Tuesday, July 1st, 2008 at 10:11 am and is filed under Lisp, Pico Lisp. Comments are currently closed.

Regular expressions in Pico Lisp

Related Posts

Archives

Tags

Pages