A Note on Regular Expressions

They’re a bitch 🙂

It seems I was waaay too quick in my bugfixes article. In it I state that

elsif m = /(.*)("[^\\"]*")(.*)/.match(txt)

would solve the “Some escaped \”quotes\” in a string problem”. Of course it doesn’t. Given the following line:

((sym? L)  (link (pack "\"" (esc> '+Str L '("\"")) "\"")))

It will match (in curly brackets): {((sym? L) (link (pack “\”” (esc> ‘+Str L ‘(“\””)) “\} {“”} {)))}. Hardly what we want.

My first impulse was to consider the middle sub-expression (“[^\\”]*”). I wanted to include all types of characters but a double quote, unless that quote didn’t have a backslash in front of it, in which case it should be ignored. However the answer doesn’t lie within the [^\\”] character set, but outside of it:

/(.*)([^\\]".*[^\\]")(.*)/.match(txt)

The above almost fixed the problem, let’s look at the matching pattern of ‘(“\””) to understand why it almost but not quite got me there: {‘} {(“\””} {)}.

Oops, that left parenthesis just got highlighted as a string and since the bracket highlighting is dependent on the style type this will screw it up. A screwed bracket matching situation translates directly into a screwed working situation when coding Lisp. We really need to fix this.

/(.*)([^\\])(".*[^\\]")(.*)/.match(txt)

The above will solve the situation, we get this pattern: {‘} {(} {“\””} {)}. Finally what we want, bad news is that four results will need their own handling since the styleLine in Pico only handles three results. The whole styleMe method now looks like this:

def styleMe(txt, sci, flag = :def)
  if txt && txt.length > 0
    if (m = /(.*)(#.*)/.match(txt)) && (flag != :nocmt)
      a, b = m.captures
      if a.length > 0 && a.scan('"').length % 2 != 0
        self.styleMe(txt, sci, :nocmt)
      else
        self.styleMe(a, sci)
        sci.set_styling(b.length, STC_LISP_COMMENT)
      end
    elsif m = /(.*)([^\\]"")(.*)/.match(txt)
      self.styleLine(m, sci, STC_LISP_STRING)
    elsif m = /(.*)([^\\])(".*[^\\]")(.*)/.match(txt)
      a, b, c, d = m.captures
      self.styleMe(a, sci)
      self.styleTxt(b, sci)
      sci.set_styling(c.length, STC_LISP_STRING)
      self.styleMe(d, sci)
    elsif m = /(.*)('[^()\s]+)(.*)/.match(txt)
      self.styleLine(m, sci, STC_LISP_SYMBOL)
    elsif m = /(.*)([()\s])(.*)/.match(txt)
      if m.captures[1] != ' '
        self.styleLine(m, sci, STC_LISP_OPERATOR)
      else
        self.styleLine(m, sci, STC_LISP_DEFAULT)
      end
    else
      if @project.getConfig(@proj_id, :pico).fetch(:all_keywords).include?(txt.strip)
        sci.set_styling(txt.length, STC_LISP_KEYWORD)
      else
        self.styleTxt(txt, sci)
      end
    end
  end
end

The styleTxt method is new:

def styleTxt(txt, sci)
  if txt.strip =~ /^\d+$/
    sci.set_styling(txt.length, STC_LISP_NUMBER)
  elsif "'*/-+();~,.`".include?(txt.strip)
    sci.set_styling(txt.length, STC_LISP_OPERATOR)
  else
    sci.set_styling(txt.length, STC_LISP_DEFAULT)
  end
end

The left parenthesis above will get caught by the include? check and hence get the needed STC_LISP_OPERATOR style. The statement just above where all this begins in styleMe: elsif m = /(.*)([^\\]””)(.*)/.match(txt) will take care of the empty string edge case: “”, and will also correctly handle “\””.

Finally, hopefully, all is well! 🙂

Update Oct 2008: In turned out that all solutions except Piers‘ below in the comments lead to madness in the end, the styling method now looks like this and is managing all cases.

def styleMe(txt, sci, flag = :def)
  if txt && txt.length > 0
    if (m = /(.*)(#.*)/.match(txt)) && (flag != :nocmt)
      a, b = m.captures
      if a.length > 0 && a.scan('"').length % 2 != 0
        self.styleMe(txt, sci, :nocmt)
      else
        self.styleMe(a, sci)
        sci.set_styling(b.length, STC_LISP_COMMENT)
      end
    elsif m = /([^"]*)("[^\\"]*(?:\\.[^\\"]*)*")(.*)/.match(txt)
      self.styleLine(m, sci, STC_LISP_STRING)
    elsif m = /(.*)('[^()\s]+)(.*)/.match(txt)
      self.styleLine(m, sci, STC_LISP_SYMBOL)
    elsif m = /(.*)([()\s])(.*)/.match(txt)
      if m.captures[1] != ' '
        self.styleLine(m, sci, STC_LISP_OPERATOR)
      else
        self.styleLine(m, sci, STC_LISP_DEFAULT)
      end
    else
      if @project.getConfig(@proj_id, :pico).fetch(:all_keywords).include?(txt.strip)
        sci.set_styling(txt.length, STC_LISP_KEYWORD)
      else
        self.styleTxt(txt, sci)
      end
    end
  end
end

If you look carefully you will see that Piers made himself guilty of a typo, it’s fixed in the code above though.

The editor source has been updated.

Tags: regular expressions, Ruby, wxruby

This entry was posted on Thursday, August 21st, 2008 at 11:59 pm and is filed under Ruby, WxRuby. Comments are currently closed.

A Note on Regular Expressions

Related Posts

Archives

Tags

Pages