Autocomplete in WxRuby’s Scintilla

These are the requirements for the autocompletion feature:


1.) It should only appear on Ctrl+Space, just like in Eclipse.

2.) Because the list is user initiated, a list that contains only one item can safely be selected and inserted automatically.

3.) The behavior should be case insensitive. If we have defined two functions “Compile” and “compile”, Pico Lisp will treat them as two separate functions, but the autocomplete should not. When we write “com” and hit Ctrl+Space we should get a list with both these functions.

The reason for this is that local variables will be named with an initial upper case letter to separate them from the global namespace. Having to type that initial letter in upper case to get the automatic insertion described in #2 would be a massive drag.

4.) We should be able to set a minimum length for names from the global namespace that appear in the list. If I write “ca” and initiate the list I don’t want function names like car to contaminate it. We don’t need autocomplete for such short function names anyway; it’s easier to just write them.

Let’s begin from the beginning in main.rb:

.
menuView.append(6000, "Show autocomplete\tCtrl-Space", "Show autocomplete")
.
evt_menu(6000) {onAutoComplete}
.
def onAutoComplete
  @cur_lexer.onAutoComplete(@sci)
end
.

If you read the prior article you know that we have to do it like this because the more straightforward way of using accelerator tables won’t work in GTK. This time I checked the Scintilla key map table carefully. Luckily the Ctrl+Space combo was absent, which should mean better luck this time than with the Shift+Tab combo. I use a high id of 6000 for my custom stuff, probably to avoid conflicts with other ids. I don’t really know much about this; the 6000 number is something I’ve seen in the samples, so I’m playing it safe here.

In pico.rb:

def onAutoComplete(sci)
  wrd       = self.getKeyWordStr(sci)
  keywords  = self.getKeyWords(sci, wrd)
  if keywords
    sci.auto_comp_show(wrd.length, keywords)
    sci.auto_comp_select(wrd)
    unless keywords.strip.include? " " then sci.auto_comp_complete end
  elsif sci.auto_comp_active
    sci.auto_comp_cancel
  end
end

def onCharAdded(sci, chr)
  case chr
  when 40
    self.insertParens(sci)
  when 34
    sci.add_text('"')
    sci.goto_pos(sci.get_current_pos - 1)
  when 10, 13 # line feed or carriage return ("10 || 13" would only ever match 10)
    self.indentLine(sci, sci.get_current_line, sci.get_current_pos - 1)
  else
    if sci.auto_comp_active then self.onAutoComplete(sci) end
  end
end

wrd contains the partial word we invoke the list on. keywords contains all words that begin with that partial.

If we have any keywords we invoke the list, passing the length of our partial and the list of keywords as a space delimited string. We proceed with selecting the best match. If the keywords list doesn’t contain a space we can be sure that it only contains a single word, in that case we automatically insert that word without any active selection having to take place, see #2.

As you can see in onCharAdded, we call onAutoComplete again for each character typed while the list is showing. So if the keywords list is empty and the list is showing (auto_comp_active) we hide it (auto_comp_cancel). This is common sense and standard behavior, I believe.
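The branching in onAutoComplete can be tried out without a Scintilla control. This is only a minimal sketch: the helper next_action and the symbols it returns are hypothetical names made up for illustration, and the real method calls auto_comp_show, auto_comp_complete and auto_comp_cancel instead of returning anything.

```ruby
# Hypothetical helper (not part of the editor): mirrors the branching in onAutoComplete.
# keywords is the space delimited string from getKeyWords, or false when nothing matched.
def next_action(keywords, list_active)
  if keywords
    # A single candidate contains no space, so it can be inserted right away (requirement #2).
    keywords.strip.include?(" ") ? :show_list : :show_and_complete
  elsif list_active
    :cancel
  else
    :nothing
  end
end

p next_action("Compile compile", false)
p next_action("compile", false)
p next_action(false, true)
```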

I have done some refactoring since the prior article by moving logic from Pico to Lexer in an effort to create a more other-language-friendly development environment. Hence both getKeyWordStr and getKeyWords can be found in Lexer:

def getKeyWords(sci, sub_str)
  unless sub_str.empty?
    arr = @keyword_arr + self.getLocalVars(sci) + self.getGlobals(sci.get_text)
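    # Downcase the partial too, otherwise typing "Com" would never match anything
    partial = sub_str.downcase
    keywords = arr.find_all{|wrd|
      wrd.downcase.index(partial) == 0 && wrd.downcase != partial
    }.uniq.sort.join(" ").strip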
    keywords.length > 0 ? keywords : false
  else
    return false
  end
end

sub_str is the partial we will work with. @keyword_arr, getLocalVars and getGlobals will fetch arrays containing words in the global namespace, the current s-expression and the current file respectively.

We return the words that begin with the partial and are not equal to it (getLocalVars is pretty stupid at the moment and will return our partial too). We finish off by removing duplicates, sorting, joining with a space and finally stripping possible white space from the ends of the resulting string.
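To see the filter at work on plain data, here is a small sketch that runs without Scintilla. The method name matching_keywords and the sample word list are made up for illustration, and the sketch lowercases both sides of the comparison so that the partial can be typed in any case.

```ruby
# Hypothetical stand-alone version of the filter in getKeyWords.
def matching_keywords(words, partial)
  return false if partial.empty?
  needle = partial.downcase
  # Keep words that start with the partial but are not the partial itself
  hits = words.select { |w| w.downcase.start_with?(needle) && w.downcase != needle }
  hits = hits.uniq.sort
  hits.empty? ? false : hits.join(" ")
end

words = %w[Compile compile car com commit compile]
puts matching_keywords(words, "com")   # Compile commit compile
p matching_keywords(words, "xyz")      # false
```

Note that the plain sort puts upper case names before lower case ones, which is why Compile comes first.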

getLocalVars is currently defined in Pico since it contains some Pico Lisp specific stuff:

def getLocalVars(sci)
  self.arrTrimWhites(self.getCurBlock(sci).gsub(/"[^"]+"/, '').gsub(/[\(\)]/, '').split(" "))
end

In Lexer:

def arrTrimWhites(arr)
  arr.delete_if{|str| /^\s+$/ =~ str}
end

We get the current block (s-expression), get rid of all strings with /"[^"]+"/ and all parentheses with /[\(\)]/, and finally split the result with a space as delimiter. The proper way of doing it would have been to check for variable initiators like use and let and parse their arguments. Writing the code the way it works now was something like a 30 minute job though, and it will give proper results in most cases. In any case it won’t omit anything, which is the most important thing. It’s a mess when basically any character is allowed in variable names 🙂 If that were not the case we could have just parsed out all words.
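The same string handling can be tested on an ordinary string. This is just a sketch: local_candidates is a hypothetical name, and the sample s-expression is invented for the example.

```ruby
# Hypothetical stand-alone version of getLocalVars that works on a plain string.
def local_candidates(block)
  block.gsub(/"[^"]+"/, '')   # drop string literals
       .gsub(/[\(\)]/, '')    # drop parentheses
       .split(" ")            # split on runs of whitespace
end

block = '(de greet (Name) (prinl "Hello there " Name))'
p local_candidates(block)   # ["de", "greet", "Name", "prinl", "Name"]
```

As the output shows, everything in the block ends up in the list, including de and prinl, and Name appears twice. The duplicates disappear later when getKeyWords calls uniq.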

In Lexer:

def tLine?(line)
  @tline_rgx.each do |rgx|
    if((rgx =~ line) != nil) then return true end
  end
  return false
end

def getTlinePos(sci, direction)
  cur_line = sci.get_current_line
  to_loop = direction == "down" ? (cur_line..sci.get_line_count - 1).to_a : (0..cur_line).to_a.reverse
  # Block parameters are local to the block in Ruby 1.9+, so we take the result from find.
  # When no terminating line is found we fall back to the last line visited.
  line_nbr = to_loop.find{|nbr| self.tLine?(sci.get_line(nbr))} || to_loop.last
  direction == "down" ? sci.get_line_end_position(line_nbr) : sci.position_from_line(line_nbr)
end

def getCurBlock(sci)
  sci.get_text_range(self.getTlinePos(sci, "up"), self.getTlinePos(sci, "down"))
end

def getGlobals(text)
  rarr = Array.new
  @globals_rgx.each do |rgx|
    text.scan(rgx){|construct, name| rarr << name}
  end
  return rarr
end

In getCurBlock we simply use Scintilla’s get_text_range function to get all the text between two positions. These two positions should correspond to the start and end of the current s-expression.

We find those positions by walking up and down from the current line, in each case looking for a line whose contents without a doubt mark the end of an expression. When we find the line in question we stop. If we were walking upwards we take its start position, if we were walking downwards we take its end position. tLine? tests each line against the regexes we define in the class; each language will have its own of course:
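The walk in both directions can be sketched on an array of lines instead of a Scintilla control. The names T_LINE_RGX, t_line? and block_bounds are hypothetical, the regexes are the Pico ones from this article, and the sketch returns line indexes rather than character positions.

```ruby
# Regexes for lines that terminate an s-expression (same patterns as @tline_rgx in Pico)
T_LINE_RGX = [/^[#\n\r]/, /^\s+$/, /^\s+#/]

def t_line?(line)
  T_LINE_RGX.any? { |rgx| rgx =~ line }
end

# Returns [first_line, last_line] of the block around line index cur.
# Falls back to the first or last line of the text when no terminating line is found.
def block_bounds(lines, cur)
  up   = cur.downto(0).find { |i| t_line?(lines[i]) } || 0
  down = (cur...lines.size).find { |i| t_line?(lines[i]) } || lines.size - 1
  [up, down]
end

lines = [
  "# first function\n",
  "(de foo (X)\n",
  "   (+ X 1) )\n",
  "\n",
  "(de bar (Y)\n",
  "   (* Y 2) )\n",
]
p block_bounds(lines, 2)   # [0, 3]
p block_bounds(lines, 4)   # [3, 5]
```

With the caret on line 2 the block runs from the comment on line 0 to the empty line 3, so the terminating lines themselves are part of the range.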

@auto_comp_limit = 3
@keyword_arr = @keywords.split(" ").find_all{|str| str.length > @auto_comp_limit}
@auto_comp_rgx = /[^\(\)\s]+$/
@tline_rgx = [/^[#\n\r]/, /^\s+$/, /^\s+#/]
@globals_rgx = [/\((de|dm|class)\s+([^\(\)\s]+)/, /(.)(\*[^\(\)\s]+)/]

The above is new stuff since last time we visited the initialize function in Pico. The autocomplete limit (@auto_comp_limit) takes care of requirement #4 when we create the keyword array of standard globals. @auto_comp_rgx defines the regex we need to slice with in order to get a proper substring for list generation. @tline_rgx contains the regular expressions that we use above in tLine? to check if we have a line that terminates an s-expression.
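A quick way to see what the limit does is to run the same find_all on a short keyword string. The sample keywords are picked for illustration; the real @keywords string is much longer.

```ruby
auto_comp_limit = 3
keywords = "car cdr cadr compile quote"

# Only names longer than the limit survive, so three letter names are dropped
keyword_arr = keywords.split(" ").find_all { |str| str.length > auto_comp_limit }
p keyword_arr   # ["cadr", "compile", "quote"]
```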

The reason for doing it this way instead of simply matching parentheses in both directions is that sometimes the current s-expression will not be paired up properly. The only way the current approach can break is if we have two s-expressions directly next to each other. Compile-wise that is legal, but it is about as far from proper coding conventions as you can get. @globals_rgx is used to retrieve entries for the autocompletion list from the whole document; in our case that would be the names passed as arguments to de, dm or class, which define global functions, class methods and classes respectively.

Another coding convention in Pico Lisp is that global variables should start with an asterisk, hence the look of the second regex in the array.
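Here is a small sketch of the two regexes scanning a piece of text. GLOBALS_RGX and globals_in are hypothetical stand-ins for @globals_rgx and getGlobals, and the Pico Lisp snippet is invented for the example.

```ruby
GLOBALS_RGX = [/\((de|dm|class)\s+([^\(\)\s]+)/, /(.)(\*[^\(\)\s]+)/]

# Collects the second capture group of every match, just like getGlobals does
def globals_in(text)
  names = []
  GLOBALS_RGX.each do |rgx|
    text.scan(rgx) { |_construct, name| names << name }
  end
  names
end

source = <<~LISP
  (de compile (File) (setq *Debug T))
  (class +Shape)
  (dm draw> () (prinl *Debug))
LISP
p globals_in(source)   # ["compile", "+Shape", "draw>", "*Debug", "*Debug"]
```

The first regex is run over the whole text before the second one, so the asterisk globals always end up last. Duplicates like *Debug are left in here and removed by uniq in getKeyWords.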

New stuff in Pico’s loadLexer:

.
    sci.auto_comp_stops("()\s\033'")
    sci.auto_comp_set_fill_ups("\n\r")
    sci.auto_comp_set_ignore_case(true)
end

auto_comp_stops designates the characters (five in this case: the two parentheses, space, escape and the single quote) that hide the autocomplete list when entered. \033 is escape (having regex docs and an ASCII table at hand was very handy when working with this). When we press enter we terminate and insert the chosen word (auto_comp_set_fill_ups). We ignore case (see #3 above).
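If you are unsure which characters a string with escapes like this really contains, looking at its bytes in plain Ruby settles it. In a double quoted Ruby string \s is a space and \033 is an octal escape.

```ruby
stops = "()\s\033'"

p stops.length   # 5
p stops.bytes    # [40, 41, 32, 27, 39]
```

That is (, ), space (32), escape (27) and the single quote.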

That took care of the keyword list, getting the string used to generate it is easier:

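def getKeyWordStr(sci)
  line_start = sci.position_from_line(sci.get_current_line)
  # slice returns nil when nothing matches, to_s turns that into an empty string
  sci.get_text_range(line_start, sci.get_current_pos).slice(@auto_comp_rgx).to_s
end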

We get the text between the start of the current line (position_from_line) and the current position, and then slice it with the regex described above.
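The slice is easy to try on ordinary strings. In this sketch AUTO_COMP_RGX stands in for @auto_comp_rgx, and partial_word is a hypothetical helper that takes the text from the line start up to the caret.

```ruby
AUTO_COMP_RGX = /[^\(\)\s]+$/

# Returns the word fragment that ends at the caret, or nil if there is none
def partial_word(line_up_to_caret)
  line_up_to_caret.slice(AUTO_COMP_RGX)
end

p partial_word("(de foo (X) (com")   # "com"
p partial_word("(prinl ")            # nil
p partial_word("")                   # nil
```

String#slice gives nil when the regex does not match, for instance right after a space or on an empty line, so the caller has to be prepared for that.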

As you have probably noted already, we currently only work with the text of the current file. The next article will cover the project browser, and then we need some way of handling autocompletion for the whole project.

Anyway, so far so good!

