Hanging in the Treetops

I wanted a way to parse short URLs without making any database calls. Since my short URLs follow a fixed pattern, I figured, for a bit of fun, that a parser would make this possible.

Enter Treetop, in its own words: “Treetop is a language for describing languages. Combining the elegance of Ruby with cutting-edge parsing expression grammars, it helps you analyze syntax with revolutionary ease.”

The grammar is straightforward (file: message_grammar.treetop):

grammar MessageGrammar
  rule message
    [0-9] / 'X' message / ('Y' / 'Z') message message
  end
end

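In words: a message is a single digit, an X followed by one message, or a Y or Z followed by two messages. So here are some valid codes: 0, X0, XY00, XX0, XY09
Invalid codes: T0, P, PPPPP0, X0X0X00 (X0 matches, but the rest of the string is left over)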
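To see why those codes pass or fail, here is a hand-written recursive-descent recognizer in plain Ruby that mirrors the rule one alternative at a time. It is only a sketch for illustration, not the code Treetop generates, and the method names match_message and valid_code? are made up. Because each alternative begins with a different character, dispatching on the first character gives the same result as the PEG's ordered choice. The final whole-string check reflects the X0X0X00 example: matching a valid prefix is not enough.

```ruby
# Hypothetical hand-written recognizer mirroring the Treetop rule:
#   message <- [0-9] / 'X' message / ('Y' / 'Z') message message
# Returns the index just past one matched message, or nil on failure.
def match_message(code, pos)
  ch = code[pos]
  return nil if ch.nil?              # ran out of input mid-message
  if ch.match?(/\A[0-9]\z/)          # [0-9]: a single digit is a message
    pos + 1
  elsif ch == "X"                    # 'X' message: exactly one sub-message
    match_message(code, pos + 1)
  elsif ch == "Y" || ch == "Z"       # ('Y' / 'Z') message message: two sub-messages
    after_first = match_message(code, pos + 1)
    after_first && match_message(code, after_first)
  end                                # any other character falls through to nil
end

# A code is valid only if a single message spans the entire string.
def valid_code?(code)
  match_message(code, 0) == code.length
end

p valid_code?("XY09")    # prints true
p valid_code?("X0X0X00") # prints false
```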

Make sure you have Treetop installed (gem install treetop). Open a terminal in the directory containing the grammar file and run
tt message_grammar.treetop
This generates a file called message_grammar.rb, which you can require from another file to use as your parser:

File: message_parser.rb

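require "rubygems"   # not needed on Ruby 1.9+, which loads RubyGems automatically
require "treetop"
require "polyglot"
# Load the parser generated by `tt message_grammar.treetop`; require_relative
# finds it next to this script regardless of the current load path.
require_relative "message_grammar"

# MessageGrammarParser is the parser class Treetop generated from the
# grammar defined in message_grammar.treetop
parser = MessageGrammarParser.new

STDIN.each do |string|
  # for each line of input, split on whitespace
  string.split(" ").each do |message|
    # parse returns a syntax node on success and nil on failure
    puts "#{message} #{parser.parse(message) ? 'VALID' : 'INVALID'}"
  end
end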
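On success, parser.parse hands back a tree of syntax nodes (and nil on failure); the script only checks whether the result is truthy. To show the shape that tree encodes, here is another plain-Ruby sketch, not Treetop's node API, that returns nested arrays instead: digits are leaves, X wraps one child, and Y and Z each hold two. The names parse_message and parse_code are hypothetical.

```ruby
# Hypothetical sketch: build a nested-array tree for a code instead of a
# yes/no answer. Returns [tree, next_index] on success, or nil on failure.
def parse_message(code, pos = 0)
  ch = code[pos]
  case ch
  when /\A[0-9]\z/                   # a digit is a leaf
    [ch, pos + 1]
  when "X"                           # X node with one child
    child, nxt = parse_message(code, pos + 1)
    child && [[:X, child], nxt]
  when "Y", "Z"                      # Y or Z node with two children
    left, mid = parse_message(code, pos + 1)
    return nil unless left
    right, nxt = parse_message(code, mid)
    right && [[ch.to_sym, left, right], nxt]
  end                                # nil for end of input or any other character
end

# Return the tree only if it covers the whole string, otherwise nil.
def parse_code(code)
  tree, nxt = parse_message(code)
  tree if nxt == code.length
end

p parse_code("XY09")    # prints [:X, [:Y, "0", "9"]]
p parse_code("X0X0X00") # prints nil
```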

And we’re Done-zo Washington. Parse away.

One Comment

  1. Hey Mike,

    I like how clean Treetop is. Too bad it’s a code file generator though. I was hoping it generated its code dynamically. Any idea why it doesn’t do that?

    Instead of “while string = gets”, I like “STDIN.each do |string|”. More in keeping with Ruby idioms.

