Hanging in the Treetops
I wanted a way to validate short URLs without making any database calls. Since my short URLs follow a fixed pattern, I figured, for a bit of fun, that a parser could do the job.
Enter Treetop, which describes itself like so: "Treetop is a language for describing languages. Combining the elegance of Ruby with cutting-edge parsing expression grammars, it helps you analyze syntax with revolutionary ease."
The grammar is straightforward (File: message_grammar.treetop):
grammar MessageGrammar
  rule message
    [0-9] / 'X' message / ('Y' / 'Z') message message
  end
end
So here are some valid codes: 0, X0, XY00, XX0, XY09
Invalid codes: T0, P, PPPPP0, X0X0X00
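To see why those codes pass or fail, here is a rough hand-written recognizer for the same rule. It is only a sketch for checking the examples by hand, not what Treetop generates, and the names match_message and valid_message? are made up for this illustration. A code is valid only when a single message consumes the entire string, which is why X0X0X00 fails: the first X0 is a complete message and the rest is left over.

```ruby
# Hand-written recognizer mirroring the grammar rule (illustrative only;
# match_message and valid_message? are hypothetical names, not Treetop API).
#
# Returns the index just past one `message` starting at pos, or nil if
# no message can be matched there.
def match_message(input, pos = 0)
  case input[pos]
  when nil
    nil                                   # ran out of input
  when /\A[0-9]\z/
    pos + 1                               # [0-9]
  when "X"
    match_message(input, pos + 1)         # 'X' message
  when "Y", "Z"
    middle = match_message(input, pos + 1)
    middle && match_message(input, middle) # ('Y' / 'Z') message message
  end
end

# A code is valid only if one message covers the whole string.
def valid_message?(input)
  match_message(input, 0) == input.length
end

%w[0 X0 XY00 XX0 XY09 T0 P PPPPP0 X0X0X00].each do |code|
  puts "#{code} #{valid_message?(code) ? 'VALID' : 'INVALID'}"
end
```

Because every alternative in the rule starts with a different character, the recognizer never has to backtrack, so a plain case statement is enough here.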
Make sure you've got Treetop installed (gem install treetop). Open a terminal in the directory containing the grammar file and run:
tt message_grammar.treetop
This generates a file called message_grammar.rb, which you can require from another file to use as your parser.
File: message_parser.rb
require "rubygems"
require "treetop"
require "polyglot"
# Load message_grammar.rb, the file generated by tt
# (require_relative because newer Rubies no longer search the current directory)
require_relative "message_grammar"

# MessageGrammarParser is a generated parsing class based on the grammar
# defined in message_grammar.treetop
parser = MessageGrammarParser.new

STDIN.each do |string|
  # for each line of input, split on whitespace
  string.split(" ").each do |message|
    # print whether the message could be parsed or not
    puts "#{message} #{parser.parse(message) ? 'VALID' : 'INVALID'}"
  end
end
And we're Done-zo Washington. Parse away.




July 21st, 2011 - 01:02
Hey Mike,
I like how clean Treetop is. Too bad it’s a code file generator though. I was hoping it generated its code dynamically. Any idea why it doesn’t do that?
Instead of “while string = gets”, I like “STDIN.each do |string|”. More in keeping with Ruby idioms.
-Colin