peg-trees and you

Posted by s.f. on June 11, 2009

Recently at work, I needed to parse ugly data files from an ancient classified-ads database that a predecessor devoted an entire Ruby application to. While painstakingly building regexes, I remembered looking at Treetop a few months ago.

Treetop is a Ruby library for writing Parsing expression grammars. PEGs are another way of constructing grammars and could be thought of as super-regexes: they don’t allow left-lookup or ambiguity in the parse tree, making them not so useful for natural language but killer for computer languages. Around 40 lines of code and 7 rules took the place of what the original author devoted dedicated tempfiles and regex arrow code to.

That being said, PEGs are conceptually harder to get grips on, and Treetop’s documentation is not entirely clear on some hangups you might find. Most of which you can solve using the excellent mailing list, but I know I wished during the past few days that I could get it summed up for me.
Continue reading…