Friday, May 27, 2005
Boo goes Blogging
I think I've had my Python conversion experience, except with a similar (but distinct) language. The sheer compactness of Boo programs is very seductive. But is shorter always better? At university we got to play with APL, and the game was to print a histogram using one line of code. The results were not easy to read; having to express everything in matrix notation makes some otherwise obvious operations satisfyingly obscure. Here's the one-liner in the Boo interactive shell, Booish:
>>> arr = (10,2,6,14,20)
(10, 2, 6, 14, 20)
>>> print join(['#'*n for n in arr],'\n')
##########
##
######
##############
####################
The equivalent Python is print '\n'.join(['#'*n for n in arr]); similar but distinct. Readable? It depends on whether you're used to list comprehensions. They do correspond nicely to the usual mathematical notation for specifying a set.
Formatting Blog Entries
One of the things that makes me reluctant to do little articles on code techniques is that formatting is painful. HTML is not nice on the fingers, and the simplified markup used in Wikis is much easier. I've written a little program (42 lines) which takes wiki-ish text and outputs the kind of HTML that Blogger is comfortable with.
Here is the Boo program that generated this blog entry. Anything within double braces is set as code, and triple braces bracket code samples. I've always indicated emphasis with underscores, so that went in as well. And of course we have to replace angle brackets and that kind of stuff.
The whole file is read into a string array. Each line is added to a list, and the builtin function array converts the list into a string array.
import System.IO

# read a text file into a string array
def lines(f as string):
    list = []
    using inf = File.OpenText(f):
        for line in inf:
            list.Add(line)
    return array(string,list)
Notice that very nice shortcut for line in inf; Boo automatically turns a stream into an enumerable sequence. I resisted the temptation to do it in two lines, even though it is beautifully Boo-ish:
def lines(f as string):
    return array(string,iterator(File.OpenText(f)))
This works, but the file doesn't get closed. It would be nice if this implicit iteration gave the same guarantee that foreach gives, namely to dispose of any IDisposable objects at the end.
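For comparison, here is how the same trade-off looks in Python, the language this post keeps measuring Boo against. This is only a sketch, and it is Python, not Boo: the function name lines is borrowed from the Boo version, the with block plays the role of using (the file is closed when the block exits, even on an error), and the sample file contents are made up for the demonstration.

```python
import os
import tempfile


def lines(path):
    """Read a text file into a list of lines with the newlines stripped."""
    # The with block closes the file on exit, even if reading raises.
    with open(path, encoding="utf-8") as inf:
        # Iterating a file object yields one line at a time.
        return [line.rstrip("\n") for line in inf]


if __name__ == "__main__":
    # Hypothetical sample input, written to a temporary file for the demo.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        out.write("=A heading\nsome {{code}} here\n")
    try:
        print(lines(path))
    finally:
        os.remove(path)
```

The tempting one-liner, a comprehension over a bare open(path), also works in Python, but like the two-line Boo version it leaves the moment of closing up to the runtime.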
# allow user some nice shortcuts and escape special characters
def massage_line(line as string):
    line = /{{(.+?)}}/.Replace(line,'<code>$1</code>')
    line = /\s+_(.+?)_/.Replace(line,'<i>$1</i>')
    line = /\[(.+?)\|(.+?)\]/.Replace(line,'<a href="$1">$2</a>')
    return line
massage_line is perhaps a little hard to read, but regular expressions are like that at first. The first line finds any text inside double braces, extracts it, and puts it out inside a code tag. The parentheses define the first group, which is then referred to in the substitution string as $1. The question mark does not indicate puzzlement, but laziness. Normally a regular expression like .+ is 'greedy', which means it tries to find the longest match possible. In this case, we need a 'lazy' match because otherwise it will keep going and find the last closing double brace in the line. All of this applies to any code using .NET regular expressions; the C# would not be very different.
The rest of the program is almost mundane:
def escape_characters(line as string):
    # '&' must be replaced first, or the entities added below would be escaped again
    line = line.Replace('&','&amp;')
    line = line.Replace('<','&lt;')
    line = line.Replace('>','&gt;')
    return line
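ll = lines(argv[0])
i = 0
while i < ll.Length:
    line = escape_characters(ll[i])
    if line.StartsWith('='):
        # a leading '=' marks a heading
        print "<H3>"+line[1:]+"</H3>"
    elif line.StartsWith('{{{'):
        # escape and print everything up to the closing '}}}'
        print "<code><pre>"
        ++i
        # the bounds check guards against an unterminated code block
        while i < ll.Length and not ll[i].StartsWith('}}}'):
            print escape_characters(ll[i])
            ++i
        print "</pre></code>"
    else:
        line = massage_line(line)
        print line
    ++i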
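The greedy versus lazy distinction in massage_line is easy to check in isolation. Here is a minimal sketch in Python rather than Boo; for this particular pattern Python's re module behaves the same way as the .NET engine, although the replacement string uses \1 where .NET uses $1. The sample line is made up, and it deliberately contains two double-brace spans, since that is the case where the two quantifiers disagree.

```python
import re

# Hypothetical input line with two separate {{...}} spans.
line = "call {{foo}} and then {{bar}}"

# Greedy: .+ runs on to the last closing braces, so both spans merge into one.
greedy = re.sub(r"\{\{(.+)\}\}", r"<code>\1</code>", line)

# Lazy: .+? stops at the first closing braces, so each span is tagged separately.
lazy = re.sub(r"\{\{(.+?)\}\}", r"<code>\1</code>", line)

print(greedy)  # call <code>foo}} and then {{bar</code>
print(lazy)    # call <code>foo</code> and then <code>bar</code>
```

With only one span per line the two patterns give the same result, which is why this kind of bug tends to show up late.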
