Wandering around .NET
Monday, May 02, 2005
 
The Boo Programming Language
I've always enjoyed learning new programming languages, especially those that force a fresh look at things. Boo is a fresh new language by Rodrigo de Oliveira (continuing a Brazilian tradition) which marries Python-like syntax with .NET. If Boo did not exist, I would probably have needed to invent it, since I had started to ask similar questions: why do we need so many braces, why can't the compiler do more type inference and generally supply more syntactic sugar for common operations; in short, what makes a language agile?

There is a distinction between static and dynamic typing in programming languages. Static typing is the usual situation with compiled languages, where every variable must be declared up front with a definite type. The compiler can then check the program's consistency, and at the least spell-check all variables. A while back, Visual Basic programmers decided that option explicit was the safest option - otherwise new variables are assumed to be variants, which can hold a value of any type. This is exactly what is meant by a dynamic type.

Python is a very no-fuss language, like 'executable pseudo-code':

# a small Python program
def sqr(x):
    return x*x

x = 10.0
print sqr(x)

Indentation matters; Python's designer Guido van Rossum believes that if we are going to indent anyway, then we should make it meaningful. The initialization x = 10.0 creates a variable x and assigns it a floating-point value, which is then passed to sqr(). Continuing the above program, I can assign a string to x and try to call sqr:

x = 'hello'
print sqr(x)

We will then get a runtime error telling us that it's not possible to multiply two strings together. The compiler no longer catches type errors. Pythonistas tend to be very keen on tests, precisely because the compiler gives no real guarantees. They argue in fact that people rely too much on compilers; we need strong testing, not strong typing.
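To see exactly when the failure shows up, here is a minimal sketch of the same experiment, written in present-day Python 3 (so print is a function, unlike in the listing above). The try/except wrapper is my own addition so that the script keeps running after the error.

```python
def sqr(x):
    return x * x

x = 10.0
print(sqr(x))            # a float goes through without complaint

x = 'hello'              # rebinding x to a string is perfectly legal
try:
    print(sqr(x))
except TypeError as err:
    # the mistake only surfaces when the multiplication actually runs
    print("runtime error:", err)
```

Nothing complains at the point where x becomes a string; the error belongs to the call, which is why a test that actually exercises the call is the only safety net.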

Simple Boo programs



def sqr(x as double):
    return x*x

x = 10.0
print sqr(x)

In Boo, the arguments of functions need a definite type, but otherwise generally the compiler can deduce variable types. It's clear here that sqr returns double, and that x is a variable of type double. If we then wrote x = 'hello', it would be a compile-time error, because strings can't be directly converted to numbers.

Boo is a .NET language, so there is full access to the framework classes:

for s in System.IO.Directory.GetFiles('.','*.boo'):
    print s

It's easier on the eye to import namespaces explicitly, as in C#. Here the file mask may optionally be passed as a command-line argument:

import System.IO
mask = ''
if argv.Length == 0:
    mask = '*.boo'
else:
    mask = argv[0]
for s in Directory.GetFiles('.',mask):
    print s

Boo excels at scripting repetitive tasks. There are many nice little examples that come with the distribution, but I find the big block comment at the top gets in the way of reading. Here's a small and very evil program that makes a copy of a boo file without the copyright notice:

import System.IO

def strip_comment(infile as string, outfile as string):
    using rdr = File.OpenText(infile):
        # seek the end of the comment region
        while s = rdr.ReadLine():
            if s.StartsWith("#endregion"):
                break
        # and write the rest to the other file
        using wrtr = File.CreateText(outfile):
            while s = rdr.ReadLine():
                wrtr.Write(s+'\n')

# assuming we're executed in the examples directory,
# look at all boo source files and write a stripped version
for f in Directory.GetFiles('.','*.boo'):
    name = Path.GetFileName(f)
    strip_comment(name,'stripped/'+name)

This is very much a C# program without braces; using works exactly the same way as in C# and guarantees that the object is later disposed - in this case, makes sure that the file is closed. This is a productive way to use Boo - as a 'wrist-friendly' C#.
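For readers who know Python better than C#, the same two-phase read can be sketched with Python's with blocks, which play roughly the role that using plays here. This is Python 3, not Boo, and the temporary file names and the sample contents are made up purely for the demonstration.

```python
import os
import tempfile

def strip_comment(infile, outfile):
    with open(infile) as rdr:
        # skip everything up to and including the #endregion line
        for line in rdr:
            if line.startswith("#endregion"):
                break
        # the same reader carries on from where the first loop stopped
        with open(outfile, "w") as wrtr:
            for line in rdr:
                wrtr.write(line)

# hypothetical sample file, standing in for one of the Boo examples
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "example.boo")
dst = os.path.join(workdir, "stripped.boo")
with open(src, "w") as f:
    f.write("#region license\n# Copyright notice here\n#endregion\nprint 'hello'\n")

strip_comment(src, dst)
with open(dst) as f:
    stripped = f.read()
print(stripped)
```

Both files are closed when their block ends, whether or not anything goes wrong in between, which is the same guarantee that using gives.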

Full access to Windows.Forms means that Boo is well suited to GUI programs:

import System.Windows.Forms from System.Windows.Forms

f = Form(Text: "Hello, Boo!")
b = Button(Text: "Click Me!",Dock: DockStyle.Fill)
f.Controls.Add(b)
b.Click += def(o,e):
    print 'clicked!'

Application.Run(f)

We get an extremely large, fully active button to press! Note how object properties can be specified in a constructor call, and note how event handlers can be anonymous functions. The compiler knows that it's an EventHandler in this case.

Here is the C# equivalent. It's twice as large:

using System;
using System.Windows.Forms;

public class ButtonClick {
    static void button_Click(object o, EventArgs e) {
        Console.WriteLine("clicked!");
    }

    public static void Main(string[] args) {
        Form f = new Form();
        f.Text = "Hello, Boo!";
        Button b = new Button();
        b.Text = "Click Me!";
        b.Dock = DockStyle.Fill;
        b.Click += new EventHandler(button_Click);
        f.Controls.Add(b);
        Application.Run(f);
    }
}


Running Boo



Static typing has two main benefits; firstly, certain kinds of silly errors are harder to make, and programs are more self-documenting. Secondly, better code can be generated. There is no time-consuming dynamic lookup, no frantic search for the methods of an object at runtime. In principle, Boo can generate code as good as C#.

To run Boo you have three options, in order of increasing slackness: the compiler booc, the batch interpreter booi, and the interactive interpreter booish.

C:\boo>booish
>>> i = 10
10
>>> 2*i + 1
21
>>> s = 'hello'
'hello'
>>> s.Substring(0,4)
'hell'
>>> s[0:4]
'hell'
>>> s.Substring(s.Length-1,1)
'o'
>>> s[-1:]
'o'

Boo strings are System.String objects, but Python-style slicing works on them as well. It's certainly easier just to type s[-1:] to get the substring consisting of the last character than the equivalent substring operation!

Arrays and lists can be declared implicitly as well; the difference is that lists can be modified and contain object types. Slicing works with both arrays and lists:

>>> vals = (1,2,3,4)
(1, 2, 3, 4)
>>> vals[1:]
(2, 3, 4)
>>> list = [10,'help',2.3]
[10, 'help', 2.3]
Boo.Lang.List
>>> for i in list:
...     print i.GetType()
...
System.Int32
System.String
System.Double
>>> list[1:3]
['help', 2.3]
>>> t = [o.GetType() for o in list]
[System.Int32, System.String, System.Double]
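For comparison, here is a small sketch of the same slices in Python 3, which is where Boo borrows the notation from. The variable names are mine, and the type names in the last line are Python's, not .NET's.

```python
s = 'hello'
first_four = s[0:4]     # same as s.Substring(0,4) in the session above
last_char = s[-1:]      # a negative index counts from the end

vals = (1, 2, 3, 4)
tail = vals[1:]         # everything after the first element

items = [10, 'help', 2.3]
middle = items[1:3]     # the end index is exclusive
type_names = [type(o).__name__ for o in items]

print(first_four, last_char, tail, middle, type_names)
```

The rule to remember is that the start index is included and the end index is not, so s[0:4] has four characters.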


Having a real compiler is useful, and it is often not available for other 'scripting' languages. You can write a quick script to do a job, and then compile it to an executable. Although the resulting assembly depends on boo.dll, you can use ILMerge (http://research.microsoft.com/users/mbarnett/ilmerge.aspx) to make a Boo-independent program; the result is only about 45K.

Regular Expressions



Boo has built-in support for regular expressions. Here is a little program which reads from standard input and tries to match lines to a pattern:

# match.boo
pattern = argv[0]
i = 0
for line in System.Console.In:
    ++i
    if line =~ pattern:
        print "${i}: ${line}"

The print statement shows Boo's favourite way of constructing output, which is string interpolation. Any expression such as ${i} will be expanded inside a double-quoted string (although not a single-quoted string). Here is this program being exercised:

C:\languages\boo\work>booi match.boo \d$
and so what
20 we go
here we are 1
3: here we are 1
^Z

In this case, the pattern means 'any digit at the end of a line'. This is a useful little program to experiment with regular expressions.
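If you want to try patterns without a Boo installation, a rough Python 3 equivalent is sketched below. It assumes that =~ means 'the pattern matches somewhere in the line', which is what the session above suggests. The function name matching_lines is my own, and the lines come from a list rather than standard input so that the example is self-contained.

```python
import re

def matching_lines(pattern, lines):
    """Return (line_number, line) pairs for the lines that contain a match."""
    regex = re.compile(pattern)
    return [(i, line) for i, line in enumerate(lines, start=1)
            if regex.search(line)]

# the three lines typed into the session above
sample = ["and so what", "20 we go", "here we are 1"]
for i, line in matching_lines(r"\d$", sample):
    print(f"{i}: {line}")
```

Only the third line ends in a digit, so only that one is reported; "20 we go" contains digits, but not at the end.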

Although every variable in Boo has a well-defined type, it isn't always obvious what that type is. Fortunately, booc has a -vvv option (meaning 'very very verbose') which gives a fascinating glimpse of the compiler's inner ruminations. The lines we're interested in are of the form C:\languages\boo\work\infer.boo(5,1): Type of expression 'xtract' bound to 'System.Text.RegularExpressions.Regex'. Here's a program which extracts this information.


 1  import System
 2  import System.IO
 3
 4  defined = {}
 5  xtract = @/Type of expression '([a-zA-Z]\w*)' bound to '(.+)'/
 6  refpat = @/(.+)\((\d+)/
 7  p = shellp('booc.exe','-vvv ' + join(argv))
 8  inf = p.StandardError
 9  while line = inf.ReadLine():
10      groups = xtract.Match(line).Groups
11      if groups.Count > 1:
12          _,var,val = groups
13          name = var.ToString()
14          if not defined[name]:
15              groups = refpat.Match(line).Groups
16              if groups.Count > 1:
17                  _,path,ln = groups
18                  file = Path.GetFileName(path.ToString())
19                  print "${file}:${ln}: ${name} = ${val}"
20                  defined[name] = true


Line 4 creates a map, which is equivalent to a Hashtable. We use defined to avoid mentioning the same variable twice. Line 5 defines a regular expression which will match the lines of interest; line 10 shows it in action. Line 12 shows the automatic 'unpacking' of an array into three new variables. (It's a cute feature, but it can bite you if you're wanting to declare more than one variable on a line.) Lines 7 and 8 show a really cool Boo builtin, shellp, which launches a process and returns a Process object. It's then a simple matter to read the standard error output. Line 6 defines a pattern which is used to extract the file path and line number. And here's what we get:

infer.boo:4: defined = Boo.Lang.Hash
infer.boo:5: xtract = System.Text.RegularExpressions.Regex
infer.boo:6: refpat = System.Text.RegularExpressions.Regex
infer.boo:7: argv = (System.String)
infer.boo:7: p = System.Diagnostics.Process
infer.boo:8: inf = System.IO.StreamReader
infer.boo:9: line = System.String
infer.boo:10: groups = System.Text.RegularExpressions.GroupCollection
infer.boo:13: var = System.Object
infer.boo:13: name = System.String
infer.boo:18: path = System.Object
infer.boo:18: file = System.String
infer.boo:19: ln = System.Object
infer.boo:19: val = System.Object
infer.boo:20: true = System.Boolean
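The filtering logic is easier to follow without the process plumbing. Here is a sketch of just that part in Python 3, using the same two patterns. The function name first_bindings is mine. The first sample line is the one quoted earlier; the second is a made-up repeat (its line and column numbers are hypothetical) to show that a name is reported only once.

```python
import ntpath  # Windows-style path handling, whatever OS this runs on
import re

XTRACT = re.compile(r"Type of expression '([a-zA-Z]\w*)' bound to '(.+)'")
REFPAT = re.compile(r"(.+)\((\d+)")

def first_bindings(lines):
    """Report each variable once, at the first place the compiler mentions it."""
    seen = set()
    report = []
    for line in lines:
        found = XTRACT.search(line)
        if not found:
            continue
        name, bound_type = found.groups()   # unpack the two captured groups
        if name in seen:
            continue
        ref = REFPAT.match(line)
        if ref:
            path, line_no = ref.groups()
            report.append(f"{ntpath.basename(path)}:{line_no}: {name} = {bound_type}")
            seen.add(name)
    return report

sample = [
    r"C:\languages\boo\work\infer.boo(5,1): Type of expression 'xtract' bound to 'System.Text.RegularExpressions.Regex'",
    # hypothetical second sighting of the same name
    r"C:\languages\boo\work\infer.boo(10,14): Type of expression 'xtract' bound to 'System.Text.RegularExpressions.Regex'",
]
for entry in first_bindings(sample):
    print(entry)
```

A set does the job of the defined map here, since all we ever ask of it is whether a name has been seen before.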


There is of course a bug, which occurred to me as I was explaining this code (sometimes known as the rubber duck method of code review). A variable can be locally scoped in a function, and so a second variable with the same name really is a different variable. That I think I must leave as an exercise; in the meantime, just comment out line 20. Another cool feature would be an option to generate an annotated listing of the code. That would make hand translation into C# much easier.

Do we Need Another Language?



I appreciate that most people do not learn programming languages for fun but for profit, and so it's a sensible question, particularly if the language is young and obscure, and has a silly name. For big projects one has to pick a mature language which is well-known. But any big project involves repetitive actions which can be easily automated. Before serious coding starts, there's invariably some smaller scale experimentation, familiarization with the problem, perhaps prototypes. (I've heard it suggested that prototypes should not be done in the same language as the product, to prevent sloppy code reuse.) Having an agile little language that understands the .NET framework is very useful.

Little scripts can write grown-up code. For example, it's sometimes useful to have a Pair class that wraps up two values with arbitrary types. Obviously a good candidate for a generic type (and in fact the C++ standard library supplies just such a template class) but I'm assuming that this is good old .NET 1.1 here. Here's a script that generates a C# pair class for some given types:


def capitalize(s as string):
    return s[0:1].ToUpper() + s[1:]

type1 = argv[0]
type2 = argv[1]
pair_name = "Pair"+capitalize(type1)+capitalize(type2)
print """
public class ${pair_name} {
    public ${type1} First;
    public ${type2} Second;

    public ${pair_name}(${type1} first, ${type2} second) {
        First = first;
        Second = second;
    }
}
"""
...
D:\stuff\boo\work>booi pair.boo int int

public class PairIntInt {
    public int First;
    public int Second;

    public PairIntInt(int first, int second) {
        First = first;
        Second = second;
    }
}


Yes, I know, it should expose properties, but that's a trivial exercise. Boo's triple-quoted strings (borrowed from Python) together with string interpolation make this script very straightforward (in fact, it worked the first time). So a big project can use throw-away code which is part of the private 'implementation' of the project process, and which doesn't need to meet the standards of production code.
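For the curious, here is one way the properties exercise might look, sketched in Python 3 rather than Boo. It uses string.Template from the standard library, whose ${name} placeholders happen to look just like Boo's interpolation. The function name make_pair_class and the private-field layout are my own choices, not anything Boo prescribes.

```python
from string import Template

# C# 1.1-style class: private fields behind read/write properties
PAIR_TEMPLATE = Template("""\
public class ${pair_name} {
    private ${type1} first;
    private ${type2} second;

    public ${pair_name}(${type1} first, ${type2} second) {
        this.first = first;
        this.second = second;
    }

    public ${type1} First {
        get { return first; }
        set { first = value; }
    }

    public ${type2} Second {
        get { return second; }
        set { second = value; }
    }
}
""")

def capitalize(s):
    return s[:1].upper() + s[1:]

def make_pair_class(type1, type2):
    pair_name = "Pair" + capitalize(type1) + capitalize(type2)
    return PAIR_TEMPLATE.substitute(pair_name=pair_name, type1=type1, type2=type2)

print(make_pair_class("int", "string"))
```

Because the template is ordinary text, the C# braces need no escaping; only the ${...} placeholders are touched.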

It gets more controversial to consider Boo for implementing unit tests, because such code is considered an essential part of the code base, particularly in Test Driven Development (TDD). The designers of Boo have given thought to this, and provide features that make it a good language for writing tests. There is a powerful assert builtin, and Boo integrates with NUnit.
Comments:
Well, the C# code could be a little less lengthy with anonymous delegates and the new object initializer syntax:

using System;
using System.Windows.Forms;

public class ButtonClick {
    static void button_Click(object o, EventArgs e) {
        Console.WriteLine("clicked!");
    }

    public static void Main(string[] args) {
        Form f = new Form { Text="Hello, Boo!" };
        Button b = new Button { Text="Click Me!", Dock=DockStyle.Fill };
        b.Click += delegate(object o, EventArgs e) {
            Console.WriteLine("clicked!"); };
        f.Controls.Add(b);
        Application.Run(f);
    }
}
 
Sorry, meant to say:
using System;
using System.Windows.Forms;

public class ButtonClick {
    public static void Main(string[] args) {
        Form f = new Form { Text="Hello, Boo!" };
        Button b = new Button { Text="Click Me!", Dock=DockStyle.Fill };
        b.Click += delegate(object o, EventArgs e) {
            Console.WriteLine("clicked!"); };
        f.Controls.Add(b);
        Application.Run(f);
    }
}
 