Wandering around .NET
Tuesday, June 14, 2005
 
CSI Hacks
My little interactive C# interpreter CSI has been coming on nicely. It originally started as a small experiment of about 160 lines; it's now closer to 1200 lines. This happens to all programs, eventually; they take on a life of their own, and suddenly other people want to use them.

The Code Project article gives an overview, but what I wanted to discuss here are some of the cool things that can be done with an extendable C# interpreter.

CSI's preprocessor is very useful in interactive work. For instance, these macro definitions in the .csi file will give you useful new commands:


#def cd(x) Directory.SetCurrentDirectory(#x)
#def pwd(x) Directory.GetCurrentDirectory()
#def ls(x) foreach(string s in Directory.GetFiles(".",#x)) Print(Path.GetFileName(s))
....
# /pwd
(String) 'D:\Downloads\csi'
# /ls *.cs
console.cs
csi.cs
csigui.cs
extension.cs
gui-defs.cs
interpreter.cs
loaded.cs
override.cs
prepro.cs
test.cs
testquotes.cs
# /cd ..
# /pwd
(String) 'D:\Downloads'
# /cd /stuff/boo/work


Now you can always execute these macros directly, e.g. cd("..") but they're easier to type as slash-commands. Please notice that you need to use forward slashes (or double-up your backslashes), since the prepro stringizing operator is not smart enough to do that for you.
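
To see why the path separators matter, here is a minimal Python 3 sketch of what a stringizing macro expansion does. It is not CSI's actual preprocessor; the function expand_macro and its single-parameter restriction are assumptions made purely for illustration.

```python
import re

def expand_macro(body, param, arg):
    """Expand a one-parameter macro body (hypothetical helper).

    '#param' becomes the argument wrapped in double quotes (stringizing);
    a bare 'param' becomes the raw argument text.
    """
    pattern = re.compile(r"(#?)\b" + re.escape(param) + r"\b")

    def substitute(match):
        # group(1) is '#' when the stringizing form was used
        return '"' + arg + '"' if match.group(1) else arg

    return pattern.sub(substitute, body)

cd_body = "Directory.SetCurrentDirectory(#x)"
print(expand_macro(cd_body, "x", ".."))
# The argument is pasted in verbatim, backslashes and all:
print(expand_macro(cd_body, "x", r"\stuff\boo\work"))
```

Because the argument lands between the quotes unchanged, a path typed with single backslashes would reach the C# compiler as a string literal full of unintended escape sequences, which is why forward slashes are the safer habit.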

Could we not make it more like the shell, and make the slash implicit? But then it's not clear how to distinguish commands from C# statements or expressions; there are too many opportunities for confusion. Why #def, instead of #define? Apart from being easier to type, it reminds us that this is not a full C preprocessor; for instance, there is no #include available. In fact, there's no built-in command for loading code, but it's very easy to create one. The compilation context used for all statements in CSI derives from the Utils class, which supplies us with useful things like Print and Include. (The latter calls the same code that is responsible for loading the default csi script.) So here's a load command:


val.csi:
/cd /stuff/boo/work
/pwd
...
# #def L(file) Include(#file)
# /L val.csi
(String) 'D:\stuff\boo\work'

CSI makes certain events available to programmers, which makes adding custom functionality possible. For instance, there is no way to list all the currently defined macros. But there is an event which is fired by the MacroSubstitutor object whenever a macro is defined.

...
$_macros = new ArrayList()
void add_macro(string sym) { $_macros.Add(sym); }
$interpreter.macro.MacroDefined += new DefineEvent(add_macro)
void dump_macros() { foreach(string s in $_macros) Print(s); }
#def DM(x) dump_macros()
...
# /DM
DM
cd
pwd
ls
L
FOR
sin
cos
dir
M
MI
PR
ADD_CLOSE
ADD_VAR
OUT
D
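
The pattern at work here is an ordinary event subscription. A rough Python 3 sketch of the idea follows; MacroTable and on_defined are hypothetical stand-ins, not CSI's MacroSubstitutor or its MacroDefined event.

```python
class MacroTable:
    """Hypothetical stand-in for an object that fires an event on each definition."""

    def __init__(self):
        self.macros = {}
        self.on_defined = []  # list of callbacks taking the macro name

    def add_macro(self, name, body):
        self.macros[name] = body
        for handler in self.on_defined:
            handler(name)

table = MacroTable()
table.add_macro("ls", "...")       # defined before anyone is listening

seen = []
table.on_defined.append(seen.append)  # subscribe, like += on a .NET event
table.add_macro("cd", "Directory.SetCurrentDirectory(#x)")
table.add_macro("pwd", "Directory.GetCurrentDirectory()")
print(seen)
```

Only definitions made after the handler is attached are recorded, so a bookkeeping handler like this needs to be hooked up before the macros you care about are defined.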


CSI exposes much of its internal implementation. In particular, the MacroSubstitutor object is publicly available and any public method can be used:


# /dollar
# $mac = $interpreter.macro
# $mac.Lookup("pwd").Subst
(String) 'Directory.GetCurrentDirectory()'
# $mac.Substitute("cd(hello)")
(String) 'Directory.SetCurrentDirectory("hello")'
# $mac.AddMacro("ANSWER","42",null)
# ANSWER
(Int32) 42


There are four interpreter events which CSI scripts can monitor:
- Interpreter.Close
- Interpreter.VarAdding
- Interpreter.FunAdding
- Interpreter.PreprocessLine

For example, you may wish to always save your session at the end:


/var
#def ADD_CLOSE(x) interpreter.Closing += new InterpEvent(x)
void close() { interpreter.Save(null); }
ADD_CLOSE(close)


The VarAdding and FunAdding events are passed the name and the first line of the definition. They can be used to keep track of function definitions:

var _funs = new Hashtable()
void add_fun(string v, string l) {
    _funs[v] = l;
}
interpreter.FunAdding += new DeclareEvent(add_fun)
void dump(Hashtable table) { // used to implement the /D command
    foreach(string s in table.Keys)
        Print(s,table[s]);
}
#def D(x) dump(_##x)

...
# /D funs
dump void dump(Hashtable table) { // used to implement the /D command
# double sqr(double x) { // a surprisingly useful function
. return x*x;
. }
# /D funs
dump void dump(Hashtable table) { // used to implement the /D command
sqr double sqr(double x) { // a surprisingly useful function

Previously I noted that it would be cool if one could dispense with the forward slash when using macro-defined commands. The PreprocessLine event will pass you the string before it's processed, so you can do just about anything to it. CSI allows you to define classes, like this one:

class CommandProcess {
    MacroSubstitutor mac;

    string firstWord(string s) {
        int idx = s.IndexOf(" ");
        if (idx == -1)
            idx = s.Length;
        return s.Substring(0,idx);
    }

    string prepro(string line) {
        if (! Interpreter.Interactive)
            return line;
        if (mac.Lookup(firstWord(line)) != null)
            line = "/" + line;
        return line;
    }

    public CommandProcess(Interpreter interp) {
        interp.PreprocessLine += new LineEvent(prepro);
        mac = interp.macro;
    }
}
var cp = new CommandProcess(interpreter)
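
Stripped of the CSI plumbing, the rewriting rule is tiny. Here it is as a Python 3 sketch; add_implicit_slash and the set of macro names are illustrative assumptions, not part of CSI.

```python
def first_word(line):
    """Everything up to the first space, or the whole line if there is none."""
    return line.split(" ", 1)[0]

def add_implicit_slash(line, macros, interactive=True):
    """Prefix '/' when the line starts with a known macro name."""
    if not interactive:
        return line          # leave script files alone
    if first_word(line) in macros:
        return "/" + line
    return line

known = {"cd", "pwd", "ls"}
print(add_implicit_slash("cd ..", known))
print(add_implicit_slash("int x = 1", known))
```

The interactive check matters: without it, a loaded script containing a variable that happens to share a macro's name would be silently rewritten as well.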


Larger blocks of code like this can be compiled into their own assembly, as long as it references csi.exe (or csigui.exe). Assuming we are in the same directory as csi.exe, the above class can be compiled into an assembly (just remember to put a 'public' in front of the class):


D:\Downloads\csi> csc /t:library CommandProcess.cs /r:csi.exe

...
/r CommandProcess.dll
var cp = new CommandProcess(interpreter)
...


Although this is obviously going to be faster, the disadvantage of loading a class like this is that you can't reload the class, since assemblies can't be unloaded.
Friday, May 27, 2005
 
Boo goes Blogging
I think I've had my Python conversion experience, except with a similar (but distinct) language. The sheer compactness of Boo programs is very seductive. But is shorter always better? At university we got to play with APL, and the game was to print a histogram using one line of code. The results were not easy to read; having to express everything in matrix notation makes some otherwise obvious operations satisfyingly obscure. Here's the one-liner in the Boo interactive shell, Booish:

>>> arr = (10,2,6,14,20)
(10, 2, 6, 14, 20)
>>> print join(['#'*n for n in arr],'\n')
##########
##
######
##############
####################

The equivalent Python is print '\n'.join(['#'*n for n in arr]); similar but distinct. Readable? It depends if you're used to list comprehensions. They do correspond nicely to the usual mathematical notation for specifying a set.

Formatting Blog Entries



One of the things that makes me reluctant to do little articles on code techniques is that formatting is painful. HTML is not nice on the fingers, and the simplified markup used in Wikis is much easier. I've written a little program (42 lines) which takes wiki-ish text and outputs the kind of HTML that Blogger is comfortable with.

Here is the Boo program that generated this blog entry. Anything within double braces is set as code, and triple braces bracket code samples. I've always indicated emphasis with underscores, so that went in as well. And of course we have to replace angle brackets and that kind of stuff.

The whole file is read into a string array. Each line is added to a list, and the builtin function array converts the list into a string array.


# read a text file into a string array
def lines(f as string):
    list = []
    using inf = File.OpenText(f):
        for line in inf:
            list.Add(line)
    return array(string,list)


Notice that very nice shortcut for line in inf; Boo automatically turns a stream into an enumerable sequence. I resisted the temptation to do it in two lines, even though it is beautifully Boo-ish:


def lines(f as string):
    return array(string,iterator(File.OpenText(f)))


This works, but the file doesn't get closed. It would be nice if this implicit iteration gave the same guarantee that foreach gives, namely to dispose of any IDisposable objects at the end.


# allow user some nice shortcuts and escape special characters
def massage_line(line as string):
    line = /{{(.+?)}}/.Replace(line,'<code>$1</code>')
    line = /\s+_(.+?)_/.Replace(line,'<i>$1</i>')
    line = /\[(.+?)\|(.+?)\]/.Replace(line,'<a href="$1">$2</a>')
    return line


massage_line is perhaps a little hard to read, but regular expressions are like that at first. The first line finds any stuff inside double braces, extracts it, and puts it out inside a code tag. The parentheses define the first group, which is then referred to in the substitution string as $1. The question mark does not indicate puzzlement, but laziness. Normally a regular expression like .+ is 'greedy', which means it tries to find the longest match possible. In this case, we need a 'lazy' match because otherwise it will keep going and find the last closing double brace in the line. All of this applies to any code using .NET regular expressions; the C# would not be very different.
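
The difference between greedy and lazy matching is easy to see side by side. This sketch uses Python 3's re module rather than .NET's Regex, on the assumption that the two agree on this point (they do for these simple quantifiers); the sample line is made up.

```python
import re

line = "use {{Print}} and {{Include}} here"

# Greedy: .+ runs on to the LAST closing double brace in the line.
greedy = re.sub(r"\{\{(.+)\}\}", r"<code>\1</code>", line)

# Lazy: .+? stops at the FIRST closing double brace it can.
lazy = re.sub(r"\{\{(.+?)\}\}", r"<code>\1</code>", line)

print(greedy)
print(lazy)
```

The greedy version produces a single code tag that swallows everything between the first opening and last closing braces, whereas the lazy version tags Print and Include separately.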

The rest of the program is almost mundane:


def escape_characters(line as string):
    line = line.Replace('&','&amp;')
    line = line.Replace('<','&lt;')
    line = line.Replace('>','&gt;')
    return line

ll = lines(argv[0])
i = 0
while i < ll.Length:
    line = escape_characters(ll[i])
    if line.StartsWith('='):
        print "<H3>"+line[1:]+"</H3>"
    elif line.StartsWith('{{{'):
        print "<code><pre>"
        ++i
        while not ll[i].StartsWith('}}}'):
            print escape_characters(ll[i])
            ++i
        print "</pre></code>"
    else:
        line = massage_line(line)
        print line
    ++i

Monday, May 02, 2005
 
The Boo Programming Language
I've always enjoyed learning new programming languages, especially those that force a fresh look at things. Boo is a fresh new language by Rodrigo de Oliveira (continuing a Brazilian tradition) which marries Python-like syntax with .NET. If Boo did not exist, I would probably have needed to invent it, since I had started to ask similar questions: why do we need so many braces, why can't the compiler do more type inference and generally supply more syntactic sugar for common operations; in short, what makes a language agile?

There is a distinction between static and dynamic typing in programming languages. Static typing is the usual situation with compiled languages, where every variable must be declared up front with a definite type. The compiler can then check the program's consistency, and at the least spell-check all variables. A while back, Visual Basic programmers decided that option explicit was the safest option - otherwise new variables are assumed to be variants, which can hold a value of any type. This is exactly what is meant by a dynamic type.

Python is a very no-fuss language, like 'executable pseudo-code':

# a small Python program
def sqr(x):
    return x*x

x = 10.0
print sqr(x)

Indentation matters; Python's designer Guido van Rossum believes that if we are going to indent anyway, then we may as well make it meaningful. The initialization x = 10.0 creates a variable x and assigns it a floating-point value, which is then passed to sqr(). Continuing the above program, I can assign a string to x and try to call sqr:

x = 'hello'
print sqr(x)

We will then get a runtime error telling us that it's not possible to multiply two strings together. The compiler no longer catches type errors. Pythonistas tend to be very keen on tests, precisely because the compiler gives no real guarantees. They argue in fact that people rely too much on compilers; we need strong testing, not strong typing.
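
The same experiment in Python 3 syntax (where print is a function) looks roughly like this. try_sqr is a hypothetical helper added here so that the error can be caught and shown rather than ending the script; the exact wording of the error message varies between Python versions.

```python
def sqr(x):
    return x * x

def try_sqr(value):
    """Call sqr and report a type error instead of crashing."""
    try:
        return sqr(value)
    except TypeError as err:
        return "TypeError: " + str(err)

print(try_sqr(10.0))
print(try_sqr("hello"))   # only discovered when this line actually runs
```

Nothing complains until the offending call is executed, which is exactly why a test that exercises the call is the only safety net here.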

Simple Boo programs



def sqr(x as double):
    return x*x

x = 10.0
print sqr(x)

In Boo, the arguments of functions need a definite type, but otherwise generally the compiler can deduce variable types. It's clear here that sqr returns double, and that x is a variable of type double. If we then wrote x = 'hello', it would be a compile-time error, because strings can't be directly converted to numbers.

Boo is a .NET language, so there is full access to the framework classes:

for s in System.IO.Directory.GetFiles('.','*.boo'):
    print s

It's easier on the eye to explicitly import namespaces, as in C#. Here the file mask may optionally be passed as a command-line argument:

import System.IO
mask = ''
if argv.Length == 0:
    mask = '*.boo'
else:
    mask = argv[0]
for s in Directory.GetFiles('.',mask):
    print s

Boo excels at scripting repetitive tasks. There are many nice little examples that come with the distribution, but I find the big block comment at the top gets in the way of reading. Here's a small and very evil program that makes a copy of a boo file without the copyright notice:

import System.IO

def strip_comment(infile as string, outfile as string):
    using rdr = File.OpenText(infile):
        # seek the end of the comment region
        while s = rdr.ReadLine():
            if s.StartsWith("#endregion"):
                break
        # and write the rest to the other file
        using wrtr = File.CreateText(outfile):
            while s = rdr.ReadLine():
                wrtr.Write(s+'\n')

# assuming we're executed in the examples directory,
# look at all boo source files and write a stripped version
for f in Directory.GetFiles('.','*.boo'):
    name = Path.GetFileName(f)
    strip_comment(name,'stripped/'+name)

This is very much a C# program without braces; using works exactly the same way as in C# and guarantees that the object is later disposed - in this case, makes sure that the file is closed. This is a productive way to use Boo - as a 'wrist-friendly' C#.

Full access to Windows.Forms means that Boo is well suited to GUI programs:

import System.Windows.Forms from System.Windows.Forms

f = Form(Text: "Hello, Boo!")
b = Button(Text: "Click Me!",Dock: DockStyle.Fill)
f.Controls.Add(b)
b.Click += def(o,e):
    print 'clicked!'

Application.Run(f)

We get an extremely large, fully active button to press! Note how object properties can be specified in a constructor call, and note how event handlers can be anonymous functions. The compiler knows that it's an EventHandler in this case.

Here is the C# equivalent. It's twice as large:

using System;
using System.Windows.Forms;

public class ButtonClick {
    static void button_Click(object o, EventArgs e) {
        Console.WriteLine("clicked!");
    }

    public static void Main(string[] args) {
        Form f = new Form();
        f.Text = "Hello, Boo!";     // form caption, as in the Boo version
        Button b = new Button();
        b.Text = "Click Me!";
        b.Dock = DockStyle.Fill;
        b.Click += new EventHandler(button_Click);
        f.Controls.Add(b);
        Application.Run(f);
    }
}


Running Boo



Static typing has two main benefits; firstly, certain kinds of silly errors are harder to make, and programs are more self-documenting. Secondly, better code can be generated. There is no time-consuming dynamic lookup, no frantic search for the methods of an object at runtime. In principle, Boo can generate code as good as C#.

To run Boo you have three options, in order of increasing slackness; the compiler booc, the batch interpreter booi, and an interactive interpreter booish.

C:\boo>booish
>>> i = 10
10
>>> 2*i + 1
21
>>> s = 'hello'
'hello'
>>> s.Substring(0,4)
'hell'
>>> s[0:4]
'hell'
>>> s.Substring(s.Length-1,1)
'o'
>>> s[-1:]
'o'

Boo strings are System.String objects, but Python-style slicing works on them as well. It's certainly easier just to type s[-1:] to get the substring consisting of the last character than the equivalent substring operation!

Arrays and lists can be declared implicitly as well; the difference is that lists can be modified and contain object types. Slicing works with both arrays and lists:

>>> vals = (1,2,3,4)
(1, 2, 3, 4)
>>> vals[1:]
(2, 3, 4)
>>> list = [10,'help',2.3]
[10, 'help', 2.3]
Boo.Lang.List
>>> for i in list:
... print i.GetType()
...
System.Int32
System.String
System.Double
>>> list[1:3]
['help', 2.3]
>>> t = [o.GetType() for o in list]
[System.Int32, System.String, System.Double]


Having a real compiler is useful, and is often not an option with other 'scripting' languages. You can write a quick script to do a job, and then compile it to an executable. Although the resulting assembly depends on boo.dll, you can use ILMerge (http://research.microsoft.com/users/mbarnett/ilmerge.aspx) to make a Boo-independent program; the result is only about 45K.

Regular Expressions



Boo has built-in support for regular expressions. Here is a little program which reads from standard input and tries to match lines to a pattern:

# match.boo
pattern = argv[0]
i = 0
for line in System.Console.In:
    ++i
    if line =~ pattern:
        print "${i}: ${line}"

The print statement shows Boo's favourite way of constructing output, which is string interpolation. Any expression such as ${i} will be expanded inside a double-quoted string (although not a single-quoted string). Here is this program being exercised:

C:\languages\boo\work>booi match.boo \d$
and so what
20 we go
here we are 1
3: here we are 1
^Z

In this case, the pattern means 'any digit at the end of a line'. This is a useful little program to experiment with regular expressions.
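
For comparison, here is roughly the same filter as a Python 3 function. It assumes that Boo's =~ behaves like an unanchored search (which the session above suggests, since the digit is matched at the end of the line); matching_lines is a name invented for this sketch.

```python
import re

def matching_lines(pattern, lines):
    """Return 'number: text' for each line that contains a match."""
    regex = re.compile(pattern)
    return ["%d: %s" % (number, text)
            for number, text in enumerate(lines, 1)
            if regex.search(text)]

session = ["and so what", "20 we go", "here we are 1"]
for hit in matching_lines(r"\d$", session):
    print(hit)
```

Only the third line ends in a digit; the second line contains digits, but not at the end.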

Although every variable in Boo has a well-defined type, it isn't always obvious what that type is. Fortunately, booc has a -vvv option (meaning 'very very verbose') which gives a fascinating glimpse of the compiler's inner ruminations. The lines we're interested in are of the form C:\languages\boo\work\infer.boo(5,1): Type of expression 'xtract' bound to 'System.Text.RegularExpressions.Regex'. Here's a program which extracts this information.


 1 import System
 2 import System.IO
 3
 4 defined = {}
 5 xtract = @/Type of expression '([a-zA-Z]\w*)' bound to '(.+)'/
 6 refpat = @/(.+)\((\d+)/
 7 p = shellp('booc.exe','-vvv ' + join(argv))
 8 inf = p.StandardError
 9 while line = inf.ReadLine():
10     groups = xtract.Match(line).Groups
11     if groups.Count > 1:
12         _,var,val = groups
13         name = var.ToString()
14         if not defined[name]:
15             groups = refpat.Match(line).Groups
16             if groups.Count > 1:
17                 _,path,ln = groups
18                 file = Path.GetFileName(path.ToString())
19                 print "${file}:${ln}: ${name} = ${val}"
20             defined[name] = true


Line 4 creates a map, which is equivalent to a Hashtable. We use defined to avoid mentioning the same variable twice. Line 5 defines a regular expression which will match the lines of interest; line 10 shows it in action. Line 12 shows the automatic 'unpacking' of an array into three new variables. (It's a cute feature, but it can bite you if you're wanting to declare more than one variable on a line.) Lines 7 and 8 show a really cool Boo builtin, shellp, which launches a process and returns a Process object. It's then a simple matter to read the standard error output. Line 6 defines a pattern which is used to extract the file path and line number. And here's what we get:

infer.boo:4: defined = Boo.Lang.Hash
infer.boo:5: xtract = System.Text.RegularExpressions.Regex
infer.boo:6: refpat = System.Text.RegularExpressions.Regex
infer.boo:7: argv = (System.String)
infer.boo:7: p = System.Diagnostics.Process
infer.boo:8: inf = System.IO.StreamReader
infer.boo:9: line = System.String
infer.boo:10: groups = System.Text.RegularExpressions.GroupCollection
infer.boo:13: var = System.Object
infer.boo:13: name = System.String
infer.boo:18: path = System.Object
infer.boo:18: file = System.String
infer.boo:19: ln = System.Object
infer.boo:19: val = System.Object
infer.boo:20: true = System.Boolean


There is of course a bug, which occurred to me as I was explaining this code (sometimes known as the rubber duck method of code review). A variable can of course be locally scoped in a function, and really be a different variable. That I think I must leave as an exercise; in the meantime, just comment out line 20. Another cool feature would be an option to generate an annotated listing of the code. That would make hand translation into C# much easier.

Do we Need Another Language?



I appreciate that most people do not learn programming languages for fun but for profit, and so it's a sensible question, particularly if the language is young and obscure, and has a silly name. For big projects one has to pick a mature language which is well-known. But any big project involves repetitive actions which can be easily automated. Before serious coding starts, there's invariably some smaller scale experimentation, familiarization with the problem, perhaps prototypes. (I've heard it suggested that prototypes should not be done in the same language as the product, to prevent sloppy code reuse.) Having an agile little language that understands the .NET framework is very useful.

Little scripts can write grown-up code. For example, it's sometimes useful to have a Pair class that wraps up two values with arbitrary types. Obviously a good candidate for a generic type (and in fact the C++ standard library supplies just such a template class) but I'm assuming that this is good old .NET 1.1 here. Here's a script that generates a C# pair class for some given types:


def capitalize(s as string):
    return s[0:1].ToUpper() + s[1:]

type1 = argv[0]
type2 = argv[1]
pair_name = "Pair"+capitalize(type1)+capitalize(type2)
print """
public class ${pair_name} {
    public ${type1} First;
    public ${type2} Second;

    public ${pair_name}(${type1} first, ${type2} second) {
        First = first;
        Second = second;
    }
}
"""
...
D:\stuff\boo\work>booi pair.boo int int

public class PairIntInt {
    public int First;
    public int Second;

    public PairIntInt(int first, int second) {
        First = first;
        Second = second;
    }
}


Yes, I know, it should expose properties, but that's a trivial exercise. Boo's triple-quoted strings (borrowed from Python) together with string interpolation make this script very straightforward (in fact it worked the first time). So a big project can use throw-away code which is part of the private 'implementation' of the project process, and which doesn't need to meet the standards of production code.
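
For what it's worth, here is one way the property-exposing variant might look, sketched in Python 3 with string.Template, whose ${name} placeholders happen to resemble Boo's interpolation syntax. This is a separate illustration, not a port of the Boo script; make_pair_class and the layout of the generated class are my own assumptions.

```python
from string import Template

PAIR_TEMPLATE = Template("""\
public class ${pair_name} {
    private ${type1} first;
    private ${type2} second;

    public ${pair_name}(${type1} first, ${type2} second) {
        this.first = first;
        this.second = second;
    }

    public ${type1} First { get { return first; } set { first = value; } }
    public ${type2} Second { get { return second; } set { second = value; } }
}
""")

def capitalize(s):
    return s[0:1].upper() + s[1:]

def make_pair_class(type1, type2):
    """Generate C# source for a pair class over the two given type names."""
    pair_name = "Pair" + capitalize(type1) + capitalize(type2)
    return PAIR_TEMPLATE.substitute(pair_name=pair_name, type1=type1, type2=type2)

print(make_pair_class("int", "string"))
```

The generated text sticks to plain fields and explicit accessors, so it should be acceptable to a .NET 1.1 era C# compiler.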

It gets more controversial to consider Boo for implementing unit tests, because such code is considered an essential part of the code base, particularly in Test Driven Development (TDD). The designers of Boo have given thought to this, and provide features that make it a good language for writing tests. There is a powerful assert builtin, and Boo integrates with NUnit.
Tuesday, April 19, 2005
 
The SciTE Programmer's Editor
A good programmer's editor is an essential tool. It's amazing what people have done just using Notepad, and sometimes that's all you need (use a good replacement for a superior driving experience). To do serious work, you need a text editor which understands how code works.

Programmers have strong opinions about editors, which is actually not too surprising considering that it's basically where they live when coding. Some swear by EMACS, but I can't remember all the key bindings and customizing it is a lot of work. If I'm not in Visual Studio, then I'm probably using SciTE. It understands most languages, including HTML, does syntax highlighting, folding, automatic indentation, all the standard stuff. It's completely free (with a liberal Python-style licence), is written in clean and maintainable C++, and runs on both Windows and GTK. It is compact and efficient, which are some of my favourite software virtues. And it's scriptable in Lua.

Languages like C# are pretty verbose, which isn't a problem for reading, but tedious to type. SciTE allows you to specify abbreviations; edit the abbrev.properties file (Options|Open Abbreviations File) and put in these lines:


wl=Console.WriteLine(|)
p=public |
ps=public static |
if=if (|) {\n}\n
cl=public class | {\n\n}\n
sum=/// \n | ///


To insert an abbreviation, type it followed by Ctrl+B. After the text has been inserted, the cursor will be left at the spot indicated by '|'. (This does mean that abbreviations can't contain '|'.) '\n' indicates a newline as usual. You will find that typing 'wl'(Ctrl+B) is much easier on the fingers than 'Console.WriteLine()'.
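
The cursor-marker convention is simple enough to model in a few lines. The following Python 3 sketch is only an illustration of the idea, not SciTE's implementation; in particular, leaving the cursor at the end when there is no '|' is my assumption.

```python
def expand_abbreviation(expansion):
    """Return (text_to_insert, cursor_offset) for an abbreviation body.

    A literal backslash-n in the body becomes a newline, and the first
    '|' is removed and remembered as the cursor position.
    """
    text = expansion.replace("\\n", "\n")
    cursor = text.find("|")
    if cursor == -1:
        return text, len(text)   # assumed behaviour: cursor at the end
    return text[:cursor] + text[cursor + 1:], cursor

text, cursor = expand_abbreviation("Console.WriteLine(|)")
print(text, cursor)
```

For the wl abbreviation the offset lands between the parentheses, ready for the argument to be typed.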

Scripting with SciTE



SciTE scripting actually got me into Lua, which is a little language which comes from Brazil. Here I want to give a flavour of how SciTE scripts can make common coding tasks easier. The first example is declaring and initializing variables in C#, where there is a lot of repetition involved:


ArrayList list = new ArrayList();
int[] arr = new int[30];


What I want is a command which will complete a declaration, so after I type ArrayList list followed by Ctrl+D, the editor will fill out the redundant details. In the case of array declarations it can't of course judge the size, but will leave the cursor between the square brackets. I'm assuming that the C# declaration is on a fresh line, so that the first word will be the type (I'm assuming that you don't leave spaces in array types like 'int[]') which is extracted from the line by a Lua regular expression. If it was not an array, then it needs an extra '()' to make it a constructor call.


scite_Command 'Complete Declaration|complete_declaration()|Ctrl+d'

function complete_declaration()
    local line = editor:GetCurLine()
    -- fetch the first non-blank token on the line
    local s1,s2,classname = string.find(line,'%s*([^%s]+)')
    if not s1 then return end
    local init = ' = new '..classname
    -- if it ends with ']', then we don't have to make it a constructor call
    local was_array = string.sub(classname,-1) == ']'
    if not was_array then
        init = init..'()'
    end
    init = init..';'
    -- this will leave the cursor at the end of the inserted text
    editor:AddText(init)
    -- generally, people are going to have to specify a size for the array
    -- so put the cursor inside the []
    if was_array then
        editor:CharLeft();
        editor:CharLeft();
    end
end


To read this code, mentally replace '%' by '\' in regular expressions (Lua uses '%' because strings may contain C-style escapes.) Note that string concatenation is '..', not '+', and that methods (like editor:GetCurLine()) are called using ':' and not '.'. The standard libraries are inside tables, and do use '.', which is the equivalent of being inside a namespace. Otherwise Lua has a fairly standard BASIC-like syntax. It is considered good style to make your variables explicitly local; otherwise variables are implicitly considered global.
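
If Lua patterns are unfamiliar, it may help to see the same logic with a conventional regular expression. This Python 3 sketch works on a plain string instead of the SciTE editor object, so the function signature and the returned cursor offset are assumptions made for the illustration.

```python
import re

def complete_declaration(line):
    """Return (completed_line, cursor_offset), or None for a blank line."""
    # Lua's '%s*([^%s]+)' corresponds to r'\s*(\S+)' here
    match = re.match(r"\s*(\S+)", line)
    if not match:
        return None
    classname = match.group(1)
    init = " = new " + classname
    was_array = classname.endswith("]")
    if not was_array:
        init += "()"            # make it a constructor call
    init += ";"
    completed = line + init
    cursor = len(completed)
    if was_array:
        cursor -= 2             # step back over ';' and ']' to sit inside []
    return completed, cursor

print(complete_declaration("ArrayList list"))
print(complete_declaration("int[] arr"))
```

The two steps back in the array case mirror the pair of CharLeft calls in the Lua version.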

This macro uses extman, which is a Lua script that simplifies extending SciTE. Please find the latest version here; it also includes the example C# scripts as cs.lua. Extract this zip file to your SciTE program directory (please ensure that the directory structure is preserved), and place the following line in your global properties file (Options|Open Global Options File):


ext.lua.startup.script=extman.lua


There will be a subdirectory called scite_lua, which contains the actual scripts. To add new scripts, merely add .lua files to this subdirectory. (They will only become active when SciTE is restarted.) Any new commands will be available from the Tools menu.

Using Ctags



Exuberant Ctags is one of those tools that make life much easier for programmers. The command ctags *.cs executed in your source directory will create a file tags which contains a sorted list of definitions, or tags, that editors can use to navigate around in code. The SciTE interface to Ctags is written completely in Lua, and supplies a 'find tag' command (Ctrl+., that's a period) and a 'go to mark' command (Alt+.). You may go to the definition of a class or a field using Ctrl+., and this automatically sets a mark, so getting back to your original place in the code is simply Alt+. You can always 'push' a mark explicitly with Ctrl+, which is useful before a Find operation. To use ctags, make sure that extman is configured as above and ctags has been executed, and put this in your properties file:


ctags.path.cxx=/your_source_directory/tags


Using Macros



Last year I wrote a simple macro preprocessor which expands macros in the current buffer, rather like abbreviations. (If you downloaded the latest version of extman, it will be already present.) For example, I've always found the C-style for-loop tedious to type. If I put this in my global properties file


macro.subst.1=for(i,n)=for(int i = 0; i < n; i++)


and type for(k,10), followed by 'Expand Macro' (Alt+Enter), then a single-pass expansion takes place and for(int k = 0; k < 10; k++) is inserted into the buffer. (And this all took only 200-odd lines of Lua.) The most interesting aspect of this macro facility is that you can use it to call Lua code. Any function in a macro definition that begins with '$' is assumed to be an available Lua function; I've supplied a few convenient functions like $eval, which evaluates Lua expressions, $cat to do token-pasting, and $quote to do 'stringizing'.


macro.subst.2=date=$eval(os.date())
macro.subst.3=cs(item)=case $quote(item): return item;


Typing date(Alt+Enter) will insert the current date and time into your document, and typing cs'FINISH'(Alt+Enter) will insert 'case "FINISH": return FINISH;'. (In the Lua spirit, parentheses around a single string parameter are not necessary.) And of course you can write your own Lua functions to be called. That makes these macros a way to execute arbitrary code which may have side effects (and why not?). They may not even result in any substitution other than the empty string! I think of it as a way to deal with the interface problem that any extendable system has. SciTE doesn't currently allow users to arbitrarily configure the menu structure. All user commands end up on the Tools menu, although with the 1.64 preliminary release one can now add items to the context menu. There is a limit to how many hot keys a normal user (such as myself) can remember, and besides, Scintilla has already mapped most of these onto editing functions. So mnemonic short codes such as those used in Abbreviations and Macros make sense, especially since programmers' minds are already pretty good at remembering APIs.
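
To make 'single-pass expansion' concrete, here is a small Python 3 sketch of parameter substitution for definitions written in the name(params)=body form. It is not the Lua macro processor itself: parse_definition and expand are invented names, and the '$' function calls are left out entirely.

```python
import re

def parse_definition(definition):
    """Split 'name(p1,p2)=body' into (name, [p1, p2], body)."""
    head, body = definition.split("=", 1)
    match = re.match(r"(\w+)\((.*)\)$", head)
    name = match.group(1)
    params = [p.strip() for p in match.group(2).split(",")]
    return name, params, body

def expand(definition, call):
    """Expand one macro call; text that is not a call to it is returned as-is."""
    name, params, body = parse_definition(definition)
    match = re.match(re.escape(name) + r"\((.*)\)$", call)
    if not match:
        return call
    args = [a.strip() for a in match.group(1).split(",")]
    mapping = dict(zip(params, args))
    # One pass over the body: substituted text is never rescanned.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, params)) + r")\b")
    return pattern.sub(lambda m: mapping.get(m.group(1), m.group(1)), body)

print(expand("for(i,n)=for(int i = 0; i < n; i++)", "for(k,10)"))
```

Matching parameters on word boundaries is what keeps the i inside int from being replaced along with the loop variable.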

Generating Public Properties



Sometimes good practice involves extra work. For instance, data fields should be private, and be exposed as public properties. This macro simplifies the job of generating public properties:


function Prop(type,var)
    local first_char = string.sub(var,1,1)
    first_char = string.upper(first_char)
    local prop_name = first_char..string.sub(var,2)
    local text = [[
$type $var;

public $type $prop_name {
    get { return $var; }
    set { $var = value; }
}
]]
    local gsub = string.gsub
    text = gsub(text,"$type",type)
    text = gsub(text,"$var",var)
    text = gsub(text,"$prop_name",prop_name)
    return text
end

add_macro 'prop(type,var)=$Prop(type,var)'


In Lua, everything between [[ and ]] is considered a string; it's almost exactly what C# achieves with @"...". The template string contains placeholders starting with '$', and string.gsub does the substitutions. With this definition, prop(int,width) would expand using Alt+Enter to:


int width;

public int Width {
    get { return width; }
    set { width = value; }
}


Generating Strongly-typed Containers



Strongly-typed containers are not only more convenient but make your code clearer. It becomes a compile-time error to attempt to add objects of the wrong type, and objects can be extracted without typecasts. They unfortunately involve a lot of typing. Creative laziness involves getting the computer to do the typing involved in strong typing. First, we define a function which generates a new source file containing the collection.


local template = [[
using System;
using System.Collections;

public class $List : CollectionBase {
    public int Add($ s) {
        return List.Add(s);
    }

    public void Remove($ s) {
        List.Remove(s);
    }

    public $ this[int i] {
        get { return ($)(List[i]); }
    }

    // this is non-standard, but rather useful!
    public $[] AsArray() {
        $[] arr = new $[List.Count];
        List.CopyTo(arr,0);
        return arr;
    }
}
]]

function list(type)
    local classname = type..'List'
    local filename = classname..'.cs'
    -- does this class already exist?
    local f = io.open(filename,'r')
    if f then
        f:close()
        print(filename..' already exists')
        return
    end

    local body = string.gsub(template,'%$',type);
    local f = io.open(filename,'w')
    f:write(body)
    f:close()
    return ''
end

scite_require 'macro.lua'

add_macro 'list(name)=$list(name)'


Say I had a type Customer (which seems the traditional name to use in these examples); I would type list(Customer) in the editor, and then expand the macro with Alt+Enter. The result of the expansion is the empty string, but it has a side-effect of generating a source file CustomerList.cs. This you will still have to add manually to your project, unless you are a fan of csc /recurse:*.cs.

Inserting Custom Comments



Continuing the theme of macros which have side-effects, consider the following need: we wish to enter dated notes into source, but also keep track of these notes. Typing note'please remember to change this!' followed by Alt+Enter results in //*note 04/07/05 sdonovan: please remember to change this! and updates a text file with the comment and the file and line number.


local filename
function set_filename(f)
    filename = f
end

scite_OnSwitchFile(set_filename)
scite_OnOpen(set_filename)

function current_line_number()
    return editor:LineFromPosition(editor.CurrentPos)+1
end

function note(msg)
    local comment = '//*note '..os.date('%d/%m/%y')..' '.. os.getenv("USERNAME")..':'..msg
    local f = io.open('comments.txt','a')
    f:write(comment..'\n')
    f:write(filename..':'..current_line_number()..'\n')
    f:close()
    return comment
end

add_macro 'note(msg)=$note(msg)'


There's no direct way to find the file currently being edited, but one can subscribe to SciTE events (here managed by extman) which keep us up to date. io.open takes a second argument which works just like the mode argument of the C function fopen; here 'a' opens the file for appending.
Wednesday, April 06, 2005
 
The Four Non-negotiables of C++
The best way to understand C++'s strengths and weaknesses is to look at four major design principles, which I've called the Four Non-Negotiables because they are essential to what C++ has become and will not be compromised.

Non-Negotiable #1: C Compatibility



Bjarne Stroustrup is quite happy with C programmers using C++ as 'a better C', and in fact a lot of modern C comes from C++, like the insistence on function prototypes and the use of const. (The C99 standard even allows declarations to appear inside for-loops!) He seems more keen than ever to preserve compatibility in things like mathematical libraries. The most important part of this is not the syntax (after all, C# and Java share most of the syntax, but are very different beasts) but certain old, slack C conversions. For example, the condition of an if-statement can be a pointer or an integer, as well as a bool. This makes it easy to write nasty logic bugs. C++ continues to allow raw access to memory, and people continue to run over the ends of arrays and cause obscure crashes (as opposed to the definite crash you would get in other languages).
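Here is a small sketch of the kind of logic bug those conditions invite. The function names are invented for illustration. The first function is meant to ask whether a count is positive, but it tests the pointer rather than the value it points to, and the compiler accepts it without complaint.

```cpp
// Hypothetical helpers, invented for this example.

// Intended to answer "is the count greater than zero?", but the condition
// tests the pointer itself. A pointer converts to bool, so this compiles
// and is true for any non-null pointer, even one pointing at zero.
bool buggy_has_items(const int* count) {
    if (count)
        return true;
    return false;
}

// What was meant: check the pointer, then the value it points to.
bool has_items(const int* count) {
    return count != nullptr && *count > 0;
}

// An int is accepted as a condition too. Any non-zero value counts as true,
// including a negative value that the caller may have meant as an error code.
bool treated_as_true(int status) {
    if (status)
        return true;
    return false;
}
```

In C# or Java the first and third functions would be rejected, since the condition must be a boolean expression.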

Perhaps the most important thing is the way C++ programs are built. They compile to standard object files and are linked almost exactly like C programs. C++ mangles symbols so that an essentially dumb linker can tell overloaded functions apart, and can detect an attempt to link to a function with the wrong signature. (You may be forgiven for thinking that it was to make life difficult for debuggers.) It remains trivial to link C code with C++ programs.
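A minimal sketch of what this means in source terms. The function names are invented for illustration, and the exact mangled symbols depend on the compiler, so none are shown.

```cpp
// Two overloads share the source-level name `scale`. The compiler encodes the
// parameter types into each symbol, so the linker sees two distinct functions.
double scale(double x) {
    return 2.0 * x;
}

double scale(double x, double factor) {
    return factor * x;
}

// extern "C" asks for an unmangled, C-style symbol. That is what lets a C
// program call this function, and it is also why extern "C" functions cannot
// be overloaded: there is only one plain name to go round.
extern "C" double scale_by(double x, double factor) {
    return scale(x, factor);
}
```

Running a symbol-listing tool over the resulting object file is a quick way to see the difference between the two mangled names and the plain one.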

The upshot of this is that C++ still uses a method of creating executables that grew up with Unix in the Seventies.

Non-Negotiable #2: Minimal Core



This may come across as a joke, but C++ remains a small language in the sense that C is; a C++ program requires very little from the runtime library. The minimal library support is for the new and delete operators, and for exception handling, and even that is optional. Contrast this with older languages like FORTRAN, which had built-in statements for doing IO, etc. So it remains possible to make very small executables for embedded environments, etc. Everything else is libraries, and there sure are a lot of those.

This has consequences. For example, it's unlikely that a C++ equivalent of C#'s foreach statement would be accepted, because foreach must make assumptions about libraries. In the case of C# it must assume that there is a pair of interfaces called IEnumerable and IEnumerator; the container object must implement IEnumerable, which involves generating an iterator object which implements IEnumerator. In the case of C++, such a statement would probably have to assume that the container had STL-style iterators (begin(), end(), etc). But that would introduce a detailed dependence of the language on particular libraries, and that would probably be unacceptable.
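To make the dependence concrete, here is a sketch of the loop that a hypothetical C++ foreach would have to expand into. The function sum_all is my own name, not anything standard; the point is that it only compiles for containers which supply const_iterator, begin() and end().

```cpp
#include <list>
#include <vector>

// Sums the elements of any container with STL-style iterators.
// The loop header is exactly the set of assumptions a built-in foreach
// would have to bake into the language.
template <typename Container>
double sum_all(const Container& c) {
    double total = 0.0;
    for (typename Container::const_iterator it = c.begin(); it != c.end(); ++it)
        total += *it;
    return total;
}
```

The same function works for std::vector and std::list because both follow the library convention, not because the language knows anything about them.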

Non-Negotiable #3: Abstraction and Strong Typing



C++ can sing in the hands of a master because code can be written at the level of the problem. The art is generating a new idiom that can precisely and succinctly express the solution. So C++ allows the programmer almost complete power to define a type's behaviour, its conversions to and from other types, what operators will mean, etc. Most of the language's complexity comes from this machinery, and so it is much more difficult to write libraries than to use them. (Which is how it should be.) From a Java or C# perspective, C++ seems more weakly typed: it will implicitly convert from a double to an int and only issue a warning. But in important matters there are few allowed conversions.
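A small sketch of both sides of this, using types I have made up for the purpose (Vec2 and Metres are not from any library). The operators let vector arithmetic read like the mathematics, the explicit constructor shows the programmer deciding which conversions are allowed, and to_int shows the slack built-in conversion.

```cpp
// Hypothetical 2-D vector type, for illustration only.
struct Vec2 {
    double x, y;
    Vec2(double x_, double y_) : x(x_), y(y_) {}
};

// Operators defined by the programmer give the type its own notation.
Vec2 operator+(const Vec2& a, const Vec2& b) { return Vec2(a.x + b.x, a.y + b.y); }
Vec2 operator*(double k, const Vec2& v) { return Vec2(k * v.x, k * v.y); }
bool operator==(const Vec2& a, const Vec2& b) { return a.x == b.x && a.y == b.y; }

// Hypothetical wrapper type. Because the constructor is explicit, a bare
// double is never silently treated as a Metres; the conversion must be
// written out at the call site.
class Metres {
public:
    explicit Metres(double v) : value_(v) {}
    double value() const { return value_; }
private:
    double value_;
};

double area(Metres w, Metres h) { return w.value() * h.value(); }

// The built-in conversion that looks weak from a Java or C# viewpoint:
// double to int compiles (some compilers warn) and drops the fraction.
int to_int(double d) {
    int i = d;
    return i;
}
```

With these definitions area(2.0, 3.0) does not compile, while area(Metres(2.0), Metres(3.0)) does.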

Non-Negotiable #4: Performance



One of Stroustrup's mottos is "you don't pay for what you don't use".

The C++ ideal is to not have to pay for abstraction. For instance, OOP doctrine tells us that we should access class data using accessor functions, to allow that class freedom to change its implementation later. In C++, such simple functions will be inlined and so there's no penalty for their use. (This isn't a unique virtue, since Java does this also). So a programmer can make decisions based on a higher-level design and not be constrained by performance issues. An example of C++'s performance mindset is that function calls are not virtual unless specified otherwise because of the slight extra cost and (most importantly) the difficulty of inlining virtual functions. Typecasts in C++ are all static except for dynamic_cast.
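A sketch of both points, with class names invented for the example. The accessors are defined inside the class body, which makes them implicitly inline. The second pair of classes shows what "not virtual unless specified" means when a derived object is used through a base reference.

```cpp
// Hypothetical classes, for illustration only.
class Point {
public:
    Point(int x, int y) : x_(x), y_(y) {}
    // Defined in the class body, so implicitly inline: the compiler is free
    // to replace p.x() with a direct read of the member.
    int x() const { return x_; }
    int y() const { return y_; }
private:
    int x_, y_;
};

struct Base {
    virtual ~Base() {}
    int plain() const { return 1; }             // non-virtual: bound at compile time
    virtual int dynamic() const { return 1; }   // virtual: bound at run time
};

struct Derived : Base {
    int plain() const { return 2; }             // hides Base::plain, does not override it
    int dynamic() const { return 2; }           // overrides Base::dynamic
};

// Both calls go through a reference to Base.
int call_plain(const Base& b) { return b.plain(); }
int call_dynamic(const Base& b) { return b.dynamic(); }
```

Passing a Derived to call_plain runs Base::plain, because the call was resolved from the static type; call_dynamic runs Derived::dynamic.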

Very powerful template techniques have been developed (e.g. Blitz++) that allow vector and matrix operations which are essentially inlineable.

All obsessions have a downside. An over-emphasis on speed and strong typing results in bloated executables. Most of the standard library is templatized and much of it gets inlined by default. Such all-over optimizations probably even affect ultimate performance because of cache overload.
Tuesday, April 05, 2005
 
Why is C# operator overloading restricted?
Occasionally it makes sense to borrow a style from somewhere else. For example, I come from a C++ background and would be comfortable with this way of reading items from standard input:

CInStream cin = new CInStream(Console.In);
double x,y,z;
cin >> out x >> out y >> out z;

There must be an 'out', because C# requires references to be specially marked, but otherwise it looks rather like the iostreams C++ library.

Quite apart from whether you like this style, it's a non-starter. First, the second argument of operator>> must be an integer. Clearly the designers felt that right-shift operators should not be arbitrarily redefined. Second, C# will not allow any old expression as a statement; to quote the ultimate authority (CSC) "error CS0201: Only assignment, call, increment, decrement, and new object expressions can be used as a statement". As the SDK comments, "The compiler generates an error when it encounters a meaningless statement". Third, the compiler gets very confused if you use 'out' in this context; it may only appear in function call lists and declarations.

Apart from the third problem (which comes from an insufficiently general grammar) these are deliberate restrictions. It would probably be rather easy to relax them. One can't second-guess an arbitrary expression in the presence of operator overloading, since such overloads may have side-effects. My question is, if one is going to have operator overloading, why not make it a first-class property of the language and allow it to be used without restrictions?

Operator overloading seems to have a bad press. Java is famous for not allowing it, and is generally against 'syntactical sugar' (except for '+' meaning string concatenation, of course). In a sample set of Java interview questions we get the received opinion: "Because C++ has proven by example that operator overloading makes code almost impossible to maintain."

My feeling is that imposing rules is not the job of the language. If you are the responsible adult in a programming project, it is appropriate for you to enforce rules, and one of those rules may be 'no operator overloading unless you can really convince me otherwise'. (I'm sure a sufficiently sophisticated version control system could enforce such code restrictions.)

Part of the problem is that the naming problem is particularly intense with operators. The existing C-style operators all have very specific meanings, and moving far from these is going to confuse people. Operator abuse is often just a case of badly-named methods.

My guess is that the designers of C# wanted to restrict operator overloading so that it would be used for its most natural use, which is expressing arithmetic operations and comparisons for value types. There is a lot to be said for this; matrices, vectors, complex numbers, etc. all define multiplication and it is convenient to use standard mathematical notation. But extra freedom could be very useful - in moderation.
Monday, April 04, 2005
 
Reading numbers from a file in C#

This may seem like an elementary topic in .NET programming,
but it's not as obvious as it seems at first. It is of
course easy to read in lines from a text file:


TextReader rdr = File.OpenText(file);
string line;
while ((line = rdr.ReadLine()) != null) {
    Console.WriteLine(line);
}
rdr.Close();

Getting numbers from these lines is the part that isn't
obvious; TextReader provides us with no help here. We have
to split the string into its parts, and convert each one
of these to a number. String.Split() appears to do the
trick:


string[] fields = line.Split(new char[]{' '});


It's less awkward to use the default:

string[] fields = line.Split(null);

But the result is the same. Like the similar function in
Visual Basic, Split will give us blank fields if there's
more than one space between fields.

Apparently, Split will be overhauled for the upcoming .NET 2.0
release, but we have to work with what we have now. The
regular expression classes in System.Text.RegularExpressions
give a very powerful way to split strings.


Regex spaces = new Regex(@"\s+");
string[] fields = spaces.Split(line);

The regular expression '\s+' is our delimiter and means
'one or more whitespace characters' (it will apply to tabs
as well as spaces, for instance). This indeed does the
job.

To convert a string to a number is easy, but you must always
be prepared for a bad conversion exception. Here the exception
is just eaten up:

 
try {
    val = double.Parse(str);
} catch {
    val = 0.0;
}

This is getting complicated, so I'm going to define a class Parser
which handles the details. Here is how Parser is meant to be used:


TextReader rdr = File.OpenText("test.txt");
Parser p = new Parser(rdr);
double[] values;
while ((values = p.ReadFloats()) != null) {
    double x = values[0];
    float y = (float)values[1];
    int i = (int)values[2];
}
rdr.Close();

The full definition of Parser is:


using System;
using System.IO;
using System.Text.RegularExpressions;

public class Parser {
    // one or more whitespace characters separate the fields
    static Regex spaces = new Regex(@"\s+");
    TextReader rdr;

    public Parser(TextReader tr) {
        rdr = tr;
    }

    // returns the fields of the next line, or null at end of file
    public string[] ReadStrings() {
        string line = rdr.ReadLine();
        if (line == null)
            return null;
        return spaces.Split(line);
    }

    // returns the next line as numbers, or null at end of file
    public double[] ReadFloats() {
        string[] fields = ReadStrings();
        if (fields == null)
            return null;
        // skip the empty first field produced by leading whitespace
        int istart = (fields[0].Length == 0) ? 1 : 0;
        double[] obj = new double[fields.Length - istart];
        for (int i = 0; i < obj.Length; i++) {
            try {
                obj[i] = double.Parse(fields[i+istart]);
            } catch {
                // a field that is not a number becomes zero
                obj[i] = 0;
            }
        }
        return obj;
    }
}

There is a gotcha in ReadFloats(); if the line begins with
a space, then the first string field will be empty. Float conversion
errors are ignored, which you may not like; it's easy to fix
that.

This class generates a lot of temporary arrays, so I expected
it to be fairly slow. Here for comparison is a similar C++
program:


#include <iostream>
#include <fstream>
using namespace std;

int main()
{
    float x,y,z;
    ifstream in("test.txt");
    int k = 0;
    while (in >> x >> y >> z)
        k++;
    cout << k << endl;
}

Testing this with 10,000 lines containing three numbers each,
this took 2 seconds on my old laptop; the C# version using
Parser took 3 seconds. Which seems acceptable to me; it would be
an interesting exercise to recode it to directly look for
whitespace and not generate new arrays for each line, but it
will not make an enormous difference in runtime.

Lack of direct library support for plain text files containing
numbers seems a bit odd. Perhaps the designers of the .NET
libraries felt that nobody would be so last century as to
mess with text data, when the glories of XML are so accessible.
But there are a lot of plain text files out there, and they
need to be read. And they are easier on the eye.

