Wandering around .NET
Monday, April 04, 2005
 
Reading numbers from a file in C#

This may seem like an elementary topic in .NET programming,
but it's not as obvious as it seems at first. It is of
course easy to read in lines from a text file:


TextReader rdr = File.OpenText(file);
string line;
while ((line = rdr.ReadLine()) != null) {
    Console.WriteLine(line);
}
rdr.Close();

Getting numbers from these lines is the part that isn't
obvious; TextReader provides us with no help here. We have
to split the string into its parts, and convert each one
of these to a number. String.Split() appears to do the
trick:


string[] fields = line.Split(new char[]{' '});


It's less awkward to use the default:

string[] fields = line.Split(null);

But the result is much the same, except that the default splits
on any whitespace character, not just spaces. Like the similar
function in Visual Basic, Split will give us blank fields if
there's more than one space between fields.
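
To make the blank fields concrete, here is a small sketch. The class
name SplitDemo, the helper SplitOnSpaces and the sample line are made
up for illustration; the line has two spaces between its first two
numbers.

```csharp
using System;

public static class SplitDemo
{
    // Splits on single spaces, exactly like the first Split call above.
    public static string[] SplitOnSpaces(string line)
    {
        return line.Split(new char[] { ' ' });
    }

    public static void Main()
    {
        // Two spaces between 10 and 20 produce an empty field between them.
        string[] fields = SplitOnSpaces("10  20 30");
        Console.WriteLine(fields.Length);      // prints 4, not 3
        foreach (string f in fields)
            Console.WriteLine("[" + f + "]");  // [10] [] [20] [30], one per line
    }
}
```

Any code that converts these fields to numbers has to skip the empty
one or treat it as an error.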

Apparently, Split will be overhauled for the upcoming .NET 2.0
release, but we have to work with what we have now. The
regular expression classes in System.Text.RegularExpressions
give us a much more powerful way to split strings:


Regex spaces = new Regex(@"\s+");
string[] fields = spaces.Split(line);

The regular expression '\s+' is our delimiter and means
'one or more whitespace characters' (it will apply to tabs
as well as spaces, for instance). This indeed does the
job.
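
As a quick check of that behaviour, here is a sketch with a line that
mixes tabs and runs of spaces. The names RegexSplitDemo and Fields are
hypothetical; only Regex.Split itself comes from the library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSplitDemo
{
    static readonly Regex spaces = new Regex(@"\s+");

    // Splits a line on runs of whitespace (spaces, tabs, and so on).
    public static string[] Fields(string line)
    {
        return spaces.Split(line);
    }

    public static void Main()
    {
        // A space-tab-space run and a triple space each count as one delimiter.
        Console.WriteLine(Fields("1.5 \t 2.5   3").Length);  // prints 3

        // Leading whitespace still matches, so the first field is empty.
        string[] fields = Fields("  1.5 2.5");
        Console.WriteLine(fields.Length);                    // prints 3
        Console.WriteLine(fields[0].Length);                 // prints 0
    }
}
```

The second case matters later: a line that starts with whitespace
produces an empty first field, which the caller has to deal with.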

Converting a string to a number is easy, but you must always
be prepared for a conversion exception (double.Parse throws a
FormatException on bad input). Here the exception is simply
swallowed:

 
try {
    val = double.Parse(str);
} catch {
    val = 0.0;
}

This is getting complicated, so I'm going to define a class Parser
which handles the details. Here is how Parser is meant to be used:


TextReader rdr = File.OpenText("test.txt");
Parser p = new Parser(rdr);
double[] values;
while ((values = p.ReadFloats()) != null) {
    double x = values[0];
    float y = (float)values[1];
    int i = (int)values[2];
}
rdr.Close();

The full definition of Parser is:


using System;
using System.IO;
using System.Text.RegularExpressions;

public class Parser {
    // One or more whitespace characters separate the fields.
    static Regex spaces = new Regex(@"\s+");
    TextReader rdr;

    public Parser(TextReader tr) {
        rdr = tr;
    }

    // Returns the fields of the next line, or null at end of file.
    public string[] ReadStrings() {
        string line = rdr.ReadLine();
        if (line == null)
            return null;
        return spaces.Split(line);
    }

    // Returns the numbers on the next line, or null at end of file.
    public double[] ReadFloats() {
        string[] fields = ReadStrings();
        if (fields == null)
            return null;
        // Skip the empty first field produced by leading whitespace.
        int istart = (fields[0].Length == 0) ? 1 : 0;
        double[] obj = new double[fields.Length - istart];
        for (int i = 0; i < obj.Length; i++) {
            try {
                obj[i] = double.Parse(fields[i+istart]);
            } catch {
                // Fields that fail to convert silently become 0.
                obj[i] = 0;
            }
        }
        return obj;
    }
}

There is a gotcha in ReadFloats(): if the line begins with
whitespace, then the first string field will be empty, which is
why istart skips it. Float conversion errors are ignored, which
you may not like; it's easy to fix that.
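
One possible fix is sketched below: a strict conversion that reports
the offending field instead of turning it into 0. The class
StrictFloats, the method ParseLine and the lineNumber parameter are all
hypothetical names, not part of Parser. Parsing with
CultureInfo.InvariantCulture is also an assumption on my part; the
original code uses the current culture.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class StrictFloats
{
    static readonly Regex spaces = new Regex(@"\s+");

    // Converts one line to doubles; throws a FormatException that names
    // the line and the field when a field is not a number.
    public static double[] ParseLine(string line, int lineNumber)
    {
        string trimmed = line.Trim();
        if (trimmed.Length == 0)
            return new double[0];   // blank line: no values

        string[] fields = spaces.Split(trimmed);
        double[] values = new double[fields.Length];
        for (int i = 0; i < fields.Length; i++)
        {
            // Assumption: InvariantCulture, so "1.5" means the same on every machine.
            if (!double.TryParse(fields[i], NumberStyles.Float,
                                 CultureInfo.InvariantCulture, out values[i]))
                throw new FormatException("line " + lineNumber +
                    ": cannot convert '" + fields[i] + "' to a number");
        }
        return values;
    }

    public static void Main()
    {
        Console.WriteLine(ParseLine(" 1.5 2 3e2", 1).Length);  // prints 3
        try {
            ParseLine("1.5 abc 3", 2);
        } catch (FormatException e) {
            Console.WriteLine(e.Message);  // line 2: cannot convert 'abc' to a number
        }
    }
}
```

Whether to throw, skip the line, or record the error and carry on
depends on how much you trust the file.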

This class generates a lot of temporary arrays, so I expected
it to be fairly slow. Here for comparison is a similar C++
program:


#include <iostream>
#include <fstream>
using namespace std;

int main()
{
    float x,y,z;
    ifstream in("test.txt");
    int k = 0;
    while (in >> x >> y >> z)
        k++;
    cout << k << endl;
}

I tested this with 10,000 lines containing three numbers each:
the C++ program took 2 seconds on my old laptop, and the C#
version using Parser took 3 seconds. That seems acceptable to me.
It would be an interesting exercise to recode Parser to look
directly for whitespace and not generate new arrays for each
line, but it would not make an enormous difference in runtime.
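
For the curious, here is a rough sketch of that exercise. It scans the
line for whitespace by hand and stores the values in a buffer supplied
by the caller, so no array is created per line (the Substring call
still creates a short string for each field). WhitespaceScanner and
ReadInto are hypothetical names, InvariantCulture is my assumption, and
I have not timed this against Parser.

```csharp
using System;
using System.Globalization;

public static class WhitespaceScanner
{
    // Parses whitespace-separated numbers from line into buffer and returns
    // how many were stored. Bad fields become 0, as in Parser; fields beyond
    // the end of the buffer are ignored.
    public static int ReadInto(string line, double[] buffer)
    {
        int count = 0;
        int pos = 0;
        while (pos < line.Length && count < buffer.Length)
        {
            // Skip the whitespace before the next field.
            while (pos < line.Length && char.IsWhiteSpace(line[pos]))
                pos++;
            if (pos == line.Length)
                break;

            // Find the end of the field.
            int start = pos;
            while (pos < line.Length && !char.IsWhiteSpace(line[pos]))
                pos++;

            double value;
            if (!double.TryParse(line.Substring(start, pos - start), NumberStyles.Float,
                                 CultureInfo.InvariantCulture, out value))
                value = 0;
            buffer[count++] = value;
        }
        return count;
    }

    public static void Main()
    {
        double[] buffer = new double[3];   // reused for every line
        int n = ReadInto("  1.5\t2   x", buffer);
        Console.WriteLine(n);              // prints 3
    }
}
```

Leading and trailing whitespace need no special handling here, because
the scanner never produces empty fields in the first place.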

Lack of direct library support for plain text files containing
numbers seems a bit odd. Perhaps the designers of the .NET
libraries felt that nobody would be so last century as to
mess with text data, when the glories of XML are so accessible.
But there are a lot of plain text files out there, and they
need to be read. And they are easier on the eye.

Comments:
Good job. I also liked the comment at the end of the page. Thanks a lot.
 
Very useful code; I really like it. Just two suggestions to improve it:

1. You can eliminate the try-catch block by using double.TryParse (remember, exceptions are very slow):
if (!double.TryParse(fields[i], out obj[i]))
    obj[i] = 0;

2. You can eliminate the test for blanks at the start of the string with a simple Trim when reading the line:
string line = rdr.ReadLine().Trim();

The whole code was:

public class TextFileParser
{
    static Regex spaces = new Regex(@"\s+");
    TextReader rdr;
    public TextFileParser(TextReader tr)
    {
        rdr = tr;
    }
    public string[] ReadStrings()
    {
        string line = rdr.ReadLine().Trim();
        if (line == null)
            return null;
        return spaces.Split(line);
    }
    public double[] ReadFloats()
    {
        string[] fields = ReadStrings();
        if (fields == null)
            return null;
        double[] obj = new double[fields.Length];
        for (int i = 0; i < obj.Length; i++)
        {
            if (!double.TryParse(fields[i], out obj[i]))
                obj[i] = 0;
        }
        return obj;
    }
}
 
Sorry.
The Trim must be applied at the Split, to avoid an exception when ReadLine returns null:

public string[] ReadStrings()
{
    string line = rdr.ReadLine();
    if (line == null)
        return null;
    return spaces.Split(line.Trim());
}
 