Monday, 2 July 2012

BIG NUMBER BASHING

Dick Pountain/Idealog 66/20 January 2000

The Millennium bug turned out to be a damp squib - no surprise there - but the braver among the IT spin doctors are now claiming that this was thanks to the billions spent on their remedies. This logic is unassailable, like that of the chap who was seen throwing confetti out of a London-to-Basildon train window to keep tigers at bay (it proved 100% effective). I missed the fuss personally by choosing to spend several weeks somewhere far away and sunny, not because I was afraid the world was about to end, but precisely because I was sure that it wasn't.

On my return I find myself faced with cold grey skies and two pressing chores, the first being to write this column and the second to convert my phone number database ready for BT's Big Number changeover in April. The last time BT mucked about with the dialling codes I wrote a little program in Turbo Pascal to mangle my database but this time I thought of a sweeter wheeze - why not kill two birds with one stone and write a column about it? I realise there are many freeware utilities out there that will handle the change for you (including a Wizard from our own Simon Jones) but I have already confessed here, back in issue 61, that I'm sadly addicted to file converting, and opportunities to do it in earnest are not to be missed. Lots of readers mailed me about that column, asking for more information on using regular expressions, so I'll devote this column to describing just how I did my Big Number changeover using a simple text editor. Conveniently enough, Helios Software Solutions' TextPad 4, which has a full regular expression search-and-replace feature, is one of the Essential Apps contained on every month's PC Pro cover disk, so I used that to illustrate. It employs the standard Unix/POSIX syntax for regular expressions, but I regret that some other text editors use totally different symbols for wild cards and control characters.

The process is simple enough to describe: first export all your phonebook records in some ASCII text format; perform the Big Number conversion on the contents of this text file; delete all the records from the phone database; and finally reimport the converted text file to repopulate the database. The whole process took me a little over 15 minutes on my phonebook of 704 records. I keep my phonebook in Palm Desktop nowadays so I used its File | Export command and chose Tab Delimited as the output format (though Comma Delimited would do just as well). Before deleting all the records I backed up the binary database files to my Zip drive, and after deleting them I performed a dry run by reimporting the unconverted text file just to make sure that the Import option works - paranoid, that's me.
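For anyone who would rather script the export, convert and reimport round trip than do it by hand, here is a minimal Python sketch. It assumes the export is tab-delimited with every field wrapped in double quotes; the function name convert_export and the one-record sample are hypothetical, made up for illustration, and nothing here is part of Palm Desktop or TextPad.

```python
import csv
import io

def convert_export(text, convert_field):
    """Apply convert_field to every field of a tab-delimited, quoted export."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar='"')
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", quotechar='"',
                        quoting=csv.QUOTE_ALL, lineterminator="\n")
    for record in reader:
        writer.writerow([convert_field(field) for field in record])
    return out.getvalue()

# Hypothetical one-record export (assumed layout, dummy phone number).
sample = '"Academic Press"\t"0171-000-0000"\t"24-28 Oval Road, London NW1 7DX"\n'

# A conversion that changes nothing, the scripted cousin of the dry run:
# the text should survive the round trip untouched.
unchanged = convert_export(sample, lambda field: field)
```

Passing a do-nothing conversion first is the same paranoia as reimporting the unconverted file: it checks the plumbing before any real mangling happens.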

Now to the nitty-gritty of finding and replacing the phone codes. Fortunately I keep my phone number data pretty clean, with area code always joined by a hyphen and no parentheses, as in 0171-000-0000. Before actually replacing anything I did a simple Find | Mark All from TextPad's Search menu to establish roughly how many items I was dealing with. Searching for the regular expression 01[78]1- told me there were 329 phone numbers containing either 0171- or 0181-. The [78] part is called a class expression, and it matches any of the characters written within its brackets (here either a 7 or an 8). You may also employ ranges so that [7-9] would match 7, 8 or 9. I was not quite satisfied yet because some rogue number might just possibly contain a sequence like 30171-, so I searched again for "01[78]1- since Palm Desktop's tab delimited format puts double quotes around each field:

"Academic Press"    "0171-000-0000"     "24-28 Oval Road, London NW1 7DX"

I discovered there were now only 309 matches, but a further search established the missing 20 all had either a space or a newline before the 01 and there were no real rogues. It's always worth doing a little analysis like this before committing to mangle a really large file: you can always use Alt-Z or File | Revert to undo any changes made during such preliminary investigations.
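The same preliminary counting can be sketched in Python's re module, which happens to share the bracket syntax for class expressions. The three-record sample below is invented (it is not my phonebook), so the counts are only illustrative.

```python
import re

# Hypothetical export fragment; none of these numbers are real.
sample = (
    '"Academic Press"\t"0171-000-0000"\t"24-28 Oval Road"\n'
    '"Some Studio"\t"0181-000-0000"\t"Work 0171-000-0001"\n'
    '"Out of Town"\t"01223-000-000"\t"Cambridge"\n'
)

# Class expression: [78] matches a single 7 or 8.
all_hits = re.findall(r'01[78]1-', sample)

# Stricter search: the code must open a double-quoted field.
quoted_hits = re.findall(r'"01[78]1-', sample)

# The difference: hits preceded by some character other than a double quote.
# (A hit at the very start of the text would be missed by this pattern.)
stragglers = re.findall(r'[^"]01[78]1-', sample)

# Ranges work too: [7-9] matches 7, 8 or 9.
matches_nine = re.fullmatch(r'[7-9]', '9') is not None
```

Here all_hits has three entries, quoted_hits two, and the single straggler is the number preceded by a space inside the third field, which is exactly the sort of thing worth eyeballing before replacing anything.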

Having decided these were the numbers I really wanted to change I used a tagged regular expression and a replacement expression to do the work.

Find: 01\([78]\)1-
Replace: 020-\1

Those extra brackets \( \) around the [78] turn it into a tagged expression which saves the actual value that matched it in a hidden variable: you can tag up to 9 different subexpressions within a single search string, and retrieve their stored values by referring to them in order as \1, \2 up to \9. Hence in my replacement expression 020-\1 the \1 will be replaced by either a 7 or an 8 depending on which was actually found in any particular instance. Changing a few entries for Cardiff, Southampton, Portsmouth and Coventry via individual searches took only a couple of minutes more and I didn't bother with the mobile and pager numbers since the handful of people I contact via their mobiles appear to change their number every three weeks anyway - I'm prepared to change those as I go.
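The tagged search-and-replace above translates fairly directly into Python, with one syntactic difference worth flagging: Python's re module writes the tagging brackets as plain ( ) rather than \( \). What follows is a sketch under that assumption, run on a made-up record; the names BIG_NUMBER and convert are mine, not TextPad's.

```python
import re

# TextPad's POSIX-style \( \) tags become plain ( ) in Python's re.
BIG_NUMBER = re.compile(r'01([78])1-')

def convert(text):
    """Return (converted_text, number_of_replacements)."""
    # \1 in the replacement is whatever the tagged [78] matched.
    return BIG_NUMBER.subn(r'020-\1', text)

# Hypothetical record with dummy numbers.
converted, count = convert('"Academic Press"\t"0171-000-0000"\t"0181-000-0000"')
```

The count returned alongside the converted text plays the same role as Find | Mark All: if it disagrees with the number of matches you counted beforehand, stop and look before reimporting.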

You can do lots more with regular expressions than I've touched on here, like 'anchoring' them to match only the beginning (or end) of lines and words, or negating them so that [^A-Z] for example will match anything except an uppercase letter. The best way to master them is to play with them using TextPad's reasonably good Help screens. Happy matching.
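As a parting sketch, here is how anchoring and negation look in Python's re module. The symbols are Python's and may differ from those in your text editor (Python spells a word boundary \b, for instance), and the sample strings are invented.

```python
import re

lines = "0171-000-0000\nfax 0171-000-0001\n"

# ^ anchors to the start of each line when re.MULTILINE is set.
at_line_start = re.findall(r'^01[78]1-', lines, flags=re.MULTILINE)

# \b anchors to a word boundary, so the 0171- buried in 30171- is skipped.
at_word_start = re.findall(r'\b01[78]1-', "30171-000 0171-000")

# Negated class: anything except an uppercase letter.
non_upper = re.findall(r'[^A-Z]', "NW1 7DX")
```

Only the first line of the sample begins with a code, only the second number in the word-boundary test starts a word, and the negated class picks out the digits and the space from the postcode.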
