Dick Pountain (21/08/1999 2:51pm): Idealog 61
I've had this nagging feeling recently that something was changing in my life, but only after re-reading my last half dozen columns have I been able to put my finger on it - I'm turning into a computer 'user'. In the year since the demise of that US magazine I used to work for, I've not been reviewing computer products, and no longer need the very latest version of everything to be on my hard disk. I have however taken on several large writing projects, and so my computer has become a tool again, instead of a source of amusement or curiosity. When it doesn't work I get very angry indeed, and the idea of fiddling with its innards no longer interests me one little bit. I realize that this is the lot of the vast majority of computer users, but it's a new state of affairs for me, because for the last 18 years I've been playing with computers. All my software has arrived free from a PR agency, and I've installed new software for the heck of it, to see what it does. Ditto with operating systems and learning new programming languages. Now I just need the damn thing to stay up and do what I tell it, with no back-chat.
There is however one way in which I will never, ever, become a 'civilian', one area where I will always be tempted to fiddle, and that is file wrangling. As both a writer and a player-with-computers, I have over the years acquired immense expertise in processing text into different file formats, born out of the repeated need to export my back-archive of work from one word processor, or database, or operating system, or computer to a new one. I've accumulated an arsenal of tools to help with this job, even written some myself, but most important of all, I love it. What gives most people the willies, namely converting data from one file format to another, actually gives me a buzz - a feeling of satisfaction, when it works, comparable to hitting the bullseye with an arrow or breaking the finishing tape with your chest. How sad is that? In fact I'm so fond of file wrestling that I'll often spend a couple of hours writing a script to transform a file, even when it is only going to save me an hour of manual work. It's the principle that counts - if I ever get a coat-of-arms the motto will be 'Never Type It Twice' (perhaps in Latin or Mediaeval French...).
In any case, it's my experience that whenever I spend two hours to save one in this fashion, surprisingly often I'll find that several years later another problem comes along for which that same solution saves me weeks. I'm not just talking about things like changing all the dialling codes in my address book from 01 to 0171, or now to 0207 (I wrote a Turbo Pascal function to do that one). No, I'm talking serious processing here, like taking my database of 700-odd address records and separating them into the ones that have a company name and those that have a person's name, then turning the people's names from, say, Avril Williams to Williams, Avril, all with the absolute minimum of manual intervention. Or cutting-and-shutting the said address database to go onto my Palm Pilot. Or taking a database of 6,000 short essays from an Idealist database, converting them to SGML and importing them into a hypertext authoring tool.
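For anyone curious what the address-book job might look like, here is a minimal sketch in Python rather than the tools named in this column. The record layout is purely an assumption for illustration: each record is taken to be a dictionary with hypothetical 'name' and 'company' fields, and a record counts as a person when its 'company' field is empty. The function names are invented too.

```python
def swap_name(name):
    """Turn 'Avril Williams' into 'Williams, Avril'.

    The last word is treated as the surname, which is an assumption:
    multi-word surnames would need handling by hand.
    """
    parts = name.split()
    if len(parts) < 2:
        return name.strip()  # a single word: nothing to swap
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def split_records(records):
    """Separate records into (companies, people), surname-first for people.

    'company' and 'name' are hypothetical field names, not taken from
    any real address-book format.
    """
    companies = [r for r in records if r.get("company")]
    people = [
        dict(r, name=swap_name(r["name"]))
        for r in records
        if not r.get("company")
    ]
    return companies, people


if __name__ == "__main__":
    book = [
        {"name": "Avril Williams", "company": ""},
        {"name": "Sales Desk", "company": "Example Widgets Ltd"},
    ]
    companies, people = split_records(book)
    print(people[0]["name"])
```

The few records that break the last-word-is-surname rule are the 'absolute minimum of manual intervention' part.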
The principal tools I've gathered about me to help with these tasks are a powerful text editor that supports regular expressions, a scripting language called Nife (like Perl with a nicer syntax), and a tool I wrote myself that sucks the unique words out of files and sorts them in various ways. Regular expressions are a source of great anxiety to anyone not brought up on Unix, but the rewards for learning how to use them are so enormous that I beseech you to try them if you seriously need to mangle text. Take yesterday: I imported those 6,000 short essays, containing cross-references marked up in SGML, into the hypertext tool. They all imported OK, but running the checker showed that over 500 of the cross-references could not be bound because their target word differed in some way from my reference: maybe the word was 'binary-tree' but I'd typed 'binary tree'. It would have taken me days to fix all these by hand, but by searching for 's">' I discovered that no fewer than 290 of these broken references were caused by plurals, where I'd typed say 'fonts' instead of 'font'. An easy search-and-replace operation turned <CREF HREF="fonts"> into <CREF HREF="font">s, which allowed that reference to bind OK, but that broke many correct references (like 'analysis') that happened to end in an 's'. I ended up by searching for the regular expression ^([~isw]^)s"> and replacing with the replacement expression ^1">s which repaired 283 of my broken references at a stroke, while only breaking 6 new ones (which I fixed by hand). A result, as they say.
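The same search-and-replace can be sketched in Python's regular-expression dialect, which differs from the editor's. The translation rests on an assumed reading of the editor syntax: ^( and ^) are taken to delimit a group, [~isw] to mean 'any character other than i, s or w', and ^1 to refer back to the group. The function name is made up for the example.

```python
import re

# One character that is not i, s or w, followed by s"> at the end of
# the attribute value. The character is captured so it can be put back.
PLURAL_REF = re.compile(r'([^isw])s">')


def depluralise(markup):
    """Move a trailing plural 's' from inside the HREF value to just after the tag."""
    return PLURAL_REF.sub(r'\1">s', markup)


if __name__ == "__main__":
    print(depluralise('<CREF HREF="fonts">'))     # the s moves outside the tag
    print(depluralise('<CREF HREF="analysis">'))  # left alone: preceded by i
```

Excluding i, s and w spares words such as 'analysis', 'class' and 'windows', at the price of missing any true plural whose singular ends in one of those letters, so a handful still need fixing by hand.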
The downside is that I no longer consider text stored in any binary file format to be safe: not .DOC files, not .XLS files, not .MDB. Wherever the application in question has an export-as-text facility I keep a text copy of any important data, so that when some day a dangling pointer trashes the binary version, I can taste the sweet pleasure of really needing to write a Nife script to rescue my data.
My columns for PC Pro magazine, posted here six months in arrears for copyright reasons
Sunday, 1 July 2012