Dick Pountain/ Idealog294/ 6th January 2019 16:47:19
I do love strings. I don’t mean balls of string, G-strings, the String Theory of particle physics, or puppet strings. I mean that plain, simple old data structure, a load of old ASCII characters arranged in a row, the second data type we all learn after numbers. “Hello World!” is a string, and strings are the way that computers talk to us. OK, nowadays they all contain a chip that can turn strings into sounds, but that’s just to humour us: inside they talk strings, and all the words I’m typing here are strings.
The first computer language I learned was Basic, on a Commodore 4K PET back in 1979, before the advent of bitmapped screens. Being a wordy sort of person, and having no graphics available, it was the string functions that grabbed my imagination. One of the first programs I wrote was a nonsense poetry generator, which created seriously mad lines like:
Should a truffle smoothly stink?
Can its pickled stomach think?
The pepper capers over your floor.
Yes, well. Soon I got a better computer, running CP/M, and learned Forth, Lisp and Pascal, but none of these was much better for mangling strings: they all had short limits on string length, and similar numeric, array-oriented functions for finding things. Then I discovered SNOBOL. Never widely popular and now almost forgotten, this language was entirely dedicated to string processing (the name stands for StriNg Oriented and symBOlic Language). Instead of numeric indices, it uses patterns that look very like Backus-Naur expressions for complex substring searches. It’s perhaps still unsurpassed for this purpose, but the rise of Unix and Perl made regular expressions the more popular solution.
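To give a flavour of that difference, here is a minimal sketch in Python (not SNOBOL) that builds a search pattern from named, BNF-like pieces instead of counting character positions. The sub-pattern names and sample words are purely illustrative assumptions. Python’s re module supplies the alternation and concatenation that SNOBOL patterns express natively.

```python
import re

# Each name plays the role of a BNF production. The full pattern is built by
# concatenating and alternating smaller patterns, not by indexing characters.
# Illustrative only: this is Python's re syntax, not SNOBOL's.
consonant = r"(?:b|r)"             # consonant ::= "b" | "r"
vowels = r"(?:ea|ee)"              # vowels    ::= "ea" | "ee"
ending = r"(?:ds|d)"               # ending    ::= "ds" | "d"
word = consonant + vowels + ending  # word     ::= consonant vowels ending


def find_words(text):
    """Return every whole word in text that matches the composed pattern."""
    return re.findall(r"\b" + word + r"\b", text)


print(find_words("bead reeds bees red beds"))  # ['bead', 'reeds']
```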
I couldn’t see myself using SNOBOL for everything (it’s not great for numeric work), so I decided to write my own string functions that faintly mimicked the way it works. I called them before() and after() and they do what it says on the tin: before("pterodactyl", "rod") returns "pte", whereas after("pterodactyl", "rod") returns "actyl". I found these so useful that they’re the first things I implement in every new language I learn. I’ve done them in Basic, Forth, Pascal, Lisp, POP-11, Ruby, Python and more. I published Turbo Pascal versions in a Byte column, and was gratified to find other programmers using them a few years later. Maybe they’re what will be on my blue plaque (just joking). Python of course provides a string method, split(), that does this: "pterodactyl".split("rod")[0] returns "pte" and "pterodactyl".split("rod")[1] returns "actyl". But I’m now so attached to my before and after that I still prefer them.
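For readers who want to try them, here is a minimal Python sketch of before() and after() built on the standard str.partition method. It is not the author’s own code. One assumption is flagged in the docstrings: the column doesn’t say what the originals return when the search string is absent, so this version follows partition, with before() returning the whole string and after() returning an empty one.

```python
def before(s, sub):
    """Return the part of s before the first occurrence of sub.

    Assumption: if sub does not occur, the whole of s is returned
    (str.partition's behaviour; the column doesn't say what the originals do).
    """
    return s.partition(sub)[0]


def after(s, sub):
    """Return the part of s after the first occurrence of sub, or "" if absent."""
    return s.partition(sub)[2]


print(before("pterodactyl", "rod"))  # pte
print(after("pterodactyl", "rod"))   # actyl
```

There is one practical difference from split(). after() keeps everything following the first match, so after("a-b-c", "-") gives "b-c", whereas "a-b-c".split("-")[1] gives just "b". Also, split(...)[1] raises IndexError when the search string is missing.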
Sometime around 1990 I encountered NIFE, the Non Interactive File Editor, from a small Bristol software house called Cadspa (now defunct). This was a DOS command-line program that took any number of text files, plus a file of NIFE commands, and wrote the results back to files. It was scorchingly fast and handled files of almost unlimited size. It employed a Prolog-like declarative syntax: a sequence of ‘IF...THEN’ statements that could access every part of each word and text line, and every character type. Over the next decade I performed herculean feats with it. I updated huge databases when the London phone numbers changed and helped a friend add Ventura Publisher tags to a book in 365 volumes. Alongside my own Turbo Pascal word-sort program, I also used it to extract a keyword list from 18 years’ worth of Byte issues when I was writing The Penguin Dictionary of Computing. I reckon it saved me more than a year’s work there. It is much missed because, like Turbo Pascal, it no longer runs on Windows after version 8.
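NIFE’s declarative style can be loosely imitated in Python as a list of condition/action rules applied to each line of a file. This is a minimal sketch under stated assumptions. The rules, the regular expression and the prefixes "071" and "0171" are hypothetical illustrations. The IF...THEN structure only echoes NIFE’s, and its real command syntax is not reproduced here.

```python
import re

# Hypothetical rule target: an illustrative old dialling prefix followed by a digit.
OLD_PREFIX = re.compile(r"\b071(?=[ -]?\d)")

# Each rule is a (condition, action) pair: IF condition(line) THEN apply action.
# This only imitates the IF...THEN idea; it is not NIFE's syntax.
RULES = [
    # IF the line contains the old prefix THEN rewrite it as the new one.
    (lambda line: OLD_PREFIX.search(line) is not None,
     lambda line: OLD_PREFIX.sub("0171", line)),
    # IF the line has trailing whitespace THEN strip it.
    (lambda line: line != line.rstrip(),
     lambda line: line.rstrip()),
]


def apply_rules(lines, rules=RULES):
    """Run every rule, in order, against each line and yield the results."""
    for line in lines:
        for condition, action in rules:
            if condition(line):
                line = action(line)
        yield line


def edit_file(src_path, dst_path, rules=RULES):
    """Stream src_path through the rules into dst_path, one line at a time."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in apply_rules((raw.rstrip("\n") for raw in src), rules):
            dst.write(line + "\n")
```

For example, list(apply_rules(["Tel: 071 123 4567  "])) yields ["Tel: 0171 123 4567"]. Reading and writing one line at a time keeps memory use flat however large the file is, which is one simple way to handle very big inputs.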
In several previous columns I’ve mentioned my computer music composition system and, surprise, surprise, that works entirely on strings. When I started designing it many years ago I had to choose a data structure to represent musical notes, and strings seemed, to me at least, an ideal solution. I could have settled for conventional musical notation and represented tunes as strings of the letters A, B, C, D, E, F and G. But the output of my system is MIDI, not musical notation, so instead I employ all the ASCII characters to represent the 128 pitches (note numbers 0 to 127) that MIDI can play. What’s more, pitch, duration, volume and start time are stored as separate strings so they can be manipulated independently. Python is just brilliant for handling this: sometimes tuples (pitch, time, duration, volume) are what’s needed, while at other times I might want to mangle pitch, or another of the streams, alone.
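As a sketch of why strings suit this, here is a minimal Python version built on two assumptions. First, each character’s code point is taken directly as a MIDI note number, so note 60 (usually called middle C) is the character "<". Second, the time, duration and volume streams use the same one-character-per-value encoding. The function names are hypothetical and are not taken from the author’s system.

```python
def encode(values):
    """Pack a list of small integers into a string, one character per value.

    Assumption: MIDI pitches and volumes fit in 0-127 (plain ASCII); larger
    time or duration values still work in a Python str but are no longer ASCII.
    """
    return "".join(chr(v) for v in values)


def decode(stream):
    """Unpack a string stream back into a list of integers."""
    return [ord(c) for c in stream]


def transpose(pitches, semitones):
    """Shift every pitch in the stream, clamping to the MIDI range 0-127."""
    return "".join(chr(min(127, max(0, ord(c) + semitones))) for c in pitches)


def to_events(pitch, time, duration, volume):
    """Zip the four parallel streams into (pitch, time, duration, volume) tuples.

    zip() stops at the shortest stream, so the streams should be equal length.
    """
    return [(ord(p), ord(t), ord(d), ord(v))
            for p, t, d, v in zip(pitch, time, duration, volume)]


# Usage: a four-note phrase, transposed up a fifth and played backwards.
pitch = encode([60, 62, 64, 65])
time = encode([0, 4, 8, 12])
duration = encode([4, 4, 4, 8])
volume = encode([100, 100, 100, 100])

print(decode(transpose(pitch, 7)))                  # [67, 69, 71, 72]
print(decode(pitch[::-1]))                          # [65, 64, 62, 60]
print(to_events(pitch, time, duration, volume)[0])  # (60, 0, 4, 100)
```

With this layout, reversing a tune or slicing out a phrase is just ordinary string slicing on the pitch stream, and the other streams stay untouched until they are zipped back into tuples.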
Computers stopped being ‘all about 0s and 1s’ for me when I quit writing 8088 assembler, and they’re only really about numbers once a year when I do my accounts. The rest of the time they’re all about strings, or ‘words’ if you must...