
My lobbying project has been entered in the Open Data Challenge! Someone posted this to the MySociety list, with rather fewer than the advertised 36 hours left. I was at a wedding and didn’t read it at the time. After my partner and I had tried to invent a tap routine to the back end of Prince’s “Alphabet Street” and had got up at 8am to make it for the sadistic bed & breakfast breakfast and gone back to help clean up and drink any unaccountably unconsumed champagne, and the only thing left to look forward to was the end of the day, I remembered the message and noted that I had to get it filed before midnight.

So it was filed in the Apps category – there’s an Ideas category but that struck me as pathetic, and after all there is some running code. I pushed on to try and get something out under the Visualisation category but ManyEyes was a bit broken that evening and anyway its network diagram view starts to suck after a thousand or so vertices.

As a result, the project now has a name and I have some thin chance of snagging an actual Big Society cheque for a few thousand euros and a trip to Brussels. (You’ve got to take the rough with the smooth.)

The most recent experiment with the Lobster Project – see, it’s got a name! It’s got you in its grips before you’re born…it lets you think you’re king when you’re really a prawn…whoops, wrong shellfish – was to try out a new centrality metric, networkx.algorithms.centrality.betweenness_centrality. This is defined as the fraction of the shortest paths between all the pairs of nodes in the network that pass through a given node. As you have probably guessed, this is quite an inefficient metric to compute, and the T1700 lappy took over a minute to crunch it, compared to 7 seconds to complete the processing script without it. Perhaps the new KillPad would do better but the difference is big enough that it’s obviously my fault.
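To make that definition concrete, here is a minimal stdlib-only sketch of betweenness centrality on a toy unweighted, undirected graph. It is not NetworkX's implementation: the function name `betweenness` and the toy graph `adj` are made up for illustration, and it uses the breadth-first accumulation approach usually credited to Brandes. Even this version has to run a full search from every node, which gives some idea of where the time goes.

```python
from collections import deque

def betweenness(adj, normalized=True):
    """Betweenness centrality for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours.
    """
    score = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search from source, counting shortest paths.
        order = []
        preds = {v: [] for v in adj}
        paths = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing out path dependencies.
        dep = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                dep[v] += paths[v] / paths[w] * (1.0 + dep[w])
            if w != source:
                score[w] += dep[w]
    n = len(adj)
    # Every unordered pair was counted once from each end, hence the halving.
    scale = 0.5
    if normalized and n > 2:
        # Divide by the number of ordered pairs that exclude the node itself.
        scale = 1.0 / ((n - 1) * (n - 2))
    return {v: s * scale for v, s in score.items()}

# Toy star graph: every shortest path between two leaves runs through the hub.
adj = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
scores = betweenness(adj)
```

In the star graph the hub sits on every shortest path between other nodes and the leaves sit on none, so the hub scores as high as the normalised metric goes and the leaves score zero.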

Worth bothering with?

As far as I can see, though, it’s also not very useful. The results are correlated (R^2 = 0.64) with the infinitely faster weighted graph degree. (It also confirms that Francis Maude is the secret ruler of the world, though.)

The NX functions I’m really interested in, though, are the ones for clique discovery and blockmodelling. It’s obvious that with getting on for 3,000 links and more to come, any visualisation is going to need a lot of reduction. Blockmodelling basically chops your network into groups of nodes you provide and aggregates the links between those groups – it’s one way, for example, to get department level results.

But I’d be really interested to use empirical clique discovery to feed into blockmodelling – the API for the one generates a python list of cliques, which are themselves lists of nodes, and the other accepts a list of nodes or a list of lists (of nodes). Another interesting option might be to blockmodel by edge attribute, which would be a way of deriving results for the content of meetings via the “Purpose of meeting” field. However, that would require creating a list of unique meeting subjects and then iterating over it creating lists of nodes with at least one edge having that subject, and then shoving the resulting list-of-lists into the blockmodeller.
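The iteration is perhaps less painful than it sounds if you go through the edges once rather than looping over the unique subjects. What follows is a rough sketch under stated assumptions: the edge rows, the names and the `nodes_by_purpose` helper are all hypothetical, and it works on plain tuples rather than a NetworkX graph.

```python
from collections import defaultdict

# Hypothetical edge rows: (minister, organisation, purpose of meeting).
edges = [
    ("Minister A", "Lobby X", "energy"),
    ("Minister A", "Lobby Y", "health"),
    ("Minister B", "Lobby X", "energy"),
    ("Minister B", "Lobby Z", "health"),
]

def nodes_by_purpose(edges):
    """Map each purpose to the sorted nodes with at least one edge of that purpose."""
    groups = defaultdict(set)
    for minister, organisation, purpose in edges:
        groups[purpose].update((minister, organisation))
    return {purpose: sorted(nodes) for purpose, nodes in groups.items()}

groups = nodes_by_purpose(edges)
# The list-of-lists shape, one inner list of nodes per meeting subject.
list_of_lists = [groups[purpose] for purpose in sorted(groups)]
```

One caveat: a node can turn up in several of these groups, and the blockmodelling functions I am aware of expect each node to sit in exactly one group, so some tie-break rule would probably be needed before feeding this in.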

That’s a lorra lorra iteratin’ by anybody’s standards, even if, this being Python, most of it will end up being rolled up in a couple of seriously convoluted list comps. Oddly enough, it would be far easier in a query language or an ORM, but I’ve not heard of anything that lets you do SQL queries against a NX graph.
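One workaround, sketched here purely as an illustration, is to dump the edge list into an in-memory SQLite table and query that instead of the graph. The table layout, the column names and the `nodes_for_purpose` helper are assumptions of mine; with a real graph the rows would have to be exported from it first.

```python
import sqlite3

# Hypothetical edge rows: (minister, organisation, purpose of meeting).
edges = [
    ("Minister A", "Lobby X", "energy"),
    ("Minister A", "Lobby Y", "health"),
    ("Minister B", "Lobby X", "energy"),
    ("Minister B", "Lobby Z", "health"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (minister TEXT, organisation TEXT, purpose TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", edges)

def nodes_for_purpose(conn, purpose):
    """Return every node with at least one edge on the given subject."""
    rows = conn.execute(
        "SELECT minister FROM edges WHERE purpose = ? "
        "UNION "
        "SELECT organisation FROM edges WHERE purpose = ? "
        "ORDER BY 1",
        (purpose, purpose),
    )
    return [row[0] for row in rows]

energy_nodes = nodes_for_purpose(conn, "energy")
```

The UNION takes care of de-duplication, which is most of what the convoluted list comps would otherwise be doing.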

Having got this far, I notice that I’ve managed to blog my enthusiasm back up.

Anyway, I think it’s perhaps time for a meetup on this next week with Who’s Rob-bying.


So it was OpenTech weekend. I wasn’t presenting anything (although I’m kicking myself for not having done a talk on Tropo and Phono) but of course I was there. This year’s was, I think, a bit better than last year’s – the schedule filled up late on, and there were a couple of really good workshop sessions. As usual, it was also the drinking conference with a code problem (the bar was full by the end of the first session).

Things to note: everyone loves Google Refine, and I really enjoyed the Refine HOWTO session, which was also the one where the presenter asked if anyone present had ever written a screen-scraper and 60-odd hands reached for the sky. Basically, it lets you slurp up any even vaguely tabular data and identify transformations you need to clean it up – for example, identifying particular items, data formats, or duplicates – and then apply them to the whole thing automatically. You can write your own functions for it in several languages and have the application call them as part of the process. Removing cruft from data is always incredibly time consuming and annoying, so it’s no wonder everyone likes the idea of a sensible way of automating it. There’s been some discussion on the ScraperWiki mailing list about integrating Refine into SW in order to provide a data-scrubbing capability and I wouldn’t be surprised if it goes ahead.

Tim Ireland’s presentation on the political uses of search-engine optimisation was typically sharp and typically amusing – I especially liked his point that the more specific a search term, the less likely it is to lead the searcher to a big newspaper website. Also, he made the excellent point that mass audiences and target audiences are substitutes for each other, and the ultimate target audience is one person – the MP (or whoever) themselves.

The Sukey workshop was very cool – much discussion about propagating data by SMS in a peer-to-peer topology, on the basis that everyone has a bucket of inclusive SMS messages and this beats paying through the nose for Clickatell or MBlox to send out bulk alerts. They are facing a surprisingly common mobile tech issue, which is that when you go mobile, most of the efficient push-notification technologies you can use on the Internet stop being efficient. If you want to use XMPP or SIP messaging, your problem is that the users’ phones have to maintain an active data connection and/or recreate one as soon after an interruption as possible. Mobile networks analogise an Internet connection to a phone call – the terminal requests a PDP (Packet Data Protocol) context, in effect a data call, from the network – and as a result, the radio in the phone stays in an active state as long as the “call” is going on, whether any data is being transferred or not.

This is the inverse of the way they handle incoming messages or phone calls – in that situation, the radio goes into a low power standby mode until the network side signals it on a special paging channel. At the moment, there’s no cross-platform way to do this for incoming Internet packets, although there are some device-specific ways of getting around it at a higher level of abstraction. Hence the interest of using SMS (or indeed MMS).

Their other main problem is the integrity of their data – even without deliberate disinformation, there’s plenty of scope for drivel, duplicates, cockups etc to get propagated, and a risk of a feedback loop in which the crap gets pushed out to users, they send it to other people, and it gets sucked up from Twitter or whatever back into the system. This intersects badly with their use cases – it strikes me, and I said as much, that moderation is a task that requires a QWERTY keyboard, a decent-sized monitor, and a shirt-sleeve working environment. You can’t skim-read through piles of comments on a 3″ mobile phone screen in the rain, nor can you edit them on a greasy touchscreen, and you certainly can’t do either while looking out that you don’t get hit over the head by the cops.

Fortunately, there is no shortage of armchair revolutionaries on the web who could actually contribute something by reviewing batches of updates, and once you have reasonably large buckets of good stuff and crap you can use Bayesian filtering to automate part of the process.
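As a rough idea of what the Bayesian part might look like, here is a minimal naive Bayes sketch. Everything in it is an assumption for illustration: the `train` and `classify` helpers, the two labels, the whitespace tokenising and the tiny hand-reviewed sample. It is not anything Sukey actually runs.

```python
import math
from collections import Counter

def train(messages):
    """Count words per label from (text, label) pairs reviewed by humans."""
    words = {"good": Counter(), "crap": Counter()}
    docs = Counter()
    for text, label in messages:
        docs[label] += 1
        words[label].update(text.lower().split())
    return words, docs

def classify(text, words, docs):
    """Return the label with the higher naive Bayes log score."""
    vocab = set(words["good"]) | set(words["crap"])
    scores = {}
    for label in ("good", "crap"):
        total = sum(words[label].values())
        # Log prior, then log likelihoods with add-one smoothing.
        score = math.log(docs[label] / sum(docs.values()))
        for word in text.lower().split():
            score += math.log((words[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical batch that volunteers have already sorted.
reviewed = [
    ("police line forming at north exit", "good"),
    ("kettle forming on main street", "good"),
    ("exit via bridge is clear", "good"),
    ("omg retweet this now", "crap"),
    ("retweet if you agree lol", "crap"),
    ("lol omg", "crap"),
]
words, docs = train(reviewed)
verdict = classify("kettle at north exit", words, docs)
```

With buckets this small the verdicts mean very little; the point is only that the reviewed batches become the training data.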

Francis Davey’s OneClickOrgs project is coming along nicely – it automates the process of creating an organisation with legal personality and a constitution and what not, and they’re looking at making it able to set up co-ops and other types of organisation.

I didn’t know that OpenStreetMap is available through multiple different tile servers, so you can make use of Mapquest’s CDN to serve out free mapping.

OpenCorporates is trying to make a database of all the world’s companies (they’re already getting on for four million), and the biggest problem they have is working out how to represent inter-company relationships, which have the annoying property that they are a directed graph but not a directed acyclic graph – it’s perfectly possible and indeed common for company X to own part of company Y which owns part of company X, perhaps through the intermediary of company Z.
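To illustrate why the cycles matter, here is a small sketch that finds one ownership loop with a depth-first search. The `owns` mapping, the company names and the `find_cycle` helper are hypothetical, and this says nothing about how OpenCorporates actually stores its data.

```python
def find_cycle(owns):
    """Return one ownership cycle as a list of companies, or None.

    owns maps each company to the companies it holds a stake in.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(company):
        state[company] = IN_PROGRESS
        path.append(company)
        for held in owns.get(company, ()):
            held_state = state.get(held, UNSEEN)
            if held_state == IN_PROGRESS:
                # We have walked back into the current path: that is a cycle.
                return path[path.index(held):] + [held]
            if held_state == UNSEEN:
                cycle = visit(held)
                if cycle:
                    return cycle
        path.pop()
        state[company] = DONE
        return None

    for company in owns:
        if state.get(company, UNSEEN) == UNSEEN:
            cycle = visit(company)
            if cycle:
                return cycle
    return None

# X owns part of Y, which owns part of Z, which owns part of X.
owns = {"X": ["Y"], "Y": ["Z"], "Z": ["X"]}
cycle = find_cycle(owns)
```

Anything that assumes a tree or a DAG, such as rolling holdings up to an ultimate parent, will loop forever on data like this unless it carries a check of this kind. The recursion here would also need replacing with an explicit stack for chains thousands of companies deep.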

OpenTech’s precursor, Notcon, was heavier on the hardware/electronics side than OT usually is, but this year there were quite a few hardware projects. However, I missed the one that actually included a cat.

What else? LinkedGov is a bit like ScraperWiki but with civil servants and a grant from the Technology Strategy Board. Francis Maude is keen. Kumbaya is an encrypted, P2P online backup application which has the feature that you only have to store data from people you trust. (Oh yes, and apparently nobody did any of this stuff two years ago. Time to hit the big brown bullshit button.)

As always, the day after is a bit of an enthusiasm killer. I’ve spent part of today trying to implement monthly results for my lobby metrics project and it looks like it’s much harder than I was expecting. Basically, NetworkX is fundamentally node-oriented and the dates of meetings are edge properties, so you can’t just subgraph nodes with a given date. This may mean I’ll have to rethink the whole implementation. Bugger.
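One possible way round this, sketched below on plain tuples rather than a NetworkX graph, is to select edges by date first and only then work out which nodes are involved. The sample rows and the `edges_by_month` and `degree` helpers are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical one-edge-per-row data: (minister, organisation, meeting date).
edges = [
    ("Minister A", "Lobby X", date(2010, 10, 5)),
    ("Minister A", "Lobby Y", date(2010, 10, 19)),
    ("Minister B", "Lobby X", date(2010, 11, 2)),
]

def edges_by_month(edges):
    """Group edges under a (year, month) key taken from each edge's date."""
    months = defaultdict(list)
    for minister, organisation, when in edges:
        months[(when.year, when.month)].append((minister, organisation))
    return dict(months)

def degree(month_edges):
    """Count how many edges touch each node within one month's edge list."""
    counts = defaultdict(int)
    for minister, organisation in month_edges:
        counts[minister] += 1
        counts[organisation] += 1
    return dict(counts)

monthly = edges_by_month(edges)
october = degree(monthly[(2010, 10)])
```

Each month's edge list could then be used to build its own small graph, at the cost of building twelve graphs a year instead of one.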

I’m also increasingly tempted to scrape the competition’s meetings database into ScraperWiki as there doesn’t seem to be any way of getting at it without the HTML wrapping. Oddly, although they’ve got the Department of Health’s horrible PDFs scraped, they haven’t got the Scottish Office although it’s relatively easy, so it looks like this wouldn’t be a 100% solution. However, their data cleaning has been much more effective – not surprising as I haven’t really been trying. This has some consequences – I’ve only just noticed that I’ve hugely underestimated Oliver Letwin’s gatekeepership, which should be 1.89 rather than 1.05. Along with his network degree of 2.67 (the eighth highest) this suggests that he should be a highly desirable target for any lobbying you might want to do.

Things to get out of the data in this scraper of mine: for each lobby, the monthly meeting counts, degrees in the weighted multigraph, impact factor (i.e. graph degree/meetings to give an idea of productivity), most met ministers, most met departments, topics. For each ministry, meeting counts, most met lobbies, most discussed topics. For each PR agency (Who’s Lobbying had or has a list of clients for some of them), the same metrics as for lobbies. Summary dashboard: top lobbies, top lobbyists, top topics, graph visualisation, top 10 rising and falling lobbies by impact.
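For the per-lobby numbers, a sketch along these lines might do. The big assumption is the `weight` column: how the weights in the multigraph are derived isn't pinned down here, so the figures below are invented, as are the row layout and the `lobby_metrics` helper.

```python
from collections import defaultdict

# Hypothetical link rows: (meeting_id, minister, organisation, weight).
links = [
    (1, "Minister A", "Lobby X", 1.0),
    (2, "Minister A", "Lobby X", 0.5),
    (2, "Minister A", "Lobby Y", 0.5),
    (3, "Minister B", "Lobby X", 1.5),
]

def lobby_metrics(links):
    """Meeting count, weighted degree and impact (degree / meetings) per lobby."""
    meetings = defaultdict(set)
    degree = defaultdict(float)
    for meeting_id, _minister, organisation, weight in links:
        meetings[organisation].add(meeting_id)
        degree[organisation] += weight
    return {
        org: {
            "meetings": len(meetings[org]),
            "degree": degree[org],
            "impact": degree[org] / len(meetings[org]),
        }
        for org in meetings
    }

metrics = lobby_metrics(links)
```

The same function pointed at the minister column instead of the organisation column would give the per-ministry counts.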

Things I’d like to have but aren’t sure how to implement: a metric of gatekeeper-ness for ministers, for example, how often a lobby met a more powerful minister after meeting this one, and its inverse, a metric of how many low-value meetings a minister had. I’ve already done some scripting for this, and NetworkX will happily produce most of the numbers, although the search for an ideal charting solution goes on. Generating the graph and subgraphs is computationally expensive, so I’m thinking of doing this when the data gets loaded up and storing the results, rather than doing the sums at runtime.
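Here is one possible reading of the gatekeeper idea, as a sketch only: the share of a minister's meetings after which the same lobby went on to meet someone more powerful. It is not the formula behind any numbers quoted elsewhere on this blog, and the `power` ranking, the sample rows and the `gatekeeper_scores` helper are all invented for the purpose.

```python
from datetime import date

# Hypothetical power ranking and dated link rows: (lobby, minister, date).
power = {"Junior Minister": 1, "Secretary of State": 2, "Prime Minister": 3}
links = [
    ("Lobby X", "Junior Minister", date(2010, 10, 1)),
    ("Lobby X", "Prime Minister", date(2010, 10, 8)),
    ("Lobby Y", "Junior Minister", date(2010, 10, 2)),
    ("Lobby Y", "Secretary of State", date(2010, 11, 3)),
    ("Lobby Z", "Secretary of State", date(2010, 10, 4)),
]

def gatekeeper_scores(links, power):
    """Share of each minister's meetings later followed by the same lobby
    meeting a more powerful minister."""
    total = {}
    followed = {}
    for lobby, minister, when in links:
        total[minister] = total.get(minister, 0) + 1
        moved_up = any(
            other_lobby == lobby
            and other_when > when
            and power[other_minister] > power[minister]
            for other_lobby, other_minister, other_when in links
        )
        if moved_up:
            followed[minister] = followed.get(minister, 0) + 1
    return {m: followed.get(m, 0) / total[m] for m in total}

gatekeeping = gatekeeper_scores(links, power)
```

This compares every row with every other row, which is one more reason to compute it at load time and store the result.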

Where’s that Django tutorial? Unfortunately it’s 7.05 pm on Sunday and it’s looking unlikely I’ll do it this weekend…

So I scraped the government meetings data and rescraped it as one-edge-per-row. And then, obviously enough, I tidied it up in a spreadsheet and threw it at ManyEyes as a proof-of-concept. Unfortunately, IBM’s otherwise great web site is broken, so although it will preview the network diagram, it fails to actually publish it to the web. Oh well, ticket opened, etc.

Anyway, I was able to demonstrate the thing to Daniel Davies on my laptop, on the bar of the Nelson’s Retreat pub in Old Street. This impressed him excessively. Specifically, we were interested by an odd outlier on the chart. Before I get into that, though, here are some preliminary findings.

1 – Clegg’s Diary

At first sight, Nick Clegg appears to be unexpectedly influential. His calendar included meetings with NATO, the World Bank, the Metropolitan Police, the Gates Foundation, and oddly enough, Lord Robertson of Port Ellen. Not only that, he had one-to-one meetings with all of them. However, he also got The Elders (i.e. retired politicos playing at shop) and the leader of the Canadian opposition, one Michael Ignatieff, Esq. God help us, is Clegg turning out to be a Decent?

2 – Dave from PR’s surprisingly dull world

The Prime Minister, no less, meets with some remarkably dull people. In fact, he met quite a lot of people who you’d expect to be left to flunkies while leaving quite a lot of important people to Nick Clegg. He did get BP, Shell, Pfizer, Rupert Murdoch, the TUC general secretary, and Ratan Tata (twice!) as one-on-ones, but he also met a surprising number of minor worthies from Cornwall and vacuous photocalls with people from Facebook.

3 – Francis Maude, evil genius of the coalition

The Minister for the Cabinet Office and Paymaster-General, Francis Maude MP, is the surprise hit, as far as I can make out. He seems to have a special responsibility for anything that smacks of privatisation – therefore, the monetary value of meeting him is probably high. Of course, if your evil genius is Francis Mediocritus, you’ve got problems. No wonder we’re in such a mess. All these points are also true of Oliver Letwin.

4 – Communication and Strategy Management Ltd

This is our far outlier. Some of the least significant people on the chart appear to be government whips, which is obviously an artefact of the data set. The data release does not cover intra-governmental or parliamentary meetings, nor does it cover diplomatic activity. Whips, of course, are a key institution in the political system. Given their special role with regard to both the government and parliament, it’s not surprising that they appear to be sheltered from external lobbying – access to the Whips’ Office would be such a powerful and occult influence that it must be held closely.

So what on earth is Communication and Strategy Management Ltd., a company which had one-on-one access to the Government Chief Whip, the Rt. Hon. Patrick McLoughlin MP, and which according to Companies House was founded on the 11th of April? It has no web site or perceptible public presence. It is located in what looks like a private house, here, not far from Stratford upon Avon:

Evidently the hub of political influence, but those are the facts. The directors are Elizabeth Ann Murphy and Richard Anthony Cubitt Murphy*, ignoring a company-formation agent who was a director for one day when setting up the company. It’s not as if C&SM Ltd is a constituent of McLoughlin’s – he’s MP for the Derbyshire Dales. Actually, either the directors are related or else there was a cockup, as Murphy’s name on the books was amended from Bromley the day after the company was formed and both were born in 1963. The Companies House filing* doesn’t give any other information – accounts aren’t due for a while – except that the one share issued is held by Norman Younger, who is a partner in the company formation service that was used.

Anyway, the next stop is to learn how this works and put up a nice little dashboard page to help watch the lobbysphere. I’d be happier doing something with python – such as nodebox – but the diagram is already too big to be useful without interactivity, and you can’t stick a NodeBox window in a web page. I’ve got the search terms for the data as an RSS feed from data.gov.uk, so it should just be a matter of adding more URIs as departments release their data.

*Not the Richard Murphy, who is too young.
*WebCheck – it’s not an ugly website, it’s a way of life…

This has me thinking one thing – TheyWorkForYou needs to integrate the text-mining tool researchers used to estimate the point at which Agatha Christie’s Alzheimer’s disease set in by analysing her books. We could call it WhatHaveTheyForgotten? Or perhaps HowDrunkIsYourMP? Jakob Whitfield pointed me to the original paper, here. It doesn’t seem that complicated, although I have a couple of methodological questions – for a start, are there enough politicians with a track record in Hansard long enough to provide a good baseline for time-series analysis?

Instead, we could do a synchronic comparison and look at which politicians seem to be diverging from the average. Of course, some might object that this would be a comparison against a highly unusual and self-selected sample. Another objection might be that the whole idea is simply too cruel. Yet a further objection might be the classic one that there are some things man should not know.

Update: Implemented!

So I was moaning about the Government and the release of lists of meetings with external organisations. Well, what about some action? I’ve written a scraper that aggregates all the existing data and sticks it in a sinister database. At the moment, the Cabinet Office, DEFRA, and the Scottish Office have coughed up the files and are all included. I’m going to add more departments as they become available. Scraperwiki seems to be a bit sporky this evening; the whole thing has run to completion, although for some reason you can’t see all the data, and I’ve added the link to the UK Open Government Licence twice without it being saved.

A couple of technical points: to start with, I’d like to thank this guy who wrote an alternative to Python’s csv module’s wonderful DictReader class. DictReader is lovely because it lets you open a CSV (or indeed anything-separated value) file and keep the rows of data linked to their column headers as python dictionaries. Unfortunately, it won’t handle Unicode or anything except UTF-8. Which is a problem if you’re Chinese, or as it happens, if you want to read documents produced by Windows users, as they tend to use Really Strange characters for trivial things like apostrophes (\x92, can you believe it?). This, however, will process whatever encoding you give it and will still give you dictionaries. Thanks!
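For anyone on Python 3, where the csv module works on text rather than bytes, much the same effect can be had by decoding before parsing. This is a sketch under the assumption that the Windows-made files are cp1252; the `read_dicts` helper and the sample row are made up, and it is not the alternative reader thanked above.

```python
import csv
import io

def read_dicts(raw_bytes, encoding="cp1252"):
    """Decode raw CSV bytes with the given encoding and return the rows as dicts."""
    text = raw_bytes.decode(encoding)
    # newline="" leaves line endings alone so the csv module can handle them.
    return list(csv.DictReader(io.StringIO(text, newline="")))

# A Windows-style row with the \x92 apostrophe in it.
sample = b"Minister,Purpose of meeting\r\nMinister A,Lobby X\x92s annual dinner\r\n"
rows = read_dicts(sample)
```

In cp1252 the byte \x92 decodes to the right single quotation mark, so the apostrophe survives and the rows still come back keyed by their column headers.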

I also discovered something fun about ScraperWiki itself. It’s surprisingly clever under the bonnet – I was aware of various smart things with User Mode Linux and heavy parallelisation going on, and I recall Julian Todd talking about his plans to design a new scaling architecture based on lots of SQLite databases in RAM as read-slaves. Anyway, I had kept some URIs in a list, which I was then planning to loop through, retrieving the data and processing it. One of the URIs, DEFRA’s, ended like so: oct2010.csv.

Obviously, I liked the idea of generating the filename programmatically, in the expectation of future releases of data. For some reason, though, the parsing kept failing as soon as it got to the DEFRA page. Weirdly, what was happening was that the parser would run into a chunk of HTML and, obviously enough, choke. But there was no HTML. Bizarre. Eventually I thought to look in the Scraperwiki debugger’s Sources tab. To my considerable surprise, all the URIs were being loaded at once, in parallel, before the processing of the first file began. This was entirely different from the flow of control in my program, and as a result, the filename was not generated before the HTTP request was issued. DEFRA was 404ing, and because the csv module takes a file object rather than a string, I was using urllib.urlretrieve() rather than urlopen() or scraperwiki.scrape(). Hence the HTML.
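Whatever the cause, a cheap guard against this failure is to build the filename up front and to refuse to parse anything that looks like an error page. The sketch below assumes the month-plus-year naming pattern holds beyond the one example I have, and the `release_filename` and `looks_like_html` helpers are hypothetical. It does nothing about the loading order itself.

```python
from datetime import date

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

def release_filename(when):
    """Build a name like 'oct2010.csv' from a date."""
    return "%s%d.csv" % (MONTHS[when.month - 1], when.year)

def looks_like_html(payload):
    """Cheap check for an HTML error page served where a CSV was expected."""
    head = payload.lstrip()[:200].lower()
    return head.startswith(("<!doctype", "<html")) or "<body" in head

filename = release_filename(date(2010, 10, 1))
error_page = "<html><body>404 Not Found</body></html>"
```

Checking the payload before handing it to the csv parser turns a baffling parse failure into an obvious "that wasn't a CSV" one.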

So, Scraperwiki does a silent optimisation and loads all your data sources in parallel on startup. Quite cool, but I have to say that some documentation of this feature might be nice, as multithreading is usually meant to be voluntary:-)

TODO, meanwhile: at the moment, all the organisations that take part in a given meeting are lumped together. I want to break them out, to facilitate counting the heaviest lobbyists and feeding visualisation tools. Also, I’d like to clean up the “Purpose of meeting” field so as to be able to do the same for subject matter.

Update: Slight return. Fixed the unique keying requirement by creating a unique meeting id.

Update Update: Would anyone prefer if the data output schema was link-oriented rather than event-oriented? At the moment it preserves the underlying structure of the data releases, which have one row for each meeting. It might be better, when I come to expand the Name of External Org field, to have a row per relationship, i.e. edge in the network. This would help a lot with visualisation. In that case, I’d create a non-unique meeting identifier to make it possible to recreate the meetings by grouping on that key, and instead have a unique constraint on an identifier for each link.
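To show what the link-oriented version might look like, here is a small sketch that expands one row per meeting into one row per relationship. The "Name of External Org" and "Purpose of meeting" fields come from the data releases; the "Minister" column, the comma separator, the output field names and the `to_links` helper are assumptions for illustration.

```python
# Hypothetical event-oriented rows, one per meeting.
meetings = [
    {"meeting_id": 1, "Minister": "Minister A",
     "Name of External Org": "Lobby X, Lobby Y", "Purpose of meeting": "energy"},
    {"meeting_id": 2, "Minister": "Minister B",
     "Name of External Org": "Lobby X", "Purpose of meeting": "health"},
]

def to_links(meetings, separator=","):
    """Expand each meeting into one row per minister-organisation link."""
    links = []
    for row in meetings:
        for org in row["Name of External Org"].split(separator):
            org = org.strip()
            if not org:
                continue
            links.append({
                "link_id": len(links) + 1,        # unique per link
                "meeting_id": row["meeting_id"],  # repeated across a meeting's links
                "minister": row["Minister"],
                "organisation": org,
                "purpose": row["Purpose of meeting"],
            })
    return links

links = to_links(meetings)
```

Grouping the output on `meeting_id` recovers the original meetings, while `link_id` carries the unique constraint.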

Update Update Update: So I made one.

The Book

Red Plenty is a fictionalised history, or possibly a work of hard historical science fiction, which covers what it describes as the “fifties’ Soviet dream” but which might be better termed the Soviet sixties – the period from Khrushchev’s consolidation of power to the first crackdown on the dissidents and the intervention in Czechoslovakia. This is a big book in a Russian way – it’s always been a science-fiction prerogative to work with the vastness of space, the depth of history, and the wonder and terror of science and technology, but it’s also been fairly common that science-fiction has had a bit of a problem with people. The characters who re-fire the S-IVB main engine for translunar injection, with nothing but a survival pack of big ideas for use on arrival, tend to vanish in the cosmos. At its best, this has given the genre a disturbingly calm new perspective – chuck out your literary chintz, the rocket equation will not be fooled. At worst, well, OH NO JOHN RINGO.

Red Plenty covers a lot of big ideas, some serious hardware and even more serious software, and great swaths of the Soviet Union. But you will also need to be prepared to meet quite a lot of difficult but rewarding people, rather like the geneticist character Zoya Vaynshteyn does at the party Leonid Kantorovich’s students throw in Akademgorodok. In that sense, it has a genuinely Russian scale to it. The characters are a mixture of historical figures (as well as Kantorovich, you will spend some time in Nikita Khrushchev’s interior monologue), pure fictions, and shadow characters for some historical ones. (Emil Shaidullin roughly represents Gorbachev’s adviser Abel Aganbegyan; Vaynshteyn the historical geneticist Raissa Berg.)

So what are they up to?

Rebooting Science

Kantorovich, a central figure of the book, is remembered as the only Soviet citizen to win a Nobel Prize in economics, and the inventor of the mathematical technique of linear programming. As a character, he’s a sort of Soviet Richard Feynman – an egghead and expert dancer and ladies’ man, a collaborator on the nuclear bomb, and a lecturer so cantankerous his students make a myth of him. Politically, it’s never clear if he’s being deliberately provocative or completely naive, or perhaps whether the naivety is protective camouflage.

A major theme of the book is the re-creation of real science in the Soviet Union after the Stalinist era; biology has to start up afresh, economics has to do much the same, and everyone is working in a large degree of ignorance about the history of their fields. Some things simply can’t be restarted – as Spufford points out, despite all the compulsory Marxism-Leninism, even genetics hadn’t been erased as thoroughly as independent Marxist thought, and nobody in charge was willing to even think of opening that particular can of worms. On the other hand, the re-opening of economics as a field of study led to what the biologists would have called an adaptive radiation. Pioneers from engineering, maths, biology and physics began to lay spores in the new territory.

Comrades, let’s optimise!

The new ecosystem was known as cybernetics, which was given a wider meaning than the same word had in the West. Kantorovich’s significance in this is that his work provided both a theoretical framework and a critical technology – if the problem was to allocate the Soviet Union’s economic resources optimally, could it be possible to solve this by considering the economy as a huge system of linear production functions, and then optimising the lot? The idea had been tried before, in the socialist calculation debate of the 1920s, although without the same mathematical tools.

This is one of those events whose significance has changed a great deal over time. The question was whether it was possible for a planned economy to achieve an optimal allocation of resources. The socialists thought so; their critics held that it was impossible, and elaborated a set of criteria for optimal allocation very similar to the ones that are familiar as the standard assumptions in the economic theory of the firm in perfect competition. These days, it’s often presented as if this was a knockout argument. From the firm in perfect competition, we hop to Hayek’s idea that a market economy is better at making use of dispersed, implicit knowledge. Basta. We won.

The socialists weren’t without intellectual originality. In fact, they did actually formulate a mathematical rebuttal to the firm in perfect competition – the Lange model, which demonstrated that optimal allocation was a possibility in theory. The Hayekian critique wasn’t considered that great at the time, and even then it was well known that the standard assumptions don’t, actually, describe any known economy. It was thought a much better point that the barrier to effective planning was a practical one, not a fundamental one: it would simply be impossible to process all the data with the technology available. Even with the new tools of linear optimisation, who was going to do all those sums, especially as the process is an iterative rather than a formal one? Stalin and Hitler had their own way of solving these arguments – no man, no problem – and the whole thing ended up moot for some time.

Computers: a technical fix

But if it had been impossible to run the numbers with pen and paper in 1920, or with Hollerith machines and input-output tables in 1940, what about computers in 1960? Computers could blast through millions of iterations for hundreds of thousands of production processes in tens of thousands of supply chains; computers were only likely to get better at it, too. Red Plenty is about the moment when it seemed that the new territory of cybernetics was going to give rise to a synthesis between mathematics, market-socialist thinking, and computing that would replace GOSPLAN and deliver Economics II: True Communism.

After all, by the mid-60s it was known that the enormous system of equations could be broken down into its components, providing that the constraints in each sub-system were consistent with the others. If each production unit had its own computer, and the computers in each region or functional organisation were networked, and then the networks were….were internetworked? In fact, the military was already using big computer networks for its command-and-control systems, borrowing a lot of ideas from the US Air Force’s SAGE; by 1964, there were plans for a huge national timesharing computer network, for both military and civilian use, as a horizontal system cutting across all the ministries and organisations. Every town would get a data centre.

The Economics Fairy Strikes Again

But, of course, it didn’t happen. There’s a good paper on the fate of the Soviet internetworkers here; Spufford has a fascinating document on the end of indigenous general-purpose computer development in the USSR here. Eventually, during the 1970s, it became increasingly obvious that the Soviet economy was not going to catch up with and outstrip anyone, let alone the United States, and the Austrian economists were retroactively crowned as having obviously been right all along, and given their own chance to fail. Spufford frames the story as a Russian fairytale; perhaps we can say that in fact, economics is the fairytale, or rather the fairy. Successive groups of intellectuals have fought their way through the stacks of books, past the ideological monsters, and eventually reached the fairy’s grotto, to be granted their greatest wish. And it’s always the same one – a chance to fail.

Why did the Soviet economists fail? Red Plenty gives a spectacular sweep through the Soviet economy as it actually was; from the workings of GOSPLAN, to the management of a viscose factory, to the world of semi-criminal side payments that actually handled the problems of day-to-day survival. In the 1990s, the descendants of one half of the socialist calculation debate swept into Russia as advisers paid by the Thatcher Foundation. Arriving on the fairy’s magic cloud, they knew little of how the Soviet economy worked in practice, and duly got their opportunity to fail. The GOSPLAN officials of the 60s were reliant on data that was both completely unreliable, being the product of political bargaining more than anything else, and typically slightly less than a year out of date. And the market socialists were just as reliant on the management of Soviet industry for the production cost data they needed to make sure all those budget constraints really were consistent.

That’s a technical explanation. But there are others available. Once communism was achieved the state was meant to wither away, and not many of the people in charge of it were at all keen on this as a pension plan. Without the power to intervene in the economy, what was the point of the Party, again? Also, what was that stuff about letting people connect computers to the telephone network and pass messages from factory to factory? Where will it end? The central government, the Politburo, GOSPLAN, STAVKA – they would never accept it.

Another, more radical, is that the eventual promise of Red Plenty was to render not so much the top of the pyramid, but the middle management, redundant. The rapid industrialisation had created a new management class who had every intention of getting rich and staying that way. (This was the Yugoslavs’ take on the Soviet Union – the new class had simply taken over from the capitalists.) What would happen to their bonuses, and their prerogative to control the planners by telling them what they wanted to hear?

And yet another is that the whole project was flawed. Even if it was possible to discern the economy’s underlying cost-structure, write the software, and optimise the whole thing, how would this system deal with dynamic economics? How would it allocate investment? How would it cope with technological change? It’s no help to point out that, in fact, a lot of the questions are nowhere near being solved in any economics.

Soviet History

One view of the USSR’s history is a succession of escape attempts. The NEP of the mid-20s, Nikolai Voznesensky’s term at GOSPLAN in the 1940s, the Soviet 60s. Each saw a real effort to get away from a political economy which was in many ways a wild caricature of the Industrial Revolution, screwing down the labour share of income in order to boost capital investment and hence industrial output, answering any protest against this with the pistol of the state. As well as trying new economic ideas, they also saw surges of creativity in other fields. They were all crushed.

Arguably, you could say the same thing about perestroika. The people who signed the Alma-Ata protocol to arrange the end of the Soviet Union and the dismissal of Gorbachev were not, in fact, heroic dissidents, but rather career communist bureaucrats, some of whom went on to become their own little Stalins. Spufford says in the endnotes to Red Plenty that part of the book’s aim is a prehistory of perestroika – one view of the characters is that many of them are developing into the people who will eventually transform the country in the 1980s. Green politics was an important strand in the great dissident wave, right across the USSR and Central Europe; Zoya Vaynshteyn’s genetic research, which turns up some very unpleasant facts, is a case in point. Valentin, the programmer and cadre, is going to retain his self-image as a bohemian hacker into the future. Another Party figure in the book is the man who refuses to get used to violence, which will also turn out to be important in 1989.

Anyway, go read the damn book.

fail

Quietly, as the election campaign goes on, the NHS IT programme has gone from “heading for the rocks” to “sailing into the cliff”. “Has NPfIT put us back 10 years?” asks the NHS chief in Rotherham, who’s taken the recently announced option to bail out of the project and deploy something of his own choice. He’s also chosen to do a soft-launch rather than a monster all-or-nothing go-live – so he’s probably worth listening to.

A key problem, apparently, is a lack of the right skills – people have simply drifted away from the project as the reek of zombiedom has become ever more intense. It’s somehow awe-inspiring that it was possible to spend £12bn without attracting hordes of the talented and the merely opportunistic.

The guy who got out earlier – the Paul Allen figure – speaks, and says that the project was doomed because the clinicians didn’t support it. Where have we heard that before?

A safety critical bug emerges.

Some areas have suspended uploading patient data to the Big DB; weirdly, it turns out that the official business case for the summary care records was never approved.

Even weirder, many of the trusts that sent out letters to millions of people, red-alerting NO2ID into action, weren’t actually planning to upload – they just did it because some budget became available for publicity, and hey! budget! Thus accidentally throwing a giant NO2ID demo at the taxpayers’ expense.

I’ve been reading Bruno Latour’s Aramis, or the Love of Technology, a postmodernist account of the failure of a massive French project to develop a Personal Rapid Transit system. Latour’s book contains chunks of fiction, interviews, historical documents, and authorial comment, broken out by the typography – the experience is more like reading a long blog post containing blockquotes from different sources and snarky comments on them than anything else.

It’s a fascinating exploration of the politics of the project, the nature of projects themselves, and the sources of project failure; running from 1969 to 1987, the scheme went from conceptual paper studies to a major prototype by 1973, and eventually built a large-scale test implementation in the mid-80s, before being suddenly cancelled while an intensive test campaign intended to qualify it for deployment was under way. Latour is primarily interested in how the overall concept and much of the technology stayed the same, although its objectives, planned deployment, and resources changed constantly throughout the project.

He argues that, eventually, the crucial issue was that a project is a fundamentally political concept – it has to recruit the support of people and of interest groups in order to progress, and Aramis was a side-project for nearly everyone involved except for two groups – the engineers working on it, and the French Communist Party. Unfortunately for the first group, the contract for large-scale tests was signed as the last act in office of the Communist transport minister before the party pulled out of the Mitterrand government.

This is of course true; a project needs to create its own tribe and its own culture. However, I’m quite ambivalent about the whole concept; not really about its technical or economic aspects, but rather about the idea of urbanness that was built into its core assumptions. PRT emerged in the 1960s as a technological fix to what its American proponents thought was the steady decline of cities – the big idea was a form of high-capacity public transport that would provide point-to-point service without intermediate stops, in a private environment, rather like a car, but without traffic jams or exhaust fumes or road accidents.

The flip side of this comes up again and again in Latour’s interviews with Matra and RATP executives, regarding their assumptions about the passengers and the user-experience studies that were carried out later in the project. Passengers, apparently, wanted more than anything else to be transported from point to point, “without transfers, without thinking”, without other people. Not that any passengers had actually been asked what they thought at this point. Clearly, the political assumptions built into Aramis from the beginning were that moving around a city was basically unpleasant, and specifically because of the presence of other people. Huge amounts of effort were expended on the contradictory task of building a vehicle and a broader networked system that was both user-controlled and designed to keep the user from engaging with it in any way.

Very significantly, when user studies were actually carried out, the public was notably cool on the idea and found the cabins (patterned, on the inside, on the Renault Espace) unnerving and uncanny – rather than being protected from a sinister and menacing urban jungle, they felt isolated in sealed capsules controlled by automated systems, in which they could still be confronted with strangers. The paranoia and declinism that originally motivated the PRT concept was accurately preserved in its architecture and communicated to its potential passengers.

Of course, if you were to ask me about this on the Northern Line or the 271 bus tomorrow evening, I’d probably be significantly more sympathetic to the idea; it’s much easier to enjoy public transport when it isn’t operating at overload-plus. This was also a criticism of Aramis – the RATP managers found it hard to imagine a system working that didn’t use standing passengers as a buffer for peak demand, which is telling in itself. And the PCF’s interest was presumably in the idea of a communal and high-modernist rival to the car that would also be a major technical boost for French industry.

Another interesting but under-discussed angle is that of failed consilience.

While the most active phase of Aramis development was on, other groups of engineers were solving the problems of routing discrete packets around a dense scale-free network, preventing them from colliding, and providing congestion control, load-balancing, and controllable routing metrics. They were, of course, the IEEE 802 and IETF working groups building the Internet. The engineers down the road at Alcatel working on GSM could probably have told them a thing or two, as well. The analogies between the longest prefix match/shortest path wins logic of BGP and the problems of routing Aramis cars are very close, although one problem that doesn’t come up in internetworking is how to return the empties and make sure there is a sufficient free float of vehicles to maintain the service. (You regularly see small vans redistributing the Vélib bikes around Paris in order to deal with just this problem.)
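For anyone who hasn’t met it, the longest prefix match half of that logic fits in a few lines. What follows is only a minimal sketch in Python, not anything from Aramis or from a real router: the routing table, the next-hop names, and the function `longest_prefix_match` are all made up for illustration, and a real implementation would use a trie rather than a linear scan.

```python
import ipaddress

# Hypothetical routing table (prefix -> next hop); the names are illustrative only.
ROUTES = {
    "0.0.0.0/0": "default",
    "10.0.0.0/8": "core",
    "10.1.0.0/16": "branch",
    "10.1.2.0/24": "siding",
}


def longest_prefix_match(destination, routes=ROUTES):
    """Return the next hop of the most specific prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    best_network = None
    best_hop = None
    for prefix, next_hop in routes.items():
        network = ipaddress.ip_network(prefix)
        # A candidate wins if it contains the address and is more specific
        # (has a longer prefix) than anything seen so far.
        if addr in network and (
            best_network is None or network.prefixlen > best_network.prefixlen
        ):
            best_network, best_hop = network, next_hop
    return best_hop


if __name__ == "__main__":
    for destination in ("10.1.2.7", "10.1.9.9", "10.200.0.1", "192.168.1.1"):
        print(destination, "->", longest_prefix_match(destination))
```

Swap “address” for “destination station” and “next hop” for “which branch to take at the next junction” and you have something like the decision an Aramis car would face at every set of points, minus the problem of the empties.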

Part of the explanation, and another interesting angle, is that there was clearly a massive culture clash between the Matra defence-electronics managers, the RATP railwaymen, and the software developers subcontracted in to eventually write the routing and speed-control systems. Matra representatives repeatedly mention that there was a need for a revolution in microprocessors, although that is precisely what happened every 18 months throughout the project.

Apparently, a related system is under test around Heathrow Terminal 5, due to go live in “spring 2010”. Anyone taking bets?

An interesting isotope is detected in the CRU report fall-out plume. Apart from the very high concentrations of concern-troll, tone-troll, and pure drivel, there is something worth learning from.

For this reason, many software professionals encountering science software for the first time may be horrified. How, they ask, can we rely on this crude software, developed in primitive conditions – by amateurs, working with such poor tools and such poor understanding of the field? This is a common reaction to GISTEMP, and is exactly the reaction which many critics have had, some very publicly, to the software published with the CRU emails. Such critics do have a point. Science software should be better than it is. Scientists should be provided with more training, and more support. But consider the uses to which science software is put. Most software written by scientists:

* consists of tiny programs;
* which will only ever be run a small number of times;
* over the course of a few weeks as it is being developed;
* by the scientist who wrote it;
* on data gathered by that scientist’s team;
* concerning a scientific field in which that scientist is expert;
* to perform data processing on which that scientist is expert;
* and will be discarded, never to be used again, as soon as the paper containing the results is accepted for publication.

There are hardly any scientists today who don’t do some programming of some sort; there’s not much science that doesn’t involve churning through really big data sets. As a result, there’s a lot of it about. Which reminds me of this Eric Sink post from 2006, about the distinctions between “me-ware, us-ware, and them-ware”. Me-ware is software that you write and only you use; us-ware is software that is used by the same organisation that produces it; them-ware is software that is produced by a software company or open-source project for the general public.

There’s a gradient of difficulty; the further from you the end-user is, the less you know about their needs. On the other hand, if you’re just trying to twiddle the chunks to fit through the ChunkCo Chunkstrainer without needing to buy a ChunkCo Hyperchunk, well, although you know just how big they are, you’re unlikely to spend time building a pretty user interface or doing code reviews. Which only matters up to a point; nobody else would bother solving your problem.

But this can bite you on the arse, which is what happened to the climate researchers. It’s fair to say that if you’re processing a scientific data set, what actually matters is the data, or the mathematical operation you want to do to it. You won’t get the paper into Nature because you hacked up a really elegant list comp or whatever; they won’t refuse it because the code is ugly. Anyone who wants to replicate your results will probably roll their own.

This is OK, but the failure mode is when the political equivalent of Brian Coat comes snooping around your #comments or lack of them. Perhaps I should tidy up the Vfeed scripts while I’m at it.