Archive for the ‘can you believe we don’t have a “science” tag?’ Category

The Grauniad Dabatlog has produced a rather fancy network visualisation of the sources cited in Anders Behring Breivik’s personal manifesto/horse-shit compendium. This is great as I now don’t need to worry that I perhaps should have made one. It’s very pretty and you can click on stuff, and see that some of the sources are thinktanks and some of them are newspapers, and well, it’s very pretty and you can click on stuff. It also comes with a piece by Andrew Brown reprising his “Don’t be beastly to the creationists!” shtick but with Melanie Phillips, for some reason.

Unfortunately it’s almost completely opaque: it gives little indication of what data is being visualised or on what basis, and there is really no obvious conclusion to draw from it. But did I mention pretty and click? If forced to take a view, I would reckon that the underlying data is probably a matrix of which sources appear together with others and the layout algo is a force-directed graph (aka the default in pretty much any visualisation toolkit), probably weighted by appearance count. There’s some sort of proprietary metric called “linkfluence” which appears to be given by (indegree/outdegree)*len(neighbourhood) or words to that effect.
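To make that guess concrete, here is a minimal sketch of the metric as I read it. Everything in it is an assumption: the formula is my reconstruction rather than anything the vendor documents, the edge list is made up, and treating an outdegree of zero as one is just a way to dodge a division by zero.

```python
from collections import defaultdict

# Hypothetical citation edges: (source, target) means "source links to target".
EDGES = [
    ("manifesto", "wikipedia"),
    ("manifesto", "bbc"),
    ("blog_a", "wikipedia"),
    ("blog_a", "blog_b"),
    ("blog_b", "blog_a"),
    ("wikipedia", "bbc"),
]


def build_adjacency(edges):
    """Return (successors, predecessors) as dicts of sets."""
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for source, target in edges:
        successors[source].add(target)
        predecessors[target].add(source)
    return successors, predecessors


def guessed_linkfluence(node, successors, predecessors):
    """(indegree / outdegree) * len(neighbourhood), as guessed in the text."""
    incoming = predecessors.get(node, set())
    outgoing = successors.get(node, set())
    neighbourhood = incoming | outgoing
    # Assumption: treat an outdegree of 0 as 1 to avoid dividing by zero.
    return (len(incoming) / max(len(outgoing), 1)) * len(neighbourhood)


successors, predecessors = build_adjacency(EDGES)
nodes = {name for edge in EDGES for name in edge}
scores = {n: guessed_linkfluence(n, successors, predecessors) for n in nodes}
```

Even on this toy data the big generic sites float to the top simply because everything links to them, which is the bulk problem in miniature.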

As a result, the only information I got from it was that he linked to Wikipedia, the BBC, and big news sites a lot. Well yes; Wikipedia, bbc.co.uk, etc, generate a hell of a lot of web pages and people read them a lot. Obviously, to say the least, you need to normalise the data with regard to sheer bulk, or you’d end up concluding that Google (or Bing or Yahoo) was his inspiration because he did a lot of web searches, or that he was a normal man twisted by SMTP because he used e-mail.

In fact, I thought they actually did that until I realised that RSS.org is about the other RSS, the Indian extreme-right movement, not the popular Internet syndication standard. Harrowell fail. Anyway, it does show up rather nicely that the groups “European nationalists”, “Counter-Jihad”, and “American Right-Wing” overlap. However, I feel there’s something missing in the characterisation of MEMRI and various other sites as just “Think Tanks” as if they were just like, say, IPPR.

Also, an emergent property of the data is that there is an Axis of Barking running vertically through it: the nearer you are to the top of the diagram, the more extreme and crazy. MEMRI, FrontPage, Gates of Vienna, Melanie Phillips are near the top; the Wikipedia article on the Russo-Turkish War of 1878 is at the bottom. And the MSM is somewhere in the middle. (Although I do wonder if they allocated the sources to groups before or after running the force-directed graph.)

It seems to be one of those “command the exciting world of social media with just one click!” things.

Anyway, upshot. I want to avoid Project Lobster producing a diagram like this one. It’s too impressionistic and fluffy and reliant on basically aesthetic reasoning. (I think we’ve had this point before.) Of course, that’s partly the difference between the underlying data sets; it was at least thinkable if unlikely that there would be no grouping in Breivik’s sources, while presumably political lobbying is nonrandom and subject to intelligent design.

Elsewhere, a reader passed this along which I need to actually watch (isn’t video time consuming?). There’s a shindig in Warsaw in late October. And I want this on a T-shirt.

So the government thinks this is clever. They also think it constitutes a “searchable online database”. It is not searchable, nor is it a database. It is a collection of links to department web sites, some of which actually lead to useful documents, some of which lead to utterly pointless intermediary pages, some of which lead to documents in a sensible format, some of which lead to documents in pointlessly wrong formats, and some of which lead to PDF files. It provides no clue how often this data will be released or when or where. The URIs sometimes suggest that they might be predictable, sometimes they are just random alphanumeric sequences. Basically, what he said.

Meanwhile, very few of these documents have made it onto data.gov.uk, the government’s data web site (pro-tip: the hint is in the name) which provides all that stuff out of the box. This is not just disappointing – this is actively regressive. Is it official policy to break data.gov.uk?

Anyway, I’ve been fiddling with NetworkX, the network-graph library for Python from Los Alamos National Laboratory. Sadly it doesn’t have a method networkx.earth_shattering_kaboom(). I’ve eventually decided that the visualisation paradigm I wanted was looking me in the eye all along – kc claffy‘s Skitter graph, used by CAIDA to map the Internet’s peering architecture.

The algorithm is fairly simple – nodes are located in terms of polar coordinates, on a circular chart. In the original, the concept is that you are observing from directly above the north or south pole. This gives you two dimensions – angle, or in other words, how far around the circle you are, and radius, your location on the line from the centre to the edge. claffy et al used the longitude of each Autonomous System’s WHOIS technical contact address for their angles, and the inverse of each node’s linkdegree for the radius. Linkdegree is a metric of how deeply connected any given object in the network is; taking the inverse (i.e. 1/linkdegree) meant that the more of it you have, the more central you are.
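As a rough sketch of that layout, following the description above rather than CAIDA's published formula (which may differ in detail), the conversion from longitude and linkdegree to a point on the chart looks something like this. The function name and the sample nodes are hypothetical.

```python
import math


def skitter_position(longitude_deg, linkdegree):
    """Polar layout: angle from longitude, radius from 1/linkdegree.

    Returns Cartesian (x, y) so the point can be handed to any plotting
    library. Assumes linkdegree >= 1; an isolated node has no place on
    the chart.
    """
    if linkdegree < 1:
        raise ValueError("linkdegree must be at least 1")
    angle = math.radians(longitude_deg % 360)
    radius = 1.0 / linkdegree  # more links, closer to the centre
    return radius * math.cos(angle), radius * math.sin(angle)


# Hypothetical nodes: name -> (longitude in degrees, linkdegree)
nodes = {
    "well_connected": (0.0, 4),
    "stub": (90.0, 1),
}
positions = {name: skitter_position(lon, deg) for name, (lon, deg) in nodes.items()}
```

The well-connected node lands a quarter of the way out from the centre, while the stub with a single link sits on the rim.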

My plan is to define the centre as the prime minister, and to plot the ministries at the distance from him given by the weighting I’d already given them – basically, the prime minister is 1 and the rest are progressively less starting with Treasury and working down – and an arbitrary angle. I’m going to sort them by weight, so that importance falls in a clockwise direction, for purely aesthetic reasons. Then, I’ll plot the lobbies. As they are the unknown factors, they all start with the same, small node weighting. Then add the edges – the links – which will have weights given by the weight of the ministry involved divided by the number of outside participants at that meeting, so a one-on-one is the ideal case.
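The edge weighting is the only bit of arithmetic here, so a minimal sketch may help. The ministry weights, lobby names, and meeting records below are all invented for illustration; only the rule (ministry weight divided by the number of outside participants) comes from the plan above.

```python
# Hypothetical weights: the prime minister is 1, the rest progressively less.
MINISTRY_WEIGHTS = {
    "Prime Minister": 1.0,
    "Treasury": 0.9,
    "Foreign Office": 0.8,
}

# Hypothetical meeting records: (ministry, outside participants present).
MEETINGS = [
    ("Treasury", ["Lobby A"]),
    ("Treasury", ["Lobby A", "Lobby B", "Lobby C"]),
    ("Foreign Office", ["Lobby B", "Lobby C"]),
]


def meeting_edges(meetings, ministry_weights):
    """Return one (lobby, ministry, weight) edge per participant per meeting.

    The weight is the ministry's weight shared out among everyone in the
    room, so a one-on-one meeting carries the full ministry weight.
    """
    edges = []
    for ministry, participants in meetings:
        if not participants:
            continue  # nobody from outside attended, so there is no edge
        weight = ministry_weights[ministry] / len(participants)
        for lobby in participants:
            edges.append((lobby, ministry, weight))
    return edges


edges = meeting_edges(MEETINGS, MINISTRY_WEIGHTS)
```

Lobby A's one-on-one with the Treasury is worth the full 0.9, while the same ministry split three ways is worth 0.3 to each participant.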

When we come to draw the graph, the lobbies will be plotted with the mean angle of the ministries they have meetings with, and the inverse of their linkdegree, with the node size scaled by its traffic. Traffic in this case basically means how many meetings it had. Therefore, it should be possible to see both how effective the lobbying was, from the node’s position, and how much effort was expended, from its size. The edges will be coloured by date, so as to make change over time visible. If it works, I’ll also provide some time series things – unfortunately, if the release frequency is quarterly, as it may be, this won’t be very useful.
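A sketch of the lobby placement, with two assumptions flagged loudly. First, I take “linkdegree” here to mean the sum of a lobby's edge weights, which is one plausible reading and not the only one. Second, the mean angle is computed as a circular mean, because a plain arithmetic mean of 350° and 10° comes out at 180°, the wrong side of the chart. All names and numbers are hypothetical.

```python
import math


def circular_mean(angles_rad):
    """Mean direction of a set of angles, in the range [0, 2*pi)."""
    x = sum(math.cos(a) for a in angles_rad)
    y = sum(math.sin(a) for a in angles_rad)
    return math.atan2(y, x) % (2 * math.pi)


def place_lobby(meetings, ministry_angles):
    """Return (angle, radius, size) for one lobby.

    meetings: list of (ministry, edge_weight), one entry per meeting.
    Assumption: linkdegree is the sum of the lobby's edge weights.
    """
    if not meetings:
        raise ValueError("a lobby with no meetings cannot be placed")
    angle = circular_mean([ministry_angles[m] for m, _ in meetings])
    linkdegree = sum(weight for _, weight in meetings)
    radius = 1.0 / linkdegree  # better-connected lobbies sit nearer the centre
    size = len(meetings)       # traffic: how many meetings the lobby had
    return angle, radius, size


# Hypothetical ministry angles either side of the zero line.
ministry_angles = {
    "Treasury": math.radians(10),
    "Foreign Office": math.radians(350),
}
angle, radius, size = place_lobby(
    [("Treasury", 0.9), ("Foreign Office", 0.4)], ministry_angles
)
```

With those two ministries the lobby ends up pointing at roughly 0°, between them, instead of being flung to the opposite side of the circle.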

Anyway, as always, to-do no.1 is to finish the web scraping – the Internet’s dishes. And think of a snappy name.

So we’ve discussed GCHQ and broad politics and GCHQ and technology. Now, what about a case study? Following a link from Richard Aldrich’s Warwick University homepage, here’s a nice article on FISH, the project to break the German high-grade cypher network codenamed TUNNY. You may not be surprised to know that key links in the net were named OCTOPUS (Berlin to Army Group D in the Crimea and Caucasus) and SQUID (Berlin to Army Group South). Everyone always remembers the Enigma break, but FISH is historically important because it was the one for which Bletchley Park invented the COLOSSUS computers, and also because of the extremely sensitive nature of the traffic. The Lorenz cyphersystem was intended to provide secure automated teleprinter links between strategic-level headquarters – essentially, the German army group HQs, OKW and OKH, the U-boat command deployed to France, and key civilian proconsuls in occupied Europe. The article includes a sample decrypt – nothing less than AG South commander von Weichs’ strategic appreciation for the battle of Kursk, as sent to OKH, in its entirety.

Some key points, though. It was actually surprisingly late in the day that the full power of FISH became available – it wasn’t enough to build COLOSSUS, it was also necessary to get enough of them working to fully industrialise the exploit and break everything that was coming in. This was available in time for Normandy, but a major driver of the project must have been as a form of leverage on the Americans (and the Russians). The fate of the two Colossi that the reorganised postwar GCHQ saved from the parts dump is telling – one of them was used to demonstrate that an NSA project wouldn’t work.

Also, COLOSSUS represented a turning point in the nature of British cryptanalysis. It wasn’t just a question of automating an existing exploit; the computers were there to implement a qualitatively new attack on FISH, replacing an analytical method invented by Alan Turing and John Tiltman with a statistical method invented by William Tutte. Arguably, this lost something in terms of scientific elegance – “Turingismus” could work on an intercept of any length, Tutte’s Statistical Method required masses of data to crunch and machines to crunch it on any practical timescale. But that wasn’t the point. The original exploit relied on a common security breach to work – you began by looking for two messages of similar length that began with the same key-indicator group.

Typically, this happened if the message got corrupted by radio interference or the job was interrupted and the German operators were under pressure – the temptation was just to wind back the tape and restart, rather than set up the machine all over again. In mid-1943, though, the Germans patched the system so that the key indicator group was no longer required, being replaced by a codebook distributed by couriers. The statistical attack was now the only viable one, as it depended on the fundamental architecture of FISH. Only a new cypher machine would fix it.
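Why two messages on the same setting were such a gift can be shown in a few lines. The Lorenz machine combined plaintext and key by modulo-2 addition (XOR), so XORing two cyphertexts made with the same keystream cancels the key entirely. The sketch below is a toy under stated assumptions: it uses bytes rather than 5-bit teleprinter code, a random keystream rather than one generated by wheels, and made-up messages.

```python
import secrets


def xor_bytes(a, b):
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


# Assumption: a random keystream stands in for the machine's wheel output.
keystream = secrets.token_bytes(32)

# Two hypothetical messages sent on the same setting; the second is a
# retransmission that diverges part of the way through.
first = b"ATTACK AT DAWN FROM THE NORTH"
second = b"ATTACK AT DAWN FROM NORTH"

cypher_one = xor_bytes(first, keystream)
cypher_two = xor_bytes(second, keystream)

# XORing the two cyphertexts removes the keystream completely, leaving
# only the XOR of the two plaintexts for the analyst to pick apart.
combined = xor_bytes(cypher_one, cypher_two)
plaintext_difference = xor_bytes(first, second)
```

Wherever the two messages agree, the combined stream is all zeroes, and wherever they differ the analyst has a puzzle in two plaintexts with no key in it at all. Take away the repeated setting and this shortcut disappears.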

The symbolic figure here is Tommy Flowers, the project chief engineer, a telecoms engineer borrowed from the Post Office research centre who later designed the first all-electronic telephone exchange. Max Newman, Alan Turing’s old tutor and the head of the FISH project, had shown Flowers a copy of On Computable Numbers, which Flowers read but didn’t understand – he was a hacker rather than a logician, after all. He was responsible for the shift from electromechanical technology to electronics at Bletchley, which set both Newman and Turing off towards their rival postwar stored-program computing projects.

Another key point from the book is the unity of cryptography and cryptanalysis, and the related tension between spreading good technology to allies and hoping to retain an advantage over them. Again, the fate of the machines is telling – not only did the FISH project run on, trying to break Soviet cypher networks set up using captured machines, but it seems that GCHQ encouraged some other countries to use the ex-German technology, in the knowledge that this would make their traffic very secure against everyone but the elect. Also, a major use of the surviving computers was to check British crypto material, specifically by evaluating the randomness of the keystreams involved, a task quite similar to the statistical attack on FISH.

Finally, FISH is exhibit A for the debate as to whether the whole thing has been worthwhile. What could have been achieved had the rest of the Colossi been released from the secret world, fanning out to the universities, like the scientists from Bletchley did themselves? Max Newman took racks of top-quality valves away from Bletchley when he moved to Manchester University, and used them in the very first stored-program, digital, Turing-complete computer; Alan Turing tried to do the same thing, but with a human asset, recruiting Tommy Flowers to work on the Pilot-ACE at NPL. (Flowers couldn’t make it – he had to fix the creaking UK telephone network first.) Instead, the machines were broken up and the very existence of the whole project concealed.

On the other hand, though, would either Newman or Turing have considered trying to implement their theories in hardware without the experience, to say nothing of the budget? The fact that Turing’s paper was incomprehensible to one of the most brilliant engineers of a brilliant generation doesn’t inspire confidence, and of course one of the divides that had to be crossed between Cambridge and GPO Research in Dollis Hill was one of class.

Here is a really fascinating interview with David Dunning, of Dunning-Kruger Effect fame. As a taste, the incident that inspired Dunning:

Wheeler had walked into two Pittsburgh banks and attempted to rob them in broad daylight. What made the case peculiar is that he made no visible attempt at disguise. The surveillance tapes were key to his arrest. There he is with a gun, standing in front of a teller demanding money. Yet, when arrested, Wheeler was completely disbelieving. “But I wore the juice,” he said. Apparently, he was under the deeply misguided impression that rubbing one’s face with lemon juice rendered it invisible to video cameras.

He’d done tests.

One outcome of all the MySociety work for this election was the survey administered by DemocracyClub volunteers to all candidates. The results by party are graphed here, with standard deviations and error bars.

Some immediate conclusions: Surprising egalitarianism. Look at question 1, which asks if the budget deficit should be reduced by taxing the rich. Only the very edge of the error bar for the Conservatives touches the 50% mark; the only parties who have any candidates who don’t agree are the BNP and UKIP. Also, question 4 (“It would be a big problem if Britain became more economically unequal over the next 5 years” – agree/disagree) shows that there is a remarkable degree of consensus here. The three main parties of the Left – the Greens, Lib Dems, and Labour – overlap perfectly, and even the lower bound on the Tory percentage is over 50%. Only the ‘kippers and the fash even skim the 50% mark at the bottom end of their distributions. This may actually not be a statement about far-right thinking, because of…

Extremist internal chaos. Except for the BNP on immigration and UKIP on the EU, these two parties have huge error bars on every question. As soon as they get off that particular topic, the error bars gap out like the bid-offer spread in a crashing market. Clearly, they agree about very little other than their own particular hate-kink. So the result in my first point could just be because they always have the widest standard error and deviation.

Immigration, or a field guide to identifying British politics. If you’re a Liberal, Labour, or a Green, you’ve got no problem with immigrants. Even the upper bounds only just stroke the 50% line. All the parties of the Right, however, overlap around the 80% line. Need to identify someone’s partisan affiliation quickly? Wave an immigrant at them. The other culture-wars question about marriage is similar, although the gap is smaller and the error bars bigger.

The consensus on civil liberties. Everyone, but everyone, thinks there are far too many CCTV cameras about. All parties overlap at between 68-78%…except for Labour. Labour is the only party that supports CCTV and it supports it strongly. There is just the faintest touch of overlap between the top (i.e. least supportive) end of the Labour error range and the bottom (i.e. most supportive) of the Tories’.

Trust and honesty. Liberals, Labour, and Conservatives all think politicians are honest. No doubt this is because the respondents are themselves politicians. Interestingly, the exceptions are the BNP and UKIP. Very interestingly, the BNP is united in cynicism, whereas the UKIP error range gaps-out dramatically on this question. The Greens’ error range converges dramatically on exactly 46% agreement – they are almost perfectly in agreement that they don’t agree.

Art and culture; only ‘kippers, BNPers, and a very few extreme Tories don’t support state funding of the arts.

Britain is a European country and is committed to the European Union. You can’t argue with the data; the Tories and Greens average between 20-30% support for withdrawal, zero for the Liberals and Labour, and even the upper bound for the Tories is well under the 50% line. The BNP and UKIP want out, which is obvious and, after the election result, arguably trivial.

Pacifist fascists; bellicose conservatives; divided lefties and ‘kippers. OK, so which parties are least keen on military action against Iran, even if it is caught red-handed building a nuke? The Greens are unsurprisingly 86% against with minimal error – perhaps the only occasion they would turn down a chance to oppose nuclear power! The other is the BNP – 82% against. Who knew we would find a scenario in which the BNP would turn down a chance to kill brown people? Labour, the Liberals, and UKIP would split down the middle – they overlap perfectly around the 50% mark. The Tories, however, are the war party – 39% against, with the lower bound well clear of the other parties. The UKIP result is strange – you’d expect them to be basically like Tories or like the BNP, but they are most like Labour on this issue, although they have a tail of happy warriors. The BNP is also the party most opposed to continuing British involvement in Afghanistan – even more than the Greens. Labour, the Liberals, the Tories, and UKIP overlap heavily around being narrowly in favour, although UKIP as usual gaps out when it’s not discussing how much it hates the EU.

Even the Toriest Tories say they support UK Aid. This one’s fairly clear – even the upper bound for the Tories is well below 50% and everyone else serious is much lower. UKIP and the BNP are strongly against, but their error bars are quite wide – clearly, they’re not sure whether they hate foreigners enough that paying them not to be immigrants is a good idea.

Summary: We’re a broadly social democratic European nation, with a few nutters for comic relief. And Chris Lightfoot’s Political Survey results (the primary axis in British politics is liberty-vs-authority, strongly correlated with internationalism-vs-isolationism, and the secondary axis is egalitarianism-vs-libertarianism, but there is surprisingly little variance along it) from 2005 appear to be confirmed.

An interesting isotope is detected in the CRU report fall-out plume. Apart from the very high concentrations of concern-troll, tone-troll, and pure drivel, there is something worth learning from.

For this reason, many software professionals encountering science software for the first time may be horrified. How, they ask, can we rely on this crude software, developed in primitive conditions – by amateurs, working with such poor tools and such poor understanding of the field? This is a common reaction to GISTEMP, and is exactly the reaction which many critics have had, some very publicly, to the software published with the CRU emails. Such critics do have a point. Science software should be better than it is. Scientists should be provided with more training, and more support. But consider the uses to which science software is put. Most software written by scientists:

* consists of tiny programs;
* which will only ever be run a small number of times;
* over the course of a few weeks as it is being developed;
* by the scientist who wrote it;
* on data gathered by that scientist’s team;
* concerning a scientific field in which that scientist is expert;
* to perform data processing on which that scientist is expert;

and will be discarded, never to be used again, as soon as the paper containing the results is accepted for publication.

There are hardly any scientists today who don’t do some programming of some sort; there’s not much science that doesn’t involve churning through really big data sets. As a result, there’s a lot of it about. Which reminds me of this Eric Sink post from 2006, about the distinctions between “me-ware, us-ware, and them-ware”. Me-ware is software that you write and only you use; us-ware is software that is used by the same organisation that produces it; them-ware is software that is produced by a software company or open-source project for the general public.

There’s a gradient of difficulty; the further from you the end-user is, the less you know about their needs. On the other hand, if you’re just trying to twiddle the chunks to fit through the ChunkCo Chunkstrainer without needing to buy a ChunkCo Hyperchunk, well, although you know just how big they are, you’re unlikely to spend time building a pretty user interface or doing code reviews. Which only matters up to a point; nobody else would bother solving your problem.

But this can bite you on the arse, which is what happened to the climate researchers. It’s fair to say that if you’re processing a scientific data set, what actually matters is the data, or the mathematical operation you want to do to it. You won’t get the paper into Nature because you hacked up a really elegant list comp or whatever; they won’t refuse it because the code is ugly. Anyone who wants to replicate your results will probably roll their own.

This is OK, but the failure mode is when the political equivalent of Brian Coat comes snooping around your #comments or lack of them. Perhaps I should tidy up the Vfeed scripts while I’m at it.

If you think the Superfreaks had demonstrated the truth of the Dunning-Kruger effect well enough, especially after this further hammering, and their attempt to gain everyone’s esteem by having NewsCorp send out copyright nastygrams, think again.

Here’s some science, via Lou Grinzo’s blog. We’ve been taking very, very thin samples of the leafmould in the bottom of a rather special Irish lake (peat – not much oxygen, so things *last*), and it’s possible to draw some interesting conclusions about the Younger Dryas event, which flipped the planet into an ice age 13,000 years ago after a huge ice barrier in North America collapsed and let vast amounts of fresh water pour into the Atlantic.

The killer detail, literally: the new ice age kicked off within months. We had thought it took decades, but instead it tore in within a year. A year. No time to adjust; not even that much time to flee.

This should surely kill off any daft ideas of fiddling with the atmosphere. Shouldn’t it?

While we’re kicking the remains of Superfreakonomics around the car park, here’s something else. Via Kevin Drum, it seems that John Meriwether, the chap whose hedge fund LTCM nearly killed the banking sector in 1998, has started another hedge fund, a few months after his come-back ended up being crushed under the financial panic of 2008.

As a comment at RealClimate says with regard to Dubner and Levitt:

So, do Levitt and Dubner list Dunning and Kruger as co-authors on this chapter?

I think you’ll agree this comment wins the Internet. Meriwether seems to be the ideal type of a certain kind of intellectual failure mode, almost an American original – the man obsessed by the notion that his (almost always) numerical expertise makes him an all-round expert.

Nathan Myhrvold is another – he even called his post-Microsoft hobby company “Intellectual Ventures”, which ought to put a rough value on the degree of wankishness we’re dealing with here. But, of course, they aren’t intellectuals or even technocrats; they specialise in hyper-specialisation, rather than any broader culture, and by the time they reach this point they are usually many years from dealing with anything practical.

In fact, as a cultural type, they’re almost Soviet figures; believers that if you can get that input-output table\Gaussian copula just right, we’ll be able to achieve the new man and true communism\hedge the entire economy perfectly.

One consequence of the whole Superfreakonomics fiasco, which has been thoroughly reported elsewhere in the blogosphere, is that I’ve changed my mind about geoengineering ideas. Up until now, I was of the opinion that the various proposals to check climate change by doing various things to the atmosphere or the oceans were no substitute for reducing CO2 emissions, but they were worth at least studying in order to have an emergency reserve option. And in fact, I always liked the stratospheric sulphur one because it didn’t involve massive space structures and it was, at least theoretically, reversible – the stuff rains out within weeks to months, so it’s possible to switch the thing off.

I also preferred it because one version – very differently from the daft 18-mile hose with helium balloons and sharks with lasers etc – involved simply changing the specification for Jet-A1 aviation fuel in order to let the refinery leave more sulphur in it. Rather than all that incredibly complicated and expensive fantasy engineering – what James Nicoll would call our viewgraph future – this could be done cheaply.

But the Superfreaks have permitted me, at least, to think this through further. The problems with any climate-engineering, rather than emissions-engineering, approach are just too bad.

The mechanism of action for the sulphate plan is basically that it creates more nuclei for water droplets to coalesce around, and therefore creates high-altitude clouds that reflect heat back out to space. Unfortunately, the nuclei are particles of various sulphate compounds, and when they dissolve in water with sunlight, you get sulphuric acid and hydrogen ions; acid rain. And the plan implies doing this globally, so rather than just damaging forests in northern Europe like we used to, we’d be acidifying the sea at the same time as we’d be acidifying it anyway by asking it to take up a whole lot of CO2.

Then there are the consequences in terms of meteorology rather than climatology, about which the best that can be said is “we have no idea” and the worst that can be said is “there is a nontrivial chance of losing the monsoon, gaining nastier hurricanes, or maybe both”. In fact, these are worryingly like the consequences of acute climate change themselves, which tends to make you wonder what the point is.

Like the really bad climate change scenarios, these all carry a lot of political risk as well; Ken Caldeira, who was misquoted in the book and who came up with the sulphate plan in the first place, remarked back in the 1980s that one solution to climate change would be a nuclear war, which looked if anything more likely at the time. (For the inevitable hard of thinking troll, his point was that a nuclear war would both fill the upper atmosphere with cloud-seeding dusts of various kinds, and effectively stop humanity emitting lots of CO2, by destroying industrial civilisation – not that this was a desirable option.)

Unfortunately, anything that risks the Indians running out of rain, the Chinese out of drinking water, or the Americans out of coastal cities, is by definition a threat of the same class as a medium-sized ballistic missile attack, and you can bet that the powers in question would draw the conclusions that follow from that.

And also, there’s a serious class break for all climate-engineering plans; what happens if it works for an extended period of time, but for some reason we have to stop? If the underlying problem isn’t addressed, as soon as the sulphates rained out, the world would heat up until it hit radiative equilibrium right then, which is as good as any definition of the end of the world. This is the flip side of the advantage of reversibility.

There are, of course, alternative approaches; I think of them as emissions-engineering rather than climate-engineering. They essentially aim to absorb the CO2 rather than change other parts of the system’s response, and are as such significantly less difficult, because they have a more direct feedback loop. Unfortunately, one of the most promising – feeding the ocean plankton – has been subjected to a large-scale trial and doesn’t seem to work, at least not reliably.

I can still imagine a scenario where these could come in handy; but I’m increasingly convinced that nobody who floats them is being serious.

I love the fact the phrase “snake palaeothermometry” exists, and even more that it defines an actual scientific experiment.

(I should probably have a “reptiles” tag, seeing as I just blogged about David Miliband and I’m about to mention Sir Nicholas Winterton.)