Archive for July, 2008

According to the BBC, the Home Office really, really doesn’t get the basic truth that 0.01% of a really big number is quite a big number. The Torygraph reported that the Criminal Records Bureau had mistakenly told its customers between February 2007 and February 2008 that some 680 people had criminal records when in fact they had none. The Home Office’s response:

The Home Office said CRB has a 99.98% accuracy rate in vetting people working with children and vulnerable adults.

Indeed. I keep saying this: 99.98% accuracy, which is the politician’s way of saying a 0.02% failure rate, is only good enough if 0.02% of the total isn’t itself a large number. It must seem silly to people outside the telecoms business that we go on about 99.999% reliability. But that is a percentage of up to hundreds of millions of calls and signalling events.

Fortunately, there are some numbers in the story. The Home Office claims that 80,000 (a round number, but we’ve got nothing else to go on) people were prevented from taking up posts involving “vulnerable people”; there’s no way of telling whether this means only ones involving “vulnerable people”, only ones where a job offer was withdrawn, or just the total CRB checks that came up positive, and there’s no telling what period of time it refers to. If it was the total for 2007–08, that means the chance of a positive CRB check being a false positive is 0.85 per cent (99.15 per cent in contractorspeak). And we *haven’t* even considered the false negatives…

So where’s your 0.02 per cent now? Naturally, it’s possible that the 80,000 covers more than one year…but hold on. If it does, with some such figure recurring every year, the actual numbers are even worse. The CRB has been going since, what, 2002? That’s 13,333 refusals a year on average. We know the 680 false positives are for just one year, which would make a 5.1% false positive rate for 2007–08. (That’s 94.9% in contractorspeak.) So the Home Office’s figures cannot possibly be right: the false positives alone exceed the claimed 0.02% error budget, which would leave room for a negative number of false negatives, and that’s impossible. So we *know* that the CRB does not provide 99.98% accuracy. Surely this means the Government should be suing Capita or whoever?
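The arithmetic above fits in a few lines of Python. The 80,000 refusals, 680 false positives, and six-year lifespan (2002–2008) are the figures from the story; everything else follows from them:

```python
# Back-of-envelope check of the CRB figures, using the numbers in the story.
refusals = 80000          # people refused posts (the Home Office's round number)
false_positives = 680     # mistaken records reported for Feb 2007 - Feb 2008

# Scenario 1: the 80,000 refusals all fall in 2007-08.
rate_one_year = false_positives / refusals
print("One-year scenario: {:.2%} of positives are false".format(rate_one_year))

# Scenario 2: the 80,000 is spread over the CRB's six years (2002-2008).
refusals_per_year = refusals / 6
rate_spread = false_positives / refusals_per_year
print("{:.0f} refusals a year on average".format(refusals_per_year))
print("Spread scenario: {:.1%} false positive rate for 2007-08".format(rate_spread))
```

The first scenario gives 0.85%, the second 5.1% — both comfortably above the claimed 0.02% failure rate.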

Here is the presentation I delivered at OpenTech 2008:

I’d publish the text, but I didn’t prepare a text :-)

Anyway, the ViktorFeed is a development of basic Python scripts I’ve been using for some time to collect data on certain aircraft movements through Sharjah and Dubai airports. Both airports publish all movements on the Web, but neither provides anything like an RSS feed, which is why I began scripting: to save myself checking them by hand. (You can read about this phase in the Political Pathetic Python posts on this blog.)

The current version works as follows: the web pages involved are loaded and a BeautifulSoup instance created for each one. If a page fails to load and an IOError is raised, that page is skipped and a default message added. Data is extracted using BeautifulSoup’s find method in list comprehensions, so that each flight is represented by a tuple of values in a list. For each flight, the tuple is unpacked and each item in it assigned to a named variable. If the airline name is found in a whitelist, the tuple is discarded. Otherwise, various standard items – for example, the name of the airport the flight arrived at or departed from – are added, the time value is processed to provide both a readable time and a time in seconds since the epoch, and a database is queried for the geographical locations of the source and destination.
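The flow just described can be sketched roughly like this. The BeautifulSoup parsing stage is elided (here a stand-in function yields the scraped tuples), and the whitelist entries, field names, and sample flights are all invented for illustration — this is not the actual ViktorFeed code:

```python
# Sketch of the per-page loop: load, skip on IOError, unpack tuples,
# drop whitelisted airlines, compute a seconds value from the time field.
WHITELIST = {"Emirates", "Air Arabia"}  # airlines to ignore (hypothetical list)

def fetch_flights(url):
    """Stand-in for the load-and-BeautifulSoup stage."""
    if url is None:                      # simulate a page that fails to load
        raise IOError("page unavailable")
    # (flight_no, airline, other_airport, hh:mm) tuples, as if scraped
    return [("6N403", "Click Airways", "Kabul", "06:30"),
            ("EK001", "Emirates", "London", "07:15")]

def process(urls, airport="Sharjah"):
    flights = []
    for url in urls:
        try:
            rows = fetch_flights(url)
        except IOError:
            # page failed to load: skip it and record a default message
            flights.append({"error": "could not load page"})
            continue
        for row in rows:
            flight_no, airline, other, hhmm = row   # unpack the tuple
            if airline in WHITELIST:                # whitelisted: discard
                continue
            hh, mm = hhmm.split(":")
            secs = int(hh) * 3600 + int(mm) * 60    # crude seconds value
            flights.append({"flight": flight_no, "airline": airline,
                            "from": other, "at": airport,
                            "time": hhmm, "secs": secs})
    return flights
```

The real script also looks up geocodes at this point; that step is omitted here.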

In the event that a location is not given or not found, a default value is used and a message added; the default location is in the Bermuda Triangle, thanks to Soizick. The values are reassembled as a dictionary and appended to a list. When all pages have been processed, the contents of this list are decorated with the time values in seconds since the epoch, and sorted into reverse chronological order. This version is then undecorated, and the individual flights are used to create a Simple GeoRSS file through Python string formatting, which is encoded as Unicode and written out to disk.
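The decorate-sort-undecorate step is the classic Python idiom; in miniature, and with made-up flight records, it looks like this:

```python
# Decorate-sort-undecorate: pair each flight dict with its epoch-seconds
# value, sort on that value newest-first, then strip the key off again.
flights = [{"flight": "A", "secs": 300},
           {"flight": "B", "secs": 900},
           {"flight": "C", "secs": 600}]

decorated = [(f["secs"], f) for f in flights]   # decorate with the sort key
decorated.sort(reverse=True)                    # reverse chronological order
ordered = [f for _, f in decorated]             # undecorate
```

(These days `sorted(flights, key=lambda f: f["secs"], reverse=True)` does the same job in one line, but the decorate-sort-undecorate form matches the description above.)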

Items in the file consist of the date-time group in the title field; the source, destination, airline, and flight number in the description field; a GeoRSS line tag with the source and destination geocodes; and the current time and date in the pubDate field. This data can be visualised in Google Maps quite simply. The test version was served from my laptop, using the SimpleHTTPRequestHandler and ForkingTCPServer classes and port forwarding.
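An item built that way, through plain string formatting, comes out something like the following. The element layout follows the description above, but the sample flight and coordinates are invented, and the actual template in the script may differ:

```python
# One Simple GeoRSS <item>, assembled by string formatting as described.
ITEM = """<item>
<title>{dtg}</title>
<description>{src} to {dst}, {airline} flight {flightno}</description>
<georss:line>{src_lat} {src_lon} {dst_lat} {dst_lon}</georss:line>
<pubDate>{pubdate}</pubDate>
</item>"""

item = ITEM.format(dtg="041230Z JUL 08",
                   src="Sharjah", dst="Kabul",
                   airline="Click Airways", flightno="6N403",
                   src_lat=25.33, src_lon=55.52,
                   dst_lat=34.57, dst_lon=69.21,
                   pubdate="Fri, 04 Jul 2008 12:30:00 GMT")
```

In Simple GeoRSS the `georss:line` element is just space-separated lat/lon pairs, which is why plain formatting is enough — no XML library required.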

Things to do: get it going on a permanent Web presence, refactor the code into a slightly less ugly mess, keep all the flights in a database, make it possible to query past movements.

Back in 2004, this blog went to the European Social Forum – we weren’t that impressed, but we did call it “the Caesar’s Palace of Ranting”. I’m not sure what the equivalent for the UKUUG’s OpenTech 2008 would be; there was plenty of ranting, but a sight less committee wank, more practicality, even if no-one can answer the question of what any of this stuff stands for. I ran into, among others, Liz Henry, most of MySociety, the author of Spyblog (who has some damn good war stories), various readers including Duane Griffin, and a small galaxy of assorted hackers, militants, gawpers, freaks and mutants. Good People, as the Doctor would say.

And they are, too; even if the live demonstration of the ViktorFeed didn’t happen due to the lack of a routable IP address (or even working connectivity for that matter), there was the loan of another laptop when OpenSUSE didn’t want to speak to the projector. When I’d finished the show and dealt with all the questions, I was faced with at least two offers of colocated server capacity, and the services of at least three professional software developers, as well as an interview for the BBC World Service, a spare USB key, and a pint of lager. All of which would have come in handy the night before, when I foolishly attempted to change something in the code after midnight and borked the whole thing, forcing me to get up at six the next morning to fix it.

As it turns out, having met Francis Irving, I’m probably going to be assimilated by MySociety, or at least my project is. I was also very interested in some of the green/geek crossover projects – I missed the session on solar power and IT, but I did get to the AMEE presentation on their automated carbon dioxide profiler and Hotmapping’s show of their IR surveying work, intended to classify buildings by the rate at which they lose heat. Apparently they’d already found one urban cannabis farm.

And BT Osmosoft’s TiddlyWiki – a wiki in a single file – may not sound like much, but I really liked the idea of a zoomable, pseudo-3D interface for wikis. I’m quite keen on the idea of using this to organise contacts – who puts their friends in alphabetical order, after all?