Archive for the ‘open source’ Category

So I had the opportunity to take part in an Augmented Reality standardisation meeting on the fringe of this year’s 3GSM Mobile World Congress. First of all, it was the year the heavens opened (someone on twitter said it was as if the show had turned into Glastonbury) and I got drenched and my shoes went bad, and my cab didn’t take me to the Telefonica R&D building in Via Augusta but instead to the main switching centre, this amazingly domineering building…

Telcos – they live in places like this, they know where your dog goes to school, but can they tell you if it’s really your bank on the line?

So I got soaked again, and eventually arrived, and spent the first session listening to my shoes rotting. I acted as scribe for the session on AR browser implementations, markup language vs. JSON, native application vs. browser plugin and the like. I hope I contributed something of value. I have a Flickr set of the annotated flip charts here; I’ve been asked to help prepare the final report. Which just goes to show the enduring truth that if you want to influence something, wait until the very end and sum up with a balanced account. Supposedly this used to be the way to pass the Diplomatic Service exams – buy a pipe, puff on it occasionally during the team exercise, then “sum up with a balanced account”. But you’re not allowed to smoke these days.



I’m not quite as sceptical as some about this. However, it’s not clear to me how this differs from the sort of thing UNOSAT does all the time – here’s their analysis of imagery over Abyei, the key border area between North and South Sudan. Actually it looks like the “Enough Project” is going to be using UNOSAT imagery itself, going by UNOSAT’s own website.

If you follow the link you’ll see that they have more than reasonable capability (50cm resolution) and that they routinely observe the presence of refugees/displaced persons and returnees, construction, and the like. There’s obvious relevance to an effort to monitor potential conflict along the border, especially as oil prospecting is an issue. You can’t easily hide oil exploration from a satellite that can resolve objects 50cm across.

However, the downside is that the UNOSAT report is comparing images over a two-year period. I would suspect that they will need much more frequent passes to be operationally responsive, which is where the costs get interesting.

Also, I’ve just been over to the website and it’s a bit of an unstructured clickaround. What I’ve always liked about MySociety sites is that they all have a function – FixMyStreet reports things in your street that need fixing, WDTK issues Freedom of Information Act requests, TWFY looks up information on MPs, TheStraightChoice logged what candidates promised and said about each other during their campaigns. DemocracyClub, for example, worked because as soon as you logged in it gave you something to do and some feedback about doing it, and then it hassled you to do something more. It had structure.

Notoriously, if you don’t give volunteers something to do as soon as they show up, they’ll wander off. It is nowhere easier to wander off than on the Internet. And so there’s a button to twitbookspace it and a donation link. There isn’t, however, a to-do list or, say, a list of pairs of images that need comparing.

So someone’s trying to raise $150,000 to buy a satellite from the bankruptcy of TerreStar, in order to “Connect Everyone”. I admire the aim, but I’m concerned that this is going to be a round of forgetting that a lot of perfectly good GSM operators are doing just that. Also, I can’t find any reference to what they intend to use for the customer-premises equipment, except that “we’re building an open source low cost modem” – a claim that would carry more weight with a link to the source repo, or at least some requirements documentation. I’m also a little concerned that the team includes this guy:

Fabian is a NYC based Swiss wanna-be-entrepreneur who spends all his time trying to make meaningful connections between ourselves and business.

(and I chose charitably) but not anyone whose potted bio mentions being an RF engineer.

Actually, I think that it would be more worthwhile to start off with the low-cost open source satellite radio, as this may be the difficult bit and would be highly reusable in other projects. A lot of Indian or African GSM people would find a cheap satellite radio very useful for their backhaul requirements. Depending on the spec it could be used with things like the amateur radio AMSATs, the transponders on the ISS, and the spare US Navy FLTSATCOMs. USRP is way too expensive at the moment (they cost more than a cheap netbook) so that one’s out.

So I was moaning about the Government and the release of lists of meetings with external organisations. Well, what about some action? I’ve written a scraper that aggregates all the existing data and sticks it in a sinister database. At the moment, the Cabinet Office, DEFRA, and the Scottish Office have coughed up the files and are all included. I’m going to add more departments as they become available. Scraperwiki seems to be a bit sporky this evening; the whole thing has run to completion, although for some reason you can’t see all the data, and I’ve added the link to the UK Open Government Licence twice without it being saved.

A couple of technical points: to start with, I’d like to thank this guy, who wrote an alternative to the Python csv module’s wonderful DictReader class. DictReader is lovely because it lets you open a CSV (or indeed anything-separated value) file and keep the rows of data linked to their column headers, as Python dictionaries. Unfortunately, it won’t handle Unicode input – nothing beyond plain ASCII or UTF-8 byte strings. Which is a problem if you’re Chinese, or, as it happens, if you want to read documents produced by Windows users, as they tend to use Really Strange characters for trivial things like apostrophes (\x92, can you believe it?). This, however, will process whatever encoding you give it and still give you dictionaries. Thanks!
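
In today’s Python 3 the same trick mostly reduces to decoding the bytes explicitly before handing them to DictReader; a minimal sketch (the sample data and helper name are mine, not the linked module’s):

```python
import csv
import io

def dict_rows(raw_bytes, encoding="cp1252"):
    """Decode the raw file in whatever encoding it actually uses,
    then let DictReader pair each row up with the header line."""
    return list(csv.DictReader(io.StringIO(raw_bytes.decode(encoding))))

# A Windows-produced file: \x92 is cp1252's right single quote.
raw = b"Minister,Purpose of meeting\r\nLord Foo,Discussion of dept\x92s budget\r\n"
rows = dict_rows(raw)
# rows[0]["Purpose of meeting"] now ends with a proper curly apostrophe
```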

I also discovered something fun about ScraperWiki itself. It’s surprisingly clever under the bonnet – I was aware of various smart things with User Mode Linux and heavy parallelisation going on, and I recall Julian Todd talking about his plans to design a new scaling architecture based on lots of SQLite databases in RAM as read-slaves. Anyway, I had kept some URIs in a list, which I was then planning to loop through, retrieving the data and processing it. One of the URIs, DEFRA’s, ended like so: oct2010.csv.

Obviously, I liked the idea of generating the filename programmatically, in the expectation of future releases of data. For some reason, though, the parsing kept failing as soon as it got to the DEFRA page. Weirdly, what was happening was that the parser would run into a chunk of HTML and, obviously enough, choke. But there was no HTML. Bizarre. Eventually I thought to look in the Scraperwiki debugger’s Sources tab. To my considerable surprise, all the URIs were being loaded at once, in parallel, before the processing of the first file began. This was entirely different from the flow of control in my program, and as a result, the filename was not generated before the HTTP request was issued. DEFRA was 404ing, and because the csv module takes a file object rather than a string, I was using urllib.urlretrieve() rather than urlopen() or scraperwiki.scrape(). Hence the HTML.

So, Scraperwiki does a silent optimisation and loads all your data sources in parallel on startup. Quite cool, but I have to say that some documentation of this feature might be nice, as multithreading is usually meant to be voluntary:-)
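
The workaround is to fetch the page as a string once the URL actually exists and wrap it in a file-like object for csv, rather than letting urlretrieve() write a file to disk; a sketch (the URL here is hypothetical, and the literal string stands in for the download):

```python
import csv
import io

def rows_from_csv_text(csv_text):
    """csv wants a file-like object, not a string -- but io.StringIO
    satisfies it without touching the disk, so there's no need for
    urlretrieve() and no 404 error page masquerading as a CSV file."""
    return list(csv.DictReader(io.StringIO(csv_text)))

# Generate the filename programmatically, *then* fetch -- in the real
# scraper csv_text would come from scraperwiki.scrape(url) or urlopen().
month, year = "oct", 2010
url = "https://example.gov.uk/meetings/%s%d.csv" % (month, year)  # hypothetical URL

csv_text = "Date,Name of External Org\n13/10/2010,Acme plc\n"  # stand-in for the download
rows = rows_from_csv_text(csv_text)
```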

TODO, meanwhile: at the moment, all the organisations that take part in a given meeting are lumped together. I want to break them out, to facilitate counting the heaviest lobbyists and feeding visualisation tools. Also, I’d like to clean up the “Purpose of meeting” field so as to be able to do the same for subject matter.

Update: Slight return. Fixed the unique keying requirement by creating a unique meeting id.

Update Update: Would anyone prefer if the data output schema was link-oriented rather than event-oriented? At the moment it preserves the underlying structure of the data releases, which have one row for each meeting. It might be better, when I come to expand the Name of External Org field, to have a row per relationship, i.e. edge in the network. This would help a lot with visualisation. In that case, I’d create a non-unique meeting identifier to make it possible to recreate the meetings by grouping on that key, and instead have a unique constraint on an identifier for each link.
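
For what it’s worth, the expansion itself is only a few lines; a sketch with made-up field names, showing the non-unique meeting key and the unique link key:

```python
def to_edges(meetings):
    """Turn one-row-per-meeting records into one row per
    (meeting, external organisation) pair -- one edge in the
    lobbying network. meeting_id is deliberately non-unique, so
    grouping on it recreates the original meetings; link_id
    carries the unique constraint instead."""
    edges = []
    for meeting_id, m in enumerate(meetings):
        for org in m["Name of External Org"].split(";"):
            edges.append({
                "link_id": len(edges),
                "meeting_id": meeting_id,
                "Minister": m["Minister"],
                "org": org.strip(),
            })
    return edges

meetings = [{"Minister": "Lord Foo", "Name of External Org": "Acme plc; British Widgets"}]
edges = to_edges(meetings)  # two edges sharing one meeting_id
```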

Update Update Update: So I made one.

Progress update on fixing the Vfeed.

Dubai Airport has done something awful to their Web site. Where once flights were organised in table rows with class names like “data-row2”, now exactly half the rows are like that; the flights have been split between separate arrival, departure, and cargo-only pages, each showing only the latest dozen or so movements; and the rows that aren’t “data-row2” have no class attributes at all, just random HTML colours.

And the airline names have disappeared, replaced by their logos as GIFs. Unhelpful, but then, why should they want to help me?

Anyway, I’ve solved the parsing issue with the following horrible hack.
output = [
    [td.string or td.img["src"] for td in tr.findAll(True) if td.string or td.img]
    for tr in soup.findAll('tr', bgcolor=lambda value: value in ('White', '#F7F7DE'))
]

As it happened, I later realised I didn’t need to bother grabbing the logo filenames in order to extract airline identifiers from them, so the td.img["src"] bit can be dropped.

But it looks like I’m going to need to do the lookup from ICAO or IATA identifiers to airline names myself, which is necessary to avoid having to remake the whitelist, the database, and the stats script. Fortunately, there’s a list on Wikipedia. The good news is that I’ve come up with a way of differentiating the ICAO and IATA identifiers in the flight numbers. ICAO codes are always three alphabetical characters; IATA codes are two alphanumeric characters, which aren’t necessarily globally unique. In a flight number, either can be followed by a number of variable length.

But if the third character in the flight number is a digit, the first two must be an IATA identifier; if it’s a letter, the first three must be an ICAO identifier.
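
That rule fits in a couple of lines; a sketch (the function name is mine):

```python
def airline_code(flight_number):
    """Split a flight number into (code_type, code) using the rule
    above: a digit in third position means the first two characters
    are an IATA code; a letter means the first three are ICAO."""
    if flight_number[2].isdigit():
        return ("IATA", flight_number[:2])
    return ("ICAO", flight_number[:3])

airline_code("EK201")   # Emirates by its two-character IATA code
airline_code("UAE201")  # the same flight by its three-letter ICAO code
```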

Ill-coordinated links. Great news in RepRapping – South Korean scientists have succeeded in getting bacteria to make polylactic acid. PLA is the RepRap project’s favourite feedstock because it’s a reasonably tractable, general purpose plastic that can be synthesised from starch. The synthesis is not exactly simple, which is why outsourcing the job to germs is interesting. As the kit of parts now costs about £395, I really ought to get started with one of these. Now there’s a Christmas present for you. “Engineered bacteria not included.” MUM! YOU FORGOT THE GERMS!

The uranium-enrichment deal with Iran is still on, but they are looking for stronger guarantees of getting the promised fuel for their research reactor. I reckon this is going to come down to the exact number of kilos that leave at a time, and therefore to a fine judgment about the efficiency of their centrifuges.

Spencer Ackerman mourns a great Mod shop. I remember that Klass Clothing in Leeds was about the first business of any kind in town to have a Web site, apart from these guys for obvious reasons. That’s gone, as is Sam Walker in Covent Garden…and possibly even the SL1200!

Wired reviews a book on the media of the Middle East, The Media Relations Department of Hizbollah Wishes You a Happy Birthday. Well, even pirates have press spokesmen these days. It sounds like it could be interesting, but it strikes me that this piece by Tom Griffin about trolls sponsored by various Middle Eastern actors is its critical, rebellious twin.

The GLORIA Center at IDC gathered about thirty Israeli bloggers and members of Israel’s foreign and defense ministries for an informal gathering to evaluate the blogging effort during the Gaza war, new techniques and future challenges. Topics discussed included lessons of the Gaza battle for blogalogical warfare, live-blogging, new technologies and interactions with government. Bloggers delivered short presentations on their personal experiences and discussed future plans for cooperation….

Who wouldn’t want to be a fly on the wall? It practically glows with a radioactive mixture of trollishness, self-righteousness, and raging, thinktank/intern ambition. A weaponised version of MessageSpace. You’ll laugh; you’ll cry; you’ll read up on freeze-distilling your own hydrogen peroxide to escape all this hideousness!

As always, if you want a practical policy recommendation, make tools. A little investment in annoying javascript thingies pays off hugely by improving the productivity of your trolls; and it doesn’t have to be technically very interesting.

In Italy, meanwhile, they’ve got a truly impressive legislation tracker going.

It allows one to follow an act in its path across the two perfectly symmetrical chambers (La Camera and Il Senato), from its presentation as a proposal, to its final approval.

It tracks all the votes, highlighting rebel voters. It tracks who presented an act, and whether as a first signer or a co-signer. It also tracks officials’ speeches on given acts.

Access to the textual documents related to an act is easy, and documents can be emended by users online using an innovative shared-comments system (eMend) that allows discussion of a particular act to take place.

Users can describe the acts in their own words in a wiki subsystem, and acts can be rated and commented on, too.

All acts are tagged with consistent subject keywords by an editorial board, which makes it possible to know what’s going on, and who’s doing what, in relation to a subject.

An event-handling subsystem generates news. Whenever an act is presented, moves towards approval or rejection, a vote takes place, someone gives a speech, or anything else worth noticing happens, a news item is generated. A dedicated web page and a customised daily e-mail, containing just the news related to the acts, politicians, or topics the user monitors, allow him or her to follow what’s going on almost in real time.

Pretty cool; better than anything we’ve got. And, I think, that’s much more a piece of real citizen technology than any of the TwitBook propaganda apps, which are all about creating a sense of participation; possibly, they actually exist in order to provide that sense as a substitute for real participation, in order to prevent it.

If that’s not hardcore enough for you, the Make blog has a HOWTO on listening to satellites.

So what do we need to know about a parliamentary bill?

First of all, as soon as a piece of legislation is published, it has certain meta-data. Date originated; originating department; originating MP; originating house; type – primary legislation, order in council, statutory instrument; current status (pre-legislative/Green/White Paper, first reading, second reading, committee, report, third reading, Royal Assent, repealed/superseded). And, of course, a unique identifier. But bills aren’t isolated; they amend, supersede, or repeal other legislation, so every Bill object needs to keep this information as well.

And if it’s secondary legislation, it has dependencies on at least one past Act of Parliament, so anything with the types order-in-council or statutory instrument has to track which Acts it inherits from. Similarly, a primary Bill may create possible secondary legislation.

Now we need to look at the revisions. Once the bill is published, it starts to attract changes; but it remains the same bill. So we need to have further rows which are permanently associated with the original bill, but uniquely identifiable in themselves. It’s probably simplest to keep only the changes at each step, because much of the point of the whole project is to monitor the changes. It feels right to me, if nothing else, to consider all the texts of a bill to be revisions, contained within the bill wrapper.

So a revision contains the title, the text in its sections, the status of the text, the originating organisation, if possible the originating MPs, the timestamp, and the amend/supersede/repeal/inherit information, and a revision ID. At each revision stage, a new item is added, until the final version gets Royal Assent; it would make sense to sort them in reverse chronological order and make the most recent version the default that is retrieved when that bill is requested.

This gives us a reasonable database of legislation, but it’s not going to be much use; for that we need some more comprehensible semantics. So each bill needs both a summary and some category tags, and both bills and revisions will need to let users add their own tags and notes. Add those fields too. We’ll also need links to the debates at each stage – chuck in a URI field for Hansard in each Revision.

Summing up in object-oriented terms, we’ve got a class called Bill, which carries attributes for the various metadata we’ve described, and a subclass called Revision, which provides all the fields for each revision but always inherits the metadata and unique identifier of the Bill that created it, and possibly a further subclass of Revision called Comment to contain user notes. Further, Bill needs a method Amend that creates a new Revision with the amending text, which remains provisional (inheriting the amending Bill’s current status) until the amending Bill is finalised. Of course, if we implemented it in something like Django the code could be precisely that.
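
As a plain-Python sketch of that design (attribute names are mine, and real Django models would declare fields rather than take constructor arguments; Revision is kept as a separate class holding its parent’s id, which matches the database layout):

```python
import itertools

class Bill:
    _ids = itertools.count(1)

    def __init__(self, title, department, house, bill_type):
        self.bill_id = next(Bill._ids)  # unique id shared by all its revisions
        self.title = title
        self.department = department
        self.house = house
        self.bill_type = bill_type      # primary / order in council / statutory instrument
        self.status = "first reading"
        self.revisions = []             # newest first, so [0] is the default text

    def amend(self, text, amending_bill=None):
        """Add a Revision; one driven by another bill stays provisional,
        inheriting that amending bill's current status."""
        status = amending_bill.status if amending_bill else self.status
        rev = Revision(self, text, status)
        self.revisions.insert(0, rev)
        return rev

class Revision:
    def __init__(self, bill, text, status):
        self.bill_id = bill.bill_id                  # inherited from the parent Bill
        self.revision_id = len(bill.revisions) + 1   # unique within that Bill only
        self.text = text
        self.status = status

bill = Bill("Example Bill", "Cabinet Office", "Commons", "primary")
first = bill.amend("the text as introduced")
```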

In database terms, each Bill is a row with a primary key that uniquely identifies the bill and all its revisions and comments; each Revision and Comment is a row which has the same key as its parent Bill and a key which identifies it in the context of that Bill.

Update: Comments point out that a Comment shouldn’t be a subclass of Bill, because it’s not legislation itself: subclassing implies an is-a relationship, and what we want here is has-a. Good point; actually, commentary should probably be logically parallel to the actual text of legislation, but related to it – Commentary, with subclass UserComment, linked by the bill and revision IDs to the actual text.

And Dsquared tells us that the German Bundestag already has a public version control system for legislation! Here it is; it’s very complete and logical, I’ll say that for it, but there is no facility to annotate anything. But if you want to know precisely what the Baden-Württemberg delegation wanted to change in the law on modernisation of accounting requirements in the Federal Council’s Committee stage, it gets you there in two clicks from the search page. User experience design does not mean making things pretty.

What is the legacy of the so-called “loony left”? The conventional wisdom is clear; it was all their fault, for panicking the swing voters and preventing a sensible, Newish Labour solution emerging earlier. Well, how did that work out?

And it has always seemed disingenuous for the Labour Party establishment to blame local councillors for a period when the party’s central institutions were regularly totally out of contact with the public mood and spectacularly incompetent; it certainly serves the interests of the top officials and MPs to push responsibility onto an amorphous and vague stereotype essentially based on hostile newspapers’ take on the 1980s. Arguably, believing hostile newspapers’ take on itself has been the fundamental mistake of the Left since about 1987; the entire Decent Left phenomenon, after all, was all about demonising anyone who was right about Iraq in identical terms. Does anyone imagine that the Sun in the Kelvin McFuck era wouldn’t have savaged and libelled any non-Tory power holders?

In a comment at Dunc’s, Paul “Bickerstaffe Record” says:

I want to kick off a bottom up meets top down economic analysis of how Labour /Left leaning local authorities should now be challenging the Thatcherite orthodoxies of cost control/rate capping in a sort of ‘1980s no cuts militant’ meets 2000s grassroots-dictated economic policy. The institutional/legal framework has of course changed out of recognition since 1984, but heh, that’s a challenge rather than an insurmountable problem

He has a point. Consider the position; it’s still conceivable that Labour might luck into a hung parliament next year, cue Liberal and Nationalist (of various types) rejoicing, but any realistic planning has to include a high probability of a fairly rabid Tory government in the near future. Further, the financial position is not great – it’s nowhere near as bad as Gideon Osborne makes out, as a look at the gilt rates shows, but it’s very far from ideal.

So whoever is in charge will be looking for cuts, and it is a reliable principle of Whitehall politics that one of the best ways to get a policy implemented that you want for your own ideological aims is to attach it to a supposed saving. Only the special relationship and the police-media complex can beat this principle as all-purpose justifiers.

The possibility space includes a Labour government in coalition or under a toleration agreement with the Liberals, which is likely to still be strongly influenced by the Blairite stay-behind agents, a Conservative government heavily influenced by products of 80s Tory culture (the mirror image of the London Labour party in the same period), and some sort of grand-coalition slugfest. It is clear that the balance of risks is towards an effort to legitimise a lot of ugly hard-right baggage through an appeal to cuts.

The Tories are planning to make all spending departments justify their budgets at line item level to none other than William “Annington Homes” Hague; it’s certainly a first in British history that the Foreign Secretary will control the public spending settlement, if of course he finds the time to show up.

Therefore, even though there is a need to steer the public finances back towards balance once the recession is clearly looking over, there is a strategic imperative to push back and push back hard against the agendas the cross-party Right will try to smuggle through. After all, the nonsense industry is already cranking up.

Which brings me back to the importance of being loonies, and a bit of politics by walking around. One thing that strikes me about North London is how much stuff in the way of public services here was visibly built in the late 70s and the 1980s; there is a reason why Ken Livingstone hopped right back into the Mayor’s office. Despite all their best efforts, the Thatcherites were never quite able to shake the core welfare state; was it, in part, because down on the front line people were still pushing out its frontiers and changing its quality?

A lot of ideas (service-user activism, notably, but also environmentalism, a renewed concern for architecture and urbanism, and the whole identity-politics package) that were considered highly loony back then are now entirely orthodox and are likely to stay that way, especially given the main parties’ obsession with putting taxpayer funds into the “third sector”.

I fully expect that anyone who talks a good game about making black schoolboys click their heels in front of teacher – you know the stuff they like – will be able to secure reliable venture capital funding in the million class from a Cameron government, just as they have been able to from Boris Johnson’s City Hall, with remarkably little monitoring. William Hague will be snarky. Let him. Nobody cares what the Foreign Secretary has to say.

This creates both opportunities for action – perhaps someone should prepare a Creative Commons or GPL toolkit for citizen-initiated delivery quangos and thinktanks – and also targets for ruthless mockery, when the Tories’ preferred third sector entities fuck up. We’ve already had some very fine examples of this courtesy of Boris Johnson. Clearly, the only rational response to the times is to go mad.

An idea, seeing as no-one is very interested in ORGANISE and it looks like I’ll have to learn Erlang to make any impact on it.

Observation 1: The price of voice telephony is falling fast. Mobile operators provide some truly huge bundles of minutes, and there’s Skype and Co.
Observation 2: Political campaigns of all kinds often need either outbound or inbound phone banks.
Observation 3: Asterisk rocks.

Conclusion: Wouldn’t a distributed phone bank, based on Asterisk’s AGI interface, be cool?

You could: Register volunteers and their availability. Create a campaign. Send talking points to participants as they become available. Dial them up, then dial the target number, and bridge them in. Log the results of the call.

You could also use it for inbound calls – for example, to take statements after a G20-like event, to provide advice, to register participants. And you could initiate and route calls intelligently, for example, to put callers through to people near them, or to send notifications to groups of volunteers.
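
A toy sketch of the dispatch loop, with a stub standing in for the actual Asterisk AGI/AMI origination (all names here are hypothetical):

```python
from collections import deque

class PhoneBank:
    """Match available volunteers to target numbers for a campaign,
    hand each pair to a bridging function, and log the outcome."""

    def __init__(self, talking_points):
        self.talking_points = talking_points
        self.volunteers = deque()  # volunteers currently available
        self.targets = deque()     # numbers still to call
        self.log = []              # (volunteer, target, result) tuples

    def volunteer_available(self, number):
        self.volunteers.append(number)

    def add_target(self, number):
        self.targets.append(number)

    def run(self, bridge_call):
        """bridge_call stands in for the code that would send the
        talking points, dial the volunteer, dial the target, and
        bridge the two legs via Asterisk."""
        while self.volunteers and self.targets:
            vol = self.volunteers.popleft()
            target = self.targets.popleft()
            result = bridge_call(vol, target, self.talking_points)
            self.log.append((vol, target, result))

bank = PhoneBank("Ask about the postal vote.")
bank.volunteer_available("+441130000001")
bank.add_target("+442070000002")
bank.run(lambda vol, target, points: "answered")  # stub bridge
```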

Anyone interested? I raised this on the MySociety list and we’ve been discussing use cases.