Archive for January, 2009

pigeon religion

Sometimes, at the moment, politics feels like an entirely unconscious business; just a matter of the right reflexes. Like B.F. Skinner’s religious pigeons (see also here). Tap the right kneejerk and get a pellet. Hence this exchange in a US military press conference:

Q Yes. Do you have any evidence that there are more or fewer Iranian-made weapons going into Iraq?…

MR. MORRELL: I don’t have a strong indication of whether there are more or less, but I think we see persistent evidence that there continues to be Iranian support of special groups who are trying to undermine peace and security in Afghanistan. Whether it be through training or the supply of weapons such as EFPs. Frankly –

Q Afghanistan?

MR. MORRELL: You’re asking Iraq.

Q I was asking about Iraq.

MR. MORRELL: Iraq.

Iraq? Afghanistan? Somewhere you can go with BGIA, anyway. And, of course, by definition all opposition comes from Iran, special groups, flah diddysqort, Ping! It’s always been one of the attractions of state-sponsorship theories that you don’t have to think, ascribe motives, accept agency; the seemly emotion (either hysteria, or else Serious Concern, depending on status) is achievable without using the brain at all.

Of course, one effect of this is that in the end everywhere does start to look like Iraq. Afghan private military companies are becoming a problem, but then, who are we to complain when 2 US combatant commands, NATO, several allied but non-NATO states operating in ISAF, a unilateral US command, a couple of civilian organisations, Afghan police, border guards, and army are supposedly the forces of order?

After MySociety’s triumph on the MPs’ expenses issue, this looks interesting: the Lib Dems are putting out a call for geeks. This was followed up by a survey being sent out; I’ve filled it in, so I may end up spending the next election twiddling bolts on Chris Rennard’s particle accelerator. Apparently, there’s to be some sort of shindig in March.

In other news, all I could think of about this was: He writes a great blog/twittered Harry to Jack…

Suddenly, an awful wet crunching and groaning and sick heavy breathing. It’s…huge…festooned in the rags of a once-respectable suit, waving a bladeserver torn from a rack like a child’s toy…dripping with stale blood. No! The NHS IT Zombie has escaped, and it’s fortified itself by eating BT’s brains. Now it’s coming for us. DAATA! it groans. WAAAANT YOUR DAATA! Run!

Seriously: BT has recently had to spook the stock market by warning of a huge hit to profits from its Global Services big-IT division. But a careful reading of this FT story suggests that much of that, perhaps even all of it, is down to the NHS National Programme for IT, and specifically the London Region patient management contract. (The other bits are the ones that haven’t gone to ratshit yet.)

The regional patient-management segments were always the most challenging bits of the NHS NPfIT; partly this was natural, because their function – a workflow, documentation, and management information system for the entirety of a major hospital’s operations – was by far the most complex in the project. The NHS National Network is a big VPN; the Spine needs to authenticate users, validate input, write to the DB, synchronise, and retrieve; but the patient management system needs to deal with all the possible pathways patients take through the hospital.
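A toy illustration of the difference: the Spine's job is close to a fixed sequence of steps, while a patient-management system has to support every route through a graph of stages. The stage names and transitions below are hypothetical and drastically simplified (not taken from any NPfIT specification). A depth-first count finds nine distinct routes from referral to discharge even in this nine-stage toy; every one of them needs its own workflow, documents and reporting.

```python
from typing import Dict, List

# Hypothetical, drastically simplified stages of a hospital episode.
# Each key lists the stages a patient can move to next.
PATHWAYS: Dict[str, List[str]] = {
    "referral": ["outpatients", "a_and_e"],
    "a_and_e": ["admission", "discharge"],
    "outpatients": ["diagnostics", "discharge"],
    "diagnostics": ["outpatients_review", "admission"],
    "outpatients_review": ["admission", "discharge"],
    "admission": ["theatre", "ward"],
    "theatre": ["ward"],
    "ward": ["discharge"],
    "discharge": [],
}


def count_pathways(graph: Dict[str, List[str]], start: str, end: str) -> int:
    """Count distinct routes from start to end, never revisiting a stage (depth-first search)."""

    def walk(node: str, visited: frozenset) -> int:
        if node == end:
            return 1
        return sum(
            walk(nxt, visited | {nxt})
            for nxt in graph.get(node, [])
            if nxt not in visited
        )

    return walk(start, frozenset({start}))


print(count_pathways(PATHWAYS, "referral", "discharge"))  # 9
```

Add readmissions, transfers between sites and outpatient follow-ups and the count grows very quickly.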

Partly, however, it is unnatural and caused by the politics of the project. The regions don’t actually correspond to any organisational entity in the NHS – they exist only for the IT project. They therefore have to replace existing systems that vary widely inside each region and cope with organisations in different chains of command. And each region was originally meant to be implemented by a different company; now, most of them have either given up or gone bust, and BT is doing much more work than previously planned, and this of course means that it has to deal with radically varying solutions already installed.

Worst of all, though, the regions mainly exist because the Government wanted to have the job done by the Big Consultancies – Accenture, EDS, and friends – that it was used to dealing with. Assuming that they wouldn’t be interested in small contracts, the Government invented a completely new organisational level in order to sweeten the deal. They further insisted on the contracts being covered by intense secrecy, which cut off any possibility of talking to the users. And the Big Consultants proceeded to move the actual development to the US and India to save money, thus avoiding any institutional knowledge that might somehow have seeped in.

Now, it looks like BT is planning to offer a “more tailored service” to the hospitals – which sounds a lot like “doing the requirements exercise we should have done back in 2001”. Of course, it’s going to cost money and nobody knows how much yet, but I suppose it’s progress, especially as the sacking of Fujitsu from the project means that it looks more and more like a BT job (London, the ex-Fujitsu South, the national projects, and perhaps more besides).

But it’s still not too late to take radical action. Part of the original plan involved using a common data exchange standard for the whole NHS; if this exists, there’s no need for much of the rest, especially not the regions and possibly not the Spine. We could define some goals and a set of data formats, then break out the cash to the individual hospitals, trusts, etc., to use themselves. In fact, when various US, Australian and Finnish hospital sysadmins tried that, they came up with the best healthcare IT system yet. The problem with the NHS NPfIT is quite simply that it didn’t listen to the bureaucrats.
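To make “a set of data formats” concrete, here is a minimal sketch of one shared record type that any hospital's system could emit as JSON and validate on receipt, whoever supplied the software. The record fields and validation rules are hypothetical illustrations, not the NPfIT's or any standard's actual schema; the one real-world detail is the NHS number's modulus 11 check digit.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import List, Optional


@dataclass
class EncounterRecord:
    """Hypothetical shared record for one hospital stay; field names are illustrative."""
    nhs_number: str            # 10-digit NHS number
    organisation: str          # sending hospital or trust
    admitted: str              # ISO 8601 date, e.g. "2009-01-20"
    discharged: Optional[str]  # None while the patient is still in
    summary: str


def nhs_number_ok(number: str) -> bool:
    """Modulus 11 check-digit test for a 10-digit NHS number."""
    if len(number) != 10 or not number.isdecimal():
        return False
    # Weights 10, 9, ..., 2 applied to the first nine digits.
    total = sum(int(d) * w for d, w in zip(number[:9], range(10, 1, -1)))
    check = 11 - total % 11
    if check == 11:
        check = 0
    if check == 10:  # a check value of 10 means the number is invalid
        return False
    return check == int(number[9])


def validate(record: EncounterRecord) -> List[str]:
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    if not nhs_number_ok(record.nhs_number):
        problems.append("bad NHS number")
    try:
        admitted = date.fromisoformat(record.admitted)
        if record.discharged is not None:
            if date.fromisoformat(record.discharged) < admitted:
                problems.append("discharged before admitted")
    except ValueError:
        problems.append("dates must be ISO 8601")
    return problems


def to_wire(record: EncounterRecord) -> str:
    """Serialise to the shared JSON format, refusing invalid records."""
    problems = validate(record)
    if problems:
        raise ValueError("; ".join(problems))
    return json.dumps(asdict(record), sort_keys=True)


def from_wire(payload: str) -> EncounterRecord:
    """Parse a record from another system and validate it before use."""
    record = EncounterRecord(**json.loads(payload))
    problems = validate(record)
    if problems:
        raise ValueError("; ".join(problems))
    return record
```

The design point is that nothing here cares who wrote the sending or receiving system; suppliers can differ hospital by hospital so long as the boundary format is fixed.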

Which is why this is sense. I have no idea what such a sensible set of ideas is doing in George Osborne’s in-tray, and I suspect the Tories may think it’s a way of preventing IT development in the public sector. But I think a cross-government requirement for common data standards, as much open source as possible, and perhaps even building everything with a sensible API for further development would do nothing but good. And perhaps the project cap might help – after all, the way to deal with zombies is to destroy the brain.

This really is getting strange. The Tories look worryingly convinced of the wisdom of a plan to build a gigantic airport in the North Sea, split between two separate islands, because you never need to change the runway a plane is going to depart from…right? At the same time, the Government is considering a gigantic tidal power scheme in the Bristol Channel. It’s as if French engineering civil servants had seized control in a bloodless coup.

In fact it’s not; they would at least think they were being rational, but surely not even the promoters of this weird rush to create Big Dumb Objects all over the shop can believe this.

On the one hand, you’ve got the Tories, who are trying to convince themselves that they can find £40 billion, before inevitable cost overruns, to create an operationally crippled airport 53 miles from central London and only 101 miles from the nearest point of Dutch territory, dependent for land transport on spare capacity on the CTRL and on the 6 (I think) Crossrail and 2 LTS train paths an hour slated for the Southend/Shoeburyness route, and for road access on pure handwaving.

BorisWatch deserves some kind of medal for their reporting here; they successfully derived the actual location of the project by following Boris’s boat trip in real time on ShipAIS, a ship-tracking ham radio site, and then prepared a handy Google Map, which is where I got the measurements from.
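For anyone who wants to check distances like these from map coordinates, the great-circle (haversine) formula is the usual tool. A minimal sketch; the post gives no coordinates, so none are filled in here, and the mean Earth radius of 3,958.8 statute miles is an assumption (treating the Earth as a sphere introduces errors of up to about half a percent).

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean radius; assumes a spherical Earth


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in statute miles between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))
```

As a sanity check, one degree of longitude along the equator comes out at about 69.09 miles.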

How often, I wonder, would Borisport be fogged in? Even with CAT IIIA/B autoland it’s a serious constraint, and enough of it will stop ground operations even if you can still get in. And then there are all those heat-seeking gulls to worry about; they hunt in packs! The air traffic control issues are pretty gnarly, too – departures conflicting with arrivals into LHR, LCY and LGW.

Further, they want to be seen as “green” whilst also creating another Heathrow-and-a-half. But why? What is it with this obsession with airports in the Thames estuary? As always, the key to the present lies in the past. Here’s the Hansard transcript of the debate on the Maplin Development Bill back in 1973. Three things come to mind – first of all, weren’t MPs great back then? Of course, there is the usual parish pumpery, Bufton Tuftonism and tiresome faff, but there’s also a lot of well-informed intelligent debate, and in the end the government lost!

Second, all the problems are still the same. This is because they are mostly what the Soviet general staff called the permanently-operating factors – terrain, human terrain, infrastructure. Third, there’s a fascinating bit of the social history of ideas here. We join the debate with Douglas Jay MP on his feet, following up an excellent showing (or shoeing) from Tony Crosland…

Mr. Jay: What was the pressure exerted on the Roskill Commission to omit Stansted from its short list? The Times told us on 4th March 1969 that its inclusion would have been “emotive”. At the same date the Financial Times said that its omission was “diplomatic”. The British Airports Authority and the Board of Trade assumed that it was bound to be on the short list. The British Airports Authority was even told that it need not ask the commission to put it on because it was certain to be included. Yet it was omitted, and the commission’s work was handicapped from the start. Thus handicapped, in my opinion the Roskill Commission did its very best. Faced with the resulting choice between Foulness and a South Midlands site for which there is a good deal to be said, it came down decisively against Foulness and in favour of Cublington.

Then we had another curious alliance between landed interests in Bedfordshire and Buckinghamshire opposed to Cublington and commercial interests anxious to develop Foulness—

Mr. Norman Tebbit (Epping): Before the right hon. Gentleman leaves the point about Stansted, in fairness to my predecessor in this House I ought to say that he was one of those opposed to the Stansted project. I would never think of him as being in the pockets of wealthy landowners or any set of that kind. It happens that I disagree with him on this issue as on many others, but it is right to be fair to him. Incidentally, I have a house on the approach to Stansted too.

Mr. Jay: I never suggested that. I was recalling what happened. According to The Times of 5th April 1971, the group resisting Cublington spent £50,000 “to persuade the Roskill Commission that the airport should be built at Foulness and not at Cublington”—not just that it should not be built at Cublington but that it should be built at Foulness.

After the Roskill Commission’s report, this group spent a great deal more, and the same article in The Times said that the pro-Foulness propaganda groups together spent “at least £700,000” to convince the public and Parliament that Foulness was the right solution.

At this point Sir John Howard enters the argument. According to the article in The Times that I have quoted, he was head of a civil engineering firm and, incidentally, a former chairman of the National Union of Conservative and Unionist Associations, though no doubt that is irrelevant. He happened to live near Thurleigh in Bedfordshire and he founded the Thames Estuary Development Company to promote the Maplin project. The Times says that Sir John “first lighted on Foulness during the fight against Stansted, in which he was closely involved.”

He “lighted” on Foulness as it were by chance. His consortium, backed also by RTZ, John Mowlem and Shell, spent more than £500,000 in supporting the Foulness case. Much of the driving force in all this thus came not from people impressed with the merits of Foulness but from those who wanted to keep the airport away from other sites.

Here I return to the speech of the hon. Member for Southend, East. What was the opinion of more than 150,000 people living in the Southend area about this? That is for them and their representatives to say, and I am sure that we shall hear the hon. Member for Essex, South-East (Sir Bernard Braine)—

Sir Bernard Braine: I hope that the right hon. Gentleman will be accurate. There are 310,000 people living in the three constituencies bounded by the Thames and the Crouch who are affected by this proposal.

Mr. Jay: I always believe in understatements because they strengthen one’s case. The hon. Gentleman has strengthened my case further. What were the opinions of those 300,000 persons—far more than live within 20 miles round Stansted, perhaps three times as many? I am sure that the hon. Member for Southend, East will not question this as a fact. But I understand that with the support of the leader of the Southend Corporation the corporation took a share in Sir John Howard’s consortium, and the town clerk of Southend, according to The Times, became a director of it. Whether that was the best way of handling these matters, I have no doubt that all those concerned thought that they were acting in the best interests of Southend.

Sir S. McAdden: The right hon. Gentleman asked what were the opinions of the people of Southend. They were never consulted. This was a decision of the council to invest £100,000 of the ratepayers’ money in Tedco. The council thought that it would make £6 million. Instead, it has lost the lot.

Mr. Jay: It is what I have always suspected to be the truth. I stated it rather diffidently, but the hon. Member for Southend, East has confirmed it. From the point of view of this House, the opinion of the Roskill Commission on Maplin is worth a good deal more than that of this consortium formed in the way that I have described.

I am afraid that what emerges from the story is that both the selection of Maplin and the omission of Stansted have been influenced far too much by the money spent on the commercial publicity and far too little by serious consideration of the public interest.

I see Tebbit was already as much of an arse as he later became, too. Permanently operating factors in the human terrain.

More seriously, I’m fascinated by the fact that the whole idea of Maplin/Foulness/Sheppey/Marinair/Borisport, pushed by three different Conservative administrations, originates with a gaggle of Tory squires trying to win a planning row in some completely different bit of the country. I wonder whether Sir John Howard ever seriously meant it. Or did it just get out of hand? The Tories always will be the party of the Landed Interest, as they showed when their first response to the great crash of 2008 was to look for handouts for their property-shark contingent; another permanently operating factor.

Meanwhile, over the wall, the Government has aimed squarely for a soggy compromise. My own views on Heathrow expansion are heterodox and unpopular. Here goes: I don’t particularly mind if aviation makes up 29% of the 2050 CO2 target, so long as we get there. Nobody sets out to emit CO2 – it’s waste, and when did you last hear of someone saying “Thank God our widget production line produces so many widget flakes we have to dispose of”? Converting stuff into more valuable stuff is what it’s all about, and any production of valueless stuff makes us poorer.

I’m with James Hansen on this one – it’s the coal-fired power stations, stupid, and the buildings. If we can’t fix the cars and buildings and power generation, it doesn’t matter a fucking jot what we do about aviation. Because, after all, buildings are easy, power and cars are getting easier, aeroplanes are hard. We’re not far now; look at this hub-drive electric motor project at Michelin. Solar and wind are now the leading sources of new electrical power.

And, if there must be expansion, it ought to be at an existing airport because of the ATC issues. And if we’re going to be expanding an existing airport, well, it may as well be the one the airlines want to use. Further, it’s good to maintain the various conventions that limit activity at Heathrow – I was surprised to see that mixed-mode operation accounted for almost a third of the expected capacity increase. And yes, I did hold this view when I lived there.

And if we’re doing this, we ought also to do other things, like building a north-south high-speed rail route and better public transport in general – saving oil and CO2 emissions for things that we can’t yet substitute. Like insisting on change to the European ATC system, which could save 10% or more of the air fuel requirement without pouring concrete or sacrificing anything at all. Like air-source heat pumps and insulation, or…well, enter your favourite project here.

Unfortunately, the government has no credibility on this. Neither does it have any credibility on the eventual cap on movements at LHR – they always burst the target, which isn’t written into an Act of Parliament and is therefore pretty meaningless. And their efforts to balance the Heathrow decisions are crap – a high-speed rail “hub” at LHR? On a line from where to where? Great Western electrification is good, but this sounds like a piece of recreational investment that might seriously harm the prospects of building a proper LGV network.

And the responsible minister is Geoff. Fucking. Hoon. Of all people. Aren’t you in jail? Aren’t you dead yet? (I suppose that does not die which can eternal lie.) And so, I conclude, I’d better oppose it anyway. It’s the only way to be safe.

Meanwhile, across the way, the Tories want to “examine” high speed rail. Woo. More talk. And, ah, build a forty billion quid airport in the sea, whilst keeping Heathrow open as well (good luck with the 70-odd mile transfer!). As someone said:

Our government is pitiful, whoever you vote for.

They surely can’t mean this; back in 1969, the Foulness scheme was a political manoeuvre, a Straussian statement. I suspect its resurrection is something similar.

What are they trying to hide? Is this an effort to kibosh offshore wind development? Are Dave from PR, Gideon and Boris climate change deniers? Or what?

I don’t think Chris Dillow is right. Chris argues that the whole sordid fiasco about the Disasters Emergency Committee broadcast suggests the BBC should stop doing radio and TV news and put the money into long form reportage and documentaries instead; this, he reckons, would get people to read newspapers, would create a lot of really good reporting, and would channel traffic to great BBC bloggers. Further, he argues, broadcast news is usually content-free free content at best, and active misinformation at worst, and therefore we’d be better off with less of it.

I disagree. First of all, however, I agree that conventional broadcast news is crap. TV news is the worst of the lot – it’s very telling that the look-and-feel and the conventions are unchanging, and only the voiceover, the actual text component, carries any information. Interestingly, just as turning the sound down on the news eliminates all its content and replaces it with surreal attitudinising, providing all the furniture of TV news seems to do the opposite for the participants; whatever the text is, it’s on telly, and therefore they read it out. You ask Chris Morris.

But then, yesterday evening, I turned on the BBC radio news to hear a BBC journalist discussing the BBC with a BBC executive. She asked the great panjandrum “if this decision might have been influenced by recent situations”. Recent situations? It sounds like South African or East German radio news; presumably both of them, and their real audience – politicians and other BBC execs – know what they are talking about, but I don’t. It must be either Gaza, or else Russell Brand, or conceivably the Hutton inquiry.

In short, they were scared. This went on for twenty miles at motorway speeds; apparently broadcasting the DEC appeal would not in itself have endangered BBC impartiality, but the reaction to it might have. In plain English: we’re scared of being beaten up by politicians, PR, astroturfers, Richard Littlejohn, etc. Václav Havel told a story of a greengrocer who put a sign in his shop window on Revolution Day every year that said “Workers of the world, unite!” Of course, he didn’t do this because he hoped for unity among the workers of the world; he did it because the government said so. But what if the government had asked him to put up a sign saying “I am afraid and therefore obedient”?

Good point. He might have resisted; more likely he’d have found reasons to half-comply, or comply ineffectively. Perhaps he would have been out of town, or the sign would have fallen down in the night. Just a coincidence. And that is, after all, roughly what happened; when there was a further challenge to the authority of the Party, nobody wanted to put up the sign, and the whole thing fell down.

But there is a problem. You may think that broadcast news is fundamentally crappy, but this doesn’t mean it isn’t significant. The BBC retiring from the field would leave a lot of political space open to all kinds of availability entrepreneurs. The broadcast TV market would be left to the flaky (Five), the flaky (ITN), and the Murdoch. And you can’t base any plan on more people reading the Guardian or the FT or the Morning Star for that matter.

Huge efforts are made to influence the Big News; here, we have the bizarre tale of Glen Jenvey, the expert on online jihadis who knew everything there was to know about the ones who were his sockpuppets, until he was exposed by some truly great blogging. Then, we have the news that Blackfive is actually an arm of a mercenary company, which seems to think it’s an “online intelligence agency”. (Do they get comments from international arms dealers? Perhaps signed “Ed.”)

We should be chary of giving up any of the zone of sanity, which the BBC is still just within. Certainly, Chris is wrong to think that “news gathering” should be cut; I rather think that if he is right, everything else should go, and all resources should go into journalism. High gloss is a profit centre for the BBC, but the public has a direct interest in an alternative source of reportage. After all, how much blog can be generated from one BBC story? Nobody else is going to maintain this raw capacity, unless they want to lie systematically.

This may or may not be significant, but soon after the An-12 exodus from Sharjah, a dodgy An-24RV was seriously damaged in Boosaso, northern Somalia. Interestingly, this aircraft (serial 47309406) had been used by Aero-Service in the DRC and also by UTAGE in West Africa, the firm involved in the Christmas Day 2003 727 crash.

Meanwhile, there are still BGIA movements being reported from Dubai. Interestingly, I recently saw (thanks to Alexandre Avrane at atdb) that BGIA actually started way back in 1996 as a freight broker, well before its activation as an airline in 2000. Which was the period when Richard Chichakli was setting up the Sharjah Airport Free Zone.

Speaking of BGIA, I recently reanalysed the flights in the Viktorfeed database whose destination wasn’t correctly geocoded. There are quite a few; Vfeed tries to match the destination given with an old DAFIF db which has two alternative names for each airfield (the match is the UNION of two SELECTs), but if it doesn’t find the airport, it marks the movement “Location Not Found” or “Unknown – Not Stated” as the case may be. In the first case, it appends whatever it was that was said. In both cases, it assigns a default location, which Soizick decided would be in the Bermuda Triangle.
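The matching step described above can be sketched with SQLite. This is a reconstruction under assumptions, not Vfeed's actual code: the table and column names (airfields, name, alt_name, lat, lon), the exact format of the appended label and the default coordinates (a point I picked inside the Bermuda Triangle) are all hypothetical. It shows the shape of the logic: a UNION of two SELECTs, one per name column, falling back to a flagged default.

```python
import sqlite3

# Hypothetical default: a point roughly in the middle of the Bermuda Triangle.
DEFAULT_LOCATION = (25.0, -71.0)

# One SELECT per alternative name; UNION also removes the duplicate row
# when both names match the same airfield.
MATCH_SQL = """
SELECT lat, lon FROM airfields WHERE name = ? COLLATE NOCASE
UNION
SELECT lat, lon FROM airfields WHERE alt_name = ? COLLATE NOCASE
"""


def geocode(conn: sqlite3.Connection, destination):
    """Return (label, lat, lon) for the destination string of one movement."""
    dest = (destination or "").strip()
    if not dest:
        return ("Unknown – Not Stated", *DEFAULT_LOCATION)
    row = conn.execute(MATCH_SQL, (dest, dest)).fetchone()
    if row is None:
        # Append whatever was given, so it can be cleaned up by hand later.
        return (f"Location Not Found: {dest}", *DEFAULT_LOCATION)
    return (dest, row[0], row[1])
```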

Non-matches are caused by: nonstandard spellings, misspellings, outdated usages, and flights that give no destination but aren’t blank –   and ZZZ are common, perhaps as a workaround for some computer program at Dubai air traffic control that requires a non-empty destination field. In general, this is just how it is when you’re working with data trawled up out of the wild; you should always validate input, but that’s only a meaningful statement if your input comes from a user who responds to messages.
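A cleaning pass of this kind can be sketched as a normalisation function run before the lookup. Only ZZZ comes from the data described above; the other placeholders and the alias table are hypothetical examples of the misspellings and outdated names a real list would be built from, by eyeballing the non-matches.

```python
import re

# Strings meaning "no destination". ZZZ is from the data; the rest are guesses.
PLACEHOLDERS = {"", "ZZZ", "-", "N/A"}

# Hypothetical examples: a misspelling and outdated names mapped to current forms.
ALIASES = {
    "SHARJA": "SHARJAH",
    "LEOPOLDVILLE": "KINSHASA",
    "BOMBAY": "MUMBAI",
}


def clean_destination(raw):
    """Normalise a raw destination string; return None if it states no destination."""
    # \s also matches Unicode whitespace such as non-breaking spaces.
    text = re.sub(r"\s+", " ", raw or "").strip().upper()
    if text in PLACEHOLDERS:
        return None
    return ALIASES.get(text, text)
```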

So, I got the data; for a lot of the movements, it was possible to clean up the destination string and get useful information. But these were the long tail; the great bulk, some 3,157 flights, simply gave no destination, and were operated by BGIA. Here’s the inevitable chart:
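The tally behind a chart like that is short once the movements are in memory. A sketch, assuming each movement is a dict with hypothetical operator and destination keys, where destination has already been cleaned and is None when nothing usable was stated:

```python
from collections import Counter
from typing import Dict, Iterable


def no_destination_by_operator(movements: Iterable[dict]) -> Counter:
    """Count movements with no usable destination, grouped by operator."""
    return Counter(m["operator"] for m in movements if m.get("destination") is None)


def shares(counts: Counter) -> Dict[str, float]:
    """Each operator's fraction of all no-destination movements, largest first."""
    total = sum(counts.values())
    return {op: n / total for op, n in counts.most_common()}
```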


Update: Richard Chichakli has updated his Web site, and he seems to have got rid of the ravings about Nazis, &c. The domain name is now registered in Russia, but the site is still hosted in Toronto, where it always was. This suggests that he’s alive.

Two things. Marty Lederman of popular legal blog Balkinization has just become the first blogger in good standing to join the Obama Administration. He’s going to be Deputy Assistant Attorney General in the Office of Legal Counsel.

That’s repellent Schreibtischtäter (desk murderer) John Yoo’s old job. I repeat, old Organ Failure Yoo has been replaced by Liberal Q. Weblog. That is, I think, change you can believe in. My advice: nothing dinky. Klotzen, nicht Kleckern (go in hard, don’t dribble). Just seal the entire building in an evidence bag, like a forensic Christo.

Meanwhile, I wasn’t bothering with the inauguration, but look at this: people are posting to NANOG reporting downstream Internet traffic as much as double normal levels, even on networks that are 80% commercial customers rather than eyeballs. Apparently it’s coming through on port 8247, which is the one CNN’s streaming service uses. Some sysadmins are reportedly running their own mirrors of one stream or another and blocking the rest.

I’m trying to tally the uses of the phrase “middle class” in Britain. So far, I’ve come up with:

Synonym for “bourgeois” – which is problematic, because almost as soon as Marxism was invented, the idea that the bourgeoisie *owned* industry rather than managing it became obsolete. The middle class owns houses, it doesn’t own industry, except in the highly abstract sense of insurance or pension fund shareholdings.

And it certainly doesn’t own land. That’s the upper class; look at the circle around the princes, who mostly aren’t aristocratic or even very rich, but are all landowners. There are as few Vodafone executives among them as there are asylum seekers. Ah, surely we’re getting somewhere? But isn’t that just a cheap version of the old distinction between the plutocracy and the aristocracy, the iron boss trying to ape the duke, a cliché of 19th-century books? However, the top end of the middle class stereotypically buys property in the country as soon as they can afford to.

OK, the reductive sense: they are not the upper class, they are not the rich, they are not the working class. What is left between these lines must be the middle. But then, things that are described as “middle class” (estate cars, detached houses, Sainsbury’s) overlap with the skilled working class, and with quite a bit of the top end too. Politicians and advertisers draw a careful distinction between the C2s and the ABs.

Further, the suburbs are middle class, but so is London; most of the London so described is actually quite poor. The middle class is supposedly worried about private school fees and always votes Conservative, but statistically neither of these statements can possibly be true.

The middle class is sometimes used as a derisive term for what other European countries call the intelligentsia. At the same time, it supposedly doesn’t care what the intellectuals think. It is a national cliche that the middle class is a fearsome lobby, but also that it is incredibly surprising, faintly comic, and rather touching when its members are moved to protest.

My conclusion is that the phrase means everything and therefore nothing and should be decommissioned in an orderly fashion.

Phil Woolas is beneath contempt.

I was going to say more, but it would be wasted verbiage.

So, there’s this rumour-surrounded gadget that GIYUS wants people to install on their computers as part of the War on Terror. Obviously, I wondered exactly how it worked; did it analyse the Web sites you visit semantically, so as to target its talking points precisely? Did it use some sort of social recommendation mechanism? Also, I was wondering if there was any way of characterising the network traffic it generated and estimating how many people are using it.

So I did the obvious thing and actually downloaded it. It’s packaged as a Firefox extension (.xpi); extensions consist of JavaScript files for the application logic and XUL (XML User Interface Language) for the look’n’feel, all wrapped up in a ZIP archive. To read the source of one, all you need to do is pass it through an archive tool, extract all the files, and open them in a text editor.
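Because an .xpi is an ordinary ZIP archive, Python's standard zipfile module is enough to unpack one and pick out the JavaScript and XUL files. A minimal sketch; note that some older extensions nest a .jar inside the .xpi, which is itself a ZIP and can be opened the same way.

```python
import zipfile
from typing import List


def unpack_extension(xpi_path: str, out_dir: str) -> List[str]:
    """Extract a Firefox .xpi (a ZIP archive) and return its .js and .xul members."""
    with zipfile.ZipFile(xpi_path) as archive:
        archive.extractall(out_dir)
        names = archive.namelist()
    return sorted(n for n in names if n.endswith((".js", ".xul")))
```

Usage would be something like `unpack_extension("giyus.xpi", "giyus_src")` (file names hypothetical), after which the listed files can be read in any text editor.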

And actually, it’s kind of disappointing: no folksonomy, no textual analysis, not even crude keyword matching. It just grabs an RSS feed from ws.collactive.com, passing in the string “GIYUS”, presumably to ensure it gets the right one, checks whether any items in it aren’t already cached, and if so, fires a graphical alert containing the message. It’s basically an e-mail list gussied up in Web 2.0 finery, with the feature that it’s marginally less trivial to forward the content to nonsubscribers. It doesn’t even appear to spy on your browsing history.
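The logic described above (fetch a feed, find the items not already cached, alert on those) reduces to a few lines. A minimal sketch using the standard library's XML parser on feed text that has already been fetched; it assumes a plain RSS 2.0 feed whose items are keyed on guid, falling back to link, which is my assumption about the feed's shape rather than something read out of the extension.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple


def new_items(feed_xml: str, seen: Set[str]) -> List[Tuple[str, str]]:
    """Return (key, title) for RSS items whose key is not in `seen`, recording them as seen."""
    fresh = []
    for item in ET.fromstring(feed_xml).iter("item"):
        key = item.findtext("guid") or item.findtext("link")
        if key and key not in seen:
            seen.add(key)
            fresh.append((key, item.findtext("title", default="")))
    return fresh
```

Calling this on a timer with the same `seen` set reproduces the "only alert on new items" behaviour; the alert itself is left out.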

Of course, there could be some server-side magic involved. You can usually get a rough idea of location from an IP address, and a rough idea is probably best in terms of hit-rate (you’ve a much better chance of getting your geotargeting right for “North London” than “Archway”). And you can draw some conclusions from browser credentials – OS, screen, browser type and version, etc. For example, perhaps you’d want to serve the red-meat “civilian deaths are all a fake” stuff to MSIE 5/6 users in the US heartland and the Decent Left stuff to Mac users in North London. So I considered actually installing the extension; but then I realised I didn’t actually want a simulated Melanie Phillips on my sofa any more than I wanted the real thing. However, it’s possible to view the feed on the Web anyway, so I checked.

But they may not even be doing that; I’m on a weird niche ISP, with a linux machine, in North London, and the feed I see at http://ws.giyus.org/points/list is deeply generic.

Surely, though, it’s possible to do better than this? I envisage a sort of Web force multiplier, that would analyse the texts you read as you browse and compute some kind of digest hash, and do the same for every link you send anyone else, stashing the hash of each link in a remote server. As you browse, it compares the hash of the current page with the ones in the DB, and returns a list of possibly appropriate arguments – the strength of this being that they could be data, poetry, code, pictures, video, or indeed anything. We could incorporate some sort of social element, too, to keep a check on quality.
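What's described here is close to what is usually called locality-sensitive hashing. One well-known scheme, SimHash (Charikar's method), is swapped in below purely as an illustration: it gives each text a 64-bit fingerprint such that texts with similar vocabulary tend to differ in few bits. A minimal sketch, with a naive in-memory list standing in for the remote server; the class name and the distance threshold are hypothetical.

```python
import hashlib
import re
from typing import List, Tuple

BITS = 64


def simhash(text: str) -> int:
    """64-bit SimHash: each word's hash votes on every bit; the majority sets the bit."""
    weights = [0] * BITS
    for word in re.findall(r"[a-z']+", text.lower()):
        h = int.from_bytes(hashlib.md5(word.encode("utf-8")).digest()[:8], "big")
        for bit in range(BITS):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(BITS) if weights[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of bits in which two fingerprints differ."""
    return bin(a ^ b).count("1")


class LinkStore:
    """Stand-in for the remote server: fingerprints of pages, paired with links people sent."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, str]] = []

    def add(self, page_text: str, link: str) -> None:
        self.entries.append((simhash(page_text), link))

    def suggest(self, current_page_text: str, max_distance: int = 10) -> List[str]:
        """Links stored against pages within max_distance bits of the current page, nearest first."""
        fp = simhash(current_page_text)
        scored = sorted((hamming(fp, f), link) for f, link in self.entries)
        return [link for d, link in scored if d <= max_distance]
```

A real system would index the fingerprints rather than scan them all, and "similar" here means similar vocabulary, not a similar argument.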

Who here knows about corpus analysis? Most of the academic papers my casual search found gave me that “dog listening to music” feeling. What I need is something like a rather bad crypto hash function – one where two texts with different content would produce non-randomly different hashes. Obviously we’d filter the text with a list of stop words like search engines do, so as to strip out the tehs and ands. We could, for example, use (say) the distribution of words in Wikipedia as a common baseline, and measure how the distribution of significant words in the target texts differs from it.
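The baseline idea can also be sketched: strip stop words, count what's left, and score each word by how much more often it appears in the text than in a reference corpus. Here the baseline is a tiny hypothetical dictionary standing in for Wikipedia word counts, the stop list is a token one, and the score is an add-one-smoothed log-ratio, which is my choice rather than anything from the post. The top-scoring words act as the text's signature.

```python
import math
import re
from collections import Counter
from typing import Dict, List

STOP_WORDS = {"the", "and", "a", "of", "to", "in", "is", "it", "that", "for", "on"}


def word_counts(text: str) -> Counter:
    """Lower-cased word counts with stop words removed."""
    return Counter(w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS)


def distinctive_words(text: str, baseline: Dict[str, int], top: int = 5) -> List[str]:
    """Rank the text's words by smoothed log-ratio of frequency in text vs baseline corpus."""
    counts = word_counts(text)
    n_text = sum(counts.values())
    n_base = sum(baseline.values())
    vocab = len(set(counts) | set(baseline))

    def score(word: str) -> float:
        p_text = (counts[word] + 1) / (n_text + vocab)
        p_base = (baseline.get(word, 0) + 1) / (n_base + vocab)
        return math.log(p_text / p_base)

    return sorted(counts, key=lambda w: (-score(w), w))[:top]
```

With a baseline where "fog" is rare and "again" is common, a text about fog ranks "fog" first and pushes "again" to the bottom.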




