We’ve Found the Last Mile of Filtering

… and it is human. Try as we might, computer science has not yet produced a piece of software able to pass the Turing Test.

An interrogator is connected to one person and one machine via a terminal, and therefore can’t see her counterparts. Her task is to find out which of the two candidates is the machine and which is the human, only by asking them questions. If the machine can “fool” the interrogator, it is intelligent.

The Turing Test has long been used as a litmus test for intelligence. What the test really assesses is the ability to understand the dynamic state space of context produced by human language. And here we’re only talking about written language, not the really hard stuff like vocal inflection and body language that gets added to the mix in face-to-face conversation. The state space is dynamic because languages change over time: words change meaning or are used in different ways, and new words are invented. There is a great post that talks about the use of jargon as conversational shorthand here.

Filtering unstructured data has the same qualities as the Turing Test. The interrogator would like to find intelligent commentary on their subject of interest. There are many ways to whittle down the problem, and some are chainsaws that can eliminate broad swaths of irrelevant content (think spam and porn, unless that’s what you’re searching for, of course) in a single stroke. There are also some especially useful tricks you can use when constraining the problem to a limited space (like publicly traded companies). But at the end of the road of filtering, it is you and I who make the call on the quality of data, because it is subject to our tastes, our interests, our context.

This is what Sergey and Larry cottoned on to when developing Google’s PageRank, and what the guys at Digg tried to get at. User voting, whether implicit via links or explicit via Digging, gets part of the way there but ends up short of the goal. To get as far down that last mile of filtering as possible, we’ve employed human technology (or, as Bezos puts it for his Mechanical Turk, “artificial artificial intelligence”). This greatly simplifies the problem that must be solved, which becomes: how do you pre-filter the data enough that the human filtering problem is scalable?
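To make the shape of that problem concrete, here is a minimal sketch of the idea, not a description of any real system. Cheap automated filters (the chainsaws) cut the stream down, a topic constraint narrows it further, and only a capped number of survivors reach a person. Everything named here is an assumption made for illustration: the `Item` type, the `SPAM_TERMS` blocklist, the vote-based ordering, and the `capacity` limit.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Item:
    """A piece of unstructured content plus a crude vote signal (hypothetical)."""
    text: str
    votes: int = 0


# Hypothetical blocklist: a "chainsaw" filter that removes obvious junk.
SPAM_TERMS = {"viagra", "casino", "free money"}


def not_spam(item: Item) -> bool:
    """Return True when no blocklisted term appears in the text."""
    lowered = item.text.lower()
    return not any(term in lowered for term in SPAM_TERMS)


def mentions_topic(topic: str) -> Callable[[Item], bool]:
    """Build a filter that constrains the problem to one subject of interest."""
    needle = topic.lower()

    def check(item: Item) -> bool:
        return needle in item.text.lower()

    return check


def prefilter(items: Iterable[Item],
              filters: List[Callable[[Item], bool]]) -> List[Item]:
    """Keep only the items that pass every automated filter."""
    return [item for item in items if all(f(item) for f in filters)]


def human_review_queue(items: List[Item], capacity: int) -> List[Item]:
    """Order survivors by votes and cap the list at what people can review."""
    ranked = sorted(items, key=lambda item: item.votes, reverse=True)
    return ranked[:capacity]


if __name__ == "__main__":
    stream = [
        Item("Acme Corp earnings beat expectations", votes=12),
        Item("Win FREE MONEY at our casino", votes=40),
        Item("Weather today", votes=3),
        Item("Why Acme Corp's new product matters", votes=5),
        Item("acme corp rumor", votes=1),
    ]
    candidates = prefilter(stream, [not_spam, mentions_topic("Acme Corp")])
    for item in human_review_queue(candidates, capacity=2):
        print(item.votes, item.text)
```

Votes only decide the order in which items reach a reviewer; the final call on quality still belongs to the human at the end of the queue.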

More on that later…


Tags: Social Media Analytics
