David Weinberger (dweinberger) Wed 30 May 07 13:09
Harmless Drudge, nicely put. But there's one more limitation of traditional library taxonomies: They require classifiers to decide what books are about. Some systems allow only one heading; some allow ten. Nevertheless, classifiers are deciding for others what a book is about. They have to in the first and second orders of order because someone has to, and professional classifiers are expert at it. And their decisions are almost always right. But there's no telling what a book will be about to a particular reader. In the digital world, we can have it all. We can feature the official classification on the home page and allow tags, too. (The U of Penn library has a hybrid system like this.) And, of course, thanks to thesauruses (compiled manually or inferred by computers), we can permit an indefinite number of ways of referring to the same object. There are, of course, exceptions where we absolutely need to find every scrap of information about a topic - e.g., a lawyer researching cases - in which case we'll insist on a controlled vocabulary. But, again, we can have it all. That's the first principle of the third order of order, imo.
Jon Lebkowsky (jonl) Wed 30 May 07 13:52
One concept you refer to in the book is that of natural "joints." Could you explain how knowing the world is like butchering an animal?
David Weinberger (dweinberger) Wed 30 May 07 16:17
Ah, Plato's phrase from the Phaedrus. He says that the skilled thinker/talker carves nature at its joints. It's such a vivid and obvious image. The notion is that the world comes divided up into natural units that come apart cleanly when we're thinking right. But when we're not, it's like hacking away at a bone. Plato is expressing a belief that there is a single, knowable order of the universe, just as there's a single, natural way to butcher a goat. A joint in this case is an essential property of a thing. But it turns out that every property or attribute of a thing can serve as a joint, i.e., a likeness by which we can cluster it with other things. So, big-and-roundness is one type of joint that enables us to carve up the Solar System in a way that gets most of the planets, while "has water" is a joint that gets us the objects in the Solar System that might support life, and "going counter-clockwise around the Sun" is a joint that gets us a different set of objects. There is an indefinite number of joints available to us. Which ones matter to us depends on what we're trying to do. We cluster one set of ingredients if we're trying to bake a cake and another set if we're looking for food to donate to the local open pantry. That's why the notion of there being a single way of carving up the universe doesn't make sense, short of G-d declaring some joints more real than others. The single order of the universe would be the one that was independent of all our projects and interests. That is, the single order of the universe would be, by definition, the one we don't care about.
bill braasch (bbraasch) Wed 30 May 07 17:33
That takes me right back to Zen and the Art of Motorcycle Maintenance. maybe we went the wrong way on that. I suppose a leading indicator of that would be the Geritol ads that pay for the network news broadcasts. To the kids, it's only news if it comes in a text message. Everything else is ads.
bill braasch (bbraasch) Wed 30 May 07 19:10
I saw a presentation recently by a company that built software for fraud detection in casinos. it basically matched up metadata to see who had something in common with someone else on their blacklist. The government came to see it and they're using it now. We're defining ourselves by the photos we tag, maybe the places we visit (http://www.plazes.com has that), the breadcrumbs we drop on twitter. It's a much richer model of the dog on the internet than we've had before. So far, I'm really me on the internets, but it might be handy to have a couple more me's, depending on who's looking. the kids have all seen this coming on facebook. we're miscellaneous until we add tags. How well tagged are you? How well do you think your tags define you?
Ari Davidow (ari) Wed 30 May 07 19:33
I'm interested in how we deal with synonyms. In traditional taxonomies there will be an authority file. That is sort of like a thesaurus, with the added attribute that if you type one of the synonyms, you will not only get back all results, but you'll also get information on the "authority" term. For some purposes I can see how that gets in the way of people typing according to how they will look for information again. But in a lot of cases, people shouldn't have to choose between SF and Frisco - the computer should be smart enough (or humans should be able to point these synonyms out to the computer, which will then act on them). But is anyone supporting such a thing? I guess so, or we wouldn't have such smart 3rd generation searches. On the other hand, there is a difference between a search engine able to know which "Capri" a tag might refer to, and being able to know that "The Dude" should return the same results as a search on "The Big Lebowski" or whatever - or that searches on Peking and Beijing will usually want to return the same results.
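A minimal sketch of the authority-file lookup described in this post, assuming a tiny hand-built synonym table. The names AUTHORITY, INDEX, and search, and the item ids, are hypothetical and chosen only for illustration; no real catalog or search engine is being described.

```python
# Hypothetical authority file: every variant spelling points at one authority term.
AUTHORITY = {
    "sf": "San Francisco",
    "frisco": "San Francisco",
    "san francisco": "San Francisco",
    "peking": "Beijing",
    "beijing": "Beijing",
}

# Hypothetical index from authority term to the items filed under it.
INDEX = {
    "San Francisco": ["photo-17", "post-204"],
    "Beijing": ["photo-88"],
}


def search(term):
    """Return (authority_term, results) for a typed term.

    Terms missing from the authority file pass through unchanged,
    so an unknown word is searched as typed.
    """
    key = term.strip().lower()
    authority = AUTHORITY.get(key, term)
    return authority, INDEX.get(authority, [])


print(search("Frisco"))
print(search("Peking"))
```

Typing any listed variant returns the same results plus the authority term. Telling apart the several things called "Capri" is a separate problem that a flat table like this one cannot address.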
David Weinberger (dweinberger) Wed 30 May 07 20:46
bbraasch, you can often tell more about what a person is interested in by her tag cloud than from her explicitly constructed "profile" precisely because the tag cloud is based on implicit metadata. Someone in an article recently (sorry...too tired to try to find it!) said that the bottom of your Netflix queue is who you'd like to be and the top is who you are :)
David Weinberger (dweinberger) Wed 30 May 07 20:57
ari, I've suggested exactly that to Technorati.com (disclosure: I'm on their board of advisors). Right now, when you search there for "america," you are told the "related tags" are politics, bush, iraq, news, war, usa, religion, government, terrorism, and islam. Now, only one of those is a synonym, and Technorati doesn't know which one that is. It only knows that where the "america" tag is used, those other tags are likely also to be used. Over time, with enough tags, perhaps algorithms will be able to figure out that "america" and "usa" are (nearly) synonyms, and that not only are "San Francisco" and "Frisco" synonyms, that city is in CA. Or perhaps someone will allow users to click on the tags that are synonyms, to help our poor, silicon-based partners along. Or maybe Technorati will just buy a damn thesaurus and gazetteer. One way or another, we're going to get there, or at least get much better at it.
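A rough sketch of how "related tags" could be derived from co-occurrence alone. This is an assumption about the general technique, not Technorati's actual method, and the sample posts are made up. It shows why such a list cannot say which related tag is a synonym: it only counts tags that appear together.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(tagged_items):
    """Count how often each unordered pair of tags appears on the same item."""
    pairs = Counter()
    for tags in tagged_items:
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


def related_tags(tag, tagged_items, top=3):
    """Rank the tags that most often appear alongside `tag`."""
    counts = Counter()
    for (a, b), n in cooccurrence(tagged_items).items():
        if a == tag:
            counts[b] += n
        elif b == tag:
            counts[a] += n
    return [t for t, _ in counts.most_common(top)]


# Made-up sample data: each inner list is the tag set of one post.
posts = [
    ["america", "politics", "iraq"],
    ["america", "politics", "usa"],
    ["america", "politics"],
    ["usa", "politics"],
]

print(related_tags("america", posts))
```

Here "politics" ranks first for "america" simply because it co-occurs most often. Nothing in the counts marks "usa" as the near-synonym, which is where a thesaurus or user-supplied synonym links would come in.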
Brian Slesinsky (bslesins) Wed 30 May 07 23:33
One way to reduce unnecessary synonyms is simply to spell out subject headings. That is, tag things with "Worldwide Association Confederation Guild 2007," not just wacg07. Of course that's harder to type, but that's what field auto-completion is for. Tags strike me as abbreviations that we should just get over. In programming, we've seen this already, where older languages used names like strlen and newer languages use String.getLength(). Having to remember abbreviations (other than the truly common ones) is just not worth it. Correctly-spelled words are a wonderful standard and we should make the most of it. I'm actually very impressed by how Wikipedia manages its namespace. They just give each article an encyclopedia-like name and add disambiguation pages when needed. So many sites got it wrong by using a Yahoo directory-like hierarchy rather than a flat namespace, and even most Wikis got it wrong by using InterCaps rather than sticking to regular English. (And the same applies for many other languages of course.)
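A small sketch of the field auto-completion mentioned above, assuming the tagging tool keeps a list of tags already in use. The function name complete and the sample tags other than the conference name are hypothetical.

```python
from bisect import bisect_left


def complete(prefix, known_tags):
    """Return known tags that start with `prefix`, ignoring case."""
    ordered = sorted(known_tags, key=str.lower)
    keys = [t.lower() for t in ordered]
    p = prefix.lower()
    # All matches sit next to each other in sorted order,
    # starting at the first key that is >= the prefix.
    start = bisect_left(keys, p)
    matches = []
    for i in range(start, len(keys)):
        if not keys[i].startswith(p):
            break
        matches.append(ordered[i])
    return matches


tags = [
    "Worldwide Association Confederation Guild 2007",
    "World Cup",
    "Wikipedia",
]

print(complete("worldw", tags))
```

With a lookup like this, typing a handful of characters is enough to select the spelled-out heading, which is the trade the post proposes in place of remembering wacg07.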
David Weinberger (dweinberger) Thu 31 May 07 03:49
bslesins, really interesting point. It's the first time I've heard your suggestion, which seems so obvious (the mark of many a good thought). It'd help, of course, if tagging engines uniformly allowed us to use spaces as characters in tags. But that's just whining. Prolix tags work in the example I gave of a conference encouraging bloggers to use a uniform tag. That example has some peculiarities, though: 1. A central authority can stipulate a particular vocabulary; 2. Taggers are highly motivated to use the standardized tag (because they want their posts to be included in the conference cluster); 3. Taggers are tagging more than one item -- all their posts about the conf -- with that tag, so auto-complete can amortize their labor. In many other environments, no one is in a position to stipulate the tag set, in which case prolix tags may actually increase the number of synonyms: You tag the photo "New York City at sunset," but I tag it "Night falls on Manhattan." In such a case, having a mix of short tags would probably make it easier for a computer to figure out that the tags are related. But, I am not a computer scientist (IANACS). Also, tags have succeeded even though people generally hate explicitly creating metadata in part because tags don't take a lot of thought or typing. Increasing either of those is likely to decrease the number of tags. Somewhere there's a balance of convenience and tolerance of ambiguity that we will strike. Wikipedia is a great example of so many things, including what you point to, Brian. In my book I point out that the list Wikipedia presents of disambiguated meanings of "elephant" is interesting on its own terms. But, of course, Wikipedia is a highly cultivated garden, with a single name for each article (which is part of your point). Tags, on the other hand, are usually accidental gardens. We want to allow multiple ways of saying the same thing with a tag because we want people to remember their stuff the way they want to.
Besides, synonyms are rarely fully synonymous; names are a special case, but even they rarely have a single way of being expressed. Right, bslesins? I mean Brian. I mean the Brimeister. :) So, I admire Wikipedia's way of solving its particular problem within its particular constraints. It works. The lesson I draw is not that this is a generalized solution (not that that's what you're suggesting, Brian). Rather, it's that solutions need to be particularized.
Jon Lebkowsky (jonl) Thu 31 May 07 03:56
But the problem is in lacking a consistent approach across many systems, no? E.g. some systems don't handle tags with spaces, others do. I generally don't create tags with spaces for that reason, though that's probably relevant to the fact that I lean more toward social tagging than selfish tagging.
Jon Lebkowsky (jonl) Thu 31 May 07 03:57
David's post slipped in while I was typing.
Jon Lebkowsky (jonl) Thu 31 May 07 04:07
David, it occurs to me, reading your last post, that metadata can have its own metadata, like contextual data. I.e. maybe a system looks, not just at a tag, but at the context for that tag in assessing its meaning or relevance.
Sharon Lynne Fisher (slf) Thu 31 May 07 05:17
In your example above, "You appear to be tagging this picture with Manhattan. Our system has 373,082 tags with Manhattan, and 1,483,217 tags with New York. Do you want to modify this tag? Replace Add Leave the way it is"
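A minimal sketch of the prompt imagined in this post, assuming the system already knows which tag is related to which and how often each is used. The counts are the ones quoted above; the names tag_prompt, apply_choice, COUNTS, and RELATED are hypothetical.

```python
# Usage counts quoted in the post; the related-tag table is an assumption.
COUNTS = {"Manhattan": 373082, "New York": 1483217}
RELATED = {"Manhattan": "New York"}


def tag_prompt(new_tag, tag_counts, related):
    """Build a suggestion prompt when a more widely used related tag exists."""
    alternative = related.get(new_tag)
    if alternative is None:
        return None
    mine = tag_counts.get(new_tag, 0)
    theirs = tag_counts.get(alternative, 0)
    if theirs <= mine:
        return None
    return (
        f"You appear to be tagging this picture with {new_tag}. "
        f"Our system has {mine:,} tags with {new_tag}, "
        f"and {theirs:,} tags with {alternative}. "
        "Do you want to modify this tag? [Replace] [Add] [Leave the way it is]"
    )


def apply_choice(tags, new_tag, alternative, choice):
    """Return the item's tag list after the user's answer to the prompt."""
    if choice == "replace":
        added = [alternative]
    elif choice == "add":
        added = [new_tag, alternative]
    else:  # "leave": keep the tag exactly as the user typed it
        added = [new_tag]
    return tags + [t for t in added if t not in tags]


print(tag_prompt("Manhattan", COUNTS, RELATED))
print(apply_choice(["sunset"], "Manhattan", "New York", "add"))
```

The "Leave the way it is" branch keeps the user's own word, so the prompt nudges toward the common tag without enforcing a controlled vocabulary.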
Jon Lebkowsky (jonl) Thu 31 May 07 06:42
Is this the origin of the phrase "I'll take Manhattan"?
David Weinberger (dweinberger) Thu 31 May 07 06:50
jonl and slf, these are questions that can only be answered in practice. It depends on what the site is trying to accomplish. If it's trying to pinpoint precise answers, one set of practices is appropriate. If it's just trying to show you some photos of a city before you go there on vacation, another set is called for. In general, I think the best guideline is to allow for as much messiness as possible. (Of course, the "as possible" brings it back to particularities.) This is advisable because getting users to explicitly create well-behaved, well-formulated metadata not only is a burden that will chase many users away, it results in metadata that, because it is explicit, is not as rich. So, rather than insist that users use a controlled vocabulary, it'd be better (usually) to let them use whatever words they want, and then have the computers sort it out. That's true for the meta-metadata that helps us contextualize tags. We have only scratched the surface of the meta-metadata that's just sitting around for us to grab it (or deduce it). That should be our first resort... ...imo, and always paying attention to the particular needs and aims of the users.
Jon Lebkowsky (jonl) Thu 31 May 07 07:11
I think it's really interesting how those needs and aims can coevolve with the technology, where a tool may show me something I couldn't do before and didn't know I needed, but once I have it, my behavior with it may suggest tweaks to the developer. Hence the "perpetual beta" development loop, which amounts to a conversation between developers and users. I think that's how we got the evolution of social from selfish tagging. The developer and user share authority for emerging technologies. And speaking of authority, that seems to be a key theme of the book. You keep showing how the location and flow of authority has changed. In that sense, isn't the book implicitly about politics?
David Weinberger (dweinberger) Thu 31 May 07 08:29
Yes, jonl, the book is implicitly about politics in the extended sense of the term. That is, it's about the shift in the locus and nature of authority that has accrued to those who have done the job of filtering and organizing knowledge -- a job shaped by the accidental nature of paper. Ironically (?), politics itself is likely to be one of the last holdouts in the democratization of knowledge. Politicians are such dedicated, immersed marketers!
Jef Poskanzer (jef) Thu 31 May 07 08:30
I'd like to see a collaborative filtering system that lets me give thumbs up / thumbs down on individual acts of tagging, building up a trust rating for other taggers. I'd like to be able to mark pairs of tags as synonyms. And give thumbs up / thumbs down to other peoples' synonym links. I'd like this to work for other objects besides tags, e.g. flickr groups - same problems. And I'd like a pony.
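One way the thumbs up / thumbs down idea could be sketched, assuming a trust rating is just a smoothed share of up-votes on a tagger's individual acts of tagging. The smoothing formula, the class name TaggerTrust, and the tagger names are assumptions for illustration, not a description of any existing system.

```python
from collections import defaultdict


class TaggerTrust:
    """Track thumbs up / thumbs down votes on other people's tagging."""

    def __init__(self):
        # tagger -> [ups, downs]
        self.votes = defaultdict(lambda: [0, 0])

    def vote(self, tagger, up):
        """Record one thumbs up (up=True) or thumbs down (up=False)."""
        self.votes[tagger][0 if up else 1] += 1

    def trust(self, tagger):
        """Smoothed share of up-votes; a tagger with no votes scores 0.5."""
        ups, downs = self.votes.get(tagger, (0, 0))
        return (ups + 1) / (ups + downs + 2)


ratings = TaggerTrust()
for _ in range(3):
    ratings.vote("alice", up=True)
ratings.vote("alice", up=False)
ratings.vote("bob", up=False)

print(ratings.trust("alice"), ratings.trust("bob"), ratings.trust("carol"))
```

Starting everyone at 0.5 keeps a single vote from swinging the rating to either extreme. The same vote table could be keyed by synonym links instead of taggers to cover the second half of the wish list.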
David Weinberger (dweinberger) Thu 31 May 07 09:12
jef, interesting ideas. Finding a good tagger is a good thing. Thumbs-upping/downing them is one approach. So is having our computers derive which ones we trust by watching what we do. Eventually we'll invent them all. As far as marking synonyms, you may like Freebase, although it's not quite what you're looking for. It's a wiki-based approach to coming up with metadata categories for bunches of different domains (businesses, movies, etc.), and then to collaboratively filling in those metadata categories for as many entities as we can find. Very very interesting. PS: Your pony is in the mail.
Jef Poskanzer (jef) Thu 31 May 07 09:27
I got a Freebase beta account and totally couldn't understand it.
Ari Davidow (ari) Thu 31 May 07 09:34
You know, when a problem is entirely understood, it is easy to look at it and say "this is the best practice". Within some limits, Library of Congress classification works for other libraries, and probably works better than Dewey. But tags are entirely new, already have several meanings, and are used to define material in many different contexts. It feels like we are more constructive working out how to help people make links than trying to get people to all tag things the same way with the same terms. In a broad sense, aren't tags an effort to get away from the idea that one set of terms can apply to a single item? David, I'm only a couple of chapters into the new book so far, but am finding it fascinating. Many thanks.
bill braasch (bbraasch) Thu 31 May 07 11:17
the wisdom of the commons is cluttered. no pony for jef til we unclutter the commons. I think it will be an old pony.
James Leftwich, IDSA (jleft) Thu 31 May 07 11:59
Hi David. You've written a very important book, and this is a great inkwell conversation. I'm a product and software interface designer who began working with conceptual models of metadata-driven systems in the late 1980s. My earliest metadata-driven OS/internet model was InfoSpace, which was presented at 3CyberConf at UT Austin in 1993: <http://www.well.com/user/jleft/orbit/infospace/index.html> Your book is the first I've seen to address many of the same issues which I'd discussed and illustrated starting back then and continuing on through 1999, culminating with a topic here on the WELL. In the WIRED conference there's an entire topic where I laid out many of the same metadata concepts and issues you're addressing in your book, but I had arrived at them from the direction of the user interface, where I realized early on that it was the underlying metadata model, and how it could be used to allow interactive visualization of complex and interrelated data (retrieved from queries). I appreciate how incredibly difficult it is to express models regarding metadata systems, and this is captured in that topic. Topic 327 [wired]: (jleft)'s Prophecy: The Visualization Revolution There's also a set of slides that I'd done in 1997, which I've lectured on in several presentations, which make some of the same points I've picked up from Cory Doctorow's posts about your book on Boing Boing. These don't have the text of my lectures associated with them, but I'm betting you'll realize many of the same underlying issues presented by them. <http://www.well.com/user/jleft/orbit/vizrev/slides/> And these slides in particular: <http://www.well.com/user/jleft/orbit/vizrev/slides/1.html> <http://www.well.com/user/jleft/orbit/vizrev/slides/2.html> <http://www.well.com/user/jleft/orbit/vizrev/slides/3.html> <http://www.well.com/user/jleft/orbit/vizrev/slides/4.html> <http://www.well.com/user/jleft/orbit/vizrev/slides/5.html>
<http://www.well.com/user/jleft/orbit/vizrev/slides/6.html> <http://www.well.com/user/jleft/orbit/vizrev/slides/8.html> In general these show how the current (1990s, but still existing today) model of a singular presentation of data (both in structure as well as embodiment) could be evolved towards one that used a number of metadata models to enable a second user-side interface. Very difficult to present in the limited text forum here on the WELL, but much easier to discuss verbally and with examples. I'd certainly appreciate the opportunity to discuss this work with you at some point.