How To Build A Metaverse

by Joe Flower


A version of this article appears in New Scientist, October 14, 1995
International Copyright 1995 New Scientist


  1. It's happening
  2. The Gathering of the VRMN
  3. Flying through molecules
  4. But what's it for?
  5. When will it come to my house?

. . . Hiro's not actually here at all. He's in a computer-generated universe that his computer is drawing onto his goggles and pumping into his earphones. . . . Hiro is approaching the Street. . . . It is the brilliantly lit boulevard that can be seen, miniaturized and backward, reflected in the lenses of his goggles. It does not really exist. But right now, millions of people are walking up and down it.
                               . . . from Snow Crash,  by Neal Stephenson

It's happening

Back in 1990, when Neal Stephenson wrote those words in the first chapters of his novel, Snow Crash, he was painting a fantasy world.

Today? I've been there. There were no goggles, no earphone-pumped sound, but I have visited a cyberspace world, walked its streets, talked to its people, and sat under its trees.

Like many things that once were pure fantasy, from Dick Tracy's wrist communicator to telesurgery, someone's working on it, they've got a primitive working model, and (again like many another fantasy) the center of this effort is in California.

The Gathering of the VRMN

Over the past few years, the uses of virtual reality have grown in sophistication and power - the U.S. Army re-fights crucial tank battles of the Persian Gulf War using realistic tank simulators linked by computer; drug designers wrestle large molecules into place on simulations of the body's receptor sites; architects walk their clients through buildings that have not yet been built. Some of these uses, such as the Army simulations, use computers that are linked, sometimes thousands of miles apart. But the systems that can do that are very large, the software is custom code, and the communication lines that connect them are the most powerful available. As they used to say at magic shows, "Don't try this at home, kids."

Now all that is changing, and changing fast. Already users of high-end machines can move through virtual worlds on the Internet. In San Francisco, the programmers of the tiny startup Worlds, Inc., have built the first virtual world, like the "Metaverse" of Stephenson's Snow Crash. Blocks away in the "multimedia gulch" called SoMa, the architects, painters and tech-heads of another startup, Construct, are madly spinning out virtual spaces, from art galleries to nanomachines mere atoms across. And they are hardly alone: In a wide range of organizations, research sites and companies, scores of programmers (who call themselves VRMN) are beavering away at the myriad possibilities of the new Internet protocol called VRML -- "virtual reality modeling language." Virtual worlds -- visual databanks, explorable molecules, user-built towns -- are already spreading across the Internet in "beta" test phase. Within a year they will be part of the online experience of the average Joe Internet, hanging out in a custom body in some online Nowhere Pub.

Like everything on the Internet, VRML is a group endeavor, but this particular thread can be traced to one person. Mark Pesce, of San Francisco, is a technically gifted net guru -- yet at times he sounds like a cross between Marshall McLuhan and Joseph Campbell, worrying about "the limits of holosthesia," arguing in dense papers (with titles such as "Pathogenic Ontology in Cyberspace") that the online world is the breeding ground of the next mythology, and intoning: "The creation of a world necessarily implies the creation of a world view."

Back in the dim mists of the past -- 1993, which is ancient times in the swiftly-moving world of online technology -- Pesce and his colleague Anthony Parisi began to wonder whether it would be possible to link virtual reality worlds over standard Internet connections. The Geometry Center at the University of Minnesota had pioneered a graphic display method called OOGL ("Object-Oriented Graphic Language"), a format that allowed geometric forms to be displayed, and to be transmitted across the Internet, for work in mathematics, the physical sciences, and engineering. Autodesk had built a "programming library," a collection of pieces of code that created three-dimensional objects, called CDK ("Cyberspace Developers Kit"). Either of these, with some re-working, could be used to display 3-D forms online, but it had not happened yet -- neither had kicked off the revolution Pesce and Parisi sought.

By the end of 1993 Pesce and Parisi had developed a three-dimensional interface to the World Wide Web (WWW), the network of interconnected graphic "pages" which resides, for the most part, on the Internet. "HTML," the language of WWW, places two-dimensional graphic objects within the space of a screen, and allows them to be linked to other objects, which might be bits of text or other graphic files. Pesce and Parisi called their extension of WWW "Labyrinth." It allowed WWW to link to 3-D graphic files, and added a third dimension, a distance between the viewer and each object in the scene. With Labyrinth, instead of merely moving right and left, up and down through a graphic image, you could move around the image, looking at it from all sides.

From that point, the development of online virtual worlds proceeded in typical Internet/hacker fashion: rapidly, in public, by worldwide group endeavor, on several fronts at once, without any central budget.

Pesce and Parisi sent their findings to the Center for European Particle Physics (CERN) in Geneva, to Tim Berners-Lee, the software engineer who had developed the World Wide Web in 1989. He invited Pesce to address the Web's First International Conference in Geneva in the spring of 1994. The "Birds of a Feather" (BOF) group who heard that talk decided it was time to build a universal specification for a Virtual Reality Modeling Language (VRML).

They saw that it was possible to send 3-D objects across a network, using very little bandwidth. But to do that, everyone would have to agree on the rules. A specification is an accepted set of such rules, ranging from the simple ("All distances are in meters, all angles in radians") to the complex ("Here's how we will describe a cone . . . ").

Brian Behlendorf of Wired Magazine quickly set up an electronic mailing list -- an email ring on which the group could hold its discussions. Within a week, over a thousand people had signed on.

The group decided to look for a piece of existing software to use as a matrix on which they could grow their VRML. They quickly settled on Open Inventor ASCII File Format from Silicon Graphics, Inc. (SGI) in Silicon Valley. Open Inventor is a "graphics toolkit" which renders 3-dimensional objects (including shapes, materials, textures, even lighting and shading) in a code of letters and numbers. Codes from such programs can produce surprisingly realistic objects on the computer screen -- objects that can turn in space, reflect light and shadow, and even mimic surfaces made of metal or cloth.

SGI gave its blessing to the project with permission to use the file format in the open market, and even contributed utilities, pieces of software that would help developers design VRML "browsers" for flying through these virtual worlds. SGI's Gavin Bell and Paul Strauss set to work adapting Open Inventor for VRML, with help from Pesce and Parisi, and design input from the mailing list. By late May of this year, they had a first pass: VRML 1.0. It didn't allow for interactivity (you can't pick something up and play with it) or behavior (no doors that open when you approach). There were no avatars (cyberspace bodies), no multiple users (you can't see anyone else), no animation (the trees don't move, smoke doesn't curl up into the sky), and no sound. But for the first time, the online world had developed a protocol that would allow people to build virtual realities in cyberspace.
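
To make the "code of letters and numbers" concrete, here is a minimal sketch in Python that writes out a tiny scene in the VRML 1.0 text format: a single colored cone. The function name and its default values are hypothetical, chosen only for illustration; the node and field names (Separator, Material, Cone, diffuseColor, bottomRadius, height) follow the VRML 1.0 format as best I understand it, with distances in meters.

```python
def vrml_cone(bottom_radius=1.0, height=2.0, color=(1.0, 0.0, 0.0)):
    """Return the text of a tiny VRML 1.0 scene holding one colored cone.

    Hypothetical helper for illustration only. Distances are in meters,
    and the color is a red/green/blue triple with each value from 0 to 1.
    """
    r, g, b = color
    lines = [
        "#VRML V1.0 ascii",   # header line that identifies the file format
        "Separator {",        # groups the material with the shape it colors
        f"    Material {{ diffuseColor {r} {g} {b} }}",
        f"    Cone {{ bottomRadius {bottom_radius} height {height} }}",
        "}",
        "",                   # end the file with a newline
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the scene; saved to a .wrl file, this text is what a browser would load.
    print(vrml_cone())
```

The whole scene is a handful of short lines of plain text, which is why such a file can travel over an ordinary Internet connection.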


The dimensions of the Street are fixed by a protocol, hammered out by the computer-graphics ninja overlords of the Association for Computing Machinery's Global Multimedia Protocol Group. . . .
                               . . . from Snow Crash,  by Neal Stephenson

The process of the VRML project is as fascinating as the product. It isn't just something that the techies in the back room are excited about, throwing together on their spare time, staying late at work after they've clocked out, cadging computer resources wherever they could find them. At the same time, it isn't just something cooked up in the boardrooms and the marketing departments of big corporations. It's both. There is big money in it, the development possibilities are huge, and the major companies know it. And the creative techie types are truly excited about it -- this is the kind of thing they would work on whether you paid them to or not.

The major companies caught on early that this could be the start of something big -- but if it were to work, it had to be a "standard," that is, everyone important had to sign on to the same specifications. So SGI and a partner, Template Graphics Software (TGS) in San Diego, began rounding up endorsers. In short order, eighteen organizations announced that they were in the VRML business in one way or another, including Digital Equipment Corporation (DEC); CERN in Geneva; Japan's NEC Technologies; Silicon Valley's Netscape Communications; Brown University in Providence, Rhode Island; Oki Advanced Products in Marlborough, Massachusetts; the San Diego Supercomputer Center; Spyglass, Inc., in Naperville, Illinois; and the University of Darmstadt in Germany. Just as rapidly, new startups began forming around the idea. By 10 March 1995, months before the standard itself was ready, SGI and TGS announced the first commercial VRML browser, WebSpace, a piece of software that, by itself or "helping" a Web browser, allows users to navigate through three-dimensional VRML spaces. By August, 11 companies had put out some form of VRML browser or game system, at least in a "beta" or test version.

"I was surprised," says Bell, the primary author of VRML 1.0. "I was pessimistic that the mailing list approach would actually work. I thought it might be an endless thrash. But it seemed to work pretty well. [VRML] really is becoming a standard. It really shocked me."

Flying through molecules

To try VRML, I visited Construct, Inc., smack in San Francisco's multimedia hotbed, an area centered on South Park in the center of SoMa, the long run-down South of Market district, that has suddenly become the hip heart of cyberspace creativity. The startup was 28 days old the day I visited. The street-side door still said, "Interactive Media Festival," the name the organization had before it morphed into a company. The company offices reflected VRML's roots, struck as deeply in a visual aesthetic drive as in technological know-how: cheap, in an old brick building up steep, dark stairs, the offices themselves were light, airy, open, and punctuated with visuals, big posters, larger-than-life sculptures, an elephant-size hammock. The organization chart sported not only techies, but also several artists, an architect, and a "webmistress" -- not a Bay City diva of offbeat personal proclivities, but the woman in charge of the company's World Wide Web site.

Creative Director Mark Meadows ("pighed" on line) took me on a tour. I expected goggles and gloves. He sat me down at a 17-inch color monitor, with an ordinary keyboard and mouse -- hooked into a $50,000 SGI Indigo graphics workstation computer. An empty church-like theater appeared on the screen. Shadows filled the doorways. Overhead, light poured in the gothic curves of the windows and spilled across a coffered ceiling. "Let's take a look from the balcony," he said. Clicking on small instrument-like icons at the bottom of the screen, he "flew" us to the balcony, where we turned around and looked at the stage. "Gee, I wonder if anyone's left any change under the seats?" he said. By clicking the mouse cursor on small controls at the bottom of the screen, we "got down on our knees" and looked. No change.

We shifted scenes to a convention center, to an interactive art gallery, then to South Park itself. It looked like a simple model, all blocks, polygons, and bits of color, of the trees and grass of the Park and the buildings surrounding it -- except that we could "fly" over it, through it, even under it. I showed him the spot where I had parked my car -- he settled in and showed me what the view from the driver's seat would be, all rendered in simple polygons.

Finally we headed out of town to a collection of molecules in Germany. "What's your favorite molecule?" he asked. I asked for an enzyme. Any enzyme. He clicked on one of a long list of proffered molecules and there it was, like an enormous wire sculpture hanging in space. Part of the wire sculpture was filled in with atoms, blue, red, and green spheres bubbled together. "Here, you drive," he said, handing me the mouse. After a few false starts, I got the hang of it and flew through space toward the nearest white atom, then went into orbit around it. It turned below us like some vast pristine snow planet.

All this must take up huge amounts of memory, bandwidth, and computing power, right? Not necessarily. Three-dimensional reality can be surprisingly compact. A 2-D rendition of the South Park model, a photograph for instance, could take from tens of thousands to several million bytes of computer memory, depending on how refined a picture you want. But the compressed 3D file takes fewer than ten thousand bytes. The reason is simple: for a photograph, the graphic file must describe each separate dot ("pixel") that is going to appear on the screen. The 3D file, in contrast, is simply a series of equations, describing the geometry and characteristics (color, contrast, and such) of each polygon. "It's very efficient," says Terry Baker, head of TGS, "it allows for a low-bandwidth connection."
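
The arithmetic behind that comparison can be sketched in a few lines of Python. Every number here is an assumption for illustration (a 640-by-480 picture at three bytes per pixel, a model of 150 four-cornered polygons, four bytes per stored number), not a measurement of the South Park model, and the function names are hypothetical.

```python
def raster_bytes(width, height, bytes_per_pixel=3):
    """Size of an uncompressed picture: every pixel is stored explicitly."""
    return width * height * bytes_per_pixel


def geometry_bytes(polygon_count, vertices_per_polygon=4, bytes_per_number=4):
    """Rough size of a polygon model.

    Assumes three coordinates (x, y, z) per vertex plus one
    red/green/blue color per polygon, all stored at the same width.
    """
    coordinates = polygon_count * vertices_per_polygon * 3 * bytes_per_number
    colors = polygon_count * 3 * bytes_per_number
    return coordinates + colors


if __name__ == "__main__":
    picture = raster_bytes(640, 480)   # assumed screen-sized photograph
    model = geometry_bytes(150)        # assumed blocky model of a small park
    print(f"picture: {picture} bytes, model: {model} bytes")
```

The picture's size grows with every dot on the screen, while the model's size grows only with the number of polygons, however large they are drawn.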

Badly written 3D files can be agonizingly slow to appear on the screen, and roughly drawn when they do. Properly written, 3D files can work surprisingly swiftly, and look amazingly real -- one abstract form showed the glint of wet brushed aluminum, the highlights shifting as we turned it. We toured on a powerful workstation, but most VRML software is designed to work well on a high-end 486 machine, a fairly standard personal computer.


Like any place in Reality, the Street is subject to development. Developers can build their own small streets feeding off the main one. They can build buildings, parks, signs, as well as things that do not exist in Reality, such as vast hovering overhead light shows, special neighborhoods where the rules of three-dimensional space-time are ignored, and free-combat zones where people can go to hunt and kill each other.
                               . . . from Snow Crash,  by Neal Stephenson

A few blocks away from Construct, I dropped into another startup, Worlds, Inc. -- and dropped straight into the first real online attempt at a Snow Crash-style virtual world. AlphaWorld is crude compared to Stephenson's "Metaverse" -- at this early stage, everyone's avatar looks the same. There is no sound -- people's conversation simply appears over their heads. Yet there is something marvelously charming about "walking" through a town (set in a valley in the middle of a palm desert) in which people have staked out lots, built houses, gardens, stores and ziggurats. One man started a newspaper. He had someone "build" newsboxes, which he set around the main corners in town. Click on a newsbox, and a copy of the latest edition downloads to your PC.


The only difference is that since the Street does not really exist -- it's just a computer-graphics protocol written down on a sheet of paper somewhere -- none of these things is being physically built. They are, rather, pieces of software, made available to the public over the worldwide fiber-optics network. When Hiro . . . looks down the Street and sees buildings and electric signs stretching off into the darkness, disappearing over the curve of the globe, he is actually staring at the graphic representations -- the user interfaces -- of myriad different pieces of software . . .
                               . . . from Snow Crash,  by Neal Stephenson

The town, with its few hundred residents, is already old enough to have etiquette. It is rude, and a mark of a "newbie," to walk through someone else's avatar, or to talk to them when they aren't facing you (since they can't see you or "hear" what you said).

In other Worlds, Inc. efforts I visited a virtual bank peopled entirely by "autobots" or "daemons," realistic-looking but stiff people who answer all my questions. I worked my way through a gorgeous baroque library to locate a medieval manuscript -- the library is a "graphic front end" for a real data base of graphic images of manuscripts. And I browsed a "chat room" filled, like most online chat rooms, with strangers who have nothing in common to talk about. But here at least they have bodies, selected from the dozen or so avatars available for visitors. There are working escalators to ride up and down, elevators, doors that open when I approach, and a mirror in which to admire my avatar: a giant moth with a human-like body in the center.

Not everyone in the VRML rush is happy with Worlds, Inc. AlphaWorld does things that VRML can't do yet -- multiple users, interactivity, behavior -- not because no one else can do them, but because people haven't all agreed on the standard. Worlds, Inc. is pushing ahead with what it calls "VRML+" and pressing others to accept its standard and get on with it -- but so far, the consensus is not forming around them. As Dan Ambrosi, SGI's product manager for 3D graphics software, puts it, "There are some serious issues with `scalability.' If something works fine for dozens or hundreds of users, that's not enough. We are building something that will be used by millions of people at once across the globe -- and the foundations of the standard have to be able to handle that."


When Hiro first saw this place, ten years ago, the monorail hadn't been written yet; he and his buddies had to write car and motorcycle software in order to get around. They would take their software out and race it in the black desert of the electronic night.
                               . . . from Snow Crash,  by Neal Stephenson

But what's it for?

But is all this herculean effort just to give people virtual joyrides, chat rooms, and fun building towns that don't exist?

Think of it this way: what is it that gives you the feeling that someone is lying to you? What he is saying may be perfectly plausible, but there's something unsettling about the way he is saying it -- maybe his eyes are shifting, maybe there's a sweaty sheen on his forehead, there's something distant about his posture. We take in enormous amounts of subtle information at high speed through our senses, information that would not come through in mere text or even two-dimensional pictures. The rapidly-developing, global network of networks of computers has connected hundreds of millions of people in ways that they never would have thought possible - yet most of that communication still takes little advantage of the diverse and powerful sensorium that feeds the human mind. Almost all of it is forced through the narrow gateway of ASCII -- typed numbers and letters -- and the occasional picture, graphic, or chart. VRML is more than another online game. It is a serious effort to force wider the doors of perception and communication between people who are not physically present in the same space.

In a world in which networked, interactive, multi-user 3-D has become easy, high-resolution, relatively inexpensive, and widespread, all sorts of uses for VRML arise. Consider these scenarios:

* An architect walks her clients through a building. The building does not exist yet -- in fact, it's just a rough draft, an idea. The architect is in San Francisco, her clients are in New York and Toronto. In Louisville, people interested in a new public square and shopping area in the planning stages sit down at their computers, or at terminals in public kiosks in malls and office buildings, and take a stroll through the design. They click the mouse on a waterfall, a food court, or a hotel, and listen to the designer's explanation. When they are done they can type or speak their own comments and suggestions.

* A high school art teacher takes her students on a tour of the Parthenon in its original glory, pointing out the friezes and the proportions of the columns. She clicks on the statue of Athena and a brief text appears, describing the statue's enormous size and unusual decorations. Down the hall a history teacher troops his students into the Battle of Gaugamela, and science teachers fly their classes into a molecule, the ventricles of a living heart, or the fury of a supernova. None of them own the software, or the sophisticated rendering software that built the scenes. They are just visiting VRML sites.

* On her break at work, a woman strolls through a resort in Spain that she is thinking of visiting, looks out the window at the beach, then up at the mountains. Next she tries on a coat she might buy for the trip, in several different styles and colors. She never leaves her desk. The coat slips onto the shoulders of an avatar she has made just to her size.

* Engineers in Osaka and Seattle work together on a three-dimensional model of a new aircraft. They shift to three-dimensional graphs of performance projections -- the output of equations with three variables.

* After school, the kids in the computer lab wrestle with dragons, sword fight with each other, or explore caves with students from other schools -- all from the computers. Some visit their favorite 3-D MUSES -- multi-user simulated environments -- build-it-yourself virtual spaces where they can meet their friends, build a house, or design a new robo-puppet. Some type in their parents' credit card number and visit the newest Disney cyberspace theme parks.

* The boss and two top subordinates are off to the trade show in Frankfurt, but those left behind take some time to "jack in" and stroll the aisles of the virtual version, trying out products, picking up information, meeting the exhibitors. Trade shows that used to reach thousands now reach millions.

* A stock analyst flies through a data field that plots time vs. price vs. price/earnings ratio. The dots for the thousands of companies make clumps and swirls in space. He finds an outlier, a dot that sits off by itself. Curious, he flies toward it. As he approaches it, the dot expands and becomes a new data field full of information about the company.

* A man buying tickets to the Rolling Stones' Golden Jubilee Concert tries out various seats in the stadium before okaying the transaction.

* Wharfside in Kobe, a stevedore in a vast warehouse locates the pottery from Italy among the thousands of shipments on a three-dimensional display. A click of the mouse would highlight all the shipments from Mexico, or those that have not cleared customs, or those more than three days old.

When will it come to my house?

But many of these uses require far more than VRML 1.0 delivers. Giving things three-dimensional shape is just the first part. They need to behave realistically: doors should open, glass should reflect light. They should operate according to physical laws: a dropped glass should fall, people should not be able to walk through one another, heavy things should be hard to push around. They should interact: one object (say, a hand) should have some effect on another object (say, a pool cue). There should be other people there. People should have bodies ("avatars") that should look like them -- or like someone they happen to want to look like. Things should have shadows. There should be sound. SGI's Ambrosi says, "VRML 1.0 left out a lot of things so that we could get a consensus. Now the community has defined a set of features that will make for richer sites. Within a year, I think we'll have agreement on a standard that will support automated behaviors, interaction, and sound. In two years we will have multi-user capabilities."

This ultimate goal -- multiple users on all kinds of platforms across the Internet, interacting in 3-D -- has barriers that are as much organizational as technical. "There's a lot of infrastructure you have to build into the Internet to do that," says Gavin Bell. "There are questions like: how do you get the avatars past the firewalls," the security barriers that large systems put up to screen them from the Internet, "and how can you tell they aren't carrying viruses, or other mischief?"

The argument at the core of the next stage of VRML seems nearly philosophical, nearly god-like: a lot of things become a lot easier if you assume that the world you are setting out to create is finite, bounded, small enough to all fit on one computer at once. "But we want an infinite scale," says Bell. No one quite yet knows how to build an infinite 3-D world with its geometry, its behaviors, and its animations distributed all across the network. But Bell and his cohorts are determined to try.

In late August, a self-appointed 10-person "VRML Architecture Group" met for three days in San Francisco, laid out these tasks, assigned them to sub-committees, and set to work on them.

"We have a ways to go to get to a `Snow Crash'-style environment," says TGS' Terry Baker. "A scalable infrastructure for simulating virtual worlds and articulating avatars that interact with each other isn't there yet. But it'll be within range of the average computer consumer within two years."

In other words, it's coming soon -- and no one quite knows what it will turn into, what new layer it will add to the flow of information in and out of our lives.
