A Prosody of Space / Non-linear Time

Part I: Background: Linear Prosody[1]

Dimensions of Inequality Among Syllables

Prosody in the English language proceeds from the axiom that syllables are not all created equal; many effects in prosody derive from the time-plot of these inequalities, along various dimensions. The best known of these is the familiar stress degree, but we will quickly review others.

Pitch Degrees

The usual approach to pitch in prosody is to consider it a "curve", the intonation curve. However, there is a manner of recitation at work in many American communities, most notably in a style of reading in the black community, in which tight-knit patterns in time of various pitches are articulated, in much the same way that stress occurs in more traditional prosodies. This is a very rich prosody that deserves to be studied in its own right. A predominantly pitch degree prosody will have very different characteristics from a predominantly stress degree prosody. Pitch is a purely acoustical property, as opposed to stress, which is a linguistic property that is quite difficult to define acoustically. Thus a pitch degree prosody is much closer to music (in the literal sense of the term); a pitch degree prosody is freer to use an absolute musical sense of time, whereas a stress degree prosody is more likely to be based on "linguistic time", which works differently. (See footnote 8 below.) Not all phonemes carry pitch; a pitch degree prosody may thus change the balance of the sound structure in how phonemes relate to one another. Where both pitch degree and stress degree prosodies occur simultaneously, incredibly subtle effects are possible.

Vowel Position Degrees

In discussing the term "Tone Leading Vowels" as it pertained to the prosody of Ezra Pound, Robert Duncan (Duncan, 1973) explained it as meaning two things: (1) Where a diphthong (a glide between one "pure vowel" and another) occurs, the leading pure vowel of the glide plays a special role. (2) A sound is reinforced when you hear it again, but can also be reinforced when you don't hear it again. A concept similar to this second point is the idea that vowels form clusters according to the position of the mouth when they are articulated; the tight-knit pattern in time delineating which of these clusters is active can form a prosody, much like the stress degree or pitch degree prosody.

Stress Degrees: The Classical Stereotype

Of course the most familiar basis for metrics is the tight-knit pattern in time formed by stress degrees. Stress has been extensively studied in linguistics; see e.g. (Chomsky and Halle, 1968). Before explaining a methodology for how metrical studies of contemporary poetry can be conducted, we review what we will call the stereotype theory for how the stress-degree metric is supposed to operate. (This stereotype has become a significant obstacle in pursuing prosody of contemporary poetry, so it would be well to understand it before considering a different approach.) Classical prosody starts with an a priori inventory of templates of stress degree patterns (e.g. iamb, trochee, anapest, etc.). Scanning is the process of matching these templates to the poem; where repeated instances of a single template match, end-to-end, the line or poem is said to "scan". It is important to note that the word `foot' is profoundly ambiguous in this process, having at least the following two meanings: (1) We speak of a foot as meaning one of the templates. In this usage, `foot' is an abstract concept which exists in advance of any particular poem. (2) We may refer to the actual syllables in a poem matched by a template as being a foot. In this usage, `foot' is a part of a living breathing poem, and as such is a unit of rhythm intermediate between the syllable and the metric line. Much of the poetics that has been influential since the fifties and sixties has focused away from the a priori (Olson, 1967), (Ginsberg, 1971), giving many poets a profoundly uncomfortable feeling about a template concept of metrics. Because the word `foot' is used in these two ways, this discomfort has unfortunately led to a neglect of the concept of a unit of metrics that is not a priori but part of the poem itself, intermediate between the syllable and the metric line -- and to a decline in opinions as to the value of metrics as a whole.

We now give a new treatment to the concept of an intermediate unit of meter which avoids emphasis on the a priori and does not use any concept of template. To avoid confusion, we will abandon the use of the word `foot', and instead use the term `measure'.
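
As a point of contrast with the measure-based approach introduced here, the stereotype procedure described above (matching a single a priori template end-to-end) can be sketched in a few lines. This is only an illustrative sketch: the `FEET` table, the `u`/`S` notation for unstressed and stressed syllables, and the function name `scans_as` are assumptions made for the example, and the dactyl is added to the templates named above simply as one more familiar foot.

```python
# Hypothetical inventory of a priori templates ("feet" in sense 1).
# "u" marks an unstressed syllable, "S" a stressed one.
FEET = {
    "iamb": "uS",
    "trochee": "Su",
    "anapest": "uuS",
    "dactyl": "Suu",
}


def scans_as(pattern):
    """Return (template name, repeat count) for every template that,
    repeated end-to-end, reproduces the whole stress pattern."""
    matches = []
    for name, foot in FEET.items():
        count, remainder = divmod(len(pattern), len(foot))
        if count > 0 and remainder == 0 and foot * count == pattern:
            matches.append((name, count))
    return matches


# A line of five iambs "scans"; an irregular line matches no template.
print(scans_as("uSuSuSuSuS"))  # [('iamb', 5)]
print(scans_as("uSSuuS"))      # []
```

Note that the procedure says nothing about how the line is actually recited; it only tests the pattern against templates fixed in advance, which is exactly the a priori emphasis the term `measure' is meant to avoid.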

Bonding Strength

Another dimension of inequality among syllables (really of syllable boundaries) is bonding strength: the degree of attraction of a syllable to the one ahead of it or behind it. Bonding strength may be defined as the extent to which an artificially injected pause at a particular syllable boundary seems natural or not when compared to the "normal" way the poet would recite the line. Syllable boundaries will differ in their degree of bonding strength; by collecting together into a single unit those syllables where the bonding strength is high one obtains a measure. It cannot be emphasized strongly enough that the assessment of where the measure boundaries are must take place with respect to a particular recitation -- presumably the poet's. A printed text of the poem on the page may not give sufficient information without a sound recording. In this methodology, scanning consists of identifying where the measure boundaries are, where the rhythmic line boundaries are (a rhythmic line is a cluster of measures connected by somewhat higher bonding strength, just as a measure is a cluster of syllables connected by the highest degree of bonding strength), and then attempting to discern whether there may (or may not!) be any regularity to how measures are constructed. Rather than speaking of a poem as being "written in" a meter, meaning a conscious a priori choice of template, one examines the poem empirically to determine if there simply happens to be some regularity to the way the measures are constructed.
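
The clustering described above can be sketched as follows, under loudly stated assumptions: bonding strength is taken to be a number between 0 and 1 assigned by a listener to each syllable boundary of one particular recitation, and two cutoffs (`measure_threshold` and `line_threshold`, both hypothetical) separate "highest", "somewhat higher", and weak bonding. Neither the numeric scale nor the cutoffs come from the methodology itself, which relies on the ear; the sketch only shows how measures and rhythmic lines fall out once the boundary judgments exist.

```python
def scan(syllables, strengths, measure_threshold=0.7, line_threshold=0.3):
    """Group syllables into rhythmic lines of measures.

    strengths[i] is the (hypothetical, 0..1) bonding strength of the
    boundary between syllables[i] and syllables[i + 1], as judged from
    one particular recitation.
    """
    if not syllables:
        return []
    if len(strengths) != len(syllables) - 1:
        raise ValueError("need exactly one strength per syllable boundary")
    if line_threshold > measure_threshold:
        raise ValueError("line_threshold must not exceed measure_threshold")

    lines = [[[syllables[0]]]]  # list of lines, each a list of measures
    for syllable, strength in zip(syllables[1:], strengths):
        if strength < line_threshold:
            lines.append([[syllable]])        # weak bond: new rhythmic line
        elif strength < measure_threshold:
            lines[-1].append([syllable])      # middling bond: new measure
        else:
            lines[-1][-1].append(syllable)    # strong bond: same measure
    return lines


# Invented boundary judgments for an invented six-syllable recitation.
syllables = ["the", "cat", "sat", "on", "the", "mat"]
strengths = [0.9, 0.5, 0.2, 0.9, 0.9]
print(scan(syllables, strengths))
# [[['the', 'cat'], ['sat']], [['on', 'the', 'mat']]]
```

A different recitation of the same words would yield different strengths and therefore a different scansion, which is the point of tying the assessment to a particular recitation.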

The "Standard Measure"

This methodology need not be restricted to poetry: any recitation can be scanned. The statement is often made that English "is" iambic. Instead we find that measure boundaries in English prose tend to be constructed as follows: (1) a measure has only one major stress; (2) the measure tends to end on a major stress; but (3) if there are unstressed syllables following the major stress out to the end of a major grammatical unit, those unstressed syllables will also be incorporated into the measure. (This may be thought of as a way of restating the tendency of English toward the iambic, but institutionalizing the many counterexamples.) Measures constructed in this way may be called standard measures. Of course not all measure boundaries in poetry will be standard measure boundaries: Robert Creeley, for instance, is well known for having many non-standard measure boundaries in his poems. Interestingly, when Creeley's poems are actually scanned, the results show that while there may be non-standard measure boundaries at the end of the rhythmic line, many lines contain two measures, and in these lines the internal measure boundary is a standard one. That is, the celebrated Creeley line-break really is a line-break and not a measure break. The non-standard measure boundaries are very easy to hear, but the internal standard measure boundaries are much more subtle. Of course if they were missing, we would certainly hear the result as a flat, lifelessly regular, much less interesting rhythm. This structure of Creeley's lines may be described as an offset structure: the sound structure of the line endings is clearly articulated, but the grammatical structure proceeds from the middle of one line to the middle of the next. The offset structure is an extremely venerable structure in prosody, going back at least to Anglo-Saxon times.
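
The three tendencies that define the standard measure can be sketched as a small procedure. This is a hedged illustration, not a description of how any scansion in this paper was produced: the `Syllable` record, its two hand-supplied flags, and the function name `standard_measures` are all assumptions, and the decision to keep leftover unstressed syllables as a final measure is a choice made only so that nothing is dropped.

```python
from typing import NamedTuple


class Syllable(NamedTuple):
    text: str
    major_stress: bool  # does the syllable carry a major stress?
    ends_unit: bool     # is it the last syllable of a major grammatical unit?


def standard_measures(syllables):
    """Split annotated syllables into standard measures."""
    measures, current = [], []
    i, n = 0, len(syllables)
    while i < n:
        syllable = syllables[i]
        current.append(syllable.text)
        i += 1
        if not syllable.major_stress:
            continue  # rules 1 and 2: keep going until the major stress
        if not syllable.ends_unit:
            # Rule 3: absorb unstressed syllables that run out to the end
            # of the grammatical unit, if no major stress intervenes.
            j = i
            while j < n and not syllables[j].major_stress:
                if syllables[j].ends_unit:
                    current.extend(s.text for s in syllables[i:j + 1])
                    i = j + 1
                    break
                j += 1
        measures.append(current)
        current = []
    if current:
        measures.append(current)  # assumption: keep any unstressed remainder
    return measures


# Invented annotations: (text, major stress, ends grammatical unit).
iambic = [Syllable(*s) for s in [
    ("the", False, False), ("cat", True, False), ("has", False, False),
    ("gone", True, False), ("a", False, False), ("way", True, True)]]
falling = [Syllable(*s) for s in [
    ("the", False, False), ("riv", True, False), ("er", False, True),
    ("is", False, False), ("ris", True, False), ("ing", False, True)]]
print(standard_measures(iambic))   # [['the', 'cat'], ['has', 'gone'], ['a', 'way']]
print(standard_measures(falling))  # [['the', 'riv', 'er'], ['is', 'ris', 'ing']]
```

The first example shows the iambic tendency; the second shows the kind of counterexample that rule (3) institutionalizes.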

Part II: Non-linear Prosody

Bonding Strength is Spatial

Above we described bonding strength as the attraction of a syllable to the syllable ahead of it or behind it. Although prosody is normally interpreted as how the sound structure works in time, clearly the concept "adjacent" is a spatial concept; thus bonding strength may also be interpreted as a spatial concept, and as such can work in any topology, including a non-linear one. Where above we defined bonding strength as the tendency of a syllable boundary to resist injection of an artificial pause (a time concept), we could just as easily have described it as the tendency to resist artificial injection of space. It should be noted that in one dimension, space and time are nearly the same thing; in the more complex topologies of non-linear writing, as we shall see, this is not the case.

A Review of Hypertext Structure Terminology

A framework for structuring hypertext activity was introduced in (Rosenberg, 1996, "Structure"); we review it briefly here. By hypertext we mean a text where from the user's point of view the text contains embedded structure operations. I.e. the text contains interactive devices which trigger activities. The most familiar of these is the by now well known hypertext link, but many other types are possible.[2] For instance, this author's work contains devices called simultaneities, in which groups of words are layered on top of one another; by moving the mouse among no-click hot spots, the different layers are revealed. Research hypertext systems have been built based on both set models and relation models, and spatial hypertexts have been constructed using such concepts as piles and lists. In all of these cases, the hypertext is operated by performing activities; in the small these activities consist of such actions as following a link, opening up a pile or simultaneity, etc. To such small-scale activities the framework of (Rosenberg, 1996, "Structure") gives the name acteme. In the node-link model of hypertext, the acteme of following a link may be described as disjunctive (from the term disjunction, which in logic is an or): from a given position in the hypertext one may have a choice of several links, i.e. one may choose link A or link B or link C. Other forms of acteme may be described as conjunctive: a simultaneity with layers A, B, C consists of A and B and C.[3] A hypertext can use both kinds of actemes together, and clearly in a poetry context it would even make sense to have actemes whose conjunctivity or disjunctivity was ambiguous.
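
The distinction between disjunctive and conjunctive actemes can be made concrete with a minimal data-structure sketch. The names `Mode`, `Acteme`, and `resolve` are hypothetical and do not correspond to any actual hypertext system; the sketch deliberately ignores the ambiguous case mentioned above, in which conjunctivity and disjunctivity cannot be told apart.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    DISJUNCTIVE = "or"   # e.g. choosing one of several links
    CONJUNCTIVE = "and"  # e.g. the layers of a simultaneity


@dataclass(frozen=True)
class Acteme:
    mode: Mode
    elements: tuple


def resolve(acteme, visited):
    """Return the elements an acteme contributes to a reading,
    given the elements the user actually visited (in order)."""
    unknown = [v for v in visited if v not in acteme.elements]
    if unknown:
        raise ValueError(f"not part of this acteme: {unknown}")
    if acteme.mode is Mode.DISJUNCTIVE:
        # A or B or C: only the alternative actually followed counts.
        if len(visited) != 1:
            raise ValueError("a disjunctive acteme follows exactly one element")
        return frozenset(visited)
    # A and B and C: the whole consists of every element, whatever
    # order (and however partially) the user happened to visit them.
    return frozenset(acteme.elements)


link_menu = Acteme(Mode.DISJUNCTIVE, ("A", "B", "C"))
simultaneity = Acteme(Mode.CONJUNCTIVE, ("A", "B", "C"))
print(sorted(resolve(link_menu, ["B"])))          # ['B']
print(sorted(resolve(simultaneity, ["C", "A"])))  # ['A', 'B', 'C']
```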

In most cases the text in a hypertext appears in units called lexia (a term borrowed by George Landow from the writings of Barthes; see (Landow, 1992)). In a typical node-link hypertext, the lexia is the unit of text at either end of a link; often (though not inevitably!) the lexia has an internal structure which is simply linear. (As we will see, particularly in the context of poetry, the concept of lexia is extremely problematic.)

As the user navigates a hypertext, activities will (hopefully) cohere together into units called episodes. For a node-link hypertext, the episode will tend to be all or part of a path. It must be noted that not all activities will necessarily resolve into an episode. Some activities might be done "by accident". E.g. in pulling down a menu of possible links, one might let up on the mouse unintentionally and follow a link other than the one intended. The user may backtrack, having decided that performing an activity got nowhere. (Backtracking is complex; it may or may not "revoke" membership of an acteme in the episode.) Thus, episode is not the same thing as history. At a certain point the user may not "have" an episode at all, and might indeed be best described as foraging for an episode. The episode is an emergent concept; it emerges retroactively. Ideally, episode structure emerges through use of a gathering interface. Unfortunately, gathering interfaces are still in a quite primitive state, and those that do exist are more related to gathering "bookmarks" than to assembling a full picture of hypertext activity.
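
The difference between history and episode can be illustrated with a small sketch. It assumes a hypothetical `Reading` class in which history is a raw log of activities and the episode is whatever the user retroactively gathers from that log; keeping the gathered activities in history order is a simplification made for the example, not a claim about how episodes must be structured.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    # Every activity performed, accidents and backtracking included.
    history: list = field(default_factory=list)

    def perform(self, activity):
        """Log an activity and return its position in the history."""
        self.history.append(activity)
        return len(self.history) - 1

    def gather(self, kept):
        """Retroactively assemble an episode from selected positions."""
        return [a for i, a in enumerate(self.history) if i in kept]


reading = Reading()
first = reading.perform("follow link A")
reading.perform("follow link X (mouse released unintentionally)")
reading.perform("backtrack")
second = reading.perform("follow link B")

episode = reading.gather({first, second})
print(len(reading.history))  # 4
print(episode)               # ['follow link A', 'follow link B']
```

Here the accidental link and the backtrack remain in the history but never enter the episode; whether a given backtrack should "revoke" an activity is left to whoever does the gathering.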

Prosody Within the Lexia

In many cases -- perhaps (alas!) most cases -- the lexia is structured linearly. Under these conditions, within-lexia prosody embeds most if not all of traditional linear prosody. Not much need be said here; indeed one would be hard put to make the case that there is any difference at all between within-lexia prosody for a linearly structured lexia and the prosody of the printed page. However, there is no reason at all to suppose the lexia must inevitably be linear. (See (Moulthrop, 1992), (Rosenberg, 1994) for more discussion on linearity of the lexia.) In this section we consider within-lexia prosody issues for a non-linearly structured lexia.

Consider Figure 1, which shows a single screen from a simultaneity taken from a work by the author (Rosenberg, 1996, Diffractions through). This screen can be read in at least two completely different ways. (1) It can be read polylinearly. The words with the same font can be read as a linear skein, beginning with the word which is capitalized. (2) The graphically clustered fragments of these phrases can be "read in snatches" with the eye wandering about the surface of the screen picking up groups of words and associating them in whatever way seems to work. As discussed in (Rosenberg, 1994), even a simple polylinear reading poses difficult questions for the concept of lexia: is "the lexia" the entire screen, or one of the skeins? A computer-oriented view of the lexia would tend to regard the lexia as whatever is visible on the screen when there is no input to the computer: the mouse is not moved, and no key is pressed. In this case the entire screen should count as one lexia. What happens, in terms of prosody, as the eye moves from one phrase to another? Is this time which "doesn't count" -- a kind of time out, in which there is no prosody?[4] If indeed the time between phrases "doesn't count", we may describe the time units within the skeins as disengaged from one another. This concept of disengaged time is one to which we will return. Or perhaps the prosody of the individual skein, together with the layout of the screen, helps determine when the next phrase begins, in which case the time between skeins definitely is part of the prosody.[5] A lexia with this type of polylinear structure is inherently ambiguous concerning the prosody of what happens between phrases. Still another possibility is simply to say the time relationship between phrases is in the reader's hands completely. Of course something will happen when the poet "recites" such a lexia: a choice will in fact be made. In this case there may be a painful contradiction between the desire of the poet to present the work in a context where oral experience is expected and an inherent aesthetic desire to leave open as many options as possible for the reader.

Figure 1

These issues become even more difficult if we use method 2 to read this screen. What is the prosodic relationship between these clusters of words, read by a kind of "visual wandering"? In this case linearity is so seriously fragmented that the reader may have an impression of the words disengaging from time altogether, with prosody relationships becoming entirely spatial.

Prosody Through the Episode

There is no reason to assume that prosody should be confined within the lexia. In this section we explore issues of prosody within the episode as a whole that go beyond the boundaries of the lexia. "Text" occurs in many places in a hypertext besides the obvious text in the lexia. There is also some amount of text in the devices of the hypertext mechanism itself. For instance, many hypertext systems allow the user to bring up a menu of possible outgoing links. Such a menu is unarguably text. What role does such a menu play in prosody?[6] One approach is to consider the menu of link names as a text object in its own right. In (Larsen, 1996) the author constructs poems from assembled link names. This approach, while interesting, simply reconstitutes the menu of link names as a different form of lexia, though one which has a somewhat complex structural relationship to the lexia from which it was popped up. Another approach is to consider a link name as a "prosody channel" connecting the text at either end of the link. It is typical in hypertext to assume that the user will choose a link based on semantic or logical criteria, but in a poetry context there is no reason to assume prosody is any less valid as a means of choosing a link. To use the terminology we've been using throughout: bonding strength can operate through the link; bonding strength may even be the basis for choosing a link in the first place. It makes sense to speak of a "two-dimensional" prosody in assessing the relationship of prosody within the lexia to prosody through the link. (Or even three-dimensional, if the lexia is spatial.) One point worth noting here is that the concept of bonding strength -- the attraction of two text elements across a real or artificially thought-imposed boundary -- sounds quite symmetrical, whereas most hypertext links are one-directional.[7] But the directionality of the hypertext link is not really different from the directionality of time in conventional prosody. It may be true that considering bonding strength through the link reverses the direction of attention compared to the direction of the link, but we do the same for the direction of time in assessing linear prosody.

At its most conservative, a hypertext treats the lexia as a full-fledged document in its own right; the interactive devices, such as links, may even be seen merely as devices for visiting "traditional" documents. A more radical approach treats the episode as a virtual document (Rosenberg, 1996, "Structure"). In this approach the "center of gravity" is no longer within the lexia, but in what emerges through the use of interactive devices. (At its most extreme, meaning -- and even syntax (Rosenberg, 1996, "Structure") -- are more properly a function of the episode than the lexia.) What are the implications for prosody of the episode treated as a virtual document? This is related to a second question: What is the structure of the episode? One obvious answer to this second question is that the episode is structured linearly by time. If we accept this idea, then prosody within the episode seems little different from prosody within the lexia, except that the user has chosen the interactions. In the disjunctive case the user has chosen which alternative to follow in operating a given acteme, and in the conjunctive case the user has chosen the order of visiting various elements -- and of course the user has the choice of how much repetition is desired; in any case the user has complete control over how much time is spent in any given place in the hypertext. The sense that many alternatives are possible at a given hinge point in the prosody may create the sense of that spot as a slot into which different continuations can be plugged; this very multiplicity may create a sense that some combination of some or all of the continuations is what in fact actually connects to the hinge point -- thus undermining the purity of the concept of disjunctive hypertext.

But is the episode necessarily linear? (Rosenberg, 1996, "Structure") argued that the structure of the episode is what we make of it given the gathering interface that is available. (Alas, for most commercially available hypertext software, there is either no gathering interface at all, or it is at best hideously primitive.) A gathering interface is in effect a hypertext the user constructs of gatherings from the hypertext being read. It may use spatial or conjunctive methods, even if the hypertext being read uses a pure node-link model. For an example of a commercial gathering interface operating on the World Wide Web, see (Bernstein, 1996).

How Does Time Run in a Non-linear Poem?

Clearly this is a large question, and even though much of this paper has dealt with a spatial as opposed to a time-based approach to prosody, one can hardly leave time out of the picture in a full consideration of prosody. First, it is important to note there are multiple concepts of time operating at once. At the most obvious level is what may be described as Usage Time. This is like a "raw tape" of what the user actually does. In fact, such a concept of time can be misleading even in the linear case. While many authors have studied isochrony, the tendency of stressed syllables to form a regular musical beat, what can also happen is that even when stressed syllables do not fall according to a regular beat, the stresses may so heavily influence our perception of time that our sense of the passage of time is based more on the passage of linguistic features like stress than on the purely acoustical features that would be captured by a tape recorder: the stresses become our measure of time, even when their acoustical correlates do not seem to be evenly spaced.[8] Do interactive devices "become" the measure of time in an interactive poem? As hypertext is extended further into the fine structure of language, this may happen. So the concept of usage time is not a simple unified one. Does usage time include all the "accidents" -- backtracking from wrongly taken links (e.g. letting the mouse up at an unintended time), overshooting a scroll-bar, etc.?

A second concept of time is Gather Time: the time one spends constructing and reading the results in a gathering interface. (Unfortunately, as mentioned, most often the only gathering interface at hand is the reader's memory.) Gather time may start and stop; e.g. while foraging for episode one may speak of gather time as having stopped. (This is not really different from the concept that "the syllable time of the poem is not running" during the time it takes to find one's place in the poem on the page when momentarily interrupted.) Gather time is complicated. Given a spatial gathering interface, is gather time "running" as one changes the spatial relationship of gathered elements? Some type of time is running. As one manipulates gathered phrases on a screen one is in some type of relationship to them: how does that relationship map to syllables? Is the time spent moving a phrase "mapped to" all the syllables at once? Can usage time work in this same way, given the right interface? Clearly it is possible to lay out words using graphical methods so that, even though an implicit underlying linear structure may be derivable, the eye associates all of the words together as a single object "all at once". How does time work for such an object? There is an initial exposure time, the history of which is arguably linear, but what about time spent contemplating the word object as a whole? What kind of time is that? Is it "suspended time"? Is it "autonomous time", in which the word object becomes in effect an object with its own concept of time, not necessarily reconcilable with the concept of time of other objects present, much in the same way two people in the same scene may not be able to precisely reconcile their concept of time? Perhaps it can seem almost like a kind of loop, where the words, having been initially examined, are treated as though they "keep on playing".

Conjunctive structures bring their own set of questions to the issue of how time works. A conjunctive structure consists of all of its components resolved into a single whole -- as opposed to considering the components as alternatives. What is the time relationship among these components? It makes sense -- at least metaphorically -- to think of the usage time for each component as being equivalenced with that of the other components. In this author's structures called simultaneities, groups of words are placed in the same space, physically and logically -- on top of one another. Usage history will clearly resolve an order in which the elements in the simultaneity were encountered (an order which is under some control by the user). These are different units of time -- they aren't literally simultaneous, in the sense of simultaneous voices, but the term `simultaneity' is meant to convey the idea that these units of time are meant to be treated as equivalent. This concept of equivalenced time as experienced by a single user is admittedly an abstraction, but clearly the concept of equivalenced time makes perfect sense in the context of the time experienced by multiple players in the same event. Equivalenced time is a natural correlate of the concept of autonomous word objects -- words endowed with behavior -- which are so eminently possible with the use of software.

At the opposite extreme from equivalenced time are units that are completely disengaged in time: units whose time relationship to one another is completely null. Juxtaposition -- bringing together elements with no structural relation between them -- may be thought of as the null structure, or "structural zero" (Rosenberg, 1993), and may be considered as the most elemental maneuver at the heart of abstraction; clearly juxtaposition has been an important element in all of the arts for many decades. What is the null structure in the dimension(s) of time? Surely it makes sense to ask this question. In a hypertext, separate episodes may be time-disengaged even though the usage time for one episode does have a clear relationship to the usage time of the other. Consider two memories, each of an incident whose time and date one cannot place, and in fact whose relative time and date one cannot place. Does it really matter in which order the memories were recalled? The true time relationship of the memories is that they are unresolved with respect to time.

In a hypertext, time itself may become spatialized. This may occur in any number of ways. In a multimedia piece, an interactive device may permit playing a sound or movie. Such an object will have "its own" timeline; it is common in interactive devices for playing time-based media for this timeline to actually show on the screen as a control, which the user can directly manipulate. But there is not likely to be such a timeline for the hypertext as a whole; rather the timeline for the particular media object is -- in its entirety -- anchored at a particular location in the hypertext. One may speak of the entire timeline as being spatialized at a particular location. Even for text, where there is no formal "player object", the entirety of the text object may be anchored at a specific location. There is an important point here: for linear text, travel through the text is accomplished by reading in a linear fashion -- though to be sure there are many other ways of navigating in a printed text, and most acts of reading involve a mixture of linear travel along the word stream, and directly accessing various parts of the text, whether through bookmarks, tables of contents, indices, footnotes, or the like. In a hypertext, even given a linear lexia, this linearity is not likely to be used for travel. Instead, the specific interactive devices are likely to be used for travel, leaving the lexia as an anchored spot which "doesn't go anywhere". Thus to the extent there is a linear lexia, it is an anchored linearity. This applies to the time aspects of reading it.

Multiuser Time

Throughout this whole discussion we have taken a perspective that would be called in computer jargon "single-user". We tend to view "a reading" as a single reader reading a work which has a single (even if collective) author. In the computer world, multiuser games are quite common; there is no reason to doubt that in the future we will see an increasing number of multiuser literary works. Multiuser time involves stretches of time that are not necessarily resolvable from one user to another. The events of prosody are typically passages over particular points in a poem -- syllables or line breaks, etc. Where there are multiple readers in the same space at the same time, it may not be possible to resolve any form of synchronization relating when the various users have hit these kinds of events relative to one another. In this sense, the concept of disengaged time is not metaphorical, but a literal description of what takes place.

It is known that the brain is a massively parallel system. A simple act of seeing involves substantial processing by each retina, even before the signals reach the brain. Is it possible that even for a single reader, the "single-user" model may not be correct? Is the brain itself perhaps "multiuser"? This is the question posed by Dennett (Dennett, 1991), who devised a theory of consciousness based on the concept of a parallel "gang of demons"[9]. Dennett's arguments are quite complex, and it is not yet clear whether they will prevail. But they are quite provocative, and raise the possibility that there are centers in the brain which act as "time disengaged actors" even for a single mind. Hypertexts can render this concept external and tangible. The questions raised for prosody have only begun to be asked. Much of prosody concerns reinforcement of sound events by earlier related (or differing! -- note Robert Duncan's comment, cited above) sound events. How does this work when there is no way to determine which will occur first? How does reinforcement operate across disengaged units of time? What happens to these time disengagements when the poet "recites" -- and how in general does a poet perform such a work?

References

Bernstein, Mark. Web Squirrel. Computer Software. Watertown, MA: Eastgate Systems, 1996.

Chomsky, Noam, and Halle, Morris. The Sound Pattern of English. New York: Harper and Row, 1968.

Dennett, Daniel C. Consciousness Explained. Boston: Little Brown and Company, 1991.

Duncan, Robert. Personal conversation. Circa 1973.

Ginsberg, Allen. Improvised Poetics. San Francisco: Anonym, 1971.

Landow, G. P. Hypertext: The Convergence of Contemporary Critical Theory and Technology. Baltimore: Johns Hopkins University Press, 1992.

Larsen, Deena. Samplers. Computer Software. Watertown, MA: Eastgate Systems, 1996.

Mac Low, Jackson. 22 Light Poems. Los Angeles: Black Sparrow Press, 1968.

Moulthrop, Stuart. "Shadow of the Informand: A Rhetorical Experiment in Hypertext". Perforations 3. Atlanta, GA: Public Domain, 1992.

Nelson, Theodore H. Literary Machines. Swarthmore, PA: T.H. Nelson, 1981.

Olson, Charles. "Projective Verse". Human Universe and Other Essays. New York: Grove Press, 1967.

Rosenberg, Jim. "Openings: The Connection Direct". http://www.well.com/user/jer/openings.html. Liner notes included in Intergrams. Computer Software. Watertown, MA: Eastgate Systems, 1993.

Rosenberg, Jim. "Navigating Nowhere / Hypertext Infrawhere". SIGLINK Newsletter 3, 3, December 1994. http://www.well.com/user/jer/NNHI.html.

Rosenberg, Jim. "Notes Toward a Non-linear Prosody of Space". ht_lit Mailing List. http://www.well.com/user/jer/nonlin_prosody.html. March 26, 1995.

Rosenberg, Jim. "The Structure of Hypertext Activity". Hypertext '96. New York: ACM, 1996.

Rosenberg, Jim. Diffractions through: Thirst weep ransack (frailty) veer tide elegy. Computer Software. Watertown, MA: Eastgate Systems, 1996.
