Foreign-Language Literature on Data Visualization Analysis

ThemeRiver: Visualizing Theme Changes over Time

Susan Havre, Beth Hetzler, and Lucy Nowell

Battelle Pacific Northwest Division, Richland, Washington 99352 USA

1+509+375-6948

{susan.havre | beth.hetzler | lucy.nowell}@pnl.gov

Abstract

ThemeRiver is a prototype system that visualizes thematic variations over time within a large collection of documents. The “river” flows from left to right through time, changing width to depict changes in thematic strength of temporally associated documents. Colored “currents” flowing within the river narrow or widen to indicate decreases or increases in the strength of an individual topic or a group of topics in the associated documents. The river is shown within the context of a timeline and a corresponding textual presentation of external events.

Keywords: visualization metaphors, trend analysis, timeline

1. Introduction

In exploratory information visualization, one goal is to present information so that users can easily discern patterns. Patterns reveal trends, relationships, anomalies, and structure in the data, and may help users confirm knowledge or hypotheses. Perhaps more importantly, they also raise unexpected questions leading users to new insights. The challenge is to create visualizations that enable users to find patterns quickly and easily. ThemeRiver, shown in Figure 1, is a prototype system designed to reveal temporal patterns in text collections.

Figure 1: ThemeRiver uses a river metaphor to represent theme changes over time.

Information visualization systems such as Envision [13], BEAD [1], LyberWorld [3, 4], and SPIRE [18] represent each document or group of documents with a glyph or icon, portraying various document attributes. Various methods have been explored for showing change over time in document-centric visualizations; see Section 3.

However, a user may be less interested in documents themselves than in theme changes within the whole collection over time. For example, how did Shakespeare’s themes change during various periods of his life or in relation to contemporary events? Such information is difficult, if not impossible, to glean from most visualizations. A visualization that focuses on themes, rather than documents, could be more useful for such exploration.

ThemeRiver provides users with a macro-view of thematic changes in a corpus of documents over a serial dimension. It is designed to facilitate the identification of trends, patterns, and unexpected occurrence or non-occurrence of themes or topics. In our prototype, we use time as the serial dimension. We provide contextual information through a timeline and markers for co-occurring events of interest. Figure 1 shows a sample ThemeRiver visualization. This paper describes the design of ThemeRiver, walks through a sample information exploration session, and discusses results of formative usability testing.

2. Design

Our major design goal was to provide a visualization of theme change over time. Consider using a histogram to visualize these changes. In a histogram (such as the one shown in Figure 2), each bar represents a time slice, and color variations and size within the bar represent the relative strength of themes specific to that slice. However, understanding the histogram requires users to work at integrating the themes across time because the bars are anchored to a baseline and the position of a particular theme within the bars may vary considerably.

Like a histogram, ThemeRiver uses variations in width to represent variations in strength or degree of representation. However, it connects the strength values in adjacent time slices with smooth and continuous curves. The horizontal flow of the river represents the flow of time. Colored currents that run horizontally within the river represent themes. Each vertical section of the river corresponds to an ordered time slice.
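The paper does not say which curve algorithm joins adjacent time slices (the prototype lets users choose among several). As a minimal sketch, assuming cosine interpolation, the hypothetical `cosine_interpolate` helper below samples a smooth curve that passes exactly through each slice’s value and never overshoots the two values it connects, so a current can never be drawn with negative width:

```python
import math


def cosine_interpolate(values, samples_per_slice=8):
    """Sample a smooth curve through one strength value per time slice.

    Hypothetical helper, not ThemeRiver's actual algorithm. Between two
    adjacent slices the curve follows half a cosine period, so it hits every
    data point exactly, stays between the two values it connects, and has
    zero slope at each data point (no sharp corners where segments meet).
    """
    if len(values) < 2:
        return [float(v) for v in values]
    curve = []
    for y0, y1 in zip(values, values[1:]):
        for step in range(samples_per_slice):
            t = step / samples_per_slice          # position in the segment, 0 <= t < 1
            w = (1 - math.cos(t * math.pi)) / 2   # eases smoothly from 0 to 1
            curve.append(y0 * (1 - w) + y1 * w)
    curve.append(float(values[-1]))               # end exactly on the last data point
    return curve


# Monthly strengths for one theme: the curve passes through 0, 4, and 2.
curve = cosine_interpolate([0, 4, 2], samples_per_slice=4)
print(curve[::4])  # [0.0, 4.0, 2.0]
```

The trade-off of this choice is that the curve flattens at every data point, so sharp peaks look slightly plateau-like; a spline such as Catmull-Rom looks smoother but can dip below zero near vanishing themes.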

The width of each current changes to reflect the thematic strength for each time slice. For example, in Figure 1 the theme “soviet” increases in relative strength in June 1960, as indicated by the widening of the upper bright orange current. “Soviet” loses relative strength in July and August; thus the same current narrows in the next two time slices. “Soviet” then increases significantly in relative strength in September; the current widens proportionately.

Currents maintain their integrity as a single entity over time. If a theme ceases to occur in the documents for a period of time and then recurs, the current likewise disappears and then reappears. Consistent color and relative position to other themes make theme currents easy to recognize. In Figure 1, the lower purple band depicts the changes in relative strength of the theme “cane.” The “cane” current grows and shrinks over time; “cane” occurs most strongly in March 1961.

We believe that ThemeRiver’s continuous curves have much to do with its usability. The Gestalt School of Psychology [8], founded in 1919 in Germany, theorized that in perception, “the whole is greater than the sum of the parts.” Simply put, during the perception process humans do not organize individual, low-level, sensed elements, but sense more complete “packages” that represent objects or patterns. In his recent book [6], Hoffman presents a compelling discussion of how our perceptual processes identify curves and silhouettes, recognize parts, and group them together into objects. Numerous aspects of the image influence our ability to perceive these parts and objects, including similarity, continuity, symmetry, proximity, and closure. For example, it is easier to perceive objects that are bounded by continuous curves than those that contain abrupt changes [17].

The vertical proximity of the river currents makes it easy for users to judge the relative width of currents and thus the relative strength of the themes. Similarly, symmetry around the horizontal axis of the river, a current, or a group of currents makes it easier for users to perceive flow patterns and changes. Widths of currents combine to show cumulative widening and narrowing, representing changing strength for the selected set of themes as a whole.
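To make the symmetry concrete, here is a minimal sketch (a hypothetical `symmetric_stack` helper, not the prototype’s code) that stacks the currents in a fixed order and centers each time slice’s total width on the axis y = 0, so the river widens and narrows equally above and below it. Theme names and values are illustrative only:

```python
def symmetric_stack(strengths):
    """Compute current boundaries for a river centred on y = 0.

    strengths: dict mapping theme -> list of per-slice strengths; all lists
    must have the same length. Currents are stacked in dict order. Returns a
    dict mapping theme -> list of (lower, upper) boundaries, one per slice.
    """
    if not strengths:
        return {}
    themes = list(strengths)
    n_slices = len(strengths[themes[0]])
    bounds = {theme: [] for theme in themes}
    for i in range(n_slices):
        total = sum(strengths[theme][i] for theme in themes)
        y = -total / 2  # start half the river's width below the axis
        for theme in themes:
            lower, y = y, y + strengths[theme][i]
            bounds[theme].append((lower, y))
    return bounds


river = symmetric_stack({"oil": [2, 4], "cane": [2, 0]})
print(river["oil"])   # [(-2.0, 0.0), (-2.0, 2.0)]
print(river["cane"])  # [(0.0, 2.0), (2.0, 2.0)]
```

A zero-width pair such as `(2.0, 2.0)` is how a vanishing current (like “cane” above) is represented; a drawing routine would then smooth each boundary series between slices.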

Values for theme strength can be calculated in various ways. For example, they might represent the number of documents in each time slice that contain the theme word. Because the river loses its continuity and structure if there are too few or too many themes, we created several theme subsets for exploration.
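Taking the example measure literally (the number of documents that contain a theme word), strengths can be computed in a single pass over the collection. A minimal sketch, assuming each document arrives already tagged with its time slice and that a theme matches a lowercase whole word (no stemming or phrase matching); `theme_strengths` and the sample headlines are hypothetical:

```python
import re
from collections import defaultdict


def theme_strengths(documents, themes):
    """Count, per time slice, how many documents contain each theme word.

    documents: iterable of (slice_key, text) pairs, consumed in one pass.
    Returns {theme: {slice_key: document_count}}.
    """
    counts = {theme: defaultdict(int) for theme in themes}
    for slice_key, text in documents:
        words = set(re.findall(r"[a-z]+", text.lower()))  # each document counts once per theme
        for theme in themes:
            if theme in words:
                counts[theme][slice_key] += 1
    return {theme: dict(per_slice) for theme, per_slice in counts.items()}


docs = [
    ("1990-07", "OPEC ministers discuss oil output"),
    ("1990-08", "Iraq invades Kuwait"),
    ("1990-08", "Oil prices jump after Kuwait invasion"),
]
print(theme_strengths(docs, ["kuwait", "oil"]))
# {'kuwait': {'1990-08': 2}, 'oil': {'1990-07': 1, '1990-08': 1}}
```

Slices in which no document mentions a theme are simply absent from the result and should be treated as zero when the river is drawn.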

We have implemented a proof-of-principle prototype and used it to explore data from multiple sources. Figure 1 portrays data from a collection of speeches, interviews, articles, and other text associated with Fidel Castro. The visualization includes the river, a timeline below the river, and markers for related historical events along the top. With ThemeRiver, users may

• display topic and event labels
• display time and event grid lines
• display the raw data points
• choose among drawing algorithms for the currents and river.

Users may also display the associated time or theme name by simply moving the mouse across the image. In addition, users may pan and zoom to see other time periods or parts of the river and to see more detail or broader context. In this sample data set, we found several interesting correspondences between themes and events, such as the expansion of the “oil” theme just before Castro confiscated American oil refineries (see Figure 1).

3. Related Work

Many systems include features for viewing time. One common method is to show discrete time slices. For example, in the Spatial Paradigm for Information Retrieval and Exploration (SPIRE) Galaxy visualization [18], users may choose to progressively step through time, showing only the icons for documents originating within each specified time period. Another common approach is to show time as an attribute of documents, as done in Virginia Tech’s Envision system, which lets users map various metadata values, including date, to the x-axis, the y-axis, or color, shape, or size graphical encodings [13].

More similar to ThemeRiver in intent are systems that focus directly on time. The LifeLines system, developed jointly by the University of Maryland and IBM, has been used to visualize medical records and juvenile criminal records [14, 15]. The visualization displays time along the x-axis and uses the y-axis to categorize events. Bars depict duration for a given event, and graphical attributes such as color show event attributes. TmViewer uses a similar approach, adding the ability to show parent-child relationships with lines between related time bars [10]. The DIVA system [12] uses animation to show how particular measured values change in relation to the temporal flow of a video. To help groups collaborating to create a document or other artifact, the Timewarp system developed at Xerox PARC [2] lets users view and edit multiple timelines of the changing state of that artifact. The metaphor used is similar to a state diagram, with lines connecting state nodes and branches. Additional work on timelines includes Karam’s [7] and Kullberg’s [9].

We know of no other systems that use the river metaphor to depict the passage of time. However, Tufte [16] presents a similar idea in an artist’s illustration showing trends in music. In that illustration, width represents sales and proximity indicates influence of preceding styles. Our work differs in several aspects, such as the use of color, the inclusion of contextual events, and the ability to generate the visualization automatically from a potentially very large collection of documents.

4. Usability Evaluation

Early in ThemeRiver’s development, we carried out a simple formative usability evaluation with two users. Questions we wanted to answer with this evaluation included:

• Do users understand the metaphor?
• Can they identify themes that are more often discussed?
• Does the visualization help them raise new questions about the data?
• Do they interpret details of the visualization in ways we had not expected?
• How does their interpretation of the visualization differ from that of a histogram showing the same data?

The data were the Castro collection described above, focusing on the years 1960-1963. We represented the same data both in ThemeRiver and in a histogram that we created using a spreadsheet (see Figure 2). We made the content of the histogram as similar as possible to ThemeRiver’s. For example, the histogram depicted thematic content by month, using the same values that drive ThemeRiver. The month timeline was shown along the bottom, and we added an event line to the histogram like the one in ThemeRiver.

Figure 2: Like ThemeRiver™ in Figure 1, this histogram uses the Castro collection data and depicts changes in thematic content over time.

Usability evaluation began with a brief explanation of the purpose of the session, followed by an introduction to the data. Both participants viewed the data in both visualizations; one participant started with the histogram and the other with ThemeRiver. We asked each participant questions about what they observed in each display.

Examples of specific questions include

• In July ’62, what are the three most discussed themes?
• Where is a new theme introduced?

Examples of more general questions include

• What looks interesting here – what do you want to explore?
• How would you like to change or manipulate the view?

We captured verbal protocol during this discussion. At the end, we asked participants to complete a short questionnaire, with feedback about the visualization and possible enhancements.

From the verbal protocol and from user behavior, we observed that the users had no difficulty in understanding the metaphor. They were able to identify themes that were strongly represented and able to understand the relationship between the width of the current and theme strength. The visualization also triggered questions about the reasons behind certain theme strengths and patterns. For exploratory visualizations, this is a good result; we believe that a visualization should help the user identify questions of interest to explore.

Questionnaire responses showed that users found ThemeRiver easy to understand. They also found ThemeRiver useful, particularly for identifying macro trends. They told us that it was less useful for identifying minor trends because the curves tend to de-emphasize very small values. We asked about the value of the river metaphor, and users rated it highly as well. They observed that the connectedness of the river helped them follow a trend over time more easily than in the histogram; this result is compatible with the perception principles described by Ware [17].

Users liked some features of the histogram and recommended adding them to ThemeRiver. One such feature is the ability to see the numeric values that drive the histogram and river currents. One user expressed more trust in the histogram, because she “knew” that the bars were exactly the data values, whereas she was not sure exactly what the data values were in ThemeRiver. Her point is a valid one, especially because the curved lines of ThemeRiver do require that we interpolate between data points to produce the curves. We have added the capability for users to see the exact data points on demand.

Although users liked the abstraction to the whole collection and thus away from individual documents, both users suggested adding features to access documents if desired. They wanted the ability to see the total number of documents during any time period and to get the text of each document on demand. They wanted to select a current and see the documents that contributed to it.

Users also wanted the ability to reorder the theme currents. Options they discussed included user-defined ordering and ordering by correlation, so that themes appearing together in the documents would be nearby in the river.
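Ordering by correlation appears here only as a user request. One simple way to realize it, shown as a hedged sketch rather than ThemeRiver’s behavior, is a greedy chain: start with the strongest theme overall, then repeatedly place next the unplaced theme whose per-slice strengths correlate most (Pearson) with the theme just placed. The helper names and data are hypothetical:

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series gives no co-occurrence signal
    return cov / math.sqrt(var_x * var_y)


def order_by_correlation(strengths):
    """Greedy ordering: strongest theme first, then always the best-correlated neighbour."""
    if not strengths:
        return []
    remaining = dict(strengths)
    first = max(remaining, key=lambda t: sum(remaining[t]))
    order = [first]
    del remaining[first]
    while remaining:
        # compare every unplaced theme with the theme placed most recently
        nxt = max(remaining, key=lambda t: pearson(strengths[order[-1]], strengths[t]))
        order.append(nxt)
        del remaining[nxt]
    return order


monthly = {
    "iraq": [0, 1, 5, 9],
    "kuwait": [0, 0, 6, 8],
    "bush": [5, 5, 4, 5],
    "nato": [1, 2, 1, 2],
}
print(order_by_correlation(monthly))  # ['bush', 'nato', 'iraq', 'kuwait']
```

The greedy chain only compares each theme with its immediate predecessor, so it is cheap (quadratic in the number of themes) but not globally optimal; it does, however, place the co-varying “iraq” and “kuwait” currents side by side.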

5. Interactions and Sample Usage

Based on usability evaluation results, we added a number of features to combine the best of both the river metaphor and histogram capabilities. This section presents a sample usage scenario, illustrating the capabilities of the current version.

We used ThemeRiver to explore the 1990 Associated Press (AP) newswire data from the TREC5 distribution disks, a set of over 100,000 documents (see Figure 3). To explore the selected themes in this collection, a user might begin with a high-level survey of the visualization by panning along the course of the river. The user might look for wider currents that signal heavy use of a topic, such as the one for “baghdad” in Figure 3. Changes in the color distribution of the river signal changes in themes. We see such a change in August 1990, when the “kuwait” current, which had vanished in late July, suddenly appears and rapidly widens. The user could also look for narrow currents in the river that signal relatively light use of particular themes.

In an earlier paper, Hetzler et al. [5] explored the AP data set with a variety of our visual analysis tools, focusing on large theme changes surrounding the Iraqi invasion of Kuwait on August 2. ThemeRiver also reflects these large theme changes. Near the right side of Figure 3, we see several currents that expand dramatically at the time of the invasion, which is shown on the event line above the river. Labels have been turned on for currents representing the themes “kuwait,” “iraq,” “saddam,” and “baghdad.” ThemeRiver reveals some additional detail not noted in the earlier study. The theme “oil,” which is persistent across the image, also expands noticeably at this time. The themes of “kuwait,” “iraq,” and “saddam” show up in small bursts before the invasion but are not persistent. News stories corresponding with these bursts covered the verbal conflicts leading up to the invasion. This distinction between persistent and bursty themes is one advantage that ThemeRiver provides over document-centric visualizations.

Figure 3: AP data from July - August 1990. A wide current in the river indicates heavy use of a topic, while changes in color distribution correlate to changes in themes.

During late June and throughout July 1990, the themes appear relatively consistent. A user interested in the more prominent themes might turn on theme labels as shown in Figure 3 to discover that the main themes represent “bush” (President Bush), “germany” (the reunification discussions), and “communist.” Some smaller variations in theme are also apparent, such as the widening of the “nato” band, related to the NATO decision to redefine its military strategy.

Figure 4 shows the ThemeRiver from earlier in the summer of 1990. In late May, a large change in theme strength is shown, this time not matching any previously identified events. Some of the larger currents here are “gorbachev,” “bush,” and “summit.” This might suggest that Bush and Gorbachev both attended a summit. Viewing the pertinent news documents from that time, we found that a four-day summit meeting took place in Washington among several world leaders, including Bush and Gorbachev.

Figure 4: ThemeRiver of AP data from June - July 1990 identifies very different events from those revealed immediately afterwards (Figure 3).

Some more subtle changes can also be seen in Figure 4. For example, a small current near the middle of the river expands slightly near the beginning of June and again near the end of the month. This is the current for “earthquake.” The wider areas correspond to the earthquakes in Peru and Iran, respectively.

In each of the figures shown so far, there are portions of the river that are extremely narrow overall. In fact, for the AP rivers (Figures 3 and 4), the river seems to narrow quite frequently. On closer inspection, we see that the narrow spots correspond with Sundays. Because the river contains only a subset of the themes in the collection, we do not know at this point whether the news is generally lighter on Sunday or whether other topics dominate on that day. This uncertainty is one of the points that came up early in user testing. In response, we have added a feature allowing the user to show a histogram representing the total number of documents in a given time slot, along with the portion represented by the themes in the river (see Figure 5). With this histogram, it is apparent that in general fewer news stories are released on Sunday than on other days of the week.
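The added histogram needs two numbers per time slot: how many documents arrived, and how many of them contain at least one theme shown in the river. A minimal sketch, assuming documents have already been reduced to (slot, set of words) pairs; `coverage_by_slot` and the sample data are hypothetical:

```python
from collections import Counter


def coverage_by_slot(documents, themes):
    """Return {slot: (total_documents, documents_matching_any_river_theme)}."""
    theme_set = set(themes)
    totals = Counter()
    covered = Counter()
    for slot, words in documents:
        totals[slot] += 1
        if theme_set & words:  # non-empty intersection: the river represents this document
            covered[slot] += 1
    return {slot: (totals[slot], covered[slot]) for slot in totals}


docs = [
    ("Sat", {"bush", "senate"}),
    ("Sat", {"weather", "forecast"}),
    ("Sat", {"nato", "strategy"}),
    ("Sun", {"bush", "summit"}),
]
print(coverage_by_slot(docs, ["bush", "nato"]))  # {'Sat': (3, 2), 'Sun': (1, 1)}
```

Reading the pairs distinguishes the two explanations: a low total with a normal covered share points to a light news day (the Sunday case), while a normal total with a low covered share would mean other topics dominate.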

Figure 5: The addition of a histogram to ThemeRiver reveals that news is light on Sundays, not that themes shift.

Sometimes users may want to compare theme changes in one set of documents to those in another set; alternatively, they may wish to partition a collection based on metadata and compare the themes in the two partitions as separate rivers. Figure 6 shows two parallel rivers: the lower river shows AP news stories from Washington, D.C., and the upper river shows the news stories from New York. Some differences in major themes are immediately apparent. The Washington themes emphasize Bush, the Senate, and the Supreme Court. The New York stories show a major growth in the themes “apartheid” and “mandela”; this corresponds with the visit of Nelson Mandela to the US. He arrived first in New York, where he spent several days before proceeding to Washington.

Figure 6: Parallel rivers let users compare AP data from Washington, D.C. and New York from the same time period.

6. Discussion and Design Challenges

Ideally, a visual metaphor facilitates discovery by presenting data in an intuitive, easy way that is consistent with the user’s perceptual and cognitive abilities. Lakoff and Johnson [11] argue that metaphors are wired into our understanding of particular concepts, using evidence from common linguistic expressions. One example they cite is the many English expressions that imply that Anglo-Americans understand time in terms of motion relative to themselves. Some expressions characterize time as moving (e.g., “the time will come,” “don’t let the opportunity pass”), while others imply that people are the ones moving through time (e.g., “as we go through the years”). From formative usability evaluation and anecdotal feedback, we have observed that the river metaphor is intuitive and easy to understand. We believe the river metaphor of theme currents changing over time gets part of its strength from this cultural understanding.

Focusing on themes rather than documents changes the scalability issues. ThemeRiver visualizations have little dependence on the number of documents represented. For example, if theme strength is determined by the number of documents containing each theme word, a single pass through the collection is needed to calculate the values, which may be displayed similarly regardless of collection size. On the other hand, the number of currents that can be reasonably included in a single river is limited. Options for addressing this issue include grouping through color families, as suggested in Figure 7, or using each current to represent a set of themes rather than a single theme.
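Two ways of limiting what must be drawn, representing a set of themes with a single current (above) and combining adjacent time slices when the user zooms out (discussed later in this section), both reduce to summing per-slice weights, and neither depends on the number of documents. A minimal sketch, assuming summation is the combining rule (the paper does not specify one); `merge_themes`, `combine_slices`, and the data are hypothetical:

```python
def merge_themes(strengths, groups):
    """Collapse themes into grouped currents by summing their per-slice strengths.

    strengths: {theme: [per-slice values]}; groups: {group_name: [theme, ...]}.
    """
    return {
        name: [sum(values) for values in zip(*(strengths[t] for t in members))]
        for name, members in groups.items()
    }


def combine_slices(strengths, k):
    """Merge every k adjacent time slices by summing their weights.

    A trailing group shorter than k is kept as a partial slice.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return {
        theme: [sum(values[i:i + k]) for i in range(0, len(values), k)]
        for theme, values in strengths.items()
    }


daily = {"iraq": [0, 1, 5, 9, 4], "kuwait": [0, 0, 6, 8, 3], "oil": [2, 2, 3, 5, 4]}
gulf = merge_themes(daily, {"gulf crisis": ["iraq", "kuwait"], "oil": ["oil"]})
print(gulf)                     # {'gulf crisis': [0, 1, 11, 17, 7], 'oil': [2, 2, 3, 5, 4]}
print(combine_slices(gulf, 2))  # {'gulf crisis': [1, 28, 7], 'oil': [4, 8, 4]}
```

Note that the partial trailing slice (`7` and `4` above) covers less time than its neighbours and so looks artificially narrow; a fuller implementation might average or rescale it instead of summing.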

Color choices pose an interesting design challenge. Color perception depends on local contrast. However, because themes come and go, it is impossible to predict which colors will be adjacent at any given time. Moreover, we want to show a relatively large number of themes in the river and still achieve acceptable discriminability. Currently we are exploring a solution suggested during formative usability evaluation: sorting themes into related groups and displaying each group with a color family. Figure 7 shows a portion of our color legend with such an ordering, which emphasizes changes in related themes and may make it easier to understand relationships among them.

Figure 7: Tracking related themes is simplified by assigning them to the same color family. This ensures related themes appear together and are identifiable as a group.

A key cognitive advantage of the river metaphor over a simple histogram lies in the curving continuous lines that define the boundaries between topic currents. But it is also important that the visualization not mislead users. Because the data are sampled at discrete dates rather than continuously, we must approximate the true boundaries by interpolating between discrete data points. As long as the resolution of the data is sufficient, ThemeRiver provides an overview that meets our criteria for intuitiveness, ease of use, and integrity. If the user zooms in farther than the data resolution supports, the “truthfulness” approximated by the interpolated lines is questionable.

While the resolution of the data forces a lower limit on the level of zoom, we can deal with the problem of “too much” resolution by combining time slices. That is, as the user zooms out, we can increase the amount of time per time slice and combine theme weights. In this way, we can maintain a suitable level of truthfulness without slowing the rendering speed to a crawl by trying to draw more detail than necessary.

With interactive visualizations, calculation and drawing speeds are important. For the current features of ThemeRiver, it is sufficient to calculate the drawing points on startup and then recalculate only after a configuration change. Nevertheless, a fast, efficient algorithm is needed. We are investigating curved-line algorithms and ways to speed up both the calculations and the rendering.

7. Conclusions

ThemeRiver is a demonstration prototype, developed to test the value of the metaphor. We are continuing to add interaction capabilities to it. We also need to develop ways to build the event timeline automatically and to provide more flexibility in selecting and ordering the theme currents. From formative usability evaluation, we learned that users want to know more about the context of the river and want to access the documents that contribute to it at a particular point in time.

We conclude that ThemeRiver is potentially valuable for information analysts and plan to develop it into a full production system.

8. Acknowledgments

We gratefully acknowledge the contributions of our colleagues at Battelle to the development and testing of the ThemeRiver visualization. Special thanks for contributions to this paper go to Grant Nakamura, Alan Willse, Sharon Eaton, Wanda Mar, and Dan Donohoo. Battelle Memorial Institute’s Information Synthesis Platform funded this research.

9. References

1. D. Brodbeck, M. Chalmers, A. Lunzer, and P. Cotture, “Domesticating Bead: Adapting an Information Visualization System to a Financial Institution,” Proceedings of InfoVis ’97, IEEE Computer Society, Los Alamitos, CA, 1997, pp. 73-80.

2. K.W. Edwards and E.D. Mynatt, “Timewarp: Techniques for Autonomous Collaboration,” Proceedings of CHI ’97, Association for Computing Machinery, Inc., 1997, pp. 218-225.

3. M. Hemmje, “LyberWorld: a 3D Graphical User Interface for Fulltext Retrieval,” Conference Companion on Human Factors in Computing Systems, 1995, pp. 417-418.

4. M. Hemmje, C. Kunkel, and A. Willett, “LyberWorld - a Visualization User Interface Supporting Fulltext Retrieval,” Proceedings of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, 1994, pp. 249-259.

5. B. Hetzler, P. Whitney, L. Martucci, and J. Thomas, “Multi-faceted Insight Through Interoperable Visual Information Analysis Paradigms,” Proceedings of IEEE Symposium on Information Visualization, InfoVis ’98, 1998, pp. 137-144.

6. D.D. Hoffman, Visual Intelligence: How We Create What We See, W.W. Norton & Company, Inc., New York, 1998.

7. G.M. Karam, “Visualization Using Timelines,” Proceedings of the 1994 International Symposium on Software Testing and Analysis, 1994, pp. 125-137.

8. K. Koffka, Principles of Gestalt Psychology, Harcourt-Brace, New York, 1935.

9. R.L. Kullberg, “Dynamic Timelines: Visualizing the History of Photography,” Proceedings of CHI ’96, 1996, pp. 386-397.

10. V. Kumar and R. Furuta, “Visualization of Relationships,” Proceedings of Hypertext ’99, ACM Press, Darmstadt, Germany, 1999.

11. G. Lakoff and M. Johnson, Metaphors We Live By, University of Chicago Press, Chicago, 1983.

12. W. Mackay and M. Beaudouin-Lafon, “Diva: Exploratory Data Analysis with Multimedia Streams,” Proceedings of CHI ’98, 1998, pp. 416-423.

13. L.T. Nowell, R.K. France, D. Hix, L.S. Heath, and E.A. Fox, “Visualizing Search Results: Some Alternatives to Query-Document Similarity,” Proceedings of SIGIR ’96, ACM Press, Zurich, 1996, pp. 67-75.

14. C. Plaisant, D. Heller, J. Li, B. Shneiderman, R.J. Mushlin, and J. Karat, “Visualizing Medical Records with LifeLines,” CHI ’98 Summary, 1998, pp. 28-29.

15. C. Plaisant, B. Milash, A. Rose, S. Widoff, and B. Shneiderman, “LifeLines: Visualizing Personal Histories,” Proceedings of CHI ’96, Association for Computing Machinery, Inc., 1996, pp. 221-227.

16. E.R. Tufte, Visual Explanations: Images and Quantities, Evidence and Narrative, Graphics Press, Cheshire, CT, 1997, pp. 90-91.

17. C. Ware, Information Visualization: Perception for Design, Academic Press, San Diego, 2000.

18. J.A. Wise, J.J. Thomas, K. Pennock, D. Lantrip, M. Pottier, A. Schur, and V. Crow, “Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents,” in S.K. Card, J.D. Mackinlay, and B. Shneiderman (editors), Readings in Information Visualization: Using Vision to Think, Morgan Kaufmann, San Francisco, 1999, pp. 442-45

ThemeRiver: Visualizing Theme Changes over Time

Susan Havre, Beth Hetzler, and Lucy Nowell

Battelle Pacific Northwest DivisionRichland, Washington 99352 USA

1+509+375-6948

{susan.havre | beth.hetzler | lucy.nowell}@pnl.govAbstract

ThemeRiver is a prototype system that visualizesthematic variations over time within a large collectionof documents. The “river” flows from left to rightthrough time, changing width to depict changes inthematic strength of temporally associated documents.Colored “currents” flowing within the river narrow orwiden to indicate decreases or increases in the strengthof an individual topic or a group of topics in theassociated documents. The river is shown within thecontext of a timeline and a corresponding textualpresentation of external events.

Keywords : visualization metaphors, trend analysis,timeline

1. Introduction

In exploratory information visualization, one goal isto present information so that users can easily discernpatterns. Patterns reveal trends, relationships, anoma-lies, and structure in the data, and may help users

Figure 1: ThemeRiver uses a river metaphor to represent theme changes over time.

confirm knowledge or hypotheses. Perhaps more impor-tantly, they also raise unexpected questions leadingusers to new insights. The challenge is to create visuali-zations that enable users to find patterns quickly andeasily. ThemeRiver, shown in Figure 1, is a prototypesystem designed to reveal temporal patterns in textcollections.

Information visualization systems such as Envision[13], BEAD [1], LyberWorld [ 3, 4] and SPIRE [18]represent each document or group of documents with aglyph or icon, portraying various document attributes.Various methods have been explored for showingchange over time in document-centric visualizations.See Section 3 below.

However, a user may be less interested in documentsthemselves than in theme changes within the whole col-lection over time. For example, how did Shakespeare’sthemes change during various periods of his life or inrelation to contemporary events? Such information isdifficult, if not impossible, to glean from most visuali-zations. A visualization that focuses on themes, ratherthan documents, could be more useful for such explora-tion.

ThemeRiver provides users with a macro-view ofthematic changes in a corpus of documents over a serialdimension. It is designed to facilitate the identificationof trends, patterns, and unexpected occurrence or non-occurrence of themes or topics. In our prototype, we usetime as the serial dimension. We provide contextualinformation through a timeline and markers for co-occurring events of interest. Figure 1 shows a sampleThemeRiver visualization. This paper describes thedesign of ThemeRiver, walks through a sample informa-tion exploration session, and discusses results of forma-tive usability testing.

2. Design

Our major design goal was to provide a visualizationof theme change over time. Consider using a histogramto visualize these changes. In a histogram (such as theone shown in Figure 2), each bar represents a time slice,and color variations and size within the bar representthe relative strength of themes specific to that slice.However, understanding the histogram requires users towork at integrating the themes across time because thebars are anchored to a baseline and the position of aparticular theme within the bars may vary considerably.Like a histogram, ThemeRiver uses variations inwidth to represent variations in strength or degree of

representation. However, it connects the strength valuesin adjacent time slices with smooth and continuouscurves. The horizontal flow of the river represents theflow of time. Colored currents that run horizontallywithin the river represent themes. Each vertical sectionof the river corresponds to an ordered time slice.

The width of each current changes to reflect thethematic strength for each time slice. For example, inFigure 1 the theme “soviet” increases in relativestrength in June 1960 as indicated by the widening ofthe upper bright orange current. “Soviet” loses relativestrength in July and August; thus the same current nar-rows in the next two time slices. “Soviet” then increasessignificantly in relative strength in September; thecurrent widens proportionately.

Currents maintain their integrity as a single entityover time. If a theme ceases to occur in the documentsfor a period of time and then recurs, the current likewisedisappears and then reappears. Consistent color andrelative position to other themes make theme currentseasy to recognize. In Figure 1, the lower purple banddepicts the changes in relative strength of the theme“cane.” The “cane” current occurs grows and shrinksover time; “cane” occurs most strongly in March 1961.We believe that ThemeRiver’s continuous curveshave much to do with its usability. The Gestalt Schoolof Psychology [8], founded in 1919 in Germany,theorized that with perception, “the whole is greaterthan the sum of the parts.” Simply put, during theperception process humans do not organize individual,low-level, sensed elements, but sense more complete“packages” that represent objects or patterns. In hisrecent book [6], Hoffman presents a compelling discus-sion of how our perceptual processes identify curvesand silhouettes, recognize parts, and group them togeth-er into objects. Numerous aspects of the image influ-ence our ability to perceive these parts and objects,including similarity, continuity, symmetry, proximity,and closure. For example, it is easier to perceive objectsthat are bounded by continuous curves than those thatcontain abrupt changes [17].

The vertical proximity of the river currents makes it easy for users to judge the relative widths of currents and thus the relative strengths of the themes. Similarly, symmetry around the horizontal axis of the river, a current, or a group of currents makes it easier for users to perceive flow patterns and changes. Widths of currents combine to show cumulative widening and narrowing, representing the changing strength of the selected set of themes as a whole.
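The paper does not give ThemeRiver’s layout code, but the stacking described above can be sketched in a few lines. In this minimal sketch each time slice’s current widths are stacked and the whole stack is centered on y = 0, so the river is symmetric about its horizontal axis and a theme with zero strength becomes a zero-width current. The function name `stack_symmetric` and the list-of-lists input (one row per theme, one column per time slice) are illustrative assumptions; the boundaries are per-slice points that a separate curve algorithm would then smooth.

```python
def stack_symmetric(strengths):
    """Stack theme strengths into currents centered on the river's axis.

    strengths[t][s] is the strength of theme t in time slice s.
    Returns layers[t][s] = (bottom, top) for theme t in slice s. Each slice's
    total width is split equally above and below y = 0.
    """
    n_slices = len(strengths[0]) if strengths else 0
    layers = [[] for _ in strengths]
    for s in range(n_slices):
        total = sum(theme[s] for theme in strengths)
        y = -total / 2.0  # bottom edge of the whole river in this slice
        for t, theme in enumerate(strengths):
            # A strength of 0 yields a zero-width current (the theme "disappears").
            layers[t].append((y, y + theme[s]))
            y += theme[s]
    return layers


# Hypothetical data: theme 1 is absent from slice 1, so its current vanishes there.
layers = stack_symmetric([[1, 2], [3, 0]])
```

Because every slice is centered independently, the river’s outer edges mirror each other, which is the symmetry the text credits with making flow patterns easier to perceive.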

Values for theme strength can be calculated in various ways. For example, they might represent the number of documents containing the theme word. Because the river loses its continuity and structure if there are too few or too many themes, we created several theme subsets for exploration.
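As a concrete instance of the document-count measure mentioned above, the sketch below counts, for each theme word, how many documents in each time slice contain it. The `(slice_index, text)` input format, the simple letters-only tokenization, and the function name `theme_strengths` are assumptions for illustration; the paper does not say how ThemeRiver tokenizes documents.

```python
import re


def theme_strengths(docs, themes, n_slices):
    """Count, per time slice, the documents that contain each theme word.

    docs: iterable of (slice_index, text) pairs.
    themes: theme words in lower case.
    Returns {theme: [document count for slice 0, slice 1, ...]}.
    """
    counts = {theme: [0] * n_slices for theme in themes}
    for slice_index, text in docs:  # a single pass over the collection
        # Use a set so each document counts at most once per theme.
        words = set(re.findall(r"[a-z]+", text.lower()))
        for theme in themes:
            if theme in words:
                counts[theme][slice_index] += 1
    return counts


docs = [(0, "Oil prices rise"), (0, "Soviet oil deal"), (1, "Soviet visit")]
strengths = theme_strengths(docs, ["oil", "soviet"], 2)
```

Here `strengths` is `{"oil": [2, 0], "soviet": [1, 1]}`: two slice-0 documents mention oil, and the Soviet theme appears once in each slice.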

We have implemented a proof-of-principle prototype and used it to explore data from multiple sources. Figure 1 portrays data from a collection of speeches, interviews, articles, and other text associated with Fidel Castro. The visualization includes the river, a timeline below the river, and markers for related historical events along the top. With ThemeRiver, users may

• display topic and event labels
• display time and event grid lines
• display the raw data points
• choose among drawing algorithms for the currents and river.

Users may also display the associated time or theme name by simply moving the mouse across the image. In addition, users may pan and zoom to see other time periods or parts of the river, and to see more detail or broader context. In this sample data set, we found several interesting correspondences between themes and events, such as the expansion of the “oil” theme just before Castro confiscated American oil refineries (see Figure 1).
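Showing the time or theme name under the mouse amounts to a hit test against the river layout. The sketch below is one simple way to do it, assuming evenly spaced time slices and a layout stored as `(bottom, top)` boundary pairs per theme and slice; it tests against the nearest time slice rather than the interpolated curves. The function name `hit_test` and all of its parameters are hypothetical.

```python
def hit_test(x, y, x0, slice_width, layers, themes):
    """Find the time slice and theme current under a mouse position.

    x0: screen x coordinate of time slice 0; slice_width: pixels per slice.
    layers[t][s] = (bottom, top) of theme t in slice s, in the same vertical
    units as y. Returns None outside the timeline, otherwise
    (slice_index, theme), where theme is None if y misses every current.
    """
    s = round((x - x0) / slice_width)  # nearest time slice
    if not layers or not 0 <= s < len(layers[0]):
        return None
    for theme, boundaries in zip(themes, layers):
        bottom, top = boundaries[s]
        if bottom <= y < top:
            return s, theme
    return s, None
```

A production version would test against the drawn curves between slices, but snapping to the nearest slice is enough to report the associated time and theme name.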

3. Related Work

Many systems include features for viewing time. One common method is to show discrete time slices. For example, in the Spatial Paradigm for Information Retrieval and Exploration (SPIRE) Galaxy visualization [18], users may choose to progressively step through time, showing only the icons for documents originating within each specified time period. Another common approach is to show time as an attribute of documents, as done in Virginia Tech’s Envision system, which lets users map various metadata values, including date, to the x-axis, the y-axis, or color, shape, or size graphical encodings [13].

More similar to ThemeRiver in intent are systems that focus directly on time. The LifeLines system, developed jointly by the University of Maryland and IBM, has been used to visualize medical records and juvenile criminal records [14, 15]. The visualization displays time along the x-axis and uses the y-axis to categorize events. Bars depict the duration of a given event, and graphical attributes such as color show event attributes. TmViewer uses a similar approach, adding the ability to show parent-child relationships with lines between related time bars [10]. The DIVA system [12] uses animation to show how particular measured values change in relation to the temporal flow of a video. To help groups collaborating to create a document or other artifact, the Timewarp system developed at Xerox PARC [2] lets users view and edit multiple timelines of the changing state of that artifact. The metaphor used is similar to a state diagram, with lines connecting state nodes and branches. Additional work on timelines includes Karam’s [7] and Kullberg’s [9].

We know of no other systems that use the river metaphor to depict the passage of time. However, Tufte [16] presents a similar idea in an artist’s illustration showing trends in music. In that illustration, width represents sales and proximity indicates the influence of preceding styles. Our work differs in several respects, such as the use of color, the inclusion of contextual events, and the ability to generate the visualization automatically from a potentially very large collection of documents.

4. Usability Evaluation

Early in ThemeRiver’s development, we carried out a simple formative usability evaluation with two users. Questions we wanted to answer with this evaluation included

• Do users understand the metaphor?
• Can they identify themes that are more often discussed?
• Does the visualization help them raise new questions about the data?
• Do they interpret details of the visualization in ways we had not expected?
• How does their interpretation of the visualization differ from that of a histogram showing the same data?

The data were the Castro collection described above, focusing on the years 1960-1963. We represented the same data both in ThemeRiver and in a histogram that we created using a spreadsheet (see Figure 2). We made the content of the histogram as similar as possible to ThemeRiver’s. For example, the histogram depicted thematic content by month, using the same values that drive ThemeRiver. The month timeline was shown along the bottom, and we added an event line to the histogram like the one in ThemeRiver.

Figure 2: Like ThemeRiver™ in Figure 1, this histogram uses the Castro collection data and depicts changes in thematic content over time.

Usability evaluation began with a brief explanation of the purpose of the session, followed by an introduction to the data. Both participants viewed the data in both visualizations; one participant started with the histogram and one with ThemeRiver. We asked each participant questions about what they observed in each display.

Examples of specific questions include

• In July ’62, what are the three most discussed themes?
• Where is a new theme introduced?

Examples of more general questions include

• What looks interesting here? What do you want to explore?
• How would you like to change or manipulate the view?

We captured verbal protocol during this discussion. At the end, we asked participants to complete a short questionnaire, with feedback about the visualization and possible enhancements.

From the verbal protocol and from user behavior, we observed that the users had no difficulty in understanding the metaphor. They were able to identify themes that were strongly represented and to understand the relationship between the width of a current and theme strength. The visualization also triggered questions about the reasons behind certain theme strengths and patterns. For exploratory visualizations, this is a good result; we believe that a visualization should help the user identify questions of interest to explore.

Questionnaire responses showed that users found ThemeRiver easy to understand. They also found ThemeRiver useful, particularly for identifying macro trends. They told us that it was less useful for identifying minor trends because the curves tend to de-emphasize very small values. We asked about the value of the river metaphor, and users rated it highly as well. They observed that the connectedness of the river helped them follow a trend over time more easily than in the histogram; this result is compatible with the perception principles described by Ware [17].

Users liked some features of the histogram and recommended adding them to ThemeRiver. One such feature is the ability to see the numeric values that drive the histogram and river currents. One user expressed more trust in the histogram, because she “knew” that the bars were exactly the data values, whereas she was not sure exactly what the data values were in ThemeRiver. Her point is a valid one, especially because the curved lines of ThemeRiver do require that we interpolate between data points to produce the curves. We have added the capability for users to see the exact data points on demand.
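The paper does not name the interpolation ThemeRiver uses (the prototype lets users choose among drawing algorithms). As one plausible choice, the sketch below evaluates a uniform Catmull-Rom spline through equally spaced boundary values. A Catmull-Rom curve passes exactly through every sample, so the exact data points users asked to see still lie on the drawn curve. The function name `catmull_rom` and its `steps` parameter are assumptions.

```python
def catmull_rom(values, steps=8):
    """Interpolate a smooth curve through equally spaced samples.

    Returns the original samples plus steps - 1 interpolated points
    between each adjacent pair, using a uniform Catmull-Rom spline.
    """
    if len(values) < 2:
        return list(values)
    # Duplicate the end points so every segment has four control points.
    p = [values[0]] + list(values) + [values[-1]]
    out = []
    for i in range(1, len(p) - 2):
        p0, p1, p2, p3 = p[i - 1], p[i], p[i + 1], p[i + 2]
        for k in range(steps):
            u = k / steps  # position within the segment from p1 (u=0) to p2 (u=1)
            out.append(0.5 * (2 * p1
                              + (-p0 + p2) * u
                              + (2 * p0 - 5 * p1 + 4 * p2 - p3) * u ** 2
                              + (-p0 + 3 * p1 - 3 * p2 + p3) * u ** 3))
    out.append(values[-1])
    return out
```

This also illustrates the user’s concern. The curve can overshoot between samples: `catmull_rom([0, 0, 5], steps=2)` dips to about -0.31 before the sudden rise, a value that never occurs in the data.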

Although users liked the abstraction to the whole collection, and thus away from individual documents, both users suggested adding features to access documents if desired. They wanted the ability to see the total number of documents during any time period and to get the text of each document on demand. They wanted to select a current and see the documents that contributed to it.

Users also wanted the ability to reorder the theme currents. Options they discussed included user-defined ordering and ordering by correlation, so that themes appearing together in the documents would be near each other in the river.
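Ordering by correlation, one of the options participants suggested, could be approximated with a simple greedy heuristic: start from the strongest theme, then repeatedly append the unplaced theme whose time series is most correlated with the theme placed last. This is an illustrative sketch, not a feature the paper describes implementing. The function names and the use of Pearson correlation are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(a)
    if n == 0:
        return 0.0
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def order_by_correlation(series):
    """Greedily order themes so that correlated themes end up adjacent.

    series: {theme: [strength per time slice]}.
    """
    if not series:
        return []
    remaining = dict(series)
    first = max(remaining, key=lambda t: sum(remaining[t]))  # strongest theme first
    order = [first]
    del remaining[first]
    while remaining:
        last = series[order[-1]]
        nxt = max(remaining, key=lambda t: pearson(last, remaining[t]))
        order.append(nxt)
        del remaining[nxt]
    return order
```

A greedy chain is cheap and deterministic, but it only guarantees that each theme sits next to its best remaining partner. A globally optimal ordering is a much harder combinatorial problem.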

5. Interactions and Sample Usage

Based on usability evaluation results, we added a number of features to combine the best of both the river metaphor and histogram capabilities. This section presents a sample usage scenario, illustrating the capabilities of the current version.

We used ThemeRiver to explore the 1990 Associated Press (AP) newswire data from the TREC5 distribution disks, a set of over 100,000 documents (see Figure 3). To explore the selected themes in this collection, a user might begin with a high-level survey of the visualization by panning along the course of the river. The user might look for wider currents that signal heavy use of a topic, such as the one for “baghdad” in Figure 3. Changes in the color distribution of the river signal changes in themes. We see such a change in August 1990, when the “kuwait” current, which had vanished in late July, suddenly appears and rapidly widens. The user could also look for narrow currents in the river that signal relatively light use of particular themes.

Figure 3: AP data from July - August 1990. A wide current in the river indicates heavy use of a topic, while changes in color distribution correlate to changes in themes.

In an earlier paper, Hetzler et al. [5] explored the AP data set with a variety of our visual analysis tools, focusing on large theme changes surrounding the Iraqi invasion of Kuwait on August 2. ThemeRiver also reflects these large theme changes. Near the right side of Figure 3, we see several currents that expand dramatically at the time of the invasion, which is shown on the event line above the river. Labels have been turned on for currents representing the themes “kuwait,” “iraq,” “saddam,” and “baghdad.” ThemeRiver reveals some additional detail not noted in the earlier study. The theme “oil,” which is persistent across the image, also expands noticeably at this time. The themes “kuwait,” “iraq,” and “saddam” show up in small bursts before the invasion but are not persistent. News stories corresponding with these bursts covered the verbal conflicts leading up to the invasion. This distinction between persistent and bursty themes is one advantage that ThemeRiver provides over document-centric visualizations.
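ThemeRiver lets users see the distinction between persistent and bursty themes. The same distinction could also be computed from the strength values. The rule below is an illustrative assumption, not a ThemeRiver feature: a theme counts as persistent if it has nonzero strength in at least a given fraction of time slices (default 0.8), and as bursty otherwise.

```python
def classify_themes(series, min_presence=0.8):
    """Label each theme "persistent" or "bursty".

    series: {theme: [strength per time slice]}.
    A theme is persistent if it is present (strength > 0) in at least
    min_presence of the time slices; otherwise it is bursty.
    """
    labels = {}
    for theme, values in series.items():
        present = sum(1 for v in values if v > 0)
        share = present / len(values) if values else 0.0
        labels[theme] = "persistent" if share >= min_presence else "bursty"
    return labels


labels = classify_themes({"oil": [3, 2, 4, 3, 5], "kuwait": [0, 1, 0, 0, 9]})
```

With these hypothetical values, "oil" is present in every slice and is labeled persistent, while "kuwait" appears in only two of five slices and is labeled bursty.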

During late June and throughout July 1990, the themes appear relatively consistent. A user interested in the more prominent themes might turn on theme labels, as shown in Figure 3, to discover that the main themes represent “bush” (President Bush), “germany” (the reunification discussions), and “communist.” Some smaller variations in theme are also apparent, such as the widening of the “nato” band, related to the NATO decision to redefine its military strategy.

Figure 4 shows the ThemeRiver from earlier in the summer of 1990. In late May, a large change in theme strength is shown, this time not matching any previously identified events. Some of the larger currents here are “gorbachev,” “bush,” and “summit.” This might suggest that Bush and Gorbachev both attended a summit. Viewing the pertinent news documents from that time, we found that a four-day summit meeting took place in Washington among several world leaders, including Bush and Gorbachev.

Figure 4: ThemeRiver of AP data from June - July 1990 identifies very different events from those revealed immediately afterwards (Figure 3).

Some more subtle changes can also be seen in Figure 4. For example, a small current near the middle of the river expands slightly near the beginning of June and again near the end of the month. This is the current for “earthquake.” The wider areas correspond with the quakes in Peru and Iran, respectively.

In each of the figures shown so far, there are portions of the river that are extremely narrow overall. In fact, for the AP rivers (Figures 3 and 4), the river seems to narrow quite frequently. On closer inspection, we see that the narrow spots correspond with Sundays. Because the river contains only a subset of the themes in the collection, we do not know at this point whether the news is generally lighter on Sunday or whether other topics dominate on that day. This uncertainty is one of the points that came up early in user testing. In response, we have added a feature allowing the user to show a histogram representing the total number of documents in a given time slot, along with the portion represented by the themes in the river (see Figure 5). With this histogram, it is apparent that in general fewer news stories are released on Sunday than on other days of the week.
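The added histogram needs two numbers per time slot: the total number of documents and the number represented by the river’s themes. The sketch below computes both in a single pass. Counting a document as “represented” when it contains at least one river theme word is an assumption, as are the function name `coverage_histogram` and the `(slot, text)` input format.

```python
import re


def coverage_histogram(docs, themes, n_slots):
    """Return (totals, covered): documents per slot, and how many of
    them contain at least one of the river's theme words."""
    theme_set = set(themes)
    totals = [0] * n_slots
    covered = [0] * n_slots
    for slot, text in docs:
        totals[slot] += 1
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & theme_set:  # at least one river theme occurs
            covered[slot] += 1
    return totals, covered


docs = [(0, "Bush meets Gorbachev"), (0, "Weather report"), (1, "Sunday sports")]
totals, covered = coverage_histogram(docs, ["bush", "gorbachev"], 2)
```

A slot where `totals` drops along with the river signals lighter news overall. A slot where `totals` holds steady while `covered` drops signals that other topics are dominating, which is the ambiguity the histogram was added to resolve.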

Figure 5: The addition of a histogram to ThemeRiver reveals that news is light on Sundays, not that themes shift.

Sometimes users may want to compare theme changes in one set of documents to those in another set; alternatively, they may wish to partition a collection based on metadata and compare the themes in the two partitions as separate rivers. Figure 6 shows two parallel rivers: the lower river shows AP news stories from Washington, D.C., and the upper river shows the news stories from New York. Some differences in major themes are immediately apparent. The Washington themes emphasize Bush, the Senate, and the Supreme Court. The New York stories show a major growth in the themes “apartheid” and “mandela”; this corresponds with the visit of Nelson Mandela to the US. He arrived first in New York, where he spent several days before proceeding to Washington.

Figure 6: Parallel rivers let users compare AP data from Washington, D.C. and New York from the same time period.

6. Discussion and Design Challenges

Ideally, a visual metaphor facilitates discovery by presenting data in an intuitive, easy way that is consistent with the user’s perceptual and cognitive abilities. Lakoff and Johnson [11] argue that metaphors are wired into our understanding of particular concepts, using evidence from common linguistic expressions. One example they cite is the many English expressions that imply that Anglo-Americans understand time in terms of motion relative to ourselves. Some expressions characterize time as moving (e.g., “the time will come,” “don’t let the opportunity pass”), while others imply that people are the ones moving through time (e.g., “as we go through the years”). From formative usability evaluation and anecdotal feedback, we have observed that the river metaphor is intuitive and easy to understand. We believe the river metaphor of theme currents changing over time gets part of its strength from this cultural understanding.

Focusing on themes rather than documents changes the issues of scalability. ThemeRiver visualizations have little dependence on the number of documents represented. For example, if theme strength is determined by the number of documents containing each theme word, only a single pass through the collection is needed to calculate the values, which may be displayed similarly regardless of collection size. On the other hand, the number of currents that can reasonably be included in a single river is limited. Options for addressing this issue include grouping through color families, as suggested in Figure 7, or using each current to represent a set of themes rather than a single theme.
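Using each current to represent a set of themes could be as simple as summing the member themes’ strengths per time slice, as sketched below. Note that summing document counts double-counts documents that mention several themes from the same group. A stricter version would count documents containing any member word. The grouping dictionary and the function name `group_currents` are hypothetical.

```python
def group_currents(series, groups):
    """Merge per-theme strength series into one series per theme group.

    series: {theme: [strength per time slice]}.
    groups: {group name: [member themes]}; members absent from series are skipped.
    """
    n = len(next(iter(series.values()), []))
    merged = {}
    for name, members in groups.items():
        total = [0] * n
        for theme in members:
            for i, v in enumerate(series.get(theme, [])):
                total[i] += v
        merged[name] = total
    return merged


series = {"iraq": [1, 2], "kuwait": [0, 3], "bush": [4, 1]}
merged = group_currents(series, {"gulf": ["iraq", "kuwait"], "us": ["bush", "senate"]})
```

The result is `{"gulf": [1, 5], "us": [4, 1]}`. Two currents replace three, which keeps the river legible as the theme list grows.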

Color choices pose an interesting design challenge. Color perception depends on local contrast. However, because themes come and go, it is impossible to predict which colors will be adjacent at any given time. Moreover, we want to show a relatively large number of themes in the river and still achieve acceptable discriminability. Currently we are exploring a solution suggested during formative usability evaluation: sorting themes into related groups and displaying each group with a color family. Figure 7 shows a portion of our color legend with such an ordering, which emphasizes changes in related themes and may make it easier to understand relationships among them.

Figure 7: Tracking related themes is simplified by assigning them to the same color family. This ensures related themes appear together and are identifiable as a group.

A key cognitive advantage of the river metaphor over a simple histogram lies in the curving continuous lines that define the boundaries between topic currents. But it is also important that the visualization not mislead users. Because the data exist only at discrete dates rather than as a continuous function of time, we must approximate the true boundaries by interpolating between discrete data points. As long as the resolution of the data is sufficient, ThemeRiver provides an overview that meets our criteria for intuitiveness, ease of use, and integrity. If the user zooms in farther than the data resolution supports, the “truthfulness” approximated by the interpolated lines is questionable.

While the resolution of the data limits how far the user can meaningfully zoom in, we can deal with the problem of “too much” resolution by combining time slices. That is, as the user zooms out, we can increase the amount of time per time slice and combine theme weights. In this way, we can maintain a suitable level of truthfulness without slowing rendering to a crawl by trying to draw more detail than necessary.

With interactive visualizations, calculation and drawing speeds are important. For the current features of ThemeRiver, it is sufficient to calculate the drawing points on startup and then recalculate only after a configuration change. Nevertheless, a fast, efficient algorithm is needed. We are investigating curved-line algorithms and ways to speed up both the calculations and the rendering.

7. Conclusions

ThemeRiver is a demonstration prototype, developed to test the value of the metaphor. We are continuing to add interaction capabilities to it. We also need to develop ways to build the event timeline automatically and to provide more flexibility in selecting and ordering the theme currents. From formative usability evaluation, we learned that users want to know more about the context of the river and want to access the documents that contribute to it at a particular point in time.

We conclude that ThemeRiver is potentially valuable for information analysts, and we plan to develop it into a full production system.

8. Acknowledgments

We gratefully acknowledge the contributions of our colleagues at Battelle to the development and testing of the ThemeRiver visualization. Special thanks for contributions to this paper go to Grant Nakamura, Alan Willse, Sharon Eaton, Wanda Mar, and Dan Donohoo. Battelle Memorial Institute’s Information Synthesis Platform funded this research.

9. References

1. D. Brodbeck, M. Chalmers, A. Lunzer, and P. Cotture, “Domesticating Bead: Adapting an Information Visualization System to a Financial Institution,” Proceedings of InfoVis ’97, IEEE Computer Society, Los Alamitos, CA, 1997, pp. 73-80.

2. K.W. Edwards and E.D. Mynatt, “Timewarp: Techniques for Autonomous Collaboration,” Proceedings of CHI ’97, Association for Computing Machinery, Inc., 1997, pp. 218-225.

3. M. Hemmje, “LyberWorld: A 3D Graphical User Interface for Fulltext Retrieval,” Conference Companion on Human Factors in Computing Systems, 1995, pp. 417-418.

4. M. Hemmje, C. Kunkel, and A. Willett, “LyberWorld: A Visualization User Interface Supporting Fulltext Retrieval,” Proceedings of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, 1994, pp. 249-259.

5. B. Hetzler, P. Whitney, L. Martucci, and J. Thomas, “Multi-faceted Insight Through Interoperable Visual Information Analysis Paradigms,” Proceedings of IEEE Symposium on Information Visualization, InfoVis ’98, 1998, pp. 137-144.

6. D.D. Hoffman, Visual Intelligence: How We Create What We See, W.W. Norton & Company, Inc., New York, 1998.

7. G.M. Karam, “Visualization Using Timelines,” Proceedings of the 1994 International Symposium on Software Testing and Analysis, 1994, pp. 125-137.

8. K. Koffka, Principles of Gestalt Psychology, Harcourt-Brace, New York, 1935.

9. R.L. Kullberg, “Dynamic Timelines: Visualizing the History of Photography,” Proceedings of CHI ’96, 1996, pp. 386-397.

10. V. Kumar and R. Furuta, “Visualization of Relationships,” Proceedings of Hypertext ’99, ACM Press, Darmstadt, Germany, 1999.

11. G. Lakoff and M. Johnson, Metaphors We Live By, University of Chicago Press, Chicago, 1983.

12. W. Mackay and M. Beaudouin-Lafon, “Diva: Exploratory Data Analysis with Multimedia Streams,” Proceedings of CHI ’98, 1998, pp. 416-423.

13. L.T. Nowell, R.K. France, D. Hix, L.S. Heath, and E.A. Fox, “Visualizing Search Results: Some Alternatives to Query-Document Similarity,” Proceedings of SIGIR ’96, ACM Press, Zurich, 1996, pp. 67-75.

14. C. Plaisant, D. Heller, J. Li, B. Shneiderman, R.J. Mushinlin, and J. Karat, “Visualizing Medical Records with LifeLines,” CHI ’98 Summary, 1998, pp. 28-29.

15. C. Plaisant, B. Milash, A. Rose, S. Widoff, and B. Shneiderman, “LifeLines: Visualizing Personal Histories,” Proceedings of CHI ’96, Association for Computing Machinery, Inc., 1996, pp. 221-227.

16. E.R. Tufte, Visual Explanations: Images and Quantities, Evidence and Narrative, Graphics Press, Cheshire, CT, 1997, pp. 90-91.

17. C. Ware, Information Visualization: Perception for Design, Academic Press, San Diego, 2000.

18. J.A. Wise, J.J. Thomas, K. Pennock, D. Lantrip, M. Pottier, A. Schur, and V. Crow, “Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents,” in S.K. Card, J.D. Mackinlay, and B. Shneiderman (eds.), Readings in Information Visualization: Using Vision to Think, Morgan Kaufmann, San Francisco, 1999, pp. 442-45

