Chris Harrison



This work was part of my Master's thesis at New York University. The sections below are excerpts. If you are interested in reading the full document, please email me.



Introduction

Recent advances in storage technology allow users to amass large quantities of documents. Other trends in consumer electronics and the Internet have led to a proliferation of digital photographs, movies and music. Furthermore, users must also manage documents they do not create, such as those received by email or downloaded from the Internet. These factors lead to an organizational overload that is a tremendous burden on computer users and, in particular, presents a significant obstacle to continued and effective use by novice users [4,7,9].

Physical hierarchical filing systems, on which their digital equivalents are based, are widely used. However, it is not clear that this method of organization is optimal. Computer systems allow us to break away from limitations in the real world and provide new, physically impossible and powerful ways to access media. To better target our efforts in developing a new file navigation and management paradigm, it was important to first consider the inherent drawbacks of hierarchical file systems. We identify three general categories:

Organization and Overload

In order for hierarchical systems to be effective management schemes, users must be diligent and spend time to organize documents appropriately. If documents are not organized, or worse, incorrectly arranged, the system can become more unwieldy than a flat file system. Also, as directories become saturated with files, users create sub-folders to partition documents into smaller and more manageable sets. As users create and acquire additional files, maintaining and navigating these increasingly deep organizational structures becomes complicated and time-consuming.

Naming Ambiguity

Effective naming is vital for maintaining easy-to-navigate hierarchies. The user, rather than remembering the entire hierarchical structure, can scan over a list of directory names and choose the one that is most applicable to the target document (e.g., resumes). However, this system becomes clumsy when directories are poorly or ambiguously named.

Users also rely on informative names to differentiate between files. Inconsistent and vague naming leads to confusion, potentially requiring several files to be opened before the correct one is located, even within a single directory. Many desktop environments remedy this problem for image files by generating thumbnails. Image names, which are typically cryptic (e.g., IMG_1092.jpg), are no longer essential because the thumbnails provide sufficient distinction. However, current systems do not provide an effective way to see the contents of other media, especially text, without opening them.

Versioning

Users create versions of documents for two reasons: they provide a safety net for accidental modifications and offer a history of documents for reference. Users sometimes rely on simple naming conventions, such as dating, lettering or numbering (e.g., resume1.doc, resume2.doc). However, this system requires users to be consistent and accurate in the creation and naming of versions to be reliable.



Time-Centric File Organization

One way to alleviate the difficulties associated with hierarchical filing systems is to avoid hierarchies altogether. As noted by Rekimoto, personal activities are tightly coupled with the flow of time, providing an obvious mechanism for automatic organization [10]. Conveniently, considerable temporal information is produced as a byproduct of regular computer use: file creation and modification times can be readily captured. Although these factors create an immediately favorable platform on which to organize files, there are also numerous universal cognitive abilities that can be leveraged to great advantage.

Foremost, users have excellent memory for when, roughly, documents were created or edited [2,5]. This makes a timeline, where users can rapidly move back and forward through time and set the temporal extent (i.e., the length of the time period), an obvious navigational mechanism. Additionally, humans are particularly adept at remembering the chronology of items [7]. Thus, during navigation, other documents can serve as temporal signposts.

Moreover, a time-based visualization has the natural ability to accentuate temporal relationships between files, especially clusters of file edits or creations. For example, a series of HTML documents and images created in close temporal proximity might comprise work relating to a single website. Furthermore, photographs (or movie clips) taken at roughly the same time are likely to have been captured at the same location or event. A timeline is also useful for projects with finite time spans, as users can simply set the timeline’s extent and view all material created during that period.

These clustering and ordering clues serve as a temporal context, which has been shown to substantially improve recall and file recognition accuracy [11]. Files that surround a target document often reveal what the file is about, and how and why it was created or modified. It should be noted that computer-aided work is highly fragmented [8], which will break up continuous, project-level file creations and modifications. However, interleaved documents from other jobs often enhance the temporal context because humans have excellent recollection for parallel tasks [2,5].

Previous research has noted that users do not place an emphasis on old documents, although archiving can be useful [1,3]. A timeline intrinsically supports this behavior by automatically diminishing the presence of old files simply by rendering them in the past. Recently accessed files, and the ones most likely to be relevant, are located near the present.





An overview of the Kronosphere interface: A) content-driven search menu, B) buttons to quickly navigate to common temporal extents (e.g., current day, month), C) timeline visualization, D) scrollbar for seeking the timeline, and E) file information pane.

Kronosphere

Our investigation of hierarchical file systems and time-centric document organization and navigation revealed several areas that have been under-explored: 1) visualizing and highlighting temporal context, 2) file versioning, 3) keyword tagging, and 4) tightly integrating time and content into a unified search mechanism. Although Kronosphere boasts a comprehensive array of features, the system description will primarily concentrate on elements that address these particular issues.

Visualization

Kronosphere uses a timeline-based visualization. Each time a file is saved to the system, either through creation or modification, a new entry is created and displayed on a timeline. Temporal distances between files are preserved visually. This method emphasizes important temporal relationships between files. For example, a cluster of quick edits or a period of downtime between two activities would be readily identifiable. Furthermore, the timeline provides an intuitive and unobtrusive versioning mechanism. Each time a file is modified, a new instance of the document is attached to the timeline (including content). This allows one to navigate the timeline and see each modification, the earliest instance being the file’s creation.
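As a rough illustration of the versioning model described above, the following Python sketch adds a new entry for every save, so a file's history is simply its entries in time order. The names TimelineEntry, Timeline, record_save, history and latest are hypothetical and are not taken from Kronosphere itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class TimelineEntry:
    """One saved instance of a file (hypothetical structure)."""
    path: str
    timestamp: float  # seconds since the epoch
    content: bytes    # snapshot of the file at save time


@dataclass
class Timeline:
    entries: List[TimelineEntry] = field(default_factory=list)

    def record_save(self, path: str, timestamp: float, content: bytes) -> None:
        # Every save, whether a creation or a modification, adds a new
        # entry; existing entries are never overwritten.
        self.entries.append(TimelineEntry(path, timestamp, content))
        self.entries.sort(key=lambda e: e.timestamp)

    def history(self, path: str) -> List[TimelineEntry]:
        # All versions of one file, earliest (the creation) first.
        return [e for e in self.entries if e.path == path]

    def latest(self, path: str) -> TimelineEntry:
        # Most recent version; raises IndexError if the file is unknown.
        return self.history(path)[-1]
```

Because entries are only ever appended, the earliest entry for a path doubles as the record of the file's creation, and no separate version-naming convention is needed.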

Another feature unique to Kronosphere is the ability to view the timeline in a linear or exponential mode. The linear view represents all time linearly, such that a unit of time is represented by a fixed amount of space. Alternatively, the exponential view scales time in a decaying manner, such that a unit of time becomes smaller the further in the past it is located. This simple feature has a nice effect: older files bunch up, while newer documents are more spread out, allowing them to be more readily recognized. The latter mode was developed under the assumption that newer files and versions of files are often more relevant than older ones, a view supported by previous research.
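The two modes can be sketched as functions that map a file's age to a horizontal position. This is only one plausible formulation: the exact scaling function used by Kronosphere is not given here, and the parameter names (extent, decay, width) are assumptions made for illustration.

```python
import math


def linear_position(age: float, extent: float, width: float) -> float:
    """Map a file's age (seconds before now) to an x coordinate.

    The present sits at x = width and the oldest visible moment
    (age == extent) at x = 0; every second occupies the same space.
    """
    return width * (1.0 - age / extent)


def exponential_position(age: float, decay: float, width: float) -> float:
    """Map age to x so that older time is progressively compressed.

    The present sits at x = width and x approaches 0 as age grows.
    `decay` (in seconds) is an assumed tuning constant: the space given
    to a unit of time shrinks by a factor of e for every `decay`
    seconds of additional age.
    """
    return width * math.exp(-age / decay)
```

With width = 1000 and decay set to one day, the most recent day spans roughly 632 units while the day before it spans roughly 233, which reproduces the bunching of older files described above.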

Keyword Tagging

Kronosphere allows users to attach keywords to their documents. This enables users to create their own folksonomies, tagging documents in a similar way to successful systems like del.icio.us and Flickr. This feature, coupled with a search engine, provides a powerful and flexible organization system. Additionally, the keywords alleviate naming ambiguity, one of the key deficiencies seen in hierarchical file systems. Even documents with similar content, such as versions of the same file, can be tagged in a way that successfully differentiates them.

However, expecting users to tag all of their files is unrealistic. Thus, Kronosphere automatically generates keywords for documents as they are added to the timeline. For text documents, a text analytics package developed by Jeff Borden of New York University is used to extract significant words, phrases and entities. This is achieved through a combination of statistical analysis using TF-IDF (Term Frequency – Inverse Document Frequency) and linguistic analysis (named entity extraction and part-of-speech tagging). Other file types are supported as well: for example, MP3 files are tagged with their ID3 tags (e.g., song, genre, artist, year) and images are tagged with their primary colors.
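To make the statistical half of this concrete, here is a minimal TF-IDF sketch in Python. It is not the text analytics package mentioned above and omits the linguistic analysis entirely; the tokenizer, the smoothed IDF formula, and the function names are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Naive tokenizer (an assumption): lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(doc: str, corpus: List[str], top_n: int = 3) -> List[str]:
    """Return the top_n terms of `doc` ranked by TF-IDF against `corpus`."""
    doc_tokens = tokenize(doc)
    if not doc_tokens:
        return []
    term_counts = Counter(doc_tokens)
    corpus_token_sets = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus_token_sets)

    scores: Dict[str, float] = {}
    for term, count in term_counts.items():
        tf = count / len(doc_tokens)
        df = sum(1 for tokens in corpus_token_sets if term in tokens)
        # Smoothed IDF, so terms absent from the corpus do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = tf * idf
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_n]
```

Terms that are frequent in the document but rare across the rest of the collection score highest, which is why common words such as "the" fall to the bottom of the ranking without an explicit stop-word list.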

Search

Desktop search interfaces are a popular and successful extension to the desktop metaphor. Notable systems include Google’s Desktop Search, released in 2005, and Apple’s Spotlight feature, which débuted in Mac OS X 10.4. Similarly, Kronosphere offers a rich content-search interface, including the ability to execute full-text searches. Kronosphere’s inclusion of an additional dimension, time, helps to refine the search and produce more accurate results. When the result list is returned, our system’s natural temporal clustering and ordering clues allow users to quickly home in on the desired file or particular version of a file. Additionally, keywords from documents in the result set can be quickly added to refine the search and further reduce the number of hits.

Content and metadata search features are primarily offered through a menu located above the timeline (see Figure, Label A). In addition to full text search, users can also search by keyword, file name, and file type. Kronosphere also offers methods for accessing document versions, including the ability to see the entire version history of a file or to jump to the most recent version. Users can also search for related content given a target file. This is achieved by locating other documents that have similar tags. Additionally, a prototype content-based image retrieval system was used to generate image metadata that allowed the visual content to be searchable. Specifically, given a target image, users could find images similar in visual composition.
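The related-content search can be sketched as a ranking of files by tag overlap. The similarity measure used by Kronosphere is not stated; Jaccard similarity is assumed here purely for illustration, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Tag overlap: |a & b| / |a | b|, or 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_files(
    target: str, tags: Dict[str, Set[str]], limit: int = 5
) -> List[Tuple[str, float]]:
    """Rank other files by tag similarity to `target`, best match first."""
    target_tags = tags[target]
    scored = [
        (name, jaccard(target_tags, file_tags))
        for name, file_tags in tags.items()
        if name != target
    ]
    # Keep only files that share at least one tag with the target.
    scored = [(name, score) for name, score in scored if score > 0.0]
    # Highest similarity first; ties are broken by file name.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:limit]
```

Since automatically generated keywords and user-supplied tags are both just strings attached to a file, the same comparison would work across media types, for example matching an HTML page to the images tagged for the same website.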

Interaction

In Kronosphere, users can move backwards and forwards through time using a horizontal scroll bar located at the bottom of the timeline (see Figure 6, Label D). Users can also click on a document or in the whitespace between documents to center the timeline and focus on the corresponding date.

The ability to change the temporal extent is also critical to effective navigation of time-space. Kronosphere provides several mechanisms: First, three buttons provide quick access to commonly used periods – the current day, week, and month (see Figure 6, Label B). Second, users can right-click a document and select the extent of the surrounding time period. This ranges from a minute to a month in duration, and allows users to quickly focus on a particular period and access other files created and modified around the same time. Third, the mouse can be used to control the temporal extent. Double clicking not only centers the timeline on the corresponding time, but also reduces the temporal extent (analogous to zooming in). Lastly, the scroll wheel can be used to zoom in and out of time as well.
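The mouse-driven controls above amount to updating two values: the time at the center of the timeline and the length of the visible period. The sketch below models that state in Python; the View class, the function names, and the zoom factor of 0.5 are assumptions, not details taken from Kronosphere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    center: float  # timestamp at the middle of the timeline (seconds)
    extent: float  # length of the visible period (seconds)

    @property
    def start(self) -> float:
        return self.center - self.extent / 2

    @property
    def end(self) -> float:
        return self.center + self.extent / 2


def click(view: View, timestamp: float) -> View:
    # Single click: re-center on the clicked time, keep the extent.
    return View(center=timestamp, extent=view.extent)


def double_click(view: View, timestamp: float, zoom_factor: float = 0.5) -> View:
    # Double click: re-center and shrink the extent (zoom in).
    # zoom_factor is an assumed value.
    return View(center=timestamp, extent=view.extent * zoom_factor)


def scroll(view: View, steps: int, zoom_factor: float = 0.5) -> View:
    # Scroll wheel: positive steps zoom in, negative steps zoom out.
    return View(center=view.center, extent=view.extent * zoom_factor ** steps)
```

The preset buttons and the right-click menu fit the same model: each simply constructs a View with a fixed extent (a day, a week, a month, and so on) around the chosen time.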

Kronosphere limits the number of documents that can be seen at any given time (typically set between 10 and 50). The reason for this restriction is twofold. First, an abundance of files will cause the timeline to become too cluttered to be useful. Second, and most importantly, users have difficulty visually scanning and mentally processing large quantities of files. Instead, the interface encourages users to refine their search, either by using temporal context clues to narrow the temporal extent (i.e., zooming in) or by using content clues, such as keywords, to add relevant terms to the search query. This multi-dimensional and iterative search approach rapidly reduces the number of possible matches in addition to providing a wealth of information about potentially related items.

Architecture

In order to minimize impact, the current version of the application is designed to run alongside the user’s existing operating system, hierarchical file structure and applications. The current Java-based version runs on Windows, Linux and MacOS.

Kronosphere is composed of three major components. The client, which is the primary focus of this paper, provides a thin, but rich interface in which users can search a central database. This database can be local or remote; the latter affording users the possibility to share files (and versions) collaboratively. The final component is a daemon that monitors a user’s hierarchical file system for changes. When a new file or modification to an existing file is detected, the file is processed and a new record is created in the database. Keywords are extracted during this process.
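The change-detection step of the daemon can be sketched with simple polling. How the actual daemon watches the file system is not described here, so comparing modification times between periodic scans is an assumption, as are the function names.

```python
import os
from typing import Dict, List, Tuple


def scan(root: str) -> Dict[str, float]:
    """Return {path: modification time} for every file under `root`."""
    snapshot: Dict[str, float] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                snapshot[path] = os.path.getmtime(path)
            except OSError:
                # The file vanished between listing and stat; skip it.
                continue
    return snapshot


def detect_changes(
    previous: Dict[str, float], current: Dict[str, float]
) -> List[Tuple[str, float]]:
    """List (path, mtime) for files that are new or modified since `previous`."""
    return sorted(
        (path, mtime)
        for path, mtime in current.items()
        if path not in previous or mtime > previous[path]
    )
```

A daemon loop built on this would call scan at a fixed interval, pass each reported change through keyword extraction, and write one new record per change to the database.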



References

  1. Barreau, D. and Nardi, B. A. Finding and reminding: file organization from the desktop. SIGCHI Bulletin 27, 3, 39-43, July 1995.
  2. Blanc-Brude, T. and Scapin, D. L. What do people recall about their documents?: Implications for desktop search tools. In Proceedings of the 12th International Conference on Intelligent User Interfaces, pages 102-111. ACM Press, New York, NY, 2007.
  3. Fertig, S., Freeman, E., and Gelernter, D. “Finding and reminding” reconsidered. SIGCHI Bulletin 28, 1, 66-69, January 1996.
  4. Freeman, E. and Gelernter, D. Lifestreams: a storage model for personal data. SIGMOD Record 25, 1, 80-86, March 1996.
  5. Gonçalves, D. and Jorge, J. A. Describing documents: what can users tell us? In Proceedings of the 9th International Conference on Intelligent User Interfaces, pages 247-249. ACM Press, New York, NY, 2004.
  6. Krishnan, A. and Jones, S. TimeSpace: activity-based temporal visualisation of personal information spaces. Personal and Ubiquitous Computing 9, 1, 46-65, January 2005.
  7. Lansdale, M. The psychology of personal information management. Applied Ergonomics 19, 1, 55-66, 1988.
  8. Mark, G., Gonzalez, V. M., and Harris, J. No task left behind?: Examining the nature of fragmented work. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 321-330. ACM Press, New York, NY, 2005.
  9. Marsden, G. and Cairns, D. Improving the Usability of the Hierarchical File System. South African Computer Journal 32, 1, 69-78, 2004.
  10. Rekimoto, J. Time-machine computing: a time-centric approach for the information environment. In Proceedings of the 12th Annual ACM Symposium on User Interface Software and Technology, pages 45-54. ACM Press, New York, NY, 1999.
  11. Soules, C. A. and Ganger, G. R. Connections: using context to enhance file search. In Proceedings of the Twentieth ACM Symposium on Operating Systems Principles, pages 119-132. ACM Press, New York, NY, 2005.