Intellectual Access to Images: An Overview

Megan Winget: October 24, 2002

In 1988, art historian Richard Brilliant made a prescient appeal for a more thorough integration of text-based and image-based indexing systems for art historians (Brilliant, 1988). He argued that although art historians have a highly developed visual memory of the objects they study, that memory is insufficient to place an object historically and interpret it properly if the retrieval system provides only textual descriptions and indexing terms. His basic appeal, that art historians need more images at their disposal, has been answered in the intervening years by the digitization and deployment of millions of images available to scholars online. His underlying premise, however, that indexing systems and images must be seamlessly integrated, remains a challenge for information technology professionals.

Text-Based, or Manual Approaches

Librarians have been working on image classification and retrieval for nearly twenty-five years. LIS researchers have both developed new and extended existing structures and vocabularies for visual information; the Art and Architecture Thesaurus (AAT) (Peterson, 1990) and the Library of Congress Thesaurus for Graphic Materials (LCTGM) (Library of Congress Prints and Photographs Division, 2000) are two of the best known and most widely used indexing vocabularies. However, traditional "manual" indexing with vocabularies like the AAT and the LCTGM is often problematic for image retrieval. On a practical level, these systems are generally developed for the needs of a particular audience or collection and do not serve broad collections well (Jörgensen, 1996; Jörgensen, 1999; Greenberg, 1993). Robust (Layne, 1994) and consistent (Markey, 1984) indexing requires subject specialists trained in the use of the system, and is therefore expensive. The hierarchy of these systems, which provides much-needed structure and standardization, also tends either to limit (Hourihane, 1999) or to awkwardly scatter the data (Jörgensen, 2000). Finally, because the data sources are both rich and opaque, effective retrieval often requires the services of an information expert, so these systems are less effective for naïve users in online environments (Gordon, 2001). On a more abstract level, there are theoretical problems with using text to describe and classify objects that are not textual in nature (Jörgensen, 2001).
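
The structural trade-off described above can be made concrete with a small sketch. Assuming a hierarchical thesaurus that records broader/narrower relations, a query on a broader term can be expanded to pick up every image indexed beneath it, while an image indexed with a term outside the hierarchy is silently missed, one way the data ends up scattered. The terms, image ids and the `expand`/`retrieve` helpers are hypothetical and are not drawn from the AAT or the LCTGM.

```python
from typing import Dict, List, Set

# Hypothetical fragment of a hierarchical thesaurus (broader term -> narrower terms).
# The terms are illustrative only; they are not quoted from the AAT or the LCTGM.
# Assumes the hierarchy contains no cycles.
NARROWER: Dict[str, List[str]] = {
    "furniture": ["seating furniture", "tables"],
    "seating furniture": ["chairs", "benches"],
    "chairs": ["armchairs", "side chairs"],
}


def expand(term: str) -> Set[str]:
    """Return the term together with every narrower term beneath it."""
    terms = {term}
    for child in NARROWER.get(term, []):
        terms |= expand(child)
    return terms


def retrieve(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return ids of images indexed with the query term or any narrower term."""
    wanted = expand(query)
    return sorted(image_id for image_id, assigned in index.items() if assigned & wanted)


# Hypothetical index: each image id maps to the terms a cataloger assigned.
index = {
    "img-1": {"armchairs"},
    "img-2": {"benches", "gardens"},
    "img-3": {"tables"},
    "img-4": {"seats"},  # an uncontrolled term that sits outside the hierarchy
}

print(retrieve(index, "seating furniture"))  # ['img-1', 'img-2']; img-4 is missed
```

The hierarchy does useful work (a query on "seating furniture" finds armchairs and benches), but only for images whose indexers chose terms the hierarchy knows about.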

Other text-based indexing initiatives focus on developing metadata schemas and structures to describe image information. Two widely adopted schemas are the Dublin Core (Dublin Core, 1999), used primarily for describing resources on the web, and the VRA Core (VRA Data Standards Committee, 2002), which has elements to describe both an original work of art and its surrogate. While both of these metadata standards have strengths, each is also limited by the original intent of its developers. The Dublin Core, for example, is a basic building block for providing access to many different types of information. However, because its original purpose was so broad, simply to “provide descriptions for networked resources” (Hillman, 2001), its metadata elements are also extraordinarily general, almost to the point of futility. The VRA Core is more specific, having been developed explicitly for art and architecture, but because many of its elements focus on information about surrogates, it seems to work best in administrative, institutional settings (Baca, 1998). Still other metadata schemas, such as the Categories for the Description of Works of Art (CDWA) from the Getty (Baca & Harpring, 2000), provide richer, more detailed information about the original work. But again, all of these text-based retrieval methods have serious limitations related to the purpose for which they were originally conceived.

Machine-based, or Automatic Approaches

A completely different approach to image retrieval has grown out of pattern recognition research, with roots in fields like computer science, medical informatics and electrical engineering (Chen, 2001). It is generally known as "content-based" retrieval, a term from the electrical engineering and computer science community that refers to retrieval based on the visual information depicted in the image, that is, the image's "content." Major research in this field has centered on automated indexing of attributes at the pixel level (Chang et al., 1997; Rui, Huang, & Chang, 1999), focusing on indexing color (Swain & Ballard, 1991; Smith & Chang, 1995), texture (Tamura, Mori, & Yamawaki, 1978; Smith & Chang, 1994) and shape (Li & Ma, 1994), and on the introduction of new search functions (Faloutsos et al., 1994; Gupta, Santini, & Jain, 1997; Ma & Manjunath, 1998). Using these systems, users can forgo text altogether: in effect, they select a relevant example image (a "proto-image") and tell the system to "find more like this." Among the most influential content-based retrieval systems are IBM's Query by Image Content (QBIC) project (Flickner et al., 1995) and VisualSEEk (Smith & Chang, 1996).
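
To make "indexing color" and "find more like this" concrete, the following is a minimal sketch of color-histogram matching in the spirit of Swain & Ballard (1991): each image is reduced to a coarse, normalized RGB histogram and compared with the example image by histogram intersection. It is a simplification, not the implementation used by QBIC or VisualSEEk; the bin count, the toy pixel lists and the function names are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel 0-255
Histogram = Dict[Tuple[int, int, int], float]


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> Histogram:
    """Quantize each channel into equal-width bins and return a normalized histogram."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_intersection(query: Histogram, candidate: Histogram) -> float:
    """Sum of bin-wise minima: 1.0 for identical normalized histograms, 0.0 for no overlap."""
    return sum(min(share, candidate.get(bin_, 0.0)) for bin_, share in query.items())


def find_more_like_this(query_pixels: Sequence[Pixel],
                        collection: Dict[str, Sequence[Pixel]],
                        k: int = 3) -> List[Tuple[str, float]]:
    """Rank a collection by color similarity to an example image (query by example)."""
    query = color_histogram(query_pixels)
    scored = [(name, histogram_intersection(query, color_histogram(pixels)))
              for name, pixels in collection.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy "images" given as flat pixel lists (hypothetical data).
collection = {
    "sunset": [(240, 30, 20)] * 70 + [(20, 20, 120)] * 30,
    "forest": [(20, 140, 30)] * 100,
    "poppy": [(230, 20, 40)] * 90 + [(30, 150, 40)] * 10,
}
ranking = find_more_like_this([(250, 10, 10)] * 100, collection)
print(ranking)  # poppy (0.9), then sunset (0.7), then forest (0.0)
```

Note that nothing in this ranking knows what a poppy or a sunset is; it matches pixel statistics only, which is exactly the gap between system functionality and user needs raised in the next paragraph.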

The chief question for content-based retrieval systems is not whether they are too narrowly focused and expensive, as traditional text-based systems are; because computers do the cataloging and retrieval, human expertise and its costs play little part. The question is whether these systems retrieve information that users actually find valuable. Chen (2001) says "it is not clear how the retrieval functionality of these systems correlates with image information needs of real users," and Jörgensen, Jaimes, Benitez, & Chang (2001) say that these systems "currently address only a small portion of the complete range of image attributes of potential interest to users of digital image collections."

Further Research: The User

Meeting users’ needs has, in fact, long been recognized as one of the major challenges facing both the manual and automatic indexing communities (Keister, 1994; Smith, 2001). User-centered research in image retrieval has focused on identifying users' visual information needs (Enser & McGregor, 1993; Goodrum & Spink, 1999; Conniss, Ashford, & Graham, 2000), defining the complexity of their image queries (Hastings, 1995a, 1995b), and reviewing vocabularies to make image databases more relevant to specialists and non-specialists alike (Jörgensen, 1995; Jörgensen, 1998). There is, of course, a substantial literature on users’ general information-seeking behavior, particularly in the field of digital libraries, where many image databases live (Yang, 1997; Marchionini, 1995), but literature dealing specifically with image retrieval is sparse, a disturbing realization for this researcher. We know that users, naïve and specialized, are not using or interacting with these systems as much as we’d want or expect (Lynch, 2002); we also know that we’ve spent a great deal of time, effort and money building them. Exploring user needs and behaviors is a basic and important phase of system development; these systems should not fail for lack of relevant user studies.

Further Research: System Interoperability

In the preceding section I mentioned the user, the idea of the digital library, the image retrieval system, and the lack of relevant studies that tie these three entities together. One of the great strides in the information science literature in recent years has been the realization that progress depends on considering image indexing within a larger framework that includes the image, the delivery system, and the user (Jörgensen, 2001). Researchers have taken two basic approaches to this challenge of integration.

The first approach, typified by “Blobworld,” a system developed at the University of California, Berkeley (Carson et al., 1999), explores the intersection between traditional textual indexing and content-based approaches. Blobworld is primarily an automated system that employs traditional indexing techniques whenever possible. By extracting the structure of an image and defining its regions (“blobs”), the Berkeley researchers hope to make generalizations about the image's subject. For example, in a database of flower images, an object with a small, dark, circular central region surrounded by a specific number of lighter “petal-like” structures is, more often than not, a daisy. For some applications this assumption could then be used to propose that specific pairs of images depict the same thing, and to organize them accordingly (Carson et al., 2000). Further, the Berkeley researchers have observed that when simple indexing terms already exist for given images, adding content-based indexing improves retrieval. For example, searching on a word like “rose” yields a number of pictures that include roses, but also images in which roses are not the subject, or in which the returned “rose” is a “rose window.” When the user can call for a “rose” and also define a rose-shaped blob, retrieval improves significantly (Wilensky, 2000). More recently, scholars have advanced research in this area by trying to determine specifically the degree to which the similarity of image content, as measured by automatic indexing techniques, can reliably indicate whether the index terms assigned to one image can be inherited by another (Goodrum et al., 2001).
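
The keyword-plus-region query described above can be sketched as follows. This is a hypothetical illustration, not Blobworld's code: Blobworld segments images with Expectation-Maximization and describes regions with richer color, texture and shape features (Carson et al., 1999), whereas here each region is reduced to a mean color and an area fraction. The `Blob`, `ImageRecord` and `search` names, the distance measure, the threshold and the toy records are all assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Blob:
    """A segmented region, described here by hypothetical features only."""
    color: Tuple[float, float, float]  # mean R, G, B scaled to 0-1
    area: float                        # fraction of the image the region covers


@dataclass
class ImageRecord:
    name: str
    keywords: Set[str]
    blobs: List[Blob] = field(default_factory=list)


def blob_distance(a: Blob, b: Blob) -> float:
    """Euclidean distance over (R, G, B, area); a stand-in for real region features."""
    return math.dist((*a.color, a.area), (*b.color, b.area))


def search(records: List[ImageRecord], keyword: str,
           query_blob: Optional[Blob] = None,
           max_distance: Optional[float] = None) -> List[str]:
    """Filter by keyword, then optionally rank by each image's closest-matching region."""
    hits = [r for r in records if keyword in r.keywords]
    if query_blob is None:
        return [r.name for r in hits]
    scored = sorted((min(blob_distance(query_blob, b) for b in r.blobs), r.name)
                    for r in hits if r.blobs)
    return [name for dist, name in scored
            if max_distance is None or dist <= max_distance]


# Hypothetical records: all three carry the keyword "rose".
records = [
    ImageRecord("rose_window", {"rose", "cathedral"}, [Blob((0.2, 0.3, 0.8), 0.6)]),
    ImageRecord("still_life", {"rose", "vase"},
                [Blob((0.8, 0.1, 0.2), 0.05), Blob((0.6, 0.5, 0.3), 0.4)]),
    ImageRecord("rose_garden", {"rose", "garden"},
                [Blob((0.85, 0.1, 0.15), 0.3), Blob((0.1, 0.6, 0.2), 0.5)]),
]
red_rose = Blob((0.85, 0.1, 0.15), 0.3)  # the user's "rose-shaped blob"

print(search(records, "rose"))                              # keyword alone: all three
print(search(records, "rose", red_rose, max_distance=0.5))  # ['rose_garden', 'still_life']
```

The keyword does the coarse filtering and the region match does the disambiguation, so the rose window drops out even though it carries the same index term.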

The second approach is not to develop new systems but to explore new indexing schemas that bring together aspects of both manual and automatic indexing. The MPEG-7 standard (MPEG-7 Requirements Group, 1999) provides a set of descriptors for various types of multimedia information, together with predefined structures of descriptors and their relationships; its development prompted many proposals for describing and structuring multimedia information. Jaimes & Chang (2000) have mapped the Visual Description Scheme to a “pyramid” indexing structure that differentiates between perceptual, or syntactic, information, extracted automatically by content-based retrieval methods, and conceptual, or semantic, information, which is inherently more interpretive and depends on contextual or subjective knowledge. Further research on this proposal (Jörgensen et al., 2001) has demonstrated that the pyramid is a “robust system”: it can accommodate a full range of image attributes, whether acquired manually or automatically; it can effectively organize visual content for retrieval; and its relatively intuitive structure makes it a useful and consistent guide to the indexing process.
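
One way to picture the pyramid is as a record whose descriptors are filed by level, so that automatically extracted (syntactic) and interpretive (semantic) attributes live side by side but can be queried separately. The sketch below uses the ten level names as we understand them from Jaimes & Chang (2000); the `PyramidRecord` class, its methods and the sample descriptors are hypothetical, and no attempt is made to reproduce MPEG-7 syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The ten levels of the indexing pyramid, as we read Jaimes & Chang (2000):
# the first four hold syntactic (perceptual) attributes, the last six semantic ones.
PYRAMID_LEVELS = [
    ("type/technique", "syntactic"),
    ("global distribution", "syntactic"),
    ("local structure", "syntactic"),
    ("global composition", "syntactic"),
    ("generic objects", "semantic"),
    ("generic scene", "semantic"),
    ("specific objects", "semantic"),
    ("specific scene", "semantic"),
    ("abstract objects", "semantic"),
    ("abstract scene", "semantic"),
]
LEVEL_KIND: Dict[str, str] = dict(PYRAMID_LEVELS)


@dataclass
class PyramidRecord:
    """Descriptors for one image, filed under pyramid levels (hypothetical structure)."""
    image_id: str
    descriptors: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, level: str, value: str) -> None:
        """File a descriptor under a named level; reject levels outside the pyramid."""
        if level not in LEVEL_KIND:
            raise ValueError(f"unknown pyramid level: {level!r}")
        self.descriptors.setdefault(level, []).append(value)

    def by_kind(self, kind: str) -> Dict[str, List[str]]:
        """Return only the syntactic or only the semantic descriptors."""
        return {level: values for level, values in self.descriptors.items()
                if LEVEL_KIND[level] == kind}


record = PyramidRecord("img-001")
record.add("global distribution", "dominant hue: red")  # e.g. from a color histogram
record.add("local structure", "small circular region near center")
record.add("generic objects", "flower")                  # e.g. from a cataloger
record.add("abstract scene", "remembrance")

print(record.by_kind("syntactic"))
print(record.by_kind("semantic"))
```

Keeping both kinds of attribute in one record, rather than in two separate systems, is what lets a single schema serve as a guide for manual and automatic indexing alike.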

A related line of work, also concerned with indexing schemas, is the development of metadata “crosswalks.” Greenberg (2001) has examined the strengths and weaknesses of different schemas, and the Getty Information Institute has explored ways to build crosswalks between them (Baca et al., 2000). Finally, some very exciting research is being done on “application profiles” (Heery & Patel, 2000), which allow users to mix and match elements from various metadata schemas to meet their needs.
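
The basic mechanics of a crosswalk, and why it is often lossy, can be shown with a small sketch. The mapping below is a hypothetical subset from element names modeled on VRA Core 3.0 to Dublin Core 1.1 elements, written for illustration rather than taken from the Getty crosswalk; the `crosswalk` function and the sample record are likewise assumptions.

```python
from typing import Dict, List, Tuple

# Illustrative subset of a VRA Core -> Dublin Core mapping.
# This is an assumption for demonstration, not the Getty crosswalk (Baca et al., 2000).
VRA_TO_DC: Dict[str, str] = {
    "Title": "title",
    "Creator": "creator",
    "Date": "date",
    "Subject": "subject",
    "Description": "description",
    "Material": "format",
    "Technique": "format",
    "Measurements": "format",
    "ID Number": "identifier",
    "Rights": "rights",
}


def crosswalk(vra_record: Dict[str, str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Map a VRA-style record to Dublin Core; report elements with no target."""
    dc: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for element, value in vra_record.items():
        target = VRA_TO_DC.get(element)
        if target is None:
            unmapped.append(element)
        else:
            dc.setdefault(target, []).append(value)
    return dc, unmapped


# Hypothetical record describing a work of art.
vra = {
    "Title": "Landscape with river",
    "Creator": "Unknown artist",
    "Material": "oil on panel",
    "Technique": "glazing",
    "Culture": "Flemish",
}
dc, unmapped = crosswalk(vra)
print(dc)        # Material and Technique both collapse into a single 'format' element
print(unmapped)  # ['Culture'] has no counterpart in this mapping and is lost
```

Distinct elements in the richer schema collapse into one general element, and some have no target at all, which illustrates why mapping between schemas of differing granularity is rarely one-to-one.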

These cooperative systems are making headway toward giving users the meaningful access and speed they need to make sense of the vast number of images available online. However, it is also important to discuss, very briefly, a recently introduced distinction between “digital libraries” and “digital collections,” and how this distinction builds on the work of earlier scholars. Speaking at Web-Wise 2002, Clifford Lynch (Lynch, 2002) noted the difference between “digital collections,” which are essentially raw material, indexed certainly but with minimal interpretive information, and “digital libraries,” the systems “that make digital collections come alive.” One month later, Howard Besser (2002) built on this idea by defining the services that traditional libraries provide and describing how nascent digital libraries might better meet the needs of the online community by replicating those services. This distinction is an interesting one because it brings the user to the fore. Indeed, Roberts (2001) discusses the scarcity of meaningful intellectual access to humanities materials and calls for a more thorough and thought-provoking indexing schema; she posits that pictures are worth a thousand words, or at least a thousand data access points. It would be neither financially feasible nor even functionally desirable to have librarians or other professional catalogers give this much attention to single works of art. This is the stuff of humanities scholarship, not of indexing techniques.

These distinctions and desires for richer indexing are, perhaps unknowingly, building on work by scholars like Helen Tibbo (1994), who discusses the complexities and difficulties inherent in indexing materials for humanities scholars, and Joseph Busch (1994), who developed a robust data model for a historical retrieval system. A very brief and selective list of image-based digital libraries that would meet Lynch’s and Besser’s requirements, and (perhaps) Tibbo’s and Busch’s expectations, includes the William Blake Archive (William Blake Archive editors and staff, 2000), the Rossetti Archive (McGann, 1993), the Dickinson Electronic Archives (Smith, 1995), Documenting the American South (University of North Carolina at Chapel Hill, 2002) and the Electronic Beowulf Project (Kiernan, 1991). It should be noted that these systems (excluding DocSouth and Beowulf) were initially devised and developed by people who knew very little about image indexing trends or document retrieval theories; in effect, they were built by their own users. A profitable area of future study, therefore, might be to examine the usage styles and retrieval rates of specialized and naïve users of these systems, and how developers of automatic and manual indexing schemes can use such user-centered systems as a template for future progress.

Our difficulties in this field should not cause dismay. The tension between images and written descriptions of those images has been a subject of study since antiquity. Ekphrasis, a Greek term defined as "a vivid description intended to bring the subject before the mind's eye of the listener" (Hornblower & Spawforth, 1996), was one of the most advanced rhetorical skills of the classical period. Giorgio Vasari, commonly recognized as the world's first art historian, might also be called the world's first image cataloger. Vasari used the ekphrastic tradition not only as a springboard for philosophical discourse but also as a means of describing the elements and themes of a painting, attributing works to artists, and defining styles and movements within his field of expertise, the Italian Renaissance. Perhaps what we need is a more literary tradition of image retrieval: an ekphrastic tradition of image cataloging. Cataloging keywords and phrases, blob relationships within an image, and pixel-level features, although difficult problems, are straightforward compared with “vividly bringing the image to the mind’s eye of the user.” If the information science community wants to meet the needs of humanities scholars and even the general public, and if we want users to access the resources we’ve spent many years and dollars building and amassing, we will need to rethink both what an operable digital library is and how users will want to interact with it.

Reference List

Baca, M. & Harpring, P. (2000). Categories for the description of works of art. J. Paul Getty Trust.

Baca, M. (1998). Introduction to metadata: pathways to digital information. Getty Information Institute.

Baca, M., Gilliland-Swetland, A., Harpring, P., & Woodley, M. (2000). Metadata Standards Crosswalk. Getty Information Institute.

Besser, H. (2002). The next stage: Moving from digital collections to interoperable digital libraries. First Monday, 7(6), 1-20.

Brilliant, R. (1988). How an art historian connects art objects and information. Library Trends, 37(2), 120-129.

Busch, J. A. (1994). Thinking ambiguously: Organizing source materials for historical research. In Challenges in indexing electronic text and images (pp. 23-55). Medford, NJ: Learned Information, Inc.

Carson, C., Belongie, S., Greenspan, H., & Malik, J. (1999). Blobworld: Image segmentation using Expectation-Maximization and its application to image querying. Department of Electrical Engineering and Computer Science, University of California, Berkeley. Available: http://elib.cs.berkeley.edu/carson/papers/pami.html

Carson, C., Thomas, M., Belongie, S., Hellerstein, J. M., & Malik, J. (2000). Blobworld: A system for region-based image indexing and retrieval. Third International Conference on Visual Information Systems. Amsterdam: University of Amsterdam.

Chang, S., Smith, J. R., Beigi, M., & Benitez, A. (1997). Visual information retrieval from large distributed online repositories. Communications of the ACM, 40(12), 63-71.

Chen, H. (2001). An analysis of image queries in the field of art history. Journal of the American Society for Information Science and Technology, 52(3), 260-273.

Conniss, L. R., Ashford, A. J., & Graham, M. E. (2000). Information seeking behavior in image retrieval: VISOR I Final report. Newcastle: Institute for Image Data Research.

Dublin Core. (1999). Dublin Core metadata element set, version 1.1. Available: http://dublincore.org/documents/dces/

Enser, P. G. B. & McGregor, C. G. (1993). Analysis of visual information retrieval queries. London: British Library.

Faloutsos, C., Barber, R., Flickner, M., Hafner, J., Niblack, W., Petkovic, D., & Equitz, M. (1994). Efficient and effective querying by image content. Journal of Intelligent Information Systems, 3, 231-262.

Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D., & Yanker, P. (1995). Query by image and video content: The QBIC system. IEEE Computer, 28, 23-32.

Goodrum, A., & Spink, A. (1999). Visual information seeking: A study of image queries on the World Wide Web. Proceedings of the 1999 Annual Meeting of the American Society for Information Science.

Goodrum, A., Rorvig, M. E., Jeong, K.-T., & Suresh, C. (2001). An open source agenda for research linking text and image content features. Journal of the American Society for Information Science and Technology, 52(11), 948-953.

Gordon, A. S. (2001). Browsing image collections with representations of common-sense activities. Journal of the American Society for Information Science and Technology, 52(11), 925-929.

Greenberg, J. (1993). Intellectual control of visual archives: A comparison between the art & architecture thesaurus and the Library of Congress thesaurus for graphic materials. Cataloging and Classification Quarterly, 16(1), 85-117.

Greenberg, J. (2001). A quantitative categorical analysis of metadata elements in image-applicable metadata schemas. Journal of the American Society for Information Science and Technology, 52(11), 917-924.

Gupta, A., Santini, S., & Jain, R. (1997). In search of information in visual media. Communications of the ACM, 40(12), 35-42.

Hastings, S. K. (1995a). An exploratory study of intellectual access to digitized art images. 16th National Online Meeting proceedings (pp. 177-185).

Hastings, S. K. (1995b). Query categories in a study of intellectual access to digitized art images. Proceedings of the 58th Annual Meeting of the American Society for Information Science (pp.3-8). Medford, NJ: ASIS.

Heery, R., & Patel, M. (2000). Application profiles: mixing and matching metadata schemas. Ariadne, September(25).

Hillman, D. (2001). Using Dublin Core. Dublin Core Metadata Initiative.

Hornblower, S., & Spawforth, A. (Eds.). (1996). The Oxford classical dictionary. Oxford, New York: Oxford University Press.

Hourihane, C. (1999). Subject classification for visual collections (Visual Resources Association Special Bulletin No. 12). Columbus, OH: Visual Resources Association.

Jaimes, A., & Chang, S.-F. (2000). A conceptual framework for indexing visual information at multiple levels. IS&T/SPIE Internet Imaging, 3964.

Jörgensen, C. (1996). Testing an image description template. Proceedings of the 59th Annual meeting of the American Society for Information Science (ASIS '96) (pp. 209-213). Medford, NJ: Information Today.

Jörgensen, C. (1999). Retrieving the Unretrievable: Art, aesthetics, and emotion in image retrieval systems. Proceedings SPIE (vol.3644 Human vision and electronic imaging IV) (pp. 348-355). San Jose, CA: International Society for Optical Engineering.

Jörgensen, C. (1995). Classifying Images: Criteria for grouping as revealed in a sorting task. Advances in classification 6 (proceedings of the 6th ASIS SIG/CR Classification Research Workshop) Medford, NJ: Information Today.

Jörgensen, C. (1998). Attributes of images in describing tasks. Information Processing & Management, 34(2/3), 161-174.

Jörgensen, C. (2000). Image indexing: an analysis of selected classification systems in relation to image attributes named by naive users. Final Report to the Office of Sponsored Research, Online Computer Library Center.

Jörgensen, C., Jaimes, A., Benitez, A. B., & Chang, S.-F. (2001). A conceptual framework and empirical research for classifying visual descriptors. Journal of the American Society for Information Science and Technology, 52(11), 938-947.

Jörgensen, C. (2001). Introduction and Overview. Journal of the American Society for Information Science and Technology, 52(11), 906-910.

Keister, L. H. (1994). User types and queries: impact on image access systems. In Challenges in indexing electronic text and images. Medford, NJ: Learned Information, Inc.

Kiernan, K. S. (1991). Digital image processing and the Beowulf manuscript. Literary and Linguistic Computing, 6(Special issue on Computers and Medieval Studies (Edited by Marilyn Deegan with Andrew Armour and Mark Infusino)), 20-27.

Layne, S. S. (1994). Some Issues in the Indexing of Images. Journal of the American Society of Information Science, 45, 583-588.

Li, B., & Ma, S. D. (1994). On the relation between region and contour representation. Proceedings of the IEEE International Conference on Pattern Recognition.

Library of Congress Prints and Photographs Division. (2000). Thesaurus for Graphic Materials I: Subject Terms.

Lynch, C. (2002). Digital collections, digital libraries and the digitization of cultural heritage information. First Monday, 7(5), 1-17.

Ma, W., & Manjunath, B. (1998). Netra: A toolbox for navigating large image databases (pp. 568-571).

Marchionini, G. (1995). Information seeking in electronic environments. Cambridge: Cambridge University Press.

Markey, K. (1984). Interindexer consistency tests: A literature review and report of a test of consistency in indexing visual materials. Library & Information Science Research, 6(2), 155-177.

McGann, J. J. (1993). The Complete Writings and Pictures of Dante Gabriel Rossetti: A Hypermedia Research Archive.

MPEG-7 Requirements Group. (1999). Context, objectives and technical roadmap (Report No. ISO/IEC JTC1/SC29/WG11 MPEG99/N2861). Vancouver, Canada.

Peterson, T. (1990). Developing a new thesaurus for art and architecture. Library Trends, 38(4), 644-658.

Roberts, H. E. (2001). A picture is worth a thousand words: art indexing in electronic databases. Journal of the American Society for Information Science and Technology, 52(11), 911-916.

Rui, Y., Huang, T. S., & Chang, S.-F. (1999). Image retrieval: Current techniques, promising directions and open issues. Journal of Visual Communication and Image Representation, 10, 1-23.

Smith, J. R., & Chang, S.-F. (1996). VisualSEEk: A fully automated content-based image query system. Proceedings of the ACM International Conference on Multimedia (ACMMM).

Smith, J. R. (2001). Quantitative assessment of image retrieval effectiveness. Journal of the American Society for Information Science and Technology, 52(11), 969-979.

Smith, J. R., & Chang, S.-F. (1994). Transform features for texture classification and discrimination in large image databases. Proceedings of the IEEE International Conference on Image Processing.

Smith, J. R., & Chang, S.-F. (1995). Tools and techniques for color image retrieval. Storage & Retrieval for Image and Video Databases: Vol. IV. IS&T/SPIE Proceedings.

Smith, M. N. (1995). The importance of a hypermedia archive of Dickinson's creative work. The Emily Dickinson Journal, IV(1).

Swain, M., & Ballard, D. (1991). Color indexing. International Journal of Computer Vision, 7(1), 11-32.

Tamura, H., Mori, S., & Yamawaki, T. (1978). Texture features corresponding to visual perception. IEEE Transactions on Systems, Man, and Cybernetics, SMC-8(6), 460-473.

Tibbo, H. (1994). Indexing for the humanities. Journal of the American Society for Information Science, 45(8), 607-619.

University of North Carolina at Chapel Hill. (2002). Documenting the American South. Natasha Smith.

VRA Data Standards Committee. (2002).

Wilensky, R. (2000). Research Accomplishments: Content Analysis for Access. Berkeley, CA: University of California, Berkeley.

William Blake Archive editors and staff. (2000). The persistence of vision: Images and imaging at the William Blake archive. RLG DigiNews, 4(1).

Yang, S. C. (1997). Qualitative exploration of learners' information-seeking processes using Perseus hypermedia system. Journal of the American Society for Information Science, 48(7), 667-669.