Intellectual Access to Images:
An Overview
Megan Winget: October 24, 2002
Nearly fifteen years ago, art historian Richard Brilliant wrote a prescient
appeal for a more thorough intermingling of text-based and image-based
indexing systems for art historians (Brilliant, 1988). He argued that
although art historians have a highly developed visual memory of the
objects they study, these powers are insufficient to place an object
historically and interpret it properly if the retrieval system provides
only textual descriptions and indexing terms. His basic appeal, that art
historians need more images at their disposal, has been answered in the
intervening years by the millions upon millions of images made available
to scholars online. His underlying premise, that indexing systems and
images must be seamlessly integrated, remains a challenge for information
technology professionals.
Text-Based, or Manual Approaches
In fact, librarians have been working on image classification and retrieval
for nearly twenty-five years. LIS researchers have developed and augmented
existing structures and vocabularies for visual information; the Art and
Architecture Thesaurus (AAT) (Peterson, 1990) and the Library of Congress
Thesaurus for Graphic Materials (LCTGM) (Library of Congress Prints and
Photographs Division, 2000) are two of the best-known and most widely used
indexing systems. However, using traditional "manual" indexing tools like
the AAT and the LCTGM for image retrieval is often problematic. On a
practical level, these systems are generally developed for the specific
needs of a particular audience or collection, and are poorly suited to
broad collections (Jörgensen, 1996; Jörgensen, 1999; Greenberg, 1993).
Robust (Layne, 1994) and consistent (Markey, 1984) indexing requires
subject specialists trained in the use of the system, and is therefore
expensive. The very hierarchy of the systems, which provides much-needed
structure and standardization, also tends either to limit (Hourihane, 1999)
or to awkwardly scatter the data (Jörgensen, 2000). Finally, because of the
richness and opacity of the data sources, effective retrieval often requires
the services of an information expert, and is therefore less effective for
naïve users in online environments (Gordon, 2001). On a more abstract level,
there are theoretical problems with using text to describe and classify
objects that are not textual in nature (Jörgensen, 2001).
Other text-based indexing initiatives deal with developing metadata schemas
and structures to classify image information. Two schemas that have been
widely distributed and used are the Dublin Core (Dublin Core, 1999), which
is used primarily for retrieving resources on the web, and the VRA Core
(VRA, 2002), which has elements to describe both an original work of art
and its surrogate. While both of these metadata standards have strengths,
each is also limited by the original intent of its developers. For example,
the Dublin Core is a very basic building block for providing access to many
different types of information. However, because its original intent was so
broad, simply to “provide descriptions for networked resources” (Hillman,
2001), the metadata elements are also extraordinarily general, almost to
the point of futility. The VRA Core is more specific, having been developed
explicitly to deal with art and architecture, but because many of its
elements focus on providing information about surrogates, it seems to work
best in administrative, institutional settings (Baca, 1998). Still other
metadata schemas provide richer, more detailed information on the original
work, such as the Categories for the Description of Works of Art (CDWA)
from the J. Paul Getty Trust (Baca & Harpring, 2000). But again, all of
these text-based retrieval methods have serious limitations related to the
purpose for which they were originally conceived.
Machine-Based, or Automatic Approaches
A completely different approach to image retrieval has grown out of pattern
recognition research and has its roots in fields like computer science,
medical informatics and electrical engineering (Chen, 2001). The general
term for this approach is "content-based" retrieval. The term comes from the
electrical engineering and computer science community and refers to the
retrieval of the visual information depicted in the image: the "content" of
the image. The major research in this field has centered on automated
indexing of attributes at the pixel level (Chang et al., 1997; Rui, Huang,
& Chang, 1999), focusing on indexing color (Swain & Ballard, 1991; Smith &
Chang, 1995), texture (Tamura, Mori, & Yamawaki, 1978; Smith & Chang, 1994),
and shape (Li & Ma, 1994), and on the introduction of new search functions
(Faloutsos et al., 1994; Gupta, Santini, & Jain, 1997; Ma & Manjanath,
1998). Using these systems, users can forgo the use of text and, in effect,
find a relevant proto-image and tell the system to "find more like this."
Some of the most influential content-based retrieval systems are the IBM
Query by Image Content (QBIC) project (Flickner et al., 1995) and VisualSEEk
(Smith & Chang, 1996).
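To make the "find more like this" idea concrete, the following is a minimal
sketch of color-based retrieval using histogram intersection, the general
kind of color matching associated with the color indexing literature cited
above. It is not the implementation of QBIC, VisualSEEk, or any other system
named here. The function names, the four-bins-per-channel quantization, and
the representation of an image as a plain list of RGB tuples are all
assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each value in 0-255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Quantize RGB pixels into a normalized color histogram.

    The bin count is an illustrative assumption, not a recommended setting.
    """
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel value (0-255) to a bin index in 0..bins_per_channel-1.
        rb = min(r * bins_per_channel // 256, bins_per_channel - 1)
        gb = min(g * bins_per_channel // 256, bins_per_channel - 1)
        bb = min(b * bins_per_channel // 256, bins_per_channel - 1)
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = float(len(pixels)) or 1.0  # avoid dividing by zero for empty input
    return [count / total for count in hist]


def histogram_intersection(query: Sequence[float], candidate: Sequence[float]) -> float:
    """Sum of bin-wise minima; 1.0 means identical normalized histograms."""
    return sum(min(q, c) for q, c in zip(query, candidate))


def find_more_like_this(
    query_pixels: Sequence[Pixel],
    collection: Dict[str, Sequence[Pixel]],
    top_k: int = 3,
) -> List[str]:
    """Rank images in a (hypothetical) collection by color similarity to the query."""
    query_hist = color_histogram(query_pixels)
    scored = [
        (histogram_intersection(query_hist, color_histogram(pixels)), name)
        for name, pixels in collection.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


if __name__ == "__main__":
    # Tiny synthetic "images": four pixels each.
    collection = {
        "red": [(250, 10, 10)] * 4,
        "blue": [(10, 10, 250)] * 4,
        "mixed": [(250, 10, 10)] * 2 + [(10, 10, 250)] * 2,
    }
    print(find_more_like_this([(240, 20, 20)] * 4, collection))
```

A ranking like this captures only the overall color distribution. It says
nothing about what an image depicts, which is exactly the gap between
pixel-level attributes and users' needs discussed in the next paragraph.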
The chief question for content-based retrieval systems is not whether
they are too specific and expensive, like the traditional text-based retrieval
systems (computers do the cataloging and retrieval, so human knowledge and
labor costs do not play a part), but whether these systems are retrieving
information that users find valuable. Chen (2001) says "it is not clear how
the retrieval functionality of these systems correlates with image
information needs of real users," and Jörgensen, Jaimes, Benitez, & Chang
(2001) say that these systems "currently address only a small portion of
the complete range of image attributes of potential interest to users
of digital image collections."
Further Research: The User
Meeting users’ needs has, in fact, long been recognized as one of the
major challenges looming over both the manual and automatic indexing communities
(Keister, 1994; Smith, 2001). User-centered research in image retrieval has
focused on identifying users' visual information needs (Enser & McGregor,
1993; Goodrum & Spink, 1999; Conniss, Ashford, & Graham, 2000), defining
the complexity of their image queries (Hastings, 1995a; Hastings, 1995b),
and reviewing vocabularies to make image databases more relevant to
specialists and non-specialists alike (Jörgensen, 1995; Jörgensen, 1998).
There is, of course, quite a bit of literature related to users’ general
information-seeking behavior, particularly in the field of digital
libraries, where many image databases live (Marchionini, 1995; Yang, 1997),
but literature dealing specifically with image retrieval is somewhat sparse,
a disturbing realization for this researcher. We know that users, naïve
and specialized, are not using or interacting with the systems as much as
we’d want or expect (Lynch, 2002); we also know that we’ve spent a lot
of time, effort and money building the systems. Exploring user needs and
behaviors is a basic and important phase of system development; these
systems should not fail for lack of relevant user studies.
Further Research: System Interoperability
In the preceding paragraph I mentioned the user, the idea of the digital
library, the image retrieval system, and the lack of relevant studies
that tie these three entities together. One of the great strides in IS
literature in recent years has been the realization that progress depends
on the consideration of image indexing within this larger framework that
includes the image, the delivery system, and the user (Jörgensen, 2001).
There are two basic approaches to the challenge of integration.
The first approach, typified by “Blobworld,” a system developed at the
University of California, Berkeley (Carson et al., 1999), explores the
intersection between traditional textual indexing and content-based
approaches. Blobworld is primarily an automated system that employs
traditional indexing techniques whenever possible. By extracting the
structure and defining the regions of an image, Berkeley researchers hope
to make generalizations about the subject of an image. For example, in an
image database devoted to flowers, an object with a small, dark, circular
central region surrounded by a specific number of lighter “petal-like”
structures is, more often than not, a daisy. For some applications this
assumption could then be used to propose that specific pairs of images
depict the same thing, and to organize them accordingly (Carson et al.,
2000). Further, Berkeley researchers have observed that when simple
indexing terms already exist for given images, the presence of content-based
indexing improves the retrieval rate. For example, searching on a word
like “rose” yields a number of pictures of roses, but also returns images
in which roses are not the subject, or in which the “rose” is a “rose
window.” When the user can both call for a “rose” and define a rose-shaped
blob, the retrieval rate is significantly improved (Wilensky, 2000). More
recently, scholars have advanced research in this area by trying to
determine the degree to which similarity of image content, as defined by
automatic indexing techniques, can realistically determine the
inheritability of terms (Goodrum et al., 2001).
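The “rose” versus “rose window” example can be sketched as a two-stage
query: filter by indexing term, then keep only images that contain a region
resembling the one the user sketched. The code below is a hedged
illustration of that combination and not Blobworld's actual algorithm. The
IndexedImage record, the two-number region descriptor (imagined here as
"redness" and "roundness"), and the distance threshold are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple

Region = Tuple[float, ...]  # hypothetical per-region feature vector


@dataclass
class IndexedImage:
    name: str
    keywords: FrozenSet[str]  # manually assigned indexing terms
    regions: List[Region]     # automatically extracted region descriptors


def region_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two region descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_search(
    images: Sequence[IndexedImage],
    keyword: str,
    query_region: Region,
    max_distance: float,
) -> List[str]:
    """Return names of images that match the keyword AND contain a similar region."""
    hits = []
    for image in images:
        if keyword not in image.keywords or not image.regions:
            continue  # text stage: the indexing term must be present
        best = min(region_distance(query_region, r) for r in image.regions)
        if best <= max_distance:  # content stage: some region must resemble the query
            hits.append((best, image.name))
    hits.sort()  # closest region match first
    return [name for _, name in hits]


if __name__ == "__main__":
    images = [
        IndexedImage("garden_rose", frozenset({"rose", "garden"}), [(0.95, 0.80)]),
        IndexedImage("rose_window", frozenset({"rose", "window"}), [(0.60, 0.20)]),
        IndexedImage("tulip", frozenset({"tulip"}), [(0.90, 0.75)]),
    ]
    # Keyword "rose" alone would return both garden_rose and rose_window.
    print(combined_search(images, "rose", (0.97, 0.85), max_distance=0.2))
```

In this toy collection the keyword stage alone admits the rose window, and
the region stage alone would admit the tulip; only the combination isolates
the flower the user meant.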
The second approach is not to develop new systems, but to explore new
indexing schemas that bring together aspects of both manual and automatic
indexing techniques. The development of the MPEG-7 standard (MPEG-7
Requirements Group, 1999), a set of descriptors for various types of
multimedia information together with predefined structures of descriptors
and their relationships, prompted many proposals for describing and
structuring multimedia information. Jaimes & Chang (2000) have mapped the
Visual Description Scheme to a “pyramid” indexing structure, which
differentiates between perceptual or syntactic information, derived
automatically by content-based retrieval methods, and conceptual or
semantic information, which is inherently more interpretive and dependent
on contextual or subjective information. Further research on this proposal
(Jörgensen et al., 2001) has demonstrated that the Pyramid schema is a
“robust system”: it can accommodate a full range of image attributes, both
manually and automatically acquired; it can effectively organize visual
content for retrieval; and it has a relatively intuitive structure, making
it a useful and consistent guide to the indexing process.
A different approach, although still related to exploring new indexing
schemas, is the development of metadata “crosswalks.” Greenberg (2001) has
done work on defining the strengths and weaknesses of different schemas,
and the Getty Information Institute is exploring ways to build “crosswalks”
between different schemas (Baca et al., 2000). Finally, some very exciting
research is being done on “Application Profiles” (Heery & Patel, 2000),
which allow users to mix and match pieces of various metadata schemas to
meet their needs.
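A crosswalk is, at bottom, a field-to-field mapping between schemas, and a
short sketch shows both how one works and what it loses. The mapping table
below is an illustrative assumption, not the Getty crosswalk or any
published one. It pairs a handful of VRA Core-style element names with
Dublin Core element names, and the sample record values are invented.

```python
from typing import Dict, List, Tuple

# Illustrative mapping only: source (VRA Core-style) field -> Dublin Core element.
ILLUSTRATIVE_CROSSWALK: Dict[str, str] = {
    "Title": "title",
    "Creator": "creator",
    "Date": "date",
    "Subject": "subject",
    "Material": "format",      # several specific fields collapse
    "Technique": "format",     # into one general element
    "Measurements": "format",
    "Rights": "rights",
}


def crosswalk(
    record: Dict[str, str],
    mapping: Dict[str, str],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Convert a record to the target schema.

    Returns the converted record plus the source fields that had no target,
    so that the information lost in translation stays visible.
    """
    converted: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped.append(field)
        else:
            converted.setdefault(target, []).append(value)
    return converted, unmapped


if __name__ == "__main__":
    # Hypothetical sample record.
    source_record = {
        "Title": "Example painting",
        "Creator": "Example artist",
        "Material": "oil on canvas",
        "Technique": "painting",
        "Style/Period": "Example period",
    }
    converted, unmapped = crosswalk(source_record, ILLUSTRATIVE_CROSSWALK)
    print(converted)
    print("No equivalent for:", unmapped)
```

Returning the unmapped fields alongside the converted record is a
deliberate choice in this sketch: moving from a specific schema to a general
one discards distinctions, and a crosswalk that hides the loss makes the
generality problem described earlier harder to see.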
These cooperative systems are making headway towards providing users
with the significance and speed that they need to make sense of the vast
numbers of images available online. However, it is also important to discuss,
very briefly, a recently introduced distinction between “digital libraries”
and “digital collections,” and how this distinction builds upon work
by earlier scholars. Speaking at Web-Wise 2002, Clifford Lynch (2002) noted
the difference between “digital collections,” which are basically raw
material, indexed certainly, but with minimal interpretative information,
and “digital libraries,” the systems “that make digital collections come
alive.” One month later, Howard Besser (2002) built upon this idea to
include a definition of the services that traditional libraries provide and
of how nascent digital libraries might better meet the needs of the online
community by replicating those services. This is an interesting
distinction, one that brings the user to the fore. Indeed, Roberts (2001)
discusses the scarcity of meaningful intellectual access to humanities
materials and calls for a more thorough and thought-provoking indexing
schema; she posits that pictures are worth a thousand words, or at least a
thousand data access points. It would not be monetarily feasible, or even
functionally desirable, to have librarians or other professional cataloguers
give this much attention to single works of art. This is the stuff of
humanities scholarship, not indexing techniques.
These distinctions and desires for richer indexing techniques are, perhaps
unknowingly, building on work done by scholars like Helen Tibbo (1994), who
discusses the complexities and difficulties inherent in indexing materials
for the humanities scholar, and Joseph Busch (1994), who developed a robust
data model for a historical retrieval system. A very brief and selective
list of image-based digital libraries that would meet Lynch and Besser’s
requirements, as well as (perhaps) meeting Tibbo and Busch’s expectations,
includes the William Blake Archive (William Blake Archive editors and staff,
2000), the Rossetti Archive (McGann, 1993), the Dickinson Electronic
Archives (Smith, 1995), Documenting the American South (University of North
Carolina at Chapel Hill, 2002), and the Electronic Beowulf Project (Kiernan,
1991). It should be noted that these systems (excluding DocSouth and
Beowulf) were devised and developed initially by people who knew very little
about image indexing trends or document retrieval theories. That is, the
users themselves developed these systems. A profitable area of future
study, therefore, might be to look at the usage styles and retrieval rates
of specialized and naïve users of these types of systems, and at how
developers of automatic and manual indexing schemes can use these
user-centered systems as a template for future progress.
Our difficulties in this field should not cause dismay. The tension between
images and written descriptions of those images has been a field of study
since antiquity. In fact, ekphrasis, a Greek term defined as "a vivid
description intended to bring the subject before the mind's eye of the
listener" (Hornblower & Spawforth, 1996), was one of the most advanced
rhetorical skills in the classical period. Giorgio Vasari, commonly
recognized as the world's first art historian, might also be described as
the world's first image cataloger. Vasari used the ekphrastic tradition not
only as a springboard for philosophical discourse, but also as a means of
describing the elements and themes of a painting, making artist
attributions, and defining styles and movements within his field of
expertise, the Italian Renaissance. Perhaps what we need is a more literary
tradition of image retrieval – an ekphrastic tradition of image cataloging.
Cataloging keywords and phrases, blob relationships within an image, and
pixel-level indexing, although obviously difficult problems, are
straightforward compared to “vividly bringing the image to the mind’s eye
of the user.” If the Information Science community wants to meet the needs
of humanities scholars and even the general public – if we want users to
access the resources we’ve spent many years and dollars building and
amassing – we will need to change our idea both of what an operable digital
library is and of how users will want to interact with it.
References
Baca, M. & Harpring, P. (2000). Categories
for the description of works of art. J. Paul Getty Trust.
Baca, M. (1998). Introduction to metadata: pathways
to digital information. Getty Information Institute.
Baca, M., Gilliland-Swetland, A., Harpring, P., & Woodley, M. (2000).
Metadata Standards Crosswalk. Getty Information Institute.
Besser, H. (2002). The next stage: Moving from
digital collections to interoperable digital libraries. First Monday,
7(6), 1-20.
Brilliant, R. (1988).
How an art historian connects art objects and information. Library Trends,
37(2), 120-129.
Busch, J. A. (1994). Thinking ambiguously: Organizing
source materials for historical research. In Challenges in indexing electronic
text and images (pp. 23-55). Medford, NJ: Learned Information, Inc.
Carson, C., Belongie, S., Greenspan, H., & Malik, J. (1999). Blobworld:
Image segmentation using Expectation-Maximization
and its application to image querying. Department of Electrical Engineering
and Computer Science. Available: http://elib.cs.berkeley.edu/carson/papers/pami.html
Carson, C., Thomas, M., Belongie, S., Hellerstein, J. M., & Malik, J.
(2000). Blobworld: A system for region-based image indexing and retrieval.
Third International Conference on Visual Information Systems. Amsterdam:
University of Amsterdam.
Chang, S., Smith, J. R., Beigi, M., & Benitez, A. (1997). Visual information
retrieval from large distributed online repositories. Communications of the
ACM, 40(12), 63-71.
Chen, H. (2001). An analysis of image queries in
the field of art history. Journal of the American Society for Information
Science and Technology, 52(3), 260-273.
Conniss, L. R., Ashford, A. J., & Graham, M. E. (2000). Information seeking
behavior in image retrieval: VISOR I Final report. Newcastle: Institute for
Image Data Research.
Dublin Core. (1999). Dublin Core metadata element
set, version 1.1. Available: http://dublincore.org/documents/dces/
Enser, P. G. B. & McGregor, C. G.
(1993). Analysis of visual information retrieval queries. London: British
Library.
Faloutsos, C., Barber, R., Flickner, M., Hafner, J., Niblack, W.,
Petkovic, D., & Equitz, M. (1994).
Efficient and effective querying by image content. Journal of Intelligent
Information Systems, 3, 231-262.
Flickner, M., Sawhney, H., Niblack, W.,
Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic,
D., Steele, D., & Yanker, P. (1995). Query by image and video
content: The QBIC system. IEEE Computer, 28, 23-32.
Goodrum, A., & Spink, A. (1999).
Visual information seeking: A study of image queries on the World Wide
Web. Proceedings of the 1999 Annual Meeting of the American Society for
Information Science.
Goodrum, A., Rorvig, M. E., Jeong, K.-T., & Suresh, C. (2001). An open
source agenda for research linking
text and image content features. Journal of the American Society for Information
Science and Technology, 52(11), 948-953.
Gordon, A. S. (2001). Browsing image collections
with representations of common-sense activities. Journal of the American
Society for Information Science and Technology, 52(11), 925-929.
Greenberg, J. (1993). Intellectual control
of visual archives: A comparison between the art & architecture thesaurus
and the Library of Congress thesaurus for graphic materials. Cataloging
and Classification Quarterly, 16(1), 85-117.
Greenberg, J. (2001). A quantitative categorical
analysis of metadata elements in image-applicable metadata schemas. Journal
of the American Society for Information Science and Technology, 52(11),
917-924.
Gupta, A., Santini, S., & Jain, R. (1997). In search of information in
visual media. Communications of the ACM, 40(12), 35-42.
Hastings, S. K. (1995a). An exploratory study
of intellectual access to digitized art images. 16th National Online Meeting
proceedings, pp. 177-185.
Hastings, S. K. (1995b). Query categories in
a study of intellectual access to digitized art images. Proceedings of
the 58th Annual Meeting of the American Society for Information Science
(pp. 3-8). Medford, NJ: ASIS.
Heery, R., & Patel, M. (2000). Application
profiles: mixing and matching metadata schemas. Ariadne, September(25).
Hillman, D. (2001). Using Dublin Core. Dublin
Core Metadata Initiative.
Hornblower, S., & Spawforth, A. (Eds.). (1996). The Oxford classical
dictionary. Oxford, New York: Oxford University Press.
Hourihane, C. (1999). Subject classification
for visual collections (Visual Resources Association Special Bulletin
No. 12). Columbus, OH: Visual Resources Association.
Jaimes, A., & Chang, S.-F. (2000).
A conceptual framework for indexing visual information at multiple levels.
IS&T/SPIE Internet Imaging, 3964.
Jörgensen, C. (1996). Testing an image description
template. Proceedings of the 59th Annual meeting of the American Society
for Information Science (ASIS '96) (pp. 209-213). Medford, NJ: Information
Today.
Jörgensen, C. (1999). Retrieving the unretrievable:
Art, aesthetics, and emotion in image retrieval systems. Proceedings SPIE
(Vol. 3644, Human vision and electronic imaging IV) (pp. 348-355). San Jose,
CA: International Society for Optical Engineering.
Jörgensen, C. (1995). Classifying images: Criteria
for grouping as revealed in a sorting task. Advances in classification
6 (Proceedings of the 6th ASIS SIG/CR Classification Research Workshop).
Medford, NJ: Information Today.
Jörgensen, C. (1998). Attributes of images
in describing tasks. Information Processing & Management, 34(2/3),
161-174.
Jörgensen, C. (2000). Image indexing: an analysis
of selected classification systems in relation to image attributes named
by naive users. Final Report to the Office of Sponsored Research, Online
Computer Library Center.
Jörgensen, C., Jaimes, A., Benitez, A. B., & Chang, S.-F. (2001). A
conceptual framework and empirical
research for classifying visual descriptors. Journal of the American Society
for Information Science and Technology, 52(11), 938-947.
Jörgensen, C. (2001). Introduction and Overview.
Journal of the American Society for Information Science and Technology,
52(11), 906-910.
Keister, L. H. (1994). User types and queries:
Impact on image access systems. In Challenges in indexing electronic text
and images. Medford, NJ: Learned Information, Inc.
Kiernan, K. S. (1991). Digital image processing
and the Beowulf manuscript. Literary and Linguistic Computing, 6(Special
issue on Computers and Medieval Studies (Edited by Marilyn Deegan with
Andrew Armour and Mark Infusino)), 20-27.
Layne, S. S. (1994). Some issues in the indexing
of images. Journal of the American Society for Information Science, 45,
583-588.
Li, B., & Ma, S. D. (1994). On the relation
between region and contour representation. Proceedings of the IEEE International
Conference on Pattern Recognition.
Library of Congress Prints and Photographs Division.
(2000). Thesaurus for Graphic Materials I: Subject Terms.
Lynch, C. (2002). Digital collections, digital
libraries and the digitization of cultural heritage information. First
Monday, 7(5), 1-17.
Ma, W., & Manjanath, B. (1998).
Netra: A toolbox for navigating large image databases. (pp. 568-571).
Marchionini, G. (1995). Information seeking
in electronic environments. Cambridge: Cambridge University Press.
Markey, K. (1984). Interindexer consistency tests:
A literature review and report of a test of consistency in indexing visual
materials. Library & Information Science Research, 6(2), 155-177.
McGann, J. J. (1993). The Complete Writings and
Pictures of Dante Gabriel Rossetti: A Hypermedia Research Archive.
MPEG-7 Requirements Group. (1999). Context, objectives
and technical roadmap. (Report No. ISO/IEC JTC1/SC29/WG11 MPEG99/N2861).
Vancouver, Canada.
Peterson, T. (1990). Developing a new thesaurus
for art and architecture. Library Trends, 38(4), 644-658.
Roberts, H. E. (2001). A picture is worth a thousand
words: art indexing in electronic databases. Journal of the American Society
for Information Science and Technology, 52(11), 911-916.
Rui, Y., Huang, T. S., & Chang,
S.-F. (1999). Image retrieval: Current techniques, promising directions
and open issues. Journal of Visual Communication and Image Representation,
10, 1-23.
Smith, J. R., & Chang, S.-F. (1996).
VisualSEEk: A fully automated content-based image query system. Proceedings
of the ACM International Conference on Multimedia (ACMMM).
Smith, J. R. (2001). Quantitative assessment of
image retrieval effectiveness. Journal of the American Society for Information
Science and Technology, 52(11), 969-979.
Smith, J. R., & Chang, S.-F. (1994).
Transform features for texture classification and discrimination in large
image databases. Proceedings IEEE International Conference on Image
Processing.
Smith, J. R., & Chang, S.-F. (1995). Tools
and techniques for color image retrieval. Storage & Retrieval for
Image and Video Databases: Vol. IV. IS&T / SPIE Proceedings.
Smith, M. N. (1995). The importance of a hypermedia
archive of Dickinson's creative work. The Emily Dickinson Journal, IV(1).
Swain, M., & Ballard, D. (1991).
Color indexing. International Journal of Computer Vision, 7(1), 11-32.
Tamura, H., Mori, S., &
Yamawaki, T. (1978). Texture features corresponding to visual perception.
IEEE Transactions on Systems, Man, and Cybernetics, 8(6), 460-473.
Tibbo, H. (1994). Indexing for the humanities.
Journal of the American Society for Information Science, 45(8), 607-619.
University of North Carolina at Chapel Hill. (2002).
Documenting the American South. Natasha Smith.
VRA Data Standards Committee. (2002).
Wilensky, R. (2000).
Research Accomplishments: Content Analysis for Access. Berkeley, CA:
University of California, Berkeley.
William Blake Archive editors and staff.
(2000). The persistence of vision: Images and imaging at the William Blake
archive. RLG DigiNews, 4(1).
Yang, S. C. (1997). Qualitative exploration of learners'
information-seeking processes using Perseus hypermedia system. Journal
of the American Society for Information Science, 48(7), 667-669.