Subject Tagging: Recommendations for Dryad Curators and Scientists

by Priscilla Jane Smith

1. Problem Specification
2. The value of quality descriptive metadata
3. Controlled terms
4. Uncontrolled terms
5. Recommendations
6. Further resources for authors
7. Bibliography

PROBLEM SPECIFICATION

The Dryad project is an online repository for the data which underlie publications in the biosciences. Submissions to Dryad consist of the data that is used to create scientific publications, for example phylogenetic trees, tables, spreadsheets, images, maps, gene alignments, matrices, and the like. When authors submit their data to Dryad, they have the opportunity to enter topical headings/subject headings that will be used to categorize and retrieve their data in the future.

There are currently four types of topical metadata that Dryad accepts: subject, temporal, spatial, and taxonomic. Because these headings are not controlled in any way, scientists often do not submit any headings (leave the fields blank) or submit them in ways that might not benefit future users of Dryad (poor formatting, abbreviations, etc.). The data is much more accessible to users of Dryad - other scientists, the public, Dryad curators - if the data is described in a meaningful and thorough way.

Studying this topic has allowed me to gain a great understanding of how scientists tag their data. Do they come up with their own topical terms, do they consult controlled vocabularies, or do they use some other source for these terms? The goal of this project is to write a memo or guide that will help Dryad librarians guide scientists to describe their data in the best way possible by submitting meaningful subject headings/topical headings with their data. The benefits of scientific data archiving are best summed up in Whitlock’s 2011 article Data archiving in ecology and evolution: best practices:

“Data archives serve science in a variety of ways. Publicly archived data enable more transparent science, with better error checking and verification of results. Archiving also enables data to be re-used for broader meta-analyses and to address new questions. Available data can serve a powerful educational role, both in teaching the statistical and technical aspects of research and to engage students in the process of science. Public data archiving is also a powerful mechanism for data security, providing a mechanism by which data can be saved and re-accessed by the original authors and others even after hard disk failure or other catastrophes.”

THE VALUE OF QUALITY DESCRIPTIVE METADATA

METRICS FOR METADATA QUALITY

Metadata are defined as the data providing information about aspects of the data. This may include the means of the creation of data, the purpose of the data, the time and date of the creation, the name of the creator or author of the data, the location where the data was created, or descriptions of the data’s about-ness.

There has been much research on criteria which can and should be used to measure the quality of this metadata. Rothenberg (1996) identifies correctness and appropriateness as two main criteria for data evaluation. In their 1997 article The Role of Content Analysis in Evaluating Metadata for the US Government Information Locator Service (GILS): Results from an exploratory study, Moen et al. describe 23 evaluation criteria with which to analyze metadata quality. From these 23 criteria, the researchers distilled four main criteria: accuracy, consistency, completeness, and currency. These four criteria partially overlap with Tozer’s (1999) data quality measures of accuracy, consistency, completeness, timeliness, and intelligibility. Bruce and Hillman (2004) refine the previously mentioned criteria and modify them for the library community, suggesting completeness, accuracy, provenance, conformance to expectation, logical consistency, coherence, timeliness, and accessibility. In her 2009 article Metadata Quality in Digital Repositories: A Survey of the Current State of the Art, Park evaluates Moen et al. (1997) and Bruce and Hillman’s (2004) criteria in addition to the criteria determined by seven other research teams, and determines that accuracy, completeness, and consistency are the most commonly used criteria in measuring metadata quality.

Accuracy (or correctness) of metadata indicates the accurate description and input of data. According to Park (2009), this is made up of three elements: the accurateness of the content of the data element, the correctness of the intellectual property, and the correctness of the instantiation (or particular instance). Park includes errors in spelling, date format, capitalization, punctuation as well as non-authoritative forms of terms, typographical errors and incorrect data values as potential problems with the accuracy of metadata.

According to Park, completeness can be measured by full access capacity to individual resources and connection to the collections in which they are housed. Completeness points to resource discovery as the functional purpose of metadata, but does not necessarily indicate that all elements in a particular metadata scheme must be used. Because completeness is directly affected by policies, best practices, and application profiles for specific domains, the completeness of metadata records may vary depending on the environment in which they are housed. Completeness can be achieved in a metadata record if the given resource type, its relation to the local collection, and the local metadata guidelines are met satisfactorily.

Consistency (or comparability) can be measured by examining the values of the metadata and the format of the metadata. Park states that metadata values must be examined on the conceptual level by measuring the degree to which the same data values are used for delivering similar concepts in the description of a resource, and the data format must be examined on a structural level by measuring the extent to which the same structure or format is used for presenting similar attributes of a resource. For example, differences between the encoding of a date element (e.g., MM-DD-YYYY versus DD-MM-YY) are structurally inconsistent, and may cause problems for future users of the data.
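The structural side of consistency can be illustrated with a minimal Python sketch that reports which date formats appear in a set of values. The pattern names and the helper functions are hypothetical and are not part of Dryad or of Park's study. The sketch only inspects the shape of each value, so it cannot tell MM-DD-YYYY from DD-MM-YYYY; that distinction still requires documentation or human review.

```python
import re

# Hypothetical structural patterns, chosen for illustration only.
# "NN" stands for a two-digit field whose meaning (month or day) is unknown.
DATE_PATTERNS = {
    "YYYY-MM-DD": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "NN-NN-YYYY": re.compile(r"^\d{2}-\d{2}-\d{4}$"),
    "NN-NN-YY": re.compile(r"^\d{2}-\d{2}-\d{2}$"),
}


def classify_date_format(value):
    """Return the name of the first pattern that matches, or 'unrecognized'."""
    text = value.strip()
    for name, pattern in DATE_PATTERNS.items():
        if pattern.match(text):
            return name
    return "unrecognized"


def formats_in_use(values):
    """Count how many values fall under each structural format."""
    counts = {}
    for value in values:
        name = classify_date_format(value)
        counts[name] = counts.get(name, 0) + 1
    return counts


def is_structurally_consistent(values):
    """True when every value shares one recognized format."""
    counts = formats_in_use(values)
    return len(counts) == 1 and "unrecognized" not in counts
```

A curator could run `formats_in_use` over a date column to see at a glance whether more than one encoding is present before deciding how to normalize it.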

HUMAN METADATA GENERATION

According to Greenberg and Robertson (2002), human metadata generation takes place when an individual is responsible for the identification and assignment or recording of resource materials. This type of metadata generation may take place in a number of ways by different types of individuals. The three types of individuals that have been identified in the literature are professional metadata creators, resource authors, and social taggers. Professional metadata creators may be catalogers, indexers, or curators who have formal training and are proficient in the use of descriptive standards (Greenberg et al. 2002). According to Lu et al. (2010), social taggers apply their own descriptors to sources that interest them. Resource authors, like the submitters of data to Dryad, are the individuals responsible for the creation of the intellectual content of a work. Most importantly, they are “intimate with their creations and have knowledge of unrecorded information for producing descriptive metadata,” allowing them to have a unique ability to describe their data with the highest accuracy (Greenberg and Robertson 2002).

The fact that resource authors may lack the knowledge of indexing and cataloging principles that professional metadata creators possess is well documented (Greenberg et al. 2003). However, attitudes toward resource authors as metadata creators are somewhat split in the literature. Wilson, in her 2007 article Toward Releasing the Metadata Bottleneck, states that resource authors “seldom provide sufficient metadata for their digital resources,” and Greenberg et al. (2003) state that authors have “reported confusion or uncertainty regarding specific fields and have requested greater assistance in determining appropriate inputs, especially for subject fields.” However, Greenberg et al. also report that resource authors state a desire for better understanding of the metadata record and its purpose. Currier et al. (2003) discuss the debate between allowing metadata professionals or resource authors to create metadata for resources:

“How may this difficult and complex task best be carried out for maximum resource discoverability by a heterogeneous population of searchers? Should the resource author, who may know their subject area and its terminology well, create the subject metadata? Or should it be a metadata specialist, who may know the specific area less well, but may be better placed to step back and think about all the potential users of a resource, and about consistency of key words and classifications across a repository or network?”

MODELS FOR CREATING METADATA

Currier et al. (2003) recommend three models for metadata creation: creation by a resource author only, creation by a metadata specialist only, or creation by collaboration between a resource author and a metadata specialist. A data collection center (like Dryad) would be well advised to ensure that its system provides user support and training if it is to rely on metadata creation by resource authors alone. Conversely, metadata specialists, who already possess the skills needed to create a quality metadata record, may lack the knowledge about the context, history, or subject area of the resource needed to best record its metadata. Currently, Dryad uses a semi-collaborative approach to metadata creation. Although resource authors do not consult with the curators, or metadata specialists, while they are submitting data, the curators spend time checking the author-created metadata for quality issues. Greenberg and Robertson (2002) recommend this model, stating that “…the integration of expert and author generated descriptive metadata can advance and improve the quality of metadata for web content, which in turn could provide useful data for intelligent web agents, ultimately supporting the development of the Semantic Web. […] If such partnerships are well planned and evaluated, they could make a significant contribution to achieving the Semantic Web.” Along with the three models for metadata creation that Currier et al. (2003) present, social tagging and the rise of folksonomies should be mentioned as a fourth model. Folksonomies will be discussed in a later section of this paper.

CONTROLLED TERMS

WHAT IS A CONTROLLED VOCABULARY?

A controlled vocabulary allows for organization of some content, or knowledge, in a way in which it can be easily retrieved at a later time. Vocabularies are 'controlled' in that they make use of authorized descriptions of the content they contain. These groupings of concepts are carefully selected and described so that the information they contain can be retrieved in the most efficient ways possible.

THE VALUE OF CONTROLLED TERMS

Natural language, or the way that humans speak in everyday life, is messy. We use multiple terms and phrases to describe the same things, and there are fine (or grey) lines between one meaning and another. The organization, categorization, and labeling of our knowledge can be achieved by way of controlled vocabularies. A controlled vocabulary allows for all concepts to be consistently labeled using language that is unambiguous and is familiar to its users. More importantly, controlled vocabularies allow us to search for concepts and achieve successful, quality results.

CONTROLLED VOCABULARIES USEFUL IN DESCRIBING DRYAD DATA

Medical Subject Headings (MeSH) is the controlled vocabulary of the United States National Library of Medicine. Currently consisting of more than 177,000 terms situated in a twelve-level hierarchy, MeSH allows for the indexing of articles from biomedical journals for the MEDLINE/PubMed database. MeSH comprises three main types of terms: descriptors (main headings), qualifiers (subheadings), and supplementary concept records (SCRs). Descriptors indicate the subject of citations indexed in MEDLINE/PubMed, and the 83 existing topical qualifiers allow for the grouping together of citations concerned with a particular aspect of a subject. SCRs index chemicals and drugs and are searchable by substance name in PubMed.

The Getty Thesaurus of Geographic Names (TGN) is a controlled vocabulary provided by the Getty Vocabulary Program of the J. Paul Getty Trust. TGN currently includes approximately 1,106,000 hierarchically arranged terms that describe names and associated information about places, including current and historical physical features and political entities. Each term entry includes a unique identification number, known as a subject ID, text description about the place, geographical coordinates, associated place-names, dates referring to the usage of those names, position of the entry in the TGN hierarchy, information about related places, information about the type of place described in the entry, and information about the data source.

Developed by the White House Subcommittee on Biodiversity and Ecosystem Dynamics, the Integrated Taxonomic Information System (ITIS) is a controlled vocabulary for describing ecosystem management and biodiversity conservation. Information about each species includes an authoritative scientific title, a taxonomic rank and serial number, associated synonyms and vernacular names, information about the data source, and data quality indicators.

The BIOSIS Controlled Vocabulary contains multiple lists of terms used in the BIOSIS Previews and Biological Abstracts databases. The vocabulary is organized into several categories including concepts, organism classifiers, and geopolitical locations, among others. The vocabulary includes 168 “major concepts” and 562 “concept codes” used for subject or topical indexing; 77 “organism classifiers” and 957 “super taxa” used for taxonomic data; and 316 geopolitical locations.

The National Biological Information Infrastructure (NBII) was a program coordinated by the United States Geological Survey's Biological Informatics Program Office. Its purpose was to facilitate access to data and information on the biological resources of the United States, utilizing government agencies, academic institutions, non-government organizations, and private industry. The NBII Biocomplexity Thesaurus, an online thesaurus of scientifically reviewed biological terms, was initially created through a merger of several individual thesauri, including the CSA Aquatic Sciences and Fisheries Thesaurus, the Cambridge Scientific Abstracts (CSA) Life Sciences Thesaurus, the CSA Pollution Thesaurus, the CSA Sociological Thesaurus, the CERES/NBII Thesaurus, and the CSA Ecotourism Thesaurus. The thesaurus includes over 15,000 terms on subjects such as aquatic sciences, life sciences, social sciences, ecotourism, and pollution. Development and web hosting of the NBII was terminated on 15 January 2012.

The Library of Congress Subject Headings (LCSH) is a controlled vocabulary for use in subject cataloging and indexing. First published in 1898, LCSH was designed for and is maintained by the Library of Congress, but the system has been adopted by many other libraries. LCSH covers all subjects generally. Subject headings can consist of single words or phrases and are divided into two types: main headings and subheadings. LCSH uses four categories of subdivisions to further distinguish main heading topics: form subdivisions, geographical subdivisions, chronological subdivisions, and topical subdivisions.

AGROVOC is a multilingual controlled vocabulary covering all areas of interest to the Food and Agriculture Organization of the United Nations (FAO), including food, nutrition, agriculture, fisheries, forestry, and the environment. AGROVOC contains over 30,000 concepts organized in a hierarchy, and concepts may have labels in up to 22 languages.

Wilson and Reeder’s Mammal Species of the World is an online database of mammalian taxonomy. Use of the Mammal Species of the World, through search or taxonomic browsing, allows users to verify recognized scientific names and conduct taxonomic research.

CAB Thesaurus is the research tool for users of the CAB ABSTRACTS™ and Global Health databases. The thesaurus includes over 200,000 terms and covers a broad range of topics in the applied life sciences, technology, and social sciences.

The GeoRef Thesaurus contains 23,065 valid and 7,740 invalid terms, of which about 1,780 are newly added. The Thesaurus is a guide to the index terms used in GeoRef, a database consisting of bibliographic citations and abstracts covering the field of geology and its allied environmental sciences. For each term, the Thesaurus includes hierarchical and other relationships, usage notes, dates of addition, indexing rules, geographic coordinates, and guidelines for searching. Cross-references from invalid to valid terms are included.

The National Agricultural Library’s NAL Agricultural Thesaurus includes terminology which supports biological, physical and social sciences. Biological nomenclature comprises a majority of the terms in the thesaurus and is located in the “Taxonomic Classification of Organisms” Subject Category. Political geography is also included, and is mainly described at the country level.

uBio is an initiative within the science library community to join international efforts to create and utilize a comprehensive and collaborative catalog of known names of all living (and once-living) organisms. uBio’s Taxonomic Name Server (TNS) catalogs names and classifications to enable tools that can help users find information on living things using any of the names that may be related to an organism.

UNCONTROLLED TERMS

Uncontrolled terms, or tags, are taken directly from natural language. Because they do not possess the same characteristics as controlled vocabulary terms, they offer both advantages and disadvantages relative to controlled vocabulary terms in the submission of data to Dryad.

THE VALUE OF UNCONTROLLED TERMS

According to Noruzi (2006), uncontrolled terms, or tags, are words or phrases users attach to resources that may help in later retrieval of that resource. These tags have no fixed categories, syntaxes, or standards. However, because no time is spent developing standards or categorizations for these tags, they carry little overhead to create. They are created precisely at the point of submission, and potentially cost little time and effort on the part of the user, be it a resource author or a social tagger. Lu et al. (2010) present several compelling advantages to the use of tags. First, they state that tags may help to bridge the gap between professional and public discourse by providing a source of terms not included in controlled vocabularies. Second, they mention that tags not only allow users to search resources in their own language, but also provide a window for libraries to understand and learn more about user information needs and interests. Third, their findings show that social taggers may help enhance subject access to collections by describing resources with terms different from those used by experts (this final sentiment is echoed in Rolla 2009).

Along with these many advantages, uncontrolled terms also pose several important and challenging disadvantages. Because tags are not controlled in any way, one individual’s tags may conflict with another individual’s tags. These conflicts may manifest themselves as polysemy (words that have several meanings), synonymy (different words with similar or identical meanings), plurality (inconsistencies in the use of plurals), or granularity (inconsistencies in the depth or specificity of tags). Any of these problems may lead to low precision in searching.
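Of these four problems, plurality is the easiest to detect mechanically. The following Python sketch groups tags that differ only in capitalization, spacing, or a single trailing "s". The function names and the normalization rule are assumptions made for illustration and do not describe any Dryad feature. The rule is deliberately crude (it would mishandle a word such as "species"), and polysemy, synonymy, and granularity are out of its reach, since they require a controlled vocabulary or human judgment.

```python
from collections import defaultdict


def normalization_key(tag):
    """Crude comparison key: lowercase, collapse spaces, drop one trailing 's'."""
    key = " ".join(tag.casefold().split())
    # Leave short words and words ending in 'ss' (e.g. "grass") untouched.
    if len(key) > 3 and key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key


def group_variants(tags):
    """Return only the keys onto which more than one distinct spelling collapses."""
    groups = defaultdict(set)
    for tag in tags:
        groups[normalization_key(tag)].add(tag)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }
```

Each group returned is a candidate inconsistency for a curator to review; the sketch flags variants but does not decide which spelling should be preferred.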

FOLKSONOMIES

A portmanteau of the words folks and taxonomy, folksonomy is an internet-based information retrieval methodology consisting of collaboratively generated, open-ended labels that categorize content such as web resources, online photographs, and web links (Noruzi 2006). Folksonomies are created by social taggers, not information professionals, and these taggers assign one or more tags to each resource for their own individual use, which are then shared through a community.

Much research has been done to study the use of user tags versus the use of controlled vocabulary terms. Lu et al. (2010) found that only a fraction of tag vocabulary terms overlap with LCSH terms, and even those overlapping terms might be used by social taggers and information professionals in different ways. Rolla (2009) reports that users of the LibraryThing social cataloging web application assign tags that range in depth from general to specific, whereas LCSH terms assigned to corresponding bibliographic records are more general in nature. In addition, cataloger-assigned LCSH terms in approximately 55% of bibliographic records brought out topics or concepts that LibraryThing tags did not, and approximately 75% of the time catalogers and taggers agreed on at least a portion of what a book is ‘about’. In conclusion, the Library of Congress Working Group on the Future of Bibliographic Control reported in 2008 that “allowing user-supplied data in online catalogs will make the catalogs more relevant to users accustomed to the internet and also will improve access to the materials in the library collection.”

RECOMMENDATIONS

HOW TO USE THE DESCRIPTIVE METADATA FIELDS

In recommending best practices for providing terms for the descriptive metadata fields in Dryad, I would urge data submitters to consider the accuracy, consistency, and completeness of the chosen terms before submission. In addition, the use of controlled vocabulary terms is suggested, but not required. Submitters are encouraged to weigh the benefits and disadvantages of submitting their own tags versus authorized controlled vocabulary terms. The following sections highlight the four individual keyword fields and give specific instructions on how best to provide terms.

SUBJECT KEYWORDS

Screenshot of Dryad subject keyword submission field

Submitters of data to Dryad are required to include at least one subject keyword with their data submission. Within the Dryad submission system this field is repeatable, meaning that a submitter may include as many subject keywords as they choose. Multiple subject keywords are entered by separating individual keywords with commas. Using semicolons, dashes, periods, or any other type of punctuation will result in a list of keywords concatenated into a single keyword. For example, if the two subject keywords Facial structure and Testosterone are entered into the subject field as [Facial structure; Testosterone], the two will be recognized in Dryad’s system as a single entity and will be shown as one subject keyword, “Facial structure; Testosterone”, not as two separate subject keywords, “Facial structure” and “Testosterone.”
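The comma rule can be sketched in a few lines of Python. This is an illustration of the behavior described above under the assumption that the field is split on commas and nothing else; it is not Dryad's actual code, and the function names and the list of suspect separators are hypothetical.

```python
# Separators that a submitter may have typed in place of a comma.
# Hypothetical list, chosen for illustration only.
SUSPECT_SEPARATORS = (";", " - ", "|")


def split_keywords(field_value):
    """Split a keyword field on commas only, trimming spaces and dropping blanks."""
    return [part.strip() for part in field_value.split(",") if part.strip()]


def find_suspect_keywords(keywords):
    """Return keywords that still contain a separator other than a comma."""
    return [
        keyword
        for keyword in keywords
        if any(separator in keyword for separator in SUSPECT_SEPARATORS)
    ]
```

With this sketch, `split_keywords("Facial structure, Testosterone")` yields two keywords, while the semicolon-separated form yields a single keyword that `find_suspect_keywords` would flag for a curator's attention.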

According to Dryad’s metadata schema, the subject keyword field is associated with the Dublin Core metadata term Subject. According to the Dublin Core Metadata Initiative (DCMI), a subject “will be represented using keywords, key phrases, or classification codes.” DCMI also recommends the use of controlled vocabulary terms for use in the Subject field. Because Dryad does not support the use of integrated controlled vocabularies, data submitters may either create their own subject keywords or they may draw from any controlled vocabulary they choose.

RECOMMENDED CONTROLLED VOCABULARIES

The following controlled vocabularies are recommended for reference in adding subject keywords to Dryad data submissions:

TEMPORAL KEYWORDS

Screenshot of Dryad temporal keyword submission field

Submitters of data to Dryad are not required to include temporal keywords with their data submission, but are urged to do so if the field is applicable to the nature of the data. This field in the submission system is also repeatable, meaning that a submitter may include as many temporal keywords as they choose. Again, multiple temporal keywords are entered by separating individual keywords with commas, and using semicolons, dashes, periods, or any other type of punctuation will result in a list of keywords concatenated into a single keyword.

According to Dryad’s metadata schema, the temporal keyword field is associated with the Dublin Core metadata term Temporal. According to the DCMI, a temporal keyword should be used to describe “temporal characteristics of the resource.”

RECOMMENDED CONTROLLED VOCABULARIES

The following controlled vocabularies are recommended for reference in adding temporal keywords to Dryad data submissions:

SPATIAL KEYWORDS

Screenshot of Dryad spatial keyword submission field

Submitters of data to Dryad are not required to include spatial keywords with their data submission, but are urged to do so if the field is applicable to the nature of the data. This field in the submission system is also repeatable, meaning that a submitter may include as many spatial keywords as they choose. Again, multiple spatial keywords are entered by separating individual keywords with commas, and using semicolons, dashes, periods, or any other type of punctuation will result in a list of keywords concatenated into a single keyword. Submitters of data to Dryad should note that locations with multi-part names, such as Los Angeles, California, will be automatically split into two terms.

According to Dryad’s metadata schema, the spatial keyword field is associated with the Dublin Core metadata term Spatial. According to the DCMI, a spatial keyword should be used to describe “spatial description of the dataset specified by a geographic description and geographic coordinates.” The instructions given for entering data into this field in the Dryad submission system indicate that “locations may include names of cities, regions, or coordinates.” Like the DCMI, the Dryad curation team recommends using terms from standard taxonomies or controlled vocabularies for use in the spatial keyword field.

RECOMMENDED CONTROLLED VOCABULARIES

The following controlled vocabularies are recommended for reference in adding spatial keywords to Dryad data submissions:

TAXONOMIC KEYWORDS

Screenshot of Dryad taxonomic keyword submission field

Submitters of data to Dryad are not required to include taxonomic keywords with their data submission, but are urged to do so if the field is applicable to the nature of the data. This field in the submission system is also repeatable, meaning that a submitter may include as many taxonomic keywords as they choose. Again, multiple taxonomic keywords are entered by separating individual keywords with commas, and using semicolons, dashes, periods, or any other type of punctuation will result in a list of keywords concatenated into a single keyword.

According to Dryad’s metadata schema, the taxonomic keyword field is associated with the Darwin Core metadata term Specific Epithet. According to the Biodiversity Information Standards (TDWG), a taxonomic keyword should be used to describe “The specific epithet of the scientific name applied to the organism.” The instructions given for entering data into this field in the Dryad submission system indicate that taxonomic keywords should be used to describe “the full name of the lowest level taxon to which the organism has been identified in the most recent accepted determination, specified as precisely as possible.” Like the DCMI, the Dryad curation team recommends using terms from standard taxonomies or controlled vocabularies for use in the taxonomic keyword field. The following quotation, taken from the Borer et al. (2009) article Some Simple Guidelines for Effective Data Management, discusses the problems associated with taxonomic keywords and suggests solutions.

“Over time, the names of taxa often are changed as their evolutionary relationships are clarified. The same taxonomic name can actually refer to two or more different concepts of a species. However, scientific names in ecological data are fixed as originally recorded, and so it is critical for long-term preservation to document which taxonomic descriptions were intended by each taxon name used in a data set. This becomes particularly important when comparing species information from data collected at different times, as the names used in the data sets can be ambiguous, which affects calculations of diversity and richness, among other issues. The best way to clarify a taxonomic name is to document the taxonomic authority you are using for the name. For example, Homo sapiens Linn. clarifies that the authority for this binomial is Linnaeus. Unfortunately, there are several formats to choose from for specifying taxonomic authority information, but any reference information is better than none.”

RECOMMENDED CONTROLLED VOCABULARIES

The following controlled vocabularies are recommended for reference in adding taxonomic keywords to Dryad data submissions:

RESOURCES FOR SCIENTISTS

The following resources are intended to give authors further assistance with the use of controlled and uncontrolled terms in Dryad.

BIBLIOGRAPHY

American Geosciences Institute. (2013). GeoRef Thesaurus Lists.

Babinec, M. and Mercer, H. (2009) Introduction: Metadata and digital repositories. Cataloging & Classification Quarterly. 47: 209-212.

Bates, Marcia J. Encyclopedia of Library and Information Sciences. 3rd ed, eds M.J. Bates and M.N. Maack. Boca Raton, FL: CRC Press, 2010. Print.

Borer, E.T., Seabloom, E.W., Jones, M.B., and Schildhauer, M. (2009) Some Simple Guidelines for Effective Data Management. Bulletin of the Ecological Society of America 90(2): 205-214.

Bruce, T.R. and Hillman, D. (2004). The Continuum of Metadata Quality: Defining, Expressing, Exploiting. In Metadata in Practice, eds. D. Hillman and E.L. Westbrooks (Chicago: American Library Association).

CAB International. (2013). CAB Thesaurus.

Carrier, Sarah W. (2008) The Dryad Repository Application Profile: Process, Development, and Refinement. A Master’s paper for the M.S. in I.S. degree.

Currier, Sarah and Barton, Jane. (2003) Quality Assurance for Digital Learning Object Repositories: How Should Metadata Be Created? Communities of Practice. ALT-C 2003 Research Proceedings.

Food and Agriculture Organization of the United Nations. (2013). AGROVOC.

Greenberg, Jane and Robertson, W. Davenport. (2002) Semantic Web construction: An inquiry of author’s views on collaborative metadata generation. Proc. Int. Conf. on Dublin Core and Metadata for e-Communities. 45-52.

Greenberg, Jane et al. (2002) Author-generated Dublin Core Metadata for Web Resources: A Baseline Study in an Organization. Journal of Digital Information 2(2).

Greenberg, Jane et al. (2003) Iterative Design of Metadata Creation Tools for Resource Authors. DC-2003--Seattle Proceedings.

ITIS Integrated Taxonomic Information System. (2013).

J. Paul Getty Trust. (2013). Getty Thesaurus of Geographic Names® Online. Retrieved from http://www.getty.edu/vow/TGNSearchPage.jsp

Library of Congress. (2013). Library of Congress Subject Headings.

Library of Congress Working Group on the Future of Bibliographic Control (2008) On the Record: Report of the Library of Congress Working Group on the Future of Bibliographic Control.

Lu, C., Park, J. and Hu, X. (2010) User tags versus expert-assigned subject terms: A comparison of LibraryThing tags and Library of Congress Subject Headings. Journal of Information Science 36(6): 763-779.

Marine Biological Laboratory. (2013). uBio Universal Biological Indexer and Organizer.

Moen, W.E., Stewart, E.L., and McClure, C.R. (1997) The Role of Content Analysis in Evaluating Metadata for the US Government Information Locator Service (GILS): results from an exploratory study.

National Center for Biotechnology Information, U.S. National Library of Medicine. (2013). MeSH.

Noruzi, Alireza. (2006) Folksonomies: (Un)Controlled Vocabulary? Knowledge Organization 33(4): 199-203.

Page, Roderic. (2006) Taxonomic names, metadata, and the Semantic Web. Biodiversity Informatics 3.

Park, Jung-Ran. (2009) Metadata quality in digital repositories: A survey of the current state of the art. Cataloging & Classification Quarterly 47: 213-228.

Rolla, Peter J. (2009) User tags versus subject headings: Can user-supplied data improve subject access to library collections? Library Resources & Technical Services 53(3): 174-184.

Rothenberg, J. (1996) Metadata to Support Data Quality and Longevity. 1st IEEE Metadata Conference, Silver Spring, Maryland.

Smithsonian Institution. (2013). Wilson & Reeder’s Mammal Species of the World.

Strader, C. Rockelle. (2009) Author-Assigned Keywords versus Library of Congress Subject Headings. Library Resources & Technical Services 53(4): 243-250.

Thomson Reuters. (2013). BIOSIS.

Tozer, G. (1999) Metadata Management for Information Control and Business Success. Boston: Artech House.

Trant, Jennifer. (2009) Studying Social Tagging and Folksonomy: A Review and Framework. Journal of Digital Information 10(1).

U.S. Department of Agriculture National Agricultural Library. (2013) Thesaurus: Browse by Subject Category.

Whitlock, M.C. (2011) Data archiving in ecology and evolution: best practices. Trends in Ecology & Evolution 26(2): 61–65.

Wilson, A.J. (2007) Toward Releasing the Metadata Bottleneck: A Baseline Evaluation of Contributor-supplied Metadata. Library Resources & Technical Services 51(1): 16-27.