SKOS: a beginner's guide

A guide to representing structured controlled vocabularies in the Simple Knowledge Organization System

by Priscilla Jane Smith


The purpose of this guide is to allow catalogers, librarians, and other information professionals to understand and use the Simple Knowledge Organization System, a W3C standard designed for the representation of controlled vocabularies on the Web.

  1. Introduction
  2. Foundational Standards and Context
  3. A Very Brief History of SKOS
  4. What is SKOS?
  5. The Elements of SKOS
  6. SKOS Integrity Conditions
  7. References

Introduction

Throughout history, classification systems have been widely used in the library community. Knowledge organization systems, or controlled structured vocabularies, are a growing area within the field of classification systems. Within a Web context, several formats have been proposed for representing controlled vocabularies, in a structured way, using the Web standards XML and RDF.

There is an important distinction to be made between controlled vocabularies which have been published to the Web and those which have been published in a structured way specifically for the Web. Natural-language vocabularies, such as simple subject heading lists, thesauri and back-of-the-book indexes, are extremely useful for humans, but the amount of meaning that machines can derive from them is very limited. Linked Open Data and Linked Open Vocabularies are both Semantic Web approaches that further the work of publishing controlled vocabularies for the Web in a way that lets both humans and machines recognize meaning. The creation of Linked Open Vocabularies provides more, and more meaningful, points of access and makes information retrieval more effective.

So, what exactly is a controlled vocabulary? Simply put, a controlled vocabulary allows for organization of some content, or knowledge, in a way in which it can be easily retrieved at a later time. These vocabularies are 'controlled' in that they make use of authorized descriptions of the content they contain. These groupings of concepts are carefully selected and described so that the information they contain can be retrieved in the most efficient ways possible. Take, for example, this very small collection of terms called the veggie vocab:

[Figure: veggie_vocab, alphabetical list of terms]

The above group of terms about vegetables is listed in alphabetical order. At a glance, a human can recognize that there are foods and Latin names for species in the list. However, the same information can be reflected in a different way that makes it much easier for humans to understand the relationships between the different terms:

[Figure: veggie_vocab, hierarchical visualization of the terms]

It is easy to see from this visualization that the vocabulary is about types of vegetables, Vegetables being the “top” term in the vocabulary. The terms Bean, Root, and Gourd are the three terms that are directly below Vegetables. These terms describe varieties of vegetables. Below this second level of terms is a group of terms for actual vegetables, like Potato and Pumpkin. So far, the relationships between terms have been hierarchical: each term is broader or narrower in scope than the terms it is linked to. The two types of hierarchical relationships in controlled vocabularies are broader term (BT) and narrower term (NT). For example, Root is a narrower term in relation to Vegetables, but a broader term in relation to Parsnip.

Another way to visualize the vocabulary is as a hierarchical report:

[Figure: veggie_vocab, hierarchical report]

Here, we can see the levels of the terms and the terms to which they are related. This figure also includes the term Veggie with the label UF. An equivalency relationship, expressed as use for (UF) or use (USE), allows the vocabulary to make connections between synonyms and near-synonyms. In this particular vocabulary, Vegetable is an authorized or preferred term, and Veggie is an unauthorized or non-preferred term for the same concept.

The final type of relationship that can be reflected in a vocabulary is an associative relationship. This type of relationship allows the vocabulary to make connections between related terms (RT), or terms that have neither hierarchical nor equivalency relationships. For example, Fruit might be a related term to Vegetable.
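The three relationship types described above can be sketched as a small data structure. The following is a minimal Python illustration, not part of any standard: the dictionary layout and the helper names (narrower_terms, preferred, is_related) are assumptions made for this example, and the terms are taken from the veggie vocab.

```python
# Hypothetical in-memory model of a tiny thesaurus (illustration only).
# BT: each term maps to its broader term.
BROADER = {
    "Bean": "Vegetable",
    "Root": "Vegetable",
    "Gourd": "Vegetable",
    "Parsnip": "Root",
    "Sweet Potato": "Root",
    "Pumpkin": "Gourd",
}
# UF/USE: non-preferred term -> preferred term.
USE = {"Veggie": "Vegetable"}
# RT: associative links, stored once and treated as symmetric.
RELATED = {frozenset({"Vegetable", "Fruit"})}


def narrower_terms(term):
    """Derive NT from BT: every term whose broader term is `term`."""
    return sorted(t for t, bt in BROADER.items() if bt == term)


def preferred(term):
    """Follow a USE reference, if any, to the authorized term."""
    return USE.get(term, term)


def is_related(a, b):
    """True if an RT link exists between the two terms, in either order."""
    return frozenset({a, b}) in RELATED


print(narrower_terms("Root"))            # ['Parsnip', 'Sweet Potato']
print(preferred("Veggie"))               # Vegetable
print(is_related("Fruit", "Vegetable"))  # True
```

Storing only the broader-term links and deriving the narrower terms from them keeps the two directions of the hierarchy from drifting apart.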

Foundational Standards and Context

SKOS, a data-sharing standard, was built upon several pre-existing Semantic Web standards for formal logic and structure. These technologies, like SKOS, provide ways of expressing meaning that are amenable to computation and that complement and give structure to information already existing on the Web. The terms and definitions in this section aim to provide context for how SKOS fits into the wider Semantic Web vision.

XML:

The eXtensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human- and machine-readable. XML documents use markup tags to describe elements of a given type of content. The language is “extensible” because markup tags are not predefined and must be invented by some human author. The markup describing an element’s content may include attributes which aid in description of that content. For example:

				<?xml version="1.0" encoding="utf-8" ?>
				<vegetable>
					<name_of_vegetable lang="eng">Carrot</name_of_vegetable>
					<name_of_vegetable lang="lat">Daucus carota</name_of_vegetable>
				</vegetable>
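To show what "machine-readable" means in practice, the vegetable record can be read with Python's standard xml.etree.ElementTree module. This is only a sketch: the variable names are made up for the example, and the XML declaration is left out of the string for simplicity.

```python
import xml.etree.ElementTree as ET

# The vegetable record from the example above (XML declaration omitted).
DOC = """
<vegetable>
  <name_of_vegetable lang="eng">Carrot</name_of_vegetable>
  <name_of_vegetable lang="lat">Daucus carota</name_of_vegetable>
</vegetable>
"""

root = ET.fromstring(DOC)

# Map each lang attribute value to the text content of its element.
names = {el.get("lang"): el.text for el in root.findall("name_of_vegetable")}

print(root.tag)      # vegetable
print(names["lat"])  # Daucus carota
```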
			

RDF:

The Resource Description Framework (RDF) is a data model for describing resources that provides a common framework for metadata on the Web. RDF describes Web resources by way of triples, and schema languages such as RDFS or OWL (see below) add vocabulary for defining classes and properties. A triple is a subject-predicate-object statement about a Web resource which helps to describe that resource. For example, “a carrot is a vegetable” can be read as a triple: a subject (“carrot”), a predicate (“is a”), and an object (“vegetable”). RDF triples are distinctive because their components are identified by URIs: the subject and predicate are typically URIs, and the object is either a URI or a literal value such as a string. A collection of these RDF statements can form a powerful data model which can be used in many types of information organization and management. Take, for example, the triple

<http://www.veggievocab.com/vegetable/root/carrot> <http://purl.org/dc/elements/1.1/title> "Carrot" .

This triple statement means that the resource "http://www.veggievocab.com/vegetable/root/carrot" has a title of “Carrot.” The subject of the triple is a URI for a term in a vocabulary (the Veggie Vocab). The predicate of this triple is a URI for the Dublin Core metadata element “title.” The object of this triple is the string “Carrot.” This same triple might be expressed in RDF/XML as:

				<rdf:RDF 
					xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
					xmlns:dc="http://purl.org/dc/elements/1.1/">
						<rdf:Description rdf:about="http://www.veggievocab.com/vegetable/root/carrot">
							<dc:title>Carrot</dc:title>
						</rdf:Description>
				</rdf:RDF>
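The correspondence between the RDF/XML form and the triple can be made concrete with a short Python sketch. It uses only the standard library and is deliberately not a general RDF/XML parser: the function name simple_triples is hypothetical, and it handles only rdf:Description elements with an rdf:about attribute and literal-valued child elements, as in the example above.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.veggievocab.com/vegetable/root/carrot">
    <dc:title>Carrot</dc:title>
  </rdf:Description>
</rdf:RDF>"""


def simple_triples(rdf_xml):
    """Yield (subject, predicate, literal) for the simple pattern above."""
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes qualified names as "{namespace}local";
            # joining the two parts gives the predicate URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            yield (subject, predicate, prop.text)


triples = list(simple_triples(RDF_XML))
for s, p, o in triples:
    print(f'<{s}> <{p}> "{o}" .')
```

The printed line has the same subject, predicate and object as the triple shown earlier, which illustrates that RDF/XML is one serialization of the underlying triples rather than a different data model.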
			

RDFS:

The Resource Description Framework Schema (RDFS) is a formally defined knowledge representation language that provides a common data modeling language for data on the Web. It can also be thought of as a “semantic extension” to RDF. Information represented in RDFS is described through classes and properties of those classes.

OWL:

Like RDFS, the Web Ontology Language (OWL and OWL 2) is another, more expressive knowledge representation language that also provides a common data modeling language for data on the Web. While being fully compatible with RDFS, OWL also is able to augment the meaning of existing RDFS vocabularies. In addition, OWL includes several variants, or sublanguages, including OWL Lite, OWL DL, and OWL Full. OWL Full was designed to be compatible with RDFS, and it allows an ontology to extend the meaning of a given vocabulary.

Turtle:

The Terse RDF Triple Language (Turtle) is a textual syntax for RDF that allows RDF graphs to be written in compact and natural text format, with abbreviations for common usage patterns and datatypes.

SPARQL:

The SPARQL Protocol and RDF Query Language (SPARQL) is an RDF query language that provides a standard means for interacting with data on the Web. This language allows for the retrieval and manipulation of data stored in RDF format.
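The core idea behind a SPARQL query, matching triple patterns against a set of triples, can be imitated in a few lines of Python. This is a toy illustration under stated assumptions, not SPARQL itself: the match function and the sample data are invented for the example, and None plays the role that a ?variable plays in a SPARQL pattern.

```python
# A toy triple store: each triple is a (subject, predicate, object) tuple.
TRIPLES = [
    ("ex:carrot", "rdf:type", "skos:Concept"),
    ("ex:carrot", "skos:prefLabel", "Carrot"),
    ("ex:parsnip", "rdf:type", "skos:Concept"),
    ("ex:parsnip", "skos:prefLabel", "Parsnip"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None is a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly: SELECT ?label WHERE { ?concept skos:prefLabel ?label }
labels = [o for _, _, o in match(TRIPLES, p="skos:prefLabel")]
print(labels)  # ['Carrot', 'Parsnip']
```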

The application of any of these technologies over large bodies of information requires the construction of detailed maps of particular domains of knowledge and the accurate description of information resources on a large scale. Most of this work cannot be done automatically, and this is where SKOS comes into play. Simply put, the SKOS data model is an OWL Full ontology and its data are expressed as RDF triples.

A Very Brief History of SKOS

The SKOS-Core 1.0 Guide was first introduced in 2001 by the World Wide Web Consortium (W3C) Semantic Web Deployment Working Group (SWDWG) in order to develop SKOS as a W3C standardized classification system. The W3C SWDWG currently maintains several pieces of documentation on SKOS which are freely available on the Web. First, the SKOS Reference document, which is currently at the final W3C recommendation or standard stage, defines SKOS. Second, the SKOS Primer document provides a guide for users of the system. Third, the SKOS Use Cases and Requirements document presents a list of representative use cases and a set of requirements derived from these use cases. The SWDWG also maintains an open mailing list and a wiki through which the public may contribute to the development of SKOS.

What is SKOS?

The Simple Knowledge Organization System is a common data model for knowledge organization systems such as thesauri, classification schemes, subject heading systems and taxonomies. Using SKOS, a knowledge organization system, or controlled vocabulary, can be expressed as machine-readable data and can then be exchanged between computer application software and published in a machine-readable format on the Web. According to the SKOS Reference, the aims of the system are:

SKOS was built upon RDF, and thus SKOS data are represented as RDF triples.

The Elements of SKOS

The SKOS Vocabulary:

Concepts:

Labels:

Relationships:

Semantic Relationships:

Mapping Properties:

Collections of Concepts:

Notes:

Concepts

skos:Concept (instance of owl:Class)

A SKOS concept is any ‘unit of thought’: an idea, an object, an event. These concepts are the building blocks of many knowledge organization systems. Because concepts are abstract ideas that exist in the mind, they are independent of the terms used to describe them. Let’s take carrot for example. The English word “carrot” which we use to describe the orange vegetable that rabbits like to eat is actually independent of the concept of a carrot. The idea of a concept and its descriptor (or label) being two separate entities is vital to the SKOS model. The SKOS ‘concept’ element allows vocabulary builders to describe and distinguish concepts and their descriptors (labels). A SKOS concept can be created in two steps:

  1. Create or reuse a URI to uniquely identify a concept
  2. Assert (make some statement) in RDF, using the property rdf:type, that the resource identified by this URI is of type skos:Concept.

Example:

<http://www.veggievocab.com/vegetable/root/carrot> rdf:type skos:Concept .
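The two steps can also be sketched programmatically. In the following Python illustration the helper names declare_concept and to_ntriples are hypothetical; the output is the fully expanded form of the example statement, with the rdf: and skos: prefixes replaced by their namespace URIs.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"


def declare_concept(uri):
    """Step 2: assert that the resource named by `uri` is a skos:Concept."""
    return (uri, RDF_TYPE, SKOS_CONCEPT)


def to_ntriples(triple):
    """Serialize a triple of three URIs as a single N-Triples line."""
    return " ".join(f"<{term}>" for term in triple) + " ."


# Step 1: choose (or reuse) a URI that identifies the concept.
carrot = "http://www.veggievocab.com/vegetable/root/carrot"

line = to_ntriples(declare_concept(carrot))
print(line)
```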

Labels

In SKOS, a label is the element which is the descriptor of a concept. The three SKOS label elements are sub-properties of the RDFS element rdfs:label. The purpose of these three elements is to link a skos:Concept to an RDF plain literal, or character string.

skos:prefLabel (instance of owl:AnnotationProperty and sub-property of rdfs:label)

Preferred Label is a SKOS element that makes it possible to assign a preferred label to a concept. The two following examples show that the preferred label for the concept vegetable is the word “vegetable” in English and “légume” in French.

Example:

ex:vegetable rdf:type skos:Concept;
skos:prefLabel "Vegetable".


Example:

ex:vegetable rdf:type skos:Concept;
skos:prefLabel "Vegetable"@en;
skos:prefLabel "Légume"@fr.


For information retrieval and information organization purposes, no two concepts in the same KOS should be given the same preferred label for any given language tag.

skos:altLabel (instance of owl:AnnotationProperty and sub-property of rdfs:label)

Alternate Label makes it possible to assign an alternative label to a concept. This label allows multiple same-language descriptors for a concept to be stored. The following example shows that the preferred label for the concept fava_bean is the word “fava bean” and an alternate label is the word “broad bean.”

Example:

ex:fava_bean rdf:type skos:Concept;
skos:prefLabel "Fava bean"@en;
skos:altLabel "Broad bean"@en.


skos:hiddenLabel (instance of owl:AnnotationProperty and sub-property of rdfs:label)

Hidden label is a label for a resource that a KOS designer would like to be accessible to applications performing text-based indexing and search operations, but would not like to be visible otherwise. The following example shows that the preferred label for the concept “potato” is the word “potato,” and two hidden labels for the concept are “tater” and “spud.”

Example:

ex:potato rdf:type skos:Concept;
skos:prefLabel "Potato"@en;
skos:hiddenLabel "Tater"@en;
skos:hiddenLabel "Spud"@en.
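How an application might treat the three label types differently can be sketched as follows. This is an assumption about one reasonable application behavior, not something SKOS prescribes in detail: the dictionary layout and the function names matches and display_labels are invented for the example.

```python
# Labels for one concept, keyed by SKOS label property (English only).
POTATO = {
    "prefLabel": ["Potato"],
    "altLabel": [],
    "hiddenLabel": ["Tater", "Spud"],
}


def matches(concept, query):
    """Search across all three label types, including hidden labels."""
    q = query.casefold()
    return any(
        q == label.casefold()
        for labels in concept.values()
        for label in labels
    )


def display_labels(concept):
    """Show only preferred and alternative labels to the user."""
    return concept["prefLabel"] + concept["altLabel"]


print(matches(POTATO, "spud"))  # True
print(display_labels(POTATO))   # ['Potato']
```

A search for "spud" finds the concept, but the user is shown only "Potato".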


Relationships

skos:broader and skos:narrower (instances of owl:ObjectProperty)

These two SKOS properties assert hierarchical relationships between concepts: that one concept is broader or narrower in meaning than another.

Example:

ex:root rdf:type skos:Concept;
skos:prefLabel "Root Vegetable"@en;
skos:narrower ex:sweet_potato.


Example:

ex:sweet_potato rdf:type skos:Concept;
skos:prefLabel "Sweet Potato"@en;
skos:broader ex:root.


It is important to note that SKOS does not define skos:broader and skos:narrower as transitive properties. In other words, the semantics of these two properties do not support inferences of this type: "if vegetables is broader than gourd and gourd is broader than pumpkin, then vegetables is assumed to be broader than pumpkin." It may seem logical and convenient for such inferences to be drawn automatically. However, they can cause unexpected problems in non-traditional or poorly designed vocabularies represented in SKOS, so the designers of the system consciously decided that a stated hierarchical relation between concepts should not imply further hierarchical relations. It is, however, possible to express this type of relationship by using skos:broaderTransitive and skos:narrowerTransitive, as discussed in the next section.

skos:related (instance of owl:ObjectProperty)

This SKOS property allows a designer to assert an associative relationship between two concepts.

Example:

ex:vegetable rdf:type skos:Concept;
skos:prefLabel "Vegetable"@en;
skos:related ex:fruit.

ex:fruit rdf:type skos:Concept;
skos:prefLabel "Fruit"@en.


In the SKOS data model, skos:related is not defined as a transitive property, and the transitive closure of skos:broader must be disjoint from skos:related. If the concepts Vegetable and Fruit are related via skos:related, there must not be a chain of skos:broader relationships from Vegetable to Fruit. In other words, concepts must not be related in both a hierarchical and an associative manner at the same time.

Semantic Relationships

skos:semanticRelation (instance of owl:ObjectProperty)

SKOS semantic relations are connections between SKOS concepts. This type of relation occurs when a link between two concepts is inherent in the meaning of the linked concepts. Each of the SKOS properties skos:broader, skos:narrower, skos:broaderTransitive, skos:narrowerTransitive and skos:related is a sub-property of skos:semanticRelation.

skos:broaderTransitive and skos:narrowerTransitive (instances of owl:TransitiveProperty)

Like skos:broader and skos:narrower, these two SKOS properties assert hierarchical relationships between concepts: that one concept is broader or narrower in meaning than another. Because these two properties are transitive, inferences like "if vegetables is broader than gourd and gourd is broader than pumpkin, then vegetables is broader than pumpkin" are possible in the SKOS data model.
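The difference between the asserted and the transitive hierarchy can be sketched in Python. In the SKOS data model skos:broader is a sub-property of skos:broaderTransitive, so every asserted broader link is also a transitive one; the sketch below computes the additional links that transitivity implies. The function name broader_transitive and the pair representation are assumptions made for this example.

```python
# Asserted skos:broader statements as (narrower concept, broader concept).
BROADER = {("ex:pumpkin", "ex:gourd"), ("ex:gourd", "ex:vegetable")}


def broader_transitive(broader):
    """Return the pairs that hold for skos:broaderTransitive.

    Starts from the asserted skos:broader pairs and keeps adding
    (a, d) whenever (a, b) and (b, d) are both present.
    """
    closure = set(broader)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


closure = broader_transitive(BROADER)
print(("ex:pumpkin", "ex:vegetable") in BROADER)  # False
print(("ex:pumpkin", "ex:vegetable") in closure)  # True
```

The pumpkin-to-vegetable link appears only in the transitive closure, never among the asserted skos:broader statements.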

Mapping Properties

skos:mappingRelation (instance of owl:ObjectProperty)

The SKOS mapping properties are used to state mapping (or alignment) links between SKOS concepts that exist in different concept schemes.

skos:closeMatch (instance of owl:ObjectProperty and owl:SymmetricProperty)

skos:exactMatch (instance of owl:ObjectProperty and owl:SymmetricProperty and owl:TransitiveProperty)

skos:broadMatch (instance of owl:ObjectProperty)

skos:narrowMatch (instance of owl:ObjectProperty)

skos:relatedMatch (instance of owl:ObjectProperty and owl:SymmetricProperty)

Collections of Concepts

skos:Collection (instance of owl:Class)

The SKOS concept collection elements are used to describe labeled or ordered groups of SKOS concepts. For example, the veggie vocab can be considered a collection of SKOS concepts because it is a group of concepts that have something in common.

Example:

<Veggie_Vocab> rdf:type skos:Collection .

skos:OrderedCollection (instance of owl:Class)

Example:

<Veggie_Vocab> rdf:type skos:OrderedCollection .

skos:member (instance of owl:ObjectProperty)

Example:

<Veggie_Vocab> rdf:type skos:Collection;
skos:member <bean> , <gourd> , <root> .

skos:memberList (instance of owl:ObjectProperty and owl:FunctionalProperty)

Example:

<Veggie_Vocab> rdf:type skos:OrderedCollection;
skos:memberList ( <bean> <gourd> <root> ) .
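The parenthesized list in Turtle is shorthand for an RDF list, a chain of rdf:first and rdf:rest statements ending in rdf:nil, which is what preserves the order of the members. The Python sketch below expands such a list by hand; the function name expand_list and the blank-node names (_:b0 and so on) are made up for this illustration, and real parsers generate their own identifiers.

```python
def expand_list(members):
    """Expand an ordered list into rdf:first / rdf:rest triples.

    Returns (head, triples), where head is the node to use as the
    value of skos:memberList.
    """
    if not members:
        return "rdf:nil", []
    triples = []
    for i, member in enumerate(members):
        node = f"_:b{i}"
        # The last node points to rdf:nil, which terminates the list.
        nxt = f"_:b{i + 1}" if i + 1 < len(members) else "rdf:nil"
        triples.append((node, "rdf:first", member))
        triples.append((node, "rdf:rest", nxt))
    return "_:b0", triples


head, triples = expand_list(["<bean>", "<gourd>", "<root>"])
collection = [("<Veggie_Vocab>", "skos:memberList", head)] + triples
for t in collection:
    print(*t)
```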

Notes

skos:note

This SKOS property was created for general documentation purposes. The more specific documentation properties described below are all sub-properties of skos:note, and this hierarchical link allows all the documentation associated with a concept to be retrieved in a straightforward way.
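Retrieving all documentation for a concept through this hierarchy might look like the following Python sketch. The set of sub-properties is the one defined by SKOS; the sample triples reuse statements from this guide, and the function name all_notes is an assumption made for the example.

```python
# The documentation properties that specialize skos:note.
NOTE_SUBPROPERTIES = {
    "skos:scopeNote", "skos:definition", "skos:example",
    "skos:historyNote", "skos:editorialNote", "skos:changeNote",
}

TRIPLES = [
    ("ex:root", "skos:scopeNote", "Used for plant roots used as vegetables"),
    ("ex:root", "skos:prefLabel", "Root Vegetable"),
    ("ex:lima_bean", "skos:editorialNote", "Check for alternate terms"),
]


def all_notes(triples, concept):
    """Collect every note on `concept`, whatever its specific type."""
    wanted = NOTE_SUBPROPERTIES | {"skos:note"}
    return [(p, o) for s, p, o in triples if s == concept and p in wanted]


print(all_notes(TRIPLES, "ex:root"))
```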

skos:scopeNote (instance of owl:AnnotationProperty)

This property supplies some (possibly partial) information about the intended meaning of a concept. It is usually used to indicate how the use of a concept is limited in indexing practice:

Example:

ex:root skos:scopeNote
"Used for plant roots used as vegetables"@en.


skos:definition (instance of owl:AnnotationProperty)

This property supplies a complete explanation of the intended meaning of a concept:

Example:

ex:parsnip skos:definition
"The parsnip (Pastinaca sativa) is a root vegetable related to the carrot. Parsnips resemble carrots, but are paler in colour than most carrots, and have a sweeter taste, especially when cooked."@en.


skos:example (instance of owl:AnnotationProperty)

This property supplies an example of the use of a concept:

Example:

ex:pumpkin skos:example
"baking, cooking, pumpkin seeds, pumpkin seed oil, pumpkin carving, jack o’lanterns, etc."@en.


skos:historyNote (instance of owl:AnnotationProperty)

This property describes significant changes to the meaning or form of a concept:

Example:

ex:green_bean skos:historyNote
"The first 'stringless' bean was bred in 1894 by Calvin Keeney while working in Le Roy, New York."@en.


skos:editorialNote (instance of owl:AnnotationProperty)

This property supplies information that is an administrative aid (reminders of editorial work still to be done, notifications that future editorial changes might be made):

Example:

ex:lima_bean skos:editorialNote "Check for alternate terms"@en.


skos:changeNote (instance of owl:AnnotationProperty)

This property documents fine-grained changes to a concept for the purposes of administration and maintenance:

Example:

ex:parsnip skos:changeNote
"Moved from under 'Gourd' to under 'Root' by Priscilla Jane Smith"@en.


It should also be mentioned that it is possible to use non-SKOS properties to document concepts (like Dublin Core’s dct:creator):

Example:

ex:spaghetti_squash dct:creator [ foaf:name "Priscilla Jane Smith" ].



SKOS Integrity Conditions

The SKOS Reference document includes several integrity conditions. These integrity conditions are statements that help to determine whether or not given data (for example, a vocabulary) are consistent with respect to the SKOS data model. The purpose of the SKOS integrity conditions is to encourage the construction of well-formed and consistent data and to promote interoperability between data represented in SKOS.

skos:ConceptScheme is disjoint with skos:Concept

This condition states that no resource may be both a SKOS concept scheme (an aggregation of SKOS concepts) and a SKOS concept. For example, if the veggie vocab as a whole is declared to be a skos:ConceptScheme, it must not also be declared to be a skos:Concept, and one of its concepts, like Lima Bean, may not be a skos:ConceptScheme.

skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint properties

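This condition states that the same label may not be attached to a resource by more than one of the preferred label, alternate label and hidden label properties. For example, “Spud” may not be both an alternate label and a hidden label for the concept potato.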
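One way to read this condition is that the same label should not be attached to a concept by two different label properties. A check along those lines is sketched below in Python; the tuple layout, the function name label_clashes and the sample data (including the deliberately wrong altLabel) are assumptions made for this illustration.

```python
from itertools import combinations

# (concept, property, label, language) tuples. The "Spud" altLabel is a
# deliberate error added for this illustration.
LABELS = [
    ("ex:potato", "skos:prefLabel", "Potato", "en"),
    ("ex:potato", "skos:hiddenLabel", "Spud", "en"),
    ("ex:potato", "skos:altLabel", "Spud", "en"),
]


def label_clashes(labels):
    """Find labels attached to one concept by two different properties."""
    clashes = []
    for (c1, p1, l1, g1), (c2, p2, l2, g2) in combinations(labels, 2):
        if (c1, l1, g1) == (c2, l2, g2) and p1 != p2:
            clashes.append((c1, l1, g1, {p1, p2}))
    return clashes


print(label_clashes(LABELS))
```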

A resource has no more than one value of skos:prefLabel per language tag

This condition states that no SKOS concept may have more than one preferred label for each language tag. For example, the concept summer_squash has the preferred label of “Summer Squash” in English and of “Cucurbita pepo” in Latin, and there may not be any other preferred labels in English or Latin.
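A simple check for this condition counts preferred labels per concept and language tag. The Python sketch below is illustrative only: the function name duplicate_pref_labels is hypothetical, and the second English label is an invented violation, not part of the veggie vocab.

```python
from collections import Counter

# (concept, preferred label, language tag) tuples.
PREF_LABELS = [
    ("ex:summer_squash", "Summer Squash", "en"),
    ("ex:summer_squash", "Cucurbita pepo", "la"),
    ("ex:summer_squash", "Courgette", "en"),  # deliberate violation
]


def duplicate_pref_labels(pref_labels):
    """Return (concept, language) pairs with more than one skos:prefLabel."""
    counts = Counter((concept, lang) for concept, _, lang in pref_labels)
    return sorted(key for key, n in counts.items() if n > 1)


print(duplicate_pref_labels(PREF_LABELS))  # [('ex:summer_squash', 'en')]
```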

skos:related is disjoint with the property skos:broaderTransitive

This condition states that no two SKOS concepts may be connected by both related and broader transitive relationships.
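This condition can be checked by following chains of skos:broader links from each concept and making sure none of them reaches a concept it is also related to. The Python sketch below shows one way to do so; the function names ancestors and related_violations and the sample data (including the deliberately wrong related link) are assumptions made for this example.

```python
# Asserted skos:broader statements as (narrower concept, broader concept).
BROADER = {("ex:pumpkin", "ex:gourd"), ("ex:gourd", "ex:vegetable")}

# skos:related statements; the second pair is a deliberate violation.
RELATED = {("ex:vegetable", "ex:fruit"), ("ex:pumpkin", "ex:vegetable")}


def ancestors(concept, broader):
    """All concepts reachable from `concept` by following skos:broader."""
    found, stack = set(), [concept]
    while stack:
        current = stack.pop()
        for narrower, wider in broader:
            if narrower == current and wider not in found:
                found.add(wider)
                stack.append(wider)
    return found


def related_violations(related, broader):
    """Pairs that are related and also linked by a chain of broader links."""
    return sorted(
        (a, b) for a, b in related
        if b in ancestors(a, broader) or a in ancestors(b, broader)
    )


print(related_violations(RELATED, BROADER))
# [('ex:pumpkin', 'ex:vegetable')]
```

Both directions are tested because skos:related is symmetric: a related link from Vegetable to Pumpkin would clash with the hierarchy just as much as one from Pumpkin to Vegetable.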

skos:Collection is disjoint with each of skos:Concept and skos:ConceptScheme

This condition states that no resource may be both a SKOS collection (a labeled or ordered group of SKOS concepts) and a SKOS concept or concept scheme. For example, if Veggie_Vocab is declared to be a skos:Collection, it must not also be a skos:Concept or a skos:ConceptScheme, and one of the concepts, like Lima Bean, may not be a skos:Collection.

skos:exactMatch is disjoint with each of the properties skos:broadMatch and skos:relatedMatch

This condition states that no two SKOS concepts may be linked by an exact match and, at the same time, by a broad match or a related match.

References

Isaac, A., Phipps, J., & Rubin, D. (2009). SKOS use cases and requirements.

Isaac, A. & Summers, E. (2009). SKOS simple knowledge organization system primer.

Miles, A., & Bechhofer, S. (2009). SKOS simple knowledge organization system reference.

Applications used to create visualizations:

MultiTes thesaurus workstation software

FreeMind mind-mapping software

RDF example: The Veggie Vocab





    
The Veggie Vocab

[The RDF/XML markup of this example was not preserved; the surviving element text is listed below.]

Bean
Butternut Squash
Carrot
Fava Bean
Gourd
Green Bean
Lima Bean
Parsnip
Potato
Pumpkin
Root
Spaghetti Squash
Summer Squash
Sweet Potato
Vegetable
	"An edible plant or part of a plant which may or may not propagate into offspring"
	Vegetal
	Légume

Published