Topical Classification of Content: Making Sense of Folksonomies, Taxonomies, Ontologies, and More

Written by Guy Van Peel
on October 22, 2010

In a meeting I attended last week it struck me how informally we often speak about certain concepts that are key in defining state-of-the-art knowledge products. In this case the discussion was about the importance of a comprehensive approach to classifying our tax, legal and regulatory content, and about what kind of classification system to use: taxonomy, thesaurus or other.

Specifically in the globalized context of our product portfolio development, where people with very different geographical and professional backgrounds come together to improve product features, I realized again how important it is to agree on definitions in order to be effective.

Although many people use the terms interchangeably, a taxonomy is not the same as a thesaurus, and both are different from an ontology. All are valuable models used in information architecture and knowledge organization, but they differ in sophistication, expressiveness and purpose.

As we evolve from “publishers” into “knowledge organizers” (see my previous post), I thought it useful to briefly position some of these concepts and show how they fit into the overall scheme of knowledge organization approaches. This is definitely not a new theme, and smarter minds have written extensively on it. So I’m not going to reproduce, for instance, the still excellent 2004 article by Lars Marius Garshol (published in the Journal of Information Science but also available for free); I will limit myself to a brief summary of the distinguishing traits of each concept.

Folksonomy: a Web 2.0 concept that wasn’t in the 2004 Garshol article, although it originated around that time (read more in this balanced article on folksonomies). A folksonomy is just a list of unmanaged keywords (“tags”) that results from the tagging activity of website visitors. Its advocates are to be found among those who believe in the “wisdom of crowds.” Personally, I don’t believe that folksonomies or unmanaged keywords are a sufficient knowledge organization framework.

Controlled vocabulary: some use this term as an overarching concept that also includes the more sophisticated knowledge representation approaches mentioned further down in this post. I’d use it to indicate the simple, flat but managed keyword list. “Managed” here means that some expert or team of experts makes conscious decisions on what should be included in a list of “terms of art”.

Taxonomy: originated as a way to classify species in biology, and is now used more broadly to add the notion of hierarchy (broader and narrower terms) to a controlled vocabulary. It often finds its expression in what we as publishers call a “table of contents” that outlines a specific and self-contained subject or knowledge domain.

Thesaurus: offers the capability to express a number of predefined relationships between terms, beyond just the hierarchical relationship that can be included in a taxonomy. I always advocate referring to this concept according to the definition and description provided by the available ISO industry standards (ISO 2788 and 5964).

Ontology: unlike taxonomies or thesauri, an ontology allows expression of any relationship between terms, so it’s not limited to the 12 or 13 relationships identified in the ISO thesaurus standards. Ontologies are thus far the most sophisticated knowledge organization framework I’ve encountered, because they allow expression of the meaning of the relationships between concepts. They are considered foundational for the Semantic Web, as imagined by Tim Berners-Lee (remember, he was also the one who was at the origin of Web 1.0!).
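
To make the progression above a little more concrete, here is a minimal Python sketch of how each framework adds structure to the previous one. Everything in it is an assumption made for illustration: the tax terms, the variable and function names, and the choice of plain dictionaries and tuples are mine, not part of any product or standard. The relationship codes BT, NT, RT and UF are the conventional thesaurus abbreviations for broader term, narrower term, related term and "used for".

```python
from collections import Counter

# Folksonomy: unmanaged tags from visitors; spelling variants simply coexist.
folksonomy = Counter(["tax", "Tax", "taxes", "VAT", "vat", "tax"])

# Controlled vocabulary: a flat list of terms approved by experts.
controlled_vocabulary = {"Tax", "Income tax", "Corporate income tax", "Value added tax"}

# Taxonomy: the same terms plus a hierarchy. Each term points to its
# broader term; None marks the top of the hierarchy.
taxonomy = {
    "Tax": None,
    "Income tax": "Tax",
    "Corporate income tax": "Income tax",
    "Value added tax": "Tax",
}

# Thesaurus: relationships are allowed, but only from a predefined set.
THESAURUS_RELATIONS = {"BT", "NT", "RT", "UF"}
thesaurus = []

# Ontology: any named relationship between concepts may be expressed.
ontology = []


def broader_chain(term):
    """Walk up the taxonomy from a term to the root, listing broader terms."""
    chain = []
    parent = taxonomy[term]
    while parent is not None:
        chain.append(parent)
        parent = taxonomy[parent]
    return chain


def add_relation(store, subject, relation, obj, allowed=None):
    """Record a (subject, relation, object) triple.

    If `allowed` is given, the relation must belong to that set; this is
    the only difference between the thesaurus and the ontology here.
    """
    if allowed is not None and relation not in allowed:
        raise ValueError(f"relation {relation!r} is not in the predefined set")
    store.append((subject, relation, obj))


# A thesaurus accepts only its predefined relationship types.
add_relation(thesaurus, "Income tax", "BT", "Tax", THESAURUS_RELATIONS)
add_relation(thesaurus, "Value added tax", "UF", "VAT", THESAURUS_RELATIONS)

# An ontology accepts relationships that carry their own meaning.
add_relation(ontology, "Value added tax", "is levied on", "Consumption")
add_relation(ontology, "Tax authority", "administers", "Income tax")

print(broader_chain("Corporate income tax"))
```

In this sketch, the thesaurus and the ontology share the same storage and differ only in whether the relationship type is restricted. That is a deliberate simplification: real ontology languages add much more, such as classes, properties and inference rules.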

Why does it matter what we call the knowledge organization frameworks that we use? Do we actually need them? Do our customers care, and what’s in it for them? These are questions I’ll be exploring in later posts, but meanwhile I’d of course welcome your thoughts on the above classification of frameworks for knowledge organization!



Comments


  1. Rosalie Donlon on October 26th 2010 at 05:24 am

    Thank you, Guy, for explaining these various terms. I find them confusing and I suspect that many outside the knowledge or content management arena may also find them confusing. I am currently attending the Association of Corporate Counsel Annual Meeting and I have not had anyone come to our booth to inquire about the taxonomy behind products on IntelliConnect. They only ask “how can I find what I’m looking for?” It seems to me that using the various technical terms outlined in Guy’s post is important for us as publishers to create a logical structure, but they aren’t terms that resonate with our customers. We’ll need to do more research to confirm that our customers are comfortable with anything more technical than “contents” or “table of contents.”

