Taxonomy and Ontology

Introduction

In researching this paper, the author found that, although a large number of specific taxonomies have been published and proposed, there was very little, if anything, to be found discussing the general idea of "taxonomy". Even less material seemed to be available on the subject of ontology. The purpose of this paper is to present an introduction to the concepts of taxonomy and ontology and to discuss how they are being used in the development of computer systems, with the goal of making the subject more accessible to newcomers.

Taxonomies

What is a Taxonomy?

Simply put, a taxonomy is a system of classification. Rudolf Scheltema gives the definition "taxonomy—1. the science of classification; laws and principles covering the classifying of objects 2. Biol. a system of arranging animals and plants into natural, related groups based on some factor common to each…."¹ According to the rules set up by the creators or maintainers of the taxonomy, the items being classified are grouped into taxa (singular "taxon"). The members of each taxon share one or more characteristics which mark them as members of that taxon. The taxa are arranged in one or more related layers based on the level of detail. An idea of how this works may be gained from looking at a handful of example taxonomies.

Examples

Biological Taxonomy

Almost without a doubt, the most widely known taxonomy is that used by biologists. The objects to be classified, living organisms, are divided into taxa based on their physical characteristics. The taxa are arranged in nested levels with the specificity of the taxa increasing as the level advances from the highest level (super-kingdoms) to the lowest level (species). For example, the broadest category, super-kingdom, is divided based on the presence of a true nucleus in the organism's cells (eucharyonta) or its absence (procharyonta). The taxonomic description for modern humans is eucharyonta (super-kingdom) > animalia (kingdom) > chordata (phylum) > mammalia (class) > primate (order) > pongidae (family) > hominidae (subfamily) > homo (genus) > sapiens (species).² The biological taxonomy may be used to classify and describe any form of life on the planet, albeit with "division" substituted for "phylum" when describing plants. Interestingly, according to Scheltema, the biological task of classifying life on earth has run into a stumbling block similar to one faced by the information technology field in recent years—there are not enough trained taxonomists to deal with the number of new species being uncovered. Unlike the situation in the IT industry, there are no high salaries tempting young students to devote themselves to taxonomy and fill the void.
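
The nesting described above can be sketched as an ordered list of (rank, taxon) pairs running from the broadest level to the narrowest. The Python below is only an illustration: the rank and taxon names are copied from the human lineage given in the text, while the names `HUMAN_LINEAGE`, `HYPOTHETICAL_MAMMAL`, and `shared_ranks` are hypothetical and the second organism is invented for the comparison.

```python
# Ranks and taxa copied from the human lineage quoted in the text,
# ordered from the broadest level to the most specific.
HUMAN_LINEAGE = [
    ("super-kingdom", "eucharyonta"),
    ("kingdom", "animalia"),
    ("phylum", "chordata"),
    ("class", "mammalia"),
    ("order", "primate"),
    ("family", "pongidae"),
    ("subfamily", "hominidae"),
    ("genus", "homo"),
    ("species", "sapiens"),
]


def shared_ranks(lineage_a, lineage_b):
    """Return the leading (rank, taxon) pairs that two lineages have in common."""
    shared = []
    for pair_a, pair_b in zip(lineage_a, lineage_b):
        if pair_a != pair_b:
            break
        shared.append(pair_a)
    return shared


# An invented organism that parts ways with humans below the class level.
HYPOTHETICAL_MAMMAL = HUMAN_LINEAGE[:4] + [("order", "example-order")]

print(" > ".join(taxon for _, taxon in HUMAN_LINEAGE))
print(shared_ranks(HUMAN_LINEAGE, HYPOTHETICAL_MAMMAL)[-1])
```

Because specificity increases with depth, the last shared pair marks the narrowest taxon to which both organisms belong.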

The Marney-Smith Taxonomy

Following the example set by biological taxonomists, Milton Marney and Nicholas Smith developed a taxonomy of systems far more inclusive than the biological taxonomy. The Marney-Smith taxonomy includes not only biological organisms but also naturally occurring inanimate substances and social constructions. Classification begins with a taxonomic domain consisting of ylem, or radiant energy. As nuclear binding acts on the ylem to form new systems, a new taxonomic domain, that of subatomic particles, is formed. The elements/systems in each taxonomic domain unite into more complex systems which are members of a new, more complex taxonomic domain. Marney and Smith end their taxonomy in the noosphere with non-physical systems such as civilizations.³ Theoretically, anything in existence, whether natural or artificial, would be included in their taxonomy.

A Two-dimensional Taxonomy

Although the previously listed examples of taxonomies form huge, complex nets with branching, tree-like connections, not all taxonomies do. Barton Sano and Alvin Despain provide a taxonomy for describing computer processor microarchitectures which, because of the small size of the relevant domain, is much smaller than Marney and Smith's taxonomy. The authors place "fetch" and "decode" along one axis and "execute" and "retire" along the other to represent operational stages at the microarchitecture level, and they use "static" (S) and "dynamic" (D) to represent the processor's behavior at each stage. The result, shown below, is their "Sixteen-Fold Way Taxonomy" of possible microarchitectures.⁴

Fetch, Decode (rows) / Execute, Retire (columns)	SS	SD	DD	DS
SS	SSSS	SSSD	SSDD	SSDS
SD	SDSS	SDSD	SDDD	SDDS
DD	DDSS	DDSD	DDDD	DDDS
DS	DSSS	DSSD	DSDD	DSDS
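
The table above can be regenerated mechanically, since each cell is simply the fetch/decode pair followed by the execute/retire pair. The following Python is a minimal sketch under that reading; the `METHODS` ordering is copied from the table, and the function name `sixteen_fold` is hypothetical rather than anything taken from Sano and Despain.

```python
from itertools import product

# Row and column order copied from the table in the text.
METHODS = ["SS", "SD", "DD", "DS"]


def sixteen_fold():
    """Map (fetch/decode, execute/retire) method pairs to four-letter codes."""
    return {(fd, er): fd + er for fd, er in product(METHODS, METHODS)}


table = sixteen_fold()
for fd in METHODS:
    # One printed row per fetch/decode pair, columns in METHODS order.
    print("\t".join(table[(fd, er)] for er in METHODS))
```

Two behaviors (S or D) at each of four stages give 2 × 2 × 2 × 2 = 16 combinations, which is where the "sixteen-fold" name comes from.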

Geek Code

Included as an example of the human desire to taxonomize is the Geek Code. The Geek Code was developed on the Internet, presumably as a lark, by Robert Hayden with the intention of allowing geeks to identify their characteristics to others quickly. Geeks are classified in a number of areas including area of geekiness (similar to a college major, the category includes computers, engineering, jurisprudence, and "undecided" among others), familiarity with various flavors of UNIX, and the importance of Star Trek in their lives. The code in question includes letters identifying geeky characteristics and trailing punctuation to describe the strength of the characteristics. As an example, the code for the creator of the Geek Code is GED/J d-- s:++>: a-- C++(++++) ULU++ P+ L++ E---- W+(-) N+++ o+ K+++ w--- O- M+ V-- PS++>$ PE++>$Y++ PGP++ t- 5+++ X++ R+++>$ tv+ b+ DI+++ D+++ G++++ e++ h r-- y++**. Explanation of the code may be found in a twenty-seven-page document located at http://www.geekcode.com. Similar taxonomies have been developed for other groups, including fans of television shows such as Buffy The Vampire Slayer as well as assorted clubs and political, philosophical, and sexual affiliations. In many cases, these codes are displayed on an individual's website or in his or her e-mail signature file.

What is a Taxonomy Good For?

Taxonomies aid humans in making sense of their environment. The jumble of random data which is the world can be classified and organized into a useful and meaningful system. Instead of walking across a field covered in strange plants, one can walk across a field covered in lupinus texensis, toxicodendron radicans and capsella bursa-pastoris, ringed by populus deltoides and carya illinoinensis.⁵ Understanding the botanical taxonomic system would allow one to use the physical characteristics of the plants to find their botanical family and then narrow the search down by genus and species. Having found their place in the taxonomy, one would be able to find other information on the plants. For example, although l. texensis is a member of the legume family (leguminosae/fabaceae), bluebonnets are not one of the edible members of that family.

While plants have evolved slowly over countless generations and random mutations, computers are evolving quickly and with directed purpose. New types of hardware and software, as well as new models of old types, are constantly being introduced. With the constant changes and developments compounded by a lack of coordination in naming and ranking on the part of manufacturers, selecting components and systems can be difficult.

Jerry Banks has suggested the use of an industry-wide taxonomy to classify software and make it easier to distinguish products. As an example, Banks uses the interface of a piece of software. "The options could be 2-D, 2 1/2-D, and 3-D with values 1, 2, and 3. If the software is 3-D (3), then the next set of eight values are 0 or 1 corresponding to translation, rotation, scaling, light-sourcing, wire-frame, solid, continuous-motion, and true distancing. So this portion of the definition could be 311101111, meaning that the software is 3-D and possesses all of the attributes except light-sourcing."⁶ Just as the Geek Code can give a quick overview of a stranger's personality, such a system would allow a quick check for features available in a given product and greatly aid in researching purchases. Banks's proposal has foundered on the lack of an organization willing to take on the responsibility of creating and maintaining such a taxonomy.
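
The fragment of Banks's scheme quoted above is concrete enough to sketch in code. The Python below is an illustration only: the dimension values and the order of the eight attributes are taken from the quotation, while the names `DIMENSIONS`, `ATTRIBUTES_3D`, and `decode` are hypothetical, and only the 3-D branch is modelled because the quotation does not say what follows a 2-D or 2 1/2-D value.

```python
# Dimension values and attribute order taken from the Banks quotation.
DIMENSIONS = {1: "2-D", 2: "2 1/2-D", 3: "3-D"}
ATTRIBUTES_3D = [
    "translation", "rotation", "scaling", "light-sourcing",
    "wire-frame", "solid", "continuous-motion", "true distancing",
]


def decode(code):
    """Split a code such as '311101111' into its dimension and attribute lists."""
    dimension = DIMENSIONS[int(code[0])]
    flags = code[1:]
    if dimension != "3-D" or len(flags) != len(ATTRIBUTES_3D):
        raise ValueError("only the 3-D branch from the quotation is modelled")
    present = [name for name, flag in zip(ATTRIBUTES_3D, flags) if flag == "1"]
    missing = [name for name, flag in zip(ATTRIBUTES_3D, flags) if flag == "0"]
    return dimension, present, missing


dimension, present, missing = decode("311101111")
print(dimension, "missing:", missing)
```

Decoding the example from the quotation reports light-sourcing as the only missing attribute, which matches Banks's reading of 311101111.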

If taxonomies are important to organizing human thought, they are even more important to organizing computer "thought". In order to "think", computers either calculate an answer mathematically or search a list of available options for a solution which fits the current situation. By classifying solutions into taxa, the computer can eliminate all members of a taxon without having to consider each one individually. An example of the use of taxonomy to improve computer processing may be found in Hans Berliner's research.
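
The saving described above, ruling out a whole taxon with a single check, can be sketched as follows. This is a toy illustration under invented assumptions: the `TAXA` catalogue, its traits and members, and the `search` function are all hypothetical and are not drawn from any of the systems discussed in this paper.

```python
# Hypothetical catalogue: each taxon records one trait shared by all members.
TAXA = {
    "birds": {"trait": "feathers", "members": ["sparrow", "penguin", "owl"]},
    "fish": {"trait": "gills", "members": ["trout", "shark"]},
    "mammals": {"trait": "milk", "members": ["otter", "bat", "whale"]},
}


def search(required_trait, wanted):
    """Find `wanted`, inspecting members only in taxa with the required trait."""
    inspected = 0
    for name, taxon in TAXA.items():
        if taxon["trait"] != required_trait:
            continue  # the whole taxon is ruled out with one comparison
        for member in taxon["members"]:
            inspected += 1
            if member == wanted:
                return name, inspected
    return None, inspected


print(search("milk", "bat"))
```

Here five of the eight catalogue entries are never examined individually, because their taxa fail the trait test as a group.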

Working with a grant from the Defense Advanced Research Projects Agency, Berliner and others have been attempting to develop a taxonomy of concepts for chess programs. Berliner and company drew on earlier studies of the thought processes of chess players which indicated that "it is the 'chunking' of familiar chess specific patterns (groups of pieces) which deems Grandmasters superior to novices in their ability to recall a position from short term memory." Berliner quotes studies estimating that between 50,000 and 100,000 chess-specific concepts are stored in a Grandmaster's head.⁷

According to the article, a skilled player must be able not only to follow simple rules of thumb such as "passed pawns must be pushed," but also to know when to ignore such advice. A good player must know when a pushed pawn becomes weak. Traditional chess programs possess tactical knowledge allowing the best programs to beat all but the best players. The problem is that the programs' knowledge does not represent understanding: they have little or no strategic ability. Strategy requires more situational knowledge and a greater ability not only to read from a list of possible responses to an opponent's move but also to produce unexpected "creative" moves and to guess an opponent's strategy.⁸

In an attempt to improve the design and playing ability of chess programs, Berliner has begun a taxonomy with the ultimate goal of identifying "as many relevant classes as realistically feasible with examples across all phases of play." The taxonomists intend to classify positions across the Opening, Middlegame, and Ending phases of the game. The taxonomy, in machine-readable form, is to be used to test chess programs and discover which chess concepts continue to elude their designers.⁹

Perhaps of more interest to readers than testing chess programs, the research of A. W. Roesler and S. G. McLellan provides an example of the use of taxonomies to aid in designing on-line help systems. It goes almost without saying that most computer users have used the help systems included with software at some time and wondered at the irony of the system being called "help". Roesler and McLellan quote a work by T. Duffy et al. that identifies the problem in designing on-line help systems as "match[ing] the information provided to users with the different kinds of knowledge that they require."¹⁰ To solve that problem, the researchers set about creating a taxonomy of user questions general enough to be usable in a variety of applications.

Roesler and McLellan had subjects use software while sitting next to a "wizard", an expert on the software being used, whom they could ask for help. All interaction between the users and wizards was videotaped and their questions recorded and studied. The questions were classified into a taxonomy of help content, starting from a base of pre-existing, published taxonomies, which was used to construct a prototype help system. After testing their taxonomy and prototype with users, Roesler and McLellan concluded that "a general taxonomy of information needs and the taxonomy of access methods to particular information types make it easy both for help providers to understand what information they need to supply and for help users to find the help they need quickly."¹¹

Why are Taxonomies Important?

The importance of taxonomies stems from their usefulness in organizing information for use. Taxonomies turn ineffective, almost useless chaos into ordered, quickly referenced information. The advent of precisely ordered universal systems of knowledge organization has greatly aided the advance of science over the last several centuries.

In the development of computer information systems, the use of taxonomies is even more important. Where the human mind is creative, adaptive, and capable of dealing with chaos to some extent, the electronic minds of computers are uncreative, adaptive only so far as they have been programmed to be, and correspondingly limited in their ability to deal with chaos. Taxonomies can be used to order information for computers and make them more efficient. Further, taxonomies can order situations for computers, giving them a tool for understanding input. Closely related to taxonomies in this function are ontologies.

Ontologies

What is an Ontology?

Ontologies are systems of terms and definitions for use in interpreting a given domain. The more general the terms, the greater the number of domains to which the ontology may be applied. Peter Weinstein describes ontologies as "highly expressive representations of objects, classes of objects, and relations in some domain."¹² According to Asunción Gómez-Pérez, ontologies provide a common vocabulary for an area and "define…the meaning of the terms and relations between them."¹³ The sets of terms that make up ontologies provide a common language through which computer systems may communicate.

An Example

The University of Michigan Digital Library

An example of the use of ontologies in computer system design may be found in Weinstein's work with the University of Michigan Digital Library (UMDL). Weinstein and Gene Alloway, a UM librarian, were faced with the daunting task of redesigning the information systems of the UMDL. With the growth of the Internet and other electronic means of communication, the quantity of information available for classification has grown with staggering speed. Recognizing that containing the flood of information within the traditional library framework would be impossible, especially given the constantly changing needs of the users, Weinstein and Alloway sought to develop an automated information system which would be capable of handling the task.

Weinstein proposed to use electronic agents to handle the classification of new data. The agents were to be guided by ontologies carefully designed both to guide actions in the domain and to be extensible as a means of dealing with the UMDL's changing needs. Because the ontologies used are small yet capable of growing over time, Weinstein refers to them as "seed ontologies".¹⁴

As shown below, Weinstein's ontology makes use of a hierarchy of descriptors to classify a work. Each concept in the hierarchy represents a stage in the creation of the work from CONCEPTION to INSTANCE and beyond, with each concept inheriting from its predecessor. Each concept also has a number of characteristics, such as PUBLISHING-FORMAT and GENRE, that further describe the work and have their own hierarchies of subordinate concepts.

Concept	Definition	Example
CONCEPTION	A concept, plan, or design for work.	An idea for a story.
EXPRESSION	Work with specified content.	A manuscript for a novel.
MANIFESTATION	An expression packaged in a publishing format.	A published edition of the novel.
DIGITIZATION	A manifestation encoded in a digital format.	The novel in SGML format.
INSTANCE	A particular copy of a digitization.	A copy of the SGML file.
A hierarchy for the realization of work in the UMDL ontology.

The ontological structure describing the realization of work in the UMDL ontology.¹⁵

The structure of GENRE in the UMDL ontology.¹⁶
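
The idea of each concept inheriting from its predecessor maps naturally onto class inheritance. The Python below is a rough sketch, not Weinstein's representation: the five concept names and their definitions come from the table above, but the text does not say where each characteristic attaches, so placing `genre` on CONCEPTION and `publishing_format` on MANIFESTATION is an assumption, as is the helper `realization_chain`.

```python
class Conception:
    """A concept, plan, or design for work."""
    genre = None  # assumed placement of the GENRE characteristic


class Expression(Conception):
    """Work with specified content."""


class Manifestation(Expression):
    """An expression packaged in a publishing format."""
    publishing_format = None  # assumed placement of PUBLISHING-FORMAT


class Digitization(Manifestation):
    """A manifestation encoded in a digital format."""


class Instance(Digitization):
    """A particular copy of a digitization."""


def realization_chain(concept):
    """List concept names from the given class back up to CONCEPTION."""
    return [cls.__name__.upper() for cls in concept.__mro__ if cls is not object]


print(" <- ".join(realization_chain(Instance)))
```

Under this arrangement a characteristic declared early in the chain is available to every later stage, so an INSTANCE carries a genre without the attribute being declared on it again.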

What are Ontologies Good For?

So, now that Weinstein and Alloway have invested great amounts of time and energy in the creation of their ontology, what is the thing good for? Ontologies provide a framework for interpreting an informational domain. They allow information systems to digest the data in a domain of interest more easily and efficiently. The use of ontologies makes it possible for intelligent agents to understand a domain well enough to automate tedious or time-consuming functions which would otherwise require a great deal of human labor.

In an article for Intelligence, Tim Menzies lists benefits in the following areas when using ontologies:¹⁷

Interoperability
Browsing and searching
Reuse
Structuring

According to Menzies, the advantages in the area of interoperability come from the ability of different components using the same ontology to "design a mapping between different concepts in different components." In searching, an ontology's metaknowledge can assist an agent in finding data to fulfill a query, returning not just the results of a literal search but also searching for related concepts. The use of pre-existing ontologies can save resources in developing applications and databases. Finally, Menzies writes that "using the conceptualizations in ontologies to assist you in structuring the knowledge in a new domain" may speed the development of new systems.¹⁸
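
The searching benefit, returning related concepts as well as literal matches, can be sketched as a simple query expansion. Everything in the Python below is invented for illustration: the `RELATED` links, the `DOCUMENTS` collection, and the `expand` and `search` functions are hypothetical and are not taken from Menzies or from the UMDL.

```python
# Hypothetical fragment of an ontology: each term lists related concepts.
RELATED = {
    "novel": ["fiction", "book"],
    "fiction": ["short story"],
}

# Hypothetical document collection, keyed by an invented identifier.
DOCUMENTS = {
    "doc1": "a novel in SGML format",
    "doc2": "a collection of fiction",
    "doc3": "a short story anthology",
    "doc4": "a chess manual",
}


def expand(term, depth=1):
    """Return the term plus concepts reachable within `depth` relation hops."""
    terms = {term}
    frontier = {term}
    for _ in range(depth):
        frontier = {r for t in frontier for r in RELATED.get(t, [])} - terms
        terms |= frontier
    return terms


def search(term, depth=1):
    """Return ids of documents mentioning the term or a related concept."""
    wanted = expand(term, depth)
    return sorted(doc_id for doc_id, text in DOCUMENTS.items()
                  if any(w in text for w in wanted))


print(search("novel", depth=0))  # literal search only
print(search("novel", depth=1))  # literal search plus directly related concepts
```

Matching here is plain substring comparison, which is enough to show the effect: the literal query finds one document, while the expanded query also finds the document filed under a related concept.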

Why are Ontologies Important?

The use of intelligent systems such as computer agents is becoming more common. As mentioned above in the UMDL example, the amount of data becoming available as a result of the revolution in data technology and information systems is enough to overwhelm traditional methods of data ordering completely, resulting in a need for agents. Increasing automation in areas such as manufacturing and shipping and the sending of probes into extreme environments such as space or the deep sea are likewise encouraging the use of agents. Agents must have some ground for interpreting their environment, and ontologies provide that. Without ontologies, the development of intelligent systems would be slowed at best and almost impossible at worst.

The Relationship Between Taxonomies and Ontologies

Taxonomies and ontologies are closely linked. Gómez-Pérez writes that ontologies are usually organized in taxonomies.¹⁹ Certainly the ontology developed by Weinstein takes the form of a hierarchical, multi-leveled taxonomy similar to the biological and Marney-Smith taxonomies listed above.

A taxonomy subdivides a domain into classes such as "hominidae" or "fetch-static, decode-static, execute-static, retire-dynamic", often with inheritance relationships between some of the classes. Ontologies similarly subdivide the domain of interest using classes and relationships, but they also allow for other types of relationships, functions, and axioms. Where taxonomies divide the domain as a means of making sense out of it, ontologies not only subdivide the domain but also provide a means and method for interacting with and shaping the domain. The relationship between the two seems to be of two varieties. From the view of both constructions subdividing the domain and attempting to make sense of it, an ontology could be seen as a specialized and unusually active taxonomy. From the view of ontologies using subdivisions of the domain in their analysis of and interaction with the domain, taxonomies may be seen as an element within ontologies and a tool without which the ontologies could not function.
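
The second view, a taxonomy as one element inside an ontology, can be made concrete with a small sketch. The Python below is illustrative only: the terms in `IS_A`, the entries in `RELATIONS`, and the single axiom implemented by `allowed` are all assumptions chosen for the example and do not come from any ontology cited in this paper.

```python
# Taxonomy part: is-a links only (child -> parent), with hypothetical terms.
IS_A = {"novel": "book", "book": "work", "journal": "work"}

# Ontology part: additional typed relations declared between classes.
RELATIONS = [
    ("author", "writes", "work"),
    ("library", "holds", "book"),
]


def ancestors(term):
    """Walk the is-a links upward and return every broader class."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_a(term, cls):
    """True if `term` is `cls` itself or one of its subclasses."""
    return term == cls or cls in ancestors(term)


def allowed(subject, relation, obj):
    """Assumed axiom: a relation declared for a class also holds for its subclasses."""
    return any(rel == relation and is_a(subject, s) and is_a(obj, o)
               for s, rel, o in RELATIONS)


print(allowed("library", "holds", "novel"))
print(allowed("library", "holds", "journal"))
```

The `IS_A` table alone is a taxonomy. The relations and the axiom are what the ontology adds, and the axiom cannot be evaluated without consulting the taxonomy underneath it.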

Conclusion

The purpose of this paper was to present an introduction to the concepts of taxonomy and ontology for use by persons new to the subject. In pursuing this goal, we have seen that taxonomies are an effective means of subdividing a domain of interest to make it more easily comprehensible. The importance of taxonomies to human and machine thought has also been discussed. Similarly, it has been shown that ontologies are a means of generically describing a domain of data using a common vocabulary accessible to different agents. The increasing importance of ontologies with the growth in the use of intelligent agents was also discussed. Conceptual links between taxonomies and ontologies were shown to exist. Afterwards, it was shown to be possible to view ontologies as unusually elaborate taxonomies possessed of certain extra functions and to view taxonomies as vital elements within and tools of ontologies.

Notes
¹ Scheltema, p. 3 of printout.
² Franklin, p. 6 or 10 on printout.
³ Marney, pp. 182-185. See also attachment 1.
⁴ Sano, pp. 60-61.
⁵ That is, bluebonnets, poison ivy, shepherd's purse, cottonwood trees, and pecan trees.
⁶ Banks, p. 1 of printout.
⁷ Berliner, pp. 336-338.
⁸ Berliner, pp. 336-338.
⁹ Berliner, p. 338.
¹⁰ Roesler, p. 2 of printout.
¹¹ Roesler, p. 6 of printout.
¹² Weinstein, p. 83.
¹³ Gómez-Pérez, p. 1 of printout.
¹⁴ Weinstein, pp. 82-83.
¹⁵ Weinstein, p. 85 and http://www.umich.edu/~peterw/Ontology/ontology.html
¹⁶ Weinstein, p. 85 and http://www.umich.edu/~peterw/Ontology/ontology.html
¹⁷ Menzies, p. 27.
¹⁸ Menzies, p. 27.
¹⁹ Gómez-Pérez, p. 1 of printout.

Bibliography

All articles listed as coming from the ACM were downloaded from the ACM's archives, available through the UTA library's website. In many cases, no other information as to the provenance and publication history of the article came with it. In a few cases, no information on the date of copyright or publisher is available.

Banks, Jerry. "Let's Talk Taxonomy" IIE Solutions, V. 31 No. 6, p. 17, June 1999.

Bergamaschi, Sonia, and Sartori, Claudio. "On Taxonomic Reasoning in Conceptual Design". ACM 1992.

Berliner, Hans; Kopec, Danny; and Northam, Ed. "A Taxonomy of Concepts for Evaluating Chess Strength" Copyright IEEE, date unknown. The article was downloaded from the IEEE's archives.

Bieszczad, Andrzej; Biswas, Pratik K.; Buga, Walter; Malek, Manu; and Tan, Hai. "Management of Heterogeneous Networks with Intelligent Agents". Bell Labs Technical Journal Vol. 4 No. 4 pp. 109-35. October-December 1999.

Chemij, Wasel. "Vas's Parallel Computer Taxonomy Page" Chapter 7 of the author's M. Phil. Thesis. 1997. http://www.gigaflop.demon.co.uk/comp/chapt7.htm

Faro, Alberto, and Giordano, Daniela. "Ontology, aestetics and creativity at the crossroad in Information System design". ACM 1999.

Franklin, Stan, and Graesser, Art. "Is it an Agent, or Just a Program?: A Taxonomy for Autonomous Agents". Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages, 1996.

Frantz, Frederick K. "A Taxonomy of Model Abstraction Techniques" Proceedings of the 1995 Winter Simulation Conference. 1995.

Gunter, Bert. "Tree-Based Classification and Regression Part 3: Tree-Based Procedures"

Gómez-Pérez, Asunción, and Benjamins, V. Richard. "Applications of Ontologies and Problem-Solving Methods". AI Magazine V. 20 No. 1, pp. 119-22. Spring 1999.

Keshav, R., and Gamble, R. "Towards a Taxonomy of Architecture Integration Strategies" ACM 1998.

Marney, Milton, and Smith, Nicholas. "The Domain of Adaptive Systems: A Rudimentary Taxonomy". The General Theory of Systems Applied to Management and Organization.

Menzies, Tim. "Cost Benefits of Ontologies". Intelligence. Fall 1999.

Oman, Paul W. and Cook, Curtis R. "A Taxonomy of Programming Style". ACM 1990.

Rademacher, Robert A. "Applying Bloom's Taxonomy of Cognition To Knowledge Management Systems". ACM 1999.

Roesler, A. W., and McLellan, S. G. "What Help Do Users Need?: Taxonomies for On-line Information Needs & Access Methods." Date unlisted. Available from http://www.acm.org

Sano, Barton, and Despain, Alvin. "The 16-Fold Way: A Microparallel Taxonomy" Advanced Computer Architecture Laboratory, University of Southern California Los Angeles. 1993.

Scheltema, Rudolf S. "Describing Diversity: Too Many New Species, Too Few Taxonomists" Oceanus V. 39, p. 16-18, Spring/Summer 1996.

Weinstein, Peter, and Alloway, Gene. "Seed Ontologies: growing digital libraries as distributed, intelligent systems" ACM 1997.