Introduction to Bio-Ontologies

by Peter Robinson and Sebastian Bauer

Abstract

In recent years the biological sciences have generated very large, complex data sets whose management, analysis and sharing have created unprecedented challenges. The development of ontologies, originally driven by the invention of the Semantic Web, has been critical in handling these data and permitting interoperability between databases and between applications. The book emphasizes computational and algorithmic issues surrounding bio-ontologies, and additionally offers a number of exercises and tutorials designed to help readers use software such as Protégé and applications designed especially for OBO ontologies, including the Ontologizer, which was developed by the authors. The book provides readers with the foundation to use ontologies as a starting point for new bioinformatics research projects or to support current molecular genetics research projects. By supplying a self-contained introduction to OBO ontologies and the Semantic Web, it bridges the gap between the two fields and helps readers see what each can contribute to the analysis and understanding of biomedical data.

The first part of the book defines ontology and bio-ontologies. It also explains the importance of mathematical logic for understanding concepts of inference in bio-ontologies, discusses the probability and statistics topics necessary for understanding ontology algorithms, and describes ontology languages, including OBO (the preeminent language for bio-ontologies), RDF, RDFS, and OWL (the languages of the Semantic Web).

The second part covers significant bio-ontologies and their applications. The book presents the Gene Ontology; upper-level ontologies, such as the Basic Formal Ontology and the Relation Ontology; and current bio-ontologies, including several anatomy ontologies, Chemical Entities of Biological Interest, Sequence Ontology, Mammalian Phenotype Ontology, and Human Phenotype Ontology.

The third part of the text introduces the major graph-based algorithms for bio-ontologies. The authors discuss how these algorithms are used in overrepresentation analysis, model-based procedures, semantic similarity analysis, and Bayesian networks for molecular biology and biomedical applications.

The fourth and final part of the book describes the ontology languages of the Semantic Web and their applications for inference. It covers the formal semantics of RDF and RDFS, OWL inference rules, a key inference algorithm, the SPARQL query language, and the state of the art for querying OWL ontologies.

Supplementary Materials

This section will be extended in the future.
Robo package
robo-0.1.tar.bz2 (2011/07/04)
The robo package can be used to read OBO and GAF files for analyzing gene sets. After downloading the archive, install it by entering the following at the command-line prompt:

    R CMD INSTALL robo-0.1.tar.bz2
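
For readers unfamiliar with the OBO format, the following is a minimal sketch in Python of what reading the [Term] stanzas of an OBO file involves. It is not part of robo and does not reflect robo's interface. The function name parse_obo_terms, the sample text, and the EX: identifiers are all made up for illustration, and the parser ignores most of the format (escapes, trailing modifiers, and any "!" that appears inside a value).

```python
from collections import defaultdict

# Tiny hand-written OBO fragment for illustration only (not real ontology data).
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0000001
name: root term

[Term]
id: EX:0000002
name: child term
is_a: EX:0000001 ! root term

[Typedef]
id: part_of
name: part of
"""


def parse_obo_terms(text):
    """Return {term_id: {tag: [values]}} for every [Term] stanza in text.

    Hypothetical helper: a simplified reader, not a full OBO parser.
    """
    terms = {}
    current = None  # tag/value mapping of the [Term] stanza being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and whole-line comments
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins; only [Term] stanzas are collected.
            current = defaultdict(list) if line == "[Term]" else None
            continue
        if current is None:
            continue  # header lines, or lines inside a non-Term stanza
        tag, sep, value = line.partition(":")
        if not sep:
            continue  # not a "tag: value" line
        tag = tag.strip()
        value = value.split(" ! ")[0].strip()  # drop a trailing "! comment"
        current[tag].append(value)
        if tag == "id":
            terms[value] = current
    return terms


terms = parse_obo_terms(SAMPLE_OBO)
# Parent links (is_a) are what graph-based analyses traverse.
parents = {term_id: tags.get("is_a", []) for term_id, tags in terms.items()}
```

Here parents maps each term identifier to the identifiers of its is_a parents, which is the directed graph that analyses of annotated gene sets work on.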

Contact

Feedback is highly appreciated. Please send it to Peter.Robinson@charite.de and Sebastian.Bauer@charite.de.