Open Conference Systems, 2010 Informatics for Phylogenetics, Evolution, and Biodiversity (iEvoBio 2010)

CDAO-Store: A New Vision for Data Integration
Brandon Chisham, Trung Le, Enrico Pontelli, Tran Son, Ben Wright

Last modified: 2010-06-26

Abstract


The Comparative Data Analysis Ontology (CDAO)1 is an ontology developed as part of the EvoInfo2 and EvoIO3
groups, supported by NESCent4, to provide semantics for the descriptions of data and transformations commonly found
in the domain of phylogenetic inference. The core concepts of the ontology enable the description of phylogenetic
trees and associated character data matrices.
CDAO-store is a repository providing a rich set of APIs for querying and visualizing phyloinformatics data. The
store is a triple-store that encodes data as RDF triples constructed according to the CDAO concept vocabulary.
CDAO-store provides three classes of services: a service for importing data in CDAO format, a PhyloWS interface
supporting an advanced set of queries for external applications, and a web interface for interacting with data
in the store.
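To make the triple encoding concrete, the following minimal sketch turns a small rooted tree into subject-predicate-object triples and prints them in N-Triples syntax. The namespaces and the term names (`Tree`, `Node`, `belongs_to_tree`, `has_parent`) are hypothetical placeholders chosen for illustration; they are not the actual CDAO identifiers, and this is not the store's own import code.

```python
# Placeholder namespaces: assumptions for this example, not real CDAO IRIs.
EX = "http://example.org/tree1#"      # resources of the example tree
VOCAB = "http://example.org/vocab#"   # stands in for the ontology namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def tree_to_triples(tree_id, parent_of):
    """Return sorted (subject, predicate, object) triples for a rooted tree.

    parent_of maps each non-root node name to the name of its parent.
    """
    tree = EX + tree_id
    triples = {(tree, RDF_TYPE, VOCAB + "Tree")}
    nodes = set(parent_of) | set(parent_of.values())
    for node in nodes:
        # Type every node and attach it to the tree.
        triples.add((EX + node, RDF_TYPE, VOCAB + "Node"))
        triples.add((EX + node, VOCAB + "belongs_to_tree", tree))
    for child, parent in parent_of.items():
        # One triple per parent-child edge.
        triples.add((EX + child, VOCAB + "has_parent", EX + parent))
    return sorted(triples)


def to_ntriples(triples):
    """Serialize IRI-only triples, one N-Triples statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


parent_of = {"A": "n1", "B": "n1", "n1": "root", "C": "root"}
triples = tree_to_triples("tree1", parent_of)
print(to_ntriples(triples))
```

A set is used while building so that repeated statements collapse, which mirrors how a triple-store treats a graph as a set of triples.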
The import feature is flexible and allows importing not only raw data but also arbitrary annotations.
So far, we have imported all of the trees in the TreeBASE dump dated January 2009. The corresponding
annotations relating these trees to studies and their authors have also been imported. We also provide a translation
service that converts PHYLIP, NEXUS, MEGA, and NeXML data into CDAO format for the import service.
The store provides a PhyloWS interface for programmatic access to data. The interface supports retrieving trees
by id, finding the nearest common ancestor of a set of taxa in a tree, and finding the minimum spanning clade for
a set of taxa. Results are returned as CDAO documents in RDF/XML format. Additionally, the interface supports
retrieving matrices. The interface also supports extracting lists of trees that match certain structural criteria, such
as the number of nodes, leaves, or internal nodes, or the radius or diameter of the tree.
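The structural criteria listed above can be illustrated with a short sketch that computes them for a rooted tree and filters a collection of trees. It assumes that radius and diameter are measured in number of edges (ignoring branch lengths), that each tree has at least one edge, and that a tree is given as a child-to-parent map; the function name `tree_stats` and the sample trees are hypothetical and say nothing about how the store evaluates these queries.

```python
from collections import deque


def tree_stats(parent_of):
    """Compute node counts, radius, and diameter of a rooted tree.

    parent_of maps each non-root node to its parent. Distances are
    counted in edges, which is an assumption of this sketch.
    """
    nodes = set(parent_of) | set(parent_of.values())
    parents = set(parent_of.values())
    leaves = nodes - parents  # nodes that are nobody's parent

    # Undirected adjacency, needed for path lengths between any two nodes.
    adjacent = {n: set() for n in nodes}
    for child, parent in parent_of.items():
        adjacent[child].add(parent)
        adjacent[parent].add(child)

    def eccentricity(start):
        # Breadth-first search: greatest distance from start to any node.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in adjacent[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        return max(dist.values())

    ecc = [eccentricity(n) for n in nodes]
    return {
        "nodes": len(nodes),
        "leaves": len(leaves),
        "internal": len(nodes) - len(leaves),
        "radius": min(ecc),    # smallest eccentricity
        "diameter": max(ecc),  # largest eccentricity
    }


trees = {
    "t1": {"A": "n1", "B": "n1", "n1": "root", "C": "root"},
    "t2": {"X": "r", "Y": "r"},
}
three_leaf_trees = [name for name, t in trees.items()
                    if tree_stats(t)["leaves"] == 3]
print(three_leaf_trees)
```

Running a search from every node is quadratic in tree size; it keeps the sketch short and is adequate for small examples.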
We have also implemented visualization tools for both trees and matrices. These tools allow users to view the
structure of the data and provide some interaction with it. For example, users can select particular columns
and rows from a matrix to examine only part of it more closely. Users can also view a matrix as a color-coded figure.
With trees, users can select particular nodes in the tree and view related nodes. This last feature is currently under
development and is based on the prefuse framework found at http://prefuse.org/download/.
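Selecting rows and columns of a character matrix amounts to taking a submatrix. The sketch below assumes a matrix stored as a mapping from taxon name to a list of character states; this layout and the function name `select_submatrix` are illustrative assumptions, not the data model used by the visualization tools.

```python
def select_submatrix(matrix, rows, columns):
    """Keep only the requested taxa (rows) and character indices (columns)."""
    return {taxon: [matrix[taxon][c] for c in columns] for taxon in rows}


matrix = {
    "A": list("ACGT"),
    "B": list("ACGA"),
    "C": list("TCGA"),
}
subset = select_submatrix(matrix, rows=["A", "C"], columns=[0, 3])
print(subset)
```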
The web interface supports searching for trees by TreeBASE accession number, phylogenetic method, construction
algorithm, study, author, or taxonomic identifier. In addition, it supports display of basic query results on the web,
such as the nearest common ancestor or the minimum spanning clade of a set of nodes in a tree.
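The two tree queries named above can be sketched as follows. The sketch assumes that the minimum spanning clade of a set of nodes is the subtree rooted at their nearest common ancestor, and that a tree is given as a child-to-parent map; the function names and the sample tree are hypothetical, and the store's own implementation may differ.

```python
def path_to_root(node, parent_of):
    """Return the list of nodes from node up to the root, inclusive."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


def nearest_common_ancestor(taxa, parent_of):
    """Return the deepest node that is an ancestor of (or equal to) every taxon."""
    taxa = list(taxa)
    if not taxa:
        raise ValueError("at least one taxon is required")
    common = set(path_to_root(taxa[0], parent_of))
    for taxon in taxa[1:]:
        common &= set(path_to_root(taxon, parent_of))
    # Walking upward from the first taxon, the first shared node is the deepest.
    for node in path_to_root(taxa[0], parent_of):
        if node in common:
            return node
    raise ValueError("taxa do not share an ancestor")


def minimum_spanning_clade(taxa, parent_of):
    """Return (clade_root, child-to-parent map of the subtree below it)."""
    root = nearest_common_ancestor(taxa, parent_of)
    clade = {
        child: parent
        for child, parent in parent_of.items()
        # Keep an edge when the clade root is a strict ancestor of the child.
        if root in path_to_root(child, parent_of)[1:]
    }
    return root, clade


parent_of = {"A": "n1", "B": "n1", "n1": "n2", "C": "n2", "n2": "root", "D": "root"}
print(nearest_common_ancestor(["A", "C"], parent_of))
print(minimum_spanning_clade(["A", "B"], parent_of))
```

Note that the clade for taxa A and C also contains B, because B descends from their nearest common ancestor.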
The CDAO-store is available at http://www.cs.nmsu.edu/~cdaostore/. The tool set, including the translator, is
licensed under the GPL and is available on SourceForge at http://cdaotools.sourceforge.net/.

1 www.evolutionaryontology.org
2 https://www.nescent.org/wg_evoinfo/Main_Page
3 http://evoio.org/wiki/Main_Page
4 http://www.nescent.org/index.php

