BioPAX and SBML

From HMS Genetics Department Wiki

Using BioPAX to annotate and extend SBML

Although BioPAX is intended mainly for the exchange of data between biological pathway databases, a wide array of additional uses is possible. What follows is an example of how BioPAX could be used to provide additional annotation on a set of Systems Biology Markup Language (SBML) data. SBML is an XML-based data format designed to exchange pathway models between software simulation packages.

Suppose you are given an SBML model with human-readable annotations, as shown below:

<sbml>
   <model>
       <listOfSpecies>
           <species id='pyruvate'>
                <notes> I grabbed compound pyruvic acid from KEGG. </notes>
           </species>
           <species id='cyclin'>
                <notes> I grabbed the protein cyclin A1 from EcoCyc. </notes>
           </species>
       </listOfSpecies>
   </model>
</sbml>

To a human familiar with the biochemistry, it is clear that 'pyruvate' is a small molecule and 'cyclin' is a protein. A human could also infer that another name for pyruvate is 'pyruvic acid' and another name for 'cyclin' is 'cyclin A1'. Without too much trouble, a human could look up additional details on these bioentities, or species as they are called in SBML, from their respective databases. But what happens when you have hundreds or thousands of species upon which to perform these operations? This is when the data integration problem becomes apparent. BioPAX can help take human-readable notes and produce machine-understandable annotations that can be used for query and inference.


Extending SBML with BioPAX metadata


In the SBML model shown above, each human-readable note gives the type of the species, a synonym for the species and the database from which the species came. To make this information machine understandable, the SBML metaid attribute and the RDF rdf:ID attribute must be identical, so that the annotation can be linked to the species. The example below shows how to annotate the data type of each species with a BioPAX class:

 <model xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
        xmlns:biopax='http://www.biopax.org/release/biopax-level1.owl#'>
     <listOfSpecies>
         <species id='pyruvate' metaid='pyruvate'>
             <notes>
                 I grabbed compound pyruvic acid from KEGG.
             </notes>
             <annotation>
                 <biopax:smallMolecule rdf:ID='pyruvate'/>
             </annotation>
         </species>
         <species id='cyclin' metaid='cyclin'>
             <notes>
                 I grabbed the protein cyclin A1 from EcoCyc.
             </notes>
             <annotation>
                 <biopax:protein rdf:ID='cyclin'/>
             </annotation>
         </species>
     </listOfSpecies>
 </model>
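
Once the type is expressed as a BioPAX class, a program can read it directly instead of parsing the free-text note. The following is a minimal sketch in Python using only the standard library; the function name species_types and the trimmed-down model string are illustrative assumptions, not part of BioPAX or SBML. It strips any leading '#' from rdf:ID before comparing it with metaid, so either spelling is matched.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BIOPAX = "http://www.biopax.org/release/biopax-level1.owl#"

# Trimmed copy of the annotated model (notes omitted for brevity).
SBML = """
<model xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
       xmlns:biopax='http://www.biopax.org/release/biopax-level1.owl#'>
  <listOfSpecies>
    <species id='pyruvate' metaid='pyruvate'>
      <annotation>
        <biopax:smallMolecule rdf:ID='pyruvate'/>
      </annotation>
    </species>
    <species id='cyclin' metaid='cyclin'>
      <annotation>
        <biopax:protein rdf:ID='cyclin'/>
      </annotation>
    </species>
  </listOfSpecies>
</model>
"""


def species_types(xml_text):
    """Map each species metaid to the BioPAX class named in its annotation."""
    root = ET.fromstring(xml_text)
    types = {}
    for species in root.iter("species"):
        metaid = species.get("metaid")
        annotation = species.find("annotation")
        if annotation is None:
            continue
        for element in annotation:
            # ElementTree spells qualified names as '{namespace}local'.
            if not element.tag.startswith("{" + BIOPAX + "}"):
                continue
            # Tolerate a leading '#' so both spellings of the ID are matched.
            rdf_id = element.get("{" + RDF + "}ID", "").lstrip("#")
            if rdf_id == metaid:
                types[metaid] = element.tag.split("}", 1)[1]
    return types


print(species_types(SBML))
# → {'pyruvate': 'smallMolecule', 'cyclin': 'protein'}
```

The link between the species and its annotation is made only through the matching metaid and rdf:ID values, which is why the sketch ignores annotation elements whose ID differs.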

It is also possible to represent the synonyms for each species with BioPAX properties:

 <model xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
        xmlns:biopax='http://www.biopax.org/release/biopax-level1.owl#'>
     <listOfSpecies>
         <species id='pyruvate' metaid='pyruvate'>
             <notes>
                 I grabbed compound pyruvic acid from KEGG.
             </notes>
             <annotation>
                 <biopax:smallMolecule rdf:ID='pyruvate'>
                     <biopax:SYNONYMS>pyruvic acid</biopax:SYNONYMS>
                     <biopax:SYNONYMS>pyroracemic acid</biopax:SYNONYMS>
                     <biopax:SYNONYMS>2-oxopropanoic acid</biopax:SYNONYMS>
                     <biopax:SYNONYMS>BTS</biopax:SYNONYMS>
                 </biopax:smallMolecule>
             </annotation>
         </species>
         <species id='cyclin' metaid='cyclin'>
             <notes>
                 I grabbed the protein cyclin A1 from EcoCyc.
             </notes>
             <annotation>
                 <biopax:protein rdf:ID='cyclin'>
                     <biopax:SYNONYMS>Cyclin A1</biopax:SYNONYMS>
                 </biopax:protein>
             </annotation>
         </species>
     </listOfSpecies>
 </model>
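
With synonyms stored as BioPAX properties, a lookup from any known name to the species becomes a simple query. The following is a minimal sketch in Python using only the standard library; the function name build_synonym_index and the shortened model string are illustrative assumptions, not part of either standard.

```python
import xml.etree.ElementTree as ET

# Prefix map used to resolve 'biopax:' in the path expression below.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "biopax": "http://www.biopax.org/release/biopax-level1.owl#",
}

# Shortened copy of the annotated model (notes and some synonyms omitted).
SBML = """
<model xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
       xmlns:biopax='http://www.biopax.org/release/biopax-level1.owl#'>
  <listOfSpecies>
    <species id='pyruvate' metaid='pyruvate'>
      <annotation>
        <biopax:smallMolecule rdf:ID='pyruvate'>
          <biopax:SYNONYMS>pyruvic acid</biopax:SYNONYMS>
          <biopax:SYNONYMS>BTS</biopax:SYNONYMS>
        </biopax:smallMolecule>
      </annotation>
    </species>
    <species id='cyclin' metaid='cyclin'>
      <annotation>
        <biopax:protein rdf:ID='cyclin'>
          <biopax:SYNONYMS>Cyclin A1</biopax:SYNONYMS>
        </biopax:protein>
      </annotation>
    </species>
  </listOfSpecies>
</model>
"""


def build_synonym_index(xml_text):
    """Map every lower-cased species id or synonym to its species id."""
    root = ET.fromstring(xml_text)
    index = {}
    for species in root.iter("species"):
        species_id = species.get("id")
        index[species_id.lower()] = species_id
        # Synonyms sit one level below the annotation, inside the BioPAX entity.
        for synonym in species.iterfind("annotation/*/biopax:SYNONYMS", NS):
            index[synonym.text.strip().lower()] = species_id
    return index


index = build_synonym_index(SBML)
print(index["pyruvic acid"])  # → pyruvate
print(index["cyclin a1"])     # → cyclin
```

Lower-casing the keys is a design choice of this sketch so that 'Cyclin A1' and 'cyclin A1' resolve to the same species; a real application might need stricter matching.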

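Finally, the databases from which each species came can be referenced using BioPAX Xrefs. Here the pyruvate entry points to LIGAND, the KEGG database of chemical compounds and reactions: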

 <model xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
        xmlns:biopax='http://www.biopax.org/release/biopax-level1.owl#'>
     <listOfSpecies>
         <species id='pyruvate' metaid='pyruvate'>
             <notes>
                 I grabbed compound pyruvic acid from KEGG.
             </notes>
             <annotation>
                 <biopax:smallMolecule rdf:ID='pyruvate'>
                     <biopax:SYNONYMS>pyruvic acid</biopax:SYNONYMS>
                     <biopax:SYNONYMS>pyroracemic acid</biopax:SYNONYMS>
                     <biopax:SYNONYMS>2-oxopropanoic acid</biopax:SYNONYMS>
                     <biopax:SYNONYMS>BTS</biopax:SYNONYMS>
                     <biopax:XREF rdf:resource='#unificationXref119'/>
                 </biopax:smallMolecule>
                 <biopax:unificationXref rdf:ID='unificationXref119'>
                     <biopax:DB>LIGAND</biopax:DB>
                     <biopax:ID>c00022</biopax:ID>
                 </biopax:unificationXref>
             </annotation>
         </species>
         <species id='cyclin' metaid='cyclin'>
             <notes>
                 I grabbed the protein cyclin A1 from EcoCyc.
             </notes>
             <annotation>
                 <biopax:protein rdf:ID='cyclin'>
                     <biopax:SYNONYMS>Cyclin A1</biopax:SYNONYMS>
                     <biopax:XREF rdf:resource='#unificationXref12'/>
                 </biopax:protein>
                 <biopax:relationshipXref rdf:ID='unificationXref12'>
                     <biopax:DB>EcoO157Cyc</biopax:DB>
                     <biopax:DB-VERSION>46</biopax:DB-VERSION>
                     <biopax:RELATIONSHIP-TYPE>Similar protein</biopax:RELATIONSHIP-TYPE>
                     <biopax:ID>P18606</biopax:ID>
                 </biopax:relationshipXref>
             </annotation>
         </species>
     </listOfSpecies>
 </model>
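
Each XREF property points, through rdf:resource, at an Xref element carrying the database name and identifier. Following that pointer can be sketched in Python with the standard library as shown below; the function name species_xrefs and the shortened model string are illustrative assumptions. The sketch strips a leading '#' from both rdf:resource and rdf:ID before matching them.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "biopax": "http://www.biopax.org/release/biopax-level1.owl#",
}
RDF_ID = "{" + NS["rdf"] + "}ID"
RDF_RESOURCE = "{" + NS["rdf"] + "}resource"

# Shortened copy of the annotated model (notes and synonyms omitted).
SBML = """
<model xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
       xmlns:biopax='http://www.biopax.org/release/biopax-level1.owl#'>
  <listOfSpecies>
    <species id='pyruvate' metaid='pyruvate'>
      <annotation>
        <biopax:smallMolecule rdf:ID='pyruvate'>
          <biopax:XREF rdf:resource='#unificationXref119'/>
        </biopax:smallMolecule>
        <biopax:unificationXref rdf:ID='unificationXref119'>
          <biopax:DB>LIGAND</biopax:DB>
          <biopax:ID>c00022</biopax:ID>
        </biopax:unificationXref>
      </annotation>
    </species>
    <species id='cyclin' metaid='cyclin'>
      <annotation>
        <biopax:protein rdf:ID='cyclin'>
          <biopax:XREF rdf:resource='#unificationXref12'/>
        </biopax:protein>
        <biopax:relationshipXref rdf:ID='unificationXref12'>
          <biopax:DB>EcoO157Cyc</biopax:DB>
          <biopax:ID>P18606</biopax:ID>
        </biopax:relationshipXref>
      </annotation>
    </species>
  </listOfSpecies>
</model>
"""


def species_xrefs(xml_text):
    """Return {species id: [(xref class, database, identifier), ...]}."""
    root = ET.fromstring(xml_text)
    result = {}
    for species in root.iter("species"):
        annotation = species.find("annotation")
        if annotation is None:
            continue
        # Index the Xref records in this annotation by their rdf:ID.
        records = {}
        for element in annotation:
            rdf_id = element.get(RDF_ID)
            if rdf_id and element.find("biopax:DB", NS) is not None:
                records[rdf_id.lstrip("#")] = element
        # Follow each XREF pointer from the entity to its record.
        refs = []
        for link in annotation.iterfind("*/biopax:XREF", NS):
            record = records.get(link.get(RDF_RESOURCE, "").lstrip("#"))
            if record is None:
                continue
            refs.append((
                record.tag.split("}", 1)[1],
                record.findtext("biopax:DB", namespaces=NS).strip(),
                record.findtext("biopax:ID", namespaces=NS).strip(),
            ))
        result[species.get("id")] = refs
    return result


print(species_xrefs(SBML))
```

The class of each record is kept in the result because it distinguishes a unificationXref, which identifies the same entity in another database, from a relationshipXref, which only relates the species to an external entry.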

Through this mechanism, data types, synonyms and external references can be added to SBML; thus, one standard can be used to extend and enhance another.

