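= Methodology =
The overall goal is to associate one or more EcoCyc reactions with each iAF1261 reaction using shared genes. Our approach was to partition the unmapped reactions into increasingly finer disjoint sets. There were 4 categories of unmapped reactions:
# Exchange reactions across the system boundary
# Diffusion reactions (facilitated by membrane transport proteins)
# Reactions with one or more unmapped compounds
# Reactions unmapped for other reasons

== The code ==
In Python, generate the EcoCyc-UCSD compound hash table and the UCSD reaction-gene association list as Lisp files.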
>>> rm = ReactionMatcher()
>>> iafdir = '/home/zucker/src/lsw/trunk/bug/iAF1261/'
>>> iAF1261_rxns_file = 'iAF1261-ecocyc-rxn-mappings.txt'
>>> iAF1261_cpds_file = 'iAF1261-ecocyc-cpd-mappings.txt'
>>> ecocyc_ucsd_cpd_map = rm.make_ecocyc_ucsd_cpd_map( iafdir + iAF1261_cpds_file  )
>>> rm.print_ecocyc_ucsd_map_to_lisp( ecocyc_ucsd_cpd_map, iafdir, 'ecocyc-ucsd-map' )
>>> lispfile = rm.print_genes_of_ucsd_rxn_to_lisp( iafdir + iAF1261_rxns_file, iafdir, 'iAF1261-genes-of-ucsd-rxn' )
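To illustrate the kind of file this step hands over to Lisp, here is a minimal sketch of rendering a Python dictionary as Lisp source that fills a hash table. It is not the actual <code>ReactionMatcher</code> implementation: the helper names <code>lisp_string</code> and <code>dict_to_lisp</code>, the file layout, and the two example compound pairs are assumptions made for illustration. Only the variable name <code>*ecocyc-ucsd-cpds*</code> is taken from the Lisp call in this section.

```python
def lisp_string(text):
    """Quote a Python string as a Lisp string literal."""
    return '"' + text.replace('\\', '\\\\').replace('"', '\\"') + '"'


def dict_to_lisp(var_name, mapping):
    """Render a str -> str dict as Lisp forms that populate a hash table."""
    lines = [f"(defparameter {var_name} (make-hash-table :test #'equal))"]
    for key in sorted(mapping):  # sorted so the generated file is reproducible
        lines.append(
            f"(setf (gethash {lisp_string(key)} {var_name}) "
            f"{lisp_string(mapping[key])})"
        )
    return "\n".join(lines) + "\n"


# Illustrative EcoCyc id -> iAF1261 abbreviation pairs (assumed, not read from the real file).
example_map = {"WATER": "h2o", "PYRUVATE": "pyr"}
lisp_source = dict_to_lisp("*ecocyc-ucsd-cpds*", example_map)
print(lisp_source)
```

Using an <code>equal</code> hash table matters here because the keys are strings; the default <code>eql</code> test would not find them.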
In Lisp, generate the list of EcoCyc reactions catalyzed by each gene.
EC> (setq lispdir "/home/zucker/src/lsw/trunk/bug/iAF1261" )
EC> (load (format nil "~A/~A" lispdir "ecocyc-ucsd-cpd-map.lisp"))
EC> (load (format nil "~A/~A" lispdir "iAF1261-genes-of-ucsd-rxn.lisp"))
EC> (load (format nil "~A/~A" lispdir "make-ecocyc-ucsd-map.lisp"))
EC> (match-rxns-by-gene lispdir "iAF1261" *iAF1261-genes-of-ucsd-rxn* *ecocyc-ucsd-cpds*)
In Python, analyze the results. Run this for each category:
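>>> # iAF1261_ecocyc_comments_of_gene is the dictionary from iAF1261-ecocyc-comments-of-gene.py (Stage 2); it must be loaded before this step
>>> comments_iAF1261 = rm.print_comment_and_rxns_of_gene( iAF1261_ecocyc_comments_of_gene, iafdir + 'iAF1261-unmapped-rxns-w-gene-comments.txt' )
>>> # The last argument selects the output: True keeps all gene-sharing EcoCyc reactions, False keeps only the best match
>>> all_iAF1261 = rm.match_rxns_by_gene( iafdir + 'iAF1261-ecocyc-rxns-of-gene.txt', iafdir + iAF1261_rxns_file, ecocyc_ucsd_cpd_map, iafdir + 'iAF1261-matched-rxns-by-gene-all.txt', True )
>>> best_iAF1261 = rm.match_rxns_by_gene( iafdir + 'iAF1261-ecocyc-rxns-of-gene.txt', iafdir + iAF1261_rxns_file, ecocyc_ucsd_cpd_map, iafdir + 'iAF1261-matched-rxns-by-gene-best.txt', False )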
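The difference between the "all" and "best" outputs can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind <code>rm.match_rxns_by_gene</code>: the function names, the in-memory dictionaries, and every identifier in the toy data are hypothetical. The selection rule (the EcoCyc reaction sharing the most substrates) follows the description of the "best" file on this page; how ties are broken is an assumption (the first candidate in sorted order wins).

```python
def candidate_rxns(ucsd_genes, ecocyc_rxns_of_gene):
    """Union of EcoCyc reactions catalyzed by any gene of one UCSD reaction."""
    found = set()
    for gene in ucsd_genes:
        found |= ecocyc_rxns_of_gene.get(gene, set())
    return sorted(found)


def shared_count(ucsd_substrates, ecocyc_substrates, cpd_map):
    """Count common substrates after translating EcoCyc ids to UCSD ids."""
    translated = {cpd_map.get(cpd, cpd) for cpd in ecocyc_substrates}
    return len(translated & set(ucsd_substrates))


def match_by_gene(ucsd_rxn, genes_of_rxn, ecocyc_rxns_of_gene,
                  ucsd_subs, ecocyc_subs, cpd_map, keep_all):
    """Return all gene-sharing EcoCyc reactions, or only the best one."""
    candidates = candidate_rxns(genes_of_rxn[ucsd_rxn], ecocyc_rxns_of_gene)
    if keep_all or not candidates:
        return candidates
    best = max(
        candidates,
        key=lambda rxn: shared_count(ucsd_subs[ucsd_rxn], ecocyc_subs[rxn], cpd_map),
    )
    return [best]


# Toy data; every identifier below is made up for illustration.
genes_of_rxn = {"UCSD_R1": {"g1", "g2"}}
ecocyc_rxns_of_gene = {"g1": {"ECO_A"}, "g2": {"ECO_A", "ECO_B"}}
ucsd_subs = {"UCSD_R1": {"a", "b", "c"}}
ecocyc_subs = {"ECO_A": {"A", "X"}, "ECO_B": {"A", "B", "C"}}
cpd_map = {"A": "a", "B": "b", "C": "c"}

all_matches = match_by_gene("UCSD_R1", genes_of_rxn, ecocyc_rxns_of_gene,
                            ucsd_subs, ecocyc_subs, cpd_map, True)
best_match = match_by_gene("UCSD_R1", genes_of_rxn, ecocyc_rxns_of_gene,
                           ucsd_subs, ecocyc_subs, cpd_map, False)
print(all_matches, best_match)
```

In the toy data both EcoCyc reactions share a gene with the UCSD reaction, but only <code>ECO_B</code> shares all three substrates, so it is the one kept in the "best" result.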
== File Formats ==
=== Stage 1 ===
In the first stage, two Lisp files are generated:
* [[Media:ecocyc-ucsd-map.lisp|All the compound mappings between iAF1261 and EcoCyc]]
* [[Media:iAF1261-genes-of-ucsd-rxn.lisp|All the genes in the file, each mapped to the UCSD reactions that are catalyzed by its products]]

=== Stage 2 ===
In the second stage, two files are generated:
* [[Media:iAF1261-ecocyc-rxns-of-gene.txt|A flat file containing all EcoCyc reactions that are catalyzed by each gene]]
* [[Media:iAF1261-ecocyc-comments-of-gene.py|A Python dictionary where the key is (gene, ucsd_rxn) and the value is the EcoCyc protein comment.]]

=== Stage 3 ===
Whenever an iAF1261 reaction could be matched to an EcoCyc reaction, two lists were included: the EcoCyc substrates that had no map to a substrate in the iAF1261 reaction, and the iAF1261 substrates that had no map to a substrate in the EcoCyc reaction. Whenever possible, the iAF1261 compound abbreviation was used. If an EcoCyc substrate is listed with its EcoCyc identifier, that compound is not mapped to any compound in iAF1261.
* [[Media:iAF1261-unmapped-rxns-w-gene-comments.txt|A list of genes that have UCSD reactions associated with them, but no EcoCyc reaction.]] Shows all genes that have been assigned to at least one reaction in iAF1261, but have not been assigned to any reaction in EcoCyc. Candidate EcoCyc reaction assignments can also be given.
* [[Media:iAF1261-matched-rxns-by-gene-best.txt|A list of UCSD reactions, each with the EcoCyc reaction that shares the most substrates with it.]]
* [[Media:iAF1261-matched-rxns-by-gene-all.txt|A list of UCSD reactions, each with all EcoCyc reactions that share a gene with it.]]

The matched-reaction files above map at least one EcoCyc reaction to an iAF1261 reaction by gene name. This may also help to map some of the remaining unmapped compounds.
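The two unmatched-substrate lists described under Stage 3 amount to a pair of set differences taken after translating EcoCyc identifiers through the compound map. The sketch below is an assumption about how that could be computed, not the project's code; the function name <code>unmatched_substrates</code> and all identifiers in the example are hypothetical.

```python
def unmatched_substrates(ucsd_substrates, ecocyc_substrates, cpd_map):
    """Return (ecocyc_only, ucsd_only) for one matched reaction pair.

    EcoCyc substrates are reported by their iAF1261 abbreviation when the
    compound map has one, and by their EcoCyc identifier otherwise.
    """
    ucsd = set(ucsd_substrates)
    translated = {cpd_map.get(cpd, cpd) for cpd in ecocyc_substrates}
    ecocyc_only = sorted(translated - ucsd)
    ucsd_only = sorted(ucsd - translated)
    return ecocyc_only, ucsd_only


# Toy example; identifiers are made up. "Z" has no entry in the compound map.
cpd_map = {"A": "a", "B": "b", "C": "c"}
ecocyc_only, ucsd_only = unmatched_substrates(
    {"a", "b", "d"}, {"A", "B", "C", "Z"}, cpd_map
)
print(ecocyc_only, ucsd_only)
```

Here <code>c</code> appears under its iAF1261 abbreviation because the compound is mapped but absent from the iAF1261 reaction, while <code>Z</code> keeps its EcoCyc identifier because it has no mapping at all.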
= Results =
== Full reaction set ==
* iAF1261-ecocyc-rxn-mapping.txt: 2381
** category0-rxns-w-genes.txt: 1919
*** category0-matched-rxns-by-gene-best.txt: 1814
*** category0-unmatched-rxns-w-gene.txt: 105
** category0-rxns-wo-genes.txt: 463
*** category0-rxns-wo-genes-transport.txt: 66
*** category0-rxns-wo-genes-extracellular.txt: 305
*** category0-rxns-wo-genes-periplasmic.txt: 19
*** category0-rxns-wo-genes-cytoplasmic.txt: 73

== Unmapped Exchange reactions across the system boundary ==
* iAF1261-ecocyc-rxns-unmapped-exchange.txt: 304
** category1-rxns-w-genes.txt: 0
*** category1-matched-rxns-by-gene-best.txt: 0
*** category1-unmatched-rxns-w-gene.txt: 0
** category1-rxns-wo-genes.txt: 304
*** category1-rxns-wo-genes-transport.txt: 0
*** category1-rxns-wo-genes-extracellular.txt: 299
*** category1-rxns-wo-genes-periplasmic.txt: 0
*** category1-rxns-wo-genes-cytoplasmic.txt: 5

Category 1 reactions are outside the scope of EcoCyc, but are necessary for flux balance analysis to work. In SBML terms, they would be considered species with boundaryCondition="true". In elementary-mode terms, they are the external metabolites.

== Unmapped Diffusion reactions (facilitated by membrane transport proteins) ==
* iAF1261-ecocyc-rxns-unmapped-diffusion.txt: 229
** category2-rxns-w-genes.txt: 219
*** category2-matched-rxns-by-gene-best.txt: 219
*** category2-unmatched-rxns-w-gene.txt: 0
** category2-rxns-wo-genes.txt: 9
*** category2-rxns-wo-genes-transport.txt: 9
*** category2-rxns-wo-genes-extracellular.txt: 0
*** category2-rxns-wo-genes-periplasmic.txt: 0
*** category2-rxns-wo-genes-cytoplasmic.txt: 0

Category 2 reactions are almost all catalyzed by the outer membrane proteins OmpC, OmpN, OmpF, and OmpE.
In all, 219 of those reactions were mapped to an instance of the class RXN0-2481.

== Unmapped reactions due to one or more unmapped compounds ==
* iAF1261-ecocyc-rxns-with-unmapped-cpds.txt: 233
** category3-rxns-w-genes.txt: 184
*** category3-matched-rxns-by-gene-best.txt: 140
*** category3-unmatched-rxns-w-gene.txt: 44
** category3-rxns-wo-genes.txt: 49
*** category3-rxns-wo-genes-transport.txt: 14
*** category3-rxns-wo-genes-extracellular.txt: 5
*** category3-rxns-wo-genes-periplasmic.txt: 9
*** category3-rxns-wo-genes-cytoplasmic.txt: 20

In Category 3, category3=233 reactions were unmapped due to one or more unmapped compounds. Of those, category3.genes=184 had at least one gene associated with them, and category3.nogene=49 (category3 - category3.genes) had no gene associated with them. Of the reactions that were not enzyme-catalyzed, category3.nogene.transport=14 were transport reactions, category3.nogene.secretion were secretion reactions, category3.nogene.cytoplasmic=20 were cytoplasmic reactions, category3.nogene.periplasmic=9 were reactions that occurred entirely in the periplasm, and category3.nogene.extracellular=5 occurred in the extracellular space. Of the category3.genes reactions that were enzyme-catalyzed, category3.gene.rxns=140 could be mapped to at least one EcoCyc reaction catalyzed by one or more of the same genes. category3.gene.comments could not be mapped to a specific EcoCyc reaction, but could be mapped to an EcoCyc protein comment, which can be used to aid in resolving the discrepancy.
== Unmapped reactions for other reasons ==
* iAF1261-ecocyc-rxn-unmapped-otherwise.txt: 537
** category4-rxns-w-genes.txt: 465
*** category4-matched-rxns-by-gene-best.txt: 415
*** category4-unmatched-rxns-w-gene.txt: 50
** category4-rxns-wo-genes.txt: 72
*** category4-rxns-wo-genes-transport.txt: 39
*** category4-rxns-wo-genes-extracellular.txt: 1
*** category4-rxns-wo-genes-periplasmic.txt: 10
*** category4-rxns-wo-genes-cytoplasmic.txt: 22

The substrates of the category4=537 Category 4 reactions were all mapped to EcoCyc compounds, but the reactions themselves could not be mapped to an EcoCyc reaction. Of these reactions, category4.genes=465 had at least one gene associated with them, and category4.nogene=72 had no gene associated with them. Of the reactions that were not enzyme-catalyzed, category4.nogene.transport=39 were transport reactions, category4.nogene.secretion were secretion reactions, category4.nogene.cytoplasmic=22 were cytoplasmic reactions, category4.nogene.periplasmic=10 were reactions that occurred entirely in the periplasm, and category4.nogene.extracellular=1 occurred in the extracellular space. Of the category4.genes reactions that were enzyme-catalyzed, category4.gene.rxns=415 could be mapped to at least one EcoCyc reaction catalyzed by one or more of the same genes. category4.gene.comments could not be mapped to a specific EcoCyc reaction, but could be mapped to an EcoCyc protein comment, which can be used to aid in resolving the discrepancy.
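
Because each category is meant to be split into disjoint subsets, the counts at each level of a file list should add up to the count of the parent file. The following sketch checks this for the Category 4 list, using the counts given for it in this section. The helper name <code>check_counts</code> and the dictionary layout are assumptions made for illustration.

```python
def check_counts(name, total, parts):
    """Return a message if the parts do not add up to the total, else None."""
    parts_sum = sum(parts.values())
    if parts_sum != total:
        return f"{name}: parts sum to {parts_sum}, expected {total}"
    return None


# Counts copied from the Category 4 file list.
checks = [
    ("category4", 537, {"w-genes": 465, "wo-genes": 72}),
    ("category4 w-genes", 465, {"matched-best": 415, "unmatched": 50}),
    ("category4 wo-genes", 72,
     {"transport": 39, "extracellular": 1, "periplasmic": 10, "cytoplasmic": 22}),
]

problems = []
for name, total, parts in checks:
    message = check_counts(name, total, parts)
    if message is not None:
        problems.append(message)

print(problems if problems else "all Category 4 counts are consistent")
```

The same three checks can be run against the list of any other category to confirm that its subsets partition the parent set.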