(in-package :ecocyc)

;;; make sure to (load "hand-cpd-assignments.lisp") , so *one-time-cpd-assignments* is set.

;;;(sri:fload "/homedir/kr/ecocyc/metabology/palsson/model-comparison-palsson-ecocyc")

;;; goals:
;;; - differences in reversibility between P and E (for jennie)
;;;   (P has assumed irreversibility unless contrary evidence was found,
;;;    whereas E is reversible unless evidence for physiol. irrev. was found.)

;;;kr:Nov-11-2005 using openoffice-1.1, saved 2 sheets from FBA_model-11-9-05.xls
;;; tab-delimited, no quotes around the fields, and the first line names the columns :
;;; cpds in FBA_model-11-9-05-cpds.csv and
;;; rxn-gene assocs in FBA_model-11-9-05-rxn-gpr.csv

(defvar *palsson-cpds-ht* (make-hash-table :test #'equalp)
  "hash-table, the keys being the abbreviation string naming the cpd from the palsson spreadsheet. the value is a palsson-cpd structure."
  )

(defstruct palsson-cpd
  abbreviation
  officialName
  formula
  charge
  casNumber ;;kr:Dec-14-2005 I have now seen 1 case where 2 numbers occur in this field, separated by ";" !!!
  KEGG-cmpd-ID
  notes
  ;; The following fields were not in the FBA_model-11-9-05.xls spreadsheet we originally obtained.
  ecocyc-id ;; a frame ID
  ;;kr:Dec-31-2005 The following probably ought to have been called "sri-analysis" instead.
  ;;kr:Aug-4-2006 Repurposed the analysis slot. For backwards compatibility, the old definition
  ;; of these strings will remain. But see the expansion below.
  ;;
  ;; Used for our assessment for why a cpd did not match with EcoCyc. It is a string.
  ;; It needs to start with a symbolic indicator of the type of matching failure,
  ;; and after whitespace, arbitrary other commentary can be added.
  ;; The symbolic type indicators are:
  ;; synonym ECOCYC-ID  : the officialName will simply be added as another synonym
  ;;                      to the EcoCyc frame, specified by ECOCYC-ID .
  ;; hack ECOCYC-ID     : the officialName will be used to map
  ;;                      to the EcoCyc frame, specified by ECOCYC-ID , but we will not store that in EcoCyc.
  ;; class ECOCYC-ID    : we have this as a class, but not as an instance.
  ;; protein ECOCYC-ID  : a protein or modified protein, specified by ECOCYC-ID .
  ;; metacyc METACYC-ID : the cpd does exist in MetaCyc, and could be imported into EcoCyc.
  ;; synmeta METACYC-ID : the cpd does exist in MetaCyc, and could be imported into EcoCyc.
  ;;                      However, first the MetaCyc cpd needs this synonym to be added.
  ;; unknown            : we do not have it in EcoCyc or MetaCyc . May need to be curated from scratch.
  ;; fatty              : like unknown, but it involves a fatty acid, possibly attached to CoA or ACP.
  ;; other              : an explanatory comment better be included !
  ;;
  ;;kr:Aug-4-2006 The new redefinition, for dumping out the final spreadsheet:
  ;; Instead of the string above, this can contain a list. The list can contain the string above,
  ;; but it will primarily contain symbols stating the verdict for matching this palsson-cpd
  ;; to EcoCyc or MetaCyc. The symbols should describe orthogonal properties that can be combined.
  ;; The symbols have the following meaning:
  ;;                   : A match to EcoCyc was found. (This does not have a tag of its own. It is assumed
  ;;                     to be the default, which can be overridden by the presence of :metacyc )
  ;; :metacyc          : A match to MetaCyc was found, but not to EcoCyc. Basically applies to instances only.
  ;; :i-o-c            : The palsson-cpd seems to be an instance of the EcoCyc class frame. The class could also be
  ;;                     protein or tRNA class, not just a small molecule. So the mapping is a bit indirect.
  ;; :dispute          : While a tentative assignment was made, there is still a disagreement between the Palsson
  ;;                     group and SRI regarding the exact assignment or structure of the cpd.
  ;; :protein-instance : Links to an ecocyc protein, which ought to be generalized to a class instead.
  ;; :polymer-section  : A hypothetical, hopefully representative segment out of a much larger polymer.
  ;; :manual           : The match was made by human inspection, not by automated software, which for the most part
  ;;                     is in (map-palsson-cpd-to-ecocyc ...)
  ;;
  analysis
  metacyc-id ;; a frame ID
  )

;;;kr:Nov-14-2005 The topmost slot values are mostly lists, where all the left and right lists are in sync
;;; among each other, respectively. The exception is the irrev-p, a Boolean.
;;;kr:Nov-16-2005 Another exception now is abbreviation, which is a string, and Palsson's ID for the rxn.
;;;
(defstruct palsson-rxn
  left-cpds
  right-cpds
  left-stoichs
  right-stoichs
  left-compartments
  right-compartments
  irrev-p
  abbreviation
  officialName
  ;; A list of the rxn IDs we were able to map to.
  ecocyc-rxn-ids
  ;;kr:Mar-17-2006 Added this. An (experimental) list containing the classification
  ;; into one of many bins.
  ;; The list will primarily contain symbols stating the verdict for matching this palsson-rxn
  ;; to EcoCyc or MetaCyc. The symbols should describe orthogonal properties that can be combined.
  ;; Some properties are described in sublists.
  ;; The symbols have the following meaning:
  ;;                     : A match to EcoCyc was found. (This does not have a tag of its own. It is assumed
  ;;                       to be the default, which can be overridden by the presence of :metacyc )
  ;; :metacyc            : A match to MetaCyc was found, but not to EcoCyc.
  ;; :expanded-rxn-match : In EcoCyc/MetaCyc, compound classes can stand for several compound instances,
  ;;                       and many reactions are formulated in this manner. To find possible matches between
  ;;                       such broadly formulated, generic reactions and the specific instantiations in
  ;;                       the Palsson model, a larger range of reactions from EcoCyc is used for comparison,
  ;;                       namely including all of the reactions of all the compound classes that include
  ;;                       the substrates of the Palsson reaction.
  ;; :unmapped-cpds      : One or more compounds of the Palsson reaction could not be mapped to
  ;;                       EcoCyc/MetaCyc, and thus the entire reaction can not be mapped.
  ;; :exchange           : A simple exchange rxn (diffusion across a cell compartment boundary).
  ;;                       There is no real representation for this concept in EcoCyc, so none of these match.
  ;; Additionally, sublists describe more specific transformations that were applied to elicit a match.
  ;; The sublists take the form of (side operation argument-1 optional-argument-2) , with examples being:
  ;;   (LEFT SUBSTITUTE AMMONIA AMMONIUM)
  ;;   (RIGHT REMOVE PROTON)
  ;; One of the most common operations was to remove a proton from the Palsson reaction to elicit a match
  ;; to EcoCyc/MetaCyc. A less common operation was to substitute AMMONIUM in the Palsson reaction
  ;; with AMMONIA , as the EcoCyc/MetaCyc reaction used this compound, differing in protonation state.
  ;;
  analysis
  ;;kr:Mar-23-2006 a list of symbols, with b-numbers
  geneAssociation
  ;;kr:Mar-24-2006 This is the original string from the spreadsheet
  equation
  )

;;;kr:Dec-16-2005
(defun remove-nil-crud (value)
  (unless (equalp value "NIL")
    value
    )
  )

;; ======================================================================
;; kr:Nov-11-2005 Description : Populates an intermediary hash-table from the cpd data
;;                contained in the FBA_model-11-9-05-cpds.csv sheet.
;;                The first row names the columns.
;;                Assumes that EcoCyc has been opened and is the current kb.
;;      Arguments : palsson-cpd-csv-filename : a full filename with path
;;                  :only-read : a keyword argument that suppresses modifying the hash-table if non-nil
;;                  :ht : a keyword argument indicating which hash-table to use
;;
;;        Returns : list of palsson-cpd structures
;;   Side Effects : modifies the *palsson-cpds-ht* hash-table, unless :only-read t
;; Update History : kr:Nov-21-2005 Reworked to allow reading without updating the hash-table,
;;;                 by adding keyword argument :only-read and returning the list of structures.
;;;kr:Jul-21-2006 Added :ht keyword argument.
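;;; Illustration only (not part of the original code): a minimal sketch of how one
;;; tab-delimited row lines up with the ten columns that the reader below picks apart
;;; with (first row) ... (tenth row). EXAMPLE-SPLIT-ON-TAB and *EXAMPLE-CPD-COLUMNS*
;;; are hypothetical names introduced for this sketch. The real file is read by
;;; (variable-column-table-to-list ...), whose exact behaviour (for instance on
;;; empty fields) is not shown in this file and is not modelled here.
(defparameter *example-cpd-columns*
  '("abbreviation" "officialName" "formula" "charge" "casNumber"
    "KEGG-cmpd-ID" "notes" "ecocyc-id" "analysis" "metacyc-id")
  "Column order of the cpd sheet, including our three extra columns at the end.")

(defun example-split-on-tab (line)
  "Split LINE at every tab character and return the fields as a list of strings."
  (loop with start = 0
        for pos = (position #\Tab line :start start)
        collect (subseq line start pos)
        while pos
        do (setf start (1+ pos))))

#||example (the row is made up for illustration; only the column order matters):
(pairlis (subseq *example-cpd-columns* 0 3)
         (example-split-on-tab (format nil "trp-L~CL-Tryptophan~CC11H12N2O2" #\Tab #\Tab)))
||#
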
(defun populate-palsson-cpds-ht (palsson-cpd-csv-filename &key only-read (ht *palsson-cpds-ht*) )
  (index-ekb)
  (let* ((raw-lists (variable-column-table-to-list :filename palsson-cpd-csv-filename
                                                   :column-sep #\Tab
                                                   )
                    )
         )
    (map 'list
         #'(lambda (row)
             (let* ((raw-casNumber (remove-nil-crud (fifth row)))
                    (casNumber (unless (equalp "None" raw-casNumber) ;;kr:Jan-18-2006 There seem to be a ton of these... !!
                                 raw-casNumber))
                    (raw-analysis-field (ninth row)) ;;kr:Aug-24-2006 This will be first read as a string
                    (analysis-field (when raw-analysis-field
                                      (if (eql #\( (elt raw-analysis-field 0))
                                          (read-from-string raw-analysis-field) ;; convert to the list of symbol tags
                                          raw-analysis-field)
                                      )
                                    )
                    (pals-struct (make-palsson-cpd :abbreviation (first row)
                                                   :officialName (second row)
                                                   :formula (third row)
                                                   :charge (parse-integer (fourth row) :junk-allowed t)
                                                   ;;kr:Jan-18-2006 Some more syntax-checking sure would be beneficial here at some point...
                                                   :casNumber casNumber
                                                   :KEGG-cmpd-ID (remove-nil-crud (sixth row))
                                                   :notes (seventh row)
                                                   ;;my own additions:
                                                   ;;kr:Mar-1-2006 After reading this from the file, this still will be a string instead of a symbol.
                                                   :ecocyc-id (remove-nil-crud (eighth row))
                                                   :analysis analysis-field
                                                   ;;kr:Mar-1-2006 After reading this from the file, this still will be a string instead of a symbol.
                                                   :metacyc-id (remove-nil-crud (tenth row))
                                                   )
                                 )
                    )
               (unless only-read
                 ;;kr:Mar-1-2006 The policy was changed to directly use the ecocyc-id and metacyc-id assignments,
                 ;; if they already were made available in the spreadsheet.
                 (cond ((palsson-cpd-ecocyc-id pals-struct)
                        ;; Basically, just do a sanity check.
                        (let* ((potential-frame-id (intern (string-trim '(#\|) (palsson-cpd-ecocyc-id pals-struct)))))
                          (if (coercible-to-frame-p potential-frame-id :kb (kb-of-organism 'ECOLI))
                              (setf (palsson-cpd-ecocyc-id pals-struct) potential-frame-id)
                              (progn
                                (warn "For ~S, the ecocyc-id ~S was not coercible to a frame."
                                      (palsson-cpd-abbreviation pals-struct) potential-frame-id)
                                ;; (map-palsson-cpd-to-ecocyc ...) is destructive and stores its result
                                ;; in the proper slot itself. Clear the rejected ID first, and do not
                                ;; store the returned value here, because a hand-assigned :metacyc match
                                ;; would otherwise end up in the ecocyc-id slot as well.
                                (setf (palsson-cpd-ecocyc-id pals-struct) nil)
                                (map-palsson-cpd-to-ecocyc pals-struct)
                                )
                              )
                          )
                        )
                       ((palsson-cpd-metacyc-id pals-struct)
                        ;; Basically, just do a sanity check.
                        (let* ((potential-frame-id (intern (string-trim '(#\|) (palsson-cpd-metacyc-id pals-struct)))))
                          (if (coercible-to-frame-p potential-frame-id :kb (kb-of-organism 'META))
                              (setf (palsson-cpd-metacyc-id pals-struct) potential-frame-id)
                              (progn
                                (warn "For ~S, the metacyc-id ~S was not coercible to a frame in MetaCyc."
                                      (palsson-cpd-abbreviation pals-struct) potential-frame-id)
                                ;; Retry mapping in EcoCyc. The mapper stores its result in the
                                ;; proper slot itself, so the returned value is not stored here.
                                (map-palsson-cpd-to-ecocyc pals-struct)
                                )
                              )
                          )
                        )
                       (t
                        ;;kr:Aug-23-2006 This is the real mapping, from scratch:
                        ;;(setf (palsson-cpd-ecocyc-id pals-struct) (map-palsson-cpd-to-ecocyc pals-struct))
                        ;;kr:Aug-23-2006 Changed the behaviour of (map-palsson-cpd-to-ecocyc ...) to destructive,
                        ;; so the hand-assignments can store the correct analysis tags.
                        (map-palsson-cpd-to-ecocyc pals-struct)
                        )
                       )
                 (setf (gethash (first row) ht) ;; abbreviation
                       pals-struct)
                 )
               pals-struct
               )
             )
         (rest raw-lists)) ;; skip the first row, which names the columns
    )
  )

;; ======================================================================
;; kr:Nov-17-2005 Description : Writes the given palsson-cpd structures to a tab-delimited file,
;;                one row per cpd, after a header row that names the columns.
;;
;;      Arguments : palsson-cpd-list : a list of palsson-cpd structures
;;                  palsson-cpd-csv-filename : a full filename with path
;;
;;        Returns :
;;   Side Effects : creates or overwrites the file palsson-cpd-csv-filename
;; Update History : kr:Jan-15-2006 Suppressed unnecessary "NIL"s being written out.
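;;; Illustration only (not part of the original code): a minimal sketch of the round trip
;;; of the analysis column. New-style tag lists are printed with ~S on output. On input,
;;; a field that starts with "(" is handed to the Lisp reader, and anything else is kept
;;; as an old-style string. EXAMPLE-READ-ANALYSIS-FIELD is a hypothetical helper. Binding
;;; *READ-EVAL* to NIL and skipping empty fields are extra precautions assumed for this
;;; sketch; the reader code in (populate-palsson-cpds-ht ...) does neither.
(defun example-read-analysis-field (raw)
  "Return the tag list encoded in RAW, or RAW itself if it is an old-style string."
  (when (and raw (plusp (length raw)))
    (if (eql #\( (char raw 0))
        (let ((*read-eval* nil))          ;; never evaluate #. forms coming from a data file
          (values (read-from-string raw)))
        raw)))

#||example:
(example-read-analysis-field (format nil "~S" '(:manual :metacyc)))
==> (:MANUAL :METACYC)
(example-read-analysis-field "synonym ECOCYC-ID")
==> "synonym ECOCYC-ID"
||#
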
(defun dump-palsson-cpds-to-tab-delimited-file (palsson-cpd-list palsson-cpd-csv-filename)
  (with-open-file-noerror (file-stream palsson-cpd-csv-filename
                                       :direction :output
                                       :if-exists :supersede
                                       :if-does-not-exist :create)
    ;; Write the header line, containing the names of the columns:
    (write-tab-delimited-line file-stream
                              (list "abbreviation" "officialName" "formula" "charge" "casNumber" "KEGG-cmpd-ID" "notes"
                                    ;; our extra columns
                                    "ecocyc-id" "analysis" "metacyc-id"
                                    ))
    (dolist (palsson-cpd palsson-cpd-list)
      (write-tab-delimited-line file-stream
                                (list (palsson-cpd-abbreviation palsson-cpd)
                                      (palsson-cpd-officialName palsson-cpd)
                                      (palsson-cpd-formula palsson-cpd)
                                      (palsson-cpd-charge palsson-cpd)
                                      (if (palsson-cpd-casNumber palsson-cpd)
                                          (palsson-cpd-casNumber palsson-cpd)
                                          "")
                                      (if (palsson-cpd-KEGG-cmpd-ID palsson-cpd)
                                          (palsson-cpd-KEGG-cmpd-ID palsson-cpd)
                                          "")
                                      (if (palsson-cpd-notes palsson-cpd)
                                          (palsson-cpd-notes palsson-cpd)
                                          "")
                                      (if (palsson-cpd-ecocyc-id palsson-cpd)
                                          ;;kr:Jan-15-2006 Make sure to capture vertical bars of symbols.
                                          (format nil "~S" (palsson-cpd-ecocyc-id palsson-cpd))
                                          "")
                                      (if (palsson-cpd-analysis palsson-cpd)
                                          ;;kr:Aug-23-2006 For new-style tags:
                                          (format nil "~S" (palsson-cpd-analysis palsson-cpd))
                                          "")
                                      (if (palsson-cpd-metacyc-id palsson-cpd)
                                          (format nil "~S" (palsson-cpd-metacyc-id palsson-cpd))
                                          "")
                                      )
                                )
      )
    )
  )

;;;kr:Nov-17-2005
(defun write-tab-delimited-line (stream list)
  (format stream "~A" (first list))
  (loop for item in (rest list)
        do (format stream "~A~A" #\Tab item)
        )
  (terpri stream)
  )

;; ======================================================================
;; kr:Dec-16-2005 Description : The analysis slot of a palsson-cpd is interpreted,
;;                and appropriate ecocyc and/or metacyc cpd ids are
;;                filled in.
;;      Arguments : analyzed-palsson-cpd structure
;;
;;        Returns :
;;   Side Effects : may modify the corresponding palsson-cpd in *palsson-cpds-ht*
;; Update History :
(defun interpret-palsson-cpd-analysis (analyzed-palsson-cpd)
  (let* ((analysis (palsson-cpd-analysis analyzed-palsson-cpd))
         (tokenized-analysis (tokenize-string analysis))
         (abbreviation (palsson-cpd-abbreviation analyzed-palsson-cpd))
         (palsson-cpd (gethash abbreviation *palsson-cpds-ht*))
         (ecocyc-id (palsson-cpd-ecocyc-id analyzed-palsson-cpd))
         (metacyc-id (palsson-cpd-metacyc-id analyzed-palsson-cpd))
         (potential-frame-id (intern (string-trim '(#\|) (second tokenized-analysis))))
         )
    (when ecocyc-id
      (warn "~S already had an ecocyc-id of ~S assigned !" abbreviation ecocyc-id)
      )
    (when metacyc-id
      (warn "~S already had a metacyc-id of ~S assigned !" abbreviation metacyc-id)
      )
    (flet ((assign-metacyc-id ()
             (assert (coercible-to-frame-p potential-frame-id :kb (kb-of-organism 'META))
                     nil
                     "Frame ID ~S for ~S is not coercible in MetaCyc."
                     potential-frame-id abbreviation)
             (setf (palsson-cpd-metacyc-id palsson-cpd) potential-frame-id)
             )
           (assign-ecocyc-id ()
             (assert (coercible-to-frame-p potential-frame-id :kb (kb-of-organism 'ECOLI))
                     nil
                     "Frame ID ~S for ~S is not coercible in EcoCyc."
                     potential-frame-id abbreviation)
             (setf (palsson-cpd-ecocyc-id palsson-cpd) potential-frame-id)
             )
           )
      (ecase (intern (string-upcase (first tokenized-analysis)))
        (synonym (assign-ecocyc-id))
        (hack (assign-ecocyc-id))
        (class (assign-ecocyc-id))
        (protein (assign-ecocyc-id))
        (metacyc (assign-metacyc-id))
        (synmeta (assign-metacyc-id))
        (unknown nil)
        (fatty nil)
        (other nil)
        ;;kr:Dec-30-2005 for ingrid's file:
        ((nil) (assert ecocyc-id)
         (let* ((potential-frame-id (intern (string-trim '(#\|) ecocyc-id))))
           (assert (coercible-to-frame-p potential-frame-id :kb (kb-of-organism 'ECOLI))
                   nil
                   "Frame ID ~S for ~S is not coercible in EcoCyc."
                   potential-frame-id abbreviation)
           (setf (palsson-cpd-ecocyc-id palsson-cpd) potential-frame-id)
           )
         )
        )
      ;;kr:Dec-31-2005 This needs to be transferred in any case:
      (setf (palsson-cpd-analysis palsson-cpd) analysis)
      )
    )
  )

;; ======================================================================
;; kr:Jan-18-2006 Description : Given a palsson-cpd structure, this will see whether the assigned frame
;;                in EcoCyc or MetaCyc already has the KEGG and CAS IDs and the name string,
;;                and add such items if not.
;;      Arguments : analyzed-palsson-cpd structure
;;
;;        Returns :
;;   Side Effects : may update a frame in ecocyc or metacyc
;; Update History :
(defun update-ecocyc-from-palsson-cpd (analyzed-palsson-cpd)
  (let* ((analysis (palsson-cpd-analysis analyzed-palsson-cpd))
         (tokenized-analysis (tokenize-string analysis))
         (abbreviation (palsson-cpd-abbreviation analyzed-palsson-cpd))
         (officialName (palsson-cpd-officialName analyzed-palsson-cpd))
         (KEGG-cmpd-ID (palsson-cpd-KEGG-cmpd-ID analyzed-palsson-cpd))
         (casNumber (palsson-cpd-casNumber analyzed-palsson-cpd))
         (raw-ecocyc-id (palsson-cpd-ecocyc-id analyzed-palsson-cpd))
         (raw-metacyc-id (palsson-cpd-metacyc-id analyzed-palsson-cpd))
         (ecocyc-id (with-organism (:org-id 'ECOLI) (convert-string-to-frame-if-possible raw-ecocyc-id)))
         (metacyc-id (with-organism (:org-id 'META) (convert-string-to-frame-if-possible raw-metacyc-id)))
         )
    (when (find #\; casNumber)
      (warn "CAS nr. is weird: ~S" analyzed-palsson-cpd)
      )
    (if (and ecocyc-id metacyc-id)
        (warn "~S is assigned to both an ecocyc ID ~S and a metacyc ID ~S ! nothing done."
              abbreviation ecocyc-id metacyc-id)
        (progn
          (when ecocyc-id
            (with-organism (:org-id 'ECOLI)
              (update-frame-synonym-and-dblinks ecocyc-id
                                                (unless (equal "hack" (first tokenized-analysis))
                                                  officialName)
                                                KEGG-cmpd-ID
                                                casNumber
                                                )
              )
            ;;kr:Jan-18-2006 also update metacyc, as propagation of dblinks was not yet really implemented...
            (let* ((metacyc-id (with-organism (:org-id 'META) (convert-string-to-frame-if-possible raw-ecocyc-id))))
              (when metacyc-id
                (with-organism (:org-id 'META)
                  (update-frame-synonym-and-dblinks metacyc-id
                                                    (unless (equal "hack" (first tokenized-analysis))
                                                      officialName)
                                                    KEGG-cmpd-ID
                                                    casNumber
                                                    )
                  )
                )
              )
            )
          (when metacyc-id
            (with-organism (:org-id 'META)
              (update-frame-synonym-and-dblinks metacyc-id
                                                (unless (equal "hack" (first tokenized-analysis))
                                                  officialName)
                                                KEGG-cmpd-ID
                                                casNumber
                                                )
              )
            )
          )
        )
    )
  )

;;;kr:Jan-18-2006 Used immediately above.
;;;
(defun update-frame-synonym-and-dblinks (frame synonym KEGG-id-str CAS-id-str)
  (let* ((existing-kegg-links (get-links frame :db 'LIGAND-CPD))
         (existing-cas-links (get-links frame :db 'CAS))
         )
    (when KEGG-id-str
      (unless (find KEGG-id-str existing-kegg-links :key #'link-oid :test #'equal)
        (let* ((dblink (make-link :db 'LIGAND-CPD :oid KEGG-id-str)))
          (add-link frame dblink)
          (format t ";;; Added dblink ~S to ~S~%" dblink (get-frame-name frame))
          )
        )
      )
    (when CAS-id-str
      (unless (find CAS-id-str existing-cas-links :key #'link-oid :test #'equal)
        (let* ((dblink (make-link :db 'CAS :oid CAS-id-str)))
          (add-link frame dblink)
          (format t ";;; Added dblink ~S to ~S~%" dblink (get-frame-name frame))
          )
        )
      )
    (when (and synonym
               ;;kr:Jan-18-2006 This may perform the addition, and if so, it will return T.
               (add-frame-synonym frame synonym)
               )
      (format t ";;; Added synonym ~S to ~S~%" synonym (get-frame-name frame))
      )
    )
  )

;;;kr:Thu Nov 17 2005 : paley says that changes in cpd synonyms propagate between ecocyc and metacyc,
;;; but not db-links.

;; ======================================================================
;; kr:Nov-11-2005 Description : This is the main function that tries to map from the information
;;                in a palsson-cpd structure to an EcoCyc frame ID, taking into account
;;                CAS and KEGG IDs and the cpd name, and noting conflicts.
;;                This assumes that *one-time-cpd-assignments* was set, by loading
;;                file hand-cpd-assignments.lisp
;;
;;                TODO: kr:Aug-23-2006 This function does not currently try on its own to search
;;                in MetaCyc, but it probably should, storing an appropriate analysis code...
;;      Arguments : palsson-cpd structure
;;
;;        Returns : cpd-frame ID, or NIL
;;   Side Effects : can now modify some slots in palsson-cpd
;; Update History : kr:Jan-6-2006 Commented out the out-dated synonym lookup, but added lookup of
;;;                 the special mappings in *one-time-cpd-assignments*
;;;kr:Jan-14-2006 Key bug fix in the return processing, so hand-encoded mappings will actually be used.
;;;kr:Aug-23-2006 Changed this function to a destructive one, to allow storing analysis info coming from
;;; the hand-encoded mappings.
(defun map-palsson-cpd-to-ecocyc (palsson-cpd)
  (let* ((kegg-id (palsson-cpd-KEGG-cmpd-ID palsson-cpd))
         (cpd-by-kegg (when kegg-id (find-kegg-cpd kegg-id)))
         (cas-id (palsson-cpd-casNumber palsson-cpd))
         (cpd-by-cas (when cas-id (find-cas-cpd cas-id)))
         (cpd-name (palsson-cpd-officialName palsson-cpd))
         (cpd-by-name (get-cpd-by-name cpd-name))
         (abbreviation (palsson-cpd-abbreviation palsson-cpd))
         )
    ;;kr:Mar-23-2006 Added checking for :multiple-hits
    (if (or (eql cpd-by-kegg :multiple-hits)
            (eql cpd-by-cas :multiple-hits)
            (and cpd-by-kegg cpd-by-cas
                 (not (fequal cpd-by-kegg cpd-by-cas))
                 )
            )
        (warn "KEGG ~S pulled out ~A , whereas CAS ~S pulled out ~A"
              kegg-id (if (coercible-to-frame-p cpd-by-kegg) (get-frame-name cpd-by-kegg) cpd-by-kegg)
              cas-id (if (coercible-to-frame-p cpd-by-cas) (get-frame-name cpd-by-cas) cpd-by-cas)
              )
        (when cpd-by-name
          ;; no conflicts ?
          (if (or (not (when cpd-by-kegg (not (fequal cpd-by-kegg cpd-by-name))))
                  (not (when cpd-by-cas (not (fequal cpd-by-cas cpd-by-name))))
                  )
              ;; return the name, as it appears to be fine and non-conflicting.
              ;;kr:Jan-14-2006 This is now no longer the main return. See at the bottom.
              (get-frame-name cpd-by-name)
              (warn "KEGG ~S pulled out ~A , CAS ~S pulled out ~A , whereas name ~S pulled out ~A"
                    kegg-id (when cpd-by-kegg (get-frame-name cpd-by-kegg))
                    cas-id (when cpd-by-cas (get-frame-name cpd-by-cas))
                    cpd-name (get-frame-name cpd-by-name))
              )
          )
        )
    ;;kr:Jan-14-2006 Return the result:
    (if cpd-by-name
        (setf (palsson-cpd-ecocyc-id palsson-cpd) (get-frame-name cpd-by-name))
        ;;kr:Jan-6-2006 If there is a conflict, instead of not making a mapping,
        ;; try one last lookup in a list of hand-assigned hacks.
        (let* ((hand-assigned (assoc (palsson-cpd-abbreviation palsson-cpd) *one-time-cpd-assignments* :test #'equal))
               (frame-id (second hand-assigned))
               ;; nthcdr instead of (subseq hand-assigned 2), because subseq may signal
               ;; an out-of-bounds error when there is no hand-assigned entry at all.
               (analysis-list (copy-list (nthcdr 2 hand-assigned)))
               )
          (when hand-assigned
            (warn "abbreviation ~S was assigned by hand-encoded mapping." abbreviation)
            ;;kr:Aug-23-2006 Assign the match to the correct slot:
            (if (find :metacyc analysis-list)
                (setf (palsson-cpd-metacyc-id palsson-cpd) frame-id)
                (setf (palsson-cpd-ecocyc-id palsson-cpd) frame-id)
                )
            ;;kr:Aug-23-2006 For the hand-assigned, add the :manual analysis tag
            (setf (palsson-cpd-analysis palsson-cpd) (cons :manual analysis-list))
            frame-id
            )
          )
        )
    )
  )

(defun cpds-to-search ()
  ;; would we need to search proteins as well ???
  (get-class-all-sub-frames '|Compounds|)
  )

;;;kr:Mar-1-2006 Commentary: This tries to return the EcoCyc cpd if one is found,
;;; or nil if none, or :multiple-hits if several are found.
;;; This fn is used at the core of mapping both KEGG and CAS ids.
;;;
(defun find-cpd-by-dblink (oid-str db-id)
  (let* ((cpds (loop for c in (cpds-to-search)
                     when (get-links c :db db-id :oid oid-str)
                     collect c)
               )
         )
    (if (> (length cpds) 1)
        (progn
          (warn " ~A ID ~S pulled out more than one cpd, namely: ~S"
                db-id oid-str (map 'list #'get-frame-name cpds))
          :multiple-hits)
        (first cpds)
        )
    )
  )

;;;kr:Nov-23-2005 enhanced, to search both LIGAND-CPD and LIGAND ,
;;; because both apparently were used in the past.
;;;kr:Mar-1-2006 The use of LIGAND for cpds has been declared obsolete.
;;; If :multiple-hits is returned by any of the sub-calls to (find-cpd-by-dblink ...),
;;; this result should also be returned as the final result.
;;;
(defun find-kegg-cpd (kegg-ligand-id-str)
  (let* ((ligand-cpd (find-cpd-by-dblink kegg-ligand-id-str 'LIGAND-CPD))
         (ligand (find-cpd-by-dblink kegg-ligand-id-str 'LIGAND))
         )
    (if (and ligand-cpd ligand)
        (progn
          (warn " ~S pulled out more than one cpd, namely: ~S from LIGAND-CPD and ~S from LIGAND"
                kegg-ligand-id-str ligand-cpd ligand)
          :multiple-hits)
        ;; return one or the other
        (or ligand-cpd ligand)
        )
    )
  )

#||example:
(time (find-kegg-cpd "C00078"))
==> TRP
(time (find-kegg-cpd "C06251"))
==> KDO2-LAUROYL-LIPID-IVA
||#

(defun find-cas-cpd (cas-id-str)
  (find-cpd-by-dblink cas-id-str 'CAS)
  )

#||example:
(time (find-cas-cpd "73-22-3"))
==> TRP
||#

;;;kr:Aug-23-2006 Used for the old-style analysis tags (or rather: strings).
;;;
(defun full-tagword-match (tag string)
  (when string
    ;;kr:Jan-15-2006 the :end2 forces the match to be at the beginning.
    ;; The min keeps :end2 within bounds when the string is shorter than the tag.
    (search tag string :end2 (min (length tag) (length string)))
    )
  )

#||examples:
EC(34): (full-tagword-match "metacyc" "metacyc")
0
EC(35): (full-tagword-match "metacyc" "whatever metacyc")
NIL
EC(36):
||#

;;;kr:Aug-23-2006 This just maps the old-style analysis tags to new ones, if applicable.
;;;
(defun map-old-style-cpd-analysis-to-new (old-analysis-string)
  (let* ((old-tag (find-if #'(lambda (tag)
                               (full-tagword-match tag old-analysis-string)
                               )
                           #("synonym" "hack" "class" "protein" "metacyc" "synmeta" "unknown" "fatty" "other")
                           ))
         )
    (when old-tag
      (ecase (intern old-tag)
        ((|synonym| |hack|) '(:manual))
        ((|metacyc| |synmeta|) '(:manual :metacyc))
        (|class| '(:manual :i-o-c))
        (|protein| '(:manual :protein-instance))
        ((|unknown| |fatty| |other|) nil)
        )
      )
    )
  )

#||examples:
EC(38): (map-old-style-cpd-analysis-to-new "synonym ECOCYC-ID")
(:MANUAL)
EC(39): (map-old-style-cpd-analysis-to-new "synmeta METACYC-ID")
(:MANUAL :METACYC)
EC(40): (map-old-style-cpd-analysis-to-new "unknown")
NIL
EC(41):
||#

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; The Reactions

(defvar *p-rxns-all* nil)
(defvar *p-rxns-mapped-to-ecocyc* nil)
(defvar *p-rxns-mapped-to-metacyc* nil)
(defvar *p-rxns-with-unmapped-cpds* nil)
(defvar *p-rnxs-otherwise-unmapped* nil)
(defvar *p-rnxs-unmapped-exchange* nil)
(defvar *p-rnxs-unmapped-diffusion* nil)

;; ======================================================================
;; kr:Nov-16-2005 Description : Reads the rxn sheet, parses each equation into a palsson-rxn,
;;                and sorts the rxns into the lists defined above, according to
;;                how they map to EcoCyc or MetaCyc.
;;
;;                It is assumed that before running this, the hash-table
;;                *palsson-cpds-ht* was built by running (populate-palsson-cpds-ht ...) .
;;      Arguments : palsson-gpr-csv-filename : a full filename with path
;;
;;
;;        Returns :
;;   Side Effects : resets and refills the *p-rxns-...* and *p-rnxs-...* lists,
;;                  and prints the mapped rxn IDs and the final counts
;; Update History :
(defun map-palsson-gpr-file (palsson-gpr-csv-filename)
  (let* ((raw-lists (variable-column-table-to-list :filename palsson-gpr-csv-filename
                                                   :column-sep #\Tab
                                                   )
                    )
         )
    ;; reset:
    (setf *p-rxns-all* nil)
    (setf *p-rxns-mapped-to-ecocyc* nil)
    (setf *p-rxns-mapped-to-metacyc* nil)
    (setf *p-rxns-with-unmapped-cpds* nil)
    (setf *p-rnxs-otherwise-unmapped* nil)
    (setf *p-rnxs-unmapped-exchange* nil)
    (setf *p-rnxs-unmapped-diffusion* nil)
    (dolist (row (rest raw-lists)) ;; skip the first row, which names the columns
      (let* ((abbreviation (first row))
             (officialName (second row))
             (equation (third row))
             ;;(subSystem (fourth row))
             ;;(proteinClass (fifth row))
             (geneAssociation (sixth row))
             ;;(reaction-notes (seventh row))
             ;;(model-notes (eighth row))
             (symbolic-gene-assoc (when geneAssociation (read-from-string geneAssociation)))
             )
        (multiple-value-bind (palsson-rxn metacyc-required-p)
            (parse-palsson-rxn-eqn equation abbreviation)
          ;;kr:Mar-23-2006 Simply record this, for now:
          (setf (palsson-rxn-officialName palsson-rxn) officialName)
          (setf (palsson-rxn-geneAssociation palsson-rxn) symbolic-gene-assoc)
          (setf (palsson-rxn-equation palsson-rxn) equation)
          ;;kr:Jun-16-2006 Always record this here:
          (push palsson-rxn *p-rxns-all*)
          ;;kr:Aug-29-2006 It seems like we have to preempt these special, external exchange rxns,
          ;; because they in fact may match to ecocyc in some cases, because
          ;; (find-rxn-by-substrates ...) only needs one side to become active !!
          (if (palsson-rxn-is-exchange-p palsson-rxn)
              (progn
                (push :exchange (palsson-rxn-analysis palsson-rxn))
                (push palsson-rxn *p-rnxs-unmapped-exchange*)
                )
              ;;kr:Mar-17-2006 This call now also stores its results directly in the palsson-rxn
              (let* ((ecocyc-rxns (unless metacyc-required-p (palsson-rxn->ecocyc palsson-rxn)))
                     )
                (if ecocyc-rxns
                    (push palsson-rxn *p-rxns-mapped-to-ecocyc*)
                    ;; A match to EcoCyc did not work out.
                    (progn
                      ;; kr:Mar-21-2006 Also try MetaCyc.
                      (with-organism (:org-id 'META)
                        (setf ecocyc-rxns (palsson-rxn->ecocyc palsson-rxn))
                        )
                      (if ecocyc-rxns
                          (progn
                            (push :metacyc (palsson-rxn-analysis palsson-rxn))
                            (push palsson-rxn *p-rxns-mapped-to-metacyc*)
                            )
                          ;; No match in either EcoCyc or MetaCyc.
                          (if (or (some #'stringp (palsson-rxn-left-cpds palsson-rxn))
                                  (some #'stringp (palsson-rxn-right-cpds palsson-rxn))
                                  )
                              (progn
                                (push :unmapped-cpds (palsson-rxn-analysis palsson-rxn))
                                (push palsson-rxn *p-rxns-with-unmapped-cpds*)
                                )
                              (if (palsson-rxn-is-diffusion-p palsson-rxn)
                                  (progn
                                    (push :diffusion (palsson-rxn-analysis palsson-rxn))
                                    (push palsson-rxn *p-rnxs-unmapped-diffusion*)
                                    )
                                  (push palsson-rxn *p-rnxs-otherwise-unmapped*)
                                  )
                              )
                          )
                      )
                    )
                (format t "~S~%" ecocyc-rxns)
                ;;(format t "~S~%" symbolic-gene-assoc)
                ;;(REACTIONS-OF-GENE (gene-object bnumber))
                )
              )
          )
        )
      )
    ;;raw-lists
    (format t "*p-rxns-all* : ~A~%" (length *p-rxns-all*))
    (format t "*p-rxns-mapped-to-ecocyc* : ~A~%" (length *p-rxns-mapped-to-ecocyc*))
    (format t "*p-rxns-mapped-to-metacyc* : ~A~%" (length *p-rxns-mapped-to-metacyc*))
    (format t "*p-rxns-with-unmapped-cpds* : ~A~%" (length *p-rxns-with-unmapped-cpds*))
    (format t "*p-rnxs-otherwise-unmapped* : ~A~%" (length *p-rnxs-otherwise-unmapped*))
    (format t "*p-rnxs-unmapped-exchange* : ~A~%" (length *p-rnxs-unmapped-exchange*))
    (format t "*p-rnxs-unmapped-diffusion* : ~A~%" (length *p-rnxs-unmapped-diffusion*))
    )
  )

;; ======================================================================
;; kr:Nov-14-2005 Description : Takes the raw reaction equation from the Palsson spreadsheet and
;;                converts it to a palsson-rxn structure. This involves looking up
;;                the Palsson abbreviations for cpds and finding the corresponding
;;                EcoCyc or MetaCyc equivalents, using the previously assembled mappings
;;                stored in *palsson-cpds-ht* . It also involves parsing stoichiometry
;;                coefficients and compartment information. Compartments may have been
;;                assigned on a per-compound basis, or as applying to the entire reaction equation.
;;      Arguments : rxn-eqn-str : a string
;;                  abbreviation : a string
;;        Returns : 2 values: a palsson-rxn structure,
;;                  and a boolean flag indicating whether any cpd
;;                  had to be located in MetaCyc (because it was not present in EcoCyc),
;;                  which would imply that any rxn would have to be searched in MetaCyc as well.
;;   Side Effects :
;; Update History : kr:Mar-22-2006 Added a second returned value, indicating whether the rxn is likely to only
;;;                 occur in MetaCyc, because at least one of the cpds was only found in MetaCyc.
(defun parse-palsson-rxn-eqn (rxn-eqn-str abbreviation)
  (let* ((parse-index 0)
         ;; This is a compartment string at the beginning of a rxn, that applies to the entire rxn:
         (rxn-compartment-str (when (eql #\[ (elt rxn-eqn-str 0))
                                (setf parse-index 5) ;;kr:Mar-20-2006 May assume too much about the format...
                                (subseq rxn-eqn-str 1 (position #\] rxn-eqn-str))))
         ;;kr:Mar-20-2006 Added this mapping to our CCO, so (find-rxn-by-substrates ...)
         ;; can make use of compartments.
         (rxn-cco-compartment (when rxn-compartment-str
                                (palsson-compartment->ecocyc-cco rxn-compartment-str)
                                ))
         (arrow-end-index (1+ (position #\> rxn-eqn-str)))
         (irreversible-p (if (equal "-->" (subseq rxn-eqn-str (- arrow-end-index 3) arrow-end-index))
                             t
                             ;; Otherwise, must be reversible.
                             (unless (equal "<==>" (subseq rxn-eqn-str (- arrow-end-index 4) arrow-end-index))
                               (warn "arrow of eqn ~S is not recognized." rxn-eqn-str)
                               )
                             )
                         )
         (left-side-str (subseq rxn-eqn-str parse-index (- arrow-end-index 4)))
         (right-side-str (subseq rxn-eqn-str arrow-end-index))
         (left-side-raw (split-into-tokens "+" left-side-str))
         (right-side-raw (split-into-tokens "+" right-side-str))
         (palsson-rxn (make-palsson-rxn :irrev-p irreversible-p :abbreviation abbreviation))
         metacyc-required-p
         )
    ;;kr:Jul-21-2006 Added the testing for (""), which in some sense is an artifact or bug of (split-into-tokens ...) .
    ;; The newer iAF1237 model has weird new "sink" rxns that have no right side cpds !!!
    (when (equal '("") left-side-raw)
      (setf left-side-raw nil)
      )
    (when (equal '("") right-side-raw)
      (setf right-side-raw nil)
      )
    (flet ((determine-compartment (cpd-compartment)
             (if cpd-compartment
                 (progn
                   ;; The overall rxn-compartment better be nil, or we have a conflict.
                   (when rxn-cco-compartment
                     (warn "conflict of rxn-compartment ~S with cpd-compartment ~S , in ~S , ~S"
                           rxn-cco-compartment cpd-compartment rxn-eqn-str abbreviation)
                     )
                   ;; The cpd-compartment should take priority in any case, presumably:
                   cpd-compartment
                   )
                 (if rxn-cco-compartment
                     rxn-cco-compartment
                     (warn "(some) compartment info missing altogether in ~S , ~S" rxn-eqn-str abbreviation)
                     )
                 )
             )
           )
      (dolist (raw-cpd-str left-side-raw)
        (multiple-value-bind (ecocyc-cpd coeff compartment found-in-metacyc-p)
            (palsson-cpd->ecocyc raw-cpd-str)
          (when found-in-metacyc-p (setf metacyc-required-p t))
          (push ecocyc-cpd (palsson-rxn-left-cpds palsson-rxn)) ;;kr:Mar-22-2006 Could now be MetaCyc too.
          (push coeff (palsson-rxn-left-stoichs palsson-rxn))
          (push (determine-compartment compartment) (palsson-rxn-left-compartments palsson-rxn))
          )
        )
      (dolist (raw-cpd-str right-side-raw)
        (multiple-value-bind (ecocyc-cpd coeff compartment found-in-metacyc-p)
            (palsson-cpd->ecocyc raw-cpd-str)
          (when found-in-metacyc-p (setf metacyc-required-p t))
          (push ecocyc-cpd (palsson-rxn-right-cpds palsson-rxn)) ;;kr:Mar-22-2006 Could now be MetaCyc too.
(push coeff (palsson-rxn-right-stoichs palsson-rxn)) (push (determine-compartment compartment) (palsson-rxn-right-compartments palsson-rxn)) ) ) ;; Reverse the orders before returning, so it is easily comparable to the original equation: (setf (palsson-rxn-left-cpds palsson-rxn) (nreverse (palsson-rxn-left-cpds palsson-rxn)) (palsson-rxn-right-cpds palsson-rxn) (nreverse (palsson-rxn-right-cpds palsson-rxn)) (palsson-rxn-left-stoichs palsson-rxn) (nreverse (palsson-rxn-left-stoichs palsson-rxn)) (palsson-rxn-right-stoichs palsson-rxn) (nreverse (palsson-rxn-right-stoichs palsson-rxn)) (palsson-rxn-left-compartments palsson-rxn) (nreverse (palsson-rxn-left-compartments palsson-rxn)) (palsson-rxn-right-compartments palsson-rxn) (nreverse (palsson-rxn-right-compartments palsson-rxn)) ) (values palsson-rxn metacyc-required-p) ) ) ) ;; ====================================================================== ;; kr:Nov-16-2005 Description : This is nowadays just some glue code for calling ;; the fuzzy reaction finder, taking arguments from ;; and storing results back into a palsson-rxn structure. ;; ;; Arguments : palsson-rxn : a palsson-rxn structure ;; ;; Returns : list of ecocyc rxn frames, or nil ;; Side Effects : Stores the results into 2 slots of the palsson-rxn structure. ;; Update History : kr:Mar-17-2006 Replaced the core of this fn with ;;; (fuzzy-find-rxn-by-substrates ...) , which was completely ;;; restructured and generalized. ;;; kr:Mar-20-2006 Added compartment processing, so transport reactions can be found ok. (defun palsson-rxn->ecocyc (palsson-rxn) (let* ((left-cpds (palsson-rxn-left-cpds palsson-rxn)) (right-cpds (palsson-rxn-right-cpds palsson-rxn)) (left-cmptmts (palsson-rxn-left-compartments palsson-rxn)) (right-cmptmts (palsson-rxn-right-compartments palsson-rxn)) ) ;;kr:Aug-27-2006 Added this test, to reduce expensive searching when not all ;; substrates were coerced to frames in the first place. 
    (unless (or (find-if #'stringp left-cpds)
                (find-if #'stringp right-cpds) )
      (multiple-value-bind (ecocyc-rxns successful-strategy)
          (fuzzy-find-rxn-by-substrates (map 'list #'synthesize-frbs-cpd-spec left-cpds left-cmptmts)
                                        (map 'list #'synthesize-frbs-cpd-spec right-cpds right-cmptmts) )
        (setf ecocyc-rxns (map 'list #'get-frame-name ecocyc-rxns))
        ;; Store the results, so we do not lose them.
        (setf (palsson-rxn-ecocyc-rxn-ids palsson-rxn) ecocyc-rxns
              (palsson-rxn-analysis palsson-rxn) successful-strategy)
        ecocyc-rxns ) ) ) )

;; ======================================================================
;; kr:Mar-20-2006     Description : Currently, for (find-rxn-by-substrates ...) to find
;;                      any transport reactions, correct compartment information
;;                      has to be supplied, which implies that
;;                      these list-based cpd-specs need to be constructed.
;;      Arguments : cpd : cpd frame
;;                  cco-compartment : frame from CCO schema
;;
;;        Returns : list of 3 items. For a description, see (find-rxn-by-substrates ...)
;;   Side Effects :
;; Update History :
(defun synthesize-frbs-cpd-spec (cpd cco-compartment)
  (let* ((frame? (coercible-to-frame-p cpd))
         (type (if (and frame? (class-p cpd)) :class :instance)) )
    (list type cpd cco-compartment) ) )

;; ======================================================================
;; kr:Mar-17-2006     Description : This is a generalization of (find-rxn-by-substrates ...),
;;                      which tries harder if the initial exact match fails.
;;                      It tries to modify the left and right cpd lists by
;;                      applying a set of rules. After each modification,
;;                      matching is tried again, and once it succeeds, we return.
;;
;;      Arguments : left-cpds : list of cpd frames, or cpd-specs (see (find-rxn-by-substrates ...))
;;                  right-cpds : list of cpd frames, or cpd-specs
;;
;;        Returns : 2 values: a list of rxns, or nil if none found.
;;                  The second value is the strategy list, prepended with the side
;;                  (left or right) that ended up succeeding in finding a rxn.
;; Side Effects : ;; Update History : kr:Mar-20-2006 As we now generate cpd-specs to handle the compartments needed for ;;; transport reactions, I have to filter them out again, to retain simple rules. ;;;kr:May-12-2006 Changed to pass through cpd-spec to the lower level without stripping them off. ;;;kr:Aug-25-2006 Split out (fuzzy-f-r-b-s-enumerate-strategies ...) , and added another stage ;;; if the first such call fails. On the second attempt, the explicit list of all reactions is ;;; passed, which is much larger than the set that (find-rxn-by-substrates ...) generates on its ;;; own, by only using what can be reached from the substrate instance frames. This by definition ;;; excludes rxns that may have class substrates that would have subsumed the substrates originally ;;; passed in to this search. (defun fuzzy-find-rxn-by-substrates (left-cpds right-cpds) (let* ((ecocyc-rxns (find-rxn-by-substrates left-cpds right-cpds :exact-substrates? t ) ) ) (if ecocyc-rxns ;; Return the value: ecocyc-rxns ;; If nothing found yet, try some strategies for modifying the cpd lists: (multiple-value-bind (ecocyc-rxns successful-strategies) (fuzzy-f-r-b-s-enumerate-strategies left-cpds right-cpds nil) (if ecocyc-rxns (return-from fuzzy-find-rxn-by-substrates (values ecocyc-rxns successful-strategies)) ;; If still was nothing found, we should become even more aggressive, ;; and try whether rxns with classes might include the set of cpds. (let* ((rxn-set (reduce #'append ;;kr:Aug-27-2006 (nconc ...) caused a lot of trouble !! ;;kr:Aug-27-2006 This should be vastly faster than always trying all rxns, ;; but it still is probably somewhat inefficient, because several ;; rxns may be tried redundantly, as we are just appending this together. (map 'list #'(lambda (cpd-spec) (reactions-of-compound (second cpd-spec) :non-specific-too? 
t) ) (append left-cpds right-cpds) ) ) )) (multiple-value-bind (ecocyc-rxns successful-strategies) (fuzzy-f-r-b-s-enumerate-strategies left-cpds right-cpds ;;kr:Aug-25-2006 This does seem to catch higher-level ;; rxns with class substrate frames, ;; but it obviously is very expensive. ;;(get-class-all-instances '|Reactions|) ;;kr:Aug-27-2006 More economical: rxn-set ) (when ecocyc-rxns (return-from fuzzy-find-rxn-by-substrates (values (remove-duplicates ecocyc-rxns :test #'fequal) (push :expanded-rxn-match successful-strategies))) ) ) ) ) ) ) ) ) #||;;;kr:Mar-17-2006 Example invocation: EC(32): (fuzzy-find-rxn-by-substrates '(ATP D-XYLULOSE) '(ADP PROTON XYLULOSE-5-PHOSPHATE)) 0[1]: (FIND-RXN-BY-SUBSTRATES (ATP D-XYLULOSE) (ADP PROTON XYLULOSE-5-PHOSPHATE) :EXACT-SUBSTRATES? T) 0[1]: returned NIL 0[1]: (FIND-RXN-BY-SUBSTRATES (ATP D-XYLULOSE) (ADP XYLULOSE-5-PHOSPHATE) :EXACT-SUBSTRATES? T) 0[1]: returned (XYLULOKIN-RXN) (XYLULOKIN-RXN) (RIGHT REMOVE PROTON) EC(33): EC(33): (fuzzy-find-rxn-by-substrates '(ADP PROTON XYLULOSE-5-PHOSPHATE) '(ATP D-XYLULOSE)) (XYLULOKIN-RXN) (LEFT REMOVE PROTON) EC(34): EC(37): (fuzzy-find-rxn-by-substrates '(D-SERINE) '(PYRUVATE AMMONIUM)) (DSERDEAM-RXN) (RIGHT SUBSTITUTE AMMONIA AMMONIUM) EC(38): ;;;kr:Mar-21-2006 After reworking to apply strategies to both sides simultaneously: EC(67): (fuzzy-find-rxn-by-substrates '(PROTON WATER CPD-421) '(CARBON-DIOXIDE AMMONIUM N2-SUCCINYLORNITHINE)) (SUCCARGDIHYDRO-RXN) ((LEFT REMOVE PROTON) (RIGHT SUBSTITUTE AMMONIA AMMONIUM)) EC(68): ||# ;; ====================================================================== ;; kr:Aug-25-2006 Description : Try out combinations of modification rules, ;; to see whether we can pick a rxn. ;; ;; Arguments : left-cpds : list of cpd frames, or cpd-specs (see (find-rxn-by-substrates ...)) ;; right-cpds : list of cpd frames, or cpd-specs ;; rxn-list : NIL , or the list of rxns to try. ;; ;; Returns : 2 values: a list of rxns, or nil if none found. 
;;                  The second value is the strategy list, prepended with the side
;;                  (left or right) that ended up succeeding in finding a rxn.
;;   Side Effects :
;; Update History : Spun out from (fuzzy-find-rxn-by-substrates ...).
;;;kr:Mar-21-2006 Reworked to apply strategies to both sides simultaneously.
;;;kr:Aug-25-2006 Added the rxn-list argument.
(defun fuzzy-f-r-b-s-enumerate-strategies (left-cpds right-cpds rxn-list)
  ;; For a description of the strategy list format, please see the comment
  ;; of (find-rxn-by-substrates-w-fuzziness-strategies ...).
  (let* ((strategy-list '( (noop);;kr:Mar-21-2006 Needed for full combinatorial exploration.
                           (remove PROTON)
                           (substitute AMMONIA AMMONIUM) ) ) )
    ;; Enumerate every combination of rules, because there seem to be rxns where a proton needs
    ;; to be removed from one side, and the ammonium substitution needs to happen on the other side,
    ;; simultaneously.
    ;;kr:Mar-21-2006 Currently, rules can only be simultaneously applied to the opposite sides,
    ;; not to the same side in succession. Let's hope this will not become necessary too...
    (loop for left-strategy in strategy-list
          do (loop for right-strategy in strategy-list
                   do (multiple-value-bind (ecocyc-rxns successful-strategies)
                          (find-rxn-by-substrates-w-fuzziness-strategies left-strategy right-strategy
                                                                         left-cpds right-cpds rxn-list)
                        (when ecocyc-rxns
                          (return-from fuzzy-f-r-b-s-enumerate-strategies
                            (values ecocyc-rxns successful-strategies)) ) ) ) ) ) )

;; ======================================================================
;; kr:Mar-17-2006     Description : Given simple strategy descriptions for the left and right sides,
;;                      this function applies the modifications described by
;;                      the strategies to the left and right cpd lists,
;;                      to see whether that will pull out rxn hits after all.
;;                      If the modification strategies were successful, they will be
;;                      returned as the second value (augmented with the side
;;                      that led to success), so we can log this fact.
;;
;;                      Currently, this probably conses a lot...
;;
;;      Arguments : left-strategy : a list describing what should be tried.
;;                      The first item needs to name a valid Common Lisp function
;;                      (or be NOOP, which leaves the cpd list unchanged),
;;                      and the second and optional third list items should be
;;                      the first arguments to this function.
;;                      To speed things up, the strategy will only be run if the
;;                      last list item can be found in the cpd list it is applied to.
;;                      The functions should be non-destructive, because
;;                      otherwise, they will likely mess up some data structures.
;;                      Examples of valid strategy arguments are:
;;                        (remove PROTON)
;;                        (substitute AMMONIA AMMONIUM) ; replaces AMMONIUM by AMMONIA
;;                  right-strategy : same description as left-strategy
;;                  left-cpds : list of cpd frames, or cpd-specs (see (find-rxn-by-substrates ...))
;;                  right-cpds : list of cpd frames, or cpd-specs
;;                  rxn-list : NIL , or the list of rxns to try.
;;        Returns : 2 values: a list of rxns, or nil if none found.
;;                  The second value is a list of strategies that led to success in finding a rxn.
;;                  Each strategy is prepended with the side (left or right) that it was applied to.
;;   Side Effects :
;; Update History : kr:Mar-21-2006 Renamed this, and totally reworked, adding the other strategy argument as well.
;;;                 So now, a left-strategy is applied to the left side, and a right-strategy to the right side.
;;;kr:Aug-25-2006 Added the rxn-list argument.
(defun find-rxn-by-substrates-w-fuzziness-strategies (left-strategy right-strategy left-cpds right-cpds rxn-list)
  (let* ((modified-left-cpds (modify-cpd-list-by-strategy left-cpds left-strategy))
         (modified-right-cpds (modify-cpd-list-by-strategy right-cpds right-strategy))
         ;; After strategies have been applied to both sides, search for rxns again:
         (rxns (find-rxn-by-substrates modified-left-cpds modified-right-cpds :exact-substrates?
                                       t
                                       ;;kr:Aug-25-2006 Now passing this argument
                                       :rxn-list rxn-list))
         )
    (when rxns
      (values rxns
              (append (unless (eql left-cpds modified-left-cpds)
                        (list (cons 'LEFT left-strategy)) )
                      (unless (eql right-cpds modified-right-cpds)
                        (list (cons 'RIGHT right-strategy)) ) ) ) ) ) )

#||;;;kr:Mar-21-2006 Example invocation:
EC(63): (find-rxn-by-substrates-w-fuzziness-strategies '(noop) '(remove PROTON) '(ATP D-XYLULOSE) '(ADP PROTON XYLULOSE-5-PHOSPHATE))
(XYLULOKIN-RXN)
((RIGHT REMOVE PROTON))
EC(64):
||#

;; ======================================================================
;; kr:May-12-2006     Description : Split this out from (find-rxn-by-substrates-w-fuzziness-strategies ...)
;;                      to avoid code duplication.
;;                      This allowed me to add much more complex processing, to take into account
;;                      cpd-specs, and apply strategies to the specs as well.
;;                      A key complication is that the (SUBSTITUTE ...) strategy will have
;;                      to substitute a frame inside a cpd-spec.
;;
;;      Arguments : cpd-list : list of cpd frames, or cpd-specs (see (find-rxn-by-substrates ...))
;;                  strategy : list, representing the strategy to be applied. For more about them,
;;                             see (find-rxn-by-substrates-w-fuzziness-strategies ...)
;;
;;        Returns : list of cpd frames, or cpd-specs
;;   Side Effects :
;; Update History :
(defun modify-cpd-list-by-strategy (cpd-list strategy)
  (let* ((strategy-fn (first strategy))
         (strategy-args (rest strategy))
         ;; This assumes that the cpd to be affected will be the last item of the strategy.
         (strategy-target-cpd (first (last strategy-args)))
         (found-target-cpd
          (find strategy-target-cpd cpd-list
                :test #'(lambda (strategy-target cpd-list-item)
                          (let* ((cpd-of-spec
                                  ;;kr:May-12-2006 If this is a cpd-spec, check the frame inside.
(if (consp cpd-list-item) (second cpd-list-item) ;; the cpd frame id cpd-list-item))) (fequal strategy-target cpd-of-spec) )))) (modified-cpds (if (and (not (eql strategy-fn 'noop)) found-target-cpd) (apply strategy-fn (append (if (consp found-target-cpd) ;;kr:May-12-2006 If we are dealing with a cpd-spec, ;; we need to change the strategy-args to versions with ;; cpd-specs too. These are faked to look like the spec ;; in found-target-cpd, just so that the matching done by ;; the strategy itself will work correctly. (map 'list #'(lambda (strategy-arg) ;;kr:May-12-2006 This effectively fakes a replacement ;; spec , using the strategy-arg (substitute strategy-arg (second found-target-cpd) ;; Use the framework of the spec found-target-cpd :test #'fequal) ) strategy-args) ;; If the target was not a spec, we don't need this ;; fancy pre-processing. strategy-args) (list cpd-list :test #'fequal))) cpd-list)) ) modified-cpds ) ) #||;;;kr:May-12-2006 Example invocations: EC(42): (modify-cpd-list-by-strategy '((:INSTANCE ADP CCO-CYTOSOL) (:INSTANCE PROTON CCO-CYTOSOL) (:INSTANCE ILE CCO-CYTOSOL) (:INSTANCE |Pi| CCO-CYTOSOL)) '(REMOVE PROTON)) 0[1]: (SUBSTITUTE PROTON PROTON (:INSTANCE PROTON CCO-CYTOSOL) :TEST #) 0[1]: returned (:INSTANCE PROTON CCO-CYTOSOL) ((:INSTANCE ADP CCO-CYTOSOL) (:INSTANCE ILE CCO-CYTOSOL) (:INSTANCE |Pi| CCO-CYTOSOL)) EC(43): EC(44): (modify-cpd-list-by-strategy '((:INSTANCE ADP CCO-CYTOSOL) (:INSTANCE PROTON CCO-CYTOSOL) (:INSTANCE ILE CCO-CYTOSOL) (:INSTANCE |Pi| CCO-CYTOSOL)) '(SUBSTITUTE AMMONIUM PROTON)) 0[1]: (SUBSTITUTE AMMONIUM PROTON (:INSTANCE PROTON CCO-CYTOSOL) :TEST #) 0[1]: returned (:INSTANCE AMMONIUM CCO-CYTOSOL) 0[1]: (SUBSTITUTE PROTON PROTON (:INSTANCE PROTON CCO-CYTOSOL) :TEST #) 0[1]: returned (:INSTANCE PROTON CCO-CYTOSOL) 0[1]: (SUBSTITUTE (:INSTANCE AMMONIUM CCO-CYTOSOL) (:INSTANCE PROTON CCO-CYTOSOL) ((:INSTANCE ADP CCO-CYTOSOL) (:INSTANCE PROTON CCO-CYTOSOL) (:INSTANCE ILE CCO-CYTOSOL) (:INSTANCE |Pi| CCO-CYTOSOL)) :TEST #) 
0[1]: returned ((:INSTANCE ADP CCO-CYTOSOL) (:INSTANCE AMMONIUM CCO-CYTOSOL) (:INSTANCE ILE CCO-CYTOSOL) (:INSTANCE |Pi| CCO-CYTOSOL)) ((:INSTANCE ADP CCO-CYTOSOL) (:INSTANCE AMMONIUM CCO-CYTOSOL) (:INSTANCE ILE CCO-CYTOSOL) (:INSTANCE |Pi| CCO-CYTOSOL)) EC(45): ||# ;; ====================================================================== ;; kr:Nov-15-2005 Description : Performs the parsing of a compound entry from the ;; reaction equation from Palsson's spreadsheet. ;; Tries to extract the stoichiometry coefficient, ;; the actual compound (returned as an ecocyc frame), ;; and the compartment. ;; The example argument "(2) h2o[p]" would mean 2 water molecules ;; in the periplasmic space. ;; Arguments : raw-cpd-str : a string describing a compound in Palsson's spreadsheet ;; ;; Returns : 4 values: ecocyc-cpd : the cpd frame, or the palsson cpd string if the lookup failed ;; coeff : an integer, default 1 ;; compartment: a CCO frame ID ;; found-in-metacyc-p : a Boolean, which is t if ecocyc-cpd is really a frame ID in metacyc. ;; Side Effects : ;; Update History : kr:Mar-22-2006 Added a 4th returned value, identifying those cpds that could be mapped to MetaCyc, ;;; after EcoCyc failed. (defun palsson-cpd->ecocyc (raw-cpd-str) (let* ((trimmed-str (trim-spaces raw-cpd-str)) (trimmed-str-len (length trimmed-str)) (index-after-name trimmed-str-len) (compartment (when (eql #\] (elt trimmed-str (1- trimmed-str-len))) (setf index-after-name (position #\[ trimmed-str :from-end t)) (subseq trimmed-str (1+ index-after-name) (1- trimmed-str-len)) ) ) ;;kr:Mar-20-2006 Added this mapping to our CCO, so (find-rxn-by-substrates ...) ;; can make use of compartments. (cco-compartment (when compartment (palsson-compartment->ecocyc-cco compartment) )) found-in-metacyc-p ) ;; A cpd string with coefficient looks like this: "(2) h2o". ;; This tries to read in that coefficient list, if there is one. 
(multiple-value-bind (coeff-list index-after-coeff) (if (eql #\( (elt trimmed-str 0)) (read-from-string trimmed-str) (values nil 0) ) (let* ((coeff (if coeff-list (if (and (= 1 (length coeff-list)) (integerp (first coeff-list))) (first coeff-list) (warn "In cpd ~S , this did not appear to be a real coefficient: ~S" raw-cpd-str coeff-list) ) ;; the default coefficient 1) ) (looked-up-palsson-cpd (gethash (subseq trimmed-str index-after-coeff index-after-name) *palsson-cpds-ht*)) (ecocyc-cpd (if (and looked-up-palsson-cpd (palsson-cpd-ecocyc-id looked-up-palsson-cpd)) (palsson-cpd-ecocyc-id looked-up-palsson-cpd) ;;kr:Mar-22-2006 Try to see whether that cpd would be in MetaCyc. (if (and looked-up-palsson-cpd (palsson-cpd-metacyc-id looked-up-palsson-cpd)) (progn (setf found-in-metacyc-p t) (palsson-cpd-metacyc-id looked-up-palsson-cpd) ) (progn (warn "palsson cpd ~S could not be mapped to either ecocyc or metacyc." trimmed-str) trimmed-str) ))) ) (values ecocyc-cpd coeff cco-compartment found-in-metacyc-p) ) ) ) ) #||;;;kr:May-12-2006 Example invocation: EC(18): (palsson-cpd->ecocyc "(2) h2o[p]") WATER 2 CCO-PERI-BAC NIL EC(19): ||# ;; ====================================================================== ;; kr:Mar-20-2006 Description : Maps the 1-letter compartment in the palsson model ;; to a CCO frame. ;; ;; Arguments : palsson-compartment-str : a 1-letter string: "c" "p" or "e" ;; ;; Returns : a CCO frame ID ;; Side Effects : ;; Update History : (defun palsson-compartment->ecocyc-cco (palsson-compartment-str) (ecase (intern palsson-compartment-str (find-package :ecocyc)) (|c| (get-frame-name (default-compartment))) (|p| 'CCO-PERI-BAC) (|e| 'CCO-EXTRACELLULAR) ) ) ;; ====================================================================== ;; kr:Aug-29-2006 Description : Returns non-nil if the palsson-rxn is a simple exchange reaction, ;; between the [e] compartment and the world outside the model. ;; Their abbreviation usually should start with "EX_". 
;;
;;      Arguments : palsson-rxn : a palsson-rxn structure
;;
;;        Returns : Boolean
;;   Side Effects :
;; Update History : kr:Aug-24-2006 Tightened the definition, so that it has to affect only 1 cpd.
(defun palsson-rxn-is-exchange-p (palsson-rxn)
  ;;kr:Aug-24-2006 Probably, the simple exchange rxns only transfer 1 cpd:
  (and (= 1 (length (palsson-rxn-left-cpds palsson-rxn)))
       ;;kr:Aug-29-2006 They don't seem to have anything on the right side !
       (= 0 (length (palsson-rxn-right-cpds palsson-rxn))) ) )

;; ======================================================================
;; kr:Mar-14-2006     Description : Returns non-nil if the palsson-rxn is a simple diffusion reaction,
;;                      between 2 compartments. Their abbreviation usually should end in "ex".
;;
;;      Arguments : palsson-rxn : a palsson-rxn structure
;;
;;        Returns : Boolean
;;   Side Effects :
;; Update History : kr:Aug-24-2006 Tightened the definition, so that it has to affect only 1 cpd.
;;;                 This change recovered 64 transporters from a pool of 347 presumed simple diffusion rxns !!
;;;kr:Aug-29-2006 Changed name to "diffusion", according to the correction by Adam Feist.
;;;
(defun palsson-rxn-is-diffusion-p (palsson-rxn)
  ;;kr:Aug-24-2006 Probably, the simple diffusion rxns only transfer 1 cpd:
  (and (= 1 (length (palsson-rxn-left-cpds palsson-rxn)))
       (set-fequal (palsson-rxn-left-cpds palsson-rxn)
                   (palsson-rxn-right-cpds palsson-rxn)) ) )

;; ======================================================================
;; kr:Aug-24-2006     Description : Just for filtering out a palsson-rxn that has any
;;                      non-default compartments, which will likely mean
;;                      that it is a transport reaction.
;; Arguments : palsson-rxn : a palsson-rxn structure ;; ;; Returns : Boolean ;; Side Effects : ;; Update History : (defun palsson-rxn-has-non-default-compartments-p (palsson-rxn) (let* ((left-cpmts (palsson-rxn-left-compartments palsson-rxn)) (right-cpmts (palsson-rxn-right-compartments palsson-rxn)) (default-cpmt (get-frame-name (default-compartment))) ) (or (notevery #'(lambda (cpmt) (eql cpmt default-cpmt) ) left-cpmts) (notevery #'(lambda (cpmt) (eql cpmt default-cpmt) ) right-cpmts) ) ) ) ;; ====================================================================== ;; kr:Aug-27-2006 Description : ;; ;; ;; Arguments : palsson-rxn-list : a list of palsson-rxn structures ;; palsson-rxn-csv-filename : a full filename with path ;; ;; Returns : ;; Side Effects : ;; Update History : kr:Jan-15-2006 Suppressed unnecessary "NIL"s being written out. (defun dump-palsson-rxns-to-tab-delimited-file (palsson-rxn-list palsson-rxn-csv-filename) ;;kr:Aug-27-2006 if this idiotic variable is not set to nil, the output will ;; contain additional, spurious linefeeds !!! (let* ((*PRINT-PRETTY* nil)) (with-open-file-noerror (file-stream palsson-rxn-csv-filename :direction :output :if-exists :supersede :if-does-not-exist :create) ;; Write the header line, containing the names of the columns: (write-tab-delimited-line file-stream (list "abbreviation" "officialName" "equation" ;; our extra columns "ecocyc-rxn-ids" "analysis")) (dolist (palsson-rxn palsson-rxn-list) (write-tab-delimited-line file-stream (list (palsson-rxn-abbreviation palsson-rxn) (palsson-rxn-officialName palsson-rxn) (palsson-rxn-equation palsson-rxn) (if (palsson-rxn-ecocyc-rxn-ids palsson-rxn) ;;kr:Jan-15-2006 Make sure to capture vertical bars of symbols. 
(format nil "~S" (palsson-rxn-ecocyc-rxn-ids palsson-rxn)) "") (if (palsson-rxn-analysis palsson-rxn) ;;kr:Aug-23-2006 For new-style tags: (format nil "~S" (palsson-rxn-analysis palsson-rxn)) "") ) ) ) ) ) ) ;; ====================================================================== ;; kr:Nov-18-2005 Description : Simplistic helper fn, useful for interactively trying ;; to locate frames for cpd substrings. ;; ;; Arguments : ;; ;; ;; Returns : ;; Side Effects : ;; Update History : (defun pfc (search-substring) (map 'list #'(lambda (orgid) (format *terminal-io* "~A ~A : ~A~%" search-substring orgid (multiple-value-bind (main others) (find-indexed-frame search-substring '|Compounds| :kb (kb-of-organism orgid)) (declare (ignore others)) (when main (get-frame-name main :kb (kb-of-organism orgid)) ) ) ) ) '(ECOLI META) ) (values) ) #||;;; example invocation: EC(12): (pfc "pimeloyl-CoA") pimeloyl-CoA ECOLI : 6-CARBOXYHEXANOYL-COA pimeloyl-CoA META : 6-CARBOXYHEXANOYL-COA EC(13): ||# ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; Below are some utilities for dumping out a report of (unmatched) rxns, ;;; that have corresponding genes paired up in clusters, to simplify visual examination. 
;;;kr:Mar-28-2006 (defun dump-gene-rxn-report (palsson-rxn-list filename) (with-open-file (out-stream filename :direction :output :if-exists :supersede) (print-gene-rxn-report palsson-rxn-list out-stream) ) ) ;; ====================================================================== ;; kr:Mar-24-2006 Description : ;; ;; ;; Arguments : ;; ;; ;; Returns : ;; Side Effects : ;; Update History : (defun print-gene-rxn-report (palsson-rxn-list stream) (loop for palsson-rxn in palsson-rxn-list for pal-rxn-num from 0 do (terpri stream) (let* ((b-genes (get-all-b-genes-in-geneAssociation (palsson-rxn-geneAssociation palsson-rxn))) (equation (palsson-rxn-equation palsson-rxn)) (abbreviation (palsson-rxn-abbreviation palsson-rxn)) ) (unless b-genes (setf b-genes (list "no-b-gene"))) (loop for b-gene in b-genes for ecocyc-gene-frame = (gene-object b-gene) do ;; palsson info first: (format stream "~A~A~A~A~A~A~A~A~A~%" pal-rxn-num #\Tab b-gene #\Tab "" ;; we don't have a pretty name for these genes #\Tab equation #\Tab abbreviation ) ;; followed by ecocyc info (when (coercible-to-frame-p ecocyc-gene-frame) (let* ((ecocyc-gene-frame-id (get-frame-name ecocyc-gene-frame)) (ecocyc-rxns (reactions-of-gene ecocyc-gene-frame-id)) ) (loop for ecocyc-rxn in ecocyc-rxns do (format stream "~A~A~A~A~A~A~A~A~A~%" pal-rxn-num #\Tab ecocyc-gene-frame-id #\Tab (get-name-string ecocyc-gene-frame-id) #\Tab (strip-html-tags (get-name-string ecocyc-rxn)) #\Tab (get-frame-name ecocyc-rxn) ) ) ) ) ) ) ) ) ;; ====================================================================== ;; kr:Mar-24-2006 Description : ;; ;; ;; Arguments : geneAssociation : list of symbols, using b-numbers for the genes ;; ;; ;; Returns : list of symbols (just the b-numbers) ;; Side Effects : ;; Update History : (defun get-all-b-genes-in-geneAssociation (geneAssociation) (delete 'AND (delete 'OR (flatten (listify geneAssociation)))) ) #||;;;kr:Mar-24-2006 Example invocation: EC(19): (get-all-b-genes-in-geneAssociation '(B4238 
OR (B3924 AND B4238 AND B4237)))
(B4238 B3924 B4238 B4237)
EC(20):
||#

;; ======================================================================
;; kr:Mar-24-2006     Description : Maps every b-number in a geneAssociation to the
;;                      frame ID of the corresponding EcoCyc gene.
;;                      Repeated b-numbers are not collapsed.
;;
;;      Arguments : geneAssociation : list of symbols, using b-numbers for the genes
;;
;;
;;        Returns : list of ecocyc gene frame IDs
;;   Side Effects :
;; Update History :
(defun get-all-ecocyc-genes-in-geneAssociation (geneAssociation)
  (map 'list #'get-frame-name
       (map 'list #'gene-object
            (delete 'AND (delete 'OR (flatten geneAssociation))))) )

#||;;;kr:Mar-24-2006 Example invocation:
EC(17): (get-all-ecocyc-genes-in-geneAssociation '(B4238 OR (B3924 AND B4238 AND B4237)))
(EG11417 EG10628 EG11417 G812)
EC(18):
||#
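
;; ======================================================================
;; Sketch, not part of the original analysis code.
;; The example invocations of (get-all-b-genes-in-geneAssociation ...) and
;; (get-all-ecocyc-genes-in-geneAssociation ...) both return a gene more than once
;; when its b-number occurs in several clauses of the association (B4238 above).
;; If one report row per gene is wanted instead, the duplicates could be dropped
;; roughly as follows. The names sketch-flatten and sketch-unique-b-genes are
;; hypothetical. sketch-flatten is only a stand-in, so that this sketch does not
;; depend on the (flatten ...) and (listify ...) utilities used above, whose exact
;; behavior is assumed here rather than known.
(defun sketch-flatten (tree)
  ;; Collect the atoms of a nested list, left to right, into a fresh list.
  (cond ((null tree) nil)
        ((atom tree) (list tree))
        (t (append (sketch-flatten (car tree))
                   (sketch-flatten (cdr tree))))))

(defun sketch-unique-b-genes (gene-association)
  ;; Drop the AND / OR connectives, then keep the first occurrence of each b-number.
  ;; :from-end t makes (remove-duplicates ...) keep the earliest of several equal items,
  ;; which preserves the order in which the genes first appear.
  (remove-duplicates
   (remove-if #'(lambda (item) (member item '(AND OR)))
              (sketch-flatten gene-association))
   :from-end t))

;; With the association from the examples above,
;;   (sketch-unique-b-genes '(B4238 OR (B3924 AND B4238 AND B4237)))
;; evaluates to (B4238 B3924 B4237).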