How to#

In this notebook, we will keep track of handy implementations for metabolic modeling related tasks.

Linear Programming software#

Get CPLEX#

When a piece of software lists CPLEX as a prerequisite, it is referring to IBM ILOG CPLEX Optimization Studio:

Analytical decision support toolkit for rapid development and deployment of optimization models using mathematical and constraint programming. It combines an integrated development environment with the powerful Optimization Programming Language and high-performance CPLEX and CP Optimizer solvers.

https://hpc-community.unige.ch/t/guide-for-installing-ibm-cplex/2104

CPLEX offers an academic license; once you have obtained it, you can configure Python and other interfaces to use the solver. In practice, though, the installation and setup are rarely straightforward.

I found some reasonably useful notes from 2022 at this link.

https://academic.ibm.com/a2mt/email-auth#/
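Once the license and installation are in place, a quick check from Python can confirm whether the interpreter actually sees the solver bindings. The sketch below assumes the bindings are installed under the module name `cplex`; `solver_available` is a helper name made up for this example, not part of any library.

```python
import importlib.util


def solver_available(module_name="cplex"):
    """Return True if a module with this name can be found in the current environment."""
    # find_spec only locates the module; it does not import it or check the license
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if solver_available():
        print("CPLEX Python bindings found")
    else:
        print("CPLEX Python bindings not found in this environment")
```

This only tells you that the module is visible to the interpreter you are running; a different conda or virtual environment may give a different answer.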

Media and environment setup#

Extracellular reactions#

COMETS includes the capability to simulate reactions happening in the extracellular environment, without association to a specific organism. Users can implement either elementary reactions of arbitrary order based on mass-action kinetics, or enzyme-catalyzed reactions obeying Michaelis–Menten kinetics, e.g., for the simulation of extracellular enzymes.
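To make the two kinetic forms concrete, here is a minimal sketch of the rate laws themselves in plain Python. It does not use the COMETS API; the function names and parameters (`k`, `kcat`, `km`, and so on) are chosen for illustration only.

```python
def mass_action_rate(k, concentrations, orders):
    """Rate of an elementary reaction: k times the product of c_i ** order_i."""
    rate = k
    for concentration, order in zip(concentrations, orders):
        rate *= concentration ** order
    return rate


def michaelis_menten_rate(kcat, enzyme, km, substrate):
    """Rate of an enzyme-catalyzed reaction: kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


if __name__ == "__main__":
    # Second-order reaction A + B -> C with k = 2, [A] = 3, [B] = 4
    print(mass_action_rate(2.0, [3.0, 4.0], [1, 1]))
    # Extracellular enzyme acting on a substrate present at [S] = Km
    print(michaelis_menten_rate(2.0, 5.0, 5.0, 5.0))
```

At a substrate concentration equal to `km`, the Michaelis–Menten rate is half of its maximum `kcat * enzyme`, which is a handy sanity check for parameter values.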

Get complete medium for ModelSEED#

By complete medium we mean an in silico object in which every compound that could be used as a nutrient is available to the model.

To build this object for ModelSEED, we first need the full list of candidate compounds, which means getting a local copy of the ModelSEEDDatabase repository.

We can then explore the Biochemistry folder of that repository to retrieve every nutrient that could be imported into our model.

From the Biochemistry folder of the dev branch of the ModelSEEDDatabase repository, run:

!awk -F"\t" '$6 != 1 && $18==0  {print $5}' reaction_*.tsv   > TRANSPORT_REACTIONS.tsv

Run anywhere else, the command cannot find its input files and awk stops with an error such as:

awk: fatal: cannot open file `reaction_*.tsv' for reading: No such file or directory
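Because the awk filter addresses columns by position ($5, $6, $18), it can help to confirm which column names those positions correspond to in your checkout before trusting the output. The sketch below prints the header of a tab-separated file with 1-based positions, matching awk's numbering; `numbered_header` is a hypothetical helper, and the file paths are whatever you pass on the command line.

```python
import csv
import sys


def numbered_header(tsv_path):
    """Return the header of a tab-separated file as (position, column name) pairs, 1-based like awk."""
    with open(tsv_path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"))
    return list(enumerate(header, start=1))


if __name__ == "__main__":
    # Usage: python numbered_header.py <one or more .tsv files>
    for path in sys.argv[1:]:
        print(path)
        for position, name in numbered_header(path):
            print(f"  ${position}\t{name}")
```

If the positions do not line up with the columns you intended to filter on, adjust the awk expression accordingly.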

Now, with something like the following Python chunk, you can build the complete medium and export it to a .csv file whose format can be used for gap-filling with the fill command of the gapseq tool.

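import csv


def write_to_gapseq_format(all_compounds, cpd2name, output_file):
    """
    Write a 3-column csv file with the compound id, its name and a boundary flux of 1000.
    """
    counter = 0
    # newline="" lets the csv module control line endings; names containing commas get quoted
    with open(output_file, "w", newline="") as f:
        writer = csv.writer(f)
        # Header row as in the media files shipped with gapseq
        # (assumption: compare with the files under gapseq's dat/media folder)
        writer.writerow(["compounds", "name", "maxFlux"])
        for compound in sorted(all_compounds):
            if compound in cpd2name:
                counter += 1
                writer.writerow([compound, cpd2name[compound], 1000])
            else:
                print(f"Compound {compound} not found in cpd2name dictionary")

    print(f"Total compounds written: {counter}")


def process_transport_reactions(input_file, output_file=None):
    """
    Parse the TRANSPORT_REACTIONS.tsv file to export compounds that should be part of the complete medium.

    Each line holds one stoichiometry string: entries are separated by ";", and
    inside each entry the fields are separated by ":" (compound id second, name last).
    """
    cpd2name = {}
    all_compounds = set()

    with open(input_file) as f:
        for line in f:
            entries = line.strip().split(";")
            # Keep only reactions with exactly two entries
            if len(entries) != 2:
                continue
            # maxsplit=4 keeps a name that itself contains ":" in one piece
            first = entries[0].split(":", 4)
            second = entries[1].split(":", 4)
            if len(first) < 2 or len(second) < 2:
                continue
            # Same compound in both entries: it is only moved, not transformed
            if first[1] == second[1]:
                all_compounds.add(first[1])
                # Drop the double quotes that wrap the name in the stoichiometry string
                cpd2name.setdefault(first[1], first[-1].strip('"'))

    if output_file is not None:
        write_to_gapseq_format(all_compounds, cpd2name, output_file)

    return all_compounds, cpd2name


# Main execution
if __name__ == "__main__":
    process_transport_reactions("TRANSPORT_REACTIONS.tsv", "complete_modelseed_medium.csv")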

Total compounds written: 0
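A count of 0 means that no line of the input matched, which is also what an empty TRANSPORT_REACTIONS.tsv produces. To see what the parser above looks for, the sketch below applies the same rule to a single stoichiometry string. The example string is hand-written for illustration and assumes the `coefficient:compound:compartment:index:"name"` layout implied by the parser; `shared_compound` is a hypothetical helper.

```python
def shared_compound(stoichiometry):
    """Return (compound id, name) if the string has exactly two entries for the same compound, else None."""
    entries = stoichiometry.strip().split(";")
    if len(entries) != 2:
        return None
    # maxsplit=4 keeps a name that itself contains ":" in one piece
    first = entries[0].split(":", 4)
    second = entries[1].split(":", 4)
    if len(first) < 2 or len(second) < 2 or first[1] != second[1]:
        return None
    return first[1], first[-1].strip('"')


if __name__ == "__main__":
    # Illustrative line: one compound consumed in one compartment and produced in another
    example = '-1:cpd00001:1:0:"H2O";1:cpd00001:0:0:"H2O"'
    print(shared_compound(example))  # ('cpd00001', 'H2O')
```

Running this on a few lines of your own TRANSPORT_REACTIONS.tsv is a quick way to tell whether the file is empty or simply in a different layout than expected.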