Skip to content

Gap-filling

MATLAB functions in RAVEN/gapfilling of the RAVEN toolbox. Help text is collected from the source of the tracked branch.

Functions

Function Summary
canConsume Check which metabolites can be consumed by a model.
canProduce Check which metabolites can be produced from a model.
checkProduction Check which metabolites can be produced from a model.
checkRxn Check which reactants can be synthesized and products consumed.
consumeSomething Consume any metabolite using as few reactions as possible.
fillGaps Use template model(s) to fill gaps in a model.
findLeakMetabolite Find a metabolite that can be freely made or consumed.
fitTasks Fill gaps in a model so it can perform a list of tasks.
gapFillFastCore L1-norm LP subroutine for fastGapFill.
gapFillFastLP LP-based gap-filling (fastGapFill / swiftGapFill).
gapFillMILP Objective-based gap-filling via a global growth-floor MILP.
gapFillSwiftCore SWIFTCORE LP subroutine for swiftGapFill.
gapFillTopological BFS metabolite-producibility pre-screening.
gapReport
makeSomething Excrete any metabolite using as few reactions as possible.

Reference

canConsume

Check which metabolites can be consumed by a model.

Checks which metabolites can be consumed by a model using the specified constraints.

Input arguments:

Name Type Description Default
model struct

a model structure.

required

Name-value arguments:

Name Type Description Default
mets cell or logical or double

either a cell array of metabolite IDs, a logical vector with the same number of elements as metabolites in the model, or a vector of indexes to check for (default model.mets).

Output arguments:

Name Type Description
consumed logical

vector with true if the corresponding metabolite could be produced.

Examples:

consumed = canConsume(model, mets);

canProduce

Check which metabolites can be produced from a model.

Checks which metabolites can be produced from a model using the specified constraints. This is a less advanced but faster version of checkProduction.

Input arguments:

Name Type Description Default
model struct

a model structure.

required

Name-value arguments:

Name Type Description Default
mets cell or logical or double

either a cell array of metabolite IDs, a logical vector with the same number of elements as metabolites in the model, or a vector of indexes to check for (default model.mets).

Output arguments:

Name Type Description
produced logical

vector with true if the corresponding metabolite could be produced.

Examples:

produced = canProduce(model, mets);
See also

checkProduction

checkProduction

Check which metabolites can be produced from a model.

Checks which metabolites that can be produced from a model using the specified constraints.

The function is intended to be used to identify which metabolites must be connected in order to have a fully connected network. It does so by first identifying which metabolites could have a net production in the network. Then it calculates which other metabolites must be able to have net production in order to have production of all metabolites in the network. So, if a network contains the equations A[external]->B, C->D, and D->E it will identify that production of C will connect the metabolites D and E.

Input arguments:

Name Type Description Default
model struct

a model structure.

required

Name-value arguments:

Name Type Description Default
checkNeededForProduction logical

for each of the metabolites that could not be produced, include an artificial production reaction and calculate which new metabolites that could be produced as an effect of this (default false).

excretionFromCompartments cell

cell array with compartment ids from which metabolites can be excreted (default model.comps).

printDetails logical

print details to the screen (default true).

Output arguments:

Name Type Description
notProduced double

indices of metabolites that could not be produced.

notProducedNames cell

cell array with names and compartments for metabolites that could not be produced.

neededForProductionMat logical

matrix where n x m is true if metabolite n allows for production of metabolite m.

minToConnect cell

the minimal number of metabolites that need to be connected in order to be able to produce all other metabolites, and which metabolites each of them connects.

model struct

updated model structure with excretion reactions added.

Examples:

[notProduced, notProducedNames, neededForProductionMat, minToConnect, model] = ...
    checkProduction(model, checkNeededForProduction, excretionFromCompartments, printDetails);

checkRxn

Check which reactants can be synthesized and products consumed.

Checks which reactants in a reaction that can be synthesized and which products that can be consumed. This is primarily for debugging reactions which cannot have flux.

Input arguments:

Name Type Description Default
model struct

a model structure.

required
rxn char

the id of one reaction to check.

required

Name-value arguments:

Name Type Description Default
cutoff double

minimal flux for successful production/consumption (default 10^-7).

revDir logical

true if the reaction should be reversed (default false).

printReport logical

print a report (default true).

Output arguments:

Name Type Description
report struct

report with fields:

  • reactants : array with reactant indexes
  • canMake : boolean array, true if the corresponding reactant can be synthesized by the rest of the metabolic network
  • products : array with product indexes
  • canConsume : boolean array, true if the corresponding product can be consumed by the rest of the metabolic network

Examples:

report = checkRxn(model, rxn, cutoff, revDir, printReport);

consumeSomething

Consume any metabolite using as few reactions as possible.

Wrapper for findLeakMetabolite(model,'consume',...), retained for backward compatibility. See findLeakMetabolite for full documentation.

See Also

findLeakMetabolite

fillGaps

Use template model(s) to fill gaps in a model.

This method works by merging the model with the reference model(s) and checking which reactions can carry flux. All reactions that cannot carry flux are removed (cannotConnect). If useModelConstraints is false it then solves the MILP problem of minimizing the number of active reactions from the reference models that are required to have flux in all the reactions in model. This requires that the input model has exchange reactions present for the nutrients that are needed for its metabolism. If useModelConstraints is true then the problem is to include as few reactions as possible from the reference models in order to satisfy the model constraints.

The intended use is that the user can attempt a general gap-filling using useModelConstraints=false, or a more targeted gap-filling by setting constraints in the model structure and then using useModelConstraints=true. For example, to include reactions so that all biomass components can be synthesized, the user could define a biomass equation and set its lower bound to >0. Running this function with useModelConstraints=true would then give the smallest set of reactions that have to be included in order for the model to produce biomass.

Input arguments:

Name Type Description Default
model struct

a model structure that may contain gaps to be filled.

required
models cell or struct

a cell array of reference models or a model structure. The gaps will be filled using reactions from these models.

required

Name-value arguments:

Name Type Description Default
allowNetProduction logical

true if net production of all metabolites is allowed. A reaction can be unable to carry flux because one of the reactants is unavailable or because one of the products cannot be further processed. If true, only the first type of unconnectivity is considered (default false).

useModelConstraints logical

true if the constraints specified in the model structure should be used. If false then reactions are included from the template model(s) so that as many reactions as possible in model can carry flux (default false).

supressWarnings logical

false if warnings should be displayed (default false).

rxnScores double or cell

array with scores for each of the reactions in the reference model(s). If more than one model is supplied, then rxnScores should be a cell array of vectors. The solver will try to maximize the sum of the scores for the included reactions (default is -1 for all reactions).

algorithm char

gap-filling algorithm to use (default 'reference'):

  • 'reference' : existing MILP-based connectivity maximisation
  • 'fastLP' : L1-norm LP relaxation via FASTCORE (gapFillFastLP)
  • 'swiftLP' : SWIFTCORE single-LP variant (gapFillFastLP)
  • 'gapfillMILP' : growth-floor MILP with directionality repair (gapFillMILP). Uses only models{1} as the database.
  • 'topological' : BFS metabolite producibility pre-screening (gapFillTopological); no model modification.
epsilon double

minimum flux threshold used by fastLP and swiftLP (default 1e-4).

minGrowth double

minimum growth rate for gapfillMILP (default [] = 10% of max FBA).

verbose logical

print progress messages for new algorithms (default true).

Output arguments:

Name Type Description
newConnected cell

cell array with the reactions that could be connected. This is not calculated if useModelConstraints is true.

cannotConnect cell

cell array with reactions that could not be connected. This is not calculated if useModelConstraints is true.

addedRxns cell

cell array with the reactions that were added from "models".

newModel struct

the model with reactions added to fill gaps.

exitFlag double

exit status:

  • 1 : optimal solution found
  • -1 : no feasible solution found
  • -2 : optimization time out

Examples:

[newConnected, cannotConnect, addedRxns, newModel, exitFlag]=...
    fillGaps(model,models,allowNetProduction,useModelConstraints,...
    supressWarnings,rxnScores,params);

findLeakMetabolite

Find a metabolite that can be freely made or consumed.

Tests whether the model can produce (excrete) or consume (take up) any metabolite without balanced stoichiometric constraints — a sign that the model contains a stoichiometric leak, futile cycle, or unconstrained exchange. This is the unified replacement for makeSomething ('produce') and consumeSomething ('consume').

Input arguments:

Name Type Description Default
model struct

a model structure.

required
direction char

'produce' to look for freely excreted metabolites, or 'consume' to look for freely consumed metabolites.

required

Name-value arguments:

Name Type Description Default
ignoreMets cell or logical or double

either a cell array of metabolite IDs, a logical vector with the same number of elements as metabolites in the model, or a vector of indexes for metabolites to exclude from this analysis (default []).

isNames logical

true if the supplied mets represent metabolite names (as opposed to IDs). This is a way to delete metabolites in several compartments at once without knowing the exact IDs. This only works if ignoreMets is a cell array (default false).

minNrFluxes logical

solves the MILP problem of minimizing the number of fluxes instead of the sum. Slower, but can be used if the sum gives too many fluxes (default false).

allowExcretion logical

allow for excretion of all metabolites. Only used when direction is 'produce' (default true).

params struct

obsolete option.

ignoreIntBounds logical

true if internal bounds (including reversibility) should be ignored. Exchange reactions are not affected. This can be used to find unbalanced solutions which are not possible using the default constraints (default false).

Output arguments:

Name Type Description
solution double

flux vector for the solution.

metabolite double

the index of the metabolite(s) which was produced or consumed. If possible only one metabolite is reported, but there are situations where metabolites can only be exchanged in pairs (or more).

Examples:

[solution, metabolite] = findLeakMetabolite(model, 'produce');
[solution, metabolite] = findLeakMetabolite(model, 'consume', ignoreMets);
Notes

Works by forcing at least 1 unit of "any metabolite" to be produced or consumed, then minimising for the sum of fluxes. If more than one metabolite is produced/consumed, it tries to find a single metabolite that satisfies the constraint on its own.

See Also

makeSomething, consumeSomething

fitTasks

Fill gaps in a model so it can perform a list of tasks.

Fills gaps in a model by including reactions from a reference model, so that the resulting model can perform all the tasks in a task list. The gap-filling is done in a task-by-task manner, rather than solving for all tasks at once. This means that the order of the tasks could influence the result.

Input arguments:

Name Type Description Default
model struct

a model structure.

required
refModel struct

reference model from which to include reactions.

required
inputFile char

a task list in Excel format. See the function parseTaskList for details (optional if taskStructure is supplied).

required

Name-value arguments:

Name Type Description Default
printOutput logical

true if the results of the test should be displayed (default true).

rxnScores double

scores for each of the reactions in the reference model. Only negative scores are allowed. The solver will try to maximize the sum of the scores for the included reactions (default is -1 for all reactions).

taskStructure struct

structure with the tasks, as from parseTaskList. If supplied then inputFile is ignored.

Output arguments:

Name Type Description
outModel struct

model structure with reactions added to perform the tasks.

addedRxns logical

MxN matrix with the added reactions (M) from refModel for each task (N). An element is true if the corresponding reaction is added in the corresponding task. Failed tasks and SHOULD FAIL tasks are ignored.

Examples:

[outModel, addedRxns]=fitTasks(model,refModel,inputFile,printOutput,...
    rxnScores,taskStructure);

gapFillFastCore

L1-norm LP subroutine for fastGapFill.

Finds the minimum-L1-flux extension of the model that activates all reactions in the core set. Any reaction with non-zero flux in the solution is "consistent" with the core. Used internally by gapFillFastLP.

Based on FASTCORE (Vlassis et al. 2014, PLoS Comput Biol 10:e1003424).

Input arguments:

Name Type Description Default
model struct

a RAVEN model structure.

required
coreRxns cell or logical or double

core reaction set: cell array of reaction IDs, a logical vector, or a vector of reaction indices into model.rxns. All core reactions are forced to carry flux >= epsilon in the LP solution.

required
epsilon double

minimum flux threshold for a reaction to be considered active.

required

Output arguments:

Name Type Description
activeRxns logical

logical vector (length numel(model.rxns)) where true indicates the reaction carries flux in the L1-minimal solution. Empty if the LP is infeasible (core reactions cannot all carry flux simultaneously).

See also

gapFillSwiftCore, gapFillFastLP

gapFillFastLP

LP-based gap-filling (fastGapFill / swiftGapFill).

Merges the draft model with a universal reaction database, then for each blocked draft reaction calls an LP subroutine (FASTCORE or SWIFTCORE) to find the minimum set of database reactions that makes the blocked reaction carry flux. The union of all activated database reactions across all blocked reactions is returned as addedRxns.

Based on fastGapFill (Thiele et al. 2014, PLoS Comput Biol 10:e1003515) and swiftGapFill (Tefagh & Boyd 2020, BMC Bioinformatics 21:23).

Input arguments:

Name Type Description Default
model struct

draft RAVEN model to be gap-filled.

required
universalModel struct

universal reaction database (RAVEN model). The caller is responsible for building this, e.g. via getKEGGModelForOrganism with broad scope.

required
Optional name-value parameters

'epsilon' (default 1e-4) minimum flux threshold for a reaction to be considered active. 'variant' (default 'fast') 'fast' — use FASTCORE L1-norm LP (gapFillFastCore) 'swift' — use SWIFTCORE single-LP (gapFillSwiftCore); faster but stochastic (results may vary between runs/solvers). 'verbose' (default true) print progress messages.

Output arguments:

Name Type Description
addedRxns cell

reaction IDs from universalModel that should be added to rescue blocked reactions.

newModel struct

copy of model with addedRxns incorporated.

cannotConnect cell

blocked draft reaction IDs that remain blocked even after augmenting with the full universalModel (cannot be rescued by the database).

See also

gapFillFastCore, gapFillSwiftCore, gapFillMILP, fillGaps

gapFillMILP

Objective-based gap-filling via a global growth-floor MILP.

Finds the minimum-cost set of modifications to make the model's objective (biomass) >= minGrowth. Supports two repair mechanisms from Kumar et al. 2007 (Bioinformatics 23:1626-1635): 1. Directionality reversal of existing draft reactions (binary y_rev) 2. Addition of reactions from a universal database (binary y_db)

The universal model should contain all candidate reactions, including exchange and transport reactions if those repair classes are desired.

This is a SINGLE-level MILP (not bilevel). It is computationally more expensive than gapFillFastLP but supports the directionality-reversal repair mechanism and directly optimises for objective feasibility.

Input arguments:

Name Type Description Default
model struct

draft RAVEN model to be gap-filled.

required
universalModel struct

universal reaction database (RAVEN model).

required
Optional name-value parameters

'minGrowth' (default []) minimum objective value to achieve. If empty, runs FBA on the merged model and uses 10% of the maximum as the floor. 'weights' (default [1 2 3 4]) cost weights [w_rev w_db w_exch w_trans]. w_rev is the cost per reversed draft reaction; w_db is the cost per added universal reaction. w_exch and w_trans are reserved for future use when exchange and transport reactions are tracked separately. 'bigM' (default 1000) Big-M constant for coupling constraints. Must be larger than any expected flux magnitude. 'params' (default []) solver parameter struct passed to optimizeProb. Fields such as params.TimeLimit and params.intTol can be used to tune the MILP. 'verbose' (default true) print progress messages.

Output arguments:

Name Type Description
addedRxns cell

reaction IDs from universalModel that are selected by the MILP.

reversedRxns cell

draft reaction IDs whose directionality is reversed by the MILP.

newModel struct

model with addedRxns incorporated and reversedRxns bounds updated (lb set to -bigM for reversed reactions).

exitFlag double

1 = optimal, -1 = infeasible, -2 = timeout or other failure.

See also

gapFillFastLP, gapFillTopological, fillGaps

gapFillSwiftCore

SWIFTCORE LP subroutine for swiftGapFill.

Single-LP alternative to gapFillFastCore. Maximises the sum of non-core fluxes while enforcing that every core reaction carries at least epsilon flux. Any reaction with non-zero flux in the solution is "consistent" with the core.

Because the maximisation objective makes the LP degenerate (many optima exist that differ only in which non-core reactions carry flux), the returned active set can vary between LP solvers and between runs with slightly different numerical conditions. This is the expected behaviour of SWIFTCORE (Tefagh & Boyd 2020).

Based on SWIFTCORE (Tefagh & Boyd 2020, BMC Bioinformatics 21:23).

Input arguments:

Name Type Description Default
model struct

a RAVEN model structure.

required
coreRxns cell or logical or double

core reaction set: cell array of reaction IDs, a logical vector, or a vector of reaction indices into model.rxns. All core reactions are forced to carry flux >= epsilon in the LP solution.

required
epsilon double

minimum flux threshold for a reaction to be considered active.

required

Output arguments:

Name Type Description
activeRxns logical

logical vector (length numel(model.rxns)) where true indicates the reaction carries flux in the solution. Empty on infeasibility.

See also

gapFillFastCore, gapFillFastLP

gapFillTopological

BFS metabolite-producibility pre-screening.

Identifies which metabolites in the draft model cannot be produced from a given set of seed metabolites (medium), using a graph-based fixed-point "scope" computation (no solver required). For each unreachable metabolite, finds candidate reactions in the universal database that produce it directly.

This is a computationally cheap pre-screening step that prunes the universal database before a more expensive gap-filling solve. It is inspired by the topological analysis used in Meneco (Prigent et al. 2017, PLoS Comput Biol).

Algorithm

Starting from seed metabolites (available in the medium), iteratively mark reactions as "fireable" when all their substrates are reachable, then mark their products as newly reachable. Continue until no new metabolite is reached (fixed point). Reversible reactions (lb < 0) are also considered in their reverse direction.

Input arguments:

Name Type Description Default
model struct

draft RAVEN model.

required
universalModel struct

universal reaction database (RAVEN model).

required
Optional name-value parameters

'seeds' (default []) Metabolite IDs (cell array) available from the medium. Default: metabolites involved in uptake exchange reactions (lb < 0). 'targets' (default []) Metabolite IDs (cell array) that should be produced. Default: substrates of the objective (biomass) reaction. 'verbose' (default true) print a summary of the results.

Output arguments:

Name Type Description
result struct with fields:

.reachableMets — logical vector (length nMets) marking producible mets .blockedMets — cell array of metabolite IDs that cannot be reached .candidateRxns — n-by-1 cell array of universal rxn ID lists, one cell per blocked metabolite (same order as blockedMets) .pruningFraction — fraction of universal reactions pruned (not needed for any blocked metabolite)

See also

gapFillFastLP, gapFillMILP, fillGaps

gapReport

Perform a gap analysis and summarize the results.

Input arguments:

Name Type Description Default
model struct

a model structure.

required

Name-value arguments:

Name Type Description Default
templateModels cell

a cell array of template models to use for gap filling.

Output arguments:

Name Type Description
noFluxRxns cell

reactions that cannot carry flux.

noFluxRxnsRelaxed cell

reactions that cannot carry flux even if the mass balance constraint is relaxed so that net production of all metabolites is allowed.

subGraphs struct

the metabolites in each of the isolated sub-networks.

notProducedMets cell

the metabolites that could not have net production.

minToConnect struct

the minimal number of metabolites that need to be connected in order to be able to produce all other metabolites, and which metabolites each of them connects.

neededForProductionMat double

matrix where n x m is true if metabolite n allows for production of metabolite m.

canProduceWithoutInput cell

metabolites that could be produced even when there is no input to the model.

canConsumeWithoutOutput cell

metabolites that could be consumed even when there is no output from the model.

connectedFromTemplates cell

the reactions that could be connected using the template models.

addedFromTemplates struct

the reactions that were added from the template models and which model they were added from.

Examples:

[noFluxRxns, noFluxRxnsRelaxed, subGraphs, notProducedMets, ...
    minToConnect, neededForProductionMat, connectedFromTemplates, ...
    addedFromTemplates] = gapReport(model, templateModels);

makeSomething

Excrete any metabolite using as few reactions as possible.

Wrapper for findLeakMetabolite(model,'produce',...), retained for backward compatibility. See findLeakMetabolite for full documentation.

See Also

findLeakMetabolite