Gap-filling¶

MATLAB functions in RAVEN/gapfilling of the RAVEN toolbox. Help text is collected from the source of the tracked branch.

Functions¶

Function	Summary
`canConsume`	Check which metabolites can be consumed by a model.
`canProduce`	Check which metabolites can be produced from a model.
`checkProduction`	Check which metabolites can be produced from a model.
`checkRxn`	Check which reactants can be synthesized and products consumed.
`consumeSomething`	Consume any metabolite using as few reactions as possible.
`fillGaps`	Use template model(s) to fill gaps in a model.
`findLeakMetabolite`	Find a metabolite that can be freely made or consumed.
`fitTasks`	Fill gaps in a model so it can perform a list of tasks.
`gapFillFastCore`	L1-norm LP subroutine for fastGapFill.
`gapFillFastLP`	LP-based gap-filling (fastGapFill / swiftGapFill).
`gapFillMILP`	Objective-based gap-filling via a global growth-floor MILP.
`gapFillSwiftCore`	SWIFTCORE LP subroutine for swiftGapFill.
`gapFillTopological`	BFS metabolite-producibility pre-screening.
`gapReport`
`makeSomething`	Excrete any metabolite using as few reactions as possible.

Reference¶

canConsume¶

Check which metabolites can be consumed by a model.

Checks which metabolites can be consumed by a model using the specified constraints.

Input arguments:

Name	Type	Description	Default
`model`	`struct`	a model structure.	required

Name-value arguments:

Name	Type	Description	Default
`mets`	`cell or logical or double`	either a cell array of metabolite IDs, a logical vector with the same number of elements as metabolites in the model, or a vector of indexes to check for (default model.mets).

Output arguments:

Name	Type	Description
`consumed`	`logical`	vector with true if the corresponding metabolite could be produced.

Examples:

consumed = canConsume(model, mets);

canProduce¶

Check which metabolites can be produced from a model.

Checks which metabolites can be produced from a model using the specified constraints. This is a less advanced but faster version of checkProduction.

Input arguments:

Name	Type	Description	Default
`model`	`struct`	a model structure.	required

Name-value arguments:

Name	Type	Description	Default
`mets`	`cell or logical or double`	either a cell array of metabolite IDs, a logical vector with the same number of elements as metabolites in the model, or a vector of indexes to check for (default model.mets).

Output arguments:

Name	Type	Description
`produced`	`logical`	vector with true if the corresponding metabolite could be produced.

Examples:

produced = canProduce(model, mets);

See also

checkProduction

checkProduction¶

Check which metabolites can be produced from a model.

Checks which metabolites that can be produced from a model using the specified constraints.

The function is intended to be used to identify which metabolites must be connected in order to have a fully connected network. It does so by first identifying which metabolites could have a net production in the network. Then it calculates which other metabolites must be able to have net production in order to have production of all metabolites in the network. So, if a network contains the equations A[external]->B, C->D, and D->E it will identify that production of C will connect the metabolites D and E.

Input arguments:

Name	Type	Description	Default
`model`	`struct`	a model structure.	required

Name-value arguments:

Name	Type	Description
`checkNeededForProduction`	`logical`	for each of the metabolites that could not be produced, include an artificial production reaction and calculate which new metabolites that could be produced as an effect of this (default false).
`excretionFromCompartments`	`cell`	cell array with compartment ids from which metabolites can be excreted (default model.comps).
`printDetails`	`logical`	print details to the screen (default true).

Output arguments:

Name	Type	Description
`notProduced`	`double`	indices of metabolites that could not be produced.
`notProducedNames`	`cell`	cell array with names and compartments for metabolites that could not be produced.
`neededForProductionMat`	`logical`	matrix where n x m is true if metabolite n allows for production of metabolite m.
`minToConnect`	`cell`	the minimal number of metabolites that need to be connected in order to be able to produce all other metabolites, and which metabolites each of them connects.
`model`	`struct`	updated model structure with excretion reactions added.

Examples:

[notProduced, notProducedNames, neededForProductionMat, minToConnect, model] = ...
    checkProduction(model, checkNeededForProduction, excretionFromCompartments, printDetails);

checkRxn¶

Check which reactants can be synthesized and products consumed.

Checks which reactants in a reaction that can be synthesized and which products that can be consumed. This is primarily for debugging reactions which cannot have flux.

Input arguments:

Name	Type	Description	Default
`model`	`struct`	a model structure.	required
`rxn`	`char`	the id of one reaction to check.	required

Name-value arguments:

Name	Type	Description
`cutoff`	`double`	minimal flux for successful production/consumption (default 10^-7).
`revDir`	`logical`	true if the reaction should be reversed (default false).
`printReport`	`logical`	print a report (default true).

Output arguments:

Name	Type	Description
`report`	`struct`	report with fields: reactants : array with reactant indexes canMake : boolean array, true if the corresponding reactant can be synthesized by the rest of the metabolic network products : array with product indexes canConsume : boolean array, true if the corresponding product can be consumed by the rest of the metabolic network

Examples:

report = checkRxn(model, rxn, cutoff, revDir, printReport);

consumeSomething¶

Consume any metabolite using as few reactions as possible.

Wrapper for findLeakMetabolite(model,'consume',...), retained for backward compatibility. See findLeakMetabolite for full documentation.

fillGaps¶

Use template model(s) to fill gaps in a model.

This method works by merging the model with the reference model(s) and checking which reactions can carry flux. All reactions that cannot carry flux are removed (cannotConnect). If useModelConstraints is false it then solves the MILP problem of minimizing the number of active reactions from the reference models that are required to have flux in all the reactions in model. This requires that the input model has exchange reactions present for the nutrients that are needed for its metabolism. If useModelConstraints is true then the problem is to include as few reactions as possible from the reference models in order to satisfy the model constraints.

The intended use is that the user can attempt a general gap-filling using useModelConstraints=false, or a more targeted gap-filling by setting constraints in the model structure and then using useModelConstraints=true. For example, to include reactions so that all biomass components can be synthesized, the user could define a biomass equation and set its lower bound to >0. Running this function with useModelConstraints=true would then give the smallest set of reactions that have to be included in order for the model to produce biomass.

Input arguments:

Name	Type	Description	Default
`model`	`struct`	a model structure that may contain gaps to be filled.	required
`models`	`cell or struct`	a cell array of reference models or a model structure. The gaps will be filled using reactions from these models.	required

Name-value arguments:

Name	Type	Description
`allowNetProduction`	`logical`	true if net production of all metabolites is allowed. A reaction can be unable to carry flux because one of the reactants is unavailable or because one of the products cannot be further processed. If true, only the first type of unconnectivity is considered (default false).
`useModelConstraints`	`logical`	true if the constraints specified in the model structure should be used. If false then reactions are included from the template model(s) so that as many reactions as possible in model can carry flux (default false).
`supressWarnings`	`logical`	false if warnings should be displayed (default false).
`rxnScores`	`double or cell`	array with scores for each of the reactions in the reference model(s). If more than one model is supplied, then rxnScores should be a cell array of vectors. The solver will try to maximize the sum of the scores for the included reactions (default is -1 for all reactions).
`algorithm`	`char`	gap-filling algorithm to use (default 'reference'): 'reference' : existing MILP-based connectivity maximisation 'fastLP' : L1-norm LP relaxation via FASTCORE (gapFillFastLP) 'swiftLP' : SWIFTCORE single-LP variant (gapFillFastLP) 'gapfillMILP' : growth-floor MILP with directionality repair (gapFillMILP). Uses only models{1} as the database. 'topological' : BFS metabolite producibility pre-screening (gapFillTopological); no model modification.
`epsilon`	`double`	minimum flux threshold used by fastLP and swiftLP (default 1e-4).
`minGrowth`	`double`	minimum growth rate for gapfillMILP (default [] = 10% of max FBA).
`verbose`	`logical`	print progress messages for new algorithms (default true).

Output arguments:

Name	Type	Description
`newConnected`	`cell`	cell array with the reactions that could be connected. This is not calculated if useModelConstraints is true.
`cannotConnect`	`cell`	cell array with reactions that could not be connected. This is not calculated if useModelConstraints is true.
`addedRxns`	`cell`	cell array with the reactions that were added from "models".
`newModel`	`struct`	the model with reactions added to fill gaps.
`exitFlag`	`double`	exit status: 1 : optimal solution found -1 : no feasible solution found -2 : optimization time out

Examples:

[newConnected, cannotConnect, addedRxns, newModel, exitFlag]=...
    fillGaps(model,models,allowNetProduction,useModelConstraints,...
    supressWarnings,rxnScores,params);

findLeakMetabolite¶

Find a metabolite that can be freely made or consumed.

Tests whether the model can produce (excrete) or consume (take up) any metabolite without balanced stoichiometric constraints — a sign that the model contains a stoichiometric leak, futile cycle, or unconstrained exchange. This is the unified replacement for makeSomething ('produce') and consumeSomething ('consume').

Input arguments:

Name	Type	Description	Default
`model`	`struct`	a model structure.	required
`direction`	`char`	'produce' to look for freely excreted metabolites, or 'consume' to look for freely consumed metabolites.	required

Name-value arguments:

Name	Type	Description
`ignoreMets`	`cell or logical or double`	either a cell array of metabolite IDs, a logical vector with the same number of elements as metabolites in the model, or a vector of indexes for metabolites to exclude from this analysis (default []).
`isNames`	`logical`	true if the supplied mets represent metabolite names (as opposed to IDs). This is a way to delete metabolites in several compartments at once without knowing the exact IDs. This only works if ignoreMets is a cell array (default false).
`minNrFluxes`	`logical`	solves the MILP problem of minimizing the number of fluxes instead of the sum. Slower, but can be used if the sum gives too many fluxes (default false).
`allowExcretion`	`logical`	allow for excretion of all metabolites. Only used when direction is 'produce' (default true).
`params`	`struct`	obsolete option.
`ignoreIntBounds`	`logical`	true if internal bounds (including reversibility) should be ignored. Exchange reactions are not affected. This can be used to find unbalanced solutions which are not possible using the default constraints (default false).

Output arguments:

Name	Type	Description
`solution`	`double`	flux vector for the solution.
`metabolite`	`double`	the index of the metabolite(s) which was produced or consumed. If possible only one metabolite is reported, but there are situations where metabolites can only be exchanged in pairs (or more).

Examples:

[solution, metabolite] = findLeakMetabolite(model, 'produce');
[solution, metabolite] = findLeakMetabolite(model, 'consume', ignoreMets);

Notes

Works by forcing at least 1 unit of "any metabolite" to be produced or consumed, then minimising for the sum of fluxes. If more than one metabolite is produced/consumed, it tries to find a single metabolite that satisfies the constraint on its own.

fitTasks¶

Fill gaps in a model so it can perform a list of tasks.

Fills gaps in a model by including reactions from a reference model, so that the resulting model can perform all the tasks in a task list. The gap-filling is done in a task-by-task manner, rather than solving for all tasks at once. This means that the order of the tasks could influence the result.

Input arguments:

Name	Type	Description	Default
`model`	`struct`	a model structure.	required
`refModel`	`struct`	reference model from which to include reactions.	required
`inputFile`	`char`	a task list in Excel format. See the function parseTaskList for details (optional if taskStructure is supplied).	required

Name-value arguments:

Name	Type	Description
`printOutput`	`logical`	true if the results of the test should be displayed (default true).
`rxnScores`	`double`	scores for each of the reactions in the reference model. Only negative scores are allowed. The solver will try to maximize the sum of the scores for the included reactions (default is -1 for all reactions).
`taskStructure`	`struct`	structure with the tasks, as from parseTaskList. If supplied then inputFile is ignored.

Output arguments:

Name	Type	Description
`outModel`	`struct`	model structure with reactions added to perform the tasks.
`addedRxns`	`logical`	MxN matrix with the added reactions (M) from refModel for each task (N). An element is true if the corresponding reaction is added in the corresponding task. Failed tasks and SHOULD FAIL tasks are ignored.

Examples:

[outModel, addedRxns]=fitTasks(model,refModel,inputFile,printOutput,...
    rxnScores,taskStructure);

gapFillFastCore¶

L1-norm LP subroutine for fastGapFill.

Finds the minimum-L1-flux extension of the model that activates all reactions in the core set. Any reaction with non-zero flux in the solution is "consistent" with the core. Used internally by gapFillFastLP.

Based on FASTCORE (Vlassis et al. 2014, PLoS Comput Biol 10:e1003424).

Input arguments:

Name	Type	Description	Default
`model`	`struct`	a RAVEN model structure.	required
`coreRxns`	`cell or logical or double`	core reaction set: cell array of reaction IDs, a logical vector, or a vector of reaction indices into model.rxns. All core reactions are forced to carry flux >= epsilon in the LP solution.	required
`epsilon`	`double`	minimum flux threshold for a reaction to be considered active.	required

Output arguments:

Name	Type	Description
`activeRxns`	`logical`	logical vector (length numel(model.rxns)) where true indicates the reaction carries flux in the L1-minimal solution. Empty if the LP is infeasible (core reactions cannot all carry flux simultaneously).

See also

gapFillSwiftCore, gapFillFastLP

gapFillFastLP¶

LP-based gap-filling (fastGapFill / swiftGapFill).

Merges the draft model with a universal reaction database, then for each blocked draft reaction calls an LP subroutine (FASTCORE or SWIFTCORE) to find the minimum set of database reactions that makes the blocked reaction carry flux. The union of all activated database reactions across all blocked reactions is returned as addedRxns.

Based on fastGapFill (Thiele et al. 2014, PLoS Comput Biol 10:e1003515) and swiftGapFill (Tefagh & Boyd 2020, BMC Bioinformatics 21:23).

Input arguments:

Name	Type	Description	Default
`model`	`struct`	draft RAVEN model to be gap-filled.	required
`universalModel`	`struct`	universal reaction database (RAVEN model). The caller is responsible for building this, e.g. via getKEGGModelForOrganism with broad scope.	required

Optional name-value parameters

'epsilon' (default 1e-4) minimum flux threshold for a reaction to be considered active. 'variant' (default 'fast') 'fast' — use FASTCORE L1-norm LP (gapFillFastCore) 'swift' — use SWIFTCORE single-LP (gapFillSwiftCore); faster but stochastic (results may vary between runs/solvers). 'verbose' (default true) print progress messages.

Output arguments:

Name	Type	Description
`addedRxns`	`cell`	reaction IDs from universalModel that should be added to rescue blocked reactions.
`newModel`	`struct`	copy of model with addedRxns incorporated.
`cannotConnect`	`cell`	blocked draft reaction IDs that remain blocked even after augmenting with the full universalModel (cannot be rescued by the database).

See also

gapFillFastCore, gapFillSwiftCore, gapFillMILP, fillGaps

gapFillMILP¶

Objective-based gap-filling via a global growth-floor MILP.

Finds the minimum-cost set of modifications to make the model's objective (biomass) >= minGrowth. Supports two repair mechanisms from Kumar et al. 2007 (Bioinformatics 23:1626-1635): 1. Directionality reversal of existing draft reactions (binary y_rev) 2. Addition of reactions from a universal database (binary y_db)

The universal model should contain all candidate reactions, including exchange and transport reactions if those repair classes are desired.

This is a SINGLE-level MILP (not bilevel). It is computationally more expensive than gapFillFastLP but supports the directionality-reversal repair mechanism and directly optimises for objective feasibility.

Input arguments:

Name	Type	Description	Default
`model`	`struct`	draft RAVEN model to be gap-filled.	required
`universalModel`	`struct`	universal reaction database (RAVEN model).	required

Optional name-value parameters

'minGrowth' (default []) minimum objective value to achieve. If empty, runs FBA on the merged model and uses 10% of the maximum as the floor. 'weights' (default [1 2 3 4]) cost weights [w_rev w_db w_exch w_trans]. w_rev is the cost per reversed draft reaction; w_db is the cost per added universal reaction. w_exch and w_trans are reserved for future use when exchange and transport reactions are tracked separately. 'bigM' (default 1000) Big-M constant for coupling constraints. Must be larger than any expected flux magnitude. 'params' (default []) solver parameter struct passed to optimizeProb. Fields such as params.TimeLimit and params.intTol can be used to tune the MILP. 'verbose' (default true) print progress messages.

Output arguments:

Name	Type	Description
`addedRxns`	`cell`	reaction IDs from universalModel that are selected by the MILP.
`reversedRxns`	`cell`	draft reaction IDs whose directionality is reversed by the MILP.
`newModel`	`struct`	model with addedRxns incorporated and reversedRxns bounds updated (lb set to -bigM for reversed reactions).
`exitFlag`	`double`	1 = optimal, -1 = infeasible, -2 = timeout or other failure.

See also

gapFillFastLP, gapFillTopological, fillGaps

gapFillSwiftCore¶

SWIFTCORE LP subroutine for swiftGapFill.

Single-LP alternative to gapFillFastCore. Maximises the sum of non-core fluxes while enforcing that every core reaction carries at least epsilon flux. Any reaction with non-zero flux in the solution is "consistent" with the core.

Because the maximisation objective makes the LP degenerate (many optima exist that differ only in which non-core reactions carry flux), the returned active set can vary between LP solvers and between runs with slightly different numerical conditions. This is the expected behaviour of SWIFTCORE (Tefagh & Boyd 2020).

Based on SWIFTCORE (Tefagh & Boyd 2020, BMC Bioinformatics 21:23).

Input arguments:

Name	Type	Description	Default
`model`	`struct`	a RAVEN model structure.	required
`coreRxns`	`cell or logical or double`	core reaction set: cell array of reaction IDs, a logical vector, or a vector of reaction indices into model.rxns. All core reactions are forced to carry flux >= epsilon in the LP solution.	required
`epsilon`	`double`	minimum flux threshold for a reaction to be considered active.	required

Output arguments:

Name	Type	Description
`activeRxns`	`logical`	logical vector (length numel(model.rxns)) where true indicates the reaction carries flux in the solution. Empty on infeasibility.

See also

gapFillFastCore, gapFillFastLP

gapFillTopological¶

BFS metabolite-producibility pre-screening.

Identifies which metabolites in the draft model cannot be produced from a given set of seed metabolites (medium), using a graph-based fixed-point "scope" computation (no solver required). For each unreachable metabolite, finds candidate reactions in the universal database that produce it directly.

This is a computationally cheap pre-screening step that prunes the universal database before a more expensive gap-filling solve. It is inspired by the topological analysis used in Meneco (Prigent et al. 2017, PLoS Comput Biol).

Algorithm

Starting from seed metabolites (available in the medium), iteratively mark reactions as "fireable" when all their substrates are reachable, then mark their products as newly reachable. Continue until no new metabolite is reached (fixed point). Reversible reactions (lb < 0) are also considered in their reverse direction.

Input arguments:

Name	Type	Description	Default
`model`	`struct`	draft RAVEN model.	required
`universalModel`	`struct`	universal reaction database (RAVEN model).	required

Optional name-value parameters

'seeds' (default []) Metabolite IDs (cell array) available from the medium. Default: metabolites involved in uptake exchange reactions (lb < 0). 'targets' (default []) Metabolite IDs (cell array) that should be produced. Default: substrates of the objective (biomass) reaction. 'verbose' (default true) print a summary of the results.

Output arguments:

Name	Type	Description
`result`	`struct with fields:`	.reachableMets — logical vector (length nMets) marking producible mets .blockedMets — cell array of metabolite IDs that cannot be reached .candidateRxns — n-by-1 cell array of universal rxn ID lists, one cell per blocked metabolite (same order as blockedMets) .pruningFraction — fraction of universal reactions pruned (not needed for any blocked metabolite)

See also

gapFillFastLP, gapFillMILP, fillGaps

gapReport¶

Perform a gap analysis and summarize the results.

Input arguments:

Name	Type	Description	Default
`model`	`struct`	a model structure.	required

Name-value arguments:

Name	Type	Description	Default
`templateModels`	`cell`	a cell array of template models to use for gap filling.

Output arguments:

Name	Type	Description
`noFluxRxns`	`cell`	reactions that cannot carry flux.
`noFluxRxnsRelaxed`	`cell`	reactions that cannot carry flux even if the mass balance constraint is relaxed so that net production of all metabolites is allowed.
`subGraphs`	`struct`	the metabolites in each of the isolated sub-networks.
`notProducedMets`	`cell`	the metabolites that could not have net production.
`minToConnect`	`struct`	the minimal number of metabolites that need to be connected in order to be able to produce all other metabolites, and which metabolites each of them connects.
`neededForProductionMat`	`double`	matrix where n x m is true if metabolite n allows for production of metabolite m.
`canProduceWithoutInput`	`cell`	metabolites that could be produced even when there is no input to the model.
`canConsumeWithoutOutput`	`cell`	metabolites that could be consumed even when there is no output from the model.
`connectedFromTemplates`	`cell`	the reactions that could be connected using the template models.
`addedFromTemplates`	`struct`	the reactions that were added from the template models and which model they were added from.

Examples:

[noFluxRxns, noFluxRxnsRelaxed, subGraphs, notProducedMets, ...
    minToConnect, neededForProductionMat, connectedFromTemplates, ...
    addedFromTemplates] = gapReport(model, templateModels);

makeSomething¶

Excrete any metabolite using as few reactions as possible.

Wrapper for findLeakMetabolite(model,'produce',...), retained for backward compatibility. See findLeakMetabolite for full documentation.