Localization (Python)

raven-toolbox objects in raven_toolbox.localization, collected from the source of the tracked branch.

Functions

Function Summary
apply_localization Apply a :class:LocalizationProposal to model: move reactions, add the inter-compartment transports the proposal listed, and return (model_copy, added).
load_deeploc Parse DeepLoc 2 CSV output into a normalised :class:LocalizationScores.
load_wolfpsort Parse WoLF PSORT summary output (runWolfPsortSummary) into a normalised :class:LocalizationScores.
LocalizationProposal What :func:predict_localization proposes, before applying it.
LocalizationResult Outcome of :func:predict_localization (when apply=True).
LocalizationScores Per-gene compartment scores. df is indexed by gene_id with one column per compartment id.
predict_localization Place a caller-specified set of reactions in compartments via MILP.

Reference

apply_localization

Apply a :class:LocalizationProposal to model: move reactions, add the inter-compartment transports the proposal listed, and return (model_copy, added).

The returned model is a deep copy of the input (original left untouched). Moved reactions get their metabolites' compartment suffix swapped (e.g. A_c → A_m); new compartment-specific metabolite copies are added on demand. Each added transport is a passive diffusion M[default] ⇌ M[c] (RAVEN convention), named tr_<met>_<c>.
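The suffix swap and the transport naming described above can be sketched as follows. This is an illustration only, not the library's implementation: the helper names swap_compartment and transport_id are hypothetical, and the sketch assumes that the compartment is whatever follows the last underscore and that <met> in tr_<met>_<c> is the metabolite id without its compartment suffix.

```python
def swap_compartment(met_id: str, new_comp: str) -> str:
    """Replace the trailing compartment suffix, e.g. 'A_c' -> 'A_m'.

    Assumption: the compartment is the text after the last underscore.
    """
    base, sep, _old_comp = met_id.rpartition("_")
    if not sep or not base:
        raise ValueError(f"{met_id!r} has no compartment suffix")
    return f"{base}_{new_comp}"


def transport_id(met_base: str, comp: str) -> str:
    """Build a transport reaction id following the tr_<met>_<c> pattern.

    Assumption: <met> is the base metabolite id (suffix already stripped).
    """
    return f"tr_{met_base}_{comp}"


moved = swap_compartment("A_c", "m")      # metabolite of a reaction moved to 'm'
transport = transport_id("A", "m")        # transport linking A[default] and A[m]
```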

load_deeploc

Parse DeepLoc 2 CSV output into a normalised :class:LocalizationScores.

DeepLoc 2's per-protein CSV has columns Protein_ID, Localizations, Signals, <Compartment1>, <Compartment2>, ... where columns 4+ are per-class probabilities. The first three metadata columns are dropped; the rest become compartment columns.
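As a rough illustration of the column handling described above, the sketch below parses a CSV with that layout using only the standard library and returns nested dicts rather than a DataFrame. The function name parse_deeploc_csv, the sample rows, and the two compartment columns are made up for the example; load_deeploc itself may behave differently in its details.

```python
import csv
import io

# The three metadata columns named in the description above.
META_COLUMNS = ("Protein_ID", "Localizations", "Signals")


def parse_deeploc_csv(handle):
    """Return {protein_id: {compartment: probability}} from a DeepLoc-style CSV."""
    scores = {}
    for row in csv.DictReader(handle):
        scores[row["Protein_ID"]] = {
            column: float(value)
            for column, value in row.items()
            if column not in META_COLUMNS  # keep only per-class probabilities
        }
    return scores


# Illustrative input, not real DeepLoc output.
example = io.StringIO(
    "Protein_ID,Localizations,Signals,Cytoplasm,Mitochondrion\n"
    "geneA,Mitochondrion,,0.10,0.85\n"
    "geneB,Cytoplasm,,0.90,0.05\n"
)
deeploc_scores = parse_deeploc_csv(example)
```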

load_wolfpsort

Parse WoLF PSORT summary output (runWolfPsortSummary) into a normalised :class:LocalizationScores. Rows like PROT: treating N X's as ... are skipped.
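The sketch below shows one way such a summary could be parsed. It rests on several assumptions that are not confirmed by this page: that each data row has the form "<id> <loc> <score>, <loc> <score>, ...", that header lines start with "#", and that the skipped note rows can be recognised by a colon directly after the protein id. The function name parse_wolfpsort_summary and the sample lines are hypothetical.

```python
def parse_wolfpsort_summary(lines):
    """Return {protein_id: {location: score}} from summary-style lines."""
    scores = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumed header/comment line
        protein_id, _, rest = line.partition(" ")
        if protein_id.endswith(":"):
            continue  # assumed note row such as "PROT: treating N X's as ..."
        per_location = {}
        for item in rest.split(","):
            location, value = item.split()
            per_location[location] = float(value)
        scores[protein_id] = per_location
    return scores


# Illustrative input, not real predictor output.
sample = [
    "# k used for kNN is: 27",
    "geneB: treating 2 X's as ...",
    "geneA mito 20.5, cyto 6.5",
]
wolf_scores = parse_wolfpsort_summary(sample)
```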

LocalizationProposal

What :func:predict_localization proposes, before applying it.

All DataFrames have one row per item. Call :func:predict_localization with apply=False to obtain a proposal and preview the changes; pass it to :func:apply_localization to commit, or diff it against a curator's expectations.

LocalizationResult

Outcome of :func:predict_localization (when apply=True).

LocalizationScores

Per-gene compartment scores. df is indexed by gene_id with one column per compartment id; values are floats (higher = stronger evidence for that compartment).

Genes absent from df and NaN entries are treated as "no signal" by :func:raven_toolbox.localization.predict_localization (uniform prior contribution).

with_compartments

with_compartments(
    mapping: Mapping[str, str],
) -> LocalizationScores

Rename compartment columns via {old: new} (e.g. predictor labels → model compartments). Unmapped columns are kept; multiple sources can be merged with df.combine_first afterwards.
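To make the rename-then-merge workflow concrete without depending on pandas, the sketch below uses nested dicts in place of df and None in place of NaN. The helpers rename_compartments and combine_first are hypothetical stand-ins that mimic DataFrame.rename(columns=...) and DataFrame.combine_first; the scores and labels are invented for the example.

```python
def rename_compartments(scores, mapping):
    """Rename compartment keys via {old: new}; unmapped keys are kept."""
    return {
        gene: {mapping.get(comp, comp): value for comp, value in per_comp.items()}
        for gene, per_comp in scores.items()
    }


def combine_first(primary, secondary):
    """Fill entries missing (or None) in primary with values from secondary."""
    merged = {gene: dict(per_comp) for gene, per_comp in primary.items()}
    for gene, per_comp in secondary.items():
        target = merged.setdefault(gene, {})
        for comp, value in per_comp.items():
            if target.get(comp) is None:
                target[comp] = value
    return merged


# Predictor labels mapped onto model compartment ids (all values illustrative).
deeploc = rename_compartments(
    {"geneA": {"Mitochondrion": 0.85, "Cytoplasm": 0.10, "Nucleus": 0.05}},
    {"Mitochondrion": "m", "Cytoplasm": "c"},
)
wolf = rename_compartments(
    {"geneA": {"mito": 20.5}, "geneB": {"cyto": 6.5}},
    {"mito": "m", "cyto": "c"},
)
merged = combine_first(deeploc, wolf)
```

In this sketch the first source wins wherever both have a value, so geneA keeps its DeepLoc-style scores and only geneB is filled in from the second source.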

predict_localization

Place a caller-specified set of reactions in compartments via MILP.

Returns a :class:LocalizationProposal (when apply=False) or a :class:LocalizationResult (when apply=True).

reactions_to_relocate: the reaction ids to (re-)place. Everything else stays where it is. Boundary reactions and existing multi-compartment transports in this set are silently filtered out, because they are always pinned. Passing an empty set, or a set that contains no relocatable reactions, is a no-op.

transport_cost: either a scalar (same cost per added transport) or a mapping {metabolite_id_base: cost} (where the base id strips the compartment suffix, e.g. "glc__D" matches "glc__D_c"/"glc__D_e"). Negative costs favour adding the transport.
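The two accepted forms of transport_cost can be illustrated with a small resolver. This is a sketch under stated assumptions, not the library's code: the names base_id and resolve_transport_cost are hypothetical, the base id is taken to be everything before the last underscore, and the default used for metabolites missing from the mapping is an assumption, since this page does not say what happens in that case.

```python
from collections.abc import Mapping


def base_id(met_id: str) -> str:
    """Strip the compartment suffix: 'glc__D_c' -> 'glc__D'."""
    return met_id.rpartition("_")[0] or met_id


def resolve_transport_cost(met_id, transport_cost, default=1.0):
    """Return the cost of adding a transport for met_id.

    transport_cost is either a scalar or a {metabolite_id_base: cost} mapping.
    Assumption: metabolites absent from the mapping fall back to `default`.
    """
    if isinstance(transport_cost, Mapping):
        return float(transport_cost.get(base_id(met_id), default))
    return float(transport_cost)


costs = {"glc__D": -0.5}  # a negative cost favours adding the transport
glucose_cost = resolve_transport_cost("glc__D_e", costs)
uniform_cost = resolve_transport_cost("atp_c", 2)
```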

Multi-compartment gene scoring (default behaviour): a gene contributes its predictor score in each compartment it lands in. The highest-scoring compartment is "free"; each additional compartment costs multi_compartment_penalty. A secondary compartment is therefore only worth picking when its score (typically lower than the primary) still exceeds the penalty. There is no hard cutoff, just an explicit score-vs-penalty trade-off. Set multi_compartment_penalty very large to make genes effectively mono-localised.
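The trade-off can be checked with a little arithmetic. The sketch below is a simplified reading of the scoring rule, not the MILP objective itself: gene_contribution is a hypothetical helper that sums the scores of the chosen compartments and charges the penalty once per compartment beyond the first. The scores and penalties are invented.

```python
def gene_contribution(scores, chosen, penalty):
    """Score of placing one gene in `chosen` compartments.

    The first compartment is free; each additional one costs `penalty`.
    """
    if not chosen:
        return 0.0
    total = sum(scores[comp] for comp in chosen)
    return total - penalty * (len(chosen) - 1)


scores = {"m": 0.85, "c": 0.30}

# Penalty above the secondary score: staying in 'm' alone scores higher.
mono_high = gene_contribution(scores, {"m"}, penalty=0.5)
dual_high = gene_contribution(scores, {"m", "c"}, penalty=0.5)

# Penalty below the secondary score: adding 'c' now pays off.
mono_low = gene_contribution(scores, {"m"}, penalty=0.2)
dual_low = gene_contribution(scores, {"m", "c"}, penalty=0.2)
```

With these numbers the secondary compartment scores 0.30, so it is rejected at a penalty of 0.5 and accepted at a penalty of 0.2.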