Localization (Python)¶

raven-toolbox objects in raven_toolbox.localization, collected from the source of the tracked branch.

Functions¶

Object	Summary
`apply_localization`	Apply a `LocalizationProposal` to `model`: move reactions, add the inter-compartment transports the proposal listed, and return `(model_copy, added)`.
`load_deeploc`	Parse DeepLoc 2 CSV output into a normalised `LocalizationScores`.
`load_wolfpsort`	Parse WoLF PSORT summary output (`runWolfPsortSummary`) into a normalised `LocalizationScores`.
`LocalizationProposal`	What `predict_localization` proposes, before applying it.
`LocalizationResult`	Outcome of `predict_localization` (when `apply=True`).
`LocalizationScores`	Per-gene compartment scores. `df` is indexed by `gene_id` with one column per compartment id.
`predict_localization`	Place a caller-specified set of reactions in compartments via MILP.

Reference¶

apply_localization¶

Apply a `LocalizationProposal` to `model`: move reactions, add the inter-compartment transports listed in the proposal, and return `(model_copy, added)`.

The returned model is a deep copy of the input; the original is left untouched. Moving a reaction swaps the compartment suffix of its metabolites (e.g. A_c → A_m), and compartment-specific metabolite copies are created on demand. Each added transport is a passive diffusion reaction M[default] ⇌ M[c] (RAVEN convention), named tr_<met>_<c>.
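The naming conventions described above can be illustrated with a small sketch. The helpers `swap_compartment` and `transport_id` are hypothetical names invented for this example, not part of raven-toolbox, and the sketch assumes the compartment is always the part after the last underscore.

```python
def swap_compartment(met_id: str, new_comp: str) -> str:
    """Replace the trailing compartment suffix, e.g. A_c -> A_m.

    Hypothetical helper: assumes the compartment is whatever follows
    the last underscore in the metabolite id.
    """
    base, sep, _old_comp = met_id.rpartition("_")
    if not sep:
        raise ValueError(f"no compartment suffix in {met_id!r}")
    return f"{base}_{new_comp}"


def transport_id(met_base: str, comp: str) -> str:
    """Build a transport reaction id following the tr_<met>_<c> pattern."""
    return f"tr_{met_base}_{comp}"


moved = swap_compartment("A_c", "m")   # metabolite of a reaction moved to "m"
transport = transport_id("A", "m")     # transport linking A[default] and A[m]
```

Using `rpartition` rather than `split` keeps base ids that themselves contain underscores (such as `glc__D`) intact.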

load_deeploc¶

Parse DeepLoc 2 CSV output into a normalised `LocalizationScores`.

DeepLoc 2's per-protein CSV has the columns Protein_ID, Localizations, Signals, <Compartment1>, <Compartment2>, ..., where every column from the fourth onward holds a per-class probability. The first three metadata columns are dropped; the rest become compartment columns.
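The column handling described above can be sketched with plain pandas. This is not the body of `load_deeploc`: the sample rows are made up, and using `Protein_ID` directly as the `gene_id` index is an assumption of this example.

```python
import io

import pandas as pd

# Made-up two-protein sample in the column layout described above.
sample = io.StringIO(
    "Protein_ID,Localizations,Signals,Cytoplasm,Mitochondrion\n"
    "g1,Cytoplasm,,0.8,0.1\n"
    "g2,Mitochondrion,Mitochondrial transit peptide,0.2,0.7\n"
)

raw = pd.read_csv(sample)

# The first three columns are metadata; everything after is a class probability.
meta_cols = list(raw.columns[:3])
scores_df = raw.drop(columns=meta_cols)

# Assumption: protein ids double as gene ids, so they become the index.
scores_df.index = pd.Index(raw[meta_cols[0]], name="gene_id")
```

Dropping by position rather than by name mirrors the "first three columns are metadata" rule, so the sketch does not depend on the exact set of compartment classes in the file.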

load_wolfpsort¶

Parse WoLF PSORT summary output (`runWolfPsortSummary`) into a normalised `LocalizationScores`. Rows like `PROT: treating N X's as ...` are skipped.

LocalizationProposal¶

What `predict_localization` proposes, before applying it.

All DataFrames have one row per item. Call `predict_localization` with `apply=False` to obtain a proposal and preview the changes; pass it to `apply_localization` to commit them, or diff it against a curator's expectations.

LocalizationResult¶

Outcome of `predict_localization` (when `apply=True`).

LocalizationScores¶

Per-gene compartment scores. df is indexed by gene_id with one column per compartment id; values are floats (higher = stronger evidence for that compartment).

Genes absent from `df`, and NaN entries, are treated as "no signal" by `raven_toolbox.localization.predict_localization` (uniform prior contribution).

with_compartments¶

with_compartments(
    mapping: Mapping[str, str],
) -> LocalizationScores

Rename compartment columns via {old: new} (e.g. predictor labels → model compartments). Unmapped columns are kept; multiple sources can be merged with df.combine_first afterwards.
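The rename-then-merge pattern can be sketched on bare DataFrames. Here `DataFrame.rename` stands in for `with_compartments` (an assumption about its effect, not its implementation), and the gene ids, labels and scores are invented for illustration.

```python
import pandas as pd

# Predictor A uses long labels and has a missing value for g2.
deeploc_df = pd.DataFrame(
    {"Cytoplasm": [0.8, 0.2], "Mitochondrion": [0.1, float("nan")]},
    index=pd.Index(["g1", "g2"], name="gene_id"),
)

# Predictor B already uses the model's compartment ids.
wolf_df = pd.DataFrame(
    {"c": [0.5, 0.4], "m": [0.3, 0.6]},
    index=pd.Index(["g2", "g3"], name="gene_id"),
)

# Stand-in for with_compartments({old: new}): map labels to model compartments.
mapping = {"Cytoplasm": "c", "Mitochondrion": "m"}
renamed = deeploc_df.rename(columns=mapping)

# combine_first keeps values from `renamed` and fills gaps from `wolf_df`.
merged = renamed.combine_first(wolf_df)
```

In the merged frame, g2 keeps its cytoplasm score from the first source and takes its mitochondrial score from the second, while g3 comes entirely from the second source. The order of the two frames decides which predictor wins when both have a value.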

predict_localization¶

Place a caller-specified set of reactions in compartments via MILP.

Returns a `LocalizationProposal` (when `apply=False`) or a `LocalizationResult` (when `apply=True`).

reactions_to_relocate: the reaction ids to (re-)place. Everything else stays where it is. Boundary reactions and existing multi-compartment transports included in this set are silently filtered out, since they are always pinned. If the set is empty or contains no non-boundary reactions, the call is a no-op.

transport_cost: either a scalar (same cost per added transport) or a mapping {metabolite_id_base: cost} (where the base id strips the compartment suffix, e.g. "glc__D" matches "glc__D_c"/"glc__D_e"). Negative costs favour adding the transport.
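A minimal sketch of how a scalar-or-mapping cost could be resolved for one metabolite. `base_id` and `resolve_cost` are hypothetical helpers, and the fallback `default` for metabolites missing from the mapping is an assumption of this example; the source does not say what the library does in that case.

```python
from collections.abc import Mapping
from typing import Union

Cost = Union[float, Mapping[str, float]]


def base_id(met_id: str) -> str:
    """Strip the compartment suffix: 'glc__D_c' -> 'glc__D'."""
    base, sep, _comp = met_id.rpartition("_")
    return base if sep else met_id


def resolve_cost(transport_cost: Cost, met_id: str, default: float = 1.0) -> float:
    """Return the cost of adding a transport for `met_id`.

    `default` is a hypothetical fallback for metabolites that the
    mapping does not mention.
    """
    if isinstance(transport_cost, Mapping):
        return float(transport_cost.get(base_id(met_id), default))
    return float(transport_cost)


per_met = {"glc__D": -0.5}                      # negative: favour this transport
glc_cost = resolve_cost(per_met, "glc__D_e")    # matched through the base id
atp_cost = resolve_cost(per_met, "atp_c")       # falls back to the default
flat_cost = resolve_cost(2.0, "atp_c")          # scalar applies to everything
```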

Multi-compartment gene scoring (default behaviour): a gene contributes its predictor score in each compartment it lands in; the highest-scoring compartment is "free", and each additional compartment costs multi_compartment_penalty. A secondary compartment is therefore only worth picking when its score (typically lower than the primary's) still exceeds the penalty. There is no hard cutoff, just an explicit score-versus-penalty trade-off. Set multi_compartment_penalty to a very large value to make genes effectively mono-localised.
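The score-versus-penalty trade-off can be illustrated for a single gene in isolation. This is only a sketch of the rule stated above, not the MILP itself: the real optimisation also weighs transport costs and the placement of other reactions, and `worthwhile_compartments` is a hypothetical helper name.

```python
def worthwhile_compartments(scores: dict[str, float], penalty: float) -> list[str]:
    """List the compartments worth picking for one gene, best first.

    The top-scoring compartment is free; every further compartment
    is kept only if its score exceeds the penalty.
    """
    if not scores:
        return []
    ranked = sorted(scores, key=lambda comp: scores[comp], reverse=True)
    primary, rest = ranked[0], ranked[1:]
    return [primary] + [comp for comp in rest if scores[comp] > penalty]


gene_scores = {"c": 0.7, "m": 0.4, "n": 0.1}   # made-up predictor scores

dual = worthwhile_compartments(gene_scores, penalty=0.25)
mono = worthwhile_compartments(gene_scores, penalty=1e6)
```

With a penalty of 0.25 the mitochondrial score of 0.4 still pays for itself while the nuclear score of 0.1 does not; with a very large penalty only the primary compartment survives.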