Top-level (Python)

raven-toolbox objects in raven_toolbox, collected from the source of the tracked branch.

Functions

Function Summary
autofetch_enabled Whether lazy first-use downloads are allowed.
BinaryStatus Outcome of provisioning one executable.
ensure_binary Download (if needed) and return the path to a bundled executable.
ensure_data_file Download (if needed) and return the cached path to one artefact file.
ensure_kegg_data Ensure the core KEGG artefacts are cached; return their directory.
ensure_kegg_hmm_library Ensure a domain HMM library is cached and decompressed; return the .hmm path.
ensure_kegg_taxonomy Ensure the KEGG taxonomy artefact is cached; return its (gzipped) path.
executables_for_set Return the executables in a named set ("all" = the union of every set).
load_into_registries Load a manifest and merge it into the live data/binary registries.
load_manifest Read and validate a manifest from source (path/URL) or $RAVEN_PYTHON_MANIFEST.
main
platform_key Return the <os>-<arch> key used in the registry (e.g. linux-x86_64).
provision_binaries Ensure each executable is available, reporting per-tool outcomes.
resolve_binary Resolve an executable to a path: arg → env var → PATH → bundled ZIP → error.
to_binary_registry Project manifest['binaries'] onto the raven_toolbox.binaries._REGISTRY shape.
to_data_registry Project manifest['data'] onto the raven_toolbox.data._DATA_REGISTRY shape.

Reference

autofetch_enabled

Whether lazy first-use downloads are allowed.

On by default (the zero-setup behaviour). Set RAVEN_PYTHON_AUTOFETCH to 0, false, no, or off (any case) to disable it; resolve_binary then stops at PATH and never reaches the network. This suits air-gapped or strictly conda/system-managed setups.
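
The environment-variable rule above can be sketched as follows. This is an illustrative re-implementation, not the library's code: the name autofetch_enabled_sketch, the environ parameter, and the whitespace stripping are assumptions made for the example.

```python
import os

# Values that switch auto-fetch off, per the description above (case-insensitive).
_DISABLING_VALUES = {"0", "false", "no", "off"}


def autofetch_enabled_sketch(environ=None):
    """Return False only when RAVEN_PYTHON_AUTOFETCH holds a disabling value.

    `environ` is a hypothetical parameter added so the rule can be exercised
    without touching the real process environment.
    """
    env = os.environ if environ is None else environ
    raw = env.get("RAVEN_PYTHON_AUTOFETCH")
    if raw is None:
        return True  # unset: on by default
    # Stripping surrounding whitespace is an assumption of this sketch.
    return raw.strip().lower() not in _DISABLING_VALUES
```

With this sketch, {"RAVEN_PYTHON_AUTOFETCH": "OFF"} disables fetching, while an unset variable leaves it enabled.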

BinaryStatus

Outcome of provisioning one executable.

status is one of "present" (already on PATH or pointed at by its env var), "downloaded" (fetched from a bundle just now), "unavailable" (no bundle hosted for this OS/arch; install via conda/WSL2), or "error" (download or verification failed). detail is the path (for "present" and "downloaded") or a message (for "unavailable" and "error").
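
A minimal sketch of how a caller might consume such outcomes. Only the status and detail fields come from the description above; the StatusSketch class, its name field, and the summarize helper are hypothetical stand-ins, not the library's types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusSketch:
    """Stand-in for a per-tool outcome; `name` is an assumed field."""
    name: str
    status: str  # "present", "downloaded", "unavailable" or "error"
    detail: str  # a path for present/downloaded, otherwise a message


def summarize(statuses):
    """Split outcomes into usable tool paths and problems worth reporting."""
    usable = {s.name: s.detail for s in statuses
              if s.status in ("present", "downloaded")}
    problems = {s.name: f"{s.status}: {s.detail}" for s in statuses
                if s.status in ("unavailable", "error")}
    return usable, problems
```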

ensure_binary

Download (if needed) and return the path to a bundled executable.

Consults the registry for the current platform, downloads the pinned ZIP, verifies its SHA256, extracts it into the cache, and returns the executable path. Raises FileNotFoundError if no bundle for this platform is hosted.

ensure_data_file

Download (if needed) and return the cached path to one artefact file.

Looks the file up in the registry for dataset (at version or the registry's default), downloads it to the version-pinned cache directory, verifies its SHA256, and returns the path. Re-uses an already-cached copy.

A freshly downloaded file is always SHA256-checked. Passing verify additionally re-checks an already-cached file's SHA256; a mismatch (i.e. a corrupted cache) discards the file and re-downloads it. verify is off by default so the common cache-hit path stays fast.
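
The cache-hit and verify behaviour described above can be sketched with the standard library alone. The names ensure_cached, sha256_of and fake_download, the file name table.tsv, and the ValueError raised on a bad download are all assumptions of this example; the real function resolves URLs and digests from its registry.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Stream a file through SHA256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_cached(path, expected_sha256, download, verify=False):
    """Return `path`, calling `download(path)` only when it is needed."""
    path = Path(path)
    if path.exists():
        # Cache hit: trusted as-is unless the caller asks for a re-check.
        if not verify or sha256_of(path) == expected_sha256:
            return path
        path.unlink()  # corrupted cache: discard and fall through to re-download
    path.parent.mkdir(parents=True, exist_ok=True)
    download(path)
    # A fresh download is always checked.
    if sha256_of(path) != expected_sha256:
        path.unlink()
        raise ValueError(f"SHA256 mismatch for {path.name}")
    return path


# Demo with a fake downloader that writes a known payload.
payload = b"example artefact\n"
expected = hashlib.sha256(payload).hexdigest()
calls = []


def fake_download(dest):
    calls.append(dest)
    Path(dest).write_bytes(payload)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "v1" / "table.tsv"
    ensure_cached(target, expected, fake_download)  # cache miss: downloads
    ensure_cached(target, expected, fake_download)  # cache hit: no download
    target.write_bytes(b"corrupted")
    ensure_cached(target, expected, fake_download)  # unverified hit: corruption unnoticed
    downloads_without_verify = len(calls)
    ensure_cached(target, expected, fake_download, verify=True)  # detected, re-downloaded
    downloads_with_verify = len(calls)
    restored = target.read_bytes() == payload
```

The demo shows the trade-off: without verify a corrupted cached file is returned unnoticed, which is the price of the fast cache-hit path.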

ensure_kegg_data

Ensure the core KEGG artefacts are cached; return their directory.

Fetches the single <version>_core.tar.gz bundle (the gene-free reference model plus the KO/reaction/organism-gene tables listed in CORE_KEGG_FILES), SHA256-verifies it, and on first use extracts the version-prefixed members into the cache directory. The returned directory is ready to pass as the artefact_dir of get_kegg_model_for_organism_from_artefacts. The HMM libraries and the taxonomy file are separate artefacts (see ensure_kegg_hmm_library and ensure_kegg_taxonomy).
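
As a rough illustration of the extraction step, the sketch below pulls only members whose file name starts with the version prefix out of a .tar.gz. It assumes that "version-prefixed" means a "<version>_" file-name prefix; the helper name and the member names used in the demo are hypothetical, and download plus SHA256 verification are left out.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def extract_version_members(archive, dest, version):
    """Extract regular files named '<version>_*' from a .tar.gz into dest."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    prefix = f"{version}_"
    extracted = []
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            # Keep only the base name, so a member can never escape dest.
            name = Path(member.name).name
            if not member.isfile() or not name.startswith(prefix):
                continue
            with tar.extractfile(member) as src:
                (dest / name).write_bytes(src.read())
            extracted.append(name)
    return sorted(extracted)


# Demo: build a tiny archive with placeholder members, then extract from it.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "v1_core.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for member_name in ("v1_model.yml", "v1_ko.tsv", "README.txt"):
            data = b"placeholder\n"
            info = tarfile.TarInfo(member_name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    names = extract_version_members(archive, Path(tmp) / "cache", "v1")
```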

ensure_kegg_hmm_library

Ensure a domain HMM library is cached and decompressed; return the .hmm path.

domain is "prokaryotes" or "eukaryotes". Fetches the gzipped concatenated library <version>_<domain>.hmm.gz and decompresses it once (the result is cached). Returns the path to the .hmm flatfile, which is the argument for run_hmmsearch; that function searches the flatfile directly, so no hmmpress step is needed.

Shipping the gzip flatfile keeps the download ~10x smaller than a binary index, stays portable across HMMER versions/platforms, and lets the same artefact serve MATLAB RAVEN.
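
The "decompress once, then reuse" step might look like the sketch below. The helper name decompress_once and the ".part" temporary file are assumptions of this example; the payload is a placeholder, not a real HMM library.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def decompress_once(gz_path):
    """Decompress 'x.hmm.gz' to 'x.hmm' next to it, reusing an existing result."""
    gz_path = Path(gz_path)
    out = gz_path.with_suffix("")  # drops only the trailing ".gz"
    if not out.exists():
        # Write to a temporary name first so an interrupted run never
        # leaves a truncated file that later looks like a valid cache entry.
        part = out.with_name(out.name + ".part")
        with gzip.open(gz_path, "rb") as src, open(part, "wb") as dst:
            shutil.copyfileobj(src, dst)
        part.replace(out)
    return out


# Demo with a placeholder payload.
with tempfile.TemporaryDirectory() as tmp:
    gz = Path(tmp) / "v1_prokaryotes.hmm.gz"
    with gzip.open(gz, "wb") as fh:
        fh.write(b"placeholder library text\n")
    first = decompress_once(gz)
    hmm_name = first.name
    content = first.read_bytes()
    first.write_bytes(b"marker")  # tamper, to show the second call does not redo the work
    second = decompress_once(gz)
    reused = second.read_bytes() == b"marker"
```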

ensure_kegg_taxonomy

Ensure the KEGG taxonomy artefact is cached; return its (gzipped) path.

The gzipped KEGG taxonomy file is the source for domain classification and for regenerating the phylogenetic distance matrix (RAVEN's keggPhylDist, which GECKO uses to pick the closest organism for kcat assignment). The matrix is regenerated via raven_toolbox.reconstruction.kegg.phyl_dist, which reads the .gz file directly, so that capability needs only this published artefact and no MATLAB .mat file.

executables_for_set

Return the executables in a named set ("all" = the union of every set).
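
A minimal sketch of the "all" rule. The set names and tool names in EXAMPLE_SETS are invented for illustration and are not the library's actual sets; returning an ordered, de-duplicated list is also an assumption.

```python
# Hypothetical named sets; the real names live in the library's registry.
EXAMPLE_SETS = {
    "alignment": ["tool-a", "tool-b"],
    "hmm": ["tool-b", "tool-c"],
}


def executables_for_set_sketch(name, sets=EXAMPLE_SETS):
    """Return the members of one set, or the union of every set for "all"."""
    if name == "all":
        union = []
        for members in sets.values():
            for exe in members:
                if exe not in union:  # de-duplicate while keeping first-seen order
                    union.append(exe)
        return union
    return list(sets[name])
```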

load_into_registries

Load a manifest and merge it into the live data/binary registries.

Parameters:

Name Type Description Default
source str | PathLike | None Manifest path or URL; defaults to $RAVEN_PYTHON_MANIFEST. None
replace bool If True, clear the existing registries first; otherwise merge (manifest wins). False

Returns:

Type Description
dict The parsed manifest.
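
The difference between merging and replacing can be sketched on plain dictionaries. This assumes a shallow, key-level merge; how the library merges nested entries is not stated here, and the helper name and registry contents are hypothetical.

```python
def merge_registry(live, incoming, replace=False):
    """Merge `incoming` into `live` in place; incoming entries win on conflict."""
    if replace:
        live.clear()  # start from an empty registry
    live.update(incoming)  # shallow merge: whole entries are overwritten
    return live


# Hypothetical registry contents, for illustration only.
registry = {"tool-a": {"version": "1.0"}, "tool-b": {"version": "2.0"}}
manifest_entries = {"tool-b": {"version": "2.1"}}

merged = merge_registry(dict(registry), manifest_entries)
replaced = merge_registry(dict(registry), manifest_entries, replace=True)
```

After the merge, tool-a is kept and tool-b takes the manifest's entry; with replace=True only tool-b remains.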

load_manifest

Read and validate a manifest from source (path/URL) or $RAVEN_PYTHON_MANIFEST.

main

platform_key

Return the <os>-<arch> key used in the registry (e.g. linux-x86_64).
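
One way to build such a key with the standard platform module is sketched below. Lower-casing and the AMD64 to x86_64 alias are assumptions of this example; the library's own normalisation (for instance how it names macOS or ARM) may differ.

```python
import platform

# Assumed alias: Windows reports AMD64 for what Linux calls x86_64.
_ARCH_ALIASES = {"amd64": "x86_64"}


def platform_key_sketch(system=None, machine=None):
    """Build an '<os>-<arch>' key; arguments default to the running platform."""
    os_name = (system or platform.system()).lower()
    arch = (machine or platform.machine()).lower()
    arch = _ARCH_ALIASES.get(arch, arch)
    return f"{os_name}-{arch}"
```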

provision_binaries

Ensure each executable is available, reporting per-tool outcomes.

With prefer_existing (the default), a tool already on PATH or pointed at by its env var is left as-is ("present") and not downloaded. Otherwise the bundle is fetched via ensure_binary. Never raises for an individual tool: a missing platform bundle becomes "unavailable" and a failed download becomes "error", so a caller can report the whole set at once.
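
The "never raises for an individual tool" pattern can be sketched as a loop that converts exceptions into outcomes. Here fetch is a hypothetical stand-in for the download step, the env-var check is omitted, and mapping FileNotFoundError to "unavailable" is an assumption based on the ensure_binary description.

```python
import shutil


def provision_sketch(names, fetch, prefer_existing=True):
    """Return {name: (status, detail)} without raising for any single tool."""
    results = {}
    for name in names:
        if prefer_existing:
            found = shutil.which(name)  # PATH lookup only in this sketch
            if found:
                results[name] = ("present", found)
                continue
        try:
            results[name] = ("downloaded", str(fetch(name)))
        except FileNotFoundError as exc:
            # Assumed signal for "no bundle hosted for this platform".
            results[name] = ("unavailable", str(exc))
        except Exception as exc:
            # Any other failure (network, checksum, ...) is reported, not raised.
            results[name] = ("error", str(exc))
    return results
```

Because every outcome is recorded, a caller can print one report for the whole set instead of stopping at the first failing tool.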

resolve_binary

Resolve an executable to a path: arg → env var → PATH → bundled ZIP → error.

The bundled-ZIP step is skipped when auto-fetch is disabled (see autofetch_enabled); resolution then stops at PATH with an actionable error instead of downloading.
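
The resolution order can be sketched as a simple fall-through chain. The function name, its parameters, and the exception type are assumptions of this example; fetch stands in for the bundled-ZIP download.

```python
import os
import shutil


def resolve_sketch(name, explicit=None, env_var=None, fetch=None,
                   autofetch=True, environ=None):
    """Resolve in order: explicit arg, env var, PATH, bundled download, error."""
    env = os.environ if environ is None else environ
    if explicit:
        return str(explicit)
    if env_var and env.get(env_var):
        return env[env_var]
    found = shutil.which(name)
    if found:
        return found
    if autofetch and fetch is not None:
        return str(fetch(name))
    # Auto-fetch disabled (or no bundle source): stop with an actionable message.
    raise FileNotFoundError(
        f"{name!r} not found: pass a path, set its env var, or put it on PATH"
    )
```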

to_binary_registry

Project manifest['binaries'] onto the raven_toolbox.binaries._REGISTRY shape.

to_data_registry

Project manifest['data'] onto the raven_toolbox.data._DATA_REGISTRY shape.