Top-level (Python)¶
raven-toolbox objects in raven_toolbox, collected from the source of the tracked branch.
Functions¶
| Function | Summary |
|---|---|
| autofetch_enabled | Whether lazy first-use downloads are allowed. |
| BinaryStatus | Outcome of provisioning one executable. |
| ensure_binary | Download (if needed) and return the path to a bundled executable. |
| ensure_data_file | Download (if needed) and return the cached path to one artefact file. |
| ensure_kegg_data | Ensure the core KEGG artefacts are cached; return their directory. |
| ensure_kegg_hmm_library | Ensure a domain HMM library is cached and decompressed; return the .hmm path. |
| ensure_kegg_taxonomy | Ensure the KEGG taxonomy artefact is cached; return its (gzipped) path. |
| executables_for_set | Return the executables in a named set ("all" = the union of every set). |
| load_into_registries | Load a manifest and merge it into the live data/binary registries. |
| load_manifest | Read and validate a manifest from source (path/URL) or $RAVEN_PYTHON_MANIFEST. |
| main | |
| platform_key | Return the <os>-<arch> key used in the registry (e.g. linux-x86_64). |
| provision_binaries | Ensure each executable is available, reporting per-tool outcomes. |
| resolve_binary | Resolve an executable to a path: arg → env var → PATH → bundled ZIP → error. |
| to_binary_registry | Project manifest['binaries'] onto the raven_toolbox.binaries._REGISTRY shape. |
| to_data_registry | Project manifest['data'] onto the raven_toolbox.data._DATA_REGISTRY shape. |
Reference¶
autofetch_enabled¶
Whether lazy first-use downloads are allowed.
On by default (the zero-setup behaviour). Set RAVEN_PYTHON_AUTOFETCH to
0/false/no/off (any case) to disable, so :func:resolve_binary
stops at PATH and never reaches the network — for air-gapped or
strictly conda/system-managed setups.
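The check described above can be sketched as follows. This is a minimal illustration, not the library's source: the function name `autofetch_enabled_sketch`, the optional `environ` argument, and the whitespace stripping are assumptions made so the example is testable in isolation.

```python
import os

# Values that disable auto-fetch, per the description above (compared case-insensitively).
_DISABLING_VALUES = {"0", "false", "no", "off"}


def autofetch_enabled_sketch(environ=None):
    """Return False only when RAVEN_PYTHON_AUTOFETCH holds a disabling value."""
    # `environ` is a hypothetical hook for testing; the default is the process environment.
    env = os.environ if environ is None else environ
    raw = env.get("RAVEN_PYTHON_AUTOFETCH")
    if raw is None:
        return True  # unset: downloads are on by default
    # Assumption: surrounding whitespace is ignored and any other value keeps auto-fetch on.
    return raw.strip().lower() not in _DISABLING_VALUES
```

For example, `autofetch_enabled_sketch({"RAVEN_PYTHON_AUTOFETCH": "Off"})` is `False`, while an empty mapping yields `True`.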
BinaryStatus¶
Outcome of provisioning one executable.
status is one of "present" (already on PATH / via env var),
"downloaded" (fetched from a bundle just now), "unavailable" (no bundle
hosted for this OS/arch — install via conda/WSL2), or "error" (download or
verification failed). detail is the path (present/downloaded) or message.
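One plausible shape for such a record is a small frozen dataclass. Only `status` and `detail` are named in the description above; the class name `BinaryStatusSketch`, the `name` field, and the `ok` property are hypothetical additions for illustration.

```python
from dataclasses import dataclass

# The four outcomes listed in the description above.
_STATUSES = ("present", "downloaded", "unavailable", "error")


@dataclass(frozen=True)
class BinaryStatusSketch:
    name: str    # hypothetical field: the executable this record is about
    status: str  # one of _STATUSES
    detail: str  # a path for present/downloaded, otherwise a message

    def __post_init__(self):
        # Reject anything outside the documented set of outcomes.
        if self.status not in _STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")

    @property
    def ok(self):
        # Hypothetical convenience: True when `detail` is a usable path.
        return self.status in ("present", "downloaded")
```

A caller could then filter a list of results with `[s for s in results if not s.ok]` to report only the tools that still need attention.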
ensure_binary¶
Download (if needed) and return the path to a bundled executable.
Consults the registry for the current platform, downloads the pinned ZIP,
verifies its SHA256, extracts it into the cache, and returns the executable
path. Raises FileNotFoundError if no bundle for this platform is hosted.
ensure_data_file¶
Download (if needed) and return the cached path to one artefact file.
Looks the file up in the registry for dataset (at version or the
registry's default), downloads it to the version-pinned cache directory,
verifies its SHA256, and returns the path. Re-uses an already-cached copy.
A freshly downloaded file is always SHA256-checked. verify additionally
re-checks an already-cached file's SHA256 (a mismatch — i.e. a corrupted
cache — discards it and re-downloads); it is off by default so the common
cache-hit path stays fast.
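The caching and verification rules above can be sketched with the standard library alone. The names `sha256_of` and `ensure_cached`, and the `download` callable that writes the file to a destination path, are assumptions for this example and not the library's API.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large artefacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_cached(target, expected_sha256, download, verify=False):
    """Return `target`, downloading it first unless a usable cached copy exists."""
    target = Path(target)
    if target.exists():
        # Fast path: trust the cache unless the caller asked for a re-check.
        if not verify or sha256_of(target) == expected_sha256:
            return target
        target.unlink()  # corrupted cache: discard and fall through to re-download
    target.parent.mkdir(parents=True, exist_ok=True)
    download(target)  # hypothetical callable that writes the file to `target`
    # A freshly downloaded file is always checked.
    if sha256_of(target) != expected_sha256:
        target.unlink()
        raise ValueError(f"SHA256 mismatch for {target.name}")
    return target
```

Keeping `verify` off by default means a cache hit costs one `exists()` call; hashing a large artefact on every lookup would dominate the run time of short jobs.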
ensure_kegg_data¶
Ensure the core KEGG artefacts are cached; return their directory.
Fetches the single <version>_core.tar.gz bundle (the gene-free reference
model + the KO/reaction/organism-gene tables of :data:CORE_KEGG_FILES),
SHA256-verifies it, and extracts the version-prefixed members into the cache
directory on first use — ready to pass as the artefact_dir of
:func:get_kegg_model_for_organism_from_artefacts. The HMM libraries and the
taxonomy file are separate artefacts (see :func:ensure_kegg_hmm_library,
:func:ensure_kegg_taxonomy).
ensure_kegg_hmm_library¶
Ensure a domain HMM library is cached and decompressed; return the .hmm path.
domain is "prokaryotes" or "eukaryotes". Fetches the gzipped
concatenated library <version>_<domain>.hmm.gz and decompresses it once
(cached). Returns the path to the .hmm flatfile — the argument for
:func:run_hmmsearch, which searches it directly (no hmmpress needed).
Shipping the gzip flatfile keeps the download ~10x smaller than a binary index, stays portable across HMMER versions/platforms, and lets the same artefact serve MATLAB RAVEN.
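The "decompress once, then reuse" step can be sketched as below. The helper name `decompress_once` and the temporary `.part` file are assumptions for illustration; the real function may handle partial writes differently.

```python
import gzip
import shutil
from pathlib import Path


def decompress_once(gz_path):
    """Decompress `<name>.gz` next to itself and return the path to `<name>`."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
    if not out_path.exists():
        # Write to a temporary name first so an interrupted run never
        # leaves a truncated file that looks like a valid cache entry.
        tmp_path = out_path.with_name(out_path.name + ".part")
        with gzip.open(gz_path, "rb") as src, open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        tmp_path.replace(out_path)
    return out_path
```

A second call with the same path finds the decompressed file and returns immediately without reading the archive again.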
ensure_kegg_taxonomy¶
Ensure the KEGG taxonomy artefact is cached; return its (gzipped) path.
The gzipped KEGG taxonomy file is the source for domain classification and for
regenerating the phylogenetic distance matrix via
:func:raven_toolbox.reconstruction.kegg.phyl_dist, which reads the .gz directly.
That matrix is RAVEN's keggPhylDist, which GECKO uses to pick the closest organism
for kcat assignment, so the capability needs only this published artefact and no
MATLAB .mat file.
executables_for_set¶
Return the executables in a named set ("all" = the union of every set).
load_into_registries¶
Load a manifest and merge it into the live data/binary registries.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str \| PathLike \| None | Manifest path or URL; defaults to | None |
| replace | bool | If True, clear the existing registries first; otherwise merge (manifest wins). | False |
Returns:
| Type | Description |
|---|---|
| dict | The parsed manifest. |
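The difference between merging and replacing can be illustrated on a plain dictionary. This sketch assumes a shallow, key-by-key merge; `merge_registry` is a hypothetical helper, and the real registries may merge nested entries differently.

```python
def merge_registry(live, incoming, replace=False):
    """Update `live` in place from `incoming` and return it."""
    if replace:
        live.clear()       # replace=True: start from an empty registry
    live.update(incoming)  # on a key clash the manifest's entry wins
    return live
```

With `live = {"a": 1, "b": 2}`, merging `{"b": 3}` gives `{"a": 1, "b": 3}`, whereas the same call with `replace=True` leaves only `{"b": 3}`.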
load_manifest¶
Read and validate a manifest from source (path/URL) or $RAVEN_PYTHON_MANIFEST.
main¶
platform_key¶
Return the <os>-<arch> key used in the registry (e.g. linux-x86_64).
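A key of this form can be derived from the standard `platform` module. The sketch below is an assumption about how such a key might be built: only `linux-x86_64` is confirmed by the text above, and the alias table (for example mapping Windows' `AMD64` to `x86_64`) is hypothetical.

```python
import platform

# Hypothetical normalisation: different OSes report the same CPU under different names.
_ARCH_ALIASES = {"amd64": "x86_64"}


def platform_key_sketch(system=None, machine=None):
    """Build an '<os>-<arch>' string; arguments default to the running platform."""
    os_name = (system or platform.system()).lower()
    arch = (machine or platform.machine()).lower()
    arch = _ARCH_ALIASES.get(arch, arch)
    return f"{os_name}-{arch}"
```

Passing `system` and `machine` explicitly makes it possible to look up which bundle another platform would receive without running on it.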
provision_binaries¶
Ensure each executable is available, reporting per-tool outcomes.
With prefer_existing (default) a tool already on PATH or pointed at by its
env var is left as-is ("present") and not downloaded. Otherwise the bundle
is fetched via :func:ensure_binary. Never raises for an individual tool — a
missing platform bundle becomes "unavailable" and a failed download
"error", so a caller can report the whole set at once.
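The "never raises for an individual tool" behaviour amounts to catching failures per tool and recording them. In this sketch, `provision_sketch`, the `ensure` callable, and the injectable `which` are assumptions; the env-var check is omitted for brevity, and the mapping from `FileNotFoundError` to "unavailable" follows the ensure_binary description.

```python
import shutil


def provision_sketch(names, ensure, prefer_existing=True, which=shutil.which):
    """Return {name: (status, detail)} without raising for any single tool."""
    results = {}
    for name in names:
        if prefer_existing:
            found = which(name)
            if found:
                results[name] = ("present", found)  # leave the existing install alone
                continue
        try:
            results[name] = ("downloaded", str(ensure(name)))
        except FileNotFoundError as exc:
            # No bundle hosted for this OS/arch.
            results[name] = ("unavailable", str(exc))
        except Exception as exc:
            # Download or verification failure; keep going with the next tool.
            results[name] = ("error", str(exc))
    return results
```

Collecting outcomes instead of failing fast lets a setup command print one complete table of what is present, fetched, or missing.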
resolve_binary¶
Resolve an executable to a path: arg → env var → PATH → bundled ZIP → error.
The bundled-ZIP step is skipped when auto-fetch is disabled
(:func:autofetch_enabled); resolution then stops at PATH with an actionable
error instead of downloading.
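The resolution order above can be sketched as a chain of early returns. All parameter names here (`explicit`, `env_var`, `fetch`, `autofetch`, `environ`) are hypothetical; the sketch shows the ordering only and does not reproduce the library's signature.

```python
import os
import shutil


def resolve_sketch(name, explicit=None, env_var=None, fetch=None,
                   autofetch=True, environ=None):
    """Try: explicit argument, env var, PATH, bundled download, then fail."""
    env = os.environ if environ is None else environ
    if explicit:
        return str(explicit)           # 1. caller-supplied path wins
    if env_var and env.get(env_var):
        return env[env_var]            # 2. per-tool environment variable
    found = shutil.which(name)
    if found:
        return found                   # 3. already on PATH
    if autofetch and fetch is not None:
        return str(fetch(name))        # 4. bundled download (skipped when auto-fetch is off)
    # 5. nothing worked: tell the user what to do next
    raise FileNotFoundError(
        f"{name!r} not found; install it, put it on PATH, or enable auto-fetch"
    )
```

Because the network step comes last and is gated by `autofetch`, an air-gapped run fails with a clear message instead of hanging on a download attempt.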
to_binary_registry¶
Project manifest['binaries'] onto the raven_toolbox.binaries._REGISTRY shape.
to_data_registry¶
Project manifest['data'] onto the raven_toolbox.data._DATA_REGISTRY shape.