
Omics integration (Python)

Objects in the raven_toolbox.omics module of raven-toolbox, collected from the source of the tracked branch.

Functions

| Function | Summary |
|---|---|
| hpa_gene_scores | Numeric gene scores from HPA levels for one tissue (optionally one cell type). |
| HPAData | Tidy HPA proteomics data: one row per (gene, tissue, cell type). |
| HPARnaData | Tidy HPA RNA-seq data: one row per (gene, tissue) with TPM. |
| parse_hpa | Parse an HPA proteomics dump (normal_tissue.tsv; version ≥17 format). |
| parse_hpa_rna | Parse an HPA RNA-seq dump. |
| rna_gene_scores | Numeric gene scores from HPA RNA-seq TPM for one tissue. |

Reference

hpa_gene_scores

Numeric gene scores from HPA levels for one tissue (optionally one cell type).

Maps HPA's categorical levels to numbers via level_scores (default: HPA_LEVEL_SCORES). Genes absent from the tissue, or whose level is not in the score table, are omitted from the output. Downstream, score_reactions_from_genes then falls back to no_gene_score for any reaction whose genes are all absent.

When the gene is reported in several cell types of the tissue, multiple_celltype chooses how to combine them: "best" (maximum score; the RAVEN default) or "average" (mean across cell types).
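The level-to-score mapping and the "best"/"average" choice can be sketched as below. This is a hypothetical re-implementation over plain dict rows, not the library's code; the numbers in EXAMPLE_LEVEL_SCORES are illustrative assumptions, and the real defaults live in HPA_LEVEL_SCORES.

```python
from statistics import mean

# Illustrative level-to-score table. These numbers are an assumption for this
# sketch; check HPA_LEVEL_SCORES for the values the library actually uses.
EXAMPLE_LEVEL_SCORES = {"High": 20.0, "Medium": 15.0, "Low": 10.0, "Not detected": -8.0}


def sketch_hpa_gene_scores(rows, tissue, celltype=None,
                           level_scores=EXAMPLE_LEVEL_SCORES,
                           multiple_celltype="best"):
    """Hypothetical re-implementation: {gene_id: score} for one tissue.

    rows is an iterable of dicts with gene_id, tissue, celltype and level keys.
    """
    if multiple_celltype not in ("best", "average"):
        raise ValueError("multiple_celltype must be 'best' or 'average'")
    per_gene = {}
    for row in rows:
        if row["tissue"] != tissue:
            continue
        if celltype is not None and row["celltype"] != celltype:
            continue
        score = level_scores.get(row["level"])
        if score is None:
            continue  # unknown level: contributes nothing for this gene
        per_gene.setdefault(row["gene_id"], []).append(score)
    combine = max if multiple_celltype == "best" else mean
    # Genes with no scored rows never enter per_gene, so they are omitted.
    return {gene: float(combine(scores)) for gene, scores in per_gene.items()}


# Made-up rows: GENE1 appears in two liver cell types.
example_rows = [
    {"gene_id": "GENE1", "tissue": "liver", "celltype": "hepatocytes", "level": "High"},
    {"gene_id": "GENE1", "tissue": "liver", "celltype": "bile duct cells", "level": "Low"},
    {"gene_id": "GENE2", "tissue": "liver", "celltype": "hepatocytes", "level": "unscored"},
    {"gene_id": "GENE3", "tissue": "kidney", "celltype": "cells in tubules", "level": "Medium"},
]
print(sketch_hpa_gene_scores(example_rows, "liver"))  # {'GENE1': 20.0}
print(sketch_hpa_gene_scores(example_rows, "liver", multiple_celltype="average"))  # {'GENE1': 15.0}
```

GENE2 drops out because its level is not in the table, and GENE3 because it is not reported in liver, mirroring the omission rule described above.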

HPAData

Tidy HPA proteomics data: one row per (gene, tissue, cell type).

The df attribute has the columns gene_id, gene_name, tissue, celltype, level, and reliability. level holds HPA's categorical string; map it to numbers with hpa_gene_scores (or pass a custom level_scores).
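The tidy layout's key invariant is one row per (gene, tissue, cell type). The sketch below models a row as a NamedTuple and enforces that invariant when indexing; HPARow and index_by_key are hypothetical names for illustration, and the real df may be a different container.

```python
from typing import NamedTuple


class HPARow(NamedTuple):
    """Hypothetical stand-in for one row of HPAData.df."""
    gene_id: str
    gene_name: str
    tissue: str
    celltype: str
    level: str
    reliability: str


def index_by_key(rows):
    """Index rows by (gene_id, tissue, celltype), enforcing one row per key."""
    index = {}
    for row in rows:
        key = (row.gene_id, row.tissue, row.celltype)
        if key in index:
            raise ValueError(f"duplicate row for {key}")
        index[key] = row
    return index


# Made-up example rows: one gene seen in two cell types of the same tissue.
rows = [
    HPARow("GENE1", "G1", "liver", "hepatocytes", "High", "Enhanced"),
    HPARow("GENE1", "G1", "liver", "bile duct cells", "Low", "Enhanced"),
]
index = index_by_key(rows)
print(index[("GENE1", "liver", "hepatocytes")].level)  # High
```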

HPARnaData

Tidy HPA RNA-seq data: one row per (gene, tissue) with TPM.

The df attribute has the columns gene_id, gene_name, tissue, and tpm.

expression

expression(tissue: str) -> dict[str, float]

Returns {gene_id: TPM} for tissue. Pass the result directly to raven_toolbox.init.score.gene_scores_from_expression.
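What expression(tissue) returns can be sketched over tidy rows held as plain dicts with the df column names. sketch_expression is a hypothetical helper, not the library's method, and the duplicate check is an assumption that follows from the one-row-per-(gene, tissue) layout.

```python
def sketch_expression(rows, tissue):
    """Hypothetical helper: {gene_id: TPM} for one tissue from tidy RNA rows."""
    expr = {}
    for row in rows:
        if row["tissue"] != tissue:
            continue
        gene = row["gene_id"]
        if gene in expr:
            # The tidy layout has one row per (gene, tissue), so this signals bad input.
            raise ValueError(f"duplicate TPM for {gene} in {tissue}")
        expr[gene] = float(row["tpm"])
    return expr


rna_rows = [
    {"gene_id": "GENE1", "gene_name": "G1", "tissue": "liver", "tpm": 12.5},
    {"gene_id": "GENE1", "gene_name": "G1", "tissue": "kidney", "tpm": 3.0},
    {"gene_id": "GENE2", "gene_name": "G2", "tissue": "liver", "tpm": 0.0},
]
print(sketch_expression(rna_rows, "liver"))  # {'GENE1': 12.5, 'GENE2': 0.0}
```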

parse_hpa

Parse an HPA proteomics dump (normal_tissue.tsv; version ≥17 format).

Expected columns (any reasonable delimiter; HPA ships tab-separated files): Gene, Gene name, Tissue, Cell type, Level, Reliability. Returns an HPAData with one row per (gene, tissue, cell type).
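One way to accept "any reasonable delimiter" with only the standard library is csv.Sniffer, falling back to tabs. This is a hypothetical sketch returning plain dicts rather than an HPAData, and the delimiter-sniffing approach is an assumption, not necessarily what parse_hpa does internally.

```python
import csv
import io

# HPA header -> tidy column name, following the column lists in this section.
COLUMN_MAP = {
    "Gene": "gene_id",
    "Gene name": "gene_name",
    "Tissue": "tissue",
    "Cell type": "celltype",
    "Level": "level",
    "Reliability": "reliability",
}


def sketch_parse_hpa(text):
    """Hypothetical re-implementation: tidy dict rows from HPA proteomics text."""
    try:
        # Guess the delimiter from a sample, restricted to common choices.
        dialect = csv.Sniffer().sniff(text[:4096], delimiters="\t,;")
    except csv.Error:
        dialect = csv.excel_tab  # fall back to tabs, which HPA ships
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    missing = set(COLUMN_MAP) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [{tidy: rec[src] for src, tidy in COLUMN_MAP.items()} for rec in reader]


# Made-up two-row excerpt in the expected layout.
sample = (
    "Gene\tGene name\tTissue\tCell type\tLevel\tReliability\n"
    "GENE1\tG1\tliver\thepatocytes\tHigh\tEnhanced\n"
    "GENE1\tG1\tliver\tbile duct cells\tLow\tEnhanced\n"
)
for row in sketch_parse_hpa(sample):
    print(row["celltype"], row["level"])
```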

parse_hpa_rna

Parse an HPA RNA-seq dump.

Accepts either the canonical ≥v17 tidy layout (columns Gene, Gene name, Tissue, TPM; one row per gene × tissue) or the older wide layout with one TPM column per tissue (Gene, Gene name, TissueA, TissueB, ...). The wide layout is melted into the same tidy shape.
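The two layouts and the melt step can be sketched as follows, assuming the file has already been read into a list of dicts keyed by header name. sketch_tidy_rna is a hypothetical helper; detecting the tidy layout by the presence of Tissue and TPM columns is an assumption about how the two formats are told apart.

```python
def sketch_tidy_rna(records):
    """Hypothetical helper: normalize tidy or wide HPA RNA records to tidy dicts.

    records is a list of dicts keyed by the file's header names.
    """
    if not records:
        return []
    columns = list(records[0])
    if {"Tissue", "TPM"} <= set(columns):
        # Tidy layout: already one row per gene x tissue; just rename columns.
        return [
            {"gene_id": r["Gene"], "gene_name": r["Gene name"],
             "tissue": r["Tissue"], "tpm": float(r["TPM"])}
            for r in records
        ]
    # Wide layout: every column other than the two ID columns is a tissue.
    tissues = [c for c in columns if c not in ("Gene", "Gene name")]
    return [
        {"gene_id": r["Gene"], "gene_name": r["Gene name"],
         "tissue": tissue, "tpm": float(r[tissue])}
        for r in records
        for tissue in tissues
    ]


wide = [{"Gene": "GENE1", "Gene name": "G1", "liver": "12.5", "kidney": "3"}]
tidy = [
    {"Gene": "GENE1", "Gene name": "G1", "Tissue": "liver", "TPM": "12.5"},
    {"Gene": "GENE1", "Gene name": "G1", "Tissue": "kidney", "TPM": "3"},
]
print(sketch_tidy_rna(wide) == sketch_tidy_rna(tidy))  # True
```

If pandas is available, DataFrame.melt performs the same wide-to-long reshaping in one call.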

rna_gene_scores

Numeric gene scores from HPA RNA-seq TPM for one tissue.

Thin wrapper over raven_toolbox.init.score.gene_scores_from_expression, using the same clamped 5·ln(TPM/reference) scoring as elsewhere. It selects the tissue, derives a reference if none is given (the per-gene mean TPM across all tissues, which is RAVEN's default for arrayData.threshold), and returns {gene_id: score}.
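A minimal sketch of the clamped 5·ln(TPM/reference) scoring with the per-gene mean reference. The clamp bounds (lower=-5.0, upper=10.0), sending zero TPM to the lower bound, and skipping genes without a positive reference are all assumptions for illustration; the text does not state the bounds gene_scores_from_expression actually uses.

```python
import math
from statistics import mean


def sketch_reference(tidy_rows):
    """Per-gene mean TPM across all tissues (the default reference described above)."""
    per_gene = {}
    for row in tidy_rows:
        per_gene.setdefault(row["gene_id"], []).append(float(row["tpm"]))
    return {gene: mean(values) for gene, values in per_gene.items()}


def sketch_rna_gene_scores(tidy_rows, tissue, reference=None, lower=-5.0, upper=10.0):
    """Hypothetical re-implementation of clamped 5*ln(TPM/reference) scoring.

    lower/upper are illustrative clamp bounds, not values taken from the library.
    """
    if reference is None:
        reference = sketch_reference(tidy_rows)
    scores = {}
    for row in tidy_rows:
        if row["tissue"] != tissue:
            continue
        gene, tpm = row["gene_id"], float(row["tpm"])
        ref = reference.get(gene)
        if ref is None or ref <= 0:
            continue  # assumption: no usable reference means no score
        # ln(0) is undefined, so zero TPM goes straight to the lower bound.
        raw = 5.0 * math.log(tpm / ref) if tpm > 0 else -math.inf
        scores[gene] = min(max(raw, lower), upper)
    return scores


rows = [
    {"gene_id": "GENE1", "tissue": "liver", "tpm": 10.0},
    {"gene_id": "GENE1", "tissue": "kidney", "tpm": 10.0},
    {"gene_id": "GENE2", "tissue": "liver", "tpm": 20.0},
    {"gene_id": "GENE2", "tissue": "kidney", "tpm": 0.0},
    {"gene_id": "GENE3", "tissue": "liver", "tpm": 0.0},
    {"gene_id": "GENE3", "tissue": "kidney", "tpm": 4.0},
]
# GENE1: 0.0, GENE2: 5*ln(2) ~ 3.47, GENE3: -5.0 (clamped)
print(sketch_rna_gene_scores(rows, "liver"))
```

A gene expressed evenly across tissues scores 0, so with the default reference this score rewards tissue-specific expression rather than absolute abundance.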