Omics integration (Python)¶
raven-toolbox objects in raven_toolbox.omics, collected from the source of the tracked branch.
Functions¶
| Function | Summary |
|---|---|
| `hpa_gene_scores` | Numeric gene scores from HPA levels for one tissue (optionally one celltype). |
| `HPAData` | Tidy HPA proteomics data: one row per (gene, tissue, cell type). |
| `HPARnaData` | Tidy HPA RNA-seq data: one row per (gene, tissue) with TPM. |
| `parse_hpa` | Parse an HPA proteomics dump (normal_tissue.tsv; version ≥17 format). |
| `parse_hpa_rna` | Parse an HPA RNA-seq dump. |
| `rna_gene_scores` | Numeric gene scores from HPA RNA-seq TPM for one tissue. |
Reference¶
hpa_gene_scores¶
Numeric gene scores from HPA levels for one tissue (optionally one celltype).
Maps HPA's categorical levels to numbers via `level_scores` (default
`HPA_LEVEL_SCORES`). Genes absent from the tissue, or whose level is not in the
score table, are omitted from the output. Downstream,
`score_reactions_from_genes` then falls back to `no_gene_score` for any
reaction whose genes are all absent.
When a gene is reported in several cell types of the tissue, `multiple_celltype` chooses
between "best" (maximum score, the RAVEN default) and "average" (mean across cell types).
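The level mapping and the two aggregation modes can be sketched in plain Python. This is an illustrative stand-in, not the library's implementation: the function name `sketch_hpa_gene_scores`, the tuple row layout, the gene IDs, and the numeric values in `LEVEL_SCORES` are all assumptions made for the example (the actual defaults live in `HPA_LEVEL_SCORES` and may differ).

```python
from statistics import mean

# Hypothetical score table for illustration only; the library's defaults
# live in HPA_LEVEL_SCORES and may differ.
LEVEL_SCORES = {"High": 20.0, "Medium": 15.0, "Low": 10.0, "Not detected": -8.0}


def sketch_hpa_gene_scores(rows, tissue, celltype=None,
                           level_scores=LEVEL_SCORES,
                           multiple_celltype="best"):
    """Return {gene_id: score} from (gene_id, tissue, celltype, level) rows."""
    if multiple_celltype not in ("best", "average"):
        raise ValueError("multiple_celltype must be 'best' or 'average'")
    per_gene = {}
    for gene_id, row_tissue, row_celltype, level in rows:
        if row_tissue != tissue:
            continue  # other tissue
        if celltype is not None and row_celltype != celltype:
            continue  # restricted to one cell type
        if level not in level_scores:
            continue  # level missing from the score table: row is skipped
        per_gene.setdefault(gene_id, []).append(level_scores[level])
    combine = max if multiple_celltype == "best" else mean
    # Genes with no usable row never enter per_gene, so they are omitted.
    return {gene_id: combine(scores) for gene_id, scores in per_gene.items()}


# Made-up rows: ENSG02 has an unmapped level, ENSG03 is in another tissue.
rows = [
    ("ENSG01", "liver", "hepatocytes", "High"),
    ("ENSG01", "liver", "bile duct cells", "Low"),
    ("ENSG02", "liver", "hepatocytes", "Uncertain"),
    ("ENSG03", "kidney", "cells in tubules", "Medium"),
]
best = sketch_hpa_gene_scores(rows, "liver")
average = sketch_hpa_gene_scores(rows, "liver", multiple_celltype="average")
```

In this sketch only `ENSG01` survives for liver: "best" keeps the higher of its two cell-type scores, while "average" takes their mean.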
HPAData¶
Tidy HPA proteomics data: one row per (gene, tissue, cell type).
`df` columns: `gene_id`, `gene_name`, `tissue`, `celltype`, `level`,
`reliability`. `level` is the categorical string from HPA; map it to numbers with
`hpa_gene_scores`, optionally passing a custom `level_scores`.
HPARnaData¶
Tidy HPA RNA-seq data: one row per (gene, tissue) with TPM.
`df` columns: `gene_id`, `gene_name`, `tissue`, `tpm`.
expression ¶
`{gene_id: TPM}` for `tissue`. Use this directly with
`raven_toolbox.init.score.gene_scores_from_expression`.
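As a rough picture of what such a per-tissue mapping contains, the sketch below builds it from tidy rows in plain Python. The helper name `sketch_expression`, the tuple row layout, and the gene IDs are hypothetical and are not part of the library.

```python
def sketch_expression(rows, tissue):
    """Collect {gene_id: TPM} for one tissue from (gene_id, tissue, tpm) rows."""
    return {gene_id: tpm for gene_id, row_tissue, tpm in rows if row_tissue == tissue}


# Made-up tidy rows: one entry per (gene, tissue).
rows = [
    ("ENSG01", "liver", 12.5),
    ("ENSG01", "kidney", 3.0),
    ("ENSG02", "liver", 0.0),
]
liver = sketch_expression(rows, "liver")
```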
parse_hpa¶
Parse an HPA proteomics dump (normal_tissue.tsv; version ≥17 format).
Expected columns (any reasonable delimiter; HPA ships tab-separated):
`Gene`, `Gene name`, `Tissue`, `Cell type`, `Level`, `Reliability`. Returns an
`HPAData` with one row per (gene, tissue, cell type).
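To make the expected input shape concrete, here is a minimal stdlib-only sketch that reads the six columns into tidy records. It is not the library's parser: `sketch_parse_hpa`, the `COLUMNS` mapping, and the sample row are assumptions for illustration, and the result is a plain list of dicts rather than an `HPAData`.

```python
import csv
import io

# Source column header -> tidy field name (field names follow the df columns
# documented for HPAData).
COLUMNS = {
    "Gene": "gene_id",
    "Gene name": "gene_name",
    "Tissue": "tissue",
    "Cell type": "celltype",
    "Level": "level",
    "Reliability": "reliability",
}


def sketch_parse_hpa(handle, delimiter="\t"):
    """Read an HPA-style proteomics table into a list of tidy dict records."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [col for col in COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [{field: row[col] for col, field in COLUMNS.items()} for row in reader]


# Made-up single-row sample in the tab-separated layout.
sample = io.StringIO(
    "Gene\tGene name\tTissue\tCell type\tLevel\tReliability\n"
    "ENSG01\tABC1\tliver\thepatocytes\tHigh\tApproved\n"
)
records = sketch_parse_hpa(sample)
```

Checking the header up front means a file in an unexpected layout fails with a clear error instead of producing partially filled records.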
parse_hpa_rna¶
Parse an HPA RNA-seq dump.
Accepts either the canonical tidy layout of version ≥17 (`Gene`, `Gene name`, `Tissue`, `TPM`; one row per
gene × tissue) or the older wide layout with one TPM column per tissue
(`Gene`, `Gene name`, `TissueA`, `TissueB`, ...). The wide layout is melted into the same
tidy shape.
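The difference between the two layouts, and what melting the wide one means, can be shown with a short stdlib-only sketch. The function name `sketch_parse_hpa_rna`, the layout detection by header names, and the sample data are assumptions for illustration; the library's own detection logic may work differently.

```python
import csv
import io

ID_COLUMNS = ("Gene", "Gene name")


def sketch_parse_hpa_rna(handle, delimiter="\t"):
    """Return tidy (gene_id, gene_name, tissue, tpm) tuples from either layout."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    tidy = []
    if "Tissue" in header and "TPM" in header:
        # Tidy layout: already one row per gene x tissue.
        for row in reader:
            tidy.append((row["Gene"], row["Gene name"], row["Tissue"], float(row["TPM"])))
    else:
        # Wide layout: every non-ID column is a tissue; melt it to one row each.
        tissues = [col for col in header if col not in ID_COLUMNS]
        for row in reader:
            for tissue in tissues:
                tidy.append((row["Gene"], row["Gene name"], tissue, float(row[tissue])))
    return tidy


# The same made-up data in both layouts.
wide = io.StringIO("Gene\tGene name\tliver\tkidney\nENSG01\tABC1\t12.5\t3.0\n")
long = io.StringIO(
    "Gene\tGene name\tTissue\tTPM\n"
    "ENSG01\tABC1\tliver\t12.5\n"
    "ENSG01\tABC1\tkidney\t3.0\n"
)
from_wide = sketch_parse_hpa_rna(wide)
from_long = sketch_parse_hpa_rna(long)
```

Both inputs yield the same tidy rows, which is the property the melt step is meant to guarantee.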
rna_gene_scores¶
Numeric gene scores from HPA RNA-seq TPM for one tissue.
Thin wrapper over `raven_toolbox.init.score.gene_scores_from_expression` (the same
clamped 5·ln(TPM/reference) scoring used elsewhere). It selects the tissue, derives
a reference if none is given (the per-gene mean TPM across all tissues, RAVEN's default
for `arrayData.threshold`), and returns `{gene_id: score}`.
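The scoring steps described above can be sketched as follows. This is a standalone illustration, not the wrapped function: `sketch_rna_gene_scores` is a hypothetical name, the clamp bounds `lower` and `upper` are placeholder values (the source does not state the actual limits), and mapping a zero TPM to the lower bound is an assumption of this sketch.

```python
import math
from statistics import mean


def sketch_rna_gene_scores(rows, tissue, reference=None, lower=-5.0, upper=10.0):
    """Score genes as clamp(5 * ln(TPM / reference)) for one tissue.

    rows are tidy (gene_id, tissue, tpm) tuples. lower/upper are hypothetical
    clamp bounds chosen for the example.
    """
    by_gene = {}
    for gene_id, row_tissue, tpm in rows:
        by_gene.setdefault(gene_id, {})[row_tissue] = tpm
    scores = {}
    for gene_id, tpms in by_gene.items():
        if tissue not in tpms:
            continue  # gene not measured in the requested tissue
        # Default reference: this gene's mean TPM across all tissues.
        ref = mean(tpms.values()) if reference is None else reference
        if ref <= 0:
            continue  # no usable reference (assumption: skip the gene)
        tpm = tpms[tissue]
        if tpm <= 0:
            scores[gene_id] = lower  # ln(0) is undefined; assume the lower bound
        else:
            scores[gene_id] = min(upper, max(lower, 5.0 * math.log(tpm / ref)))
    return scores


# Made-up TPM values for two genes in two tissues.
rows = [
    ("ENSG01", "liver", 8.0),
    ("ENSG01", "kidney", 2.0),
    ("ENSG02", "liver", 0.0),
    ("ENSG02", "kidney", 4.0),
]
scores = sketch_rna_gene_scores(rows, "liver")
fixed_ref = sketch_rna_gene_scores(rows, "liver", reference=1.0)
```

With the per-gene mean as reference, a gene scores above zero only in tissues where it is expressed above its own cross-tissue average; a fixed `reference` instead applies one absolute threshold to every gene.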