Omics integration

MATLAB functions in RAVEN/omics of the RAVEN toolbox. Help text is collected from the source of the tracked branch.

Functions

Function Summary
parseHPA Parse a database dump of the Human Protein Atlas (HPA).
parseHPArna Parse a dump of Human Protein Atlas (HPA) RNA-Seq data.
scoreModel Score model reactions and genes from HPA and/or array data.

Reference

parseHPA

Parse a database dump of the Human Protein Atlas (HPA).

Input arguments:

Name Type Description Default
fileName char

comma- or tab-separated database dump of HPA protein data. For details regarding the format, see http://www.proteinatlas.org/about/download.

required

Name-value arguments:

Name Type Description Default
version double

accepted for backward compatibility but ignored; the format is now inferred from the column headers (default 19).

Output arguments:

Name Type Description
hpaData struct

parsed HPA data with fields:

  • genes : cell array with the unique gene IDs (Ensembl IDs for HPA version 18 and later)
  • geneNames : cell array with the gene symbols (only present for HPA version 18 and later)
  • tissues : cell array with the tissue names (not necessarily unique; one entry per tissue-cell-type combination)
  • celltypes : cell array with the cell type name for each entry in tissues
  • levels : cell array with the unique expression level labels
  • types : cell array with the unique evidence types (only for HPA versions before 18)
  • reliabilities : cell array with the unique reliability levels
  • gene2Level : sparse gene × tissue-cell-type matrix; value i,j is the index in hpaData.levels for gene i in tissue-cell-type j
  • gene2Type : sparse gene × tissue-cell-type matrix (only for HPA versions before 18)
  • gene2Reliability : sparse gene × tissue-cell-type matrix; value i,j is the index in hpaData.reliabilities for gene i in tissue-cell-type j
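
The index-based layout of gene2Level can be illustrated with a small hand-built structure. This is only a sketch: the gene IDs, tissues, and level values below are made up, and treating a zero entry as "nothing stored for this combination" is an assumption based on the matrix being sparse.

```matlab
% Toy hpaData structure built by hand (made-up values, not real HPA data)
hpaData.genes     = {'geneA'; 'geneB'};
hpaData.tissues   = {'liver'; 'liver'; 'kidney'};
hpaData.celltypes = {'hepatocytes'; 'cholangiocytes'; 'cells in tubules'};
hpaData.levels    = {'High'; 'Medium'; 'Low'; 'Not detected'};
% Rows = genes, columns = tissue-cell-type combinations.
% Each nonzero value is an index into hpaData.levels.
hpaData.gene2Level = sparse([1 2 0; 4 0 3]);

% Find the row for one gene and the column for one tissue-cell-type pair
geneIdx = find(strcmp(hpaData.genes, 'geneA'));
colIdx  = find(strcmp(hpaData.tissues, 'liver') & ...
               strcmp(hpaData.celltypes, 'cholangiocytes'));

% Translate the stored index back into a level label
levelIdx = full(hpaData.gene2Level(geneIdx, colIdx));
if levelIdx > 0
    levelLabel = hpaData.levels{levelIdx};
else
    levelLabel = '';   % assumption: zero means no entry for this combination
end
```

The same lookup pattern applies to gene2Reliability, with hpaData.reliabilities in place of hpaData.levels.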

Examples:

hpaData = parseHPA(fileName);

parseHPArna

Parse a dump of Human Protein Atlas (HPA) RNA-Seq data.

Input arguments:

Name Type Description Default
fileName char

tab-separated database dump of HPA RNA data. For details regarding the format, see http://www.proteinatlas.org/about/download.

required

Name-value arguments:

Name Type Description Default
version double

accepted for backward compatibility but ignored; the format is now inferred from the column headers (default 19).

Output arguments:

Name Type Description
arrayData struct

parsed HPA RNA data with fields:

  • genes : cell array with the unique Ensembl gene IDs
  • geneNames : cell array with the gene symbols
  • tissues : cell array with the unique tissue (or sample) names
  • levels : matrix of expression values (rows = genes, columns = tissues)
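
As a rough illustration of how the fields fit together, the sketch below builds a toy structure with the same field names and ranks the genes by expression in one tissue. All values are invented for the example; a real structure would come from parseHPArna.

```matlab
% Toy arrayData structure with the documented fields (made-up values)
arrayData.genes     = {'geneA'; 'geneB'; 'geneC'};
arrayData.geneNames = {'A'; 'B'; 'C'};
arrayData.tissues   = {'liver'; 'kidney'};
arrayData.levels    = [12.5  3.0;    % rows = genes, columns = tissues
                        0.0  8.2;
                       40.1 39.7];

% Select the column that belongs to one tissue
tissueIdx    = strcmp(arrayData.tissues, 'kidney');
kidneyLevels = arrayData.levels(:, tissueIdx);

% Rank the genes by expression in that tissue, highest first
[~, order]  = sort(kidneyLevels, 'descend');
rankedGenes = arrayData.genes(order);
```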

Examples:

arrayData = parseHPArna(fileName);

scoreModel

Score model reactions and genes from HPA and/or array data.

Scores the reactions and genes in a model based on expression data from HPA and/or gene arrays.

Input arguments:

Name Type Description Default
model struct

a model structure.

required
hpaData struct

HPA data structure from parseHPA. May be empty ([]) if arrayData is supplied.

required (positional; [] allowed)

Name-value arguments:

Name Type Description Default
arrayData struct

gene expression data structure (optional if hpaData is supplied, default []) with fields:

  • genes : cell array with the unique gene names
  • tissues : cell array with the tissue names. The list is not necessarily unique, since a tissue can have several cell types
  • celltypes : cell array with the cell type name for each entry in tissues
  • levels : GENESxTISSUES array with the expression level of each gene in each tissue/cell type. Use NaN where no measurement was performed
  • threshold : (optional) a single value or a vector of gene expression thresholds above which genes are considered "expressed". By default, the mean expression level of each gene across all tissues in arrayData is used
tissue char

tissue to score for. Should exist in either hpaData.tissues or arrayData.tissues.

celltype char

cell type to score for. Should exist in either hpaData.celltypes or arrayData.celltypes for this tissue (default is to use the best values among all the cell types for the tissue). Pass [] to keep this default when supplying later arguments.

noGeneScore double

score for reactions without genes (default -2).

multipleGeneScoring char

determines how scores are calculated for reactions with several genes, "best" or "average" (default "best").

multipleCellScoring char

determines how scores are calculated when several cell types are used, "best" or "average" (default "best").

hpaLevelScores struct

structure with numerical scores for the expression level categories from HPA. The structure should have a "names" and a "scores" field (see the source code for the default scores).
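
The two optional structures described above can be assembled by hand. The sketch below uses made-up expression values and hypothetical level scores (they are not the toolbox defaults). It also shows one way to compute a per-gene mean threshold explicitly; skipping NaN entries when averaging is an assumption, not a statement about how scoreModel handles them internally.

```matlab
% Toy expression data (made-up values); columns are tissue/cell-type combinations
arrayData.genes     = {'geneA'; 'geneB'};
arrayData.tissues   = {'liver'; 'liver'; 'kidney'};
arrayData.celltypes = {'hepatocytes'; 'cholangiocytes'; 'cells in tubules'};
arrayData.levels    = [10 20 NaN; 4 0 8];   % NaN = not measured

% Per-gene mean across all columns, skipping NaN entries (assumed NaN handling)
measured = ~isnan(arrayData.levels);
filled   = arrayData.levels;
filled(~measured) = 0;
arrayData.threshold = sum(filled, 2) ./ sum(measured, 2);

% A gene counts as "expressed" wherever its level exceeds its own threshold
isExpressed = arrayData.levels > arrayData.threshold;

% Custom scores for the HPA level categories (hypothetical values)
hpaLevelScores.names  = {'High', 'Medium', 'Low', 'Not detected'};
hpaLevelScores.scores = [3 2 1 -1];

% Each category name needs exactly one score
assert(numel(hpaLevelScores.names) == numel(hpaLevelScores.scores));

% Look up the score assigned to one category
[found, pos] = ismember('Medium', hpaLevelScores.names);
mediumScore  = hpaLevelScores.scores(pos);
```

The comparison against the threshold vector relies on implicit expansion, which requires MATLAB R2016b or later.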

Output arguments:

Name Type Description
rxnScores double

scores for each of the reactions in model.

geneScores double

scores for each of the genes in model. Genes which are not in the dataset(s) have -Inf as scores.

hpaScores double

scores for each of the genes in model if only taking hpaData into account. Genes which are not in the dataset(s) have -Inf as scores.

arrayScores double

scores for each of the genes in model if only taking arrayData into account. Genes which are not in the dataset(s) have -Inf as scores.
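
To make the "best" and "average" options and the role of noGeneScore more concrete, here is a simplified sketch of how gene scores could be rolled up into reaction scores. It is not the scoreModel implementation: the reaction-gene matrix and scores are toy values, and treating reactions whose genes all lack data like reactions without genes is an assumption.

```matlab
% Toy inputs (hypothetical): 3 reactions x 3 genes association matrix
rxnGeneMat  = logical([1 1 0;    % reaction 1 uses genes 1 and 2
                       0 0 1;    % reaction 2 uses gene 3
                       0 0 0]);  % reaction 3 has no genes
geneScores  = [20; 10; -Inf];    % -Inf marks a gene missing from the data
noGeneScore = -2;

nRxns      = size(rxnGeneMat, 1);
bestScores = zeros(nRxns, 1);
avgScores  = zeros(nRxns, 1);
for i = 1:nRxns
    s = geneScores(rxnGeneMat(i, :));
    s = s(isfinite(s));              % assumption: genes without data are skipped
    if isempty(s)
        % No usable gene score: fall back to noGeneScore
        bestScores(i) = noGeneScore;
        avgScores(i)  = noGeneScore;
    else
        bestScores(i) = max(s);      % "best"
        avgScores(i)  = mean(s);     % "average"
    end
end
```

With these inputs the two options differ only for reaction 1, which has two scored genes.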

Examples:

[rxnScores, geneScores, hpaScores, arrayScores] = scoreModel(model, ...
    hpaData, arrayData, tissue, celltype, noGeneScore, ...
    multipleGeneScoring, multipleCellScoring, hpaLevelScores);