3.4 Define the biomass composition
Metabolism must be able to synthesise the macromolecules that make up biomass. GEMs encode this in biomass pseudoreactions that specify which macromolecules (DNA, RNA, protein, lipid, carbohydrate) are required and in what amounts. Defining the biomass composition early ensures the model can biosynthesise everything it needs.
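In the template models the biomass is split over several pseudoreactions. The DNA pseudoreaction, for example, consumes the four deoxyribonucleotides and produces a DNA pseudometabolite. The macromolecule pseudometabolites produced this way are then combined in the overall biomass pseudoreaction.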
Take the pseudoreactions from a template
Like R. toruloides, H. polymorpha contains polyunsaturated fatty acids, so
the rhto-GEM biomass pseudoreactions are used as the starting point. They are
recognised by reaction names ending in 'pseudoreaction':
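% Select every rhto-GEM reaction whose name ends in 'pseudoreaction'
biomassRxns = modelRhto.rxns(endsWith(modelRhto.rxnNames, 'pseudoreaction'));
% Copy these reactions, and any metabolites they need, from rhto-GEM into the model
model = addRxnsGenesMets(model, modelRhto, biomassRxns);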
Update the stoichiometric coefficients
A spreadsheet in the repository's data/biomass/ folder details the
calculations behind the coefficients; the script loads them and applies each
group with changeRxns. The DNA pseudoreaction (r_4050), for example:
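% Negative coefficients are consumed; the last metabolite (coefficient 1) is produced
DNA.mets = {'s_0584', 's_0589', 's_0615', 's_0649', 's_3720'};
DNA.stoichCoeffs = [-0.00189, -0.00174, -0.00174, -0.00189, 1];
% eqnType 1: metabolites are matched by their IDs in model.mets
model = changeRxns(model, 'r_4050', DNA, 1);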
The same pattern updates RNA (r_4049), protein (r_4047), carbohydrate
(r_4048) and the lipid backbone/chain pseudoreactions (r_4063, r_4065).
Where the coefficients come from
Determining a coefficient combines a measured total fraction of a macromolecule with the ratio of its constituents:
| Component | Total fraction from | Constituent ratio from |
|---|---|---|
| DNA | scaled from yeast-GEM (0.24% for the smaller genome) | DNA FASTA nucleotide frequencies |
| RNA | 6% of biomass (as in yeast-GEM) | coding-sequence nucleotide frequencies (from the GenBank file) |
| Protein | measured total protein content | amino-acid frequencies in the protein FASTA |
| Carbohydrate | measured glucan / mannan / trehalose / glycogen | reported levels for H. polymorpha |
| Lipids | measured lipid classes and acyl chains | the SLIME approach (next section) |
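The combination summarised in the table can be written as a single expression. This is a general mass-balance sketch rather than a formula taken from the repository's spreadsheet. The symbols are introduced here for illustration, and it assumes that the coefficients are expressed in mmol per gram dry weight (gDW):

$$
c_i = \frac{1000 \, f \, x_i}{\sum_j x_j M_j}
$$

Here $f$ is the mass fraction of the macromolecule (g/gDW), $x_i$ is the molar fraction of constituent $i$ within it, $M_j$ is the molar mass of constituent $j$ (g/mol), and $c_i$ is the resulting coefficient (mmol/gDW). The denominator is the mean molar mass of one constituent, so $1000 f / \sum_j x_j M_j$ is the total amount of constituents per gDW, which is then divided among them according to $x_i$.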
For nucleotides, the DNA FASTA gives the frequency of each base — readily counted
with sed/grep:
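# Blank out the FASTA header lines so that only sequence remains
sed 's/>.*$//' hpo.fna > hanpo_modified.fa
# Count each base, case-insensitively
for i in A T G C; do
  printf '%s ' "$i"; grep -oi "$i" hanpo_modified.fa | wc -l
done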
Together with the paired nature of A·T and C·G and the nucleotide molecular weights, this gives the amount of each deoxyribonucleotide needed per gram of biomass. Ribonucleotide and amino-acid ratios are derived analogously from the coding sequences and the protein FASTA.
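The conversion described above can be sketched in shell. This is a minimal illustration, not the calculation in the repository's spreadsheet. The function names `fasta_freqs` and `dna_coeffs` are hypothetical, the molar masses are approximate values for the free deoxyribonucleoside monophosphates, and the result is assumed to be in mmol per gram dry weight. The spreadsheet may use different masses (for example, corrected for the water lost on polymerisation), so the output need not match the coefficients in the model.

```shell
#!/usr/bin/env bash

# fasta_freqs FILE  (hypothetical helper)
# Print the relative frequency of every sequence letter in a FASTA file.
# Works for nucleotide, coding-sequence and protein FASTA files alike.
fasta_freqs() {
  grep -v '^>' "$1" |                  # drop header lines
    tr -d '\n\r*' |                    # join lines, drop stop-codon asterisks
    tr '[:lower:]' '[:upper:]' |       # treat soft-masked letters as normal
    fold -w1 | sort | uniq -c |        # one letter per line, then count
    awk '{ n[$2] = $1; total += $1 }
         END { for (k in n) printf "%s %.4f\n", k, n[k] / total }' |
    sort
}

# dna_coeffs A T G C DNA_FRACTION  (hypothetical helper)
# Convert base counts and a DNA mass fraction (g/gDW) into mmol/gDW
# of each deoxyribonucleotide.
dna_coeffs() {
  awk -v a="$1" -v t="$2" -v g="$3" -v c="$4" -v frac="$5" 'BEGIN {
    # Double-stranded DNA: force A = T and G = C by averaging the pairs.
    at = (a + t) / 2; gc = (g + c) / 2; total = 2 * (at + gc)
    fAT = at / total; fGC = gc / total
    # ASSUMPTION: approximate molar masses (g/mol) of the free dNMPs.
    mA = 331.22; mT = 322.21; mG = 347.22; mC = 307.20
    # Mean molar mass of one nucleotide, weighted by molar fraction.
    mean = fAT * (mA + mT) + fGC * (mG + mC)
    # Total mmol of nucleotides per gDW.
    n = 1000 * frac / mean
    printf "dAMP %.5f\ndTMP %.5f\ndGMP %.5f\ndCMP %.5f\n",
           n * fAT, n * fAT, n * fGC, n * fGC
  }'
}
```

With made-up counts, `dna_coeffs 26 26 24 24 0.0024` prints 0.00191 for dAMP and dTMP and 0.00176 for dGMP and dCMP. Running `fasta_freqs` on the protein FASTA gives the amino-acid frequencies that feed the same kind of calculation for the protein pseudoreaction.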
Lipids use SLIME
Lipid representation depends on two measurement types — lipid classes (e.g. triacylglycerol, phosphatidylinositol) and acyl chains (e.g. 16:0, 18:1). This model uses the SLIME formalism (Split Lipids Into Measurable Entities), set up in the next section.
Next: Curation of lipid reactions.