How Peptide Lab computes what it computes.
Every numeric output on the Workbench is computed locally, deterministically, and from primary published methods. This page is the canonical reference for what each metric means, the equation behind it, and the source you should cite when using a number in a manuscript or filing.
Overview
Peptide Lab is a research-grade computational platform for fast peptide characterization. The pipeline has two layers:
- A pure-TypeScript chemistry kernel that reproduces ExPASy ProtParam outputs from canonical primary-literature algorithms. This runs on every keystroke and never leaves the browser.
- An optional AI reasoning layer (Anthropic Claude Opus 4.7) that streams a structured seven-phase investigation conditioned on the kernel's output. The model never recomputes the chemistry — it only interprets.
Sequence alphabet
Only the canonical 20-residue IUPAC one-letter alphabet is accepted: A C D E F G H I K L M N P Q R S T V W Y. Ambiguity codes (B, J, X, Z), selenocysteine (U) and pyrrolysine (O) are rejected at validation (lib/peptide-analysis.ts:validateSequence). Whitespace, digits, asterisks, and dashes are stripped before validation. Sequences must be 2–200 residues long.
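The validation rules above can be sketched in TypeScript as follows. This is a minimal illustration, not the code in lib/peptide-analysis.ts. The function name `validateSequenceSketch` and the `Validation` result type are hypothetical, and uppercasing lowercase input is an assumption the page does not state.

```typescript
// Canonical 20-residue IUPAC one-letter alphabet.
const ALPHABET = new Set("ACDEFGHIKLMNPQRSTVWY");

type Validation = { ok: true; sequence: string } | { ok: false; error: string };

// Hypothetical sketch of the validation rules; not the kernel's validateSequence.
function validateSequenceSketch(raw: string): Validation {
  // Strip whitespace, digits, asterisks and dashes before checking anything else.
  const cleaned = raw.replace(/[\s\d*-]/g, "");
  // Assumption: lowercase input is accepted and uppercased.
  const seq = cleaned.toUpperCase();
  for (const ch of seq) {
    // Rejects B, J, O, U, X, Z and any other non-canonical character.
    if (!ALPHABET.has(ch)) {
      return { ok: false, error: `invalid residue '${ch}'` };
    }
  }
  if (seq.length < 2 || seq.length > 200) {
    return { ok: false, error: `length ${seq.length} is outside 2-200` };
  }
  return { ok: true, sequence: seq };
}
```

For example, `"gig vlk-*12"` is cleaned to `GIGVLK`, while `"ACDX"` is rejected because of the ambiguity code.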
Molecular weight
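Mass is the sum of residue masses with one water added for the chain:
$$M = \sum_{i=1}^{n} m_{\mathrm{res}}(a_i) + m_{\mathrm{H_2O}}$$
where $a_i$ is the $i$-th residue of an $n$-residue chain and $m_{\mathrm{res}}$ is its residue (dehydrated) mass.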
Both average and monoisotopic masses are computed. Average residue masses are from the ExPASy ProtParam reference table; monoisotopic masses use the most abundant isotope of each element. Reported to two decimals for average mass and four for monoisotopic.
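A minimal sketch of the mass sum, assuming the average residue masses below match the ExPASy ProtParam table the kernel uses and using standard monoisotopic residue masses. The function name `molecularWeight` is hypothetical; rounding follows the two- and four-decimal convention stated above.

```typescript
// Average residue masses in Da (ExPASy table; assumed to match the kernel's).
const AVERAGE_RESIDUE: Record<string, number> = {
  A: 71.0788, R: 156.1875, N: 114.1038, D: 115.0886, C: 103.1388,
  E: 129.1155, Q: 128.1307, G: 57.0519, H: 137.1411, I: 113.1594,
  L: 113.1594, K: 128.1741, M: 131.1926, F: 147.1766, P: 97.1167,
  S: 87.0782, T: 101.1051, W: 186.2132, Y: 163.176, V: 99.1326,
};

// Monoisotopic residue masses in Da (most abundant isotope of each element).
const MONO_RESIDUE: Record<string, number> = {
  A: 71.03711, R: 156.10111, N: 114.04293, D: 115.02694, C: 103.00919,
  E: 129.04259, Q: 128.05858, G: 57.02146, H: 137.05891, I: 113.08406,
  L: 113.08406, K: 128.09496, M: 131.04049, F: 147.06841, P: 97.05276,
  S: 87.03203, T: 101.04768, W: 186.07931, Y: 163.06333, V: 99.06841,
};

const WATER_AVERAGE = 18.01524;
const WATER_MONO = 18.010565;

function molecularWeight(seq: string): { average: number; monoisotopic: number } {
  // Start from one water for the whole chain, then add each residue mass.
  let average = WATER_AVERAGE;
  let mono = WATER_MONO;
  for (const r of seq) {
    if (!(r in AVERAGE_RESIDUE)) throw new Error(`unknown residue ${r}`);
    average += AVERAGE_RESIDUE[r];
    mono += MONO_RESIDUE[r];
  }
  // Round as reported: two decimals (average), four decimals (monoisotopic).
  return { average: Number(average.toFixed(2)), monoisotopic: Number(mono.toFixed(4)) };
}
```

For the dipeptide GG this gives 132.12 Da (average) and 132.0535 Da (monoisotopic).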
Molecular formula
Reported as CxHyNzOwSv. Each residue contributes its residue composition, i.e. the free amino acid minus the one water released on peptide-bond formation (the –NH–CαH(R)–CO– unit); one water is then restored for the whole chain to account for the free N- and C-termini. Cysteine and methionine are the only sulfur-containing residues in the accepted alphabet.
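The formula bookkeeping can be sketched as below. The per-residue compositions are the standard dehydrated residue formulas (free amino acid minus H2O). The function name and the output convention (omit zero counts, write a count of 1 as the bare symbol) are assumptions, not the kernel's documented format.

```typescript
type Comp = [number, number, number, number, number]; // counts of C, H, N, O, S

// Residue compositions: free amino acid minus one H2O.
const RESIDUE_COMP: Record<string, Comp> = {
  A: [3, 5, 1, 1, 0], R: [6, 12, 4, 1, 0], N: [4, 6, 2, 2, 0], D: [4, 5, 1, 3, 0],
  C: [3, 5, 1, 1, 1], E: [5, 7, 1, 3, 0], Q: [5, 8, 2, 2, 0], G: [2, 3, 1, 1, 0],
  H: [6, 7, 3, 1, 0], I: [6, 11, 1, 1, 0], L: [6, 11, 1, 1, 0], K: [6, 12, 2, 1, 0],
  M: [5, 9, 1, 1, 1], F: [9, 9, 1, 1, 0], P: [5, 7, 1, 1, 0], S: [3, 5, 1, 2, 0],
  T: [4, 7, 1, 2, 0], W: [11, 10, 2, 1, 0], Y: [9, 9, 1, 2, 0], V: [5, 9, 1, 1, 0],
};
const SYMBOLS = ["C", "H", "N", "O", "S"];

function molecularFormula(seq: string): string {
  // One water (H2O) restored for the whole chain.
  const total: Comp = [0, 2, 0, 1, 0];
  for (const r of seq) {
    const comp = RESIDUE_COMP[r];
    if (!comp) throw new Error(`unknown residue ${r}`);
    comp.forEach((n, i) => { total[i] += n; });
  }
  // Formatting assumption: omit zero counts, write a count of 1 as the bare symbol.
  return total
    .map((n, i) => (n === 0 ? "" : n === 1 ? SYMBOLS[i] : `${SYMBOLS[i]}${n}`))
    .join("");
}
```

For example, `molecularFormula("GG")` returns `C4H8N2O3` (glycylglycine) and `molecularFormula("GC")` returns `C5H10N2O3S`.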
Isoelectric point (pI)
The pH at which the chain bears zero net charge. We bisect over [0, 14] until |q(pH)| < 10⁻⁴ or 60 iterations have elapsed. Side-chain pKa values are from Bjellqvist; the N- and C-terminal pKa values are 9.69 and 2.34, respectively.
Net charge & charge curve
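Each ionizable group contributes a fractional charge derived from the Henderson–Hasselbalch relation. For a basic group of pKa k:
$$q_{+}(\mathrm{pH}) = \frac{1}{1 + 10^{\,\mathrm{pH} - k}}$$
For an acidic group:
$$q_{-}(\mathrm{pH}) = -\frac{1}{1 + 10^{\,k - \mathrm{pH}}}$$
The net charge q(pH) is the sum of these terms over all ionizable side chains and both termini.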
Tyrosine and cysteine are treated as weak acids (deprotonation above pKa). The reported charge curve evaluates q(pH) on a 0.25-unit grid from pH 0 to 14.
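The charge model, the charge curve and the pI bisection described above can be sketched together. The terminal pKa values are the ones stated on this page. The side-chain values are the Bjellqvist set as commonly tabulated (for example in Biopython's `Bio.SeqUtils.IsoelectricPoint`), assumed here rather than copied from the kernel, and all function names are hypothetical.

```typescript
// Side-chain pKa values (Bjellqvist set as commonly tabulated; assumed).
const BASIC_PKA: Record<string, number> = { K: 10.0, R: 12.0, H: 5.98 };
// Tyr and Cys are treated as weak acids, as stated above.
const ACIDIC_PKA: Record<string, number> = { D: 4.05, E: 4.45, C: 9.0, Y: 10.0 };
const N_TERM_PKA = 9.69;
const C_TERM_PKA = 2.34;

// Henderson-Hasselbalch fractional charges.
const basicCharge = (pH: number, k: number) => 1 / (1 + Math.pow(10, pH - k));
const acidicCharge = (pH: number, k: number) => -1 / (1 + Math.pow(10, k - pH));

function netCharge(seq: string, pH: number): number {
  let q = basicCharge(pH, N_TERM_PKA) + acidicCharge(pH, C_TERM_PKA);
  for (const r of seq) {
    if (r in BASIC_PKA) q += basicCharge(pH, BASIC_PKA[r]);
    else if (r in ACIDIC_PKA) q += acidicCharge(pH, ACIDIC_PKA[r]);
  }
  return q;
}

// q(pH) on a 0.25-unit grid from 0 to 14 (57 points); integer steps avoid float drift.
function chargeCurve(seq: string): Array<{ pH: number; q: number }> {
  const points: Array<{ pH: number; q: number }> = [];
  for (let i = 0; i <= 56; i++) {
    const pH = i * 0.25;
    points.push({ pH, q: netCharge(seq, pH) });
  }
  return points;
}

// Bisection on [0, 14]: stop when |q| < 1e-4 or after 60 iterations.
function isoelectricPoint(seq: string): number {
  let lo = 0;
  let hi = 14;
  let mid = 7;
  for (let iter = 0; iter < 60; iter++) {
    mid = (lo + hi) / 2;
    const q = netCharge(seq, mid);
    if (Math.abs(q) < 1e-4) break;
    // q falls as pH rises: a positive charge means the root lies above mid.
    if (q > 0) lo = mid;
    else hi = mid;
  }
  return mid;
}
```

Because q(pH) decreases monotonically with pH, the sign of q at the midpoint tells the search which half-interval holds the root. For GG, which has only the two termini, the returned pI lies close to (9.69 + 2.34)/2 ≈ 6.02.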
GRAVY & hydropathy
GRAVY (Grand Average of Hydropathy) is the mean Kyte–Doolittle hydropathy index over the chain. Positive values indicate a net hydrophobic sequence; negative values, a net hydrophilic one. The per-residue hydropathy plot uses a 7-residue sliding window by default.
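A minimal sketch of GRAVY and the windowed profile using the Kyte–Doolittle values. How the kernel handles windows that overhang the termini is not stated; this sketch assumes only full windows are scored, so a sequence of length n yields n − 7 + 1 points. Function names are hypothetical.

```typescript
// Kyte-Doolittle hydropathy indices.
const KD: Record<string, number> = {
  A: 1.8, R: -4.5, N: -3.5, D: -3.5, C: 2.5, Q: -3.5, E: -3.5, G: -0.4, H: -3.2, I: 4.5,
  L: 3.8, K: -3.9, M: 1.9, F: 2.8, P: -1.6, S: -0.8, T: -0.7, W: -0.9, Y: -1.3, V: 4.2,
};

// GRAVY: arithmetic mean of the hydropathy index over the chain.
function gravy(seq: string): number {
  let sum = 0;
  for (const r of seq) {
    if (!(r in KD)) throw new Error(`unknown residue ${r}`);
    sum += KD[r];
  }
  return sum / seq.length;
}

// Mean hydropathy of each full window (assumption: partial windows are skipped).
function hydropathyProfile(seq: string, window = 7): number[] {
  const out: number[] = [];
  for (let i = 0; i + window <= seq.length; i++) {
    out.push(gravy(seq.slice(i, i + window)));
  }
  return out;
}
```

For instance, GRAVY for `IV` is (4.5 + 4.2)/2 = 4.35.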
Hydrophobic moment μH
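Eisenberg's measure of helical amphipathicity. Each residue's hydrophobicity $h_n$ is treated as a vector whose direction advances by 100° per residue (the α-helical periodicity of 3.6 residues per turn), and μH is the magnitude of the vector sum:
$$\mu_H = \left|\sum_{n=1}^{N} h_n\, e^{\,i n \delta}\right|, \qquad \delta = 100^\circ$$
A high μH indicates that one face of the α-helix is hydrophobic and the opposite face hydrophilic.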
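A sketch of the hydrophobic moment as the magnitude of the vector sum. The hydrophobicity scale is a parameter because the page does not name the scale the kernel uses, and the sketch returns the raw magnitude; whether the kernel divides by length to report a mean moment is an open assumption. The function name is hypothetical.

```typescript
// Eisenberg-style hydrophobic moment for an ideal helix (delta = 100 degrees/residue).
function hydrophobicMoment(
  seq: string,
  scale: Record<string, number>, // any per-residue hydrophobicity scale (assumed input)
  deltaDeg = 100,
): number {
  const delta = (deltaDeg * Math.PI) / 180;
  let x = 0;
  let y = 0;
  for (let n = 0; n < seq.length; n++) {
    const h = scale[seq[n]];
    if (h === undefined) throw new Error(`no scale value for ${seq[n]}`);
    // Project each residue's hydrophobicity onto its angular position on the wheel.
    x += h * Math.cos(n * delta);
    y += h * Math.sin(n * delta);
  }
  // Raw magnitude; some tools divide by seq.length to report a mean moment.
  return Math.hypot(x, y);
}
```

A uniform 18-residue stretch spans exactly five 360° turns at 100° per residue, so its vectors cancel and μH is 0; an amphipathic pattern breaks that symmetry.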
Aliphatic index
Relative volume occupied by aliphatic side chains (Ala, Val, Ile, Leu). Higher values correlate with thermostability of globular proteins.
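Ikai's aliphatic index weights each aliphatic residue by its side-chain volume relative to alanine: AI = X(Ala) + a·X(Val) + b·(X(Ile) + X(Leu)), with X in mole percent, a = 2.9 and b = 3.9 (the coefficients ProtParam documents). A minimal sketch with a hypothetical function name:

```typescript
// Aliphatic index (Ikai 1980): mole percentages weighted by relative side-chain volume.
function aliphaticIndex(seq: string): number {
  const residues = seq.split("");
  // Mole percent of one residue type in the chain.
  const molePct = (aa: string) =>
    (100 * residues.filter((r) => r === aa).length) / residues.length;
  const a = 2.9; // Val relative to Ala
  const b = 3.9; // Ile/Leu relative to Ala
  return molePct("A") + a * molePct("V") + b * (molePct("I") + molePct("L"));
}
```

For example, `LLAA` has 50 % Leu and 50 % Ala, giving 50 + 3.9 × 50 = 245.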
Instability index
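Guruprasad's instability index: the sum of dipeptide instability weight values (DIWV) over all L − 1 overlapping dipeptides, multiplied by 10/L for a sequence of length L:
$$\mathrm{II} = \frac{10}{L}\sum_{i=1}^{L-1} \mathrm{DIWV}(x_i x_{i+1})$$
Values above 40 predict in vitro instability.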
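A minimal sketch of the instability index. Guruprasad's 400-entry DIWV table is not reproduced here; the `diwv` lookup is a hypothetical parameter supplied by the caller, and the function name is likewise hypothetical.

```typescript
// Hypothetical lookup into Guruprasad's dipeptide instability weight table.
type DiwvLookup = (first: string, second: string) => number;

function instabilityIndex(seq: string, diwv: DiwvLookup): { index: number; unstable: boolean } {
  let sum = 0;
  // Walk the L - 1 overlapping dipeptides x_i x_{i+1}.
  for (let i = 0; i < seq.length - 1; i++) {
    sum += diwv(seq[i], seq[i + 1]);
  }
  const index = (10 / seq.length) * sum;
  // Values above 40 predict in vitro instability.
  return { index, unstable: index > 40 };
}
```

With a constant weight of 1 for every dipeptide, a 5-residue peptide scores 10 × 4 / 5 = 8, which makes the length scaling easy to check.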
Extinction coefficient ε₂₈₀
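Edelhoch's method, with the molar absorption coefficients for Trp, Tyr and cystine measured in water by Pace et al., gives ε₂₈₀ in M⁻¹ cm⁻¹:
$$\varepsilon_{280} = 5500\,n_{\mathrm{W}} + 1490\,n_{\mathrm{Y}} + 125\,n_{\mathrm{cystine}}$$
The "with S-S" variant assumes all cysteines pair into disulfide bridges, so $n_{\mathrm{cystine}} = \lfloor n_{\mathrm{C}}/2 \rfloor$; the "reduced" variant sets $n_{\mathrm{cystine}} = 0$ and so ignores the Cys contribution entirely.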
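A sketch of both ε₂₈₀ variants, assuming the ProtParam coefficients (Trp 5500, Tyr 1490, cystine 125 M⁻¹ cm⁻¹). The function name is hypothetical.

```typescript
// Molar absorption coefficients at 280 nm, M^-1 cm^-1 (Pace et al. values used by ProtParam).
const EXT_TRP = 5500;
const EXT_TYR = 1490;
const EXT_CYSTINE = 125;

function extinction280(seq: string): { withSS: number; reduced: number } {
  const count = (aa: string) => seq.split("").filter((r) => r === aa).length;
  const base = EXT_TRP * count("W") + EXT_TYR * count("Y");
  // "with S-S": every pair of Cys forms one cystine; an odd Cys stays unpaired.
  const cystines = Math.floor(count("C") / 2);
  return { withSS: base + EXT_CYSTINE * cystines, reduced: base };
}
```

For `WYCC` this gives 7115 with S-S and 6990 reduced; a third cysteine would not change either value.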
Boman index
Sum of free-energy residue-solubility scores divided by length. Values above ~2.48 kcal/mol are correlated with protein–protein interaction potential and frequently with antimicrobial and signaling activity.
Secondary structure
A Chou–Fasman propensity ensemble. For each residue we sum α-helix, β-strand and turn propensities over a 5-residue window centered on that position; the state with the highest summed propensity is assigned, and the confidence is encoded as the ratio of the top to second propensity, clamped to [0.5, 0.99].
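The window-and-argmax step can be sketched as below. The propensity table is a parameter (the Chou–Fasman values are not reproduced here), windows are truncated at the termini, and ties resolve helix over strand over turn; all three are assumptions. The confidence score is omitted, and the function name is hypothetical.

```typescript
type SsState = "H" | "E" | "T"; // helix, strand, turn
type Propensity = { helix: number; strand: number; turn: number };

function assignStates(seq: string, table: Record<string, Propensity>, window = 5): SsState[] {
  const half = Math.floor(window / 2);
  const states: SsState[] = [];
  for (let i = 0; i < seq.length; i++) {
    let helix = 0;
    let strand = 0;
    let turn = 0;
    // Sum propensities over a centered window, truncated at the termini (assumption).
    const start = Math.max(0, i - half);
    const end = Math.min(seq.length - 1, i + half);
    for (let j = start; j <= end; j++) {
      const p = table[seq[j]];
      if (!p) throw new Error(`no propensities for ${seq[j]}`);
      helix += p.helix;
      strand += p.strand;
      turn += p.turn;
    }
    // Highest summed propensity wins; ties favor helix, then strand (assumption).
    states.push(helix >= strand && helix >= turn ? "H" : strand >= turn ? "E" : "T");
  }
  return states;
}
```

With a toy table in which A favors helix and G favors turn, `AAAAAGGGGG` is assigned `HHHHHTTTTT`: the boundary falls where the window majority flips.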
AI investigation core
The streaming AI layer is an Anthropic Claude Opus 4.7 client invoked from app/api/investigate/route.ts. The kernel's computed report is templated into a structured prompt (see lib/prompts.ts); the model produces a seven-phase commentary:
- Sequence intake and validation
- Physicochemical interpretation
- Secondary-structure and conformational analysis
- Functional class hypothesis
- Therapeutic-window assessment
- Literature cross-reference (PMID-style identifiers, flagged as in silico)
- Recommended downstream assays
Token output streams via Server-Sent Events. The pipeline emits coarse-grained "stage" events for the kernel-side computations alongside the AI tokens. Temperature is fixed at 0.4; max_tokens at 2400.
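Each streamed event travels as a Server-Sent Events frame: an `event:` line naming the event, a `data:` line carrying the payload, and a terminating blank line. A minimal sketch of that framing. The `stage` event name comes from this page; the payload shape, the `sseFrame` name and any token event name are hypothetical.

```typescript
// Encode one SSE frame. JSON.stringify escapes newlines inside strings,
// so the serialized payload always fits on a single data: line.
function sseFrame(event: string, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```

A kernel-side progress update might then be written as `sseFrame("stage", { stage: "pI" })`, interleaved with frames that carry model tokens.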
Limits & validation
- Maximum sequence length is 200 residues. Beyond that, instability and propensity measures lose interpretability without three-dimensional context.
- Disulfide bond topology is not predicted; the "with S-S" extinction variant is therefore an upper bound.
- Post-translational modifications (phosphorylation, glycosylation, lipidation) are not modeled. Pass the unmodified sequence; annotate PTMs in the operator hypothesis.
- Homolog and literature panels on the Workbench are deterministically synthesized from a seed of the sequence hash. They mimic real BLASTP/PubMed output for demonstration but must not be cited as primary evidence.
- The platform is for research and informational use. Not validated for clinical decision-making.
References
- [1] Gasteiger E. et al. (2005). Protein Identification and Analysis Tools on the ExPASy Server. The Proteomics Protocols Handbook, Humana Press, pp. 571–607.
- [2] Bjellqvist B. et al. (1993). The focusing positions of polypeptides in immobilized pH gradients. Electrophoresis 14:1023–1031.
- [3] Kyte J. & Doolittle R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157:105–132.
- [4] Eisenberg D., Weiss R. M. & Terwilliger T. C. (1982). The helical hydrophobic moment. Nature 299:371–374.
- [5] Ikai A. (1980). Thermostability and aliphatic index of globular proteins. J. Biochem. 88:1895–1898.
- [6] Guruprasad K., Reddy B. V. B. & Pandit M. W. (1990). Correlation between stability of a protein and its dipeptide composition. Protein Eng. 4:155–161.
- [7] Edelhoch H. (1967). Spectroscopic determination of tryptophan and tyrosine in proteins. Biochemistry 6:1948–1954.
- [8] Pace C. N. et al. (1995). How to measure and predict the molar absorption coefficient of a protein. Protein Sci. 4:2411–2423.
- [9] Boman H. G. (2003). Antibacterial peptides: basic facts and emerging concepts. J. Intern. Med. 254:197–215.
- [10] Chou P. Y. & Fasman G. D. (1974). Prediction of protein conformation. Biochemistry 13:222–245.
- [11] Schiffer M. & Edmundson A. B. (1967). Use of helical wheels to represent the structures of proteins. Biophys. J. 7:121–135.
- [12] Zasloff M. (1987). Magainins, a class of antimicrobial peptides from Xenopus skin. Proc. Natl. Acad. Sci. USA 84:5449–5453.