Peptide Lab
Methodology & algorithms: peer-reviewed sources

How Peptide Lab computes what it computes.

Every numeric output on the Workbench is computed locally, deterministically, and from primary published methods. This page is the canonical reference for what each metric means, the equation behind it, and the source you should cite when using a number in a manuscript or filing.

01

Overview

Peptide Lab is a research-grade computational platform for fast peptide characterization. The pipeline has two layers:

  1. A pure-TypeScript chemistry kernel that reproduces ExPASy ProtParam outputs from canonical primary-literature algorithms. This runs on every keystroke and never leaves the browser.
  2. An optional AI reasoning layer (Anthropic Claude Opus 4.7) that streams a structured seven-phase investigation conditioned on the kernel's output. The model never recomputes the chemistry — it only interprets.
Reproducibility
All inputs and outputs of the chemistry kernel are deterministic. Given the same canonical sequence, every host computes the same numbers to the digit. AI commentary is non-deterministic but is logged with its run ID, model, and seed where applicable.
02

Sequence alphabet

Only the canonical 20-residue IUPAC 1-letter alphabet is accepted: A C D E F G H I K L M N P Q R S T V W Y. Ambiguity codes (B, J, X, Z), selenocysteine (U), and pyrrolysine (O) are rejected at validation (lib/peptide-analysis.ts:validateSequence). Whitespace, digits, asterisks, and dashes are stripped before validation. Sequences must be 2–200 residues.
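The rules above can be sketched as a single function. This is a minimal illustration, not the kernel's validateSequence: the name validateSequenceSketch, the result shape, and the upper-casing step are assumptions.

```typescript
// Canonical 20-residue alphabet listed above.
const CANONICAL = new Set("ACDEFGHIKLMNPQRSTVWY".split(""));

type Validation =
  | { ok: true; sequence: string }
  | { ok: false; error: string };

// Hypothetical stand-in for validateSequence; the real signature may differ.
function validateSequenceSketch(raw: string): Validation {
  // Strip whitespace, digits, asterisks, and dashes before validating.
  // Upper-casing is an assumption; the page does not say how case is handled.
  const sequence = raw.replace(/[\s\d*-]/g, "").toUpperCase();
  const bad = sequence.split("").filter((c) => !CANONICAL.has(c));
  if (bad.length > 0) {
    const unique = Array.from(new Set(bad)).join(", ");
    return { ok: false, error: `Non-canonical residues: ${unique}` };
  }
  if (sequence.length < 2 || sequence.length > 200) {
    return { ok: false, error: `Length ${sequence.length} is outside 2 to 200` };
  }
  return { ok: true, sequence };
}
```

Under these assumptions, an input such as "gl-y 1 2*AK" is cleaned to GLYAK and accepted, while any sequence containing X, U, or O is rejected.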

03

Molecular weight

Mass is the sum of residue masses with one water added for the chain:

MW = Σᵢ m(residueᵢ) + 18.01528

Both average and monoisotopic masses are computed. Average residue masses are from the ExPASy ProtParam reference table; monoisotopic masses use the most abundant isotope of each element. Reported to two decimals for average mass and four for monoisotopic.

Gasteiger E. et al. (2005). Protein Identification and Analysis Tools on the ExPASy Server. The Proteomics Protocols Handbook, Humana Press, pp 571–607.
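As a sketch of the average-mass sum, assuming the commonly tabulated average residue masses (the kernel's own table may differ in the last digit) and a hypothetical function name averageMass. A monoisotopic variant would swap in monoisotopic residue masses and the monoisotopic mass of water (18.01056); how the kernel handles that term is not stated on this page.

```typescript
// Average residue masses in Da (free amino acid minus one water).
// Assumed values from standard tables, not copied from the kernel.
const AVERAGE_RESIDUE_MASS: Record<string, number> = {
  G: 57.0519, A: 71.0788, S: 87.0782, P: 97.1167, V: 99.1326,
  T: 101.1051, C: 103.1388, L: 113.1594, I: 113.1594, N: 114.1038,
  D: 115.0886, Q: 128.1307, K: 128.1741, E: 129.1155, M: 131.1926,
  H: 137.1411, F: 147.1766, R: 156.1875, Y: 163.1760, W: 186.2132,
};
const WATER_AVERAGE = 18.01528; // the constant in the equation above

function averageMass(sequence: string): number {
  let total = WATER_AVERAGE; // one water for the whole chain
  for (const residue of sequence.split("")) {
    const mass = AVERAGE_RESIDUE_MASS[residue];
    if (mass === undefined) throw new Error(`Unknown residue: ${residue}`);
    total += mass;
  }
  return total;
}

// Glycylglycine: two Gly residues plus one water.
const ggMass = averageMass("GG").toFixed(2); // "132.12"
```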
04

Molecular formula

Reported as CxHyNzOwSv. Per-residue atomic composition is the side-chain plus backbone (–NH–Cα–CO–) minus the water released by peptide-bond formation; one water is restored for the whole chain. Cysteine and methionine are the only sulfur-containing residues considered.
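The bookkeeping can be illustrated as follows. The table holds residue compositions (free amino acid minus one water), and the sum starts from the single restored water. The function name molecularFormula and the choice to omit elements with a zero count are assumptions.

```typescript
// Residue compositions as [C, H, N, O, S].
const RESIDUE_ATOMS: Record<string, number[]> = {
  G: [2, 3, 1, 1, 0], A: [3, 5, 1, 1, 0], S: [3, 5, 1, 2, 0], P: [5, 7, 1, 1, 0],
  V: [5, 9, 1, 1, 0], T: [4, 7, 1, 2, 0], C: [3, 5, 1, 1, 1], L: [6, 11, 1, 1, 0],
  I: [6, 11, 1, 1, 0], N: [4, 6, 2, 2, 0], D: [4, 5, 1, 3, 0], Q: [5, 8, 2, 2, 0],
  K: [6, 12, 2, 1, 0], E: [5, 7, 1, 3, 0], M: [5, 9, 1, 1, 1], H: [6, 7, 3, 1, 0],
  F: [9, 9, 1, 1, 0], R: [6, 12, 4, 1, 0], Y: [9, 9, 1, 2, 0], W: [11, 10, 2, 1, 0],
};
const ELEMENTS = ["C", "H", "N", "O", "S"];

function molecularFormula(sequence: string): string {
  const total = [0, 2, 0, 1, 0]; // the one water restored for the chain
  for (const residue of sequence.split("")) {
    const atoms = RESIDUE_ATOMS[residue];
    if (!atoms) throw new Error(`Unknown residue: ${residue}`);
    atoms.forEach((n, i) => { total[i] += n; });
  }
  // Elements with a zero count are omitted (a display assumption).
  return ELEMENTS.map((el, i) => (total[i] > 0 ? `${el}${total[i]}` : "")).join("");
}

const ggFormula = molecularFormula("GG"); // "C4H8N2O3"
```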

05

Isoelectric point (pI)

The pH at which the chain bears zero net charge. We bisect over [0, 14] until |q(pH)| < 10⁻⁴ or 60 iterations have elapsed. Side-chain pKa values are from Bjellqvist; the N- and C-terminal pKa are 9.69 and 2.34, respectively.

find pH such that q(pH) = 0; q is monotonically decreasing in pH
Bjellqvist B. et al. (1993). The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences. Electrophoresis 14:1023–1031.
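A minimal sketch of the bisection, written against any monotonically decreasing charge function so that it does not depend on a particular pKa table. The name bisectPI is hypothetical; the bracket, tolerance, and iteration cap are the ones stated above. The toy charge function models only the two termini.

```typescript
// q must be monotonically decreasing in pH, as stated above.
function bisectPI(
  q: (pH: number) => number,
  tolerance = 1e-4,
  maxIterations = 60
): number {
  let lo = 0;
  let hi = 14;
  let mid = 7;
  for (let i = 0; i < maxIterations; i++) {
    mid = (lo + hi) / 2;
    const charge = q(mid);
    if (Math.abs(charge) < tolerance) break;
    // A positive charge means the zero crossing lies at higher pH.
    if (charge > 0) lo = mid;
    else hi = mid;
  }
  return mid;
}

// Toy example: only the termini ionize (pKa 9.69 and 2.34).
const terminiOnly = (pH: number): number =>
  1 / (1 + Math.pow(10, pH - 9.69)) - 1 / (1 + Math.pow(10, 2.34 - pH));
const toyPI = bisectPI(terminiOnly);
```

For this two-group case the exact zero is the midpoint of the two pKa values, (9.69 + 2.34) / 2 = 6.015. Because the charge curve is very flat there, the 10⁻⁴ charge tolerance is met before the pH itself has converged tightly, so the returned value lands near, not exactly on, that midpoint.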
06

Net charge & charge curve

Each ionizable group contributes a fractional charge derived from the Henderson–Hasselbalch relation. For a basic group of pKa k:

f⁺(pH) = 1 / (1 + 10^(pH − k))

For an acidic group:

f⁻(pH) = 1 / (1 + 10^(k − pH))

Tyrosine and cysteine are treated as weak acids (deprotonation above pKa). The reported charge curve evaluates q(pH) on a 0.25-unit grid from pH 0 to 14.
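The two fractional-charge terms and the 0.25-unit grid can be sketched as below. The side-chain pKa numbers are illustrative placeholders, since the kernel's Bjellqvist table is not reproduced on this page; the terminal values are the ones quoted in the pI section. The names netCharge and chargeCurve are assumptions.

```typescript
// Placeholder side-chain pKa values (assumed, for illustration only).
const BASIC_PKA: Record<string, number> = { K: 10.0, R: 12.0, H: 5.98 };
const ACIDIC_PKA: Record<string, number> = { D: 4.05, E: 4.45, C: 9.0, Y: 10.0 };
const N_TERM_PKA = 9.69; // from the pI section
const C_TERM_PKA = 2.34; // from the pI section

// Fraction of a basic group that is protonated (+1).
const fPlus = (pH: number, k: number): number => 1 / (1 + Math.pow(10, pH - k));
// Fraction of an acidic group that is deprotonated (-1).
const fMinus = (pH: number, k: number): number => 1 / (1 + Math.pow(10, k - pH));

function netCharge(sequence: string, pH: number): number {
  let q = fPlus(pH, N_TERM_PKA) - fMinus(pH, C_TERM_PKA);
  for (const r of sequence.split("")) {
    if (r in BASIC_PKA) q += fPlus(pH, BASIC_PKA[r]);
    else if (r in ACIDIC_PKA) q -= fMinus(pH, ACIDIC_PKA[r]);
  }
  return q;
}

// Evaluate q(pH) at pH 0, 0.25, ..., 14 (57 points).
function chargeCurve(sequence: string): { pH: number; charge: number }[] {
  const points: { pH: number; charge: number }[] = [];
  for (let i = 0; i <= 56; i++) {
    const pH = i * 0.25;
    points.push({ pH, charge: netCharge(sequence, pH) });
  }
  return points;
}
```

Indexing the grid by an integer and multiplying by 0.25 avoids the drift that repeated floating-point addition could introduce at the last grid point.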

07

GRAVY & hydropathy

GRAVY (Grand Average of Hydropathy) is the mean Kyte–Doolittle hydropathy index over the chain. Positive values indicate net hydrophobic; negative, hydrophilic. The sliding window used for the per-residue plot is 7 residues by default.

GRAVY = (1/N) · Σᵢ H(residueᵢ)
Kyte J. & Doolittle R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157:105–132.
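A short sketch of both the chain-wide mean and the windowed profile, using the published Kyte–Doolittle scale. The function names are hypothetical, and returning only full windows (no padding at the termini) is an assumption; the page does not say how the Workbench plot treats chain ends.

```typescript
// Kyte-Doolittle hydropathy indices (Kyte & Doolittle, 1982).
const KD: Record<string, number> = {
  A: 1.8, R: -4.5, N: -3.5, D: -3.5, C: 2.5, Q: -3.5, E: -3.5, G: -0.4,
  H: -3.2, I: 4.5, L: 3.8, K: -3.9, M: 1.9, F: 2.8, P: -1.6, S: -0.8,
  T: -0.7, W: -0.9, Y: -1.3, V: 4.2,
};

// GRAVY = (1/N) * sum of H(residue).
function gravy(sequence: string): number {
  if (sequence.length === 0) throw new Error("Empty sequence");
  let sum = 0;
  for (const residue of sequence.split("")) {
    const h = KD[residue];
    if (h === undefined) throw new Error(`Unknown residue: ${residue}`);
    sum += h;
  }
  return sum / sequence.length;
}

// Mean hydropathy of every full window of the given width (default 7).
function windowMeans(sequence: string, width = 7): number[] {
  const means: number[] = [];
  for (let start = 0; start + width <= sequence.length; start++) {
    means.push(gravy(sequence.slice(start, start + width)));
  }
  return means;
}
```

For example, gravy("AI") evaluates (1.8 + 4.5) / 2, which is 3.15 up to floating-point rounding.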
08

Hydrophobic moment μH

Eisenberg's measure of helical amphipathicity. The hydropathy vector is summed over the chain rotated by 100° per residue (the helical periodicity); a high μH indicates one face of an α-helix is hydrophobic while the opposite is hydrophilic.

μH = (1/N) · ‖ Σᵢ H(i) · exp(i · θ · √(−1)) ‖ where θ = 100°
Eisenberg D. et al. (1982). The helical hydrophobic moment: a measure of the amphiphilicity of a helix. Nature 299:371–374.
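The vector sum can be sketched as below. The function takes per-residue hydropathy values as input because this page does not state which scale the kernel feeds into μH; the name hydrophobicMoment is an assumption.

```typescript
// muH = (1/N) * | sum_i H(i) * exp(sqrt(-1) * i * theta) |
function hydrophobicMoment(h: number[], thetaDegrees = 100): number {
  if (h.length === 0) return 0;
  const theta = (thetaDegrees * Math.PI) / 180;
  let x = 0; // real part of the vector sum
  let y = 0; // imaginary part of the vector sum
  h.forEach((value, i) => {
    x += value * Math.cos(i * theta);
    y += value * Math.sin(i * theta);
  });
  return Math.sqrt(x * x + y * y) / h.length;
}
```

Two sanity checks follow from the geometry. A single residue returns the absolute value of its own hydropathy, and 18 residues of identical hydropathy return approximately zero, because 18 steps of 100° are exactly five full turns and the vectors cancel.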
09

Aliphatic index

Relative volume occupied by aliphatic side chains (Ala, Val, Ile, Leu). Higher values correlate with thermostability of globular proteins.

AI = X(A) + 2.9 · X(V) + 3.9 · (X(I) + X(L)) ; X = mole percent
Ikai A. (1980). Thermostability and aliphatic index of globular proteins. J. Biochem. 88:1895–1898.
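A direct transcription of the formula, with mole percent computed as 100 · count / N. The function name aliphaticIndex is an assumption.

```typescript
function aliphaticIndex(sequence: string): number {
  if (sequence.length === 0) throw new Error("Empty sequence");
  // Mole percent of one residue type in the chain.
  const molePercent = (residue: string): number =>
    (100 * (sequence.split(residue).length - 1)) / sequence.length;
  return (
    molePercent("A") +
    2.9 * molePercent("V") +
    3.9 * (molePercent("I") + molePercent("L"))
  );
}
```

For the tetrapeptide AVIL each residue is 25 mol %, giving 25 + 2.9 · 25 + 3.9 · 50 = 292.5.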
10

Instability index

Sum of Guruprasad's dipeptide instability values (DIWV) across all overlapping dipeptide pairs, scaled by sequence length. Values above 40 predict in vitro instability.

II = (10 / N) · Σ DIWV(xᵢ, xᵢ₊₁)
Guruprasad K. et al. (1990). Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Engineering 4:155–161.
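The summation over overlapping dipeptides can be sketched as follows. The 400-entry DIWV table is not reproduced here, so it is passed in as a parameter. The function name, the two-letter key format, and the zero fallback for missing entries are assumptions, and the weights in the usage lines are toy numbers, not Guruprasad's values.

```typescript
// Dipeptide weights keyed by two-letter strings such as "GA".
type DipeptideWeights = Record<string, number>;

// II = (10 / N) * sum of DIWV(x_i, x_i+1) over overlapping pairs.
function instabilityIndex(sequence: string, diwv: DipeptideWeights): number {
  if (sequence.length < 2) throw new Error("Need at least two residues");
  let sum = 0;
  for (let i = 0; i + 1 < sequence.length; i++) {
    const weight = diwv[sequence.slice(i, i + 2)];
    // Missing entries count as 0 here; that fallback is an assumption.
    sum += weight === undefined ? 0 : weight;
  }
  return (10 / sequence.length) * sum;
}

// Toy weights for illustration only (NOT the published DIWV values).
const toyWeights: DipeptideWeights = { GA: 1, AG: -2 };
const toyIndex = instabilityIndex("GAG", toyWeights); // (10 / 3) * (1 - 2)
```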
11

Extinction coefficient ε₂₈₀

Edelhoch's empirical relation, evaluated with the Pace et al. coefficients for proteins measured in water:

ε₂₈₀ = n(W)·5500 + n(Y)·1490 + n(C–C)·125 (M⁻¹·cm⁻¹)

The "with S-S" variant assumes all cysteines pair into disulfide bridges; the "reduced" variant ignores Cys contribution entirely.

Edelhoch H. (1967). Spectroscopic determination of tryptophan and tyrosine in proteins. Biochemistry 6:1948–1954.
Pace C. N. et al. (1995). How to measure and predict the molar absorption coefficient of a protein. Protein Science 4:2411–2423.
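Both variants follow from residue counts alone. In this sketch the number of cystines is taken as floor(n(Cys) / 2), which is one reading of "all cysteines pair"; the function name and return shape are assumptions.

```typescript
function extinction280(sequence: string): { withDisulfides: number; reduced: number } {
  const count = (residue: string): number => sequence.split(residue).length - 1;
  // Trp and Tyr contributions are common to both variants.
  const reduced = count("W") * 5500 + count("Y") * 1490;
  // Each pair of cysteines is assumed to form one cystine (125 per bridge).
  const cystines = Math.floor(count("C") / 2);
  return { withDisulfides: reduced + cystines * 125, reduced };
}

const e280 = extinction280("WYCC");
// withDisulfides: 5500 + 1490 + 125 = 7115; reduced: 5500 + 1490 = 6990
```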
12

Boman index

Sum of free-energy residue-solubility scores divided by length. Values above ~2.48 kcal/mol are correlated with protein–protein interaction potential and frequently with antimicrobial and signaling activity.

Boman = (1/N) · Σᵢ ΔG_transfer(residueᵢ)
Boman H. G. (2003). Antibacterial peptides: basic facts and emerging concepts. J. Internal Medicine 254:197–215.
13

Secondary structure

A Chou–Fasman propensity ensemble. For each residue we sum α-helix, β-strand and turn propensities over a 5-residue window centered on that position; the state with the highest summed propensity is assigned, and the confidence is encoded as the ratio of the top to second propensity, clamped to [0.5, 0.99].

Scope note
This prediction is intended for triage and visualization. For publication-quality work, cross-validate with PSIPRED or SPIDER3, or with DSSP assignments computed from an experimental or AlphaFold-predicted structure. The kernel exposes the per-residue confidence so you can filter low-confidence regions in downstream analyses.
Chou P. Y. & Fasman G. D. (1974). Prediction of protein conformation. Biochemistry 13:222–245.
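The windowed state assignment can be sketched as below. The propensity table is passed in as a parameter, windows are truncated at the chain ends, and the confidence calculation is left out; all three choices, along with the names used, are assumptions. The two-residue table in the usage lines is illustrative only.

```typescript
type State = "helix" | "strand" | "turn";
type Propensity = Record<State, number>;

const STATES: State[] = ["helix", "strand", "turn"];

// Assign each residue the state with the highest propensity summed over
// a window centered on it (truncated at the termini).
function assignStates(
  sequence: string,
  table: Record<string, Propensity>,
  windowSize = 5
): State[] {
  const half = Math.floor(windowSize / 2);
  return sequence.split("").map((_, center) => {
    const lo = Math.max(0, center - half);
    const hi = Math.min(sequence.length - 1, center + half);
    let best: State = "helix";
    let bestSum = -Infinity;
    for (const state of STATES) {
      let sum = 0;
      for (let j = lo; j <= hi; j++) sum += table[sequence[j]][state];
      if (sum > bestSum) {
        bestSum = sum;
        best = state;
      }
    }
    return best;
  });
}

// Illustrative two-residue table, not the kernel's propensity values.
const toyTable: Record<string, Propensity> = {
  A: { helix: 1.4, strand: 0.8, turn: 0.7 },
  V: { helix: 1.0, strand: 1.7, turn: 0.5 },
};
const toyStates = assignStates("AAAAAVVVVV", toyTable);
```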
14

AI investigation core

The streaming AI layer is an Anthropic Claude Opus 4.7 client invoked from app/api/investigate/route.ts. The kernel's computed report is templated into a structured prompt (see lib/prompts.ts); the model produces a seven-phase commentary:

  1. Sequence intake and validation
  2. Physicochemical interpretation
  3. Secondary-structure and conformational analysis
  4. Functional class hypothesis
  5. Therapeutic-window assessment
  6. Literature cross-reference (PMID-style identifiers, flagged as in silico)
  7. Recommended downstream assays

Token output streams via Server-Sent Events. The pipeline emits coarse-grained "stage" events for the kernel-side computations alongside the AI tokens. Temperature is fixed at 0.4; max_tokens at 2400.
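On the consuming side, a Server-Sent Events payload is a series of frames separated by blank lines, each made of "event:" and "data:" fields. The sketch below parses a complete payload into events. Only the "stage" event name comes from the description above; the function name, the payload text, and the use of the default "message" event for tokens are assumptions, and a production client would also need to handle frames split across network chunks.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Split a complete SSE payload into events (frames end with a blank line).
function parseSse(payload: string): SseEvent[] {
  return payload
    .split(/\r?\n\r?\n/)
    .filter((frame) => frame.trim() !== "")
    .map((frame) => {
      let event = "message"; // SSE default when no "event:" field is present
      const data: string[] = [];
      for (const line of frame.split(/\r?\n/)) {
        if (line.indexOf("event:") === 0) {
          event = line.slice(6).trim();
        } else if (line.indexOf("data:") === 0) {
          // One optional space after the colon is not part of the value.
          data.push(line.slice(5).replace(/^ /, ""));
        }
      }
      return { event, data: data.join("\n") };
    });
}

// Hypothetical payload: one kernel-side stage event, then one token.
const sseEvents = parseSse("event: stage\ndata: kernel\n\ndata: Hello\n\n");
```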

15

Limits & validation

  • Maximum sequence length is 200 residues. Beyond that, instability and propensity measures lose interpretability without three-dimensional context.
  • Disulfide bond topology is not predicted; the "with S-S" extinction variant assumes every cysteine is paired and is therefore an upper bound on the cystine contribution.
  • Post-translational modifications (phosphorylation, glycosylation, lipidation) are not modeled. Pass the unmodified sequence; annotate PTMs in the operator hypothesis.
  • Homolog and literature panels on the Workbench are deterministically synthesized from a seed of the sequence hash. They mimic real BLASTP/PubMed output for demonstration but must not be cited as primary evidence.
  • The platform is for research and informational use. Not validated for clinical decision-making.
16

References

  • [1] Gasteiger E. et al. (2005). Protein Identification and Analysis Tools on the ExPASy Server. The Proteomics Protocols Handbook, Humana Press, pp 571–607.
  • [2] Bjellqvist B. et al. (1993). The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences. Electrophoresis 14:1023–1031.
  • [3] Kyte J. & Doolittle R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157:105–132.
  • [4] Eisenberg D., Weiss R. M. & Terwilliger T. C. (1982). The helical hydrophobic moment: a measure of the amphiphilicity of a helix. Nature 299:371–374.
  • [5] Ikai A. (1980). Thermostability and aliphatic index of globular proteins. J. Biochem. 88:1895–1898.
  • [6] Guruprasad K., Reddy B. V. B. & Pandit M. W. (1990). Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Eng. 4:155–161.
  • [7] Edelhoch H. (1967). Spectroscopic determination of tryptophan and tyrosine in proteins. Biochemistry 6:1948–1954.
  • [8] Pace C. N. et al. (1995). How to measure and predict the molar absorption coefficient of a protein. Protein Sci. 4:2411–2423.
  • [9] Boman H. G. (2003). Antibacterial peptides: basic facts and emerging concepts. J. Intern. Med. 254:197–215.
  • [10] Chou P. Y. & Fasman G. D. (1974). Prediction of protein conformation. Biochemistry 13:222–245.
  • [11] Schiffer M. & Edmundson A. B. (1967). Use of helical wheels to represent the structures of proteins. Biophys. J. 7:121–135.
  • [12] Zasloff M. (1987). Magainins, a class of antimicrobial peptides from Xenopus skin. PNAS 84:5449–5453.