The deepest result
DNA uses 4 bases. Language uses ~40 phonemes. Proteins use 20 amino acids. Yet in each domain, evolution collapses the raw alphabet to a far smaller effective one — and with it, converges on the same geometry.
Act I
Every information-generating system writes its messages in a finite alphabet. The raw symbol counts look nothing alike.
Act II
Not every symbol is equally reachable from every other. Mutations, sound changes, and substitutions all show strong neighbour bias. When you measure the number of likely transitions per symbol, the wildly different raw alphabets collapse to much smaller effective ones.
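A minimal sketch of that measurement, assuming a made-up neighbour-biased substitution matrix (the probabilities below are illustrative, not measured data): the entropy rate of the transition process, exponentiated, gives the effective alphabet size.

```python
import math

# Hypothetical neighbour-biased substitution probabilities for one site.
# Illustrative numbers only; rows sum to 1, with most mass on few outcomes.
P = {
    "A": {"A": 0.40, "G": 0.40, "C": 0.10, "T": 0.10},
    "G": {"G": 0.40, "A": 0.40, "T": 0.10, "C": 0.10},
    "C": {"C": 0.40, "T": 0.40, "A": 0.10, "G": 0.10},
    "T": {"T": 0.40, "C": 0.40, "G": 0.10, "A": 0.10},
}

def entropy_bits(dist):
    """Shannon entropy of a probability distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Entropy rate, assuming a uniform stationary distribution over symbols.
h = sum(entropy_bits(row) for row in P.values()) / len(P)

# Effective alphabet = perplexity: the number of equally likely symbols
# that would produce the same entropy rate.
effective = 2 ** h
print(f"h = {h:.2f} bits, effective alphabet ~ {effective:.1f}")
```

With these toy numbers a 4-symbol alphabet already collapses to roughly 3.3 effective symbols; the same perplexity relation (2 to the power h) connects the h and "Effective" columns in the table below.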
Act III
The geometric state equation has zero free parameters. Feed it the measured entropy rate and the invariant dimension, and the curvature falls out.
The universal invariant is the dimension: n ≈ 2 in every system where it has been measured. The curvature varies with scale — but the form of the law does not.
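The text does not spell out the state equation's functional form. Purely as a hypothetical stand-in, not the author's equation, the sketch below uses kappa = h^2 / n, chosen only because it lands inside the genomic range quoted in the table (1.58^2 / 2 ≈ 1.25); it illustrates the zero-free-parameter pipeline, nothing more.

```python
# Hypothetical stand-in for the (unstated) geometric state equation.
# kappa = h**2 / n is NOT from the source; it is used here only because
# it is consistent with the tabulated genomic values.
def curvature(h: float, n: float) -> float:
    """Map entropy rate h (bits) and dimension n to a curvature kappa."""
    return h ** 2 / n

# Genomic inputs from the table: h = 1.58 bits, n = 2.00.
kappa = curvature(1.58, 2.00)
print(f"kappa ~ {kappa:.2f}")
```

The point of the pipeline, whatever the true equation is, is that h and n are measured and nothing is fitted: the curvature is then forced.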
| Domain | Raw alphabet | Effective | h (bits) | κ | n | Source |
|---|---|---|---|---|---|---|
| Genomic | 4 bases | ~3 | 1.58 | 1.20–1.34 | 2.00 | 45K genomes |
| Linguistic | ~40 phonemes | ~3 | 1.57–1.65 | 1.18–1.31 | — | 34 families, 106K pairs |
| Proteomic | 20 amino acids | ~7 | 2.81 | 3.80 | 2.03 | UniRef clusters |
The convergence
The raw alphabets are accidents of chemistry, articulation, and codon tables. The effective alphabets are set by physics — the number of energetically accessible neighbours at each site.
This convergence is not numerology. It is a measurable consequence of constrained channel capacity: every information hierarchy evolves toward an optimal encoding depth, and the geometry of that encoding is fixed by a single, zero-parameter equation.
h_DNA ≈ h_phoneme ≈ log₂(3) ≈ 1.58 bits → κ ≈ 1.2