
The deepest result

Why every information hierarchy looks the same

DNA uses 4 bases. Language uses ~40 phonemes. Proteins use 20 amino acids. Yet once neighbour bias is taken into account, DNA and language collapse to the same effective alphabet of about 3 symbols, proteins to about 7, and all three follow the same geometric law.


Act I

Three alphabets, wildly different

Every information-generating system writes its messages in a finite alphabet. The raw symbol counts look nothing alike.

Genomic: A, T, G, C (4 nucleotide bases)

Linguistic: p, b, t, d, k, g, m, n, s, z, f, v, l, r, j, w, i, e, a, o, u, ... (~40 phonemes, median inventory)

Proteomic: A, R, N, D, C, E, Q, G, H, I, L, K, M, F, P, S, T, W, Y, V (20 amino acids)

but look at what actually changes

Act II

The effective alphabet

Not every symbol is equally reachable from every other. Mutations, sound changes, and substitutions all show strong neighbour bias. When you measure the number of likely transitions per symbol, the raw alphabets collapse.

Genomic: A, T, G, C; example targets: T, G, C. ~3 substitutions per base (transition/transversion bias), h = 1.58 bits

Linguistic: p, b, t, d, k, g, m, n, ...; example targets: t, d, s. ~3 likely targets per phoneme (articulatory-feature channelling), h = 1.57–1.65 bits

Proteomic: A, R, N, D, C, E, Q, G, ...; example targets: S, T, A, V, I, L, D. ~7 likely replacements per residue (BLOSUM62 neighbour classes), h = 2.81 bits

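
The quoted entropies are close to what a uniform choice among the effective neighbours would give, h = log2(k). The sketch below checks that arithmetic. It assumes equally likely targets, which is a simplification suggested by the log2(3) figure quoted at the end of the page and not necessarily how the values were measured; the function name `effective_entropy_bits` is hypothetical.

```python
import math


def effective_entropy_bits(k: float) -> float:
    """Entropy in bits per symbol of a uniform choice among k targets.

    Assumption: all k effective neighbours are equally likely.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return math.log2(k)


# Effective alphabet sizes as quoted on the page.
for domain, k in [("genomic", 3), ("linguistic", 3), ("proteomic", 7)]:
    print(f"{domain}: k = {k}, h = {effective_entropy_bits(k):.2f} bits")
# genomic: k = 3, h = 1.58 bits
# linguistic: k = 3, h = 1.58 bits
# proteomic: k = 7, h = 2.81 bits
```

Under this assumption, k = 3 and k = 7 reproduce the 1.58 and 2.81 bit figures; the linguistic range of 1.57–1.65 bits brackets the k = 3 value.
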
same h → same equation → same geometry

Act III

One equation, one curvature band

The geometric state equation has zero free parameters. Feed it the measured entropy rate and the invariant dimension, and the curvature falls out.

κ = (h ln 2 / (n − 1))²

The universal invariant is the dimension: n ≈ 2 across every system tested (2.00 for genomes, 2.03 for proteins). The curvature varies with scale, but the form of the law does not.

| Domain | Raw alphabet | Effective alphabet | h (bits) | κ | n | Source |
|---|---|---|---|---|---|---|
| Genomic | 4 bases | ~3 | 1.58 | 1.20–1.34 | 2.00 | 45K genomes |
| Linguistic | ~40 phonemes | ~3 | 1.57–1.65 | 1.18–1.31 | | 34 families, 106K pairs |
| Proteomic | 20 amino acids | ~7 | 2.81 | 3.80 | 2.03 | UniRef clusters |
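
As a quick check on the table, the state equation κ = (h ln 2 / (n − 1))² can be evaluated directly from the listed entropy rates. This is a minimal sketch: the function name `curvature` and the default n = 2 are choices made here for illustration, not part of the original analysis.

```python
import math


def curvature(h_bits: float, n: float = 2.0) -> float:
    """Evaluate kappa = (h * ln 2 / (n - 1))**2 for entropy rate h in bits."""
    if n == 1:
        raise ValueError("n = 1 makes the denominator zero")
    return (h_bits * math.log(2) / (n - 1)) ** 2


# Entropy rates taken from the table; n defaults to 2.
rows = [
    ("genomic", 1.58),
    ("linguistic (low)", 1.57),
    ("linguistic (high)", 1.65),
    ("proteomic", 2.81),
]
for label, h in rows:
    print(f"{label}: h = {h} bits, kappa = {curvature(h):.2f}")
# genomic: h = 1.58 bits, kappa = 1.20
# linguistic (low): h = 1.57 bits, kappa = 1.18
# linguistic (high): h = 1.65 bits, kappa = 1.31
# proteomic: h = 2.81 bits, kappa = 3.79
```

With n = 2 the linguistic range and the lower genomic value are reproduced, and the proteomic result lands near the tabulated 3.80. The `n` argument is exposed because the table lists n = 2.03 for proteins; `curvature(2.81, 2.03)` gives about 3.58, so the tabulated proteomic κ appears to correspond to n = 2.
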

The convergence

The raw alphabets are accidents of chemistry, articulation, and codon tables. The effective alphabets are set by physics — the number of energetically accessible neighbours at each site.

This convergence is not numerology. It is a measurable consequence of constrained channel capacity: every information hierarchy evolves toward an optimal encoding depth, and the geometry of that encoding is fixed by a single, zero-parameter equation.

h_DNA ≈ h_phoneme ≈ log2(3) ≈ 1.58 bits  →  κ ≈ 1.2