The deepest result
DNA uses 4 bases. Language uses ~40 phonemes. Proteins use 20 amino acids. Yet in each domain, evolution collapses the raw alphabet to a far smaller effective one — and with it, converges on the same geometry.
Act I
Every information-generating system writes its messages in a finite alphabet. The raw symbol counts look nothing alike.
Act II
Not every symbol is equally reachable from every other. Mutations, sound changes, and substitutions all show strong neighbour bias. When you measure the number of likely transitions per symbol, the wildly different raw alphabets collapse to much smaller effective ones.
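A minimal sketch of that measurement, assuming a made-up neighbour-biased substitution matrix (the probabilities below are illustrative, not measured data): the entropy rate of the transition process, exponentiated, gives the effective alphabet size.

```python
import math

# Hypothetical neighbour-biased substitution probabilities for one site.
# Illustrative numbers only; rows sum to 1, with most mass on few outcomes.
P = {
    "A": {"A": 0.40, "G": 0.40, "C": 0.10, "T": 0.10},
    "G": {"G": 0.40, "A": 0.40, "T": 0.10, "C": 0.10},
    "C": {"C": 0.40, "T": 0.40, "A": 0.10, "G": 0.10},
    "T": {"T": 0.40, "C": 0.40, "G": 0.10, "A": 0.10},
}

def entropy_bits(dist):
    """Shannon entropy of a probability distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Entropy rate, assuming a uniform stationary distribution over symbols.
h = sum(entropy_bits(row) for row in P.values()) / len(P)

# Effective alphabet = perplexity: the number of equally likely symbols
# that would produce the same entropy rate.
effective = 2 ** h
print(f"h = {h:.2f} bits, effective alphabet ~ {effective:.1f}")
```

With these toy numbers a 4-symbol alphabet already collapses to roughly 3.3 effective symbols; the same perplexity relation (2 to the power h) connects the h and "Effective" columns in the table below.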
Act III
The geometric state equation has zero free parameters. Feed it the measured entropy rate and the invariant dimension, and the curvature falls out.
The universal invariant is the dimension: n ≈ 2 in every system where it has been measured. The curvature varies with scale — but the form of the law does not.
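The text does not spell out the state equation's functional form. Purely as a hypothetical stand-in, not the author's equation, the sketch below uses kappa = h^2 / n, chosen only because it lands inside the genomic range quoted in the table (1.58^2 / 2 ≈ 1.25); it illustrates the zero-free-parameter pipeline, nothing more.

```python
# Hypothetical stand-in for the (unstated) geometric state equation.
# kappa = h**2 / n is NOT from the source; it is used here only because
# it is consistent with the tabulated genomic values.
def curvature(h: float, n: float) -> float:
    """Map entropy rate h (bits) and dimension n to a curvature kappa."""
    return h ** 2 / n

# Genomic inputs from the table: h = 1.58 bits, n = 2.00.
kappa = curvature(1.58, 2.00)
print(f"kappa ~ {kappa:.2f}")
```

The point of the pipeline, whatever the true equation is, is that h and n are measured and nothing is fitted: the curvature is then forced.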
| Domain | Raw alphabet | Effective | h (bits) | κ | n | Source |
|---|---|---|---|---|---|---|
| Genomic | 4 bases | ~3 | 1.58 | 1.20–1.34 | 2.00 | 45K genomes |
| Linguistic | ~40 phonemes | ~3 | 1.57–1.65 | 1.18–1.31 | — | 34 families, 106K pairs |
| Proteomic | 20 amino acids | ~7 | 2.81 | 3.80 | 2.03 | UniRef clusters |
The convergence
The raw alphabets are accidents of chemistry, articulation, and codon tables. The effective alphabets are set by physics — the number of energetically accessible neighbours at each site.
This convergence is not numerology. It is a measurable consequence of constrained channel capacity: every information hierarchy evolves toward an optimal encoding depth, and the geometry of that encoding is fixed by a single, zero-parameter equation.
h_DNA ≈ h_phoneme ≈ log₂(3) ≈ 1.58 bits → κ ≈ 1.2