Derive κ on a Chalkboard

Four steps. One number. No computer.

If κ = (h ln 2 / (n−1))² is truly a universal law with zero free parameters, you shouldn't need a neural network to find it. You should be able to derive it with a pencil and the number 3.

Step 1

Strip the equation bare

For a branching tree, n = 2, so the denominator (n−1) equals 1 and drops out. The factor ln 2 just converts bits to nats: h_nats = h ln 2.

The equation underneath everything is:

κ = (h_nats)²

The curvature of any information-generating hierarchy is the square of its entropy rate. That's it.
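The reduction can be checked numerically. This is a minimal sketch, assuming h is an entropy rate in bits and treating n as a plain parameter; the function name kappa is a hypothetical helper for illustration, not part of any library.

```python
import math

def kappa(h_bits: float, n: int = 2) -> float:
    """Full formula from the text: (h * ln 2 / (n - 1)) ** 2."""
    if n < 2:
        raise ValueError("n must be at least 2 so that n - 1 is nonzero")
    h_nats = h_bits * math.log(2)  # convert bits to nats
    return (h_nats / (n - 1)) ** 2

# Example entropy rate (an assumption for illustration): log2(3) bits.
h_bits = math.log2(3)
h_nats = h_bits * math.log(2)

full = kappa(h_bits, n=2)   # full formula with n = 2
bare = h_nats ** 2          # stripped form: kappa = (h_nats)^2
print(full, bare)
```

With n = 2 the two values agree to floating-point precision, which is all the "stripping" step claims.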

Step 2

Count the mutation paths

DNA has 4 bases: A, C, G, T. When a base mutates, it changes to one of the other 3. An A can become C, G, or T — never A again. From any position in the genome, evolution has exactly 3 branching paths.

The entropy rate of a process with 3 equiprobable choices:

h_nats = ln 3 ≈ 1.0986
— or in bits: log₂ 3 ≈ 1.585

Plug it in:

κ = (ln 3)²
κ = (1.0986)²
κ ≈ 1.207

1.207

Theoretical floor — pure substitution on a 4-letter code
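The same number falls out of the Shannon entropy formula directly. The sketch below assumes the three substitutions are exactly equiprobable, as in the step above; entropy_nats is a hypothetical helper name.

```python
import math

def entropy_nats(probs):
    """Shannon entropy in nats of a discrete probability distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p) for p in probs if p > 0)

# Three equiprobable mutation targets (assumption taken from the text).
probs = [1 / 3, 1 / 3, 1 / 3]

h = entropy_nats(probs)   # equals ln 3 for a uniform 3-way choice
kappa_floor = h ** 2      # curvature under kappa = (h_nats)^2

print(round(h, 4), round(kappa_floor, 3))
```

For a uniform distribution over K outcomes the entropy is ln K, which is why the chalkboard version only needs the number 3.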

Step 3

Add the biology

Real DNA doesn't just substitute letters. Mutations also insert and delete bases (indels). This gives evolution slightly more informational freedom than pure substitution. The effective branching factor isn't exactly 3. It's closer to 3.05.

κ = (ln 3.05)²
κ = (1.1151)²
κ ≈ 1.244

The neural encoder trained on 5,550 genomes — with no knowledge of this derivation — finds an optimal curvature range of κ ≈ 1.28–1.34. Direct tree embedding of 107,000 bacterial species gives higher κ at the intra-domain scale. Both sit above our theoretical floor of 1.207, exactly as the biology predicts.

The gap between 1.207 and the measured range is not error. It is the informational signature of indels, context-dependent mutation, and structural variation — real biology beyond pure substitution.
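One way to read that gap is to run the equation backwards and ask what effective branching factor each curvature implies. This is a sketch under the section's own assumption that κ = (ln K_eff)²; the function names are hypothetical, and the input values are the ones quoted above.

```python
import math

def kappa_from_k(k_eff: float) -> float:
    """Forward map used in this section: kappa = (ln K_eff) ** 2."""
    return math.log(k_eff) ** 2

def k_from_kappa(kappa: float) -> float:
    """Inverse map: K_eff = exp(sqrt(kappa)), valid for K_eff >= 1."""
    return math.exp(math.sqrt(kappa))

print(round(kappa_from_k(3.0), 3))    # pure substitution floor
print(round(kappa_from_k(3.05), 3))   # with the indel adjustment

# Measured range quoted for the neural encoder.
k_low = k_from_kappa(1.28)
k_high = k_from_kappa(1.34)
print(round(k_low, 2), round(k_high, 2))
```

Under these assumptions the measured range of 1.28 to 1.34 corresponds to an effective branching factor of roughly 3.10 to 3.18.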

Step 4

Break it with proteins

If this works only for DNA, it's a coincidence. Proteins use a completely different alphabet: 20 amino acids. If the equation is universal, it must predict the protein curvature from the protein alphabet alone.

Naively, 20 amino acids means 19 possible substitutions. That would give κ = (ln 19)² ≈ 8.67. But the measured value is κ ≈ 3.80. Why?

Because biochemistry constrains which substitutions are viable. Mutating a hydrophobic core residue to a charged polar residue destroys the protein. Amino acids cluster into ~6–7 functional groups (for example, the six Dayhoff classes; substitution matrices such as BLOSUM62 quantify which exchanges are tolerated). The effective substitution alphabet isn't 19. It's ~7.

κ = (ln 7)²
κ = (1.9459)²
κ ≈ 3.79
— measured: 3.80 ± 0.60

3.79 → 3.80

Predicted from the effective alphabet size alone. Measured independently.

Two alphabets. Two effective sizes. Two curvatures. One equation. Zero parameters.
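The protein comparison can be sketched as a simple interval check. The code assumes κ = (ln K_eff)² and uses the measured value and uncertainty quoted above; the candidate alphabet sizes 6, 7, and 19 come from the text, and the helper names are hypothetical.

```python
import math

MEASURED = 3.80      # measured protein curvature quoted in the text
UNCERTAINTY = 0.60   # quoted uncertainty

def kappa_from_k(k_eff: float) -> float:
    """kappa = (ln K_eff) ** 2, the form used throughout this section."""
    return math.log(k_eff) ** 2

def within_measurement(kappa: float) -> bool:
    """True if kappa lies inside MEASURED +/- UNCERTAINTY."""
    return abs(kappa - MEASURED) <= UNCERTAINTY

results = {}
for k_eff in (6, 7, 19):
    kappa = kappa_from_k(k_eff)
    results[k_eff] = (round(kappa, 2), within_measurement(kappa))
    print(k_eff, results[k_eff])
```

With the quoted uncertainty, both 6 and 7 functional groups land inside the measured interval, while the naive count of 19 does not.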

What this means

The geometry is downstream of the branching

The equation doesn't know what a nucleotide is. It doesn't know what an amino acid is. It doesn't know what a phoneme is.

It knows one thing: how many paths branch from each node.

That number — the effective alphabet — determines the entropy rate. The entropy rate determines the curvature. The curvature determines the geometry. Everything else is substrate.

κ = (ln K_eff)²

DNA: K_eff ≈ 3 → κ ≈ 1.2
Proteins: K_eff ≈ 7 → κ ≈ 3.8
Language: K_eff ≈ 3 → κ ≈ 1.2

Language has ~40 phonemes. DNA has 4 bases. Both funnel through ~3 effective transitions: DNA because each base has only three alternatives, language by articulatory channelling. Different physics. Same bottleneck. Same curvature.
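The three-row summary can be regenerated from the effective alphabet sizes alone. This is a small sketch assuming the K_eff values stated above; the dictionary layout is only for illustration.

```python
import math

# Effective alphabet sizes as stated in the text.
K_EFF = {"DNA": 3, "Proteins": 7, "Language": 3}

# kappa = (ln K_eff) ** 2, rounded to one decimal as in the summary.
table = {name: round(math.log(k) ** 2, 1) for name, k in K_EFF.items()}

for name, kappa in table.items():
    print(f"{name}: K_eff = {K_EFF[name]}, kappa = {kappa}")
```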

The curvature of the tree of life
is the natural logarithm
of its effective mutation paths,
squared.

The equation doesn't distinguish substrate.
It distinguishes structure.

Tat tvam asi.