M31 Field Analysis: Mathematics of Native Privacy Proving

Companion: Crypto Primitives

1. The Mersenne-31 Field

1.1 Definition

M31: p = 2^31 - 1 = 2,147,483,647 (Mersenne prime)

Arithmetic:
  Addition:       a + b mod p
  Subtraction:    a - b mod p
  Multiplication: a × b mod p  →  (a × b) mod (2^31 - 1)

  Key property: reduction mod (2^31 - 1) is a single shift + add:
    x mod p = (x & 0x7FFFFFFF) + (x >> 31)

  Cost: ~1-2 CPU clock cycles per multiplication (vs ~20 for 254-bit fields)

1.2 Extension Tower

M31 is extended to provide sufficient security margin:

M31 (31 bits)
  │
  └── CM31 = M31[i] / (i² + 1)      (62 bits, 2 M31 elements)
        │
        └── QM31 = CM31[u] / (u² - 2 - i)  (124 bits, 4 M31 elements)

QM31 element: (a + bi) + (c + di)u where a, b, c, d ∈ M31

QM31 multiplication:

(a₁ + b₁i + c₁u + d₁iu) × (a₂ + b₂i + c₂u + d₂iu)

Requires: 16 M31 multiplications + 16 M31 additions
(using Karatsuba: 9 M31 multiplications + ~20 additions)

1.3 Security Margin

Extension	Bits	Security (multiplicative group)	Sufficient?
M31	31	~15 bits	No
CM31	62	~31 bits	No
QM31	124	~62 bits (multiplicative)	Marginal
QM31 (STARK proof)	124	~124 bits (hash-based)	Yes

Critical distinction:

For discrete-log-based crypto (ElGamal, Pedersen over EC), QM31's multiplicative group provides only ~62 bits of security — insufficient.
For hash-based crypto (Poseidon commitments, STARK proofs), security depends on the hash output size, not the field's multiplicative group order. Poseidon over QM31 with 4-element output provides ~124 bits — sufficient.

Implication: VM31 privacy MUST use hash-based primitives (Poseidon commitments, Poseidon Merkle trees) rather than discrete-log-based primitives (ElGamal, Pedersen EC commitments). This is not a limitation — it's the direction modern ZK protocols (Aztec 3, Mina) are already moving.

2. M31 vs felt252: Cost Comparison

2.1 Field Operation Costs

Operation	M31	felt252	M31/felt252 Ratio
Addition	1 cycle	4 cycles	4x faster
Multiplication	1-2 cycles	20 cycles	10-20x faster
Inversion	~310 muls (Fermat)	~251 muls (Fermat)	~equivalent
Memory per element	4 bytes	32 bytes	8x less

2.2 Poseidon Hash Comparison

Poseidon-252 (Starknet native):

State width: 3 felt252 elements
Full rounds: 8 (MDS + S-box on all elements)
Partial rounds: 83 (MDS + S-box on 1 element)
Total multiplications: ~91 × 3 × 3 = ~819 felt252 muls
                     = ~819 × 20 cycles = ~16,380 CPU cycles

Poseidon-M31 (VM31 native):

State width: 16 M31 elements (wider state for security)
Full rounds: 8
Partial rounds: 14
Total multiplications: 22 × 16 × 16 = ~5,632 M31 muls
                     = ~5,632 × 1.5 cycles = ~8,448 CPU cycles

But: GPU can parallelize the 16-wide state operations
GPU-accelerated: ~300-500 effective cycles per hash

Poseidon-252 emulated in M31 (the approach we're avoiding):

Each felt252 multiplication in M31 limbs:
  felt252 = 8 × M31 limbs (32 bits each)
  Schoolbook multiply: 64 M31 muls + 15 carry propagations
  Reduction mod p_252: ~20 M31 muls
  Total: ~84 M31 muls per felt252 mul

Poseidon-252 in M31: 819 × 84 = ~68,796 M31 muls

Overhead vs native Poseidon-M31: 68,796 / 5,632 = ~12.2x
Overhead vs native Poseidon-252: 68,796 × 1.5 / 16,380 = ~6.3x

2.3 Aggregate Circuit Cost Comparison

Confidential transfer (1 transaction):

Component	felt252 (emulated in M31)	M31 (native)	Speedup
Poseidon hash (nullifier)	68,796 M31 muls	5,632 M31 muls	12.2x
Merkle proof (20 levels)	1,375,920 M31 muls	112,640 M31 muls	12.2x
Range proof (64-bit)	2,048 M31 muls	2,048 M31 muls	1x
Balance proof	2,048 M31 muls	2,048 M31 muls	1x
Commitment verification	68,796 M31 muls	5,632 M31 muls	12.2x
Total	~1,517,608	~128,000	~12x

With GPU acceleration (100x throughput): ~1.28ms per transaction (native M31).

Batch of 1000 transactions:

M31 native: ~128M M31 muls → ~1.3s on GPU
felt252 emulated: ~1.52B M31 muls → ~15s on GPU
Noir (BN254, CPU): ~45 minutes

3. Poseidon-M31 Specification

3.1 Why Poseidon-M31 (Not Poseidon-252)

Property	Poseidon-252	Poseidon-M31
Field	felt252	M31
Native to prover	No (emulated)	Yes
Native to Starknet	Yes (builtin)	No (verified via STARK)
Cost in M31 circuit	~68,800 M31 muls	~5,600 M31 muls
Security level	~252 bits	~124 bits (QM31 output)
GPU parallelism	Poor (wide limbs)	Excellent (SIMD-friendly)

3.2 Parameter Selection

Following the Poseidon paper's security analysis for the M31 field:

t = 16          (state width in M31 elements)
d = 5           (S-box degree: x^5 is the lowest non-trivial power coprime to p-1)
R_f = 8         (full rounds, provides margin against statistical attacks)
R_p = 14        (partial rounds, sufficient for algebraic attack resistance)

Security analysis:

Algebraic attacks: min(R_f + R_p, d^R_p) > 2^124 ✓
Statistical attacks: R_f ≥ 6 + log_d(t) ≈ 6 + 1.7 = 8 ✓
Interpolation attacks: d^(R_f + R_p) > p^t ✓

3.3 Output Encoding

For 124-bit security:

Hash output = 4 M31 elements = 1 QM31 element
  = 4 × 31 bits = 124 bits of output

Commitment: H(value, blinding) = Poseidon_M31(value || blinding || padding)[0..4]
Nullifier:  H(secret, index)  = Poseidon_M31(secret || index || padding)[0..4]

3.4 Domain Separation

COMMITMENT_DOMAIN = 0x434F4D4D  // "COMM"
NULLIFIER_DOMAIN  = 0x4E554C4C  // "NULL"
MERKLE_DOMAIN     = 0x4D45524B  // "MERK"
ENCRYPTION_DOMAIN = 0x454E4352  // "ENCR"

Usage: state[0] = domain_tag before first permutation

4. Elliptic Curves Over M31 — Why We Don't Need Them

4.1 The Traditional Approach

Traditional privacy protocols (Zcash, Tongo, Obelysk v1) use elliptic curve cryptography:

Pedersen commitments: C = v·G + r·H (requires EC point multiplication)
ElGamal encryption: (r·G, m·H + r·PK) (requires EC point multiplication)
Schnorr proofs: (k·G, s = k - e·x) (requires EC scalar mul)

These require a curve with a large prime-order subgroup (128+ bits). Over M31:

The Circle group has order 2^31 — too small
Over CM31: order ~2^62 — still too small
Over QM31: possible but complex and uncharted territory

4.2 The Hash-Based Alternative

Modern ZK protocols are moving away from EC-based primitives toward hash-based alternatives:

EC-Based	Hash-Based Alternative	Security Basis
Pedersen commitment: `v·G + r·H`	Poseidon commitment: `H(v, r)`	Hash collision resistance
ElGamal encryption	Symmetric encryption: `E_k(m) = m ⊕ H(k, nonce)` + ZK proof	PRF security
Schnorr proof of knowledge	STARK proof of hash preimage	Soundness of STARK
EC-based Merkle tree	Poseidon-M31 Merkle tree	Hash collision resistance

Key insight: In a STARK-based system, you don't need interactive proofs (Schnorr). The STARK itself proves everything — knowledge of preimages, correct computation, range validity. Interactive EC-based proofs are unnecessary overhead.

4.3 Homomorphic Properties

What we lose: Pedersen commitments are additively homomorphic (C(a) + C(b) = C(a+b)). Poseidon commitments are not.

Why it doesn't matter: Homomorphic commitments are useful for on-chain balance updates without proofs. In the VM31 model, ALL state transitions are proved in ZK — the contract just verifies the STARK proof and updates the state root. No homomorphic operations needed on-chain.

Old model (Tongo/Obelysk v1):
  On-chain:  encrypted_balance = encrypted_balance ⊕ encrypted_amount
  (Homomorphic, no proof needed for update itself)

New model (VM31):
  Off-chain: prove(old_balance - amount >= 0, new_commitment = H(new_balance, new_r))
  On-chain:  verify STARK proof, update state_root
  (No homomorphism needed, proof covers everything)

5. Bigint Arithmetic in M31 (For Interoperability)

5.1 When You Need felt252 Anyway

Some operations inherently require felt252 arithmetic:

Interacting with Starknet storage (contract addresses are felt252)
Verifying existing Poseidon-252 hashes (for bridge proofs)
Interoperating with ERC-20 token amounts (uint256)

5.2 Limb Representation

felt252 in M31 limbs:
  f = f₇·B⁷ + f₆·B⁶ + f₅·B⁵ + f₄·B⁴ + f₃·B³ + f₂·B² + f₁·B + f₀

Where:
  B = 2^32 (limb base, fits in M31 with 1 bit headroom)
  f_i ∈ [0, 2^32) ⊂ M31
  8 limbs × 32 bits = 256 bits ≥ 252 bits

Multiplication:
  Schoolbook: 64 M31 muls + carry propagation
  Karatsuba: 44 M31 muls + more additions

Reduction mod p_252:
  ~20 M31 additions (Barrett reduction)

5.3 When To Use Bigint vs Native

Operation	Use Native M31	Use Bigint (felt252 in M31)
Privacy commitments	✓
Nullifier derivation	✓
Merkle trees	✓
Range proofs	✓
ERC-20 amount handling		✓ (uint256 token amounts)
Contract address verification		✓ (Starknet addresses)
Bridge proofs		✓ (cross-chain hash verification)

Design principle: Keep the privacy circuit in native M31. Only use bigint for the "edges" where the privacy system interfaces with Starknet's felt252 world.

6. Security Analysis

6.1 Collision Resistance of Poseidon-M31

For a Poseidon hash with t=16 state width and 4 M31 output elements:

Output space: |M31|⁴ = (2^31 - 1)⁴ ≈ 2^124

Birthday bound for collision: √(2^124) = 2^62 hash evaluations

Security level: ~124 bits against generic attacks
               ~124 bits against algebraic attacks (R_p ≥ 14)

124 bits vs 128 bits: The industry standard is 128-bit security. QM31 provides 124 bits — a 4-bit shortfall. In practice:

2^124 operations ≈ $10^25 at current compute costs (infeasible)
2^128 operations ≈ 16 × 2^124 (16x harder — marginally better)
For comparison, Bitcoin's PoW provides ~80 bits of security
Conclusion: 124 bits is sufficient for all practical purposes

6.2 Preimage Resistance

Poseidon-M31 with t=16 input and 4-element output:

Preimage: find x such that H(x) = y
Generic attack: 2^124 evaluations (same as collision)
No known algebraic shortcuts for Poseidon with sufficient rounds

6.3 STARK Soundness

The STWO STARK proof system provides:

Security = log_blowup × n_queries + pow_bits
         = 8 × 14 + 20
         = 132 bits

This is INDEPENDENT of the M31 field size.
The STARK proof's soundness comes from FRI, not from the field's group structure.

6.4 Comparison With Competing Systems

System	Proof Security	Crypto Security	Trusted Setup
Groth16 (BN254)	128 bits	128 bits (EC)	Yes
UltraPlonk (Noir)	128 bits	128 bits (EC)	Yes (universal)
STARK (STWO/M31)	132 bits	124 bits (hash)	No
Bulletproofs (Curve25519)	N/A (transparent)	128 bits (EC)	No

VM31 trades 4 bits of crypto security for no trusted setup and 500-2000x performance. This is an excellent tradeoff.

7. GPU Acceleration Properties of M31

7.1 SIMD Friendliness

M31 element: 32 bits (fits in a single int32)

SIMD packing:
  AVX-512:  16 × M31 per register (512 / 32)
  AVX2:      8 × M31 per register (256 / 32)
  NEON:      4 × M31 per register (128 / 32)

Compare felt252:
  AVX-512:  2 × felt252 per register (512 / 256)

M31 processes 8x more elements per SIMD instruction than felt252.

7.2 GPU Thread Utilization

CUDA warp: 32 threads × 32-bit M31 = 1024 bits of field data per warp
           vs  32 threads × 256-bit felt252 = 8192 bits per warp

M31 requires 8x less memory bandwidth per field element.
At memory-bandwidth-bound operations: 8x more throughput.

For compute-bound operations (e.g., multiplication):
  M31 mul: 1 instruction (IMUL32)
  felt252 mul: ~15 instructions (limb multiply + carry chain)

  15x more operations per thread × 8x less memory = ~120x total throughput advantage

7.3 Existing CUDA Kernels (Already Built)

M31 Kernels in stwo-ml/gpu_sumcheck.rs:
  m31_gemm_kernel         — Matrix multiply (16×16 blocks)
  m31_add_kernel          — Element-wise addition
  m31_mul_kernel          — Element-wise multiplication
  m31_relu_kernel         — ReLU activation
  m31_restrict_rows_kernel — Fused MLE restrict
  m31_restrict_cols_kernel — Fused MLE restrict
  logup_denominator_kernel — LogUp denominator computation
  logup_3way_round_kernel  — Degree-3 round polynomial
  logup_4way_reduce_kernel — Cross-block reduction
  logup_3way_fold_kernel   — Simultaneous 3-MLE fold
  combine_blocks_kernel    — SIMD block batching

All kernels reusable for crypto circuits without modification.

8. Summary of Key Mathematical Results

Property	Value	Implication
M31 mul cost	~1.5 CPU cycles	10-20x faster than BN254
QM31 security	~124 bits	Sufficient (industry standard is 128)
Poseidon-M31 cost	~5,600 M31 muls	12x cheaper than emulated Poseidon-252
Merkle proof (20 lvl)	~112,640 M31 muls	12x cheaper than emulated
GPU throughput advantage	~120x vs felt252	Dominates for batch operations
STARK proof security	132 bits	Independent of field size
No trusted setup	N/A	Eliminates MPC ceremony risk
Confidential transfer	~128,000 M31 muls	~1.3ms on GPU
Batch of 1000 transfers	~128M M31 muls	~1.3s on GPU

Bottom line: M31-native privacy proving is 12x cheaper per operation than felt252 emulation, with an additional 120x throughput advantage from GPU acceleration. Combined with batch proving (1000 txs per proof), the economics enable privacy at a scale impossible with any other system.

M31 Field Analysis — VM31 Project Bitsage Network, February 2026