STWO ML Prover
The STWO ML Prover is a GPU-accelerated proving library that generates zero-knowledge proofs for neural network inference using Circle STARKs over the Mersenne-31 field. It powers both the ZKML verification system and the VM31 privacy protocol.
Architecture
The prover has two modes of operation:
Proof Pipeline
Supported Layer Types
| Layer | Proof Method | Trace Reduction | GPU Accelerated |
|---|---|---|---|
| MatMul | Sumcheck (degree-2) | 42-255x | Yes |
| Activation (ReLU, GELU, Sigmoid) | LogUp lookup table | 1x (constant) | Yes |
| LayerNorm | Eq-sumcheck + LogUp rsqrt | 1x | Yes |
| RMSNorm | Eq-sumcheck + LogUp rsqrt | 1x | Yes |
| Attention (MHA/GQA/MQA) | Composed sub-matmuls | varies | Yes |
| RoPE | LogUp rotation table | 1x | No |
| Embedding | LogUp sparse lookup | 1x | No |
| Dequantize | LogUp 2D table | 1x | No |
| Add/Mul | Linear split / Eq-sumcheck | 0 rounds / log(n) sumcheck rounds | No |
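Most of the non-matmul rows above rely on LogUp lookups. Here is a minimal sketch of the check. It uses a toy ReLU table and plain Python integers mod p = 2³¹ − 1. The challenges α and β are chosen by the caller, which stands in for the prover's actual table layout and Fiat-Shamir transcript. Each (input, output) pair is compressed to input + β·output, and m_j counts how often table row t_j is used. The LogUp identity then checks that Σ_i 1/(α − a_i) = Σ_j m_j / (α − t_j):

```python
import random

P = 2**31 - 1  # Mersenne-31 prime


def inv(x: int) -> int:
    # Field inverse via Fermat's little theorem (x must be nonzero mod P).
    return pow(x, P - 2, P)


def relu_table(bits: int) -> list[tuple[int, int]]:
    # Toy table of (x, relu(x)) for signed x in [-2^(bits-1), 2^(bits-1)), encoded mod P.
    lo, hi = -(1 << (bits - 1)), 1 << (bits - 1)
    return [(x % P, max(x, 0) % P) for x in range(lo, hi)]


def logup_check(lookups, table, alpha: int, beta: int) -> bool:
    # Check sum_i 1/(alpha - a_i) == sum_j m_j / (alpha - t_j), m_j = number of lookups of t_j.
    def enc(pair):
        # Compress an (input, output) pair into a single field element.
        return (pair[0] + beta * pair[1]) % P

    mult = {}
    for pair in lookups:
        mult[pair] = mult.get(pair, 0) + 1
    lhs = sum(inv((alpha - enc(p)) % P) for p in lookups) % P
    rhs = sum(mult.get(t, 0) * inv((alpha - enc(t)) % P) for t in table) % P
    return lhs == rhs


table = relu_table(8)
honest = [(v % P, max(v, 0) % P) for v in (-5, 3, 3, 0, 127)]
forged = honest + [(7, 8)]  # claims relu(7) == 8, which is not a table row
alpha, beta = random.randrange(P), random.randrange(P)
print(logup_check(honest, table, alpha, beta))  # True
print(logup_check(forged, table, alpha, beta))  # False
```

A forged pair leaves an uncancelled term 1/(α − a_bad) on the left-hand side. The check therefore fails unless α happens to equal that pair's encoding, which has negligible probability.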
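The key insight: matrix multiplication C[i,j] = Σ_k A[i,k] · B[k,j] can be verified via sumcheck over multilinear extensions (MLEs) of A and B. After random points r_i and r_j are drawn, the prover shows that C̃(r_i, r_j) = Σ_{x ∈ {0,1}^{log k}} Ã(r_i, x) · B̃(x, r_j). Each summand is a product of two multilinear polynomials, so every sumcheck round polynomial has degree 2. This reduces the STARK trace from O(m×k×n) naive rows to O(m×n + m×k + k×n), a 42-255x reduction for production-sized matrices (for square n×n matrices the ratio is n³ / 3n² = n/3).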
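The sumcheck behind the MatMul row can be sketched end to end in a few dozen lines. This is a minimal, unoptimized Python sketch of the textbook protocol over M31, not the prover's GPU implementation. The function names are hypothetical, challenges come from `random` instead of a Fiat-Shamir channel, and everything stays in the base field rather than QM31. The sketch draws random r_i and r_j and restricts A and B to vectors over the shared index. It then runs log k degree-2 rounds, and the verifier finally queries Ã and B̃ once each.

```python
import random

P = 2**31 - 1  # M31 modulus


def fold(vals, r):
    # Fix the first (most significant) variable of an MLE, given as hypercube evaluations, to r.
    h = len(vals) // 2
    return [(vals[i] + r * (vals[h + i] - vals[i])) % P for i in range(h)]


def fold_all(vals, point):
    # Fix the leading variables of an MLE to the coordinates of `point`, in order.
    for r in point:
        vals = fold(vals, r)
    return vals


def interp_deg2(g0, g1, g2, r):
    # Evaluate the degree-2 polynomial with g(0)=g0, g(1)=g1, g(2)=g2 at r (Lagrange form).
    inv2 = pow(2, P - 2, P)
    return (g0 * (r - 1) * (r - 2) * inv2 - g1 * r * (r - 2) + g2 * r * (r - 1) * inv2) % P


def sumcheck_inner_product(a, b, claim):
    # Degree-2 sumcheck for claim == sum_x a(x) * b(x); returns (ok, challenges, final_claim).
    challenges = []
    while len(a) > 1:
        h = len(a) // 2
        # Prover: round polynomial g(t) = sum_rest a(t, rest) * b(t, rest) at t = 0, 1, 2.
        g0 = sum(a[i] * b[i] for i in range(h)) % P
        g1 = sum(a[h + i] * b[h + i] for i in range(h)) % P
        g2 = sum((2 * a[h + i] - a[i]) * (2 * b[h + i] - b[i]) for i in range(h)) % P
        # Verifier: g(0) + g(1) must match the running claim.
        if (g0 + g1) % P != claim:
            return False, challenges, claim
        r = random.randrange(P)  # stand-in for a Fiat-Shamir challenge
        claim = interp_deg2(g0, g1, g2, r)
        a, b = fold(a, r), fold(b, r)
        challenges.append(r)
    return True, challenges, claim


def log2(x):
    assert x > 0 and x & (x - 1) == 0, "dimensions must be powers of two"
    return x.bit_length() - 1


def check_matmul(A, B, C):
    # Probabilistically verify the claim C == A @ B (mod P) for row-major lists of lists.
    m, k, n = len(A), len(B), len(B[0])
    log2(k)  # only validates that the inner dimension is a power of two
    r_i = [random.randrange(P) for _ in range(log2(m))]
    r_j = [random.randrange(P) for _ in range(log2(n))]
    flat_A = [v for row in A for v in row]                   # index i*k + t
    flat_B = [v for row in B for v in row]                   # index t*n + j
    flat_Bt = [B[t][j] for j in range(n) for t in range(k)]  # index j*k + t
    claim = fold_all([v % P for row in C for v in row], r_i + r_j)[0]  # C~(r_i, r_j)
    a = fold_all(flat_A, r_i)   # A~(r_i, x) for x in {0,1}^log k
    b = fold_all(flat_Bt, r_j)  # B~(x, r_j) for x in {0,1}^log k
    ok, r_k, final = sumcheck_inner_product(a, b, claim)
    # Final check: query the MLEs of A and B at the sumcheck point r_k.
    a_eval = fold_all(flat_A, r_i + r_k)[0]
    b_eval = fold_all(flat_B, r_k + r_j)[0]
    return ok and final == a_eval * b_eval % P


A = [[random.randrange(100) for _ in range(8)] for _ in range(4)]  # 4 x 8
B = [[random.randrange(100) for _ in range(2)] for _ in range(8)]  # 8 x 2
C = [[sum(A[i][t] * B[t][j] for t in range(8)) for j in range(2)] for i in range(4)]
print(check_matmul(A, B, C))  # True
C[1][0] += 1  # tamper with one output entry
print(check_matmul(A, B, C))  # False (with overwhelming probability)
```

Changing a single entry of C shifts C̃(r_i, r_j) by a nonzero eq(·) term with overwhelming probability. The first round check therefore catches the tampered entry.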
GPU Backend
The prover uses STWO's GPU backend with CUDA kernels:
| Operation | GPU Speedup | Threshold |
|---|---|---|
| Circle FFT | 50-112x | ≥ 1M elements |
| Sumcheck rounds | 10-50x | ≥ 16K elements |
| MLE restriction | 5-20x | ≥ 16K elements |
| Merkle hashing (Blake2s) | 2-4x | ≥ 64K leaves |
| FRI folding | 5-15x | log_size ≥ 14 (16K elements) |
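The Threshold column suggests size-based dispatch: below the cut-off, kernel launch and host-device transfer costs outweigh the speedup. Below is a hypothetical sketch of that policy. The operation keys and `choose_backend` are illustrative names, not the prover's API, and 1M/16K/64K are read as powers of two.

```python
from enum import Enum


class Backend(Enum):
    CPU = "cpu"
    GPU = "gpu"


# Hypothetical per-operation thresholds transcribed from the table (element or leaf counts).
GPU_THRESHOLDS = {
    "circle_fft": 1 << 20,      # >= 1M elements
    "sumcheck_round": 1 << 14,  # >= 16K elements
    "mle_restrict": 1 << 14,    # >= 16K elements
    "merkle_blake2s": 1 << 16,  # >= 64K leaves
    "fri_fold": 1 << 14,        # log_size >= 14
}


def choose_backend(op: str, size: int, gpu_available: bool = True) -> Backend:
    # Dispatch to the GPU only when a device exists and the input reaches the op's threshold.
    if gpu_available and size >= GPU_THRESHOLDS[op]:
        return Backend.GPU
    return Backend.CPU


print(choose_backend("circle_fft", 1 << 19))  # Backend.CPU
print(choose_backend("fri_fold", 1 << 14))    # Backend.GPU
```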
Multi-GPU Support
- Throughput mode: Independent proofs across GPUs
- Distributed mode: Single proof split across GPUs
- Device detection: Auto-detect compute capability, SM count, tensor cores
- Memory management: 80% safety margin, chunked processing for 2^26+ polynomials
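The memory-management bullet can be illustrated with a small planner. This is a hypothetical sketch that makes three assumptions the source does not spell out. First, "80% safety margin" is read as capping usage at 80% of free device memory. Second, trace columns hold 4-byte M31 values. Third, oversized traces are split into equal power-of-two row chunks. `plan_chunks` is an illustrative name, not the prover's API.

```python
M31_BYTES = 4  # one M31 element is stored in a 32-bit word


def plan_chunks(log_size: int, n_columns: int, free_bytes: int) -> tuple[int, int]:
    """Return (n_chunks, log_chunk_size) so one chunk of every column fits in 80% of free memory."""
    budget = free_bytes * 4 // 5  # assumed reading of the "80% safety margin"
    log_chunk = log_size
    # Halve the chunk length until n_columns * 2^log_chunk elements fit in the budget.
    while log_chunk > 0 and n_columns * (1 << log_chunk) * M31_BYTES > budget:
        log_chunk -= 1
    if n_columns * (1 << log_chunk) * M31_BYTES > budget:
        raise MemoryError("a single row of all columns exceeds the memory budget")
    return 1 << (log_size - log_chunk), log_chunk


# A 2^26-row trace with 256 columns needs 64 GiB; with 80 GB free the budget is 64 GB.
print(plan_chunks(26, 256, 80 * 10**9))  # (2, 25)
```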
Model Support
Verified Models
| Model | Parameters | Layers | Prove Time | Status |
|---|---|---|---|---|
| Single MatMul | — | 1 | Under 100ms | On-chain verified |
| MLP Network | ~100K | 3 | ~500ms | On-chain verified |
| LayerNorm Chain | ~500K | 3 | ~1s | On-chain verified |
| Residual Network (DAG) | ~1M | 4 | ~2s | On-chain verified |
| Qwen3-14B | 14B | 40 | 37.64s | On-chain verified (streamed across 18 transactions) |
VM31 Transaction Proving
For the VM31 UTXO privacy pool, the prover generates STARK proofs for three transaction types:
| Transaction | Poseidon2 Perms | Trace Width | Description |
|---|---|---|---|
| Deposit | 2 | ~1,372 cols | Public → UTXO commitment |
| Withdraw | 32 (padded) | ~20,932 cols | UTXO → public (Merkle + nullifier) |
| Spend | 64 (padded) | ~41,864 cols | 2-in/2-out private transfer |
Batch Proving
The VM31 relayer batches up to 1,000 transactions into a single GKR proof:
- Proving time: ~2 seconds on GPU (H100)
- Verification: ~300K gas on Starknet
- Cost per transaction: ~0.0003 STRK
Security Parameters
| Parameter | Value |
|---|---|
| pow_bits | 26 |
| n_queries | 70 |
| log_blowup | 1 |
| Security level | 96-bit |
| Field | M31 (p = 2³¹ - 1) |
| Extension field | QM31 (degree-4 extension of M31, ~2¹²⁴ elements) |
| Hash | Poseidon2-M31 (Fiat-Shamir) |
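The 96-bit figure agrees with the conjectured-security estimate commonly used for FRI-based STARKs. In that estimate, each query contributes log_blowup bits and proof-of-work grinding adds pow_bits: 70 × 1 + 26 = 96. The sketch below reproduces that arithmetic and the size of QM31. Treat the formula as the standard conjectured bound rather than the prover's exact security accounting.

```python
import math

P = 2**31 - 1  # M31 modulus


def conjectured_security_bits(n_queries: int, log_blowup: int, pow_bits: int) -> int:
    # Each FRI query contributes log_blowup bits; proof-of-work grinding adds pow_bits.
    return n_queries * log_blowup + pow_bits


bits = conjectured_security_bits(n_queries=70, log_blowup=1, pow_bits=26)
qm31_bits = 4 * math.log2(P)  # QM31 is a degree-4 extension, so |QM31| = P**4

print(bits)                 # 96
print(round(qm31_bits, 1))  # 124.0
```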
Confidential Computing
For models requiring data privacy, the prover supports TEE-based confidential computing on NVIDIA GPUs:
- H100/H200: Hardware attestation via SPDM protocol
- CPU attestation: TDX/SEV-SNP via nvTrust SDK
- Memory encryption: AES-GCM-256, matching the cipher the GPU uses for encrypted DMA transfers
- Key derivation: HKDF with SHA-256
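The key-derivation step uses HKDF with SHA-256, the extract-then-expand construction of RFC 5869. A minimal standard-library sketch follows. The secret, salt, and `info` label in the usage line are hypothetical placeholders. A deployment would normally rely on a vetted cryptography library rather than hand-rolled code.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # PRK = HMAC-SHA256(salt, IKM); an empty salt defaults to HashLen zero bytes.
    return hmac.new(salt or bytes(HASH_LEN), ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    # T(i) = HMAC-SHA256(PRK, T(i-1) || info || i); output is the first `length` bytes.
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF-SHA256 output is limited to 255 * 32 bytes")
    okm, block = b"", b""
    for counter in range(1, -(-length // HASH_LEN) + 1):
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
    return okm[:length]


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    return hkdf_expand(hkdf_extract(salt, ikm), info, length)


# Hypothetical usage: derive a 256-bit AES-GCM key from a secret established during attestation.
key = hkdf_sha256(b"shared-secret-from-attestation", salt=b"session-nonce", info=b"aes-gcm-256 key")
print(len(key))  # 32
```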
Next Steps
- GKR Protocol — interactive proof protocol details
- GPU Acceleration — CUDA kernel specifics
- On-Chain Verification — Cairo verifier contract
- Transformer Proving — proving Llama/Qwen blocks