STWO ML Prover

The STWO ML Prover is a GPU-accelerated proving library that generates zero-knowledge proofs for neural network inference using Circle STARKs over the Mersenne-31 field. It powers both the ZKML verification system and the VM31 privacy protocol.

  • Prove Time: 3.04s/layer
  • FFT Speedup: 50-112x
  • Proof Size: 17 MB
  • Security: 96-bit

Architecture

The prover has two modes of operation:

  • 🧠 ZKML Mode: Prove neural network forward passes. The GKR protocol walks the computation graph layer by layer, generating sumcheck proofs for each operation.
  • 🔐 VM31 Privacy Mode: Generate STARK proofs for UTXO transactions (deposit, withdraw, and spend circuits) with Poseidon2-M31 constraints.

Proof Pipeline

1. Model Compilation: Neural network (ONNX, HuggingFace SafeTensors) → ComputationGraph DAG. Each node is a typed operation: MatMul, Activation, LayerNorm, RoPE, Attention, etc.
2. Forward Pass + Trace: Execute the model on GPU, recording all intermediate values. Dual-track execution: f32 for the model output, M31 for proving.
3. GKR Proving: Walk the computation graph output→input, generating sumcheck proofs per layer. MatMul gets a 42-255x trace reduction via multilinear-extension sumcheck.
4. STARK Aggregation: Aggregate non-matmul components (activations, LayerNorm, RMSNorm) into a single unified STARK proof, then combine it with the GKR layer proofs.
5. On-Chain Submission: Serialize the proof to Cairo-compatible calldata and submit it to the SumcheckVerifier contract. Fiat-Shamir replay verifies the entire computation on-chain.

Supported Layer Types

| Layer | Proof Method | Trace Reduction | GPU Accelerated |
|---|---|---|---|
| MatMul | Sumcheck (degree-2) | 42-255x | Yes |
| Activation (ReLU, GELU, Sigmoid) | LogUp lookup table | 1x (constant) | Yes |
| LayerNorm | Eq-sumcheck + LogUp rsqrt | 1x | Yes |
| RMSNorm | Eq-sumcheck + LogUp rsqrt | 1x | Yes |
| Attention (MHA/GQA/MQA) | Composed sub-matmuls | varies | Yes |
| RoPE | LogUp rotation table | 1x | No |
| Embedding | LogUp sparse lookup | 1x | No |
| Dequantize | LogUp 2D table | 1x | No |
| Add/Mul | Linear split / Eq-sumcheck | 0 rounds / log(n) | No |

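The table-backed layers above (activations, RoPE, embeddings, dequantize) all reduce to the same primitive: show that every (input, output) pair in the trace occurs in a precomputed table. Below is a toy sketch of the LogUp rational identity behind that check, over M31. It is illustrative only: the real prover commits multiplicity columns and evaluates this identity inside a STARK over QM31, and the pair encoding here is our own.

```python
from collections import Counter

# Toy LogUp lookup check over the Mersenne-31 field (illustrative only).
P = 2**31 - 1

def inv(x: int) -> int:
    """Modular inverse via Fermat's little theorem."""
    return pow(x % P, P - 2, P)

def encode(x: int, y: int) -> int:
    """Pack an (input, output) pair into a single field element."""
    return ((x % P) * (1 << 16) + y) % P

def relu_table(lo: int, hi: int) -> list[int]:
    """All (x, relu(x)) pairs the activation layer is allowed to use."""
    return [encode(x, max(x, 0)) for x in range(lo, hi)]

def logup_accepts(witness_pairs, table, mults, alpha) -> bool:
    """LogUp identity at a random alpha:
    sum 1/(alpha - w) over witness == sum m_t/(alpha - t) over table."""
    lhs = sum(inv(alpha - w) for w in witness_pairs) % P
    rhs = sum(m * inv(alpha - t) for t, m in zip(table, mults)) % P
    return lhs == rhs

# Honest trace: every row is a genuine (x, relu(x)) pair.
xs = [-3, 5, 0, 5, -1]
pairs = [encode(x, max(x, 0)) for x in xs]
table = relu_table(-8, 8)
counts = Counter(pairs)
mults = [counts.get(t, 0) for t in table]
assert logup_accepts(pairs, table, mults, alpha=123456789)

# A forged row (claiming relu(-3) == 7) breaks the identity.
bad = pairs[:-1] + [encode(-3, 7)]
assert not logup_accepts(bad, table, mults, alpha=123456789)
```

The point of the rational-sum form is that both sides are sums of independent terms, so the check batches an arbitrary number of lookups into one field identity at a random challenge point.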
Sumcheck MatMul Innovation

The key insight: matrix multiplication C[i,j] = Σ_k A[i,k] · B[k,j] can be verified via sumcheck over multilinear extensions (MLEs) of A and B. This reduces the STARK trace from O(m×k×n) naive rows to O(m×n + m×k + k×n) — a 42-255x reduction for production-sized matrices.
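A quick sanity check of that claim (the matrix shapes below are our own illustrative choices, not measurements from the prover):

```python
def trace_reduction(m: int, k: int, n: int) -> float:
    """Naive trace rows O(m*k*n) divided by sumcheck trace rows
    O(m*n + m*k + k*n), for C = A @ B with A: m x k, B: k x n."""
    naive = m * k * n
    sumcheck = m * n + m * k + k * n
    return naive / sumcheck

# A batch of 128 tokens through a 4096x4096 weight matrix:
ratio = trace_reduction(128, 4096, 4096)
print(f"{ratio:.0f}x")  # ~120x, inside the quoted 42-255x range
```

Note that the reduction grows with the batch dimension: with these weights, 64 tokens give roughly 62x and 256 tokens roughly 227x, broadly consistent with the 42-255x range quoted for production-sized matrices.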

GPU Backend

The prover uses STWO's GPU backend with CUDA kernels:

| Operation | GPU Speedup | Threshold |
|---|---|---|
| Circle FFT | 50-112x | ≥ 1M elements |
| Sumcheck rounds | 10-50x | ≥ 16K elements |
| MLE restriction | 5-20x | ≥ 16K elements |
| Merkle hashing (Blake2s) | 2-4x | ≥ 64K leaves |
| FRI folding | 5-15x | log_size ≥ 14 |
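These thresholds imply a simple size-based CPU/GPU dispatch: below each cutoff, kernel-launch and transfer overhead outweighs the speedup. A hypothetical sketch (the dictionary keys and function are ours, not STWO's API):

```python
# Hypothetical size-based dispatch mirroring the threshold table above.
# Operation names and the helper are illustrative, not STWO's actual API.
GPU_THRESHOLDS = {
    "circle_fft": 1 << 20,      # >= 1M elements
    "sumcheck_round": 1 << 14,  # >= 16K elements
    "mle_restrict": 1 << 14,    # >= 16K elements
    "merkle_blake2s": 1 << 16,  # >= 64K leaves
    "fri_fold": 1 << 14,        # log_size >= 14
}

def use_gpu(op: str, n_elements: int) -> bool:
    """Route small workloads to the CPU; send large ones to CUDA kernels."""
    return n_elements >= GPU_THRESHOLDS[op]

assert use_gpu("circle_fft", 1 << 21)        # 2M elements: GPU
assert not use_gpu("sumcheck_round", 8192)   # 8K elements: CPU
```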

Multi-GPU Support

  • Throughput mode: Independent proofs across GPUs
  • Distributed mode: Single proof split across GPUs
  • Device detection: Auto-detect compute capability, SM count, tensor cores
  • Memory management: 80% safety margin, chunked processing for polynomials of 2^26+ elements
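A hedged sketch of how the 80% safety margin and the 2^26-element chunking threshold from the bullets above could combine (the helper and its signature are hypothetical, not the prover's actual memory manager):

```python
def plan_chunks(total_elems: int, bytes_per_elem: int, gpu_bytes: int) -> list[int]:
    """Split a polynomial into chunks that each fit in 80% of GPU memory.
    Polynomials below the 2^26-element threshold are processed whole."""
    budget = int(gpu_bytes * 0.8)          # 80% safety margin
    if total_elems < (1 << 26):
        return [total_elems]
    per_chunk = max(1, budget // bytes_per_elem)
    chunks = []
    remaining = total_elems
    while remaining > 0:
        take = min(per_chunk, remaining)
        chunks.append(take)
        remaining -= take
    return chunks

# 2^27 elements at 16 B each on a hypothetical 1 GiB budget -> 3 chunks.
print(len(plan_chunks(1 << 27, 16, 1 << 30)))
```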

Model Support

  • 🤖 HuggingFace Loader: Direct loading from HuggingFace SafeTensors format. Supports f16, bf16, f32, INT4, and INT8 weights.
  • 📦 ONNX Compiler: Import PyTorch/TensorFlow models via tract-onnx. Auto-detects activations and layer types.
  • 💾 Tile Streaming: For models exceeding GPU memory, a double-buffered tile pipeline loads weight tiles while proving; background I/O is hidden behind sumcheck computation.
  • 🔄 SIMD Block Batching: Prove N identical transformer blocks in one GKR pass with only log(N) extra sumcheck rounds per layer.
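The log(N) overhead follows from the batching itself: stacking N identical blocks multiplies each layer's sumcheck instance size by N, which adds log2(N) variables to the multilinear claim and hence log2(N) rounds. As arithmetic (our sketch of the claim, with illustrative round counts):

```python
import math

def batched_rounds(rounds_per_layer: int, n_blocks: int) -> int:
    """Sumcheck rounds per layer when n_blocks identical transformer
    blocks are proved in one GKR pass: log2(n_blocks) extra rounds."""
    assert n_blocks & (n_blocks - 1) == 0, "block count must be a power of two"
    return rounds_per_layer + int(math.log2(n_blocks))

# A layer needing 24 rounds alone needs 29 when 32 blocks are batched:
print(batched_rounds(24, 32))  # 29
```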

Verified Models

| Model | Parameters | Layers | Prove Time | Status |
|---|---|---|---|---|
| Single MatMul | — | 1 | Under 100ms | On-chain verified |
| MLP Network | ~100K | 3 | ~500ms | On-chain verified |
| LayerNorm Chain | ~500K | 3 | ~1s | On-chain verified |
| Residual Network (DAG) | ~1M | 4 | ~2s | On-chain verified |
| Qwen3-14B | 14B | 40 | 37.64s | On-chain verified (18-TX streaming) |

VM31 Transaction Proving

For the VM31 UTXO privacy pool, the prover generates STARK proofs for three transaction types:

| Transaction | Poseidon2 Perms | Trace Width | Description |
|---|---|---|---|
| Deposit | 2 | ~1,372 cols | Public → UTXO commitment |
| Withdraw | 32 (padded) | ~20,932 cols | UTXO → public (Merkle + nullifier) |
| Spend | 64 (padded) | ~41,864 cols | 2-in/2-out private transfer |

Batch Proving

The VM31 relayer batches up to 1,000 transactions into a single GKR proof:

  • Proving time: ~2 seconds on an H100 GPU
  • Verification: ~300K gas on Starknet
  • Cost per transaction: ~0.0003 STRK
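These figures are mutually consistent: 300K gas amortized over 1,000 transactions is 300 gas each, and at an assumed gas price of 10⁻⁶ STRK/gas (our assumption, not a quoted rate) that works out to the stated ~0.0003 STRK per transaction:

```python
def per_tx_cost(batch_gas: int, batch_size: int, strk_per_gas: float):
    """Amortize one batch verification across its transactions.
    strk_per_gas is a hypothetical gas price, not a quoted figure."""
    gas_per_tx = batch_gas / batch_size
    return gas_per_tx, gas_per_tx * strk_per_gas

gas, strk = per_tx_cost(300_000, 1_000, 1e-6)
print(gas, strk)  # 300.0 gas, ~0.0003 STRK per transaction
```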

Security Parameters

| Parameter | Value |
|---|---|
| pow_bits | 26 |
| n_queries | 70 |
| log_blowup | 1 |
| Security level | 96-bit |
| Field | M31 (p = 2³¹ − 1) |
| Extension field | QM31 (128-bit secure) |
| Hash | Poseidon2-M31 (Fiat-Shamir) |
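The 96-bit figure matches the standard conjectured FRI soundness estimate, n_queries × log_blowup + pow_bits: each query contributes log_blowup bits, and proof-of-work grinding adds pow_bits more. A one-line restatement using the table's values (our restatement of the usual formula, not code from the prover):

```python
def conjectured_security_bits(n_queries: int, log_blowup: int, pow_bits: int) -> int:
    """Conjectured FRI soundness: log_blowup bits per query,
    plus the proof-of-work grinding bits."""
    return n_queries * log_blowup + pow_bits

print(conjectured_security_bits(70, 1, 26))  # 96
```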

Confidential Computing

For models requiring data privacy, the prover supports TEE-based confidential computing on NVIDIA GPUs:

  • H100/H200: Hardware attestation via the SPDM protocol
  • CPU attestation: TDX/SEV-SNP via the nvTrust SDK
  • Memory encryption: AES-256-GCM, matching the GPU's DMA encryption
  • Key derivation: HKDF with SHA-256

Next Steps