PHYSICAL AI · THE PROOF LAYER

A demo isn't proof of what your robot was actually tested on.

WorldFlux turns every AI test into signed, verifiable evidence your customers, insurers, and regulators can inspect, without you ever handing over your models, data, or keys.

signed · verifiable · expiring
The market context
~$5T
humanoid-robot market by 2050 (Morgan Stanley)[1]
Art. 11
EU AI Act requires technical documentation for high-risk AI[4]
>$1B
AI-governance platform spend by 2030 (Gartner)[3]
01 THE GAP

Impressive demos. No way to prove they hold up.

The hardest part of physical AI isn't the demo. It's proving the demo will work in the real world. Today, teams ship a polished video and a spreadsheet of scores, and everyone downstream takes it on faith. The people who write the check, underwrite the risk, or sign off on safety can't independently verify anything.

This isn't our claim. It's the consensus among the investors funding this wave:

The demos are real. … But generalized, reliable deployment in the physical world? That's still ahead of us.[6]
Bessemer Venture Partners
Reliability isn't promised by vendors, but is proven through transparent, continuous evaluation.[8]
Anjney Midha, Venture Partner · a16z
The research community measures success on benchmarks, whereas deployment requires success on the long tail of situations that no benchmark covers.[9]
Oliver Hsu · a16z
Without standardized benchmarks or robust evaluation suites, it's difficult to measure generalization, regressions, or safety performance.[10]
Lonne Jaffe · Insight Partners
02 THE FINDING

Same model. Same task. We changed the scene, and it fell apart.

We took OpenVLA, a leading open robot-control model, and ran it through the standard LIBERO test suite. Then we ran the same model again with small, realistic changes: different object positions and environments. Nothing about the model changed.

This is the gap WorldFlux makes visible. As a16z notes, “static benchmarks are contaminated the moment they're published.” Every WorldFlux run produces a signed record anyone can re-open and check.

evidence.json · signed
−50.0 points of success rate, gone after small scene changes
Standard test: 74.4%
Scene changed: 24.4%

A reduced 90-episode calibration on selected stress conditions: robustness evidence, not an official leaderboard score.[11]
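To make the headline number concrete: the drop is quoted in percentage points, not as a relative change. The short sketch below recomputes both from the two rates shown above and adds a Wilson 95% interval for each. The interval step assumes both rates were measured over 90 episodes; the caveat above mentions a 90-episode calibration but does not say how episodes were split across conditions, so treat `EPISODES` as an assumption rather than a reported figure.

```python
import math

STANDARD_RATE = 0.744  # "Standard test" success rate from the result above
SHIFTED_RATE = 0.244   # "Scene changed" success rate from the result above
EPISODES = 90          # ASSUMPTION: episode count applied to both conditions


def wilson_interval(rate: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    denom = 1 + z**2 / n
    center = (rate + z**2 / (2 * n)) / denom
    half = z * math.sqrt(rate * (1 - rate) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Absolute drop in percentage points versus drop relative to the baseline.
point_drop = (STANDARD_RATE - SHIFTED_RATE) * 100
relative_drop = (STANDARD_RATE - SHIFTED_RATE) / STANDARD_RATE * 100

std_lo, std_hi = wilson_interval(STANDARD_RATE, EPISODES)
shift_lo, shift_hi = wilson_interval(SHIFTED_RATE, EPISODES)

print(f"drop: {point_drop:.1f} points ({relative_drop:.1f}% relative)")
print(f"standard 95% interval: {std_lo:.3f} to {std_hi:.3f}")
print(f"shifted  95% interval: {shift_lo:.3f} to {shift_hi:.3f}")
```

Under that episode-count assumption the two intervals do not overlap, which suggests a gap of this size is unlikely to be sampling noise even on a reduced run.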

claim · protocol · evidence · provenance
Read the evidence caveat
03 WHY NOW

Physical AI is arriving faster than anyone forecast.

Goldman Sachs raised its humanoid-robot forecast sixfold in a single year. Morgan Stanley projects a multi-trillion-dollar market and nearly a billion robots by 2050. VC funding for humanoid robotics jumped over 300% in 2025.

$38B
Goldman's humanoid forecast by 2035 (6× in one year)[2]
~$5T
humanoid market by 2050 (Morgan Stanley)[1]

And “trust me, it works” is about to stop being enough.

The EU AI Act requires providers of high-risk AI systems to produce technical documentation and conformity evidence before they can place those systems on the market (Article 11; obligations phase in 2026–2028). Gartner projects spending on AI-governance platforms will pass $1B by 2030 and warns that “point-in-time audits are simply not enough.” The UK government is building a third-party AI-assurance market for independent verification.

>$1B
AI-governance platform spend by 2030 (Gartner)[3]
£18.8B
UK third-party AI-assurance market by 2035[5]

A massive wave, a hard new requirement, and no infrastructure to meet it. That's the opening.

04 HOW IT WORKS

Run it your way. Leave with proof.

01

Run

Test your model on your own hardware: laptop, lab, or cloud. WorldFlux ingests what your eval produced; it never re-runs or hosts your model.

02

Sign & verify

It packages a tamper-evident evidence file (what was claimed, how it was tested, what it scored, where it came from), then signs it with the configured policy so reviewers can verify it independently.

03

Share

Publish an expiring, revocable link. Reviewers open it and re-verify the signature themselves, with no raw logs and no access to your model.
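The "tamper-evident" part of the packaging step can be illustrated with a minimal sketch. It is not WorldFlux's actual file format: the function names, field names, and sample values below are hypothetical, and the signing step is left out. The idea shown is only that each of the four parts (claim, protocol, evidence, provenance) is hashed, the digests are rolled into one root digest, and any later edit changes that root. In a real flow the root digest would be what gets signed, presumably with an asymmetric scheme so reviewers need only public material to check it.

```python
import hashlib
import json

PARTS = ("claim", "protocol", "evidence", "provenance")


def canonical_bytes(obj) -> bytes:
    """Serialize deterministically so identical content always hashes the same."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_manifest(package: dict) -> dict:
    """Hash each part, then hash the per-part digests into a single root digest."""
    digests = {
        name: hashlib.sha256(canonical_bytes(package[name])).hexdigest()
        for name in PARTS
    }
    root = hashlib.sha256(canonical_bytes(digests)).hexdigest()
    return {"parts": digests, "root": root}


def verify_manifest(package: dict, manifest: dict) -> bool:
    """Recompute every digest from the package and compare with the record."""
    return build_manifest(package) == manifest


# Hypothetical package contents, for illustration only.
package = {
    "claim": {"task": "example-task", "metric": "success_rate"},
    "protocol": {"suite": "LIBERO", "episodes": 90},
    "evidence": {"success_rate": 0.244},
    "provenance": {"model": "OpenVLA"},
}
manifest = build_manifest(package)

# Changing a single score after the fact breaks verification.
tampered = json.loads(json.dumps(package))
tampered["evidence"]["success_rate"] = 0.9

print(verify_manifest(package, manifest))   # untouched package
print(verify_manifest(tampered, manifest))  # edited package
```

Hashing a canonical serialization, rather than the raw file bytes, is a design choice in this sketch: it keeps the digest stable across whitespace or key-order differences while still catching any change to the content itself.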

EXHIBIT 01 · CHAIN OF CUSTODY
worldflux · audit pipeline
worldflux audit run lerobot --from libero_run/ --claim claim.json --protocol protocol.json --output pkg/
pack: claim · protocol · evidence · provenance
worldflux audit sign pkg/ && worldflux audit verify pkg/
signed · verified
worldflux audit publish pkg/ --share --cloud-run-id <run> --confirm-public-share-upload --approval-file approval.json --password-env WORLDFLUX_SHARE_ACCESS_CODE
# Sigstore and ML-BOM are configured evidence options
https://worldflux.ai/r/<token> · trusted signer policy · default 7-day TTL, max 30 days

Your compute, your keys, and your model weights never leave your hardware. WorldFlux audits what happened. It never hosts your AI.

05 WHY WORLDFLUX

A vendor grading its own homework isn't evidence.

Experiment trackers record your numbers, but numbers you report yourself don't convince a skeptical buyer. Hosted evaluation services make you upload your model, which serious teams can't do. WorldFlux is the neutral layer in between: independent, signed evidence, produced without ever taking custody of your IP.

01

Independent by design

Evidence is signed under an explicit policy and verifiable by reviewers, not self-attested.

02

Your IP stays yours

Bring your own compute and keys; we never host weights or proxy credentials.

03

Built for robots, not chatbots

Ingests real robotics test harnesses (LeRobot, OpenPI, GR00T, and more), not generic chatbot logs.

06 WHERE IT FITS

Ingests the outputs your field already produces.

WorldFlux can ingest, wrap, or catalog outputs from selected robotics and VLA ecosystems. Live provider execution is production-backed only when explicitly marked:

  • NVIDIA Cosmos
  • NVIDIA Isaac GR00T
  • Physical Intelligence π
  • OpenVLA
  • V-JEPA 2
  • SmolVLA

Evidence can be signed with self-sign or configured Sigstore policies, can include CycloneDX ML-BOM sidecars when enabled, and maps to the frameworks buyers cite: EU AI Act, NIST AI RMF, ISO 42001, SOC 2, GDPR. We make evidence inspectable, not certified.
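For readers unfamiliar with ML-BOM sidecars, the sketch below builds a minimal CycloneDX document that lists one model as a `machine-learning-model` component (a component type available since CycloneDX 1.5). It shows the general shape of such a file only. The helper name and the version string are hypothetical, and nothing here describes which fields WorldFlux actually emits.

```python
import json


def ml_bom_sidecar(model_name: str, model_version: str) -> dict:
    """Build a minimal CycloneDX BOM listing one machine-learning model."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            {
                "type": "machine-learning-model",
                "name": model_name,
                "version": model_version,  # placeholder value in this sketch
            }
        ],
    }


sidecar = ml_bom_sidecar("OpenVLA", "example-version")
print(json.dumps(sidecar, indent=2))
```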

WorldFlux is in beta. We're taking on a small number of design partners now.

Book a demo
07 GET STARTED

Bring your own compute. Leave with signed proof.

CLI · private beta

Run local audit packaging on your hardware.

Manual packet · scoped

Deliver a signed package and reviewer memo.

Hosted share · approval gated

Use expiring links only after scoped proof passes.

Design-partner pilot

For teams that need a customer-approved evidence workflow and public-safe memo.

Book a demo
signed · verifiable · expiring

Stop shipping demos. Start shipping proof.