Bare-metal B300 NVL8 clusters, NVFP4-optimized inference stack, and the operational know-how to extract every token of throughput your model can deliver.
Multi-node B300 capacity on 1–5 year contracts. Best unit economics for production inference.
8x B300 NVL8 bare-metal access. Ideal for benchmarking and short-term workloads.
We deploy and operate your chosen model on dedicated B300 hardware. You consume it via an OpenAI-compatible API.
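The OpenAI-compatible surface means a standard chat-completions request shape. A minimal sketch of building that request body, assuming the usual `/v1/chat/completions` route; the base URL and model name here are placeholders, not real endpoints:

```python
import json

# Placeholders -- substitute the endpoint and model name from your deployment.
BASE_URL = "https://your-cluster.example.com/v1"
MODEL = "your-deployed-model"

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build the JSON body for an OpenAI-compatible /chat/completions call."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_chat_request("Summarize NVLink 5 in one sentence.")
print(json.dumps(body, indent=2))
```

Any OpenAI SDK pointed at `BASE_URL` sends exactly this shape, so existing client code moves over with a config change.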
Five Tier-S open-weight models, three workload profiles, full concurrency sweep. NVFP4 quantization, SGLang serving, TP=8 across 8x B300 NVL8.
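A serving setup like the one above is typically started with SGLang's server entrypoint. This is a sketch only: the model path is a placeholder, and the exact flag for NVFP4 quantization varies by SGLang version, so check your version's docs rather than copying verbatim:

```shell
# Sketch: serve a model with SGLang, tensor-parallel across 8 GPUs.
# <your-model> is a placeholder; add your SGLang version's NVFP4
# quantization flag per its documentation.
python -m sglang.launch_server \
  --model-path <your-model> \
  --tp-size 8 \
  --port 30000
```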
Implied gross margin compares our self-hosted inference cost against OpenRouter's listed pricing. The spread tells you where the market currently prices inference operations.
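The spread is simple arithmetic. A worked sketch; the dollar figures are hypothetical illustrations, not measured numbers from this benchmark:

```python
def implied_gross_margin(self_hosted_cost: float, listed_price: float) -> float:
    """Implied gross margin: (price - cost) / price, both in $ per 1M tokens."""
    return (listed_price - self_hosted_cost) / listed_price

# Hypothetical: $0.30 per 1M tokens self-hosted vs $1.00 per 1M listed.
margin = implied_gross_margin(0.30, 1.00)
print(f"{margin:.0%}")  # → 70%
```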
We discovered the +27% throughput uplift from driver 595 by running the test ourselves. Driver pinning, NCCL tuning, framework version matching, kernel selection — these decisions ship with every cluster.
No hypervisor overhead. No noisy neighbors. Direct PCIe topology access for full NVLink 5 bandwidth and InfiniBand RDMA at line rate. You see exactly what the GPU sees.
Spun out from Cornerstone Capital, a $25B AUM investment platform. We're not flipping GPUs for a quick exit — we're building a multi-decade infrastructure business with five-year capacity contracts already in hand.
Phase 1 is operational at our Akron, OH facility. Phase 2 doubles capacity by Q3 2026. Future expansions across NA and APAC are in active planning.
"GPU is a commodity. Operating it well is not.
That gap is where infrastructure margins live."
Tell us about your workload. We'll get back to you within 24 hours with a sized configuration, an honest throughput estimate, and a price.