96 NODES LIVE · AKRON, OH · Q1 2026

The high-performance GPU cloud for teams that ship inference at scale.

Bare-metal B300 NVL8 clusters, NVFP4-optimized inference stack, and the operational know-how to extract every token of throughput your model can deliver.

Reserve compute · View benchmarks
Peak Throughput: 12,518 tok/s (DeepSeek R1 · single node)
FP4 Compute: 13,924 PFLOPS (Phase 1 · 96 nodes)
HBM3e per GPU: 288 GB (2x vs H200)
Uptime: 99.95% (30-day trailing)
// 01 — COMPUTE

Three ways to buy compute.

On-demand · B300 Single Node

8x B300 NVL8 bare-metal access. Ideal for benchmarking and short-term workloads.

  • Provisioning: < 30 min
  • Min term: 1 hour
  • Storage: 30 TB NVMe included
  • Network: 10 Gbps egress
  • Price: $4.10 / GPU·hr

Get access
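At the listed rate, node-level cost is simply the per-GPU price times the eight GPUs in an NVL8 node. A minimal sketch, using only the rates from the card above (the helper name is ours):

```python
# On-demand cost estimate: $4.10/GPU·hr across the 8 GPUs of a B300 NVL8 node.
GPU_HR_RATE = 4.10    # $ per GPU-hour, from the pricing card
GPUS_PER_NODE = 8

def node_cost(hours: float) -> float:
    """Total on-demand cost for one full node over `hours`."""
    return GPU_HR_RATE * GPUS_PER_NODE * hours

print(f"${node_cost(24):,.2f}")  # one node for a day → $787.20
```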
Managed · Inference-as-a-Service

We deploy and operate your chosen model on dedicated B300 nodes; you consume it through an OpenAI-compatible API.

  • Models: open-weight only
  • Quantization: NVFP4 / FP8
  • Region: US (APAC soon)
  • Onboarding: 2–5 days
  • Price: from $0.45 / M tokens

Discuss workload
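Because the endpoint is OpenAI-compatible, any standard chat-completions client works. A minimal sketch using only the Python standard library; the base URL and model name below are illustrative assumptions, not published values:

```python
# Build a request against an OpenAI-compatible /v1/chat/completions route.
# Base URL and model name are placeholders, not published endpoints.
import json
import urllib.request

def chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Construct the standard OpenAI-style chat-completions POST."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request("https://api.example-endpoint.ai", "YOUR_KEY", "deepseek-r1", "Hello")
# urllib.request.urlopen(req) would return the usual OpenAI-style JSON body.
```

Existing OpenAI SDK clients should also work unchanged by pointing their base URL at the service.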
// 02 — BENCHMARKS

First-party data, fully reproducible.

Numbers we'd stake our name on.

Five Tier-S open-weight models, three workload profiles, full concurrency sweep. NVFP4 quantization, SGLang serving, TP=8 across 8x B300 NVL8.

Implied gross margin compares our self-hosted inference cost against OpenRouter's listed pricing for the same model. The spread shows where the market currently prices inference operations.
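One plausible way to reconstruct the cost side of that comparison, assuming the on-demand $4.10/GPU·hr rate and full utilization; the report's exact cost basis may differ:

```python
# Rough implied-margin model: serving cost per million tokens derived from
# node throughput and GPU pricing, compared against a market price per M tokens.
# The $4.10/GPU·hr rate is the on-demand list price; this is a sketch, not
# the report's exact cost model.

def cost_per_m_tokens(gpu_hr_rate: float, gpus: int, tok_per_s: float) -> float:
    """Self-hosted cost in $ per million tokens at full utilization."""
    tokens_per_hour = tok_per_s * 3600
    return (gpu_hr_rate * gpus) / (tokens_per_hour / 1e6)

def implied_margin(market_price_per_m: float, cost_per_m: float) -> float:
    """Gross margin if tokens sell at the market price."""
    return 1 - cost_per_m / market_price_per_m

# DeepSeek R1 at 12,518 tok/s on an 8-GPU node:
cost = cost_per_m_tokens(4.10, 8, 12_518)  # ≈ $0.73 per M tokens
```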

DATA · 2026-04-15 · DRIVER 595.58.03
REPO · github.com/cocloud/b300-benchmark
Model            1k/1k Tok/s   Tok/s/GPU   Margin
Qwen 3.5 397B         11,124       1,391    88.9%
DeepSeek R1           12,518       1,565    83.8%
GLM-5.1                8,953       1,119    77.8%
MiniMax M2.7           9,710       1,214    61.9%
Kimi K2.5              2,523         315    29.0%
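The per-GPU column follows directly from node throughput: each NVL8 node holds eight GPUs, so it is node tok/s divided by 8, rounded half-up:

```python
# Tok/s/GPU = node throughput / 8 (one NVL8 node = 8 GPUs). Rounded half-up
# to match the published figures; Python's round() rounds half-to-even, so
# int(x + 0.5) is used instead.
node_tok_s = {
    "Qwen 3.5 397B": 11_124,
    "DeepSeek R1": 12_518,
    "GLM-5.1": 8_953,
    "MiniMax M2.7": 9_710,
    "Kimi K2.5": 2_523,
}
per_gpu = {model: int(t / 8 + 0.5) for model, t in node_tok_s.items()}
```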
// 03 — WHY COCLOUD

What you actually get besides GPUs.

01

Operational know-how, not rented racks.

We discovered the +27% throughput uplift from driver 595 by running the test ourselves. Driver pinning, NCCL tuning, framework version matching, kernel selection — these decisions ship with every cluster.

02

Bare metal, zero virtualization tax.

No hypervisor overhead. No noisy neighbors. Direct PCIe topology access for full NVLink 5 bandwidth and InfiniBand RDMA at line rate. You see exactly what the GPU sees.

03

Backed by long-term capital.

Spun out from Cornerstone Capital, a $25B AUM investment platform. We're not flipping GPUs for a quick exit — we're building a multi-decade infrastructure business with five-year capacity contracts already in hand.

Built for steady state, not press releases.

Phase 1 is operational at our Akron, OH facility. Phase 2 doubles capacity by Q3 2026. Future expansions across NA and APAC are in active planning.

  • Akron, Ohio: Tier 3, 3 MW dedicated, dark fiber to 6 major IXPs
  • Power: 100% covered by signed 5-year PPA, no rolling exposure
  • Network: 3.2 Tb/s InfiniBand fabric, sub-2µs latency rail-to-rail
  • Operations: 24/7 NOC + on-site SRE, < 4 hr hardware swap SLA
  • Compliance: SOC 2 Type II in audit, ISO 27001 roadmap Q4
  • Nodes: 96 → 192 (Q3 2026)
  • Datacenter power: 3 MW
  • Anchor contract term: 5 yr
  • Server-years operated: 90k
"GPU is a commodity. Operating it well is not.
That gap is where infrastructure margins live."
— FROM OUR APRIL 2026 BENCHMARK REPORT

Talk to someone who has actually shipped a B300 cluster.

Tell us about your workload. We'll come back within 24 hours with a sized configuration, an honest throughput estimate, and a price.

Or email us directly: sales@cocloud.ai