Cedana continuously identifies available resources globally, across hyperscalers, specialty compute providers and private cloud. Dispatch and orchestrate workloads across all available compute. Break down vendor-specific compute silos.
Deliver best-in-class SMR performance. We continuously optimize performance at the kernel, container, filesystem, network and interconnect layers. We run internal testing and simulation to thoroughly measure correctness, reliability and performance.
Live-migrate GPU workloads before failures happen. System-level checkpoint/restore ensures no work is lost during mid-epoch failures, even on large multi-node clusters.
Increase cluster utilization and productivity across user groups, clusters and training runs.
Improve security and availability with confidential computing containers and VMs.
Manage training runs across clusters, both on-prem and in the cloud. Resume from system-level checkpoints on GPUs anywhere.
We’ve deployed a test cluster where you can interact and experiment with the system.
Learn more about how Cedana is transforming compute orchestration and how we can help your organization.
From deploying on your cluster, to market, to GPU Checkpointing, learn our system and get started quickly.