Automate and optimize compute jobs to reduce cloud costs, increase reliability and scale beyond capacity bottlenecks

Supercharge and optimize your Deep Learning, AI and HPC jobs

How it Works

Cedana is an automated system that checkpoints and restores running programs across instances. We pre-empt and migrate your workload, resuming it on another instance exactly where it left off, without interrupting or breaking anything. Migration can span nodes, clusters, data centers, regions and vendors (multi-cloud).
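
To make the checkpoint-and-restore idea concrete, here is a minimal sketch in Python. It is not Cedana's API or mechanism: it is an application-level toy that assumes the job's state fits in a small dictionary. All names (`run_job`, `checkpoint`, `restore`, `demo`) are hypothetical. The point it illustrates is that a job resumed from a checkpoint ends in the same state as one that was never interrupted.

```python
import json
import os
import tempfile


def run_job(state, steps):
    """Advance a toy job: add the current step index to a running total."""
    for _ in range(steps):
        state["total"] += state["step"]
        state["step"] += 1
    return state


def checkpoint(state, path):
    """Save the job state: write a temp file, then rename it into place
    so a crash mid-write never leaves a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def restore(path):
    """Load the job state saved by checkpoint()."""
    with open(path) as f:
        return json.load(f)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "job.ckpt")

    # "Instance A": run 5 steps, then get pre-empted and checkpoint.
    state = run_job({"step": 0, "total": 0}, 5)
    checkpoint(state, path)

    # "Instance B": restore the checkpoint and run the remaining 5 steps.
    resumed = run_job(restore(path), 5)

    # Reference run: the same 10 steps with no interruption.
    reference = run_job({"step": 0, "total": 0}, 10)
    return resumed, reference
```

Calling `demo()` returns the resumed state and the uninterrupted reference state, which should be equal. In this toy the program has to save its own state; the system described above is meant to do this from outside, with no changes to the workload.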

  • Powerful: Make your workloads migratable. You can pre-empt and migrate your CPU/GPU workloads from one instance to another, resuming them exactly where they left off. Resize your jobs dynamically.
  • Automated policies: Automate and optimize your spot instances across price, specifications and availability.
  • Multi-cloud: Enable your workloads to take advantage of multi-cloud and on-prem with minimal engineering lift, future-proofing infrastructure plans.
  • Increase reliability: Offer higher SLAs for your services. Always-on resilience and automated policies for stateful applications bring high availability and fault tolerance against node failure, instance revocation and other failure modes.
  • Easy to use with minimal impact: No changes to your existing CI/CD, configuration or scripts. We have no dependencies on frameworks or libraries.
  • Flexible: Start with your existing cloud provider and account, then expand your options automatically from there.
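
As an illustration of the "Automated policies" point above, the following Python sketch picks a spot offer by price, specifications and availability. It is an assumption-laden toy, not Cedana's policy engine: the `Offer` type, the `pick_offer` function, and every instance name and price in `OFFERS` are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Offer:
    """One hypothetical spot offer from some provider."""
    name: str
    vcpus: int
    gpus: int
    price_per_hour: float
    available: bool


def pick_offer(offers: List[Offer], min_vcpus: int, min_gpus: int,
               max_price: float) -> Optional[Offer]:
    """Return the cheapest available offer that meets the spec and the
    price ceiling, or None if nothing qualifies."""
    candidates = [
        o for o in offers
        if o.available
        and o.vcpus >= min_vcpus
        and o.gpus >= min_gpus
        and o.price_per_hour <= max_price
    ]
    return min(candidates, key=lambda o: o.price_per_hour, default=None)


# Made-up offers, for illustration only.
OFFERS = [
    Offer("cloud-a-small", vcpus=4, gpus=0, price_per_hour=0.05, available=True),
    Offer("cloud-a-gpu", vcpus=8, gpus=1, price_per_hour=0.90, available=False),
    Offer("cloud-b-gpu", vcpus=8, gpus=1, price_per_hour=1.10, available=True),
    Offer("cloud-c-gpu", vcpus=16, gpus=2, price_per_hour=2.40, available=True),
]
```

With these offers, `pick_offer(OFFERS, min_vcpus=8, min_gpus=1, max_price=2.00)` selects `cloud-b-gpu`: the cheaper `cloud-a-gpu` is unavailable and `cloud-c-gpu` is over the price ceiling. Re-running the same selection when an instance is revoked is one simple way such a policy could decide where to migrate a checkpointed workload.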