Crusoe Launches Managed Inference, Delivering Speed for Production AI
Crusoe Cloud service delivers 9.9x faster time-to-first-token, 5x higher throughput, and seamless scaling with flexible token-based pricing
Crusoe, an infrastructure provider for Artificial Intelligence (AI), announced the General Availability (GA) of Crusoe Managed Inference. This service is tailored for running model inference on Crusoe Cloud, featuring ultra-low latency and improved time-to-first-token speed. It aims to support demanding workloads, enabling AI developers to deploy and scale production-ready models for tasks like AI agents and complex automation.
The service's performance relies on Crusoe's proprietary inference engine, which integrates MemoryAlloy technology. MemoryAlloy implements a cluster-wide key-value (KV) cache that lets GPUs reuse prefix caches held on other nodes, which reduces redundant prefill computation. As a result, inference becomes faster and more economical for developers.
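To illustrate the general idea of a shared prefix cache, here is a minimal Python sketch. It is not Crusoe's implementation, and the announcement does not describe MemoryAlloy's internals. The names `ClusterPrefixCache`, `block_keys`, `tokens_to_prefill`, and the block size are hypothetical, chosen only to show how a cluster-wide index can shrink the number of tokens a new request must prefill.

```python
from dataclasses import dataclass, field

BLOCK = 4  # tokens per cache block (hypothetical value for illustration)


def block_keys(tokens):
    """Return one key per full block; each key covers the whole prefix up to that block."""
    return [hash(tuple(tokens[:end])) for end in range(BLOCK, len(tokens) + 1, BLOCK)]


@dataclass
class ClusterPrefixCache:
    """Toy cluster-wide index mapping a prefix key to the node that holds its KV data."""
    index: dict = field(default_factory=dict)

    def publish(self, node, tokens):
        # Record that `node` holds KV blocks for every full-block prefix of `tokens`.
        for key in block_keys(tokens):
            self.index.setdefault(key, node)

    def cached_prefix_len(self, tokens):
        # Length of the longest block-aligned prefix already cached somewhere in the cluster.
        matched = 0
        for i, key in enumerate(block_keys(tokens)):
            if key not in self.index:
                break
            matched = (i + 1) * BLOCK
        return matched


def tokens_to_prefill(cache, tokens):
    """Tokens that still need prefill after reusing any cached prefix."""
    return len(tokens) - cache.cached_prefix_len(tokens)


cache = ClusterPrefixCache()
system_prompt = list(range(8))           # shared prompt prefix, as token ids
first = system_prompt + [100, 101]
cold = tokens_to_prefill(cache, first)   # nothing cached yet
cache.publish("node-a", first)
second = system_prompt + [200, 201, 202]
warm = tokens_to_prefill(cache, second)  # shared prefix is reused
print(cold, warm)
```

In this toy model the second request skips the eight tokens of the shared prompt, even if it is served by a different node than the first.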
Erwan Menard, SVP of Product at Crusoe, emphasized the difficulty developers face in balancing inference speed against infrastructure costs. He stated, “With Crusoe Managed Inference, we are not just hosting models; we are solving the most complex parts of the inference stack for AI developers.” He also highlighted how MemoryAlloy enables these performance gains and streamlines the process of building large-scale AI applications.
Crusoe Managed Inference aims to simplify the transition from model to production for AI developers, offering measurable performance enhancements and flexible pricing structures:
- Breakthrough speed: Delivers 9.9x faster time-to-first-token through MemoryAlloy's intelligent routing.
- Superior throughput: Processes five times the tokens per second by leveraging dynamic batching.
- Seamless scaling: Provides pay-per-token and provisioned throughput options to accommodate varying workload demands.
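Dynamic batching, mentioned in the throughput item above, generally means grouping requests that arrive close together so one forward pass serves several of them. The following Python sketch shows one common policy under stated assumptions: a batch closes when it is full or when its oldest request has waited too long. The function name `form_batches` and the parameters `max_batch` and `max_wait` are hypothetical; the announcement does not say how Crusoe's engine batches requests.

```python
def form_batches(arrivals, max_batch=4, max_wait=0.010):
    """Group sorted request arrival times (in seconds) into batches.

    A batch is closed when it already holds `max_batch` requests, or when the
    next request arrives more than `max_wait` seconds after the batch's first one.
    """
    batches, current = [], []
    for t in arrivals:
        if current and (len(current) == max_batch or t - current[0] > max_wait):
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


arrivals = [0.000, 0.002, 0.004, 0.020, 0.021, 0.022, 0.023, 0.024]
batches = form_batches(arrivals)
print([len(b) for b in batches])
```

A larger `max_batch` tends to raise throughput, while a smaller `max_wait` bounds the extra latency any single request pays while waiting for the batch to fill.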
The new service is available through the Crusoe Intelligence Foundry, which is designed to shorten the path to production. It offers AI developers easy access to top models and streamlined operations:
- Leading open-source models: Users can run a range of top models, including Meta's Llama, along with other offerings from exclusive labs.
- Managed endpoints: Endpoints are tuned for specific models to maximize performance.
- Production-scale deployments: Users can monitor metrics and optimize throughput.
- Unified interface: An integrated environment simplifies the switch between inference tasks and infrastructure resources.
Industry feedback highlights the service's potential. Roey Lalazar, co-founder and CTO at Wonderful.AI, noted its ability to address major difficulties in large-scale inference thanks to MemoryAlloy. Dhruv Batra, Chief Scientist at Yutori, expressed enthusiasm for the performance enhancements offered. Additionally, Grant Jensen, Co-Founder and CEO at Oaklet, pointed to the service's reliability in meeting the rigorous demands of clinical environments.
AI developers can now access Crusoe Managed Inference through the Crusoe Intelligence Foundry and capitalize on a collection of advanced models.