Vespa
Vespa is an open-source engine for real-time indexing, search, and large-scale data serving. It is used to build applications with low-latency query and recommendation capabilities.
- Open-source platform for real-time search and data serving at scale
- Distributed engine for low-latency indexing, querying, and retrieval
- Support for search, recommendation, personalization, and ranking use cases
- Built-in capabilities for vector search, structured search, and text search
- Designed for horizontally scalable, fault-tolerant deployments in enterprise environments
More About Vespa
Vespa is an open-source search and data-serving platform for building applications that require real-time indexing, low-latency querying, and large-scale data serving. Enterprises and digital services use Vespa to power search, recommendation, personalization, and ranking features where both throughput and response time are strict requirements. The platform is a distributed engine that stores and serves data across clusters of machines and is designed to maintain query performance as datasets and traffic volumes grow.
Vespa supports full-text search, structured search, and vector search (used for AI retrieval), allowing organizations to combine keyword retrieval, filters, and dense vector similarity in a single query. This supports use cases such as content discovery, product search, ad targeting, and recommendation systems, where multiple ranking signals and heterogeneous data types need to be processed together. Developers define schemas and ranking profiles that determine how documents are indexed and how relevance scoring is applied at query time.
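As an illustration of combining keyword retrieval, a filter, and vector similarity in one request, the following minimal sketch builds a JSON body of the kind that could be posted to Vespa's query API. The schema name `product`, the fields `price` and `embedding`, the query tensor name `q_embedding`, and the rank profile name `hybrid` are all hypothetical and would have to match the schema of an actual application. The sketch only constructs the request and does not send it.

```python
import json


def build_hybrid_query(text, query_vector, max_price, hits=10):
    """Build a query body mixing a filter, keyword match, and vector match.

    All schema, field, tensor, and rank profile names are hypothetical.
    """
    yql = (
        "select * from product where "
        # Structured filter on a numeric field.
        f"price < {float(max_price)} and "
        # Keyword retrieval over the user's query text ...
        "(userQuery() or "
        # ... or approximate nearest-neighbor retrieval over an embedding field.
        f"({{targetHits: {int(hits)}}}nearestNeighbor(embedding, q_embedding)))"
    )
    return {
        "yql": yql,
        "query": text,                                   # consumed by userQuery()
        "input.query(q_embedding)": list(query_vector),  # query-side vector
        "ranking": "hybrid",                             # rank profile to apply
        "hits": hits,
    }


body = build_hybrid_query("running shoes", [0.1, 0.2, 0.3], max_price=100)
print(json.dumps(body, indent=2))
```

How the matched documents are scored is decided by the rank profile named in the request, so the same retrieval logic can be paired with different relevance models without changing the query.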
Vespa's distributed architecture is built on two node types: content nodes and stateless container nodes. Content nodes store indexed documents and execute matching and ranking, while container nodes handle request routing, application logic, and integration with client applications. Data is automatically partitioned and replicated across content nodes to support horizontal scalability and high availability. Vespa supports online updates, enabling applications to index new or changed documents in near real time without offline batch processing.
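To make the idea of online updates concrete, the sketch below prepares a partial update for a single document through Vespa's `/document/v1` HTTP API, assigning new values to selected fields. The endpoint `http://localhost:8080`, the namespace `shop`, the document type `product`, the document ID `sku-123`, and the field `price` are assumptions chosen for illustration. The request is only constructed here, not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_partial_update(endpoint, namespace, doc_type, doc_id, fields):
    """Prepare a PUT request that assigns new values to the given fields.

    Endpoint, namespace, document type, ID, and field names are hypothetical.
    """
    url = (
        f"{endpoint}/document/v1/"
        f"{quote(namespace, safe='')}/{quote(doc_type, safe='')}"
        f"/docid/{quote(doc_id, safe='')}"
    )
    # Each field carries an "assign" operation holding its new value.
    body = {"fields": {name: {"assign": value} for name, value in fields.items()}}
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_partial_update(
    "http://localhost:8080", "shop", "product", "sku-123", {"price": 79.0}
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the prepared request to `urllib.request.urlopen` would submit it to a running instance. Because only the listed fields are touched, a change such as a price update does not require resending the whole document.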
The platform provides ranking and relevance capabilities that combine traditional information retrieval techniques with machine-learned models. Vespa supports ranking expressions and deployment of Machine Learning (ML) models exported from common ML frameworks, enabling inference at query time close to the data. This allows organizations to operationalize recommendation models, click-through prediction models, or relevance models directly inside the serving layer, avoiding separate online inference infrastructure for many use cases.
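The way a ranking function can blend a traditional retrieval score with learned signals can be sketched in plain Python. This is not Vespa's ranking expression language or its internal implementation: the feature names, weights, and bias below are invented for illustration, and in a real deployment the equivalent logic would live in a rank profile evaluated on the content nodes.

```python
import math
from dataclasses import dataclass


@dataclass
class Match:
    doc_id: str
    text_score: float   # keyword relevance signal (hypothetical feature)
    closeness: float    # vector similarity in [0, 1] (hypothetical feature)
    popularity: float   # document attribute (hypothetical feature)


# Hypothetical weights, standing in for a small learned click-through model.
WEIGHTS = {"text_score": 0.5, "closeness": 2.0, "popularity": 0.3}
BIAS = -1.0


def relevance(match):
    """Logistic combination of the ranking signals, giving a score in (0, 1)."""
    z = BIAS + sum(w * getattr(match, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank(matches, hits=2):
    """Return the top `hits` matches ordered by descending relevance."""
    return sorted(matches, key=relevance, reverse=True)[:hits]


candidates = [
    Match("a", text_score=2.0, closeness=0.1, popularity=1.0),
    Match("b", text_score=0.5, closeness=0.9, popularity=0.0),
    Match("c", text_score=0.0, closeness=0.2, popularity=0.5),
]
top = rank(candidates)
print([m.doc_id for m in top])  # → ['b', 'a']
```

Document "b" ranks first despite its weaker keyword score because the vector similarity signal carries the largest weight, which is the kind of trade-off a ranking profile makes explicit.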
Vespa integrates with containerized and cloud-native environments through support for orchestration platforms such as Kubernetes. Application packages describe schemas, services, and deployment settings, and can be rolled out through automated pipelines. Logs, metrics, and administrative APIs support monitoring, capacity planning, and operational management. In an enterprise IT taxonomy, Vespa can be categorized under search infrastructure, real-time data serving, vector database and retrieval, and recommendation and personalization serving.