Feathr
Feathr is an open-source feature store framework (machine learning infrastructure) for managing, serving, and reusing machine learning (ML) features across batch and real-time environments. Key capabilities include:
- Feature definition, transformation, and lifecycle management across teams (feature store).
- Online and offline feature storage and serving for training and inference (ML data management).
- Support for batch and streaming data sources with built-in transformation capabilities (data engineering).
- Integration with distributed data and compute platforms for feature computation and materialization (data processing).
- APIs, configuration, and tooling for feature discovery, reuse, and governance in ML workflows (MLOps).
More About Feathr
Feathr is an open-source feature store framework (machine learning infrastructure) that manages the end-to-end lifecycle of ML features, from definition and computation to storage and serving. It targets fragmented feature engineering, in which data scientists and engineers repeatedly rebuild similar features across projects. The result of that fragmentation is duplicated engineering effort and features whose semantics differ between training and production.
At its core, Feathr provides a declarative feature definition model (feature store) that allows teams to describe features, their data sources, and transformations in a centralized and reusable way. These definitions can reference batch data sources and streaming inputs (data engineering), enabling a single logical feature to be computed from multiple underlying systems. Feathr includes transformation capabilities such as aggregations and windowed operations, which are executed on underlying distributed data platforms (data processing).
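The idea of a declarative, windowed feature definition can be illustrated with a minimal plain-Python sketch. This is not Feathr's API: the names `Event`, `WindowedFeature`, `compute`, and `user_spend_7d` are hypothetical and exist only for illustration, and the computation runs in memory rather than on a distributed platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Event:
    """One raw record from a (hypothetical) source such as a purchases table."""
    entity_id: str
    timestamp: datetime
    value: float


@dataclass(frozen=True)
class WindowedFeature:
    """Declarative description of a feature: what to aggregate, and over which window.

    Hypothetical structure for illustration; not a Feathr class.
    """
    name: str
    source: str                                   # logical name of the data source
    window: timedelta                             # look-back window
    agg: Callable[[Iterable[float]], float]       # aggregation applied to the window


def compute(feature: WindowedFeature, events: List[Event],
            entity_id: str, as_of: datetime) -> float:
    """Evaluate the definition for one entity over the window (as_of - window, as_of]."""
    start = as_of - feature.window
    values = [
        e.value
        for e in events
        if e.entity_id == entity_id and start < e.timestamp <= as_of
    ]
    return feature.agg(values)


# The definition is data, so it can be stored centrally and reused by many models.
spend_7d = WindowedFeature(
    name="user_spend_7d", source="purchases", window=timedelta(days=7), agg=sum
)

events = [
    Event("u1", datetime(2024, 1, 1), 10.0),   # outside the 7-day window
    Event("u1", datetime(2024, 1, 5), 5.0),
    Event("u1", datetime(2024, 1, 9), 2.5),
    Event("u2", datetime(2024, 1, 9), 99.0),   # different entity
]

total = compute(spend_7d, events, "u1", datetime(2024, 1, 10))
```

Separating the definition (`WindowedFeature`) from its evaluation (`compute`) is the point of the sketch: the same definition could in principle be handed to a batch engine or a streaming job without rewriting the feature logic.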
Feathr supports both offline and online feature storage (ML data management), allowing organizations to train models using historical feature snapshots while serving low-latency features for real-time inference. The framework can materialize features into online stores and offline analytical stores using scheduled or event-driven pipelines. This dual-store pattern helps maintain consistency between training and inference data, a core requirement for production-grade ML systems.
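The dual-store pattern can be sketched in a few lines of plain Python. The sketch assumes an in-memory offline history and a dictionary standing in for the online store; `offline_store`, `point_in_time_lookup`, and `materialize_latest` are hypothetical names, not Feathr functions.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Offline store (assumed layout): full history per entity, sorted by timestamp.
History = List[Tuple[datetime, float]]

offline_store: Dict[str, History] = {
    "u1": [
        (datetime(2024, 1, 1), 0.2),
        (datetime(2024, 1, 8), 0.5),
        (datetime(2024, 1, 15), 0.9),
    ],
}


def point_in_time_lookup(store: Dict[str, History], entity_id: str,
                         as_of: datetime) -> Optional[float]:
    """Return the most recent value recorded at or before `as_of`.

    Training rows use this so a label observed at time T never sees
    feature values that were only known after T.
    """
    history = store.get(entity_id, [])
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx > 0 else None


def materialize_latest(store: Dict[str, History]) -> Dict[str, float]:
    """Copy only the newest value per entity into a key-value 'online store'."""
    return {
        entity_id: history[-1][1]
        for entity_id, history in store.items()
        if history
    }


# Training: the value as it was on 10 January.
training_value = point_in_time_lookup(offline_store, "u1", datetime(2024, 1, 10))

# Serving: a single key lookup against the materialized latest values.
online_store = materialize_latest(offline_store)
serving_value = online_store["u1"]
```

Both paths read from the same underlying history, which is what keeps training and inference values consistent; only the access pattern differs (historical lookup versus latest-value lookup).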
The project integrates with distributed compute engines such as Apache Spark (data processing), using them to execute feature computations at scale. Configuration-driven workflows allow teams to specify feature joins, backfills, and materialization jobs without embedding feature logic directly into application code. Feathr exposes APIs and tooling (MLOps) for feature registration, discovery, and access, so multiple models and services can reuse the same vetted features.
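As a rough illustration of registration, discovery, and configuration-driven reuse, the sketch below keeps a small in-memory registry and resolves a join request expressed as plain data. Every name here (`FeatureEntry`, `FeatureRegistry`, `join_config`, the team and feature names) is hypothetical and does not reflect Feathr's actual registry or configuration format.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class FeatureEntry:
    """Metadata recorded when a feature is registered (hypothetical schema)."""
    name: str
    owner: str
    tags: FrozenSet[str]


class FeatureRegistry:
    """Minimal in-memory catalog supporting registration, search, and lookup."""

    def __init__(self) -> None:
        self._entries: Dict[str, FeatureEntry] = {}

    def register(self, entry: FeatureEntry) -> None:
        # Reject duplicates so one name always means one definition.
        if entry.name in self._entries:
            raise ValueError(f"feature {entry.name!r} is already registered")
        self._entries[entry.name] = entry

    def search(self, tag: str) -> List[str]:
        """Discovery: names of all features carrying the given tag, sorted."""
        return sorted(e.name for e in self._entries.values() if tag in e.tags)

    def resolve(self, names: List[str]) -> List[FeatureEntry]:
        """Look up requested features, failing loudly on unknown names."""
        missing = [n for n in names if n not in self._entries]
        if missing:
            raise KeyError(f"unregistered features: {missing}")
        return [self._entries[n] for n in names]


registry = FeatureRegistry()
registry.register(
    FeatureEntry("user_spend_7d", "payments-team", frozenset({"user", "spend"}))
)
registry.register(
    FeatureEntry("user_login_count_30d", "identity-team", frozenset({"user", "activity"}))
)

# A join request expressed as configuration rather than application code.
join_config = {
    "key": "user_id",
    "features": ["user_spend_7d", "user_login_count_30d"],
}

resolved = registry.resolve(join_config["features"])
user_features = registry.search("user")
```

Because the join request names features instead of reimplementing them, a second model can reuse `user_spend_7d` by listing it in its own configuration.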
In enterprise environments, Feathr is typically positioned as part of an organization’s ML platform layer, serving as the shared feature catalog and serving infrastructure between data engineering systems and model training or serving stacks. It supports use cases such as recommendation systems, ranking models, fraud detection, and other prediction services that rely on consistent, up-to-date features. By standardizing how features are defined, computed, and consumed, Feathr provides a technical foundation for governance, observability, and collaboration around ML features across teams and projects.