Rebuff

Rebuff is an open-source application security framework for detecting and mitigating prompt injection and related adversarial inputs in Large Language Model (LLM) applications, maintained under the Protect AI ecosystem.

  • Rule- and pattern-based defenses against prompt injection, prompt leaking, and jailbreak attempts.
  • Runtime inspection and filtering of user prompts and model responses before they reach downstream systems or tools.
  • Extensible policy and defense configuration for adapting protections to application-specific threat models.
  • Integration with LLM-based workflows and agents, including API- and library-level hooks.
  • Support for evaluating and testing model behavior against common prompt-based attack techniques.

More About Rebuff

Rebuff operates in the application security and machine learning (ML) application security domains, addressing threats that arise when LLMs consume untrusted input from users, tools, or integrated systems. It focuses on prompt injection, jailbreak prompts, prompt leaking, and related adversarial patterns that attempt to override system instructions, exfiltrate hidden context, or trigger unintended model behavior. The framework is part of the broader Protect AI portfolio for ML and AI security.

The core purpose of Rebuff is to provide detection and mitigation controls that sit in the LLM request and response path. It analyzes prompts and, in many deployments, generated outputs, looking for token- and phrase-level patterns, structural cues, and other indicators associated with known attack classes. These controls can be applied as a pre-processing layer before data is sent to a model, or as a post-processing layer that examines and optionally filters or annotates responses before they are used by downstream tools, APIs, or end users.
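The pre-processing and post-processing layers described above can be illustrated with a minimal sketch. This is not Rebuff's API: the function names (`scan_prompt`, `scan_response`, `guarded_call`), the two regular expressions, and the `secret_marker` convention are all hypothetical and chosen only to show where such checks sit in the request and response path.

```python
import re
from typing import Callable

# Hypothetical patterns for illustration only; not Rebuff's actual rule set.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"reveal (your|the) (system|hidden) prompt", re.I),
]


def scan_prompt(prompt: str) -> list[str]:
    """Return the source of every pattern that matches the incoming prompt."""
    return [p.pattern for p in INJECTION_PATTERNS if p.search(prompt)]


def scan_response(response: str, secret_marker: str) -> bool:
    """Return True if the response contains a marker planted in hidden context."""
    return secret_marker in response


def guarded_call(prompt: str, model: Callable[[str], str], secret_marker: str) -> str:
    """Wrap a model call with a pre-processing and a post-processing check."""
    # Pre-processing: stop suspicious prompts before the model sees them.
    if scan_prompt(prompt):
        return "[blocked: suspicious prompt]"
    response = model(prompt)
    # Post-processing: filter responses that leak the hidden marker.
    if scan_response(response, secret_marker):
        return "[filtered: response leaked hidden context]"
    return response
```

Here `model` is any callable that takes a prompt and returns text, so the same wrapper could sit in front of a provider SDK call or a local model. A real deployment would likely annotate or log rather than return placeholder strings.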

Rebuff exposes capabilities that developers can embed directly into LLM-powered applications. Typical integrations include middleware in HTTP or gRPC services, custom guards around calls to foundation model APIs, and security checks inside orchestration code such as agents, chains, or workflow engines. Configuration is policy-driven, allowing teams to tune detection strictness, define allowlists and blocklists, and specify actions such as blocking, logging, or flagging for review when suspicious prompts are detected.
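As a rough illustration of what policy-driven configuration can look like, the sketch below maps a detection score to an action using thresholds, an allowlist, and a blocklist. The `Policy` fields, the `Action` values, the default thresholds, and the precedence order (blocklist, then allowlist, then score) are assumptions made for this example and do not describe Rebuff's actual configuration schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    LOG = "log"
    FLAG = "flag"
    BLOCK = "block"


@dataclass
class Policy:
    # Hypothetical knobs; a score is assumed to lie in [0.0, 1.0].
    block_threshold: float = 0.8
    flag_threshold: float = 0.5
    log_threshold: float = 0.2
    allowlist: set[str] = field(default_factory=set)  # exact known-good prompts
    blocklist: set[str] = field(default_factory=set)  # forbidden substrings


def decide(prompt: str, score: float, policy: Policy) -> Action:
    """Map a prompt and its detection score to an action under the policy."""
    text = prompt.strip().lower()
    # Blocklisted substrings always win, regardless of score.
    if any(term in text for term in policy.blocklist):
        return Action.BLOCK
    # Exact allowlisted prompts bypass score-based checks.
    if text in policy.allowlist:
        return Action.ALLOW
    if score >= policy.block_threshold:
        return Action.BLOCK
    if score >= policy.flag_threshold:
        return Action.FLAG
    if score >= policy.log_threshold:
        return Action.LOG
    return Action.ALLOW
```

Lowering `block_threshold` makes the policy stricter; checking the blocklist before the allowlist is one defensible ordering, since it prevents an allowlist entry from overriding an explicit prohibition.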

In enterprise and institutional environments, Rebuff fits into an AI security stack alongside identity and access controls, data protection mechanisms, and observability tools. Security and platform teams can use it to enforce standardized guardrails around all LLM usage, independent of the underlying model provider or orchestration framework. Because it is open source, Rebuff can be deployed on-premises or within private cloud environments, which can help meet data residency, compliance, or internal security requirements.

From an architectural standpoint, Rebuff functions as a lightweight security layer that interoperates with common LLM APIs and frameworks. It is typically integrated as a library within application code, but it can also be wrapped into shared services or gateways that front multiple LLM workloads. Its extensibility lets organizations write custom detectors or plug in additional checks tailored to their domain, while still relying on baseline protections for common prompt injection patterns documented by Protect AI and the broader security community.
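One way to picture custom detectors running alongside baseline protections is a small registry that holds scoring functions. The sketch below is an assumption-laden illustration, not Rebuff's extension mechanism: `DetectorRegistry`, `baseline_injection_detector`, and `account_dump_detector` are hypothetical names, and the scores they return are arbitrary.

```python
import re
from typing import Callable, Dict

# A detector is assumed to be any function that maps text to a score in [0, 1].
Detector = Callable[[str], float]


def baseline_injection_detector(text: str) -> float:
    """Hypothetical baseline check for a common injection phrase."""
    pattern = r"ignore (all |any )?(previous|prior|above) instructions"
    return 1.0 if re.search(pattern, text, re.IGNORECASE) else 0.0


class DetectorRegistry:
    """Holds a baseline detector plus any custom detectors registered later."""

    def __init__(self) -> None:
        self._detectors: Dict[str, Detector] = {
            "baseline_injection": baseline_injection_detector
        }

    def register(self, name: str, detector: Detector) -> None:
        if name in self._detectors:
            raise ValueError(f"detector already registered: {name}")
        self._detectors[name] = detector

    def scores(self, text: str) -> Dict[str, float]:
        """Run every detector and return its score keyed by name."""
        return {name: d(text) for name, d in self._detectors.items()}

    def max_score(self, text: str) -> float:
        """Return the highest score across all registered detectors."""
        return max(self.scores(text).values())


def account_dump_detector(text: str) -> float:
    """Hypothetical domain-specific check for bulk account-number requests."""
    pattern = r"\b(list|dump|export)\b.*\baccount numbers?\b"
    return 0.9 if re.search(pattern, text, re.IGNORECASE) else 0.0


registry = DetectorRegistry()
registry.register("account_dump", account_dump_detector)
```

Taking the maximum score is a conservative aggregation choice: any single detector can trigger a response. A weighted combination would be an alternative where detectors are noisy.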

Within a technical directory, Rebuff is categorized under application security, ML application security, and security testing for LLMs. It serves AI platform teams, security engineers, and software developers who need programmatic controls to inspect, constrain, and monitor how LLMs interpret and act on natural-language instructions originating from untrusted or semi-trusted sources.