# SafetyFilter
SafetyFilter is a two-part content filtering system that prevents destructive payloads from being sent to target systems (Input Filter) and protects sensitive information from being exposed to AI agents (Output Filter). SafetyFilter rules are part of the Policy layer and cannot be modified at runtime.
## Input filter
The Input Filter inspects outgoing HTTP requests (body, URL, query string, headers) against regex rules and blocks or logs matches before the request reaches the target.
### How it works

When a request is about to be sent (via `resend`, `fuzz`, `macro`, or intercept `modify_and_forward`), the Input Filter:

- Evaluates the request body, URL, query string, and headers against all input rules, checking each rule only against the targets it lists
- If a rule with the `block` action matches, rejects the request with an error
- If a rule with the `log_only` action matches, logs the match and lets the request proceed
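The evaluation loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not SafetyFilter's implementation: `InputRule`, `BlockedError`, and `check_request` are hypothetical names, the patterns are made up for the example (they are not the built-in preset regexes), and Python's `re` module stands in for whatever regex engine the proxy actually uses. Rules are checked in order; a `block` match stops the request immediately, while a `log_only` match is recorded and evaluation continues.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class InputRule:
    """Hypothetical in-memory form of one input rule from the config file."""
    id: str
    pattern: re.Pattern
    targets: tuple  # e.g. ("body", "url", "query")
    action: str     # "block" or "log_only"


class BlockedError(Exception):
    """Raised when a rule with the block action matches."""

    def __init__(self, rule_id, target):
        super().__init__(f"blocked by {rule_id} in {target}")
        self.rule_id = rule_id
        self.target = target


def check_request(rules, parts, log):
    """Apply every input rule to the request parts it targets.

    parts maps a target name ("body", "url", ...) to the text to inspect.
    Returns True if the request may be sent; raises BlockedError otherwise.
    """
    for rule in rules:
        for target in rule.targets:
            if rule.pattern.search(parts.get(target, "")):
                if rule.action == "block":
                    raise BlockedError(rule.id, target)
                log.append((rule.id, target))  # log_only: record, keep going
                break  # one log entry per rule is enough for this sketch
    return True


# Illustrative patterns only; these are NOT the built-in preset regexes.
rules = [
    InputRule("destructive-sql:drop",
              re.compile(r"(?i)\bDROP\s+(TABLE|DATABASE)\b"),
              ("body", "url", "query"), "block"),
    InputRule("audit:select", re.compile(r"(?i)\bSELECT\b"), ("body",), "log_only"),
]
```

Usage: `check_request(rules, {"body": "SELECT 1"}, log)` returns `True` and appends one `log_only` entry, while a query string containing `drop table users` raises `BlockedError`.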
### Built-in presets

| Preset | Rules | Description |
|---|---|---|
| `destructive-sql` | 6 rules | DROP TABLE/DATABASE/INDEX/VIEW/SCHEMA, TRUNCATE TABLE, DELETE without WHERE, UPDATE WHERE 1=1, ALTER TABLE DROP, `xp_` stored procedures |
| `destructive-os-command` | 5 rules | rm -rf, shutdown/reboot/halt/poweroff, mkfs, dd if=, Windows format |
### Input targets

| Target | Description |
|---|---|
| `body` | Request body content |
| `url` | Full URL string |
| `query` | Query string portion of the URL |
| `header` | Individual header values (use `header:Name` for specific headers) |
| `headers` | All header values concatenated |
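A minimal sketch of how each target could map onto parts of a request, assuming headers arrive as a list of `(name, value)` pairs. The function name `extract_target` and the newline separator used for `headers` are assumptions; the page does not say how header values are joined. Returning a list keeps the difference between `header` (each value checked on its own) and `headers` (one concatenated string), which matters for patterns that could span two header values.

```python
from urllib.parse import urlsplit


def extract_target(target, url, headers, body):
    """Return the strings a rule with this target would inspect (hypothetical)."""
    if target == "body":
        return [body]
    if target == "url":
        return [url]
    if target == "query":
        return [urlsplit(url).query]
    if target == "header":
        # Each header value is a separate candidate.
        return [value for _, value in headers]
    if target.startswith("header:"):
        # header:Name selects one header; HTTP header names are case-insensitive.
        wanted = target.split(":", 1)[1].lower()
        return [value for name, value in headers if name.lower() == wanted]
    if target == "headers":
        # Assumption: values joined with "\n"; the real separator is not documented.
        return ["\n".join(value for _, value in headers)]
    raise ValueError(f"unknown target: {target}")
```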
### Blocked response

When the Input Filter blocks a request at the proxy layer, the client receives:

- Status: `403 Forbidden`
- Header: `X-Block-Reason: safety_filter`
- Body: JSON with violation details (rule ID, rule name, match location)

When it blocks an MCP tool operation, the tool returns an MCP error response with the violation details.
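The proxy-layer block response could be assembled as below. The status code and the `X-Block-Reason` header come from the list above; `blocked_response` and the JSON field names (`rule_id`, `rule_name`, `location`) are hypothetical, since the page only says the body carries the rule ID, rule name, and match location.

```python
import json


def blocked_response(rule_id, rule_name, location):
    """Build a (status, headers, body) triple for a blocked request.

    Field names in the JSON body are hypothetical placeholders.
    """
    body = json.dumps({
        "rule_id": rule_id,
        "rule_name": rule_name,
        "location": location,
    })
    headers = {"X-Block-Reason": "safety_filter"}
    return 403, headers, body
```

A client can tell a SafetyFilter block apart from an ordinary `403` returned by the target by checking for the `X-Block-Reason` header rather than the status code alone.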
## Output filter
The Output Filter prevents sensitive information (PII) from being exposed to AI agents. It inspects response bodies and headers against regex rules and masks matching content before returning data to the agent.
### How it works
- Proxy layer: Response body and headers are masked before returning to the client
- MCP tool layer: Query results, resend responses, fuzz results, intercept queue entries, compare diffs, and export data are masked before returning to the AI agent
- Raw data preserved: The Flow Store always contains the original unmasked data for human review via the Web UI
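A sketch of the split between what the agent sees and what the Flow Store keeps, also illustrating the `mask` and `log_only` actions described under Output actions. `OutputRule`, `deliver_to_agent`, the dict standing in for the Flow Store, and both patterns are hypothetical (they are not the built-in PII preset regexes). This version uses the replacement string literally and does not expand capture-group references.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputRule:
    """Hypothetical in-memory form of one output rule."""
    id: str
    pattern: re.Pattern
    action: str            # "mask" or "log_only"
    replacement: str = ""


flow_store = {}  # hypothetical stand-in for the Flow Store


def deliver_to_agent(flow_id, raw_body, rules, log):
    """Store the raw body unmodified, then return a masked copy for the agent."""
    flow_store[flow_id] = raw_body  # humans review this copy via the Web UI
    view = raw_body
    for rule in rules:
        if not rule.pattern.search(view):
            continue
        if rule.action == "mask":
            # Replacement inserted literally (no $1 / ${name} expansion here).
            view = rule.pattern.sub(lambda m: rule.replacement, view)
        else:  # log_only: note the match, leave the data unchanged
            log.append((flow_id, rule.id))
    return view


# Illustrative patterns only; not the built-in PII preset regexes.
rules = [
    OutputRule("email", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
               "mask", "[MASKED:email]"),
    OutputRule("audit:phone", re.compile(r"\b0\d{1,3}-\d{4}-\d{4}\b"), "log_only"),
]
```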
### Built-in PII presets

| Preset | Rules | Description |
|---|---|---|
| `credit-card` | 2 rules | Credit card numbers in separated (1234-5678-9012-3456) and continuous (1234567890123456) formats |
| `japan-my-number` | 1 rule | Japanese My Number (12-digit individual number) with check digit validation |
| `email` | 1 rule | Email addresses (user@example.com) |
| `japan-phone` | 2 rules | Japanese phone numbers in mobile (090-1234-5678) and landline (03-1234-5678) formats |
### Validators
Some presets use validator functions for additional verification beyond regex matching, reducing false positives:
- credit-card (continuous): Luhn algorithm check -- only masks digit sequences that pass the Luhn checksum
- japan-my-number: Check digit validation -- only masks 12-digit sequences with a valid My Number check digit
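Both validators are standard checksum algorithms. The sketch below implements the Luhn check and the My Number check digit (weights 2 to 7 for the first six digits counted from the right of the 11-digit base, then 2 to 6 for the remaining five; a remainder of 0 or 1 mod 11 gives check digit 0, otherwise the check digit is 11 minus the remainder), and shows how a validator can gate masking. The function names and the 16-digit regex are illustrative assumptions; the presets' actual patterns are not shown on this page.

```python
import re


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit counting from the right."""
    if not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def my_number_valid(digits: str) -> bool:
    """Validate the check digit (last digit) of a 12-digit My Number."""
    if len(digits) != 12 or not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    # n counts positions 1..11 from the right of the first 11 digits.
    for n, ch in enumerate(reversed(digits[:11]), start=1):
        weight = n + 1 if n <= 6 else n - 5
        total += int(ch) * weight
    remainder = total % 11
    expected = 0 if remainder <= 1 else 11 - remainder
    return expected == int(digits[11])


def mask_validated(pattern, validator, text, replacement):
    """Mask regex matches only when the validator accepts them."""
    return pattern.sub(
        lambda m: replacement if validator(m.group(0)) else m.group(0), text)


# Illustrative regex; not the preset's actual pattern.
CONTINUOUS_CARD = re.compile(r"\b\d{16}\b")
```

With this gate, `4111111111111111` (a Luhn-valid test number) is masked while `4111111111111112` is left alone, which is the false-positive reduction the list above describes.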
### Output actions

| Action | Description |
|---|---|
| `mask` | Replace matched content with the replacement string |
| `log_only` | Log the match but return data unmodified |
### Replacement strings

Replacement strings support regex capture group references:

| Syntax | Description |
|---|---|
| `[MASKED:credit_card]` | Static replacement |
| `$1` | First capture group |
| `${name}` | Named capture group |
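The `$1` and `${name}` references behave like template expansion over the match's capture groups. A minimal Python sketch, assuming unknown or non-participating groups expand to an empty string (the page does not specify this); `expand` and `apply_mask` are hypothetical helpers, and escaping a literal `$` is not modeled.

```python
import re

# A ${name} or $N reference inside a replacement template.
_REF = re.compile(r"\$\{(\w+)\}|\$(\d+)")


def expand(template: str, match: re.Match) -> str:
    """Substitute $N and ${name} in template using groups from match."""
    def resolve(ref):
        key = ref.group(1) or ref.group(2)
        if key.isdigit():
            key = int(key)          # ${1} and $1 both mean group 1
        try:
            value = match.group(key)
        except IndexError:          # no such group in the rule's pattern
            value = None
        return value or ""          # assumption: unknown/unmatched -> ""
    return _REF.sub(resolve, template)


def apply_mask(pattern, template, text):
    """Replace every match of pattern with the expanded template."""
    return pattern.sub(lambda m: expand(template, m), text)


# Example patterns for illustration only.
card = re.compile(r"(\d{4})-(\d{4})-(\d{4})-(?P<last>\d{4})")
api_key = re.compile(r"(sk-)[a-zA-Z0-9]{32,}")
```

Keeping part of the match (for example the last four card digits via `${last}`, or the `sk-` prefix via `$1`) lets an agent tell masked values apart without seeing the sensitive portion.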
## Two-layer architecture
Like Target scope and Rate limits, SafetyFilter uses the Policy + Agent two-layer architecture:
- Policy layer: SafetyFilter rules are defined in the config file and are immutable at runtime
- Agent layer: AI agents can view rules via `get_safety_filter` but cannot modify them
This ensures that safety boundaries remain enforced regardless of agent behavior.
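One way to picture the two layers: the policy is loaded once from the config into read-only objects, and the agent-facing call returns a copy. The sketch uses frozen dataclasses and tuples; `Rule`, `Policy`, `load_policy`, and this Python `get_safety_filter` function are hypothetical stand-ins. In a real proxy the guarantee comes from there being no write path exposed to agents, not from Python's immutability, which can be bypassed.

```python
from dataclasses import asdict, dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Rule:
    id: str
    action: str


@dataclass(frozen=True)
class Policy:
    input_rules: tuple
    output_rules: tuple


def load_policy(config: dict) -> Policy:
    """Policy layer: built once at startup from the config file."""
    return Policy(
        tuple(Rule(r["id"], r["action"]) for r in config["input_rules"]),
        tuple(Rule(r["id"], r["action"]) for r in config["output_rules"]),
    )


def get_safety_filter(policy: Policy) -> dict:
    """Agent layer: a fresh copy, so editing it cannot change the policy."""
    return {
        "input_rules": [asdict(r) for r in policy.input_rules],
        "output_rules": [asdict(r) for r in policy.output_rules],
        "immutable": True,
    }


policy = load_policy({
    "input_rules": [{"id": "destructive-sql:drop", "action": "block"}],
    "output_rules": [],
})
```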
## Enforcement points
The Input Filter is enforced at every point where the proxy sends data to external targets:
| Tool | Enforcement |
|---|---|
| resend | Body, URL, and headers checked before sending |
| resend_raw | Raw bytes checked before sending |
| fuzz | Template flow checked at start; each expanded payload checked before sending |
| macro | Each step's outbound request checked before sending |
| intercept | modify_and_forward body, URL, and headers checked before forwarding |
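The `fuzz` row above implies two checkpoints: one on the template before the job starts, and one on every expanded request. A sketch, assuming a hypothetical `{{payload}}` placeholder and assuming that a blocked payload is skipped while the rest of the job continues (the page does not say whether the job aborts instead). `run_fuzz`, `is_blocked`, and the pattern are illustrative, not the tool's API or a preset rule.

```python
import re

PLACEHOLDER = "{{payload}}"  # hypothetical marker; real fuzz syntax may differ


def run_fuzz(template, payloads, is_blocked, send):
    """Check the template once, then check each expanded request before sending."""
    if is_blocked(template):
        raise PermissionError("fuzz template blocked by SafetyFilter")
    results = []
    for payload in payloads:
        request = template.replace(PLACEHOLDER, payload)
        if is_blocked(request):
            # Assumption: skip this payload and continue with the rest.
            results.append((payload, "blocked"))
            continue
        results.append((payload, send(request)))
    return results


# Illustrative rule, not the built-in preset.
_drop = re.compile(r"(?i)\bDROP\s+TABLE\b")


def is_blocked(request):
    return bool(_drop.search(request))
```

Checking after expansion matters because a harmless template can still produce a destructive request once a payload is substituted in.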
The Output Filter is enforced at every point where data is returned to the AI agent:
| Source | Enforcement |
|---|---|
| Proxy responses | Body and headers masked before returning to client |
| query results | Flow data masked in query output |
| resend results | Response body and headers masked |
| fuzz results | Response data masked |
| compare diffs | Header diff values masked |
| export data | Inline export data masked |
## Viewing current rules

Use the security tool to view the current SafetyFilter configuration.

Example response:
```json
{
  "enabled": true,
  "input_rules": [
    {
      "id": "destructive-sql:drop",
      "name": "DROP statement",
      "pattern": "(compiled regex)",
      "targets": ["body", "url", "query"],
      "action": "block",
      "category": "destructive-sql"
    }
  ],
  "output_rules": [
    {
      "id": "credit-card:separated",
      "name": "Credit card (separated)",
      "pattern": "(compiled regex)",
      "targets": ["body"],
      "action": "mask",
      "replacement": "[MASKED:credit_card]",
      "category": "credit-card"
    }
  ],
  "immutable": true
}
```
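A client consuming this response might, for example, group rules by action or list which targets a blocking rule inspects before building a payload. The helper names below are hypothetical, and the embedded JSON is a trimmed copy of the example response.

```python
import json

# Trimmed copy of the example response.
response_text = """
{
  "enabled": true,
  "input_rules": [
    {"id": "destructive-sql:drop", "action": "block",
     "targets": ["body", "url", "query"]}
  ],
  "output_rules": [
    {"id": "credit-card:separated", "action": "mask", "targets": ["body"]}
  ],
  "immutable": true
}
"""


def rules_by_action(resp):
    """Map (direction, action) to the list of rule IDs."""
    summary = {}
    for direction in ("input_rules", "output_rules"):
        for rule in resp.get(direction, []):
            summary.setdefault((direction, rule["action"]), []).append(rule["id"])
    return summary


def blocking_targets(resp):
    """Sorted targets inspected by at least one blocking input rule."""
    return sorted({t for r in resp.get("input_rules", [])
                   if r["action"] == "block" for t in r["targets"]})


config = json.loads(response_text)
```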
## Custom rules
Custom rules are defined in the config file.
### Custom input rule

```json
{
  "id": "custom-api",
  "name": "Dangerous API endpoint",
  "pattern": "(?i)/api/v[0-9]+/(delete-all|reset)",
  "targets": ["url"]
}
```
### Custom output rule

```json
{
  "id": "custom-api-key",
  "name": "API key pattern",
  "pattern": "(sk-[a-zA-Z0-9]{32,})",
  "targets": ["body"],
  "action": "mask",
  "replacement": "[MASKED:api_key]"
}
```
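Before adding custom rules to the config file, it can help to try their patterns against sample input. The sketch below loads the two rules above and exercises them with Python's `re`; the proxy's regex dialect is not stated on this page, so a pattern that compiles here is not guaranteed to behave identically there. The custom input rule omits `action`, and the default is not documented on this page, so the sketch only reports whether the URL matches. `url_matches` and `mask_body` are hypothetical helpers.

```python
import json
import re

# The two custom rules, copied from the config examples.
custom_input = json.loads("""{
  "id": "custom-api",
  "name": "Dangerous API endpoint",
  "pattern": "(?i)/api/v[0-9]+/(delete-all|reset)",
  "targets": ["url"]
}""")
custom_output = json.loads("""{
  "id": "custom-api-key",
  "name": "API key pattern",
  "pattern": "(sk-[a-zA-Z0-9]{32,})",
  "targets": ["body"],
  "action": "mask",
  "replacement": "[MASKED:api_key]"
}""")

# Compiling up front surfaces pattern syntax errors before deployment.
input_re = re.compile(custom_input["pattern"])
output_re = re.compile(custom_output["pattern"])


def url_matches(url):
    """True if the custom input rule's pattern matches the URL."""
    return bool(input_re.search(url))


def mask_body(body):
    """Apply the custom output rule's replacement as a literal string."""
    return output_re.sub(lambda m: custom_output["replacement"], body)
```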
## Related pages
- Security tool reference -- MCP tool parameter reference
- Target scope -- Host-level access control
- Rate limits & budgets -- Rate limiting and budget controls
- Security model -- Architecture-level security design
- Config file -- SafetyFilter configuration