# Guardrails Feature Documentation (Backend)

## Overview

The guardrails feature provides content policy enforcement for documents used in conversations and agent knowledge bases. It prevents documents containing restricted content (based on regex or text patterns) from being used in LLM interactions.

---

## 1. Configuration

**File:** `python-lib/backend/config.py` (lines 486-497)

| Configuration Key | Getter Function | Type | Default | Description |
| --- | --- | --- | --- | --- |
| `guardrails_enabled` | `get_guardrails_enabled()` | `bool` | `False` | Master switch for guardrails |
| `guardrails_pattern` | `get_guardrails_pattern()` | `str` | `""` | Pattern(s) to check against |

### Pattern Format

The pattern can be specified in either of two formats, both parsed by the `GuardrailsMatcher` class:

**1. Simple regex string:**

```text
"confidential|secret|password"
```

**2. JSON array of rules:**

```json
[
  {"pattern": "confidential", "type": "text"},
  {"pattern": "SSN:\\s*\\d{3}-\\d{2}-\\d{4}", "type": "regex"}
]
```

| Rule Type | Behavior |
| --- | --- |
| `"text"` | Case-insensitive substring match |
| `"regex"` (default) | Python regex search |
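
The two formats and rule types above can be sketched as follows. This is a minimal illustration, not the actual `GuardrailsMatcher` code: the names `build_rules` and `matches_any_rule` are hypothetical, and the rule for telling a JSON rule list apart from a plain regex string is an assumption.

```python
import json
import re
from typing import Callable, List

Predicate = Callable[[str], bool]


def build_rules(pattern_config: str) -> List[Predicate]:
    """Turn a pattern config string into a list of match predicates."""
    config = pattern_config.strip()
    if not config:
        return []

    # Assumption: a config that parses as a JSON array of objects is a rule
    # list; anything else is treated as a single regex string.
    try:
        parsed = json.loads(config)
    except json.JSONDecodeError:
        parsed = None

    if not (isinstance(parsed, list) and all(isinstance(r, dict) for r in parsed)):
        compiled = re.compile(config)
        return [lambda text, rx=compiled: rx.search(text) is not None]

    rules: List[Predicate] = []
    for rule in parsed:
        pattern = rule["pattern"]
        if rule.get("type", "regex") == "text":
            # "text": case-insensitive substring match.
            needle = pattern.lower()
            rules.append(lambda text, n=needle: n in text.lower())
        else:
            # "regex" (the default): Python regex search.
            compiled = re.compile(pattern)
            rules.append(lambda text, rx=compiled: rx.search(text) is not None)
    return rules


def matches_any_rule(pattern_config: str, text: str) -> bool:
    """Return True if any rule matches the text."""
    return any(rule(text) for rule in build_rules(pattern_config))
```

Binding `needle` and `compiled` as lambda default arguments keeps each predicate tied to its own rule rather than to the last rule in the loop.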

---

## 2. Core Components

**File:** `python-lib/backend/services/guardrails_service.py`

### Classes

| Class | Purpose |
| --- | --- |
| `GuardrailsMatcher` | Parses pattern config; provides `is_violation()` and `check_structure()` methods. |

### Key Functions

| Function | Purpose | Context |
| --- | --- | --- |
| `extract_document_text_content()` | Extracts text from documents (txt/md/html/pdf/docx/pptx/images). | Shared |
| `verify_document_content()` | Checks content against matcher, handles caching. | Shared |
| `check_guardrails()` | Low-level check of a document list. | Conversations |
| `process_documents_for_guardrails()` | Full pipeline: extract → check → update DB. | Conversations |
| `check_agent_documents_guardrails()` | Check agent docs before indexing. | Agent (Indexing) |
| `check_agent_guardrails_at_runtime()` | Re-check agent docs during chat. | Agent (Runtime) |
| `enforce_indexing_guardrails()` | Check + delete violating docs + return API response. | Agent (Indexing API) |

---

## 3. Text Extraction

**Function:** `extract_document_text_content()` (lines 96-134)

**Returns:** Plain text string (not JSON).

| File Type | Extraction Method |
| --- | --- |
| txt, md, html | Read raw content directly. |
| pdf, docx, pptx, png, jpg, jpeg | Use `DocumentExtractor.text_extract()` with OCR. |
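
The dispatch in the table above could look roughly like this. It is a sketch under assumptions: `extract_text` is a hypothetical name, the OCR-capable extractor is passed in as a plain callable because the real `DocumentExtractor.text_extract()` signature is not documented here, and raising on unknown extensions is an illustrative choice.

```python
from pathlib import Path
from typing import Callable

RAW_TEXT_EXTENSIONS = {"txt", "md", "html"}
EXTRACTOR_EXTENSIONS = {"pdf", "docx", "pptx", "png", "jpg", "jpeg"}


def extract_text(path: str, raw: bytes, ocr_extract: Callable[[bytes], str]) -> str:
    """Return plain text for a document, choosing the method by extension."""
    ext = Path(path).suffix.lower().lstrip(".")
    if ext in RAW_TEXT_EXTENSIONS:
        # Text-like files are read as-is; undecodable bytes are replaced.
        return raw.decode("utf-8", errors="replace")
    if ext in EXTRACTOR_EXTENSIONS:
        # Binary and image formats go through the injected extractor.
        return ocr_extract(raw)
    raise ValueError(f"Unsupported file type: {ext!r}")
```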

---

## 4. Conversation Documents Flow

### Entry Points

1. `conversation_service.py:563` - `send_message()` calls `_process_conversation_guardrails()`
2. `conversation_service.py:1230` - `_process_conversation_guardrails()` implementation

### Flow Diagram

```text
User uploads document → Stored in managed folder
         ↓
User sends message
         ↓
_process_conversation_guardrails(conv_id, emit, events)
         ↓
    ┌────────────────────────────────────────┐
    │  Emit GUARDRAILS_CHECKS event          │
    │  "Applying guardrails checks to N docs"│
    └────────────────────────────────────────┘
         ↓
process_documents_for_guardrails(store, conv_id, attachments)
         ↓
    ┌────────────────────────────────────────┐
    │  For each document:                    │
    │  1. Try reuse existing text_path       │
    │  2. Try load cached extraction         │
    │  3. Extract fresh if needed            │
    │  4. Check against pattern              │
    └────────────────────────────────────────┘
         ↓
    ┌────────────────────────────────────────┐
    │  Update DB with guardrails_status:     │
    │  - "passed"                            │
    │  - "content_violation"                 │
    │  - "extraction_failed"                 │
    └────────────────────────────────────────┘
         ↓
    ┌────────────────────────────────────────┐
    │  Emit events:                          │
    │  - GUARDRAILS_CHECKED (all statuses)   │
    │  - GUARDRAILS_VIOLATION (if any)       │
    │  - GUARDRAILS_EXTRACTION_FAILED        │
    └────────────────────────────────────────┘
         ↓
get_structured_documents() filters blocked docs from LLM context
```
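
The per-document step in the diagram (reuse, cached, fresh, then check) can be read as a fallback chain. The sketch below assumes a hypothetical `check_document` helper that receives its loaders as callables; the real `process_documents_for_guardrails()` also updates the DB, which is omitted here.

```python
from typing import Callable, Optional


def check_document(
    existing_text: Optional[str],
    load_cached: Callable[[], Optional[str]],
    extract_fresh: Callable[[], str],
    is_violation: Callable[[str], bool],
) -> str:
    """Return a guardrails_status value for one document."""
    text = existing_text              # 1. reuse text that already exists
    if text is None:
        text = load_cached()          # 2. fall back to a cached extraction
    if text is None:
        try:
            text = extract_fresh()    # 3. extract only as a last resort
        except Exception:
            return "extraction_failed"
    # 4. check the text against the configured pattern
    return "content_violation" if is_violation(text) else "passed"
```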

### Key Behavior

* **No Deletion:** Documents are NOT deleted; they remain in storage.
* **Filtering:** Blocked documents are strictly filtered from the LLM context.
* **Re-checking:** All conversation documents are re-checked on each message (in case the pattern changed).
* **Efficiency:** Cached extraction text and cached check results are reused while the pattern is unchanged, so repeated checks add little overhead.
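
As an illustration of the filtering behavior, the sketch below drops blocked documents before they reach the LLM context. `filter_allowed` is a hypothetical name, not the actual `get_structured_documents()` code, and treating documents with no `guardrails_status` as allowed is an assumption.

```python
from typing import Dict, List

BLOCKED_STATUSES = {"content_violation", "extraction_failed"}


def filter_allowed(documents: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only documents whose guardrails_status is not a blocked one."""
    return [
        doc for doc in documents
        if doc.get("guardrails_status") not in BLOCKED_STATUSES
    ]
```

The documents themselves stay in storage; only the list handed to the LLM changes.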

---

## 5. Agent Documents Flow

### 5.1 At Indexing Time

**Entry Point:** `POST /agents/<agent_id>/documents/index` (`fetch_api.py:714`)

#### Flow Diagram

```text
User clicks "Index" → POST /agents/<agent_id>/documents/index
         ↓
enforce_indexing_guardrails(agent_id, docs, store, remove_document_fn)
         ↓
check_agent_documents_guardrails(agent_id, documents)
         ↓
_process_agent_docs(agent_id, documents, use_published=False, runtime_mode=False)
         ↓
    ┌────────────────────────────────────────┐
    │  For each doc (not deletePending):     │
    │  1. Try load from cache folder         │
    │  2. Extract text if needed             │
    │  3. Check against pattern              │
    │  4. Categorize result                  │
    └────────────────────────────────────────┘
         ↓
         ├─── No violations → Proceed with indexing job
         │
         └─── Violations found:
              ├── DELETE violating documents from folder
              ├── Update agent.documents in store
              └── Return HTTP 422
```

#### HTTP 422 Response Structure

```json
{
  "error": "guardrails_violation",
  "message": "...",
  "content_violations": [...],
  "extraction_failures": [...],
  "documents": [...]
}
```

`documents` lists the documents that remain after the violating ones are removed.

#### Key Behavior

* **Hard Delete:** Violating documents are **deleted** from storage.
* **Config Update:** Agent configuration is automatically updated to remove violating docs.
* **Error Response:** Returns HTTP 422 with detailed breakdown.
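
A sketch of how the 422 payload might be assembled once violations are known. `build_violation_response` is a hypothetical helper; identifying documents by a `name` key and passing failures as lists of names are assumptions, and the actual deletion from storage is left out.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_violation_response(
    documents: List[Dict[str, Any]],
    content_violations: List[str],
    extraction_failures: List[str],
    message: str,
) -> Optional[Tuple[Dict[str, Any], int]]:
    """Return (body, 422) when anything failed, or None to proceed with indexing."""
    if not content_violations and not extraction_failures:
        return None
    blocked = set(content_violations) | set(extraction_failures)
    # Only documents that were not removed are reported back to the client.
    remaining = [doc for doc in documents if doc["name"] not in blocked]
    body = {
        "error": "guardrails_violation",
        "message": message,
        "content_violations": content_violations,
        "extraction_failures": extraction_failures,
        "documents": remaining,
    }
    return body, 422
```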

### 5.2 At Runtime (Chat)

**Entry Points:**

* `conversation_service.py:1036` - `_stream_single()` calls `_enforce_guardrails()`
* `conversation_service.py:1122` - `_stream_agent_connect()` calls `_enforce_guardrails()`

#### Flow Diagram

```text
User sends message with agent(s)
         ↓
_enforce_guardrails(agent_ids, pcb, tcb)
         ↓
    ┌────────────────────────────────────────┐
    │  For each user agent with documents:   │
    │  - Determine: published vs draft       │
    │  - Get documents to check              │
    └────────────────────────────────────────┘
         ↓
_check_agent_runtime_guardrails(agent_id, docs, use_published, pcb)
         ↓
check_agent_guardrails_at_runtime(agent_id, documents, use_published)
         ↓
_process_agent_docs(..., runtime_mode=True)
         ↓
    ┌────────────────────────────────────────┐
    │  Filters: only active docs,            │
    │           not deletePending            │
    └────────────────────────────────────────┘
         ↓
         ├─── No violations → Continue chat
         │
         └─── Violations found:
              ├── Emit GUARDRAILS_VIOLATION event
              ├── Emit GUARDRAILS_EXTRACTION_FAILED event
              └── Return error message → BLOCK chat entirely
```

#### Key Behavior

* **Blocking:** Runtime violations **block the agent entirely**.
* **Resolution:** Chat cannot proceed until the violating documents are removed or fixed.
* **Notification:** Events are emitted for frontend notification.
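
The blocking behavior can be sketched as a function that emits the relevant events and returns an error message whenever anything failed. `enforce_runtime` and its `emit` callback are hypothetical stand-ins for `_enforce_guardrails()` and its callbacks, whose real signatures are not shown in this document.

```python
from typing import Callable, Dict, List, Optional

Emit = Callable[[str, Dict[str, object]], None]


def enforce_runtime(
    content_violations: List[str],
    extraction_failures: List[str],
    emit: Emit,
    message: str,
) -> Optional[str]:
    """Emit events for failures; return an error message if chat must stop."""
    if content_violations:
        emit("GUARDRAILS_VIOLATION",
             {"failed_documents": content_violations, "message": message})
    if extraction_failures:
        emit("GUARDRAILS_EXTRACTION_FAILED",
             {"failed_documents": extraction_failures, "message": message})
    if content_violations or extraction_failures:
        # The caller returns this message instead of calling the LLM.
        return message
    return None
```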

---

## 6. Events

**File:** `python-lib/backend/models/events.py` (lines 29-32)

| Event | When Emitted | Data |
| --- | --- | --- |
| `GUARDRAILS_CHECKS` | Before checking starts | `{message: "Applying guardrails checks to N documents."}` |
| `GUARDRAILS_CHECKED` | After all checks complete | `{documents: [{id, name, document_path, guardrails_status}, ...]}` |
| `GUARDRAILS_VIOLATION` | Content policy violation detected | `{failed_documents: [...], message: "..."}` |
| `GUARDRAILS_EXTRACTION_FAILED` | Text extraction failed | `{failed_documents: [...], message: "..."}` |

---

## 7. Caching Mechanism

### Text Extraction Cache

| Context | Cache Location |
| --- | --- |
| Conversation documents | `outputs/.../guardrails_extracted.txt` |
| Agent documents | `extracted_documents_{zone}/{doc_name}_extracted.json` |

### Guardrails Result Cache (Sidecar Files)

| Context | Cache Location |
| --- | --- |
| Conversation documents | `{text_path}_guardrails.json` |
| Agent documents | `{doc_name}_guardrails.json` |

**Sidecar file format:**

```json
{
  "result": "PASS",
  "checked_pattern": "<pattern used for this check>"
}
```

*Note: `result` can be "PASS" or "FAIL".*

**Invalidation:** If `checked_pattern` in the sidecar differs from the currently configured pattern, the cached result is ignored and the document is re-checked.
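
The invalidation rule can be sketched as a pure function over the sidecar's JSON text. The helper names `make_sidecar` and `cached_result` are hypothetical, and reading or writing the file itself is left to the caller.

```python
import json
from typing import Optional


def make_sidecar(result: str, pattern: str) -> str:
    """Serialize a check result together with the pattern it was checked against."""
    return json.dumps({"result": result, "checked_pattern": pattern})


def cached_result(sidecar_text: Optional[str], current_pattern: str) -> Optional[str]:
    """Return "PASS" or "FAIL" if the sidecar is still valid, else None."""
    if sidecar_text is None:
        return None
    try:
        data = json.loads(sidecar_text)
    except json.JSONDecodeError:
        return None  # unreadable sidecar: treat as a cache miss
    if not isinstance(data, dict):
        return None
    if data.get("checked_pattern") != current_pattern:
        return None  # pattern changed since the last check: re-check needed
    result = data.get("result")
    return result if result in ("PASS", "FAIL") else None
```

A `None` return means the caller should run the check again and write a fresh sidecar.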

---

## 8. Backward Compatibility

The `check_structure()` method in `GuardrailsMatcher` handles two content formats to ensure documents cached with older versions continue to work:

1. **Legacy JSON format** (from `derived_documents_service` using `structured_extract`):

   ```json
   [{ "text": "...", "outline": ["Section", "Subsection"], "pages": [1, 2] }]
   ```

2. **New plain text format** (from `guardrails_service` using `text_extract`):

   ```text
   Plain text content...
   ```
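
One way to handle both formats is to try parsing the cached content as the legacy JSON structure and fall back to treating it as plain text. This is a sketch, not the actual `check_structure()` implementation: `extract_checkable_text` is a hypothetical name, and checking only the `text` field of legacy entries is an assumption.

```python
import json
from typing import List


def extract_checkable_text(content: str) -> List[str]:
    """Return the text segments to check, whichever format the cache holds."""
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError:
        return [content]  # new format: plain text
    is_legacy = (
        isinstance(parsed, list)
        and len(parsed) > 0
        and all(isinstance(item, dict) and "text" in item for item in parsed)
    )
    if is_legacy:
        # Legacy format: one dict per chunk, with the content under "text".
        return [str(item["text"]) for item in parsed]
    # Valid JSON that is not the legacy shape is still checked as plain text.
    return [content]
```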



---

## 9. Summary: Violation Handling

| Context | Content Violation | Extraction Failed |
| --- | --- | --- |
| **Conversation Documents** | Marked in DB, filtered from LLM context. | Marked in DB, filtered from LLM context. |
| **Agent Docs (Indexing)** | Document **DELETED**, HTTP 422 returned. | Document **DELETED**, HTTP 422 returned. |
| **Agent Docs (Runtime)** | Chat **BLOCKED**, error event emitted. | Chat **BLOCKED**, error event emitted. |

---

## 10. File Summary

| File | Role |
| --- | --- |
| `backend/config.py` | Configuration getters (`get_guardrails_enabled`, `get_guardrails_pattern`). |
| `backend/services/guardrails_service.py` | Core guardrails logic (618 lines). |
| `backend/services/conversation_service.py` | Chat flow integration. |
| `backend/services/derived_documents_service.py` | Document filtering for LLM context (`get_structured_documents`). |
| `backend/fetch_api.py` | API endpoint integration (`/agents/<agent_id>/documents/index`). |
| `backend/models/events.py` | Event type definitions. |