This document describes the different Flow Zones used in the Agentic Insights project and their purposes. Flow Zones are organizational units in Dataiku that help structure and manage different components of the analysis workflow.

<br>

## Flow Zone Organization Summary

| Zone | Type | Purpose | Contents |
|------|------|---------|----------|
| **Input Datasets** | Static | User dataset selection | Input datasets available for analysis |
| **Administration** | Static | System configuration | Analytic tools, business rules |
| **AI Agents** | Static | AI agents | Main Agent, Report Assistant Agent |
| **Analysis Session** | Static | Analysis orchestration | Main prompt recipe and scenario |
| **Analysis Outputs** | Static | Results storage | Reports folder, session history folder |
| **analysis_session_{random_id}** | Dynamic | Analysis execution | Generated flow for each session |


<br>


## [Input Datasets Flow Zone](flow_zone:LJqya2H)

![flowzone_input_datasets.png](o7ubOCqTikbc)


**Purpose:** This zone serves as a repository for input datasets that users can select for analysis.

**Description:**
- All datasets placed in this zone are automatically displayed in the webapp interface
- Users can browse and select from available datasets when launching a new analysis
- The selected dataset serves as the primary input for all analytic operations

**Usage:**
- Add datasets to this zone to make them available for analysis
- Each dataset in this zone will appear in the "Input Dataset" dropdown in the webapp
- Only datasets in this zone can be selected by users for analysis sessions




<br>

## [Administration Flow Zone](flow_zone:default)

![flowzone_adm.png](iobA6cjd4XrE)


**Purpose:** This zone contains sensitive project assets and configuration data that control the behavior of the analysis system.

**Description:**
The Administration zone is a protected area containing critical system components that define available analytic capabilities and business rules.

### Components:

#### 1. [analytic_tools](dataset:analytic_tools) Dataset

**Description:**
- Catalog of all analytic tools available to the main agent for performing advanced analyses
- Each tool represents a specific type of analysis (e.g., Clustering, Outlier Detection, Time Series Forecasting, Root Cause Analysis)
- The tools listed here are displayed in the webapp interface, allowing users to select which analyses to perform
- Only tools with `flag_active = True` are shown to users

**Agent Capability Discovery:**
- Provides the agent with an efficient, scalable overview of all analytic capabilities
- Structured format minimizes context window usage while giving the agent a clear understanding of available options
- Enables the agent to discover capabilities first, then retrieve detailed parameters only for selected tools (via `analytic_tools_explorer` and `analytic_tool_arg_retriever`)
- Optimizes token usage by avoiding upfront loading of all implementation details

**Schema:**
- `analytic_tool_id`: Unique identifier for the analytic tool (e.g., `analytic_clustering`, `analytic_outlier_detection`)
- `analytic_tool_name`: Human-readable name displayed in the webapp (e.g., "Clustering", "Outlier Detection")
- `flag_active`: Controls whether the tool is available for selection in the webapp
- `description`: Detailed description of what the tool does, its inputs, outputs, and capabilities
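The capability-discovery idea above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the rows are made-up sample data that follow the schema, and `active_tool_overview` is a hypothetical helper name. It keeps only tools with `flag_active = True` and returns a compact overview, leaving the longer `description` field out until a tool is actually selected.

```python
from typing import Dict, List

# Made-up sample rows that follow the analytic_tools schema (illustration only).
ANALYTIC_TOOLS: List[Dict[str, object]] = [
    {
        "analytic_tool_id": "analytic_clustering",
        "analytic_tool_name": "Clustering",
        "flag_active": True,
        "description": "Sample description text.",
    },
    {
        "analytic_tool_id": "analytic_outlier_detection",
        "analytic_tool_name": "Outlier Detection",
        "flag_active": False,
        "description": "Sample description text.",
    },
]


def active_tool_overview(rows: List[Dict[str, object]]) -> Dict[str, str]:
    """Hypothetical helper: map tool id -> display name for active tools only."""
    return {
        str(row["analytic_tool_id"]): str(row["analytic_tool_name"])
        for row in rows
        if row["flag_active"] is True
    }


overview = active_tool_overview(ANALYTIC_TOOLS)
print(overview)
```

Returning only identifiers and names keeps the first lookup small; details can then be fetched per tool.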

<br>

#### 2. [business_rules](dataset:business_rules) Dataset

**Description:**
- Contains business rules and constraints that must be applied during analysis
- Rules are retrieved by the main agent and explicitly considered during planning, data processing, and result interpretation
- The final report and analytic plan must comply with these rules
- Users can enable/disable business rules usage in the webapp interface
- Only rules with `flag_active = True` are applied

**Schema:**
- `rule_id`: Unique identifier for the business rule
- `flag_active`: Controls whether the rule is active and should be applied
- `rules`: Text of the business rule (the constraint, policy, or guideline to follow during analysis)

**Usage:**
- Business rules are automatically retrieved by the main agent when `useBusinessRules` is enabled via the webapp
- Rules may influence data processing decisions, interpretation of results, and recommendations
- Rules ensure analyses comply with organizational policies and constraints
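The selection logic described above can be sketched as follows. This is an assumption-laden illustration: the rows are made-up sample data that follow the `business_rules` schema, and `rules_for_session` is a hypothetical helper. It returns no rules when the `useBusinessRules` flag is off, and only rules with `flag_active = True` otherwise.

```python
from typing import Dict, List

# Made-up sample rows that follow the business_rules schema (illustration only).
BUSINESS_RULES: List[Dict[str, object]] = [
    {"rule_id": "rule_1", "flag_active": True, "rules": "Sample rule text A."},
    {"rule_id": "rule_2", "flag_active": False, "rules": "Sample rule text B."},
    {"rule_id": "rule_3", "flag_active": True, "rules": "Sample rule text C."},
]


def rules_for_session(rows: List[Dict[str, object]], use_business_rules: bool) -> List[str]:
    """Hypothetical helper: return the rule texts that apply to a session."""
    if not use_business_rules:
        # The user disabled business rules in the webapp: apply none.
        return []
    return [str(row["rules"]) for row in rows if row["flag_active"] is True]


enabled_rules = rules_for_session(BUSINESS_RULES, use_business_rules=True)
disabled_rules = rules_for_session(BUSINESS_RULES, use_business_rules=False)
print(enabled_rules)
```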

<br>

## [AI Agents Flow Zone](flow_zone:JWWQFci)

![flowzone_agents.png](vWc1P6P2yl39)


**Purpose:** This zone contains the AI agents responsible for orchestrating analysis workflows and providing user assistance.

**Description:**
The AI Agents zone is a dedicated area containing the AI agents that power the Agentic Insights solution. These agents are configured with LLM connections, prompts, and specialized tools.

### Components:

#### 1. [Main Project Agent](saved_model:yeBJ8LHw)

**Description:**
- The primary AI agent responsible for orchestrating the entire analysis workflow
- Receives user requirements and orchestrates the execution of analytic tools
- Plans the analysis strategy, executes tools, and generates comprehensive reports
- Interacts with datasets, applies business rules, and synthesizes findings

**For detailed information:** See the [Agentic Framework - Main Agent documentation](../Agentic%20Framework/Main%20Agent/index.md)

<br>

#### 2. [Report Assistant Agent](saved_model:LDna8BsM)

**Description:**
- Specialized AI agent designed to assist users in understanding and interacting with generated reports
- Provides contextual help and answers questions about analysis results
- Configured and enabled through Dataiku Agent Hub
- Embedded in the webapp as an interactive iframe component
- Users can enable/disable the AI Assistant in the Report Viewer section

**For detailed information:** See the [Agentic Framework - Report Assistant Agent documentation](../Agentic%20Framework/Report%20Assistant%20Agent/index.md)


<br>

## [Analysis Session Flow Zone](flow_zone:wuEOYVA)

![flowzone_analysis_session.png](Yl7LORBSvWw0)

**Purpose:** Contains the main Prompt Recipe that triggers the agent to launch an analysis.

**Description:**
- This zone hosts the primary Recipe (Prompt Recipe) that serves as the entry point for analysis execution
- The recipe reads the analysis session parameters and queries the Main Agent; it is run by a scenario that the webapp triggers
- Acts as the orchestration layer between the webapp and the agent execution

**Components:**
- **Recipe Inputs:**
  1. `analysis_session_parameters` Dataset: Contains session parameters to query the agent (analysis session ID, input datasets, selected analyses, business rules flag, additional context, report template ID, etc.)
  2. Main Agent: The agent that will execute the analysis workflow
- **Main Prompt Recipe:** Receives user inputs (dataset selection, analysis types, business rules, additional context) and initiates the agent workflow
- **Recipe Output:** `analysis_session_outputs` Dataset containing results of the agent execution

**Workflow:**
1. Webapp sends analysis request with session parameters
2. Session parameters are written to the `analysis_session_parameters` dataset (recipe input)
3. Scenario is triggered by the webapp to execute the main Prompt Recipe
4. Prompt Recipe reads parameters from the input dataset and queries the Main Agent
5. Main Agent orchestrates the analysis workflow (explores datasets, applies business rules, executes analytic tools, generates report)
6. Agent execution results are written to the `analysis_session_outputs` dataset (recipe output)
7. Final results (reports and metadata) are stored in the Analysis Outputs zone
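Steps 1 and 2 of the workflow amount to turning a webapp request into one row of `analysis_session_parameters`. The sketch below shows one possible shape for that row. The column names are assumptions: `analysis_session_id`, `input_dataset`, `analysis_to_perform`, and `additional_context` are borrowed from the session metadata schema, `useBusinessRules` is the flag named in the Administration section, and `report_template_id` and `build_session_parameters` are hypothetical. Writing the row to the dataset and triggering the scenario are left out.

```python
from typing import Dict, List, Optional


def build_session_parameters(
    analysis_session_id: str,
    input_dataset: str,
    analysis_to_perform: List[str],
    use_business_rules: bool,
    additional_context: str = "",
    report_template_id: Optional[str] = None,
) -> Dict[str, object]:
    """Hypothetical helper: assemble one parameters row for a session."""
    if not analysis_to_perform:
        raise ValueError("At least one analysis must be selected")
    return {
        "analysis_session_id": analysis_session_id,
        "input_dataset": input_dataset,
        "analysis_to_perform": list(analysis_to_perform),
        "useBusinessRules": use_business_rules,
        "additional_context": additional_context,
        "report_template_id": report_template_id,  # assumed column name
    }


row = build_session_parameters(
    analysis_session_id="analysis_session_abc12345",
    input_dataset="sample_input_dataset",  # made-up dataset name
    analysis_to_perform=["analytic_clustering"],
    use_business_rules=True,
)
print(row)
```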


<br>


## [Analysis Outputs Flow Zone](flow_zone:PpsJxMW)

![flowzone_analysis_outputs.png](sMn3wuFBsHv6)


**Purpose:** Stores all outputs generated by analysis sessions, including reports and session metadata.

**Description:**
This zone contains the results and artifacts produced by each analysis session, organized for efficient retrieval and reporting.

### Components:

#### 1. [analysis_reports](managed_folder:mm8SYzaH) Folder

**Description:**
- Stores HTML reports generated by the main agent for each analysis session
- Each report is a standalone HTML file containing:
  - Executive summary and key findings
  - Technical analysis results
  - Interactive visualizations (charts, graphs)
  - KPIs and metrics
  - Recommended actions
- Reports are identified by their filename: `{analysis_session_id}.html`

**Structure:**
```
analysis_reports/
  ├── analysis_session_abc12345.html
  ├── analysis_session_def67890.html
  └── ...
```

**Access:**
- Reports are retrieved via the webapp's Report Viewer
- Users can view historical reports from the Historical Analysis tab
- Reports are self-contained HTML files with embedded JavaScript and CSS
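The `{analysis_session_id}.html` naming convention makes the mapping between a session and its report a plain string operation. A minimal sketch, with hypothetical helper names:

```python
REPORT_SUFFIX = ".html"


def report_filename(analysis_session_id: str) -> str:
    """Hypothetical helper: report file name for a given session id."""
    return f"{analysis_session_id}{REPORT_SUFFIX}"


def session_id_from_report(filename: str) -> str:
    """Hypothetical helper: recover the session id from a report file name."""
    if not filename.endswith(REPORT_SUFFIX):
        raise ValueError(f"Not a report file: {filename!r}")
    return filename[: -len(REPORT_SUFFIX)]


name = report_filename("analysis_session_abc12345")
recovered = session_id_from_report(name)
print(name, recovered)
```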

<br>

#### 2. [analysis_session_history](managed_folder:d0asGKJw) Folder

**Description:**
- Stores metadata for all analysis sessions to enable efficient reporting and session management via the webapp
- Each session has its own subfolder containing session-specific metadata
- Uses a folder structure rather than a SQL dataset, for the reasons listed below

**Structure:**
```
analysis_session_history/
  ├── analysis_session_abc12345/
  │   └── metadata.json
  ├── analysis_session_def67890/
  │   └── metadata.json
  └── ...
```

**Metadata Schema:**
Each `metadata.json` file contains:
- `analysis_session_id`: Unique identifier for the session
- `input_dataset`: Name of the input dataset used
- `analysis_to_perform`: List of analytic tools executed
- `infer_target_variable`: Whether the target variable was inferred automatically
- `additional_context`: User-provided context and requirements
- `status`: Session status (`pending`, `running`, `completed`, `failed`)
- `start_datetime`: When the analysis started
- `end_datetime`: When the analysis completed
- Additional custom fields as needed
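To make the schema concrete, the sketch below writes a `metadata.json` file into a per-session subfolder and then updates its status. It is an illustration under stated assumptions: a local temporary directory stands in for the managed folder, the helper names are hypothetical, timestamps are stored as ISO 8601 strings, and `end_datetime` is set when the session reaches `completed` or `failed`. None of these choices is confirmed by the project.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

VALID_STATUSES = {"pending", "running", "completed", "failed"}


def write_metadata(history_root: Path, metadata: dict) -> Path:
    """Hypothetical helper: write <root>/<session_id>/metadata.json."""
    session_dir = history_root / metadata["analysis_session_id"]
    session_dir.mkdir(parents=True, exist_ok=True)
    path = session_dir / "metadata.json"
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path


def update_status(history_root: Path, session_id: str, status: str) -> dict:
    """Hypothetical helper: change the status of one session in place."""
    if status not in VALID_STATUSES:
        raise ValueError(f"Unknown status: {status!r}")
    path = history_root / session_id / "metadata.json"
    metadata = json.loads(path.read_text(encoding="utf-8"))
    metadata["status"] = status
    if status in {"completed", "failed"}:
        # Assumption: the end time is recorded for both terminal statuses.
        metadata["end_datetime"] = datetime.now(timezone.utc).isoformat()
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return metadata


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # local stand-in for the analysis_session_history folder
    write_metadata(root, {
        "analysis_session_id": "analysis_session_abc12345",
        "input_dataset": "sample_input_dataset",  # made-up dataset name
        "analysis_to_perform": ["analytic_clustering"],
        "infer_target_variable": False,
        "additional_context": "",
        "status": "running",
        "start_datetime": datetime.now(timezone.utc).isoformat(),
        "end_datetime": None,
    })
    final = update_status(root, "analysis_session_abc12345", "completed")

print(final["status"])
```

Because each session owns its file, an update touches only that session's `metadata.json`.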

**Why Use Folders Instead of SQL Dataset:**
1. **Targeted Operations:** Metadata for each session is stored in a dedicated subfolder, enabling targeted read/write/update/delete operations for specific sessions
2. **Portability:** Folders are generally more portable than SQL datasets, making sharing and deployment easier
3. **Flexibility:** Folder structure allows for easy expansion to include additional session artifacts (logs, intermediate results, etc.)
4. **Simplicity:** Direct file-based access is simpler than SQL queries for session-specific operations

**Usage:**
- Webapp queries this folder to display historical analysis sessions
- Session metadata is updated in real-time as analysis progresses
- Enables efficient filtering, sorting, and management of analysis history
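Once the metadata files are loaded, filtering and sorting the history is ordinary list processing. The sketch below works on in-memory dictionaries with made-up sample values; `sessions_with_status` is a hypothetical helper, and sorting by `start_datetime` assumes the timestamps are ISO 8601 strings in the same time zone, which sort correctly as text.

```python
from typing import Dict, List

# Made-up sample metadata entries (illustration only).
history: List[Dict[str, str]] = [
    {"analysis_session_id": "analysis_session_abc12345", "status": "completed",
     "start_datetime": "2024-01-10T09:00:00+00:00"},
    {"analysis_session_id": "analysis_session_def67890", "status": "failed",
     "start_datetime": "2024-01-11T09:00:00+00:00"},
    {"analysis_session_id": "analysis_session_0a1b2c3d", "status": "completed",
     "start_datetime": "2024-01-12T09:00:00+00:00"},
]


def sessions_with_status(sessions: List[Dict[str, str]], status: str) -> List[Dict[str, str]]:
    """Hypothetical helper: sessions with the given status, newest first."""
    matching = [s for s in sessions if s.get("status") == status]
    return sorted(matching, key=lambda s: s["start_datetime"], reverse=True)


completed_ids = [s["analysis_session_id"] for s in sessions_with_status(history, "completed")]
print(completed_ids)
```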


<br>


## [analysis_session_{random_id} Flow Zone](flow_zone:nyPAugS)

![flowzone_analysis_example.png](NWTUMFGRKYOP)

**Purpose:** Dynamic flow zones that store the generated flow created by the agent to fulfill specific analysis requirements.

**Description:**
- These are dynamically created flow zones, one per analysis session
- Each zone is named using the pattern `analysis_session_{random_id}`, where `{random_id}` is an 8-character random identifier (a shortened UUID rather than a full one)
- The zone is created when an analysis session is initiated
- Contains the complete flow (recipes, datasets, models) generated by the agent to perform the requested analyses
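One plausible way to produce a name that matches the `analysis_session_{random_id}` pattern is to keep the first 8 hexadecimal characters of a UUID4. This is an assumption about how the identifier is derived, not the project's confirmed method; `new_zone_name` and `ZONE_NAME_PATTERN` are hypothetical names.

```python
import re
import uuid

# Assumed shape of a session zone name: fixed prefix plus 8 hex characters.
ZONE_NAME_PATTERN = re.compile(r"^analysis_session_[0-9a-f]{8}$")


def new_zone_name() -> str:
    """Hypothetical helper: build a zone name from a truncated UUID4."""
    random_id = uuid.uuid4().hex[:8]
    return f"analysis_session_{random_id}"


zone_name = new_zone_name()
print(zone_name)
```

Eight hexadecimal characters give about 4.3 billion possible values, so a collision check against existing zones is still a sensible precaution.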

**Structure:**
Each zone contains:
- **Input Datasets:** References to input data sources
- **Processing Recipes:** Data transformation and preparation steps
- **Analytic Tool Recipes:** Execution of selected analytic tools (clustering, outlier detection, etc.)
- **Intermediate Datasets:** Temporary datasets created during analysis
- **Insight Datasets:** Structured results from each analytic tool
- **Output Datasets:** Final processed data with analysis results

**Lifecycle:**
1. **Creation:** Zone is created when analysis session starts
2. **Population:** Agent generates and populates the flow with necessary components
3. **Execution:** Flow executes to produce analysis results
4. **Cleanup:** Zone can be deleted when session is no longer needed (optional)

**Management:**
- Zones are automatically tagged with specific project tags for identification and management
- Can be listed and managed via the `list_analysis_zones()` function
- Zones can be deleted along with their session metadata when cleaning up old analyses
- Each zone is isolated, preventing interference between different analysis sessions

**Tags:**
- Each analysis session flow zone is automatically tagged with default tags when created:
  - `agentic-insights`: Identifies the zone as part of the Agentic Insights project
  - `ai-generated`: Indicates that the zone and its contents were generated by the AI agent
- **Color:** Zones are automatically assigned the color `#F9BE40` for visual identification
- **Purpose:** These tags enable:
  - **Identification:** Easy filtering and listing of all analysis zones in the project
  - **Management:** The `delete_analysis_history` scenario uses these tags to identify which zones to clean up
  - **Organization:** Tags can be set manually in the project library for additional categorization
- **Tag Requirements:** A flow zone counts as an analysis zone only if it carries every default tag (both `agentic-insights` and `ai-generated`); extra tags do not affect this
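The tag requirement is a subset test: the default tags must all be present, and additional tags are allowed. A minimal sketch, where `is_analysis_zone` is a hypothetical helper and not necessarily how the `delete_analysis_history` scenario performs the check:

```python
from typing import Iterable

DEFAULT_TAGS = frozenset({"agentic-insights", "ai-generated"})


def is_analysis_zone(zone_tags: Iterable[str]) -> bool:
    """Hypothetical helper: True if the zone carries every default tag."""
    return DEFAULT_TAGS.issubset(zone_tags)


print(is_analysis_zone(["agentic-insights", "ai-generated", "custom-tag"]))
print(is_analysis_zone(["agentic-insights"]))
```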

**Benefits:**
- **Isolation:** Each analysis session has its own dedicated workspace
- **Traceability:** Complete flow history for each analysis is preserved
- **Reproducibility:** The generated flow can be reviewed and re-executed
- **Flexibility:** Agent can create custom flows tailored to specific analysis requirements


<br>


## Best Practices

1. **Input Datasets Zone:**
   - Keep only datasets that are ready for analysis
   - Ensure datasets have proper schemas and data quality
   - Document dataset purposes and update frequencies

2. **Administration Zone:**
   - Regularly review and update `analytic_tools` dataset
   - Keep business rules up-to-date and properly documented

3. **AI Agents Zone:**
   - Test agent configurations before deploying changes
   - Review and adapt agent prompts to your specific context
   - Monitor agent performance and adjust LLM connections as needed

4. **Analysis Outputs Zone:**
   - Regularly archive old reports if storage is a concern
   - Monitor folder sizes to prevent storage issues
   - Use metadata to enable efficient session management

5. **Dynamic Session Zones:**
   - Clean up old session zones periodically
   - Review generated flows to understand agent behavior
   - Use zone metadata for troubleshooting and optimization