The Main Agent follows a structured, scientific approach to data analysis, orchestrating a complete workflow from data exploration to report generation. This document describes the architecture, workflow, and design principles.

<br>

## Design
---

We leverage Dataiku's native agent capabilities by implementing this agent as a Dataiku Visual Agent. As a reminder, building a Visual Agent consists of selecting the main LLM from predefined LLM connections, writing the prompt that details the agent's logic, and connecting the agent to tools, which are themselves native Dataiku functionalities. Custom tools have been developed specifically for this agent.

Put simply, this agent is a **ReAct Agent (Reasoning + Acting)**: it reasons about which actions to take, uses tools to execute them, and iterates between reasoning steps and tool execution until the analysis is complete.
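
The reason-then-act iteration can be sketched in a few lines of plain Python. This is only an illustration of the ReAct pattern, not the Visual Agent's actual implementation: `run_react_loop`, `scripted_policy`, and the `max_steps` limit are hypothetical names introduced here, and the scripted policy stands in for the LLM's reasoning step.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a tool takes one string argument and returns an observation.
Tool = Callable[[str], str]
Step = Tuple[str, str, str]  # (action, argument, observation)


def run_react_loop(
    decide: Callable[[str, List[Step]], Tuple[str, str]],
    tools: Dict[str, Tool],
    question: str,
    max_steps: int = 10,
) -> Tuple[str, List[Step]]:
    """Alternate between a reasoning step and a tool call until the policy finishes."""
    history: List[Step] = []
    for _ in range(max_steps):
        # Reasoning: choose the next action from the question and past observations.
        action, argument = decide(question, history)
        if action == "finish":
            return argument, history
        # Acting: run the chosen tool and record what it returned.
        observation = tools[action](argument)
        history.append((action, argument, observation))
    raise RuntimeError("ReAct loop did not finish within max_steps")


def scripted_policy(question: str, history: List[Step]) -> Tuple[str, str]:
    """Stand-in for the LLM: explore one dataset, then finish."""
    if not history:
        return "dataset_explorer", "sales"
    _, dataset, observation = history[-1]
    return "finish", f"Explored {dataset}: {observation}"


# Assumed toy tool; the real `dataset_explorer` is a Dataiku tool.
demo_tools = {"dataset_explorer": lambda name: f"{name} has 3 columns"}
answer, trace = run_react_loop(scripted_policy, demo_tools, "What drives sales?")
```

The step limit is a design choice in this sketch: it guarantees termination even if the policy never emits a finishing action.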


The diagram below illustrates the Main Agent (`va_main_agent`) architecture, showing the central agent connected to its specialized tools that enable the complete analysis workflow.

![main_agent_overview.png](KdOgVl9AtvHm)


<br>

## Components
---

### Prompt

The prompt defines the agent's logic and behavior, establishing it as a seasoned Data Analyst in Dataiku. It provides the complete instruction set for solving business problems through data analysis. The prompt:
- Defines the agent's role as a Data Analyst using a scientific, top-down exploratory approach
- Establishes a structured workflow from initialization to report generation
- Sets critical restrictions: no fabrication, data-driven insights only, business rules compliance
- Specifies output format: decision-ready HTML reports following template structure

<br>

### Tools

The agent uses a set of specialized tools organized by function:

**Core Tools:**
- `analysis_initiator`: Initializes the analysis session and prepares the environment (session ID, flow zone creation)
- `dataset_explorer`: Explores input datasets to understand schema, distributions, and data quality
- `business_rules_explorer`: Retrieves active business rules that must be applied during analysis

**Analytic Tools:**
- `analytic_tools_explorer`: Discovers available analytic tools and their capabilities from the catalog
- `analytic_tool_arg_retriever`: Retrieves required parameters for a specific analytic tool
- `analytic_tool_executor`: Executes analytic tools (clustering, outlier detection, forecasting, root cause analysis, etc.) with provided arguments

**Report Tools:**
- `retrieve_report_script_report_data_template`: Retrieves the dynamic part of the report template (JavaScript `report_data` structure)
- `generate_full_report`: Generates the final HTML report by merging dynamic content with the static template

<br>


## How It Works
---

The agent follows a nominal execution flow through seven main steps:

- **1) Initialize Session**
The agent initializes the analysis session by preparing the session environment. This includes generating a unique session ID (if not provided), creating the corresponding flow zone (`analysis_session_{id}`), and setting up all necessary environment parameters for the analysis workflow.
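
A minimal sketch of the session setup described above, assuming a UUID-based session ID (the actual ID format used by `analysis_initiator` is not specified here). The function name `init_session` and the returned dictionary layout are hypothetical; only the `analysis_session_{id}` zone naming comes from this document.

```python
import uuid
from typing import Dict, Optional


def init_session(session_id: Optional[str] = None) -> Dict[str, str]:
    """Return the session ID and the name of its flow zone."""
    # Generate an ID only when the caller did not provide one (assumed format: UUID hex).
    sid = session_id or uuid.uuid4().hex
    return {"session_id": sid, "flow_zone": f"analysis_session_{sid}"}


provided = init_session("abc123")  # reuse an existing session ID
generated = init_session()         # let the sketch generate one
```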

<br>

- **2) Understand User Query and Explore Input Datasets**
The agent clarifies the user's query and objectives, then explores the input datasets using `dataset_explorer`. This step involves inspecting dataset schemas, analyzing data distributions, identifying missing data patterns, and assessing overall data quality to identify potential issues early.
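
To illustrate the kind of information this exploration step gathers, the sketch below profiles a small in-memory table using only the standard library. It is not the `dataset_explorer` tool itself: `profile_rows`, its output layout, and the sample rows are assumptions made for the example.

```python
from collections import Counter
from typing import Any, Dict, List


def profile_rows(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Summarize each column: observed types, share of missing values, top values."""
    columns = sorted({key for row in rows for key in row})
    profile: Dict[str, Dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        profile[col] = {
            "types": sorted({type(v).__name__ for v in present}),
            "missing_ratio": (len(values) - len(present)) / len(values),
            # Counter requires hashable values, which holds for scalar cells.
            "top_values": Counter(present).most_common(3),
        }
    return profile


# Illustrative sample data, not taken from any real dataset.
sample = [
    {"region": "EU", "revenue": 120.0},
    {"region": "EU", "revenue": None},
    {"region": "US", "revenue": 95.5},
    {"region": None, "revenue": 80.0},
]
report = profile_rows(sample)
```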

<br>

- **3) Retrieve Business Rules**
The agent uses `business_rules_explorer` to retrieve all active business rules applicable to the current analysis context. These rules must be explicitly considered during planning, data processing, and result interpretation to ensure compliance with organizational policies.

<div class="alert"> This step can be disabled by the user via the webapp by switching off the 'Use Internal Business Rules' option. When disabled, the agent skips business rules retrieval and proceeds without applying organizational constraints.
</div>

<br>

- **4) Retrieve Available Analytic Tools and Plan the Analysis**
The agent uses `analytic_tools_explorer` to discover available analytic tools and their capabilities. Based on the business question, data characteristics, and user-selected analyses, the agent plans which tools to execute and how to configure them. It then uses `analytic_tool_arg_retriever` to get the required parameters for each selected tool.


<div class="alert"> Users can control which analytic tools are executed via the webapp by selecting specific analyses in the 'Analysis To Perform' parameter. The agent will only execute tools corresponding to the user's selections, ensuring the analysis focuses on the requested areas.
</div>
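
The effect of the 'Analysis To Perform' selection on planning can be sketched as a simple filter over the tool catalog. The catalog structure, the tool names, and the `plan_tools` helper below are hypothetical; the real catalog is returned by `analytic_tools_explorer` and its format is not described in this document.

```python
from typing import Dict, List


def plan_tools(catalog: Dict[str, str], selected_analyses: List[str]) -> List[str]:
    """Keep only the catalog tools whose analysis type the user selected."""
    wanted = {analysis.lower() for analysis in selected_analyses}
    return [tool for tool, analysis in catalog.items() if analysis.lower() in wanted]


# Assumed catalog: tool name -> analysis type it performs.
catalog = {
    "kmeans_segmentation": "clustering",
    "iqr_outliers": "outlier detection",
    "seasonal_forecast": "forecasting",
}
plan = plan_tools(catalog, ["Clustering", "Forecasting"])
```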

<br>

- **5) Execute Independent Analytic Tools in Parallel with Adapted Arguments**
The agent executes the selected analytic tools in parallel whenever they are independent of one another, which shortens overall analysis time. Each tool is executed via `analytic_tool_executor` with arguments adapted to the data characteristics and business requirements. Tools produce output datasets and insight datasets containing structured results.
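
One way to picture the parallel execution of independent tools is a thread pool that submits every tool call at once and collects results per tool. This is a sketch under assumptions: `run_tools_in_parallel` and `fake_executor` are hypothetical, and `fake_executor` merely stands in for a call to `analytic_tool_executor`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


def run_tools_in_parallel(
    executor_fn: Callable[[str, Dict[str, Any]], Any],
    tool_args: Dict[str, Dict[str, Any]],
    max_workers: int = 4,
) -> Dict[str, Dict[str, Any]]:
    """Run each independent tool concurrently and record its output or error."""
    results: Dict[str, Dict[str, Any]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the calls overlap in time.
        futures = {
            name: pool.submit(executor_fn, name, args)
            for name, args in tool_args.items()
        }
        for name, future in futures.items():
            try:
                results[name] = {"status": "ok", "output": future.result()}
            except Exception as exc:
                # One failing tool does not prevent the others from completing.
                results[name] = {"status": "error", "error": str(exc)}
    return results


def fake_executor(tool_name: str, args: Dict[str, Any]) -> str:
    """Stand-in for a real tool call; requires a 'dataset' argument."""
    if "dataset" not in args:
        raise ValueError(f"{tool_name}: missing 'dataset' argument")
    return f"{tool_name} ran on {args['dataset']}"


outcome = run_tools_in_parallel(
    fake_executor,
    {"clustering": {"dataset": "sales"}, "forecasting": {}},
)
```

Capturing errors per tool, rather than letting the first exception propagate, keeps the results of the tools that did succeed available for interpretation.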

<br>

- **6) Interpret Analytic Tools Results and Generate the Final Report**

    - **6.1) Interpret Results**
The agent analyzes the results from all executed analytic tools, extracting key findings, patterns, and insights. It synthesizes quantitative metrics, identifies trends, and prepares actionable interpretations based strictly on the computed results.

    - **6.2) Retrieve the Dynamic Part of the Report Template**
The agent uses `retrieve_report_script_report_data_template` to retrieve the dynamic part of the report template (the JavaScript `report_data` structure). This template section shows the expected data structure with placeholders. When the agent fills it in, the complete JavaScript declaration `const report_data = {...}` must be preserved.

    - **6.3) Merge Dynamic Part into Static and Predefined Full Report Template**
The agent uses `generate_full_report` to merge the dynamic content (actual analysis results, KPIs, chart data, textual insights) into the static and predefined full report template. This approach is significantly more cost-effective than generating the entire report from scratch, as it leverages static parts (CSS styles, JavaScript helper functions, HTML structure) that can represent up to 50% of the final report. This optimization saves both generation time and token costs while ensuring consistent formatting and functionality.

    - **6.4) Final Report Storage**
The final report is stored in the `analysis_reports` folder and takes the analysis session ID as its filename (`{analysis_session_id}.html`) for easy access and traceability.
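
The merge-and-store idea from steps 6.2 to 6.4 can be sketched as follows. Everything except the `const report_data = {...}` declaration, the `analysis_reports` folder name, and the `{analysis_session_id}.html` filename is an assumption: the tiny `STATIC_TEMPLATE`, the `/*REPORT_DATA*/` placeholder, and `build_report` are hypothetical, and a local directory stands in for the actual report folder.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical static template: styles, markup, and helper script are fixed;
# only the placeholder comment is replaced by the generated declaration.
STATIC_TEMPLATE = """<!DOCTYPE html>
<html>
<head><style>body { font-family: sans-serif; }</style></head>
<body>
<div id="report"></div>
<script>
/*REPORT_DATA*/
document.getElementById("report").textContent = report_data.title;
</script>
</body>
</html>"""


def build_report(report_data: dict, session_id: str, folder: Path) -> Path:
    """Inject the dynamic data into the static template and save the HTML file."""
    # Escape "</" so a string value can never close the <script> element early.
    payload = json.dumps(report_data, indent=2).replace("</", "<\\/")
    declaration = f"const report_data = {payload};"
    html = STATIC_TEMPLATE.replace("/*REPORT_DATA*/", declaration)
    path = folder / f"{session_id}.html"
    path.write_text(html, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "analysis_reports"
    folder.mkdir()
    report_path = build_report({"title": "Q3 revenue drivers"}, "abc123", folder)
    report_name = report_path.name
    report_html = report_path.read_text(encoding="utf-8")
```

Because only the `report_data` declaration is generated, the static CSS, helper functions, and HTML structure never need to be produced by the LLM, which is where the time and token savings described above come from.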

<br>

- **7) End of Execution**
The agent completes the workflow by logging the final execution message and finalizing the analysis session.
