Extraction settings

Window size: number of consecutive pages processed together as a unit.
Window overlap: number of pages shared between consecutive units to preserve context.
The window overlap must be lower than the window size. Otherwise, the overlap is automatically set to window size - 1.
Maximum depth of sections to extract. Deeper sections are treated as plain text.
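
As an illustration of how the window size and window overlap interact, here is a minimal Python sketch of grouping pages into overlapping units. The function name `page_windows`, its parameters, and the handling of the final (possibly shorter) unit are assumptions made for this example, not the product's actual implementation. Only the rule that an overlap not lower than the window size falls back to window size - 1 comes from the text above.

```python
def page_windows(num_pages, window_size, window_overlap):
    """Group 1-based page numbers into overlapping units (hypothetical helper)."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    if window_overlap < 0:
        raise ValueError("window_overlap cannot be negative")

    # Documented rule: the overlap must be lower than the window size,
    # otherwise it is set to window size - 1.
    overlap = min(window_overlap, window_size - 1)

    # Each new unit starts this many pages after the previous one.
    step = window_size - overlap

    windows = []
    start = 0
    while start < num_pages:
        end = min(start + window_size, num_pages)
        windows.append(list(range(start + 1, end + 1)))
        if end == num_pages:
            # Assumption: stop once the last page has been covered.
            break
        start += step
    return windows


if __name__ == "__main__":
    for unit in page_windows(num_pages=7, window_size=3, window_overlap=1):
        print(unit)
```

With 7 pages, a window size of 3 and an overlap of 1, each unit shares its last page with the start of the next one. A larger overlap preserves more context across units at the cost of processing more units.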

Outputs structure

Prompt output (chunked if applicable)
Extracted text (chunked if applicable)
Extracted text and image description for images (chunked if applicable)
Extracted text (chunked if applicable) and images
Extracted text (chunked if applicable)
This output can later be used to augment LLMs in the generated Knowledge Bank.
An output folder is required to store images extracted from documents with this strategy. Select it from the Input / Output tab.

Advanced update settings