Number of consecutive pages processed together as a unit.
Number of pages shared between units to preserve context.
The window overlap must be lower than the window size. If it is not, the overlap is automatically set to window size - 1.
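As an illustration of how window size and window overlap interact, here is a minimal sketch of a sliding window over page numbers. The function name `page_windows` and its parameters are hypothetical and not part of the product; the sketch only assumes the rule stated above, that an overlap at or above the window size is reduced to window size - 1.

```python
def page_windows(num_pages, window_size, overlap):
    """Return lists of 1-based page numbers, one list per processing unit.

    Hypothetical helper for illustration only.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    if overlap < 0:
        raise ValueError("overlap must not be negative")

    # The overlap must stay below the window size; otherwise clamp it.
    overlap = min(overlap, window_size - 1)

    # Each new window starts this many pages after the previous one.
    step = window_size - overlap

    windows = []
    start = 0
    while start < num_pages:
        end = min(start + window_size, num_pages)
        windows.append(list(range(start + 1, end + 1)))
        if end == num_pages:
            break  # the last page has been covered
        start += step
    return windows


if __name__ == "__main__":
    # 7 pages, 3 pages per unit, 1 page shared between consecutive units
    print(page_windows(7, 3, 1))
```

With a window size of 3 and an overlap of 1, each unit repeats the last page of the previous unit, so text that spans a page boundary appears whole in at least one unit.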
Maximum depth of sections to extract. Deeper sections are treated as plain text.
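The effect of a maximum section depth can be sketched as follows. This is a simplified illustration, not the actual extraction logic: it assumes the document is already available as lines with Markdown-style `#` headings, and the function name `split_sections` is hypothetical.

```python
def split_sections(lines, max_depth):
    """Group lines into (depth, title, body_lines) sections.

    Headings deeper than max_depth are not extracted as sections;
    they stay in the body of the enclosing section as plain text.
    Hypothetical helper, assuming Markdown-style '#' headings.
    """
    # Section at depth 0 collects any text that precedes the first heading.
    sections = [(0, "", [])]
    for line in lines:
        stripped = line.lstrip()
        hashes = len(stripped) - len(stripped.lstrip("#"))
        is_heading = hashes > 0 and stripped[hashes:hashes + 1] == " "
        if is_heading and hashes <= max_depth:
            # Shallow enough: start a new section.
            sections.append((hashes, stripped[hashes:].strip(), []))
        else:
            # Plain text, or a heading that is too deep: keep as body text.
            sections[-1][2].append(line)
    return sections


if __name__ == "__main__":
    doc = ["# A", "intro", "## B", "text", "### C", "deep"]
    for section in split_sections(doc, max_depth=2):
        print(section)
```

In this sketch, with a maximum depth of 2, the `### C` heading is not extracted as its own section and remains in the body of section `B`.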
Outputs structure
Prompt output (chunked if applicable)
Extracted text (chunked if applicable)
Extracted text and image description for images (chunked if applicable)
Extracted text (chunked if applicable) and images
Extracted text (chunked if applicable)
This output can later be used to augment LLMs through the generated Knowledge Bank.