Capability
Streaming Dataset Iteration With Memory-Bounded Buffering
19 artifacts provide this capability.
Top Matches
via “streaming document processing for large files”
IBM's document converter: turns PDF and DOCX files into structured Markdown, with OCR and table extraction.
Unique: Processes documents as a stream, page by page or section by section, yielding partial DoclingDocument objects as pages are processed. Very large files can therefore be handled without buffering the entire document in memory.
vs others: More memory-efficient than batch processing because it converts the document incrementally; more flexible than simple page extraction because each chunk preserves the document structure.
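
The streaming pattern described in this entry can be illustrated with a small generator. This is a minimal sketch of memory-bounded, page-by-page iteration in general, not the converter's actual API: `PageChunk`, `stream_chunks`, `fake_pages`, and the `max_buffered_pages` parameter are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class PageChunk:
    """Hypothetical stand-in for a partial document; not a real library type."""
    page_numbers: List[int]
    text: str


def stream_chunks(pages: Iterable[str], max_buffered_pages: int = 2) -> Iterator[PageChunk]:
    """Yield chunks of at most `max_buffered_pages` pages.

    Only the pages of the chunk currently being built are held in memory,
    so peak buffer size is bounded regardless of document length.
    """
    if max_buffered_pages < 1:
        raise ValueError("max_buffered_pages must be at least 1")

    buffer: List[str] = []
    first_page = 1  # 1-based number of the first page in the current buffer
    for page_text in pages:
        buffer.append(page_text)
        if len(buffer) == max_buffered_pages:
            numbers = list(range(first_page, first_page + len(buffer)))
            yield PageChunk(numbers, "\n".join(buffer))
            first_page += len(buffer)
            buffer = []  # release the flushed pages

    # Flush any remaining pages as a final, shorter chunk.
    if buffer:
        numbers = list(range(first_page, first_page + len(buffer)))
        yield PageChunk(numbers, "\n".join(buffer))


def fake_pages(count: int) -> Iterator[str]:
    """Hypothetical lazy page source; a real one would parse a file incrementally."""
    for i in range(1, count + 1):
        yield f"page {i}"


if __name__ == "__main__":
    for chunk in stream_chunks(fake_pages(5), max_buffered_pages=2):
        print(chunk.page_numbers, repr(chunk.text))
```

Because both the page source and the chunker are generators, a consumer that handles each chunk before requesting the next never holds more than `max_buffered_pages` pages at once. Preserving structure that spans chunk boundaries (for example a table continuing across pages) would need extra state that this sketch leaves out.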