
Docling Adapter

Converts PDF, DOCX, PPTX, HTML, and other document formats into structured markdown with typed text block entities.

Model details

Field    Value
Model    docling-project/docling
Task     extract
Domain   document, general
License  MIT

Install

pip install synapse-adapter-sdk
pip install docling

Verified output schema

The adapter maps DoclingDocument output as follows:

  • payload.content — full document as markdown string (from export_to_markdown())
  • payload.entities — one Entity per text block with label set to the block's semantic type
  • payload.data["docling_table_count"] — number of tables found (when > 0)
  • payload.data["docling_page_count"] — number of pages found (when > 0)
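Because each entity's label carries the block's semantic type, a quick way to inspect a converted document is to tally entities by label. The sketch below is illustrative only: `FakeEntity` is a hypothetical stand-in for the SDK's Entity class, its `text` field and the label values ("title", "paragraph") are assumptions, and only the `label` attribute is taken from the mapping above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FakeEntity:
    """Hypothetical stand-in for the SDK's Entity; only `label` comes from the docs."""
    label: str
    text: str  # assumed field, used here only to make the sample readable


def count_labels(entities):
    """Return a Counter mapping each label to the number of entities carrying it."""
    return Counter(entity.label for entity in entities)


# Sample data; the label names are illustrative, not a list of Docling's types.
sample = [
    FakeEntity("title", "Master Services Agreement"),
    FakeEntity("paragraph", "This agreement is entered into by both parties."),
    FakeEntity("paragraph", "The term begins on the effective date."),
]

label_counts = count_labels(sample)
print(label_counts.most_common())
```

With a real result, the same helper would presumably be called as `count_labels(result_ir.payload.entities)`.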

Example payload.data:

{
  "docling_table_count": 3,
  "docling_page_count": 12
}
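Since both keys are written only when the count is greater than zero, consumers should read them with a default instead of indexing directly. A minimal sketch, using plain dicts in place of a real `payload.data` (the helper name `docling_counts` is made up for this example):

```python
def docling_counts(data):
    """Return (table_count, page_count), treating a missing key as 0."""
    return (
        data.get("docling_table_count", 0),
        data.get("docling_page_count", 0),
    )


# Plain dicts standing in for payload.data.
full = {"docling_table_count": 3, "docling_page_count": 12}
no_tables = {"docling_page_count": 2}  # table key absent because no tables were found

print(docling_counts(full))
print(docling_counts(no_tables))
```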

Provenance confidence is fixed at 1.0 because Docling either produces a complete result or raises an exception.
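In practice this all-or-nothing behavior means the caller handles failure by catching an exception around the conversion call, not by inspecting a confidence score. The sketch below shows one way to do that with a small wrapper. The wrapper `convert_or_error` and the two fake converters are hypothetical, and it catches the broad `Exception` because the specific exception types Docling raises are not listed here.

```python
def convert_or_error(convert, source):
    """Call convert(source) and return (result, None) on success or (None, exc) on failure."""
    try:
        return convert(source), None
    except Exception as exc:  # broad on purpose: specific exception types are not documented here
        return None, exc


# Fake converters standing in for DocumentConverter().convert in this sketch.
def fake_ok(source):
    return {"source": source, "status": "converted"}


def fake_fail(source):
    raise ValueError(f"cannot convert {source}")


ok_result, ok_error = convert_or_error(fake_ok, "/data/contract.pdf")
bad_result, bad_error = convert_or_error(fake_fail, "/data/corrupt.pdf")
print(ok_error is None, bad_result is None)
```

Only a successful result should be passed on to `adapter.egress`; a failed conversion has no partial output to map.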

Supported task types

  • extract

Supported domains

  • document
  • general

Usage example

import time
from docling.document_converter import DocumentConverter
from docling_adapter import DoclingAdapter

converter = DocumentConverter()
adapter = DoclingAdapter()

# 1. Prepare model input. `ir` is the incoming canonical IR, built elsewhere;
#    its payload.content holds a file path or URL.
model_input = adapter.ingress(ir)
# model_input is now e.g. {"source": "/data/contract.pdf"}

# 2. Run Docling (caller's responsibility)
t0 = time.monotonic()
result = converter.convert(model_input["source"])
latency_ms = int((time.monotonic() - t0) * 1000)

# 3. Convert output back to canonical IR
result_ir = adapter.egress(result.document, ir, latency_ms=latency_ms)

# 4. Access results
markdown = result_ir.payload.content
entities = result_ir.payload.entities  # list of text blocks
table_count = result_ir.payload.data.get("docling_table_count", 0)

The adapter also accepts the dict produced by DoclingDocument.export_to_dict() as a fallback when a live DoclingDocument is not available.
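One situation where the dict fallback helps is when conversion and egress run in separate processes: the exported dict can be written to disk and reloaded later. The sketch below round-trips such a dict through JSON, assuming the output of `export_to_dict()` is JSON-serializable. The helper names, the file name, and the placeholder dict contents are all made up for illustration; a real dict would come from `result.document.export_to_dict()`.

```python
import json
import tempfile
from pathlib import Path


def save_doc_dict(doc_dict, path):
    """Write an exported document dict to disk as UTF-8 JSON."""
    Path(path).write_text(json.dumps(doc_dict), encoding="utf-8")


def load_doc_dict(path):
    """Read back a document dict previously written by save_doc_dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Placeholder content; a real dict would come from result.document.export_to_dict().
doc_dict = {"name": "contract", "texts": [], "tables": []}

with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "contract.docling.json"  # hypothetical file name
    save_doc_dict(doc_dict, cache_file)
    restored = load_doc_dict(cache_file)

# `restored` would then be passed to adapter.egress in place of result.document.
print(restored == doc_dict)
```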

Source

github.com/synapse-ir/adapters