Extract structured data from any document—Information Extract API is live

Minjee Kang
Minjee Kang
Minjee Kang
Minjee Kang
Products
May 29, 2025
Extract structured data from any document—Information Extract API is live
We build intelligence for the future of work—now it’s your turn.

Start building with our API or talk to our team.

Share

One month ago, we opened a playground for Information Extract. The waitlist filled fast, and developers tested it with real-world documents—insurance packets, scanned forms, multipage tables.

This early traffic helped us refine schema alignment, layout handling, and batch performance.

Today, Information Extract becomes a production-ready REST API—turning unstructured PDFs into structured, schema-aware JSON. No training. No templates. No prompt tuning.

What makes Information Extract different

  • Zero-training extraction: Works on any document—no templates, no fine-tuning required
  • Schema-aligned output: Returns structured JSON that matches your schema—types, nesting, and required fields included
  • Layout understanding: Accurately handles tables, checkboxes, multi-page layouts, and rotated content
  • Flat per-page pricing: Predictable billing, regardless of token count or content complexity

From document to JSON—in one call

Information Extract turns layout-heavy PDFs into clean, typed JSON—aligned to your schema, without templates or scripting.

In this example, a multi-page rent roll PDF is converted into structured JSON.

Each row is mapped to typed fields like rent, deposits, concessions, and parking fees—with no templates or custom scripts.

How to extract with schema in one call

# Information Extraction Request using the generated schema
extraction_response = client.chat.completions.create(
    model="information-extract",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_data}"}
                }
            ]
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "document_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "bank_name": {
                        "type": "string",
                        "description": "The name of bank in bank statement"
                    }
                }
            }
        }
    }
)

Available now

Let your apps understand documents—at scale. Start building with the Information Extract API.

One month ago, we opened a playground for Information Extract. The waitlist filled fast, and developers tested it with real-world documents—insurance packets, scanned forms, multipage tables.

This early traffic helped us refine schema alignment, layout handling, and batch performance.

Today, Information Extract becomes a production-ready REST API—turning unstructured PDFs into structured, schema-aware JSON. No training. No templates. No prompt tuning.

What makes Information Extract different

  • Zero-training extraction: Works on any document—no templates, no fine-tuning required
  • Schema-aligned output: Returns structured JSON that matches your schema—types, nesting, and required fields included
  • Layout understanding: Accurately handles tables, checkboxes, multi-page layouts, and rotated content
  • Flat per-page pricing: Predictable billing, regardless of token count or content complexity

From document to JSON—in one call

Information Extract turns layout-heavy PDFs into clean, typed JSON—aligned to your schema, without templates or scripting.

In this example, a multi-page rent roll PDF is converted into structured JSON.

Each row is mapped to typed fields like rent, deposits, concessions, and parking fees—with no templates or custom scripts.

How to extract with schema in one call

# Information Extraction Request using the generated schema
extraction_response = client.chat.completions.create(
    model="information-extract",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_data}"}
                }
            ]
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "document_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "bank_name": {
                        "type": "string",
                        "description": "The name of bank in bank statement"
                    }
                }
            }
        }
    }
)

Available now

Let your apps understand documents—at scale. Start building with the Information Extract API.

Building tomorrow’s solutions today

Talk to AI expert to find the best solution for your business.