Getting Structured Output from LLM APIs: Complete Implementation Guide
When building production AI applications, getting reliable structured output from LLM APIs is crucial. Whether you’re extracting data from documents, generating configuration files, or orchestrating multi-step workflows, you need responses in a predictable format — not free-form text that requires complex parsing.
We’ve implemented structured output from LLM APIs across dozens of projects, from ClawdHub’s complex agent orchestration to our AI Schematic Generator that produces valid circuit files. The difference between a prototype that “mostly works” and a production system users can rely on often comes down to how well you handle structured responses.
Why Structured Output Matters
Free-form text responses create several problems in production systems:
- Parsing Complexity: Regular expressions and string manipulation are brittle and error-prone
- Validation Issues: No guarantee the response contains required fields or follows expected formats
- Error Handling: Difficult to provide meaningful feedback when parsing fails
- Type Safety: Dynamic parsing breaks static analysis and IDE support
Structured output from LLM APIs solves these issues by enforcing schemas at the API level, giving you predictable JSON responses you can validate and type-check.
Modern Approaches to Structured Output
Function Calling (Tool Use)
Most modern LLM APIs support function calling or tool use, which is currently the most reliable method for structured output. The model receives a function schema and must format its response to match.
Here’s how we implement function calling with the Claude API:
import anthropic
from typing import Dict, List, Any
import json
class StructuredLLMClient:
def __init__(self, api_key: str):
self.client = anthropic.Anthropic(api_key=api_key)
def extract_contact_info(self, text: str) -> Dict[str, Any]:
"""Extract contact information using function calling."""
tools = [{
"name": "extract_contacts",
"description": "Extract contact information from text",
"input_schema": {
"type": "object",
"properties": {
"contacts": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string"},
"email": {"type": "string", "format": "email"},
"phone": {"type": "string"},
"company": {"type": "string"}
},
"required": ["name"]
}
}
},
"required": ["contacts"]
}
}]
message = self.client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1000,
tools=tools,
messages=[{
"role": "user",
"content": f"Extract all contact information from this text: {text}"
}]
)
# Claude returns tool_use blocks when using function calling
for content_block in message.content:
if content_block.type == "tool_use":
return content_block.input
raise ValueError("No structured output received")
JSON Mode with Schema Validation
OpenAI’s GPT models support JSON mode, which forces responses to be valid JSON. Combined with schema validation, this provides reliable structured output:
import openai
from jsonschema import validate, ValidationError
from typing import Dict, Any
class OpenAIStructuredClient:
def __init__(self, api_key: str):
self.client = openai.OpenAI(api_key=api_key)
def generate_config(self, requirements: str) -> Dict[str, Any]:
"""Generate application configuration with schema validation."""
schema = {
"type": "object",
"properties": {
"database": {
"type": "object",
"properties": {
"host": {"type": "string"},
"port": {"type": "integer", "minimum": 1, "maximum": 65535},
"name": {"type": "string"},
"ssl": {"type": "boolean"}
},
"required": ["host", "port", "name"]
},
"api": {
"type": "object",
"properties": {
"rate_limit": {"type": "integer", "minimum": 1},
"timeout": {"type": "integer", "minimum": 1},
"endpoints": {
"type": "array",
"items": {"type": "string"}
}
},
"required": ["rate_limit", "endpoints"]
}
},
"required": ["database", "api"]
}
response = self.client.chat.completions.create(
model="gpt-4",
response_format={"type": "json_object"},
messages=[
{
"role": "system",
"content": f"Generate a JSON configuration matching this schema: {json.dumps(schema)}"
},
{
"role": "user",
"content": f"Requirements: {requirements}"
}
]
)
try:
result = json.loads(response.choices[0].message.content)
validate(instance=result, schema=schema)
return result
except (json.JSONDecodeError, ValidationError) as e:
raise ValueError(f"Invalid structured output: {e}")
Advanced Pattern: Multi-Step Structured Workflows
In our ClawdHub project, we orchestrate complex multi-agent workflows that require structured output at each step. Here’s the pattern we use for chaining structured responses:
from dataclasses import dataclass
from typing import List, Optional
from enum import Enum
class TaskStatus(Enum):
PENDING = "pending"
IN_PROGRESS = "in_progress"
COMPLETED = "completed"
FAILED = "failed"
@dataclass
class WorkflowStep:
id: str
name: str
status: TaskStatus
dependencies: List[str]
output: Optional[Dict[str, Any]] = None
error: Optional[str] = None
class StructuredWorkflowOrchestrator:
def __init__(self, llm_client: StructuredLLMClient):
self.client = llm_client
def plan_workflow(self, goal: str) -> List[WorkflowStep]:
"""Generate a workflow plan with structured steps."""
tools = [{
"name": "create_workflow_plan",
"description": "Create a structured workflow plan",
"input_schema": {
"type": "object",
"properties": {
"steps": {
"type": "array",
"items": {
"type": "object",
"properties": {
"id": {"type": "string"},
"name": {"type": "string"},
"dependencies": {
"type": "array",
"items": {"type": "string"}
},
"estimated_duration": {"type": "integer"}
},
"required": ["id", "name", "dependencies"]
}
}
},
"required": ["steps"]
}
}]
message = self.client.client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=2000,
tools=tools,
messages=[{
"role": "user",
"content": f"Create a workflow plan to achieve this goal: {goal}"
}]
)
for content_block in message.content:
if content_block.type == "tool_use":
steps_data = content_block.input["steps"]
return [
WorkflowStep(
id=step["id"],
name=step["name"],
status=TaskStatus.PENDING,
dependencies=step["dependencies"]
)
for step in steps_data
]
raise ValueError("Failed to generate workflow plan")
def execute_step(self, step: WorkflowStep, context: Dict[str, Any]) -> WorkflowStep:
"""Execute a workflow step and return structured results."""
tools = [{
"name": "execute_task",
"description": "Execute a workflow task and return results",
"input_schema": {
"type": "object",
"properties": {
"success": {"type": "boolean"},
"output": {"type": "object"},
"error_message": {"type": "string"}
},
"required": ["success"]
}
}]
try:
step.status = TaskStatus.IN_PROGRESS
message = self.client.client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1500,
tools=tools,
messages=[{
"role": "user",
"content": f"Execute this task: {step.name}\nContext: {json.dumps(context)}"
}]
)
for content_block in message.content:
if content_block.type == "tool_use":
result = content_block.input
if result["success"]:
step.status = TaskStatus.COMPLETED
step.output = result.get("output", {})
else:
step.status = TaskStatus.FAILED
step.error = result.get("error_message", "Unknown error")
return step
except Exception as e:
step.status = TaskStatus.FAILED
step.error = str(e)
return step
Error Handling and Retry Logic
Structured output from LLM APIs isn’t 100% reliable. Even with function calling, models occasionally return malformed responses. Here’s our production error handling pattern:
import time
import logging
from typing import Optional, Callable, Any
class StructuredOutputRetry:
def __init__(self, max_attempts: int = 3, base_delay: float = 1.0):
self.max_attempts = max_attempts
self.base_delay = base_delay
self.logger = logging.getLogger(__name__)
def with_retry(self,
func: Callable[..., Any],
validator: Optional[Callable[[Any], bool]] = None,
*args, **kwargs) -> Any:
"""Execute function with exponential backoff retry."""
last_error = None
for attempt in range(self.max_attempts):
try:
result = func(*args, **kwargs)
# Optional custom validation
if validator and not validator(result):
raise ValueError("Custom validation failed")
return result
except Exception as e:
last_error = e
self.logger.warning(f"Attempt {attempt + 1} failed: {e}")
if attempt < self.max_attempts - 1:
delay = self.base_delay * (2 ** attempt)
time.sleep(delay)
raise RuntimeError(f"All {self.max_attempts} attempts failed. Last error: {last_error}")
# Usage example
def validate_contact_data(data: Dict[str, Any]) -> bool:
"""Validate extracted contact data meets business requirements."""
if not isinstance(data.get("contacts"), list):
return False
for contact in data["contacts"]:
if not contact.get("name") or len(contact["name"]) < 2:
return False
email = contact.get("email")
if email and "@" not in email:
return False
return True
# Robust extraction with retry
retry_handler = StructuredOutputRetry(max_attempts=3, base_delay=2.0)
client = StructuredLLMClient(api_key="your-api-key")
contacts = retry_handler.with_retry(
client.extract_contact_info,
validator=validate_contact_data,
"Contact info text here..."
)
Performance Optimization Strategies
Schema Complexity vs Response Time
We’ve found that overly complex schemas can significantly increase response time and reduce reliability. In our Vidmation project, we initially used deeply nested schemas for video metadata but saw 40% faster responses when we flattened the structure:
# Slow: deeply nested schema
complex_schema = {
"type": "object",
"properties": {
"video": {
"type": "object",
"properties": {
"metadata": {
"type": "object",
"properties": {
"technical": {
"type": "object",
"properties": {
"encoding": {"type": "string"},
"bitrate": {"type": "integer"}
}
}
}
}
}
}
}
}
# Fast: flattened schema
flat_schema = {
"type": "object",
"properties": {
"video_encoding": {"type": "string"},
"video_bitrate": {"type": "integer"},
"video_duration": {"type": "number"},
"audio_format": {"type": "string"}
},
"required": ["video_encoding", "video_duration"]
}
Batch Processing
For high-volume applications, batch multiple extractions into single API calls:
def batch_extract_entities(texts: List[str]) -> List[Dict[str, Any]]:
"""Extract entities from multiple texts in a single API call."""
tools = [{
"name": "batch_extract",
"description": "Extract entities from multiple text samples",
"input_schema": {
"type": "object",
"properties": {
"results": {
"type": "array",
"items": {
"type": "object",
"properties": {
"text_index": {"type": "integer"},
"entities": {
"type": "array",
"items": {
"type": "object",
"properties": {
"type": {"type": "string"},
"value": {"type": "string"},
"confidence": {"type": "number"}
},
"required": ["type", "value"]
}
}
},
"required": ["text_index", "entities"]
}
}
},
"required": ["results"]
}
}]
batch_content = "\n\n".join([f"Text {i}: {text}" for i, text in enumerate(texts)])
# Single API call for entire batch
message = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=4000,
tools=tools,
messages=[{
"role": "user",
"content": f"Extract entities from these texts:\n{batch_content}"
}]
)
# Process batch response
for content_block in message.content:
if content_block.type == "tool_use":
return content_block.input["results"]
return []
Integration with Production Systems
Type-Safe Data Classes
Convert structured LLM output to strongly-typed data classes for better IDE support and runtime safety:
from dataclasses import dataclass
from typing import List, Optional
from datetime import datetime
@dataclass
class ExtractedEntity:
type: str
value: str
confidence: float
source_text: str
extracted_at: datetime
@classmethod
def from_llm_output(cls, data: Dict[str, Any], source_text: str) -> 'ExtractedEntity':
""" More from the blog
Need help with AI?
We build production AI systems — from strategy and architecture to deployment and evaluation.
Get our AI implementation playbook
A practical guide to evaluating, planning, and deploying AI in your business. Free, no spam.
Check your inbox.
Something went wrong. Please try again.