Structured Outputs & JSON Mode: Getting Reliable Machine-Readable Responses

Your LLM extracts product information from customer emails. You parse the response to get price, quantity, and SKU. It works 95% of the time. The other 5%? The model wraps the JSON in markdown code fences. Or adds a chatty preamble before the JSON. Or uses single quotes instead of double quotes. Or returns "price": "$29.99" as a string instead of "price": 29.99 as a number. Your downstream parser crashes, the order fails silently, and the customer gets confused.

Five percent failure rate sounds small until you realize it means 50 broken orders per 1000 requests. Every one generates a support ticket. You spend more time handling parse failures than you spent building the feature.
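
To make those failure modes concrete, here is a minimal sketch that feeds each one to Python's standard json parser. The reply strings and the check helper are made up for illustration; they are not taken from any real model output.

```python
import json

FENCE = "`" * 3  # markdown code fence, built this way to keep the example readable
CLEAN = '{"price": 29.99, "quantity": 2, "sku": "A-100"}'

# Hypothetical model replies, one per failure mode described above.
replies = {
    "clean": CLEAN,
    "code_fence": FENCE + "json\n" + CLEAN + "\n" + FENCE,
    "preamble": "Sure! Here is the data: " + CLEAN,
    "single_quotes": "{'price': 29.99, 'quantity': 2, 'sku': 'A-100'}",
    "price_as_string": '{"price": "$29.99", "quantity": 2, "sku": "A-100"}',
}


def check(reply: str) -> str:
    """Classify a reply as 'ok', 'parse_error', or 'type_error'."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return "parse_error"
    # Valid JSON can still carry the wrong type for a field.
    if not isinstance(data.get("price"), (int, float)):
        return "type_error"
    return "ok"


results = {name: check(reply) for name, reply in replies.items()}
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

Only the first reply survives. The last one is the sneaky case: it parses fine and fails later, wherever the code first treats price as a number.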

Structured output mode eliminates this entire category of bug. You define the schema, and the model is constrained to produce output that matches it. Not 95% of the time - 100% of the time, for every response that runs to completion (a response cut off by a token limit is the one exception). The constraint is enforced during generation, not validated after the fact.

What structured outputs actually are

Structured output (also called constrained decoding or guided generation, and often loosely called JSON mode) is a mechanism that forces the model’s token generation to produce output conforming to a predefined schema. The model can only generate tokens that keep the output valid according to the schema at every step.

This is fundamentally different from asking the model to “respond in JSON format” in your prompt. That is a soft instruction the model can (and will occasionally) violate. Structured output mode is a hard constraint applied during token sampling.

graph TD
  subgraph soft["Prompt-Based (Soft Constraint)"]
      S1["Prompt: 'Respond in JSON'"]
      S2["Model generates freely"]
      S3["Hope it is valid JSON"]
      S4["Parse attempt"]
      S5["Retry on failure?"]
  end
  subgraph hard["Structured Output (Hard Constraint)"]
      H1["Define JSON Schema"]
      H2["Schema constrains token sampling"]
      H3["Every token keeps output valid"]
      H4["Guaranteed valid JSON"]
      H5["Always parseable"]
  end

  style S3 fill:#FAEEDA,stroke:#854F0B,color:#633806
  style S5 fill:#FCEBEB,stroke:#A32D2D,color:#791F1F
  style H2 fill:#EEEDFE,stroke:#534AB7,color:#3C3489
  style H4 fill:#E1F5EE,stroke:#0F6E56,color:#085041
  style H5 fill:#E1F5EE,stroke:#0F6E56,color:#085041

How it works under the hood

Constrained decoding

At each token generation step, the model produces logits (scores) for all tokens in its vocabulary. Normally, any token could be selected. With structured output mode, a grammar-based filter masks out tokens that would make the output invalid:

Model computes logits for all ~100K tokens
A finite state machine (FSM) tracks the current position in the schema
Tokens that would violate the schema get their logits set to negative infinity
Softmax and sampling only consider valid continuations
The generated token advances the FSM state

For example, if the schema requires a number after "price":, the model cannot generate a quote character (which would start a string). Only tokens that can begin a JSON number are allowed: digits and the minus sign. A decimal point becomes valid only after at least one digit.
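
The masking step can be sketched with a toy example. Everything here is an assumption made for illustration: the nine-token vocabulary, the hand-written FSM for {"price": <number>}, and the logit values. Real implementations compile the schema into an automaton over the tokenizer's full vocabulary.

```python
import math

# Toy vocabulary; a real model has on the order of 100K tokens.
VOCAB = ["{", "}", '"price"', ":", '"', "-", ".", "2", "9"]

# Hand-written FSM for {"price": <number>}: state -> {allowed token: next state}
FSM = {
    "start": {"{": "key"},
    "key": {'"price"': "colon"},
    "colon": {":": "value_start"},
    "value_start": {"-": "int_required", "2": "int", "9": "int"},
    "int_required": {"2": "int", "9": "int"},
    "int": {"2": "int", "9": "int", ".": "frac_required", "}": "done"},
    "frac_required": {"2": "frac", "9": "frac"},
    "frac": {"2": "frac", "9": "frac", "}": "done"},
}


def mask_logits(logits, state):
    """Set the logit of every token the FSM disallows to negative infinity."""
    allowed = FSM[state]
    return [s if tok in allowed else -math.inf for tok, s in zip(VOCAB, logits)]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def step(logits, state):
    """One constrained decoding step with greedy selection."""
    probs = softmax(mask_logits(logits, state))
    idx = max(range(len(VOCAB)), key=probs.__getitem__)
    token = VOCAB[idx]
    return token, FSM[state][token], probs


# Made-up logits: left alone, the model would open a string with a quote.
raw_logits = [0.1, 0.1, 0.1, 0.1, 5.0, 0.2, 0.3, 2.0, 1.0]
unconstrained = VOCAB[max(range(len(VOCAB)), key=raw_logits.__getitem__)]
token, next_state, probs = step(raw_logits, "value_start")
print(unconstrained, token, next_state)
```

The unconstrained pick is the quote character. With the mask applied, its probability is exactly zero and the best remaining candidate, a digit, is chosen instead.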

Schema definition

Most implementations use JSON Schema to define the structure:

{
  "type": "object",
  "properties": {
    "product_name": { "type": "string" },
    "price": { "type": "number" },
    "in_stock": { "type": "boolean" },
    "categories": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["product_name", "price", "in_stock"]
}

The model will always produce output matching this schema. price will always be a number, never a string. in_stock will always be true or false, never "yes".
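
To see what that guarantee rules out, here is a deliberately tiny checker covering only the keywords this schema uses (type, properties, required, items). It is a sketch for illustration, not a JSON Schema implementation; for real validation you would reach for a dedicated library.

```python
import json


def _is_type(value, type_name):
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "array":
        return isinstance(value, list)
    if type_name == "object":
        return isinstance(value, dict)
    raise ValueError(f"unsupported type: {type_name}")


def violations(value, schema, path="$"):
    """Return human-readable schema violations (empty list means valid)."""
    if not _is_type(value, schema["type"]):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if schema["type"] == "object":
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required field")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(violations(value[key], sub, f"{path}.{key}"))
    elif schema["type"] == "array" and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(violations(item, schema["items"], f"{path}[{i}]"))
    return errors


PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
        "categories": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["product_name", "price", "in_stock"],
}

good = json.loads('{"product_name": "Mug", "price": 29.99, "in_stock": true}')
bad = json.loads('{"product_name": "Mug", "price": "$29.99", "in_stock": "yes"}')
print(violations(good, PRODUCT_SCHEMA))
print(violations(bad, PRODUCT_SCHEMA))
```

The second object is the kind of output a prompt-only approach can produce and constrained decoding cannot.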

Enum constraints

For classification tasks, you can restrict string values to specific options:

{
  "type": "object",
  "properties": {
    "sentiment": {
      "type": "string",
      "enum": ["positive", "negative", "neutral"]
    },
    "confidence": {
      "type": "number",
      "minimum": 0,
      "maximum": 1
    }
  },
  "required": ["sentiment", "confidence"]
}

The model literally cannot output anything other than those three sentiment values.
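
On the consuming side, an enum-constrained field maps directly onto a Python Enum. This is a small sketch that assumes both fields are always present in the response; the parse_sentiment helper is a hypothetical name.

```python
import json
from enum import Enum


class Sentiment(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


def parse_sentiment(raw: str) -> tuple[Sentiment, float]:
    data = json.loads(raw)
    # Sentiment(...) raises ValueError for any value outside the enum,
    # so no lowercasing or fuzzy matching is needed here.
    return Sentiment(data["sentiment"]), float(data["confidence"])


label, confidence = parse_sentiment('{"sentiment": "negative", "confidence": 0.85}')
print(label, confidence)
```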

graph LR
  subgraph schema["JSON Schema"]
      SCH["type: object
properties:
  name: string
  age: number
  active: boolean"]
  end
  subgraph generation["Token-by-Token Generation"]
      G1["{ → valid"]
      G2["'name' → valid key"]
      G3[": → valid"]
      G4["'Alice' → string value"]
      G5[", → continue object"]
      G6["'age' → valid key"]
      G7[": 30 → number value"]
      G8["} → close object"]
  end
  subgraph output["Guaranteed Output"]
      OUT["{'name':'Alice','age':30,'active':true}"]
  end

  schema --> generation
  generation --> output

  style SCH fill:#EEEDFE,stroke:#534AB7,color:#3C3489
  style OUT fill:#E1F5EE,stroke:#0F6E56,color:#085041

Provider implementations

OpenAI Structured Outputs

from openai import OpenAI
from pydantic import BaseModel

class ProductInfo(BaseModel):
    name: str
    price: float
    in_stock: bool

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract product info from: ..."}],
    response_format=ProductInfo,
)
product = response.choices[0].message.parsed
# product.price is guaranteed to be a float

Anthropic (via tool use)

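Claude can produce structured output through its tool use mechanism. You define a tool with the desired output schema, and the model “calls” that tool with structured parameters. To force the call rather than leave it to the model’s discretion, also set tool_choice to that tool in the request. The tool definition looks like this: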

tools = [{
    "name": "extract_product",
    "description": "Extract product information",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"}
        },
        "required": ["name", "price"]
    }
}]
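
The tools list on its own does nothing until it is sent with a request. Here is a hedged sketch of the rest of the round trip using the anthropic Python SDK. The helper names and the max_tokens value are assumptions, and the model name is left as a parameter for the caller to supply.

```python
from types import SimpleNamespace


def extract_tool_input(content_blocks, tool_name):
    """Return the input dict of the first tool_use block for tool_name."""
    for block in content_blocks:
        if getattr(block, "type", None) == "tool_use" and block.name == tool_name:
            return block.input
    raise ValueError(f"model did not call {tool_name!r}")


def extract_product(email_text, model, tools):
    """Send the request and return the structured arguments of the tool call."""
    import anthropic  # third-party SDK, imported lazily so the helper above runs without it

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        tools=tools,
        # Force the model to call this tool instead of answering in prose.
        tool_choice={"type": "tool", "name": "extract_product"},
        messages=[
            {"role": "user", "content": f"Extract product info from: {email_text}"}
        ],
    )
    return extract_tool_input(response.content, "extract_product")


# Offline check with stand-in content blocks shaped like the SDK's objects.
fake_content = [
    SimpleNamespace(type="text", text="Calling the tool."),
    SimpleNamespace(
        type="tool_use",
        name="extract_product",
        input={"name": "Mug", "price": 29.99},
    ),
]
print(extract_tool_input(fake_content, "extract_product"))
```

The structured result is the tool call's input, not the text of the reply, which is the main difference from the OpenAI API surface.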

Open-source (Outlines, Guidance, vLLM)

For self-hosted models, libraries like Outlines and Guidance implement constrained decoding:

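import outlines

# ProductInfo is the Pydantic model defined in the OpenAI example above.
# This is the pre-1.0 Outlines API; newer releases changed the interface.
model = outlines.models.transformers("mistralai/Mistral-7B-v0.1")
generator = outlines.generate.json(model, ProductInfo)
result = generator("Extract product info from: ...")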

Where structured outputs break or get interesting

Quality vs validity tradeoff

Structured output guarantees valid format but not correct content. The model might produce {"sentiment": "positive"} for clearly negative text because the constraint forces it to pick one of the enum values even when it is uncertain. The output is always valid JSON - but the values inside might be wrong.

Schema complexity limits

Very complex nested schemas with recursive structures, large enums (100+ values), or deeply nested arrays can cause:

Slower generation (more complex FSM state tracking)
Lower quality outputs (the model is fighting too many constraints simultaneously)
Hitting max token limits before completing the structure

Keep schemas as flat and simple as possible. If you need complex output, break it into multiple smaller structured calls.

Optional fields and nullability

Handling optional data cleanly requires explicit schema design:

{
  "properties": {
    "middle_name": {
      "type": ["string", "null"]
    }
  }
}

Without explicit nullability, the model is forced to hallucinate a value for every required field even when the input does not contain that information.
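
One way to read this in practice: keep the field required but allow null, so the key is always present and the consumer handles None in one place. The sketch below assumes a hypothetical first_name field alongside middle_name, and the display_name helper is made up for illustration.

```python
import json
from typing import Optional

NAME_SCHEMA = {
    "type": "object",
    "properties": {
        "first_name": {"type": "string"},
        "middle_name": {"type": ["string", "null"]},
    },
    # middle_name stays required: the key is always there, but may be null
    "required": ["first_name", "middle_name"],
}


def display_name(raw: str) -> str:
    data = json.loads(raw)
    middle: Optional[str] = data["middle_name"]  # JSON null becomes None
    parts = [data["first_name"], middle]
    return " ".join(p for p in parts if p)


print(display_name('{"first_name": "Ada", "middle_name": null}'))
print(display_name('{"first_name": "Ada", "middle_name": "King"}'))
```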

Streaming with structured outputs

Structured outputs can still stream - you get partial JSON as it generates. But a standard JSON parser rejects incomplete JSON, so you have two options: use a streaming JSON parser that handles partial objects, or buffer until the complete object has arrived. Some providers support streaming with structural “checkpoints” where you get complete sub-objects.
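
The buffering option is the simpler of the two and can be sketched with the standard library alone. The chunk boundaries are made up, and the helper assumes the top-level value is an object, so a partial buffer can never parse early.

```python
import json
from typing import Iterable, Optional


def parse_when_complete(chunks: Iterable[str]) -> Optional[dict]:
    """Buffer streamed text and return the object as soon as it parses."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        try:
            return json.loads(buffer)
        except json.JSONDecodeError:
            continue  # still incomplete; wait for more chunks
    return None  # the stream ended before the JSON was complete


# Hypothetical chunks as they might arrive from a streaming response.
chunks = ['{"product_na', 'me": "Mug", "pri', 'ce": 29.', '99, "in_stock": tr', 'ue}']
product = parse_when_complete(chunks)
truncated = parse_when_complete(chunks[:3])
print(product, truncated)
```

Re-parsing the whole buffer on every chunk is wasteful for large payloads, which is where an incremental parser earns its keep. The None branch also covers the case where generation stops at a token limit mid-object.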

Real-world usage patterns

OpenAI GPT-4o - native structured output with JSON Schema, Pydantic model support via SDK
Anthropic Claude - structured output via tool use pattern, reliable but slightly different API surface
Google Gemini - supports response_mime_type: “application/json” with schema
Vercel AI SDK - provides generateObject() that wraps structured output across providers
LangChain - withStructuredOutput() in JavaScript (with_structured_output() in Python) handles provider differences
Instructor library - popular Python library that adds structured output to any model via retries and validation

How to apply in practice

Default to structured output for any machine-consumed response. If your code needs to parse the model’s output, use structured output mode. The reliability improvement from ~95% to 100% valid parsing eliminates an entire category of production issues.

Design schemas for your parser, not for humans. The schema should produce output your code can directly consume. If you need { "amount_cents": 2999 } rather than { "amount": "$29.99" }, encode that in the schema.

Use enums for classification. Instead of free-text classification (where the model might say “mostly positive” or “pos” or “POSITIVE”), define exact enum values. This removes post-processing normalization entirely.

Handle “I don’t know” explicitly. Add a field like "confidence": number or "extractable": boolean so the model can signal uncertainty within the schema rather than hallucinating values.
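
Downstream, those two fields turn into a simple routing decision. A minimal sketch, assuming the response carries both extractable and confidence; the threshold is a made-up number you would tune on your own data.

```python
import json

REVIEW_THRESHOLD = 0.7  # hypothetical cutoff


def route(raw: str) -> str:
    """Decide whether an extraction is processed automatically or reviewed."""
    result = json.loads(raw)
    if not result["extractable"]:
        return "human_review"  # the model says the data is not in the input
    if result["confidence"] < REVIEW_THRESHOLD:
        return "human_review"  # present, but the model is unsure
    return "auto_process"


print(route('{"extractable": true, "confidence": 0.92}'))
print(route('{"extractable": true, "confidence": 0.41}'))
print(route('{"extractable": false, "confidence": 0.0}'))
```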

Combine with chain-of-thought. Use a two-pass approach: first generate reasoning (free text), then generate the structured result. Or include a "reasoning" string field in your schema to get both. Put that field first, so the reasoning is generated before the answer it is supposed to inform:

{
  "reasoning": "The email mentions frustration with delivery time...",
  "sentiment": "negative",
  "confidence": 0.85
}

FAQ

Q: Does structured output mode affect the model’s quality or “creativity”?

Slightly. Constrained decoding limits which tokens the model can produce, which occasionally means the model cannot express exactly what it “wants” to say. In practice, for extraction and classification tasks, quality is equivalent or better (fewer format errors means fewer content errors too). For creative generation, structured output can feel constraining - use it for the structural skeleton, not for creative prose.

Q: What happens if the input does not contain information required by the schema?

The model is forced to produce a value for every required field. If the information genuinely is not in the input, the model will hallucinate a plausible value. Prevent this by: (1) making fields optional/nullable when the information might be absent, (2) adding a “not_found” option to enums, or (3) adding a separate boolean field like “field_present” that the model can set to false.

Q: JSON mode vs structured output - what is the difference?

JSON mode (OpenAI’s response_format: { type: "json_object" }) guarantees the output is valid JSON but does not enforce a specific schema. The model might return {"answer": "hello"} or {"result": {"text": "hello"}} - valid JSON either way, but unpredictable structure. Structured output goes further: you define the exact schema, and the output always matches it. Always prefer structured output with a schema over bare JSON mode.

Interview questions

Q: You are building an API that extracts structured data from unstructured customer emails (name, order number, issue type, urgency). How would you design the schema and handle edge cases?

Define a schema with typed fields: name (string, nullable - might not be stated), order_number (string with pattern validation, nullable), issue_type (enum: shipping, billing, product, account, other), urgency (enum: low, medium, high), and a raw_summary (string for the model’s interpretation). Make all fields nullable except issue_type and urgency - force the model to always classify these. Add a confidence field (0-1) so downstream systems can route low-confidence extractions to human review. Handle multi-issue emails by returning a list of issue objects rather than a single one; because some providers require the schema root to be an object, wrap the list in a field such as issues.
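
Written out, that answer might look like the sketch below. The order number pattern is a hypothetical format, and the list of issues is wrapped in an object on the assumption that the provider requires an object at the schema root. Support for pattern, minimum and maximum varies by provider, so treat those keywords as optional.

```python
import json

ISSUE_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": ["string", "null"]},
        # Hypothetical order number format, e.g. ORD-123456
        "order_number": {"type": ["string", "null"], "pattern": "^ORD-[0-9]{6}$"},
        "issue_type": {
            "type": "string",
            "enum": ["shipping", "billing", "product", "account", "other"],
        },
        "urgency": {"type": "string", "enum": ["low", "medium", "high"]},
        "raw_summary": {"type": ["string", "null"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    # Every key is always present; absence of data is expressed as null.
    "required": [
        "name", "order_number", "issue_type",
        "urgency", "raw_summary", "confidence",
    ],
    "additionalProperties": False,
}

EMAIL_SCHEMA = {
    "type": "object",
    "properties": {
        # One entry per issue, so multi-issue emails need no special casing
        "issues": {"type": "array", "items": ISSUE_SCHEMA},
    },
    "required": ["issues"],
    "additionalProperties": False,
}

print(json.dumps(EMAIL_SCHEMA, indent=2))
```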

Q: Your structured output pipeline processes 50,000 requests daily. You need to add a new field to the schema without breaking existing consumers. How?

Add the new field as optional (not in the “required” array) with a default or nullable type. Existing consumers ignore unknown fields (forward compatibility). New consumers can use the field immediately. Deploy the schema change, monitor that existing parsing still works, then gradually migrate consumers to use the new field. Never remove or rename fields in production schemas without a versioning strategy. This is the same principle as API versioning - additive changes are safe, breaking changes need coordination.
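
The "ignore unknown fields" half of that answer is a property of the consumer, not the schema, so it is worth showing. A minimal sketch using dataclasses; the currency field and the class names are hypothetical.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ProductV1:
    product_name: str
    price: float
    in_stock: bool


@dataclass
class ProductV2(ProductV1):
    currency: Optional[str] = None  # new field, optional with a default


def parse_known_fields(cls, raw: str):
    """Build cls from a JSON object, ignoring keys the class does not declare."""
    data = json.loads(raw)
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in known})


v1_payload = '{"product_name": "Mug", "price": 29.99, "in_stock": true}'
v2_payload = '{"product_name": "Mug", "price": 29.99, "in_stock": true, "currency": "USD"}'

old_consumer = parse_known_fields(ProductV1, v2_payload)  # extra key is dropped
new_consumer = parse_known_fields(ProductV2, v1_payload)  # missing key falls back to None
print(old_consumer, new_consumer)
```

A consumer that calls ProductV1(**data) directly would raise on the extra key, which is exactly the kind of break the additive rollout is meant to avoid.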

Q: Compare prompt-based JSON formatting (“respond in this JSON format”) vs API-level structured output. When would you still prefer the prompt-based approach?

Prompt-based when: (1) your provider does not support structured output mode for the model you need, (2) the output structure varies dynamically based on input (the schema itself depends on the content), (3) you need streaming of human-readable text with embedded structured data (e.g., a report with an inline data table). API-level structured output for everything else - the reliability guarantee eliminates retry logic, parse error handling, and format validation code. The engineering time saved on error handling alone justifies the slightly more complex API call setup.
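
When you do fall back to the prompt-based approach, this is roughly the defensive code you take on. The sketch strips a markdown fence or a chatty preamble and retries on failure. ask_model is a hypothetical callable standing in for your model client, and the retry count is arbitrary.

```python
import json
import re

FENCE = "`" * 3  # markdown code fence
FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def extract_json_object(reply: str) -> dict:
    """Best-effort extraction of one JSON object from a free-form reply."""
    match = FENCED.search(reply)
    candidate = match.group(1) if match else reply
    # Skip any preamble: start at the first "{" and let the decoder stop
    # at the end of the first complete JSON value.
    start = candidate.find("{")
    if start == -1:
        raise ValueError("no JSON object found in reply")
    obj, _end = json.JSONDecoder().raw_decode(candidate, start)
    return obj


def call_with_retries(ask_model, prompt: str, attempts: int = 3) -> dict:
    """Call the model until a reply yields a parseable JSON object."""
    last_error = None
    for _ in range(attempts):
        try:
            return extract_json_object(ask_model(prompt))
        except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
            last_error = err
    raise RuntimeError(f"no parseable JSON after {attempts} attempts") from last_error


fenced_reply = "Here you go:\n" + FENCE + 'json\n{"price": 29.99}\n' + FENCE + "\nAnything else?"
chatty_reply = 'Sure! Here is the data: {"price": 29.99} Hope that helps.'
print(extract_json_object(fenced_reply), extract_json_object(chatty_reply))
```

Even with all of this in place, nothing checks that the object has the right keys or types, which is the part API-level structured output handles for you.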