Calling the Claude API from Symfony: streaming, structured extraction, and per-plan quotas
A production architecture for Claude in a Symfony SaaS — scoped HttpClient, SSE streaming to the browser, tool-use extraction, retries, and AI quotas wired into your billing plans.
Adding "AI features" to a SaaS in 2026 is table stakes. Adding them well — streaming that doesn't buffer, costs that can't run away, a test suite that doesn't need an API key — is where most integrations fall short. This is the architecture I built for ShipAnvil's AI module, and the decisions behind it.
SDK or no SDK?
Anthropic ships an official PHP SDK (anthropic-ai/sdk), and it's a fine
choice. I deliberately went the other way: the Claude API is one JSON
endpoint (POST /v1/messages with x-api-key and anthropic-version
headers), and Symfony's HttpClient already does everything needed —
scoped clients, retries, timeouts, streaming. Owning the ~300 lines means
you own every byte on the wire, your retry policy is your retry policy,
and there's one less dependency at the very bottom of your product. For a
boilerplate whose buyers will read and modify the code, that transparency
is the feature. If you'd rather not own it, swap the implementation behind
the same interface — that's what the interface is for.
# config/packages/framework.yaml
framework:
    http_client:
        scoped_clients:
            anthropic.client:
                base_uri: 'https://api.anthropic.com'
                headers:
                    x-api-key: '%env(ANTHROPIC_API_KEY)%'
                    anthropic-version: '2023-06-01'
                timeout: 120 # generous: thinking models take their time
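To make "one JSON endpoint" concrete, here is a minimal sketch of a non-streaming call through that scoped client. The class name AnthropicMessages, the buildPayload() helper and the $model argument are illustrative assumptions, not ShipAnvil's actual code; the constructor argument name $anthropicClient relies on Symfony's named autowiring for the anthropic.client scoped client.

```php
<?php

use Symfony\Contracts\HttpClient\HttpClientInterface;

final class AnthropicMessages
{
    public function __construct(
        // Assumption: autowired from the "anthropic.client" scoped client by argument name.
        private HttpClientInterface $anthropicClient,
        private string $model,
    ) {
    }

    /** Builds the JSON body for POST /v1/messages. */
    public static function buildPayload(string $model, string $prompt, int $maxTokens = 1024): array
    {
        return [
            'model' => $model,
            'max_tokens' => $maxTokens,
            'messages' => [['role' => 'user', 'content' => $prompt]],
        ];
    }

    /** Sends one non-streaming request and returns the concatenated text blocks. */
    public function complete(string $prompt): string
    {
        $data = $this->anthropicClient
            ->request('POST', '/v1/messages', ['json' => self::buildPayload($this->model, $prompt)])
            ->toArray(); // throws on 3xx/4xx/5xx responses

        $text = '';
        foreach ($data['content'] ?? [] as $block) {
            if ('text' === ($block['type'] ?? null)) {
                $text .= $block['text'];
            }
        }

        return $text;
    }
}
```

The headers and base URI never appear in the class: they live in the scoped client configuration, which is why the request itself stays this small.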
Decision 1: an interface, and a fake that actually behaves
Everything depends on AiClientInterface (complete() + stream()),
with two implementations: the real Anthropic client and an offline
deterministic stub. The stub isn't a mock that returns "ok" — it streams
plausible chunks with realistic timing and "extracts" real-looking
structures. That buys you:
- Tests without keys or network. The whole AI feature surface runs in CI, deterministic and free.
- Development without burning tokens. Front-end work on the chat UI doesn't need a live model.
- A product you can demo and sell without AI configured — flip one env var to go live.
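A stripped-down sketch of that pair might look like the following. Only the interface name and its two methods come from the text above; the signatures, the chunk size, the delay parameter and the reply format are assumptions for illustration, and the real stub is richer than this.

```php
<?php

interface AiClientInterface
{
    /** Returns the full completion as one string. */
    public function complete(string $prompt): string;

    /** @return iterable<string> text chunks, in order */
    public function stream(string $prompt): iterable;
}

final class FakeAiClient implements AiClientInterface
{
    public function __construct(
        private int $chunkSize = 12,
        private int $delayMicroseconds = 0, // e.g. 30000 in dev to mimic typing; 0 in tests
    ) {
    }

    public function complete(string $prompt): string
    {
        // Deterministic: the same prompt always yields the same reply.
        return sprintf('Stub reply #%s: you asked about "%s".', substr(sha1($prompt), 0, 6), $prompt);
    }

    public function stream(string $prompt): iterable
    {
        // Requires the mbstring extension so multibyte characters are never split.
        foreach (mb_str_split($this->complete($prompt), $this->chunkSize) as $chunk) {
            if ($this->delayMicroseconds > 0) {
                usleep($this->delayMicroseconds);
            }
            yield $chunk;
        }
    }
}
```

Because the stub derives its reply from the prompt, a test can assert on exact output, and the chat UI still sees several chunks arrive in sequence.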
Decision 2: parse SSE incrementally, forward it incrementally
With "stream": true, the Messages API answers with Server-Sent Events:
message_start, then content_block_delta events carrying text_delta
chunks, then message_stop. Two things go wrong in naive implementations:
Buffering. HttpClient gives you raw chunks; an SSE event can span
several chunks or arrive several-per-chunk. You need a small incremental
parser that buffers on \n\n boundaries and yields complete events —
not a regex over the full body at the end (that's not streaming, that's
waiting with extra steps).
foreach ($this->httpClient->stream($response) as $chunk) {
    foreach ($this->sseParser->push($chunk->getContent()) as $event) {
        if ('content_block_delta' === $event->type) {
            yield $event->textDelta(); // a few characters, immediately
        }
    }
}
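The parser behind that loop can be small. What follows is a sketch under assumptions, not the shipped SseEventParser: it keeps the push(), type and textDelta() names used in the loop, while the SseEvent class and everything inside the methods are my illustration. It returns an array rather than a generator so the buffer is updated even if the caller never iterates.

```php
<?php

final class SseEvent
{
    public function __construct(
        public readonly string $type,
        public readonly array $data,
    ) {
    }

    /** Text carried by a text_delta payload, or '' for any other event. */
    public function textDelta(): string
    {
        $delta = $this->data['delta'] ?? [];

        return 'text_delta' === ($delta['type'] ?? null) ? (string) ($delta['text'] ?? '') : '';
    }
}

final class SseEventParser
{
    private string $buffer = '';

    /** @return list<SseEvent> every event completed by this chunk */
    public function push(string $chunk): array
    {
        // Normalise CRLF on the whole buffer so a "\r\n" split across chunks still works.
        $this->buffer = str_replace("\r\n", "\n", $this->buffer.$chunk);
        $events = [];

        while (false !== ($end = strpos($this->buffer, "\n\n"))) {
            $raw = substr($this->buffer, 0, $end);
            $this->buffer = substr($this->buffer, $end + 2);

            $type = 'message'; // SSE default when no "event:" line is present
            $data = [];
            foreach (explode("\n", $raw) as $line) {
                if (str_starts_with($line, 'event:')) {
                    $type = trim(substr($line, 6));
                } elseif (str_starts_with($line, 'data:')) {
                    $data[] = ltrim(substr($line, 5), ' ');
                }
            }
            if ([] === $data) {
                continue; // comment or keep-alive block, nothing to decode
            }
            $decoded = json_decode(implode("\n", $data), true);
            $events[] = new SseEvent($type, is_array($decoded) ? $decoded : []);
        }

        return $events;
    }
}
```

Anything after the last blank line stays in the buffer until the next chunk completes it, which is the whole trick.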
The last hop. Your beautifully streamed tokens then hit a controller
that buffers the whole response anyway. Use a StreamedResponse, disable
output buffering, and check your reverse proxy: with nginx + PHP-FPM,
sending X-Accel-Buffering: no and flushing after each chunk are the
difference between "typewriter effect" and "30-second pause, then a wall
of text".
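One way to wire that last hop, as a sketch: the helper names sseFrame() and makeStreamCallback(), the {"text": ...} frame shape and the closing "done" event are all assumptions of mine, not part of any library. The emitter is injectable so the loop can be exercised without a browser.

```php
<?php

/** Formats one text chunk as an SSE frame; JSON-encoding keeps newlines on a single data line. */
function sseFrame(string $text): string
{
    return 'data: '.json_encode(['text' => $text], JSON_THROW_ON_ERROR)."\n\n";
}

/** Returns the callback a streamed response would run. */
function makeStreamCallback(iterable $chunks, ?callable $emit = null): \Closure
{
    $emit ??= static function (string $frame): void {
        // Close PHP-level output buffers, otherwise flush() has nothing to push.
        while (ob_get_level() > 0) {
            ob_end_flush();
        }
        echo $frame;
        flush();
    };

    return static function () use ($chunks, $emit): void {
        foreach ($chunks as $chunk) {
            $emit(sseFrame($chunk));
        }
        $emit("event: done\ndata: {}\n\n"); // hypothetical end marker for the front end
    };
}

// In a Symfony controller (requires symfony/http-foundation):
//
// return new StreamedResponse(makeStreamCallback($ai->stream($prompt)), 200, [
//     'Content-Type' => 'text/event-stream',
//     'Cache-Control' => 'no-cache',
//     'X-Accel-Buffering' => 'no', // asks nginx not to buffer this response
// ]);
```

If the typewriter effect still arrives in one lump with this in place, the buffering is usually happening in a proxy or compression layer rather than in PHP.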
Decision 3: structured extraction via tool use, not "please reply in JSON"
For anything that feeds program logic (extracting contacts, classifying
tickets, parsing documents), don't prompt for JSON and json_decode with
your fingers crossed. Define a tool whose input_schema is the structure
you want, and force it with tool_choice:
$payload = [
    'model' => $this->model,
    'max_tokens' => 1024,
    'messages' => [['role' => 'user', 'content' => $text]],
    'tools' => [[
        'name' => 'save_contact',
        'description' => 'Record the contact details found in the text.',
        'input_schema' => [
            'type' => 'object',
            'properties' => [
                'name' => ['type' => 'string'],
                'email' => ['type' => 'string'],
                'company' => ['type' => 'string'],
            ],
            'required' => ['name'],
        ],
    ]],
    'tool_choice' => ['type' => 'tool', 'name' => 'save_contact'],
];
The model must respond with a tool_use block whose input matches your
schema — you get a parsed array, not prose that usually contains JSON.
Wrap this in a small StructuredExtractor service and every future
"turn this text into data" feature is a schema definition away. (The API
also has a first-class structured-outputs mode via output_config —
same idea, worth evaluating; tool use has the advantage of working
uniformly across model generations.)
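Reading the result back is a short loop over the response's content blocks. This is a sketch of what the inside of such an extractor could do; the function names extractToolInput() and assertRequiredKeys() are hypothetical, and the required-keys check is a cheap stand-in for a full JSON Schema validator.

```php
<?php

/**
 * Returns the input of the first tool_use block for $toolName.
 *
 * @throws \UnexpectedValueException when the response carries no such block
 */
function extractToolInput(array $response, string $toolName): array
{
    foreach ($response['content'] ?? [] as $block) {
        if ('tool_use' === ($block['type'] ?? null) && $toolName === ($block['name'] ?? null)) {
            return (array) ($block['input'] ?? []);
        }
    }

    throw new \UnexpectedValueException(sprintf('No tool_use block for "%s" in response.', $toolName));
}

/** Checks that every required key is present before the data reaches program logic. */
function assertRequiredKeys(array $input, array $required): void
{
    $missing = array_diff($required, array_keys($input));
    if ([] !== $missing) {
        throw new \UnexpectedValueException('Missing fields: '.implode(', ', $missing));
    }
}

// Example with a decoded response body (values are made up):
$response = [
    'stop_reason' => 'tool_use',
    'content' => [[
        'type' => 'tool_use',
        'id' => 'toolu_example',
        'name' => 'save_contact',
        'input' => ['name' => 'Ada Lovelace', 'email' => 'ada@example.com'],
    ]],
];

$contact = extractToolInput($response, 'save_contact');
assertRequiredKeys($contact, ['name']);
```

Validating after extraction costs a few lines and turns a surprising response into an exception at the boundary instead of a null three layers deeper.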
Decision 4: meter usage where billing already lives
The failure mode of AI features isn't technical, it's economic: one enthusiastic user on your 9 €/month plan can spend more in tokens than their subscription. The fix is to treat AI usage like any other entitlement:
- Every completed request writes an AiUsage row (organization, feature, input/output tokens; the API returns exact counts in usage).
- A quota check runs before each request, against limits defined in the same plan configuration as the rest of billing, not in a second config that drifts.
- When the quota's gone, the user gets a clean "you've used this month's AI allowance — upgrade?" — which makes AI a reason to upgrade, the only sane way to price it.
Tie the metering to the organization, not the user, for the same reason subscriptions attach there: teams share a plan, so they share its allowance.
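The check itself can be tiny. Here is a sketch assuming a token-based monthly allowance per plan; the class names AiQuotaGuard and QuotaExceeded, the plan names and the limits are all hypothetical, and in a real app the usage total would come from summing the organization's AiUsage rows.

```php
<?php

final class QuotaExceeded extends \RuntimeException
{
}

final class AiQuotaGuard
{
    /** @param array<string, int> $monthlyTokenLimits plan name => tokens per month */
    public function __construct(private array $monthlyTokenLimits)
    {
    }

    /** Call before each request; throws once the organization has used its allowance. */
    public function assertCanSpend(string $plan, int $tokensUsedThisMonth): void
    {
        $limit = $this->monthlyTokenLimits[$plan] ?? 0; // unknown plan: no AI allowance
        if ($tokensUsedThisMonth >= $limit) {
            throw new QuotaExceeded(sprintf('Plan "%s" has used its %d-token AI allowance.', $plan, $limit));
        }
    }

    /** Tokens to record for one completed request, read from the API's usage block. */
    public static function tokensFromUsage(array $usage): int
    {
        return (int) ($usage['input_tokens'] ?? 0) + (int) ($usage['output_tokens'] ?? 0);
    }
}

// Hypothetical limits; in practice these sit next to the plan's other entitlements.
$guard = new AiQuotaGuard(['starter' => 1000, 'pro' => 50000]);
$guard->assertCanSpend('starter', 999); // passes: one token of allowance left
```

Catching QuotaExceeded in the controller is where the "upgrade?" message gets rendered.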
Decision 5: retries with backoff, but never on POST-that-succeeded
The API returns 429 on rate limits and 529 when overloaded — both
retryable with exponential backoff and jitter. Two subtleties: respect
the retry-after header when present, and only retry when you know the
request didn't produce a billable completion (connection failures,
errors returned before processing starts, explicit overload signals). A blind retry-on-5xx
policy around a streaming response can double-bill a long completion that
failed at the last byte.
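Those rules fit in one pure function. This sketch is deliberately conservative and is my assumption of a reasonable policy, not the shipped one: it retries only connection failures, 429 and 529, never once a response body has started arriving, and uses "full jitter" (a uniform delay between zero and the exponential ceiling). The function name, defaults and the injectable random source are illustrative.

```php
<?php

/**
 * Returns the delay in milliseconds before the next attempt, or null to give up.
 *
 * $status is null when the connection failed before any response arrived.
 * $streamStarted is true once any part of the response body has been received.
 */
function nextRetryDelayMs(
    int $attempt,             // 1 for the first retry
    ?int $status,
    ?string $retryAfter,      // raw retry-after header value, in seconds
    bool $streamStarted,
    int $maxAttempts = 4,
    int $baseMs = 500,
    ?callable $random = null, // injectable for deterministic tests
): ?int {
    if ($streamStarted || $attempt > $maxAttempts) {
        return null; // tokens may already be billed, or we have tried enough
    }
    if (null !== $status && !in_array($status, [429, 529], true)) {
        return null; // any other status is not treated as safe to retry here
    }
    if (null !== $retryAfter && ctype_digit($retryAfter)) {
        return (int) $retryAfter * 1000; // the server's hint wins over our own backoff
    }

    $random ??= static fn (int $min, int $max): int => random_int($min, $max);
    $ceiling = $baseMs * (2 ** ($attempt - 1));

    return $random(0, $ceiling); // full jitter: uniform in [0, ceiling]
}
```

Keeping the decision separate from the HTTP call means the policy can be unit-tested without a single request, and the caller only has to sleep for the returned delay or rethrow.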
Model choice, as of June 2026: claude-opus-4-8 as the capable default,
claude-sonnet-4-6 for the speed/cost sweet spot on production traffic,
claude-haiku-4-5 for cheap high-volume tasks. Make it an env var —
model IDs change more often than your code should
(check Anthropic's current model list).
The shape of the whole thing
src/Ai/
├── Client/       AiClientInterface, AnthropicClient, FakeAiClient, SseEventParser
├── Extraction/   StructuredExtractor + your schemas
├── Quota/        per-plan limits, usage metering
└── Controller/   streaming chat, extraction endpoints
That module — streaming chat UI included, quotas pre-wired into the billing plans, the fake provider, and the tests for all of it — ships in ShipAnvil alongside the billing it plugs into. You can try the chat in the live demo — it runs on the offline stub, which is rather the point.